mirror of
https://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson
synced 2025-09-01 06:39:05 +00:00
loongarch-next
2837 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
![]() |
30d61efe11 |
9p: fix ->rename_sem exclusion
9p wants to be able to build a path from given dentry to fs root and keep it valid over a blocking operation. ->s_vfs_rename_mutex would be a natural candidate, but there are places where we need that and where we have no way to tell if ->s_vfs_rename_mutex is already held deeper in callchain. Moreover, it's only held for cross-directory renames; name changes within the same directory happen without it. Solution: * have d_move() done in ->rename() rather than in its caller * maintain a 9p-private rwsem (per-filesystem) * hold it exclusive over the relevant part of ->rename() * hold it shared over the places where we want the path. That almost works. FS_RENAME_DOES_D_MOVE is enough to put all d_move() and d_exchange() calls under filesystem's control. However, there's also __d_unalias(), which isn't covered by any of that. If ->lookup() hits a directory inode with preexisting dentry elsewhere (due to e.g. rename done on server behind our back), d_splice_alias() called by ->lookup() will move/rename that alias. Add a couple of optional methods, so that __d_unalias() would do if alias->d_op->d_unalias_trylock != NULL if (!alias->d_op->d_unalias_trylock(alias)) fail (resulting in -ESTALE from lookup) __d_move(...) if alias->d_op->d_unalias_unlock != NULL alias->d_unalias_unlock(alias) where it currently does __d_move(). 9p instances do down_write_trylock() and up_write() of ->rename_mutex. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> |
||
![]() |
5be1fa8abd |
Pass parent directory inode and expected name to ->d_revalidate()
->d_revalidate() often needs to access dentry parent and name; that has to be done carefully, since the locking environment varies from caller to caller. We are not guaranteed that dentry in question will not be moved right under us - not unless the filesystem is such that nothing on it ever gets renamed. It can be dealt with, but that results in boilerplate code that isn't even needed - the callers normally have just found the dentry via dcache lookup and want to verify that it's in the right place; they already have the values of ->d_parent and ->d_name stable. There is a couple of exceptions (overlayfs and, to less extent, ecryptfs), but for the majority of calls that song and dance is not needed at all. It's easier to make ecryptfs and overlayfs find and pass those values if there's a ->d_revalidate() instance to be called, rather than doing that in the instances. This commit only changes the calling conventions; making use of supplied values is left to followups. NOTE: some instances need more than just the parent - things like CIFS may need to build an entire path from filesystem root, so they need more precautions than the usual boilerplate. This series doesn't do anything to that need - these filesystems have to keep their locking mechanisms (rename_lock loops, use of dentry_path_raw(), private rwsem a-la v9fs). One thing to keep in mind when using name is that name->name will normally point into the pathname being resolved; the filename in question occupies name->len bytes starting at name->name, and there is NUL somewhere after it, but it the next byte might very well be '/' rather than '\0'. Do not ignore name->len. Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Gabriel Krisman Bertazi <gabriel@krisman.be> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> |
||
![]() |
9c5968db9e |
The various patchsets are summarized below. Plus of course many
indivudual patches which are described in their changelogs. - "Allocate and free frozen pages" from Matthew Wilcox reorganizes the page allocator so we end up with the ability to allocate and free zero-refcount pages. So that callers (ie, slab) can avoid a refcount inc & dec. - "Support large folios for tmpfs" from Baolin Wang teaches tmpfs to use large folios other than PMD-sized ones. - "Fix mm/rodata_test" from Petr Tesarik performs some maintenance and fixes for this small built-in kernel selftest. - "mas_anode_descend() related cleanup" from Wei Yang tidies up part of the mapletree code. - "mm: fix format issues and param types" from Keren Sun implements a few minor code cleanups. - "simplify split calculation" from Wei Yang provides a few fixes and a test for the mapletree code. - "mm/vma: make more mmap logic userland testable" from Lorenzo Stoakes continues the work of moving vma-related code into the (relatively) new mm/vma.c. - "mm/page_alloc: gfp flags cleanups for alloc_contig_*()" from David Hildenbrand cleans up and rationalizes handling of gfp flags in the page allocator. - "readahead: Reintroduce fix for improper RA window sizing" from Jan Kara is a second attempt at fixing a readahead window sizing issue. It should reduce the amount of unnecessary reading. - "synchronously scan and reclaim empty user PTE pages" from Qi Zheng addresses an issue where "huge" amounts of pte pagetables are accumulated (https://lore.kernel.org/lkml/cover.1718267194.git.zhengqi.arch@bytedance.com/). Qi's series addresses this windup by synchronously freeing PTE memory within the context of madvise(MADV_DONTNEED). - "selftest/mm: Remove warnings found by adding compiler flags" from Muhammad Usama Anjum fixes some build warnings in the selftests code when optional compiler warnings are enabled. - "mm: don't use __GFP_HARDWALL when migrating remote pages" from David Hildenbrand tightens the allocator's observance of __GFP_HARDWALL. - "pkeys kselftests improvements" from Kevin Brodsky implements various fixes and cleanups in the MM selftests code, mainly pertaining to the pkeys tests. - "mm/damon: add sample modules" from SeongJae Park enhances DAMON to estimate application working set size. - "memcg/hugetlb: Rework memcg hugetlb charging" from Joshua Hahn provides some cleanups to memcg's hugetlb charging logic. - "mm/swap_cgroup: remove global swap cgroup lock" from Kairui Song removes the global swap cgroup lock. A speedup of 10% for a tmpfs-based kernel build was demonstrated. - "zram: split page type read/write handling" from Sergey Senozhatsky has several fixes and cleaups for zram in the area of zram_write_page(). A watchdog softlockup warning was eliminated. - "move pagetable_*_dtor() to __tlb_remove_table()" from Kevin Brodsky cleans up the pagetable destructor implementations. A rare use-after-free race is fixed. - "mm/debug: introduce and use VM_WARN_ON_VMG()" from Lorenzo Stoakes simplifies and cleans up the debugging code in the VMA merging logic. - "Account page tables at all levels" from Kevin Brodsky cleans up and regularizes the pagetable ctor/dtor handling. This results in improvements in accounting accuracy. - "mm/damon: replace most damon_callback usages in sysfs with new core functions" from SeongJae Park cleans up and generalizes DAMON's sysfs file interface logic. - "mm/damon: enable page level properties based monitoring" from SeongJae Park increases the amount of information which is presented in response to DAMOS actions. - "mm/damon: remove DAMON debugfs interface" from SeongJae Park removes DAMON's long-deprecated debugfs interfaces. Thus the migration to sysfs is completed. - "mm/hugetlb: Refactor hugetlb allocation resv accounting" from Peter Xu cleans up and generalizes the hugetlb reservation accounting. - "mm: alloc_pages_bulk: small API refactor" from Luiz Capitulino removes a never-used feature of the alloc_pages_bulk() interface. - "mm/damon: extend DAMOS filters for inclusion" from SeongJae Park extends DAMOS filters to support not only exclusion (rejecting), but also inclusion (allowing) behavior. - "Add zpdesc memory descriptor for zswap.zpool" from Alex Shi "introduces a new memory descriptor for zswap.zpool that currently overlaps with struct page for now. This is part of the effort to reduce the size of struct page and to enable dynamic allocation of memory descriptors." - "mm, swap: rework of swap allocator locks" from Kairui Song redoes and simplifies the swap allocator locking. A speedup of 400% was demonstrated for one workload. As was a 35% reduction for kernel build time with swap-on-zram. - "mm: update mips to use do_mmap(), make mmap_region() internal" from Lorenzo Stoakes reworks MIPS's use of mmap_region() so that mmap_region() can be made MM-internal. - "mm/mglru: performance optimizations" from Yu Zhao fixes a few MGLRU regressions and otherwise improves MGLRU performance. - "Docs/mm/damon: add tuning guide and misc updates" from SeongJae Park updates DAMON documentation. - "Cleanup for memfd_create()" from Isaac Manjarres does that thing. - "mm: hugetlb+THP folio and migration cleanups" from David Hildenbrand provides various cleanups in the areas of hugetlb folios, THP folios and migration. - "Uncached buffered IO" from Jens Axboe implements the new RWF_DONTCACHE flag which provides synchronous dropbehind for pagecache reading and writing. To permite userspace to address issues with massive buildup of useless pagecache when reading/writing fast devices. - "selftests/mm: virtual_address_range: Reduce memory" from Thomas Weißschuh fixes and optimizes some of the MM selftests. -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZ5a+cwAKCRDdBJ7gKXxA jtoyAP9R58oaOKPJuTizEKKXvh/RpMyD6sYcz/uPpnf+cKTZxQEAqfVznfWlw/Lz uC3KRZYhmd5YrxU4o+qjbzp9XWX/xAE= =Ib2s -----END PGP SIGNATURE----- Merge tag 'mm-stable-2025-01-26-14-59' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: "The various patchsets are summarized below. Plus of course many indivudual patches which are described in their changelogs. - "Allocate and free frozen pages" from Matthew Wilcox reorganizes the page allocator so we end up with the ability to allocate and free zero-refcount pages. So that callers (ie, slab) can avoid a refcount inc & dec - "Support large folios for tmpfs" from Baolin Wang teaches tmpfs to use large folios other than PMD-sized ones - "Fix mm/rodata_test" from Petr Tesarik performs some maintenance and fixes for this small built-in kernel selftest - "mas_anode_descend() related cleanup" from Wei Yang tidies up part of the mapletree code - "mm: fix format issues and param types" from Keren Sun implements a few minor code cleanups - "simplify split calculation" from Wei Yang provides a few fixes and a test for the mapletree code - "mm/vma: make more mmap logic userland testable" from Lorenzo Stoakes continues the work of moving vma-related code into the (relatively) new mm/vma.c - "mm/page_alloc: gfp flags cleanups for alloc_contig_*()" from David Hildenbrand cleans up and rationalizes handling of gfp flags in the page allocator - "readahead: Reintroduce fix for improper RA window sizing" from Jan Kara is a second attempt at fixing a readahead window sizing issue. It should reduce the amount of unnecessary reading - "synchronously scan and reclaim empty user PTE pages" from Qi Zheng addresses an issue where "huge" amounts of pte pagetables are accumulated: https://lore.kernel.org/lkml/cover.1718267194.git.zhengqi.arch@bytedance.com/ Qi's series addresses this windup by synchronously freeing PTE memory within the context of madvise(MADV_DONTNEED) - "selftest/mm: Remove warnings found by adding compiler flags" from Muhammad Usama Anjum fixes some build warnings in the selftests code when optional compiler warnings are enabled - "mm: don't use __GFP_HARDWALL when migrating remote pages" from David Hildenbrand tightens the allocator's observance of __GFP_HARDWALL - "pkeys kselftests improvements" from Kevin Brodsky implements various fixes and cleanups in the MM selftests code, mainly pertaining to the pkeys tests - "mm/damon: add sample modules" from SeongJae Park enhances DAMON to estimate application working set size - "memcg/hugetlb: Rework memcg hugetlb charging" from Joshua Hahn provides some cleanups to memcg's hugetlb charging logic - "mm/swap_cgroup: remove global swap cgroup lock" from Kairui Song removes the global swap cgroup lock. A speedup of 10% for a tmpfs-based kernel build was demonstrated - "zram: split page type read/write handling" from Sergey Senozhatsky has several fixes and cleaups for zram in the area of zram_write_page(). A watchdog softlockup warning was eliminated - "move pagetable_*_dtor() to __tlb_remove_table()" from Kevin Brodsky cleans up the pagetable destructor implementations. A rare use-after-free race is fixed - "mm/debug: introduce and use VM_WARN_ON_VMG()" from Lorenzo Stoakes simplifies and cleans up the debugging code in the VMA merging logic - "Account page tables at all levels" from Kevin Brodsky cleans up and regularizes the pagetable ctor/dtor handling. This results in improvements in accounting accuracy - "mm/damon: replace most damon_callback usages in sysfs with new core functions" from SeongJae Park cleans up and generalizes DAMON's sysfs file interface logic - "mm/damon: enable page level properties based monitoring" from SeongJae Park increases the amount of information which is presented in response to DAMOS actions - "mm/damon: remove DAMON debugfs interface" from SeongJae Park removes DAMON's long-deprecated debugfs interfaces. Thus the migration to sysfs is completed - "mm/hugetlb: Refactor hugetlb allocation resv accounting" from Peter Xu cleans up and generalizes the hugetlb reservation accounting - "mm: alloc_pages_bulk: small API refactor" from Luiz Capitulino removes a never-used feature of the alloc_pages_bulk() interface - "mm/damon: extend DAMOS filters for inclusion" from SeongJae Park extends DAMOS filters to support not only exclusion (rejecting), but also inclusion (allowing) behavior - "Add zpdesc memory descriptor for zswap.zpool" from Alex Shi introduces a new memory descriptor for zswap.zpool that currently overlaps with struct page for now. This is part of the effort to reduce the size of struct page and to enable dynamic allocation of memory descriptors - "mm, swap: rework of swap allocator locks" from Kairui Song redoes and simplifies the swap allocator locking. A speedup of 400% was demonstrated for one workload. As was a 35% reduction for kernel build time with swap-on-zram - "mm: update mips to use do_mmap(), make mmap_region() internal" from Lorenzo Stoakes reworks MIPS's use of mmap_region() so that mmap_region() can be made MM-internal - "mm/mglru: performance optimizations" from Yu Zhao fixes a few MGLRU regressions and otherwise improves MGLRU performance - "Docs/mm/damon: add tuning guide and misc updates" from SeongJae Park updates DAMON documentation - "Cleanup for memfd_create()" from Isaac Manjarres does that thing - "mm: hugetlb+THP folio and migration cleanups" from David Hildenbrand provides various cleanups in the areas of hugetlb folios, THP folios and migration - "Uncached buffered IO" from Jens Axboe implements the new RWF_DONTCACHE flag which provides synchronous dropbehind for pagecache reading and writing. To permite userspace to address issues with massive buildup of useless pagecache when reading/writing fast devices - "selftests/mm: virtual_address_range: Reduce memory" from Thomas Weißschuh fixes and optimizes some of the MM selftests" * tag 'mm-stable-2025-01-26-14-59' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (321 commits) mm/compaction: fix UBSAN shift-out-of-bounds warning s390/mm: add missing ctor/dtor on page table upgrade kasan: sw_tags: use str_on_off() helper in kasan_init_sw_tags() tools: add VM_WARN_ON_VMG definition mm/damon/core: use str_high_low() helper in damos_wmark_wait_us() seqlock: add missing parameter documentation for raw_seqcount_try_begin() mm/page-writeback: consolidate wb_thresh bumping logic into __wb_calc_thresh mm/page_alloc: remove the incorrect and misleading comment zram: remove zcomp_stream_put() from write_incompressible_page() mm: separate move/undo parts from migrate_pages_batch() mm/kfence: use str_write_read() helper in get_access_type() selftests/mm/mkdirty: fix memory leak in test_uffdio_copy() kasan: hw_tags: Use str_on_off() helper in kasan_init_hw_tags() selftests/mm: virtual_address_range: avoid reading from VM_IO mappings selftests/mm: vm_util: split up /proc/self/smaps parsing selftests/mm: virtual_address_range: unmap chunks after validation selftests/mm: virtual_address_range: mmap() without PROT_WRITE selftests/memfd/memfd_test: fix possible NULL pointer dereference mm: add FGP_DONTCACHE folio creation flag mm: call filemap_fdatawrite_range_kick() after IOCB_DONTCACHE issue ... |
||
![]() |
c159dfbdd4 |
Mainly individually changelogged singleton patches. The patch series in
this pull are: - "lib min_heap: Improve min_heap safety, testing, and documentation" from Kuan-Wei Chiu provides various tightenings to the min_heap library code. - "xarray: extract __xa_cmpxchg_raw" from Tamir Duberstein preforms some cleanup and Rust preparation in the xarray library code. - "Update reference to include/asm-<arch>" from Geert Uytterhoeven fixes pathnames in some code comments. - "Converge on using secs_to_jiffies()" from Easwar Hariharan uses the new secs_to_jiffies() in various places where that is appropriate. - "ocfs2, dlmfs: convert to the new mount API" from Eric Sandeen switches two filesystems to the new mount API. - "Convert ocfs2 to use folios" from Matthew Wilcox does that. - "Remove get_task_comm() and print task comm directly" from Yafang Shao removes now-unneeded calls to get_task_comm() in various places. - "squashfs: reduce memory usage and update docs" from Phillip Lougher implements some memory savings in squashfs and performs some maintainability work. - "lib: clarify comparison function requirements" from Kuan-Wei Chiu tightens the sort code's behaviour and adds some maintenance work. - "nilfs2: protect busy buffer heads from being force-cleared" from Ryusuke Konishi fixes an issues in nlifs when the fs is presented with a corrupted image. - "nilfs2: fix kernel-doc comments for function return values" from Ryusuke Konishi fixes some nilfs kerneldoc. - "nilfs2: fix issues with rename operations" from Ryusuke Konishi addresses some nilfs BUG_ONs which syzbot was able to trigger. - "minmax.h: Cleanups and minor optimisations" from David Laight does some maintenance work on the min/max library code. - "Fixes and cleanups to xarray" from Kemeng Shi does maintenance work on the xarray library code. -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZ5SP5QAKCRDdBJ7gKXxA jqN7AQChvwXGG43n4d5SDiA/rH7ddvowQcDqhC9cAMJ1ReR7qwEA8/LIWDE4PdMX mJnaZ1/ibpEpearrChCViApQtcyEGQI= =ti4E -----END PGP SIGNATURE----- Merge tag 'mm-nonmm-stable-2025-01-24-23-16' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: "Mainly individually changelogged singleton patches. The patch series in this pull are: - "lib min_heap: Improve min_heap safety, testing, and documentation" from Kuan-Wei Chiu provides various tightenings to the min_heap library code - "xarray: extract __xa_cmpxchg_raw" from Tamir Duberstein preforms some cleanup and Rust preparation in the xarray library code - "Update reference to include/asm-<arch>" from Geert Uytterhoeven fixes pathnames in some code comments - "Converge on using secs_to_jiffies()" from Easwar Hariharan uses the new secs_to_jiffies() in various places where that is appropriate - "ocfs2, dlmfs: convert to the new mount API" from Eric Sandeen switches two filesystems to the new mount API - "Convert ocfs2 to use folios" from Matthew Wilcox does that - "Remove get_task_comm() and print task comm directly" from Yafang Shao removes now-unneeded calls to get_task_comm() in various places - "squashfs: reduce memory usage and update docs" from Phillip Lougher implements some memory savings in squashfs and performs some maintainability work - "lib: clarify comparison function requirements" from Kuan-Wei Chiu tightens the sort code's behaviour and adds some maintenance work - "nilfs2: protect busy buffer heads from being force-cleared" from Ryusuke Konishi fixes an issues in nlifs when the fs is presented with a corrupted image - "nilfs2: fix kernel-doc comments for function return values" from Ryusuke Konishi fixes some nilfs kerneldoc - "nilfs2: fix issues with rename operations" from Ryusuke Konishi addresses some nilfs BUG_ONs which syzbot was able to trigger - "minmax.h: Cleanups and minor optimisations" from David Laight does some maintenance work on the min/max library code - "Fixes and cleanups to xarray" from Kemeng Shi does maintenance work on the xarray library code" * tag 'mm-nonmm-stable-2025-01-24-23-16' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (131 commits) ocfs2: use str_yes_no() and str_no_yes() helper functions include/linux/lz4.h: add some missing macros Xarray: use xa_mark_t in xas_squash_marks() to keep code consistent Xarray: remove repeat check in xas_squash_marks() Xarray: distinguish large entries correctly in xas_split_alloc() Xarray: move forward index correctly in xas_pause() Xarray: do not return sibling entries from xas_find_marked() ipc/util.c: complete the kernel-doc function descriptions gcov: clang: use correct function param names latencytop: use correct kernel-doc format for func params minmax.h: remove some #defines that are only expanded once minmax.h: simplify the variants of clamp() minmax.h: move all the clamp() definitions after the min/max() ones minmax.h: use BUILD_BUG_ON_MSG() for the lo < hi test in clamp() minmax.h: reduce the #define expansion of min(), max() and clamp() minmax.h: update some comments minmax.h: add whitespace around operators and after commas nilfs2: do not update mtime of renamed directory that is not moved nilfs2: handle errors that nilfs_prepare_chunk() may return CREDITS: fix spelling mistake ... |
||
![]() |
91fe0e4d04 |
Documentation/filesystems/proc.rst: fix possessive form of "process"
The possessive form of "process" is "process's". Fix up various misdirected attempts at this. Also reflow some paragraphs. Cc: David Hildenbrand <david@redhat.com> Cc: Wang Yaxin <wang.yaxin@zte.com.cn> Cc: xu xin <xu.xin16@zte.com.cn> Cc: Yang Yang <yang.yang29@zte.com.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
![]() |
3ab76c767b |
ksm: add ksm involvement information for each process
In /proc/<pid>/ksm_stat, add two extra ksm involvement items including KSM_mergeable and KSM_merge_any. It helps administrators to better know the system's KSM behavior at process level. ksm_merge_any: yes/no whether the process'mm is added by prctl() into the candidate list of KSM or not, and fully enabled at process level. ksm_mergeable: yes/no whether any VMAs of the process'mm are currently applicable to KSM. Purpose ======= These two items are just to improve the observability of KSM at process level, so that users can know if a certain process has enabled KSM. For example, if without these two items, when we look at /proc/<pid>/ksm_stat and there's no merging pages found, We are not sure whether it is because KSM was not enabled or because KSM did not successfully merge any pages. Although "mg" in /proc/<pid>/smaps indicate VM_MERGEABLE, it's opaque and not very obvious for non professionals. [akpm@linux-foundation.org: wording tweaks, per David and akpm] Link: https://lkml.kernel.org/r/20250110174034304QOb8eDoqtFkp3_t8mqnqc@zte.com.cn Signed-off-by: xu xin <xu.xin16@zte.com.cn> Acked-by: David Hildenbrand <david@redhat.com> Tested-by: Mario Casquero <mcasquer@redhat.com> Cc: Wang Yaxin <wang.yaxin@zte.com.cn> Cc: Yang Yang <yang.yang29@zte.com.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
![]() |
f2cad9850e |
Documentation: update the Squashfs filesystem documentation
This patch updates the following which are out of date. - Zstd has been added to the compression algorithms supported. - The filesystem mailing list (for the kernel code) is changed to linux-fsdevel rather than the now very little used Sourceforge mailing list. - The Squashfs website has been changed to the Squashfs-tools github repository. - The fact that Squashfs-tools is likely packaged by the linux distribution is mentioned. Link: https://lkml.kernel.org/r/20241229233752.54481-4-phillip@squashfs.org.uk Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
![]() |
a7040a06e4 |
fuse: Add fuse-io-uring design documentation
[Add several documentation updates I had missed after renaming functions and also fixes 'make htmldocs'.] Signed-off-by: Bernd Schubert <bschubert@ddn.com> Reviewed-by: Luis Henriques <luis@igalia.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> |
||
![]() |
d0f93ac2c3 |
Documentation changes this time around include:
- Quite a bit of Chinese and Spanish translation work. - Clarifying that Git commit IDs >12chars are OK - A new nvme-multipath document - A reorganization of the admin-guide top-level page to make it readable - Clarification of the role of Acked-by and maintainer discretion on their acceptance. - Some reorganization of debugging-oriented docs. ...and typo fixes, documentation updates, etc. as usual. -----BEGIN PGP SIGNATURE----- iQFDBAABCAAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAmeOp8EPHGNvcmJldEBs d24ubmV0AAoJEBdDWhNsDH5YipUH/iffvlVYuqoVdPUFWdmsiNjwOCRE2MIfp8qO tPTRRHJAny+NlFT0IWlGUbLNoNXtvpN47YlkaeAjdrsjASerfpwzje7t4Z1B+jWT 0YwGBCvDIGasfRCx7D14+w5aqkEEynfsy+QurwcuDxcHMQGwt7ZCuTNOVO6BULEr L++BMwqapUr5IemP4ItQqDVVF9sp6bWEhaOnTTJCLU6oG23uUSSA/59sJmwDJUk7 6J3VGO1An4Jte9WX7qkVrSBNO5cOOhaFiFXIeNxfOioOPctBwxKiHDJnzVud8ipz R+tnUI/8hEvyJ7GZFezyZxmMnFs0P2DEYAkaN+hBs/nUjx0dKUg= =YxaS -----END PGP SIGNATURE----- Merge tag 'docs-6.14' of git://git.lwn.net/linux Pull Documentation updates from Jonathan Corbet: - Quite a bit of Chinese and Spanish translation work - Clarifying that Git commit IDs >12chars are OK - A new nvme-multipath document - A reorganization of the admin-guide top-level page to make it readable - Clarification of the role of Acked-by and maintainer discretion on their acceptance - Some reorganization of debugging-oriented docs ... and typo fixes, documentation updates, etc as usual * tag 'docs-6.14' of git://git.lwn.net/linux: (50 commits) Documentation: Fix x86_64 UEFI outdated references to elilo Documentation/sysctl: Add timer_migration to kernel.rst docs/mm: Physical memory: Remove zone_t docs: submitting-patches: clarify that signers may use their discretion on tags docs: submitting-patches: clarify difference between Acked-by and Reviewed-by docs: submitting-patches: clarify Acked-by and introduce "# Suffix" Documentation: bug-hunting.rst: remove odd contact information docs/zh_CN: Add sak index Chinese translation doc: module: DEFAULT_SYMBOL_NAMESPACE must be defined before #includes doc: module: Fix documented type of namespace Documentation/kernel-parameters: Fix a reference to vga-softcursor.rst docs/zh_CN: Add landlock index Chinese translation Documentation: Fix typo localmodonfig -> localmodconfig overlayfs.rst: Fix and improve grammar docs/zh_CN: Add siphash index Chinese translation docs/zh_CN: Add security IMA-templates Chinese translation docs/zh_CN: Add security digsig Chinese translation Align git commit ID abbreviation guidelines and checks docs: process: submitting-patches: split canonical patch format section docs/zh_CN: Add security lsm Chinese translation ... |
||
![]() |
2622f29041 |
bcachefs updates for 6.14-rc1
Lots of scalability work, another big on disk format change. On disk format version goes from 1.13 to 1.20. Like 6.11, this is another big and expensive automatic/required on disk format upgrade. This is planned to be the last big on disk format upgrade before the experimental label comes off. There will be one more minor on disk format update for a few things that couldn't make this release. Headline improvements: - Fix mount time regression that some users encountered post the 6.11 disk accounting rewrite. Accounting keys were encoded little endian (typetag in the low bits) - which didn't anticipate adding accounting keys for every inode, which aren't stored in memory and we don't want to scan at mount time. - fsck time on large filesystems is improved by multiple orders of magnitude. Previously, 100TB was about the practical max filesystem size, where users were reporting fsck times of a day+. With the new changes (which nearly eliminate backpointers fsck overhead), we fsck'd a filesystem with 10PB of data in 1.5 hours. The problematic fsck passes were walking every extent and checking for missing backpointers, and walking every backpointer to check for dangling backpointers. As we've been adding more and more runtime self healing there was no reason to keep around the backpointers -> extents pass; dangling backpointers are just deleted, and we can do that when using them - thus, backpointers -> extents is now only run in debug mode. extents -> backpointers does need to exist, since missing backpointers would mean we can't find data to move it (for e.g. copygc, device evacuate, scrub). But the new on disk format version makes possible a new strategy where we sum up backpointers within a bucket and check it against the bucket sector counts, and then only scan for missing backpointers if the counts are off (and then, only for specific buckets). Full list of on disk format changes: - 1.14: backpointer_bucket_gen Backpointers now have a field for the bucket generation number, replacing the obsolete bucket_offset field. This is needed for the new "sum up backpointers within a bucket" code, since backpointers use the btree write buffer - meaning we will see stale reads, and this runs online, with the filesystem in full rw mode. - 1.15: disk_accounting_big_endian As previously described, fix the endianness of accounting keys so that accounting keys with the same typetag sort together, and accounting read can skip types it's not interested in. - 1.16: reflink_p_may_update_opts: This version indicates that a new reflink pointer field is understood and may be used; the field indicates whether the reflink pointer has permissions to update IO path options (e.g. compression, replicas) may be updated on the indirect extent it points to. This completes the rebalance/reflink data path option handling from the 6.13 pull request. - 1.17: inode_depth Add a new inode field, bi_depth, to accelerate the check_directory_structure fsck path, which checks for loops in the filesystem heirarchy. check_inodes and check_dirents check connectivity, so check_directory_structure only has to check for loops - by walking back up to the root from every directory. But a path can't be a loop if it has a counter that increases monotonically from root to leaf - adding a depth counter means that we can check for loops with only local (parent -> child) checks. We might need to occasionally renumber the depth field in fsck if directories have been moved around, but then future fsck runs will be much faster. - 1.18: persistent_inode_cursors Previously, the cursor used for inode allocation was only kept in memory, which meant that users with large filesystems and lots of files were reporting that the first create after mounting would take awhile - since it had to scan from the start. Inode allocation cursors are now persistent, and also include a generation field (incremented on wraparound, which will only happen if inode allocation is restricted to 32 bit inodes), so that we don't have to leave inode_generation keys around after a delete. The option for 32 bit inode numbers may now also be set on individual directories, and non-32 bit inode allocations are disallowed from allocating from the 32 bit part of the inode number space. - 1.19: autofix_errors Runtime self healing is now the default.o - 1.20: directory size (from Hongbo) directory i_size is now meaningful, and not 0. Release notes from the previous 6.13 pull request: - Self healing work: Allocator and reflink now run the exact same check/repair code that fsck does at runtime, where applicable. The long term goal here is to remove inconsistent() errors (that cause us to go emergency read only) by lifting fsck code up to normal runtime paths; we should only go emergency read-only if we detect an inconsistency that was due to a runtime bug - or truly catastrophic damage (corrupted btree roots/interior nodes). - Reflink repair no longer deletes reflink pointers: instead we flip an error bit and log the error, and they can still be deleted by file deletion. This means a temporary failure to find an indirect extent (perhaps repaired later by btree node scan) won't result in unnecessary data loss - Improvements to rebalance data path option handling: we can now correctly apply changed filesystem-level io path options to pending rebalance work, and soon we'll be able to apply file-level io path option changes to indirect extents. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEKnAFLkS8Qha+jvQrE6szbY3KbnYFAmeOiboACgkQE6szbY3K bnY8zQ//Yoy+5ZA07tQV+Fi0JV0DZ6w3xotxNhAUeaKgCKHgp37gcKa47TFir4pd 6ha7PQV3GimFwHoIUfOY5X4Y+bEm16XblyfK3VU6IgGiE3cUg+1q8b8WrD2eHmLJ qIT8DWWpAM2AcZ/f5G37hH8pxn2t0TUuzJ1Sz7wEhJUNZEP+z+qaacnGhwuc8yQ3 Srj7Cc/NSd9T+6G2yKhERFITUrXmqVGgGihhVZqs0hCAPt8bwn5K8d1H2IKoj1N6 jJ3MQfmPIzUk0mfIjHrBlqrA+3tjtt5LGU+QpOWs8g509xHCP0BfGGOXQhjMjHVI JVSqAuIENK4V1ubz7BZcSoPAVncPeFl8Ly5Qdw5FlDBux9kKsch8wJPjn1A1gkPt Fb9VBTRkCK7WqUzkmbQh152SNC/0plb/8qFjywHNkvYyGMMlJME8zDIg40RN+0Ql ckXjlvdVGm0GbyM2GLth4gbOSXDzKrq12i3rWROnOLZ0Q2SBKfJe5K0UdRat1/nu 2sWWJNJqDzaaP1Gd/qk3Yht06GWnhI/17Bl/Znt5M8rxtSBbbxO58vi3gxasbccS l3qozuNouvAMNRBqE4ayVtjV+Aj69j1IBJnAfCareDDDf6ugjooLqu27BQkLOPg7 wswq633T6WG+UfQ44GvseiCaDW5MMh0aq7vxzjnBUoTz5usMfxg= =d0Zb -----END PGP SIGNATURE----- Merge tag 'bcachefs-2025-01-20.2' of git://evilpiepirate.org/bcachefs Pull bcachefs updates from Kent Overstreet: "Lots of scalability work, another big on-disk format change. On-disk format version goes from 1.13 to 1.20. Like 6.11, this is another big and expensive automatic/required on disk format upgrade. This is planned to be the last big on disk format upgrade before the experimental label comes off. There will be one more minor on disk format update for a few things that couldn't make this release. Headline improvements: - Self healing work: Allocator and reflink now run the exact same check/repair code that fsck does at runtime, where applicable. The long term goal here is to remove inconsistent() errors (that cause us to go emergency read only) by lifting fsck code up to normal runtime paths; we should only go emergency read-only if we detect an inconsistency that was due to a runtime bug - or truly catastrophic damage (corrupted btree roots/interior nodes). - Reflink repair no longer deletes reflink pointers: Instead we flip an error bit and log the error, and they can still be deleted by file deletion. This means a temporary failure to find an indirect extent (perhaps repaired later by btree node scan) won't result in unnecessary data loss - Improvements to rebalance data path option handling: We can now correctly apply changed filesystem-level io path options to pending rebalance work, and soon we'll be able to apply file-level io path option changes to indirect extents - Fix mount time regression that some users encountered post the 6.11 disk accounting rewrite. Accounting keys were encoded little endian (typetag in the low bits) - which didn't anticipate adding accounting keys for every inode, which aren't stored in memory and we don't want to scan at mount time. - fsck time on large filesystems is improved by multiple orders of magnitude. Previously, 100TB was about the practical max filesystem size, where users were reporting fsck times of a day+. With the new changes (which nearly eliminate backpointers fsck overhead), we fsck'd a filesystem with 10PB of data in 1.5 hours. The problematic fsck passes were walking every extent and checking for missing backpointers, and walking every backpointer to check for dangling backpointers. As we've been adding more and more runtime self healing there was no reason to keep around the backpointers -> extents pass; dangling backpointers are just deleted, and we can do that when using them - thus, backpointers -> extents is now only run in debug mode. extents -> backpointers does need to exist, since missing backpointers would mean we can't find data to move it (for e.g. copygc, device evacuate, scrub). But the new on disk format version makes possible a new strategy where we sum up backpointers within a bucket and check it against the bucket sector counts, and then only scan for missing backpointers if the counts are off (and then, only for specific buckets). Full list of on disk format changes: - 1.14: backpointer_bucket_gen Backpointers now have a field for the bucket generation number, replacing the obsolete bucket_offset field. This is needed for the new "sum up backpointers within a bucket" code, since backpointers use the btree write buffer - meaning we will see stale reads, and this runs online, with the filesystem in full rw mode. - 1.15: disk_accounting_big_endian As previously described, fix the endianness of accounting keys so that accounting keys with the same typetag sort together, and accounting read can skip types it's not interested in. - 1.16: reflink_p_may_update_opts: This version indicates that a new reflink pointer field is understood and may be used; the field indicates whether the reflink pointer has permissions to update IO path options (e.g. compression, replicas) may be updated on the indirect extent it points to. This completes the rebalance/reflink data path option handling from the 6.13 pull request. - 1.17: inode_depth Add a new inode field, bi_depth, to accelerate the check_directory_structure fsck path, which checks for loops in the filesystem heirarchy. check_inodes and check_dirents check connectivity, so check_directory_structure only has to check for loops - by walking back up to the root from every directory. But a path can't be a loop if it has a counter that increases monotonically from root to leaf - adding a depth counter means that we can check for loops with only local (parent -> child) checks. We might need to occasionally renumber the depth field in fsck if directories have been moved around, but then future fsck runs will be much faster. - 1.18: persistent_inode_cursors Previously, the cursor used for inode allocation was only kept in memory, which meant that users with large filesystems and lots of files were reporting that the first create after mounting would take awhile - since it had to scan from the start. Inode allocation cursors are now persistent, and also include a generation field (incremented on wraparound, which will only happen if inode allocation is restricted to 32 bit inodes), so that we don't have to leave inode_generation keys around after a delete. The option for 32 bit inode numbers may now also be set on individual directories, and non-32 bit inode allocations are disallowed from allocating from the 32 bit part of the inode number space. - 1.19: autofix_errors Runtime self healing is now the default.o - 1.20: directory size (from Hongbo) directory i_size is now meaningful, and not 0" * tag 'bcachefs-2025-01-20.2' of git://evilpiepirate.org/bcachefs: (268 commits) bcachefs: Fix check_inode_hash_info_matches_root() bcachefs: Document issue with bch_stripe layout bcachefs: Fix self healing on read error bcachefs: Pop all the transactions from the abort one bcachefs: Only abort the transactions in the cycle bcachefs: Introduce lock_graph_pop_from bcachefs: Convert open-coded lock_graph_pop_all to helper bcachefs: Do not allow no fail lock request to fail bcachefs: Merge the condition to avoid additional invocation Revert "bcachefs: Fix bch2_btree_node_upgrade()" bcachefs: bcachefs_metadata_version_directory_size bcachefs: make directory i_size meaningful bcachefs: check_unreachable_inodes is not actually PASS_ONLINE yet bcachefs: Don't use BTREE_ITER_cached when walking alloc btree during fsck bcachefs: Check for dirents to overwritten inodes bcachefs: bch2_btree_iter_peek_slot() handles navigating to nonexistent depth bcachefs: Don't set btree_path to updtodate if we don't fill bcachefs: __bch2_btree_pos_to_text() bcachefs: printbuf_reset() handles tabstops bcachefs: Silence read-only errors when deleting snapshots ... |
||
![]() |
f7862dfef6 |
saner replacement for debugfs_rename()
Existing primitive has several problems: 1) calling conventions are clumsy - it returns a dentry reference that is either identical to its second argument or is an ERR_PTR(-E...); in both cases no refcount changes happen. Inconvenient for users and bug-prone; it would be better to have it return 0 on success and -E... on failure. 2) it allows cross-directory moves; however, no such caller have ever materialized and considering the way debugfs is used, it's unlikely to happen in the future. What's more, any such caller would have fun issues to deal with wrt interplay with recursive removal. It also makes the calling conventions clumsier... 3) tautological rename fails; the callers have no race-free way to deal with that. 4) new name must have been formed by the caller; quite a few callers have it done by sprintf/kasprintf/etc., ending up with considerable boilerplate. Proposed replacement: int debugfs_change_name(dentry, fmt, ...). All callers convert to that easily, and it's simpler internally. IMO debugfs_rename() should go; if we ever get a real-world use case for cross-directory moves in debugfs, we can always look into the right way to handle that. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Link: https://lore.kernel.org/r/20250112080705.141166-21-viro@zeniv.linux.org.uk Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
![]() |
4a489220aa |
nfs: probe for LOCALIO when v3 client reconnects to server
Re-enabling NFSv3 LOCALIO is made more complex (than NFSv4) because v3 is stateless. As such, the hueristic used to identify a LOCALIO probe point is more adhoc by nature: if/when NFSv3 client IO begins to complete again in terms of normal RPC-based NFSv3 server IO, attempt nfs_local_probe_async(). Care is taken to throttle the frequency of nfs_local_probe_async(), otherwise there could be a flood of repeat calls to nfs_local_probe_async(). The throttle is admin controlled using a new module parameter for nfsv3, e.g.: echo 512 > /sys/module/nfsv3/parameters/nfs3_localio_probe_throttle Probe for NFSv3 LOCALIO every N IO requests (512 in this case). Must be power-of-2, defaults to 0 (probing disabled). On systems that expect to use LOCALIO with NFSv3 the admin should configure the 'nfs3_localio_probe_throttle' module parameter. This commit backfills module parameter documentation in localio.rst Signed-off-by: Mike Snitzer <snitzer@kernel.org> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com> |
||
![]() |
b33f7dec3a |
nfsd: rename nfsd_serv_ prefixed methods and variables with nfsd_net_
Also update Documentation/filesystems/nfs/localio.rst accordingly and reduce the technical documentation debt that was previously captured in that document. Signed-off-by: Mike Snitzer <snitzer@kernel.org> Reviewed-by: Jeff Layton <jlayton@kernel.org> Acked-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com> |
||
![]() |
3feec68563 |
nfs/localio: add direct IO enablement with sync and async IO support
This commit simply adds the required O_DIRECT plumbing. It doesn't address the fact that NFS doesn't ensure all writes are page aligned (nor device logical block size aligned as required by O_DIRECT). Because NFS will read-modify-write for IO that isn't aligned, LOCALIO will not use O_DIRECT semantics by default if/when an application requests the use of O_DIRECT. Allow the use of O_DIRECT semantics by: 1: Adding a flag to the nfs_pgio_header struct to allow the NFS O_DIRECT layer to signal that O_DIRECT was used by the application 2: Adding a 'localio_O_DIRECT_semantics' NFS module parameter that when enabled will cause LOCALIO to use O_DIRECT semantics (this may cause IO to fail if applications do not properly align their IO). This commit is derived from code developed by Weston Andros Adamson. Signed-off-by: Mike Snitzer <snitzer@kernel.org> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com> |
||
![]() |
a883764111 |
overlayfs.rst: Fix and improve grammar
- Correct "in a way the" to "in a way that", - Add a comma to improve readability. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Acked-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/cf07f705d63f04ebf7ba4ecafdc9ab6f63960e3d.1736239148.git.geert+renesas@glider.be |
||
![]() |
135ec43eb2
|
fiemap: use kernel-doc includes in fiemap docbook
Add some kernel-doc notation to structs in fiemap header files then pull that into Documentation/filesystems/fiemap.rst instead of duplicating the header file structs in fiemap.rst. This helps to future-proof fiemap.rst against struct changes. Add missing flags documentation from header files into fiemap.rst for FIEMAP_FLAG_CACHE and FIEMAP_EXTENT_SHARED. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Link: https://lore.kernel.org/r/20241121011352.201907-1-rdunlap@infradead.org Cc: Christoph Hellwig <hch@lst.de> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Jonathan Corbet <corbet@lwn.net> Cc: linux-doc@vger.kernel.org Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Christian Brauner <brauner@kernel.org> |
||
![]() |
71008e5d6f |
docs: filesystems: bcachefs: fixed some spelling mistakes in the bcachefs coding style page
Specifically, fixed spelling of "commit" and pluralization of last sentence. Signed-off-by: Dennis Lam <dennis.lamerice@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
||
![]() |
9fb89b9765 |
Documentation: filesystems: fix two misspells
This fixes two small misspells in the filesystems documentation. Signed-off-by: Bingwu Zhang <xtex@aosc.io> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/20241208035447.162465-2-xtex@envs.net |
||
![]() |
dfddf35310 |
Documentation: Fix simple typo on filesystems/porting.rst
Just spotted this while reading the doc. Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/20241213151743.23435-1-cem@kernel.org |
||
![]() |
7d4050728c |
vfs-6.13-rc1.fixes
-----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZ0YDhQAKCRCRxhvAZXjc ouy/AQDds2VXT3baRn5mvLOnWaN9tez+TnPLUKbS8m4srJUrGgD/SQkYc14vANGL iIw6oAhDhdzrjrm0rxr2COXah6me7g0= =Xu0X -----END PGP SIGNATURE----- Merge tag 'vfs-6.13-rc1.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: - Fix a few iomap bugs - Fix a wrong argument in backing file callback - Fix security mount option retrieval in statmount() - Cleanup how statmount() handles unescaped options - Add a missing inode_owner_or_capable() check for setting write hints - Clear the return value in read_kcore_iter() after a successful iov_iter_zero() - Fix a mount_setattr() selftest - Fix function signature in mount api documentation - Remove duplicate include header in the fscache code * tag 'vfs-6.13-rc1.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fs/backing_file: fix wrong argument in callback fs_parser: update mount_api doc to match function signature fs: require inode_owner_or_capable for F_SET_RW_HINT fs/proc/kcore.c: Clear ret value in read_kcore_iter after successful iov_iter_zero statmount: fix security option retrieval statmount: clean up unescaped option handling fscache: Remove duplicate included header iomap: elide flush from partial eof zero range iomap: lift zeroed mapping handling into iomap_zero_range() iomap: reset per-iter state on non-error iter advances iomap: warn on zero range of a post-eof folio selftests/mount_setattr: Fix failures on 64K PAGE_SIZE kernels |
||
![]() |
e68ce9474a |
A few late-arriving fixes, plus two more significant changes that were
*almost* ready at the beginning of the merge window: - A new document on debugging techniques from Sebastian Fricke - A clarification on MODULE_LICENSE terms meant to head off the sort of confusion that led to the recent Tuxedo Computers mess. -----BEGIN PGP SIGNATURE----- iQFDBAABCAAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAmdFEf0PHGNvcmJldEBs d24ubmV0AAoJEBdDWhNsDH5YLCQH+wY0lGEF5BloFrNOcwKoB96rXQjLMlPVpccP lWVprapS+NrlhTq4RZ9b6qbQ1RAdu0JCppew1viwclO8g8SmUoXmqNnlYIFH+3MB HZETbPWUHK2BRQqV7h3VkgvO30hUa0kHL3WfmKpGEG1P6FsQQ5o3WDi3YN8GM6xk tfHSiR4rgBw40VLyeDtRi++aEgYa/DfWpdtco58poCiAS6soTDDEWCxSBdibeDOQ YDuj1NtqieMk963z8CoJm/Qbw/ZLfW2jd3A43cZ0h6g/oloVYSucFcMjXpePvoZr 9BSkU9OyX5BRfhU/6EbU8eWYjgu0BuBk5uvCwkQgHz1p05MGRDE= =MmbD -----END PGP SIGNATURE----- Merge tag 'docs-6.13-2' of git://git.lwn.net/linux Pull more documentation updates from Jonathan Corbet: "A few late-arriving fixes, plus two more significant changes that were *almost* ready at the beginning of the merge window: - A new document on debugging techniques from Sebastian Fricke - A clarification on MODULE_LICENSE terms meant to head off the sort of confusion that led to the recent Tuxedo Computers mess" * tag 'docs-6.13-2' of git://git.lwn.net/linux: docs: Add debugging guide for the media subsystem docs: Add debugging section to process docs/licensing: Clarify wording about "GPL" and "Proprietary" docs: core-api/gfp_mask-from-fs-io: indicate that vmalloc supports GFP_NOFS/GFP_NOIO Documentation: kernel-doc: enumerate identifier *type*s Documentation: pwrseq: Fix trivial misspellings Documentation: filesystems: update filename extensions |
||
![]() |
44b4d13b70 |
f2fs-for-6.13-rc1
This series introduces a device aliasing feature where user can carve out partitions but reclaim the space back by deleting aliased file in root dir. In addition to that, there're numerous minor bug fixes in zoned device support, checkpoint=disable, extent cache management, fiemap, and lazytime mount option. The full list of noticeable changes can be found below. Enhancement: - introduce device aliasing file - add stats in debugfs to show multiple devices - add a sysfs node to limit max read extent count per-inode - modify f2fs_is_checkpoint_ready logic to allow more data to be written with the CP disable - decrease spare area for pinned files for zoned devices Bug fix: - Revert "f2fs: remove unreachable lazytime mount option parsing" - adjust unusable cap before checkpoint=disable mode - fix to drop all discards after creating snapshot on lvm device - fix to shrink read extent node in batches - fix changing cursegs if recovery fails on zoned device - fix to adjust appropriate length for fiemap - fix fiemap failure issue when page size is 16KB - fix to avoid forcing direct write to use buffered IO on inline_data inode - fix to map blocks correctly for direct write - fix to account dirty data in __get_secs_required() - fix null-ptr-deref in f2fs_submit_page_bio() - f2fs: compress: fix inconsistent update of i_blocks in release_compress_blocks and reserve_compress_blocks -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE00UqedjCtOrGVvQiQBSofoJIUNIFAmdD25MACgkQQBSofoJI UNKgLhAAgr0Dy/VWDgRlMovckq0q5EyQu/Jospv6mJyErQ4pZwwidNn9FSf0yua9 O0Pofs1zMFWoe5R2UOvwOnahmvlwD1nnRMylA10/9hp+/aKlTRxOI7HrdL5wFgWG QRTb/k+mgoEQk8+9ElThzq/CkmQovPEUfhoxW7bE4zH9kVoxi2klFbkASZynqEFe a+TVQoDUnXvb1cbvr4zEVuD79QEmazD/bgc+gquxChCHfzX8ip4R0aCZM1ceTgm/ Vru0LUKGQTWXPPReugJbOOtoIJ/kgD9Sg5xa7Icg3nxukgiYUDdl3e7MTgfvHOK6 Fwwj+ZbM/yV/gpAQp+g+uOkKSFqfulyOb+nzX5tmebmiT2Vs6XSQ0Xo+fjm7N1QC j0G1vwz91xETK/gw2U/zL/HQVB3IU/2dtBT2ek4x6kmVL3rmHYoI6r2ofQcEFjGn 2YQ9yvvT/fY6fza88kWO0PjgIRDzw9D9ihfZVyH9MCy5n6adhWlFXIg0HbAoecDE 6xsVjb5BVYJfQvVz3FauGRXu6i3mePaURC1rrf5NKFfAWJP7pDfi9IvSL56u2aMt J+RJ7a2u1l1z/yhBxtr00KhMP586OZHVJwQvwNJV7mzBFhvOlm3a4jTzbG35dE+V MfbbjR628y/0IkqZiB7YVu1NIF2qdbZosv4nO7b584Q1h1NH/PU= =LOgM -----END PGP SIGNATURE----- Merge tag 'f2fs-for-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "This series introduces a device aliasing feature where user can carve out partitions but reclaim the space back by deleting aliased file in root dir. In addition to that, there're numerous minor bug fixes in zoned device support, checkpoint=disable, extent cache management, fiemap, and lazytime mount option. The full list of noticeable changes can be found below. Enhancements: - introduce device aliasing file - add stats in debugfs to show multiple devices - add a sysfs node to limit max read extent count per-inode - modify f2fs_is_checkpoint_ready logic to allow more data to be written with the CP disable - decrease spare area for pinned files for zoned devices Fixes: - Revert "f2fs: remove unreachable lazytime mount option parsing" - adjust unusable cap before checkpoint=disable mode - fix to drop all discards after creating snapshot on lvm device - fix to shrink read extent node in batches - fix changing cursegs if recovery fails on zoned device - fix to adjust appropriate length for fiemap - fix fiemap failure issue when page size is 16KB - fix to avoid forcing direct write to use buffered IO on inline_data inode - fix to map blocks correctly for direct write - fix to account dirty data in __get_secs_required() - fix null-ptr-deref in f2fs_submit_page_bio() - fix inconsistent update of i_blocks in release_compress_blocks and reserve_compress_blocks" * tag 'f2fs-for-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (40 commits) f2fs: fix to drop all discards after creating snapshot on lvm device f2fs: add a sysfs node to limit max read extent count per-inode f2fs: fix to shrink read extent node in batches f2fs: print message if fscorrupted was found in f2fs_new_node_page() f2fs: clear SBI_POR_DOING before initing inmem curseg f2fs: fix changing cursegs if recovery fails on zoned device f2fs: adjust unusable cap before checkpoint=disable mode f2fs: fix to requery extent which cross boundary of inquiry f2fs: fix to adjust appropriate length for fiemap f2fs: clean up w/ F2FS_{BLK_TO_BYTES,BTYES_TO_BLK} f2fs: fix to do cast in F2FS_{BLK_TO_BYTES, BTYES_TO_BLK} to avoid overflow f2fs: replace deprecated strcpy with strscpy Revert "f2fs: remove unreachable lazytime mount option parsing" f2fs: fix to avoid forcing direct write to use buffered IO on inline_data inode f2fs: fix to map blocks correctly for direct write f2fs: fix race in concurrent f2fs_stop_gc_thread f2fs: fix fiemap failure issue when page size is 16KB f2fs: remove redundant atomic file check in defragment f2fs: fix to convert log type to segment data type correctly f2fs: clean up the unused variable additional_reserved_segments ... |
||
![]() |
cf87766dd6
|
Merge branch 'ovl.fixes'
Bring in an overlayfs fix for v6.13-rc1 that fixes a bug introduced by the overlayfs changes merged for v6.13. Signed-off-by: Christian Brauner <brauner@kernel.org> |
||
![]() |
c66f759832
|
fs_parser: update mount_api doc to match function signature
Add the missing 'name' parameter to the mount_api documentation for
fs_validate_description().
Fixes:
|
||
![]() |
1e726223be |
Documentation: filesystems: update filename extensions
Update references to most txt files to rst files. Update one reference to an md file to a rst file. Update one file path to its current location. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: linux-doc@vger.kernel.org Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: linux-fsdevel@vger.kernel.org Cc: Ian Kent <raven@themaw.net> Cc: autofs@vger.kernel.org Cc: Alexander Aring <aahringo@redhat.com> Cc: David Teigland <teigland@redhat.com> Cc: gfs2@lists.linux.dev Cc: Eric Biggers <ebiggers@kernel.org> Cc: Theodore Y. Ts'o <tytso@mit.edu> Cc: fsverity@lists.linux.dev Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: ocfs2-devel@lists.linux.dev Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/20241120055246.158368-1-rdunlap@infradead.org |
||
![]() |
c01f664e4c |
\n
-----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAmc/WXkACgkQnJ2qBz9k QNnwjAf/c8K3Vhw9RuKMtPF0K+gC//0mLsq+WmgrtXfMLvbSymrACnwHFJzpNGeS iEqCYlCC7vlqzPXpsVRlFeHpM52oVnE/wFF0Hp1h/Y1oqbRSzur6iSl4epmmBN+K AsPoWEXco7ABqtrhoZb0b1n7io9VorHN4nLhO6KWD83nZAawJDWgSw0sNCqcT6to vVxR3baP/EhONxNquxXe2lxq26dMilehmTk4AOyYslNYb0iG4r18TPyNb7fmuuKG M+nFfMnM9EPH8lnmgx6Mg/X77d/eZoq4pMRmeqSsroB5k/AQJnNrGweNL1+yr7OY adWNOMGWdNNQXPFgGbL5yZwNZ64kRA== =Eq1B -----END PGP SIGNATURE----- Merge tag 'reiserfs_delete' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull reiserfs removal from Jan Kara: "The deprecation period of reiserfs is ending at the end of this year so it is time to remove it" * tag 'reiserfs_delete' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: reiserfs: The last commit |
||
![]() |
ba1f9c8fe3 |
arm64 updates for 6.13:
* Support for running Linux in a protected VM under the Arm Confidential Compute Architecture (CCA) * Guarded Control Stack user-space support. Current patches follow the x86 ABI of implicitly creating a shadow stack on clone(). Subsequent patches (already on the list) will add support for clone3() allowing finer-grained control of the shadow stack size and placement from libc * AT_HWCAP3 support (not running out of HWCAP2 bits yet but we are getting close with the upcoming dpISA support) * Other arch features: - In-kernel use of the memcpy instructions, FEAT_MOPS (previously only exposed to user; uaccess support not merged yet) - MTE: hugetlbfs support and the corresponding kselftests - Optimise CRC32 using the PMULL instructions - Support for FEAT_HAFT enabling ARCH_HAS_NONLEAF_PMD_YOUNG - Optimise the kernel TLB flushing to use the range operations - POE/pkey (permission overlays): further cleanups after bringing the signal handler in line with the x86 behaviour for 6.12 * arm64 perf updates: - Support for the NXP i.MX91 PMU in the existing IMX driver - Support for Ampere SoCs in the Designware PCIe PMU driver - Support for Marvell's 'PEM' PCIe PMU present in the 'Odyssey' SoC - Support for Samsung's 'Mongoose' CPU PMU - Support for PMUv3.9 finer-grained userspace counter access control - Switch back to platform_driver::remove() now that it returns 'void' - Add some missing events for the CXL PMU driver * Miscellaneous arm64 fixes/cleanups: - Page table accessors cleanup: type updates, drop unused macros, reorganise arch_make_huge_pte() and clean up pte_mkcont(), sanity check addresses before runtime P4D/PUD folding - Command line override for ID_AA64MMFR0_EL1.ECV (advertising the FEAT_ECV for the generic timers) allowing Linux to boot with firmware deployments that don't set SCTLR_EL3.ECVEn - ACPI/arm64: tighten the check for the array of platform timer structures and adjust the error handling procedure in gtdt_parse_timer_block() - Optimise the cache flush for the uprobes xol slot (skip if no change) and other uprobes/kprobes cleanups - Fix the context switching of tpidrro_el0 when kpti is enabled - Dynamic shadow call stack fixes - Sysreg updates - Various arm64 kselftest improvements -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAmc5POIACgkQa9axLQDI XvEDYA//a3eeNkgMuGdnSCVcLz+zy+oNwAwboG/4X1DqL8jiCbI4npwugPx95RIA YZOUvo9T2aL3OyefpUHll4gFHqx9OwoZIig2F70TEUmlPsGUbh0KBkdfQF3xZPdl EwV0kHSGEqMWMBwsGJGwgCYrUaf1MUQzh1GBl7VJ2ts5XsJBaBeOyKkysij26wtZ V+aHq2IUx7qQS7+HC/4P6IoHxKziFcsCMovaKaynP4cw9xXBQbDMcNlHEwndOMyk pu2zrv7GG0j3KQuVP/2Alf5FKhmI0GVGP/6Nc/zsOmw96w8Kf7HfzEtkHawr2aRq rqg/c9ivzDn1p+fUBo4ZYtrRk4IAY+yKu6hdzdLTP5+bQrBTWTO9rjQVBm9FAGYT sCdEj1NqzvExvNHD7X6ut/GJ05lmce3K+qeSXSEysN9gqiT3eomYWMXrD2V2lxzb rIDDcb/icfaqjt14Mksh19r/rzNeq7noj9CGSmcqw0BHZfHzl38Lai6pdfYzCNyn vCM/c4c1D/WWX8/lifO1JZVbhDk1jy82Iphg2KEhL8iKPxDsKBBZLmYuU1oa7tMo WryGAz9+GQwd+W9chFuaOEtMnzvW2scEJ5Eb2fEf0Qj0aEurkL+C9dZR6o1GN77V DBUxtU628Ef4PJJGfbNCwZzdd8UPYG3a/mKfQQ3dz0oz2LySlW4= =wDot -----END PGP SIGNATURE----- Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Catalin Marinas: - Support for running Linux in a protected VM under the Arm Confidential Compute Architecture (CCA) - Guarded Control Stack user-space support. Current patches follow the x86 ABI of implicitly creating a shadow stack on clone(). Subsequent patches (already on the list) will add support for clone3() allowing finer-grained control of the shadow stack size and placement from libc - AT_HWCAP3 support (not running out of HWCAP2 bits yet but we are getting close with the upcoming dpISA support) - Other arch features: - In-kernel use of the memcpy instructions, FEAT_MOPS (previously only exposed to user; uaccess support not merged yet) - MTE: hugetlbfs support and the corresponding kselftests - Optimise CRC32 using the PMULL instructions - Support for FEAT_HAFT enabling ARCH_HAS_NONLEAF_PMD_YOUNG - Optimise the kernel TLB flushing to use the range operations - POE/pkey (permission overlays): further cleanups after bringing the signal handler in line with the x86 behaviour for 6.12 - arm64 perf updates: - Support for the NXP i.MX91 PMU in the existing IMX driver - Support for Ampere SoCs in the Designware PCIe PMU driver - Support for Marvell's 'PEM' PCIe PMU present in the 'Odyssey' SoC - Support for Samsung's 'Mongoose' CPU PMU - Support for PMUv3.9 finer-grained userspace counter access control - Switch back to platform_driver::remove() now that it returns 'void' - Add some missing events for the CXL PMU driver - Miscellaneous arm64 fixes/cleanups: - Page table accessors cleanup: type updates, drop unused macros, reorganise arch_make_huge_pte() and clean up pte_mkcont(), sanity check addresses before runtime P4D/PUD folding - Command line override for ID_AA64MMFR0_EL1.ECV (advertising the FEAT_ECV for the generic timers) allowing Linux to boot with firmware deployments that don't set SCTLR_EL3.ECVEn - ACPI/arm64: tighten the check for the array of platform timer structures and adjust the error handling procedure in gtdt_parse_timer_block() - Optimise the cache flush for the uprobes xol slot (skip if no change) and other uprobes/kprobes cleanups - Fix the context switching of tpidrro_el0 when kpti is enabled - Dynamic shadow call stack fixes - Sysreg updates - Various arm64 kselftest improvements * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (168 commits) arm64: tls: Fix context-switching of tpidrro_el0 when kpti is enabled kselftest/arm64: Try harder to generate different keys during PAC tests kselftest/arm64: Don't leak pipe fds in pac.exec_sign_all() arm64/ptrace: Clarify documentation of VL configuration via ptrace kselftest/arm64: Corrupt P0 in the irritator when testing SSVE acpi/arm64: remove unnecessary cast arm64/mm: Change protval as 'pteval_t' in map_range() kselftest/arm64: Fix missing printf() argument in gcs/gcs-stress.c kselftest/arm64: Add FPMR coverage to fp-ptrace kselftest/arm64: Expand the set of ZA writes fp-ptrace does kselftets/arm64: Use flag bits for features in fp-ptrace assembler code kselftest/arm64: Enable build of PAC tests with LLVM=1 kselftest/arm64: Check that SVCR is 0 in signal handlers selftests/mm: Fix unused function warning for aarch64_write_signal_pkey() kselftest/arm64: Fix printf() compiler warnings in the arm64 syscall-abi.c tests kselftest/arm64: Fix printf() warning in the arm64 MTE prctl() test kselftest/arm64: Fix printf() compiler warnings in the arm64 fp tests kselftest/arm64: Fix build with stricter assemblers arm64/scs: Drop unused prototype __pi_scs_patch_vmlinux() arm64/scs: Deal with 64-bit relative offsets in FDE frames ... |
||
![]() |
241c7ed4d4 |
vfs-6.13.untorn.writes
-----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZzcopwAKCRCRxhvAZXjc oitWAQD68PGFI6/ES9x+qGsDFEZBH08icuO+a9dyaZXyNRosDgD/ex2zHj6F7IzS Ghgb9jiqWQ8l2+PDYfisxa/0jiqCbAk= =DmXf -----END PGP SIGNATURE----- Merge tag 'vfs-6.13.untorn.writes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs untorn write support from Christian Brauner: "An atomic write is a write issed with torn-write protection. This means for a power failure or any hardware failure all or none of the data from the write will be stored, never a mix of old and new data. This work is already supported for block devices. If a block device is opened with O_DIRECT and the block device supports atomic write, then FMODE_CAN_ATOMIC_WRITE is added to the file of the opened block device. This contains the work to expand atomic write support to filesystems, specifically ext4 and XFS. Currently, only support for writing exactly one filesystem block atomically is added. Since it's now possible to have filesystem block size > page size for XFS, it's possible to write 4K+ blocks atomically on x86" * tag 'vfs-6.13.untorn.writes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: iomap: drop an obsolete comment in iomap_dio_bio_iter ext4: Do not fallback to buffered-io for DIO atomic write ext4: Support setting FMODE_CAN_ATOMIC_WRITE ext4: Check for atomic writes support in write iter ext4: Add statx support for atomic writes xfs: Support setting FMODE_CAN_ATOMIC_WRITE xfs: Validate atomic writes xfs: Support atomic write for statx fs: iomap: Atomic write support fs: Export generic_atomic_write_valid() block: Add bdev atomic write limits helpers fs/block: Check for IOCB_DIRECT in generic_atomic_write_valid() block/fs: Pass an iocb to generic_atomic_write_valid() |
||
![]() |
7956186e75 |
vfs-6.13.tmpfs
-----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZzcZIgAKCRCRxhvAZXjc oge4AQDxhsKW+v/jKHydzqzwG3Ks7DIxrUg/mcGfdtBwjiWgvwEA8t0QAAfKECAK B0+bNKJ8XJRUtZ10Jgm3dzURbEhBWgU= =4Lui -----END PGP SIGNATURE----- Merge tag 'vfs-6.13.tmpfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull tmpfs case folding updates from Christian Brauner: "This adds case-insensitive support for tmpfs. The work contained in here adds support for case-insensitive file names lookups in tmpfs. The main difference from other casefold filesystems is that tmpfs has no information on disk, just on RAM, so we can't use mkfs to create a case-insensitive tmpfs. For this implementation, there's a mount option for casefolding. The rest of the patchset follows a similar approach as ext4 and f2fs. The use case for this feature is similar to the use case for ext4, to better support compatibility layers (like Wine), particularly in combination with sandboxing/container tools (like Flatpak). Those containerization tools can share a subset of the host filesystem with an application. In the container, the root directory and any parent directories required for a shared directory are on tmpfs, with the shared directories bind-mounted into the container's view of the filesystem. If the host filesystem is using case-insensitive directories, then the application can do lookups inside those directories in a case-insensitive way, without this needing to be implemented in user-space. However, if the host is only sharing a subset of a case-insensitive directory with the application, then the parent directories of the mount point will be part of the container's root tmpfs. When the application tries to do case-insensitive lookups of those parent directories on a case-sensitive tmpfs, the lookup will fail" * tag 'vfs-6.13.tmpfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: tmpfs: Initialize sysfs during tmpfs init tmpfs: Fix type for sysfs' casefold attribute libfs: Fix kernel-doc warning in generic_ci_validate_strict_name docs: tmpfs: Add casefold options tmpfs: Expose filesystem features via sysfs tmpfs: Add flag FS_CASEFOLD_FL support for tmpfs dirs tmpfs: Add casefold lookup support libfs: Export generic_ci_ dentry functions unicode: Recreate utf8_parse_version() unicode: Export latest available UTF-8 version number ext4: Use generic_ci_validate_strict_name helper libfs: Create the helper function generic_ci_validate_strict_name() |
||
![]() |
a29835c9d0 |
vfs-6.13.ovl
-----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZzcXJQAKCRCRxhvAZXjc opmaAQDuEpUXi3PSNwgXGRYAzeBpsonoNrBtzSRTlaza7KZCiAEAsFUSbQJNGdkN TqC9KvI1EiX2hVvWmVBVttBNH6CY+wY= =4le5 -----END PGP SIGNATURE----- Merge tag 'vfs-6.13.ovl' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull overlayfs updates from Christian Brauner: "Make overlayfs support specifying layers through file descriptors. Currently overlayfs only allows specifying layers through path names. This is inconvenient for users that want to assemble an overlayfs mount purely based on file descriptors: This enables user to specify both: fsconfig(fd_overlay, FSCONFIG_SET_FD, "upperdir+", NULL, fd_upper); fsconfig(fd_overlay, FSCONFIG_SET_FD, "workdir+", NULL, fd_work); fsconfig(fd_overlay, FSCONFIG_SET_FD, "lowerdir+", NULL, fd_lower1); fsconfig(fd_overlay, FSCONFIG_SET_FD, "lowerdir+", NULL, fd_lower2); in addition to: fsconfig(fd_overlay, FSCONFIG_SET_STRING, "upperdir+", "/upper", 0); fsconfig(fd_overlay, FSCONFIG_SET_STRING, "workdir+", "/work", 0); fsconfig(fd_overlay, FSCONFIG_SET_STRING, "lowerdir+", "/lower1", 0); fsconfig(fd_overlay, FSCONFIG_SET_STRING, "lowerdir+", "/lower2", 0); There's also a large set of new overlayfs selftests to test new features and some older properties" * tag 'vfs-6.13.ovl' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: selftests: add test for specifying 500 lower layers selftests: add overlayfs fd mounting selftests selftests: use shared header Documentation,ovl: document new file descriptor based layers ovl: specify layers via file descriptors fs: add helper to use mount option as path or fd |
||
![]() |
70e7730c2a |
vfs-6.13.misc
-----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZzcToAAKCRCRxhvAZXjc osL9AP948FFumJRC28gDJ4xp+X4eohNOfkgoEG8FTbF2zU6ulwD+O0pr26FqpFli pqlG+38UdATImpfqqWjPbb72sBYcfQg= =wLUh -----END PGP SIGNATURE----- Merge tag 'vfs-6.13.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull misc vfs updates from Christian Brauner: "Features: - Fixup and improve NLM and kNFSD file lock callbacks Last year both GFS2 and OCFS2 had some work done to make their locking more robust when exported over NFS. Unfortunately, part of that work caused both NLM (for NFS v3 exports) and kNFSD (for NFSv4.1+ exports) to no longer send lock notifications to clients This in itself is not a huge problem because most NFS clients will still poll the server in order to acquire a conflicted lock It's important for NLM and kNFSD that they do not block their kernel threads inside filesystem's file_lock implementations because that can produce deadlocks. We used to make sure of this by only trusting that posix_lock_file() can correctly handle blocking lock calls asynchronously, so the lock managers would only setup their file_lock requests for async callbacks if the filesystem did not define its own lock() file operation However, when GFS2 and OCFS2 grew the capability to correctly handle blocking lock requests asynchronously, they started signalling this behavior with EXPORT_OP_ASYNC_LOCK, and the check for also trusting posix_lock_file() was inadvertently dropped, so now most filesystems no longer produce lock notifications when exported over NFS Fix this by using an fop_flag which greatly simplifies the problem and grooms the way for future uses by both filesystems and lock managers alike - Add a sysctl to delete the dentry when a file is removed instead of making it a negative dentry Commit |
||
![]() |
6ac81fd55e |
vfs-6.13.mgtime
-----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZzcScQAKCRCRxhvAZXjc oj+5AP4k822a77wc/3iPFk379naIvQ4dsrgemh0/Pb6ZvzvkFQEAi3vFCfzCDR2x SkJF/RwXXKZv6U31QXMRt2Qo6wfBuAc= =nVlm -----END PGP SIGNATURE----- Merge tag 'vfs-6.13.mgtime' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs multigrain timestamps from Christian Brauner: "This is another try at implementing multigrain timestamps. This time with significant help from the timekeeping maintainers to reduce the performance impact. Thomas provided a base branch that contains the required timekeeping interfaces for the VFS. It serves as the base for the multi-grain timestamp work: - Multigrain timestamps allow the kernel to use fine-grained timestamps when an inode's attributes is being actively observed via ->getattr(). With this support, it's possible for a file to get a fine-grained timestamp, and another modified after it to get a coarse-grained stamp that is earlier than the fine-grained time. If this happens then the files can appear to have been modified in reverse order, which breaks VFS ordering guarantees. To prevent this, a floor value is maintained for multigrain timestamps. Whenever a fine-grained timestamp is handed out, record it, and when later coarse-grained stamps are handed out, ensure they are not earlier than that value. If the coarse-grained timestamp is earlier than the fine-grained floor, return the floor value instead. The timekeeper changes add a static singleton atomic64_t into timekeeper.c that is used to keep track of the latest fine-grained time ever handed out. This is tracked as a monotonic ktime_t value to ensure that it isn't affected by clock jumps. Because it is updated at different times than the rest of the timekeeper object, the floor value is managed independently of the timekeeper via a cmpxchg() operation, and sits on its own cacheline. Two new public timekeeper interfaces are added: (1) ktime_get_coarse_real_ts64_mg() fills a timespec64 with the later of the coarse-grained clock and the floor time (2) ktime_get_real_ts64_mg() gets the fine-grained clock value, and tries to swap it into the floor. A timespec64 is filled with the result. - The VFS has always used coarse-grained timestamps when updating the ctime and mtime after a change. This has the benefit of allowing filesystems to optimize away a lot metadata updates, down to around 1 per jiffy, even when a file is under heavy writes. Unfortunately, this has always been an issue when we're exporting via NFSv3, which relies on timestamps to validate caches. A lot of changes can happen in a jiffy, so timestamps aren't sufficient to help the client decide when to invalidate the cache. Even with NFSv4, a lot of exported filesystems don't properly support a change attribute and are subject to the same problems with timestamp granularity. Other applications have similar issues with timestamps (e.g backup applications). If we were to always use fine-grained timestamps, that would improve the situation, but that becomes rather expensive, as the underlying filesystem would have to log a lot more metadata updates. This adds a way to only use fine-grained timestamps when they are being actively queried. Use the (unused) top bit in inode->i_ctime_nsec as a flag that indicates whether the current timestamps have been queried via stat() or the like. When it's set, we allow the kernel to use a fine-grained timestamp iff it's necessary to make the ctime show a different value. This solves the problem of being able to distinguish the timestamp between updates, but introduces a new problem: it's now possible for a file being changed to get a fine-grained timestamp. A file that is altered just a bit later can then get a coarse-grained one that appears older than the earlier fine-grained time. This violates timestamp ordering guarantees. This is where the earlier mentioned timkeeping interfaces help. A global monotonic atomic64_t value is kept that acts as a timestamp floor. When we go to stamp a file, we first get the latter of the current floor value and the current coarse-grained time. If the inode ctime hasn't been queried then we just attempt to stamp it with that value. If it has been queried, then first see whether the current coarse time is later than the existing ctime. If it is, then we accept that value. If it isn't, then we get a fine-grained time and try to swap that into the global floor. Whether that succeeds or fails, we take the resulting floor time, convert it to realtime and try to swap that into the ctime. We take the result of the ctime swap whether it succeeds or fails, since either is just as valid. Filesystems can opt into this by setting the FS_MGTIME fstype flag. Others should be unaffected (other than being subject to the same floor value as multigrain filesystems)" * tag 'vfs-6.13.mgtime' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fs: reduce pointer chasing in is_mgtime() test tmpfs: add support for multigrain timestamps btrfs: convert to multigrain timestamps ext4: switch to multigrain timestamps xfs: switch to multigrain timestamps Documentation: add a new file documenting multigrain timestamps fs: add percpu counters for significant multigrain timestamp events fs: tracepoints around multigrain timestamp events fs: handle delegated timestamps in setattr_copy_mgtime timekeeping: Add percpu counter for tracking floor swap events timekeeping: Add interfaces for handling timestamps with a floor value fs: have setattr_copy handle multigrain timestamps appropriately fs: add infrastructure for multigrain timestamps |
||
![]() |
9e0933c21c |
fs: iomap: Atomic write support
Support direct I/O atomic writes by producing a single bio with REQ_ATOMIC flag set. Initially FSes (XFS) should only support writing a single FS block atomically. As with any atomic write, we should produce a single bio which covers the complete write length. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> [djwong: clarify a couple of things in the docs] Signed-off-by: Darrick J. Wong <djwong@kernel.org> |
||
![]() |
d56239a82e |
vfs-6.12-rc6.fixes
-----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZyTGAQAKCRCRxhvAZXjc opd6AQCal4omyfS8FYe4VRRZ/0XHouagq99I0U0TAmKkvoKAsgD/XrdE+pSTEkPX Pv4T9phh1cZRxcyKVu77UoYkuHJEDAg= =Lu9R -----END PGP SIGNATURE----- Merge tag 'vfs-6.12-rc6.fixes' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs Pull filesystem fixes from Christian Brauner: "VFS: - Fix copy_page_from_iter_atomic() if KMAP_LOCAL_FORCE_MAP=y is set - Add a get_tree_bdev_flags() helper that allows to modify e.g., whether errors are logged into the filesystem context during superblock creation. This is used by erofs to fix a userspace regression where an error is currently logged when its used on a regular file which is an new allowed mode in erofs. netfs: - Fix the sysfs debug path in the documentation. - Fix iov_iter_get_pages*() for folio queues by skipping the page extracation if we're at the end of a folio. afs: - Fix moving subdirectories to different parent directory. autofs: - Fix handling of AUTOFS_DEV_IOCTL_TIMEOUT_CMD ioctl in validate_dev_ioctl(). The actual ioctl number, not the ioctl command needs to be checked for autofs" * tag 'vfs-6.12-rc6.fixes' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs: iov_iter: fix copy_page_from_iter_atomic() if KMAP_LOCAL_FORCE_MAP autofs: fix thinko in validate_dev_ioctl() iov_iter: Fix iov_iter_get_pages*() for folio_queue afs: Fix missing subdir edit when renamed between parent dirs doc: correcting the debug path for cachefiles erofs: use get_tree_bdev_flags() to avoid misleading messages fs/super.c: introduce get_tree_bdev_flags() |
||
![]() |
128d333f0d |
f2fs: introduce device aliasing file
F2FS should understand how the device aliasing file works and support deleting the file after use. A device aliasing file can be created by mkfs.f2fs tool and it can map the whole device with an extent, not using node blocks. The file space should be pinned and normally used for read-only usages. Signed-off-by: Daeho Jeong <daehojeong@google.com> Signed-off-by: Chao Yu <chao@kernel.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> |
||
![]() |
a713f830c9
|
docs: tmpfs: Add casefold options
Document mounting options for casefold support in tmpfs. Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: André Almeida <andrealmeid@igalia.com> Link: https://lore.kernel.org/r/20241021-tonyk-tmpfs-v8-9-f443d5814194@igalia.com Signed-off-by: Christian Brauner <brauner@kernel.org> |
||
![]() |
6b51b9f65c
|
doc: correcting the debug path for cachefiles
The original debug path is under "/sys/modules", that's wrong. The real path in kernel is "/sys/module". So we can correct it. Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Link: https://lore.kernel.org/r/20241022013812.2880883-1-lihongbo22@huawei.com Signed-off-by: Christian Brauner <brauner@kernel.org> |
||
![]() |
7166c32651 |
vfs-6.12-rc5.fixes
-----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZxY6XAAKCRCRxhvAZXjc opmUAQCu4KhzBBdZmFw3AfZFNJvYb1onT4FiU0pnyGgfvzEdEwD6AlnlgQ7DL3ZN WBqBzUl+DpGYJfzhkqoEGH89Fagx7QM= =mm68 -----END PGP SIGNATURE----- Merge tag 'vfs-6.12-rc5.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: "afs: - Fix a lock recursion in afs_wake_up_async_call() on ->notify_lock netfs: - Drop the references to a folio immediately after the folio has been extracted to prevent races with future I/O collection - Fix a documenation build error - Downgrade the i_rwsem for buffered writes to fix a cifs reported performance regression when switching to netfslib vfs: - Explicitly return -E2BIG from openat2() if the specified size is unexpectedly large. This aligns openat2() with other extensible struct based system calls - When copying a mount namespace ensure that we only try to remove the new copy from the mount namespace rbtree if it has already been added to it nilfs: - Clear the buffer delay flag when clearing the buffer state clags when a buffer head is discarded to prevent a kernel OOPs ocfs2: - Fix an unitialized value warning in ocfs2_setattr() proc: - Fix a kernel doc warning" * tag 'vfs-6.12-rc5.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: proc: Fix W=1 build kernel-doc warning afs: Fix lock recursion fs: Fix uninitialized value issue in from_kuid and from_kgid fs: don't try and remove empty rbtree node netfs: Downgrade i_rwsem for a buffered write nilfs2: fix kernel bug due to missing clearing of buffer delay flag openat2: explicitly return -E2BIG for (usize > PAGE_SIZE) netfs: fix documentation build error netfs: In readahead, put the folio refs as soon extracted |
||
![]() |
fb6f20ecb1 |
reiserfs: The last commit
Deprecation period of reiserfs ends with the end of this year so it is time to remove it from the kernel. Acked-by: Darrick J. Wong <djwong@kernel.org> Acked-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> |
||
![]() |
caf0ea451d |
iomap: remove iomap_file_buffered_write_punch_delalloc
Currently iomap_file_buffered_write_punch_delalloc can be called from XFS either with the invalidate lock held or not. To fix this while keeping the locking in the file system and not the iomap library code we'll need to life the locking up into the file system. To prepare for that, open code iomap_file_buffered_write_punch_delalloc in the only caller, and instead export iomap_write_delalloc_release. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org> |
||
![]() |
a89ed67d3c
|
Documentation,ovl: document new file descriptor based layers
Add a minimal example how to specify layers via file descriptors. Link: https://lore.kernel.org/r/20241014-work-overlayfs-v3-3-32b3fed1286e@kernel.org Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org> |
||
![]() |
b40508ca5d
|
Merge patch series "timekeeping/fs: multigrain timestamp redux"
Jeff Layton <jlayton@kernel.org> says: The VFS has always used coarse-grained timestamps when updating the ctime and mtime after a change. This has the benefit of allowing filesystems to optimize away a lot metadata updates, down to around 1 per jiffy, even when a file is under heavy writes. Unfortunately, this has always been an issue when we're exporting via NFSv3, which relies on timestamps to validate caches. A lot of changes can happen in a jiffy, so timestamps aren't sufficient to help the client decide when to invalidate the cache. Even with NFSv4, a lot of exported filesystems don't properly support a change attribute and are subject to the same problems with timestamp granularity. Other applications have similar issues with timestamps (e.g backup applications). If we were to always use fine-grained timestamps, that would improve the situation, but that becomes rather expensive, as the underlying filesystem would have to log a lot more metadata updates. What we need is a way to only use fine-grained timestamps when they are being actively queried. Use the (unused) top bit in inode->i_ctime_nsec as a flag that indicates whether the current timestamps have been queried via stat() or the like. When it's set, we allow the kernel to use a fine-grained timestamp iff it's necessary to make the ctime show a different value. This solves the problem of being able to distinguish the timestamp between updates, but introduces a new problem: it's now possible for a file being changed to get a fine-grained timestamp. A file that is altered just a bit later can then get a coarse-grained one that appears older than the earlier fine-grained time. This violates timestamp ordering guarantees. To remedy this, keep a global monotonic atomic64_t value that acts as a timestamp floor. When we go to stamp a file, we first get the latter of the current floor value and the current coarse-grained time. If the inode ctime hasn't been queried then we just attempt to stamp it with that value. If it has been queried, then first see whether the current coarse time is later than the existing ctime. If it is, then we accept that value. If it isn't, then we get a fine-grained time and try to swap that into the global floor. Whether that succeeds or fails, we take the resulting floor time, convert it to realtime and try to swap that into the ctime. We take the result of the ctime swap whether it succeeds or fails, since either is just as valid. Filesystems can opt into this by setting the FS_MGTIME fstype flag. Others should be unaffected (other than being subject to the same floor value as multigrain filesystems). * patches from https://lore.kernel.org/r/20241002-mgtime-v10-0-d1c4717f5284@kernel.org: tmpfs: add support for multigrain timestamps btrfs: convert to multigrain timestamps ext4: switch to multigrain timestamps xfs: switch to multigrain timestamps Documentation: add a new file documenting multigrain timestamps fs: add percpu counters for significant multigrain timestamp events fs: tracepoints around multigrain timestamp events fs: handle delegated timestamps in setattr_copy_mgtime fs: have setattr_copy handle multigrain timestamps appropriately fs: add infrastructure for multigrain timestamps Link: https://lore.kernel.org/r/20241002-mgtime-v10-0-d1c4717f5284@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org> |
||
![]() |
e3fad0376d
|
Documentation: add a new file documenting multigrain timestamps
Add a high-level document that describes how multigrain timestamps work, rationale for them, and some info about implementation and tradeoffs. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Jan Kara <jack@suse.cz> Tested-by: Randy Dunlap <rdunlap@infradead.org> # documentation bits Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20241002-mgtime-v10-8-d1c4717f5284@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org> |
||
![]() |
368196e501
|
netfs: fix documentation build error
Commit |
||
![]() |
ae80e1629a |
mm: Define VM_SHADOW_STACK for arm64 when we support GCS
Use VM_HIGH_ARCH_5 for guarded control stack pages. Reviewed-by: Thiago Jung Bauermann <thiago.bauermann@linaro.org> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20241001-arm64-gcs-v13-14-222b78d87eee@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> |
||
![]() |
09ee2a670d
|
Merge patch series "Fixup NLM and kNFSD file lock callbacks"
Benjamin Coddington <bcodding@redhat.com> says: Last year both GFS2 and OCFS2 had some work done to make their locking more robust when exported over NFS. Unfortunately, part of that work caused both NLM (for NFS v3 exports) and kNFSD (for NFSv4.1+ exports) to no longer send lock notifications to clients. This in itself is not a huge problem because most NFS clients will still poll the server in order to acquire a conflicted lock, but now that I've noticed it I can't help but try to fix it because there are big advantages for setups that might depend on timely lock notifications, and we've supported that as a feature for a long time. Its important for NLM and kNFSD that they do not block their kernel threads inside filesystem's file_lock implementations because that can produce deadlocks. We used to make sure of this by only trusting that posix_lock_file() can correctly handle blocking lock calls asynchronously, so the lock managers would only setup their file_lock requests for async callbacks if the filesystem did not define its own lock() file operation. However, when GFS2 and OCFS2 grew the capability to correctly handle blocking lock requests asynchronously, they started signalling this behavior with EXPORT_OP_ASYNC_LOCK, and the check for also trusting posix_lock_file() was inadvertently dropped, so now most filesystems no longer produce lock notifications when exported over NFS. I tried to fix this by simply including the old check for lock(), but the resulting include mess and layering violations was more than I could accept. There's a much cleaner way presented here using an fop_flag, which while potentially flag-greedy, greatly simplifies the problem and grooms the way for future uses by both filesystems and lock managers alike. * patches from https://lore.kernel.org/r/cover.1726083391.git.bcodding@redhat.com: exportfs: Remove EXPORT_OP_ASYNC_LOCK NLM/NFSD: Fix lock notifications for async-capable filesystems gfs2/ocfs2: set FOP_ASYNC_LOCK fs: Introduce FOP_ASYNC_LOCK NFS: trace: show TIMEDOUT instead of 0x6e nfsd: use system_unbound_wq for nfsd_file_gc_worker() nfsd: count nfsd_file allocations nfsd: fix refcount leak when file is unhashed after being found nfsd: remove unneeded EEXIST error check in nfsd_do_file_acquire nfsd: add list_head nf_gc to struct nfsd_file Link: https://lore.kernel.org/r/cover.1726083391.git.bcodding@redhat.com Signed-off-by: Christian Brauner <brauner@kernel.org> |
||
![]() |
b875bd5b38
|
exportfs: Remove EXPORT_OP_ASYNC_LOCK
Now that GFS2 and OCFS2 are signalling async ->lock() support with FOP_ASYNC_LOCK and checks for support are converted, we can remove EXPORT_OP_ASYNC_LOCK. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Link: https://lore.kernel.org/r/0a114db814fec3086f937ae3d44a086f13b8de26.1726083391.git.bcodding@redhat.com Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org> |
||
![]() |
4965ddb166 |
USB/Thunderbolt update for 6.12-rc1
Here is the large set of USB and Thunderbolt changes for 6.12-rc1. Nothing "major" in here, except for a new 9p network gadget that has been worked on for a long time (all of the needed acks are here.) Other than that, it's the usual set of: - Thunderbolt / USB4 driver updates and additions for new hardware - dwc3 driver updates and new features added - xhci driver updates - typec driver updates - USB gadget updates and api additions to make some gadgets more configurable by userspace - dwc2 driver updates - usb phy driver updates - usbip feature additions - other minor USB driver updates All of these have been in linux-next for a long time with no reported issues. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZvU0/g8cZ3JlZ0Brcm9h aC5jb20ACgkQMUfUDdst+ykGcACfSqouxRg8FRtq+nIKHWXI9lOTnVcAoKd9PAgq 1i7yCNopPEPEW8sjz1GX =mY+S -----END PGP SIGNATURE----- Merge tag 'usb-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB/Thunderbolt updates from Greg KH: "Here is the large set of USB and Thunderbolt changes for 6.12-rc1. Nothing "major" in here, except for a new 9p network gadget that has been worked on for a long time (all of the needed acks are here) Other than that, it's the usual set of: - Thunderbolt / USB4 driver updates and additions for new hardware - dwc3 driver updates and new features added - xhci driver updates - typec driver updates - USB gadget updates and api additions to make some gadgets more configurable by userspace - dwc2 driver updates - usb phy driver updates - usbip feature additions - other minor USB driver updates All of these have been in linux-next for a long time with no reported issues" * tag 'usb-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (145 commits) sub: cdns3: Use predefined PCI vendor ID constant sub: cdns2: Use predefined PCI vendor ID constant USB: misc: yurex: fix race between read and write USB: misc: cypress_cy7c63: check for short transfer USB: appledisplay: close race between probe and completion handler USB: class: CDC-ACM: fix race between get_serial and set_serial usb: r8a66597-hcd: make read-only const arrays static usb: typec: ucsi: Fix busy loop on ASUS VivoBooks usb: dwc3: rtk: Clean up error code in __get_dwc3_maximum_speed() usb: storage: ene_ub6250: Fix right shift warnings usb: roles: Improve the fix for a false positive recursive locking complaint locking/mutex: Introduce mutex_init_with_key() locking/mutex: Define mutex_init() once net/9p/usbg: fix CONFIG_USB_GADGET dependency usb: xhci: fix loss of data on Cadence xHC usb: xHCI: add XHCI_RESET_ON_RESUME quirk for Phytium xHCI host usb: dwc3: imx8mp: disable SS_CON and U3 wakeup for system sleep usb: dwc3: imx8mp: add 2 software managed quirk properties for host mode usb: host: xhci-plat: Parse xhci-missing_cas_quirk and apply quirk usb: misc: onboard_usb_dev: add Microchip usb5744 SMBus programming support ... |
||
![]() |
684a64bf32 |
NFS Client Updates for Linux 6.12
New Features: * Add a 'noalignwrite' mount option for lock-less 'lost writes' prevention * Add support for the LOCALIO protocol extention Bugfixes: * Fix memory leak in error path of nfs4_do_reclaim() * Simplify and guarantee lock owner uniqueness * Fix -Wformat-truncation warning * Fix folio refcounts by using folio_attach_private() * Fix failing the mount system call when the server is down * Fix detection of "Proxying of Times" server support Cleanups: * Annotate struct nfs_cache_array with __counted_by() * Remove unnecessary NULL checks before kfree() * Convert RPC_TASK_* constants to an enum * Remove obsolete or misleading comments and declerations -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAmbzKDAACgkQ18tUv7Cl QOusjBAAxTSoVbHocl+9eYpvKnscPArgPXnfd6mB9rnQRgtnceTO2ei7cdiE2qhz dxQiyzlAXh3e7dGwoy02qEd6wTTqWeQ8ESdMpCAqSacBU4tu5owfzNSWunZvgYYj QOhjdmv8M1IfZnTstPlVrRNaZcDkhV1tzEtpZppkEqhTB0bHWqrcM4EdklTWT0Yc PGMpGbfuGsa4qZy2vWl7doERVEgK8mBeahLtYFD2W6phIvNWgD6IlKy66RaK2RfH nXmZoZbI2/ioi4TKvNyY8xoGMGvetLI1h8YNQYkEg060XCkisLZDOvoodUAylOTR 2jHQLG5+/ejhpD/zgPghGZDSGNN1GyZaH09E/vtiS+3k9OXxFz6Rq68VnC6kpMA4 TIUYsT8ejPzs2gW59iDFGB6cKI4XnRtxgmApW/Za0y9A72PSi+G/pbWAk7ThjTxf +HySsba4baA63opIgBSLVBrUsXZfdn/KTDTZ4nkPiq57BggGcZv7Y2ItOTXA+pB/ 5nigDKkhWsYVjMbkx6wmh+VO2gv4/Z8WqsmiDwFMpVqM0w8eycBOHjOumuuc6nmw y+2OKZqU2Npm2HI/R8lA7nB1m2QP5t7CRM2+xlZNuavHrfsMaqHNl8/9VgxlCATQ /Zo74hbhmCgQYxrTjL8XFQG9/8y0o3H5IcTEr/SgCVxHyDSan1I= =YjyC -----END PGP SIGNATURE----- Merge tag 'nfs-for-6.12-1' of git://git.linux-nfs.org/projects/anna/linux-nfs Pull NFS client updates from Anna Schumaker: "New Features: - Add a 'noalignwrite' mount option for lock-less 'lost writes' prevention - Add support for the LOCALIO protocol extention Bugfixes: - Fix memory leak in error path of nfs4_do_reclaim() - Simplify and guarantee lock owner uniqueness - Fix -Wformat-truncation warning - Fix folio refcounts by using folio_attach_private() - Fix failing the mount system call when the server is down - Fix detection of "Proxying of Times" server support Cleanups: - Annotate struct nfs_cache_array with __counted_by() - Remove unnecessary NULL checks before kfree() - Convert RPC_TASK_* constants to an enum - Remove obsolete or misleading comments and declerations" * tag 'nfs-for-6.12-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (41 commits) nfs: Fix `make htmldocs` warnings in the localio documentation nfs: add "NFS Client and Server Interlock" section to localio.rst nfs: add FAQ section to Documentation/filesystems/nfs/localio.rst nfs: add Documentation/filesystems/nfs/localio.rst nfs: implement client support for NFS_LOCALIO_PROGRAM nfs/localio: use dedicated workqueues for filesystem read and write pnfs/flexfiles: enable localio support nfs: enable localio for non-pNFS IO nfs: add LOCALIO support nfs: pass struct nfsd_file to nfs_init_pgio and nfs_init_commit nfsd: implement server support for NFS_LOCALIO_PROGRAM nfsd: add LOCALIO support nfs_common: prepare for the NFS client to use nfsd_file for LOCALIO nfs_common: add NFS LOCALIO auxiliary protocol enablement SUNRPC: replace program list with program array SUNRPC: add svcauth_map_clnt_to_svc_cred_local SUNRPC: remove call_allocate() BUG_ONs nfsd: add nfsd_serv_try_get and nfsd_serv_put nfsd: add nfsd_file_acquire_local() nfsd: factor out __fh_verify to allow NULL rqstp to be passed ... |
||
![]() |
68898131d2 |
nfs: Fix make htmldocs warnings in the localio documentation
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Fixes:
|
||
![]() |
736cd2c1ae |
nfs: add "NFS Client and Server Interlock" section to localio.rst
This section answers a new FAQ entry: 9. How does LOCALIO make certain that object lifetimes are managed properly given NFSD and NFS operate in different contexts? See the detailed "NFS Client and Server Interlock" section below. The first half of the section details NeilBrown's elegant design for LOCALIO's nfs_uuid_t based interlock and is heavily based on Neil's "net namespace refcounting" description here: https://marc.info/?l=linux-nfs&m=172498546024767&w=2 The second half of the section details the per-cpu-refcount introduced to ensure NFSD's nfsd_serv isn't destroyed while in use by a LOCALIO client. Signed-off-by: Mike Snitzer <snitzer@kernel.org> Reviewed-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com> |
||
![]() |
f7128262b1 |
nfs: add FAQ section to Documentation/filesystems/nfs/localio.rst
Add a FAQ section to give answers to questions that have been raised during review of the localio feature. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Co-developed-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Mike Snitzer <snitzer@kernel.org> Reviewed-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com> |
||
![]() |
92945bd81c |
nfs: add Documentation/filesystems/nfs/localio.rst
This document gives an overview of the LOCALIO auxiliary RPC protocol added to the Linux NFS client and server to allow them to reliably handshake to determine if they are on the same host. Once an NFS client and server handshake as "local", the client will bypass the network RPC protocol for read, write and commit operations. Due to this XDR and RPC bypass, these operations will operate faster. Signed-off-by: Mike Snitzer <snitzer@kernel.org> Reviewed-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com> |
||
![]() |
b3f391fddf |
bcachefs changes for 6.12-rc1
rcu_pending, btree key cache rework: this solves lock contenting in the key cache, eliminating the biggest source of the srcu lock hold time warnings, and drastically improving performance on some metadata heavy workloads - on multithreaded creates we're now 3-4x faster than xfs. We're now using an rhashtable instead of the system inode hash table; this is another significant performance improvement on multithreaded metadata workloads, eliminating more lock contention. for_each_btree_key_in_subvolume_upto(): new helper for iterating over keys within a specific subvolume, eliminating a lot of open coded "subvolume_get_snapshot()" and also fixing another source of srcu lock time warnings, by running each loop iteration in its own transaction (as the existing for_each_btree_key() does). More work on btree_trans locking asserts; we now assert that we don't hold btree node locks when trans->locked is false, which is important because we don't use lockdep for tracking individual btree node locks. Some cleanups and improvements in the bset.c btree node lookup code, from Alan. Rework of btree node pinning, which we use in backpointers fsck. The old hacky implementation, where the shrinker just skipped over nodes in the pinned range, was causing OOMs; instead we now use another shrinker with a much higher seeks number for pinned nodes. Rebalance now uses BCH_WRITE_ONLY_SPECIFIED_DEVS; this fixes an issue where rebalance would sometimes fall back to allocating from the full filesystem, which is not what we want when it's trying to move data to a specific target. Use __GFP_ACCOUNT, GFP_RECLAIMABLE for btree node, key cache allocations. Idmap mounts are now supported - Hongbo. Rename whiteouts are now supported - Hongbo. Erasure coding can now handle devices being marked as failed, or forcibly removed. We still need the evacuate path for erasure coding, but it's getting very close to ready for people to start using. Status, and when will we be taking off experimental: ---------------------------------------------------- Going by critical, user facing bugs getting found and fixed, we're nearly there. There are a couple key items that need to be finished before we can take off the experimental label: - The end-user experience is still pretty painful when the root filesystem needs a fsck; we need some form of limited self healing so that necessary repair gets run automatically. Errors (by type) are recorded in the superblock, so what we need to do next is convert remaining inconsistent() errors to fsck() errors (so that all runtime inconsistencies are logged in the superblock), and we need to go through the list of fsck errors and classify them by which fsck passes are needed to repair them. - We need comprehensive torture testing for all our repair paths, to shake out remaining bugs there. Thomas has been working on the tooling for this, so this is coming soonish. Slightly less critical items: - We need to improve the end-user experience for degraded mounts: right now, a degraded root filesystem means dropping to an initramfs shell or somehow inputting mount options manually (we don't want to allow degraded mounts without some form of user input, except on unattended servers) - we need the mount helper to prompt the user to allow mounting degraded, and make sure this works with systemd. - Scalabiity: we have users running 100TB+ filesystems, and that's effectively the limit right now due to fsck times. We have some reworks in the pipeline to address this, we're aiming to make petabyte sized filesystems practical. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEKnAFLkS8Qha+jvQrE6szbY3KbnYFAmbvHQoACgkQE6szbY3K bnYfAw/+IXQ43/O+Jzs0MLD7pKZnrlbHiX9FqYLazD40vWvkyRTQOwgTn8pVNhq3 4YWmtuZyqh036YC+bGqYFOhz20YetS5UdgbClpwmc99JJ6xsY+Z1mdpYfz5oq1Dw /pBX5iYb3rAt8UbQoZ8lcWM+GpT3GKJVgJuiLB2gRp9gATFesuh+0qU42oIVVVU5 4y3VhDBUmRk4XqEnk8hr7EIDMW0wWP3aptxYMZzeUPW0x1cEQ+FWrJo5D6lXv2KK dKv3MogvA0FFNi/eNexclPiu2pXtI7vrxT7umsxAICHLt41rWpV5ttE6io3bC4ZN qvwF9w2CpmKPKchFru9PO+QrWHVR7e6bphwf3TzyoKZ7tTn42f1RQlub7gBzI3bz ai5ZwGRIvpUoPVBj+CO+Ipog81uUb23Ma+gXg1akEFBOAb+o7I3KOOSBh5l+0cHj 3Ov1n0TLcsoO2cqoqfsV2QubW9YcWEZ76g5mKwQnUn8Cs6Fp0wWaIyK9aNkIAxcr tNDPGtH1gKitxUvju5i/LyI7y1UoeFvqJFee0VsU6QnixHn1ySzhePsJt6UEnIJT Ia3C96Igqu2mV9FxhfGHj/qi7TGjqqkZHa8+B610cDpgf15cx7Ps2DYjkuQMFCqZ Q3Q1o5De9roRq5xF2hLiYJCbzJKqd5ichFsBtLQuX572ICxbICg= =oVCy -----END PGP SIGNATURE----- Merge tag 'bcachefs-2024-09-21' of git://evilpiepirate.org/bcachefs Pull bcachefs updates from Kent Overstreet: - rcu_pending, btree key cache rework: this solves lock contenting in the key cache, eliminating the biggest source of the srcu lock hold time warnings, and drastically improving performance on some metadata heavy workloads - on multithreaded creates we're now 3-4x faster than xfs. - We're now using an rhashtable instead of the system inode hash table; this is another significant performance improvement on multithreaded metadata workloads, eliminating more lock contention. - for_each_btree_key_in_subvolume_upto(): new helper for iterating over keys within a specific subvolume, eliminating a lot of open coded "subvolume_get_snapshot()" and also fixing another source of srcu lock time warnings, by running each loop iteration in its own transaction (as the existing for_each_btree_key() does). - More work on btree_trans locking asserts; we now assert that we don't hold btree node locks when trans->locked is false, which is important because we don't use lockdep for tracking individual btree node locks. - Some cleanups and improvements in the bset.c btree node lookup code, from Alan. - Rework of btree node pinning, which we use in backpointers fsck. The old hacky implementation, where the shrinker just skipped over nodes in the pinned range, was causing OOMs; instead we now use another shrinker with a much higher seeks number for pinned nodes. - Rebalance now uses BCH_WRITE_ONLY_SPECIFIED_DEVS; this fixes an issue where rebalance would sometimes fall back to allocating from the full filesystem, which is not what we want when it's trying to move data to a specific target. - Use __GFP_ACCOUNT, GFP_RECLAIMABLE for btree node, key cache allocations. - Idmap mounts are now supported (Hongbo Li) - Rename whiteouts are now supported (Hongbo Li) - Erasure coding can now handle devices being marked as failed, or forcibly removed. We still need the evacuate path for erasure coding, but it's getting very close to ready for people to start using. * tag 'bcachefs-2024-09-21' of git://evilpiepirate.org/bcachefs: (99 commits) bcachefs: return err ptr instead of null in read sb clean bcachefs: Remove duplicated include in backpointers.c bcachefs: Don't drop devices with stripe pointers bcachefs: bch2_ec_stripe_head_get() now checks for change in rw devices bcachefs: bch_fs.rw_devs_change_count bcachefs: bch2_dev_remove_stripes() bcachefs: bch2_trigger_ptr() calculates sectors even when no device bcachefs: improve error messages in bch2_ec_read_extent() bcachefs: improve error message on too few devices for ec bcachefs: improve bch2_new_stripe_to_text() bcachefs: ec_stripe_head.nr_created bcachefs: bch_stripe.disk_label bcachefs: stripe_to_mem() bcachefs: EIO errcode cleanup bcachefs: Rework btree node pinning bcachefs: split up btree cache counters for live, freeable bcachefs: btree cache counters should be size_t bcachefs: Don't count "skipped access bit" as touched in btree cache scan bcachefs: Failed devices no longer require mounting in degraded mode bcachefs: bch2_dev_rcu_noerror() ... |
||
![]() |
617a814f14 |
ALong with the usual shower of singleton patches, notable patch series in
this pull request are: "Align kvrealloc() with krealloc()" from Danilo Krummrich. Adds consistency to the APIs and behaviour of these two core allocation functions. This also simplifies/enables Rustification. "Some cleanups for shmem" from Baolin Wang. No functional changes - mode code reuse, better function naming, logic simplifications. "mm: some small page fault cleanups" from Josef Bacik. No functional changes - code cleanups only. "Various memory tiering fixes" from Zi Yan. A small fix and a little cleanup. "mm/swap: remove boilerplate" from Yu Zhao. Code cleanups and simplifications and .text shrinkage. "Kernel stack usage histogram" from Pasha Tatashin and Shakeel Butt. This is a feature, it adds new feilds to /proc/vmstat such as $ grep kstack /proc/vmstat kstack_1k 3 kstack_2k 188 kstack_4k 11391 kstack_8k 243 kstack_16k 0 which tells us that 11391 processes used 4k of stack while none at all used 16k. Useful for some system tuning things, but partivularly useful for "the dynamic kernel stack project". "kmemleak: support for percpu memory leak detect" from Pavel Tikhomirov. Teaches kmemleak to detect leaksage of percpu memory. "mm: memcg: page counters optimizations" from Roman Gushchin. "3 independent small optimizations of page counters". "mm: split PTE/PMD PT table Kconfig cleanups+clarifications" from David Hildenbrand. Improves PTE/PMD splitlock detection, makes powerpc/8xx work correctly by design rather than by accident. "mm: remove arch_make_page_accessible()" from David Hildenbrand. Some folio conversions which make arch_make_page_accessible() unneeded. "mm, memcg: cg2 memory{.swap,}.peak write handlers" fro David Finkel. Cleans up and fixes our handling of the resetting of the cgroup/process peak-memory-use detector. "Make core VMA operations internal and testable" from Lorenzo Stoakes. Rationalizaion and encapsulation of the VMA manipulation APIs. With a view to better enable testing of the VMA functions, even from a userspace-only harness. "mm: zswap: fixes for global shrinker" from Takero Funaki. Fix issues in the zswap global shrinker, resulting in improved performance. "mm: print the promo watermark in zoneinfo" from Kaiyang Zhao. Fill in some missing info in /proc/zoneinfo. "mm: replace follow_page() by folio_walk" from David Hildenbrand. Code cleanups and rationalizations (conversion to folio_walk()) resulting in the removal of follow_page(). "improving dynamic zswap shrinker protection scheme" from Nhat Pham. Some tuning to improve zswap's dynamic shrinker. Significant reductions in swapin and improvements in performance are shown. "mm: Fix several issues with unaccepted memory" from Kirill Shutemov. Improvements to the new unaccepted memory feature, "mm/mprotect: Fix dax puds" from Peter Xu. Implements mprotect on DAX PUDs. This was missing, although nobody seems to have notied yet. "Introduce a store type enum for the Maple tree" from Sidhartha Kumar. Cleanups and modest performance improvements for the maple tree library code. "memcg: further decouple v1 code from v2" from Shakeel Butt. Move more cgroup v1 remnants away from the v2 memcg code. "memcg: initiate deprecation of v1 features" from Shakeel Butt. Adds various warnings telling users that memcg v1 features are deprecated. "mm: swap: mTHP swap allocator base on swap cluster order" from Chris Li. Greatly improves the success rate of the mTHP swap allocation. "mm: introduce numa_memblks" from Mike Rapoport. Moves various disparate per-arch implementations of numa_memblk code into generic code. "mm: batch free swaps for zap_pte_range()" from Barry Song. Greatly improves the performance of munmap() of swap-filled ptes. "support large folio swap-out and swap-in for shmem" from Baolin Wang. With this series we no longer split shmem large folios into simgle-page folios when swapping out shmem. "mm/hugetlb: alloc/free gigantic folios" from Yu Zhao. Nice performance improvements and code reductions for gigantic folios. "support shmem mTHP collapse" from Baolin Wang. Adds support for khugepaged's collapsing of shmem mTHP folios. "mm: Optimize mseal checks" from Pedro Falcato. Fixes an mprotect() performance regression due to the addition of mseal(). "Increase the number of bits available in page_type" from Matthew Wilcox. Increases the number of bits available in page_type! "Simplify the page flags a little" from Matthew Wilcox. Many legacy page flags are now folio flags, so the page-based flags and their accessors/mutators can be removed. "mm: store zero pages to be swapped out in a bitmap" from Usama Arif. An optimization which permits us to avoid writing/reading zero-filled zswap pages to backing store. "Avoid MAP_FIXED gap exposure" from Liam Howlett. Fixes a race window which occurs when a MAP_FIXED operqtion is occurring during an unrelated vma tree walk. "mm: remove vma_merge()" from Lorenzo Stoakes. Major rotorooting of the vma_merge() functionality, making ot cleaner, more testable and better tested. "misc fixups for DAMON {self,kunit} tests" from SeongJae Park. Minor fixups of DAMON selftests and kunit tests. "mm: memory_hotplug: improve do_migrate_range()" from Kefeng Wang. Code cleanups and folio conversions. "Shmem mTHP controls and stats improvements" from Ryan Roberts. Cleanups for shmem controls and stats. "mm: count the number of anonymous THPs per size" from Barry Song. Expose additional anon THP stats to userspace for improved tuning. "mm: finish isolate/putback_lru_page()" from Kefeng Wang: more folio conversions and removal of now-unused page-based APIs. "replace per-quota region priorities histogram buffer with per-context one" from SeongJae Park. DAMON histogram rationalization. "Docs/damon: update GitHub repo URLs and maintainer-profile" from SeongJae Park. DAMON documentation updates. "mm/vdpa: correct misuse of non-direct-reclaim __GFP_NOFAIL and improve related doc and warn" from Jason Wang: fixes usage of page allocator __GFP_NOFAIL and GFP_ATOMIC flags. "mm: split underused THPs" from Yu Zhao. Improve THP=always policy - this was overprovisioning THPs in sparsely accessed memory areas. "zram: introduce custom comp backends API" frm Sergey Senozhatsky. Add support for zram run-time compression algorithm tuning. "mm: Care about shadow stack guard gap when getting an unmapped area" from Mark Brown. Fix up the various arch_get_unmapped_area() implementations to better respect guard areas. "Improve mem_cgroup_iter()" from Kinsey Ho. Improve the reliability of mem_cgroup_iter() and various code cleanups. "mm: Support huge pfnmaps" from Peter Xu. Extends the usage of huge pfnmap support. "resource: Fix region_intersects() vs add_memory_driver_managed()" from Huang Ying. Fix a bug in region_intersects() for systems with CXL memory. "mm: hwpoison: two more poison recovery" from Kefeng Wang. Teaches a couple more code paths to correctly recover from the encountering of poisoned memry. "mm: enable large folios swap-in support" from Barry Song. Support the swapin of mTHP memory into appropriately-sized folios, rather than into single-page folios. -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZu1BBwAKCRDdBJ7gKXxA jlWNAQDYlqQLun7bgsAN4sSvi27VUuWv1q70jlMXTfmjJAvQqwD/fBFVR6IOOiw7 AkDbKWP2k0hWPiNJBGwoqxdHHx09Xgo= =s0T+ -----END PGP SIGNATURE----- Merge tag 'mm-stable-2024-09-20-02-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: "Along with the usual shower of singleton patches, notable patch series in this pull request are: - "Align kvrealloc() with krealloc()" from Danilo Krummrich. Adds consistency to the APIs and behaviour of these two core allocation functions. This also simplifies/enables Rustification. - "Some cleanups for shmem" from Baolin Wang. No functional changes - mode code reuse, better function naming, logic simplifications. - "mm: some small page fault cleanups" from Josef Bacik. No functional changes - code cleanups only. - "Various memory tiering fixes" from Zi Yan. A small fix and a little cleanup. - "mm/swap: remove boilerplate" from Yu Zhao. Code cleanups and simplifications and .text shrinkage. - "Kernel stack usage histogram" from Pasha Tatashin and Shakeel Butt. This is a feature, it adds new feilds to /proc/vmstat such as $ grep kstack /proc/vmstat kstack_1k 3 kstack_2k 188 kstack_4k 11391 kstack_8k 243 kstack_16k 0 which tells us that 11391 processes used 4k of stack while none at all used 16k. Useful for some system tuning things, but partivularly useful for "the dynamic kernel stack project". - "kmemleak: support for percpu memory leak detect" from Pavel Tikhomirov. Teaches kmemleak to detect leaksage of percpu memory. - "mm: memcg: page counters optimizations" from Roman Gushchin. "3 independent small optimizations of page counters". - "mm: split PTE/PMD PT table Kconfig cleanups+clarifications" from David Hildenbrand. Improves PTE/PMD splitlock detection, makes powerpc/8xx work correctly by design rather than by accident. - "mm: remove arch_make_page_accessible()" from David Hildenbrand. Some folio conversions which make arch_make_page_accessible() unneeded. - "mm, memcg: cg2 memory{.swap,}.peak write handlers" fro David Finkel. Cleans up and fixes our handling of the resetting of the cgroup/process peak-memory-use detector. - "Make core VMA operations internal and testable" from Lorenzo Stoakes. Rationalizaion and encapsulation of the VMA manipulation APIs. With a view to better enable testing of the VMA functions, even from a userspace-only harness. - "mm: zswap: fixes for global shrinker" from Takero Funaki. Fix issues in the zswap global shrinker, resulting in improved performance. - "mm: print the promo watermark in zoneinfo" from Kaiyang Zhao. Fill in some missing info in /proc/zoneinfo. - "mm: replace follow_page() by folio_walk" from David Hildenbrand. Code cleanups and rationalizations (conversion to folio_walk()) resulting in the removal of follow_page(). - "improving dynamic zswap shrinker protection scheme" from Nhat Pham. Some tuning to improve zswap's dynamic shrinker. Significant reductions in swapin and improvements in performance are shown. - "mm: Fix several issues with unaccepted memory" from Kirill Shutemov. Improvements to the new unaccepted memory feature, - "mm/mprotect: Fix dax puds" from Peter Xu. Implements mprotect on DAX PUDs. This was missing, although nobody seems to have notied yet. - "Introduce a store type enum for the Maple tree" from Sidhartha Kumar. Cleanups and modest performance improvements for the maple tree library code. - "memcg: further decouple v1 code from v2" from Shakeel Butt. Move more cgroup v1 remnants away from the v2 memcg code. - "memcg: initiate deprecation of v1 features" from Shakeel Butt. Adds various warnings telling users that memcg v1 features are deprecated. - "mm: swap: mTHP swap allocator base on swap cluster order" from Chris Li. Greatly improves the success rate of the mTHP swap allocation. - "mm: introduce numa_memblks" from Mike Rapoport. Moves various disparate per-arch implementations of numa_memblk code into generic code. - "mm: batch free swaps for zap_pte_range()" from Barry Song. Greatly improves the performance of munmap() of swap-filled ptes. - "support large folio swap-out and swap-in for shmem" from Baolin Wang. With this series we no longer split shmem large folios into simgle-page folios when swapping out shmem. - "mm/hugetlb: alloc/free gigantic folios" from Yu Zhao. Nice performance improvements and code reductions for gigantic folios. - "support shmem mTHP collapse" from Baolin Wang. Adds support for khugepaged's collapsing of shmem mTHP folios. - "mm: Optimize mseal checks" from Pedro Falcato. Fixes an mprotect() performance regression due to the addition of mseal(). - "Increase the number of bits available in page_type" from Matthew Wilcox. Increases the number of bits available in page_type! - "Simplify the page flags a little" from Matthew Wilcox. Many legacy page flags are now folio flags, so the page-based flags and their accessors/mutators can be removed. - "mm: store zero pages to be swapped out in a bitmap" from Usama Arif. An optimization which permits us to avoid writing/reading zero-filled zswap pages to backing store. - "Avoid MAP_FIXED gap exposure" from Liam Howlett. Fixes a race window which occurs when a MAP_FIXED operqtion is occurring during an unrelated vma tree walk. - "mm: remove vma_merge()" from Lorenzo Stoakes. Major rotorooting of the vma_merge() functionality, making ot cleaner, more testable and better tested. - "misc fixups for DAMON {self,kunit} tests" from SeongJae Park. Minor fixups of DAMON selftests and kunit tests. - "mm: memory_hotplug: improve do_migrate_range()" from Kefeng Wang. Code cleanups and folio conversions. - "Shmem mTHP controls and stats improvements" from Ryan Roberts. Cleanups for shmem controls and stats. - "mm: count the number of anonymous THPs per size" from Barry Song. Expose additional anon THP stats to userspace for improved tuning. - "mm: finish isolate/putback_lru_page()" from Kefeng Wang: more folio conversions and removal of now-unused page-based APIs. - "replace per-quota region priorities histogram buffer with per-context one" from SeongJae Park. DAMON histogram rationalization. - "Docs/damon: update GitHub repo URLs and maintainer-profile" from SeongJae Park. DAMON documentation updates. - "mm/vdpa: correct misuse of non-direct-reclaim __GFP_NOFAIL and improve related doc and warn" from Jason Wang: fixes usage of page allocator __GFP_NOFAIL and GFP_ATOMIC flags. - "mm: split underused THPs" from Yu Zhao. Improve THP=always policy. This was overprovisioning THPs in sparsely accessed memory areas. - "zram: introduce custom comp backends API" frm Sergey Senozhatsky. Add support for zram run-time compression algorithm tuning. - "mm: Care about shadow stack guard gap when getting an unmapped area" from Mark Brown. Fix up the various arch_get_unmapped_area() implementations to better respect guard areas. - "Improve mem_cgroup_iter()" from Kinsey Ho. Improve the reliability of mem_cgroup_iter() and various code cleanups. - "mm: Support huge pfnmaps" from Peter Xu. Extends the usage of huge pfnmap support. - "resource: Fix region_intersects() vs add_memory_driver_managed()" from Huang Ying. Fix a bug in region_intersects() for systems with CXL memory. - "mm: hwpoison: two more poison recovery" from Kefeng Wang. Teaches a couple more code paths to correctly recover from the encountering of poisoned memry. - "mm: enable large folios swap-in support" from Barry Song. Support the swapin of mTHP memory into appropriately-sized folios, rather than into single-page folios" * tag 'mm-stable-2024-09-20-02-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (416 commits) zram: free secondary algorithms names uprobes: turn xol_area->pages[2] into xol_area->page uprobes: introduce the global struct vm_special_mapping xol_mapping Revert "uprobes: use vm_special_mapping close() functionality" mm: support large folios swap-in for sync io devices mm: add nr argument in mem_cgroup_swapin_uncharge_swap() helper to support large folios mm: fix swap_read_folio_zeromap() for large folios with partial zeromap mm/debug_vm_pgtable: Use pxdp_get() for accessing page table entries set_memory: add __must_check to generic stubs mm/vma: return the exact errno in vms_gather_munmap_vmas() memcg: cleanup with !CONFIG_MEMCG_V1 mm/show_mem.c: report alloc tags in human readable units mm: support poison recovery from copy_present_page() mm: support poison recovery from do_cow_fault() resource, kunit: add test case for region_intersects() resource: make alloc_free_mem_region() works for iomem_resource mm: z3fold: deprecate CONFIG_Z3FOLD vfio/pci: implement huge_fault support mm/arm64: support large pfn mappings mm/x86: support large pfn mappings ... |
||
![]() |
171754c380 |
vfs-6.12.blocksize
-----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZuQEvwAKCRCRxhvAZXjc ohg3APwJWQnqFlBddcRl4yrPJ/cgcYSYAOdHb+E+blomSwdxcwEAmwsnLPNQOtw2 rxKvQfZqhVT437bl7RpPPZrHGxwTng8= =6v1r -----END PGP SIGNATURE----- Merge tag 'vfs-6.12.blocksize' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs Pull vfs blocksize updates from Christian Brauner: "This contains the vfs infrastructure as well as the xfs bits to enable support for block sizes (bs) larger than page sizes (ps) plus a few fixes to related infrastructure. There has been efforts over the last 16 years to enable enable Large Block Sizes (LBS), that is block sizes in filesystems where bs > page size. Through these efforts we have learned that one of the main blockers to supporting bs > ps in filesystems has been a way to allocate pages that are at least the filesystem block size on the page cache where bs > ps. Thanks to various previous efforts it is possible to support bs > ps in XFS with only a few changes in XFS itself. Most changes are to the page cache to support minimum order folio support for the target block size on the filesystem. A motivation for Large Block Sizes today is to support high-capacity (large amount of Terabytes) QLC SSDs where the internal Indirection Unit (IU) are typically greater than 4k to help reduce DRAM and so in turn cost and space. In practice this then allows different architectures to use a base page size of 4k while still enabling support for block sizes aligned to the larger IUs by relying on high order folios on the page cache when needed. It also allows to take advantage of the drive's support for atomics larger than 4k with buffered IO support in Linux. As described this year at LSFMM, supporting large atomics greater than 4k enables databases to remove the need to rely on their own journaling, so they can disable double buffered writes, which is a feature different cloud providers are already enabling through custom storage solutions" * tag 'vfs-6.12.blocksize' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs: (22 commits) Documentation: iomap: fix a typo iomap: remove the iomap_file_buffered_write_punch_delalloc return value iomap: pass the iomap to the punch callback iomap: pass flags to iomap_file_buffered_write_punch_delalloc iomap: improve shared block detection in iomap_unshare_iter iomap: handle a post-direct I/O invalidate race in iomap_write_delalloc_release docs:filesystems: fix spelling and grammar mistakes in iomap design page filemap: fix htmldoc warning for mapping_align_index() iomap: make zero range flush conditional on unwritten mappings iomap: fix handling of dirty folios over unwritten extents iomap: add a private argument for iomap_file_buffered_write iomap: remove set_memor_ro() on zero page xfs: enable block size larger than page size support xfs: make the calculation generic in xfs_sb_validate_fsb_count() xfs: expose block size in stat xfs: use kvmalloc for xattr buffers iomap: fix iomap_dio_zero() for fs bs > system page size filemap: cap PTE range to be created to allowed zero fill in folio_map_range() mm: split a folio in minimum folio order chunks readahead: allocate folios with mapping_min_order in readahead ... |
||
![]() |
45d986d113 |
overlayfs updates for 6.12
-----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE9zuTYTs0RXF+Ke33EVvVyTe/1WoFAmbodTUACgkQEVvVyTe/ 1Wp5Jw/+LN99lGoKXKZ7zB8gta4hMm2O+wd3//IdfTi6u59uff0QvVNbczC6SRlJ sW6czIgjKpR1xt6Orp0Rof0VZK09PMqq3Iw7Tt3xEkJtjKxppfMCQHeoi03CH9kF gOHPW+Ad4jlCk+TYFEuHk5lQIgVtJ11JDtGjr5tNa03MbrLIa0pFpTdz65bg9uTn JdPqA+fiJ/mVoXlFQEZkQPeT8CgMfTbKtQDGfcmKzrNo0ZyQdW30u5lP3MGROptn X/QX5n77UrE9Du+UHPSaVDRUd8iA/ZcZe/7RAlKXWElHpKvAiqv84Sz0Mc1fczHx nQHaUfQixGs/vakkPXxVqKn/Xvnj4GDwUqyM9mG8kN1x4QpdyFRqM66zZxtWsSi5 /PL0zwAY950a5fY4VsyRsSf6Zl/SlDdB+PVVajk6BKqMxWgvxyuVbFRUqfesHsfA V5GHil7F03sPzO12SvAJg8HuresCiafLdAPJgGXuGoeprmeSP9eYx0EqcUzLpHd7 ORUF4qEylYFp8vZBJdBnZpSGRUv/mmjDDowsKeKyfgKyVR99lLKLCtqMJ0e0eGaE mpGNg+eq//snJldvBUHrMYQvyVm18/bbOzT0ZdGPp2bMUxLjSbiab7D3S9GKRxRt 8GRg/bJfZw5tz2i1bxbe5IWfNij5pDilLeAQUsoO8pUwu/SrCy8= =wnbj -----END PGP SIGNATURE----- Merge tag 'ovl-update-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs Pull overlayfs updates from Amir Goldstein: - Increase robustness of overlayfs to crashes in the case of underlying filesystems that to not guarantee metadata ordering to persistent storage (problem was reported with ubifs). - Deny mount inside container with features that require root privileges to work properly, instead of failing operations later. - Some clarifications to overlayfs documentation. * tag 'ovl-update-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs: ovl: fail if trusted xattrs are needed but caller lacks permission overlayfs.rst: update metacopy section in overlayfs documentation ovl: fsync after metadata copy-up ovl: don't set the superblock's errseq_t manually |
||
![]() |
d58db3f3a0 |
Another relatively mundane cycle for docs:
- The beginning of an EEVDF scheduler document - More Chinese translations - A rethrashing of our bisection documentation ...plus the usual array of smaller fixes, and more than the usual number of typo fixes. -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAmboMnkACgkQF0NaE2wM flha/Qf/e8zRinIYQJ7BmombNm39w3wUiNuXr8SWq7afqhsAJJzmOZ3oyyfssL+B a1pSjhxb15UrKf1kMKhdBxhDndXvto5UekJRBY5gsTvcBMBmtIovN+ZK5Z5jObsw gzHD9of08Ti7N4C2dSBdLPHtvIBX0rVeEK4oAH7AUaQviu1cfTaLQQA0dRYsaJeX iXsts2NkGl6ZUF7mk4nlzj8+Y1zot+mCd6B53iSimNKxwsPODrCZUobJAvxg1qVU pRCQcnpx2fTBnh4ugrcLZbautyhL9bJ8VQzFeoQgYpODDgDnZyTjN6kxv65LpxAz dXi+hx5Vk7lP3BbTp9EeGn305/qQPA== =JuBw -----END PGP SIGNATURE----- Merge tag 'docs-6.12' of git://git.lwn.net/linux Pull documentation update from Jonathan Corbet: "Another relatively mundane cycle for docs: - The beginning of an EEVDF scheduler document - More Chinese translations - A rethrashing of our bisection documentation ...plus the usual array of smaller fixes, and more than the usual number of typo fixes" * tag 'docs-6.12' of git://git.lwn.net/linux: (48 commits) Remove duplicate "and" in 'Linux NVMe docs. docs:filesystems: fix spelling and grammar mistakes docs:filesystem: fix mispelled words on autofs page docs:mm: fixed spelling and grammar mistakes on vmalloc kernel stack page Documentation: PCI: fix typo in pci.rst docs/zh_CN: add the translation of kbuild/gcc-plugins.rst docs/process: fix typos docs:mm: fix spelling mistakes in heterogeneous memory management page accel/qaic: Fix a typo docs/zh_CN: update the translation of security-bugs docs: block: Fix grammar and spelling mistakes in bfq-iosched.rst Documentation: Fix spelling mistakes Documentation/gpu: Fix typo in Documentation/gpu/komeda-kms.rst scripts: sphinx-pre-install: remove unnecessary double check for $cur_version Loongarch: KVM: Add KVM hypercalls documentation for LoongArch Documentation: Document the kernel flag bdev_allow_write_mounted docs: scheduler: completion: Update member of struct completion docs: kerneldoc-preamble.sty: Suppress extra spaces in CJK literal blocks docs: submitting-patches: Advertise b4 docs: update dev-tools/kcsan.rst url about KTSAN ... |
||
![]() |
a430d95c5e |
lsm/stable-6.12 PR 20240911
-----BEGIN PGP SIGNATURE----- iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmbiGGAUHHBhdWxAcGF1 bC1tb29yZS5jb20ACgkQ6iDy2pc3iXPU8BAA1+A15pmS34I9pq7c8TmRz3rNEs/a zrW1aWJ0X/+axNS7sW3Pwtt1EKuaOhskKU8gNSieRhljC8rgXIVjZzLw6Atgcr5k upulGbU9TXyVisYN+PWv9/84ito6/nYsKb7Mg3nUVsdodtIFVnsk1fxYLPHQEBig Pl3i26U3VqH93Kz0W5vs/QR2uduPB8ZyscdTgcbrY9Vv1Y7IDZ2g9QsJVKLvbQKL qcPK1JkHa+sBPJxDqS9A40zgbLbdPQgWQzsXX3dz822w1Ga7FIHSqxMBA6HwHZ+L kV4P58wVfavhwt/cQSKMWI/yiGPMMd0B6yD+m8ojOvGfOfRCWxGMmEMqHNuZ3m7k Bfll5ZgZTY8phUUhiNf3nxO3F3MM/5bHdhPOj3RReqbAbS6uWr4/fThPDYY/zIo6 NCY3HGxx3Ae64uQ01gC2p/czC50jDsMwlbXiZbrgdBhjBm/CVk5ozb80mLVcGrLB +6XMzzSbC8IaNAH2fDmUJ2ABdwyNPgsSOTGZVzIanpxu1SU2/yk3SMxkp8fv5s36 wLeODUVcLgsjVV538Mkm6PGTE4TlXaH9yi6apMyJAGp0vPYx5c3Xxk2y5A5cur5p hcrbDiX2QgeqFbwsz36incmPmbef2NU2c8feR8XLtPJuwNIeRcMSje0pnkaFlRmb TAUJ1sDQAzZ8Fy0= =HIAO -----END PGP SIGNATURE----- Merge tag 'lsm-pr-20240911' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm Pull lsm updates from Paul Moore: - Move the LSM framework to static calls This transitions the vast majority of the LSM callbacks into static calls. Those callbacks which haven't been converted were left as-is due to the general ugliness of the changes required to support the static call conversion; we can revisit those callbacks at a future date. - Add the Integrity Policy Enforcement (IPE) LSM This adds a new LSM, Integrity Policy Enforcement (IPE). There is plenty of documentation about IPE in this patches, so I'll refrain from going into too much detail here, but the basic motivation behind IPE is to provide a mechanism such that administrators can restrict execution to only those binaries which come from integrity protected storage, e.g. a dm-verity protected filesystem. You will notice that IPE requires additional LSM hooks in the initramfs, dm-verity, and fs-verity code, with the associated patches carrying ACK/review tags from the associated maintainers. We couldn't find an obvious maintainer for the initramfs code, but the IPE patchset has been widely posted over several years. Both Deven Bowers and Fan Wu have contributed to IPE's development over the past several years, with Fan Wu agreeing to serve as the IPE maintainer moving forward. Once IPE is accepted into your tree, I'll start working with Fan to ensure he has the necessary accounts, keys, etc. so that he can start submitting IPE pull requests to you directly during the next merge window. - Move the lifecycle management of the LSM blobs to the LSM framework Management of the LSM blobs (the LSM state buffers attached to various kernel structs, typically via a void pointer named "security" or similar) has been mixed, some blobs were allocated/managed by individual LSMs, others were managed by the LSM framework itself. Starting with this pull we move management of all the LSM blobs, minus the XFRM blob, into the framework itself, improving consistency across LSMs, and reducing the amount of duplicated code across LSMs. Due to some additional work required to migrate the XFRM blob, it has been left as a todo item for a later date; from a practical standpoint this omission should have little impact as only SELinux provides a XFRM LSM implementation. - Fix problems with the LSM's handling of F_SETOWN The LSM hook for the fcntl(F_SETOWN) operation had a couple of problems: it was racy with itself, and it was disconnected from the associated DAC related logic in such a way that the LSM state could be updated in cases where the DAC state would not. We fix both of these problems by moving the security_file_set_fowner() hook into the same section of code where the DAC attributes are updated. Not only does this resolve the DAC/LSM synchronization issue, but as that code block is protected by a lock, it also resolve the race condition. - Fix potential problems with the security_inode_free() LSM hook Due to use of RCU to protect inodes and the placement of the LSM hook associated with freeing the inode, there is a bit of a challenge when it comes to managing any LSM state associated with an inode. The VFS folks are not open to relocating the LSM hook so we have to get creative when it comes to releasing an inode's LSM state. Traditionally we have used a single LSM callback within the hook that is triggered when the inode is "marked for death", but not actually released due to RCU. Unfortunately, this causes problems for LSMs which want to take an action when the inode's associated LSM state is actually released; so we add an additional LSM callback, inode_free_security_rcu(), that is called when the inode's LSM state is released in the RCU free callback. - Refactor two LSM hooks to better fit the LSM return value patterns The vast majority of the LSM hooks follow the "return 0 on success, negative values on failure" pattern, however, there are a small handful that have unique return value behaviors which has caused confusion in the past and makes it difficult for the BPF verifier to properly vet BPF LSM programs. This includes patches to convert two of these"special" LSM hooks to the common 0/-ERRNO pattern. - Various cleanups and improvements A handful of patches to remove redundant code, better leverage the IS_ERR_OR_NULL() helper, add missing "static" markings, and do some minor style fixups. * tag 'lsm-pr-20240911' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm: (40 commits) security: Update file_set_fowner documentation fs: Fix file_set_fowner LSM hook inconsistencies lsm: Use IS_ERR_OR_NULL() helper function lsm: remove LSM_COUNT and LSM_CONFIG_COUNT ipe: Remove duplicated include in ipe.c lsm: replace indirect LSM hook calls with static calls lsm: count the LSMs enabled at compile time kernel: Add helper macros for loop unrolling init/main.c: Initialize early LSMs after arch code, static keys and calls. MAINTAINERS: add IPE entry with Fan Wu as maintainer documentation: add IPE documentation ipe: kunit test for parser scripts: add boot policy generation program ipe: enable support for fs-verity as a trust provider fsverity: expose verified fsverity built-in signatures to LSMs lsm: add security_inode_setintegrity() hook ipe: add support for dm-verity as a trust provider dm-verity: expose root hash digest and signature data to LSMs block,lsm: add LSM blob and new LSM hooks for block devices ipe: add permissive toggle ... |
||
![]() |
35219bc5c7 |
vfs-6.12.netfs
-----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZuQEvgAKCRCRxhvAZXjc onQWAQD6IxAKPU0zom2FoWNilvSzPs7WglTtvddX9pu/lT1RNAD/YC/wOLW8mvAv 9oTAmigQDQQhEWdJA9RgLZBiw7k+DAw= =zWFb -----END PGP SIGNATURE----- Merge tag 'vfs-6.12.netfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull netfs updates from Christian Brauner: "This contains the work to improve read/write performance for the new netfs library. The main performance enhancing changes are: - Define a structure, struct folio_queue, and a new iterator type, ITER_FOLIOQ, to hold a buffer as a replacement for ITER_XARRAY. See that patch for questions about naming and form. ITER_FOLIOQ is provided as a replacement for ITER_XARRAY. The problem with an xarray is that accessing it requires the use of a lock (typically the RCU read lock) - and this means that we can't supply iterate_and_advance() with a step function that might sleep (crypto for example) without having to drop the lock between pages. ITER_FOLIOQ is the iterator for a chain of folio_queue structs, where each folio_queue holds a small list of folios. A folio_queue struct is a simpler structure than xarray and is not subject to concurrent manipulation by the VM. folio_queue is used rather than a bvec[] as it can form lists of indefinite size, adding to one end and removing from the other on the fly. - Provide a copy_folio_from_iter() wrapper. - Make cifs RDMA support ITER_FOLIOQ. - Use folio queues in the write-side helpers instead of xarrays. - Add a function to reset the iterator in a subrequest. - Simplify the write-side helpers to use sheaves to skip gaps rather than trying to work out where gaps are. - In afs, make the read subrequests asynchronous, putting them into work items to allow the next patch to do progressive unlocking/reading. - Overhaul the read-side helpers to improve performance. - Fix the caching of a partial block at the end of a file. - Allow a store to be cancelled. Then some changes for cifs to make it use folio queues instead of xarrays for crypto bufferage: - Use raw iteration functions rather than manually coding iteration when hashing data. - Switch to using folio_queue for crypto buffers. - Remove the xarray bits. Make some adjustments to the /proc/fs/netfs/stats file such that: - All the netfs stats lines begin 'Netfs:' but change this to something a bit more useful. - Add a couple of stats counters to track the numbers of skips and waits on the per-inode writeback serialisation lock to make it easier to check for this as a source of performance loss. Miscellaneous work: - Ensure that the sb_writers lock is taken around vfs_{set,remove}xattr() in the cachefiles code. - Reduce the number of conditional branches in netfs_perform_write(). - Move the CIFS_INO_MODIFIED_ATTR flag to the netfs_inode struct and remove cifs_post_modify(). - Move the max_len/max_nr_segs members from netfs_io_subrequest to netfs_io_request as they're only needed for one subreq at a time. - Add an 'unknown' source value for tracing purposes. - Remove NETFS_COPY_TO_CACHE as it's no longer used. - Set the request work function up front at allocation time. - Use bh-disabling spinlocks for rreq->lock as cachefiles completion may be run from block-filesystem DIO completion in softirq context. - Remove fs/netfs/io.c" * tag 'vfs-6.12.netfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (25 commits) docs: filesystems: corrected grammar of netfs page cifs: Don't support ITER_XARRAY cifs: Switch crypto buffer to use a folio_queue rather than an xarray cifs: Use iterate_and_advance*() routines directly for hashing netfs: Cancel dirty folios that have no storage destination cachefiles, netfs: Fix write to partial block at EOF netfs: Remove fs/netfs/io.c netfs: Speed up buffered reading afs: Make read subreqs async netfs: Simplify the writeback code netfs: Provide an iterator-reset function netfs: Use new folio_queue data type and iterator instead of xarray iter cifs: Provide the capability to extract from ITER_FOLIOQ to RDMA SGEs iov_iter: Provide copy_folio_from_iter() mm: Define struct folio_queue and ITER_FOLIOQ to handle a sequence of folios netfs: Use bh-disabling spinlocks for rreq->lock netfs: Set the request work function upon allocation netfs: Remove NETFS_COPY_TO_CACHE netfs: Reserve netfs_sreq_source 0 as unset/unknown netfs: Move max_len/max_nr_segs from netfs_io_subrequest to netfs_io_stream ... |
||
![]() |
2775df6e5e |
vfs-6.12.folio
-----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZuQEvgAKCRCRxhvAZXjc ou77AQD3U1KjbdgzbUi6kaUmiiWOPhfYTlm8mho8dBjqvTCB+AD/XTWSFCWWhHB4 KyQZTbjRD81xmVNbKjASazp0EA6Ahwc= =gIsD -----END PGP SIGNATURE----- Merge tag 'vfs-6.12.folio' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs Pull vfs folio updates from Christian Brauner: "This contains work to port write_begin and write_end to rely on folios for various filesystems. This converts ocfs2, vboxfs, orangefs, jffs2, hostfs, fuse, f2fs, ecryptfs, ntfs3, nilfs2, reiserfs, minixfs, qnx6, sysv, ufs, and squashfs. After this series lands a bunch of the filesystems in this list do not mention struct page anymore" * tag 'vfs-6.12.folio' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs: (61 commits) Squashfs: Ensure all readahead pages have been used Squashfs: Rewrite and update squashfs_readahead_fragment() to not use page->index Squashfs: Update squashfs_readpage_block() to not use page->index Squashfs: Update squashfs_readahead() to not use page->index Squashfs: Update page_actor to not use page->index jffs2: Use a folio in jffs2_garbage_collect_dnode() jffs2: Convert jffs2_do_readpage_nolock to take a folio buffer: Convert __block_write_begin() to take a folio ocfs2: Convert ocfs2_write_zero_page to use a folio fs: Convert aops->write_begin to take a folio fs: Convert aops->write_end to take a folio vboxsf: Use a folio in vboxsf_write_end() orangefs: Convert orangefs_write_begin() to use a folio orangefs: Convert orangefs_write_end() to use a folio jffs2: Convert jffs2_write_begin() to use a folio jffs2: Convert jffs2_write_end() to use a folio hostfs: Convert hostfs_write_end() to use a folio fuse: Convert fuse_write_begin() to use a folio fuse: Convert fuse_write_end() to use a folio f2fs: Convert f2fs_write_begin() to use a folio ... |
||
![]() |
8f72c31f45 |
vfs-6.12.misc
-----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZuQEGwAKCRCRxhvAZXjc ojIuAQC433+hBkvjvmQ7H0r5rgZSjUuCTG3bSmdU7RJmPHUHhwEA85v/NGq53f+W IhandK6t+Cf0JYpFZ3N0bT88hDYVhQQ= =9zGL -----END PGP SIGNATURE----- Merge tag 'vfs-6.12.misc' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs Pull misc vfs updates from Christian Brauner: "This contains the usual pile of misc updates: Features: - Add F_CREATED_QUERY fcntl() that allows userspace to query whether a file was actually created. Often userspace wants to know whether an O_CREATE request did actually create a file without using O_EXCL. The current logic is that to first attempts to open the file without O_CREAT | O_EXCL and if ENOENT is returned userspace tries again with both flags. If that succeeds all is well. If it now reports EEXIST it retries. That works fairly well but some corner cases make this more involved. If this operates on a dangling symlink the first openat() without O_CREAT | O_EXCL will return ENOENT but the second openat() with O_CREAT | O_EXCL will fail with EEXIST. The reason is that openat() without O_CREAT | O_EXCL follows the symlink while O_CREAT | O_EXCL doesn't for security reasons. So it's not something we can really change unless we add an explicit opt-in via O_FOLLOW which seems really ugly. All available workarounds are really nasty (fanotify, bpf lsm etc) so add a simple fcntl(). - Try an opportunistic lookup for O_CREAT. Today, when opening a file we'll typically do a fast lookup, but if O_CREAT is set, the kernel always takes the exclusive inode lock. This was likely done with the expectation that O_CREAT means that we always expect to do the create, but that's often not the case. Many programs set O_CREAT even in scenarios where the file already exists (see related F_CREATED_QUERY patch motivation above). The series contained in the pr rearranges the pathwalk-for-open code to also attempt a fast_lookup in certain O_CREAT cases. If a positive dentry is found, the inode_lock can be avoided altogether and it can stay in rcuwalk mode for the last step_into. - Expose the 64 bit mount id via name_to_handle_at() Now that we provide a unique 64-bit mount ID interface in statx(2), we can now provide a race-free way for name_to_handle_at(2) to provide a file handle and corresponding mount without needing to worry about racing with /proc/mountinfo parsing or having to open a file just to do statx(2). While this is not necessary if you are using AT_EMPTY_PATH and don't care about an extra statx(2) call, users that pass full paths into name_to_handle_at(2) need to know which mount the file handle comes from (to make sure they don't try to open_by_handle_at a file handle from a different filesystem) and switching to AT_EMPTY_PATH would require allocating a file for every name_to_handle_at(2) call - Add a per dentry expire timeout to autofs There are two fairly well known automounter map formats, the autofs format and the amd format (more or less System V and Berkley). Some time ago Linux autofs added an amd map format parser that implemented a fair amount of the amd functionality. This was done within the autofs infrastructure and some functionality wasn't implemented because it either didn't make sense or required extra kernel changes. The idea was to restrict changes to be within the existing autofs functionality as much as possible and leave changes with a wider scope to be considered later. One of these changes is implementing the amd options: 1) "unmount", expire this mount according to a timeout (same as the current autofs default). 2) "nounmount", don't expire this mount (same as setting the autofs timeout to 0 except only for this specific mount) . 3) "utimeout=<seconds>", expire this mount using the specified timeout (again same as setting the autofs timeout but only for this mount) To implement these options per-dentry expire timeouts need to be implemented for autofs indirect mounts. This is because all map keys (mounts) for autofs indirect mounts use an expire timeout stored in the autofs mount super block info. structure and all indirect mounts use the same expire timeout. Fixes: - Fix missing fput for FSCONFIG_SET_FD in autofs - Use param->file for FSCONFIG_SET_FD in coda - Delete the 'fs/netfs' proc subtreee when netfs module exits - Make sure that struct uid_gid_map fits into a single cacheline - Don't flush in-flight wb switches for superblocks without cgroup writeback - Correcting the idmapping mount example in the idmapping documentation - Fix a race between evice_inodes() and find_inode() and iput() - Refine the show_inode_state() macro definition in writeback code - Prevent dump_mapping() from accessing invalid dentry.d_name.name - Show actual source for debugfs in /proc/mounts - Annotate data-race of busy_poll_usecs in eventpoll - Don't WARN for racy path_noexec check in exec code - Handle OOM on mnt_warn_timestamp_expiry() - Fix some spelling in the iomap design documentation - Fix typo in procfs comment - Fix typo in fs/namespace.c comment Cleanups: - Add the VFS git tree to the MAINTAINERS file - Move FMODE_UNSIGNED_OFFSET to fop_flags freeing up another f_mode bit in struct file bringing us to 5 free f_mode bits - Remove the __I_DIO_WAKEUP bit from i_state flags as we can simplify the wait mechanism - Remove the unused path_put_init() helper - Replace a __u32 with u32 for s_fsnotify_mask as __u32 is uapi specific - Replace the unsigned long i_state member with a u32 i_state member in struct inode freeing up 4 bytes in struct inode. Instead of using the bit based wait apis we're now using the var event apis and using the individual bytes of the i_state member to wait on state changes - Explain how per-syscall AT_* flags should be allocated - Use in_group_or_capable() helper to simplify the posix acl mode update code - Switch to LIST_HEAD() in fsync_buffers_list() to simplify the code - Removed comment about d_rcu_to_refcount() as that function doesn't exist anymore - Add kernel documentation for lookup_fast() - Don't re-zero evenpoll fields - Remove outdated comment after close_fd() - Fix imprecise wording in comment about the pipe filesystem - Drop GFP_NOFAIL mode from alloc_page_buffers - Missing blank line warnings and struct declaration improved in file_table - Annotate struct poll_list with __counted_by() - Remove the unused read parameter in percpu-rwsem - Remove linux/prefetch.h include from direct-io code - Use kmemdup_array instead of kmemdup for multiple allocation in mnt_idmapping code - Remove unused mnt_cursor_del() declaration Performance tweaks: - Dodge smp_mb in break_lease and break_deleg in the common case - Only read fops once in fops_{get,put}() - Use RCU in ilookup() - Elide smp_mb in iversion handling in the common case - Drop one lock trip in evict()" * tag 'vfs-6.12.misc' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs: (58 commits) uidgid: make sure we fit into one cacheline proc: Fix typo in the comment fs/pipe: Correct imprecise wording in comment fhandle: expose u64 mount id to name_to_handle_at(2) uapi: explain how per-syscall AT_* flags should be allocated fs: drop GFP_NOFAIL mode from alloc_page_buffers writeback: Refine the show_inode_state() macro definition fs/inode: Prevent dump_mapping() accessing invalid dentry.d_name.name mnt_idmapping: Use kmemdup_array instead of kmemdup for multiple allocation netfs: Delete subtree of 'fs/netfs' when netfs module exits fs: use LIST_HEAD() to simplify code inode: make i_state a u32 inode: port __I_LRU_ISOLATING to var event vfs: fix race between evice_inodes() and find_inode()&iput() inode: port __I_NEW to var event inode: port __I_SYNC to var event fs: reorder i_state bits fs: add i_state helpers MAINTAINERS: add the VFS git tree fs: s/__u32/u32/ for s_fsnotify_mask ... |
||
![]() |
71fdfcdd0d
|
Documentation: iomap: fix a typo
Change voidw -> void. Signed-off-by: Pankaj Raghav <p.raghav@samsung.com> Link: https://lore.kernel.org/r/20240820161329.1293718-1-kernel@pankajraghav.com Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org> |
||
![]() |
4b40d43d9f
|
docs: filesystems: corrected grammar of netfs page
Fixed the word "aren't" to "isn't" based on singular word "bufferage". Signed-off-by: Dennis Lam <dennis.lamerice@gmail.com> Link: https://lore.kernel.org/r/20240912012550.13748-2-dennis.lamerice@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org> |
||
![]() |
0088d7581b |
tools: usb: p9_fwd: wrap USBG shell command examples in literal code blocks
Stephen Rothwell reported htmldocs warning when merging usb tree:
Documentation/filesystems/9p.rst:99: ERROR: Unexpected indentation.
That's because Sphinx tries rendering p9_fwd.py output as a grid table
instead.
Wrap shell commands in "USBG Example" section in literal code blocks
to fix above warning and to be in line with rest of commands in the doc.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Closes: https://lore.kernel.org/linux-next/20240905184059.0f30ff9a@canb.auug.org.au/
Fixes:
|
||
![]() |
2409952f64 |
docs:filesystems: fix spelling and grammar mistakes
Signed-off-by: Dennis Lam <dennis.lamerice@gmail.com> Reviewed-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Message-ID: <20240906195400.39949-1-dennis.lamerice@gmail.com> |
||
![]() |
0cac9253a0 |
docs:filesystem: fix mispelled words on autofs page
Signed-off-by: Dennis Lam <dennis.lamerice@gmail.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Message-ID: <20240908183741.15352-2-dennis.lamerice@gmail.com> |
||
![]() |
d89b35d83e |
bcachefs: Fix a spelling error in docs
Signed-off-by: Xiaxi Shen <shenxiaxi26@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
||
![]() |
b1daf3f847
|
docs:filesystems: fix spelling and grammar mistakes in iomap design page
Signed-off-by: Dennis Lam <dennis.lamerice@gmail.com> Link: https://lore.kernel.org/r/20240908172841.9616-2-dennis.lamerice@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org> |
||
![]() |
930b7c32ea |
overlayfs.rst: update metacopy section in overlayfs documentation
- Provide info about trusted.overlay.metacopy extended attribute - Minor rephrasing regarding copy-up operation with metacopy=on Signed-off-by: Yuriy Belikov <yuriybelikov1@gmail.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> |
||
![]() |
d224338aa1 |
Linux 6.11-rc6
-----BEGIN PGP SIGNATURE----- iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmbUG7oeHHRvcnZhbGRz QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiG7LUH/26M4QJ5UGJHsehd bbHlE4or0jibFyMbUiYDOElqLITjCVH6mi3Kv3E7sfyLxSsglVRRNzLCTq/UgTf8 E1L90q4wCySElzzIhH6cltuQdAhs7pRWs5BETByvIW+g+ayN0LZxUPbvB8yl/nOU Zx8flBEuM2isuRlnx+iRccbf2PxNadSkSYg2TlmZr8mfFKCiRxjU7x355Q3UcylQ b8S2jVgq69CSDF3IBOzwHZjdq5OceDsO8he0KcfSTvSgyFMcwhntAT397YEnFXnk KKjKPNCu3KqHtTxsi4Sc0wOxVcgctDv4OPethaL8yROQ7jdBTkvNpPT1yMf7bca8 ZLpSo5Y= =TBcj -----END PGP SIGNATURE----- Merge tag 'v6.11-rc6' into docs-mw This is done primarily to get a docs build fix merged via another tree so that "make htmldocs" stops failing. |
||
![]() |
673f0c3ffc |
tools: usb: p9_fwd: add usb gadget packet forwarder script
This patch is adding an small python tool to forward 9pfs requests from the USB gadget to an existing 9pfs TCP server. Since currently all 9pfs servers lack support for the usb transport this tool is an useful helper to get started. Refer the Documentation section "USBG Example" in Documentation/filesystems/9p.rst on how to use it. Signed-off-by: Jan Luebbe <jlu@pengutronix.de> Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de> Tested-by: Andrzej Pietrasiewicz <andrzej.p@collabora.com> Link: https://lore.kernel.org/r/20240116-ml-topic-u9p-v12-3-9a27de5160e0@pengutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
![]() |
a3be076dc1 |
net/9p/usbg: Add new usb gadget function transport
Add the new gadget function for 9pfs transport. This function is defining an simple 9pfs transport interface that consists of one in and one out endpoint. The endpoints transmit and receive the 9pfs protocol payload when mounting a 9p filesystem over usb. Tested-by: Andrzej Pietrasiewicz <andrzej.p@collabora.com> Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de> Link: https://lore.kernel.org/r/20240116-ml-topic-u9p-v12-2-9a27de5160e0@pengutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
![]() |
420e05d0de |
fs: remove calls to set and clear the folio error flag
Nobody checks the folio error flag any more, so we can stop setting and clearing it. Also remove the documentation suggesting to not bother setting the error bit. Link: https://lkml.kernel.org/r/20240807193528.1865100-1-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
![]() |
3717bbcb59 |
doc: correcting the idmapping mount example
In step 2, we obtain the kernel id `k1000`. So in next step (step 3), we should translate the `k1000` not `k21000`. Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Link: https://lore.kernel.org/r/20240816063611.1961910-1-lihongbo22@huawei.com Signed-off-by: Christian Brauner <brauner@kernel.org> |
||
![]() |
28c7658b2c |
Fix spelling and gramatical errors
Fixed 3 typos in design.rst Signed-off-by: Xiaxi Shen <shenxiaxi26@gmail.com> Link: https://lore.kernel.org/r/20240807070536.14536-1-shenxiaxi26@gmail.com Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org> |
||
![]() |
f92a24ae7c |
Documentation/fs/9p: Expand goo.gl link
The goo.gl URL shortener is deprecated and is due to stop expanding existing links in 2025. The old goo.gl link in the 9p docs doesn't work anyway, replace it by a kernel.org link suggested by Randy instead. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/20240725180041.80862-1-linux@treblig.org |
||
![]() |
5c6154ffd4 |
Changes since last update:
- Allow large folios on compressed inodes;
- Fix invalid memory accesses if z_erofs_gbuf_growsize()
partially fails;
- Two minor cleanups.
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEQ0A6bDUS9Y+83NPFUXZn5Zlu5qoFAmbF1s8RHHhpYW5nQGtl
cm5lbC5vcmcACgkQUXZn5Zlu5qqzghAAjSi/OITIcpjYe3GOxDNEbIxTFf+5d7ge
66KRMBm5FG6xBaXOK4xhit2EdFsDVMAna6FjYyRJYpDDQX5MAUsifDeZeGZCKOV5
KpUk2CfhIp1zi9CeVA5Tl6rfdz0b0SKz77uRK9BN8PWmCdbeenOKgK6HU06S5enC
uPud9QegQELXad0bIe+uo+4zGQnqxwGMfkNEW+cfbp200m6kpig/SgttoXCtz1mm
CEZpv87MlrQLbLtox+KcesFhYofxWdx/My5ckJVbaHGZdM29HKehRLIsBFX3di5N
4OkGnFbh16lhx6s4a/b1pCeDNH1NcmP7uqCoGr+dCVKQ79glSZQvfiV4myQk7vRN
4Orj+zq4aBSmYOvizIFMH++IGDioPBxkhKv3ReeCe4t/L8LHPkGEaeT+iM9b+b4n
uK0faj6qREygf2rHIRr6ciq7GdxlMwdlF7AeNUIyCHNO84J39+GkAID3Nqs+vzuF
wSJAuqp959WgJf2Co5ugb4vjbrpq8TKfZG0yk7KnIOWJkhAKDcIwta+ooloIYtD8
5gHhMRodIgKjYKPOi9vx9cQPlISTMsqVWpRuyzH9vCvqNvkzDriAez1kFq01tZTO
gcK5QGekSgft/5h7PLFrbVV1kVfVdjNapsIY+XAEmBGGCJCScCrlIxjbsRwsseFW
JIpQXilIU3c=
=gFZW
-----END PGP SIGNATURE-----
Merge tag 'erofs-for-6.11-rc5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs fixes from Gao Xiang:
"As I mentioned in the merge window pull request, there is a regression
which could cause system hang due to page migration. The corresponding
fix landed upstream through MM tree last week (commit
|
||
![]() |
ac6731870e |
documentation: add IPE documentation
Add IPE's admin and developer documentation to the kernel tree. Co-developed-by: Fan Wu <wufan@linux.microsoft.com> Signed-off-by: Deven Bowers <deven.desai@linux.microsoft.com> Signed-off-by: Fan Wu <wufan@linux.microsoft.com> Signed-off-by: Paul Moore <paul@paul-moore.com> |
||
![]() |
7c373e4f14 |
fsverity: expose verified fsverity built-in signatures to LSMs
This patch enhances fsverity's capabilities to support both integrity and authenticity protection by introducing the exposure of built-in signatures through a new LSM hook. This functionality allows LSMs, e.g. IPE, to enforce policies based on the authenticity and integrity of files, specifically focusing on built-in fsverity signatures. It enables a policy enforcement layer within LSMs for fsverity, offering granular control over the usage of authenticity claims. For instance, a policy could be established to only permit the execution of all files with verified built-in fsverity signatures. The introduction of a security_inode_setintegrity() hook call within fsverity's workflow ensures that the verified built-in signature of a file is exposed to LSMs. This enables LSMs to recognize and label fsverity files that contain a verified built-in fsverity signature. This hook is invoked subsequent to the fsverity_verify_signature() process, guaranteeing the signature's verification against fsverity's keyring. This mechanism is crucial for maintaining system security, as it operates in kernel space, effectively thwarting attempts by malicious binaries to bypass user space stack interactions. The second to last commit in this patch set will add a link to the IPE documentation in fsverity.rst. Signed-off-by: Deven Bowers <deven.desai@linux.microsoft.com> Signed-off-by: Fan Wu <wufan@linux.microsoft.com> Acked-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Paul Moore <paul@paul-moore.com> |
||
![]() |
e080a26725 |
erofs: allow large folios for compressed files
As commit
|
||
![]() |
4fdd8664c8 |
ksmbd: fix spelling mistakes in documentation
There are a couple of spelling mistakes in the documentation. This patch fixes them. Signed-off-by: Victor Timofei <victor@vtimothy.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> |
||
![]() |
889ced4c93
|
netfs: clean up after renaming FSCACHE_DEBUG config
Commit 6b8e61472529 ("netfs: Rename CONFIG_FSCACHE_DEBUG to CONFIG_NETFS_DEBUG") renames the config, but introduces two issues: First, NETFS_DEBUG mistakenly depends on the non-existing config NETFS, whereas the actual intended config is called NETFS_SUPPORT. Second, the config renaming misses to adjust the documentation of the functionality of this config. Clean up those two points. Signed-off-by: Lukas Bulwahn <lukas.bulwahn@redhat.com> Link: https://lore.kernel.org/r/20240731073902.69262-1-lukas.bulwahn@redhat.com Signed-off-by: Christian Brauner <brauner@kernel.org> |
||
![]() |
1da86618bd
|
fs: Convert aops->write_begin to take a folio
Convert all callers from working on a page to working on one page of a folio (support for working on an entire folio can come later). Removes a lot of folio->page->folio conversions. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Christian Brauner <brauner@kernel.org> |
||
![]() |
a225800f32
|
fs: Convert aops->write_end to take a folio
Most callers have a folio, and most implementations operate on a folio, so remove the conversion from folio->page->folio to fit through this interface. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Christian Brauner <brauner@kernel.org> |
||
![]() |
fbc90c042c |
- 875fa64577da ("mm/hugetlb_vmemmap: fix race with speculative PFN
walkers") is known to cause a performance regression (https://lore.kernel.org/all/3acefad9-96e5-4681-8014-827d6be71c7a@linux.ibm.com/T/#mfa809800a7862fb5bdf834c6f71a3a5113eb83ff). Yu has a fix which I'll send along later via the hotfixes branch. - In the series "mm: Avoid possible overflows in dirty throttling" Jan Kara addresses a couple of issues in the writeback throttling code. These fixes are also targetted at -stable kernels. - Ryusuke Konishi's series "nilfs2: fix potential issues related to reserved inodes" does that. This should actually be in the mm-nonmm-stable tree, along with the many other nilfs2 patches. My bad. - More folio conversions from Kefeng Wang in the series "mm: convert to folio_alloc_mpol()" - Kemeng Shi has sent some cleanups to the writeback code in the series "Add helper functions to remove repeated code and improve readability of cgroup writeback" - Kairui Song has made the swap code a little smaller and a little faster in the series "mm/swap: clean up and optimize swap cache index". - In the series "mm/memory: cleanly support zeropage in vm_insert_page*(), vm_map_pages*() and vmf_insert_mixed()" David Hildenbrand has reworked the rather sketchy handling of the use of the zeropage in MAP_SHARED mappings. I don't see any runtime effects here - more a cleanup/understandability/maintainablity thing. - Dev Jain has improved selftests/mm/va_high_addr_switch.c's handling of higher addresses, for aarch64. The (poorly named) series is "Restructure va_high_addr_switch". - The core TLB handling code gets some cleanups and possible slight optimizations in Bang Li's series "Add update_mmu_tlb_range() to simplify code". - Jane Chu has improved the handling of our fake-an-unrecoverable-memory-error testing feature MADV_HWPOISON in the series "Enhance soft hwpoison handling and injection". - Jeff Johnson has sent a billion patches everywhere to add MODULE_DESCRIPTION() to everything. Some landed in this pull. - In the series "mm: cleanup MIGRATE_SYNC_NO_COPY mode", Kefeng Wang has simplified migration's use of hardware-offload memory copying. - Yosry Ahmed performs more folio API conversions in his series "mm: zswap: trivial folio conversions". - In the series "large folios swap-in: handle refault cases first", Chuanhua Han inches us forward in the handling of large pages in the swap code. This is a cleanup and optimization, working toward the end objective of full support of large folio swapin/out. - In the series "mm,swap: cleanup VMA based swap readahead window calculation", Huang Ying has contributed some cleanups and a possible fixlet to his VMA based swap readahead code. - In the series "add mTHP support for anonymous shmem" Baolin Wang has taught anonymous shmem mappings to use multisize THP. By default this is a no-op - users must opt in vis sysfs controls. Dramatic improvements in pagefault latency are realized. - David Hildenbrand has some cleanups to our remaining use of page_mapcount() in the series "fs/proc: move page_mapcount() to fs/proc/internal.h". - David also has some highmem accounting cleanups in the series "mm/highmem: don't track highmem pages manually". - Build-time fixes and cleanups from John Hubbard in the series "cleanups, fixes, and progress towards avoiding "make headers"". - Cleanups and consolidation of the core pagemap handling from Barry Song in the series "mm: introduce pmd|pte_needs_soft_dirty_wp helpers and utilize them". - Lance Yang's series "Reclaim lazyfree THP without splitting" has reduced the latency of the reclaim of pmd-mapped THPs under fairly common circumstances. A 10x speedup is seen in a microbenchmark. It does this by punting to aother CPU but I guess that's a win unless all CPUs are pegged. - hugetlb_cgroup cleanups from Xiu Jianfeng in the series "mm/hugetlb_cgroup: rework on cftypes". - Miaohe Lin's series "Some cleanups for memory-failure" does just that thing. - Is anyone reading this stuff? If so, email me! - Someone other than SeongJae has developed a DAMON feature in Honggyu Kim's series "DAMON based tiered memory management for CXL memory". This adds DAMON features which may be used to help determine the efficiency of our placement of CXL/PCIe attached DRAM. - DAMON user API centralization and simplificatio work in SeongJae Park's series "mm/damon: introduce DAMON parameters online commit function". - In the series "mm: page_type, zsmalloc and page_mapcount_reset()" David Hildenbrand does some maintenance work on zsmalloc - partially modernizing its use of pageframe fields. - Kefeng Wang provides more folio conversions in the series "mm: remove page_maybe_dma_pinned() and page_mkclean()". - More cleanup from David Hildenbrand, this time in the series "mm/memory_hotplug: use PageOffline() instead of PageReserved() for !ZONE_DEVICE". It "enlightens memory hotplug more about PageOffline() pages" and permits the removal of some virtio-mem hacks. - Barry Song's series "mm: clarify folio_add_new_anon_rmap() and __folio_add_anon_rmap()" is a cleanup to the anon folio handling in preparation for mTHP (multisize THP) swapin. - Kefeng Wang's series "mm: improve clear and copy user folio" implements more folio conversions, this time in the area of large folio userspace copying. - The series "Docs/mm/damon/maintaier-profile: document a mailing tool and community meetup series" tells people how to get better involved with other DAMON developers. From SeongJae Park. - A large series ("kmsan: Enable on s390") from Ilya Leoshkevich does that. - David Hildenbrand sends along more cleanups, this time against the migration code. The series is "mm/migrate: move NUMA hinting fault folio isolation + checks under PTL". - Jan Kara has found quite a lot of strangenesses and minor errors in the readahead code. He addresses this in the series "mm: Fix various readahead quirks". - SeongJae Park's series "selftests/damon: test DAMOS tried regions and {min,max}_nr_regions" adds features and addresses errors in DAMON's self testing code. - Gavin Shan has found a userspace-triggerable WARN in the pagecache code. The series "mm/filemap: Limit page cache size to that supported by xarray" addresses this. The series is marked cc:stable. - Chengming Zhou's series "mm/ksm: cmp_and_merge_page() optimizations and cleanup" cleans up and slightly optimizes KSM. - Roman Gushchin has separated the memcg-v1 and memcg-v2 code - lots of code motion. The series (which also makes the memcg-v1 code Kconfigurable) are "mm: memcg: separate legacy cgroup v1 code and put under config option" and "mm: memcg: put cgroup v1-specific memcg data under CONFIG_MEMCG_V1" - Dan Schatzberg's series "Add swappiness argument to memory.reclaim" adds an additional feature to this cgroup-v2 control file. - The series "Userspace controls soft-offline pages" from Jiaqi Yan permits userspace to stop the kernel's automatic treatment of excessive correctable memory errors. In order to permit userspace to monitor and handle this situation. - Kefeng Wang's series "mm: migrate: support poison recover from migrate folio" teaches the kernel to appropriately handle migration from poisoned source folios rather than simply panicing. - SeongJae Park's series "Docs/damon: minor fixups and improvements" does those things. - In the series "mm/zsmalloc: change back to per-size_class lock" Chengming Zhou improves zsmalloc's scalability and memory utilization. - Vivek Kasireddy's series "mm/gup: Introduce memfd_pin_folios() for pinning memfd folios" makes the GUP code use FOLL_PIN rather than bare refcount increments. So these paes can first be moved aside if they reside in the movable zone or a CMA block. - Andrii Nakryiko has added a binary ioctl()-based API to /proc/pid/maps for much faster reading of vma information. The series is "query VMAs from /proc/<pid>/maps". - In the series "mm: introduce per-order mTHP split counters" Lance Yang improves the kernel's presentation of developer information related to multisize THP splitting. - Michael Ellerman has developed the series "Reimplement huge pages without hugepd on powerpc (8xx, e500, book3s/64)". This permits userspace to use all available huge page sizes. - In the series "revert unconditional slab and page allocator fault injection calls" Vlastimil Babka removes a performance-affecting and not very useful feature from slab fault injection. -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZp2C+QAKCRDdBJ7gKXxA joTkAQDvjqOoFStqk4GU3OXMYB7WCU/ZQMFG0iuu1EEwTVDZ4QEA8CnG7seek1R3 xEoo+vw0sWWeLV3qzsxnCA1BJ8cTJA8= =z0Lf -----END PGP SIGNATURE----- Merge tag 'mm-stable-2024-07-21-14-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - In the series "mm: Avoid possible overflows in dirty throttling" Jan Kara addresses a couple of issues in the writeback throttling code. These fixes are also targetted at -stable kernels. - Ryusuke Konishi's series "nilfs2: fix potential issues related to reserved inodes" does that. This should actually be in the mm-nonmm-stable tree, along with the many other nilfs2 patches. My bad. - More folio conversions from Kefeng Wang in the series "mm: convert to folio_alloc_mpol()" - Kemeng Shi has sent some cleanups to the writeback code in the series "Add helper functions to remove repeated code and improve readability of cgroup writeback" - Kairui Song has made the swap code a little smaller and a little faster in the series "mm/swap: clean up and optimize swap cache index". - In the series "mm/memory: cleanly support zeropage in vm_insert_page*(), vm_map_pages*() and vmf_insert_mixed()" David Hildenbrand has reworked the rather sketchy handling of the use of the zeropage in MAP_SHARED mappings. I don't see any runtime effects here - more a cleanup/understandability/maintainablity thing. - Dev Jain has improved selftests/mm/va_high_addr_switch.c's handling of higher addresses, for aarch64. The (poorly named) series is "Restructure va_high_addr_switch". - The core TLB handling code gets some cleanups and possible slight optimizations in Bang Li's series "Add update_mmu_tlb_range() to simplify code". - Jane Chu has improved the handling of our fake-an-unrecoverable-memory-error testing feature MADV_HWPOISON in the series "Enhance soft hwpoison handling and injection". - Jeff Johnson has sent a billion patches everywhere to add MODULE_DESCRIPTION() to everything. Some landed in this pull. - In the series "mm: cleanup MIGRATE_SYNC_NO_COPY mode", Kefeng Wang has simplified migration's use of hardware-offload memory copying. - Yosry Ahmed performs more folio API conversions in his series "mm: zswap: trivial folio conversions". - In the series "large folios swap-in: handle refault cases first", Chuanhua Han inches us forward in the handling of large pages in the swap code. This is a cleanup and optimization, working toward the end objective of full support of large folio swapin/out. - In the series "mm,swap: cleanup VMA based swap readahead window calculation", Huang Ying has contributed some cleanups and a possible fixlet to his VMA based swap readahead code. - In the series "add mTHP support for anonymous shmem" Baolin Wang has taught anonymous shmem mappings to use multisize THP. By default this is a no-op - users must opt in vis sysfs controls. Dramatic improvements in pagefault latency are realized. - David Hildenbrand has some cleanups to our remaining use of page_mapcount() in the series "fs/proc: move page_mapcount() to fs/proc/internal.h". - David also has some highmem accounting cleanups in the series "mm/highmem: don't track highmem pages manually". - Build-time fixes and cleanups from John Hubbard in the series "cleanups, fixes, and progress towards avoiding "make headers"". - Cleanups and consolidation of the core pagemap handling from Barry Song in the series "mm: introduce pmd|pte_needs_soft_dirty_wp helpers and utilize them". - Lance Yang's series "Reclaim lazyfree THP without splitting" has reduced the latency of the reclaim of pmd-mapped THPs under fairly common circumstances. A 10x speedup is seen in a microbenchmark. It does this by punting to aother CPU but I guess that's a win unless all CPUs are pegged. - hugetlb_cgroup cleanups from Xiu Jianfeng in the series "mm/hugetlb_cgroup: rework on cftypes". - Miaohe Lin's series "Some cleanups for memory-failure" does just that thing. - Someone other than SeongJae has developed a DAMON feature in Honggyu Kim's series "DAMON based tiered memory management for CXL memory". This adds DAMON features which may be used to help determine the efficiency of our placement of CXL/PCIe attached DRAM. - DAMON user API centralization and simplificatio work in SeongJae Park's series "mm/damon: introduce DAMON parameters online commit function". - In the series "mm: page_type, zsmalloc and page_mapcount_reset()" David Hildenbrand does some maintenance work on zsmalloc - partially modernizing its use of pageframe fields. - Kefeng Wang provides more folio conversions in the series "mm: remove page_maybe_dma_pinned() and page_mkclean()". - More cleanup from David Hildenbrand, this time in the series "mm/memory_hotplug: use PageOffline() instead of PageReserved() for !ZONE_DEVICE". It "enlightens memory hotplug more about PageOffline() pages" and permits the removal of some virtio-mem hacks. - Barry Song's series "mm: clarify folio_add_new_anon_rmap() and __folio_add_anon_rmap()" is a cleanup to the anon folio handling in preparation for mTHP (multisize THP) swapin. - Kefeng Wang's series "mm: improve clear and copy user folio" implements more folio conversions, this time in the area of large folio userspace copying. - The series "Docs/mm/damon/maintaier-profile: document a mailing tool and community meetup series" tells people how to get better involved with other DAMON developers. From SeongJae Park. - A large series ("kmsan: Enable on s390") from Ilya Leoshkevich does that. - David Hildenbrand sends along more cleanups, this time against the migration code. The series is "mm/migrate: move NUMA hinting fault folio isolation + checks under PTL". - Jan Kara has found quite a lot of strangenesses and minor errors in the readahead code. He addresses this in the series "mm: Fix various readahead quirks". - SeongJae Park's series "selftests/damon: test DAMOS tried regions and {min,max}_nr_regions" adds features and addresses errors in DAMON's self testing code. - Gavin Shan has found a userspace-triggerable WARN in the pagecache code. The series "mm/filemap: Limit page cache size to that supported by xarray" addresses this. The series is marked cc:stable. - Chengming Zhou's series "mm/ksm: cmp_and_merge_page() optimizations and cleanup" cleans up and slightly optimizes KSM. - Roman Gushchin has separated the memcg-v1 and memcg-v2 code - lots of code motion. The series (which also makes the memcg-v1 code Kconfigurable) are "mm: memcg: separate legacy cgroup v1 code and put under config option" and "mm: memcg: put cgroup v1-specific memcg data under CONFIG_MEMCG_V1" - Dan Schatzberg's series "Add swappiness argument to memory.reclaim" adds an additional feature to this cgroup-v2 control file. - The series "Userspace controls soft-offline pages" from Jiaqi Yan permits userspace to stop the kernel's automatic treatment of excessive correctable memory errors. In order to permit userspace to monitor and handle this situation. - Kefeng Wang's series "mm: migrate: support poison recover from migrate folio" teaches the kernel to appropriately handle migration from poisoned source folios rather than simply panicing. - SeongJae Park's series "Docs/damon: minor fixups and improvements" does those things. - In the series "mm/zsmalloc: change back to per-size_class lock" Chengming Zhou improves zsmalloc's scalability and memory utilization. - Vivek Kasireddy's series "mm/gup: Introduce memfd_pin_folios() for pinning memfd folios" makes the GUP code use FOLL_PIN rather than bare refcount increments. So these paes can first be moved aside if they reside in the movable zone or a CMA block. - Andrii Nakryiko has added a binary ioctl()-based API to /proc/pid/maps for much faster reading of vma information. The series is "query VMAs from /proc/<pid>/maps". - In the series "mm: introduce per-order mTHP split counters" Lance Yang improves the kernel's presentation of developer information related to multisize THP splitting. - Michael Ellerman has developed the series "Reimplement huge pages without hugepd on powerpc (8xx, e500, book3s/64)". This permits userspace to use all available huge page sizes. - In the series "revert unconditional slab and page allocator fault injection calls" Vlastimil Babka removes a performance-affecting and not very useful feature from slab fault injection. * tag 'mm-stable-2024-07-21-14-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (411 commits) mm/mglru: fix ineffective protection calculation mm/zswap: fix a white space issue mm/hugetlb: fix kernel NULL pointer dereference when migrating hugetlb folio mm/hugetlb: fix possible recursive locking detected warning mm/gup: clear the LRU flag of a page before adding to LRU batch mm/numa_balancing: teach mpol_to_str about the balancing mode mm: memcg1: convert charge move flags to unsigned long long alloc_tag: fix page_ext_get/page_ext_put sequence during page splitting lib: reuse page_ext_data() to obtain codetag_ref lib: add missing newline character in the warning message mm/mglru: fix overshooting shrinker memory mm/mglru: fix div-by-zero in vmpressure_calc_level() mm/kmemleak: replace strncpy() with strscpy() mm, page_alloc: put should_fail_alloc_page() back behing CONFIG_FAIL_PAGE_ALLOC mm, slab: put should_failslab() back behind CONFIG_SHOULD_FAILSLAB mm: ignore data-race in __swap_writepage hugetlbfs: ensure generic_hugetlb_get_unmapped_area() returns higher address than mmap_min_addr mm: shmem: rename mTHP shmem counters mm: swap_state: use folio_alloc_mpol() in __read_swap_cache_async() mm/migrate: putback split folios when numa hint migration fails ... |
||
![]() |
6706415bf9 |
gfs2 fixes and cleanups
- Revise the glock reference counting model - Several quota related fixes - Clean up the glock demote logic -----BEGIN PGP SIGNATURE----- iQJIBAABCAAyFiEEJZs3krPW0xkhLMTc1b+f6wMTZToFAmaVQ4gUHGFncnVlbmJh QHJlZGhhdC5jb20ACgkQ1b+f6wMTZToLog//T6Iljxro4CMkNvGJx2B3puo2rtbd mToxam0ZTkE/xXcxwRJDMFjxLdQ74xtiZLFJF8l/OwpiUpkKjh+hXdH4IABZG9Xm hNSvYBFiUCt86pcDKc/ia7dH/xSBN3nH1IpNtr6dCFBXHkc1tK1v+QJ+RnFDZ9Re kgMMYjmGKRfBRuR+r0uxF0V09jQYHmQ5K/o4arF5NX6ifUKX0tnnr8wB8bfXCznp uXG6Jf8TWSGDcJI+phi7o5tNUN4187RRlODPewHsBmS0bdZla5buu5q9ATBDk1Ca Btst+Oa6uIc6MHv9e9e59mVIp1NScYNnfDedFLLfxigskcGo2f7kaTlTNmccrMrm sQNyVhWG5zlUJL7OcdonKP64XAJZbFt5I29RJOiqY1Z4OxCB3OH7Rl7MhbCATq/o 6jjN6+1DOGDKy3vbxIwaIsjC1E9H5hzIsmbzjEya0TpjrHENFPOCAFvWH2PdV2yc hNhAiIKn6ihZs5QiXDGs9F46Qxb8C4nMDI/UAm2S7qAABlsD7m34PIZcfsVd+ouF ySZOhf2xfLFovk1+QqAzaOGxtxUHqNkUhpakKPcjocb2NIdwUo1k8QM+plPOvnrm 1arppql5u1DAdzAvKI2LLjeKd64Wl8TG7SJuQQq8PiBMSa/MRSlKB5N3LFiqnbzC iM/MYm1dPQ088wA= =BfXp -----END PGP SIGNATURE----- Merge tag 'gfs2-v6.10-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 updates from Andreas Gruenbacher: "Fixes and cleanups: - Revise the glock reference counting model and LRU list handling to be more sensible - Several quota related fixes: clean up the quota code, add some missing locking, and work around the on-disk corruption that the reverted patch "gfs2: ignore negated quota changes" causes - Clean up the glock demote logic in glock_work_func()" * tag 'gfs2-v6.10-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: (29 commits) gfs2: Clean up glock demote logic gfs2: Revert "check for no eligible quota changes" gfs2: Be more careful with the quota sync generation gfs2: Get rid of some unnecessary quota locking gfs2: Add some missing quota locking gfs2: Fold qd_fish into gfs2_quota_sync gfs2: quota need_sync cleanup gfs2: Fix and clean up function do_qc gfs2: Revert "Add quota_change type" gfs2: Revert "ignore negated quota changes" gfs2: qd_check_sync cleanups gfs2: Revert "introduce qd_bh_get_or_undo" gfs2: Check quota consistency on mount gfs2: Minor gfs2_quota_init error path cleanup gfs2: Get rid of demote_ok checks Revert "GFS2: Don't add all glocks to the lru" gfs2: Revise glock reference counting model gfs2: Switch to a per-filesystem glock workqueue gfs2: Report when glocks cannot be freed for a long time gfs2: gfs2_glock_get cleanup ... |
||
![]() |
4f5e249ec0 |
vfs-6.11.iomap
-----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZpEHLQAKCRCRxhvAZXjc ot3sAP9TBUM+vzUcQ5SVcUnSX+y3dhOGYnquORBbRc/Y6AzLMAEAu3TcsvdoaWfy 6ImUaju6iLqy9cCY3uDlNmJR16E4IgE= =Bwpy -----END PGP SIGNATURE----- Merge tag 'vfs-6.11.iomap' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull iomap updates from Christian Brauner: "This contains some minor work for the iomap subsystem: - Add documentation on the design of iomap and how to port to it - Optimize iomap_read_folio() - Bring back the change to iomap_write_end() to no increase i_size. This is accompanied by a change to xfs to reserve blocks for truncating large realtime inodes to avoid exposing stale data when iomap_write_end() stops increasing i_size" * tag 'vfs-6.11.iomap' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: iomap: don't increase i_size in iomap_write_end() xfs: reserve blocks for truncating large realtime inode Documentation: the design of iomap and how to port iomap: Optimize iomap_read_folio |
||
![]() |
b8fc1bd73a |
vfs-6.11.mount.api
-----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZpEGjAAKCRCRxhvAZXjc okXfAP4tFUYszUsSqYdsgy9UvXw3Dr5zOIzQmN++NdjGkbU5fgEAs2ystqEfJgr3 v7XvGbu65CvL4/slNhBZOU4yekGx5Qc= =C4QD -----END PGP SIGNATURE----- Merge tag 'vfs-6.11.mount.api' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs mount API updates from Christian Brauner: - Add a generic helper to parse uid and gid mount options. Currently we open-code the same logic in various filesystems which is error prone, especially since the verification of uid and gid mount options is a sensitive operation in the face of idmappings. Add a generic helper and convert all filesystems over to it. Make sure that filesystems that are mountable in unprivileged containers verify that the specified uid and gid can be represented in the owning namespace of the filesystem. - Convert hostfs to the new mount api. * tag 'vfs-6.11.mount.api' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fuse: Convert to new uid/gid option parsing helpers fuse: verify {g,u}id mount options correctly fat: Convert to new uid/gid option parsing helpers fat: Convert to new mount api fat: move debug into fat_mount_options vboxsf: Convert to new uid/gid option parsing helpers tracefs: Convert to new uid/gid option parsing helpers smb: client: Convert to new uid/gid option parsing helpers tmpfs: Convert to new uid/gid option parsing helpers ntfs3: Convert to new uid/gid option parsing helpers isofs: Convert to new uid/gid option parsing helpers hugetlbfs: Convert to new uid/gid option parsing helpers ext4: Convert to new uid/gid option parsing helpers exfat: Convert to new uid/gid option parsing helpers efivarfs: Convert to new uid/gid option parsing helpers debugfs: Convert to new uid/gid option parsing helpers autofs: Convert to new uid/gid option parsing helpers fs_parse: add uid & gid option option parsing helpers hostfs: Add const qualifier to host_root in hostfs_fill_super() hostfs: convert hostfs to use the new mount API |
||
![]() |
c10cb9148e |
docs/procfs: call out ioctl()-based PROCMAP_QUERY command existence
Call out PROCMAP_QUERY ioctl() existence in the section describing /proc/PID/maps file in documentation. We refer user to UAPI header for low-level details of this programmatic interface. Link: https://lkml.kernel.org/r/20240627170900.1672542-5-andrii@kernel.org Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
![]() |
9f111059e7
|
fs_parse: add uid & gid option option parsing helpers
Multiple filesystems take uid and gid as options, and the code to create the ID from an integer and validate it is standard boilerplate that can be moved into common helper functions, so do that for consistency and less cut&paste. This also helps avoid the buggy pattern noted by Seth Jenkins at https://lore.kernel.org/lkml/CALxfFW4BXhEwxR0Q5LSkg-8Vb4r2MONKCcUCVioehXQKr35eHg@mail.gmail.com/ because uid/gid parsing will fail before any assignment in most filesystems. Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Link: https://lore.kernel.org/r/de859d0a-feb9-473d-a5e2-c195a3d47abb@redhat.com Signed-off-by: Christian Brauner <brauner@kernel.org> |
||
![]() |
399ab86ea5 |
/proc/pid/smaps: add mseal info for vma
Add sl in /proc/pid/smaps to indicate vma is sealed
Link: https://lkml.kernel.org/r/20240614232014.806352-2-jeffxu@google.com
Fixes:
|
||
![]() |
a7ca193bc9
|
Documentation: the design of iomap and how to port
Capture the design of iomap and how to port filesystems to use it. Apologies for all the rst formatting, but it's necessary to distinguish code from regular text. A lot of this has been collected from various email conversations, code comments, commit messages, my own understanding of iomap, and Ritesh/Luis' previous efforts to create a document. Please note a large part of this has been taken from Dave's reply to last iomap doc patchset. Thanks to Ritesh, Luis, Dave, Darrick, Matthew, Christoph and other iomap developers who have taken time to explain the iomap design in various emails, commits, comments etc. Cc: Dave Chinner <david@fromorbit.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Christoph Hellwig <hch@infradead.org> Cc: Christian Brauner <brauner@kernel.org> Cc: Ojaswin Mujoo <ojaswin@linux.ibm.com> Cc: Jan Kara <jack@suse.cz> Cc: Luis Chamberlain <mcgrof@kernel.org> Inspired-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/20240614214347.GK6125@frogsfrogsfrogs Signed-off-by: Christian Brauner <brauner@kernel.org> |
||
![]() |
713f883438 |
gfs2: Get rid of demote_ok checks
The demote_ok glock operation is only still used to prevent the inode glocks of the "jindex" and "rindex" directories from getting recycled while they are still referenced by sdp->sd_jindex and sdp->sd_rindex. However, the LRU walking code will no longer recycle glocks which are referenced, so the demote_ok glock operation is obsolete and can be removed. Each of a glock's holders in the gl_holders list is holding a reference on the glock, so when the list of holders isn't empty in demote_ok(), the existing reference count check will already prevent the glock from getting released. This means that demote_ok() is obsolete as well. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> |
||
![]() |
97d6fdcd79 |
gfs2: Update glocks documentation
Rearrange the table of locking modes and associated caching capability to be in order of increasing caching capability. Update the description of the glock operations. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> |
||
![]() |
9b62e02e63 |
16 hotfixes, 11 of which are cc:stable.
A few nilfs2 fixes, the remainder are for MM: a couple of selftests fixes, various singletons fixing various issues in various parts. -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZlIOUgAKCRDdBJ7gKXxA jrYnAP9UeOw8YchTIsjEllmAbTMAqWGI+54CU/qD78jdIHoVWAEAmp0QqgFW3r2p jze4jBkh3lGQjykTjkUskaR71h9AZww= =AHeV -----END PGP SIGNATURE----- Merge tag 'mm-hotfixes-stable-2024-05-25-09-13' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "16 hotfixes, 11 of which are cc:stable. A few nilfs2 fixes, the remainder are for MM: a couple of selftests fixes, various singletons fixing various issues in various parts" * tag 'mm-hotfixes-stable-2024-05-25-09-13' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: mm/ksm: fix possible UAF of stable_node mm/memory-failure: fix handling of dissolved but not taken off from buddy pages mm: /proc/pid/smaps_rollup: avoid skipping vma after getting mmap_lock again nilfs2: fix potential hang in nilfs_detach_log_writer() nilfs2: fix unexpected freezing of nilfs_segctor_sync() nilfs2: fix use-after-free of timer for log writer thread selftests/mm: fix build warnings on ppc64 arm64: patching: fix handling of execmem addresses selftests/mm: compaction_test: fix bogus test success and reduce probability of OOM-killer invocation selftests/mm: compaction_test: fix incorrect write of zero to nr_hugepages selftests/mm: compaction_test: fix bogus test success on Aarch64 mailmap: update email address for Satya Priya mm/huge_memory: don't unpoison huge_zero_folio kasan, fortify: properly rename memintrinsics lib: add version into /proc/allocinfo output mm/vmalloc: fix vmalloc which may return null if called with __GFP_NOFAIL |
||
![]() |
74eca356f6 |
We have a series from Xiubo that adds support for additional access
checks based on MDS auth caps which were recently made available to clients. This is needed to prevent scenarios where the MDS quietly discards updates that a UID-restricted client previously (wrongfully) acked to the user. Other than that, just a documentation fixup. -----BEGIN PGP SIGNATURE----- iQFHBAABCAAxFiEEydHwtzie9C7TfviiSn/eOAIR84sFAmZRsm0THGlkcnlvbW92 QGdtYWlsLmNvbQAKCRBKf944AhHzi6++B/4o7j4CNzjJcBw9UgxEUugwJYBe2Ht3 vSTUkHD9NILVgrYSHNhgkCdvU8ckv8Sd+7W/Kb0BC/GRyQd57F7zoM6mvR1WMozt lLbYU/kdVI+TcIY2bhupMma5f+nWv6vIBTzca78UhogEBuIEYHAG3BnNaT/AEqF+ yZ3uQEVQ2bHmwbPn3A5dibYxOR8zLyhmaq/RUvqpiiYcSkEfZ6QqKiMlZkcVD7F8 c+NfjwXXGNTXDhfIbG4VndQi7xLXk3GI5E8xvdo2ALwumfp2KdVlMEYT8SHggSPS A8bWq+d5o7uVV6WviNK63XcGRpadCSSSL310vR78K+tNIIq5dnI81IsU =6c1e -----END PGP SIGNATURE----- Merge tag 'ceph-for-6.10-rc1' of https://github.com/ceph/ceph-client Pull ceph updates from Ilya Dryomov: "A series from Xiubo that adds support for additional access checks based on MDS auth caps which were recently made available to clients. This is needed to prevent scenarios where the MDS quietly discards updates that a UID-restricted client previously (wrongfully) acked to the user. Other than that, just a documentation fixup" * tag 'ceph-for-6.10-rc1' of https://github.com/ceph/ceph-client: doc: ceph: update userspace command to get CephFS metadata ceph: add CEPHFS_FEATURE_MDS_AUTH_CAPS_CHECK feature bit ceph: check the cephx mds auth access for async dirop ceph: check the cephx mds auth access for open ceph: check the cephx mds auth access for setattr ceph: add ceph_mds_check_access() helper ceph: save cap_auths in MDS client when session is opened |
||
![]() |
a38568a0b4 |
lib: add version into /proc/allocinfo output
Add version string and a header at the beginning of /proc/allocinfo to allow later format changes. Example output: > head /proc/allocinfo allocinfo - version: 1.0 # <size> <calls> <tag info> 0 0 init/main.c:1314 func:do_initcalls 0 0 init/do_mounts.c:353 func:mount_nodev_root 0 0 init/do_mounts.c:187 func:mount_root_generic 0 0 init/do_mounts.c:158 func:do_mount_root 0 0 init/initramfs.c:493 func:unpack_to_rootfs 0 0 init/initramfs.c:492 func:unpack_to_rootfs 0 0 init/initramfs.c:491 func:unpack_to_rootfs 512 1 arch/x86/events/rapl.c:681 func:init_rapl_pmus 128 1 arch/x86/events/rapl.c:571 func:rapl_cpu_online [akpm@linux-foundation.org: remove stray newline from struct allocinfo_private] Link: https://lkml.kernel.org/r/20240514163128.3662251-1-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
![]() |
93a2221c9c |
doc: ceph: update userspace command to get CephFS metadata
According to ceph documentation [1], "getfattr -d /some/dir" no longer displays the list of all extended attributes. Both CephFS kernel and FUSE clients hide this information. To retrieve the information you have to specify the particular attribute name e.g. "getfattr -n ceph.dir.rbytes /some/dir". [1] https://docs.ceph.com/en/latest/cephfs/quota/ Signed-off-by: Artem Ikonnikov <artem@datacrunch.io> Reviewed-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> |
||
![]() |
5ad8b6ad9a |
getting rid of bogus set_blocksize() uses, switching it
to struct file * and verifying that caller has device opened exclusively. -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCZkwkfQAKCRBZ7Krx/gZQ 62C3AQDW5vuXNx2+KDPma5YStjFpPLC0xtSyAS5D3YANjtyRFgD/TOcCarq7rvBt KubxHVFsfW+eu6ASeaoMRB83w5OIzwk= =Liix -----END PGP SIGNATURE----- Merge tag 'pull-set_blocksize' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs blocksize updates from Al Viro: "This gets rid of bogus set_blocksize() uses, switches it over to be based on a 'struct file *' and verifies that the caller has the device opened exclusively" * tag 'pull-set_blocksize' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: make set_blocksize() fail unless block device is opened exclusive set_blocksize(): switch to passing struct file * btrfs_get_bdev_and_sb(): call set_blocksize() only for exclusive opens swsusp: don't bother with setting block size zram: don't bother with reopening - just use O_EXCL for open swapon(2): open swap with O_EXCL swapon(2)/swapoff(2): don't bother with block size pktcdvd: sort set_blocksize() calls out bcache_register(): don't bother with set_blocksize() |
||
![]() |
72ece20127 |
f2fs update for 6.10-rc1
In this round, we've tried to address some performance issues on zoned storage such as direct IO and write_hints. In addition, we've migrated some IO paths using folio. Meanwhile, there are multiple bug fixes in the compression paths, sanity check conditions, and error handlers. Enhancement: - allow direct io of pinned files for zoned storage - assign the write hint per stream by default - convert read paths and test_writeback to folio - avoid allocating WARM_DATA segment for direct IO Bug fix: - fix false alarm on invalid block address - fix to add missing iput() in gc_data_segment() - fix to release node block count in error path of f2fs_new_node_page() - compress: don't allow unaligned truncation on released compress inode - compress: fix to cover {reserve,release}_compress_blocks() w/ cp_rwsem lock - compress: fix error path of inc_valid_block_count() - compress: fix to update i_compr_blocks correctly - fix block migration when section is not aligned to pow2 - don't trigger OPU on pinfile for direct IO - fix to do sanity check on i_xattr_nid in sanity_check_inode() - write missing last sum blk of file pinning section - clear writeback when compression failed - fix to adjust appropirate defragment pg_end As usual, there are several minor code clean-ups, and fixes to manage missing corner cases in the error paths. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE00UqedjCtOrGVvQiQBSofoJIUNIFAmZLpYcACgkQQBSofoJI UNJQTw/+NaY7a1EgkMUpBAzxrJMKHcuBtyG42QKqgk6new0XejQGjPHojL2nPrw/ t5G9TsbZbkHNMuhAkkTZMH+DFg92QYhByJlq79fxzya0XyGH4OaY1i4u67FLu0Qz PS/UKRkEI2B9lH+bGwa//XNMDSnzcao46bNi1SFbCNPGzU1cS35uOy/YgAdFlqTM WKJmM/AcNir4xtL30tBCVU//0OTtzT8+5YFVyPTeFR4WACsF6eTJAre9938xw1Ef p6ed6Wl2GYehqgFrAdAF07veZ1hVDSRAAB/1Mu1WKnNp57VBRjJW3DFDyApf+fIe 2KJIDJd9/ece3dycuiZP/LXPV0sODqOI1/5s9RbFVq/QAhTSME5xq8hNXTejdl28 PV6M2tKcTKMRpykppQg/K/N9PaO5Q6oFz0xlrOsrGoAhT1YnZfJi/DmzCZCCwYxW jyZor/r+849yDDdjhB94ZaByvj5S3OVqgsaunnbMBcGy+DDe0rUMXvRzVK4gTcCF lSTSp895BggWXLyPuXVNTjC4GIbzVbEDaHILPicfbqi0h5OCXG8YybKHiRs+ss6z ZrKJQxSVVvhjyHTVcBhb/Nc1s7Fm7DkX+KjV9GV3gwzB+AlVIgPlwyMTc2fZp3ST dUbmBR5+g4UUz2v4v4ZStAGy9eUFktO89u/roet8/74ppklj73E= =3mwj -----END PGP SIGNATURE----- Merge tag 'f2fs-for-6.10.rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "In this round, we've tried to address some performance issues on zoned storage such as direct IO and write_hints. In addition, we've migrated some IO paths using folio. Meanwhile, there are multiple bug fixes in the compression paths, sanity check conditions, and error handlers. Enhancements: - allow direct io of pinned files for zoned storage - assign the write hint per stream by default - convert read paths and test_writeback to folio - avoid allocating WARM_DATA segment for direct IO Bug fixes: - fix false alarm on invalid block address - fix to add missing iput() in gc_data_segment() - fix to release node block count in error path of f2fs_new_node_page() - compress: - don't allow unaligned truncation on released compress inode - cover {reserve,release}_compress_blocks() w/ cp_rwsem lock - fix error path of inc_valid_block_count() - fix to update i_compr_blocks correctly - fix block migration when section is not aligned to pow2 - don't trigger OPU on pinfile for direct IO - fix to do sanity check on i_xattr_nid in sanity_check_inode() - write missing last sum blk of file pinning section - clear writeback when compression failed - fix to adjust appropirate defragment pg_end As usual, there are several minor code clean-ups, and fixes to manage missing corner cases in the error paths" * tag 'f2fs-for-6.10.rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (50 commits) f2fs: initialize last_block_in_bio variable f2fs: Add inline to f2fs_build_fault_attr() stub f2fs: fix some ambiguous comments f2fs: fix to add missing iput() in gc_data_segment() f2fs: allow dirty sections with zero valid block for checkpoint disabled f2fs: compress: don't allow unaligned truncation on released compress inode f2fs: fix to release node block count in error path of f2fs_new_node_page() f2fs: compress: fix to cover {reserve,release}_compress_blocks() w/ cp_rwsem lock f2fs: compress: fix error path of inc_valid_block_count() f2fs: compress: fix typo in f2fs_reserve_compress_blocks() f2fs: compress: fix to update i_compr_blocks correctly f2fs: check validation of fault attrs in f2fs_build_fault_attr() f2fs: fix to limit gc_pin_file_threshold f2fs: remove unused GC_FAILURE_PIN f2fs: use f2fs_{err,info}_ratelimited() for cleanup f2fs: fix block migration when section is not aligned to pow2 f2fs: zone: fix to don't trigger OPU on pinfile for direct IO f2fs: fix to do sanity check on i_xattr_nid in sanity_check_inode() f2fs: fix to avoid allocating WARM_DATA segment for direct IO f2fs: remove redundant parameter in is_next_segment_free() ... |
||
![]() |
119d1b8a5d |
New code for 6.10:
* Introduce Parent Pointer extended attribute for inodes. * Online Repair - Implement atomic file content exchanges i.e. exchange ranges of bytes between two files atomically. - Create temporary files to repair file-based metadata. This uses atomic file content exchange facility to swap file fork mappings between the temporary file and the metadata inode. - Allow callers of directory/xattr code to set an explicit owner number to be written into the header fields of any new blocks that are created. This is required to avoid walking every block of the new structure and modify their ownership during online repair. - Repair - Extended attributes - Inode unlinked state - Directories - Symbolic links - AGI's unlinked inode list. - Parent pointers. - Move Orphan files to lost and found directory. - Fixes for Inode repair functionality. - Introduce a new sub-AG FITRIM implementation to reduce the duration for which the AGF lock is held. - Updates for the design documentation. - Use Parent Pointers to assist in checking directories, parent pointers, extended attributes, and link counts. * Bring back delalloc support for realtime devices which have an extent size that is equal to filesystem's block size. * Improve performance of log incompat feature handling. * Fixes - Prevent userspace from reading invalid file data due to incorrect. updation of file size when performing a non-atomic clone operation. - Minor fixes to online repair. - Fix confusing return values from xfs_bmapi_write(). - Fix an out of bounds access due to incorrect h_size during log recovery. - Defer upgrading the extent counters in xfs_reflink_end_cow_extent() until we know we are going to modify the extent mapping. - Remove racy access to if_bytes check in xfs_reflink_end_cow_extent(). - Fix sparse warnings. * Cleanups - Hold inode locks on all files involved in a rename until the completion of the operation. This is in preparation for the parent pointers patchset where parent pointers are applied in a separate chained update from the actual directory update. - Compile out v4 support when disabled. - Cleanup xfs_extent_busy_clear(). - Remove unused flags and fields from struct xfs_da_args. - Remove definitions of unused functions. - Improve extended attribute validation. - Add higher level directory operations helpers to remove duplication of code. - Cleanup quota (un)reservation interfaces. Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQQjMC4mbgVeU7MxEIYH7y4RirJu9AUCZjZC0wAKCRAH7y4RirJu 9HsCAPoCQvmPefDv56aMb5JEQNpv9dPz2Djj14hqLytQs5P/twD+LF5NhJgQNDUo Lwnb0tmkAhmG9Y4CCiN1FwSj1rq59gE= =2hXB -----END PGP SIGNATURE----- Merge tag 'xfs-6.10-merge-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux Pull xfs updates from Chandan Babu: "Online repair feature continues to be expanded. Also, we now support delayed allocation for realtime devices which have an extent size that is equal to filesystem's block size. New code: - Introduce Parent Pointer extended attribute for inodes - Bring back delalloc support for realtime devices which have an extent size that is equal to filesystem's block size - Improve performance of log incompat feature handling Online Repair: - Implement atomic file content exchanges i.e. exchange ranges of bytes between two files atomically - Create temporary files to repair file-based metadata. This uses atomic file content exchange facility to swap file fork mappings between the temporary file and the metadata inode - Allow callers of directory/xattr code to set an explicit owner number to be written into the header fields of any new blocks that are created. This is required to avoid walking every block of the new structure and modify their ownership during online repair - Repair more data structures: - Extended attributes - Inode unlinked state - Directories - Symbolic links - AGI's unlinked inode list - Parent pointers - Move Orphan files to lost and found directory - Fixes for Inode repair functionality - Introduce a new sub-AG FITRIM implementation to reduce the duration for which the AGF lock is held - Updates for the design documentation - Use Parent Pointers to assist in checking directories, parent pointers, extended attributes, and link counts Fixes: - Prevent userspace from reading invalid file data due to incorrect. updation of file size when performing a non-atomic clone operation - Minor fixes to online repair - Fix confusing return values from xfs_bmapi_write() - Fix an out of bounds access due to incorrect h_size during log recovery - Defer upgrading the extent counters in xfs_reflink_end_cow_extent() until we know we are going to modify the extent mapping - Remove racy access to if_bytes check in xfs_reflink_end_cow_extent() - Fix sparse warnings Cleanups: - Hold inode locks on all files involved in a rename until the completion of the operation. This is in preparation for the parent pointers patchset where parent pointers are applied in a separate chained update from the actual directory update - Compile out v4 support when disabled - Cleanup xfs_extent_busy_clear() - Remove unused flags and fields from struct xfs_da_args - Remove definitions of unused functions - Improve extended attribute validation - Add higher level directory operations helpers to remove duplication of code - Cleanup quota (un)reservation interfaces" * tag 'xfs-6.10-merge-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (221 commits) xfs: simplify iext overflow checking and upgrade xfs: remove a racy if_bytes check in xfs_reflink_end_cow_extent xfs: upgrade the extent counters in xfs_reflink_end_cow_extent later xfs: xfs_quota_unreserve_blkres can't fail xfs: consolidate the xfs_quota_reserve_blkres definitions xfs: clean up buffer allocation in xlog_do_recovery_pass xfs: fix log recovery buffer allocation for the legacy h_size fixup xfs: widen flags argument to the xfs_iflags_* helpers xfs: minor cleanups of xfs_attr3_rmt_blocks xfs: create a helper to compute the blockcount of a max sized remote value xfs: turn XFS_ATTR3_RMT_BUF_SPACE into a function xfs: use unsigned ints for non-negative quantities in xfs_attr_remote.c xfs: do not allocate the entire delalloc extent in xfs_bmapi_write xfs: fix xfs_bmap_add_extent_delay_real for partial conversions xfs: remove the xfs_iext_peek_prev_extent call in xfs_bmapi_allocate xfs: pass the actual offset and len to allocate to xfs_bmapi_allocate xfs: don't open code XFS_FILBLKS_MIN in xfs_bmapi_write xfs: lift a xfs_valid_startblock into xfs_bmapi_allocate xfs: remove the unusued tmp_logflags variable in xfs_bmapi_allocate xfs: fix error returns from xfs_bmapi_write ... |
||
![]() |
16dbfae867 |
bcachefs changes for 6.10-rc1
- More safety fixes, primarily found by syzbot - Run the upgrade/downgrade paths in nochnages mode. Nochanges mode is primarily for testing fsck/recovery in dry run mode, so it shouldn't change anything besides disabling writes and holding dirty metadata in memory. The idea here was to reduce the amount of activity if we can't write anything out, so that bringing up a filesystem in "super ro" mode would be more lilkely to work for data recovery - but norecovery is the correct option for this. - btree_trans->locked; we now track whether a btree_trans has any btree nodes locked, and this is used for improved assertions related to trans_unlock() and trans_relock(). We'll also be using it for improving how we work with lockdep in the future: we don't want lockdep to be tracking individual btree node locks because we take too many for lockdep to track, and it's not necessary since we have a cycle detector. - Trigger improvements that are prep work for online fsck - BTREE_TRIGGER_check_repair; this regularizes how we do some repair work for extents that goes with running triggers in fsck, and fixes some subtle issues with transaction restarts there. - bch2_snapshot_equiv() has now been ripped out of fsck.c; snapshot equivalence classes are for when snapshot deletion leaves behind redundant snapshot nodes, but snapshot deletion now cleans this up right away, so the abstraction doesn't need to leak. - Improvements to how we resume writing to the journal in recovery. The code for picking the new place to write when reading the journal is greatly simplified and we also store the position in the superblock for when we don't read the journal; this means that we preserve more of the journal for list_journal debugging. - Improvements to sysfs btree_cache and btree_node_cache, for debugging memory reclaim. - We now detect when we've blocked for 10 seconds on the allocator in the write path and dump some useful info. - Safety fixes for devices references: this is a big series that changes almost all device lookups to properly check if the device exists and take a reference to it. Previously we assumed that if a bkey exists that references a device then the device must exist, and this was enforced in .invalid methods, but this was incorrect because it meant device removal relied on accounting being correct to not leave keys pointing to invalid devices, and that's not something we can assume. Getting the "pointer to invalid device" checks out of our .invalid() methods fixes some long standing device removal bugs; the only outstanding bug with device removal now is a race between the discard path and deleting alloc info, which should be easily fixed. - The allocator now prefers not to expand the new member_info.btree_allocated bitmap, meaning if repair ever requires scanning for btree nodes (because of a corrupt interior nodes) we won't have to scan the whole device(s). - New coding style document, which among other things talks about the correct usage of assertions -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEKnAFLkS8Qha+jvQrE6szbY3KbnYFAmZKJQgACgkQE6szbY3K bnZETg//SU9H0OHnBSMB/cteF6PKo9QR+dhT+3n+gWTxl0o/egbGTqwbzVqGtd2f J6II1BsDk8VoTOb/gFfLRShlmJfnj2jpRThU265faR/7LQYeSaqndDPkjOpTayAD Nj/DJyiSUTL753rZh3yUhOpOIHf7iapH6wuaZCPfhdfk+yvZNW8iz07JHjHLKRp8 I2cFH0r6kN916NdRkt9oDCz68WouT8eWTqwcKra04XsLEZjNJHxLpKMq4M8UdPc7 YynJPVt+aP8+VduGIq6pV8Co3afCP2oUywo11JpRmvLsw4tex/59wxOYtpMfgn6k 4H+9WqiBwkbmnLDrfFHWRameS6F/7+GRAOVuz9nkmfk61UPU15gLjSRffqZ6u2YC 7vbrXgebId/sZXtBpQd83RMMX52BnEJah0upNJ54IsSqfDYkU9lwl6CEyYpcX1hf YNBGBTbspZztc3AB13b3ow421FMhaySUg0FDmntMR9O8Z6/BXk7Ykc7b8DPEfrFs W6JY7q+ARBxr+EgFcV74fvMCf7NJTAhyv80AKryo7NFU2JZOyyaTxcTGSnolX4Mi lyHiOgicmOX+vy3vbC1dZoDcmIDJ4Uc0vixYcpKiZqxlR8XJ+wpevC50TEhxrcW+ ZO4SloQvgyjI34xu/gZgjRYb3BhXK3x+ougVFpRG8V8zQ/+ccWg= =MKrF -----END PGP SIGNATURE----- Merge tag 'bcachefs-2024-05-19' of https://evilpiepirate.org/git/bcachefs Pull bcachefs updates from Kent Overstreet: - More safety fixes, primarily found by syzbot - Run the upgrade/downgrade paths in nochnages mode. Nochanges mode is primarily for testing fsck/recovery in dry run mode, so it shouldn't change anything besides disabling writes and holding dirty metadata in memory. The idea here was to reduce the amount of activity if we can't write anything out, so that bringing up a filesystem in "super ro" mode would be more lilkely to work for data recovery - but norecovery is the correct option for this. - btree_trans->locked; we now track whether a btree_trans has any btree nodes locked, and this is used for improved assertions related to trans_unlock() and trans_relock(). We'll also be using it for improving how we work with lockdep in the future: we don't want lockdep to be tracking individual btree node locks because we take too many for lockdep to track, and it's not necessary since we have a cycle detector. - Trigger improvements that are prep work for online fsck - BTREE_TRIGGER_check_repair; this regularizes how we do some repair work for extents that goes with running triggers in fsck, and fixes some subtle issues with transaction restarts there. - bch2_snapshot_equiv() has now been ripped out of fsck.c; snapshot equivalence classes are for when snapshot deletion leaves behind redundant snapshot nodes, but snapshot deletion now cleans this up right away, so the abstraction doesn't need to leak. - Improvements to how we resume writing to the journal in recovery. The code for picking the new place to write when reading the journal is greatly simplified and we also store the position in the superblock for when we don't read the journal; this means that we preserve more of the journal for list_journal debugging. - Improvements to sysfs btree_cache and btree_node_cache, for debugging memory reclaim. - We now detect when we've blocked for 10 seconds on the allocator in the write path and dump some useful info. - Safety fixes for devices references: this is a big series that changes almost all device lookups to properly check if the device exists and take a reference to it. Previously we assumed that if a bkey exists that references a device then the device must exist, and this was enforced in .invalid methods, but this was incorrect because it meant device removal relied on accounting being correct to not leave keys pointing to invalid devices, and that's not something we can assume. Getting the "pointer to invalid device" checks out of our .invalid() methods fixes some long standing device removal bugs; the only outstanding bug with device removal now is a race between the discard path and deleting alloc info, which should be easily fixed. - The allocator now prefers not to expand the new member_info.btree_allocated bitmap, meaning if repair ever requires scanning for btree nodes (because of a corrupt interior nodes) we won't have to scan the whole device(s). - New coding style document, which among other things talks about the correct usage of assertions * tag 'bcachefs-2024-05-19' of https://evilpiepirate.org/git/bcachefs: (155 commits) bcachefs: add no_invalid_checks flag bcachefs: add counters for failed shrinker reclaim bcachefs: Fix sb_field_downgrade validation bcachefs: Plumb bch_validate_flags to sb_field_ops.validate() bcachefs: s/bkey_invalid_flags/bch_validate_flags bcachefs: fsync() should not return -EROFS bcachefs: Invalid devices are now checked for by fsck, not .invalid methods bcachefs: kill bch2_dev_bkey_exists() in bch2_check_fix_ptrs() bcachefs: kill bch2_dev_bkey_exists() in bch2_read_endio() bcachefs: bch2_dev_get_ioref() checks for device not present bcachefs: bch2_dev_get_ioref2(); io_read.c bcachefs: bch2_dev_get_ioref2(); debug.c bcachefs: bch2_dev_get_ioref2(); journal_io.c bcachefs: bch2_dev_get_ioref2(); io_write.c bcachefs: bch2_dev_get_ioref2(); btree_io.c bcachefs: bch2_dev_get_ioref2(); backpointers.c bcachefs: bch2_dev_get_ioref2(); alloc_background.c bcachefs: for_each_bset() declares loop iter bcachefs: Move BCACHEFS_STATFS_MAGIC value to UAPI magic.h bcachefs: Improve sysfs internal/btree_cache ... |
||
![]() |
61307b7be4 |
The usual shower of singleton fixes and minor series all over MM,
documented (hopefully adequately) in the respective changelogs. Notable series include: - Lucas Stach has provided some page-mapping cleanup/consolidation/maintainability work in the series "mm/treewide: Remove pXd_huge() API". - In the series "Allow migrate on protnone reference with MPOL_PREFERRED_MANY policy", Donet Tom has optimized mempolicy's MPOL_PREFERRED_MANY mode, yielding almost doubled performance in one test. - In their series "Memory allocation profiling" Kent Overstreet and Suren Baghdasaryan have contributed a means of determining (via /proc/allocinfo) whereabouts in the kernel memory is being allocated: number of calls and amount of memory. - Matthew Wilcox has provided the series "Various significant MM patches" which does a number of rather unrelated things, but in largely similar code sites. - In his series "mm: page_alloc: freelist migratetype hygiene" Johannes Weiner has fixed the page allocator's handling of migratetype requests, with resulting improvements in compaction efficiency. - In the series "make the hugetlb migration strategy consistent" Baolin Wang has fixed a hugetlb migration issue, which should improve hugetlb allocation reliability. - Liu Shixin has hit an I/O meltdown caused by readahead in a memory-tight memcg. Addressed in the series "Fix I/O high when memory almost met memcg limit". - In the series "mm/filemap: optimize folio adding and splitting" Kairui Song has optimized pagecache insertion, yielding ~10% performance improvement in one test. - Baoquan He has cleaned up and consolidated the early zone initialization code in the series "mm/mm_init.c: refactor free_area_init_core()". - Baoquan has also redone some MM initializatio code in the series "mm/init: minor clean up and improvement". - MM helper cleanups from Christoph Hellwig in his series "remove follow_pfn". - More cleanups from Matthew Wilcox in the series "Various page->flags cleanups". - Vlastimil Babka has contributed maintainability improvements in the series "memcg_kmem hooks refactoring". - More folio conversions and cleanups in Matthew Wilcox's series "Convert huge_zero_page to huge_zero_folio" "khugepaged folio conversions" "Remove page_idle and page_young wrappers" "Use folio APIs in procfs" "Clean up __folio_put()" "Some cleanups for memory-failure" "Remove page_mapping()" "More folio compat code removal" - David Hildenbrand chipped in with "fs/proc/task_mmu: convert hugetlb functions to work on folis". - Code consolidation and cleanup work related to GUP's handling of hugetlbs in Peter Xu's series "mm/gup: Unify hugetlb, part 2". - Rick Edgecombe has developed some fixes to stack guard gaps in the series "Cover a guard gap corner case". - Jinjiang Tu has fixed KSM's behaviour after a fork+exec in the series "mm/ksm: fix ksm exec support for prctl". - Baolin Wang has implemented NUMA balancing for multi-size THPs. This is a simple first-cut implementation for now. The series is "support multi-size THP numa balancing". - Cleanups to vma handling helper functions from Matthew Wilcox in the series "Unify vma_address and vma_pgoff_address". - Some selftests maintenance work from Dev Jain in the series "selftests/mm: mremap_test: Optimizations and style fixes". - Improvements to the swapping of multi-size THPs from Ryan Roberts in the series "Swap-out mTHP without splitting". - Kefeng Wang has significantly optimized the handling of arm64's permission page faults in the series "arch/mm/fault: accelerate pagefault when badaccess" "mm: remove arch's private VM_FAULT_BADMAP/BADACCESS" - GUP cleanups from David Hildenbrand in "mm/gup: consistently call it GUP-fast". - hugetlb fault code cleanups from Vishal Moola in "Hugetlb fault path to use struct vm_fault". - selftests build fixes from John Hubbard in the series "Fix selftests/mm build without requiring "make headers"". - Memory tiering fixes/improvements from Ho-Ren (Jack) Chuang in the series "Improved Memory Tier Creation for CPUless NUMA Nodes". Fixes the initialization code so that migration between different memory types works as intended. - David Hildenbrand has improved follow_pte() and fixed an errant driver in the series "mm: follow_pte() improvements and acrn follow_pte() fixes". - David also did some cleanup work on large folio mapcounts in his series "mm: mapcount for large folios + page_mapcount() cleanups". - Folio conversions in KSM in Alex Shi's series "transfer page to folio in KSM". - Barry Song has added some sysfs stats for monitoring multi-size THP's in the series "mm: add per-order mTHP alloc and swpout counters". - Some zswap cleanups from Yosry Ahmed in the series "zswap same-filled and limit checking cleanups". - Matthew Wilcox has been looking at buffer_head code and found the documentation to be lacking. The series is "Improve buffer head documentation". - Multi-size THPs get more work, this time from Lance Yang. His series "mm/madvise: enhance lazyfreeing with mTHP in madvise_free" optimizes the freeing of these things. - Kemeng Shi has added more userspace-visible writeback instrumentation in the series "Improve visibility of writeback". - Kemeng Shi then sent some maintenance work on top in the series "Fix and cleanups to page-writeback". - Matthew Wilcox reduces mmap_lock traffic in the anon vma code in the series "Improve anon_vma scalability for anon VMAs". Intel's test bot reported an improbable 3x improvement in one test. - SeongJae Park adds some DAMON feature work in the series "mm/damon: add a DAMOS filter type for page granularity access recheck" "selftests/damon: add DAMOS quota goal test" - Also some maintenance work in the series "mm/damon/paddr: simplify page level access re-check for pageout" "mm/damon: misc fixes and improvements" - David Hildenbrand has disabled some known-to-fail selftests ni the series "selftests: mm: cow: flag vmsplice() hugetlb tests as XFAIL". - memcg metadata storage optimizations from Shakeel Butt in "memcg: reduce memory consumption by memcg stats". - DAX fixes and maintenance work from Vishal Verma in the series "dax/bus.c: Fixups for dax-bus locking". -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZkgQYwAKCRDdBJ7gKXxA jrdKAP9WVJdpEcXxpoub/vVE0UWGtffr8foifi9bCwrQrGh5mgEAx7Yf0+d/oBZB nvA4E0DcPrUAFy144FNM0NTCb7u9vAw= =V3R/ -----END PGP SIGNATURE----- Merge tag 'mm-stable-2024-05-17-19-19' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull mm updates from Andrew Morton: "The usual shower of singleton fixes and minor series all over MM, documented (hopefully adequately) in the respective changelogs. Notable series include: - Lucas Stach has provided some page-mapping cleanup/consolidation/ maintainability work in the series "mm/treewide: Remove pXd_huge() API". - In the series "Allow migrate on protnone reference with MPOL_PREFERRED_MANY policy", Donet Tom has optimized mempolicy's MPOL_PREFERRED_MANY mode, yielding almost doubled performance in one test. - In their series "Memory allocation profiling" Kent Overstreet and Suren Baghdasaryan have contributed a means of determining (via /proc/allocinfo) whereabouts in the kernel memory is being allocated: number of calls and amount of memory. - Matthew Wilcox has provided the series "Various significant MM patches" which does a number of rather unrelated things, but in largely similar code sites. - In his series "mm: page_alloc: freelist migratetype hygiene" Johannes Weiner has fixed the page allocator's handling of migratetype requests, with resulting improvements in compaction efficiency. - In the series "make the hugetlb migration strategy consistent" Baolin Wang has fixed a hugetlb migration issue, which should improve hugetlb allocation reliability. - Liu Shixin has hit an I/O meltdown caused by readahead in a memory-tight memcg. Addressed in the series "Fix I/O high when memory almost met memcg limit". - In the series "mm/filemap: optimize folio adding and splitting" Kairui Song has optimized pagecache insertion, yielding ~10% performance improvement in one test. - Baoquan He has cleaned up and consolidated the early zone initialization code in the series "mm/mm_init.c: refactor free_area_init_core()". - Baoquan has also redone some MM initializatio code in the series "mm/init: minor clean up and improvement". - MM helper cleanups from Christoph Hellwig in his series "remove follow_pfn". - More cleanups from Matthew Wilcox in the series "Various page->flags cleanups". - Vlastimil Babka has contributed maintainability improvements in the series "memcg_kmem hooks refactoring". - More folio conversions and cleanups in Matthew Wilcox's series: "Convert huge_zero_page to huge_zero_folio" "khugepaged folio conversions" "Remove page_idle and page_young wrappers" "Use folio APIs in procfs" "Clean up __folio_put()" "Some cleanups for memory-failure" "Remove page_mapping()" "More folio compat code removal" - David Hildenbrand chipped in with "fs/proc/task_mmu: convert hugetlb functions to work on folis". - Code consolidation and cleanup work related to GUP's handling of hugetlbs in Peter Xu's series "mm/gup: Unify hugetlb, part 2". - Rick Edgecombe has developed some fixes to stack guard gaps in the series "Cover a guard gap corner case". - Jinjiang Tu has fixed KSM's behaviour after a fork+exec in the series "mm/ksm: fix ksm exec support for prctl". - Baolin Wang has implemented NUMA balancing for multi-size THPs. This is a simple first-cut implementation for now. The series is "support multi-size THP numa balancing". - Cleanups to vma handling helper functions from Matthew Wilcox in the series "Unify vma_address and vma_pgoff_address". - Some selftests maintenance work from Dev Jain in the series "selftests/mm: mremap_test: Optimizations and style fixes". - Improvements to the swapping of multi-size THPs from Ryan Roberts in the series "Swap-out mTHP without splitting". - Kefeng Wang has significantly optimized the handling of arm64's permission page faults in the series "arch/mm/fault: accelerate pagefault when badaccess" "mm: remove arch's private VM_FAULT_BADMAP/BADACCESS" - GUP cleanups from David Hildenbrand in "mm/gup: consistently call it GUP-fast". - hugetlb fault code cleanups from Vishal Moola in "Hugetlb fault path to use struct vm_fault". - selftests build fixes from John Hubbard in the series "Fix selftests/mm build without requiring "make headers"". - Memory tiering fixes/improvements from Ho-Ren (Jack) Chuang in the series "Improved Memory Tier Creation for CPUless NUMA Nodes". Fixes the initialization code so that migration between different memory types works as intended. - David Hildenbrand has improved follow_pte() and fixed an errant driver in the series "mm: follow_pte() improvements and acrn follow_pte() fixes". - David also did some cleanup work on large folio mapcounts in his series "mm: mapcount for large folios + page_mapcount() cleanups". - Folio conversions in KSM in Alex Shi's series "transfer page to folio in KSM". - Barry Song has added some sysfs stats for monitoring multi-size THP's in the series "mm: add per-order mTHP alloc and swpout counters". - Some zswap cleanups from Yosry Ahmed in the series "zswap same-filled and limit checking cleanups". - Matthew Wilcox has been looking at buffer_head code and found the documentation to be lacking. The series is "Improve buffer head documentation". - Multi-size THPs get more work, this time from Lance Yang. His series "mm/madvise: enhance lazyfreeing with mTHP in madvise_free" optimizes the freeing of these things. - Kemeng Shi has added more userspace-visible writeback instrumentation in the series "Improve visibility of writeback". - Kemeng Shi then sent some maintenance work on top in the series "Fix and cleanups to page-writeback". - Matthew Wilcox reduces mmap_lock traffic in the anon vma code in the series "Improve anon_vma scalability for anon VMAs". Intel's test bot reported an improbable 3x improvement in one test. - SeongJae Park adds some DAMON feature work in the series "mm/damon: add a DAMOS filter type for page granularity access recheck" "selftests/damon: add DAMOS quota goal test" - Also some maintenance work in the series "mm/damon/paddr: simplify page level access re-check for pageout" "mm/damon: misc fixes and improvements" - David Hildenbrand has disabled some known-to-fail selftests ni the series "selftests: mm: cow: flag vmsplice() hugetlb tests as XFAIL". - memcg metadata storage optimizations from Shakeel Butt in "memcg: reduce memory consumption by memcg stats". - DAX fixes and maintenance work from Vishal Verma in the series "dax/bus.c: Fixups for dax-bus locking"" * tag 'mm-stable-2024-05-17-19-19' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (426 commits) memcg, oom: cleanup unused memcg_oom_gfp_mask and memcg_oom_order selftests/mm: hugetlb_madv_vs_map: avoid test skipping by querying hugepage size at runtime mm/hugetlb: add missing VM_FAULT_SET_HINDEX in hugetlb_wp mm/hugetlb: add missing VM_FAULT_SET_HINDEX in hugetlb_fault selftests: cgroup: add tests to verify the zswap writeback path mm: memcg: make alloc_mem_cgroup_per_node_info() return bool mm/damon/core: fix return value from damos_wmark_metric_value mm: do not update memcg stats for NR_{FILE/SHMEM}_PMDMAPPED selftests: cgroup: remove redundant enabling of memory controller Docs/mm/damon/maintainer-profile: allow posting patches based on damon/next tree Docs/mm/damon/maintainer-profile: change the maintainer's timezone from PST to PT Docs/mm/damon/design: use a list for supported filters Docs/admin-guide/mm/damon/usage: fix wrong schemes effective quota update command Docs/admin-guide/mm/damon/usage: fix wrong example of DAMOS filter matching sysfs file selftests/damon: classify tests for functionalities and regressions selftests/damon/_damon_sysfs: use 'is' instead of '==' for 'None' selftests/damon/_damon_sysfs: find sysfs mount point from /proc/mounts selftests/damon/_damon_sysfs: check errors from nr_schemes file reads mm/damon/core: initialize ->esz_bp from damos_quota_init_priv() selftests/damon: add a test for DAMOS quota goal ... |
||
![]() |
0cc6f45cec |
IOMMU Updates for Linux v6.10
Including: - Core: - IOMMU memory usage observability - This will make the memory used for IO page tables explicitly visible. - Simplify arch_setup_dma_ops() - Intel VT-d: - Consolidate domain cache invalidation - Remove private data from page fault message - Allocate DMAR fault interrupts locally - Cleanup and refactoring - ARM-SMMUv2: - Support for fault debugging hardware on Qualcomm implementations - Re-land support for the ->domain_alloc_paging() callback - ARM-SMMUv3: - Improve handling of MSI allocation failure - Drop support for the "disable_bypass" cmdline option - Major rework of the CD creation code, following on directly from the STE rework merged last time around. - Add unit tests for the new STE/CD manipulation logic - AMD-Vi: - Final part of SVA changes with generic IO page fault handling - Renesas IPMMU: - Add support for R8A779H0 hardware - A couple smaller fixes and updates across the sub-tree -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAmZHJMkACgkQK/BELZcB GuND1Q/+M4RN5jM66XCfhqoP8QaI8I7zDlPDd14ismx0bjtOZhoiXpptKkAA8guo 7mS57MLqBw/hKYucm1mw+F1qi1HnRWSstKXiCPmzDm3UXYgZJlKkrOw6vydFeHJH zx2ei7TmBrc0SrsybWK3NWRfVBBkO8enGZTmti0DfHL/rOFcUM0LHegY51GcDaaH SlDr+LLDMeGynSQWhRlVNJVmEI5gpVPitY/mDUpVPoELiW9C0WGk8kPlR11z2pCR eUNiqGJUcGasOhmfiYnpJR462eg7J41glquu+YHj8ivPbbu3C4wxgruY/tR4dmJG 8s6AMAWR53JzG2SrCCwtzyRPSXmKfvixF+VKmlB2Ksc7VAn1xA0DYnY5Tx99EtXu qcEaR4SICMti0urmBGo/cGFdXi2TB1ccXqwoRtp1N3KiYnnOaQdLNO9qZdl9uUTI uleXACzkCVSssSpBfGjFcPyHU4r3WjMfX0f5ZJPpFMoQmvwV1yeMX7xTEZz4Sxew cHfBt9FAW9+4mBMTQfokBt0hZ6jwKcYl/z3Xi2oD+Ik/Qrzx5kcLA8LZLEVRXIBa SZh2ASazq/dr8YoZ744VRmlmi+nISAIHbbQMeqQEQgYQh0HpwS9g5HtpsBzNP6aB 91RHqZSccb/zNdi8e+RH79Y7pX/G5QcuVKcW6KQUBcAAb6hAgOg= =JUzp -----END PGP SIGNATURE----- Merge tag 'iommu-updates-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull iommu updates from Joerg Roedel: "Core: - IOMMU memory usage observability - This will make the memory used for IO page tables explicitly visible. - Simplify arch_setup_dma_ops() Intel VT-d: - Consolidate domain cache invalidation - Remove private data from page fault message - Allocate DMAR fault interrupts locally - Cleanup and refactoring ARM-SMMUv2: - Support for fault debugging hardware on Qualcomm implementations - Re-land support for the ->domain_alloc_paging() callback ARM-SMMUv3: - Improve handling of MSI allocation failure - Drop support for the "disable_bypass" cmdline option - Major rework of the CD creation code, following on directly from the STE rework merged last time around. - Add unit tests for the new STE/CD manipulation logic AMD-Vi: - Final part of SVA changes with generic IO page fault handling Renesas IPMMU: - Add support for R8A779H0 hardware ... and a couple smaller fixes and updates across the sub-tree" * tag 'iommu-updates-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (80 commits) iommu/arm-smmu-v3: Make the kunit into a module arm64: Properly clean up iommu-dma remnants iommu/amd: Enable Guest Translation after reading IOMMU feature register iommu/vt-d: Decouple igfx_off from graphic identity mapping iommu/amd: Fix compilation error iommu/arm-smmu-v3: Add unit tests for arm_smmu_write_entry iommu/arm-smmu-v3: Build the whole CD in arm_smmu_make_s1_cd() iommu/arm-smmu-v3: Move the CD generation for SVA into a function iommu/arm-smmu-v3: Allocate the CD table entry in advance iommu/arm-smmu-v3: Make arm_smmu_alloc_cd_ptr() iommu/arm-smmu-v3: Consolidate clearing a CD table entry iommu/arm-smmu-v3: Move the CD generation for S1 domains into a function iommu/arm-smmu-v3: Make CD programming use arm_smmu_write_entry() iommu/arm-smmu-v3: Add an ops indirection to the STE code iommu/arm-smmu-qcom: Don't build debug features as a kernel module iommu/amd: Add SVA domain support iommu: Add ops->domain_alloc_sva() iommu/amd: Initial SVA support for AMD IOMMU iommu/amd: Add support for enable/disable IOPF iommu/amd: Add IO page fault notifier handler ... |
||
![]() |
1b10b390d9 |
EFI updates for v6.10:
- Additional cleanup by Tim for the efivarfs variable name length confusion - Avoid freeing a bogus pointer when virtual remapping is omitted in the EFI boot stub -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQQQm/3uucuRGn1Dmh0wbglWLn0tXAUCZkMPmAAKCRAwbglWLn0t XNw2AQC96tWvh7piDGw2LTGp4zqRcoe2LV09hNE8rgk63g+2LgEA81TOhEKwpeie 23GI9WiEABdRT8kt6ebuVatfzrn5jgk= =8la7 -----END PGP SIGNATURE----- Merge tag 'efi-next-for-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi Pull EFI updates from Ard Biesheuvel: "Only a handful of changes this cycle, consisting of cleanup work and a low-prio bugfix: - Additional cleanup by Tim for the efivarfs variable name length confusion - Avoid freeing a bogus pointer when virtual remapping is omitted in the EFI boot stub" * tag 'efi-next-for-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: efi: libstub: only free priv.runtime_map when allocated efi: Clear up misconceptions about a maximum variable name size efivarfs: Remove unused internal struct members Documentation: Mark the 'efivars' sysfs interface as removed efi: pstore: Request at most 512 bytes for variable names |
||
![]() |
8815da98e0 |
Another not-too-busy cycle for documentation, including:
- Some build-system changes to detect the variable fonts installed by some distributions that can break the PDF build. - Various updates and additions to the Spanish, Chinese, Italian, and Japanese translations. - Update the stable-kernel rules to match modern practice ...and the usual array of corrections, updates, and typo fixes. -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAmY9ASYACgkQF0NaE2wM flhPAwf/SYwHTBhKo0Xy3WsY3PHm4hsYVDwQ/Nfr6oa1mF+x4npxcN1RzPJd8iB9 zXlynnBkptwvEoukJV2hw+gVwO9ixyqJzIt7AmRFgA5cywhklpxQQAVelQG4ISR2 8M7LOXIjROJdY3OymPcQ2YF1m000tB9Khx7uvWrvMZEasXND/ITi9mFIJiOk841C 5wGTHmYKjJwuqTm6CsghAgLJkRYGHD+gtp4w8wQwQzIHJ6B8SnbVPSnYYqJ8Qt/V 31AEBgV3WJhmNiyNgP/p3rtDTCXBowSK8klOMa5CW3FQEIb4SQL/uBZ8qR8FQo2c l1zsuPKKJOqe9T+POWHXdjoryZn1Ug== =8fUD -----END PGP SIGNATURE----- Merge tag 'docs-6.10' of git://git.lwn.net/linux Pull documentation updates from Jonathan Corbet: "Another not-too-busy cycle for documentation, including: - Some build-system changes to detect the variable fonts installed by some distributions that can break the PDF build. - Various updates and additions to the Spanish, Chinese, Italian, and Japanese translations. - Update the stable-kernel rules to match modern practice ... and the usual array of corrections, updates, and typo fixes" * tag 'docs-6.10' of git://git.lwn.net/linux: (42 commits) cgroup: Add documentation for missing zswap memory.stat kernel-doc: Added "*" in $type_constants2 to fix 'make htmldocs' warning. docs:core-api: fixed typos and grammar in printk-index page Documentation: tracing: Fix spelling mistakes docs/zh_CN/rust: Update the translation of quick-start to 6.9-rc4 docs/zh_CN/rust: Update the translation of general-information to 6.9-rc4 docs/zh_CN/rust: Update the translation of coding-guidelines to 6.9-rc4 docs/zh_CN/rust: Update the translation of arch-support to 6.9-rc4 docs: stable-kernel-rules: fix typo sent->send docs/zh_CN: remove two inconsistent spaces docs: scripts/check-variable-fonts.sh: Improve commands for detection docs: stable-kernel-rules: create special tag to flag 'no backporting' docs: stable-kernel-rules: explain use of stable@kernel.org (w/o @vger.) docs: stable-kernel-rules: remove code-labels tags and a indention level docs: stable-kernel-rules: call mainline by its name and change example docs: stable-kernel-rules: reduce redundancy docs, kprobes: Add riscv as supported architecture Docs: typos/spelling docs: kernel_include.py: Cope with docutils 0.21 docs: ja_JP/howto: Catch up update in v6.8 ... |
||
![]() |
706833dbe3 |
bcachefs: CodingStyle
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
||
![]() |
5ec5aab775 |
doc: split buffer.rst out of api-summary.rst
Buffer heads are no longer a generic filesystem API but an optional filesystem support library. Make the documentation structure reflect that, and include the fine documentation kept in buffer_head.h. We could give a better overview of what buffer heads are all about, but my enthusiasm for documenting it is limited. [willy@infradead.org: fix kerneldoc warning] Link: https://lkml.kernel.org/r/20240417015933.453505-1-willy@infradead.org [akpm@linux-foundation.org: remove newline at EOF] Link: https://lkml.kernel.org/r/20240416031754.4076917-9-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Tested-by: Randy Dunlap <rdunlap@infradead.org> Cc: Pankaj Raghav <p.raghav@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
![]() |
d18a867958 |
make set_blocksize() fail unless block device is opened exclusive
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> |
||
![]() |
da51bbcdba |
Docs: typos/spelling
Fix spelling and grammar in Docs descriptions Signed-off-by: Remington Brasga <rbrasga@uci.edu> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/20240429225527.2329-1-rbrasga@uci.edu |
||
![]() |
22d407b164 |
lib: add allocation tagging support for memory allocation profiling
Introduce CONFIG_MEM_ALLOC_PROFILING which provides definitions to easily instrument memory allocators. It registers an "alloc_tags" codetag type with /proc/allocinfo interface to output allocation tag information when the feature is enabled. CONFIG_MEM_ALLOC_PROFILING_DEBUG is provided for debugging the memory allocation profiling instrumentation. Memory allocation profiling can be enabled or disabled at runtime using /proc/sys/vm/mem_profiling sysctl when CONFIG_MEM_ALLOC_PROFILING_DEBUG=n. CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT enables memory allocation profiling by default. [surenb@google.com: Documentation/filesystems/proc.rst: fix allocinfo title] Link: https://lkml.kernel.org/r/20240326073813.727090-1-surenb@google.com [surenb@google.com: do limited memory accounting for modules with ARCH_NEEDS_WEAK_PER_CPU] Link: https://lkml.kernel.org/r/20240402180933.1663992-2-surenb@google.com [klarasmodin@gmail.com: explicitly include irqflags.h in alloc_tag.h] Link: https://lkml.kernel.org/r/20240407133252.173636-1-klarasmodin@gmail.com [surenb@google.com: fix alloc_tag_init() to prevent passing NULL to PTR_ERR()] Link: https://lkml.kernel.org/r/20240417003349.2520094-1-surenb@google.com Link: https://lkml.kernel.org/r/20240321163705.3067592-14-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Co-developed-by: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Klara Modin <klarasmodin@gmail.com> Tested-by: Kees Cook <keescook@chromium.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alex Gaynor <alex.gaynor@gmail.com> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andreas Hindborg <a.hindborg@samsung.com> Cc: Benno Lossin <benno.lossin@proton.me> Cc: "Björn Roy Baron" <bjorn3_gh@protonmail.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Christoph Lameter <cl@linux.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Gary Guo <gary@garyguo.net> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wedson Almeida Filho <wedsonaf@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
![]() |
7643f3fe27 |
f2fs: assign the write hint per stream by default
This reverts commit
|
||
![]() |
67bdcd4999 |
docs: describe xfs directory tree online fsck
I've added a scrubber that checks the directory tree structure and fixes them; describe this in the design documentation. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> |
||
![]() |
c91fe20e5a |
docs: update offline parent pointer repair strategy
Now update how xfs_repair checks and repairs parent pointer info. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> |
||
![]() |
5220727ce8 |
docs: update online directory and parent pointer repair sections
Update the case studies of online directory and parent pointer reconstruction to reflect what they actually do in the final version. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> |
||
![]() |
d85fe250f2 |
docs: update the parent pointers documentation to the final version
Now that we've decided on the ondisk format of parent pointers, update the documentation to reflect that. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> |
||
![]() |
e6c9e75fbe |
xfs: move files to orphanage instead of letting nlinks drop to zero
If we encounter an inode with a nonzero link count but zero observed links, move it to the orphanage. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> |
||
![]() |
1e58a8ccf2 |
xfs: move orphan files to the orphanage
When we're repairing a directory structure or fixing the dotdot entry of a subdirectory, it's possible that we won't ever find a parent for the subdirectory. When this is the case, move it to the orphanage, aka /lost+found. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> |
||
![]() |
f783529bee |
docs: update swapext -> exchmaps language
Start reworking the atomic swapext design documentation to refer to its new file contents/mapping exchange name. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> |
||
![]() |
5302a5c8be |
xfs: only clear log incompat flags at clean unmount
While reviewing the online fsck patchset, someone spied the xfs_swapext_can_use_without_log_assistance function and wondered why we go through this inverted-bitmask dance to avoid setting the XFS_SB_FEAT_INCOMPAT_LOG_SWAPEXT feature. (The same principles apply to the logged extended attribute update feature bit in the since-merged LARP series.) The reason for this dance is that xfs_add_incompat_log_feature is an expensive operation -- it forces the log, pushes the AIL, and then if nobody's beaten us to it, sets the feature bit and issues a synchronous write of the primary superblock. That could be a one-time cost amortized over the life of the filesystem, but the log quiesce and cover operations call xfs_clear_incompat_log_features to remove feature bits opportunistically. On a moderately loaded filesystem this leads to us cycling those bits on and off over and over, which hurts performance. Why do we clear the log incompat bits? Back in ~2020 I think Dave and I had a conversation on IRC[2] about what the log incompat bits represent. IIRC in that conversation we decided that the log incompat bits protect unrecovered log items so that old kernels won't try to recover them and barf. Since a clean log has no protected log items, we could clear the bits at cover/quiesce time. As Dave Chinner pointed out in the thread, clearing log incompat bits at unmount time has positive effects for golden root disk image generator setups, since the generator could be running a newer kernel than what gets written to the golden image -- if there are log incompat fields set in the golden image that was generated by a newer kernel/OS image builder then the provisioning host cannot mount the filesystem even though the log is clean and recovery is unnecessary to mount the filesystem. Given that it's expensive to set log incompat bits, we really only want to do that once per bit per mount. Therefore, I propose that we only clear log incompat bits as part of writing a clean unmount record. Do this by adding an operational state flag to the xfs mount that guards whether or not the feature bit clearing can actually take place. This eliminates the l_incompat_users rwsem that we use to protect a log cleaning operation from clearing a feature bit that a frontend thread is trying to set -- this lock adds another way to fail w.r.t. locking. For the swapext series, I shard that into multiple locks just to work around the lockdep complaints, and that's fugly. Link: https://lore.kernel.org/linux-xfs/20240131230043.GA6180@frogsfrogsfrogs/ Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> |
||
![]() |
212c5c078d |
iommu: account IOMMU allocated memory
In order to be able to limit the amount of memory that is allocated by IOMMU subsystem, the memory must be accounted. Account IOMMU as part of the secondary pagetables as it was discussed at LPC. The value of SecPageTables now contains mmeory allocation by IOMMU and KVM. There is a difference between GFP_ACCOUNT and what NR_IOMMU_PAGES shows. GFP_ACCOUNT is set only where it makes sense to charge to user processes, i.e. IOMMU Page Tables, but there more IOMMU shared data that should not really be charged to a specific process. Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Acked-by: David Rientjes <rientjes@google.com> Tested-by: Bagas Sanjaya <bagasdotme@gmail.com> Link: https://lore.kernel.org/r/20240413002522.1101315-12-pasha.tatashin@soleen.com Signed-off-by: Joerg Roedel <jroedel@suse.de> |
||
![]() |
5b625181fb |
Documentation: Mark the 'efivars' sysfs interface as removed
The 'efivars' sysfs interface was removed in commit
|
||
![]() |
aa98e70fc6 |
Documentation: filesystems: Add bcachefs toctree
Commit |
||
![]() |
c5d9ab85eb |
f2fs update for 6.9-rc1
In this round, there are a number of updates on mainly two areas: Zoned block device support and Per-file compression. For example, we've found several issues to support Zoned block device especially having large sections regarding to GC and file pinning used for Android devices. In compression side, we've fixed many corner race conditions that had broken the design assumption. Enhancement: - Support file pinning for Zoned block device having large section - Enhance the data recovery after sudden power cut on Zoned block device - Add more error injection cases to easily detect the kernel panics - add a proc entry show the entire disk layout - Improve various error paths paniced by BUG_ON in block allocation and GC - support SEEK_DATA and SEEK_HOLE for compression files Bug fix: - fix to avoid use-after-free issue in f2fs_filemap_fault - fix some race conditions to break the atomic write design assumption - fix to truncate meta inode pages forcely - resolve various per-file compression issues wrt the space management and compression policies - fix some swap-related bugs In addition, we removed deprecated codes such as io_bits and heap_allocation, and also fixed minor error handling routines with neat debugging messages. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE00UqedjCtOrGVvQiQBSofoJIUNIFAmX4gS0ACgkQQBSofoJI UNLmgBAAg4mvbWjmJ5VbXs4zGLOgLRJYcY1sZRO5Ufg4LhWzoGRxL1Dru+TELw0t 1Ck2EQvP91XZ5weA5AZOfWbxcijy4+8L3P8L7ohOShudfACci0wQsx6IaUUWWylC ILA4+DkovpZrlu6th12Gj9QAM6TN9gdy3V1VLT5O/KmE1x6Pekwp2hQoIvVJRH5L I3KxOf5fTe3oWLvEN6m7yCz/8qGqz8+w0ae90UG0fqi0wVEuZJ99zsVPnuhu6uBo riFm2A6ra0I/JqoPyqn2QM6ApItM867ULo9EoyQVgq56Q1w31ENOJXsU9N7N4Wxt olgujH1SijkWk9ni57iKtMhR68e3Rs+pVsuNFmJuOPq0HASoggB66QRrVvCgM9JG z3D//CB2ONtX2XiKJMiTcX9VqIqrMw6L1eVxEZu0P96C3CS70MoBU69mdSR9Og2S 5nQXja3yzFhdk3thp6+wAJ3I04ZQkf3qoHZB+0chU2Xl1pV+5NIkBgBsSw8g/TY3 EIHMfK+TX0SBSNCvkUDEJ+Z8ZRID6tcbAquTSsBr6wxB+F9mq7onEvI8O7xwyH9W DU8xhymOE2QUoluNtyW7ww6HK913ripXIenI9LaYJnuj0XeDAcMIoPsgR7AGU5UG hshvirFdUdWRMTfXxNNUrvhOWI0qurQSVx+VV6Qb62DGqR5ofOw= =Qpvy -----END PGP SIGNATURE----- Merge tag 'f2fs-for-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs update from Jaegeuk Kim: "In this round, there are a number of updates on mainly two areas: Zoned block device support and Per-file compression. For example, we've found several issues to support Zoned block device especially having large sections regarding to GC and file pinning used for Android devices. In compression side, we've fixed many corner race conditions that had broken the design assumption. Enhancements: - Support file pinning for Zoned block device having large section - Enhance the data recovery after sudden power cut on Zoned block device - Add more error injection cases to easily detect the kernel panics - add a proc entry show the entire disk layout - Improve various error paths paniced by BUG_ON in block allocation and GC - support SEEK_DATA and SEEK_HOLE for compression files Bug fixes: - avoid use-after-free issue in f2fs_filemap_fault - fix some race conditions to break the atomic write design assumption - fix to truncate meta inode pages forcely - resolve various per-file compression issues wrt the space management and compression policies - fix some swap-related bugs In addition, we removed deprecated codes such as io_bits and heap_allocation, and also fixed minor error handling routines with neat debugging messages" * tag 'f2fs-for-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (60 commits) f2fs: fix to avoid use-after-free issue in f2fs_filemap_fault f2fs: truncate page cache before clearing flags when aborting atomic write f2fs: mark inode dirty for FI_ATOMIC_COMMITTED flag f2fs: prevent atomic write on pinned file f2fs: fix to handle error paths of {new,change}_curseg() f2fs: unify the error handling of f2fs_is_valid_blkaddr f2fs: zone: fix to remove pow2 check condition for zoned block device f2fs: fix to truncate meta inode pages forcely f2fs: compress: fix reserve_cblocks counting error when out of space f2fs: compress: relocate some judgments in f2fs_reserve_compress_blocks f2fs: add a proc entry show disk layout f2fs: introduce SEGS_TO_BLKS/BLKS_TO_SEGS for cleanup f2fs: fix to check return value of f2fs_gc_range f2fs: fix to check return value __allocate_new_segment f2fs: fix to do sanity check in update_sit_entry f2fs: fix to reset fields for unloaded curseg f2fs: clean up new_curseg() f2fs: relocate f2fs_precache_extents() in f2fs_swap_activate() f2fs: fix blkofs_end correctly in f2fs_migrate_blocks() f2fs: ro: don't start discard thread for readonly image ... |
||
![]() |
32a50540c3 |
bcachefs updates for 6.9
- Subvolume children btree; this is needed for providing a userspace interface for walking subvolumes, which will come later - Lots of improvements to directory structure checking - Improved journal pipelining, significantly improving performance on high iodepth write workloads - Discard path improvements: the discard path is more efficient, and no longer flushes the journal unnecessarily - Buffered write path can now avoid taking the inode lock - new mm helper: memalloc_flags_{save|restore} - mempool now does kvmalloc mempools -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEKnAFLkS8Qha+jvQrE6szbY3KbnYFAmXycEcACgkQE6szbY3K bnYUTg/+K4Nv2EdAqOCyHRTKaF2OgJDUb25ZDmbGpfT1XyPrNB7/+CxHqSdEP7/e FVuhtP61vnQAImDv82u9iZiab/TnuCZPUrjSobFEvrWYoGRtP9Bm9MyYB28NzmMa AXGmS4yJGVwtxxrFNxZP98IbiHYiHSoYbkqxX2E5VgLag8Ru8peb7oD0Ro3zw0rb z+6UM/seJ7on5i/9IJEMKKXFVEoZC2J5DAVoe1TghG2kgOw3cKu5OUdltLPOY5jL jkm5J5wa6Ep46nufHat92yiMxXIQrf4U9LkXxzTi5ThoSmt+Af2qXcBjqTTVqd2D 1dGxj+UG8iu4DCCbQC6EA7J5EMvxfJM0+9lk1ULUgxUs3X69co6nlI6XH1fwEMqk KpIqd35+Y/IYgogt9ioXI0dtXyL7dbaTVt6NZhc9SaPGPX+C2V0+l4bqToFdNaPH 0KATjjyQaJRE4ZFIjr6GliYOtKWDLi/HPEyoBivniUn7cF5vjSvti+cSQwNDSPpa 6jOd5Y923Iq9ZqDAPM3+mvTH8nNaaf2T2fmbPNrc5pdWbha9bGwOU71zvKHNFGm/ 66ZsnwhKSk+uwglTMZHPKSkJJXUYAHESw3slQtEWHZVlliArc55+pBHwE00bvRt7 KHUUqkqXBUPzbp/kdZGylMAdH9+8j9TE5QJ2RaoryFm/eCfexmI= =6xnj -----END PGP SIGNATURE----- Merge tag 'bcachefs-2024-03-13' of https://evilpiepirate.org/git/bcachefs Pull bcachefs updates from Kent Overstreet: - Subvolume children btree; this is needed for providing a userspace interface for walking subvolumes, which will come later - Lots of improvements to directory structure checking - Improved journal pipelining, significantly improving performance on high iodepth write workloads - Discard path improvements: the discard path is more efficient, and no longer flushes the journal unnecessarily - Buffered write path can now avoid taking the inode lock - new mm helper: memalloc_flags_{save|restore} - mempool now does kvmalloc mempools * tag 'bcachefs-2024-03-13' of https://evilpiepirate.org/git/bcachefs: (128 commits) bcachefs: time_stats: shrink time_stat_buffer for better alignment bcachefs: time_stats: split stats-with-quantiles into a separate structure bcachefs: mean_and_variance: put struct mean_and_variance_weighted on a diet bcachefs: time_stats: add larger units bcachefs: pull out time_stats.[ch] bcachefs: reconstruct_alloc cleanup bcachefs: fix bch_folio_sector padding bcachefs: Fix btree key cache coherency during replay bcachefs: Always flush write buffer in delete_dead_inodes() bcachefs: Fix order of gc_done passes bcachefs: fix deletion of indirect extents in btree_gc bcachefs: Prefer struct_size over open coded arithmetic bcachefs: Kill unused flags argument to btree_split() bcachefs: Check for writing superblocks with nonsense member seq fields bcachefs: fix bch2_journal_buf_to_text() lib/generic-radix-tree.c: Make nodes more reasonably sized bcachefs: copy_(to|from)_user_errcode() bcachefs: Split out bkey_types.h bcachefs: fix lost journal buf wakeup due to improved pipelining bcachefs: intercept mountoption value for bool type ... |
||
![]() |
eb386617be |
bcachefs: Errcode tracepoint, documentation
Add a tracepoint for downcasting private errors to standard errors, so they can be recovered even when not logged; also, add some documentation. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
||
![]() |
babbcc0232 |
New code for 6.9:
* Online Repair; ** New ondisk structures being repaired. - Inode's mode field by trying to obtain file type value from the a directory entry. - Quota counters. - Link counts of inodes. - FS summary counters. - rmap btrees. Support for in-memory btrees has been added to support repair of rmap btrees. ** Misc changes - Report corruption of metadata to the health tracking subsystem. - Enable indirect health reporting when resources are scarce. - Reduce memory usage while reparing refcount btree. - Extend "Bmap update" intent item to support atomic extent swapping on the realtime device. - Extend "Bmap update" intent item to support extended attribute fork and unwritten extents. ** Code cleanups - Bmap log intent. - Btree block pointer checking. - Btree readahead. - Buffer target. - Symbolic link code. * Remove mrlock wrapper around the rwsem. * Convert all the GFP_NOFS flag usages to use the scoped memalloc_nofs_save() API instead of direct calls with the GFP_NOFS. * Refactor and simplify xfile abstraction. Lower level APIs in shmem.c are required to be exported in order to achieve this. * Skip checking alignment constraints for inode chunk allocations when block size is larger than inode chunk size. * Do not submit delwri buffers collected during log recovery when an error has been encountered. * Fix SEEK_HOLE/DATA for file regions which have active COW extents. * Fix lock order inversion when executing error handling path during shrinking a filesystem. * Remove duplicate ifdefs. Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQQjMC4mbgVeU7MxEIYH7y4RirJu9AUCZemMkgAKCRAH7y4RirJu 9ON5AP0Vda6sMn/ZUYoLo9ZUrUvlUb8L0dhEN5JL0XfyWW5ogAD/bH4G6pKSNyTw cSEjryuDakirdHLt5g0c+QHd2a/fzw0= =ymKk -----END PGP SIGNATURE----- Merge tag 'xfs-6.9-merge-8' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux Pull xfs updates from Chandan Babu: - Online repair updates: - More ondisk structures being repaired: - Inode's mode field by trying to obtain file type value from the a directory entry - Quota counters - Link counts of inodes - FS summary counters - Support for in-memory btrees has been added to support repair of rmap btrees - Misc changes: - Report corruption of metadata to the health tracking subsystem - Enable indirect health reporting when resources are scarce - Reduce memory usage while repairing refcount btree - Extend "Bmap update" intent item to support atomic extent swapping on the realtime device - Extend "Bmap update" intent item to support extended attribute fork and unwritten extents - Code cleanups: - Bmap log intent - Btree block pointer checking - Btree readahead - Buffer target - Symbolic link code - Remove mrlock wrapper around the rwsem - Convert all the GFP_NOFS flag usages to use the scoped memalloc_nofs_save() API instead of direct calls with the GFP_NOFS - Refactor and simplify xfile abstraction. Lower level APIs in shmem.c are required to be exported in order to achieve this - Skip checking alignment constraints for inode chunk allocations when block size is larger than inode chunk size - Do not submit delwri buffers collected during log recovery when an error has been encountered - Fix SEEK_HOLE/DATA for file regions which have active COW extents - Fix lock order inversion when executing error handling path during shrinking a filesystem - Remove duplicate ifdefs * tag 'xfs-6.9-merge-8' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (183 commits) xfs: shrink failure needs to hold AGI buffer mm/shmem.c: Use new form of *@param in kernel-doc kernel-doc: Add unary operator * to $type_param_ref xfs: use kvfree() in xlog_cil_free_logvec() xfs: xfs_btree_bload_prep_block() should use __GFP_NOFAIL xfs: fix scrub stats file permissions xfs: fix log recovery erroring out on refcount recovery failure xfs: move symlink target write function to libxfs xfs: move remote symlink target read function to libxfs xfs: move xfs_symlink_remote.c declarations to xfs_symlink_remote.h xfs: xfs_bmap_finish_one should map unwritten extents properly xfs: support deferred bmap updates on the attr fork xfs: support recovering bmap intent items targetting realtime extents xfs: add a realtime flag to the bmap update log redo items xfs: add a xattr_entry helper xfs: fix xfs_bunmapi to allow unmapping of partial rt extents xfs: move xfs_bmap_defer_add to xfs_bmap_item.c xfs: reuse xfs_bmap_update_cancel_item xfs: add a bi_entry helper xfs: remove xfs_trans_set_bmap_flags ... |
||
![]() |
1f44039766 |
A moderatly busy cycle for development this time around.
- Some cleanup of the main index page for easier navigation - Rework some of the other top-level pages for better readability and, with luck, fewer merge conflicts in the future. - Submit-checklist improvements, hopefully the first of many. - New Italian translations - A fair number of kernel-doc fixes and improvements. We have also dropped the recommendation to use an old version of Sphinx. - A new document from Thorsten on bisection ...and lots of fixes and updates. -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAmXvKVIACgkQF0NaE2wM flik1gf/ZFS1mHwDdmHA/vpx8UxdUlFEo0Pms8V24iPSW5aEIqkZ406c9DSyMTtp CXTzW+RSCfB1Q3ciYtakHBgv0RzZ5+RyaEZ1l7zVmMyw4nYvK6giYKmg8Y0EVPKI fAVuPWo5iE7io0sNVbKBKJJkj9Z8QEScM48hv/CV1FblMvHYn0lie6muJrF9G6Ez HND+hlYZtWkbRd5M86CDBiFeGMLVPx17T+psQyQIcbUYm9b+RUqZRHIVRLYbad7r 18r9+83DsOhXTVJCBBSfCSZwzF8yAm+eD1w47sxnSItF8OiIjqCzQgXs3BZe9TXH h2YyeWbMN3xByA4mEgpmOPP44RW7Pg== =SC60 -----END PGP SIGNATURE----- Merge tag 'docs-6.9' of git://git.lwn.net/linux Pull documentation updates from Jonathan Corbet: "A moderatly busy cycle for development this time around. - Some cleanup of the main index page for easier navigation - Rework some of the other top-level pages for better readability and, with luck, fewer merge conflicts in the future. - Submit-checklist improvements, hopefully the first of many. - New Italian translations - A fair number of kernel-doc fixes and improvements. We have also dropped the recommendation to use an old version of Sphinx. - A new document from Thorsten on bisection ... and lots of fixes and updates" * tag 'docs-6.9' of git://git.lwn.net/linux: (54 commits) docs: verify/bisect: fixes, finetuning, and support for Arch docs: Makefile: Add dependency to $(YNL_INDEX) for targets other than htmldocs docs: Move ja_JP/howto.rst to ja_JP/process/howto.rst docs: submit-checklist: use subheadings docs: submit-checklist: structure by category docs: new text on bisecting which also covers bug validation docs: drop the version constraints for sphinx and dependencies docs: kerneldoc-preamble.sty: Remove code for Sphinx <2.4 docs: Restore "smart quotes" for quotes docs/zh_CN: accurate translation of "function" docs: Include simplified link titles in main index docs: Correct formatting of title in admin-guide/index.rst docs: kernel_feat.py: fix build error for missing files MAINTAINERS: Set the field name for subsystem profile section kasan: Add documentation for CONFIG_KASAN_EXTRA_INFO Fixed case issue with 'fault-injection' in documentation kernel-doc: handle #if in enums as well Documentation: update mailing list addresses doc: kerneldoc.py: fix indentation scripts/kernel-doc: simplify signature printing ... |
||
![]() |
3bf95d567d |
fscrypt updates for 6.9
Fix flakiness in a test by releasing the quota synchronously when a key is removed, and other minor cleanups. -----BEGIN PGP SIGNATURE----- iIoEABYIADIWIQSacvsUNc7UX4ntmEPzXCl4vpKOKwUCZe/STxQcZWJpZ2dlcnNA Z29vZ2xlLmNvbQAKCRDzXCl4vpKOKyVAAQCJQr5l3fU+rm1FVpuVg8q/pbPdi5wJ N31pYFvY3AehtQEArdPNtBbXW3V7i9OL6CDmesuNtGr3Il5KRV1h89yyYgY= =RGab -----END PGP SIGNATURE----- Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux Pull fscrypt updates from Eric Biggers: "Fix flakiness in a test by releasing the quota synchronously when a key is removed, and other minor cleanups" * tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux: fscrypt: shrink the size of struct fscrypt_inode_info slightly fscrypt: write CBC-CTS instead of CTS-CBC fscrypt: clear keyring before calling key_put() fscrypt: explicitly require that inode->i_blkbits be set |
||
![]() |
77417942e4 |
vfs-6.9.ntfs
-----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZem42QAKCRCRxhvAZXjc opOtAQDUkiJNaOu3fR6ENLvDZSFmaI2jQXIL8ulHYpEiFrXmKwD9EZQ8bmEYU7uO WN4VM8p8UwQ7BmIV9b+jvwciF8Qi8QI= =T03q -----END PGP SIGNATURE----- Merge tag 'vfs-6.9.ntfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull ntfs update from Christian Brauner: "This removes the old ntfs driver. The new ntfs3 driver is a full replacement that was merged over two years ago. We've went through various userspace and either they use ntfs3 or they use the fuse version of ntfs and thus build neither ntfs nor ntfs3. I think that's a clear sign that we should risk removing the legacy ntfs driver. Quoting from Arch Linux and Debian: - Debian does neither build the legacy ntfs nor the new ntfs3: "Not currently built with Debian's kernel packages, 'ntfs' has been symlinked to 'ntfs-3g' as it relates to fstab and mount commands. Debian kernels are built without support of the ntfs3 driver developed by Paragon Software." (cf. [2]) - Archlinux provides ntfs3 as their default since 5.15: "All officially supported kernels with versions 5.15 or newer are built with CONFIG_NTFS3_FS=m and thus support it. Before 5.15, NTFS read and write support is provided by the NTFS-3G FUSE file system." (cf. [1]). It's unmaintained apart from various odd fixes as well. Worst case we have to reintroduce it if someone really has a valid dependency on it. But it's worth trying to see whether we can remove it" Link: https://wiki.archlinux.org/title/NTFS [1] Link: https://wiki.debian.org/NTFS [2] * tag 'vfs-6.9.ntfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fs: remove NTFS classic from docum. index fs: Remove NTFS classic |
||
![]() |
7ea65c89d8 |
vfs-6.9.misc
-----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZem3wQAKCRCRxhvAZXjc otRMAQDeo8qsuuIAcS2KUicKqZR5yMVvrY9r4sQzf7YRcJo5HQD+NQXkKwQuv1VO OUeScsic/+I+136AgdjWnlEYO5dp0go= =4WKU -----END PGP SIGNATURE----- Merge tag 'vfs-6.9.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull misc vfs updates from Christian Brauner: "Misc features, cleanups, and fixes for vfs and individual filesystems. Features: - Support idmapped mounts for hugetlbfs. - Add RWF_NOAPPEND flag for pwritev2(). This allows us to fix a bug where the passed offset is ignored if the file is O_APPEND. The new flag allows a caller to enforce that the offset is honored to conform to posix even if the file was opened in append mode. - Move i_mmap_rwsem in struct address_space to avoid false sharing between i_mmap and i_mmap_rwsem. - Convert efs, qnx4, and coda to use the new mount api. - Add a generic is_dot_dotdot() helper that's used by various filesystems and the VFS code instead of open-coding it multiple times. - Recently we've added stable offsets which allows stable ordering when iterating directories exported through NFS on e.g., tmpfs filesystems. Originally an xarray was used for the offset map but that caused slab fragmentation issues over time. This switches the offset map to the maple tree which has a dense mode that handles this scenario a lot better. Includes tests. - Finally merge the case-insensitive improvement series Gabriel has been working on for a long time. This cleanly propagates case insensitive operations through ->s_d_op which in turn allows us to remove the quite ugly generic_set_encrypted_ci_d_ops() operations. It also improves performance by trying a case-sensitive comparison first and then fallback to case-insensitive lookup if that fails. This also fixes a bug where overlayfs would be able to be mounted over a case insensitive directory which would lead to all sort of odd behaviors. Cleanups: - Make file_dentry() a simple accessor now that ->d_real() is simplified because of the backing file work we did the last two cycles. - Use the dedicated file_mnt_idmap helper in ntfs3. - Use smp_load_acquire/store_release() in the i_size_read/write helpers and thus remove the hack to handle i_size reads in the filemap code. - The SLAB_MEM_SPREAD is a nop now. Remove it from various places in fs/ - It's no longer necessary to perform a second built-in initramfs unpack call because we retain the contents of the previous extraction. Remove it. - Now that we have removed various allocators kfree_rcu() always works with kmem caches and kmalloc(). So simplify various places that only use an rcu callback in order to handle the kmem cache case. - Convert the pipe code to use a lockdep comparison function instead of open-coding the nesting making lockdep validation easier. - Move code into fs-writeback.c that was located in a header but can be made static as it's only used in that one file. - Rewrite the alignment checking iterators for iovec and bvec to be easier to read, and also significantly more compact in terms of generated code. This saves 270 bytes of text on x86-64 (with clang-18) and 224 bytes on arm64 (with gcc-13). In profiles it also saves a bit of time for the same workload. - Switch various places to use KMEM_CACHE instead of kmem_cache_create(). - Use inode_set_ctime_to_ts() in inode_set_ctime_current() - Use kzalloc() in name_to_handle_at() to avoid kernel infoleak. - Various smaller cleanups for eventfds. Fixes: - Fix various comments and typos, and unneeded initializations. - Fix stack allocation hack for clang in the select code. - Improve dump_mapping() debug code on a best-effort basis. - Fix build errors in various selftests. - Avoid wrap-around instrumentation in various places. - Don't allow user namespaces without an idmapping to be used for idmapped mounts. - Fix sysv sb_read() call. - Fix fallback implementation of the get_name() export operation" * tag 'vfs-6.9.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (70 commits) hugetlbfs: support idmapped mounts qnx4: convert qnx4 to use the new mount api fs: use inode_set_ctime_to_ts to set inode ctime to current time libfs: Drop generic_set_encrypted_ci_d_ops ubifs: Configure dentry operations at dentry-creation time f2fs: Configure dentry operations at dentry-creation time ext4: Configure dentry operations at dentry-creation time libfs: Add helper to choose dentry operations at mount-time libfs: Merge encrypted_ci_dentry_ops and ci_dentry_ops fscrypt: Drop d_revalidate once the key is added fscrypt: Drop d_revalidate for valid dentries during lookup fscrypt: Factor out a helper to configure the lookup dentry ovl: Always reject mounting over case-insensitive directories libfs: Attempt exact-match comparison first during casefolded lookup efs: remove SLAB_MEM_SPREAD flag usage jfs: remove SLAB_MEM_SPREAD flag usage minix: remove SLAB_MEM_SPREAD flag usage openpromfs: remove SLAB_MEM_SPREAD flag usage proc: remove SLAB_MEM_SPREAD flag usage qnx6: remove SLAB_MEM_SPREAD flag usage ... |
||
![]() |
8b10d36537 |
f2fs: introduce FAULT_NO_SEGMENT
Use it to simulate no free segment case during block allocation. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> |
||
![]() |
4e0197f993 |
f2fs: kill heap-based allocation
No one uses this feature. Let's kill it. Reviewed-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> |
||
![]() |
2f944c66ae |
fscrypt: write CBC-CTS instead of CTS-CBC
Calling CBC with ciphertext stealing "CBC-CTS" seems to be more common than calling it "CTS-CBC". E.g., CBC-CTS is used by OpenSSL, Crypto++, RFC3962, and RFC6803. The NIST SP800-38A addendum uses CBC-CS1, CBC-CS2, and CBC-CS3, distinguishing between different CTS conventions but similarly putting the CBC part first. In the interest of avoiding any idiosyncratic terminology, update the fscrypt documentation and the fscrypt_mode "friendly names" to align with the more common convention. Changing the "friendly names" only affects some log messages. The actual mode constants in the API are unchanged; those call it simply "CTS". Add a note to the documentation that clarifies that "CBC" and "CTS" in the API really mean CBC-ESSIV and CBC-CTS, respectively. Link: https://lore.kernel.org/r/20240224053550.44659-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com> |
||
![]() |
a095686a23 |
xfs: support in-memory btrees
Adapt the generic btree cursor code to be able to create a btree whose buffers come from a (presumably in-memory) buftarg with a header block that's specific to in-memory btrees. We'll connect this to other parts of online scrub in the next patches. Note that in-memory btrees always have a block size matching the system memory page size for efficiency reasons. There are also a few things we need to do to finalize a btree update; that's covered in the next patch. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> |
||
![]() |
e5a2f47cff |
xfs: remove xfile_{get,put}_page
These functions aren't used anymore, so get rid of them. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> |
||
![]() |
e47e2e0ba9 |
xfs: remove the xfile_pread/pwrite APIs
All current and pending xfile users use the xfile_obj_load and xfile_obj_store API, so make those the actual implementation. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> |
||
![]() |
87161a2b0a |
f2fs: deprecate io_bits
Let's deprecate an unused io_bits feature to save CPU cycles and memory. Reviewed-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> |
||
![]() |
2d1ab26ace |
docs: proc.rst: comm: mention the included NUL
Indicate that the actual value will be one character less. Signed-off-by: Christoph Anton Mitterer <mail@christoph.anton.mitterer.name> Link: https://lore.kernel.org/r/20240205154100.736499-1-mail@christoph.anton.mitterer.name [jc: did s/null/NUL/ ] Signed-off-by: Jonathan Corbet <corbet@lwn.net> |
||
![]() |
ef560389ca |
docs: filesystems: fix typo in docs
This patch resolves a spelling error in the filesystem documentation. It is submitted as part of my application to the "Linux Kernel Bug Fixing Spring Unpaid 2024" mentorship program of the Linux Kernel Foundation. Signed-off-by: Vincenzo Mezzela <vincenzo.mezzela@gmail.com> Link: https://lore.kernel.org/r/20240208162032.109184-1-vincenzo.mezzela@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org> |
||
![]() |
11b3f8ae70 |
fs: remove the inode argument to ->d_real() method
The only remaining user of ->d_real() method is d_real_inode(), which passed NULL inode argument to get the real data dentry. There are no longer any users that call ->d_real() with a non-NULL inode argument for getting a detry from a specific underlying layer. Remove the inode argument of the method and replace it with an integer 'type' argument, to allow callers to request the real metadata dentry instead of the real data dentry. All the current users of d_real_inode() (e.g. uprobe) continue to get the real data inode. Caller that need to get the real metadata inode (e.g. IMA/EVM) can use d_inode(d_real(dentry, D_REAL_METADATA)). Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/r/20240202110132.1584111-3-amir73il@gmail.com Tested-by: Stefan Berger <stefanb@linux.ibm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Christian Brauner <brauner@kernel.org> |
||
![]() |
c7115e094c |
f2fs: introduce FAULT_BLKADDR_CONSISTENCE
We will encounter below inconsistent status when FAULT_BLKADDR type fault injection is on. Info: checkpoint state = d6 : nat_bits crc fsck compacted_summary orphan_inodes sudden-power-off [ASSERT] (fsck_chk_inode_blk:1254) --> ino: 0x1c100 has i_blocks: 000000c0, but has 191 blocks [FIX] (fsck_chk_inode_blk:1260) --> [0x1c100] i_blocks=0x000000c0 -> 0xbf [FIX] (fsck_chk_inode_blk:1269) --> [0x1c100] i_compr_blocks=0x00000026 -> 0x27 [ASSERT] (fsck_chk_inode_blk:1254) --> ino: 0x1cadb has i_blocks: 0000002f, but has 46 blocks [FIX] (fsck_chk_inode_blk:1260) --> [0x1cadb] i_blocks=0x0000002f -> 0x2e [FIX] (fsck_chk_inode_blk:1269) --> [0x1cadb] i_compr_blocks=0x00000011 -> 0x12 [ASSERT] (fsck_chk_inode_blk:1254) --> ino: 0x1c62c has i_blocks: 00000002, but has 1 blocks [FIX] (fsck_chk_inode_blk:1260) --> [0x1c62c] i_blocks=0x00000002 -> 0x1 After we inject fault into f2fs_is_valid_blkaddr() during truncation, a) it missed to increase @nr_free or @valid_blocks b) it can cause in blkaddr leak in truncated dnode Which may cause inconsistent status. This patch separates FAULT_BLKADDR_CONSISTENCE from FAULT_BLKADDR, and rename FAULT_BLKADDR to FAULT_BLKADDR_VALIDITY so that we can: a) use FAULT_BLKADDR_CONSISTENCE in f2fs_truncate_data_blocks_range() to simulate inconsistent issue independently, then it can verify fsck repair flow. b) FAULT_BLKADDR_VALIDITY fault will not cause any inconsistent status, we can just use it to check error path handling in kernel side. Reviewed-by: Daeho Jeong <daehojeong@google.com> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> |
||
![]() |
06b8db3a7d
|
fs: remove NTFS classic from docum. index
With the remove of the NTFS classic filesystem, also remove its documentation entry from the filesystems index to prevent a kernel-doc warning: Documentation/filesystems/index.rst:63: WARNING: toctree contains reference to nonexisting document 'filesystems/ntfs' Fixes: 9c67092ed339 ("fs: Remove NTFS classic") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Link: https://lore.kernel.org/r/20240124011424.731-1-rdunlap@infradead.org Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: <linux-fsdevel@vger.kernel.org> Cc: Namjae Jeon <linkinjeon@kernel.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Anton Altaparmakov <anton@tuxera.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: <linux-doc@vger.kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org> |
||
![]() |
7ffa8f3d30
|
fs: Remove NTFS classic
The replacement, NTFS3, was merged over two years ago. It is now time to remove the original from the tree as it is the last user of several APIs, and it is not worth changing. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Link: https://lore.kernel.org/r/20240115072025.2071931-1-willy@infradead.org Acked-by: Namjae Jeon <linkinjeon@kernel.org> Acked-by: Dave Chinner <david@fromorbit.com> Cc: Anton Altaparmakov <anton@tuxera.com> Cc: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org> |
||
![]() |
420332b941 |
ovl: mark xwhiteouts directory with overlay.opaque='x'
An opaque directory cannot have xwhiteouts, so instead of marking an
xwhiteouts directory with a new xattr, overload overlay.opaque xattr
for marking both opaque dir ('y') and xwhiteouts dir ('x').
This is more efficient as the overlay.opaque xattr is checked during
lookup of directory anyway.
This also prevents unnecessary checking the xattr when reading a
directory without xwhiteouts, i.e. most of the time.
Note that the xwhiteouts marker is not checked on the upper layer and
on the last layer in lowerstack, where xwhiteouts are not expected.
Fixes:
|
||
![]() |
8cb1bb178c |
4 ksmbd server fixes
-----BEGIN PGP SIGNATURE----- iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmWobwQACgkQiiy9cAdy T1GTXgv/amjpAd1kEVwgfyUGvM9rsN+DtojoXt1z5xDkzJrnszI0s/WARz7o2gc/ 6nIOxKWfpxb0QHRcLebZHbN7mrZeelHLyMqbx0Wphy5Y0cUQlq1C50l6xAkua1dd /uPklGZVW9LDRvZYzdwa4Spi0tVwDgsnmK2UiWckCfc1yu9BVz3mg1gtaeg0Z/Z4 caojvSTXdJ/35xKV4udE4lo0PpbCC2h990c5iR8iOXgMuBlgIv3JQYC+3avuXXdS Erneof5Vx9sUV7p7SWCR71VFaWnMsF1/DtXMKTxAJKjYGZPckQGQjxOJeonaRltg 0YjzeMT5/d5Fv3w0L1Mtn9pxyXTQ4ywKJ3Vpm21geuw7pl/70sqdQtg+ujljfN6j uzb6mzwFezp3LGK8RI6h//JKUrnurIM/dc9FuRCoU2N91rfhGuiAvTwzrjh67WW8 HTmsnhNWnWuhH7OQJrOPQVEEs6DSUgA6MvHjslXoj7V+ksoKMe57JyZIr6Hx0CX9 W4Q0j6Hk =gj+T -----END PGP SIGNATURE----- Merge tag '6.8-rc-smb-server-fixes-part2' of git://git.samba.org/ksmbd Pull more smb server updates from Steve French: - Fix for incorrect oplock break on directories when leases disabled - UAF fix for race between create and destroy of tcp connection - Important session setup SPNEGO fix - Update ksmbd feature status summary * tag '6.8-rc-smb-server-fixes-part2' of git://git.samba.org/ksmbd: ksmbd: only v2 leases handle the directory ksmbd: fix UAF issue in ksmbd_tcp_new_connection() ksmbd: validate mech token in session setup ksmbd: update feature status in documentation |
||
![]() |
16df6e07d6 |
vfs-6.8.netfs
-----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZabMrQAKCRCRxhvAZXjc ovnUAQDgCOonb1tjtTvC8s8IMDUEoaVYZI91KVfsZQSJYN1sdQD+KfJmX1BhJnWG l0cEffGfnWGXMZkZqDgLPHUIPzFrmws= =1b3j -----END PGP SIGNATURE----- Merge tag 'vfs-6.8.netfs' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs Pull netfs updates from Christian Brauner: "This extends the netfs helper library that network filesystems can use to replace their own implementations. Both afs and 9p are ported. cifs is ready as well but the patches are way bigger and will be routed separately once this is merged. That will remove lots of code as well. The overal goal is to get high-level I/O and knowledge of the page cache and ouf of the filesystem drivers. This includes knowledge about the existence of pages and folios The pull request converts afs and 9p. This removes about 800 lines of code from afs and 300 from 9p. For 9p it is now possible to do writes in larger than a page chunks. Additionally, multipage folio support can be turned on for 9p. Separate patches exist for cifs removing another 2000+ lines. I've included detailed information in the individual pulls I took. Summary: - Add NFS-style (and Ceph-style) locking around DIO vs buffered I/O calls to prevent these from happening at the same time. - Support for direct and unbuffered I/O. - Support for write-through caching in the page cache. - O_*SYNC and RWF_*SYNC writes use write-through rather than writing to the page cache and then flushing afterwards. - Support for write-streaming. - Support for write grouping. - Skip reads for which the server could only return zeros or EOF. - The fscache module is now part of the netfs library and the corresponding maintainer entry is updated. - Some helpers from the fscache subsystem are renamed to mark them as belonging to the netfs library. - Follow-up fixes for the netfs library. - Follow-up fixes for the 9p conversion" * tag 'vfs-6.8.netfs' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs: (50 commits) netfs: Fix wrong #ifdef hiding wait cachefiles: Fix signed/unsigned mixup netfs: Fix the loop that unmarks folios after writing to the cache netfs: Fix interaction between write-streaming and cachefiles culling netfs: Count DIO writes netfs: Mark netfs_unbuffered_write_iter_locked() static netfs: Fix proc/fs/fscache symlink to point to "netfs" not "../netfs" netfs: Rearrange netfs_io_subrequest to put request pointer first 9p: Use length of data written to the server in preference to error 9p: Do a couple of cleanups 9p: Fix initialisation of netfs_inode for 9p cachefiles: Fix __cachefiles_prepare_write() 9p: Use netfslib read/write_iter afs: Use the netfs write helpers netfs: Export the netfs_sreq tracepoint netfs: Optimise away reads above the point at which there can be no data netfs: Implement a write-through caching option netfs: Provide a launder_folio implementation netfs: Provide a writepages implementation netfs, cachefiles: Pass upper bound length to allow expansion ... |
||
![]() |
fdfd6dde43 |
ksmbd: update feature status in documentation
Update ksmbd feature status in documentation file. - add support for v2 lease feature and SMB3 CCM/GCM256 encryption. - add planned compression, quic, gmac signing features. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> |
||
![]() |
499aa1ca4e |
dcache stuff for this cycle
change of locking rules for __dentry_kill(), regularized refcounting rules in that area, assorted cleanups and removal of weird corner cases (e.g. now ->d_iput() on child is always called before the parent might hit __dentry_kill(), etc.) Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCZZ+sQQAKCRBZ7Krx/gZQ 6ybjAQDM5jiS93IUzfHjCWq0nVBX5YGbDAkZOeqxbmIdQb+2UAEA6elP5r0fBBcA seo3bry4DirQMDaA/Cjh4+8r71YSOQs= =7+Hk -----END PGP SIGNATURE----- Merge tag 'pull-dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull dcache updates from Al Viro: "Change of locking rules for __dentry_kill(), regularized refcounting rules in that area, assorted cleanups and removal of weird corner cases (e.g. now ->d_iput() on child is always called before the parent might hit __dentry_kill(), etc)" * tag 'pull-dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (40 commits) dcache: remove unnecessary NULL check in dget_dlock() kill DCACHE_MAY_FREE __d_unalias() doesn't use inode argument d_alloc_parallel(): in-lookup hash insertion doesn't need an RCU variant get rid of DCACHE_GENOCIDE d_genocide(): move the extern into fs/internal.h simple_fill_super(): don't bother with d_genocide() on failure nsfs: use d_make_root() d_alloc_pseudo(): move setting ->d_op there from the (sole) caller kill d_instantate_anon(), fold __d_instantiate_anon() into remaining caller retain_dentry(): introduce a trimmed-down lockless variant __dentry_kill(): new locking scheme d_prune_aliases(): use a shrink list switch select_collect{,2}() to use of to_shrink_list() to_shrink_list(): call only if refcount is 0 fold dentry_kill() into dput() don't try to cut corners in shrink_lock_dentry() fold the call of retain_dentry() into fast_dput() Call retain_dentry() with refcount 0 dentry_kill(): don't bother with retain_dentry() on slow path ... |
||
![]() |
bf4e7080ae |
fix directory locking scheme on rename
broken in 6.5; we really can't lock two unrelated directories without holding ->s_vfs_rename_mutex first and in case of same-parent rename of a subdirectory 6.5 ends up doing just that. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCZZ+lyQAKCRBZ7Krx/gZQ 60MWAP94hTqeMIpjhsUIkrTnylrIFaiw4UCWFJzIRG1QQYKqCgD/XUaWI9np7dL6 0wR/j4CQSdJjiEFKUFE2pD3QoSuJYAQ= =+x0+ -----END PGP SIGNATURE----- Merge tag 'pull-rename' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull rename updates from Al Viro: "Fix directory locking scheme on rename This was broken in 6.5; we really can't lock two unrelated directories without holding ->s_vfs_rename_mutex first and in case of same-parent rename of a subdirectory 6.5 ends up doing just that" * tag 'pull-rename' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: rename(): avoid a deadlock in the case of parents having no common ancestor kill lock_two_inodes() rename(): fix the locking of subdirectories f2fs: Avoid reading renamed directory if parent does not change ext4: don't access the source subdirectory content on same-directory rename ext2: Avoid reading renamed directory if parent does not change udf_rename(): only access the child content on cross-directory rename ocfs2: Avoid touching renamed directory if parent does not change reiserfs: Avoid touching renamed directory if parent does not change |
||
![]() |
5b9b41617b |
Another moderately busy cycle for documentation, including:
- The minimum Sphinx requirement has been raised to 2.4.4, following a warning that was added in 6.2. - Some reworking of the Documentation/process front page to, hopefully, make it more useful. - Various kernel-doc tweaks to, for example, make it deal properly with __counted_by annotations. - We have also restored a warning for documentation of nonexistent structure members that disappeared a while back. That had the delightful consequence of adding some 600 warnings to the docs build. A sustained effort by Randy, Vegard, and myself has addressed almost all of those, bringing the documentation back into sync with the code. The fixes are going through the appropriate maintainer trees. - Various improvements to the HTML rendered docs, including automatic links to Git revisions and a nice new pulldown to make translations easy to access. - Speaking of translations, more of those for Spanish and Chinese. ...plus the usual stream of documentation updates and typo fixes. -----BEGIN PGP SIGNATURE----- iQFDBAABCAAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAmWcRKMPHGNvcmJldEBs d24ubmV0AAoJEBdDWhNsDH5YTKIH/AxBt/3iWt40dPf18arZHLU6tdUbmg01ttef CNKWkniCmABGKc//KYDXvjZMRDt0YlrS0KgUzrb8nIQTBlZG40D+88EwjXE0HeGP xt1Fk7OPOiJEqBZ3HEe0PDVfOiA+4yR6CmDKklCJuKg77X9atklneBwPUw/cOASk CWj+BdbwPBiSNQv48Lp87rGusKwnH/g0MN2uS0z9MPr1DYjM1K8+ngZjGW24lZHt qs5yhP43mlZGBF/lwNJXQp/xhnKAqJ9XwylBX9Wmaoxaz9yyzNVsADGvROMudgzi 9YB+Jdy7Z0JSrVoLIRhUuDOv7aW8vk+8qLmGJt2aTIsqehbQ6pk= =fCtT -----END PGP SIGNATURE----- Merge tag 'docs-6.8' of git://git.lwn.net/linux Pull documentation update from Jonathan Corbet: "Another moderately busy cycle for documentation, including: - The minimum Sphinx requirement has been raised to 2.4.4, following a warning that was added in 6.2 - Some reworking of the Documentation/process front page to, hopefully, make it more useful - Various kernel-doc tweaks to, for example, make it deal properly with __counted_by annotations - We have also restored a warning for documentation of nonexistent structure members that disappeared a while back. That had the delightful consequence of adding some 600 warnings to the docs build. A sustained effort by Randy, Vegard, and myself has addressed almost all of those, bringing the documentation back into sync with the code. The fixes are going through the appropriate maintainer trees - Various improvements to the HTML rendered docs, including automatic links to Git revisions and a nice new pulldown to make translations easy to access - Speaking of translations, more of those for Spanish and Chinese ... plus the usual stream of documentation updates and typo fixes" * tag 'docs-6.8' of git://git.lwn.net/linux: (57 commits) MAINTAINERS: use tabs for indent of CONFIDENTIAL COMPUTING THREAT MODEL A reworked process/index.rst ring-buffer/Documentation: Add documentation on buffer_percent file Translated the RISC-V architecture boot documentation. Docs: remove mentions of fdformat from util-linux Docs/zh_CN: Fix the meaning of DEBUG to pr_debug() Documentation: move driver-api/dcdbas to userspace-api/ Documentation: move driver-api/isapnp to userspace-api/ Documentation/core-api : fix typo in workqueue Documentation/trace: Fixed typos in the ftrace FLAGS section kernel-doc: handle a void function without producing a warning scripts/get_abi.pl: ignore some temp files docs: kernel_abi.py: fix command injection scripts/get_abi: fix source path leak CREDITS, MAINTAINERS, docs/process/howto: Update man-pages' maintainer docs: translations: add translations links when they exist kernel-doc: Align quick help and the code MAINTAINERS: add reviewer for Spanish translations docs: ignore __counted_by attribute in structure definitions scripts: kernel-doc: Clarify missing struct member description .. |
||
![]() |
4d925f6057 |
overlayfs updates for 6.8
-----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE9zuTYTs0RXF+Ke33EVvVyTe/1WoFAmWeozwACgkQEVvVyTe/ 1WqnyA//U2Ka5ZIncs/hA5D03LMyuCh9qlH5qAGce5vrBTxogTlFuTGGKtsUuCB5 Y4GALO+fw8aWAowt5X1XfHD3TETLVbCshT7dYjKsKy/ojANCbgkCipXBudYx+l9m fllwTZyueK0UY14kCU2DAV5PYsI/XVVykk71GSMOMLCUfRJfDI7R0vBD0NaUd7Kz Wp/M6t0MnXX23nGUdgNoroZPPj3Ts/gK2MXID+QHXGaR2+M1B1lLKfSu6TcRDLtn tbe/ivaw4y1jj3jfFwMC7sSSDyIJeZh9tBB4Rvv2vsMiYU8zAC6Eg35eIbPONu42 pUMd0QQa79H3cyYEDtUzyskzur0Jry5azzb8JdQWipgVKFh5g3CHce2XAFlVjw2a 9RyCKg41A9LvdB5l/PvBtsxig2PzaYqE09rXAfUM7eLNFlOLbL99uc1WJbIFfG43 Czh9vPxsuJ5RkdwS7R0m4GYDw8+BKW6WjpaC+Eje4I8X1rAQK0H/BLTCxe2dLRB7 0neAg8e3g6NdisRSLOP74xoEn/dhijNP7ENOFF1EdP/BFPHL7+sRsV6XYwwBeUAc c6YsxeAPylm6gvIq/ESoRiY+e5QWvImHIWP+zB/cySYdT0fQHL9WjO6/uZW0ALuv oZugICSmZ15pYlACIU8iYztRkS19CJZrUV7Gbq4+AurUKP8kCEI= =2Ohx -----END PGP SIGNATURE----- Merge tag 'ovl-update-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs Pull overlayfs updates from Amir Goldstein: "This is a very small update with no bug fixes and no new features. The larger update of overlayfs for this cycle, the re-factoring of overlayfs code into generic backing_file helpers, was already merged via Christian. Summary: - Simplify/clarify some code No bug fixes here, just some changes following questions from Al about overlayfs code that could be a little more simple to follow. - Overlayfs documentation style fixes Mainly fixes for ReST formatting suggested by documentation developers" * tag 'ovl-update-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs: overlayfs.rst: fix ReST formatting overlayfs.rst: use consistent feature names ovl: initialize ovl_copy_up_ctx.destname inside ovl_do_copy_up() ovl: remove redundant ofs->indexdir member |
||
![]() |
17b9e388c6 |
fscrypt updates for 6.8
Adjust the timing of the fscrypt keyring destruction, to prepare for btrfs's fscrypt support. Also document that CephFS supports fscrypt now. -----BEGIN PGP SIGNATURE----- iIoEABYIADIWIQSacvsUNc7UX4ntmEPzXCl4vpKOKwUCZZx4UBQcZWJpZ2dlcnNA Z29vZ2xlLmNvbQAKCRDzXCl4vpKOK85+AQCBHoG6R5UuPqafoDtabcCpxRW/ZHdo WzOwjvHz1/tq5AEApogvjPI/3v2gelLnG9ZrXUBZMWZN6W0LQbH/k1VHjQ8= =nvWY -----END PGP SIGNATURE----- Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux Pull fscrypt updates from Eric Biggers: "Adjust the timing of the fscrypt keyring destruction, to prepare for btrfs's fscrypt support. Also document that CephFS supports fscrypt now" * tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux: fs: move fscrypt keyring destruction to after ->put_super f2fs: move release of block devices to after kill_block_super() fscrypt: document that CephFS supports fscrypt now fscrypt: update comment for do_remove_key() fscrypt.rst: update definition of struct fscrypt_context_v2 |
||
![]() |
12958e9c4c |
New code for 6.8:
* New features/functionality * Online repair * Reserve disk space for online repairs. * Fix misinteraction between the AIL and btree bulkloader because of which the bulk load fails to queue a buffer for writeback if it happens to be on the AIL list. * Prevent transaction reservation overflows when reaping blocks during online repair. * Whenever possible, bulkloader now copies multiple records into a block. * Support repairing of 1. Per-AG free space, inode and refcount btrees. 2. Ondisk inodes. 3. File data and attribute fork mappings. * Verify the contents of 1. Inode and data fork of realtime bitmap file. 2. Quota files. * Introduce MF_MEM_PRE_REMOVE. This will be used to notify tasks about a pmem device being removed. * Bug fixes * Fix memory leak of recovered attri intent items. * Fix UAF during log intent recovery. * Fix realtime geometry integer overflows. * Prevent scrub from live locking in xchk_iget. * Prevent fs shutdown when removing files during low free disk space. * Prevent transaction reservation overflow when extending an RT device. * Prevent incorrect warning from being printed when extending a filesystem. * Fix an off-by-one error in xreap_agextent_binval. * Serialize access to perag radix tree during deletion operation. * Fix perag memory leak during growfs. * Allow allocation of minlen realtime extent when the maximum sized realtime free extent is minlen in size. * Cleanups * Remove duplicate boilerplate code spread across functionality associated with different log items. * Cleanup resblks interfaces. * Pass defer ops pointer to defer helpers instead of an enum. * Initialize di_crc in xfs_log_dinode to prevent KMSAN warnings. * Use static_assert() instead of BUILD_BUG_ON_MSG() to validate size of structures and structure member offsets. This is done in order to be able to share the code with userspace. * Move XFS documentation under a new directory specific to XFS. * Do not invoke deferred ops' ->create_done callback if the deferred operation does not have an intent item associated with it. * Remove duplicate inclusion of header files from scrub/health.c. * Refactor Realtime code. * Cleanup attr code. Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQQjMC4mbgVeU7MxEIYH7y4RirJu9AUCZZJQbwAKCRAH7y4RirJu 9JjkAP9Zg0QZNmAMsZwvgEBbuF/OnHKl4GmPA5uq0jPmSWCOqAEA0HjlOmuNfQWn 93fIw6CPbt+9QCluTYBwUisKLIJ/wgA= =qmO0 -----END PGP SIGNATURE----- Merge tag 'xfs-6.8-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux Pull xfs updates from Chandan Babu: "New features/functionality: - Online repair: - Reserve disk space for online repairs - Fix misinteraction between the AIL and btree bulkloader because of which the bulk load fails to queue a buffer for writeback if it happens to be on the AIL list - Prevent transaction reservation overflows when reaping blocks during online repair - Whenever possible, bulkloader now copies multiple records into a block - Support repairing of 1. Per-AG free space, inode and refcount btrees 2. Ondisk inodes 3. File data and attribute fork mappings - Verify the contents of 1. Inode and data fork of realtime bitmap file 2. Quota files - Introduce MF_MEM_PRE_REMOVE. This will be used to notify tasks about a pmem device being removed Bug fixes: - Fix memory leak of recovered attri intent items - Fix UAF during log intent recovery - Fix realtime geometry integer overflows - Prevent scrub from live locking in xchk_iget - Prevent fs shutdown when removing files during low free disk space - Prevent transaction reservation overflow when extending an RT device - Prevent incorrect warning from being printed when extending a filesystem - Fix an off-by-one error in xreap_agextent_binval - Serialize access to perag radix tree during deletion operation - Fix perag memory leak during growfs - Allow allocation of minlen realtime extent when the maximum sized realtime free extent is minlen in size Cleanups: - Remove duplicate boilerplate code spread across functionality associated with different log items - Cleanup resblks interfaces - Pass defer ops pointer to defer helpers instead of an enum - Initialize di_crc in xfs_log_dinode to prevent KMSAN warnings - Use static_assert() instead of BUILD_BUG_ON_MSG() to validate size of structures and structure member offsets. This is done in order to be able to share the code with userspace - Move XFS documentation under a new directory specific to XFS - Do not invoke deferred ops' ->create_done callback if the deferred operation does not have an intent item associated with it - Remove duplicate inclusion of header files from scrub/health.c - Refactor Realtime code - Cleanup attr code" * tag 'xfs-6.8-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (123 commits) xfs: use the op name in trace_xlog_intent_recovery_failed xfs: fix a use after free in xfs_defer_finish_recovery xfs: turn the XFS_DA_OP_REPLACE checks in xfs_attr_shortform_addname into asserts xfs: remove xfs_attr_sf_hdr_t xfs: remove struct xfs_attr_shortform xfs: use xfs_attr_sf_findname in xfs_attr_shortform_getvalue xfs: remove xfs_attr_shortform_lookup xfs: simplify xfs_attr_sf_findname xfs: move the xfs_attr_sf_lookup tracepoint xfs: return if_data from xfs_idata_realloc xfs: make if_data a void pointer xfs: fold xfs_rtallocate_extent into xfs_bmap_rtalloc xfs: simplify and optimize the RT allocation fallback cascade xfs: reorder the minlen and prod calculations in xfs_bmap_rtalloc xfs: remove XFS_RTMIN/XFS_RTMAX xfs: remove rt-wrappers from xfs_format.h xfs: factor out a xfs_rtalloc_sumlevel helper xfs: tidy up xfs_rtallocate_extent_exact xfs: merge the calls to xfs_rtallocate_range in xfs_rtallocate_block xfs: reflow the tail end of xfs_rtallocate_extent_block ... |
||
![]() |
9f2a635235 |
Quite a lot of kexec work this time around. Many singleton patches in
many places. The notable patch series are: - nilfs2 folio conversion from Matthew Wilcox in "nilfs2: Folio conversions for file paths". - Additional nilfs2 folio conversion from Ryusuke Konishi in "nilfs2: Folio conversions for directory paths". - IA64 remnant removal in Heiko Carstens's "Remove unused code after IA-64 removal". - Arnd Bergmann has enabled the -Wmissing-prototypes warning everywhere in "Treewide: enable -Wmissing-prototypes". This had some followup fixes: - Nathan Chancellor has cleaned up the hexagon build in the series "hexagon: Fix up instances of -Wmissing-prototypes". - Nathan also addressed some s390 warnings in "s390: A couple of fixes for -Wmissing-prototypes". - Arnd Bergmann addresses the same warnings for MIPS in his series "mips: address -Wmissing-prototypes warnings". - Baoquan He has made kexec_file operate in a top-down-fitting manner similar to kexec_load in the series "kexec_file: Load kernel at top of system RAM if required" - Baoquan He has also added the self-explanatory "kexec_file: print out debugging message if required". - Some checkstack maintenance work from Tiezhu Yang in the series "Modify some code about checkstack". - Douglas Anderson has disentangled the watchdog code's logging when multiple reports are occurring simultaneously. The series is "watchdog: Better handling of concurrent lockups". - Yuntao Wang has contributed some maintenance work on the crash code in "crash: Some cleanups and fixes". -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZZ2R6AAKCRDdBJ7gKXxA juCVAP4t76qUISDOSKugB/Dn5E4Nt9wvPY9PcufnmD+xoPsgkQD+JVl4+jd9+gAV vl6wkJDiJO5JZ3FVtBtC3DFA/xHtVgk= =kQw+ -----END PGP SIGNATURE----- Merge tag 'mm-nonmm-stable-2024-01-09-10-33' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: "Quite a lot of kexec work this time around. Many singleton patches in many places. The notable patch series are: - nilfs2 folio conversion from Matthew Wilcox in 'nilfs2: Folio conversions for file paths'. - Additional nilfs2 folio conversion from Ryusuke Konishi in 'nilfs2: Folio conversions for directory paths'. - IA64 remnant removal in Heiko Carstens's 'Remove unused code after IA-64 removal'. - Arnd Bergmann has enabled the -Wmissing-prototypes warning everywhere in 'Treewide: enable -Wmissing-prototypes'. This had some followup fixes: - Nathan Chancellor has cleaned up the hexagon build in the series 'hexagon: Fix up instances of -Wmissing-prototypes'. - Nathan also addressed some s390 warnings in 's390: A couple of fixes for -Wmissing-prototypes'. - Arnd Bergmann addresses the same warnings for MIPS in his series 'mips: address -Wmissing-prototypes warnings'. - Baoquan He has made kexec_file operate in a top-down-fitting manner similar to kexec_load in the series 'kexec_file: Load kernel at top of system RAM if required' - Baoquan He has also added the self-explanatory 'kexec_file: print out debugging message if required'. - Some checkstack maintenance work from Tiezhu Yang in the series 'Modify some code about checkstack'. - Douglas Anderson has disentangled the watchdog code's logging when multiple reports are occurring simultaneously. The series is 'watchdog: Better handling of concurrent lockups'. - Yuntao Wang has contributed some maintenance work on the crash code in 'crash: Some cleanups and fixes'" * tag 'mm-nonmm-stable-2024-01-09-10-33' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (157 commits) crash_core: fix and simplify the logic of crash_exclude_mem_range() x86/crash: use SZ_1M macro instead of hardcoded value x86/crash: remove the unused image parameter from prepare_elf_headers() kdump: remove redundant DEFAULT_CRASH_KERNEL_LOW_SIZE scripts/decode_stacktrace.sh: strip unexpected CR from lines watchdog: if panicking and we dumped everything, don't re-enable dumping watchdog/hardlockup: use printk_cpu_sync_get_irqsave() to serialize reporting watchdog/softlockup: use printk_cpu_sync_get_irqsave() to serialize reporting watchdog/hardlockup: adopt softlockup logic avoiding double-dumps kexec_core: fix the assignment to kimage->control_page x86/kexec: fix incorrect end address passed to kernel_ident_mapping_init() lib/trace_readwrite.c:: replace asm-generic/io with linux/io nilfs2: cpfile: fix some kernel-doc warnings stacktrace: fix kernel-doc typo scripts/checkstack.pl: fix no space expression between sp and offset x86/kexec: fix incorrect argument passed to kexec_dprintk() x86/kexec: use pr_err() instead of kexec_dprintk() when an error occurs nilfs2: add missing set_freezable() for freezable kthread kernel: relay: remove relay_file_splice_read dead code, doesn't work docs: submit-checklist: remove all of "make namespacecheck" ... |
||
![]() |
fb46e22a9e |
Many singleton patches against the MM code. The patch series which
are included in this merge do the following: - Peng Zhang has done some mapletree maintainance work in the series "maple_tree: add mt_free_one() and mt_attr() helpers" "Some cleanups of maple tree" - In the series "mm: use memmap_on_memory semantics for dax/kmem" Vishal Verma has altered the interworking between memory-hotplug and dax/kmem so that newly added 'device memory' can more easily have its memmap placed within that newly added memory. - Matthew Wilcox continues folio-related work (including a few fixes) in the patch series "Add folio_zero_tail() and folio_fill_tail()" "Make folio_start_writeback return void" "Fix fault handler's handling of poisoned tail pages" "Convert aops->error_remove_page to ->error_remove_folio" "Finish two folio conversions" "More swap folio conversions" - Kefeng Wang has also contributed folio-related work in the series "mm: cleanup and use more folio in page fault" - Jim Cromie has improved the kmemleak reporting output in the series "tweak kmemleak report format". - In the series "stackdepot: allow evicting stack traces" Andrey Konovalov to permits clients (in this case KASAN) to cause eviction of no longer needed stack traces. - Charan Teja Kalla has fixed some accounting issues in the page allocator's atomic reserve calculations in the series "mm: page_alloc: fixes for high atomic reserve caluculations". - Dmitry Rokosov has added to the samples/ dorectory some sample code for a userspace memcg event listener application. See the series "samples: introduce cgroup events listeners". - Some mapletree maintanance work from Liam Howlett in the series "maple_tree: iterator state changes". - Nhat Pham has improved zswap's approach to writeback in the series "workload-specific and memory pressure-driven zswap writeback". - DAMON/DAMOS feature and maintenance work from SeongJae Park in the series "mm/damon: let users feed and tame/auto-tune DAMOS" "selftests/damon: add Python-written DAMON functionality tests" "mm/damon: misc updates for 6.8" - Yosry Ahmed has improved memcg's stats flushing in the series "mm: memcg: subtree stats flushing and thresholds". - In the series "Multi-size THP for anonymous memory" Ryan Roberts has added a runtime opt-in feature to transparent hugepages which improves performance by allocating larger chunks of memory during anonymous page faults. - Matthew Wilcox has also contributed some cleanup and maintenance work against eh buffer_head code int he series "More buffer_head cleanups". - Suren Baghdasaryan has done work on Andrea Arcangeli's series "userfaultfd move option". UFFDIO_MOVE permits userspace heap compaction algorithms to move userspace's pages around rather than UFFDIO_COPY'a alloc/copy/free. - Stefan Roesch has developed a "KSM Advisor", in the series "mm/ksm: Add ksm advisor". This is a governor which tunes KSM's scanning aggressiveness in response to userspace's current needs. - Chengming Zhou has optimized zswap's temporary working memory use in the series "mm/zswap: dstmem reuse optimizations and cleanups". - Matthew Wilcox has performed some maintenance work on the writeback code, both code and within filesystems. The series is "Clean up the writeback paths". - Andrey Konovalov has optimized KASAN's handling of alloc and free stack traces for secondary-level allocators, in the series "kasan: save mempool stack traces". - Andrey also performed some KASAN maintenance work in the series "kasan: assorted clean-ups". - David Hildenbrand has gone to town on the rmap code. Cleanups, more pte batching, folio conversions and more. See the series "mm/rmap: interface overhaul". - Kinsey Ho has contributed some maintenance work on the MGLRU code in the series "mm/mglru: Kconfig cleanup". - Matthew Wilcox has contributed lruvec page accounting code cleanups in the series "Remove some lruvec page accounting functions". -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZZyF2wAKCRDdBJ7gKXxA jjWjAP42LHvGSjp5M+Rs2rKFL0daBQsrlvy6/jCHUequSdWjSgEAmOx7bc5fbF27 Oa8+DxGM9C+fwqZ/7YxU2w/WuUmLPgU= =0NHs -----END PGP SIGNATURE----- Merge tag 'mm-stable-2024-01-08-15-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: "Many singleton patches against the MM code. The patch series which are included in this merge do the following: - Peng Zhang has done some mapletree maintainance work in the series 'maple_tree: add mt_free_one() and mt_attr() helpers' 'Some cleanups of maple tree' - In the series 'mm: use memmap_on_memory semantics for dax/kmem' Vishal Verma has altered the interworking between memory-hotplug and dax/kmem so that newly added 'device memory' can more easily have its memmap placed within that newly added memory. - Matthew Wilcox continues folio-related work (including a few fixes) in the patch series 'Add folio_zero_tail() and folio_fill_tail()' 'Make folio_start_writeback return void' 'Fix fault handler's handling of poisoned tail pages' 'Convert aops->error_remove_page to ->error_remove_folio' 'Finish two folio conversions' 'More swap folio conversions' - Kefeng Wang has also contributed folio-related work in the series 'mm: cleanup and use more folio in page fault' - Jim Cromie has improved the kmemleak reporting output in the series 'tweak kmemleak report format'. - In the series 'stackdepot: allow evicting stack traces' Andrey Konovalov to permits clients (in this case KASAN) to cause eviction of no longer needed stack traces. - Charan Teja Kalla has fixed some accounting issues in the page allocator's atomic reserve calculations in the series 'mm: page_alloc: fixes for high atomic reserve caluculations'. - Dmitry Rokosov has added to the samples/ dorectory some sample code for a userspace memcg event listener application. See the series 'samples: introduce cgroup events listeners'. - Some mapletree maintanance work from Liam Howlett in the series 'maple_tree: iterator state changes'. - Nhat Pham has improved zswap's approach to writeback in the series 'workload-specific and memory pressure-driven zswap writeback'. - DAMON/DAMOS feature and maintenance work from SeongJae Park in the series 'mm/damon: let users feed and tame/auto-tune DAMOS' 'selftests/damon: add Python-written DAMON functionality tests' 'mm/damon: misc updates for 6.8' - Yosry Ahmed has improved memcg's stats flushing in the series 'mm: memcg: subtree stats flushing and thresholds'. - In the series 'Multi-size THP for anonymous memory' Ryan Roberts has added a runtime opt-in feature to transparent hugepages which improves performance by allocating larger chunks of memory during anonymous page faults. - Matthew Wilcox has also contributed some cleanup and maintenance work against eh buffer_head code int he series 'More buffer_head cleanups'. - Suren Baghdasaryan has done work on Andrea Arcangeli's series 'userfaultfd move option'. UFFDIO_MOVE permits userspace heap compaction algorithms to move userspace's pages around rather than UFFDIO_COPY'a alloc/copy/free. - Stefan Roesch has developed a 'KSM Advisor', in the series 'mm/ksm: Add ksm advisor'. This is a governor which tunes KSM's scanning aggressiveness in response to userspace's current needs. - Chengming Zhou has optimized zswap's temporary working memory use in the series 'mm/zswap: dstmem reuse optimizations and cleanups'. - Matthew Wilcox has performed some maintenance work on the writeback code, both code and within filesystems. The series is 'Clean up the writeback paths'. - Andrey Konovalov has optimized KASAN's handling of alloc and free stack traces for secondary-level allocators, in the series 'kasan: save mempool stack traces'. - Andrey also performed some KASAN maintenance work in the series 'kasan: assorted clean-ups'. - David Hildenbrand has gone to town on the rmap code. Cleanups, more pte batching, folio conversions and more. See the series 'mm/rmap: interface overhaul'. - Kinsey Ho has contributed some maintenance work on the MGLRU code in the series 'mm/mglru: Kconfig cleanup'. - Matthew Wilcox has contributed lruvec page accounting code cleanups in the series 'Remove some lruvec page accounting functions'" * tag 'mm-stable-2024-01-08-15-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (361 commits) mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER mm, treewide: introduce NR_PAGE_ORDERS selftests/mm: add separate UFFDIO_MOVE test for PMD splitting selftests/mm: skip test if application doesn't has root privileges selftests/mm: conform test to TAP format output selftests: mm: hugepage-mmap: conform to TAP format output selftests/mm: gup_test: conform test to TAP format output mm/selftests: hugepage-mremap: conform test to TAP format output mm/vmstat: move pgdemote_* out of CONFIG_NUMA_BALANCING mm: zsmalloc: return -ENOSPC rather than -EINVAL in zs_malloc while size is too large mm/memcontrol: remove __mod_lruvec_page_state() mm/khugepaged: use a folio more in collapse_file() slub: use a folio in __kmalloc_large_node slub: use folio APIs in free_large_kmalloc() slub: use alloc_pages_node() in alloc_slab_page() mm: remove inc/dec lruvec page state functions mm: ratelimit stat flush from workingset shrinker kasan: stop leaking stack trace handles mm/mglru: remove CONFIG_TRANSPARENT_HUGEPAGE mm/mglru: add dummy pmd_dirty() ... |
||
![]() |
3f6984e730 |
vfs-6.8.super
-----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZZUx4wAKCRCRxhvAZXjc osaNAQC/c+xXVfiq/pFbuK9MQLna4RGZaGcG9k312YniXbHq0AD9HAf4aPcZwPy1 /wkD4pauj3UZ3f0xBSyazGBvAXyN0Qc= =iFAQ -----END PGP SIGNATURE----- Merge tag 'vfs-6.8.super' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs super updates from Christian Brauner: "This contains the super work for this cycle including the long-awaited series by Jan to make it possible to prevent writing to mounted block devices: - Writing to mounted devices is dangerous and can lead to filesystem corruption as well as crashes. Furthermore syzbot comes with more and more involved examples how to corrupt block device under a mounted filesystem leading to kernel crashes and reports we can do nothing about. Add tracking of writers to each block device and a kernel cmdline argument which controls whether other writeable opens to block devices open with BLK_OPEN_RESTRICT_WRITES flag are allowed. Note that this effectively only prevents modification of the particular block device's page cache by other writers. The actual device content can still be modified by other means - e.g. by issuing direct scsi commands, by doing writes through devices lower in the storage stack (e.g. in case loop devices, DM, or MD are involved) etc. But blocking direct modifications of the block device page cache is enough to give filesystems a chance to perform data validation when loading data from the underlying storage and thus prevent kernel crashes. Syzbot can use this cmdline argument option to avoid uninteresting crashes. Also users whose userspace setup does not need writing to mounted block devices can set this option for hardening. We expect that this will be interesting to quite a few workloads. Btrfs is currently opted out of this because they still haven't merged patches we require for this to work from three kernel releases ago. - Reimplement block device freezing and thawing as holder operations on the block device. This allows us to extend block device freezing to all devices associated with a superblock and not just the main device. It also allows us to remove get_active_super() and thus another function that scans the global list of superblocks. Freezing via additional block devices only works if the filesystem chooses to use @fs_holder_ops for these additional devices as well. That currently only includes ext4 and xfs. Earlier releases switched get_tree_bdev() and mount_bdev() to use @fs_holder_ops. The remaining nilfs2 open-coded version of mount_bdev() has been converted to rely on @fs_holder_ops as well. So block device freezing for the main block device will continue to work as before. There should be no regressions in functionality. The only special case is btrfs where block device freezing for the main block device never worked because sb->s_bdev isn't set. Block device freezing for btrfs can be fixed once they can switch to @fs_holder_ops but that can happen whenever they're ready" * tag 'vfs-6.8.super' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (27 commits) block: Fix a memory leak in bdev_open_by_dev() super: don't bother with WARN_ON_ONCE() super: massage wait event mechanism ext4: Block writes to journal device xfs: Block writes to log device fs: Block writes to mounted block devices btrfs: Do not restrict writes to btrfs devices block: Add config option to not allow writing to mounted devices block: Remove blkdev_get_by_*() functions bcachefs: Convert to bdev_open_by_path() fs: handle freezing from multiple devices fs: remove dead check nilfs2: simplify device handling fs: streamline thaw_super_locked ext4: simplify device handling xfs: simplify device handling fs: simplify setup_bdev_super() calls blkdev: comment fs_holder_ops porting: document block device freeze and thaw changes fs: remove unused helper ... |
||
![]() |
c1f1f5bf41 |
fscrypt: document that CephFS supports fscrypt now
The help text for CONFIG_FS_ENCRYPTION and the fscrypt.rst documentation file both list the filesystems that support fscrypt. CephFS added support for fscrypt in v6.6, so add CephFS to the list. Link: https://lore.kernel.org/r/20231227045158.87276-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com> |
||
![]() |
4498a8eccc |
netfs, fscache: Remove ->begin_cache_operation
Remove ->begin_cache_operation() in favour of just calling fscache directly. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: Christian Brauner <christian@brauner.io> cc: linux-fsdevel@vger.kernel.org cc: linux-cachefs@redhat.com |
||
![]() |
3485b88390 |
mm: thp: introduce multi-size THP sysfs interface
In preparation for adding support for anonymous multi-size THP, introduce new sysfs structure that will be used to control the new behaviours. A new directory is added under transparent_hugepage for each supported THP size, and contains an `enabled` file, which can be set to "inherit" (to inherit the global setting), "always", "madvise" or "never". For now, the kernel still only supports PMD-sized anonymous THP, so only 1 directory is populated. The first half of the change converts transhuge_vma_suitable() and hugepage_vma_check() so that they take a bitfield of orders for which the user wants to determine support, and the functions filter out all the orders that can't be supported, given the current sysfs configuration and the VMA dimensions. The resulting functions are renamed to thp_vma_suitable_orders() and thp_vma_allowable_orders() respectively. Convenience functions that take a single, unencoded order and return a boolean are also defined as thp_vma_suitable_order() and thp_vma_allowable_order(). The second half of the change implements the new sysfs interface. It has been done so that each supported THP size has a `struct thpsize`, which describes the relevant metadata and is itself a kobject. This is pretty minimal for now, but should make it easy to add new per-thpsize files to the interface if needed in future (e.g. per-size defrag). Rather than keep the `enabled` state directly in the struct thpsize, I've elected to directly encode it into huge_anon_orders_[always|madvise|inherit] bitfields since this reduces the amount of work required in thp_vma_allowable_orders() which is called for every page fault. See Documentation/admin-guide/mm/transhuge.rst, as modified by this commit, for details of how the new sysfs interface works. [ryan.roberts@arm.com: fix build warning when CONFIG_SYSFS is disabled] Link: https://lkml.kernel.org/r/20231211125320.3997543-1-ryan.roberts@arm.com Link: https://lkml.kernel.org/r/20231207161211.2374093-4-ryan.roberts@arm.com Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Reviewed-by: Barry Song <v-songbaohua@oppo.com> Tested-by: Kefeng Wang <wangkefeng.wang@huawei.com> Tested-by: John Hubbard <jhubbard@nvidia.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: David Rientjes <rientjes@google.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Itaru Kitayama <itaru.kitayama@gmail.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Yang Shi <shy828301@gmail.com> Cc: Yin Fengwei <fengwei.yin@intel.com> Cc: Yu Zhao <yuzhao@google.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
![]() |
d17bb4620f |
overlayfs.rst: fix ReST formatting
Fix some indentation issues and fix missing newlines in quoted text by converting quoted text to code blocks. Reported-by: Christian Brauner <brauner@kernel.org> Suggested-by: Bagas Sanjaya <bagasdotme@gmail.com> Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com> Reviewed-by: Akira Yokosawa <akiyks@gmail.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> |
||
![]() |
bdc10bdf4b |
overlayfs.rst: use consistent feature names
Use the feature names "metacopy" and "index" consistently throughout the document. Covert the numbered list of features "redirect_dir", "index", "xino" to section headings, so that those features could be referenced in the document by their name. Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> |
||
![]() |
48aa137e5a |
docs: filesystems: document the squashfs specific mount options
When SQUASHFS_CHOICE_DECOMP_BY_MOUNT is set, the "threads" mount option can be used to specify the decompression mode: single-threaded, multi-threaded, percpu or the number of threads used for decompression. When SQUASHFS_CHOICE_DECOMP_BY_MOUNT is not set, SQUASHFS_DECOMP_MULTI and SQUASHFS_MOUNT_DECOMP_THREADS are both set, the "threads" option can also be used to specify the number of threads used for decompression. This mount option is only mentioned in fs/squashfs/Kconfig, which makes it difficult to find. Another mount option available is "errors", which can be configured to panic the kernel when squashfs errors are encountered. Add both these options to the squashfs documentation, making them more noticeable. Link: https://lkml.kernel.org/r/20231117161215.140282-1-amiculas@cisco.com Signed-off-by: Ariel Miculas <amiculas@cisco.com> Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com> Reviewed-by: Phillip Lougher <phillip@squashfs.org.uk> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Serge Hallyn <serge@hallyn.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
![]() |
af7628d6ec |
fs: convert error_remove_page to error_remove_folio
There were already assertions that we were not passing a tail page to error_remove_page(), so make the compiler enforce that by converting everything to pass and use a folio. Link: https://lkml.kernel.org/r/20231117161447.2461643-7-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
![]() |
33318c0e6b |
fscrypt.rst: update definition of struct fscrypt_context_v2
Get the copy of the fscrypt_context_v2 definition in the documentation
in sync with the actual definition, which was changed recently by
commit
|
||
![]() |
011f129fee |
Documentation: xfs: consolidate XFS docs into its own subdirectory
XFS docs are currently in upper-level Documentation/filesystems. Although these are currently 4 docs, they are already outstanding as a group and can be moved to its own subdirectory. Consolidate them into Documentation/filesystems/xfs/. Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com> Reviewed-by: Bill O'Donnell <bodonnel@redhat.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> |
||
![]() |
11ca77cdcc |
docs/fuse-io: Document the usage of DIRECT_IO_ALLOW_MMAP
By default, shared mmap is disabled in FUSE DIRECT_IO mode. However, when the DIRECT_IO_ALLOW_MMAP flag is enabled in the FUSE_INIT reply, shared mmap is allowed. Signed-off-by: Tyler Fanelli <tfanelli@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> |
||
![]() |
a8b0026847 |
rename(): avoid a deadlock in the case of parents having no common ancestor
... and fix the directory locking documentation and proof of correctness. Holding ->s_vfs_rename_mutex *almost* prevents ->d_parent changes; the case where we really don't want it is splicing the root of disconnected tree to somewhere. In other words, ->s_vfs_rename_mutex is sufficient to stabilize "X is an ancestor of Y" only if X and Y are already in the same tree. Otherwise it can go from false to true, and one can construct a deadlock on that. Make lock_two_directories() report an error in such case and update the callers of lock_rename()/lock_rename_child() to handle such errors. And yes, such conditions are not impossible to create ;-/ Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> |
||
![]() |
22e111ed6c |
rename(): fix the locking of subdirectories
We should never lock two subdirectories without having taken ->s_vfs_rename_mutex; inode pointer order or not, the "order" proposed in |
||
![]() |
1c18edd1b7 |
__dentry_kill(): new locking scheme
Currently we enter __dentry_kill() with parent (along with the victim dentry and victim's inode) held locked. Then we mark dentry refcount as dead call ->d_prune() remove dentry from hash remove it from the parent's list of children unlock the parent, don't need it from that point on detach dentry from inode, unlock dentry and drop the inode (via ->d_iput()) call ->d_release() regain the lock on dentry check if it's on a shrink list (in which case freeing its empty husk has to be left to shrink_dentry_list()) or not (in which case we can free it ourselves). In the former case, mark it as an empty husk, so that shrink_dentry_list() would know it can free the sucker. drop the lock on dentry ... and usually the caller proceeds to drop a reference on the parent, possibly retaking the lock on it. That is painful for a bunch of reasons, starting with the need to take locks out of order, but not limited to that - the parent of positive dentry can change if we drop its ->d_lock, so getting these locks has to be done with care. Moreover, as soon as dentry is out of the parent's list of children, shrink_dcache_for_umount() won't see it anymore, making it appear as if the parent is inexplicably busy. We do work around that by having shrink_dentry_list() decrement the parent's refcount first and put it on shrink list to be evicted once we are done with __dentry_kill() of child, but that may in some cases lead to ->d_iput() on child called after the parent got killed. That doesn't happen in cases where in-tree ->d_iput() instances might want to look at the parent, but that's brittle as hell. Solution: do removal from the parent's list of children in the very end of __dentry_kill(). As the result, the callers do not need to lock the parent and by the time we really need the parent locked, dentry is negative and is guaranteed not to be moved around. It does mean that ->d_prune() will be called with parent not locked. It also means that we might see dentries in process of being torn down while going through the parent's list of children; those dentries will be unhashed, negative and with refcount marked dead. In practice, that's enough for in-tree code that looks through the list of children to do the right thing as-is. Out-of-tree code might need to be adjusted. Calling conventions: __dentry_kill(dentry) is called with dentry->d_lock held, along with ->i_lock of its inode (if any). It either returns the parent (locked, with refcount decremented to 0) or NULL (if there'd been no parent or if refcount decrement for parent hadn't reached 0). lock_for_kill() is adjusted for new requirements - it doesn't touch the parent's ->d_lock at all. Callers adjusted. Note that for dput() we don't need to bother with fast_dput() for the parent - we just need to check retain_dentry() for it, since its ->d_lock is still held since the moment when __dentry_kill() had taken it to remove the victim from the list of children. The kludge with early decrement of parent's refcount in shrink_dentry_list() is no longer needed - shrink_dcache_for_umount() sees the half-killed dentries in the list of children for as long as they are pinning the parent. They are easily recognized and accounted for by select_collect(), so we know we are not done yet. As the result, we always have the expected ordering for ->d_iput()/->d_release() vs. __dentry_kill() of the parent, no exceptions. Moreover, the current rules for shrink lists (one must make sure that shrink_dcache_for_umount() won't happen while any dentries from the superblock in question are on any shrink lists) are gone - shrink_dcache_for_umount() will do the right thing in all cases, taking such dentries out. Their empty husks (memory occupied by struct dentry itself + its external name, if any) will remain on the shrink lists, but they are no obstacles to filesystem shutdown. And such husks will get freed as soon as shrink_dentry_list() of the list they are on gets to them. Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> |
||
![]() |
2f42f1eb90 |
Call retain_dentry() with refcount 0
Instead of bumping it from 0 to 1, calling retain_dentry(), then decrementing it back to 0 (with ->d_lock held all the way through), just leave refcount at 0 through all of that. It will have a visible effect for ->d_delete() - now it can be called with refcount 0 instead of 1 and it can no longer play silly buggers with dropping/regaining ->d_lock. Not that any in-tree instances tried to (it's pretty hard to get right). Any out-of-tree ones will have to adjust (assuming they need any changes). Note that we do not need to extend rcu-critical area here - we have verified that refcount is non-negative after having grabbed ->d_lock, so nobody will be able to free dentry until they get into __dentry_kill(), which won't happen until they manage to grab ->d_lock. Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> |
||
![]() |
da549bdd15 |
dentry: switch the lists of children to hlist
Saves a pointer per struct dentry and actually makes the things less clumsy. Cleaned the d_walk() and dcache_readdir() a bit by use of hlist_for_... iterators. A couple of new helpers - d_first_child() and d_next_sibling(), to make the expressions less awful. Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> |
||
![]() |
01bc8e9ae2
|
porting: document block device freeze and thaw changes
Link: https://lore.kernel.org/r/20230927-vfs-super-freeze-v1-7-ecc36d9ab4d9@kernel.org Link: https://lore.kernel.org/r/20231024-vfs-super-freeze-v2-9-599c19f4faac@kernel.org Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org> |
||
![]() |
4746be1d22 |
docs: vfs: fix typo in struct xattr_handlers
The structure is called struct xattr_handler, singular, not plural. Fixing the typo also makes it greppable with the whole word matching flag. Signed-off-by: Ariel Miculas <amiculas@cisco.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Message-ID: <20231027152101.226296-1-amiculas@cisco.com> |
||
![]() |
62b241efff |
MAINTAINERS: erofs: add EROFS webpage
Add a new `W:` field of the EROFS entry points to the documentation site at <https://erofs.docs.kernel.org>. In addition, update the in-tree documentation and Kconfig too. Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20231117085329.1624223-1-hsiangkao@linux.alibaba.com |
||
![]() |
13d88ac54d |
vfs-6.7.fsid
-----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZUpEaAAKCRCRxhvAZXjc ounBAQCAoS66gnOZ+k4kOWwB2zZ1Ueh3dPFC7IcEZ+pwFS8hpAEAxUQxV0TSWf5l W/1oKRtAJyuSYvehHeMUSJmHVBiM8w4= =bNm0 -----END PGP SIGNATURE----- Merge tag 'vfs-6.7.fsid' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fanotify fsid updates from Christian Brauner: "This work is part of the plan to enable fanotify to serve as a drop-in replacement for inotify. While inotify is availabe on all filesystems, fanotify currently isn't. In order to support fanotify on all filesystems two things are needed: (1) all filesystems need to support AT_HANDLE_FID (2) all filesystems need to report a non-zero f_fsid This contains (1) and allows filesystems to encode non-decodable file handlers for fanotify without implementing any exportfs operations by encoding a file id of type FILEID_INO64_GEN from i_ino and i_generation. Filesystems that want to opt out of encoding non-decodable file ids for fanotify that don't support NFS export can do so by providing an empty export_operations struct. This also partially addresses (2) by generating f_fsid for simple filesystems as well as freevxfs. Remaining filesystems will be dealt with by separate patches. Finally, this contains the patch from the current exportfs maintainers which moves exportfs under vfs with Chuck, Jeff, and Amir as maintainers and vfs.git as tree" * tag 'vfs-6.7.fsid' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: MAINTAINERS: create an entry for exportfs fs: fix build error with CONFIG_EXPORTFS=m or not defined freevxfs: derive f_fsid from bdev->bd_dev fs: report f_fsid from s_dev for "simple" filesystems exportfs: support encoding non-decodeable file handles by default exportfs: define FILEID_INO64_GEN* file handle types exportfs: make ->encode_fh() a mandatory method for NFS export exportfs: add helpers to check if filesystem can encode/decode file handles |
||
![]() |
7f851936a0 |
overlayfs update for 6.7-rc1
-----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE9zuTYTs0RXF+Ke33EVvVyTe/1WoFAmVIg7wACgkQEVvVyTe/ 1WpPtw//WiQc+gwGvLEyi9YFZWTuAOO8ZGSxF7t+CU2SYy9d91s3K8dNRx2kWBh0 ycsFIYhTyq6BrHqlg1JpI4WW8S9SW7BkXbE4f5Lm/kiiGqlJn+eCA+aR8AqPMpVx KemCHJQj2WwcjlHRonAYIyQOApNePwy7EPPDlk+TlVEgMRtDHW+CY1ftChqi8bVf /aYoFdJIbliIUWNKzBeje/Hypz1a3aUrMitquoK3A91RgIgQQmnWN6yirT4Z4gKB Vd9vnnFPu9LWjCv+RUuP3C0G4zHkP92sfbEIulKqml0Vx68JZyTy5jsUzxaGmXaQ im4neCEUV88PZoAJgGFcBHbi4y0bt5WDIf7kdZGheZJv8H/X5TQTxlXle3h0qHEp Rx65OjPGzZjfia1nFzNjUCd+jCtdp02H1WxchfsRpwmXqGPYBLsNPWN+BoSaLjtL gbEMRMAs4mAJObnhAIOZclzh0boLi4IWz0yNkBoZRxkUOAUDV7UaICQoSneWj36j OH75cNrRV7vZeMQy2aLWkxGm2LabhjzDZXwO9zwHZFMKIpGg7zko10Fko9qS+20p BHe+aqWgr0qxcY41fpKzu0ftSuNs9Jsn1/I/jqmgwFw+ogRwjuxgl2LkCHOPh3cS AMwd1ZahQDQKq6Wgh1WCOi31n1d8+/Gy74IojwSC++XaPMqYB0I= =MPCb -----END PGP SIGNATURE----- Merge tag 'ovl-update-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs Pull overlayfs updates from Amir Goldstein: - Overlayfs aio cleanups and fixes Cleanups and minor fixes in preparation for factoring out of read/write passthrough code. - Overlayfs lock ordering changes Hold mnt_writers only throughout copy up instead of a long lived elevated refcount. - Add support for nesting overlayfs private xattrs There are cases where you want to use an overlayfs mount as a lowerdir for another overlayfs mount. For example, if the system rootfs is on overlayfs due to composefs, or to make it volatile (via tmpfs), then you cannot currently store a lowerdir on the rootfs, because the inner overlayfs will eat all the whiteouts and overlay xattrs. This means you can't e.g. store on the rootfs a prepared container image for use with overlayfs. This adds support for nesting of overlayfs mounts by escaping the problematic features and unescaping them when exposing to the overlayfs user. - Add new mount options for appending lowerdirs * tag 'ovl-update-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs: ovl: add support for appending lowerdirs one by one ovl: refactor layer parsing helpers ovl: store and show the user provided lowerdir mount option ovl: remove unused code in lowerdir param parsing ovl: Add documentation on nesting of overlayfs mounts ovl: Add an alternative type of whiteout ovl: Support escaped overlay.* xattrs ovl: Add OVL_XATTR_TRUSTED/USER_PREFIX_LEN macros ovl: Move xattr support to new xattrs.c file ovl: do not encode lower fh with upper sb_writers held ovl: do not open/llseek lower file with upper sb_writers held ovl: reorder ovl_want_write() after ovl_inode_lock() ovl: split ovl_want_write() into two helpers ovl: add helper ovl_file_modified() ovl: protect copying of realinode attributes to ovl inode ovl: punt write aio completion to workqueue ovl: propagate IOCB_APPEND flag on writes to realfile ovl: use simpler function to convert iocb to rw flags |
||
![]() |
87a201b43b |
Changes since last update:
- Fix inode metadata space layout documentation; - Avoid warning MicroLZMA format anymore; - Fix erofs_insert_workgroup() lockref usage; - Some cleanups. -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEQ0A6bDUS9Y+83NPFUXZn5Zlu5qoFAmVBMVoRHHhpYW5nQGtl cm5lbC5vcmcACgkQUXZn5Zlu5qq8iA/+NdvaKPFvexKuJkW2n9t4UaDEPqS6qobk B1uIaXxDhz6RdhDyhLA7ysQQi1Z8g0t/MXgCWZGM1NXJ995Y0VE+dFlT+a5QPLiu 4rz8bnZpdUHVF0jIyoru/nPZQWXsYBh2AeYLKPi+nvQoCRtoRsvdKXxv76iEF45o zE2548TXqogL3Q04oH040fRzVwYROqnF85q9uCPNFS/Sh/swxyob9waBijlzbECS GLjyJ+ZsQBBG95SzwNEPSFek5+od9ty2UKQwDJfqABOhbQT2Li6r2t0afyrlqDQY m7VvvmzkFe5uYwAkuMkAOxhnW+eBzxz/whk+8W3dw/lm4dDbe0LuCrGXVdivdf3F dO5ywSEGJnOH1FqNF1ubAWdzJ+qoaTupuOpwrGM9/lHGgxCIVjW06XzKfOv/AfaQ VR2SPTrcA6eulXsLGopVho9dtHBUQZIAmJjQcWIyCpYm15NmElDXsybN/Os2Hgo0 d2Tsx9fgsN25xRsIBPxiV6qsEXqx56Nce1U4k0VV4ali81Y22nqq6rV4AFWz8KNJ LhYFcL5YgO5RxYEpoQGkt4hyeDkxt4lc/cYqCr8E8kpNEuz0P3O27NcLnSP7CJVQ pBYfSoV5tm+SyK4toeacG+HJTnohNyLE4vIXN96awoCBfJiickY7ZxJNuf6wRbOA 4Y+aMq/lQXQ= =Yum+ -----END PGP SIGNATURE----- Merge tag 'erofs-for-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs Pull erofs updates from Gao Xiang: "Nothing exciting lands for this cycle, since we're still busying in developing support for sub-page blocks and large-folios of compressed data for new scenarios on Android. In this cycle, MicroLZMA format is marked as stable, and there are minor cleanups around documentation and codebase. In addition, it also fixes incorrect lockref usage in erofs_insert_workgroup(). Summary: - Fix inode metadata space layout documentation - Avoid warning for MicroLZMA format anymore - Fix erofs_insert_workgroup() lockref usage - Some cleanups" * tag 'erofs-for-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: erofs: fix erofs_insert_workgroup() lockref usage erofs: tidy up redundant includes erofs: get rid of ROOT_NID() erofs: simplify compression configuration parser erofs: don't warn MicroLZMA format anymore erofs: fix inode metadata space layout description in documentation |
||
![]() |
babe393974 |
The number of commits for documentation is not huge this time around, but
there are some significant changes nonetheless: - Some more Spanish-language and Chinese translations. - The much-discussed documentation of the confidential-computing threat model. - Powerpc and RISCV documentation move under Documentation/arch - these complete this particular bit of documentation churn. - A large traditional-Chinese documentation update. - A new document on backporting and conflict resolution. - Some kernel-doc and Sphinx fixes. Plus the usual smattering of smaller updates and typo fixes. -----BEGIN PGP SIGNATURE----- iQFDBAABCAAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAmVBNv8PHGNvcmJldEBs d24ubmV0AAoJEBdDWhNsDH5Y0JkH/36MOpkaDnsY69/dMRKSuD4mAAP2H6LS8V63 SsMgH5VCj8lcy/Tz1+J89t14pbcX8l0viKxSo4UxvzoJ5snrz8A8gZ9oqY7NCcNs nMtolnN5IwdbgGnEGqASSLsl07lnabhRK0VYv9ZO7lHjYQp97VsJ/qrjJn385HFE vYW8iRcxcKdwtuuwOtbPcdAMjP54saJdNC5wMLsfMR0csKcGbzaSNpqpiGovzT7l phG2DSxrJH0gUZyeGPryroNppaf+mVKSDSiwRdI8mzm0J67p6dZYYwBS1Iw6Awbf 8iYoj6W63/FVQbXffPx5d6ffOSQh4JkAskxgBUOzluSGusSDc+4= =9HU5 -----END PGP SIGNATURE----- Merge tag 'docs-6.7' of git://git.lwn.net/linux Pull documentation updates from Jonathan Corbet: "The number of commits for documentation is not huge this time around, but there are some significant changes nonetheless: - Some more Spanish-language and Chinese translations - The much-discussed documentation of the confidential-computing threat model - Powerpc and RISCV documentation move under Documentation/arch - these complete this particular bit of documentation churn - A large traditional-Chinese documentation update - A new document on backporting and conflict resolution - Some kernel-doc and Sphinx fixes Plus the usual smattering of smaller updates and typo fixes" * tag 'docs-6.7' of git://git.lwn.net/linux: (40 commits) scripts/kernel-doc: Fix the regex for matching -Werror flag docs: backporting: address feedback Documentation: driver-api: pps: Update PPS generator documentation speakup: Document USB support doc: blk-ioprio: Bring the doc in line with the implementation docs: usb: fix reference to nonexistent file in UVC Gadget docs: doc-guide: mention 'make refcheckdocs' Documentation: fix typo in dynamic-debug howto scripts/kernel-doc: match -Werror flag strictly Documentation/sphinx: Remove the repeated word "the" in comments. docs: sparse: add SPDX-License-Identifier docs/zh_CN: Add subsystem-apis Chinese translation docs/zh_TW: update contents for zh_TW docs: submitting-patches: encourage direct notifications to commenters docs: add backporting and conflict resolution document docs: move riscv under arch docs: update link to powerpc/vmemmap_dedup.rst mm/memory-hotplug: fix typo in documentation docs: move powerpc under arch PCI: Update the devres documentation regarding to pcim_*() ... |
||
![]() |
1b10d2c8c6 |
Bootconfig for v6.7:
- Documentation update for /proc/cmdline, which includes both the parameters from bootloader and the embedded parameters in the kernel. - fs/proc: Add bootloader argument as a comment line to /proc/bootconfig so that the user can distinguish what parameters were passed from bootloader even if bootconfig modified that. - Documentation fix to add /proc/bootconfig to proc.rst. -----BEGIN PGP SIGNATURE----- iQFPBAABCgA5FiEEh7BulGwFlgAOi5DV2/sHvwUrPxsFAmVAzokbHG1hc2FtaS5o aXJhbWF0c3VAZ21haWwuY29tAAoJENv7B78FKz8bJxMH+QHgxumk/zhQ6TG1szG3 TWrkhg5oP2hGRTbJc6lK/BJM5E/xz6SNoGP6nEEW5JdkzKwNyNSLab+J0as9Ua3V KtJGiAtp4FZpVl6B2y5UBqBerRl8U4NjW8TOsEv8Dy4gVlgjdbyS7fzy3T86DEZ5 QGQ004pYVciSzmRvEmk3f0GUqsgJUYOjZWIGp2+fukpTnJhD4o3K3UwUs4Rz1tF7 cnVm34af67648/mGSxB+qHBHq4dznrWopv/SgH6OcEzOOpvEGAwPuteAms3S8g3j WOZNHB7QlNkVlASlAcC+yPHjJBtWMfRTsWQIxW5F0wSqq0SRCzTMLHU1wltJagyq lBQ= =J8P0 -----END PGP SIGNATURE----- Merge tag 'bootconfig-v6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull bootconfig updates from Masami Hiramatsu: - Documentation update for /proc/cmdline, which includes both the parameters from bootloader and the embedded parameters in the kernel - fs/proc: Add bootloader argument as a comment line to /proc/bootconfig so that the user can distinguish what parameters were passed from bootloader even if bootconfig modified that - Documentation fix to add /proc/bootconfig to proc.rst * tag 'bootconfig-v6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: doc: Add /proc/bootconfig to proc.rst fs/proc: Add boot loader arguments as comment to /proc/bootconfig doc: Update /proc/cmdline documentation to include boot config |
||
![]() |
acbc3ecb80 |
doc: Add /proc/bootconfig to proc.rst
Add /proc/bootconfig description to Documentation/filesystems/proc.rst. Link: https://lore.kernel.org/all/20231005171747.541123-3-paulmck@kernel.org/ Reported-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> |
||
![]() |
24e16e385f |
ovl: add support for appending lowerdirs one by one
Add new mount options lowerdir+ and datadir+ that can be used to add layers to lower layers stack one by one. Unlike the legacy lowerdir mount option, special characters (i.e. colons and cammas) are not unescaped with these new mount options. The new mount options can be repeated to compose a large stack of lower layers, but they may not be mixed with the lagacy lowerdir mount option, because for displaying lower layers in mountinfo, we do not want to mix escaped with unescaped lower layers path syntax. Similar to data-only layer rules with the lowerdir mount option, the datadir+ option must follow at least one lowerdir+ option and the lowerdir+ option must not follow the datadir+ option. If the legacy lowerdir mount option follows lowerdir+ and datadir+ mount options, it overrides them. Sepcifically, calling: fsconfig(FSCONFIG_SET_STRING, "lowerdir", "", 0); can be used to reset previously setup lower layers. Suggested-by: Miklos Szeredi <miklos@szeredi.hu> Link: https://lore.kernel.org/r/CAJfpegt7VC94KkRtb1dfHG8+4OzwPBLYqhtc8=QFUxpFJE+=RQ@mail.gmail.com/ Signed-off-by: Amir Goldstein <amir73il@gmail.com> |
||
![]() |
bb7055a734 |
ovl: Add documentation on nesting of overlayfs mounts
Signed-off-by: Alexander Larsson <alexl@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> |
||
![]() |
8829687a4a |
fscrypt updates for 6.7
This update adds support for configuring the crypto data unit size (i.e. the granularity of file contents encryption) to be less than the filesystem block size. This can allow users to use inline encryption hardware in some cases when it wouldn't otherwise be possible. In addition, there are two commits that are prerequisites for the extent-based encryption support that the btrfs folks are working on. -----BEGIN PGP SIGNATURE----- iIoEABYIADIWIQSacvsUNc7UX4ntmEPzXCl4vpKOKwUCZT8acBQcZWJpZ2dlcnNA Z29vZ2xlLmNvbQAKCRDzXCl4vpKOK+czAQDkStgX1ICJQANnxwbrg/SUVdZjPuFH sJw3sUVpBR81TwEA/SyWh3YzVNZdpE7PWNrCknrC+qnO8hd9QBEjnQfwIQc= =t44a -----END PGP SIGNATURE----- Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux Pull fscrypt updates from Eric Biggers: "This update adds support for configuring the crypto data unit size (i.e. the granularity of file contents encryption) to be less than the filesystem block size. This can allow users to use inline encryption hardware in some cases when it wouldn't otherwise be possible. In addition, there are two commits that are prerequisites for the extent-based encryption support that the btrfs folks are working on" * tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux: fscrypt: track master key presence separately from secret fscrypt: rename fscrypt_info => fscrypt_inode_info fscrypt: support crypto data unit size less than filesystem block size fscrypt: replace get_ino_and_lblk_bits with just has_32bit_inodes fscrypt: compute max_lblk_bits from s_maxbytes and block size fscrypt: make the bounce page pool opt-in instead of opt-out fscrypt: make it clearer that key_prefix is deprecated |
||
![]() |
8b16da681e |
NFSD 6.7 Release Notes
This release completes the SunRPC thread scheduler work that was begun in v6.6. The scheduler can now find an svc thread to wake in constant time and without a list walk. Thanks again to Neil Brown for this overhaul. Lorenzo Bianconi contributed infrastructure for a netlink-based NFSD control plane. The long-term plan is to provide the same functionality as found in /proc/fs/nfsd, plus some interesting additions, and then migrate the NFSD user space utilities to netlink. A long series to overhaul NFSD's NFSv4 operation encoding was applied in this release. The goals are to bring this family of encoding functions in line with the matching NFSv4 decoding functions and with the NFSv2 and NFSv3 XDR functions, preparing the way for better memory safety and maintainability. A further improvement to NFSD's write delegation support was contributed by Dai Ngo. This adds a CB_GETATTR callback, enabling the server to retrieve cached size and mtime data from clients holding write delegations. If the server can retrieve this information, it does not have to recall the delegation in some cases. The usual panoply of bug fixes and minor improvements round out this release. As always I am grateful to all contributors, reviewers, and testers. -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmU5IuoACgkQM2qzM29m f5eVsg//bVp8S93ci/oDlKfzOwH2fO5e5rna91wrDpJxkd51h6KTx55dSRG5sjAZ EywIVOann6xCtsixAPyff5Cweg2dWvzQRsy1ZnvWQ1qZBzD5KAJY5LPkeSFUCKBo Zani/qTOYbxzgFMjZx+yDSXDPKG68WYZBQK59SI7mURu4SYdk8aRyNY8mjHfr0Vh Aqrcny4oVtXV4sL5P5G/2FUW7WKT3olA3jSYlRRNMhbs2qpEemRCCrspOEMMad+b t1+ZCg+U27PMranvOJnof4RU7peZbaxDWA0gyiUbivVXVtZn9uOs0ffhktkvechL ePc33dqdp2ITdKIPA6JlaRv5WflKXQw0YYM9Kv5mcR4A2el7owL4f/pMlPhtbYwJ IOJv15KdKVN979G2e6WMYiKK+iHfaUUguhMEXnfnGoAajHOZNQiUEo3iFQAD7LDc DvMF8d9QqYmB9IW8FOYaRRfZGJOQHf3TL79Nd08z/bn5swvlvfj77leux9Sb+0/m Luk2Xvz2AJVSXE31wzabaGHkizN+BtH+e4MMbXUHBPW5jE9v7XOnEUFr4UdZyr9P Gl87A7NcrzNjJWT5TrnzM4sOslNsx46Aeg+VuNt2fSRn2dm6iBu2B8s0N4imx6dV PX1y9VSLq5WRhjrFZ1qeiZdsuTaQtrEiNDoRIQR6nCJPAV80iFk= =B4wJ -----END PGP SIGNATURE----- Merge tag 'nfsd-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd updates from Chuck Lever: "This release completes the SunRPC thread scheduler work that was begun in v6.6. The scheduler can now find an svc thread to wake in constant time and without a list walk. Thanks again to Neil Brown for this overhaul. Lorenzo Bianconi contributed infrastructure for a netlink-based NFSD control plane. The long-term plan is to provide the same functionality as found in /proc/fs/nfsd, plus some interesting additions, and then migrate the NFSD user space utilities to netlink. A long series to overhaul NFSD's NFSv4 operation encoding was applied in this release. The goals are to bring this family of encoding functions in line with the matching NFSv4 decoding functions and with the NFSv2 and NFSv3 XDR functions, preparing the way for better memory safety and maintainability. A further improvement to NFSD's write delegation support was contributed by Dai Ngo. This adds a CB_GETATTR callback, enabling the server to retrieve cached size and mtime data from clients holding write delegations. If the server can retrieve this information, it does not have to recall the delegation in some cases. The usual panoply of bug fixes and minor improvements round out this release. As always I am grateful to all contributors, reviewers, and testers" * tag 'nfsd-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (127 commits) svcrdma: Fix tracepoint printk format svcrdma: Drop connection after an RDMA Read error NFSD: clean up alloc_init_deleg() NFSD: Fix frame size warning in svc_export_parse() NFSD: Rewrite synopsis of nfsd_percpu_counters_init() nfsd: Clean up errors in nfs3proc.c nfsd: Clean up errors in nfs4state.c NFSD: Clean up errors in stats.c NFSD: simplify error paths in nfsd_svc() NFSD: Clean up nfsd4_encode_seek() NFSD: Clean up nfsd4_encode_offset_status() NFSD: Clean up nfsd4_encode_copy_notify() NFSD: Clean up nfsd4_encode_copy() NFSD: Clean up nfsd4_encode_test_stateid() NFSD: Clean up nfsd4_encode_exchange_id() NFSD: Clean up nfsd4_do_encode_secinfo() NFSD: Clean up nfsd4_encode_access() NFSD: Clean up nfsd4_encode_readdir() NFSD: Clean up nfsd4_encode_entry4() NFSD: Add an nfsd4_encode_nfs_cookie4() helper ... |
||
![]() |
3b3f874cc1 |
vfs-6.7.misc
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZTpoQAAKCRCRxhvAZXjc
ovFNAQDgIRjXfZ1Ku+USxsRRdqp8geJVaNc3PuMmYhOYhUenqgEAmC1m+p0y31dS
P6+HlL16Mqgu0tpLCcJK9BibpDZ0Ew4=
=7yD1
-----END PGP SIGNATURE-----
Merge tag 'vfs-6.7.misc' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs
Pull misc vfs updates from Christian Brauner:
"This contains the usual miscellaneous features, cleanups, and fixes
for vfs and individual fses.
Features:
- Rename and export helpers that get write access to a mount. They
are used in overlayfs to get write access to the upper mount.
- Print the pretty name of the root device on boot failure. This
helps in scenarios where we would usually only print
"unknown-block(1,2)".
- Add an internal SB_I_NOUMASK flag. This is another part in the
endless POSIX ACL saga in a way.
When POSIX ACLs are enabled via SB_POSIXACL the vfs cannot strip
the umask because if the relevant inode has POSIX ACLs set it might
take the umask from there. But if the inode doesn't have any POSIX
ACLs set then we apply the umask in the filesytem itself. So we end
up with:
(1) no SB_POSIXACL -> strip umask in vfs
(2) SB_POSIXACL -> strip umask in filesystem
The umask semantics associated with SB_POSIXACL allowed filesystems
that don't even support POSIX ACLs at all to raise SB_POSIXACL
purely to avoid umask stripping. That specifically means NFS v4 and
Overlayfs. NFS v4 does it because it delegates this to the server
and Overlayfs because it needs to delegate umask stripping to the
upper filesystem, i.e., the filesystem used as the writable layer.
This went so far that SB_POSIXACL is raised eve on kernels that
don't even have POSIX ACL support at all.
Stop this blatant abuse and add SB_I_NOUMASK which is an internal
superblock flag that filesystems can raise to opt out of umask
handling. That should really only be the two mentioned above. It's
not that we want any filesystems to do this. Ideally we have all
umask handling always in the vfs.
- Make overlayfs use SB_I_NOUMASK too.
- Now that we have SB_I_NOUMASK, stop checking for SB_POSIXACL in
IS_POSIXACL() if the kernel doesn't have support for it. This is a
very old patch but it's only possible to do this now with the wider
cleanup that was done.
- Follow-up work on fake path handling from last cycle. Citing mostly
from Amir:
When overlayfs was first merged, overlayfs files of regular files
and directories, the ones that are installed in file table, had a
"fake" path, namely, f_path is the overlayfs path and f_inode is
the "real" inode on the underlying filesystem.
In v6.5, we took another small step by introducing of the
backing_file container and the file_real_path() helper. This change
allowed vfs and filesystem code to get the "real" path of an
overlayfs backing file. With this change, we were able to make
fsnotify work correctly and report events on the "real" filesystem
objects that were accessed via overlayfs.
This method works fine, but it still leaves the vfs vulnerable to
new code that is not aware of files with fake path. A recent
example is commit
|
||
![]() |
e21fc2038c
|
exportfs: make ->encode_fh() a mandatory method for NFS export
Rename the default helper for encoding FILEID_INO32_GEN* file handles to generic_encode_ino32_fh() and convert the filesystems that used the default implementation to use the generic helper explicitly. After this change, exportfs_encode_inode_fh() no longer has a default implementation to encode FILEID_INO32_GEN* file handles. This is a step towards allowing filesystems to encode non-decodeable file handles for fanotify without having to implement any export_operations. Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Jeff Layton <jlayton@kernel.org> Acked-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/r/20231023180801.2953446-3-amir73il@gmail.com Acked-by: Dave Kleikamp <dave.kleikamp@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner <brauner@kernel.org> |
||
![]() |
5aa9130acb
|
porting: update locking requirements
Now that s_umount is never taken under open_mutex update the documentation to say so. Link: https://lore.kernel.org/r/20231017184823.1383356-1-hch@lst.de Signed-off-by: Christian Brauner <brauner@kernel.org> |
||
![]() |
0ede61d858
|
file: convert to SLAB_TYPESAFE_BY_RCU
In recent discussions around some performance improvements in the file handling area we discussed switching the file cache to rely on SLAB_TYPESAFE_BY_RCU which allows us to get rid of call_rcu() based freeing for files completely. This is a pretty sensitive change overall but it might actually be worth doing. The main downside is the subtlety. The other one is that we should really wait for Jann's patch to land that enables KASAN to handle SLAB_TYPESAFE_BY_RCU UAFs. Currently it doesn't but a patch for this exists. With SLAB_TYPESAFE_BY_RCU objects may be freed and reused multiple times which requires a few changes. So it isn't sufficient anymore to just acquire a reference to the file in question under rcu using atomic_long_inc_not_zero() since the file might have already been recycled and someone else might have bumped the reference. In other words, callers might see reference count bumps from newer users. For this reason it is necessary to verify that the pointer is the same before and after the reference count increment. This pattern can be seen in get_file_rcu() and __files_get_rcu(). In addition, it isn't possible to access or check fields in struct file without first aqcuiring a reference on it. Not doing that was always very dodgy and it was only usable for non-pointer data in struct file. With SLAB_TYPESAFE_BY_RCU it is necessary that callers first acquire a reference under rcu or they must hold the files_lock of the fdtable. Failing to do either one of this is a bug. Thanks to Jann for pointing out that we need to ensure memory ordering between reallocations and pointer check by ensuring that all subsequent loads have a dependency on the second load in get_file_rcu() and providing a fixup that was folded into this patch. Cc: Jann Horn <jannh@google.com> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Christian Brauner <brauner@kernel.org> |
||
![]() |
15baf55481 |
fscrypt: track master key presence separately from secret
Master keys can be in one of three states: present, incompletely removed, and absent (as per FSCRYPT_KEY_STATUS_* used in the UAPI). Currently, the way that "present" is distinguished from "incompletely removed" internally is by whether ->mk_secret exists or not. With extent-based encryption, it will be necessary to allow per-extent keys to be derived while the master key is incompletely removed, so that I/O on open files will reliably continue working after removal of the key has been initiated. (We could allow I/O to sometimes fail in that case, but that seems problematic for reasons such as writes getting silently thrown away and diverging from the existing fscrypt semantics.) Therefore, when the filesystem is using extent-based encryption, ->mk_secret can't be wiped when the key becomes incompletely removed. As a prerequisite for doing that, this patch makes the "present" state be tracked using a new field, ->mk_present. No behavior is changed yet. The basic idea here is borrowed from Josef Bacik's patch "fscrypt: use a flag to indicate that the master key is being evicted" (https://lore.kernel.org/r/e86c16dddc049ff065f877d793ad773e4c6bfad9.1696970227.git.josef@toxicpanda.com). I reimplemented it using a "present" bool instead of an "evicted" flag, fixed a couple bugs, and tried to update everything to be consistent. Note: I considered adding a ->mk_status field instead, holding one of FSCRYPT_KEY_STATUS_*. At first that seemed nice, but it ended up being more complex (despite simplifying FS_IOC_GET_ENCRYPTION_KEY_STATUS), since it would have introduced redundancy and had weird locking rules. Reviewed-by: Neal Gompa <neal@gompa.dev> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Link: https://lore.kernel.org/r/20231015061055.62673-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com> |
||
![]() |
2dd10de8e6 |
lockd: introduce safe async lock op
This patch reverts mostly commit |
||
![]() |
32db510708 |
ovl: fix regression in showing lowerdir mount option
Before commit |
||
![]() |
78a50b6a41 |
erofs: fix inode metadata space layout description in documentation
Xattrs, extents, data inline are _placed after_, not _followed by_ the corresponding inode. This patch fixes it. Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com> Reviewed-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20231010113915.436591-1-tiwei.btw@antgroup.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> |
||
![]() |
1c20c65ea0 |
doc: Update /proc/cmdline documentation to include boot config
Update the /proc/cmdline documentation to explicitly state that this file provides kernel boot parameters obtained via boot config from the kernel image as well as those supplied by the boot loader. Link: https://lore.kernel.org/all/20231005171747.541123-1-paulmck@kernel.org/ Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Arnd Bergmann <arnd@kernel.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> |
||
![]() |
b78b18fb8e |
Changes since last update:
- Fix a memory leak issue when using LZMA global compressed deduplication; - Fix empty device tags in flatdev mode; - Update documentation for recent new features. -----BEGIN PGP SIGNATURE----- iIcEABYIAC8WIQThPAmQN9sSA0DVxtI5NzHcH7XmBAUCZR7+cREceGlhbmdAa2Vy bmVsLm9yZwAKCRA5NzHcH7XmBEHRAQCYKy+4zs4J2AavzVaxRPsbgun3VfVb4pxe yqxR+vDOLgD/RAQoadAMcgKYnnDZrziLtb0myOO2SFGEIMTkI7FzEQE= =+SF9 -----END PGP SIGNATURE----- Merge tag 'erofs-for-6.6-rc5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs Pull erofs fixes from Gao Xiang: - Fix a memory leak issue when using LZMA global compressed deduplication - Fix empty device tags in flatdev mode - Update documentation for recent new features * tag 'erofs-for-6.6-rc5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: erofs: update documentation erofs: allow empty device tags in flatdev mode erofs: fix memory leak of LZMA global compressed deduplication |
||
![]() |
3048102d9d |
erofs: update documentation
- update new features like bloom filter and DEFLATE. - add documentation for the long xattr name prefixes, which was landed upstream since v6.4. Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20230928134852.31118-1-jefflexu@linux.alibaba.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> |
||
![]() |
84422aee15 |
v6.6-rc4.vfs.fixes
-----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZRKHuAAKCRCRxhvAZXjc ohOLAQDU9Fxq5UdqCdmsyi/b24XJFZlQhcVIZy2Hrhcor9TiVQEAjuECGlxFPSgj atVOWLdugDJquiHextqTEMgIecJpNw4= =uINF -----END PGP SIGNATURE----- Merge tag 'v6.6-rc4.vfs.fixes' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: "This contains the usual miscellaneous fixes and cleanups for vfs and individual fses: Fixes: - Revert ki_pos on error from buffered writes for direct io fallback - Add missing documentation for block device and superblock handling for changes merged this cycle - Fix reiserfs flexible array usage - Ensure that overlayfs sets ctime when setting mtime and atime - Disable deferred caller completions with overlayfs writes until proper support exists Cleanups: - Remove duplicate initialization in pipe code - Annotate aio kioctx_table with __counted_by" * tag 'v6.6-rc4.vfs.fixes' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs: overlayfs: set ctime when setting mtime and atime ntfs3: put resources during ntfs_fill_super() ovl: disable IOCB_DIO_CALLER_COMP porting: document superblock as block device holder porting: document new block device opening order fs/pipe: remove duplicate "offset" initializer fs-writeback: do not requeue a clean inode having skipped pages aio: Annotate struct kioctx_table with __counted_by direct_write_fallback(): on error revert the ->ki_pos update from buffered write reiserfs: Replace 1-element array with C99 style flex-array |
||
![]() |
5b11888471 |
fscrypt: support crypto data unit size less than filesystem block size
Until now, fscrypt has always used the filesystem block size as the granularity of file contents encryption. Two scenarios have come up where a sub-block granularity of contents encryption would be useful: 1. Inline crypto hardware that only supports a crypto data unit size that is less than the filesystem block size. 2. Support for direct I/O at a granularity less than the filesystem block size, for example at the block device's logical block size in order to match the traditional direct I/O alignment requirement. (1) first came up with older eMMC inline crypto hardware that only supports a crypto data unit size of 512 bytes. That specific case ultimately went away because all systems with that hardware continued using out of tree code and never actually upgraded to the upstream inline crypto framework. But, now it's coming back in a new way: some current UFS controllers only support a data unit size of 4096 bytes, and there is a proposal to increase the filesystem block size to 16K. (2) was discussed as a "nice to have" feature, though not essential, when support for direct I/O on encrypted files was being upstreamed. Still, the fact that this feature has come up several times does suggest it would be wise to have available. Therefore, this patch implements it by using one of the reserved bytes in fscrypt_policy_v2 to allow users to select a sub-block data unit size. Supported data unit sizes are powers of 2 between 512 and the filesystem block size, inclusively. Support is implemented for both the FS-layer and inline crypto cases. This patch focuses on the basic support for sub-block data units. Some things are out of scope for this patch but may be addressed later: - Supporting sub-block data units in combination with FSCRYPT_POLICY_FLAG_IV_INO_LBLK_64, in most cases. Unfortunately this combination usually causes data unit indices to exceed 32 bits, and thus fscrypt_supported_policy() correctly disallows it. The users who potentially need this combination are using f2fs. To support it, f2fs would need to provide an option to slightly reduce its max file size. - Supporting sub-block data units in combination with FSCRYPT_POLICY_FLAG_IV_INO_LBLK_32. This has the same problem described above, but also it will need special code to make DUN wraparound still happen on a FS block boundary. - Supporting use case (2) mentioned above. The encrypted direct I/O code will need to stop requiring and assuming FS block alignment. This won't be hard, but it belongs in a separate patch. - Supporting this feature on filesystems other than ext4 and f2fs. (Filesystems declare support for it via their fscrypt_operations.) On UBIFS, sub-block data units don't make sense because UBIFS encrypts variable-length blocks as a result of compression. CephFS could support it, but a bit more work would be needed to make the fscrypt_*_block_inplace functions play nicely with sub-block data units. I don't think there's a use case for this on CephFS anyway. Link: https://lore.kernel.org/r/20230925055451.59499-6-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com> |
||
![]() |
9845f7409e |
Documentation: xfs: Remove repeated word in comments
Remove the repeated word "the" in comments. Signed-off-by: Charles Han <hanchunchao@inspur.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed: Dave Chinner <dchinner@redhat.com> Signed-off-by: "Matthew Wilcox (Oracle)" <willy@infradead.org> Signed-off-by: Jonathan Corbet <corbet@lwn.net> |