Commit Graph

4647 Commits

Author SHA1 Message Date
Matthew Wilcox (Oracle)
317c31680e f2fs: Use a folio in f2fs_find_in_inline_dir()
Remove two calls to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:40 +00:00
Matthew Wilcox (Oracle)
f7725a793e f2fs: Use a folio in f2fs_recover_inline_data()
Remove four calls to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:40 +00:00
Matthew Wilcox (Oracle)
5f5efd7030 f2fs: Use a folio in f2fs_read_inline_data()
Remove two calls to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:40 +00:00
Matthew Wilcox (Oracle)
466f0e661d f2fs: Use a folio in f2fs_write_inline_data()
Remove four calls to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:40 +00:00
Matthew Wilcox (Oracle)
0999f98e6c f2fs: Remove f2fs_get_xnode_page()
All callers have now been converted to call f2fs_get_xnode_folio().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:40 +00:00
Matthew Wilcox (Oracle)
c972c546fa f2fs: Use a folio in read_xattr_block()
Remove a call to compound_head()

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:39 +00:00
Matthew Wilcox (Oracle)
6023048cf6 f2fs: Convert f2fs_convert_inline_page() to f2fs_convert_inline_folio()
Both callers have a folio, so pass it in.  Removes seven calls to
compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:39 +00:00
Matthew Wilcox (Oracle)
214235c224 f2fs: Pass folios to set_new_dnode()
Removes a lot of conversions of folios into pages.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:39 +00:00
Matthew Wilcox (Oracle)
a8a4ad25b5 f2fs: Use a folio in f2fs_truncate_xattr_node()
Remove a call to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:39 +00:00
Matthew Wilcox (Oracle)
d9554488b0 f2fs: Use a folio in f2fs_do_truncate_blocks()
Removes two calls to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:39 +00:00
Matthew Wilcox (Oracle)
8f8c0c4544 f2fs: Pass a folio to f2fs_move_rehashed_dirents()
Remove seven hidden calls to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:39 +00:00
Matthew Wilcox (Oracle)
717735db26 f2fs: Pass a folio to f2fs_move_inline_dirents()
Pass the folio through do_convert_inline_dir() to
f2fs_move_inline_dirents().  Remove a call to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:39 +00:00
Matthew Wilcox (Oracle)
9283b58a14 f2fs: Use a folio in f2fs_add_inline_entry()
Also convert f2fs_init_inode_metadata() to take a folio.  Remove
Remove three calls to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:38 +00:00
Matthew Wilcox (Oracle)
869521352d f2fs: Use a folio in f2fs_try_convert_inline_dir()
Remove two hidden calls to compound_head()

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:38 +00:00
Matthew Wilcox (Oracle)
bdbf142204 f2fs: Pass a folio to make_empty_dir()
Pass the folio into make_empty_dir() and then into
f2fs_get_new_data_folio().  Removes a call to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:38 +00:00
Matthew Wilcox (Oracle)
7c99299c9a f2fs: Use a folio in f2fs_init_inode_metadata()
Since this is the only caller of f2fs_new_inode_page(), convert that
to return a folio at the same time.  Removes four calls to
compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:38 +00:00
Matthew Wilcox (Oracle)
0e1717dd92 f2fs: Use a folio in __find_data_block()
Remove a call to f2fs_get_inode_page().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:38 +00:00
Matthew Wilcox (Oracle)
c68b0bcb29 f2fs: Use a folio in prepare_write_begin
Remove a call to f2fs_get_inode_page().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:38 +00:00
Matthew Wilcox (Oracle)
424fd5d831 f2fs: Remove f2fs_get_node_page()
All callers have now been converted to call f2fs_get_node_folio().
Remove this wrapper.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:38 +00:00
Matthew Wilcox (Oracle)
da8768c875 f2fs: Use a folio in check_index_in_prev_nodes()
Remove a hidden call to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:37 +00:00
Matthew Wilcox (Oracle)
1a116e876a f2fs: Use a folio in is_alive()
Remove four calls to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:37 +00:00
Matthew Wilcox (Oracle)
722066201f f2fs: Convert __write_node_page() to __write_node_folio()
All three callers now have a folio so pass it in.  Removes
two calls to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:37 +00:00
Matthew Wilcox (Oracle)
c795d9dbe0 f2fs: Convert f2fs_move_node_page() to f2fs_move_node_folio()
Pass the folio in from the one caller and use it throughout.
Removes eight hidden calls to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:37 +00:00
Matthew Wilcox (Oracle)
c528defa64 f2fs: Use a folio in gc_node_segment()
Remove three calls to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:37 +00:00
Matthew Wilcox (Oracle)
4e9e8f81f4 f2fs: Pass a folio to f2fs_ra_node_pages()
Removes a call to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:37 +00:00
Matthew Wilcox (Oracle)
d68833a0b7 f2fs: Use folios in truncate_partial_nodes()
Removes two calls to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:37 +00:00
Matthew Wilcox (Oracle)
878a05b09c f2fs: Use a folio in truncate_nodes()
Remove two calls to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:36 +00:00
Matthew Wilcox (Oracle)
df41014006 f2fs: Use a folio in truncate_node()
Remove two calls to compound_head()

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:36 +00:00
Matthew Wilcox (Oracle)
fb733f9870 f2fs: Use folios in f2fs_get_dnode_of_data()
Pass the folio into __get_node_folio() and f2fs_get_node_page_ra() which
becomes f2fs_get_node_folio_ra().  That function has no callers outside
node.c, so make it static.  Removes seven calls to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:36 +00:00
Matthew Wilcox (Oracle)
aa220cede5 f2fs: Add f2fs_get_node_folio()
The folio equivalent of f2fs_get_node_page().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:36 +00:00
Matthew Wilcox (Oracle)
401da8dd78 f2fs: Use a folio in f2fs_recover_xattr_data()
Remove two calls to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:36 +00:00
Matthew Wilcox (Oracle)
b3094519c1 f2fs: Use a folio in write_all_xattrs()
Removes three hidden calls to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:36 +00:00
Matthew Wilcox (Oracle)
153e4a7f89 f2fs: Add f2fs_get_xnode_folio()
The folio equivalent of f2fs_get_xnode_page().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:36 +00:00
Matthew Wilcox (Oracle)
5c93848a92 f2fs: Remove f2fs_grab_cache_page()
All callers have now been converted to f2fs_grab_cache_folio() so
we can remove this wrapper.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:35 +00:00
Matthew Wilcox (Oracle)
8b78cea812 f2fs: Use a folio in f2fs_recover_inode_page()
Save four hidden calls to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:35 +00:00
Matthew Wilcox (Oracle)
7c213e98c7 f2fs: Pass a folio to f2fs_inode_chksum_verify()
Both callers now have a folio, so pass it in.  Removes three calls
to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:35 +00:00
Matthew Wilcox (Oracle)
f0fac66fe9 f2fs: Convert read_node_page() to read_node_folio()
Both callers now have a folio, so pass it in and remove the
conversion back to a folio.  Removes two calls to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:35 +00:00
Matthew Wilcox (Oracle)
9d79652e44 f2fs: Use a folio in f2fs_ra_node_page()
Save a call to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:35 +00:00
Matthew Wilcox (Oracle)
cfa809146f f2fs: Add f2fs_new_node_folio()
Convert f2fs_new_node_page() to f2fs_new_node_folio() and add
a compatibility wrapper.  Removes five hidden calls to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:35 +00:00
Matthew Wilcox (Oracle)
e57e6ee5cd f2fs: Use a folio in f2fs_move_inline_dirents()
Remove eight hidden calls to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:34 +00:00
Matthew Wilcox (Oracle)
f8fa37fbec f2fs: Use a folio in f2fs_convert_inline_inode()
Remove a call to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:34 +00:00
Matthew Wilcox (Oracle)
2a96ddcb4a f2fs: Use a folio in move_data_block()
Remove 11 hidden calls to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:34 +00:00
Matthew Wilcox (Oracle)
0d53be2323 f2fs: Use a folio in ra_data_block()
Remove three hidden calls to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:34 +00:00
Matthew Wilcox (Oracle)
514163f699 f2fs: Use a folio in f2fs_xattr_fiemap()
Remove four hidden calls to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:34 +00:00
Matthew Wilcox (Oracle)
d2eb6d86e0 f2fs: Remove f2fs_get_new_data_page()
All callers have been converted to call f2fs_get_new_data_folio()
so delete this wrapper.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:34 +00:00
Matthew Wilcox (Oracle)
a85127c58a f2fs: Use a folio in make_empty_dir()
Remove two hidden calls to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:34 +00:00
Matthew Wilcox (Oracle)
c45ce8f76a f2fs: Use a folio in f2fs_add_regular_entry()
Remove three hidden calls to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:33 +00:00
Matthew Wilcox (Oracle)
c35cc972c3 f2fs: Use a folio in fill_zero()
Remove three hidden calls to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:33 +00:00
Matthew Wilcox (Oracle)
6965a65caf f2fs: Remove f2fs_get_lock_data_page()
All callers have now been converted to f2fs_get_lock_data_folio(),
so remove this wrapper.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:33 +00:00
Matthew Wilcox (Oracle)
c0a4bb8ae6 f2fs: Use a folio in f2fs_defragment_range()
Remove three hidden calls to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:33 +00:00
Matthew Wilcox (Oracle)
3dfcb2b43e f2fs: Use a folio in __clone_blkaddrs()
Removes five hidden calls to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:33 +00:00
Matthew Wilcox (Oracle)
48b6894305 f2fs: Add f2fs_get_new_data_folio()
Convert f2fs_get_new_data_page() into f2fs_get_new_data_folio() and
add a f2fs_get_new_data_page() wrapper.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:33 +00:00
Matthew Wilcox (Oracle)
38f273c504 f2fs: Use a folio in f2fs_migrate_blocks()
Get a folio from the pagecache and use it throughout.  Removes two
calls to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:32 +00:00
Matthew Wilcox (Oracle)
65f99d35fe f2fs: Use a folio in f2fs_get_new_data_page()
Get folios from the pagecache instead of pages.  Remove five calls to
compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:32 +00:00
Matthew Wilcox (Oracle)
c713bbb17c f2fs: Remove f2fs_find_data_page()
All callers have now been converted to call f2fs_find_data_folio().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:32 +00:00
Matthew Wilcox (Oracle)
d040455c79 f2fs: Use a folio in f2fs_readdir()
Get a folio instead of a page and use it throughout.  Also put the folio
before checking the error so we only have to do it once.  Saves two
hidden calls to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:32 +00:00
Matthew Wilcox (Oracle)
7d5a82490f f2fs: Use a folio in f2fs_delete_entry()
Get a folio instead of a page and operate on it.  Saves a call to
compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:32 +00:00
Matthew Wilcox (Oracle)
e4ca8ff450 f2fs: Use a folio in find_in_level()
Get a folio instead of a page and use it throughout.  Removes a
call to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:32 +00:00
Matthew Wilcox (Oracle)
366848cb71 f2fs: Remove f2fs_get_sum_page()
All callers have now been converted to call f2fs_get_sum_folio()
instead.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:32 +00:00
Matthew Wilcox (Oracle)
0e1073f850 f2fs: Use a folio in change_curseg()
Get a folio and use it.  Saves a call to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:31 +00:00
Matthew Wilcox (Oracle)
b536cd889e f2fs: Use a folio in check_index_in_prev_nodes()
Get a folio instead of a page and operate on it.  Saves a call to
compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:31 +00:00
Matthew Wilcox (Oracle)
5d895f7bea f2fs: Use folios in do_garbage_collect()
Get a folio instead of a page and operate on folios throughout.
Remove six calls to compound_head() and use folio_put_refs() to put
both references we hold at the same time, reducing the number of atomic
operations we do.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:31 +00:00
Matthew Wilcox (Oracle)
4a2c49d2cb f2fs: Add f2fs_get_sum_folio()
Convert f2fs_get_sum_page() to f2fs_get_sum_folio() and add a
f2fs_get_sum_page() wrapper.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:31 +00:00
Matthew Wilcox (Oracle)
6b5c4a0359 f2fs: Pass an address to scan_nat_page()
Remove a conversion from folio to page by passing in the address of the
first byte rather than the struct page containing it.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:31 +00:00
Matthew Wilcox (Oracle)
350b8441c0 f2fs: Convert f2fs_get_meta_page_retry() to f2fs_get_meta_folio_retry()
Also convert get_current_nat_page() to get_current_nat_folio().
Removes three hidden calls to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:31 +00:00
Matthew Wilcox (Oracle)
657b31b2d7 f2fs: Remove f2fs_get_meta_page()
All callers have now been converted to f2fs_get_meta_folio() so
we can remove this wrapper.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:31 +00:00
Matthew Wilcox (Oracle)
9fdb4325e0 f2fs: Use a folio in read_normal_summaries()
Get a folio instead of a page.  Saves a hidden call to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:30 +00:00
Matthew Wilcox (Oracle)
3a34e0cdd9 f2fs: Use a folio in read_compacted_summaries()
Get a folio instead of a page.  Saves two hidden calls to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:30 +00:00
Matthew Wilcox (Oracle)
8a6fb4cc55 f2fs: Use a folio in __get_nat_bitmaps()
Get a folio instead of a page.  Saves a hidden call to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:30 +00:00
Matthew Wilcox (Oracle)
aa0c14ef54 f2fs: Use a folio in f2fs_get_node_info()
Get a folio instead of a page.  Saves a hidden call to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:30 +00:00
Matthew Wilcox (Oracle)
eb639c8562 f2fs: Use a folio in f2fs_get_valid_checkpoint()
Convert validate_checkpoint() to return a folio and use it throughout
f2fs_get_valid_checkpoint().  Saves five hidden calls to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:30 +00:00
Matthew Wilcox (Oracle)
a2c746eac3 f2fs: Use a folio in validate_checkpoint()
Convert get_checkpoint_version() to take a folio and use it throughout
validate_checkpoint().  Saves five hidden calls to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:30 +00:00
Matthew Wilcox (Oracle)
375452b507 f2fs: Use a folio in f2fs_recover_orphan_inodes()
Get a folio instead of a page and use it throughout.  Saves two
hidden calls to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:30 +00:00
Matthew Wilcox (Oracle)
6225716f38 f2fs: Use a folio in build_sit_entries()
Convert get_current_sit_page() to get_current_sit_folio() and then
use the folio in build_sit_entries().  Saves a hidden call to
compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:29 +00:00
Matthew Wilcox (Oracle)
d6f3066301 f2fs: Add f2fs_get_meta_folio()
Convert f2fs_get_meta_page() to f2fs_get_meta_folio() and add
f2fs_get_meta_page() as a wrapper.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:29 +00:00
Matthew Wilcox (Oracle)
2424ee9b75 f2fs: Remove f2fs_grab_meta_page()
All callers have now been converted to f2fs_grab_meta_folio() so
we can remove this wrapper.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:29 +00:00
Matthew Wilcox (Oracle)
1ec3662901 f2fs: Use a folio in write_compacted_summaries()
Grab a folio instead of a page.  Saves four hidden calls to
compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:29 +00:00
Matthew Wilcox (Oracle)
43b3ed1c6c f2fs: Use a folio in write_current_sum_page()
Grab a folio instead of a page.  Saves two hidden calls to
compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:29 +00:00
Matthew Wilcox (Oracle)
5c1b57bb83 f2fs: Use a folio in f2fs_update_meta_page()
Grab a folio instead of a page.  Saves two hidden calls to
compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:29 +00:00
Matthew Wilcox (Oracle)
9c6b0f120e f2fs: Convert get_next_sit_page() to get_next_sit_folio()
Grab a folio instead of a page.  Also convert seg_info_to_sit_page() to
seg_info_to_sit_folio() and use a folio in f2fs_flush_sit_entries().
Saves a couple of calls to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:29 +00:00
Matthew Wilcox (Oracle)
5df510c8df f2fs: Use a folio in get_next_nat_page()
Call f2fs_grab_meta_folio() instead of f2fs_grab_meta_page().
Saves a hidden call to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:28 +00:00
Matthew Wilcox (Oracle)
643d16687d f2fs: Use a folio in write_orphan_inodes()
Call f2fs_grab_meta_folio() instead of f2fs_grab_meta_page().
Removes four hidden calls to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:28 +00:00
Matthew Wilcox (Oracle)
2525a78473 f2fs: Use a folio in f2fs_ra_meta_pages_cond()
Remove a call to find_get_page().  Saves two hidden calls to
compound_head().  Change f2fs_folio_put() to check for IS_ERR_OR_NULL
to handle the case where we got an error pointer back from
filemap_get_folio().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:27 +00:00
Matthew Wilcox (Oracle)
95e3117621 f2fs: Use a folio in f2fs_ra_meta_pages()
Remove three hidden calls to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:22:44 +00:00
Matthew Wilcox (Oracle)
f24f7f8cd6 f2fs: Pass a folio to next_blkaddr_of_node()
Pass the folio into sanity_check_node_footer() so that we can pass
it further into next_blkaddr_of_node().  Removes a lot of conversions
from folio->page.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:22:44 +00:00
Matthew Wilcox (Oracle)
937d6a4d2c f2fs: Convert f2fs_get_tmp_page() to f2fs_get_tmp_folio()
Convert all the callers to receive a folio.  Removes a lot of
hidden calls to compound_head() in f2fs_put_page().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:22:44 +00:00
Matthew Wilcox (Oracle)
9030d55aed f2fs: Convert __get_meta_page() to __get_meta_folio()
Push the conversion to a page into the callers.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:22:44 +00:00
Matthew Wilcox (Oracle)
b629c6480e f2fs: Pass a folio to f2fs_submit_merged_ipu_write()
The only caller which passes a page already has a folio, so pass
it in.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:22:44 +00:00
Matthew Wilcox (Oracle)
46fd261c67 f2fs: Use f2fs_folio_wait_writeback()
There were some missing conversions from f2fs_wait_on_page_writeback()
to f2fs_folio_wait_writeback().  Saves a call to compound_head() at each
callsite.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:22:43 +00:00
Matthew Wilcox (Oracle)
a8d3973863 f2fs: Convert __f2fs_write_meta_page() to __f2fs_write_meta_folio()
All callers now have a folio so pass it in.  Saves three hidden calls to
compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:22:43 +00:00
Matthew Wilcox (Oracle)
668c7a5648 f2fs: Use a folio in commit_checkpoint()
Save four calls to compound_head().  Also remove the call
to f2fs_wait_on_page_writeback() as this was already done by
f2fs_grab_meta_folio() and writeback can't have restarted in the
meantime since we hold the folio locked.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:22:43 +00:00
Matthew Wilcox (Oracle)
b15ca18571 f2fs: Add f2fs_grab_meta_folio()
Turn f2fs_grab_meta_page() into a wrapper around f2fs_grab_meta_folio().
Saves three hidden calls to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:22:43 +00:00
Matthew Wilcox (Oracle)
0d1e687e43 f2fs: Use a folio in f2fs_quota_read()
Support arbitrary size folios and remove a few hidden calls to
compound_head().  Also remove an unnecessary test of the uptodaate flag;
if mapping_read_folio_gfp() cannot bring the folio uptodate, it will
return an error.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:21:35 +00:00
Matthew Wilcox (Oracle)
c14b4562bc f2fs: Use a folio in move_data_block()
Fetch a folio from the pagecache instead of a page and operate on it
throughout.  Removes eight calls to compound_head() and an access to
page->mapping.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:21:33 +00:00
Matthew Wilcox (Oracle)
6f8b9318c6 f2fs: Remove access to page->mapping in f2fs_is_cp_guaranteed()
page->mapping will be removed soon, so call page_folio() on the
page that was passed in.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:18:13 +00:00
Matthew Wilcox (Oracle)
1db30d8236 f2fs: Use a folio in add_ipu_page()
Convert the incoming page to a folio at the start, then use the
folio later in the function.  Moves a call to page_folio() earlier,
and removes an access to page->mapping.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:18:13 +00:00
Matthew Wilcox (Oracle)
acede6a573 f2fs: Use bio_for_each_folio_all() in __has_merged_page()
Iterate over each folio rather than each page.  Convert
f2fs_compress_control_page() to f2fs_compress_control_folio() since
this is the only caller.  Removes a reference to page->mapping which
is going away soon as well as calls to fscrypt_is_bounce_page() and
fscrypt_pagecache_page().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:18:13 +00:00
Matthew Wilcox (Oracle)
98251710ca f2fs: Use F2FS_P_SB() in f2fs_is_compressed_page()
Remove a reference to page->mapping which is going away soon.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:18:13 +00:00
Matthew Wilcox (Oracle)
4e5109c7c5 f2fs: Introduce fio_inode()
This helper returns the inode associated with the f2fs_io_info.  That's a
relatively common thing to want, mildly awkward to get and provides one
place to change if we decide to record it directly, or change fio->page
to fio->folio.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
[Jaegeuk Kim: fix wrong fio_inode conversion]
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:18:13 +00:00
Matthew Wilcox (Oracle)
8cc29b38cb f2fs: Use a folio in f2fs_write_raw_pages()
Convert each page in rpages to a folio before operating on it.  Replaces
eight calls to compound_head() with one and removes a reference to
page->mapping which is going away soon.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:18:12 +00:00
Matthew Wilcox (Oracle)
a8cb9b3d9b f2fs: Use a folio in f2fs_compress_free_page()
Convert the incoming page to a folio and operate on it.  Removes a
reference to page->mapping which is going away soon.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:18:03 +00:00
Jaegeuk Kim
42cb74a92a f2fs: prevent kernel warning due to negative i_nlink from corrupted image
WARNING: CPU: 1 PID: 9426 at fs/inode.c:417 drop_nlink+0xac/0xd0
home/cc/linux/fs/inode.c:417
Modules linked in:
CPU: 1 UID: 0 PID: 9426 Comm: syz-executor568 Not tainted
6.14.0-12627-g94d471a4f428 #2 PREEMPT(full)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
1.13.0-1ubuntu1.1 04/01/2014
RIP: 0010:drop_nlink+0xac/0xd0 home/cc/linux/fs/inode.c:417
Code: 48 8b 5d 28 be 08 00 00 00 48 8d bb 70 07 00 00 e8 f9 67 e6 ff
f0 48 ff 83 70 07 00 00 5b 5d e9 9a 12 82 ff e8 95 12 82 ff 90
&lt;0f&gt; 0b 90 c7 45 48 ff ff ff ff 5b 5d e9 83 12 82 ff e8 fe 5f e6
ff
RSP: 0018:ffffc900026b7c28 EFLAGS: 00010293
RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff8239710f
RDX: ffff888041345a00 RSI: ffffffff8239717b RDI: 0000000000000005
RBP: ffff888054509ad0 R08: 0000000000000005 R09: 0000000000000000
R10: 0000000000000000 R11: ffffffff9ab36f08 R12: ffff88804bb40000
R13: ffff8880545091e0 R14: 0000000000008000 R15: ffff8880545091e0
FS:  000055555d0c5880(0000) GS:ffff8880eb3e3000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f915c55b178 CR3: 0000000050d20000 CR4: 0000000000352ef0
Call Trace:
 <task>
 f2fs_i_links_write home/cc/linux/fs/f2fs/f2fs.h:3194 [inline]
 f2fs_drop_nlink+0xd1/0x3c0 home/cc/linux/fs/f2fs/dir.c:845
 f2fs_delete_entry+0x542/0x1450 home/cc/linux/fs/f2fs/dir.c:909
 f2fs_unlink+0x45c/0x890 home/cc/linux/fs/f2fs/namei.c:581
 vfs_unlink+0x2fb/0x9b0 home/cc/linux/fs/namei.c:4544
 do_unlinkat+0x4c5/0x6a0 home/cc/linux/fs/namei.c:4608
 __do_sys_unlink home/cc/linux/fs/namei.c:4654 [inline]
 __se_sys_unlink home/cc/linux/fs/namei.c:4652 [inline]
 __x64_sys_unlink+0xc5/0x110 home/cc/linux/fs/namei.c:4652
 do_syscall_x64 home/cc/linux/arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xc7/0x250 home/cc/linux/arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fb3d092324b
Code: 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48 83 c8 ff c3 66
2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 57 00 00 00 0f 05
&lt;48&gt; 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01
48
RSP: 002b:00007ffdc232d938 EFLAGS: 00000206 ORIG_RAX: 0000000000000057
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fb3d092324b
RDX: 00007ffdc232d960 RSI: 00007ffdc232d960 RDI: 00007ffdc232d9f0
RBP: 00007ffdc232d9f0 R08: 0000000000000001 R09: 00007ffdc232d7c0
R10: 00000000fffffffd R11: 0000000000000206 R12: 00007ffdc232eaf0
R13: 000055555d0cebb0 R14: 00007ffdc232d958 R15: 0000000000000001
 </task>

Cc: stable@vger.kernel.org
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-14 21:14:44 +00:00
Chao Yu
05872a167c f2fs: fix to do sanity check on sbi->total_valid_block_count
syzbot reported a f2fs bug as below:

------------[ cut here ]------------
kernel BUG at fs/f2fs/f2fs.h:2521!
RIP: 0010:dec_valid_block_count+0x3b2/0x3c0 fs/f2fs/f2fs.h:2521
Call Trace:
 f2fs_truncate_data_blocks_range+0xc8c/0x11a0 fs/f2fs/file.c:695
 truncate_dnode+0x417/0x740 fs/f2fs/node.c:973
 truncate_nodes+0x3ec/0xf50 fs/f2fs/node.c:1014
 f2fs_truncate_inode_blocks+0x8e3/0x1370 fs/f2fs/node.c:1197
 f2fs_do_truncate_blocks+0x840/0x12b0 fs/f2fs/file.c:810
 f2fs_truncate_blocks+0x10d/0x300 fs/f2fs/file.c:838
 f2fs_truncate+0x417/0x720 fs/f2fs/file.c:888
 f2fs_setattr+0xc4f/0x12f0 fs/f2fs/file.c:1112
 notify_change+0xbca/0xe90 fs/attr.c:552
 do_truncate+0x222/0x310 fs/open.c:65
 handle_truncate fs/namei.c:3466 [inline]
 do_open fs/namei.c:3849 [inline]
 path_openat+0x2e4f/0x35d0 fs/namei.c:4004
 do_filp_open+0x284/0x4e0 fs/namei.c:4031
 do_sys_openat2+0x12b/0x1d0 fs/open.c:1429
 do_sys_open fs/open.c:1444 [inline]
 __do_sys_creat fs/open.c:1522 [inline]
 __se_sys_creat fs/open.c:1516 [inline]
 __x64_sys_creat+0x124/0x170 fs/open.c:1516
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xf3/0x230 arch/x86/entry/syscall_64.c:94

The reason is: in fuzzed image, sbi->total_valid_block_count is
inconsistent w/ mapped blocks indexed by inode, so, we should
not trigger panic for such case, instead, let's print log and
set fsck flag.

Fixes: 39a53e0ce0 ("f2fs: add superblock and major in-memory structure")
Reported-by: syzbot+8b376a77b2f364097fbe@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-f2fs-devel/67f3c0b2.050a0220.396535.0547.GAE@google.com
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-12 17:06:57 +00:00
Chao Yu
aa00c6d5d0 f2fs: support to disable linear lookup fallback
After commit 91b587ba79 ("f2fs: Introduce linear search for
dentries"), f2fs forced to use linear lookup whenever a hash-based
lookup fails on casefolded directory, it may affect performance
for scenarios: a) create a new file w/ filename it doesn't exist
in directory, b) lookup a file which may be removed.

This patch supports to disable linear lookup fallback, so, once there
is a solution for commit 5c26d2f1d3 ("unicode: Don't special case
ignorable code points") to fix red heart unicode issue, then we can
set an encodeing flag to disable the fallback for performance recovery.

The way is kept in line w/ ext4, refer to commit 9e28059d56 ("ext4:
introduce linear search for dentries").

Cc: Daniel Lee <chullee@google.com>
Cc: Gabriel Krisman Bertazi <krisman@suse.de>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-12 17:06:16 +00:00
yohan.joung
d26fecb03e f2fs: prevent the current section from being selected as a victim during GC
When selecting a victim using next_victim_seg in a large section, the
selected section might already have been cleared and designated as the
new current section, making it actively in use.
This behavior causes inconsistency between the SIT and SSA.

F2FS-fs (dm-54): Inconsistent segment (70961) type [0, 1] in SSA and SIT
Call trace:
dump_backtrace+0xe8/0x10c
show_stack+0x18/0x28
dump_stack_lvl+0x50/0x6c
dump_stack+0x18/0x28
f2fs_stop_checkpoint+0x1c/0x3c
do_garbage_collect+0x41c/0x271c
f2fs_gc+0x27c/0x828
gc_thread_func+0x290/0x88c
kthread+0x11c/0x164
ret_from_fork+0x10/0x20

issue scenario
segs_per_sec=2
- seg#0 and seg#1 are all dirty
- all valid blocks are removed in seg#1
- gc select this sec and next_victim_seg=seg#0
- migrate seg#0, next_victim_seg=seg#1
- checkpoint -> sec(seg#0, seg#1)  becomes free
- allocator assigns sec(seg#0, seg#1) to curseg
- gc tries to migrate seg#1

Fixes: e3080b0120 ("f2fs: support subsectional garbage collection")
Signed-off-by: yohan.joung <yohan.joung@sk.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-12 17:05:41 +00:00
Jaegeuk Kim
05d3273ad0 f2fs: clean up unnecessary indentation
No functional change.

Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-10 03:59:58 +00:00
Chao Yu
061cf3a84b f2fs: fix to do sanity check on ino and xnid
syzbot reported a f2fs bug as below:

INFO: task syz-executor140:5308 blocked for more than 143 seconds.
      Not tainted 6.14.0-rc7-syzkaller-00069-g81e4f8d68c66 #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz-executor140 state:D stack:24016 pid:5308  tgid:5308  ppid:5306   task_flags:0x400140 flags:0x00000006
Call Trace:
 <TASK>
 context_switch kernel/sched/core.c:5378 [inline]
 __schedule+0x190e/0x4c90 kernel/sched/core.c:6765
 __schedule_loop kernel/sched/core.c:6842 [inline]
 schedule+0x14b/0x320 kernel/sched/core.c:6857
 io_schedule+0x8d/0x110 kernel/sched/core.c:7690
 folio_wait_bit_common+0x839/0xee0 mm/filemap.c:1317
 __folio_lock mm/filemap.c:1664 [inline]
 folio_lock include/linux/pagemap.h:1163 [inline]
 __filemap_get_folio+0x147/0xb40 mm/filemap.c:1917
 pagecache_get_page+0x2c/0x130 mm/folio-compat.c:87
 find_get_page_flags include/linux/pagemap.h:842 [inline]
 f2fs_grab_cache_page+0x2b/0x320 fs/f2fs/f2fs.h:2776
 __get_node_page+0x131/0x11b0 fs/f2fs/node.c:1463
 read_xattr_block+0xfb/0x190 fs/f2fs/xattr.c:306
 lookup_all_xattrs fs/f2fs/xattr.c:355 [inline]
 f2fs_getxattr+0x676/0xf70 fs/f2fs/xattr.c:533
 __f2fs_get_acl+0x52/0x870 fs/f2fs/acl.c:179
 f2fs_acl_create fs/f2fs/acl.c:375 [inline]
 f2fs_init_acl+0xd7/0x9b0 fs/f2fs/acl.c:418
 f2fs_init_inode_metadata+0xa0f/0x1050 fs/f2fs/dir.c:539
 f2fs_add_inline_entry+0x448/0x860 fs/f2fs/inline.c:666
 f2fs_add_dentry+0xba/0x1e0 fs/f2fs/dir.c:765
 f2fs_do_add_link+0x28c/0x3a0 fs/f2fs/dir.c:808
 f2fs_add_link fs/f2fs/f2fs.h:3616 [inline]
 f2fs_mknod+0x2e8/0x5b0 fs/f2fs/namei.c:766
 vfs_mknod+0x36d/0x3b0 fs/namei.c:4191
 unix_bind_bsd net/unix/af_unix.c:1286 [inline]
 unix_bind+0x563/0xe30 net/unix/af_unix.c:1379
 __sys_bind_socket net/socket.c:1817 [inline]
 __sys_bind+0x1e4/0x290 net/socket.c:1848
 __do_sys_bind net/socket.c:1853 [inline]
 __se_sys_bind net/socket.c:1851 [inline]
 __x64_sys_bind+0x7a/0x90 net/socket.c:1851
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Let's dump and check metadata of corrupted inode, it shows its xattr_nid
is the same to its i_ino.

dump.f2fs -i 3 chaseyu.img.raw
i_xattr_nid                             [0x       3 : 3]

So that, during mknod in the corrupted directory, it tries to get and
lock inode page twice, result in deadlock.

- f2fs_mknod
 - f2fs_add_inline_entry
  - f2fs_get_inode_page --- lock dir's inode page
   - f2fs_init_acl
    - f2fs_acl_create(dir,..)
     - __f2fs_get_acl
      - f2fs_getxattr
       - lookup_all_xattrs
        - __get_node_page --- try to lock dir's inode page

In order to fix this, let's add sanity check on ino and xnid.

Cc: stable@vger.kernel.org
Reported-by: syzbot+cc448dcdc7ae0b4e4ffa@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-f2fs-devel/67e06150.050a0220.21942d.0005.GAE@google.com
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-10 03:59:58 +00:00
Chao Yu
ecf68ffee7 f2fs: add a fast path in finish_preallocate_blocks()
This patch uses i_sem to protect access/update on f2fs_inode_info.flag
in finish_preallocate_blocks(), it avoids grabbing inode_lock() in
each open().

Signed-off-by: Chao Yu <chao@kernel.org>
Reviewed-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-10 03:59:58 +00:00
Chao Yu
773704c1ef f2fs: zone: fix to avoid inconsistence in between SIT and SSA
w/ below testcase, it will cause inconsistence in between SIT and SSA.

create_null_blk 512 2 1024 1024
mkfs.f2fs -m /dev/nullb0
mount /dev/nullb0 /mnt/f2fs/
touch /mnt/f2fs/file
f2fs_io pinfile set /mnt/f2fs/file
fallocate -l 4GiB /mnt/f2fs/file

F2FS-fs (nullb0): Inconsistent segment (0) type [1, 0] in SSA and SIT
CPU: 5 UID: 0 PID: 2398 Comm: fallocate Tainted: G           O       6.13.0-rc1 #84
Tainted: [O]=OOT_MODULE
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
Call Trace:
 <TASK>
 dump_stack_lvl+0xb3/0xd0
 dump_stack+0x14/0x20
 f2fs_handle_critical_error+0x18c/0x220 [f2fs]
 f2fs_stop_checkpoint+0x38/0x50 [f2fs]
 do_garbage_collect+0x674/0x6e0 [f2fs]
 f2fs_gc_range+0x12b/0x230 [f2fs]
 f2fs_allocate_pinning_section+0x5c/0x150 [f2fs]
 f2fs_expand_inode_data+0x1cc/0x3c0 [f2fs]
 f2fs_fallocate+0x3c3/0x410 [f2fs]
 vfs_fallocate+0x15f/0x4b0
 __x64_sys_fallocate+0x4a/0x80
 x64_sys_call+0x15e8/0x1b80
 do_syscall_64+0x68/0x130
 entry_SYSCALL_64_after_hwframe+0x67/0x6f
RIP: 0033:0x7f9dba5197ca
F2FS-fs (nullb0): Stopped filesystem due to reason: 4

The reason is f2fs_gc_range() may try to migrate block in curseg, however,
its SSA block is not uptodate due to the last summary block data is still
in cache of curseg.

In this patch, we add a condition in f2fs_gc_range() to check whether
section is opened or not, and skip block migration for opened section.

Fixes: 9703d69d9d ("f2fs: support file pinning for zoned devices")
Reviewed-by: Daeho Jeong <daehojeong@google.com>
Cc: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-10 03:59:58 +00:00
Chao Yu
db03c20c08 f2fs: fix to set atomic write status more clear
1. After we start atomic write in a database file, before committing
all data, we'd better not set inode w/ vfs dirty status to avoid
redundant updates, instead, we only set inode w/ atomic dirty status.

2. After we commit all data, before committing metadata, we need to
clear atomic dirty status, and set vfs dirty status to allow vfs flush
dirty inode.

Cc: Daeho Jeong <daehojeong@google.com>
Reported-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Reviewed-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-10 03:59:58 +00:00
Chao Yu
2be96c2147 f2fs: fix to update injection attrs according to fault_option
When we update inject type via sysfs, it shows wrong rate value as
below, there is a same problem when we update inject rate, fix it.

Before:
F2FS-fs (vdd): build fault injection attr: rate: 0, type: 0xffff
F2FS-fs (vdd): build fault injection attr: rate: 1, type: 0x0

After:
F2FS-fs (vdd): build fault injection type: 0x1
F2FS-fs (vdd): build fault injection rate: 1

Meanwhile, let's avoid turning on all fault types when we enable fault
injection via fault_injection mount option, it will lead to shutdown
filesystem or fail the mount() easily.

mount -o fault_injection=4 /dev/vdd /mnt/f2fs
F2FS-fs (vdd): build fault injection attr: rate: 4, type: 0x7fffff
F2FS-fs (vdd): inject kmalloc in f2fs_kmalloc of f2fs_fill_super+0xbdf/0x27c0

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-10 03:59:58 +00:00
Chao Yu
e073e92789 f2fs: add a proc entry show inject stats
This patch adds a proc entry named inject_stats to show total injected
count for each fault type.

cat /proc/fs/f2fs/<dev>/inject_stats
fault_type              injected_count
kmalloc                 0
kvmalloc                0
page alloc              0
page get                0
alloc bio(obsolete)     0
alloc nid               0
orphan                  0
no more block           0
too big dir depth       0
evict_inode fail        0
truncate fail           0
read IO error           0
checkpoint error        0
discard error           0
write IO error          0
slab alloc              0
dquot initialize        0
lock_op                 0
invalid blkaddr         0
inconsistent blkaddr    0
no free segment         0
inconsistent footer     0

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-10 03:59:57 +00:00
Colin Ian King
539d33455f f2fs: remove redundant assignment to variable err
The variable err is being assigned a value zero and then the following
goto page_hit reassigns err a new value. The zero assignment is redundant
and can be removed.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
[Jaegeuk Kim: clean up braces and if condition, suggested by Dan Carpenter]
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-10 03:59:57 +00:00
Linus Torvalds
81d8e5e213 f2fs-for-6.15-rc1
In this round, there are three major updates: 1) folio conversion, 2) refactor
 for mount API conversion, 3) some performance improvement such as direct IO,
 checkpoint speed, and IO priority hints. For stability, there are patches
 which add more sanity checks and fixes some major issues like i_size in
 atomic write operations and write pointer recovery in zoned devices.
 
 Enhancement:
  - huge folio converion work by Matthew Wilcox
  - clean up for mount API conversion by Eric Sandeen
  - improve direct IO speed in the overwrite case
  - add some sanity check on node consistency
  - set highest IO priority for checkpoint thread
  - keep POSIX_FADV_NOREUSE ranges and add sysfs entry to reclaim pages
  - add ioctl to get IO priority hint
  - add carve_out sysfs node for fsstat
 
 Bug fix:
  - disable nat_bits during umount to avoid potential nat entry corruption
  - fix missing i_size update on atomic writes
  - fix missing discard for active segments
  - fix running out of free segments
  - fix out-of-bounds access in f2fs_truncate_inode_blocks()
  - call f2fs_recover_quota_end() correctly
  - fix potential deadloop in prepare_compress_overwrite()
  - fix the missing write pointer correction for zoned device
  - fix to avoid panic once fallocation fails for pinfile
  - don't retry IO for corrupted data scenario
 
 There are many other clean up patches and minor bug fixes as usual.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE00UqedjCtOrGVvQiQBSofoJIUNIFAmfhjuYACgkQQBSofoJI
 UNJF+hAAip0Sf6bXwt43KR3lmbXg/YBTtPDACOs375Pn8pfRVsAPIAf5e1AnlBET
 rA0KpqUrEZHXenQFF9n4LIoHYqfuUw3EMempuvp4qgQcx15Kaajw+EGWVLYstNy2
 9dELc9DA55f/i1uvHezGfQFy6hMfXasf+tkaYk0z0ZYDStYboLgkVAY+F869q1Dl
 D16Y7Lna12h4eCxDdssIPDsLjFH/2LDn7SmhXsnpZxwK0Zx8JSo83WxXHnzXHxz4
 vUkukInKgqcBDvf6ufW/YF3/tqSs20XEXNK3cI1vyHx1dwij6Us+G0n0WJHL30QE
 zsliecR0X28JP1rJ3ldCjr+4Kxm6/u/Uwpinm2Fm1jL67UY4TZcaARBE8k20I6ND
 j/L1+sXrIdZ1aILM9/bwCjgXiVFdZbvlfGfpTj1duAkQgRd+/s9cjPlo5j5HTad5
 XJmzJz6YOaUNarhP/E31Z9SV9M2kEcmoDOTxKBg6ZcMWXZ27B6Z0ag92prd0GWk6
 rWDuVj0eP/LDY1QedbHRPbg1D84jVgcnldfPaf9ptln993skJGdgS0dkegTqMR0L
 H8RgpWOzWZI53gnQdePdej8diHmD8uRTrLf/oABm//GzTHC5BdwVpArOl1DCuLFF
 YkMtVEicgnWRmL3PgODAdTJXaDi3uGvT116i+lfMlUn9p+t93r0=
 =E+TE
 -----END PGP SIGNATURE-----

Merge tag 'f2fs-for-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs updates from Jaegeuk Kim:
 "In this round, there are three major updates: (1) folio conversion,
  (2) refactoring for mount API conversion, (3) some performance
  improvement such as direct IO, checkpoint speed, and IO priority
  hints.

  For stability, there are patches which add more sanity checks and
  fixes some major issues like i_size in atomic write operations and
  write pointer recovery in zoned devices.

  Enhancements:
   - huge folio converion work by Matthew Wilcox
   - clean up for mount API conversion by Eric Sandeen
   - improve direct IO speed in the overwrite case
   - add some sanity check on node consistency
   - set highest IO priority for checkpoint thread
   - keep POSIX_FADV_NOREUSE ranges and add sysfs entry to reclaim pages
   - add ioctl to get IO priority hint
   - add carve_out sysfs node for fsstat

  Bug fixes:
   - disable nat_bits during umount to avoid potential nat entry corruption
   - fix missing i_size update on atomic writes
   - fix missing discard for active segments
   - fix running out of free segments
   - fix out-of-bounds access in f2fs_truncate_inode_blocks()
   - call f2fs_recover_quota_end() correctly
   - fix potential deadloop in prepare_compress_overwrite()
   - fix the missing write pointer correction for zoned device
   - fix to avoid panic once fallocation fails for pinfile
   - don't retry IO for corrupted data scenario

  There are many other clean up patches and minor bug fixes as usual"

* tag 'f2fs-for-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (68 commits)
  f2fs: fix missing discard for active segments
  f2fs: optimize f2fs DIO overwrites
  f2fs: fix to avoid atomicity corruption of atomic file
  f2fs: pass sbi rather than sb to parse_options()
  f2fs: pass sbi rather than sb to quota qf_name helpers
  f2fs: defer readonly check vs norecovery
  f2fs: Pass sbi rather than sb to f2fs_set_test_dummy_encryption
  f2fs: make LAZYTIME a mount option flag
  f2fs: make INLINECRYPT a mount option flag
  f2fs: factor out an f2fs_default_check function
  f2fs: consolidate unsupported option handling errors
  f2fs: use f2fs_sb_has_device_alias during option parsing
  f2fs: add carve_out sysfs node
  f2fs: fix to avoid running out of free segments
  f2fs: Remove f2fs_write_node_page()
  f2fs: Remove f2fs_write_meta_page()
  f2fs: Remove f2fs_write_data_page()
  f2fs: Remove check for ->writepage
  Revert "f2fs: rebuild nat_bits during umount"
  f2fs: fix to avoid accessing uninitialized curseg
  ...
2025-03-27 12:55:54 -07:00
Linus Torvalds
e63046adef vfs-6.15-rc1.ceph
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZ90r/AAKCRCRxhvAZXjc
 onzyAP9QnVuYdNZhgpl40B+TnqA8F9/QAwKjaudAiC6kYWXPrgEA3SLTcmenjfzP
 8+9OqC3WVcfTWWKXB4IDK18Yk7veVQg=
 =Eu3R
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.15-rc1.ceph' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs ceph updates from Christian Brauner:
 "This contains the work to remove access to page->index from ceph
  and fixes the test failure observed for ceph with generic/421 by
  refactoring ceph_writepages_start()"

* tag 'vfs-6.15-rc1.ceph' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  fscrypt: Change fscrypt_encrypt_pagecache_blocks() to take a folio
  ceph: Fix error handling in fill_readdir_cache()
  fs: Remove page_mkwrite_check_truncate()
  ceph: Pass a folio to ceph_allocate_page_array()
  ceph: Convert ceph_move_dirty_page_in_page_array() to move_dirty_folio_in_page_array()
  ceph: Remove uses of page from ceph_process_folio_batch()
  ceph: Convert ceph_check_page_before_write() to use a folio
  ceph: Convert writepage_nounlock() to write_folio_nounlock()
  ceph: Convert ceph_readdir_cache_control to store a folio
  ceph: Convert ceph_find_incompatible() to take a folio
  ceph: Use a folio in ceph_page_mkwrite()
  ceph: Remove ceph_writepage()
  ceph: fix generic/421 test failure
  ceph: introduce ceph_submit_write() method
  ceph: introduce ceph_process_folio_batch() method
  ceph: extend ceph_writeback_ctl for ceph_writepages_start() refactoring
2025-03-24 12:17:13 -07:00
Chunhai Guo
21263d035f f2fs: fix missing discard for active segments
During a checkpoint, the current active segment X may not be handled
properly. This occurs when segment X has 0 valid blocks and a non-zero
number of discard blocks, for the following reasons:

locate_dirty_segment() does not mark any active segment as a prefree
segment. As a result, segment X is not included in dirty_segmap[PRE], and
f2fs_clear_prefree_segments() skips it when handling prefree segments.

add_discard_addrs() skips any segment with 0 valid blocks, so segment X is
also skipped.

Consequently, no `struct discard_cmd` is actually created for segment X.
However, the ckpt_valid_map and cur_valid_map of segment X are synced by
seg_info_to_raw_sit() during the current checkpoint process. As a result,
it cannot find the missing discard bits even in subsequent checkpoints.
Consequently, the value of sbi->discard_blks remains non-zero. Thus, when
f2fs is umounted, CP_TRIMMED_FLAG will not be set due to the non-zero
sbi->discard_blks.

Relevant code process:

f2fs_write_checkpoint()
    f2fs_flush_sit_entries()
         list_for_each_entry_safe(ses, tmp, head, set_list) {
             for_each_set_bit_from(segno, bitmap, end) {
                 ...
                 add_discard_addrs(sbi, cpc, false); // skip segment X due to its 0 valid blocks
                 ...
                 seg_info_to_raw_sit(); // sync ckpt_valid_map with cur_valid_map for segment X
                 ...
             }
         }
    f2fs_clear_prefree_segments(); // segment X is not included in dirty_segmap[PRE] and is skipped

This issue is easy to reproduce with the following operations:

root # mkfs.f2fs -f /dev/f2fs_dev
root # mount -t f2fs /dev/f2fs_dev /mnt_point
root # dd if=/dev/blk_dev of=/mnt_point/1.bin bs=4k count=256
root # sync
root # rm /mnt_point/1.bin
root # umount /mnt_point
root # dump.f2fs /dev/f2fs_dev | grep "checkpoint state"
Info: checkpoint state = 45 :  crc compacted_summary unmount ---- 'trimmed' flag is missing

Since add_discard_addrs() can handle active segments with non-zero valid
blocks, it is reasonable to fix this issue by allowing it to also handle
active segments with 0 valid blocks.

Fixes: b29555505d ("f2fs: add key functions for small discards")
Signed-off-by: Chunhai Guo <guochunhai@vivo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-03-18 01:26:06 +00:00
Yohan Joung
351bc76133 f2fs: optimize f2fs DIO overwrites
this is unnecessary when we know we are overwriting already allocated
blocks and the overhead of starting a transaction can be significant
especially for multithreaded workloads doing small writes.

Signed-off-by: Yohan Joung <yohan.joung@sk.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-03-18 01:26:04 +00:00
Yeongjin Gil
f098aeba04 f2fs: fix to avoid atomicity corruption of atomic file
In the case of the following call stack for an atomic file,
FI_DIRTY_INODE is set, but FI_ATOMIC_DIRTIED is not subsequently set.

f2fs_file_write_iter
  f2fs_map_blocks
    f2fs_reserve_new_blocks
      inc_valid_block_count
        __mark_inode_dirty(dquot)
          f2fs_dirty_inode

If FI_ATOMIC_DIRTIED is not set, atomic file can encounter corruption
due to a mismatch between old file size and new data.

To resolve this issue, I changed to set FI_ATOMIC_DIRTIED when
FI_DIRTY_INODE is set. This ensures that FI_DIRTY_INODE, which was
previously cleared by the Writeback thread during the commit atomic, is
set and i_size is updated.

Cc: <stable@vger.kernel.org>
Fixes: fccaa81de8 ("f2fs: prevent atomic file from being dirtied before commit")
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Reviewed-by: Sunmin Jeong <s_min.jeong@samsung.com>
Signed-off-by: Yeongjin Gil <youngjin.gil@samsung.com>
Reviewed-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-03-17 17:38:33 +00:00
Eric Sandeen
71e9bd3d5c f2fs: pass sbi rather than sb to parse_options()
With the new mount API the sb will not be available during initial option
parsing, which will happen before fill_super reads sb from disk.

Now that the sb is no longer directly referenced in parse_options, switch
it to use sbi.

(Note that all calls to f2fs_sb_has_* originating from parse_options will
need to be deferred to later before we can use the new mount API.)

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-03-13 18:16:07 +00:00
Eric Sandeen
b7de231b9d f2fs: pass sbi rather than sb to quota qf_name helpers
With the new mount api we will not have the superblock available during
option parsing. Prepare for this by passing *sbi rather than *sb.

For now, we are parsing after fill_super has been done, so sbi->sb will
exist. Under the new mount API this will require more care, but do the
simple change for now.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-03-13 18:16:06 +00:00
Eric Sandeen
9cca498759 f2fs: defer readonly check vs norecovery
Defer the readonly-vs-norecovery check until after option parsing is done
so that option parsing does not require an active superblock for the test.
Add a helpful message, while we're at it.

(I think could be moved back into parsing after we switch to the new mount
API if desired, as the fs context will have RO state available.)

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-03-13 18:16:06 +00:00
Eric Sandeen
0edcb2197e f2fs: Pass sbi rather than sb to f2fs_set_test_dummy_encryption
This removes another sb instance from parse_options()

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-03-13 18:16:06 +00:00
Eric Sandeen
9100adf326 f2fs: make LAZYTIME a mount option flag
Set LAZYTIME into sbi during parsing, and transfer it to the sb in
fill_super, so that an sb is not required during option parsing.

(Note: While lazytime is normally handled via mount flag in the vfs,
some f2fs users do expect to be able to use it as an explicit mount
option string via the mount syscall, so this option must remain.)

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-03-13 18:16:06 +00:00
Eric Sandeen
7d6ee50330 f2fs: make INLINECRYPT a mount option flag
Set INLINECRYPT into sbi during parsing, and transfer it to the sb in
fill_super, so that an sb is not required during option parsing.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-03-13 18:16:06 +00:00
Eric Sandeen
abd0e040e9 f2fs: factor out an f2fs_default_check function
The current options parsing function both parses options and validates
them - factor the validation out to reduce the size of the function and
make transition to the new mount API possible, because under the new mount
API, options are parsed one at a time, and cannot all be tested at the end
of the parsing function.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-03-13 18:16:06 +00:00
Eric Sandeen
277352b6cb f2fs: consolidate unsupported option handling errors
When certain build-time options are disabled, some mount options are not
accepted. For quota and compression, all related options are dismissed
with a single error message. For xattr, acl, and fault injection, each
option is handled individually. In addition, inline_xattr_size was missed
when CONFIG_F2FS_FS_XATTR was disabled.

Collapse xattr, acl, and fault injection errors into a single string, for
simplicity, and handle the missing inline_xattr_size case.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-03-13 18:16:06 +00:00
Eric Sandeen
64ee7503cb f2fs: use f2fs_sb_has_device_alias during option parsing
Rather than using F2FS_HAS_FEATURE directly, use f2fs_sb_has_device_alias
macro during option parsing for consistency.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-03-13 18:16:06 +00:00
Daeho Jeong
d7b549def0 f2fs: add carve_out sysfs node
For several zoned storage devices, vendors will provide extra space
which was used for device level GC than specs and F2FS can use this
space for filesystem level GC. To do that, we can reserve the space
using reserved_blocks. However, it is not enough, since this extra
space should not be shown to users. So, with this new sysfs node,
we can hide the space by substracting reserved_blocks from total
bytes.

Signed-off-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-03-13 18:15:59 +00:00
Chao Yu
f7f8932ca6 f2fs: fix to avoid running out of free segments
If checkpoint is disabled, GC can not reclaim any segments, we need
to detect such condition and bail out from fallocate() of a pinfile,
rather than letting allocator running out of free segment, which may
cause f2fs to be shutdown.

reproducer:
mkfs.f2fs -f /dev/vda 16777216
mount -o checkpoint=disable:10% /dev/vda /mnt/f2fs
for ((i=0;i<4096;i++)) do { dd if=/dev/zero of=/mnt/f2fs/$i bs=1M count=1; } done
sync
for ((i=0;i<4096;i+=2)) do { rm /mnt/f2fs/$i; } done
sync
touch /mnt/f2fs/pinfile
f2fs_io pinfile set /mnt/f2fs/pinfile
f2fs_io fallocate 0 0 4201644032 /mnt/f2fs/pinfile

cat /sys/kernel/debug/f2fs/status
output:
  - Free: 0 (0)

Fixes: f5a53edcf0 ("f2fs: support aligned pinned file")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-03-12 14:23:36 +00:00
Matthew Wilcox (Oracle)
7ff0104a80 f2fs: Remove f2fs_write_node_page()
Mappings which implement writepages should not implement writepage
as it can only harm writeback patterns.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-03-11 19:38:00 +00:00
Matthew Wilcox (Oracle)
3b47398d98 f2fs: Remove f2fs_write_meta_page()
Mappings which implement writepages should not implement writepage
as it can only harm writeback patterns.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-03-11 19:37:57 +00:00
Matthew Wilcox (Oracle)
6ad3ddbee8 f2fs: Remove f2fs_write_data_page()
Mappings which implement writepages should not implement writepage
as it can only harm writeback patterns.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-03-11 19:37:55 +00:00
Matthew Wilcox (Oracle)
448a834f89 f2fs: Remove check for ->writepage
We're almost able to remove a_ops->writepage.  This check is unnecessary
as we'll never call into __f2fs_write_data_pages() for character
devices.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-03-11 19:37:41 +00:00
Chao Yu
19426c4988 Revert "f2fs: rebuild nat_bits during umount"
This reverts commit 94c821fb28.

It reports that there is potential corruption in node footer,
the most suspious feature is nat_bits, let's revert recovery
related code.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-03-11 03:25:54 +00:00
Chao Yu
986c50f6bc f2fs: fix to avoid accessing uninitialized curseg
syzbot reports a f2fs bug as below:

F2FS-fs (loop3): Stopped filesystem due to reason: 7
kworker/u8:7: attempt to access beyond end of device
BUG: unable to handle page fault for address: ffffed1604ea3dfa
RIP: 0010:get_ckpt_valid_blocks fs/f2fs/segment.h:361 [inline]
RIP: 0010:has_curseg_enough_space fs/f2fs/segment.h:570 [inline]
RIP: 0010:__get_secs_required fs/f2fs/segment.h:620 [inline]
RIP: 0010:has_not_enough_free_secs fs/f2fs/segment.h:633 [inline]
RIP: 0010:has_enough_free_secs+0x575/0x1660 fs/f2fs/segment.h:649
 <TASK>
 f2fs_is_checkpoint_ready fs/f2fs/segment.h:671 [inline]
 f2fs_write_inode+0x425/0x540 fs/f2fs/inode.c:791
 write_inode fs/fs-writeback.c:1525 [inline]
 __writeback_single_inode+0x708/0x10d0 fs/fs-writeback.c:1745
 writeback_sb_inodes+0x820/0x1360 fs/fs-writeback.c:1976
 wb_writeback+0x413/0xb80 fs/fs-writeback.c:2156
 wb_do_writeback fs/fs-writeback.c:2303 [inline]
 wb_workfn+0x410/0x1080 fs/fs-writeback.c:2343
 process_one_work kernel/workqueue.c:3236 [inline]
 process_scheduled_works+0xa66/0x1840 kernel/workqueue.c:3317
 worker_thread+0x870/0xd30 kernel/workqueue.c:3398
 kthread+0x7a9/0x920 kernel/kthread.c:464
 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:148
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244

Commit 8b10d36537 ("f2fs: introduce FAULT_NO_SEGMENT") allows to trigger
no free segment fault in allocator, then it will update curseg->segno to
NULL_SEGNO, though, CP_ERROR_FLAG has been set, f2fs_write_inode() missed
to check the flag, and access invalid curseg->segno directly in below call
path, then resulting in panic:

- f2fs_write_inode
 - f2fs_is_checkpoint_ready
  - has_enough_free_secs
   - has_not_enough_free_secs
    - __get_secs_required
     - has_curseg_enough_space
      - get_ckpt_valid_blocks
      : access invalid curseg->segno

To avoid this issue, let's:
- check CP_ERROR_FLAG flag in prior to f2fs_is_checkpoint_ready() in
f2fs_write_inode().
- in has_curseg_enough_space(), save curseg->segno into a temp variable,
and verify its validation before use.

Fixes: 8b10d36537 ("f2fs: introduce FAULT_NO_SEGMENT")
Reported-by: syzbot+b6b347b7a4ea1b2e29b6@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/67973c2b.050a0220.11b1bb.0089.GAE@google.com
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-03-11 03:25:53 +00:00
Chao Yu
1788971e0b f2fs: introduce FAULT_INCONSISTENT_FOOTER
To simulate inconsistent node footer error.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-03-11 03:25:53 +00:00
Chao Yu
2aac2538a9 f2fs: do sanity check on xattr node footer in f2fs_get_xnode_page()
This patch introduces a new wrapper f2fs_get_xnode_page(), then, caller
can use it to load xattr block to page cache, meanwhile it will do sanity
check on xattr node footer.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-03-11 03:25:53 +00:00
Chao Yu
1cf6b5670a f2fs: do sanity check on inode footer in f2fs_get_inode_page()
This patch introduces a new wrapper f2fs_get_inode_page(), then, caller
can use it to load inode block to page cache, meanwhile it will do sanity
check on inode footer.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-03-11 03:25:53 +00:00
Chao Yu
c2ecba0265 f2fs: control nat_bits feature via mount option
Introduce a new mount option "nat_bits" to control nat_bits feature,
by default nat_bits feature is disabled.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-03-08 16:04:10 +00:00
Jaegeuk Kim
8a2d9f00d5 f2fs: set highest IO priority for checkpoint thread
The checkpoint is the top priority thread which can stop all the filesystem
operations. Let's make it RT priority.

Reviewed-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-03-05 14:37:26 +00:00
Matthew Wilcox (Oracle)
59b59a9431
fscrypt: Change fscrypt_encrypt_pagecache_blocks() to take a folio
ext4 and ceph already have a folio to pass; f2fs needs to be properly
converted but this will do for now.  This removes a reference
to page->index and page->mapping as well as removing a call to
compound_head().

Signed-off-by: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Link: https://lore.kernel.org/r/20250304170224.523141-1-willy@infradead.org
Acked-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-05 12:57:15 +01:00
Matthew Wilcox (Oracle)
0cd402baa0 f2fs: Add f2fs_find_data_folio()
Convert f2fs_find_data_page() to f2fs_find_data_folio() and add a
compatibility wrapper.  Saves six hidden calls to compound_head().
This was the last caller of f2fs_get_read_data_page(), so remove
the compatibility wrapper.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-03-04 17:02:26 +00:00
Matthew Wilcox (Oracle)
a86e109ee2 f2fs: Convert gc_data_segment() to use a folio
Use f2fs_get_read_data_folio() instead of f2fs_get_read_data_page().
Saves a hidden call to compound_head() in f2fs_put_page().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-03-04 17:02:26 +00:00
Matthew Wilcox (Oracle)
ab907aa2a2 f2fs: Convert truncate_partial_data_page() to use a folio
Retrieve a folio from the page cache and use it throughout.
Saves five hidden calls to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-03-04 17:02:26 +00:00
Matthew Wilcox (Oracle)
6d1ba45c8d f2fs: Convert move_data_page() to use a folio
Fetch a folio from the page cache and use it throughout, saving
eight hidden calls to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-03-04 17:02:26 +00:00
Matthew Wilcox (Oracle)
20f974cd21 f2fs: Add f2fs_get_lock_data_folio()
Convert f2fs_get_lock_data_page() to f2fs_get_lock_data_folio() and
add a compatibility wrapper.  Removes three hidden calls to
compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-03-04 17:02:26 +00:00
Matthew Wilcox (Oracle)
4ae71b1996 f2fs: Add f2fs_get_read_data_folio()
Convert f2fs_get_read_data_page() into f2fs_get_read_data_folio() and
add a compatibility wrapper.  Saves seven hidden calls to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-03-04 17:02:26 +00:00
Matthew Wilcox (Oracle)
b8fcb84230 f2fs: Hoist the page_folio() call to the start of f2fs_merge_page_bio()
Remove one call to compound_head() and a reference to page->mapping
by calling page_folio() early on.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-03-04 17:02:26 +00:00
Matthew Wilcox (Oracle)
922e24acb4 f2fs: Use a folio throughout __get_meta_page()
Use f2fs_grab_cache_folio() to get a folio and use it throughout,
removing seven calls to compound_head() and a reference to page->mapping.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-03-04 17:02:26 +00:00
Matthew Wilcox (Oracle)
520b17e093 f2fs: Use a folio throughout f2fs_truncate_inode_blocks()
Use f2fs_get_node_folio() to get a folio and use it throughout.  Remove a
few calls to compound_head() and a reference to page->mapping.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-03-04 17:02:25 +00:00
Matthew Wilcox (Oracle)
4d417ae2bf f2fs: Add f2fs_get_node_folio()
Change __get_node_page() to return a folio and convert back to a page in
f2fs_get_node_page() and f2fs_get_node_page_ra().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-03-04 17:02:25 +00:00
Matthew Wilcox (Oracle)
1a58a41ccc f2fs: Convert f2fs_in_warm_node_list() to take a folio
All its callers now have access to a folio, so pass it in.  Removes
an access to page->mapping.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-03-04 17:02:25 +00:00
Matthew Wilcox (Oracle)
521a468486 f2fs: Mark some functions as taking a const page pointer
The compiler can make some optimisations if we tell it that a function
call doesn't change this memory.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-03-04 17:02:25 +00:00
Matthew Wilcox (Oracle)
fb9660481e f2fs: Convert f2fs_write_end_io() to use a folio_iter
Iterate over each folio in the bio instead of each page.
Follow the pattern in ext4 for handling bounce folios.  Removes
a few calls to compound_head() and references to page->mapping.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-03-04 17:02:25 +00:00
Matthew Wilcox (Oracle)
cd8f95718c f2fs: Use a folio in do_write_page()
Convert fio->page to a folio then use it where folio APIs exist.
Removes a reference to page->mapping and a hidden call to
compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-03-04 17:02:25 +00:00
Matthew Wilcox (Oracle)
48a34c5981 f2fs: Use a folio in __get_node_page()
Retrieve a folio from the page cache and use it throughout.  Saves six
hidden calls to compound_head() and removes a reference to page->mapping.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-03-04 17:02:25 +00:00
Matthew Wilcox (Oracle)
8d77f68dae f2fs: Add f2fs_grab_cache_folio()
Convert f2fs_grab_cache_page() into f2fs_grab_cache_folio()
and add a wrapper.  Removes several calls to deprecated functions.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-03-04 17:02:24 +00:00
Matthew Wilcox (Oracle)
e11a311395 f2fs: Return a folio from last_fsync_dnode()
Convert last_page to last_folio in f2fs_fsync_node_pages() and
use folio APIs where they exist.  Saves a few hidden calls to
compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-03-04 17:02:24 +00:00
Matthew Wilcox (Oracle)
18f3814fa6 f2fs: Convert last_fsync_dnode() to use a folio
Use the folio APIs where they exist.  Saves several hidden calls to
compound_head().  Also removes a reference to page->mapping.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-03-04 17:02:24 +00:00
Matthew Wilcox (Oracle)
e23bebc3c0 f2fs: Convert f2fs_fsync_node_pages() to use a folio
Use the folio APIs where they exist.  Saves several hidden calls to
compound_head().  Also removes a reference to page->mapping.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-03-04 17:02:24 +00:00
Matthew Wilcox (Oracle)
de90f76144 f2fs: Pass a folio to flush_dirty_inode()
Its one caller now has a folio; pass it in and do page conversions where
necessary inside flush_dirty_inode().  Saves two hidden calls to
compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-03-04 17:02:24 +00:00
Matthew Wilcox (Oracle)
5d0a912848 f2fs: Convert f2fs_sync_node_pages() to use a folio
Use the folio APIs where they exist.  Saves several hidden calls to
compound_head().  Also removes a reference to page->mapping.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-03-04 17:02:24 +00:00
Matthew Wilcox (Oracle)
015d9c56bd f2fs: Convert f2fs_flush_inline_data() to use a folio
Use the folio APIs where they exist.  Saves several hidden calls to
compound_head().  Also removes a reference to page->mapping.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-03-04 17:02:24 +00:00
Matthew Wilcox (Oracle)
894ac9d330 f2fs: Add f2fs_folio_put()
Convert f2fs_put_page() to f2fs_folio_put() and add a wrapper.
Replaces three calls to compound_head() with one.

[Jaegeuk Kim: fix missing null pointer check in f2fs_put_page]
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-03-04 17:02:23 +00:00
Matthew Wilcox (Oracle)
17683927d0 f2fs: Add f2fs_folio_wait_writeback()
Convert f2fs_wait_on_page_writeback() to f2fs_folio_wait_writeback()
and add a compatibiility wrapper.  Replaces five calls to
compound_head() with one.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-03-04 17:02:23 +00:00
Chao Yu
e6494977bd f2fs: fix to avoid out-of-bounds access in f2fs_truncate_inode_blocks()
syzbot reports an UBSAN issue as below:

------------[ cut here ]------------
UBSAN: array-index-out-of-bounds in fs/f2fs/node.h:381:10
index 18446744073709550692 is out of range for type '__le32[5]' (aka 'unsigned int[5]')
CPU: 0 UID: 0 PID: 5318 Comm: syz.0.0 Not tainted 6.14.0-rc3-syzkaller-00060-g6537cfb395f3 #0
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:94 [inline]
 dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120
 ubsan_epilogue lib/ubsan.c:231 [inline]
 __ubsan_handle_out_of_bounds+0x121/0x150 lib/ubsan.c:429
 get_nid fs/f2fs/node.h:381 [inline]
 f2fs_truncate_inode_blocks+0xa5e/0xf60 fs/f2fs/node.c:1181
 f2fs_do_truncate_blocks+0x782/0x1030 fs/f2fs/file.c:808
 f2fs_truncate_blocks+0x10d/0x300 fs/f2fs/file.c:836
 f2fs_truncate+0x417/0x720 fs/f2fs/file.c:886
 f2fs_file_write_iter+0x1bdb/0x2550 fs/f2fs/file.c:5093
 aio_write+0x56b/0x7c0 fs/aio.c:1633
 io_submit_one+0x8a7/0x18a0 fs/aio.c:2052
 __do_sys_io_submit fs/aio.c:2111 [inline]
 __se_sys_io_submit+0x171/0x2e0 fs/aio.c:2081
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f238798cde9

index 18446744073709550692 (decimal, unsigned long long)
= 0xfffffffffffffc64 (hexadecimal, unsigned long long)
= -924 (decimal, long long)

In f2fs_truncate_inode_blocks(), UBSAN detects that get_nid() tries to
access .i_nid[-924], it means both offset[0] and level should zero.

The possible case should be in f2fs_do_truncate_blocks(), we try to
truncate inode size to zero, however, dn.ofs_in_node is zero and
dn.node_page is not an inode page, so it fails to truncate inode page,
and then pass zeroed free_from to f2fs_truncate_inode_blocks(), result
in this issue.

	if (dn.ofs_in_node || IS_INODE(dn.node_page)) {
		f2fs_truncate_data_blocks_range(&dn, count);
		free_from += count;
	}

I guess the reason why dn.node_page is not an inode page could be: there
are multiple nat entries share the same node block address, once the node
block address was reused, f2fs_get_node_page() may load a non-inode block.

Let's add a sanity check for such condition to avoid out-of-bounds access
issue.

Reported-by: syzbot+6653f10281a1badc749e@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/66fdcdf3.050a0220.40bef.0025.GAE@google.com
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-03-04 00:47:13 +00:00
Chao Yu
d8f5b91d77 f2fs: fix to call f2fs_recover_quota_end() correctly
f2fs_recover_quota_begin() and f2fs_recover_quota_end() should be called
in pair, there is some cases we may skip calling f2fs_recover_quota_end(),
fix it.

Fixes: e1bb7d3d9c ("f2fs: fix to recover quota data correctly")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-03-04 00:47:13 +00:00
Chao Yu
3147ee567d f2fs: fix potential deadloop in prepare_compress_overwrite()
Jan Prusakowski reported a kernel hang issue as below:

When running xfstests on linux-next kernel (6.14.0-rc3, 6.12) I
encountered a problem in generic/475 test where fsstress process
gets blocked in __f2fs_write_data_pages() and the test hangs.
The options I used are:

MKFS_OPTIONS  -- -O compression -O extra_attr -O project_quota -O quota /dev/vdc
MOUNT_OPTIONS -- -o acl,user_xattr -o discard,compress_extension=* /dev/vdc /vdc

INFO: task kworker/u8:0:11 blocked for more than 122 seconds.
      Not tainted 6.14.0-rc3-xfstests-lockdep #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kworker/u8:0    state:D stack:0     pid:11    tgid:11    ppid:2      task_flags:0x4208160 flags:0x00004000
Workqueue: writeback wb_workfn (flush-253:0)
Call Trace:
 <TASK>
 __schedule+0x309/0x8e0
 schedule+0x3a/0x100
 schedule_preempt_disabled+0x15/0x30
 __mutex_lock+0x59a/0xdb0
 __f2fs_write_data_pages+0x3ac/0x400
 do_writepages+0xe8/0x290
 __writeback_single_inode+0x5c/0x360
 writeback_sb_inodes+0x22f/0x570
 wb_writeback+0xb0/0x410
 wb_do_writeback+0x47/0x2f0
 wb_workfn+0x5a/0x1c0
 process_one_work+0x223/0x5b0
 worker_thread+0x1d5/0x3c0
 kthread+0xfd/0x230
 ret_from_fork+0x31/0x50
 ret_from_fork_asm+0x1a/0x30
 </TASK>

The root cause is: once generic/475 starts toload error table to dm
device, f2fs_prepare_compress_overwrite() will loop reading compressed
cluster pages due to IO error, meanwhile it has held .writepages lock,
it can block all other writeback tasks.

Let's fix this issue w/ below changes:
- add f2fs_handle_page_eio() in prepare_compress_overwrite() to
detect IO error.
- detect cp_error earler in f2fs_read_multi_pages().

Fixes: 4c8ff7095b ("f2fs: support data compression")
Reported-by: Jan Prusakowski <jprusakowski@google.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-03-04 00:47:13 +00:00
Leo Stone
81edb983b3 f2fs: add check for deleted inode
The syzbot reproducer mounts a f2fs image, then tries to unlink an
existing file. However, the unlinked file already has a link count of 0
when it is read for the first time in do_read_inode().

Add a check to sanity_check_inode() for i_nlink == 0.

[Chao Yu: rebase the code and fix orphan inode recovery issue]
Reported-by: syzbot+b01a36acd7007e273a83@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=b01a36acd7007e273a83
Fixes: 39a53e0ce0 ("f2fs: add superblock and major in-memory structure")
Signed-off-by: Leo Stone <leocstone@gmail.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-02-28 16:31:06 +00:00
Jaegeuk Kim
201e07aec6 f2fs: fix the missing write pointer correction
If checkpoint was disabled, we missed to fix the write pointers.

Cc: <stable@vger.kernel.org>
Fixes: 1015035609 ("f2fs: fix changing cursegs if recovery fails on zoned device")
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-02-28 16:23:01 +00:00
Chao Yu
1b60b23975 f2fs: fix to set .discard_granularity correctly
commit 4f993264fe ("f2fs: introduce discard_unit mount option") introduced
a bug, when we enable discard_unit=section option, it will set
.discard_granularity to BLKS_PER_SEC(), however discard granularity only
supports [1, 512], once section size is not equal to segment size, it will
cause issue_discard_thread() in DPOLICY_BG mode will not select discard entry
w/ any granularity to issue.

Fixes: 4f993264fe ("f2fs: introduce discard_unit mount option")
Reviewed-by: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Yohan Joung <yohan.joung@sk.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-02-28 16:22:56 +00:00
NeilBrown
88d5baf690
Change inode_operations.mkdir to return struct dentry *
Some filesystems, such as NFS, cifs, ceph, and fuse, do not have
complete control of sequencing on the actual filesystem (e.g.  on a
different server) and may find that the inode created for a mkdir
request already exists in the icache and dcache by the time the mkdir
request returns.  For example, if the filesystem is mounted twice the
directory could be visible on the other mount before it is on the
original mount, and a pair of name_to_handle_at(), open_by_handle_at()
calls could instantiate the directory inode with an IS_ROOT() dentry
before the first mkdir returns.

This means that the dentry passed to ->mkdir() may not be the one that
is associated with the inode after the ->mkdir() completes.  Some
callers need to interact with the inode after the ->mkdir completes and
they currently need to perform a lookup in the (rare) case that the
dentry is no longer hashed.

This lookup-after-mkdir requires that the directory remains locked to
avoid races.  Planned future patches to lock the dentry rather than the
directory will mean that this lookup cannot be performed atomically with
the mkdir.

To remove this barrier, this patch changes ->mkdir to return the
resulting dentry if it is different from the one passed in.
Possible returns are:
  NULL - the directory was created and no other dentry was used
  ERR_PTR() - an error occurred
  non-NULL - this other dentry was spliced in

This patch only changes file-systems to return "ERR_PTR(err)" instead of
"err" or equivalent transformations.  Subsequent patches will make
further changes to some file-systems to return a correct dentry.

Not all filesystems reliably result in a positive hashed dentry:

- NFS, cifs, hostfs will sometimes need to perform a lookup of
  the name to get inode information.  Races could result in this
  returning something different. Note that this lookup is
  non-atomic which is what we are trying to avoid.  Placing the
  lookup in filesystem code means it only happens when the filesystem
  has no other option.
- kernfs and tracefs leave the dentry negative and the ->revalidate
  operation ensures that lookup will be called to correctly populate
  the dentry.  This could be fixed but I don't think it is important
  to any of the users of vfs_mkdir() which look at the dentry.

The recommendation to use
    d_drop();d_splice_alias()
is ugly but fits with current practice.  A planned future patch will
change this.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: NeilBrown <neilb@suse.de>
Link: https://lore.kernel.org/r/20250227013949.536172-2-neilb@suse.de
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-27 20:00:17 +01:00
Jaegeuk Kim
a907f3a68e f2fs: add a sysfs entry to reclaim POSIX_FADV_NOREUSE pages
1. fadvise(fd1, POSIX_FADV_NOREUSE, {0,3});
2. fadvise(fd2, POSIX_FADV_NOREUSE, {1,2});
3. fadvise(fd3, POSIX_FADV_NOREUSE, {3,1});
4. echo 1024 > /sys/fs/f2fs/tuning/reclaim_caches_kb

This gives a way to reclaim file-backed pages by iterating all f2fs mounts until
reclaiming 1MB page cache ranges, registered by #1, #2, and #3.

5. cat /sys/fs/f2fs/tuning/reclaim_caches_kb
-> gives total number of registered file ranges.

Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-02-13 17:58:36 +00:00
Jaegeuk Kim
ef0c333cad f2fs: keep POSIX_FADV_NOREUSE ranges
This patch records POSIX_FADV_NOREUSE ranges for users to reclaim the caches
instantly off from LRU.

Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-02-13 17:58:31 +00:00
Chao Yu
48ea8b2004 f2fs: fix to avoid panic once fallocation fails for pinfile
syzbot reports a f2fs bug as below:

------------[ cut here ]------------
kernel BUG at fs/f2fs/segment.c:2746!
CPU: 0 UID: 0 PID: 5323 Comm: syz.0.0 Not tainted 6.13.0-rc2-syzkaller-00018-g7cb1b4663150 #0
RIP: 0010:get_new_segment fs/f2fs/segment.c:2746 [inline]
RIP: 0010:new_curseg+0x1f52/0x1f70 fs/f2fs/segment.c:2876
Call Trace:
 <TASK>
 __allocate_new_segment+0x1ce/0x940 fs/f2fs/segment.c:3210
 f2fs_allocate_new_section fs/f2fs/segment.c:3224 [inline]
 f2fs_allocate_pinning_section+0xfa/0x4e0 fs/f2fs/segment.c:3238
 f2fs_expand_inode_data+0x696/0xca0 fs/f2fs/file.c:1830
 f2fs_fallocate+0x537/0xa10 fs/f2fs/file.c:1940
 vfs_fallocate+0x569/0x6e0 fs/open.c:327
 do_vfs_ioctl+0x258c/0x2e40 fs/ioctl.c:885
 __do_sys_ioctl fs/ioctl.c:904 [inline]
 __se_sys_ioctl+0x80/0x170 fs/ioctl.c:892
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Concurrent pinfile allocation may run out of free section, result in
panic in get_new_segment(), let's expand pin_sem lock coverage to
include f2fs_gc(), so that we can make sure to reclaim enough free
space for following allocation.

In addition, do below changes to enhance error path handling:
- call f2fs_bug_on() only in non-pinfile allocation path in
get_new_segment().
- call reset_curseg_fields() to reset all fields of curseg in
new_curseg()

Fixes: f5a53edcf0 ("f2fs: support aligned pinned file")
Reported-by: syzbot+15669ec8c35ddf6c3d43@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-f2fs-devel/675cd64e.050a0220.37aaf.00bb.GAE@google.com
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-02-12 02:30:50 +00:00
Jaegeuk Kim
5f95c1812a f2fs: add ioctl to get IO priority hint
This patch adds an ioctl to give a per-file priority hint to attach
REQ_PRIO.

Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-02-12 02:29:30 +00:00
Chao Yu
4f91f07470 f2fs: add dump_stack() in f2fs_handle_critical_error()
To show call stack, so that we can see who causes critical error, note
that it won't call dump_stack() for shutdown path.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-02-12 02:28:22 +00:00
Chao Yu
1534747d31 f2fs: don't retry IO for corrupted data scenario
F2FS-fs (dm-105): inconsistent node block, nid:430, node_footer[nid:2198964142,ino:598252782,ofs:118300154,cpver:5409237455940746069,blkaddr:2125070942]
F2FS-fs (dm-105): inconsistent node block, nid:430, node_footer[nid:2198964142,ino:598252782,ofs:118300154,cpver:5409237455940746069,blkaddr:2125070942]
F2FS-fs (dm-105): inconsistent node block, nid:430, node_footer[nid:2198964142,ino:598252782,ofs:118300154,cpver:5409237455940746069,blkaddr:2125070942]
F2FS-fs (dm-105): inconsistent node block, nid:430, node_footer[nid:2198964142,ino:598252782,ofs:118300154,cpver:5409237455940746069,blkaddr:2125070942]
F2FS-fs (dm-105): inconsistent node block, nid:430, node_footer[nid:2198964142,ino:598252782,ofs:118300154,cpver:5409237455940746069,blkaddr:2125070942]
F2FS-fs (dm-105): inconsistent node block, nid:430, node_footer[nid:2198964142,ino:598252782,ofs:118300154,cpver:5409237455940746069,blkaddr:2125070942]
F2FS-fs (dm-105): inconsistent node block, nid:430, node_footer[nid:2198964142,ino:598252782,ofs:118300154,cpver:5409237455940746069,blkaddr:2125070942]
F2FS-fs (dm-105): inconsistent node block, nid:430, node_footer[nid:2198964142,ino:598252782,ofs:118300154,cpver:5409237455940746069,blkaddr:2125070942]
F2FS-fs (dm-105): inconsistent node block, nid:430, node_footer[nid:2198964142,ino:598252782,ofs:118300154,cpver:5409237455940746069,blkaddr:2125070942]
F2FS-fs (dm-105): inconsistent node block, nid:430, node_footer[nid:2198964142,ino:598252782,ofs:118300154,cpver:5409237455940746069,blkaddr:2125070942]

If node block is loaded successfully, but its content is inconsistent, it
doesn't need to retry IO.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-02-10 17:13:40 +00:00
Zhiguo Niu
bd409934c0 f2fs: fix to return SHRINK_EMPTY if no objects to free
Quoted from include/linux/shrinker.h
"count_objects should return the number of freeable items in the cache. If
 there are no objects to free, it should return SHRINK_EMPTY, while 0 is
 returned in cases of the number of freeable items cannot be determined
 or shrinker should skip this cache for this time (e.g., their number
 is below shrinkable limit)."

Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-02-10 17:03:19 +00:00
Chao Yu
eb85c2410d f2fs: quota: fix to avoid warning in dquot_writeback_dquots()
F2FS-fs (dm-59): checkpoint=enable has some unwritten data.

------------[ cut here ]------------
WARNING: CPU: 6 PID: 8013 at fs/quota/dquot.c:691 dquot_writeback_dquots+0x2fc/0x308
pc : dquot_writeback_dquots+0x2fc/0x308
lr : f2fs_quota_sync+0xcc/0x1c4
Call trace:
dquot_writeback_dquots+0x2fc/0x308
f2fs_quota_sync+0xcc/0x1c4
f2fs_write_checkpoint+0x3d4/0x9b0
f2fs_issue_checkpoint+0x1bc/0x2c0
f2fs_sync_fs+0x54/0x150
f2fs_do_sync_file+0x2f8/0x814
__f2fs_ioctl+0x1960/0x3244
f2fs_ioctl+0x54/0xe0
__arm64_sys_ioctl+0xa8/0xe4
invoke_syscall+0x58/0x114

checkpoint and f2fs_remount may race as below, resulting triggering warning
in dquot_writeback_dquots().

atomic write                                    remount
                                                - do_remount
                                                 - down_write(&sb->s_umount);
                                                  - f2fs_remount
- ioctl
 - f2fs_do_sync_file
  - f2fs_sync_fs
   - f2fs_write_checkpoint
    - block_operations
     - locked = down_read_trylock(&sbi->sb->s_umount)
       : fail to lock due to the write lock was held by remount
                                                 - up_write(&sb->s_umount);
     - f2fs_quota_sync
      - dquot_writeback_dquots
       - WARN_ON_ONCE(!rwsem_is_locked(&sb->s_umount))
       : trigger warning because s_umount lock was unlocked by remount

If checkpoint comes from mount/umount/remount/freeze/quotactl, caller of
checkpoint has already held s_umount lock, calling dquot_writeback_dquots()
in the context should be safe.

So let's record task to sbi->umount_lock_holder, so that checkpoint can
know whether the lock has held in the context or not by checking current
w/ it.

In addition, in order to not misrepresent caller of checkpoint, we should
not allow to trigger async checkpoint for those callers: mount/umount/remount/
freeze/quotactl.

Fixes: af033b2aa8 ("f2fs: guarantee journalled quota data by checkpoint")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-02-10 16:58:42 +00:00
Kohei Enju
53333cdf5b f2fs: remove unnecessary null checking
When __GFP_DIRECT_RECLAIM (included in both GFP_NOIO and GFP_KERNEL) is
specified, bio_alloc_bioset() never fails to allocate a bio.
Commit 67883ade7a ("f2fs: remove FAULT_ALLOC_BIO") replaced
f2fs_bio_alloc() with bio_alloc_bioset(), but null checking after
bio_alloc_bioset() was still left.

Fixes: 67883ade7a ("f2fs: remove FAULT_ALLOC_BIO")
Signed-off-by: Kohei Enju <enjuk@amazon.com>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-02-06 18:34:43 +00:00
Jaegeuk Kim
21925ede44 f2fs: introduce f2fs_base_attr for global sysfs entries
In /sys/fs/f2fs/features, there's no f2fs_sb_info, so let's avoid to get
the pointer.

Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-02-06 18:34:43 +00:00
Linus Torvalds
6d61a53dd6 f2fs-for-6.14-rc1
In this series, there are several major improvements such as 1) folio conversion
 made by Matthew, 2) speed-up of block truncation, 3) caching more dentry pages.
 In addition, we implemented a linear dentry search to address recent unicode
 regression, and figured out some false alarms that we could get rid of.
 
 Enhancement:
  - foilio conversion in various IO paths
  - optimize f2fs_truncate_data_blocks_range()
  - cache more dentry pages
  - remove unnecessary blk_finish_plug
  - procfs: show mtime in segment_bits
 
 Bug fix:
  - introduce linear search for dentries
  - don't call block truncation for aliased file
  - fix using wrong 'submitted' value in f2fs_write_cache_pages
  - fix to do sanity check correctly on i_inline_xattr_size
  - avoid trying to get invalid block address
  - fix inconsistent dirty state of atomic file
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE00UqedjCtOrGVvQiQBSofoJIUNIFAmeYV/QACgkQQBSofoJI
 UNKsPg/+NzFrK/D5nFJ6t86T2XdngzESbI+gbydA8CrT7VoAw5Es0GTswnsStnqF
 DaWWiz9TYDTJWarKMklZ8zcGwcQGAPZqyg3X+eUPb2Rfr9DK80Twov5nfzai/ZVM
 iJQuT7vAqbgJnmF1caJYghuOuJpd43U1lK/CxEomXzBCGVJipvSa7Mzh9awUS0P+
 luvTYjZXh3BISZDnqIbxVjZjcd6TKoBHVqKtz0JbrghVKJRXiVHr4IPnzUQ6hCE8
 MvN07mfQJPyIrZV1jVX/syYKUgwS/QYAmeca/uFGoYO0cSn3qAhdn0PLWpQBIB+D
 ST2SIE9penLlhCb8zN4d6Q6LwEcOWIbtcXffsix3EBCQosKqrqznV0SJ+fjGjuuw
 kX3ICsidYzB8GeHtf6dgH8dRqP4kvYnDe6P0Ho6iuxCZPHWiVauthORuMqerXFNn
 8hHtnGMqybGnT6Py51bt4qlxIgTVl3YO1643Ej8ihpCXJPoCmi6cTyK/M/KaZoaM
 6YYeTZwWbPuCclLm+iVNUPs0asxESSBqHTXm+r9NkaExtmclFyQs1edZ/pYUihq2
 CjvluyKVMuLVieU631am6X3H8sJsgepb8mjsJagtqF36DlCSW8jHgaqkl4gyi5m8
 V4c3w2rmh8IssjTCXxEGtqRQ/Qdbabo9aiFcNa37t1ov7+6GzEk=
 =PEtq
 -----END PGP SIGNATURE-----

Merge tag 'f2fs-for-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs updates from Jaegeuk Kim:
 "In this series, there are several major improvements such as folio
  conversion by Matthew, speed-up of block truncation, and caching more
  dentry pages.

  In addition, we implemented a linear dentry search to address recent
  unicode regression, and figured out some false alarms that we could
  get rid of.

  Enhancements:
   - foilio conversion in various IO paths
   - optimize f2fs_truncate_data_blocks_range()
   - cache more dentry pages
   - remove unnecessary blk_finish_plug
   - procfs: show mtime in segment_bits

  Bug fixes:
   - introduce linear search for dentries
   - don't call block truncation for aliased file
   - fix using wrong 'submitted' value in f2fs_write_cache_pages
   - fix to do sanity check correctly on i_inline_xattr_size
   - avoid trying to get invalid block address
   - fix inconsistent dirty state of atomic file"

* tag 'f2fs-for-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (32 commits)
  f2fs: fix inconsistent dirty state of atomic file
  f2fs: fix to avoid changing 'check only' behaior of recovery
  f2fs: Clean up the loop outside of f2fs_invalidate_blocks()
  f2fs: procfs: show mtime in segment_bits
  f2fs: fix to avoid return invalid mtime from f2fs_get_section_mtime()
  f2fs: Fix format specifier in sanity_check_inode()
  f2fs: avoid trying to get invalid block address
  f2fs: fix to do sanity check correctly on i_inline_xattr_size
  f2fs: remove blk_finish_plug
  f2fs: Optimize f2fs_truncate_data_blocks_range()
  f2fs: fix using wrong 'submitted' value in f2fs_write_cache_pages
  f2fs: add parameter @len to f2fs_invalidate_blocks()
  f2fs: update_sit_entry_for_release() supports consecutive blocks.
  f2fs: introduce update_sit_entry_for_release/alloc()
  f2fs: don't call block truncation for aliased file
  f2fs: Introduce linear search for dentries
  f2fs: add parameter @len to f2fs_invalidate_internal_cache()
  f2fs: expand f2fs_invalidate_compress_page() to f2fs_invalidate_compress_pages_range()
  f2fs: ensure that node info flags are always initialized
  f2fs: The GC triggered by ioctl also needs to mark the segno as victim
  ...
2025-01-27 20:58:58 -08:00
Linus Torvalds
9c5968db9e The various patchsets are summarized below. Plus of course many
indivudual patches which are described in their changelogs.
 
 - "Allocate and free frozen pages" from Matthew Wilcox reorganizes the
   page allocator so we end up with the ability to allocate and free
   zero-refcount pages.  So that callers (ie, slab) can avoid a refcount
   inc & dec.
 
 - "Support large folios for tmpfs" from Baolin Wang teaches tmpfs to use
   large folios other than PMD-sized ones.
 
 - "Fix mm/rodata_test" from Petr Tesarik performs some maintenance and
   fixes for this small built-in kernel selftest.
 
 - "mas_anode_descend() related cleanup" from Wei Yang tidies up part of
   the mapletree code.
 
 - "mm: fix format issues and param types" from Keren Sun implements a
   few minor code cleanups.
 
 - "simplify split calculation" from Wei Yang provides a few fixes and a
   test for the mapletree code.
 
 - "mm/vma: make more mmap logic userland testable" from Lorenzo Stoakes
   continues the work of moving vma-related code into the (relatively) new
   mm/vma.c.
 
 - "mm/page_alloc: gfp flags cleanups for alloc_contig_*()" from David
   Hildenbrand cleans up and rationalizes handling of gfp flags in the page
   allocator.
 
 - "readahead: Reintroduce fix for improper RA window sizing" from Jan
   Kara is a second attempt at fixing a readahead window sizing issue.  It
   should reduce the amount of unnecessary reading.
 
 - "synchronously scan and reclaim empty user PTE pages" from Qi Zheng
   addresses an issue where "huge" amounts of pte pagetables are
   accumulated
   (https://lore.kernel.org/lkml/cover.1718267194.git.zhengqi.arch@bytedance.com/).
   Qi's series addresses this windup by synchronously freeing PTE memory
   within the context of madvise(MADV_DONTNEED).
 
 - "selftest/mm: Remove warnings found by adding compiler flags" from
   Muhammad Usama Anjum fixes some build warnings in the selftests code
   when optional compiler warnings are enabled.
 
 - "mm: don't use __GFP_HARDWALL when migrating remote pages" from David
   Hildenbrand tightens the allocator's observance of __GFP_HARDWALL.
 
 - "pkeys kselftests improvements" from Kevin Brodsky implements various
   fixes and cleanups in the MM selftests code, mainly pertaining to the
   pkeys tests.
 
 - "mm/damon: add sample modules" from SeongJae Park enhances DAMON to
   estimate application working set size.
 
 - "memcg/hugetlb: Rework memcg hugetlb charging" from Joshua Hahn
   provides some cleanups to memcg's hugetlb charging logic.
 
 - "mm/swap_cgroup: remove global swap cgroup lock" from Kairui Song
   removes the global swap cgroup lock.  A speedup of 10% for a tmpfs-based
   kernel build was demonstrated.
 
 - "zram: split page type read/write handling" from Sergey Senozhatsky
   has several fixes and cleaups for zram in the area of zram_write_page().
   A watchdog softlockup warning was eliminated.
 
 - "move pagetable_*_dtor() to __tlb_remove_table()" from Kevin Brodsky
   cleans up the pagetable destructor implementations.  A rare
   use-after-free race is fixed.
 
 - "mm/debug: introduce and use VM_WARN_ON_VMG()" from Lorenzo Stoakes
   simplifies and cleans up the debugging code in the VMA merging logic.
 
 - "Account page tables at all levels" from Kevin Brodsky cleans up and
   regularizes the pagetable ctor/dtor handling.  This results in
   improvements in accounting accuracy.
 
 - "mm/damon: replace most damon_callback usages in sysfs with new core
   functions" from SeongJae Park cleans up and generalizes DAMON's sysfs
   file interface logic.
 
 - "mm/damon: enable page level properties based monitoring" from
   SeongJae Park increases the amount of information which is presented in
   response to DAMOS actions.
 
 - "mm/damon: remove DAMON debugfs interface" from SeongJae Park removes
   DAMON's long-deprecated debugfs interfaces.  Thus the migration to sysfs
   is completed.
 
 - "mm/hugetlb: Refactor hugetlb allocation resv accounting" from Peter
   Xu cleans up and generalizes the hugetlb reservation accounting.
 
 - "mm: alloc_pages_bulk: small API refactor" from Luiz Capitulino
   removes a never-used feature of the alloc_pages_bulk() interface.
 
 - "mm/damon: extend DAMOS filters for inclusion" from SeongJae Park
   extends DAMOS filters to support not only exclusion (rejecting), but
   also inclusion (allowing) behavior.
 
 - "Add zpdesc memory descriptor for zswap.zpool" from Alex Shi
   "introduces a new memory descriptor for zswap.zpool that currently
   overlaps with struct page for now.  This is part of the effort to reduce
   the size of struct page and to enable dynamic allocation of memory
   descriptors."
 
 - "mm, swap: rework of swap allocator locks" from Kairui Song redoes and
   simplifies the swap allocator locking.  A speedup of 400% was
   demonstrated for one workload.  As was a 35% reduction for kernel build
   time with swap-on-zram.
 
 - "mm: update mips to use do_mmap(), make mmap_region() internal" from
   Lorenzo Stoakes reworks MIPS's use of mmap_region() so that
   mmap_region() can be made MM-internal.
 
 - "mm/mglru: performance optimizations" from Yu Zhao fixes a few MGLRU
   regressions and otherwise improves MGLRU performance.
 
 - "Docs/mm/damon: add tuning guide and misc updates" from SeongJae Park
   updates DAMON documentation.
 
 - "Cleanup for memfd_create()" from Isaac Manjarres does that thing.
 
 - "mm: hugetlb+THP folio and migration cleanups" from David Hildenbrand
   provides various cleanups in the areas of hugetlb folios, THP folios and
   migration.
 
 - "Uncached buffered IO" from Jens Axboe implements the new
   RWF_DONTCACHE flag which provides synchronous dropbehind for pagecache
   reading and writing.  To permite userspace to address issues with
   massive buildup of useless pagecache when reading/writing fast devices.
 
 - "selftests/mm: virtual_address_range: Reduce memory" from Thomas
   Weißschuh fixes and optimizes some of the MM selftests.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZ5a+cwAKCRDdBJ7gKXxA
 jtoyAP9R58oaOKPJuTizEKKXvh/RpMyD6sYcz/uPpnf+cKTZxQEAqfVznfWlw/Lz
 uC3KRZYhmd5YrxU4o+qjbzp9XWX/xAE=
 =Ib2s
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2025-01-26-14-59' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:
 "The various patchsets are summarized below. Plus of course many
  indivudual patches which are described in their changelogs.

   - "Allocate and free frozen pages" from Matthew Wilcox reorganizes
     the page allocator so we end up with the ability to allocate and
     free zero-refcount pages. So that callers (ie, slab) can avoid a
     refcount inc & dec

   - "Support large folios for tmpfs" from Baolin Wang teaches tmpfs to
     use large folios other than PMD-sized ones

   - "Fix mm/rodata_test" from Petr Tesarik performs some maintenance
     and fixes for this small built-in kernel selftest

   - "mas_anode_descend() related cleanup" from Wei Yang tidies up part
     of the mapletree code

   - "mm: fix format issues and param types" from Keren Sun implements a
     few minor code cleanups

   - "simplify split calculation" from Wei Yang provides a few fixes and
     a test for the mapletree code

   - "mm/vma: make more mmap logic userland testable" from Lorenzo
     Stoakes continues the work of moving vma-related code into the
     (relatively) new mm/vma.c

   - "mm/page_alloc: gfp flags cleanups for alloc_contig_*()" from David
     Hildenbrand cleans up and rationalizes handling of gfp flags in the
     page allocator

   - "readahead: Reintroduce fix for improper RA window sizing" from Jan
     Kara is a second attempt at fixing a readahead window sizing issue.
     It should reduce the amount of unnecessary reading

   - "synchronously scan and reclaim empty user PTE pages" from Qi Zheng
     addresses an issue where "huge" amounts of pte pagetables are
     accumulated:

       https://lore.kernel.org/lkml/cover.1718267194.git.zhengqi.arch@bytedance.com/

     Qi's series addresses this windup by synchronously freeing PTE
     memory within the context of madvise(MADV_DONTNEED)

   - "selftest/mm: Remove warnings found by adding compiler flags" from
     Muhammad Usama Anjum fixes some build warnings in the selftests
     code when optional compiler warnings are enabled

   - "mm: don't use __GFP_HARDWALL when migrating remote pages" from
     David Hildenbrand tightens the allocator's observance of
     __GFP_HARDWALL

   - "pkeys kselftests improvements" from Kevin Brodsky implements
     various fixes and cleanups in the MM selftests code, mainly
     pertaining to the pkeys tests

   - "mm/damon: add sample modules" from SeongJae Park enhances DAMON to
     estimate application working set size

   - "memcg/hugetlb: Rework memcg hugetlb charging" from Joshua Hahn
     provides some cleanups to memcg's hugetlb charging logic

   - "mm/swap_cgroup: remove global swap cgroup lock" from Kairui Song
     removes the global swap cgroup lock. A speedup of 10% for a
     tmpfs-based kernel build was demonstrated

   - "zram: split page type read/write handling" from Sergey Senozhatsky
     has several fixes and cleaups for zram in the area of
     zram_write_page(). A watchdog softlockup warning was eliminated

   - "move pagetable_*_dtor() to __tlb_remove_table()" from Kevin
     Brodsky cleans up the pagetable destructor implementations. A rare
     use-after-free race is fixed

   - "mm/debug: introduce and use VM_WARN_ON_VMG()" from Lorenzo Stoakes
     simplifies and cleans up the debugging code in the VMA merging
     logic

   - "Account page tables at all levels" from Kevin Brodsky cleans up
     and regularizes the pagetable ctor/dtor handling. This results in
     improvements in accounting accuracy

   - "mm/damon: replace most damon_callback usages in sysfs with new
     core functions" from SeongJae Park cleans up and generalizes
     DAMON's sysfs file interface logic

   - "mm/damon: enable page level properties based monitoring" from
     SeongJae Park increases the amount of information which is
     presented in response to DAMOS actions

   - "mm/damon: remove DAMON debugfs interface" from SeongJae Park
     removes DAMON's long-deprecated debugfs interfaces. Thus the
     migration to sysfs is completed

   - "mm/hugetlb: Refactor hugetlb allocation resv accounting" from
     Peter Xu cleans up and generalizes the hugetlb reservation
     accounting

   - "mm: alloc_pages_bulk: small API refactor" from Luiz Capitulino
     removes a never-used feature of the alloc_pages_bulk() interface

   - "mm/damon: extend DAMOS filters for inclusion" from SeongJae Park
     extends DAMOS filters to support not only exclusion (rejecting),
     but also inclusion (allowing) behavior

   - "Add zpdesc memory descriptor for zswap.zpool" from Alex Shi
     introduces a new memory descriptor for zswap.zpool that currently
     overlaps with struct page for now. This is part of the effort to
     reduce the size of struct page and to enable dynamic allocation of
     memory descriptors

   - "mm, swap: rework of swap allocator locks" from Kairui Song redoes
     and simplifies the swap allocator locking. A speedup of 400% was
     demonstrated for one workload. As was a 35% reduction for kernel
     build time with swap-on-zram

   - "mm: update mips to use do_mmap(), make mmap_region() internal"
     from Lorenzo Stoakes reworks MIPS's use of mmap_region() so that
     mmap_region() can be made MM-internal

   - "mm/mglru: performance optimizations" from Yu Zhao fixes a few
     MGLRU regressions and otherwise improves MGLRU performance

   - "Docs/mm/damon: add tuning guide and misc updates" from SeongJae
     Park updates DAMON documentation

   - "Cleanup for memfd_create()" from Isaac Manjarres does that thing

   - "mm: hugetlb+THP folio and migration cleanups" from David
     Hildenbrand provides various cleanups in the areas of hugetlb
     folios, THP folios and migration

   - "Uncached buffered IO" from Jens Axboe implements the new
     RWF_DONTCACHE flag which provides synchronous dropbehind for
     pagecache reading and writing. To permite userspace to address
     issues with massive buildup of useless pagecache when
     reading/writing fast devices

   - "selftests/mm: virtual_address_range: Reduce memory" from Thomas
     Weißschuh fixes and optimizes some of the MM selftests"

* tag 'mm-stable-2025-01-26-14-59' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (321 commits)
  mm/compaction: fix UBSAN shift-out-of-bounds warning
  s390/mm: add missing ctor/dtor on page table upgrade
  kasan: sw_tags: use str_on_off() helper in kasan_init_sw_tags()
  tools: add VM_WARN_ON_VMG definition
  mm/damon/core: use str_high_low() helper in damos_wmark_wait_us()
  seqlock: add missing parameter documentation for raw_seqcount_try_begin()
  mm/page-writeback: consolidate wb_thresh bumping logic into __wb_calc_thresh
  mm/page_alloc: remove the incorrect and misleading comment
  zram: remove zcomp_stream_put() from write_incompressible_page()
  mm: separate move/undo parts from migrate_pages_batch()
  mm/kfence: use str_write_read() helper in get_access_type()
  selftests/mm/mkdirty: fix memory leak in test_uffdio_copy()
  kasan: hw_tags: Use str_on_off() helper in kasan_init_hw_tags()
  selftests/mm: virtual_address_range: avoid reading from VM_IO mappings
  selftests/mm: vm_util: split up /proc/self/smaps parsing
  selftests/mm: virtual_address_range: unmap chunks after validation
  selftests/mm: virtual_address_range: mmap() without PROT_WRITE
  selftests/memfd/memfd_test: fix possible NULL pointer dereference
  mm: add FGP_DONTCACHE folio creation flag
  mm: call filemap_fdatawrite_range_kick() after IOCB_DONTCACHE issue
  ...
2025-01-26 18:36:23 -08:00
Kairui Song
27701521be mm, swap: clean up device availability check
Remove highest_bit and lowest_bit.  After the HDD allocation path has been
removed, the only purpose of these two fields is to determine whether the
device is full or not, which can instead be determined by checking the
inuse_pages.

Link: https://lkml.kernel.org/r/20250113175732.48099-6-ryncsn@gmail.com
Signed-off-by: Kairui Song <kasong@tencent.com>
Reviewed-by: Baoquan He <bhe@redhat.com>
Cc: Barry Song <v-songbaohua@oppo.com>
Cc: Chis Li <chrisl@kernel.org>
Cc: "Huang, Ying" <ying.huang@linux.alibaba.com>
Cc: Hugh Dickens <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Yosry Ahmed <yosryahmed@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-25 20:22:36 -08:00
Jianan Huang
03511e9369 f2fs: fix inconsistent dirty state of atomic file
When testing the atomic write fix patches, the f2fs_bug_on was
triggered as below:

------------[ cut here ]------------
kernel BUG at fs/f2fs/inode.c:935!
Oops: invalid opcode: 0000 [#1] PREEMPT SMP PTI
CPU: 3 UID: 0 PID: 257 Comm: bash Not tainted 6.13.0-rc1-00033-gc283a70d3497 #5
RIP: 0010:f2fs_evict_inode+0x50f/0x520
Call Trace:
 <TASK>
 ? __die_body+0x65/0xb0
 ? die+0x9f/0xc0
 ? do_trap+0xa1/0x170
 ? f2fs_evict_inode+0x50f/0x520
 ? f2fs_evict_inode+0x50f/0x520
 ? handle_invalid_op+0x65/0x80
 ? f2fs_evict_inode+0x50f/0x520
 ? exc_invalid_op+0x39/0x50
 ? asm_exc_invalid_op+0x1a/0x20
 ? __pfx_f2fs_get_dquots+0x10/0x10
 ? f2fs_evict_inode+0x50f/0x520
 ? f2fs_evict_inode+0x2e5/0x520
 evict+0x186/0x2f0
 prune_icache_sb+0x75/0xb0
 super_cache_scan+0x1a8/0x200
 do_shrink_slab+0x163/0x320
 shrink_slab+0x2fc/0x470
 drop_slab+0x82/0xf0
 drop_caches_sysctl_handler+0x4e/0xb0
 proc_sys_call_handler+0x183/0x280
 vfs_write+0x36d/0x450
 ksys_write+0x68/0xd0
 do_syscall_64+0xc8/0x1a0
 ? arch_exit_to_user_mode_prepare+0x11/0x60
 ? irqentry_exit_to_user_mode+0x7e/0xa0

The root cause is: f2fs uses FI_ATOMIC_DIRTIED to indicate dirty
atomic files during commit. If the inode is dirtied during commit,
such as by f2fs_i_pino_write, the vfs inode keeps clean and the
f2fs inode is set to FI_DIRTY_INODE. The FI_DIRTY_INODE flag cann't
be cleared by write_inode later due to the clean vfs inode. Finally,
f2fs_bug_on is triggered due to this inconsistent state when evict.

To reproduce this situation:
- fd = open("/mnt/test.db", O_WRONLY)
- ioctl(fd, F2FS_IOC_START_ATOMIC_WRITE)
- mv /mnt/test.db /mnt/test1.db
- ioctl(fd, F2FS_IOC_COMMIT_ATOMIC_WRITE)
- echo 3 > /proc/sys/vm/drop_caches

To fix this problem, clear FI_DIRTY_INODE after commit, then
f2fs_mark_inode_dirty_sync will ensure a consistent dirty state.

Fixes: fccaa81de8 ("f2fs: prevent atomic file from being dirtied before commit")
Signed-off-by: Yunlei He <heyunlei@xiaomi.com>
Signed-off-by: Jianan Huang <huangjianan@xiaomi.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-01-25 01:12:12 +00:00
Zhiguo Niu
edf3c08600 f2fs: fix to avoid changing 'check only' behaior of recovery
The following two 'check only recovery' processes are very dependent on
the return value of f2fs_recover_fsync_data, especially when the return
value is greater than 0.
1. when device has readonly mode, shown as commit
23738e7447 ("f2fs: fix to restrict mount condition on readonly block device")
2. mount optiont NORECOVERY or DISABLE_ROLL_FORWARD is set, shown as commit
6781eabba1 ("f2fs: give -EINVAL for norecovery and rw mount")

However, commit c426d99127 ("f2fs: Check write pointer consistency of open zones")
will change the return value unexpectedly, thereby changing the caller's behavior

This patch let the f2fs_recover_fsync_data return correct value,and not do
f2fs_check_and_fix_write_pointer when the device is read-only.

Fixes: c426d99127 ("f2fs: Check write pointer consistency of open zones")
Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-01-22 21:04:56 +00:00
Yi Sun
6d4008dc4a f2fs: Clean up the loop outside of f2fs_invalidate_blocks()
Now f2fs_invalidate_blocks() supports a continuous range of addresses,
so the for loop can be omitted.

Signed-off-by: Yi Sun <yi.sun@unisoc.com>
Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-01-22 21:04:56 +00:00
Chao Yu
f6370a360d f2fs: procfs: show mtime in segment_bits
Show mtime in segment_bits for debug.

cat /proc/fs//f2fs/loop0/segment_bits
format: segment_type|valid_blocks|bitmaps|mtime
segment_type(0:HD, 1:WD, 2:CD, 3:HN, 4:WN, 5:CN)
0         3|1  | 04 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00| ffffffffffffffff
1         4|3  | 00 d0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00| ffffffffffffffff
2         5|0  | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00| ffffffffffffffff
3         0|1  | 40 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00| ffffffffffffffff

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-01-22 21:04:56 +00:00
Chao Yu
207764e5d6 f2fs: fix to avoid return invalid mtime from f2fs_get_section_mtime()
syzbot reported a f2fs bug as below:

------------[ cut here ]------------
kernel BUG at fs/f2fs/gc.c:373!
CPU: 0 UID: 0 PID: 5316 Comm: syz.0.0 Not tainted 6.13.0-rc3-syzkaller-00044-gaef25be35d23 #0
RIP: 0010:get_cb_cost fs/f2fs/gc.c:373 [inline]
RIP: 0010:get_gc_cost fs/f2fs/gc.c:406 [inline]
RIP: 0010:f2fs_get_victim+0x68b1/0x6aa0 fs/f2fs/gc.c:912
Call Trace:
 <TASK>
 __get_victim fs/f2fs/gc.c:1707 [inline]
 f2fs_gc+0xc89/0x2f60 fs/f2fs/gc.c:1915
 f2fs_ioc_gc fs/f2fs/file.c:2624 [inline]
 __f2fs_ioctl+0x4cc9/0xb8b0 fs/f2fs/file.c:4482
 vfs_ioctl fs/ioctl.c:51 [inline]
 __do_sys_ioctl fs/ioctl.c:906 [inline]
 __se_sys_ioctl+0xf5/0x170 fs/ioctl.c:892
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

w/ below testcase, it can reproduce directly:
- dd if=/dev/zero of=/tmp/file bs=1M count=64
- mkfs.f2fs /tmp/file
- mount -t f2fs -o loop,mode=fragment:block /tmp/file /mnt/f2fs
- echo 0 >  /sys/fs/f2fs/loop0/min_ssr_sections
- dd if=/dev/zero of=/mnt/f2fs/file bs=1M count=5
- umount /mnt/f2fs
- for((i=4096;i<16384;i+=512)) do inject.f2fs --sit 0 --blk $i --mb mtime --val -1 /tmp/file; done
- mount -o loop /tmp/file /mnt/f2fs
- f2fs_io gc 0 /mnt/f2fs/file

static unsigned int get_cb_cost()
{
	...
	mtime = f2fs_get_section_mtime(sbi, segno);
	f2fs_bug_on(sbi, mtime == INVALID_MTIME);
	...
}

The root cause is: mtime in f2fs_sit_entry can be fuzzed to INVALID_MTIME,
then it will trigger BUG_ON in get_cb_cost() during GC.

Let's change behavior of f2fs_get_section_mtime() as below for fix:
- return INVALID_MTIME only if total valid blocks is zero.
- return INVALID_MTIME - 1 if average mtime calculated is
INVALID_MTIME.

Fixes: b19ee72722 ("f2fs: introduce f2fs_get_section_mtime")
Reported-by: syzbot+b9972806adbe20a910eb@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-f2fs-devel/6768c82e.050a0220.226966.0035.GAE@google.com
Cc: liuderong <liuderong@oppo.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-01-22 21:04:56 +00:00
Nathan Chancellor
a68905d48a f2fs: Fix format specifier in sanity_check_inode()
When building for 32-bit platforms, for which 'size_t' is 'unsigned int',
there is a warning due to an incorrect format specifier:

  fs/f2fs/inode.c:320:6: error: format specifies type 'unsigned long' but the argument has type 'unsigned int' [-Werror,-Wformat]
    318 |                 f2fs_warn(sbi, "%s: inode (ino=%lx) has corrupted i_inline_xattr_size: %d, min: %lu, max: %lu",
        |                                                                                                 ~~~
        |                                                                                                 %u
    319 |                           __func__, inode->i_ino, fi->i_inline_xattr_size,
    320 |                           MIN_INLINE_XATTR_SIZE, MAX_INLINE_XATTR_SIZE);
        |                           ^~~~~~~~~~~~~~~~~~~~~
  fs/f2fs/f2fs.h:1855:46: note: expanded from macro 'f2fs_warn'
   1855 |         f2fs_printk(sbi, false, KERN_WARNING fmt, ##__VA_ARGS__)
        |                                              ~~~    ^~~~~~~~~~~
  fs/f2fs/xattr.h:86:31: note: expanded from macro 'MIN_INLINE_XATTR_SIZE'
     86 | #define MIN_INLINE_XATTR_SIZE (sizeof(struct f2fs_xattr_header) / sizeof(__le32))
        |                               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Use the format specifier for 'size_t', '%zu', to resolve the warning.

Fixes: 5c1768b672 ("f2fs: fix to do sanity check correctly on i_inline_xattr_size")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-01-22 21:04:49 +00:00
Jaegeuk Kim
e02938613e f2fs: avoid trying to get invalid block address
In f2fs_new_inode(), if we fail to get a new inode, we go iput(), followed by
f2fs_evict_inode(). If the inode is not marked as bad, it'll try to call
f2fs_remove_inode_page() which tries to read the inode block given node id.
But, there's no block address allocated yet, which gives a chance to access
a wrong block address, if the block device has some garbage data in NAT table.

We need to make sure NAT table should have zero data for all the unallocated
node ids, but also would be better to take this unnecessary path as well.
Let's mark the faild inode as bad.

Fixes: 0abd675e97 ("f2fs: support plain user/group quota")
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-01-21 16:28:16 +00:00
Chao Yu
5c1768b672 f2fs: fix to do sanity check correctly on i_inline_xattr_size
syzbot reported an out-of-range access issue as below:

UBSAN: array-index-out-of-bounds in fs/f2fs/f2fs.h:3292:19
index 18446744073709550491 is out of range for type '__le32[923]' (aka 'unsigned int[923]')
CPU: 0 UID: 0 PID: 5338 Comm: syz.0.0 Not tainted 6.12.0-syzkaller-10689-g7af08b57bcb9 #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:94 [inline]
 dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120
 ubsan_epilogue lib/ubsan.c:231 [inline]
 __ubsan_handle_out_of_bounds+0x121/0x150 lib/ubsan.c:429
 read_inline_xattr+0x273/0x280
 lookup_all_xattrs fs/f2fs/xattr.c:341 [inline]
 f2fs_getxattr+0x57b/0x13b0 fs/f2fs/xattr.c:533
 vfs_getxattr_alloc+0x472/0x5c0 fs/xattr.c:393
 ima_read_xattr+0x38/0x60 security/integrity/ima/ima_appraise.c:229
 process_measurement+0x117a/0x1fb0 security/integrity/ima/ima_main.c:353
 ima_file_check+0xd9/0x120 security/integrity/ima/ima_main.c:572
 security_file_post_open+0xb9/0x280 security/security.c:3121
 do_open fs/namei.c:3830 [inline]
 path_openat+0x2ccd/0x3590 fs/namei.c:3987
 do_file_open_root+0x3a7/0x720 fs/namei.c:4039
 file_open_root+0x247/0x2a0 fs/open.c:1382
 do_handle_open+0x85b/0x9d0 fs/fhandle.c:414
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

index: 18446744073709550491 (decimal, unsigned long long)
= 0xfffffffffffffb9b (hexadecimal) = -1125 (decimal, long long)
UBSAN detects that inline_xattr_addr() tries to access .i_addr[-1125].

w/ below testcase, it can reproduce this bug easily:
- mkfs.f2fs -f -O extra_attr,flexible_inline_xattr /dev/sdb
- mount -o inline_xattr_size=512 /dev/sdb /mnt/f2fs
- touch /mnt/f2fs/file
- umount /mnt/f2fs
- inject.f2fs --node --mb i_inline --nid 4 --val 0x1 /dev/sdb
- inject.f2fs --node --mb i_inline_xattr_size --nid 4 --val 2048 /dev/sdb
- mount /dev/sdb /mnt/f2fs
- getfattr /mnt/f2fs/file

The root cause is if metadata of filesystem and inode were fuzzed as below:
- extra_attr feature is enabled
- flexible_inline_xattr feature is enabled
- ri.i_inline_xattr_size = 2048
- F2FS_EXTRA_ATTR bit in ri.i_inline was not set

sanity_check_inode() will skip doing sanity check on fi->i_inline_xattr_size,
result in using invalid inline_xattr_size later incorrectly, fix it.

Meanwhile, let's fix to check lower boundary for .i_inline_xattr_size w/
MIN_INLINE_XATTR_SIZE like we did in parse_options().

There is a related issue reported by syzbot, Qasim Ijaz has anlyzed and
fixed it w/ very similar way [1], as discussed, we all agree that it will
be better to do sanity check in sanity_check_inode() for fix, so finally,
let's fix these two related bugs w/ current patch.

Including commit message from Qasim's patch as below, thanks a lot for
his contribution.

"In f2fs_getxattr(), the function lookup_all_xattrs() allocates a 12-byte
(base_size) buffer for an inline extended attribute. However, when
__find_inline_xattr() calls __find_xattr(), it uses the macro
"list_for_each_xattr(entry, addr)", which starts by calling
XATTR_FIRST_ENTRY(addr). This skips a 24-byte struct f2fs_xattr_header
at the beginning of the buffer, causing an immediate out-of-bounds read
in a 12-byte allocation. The subsequent !IS_XATTR_LAST_ENTRY(entry)
check then dereferences memory outside the allocated region, triggering
the slab-out-of bounds read.

This patch prevents the out-of-bounds read by adding a check to bail
out early if inline_size is too small and does not account for the
header plus the 4-byte value that IS_XATTR_LAST_ENTRY reads."

[1]: https://lore.kernel.org/linux-f2fs-devel/Z32y1rfBY9Qb5ZjM@qasdev.system/

Fixes: 6afc662e68 ("f2fs: support flexible inline xattr size")
Reported-by: syzbot+69f5379a1717a0b982a1@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-f2fs-devel/674f4e7d.050a0220.17bd51.004f.GAE@google.com
Reported-by: syzbot <syzbot+f5e74075e096e757bdbf@syzkaller.appspotmail.com>
Closes: https://syzkaller.appspot.com/bug?extid=f5e74075e096e757bdbf
Tested-by: syzbot <syzbot+f5e74075e096e757bdbf@syzkaller.appspotmail.com>
Tested-by: Qasim Ijaz <qasdev00@gmail.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-01-16 17:27:51 +00:00
Jaegeuk Kim
4811fee828 f2fs: remove blk_finish_plug
Let's remove unclear blk_finish_plug.

Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-01-16 17:27:51 +00:00
Yi Sun
120ac1dc32 f2fs: Optimize f2fs_truncate_data_blocks_range()
Function f2fs_invalidate_blocks() can process consecutive
blocks at a time, so f2fs_truncate_data_blocks_range() is
optimized to use the new functionality of
f2fs_invalidate_blocks().

Add two variables @blkstart and @blklen, @blkstart records
the first address of the consecutive blocks, and @blkstart
records the number of consecutive blocks.

Signed-off-by: Yi Sun <yi.sun@unisoc.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-01-16 17:27:29 +00:00
zangyangyang1
c84c242493 f2fs: fix using wrong 'submitted' value in f2fs_write_cache_pages
When f2fs_write_single_data_page fails, f2fs_write_cache_pages
will use the last 'submitted' value incorrectly, which will cause
'nwritten' and 'wbc->nr_to_write' calculation errors

Signed-off-by: zangyangyang1 <zangyangyang1@xiaomi.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-01-13 18:49:38 +00:00
Yi Sun
e53c568f46 f2fs: add parameter @len to f2fs_invalidate_blocks()
New function can process some consecutive blocks at a time.

Function f2fs_invalidate_blocks()->down_write() and up_write()
are very time-consuming, so if f2fs_invalidate_blocks() can
process consecutive blocks at one time, it will save a lot of time.

Signed-off-by: Yi Sun <yi.sun@unisoc.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-01-13 18:49:32 +00:00
Yi Sun
81ffbd224e f2fs: update_sit_entry_for_release() supports consecutive blocks.
This function can process some consecutive blocks at a time.

When using update_sit_entry() to release consecutive blocks,
ensure that the consecutive blocks belong to the same segment.
Because after update_sit_entry_for_realese(), @segno is still
in use in update_sit_entry().

Signed-off-by: Yi Sun <yi.sun@unisoc.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-01-08 18:31:46 +00:00
Yi Sun
66baee2b88 f2fs: introduce update_sit_entry_for_release/alloc()
No logical changes, just for cleanliness.

Signed-off-by: Yi Sun <yi.sun@unisoc.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-01-08 18:31:45 +00:00
Jaegeuk Kim
cf5817ce66 f2fs: don't call block truncation for aliased file
This patch should avoid the below warning which does not corrupt the metadata
tho.

[   51.508120][  T253] F2FS-fs (dm-59): access invalid blkaddr:36
[   51.508156][  T253]  __f2fs_is_valid_blkaddr+0x330/0x384
[   51.508162][  T253]  f2fs_is_valid_blkaddr_raw+0x10/0x24
[   51.508163][  T253]  f2fs_truncate_data_blocks_range+0x1ec/0x438
[   51.508177][  T253]  f2fs_remove_inode_page+0x8c/0x148
[   51.508194][  T253]  f2fs_evict_inode+0x230/0x76c

Fixes: 128d333f0d ("f2fs: introduce device aliasing file")
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-01-08 18:31:45 +00:00
Daniel Lee
91b587ba79 f2fs: Introduce linear search for dentries
This patch addresses an issue where some files in case-insensitive
directories become inaccessible due to changes in how the kernel function,
utf8_casefold(), generates case-folded strings from the commit 5c26d2f1d3
("unicode: Don't special case ignorable code points").

F2FS uses these case-folded names to calculate hash values for locating
dentries and stores them on disk. Since utf8_casefold() can produce
different output across kernel versions, stored hash values and newly
calculated hash values may differ. This results in affected files no
longer being found via the hash-based lookup.

To resolve this, the patch introduces a linear search fallback.
If the initial hash-based search fails, F2FS will sequentially scan the
directory entries.

Fixes: 5c26d2f1d3 ("unicode: Don't special case ignorable code points")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=219586
Signed-off-by: Daniel Lee <chullee@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-01-08 18:31:45 +00:00
Yi Sun
d217b5cea4 f2fs: add parameter @len to f2fs_invalidate_internal_cache()
New function can process some consecutive blocks at a time.

Signed-off-by: Yi Sun <yi.sun@unisoc.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-01-08 18:31:45 +00:00
Yi Sun
3d56fbb1f0 f2fs: expand f2fs_invalidate_compress_page() to f2fs_invalidate_compress_pages_range()
New function f2fs_invalidate_compress_pages_range() adds the @len
parameter. So it can process some consecutive blocks at a time.

Signed-off-by: Yi Sun <yi.sun@unisoc.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-01-08 18:31:45 +00:00
Dmitry Antipov
76f01376df f2fs: ensure that node info flags are always initialized
Syzbot has reported the following KMSAN splat:

BUG: KMSAN: uninit-value in f2fs_new_node_page+0x1494/0x1630
 f2fs_new_node_page+0x1494/0x1630
 f2fs_new_inode_page+0xb9/0x100
 f2fs_init_inode_metadata+0x176/0x1e90
 f2fs_add_inline_entry+0x723/0xc90
 f2fs_do_add_link+0x48f/0xa70
 f2fs_symlink+0x6af/0xfc0
 vfs_symlink+0x1f1/0x470
 do_symlinkat+0x471/0xbc0
 __x64_sys_symlink+0xcf/0x140
 x64_sys_call+0x2fcc/0x3d90
 do_syscall_64+0xd9/0x1b0
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Local variable new_ni created at:
 f2fs_new_node_page+0x9d/0x1630
 f2fs_new_inode_page+0xb9/0x100

So adjust 'f2fs_get_node_info()' to ensure that 'flag'
field of 'struct node_info' is always initialized.

Reported-by: syzbot+5141f6db57a2f7614352@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=5141f6db57a2f7614352
Fixes: e05df3b115 ("f2fs: add node operations")
Suggested-by: Chao Yu <chao@kernel.org>
Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-12-16 16:12:54 +00:00
Yongpeng Yang
e9a844f6e4 f2fs: The GC triggered by ioctl also needs to mark the segno as victim
In SSR mode, the segment selected for allocation might be the same as
the target segment of the GC triggered by ioctl, resulting in the GC
moving the CURSEG_I(sbi, type)->segno.
Thread A				Thread B or Thread A
- f2fs_ioc_gc_range
 - __f2fs_ioc_gc_range(.victim_segno=segno#N)
  - f2fs_gc
   - __get_victim
    - f2fs_get_victim
    : segno#N is valid, return segno#N as source segment of GC
					- f2fs_allocate_data_block
						- need_new_seg
						- get_ssr_segment
						- f2fs_get_victim
						: get segno #N as destination segment
						- change_curseg

Fixes: e066b83c9b ("f2fs: add ioctl to flush data from faster device to cold area")
Signed-off-by: Yongpeng Yang <yangyongpeng1@oppo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-12-16 16:12:29 +00:00
zangyangyang1
5f65945427 f2fs: cache more dentry pages
While traversing dir entries in dentry page, it's better to refresh current
accessed page in lru list by using FGP_ACCESSED flag, otherwise, such page
may has less chance to survive during memory reclaim, result in causing
additional IO when revisiting dentry page.

Signed-off-by: zangyangyang1 <zangyangyang1@xiaomi.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-12-16 16:12:28 +00:00
Matthew Wilcox (Oracle)
c910a64bc4 f2fs: Remove calls to folio_file_mapping()
All folios that f2fs sees belong to f2fs and not to the swapcache
so it can dereference folio->mapping directly like all other
filesystems do.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-12-16 16:12:26 +00:00
Matthew Wilcox (Oracle)
19bbd306dd f2fs: Convert __read_io_type() to take a folio
Remove the last call to page_file_mapping() as both callers can now pass
in a folio.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-12-16 16:12:23 +00:00
Matthew Wilcox (Oracle)
f58d864582 f2fs: Use a data folio in f2fs_submit_page_bio()
Remove a call to compound_head().  We can call bio_add_folio_nofail()
here because we just allocated the bio, so we know it can't fail and
thus the error path can never be taken.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-12-16 16:12:19 +00:00
Matthew Wilcox (Oracle)
0765b3f989 f2fs: Use a folio more in f2fs_submit_page_bio()
Cache the result of page_folio(fio->page) in a local variable so
we don't have to keep calling it.  Saves a couple of calls to
compound_head() and removes an access to page->mapping.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-12-16 16:12:16 +00:00
Matthew Wilcox (Oracle)
e0821645dd f2fs: Convert f2fs_finish_read_bio() to use folios
Use bio_for_each_folio_all() to iterate over each folio in the bio.
This lets us use folio_end_read() which saves an atomic operation and
memory barrier compared to marking the folio uptodate and unlocking
it as two separate operations.  This also removes a few hidden calls
to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-12-16 16:12:13 +00:00
Matthew Wilcox (Oracle)
1cf7460070 f2fs: Add F2FS_F_SB()
This is the folio equivalent of F2FS_P_SB().  Removes a call to
page_file_mapping() as we know folios seen by f2fs are never part of
the swap cache.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-12-16 16:12:10 +00:00
Matthew Wilcox (Oracle)
87e2a15bc0 f2fs: Convert submit tracepoints to take a folio
Remove accesses to page->index and page->mapping as well as
unnecessary calls to page_file_mapping().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-12-16 16:12:07 +00:00
Matthew Wilcox (Oracle)
ac866908d7 f2fs: Use a folio in f2fs_write_compressed_pages()
Remove accesses to page->index and an unnecessary reference to
page->mapping.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-12-16 16:12:04 +00:00
Matthew Wilcox (Oracle)
1cda5bc0b2 f2fs: Use a folio in f2fs_truncate_partial_cluster()
Convert the incoming page to a folio and use it throughout.
Removes an access to page->index.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-12-16 16:12:01 +00:00
Matthew Wilcox (Oracle)
ff6c82a934 f2fs: Use a folio in f2fs_compress_write_end()
This removes an access of page->index.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-12-16 16:11:58 +00:00
Matthew Wilcox (Oracle)
a909c17953 f2fs: Use a folio in f2fs_all_cluster_page_ready()
Remove references to page->index and use folio_test_uptodate()
instead of PageUptodate().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-12-16 16:11:51 +00:00
Eric Biggers
3ca4bec40e f2fs: switch to using the crc32 library
Now that the crc32() library function takes advantage of
architecture-specific optimizations, it is unnecessary to go through the
crypto API.  Just use crc32().  This is much simpler, and it improves
performance due to eliminating the crypto API overhead.

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20241202010844.144356-19-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
2024-12-01 17:23:02 -08:00
Linus Torvalds
44b4d13b70 f2fs-for-6.13-rc1
This series introduces a device aliasing feature where user can carve out
 partitions but reclaim the space back by deleting aliased file in root dir.
 In addition to that, there're numerous minor bug fixes in zoned device support,
 checkpoint=disable, extent cache management, fiemap, and lazytime mount option.
 The full list of noticeable changes can be found below.
 
 Enhancement:
  - introduce device aliasing file
  - add stats in debugfs to show multiple devices
  - add a sysfs node to limit max read extent count per-inode
  - modify f2fs_is_checkpoint_ready logic to allow more data to be written with the CP disable
  - decrease spare area for pinned files for zoned devices
 
 Bug fix:
  - Revert "f2fs: remove unreachable lazytime mount option parsing"
  - adjust unusable cap before checkpoint=disable mode
  - fix to drop all discards after creating snapshot on lvm device
  - fix to shrink read extent node in batches
  - fix changing cursegs if recovery fails on zoned device
  - fix to adjust appropriate length for fiemap
  - fix fiemap failure issue when page size is 16KB
  - fix to avoid forcing direct write to use buffered IO on inline_data inode
  - fix to map blocks correctly for direct write
  - fix to account dirty data in __get_secs_required()
  - fix null-ptr-deref in f2fs_submit_page_bio()
  - f2fs: compress: fix inconsistent update of i_blocks in release_compress_blocks and reserve_compress_blocks
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE00UqedjCtOrGVvQiQBSofoJIUNIFAmdD25MACgkQQBSofoJI
 UNKgLhAAgr0Dy/VWDgRlMovckq0q5EyQu/Jospv6mJyErQ4pZwwidNn9FSf0yua9
 O0Pofs1zMFWoe5R2UOvwOnahmvlwD1nnRMylA10/9hp+/aKlTRxOI7HrdL5wFgWG
 QRTb/k+mgoEQk8+9ElThzq/CkmQovPEUfhoxW7bE4zH9kVoxi2klFbkASZynqEFe
 a+TVQoDUnXvb1cbvr4zEVuD79QEmazD/bgc+gquxChCHfzX8ip4R0aCZM1ceTgm/
 Vru0LUKGQTWXPPReugJbOOtoIJ/kgD9Sg5xa7Icg3nxukgiYUDdl3e7MTgfvHOK6
 Fwwj+ZbM/yV/gpAQp+g+uOkKSFqfulyOb+nzX5tmebmiT2Vs6XSQ0Xo+fjm7N1QC
 j0G1vwz91xETK/gw2U/zL/HQVB3IU/2dtBT2ek4x6kmVL3rmHYoI6r2ofQcEFjGn
 2YQ9yvvT/fY6fza88kWO0PjgIRDzw9D9ihfZVyH9MCy5n6adhWlFXIg0HbAoecDE
 6xsVjb5BVYJfQvVz3FauGRXu6i3mePaURC1rrf5NKFfAWJP7pDfi9IvSL56u2aMt
 J+RJ7a2u1l1z/yhBxtr00KhMP586OZHVJwQvwNJV7mzBFhvOlm3a4jTzbG35dE+V
 MfbbjR628y/0IkqZiB7YVu1NIF2qdbZosv4nO7b584Q1h1NH/PU=
 =LOgM
 -----END PGP SIGNATURE-----

Merge tag 'f2fs-for-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs updates from Jaegeuk Kim:
 "This series introduces a device aliasing feature where user can carve
  out partitions but reclaim the space back by deleting aliased file in
  root dir.

  In addition to that, there're numerous minor bug fixes in zoned device
  support, checkpoint=disable, extent cache management, fiemap, and
  lazytime mount option. The full list of noticeable changes can be
  found below.

  Enhancements:
   - introduce device aliasing file
   - add stats in debugfs to show multiple devices
   - add a sysfs node to limit max read extent count per-inode
   - modify f2fs_is_checkpoint_ready logic to allow more data to be
     written with the CP disable
   - decrease spare area for pinned files for zoned devices

  Fixes:
   - Revert "f2fs: remove unreachable lazytime mount option parsing"
   - adjust unusable cap before checkpoint=disable mode
   - fix to drop all discards after creating snapshot on lvm device
   - fix to shrink read extent node in batches
   - fix changing cursegs if recovery fails on zoned device
   - fix to adjust appropriate length for fiemap
   - fix fiemap failure issue when page size is 16KB
   - fix to avoid forcing direct write to use buffered IO on inline_data
     inode
   - fix to map blocks correctly for direct write
   - fix to account dirty data in __get_secs_required()
   - fix null-ptr-deref in f2fs_submit_page_bio()
   - fix inconsistent update of i_blocks in release_compress_blocks and
     reserve_compress_blocks"

* tag 'f2fs-for-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (40 commits)
  f2fs: fix to drop all discards after creating snapshot on lvm device
  f2fs: add a sysfs node to limit max read extent count per-inode
  f2fs: fix to shrink read extent node in batches
  f2fs: print message if fscorrupted was found in f2fs_new_node_page()
  f2fs: clear SBI_POR_DOING before initing inmem curseg
  f2fs: fix changing cursegs if recovery fails on zoned device
  f2fs: adjust unusable cap before checkpoint=disable mode
  f2fs: fix to requery extent which cross boundary of inquiry
  f2fs: fix to adjust appropriate length for fiemap
  f2fs: clean up w/ F2FS_{BLK_TO_BYTES,BTYES_TO_BLK}
  f2fs: fix to do cast in F2FS_{BLK_TO_BYTES, BTYES_TO_BLK} to avoid overflow
  f2fs: replace deprecated strcpy with strscpy
  Revert "f2fs: remove unreachable lazytime mount option parsing"
  f2fs: fix to avoid forcing direct write to use buffered IO on inline_data inode
  f2fs: fix to map blocks correctly for direct write
  f2fs: fix race in concurrent f2fs_stop_gc_thread
  f2fs: fix fiemap failure issue when page size is 16KB
  f2fs: remove redundant atomic file check in defragment
  f2fs: fix to convert log type to segment data type correctly
  f2fs: clean up the unused variable additional_reserved_segments
  ...
2024-11-26 12:50:58 -08:00
Chao Yu
bc8aeb04fd f2fs: fix to drop all discards after creating snapshot on lvm device
Piergiorgio reported a bug in bugzilla as below:

------------[ cut here ]------------
WARNING: CPU: 2 PID: 969 at fs/f2fs/segment.c:1330
RIP: 0010:__submit_discard_cmd+0x27d/0x400 [f2fs]
Call Trace:
 __issue_discard_cmd+0x1ca/0x350 [f2fs]
 issue_discard_thread+0x191/0x480 [f2fs]
 kthread+0xcf/0x100
 ret_from_fork+0x31/0x50
 ret_from_fork_asm+0x1a/0x30

w/ below testcase, it can reproduce this bug quickly:
- pvcreate /dev/vdb
- vgcreate myvg1 /dev/vdb
- lvcreate -L 1024m -n mylv1 myvg1
- mount /dev/myvg1/mylv1 /mnt/f2fs
- dd if=/dev/zero of=/mnt/f2fs/file bs=1M count=20
- sync
- rm /mnt/f2fs/file
- sync
- lvcreate -L 1024m -s -n mylv1-snapshot /dev/myvg1/mylv1
- umount /mnt/f2fs

The root cause is: it will update discard_max_bytes of mounted lvm
device to zero after creating snapshot on this lvm device, then,
__submit_discard_cmd() will pass parameter @nr_sects w/ zero value
to __blkdev_issue_discard(), it returns a NULL bio pointer, result
in panic.

This patch changes as below for fixing:
1. Let's drop all remained discards in f2fs_unfreeze() if snapshot
of lvm device is created.
2. Checking discard_max_bytes before submitting discard during
__submit_discard_cmd().

Cc: stable@vger.kernel.org
Fixes: 35ec7d5748 ("f2fs: split discard command in prior to block layer")
Reported-by: Piergiorgio Sartor <piergiorgio.sartor@nexgo.de>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219484
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-11-23 15:48:15 +00:00
Chao Yu
009a8241a8 f2fs: add a sysfs node to limit max read extent count per-inode
Quoted:
"at this time, there are still 1086911 extent nodes in this zombie
extent tree that need to be cleaned up.

crash_arm64_sprd_v8.0.3++> extent_tree.node_cnt ffffff80896cc500
  node_cnt = {
    counter = 1086911
  },
"

As reported by Xiuhong, there will be a huge number of extent nodes
in extent tree, it may potentially cause:
- slab memory fragments
- extreme long time shrink on extent tree
- low mapping efficiency

Let's add a sysfs node to limit max read extent count for each inode,
by default, value of this threshold is 10240, it can be updated
according to user's requirement.

Reported-by: Xiuhong Wang <xiuhong.wang@unisoc.com>
Closes: https://lore.kernel.org/linux-f2fs-devel/20241112110627.1314632-1-xiuhong.wang@unisoc.com/
Signed-off-by: Xiuhong Wang <xiuhong.wang@unisoc.com>
Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-11-23 15:48:13 +00:00
Chao Yu
3fc5d5a182 f2fs: fix to shrink read extent node in batches
We use rwlock to protect core structure data of extent tree during
its shrink, however, if there is a huge number of extent nodes in
extent tree, during shrink of extent tree, it may hold rwlock for
a very long time, which may trigger kernel hang issue.

This patch fixes to shrink read extent node in batches, so that,
critical region of the rwlock can be shrunk to avoid its extreme
long time hold.

Reported-by: Xiuhong Wang <xiuhong.wang@unisoc.com>
Closes: https://lore.kernel.org/linux-f2fs-devel/20241112110627.1314632-1-xiuhong.wang@unisoc.com/
Signed-off-by: Xiuhong Wang <xiuhong.wang@unisoc.com>
Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-11-21 16:16:54 +00:00
Chao Yu
81520c684c f2fs: print message if fscorrupted was found in f2fs_new_node_page()
If fs corruption occurs in f2fs_new_node_page(), let's print
more information about corrupted metadata into kernel log.

Meanwhile, it updates to record ERROR_INCONSISTENT_NAT instead
of ERROR_INVALID_BLKADDR if blkaddr in nat entry is not
NULL_ADDR which means nat bitmap and nat entry is inconsistent.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-11-21 16:16:39 +00:00
Sheng Yong
f88c7904b5 f2fs: clear SBI_POR_DOING before initing inmem curseg
SBI_POR_DOING can be cleared after recovery is completed, so that
changes made before recovery can be persistent, and subsequent
errors can be recorded into cp/sb.

Signed-off-by: Song Feng <songfeng@oppo.com>
Signed-off-by: Yongpeng Yang <yangyongpeng1@oppo.com>
Signed-off-by: Sheng Yong <shengyong@oppo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-11-21 16:14:07 +00:00
Sheng Yong
1015035609 f2fs: fix changing cursegs if recovery fails on zoned device
Fsync data recovery attempts to check and fix write pointer consistency
of cursegs and all other zones. If the write pointers of cursegs are
unaligned, cursegs are changed to new sections.

If recovery fails, zone write pointers are still checked and fixed,
but the latest checkpoint cannot be written back. Additionally, retry-
mount skips recovery and rolls back to reuse the old cursegs whose
zones are already finished. This can lead to unaligned write later.

This patch addresses the issue by leaving writer pointers untouched if
recovery fails. When retry-mount is performed, cursegs and other zones
are checked and fixed after skipping recovery.

Signed-off-by: Song Feng <songfeng@oppo.com>
Signed-off-by: Yongpeng Yang <yangyongpeng1@oppo.com>
Signed-off-by: Sheng Yong <shengyong@oppo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-11-21 16:13:53 +00:00
Daeho Jeong
a35749b1ed f2fs: adjust unusable cap before checkpoint=disable mode
The unusable cap value must be adjusted before checking whether
checkpoint=disable is feasible.

Signed-off-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-11-21 16:12:30 +00:00
Chao Yu
6787a82245 f2fs: fix to requery extent which cross boundary of inquiry
dd if=/dev/zero of=file bs=4k count=5
xfs_io file -c "fiemap -v 2 16384"
file:
   EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
     0: [0..31]:         139272..139303      32 0x1000
     1: [32..39]:        139304..139311       8 0x1001
xfs_io file -c "fiemap -v 0 16384"
file:
   EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
     0: [0..31]:         139272..139303      32 0x1000
xfs_io file -c "fiemap -v 0 16385"
file:
   EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
     0: [0..39]:         139272..139311      40 0x1001

There are two problems:
- continuous extent is split to two
- FIEMAP_EXTENT_LAST is missing in last extent

The root cause is: if upper boundary of inquiry crosses extent,
f2fs_map_blocks() will truncate length of returned extent to
F2FS_BYTES_TO_BLK(len), and also, it will stop to query latter
extent or hole to make sure current extent is last or not.

In order to fix this issue, once we found an extent locates
in the end of inquiry range by f2fs_map_blocks(), we need to
expand inquiry range to requiry.

Cc: stable@vger.kernel.org
Fixes: 7f63eb77af ("f2fs: report unwritten area in f2fs_fiemap")
Reported-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-11-21 16:12:30 +00:00
Zhiguo Niu
77569f785c f2fs: fix to adjust appropriate length for fiemap
If user give a file size as "length" parameter for fiemap
operations, but if this size is non-block size aligned,
it will show 2 segments fiemap results even this whole file
is contiguous on disk, such as the following results:

 ./f2fs_io fiemap 0 19034 ylog/analyzer.py
Fiemap: offset = 0 len = 19034
        logical addr.    physical addr.   length           flags
0       0000000000000000 0000000020baa000 0000000000004000 00001000
1       0000000000004000 0000000020bae000 0000000000001000 00001001

after this patch:
./f2fs_io fiemap 0 19034 ylog/analyzer.py
Fiemap: offset = 0 len = 19034
    logical addr.    physical addr.   length           flags
0    0000000000000000 00000000315f3000 0000000000005000 00001001

Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-11-21 16:12:30 +00:00
Chao Yu
7461f37094 f2fs: clean up w/ F2FS_{BLK_TO_BYTES,BTYES_TO_BLK}
f2fs doesn't support different blksize in one instance, so
bytes_to_blks() and blks_to_bytes() are equal to F2FS_BYTES_TO_BLK
and F2FS_BLK_TO_BYTES, let's use F2FS_BYTES_TO_BLK/F2FS_BLK_TO_BYTES
instead for cleanup.

Reviewed-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-11-21 16:12:29 +00:00
Chao Yu
3273d8ad94 f2fs: fix to do cast in F2FS_{BLK_TO_BYTES, BTYES_TO_BLK} to avoid overflow
It missed to cast variable to unsigned long long type before
bit shift, which will cause overflow, fix it.

Fixes: f7ef9b83b5 ("f2fs: introduce macros to convert bytes and blocks in f2fs")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-11-21 16:12:29 +00:00
Daniel Yang
789ca0eb47 f2fs: replace deprecated strcpy with strscpy
strcpy is deprecated. Kernel docs recommend replacing strcpy with
strscpy. The function strcpy() return value isn't used so there
shouldn't be an issue replacing with the safer alternative strscpy.

Signed-off-by: Daniel Yang <danielyangkang@gmail.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-11-21 16:12:29 +00:00
Jaegeuk Kim
acff9409dd Revert "f2fs: remove unreachable lazytime mount option parsing"
This reverts commit 54f43a10fa.

The above commit broke the lazytime mount, given

mount("/dev/vdb", "/mnt/test", "f2fs", 0, "lazytime");

CC: stable@vger.kernel.org # 6.11+
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-11-21 16:12:15 +00:00
Linus Torvalds
0f25f0e4ef the bulk of struct fd memory safety stuff
Making sure that struct fd instances are destroyed in the same
 scope where they'd been created, getting rid of reassignments
 and passing them by reference, converting to CLASS(fd{,_pos,_raw}).
 
 We are getting very close to having the memory safety of that stuff
 trivial to verify.
 
 Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCZzdikAAKCRBZ7Krx/gZQ
 69nJAQCmbQHK3TGUbQhOw6MJXOK9ezpyEDN3FZb4jsu38vTIdgEA6OxAYDO2m2g9
 CN18glYmD3wRyU6Bwl4vGODouSJvDgA=
 =gVH3
 -----END PGP SIGNATURE-----

Merge tag 'pull-fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull 'struct fd' class updates from Al Viro:
 "The bulk of struct fd memory safety stuff

  Making sure that struct fd instances are destroyed in the same scope
  where they'd been created, getting rid of reassignments and passing
  them by reference, converting to CLASS(fd{,_pos,_raw}).

  We are getting very close to having the memory safety of that stuff
  trivial to verify"

* tag 'pull-fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (28 commits)
  deal with the last remaing boolean uses of fd_file()
  css_set_fork(): switch to CLASS(fd_raw, ...)
  memcg_write_event_control(): switch to CLASS(fd)
  assorted variants of irqfd setup: convert to CLASS(fd)
  do_pollfd(): convert to CLASS(fd)
  convert do_select()
  convert vfs_dedupe_file_range().
  convert cifs_ioctl_copychunk()
  convert media_request_get_by_fd()
  convert spu_run(2)
  switch spufs_calls_{get,put}() to CLASS() use
  convert cachestat(2)
  convert do_preadv()/do_pwritev()
  fdget(), more trivial conversions
  fdget(), trivial conversions
  privcmd_ioeventfd_assign(): don't open-code eventfd_ctx_fdget()
  o2hb_region_dev_store(): avoid goto around fdget()/fdput()
  introduce "fd_pos" class, convert fdget_pos() users to it.
  fdget_raw() users: switch to CLASS(fd_raw)
  convert vmsplice() to CLASS(fd)
  ...
2024-11-18 12:24:06 -08:00
Linus Torvalds
70e7730c2a vfs-6.13.misc
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZzcToAAKCRCRxhvAZXjc
 osL9AP948FFumJRC28gDJ4xp+X4eohNOfkgoEG8FTbF2zU6ulwD+O0pr26FqpFli
 pqlG+38UdATImpfqqWjPbb72sBYcfQg=
 =wLUh
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.13.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull misc vfs updates from Christian Brauner:
 "Features:

   - Fixup and improve NLM and kNFSD file lock callbacks

     Last year both GFS2 and OCFS2 had some work done to make their
     locking more robust when exported over NFS. Unfortunately, part of
     that work caused both NLM (for NFS v3 exports) and kNFSD (for
     NFSv4.1+ exports) to no longer send lock notifications to clients

     This in itself is not a huge problem because most NFS clients will
     still poll the server in order to acquire a conflicted lock

     It's important for NLM and kNFSD that they do not block their
     kernel threads inside filesystem's file_lock implementations
     because that can produce deadlocks. We used to make sure of this by
     only trusting that posix_lock_file() can correctly handle blocking
     lock calls asynchronously, so the lock managers would only setup
     their file_lock requests for async callbacks if the filesystem did
     not define its own lock() file operation

     However, when GFS2 and OCFS2 grew the capability to correctly
     handle blocking lock requests asynchronously, they started
     signalling this behavior with EXPORT_OP_ASYNC_LOCK, and the check
     for also trusting posix_lock_file() was inadvertently dropped, so
     now most filesystems no longer produce lock notifications when
     exported over NFS

     Fix this by using an fop_flag which greatly simplifies the problem
     and grooms the way for future uses by both filesystems and lock
     managers alike

   - Add a sysctl to delete the dentry when a file is removed instead of
     making it a negative dentry

     Commit 681ce86235 ("vfs: Delete the associated dentry when
     deleting a file") introduced an unconditional deletion of the
     associated dentry when a file is removed. However, this led to
     performance regressions in specific benchmarks, such as
     ilebench.sum_operations/s, prompting a revert in commit
     4a4be1ad3a ("Revert "vfs: Delete the associated dentry when
     deleting a file""). This reintroduces the concept conditionally
     through a sysctl

   - Expand the statmount() system call:

       * Report the filesystem subtype in a new fs_subtype field to
         e.g., report fuse filesystem subtypes

       * Report the superblock source in a new sb_source field

       * Add a new way to return filesystem specific mount options in an
         option array that returns filesystem specific mount options
         separated by zero bytes and unescaped. This allows caller's to
         retrieve filesystem specific mount options and immediately pass
         them to e.g., fsconfig() without having to unescape or split
         them

       * Report security (LSM) specific mount options in a separate
         security option array. We don't lump them together with
         filesystem specific mount options as security mount options are
         generic and most users aren't interested in them

         The format is the same as for the filesystem specific mount
         option array

   - Support relative paths in fsconfig()'s FSCONFIG_SET_STRING command

   - Optimize acl_permission_check() to avoid costly {g,u}id ownership
     checks if possible

   - Use smp_mb__after_spinlock() to avoid full smp_mb() in evict()

   - Add synchronous wakeup support for ep_poll_callback.

     Currently, epoll only uses wake_up() to wake up task. But sometimes
     there are epoll users which want to use the synchronous wakeup flag
     to give a hint to the scheduler, e.g., the Android binder driver.
     So add a wake_up_sync() define, and use wake_up_sync() when sync is
     true in ep_poll_callback()

  Fixes:

   - Fix kernel documentation for inode_insert5() and iget5_locked()

   - Annotate racy epoll check on file->f_ep

   - Make F_DUPFD_QUERY associative

   - Avoid filename buffer overrun in initramfs

   - Don't let statmount() return empty strings

   - Add a cond_resched() to dump_user_range() to avoid hogging the CPU

   - Don't query the device logical blocksize multiple times for hfsplus

   - Make filemap_read() check that the offset is positive or zero

  Cleanups:

   - Various typo fixes

   - Cleanup wbc_attach_fdatawrite_inode()

   - Add __releases annotation to wbc_attach_and_unlock_inode()

   - Add hugetlbfs tracepoints

   - Fix various vfs kernel doc parameters

   - Remove obsolete TODO comment from io_cancel()

   - Convert wbc_account_cgroup_owner() to take a folio

   - Fix comments for BANDWITH_INTERVAL and wb_domain_writeout_add()

   - Reorder struct posix_acl to save 8 bytes

   - Annotate struct posix_acl with __counted_by()

   - Replace one-element array with flexible array member in freevxfs

   - Use idiomatic atomic64_inc_return() in alloc_mnt_ns()"

* tag 'vfs-6.13.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (35 commits)
  statmount: retrieve security mount options
  vfs: make evict() use smp_mb__after_spinlock instead of smp_mb
  statmount: add flag to retrieve unescaped options
  fs: add the ability for statmount() to report the sb_source
  writeback: wbc_attach_fdatawrite_inode out of line
  writeback: add a __releases annoation to wbc_attach_and_unlock_inode
  fs: add the ability for statmount() to report the fs_subtype
  fs: don't let statmount return empty strings
  fs:aio: Remove TODO comment suggesting hash or array usage in io_cancel()
  hfsplus: don't query the device logical block size multiple times
  freevxfs: Replace one-element array with flexible array member
  fs: optimize acl_permission_check()
  initramfs: avoid filename buffer overrun
  fs/writeback: convert wbc_account_cgroup_owner to take a folio
  acl: Annotate struct posix_acl with __counted_by()
  acl: Realign struct posix_acl to save 8 bytes
  epoll: Add synchronous wakeup support for ep_poll_callback
  coredump: add cond_resched() to dump_user_range
  mm/page-writeback.c: Fix comment of wb_domain_writeout_add()
  mm/page-writeback.c: Update comment for BANDWIDTH_INTERVAL
  ...
2024-11-18 09:35:30 -08:00
Chao Yu
26e6f59d0b f2fs: fix to avoid forcing direct write to use buffered IO on inline_data inode
Jinsu Lee reported a performance regression issue, after commit
5c8764f867 ("f2fs: fix to force buffered IO on inline_data
inode"), we forced direct write to use buffered IO on inline_data
inode, it will cause performace regression due to memory copy
and data flush.

It's fine to not force direct write to use buffered IO, as it
can convert inline inode before committing direct write IO.

Fixes: 5c8764f867 ("f2fs: fix to force buffered IO on inline_data inode")
Reported-by: Jinsu Lee <jinsu1.lee@samsung.com>
Closes: https://lore.kernel.org/linux-f2fs-devel/af03dd2c-e361-4f80-b2fd-39440766cf6e@kernel.org
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-11-05 18:30:22 +00:00
Chao Yu
5dd00ebda3 f2fs: fix to map blocks correctly for direct write
f2fs_map_blocks() supports to map continuous holes or preallocated
address, we should avoid setting F2FS_MAP_MAPPED for these cases
only, otherwise, it may fail f2fs_iomap_begin(), and make direct
write fallbacking to use buffered IO and flush, result in performance
regression.

Fixes: 9f0f6bf427 ("f2fs: support to map continuous holes or preallocated address")
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202409122103.e45aa13b-oliver.sang@intel.com
Cc: Cyril Hrubis <chrubis@suse.cz>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-11-05 18:29:57 +00:00
Long Li
7b0033dbc4 f2fs: fix race in concurrent f2fs_stop_gc_thread
In my test case, concurrent calls to f2fs shutdown report the following
stack trace:

 Oops: general protection fault, probably for non-canonical address 0xc6cfff63bb5513fc: 0000 [#1] PREEMPT SMP PTI
 CPU: 0 UID: 0 PID: 678 Comm: f2fs_rep_shutdo Not tainted 6.12.0-rc5-next-20241029-g6fb2fa9805c5-dirty #85
 Call Trace:
  <TASK>
  ? show_regs+0x8b/0xa0
  ? __die_body+0x26/0xa0
  ? die_addr+0x54/0x90
  ? exc_general_protection+0x24b/0x5c0
  ? asm_exc_general_protection+0x26/0x30
  ? kthread_stop+0x46/0x390
  f2fs_stop_gc_thread+0x6c/0x110
  f2fs_do_shutdown+0x309/0x3a0
  f2fs_ioc_shutdown+0x150/0x1c0
  __f2fs_ioctl+0xffd/0x2ac0
  f2fs_ioctl+0x76/0xe0
  vfs_ioctl+0x23/0x60
  __x64_sys_ioctl+0xce/0xf0
  x64_sys_call+0x2b1b/0x4540
  do_syscall_64+0xa7/0x240
  entry_SYSCALL_64_after_hwframe+0x76/0x7e

The root cause is a race condition in f2fs_stop_gc_thread() called from
different f2fs shutdown paths:

  [CPU0]                       [CPU1]
  ----------------------       -----------------------
  f2fs_stop_gc_thread          f2fs_stop_gc_thread
                                 gc_th = sbi->gc_thread
    gc_th = sbi->gc_thread
    kfree(gc_th)
    sbi->gc_thread = NULL
                                 < gc_th != NULL >
                                 kthread_stop(gc_th->f2fs_gc_task) //UAF

The commit c7f114d864 ("f2fs: fix to avoid use-after-free in
f2fs_stop_gc_thread()") attempted to fix this issue by using a read
semaphore to prevent races between shutdown and remount threads, but
it fails to prevent all race conditions.

Fix it by converting to write lock of s_umount in f2fs_do_shutdown().

Fixes: 7950e9ac63 ("f2fs: stop gc/discard thread after fs shutdown")
Signed-off-by: Long Li <leo.lilong@huawei.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-11-05 18:27:51 +00:00
Xiuhong Wang
a7a7c1d423 f2fs: fix fiemap failure issue when page size is 16KB
After enable 16K page size, an infinite loop may occur in
fiemap (fm_length=UINT64_MAX) on a file, such as the 16KB
scratch.img during the remount operation in Android.

The condition for whether fiemap continues to map is to check
whether the number of bytes corresponding to the next map.m_lblk
exceeds blks_to_bytes(inode,max_inode_blocks(inode)) if there are HOLE.
The latter does not take into account the maximum size of a file with 16KB
page size, so the loop cannot be jumped out.

The following is the fail trace:
When f2fs_map_blocks reaches map.m_lblk=3936, it needs to go to the
first direct node block, so the map is 3936 + 4090 = 8026,
The next map is the second direct node block, that is,
8026 + 4090 = 12116,
The next map is the first indirect node block, that is,
12116 + 4090 * 4090 = 16740216,
The next map is the second indirect node block, that is,
16740216 + 4090 * 4090 = 33468316,
The next map is the first double indirect node block, that is,
33468316 + 4090 * 4090 * 4090 = 68451397316
Since map.m_lblk represents the address of a block, which is 32
bits, truncation will occur, that is, 68451397316 becomes
4026887876, and the number of bytes corresponding to the block
number does not exceed blks_to_bytes(inode,max_inode_blocks(inode)),
so the loop will not be jumped out.
The next time, it will be considered that it should still be a
double indirect node block, that is,
4026887876 + 4090 * 4090 * 4090 = 72444816876, which will be
truncated to 3725340140, and the loop will not be jumped out.

156.374871: f2fs_map_blocks: dev = (254,57), ino = 7449, file offset = 0, start blkaddr = 0x8e00, len = 0x200, flags = 2,seg_type = 8, may_create = 0, multidevice = 0, flag = 1, err = 0
156.374916: f2fs_map_blocks: dev = (254,57), ino = 7449, file offset = 512, start blkaddr = 0x0, len = 0x0, flags = 0 , seg_type = 8, may_create = 0, multidevice = 0, flag = 1, err = 0
156.374920: f2fs_map_blocks: dev = (254,57), ino = 7449, file offset = 513, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 8, may_create = 0, multidevice = 0, flag = 1, err = 0
......
156.385747: f2fs_map_blocks: dev = (254,57), ino = 7449, file offset = 3935, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 8, may_create = 0, multidevice = 0, flag = 1, err = 0
156.385752: f2fs_map_blocks: dev = (254,57), ino = 7449, file offset = 3936, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 8, may_create = 0, multidevice = 0, flag = 1, err = 0
156.385755: f2fs_map_blocks: dev = (254,57), ino = 7449, file offset = 8026, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 8, may_create = 0, multidevice = 0, flag = 1, err = 0
156.385758: f2fs_map_blocks: dev = (254,57), ino = 7449, file offset = 12116, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 8, may_create = 0, multidevice = 0, flag = 1, err = 0
156.385761: f2fs_map_blocks: dev = (254,57), ino = 7449, file offset = 16740216, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 8, may_create = 0, multidevice = 0, flag = 1, err = 0
156.385764: f2fs_map_blocks: dev = (254,57), ino = 7449, file offset = 33468316, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 8, may_create = 0, multidevice = 0, flag = 1, err = 0
156.385767: f2fs_map_blocks: dev = (254,57), ino = 7449, file offset = 4026887876, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 8, may_create = 0, multidevice = 0, flag = 1, err = 0
156.385770: f2fs_map_blocks: dev = (254,57), ino = 7449, file offset = 3725340140, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 8, may_create = 0, multidevice = 0, flag = 1, err = 0
156.385772: f2fs_map_blocks: dev = (254,57), ino = 7449, file offset = 4026887876, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 8, may_create = 0, multidevice = 0, flag = 1, err = 0
156.385775: f2fs_map_blocks: dev = (254,57), ino = 7449, file offset = 3725340140, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 8, may_create = 0, multidevice = 0, flag = 1, err = 0

Commit a6a010f5de ("f2fs: Restrict max filesize for 16K f2fs")
has set the maximum allowed file size to (U32_MAX + 1) * F2FS_BLKSIZE,
so max_file_blocks should be used here to limit it, that is,
maxbytes defined above. And the max_inode_blocks function is not
called by other functions except here, so cleanup it.

Signed-off-by: Xiuhong Wang <xiuhong.wang@unisoc.com>
Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-11-05 18:26:51 +00:00
Al Viro
8152f82010 fdget(), more trivial conversions
all failure exits prior to fdget() leave the scope, all matching fdput()
are immediately followed by leaving the scope.

[xfs_ioc_commit_range() chunk moved here as well]

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-11-03 01:28:06 -05:00
Zhiguo Niu
744e66cb87 f2fs: remove redundant atomic file check in defragment
f2fs_is_atomic_file(inode) is checked in f2fs_defragment_range,
so remove the redundant checking in f2fs_ioc_defragment.

Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-11-01 01:24:42 +00:00
Chao Yu
51d3d952c5 f2fs: fix to convert log type to segment data type correctly
This patch introduces a new helper log_type_to_seg_type() to convert
log type to segment data type, and uses it to clean up opened codes
in build_curseg(), and also it fixes to convert log type before use
in do_write_page().

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-11-01 01:24:42 +00:00
LongPing Wei
1df2bc3c82 f2fs: clean up the unused variable additional_reserved_segments
additional_reserved_segments was introduced by
commit 300a842937 ("f2fs: fix to reserve space for IO align feature"),
and its initialization was deleted by
commit 87161a2b0a ("f2fs: deprecate io_bits").

Signed-off-by: LongPing Wei <weilongping@oppo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-11-01 01:24:41 +00:00
Chao Yu
cffaa0976f f2fs: clean up opened code w/ {get,set}_nid()
Just cleanup, no logic change.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-11-01 01:24:41 +00:00
Zhiguo Niu
296b8cb34e f2fs: fix to avoid use GC_AT when setting gc_mode as GC_URGENT_LOW or GC_URGENT_MID
If gc_mode is set to GC_URGENT_LOW or GC_URGENT_MID, cost benefit GC
approach should be used, but if ATGC is enabled at the same time,
Age-threshold approach will be selected, which can only do amount of
GC and it is much less than the numbers of CB approach.

some traces:
  f2fs_gc-254:48-396     [007] ..... 2311600.684028: f2fs_gc_begin: dev = (254,48), gc_type = Background GC, no_background_GC = 0, nr_free_secs = 0, nodes = 1053, dents = 2, imeta = 18, free_sec:44898, free_seg:44898, rsv_seg:239, prefree_seg:0
  f2fs_gc-254:48-396     [007] ..... 2311600.684527: f2fs_get_victim: dev = (254,48), type = No TYPE, policy = (Background GC, LFS-mode, Age-threshold), victim = 10, cost = 4294364975, ofs_unit = 1, pre_victim_secno = -1, prefree = 0, free = 44898
  f2fs_gc-254:48-396     [007] ..... 2311600.714835: f2fs_gc_end: dev = (254,48), ret = 0, seg_freed = 0, sec_freed = 0, nodes = 1562, dents = 2, imeta = 18, free_sec:44898, free_seg:44898, rsv_seg:239, prefree_seg:0
  f2fs_gc-254:48-396     [007] ..... 2311600.714843: f2fs_background_gc: dev = (254,48), wait_ms = 50, prefree = 0, free = 44898
  f2fs_gc-254:48-396     [007] ..... 2311600.771785: f2fs_gc_begin: dev = (254,48), gc_type = Background GC, no_background_GC = 0, nr_free_secs = 0, nodes = 1562, dents = 2, imeta = 18, free_sec:44898, free_seg:44898, rsv_seg:239, prefree_seg:
  f2fs_gc-254:48-396     [007] ..... 2311600.772275: f2fs_gc_end: dev = (254,48), ret = -61, seg_freed = 0, sec_freed = 0, nodes = 1562, dents = 2, imeta = 18, free_sec:44898, free_seg:44898, rsv_seg:239, prefree_seg:0

Fixes: 0e5e81114d ("f2fs: add GC_URGENT_LOW mode in gc_urgent")
Fixes: d98af5f455 ("f2fs: introduce gc_urgent_mid mode")
Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-11-01 01:24:41 +00:00
Chao Yu
9395fb09e8 f2fs: fix to parse temperature correctly in f2fs_get_segment_temp()
In __get_segment_type(), __get_segment_type_6() may return
CURSEG_COLD_DATA_PINNED or CURSEG_ALL_DATA_ATGC log type, but
following f2fs_get_segment_temp() can only handle persistent
log type, fix it.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-11-01 01:24:41 +00:00
Chao Yu
f10a890308 f2fs: fix to avoid potential deadlock in f2fs_record_stop_reason()
syzbot reports deadlock issue of f2fs as below:

======================================================
WARNING: possible circular locking dependency detected
6.12.0-rc3-syzkaller-00087-gc964ced77262 #0 Not tainted
------------------------------------------------------
kswapd0/79 is trying to acquire lock:
ffff888011824088 (&sbi->sb_lock){++++}-{3:3}, at: f2fs_down_write fs/f2fs/f2fs.h:2199 [inline]
ffff888011824088 (&sbi->sb_lock){++++}-{3:3}, at: f2fs_record_stop_reason+0x52/0x1d0 fs/f2fs/super.c:4068

but task is already holding lock:
ffff88804bd92610 (sb_internal#2){.+.+}-{0:0}, at: f2fs_evict_inode+0x662/0x15c0 fs/f2fs/inode.c:842

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #2 (sb_internal#2){.+.+}-{0:0}:
       lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5825
       percpu_down_read include/linux/percpu-rwsem.h:51 [inline]
       __sb_start_write include/linux/fs.h:1716 [inline]
       sb_start_intwrite+0x4d/0x1c0 include/linux/fs.h:1899
       f2fs_evict_inode+0x662/0x15c0 fs/f2fs/inode.c:842
       evict+0x4e8/0x9b0 fs/inode.c:725
       f2fs_evict_inode+0x1a4/0x15c0 fs/f2fs/inode.c:807
       evict+0x4e8/0x9b0 fs/inode.c:725
       dispose_list fs/inode.c:774 [inline]
       prune_icache_sb+0x239/0x2f0 fs/inode.c:963
       super_cache_scan+0x38c/0x4b0 fs/super.c:223
       do_shrink_slab+0x701/0x1160 mm/shrinker.c:435
       shrink_slab+0x1093/0x14d0 mm/shrinker.c:662
       shrink_one+0x43b/0x850 mm/vmscan.c:4818
       shrink_many mm/vmscan.c:4879 [inline]
       lru_gen_shrink_node mm/vmscan.c:4957 [inline]
       shrink_node+0x3799/0x3de0 mm/vmscan.c:5937
       kswapd_shrink_node mm/vmscan.c:6765 [inline]
       balance_pgdat mm/vmscan.c:6957 [inline]
       kswapd+0x1ca3/0x3700 mm/vmscan.c:7226
       kthread+0x2f0/0x390 kernel/kthread.c:389
       ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
       ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244

-> #1 (fs_reclaim){+.+.}-{0:0}:
       lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5825
       __fs_reclaim_acquire mm/page_alloc.c:3834 [inline]
       fs_reclaim_acquire+0x88/0x130 mm/page_alloc.c:3848
       might_alloc include/linux/sched/mm.h:318 [inline]
       prepare_alloc_pages+0x147/0x5b0 mm/page_alloc.c:4493
       __alloc_pages_noprof+0x16f/0x710 mm/page_alloc.c:4722
       alloc_pages_mpol_noprof+0x3e8/0x680 mm/mempolicy.c:2265
       alloc_pages_noprof mm/mempolicy.c:2345 [inline]
       folio_alloc_noprof+0x128/0x180 mm/mempolicy.c:2352
       filemap_alloc_folio_noprof+0xdf/0x500 mm/filemap.c:1010
       do_read_cache_folio+0x2eb/0x850 mm/filemap.c:3787
       read_mapping_folio include/linux/pagemap.h:1011 [inline]
       f2fs_commit_super+0x3c0/0x7d0 fs/f2fs/super.c:4032
       f2fs_record_stop_reason+0x13b/0x1d0 fs/f2fs/super.c:4079
       f2fs_handle_critical_error+0x2ac/0x5c0 fs/f2fs/super.c:4174
       f2fs_write_inode+0x35f/0x4d0 fs/f2fs/inode.c:785
       write_inode fs/fs-writeback.c:1503 [inline]
       __writeback_single_inode+0x711/0x10d0 fs/fs-writeback.c:1723
       writeback_single_inode+0x1f3/0x660 fs/fs-writeback.c:1779
       sync_inode_metadata+0xc4/0x120 fs/fs-writeback.c:2849
       f2fs_release_file+0xa8/0x100 fs/f2fs/file.c:1941
       __fput+0x23f/0x880 fs/file_table.c:431
       task_work_run+0x24f/0x310 kernel/task_work.c:228
       resume_user_mode_work include/linux/resume_user_mode.h:50 [inline]
       exit_to_user_mode_loop kernel/entry/common.c:114 [inline]
       exit_to_user_mode_prepare include/linux/entry-common.h:328 [inline]
       __syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline]
       syscall_exit_to_user_mode+0x168/0x370 kernel/entry/common.c:218
       do_syscall_64+0x100/0x230 arch/x86/entry/common.c:89
       entry_SYSCALL_64_after_hwframe+0x77/0x7f

-> #0 (&sbi->sb_lock){++++}-{3:3}:
       check_prev_add kernel/locking/lockdep.c:3161 [inline]
       check_prevs_add kernel/locking/lockdep.c:3280 [inline]
       validate_chain+0x18ef/0x5920 kernel/locking/lockdep.c:3904
       __lock_acquire+0x1384/0x2050 kernel/locking/lockdep.c:5202
       lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5825
       down_write+0x99/0x220 kernel/locking/rwsem.c:1577
       f2fs_down_write fs/f2fs/f2fs.h:2199 [inline]
       f2fs_record_stop_reason+0x52/0x1d0 fs/f2fs/super.c:4068
       f2fs_handle_critical_error+0x2ac/0x5c0 fs/f2fs/super.c:4174
       f2fs_evict_inode+0xa61/0x15c0 fs/f2fs/inode.c:883
       evict+0x4e8/0x9b0 fs/inode.c:725
       f2fs_evict_inode+0x1a4/0x15c0 fs/f2fs/inode.c:807
       evict+0x4e8/0x9b0 fs/inode.c:725
       dispose_list fs/inode.c:774 [inline]
       prune_icache_sb+0x239/0x2f0 fs/inode.c:963
       super_cache_scan+0x38c/0x4b0 fs/super.c:223
       do_shrink_slab+0x701/0x1160 mm/shrinker.c:435
       shrink_slab+0x1093/0x14d0 mm/shrinker.c:662
       shrink_one+0x43b/0x850 mm/vmscan.c:4818
       shrink_many mm/vmscan.c:4879 [inline]
       lru_gen_shrink_node mm/vmscan.c:4957 [inline]
       shrink_node+0x3799/0x3de0 mm/vmscan.c:5937
       kswapd_shrink_node mm/vmscan.c:6765 [inline]
       balance_pgdat mm/vmscan.c:6957 [inline]
       kswapd+0x1ca3/0x3700 mm/vmscan.c:7226
       kthread+0x2f0/0x390 kernel/kthread.c:389
       ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
       ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244

other info that might help us debug this:

Chain exists of:
  &sbi->sb_lock --> fs_reclaim --> sb_internal#2

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  rlock(sb_internal#2);
                               lock(fs_reclaim);
                               lock(sb_internal#2);
  lock(&sbi->sb_lock);

Root cause is there will be potential deadlock in between
below tasks:

Thread A				Kswapd
- f2fs_ioc_commit_atomic_write
 - mnt_want_write_file -- down_read lock A
					- balance_pgdat
					 - __fs_reclaim_acquire  -- lock B
					  - shrink_node
					   - prune_icache_sb
					    - dispose_list
					     - f2fs_evict_inode
					      - sb_start_intwrite  -- down_read lock A
 - f2fs_do_sync_file
  - f2fs_write_inode
   - f2fs_handle_critical_error
    - f2fs_record_stop_reason
     - f2fs_commit_super
      - read_mapping_folio
       - filemap_alloc_folio_noprof
        - fs_reclaim_acquire  -- lock B

Both threads try to acquire read lock of lock A, then its upcoming write
lock grabber will trigger deadlock.

Let's always create an asynchronous task in f2fs_handle_critical_error()
rather than calling f2fs_record_stop_reason() synchronously to avoid
this potential deadlock issue.

Fixes: b62e71be21 ("f2fs: support errors=remount-ro|continue|panic mountoption")
Reported-by: syzbot+be4a9983e95a5e25c8d3@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/6704d667.050a0220.1e4d62.0081.GAE@google.com
Signed-off-by: Chao Yu <chao@kernel.org>
Reviewed-by: Daejun Park <daejun7.park@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-11-01 01:24:41 +00:00
Zeng Heng
0c3a38a4b4 f2fs: Fix not used variable 'index'
Fix the following compilation warning:
fs/f2fs/data.c:2391:10: warning: variable ‘index’ set but not used
[-Wunused-but-set-variable]
 2391 |  pgoff_t index;

Only define and set the variable index when the CONFIG_F2FS_FS_COMPRESSION
is enabled.

Fixes: db92e6c729 ("f2fs: convert f2fs_mpage_readpages() to use folio")
Signed-off-by: Zeng Heng <zengheng4@huawei.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-11-01 01:24:41 +00:00
Andrew Kreimer
e63ce120b4 f2fs: fix typos
Fix typos: datas -> data.

Via codespell.

Signed-off-by: Andrew Kreimer <algonell@gmail.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-11-01 01:24:41 +00:00
Qi Han
84b5bb8bf0 f2fs: modify f2fs_is_checkpoint_ready logic to allow more data to be written with the CP disable
When the free segment is used up during CP disable, many write or
ioctl operations will get ENOSPC error codes, even if there are
still many blocks available. We can reproduce it in the following
steps:

dd if=/dev/zero of=f2fs.img bs=1M count=65
mkfs.f2fs -f f2fs.img
mount f2fs.img f2fs_dir -o checkpoint=disable:10%
cd f2fs_dir
i=1 ; while [[ $i -lt 50 ]] ; do (file_name=./2M_file$i ; dd \
if=/dev/random of=$file_name bs=1M count=2); i=$((i+1)); done
sync
i=1 ; while [[ $i -lt 50 ]] ; do (file_name=./2M_file$i ; truncate \
-s 1K $file_name); i=$((i+1)); done
sync
dd if=/dev/zero of=./file bs=1M count=20

In f2fs_need_SSR() function, it is allowed to use SSR to allocate
blocks when CP is disabled, so in f2fs_is_checkpoint_ready function,
can we judge the number of invalid blocks when free segment is not
enough, and return ENOSPC only if the number of invalid blocks is
also not enough.

Signed-off-by: Qi Han <hanqi@vivo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-11-01 01:24:40 +00:00
Yongpeng Yang
43563069e1 f2fs: check curseg->inited before write_sum_page in change_curseg
In the __f2fs_init_atgc_curseg->get_atssr_segment calling,
curseg->segno is NULL_SEGNO, indicating that there is no summary
block that needs to be written.

Fixes: 093749e296 ("f2fs: support age threshold based garbage collection")
Signed-off-by: Yongpeng Yang <yangyongpeng1@oppo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-11-01 01:19:01 +00:00