linux-loongson

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson synced 2025-09-02 16:44:59 +00:00

Author	SHA1	Message	Date
Chao Yu	e2374015f2	f2fs: fix to propagate return value of scan_nat_page() As Anatoly Trosinenko reported in bugzilla: How to reproduce: 1. Compile the `73fcb1a370` version of the kernel using the config attached 2. Unpack and mount the attached filesystem image as F2FS 3. The kernel will BUG() on mount (BUGs are explicitly enabled in config) [ 2.233612] F2FS-fs (sda): Found nat_bits in checkpoint [ 2.248422] ------------[ cut here ]------------ [ 2.248857] kernel BUG at fs/f2fs/node.c:1967! [ 2.249760] invalid opcode: 0000 [#1] SMP NOPTI [ 2.250219] Modules linked in: [ 2.251848] CPU: 0 PID: 944 Comm: mount Not tainted 4.17.0-rc5+ #1 [ 2.252331] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 [ 2.253305] RIP: 0010:build_free_nids+0x337/0x3f0 [ 2.253672] RSP: 0018:ffffae7fc0857c50 EFLAGS: 00000246 [ 2.254080] RAX: 00000000ffffffff RBX: 0000000000000123 RCX: 0000000000000001 [ 2.254638] RDX: ffff9aa7063d5c00 RSI: 0000000000000122 RDI: ffff9aa705852e00 [ 2.255190] RBP: ffff9aa705852e00 R08: 0000000000000001 R09: ffff9aa7059090c0 [ 2.255719] R10: 0000000000000000 R11: 0000000000000000 R12: ffff9aa705852e00 [ 2.256242] R13: ffff9aa7063ad000 R14: ffff9aa705919000 R15: 0000000000000123 [ 2.256809] FS: 00000000023078c0(0000) GS:ffff9aa707800000(0000) knlGS:0000000000000000 [ 2.258654] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 2.259153] CR2: 00000000005511ae CR3: 0000000005872000 CR4: 00000000000006f0 [ 2.259801] Call Trace: [ 2.260583] build_node_manager+0x5cd/0x600 [ 2.260963] f2fs_fill_super+0x66a/0x17c0 [ 2.261300] ? f2fs_commit_super+0xe0/0xe0 [ 2.261622] mount_bdev+0x16e/0x1a0 [ 2.261899] mount_fs+0x30/0x150 [ 2.262398] vfs_kern_mount.part.28+0x4f/0xf0 [ 2.262743] do_mount+0x5d0/0xc60 [ 2.263010] ? _copy_from_user+0x37/0x60 [ 2.263313] ? memdup_user+0x39/0x60 [ 2.263692] ksys_mount+0x7b/0xd0 [ 2.263960] __x64_sys_mount+0x1c/0x20 [ 2.264268] do_syscall_64+0x43/0xf0 [ 2.264560] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 2.265095] RIP: 0033:0x48d31a [ 2.265502] RSP: 002b:00007ffc6fe60a08 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5 [ 2.266089] RAX: ffffffffffffffda RBX: 0000000000008000 RCX: 000000000048d31a [ 2.266607] RDX: 00007ffc6fe62fa5 RSI: 00007ffc6fe62f9d RDI: 00007ffc6fe62f94 [ 2.267130] RBP: 00000000023078a0 R08: 0000000000000000 R09: 0000000000000000 [ 2.267670] R10: 0000000000008000 R11: 0000000000000246 R12: 0000000000000000 [ 2.268192] R13: 0000000000000000 R14: 00007ffc6fe60c78 R15: 0000000000000000 [ 2.268767] Code: e8 5f c3 ff ff 83 c3 01 41 83 c7 01 81 fb c7 01 00 00 74 48 44 39 7d 04 76 42 48 63 c3 48 8d 04 c0 41 8b 44 06 05 83 f8 ff 75 c1 <0f> 0b 49 8b 45 50 48 8d b8 b0 00 00 00 e8 37 59 69 00 b9 01 00 [ 2.270434] RIP: build_free_nids+0x337/0x3f0 RSP: ffffae7fc0857c50 [ 2.271426] ---[ end trace ab20c06cd3c8fde4 ]--- During loading NAT entries, we will do sanity check, once the entry info is corrupted, it will cause BUG_ON directly to protect user data from being overwrited. In this case, it will be better to just return failure on mount() instead of panic, so that user can get hint from kmsg and try fsck for recovery immediately rather than after an abnormal reboot. https://bugzilla.kernel.org/show_bug.cgi?id=199769 Reported-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2018-07-27 18:03:59 +09:00
Weichao Guo	54c55c4e4f	f2fs: support in-memory inode checksum when checking consistency Enable in-memory inode checksum to protect metadata blocks from in-memory scribbles when checking consistency, which has no performance requirements. Signed-off-by: Weichao Guo <guoweichao@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2018-07-27 18:03:59 +09:00
Jaegeuk Kim	83a3bfdb5a	f2fs: indicate shutdown f2fs to allow unmount successfully Once we shutdown f2fs, we have to flush stale pages in order to unmount the system. In order to make stable, we need to stop fault injection as well. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2018-07-27 18:03:56 +09:00
Jaegeuk Kim	7f2ecdd837	f2fs: flush journal nat entries for nat_bits during unmount Let's flush journal nat entries for speed up in the next run. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2018-07-11 19:54:51 -07:00
Kees Cook	9d2a789c1d	treewide: Use array_size in f2fs_kvzalloc() The f2fs_kvzalloc() function has no 2-factor argument form, so multiplication factors need to be wrapped in array_size(). This patch replaces cases of: f2fs_kvzalloc(handle, a * b, gfp) with: f2fs_kvzalloc(handle, array_size(a, b), gfp) as well as handling cases of: f2fs_kvzalloc(handle, a * b * c, gfp) with: f2fs_kvzalloc(handle, array3_size(a, b, c), gfp) This does, however, attempt to ignore constant size factors like: f2fs_kvzalloc(handle, 4 * 1024, gfp) though any constants defined via macros get caught up in the conversion. Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant. The Coccinelle script used for this was: // Fix redundant parens around sizeof(). @@ expression HANDLE; type TYPE; expression THING, E; @@ ( f2fs_kvzalloc(HANDLE, - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) \| f2fs_kvzalloc(HANDLE, - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression HANDLE; expression COUNT; typedef u8; typedef __u8; @@ ( f2fs_kvzalloc(HANDLE, - sizeof(u8) * (COUNT) + COUNT , ...) \| f2fs_kvzalloc(HANDLE, - sizeof(__u8) * (COUNT) + COUNT , ...) \| f2fs_kvzalloc(HANDLE, - sizeof(char) * (COUNT) + COUNT , ...) \| f2fs_kvzalloc(HANDLE, - sizeof(unsigned char) * (COUNT) + COUNT , ...) \| f2fs_kvzalloc(HANDLE, - sizeof(u8) * COUNT + COUNT , ...) \| f2fs_kvzalloc(HANDLE, - sizeof(__u8) * COUNT + COUNT , ...) \| f2fs_kvzalloc(HANDLE, - sizeof(char) * COUNT + COUNT , ...) \| f2fs_kvzalloc(HANDLE, - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ expression HANDLE; type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( f2fs_kvzalloc(HANDLE, - sizeof(TYPE) * (COUNT_ID) + array_size(COUNT_ID, sizeof(TYPE)) , ...) \| f2fs_kvzalloc(HANDLE, - sizeof(TYPE) * COUNT_ID + array_size(COUNT_ID, sizeof(TYPE)) , ...) \| f2fs_kvzalloc(HANDLE, - sizeof(TYPE) * (COUNT_CONST) + array_size(COUNT_CONST, sizeof(TYPE)) , ...) \| f2fs_kvzalloc(HANDLE, - sizeof(TYPE) * COUNT_CONST + array_size(COUNT_CONST, sizeof(TYPE)) , ...) \| f2fs_kvzalloc(HANDLE, - sizeof(THING) * (COUNT_ID) + array_size(COUNT_ID, sizeof(THING)) , ...) \| f2fs_kvzalloc(HANDLE, - sizeof(THING) * COUNT_ID + array_size(COUNT_ID, sizeof(THING)) , ...) \| f2fs_kvzalloc(HANDLE, - sizeof(THING) * (COUNT_CONST) + array_size(COUNT_CONST, sizeof(THING)) , ...) \| f2fs_kvzalloc(HANDLE, - sizeof(THING) * COUNT_CONST + array_size(COUNT_CONST, sizeof(THING)) , ...) ) // 2-factor product, only identifiers. @@ expression HANDLE; identifier SIZE, COUNT; @@ f2fs_kvzalloc(HANDLE, - SIZE * COUNT + array_size(COUNT, SIZE) , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression HANDLE; expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( f2fs_kvzalloc(HANDLE, - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) \| f2fs_kvzalloc(HANDLE, - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) \| f2fs_kvzalloc(HANDLE, - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) \| f2fs_kvzalloc(HANDLE, - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) \| f2fs_kvzalloc(HANDLE, - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) \| f2fs_kvzalloc(HANDLE, - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) \| f2fs_kvzalloc(HANDLE, - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) \| f2fs_kvzalloc(HANDLE, - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression HANDLE; expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( f2fs_kvzalloc(HANDLE, - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) \| f2fs_kvzalloc(HANDLE, - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) \| f2fs_kvzalloc(HANDLE, - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) \| f2fs_kvzalloc(HANDLE, - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) \| f2fs_kvzalloc(HANDLE, - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) \| f2fs_kvzalloc(HANDLE, - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ expression HANDLE; identifier STRIDE, SIZE, COUNT; @@ ( f2fs_kvzalloc(HANDLE, - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) \| f2fs_kvzalloc(HANDLE, - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) \| f2fs_kvzalloc(HANDLE, - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) \| f2fs_kvzalloc(HANDLE, - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) \| f2fs_kvzalloc(HANDLE, - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) \| f2fs_kvzalloc(HANDLE, - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) \| f2fs_kvzalloc(HANDLE, - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) \| f2fs_kvzalloc(HANDLE, - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products // when they're not all constants... @@ expression HANDLE; expression E1, E2, E3; constant C1, C2, C3; @@ ( f2fs_kvzalloc(HANDLE, C1 * C2 * C3, ...) \| f2fs_kvzalloc(HANDLE, - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants. @@ expression HANDLE; expression E1, E2; constant C1, C2; @@ ( f2fs_kvzalloc(HANDLE, C1 * C2, ...) \| f2fs_kvzalloc(HANDLE, - E1 * E2 + array_size(E1, E2) , ...) ) Signed-off-by: Kees Cook <keescook@chromium.org>	2018-06-12 16:19:22 -07:00
Kees Cook	026f05079b	treewide: Use array_size() in f2fs_kzalloc() The f2fs_kzalloc() function has no 2-factor argument form, so multiplication factors need to be wrapped in array_size(). This patch replaces cases of: f2fs_kzalloc(handle, a * b, gfp) with: f2fs_kzalloc(handle, array_size(a, b), gfp) as well as handling cases of: f2fs_kzalloc(handle, a * b * c, gfp) with: f2fs_kzalloc(handle, array3_size(a, b, c), gfp) This does, however, attempt to ignore constant size factors like: f2fs_kzalloc(handle, 4 * 1024, gfp) though any constants defined via macros get caught up in the conversion. Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant. The Coccinelle script used for this was: // Fix redundant parens around sizeof(). @@ expression HANDLE; type TYPE; expression THING, E; @@ ( f2fs_kzalloc(HANDLE, - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) \| f2fs_kzalloc(HANDLE, - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression HANDLE; expression COUNT; typedef u8; typedef __u8; @@ ( f2fs_kzalloc(HANDLE, - sizeof(u8) * (COUNT) + COUNT , ...) \| f2fs_kzalloc(HANDLE, - sizeof(__u8) * (COUNT) + COUNT , ...) \| f2fs_kzalloc(HANDLE, - sizeof(char) * (COUNT) + COUNT , ...) \| f2fs_kzalloc(HANDLE, - sizeof(unsigned char) * (COUNT) + COUNT , ...) \| f2fs_kzalloc(HANDLE, - sizeof(u8) * COUNT + COUNT , ...) \| f2fs_kzalloc(HANDLE, - sizeof(__u8) * COUNT + COUNT , ...) \| f2fs_kzalloc(HANDLE, - sizeof(char) * COUNT + COUNT , ...) \| f2fs_kzalloc(HANDLE, - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ expression HANDLE; type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( f2fs_kzalloc(HANDLE, - sizeof(TYPE) * (COUNT_ID) + array_size(COUNT_ID, sizeof(TYPE)) , ...) \| f2fs_kzalloc(HANDLE, - sizeof(TYPE) * COUNT_ID + array_size(COUNT_ID, sizeof(TYPE)) , ...) \| f2fs_kzalloc(HANDLE, - sizeof(TYPE) * (COUNT_CONST) + array_size(COUNT_CONST, sizeof(TYPE)) , ...) \| f2fs_kzalloc(HANDLE, - sizeof(TYPE) * COUNT_CONST + array_size(COUNT_CONST, sizeof(TYPE)) , ...) \| f2fs_kzalloc(HANDLE, - sizeof(THING) * (COUNT_ID) + array_size(COUNT_ID, sizeof(THING)) , ...) \| f2fs_kzalloc(HANDLE, - sizeof(THING) * COUNT_ID + array_size(COUNT_ID, sizeof(THING)) , ...) \| f2fs_kzalloc(HANDLE, - sizeof(THING) * (COUNT_CONST) + array_size(COUNT_CONST, sizeof(THING)) , ...) \| f2fs_kzalloc(HANDLE, - sizeof(THING) * COUNT_CONST + array_size(COUNT_CONST, sizeof(THING)) , ...) ) // 2-factor product, only identifiers. @@ expression HANDLE; identifier SIZE, COUNT; @@ f2fs_kzalloc(HANDLE, - SIZE * COUNT + array_size(COUNT, SIZE) , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression HANDLE; expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( f2fs_kzalloc(HANDLE, - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) \| f2fs_kzalloc(HANDLE, - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) \| f2fs_kzalloc(HANDLE, - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) \| f2fs_kzalloc(HANDLE, - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) \| f2fs_kzalloc(HANDLE, - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) \| f2fs_kzalloc(HANDLE, - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) \| f2fs_kzalloc(HANDLE, - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) \| f2fs_kzalloc(HANDLE, - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression HANDLE; expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( f2fs_kzalloc(HANDLE, - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) \| f2fs_kzalloc(HANDLE, - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) \| f2fs_kzalloc(HANDLE, - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) \| f2fs_kzalloc(HANDLE, - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) \| f2fs_kzalloc(HANDLE, - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) \| f2fs_kzalloc(HANDLE, - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ expression HANDLE; identifier STRIDE, SIZE, COUNT; @@ ( f2fs_kzalloc(HANDLE, - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) \| f2fs_kzalloc(HANDLE, - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) \| f2fs_kzalloc(HANDLE, - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) \| f2fs_kzalloc(HANDLE, - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) \| f2fs_kzalloc(HANDLE, - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) \| f2fs_kzalloc(HANDLE, - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) \| f2fs_kzalloc(HANDLE, - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) \| f2fs_kzalloc(HANDLE, - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products // when they're not all constants... @@ expression HANDLE; expression E1, E2, E3; constant C1, C2, C3; @@ ( f2fs_kzalloc(HANDLE, C1 * C2 * C3, ...) \| f2fs_kzalloc(HANDLE, - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants. @@ expression HANDLE; expression E1, E2; constant C1, C2; @@ ( f2fs_kzalloc(HANDLE, C1 * C2, ...) \| f2fs_kzalloc(HANDLE, - E1 * E2 + array_size(E1, E2) , ...) ) Signed-off-by: Kees Cook <keescook@chromium.org>	2018-06-12 16:19:22 -07:00
Chao Yu	c29fd0c0e2	f2fs: let sync node IO interrupt async one Although mixed sync/async IOs can have continuous LBA, as they have different IO priority, block IO scheduler will add them into different queues and commit them separately, result in splited IOs which causes wrose performance. This patch gives high priority to synchronous IO of nodes, means that once synchronous flow starts, it can interrupt asynchronous writeback flow of system flusher, so more big IOs can be expected. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2018-06-04 14:33:20 -07:00
Chao Yu	aae764ece6	f2fs: don't change wbc->sync_mode We should never falsify wbc->sync_mode passed from mm, otherwise mm can trigger writeback with wrong IO priority. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2018-06-04 14:32:44 -07:00
Chao Yu	4d57b86dd8	f2fs: clean up symbol namespace As Ted reported: "Hi, I was looking at f2fs's sources recently, and I noticed that there is a very large number of non-static symbols which don't have a f2fs prefix. There's well over a hundred (see attached below). As one example, in fs/f2fs/dir.c there is: unsigned char get_de_type(struct f2fs_dir_entry *de) This function is clearly only useful for f2fs, but it has a generic name. This means that if any other file system tries to have the same symbol name, there will be a symbol conflict and the kernel would not successfully build. It also means that when someone is looking f2fs sources, it's not at all obvious whether a function such as read_data_page(), invalidate_blocks(), is a generic kernel function found in the fs, mm, or block layers, or a f2fs specific function. You might want to fix this at some point. Hopefully Kent's bcachefs isn't similarly using genericly named functions, since that might cause conflicts with f2fs's functions --- but just as this would be a problem that we would rightly insist that Kent fix, this is something that we should have rightly insisted that f2fs should have fixed before it was integrated into the mainline kernel. acquire_orphan_inode add_ino_entry add_orphan_inode allocate_data_block allocate_new_segments alloc_nid alloc_nid_done alloc_nid_failed available_free_memory ...." This patch adds "f2fs_" prefix for all non-static symbols in order to: a) avoid conflict with other kernel generic symbols; b) to indicate the function is f2fs specific one instead of generic one; Reported-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2018-05-31 11:31:53 -07:00
Chao Yu	aec2f729fc	f2fs: clean up with clear_radix_tree_dirty_tag Introduce clear_radix_tree_dirty_tag to include common codes for cleanup. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2018-05-31 11:31:52 -07:00
Chao Yu	7b525dd013	f2fs: clean up with is_valid_blkaddr() - rename is_valid_blkaddr() to is_valid_meta_blkaddr() for readability. - introduce is_valid_blkaddr() for cleanup. No logic change in this patch. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2018-05-31 11:31:50 -07:00
Jaegeuk Kim	868de6135f	f2fs: don't drop any page on f2fs_cp_error() case We still provide readdir() after shtudown, so we should keep pages to avoid additional IOs. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2018-05-31 11:31:49 -07:00
Jaegeuk Kim	a4f843bd00	f2fs: give message and set need_fsck given broken node id syzbot hit the following crash on upstream commit `83beed7b2b` (Fri Apr 20 17:56:32 2018 +0000) Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal syzbot dashboard link: https://syzkaller.appspot.com/bug?extid=d154ec99402c6f628887 C reproducer: https://syzkaller.appspot.com/x/repro.c?id=5414336294027264 syzkaller reproducer: https://syzkaller.appspot.com/x/repro.syz?id=5471683234234368 Raw console output: https://syzkaller.appspot.com/x/log.txt?id=5436660795834368 Kernel config: https://syzkaller.appspot.com/x/.config?id=1808800213120130118 compiler: gcc (GCC) 8.0.1 20180413 (experimental) IMPORTANT: if you fix the bug, please add the following tag to the commit: Reported-by: syzbot+d154ec99402c6f628887@syzkaller.appspotmail.com It will help syzbot understand when the bug is fixed. See footer for details. If you forward the report, please keep this part and the footer. F2FS-fs (loop0): Magic Mismatch, valid(0xf2f52010) - read(0x0) F2FS-fs (loop0): Can't find valid F2FS filesystem in 1th superblock F2FS-fs (loop0): invalid crc value ------------[ cut here ]------------ kernel BUG at fs/f2fs/node.c:1185! invalid opcode: 0000 [#1] SMP KASAN Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: CPU: 1 PID: 4549 Comm: syzkaller704305 Not tainted 4.17.0-rc1+ #10 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:__get_node_page+0xb68/0x16e0 fs/f2fs/node.c:1185 RSP: 0018:ffff8801d960e820 EFLAGS: 00010293 RAX: ffff8801d88205c0 RBX: 0000000000000003 RCX: ffffffff82f6cc06 RDX: 0000000000000000 RSI: ffffffff82f6d5e8 RDI: 0000000000000004 RBP: ffff8801d960ec30 R08: ffff8801d88205c0 R09: ffffed003b5e46c2 R10: 0000000000000003 R11: 0000000000000003 R12: ffff8801a86e00c0 R13: 0000000000000001 R14: ffff8801a86e0530 R15: ffff8801d9745240 FS: 000000000072c880(0000) GS:ffff8801daf00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f3d403209b8 CR3: 00000001d8f3f000 CR4: 00000000001406e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: get_node_page fs/f2fs/node.c:1237 [inline] truncate_xattr_node+0x152/0x2e0 fs/f2fs/node.c:1014 remove_inode_page+0x200/0xaf0 fs/f2fs/node.c:1039 f2fs_evict_inode+0xe86/0x1710 fs/f2fs/inode.c:547 evict+0x4a6/0x960 fs/inode.c:557 iput_final fs/inode.c:1519 [inline] iput+0x62d/0xa80 fs/inode.c:1545 f2fs_fill_super+0x5f4e/0x7bf0 fs/f2fs/super.c:2849 mount_bdev+0x30c/0x3e0 fs/super.c:1164 f2fs_mount+0x34/0x40 fs/f2fs/super.c:3020 mount_fs+0xae/0x328 fs/super.c:1267 vfs_kern_mount.part.34+0xd4/0x4d0 fs/namespace.c:1037 vfs_kern_mount fs/namespace.c:1027 [inline] do_new_mount fs/namespace.c:2518 [inline] do_mount+0x564/0x3070 fs/namespace.c:2848 ksys_mount+0x12d/0x140 fs/namespace.c:3064 __do_sys_mount fs/namespace.c:3078 [inline] __se_sys_mount fs/namespace.c:3075 [inline] __x64_sys_mount+0xbe/0x150 fs/namespace.c:3075 do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x443dea RSP: 002b:00007ffcc7882368 EFLAGS: 00000297 ORIG_RAX: 00000000000000a5 RAX: ffffffffffffffda RBX: 0000000020000c00 RCX: 0000000000443dea RDX: 0000000020000000 RSI: 0000000020000100 RDI: 00007ffcc7882370 RBP: 0000000000000003 R08: 0000000020016a00 R09: 000000000000000a R10: 0000000000000000 R11: 0000000000000297 R12: 0000000000000004 R13: 0000000000402ce0 R14: 0000000000000000 R15: 0000000000000000 RIP: __get_node_page+0xb68/0x16e0 fs/f2fs/node.c:1185 RSP: ffff8801d960e820 ---[ end trace 4edbeb71f002bb76 ]--- Reported-and-tested-by: syzbot+d154ec99402c6f628887@syzkaller.appspotmail.com Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2018-05-31 11:31:48 -07:00
Jaegeuk Kim	17c500350b	f2fs: clear PageError on writepage This patch clears PageError in some pages tagged by read path, but when we write the pages with valid contents, writepage should clear the bit likewise ext4. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2018-05-02 14:30:58 -07:00
Jaegeuk Kim	b87078ad3a	Revert "f2fs: introduce f2fs_set_page_dirty_nobuffer" This patch reverts copied f2fs_set_page_dirty_nobuffer to use generic function for stability. This reverts commit `fe76b796fc`. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2018-05-02 14:30:57 -07:00
Matthew Wilcox	b93b016313	page cache: use xa_lock Remove the address_space ->tree_lock and use the xa_lock newly added to the radix_tree_root. Rename the address_space ->page_tree to ->i_pages, since we don't really care that it's a tree. [willy@infradead.org: fix nds32, fs/dax.c] Link: http://lkml.kernel.org/r/20180406145415.GB20605@bombadil.infradead.orgLink: http://lkml.kernel.org/r/20180313132639.17387-9-willy@infradead.org Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Acked-by: Jeff Layton <jlayton@redhat.com> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-04-11 10:28:39 -07:00
Chao Yu	780de47cf6	f2fs: don't track new nat entry in nat set Nat entry set is used only in checkpoint(), and during checkpoint() we won't flush new nat entry with unallocated address, so we don't need to add new nat entry into nat set, then nat_entry_set::entry_cnt can indicate actual entry count we need to flush in checkpoint(). Signed-off-by: Yunlei He <heyunlei@huawei.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2018-03-27 20:10:29 -07:00
Chao Yu	df033caf51	f2fs: clean up with F2FS_BLK_ALIGN Clean up F2FS_BYTES_TO_BLK(x + F2FS_BLKSIZE - 1) with F2FS_BLK_ALIGN(x). Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2018-03-27 20:09:32 -07:00
Jaegeuk Kim	bb1105e479	f2fs: align memory boundary for bitops For example, in arm64, free_nid_bitmap should be aligned to word size in order to use bit operations. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2018-03-17 13:57:39 +09:00
Chao Yu	c56675750d	f2fs: remove unneeded set_cold_node() When setting COLD_BIT_SHIFT flag in node block, we only need to call set_cold_node() in new_node_page() and recover_inode_page() during node page initialization. So remove unneeded set_cold_node() in other places. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2018-03-17 13:57:33 +09:00
Chao Yu	2882d34310	f2fs: use GFP_F2FS_ZERO for cleanup Clean up codes with GFP_F2FS_ZERO, no logic changes. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2018-01-25 10:39:49 -08:00
Chao Yu	db198ae0f8	f2fs: drop page cache after fs shutdown Don't remain dirtied page cache in f2fs after shutdown, it can mitigate memory pressure of whole system, in order to keep other modules working properly. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2018-01-22 14:56:56 -08:00
Yufen Yu	578c647879	f2fs: implement cgroup writeback support Cgroup writeback requires explicit support from the filesystem. f2fs's data and node writeback IOs go through __write_data_page, which sets fio for submiting IOs. So, we add io_wbc for fio, associate bios with blkcg by invoking wbc_init_bio() and account IOs issuing by wbc_account_io(). In addtion, f2fs_fill_super() is updated to set SB_I_CGROUPWB. Meta writeback IOs is left alone by this patch and will always be attributed to the root cgroup. The results show that f2fs can throttle writeback nicely for data writing and file creating. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Yufen Yu <yuyufen@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2018-01-16 15:40:01 -08:00
Yunlei He	1eca05aa9d	f2fs: update inode info to inode page for new file After checkpoint, 1. creat a new file A ,(with dirty inode && dirty inode page && xattr info) 2. backgroud wb write back file A inode page (without update from inode cache) 3. fsync file A, write back inode page of file A with inode cache info 4. sudden power off before new checkpoint In this case, recovery process will try to recover a zero inode page. Inline xattr flag of file A will be miss and xattr info will be taken as blkaddr index. Signed-off-by: Yunlei He <heyunlei@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2018-01-16 15:39:56 -08:00
Yunlei He	c376fc0f35	f2fs: no need return value in restore summary process No need return value in restore summary process Signed-off-by: Yunlei He <heyunlei@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2018-01-02 19:27:30 -08:00
Chao Yu	628b3d1438	f2fs: inject fault to kvmalloc This patch supports to inject fault into kvmalloc/kvzalloc. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2018-01-02 19:27:29 -08:00
Chao Yu	acbf054d53	f2fs: inject fault to kzalloc This patch introduces f2fs_kzalloc based on f2fs_kmalloc in order to support error injection for kzalloc(). Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2018-01-02 19:27:29 -08:00
Zhikang Zhang	de8b10ac13	f2fs: remove repeated f2fs_bug_on f2fs: remove repeated f2fs_bug_on which has already existed in function invalidate_blocks. Signed-off-by: Zhikang Zhang <zhangzhikang1@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2018-01-02 19:27:28 -08:00
Sheng Yong	e17d488bce	f2fs: remove unused parameter Commit `d260081ccf` ("f2fs: change recovery policy of xattr node block") removes the use of blkaddr, which is no longer used. So remove the parameter. Signed-off-by: Sheng Yong <shengyong1@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2018-01-02 19:27:27 -08:00
LiFan	5921aaa185	f2fs: fix concurrent problem for updating free bitmap alloc_nid_failed and scan_nat_page can be called at the same time, and we haven't protected add_free_nid and update_free_nid_bitmap with the same nid_list_lock. That could lead to Thread A Thread B - __build_free_nids - scan_nat_page - add_free_nid - alloc_nid_failed - update_free_nid_bitmap - update_free_nid_bitmap scan_nat_page will clear the free bitmap since the nid is PREALLOC_NID, but alloc_nid_failed needs to set the free bitmap. This results in free nid with free bitmap cleared. This patch update the bitmap under the same nid_list_lock in add_free_nid. And use __GFP_NOFAIL to make sure to update status of free nid correctly. Signed-off-by: Fan li <fanofcode.li@samsung.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2018-01-02 19:27:27 -08:00
Yunlei He	66e8336137	f2fs: no need to read nat block if nat_block_bitmap is set No need to read nat block if nat_block_bitmap is set. Signed-off-by: Yunlei He <heyunlei@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2018-01-02 19:27:27 -08:00
Chao Yu	292c196a36	f2fs: reserve nid resource for quota sysfile During mkfs, quota sysfiles have already occupied nid resource, it needs to adjust remaining available nid count in kernel side. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2018-01-02 19:27:26 -08:00
Linus Torvalds	a02cd4229e	f2fs-for-4.15-rc1 In this round, we introduce sysfile-based quota support which is required for Android by default. In addition, we allow that users are able to reserve some blocks in runtime to mitigate performance drops in low free space. Enhancement - assign proper data segments according to write_hints given by user - issue cache_flush on dirty devices only among multiple devices - exploit cp_error flag and add more faults to enhance fault injection test - conduct more readaheads during f2fs_readdir - add a range for discard commands Bug fix - fix zero stat->st_blocks when inline_data is set - drop crypto key and free stale memory pointer while evict_inode is failing - fix some corner cases in free space and segment management - fix wrong last_disk_size This series includes lots of clean-ups and code enhancement in terms of xattr operations, discard/flush command control. In addition, it adds versatile debugfs entries to monitor f2fs status. -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE00UqedjCtOrGVvQiQBSofoJIUNIFAloNCPAACgkQQBSofoJI UNLYmg/8DbDp/mTXqJ0AURo84Z4OQUOTRxYkWazx4ct2WPZp2+5HCWDDoM8AAtUn 1J6/t7cU3osjos+zWvpUREZq1SPbp5m0h818HBFFJ/YMBPXucdQcd6wpepniOR5J 5uKauVd7jd2pbAAL7hKyr+iBSLrJl816wsq34Ml8y8zkDSJe4wO5YsGDqzqyKf4N 8nxMavUgerb14I/qXPb3ljlYlfaNNRlCT649QGCG78gx5hPeiUtUJ2l5DKV2xPe7 v+5lZO93FFwW1siGy+Atq+nqQJyUkeiOYGPR1NPx9tfmaPO58iOIXLirfblKASZY HXJigVf50fQQBtwdBFL8ICSop6zV6gCKkNGZCHLzcYFWWL2TQwCIP3/iJdj9Wy+j +YUYyN0dyl2mmNEDZjRNX1V+QBW1k+msmvBCb0fT1GJTQAyRfA4XfBDyg94cpWQ1 9YivNywuzG8YtghY7gYU3lCfT2OG19nXCSdz4qYUb5SSwoeGtLahLxMV4mlil4Tg dOa8CPLFhJnCqB9ivI4L6SennBr+gNgL26SeZ3PF+B5KimYOTZxbenrll1kTi1xp uCU6UR1xJS0W7Cjk8sCIu5hXkJMJwPJ0hcVeTgsxMkujLGvSSRCGb2hmOeILfwRZ N4aGn+kVmwwgKaKjD/F4CY4b3yJLdTKMjjl74u5YaMQWe4Bq4qU= =c49T -----END PGP SIGNATURE----- Merge tag 'f2fs-for-4.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "In this round, we introduce sysfile-based quota support which is required for Android by default. In addition, we allow that users are able to reserve some blocks in runtime to mitigate performance drops in low free space. Enhancements: - assign proper data segments according to write_hints given by user - issue cache_flush on dirty devices only among multiple devices - exploit cp_error flag and add more faults to enhance fault injection test - conduct more readaheads during f2fs_readdir - add a range for discard commands Bug fixes: - fix zero stat->st_blocks when inline_data is set - drop crypto key and free stale memory pointer while evict_inode is failing - fix some corner cases in free space and segment management - fix wrong last_disk_size This series includes lots of clean-ups and code enhancement in terms of xattr operations, discard/flush command control. In addition, it adds versatile debugfs entries to monitor f2fs status" * tag 'f2fs-for-4.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (75 commits) f2fs: deny accessing encryption policy if encryption is off f2fs: inject fault in inc_valid_node_count f2fs: fix to clear FI_NO_PREALLOC f2fs: expose quota information in debugfs f2fs: separate nat entry mem alloc from nat_tree_lock f2fs: validate before set/clear free nat bitmap f2fs: avoid opened loop codes in __add_ino_entry f2fs: apply write hints to select the type of segments for buffered write f2fs: introduce scan_curseg_cache for cleanup f2fs: optimize the way of traversing free_nid_bitmap f2fs: keep scanning until enough free nids are acquired f2fs: trace checkpoint reason in fsync() f2fs: keep isize once block is reserved cross EOF f2fs: avoid race in between GC and block exchange f2fs: save a multiplication for last_nid calculation f2fs: fix summary info corruption f2fs: remove dead code in update_meta_page f2fs: remove unneeded semicolon f2fs: don't bother with inode->i_version f2fs: check curseg space before foreground GC ...	2017-11-16 12:10:21 -08:00
Mel Gorman	8667982014	mm, pagevec: remove cold parameter for pagevecs Every pagevec_init user claims the pages being released are hot even in cases where it is unlikely the pages are hot. As no one cares about the hotness of pages being released to the allocator, just ditch the parameter. No performance impact is expected as the overhead is marginal. The parameter is removed simply because it is a bit stupid to have a useless parameter copied everywhere. Link: http://lkml.kernel.org/r/20171018075952.10627-6-mgorman@techsingularity.net Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Jan Kara <jack@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-11-15 18:21:06 -08:00
Jan Kara	67fd707f46	mm: remove nr_pages argument from pagevec_lookup_{,range}_tag() All users of pagevec_lookup() and pagevec_lookup_range() now pass PAGEVEC_SIZE as a desired number of pages. Just drop the argument. Link: http://lkml.kernel.org/r/20171009151359.31984-15-jack@suse.cz Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-11-15 18:21:04 -08:00
Jan Kara	028a63a6e3	f2fs: simplify page iteration loops In several places we want to iterate over all tagged pages in a mapping. However the code was apparently copied from places that iterate only over a limited range and thus it checks for index <= end, optimizes the case where we are coming close to range end which is all pointless when end == ULONG_MAX. So just remove this dead code. [akpm@linux-foundation.org: fix warnings] Link: http://lkml.kernel.org/r/20171009151359.31984-7-jack@suse.cz Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-11-15 18:21:03 -08:00
Yunlei He	12f9ef379a	f2fs: separate nat entry mem alloc from nat_tree_lock This patch splits memory allocation part in nat_entry to avoid lock contention. Signed-off-by: Yunlei He <heyunlei@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-11-13 18:28:48 -08:00
LiFan	0dd99ca76f	f2fs: validate before set/clear free nat bitmap In flush_nat_entries, all dirty nats will be flushed and if their new address isn't NULL_ADDR, their bitmaps will be updated, the free_nid_count of the bitmaps will be increaced regardless of whether the nats have already been occupied before. This could lead to wrong free_nid_count. So this patch checks the status of the bits beforeactually set/clear them. Fixes: `586d1492f3` ("f2fs: skip scanning free nid bitmap of full NAT blocks") Signed-off-by: Fan li <fanofcode.li@samsung.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-11-10 17:35:07 -08:00
Chao Yu	2fbaa25fde	f2fs: introduce scan_curseg_cache for cleanup Commit `4ac912427c` ("f2fs: introduce free nid bitmap") copied codes from __build_free_nids() into scan_free_nid_bits(), they are redundant, introduce one common function scan_curseg_cache for cleanup. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-11-09 10:12:26 -08:00
Fan Li	9745657449	f2fs: optimize the way of traversing free_nid_bitmap We call scan_free_nid_bits only when there isn't many free nids left, it means that marked bits in free_nid_bitmap are supposed to be few, use find_next_bit_le is more efficient in such case. According to my tests, use find_next_bit_le instead of test_bit_le will cut down the traversal time to one third of its original. Signed-off-by: Fan li <fanofcode.li@samsung.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-11-09 09:43:09 -08:00
Fan Li	74986213ad	f2fs: keep scanning until enough free nids are acquired In current version, after scan_free_nid_bits, the scan is over if nid_cnt[FREE_NID] != 0. In most cases, there are still free nids in the free list during the scan, and scan_free_nid_bits usually can't increase nid_cnt[FREE_NID]. It causes that __build_free_nids is called many times without solving the shortage of the free nids. This patch fixes that. Signed-off-by: Fan li <fanofcode.li@samsung.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-11-09 09:41:08 -08:00
Fan Li	f6986ede80	f2fs: save a multiplication for last_nid calculation Use a slightly easier way to calculate last_nid. Signed-off-by: Fan li <fanofcode.li@samsung.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-11-05 16:42:09 -08:00
Fan Li	37a0ab2a3b	f2fs: optimize __update_nat_bits Make three modification for __update_nat_bits: 1. Take the codes of dealing the nat with nid 0 out of the loop Such nat only needs to be dealt with once at beginning. 2. Use " nat_index == 0" instead of " start_nid == 0" to decide if it's the first nat block It's better that we don't assume @start_nid is the first nid of the nat block it's in. 3. Use " if (nat_blk->entries[i].block_addr != NULL_ADDR)" to explicitly comfirm the value of block_addr use constant to make sure the codes is right, even if the value of NULL_ADDR changes. Signed-off-by: Fan li <fanofcode.li@samsung.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-11-05 16:42:00 -08:00
Yunlei He	f15194fcfa	f2fs: modify for accurate fggc node io stat modify for accurate fggc node io stat Signed-off-by: Yunlei He <heyunlei@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-11-05 16:41:59 -08:00
Fan Li	a0761f63ea	f2fs: add a function to move nid This patch add a new function to move nid from one state to another. Move operation is heavily used, by adding a new function for it we can cut down some branches from several flow. Signed-off-by: Fan li <fanofcode.li@samsung.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-11-05 16:41:58 -08:00
Chao Yu	01eccef793	f2fs: support get_page error injection This patch adds to support get_page error injection to simulate out-of-memory test scenario. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-11-05 16:41:54 -08:00
Chao Yu	6afc662e68	f2fs: support flexible inline xattr size Now, in product, more and more features based on file encryption were introduced, their demand of xattr space is increasing, however, inline xattr has fixed-size of 200 bytes, once inline xattr space is full, new increased xattr data would occupy additional xattr block which may bring us more space usage and performance regression during persisting. In order to resolve above issue, it's better to expand inline xattr size flexibly according to user's requirement. So this patch introduces new filesystem feature 'flexible inline xattr', and new mount option 'inline_xattr_size=%u', once mkfs enables the feature, we can use the option to make f2fs supporting flexible inline xattr size. To support this feature, we add extra attribute i_inline_xattr_size in inode layout, indicating that how many space inline xattr borrows from block address mapping space in inode layout, by this, we can easily locate and store flexible-sized inline xattr data in inode. Inode disk layout: +----------------------+ \| .i_mode \| \| ... \| \| .i_ext \| +----------------------+ \| .i_extra_isize \| \| .i_inline_xattr_size \|-----------+ \| ... \| \| +----------------------+ \| \| .i_addr \| \| \| - block address or \| \| \| - inline data \| \| +----------------------+<---+ v \| inline xattr \| +---inline xattr range +----------------------+<---+ \| .i_nid \| +----------------------+ \| node_footer \| \| (nid, ino, offset) \| +----------------------+ Note that, we have to cnosider backward compatibility which reserved inline_data space, 200 bytes, all the time, reported by Sheng Yong. Previous inline data or directory always reserved 200 bytes in inode layout, even if inline_xattr is disabled. In order to keep inline_dentry's structure for backward compatibility, we get the space back only from inline_data. Signed-off-by: Chao Yu <yuchao0@huawei.com> Reported-by: Sheng Yong <shengyong1@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-11-05 16:41:50 -08:00
Jaegeuk Kim	9c77f754f8	f2fs: remove obsolete pointer for truncate_xattr_node This patch removes obosolete parameter for truncate_xattr_node. Suggested-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-10-26 10:44:22 +02:00
Jaegeuk Kim	57864ae5ce	f2fs: limit # of inmemory pages If some abnormal users try lots of atomic write operations, f2fs is able to produce pinned pages in the main memory which affects system performance. This patch limits that as 20% over total memory size, and if f2fs reaches to the limit, it will drop all the inmemory pages. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-10-26 10:44:21 +02:00
Chao Yu	39d787bec4	f2fs: enhance multiple device flush When multiple device feature is enabled, during ->fsync we will issue flush in all devices to make sure node/data of the file being persisted into storage. But some flushes of device could be unneeded as file's data may be not writebacked into those devices. So this patch adds and manage bitmap per inode in global cache to indicate which device is dirty and it needs to issue flush during ->fsync, hence, we could improve performance of fsync in scenario of multiple device. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-10-10 12:49:53 -07:00
Chao Yu	9a4ffdf558	f2fs: obsolete ALLOC_NID_LIST list As Fan Li reported, there is no user traversing nid_list[ALLOC_NID_LIST] which is used for tracking preallocated nids. Let's drop it, and only track preallocated nids in free_nid_root radix-tree. Reported-by: Fan Li <fanofcode.li@samsung.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-10-10 12:49:53 -07:00
Chao Yu	c1fe3e9814	Revert "f2fs: reuse nids more aggressively" Commit `2683446646` ("f2fs: reuse nids more aggressively") tries to reuse nids as many as possilbe, in order to mitigate producing obsolete node pages in page cache. But acutally, before we reuse the nids and related node page cache, we will always invalidate that node page, so there will be not any obsolete node pages in cache. Let's just revert previous commit, so that nm_i::next_scan_nid can be increased ascendingly, making __build_free_nids traverses all NAT pages more easily, finally, free nid bitmap cache can be enabled as soon as possible. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-10-10 12:49:52 -07:00
Jaegeuk Kim	ee60523499	f2fs: don't check inode's checksum if it was dirtied or writebacked If another thread already made the page dirtied or writebacked, we must avoid to verify checksum. If we got an error, we need to remove its uptodate as well. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-09-05 10:50:11 -07:00
Jaegeuk Kim	a298d57f96	f2fs: don't need to update inode checksum for recovery This patch fixes "f2fs: support inode checksum". The recovered inode page will be rewritten with valid checksum. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-09-05 10:49:37 -07:00
Jaegeuk Kim	adb6dc1971	f2fs: return error when accessing insane flie offset If file offset is insane, we have to return error instead of kernel panic. Reported-by: Eric Zhang <followme999@163.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-08-29 10:02:58 -07:00
Chao Yu	b0af6d491a	f2fs: add app/fs io stat This patch enables inner app/fs io stats and introduces below virtual fs nodes for exposing stats info: /sys/fs/f2fs/<dev>/iostat_enable /proc/fs/f2fs/<dev>/iostat_info Signed-off-by: Chao Yu <yuchao0@huawei.com> [Jaegeuk Kim: fix wrong stat assignment] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-08-09 21:43:58 -07:00
Chao Yu	704956ecf5	f2fs: support inode checksum This patch adds to support inode checksum in f2fs. Signed-off-by: Chao Yu <yuchao0@huawei.com> [Jaegeuk Kim: fix verification flow] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-08-03 19:09:26 -07:00
Yunlong Song	401db79f61	f2fs: provide f2fs_balance_fs to __write_node_page Let node writeback also do f2fs_balance_fs to ensure there are always enough free segments. Signed-off-by: Yunlong Song <yunlong.song@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-08-03 19:05:04 -07:00
Chao Yu	5c57132eaf	f2fs: support project quota This patch adds to support plain project quota. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-07-31 16:48:32 -07:00
Chao Yu	7a2af766af	f2fs: enhance on-disk inode structure scalability This patch add new flag F2FS_EXTRA_ATTR storing in inode.i_inline to indicate that on-disk structure of current inode is extended. In order to extend, we changed the inode structure a bit: Original one: struct f2fs_inode { ... struct f2fs_extent i_ext; __le32 i_addr[DEF_ADDRS_PER_INODE]; __le32 i_nid[DEF_NIDS_PER_INODE]; } Extended one: struct f2fs_inode { ... struct f2fs_extent i_ext; union { struct { __le16 i_extra_isize; __le16 i_padding; __le32 i_extra_end[0]; }; __le32 i_addr[DEF_ADDRS_PER_INODE]; }; __le32 i_nid[DEF_NIDS_PER_INODE]; } Once F2FS_EXTRA_ATTR is set, we will steal four bytes in the head of i_addr field for storing i_extra_isize and i_padding. with i_extra_isize, we can calculate actual size of reserved space in i_addr, available attribute fields included in total extra attribute fields for current inode can be described as below: +--------------------+ \| .i_mode \| \| ... \| \| .i_ext \| +--------------------+ \| .i_extra_isize \|-----+ \| .i_padding \| \| \| .i_prjid \| \| \| .i_atime_extra \| \| \| .i_ctime_extra \| \| \| .i_mtime_extra \|<----+ \| .i_inode_cs \|<----- store blkaddr/inline from here \| .i_xattr_cs \| \| ... \| +--------------------+ \| \| \| block address \| \| \| +--------------------+ \| .i_nid \| +--------------------+ \| node_footer \| \| (nid, ino, offset) \| +--------------------+ Hence, with this patch, we would enhance scalability of f2fs inode for storing more newly added attribute. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-07-31 16:48:30 -07:00
Yunlei He	8790568255	f2fs: alloc new nids for xattr block in recovery recovery file A: recovery file B: -get_dnode_of_data -alloc_nid -recover_xattr_data -set_node_addr(sbi, &ni, NEW_ADDR, false); --->bug_on for nid has been used by file A In recovery process, new allocated node blocks may "reuse" xattr block nids, this patch alloc new nids for xattr blocks in recovery process to avoid this problem. Signed-off-by: Yunlei He <heyunlei@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-07-26 19:34:30 -07:00
Yunlei He	5f4ce6abc2	f2fs: remove unused input parameter This patch remove unused input parameter in function new_node_page. Signed-off-by: Yunlei He <heyunlei@huawei.com> Signed-off-by: Yong Sheng <shengyong1@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-07-26 19:34:30 -07:00
Linus Torvalds	5cdd4c0468	for-f2fs-4.13 In this round, we've added new features such as disk quota and statx, and modified internal bio management flow to merge more IOs depending on block types. We've also made internal threads freezeable for Android battery life. In addition to them, there are some patches to avoid lock contention as well as a couple of deadlock conditions. = Enhancement - support usrquota, grpquota, and statx - manage DATA/NODE typed bios separately to serialize more IOs - modify f2fs_lock_op/wio_mutex to avoid lock contention - prevent lock contention in migratepage = Bug fix - miss to load written inode flag - fix worst case victim selection in GC - freezeable GC and discard threads for Android battery life - sanitize f2fs metadata to deal with security hole - clean up sysfs-related code and docs -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE00UqedjCtOrGVvQiQBSofoJIUNIFAllj6fMACgkQQBSofoJI UNJ6Ng/+PqdGV/b6KroYIXI/scFx/1t87/0W+rY9tyLr1jX7nIHn9KLPjeDdvdlk 5vEeZ/dGfW8wSI+ESzscvKberG2QlOPwJRyTB4jWR+bLatwzg7YjEblz+RX4/wfJ jKjnR7M//gRdhHdqA0xXrqguAjPbcEDK2RiVbhioMjWbZ/77j0IjcRokjMYdEf0m cJc2oMXFtlo+DJ1h9/8BmwQPTI9FfVdgbkPFTTJzV0ydQnBdxcAigrzwYZhPOVv0 n2M1dKOiQewB4OADMuepZLFqJheItlgG9wlvEjGq7zTd5epHXRIqhM6h9GikQVb9 YKAkajlKfWcwEXaEcVXtsMHC9x69Yf8xxOSQ1VrhypSUNbaynC9LDsErJx6yrF3P XC5baiqXsd/btg7tfrHJjk3gI+ck97d6TrTfUVR91X+1Tpkz7cyB226WxFKbyOG3 EYCFVMbrIN2CaHHt1xWIT2zCfX5w9ycp8kFjY6jPi0OOZrKXpFw+1AwwTu9kn4xJ iuUc8pmc0/FyPqokmLef4Qp/RRM83+f+nzW/y//lkEf3nMn6qlHzNI1RAxXnBvGV DMXzuJDcJcHGcSDr7mWyKkm6gYcak/E4DdQLQqJ6VCt6KCdCEXP/XDlig5ey5ODY uGEr1QhXIpiYAON45HUi3gmytB3J3ZdzzpsG1PEco4+hjSuFhyE= =N4GZ -----END PGP SIGNATURE----- Merge tag 'for-f2fs-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "In this round, we've added new features such as disk quota and statx, and modified internal bio management flow to merge more IOs depending on block types. We've also made internal threads freezeable for Android battery life. In addition to them, there are some patches to avoid lock contention as well as a couple of deadlock conditions. Enhancements: - support usrquota, grpquota, and statx - manage DATA/NODE typed bios separately to serialize more IOs - modify f2fs_lock_op/wio_mutex to avoid lock contention - prevent lock contention in migratepage Bug fixes: - fix missing load of written inode flag - fix worst case victim selection in GC - freezeable GC and discard threads for Android battery life - sanitize f2fs metadata to deal with security hole - clean up sysfs-related code and docs" * tag 'for-f2fs-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (59 commits) f2fs: support plain user/group quota f2fs: avoid deadlock caused by lock order of page and lock_op f2fs: use spin_{,un}lock_irq{save,restore} f2fs: relax migratepage for atomic written page f2fs: don't count inode block in in-memory inode.i_blocks Revert "f2fs: fix to clean previous mount option when remount_fs" f2fs: do not set LOST_PINO for renamed dir f2fs: do not set LOST_PINO for newly created dir f2fs: skip ->writepages for {mete,node}_inode during recovery f2fs: introduce __check_sit_bitmap f2fs: stop gc/discard thread in prior during umount f2fs: introduce reserved_blocks in sysfs f2fs: avoid redundant f2fs_flush after remount f2fs: report # of free inodes more precisely f2fs: add ioctl to do gc with target block address f2fs: don't need to check encrypted inode for partial truncation f2fs: measure inode.i_blocks as generic filesystem f2fs: set CP_TRIMMED_FLAG correctly f2fs: require key for truncate(2) of encrypted file f2fs: move sysfs code from super.c to fs/f2fs/sysfs.c ...	2017-07-10 14:29:45 -07:00
Chao Yu	0abd675e97	f2fs: support plain user/group quota This patch adds to support plain user/group quota. Change Note by Jaegeuk Kim. - Use f2fs page cache for quota files in order to consider garbage collection. so, quota files are not tolerable for sudden power-cuts, so user needs to do quotacheck. - setattr() calls dquot_transfer which will transfer inode->i_blocks. We can't reclaim that during f2fs_evict_inode(). So, we need to count node blocks as well in order to match i_blocks with dquot's space. Note that, Chao wrote a patch to count inode->i_blocks without inode block. (f2fs: don't count inode block in in-memory inode.i_blocks) - in f2fs_remount, we need to make RW in prior to dquot_resume. - handle fault_injection case during f2fs_quota_off_umount - TODO: Project quota Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-07-08 23:12:27 -07:00
Chao Yu	000519f278	f2fs: don't count inode block in in-memory inode.i_blocks Previously, we count all inode consumed blocks including inode block, xattr block, index block, data block into i_blocks, for other generic filesystems, they won't count inode block into i_blocks, so for userspace applications or quota system, they may detect incorrect block count according to i_blocks value in inode. This patch changes to count all blocks into inode.i_blocks excluding inode block, for on-disk i_blocks, we keep counting inode block for backward compatibility. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-07-07 10:34:47 -07:00
Chao Yu	0771fcc71c	f2fs: skip ->writepages for {mete,node}_inode during recovery Skip ->writepages in prior to ->writepage for {meta,node}_inode during recovery, hence unneeded loop in ->writepages can be avoided. Moreover, check SBI_POR_DOING earlier while writebacking pages. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-07-07 10:34:44 -07:00
Chao Yu	0eb0adadf2	f2fs: measure inode.i_blocks as generic filesystem Both in memory or on disk, generic filesystems record i_blocks with 512bytes sized sector count, also VFS sub module such as disk quota follows this rule, but f2fs records it with 4096bytes sized block count, this difference leads to that once we use dquota's function which inc/dec iblocks, it will make i_blocks of f2fs being inconsistent between in memory and on disk. In order to resolve this issue, this patch changes to make in-memory i_blocks of f2fs recording sector count instead of block count, meanwhile leaving on-disk i_blocks recording block count. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-07-04 02:11:48 -07:00
Chao Yu	1f258ec13b	f2fs: fix to avoid panic when encountering corrupt node With fault_injection option, generic/361 of fstests will complain us with below message: Call Trace: get_node_page+0x12/0x20 [f2fs] f2fs_iget+0x92/0x7d0 [f2fs] f2fs_fill_super+0x10fb/0x15e0 [f2fs] mount_bdev+0x184/0x1c0 f2fs_mount+0x15/0x20 [f2fs] mount_fs+0x39/0x150 vfs_kern_mount+0x67/0x110 do_mount+0x1bb/0xc70 SyS_mount+0x83/0xd0 do_syscall_64+0x6e/0x160 entry_SYSCALL64_slow_path+0x25/0x25 Since mkfs loop device in f2fs partition can be failed silently due to checkpoint error injection, so root inode page can be corrupted, in order to avoid needless panic, in get_node_page, it's better to leave message and return error to caller, and let fsck repaire it later. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-07-04 02:11:39 -07:00
Chao Yu	febeca6d37	f2fs: don't track newly allocated nat entry in list We will never persist newly allocated nat entries during checkpoint(), so we don't need to track such nat entries in nat dirty list in order to avoid: - more latency during traversing dirty list; - sorting nat sets incorrectly due to recording wrong entry_cnt in nat entry set. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-07-04 02:11:38 -07:00
Hou Pengyang	bd80a4b981	f2fs: declare load_free_nid_bitmap static Signed-off-by: Hou Pengyang <houpengyang@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-05-23 21:07:16 -07:00
Jaegeuk Kim	b9109b0e49	f2fs: remove unnecessary read cases in merged IO flow Merged IO flow doesn't need to care about read IOs. f2fs_submit_merged_bio -> f2fs_submit_merged_write f2fs_submit_merged_bios -> f2fs_submit_merged_writes f2fs_submit_merged_bio_cond -> f2fs_submit_merged_write_cond Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-05-23 21:05:37 -07:00
Linus Torvalds	bf5f89463f	Merge branch 'akpm' (patches from Andrew) Merge more updates from Andrew Morton: - the rest of MM - various misc things - procfs updates - lib/ updates - checkpatch updates - kdump/kexec updates - add kvmalloc helpers, use them - time helper updates for Y2038 issues. We're almost ready to remove current_fs_time() but that awaits a btrfs merge. - add tracepoints to DAX * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (114 commits) drivers/staging/ccree/ssi_hash.c: fix build with gcc-4.4.4 selftests/vm: add a test for virtual address range mapping dax: add tracepoint to dax_insert_mapping() dax: add tracepoint to dax_writeback_one() dax: add tracepoints to dax_writeback_mapping_range() dax: add tracepoints to dax_load_hole() dax: add tracepoints to dax_pfn_mkwrite() dax: add tracepoints to dax_iomap_pte_fault() mtd: nand: nandsim: convert to memalloc_noreclaim_*() treewide: convert PF_MEMALLOC manipulations to new helpers mm: introduce memalloc_noreclaim_{save,restore} mm: prevent potential recursive reclaim due to clearing PF_MEMALLOC mm/huge_memory.c: deposit a pgtable for DAX PMD faults when required mm/huge_memory.c: use zap_deposited_table() more time: delete CURRENT_TIME_SEC and CURRENT_TIME gfs2: replace CURRENT_TIME with current_time apparmorfs: replace CURRENT_TIME with current_time() lustre: replace CURRENT_TIME macro fs: ubifs: replace CURRENT_TIME_SEC with current_time fs: ufs: use ktime_get_real_ts64() for birthtime ...	2017-05-08 18:17:56 -07:00
Michal Hocko	a7c3e901a4	mm: introduce kv[mz]alloc helpers Patch series "kvmalloc", v5. There are many open coded kmalloc with vmalloc fallback instances in the tree. Most of them are not careful enough or simply do not care about the underlying semantic of the kmalloc/page allocator which means that a) some vmalloc fallbacks are basically unreachable because the kmalloc part will keep retrying until it succeeds b) the page allocator can invoke a really disruptive steps like the OOM killer to move forward which doesn't sound appropriate when we consider that the vmalloc fallback is available. As it can be seen implementing kvmalloc requires quite an intimate knowledge if the page allocator and the memory reclaim internals which strongly suggests that a helper should be implemented in the memory subsystem proper. Most callers, I could find, have been converted to use the helper instead. This is patch 6. There are some more relying on __GFP_REPEAT in the networking stack which I have converted as well and Eric Dumazet was not opposed [2] to convert them as well. [1] http://lkml.kernel.org/r/20170130094940.13546-1-mhocko@kernel.org [2] http://lkml.kernel.org/r/1485273626.16328.301.camel@edumazet-glaptop3.roam.corp.google.com This patch (of 9): Using kmalloc with the vmalloc fallback for larger allocations is a common pattern in the kernel code. Yet we do not have any common helper for that and so users have invented their own helpers. Some of them are really creative when doing so. Let's just add kv[mz]alloc and make sure it is implemented properly. This implementation makes sure to not make a large memory pressure for > PAGE_SZE requests (__GFP_NORETRY) and also to not warn about allocation failures. This also rules out the OOM killer as the vmalloc is a more approapriate fallback than a disruptive user visible action. This patch also changes some existing users and removes helpers which are specific for them. In some cases this is not possible (e.g. ext4_kvmalloc, libcfs_kvzalloc) because those seems to be broken and require GFP_NO{FS,IO} context which is not vmalloc compatible in general (note that the page table allocation is GFP_KERNEL). Those need to be fixed separately. While we are at it, document that __vmalloc{_node} about unsupported gfp mask because there seems to be a lot of confusion out there. kvmalloc_node will warn about GFP_KERNEL incompatible (which are not superset) flags to catch new abusers. Existing ones would have to die slowly. [sfr@canb.auug.org.au: f2fs fixup] Link: http://lkml.kernel.org/r/20170320163735.332e64b7@canb.auug.org.au Link: http://lkml.kernel.org/r/20170306103032.2540-2-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Reviewed-by: Andreas Dilger <adilger@dilger.ca> [ext4 part] Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: John Hubbard <jhubbard@nvidia.com> Cc: David Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-05-08 17:15:12 -07:00
Yunlei He	e9cdd30770	f2fs: fix a mount fail for wrong next_scan_nid -write_checkpoint -do_checkpoint -next_free_nid <--- something wrong with next free nid -f2fs_fill_super -build_node_manager -build_free_nids -get_current_nat_page -__get_meta_page <--- attempt to access beyond end of device Signed-off-by: Yunlei He <heyunlei@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-05-03 19:00:30 -07:00
Yunlei He	66a82d1fc7	f2fs: seperate read nat page from nat_tree_lock This patch seperate nat page read io from nat_tree_lock. -lock_page -get_node_info() -current_nat_addr ...... -> write_checkpoint -get_meta_page Because we lock node page, we can make sure no other threads modify this nid concurrently. So we just obtain current_nat_addr under nat_tree_lock, node info is always same in both nat pack. Signed-off-by: Yunlei He <heyunlei@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-04-25 14:16:39 -07:00
Jaegeuk Kim	d29fd17218	f2fs: fix not to set fsync/dentry mark Otherwise, we can see stale fsync/dentry mark given by previous calls, resulting in giving up roll-forward recovery due to wrong dentry mark. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-04-12 12:57:09 -07:00
Tomohiro Kusumi	68afcf2d38	f2fs: guard macro variables with braces Add braces around variables used within macros for those make sense to do it. Many of the macros in f2fs already do this. What this commit doesn't do is anything that changes line# as a result of adding braces, which usually affects the binary via __LINE__. Confirmed no diff in fs/f2fs/f2fs.ko before/after this commit on x86_64, to make sure this has no functional change as well as there's been no unexpected side effect due to callers' arithmetics within the existing code. Signed-off-by: Tomohiro Kusumi <tkusumi@tuxera.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-04-10 19:48:10 -07:00
Yunlei He	59c9081bc8	f2fs: allow write page cache when writting cp This patch allow write data to normal file when writting new checkpoint. We relax three limitations for write_begin path: 1. data allocation 2. node allocation 3. variables in checkpoint Signed-off-by: Yunlei He <heyunlei@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-25 00:19:37 -07:00
Chao Yu	30a61ddf81	f2fs: fix race condition in between free nid allocator/initializer In below concurrent case, allocated nid can be loaded into free nid cache and be allocated again. Thread A Thread B - f2fs_create - f2fs_new_inode - alloc_nid - __insert_nid_to_list(ALLOC_NID_LIST) - f2fs_balance_fs_bg - build_free_nids - __build_free_nids - scan_nat_page - add_free_nid - __lookup_nat_cache - f2fs_add_link - init_inode_metadata - new_inode_page - new_node_page - set_node_addr - alloc_nid_done - __remove_nid_from_list(ALLOC_NID_LIST) - __insert_nid_to_list(FREE_NID_LIST) This patch makes nat cache lookup and free nid list operation being atomical to avoid this race condition. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-24 15:10:50 -04:00
Kinglong Mee	8f73cbb7d4	f2fs: more reasonable mem_size calculating of ino_entry Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-21 22:34:37 -04:00
Chao Yu	346fe752c4	f2fs: cover update_free_nid_bitmap with nid_list_lock free_nid_bitmap and free_nid_count in update_free_nid_bitmap should be updated atomically, use nid_list_lock cover them to avoid race in concurrent scenario. Signed-off-by: Chao Yu <yuchao0@huawei.com> Reviewed-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-21 22:34:32 -04:00
Kinglong Mee	0b28b71e29	f2fs: drop duplicate radix tree lookup of nat_entry_set The nat entry is listed from the set list for freeing, it's duplicate to do radix tree lookup again. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> [Jaegeuk Kim: remove unnecessary f2fs_bug_on] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-21 22:34:20 -04:00
Chao Yu	7041d5d286	f2fs: combine nat_bits and free_nid_bitmap cache Both nat_bits cache and free_nid_bitmap cache provide same functionality as a intermediate cache between free nid cache and disk, but with different granularity of indicating free nid range, and different persistence policy. nat_bits cache provides better persistence ability, and free_nid_bitmap provides better granularity. In this patch we combine advantage of both caches, so finally policy of the intermediate cache would be: - init: load free nid status from nat_bits into free_nid_bitmap - lookup: scan free_nid_bitmap before load NAT blocks - update: update free_nid_bitmap in real-time - persistence: udpate and persist nat_bits in checkpoint This patch also resolves performance regression reported by lkp-robot. commit: `4ac912427c` ("f2fs: introduce free nid bitmap") d00030cf9cd0bb96fdccc41e33d3c91dcbb672ba ("f2fs: use __set{__clear}_bit_le") 1382c0f3f9d3f936c8bc42ed1591cf7a593ef9f7 ("f2fs: combine nat_bits and free_nid_bitmap cache") `4ac912427c` d00030cf9cd0bb96fdccc41e33 1382c0f3f9d3f936c8bc42ed15 ---------------- -------------------------- -------------------------- %stddev %change %stddev %change %stddev \ \| \ \| \ 77863 ± 0% +2.1% 79485 ± 1% +50.8% 117404 ± 0% aim7.jobs-per-min 231.63 ± 0% -2.0% 227.01 ± 1% -33.6% 153.80 ± 0% aim7.time.elapsed_time 231.63 ± 0% -2.0% 227.01 ± 1% -33.6% 153.80 ± 0% aim7.time.elapsed_time.max 896604 ± 0% -0.8% 889221 ± 3% -20.2% 715260 ± 1% aim7.time.involuntary_context_switches 2394 ± 1% +4.6% 2503 ± 1% +3.7% 2481 ± 2% aim7.time.maximum_resident_set_size 6240 ± 0% -1.5% 6145 ± 1% -14.1% 5360 ± 1% aim7.time.system_time 1111357 ± 3% +1.9% 1132509 ± 2% -6.2% 1041932 ± 2% aim7.time.voluntary_context_switches ... Signed-off-by: Chao Yu <yuchao0@huawei.com> Tested-by: Xiaolong Ye <xiaolong.ye@intel.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-20 10:00:18 -04:00
Chao Yu	586d1492f3	f2fs: skip scanning free nid bitmap of full NAT blocks This patch adds to account free nids for each NAT blocks, and while scanning all free nid bitmap, do check count and skip lookuping in full NAT block. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-20 10:00:17 -04:00
Jaegeuk Kim	23380b8568	f2fs: use __set{__clear}_bit_le This patch uses __set{__clear}_bit_le for highter speed. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-20 10:00:16 -04:00
Jaegeuk Kim	9f7e4a2c49	f2fs: declare static functions This is to avoid build warning reported by kbuild test robot. Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-20 10:00:15 -04:00
Jaegeuk Kim	900f736251	f2fs: avoid to flush nat journal entries This patch adds a missing condition which flushes nat journal entries unnecessarily introduced by: f2fs: add bitmaps for empty or full NAT blocks Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-28 09:57:13 -08:00
Kinglong Mee	f0cdbfe6ef	f2fs: use MAX_FREE_NIDS for the free nids target F2FS has define MAX_FREE_NIDS for maximum of cached free nids target. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-27 10:07:48 -08:00
Chao Yu	4ac912427c	f2fs: introduce free nid bitmap In scenario of intensively node allocation, free nids will be ran out soon, then it needs to stop to load free nids by traversing NAT blocks, in worse case, if NAT blocks does not be cached in memory, it generates IOs which slows down our foreground operations. In order to speed up node allocation, in this patch we introduce a new free_nid_bitmap array, so there is an bitmap table for each NAT block, Once the NAT block is loaded, related bitmap cache will be switched on, and bitmap will be set during traversing nat entries in NAT block, later we can query and update nid usage status in memory completely. With such implementation, I expect performance of node allocation can be improved in the long-term after filesystem image is mounted. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-27 10:07:47 -08:00
Kinglong Mee	ced2c7ea8e	f2fs: new helper cur_cp_crc() getting crc in f2fs_checkpoint There are four places that getting the crc value in f2fs_checkpoint, just add a new helper cur_cp_crc for them. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-27 10:07:47 -08:00
Chao Yu	55523519bc	f2fs: show simple call stack in fault injection message Previously kernel message can show that in which function we do the injection, but unfortunately, most of the caller are the same, for tracking more information of injection path, it needs to show upper caller's name. This patch supports that ability. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-27 09:59:55 -08:00
Jaegeuk Kim	22ad0b6ab4	f2fs: add bitmaps for empty or full NAT blocks This patches adds bitmaps to represent empty or full NAT blocks containing free nid entries. If we can find valid crc\|cp_ver in the last block of checkpoint pack, we'll use these bitmaps when building free nids. In order to avoid checkpointing burden, up-to-date bitmaps will be flushed only during umount time. So, normally we can get this gain, but when power-cut happens, we rely on fsck.f2fs which recovers this bitmap again. After this patch, we build free nids from nid #0 at mount time to make more full NAT blocks, but in runtime, we check empty NAT blocks to load free nids without loading any NAT pages from disk. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-27 09:59:54 -08:00
Jaegeuk Kim	25cc5d3b9d	f2fs: avoid reading NAT page by get_node_info We've not seen this buggy case for a long time, so it's time to avoid this unnecessary get_node_info() call which reading NAT page to cache nat entry. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-23 10:21:47 -08:00
Chao Yu	d260081ccf	f2fs: change recovery policy of xattr node block Currently, if we call fsync after updating the xattr date belongs to the file, f2fs needs to trigger checkpoint to keep xattr data consistent. But, this policy cause low performance as checkpoint will block most foreground operations and cause unneeded and unrelated IOs around checkpoint. This patch will reuse regular file recovery policy for xattr node block, so, we change to write xattr node block tagged with fsync flag to warm area instead of cold area, and during recovery, we search warm node chain for fsynced xattr block, and do the recovery. So, for below application IO pattern, performance can be improved obviously: - touch file - create/update/delete xattr entry in file - fsync file Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-23 10:10:52 -08:00
Jaegeuk Kim	942fd3192f	f2fs: check last page index in cached bio to decide submission If the cached bio has the last page's index, then we need to submit it. Otherwise, we don't need to submit it and can wait for further IO merges. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-23 10:10:48 -08:00
Jaegeuk Kim	d68f735b3b	f2fs: check io submission more precisely This patch check IO submission more precisely than previous rough check. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-23 10:10:47 -08:00
Jaegeuk Kim	e7c75ab099	f2fs: avoid out-of-order execution of atomic writes We need to flush data writes before flushing last node block writes by using FUA with PREFLUSH. We don't need to guarantee precedent node writes since if those are not written, we can't reach to the last node block when scanning node block chain during roll-forward recovery. Afterwards f2fs_wait_on_page_writeback guarantees all the IO submission to disk, which builds a valid node block chain. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-23 10:10:35 -08:00
Jaegeuk Kim	faa24895ac	f2fs: move write_node_page above fsync_node_pages This patch just moves write_node_page and introduces an inner function. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-23 10:09:43 -08:00
Chao Yu	599a09b2c1	f2fs: check in-memory nat version bitmap This patch adds a mirror for nat version bitmap, and use it to detect in-memory bitmap corruption which may be caused by bit-transition of cache or memory overflow. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-22 18:48:49 -08:00
Chao Yu	5c9e418436	f2fs: don't cache nat entry if out of memory If we run out of memory, in cache_nat_entry, it's better to avoid loop for allocating memory to cache nat entry, so in low memory scenario, for read path of node block, I expect this can avoid unneeded latency. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-01-29 12:46:00 +09:00
Linus Torvalds	09cb6464fe	for-f2fs-4.10 This patch series contains several performance tuning patches regarding to the IO submission flow, in addition to supporting new features such as a ZBC-base drive and multiple devices. It also includes some major bug fixes such as: - checkpoint version control - fdatasync-related roll-forward recovery routine - memory boundary or null-pointer access in corner cases - missing error cases It has various minor clean-up patches as well. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAABCAAGBQJYTx44AAoJEEAUqH6CSFDSnAQP/jeYJq5Zd0bweEF5g00Ec1Qg qNKQ57e9EHDRaDLBUmHHEaCEPRL0bw6SOUUWWqzGA07KcsIK+Yb/dGAyIcuV7WMl PjntVbYm4yARDYBHGupdOCzFSkzr8gDalb+98jJnoGUonsftljhES9jedQ1NjAms GFPHDNtirZM/r0bjKkYKjpqJ6FCxFxcGPfb/GtohDajIpohWfKZiemaXGTgtYR4d iBVek16h+Hprz90ycZBY69uz0TdAwu/gb+htMVBrAdExHWvlFzgp35OIywiAB/YX 3QD/x4t2HqOBaNYiiOAY4ukVW/Yyqa/ZAzbm+m5B5CAcFYiWXMy+cMXUY9HJJ/K0 wdvi//Avtvgpp2PVZFn2pASx14vgMFylBzuNgKpP6MPdtWTEL33jT7VYs9Nuz45E dgZ9IpiDt4DeTRuZ4mPO5iH7bVHPvAVV80bpXzirCCzDeNZ1EFFIQzXh/2UAmCxI twPXGBIYul0aIl9JkWAyhCZSd3XDSqedpfPudknjhzM9Xb1H5X0QJco7f/UwsWXH WxV6lHr1Q7UH96wJ7x/GAqj8ArOAASRV18+K51dqU+DWHnFPpBArJe39FVf8NGWs Fz1ZmlWBQ0ZgzvLkGa80llhjalXIEy/JabMrpy6VrzQGxHdmW4cVxe4dJ3710WxX VysJUcNMRKxMUTWOKsxp =Boum -----END PGP SIGNATURE----- Merge tag 'for-f2fs-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "This patch series contains several performance tuning patches regarding to the IO submission flow, in addition to supporting new features such as a ZBC-base drive and multiple devices. It also includes some major bug fixes such as: - checkpoint version control - fdatasync-related roll-forward recovery routine - memory boundary or null-pointer access in corner cases - missing error cases It has various minor clean-up patches as well" * tag 'for-f2fs-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (66 commits) f2fs: fix a missing size change in f2fs_setattr f2fs: fix to access nullified flush_cmd_control pointer f2fs: free meta pages if sanity check for ckpt is failed f2fs: detect wrong layout f2fs: call sync_fs when f2fs is idle Revert "f2fs: use percpu_counter for # of dirty pages in inode" f2fs: return AOP_WRITEPAGE_ACTIVATE for writepage f2fs: do not activate auto_recovery for fallocated i_size f2fs: fix to determine start_cp_addr by sbi->cur_cp_pack f2fs: fix 32-bit build f2fs: set ->owner for debugfs status file's file_operations f2fs: fix incorrect free inode count in ->statfs f2fs: drop duplicate header timer.h f2fs: fix wrong AUTO_RECOVER condition f2fs: do not recover i_size if it's valid f2fs: fix fdatasync f2fs: fix to account total free nid correctly f2fs: fix an infinite loop when flush nodes in cp f2fs: don't wait writeback for datas during checkpoint f2fs: fix wrong written_valid_blocks counting ...	2016-12-14 09:07:36 -08:00
Chao Yu	04d47e6738	f2fs: fix to account total free nid correctly Thread A Thread B Thread C - f2fs_create - f2fs_new_inode - f2fs_lock_op - alloc_nid alloc last nid - f2fs_unlock_op - f2fs_create - f2fs_new_inode - f2fs_lock_op - alloc_nid as node count still not be increased, we will loop in alloc_nid - f2fs_write_node_pages - f2fs_balance_fs_bg - f2fs_sync_fs - write_checkpoint - block_operations - f2fs_lock_all - f2fs_lock_op While creating new inode, we do not allocate and account nid atomically, so that when there is almost no free nids left, we may encounter deadloop like above stack. In order to avoid that, reuse nm_i::available_nids for accounting free nids and make nid allocation and counting being atomical during node creation. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-11-25 10:16:01 -08:00
Yunlei He	d40a43af0a	f2fs: fix an infinite loop when flush nodes in cp Thread A Thread B - write_checkpoint - block_operations -blk_start_plug -sync_node_pages - f2fs_do_sync_file - fsync_node_pages - f2fs_wait_on_page_writeback Thread A wait for global F2FS_DIRTY_NODES decreased to zero, it start a plug list, some requests have been added to this list. Thread B lock one dirty node page, and wait this page write back. But this page has been in plug list of thread A with PG_writeback flag. Thread A keep on running and its plug list has no chance to finish, so it seems a deadlock between cp and fsync path. This patch add a wait on page write back before set node page dirty to avoid this problem. Signed-off-by: Yunlei He <heyunlei@huawei.com> Signed-off-by: Pengyang Hou <houpengyang@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-11-25 10:16:00 -08:00
Jaegeuk Kim	664ba972df	f2fs: use BIO_MAX_PAGES for bio allocation We don't need to allocate bio partially in order to maximize sequential writes. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-11-23 12:11:07 -08:00
Jaegeuk Kim	3e7b5bbbef	f2fs: declare static function for __build_free_nids This patch avoids build warning. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-11-23 12:11:06 -08:00
Chao Yu	3a2ad5672b	f2fs: don't interrupt free nids building during nid allocation Let build_free_nids support sync/async methods, in allocation flow of nids, we use synchronuous method, so that we can avoid looping in alloc_nid when free memory is low; in unblock_operations and f2fs_balance_fs_bg we use asynchronuous method in where low memory condition can interrupt us. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-11-23 12:11:02 -08:00
Jaegeuk Kim	eb0aa4b807	f2fs: clean up free nid list operations This patch cleans up to use consistent free nid list ops. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-11-23 12:11:01 -08:00
Chao Yu	b8559dc242	f2fs: split free nid list During free nid allocation, in order to do preallocation, we will tag free nid entry as allocated one and still leave it in free nid list, for other allocators who want to grab free nids, it needs to traverse the free nid list for lookup. It becomes overhead in scenario of allocating free nid intensively by multithreads. This patch splits free nid list to two list: {free,alloc}_nid_list, to keep free nids and preallocated free nids separately, after that, traverse latency will be gone, besides split nid_cnt for separate statistic. Additionally, introduce __insert_nid_to_list and __remove_nid_from_list for cleanup. Signed-off-by: Chao Yu <yuchao0@huawei.com> [Jaegeuk Kim: modify f2fs_bug_on to avoid needless branches] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-11-23 12:11:00 -08:00
Eric Biggers	0c0b471e43	f2fs: fix sparse warnings f2fs contained a number of endianness conversion bugs. Also, one function should have been 'static'. Found with sparse by running 'make C=2 CF=-D__CHECK_ENDIAN__ fs/f2fs/' Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-11-23 12:10:57 -08:00
Chao Yu	9de6927975	f2fs: fix error handling in fsync_node_pages In fsync_node_pages, if f2fs was taged with CP_ERROR_FLAG, make sure bio cache was flushed before return. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-11-23 12:10:56 -08:00
Chao Yu	933439c8f3	f2fs: give a chance to detach from dirty list If there is no dirty pages in inode, we should give a chance to detach the inode from global dirty list, otherwise it needs to call another unnecessary .writepages for detaching. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-11-23 12:10:51 -08:00
Chao Yu	2411cf5bef	f2fs: exclude free nids building and allocation During nid allocation, it needs to exclude building and allocating flow of free nids, this is because while building free nid cache, there are two steps: a) load free nids from unused nat entries in NAT pages, b) update free nid cache by checking nat journal. The two steps should be atomical, otherwise an used nid can be allocated as free one after a) and before b). This patch adds missing lock which covers build_free_nids in unlock_operation and f2fs_balance_fs_bg to avoid that. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-11-23 12:10:49 -08:00
Jens Axboe	7637241e65	writeback: add wbc_to_write_flags() Add wbc_to_write_flags(), which returns the write modifier flags to use, based on a struct writeback_control. No functional changes in this patch, but it prepares us for factoring other wbc fields for write type. Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de>	2016-11-02 10:24:03 -06:00
Christoph Hellwig	70fd76140a	block,fs: use REQ_* flags directly Remove the WRITE_* and READ_SYNC wrappers, and just use the flags directly. Where applicable this also drops usage of the bio_set_op_attrs wrapper. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-11-01 09:43:26 -06:00
Linus Torvalds	abb5a14fa2	Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull misc vfs updates from Al Viro: "Assorted misc bits and pieces. There are several single-topic branches left after this (rename2 series from Miklos, current_time series from Deepa Dinamani, xattr series from Andreas, uaccess stuff from from me) and I'd prefer to send those separately" * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (39 commits) proc: switch auxv to use of __mem_open() hpfs: support FIEMAP cifs: get rid of unused arguments of CIFSSMBWrite() posix_acl: uapi header split posix_acl: xattr representation cleanups fs/aio.c: eliminate redundant loads in put_aio_ring_file fs/internal.h: add const to ns_dentry_operations declaration compat: remove compat_printk() fs/buffer.c: make __getblk_slow() static proc: unsigned file descriptors fs/file: more unsigned file descriptors fs: compat: remove redundant check of nr_segs cachefiles: Fix attempt to read i_blocks after deleting file [ver #2] cifs: don't use memcpy() to copy struct iov_iter get rid of separate multipage fault-in primitives fs: Avoid premature clearing of capabilities fs: Give dentry to inode_change_ok() instead of inode fuse: Propagate dentry down to inode_change_ok() ceph: Propagate dentry down to inode_change_ok() xfs: Propagate dentry down to inode_change_ok() ...	2016-10-10 13:04:49 -07:00
Chao Yu	3f5f4959b1	f2fs: fix to commit bio cache after flushing node pages In sync_node_pages, we won't check and commit last merged pages in private bio cache of f2fs, as these pages were taged as writeback, someone who is waiting for writebacking of the page will be blocked until the cache was committed by someone else. We need to commit node type bio cache to avoid potential deadlock or long delay of waiting writeback. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-09-30 17:34:38 -07:00
Chao Yu	1ecc0c5c50	f2fs: support configuring fault injection per superblock Previously, we only support global fault injection configuration, so that when we configure type/rate of fault injection through sysfs, mount option, it will influence all f2fs partition which is being used. It is not make sence, since it will be not convenient if developer want to test separated partitions with different fault injection rate/type simultaneously, also it's not possible to enable fault injection in one partition and disable fault injection in other one. >From now on, we move global configuration of fault injection in module into per-superblock, hence injection testing can be more flexible. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-09-30 17:34:31 -07:00
Weichao Guo	5b7a487cf3	f2fs: add customized migrate_page callback This patch improves the migration of dirty pages and allows migrating atomic written pages that F2FS uses in Page Cache. Instead of the fallback releasing page path, it provides better performance for memory compaction, CMA and other users of memory page migrating. For dirty pages, there is no need to write back first when migrating. For an atomic written page before committing, we can migrate the page and update the related 'inmem_pages' list at the same time. Signed-off-by: Weichao Guo <guoweichao@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> [Jaegeuk Kim: fix some coding style] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-09-30 17:34:23 -07:00
Miklos Szeredi	280db3c88c	f2fs: use filemap_check_errors() Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org>	2016-09-16 12:44:21 +02:00
Jaegeuk Kim	e8ea9b3d7e	f2fs: avoid ENOMEM during roll-forward recovery This patch gives another chances during roll-forward recovery regarding to -ENOMEM. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-09-13 13:02:29 -07:00
Chao Yu	5f8eaf1f9b	f2fs: remove redundant judgement condition in available_free_memory In available_free_memory, there are two same judgement conditions which is used for checking NAT excess, remove one of them. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-08-29 18:31:12 -07:00
Jaegeuk Kim	b873b798af	Revert "f2fs: use percpu_rw_semaphore" LKP reported -36.3% regression of fsmark.files_per_sec due to this patch. I've confirmed that fxmark [1] has also slight regression for DWAL. [1] https://github.com/sslab-gatech/fxmark This reverts commit `ec795418c4`.	2016-08-19 11:15:08 +09:00
Linus Torvalds	4fc29c1aa3	The major change in this version is mitigating cpu overheads on write paths by replacing redundant inode page updates with mark_inode_dirty calls. And we tried to reduce lock contentions as well to improve filesystem scalability. Other feature is setting F2FS automatically when detecting host-managed SMR. = Enhancement = - ioctl to move a range of data between files - inject orphan inode errors - avoid flush commands congestion - support lazytime = Bug fixes = - return proper results for some dentry operations - fix deadlock in add_link failure - disable extent_cache for fcollapse/finsert -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJXmDJFAAoJEEAUqH6CSFDSJeYP/0ru8+5/ui5VTCdNPQB9KxYD DIUaDGpeoLvmn3ZdrMEdyNr6kWbgjCE9JjOGPQ7l1/apErOGVPyaBwflKcCDwloU pAlEqVM1Q9j4qH4i9SWTlvPtsHBHB7G7YSe3vDB9fJGSTqumubIlnaBm+Wfjx31U p53WcPn9LpOyzfmvZf2tOHmvZ7bWLkE/a07x9kPC6XHUFb9C17jLRFFGeuhZQHv1 Yo7HgokBnPExa8TnEILYyX/x+eecFS/1Cp/cN0STsebSu8pStTHTcAP7qEpKQB88 Cc51Lf+d5gFeydxKDFxwdH3VWOGIr9Ppako+lHW83gJcHP0zw8zdxULab+HJMa4n MOByRRiafwu1sL0dl7TCfsYNIHdEnXhWbhcRhMVZbb5C2Q6+Htuac8ZrKSOWExNN DUqRkzeTib9u+cHxUTFFPgOGdUjDLmg3XHU7mvb+2hViluVjIImC4tqD5XPpv7vt WnaDJxLCGD/6DF2yhiVY9NysuxInLTNFFCF06LworZ4L24hlg5TvN0UeUNRO9954 ux6f+lSORCzV3TmrsHP5vwjSAW26FviPXV1q1HHJeTpWKMlhsZtHmOAJOtZKKmxP WFnHT0aiWF+sQf4qfxVQL+lLqtgRKJAI9zqGRyfDJWJp5aXdRuVsZs9pWNQF7lCo 5gVnCYk3ULjXG3b23j2S =tKTR -----END PGP SIGNATURE----- Merge tag 'for-f2fs-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "The major change in this version is mitigating cpu overheads on write paths by replacing redundant inode page updates with mark_inode_dirty calls. And we tried to reduce lock contentions as well to improve filesystem scalability. Other feature is setting F2FS automatically when detecting host-managed SMR. Enhancements: - ioctl to move a range of data between files - inject orphan inode errors - avoid flush commands congestion - support lazytime Bug fixes: - return proper results for some dentry operations - fix deadlock in add_link failure - disable extent_cache for fcollapse/finsert" * tag 'for-f2fs-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (68 commits) f2fs: clean up coding style and redundancy f2fs: get victim segment again after new cp f2fs: handle error case with f2fs_bug_on f2fs: avoid data race when deciding checkpoin in f2fs_sync_file f2fs: support an ioctl to move a range of data blocks f2fs: fix to report error number of f2fs_find_entry f2fs: avoid memory allocation failure due to a long length f2fs: reset default idle interval value f2fs: use blk_plug in all the possible paths f2fs: fix to avoid data update racing between GC and DIO f2fs: add maximum prefree segments f2fs: disable extent_cache for fcollapse/finsert inodes f2fs: refactor __exchange_data_block for speed up f2fs: fix ERR_PTR returned by bio f2fs: avoid mark_inode_dirty f2fs: move i_size_write in f2fs_write_end f2fs: fix to avoid redundant discard during fstrim f2fs: avoid mismatching block range for discard f2fs: fix incorrect f_bfree calculation in ->statfs f2fs: use percpu_rw_semaphore ...	2016-07-27 10:36:31 -07:00
Christoph Hellwig	70246286e9	block: get rid of bio_rw and READA These two are confusing leftover of the old world order, combining values of the REQ_OP_ and REQ_ namespaces. For callers that don't special case we mostly just replace bi_rw with bio_data_dir or op_is_write, except for the few cases where a switch over the REQ_OP_ values makes more sense. Any check for READA is replaced with an explicit check for REQ_RAHEAD. Also remove the READA alias for REQ_RAHEAD. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-07-20 17:37:01 -06:00
Jaegeuk Kim	9dfa1baff7	f2fs: use blk_plug in all the possible paths This patch reverts `19a5f5e2ef` (f2fs: drop any block plugging), and adds blk_plug in write paths additionally. The main reason is that blk_start_plug can be used to wake up from low-power mode before submitting further bios. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-07-15 15:21:23 -07:00
Jaegeuk Kim	0a2aa8fbb9	f2fs: refactor __exchange_data_block for speed up This reduces the elapsed time to do xfstests/generic/017. Before: 715 s After: 458 s Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-07-15 15:21:19 -07:00
Jaegeuk Kim	ec795418c4	f2fs: use percpu_rw_semaphore This patch replaces rw_semaphore with percpu_rw_semaphore for: sbi->cp_rwsem nm_i->nat_tree_lock Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-07-08 10:33:31 -07:00
Jaegeuk Kim	3bdad3c7ee	f2fs: skip to check the block address of node page If the node page is up-to-date, it should be alive. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-07-08 10:33:31 -07:00
Jaegeuk Kim	237c0790e5	f2fs: call SetPageUptodate if needed SetPageUptodate() issues memory barrier, resulting in performance degrdation. Let's avoid that. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-07-08 10:33:29 -07:00
Jaegeuk Kim	fe76b796fc	f2fs: introduce f2fs_set_page_dirty_nobuffer This patch adds f2fs_set_page_dirty_nobuffer() copied from __set_page_dirty_buffer. When appending 4KB blocks in f2fs on pmem with multiple cores, this improves the overall performance. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-07-08 10:33:28 -07:00
Chao Yu	1563ac75e7	f2fs: fix to detect truncation prior rather than EIO during read In procedure of synchonized read, after sending out the read request, reader will try to lock the page for waiting device to finish the read jobs and unlock the page, but meanwhile, truncater will race with reader, so after reader get lock of the page, it should check page's mapping to detect whether someone has truncated the page in advance, then reader has the chance to do the retry if truncation was done, otherwise read can be failed due to previous condition check. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-07-08 10:33:25 -07:00
Jaegeuk Kim	ad4edb8314	f2fs: produce more nids and reduce readahead nats The readahead nat pages are more likely to be reclaimed quickly, so it'd better to gather more free nids in advance. And, let's keep some free nids as much as possible. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-07-06 10:44:08 -07:00
Mike Christie	04d328defd	f2fs: use bio op accessors Separate the op from the rq_flag_bits and have f2fs set/get the bio using bio_set_op_attrs/bio_op. Signed-off-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-06-07 13:41:38 -06:00
Jaegeuk Kim	e589c2c477	f2fs: control not to exceed # of cached nat entries This is to avoid cache entry management overhead including radix tree. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-06-07 10:18:08 -07:00
Yunlong Song	0c9df7fb80	f2fs: return the errno to the caller to avoid using a wrong page Commit `aaf9607516` ("f2fs: check node page contents all the time") pointed out that "sometimes it was reported that its contents was missing", so it checks the page's mapping and contents. When "nid != nid_of_node(page)", ERR_PTR(-EIO) will be returned to the caller. However, commit `e1c51b9f1d` ("f2fs: clean up node page updating flow") moves "nid != nid_of_node(page)" test to "f2fs_bug_on(sbi, nid != nid_of_node(page))", this will return a wrong page to the caller when F2FS_CHECK_FS is off when "sometimes it was reported that its contents was missing" happens. This patch restores to check node page contents all the time, and returns the errno to make the caller known something is wrong and avoid to use the page. This patch also moves f2fs_bug_on to its proper location. Signed-off-by: Yunlong Song <yunlong.song@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-06-02 18:05:22 -07:00
Jaegeuk Kim	26de9b1171	f2fs: avoid unnecessary updating inode during fsync If roll-forward recovery can recover i_size, we don't need to update inode's metadata during fsync. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-06-02 18:05:13 -07:00
Jaegeuk Kim	ee6d182f2a	f2fs: remove syncing inode page in all the cases This patch reduces to call them across the whole tree. - sync_inode_page() - update_inode_page() - update_inode() - f2fs_write_inode() Instead, checkpoint will flush all the dirty inode metadata before syncing node pages. Note that, this is doable, since we call mark_inode_dirty_sync() for all inode's field change which needs to update on-disk inode as well. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-06-02 18:05:12 -07:00
Jaegeuk Kim	0f18b462b2	f2fs: flush inode metadata when checkpoint is doing This patch registers all the inodes which have dirty metadata to sync when checkpoint is doing. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-06-02 18:05:11 -07:00
Jaegeuk Kim	205b98221c	f2fs: call mark_inode_dirty_sync for i_field changes This patch calls mark_inode_dirty_sync() for the following on-disk inode changes. -> largest -> ctime/mtime/atime -> i_current_depth -> i_xattr_nid -> i_pino -> i_advise -> i_flags -> i_mode Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-06-02 18:05:11 -07:00
Jaegeuk Kim	91942321e4	f2fs: use inode pointer for {set, clear}_inode_flag This patch refactors to use inode pointer for set_inode_flag and clear_inode_flag. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-06-02 18:05:07 -07:00
Chao Yu	0f3311a8c2	f2fs: fix to update dirty page count correctly Once we failed to merge inline data into inode page during flushing inline inode, we will skip invoking inode_dec_dirty_pages, which makes dirty page count incorrect, result in panic in ->evict_inode, Fix it. ------------[ cut here ]------------ kernel BUG at /home/yuchao/git/devf2fs/inode.c:336! invalid opcode: 0000 [#1] PREEMPT SMP CPU: 3 PID: 10004 Comm: umount Tainted: G O 4.6.0-rc5+ #17 Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 task: f0c33000 ti: c5212000 task.ti: c5212000 EIP: 0060:[<f89aacb5>] EFLAGS: 00010202 CPU: 3 EIP is at f2fs_evict_inode+0x85/0x490 [f2fs] EAX: 00000001 EBX: c4529ea0 ECX: 00000001 EDX: 00000000 ESI: c0131000 EDI: f89dd0a0 EBP: c5213e9c ESP: `c5213e78` DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 CR0: 80050033 CR2: b75878c0 CR3: 1a36a700 CR4: 000406f0 Stack: c4529ea0 c4529ef4 c5213e8c c176d45c c4529ef4 00000000 c4529ea0 c4529fac f89dd0a0 c5213eb0 c1204a68 c5213ed8 c452a2b4 c6680930 c5213ec0 c1204b64 c6680d44 c6680620 c5213eec c120588d ee84b000 ee84b5c0 c5214000 ee84b5e0 Call Trace: [<c176d45c>] ? _raw_spin_unlock+0x2c/0x50 [<c1204a68>] evict+0xa8/0x170 [<c1204b64>] dispose_list+0x34/0x50 [<c120588d>] evict_inodes+0x10d/0x130 [<c11ea941>] generic_shutdown_super+0x41/0xe0 [<c1185190>] ? unregister_shrinker+0x40/0x50 [<c1185190>] ? unregister_shrinker+0x40/0x50 [<c11eac52>] kill_block_super+0x22/0x70 [<f89af23e>] kill_f2fs_super+0x1e/0x20 [f2fs] [<c11eae1d>] deactivate_locked_super+0x3d/0x70 [<c11eb383>] deactivate_super+0x43/0x60 [<c1208ec9>] cleanup_mnt+0x39/0x80 [<c1208f50>] __cleanup_mnt+0x10/0x20 [<c107d091>] task_work_run+0x71/0x90 [<c105725a>] exit_to_usermode_loop+0x72/0x9e [<c1001c7c>] do_fast_syscall_32+0x19c/0x1c0 [<c176dd48>] sysenter_past_esp+0x45/0x74 EIP: [<f89aacb5>] f2fs_evict_inode+0x85/0x490 [f2fs] SS:ESP 0068:c5213e78 ---[ end trace d30536330b7fdc58 ]--- Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-05-20 14:55:41 -07:00
Jaegeuk Kim	79344efb93	f2fs: read node blocks ahead when truncating blocks This patch enables reading node blocks in advance when truncating large data blocks. > time rm $MNT/testfile (500GB) after drop_cachees Before : 9.422 s After : 4.821 s Reported-by: Stephen Bates <stephen.bates@microsemi.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-05-07 10:44:56 -07:00
Jaegeuk Kim	fb58ae2206	f2fs: remove an obsolete variable This patch removes an obsolete variable used in add_free_nid. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-05-07 10:32:31 -07:00
Jaegeuk Kim	cb78942b82	f2fs: inject ENOSPC failures This patch injects ENOSPC failures. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-05-07 10:32:24 -07:00
Jaegeuk Kim	300e129c15	f2fs: use f2fs_grab_cache_page instead of grab_cache_page This patch converts grab_cache_page to f2fs_grab_cache_page. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-05-07 10:32:21 -07:00
Chao Yu	da011cc0da	f2fs: move node pages only in victim section during GC For foreground GC, we cache node blocks in victim section and set them dirty, then we call sync_node_pages to flush these node pages, but meanwhile, those node pages which does not locate in victim section will be flushed together, so more bandwidth and continuous free space would be occupied. So for this condition, it's better to leave those unrelated node page in cache for further write hit, and let CP or VM to flush them afterward. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-04-27 14:10:42 -07:00
Jaegeuk Kim	608514deba	f2fs: set fsync mark only for the last dnode In order to give atomic writes, we should consider power failure during sync_node_pages in fsync. So, this patch marks fsync flag only in the last dnode block. Acked-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-04-26 14:24:59 -07:00
Jaegeuk Kim	c267ec1526	f2fs: report unwritten status in fsync_node_pages The fsync_node_pages should return pass or failure so that user could know fsync is completed or not. Acked-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-04-26 14:24:54 -07:00
Jaegeuk Kim	5268137564	f2fs: split sync_node_pages with fsync_node_pages This patch splits the existing sync_node_pages into (f)sync_node_pages. The fsync_node_pages is used for f2fs_sync_file only. Acked-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-04-26 14:24:48 -07:00
Jaegeuk Kim	eca76e783c	f2fs: avoid needless lock for node pages when fsyncing a file When fsync is called, sync_node_pages finds a proper direct node pages to flush. But, it locks unrelated direct node pages together unnecessarily. Acked-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-04-26 14:24:30 -07:00
Jaegeuk Kim	ff37355886	f2fs: add BUG_ON to avoid unnecessary flow This patch adds BUG_ON instead of retrying loop. In the case of node pages, we already got this inode page, but unlocked it. By the fact that we don't truncate any node pages in operations, the page's mapping should be unchangeable. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-04-15 08:49:47 -07:00
Jaegeuk Kim	4a6de50d54	f2fs: use PGP_LOCK to check its truncation Previously, after trylock_page is succeeded, it doesn't check its mapping. In order to fix that, we can just give PGP_LOCK to pagecache_get_page. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-04-15 08:49:47 -07:00
Kirill A. Shutemov	09cbfeaf1a	mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced long time ago with promise that one day it will be possible to implement page cache with bigger chunks than PAGE_SIZE. This promise never materialized. And unlikely will. We have many places where PAGE_CACHE_SIZE assumed to be equal to PAGE_SIZE. And it's constant source of confusion on whether PAGE_CACHE_* or PAGE_* constant should be used in a particular case, especially on the border between fs and mm. Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much breakage to be doable. Let's stop pretending that pages in page cache are special. They are not. The changes are pretty straight-forward: - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>; - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>; - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN}; - page_cache_get() -> get_page(); - page_cache_release() -> put_page(); This patch contains automated changes generated with coccinelle using script below. For some reason, coccinelle doesn't patch header files. I've called spatch for them manually. The only adjustment after coccinelle is revert of changes to PAGE_CAHCE_ALIGN definition: we are going to drop it later. There are few places in the code where coccinelle didn't reach. I'll fix them manually in a separate patch. Comments and documentation also will be addressed with the separate patch. virtual patch @@ expression E; @@ - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT) + E @@ expression E; @@ - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) + E @@ @@ - PAGE_CACHE_SHIFT + PAGE_SHIFT @@ @@ - PAGE_CACHE_SIZE + PAGE_SIZE @@ @@ - PAGE_CACHE_MASK + PAGE_MASK @@ expression E; @@ - PAGE_CACHE_ALIGN(E) + PAGE_ALIGN(E) @@ expression E; @@ - page_cache_get(E) + get_page(E) @@ expression E; @@ - page_cache_release(E) + put_page(E) Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-04-04 10:41:08 -07:00
Jaegeuk Kim	12bb0a8fd4	f2fs: submit node page write bios when really required If many threads calls fsync with data writes, we don't need to flush every bios having node page writes. The f2fs_wait_on_page_writeback will flush its bios when the page is really needed. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-03-17 21:19:47 -07:00
Jaegeuk Kim	17a0ee552c	f2fs: declare static functions Just to avoid sparse warnings. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-03-17 21:19:44 -07:00
Fan Li	999270de31	f2fs: modify the readahead method in ra_node_page() ra_node_page() is used to read ahead one node page. Comparing to regular read, it's faster because it doesn't wait for IO completion. But if it is called twice for reading the same block, and the IO request from the first call hasn't been completed before the second call, the second call will have to wait until the read is over. Here use the code in __do_page_cache_readahead() to solve this problem. It does nothing when someone else already puts the page in mapping. The status of page should be assured by whoever puts it there. This implement also prevents alteration of page reference count. Signed-off-by: Fan li <fanofcode.li@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-03-17 21:19:43 -07:00
Chao Yu	19c7377b56	f2fs: fix to avoid deadlock when merging inline data When testing with fsstress, kworker and user threads were both blocked: INFO: task kworker/u16:1:16580 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. kworker/u16:1 D ffff8803f2595390 0 16580 2 0x00000000 Workqueue: writeback bdi_writeback_workfn (flush-251:0) ffff8802730e5760 0000000000000046 ffff880274729fc0 0000000000012440 ffff8802730e5fd8 ffff8802730e4010 0000000000012440 0000000000012440 ffff8802730e5fd8 0000000000012440 ffff880274729fc0 ffff88026eb50000 Call Trace: [<ffffffff816fe9d9>] schedule+0x29/0x70 [<ffffffff816ff895>] rwsem_down_read_failed+0xa5/0xf9 [<ffffffff81378584>] call_rwsem_down_read_failed+0x14/0x30 [<ffffffffa0694feb>] f2fs_write_data_page+0x31b/0x420 [f2fs] [<ffffffffa0690f1a>] __f2fs_writepage+0x1a/0x50 [f2fs] [<ffffffffa06922a0>] f2fs_write_data_pages+0xe0/0x290 [f2fs] [<ffffffff811473b3>] do_writepages+0x23/0x40 [<ffffffff811cc3ee>] __writeback_single_inode+0x4e/0x250 [<ffffffff811cd4f1>] writeback_sb_inodes+0x2c1/0x470 [<ffffffff811cd73e>] __writeback_inodes_wb+0x9e/0xd0 [<ffffffff811cda0b>] wb_writeback+0x1fb/0x2d0 [<ffffffff811cdb7c>] wb_do_writeback+0x9c/0x220 [<ffffffff811ce232>] bdi_writeback_workfn+0x72/0x1c0 [<ffffffff8106b74e>] process_one_work+0x1de/0x5b0 [<ffffffff8106e78f>] worker_thread+0x11f/0x3e0 [<ffffffff810750ce>] kthread+0xde/0xf0 [<ffffffff817093f8>] ret_from_fork+0x58/0x90 fsstress thread stack: [<ffffffff81139f0e>] sleep_on_page+0xe/0x20 [<ffffffff81139ef7>] __lock_page+0x67/0x70 [<ffffffff8113b100>] find_lock_page+0x50/0x80 [<ffffffff8113b24f>] find_or_create_page+0x3f/0xb0 [<ffffffffa06983a9>] sync_node_pages+0x259/0x810 [f2fs] [<ffffffffa068d874>] write_checkpoint+0x1a4/0xce0 [f2fs] [<ffffffffa0686b0c>] f2fs_sync_fs+0x7c/0xd0 [f2fs] [<ffffffffa067c813>] f2fs_sync_file+0x143/0x5f0 [f2fs] [<ffffffff811d301b>] vfs_fsync_range+0x2b/0x40 [<ffffffff811d304c>] vfs_fsync+0x1c/0x20 [<ffffffff811d3291>] do_fsync+0x41/0x70 [<ffffffff811d32d3>] SyS_fdatasync+0x13/0x20 [<ffffffff817094a2>] system_call_fastpath+0x16/0x1b [<ffffffffffffffff>] 0xffffffffffffffff The reason of this issue is: CPU0: CPU1: - f2fs_write_data_pages - f2fs_sync_fs - write_checkpoint - block_operations - f2fs_lock_all - down_write(sbi->cp_rwsem) - lock_page(page) - f2fs_write_data_page - sync_node_pages - flush_inline_data - pagecache_get_page(page, GFP_LOCK) - f2fs_lock_op - down_read(sbi->cp_rwsem) This patch alters to use trylock_page in flush_inline_data to fix this ABBA deadlock issue. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-02-26 11:52:03 -08:00
Chao Yu	80dd9c0e9d	f2fs: fix incorrect upper bound when iterating inode mapping tree 1. Inode mapping tree can index page in range of [0, ULONG_MAX], however, in some places, f2fs only search or iterate page in ragne of [0, LONG_MAX], result in miss hitting in page cache. 2. filemap_fdatawait_range accepts range parameters in unit of bytes, so the max range it covers should be [0, LLONG_MAX], if we use [0, LONG_MAX] as range for waiting on writeback, big number of pages will not be covered. This patch corrects above two issues. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-02-25 17:27:03 -08:00
Chao Yu	7a9d75481b	f2fs: trace old block address for CoWed page This patch enables to trace old block address of CoWed page for better debugging. f2fs_submit_page_mbio: dev = (1,0), ino = 1, page_index = 0x1d4f0, oldaddr = 0xfe8ab, newaddr = 0xfee90 rw = WRITE_SYNC, type = NODE f2fs_submit_page_mbio: dev = (1,0), ino = 1, page_index = 0x1d4f8, oldaddr = 0xfe8b0, newaddr = 0xfee91 rw = WRITE_SYNC, type = NODE f2fs_submit_page_mbio: dev = (1,0), ino = 1, page_index = 0x1d4fa, oldaddr = 0xfe8ae, newaddr = 0xfee92 rw = WRITE_SYNC, type = NODE f2fs_submit_page_mbio: dev = (1,0), ino = 134824, page_index = 0x96, oldaddr = 0xf049b, newaddr = 0x2bbe rw = WRITE, type = DATA f2fs_submit_page_mbio: dev = (1,0), ino = 134824, page_index = 0x97, oldaddr = 0xf049c, newaddr = 0x2bbf rw = WRITE, type = DATA f2fs_submit_page_mbio: dev = (1,0), ino = 134824, page_index = 0x98, oldaddr = 0xf049d, newaddr = 0x2bc0 rw = WRITE, type = DATA f2fs_submit_page_mbio: dev = (1,0), ino = 135260, page_index = 0x47, oldaddr = 0xffffffff, newaddr = 0xf2631 rw = WRITE, type = DATA f2fs_submit_page_mbio: dev = (1,0), ino = 135260, page_index = 0x48, oldaddr = 0xffffffff, newaddr = 0xf2632 rw = WRITE, type = DATA f2fs_submit_page_mbio: dev = (1,0), ino = 135260, page_index = 0x49, oldaddr = 0xffffffff, newaddr = 0xf2633 rw = WRITE, type = DATA Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-02-22 21:40:02 -08:00
Chao Yu	9a4cbc9e53	f2fs: try to flush inode after merging inline data When flushing node pages, if current node page is an inline inode page, we will try to merge inline data from data page into inline inode page, then skip flushing current node page, it will decrease the number of nodes to be flushed in batch in this round, which may lead to worse performance. This patch gives a chance to flush just merged inline inode pages for performance. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-02-22 21:40:02 -08:00
Chao Yu	1515aef013	f2fs: reorder nat cache lock in cache_nat_entry When lookuping nat entry in cache_nat_entry, if we fail to hit nat cache, we try to load nat entries a) from journal of current segment cache or b) from NAT pages for updating, during the process, write lock of nat_tree_lock will be held to avoid inconsistent condition in between nid cache and nat cache caused by racing among nat entry shrinker, checkpointer, nat entry updater. But this way may cause low efficient when updating nat cache, because it serializes accessing in journal cache or reading NAT pages. Here, we reorder lock and update flow as below to enhance accessing concurrency: - get_node_info - down_read(nat_tree_lock) - lookup nat cache --- hit -> unlock & return - lookup journal cache --- hit -> unlock & goto update - up_read(nat_tree_lock) update: - down_write(nat_tree_lock) - cache_nat_entry - lookup nat cache --- nohit -> update - up_write(nat_tree_lock) Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-02-22 21:39:55 -08:00
Chao Yu	b7ad7512b8	f2fs: split journal cache from curseg cache In curseg cache, f2fs caches two different parts: - datas of current summay block, i.e. summary entries, footer info. - journal info, i.e. sparse nat/sit entries or io stat info. With this approach, 1) it may cause higher lock contention when we access or update both of the parts of cache since we use the same mutex lock curseg_mutex to protect the cache. 2) current summary block with last journal info will be writebacked into device as a normal summary block when flushing, however, we treat journal info as valid one only in current summary, so most normal summary blocks contain junk journal data, it wastes remaining space of summary block. So, in order to fix above issues, we split curseg cache into two parts: a) current summary block, protected by original mutex lock curseg_mutex b) journal cache, protected by newly introduced r/w semaphore journal_rwsem When loading curseg cache during ->mount, we store summary info and journal info into different caches; When doing checkpoint, we combine datas of two cache into current summary block for persisting. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-02-22 21:39:54 -08:00
Chao Yu	dfc08a12e4	f2fs: introduce f2fs_journal struct to wrap journal info Introduce a new structure f2fs_journal to wrap journal info in struct f2fs_summary_block for readability. struct f2fs_journal { union { __le16 n_nats; __le16 n_sits; }; union { struct nat_journal nat_j; struct sit_journal sit_j; struct f2fs_extra_info info; }; } __packed; struct f2fs_summary_block { struct f2fs_summary entries[ENTRIES_IN_SUM]; struct f2fs_journal journal; struct summary_footer footer; } __packed; Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-02-22 21:39:53 -08:00
Yunlei He	d31c7c3f0b	f2fs: fix missing skip pages info fix missing skip pages info in f2fs_writepages trace event. Signed-off-by: Yunlei He <heyunlei@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-02-22 16:07:23 -08:00
Chao Yu	0c3a579758	f2fs: introduce f2fs_submit_merged_bio_cond f2fs use single bio buffer per type data (META/NODE/DATA) for caching writes locating in continuous block address as many as possible, after submitting, these writes may be still cached in bio buffer, so we have to flush cached writes in bio buffer by calling f2fs_submit_merged_bio. Unfortunately, in the scenario of high concurrency, bio buffer could be flushed by someone else before we submit it as below reasons: a) there is no space in bio buffer. b) add a request of different type (SYNC, ASYNC). c) add a discontinuous block address. For this condition, f2fs_submit_merged_bio will be devastating, because it could break the following merging of writes in bio buffer, split one big bio into two smaller one. This patch introduces f2fs_submit_merged_bio_cond which can do a conditional submitting with bio buffer, before submitting it will judge whether: - page in DATA type bio buffer is matching with specified page; - page in DATA type bio buffer is belong to specified inode; - page in NODE type bio buffer is belong to specified inode; If there is no eligible page in bio buffer, we will skip submitting step, result in gaining more chance to merge consecutive block IOs in bio cache. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-02-22 16:07:23 -08:00
Jaegeuk Kim	fa3d2bdf94	f2fs: wait on page's writeback in writepages path Likewise f2fs_write_cache_pages, let's do for node and meta pages too. Especially, for node blocks, we should do this before marking its fsync and dentry flags. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-02-22 16:07:23 -08:00
Chao Yu	3cf4574705	f2fs: introduce get_next_page_offset to speed up SEEK_DATA When seeking data in ->llseek, if we encounter a big hole which covers several dnode pages, we will try to seek data from index of page which is the first page of next dnode page, at most we could skip searching (ADDRS_PER_BLOCK - 1) pages. However it's still not efficient, because if our indirect/double-indirect pointer are NULL, there are no dnode page locate in the tree indirect/ double-indirect pointer point to, it's not necessary to search the whole region. This patch introduces get_next_page_offset to calculate next page offset based on current searching level and max searching level returned from get_dnode_of_data, with this, we could skip searching the entire area indirect or double-indirect node block is not exist. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-02-22 16:07:23 -08:00
Chao Yu	81ca7350ce	f2fs: remove unneeded pointer conversion There are redundant pointer conversion in following call stack: - at position a, inode was been converted to f2fs_file_info. - at position b, f2fs_file_info was been converted to inode again. - truncate_blocks(inode,..) - fi = F2FS_I(inode) ---a - ADDRS_PER_PAGE(node_page, fi) - addrs_per_inode(fi) - inode = &fi->vfs_inode ---b - f2fs_has_inline_xattr(inode) - fi = F2FS_I(inode) - is_inode_flag_set(fi,..) In order to avoid unneeded conversion, alter ADDRS_PER_PAGE and addrs_per_inode to acept parameter with type of inode pointer. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-02-22 16:07:23 -08:00
Jaegeuk Kim	fec1d6576c	f2fs: use wait_for_stable_page to avoid contention In write_begin, if storage supports stable_page, we don't need to wait for writeback to update its contents. This patch introduces to use wait_for_stable_page instead of wait_on_page_writeback. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-02-22 16:07:23 -08:00
Jaegeuk Kim	2049d4fcb0	f2fs: avoid multiple node page writes due to inline_data The sceanrio is: 1. create fully node blocks 2. flush node blocks 3. write inline_data for all the node blocks again 4. flush node blocks redundantly So, this patch tries to flush inline_data when flushing node blocks. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-02-22 16:07:23 -08:00
Chao Yu	2304cb0c44	f2fs: export dirty_nats_ratio in sysfs This patch exports a new sysfs entry 'dirty_nat_ratio' to control threshold of dirty nat entries, if current ratio exceeds configured threshold, checkpoint will be triggered in f2fs_balance_fs_bg for flushing dirty nats. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-02-22 16:07:23 -08:00
Jaegeuk Kim	1663cae48c	f2fs: fix wrong memory condition check This patch fixes wrong decision for avaliable_free_memory. The return valus is already set as false, so we should consider true condition below only. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-01-11 15:56:43 -08:00
Jaegeuk Kim	12719ae14e	f2fs: avoid unnecessary f2fs_balance_fs calls Only when node page is newly dirtied, it needs to check whether we need to do f2fs_gc. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-01-08 11:45:22 -08:00
Chao Yu	0e022ea8fc	f2fs: introduce __get_node_page to reuse common code There are duplicated code in between get_node_page and get_node_page_ra, introduce __get_node_page to includes common parts of these two, and export get_node_page and get_node_page_ra by reusing __get_node_page. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-01-08 11:45:20 -08:00
Chao Yu	e84587250a	f2fs: check node id earily when readaheading node page Add node id check in ra_node_page and get_node_page_ra like get_node_page. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-01-08 11:45:16 -08:00
Jaegeuk Kim	957efb0c21	Revert "f2fs: check the node block address of newly allocated nid" Original issue is fixed by: f2fs: cover more area with nat_tree_lock This reverts commit `24928634f8`.	2016-01-06 19:15:49 -08:00
Jaegeuk Kim	a51311938e	f2fs: cover more area with nat_tree_lock There was a subtle bug on nat cache management which incurs wrong nid allocation or wrong block addresses when try_to_free_nats is triggered heavily. This patch enlarges the previous coverage of nat_tree_lock to avoid data race. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-01-06 19:15:48 -08:00
Jaegeuk Kim	8d4ea29b64	f2fs: write pending bios when cp_error is set When testing ioc_shutdown, put_super is able to be hanged by waiting for writebacking pages as follows. INFO: task umount:2723 blocked for more than 120 seconds. Tainted: G O 4.4.0-rc3+ #8 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. umount D ffff88000859f9d8 0 2723 2110 0x00000000 ffff88000859f9d8 0000000000000000 0000000000000000 ffffffff81e11540 ffff880078c225c0 ffff8800085a0000 ffff88007fc17440 7fffffffffffffff ffffffff818239f0 ffff88000859fb48 ffff88000859f9f0 ffffffff8182310c Call Trace: [<ffffffff818239f0>] ? bit_wait+0x50/0x50 [<ffffffff8182310c>] schedule+0x3c/0x90 [<ffffffff81827fb9>] schedule_timeout+0x2d9/0x430 [<ffffffff810e0f8f>] ? mark_held_locks+0x6f/0xa0 [<ffffffff8111614d>] ? ktime_get+0x7d/0x140 [<ffffffff818239f0>] ? bit_wait+0x50/0x50 [<ffffffff8106a655>] ? kvm_clock_get_cycles+0x25/0x30 [<ffffffff8111617c>] ? ktime_get+0xac/0x140 [<ffffffff818239f0>] ? bit_wait+0x50/0x50 [<ffffffff81822564>] io_schedule_timeout+0xa4/0x110 [<ffffffff81823a25>] bit_wait_io+0x35/0x50 [<ffffffff818235bd>] __wait_on_bit+0x5d/0x90 [<ffffffff811b9e8b>] wait_on_page_bit+0xcb/0xf0 [<ffffffff810d5f90>] ? autoremove_wake_function+0x40/0x40 [<ffffffff811cf84c>] truncate_inode_pages_range+0x4bc/0x840 [<ffffffff811cfc3d>] truncate_inode_pages_final+0x4d/0x60 [<ffffffffc023ced5>] f2fs_evict_inode+0x75/0x400 [f2fs] [<ffffffff812639bc>] evict+0xbc/0x190 [<ffffffff81263d19>] iput+0x229/0x2c0 [<ffffffffc0241885>] f2fs_put_super+0x105/0x1a0 [f2fs] [<ffffffff8124756a>] generic_shutdown_super+0x6a/0xf0 [<ffffffff812478f7>] kill_block_super+0x27/0x70 [<ffffffffc0241290>] kill_f2fs_super+0x20/0x30 [f2fs] [<ffffffff81247b03>] deactivate_locked_super+0x43/0x70 [<ffffffff81247f4c>] deactivate_super+0x5c/0x60 [<ffffffff81268d2f>] cleanup_mnt+0x3f/0x90 [<ffffffff81268dc2>] __cleanup_mnt+0x12/0x20 [<ffffffff810ac463>] task_work_run+0x73/0xa0 [<ffffffff810032ac>] exit_to_usermode_loop+0xcc/0xd0 [<ffffffff81003e7c>] syscall_return_slowpath+0xcc/0xe0 [<ffffffff81829ea2>] int_ret_from_sys_call+0x25/0x9f Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-12-31 13:08:02 -08:00
Chao Yu	6d5a1495ee	f2fs: let user being aware of IO error Sometimes we keep dumb when IO error occur in lower layer device, so user will not receive any error return value for some operation, but actually, the operation did not succeed. This sould be avoided, so this patch reports such kind of error to user. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-12-30 10:14:15 -08:00
Jaegeuk Kim	4aa69d5667	f2fs: return early when trying to read null nid If get_node_page() gets zero nid, we can return early without getting a wrong page. For example, get_dnode_of_data() can try to do that. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-12-30 10:14:12 -08:00
Jaegeuk Kim	93bae099ea	f2fs: record node block allocation in dnode_of_data This patch introduces recording node block allocation in dnode_of_data. This information helps to figure out whether any node block is allocated during specific file operations. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-12-30 10:14:07 -08:00
Jaegeuk Kim	7441ccef33	f2fs: use atomic variable for total_extent_tree It would be better to use atomic variable for total_extent_tree. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-12-22 10:31:41 -08:00
Chao Yu	e1c51b9f1d	f2fs: clean up node page updating flow If read_node_page return LOCKED_PAGE, in its caller it's better a) skip unneeded 'Update' flag and mapping info verfication; b) check nid value stored in footer structure of node page. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-12-14 09:09:17 -08:00
Chao Yu	ea1a29a0bd	f2fs: export ra_nid_pages to sysfs After finishing building free nid cache, we will try to readahead asynchronously 4 more pages for the next reloading, the count of readahead nid pages is fixed. In some case, like SMR drive, read less sectors with fixed count each time we trigger RA may be low efficient, since we will face high seeking overhead, so we'd better let user to configure this parameter from sysfs in specific workload. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-10-12 14:03:43 -07:00
Chao Yu	2db2388fcf	f2fs: readahead for free nids building When there is no free nid in nid cache, all new node allocaters stop their job to wait for reloading of free nids, however reloading is synchronous as we will read 4 NAT pages for building nid cache, it cause the long latency. This patch tries to readahead more NAT pages with READA request flag after reloading of free nids. It helps to improve performance when users allocate node id intensively. Env: Sandisk 32G sd card time for i in `seq 1 60000`; { echo -n > /mnt/f2fs/$i; echo XXXXXX > /mnt/f2fs/$i;} Before: real 0m2.814s user 0m1.220s sys 0m1.536s After: real 0m2.711s user 0m1.136s sys 0m1.568s Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-10-12 14:03:22 -07:00
Chao Yu	26879fb101	f2fs: support lower priority asynchronous readahead in ra_meta_pages Now, we use ra_meta_pages to reads continuous physical blocks as much as possible to improve performance of following reads. However, ra_meta_pages uses a synchronous readahead approach by submitting bio with READ, as READ is with high priority, it can not be used in the case of preloading blocks, and it's not sure when these RAed pages will be used. This patch supports asynchronous readahead in ra_meta_pages by tagging bio with READA flag in order to allow preloading. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-10-12 14:03:15 -07:00
Chao Yu	2b947003fa	f2fs: don't tag REQ_META for temporary non-meta pages In recovery or checkpoint flow, we grab pages temperarily in meta inode's mapping for caching temperary data, actually, datas in these pages were not meta data of f2fs, but still we tag them with REQ_META flag. However, lower device like eMMC may do some optimization for data of such type. So in order to avoid wrong optimization, we'd better remove such flag for temperary non-meta pages. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-10-12 14:01:46 -07:00
Jaegeuk Kim	a125702326	Revert "f2fs: do not skip dentry block writes" The periodic checkpoint can resolve the previous issue. So, now we can use this again to improve the reported performance regression: https://lkml.org/lkml/2015/10/8/20 This reverts commit 15bec0ff5a9ba6d203178fa8772259df6207942a.	2015-10-12 13:38:02 -07:00
Jaegeuk Kim	90b803e6fb	f2fs: do not skip dentry block writes Previously, we skip dentry block writes when wbc is SYNC_NONE with no memory pressure and the number of dirty pages is pretty small. But, we didn't skip for normal data writes, which gives us not much big impact on overall performance. Moreover, by skipping some data writes, kworker falls into infinite loop to try to write blocks, when many dir inodes have only one dentry block. So, this patch removes skipping data writes. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-10-09 16:20:54 -07:00
Jaegeuk Kim	25b93346a6	f2fs: cover number of dirty node pages under node_write lock This number is referenced by checkpoint under node_write lock. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-10-09 16:20:49 -07:00
Chao Yu	13ec7297e5	f2fs: fix to release inode correctly In following call stack, if unfortunately we lose all chances to truncate inode page in remove_inode_page, eventually we will add the nid allocated previously into free nid cache, this nid is with NID_NEW status and with NEW_ADDR in its blkaddr pointer: - f2fs_create - f2fs_add_link - __f2fs_add_link - init_inode_metadata - new_inode_page - new_node_page - set_node_addr(, NEW_ADDR) - f2fs_init_acl failed - remove_inode_page failed - handle_failed_inode - remove_inode_page failed - iput - f2fs_evict_inode - remove_inode_page failed - alloc_nid_failed cache a nid with valid blkaddr: NEW_ADDR This may not only cause resource leak of previous inode, but also may cause incorrect use of the previous blkaddr which is located in NO.nid node entry when this nid is reused by others. This patch tries to add this inode to orphan list if we fail to truncate inode, so that we can obtain a second chance to release it in orphan recovery flow. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-08-24 16:35:59 -07:00
Jaegeuk Kim	f7409d0fae	f2fs: fix wrong pointer access during try_to_free_nids If we release the lock in list_for_each_entry_safe, we can lose the tmp pointer by alloc_nid. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-08-24 09:37:42 -07:00
Jaegeuk Kim	80c545055d	f2fs: use __GFP_NOFAIL to avoid infinite loop __GFP_NOFAIL can avoid retrying the whole path of kmem_cache_alloc and bio_alloc. And, it also fixes the use cases of GFP_ATOMIC correctly. Suggested-by: Chao Yu <chao2.yu@samsung.com> Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-08-24 09:37:21 -07:00
Jaegeuk Kim	24928634f8	f2fs: check the node block address of newly allocated nid This patch adds a routine which checks the block address of newly allocated nid. If an nid has already allocated by other thread due to subtle data races, it will result in filesystem corruption. So, it needs to check whether its block address was already allocated or not in prior to nid allocation as the last chance. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-08-20 09:00:14 -07:00
Jaegeuk Kim	2683446646	f2fs: reuse nids more aggressively If we can reuse nids as many as possible, we can mitigate producing obsolete node pages in the page cache. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-08-20 09:00:11 -07:00
Chao Yu	31696580bf	f2fs: shrink free_nids entries This patch introduces __count_free_nids/try_to_free_nids and registers them in slab shrinker for shrinking under memory pressure. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-08-20 09:00:06 -07:00
Chao Yu	a6d494b6d8	f2fs: fix to build free nids from readaheaded nat pages When there is no enough free nids in free nid cache, we will try to readahead FREE_NID_PAGES:4 nat pages into page cache of meta_inode, then, reading nat entries in nat page for adding free nids to free nid cache. But when traversing all nat pages we readaheaded in a circulation, our exit condition is not set right, one more nat page will be scanned without readaheading, resulting worse read performance. This patch fixes to read the correct number nat pages to avoid bad performance. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-08-05 08:08:14 -07:00
Jaegeuk Kim	86531d6b84	f2fs: callers take care of the page from bio error This patch changes for a caller to handle the page after its bio gets an error. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-08-05 08:08:07 -07:00
Jaegeuk Kim	1b38dc8e74	f2fs: shrink nat_cache entries This patch registers shrinking nat_cache entries. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-08-04 14:09:55 -07:00
Linus Torvalds	e4bc13adfd	Merge branch 'for-4.2/writeback' of git://git.kernel.dk/linux-block Pull cgroup writeback support from Jens Axboe: "This is the big pull request for adding cgroup writeback support. This code has been in development for a long time, and it has been simmering in for-next for a good chunk of this cycle too. This is one of those problems that has been talked about for at least half a decade, finally there's a solution and code to go with it. Also see last weeks writeup on LWN: http://lwn.net/Articles/648292/" * 'for-4.2/writeback' of git://git.kernel.dk/linux-block: (85 commits) writeback, blkio: add documentation for cgroup writeback support vfs, writeback: replace FS_CGROUP_WRITEBACK with SB_I_CGROUPWB writeback: do foreign inode detection iff cgroup writeback is enabled v9fs: fix error handling in v9fs_session_init() bdi: fix wrong error return value in cgwb_create() buffer: remove unusued 'ret' variable writeback: disassociate inodes from dying bdi_writebacks writeback: implement foreign cgroup inode bdi_writeback switching writeback: add lockdep annotation to inode_to_wb() writeback: use unlocked_inode_to_wb transaction in inode_congested() writeback: implement unlocked_inode_to_wb transaction and use it for stat updates writeback: implement [locked_]inode_to_wb_and_lock_list() writeback: implement foreign cgroup inode detection writeback: make writeback_control track the inode being written back writeback: relocate wb[_try]_get(), wb_put(), inode_{attach\|detach}_wb() mm: vmscan: disable memcg direct reclaim stalling if cgroup writeback support is in use writeback: implement memcg writeback domain based throttling writeback: reset wb_domain->dirty_limit[_tstmp] when memcg domain size changes writeback: implement memcg wb_domain writeback: update wb_over_bg_thresh() to use wb_domain aware operations ...	2015-06-25 16:00:17 -07:00
Tejun Heo	a88a341a73	writeback: move bandwidth related fields from backing_dev_info into bdi_writeback Currently, a bdi (backing_dev_info) embeds single wb (bdi_writeback) and the role of the separation is unclear. For cgroup support for writeback IOs, a bdi will be updated to host multiple wb's where each wb serves writeback IOs of a different cgroup on the bdi. To achieve that, a wb should carry all states necessary for servicing writeback IOs for a cgroup independently. This patch moves bandwidth related fields from backing_dev_info into bdi_writeback. * The moved fields are: bw_time_stamp, dirtied_stamp, written_stamp, write_bandwidth, avg_write_bandwidth, dirty_ratelimit, balanced_dirty_ratelimit, completions and dirty_exceeded. * writeback_chunk_size() and over_bground_thresh() now take @wb instead of @bdi. * bdi_writeout_fraction(bdi, ...) -> wb_writeout_fraction(wb, ...) bdi_dirty_limit(bdi, ...) -> wb_dirty_limit(wb, ...) bdi_position_ration(bdi, ...) -> wb_position_ratio(wb, ...) bdi_update_writebandwidth(bdi, ...) -> wb_update_write_bandwidth(wb, ...) [__]bdi_update_bandwidth(bdi, ...) -> [__]wb_update_bandwidth(wb, ...) bdi_{max\|min}_pause(bdi, ...) -> wb_{max\|min}_pause(wb, ...) bdi_dirty_limits(bdi, ...) -> wb_dirty_limits(wb, ...) * Init/exits of the relocated fields are moved to bdi_wb_init/exit() respectively. Note that explicit zeroing is dropped in the process as wb's are cleared in entirety anyway. * As there's still only one bdi_writeback per backing_dev_info, all uses of bdi->stat[] are mechanically replaced with bdi->wb.stat[] introducing no behavior changes. v2: Typo in description fixed as suggested by Jan. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Jens Axboe <axboe@kernel.dk> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-06-02 08:33:34 -06:00
Jaegeuk Kim	4375a33664	f2fs crypto: add encryption support in read/write paths This patch adds encryption support in read and write paths. Note that, in f2fs, we need to consider cleaning operation. In cleaning procedure, we must avoid encrypting and decrypting written blocks. So, this patch implements move_encrypted_block(). Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-05-28 15:41:52 -07:00
Chao Yu	d5b692b786	f2fs: do not re-lookup nat cache with same nid In set_node_addr, we try to lookup cached nat entry of inode and then set flag in it. But previously in this function, we have already grabbed nat entry with current node id, if the node id is the same as the one of inode, we do not need to lookup it in cache again. So this patch adds condition judgment for reducing unneeded lookup. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-05-28 15:41:41 -07:00
Jaegeuk Kim	2dcf51ab2f	f2fs: add need_dentry_mark This patch introduces need_dentry_mark() to clean up and avoid redundant node locks. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-05-28 15:41:36 -07:00
Jaegeuk Kim	05ca3632e5	f2fs: add sbi and page pointer in f2fs_io_info This patch adds f2fs_sb_info and page pointers in f2fs_io_info structure. With this change, we can reduce a lot of parameters for IO functions. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-05-28 15:41:32 -07:00
Chao Yu	2aa7c51a45	f2fs: make has_fsynced_inode static has_fsynced_inode() has no other caller out of node.c, make it static. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-05-07 11:38:32 -07:00
Wanpeng Li	57ed1e95ba	f2fs: fix unlocked nat set cache operation nm_i->nat_tree_lock is used to sync both the operations of nat entry cache tree and nat set cache tree, however, it isn't held when flush nat entries during checkpoint which lead to potential race, this patch fix it by holding the lock when gang lookup nat set cache and delete item from nat set cache. Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-04-10 15:08:39 -07:00
Jaegeuk Kim	7662916591	f2fs: report -ENOENT for unreached data indices If inode has inline_data, it should report -ENOENT when accessing out-of-bound region. This is used by f2fs_fiemap which treats -ENOENT with no error. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-04-10 15:08:34 -07:00
Jaegeuk Kim	2bca1e2388	f2fs: clear page's up-to-date if block was deallocated If page's on-disk block was deallocated, let's remove up-to-date flag to avoid further access with wrong contents. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-04-10 15:08:26 -07:00
Chao Yu	429511cdf8	f2fs: add core functions for rb-tree extent cache This patch adds core functions including slab cache init function and init/lookup/update/shrink/destroy function for rb-tree based extent cache. Thank Jaegeuk Kim and Changman Lee as they gave much suggestion about detail design and implementation of extent cache. Todo: * register rb-based extent cache shrink with mm shrink interface. v2: o move set_extent_info and __is_{extent,back,front}_mergeable into f2fs.h. o introduce __{attach,detach}_extent_node for code readability. o add cond_resched() when fail to invoke kmem_cache_alloc/radix_tree_insert. o fix some coding style and typo issues. v3: o fix oops due to using an unassigned pointer. o use list_del to remove extent node in shrink list. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Changman Lee <cm224.lee@samsung.com> [Jaegeuk Kim: add static for some funcitons and declare in f2fs.h] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-03-03 09:58:46 -08:00
Jaegeuk Kim	f1a3b98e73	f2fs: fix accessing wrong indexed data blocks This patch fixes the following test. This causes: attempt to access beyond end of device sdb2: rw=16384, want=14413962000, limit=16777216 The reason is: - f2fs_write_begin - f2fs_convert_inline_inode returns -ENOSPC - f2fs_write_failed - truncate_blocks - truncate_partial_data_page - find_data_page - get_dnode_of_data returns wrong data index retrieved from inline_data - f2fs_submit_page_bio(wrong data index) - submit_bio(wrong data index) Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-02-11 17:04:51 -08:00
Jaegeuk Kim	aaf9607516	f2fs: check node page contents all the time In get_node_page, if the page is up-to-date, we assumed that the page was not reclaimed at all. But, sometimes it was reported that its contents was missing. So, just for sure, let's check its mapping and contents. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-02-11 17:04:46 -08:00
Chao Yu	487261f39b	f2fs: merge {invalidate,release}page for meta/node/data pages This patch merges ->{invalidate,release}page function for meta/node/data pages. After this, duplication of codes could be removed. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-02-11 17:04:44 -08:00
Jaegeuk Kim	f68daeebba	f2fs: keep PagePrivate during releasepage If PagePrivate is removed by releasepage, f2fs loses counting dirty pages. e.g., try_to_release_page will not release page when the page is dirty, but our releasepage removes PagePrivate. [<ffffffff81188d75>] try_to_release_page+0x35/0x50 [<ffffffff811996f9>] invalidate_inode_pages2_range+0x2f9/0x3b0 [<ffffffffa02a7f54>] ? truncate_blocks+0x384/0x4d0 [f2fs] [<ffffffffa02b7583>] ? f2fs_direct_IO+0x283/0x290 [f2fs] [<ffffffffa02b7fb0>] ? get_data_block_fiemap+0x20/0x20 [f2fs] [<ffffffff8118aa53>] generic_file_direct_write+0x163/0x170 [<ffffffff8118ad06>] __generic_file_write_iter+0x2a6/0x350 [<ffffffff8118adef>] generic_file_write_iter+0x3f/0xb0 [<ffffffff81203081>] new_sync_write+0x81/0xb0 [<ffffffff81203837>] vfs_write+0xb7/0x1f0 [<ffffffff81204459>] SyS_write+0x49/0xb0 [<ffffffff817c286d>] system_call_fastpath+0x16/0x1b Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-02-11 17:04:42 -08:00
Chao Yu	caf0047e7e	f2fs: merge flags in struct f2fs_sb_info Currently, there are several variables with Boolean type as below: struct f2fs_sb_info { ... int s_dirty; bool need_fsck; bool s_closing; ... bool por_doing; ... } For this there are some issues: 1. there are some space of f2fs_sb_info is wasted due to aligning after Boolean type variables by compiler. 2. if we continuously add new flag into f2fs_sb_info, structure will be messed up. So in this patch, we try to: 1. switch s_dirty to Boolean type variable since it has two status 0/1. 2. merge s_dirty/need_fsck/s_closing/por_doing variables into s_flag. 3. introduce an enum type which can indicate different states of sbi. 4. use new introduced universal interfaces is_sbi_flag_set/{set,clear}_sbi_flag to operate flags for sbi. After that, above issues will be fixed. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-02-11 17:04:38 -08:00
Jaegeuk Kim	7aed0d4537	f2fs: free radix_tree_nodes used by nat_set entries In the normal case, the radix_tree_nodes are freed successfully. But, when cp_error was detected, we should destroy them forcefully. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-01-09 17:02:28 -08:00
Jaegeuk Kim	3547ea961d	f2fs: avoid potential unnecessary codes This patch relocates some operations to avoid unnecessary execution. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-01-09 17:02:26 -08:00
Jaegeuk Kim	9e4ded3f30	f2fs: activate f2fs_trace_pid This patch activates f2fs_trace_pid. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-01-09 17:02:24 -08:00
Jaegeuk Kim	cf04e8eb55	f2fs: use f2fs_io_info to clean up messy parameters during IO path This patch cleans up parameters on IO paths. The key idea is to use f2fs_io_info adding a parameter, block address, and then use this structure as parameters. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-01-09 17:02:23 -08:00
Chao Yu	9ecf4b80bd	f2fs: use ra_meta_pages to simplify readahead code in restore_node_summary Use more common function ra_meta_pages() with META_POR to readahead node blocks in restore_node_summary() instead of ra_sum_pages(), hence we can simplify the readahead code there, and also we can remove unused function ra_sum_pages(). changes from v2: o use invalidate_mapping_pages as before suggested by Changman Lee. changes from v1: o fix one bug when using truncate_inode_pages_range which is pointed out by Jaegeuk Kim. Reviewed-by: Changman Lee <cm224.lee@samsung.com> Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-01-09 17:02:23 -08:00
Chao Yu	5c27f4ee44	f2fs: merge two uchar variable in struct node_info to reduce memory cost This patch moves one member of struct nat_entry: _flag_ to struct node_info, so _version_ in struct node_info and _flag_ which are unsigned char type will merge to one 32-bit space in register/memory. So the size of nat_entry will be reduced from 28 bytes to 24 bytes (for 64-bit machine, reduce its size from 40 bytes to 32 bytes) and then slab memory using by f2fs will be reduced. changes from v2: o update description of memory usage gain for 64-bit machine suggested by Changman Lee. changes from v1: o introduce inline copy_node_info() to copy valid data from node info suggested by Jaegeuk Kim, it can avoid bug. Reviewed-by: Changman Lee <cm224.lee@samsung.com> Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-01-09 17:02:23 -08:00
Jaegeuk Kim	1e84371ffe	f2fs: change atomic and volatile write policies This patch adds two new ioctls to release inmemory pages grabbed by atomic writes. o f2fs_ioc_abort_volatile_write - If transaction was failed, all the grabbed pages and data should be written. o f2fs_ioc_release_volatile_write - This is to enhance the performance of PERSIST mode in sqlite. In order to avoid huge memory consumption which causes OOM, this patch changes volatile writes to use normal dirty pages, instead blocked flushing to the disk as long as system does not suffer from memory pressure. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-01-09 17:02:22 -08:00
Jaegeuk Kim	cd52b6368f	f2fs: remove checking dirty_exceed We don't need to force to write dirty_exceeded for f2fs_balance_fs_bg. This flag was only meaningful to write bypassing conditions. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-01-09 17:02:21 -08:00
Jaegeuk Kim	9be32d72be	f2fs: do retry operations with cond_resched This patch revists retrial paths in f2fs. The basic idea is to use cond_resched instead of retrying from the very early stage. Suggested-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-12-08 10:35:05 -08:00
Jaegeuk Kim	769ec6e5b7	f2fs: call radix_tree_preload before radix_tree_insert This patch tries to fix: BUG: using smp_processor_id() in preemptible [00000000] code: f2fs_gc-254:0/384 (radix_tree_node_alloc+0x14/0x74) from [<c033d8a0>] (radix_tree_insert+0x110/0x200) (radix_tree_insert+0x110/0x200) from [<c02e8264>] (gc_data_segment+0x340/0x52c) (gc_data_segment+0x340/0x52c) from [<c02e8658>] (f2fs_gc+0x208/0x400) (f2fs_gc+0x208/0x400) from [<c02e8a98>] (gc_thread_func+0x248/0x28c) (gc_thread_func+0x248/0x28c) from [<c0139944>] (kthread+0xa0/0xac) (kthread+0xa0/0xac) from [<c0105ef8>] (ret_from_fork+0x14/0x3c) The reason is that f2fs calls radix_tree_insert under enabled preemption. So, before calling it, we need to call radix_tree_preload. Otherwise, we should use _GFP_WAIT for the radix tree, and use mutex or semaphore to cover the radix tree operations. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-12-05 09:51:04 -08:00
Jaegeuk Kim	8b26ef98da	f2fs: use rw_semaphore for nat entry lock Previoulsy, we used rwlock for nat_entry lock. But, now we have a lot of complex operations in set_node_addr. (e.g., allocating kernel memories, handling radix_trees, and so on) So, this patches tries to change spinlock to rw_semaphore to give CPUs to other threads. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-12-03 21:23:29 -08:00
Jaegeuk Kim	4634d71ed1	f2fs: fix missing kmem_cache_free This patch fixes missing kmem_cache_free when handling errors. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-12-03 16:40:28 -08:00
Changman Lee	80ec2e914d	f2fs: no more dirty_nat_entires when flushing After flushing dirty nat entries, it has to be no more dirty nat entries. Signed-off-by: Changman Lee <cm224.lee@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-11-25 17:26:36 -08:00
Changman Lee	20d047c876	f2fs: check dirty_nat_cnt before flushing nat entries in journal It's meaningless to check dirty_nat_cnt after re-dirtying nat entries in journal. And although there are rooms for dirty nat entires if dirty_nat_cnt is zero, it's also meaningless to check __has_cursum_space. Signed-off-by: Changman Lee <cm224.lee@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-11-25 17:26:34 -08:00
Markus Elfring	ce3e6d25f3	f2fs: fix typos for the word "destroy" in jump labels Two jump labels were adjusted in the implementation of the create_node_manager_caches() function because these identifiers contained typos. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-11-25 12:08:22 -08:00
Jaegeuk Kim	27c6bd60ac	f2fs: submit bio for node blocks in the reclaim path If a node page is request to be written during the reclaiming path, we should submit the bio to avoid pending to recliam it. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-11-19 22:49:32 -08:00
Chao Yu	67298804f3	f2fs: introduce struct inode_management to wrap inner fields Now in f2fs, we have three inode cache: ORPHAN_INO, APPEND_INO, UPDATE_INO, and we manage fields related to inode cache separately in struct f2fs_sb_info for each inode cache type. This makes codes a bit messy, so that this patch intorduce a new struct inode_management to wrap inner fields as following which make codes more neat. /* for inner inode cache management / struct inode_management { struct radix_tree_root ino_root; / ino entry array / spinlock_t ino_lock; / for ino entry lock / struct list_head ino_list; / inode list head / unsigned long ino_num; / number of entries / }; struct f2fs_sb_info { ... struct inode_management im[MAX_INO_ENTRY]; / manage inode cache */ ... } Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-11-19 22:49:32 -08:00
Jaegeuk Kim	2f97c326bf	f2fs: write node pages if checkpoint is not doing It needs to write node pages if checkpoint is not doing in order to avoid memory pressure. Reviewed-by: Changman Lee <cm224.lee@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-11-10 06:51:28 -08:00
Jaegeuk Kim	e5e7ea3c86	f2fs: control the memory footprint used by ino entries This patch adds to control the memory footprint used by ino entries. This will conduct best effort, not strictly. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-11-06 15:24:46 -08:00
Jaegeuk Kim	309cc2b6e7	f2fs: refactor flush_nat_entries to remove costly reorganizing ops Previously, f2fs tries to reorganize the dirty nat entries into multiple sets according to its nid ranges. This can improve the flushing nat pages, however, if there are a lot of cached nat entries, it becomes a bottleneck. This patch introduces a new set management flow by removing dirty nat list and adding a series of set operations when the nat entry becomes dirty. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-09-30 15:30:41 -07:00
Jaegeuk Kim	90a893c749	f2fs: use MAX_BIO_BLOCKS(sbi) This patch cleans up a simple macro. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-09-23 11:10:18 -07:00
Jaegeuk Kim	88bd02c947	f2fs: fix conditions to remain recovery information in f2fs_sync_file This patch revisited whole the recovery information during the f2fs_sync_file. In this patch, there are three information to make a decision. a) IS_CHECKPOINTED, /* is it checkpointed before? / b) HAS_FSYNCED_INODE, / is the inode fsynced before? / c) HAS_LAST_FSYNC, / has the latest node fsync mark? */ And, the scenarios for our rule are based on: [Term] F: fsync_mark, D: dentry_mark 1. inode(x) \| CP \| inode(x) \| dnode(F) 2. inode(x) \| CP \| inode(F) \| dnode(F) 3. inode(x) \| CP \| dnode(F) \| inode(x) \| inode(F) 4. inode(x) \| CP \| dnode(F) \| inode(F) 5. CP \| inode(x) \| dnode(F) \| inode(DF) 6. CP \| inode(DF) \| dnode(F) 7. CP \| dnode(F) \| inode(DF) 8. CP \| dnode(F) \| inode(x) \| inode(DF) For example, #3, the three conditions should be changed as follows. inode(x) \| CP \| dnode(F) \| inode(x) \| inode(F) a) x o o o o b) x x x x o c) x o o x o If f2fs_sync_file stops ------^, it should write inode(F) --------------^ So, the need_inode_block_update should return true, since c) get_nat_flag(e, HAS_LAST_FSYNC), is false. For example, #8, CP \| alloc \| dnode(F) \| inode(x) \| inode(DF) a) o x x x x b) x x x o c) o o x o If f2fs_sync_file stops -------^, it should write inode(DF) --------------^ Note that, the roll-forward policy should follow this rule, which means, if there are any missing blocks, we doesn't need to recover that inode. Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-09-23 11:10:15 -07:00
Jaegeuk Kim	7ef35e3b9e	f2fs: introduce a flag to represent each nat entry information This patch introduces a flag in the nat entry structure to merge various information such as checkpointed and fsync_done marks. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-09-23 11:10:14 -07:00
Chao Yu	184a5cd2ce	f2fs: refactor flush_sit_entries codes for reducing SIT writes In commit `aec71382c6` ("f2fs: refactor flush_nat_entries codes for reducing NAT writes"), we descripte the issue as below: "Although building NAT journal in cursum reduce the read/write work for NAT block, but previous design leave us lower performance when write checkpoint frequently for these cases: 1. if journal in cursum has already full, it's a bit of waste that we flush all nat entries to page for persistence, but not to cache any entries. 2. if journal in cursum is not full, we fill nat entries to journal util journal is full, then flush the left dirty entries to disk without merge journaled entries, so these journaled entries may be flushed to disk at next checkpoint but lost chance to flushed last time." Actually, we have the same problem in using SIT journal area. In this patch, firstly we will update sit journal with dirty entries as many as possible. Secondly if there is no space in sit journal, we will remove all entries in journal and walk through the whole dirty entry bitmap of sit, accounting dirty sit entries located in same SIT block to sit entry set. All entry sets are linked to list sit_entry_set in sm_info, sorted ascending order by count of entries in set. Later we flush entries in set which have fewest entries into journal as many as we can, and then flush dense set with merged entries to disk. In this way we can use sit journal area more effectively, also we will reduce SIT update, result in gaining in performance and saving lifetime of flash device. In my testing environment, it shows this patch can help to reduce SIT block update obviously. virtual machine + hard disk: fsstress -p 20 -n 400 -l 5 sit page num cp count sit pages/cp based 2006.50 1349.75 1.486 patched 1566.25 1463.25 1.070 Our latency of merging op is small when handling a great number of dirty SIT entries in flush_sit_entries: latency(ns) dirty sit count 36038 2151 49168 2123 37174 2232 Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-09-09 13:15:05 -07:00
Jaegeuk Kim	9850cf4a89	f2fs: need fsck.f2fs when f2fs_bug_on is triggered If any f2fs_bug_on is triggered, fsck.f2fs is needed. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-09-09 13:15:02 -07:00
Jaegeuk Kim	4081363fbe	f2fs: introduce F2FS_I_SB, F2FS_M_SB, and F2FS_P_SB This patch adds three inline functions to clean up dirty casting codes. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-09-03 17:37:13 -07:00
Jaegeuk Kim	c2e69583a4	f2fs: truncate stale block for inline_data This verifies to truncate any allocated blocks, offset[0], by inline_data. Not figured out, but for making sure. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-08-25 14:52:09 -07:00
Chao Yu	c200b1aa6c	f2fs: fix incorrect calculation with total/free inode num Theoretically, our total inodes number is the same as total node number, but there are three node ids are reserved in f2fs, they are 0, 1 (node nid), and 2 (meta nid), and they should never be used by user, so our total/free inode number calculated in ->statfs is wrong. This patch indroduces F2FS_RESERVED_NODE_NUM and then fixes this issue by recalculating total/free inode number with the macro. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-08-21 13:57:06 -07:00
Jaegeuk Kim	202095a7a0	f2fs: remove rewrite_node_page I think we need to let the dirty node pages remain in the page cache instead of rewriting them in their places. So, after done with successful recovery, write_checkpoint will flush all of them through the normal write path. Through this, we can avoid potential error cases in terms of block allocation. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-08-21 13:57:02 -07:00
Jaegeuk Kim	cf779cab14	f2fs: handle EIO not to break fs consistency There are two rules when EIO is occurred. 1. don't write any checkpoint data to preserve the previous checkpoint 2. don't lose the cached dentry/node/meta pages So, at first, this patch adds set_page_dirty in f2fs_write_end_io's failure. Then, writing checkpoint/dentry/node blocks is not allowed. Note that, for the data pages, we can't just throw away by redirtying them. Otherwise, kworker can fall into infinite loop to flush them. (Ref. xfstests/019) Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-08-21 13:55:05 -07:00
Jaegeuk Kim	5274651927	f2fs: unlock_page when node page is redirtied out This patch fixes missing unlock_page when a node page is redirtied out. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-08-21 09:21:01 -07:00
Jaegeuk Kim	1c35a90e8a	f2fs: fix to recover inline_xattr/data and blocks This patch fixes not to skip xattr recovery and inline xattr/data recovery order. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-08-19 10:01:34 -07:00
Jaegeuk Kim	e3b4d43f7c	f2fs: should clear the inline_xattr flag During the recovery, we should clear the inline_xattr flag if its xattr node block is recovered. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-08-19 10:01:34 -07:00
Jaegeuk Kim	617deb8c05	f2fs: fix the initial inode page for recovery If a new inode page is needed for recover_dentry, we should assing i_inline as zero. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-08-19 10:01:34 -07:00
arter97	e1c4204520	f2fs: fix typo Fix typo and some grammatical errors. The words "filesystem" and "readahead" are being used without the space treewide. Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-08-19 10:01:33 -07:00
Chao Yu	70cfed88ef	f2fs: avoid skipping recover_inline_xattr after recover_inline_data When we recover data of inode in roll-forward procedure, and the inode has both inline data and inline xattr. We may skip recovering inline xattr if we recover inline data form node page first. This patch will fix the problem that we lost inline xattr data in above scenario. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-08-02 07:43:51 -07:00
Chao Yu	b3582c6892	f2fs: reduce competition among node page writes We do not need to block on ->node_write among different node page writers e.g. fsync/flush, unless we have a node page writer from write_checkpoint. So it's better use rw_semaphore instead of mutex type for ->node_write to promote performance. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-07-30 23:28:37 -07:00
Chao Yu	aec71382c6	f2fs: refactor flush_nat_entries codes for reducing NAT writes Although building NAT journal in cursum reduce the read/write work for NAT block, but previous design leave us lower performance when write checkpoint frequently for these cases: 1. if journal in cursum has already full, it's a bit of waste that we flush all nat entries to page for persistence, but not to cache any entries. 2. if journal in cursum is not full, we fill nat entries to journal util journal is full, then flush the left dirty entries to disk without merge journaled entries, so these journaled entries may be flushed to disk at next checkpoint but lost chance to flushed last time. In this patch we merge dirty entries located in same NAT block to nat entry set, and linked all set to list, sorted ascending order by entries' count of set. Later we flush entries in sparse set into journal as many as we can, and then flush merged entries to disk. In this way we can not only gain in performance, but also save lifetime of flash device. In my testing environment, it shows this patch can help to reduce NAT block writes obviously. In hard disk test case: cost time of fsstress is stablely reduced by about 5%. 1. virtual machine + hard disk: fsstress -p 20 -n 200 -l 5 node num cp count nodes/cp based 4599.6 1803.0 2.551 patched 2714.6 1829.6 1.483 2. virtual machine + 32g micro SD card: fsstress -p 20 -n 200 -l 1 -w -f chown=0 -f creat=4 -f dwrite=0 -f fdatasync=4 -f fsync=4 -f link=0 -f mkdir=4 -f mknod=4 -f rename=5 -f rmdir=5 -f symlink=0 -f truncate=4 -f unlink=5 -f write=0 -S node num cp count nodes/cp based 84.5 43.7 1.933 patched 49.2 40.0 1.23 Our latency of merging op shows not bad when handling extreme case like: merging a great number of dirty nats: latency(ns) dirty nat count 3089219 24922 5129423 27422 4000250 24523 change log from v1: o fix wrong logic in add_nat_entry when grab a new nat entry set. o swith to create slab cache in create_node_manager_caches. o use GFP_ATOMIC instead of GFP_NOFS to avoid potential long latency. change log from v2: o make comment position more appropriate suggested by Jaegeuk Kim. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-07-09 14:04:25 -07:00
Jaegeuk Kim	a014e037be	f2fs: clean up an unused parameter and assignment This patch cleans up simple unnecessary codes. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-07-09 14:04:25 -07:00
Jaegeuk Kim	2743f86554	f2fs: check bdi->dirty_exceeded when trying to skip data writes If we don't check the current backing device status, balance_dirty_pages can fall into infinite pausing routine. This can be occurred when a lot of directories make a small number of dirty dentry pages including files. Reported-by: Brian Chadwick <brianchad@westnet.com.au> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-07-09 05:59:45 -07:00
Linus Torvalds	64b2d1fbbf	f2fs updates for v3.16 This patch-set includes the following major enhancement patches. o enhance wait_on_page_writeback o support SEEK_DATA and SEEK_HOLE o enhance readahead flows o enhance IO flushes o support fiemap o add some tracepoints The other bug fixes are as follows. o fix to support a large volume > 2TB correctly o recovery bug fix wrt fallocated space o fix recursive lock on xattr operations o fix some cases on the remount flow And, there are a bunch of cleanups. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJTleLYAAoJEEAUqH6CSFDSdhEP/iY5hTZ02jH4ZiFPP/Nee4hS l0BHeZvrMjoccaWUDu0ZvIPC8BiJ7kLOgK+/VWS7LAfY1PY11ALEtYQOrW+RM47+ jMfULegTod/F8WS2B6dk31QMhAZldtnsYvA5PS1VV3S0rht+qbOz+PDejZFgSMc3 VVQ7OZzq30gMmtsw7oh3FHfeTu4xe/bxygdMRXgljQQD2MvorJiOb4MA+ovEDd9z CZMMTQvRKjc0d8LPf0gOkZEvG63GR6klCgFMuiappUsua0O8IPIEhCyEGFrE66vS fObVKpqNAsR2ABhS2grgn6Q7UUvn4xrF6jOwDH5tuw2yzmkQiMAWINwBdgnbEy1c D5S89PQ8TkQK9KBSoU0v8WKWC4HzWZF4ZEd6eq9VxVTS8iT0w8DtNHXK518FVC34 82iqrLc0EhrcGNiW/7Nrc6WzHrWqorylCFN7atB3ruhVqeMh3MZsDU4Gq0WgB2oh pF9XVBEpJQpV35DYSAPzLkm+2+mwHVNqwdY3HcvUs7IP90+wZlrWSRG2FEfFt/e8 6nwvbORrHMTI5VfdYlEPgpjuesFmYPZqEvRGdaDa1YrHqhvvgStEPT9qiq2qVn9+ tr0HjpNRDD/etkaE6ujYOYqdxuk3mm6RY68h+KSbNcY1/VTvt2bN2telwWuDsxIq 8yOsxV2x3JB/euDAJsSU =xqsO -----END PGP SIGNATURE----- Merge tag 'for-f2fs-3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "In this round, there is no special interesting feature, but we've investigated a couple of tuning points with respect to the I/O flow. Several major bug fixes and a bunch of clean-ups also have been made. This patch-set includes the following major enhancement patches: - enhance wait_on_page_writeback - support SEEK_DATA and SEEK_HOLE - enhance readahead flows - enhance IO flushes - support fiemap - add some tracepoints The other bug fixes are as follows: - fix to support a large volume > 2TB correctly - recovery bug fix wrt fallocated space - fix recursive lock on xattr operations - fix some cases on the remount flow And, there are a bunch of cleanups" * tag 'for-f2fs-3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (52 commits) f2fs: support f2fs_fiemap f2fs: avoid not to call remove_dirty_inode f2fs: recover fallocated space f2fs: fix to recover data written by dio f2fs: large volume support f2fs: avoid crash when trace f2fs_submit_page_mbio event in ra_sum_pages f2fs: avoid overflow when large directory feathure is enabled f2fs: fix recursive lock by f2fs_setxattr MAINTAINERS: add a co-maintainer from samsung for F2FS MAINTAINERS: change the email address for f2fs f2fs: use inode_init_owner() to simplify codes f2fs: avoid to use slab memory in f2fs_issue_flush for efficiency f2fs: add a tracepoint for f2fs_read_data_page f2fs: add a tracepoint for f2fs_write_{meta,node,data}_pages f2fs: add a tracepoint for f2fs_write_{meta,node,data}_page f2fs: add a tracepoint for f2fs_write_end f2fs: add a tracepoint for f2fs_write_begin f2fs: fix checkpatch warning f2fs: deactivate inode page if the inode is evicted f2fs: decrease the lock granularity during write_begin ...	2014-06-09 19:11:44 -07:00
Mel Gorman	2457aec637	mm: non-atomically mark page accessed during page cache allocation where possible aops->write_begin may allocate a new page and make it visible only to have mark_page_accessed called almost immediately after. Once the page is visible the atomic operations are necessary which is noticable overhead when writing to an in-memory filesystem like tmpfs but should also be noticable with fast storage. The objective of the patch is to initialse the accessed information with non-atomic operations before the page is visible. The bulk of filesystems directly or indirectly use grab_cache_page_write_begin or find_or_create_page for the initial allocation of a page cache page. This patch adds an init_page_accessed() helper which behaves like the first call to mark_page_accessed() but may called before the page is visible and can be done non-atomically. The primary APIs of concern in this care are the following and are used by most filesystems. find_get_page find_lock_page find_or_create_page grab_cache_page_nowait grab_cache_page_write_begin All of them are very similar in detail to the patch creates a core helper pagecache_get_page() which takes a flags parameter that affects its behavior such as whether the page should be marked accessed or not. Then old API is preserved but is basically a thin wrapper around this core function. Each of the filesystems are then updated to avoid calling mark_page_accessed when it is known that the VM interfaces have already done the job. There is a slight snag in that the timing of the mark_page_accessed() has now changed so in rare cases it's possible a page gets to the end of the LRU as PageReferenced where as previously it might have been repromoted. This is expected to be rare but it's worth the filesystem people thinking about it in case they see a problem with the timing change. It is also the case that some filesystems may be marking pages accessed that previously did not but it makes sense that filesystems have consistent behaviour in this regard. The test case used to evaulate this is a simple dd of a large file done multiple times with the file deleted on each iterations. The size of the file is 1/10th physical memory to avoid dirty page balancing. In the async case it will be possible that the workload completes without even hitting the disk and will have variable results but highlight the impact of mark_page_accessed for async IO. The sync results are expected to be more stable. The exception is tmpfs where the normal case is for the "IO" to not hit the disk. The test machine was single socket and UMA to avoid any scheduling or NUMA artifacts. Throughput and wall times are presented for sync IO, only wall times are shown for async as the granularity reported by dd and the variability is unsuitable for comparison. As async results were variable do to writback timings, I'm only reporting the maximum figures. The sync results were stable enough to make the mean and stddev uninteresting. The performance results are reported based on a run with no profiling. Profile data is based on a separate run with oprofile running. async dd 3.15.0-rc3 3.15.0-rc3 vanilla accessed-v2 ext3 Max elapsed 13.9900 ( 0.00%) 11.5900 ( 17.16%) tmpfs Max elapsed 0.5100 ( 0.00%) 0.4900 ( 3.92%) btrfs Max elapsed 12.8100 ( 0.00%) 12.7800 ( 0.23%) ext4 Max elapsed 18.6000 ( 0.00%) 13.3400 ( 28.28%) xfs Max elapsed 12.5600 ( 0.00%) 2.0900 ( 83.36%) The XFS figure is a bit strange as it managed to avoid a worst case by sheer luck but the average figures looked reasonable. samples percentage ext3 86107 0.9783 vmlinux-3.15.0-rc4-vanilla mark_page_accessed ext3 23833 0.2710 vmlinux-3.15.0-rc4-accessed-v3r25 mark_page_accessed ext3 5036 0.0573 vmlinux-3.15.0-rc4-accessed-v3r25 init_page_accessed ext4 64566 0.8961 vmlinux-3.15.0-rc4-vanilla mark_page_accessed ext4 5322 0.0713 vmlinux-3.15.0-rc4-accessed-v3r25 mark_page_accessed ext4 2869 0.0384 vmlinux-3.15.0-rc4-accessed-v3r25 init_page_accessed xfs 62126 1.7675 vmlinux-3.15.0-rc4-vanilla mark_page_accessed xfs 1904 0.0554 vmlinux-3.15.0-rc4-accessed-v3r25 init_page_accessed xfs 103 0.0030 vmlinux-3.15.0-rc4-accessed-v3r25 mark_page_accessed btrfs 10655 0.1338 vmlinux-3.15.0-rc4-vanilla mark_page_accessed btrfs 2020 0.0273 vmlinux-3.15.0-rc4-accessed-v3r25 init_page_accessed btrfs 587 0.0079 vmlinux-3.15.0-rc4-accessed-v3r25 mark_page_accessed tmpfs 59562 3.2628 vmlinux-3.15.0-rc4-vanilla mark_page_accessed tmpfs 1210 0.0696 vmlinux-3.15.0-rc4-accessed-v3r25 init_page_accessed tmpfs 94 0.0054 vmlinux-3.15.0-rc4-accessed-v3r25 mark_page_accessed [akpm@linux-foundation.org: don't run init_page_accessed() against an uninitialised pointer] Signed-off-by: Mel Gorman <mgorman@suse.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Jan Kara <jack@suse.cz> Cc: Michal Hocko <mhocko@suse.cz> Cc: Hugh Dickins <hughd@google.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Tested-by: Prabhakar Lad <prabhakar.csengg@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-06-04 16:54:10 -07:00
Jaegeuk Kim	b6fe5873cb	f2fs: fix to recover data written by dio If data are overwritten through dio, previous f2fs doesn't remain the fsync mark due to no additional node writes. Note that this patch should resolve the xfstests:311. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-06-04 18:41:38 +09:00
Chao Yu	bac4eef653	f2fs: avoid crash when trace f2fs_submit_page_mbio event in ra_sum_pages Previously we allocate pages with no mapping in ra_sum_pages(), so we may encounter a crash in event trace of f2fs_submit_page_mbio where we access mapping data of the page. We'd better allocate pages in bd_inode mapping and invalidate these pages after we restore data from pages. It could avoid crash in above scenario. Changes from V1 o remove redundant code in ra_sum_pages() suggested by Jaegeuk Kim. Call Trace: [<f1031630>] ? ftrace_raw_event_f2fs_write_checkpoint+0x80/0x80 [f2fs] [<f10377bb>] f2fs_submit_page_mbio+0x1cb/0x200 [f2fs] [<f103c5da>] restore_node_summary+0x13a/0x280 [f2fs] [<f103e22d>] build_curseg+0x2bd/0x620 [f2fs] [<f104043b>] build_segment_manager+0x1cb/0x920 [f2fs] [<f1032c85>] f2fs_fill_super+0x535/0x8e0 [f2fs] [<c115b66a>] mount_bdev+0x16a/0x1a0 [<f102f63f>] f2fs_mount+0x1f/0x30 [f2fs] [<c115c096>] mount_fs+0x36/0x170 [<c1173635>] vfs_kern_mount+0x55/0xe0 [<c1175388>] do_mount+0x1e8/0x900 [<c1175d72>] SyS_mount+0x82/0xc0 [<c16059cc>] sysenter_do_call+0x12/0x22 Suggested-by: Jaegeuk Kim <jaegeuk.kim@samsung.com> Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-06-04 13:34:30 +09:00
Chao Yu	e574843438	f2fs: add a tracepoint for f2fs_write_{meta,node,data}_pages This patch adds a tracepoint for f2fs_write_{meta,node,data}_pages to trace when pages are fsyncing/flushing. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-05-07 10:21:59 +09:00
Chao Yu	ecda0de343	f2fs: add a tracepoint for f2fs_write_{meta,node,data}_page This patch adds a tracepoint for f2fs_write_{meta,node,data}_page to trace when page is writting out. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-05-07 10:21:59 +09:00
Jaegeuk Kim	54b591dfda	f2fs: split grab_cache_page and wait_on_page_writeback for node pages This patch splits grab_cache_page_write_begin into grab_cache_page and wait_on_page_writeback for node pages. This patch intends to enhance the latency to get node pages by alleviating unnecessary wait_on_page_writeback. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-05-07 10:21:58 +09:00
Jaegeuk Kim	6fb03f3a40	f2fs: adjust free mem size to flush dentry blocks If so many dirty dentry blocks are cached, not reached to the flush condition, we should fall into livelock in balance_dirty_pages. So, let's consider the mem size for the condition. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-05-07 10:21:55 +09:00
Jaegeuk Kim	e8271fa390	f2fs: avoid BUG_ON when mouting corrupted image having garbage blocks If the disk has some garbage blocks, F2FS is able to face with BUG_ON when recovering direct node blocks. This patch detects the error case and avoids that prior to reaching BUG_ON. Alexey Khoroshilov addressed the potential security issues as follows. "An ability to trigger a BUG_ON assert by mounting a crafted image is usually considered as a local denial of service [1-3]. As far as I understand, the reason is that some kernel data may become inconsistent that can lead to further problems. [1] http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2011-3353 [2] http://www.openwall.com/lists/oss-security/2011/06/24/4 [3] http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2011-2928 etc." Reported-by: Andrey Tsyvarev <tsyvarev@ispras.ru> Cc: Alexey Khoroshilov <khoroshilov@ispras.ru> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-05-07 10:21:55 +09:00
Jaegeuk Kim	7ee0eeabcd	f2fs: add available_nids to fix handling max_nid correctly This patch introduces available_nids for alloc_nids() and fixes max_nid for build_free_nids() and scan_nat_pages(). Signed-off-by: Chao Yu <chao2.yu@samsung.com> Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-05-07 10:21:55 +09:00
Chao Yu	94dac22e72	f2fs: introduce raw_nat_from_node_info() to simplfy codes This patch introduce raw_nat_from_node_info() to simplfy some codes, and also use exist function node_info_from_raw_nat() to do the same job. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-05-07 10:21:55 +09:00
Jingoo Han	b156d54241	f2fs: make recover_inline_xattr() static Make recover_inline_xattr() static, because this function is used only in this file. Signed-off-by: Jingoo Han <jg1.han@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-05-07 10:21:54 +09:00
Jaegeuk Kim	76f60268e7	f2fs: call redirty_page_for_writepage This patch replace some general codes with redirty_page_for_writepage, which can be enabled after consideration on additional procedure like counting dirty pages appropriately. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-05-07 10:21:54 +09:00
Chao Yu	2d7b822ad9	f2fs: use list_for_each_entry{_safe} for simplyfying code This patch use list_for_each_entry{_safe} instead of list_for_each{_safe} for simplfying code. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-04-02 09:56:27 +09:00
Chao Yu	cf0ee0f09b	f2fs: avoid free slab cache under spinlock Move kmem_cache_free out of spinlock protection region for better performance. Change log from v1: o remove spinlock protection for kmem_cache_free in destroy_node_manager suggested by Jaegeuk Kim. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-04-02 09:56:12 +09:00
Jaegeuk Kim	3bb5e2c8fe	f2fs: return -EIO when node id is not matched During the cleaing of node segments, F2FS can get errored node blocks due to data race between node page lock and its valid bitmap operations. In that case, it needs to return an error to skip such the obsolete block copy. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-04-01 17:38:26 +09:00
Jaegeuk Kim	479f40c44a	f2fs: skip unnecessary node writes during fsync If multiple redundant fsync calls are triggered, we don't need to write its node pages with fsync mark continuously. So, this patch adds FI_NEED_FSYNC to track whether the latest node block is written with the fsync mark or not. If the mark was set, a new fsync doesn't need to write a node block. Otherwise, we should do a new node block with the mark for roll-forward recovery. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-03-20 22:10:11 +09:00
Jaegeuk Kim	a5f420101d	f2fs: remove unnecessary threshold The NM_WOUT_THRESHOLD is now obsolete since f2fs starts to control on a basis of the memory footprint. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-03-20 22:10:09 +09:00
Jaegeuk Kim	cdfc41c134	f2fs: throttle the memory footprint with a sysfs entry This patch introduces ram_thresh, a sysfs entry, which controls the memory footprint used by the free nid list and the nat cache. Previously, the free nid list was controlled by MAX_FREE_NIDS, while the nat cache was managed by NM_WOUT_THRESHOLD. However, this approach cannot be applied dynamically according to the system. So, this patch adds ram_thresh that users can specify the threshold, which is in order of 1 / 1024. For example, if the total ram size is 4GB and the value is set to 10 by default, f2fs tries to control the number of free nids and nat caches not to consume over 10 * (4GB / 1024) = 10MB. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-03-20 22:10:09 +09:00
Jaegeuk Kim	40bb0058c8	f2fs: avoid to drop nat entries due to the negative nr_shrink The try_to_free_nats should not receive the negative nr_shrink. Otherwise, it can drop all the nat entries by the while loop. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-03-20 22:10:08 +09:00
Jaegeuk Kim	3cb5ad152b	f2fs: call f2fs_wait_on_page_writeback instead of native function If a page is on writeback, f2fs can face with deadlock due to under writepages. This is caused by merging IOs inside f2fs, so if it comes to detect, let's throw merged IOs, which is implemented by f2fs_wait_on_page_writeback. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-03-20 22:10:04 +09:00
Jaegeuk Kim	50c8cdb35a	f2fs: introduce nr_pages_to_write for segment alignment This patch introduces nr_pages_to_write to align page writes to the segment or other operational unit size, which can be tuned according to the system environment. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-03-18 16:37:53 +09:00
Jaegeuk Kim	d3baf95da5	f2fs: increase pages_skipped when skipping writepages This patch increases pages_skipped when skipping writepages. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-03-18 16:37:16 +09:00
Jaegeuk Kim	87d6f89094	f2fs: avoid small data writes by skipping writepages This patch introduces nr_pages_to_skip(sbi, type) to determine writepages can be skipped. The dentry, node, and meta pages can be conrolled by F2FS without breaking the FS consistency. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-03-18 13:58:59 +09:00
Chao Yu	4bc8e9bcf5	f2fs: introduce f2fs_has_xattr_block for better readability This patch introduces a help function f2fs_has_xattr_block for better readability. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-03-18 09:29:46 +09:00
Chao Yu	987c7c3112	f2fs: introduce f2fs_has_inline_xattr for better readability This patch introduces a help function f2fs_has_inline_xattr for better readability. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-03-12 17:23:35 +09:00
Chao Yu	28cdce0459	f2fs: recover inline xattr data in roll-forward process Previously we do not recover inline xattr data of inode after power-cut, so inline xattr data may be lost. We should recover the data during the roll-forward process. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-03-11 16:31:06 +09:00
Gu Zheng	d653788a43	f2fs: optimize restore_node_summary slightly Previously, we ra_sum_pages to pre-read contiguous pages as more as possible, and if we fail to alloc more pages, an ENOMEM error will be reported upstream, even though we have alloced some pages yet. In fact, we can use the available pages to do the job partly, and continue the rest in the following circle. Only reporting ENOMEM upstream if we really can not alloc any available page. And another fix is ignoring dealing with the following pages if an EIO occurs when reading page from page_list. Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Reviewed-by: Chao Yu <chao2.yu@samsung.com> [Jaegeuk Kim: modify the flow for better neat code] Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-03-10 18:45:15 +09:00
Gu Zheng	e8512d2e0c	f2fs: remove the unused ctor argument of f2fs_kmem_cache_create() Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-03-10 18:45:14 +09:00
Gu Zheng	b6ce391e61	f2fs: update start nid only once each circle Integrated a couple of minor changes for better readability suggested by Chao Yu. Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-03-10 18:45:09 +09:00
Chao Yu	9cf3c3898a	f2fs: fix dirty page accounting when redirty We should de-account dirty counters for page when redirty in ->writepage(). Wu Fengguang described in 'commit 971767caf632190f77a40b4011c19948232eed75': "writeback: fix dirtied pages accounting on redirty De-account the accumulative dirty counters on page redirty. Page redirties (very common in ext4) will introduce mismatch between counters (a) and (b) a) NR_DIRTIED, BDI_DIRTIED, tsk->nr_dirtied b) NR_WRITTEN, BDI_WRITTEN This will introduce systematic errors in balanced_rate and result in dirty page position errors (ie. the dirty pages are no longer balanced around the global/bdi setpoints)." Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-02-28 13:09:08 +09:00
Jaegeuk Kim	8a7ed66aaf	f2fs: introduce a radix_tree for the free_nid list This patch introduces a radix tree for the list of free_nids, which enhances the performance on free nid management. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-02-24 16:00:41 +09:00
Gu Zheng	f978f5a061	f2fs: introduce help macro on_build_free_nids() Introduce help macro on_build_free_nids() which just uses build_lock to judge whether the building free nid is going, so that we can remove the on_build_free_nids field from f2fs_sb_info. Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> [Jaegeuk Kim: remove an unnecessary white line removal] Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-02-24 16:00:40 +09:00
Jaegeuk Kim	fffc2a00fc	f2fs: fix to mark the checkpointed nat entry correctly The nat cache entry maintains a status whether it is checkpointed or not. So, if a new cache entry is loaded from the last checkpoint, nat_entry->checkpointed should be true. If the cache entry is modified as being dirty, nat_entry->checkpoint should be false. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-02-24 16:00:40 +09:00
Jaegeuk Kim	b63da15e8b	f2fs: fix the calculation of max_nids Total nids that f2fs can use should not include 0, nid for node inode, and nid for meta inode. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-02-17 14:58:53 +09:00
Chao Yu	662befda25	f2fs: introduce ra_meta_pages to readahead CP/NAT/SIT pages This patch help us to cleanup the readahead code by merging ra_{sit,nat}_pages function into ra_meta_pages. Additionally the new function is used to readahead cp block in recover_orphan_inodes. Change log from v1: o fix a deadloop bug pointed by Jaegeuk Kim. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-02-17 14:58:53 +09:00
Jaegeuk Kim	abb2366c82	f2fs: fix to recover xattr node block If a new xattr node page was allocated and its inode is fsynced, we should recover the xattr node page during the roll-forward process after power-cut. But, previously, f2fs didn't handle that case, resulting in kernel panic as follows reported by Tom Li. BUG: unable to handle kernel paging request at ffffc9001c861a98 IP: [<ffffffffa0295236>] check_index_in_prev_nodes+0x86/0x2d0 [f2fs] Call Trace: [<ffffffff815ece9b>] ? printk+0x48/0x4a [<ffffffffa029626a>] recover_fsync_data+0xdca/0xf50 [f2fs] [<ffffffffa02873ae>] f2fs_fill_super+0x92e/0x970 [f2fs] [<ffffffff8112c9f8>] mount_bdev+0x1b8/0x200 [<ffffffffa0286a80>] ? f2fs_remount+0x130/0x130 [f2fs] [<ffffffffa0285e40>] f2fs_mount+0x10/0x20 [f2fs] [<ffffffff8112d4de>] mount_fs+0x3e/0x1b0 [<ffffffff810ef4eb>] ? __alloc_percpu+0xb/0x10 [<ffffffff8114761f>] vfs_kern_mount+0x6f/0x120 [<ffffffff811497b9>] do_mount+0x259/0xa90 [<ffffffff810ead1d>] ? memdup_user+0x3d/0x80 [<ffffffff810eadb3>] ? strndup_user+0x53/0x70 [<ffffffff8114a2c9>] SyS_mount+0x89/0xd0 [<ffffffff815feae2>] system_call_fastpath+0x16/0x1b This patch adds a recovery function of xattr node pages. Reported-by: Tom Li <biergaizi@members.fsf.org> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-02-17 14:58:52 +09:00
Jaegeuk Kim	bf39c00a9a	f2fs: drop obsolete node page when it is truncated If a node page is trucated, we'd better drop the page in the node_inode's page cache for better memory footprint. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-01-23 08:04:21 +09:00
Jaegeuk Kim	4ef51a8fcc	f2fs: introduce NODE_MAPPING for code consistency This patch adds NODE_MAPPING which is similar as META_MAPPING introduced by Gu Zheng. Cc: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-01-22 18:41:08 +09:00
Gu Zheng	9df27d982d	f2fs: add help function META_MAPPING Introduce help function META_MAPPING() to get the cache meta blocks' address space. Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-01-22 18:41:07 +09:00
Chris Fries	6c311ec6c2	f2fs: clean checkpatch warnings Fixed a variety of trivial checkpatch warnings. The only delta should be some minor formatting on log strings that were split / too long. Signed-off-by: Chris Fries <cfries@motorola.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-01-20 10:27:12 +09:00
Jaegeuk Kim	fb5566da91	f2fs: improve write performance under frequent fsync calls When considering a bunch of data writes with very frequent fsync calls, we are able to think the following performance regression. N: Node IO, D: Data IO, IO scheduler: cfq Issue pending IOs D1 D2 D3 D4 D1 D2 D3 D4 N1 D2 D3 D4 N1 N2 N1 D3 D4 N2 D1 --> N1 can be selected by cfq becase of the same priority of N and D. Then D3 and D4 would be delayed, resuling in performance degradation. So, when processing the fsync call, it'd better give higher priority to data IOs than node IOs by assigning WRITE and WRITE_SYNC respectively. This patch improves the random wirte performance with frequent fsync calls by up to 10%. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-01-08 11:16:20 +09:00
shifei10.ge	a225dca394	f2fs: fix truncate_partial_nodes bug The truncate_partial_nodes puts pages incorrectly in the following two cases. Note that the value for argc 'depth' can only be 2 or 3. Please see truncate_inode_blocks() and truncate_partial_nodes(). 1) An err is occurred in the first 'for' loop When err is occurred with depth = 2, pages[0] is invalid, so this page doesn't need to be put. There is no problem, however, when depth is 3, it doesn't put the pages correctly where pages[0] is valid and pages[1] is invalid. In this case, depth is set to 2 (ref to statemnt depth = i + 1), and then 'goto fail'. In label 'fail', for (i = depth - 3; i >= 0; i--) cannot meet the condition because i = -1, so pages[0] cann't be put. 2) An err happened in the second 'for' loop Now we've got pages[0] with depth = 2, or we've got pages[0] and pages[1] with depth = 3. When an err is detected, we need 'goto fail' to put such the pages. When depth is 2, in label 'fail', for (i = depth - 3; i >= 0; i--) cann't meet the condition because i = -1, so pages[0] cann't be put. When depth is 3, in label 'fail', for (i = depth - 3; i >= 0; i--) can only put pages[0], pages[1] also cann't be put. Note that 'depth' has been changed before first 'goto fail' (ref to statemnt depth = i + 1), so passing this modified 'depth' to the tracepoint, trace_f2fs_truncate_partial_nodes, is also incorrect. Signed-off-by: Shifei Ge <shifei10.ge@samsung.com> [Jaegeuk Kim: modify the description and fix one bug] Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-01-06 16:42:21 +09:00
Jaegeuk Kim	58bfaf44df	f2fs: introduce F2FS_INODE macro to get f2fs_inode This patch introduces F2FS_INODE that returns struct f2fs_inode * from the inode page. By using this macro, we can remove unnecessary casting codes like below. struct f2fs_inode ri = &F2FS_NODE(inode_page)->i; -> struct f2fs_inode ri = F2FS_INODE(inode_page); Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-12-26 20:32:48 +09:00
Chao Yu	4f4124d0b9	f2fs: update several comments Update several comments: 1. use f2fs_{un}lock_op install of mutex_{un}lock_op. 2. update comment of get_data_block(). 3. update description of node offset. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-12-23 10:26:03 +09:00

... 4 5 6 7 8 ...

632 Commits