mirror_ubuntu-kernels/fs
Sidhartha Kumar a08c7193e4 mm/filemap: remove hugetlb special casing in filemap.c
Remove special cased hugetlb handling code within the page cache by
changing the granularity of ->index to the base page size rather than the
huge page size.  The motivation of this patch is to reduce complexity
within the filemap code while also increasing performance by removing
branches that are evaluated on every page cache lookup.

To support the change in index, new wrappers for hugetlb page cache
interactions are added.  These wrappers perform the conversion to a linear
index which is now expected by the page cache for huge pages.

========================= PERFORMANCE ======================================

Perf was used to check the performance differences after the patch. 
Overall the performance is similar to mainline with a very small larger
overhead that occurs in __filemap_add_folio() and
hugetlb_add_to_page_cache().  This is because of the larger overhead that
occurs in xa_load() and xa_store() as the xarray is now using more entries
to store hugetlb folios in the page cache.

Timing

aarch64
    2MB Page Size
        6.5-rc3 + this patch:
            [root@sidhakum-ol9-1 hugepages]# time fallocate -l 700GB test.txt
            real    1m49.568s
            user    0m0.000s
            sys     1m49.461s

        6.5-rc3:
            [root]# time fallocate -l 700GB test.txt
            real    1m47.495s
            user    0m0.000s
            sys     1m47.370s
    1GB Page Size
        6.5-rc3 + this patch:
            [root@sidhakum-ol9-1 hugepages1G]# time fallocate -l 700GB test.txt
            real    1m47.024s
            user    0m0.000s
            sys     1m46.921s

        6.5-rc3:
            [root@sidhakum-ol9-1 hugepages1G]# time fallocate -l 700GB test.txt
            real    1m44.551s
            user    0m0.000s
            sys     1m44.438s

x86
    2MB Page Size
        6.5-rc3 + this patch:
            [root@sidhakum-ol9-2 hugepages]# time fallocate -l 100GB test.txt
            real    0m22.383s
            user    0m0.000s
            sys     0m22.255s

        6.5-rc3:
            [opc@sidhakum-ol9-2 hugepages]$ time sudo fallocate -l 100GB /dev/hugepages/test.txt
            real    0m22.735s
            user    0m0.038s
            sys     0m22.567s

    1GB Page Size
        6.5-rc3 + this patch:
            [root@sidhakum-ol9-2 hugepages1GB]# time fallocate -l 100GB test.txt
            real    0m25.786s
            user    0m0.001s
            sys     0m25.589s

        6.5-rc3:
            [root@sidhakum-ol9-2 hugepages1G]# time fallocate -l 100GB test.txt
            real    0m33.454s
            user    0m0.001s
            sys     0m33.193s

aarch64:
    workload - fallocate a 700GB file backed by huge pages

    6.5-rc3 + this patch:
        2MB Page Size:
            --100.00%--__arm64_sys_fallocate
                          ksys_fallocate
                          vfs_fallocate
                          hugetlbfs_fallocate
                          |
                          |--95.04%--__pi_clear_page
                          |
                          |--3.57%--clear_huge_page
                          |          |
                          |          |--2.63%--rcu_all_qs
                          |          |
                          |           --0.91%--__cond_resched
                          |
                           --0.67%--__cond_resched
            0.17%     0.00%             0  fallocate  [kernel.vmlinux]       [k] hugetlb_add_to_page_cache
            0.14%     0.10%            11  fallocate  [kernel.vmlinux]       [k] __filemap_add_folio

    6.5-rc3
        2MB Page Size:
                --100.00%--__arm64_sys_fallocate
                          ksys_fallocate
                          vfs_fallocate
                          hugetlbfs_fallocate
                          |
                          |--94.91%--__pi_clear_page
                          |
                          |--4.11%--clear_huge_page
                          |          |
                          |          |--3.00%--rcu_all_qs
                          |          |
                          |           --1.10%--__cond_resched
                          |
                           --0.59%--__cond_resched
            0.08%     0.01%             1  fallocate  [kernel.kallsyms]  [k] hugetlb_add_to_page_cache
            0.05%     0.03%             3  fallocate  [kernel.kallsyms]  [k] __filemap_add_folio

x86
    workload - fallocate a 100GB file backed by huge pages

    6.5-rc3 + this patch:
        2MB Page Size:
            hugetlbfs_fallocate
            |
            --99.57%--clear_huge_page
                |
                --98.47%--clear_page_erms
                    |
                    --0.53%--asm_sysvec_apic_timer_interrupt

            0.04%     0.04%             1  fallocate  [kernel.kallsyms]     [k] xa_load
            0.04%     0.00%             0  fallocate  [kernel.kallsyms]     [k] hugetlb_add_to_page_cache
            0.04%     0.00%             0  fallocate  [kernel.kallsyms]     [k] __filemap_add_folio
            0.04%     0.00%             0  fallocate  [kernel.kallsyms]     [k] xas_store

    6.5-rc3
        2MB Page Size:
                --99.93%--__x64_sys_fallocate
                          vfs_fallocate
                          hugetlbfs_fallocate
                          |
                           --99.38%--clear_huge_page
                                     |
                                     |--98.40%--clear_page_erms
                                     |
                                      --0.59%--__cond_resched
            0.03%     0.03%             1  fallocate  [kernel.kallsyms]  [k] __filemap_add_folio

========================= TESTING ======================================

This patch passes libhugetlbfs tests and LTP hugetlb tests

********** TEST SUMMARY
*                      2M
*                      32-bit 64-bit
*     Total testcases:   110    113
*             Skipped:     0      0
*                PASS:   107    113
*                FAIL:     0      0
*    Killed by signal:     3      0
*   Bad configuration:     0      0
*       Expected FAIL:     0      0
*     Unexpected PASS:     0      0
*    Test not present:     0      0
* Strange test result:     0      0
**********

    Done executing testcases.
    LTP Version:  20220527-178-g2761a81c4

page migration was also tested using Mike Kravetz's test program.[8]

[dan.carpenter@linaro.org: fix an NULL vs IS_ERR() bug]
  Link: https://lkml.kernel.org/r/1772c296-1417-486f-8eef-171af2192681@moroto.mountain
Link: https://lkml.kernel.org/r/20230926192017.98183-1-sidhartha.kumar@oracle.com
Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reported-and-tested-by: syzbot+c225dea486da4d5592bd@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=c225dea486da4d5592bd
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-16 15:44:38 -07:00
..
9p - Some swap cleanups from Ma Wupeng ("fix WARN_ON in add_to_avail_list") 2023-08-29 14:25:26 -07:00
adfs for-6.6/block-2023-08-28 2023-08-29 20:21:42 -07:00
affs for-6.6/block-2023-08-28 2023-08-29 20:21:42 -07:00
afs - Some swap cleanups from Ma Wupeng ("fix WARN_ON in add_to_avail_list") 2023-08-29 14:25:26 -07:00
autofs v6.6-vfs.autofs 2023-08-28 11:39:14 -07:00
befs for-6.6/block-2023-08-28 2023-08-29 20:21:42 -07:00
bfs for-6.6/block-2023-08-28 2023-08-29 20:21:42 -07:00
btrfs fs: super: dynamically allocate the s_shrink 2023-10-04 10:32:26 -07:00
cachefiles - Some swap cleanups from Ma Wupeng ("fix WARN_ON in add_to_avail_list") 2023-08-29 14:25:26 -07:00
ceph ceph: remove unnecessary check for NULL in parse_longname() 2023-09-18 12:04:50 +02:00
coda v6.6-vfs.ctime 2023-08-28 09:31:32 -07:00
configfs configfs: convert to ctime accessor functions 2023-07-13 10:28:05 +02:00
cramfs v6.6-vfs.super 2023-08-28 11:04:18 -07:00
crypto
debugfs Char/Misc driver changes for 6.6-rc1 2023-09-01 09:53:54 -07:00
devpts v6.6-vfs.misc 2023-08-28 10:17:14 -07:00
dlm dlm: fix plock lookup when using multiple lockspaces 2023-08-25 10:31:39 -05:00
ecryptfs v6.6-vfs.misc 2023-08-28 10:17:14 -07:00
efivarfs efivarfs: fix statfs() on efivarfs 2023-09-11 09:10:02 +00:00
efs for-6.6/block-2023-08-28 2023-08-29 20:21:42 -07:00
erofs erofs: dynamically allocate the erofs-shrinker 2023-10-04 10:32:23 -07:00
exfat for-6.6/block-2023-08-28 2023-08-29 20:21:42 -07:00
exportfs exportfs: remove kernel-doc warnings in exportfs 2023-08-29 17:45:22 -04:00
ext2 \n 2023-08-30 12:10:50 -07:00
ext4 ext4: call bdev_getblk() from sb_getblk_gfp() 2023-10-04 10:32:29 -07:00
f2fs f2fs: dynamically allocate the f2fs-shrinker 2023-10-04 10:32:23 -07:00
fat for-6.6/block-2023-08-28 2023-08-29 20:21:42 -07:00
freevxfs for-6.6/block-2023-08-28 2023-08-29 20:21:42 -07:00
fscache
fuse fuse update for 6.6 2023-09-05 12:45:55 -07:00
gfs2 gfs2: dynamically allocate the gfs2-qd shrinker 2023-10-04 10:32:23 -07:00
hfs for-6.6/block-2023-08-28 2023-08-29 20:21:42 -07:00
hfsplus for-6.6/block-2023-08-28 2023-08-29 20:21:42 -07:00
hostfs hostfs: convert to ctime accessor functions 2023-07-24 10:30:00 +02:00
hpfs for-6.6/block-2023-08-28 2023-08-29 20:21:42 -07:00
hugetlbfs mm/filemap: remove hugetlb special casing in filemap.c 2023-10-16 15:44:38 -07:00
iomap iomap: Spelling s/preceeding/preceding/g 2023-09-28 09:26:58 -07:00
isofs for-6.6/block-2023-08-28 2023-08-29 20:21:42 -07:00
jbd2 jbd2,ext4: dynamically allocate the jbd2-journal shrinker 2023-10-04 10:32:25 -07:00
jffs2 jffs2: convert to ctime accessor functions 2023-07-24 10:30:01 +02:00
jfs A few small fixes 2023-08-31 15:25:01 -07:00
kernfs fs: super: dynamically allocate the s_shrink 2023-10-04 10:32:26 -07:00
lockd SUNRPC: Add enum svc_auth_status 2023-08-29 17:45:22 -04:00
minix for-6.6/block-2023-08-28 2023-08-29 20:21:42 -07:00
netfs netfs: Only call folio_start_fscache() one time for each folio 2023-09-18 12:03:46 -07:00
nfs nfs: dynamically allocate the nfs-acl shrinker 2023-10-04 10:32:24 -07:00
nfs_common
nfsd nfsd: dynamically allocate the nfsd-reply shrinker 2023-10-04 10:32:25 -07:00
nilfs2 nilfs2: fix potential use after free in nilfs_gccache_submit_read_data() 2023-09-29 17:20:46 -07:00
nls nls: Hide new NLS_UCS2_UTILS 2023-08-31 12:07:34 -05:00
notify dnotify: Pass argument of fcntl_dirnotify as int 2023-07-10 14:36:12 +02:00
ntfs for-6.6/block-2023-08-28 2023-08-29 20:21:42 -07:00
ntfs3 ntfs3: put resources during ntfs_fill_super() 2023-09-25 14:12:42 +02:00
ocfs2 Many ext4 and jbd2 cleanups and bug fixes for v6.6-rc1. 2023-08-31 15:18:15 -07:00
omfs for-6.6/block-2023-08-28 2023-08-29 20:21:42 -07:00
openpromfs openpromfs: convert to ctime accessor functions 2023-07-24 10:30:03 +02:00
orangefs fs: drop the timespec64 argument from update_time 2023-08-11 09:04:57 +02:00
overlayfs v6.6-rc4.vfs.fixes 2023-09-26 08:50:30 -07:00
proc fs: super: dynamically allocate the s_shrink 2023-10-04 10:32:26 -07:00
pstore pstore fix for v6.6-rc1 2023-09-02 10:45:17 -07:00
qnx4 for-6.6/block-2023-08-28 2023-08-29 20:21:42 -07:00
qnx6 for-6.6/block-2023-08-28 2023-08-29 20:21:42 -07:00
quota quota: dynamically allocate the dquota-cache shrinker 2023-10-04 10:32:24 -07:00
ramfs ramfs: convert to ctime accessor functions 2023-07-24 10:30:04 +02:00
reiserfs reiserfs: Replace 1-element array with C99 style flex-array 2023-09-11 14:07:46 +02:00
romfs for-6.6/block-2023-08-28 2023-08-29 20:21:42 -07:00
smb smb3 client fix for password freeing potential oops 2023-09-30 09:39:23 -07:00
squashfs squashfs: convert to ctime accessor functions 2023-07-24 10:30:05 +02:00
sysfs sysfs: Skip empty folders creation 2023-06-15 13:37:53 +02:00
sysv for-6.6/block-2023-08-28 2023-08-29 20:21:42 -07:00
tracefs eventfs: Test for dentries array allocated in eventfs_release() 2023-09-30 16:26:04 -04:00
ubifs ubifs: dynamically allocate the ubifs-slab shrinker 2023-10-04 10:32:24 -07:00
udf \n 2023-08-30 12:10:50 -07:00
ufs for-6.6/block-2023-08-28 2023-08-29 20:21:42 -07:00
unicode
vboxsf v6.6-vfs.ctime 2023-08-28 09:31:32 -07:00
verity fsverity: skip PKCS#7 parser when keyring is empty 2023-08-20 10:33:43 -07:00
xfs xfs: dynamically allocate the xfs-qm shrinker 2023-10-04 10:32:26 -07:00
zonefs New code for 6.6: 2023-08-28 11:59:52 -07:00
aio.c aio: Annotate struct kioctx_table with __counted_by 2023-09-20 14:22:01 +02:00
anon_inodes.c
attr.c v6.6-vfs.misc 2023-08-28 10:17:14 -07:00
bad_inode.c fs: drop the timespec64 argument from update_time 2023-08-11 09:04:57 +02:00
binfmt_elf_fdpic.c fs: binfmt_elf_efpic: fix personality for ELF-FDPIC 2023-09-29 17:20:45 -07:00
binfmt_elf_test.c
binfmt_elf.c Merge branch 'expand-stack' 2023-06-28 20:35:21 -07:00
binfmt_flat.c
binfmt_misc.c fs: convert to ctime accessor functions 2023-07-13 10:28:04 +02:00
binfmt_script.c
buffer.c buffer: remove __getblk_gfp() 2023-10-04 10:32:29 -07:00
char_dev.c
compat_binfmt_elf.c
coredump.c v6.5/vfs.misc 2023-06-26 09:50:21 -07:00
d_path.c
dax.c mm: convert DAX lock/unlock page to lock/unlock folio 2023-10-04 10:32:20 -07:00
dcache.c fs/dcache: Replace printk and WARN_ON by WARN 2023-08-19 13:41:11 +02:00
direct-io.c - Yosry Ahmed brought back some cgroup v1 stats in OOM logs. 2023-06-28 10:28:11 -07:00
drop_caches.c fs: drop_caches: draining pages before dropping caches 2023-08-18 10:12:11 -07:00
eventfd.c eventfd: prevent underflow for eventfd semaphores 2023-07-11 11:41:34 +02:00
eventpoll.c epoll: simplify ep_alloc() 2023-07-26 14:56:07 +02:00
exec.c mm/mremap: allow moves within the same VMA for stack moves 2023-10-04 10:32:20 -07:00
fcntl.c fcntl: Cast commands with int args explicitly 2023-07-10 14:36:11 +02:00
fhandle.c
file_table.c fs: use __fput_sync in close(2) 2023-08-08 19:36:51 +02:00
file.c v6.6-vfs.misc 2023-08-28 10:17:14 -07:00
filesystems.c
fs_context.c v6.6-vfs.misc 2023-08-28 10:17:14 -07:00
fs_parser.c
fs_pin.c
fs_struct.c kill do_each_thread() 2023-08-21 13:46:25 -07:00
fs_types.c
fs-writeback.c fs-writeback: do not requeue a clean inode having skipped pages 2023-09-20 14:22:01 +02:00
fsopen.c fs: add FSCONFIG_CMD_CREATE_EXCL 2023-08-14 18:48:02 +02:00
init.c
inode.c Revert "fs: add infrastructure for multigrain timestamps" 2023-09-20 18:05:31 +02:00
internal.h for-6.6/block-2023-08-28 2023-08-29 20:21:42 -07:00
ioctl.c v6.6-vfs.super 2023-08-28 11:04:18 -07:00
Kconfig for-6.6/block-2023-08-28 2023-08-29 20:21:42 -07:00
Kconfig.binfmt riscv: support the elf-fdpic binfmt loader 2023-08-23 14:17:43 -07:00
kernel_read_file.c fs: Fix kernel-doc warnings 2023-08-19 12:12:12 +02:00
libfs.c direct_write_fallback(): on error revert the ->ki_pos update from buffered write 2023-09-20 14:22:01 +02:00
locks.c NFSD 6.6 Release Notes 2023-08-31 15:32:18 -07:00
Makefile fs: add CONFIG_BUFFER_HEAD 2023-08-02 09:13:09 -06:00
mbcache.c mbcache: dynamically allocate the mbcache shrinker 2023-10-04 10:32:25 -07:00
mnt_idmapping.c
mount.h
mpage.c
namei.c fs: Fix kernel-doc warnings 2023-08-19 12:12:12 +02:00
namespace.c v6.5/vfs.mount 2023-06-26 10:27:04 -07:00
nsfs.c fs: convert to ctime accessor functions 2023-07-13 10:28:04 +02:00
open.c v6.6-vfs.fchmodat2 2023-08-28 11:25:27 -07:00
pipe.c fs/pipe: remove duplicate "offset" initializer 2023-09-20 14:22:01 +02:00
pnode.c
pnode.h
posix_acl.c fs: convert to ctime accessor functions 2023-07-13 10:28:04 +02:00
proc_namespace.c
read_write.c fs: Fix one kernel-doc comment 2023-08-15 08:32:45 +02:00
readdir.c vfs: get rid of old '->iterate' directory operation 2023-08-06 15:08:35 +02:00
remap_range.c
select.c
seq_file.c
signalfd.c
splice.c - Some swap cleanups from Ma Wupeng ("fix WARN_ON in add_to_avail_list") 2023-08-29 14:25:26 -07:00
stack.c fs: convert to ctime accessor functions 2023-07-13 10:28:04 +02:00
stat.c Revert "fs: add infrastructure for multigrain timestamps" 2023-09-20 18:05:31 +02:00
statfs.c
super.c mm: shrinker: convert shrinker_rwsem to mutex 2023-10-04 10:32:26 -07:00
sync.c
sysctls.c
timerfd.c
userfaultfd.c mm: userfaultfd: remove stale comment about core dump locking 2023-08-24 16:20:27 -07:00
utimes.c
xattr.c tmpfs,xattr: GFP_KERNEL_ACCOUNT for simple xattrs 2023-08-22 10:57:46 +02:00