mirror_ubuntu-kernels/mm
Yosry Ahmed 583c27a167 mm: vmscan: ignore non-LRU-based reclaim in memcg reclaim
Patch series "Ignore non-LRU-based reclaim in memcg reclaim", v6.

Upon running some proactive reclaim tests using memory.reclaim, we noticed
some tests flaking where writing to memory.reclaim would succeed even
though we did not fully reclaim the requested amount.  Looking further
into it, I discovered that *sometimes* we overestimate the number of
reclaimed pages in memcg reclaim.

Pages reclaimed through means other than LRU-based reclaim are tracked
through reclaim_state in struct scan_control, which is stashed in the
current task_struct.  These pages are added to the number of pages
reclaimed through the LRUs.  For memcg reclaim, these pages generally
cannot be linked to the memcg under reclaim and can inflate the count of
reclaimed pages.  This short series tries to address that.

Patch 1 ignores pages reclaimed outside of LRU reclaim in memcg reclaim. 
The pages are uncharged anyway, so even if we end up under-reporting
reclaimed pages we will still succeed in making progress during charging.

Patches 2-3 are just refactoring.  Patch 2 moves the set_reclaim_state()
helper next to flush_reclaim_state().  Patch 3 adds a helper that wraps
updating current->reclaim_state, and renames reclaim_state->reclaimed_slab
to reclaim_state->reclaimed.


This patch (of 3):

We keep track of different types of reclaimed pages through
reclaim_state->reclaimed_slab, and we add them to the reported number of
reclaimed pages.  For non-memcg reclaim, this makes sense.  For memcg
reclaim, we have no way of knowing whether those pages are charged to the
memcg under reclaim.

Slab pages are shared by different memcgs, so a freed slab page may have
only been partially charged to the memcg under reclaim.  The same goes for
clean file pages from pruned inodes (on highmem systems) or xfs buffer
pages: there is currently no simple way to link them to the memcg under
reclaim.

Stop reporting those freed pages as reclaimed pages during memcg reclaim. 
This should make the return value of writing to memory.reclaim more
accurate, and may help reduce unnecessary reclaim retries during memcg
charging.  Writing to memory.reclaim on the root memcg is treated as
cgroup_reclaim(), but in that case we want to include any freed pages, so
use the global_reclaim() check instead of !cgroup_reclaim().
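The accounting rule described above can be illustrated with a small user-space model. This is a sketch, not the patch's code: struct scan_control, struct reclaim_state and struct mem_cgroup are reduced to hypothetical stand-ins, and the `is_root` field is an assumption standing in for the kernel's root-memcg test. It shows why `!cgroup_reclaim()` would wrongly exclude the root memcg while `global_reclaim()` includes it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct mem_cgroup {
	bool is_root;			/* assumption: models a root-memcg check */
};

struct reclaim_state {
	unsigned long reclaimed_slab;	/* pages freed outside LRU reclaim */
};

struct scan_control {
	struct mem_cgroup *target_mem_cgroup;	/* NULL for global reclaim */
	unsigned long nr_reclaimed;		/* pages reclaimed from the LRUs */
};

/* True whenever reclaim targets a memcg, including the root memcg. */
static bool cgroup_reclaim(const struct scan_control *sc)
{
	return sc->target_mem_cgroup != NULL;
}

/* True for reclaim without a target memcg, or targeting the root memcg. */
static bool global_reclaim(const struct scan_control *sc)
{
	return sc->target_mem_cgroup == NULL || sc->target_mem_cgroup->is_root;
}

/*
 * Fold pages freed outside LRU reclaim into nr_reclaimed only for global
 * reclaim; for a non-root memcg they cannot be attributed to it, so in
 * this model the counter is simply left untouched.
 */
static void flush_reclaim_state(struct scan_control *sc,
				struct reclaim_state *rs)
{
	assert(sc != NULL);
	if (rs != NULL && global_reclaim(sc)) {
		sc->nr_reclaimed += rs->reclaimed_slab;
		rs->reclaimed_slab = 0;
	}
}
```

With a non-root memcg as the target, 50 slab pages freed alongside 100 LRU pages leave `nr_reclaimed` at 100. With the root memcg as the target, `cgroup_reclaim()` is true (so a `!cgroup_reclaim()` check would drop the slab pages), but `global_reclaim()` lets them be counted, giving 150.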

Generally, this should make the return value of
try_to_free_mem_cgroup_pages() more accurate.  In some limited cases (e.g.
when a freed slab page was mostly charged to the memcg under reclaim), the
return value of try_to_free_mem_cgroup_pages() can be underestimated, but
this should be fine.  The freed pages will be uncharged anyway, and we can
charge the memcg the next time around, since memcg reclaim usually runs in
a retry loop.
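Why an underestimate is tolerable can be sketched with a toy charge loop. Everything here is hypothetical (`struct memcg_model`, `model_reclaim()`, `model_try_charge()` and the retry count are illustrative names, not kernel APIs). The model assumes the charge path re-checks the available margin after each reclaim pass instead of relying solely on the reported count. Slab pages freed by reclaim lower usage but are not reported, and the next iteration still succeeds.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one memcg's charge state, in pages. */
struct memcg_model {
	unsigned long usage;
	unsigned long limit;
	unsigned long freeable_lru;	/* LRU pages reclaim can free and report */
	unsigned long freeable_slab;	/* shrinker-freed pages: uncharged, unreported */
};

static unsigned long model_margin(const struct memcg_model *m)
{
	return m->usage < m->limit ? m->limit - m->usage : 0;
}

/*
 * Hypothetical reclaim pass: frees up to 'want' LRU pages and reports
 * them; also frees slab pages, which lower usage but are not counted in
 * the return value (mirroring memcg reclaim after this patch).
 */
static unsigned long model_reclaim(struct memcg_model *m, unsigned long want)
{
	unsigned long lru = m->freeable_lru < want ? m->freeable_lru : want;
	unsigned long slab;

	assert(lru <= m->usage);
	m->freeable_lru -= lru;
	m->usage -= lru;

	slab = m->freeable_slab < m->usage ? m->freeable_slab : m->usage;
	m->freeable_slab -= slab;
	m->usage -= slab;	/* uncharged ... */

	return lru;		/* ... but not reported */
}

/*
 * Retry loop: after each reclaim pass, re-check the margin rather than
 * trusting the reported count, so under-reporting does not block progress.
 */
static bool model_try_charge(struct memcg_model *m, unsigned long nr_pages,
			     int max_retries)
{
	for (int retries = max_retries; ; retries--) {
		if (model_margin(m) >= nr_pages) {
			m->usage += nr_pages;
			return true;
		}
		if (retries == 0)
			return false;
		(void)model_reclaim(m, nr_pages);
	}
}
```

For a memcg at its limit of 100 pages with 8 freeable slab pages and no freeable LRU pages, `model_reclaim()` reports 0, yet `model_try_charge(&m, 4, 2)` succeeds on the next iteration because usage has dropped to 92.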

Link: https://lkml.kernel.org/r/20230413104034.1086717-1-yosryahmed@google.com
Link: https://lkml.kernel.org/r/20230413104034.1086717-2-yosryahmed@google.com
Fixes: f2fe7b09a5 ("mm: memcg/slab: charge individual slab objects instead of pages")
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Lameter <cl@linux.com>
Cc: Darrick J. Wong <djwong@kernel.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: NeilBrown <neilb@suse.de>
Cc: Peter Xu <peterx@redhat.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18 16:30:10 -07:00
damon mm/damon/sysfs: make more kobj_type structures constant 2023-04-05 19:42:59 -07:00
kasan kasan: suppress recursive reports for HW_TAGS 2023-04-05 19:42:43 -07:00
kfence mm: kfence: improve the performance of __kfence_alloc() and __kfence_free() 2023-04-18 16:29:49 -07:00
kmsan kmsan: fix a stale comment in kmsan_save_stack_with_flags() 2023-04-18 16:29:47 -07:00
backing-dev.c mm: backing-dev: set variables dev_attr_min,max_bytes storage-class-specifier to static 2023-04-18 16:29:56 -07:00
balloon_compaction.c mm: Convert all PageMovable users to movable_operations 2022-08-02 12:34:03 -04:00
bootmem_info.c bootmem: remove the vmemmap pages from kmemleak in put_page_bootmem 2022-08-28 14:02:45 -07:00
cma_debug.c mm/cma_debug: show complete cma name in debugfs directories 2022-09-11 20:25:50 -07:00
cma_sysfs.c mm: cma: make kobj_type structure constant 2023-03-28 16:20:06 -07:00
cma.c mm: move most of core MM initialization to mm/mm_init.c 2023-04-05 19:42:52 -07:00
cma.h mm/cma: provide option to opt out from exposing pages on activation failure 2022-03-22 15:57:09 -07:00
compaction.c mm: compaction: fix the possible deadlock when isolating hugetlb pages 2023-04-05 19:42:50 -07:00
debug_page_ref.c
debug_vm_pgtable.c mm, treewide: redefine MAX_ORDER sanely 2023-04-05 19:42:46 -07:00
debug.c mm/debug: use %pGt to display page_type in dump_page() 2023-03-28 16:20:09 -07:00
dmapool_test.c dmapool: add alloc/free performance test 2023-04-05 19:42:38 -07:00
dmapool.c dmapool: create/destroy cleanup 2023-04-05 19:42:41 -07:00
early_ioremap.c mm/early_ioremap: declare early_memremap_pgprot_adjust() 2022-03-22 15:57:11 -07:00
fadvise.c mm: support POSIX_FADV_NOREUSE 2023-01-18 17:12:57 -08:00
failslab.c mm: fix unexpected changes to {failslab|fail_page_alloc}.attr 2022-11-22 18:50:44 -08:00
filemap.c mm: return an ERR_PTR from __filemap_get_folio 2023-04-05 19:42:42 -07:00
folio-compat.c mm: return an ERR_PTR from __filemap_get_folio 2023-04-05 19:42:42 -07:00
frontswap.c frontswap: don't call ->init if no ops are registered 2022-09-26 12:14:34 -07:00
gup_test.c mm/gup_test: free memory allocated via kvcalloc() using kvfree() 2022-12-15 16:37:48 -08:00
gup_test.h mm/gup_test: start/stop/read functionality for PIN LONGTERM test 2022-11-08 17:37:15 -08:00
gup.c mm/gup.c: fix typo in comments 2023-03-28 16:20:14 -07:00
highmem.c highmem: fix kmap_to_page() for kmap_local_page() addresses 2022-10-12 18:51:51 -07:00
hmm.c mm/hugetlb: make walk_hugetlb_range() safe to pmd unshare 2023-01-18 17:12:39 -08:00
huge_memory.c mm/huge_memory: conditionally call maybe_mkwrite() and drop pte_wrprotect() in __split_huge_pmd_locked() 2023-04-18 16:30:01 -07:00
hugetlb_cgroup.c mm/hugetlb: increase use of folios in alloc_huge_page() 2023-02-13 15:54:27 -08:00
hugetlb_vmemmap.c mm, page_alloc: use check_pages_enabled static key to check tail pages 2023-04-18 16:29:54 -07:00
hugetlb_vmemmap.h mm: hugetlb_vmemmap: improve hugetlb_vmemmap code readability 2022-08-08 18:06:43 -07:00
hugetlb.c mm: hwpoison: support recovery from HugePage copy-on-write faults 2023-04-18 16:30:09 -07:00
hwpoison-inject.c mm/hwpoison: add __init/__exit annotations to module init/exit funcs 2022-10-03 14:03:05 -07:00
init-mm.c mm: add per-VMA lock and helper functions to control it 2023-04-05 20:02:57 -07:00
internal.h mm: apply __must_check to vmap_pages_range_noflush() 2023-04-18 16:30:10 -07:00
interval_tree.c
io-mapping.c
ioremap.c mm: ioremap: Add ioremap/iounmap_allowed() 2022-06-27 12:22:31 +01:00
Kconfig mm/hugetlb_vmemmap: rename ARCH_WANT_HUGETLB_PAGE_OPTIMIZE_VMEMMAP 2023-04-18 16:30:09 -07:00
Kconfig.debug kmemleak-test: fix kmemleak_test.c build logic 2023-04-18 16:29:47 -07:00
khugepaged.c mm/khugepaged: maintain page cache uptodate flag 2023-04-18 16:29:52 -07:00
kmemleak.c lib/stackdepot, mm: rename stack_depot_want_early_init 2023-02-16 20:43:49 -08:00
ksm.c mm: add tracepoints to ksm 2023-03-28 16:20:08 -07:00
list_lru.c mm: kmem: make mem_cgroup_from_obj() vmalloc()-safe 2022-06-16 19:48:31 -07:00
maccess.c maccess: Fix writing offset in case of fault in strncpy_from_kernel_nofault() 2022-11-11 11:44:46 -08:00
madvise.c mm/madvise: use vma_lookup() instead of find_vma() 2023-04-18 16:29:52 -07:00
Makefile dmapool: add alloc/free performance test 2023-04-05 19:42:38 -07:00
mapping_dirty_helpers.c mm/mmu_notifier: remove unused mmu_notifier_range_update_to_read_only export 2023-02-02 22:32:54 -08:00
memblock.c mm: avoid passing 0 to __ffs() 2023-04-18 16:29:42 -07:00
memcontrol.c memcg: page_cgroup_ino() get memcg from the page's folio 2023-04-18 16:30:09 -07:00
memfd.c mm/memfd: add write seals when apply SEAL_EXEC to executable memfd 2023-01-18 17:12:37 -08:00
memory_hotplug.c mm: avoid passing 0 to __ffs() 2023-04-18 16:29:42 -07:00
memory-failure.c mm: memory-failure: directly use IS_ENABLED(CONFIG_HWPOISON_INJECT) 2023-03-28 16:20:17 -07:00
memory-tiers.c memory tier: release the new_memtier in find_create_memory_tier() 2023-02-09 16:51:40 -08:00
memory.c mm: hwpoison: support recovery from HugePage copy-on-write faults 2023-04-18 16:30:09 -07:00
mempolicy.c mm/mempolicy: fix use-after-free of VMA iterator 2023-04-16 10:41:25 -07:00
mempool.c mempool: do not use ksize() for poisoning 2022-11-30 15:58:41 -08:00
memremap.c mm/memremap.c: fix outdated comment in devm_memremap_pages 2023-02-09 16:51:46 -08:00
memtest.c mm/memtest: add results of early memtest to /proc/meminfo 2023-04-05 19:42:55 -07:00
migrate_device.c mm: change to return bool for isolate_lru_page() 2023-02-20 12:46:17 -08:00
migrate.c mm/migrate: revert "mm/migrate: fix wrongly apply write bit after mkdirty on sparc64" 2023-04-18 16:30:01 -07:00
mincore.c mm: return an ERR_PTR from __filemap_get_folio 2023-04-05 19:42:42 -07:00
mlock.c mm: mlock: use folios_put() in mlock_folio_batch() 2023-04-18 16:29:53 -07:00
mm_init.c mm/vmemmap/devdax: fix kernel crash when probing devdax devices 2023-04-18 16:30:09 -07:00
mm_slot.h mm: introduce common struct mm_slot 2022-10-03 14:02:43 -07:00
mmap_lock.c
mmap.c sync mm-stable with mm-hotfixes-stable to pick up depended-upon upstream changes 2023-04-18 14:53:49 -07:00
mmu_gather.c mm: prefer xxx_page() alloc/free functions for order-0 pages 2023-03-28 16:20:16 -07:00
mmu_notifier.c mm/mmu_notifier: remove unused mmu_notifier_range_update_to_read_only export 2023-02-02 22:32:54 -08:00
mmzone.c mm: multi-gen LRU: groundwork 2022-09-26 19:46:09 -07:00
mprotect.c mm/userfaultfd: don't consider uffd-wp bit of writable migration entries 2023-04-18 16:29:53 -07:00
mremap.c mm/mremap: write-lock VMA while remapping it to a new address range 2023-04-05 20:02:58 -07:00
msync.c mm/msync: use vma_find() instead of vma linked list 2022-09-26 19:46:25 -07:00
nommu.c mm: vmalloc: convert vread() to vread_iter() 2023-04-05 19:42:57 -07:00
oom_kill.c mm/mmu_notifier: remove unused mmu_notifier_range_update_to_read_only export 2023-02-02 22:32:54 -08:00
page_alloc.c mm, page_alloc: use check_pages_enabled static key to check tail pages 2023-04-18 16:29:54 -07:00
page_counter.c mm: page_counter: remove unneeded atomic ops for low/min 2022-09-11 20:26:01 -07:00
page_ext.c mm/page_ext: init page_ext early if there are no deferred struct pages 2023-02-02 22:33:22 -08:00
page_idle.c mm: page_idle: convert page idle to use a folio 2023-01-18 17:12:52 -08:00
page_io.c - Daniel Verkamp has contributed a memfd series ("mm/memfd: add 2023-02-23 17:09:35 -08:00
page_isolation.c mm, treewide: redefine MAX_ORDER sanely 2023-04-05 19:42:46 -07:00
page_owner.c mm, treewide: redefine MAX_ORDER sanely 2023-04-05 19:42:46 -07:00
page_poison.c
page_reporting.c mm, treewide: redefine MAX_ORDER sanely 2023-04-05 19:42:46 -07:00
page_reporting.h
page_table_check.c mm/page_ext: do not allocate space for page_ext->flags if not needed 2023-02-02 22:33:11 -08:00
page_vma_mapped.c mm/hugetlb: introduce hugetlb_walk() 2023-01-18 17:12:39 -08:00
page-writeback.c mm,jfs: move write_one_page/folio_write_one to jfs 2023-03-28 16:20:14 -07:00
pagewalk.c mm/hugetlb: introduce hugetlb_walk() 2023-01-18 17:12:39 -08:00
percpu-internal.h mm: percpu: fix incorrect size in pcpu_obj_full_size() 2023-02-16 20:43:55 -08:00
percpu-km.c
percpu-stats.c mm: use vmalloc_array and vcalloc for array allocations 2022-03-08 09:30:46 -05:00
percpu-vm.c
percpu.c mm: memcontrol: rename memcg_kmem_enabled() 2023-02-16 20:43:56 -08:00
pgalloc-track.h
pgtable-generic.c mm: add PTE pointer parameter to flush_tlb_fix_spurious_fault() 2023-03-28 16:20:12 -07:00
process_vm_access.c use less confusing names for iov_iter direction initializers 2022-11-25 13:01:55 -05:00
ptdump.c mm: pagewalk: Fix race between unmap and page walker 2022-09-03 10:13:13 -07:00
readahead.c readahead: convert readahead_expand() to use a folio 2023-02-02 22:33:21 -08:00
rmap.c mm/khugepaged: write-lock VMA while collapsing a huge page 2023-04-05 20:02:58 -07:00
rodata_test.c mm/rodata_test: use PAGE_ALIGNED() helper 2022-10-03 14:03:05 -07:00
secretmem.c - Daniel Verkamp has contributed a memfd series ("mm/memfd: add 2023-02-23 17:09:35 -08:00
shmem.c userfaultfd: convert mfill_atomic() to use a folio 2023-04-18 16:29:55 -07:00
shrinker_debug.c mm: shrinkers: convert shrinker_rwsem to mutex 2023-03-28 16:20:17 -07:00
shuffle.c mm/shuffle: convert module_param_call to module_param_cb 2022-10-03 14:03:07 -07:00
shuffle.h mm, treewide: redefine MAX_ORDER sanely 2023-04-05 19:42:46 -07:00
slab_common.c mm/kasan: simplify and refine kasan_cache code 2023-01-18 17:12:55 -08:00
slab.c mm, treewide: redefine MAX_ORDER sanely 2023-04-05 19:42:46 -07:00
slab.h mm: move kmem_cache_init() declaration to mm/slab.h 2023-04-05 19:42:54 -07:00
slob.c Merge branch 'slab/for-6.1/kmalloc_size_roundup' into slab/for-next 2022-09-29 11:30:55 +02:00
slub.c mm, treewide: redefine MAX_ORDER sanely 2023-04-05 19:42:46 -07:00
sparse-vmemmap.c mm/vmemmap/devdax: fix kernel crash when probing devdax devices 2023-04-18 16:30:09 -07:00
sparse.c mm/sparse: fix "unused function 'pgdat_to_phys'" warning 2023-02-02 22:33:29 -08:00
swap_cgroup.c mm: memcontrol: don't allocate cgroup swap arrays when memcg is disabled 2022-10-03 14:03:36 -07:00
swap_slots.c mm/swap: convert put_swap_page() to put_swap_folio() 2022-10-03 14:02:46 -07:00
swap_state.c mm: return an ERR_PTR from __filemap_get_folio 2023-04-05 19:42:42 -07:00
swap.c mm: swap: fix performance regression on sparsetruncate-tiny 2023-04-16 10:41:24 -07:00
swap.h mm: remove the __swap_writepage return value 2023-02-02 22:33:33 -08:00
swapfile.c sync mm-stable with mm-hotfixes-stable to pick up depended-upon upstream changes 2023-04-16 12:31:58 -07:00
truncate.c mm: return an ERR_PTR from __filemap_get_folio 2023-04-05 19:42:42 -07:00
usercopy.c mm: use kstrtobool() instead of strtobool() 2022-11-30 15:58:45 -08:00
userfaultfd.c userfaultfd: convert mfill_atomic() to use a folio 2023-04-18 16:29:55 -07:00
util.c mm: fix typo in __vm_enough_memory warning 2023-02-13 15:54:33 -08:00
vmalloc.c mm: vmalloc: rename addr_to_vb_xarray() function 2023-04-18 16:29:48 -07:00
vmpressure.c mm/vmpressure: fix data-race with memcg->socket_pressure 2021-11-06 13:30:40 -07:00
vmscan.c mm: vmscan: ignore non-LRU-based reclaim in memcg reclaim 2023-04-18 16:30:10 -07:00
vmstat.c mm: introduce per-VMA lock statistics 2023-04-05 20:03:01 -07:00
workingset.c workingset: memcg: sleep when flushing stats in workingset_refault() 2023-04-18 16:29:50 -07:00
z3fold.c mm: remove PageMovable export 2023-01-18 17:12:57 -08:00
zbud.c zpool: clean out dead code 2022-12-11 18:12:10 -08:00
zpool.c zpool: clean out dead code 2022-12-11 18:12:10 -08:00
zsmalloc.c zsmalloc: reset compaction source zspage pointer after putback_zspage() 2023-04-18 16:29:42 -07:00
zswap.c mm/zswap: delay the initialization of zswap 2023-04-18 16:29:48 -07:00