mirror of
https://git.proxmox.com/git/mirror_ubuntu-kernels.git
synced 2025-11-25 12:31:07 +00:00
8641661cb7
3666 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
3822a7c409 |
- Daniel Verkamp has contributed a memfd series ("mm/memfd: add
F_SEAL_EXEC") which permits the setting of the memfd execute bit at
memfd creation time, with the option of sealing the state of the X bit.
- Peter Xu adds a patch series ("mm/hugetlb: Make huge_pte_offset()
thread-safe for pmd unshare") which addresses a rare race condition
related to PMD unsharing.
- Several folioification patch serieses from Matthew Wilcox, Vishal
Moola, Sidhartha Kumar and Lorenzo Stoakes
- Johannes Weiner has a series ("mm: push down lock_page_memcg()") which
does perform some memcg maintenance and cleanup work.
- SeongJae Park has added DAMOS filtering to DAMON, with the series
"mm/damon/core: implement damos filter". These filters provide users
with finer-grained control over DAMOS's actions. SeongJae has also done
some DAMON cleanup work.
- Kairui Song adds a series ("Clean up and fixes for swap").
- Vernon Yang contributed the series "Clean up and refinement for maple
tree".
- Yu Zhao has contributed the "mm: multi-gen LRU: memcg LRU" series. It
adds to MGLRU an LRU of memcgs, to improve the scalability of global
reclaim.
- David Hildenbrand has added some userfaultfd cleanup work in the
series "mm: uffd-wp + change_protection() cleanups".
- Christoph Hellwig has removed the generic_writepages() library
function in the series "remove generic_writepages".
- Baolin Wang has performed some maintenance on the compaction code in
his series "Some small improvements for compaction".
- Sidhartha Kumar is doing some maintenance work on struct page in his
series "Get rid of tail page fields".
- David Hildenbrand contributed some cleanup, bugfixing and
generalization of pte management and of pte debugging in his series "mm:
support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all architectures with swap
PTEs".
- Mel Gorman and Neil Brown have removed the __GFP_ATOMIC allocation
flag in the series "Discard __GFP_ATOMIC".
- Sergey Senozhatsky has improved zsmalloc's memory utilization with his
series "zsmalloc: make zspage chain size configurable".
- Joey Gouly has added prctl() support for prohibiting the creation of
writeable+executable mappings. The previous BPF-based approach had
shortcomings. See "mm: In-kernel support for memory-deny-write-execute
(MDWE)".
- Waiman Long did some kmemleak cleanup and bugfixing in the series
"mm/kmemleak: Simplify kmemleak_cond_resched() & fix UAF".
- T.J. Alumbaugh has contributed some MGLRU cleanup work in his series
"mm: multi-gen LRU: improve".
- Jiaqi Yan has provided some enhancements to our memory error
statistics reporting, mainly by presenting the statistics on a per-node
basis. See the series "Introduce per NUMA node memory error
statistics".
- Mel Gorman has a second and hopefully final shot at fixing a CPU-hog
regression in compaction via his series "Fix excessive CPU usage during
compaction".
- Christoph Hellwig does some vmalloc maintenance work in the series
"cleanup vfree and vunmap".
- Christoph Hellwig has removed block_device_operations.rw_page() in ths
series "remove ->rw_page".
- We get some maple_tree improvements and cleanups in Liam Howlett's
series "VMA tree type safety and remove __vma_adjust()".
- Suren Baghdasaryan has done some work on the maintainability of our
vm_flags handling in the series "introduce vm_flags modifier functions".
- Some pagemap cleanup and generalization work in Mike Rapoport's series
"mm, arch: add generic implementation of pfn_valid() for FLATMEM" and
"fixups for generic implementation of pfn_valid()"
- Baoquan He has done some work to make /proc/vmallocinfo and
/proc/kcore better represent the real state of things in his series
"mm/vmalloc.c: allow vread() to read out vm_map_ram areas".
- Jason Gunthorpe rationalized the GUP system's interface to the rest of
the kernel in the series "Simplify the external interface for GUP".
- SeongJae Park wishes to migrate people from DAMON's debugfs interface
over to its sysfs interface. To support this, we'll temporarily be
printing warnings when people use the debugfs interface. See the series
"mm/damon: deprecate DAMON debugfs interface".
- Andrey Konovalov provided the accurately named "lib/stackdepot: fixes
and clean-ups" series.
- Huang Ying has provided a dramatic reduction in migration's TLB flush
IPI rates with the series "migrate_pages(): batch TLB flushing".
- Arnd Bergmann has some objtool fixups in "objtool warning fixes".
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCY/PoPQAKCRDdBJ7gKXxA
jlvpAPsFECUBBl20qSue2zCYWnHC7Yk4q9ytTkPB/MMDrFEN9wD/SNKEm2UoK6/K
DmxHkn0LAitGgJRS/W9w81yrgig9tAQ=
=MlGs
-----END PGP SIGNATURE-----
Merge tag 'mm-stable-2023-02-20-13-37' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
- Daniel Verkamp has contributed a memfd series ("mm/memfd: add
F_SEAL_EXEC") which permits the setting of the memfd execute bit at
memfd creation time, with the option of sealing the state of the X
bit.
- Peter Xu adds a patch series ("mm/hugetlb: Make huge_pte_offset()
thread-safe for pmd unshare") which addresses a rare race condition
related to PMD unsharing.
- Several folioification patch serieses from Matthew Wilcox, Vishal
Moola, Sidhartha Kumar and Lorenzo Stoakes
- Johannes Weiner has a series ("mm: push down lock_page_memcg()")
which does perform some memcg maintenance and cleanup work.
- SeongJae Park has added DAMOS filtering to DAMON, with the series
"mm/damon/core: implement damos filter".
These filters provide users with finer-grained control over DAMOS's
actions. SeongJae has also done some DAMON cleanup work.
- Kairui Song adds a series ("Clean up and fixes for swap").
- Vernon Yang contributed the series "Clean up and refinement for maple
tree".
- Yu Zhao has contributed the "mm: multi-gen LRU: memcg LRU" series. It
adds to MGLRU an LRU of memcgs, to improve the scalability of global
reclaim.
- David Hildenbrand has added some userfaultfd cleanup work in the
series "mm: uffd-wp + change_protection() cleanups".
- Christoph Hellwig has removed the generic_writepages() library
function in the series "remove generic_writepages".
- Baolin Wang has performed some maintenance on the compaction code in
his series "Some small improvements for compaction".
- Sidhartha Kumar is doing some maintenance work on struct page in his
series "Get rid of tail page fields".
- David Hildenbrand contributed some cleanup, bugfixing and
generalization of pte management and of pte debugging in his series
"mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all architectures with
swap PTEs".
- Mel Gorman and Neil Brown have removed the __GFP_ATOMIC allocation
flag in the series "Discard __GFP_ATOMIC".
- Sergey Senozhatsky has improved zsmalloc's memory utilization with
his series "zsmalloc: make zspage chain size configurable".
- Joey Gouly has added prctl() support for prohibiting the creation of
writeable+executable mappings.
The previous BPF-based approach had shortcomings. See "mm: In-kernel
support for memory-deny-write-execute (MDWE)".
- Waiman Long did some kmemleak cleanup and bugfixing in the series
"mm/kmemleak: Simplify kmemleak_cond_resched() & fix UAF".
- T.J. Alumbaugh has contributed some MGLRU cleanup work in his series
"mm: multi-gen LRU: improve".
- Jiaqi Yan has provided some enhancements to our memory error
statistics reporting, mainly by presenting the statistics on a
per-node basis. See the series "Introduce per NUMA node memory error
statistics".
- Mel Gorman has a second and hopefully final shot at fixing a CPU-hog
regression in compaction via his series "Fix excessive CPU usage
during compaction".
- Christoph Hellwig does some vmalloc maintenance work in the series
"cleanup vfree and vunmap".
- Christoph Hellwig has removed block_device_operations.rw_page() in
ths series "remove ->rw_page".
- We get some maple_tree improvements and cleanups in Liam Howlett's
series "VMA tree type safety and remove __vma_adjust()".
- Suren Baghdasaryan has done some work on the maintainability of our
vm_flags handling in the series "introduce vm_flags modifier
functions".
- Some pagemap cleanup and generalization work in Mike Rapoport's
series "mm, arch: add generic implementation of pfn_valid() for
FLATMEM" and "fixups for generic implementation of pfn_valid()"
- Baoquan He has done some work to make /proc/vmallocinfo and
/proc/kcore better represent the real state of things in his series
"mm/vmalloc.c: allow vread() to read out vm_map_ram areas".
- Jason Gunthorpe rationalized the GUP system's interface to the rest
of the kernel in the series "Simplify the external interface for
GUP".
- SeongJae Park wishes to migrate people from DAMON's debugfs interface
over to its sysfs interface. To support this, we'll temporarily be
printing warnings when people use the debugfs interface. See the
series "mm/damon: deprecate DAMON debugfs interface".
- Andrey Konovalov provided the accurately named "lib/stackdepot: fixes
and clean-ups" series.
- Huang Ying has provided a dramatic reduction in migration's TLB flush
IPI rates with the series "migrate_pages(): batch TLB flushing".
- Arnd Bergmann has some objtool fixups in "objtool warning fixes".
* tag 'mm-stable-2023-02-20-13-37' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (505 commits)
include/linux/migrate.h: remove unneeded externs
mm/memory_hotplug: cleanup return value handing in do_migrate_range()
mm/uffd: fix comment in handling pte markers
mm: change to return bool for isolate_movable_page()
mm: hugetlb: change to return bool for isolate_hugetlb()
mm: change to return bool for isolate_lru_page()
mm: change to return bool for folio_isolate_lru()
objtool: add UACCESS exceptions for __tsan_volatile_read/write
kmsan: disable ftrace in kmsan core code
kasan: mark addr_has_metadata __always_inline
mm: memcontrol: rename memcg_kmem_enabled()
sh: initialize max_mapnr
m68k/nommu: add missing definition of ARCH_PFN_OFFSET
mm: percpu: fix incorrect size in pcpu_obj_full_size()
maple_tree: reduce stack usage with gcc-9 and earlier
mm: page_alloc: call panic() when memoryless node allocation fails
mm: multi-gen LRU: avoid futile retries
migrate_pages: move THP/hugetlb migration support check to simplify code
migrate_pages: batch flushing TLB
migrate_pages: share more code between _unmap and _move
...
|
||
|
|
238b05ec99 |
Micro-optimize __flush_tlb_all().
Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmPzdIoRHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1iHNQ//UJ49mOUEHiyChwMxEWcHjY4KXffafg54 ahXCMgT3mwaA0OEjGvVON0w8Fe6NVeMv2n8/mR+X/r6euBpk/XJDCAa/tUfx3wUh 9AiWSktlAgR/2lgof8/UDdAaHxjabXJCS/qi2u+0D/cKr5odDR5SogCca6AeG1op ZskTJPg3teSDbW8vd9k7wkTYy8xRAHUmdBibXvbVXdehzm8Qw2RDC6pfONlI26up UEteiBT9FjxyjrLhZ2snsnNPei1JFMbvjikJCPEwRES4sFNZbevHb2NlDxUSC7M+ wwb8VnUI9UX9HMZFiqVTXfwDNAlKpQmYuSfL5juFPgLgoiiHPpdKr4tw9jiNdB++ NLeIsbQLS70wpC2PflFo4Kmpr3FQU5FA1TesBaD3c6wjey+PGvojCRyh2pbTL2tQ dM/hPZnsmYkbzUgFV8rSXCON7LdXFgkKBJJKgW9eT1njQRbxHL/6RVh67fa1640O gamXYi7icrrkaMPRCB+TwtA14S7FBVKo/I4m+4j880zY3Hrvv7pzoHPQilu5Hg32 4yZC0XJcoQcWfCOxMCOFdbeAOx+PIxPJxIuSh6HdpJ7nL5igbXJtIBea78ru2RhS 22T+0h4dpm8ZygeGvMCunZBvzrEUUJdEbIC1csOcSKPzzMzVfEf0rr+QLAFXUDVn S/rgNqnV+Hc= =iekt -----END PGP SIGNATURE----- Merge tag 'x86-mm-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 mm update from Ingo Molnar: "Micro-optimize __flush_tlb_all()" * tag 'x86-mm-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/cpu: Use cpu_feature_enabled() when checking global pages support |
||
|
|
6e649d0856 |
Updates for this cycle were:
- rwsem micro-optimizations
- spinlock micro-optimizations
- cleanups, simplifications
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmPzZkURHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1gBvA/6A5RMmSdOIVojmiwUYvZInA/Kpm2wuW0q
bQWW9maLb96JMpj3FB5Xs5U993WiF0Gt9aGHoND9V2wOYbnv01ElCKKgsw7zLnXb
c++txpmD+HoUGp94H8T2nA3szPLR7OpPpLmfjTWHKeWQRTStJobTTqi5jVTUZT37
92MZ2tVzapQJq5VESk0C+0FBFDobh0gTX8hwkEj83ubXK4rC071/gJD4JHZt4nWN
Up9YGNoNvw+ns7upo2C1XJ4H4ucFoCXT2smH4Oh0gk8Cfs6oP1k5H8J5aJQ+23fT
EOWzkk0vJdpukNXI1+4G4KMwCO6zv+xVxXpEBizEeTgKWwbJpgBeGrisheAUyMHT
bwfztsn+NQET11NsccmtRzspscUT42Nc+FUW0KeR2LiBKZhuD6l1Tac3w2HolycA
2YjMQx3ATOEnMFgv4jGlldlasIAnYj0qitw6wCGqkJSvrC3au/LfcBHn45SxkBWc
KZV1Oj26aH1hDxYSLyZRmFEvf/46D9CHmv8ReuFbmM6FIYwL+go+Odw/MHMFAbZR
aP9YR4e94p6WmaNwMqzozP+wN67E4TME2vG6+T1n/szKDogoBlcn/wl053pkcHa/
CsjELY82/CRRrDgWSnSKUZEFvnnBujyEiSz7pTZCzdBTMc/EcxK5CHOSyN23x+LI
TvvxFn7KM/o=
=yTly
-----END PGP SIGNATURE-----
Merge tag 'locking-core-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
- rwsem micro-optimizations
- spinlock micro-optimizations
- cleanups, simplifications
* tag 'locking-core-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
vduse: Remove include of rwlock.h
locking/lockdep: Remove lockdep_init_map_crosslock.
x86/ACPI/boot: Use try_cmpxchg() in __acpi_{acquire,release}_global_lock()
x86/PAT: Use try_cmpxchg() in set_page_memtype()
locking/rwsem: Disable preemption in all down_write*() and up_write() code paths
locking/rwsem: Disable preemption in all down_read*() and up_read() code paths
locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath
locking/qspinlock: Micro-optimize pending state waiting for unlock
|
||
|
|
f9f57da2c2 |
x86/mtrr: Revert 90b926e68f ("x86/pat: Fix pat_x_mtrr_type() for MTRR disabled case")
Commit
|
||
|
|
68f48381d7 |
mm: introduce __vm_flags_mod and use it in untrack_pfn
There are scenarios when vm_flags can be modified without exclusive mmap_lock, such as: - after VMA was isolated and mmap_lock was downgraded or dropped - in exit_mmap when there are no other mm users and locking is unnecessary Introduce __vm_flags_mod to avoid assertions when the caller takes responsibility for the required locking. Pass a hint to untrack_pfn to conditionally use __vm_flags_mod for flags modification to avoid assertion. Link: https://lkml.kernel.org/r/20230126193752.297968-7-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjun Roy <arjunroy@google.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: David Rientjes <rientjes@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Greg Thelen <gthelen@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Joel Fernandes <joelaf@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Laurent Dufour <ldufour@linux.ibm.com> Cc: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Minchan Kim <minchan@google.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Peter Oskolkov <posk@google.com> Cc: Peter Xu <peterx@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Punit Agrawal <punit.agrawal@bytedance.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Sebastian Reichel <sebastian.reichel@collabora.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Soheil Hassas Yeganeh <soheil@google.com> Cc: Song Liu <songliubraving@fb.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
|
|
1c71222e5f |
mm: replace vma->vm_flags direct modifications with modifier calls
Replace direct modifications to vma->vm_flags with calls to modifier functions to be able to track flag changes and to keep vma locking correctness. [akpm@linux-foundation.org: fix drivers/misc/open-dice.c, per Hyeonggon Yoo] Link: https://lkml.kernel.org/r/20230126193752.297968-5-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Mike Rapoport (IBM) <rppt@kernel.org> Acked-by: Sebastian Reichel <sebastian.reichel@collabora.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjun Roy <arjunroy@google.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: David Rientjes <rientjes@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Greg Thelen <gthelen@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Joel Fernandes <joelaf@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Laurent Dufour <ldufour@linux.ibm.com> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Minchan Kim <minchan@google.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Peter Oskolkov <posk@google.com> Cc: Peter Xu <peterx@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Punit Agrawal <punit.agrawal@bytedance.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Shakeel Butt <shakeelb@google.com> Cc: Soheil Hassas Yeganeh <soheil@google.com> Cc: Song Liu <songliubraving@fb.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
|
|
1e15d374bb |
Revert "x86: kmsan: sync metadata pages on page fault"
This reverts commit
|
||
|
|
50fd4d5e69 |
x86/PAT: Use try_cmpxchg() in set_page_memtype()
Use try_cmpxchg instead of cmpxchg (*ptr, old, new) == old in set_page_memtype. x86 CMPXCHG instruction returns success in ZF flag, so this change saves a compare after cmpxchg (and related move instruction in front of cmpxchg). Also, try_cmpxchg implicitly assigns old *ptr value to "old" when cmpxchg fails. There is no need to re-read the value in the loop. Note that the value from *ptr should be read using READ_ONCE to prevent the compiler from merging, refetching or reordering the read. No functional change intended. Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20230116163446.4734-1-ubizjak@gmail.com |
||
|
|
ebd3ad60a6 |
x86/cpu: Use cpu_feature_enabled() when checking global pages support
X86_FEATURE_PGE determines whether the CPU has enabled global page translations support. Use the faster cpu_feature_enabled() check to shave off some more cycles when flushing all TLB entries, including the global ones. What this practically saves is: mov 0x82eb308(%rip),%rax # 0xffffffff8935bec8 <boot_cpu_data+40> test $0x20,%ah ... which test the bit. Not a lot, but TLB flushing is a timing-sensitive path, so anything to make it even faster. No functional changes. Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20230125075013.9292-1-bp@alien8.de |
||
|
|
26ce6ec364 |
x86/mm: fix poking_init() for Xen PV guests
Commit |
||
|
|
90b926e68f |
x86/pat: Fix pat_x_mtrr_type() for MTRR disabled case
Since |
||
|
|
3c202d14a9 |
prandom: remove prandom_u32_max()
Convert the final two users of prandom_u32_max() that slipped in during 6.2-rc1 to use get_random_u32_below(). Then, with no more users left, we can finally remove the deprecated function. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> |
||
|
|
4f292c4de4 |
New Feature:
* Randomize the per-cpu entry areas Cleanups: * Have CR3_ADDR_MASK use PHYSICAL_PAGE_MASK instead of open coding it * Move to "native" set_memory_rox() helper * Clean up pmd_get_atomic() and i386-PAE * Remove some unused page table size macros -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEV76QKkVc4xCGURexaDWVMHDJkrAFAmOc53UACgkQaDWVMHDJ krCUHw//SGZ+La0hLZLAiAiZTXLZZHpYkOmg1Oj1+11qSU11uZzTFqDpauhaKpRS cJCSh+D+RXe5e2ipgt0+Zl0hESLt7pJf8258OE4ra0DL/IlyO9uqruAs9Kn3eRS/ Fk76nG8gdEU+JKJqpG02GqOLslYQuIy96n9hpuj1x25b614+uezPfC7S4XEat0NT MbJQ+jnVDf16aJIJkzT+iSwhubDVeh+bSHeO0SSCzX23WLUqDeg5NvlyxoCHGbBh UpUTWggV/0pYAkBKRHToeJs8qTWREwuuH/8JGewpe9A0tjdB5wyZfNL2PuracweN 9MauXC3T5f0+Ca4yIIaPq1fF7Ny/PR2dBFihk27rOD0N7tjaZxNwal2pB1sZcmvZ +PAokjyTPVH5ZXjkMYGGAUe1jyjwr2+TgFSZxhTnDuGtyVQiY4pihGKOifLCX6tv x6khvYeTBw7wfaDRtKEAf+2kLHYn+71HszHP/8bNKX9T03h+Zf0i1wdZu5xbM5Gc VK2wR7bCC+UftJJYG0pldcHg2qaF19RBHK2tLwp7zngUv7lTbkKfkgKjre73KV2a D4b76lrqdUMo6UYwYdw7WtDyarZS4OVLq2DcNhwwMddBCaX8kyN5a4AqwQlZYJ0u dM+kuMofE8U3yMxmMhJimkZUsj09yLHIqfynY0jbAcU3nhKZZNY= =wwVF -----END PGP SIGNATURE----- Merge tag 'x86_mm_for_6.2_v2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 mm updates from Dave Hansen: "New Feature: - Randomize the per-cpu entry areas Cleanups: - Have CR3_ADDR_MASK use PHYSICAL_PAGE_MASK instead of open coding it - Move to "native" set_memory_rox() helper - Clean up pmd_get_atomic() and i386-PAE - Remove some unused page table size macros" * tag 'x86_mm_for_6.2_v2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (35 commits) x86/mm: Ensure forced page table splitting x86/kasan: Populate shadow for shared chunk of the CPU entry area x86/kasan: Add helpers to align shadow addresses up and down x86/kasan: Rename local CPU_ENTRY_AREA variables to shorten names x86/mm: Populate KASAN shadow for entire per-CPU range of CPU entry area x86/mm: Recompute physical address for every page of per-CPU CEA mapping x86/mm: Rename __change_page_attr_set_clr(.checkalias) x86/mm: Inhibit _PAGE_NX changes from cpa_process_alias() x86/mm: Untangle __change_page_attr_set_clr(.checkalias) x86/mm: Add a few comments x86/mm: Fix CR3_ADDR_MASK x86/mm: Remove P*D_PAGE_MASK and P*D_PAGE_SIZE macros mm: Convert __HAVE_ARCH_P..P_GET to the new style mm: Remove pointless barrier() after pmdp_get_lockless() x86/mm/pae: Get rid of set_64bit() x86_64: Remove pointless set_64bit() usage x86/mm/pae: Be consistent with pXXp_get_and_clear() x86/mm/pae: Use WRITE_ONCE() x86/mm/pae: Don't (ab)use atomic64 mm/gup: Fix the lockless PMD access ... |
||
|
|
fe36bb8736 |
Tracing updates for 6.2:
- Add options to the osnoise tracer
o panic_on_stop option that panics the kernel if osnoise is greater than some
user defined threshold.
o preempt option, to test noise while preemption is disabled
o irq option, to test noise when interrupts are disabled
- Add .percent and .graph suffix to histograms to give different outputs
- Add nohitcount to disable showing hitcount in histogram output
- Add new __cpumask() to trace event fields to annotate that a unsigned long
array is a cpumask to user space and should be treated as one.
- Add trace_trigger kernel command line parameter to enable trace event
triggers at boot up. Useful to trace stack traces, disable tracing and take
snapshots.
- Fix x86/kmmio mmio tracer to work with the updates to lockdep
- Unify the panic and die notifiers
- Add back ftrace_expect reference that is used to extract more information in
the ftrace_bug() code.
- Have trigger filter parsing errors show up in the tracing error log.
- Updated MAINTAINERS file to add kernel tracing mailing list and patchwork
info
- Use IDA to keep track of event type numbers.
- And minor fixes and clean ups
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCY5vIcxQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qlO7AQCmtZbriadAR6N7Llj092YXmYfzrxyi
1WS35vhpZsBJ8gEA8j68l+LrgNt51N2gXlTXEHgXzdBgL/TKAPSX4D99GQY=
=z1pe
-----END PGP SIGNATURE-----
Merge tag 'trace-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing updates from Steven Rostedt:
- Add options to the osnoise tracer:
- 'panic_on_stop' option that panics the kernel if osnoise is
greater than some user defined threshold.
- 'preempt' option, to test noise while preemption is disabled
- 'irq' option, to test noise when interrupts are disabled
- Add .percent and .graph suffix to histograms to give different
outputs
- Add nohitcount to disable showing hitcount in histogram output
- Add new __cpumask() to trace event fields to annotate that a unsigned
long array is a cpumask to user space and should be treated as one.
- Add trace_trigger kernel command line parameter to enable trace event
triggers at boot up. Useful to trace stack traces, disable tracing
and take snapshots.
- Fix x86/kmmio mmio tracer to work with the updates to lockdep
- Unify the panic and die notifiers
- Add back ftrace_expect reference that is used to extract more
information in the ftrace_bug() code.
- Have trigger filter parsing errors show up in the tracing error log.
- Updated MAINTAINERS file to add kernel tracing mailing list and
patchwork info
- Use IDA to keep track of event type numbers.
- And minor fixes and clean ups
* tag 'trace-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (44 commits)
tracing: Fix cpumask() example typo
tracing: Improve panic/die notifiers
ftrace: Prevent RCU stall on PREEMPT_VOLUNTARY kernels
tracing: Do not synchronize freeing of trigger filter on boot up
tracing: Remove pointer (asterisk) and brackets from cpumask_t field
tracing: Have trigger filter parsing errors show up in error_log
x86/mm/kmmio: Remove redundant preempt_disable()
tracing: Fix infinite loop in tracing_read_pipe on overflowed print_trace_line
Documentation/osnoise: Add osnoise/options documentation
tracing/osnoise: Add preempt and/or irq disabled options
tracing/osnoise: Add PANIC_ON_STOP option
Documentation/osnoise: Escape underscore of NO_ prefix
tracing: Fix some checker warnings
tracing/osnoise: Make osnoise_options static
tracing: remove unnecessary trace_trigger ifdef
ring-buffer: Handle resize in early boot up
tracing/hist: Fix issue of losting command info in error_log
tracing: Fix issue of missing one synthetic field
tracing/hist: Fix out-of-bound write on 'action_data.var_ref_idx'
tracing/hist: Fix wrong return value in parse_action_params()
...
|
||
|
|
3e844d842d |
x86/mm: Ensure forced page table splitting
There are a few kernel users like kfence that require 4k pages to work
correctly and do not support large mappings. They use set_memory_4k()
to break down those large mappings.
That, in turn relies on cpa_data->force_split option to indicate to
set_memory code that it should split page tables regardless of whether
the need to be.
But, a recent change added an optimization which would return early
if a set_memory request came in that did not change permissions. It
did not consult ->force_split and would mistakenly optimize away the
splitting that set_memory_4k() needs. This broke kfence.
Skip the same-permission optimization when ->force_split is set.
Fixes: 127960a05548 ("x86/mm: Inhibit _PAGE_NX changes from cpa_process_alias()")
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Tested-by: Marco Elver <elver@google.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/all/CA+G9fYuFxZTxkeS35VTZMXwQvohu73W3xbZ5NtjebsVvH6hCuA@mail.gmail.com/
|
||
|
|
1cfaac2400 |
x86/kasan: Populate shadow for shared chunk of the CPU entry area
Popuplate the shadow for the shared portion of the CPU entry area, i.e. the read-only IDT mapping, during KASAN initialization. A recent change modified KASAN to map the per-CPU areas on-demand, but forgot to keep a shadow for the common area that is shared amongst all CPUs. Map the common area in KASAN init instead of letting idt_map_in_cea() do the dirty work so that it Just Works in the unlikely event more shared data is shoved into the CPU entry area. The bug manifests as a not-present #PF when software attempts to lookup an IDT entry, e.g. when KVM is handling IRQs on Intel CPUs (KVM performs direct CALL to the IRQ handler to avoid the overhead of INTn): BUG: unable to handle page fault for address: fffffbc0000001d8 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 16c03a067 P4D 16c03a067 PUD 0 Oops: 0000 [#1] PREEMPT SMP KASAN CPU: 5 PID: 901 Comm: repro Tainted: G W 6.1.0-rc3+ #410 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 RIP: 0010:kasan_check_range+0xdf/0x190 vmx_handle_exit_irqoff+0x152/0x290 [kvm_intel] vcpu_run+0x1d89/0x2bd0 [kvm] kvm_arch_vcpu_ioctl_run+0x3ce/0xa70 [kvm] kvm_vcpu_ioctl+0x349/0x900 [kvm] __x64_sys_ioctl+0xb8/0xf0 do_syscall_64+0x2b/0x50 entry_SYSCALL_64_after_hwframe+0x46/0xb0 Fixes: 9fd429c28073 ("x86/kasan: Map shadow for percpu pages on demand") Reported-by: syzbot+8cdd16fd5a6c0565e227@syzkaller.appspotmail.com Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20221110203504.1985010-6-seanjc@google.com |
||
|
|
bde258d974 |
x86/kasan: Add helpers to align shadow addresses up and down
Add helpers to dedup code for aligning shadow address up/down to page boundaries when translating an address to its shadow. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Andrey Ryabinin <ryabinin.a.a@gmail.com> Link: https://lkml.kernel.org/r/20221110203504.1985010-5-seanjc@google.com |
||
|
|
7077d2ccb9 |
x86/kasan: Rename local CPU_ENTRY_AREA variables to shorten names
Rename the CPU entry area variables in kasan_init() to shorten their names, a future fix will reference the beginning of the per-CPU portion of the CPU entry area, and shadow_cpu_entry_per_cpu_begin is a bit much. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Andrey Ryabinin <ryabinin.a.a@gmail.com> Link: https://lkml.kernel.org/r/20221110203504.1985010-4-seanjc@google.com |
||
|
|
97650148a1 |
x86/mm: Populate KASAN shadow for entire per-CPU range of CPU entry area
Populate a KASAN shadow for the entire possible per-CPU range of the CPU
entry area instead of requiring that each individual chunk map a shadow.
Mapping shadows individually is error prone, e.g. the per-CPU GDT mapping
was left behind, which can lead to not-present page faults during KASAN
validation if the kernel performs a software lookup into the GDT. The DS
buffer is also likely affected.
The motivation for mapping the per-CPU areas on-demand was to avoid
mapping the entire 512GiB range that's reserved for the CPU entry area,
shaving a few bytes by not creating shadows for potentially unused memory
was not a goal.
The bug is most easily reproduced by doing a sigreturn with a garbage
CS in the sigcontext, e.g.
int main(void)
{
struct sigcontext regs;
syscall(__NR_mmap, 0x1ffff000ul, 0x1000ul, 0ul, 0x32ul, -1, 0ul);
syscall(__NR_mmap, 0x20000000ul, 0x1000000ul, 7ul, 0x32ul, -1, 0ul);
syscall(__NR_mmap, 0x21000000ul, 0x1000ul, 0ul, 0x32ul, -1, 0ul);
memset(®s, 0, sizeof(regs));
regs.cs = 0x1d0;
syscall(__NR_rt_sigreturn);
return 0;
}
to coerce the kernel into doing a GDT lookup to compute CS.base when
reading the instruction bytes on the subsequent #GP to determine whether
or not the #GP is something the kernel should handle, e.g. to fixup UMIP
violations or to emulate CLI/STI for IOPL=3 applications.
BUG: unable to handle page fault for address: fffffbc8379ace00
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 16c03a067 P4D 16c03a067 PUD 15b990067 PMD 15b98f067 PTE 0
Oops: 0000 [#1] PREEMPT SMP KASAN
CPU: 3 PID: 851 Comm: r2 Not tainted 6.1.0-rc3-next-20221103+ #432
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
RIP: 0010:kasan_check_range+0xdf/0x190
Call Trace:
<TASK>
get_desc+0xb0/0x1d0
insn_get_seg_base+0x104/0x270
insn_fetch_from_user+0x66/0x80
fixup_umip_exception+0xb1/0x530
exc_general_protection+0x181/0x210
asm_exc_general_protection+0x22/0x30
RIP: 0003:0x0
Code: Unable to access opcode bytes at 0xffffffffffffffd6.
RSP: 0003:0000000000000000 EFLAGS: 00000202
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00000000000001d0
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
</TASK>
Fixes: 9fd429c28073 ("x86/kasan: Map shadow for percpu pages on demand")
Reported-by: syzbot+ffb4f000dc2872c93f62@syzkaller.appspotmail.com
Suggested-by: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Link: https://lkml.kernel.org/r/20221110203504.1985010-3-seanjc@google.com
|
||
|
|
80d72a8f76 |
x86/mm: Recompute physical address for every page of per-CPU CEA mapping
Recompute the physical address for each per-CPU page in the CPU entry
area, a recent commit inadvertantly modified cea_map_percpu_pages() such
that every PTE is mapped to the physical address of the first page.
Fixes: 9fd429c28073 ("x86/kasan: Map shadow for percpu pages on demand")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Link: https://lkml.kernel.org/r/20221110203504.1985010-2-seanjc@google.com
|
||
|
|
e996365ee7 |
x86/mm: Rename __change_page_attr_set_clr(.checkalias)
Now that the checkalias functionality is taken by CPA_NO_CHECK_ALIAS rename the argument to better match is remaining purpose: primary, matching __change_page_attr(). Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20221110125544.661001508%40infradead.org |
||
|
|
d597416683 |
x86/mm: Inhibit _PAGE_NX changes from cpa_process_alias()
There is a cludge in change_page_attr_set_clr() that inhibits
propagating NX changes to the aliases (directmap and highmap) -- this
is a cludge twofold:
- it also inhibits the primary checks in __change_page_attr();
- it hard depends on single bit changes.
The introduction of set_memory_rox() triggered this last issue for
clearing both _PAGE_RW and _PAGE_NX.
Explicitly ignore _PAGE_NX in cpa_process_alias() instead.
Fixes: b38994948567 ("x86/mm: Implement native set_memory_rox()")
Reported-by: kernel test robot <oliver.sang@intel.com>
Debugged-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20221110125544.594991716%40infradead.org
|
||
|
|
ef9ab81af6 |
x86/mm: Untangle __change_page_attr_set_clr(.checkalias)
The .checkalias argument to __change_page_attr_set_clr() is overloaded
and serves two different purposes:
- it inhibits the call to cpa_process_alias() -- as suggested by the
name; however,
- it also serves as 'primary' indicator for __change_page_attr()
( which in turn also serves as a recursion terminator for
cpa_process_alias() ).
Untangle these by extending the use of CPA_NO_CHECK_ALIAS to all
callsites that currently use .checkalias=0 for this purpose.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20221110125544.527267183%40infradead.org
|
||
|
|
5ceeee7571 |
x86/mm: Add a few comments
It's a shame to hide useful comments in Changelogs, add some to the
code.
Shamelessly stolen from commit:
|
||
|
|
82328227db |
x86/mm: Remove P*D_PAGE_MASK and P*D_PAGE_SIZE macros
Other architectures and the common mm/ use P*D_MASK, and P*D_SIZE. Remove the duplicated P*D_PAGE_MASK and P*D_PAGE_SIZE which are only used in x86/*. Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Acked-by: Mike Rapoport <rppt@linux.ibm.com> Link: https://lore.kernel.org/r/20220516185202.604654-1-tatashin@google.com |
||
|
|
60463628c9 |
x86/mm: Implement native set_memory_rox()
Provide a native implementation of set_memory_rox(), avoiding the double set_memory_ro();set_memory_x(); calls. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> |
||
|
|
414ebf148c |
x86/mm: Do verify W^X at boot up
Straight up revert of commit:
|
||
|
|
3f4c8211d9 |
x86/mm: Use mm_alloc() in poking_init()
Instead of duplicating init_mm, allocate a fresh mm. The advantage is that mm_alloc() has much simpler dependencies. Additionally it makes more conceptual sense, init_mm has no (and must not have) user state to duplicate. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20221025201057.816175235@infradead.org |
||
|
|
97e3d26b5e |
x86/mm: Randomize per-cpu entry area
Seth found that the CPU-entry-area; the piece of per-cpu data that is mapped into the userspace page-tables for kPTI is not subject to any randomization -- irrespective of kASLR settings. On x86_64 a whole P4D (512 GB) of virtual address space is reserved for this structure, which is plenty large enough to randomize things a little. As such, use a straight forward randomization scheme that avoids duplicates to spread the existing CPUs over the available space. [ bp: Fix le build. ] Reported-by: Seth Jenkins <sethjenkins@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> |
||
|
|
3f148f3318 |
x86/kasan: Map shadow for percpu pages on demand
KASAN maps shadow for the entire CPU-entry-area: [CPU_ENTRY_AREA_BASE, CPU_ENTRY_AREA_BASE + CPU_ENTRY_AREA_MAP_SIZE] This will explode once the per-cpu entry areas are randomized since it will increase CPU_ENTRY_AREA_MAP_SIZE to 512 GB and KASAN fails to allocate shadow for such big area. Fix this by allocating KASAN shadow only for really used cpu entry area addresses mapped by cea_map_percpu_pages() Thanks to the 0day folks for finding and reporting this to be an issue. [ dhansen: tweak changelog since this will get committed before peterz's actual cpu-entry-area randomization ] Signed-off-by: Andrey Ryabinin <ryabinin.a.a@gmail.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Tested-by: Yujie Liu <yujie.liu@intel.com> Cc: kernel test robot <yujie.liu@intel.com> Link: https://lore.kernel.org/r/202210241508.2e203c3d-yujie.liu@intel.com |
||
|
|
48ea09cdda |
hardening updates for v6.2-rc1
- Convert flexible array members, fix -Wstringop-overflow warnings,
and fix KCFI function type mismatches that went ignored by
maintainers (Gustavo A. R. Silva, Nathan Chancellor, Kees Cook).
- Remove the remaining side-effect users of ksize() by converting
dma-buf, btrfs, and coredump to using kmalloc_size_roundup(),
add more __alloc_size attributes, and introduce full testing
of all allocator functions. Finally remove the ksize() side-effect
so that each allocation-aware checker can finally behave without
exceptions.
- Introduce oops_limit (default 10,000) and warn_limit (default off)
to provide greater granularity of control for panic_on_oops and
panic_on_warn (Jann Horn, Kees Cook).
- Introduce overflows_type() and castable_to_type() helpers for
cleaner overflow checking.
- Improve code generation for strscpy() and update str*() kern-doc.
- Convert strscpy and sigphash tests to KUnit, and expand memcpy
tests.
- Always use a non-NULL argument for prepare_kernel_cred().
- Disable structleak plugin in FORTIFY KUnit test (Anders Roxell).
- Adjust orphan linker section checking to respect CONFIG_WERROR
(Xin Li).
- Make sure siginfo is cleared for forced SIGKILL (haifeng.xu).
- Fix um vs FORTIFY warnings for always-NULL arguments.
-----BEGIN PGP SIGNATURE-----
iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAmOZSOoWHGtlZXNjb29r
QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJjAAD/0YkvpU7f03f8hcQMJK6wv//24K
AW41hEaBikq9RcmkuvkLLrJRibGgZ5O2xUkUkxRs/HxhkhrZ0kEw8sbwZe8MoWls
F4Y9+TDjsrdHmjhfcBZdLnVxwcKK5wlaEcpjZXtbsfcdhx3TbgcDA23YELl5t0K+
I11j4kYmf9SLl4CwIrSP5iACml8CBHARDh8oIMF7FT/LrjNbM8XkvBcVVT6hTbOV
yjgA8WP2e9GXvj9GzKgqvd0uE/kwPkVAeXLNFWopPi4FQ8AWjlxbBZR0gamA6/EB
d7TIs0ifpVU2JGQaTav4xO6SsFMj3ntoUI0qIrFaTxZAvV4KYGrPT/Kwz1O4SFaG
rN5lcxseQbPQSBTFNG4zFjpywTkVCgD2tZqDwz5Rrmiraz0RyIokCN+i4CD9S0Ds
oEd8JSyLBk1sRALczkuEKo0an5AyC9YWRcBXuRdIHpLo08PsbeUUSe//4pe303cw
0ApQxYOXnrIk26MLElTzSMImlSvlzW6/5XXzL9ME16leSHOIfDeerPnc9FU9Eb3z
ODv22z6tJZ9H/apSUIHZbMciMbbVTZ8zgpkfydr08o87b342N/ncYHZ5cSvQ6DWb
jS5YOIuvl46/IhMPT16qWC8p0bP5YhxoPv5l6Xr0zq0ooEj0E7keiD/SzoLvW+Qs
AHXcibguPRQBPAdiPQ==
=yaaN
-----END PGP SIGNATURE-----
Merge tag 'hardening-v6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull kernel hardening updates from Kees Cook:
- Convert flexible array members, fix -Wstringop-overflow warnings, and
fix KCFI function type mismatches that went ignored by maintainers
(Gustavo A. R. Silva, Nathan Chancellor, Kees Cook)
- Remove the remaining side-effect users of ksize() by converting
dma-buf, btrfs, and coredump to using kmalloc_size_roundup(), add
more __alloc_size attributes, and introduce full testing of all
allocator functions. Finally remove the ksize() side-effect so that
each allocation-aware checker can finally behave without exceptions
- Introduce oops_limit (default 10,000) and warn_limit (default off) to
provide greater granularity of control for panic_on_oops and
panic_on_warn (Jann Horn, Kees Cook)
- Introduce overflows_type() and castable_to_type() helpers for cleaner
overflow checking
- Improve code generation for strscpy() and update str*() kern-doc
- Convert strscpy and sigphash tests to KUnit, and expand memcpy tests
- Always use a non-NULL argument for prepare_kernel_cred()
- Disable structleak plugin in FORTIFY KUnit test (Anders Roxell)
- Adjust orphan linker section checking to respect CONFIG_WERROR (Xin
Li)
- Make sure siginfo is cleared for forced SIGKILL (haifeng.xu)
- Fix um vs FORTIFY warnings for always-NULL arguments
* tag 'hardening-v6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (31 commits)
ksmbd: replace one-element arrays with flexible-array members
hpet: Replace one-element array with flexible-array member
um: virt-pci: Avoid GCC non-NULL warning
signal: Initialize the info in ksignal
lib: fortify_kunit: build without structleak plugin
panic: Expose "warn_count" to sysfs
panic: Introduce warn_limit
panic: Consolidate open-coded panic_on_warn checks
exit: Allow oops_limit to be disabled
exit: Expose "oops_count" to sysfs
exit: Put an upper limit on how often we can oops
panic: Separate sysctl logic from CONFIG_SMP
mm/pgtable: Fix multiple -Wstringop-overflow warnings
mm: Make ksize() a reporting-only function
kunit/fortify: Validate __alloc_size attribute results
drm/sti: Fix return type of sti_{dvo,hda,hdmi}_connector_mode_valid()
drm/fsl-dcu: Fix return type of fsl_dcu_drm_connector_mode_valid()
driver core: Add __alloc_size hint to devm allocators
overflow: Introduce overflows_type() and castable_to_type()
coredump: Proactively round up to kmalloc bucket size
...
|
||
|
|
e2ca6ba6ba |
MM patches for 6.2-rc1.
- More userfaultfs work from Peter Xu. - Several convert-to-folios series from Sidhartha Kumar and Huang Ying. - Some filemap cleanups from Vishal Moola. - David Hildenbrand added the ability to selftest anon memory COW handling. - Some cpuset simplifications from Liu Shixin. - Addition of vmalloc tracing support by Uladzislau Rezki. - Some pagecache folioifications and simplifications from Matthew Wilcox. - A pagemap cleanup from Kefeng Wang: we have VM_ACCESS_FLAGS, so use it. - Miguel Ojeda contributed some cleanups for our use of the __no_sanitize_thread__ gcc keyword. This series shold have been in the non-MM tree, my bad. - Naoya Horiguchi improved the interaction between memory poisoning and memory section removal for huge pages. - DAMON cleanups and tuneups from SeongJae Park - Tony Luck fixed the handling of COW faults against poisoned pages. - Peter Xu utilized the PTE marker code for handling swapin errors. - Hugh Dickins reworked compound page mapcount handling, simplifying it and making it more efficient. - Removal of the autonuma savedwrite infrastructure from Nadav Amit and David Hildenbrand. - zram support for multiple compression streams from Sergey Senozhatsky. - David Hildenbrand reworked the GUP code's R/O long-term pinning so that drivers no longer need to use the FOLL_FORCE workaround which didn't work very well anyway. - Mel Gorman altered the page allocator so that local IRQs can remnain enabled during per-cpu page allocations. - Vishal Moola removed the try_to_release_page() wrapper. - Stefan Roesch added some per-BDI sysfs tunables which are used to prevent network block devices from dirtying excessive amounts of pagecache. - David Hildenbrand did some cleanup and repair work on KSM COW breaking. - Nhat Pham and Johannes Weiner have implemented writeback in zswap's zsmalloc backend. - Brian Foster has fixed a longstanding corner-case oddity in file[map]_write_and_wait_range(). - sparse-vmemmap changes for MIPS, LoongArch and NIOS2 from Feiyang Chen. - Shiyang Ruan has done some work on fsdax, to make its reflink mode work better under xfstests. Better, but still not perfect. - Christoph Hellwig has removed the .writepage() method from several filesystems. They only need .writepages(). - Yosry Ahmed wrote a series which fixes the memcg reclaim target beancounting. - David Hildenbrand has fixed some of our MM selftests for 32-bit machines. - Many singleton patches, as usual. -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCY5j6ZwAKCRDdBJ7gKXxA jkDYAP9qNeVqp9iuHjZNTqzMXkfmJPsw2kmy2P+VdzYVuQRcJgEAgoV9d7oMq4ml CodAgiA51qwzId3GRytIo/tfWZSezgA= =d19R -----END PGP SIGNATURE----- Merge tag 'mm-stable-2022-12-13' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - More userfaultfs work from Peter Xu - Several convert-to-folios series from Sidhartha Kumar and Huang Ying - Some filemap cleanups from Vishal Moola - David Hildenbrand added the ability to selftest anon memory COW handling - Some cpuset simplifications from Liu Shixin - Addition of vmalloc tracing support by Uladzislau Rezki - Some pagecache folioifications and simplifications from Matthew Wilcox - A pagemap cleanup from Kefeng Wang: we have VM_ACCESS_FLAGS, so use it - Miguel Ojeda contributed some cleanups for our use of the __no_sanitize_thread__ gcc keyword. This series should have been in the non-MM tree, my bad - Naoya Horiguchi improved the interaction between memory poisoning and memory section removal for huge pages - DAMON cleanups and tuneups from SeongJae Park - Tony Luck fixed the handling of COW faults against poisoned pages - Peter Xu utilized the PTE marker code for handling swapin errors - Hugh Dickins reworked compound page mapcount handling, simplifying it and making it more efficient - Removal of the autonuma savedwrite infrastructure from Nadav Amit and David Hildenbrand - zram support for multiple compression streams from Sergey Senozhatsky - David Hildenbrand reworked the GUP code's R/O long-term pinning so that drivers no longer need to use the FOLL_FORCE workaround which didn't work very well anyway - Mel Gorman altered the page allocator so that local IRQs can remnain enabled during per-cpu page allocations - Vishal Moola removed the try_to_release_page() wrapper - Stefan Roesch added some per-BDI sysfs tunables which are used to prevent network block devices from dirtying excessive amounts of pagecache - David Hildenbrand did some cleanup and repair work on KSM COW breaking - Nhat Pham and Johannes Weiner have implemented writeback in zswap's zsmalloc backend - Brian Foster has fixed a longstanding corner-case oddity in file[map]_write_and_wait_range() - sparse-vmemmap changes for MIPS, LoongArch and NIOS2 from Feiyang Chen - Shiyang Ruan has done some work on fsdax, to make its reflink mode work better under xfstests. Better, but still not perfect - Christoph Hellwig has removed the .writepage() method from several filesystems. They only need .writepages() - Yosry Ahmed wrote a series which fixes the memcg reclaim target beancounting - David Hildenbrand has fixed some of our MM selftests for 32-bit machines - Many singleton patches, as usual * tag 'mm-stable-2022-12-13' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (313 commits) mm/hugetlb: set head flag before setting compound_order in __prep_compound_gigantic_folio mm: mmu_gather: allow more than one batch of delayed rmaps mm: fix typo in struct pglist_data code comment kmsan: fix memcpy tests mm: add cond_resched() in swapin_walk_pmd_entry() mm: do not show fs mm pc for VM_LOCKONFAULT pages selftests/vm: ksm_functional_tests: fixes for 32bit selftests/vm: cow: fix compile warning on 32bit selftests/vm: madv_populate: fix missing MADV_POPULATE_(READ|WRITE) definitions mm/gup_test: fix PIN_LONGTERM_TEST_READ with highmem mm,thp,rmap: fix races between updates of subpages_mapcount mm: memcg: fix swapcached stat accounting mm: add nodes= arg to memory.reclaim mm: disable top-tier fallback to reclaim on proactive reclaim selftests: cgroup: make sure reclaim target memcg is unprotected selftests: cgroup: refactor proactive reclaim code to reclaim_until() mm: memcg: fix stale protection of reclaim target memcg mm/mmap: properly unaccount memory on mas_preallocate() failure omfs: remove ->writepage jfs: remove ->writepage ... |
||
|
|
3ef3ace4e2 |
- Split MTRR and PAT init code to accomodate at least Xen PV and TDX
guests which do not get MTRRs exposed but only PAT. (TDX guests do not support the cache disabling dance when setting up MTRRs so they fall under the same category.) This is a cleanup work to remove all the ugly workarounds for such guests and init things separately (Juergen Gross) - Add two new Intel CPUs to the list of CPUs with "normal" Energy Performance Bias, leading to power savings - Do not do bus master arbitration in C3 (ARB_DISABLE) on modern Centaur CPUs -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmOYhIMACgkQEsHwGGHe VUpxug//ZKw3hYFroKhsULJi/e0j2nGARiSlJrJcFHl2vgh9yGvDsnYUyM/rgjgt cM3uCLbEG7nA6uhB3nupzaXZ8lBM1nU9kiEl/kjQ5oYf9nmJ48fLttvWGfxYN4s3 kj5fYVhlOZpntQXIWrwxnPqghUysumMnZmBJeKYiYNNfkj62l3xU2Ni4Gnjnp02I 9MmUhl7pj1aEyOQfM8rovy+wtYCg5WTOmXVlyVN+b9MwfYeK+stojvCZHxtJs9BD fezpJjjG+78xKUC7vVZXCh1p1N5Qvj014XJkVl9Hg0n7qizKFZRtqi8I769G2ptd exP8c2nDXKCqYzE8vK6ukWgDANQPs3d6Z7EqUKuXOCBF81PnMPSUMyNtQFGNM6Wp S5YSvFfCgUjp50IunOpvkDABgpM+PB8qeWUq72UFQJSOymzRJg/KXtE2X+qaMwtC 0i6VLXfMddGcmqNKDppfGtCjq2W5VrNIIJedtAQQGyl+pl3XzZeNomhJpm/0mVfJ 8UrlXZeXl/EUQ7qk40gC/Ash27pU9ZDx4CMNMy1jDIQqgufBjEoRIDSFqQlghmZq An5/BqMLhOMxUYNA7bRUnyeyxCBypetMdQt5ikBmVXebvBDmArXcuSNAdiy1uBFX KD8P3Y1AnsHIklxkLNyZRUy7fb4mgMFenUbgc0vmbYHbFl0C0pQ= =Zmgh -----END PGP SIGNATURE----- Merge tag 'x86_cpu_for_v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cpu updates from Borislav Petkov: - Split MTRR and PAT init code to accomodate at least Xen PV and TDX guests which do not get MTRRs exposed but only PAT. (TDX guests do not support the cache disabling dance when setting up MTRRs so they fall under the same category) This is a cleanup work to remove all the ugly workarounds for such guests and init things separately (Juergen Gross) - Add two new Intel CPUs to the list of CPUs with "normal" Energy Performance Bias, leading to power savings - Do not do bus master arbitration in C3 (ARB_DISABLE) on modern Centaur CPUs * tag 'x86_cpu_for_v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (26 commits) x86/mtrr: Make message for disabled MTRRs more descriptive x86/pat: Handle TDX guest PAT initialization x86/cpuid: Carve out all CPUID functionality x86/cpu: Switch to cpu_feature_enabled() for X86_FEATURE_XENPV x86/cpu: Remove X86_FEATURE_XENPV usage in setup_cpu_entry_area() x86/cpu: Drop 32-bit Xen PV guest code in update_task_stack() x86/cpu: Remove unneeded 64-bit dependency in arch_enter_from_user_mode() x86/cpufeatures: Add X86_FEATURE_XENPV to disabled-features.h x86/acpi/cstate: Optimize ARB_DISABLE on Centaur CPUs x86/mtrr: Simplify mtrr_ops initialization x86/cacheinfo: Switch cache_ap_init() to hotplug callback x86: Decouple PAT and MTRR handling x86/mtrr: Add a stop_machine() handler calling only cache_cpu_init() x86/mtrr: Let cache_aps_delayed_init replace mtrr_aps_delayed_init x86/mtrr: Get rid of __mtrr_enabled bool x86/mtrr: Simplify mtrr_bp_init() x86/mtrr: Remove set_all callback from struct mtrr_ops x86/mtrr: Disentangle MTRR init from PAT init x86/mtrr: Move cache control code to cacheinfo.c x86/mtrr: Split MTRR-specific handling from cache dis/enabling ... |
||
|
|
268325bda5 |
Random number generator updates for Linux 6.2-rc1.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEq5lC5tSkz8NBJiCnSfxwEqXeA64FAmOU+U8ACgkQSfxwEqXe
A67NnQ//Y5DltmvibyPd7r1TFT2gUYv+Rx3sUV9ZE1NYptd/SWhhcL8c5FZ70Fuw
bSKCa1uiWjOxosjXT1kGrWq3de7q7oUpAPSOGxgxzoaNURIt58N/ajItCX/4Au8I
RlGAScHy5e5t41/26a498kB6qJ441fBEqCYKQpPLINMBAhe8TQ+NVp0rlpUwNHFX
WrUGg4oKWxdBIW3HkDirQjJWDkkAiklRTifQh/Al4b6QDbOnRUGGCeckNOhixsvS
waHWTld+Td8jRrA4b82tUb2uVZ2/b8dEvj/A8CuTv4yC0lywoyMgBWmJAGOC+UmT
ZVNdGW02Jc2T+Iap8ZdsEmeLHNqbli4+IcbY5xNlov+tHJ2oz41H9TZoYKbudlr6
/ReAUPSn7i50PhbQlEruj3eg+M2gjOeh8OF8UKwwRK8PghvyWQ1ScW0l3kUhPIhI
PdIG6j4+D2mJc1FIj2rTVB+Bg933x6S+qx4zDxGlNp62AARUFYf6EgyD6aXFQVuX
RxcKb6cjRuFkzFiKc8zkqg5edZH+IJcPNuIBmABqTGBOxbZWURXzIQvK/iULqZa4
CdGAFIs6FuOh8pFHLI3R4YoHBopbHup/xKDEeAO9KZGyeVIuOSERDxxo5f/ITzcq
APvT77DFOEuyvanr8RMqqh0yUjzcddXqw9+ieufsAyDwjD9DTuE=
=QRhK
-----END PGP SIGNATURE-----
Merge tag 'random-6.2-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random
Pull random number generator updates from Jason Donenfeld:
- Replace prandom_u32_max() and various open-coded variants of it,
there is now a new family of functions that uses fast rejection
sampling to choose properly uniformly random numbers within an
interval:
get_random_u32_below(ceil) - [0, ceil)
get_random_u32_above(floor) - (floor, U32_MAX]
get_random_u32_inclusive(floor, ceil) - [floor, ceil]
Coccinelle was used to convert all current users of
prandom_u32_max(), as well as many open-coded patterns, resulting in
improvements throughout the tree.
I'll have a "late" 6.1-rc1 pull for you that removes the now unused
prandom_u32_max() function, just in case any other trees add a new
use case of it that needs to converted. According to linux-next,
there may be two trivial cases of prandom_u32_max() reintroductions
that are fixable with a 's/.../.../'. So I'll have for you a final
conversion patch doing that alongside the removal patch during the
second week.
This is a treewide change that touches many files throughout.
- More consistent use of get_random_canary().
- Updates to comments, documentation, tests, headers, and
simplification in configuration.
- The arch_get_random*_early() abstraction was only used by arm64 and
wasn't entirely useful, so this has been replaced by code that works
in all relevant contexts.
- The kernel will use and manage random seeds in non-volatile EFI
variables, refreshing a variable with a fresh seed when the RNG is
initialized. The RNG GUID namespace is then hidden from efivarfs to
prevent accidental leakage.
These changes are split into random.c infrastructure code used in the
EFI subsystem, in this pull request, and related support inside of
EFISTUB, in Ard's EFI tree. These are co-dependent for full
functionality, but the order of merging doesn't matter.
- Part of the infrastructure added for the EFI support is also used for
an improvement to the way vsprintf initializes its siphash key,
replacing an sleep loop wart.
- The hardware RNG framework now always calls its correct random.c
input function, add_hwgenerator_randomness(), rather than sometimes
going through helpers better suited for other cases.
- The add_latent_entropy() function has long been called from the fork
handler, but is a no-op when the latent entropy gcc plugin isn't
used, which is fine for the purposes of latent entropy.
But it was missing out on the cycle counter that was also being mixed
in beside the latent entropy variable. So now, if the latent entropy
gcc plugin isn't enabled, add_latent_entropy() will expand to a call
to add_device_randomness(NULL, 0), which adds a cycle counter,
without the absent latent entropy variable.
- The RNG is now reseeded from a delayed worker, rather than on demand
when used. Always running from a worker allows it to make use of the
CPU RNG on platforms like S390x, whose instructions are too slow to
do so from interrupts. It also has the effect of adding in new inputs
more frequently with more regularity, amounting to a long term
transcript of random values. Plus, it helps a bit with the upcoming
vDSO implementation (which isn't yet ready for 6.2).
- The jitter entropy algorithm now tries to execute on many different
CPUs, round-robining, in hopes of hitting even more memory latencies
and other unpredictable effects. It also will mix in a cycle counter
when the entropy timer fires, in addition to being mixed in from the
main loop, to account more explicitly for fluctuations in that timer
firing. And the state it touches is now kept within the same cache
line, so that it's assured that the different execution contexts will
cause latencies.
* tag 'random-6.2-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random: (23 commits)
random: include <linux/once.h> in the right header
random: align entropy_timer_state to cache line
random: mix in cycle counter when jitter timer fires
random: spread out jitter callback to different CPUs
random: remove extraneous period and add a missing one in comments
efi: random: refresh non-volatile random seed when RNG is initialized
vsprintf: initialize siphash key using notifier
random: add back async readiness notifier
random: reseed in delayed work rather than on-demand
random: always mix cycle counter in add_latent_entropy()
hw_random: use add_hwgenerator_randomness() for early entropy
random: modernize documentation comment on get_random_bytes()
random: adjust comment to account for removed function
random: remove early archrandom abstraction
random: use random.trust_{bootloader,cpu} command line option only
stackprotector: actually use get_random_canary()
stackprotector: move get_random_canary() into stackprotector.h
treewide: use get_random_u32_inclusive() when possible
treewide: use get_random_u32_{above,below}() instead of manual loop
treewide: use get_random_u32_below() instead of deprecated function
...
|
||
|
|
c1f0fcd85d |
cxl for 6.2
- Add the cpu_cache_invalidate_memregion() API for cache flushing in
response to physical memory reconfiguration, or memory-side data
invalidation from operations like secure erase or memory-device unlock.
- Add a facility for the kernel to warn about collisions between kernel
and userspace access to PCI configuration registers
- Add support for Restricted CXL Host (RCH) topologies (formerly CXL 1.1)
- Add handling and reporting of CXL errors reported via the PCIe AER
mechanism
- Add support for CXL Persistent Memory Security commands
- Add support for the "XOR" algorithm for CXL host bridge interleave
- Rework / simplify CXL to NVDIMM interactions
- Miscellaneous cleanups and fixes
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQSbo+XnGs+rwLz9XGXfioYZHlFsZwUCY5UpyAAKCRDfioYZHlFs
Z0ttAP4uxCjIibKsFVyexpSgI4vaZqQ9yt9NesmPwonc0XookwD+PlwP6Xc0d0Ox
t0gJ6+pwdh11NRzhcNE1pAaPcJZU4gs=
=HAQk
-----END PGP SIGNATURE-----
Merge tag 'cxl-for-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl
Pull cxl updates from Dan Williams:
"Compute Express Link (CXL) updates for 6.2.
While it may seem backwards, the CXL update this time around includes
some focus on CXL 1.x enabling where the work to date had been with
CXL 2.0 (VH topologies) in mind.
First generation CXL can mostly be supported via BIOS, similar to DDR,
however it became clear there are use cases for OS native CXL error
handling and some CXL 3.0 endpoint features can be deployed on CXL 1.x
hosts (Restricted CXL Host (RCH) topologies). So, this update brings
RCH topologies into the Linux CXL device model.
In support of the ongoing CXL 2.0+ enabling two new core kernel
facilities are added.
One is the ability for the kernel to flag collisions between userspace
access to PCI configuration registers and kernel accesses. This is
brought on by the PCIe Data-Object-Exchange (DOE) facility, a hardware
mailbox over config-cycles.
The other is a cpu_cache_invalidate_memregion() API that maps to
wbinvd_on_all_cpus() on x86. To prevent abuse it is disabled in guest
VMs and architectures that do not support it yet. The CXL paths that
need it, dynamic memory region creation and security commands (erase /
unlock), are disabled when it is not present.
As for the CXL 2.0+ this cycle the subsystem gains support Persistent
Memory Security commands, error handling in response to PCIe AER
notifications, and support for the "XOR" host bridge interleave
algorithm.
Summary:
- Add the cpu_cache_invalidate_memregion() API for cache flushing in
response to physical memory reconfiguration, or memory-side data
invalidation from operations like secure erase or memory-device
unlock.
- Add a facility for the kernel to warn about collisions between
kernel and userspace access to PCI configuration registers
- Add support for Restricted CXL Host (RCH) topologies (formerly CXL
1.1)
- Add handling and reporting of CXL errors reported via the PCIe AER
mechanism
- Add support for CXL Persistent Memory Security commands
- Add support for the "XOR" algorithm for CXL host bridge interleave
- Rework / simplify CXL to NVDIMM interactions
- Miscellaneous cleanups and fixes"
* tag 'cxl-for-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (71 commits)
cxl/region: Fix memdev reuse check
cxl/pci: Remove endian confusion
cxl/pci: Add some type-safety to the AER trace points
cxl/security: Drop security command ioctl uapi
cxl/mbox: Add variable output size validation for internal commands
cxl/mbox: Enable cxl_mbox_send_cmd() users to validate output size
cxl/security: Fix Get Security State output payload endian handling
cxl: update names for interleave ways conversion macros
cxl: update names for interleave granularity conversion macros
cxl/acpi: Warn about an invalid CHBCR in an existing CHBS entry
tools/testing/cxl: Require cache invalidation bypass
cxl/acpi: Fail decoder add if CXIMS for HBIG is missing
cxl/region: Fix spelling mistake "memergion" -> "memregion"
cxl/regs: Fix sparse warning
cxl/acpi: Set ACPI's CXL _OSC to indicate RCD mode support
tools/testing/cxl: Add an RCH topology
cxl/port: Add RCD endpoint port enumeration
cxl/mem: Move devm_cxl_add_endpoint() from cxl_core to cxl_mem
tools/testing/cxl: Add XOR Math support to cxl_test
cxl/acpi: Support CXL XOR Interleave Math (CXIMS)
...
|
||
|
|
3e12758392 |
x86/mm/kmmio: Remove redundant preempt_disable()
Now that kmmio uses rcu_read_lock_sched_notrace() there's no reason to call preempt_disable() as the read_lock_sched_notrace() already does that and is redundant. This also removes the preempt_enable_no_resched() as the "no_resched()" portion was bogus as there's no reason to do that. Link: https://lkml.kernel.org/r/20221212103703.7129cc5d@gandalf.local.home Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: "x86@kernel.org" <x86@kernel.org> Cc: Karol Herbst <karolherbst@gmail.com> Cc: Pekka Paalanen <ppaalanen@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> |
||
|
|
2045a3b891 |
mm/sparse-vmemmap: generalise vmemmap_populate_hugepages()
Generalise vmemmap_populate_hugepages() so ARM64 & X86 & LoongArch can share its implementation. Link: https://lkml.kernel.org/r/20221027125253.3458989-4-chenhuacai@loongson.cn Signed-off-by: Feiyang Chen <chenfeiyang@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> Acked-by: Will Deacon <will@kernel.org> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dinh Nguyen <dinguyen@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Jiaxun Yang <jiaxun.yang@flygoat.com> Cc: Min Zhou <zhoumin@loongson.cn> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Philippe Mathieu-Daudé <philmd@linaro.org> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Xuefeng Li <lixuefeng@loongson.cn> Cc: Xuerui Wang <kernel@xen0n.name> Cc: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
|
|
20fb6c9976 |
x86/mm/kmmio: Use rcu_read_lock_sched_notrace()
The mmiotrace tracer is "special". The purpose is to help reverse engineer binary drivers by removing the memory allocated by the driver and when the driver goes to access it, a fault occurs, the mmiotracer will record what the driver was doing and then do the work on its behalf by single stepping through the process. But to achieve this ability, it must do some special things. One is to take the rcu_read_lock() when the fault occurs, and then release it in the breakpoint that is single stepping. This makes lockdep unhappy, as it changes the state of RCU from within an exception that is not contained in that exception, and we get a nasty splat from lockdep. Instead, switch to rcu_read_lock_sched_notrace() as the RCU sched variant has the same grace period as normal RCU. This is basically the same as rcu_read_lock() but does not make lockdep complain about it. Note, the preempt_disable() is still needed as it uses preempt_enable_no_resched(). Link: https://lore.kernel.org/linux-trace-kernel/20221209134144.04f33626@gandalf.local.home Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Karol Herbst <karolherbst@gmail.com> Cc: Pekka Paalanen <ppaalanen@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Acked-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> |
||
|
|
4994e387d7 |
x86/mm/kmmio: Switch to arch_spin_lock()
The mmiotrace tracer is "special". The purpose is to help reverse engineer binary drivers by removing the memory allocated by the driver and when the driver goes to access it, a fault occurs, the mmiotracer will record what the driver was doing and then do the work on its behalf by single stepping through the process. But to achieve this ability, it must do some special things. One is it needs to grab a lock while in the breakpoint handler. This is considered an NMI state, and then lockdep warns that the lock is being held in both an NMI state (really a breakpoint handler) and also in normal context. As the breakpoint/NMI state only happens when the driver is accessing memory, there's no concern of a race condition against the setup and tear-down of mmiotracer. To make lockdep and mmiotrace work together, convert the locks used in the breakpoint handler into arch_spin_lock(). Link: https://lkml.kernel.org/r/20221206191229.656244029@goodmis.org Link: https://lore.kernel.org/lkml/20221201213126.620b7dd3@gandalf.local.home/ Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Karol Herbst <karolherbst@gmail.com> Cc: Pekka Paalanen <ppaalanen@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> |
||
|
|
c11ca45441 |
x86/pat: Handle TDX guest PAT initialization
With the decoupling of PAT and MTRR initialization, PAT will be used even with MTRRs disabled. This seems to break booting up as TDX guest, as the recommended sequence to set the PAT MSR across CPUs can't work in TDX guests due to disabling caches via setting CR0.CD isn't allowed in TDX mode. This is an inconsistency in the Intel documentation between the SDM and the TDX specification. For now handle TDX mode the same way as Xen PV guest mode by just accepting the current PAT MSR setting without trying to modify it. [ bp: Align conditions for better readability. ] Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20221205080433.16643-2-jgross@suse.com |
||
|
|
25226df4b9 |
mm/pgtable: Fix multiple -Wstringop-overflow warnings
The actual size of the following arrays at run-time depends on CONFIG_X86_PAE. 427 pmd_t *u_pmds[MAX_PREALLOCATED_USER_PMDS]; 428 pmd_t *pmds[MAX_PREALLOCATED_PMDS]; If CONFIG_X86_PAE is not enabled, their final size will be zero (which is technically not a legal storage size in C, but remains "valid" via the GNU extension). In that case, the compiler complains about trying to access objects of size zero when calling functions where these objects are passed as arguments. Fix this by sanity-checking the size of those arrays just before the function calls. Also, the following warnings are fixed by these changes when building with GCC 11+ and -Wstringop-overflow enabled: arch/x86/mm/pgtable.c:437:13: warning: ‘preallocate_pmds.constprop’ accessing 8 bytes in a region of size 0 [-Wstringop-overflow=] arch/x86/mm/pgtable.c:440:13: warning: ‘preallocate_pmds.constprop’ accessing 8 bytes in a region of size 0 [-Wstringop-overflow=] arch/x86/mm/pgtable.c:462:9: warning: ‘free_pmds.constprop’ accessing 8 bytes in a region of size 0 [-Wstringop-overflow=] arch/x86/mm/pgtable.c:455:9: warning: ‘pgd_prepopulate_user_pmd’ accessing 8 bytes in a region of size 0 [-Wstringop-overflow=] arch/x86/mm/pgtable.c:464:9: warning: ‘free_pmds.constprop’ accessing 8 bytes in a region of size 0 [-Wstringop-overflow=] This is one of the last cases in the ongoing effort to globally enable -Wstringop-overflow. The alternative to this is to make the originally suggested change: make the pmds argument from an array pointer to a pointer pointer. That situation is considered "legal" for C in the sense that it does not have a way to reason about the storage. i.e.: -static void pgd_prepopulate_pmd(struct mm_struct *mm, pgd_t *pgd, pmd_t *pmds[]) +static void pgd_prepopulate_pmd(struct mm_struct *mm, pgd_t *pgd, pmd_t **pmds) With the above change, there's no difference in binary output, and the compiler warning is silenced. However, with this patch, the compiler can actually figure out that it isn't using the code at all, and it gets dropped: text data bss dec hex filename 8218 718 32 8968 2308 arch/x86/mm/pgtable.o.before 7765 694 32 8491 212b arch/x86/mm/pgtable.o.after So this case (fixing a warning and reducing image size) is a clear win. Additionally drops an old work-around for GCC in the same code. Link: https://github.com/KSPP/linux/issues/203 Link: https://github.com/KSPP/linux/issues/181 Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/Yytb67xvrnctxnEe@work |
||
|
|
a38358c934 | Merge branch 'mm-hotfixes-stable' into mm-stable | ||
|
|
d76c4f7a61 |
x86/cpu: Remove X86_FEATURE_XENPV usage in setup_cpu_entry_area()
Testing of X86_FEATURE_XENPV in setup_cpu_entry_area() can be removed, as this code path is 32-bit only, and Xen PV guests are 64-bit only. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/r/20221104072701.20283-5-jgross@suse.com |
||
|
|
4dbd6a3e90 |
x86/ioremap: Fix page aligned size calculation in __ioremap_caller()
Current code re-calculates the size after aligning the starting and
ending physical addresses on a page boundary. But the re-calculation
also embeds the masking of high order bits that exceed the size of
the physical address space (via PHYSICAL_PAGE_MASK). If the masking
removes any high order bits, the size calculation results in a huge
value that is likely to immediately fail.
Fix this by re-calculating the page-aligned size first. Then mask any
high order bits using PHYSICAL_PAGE_MASK.
Fixes:
|
||
|
|
8032bf1233 |
treewide: use get_random_u32_below() instead of deprecated function
This is a simple mechanical transformation done by: @@ expression E; @@ - prandom_u32_max + get_random_u32_below (E) Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Darrick J. Wong <djwong@kernel.org> # for xfs Reviewed-by: SeongJae Park <sj@kernel.org> # for damon Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> # for infiniband Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> # for arm Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # for mmc Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> |
||
|
|
1156b4418d |
memregion: Add cpu_cache_invalidate_memregion() interface
With CXL security features, and CXL dynamic provisioning, global CPU cache flushing nvdimm requirements are no longer specific to that subsystem, even beyond the scope of security_ops. CXL will need such semantics for features not necessarily limited to persistent memory. The functionality this is enabling is to be able to instantaneously secure erase potentially terabytes of memory at once and the kernel needs to be sure that none of the data from before the erase is still present in the cache. It is also used when unlocking a memory device where speculative reads and firmware accesses could have cached poison from before the device was unlocked. Lastly this facility is used when mapping new devices, or new capacity into an established physical address range. I.e. when the driver switches DeviceA mapping AddressX to DeviceB mapping AddressX then any cached data from DeviceA:AddressX needs to be invalidated. This capability is typically only used once per-boot (for unlock), or once per bare metal provisioning event (secure erase), like when handing off the system to another tenant or decommissioning a device. It may also be used for dynamic CXL region provisioning. Users must first call cpu_cache_has_invalidate_memregion() to know whether this functionality is available on the architecture. On x86 this respects the constraints of when wbinvd() is tolerable. It is already the case that wbinvd() is problematic to allow in VMs due its global performance impact and KVM, for example, has been known to just trap and ignore the call. With confidential computing guest execution of wbinvd() may even trigger an exception. Given guests should not be messing with the bare metal address map via CXL configuration changes cpu_cache_has_invalidate_memregion() returns false in VMs. While this global cache invalidation facility, is exported to modules, since NVDIMM and CXL support can be built as a module, it is not for general use. The intent is that this facility is not available outside of specific "device-memory" use cases. To make that expectation as clear as possible the API is scoped to a new "DEVMEM" module namespace that only the NVDIMM and CXL subsystems are expected to import. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: x86@kernel.org Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Tested-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Davidlohr Bueso <dave@stgolabs.net> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Co-developed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> |
||
|
|
adfe7512e1 |
x86: Decouple PAT and MTRR handling
Today, PAT is usable only with MTRR being active, with some nasty tweaks to make PAT usable when running as a Xen PV guest which doesn't support MTRR. The reason for this coupling is that both PAT MSR changes and MTRR changes require a similar sequence and so full PAT support was added using the already available MTRR handling. Xen PV PAT handling can work without MTRR, as it just needs to consume the PAT MSR setting done by the hypervisor without the ability and need to change it. This in turn has resulted in a convoluted initialization sequence and wrong decisions regarding cache mode availability due to misguiding PAT availability flags. Fix all of that by allowing to use PAT without MTRR and by reworking the current PAT initialization sequence to match better with the newly introduced generic cache initialization. This removes the need of the recently added pat_force_disabled flag, so remove the remnants of the patch adding it. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20221102074713.21493-14-jgross@suse.com Signed-off-by: Borislav Petkov <bp@suse.de> |
||
|
|
e025ab842e |
mm: remove kern_addr_valid() completely
Most architectures (except arm64/x86/sparc) simply return 1 for kern_addr_valid(), which is only used in read_kcore(), and it calls copy_from_kernel_nofault() which could check whether the address is a valid kernel address. So as there is no need for kern_addr_valid(), let's remove it. Link: https://lkml.kernel.org/r/20221018074014.185687-1-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k] Acked-by: Heiko Carstens <hca@linux.ibm.com> [s390] Acked-by: Christoph Hellwig <hch@lst.de> Acked-by: Helge Deller <deller@gmx.de> [parisc] Acked-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc] Acked-by: Guo Ren <guoren@kernel.org> [csky] Acked-by: Catalin Marinas <catalin.marinas@arm.com> [arm64] Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com> Cc: <aou@eecs.berkeley.edu> Cc: Borislav Petkov <bp@alien8.de> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Chris Zankel <chris@zankel.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David S. Miller <davem@davemloft.net> Cc: Dinh Nguyen <dinguyen@kernel.org> Cc: Greg Ungerer <gerg@linux-m68k.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Jonas Bonn <jonas@southpole.se> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michal Simek <monstr@monstr.eu> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Palmer Dabbelt <palmer@rivosinc.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Richard Henderson <richard.henderson@linaro.org> Cc: Richard Weinberger <richard@nod.at> Cc: Rich Felker <dalias@libc.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vineet Gupta <vgupta@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: Xuerui Wang <kernel@xen0n.name> Cc: Yoshinori Sato <ysato@users.osdn.me> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
|
|
1fdbed657a |
arch/x86/mm/hugetlbpage.c: pud_huge() returns 0 when using 2-level paging
The following bug is reported to be triggered when starting X on x86-32 system with i915: [ 225.777375] kernel BUG at mm/memory.c:2664! [ 225.777391] invalid opcode: 0000 [#1] PREEMPT SMP [ 225.777405] CPU: 0 PID: 2402 Comm: Xorg Not tainted 6.1.0-rc3-bdg+ #86 [ 225.777415] Hardware name: /8I865G775-G, BIOS F1 08/29/2006 [ 225.777421] EIP: __apply_to_page_range+0x24d/0x31c [ 225.777437] Code: ff ff 8b 55 e8 8b 45 cc e8 0a 11 ec ff 89 d8 83 c4 28 5b 5e 5f 5d c3 81 7d e0 a0 ef 96 c1 74 ad 8b 45 d0 e8 2d 83 49 00 eb a3 <0f> 0b 25 00 f0 ff ff 81 eb 00 00 00 40 01 c3 8b 45 ec 8b 00 e8 76 [ 225.777446] EAX: 00000001 EBX: c53a3b58 ECX: b5c00000 EDX: c258aa00 [ 225.777454] ESI: b5c00000 EDI: b5900000 EBP: c4b0fdb4 ESP: c4b0fd80 [ 225.777462] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 EFLAGS: 00010202 [ 225.777470] CR0: 80050033 CR2: b5900000 CR3: 053a3000 CR4: 000006d0 [ 225.777479] Call Trace: [ 225.777486] ? i915_memcpy_init_early+0x63/0x63 [i915] [ 225.777684] apply_to_page_range+0x21/0x27 [ 225.777694] ? i915_memcpy_init_early+0x63/0x63 [i915] [ 225.777870] remap_io_mapping+0x49/0x75 [i915] [ 225.778046] ? i915_memcpy_init_early+0x63/0x63 [i915] [ 225.778220] ? mutex_unlock+0xb/0xd [ 225.778231] ? i915_vma_pin_fence+0x6d/0xf7 [i915] [ 225.778420] vm_fault_gtt+0x2a9/0x8f1 [i915] [ 225.778644] ? lock_is_held_type+0x56/0xe7 [ 225.778655] ? lock_is_held_type+0x7a/0xe7 [ 225.778663] ? 0xc1000000 [ 225.778670] __do_fault+0x21/0x6a [ 225.778679] handle_mm_fault+0x708/0xb21 [ 225.778686] ? mt_find+0x21e/0x5ae [ 225.778696] exc_page_fault+0x185/0x705 [ 225.778704] ? doublefault_shim+0x127/0x127 [ 225.778715] handle_exception+0x130/0x130 [ 225.778723] EIP: 0xb700468a Recently pud_huge() got aware of non-present entry by commit |
||
|
|
a970174d7a |
x86/mm: Do not verify W^X at boot up
Adding on the kernel command line "ftrace=function" triggered:
CPA detected W^X violation: 8000000000000063 -> 0000000000000063 range: 0xffffffffc0013000 - 0xffffffffc0013fff PFN 10031b
WARNING: CPU: 0 PID: 0 at arch/x86/mm/pat/set_memory.c:609
verify_rwx+0x61/0x6d
Call Trace:
__change_page_attr_set_clr+0x146/0x8a6
change_page_attr_set_clr+0x135/0x268
change_page_attr_clear.constprop.0+0x16/0x1c
set_memory_x+0x2c/0x32
arch_ftrace_update_trampoline+0x218/0x2db
ftrace_update_trampoline+0x16/0xa1
__register_ftrace_function+0x93/0xb2
ftrace_startup+0x21/0xf0
register_ftrace_function_nolock+0x26/0x40
register_ftrace_function+0x4e/0x143
function_trace_init+0x7d/0xc3
tracer_init+0x23/0x2c
tracing_set_tracer+0x1d5/0x206
register_tracer+0x1c0/0x1e4
init_function_trace+0x90/0x96
early_trace_init+0x25c/0x352
start_kernel+0x424/0x6e4
x86_64_start_reservations+0x24/0x2a
x86_64_start_kernel+0x8c/0x95
secondary_startup_64_no_verify+0xe0/0xeb
This is because at boot up, kernel text is writable, and there's no
reason to do tricks to updated it. But the verifier does not
distinguish updates at boot up and at run time, and causes a warning at
time of boot.
Add a check for system_state == SYSTEM_BOOTING and allow it if that is
the case.
[ These SYSTEM_BOOTING special cases are all pretty horrid, but the x86
text_poke() code does some odd things at bootup, forcing this for now
- Linus ]
Link: https://lore.kernel.org/r/20221024112730.180916b3@gandalf.local.home
Fixes:
|