Commit Graph

8943 Commits

Author SHA1 Message Date
Linus Torvalds
70cc1b5307 powerpc updates for 6.4
- Add support for building the kernel using PC-relative addressing on Power10.
 
  - Allow HV KVM guests on Power10 to use prefixed instructions.
 
  - Unify support for the P2020 CPU (85xx) into a single machine description.
 
  - Always build the 64-bit kernel with 128-bit long double.
 
  - Drop support for several obsolete 2000's era development boards as
    identified by Paul Gortmaker.
 
  - A series fixing VFIO on Power since some generic changes.
 
  - Various other small features and fixes.
 
 Thanks to: Alexey Kardashevskiy, Andrew Donnellan, Benjamin Gray, Bo Liu,
 Christophe Leroy, Dan Carpenter, David Binderman, Ira Weiny, Joel Stanley,
 Kajol Jain, Kautuk Consul, Liang He, Luis Chamberlain, Masahiro Yamada, Michael
 Neuling, Nathan Chancellor, Nathan Lynch, Nicholas Miehlbradt, Nicholas Piggin,
 Nick Desaulniers, Nysal Jan K.A, Pali Rohár, Paul Gortmaker, Paul Mackerras,
 Petr Vaněk, Randy Dunlap, Rob Herring, Sachin Sant, Sean Christopherson, Segher
 Boessenkool, Timothy Pearson.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmRLdD8THG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgPrED/93Hwi1/8udXYV6OAFBnLLLDX9RZWNx
 sj6W7PC/yY7MGUTIGcayrAURt5xFfQZMBdq9nhhH46Wyd8pSUe1IlpXIEgqeH64Y
 5uaSe6u4OZ5dDmiYz8bM+Y4Ixkfq1xMO0Rj27FIRJyU4Pp6gyMQ6/8W+iPU2vYIb
 jl4frVO8PpmCbW8euOyT/b9YB2h79+5nLgZT4RvmYJblKQwgzRZWaiCAU0wYf+xT
 xYsMQcqJlPstizTnXIr+GC08VrMPQR51kJCurnNUMXoAQY6toEXebveWRNZ3sH39
 K0BRQ036NNGPS4GlHVbjgLIdoWr4pUEROZ48Jy9WeiDh6OwO2vb2zrm7ZLtKlGXI
 LCQ08T9diPbAAJyVJaKjpsNXhTffuLPRhIOr1o2vqGvY+Fqbw96bPQAyFbxpJtrw
 rGTTc5e93fEdZV+eaGR8kfdNM75WM6lgiteaIbUnpirYTh/0j6YEO1Ijc4d9e5ux
 aoHN2eXQDjMm9foZf1D6vaCTRyN8LwLa9hcydKCn6+qULUQzoZ2r4cRt6eSD7+fx
 Wni1Zg7vD6xx294nVmhg5dWuD4qVIAZj3BZPFHHR2SPqy+UbEJLk/NKgznmrhhgQ
 HBcZAEFqIBZkA4e0w6LJKh/1j1rxpZUokgjMotOJq4VRivq82W1P7SioFDY3/6w5
 UvGtIgGq4Qsa2Q==
 =89WH
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-6.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Michael Ellerman:

 - Add support for building the kernel using PC-relative addressing on
   Power10.

 - Allow HV KVM guests on Power10 to use prefixed instructions.

 - Unify support for the P2020 CPU (85xx) into a single machine
   description.

 - Always build the 64-bit kernel with 128-bit long double.

 - Drop support for several obsolete 2000's era development boards as
   identified by Paul Gortmaker.

 - A series fixing VFIO on Power since some generic changes.

 - Various other small features and fixes.

Thanks to Alexey Kardashevskiy, Andrew Donnellan, Benjamin Gray, Bo Liu,
Christophe Leroy, Dan Carpenter, David Binderman, Ira Weiny, Joel
Stanley, Kajol Jain, Kautuk Consul, Liang He, Luis Chamberlain, Masahiro
Yamada, Michael Neuling, Nathan Chancellor, Nathan Lynch, Nicholas
Miehlbradt, Nicholas Piggin, Nick Desaulniers, Nysal Jan K.A, Pali
Rohár, Paul Gortmaker, Paul Mackerras, Petr Vaněk, Randy Dunlap, Rob
Herring, Sachin Sant, Sean Christopherson, Segher Boessenkool, and
Timothy Pearson.

* tag 'powerpc-6.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (156 commits)
  powerpc/64s: Disable pcrel code model on Clang
  powerpc: Fix merge conflict between pcrel and copy_thread changes
  powerpc/configs/powernv: Add IGB=y
  powerpc/configs/64s: Drop JFS Filesystem
  powerpc/configs/64s: Use EXT4 to mount EXT2 filesystems
  powerpc/configs: Make pseries_defconfig an alias for ppc64le_guest
  powerpc/configs: Make pseries_le an alias for ppc64le_guest
  powerpc/configs: Incorporate generic kvm_guest.config into guest configs
  powerpc/configs: Add IBMVETH=y and IBMVNIC=y to guest configs
  powerpc/configs/64s: Enable Device Mapper options
  powerpc/configs/64s: Enable PSTORE
  powerpc/configs/64s: Enable VLAN support
  powerpc/configs/64s: Enable BLK_DEV_NVME
  powerpc/configs/64s: Drop REISERFS
  powerpc/configs/64s: Use SHA512 for module signatures
  powerpc/configs/64s: Enable IO_STRICT_DEVMEM
  powerpc/configs/64s: Enable SCHEDSTATS
  powerpc/configs/64s: Enable DEBUG_VM & other options
  powerpc/configs/64s: Enable EMULATED_STATS
  powerpc/configs/64s: Enable KUNIT and most tests
  ...
2023-04-28 16:24:32 -07:00
Linus Torvalds
f20730efbd SMP cross-CPU function-call updates for v6.4:
- Remove diagnostics and adjust config for CSD lock diagnostics
 
  - Add a generic IPI-sending tracepoint, as currently there's no easy
    way to instrument IPI origins: it's arch dependent and for some
    major architectures it's not even consistently available.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmRK438RHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1jJ5Q/5AZ0HGpyqwdFK8GmGznyu5qjP5HwV9pPq
 gZQScqSy4tZEeza4TFMi83CoXSg9uJ7GlYJqqQMKm78LGEPomnZtXXC7oWvTA9M5
 M/jAvzytmvZloSCXV6kK7jzSejMHhag97J/BjTYhZYQpJ9T+hNC87XO6J6COsKr9
 lPIYqkFrIkQNr6B0U11AQfFejRYP1ics2fnbnZL86G/zZAc6x8EveM3KgSer2iHl
 KbrO+xcYyGY8Ef9P2F72HhEGFfM3WslpT1yzqR3sm4Y+fuMG0oW3qOQuMJx0ZhxT
 AloterY0uo6gJwI0P9k/K4klWgz81Tf/zLb0eBAtY2uJV9Fo3YhPHuZC7jGPGAy3
 JusW2yNYqc8erHVEMAKDUsl/1KN4TE2uKlkZy98wno+KOoMufK5MA2e2kPPqXvUi
 Jk9RvFolnWUsexaPmCftti0OCv3YFiviVAJ/t0pchfmvvJA2da0VC9hzmEXpLJVF
 25nBTV/1uAOrWvOpCyo3ElrC2CkQVkFmK5rXMDdvf6ib0Nid4vFcCkCSLVfu+ePB
 11mi7QYro+CcnOug1K+yKogUDmsZgV/u1kUwgQzTIpZ05Kkb49gUiXw9L2RGcBJh
 yoDoiI66KPR7PWQ2qBdQoXug4zfEEtWG0O9HNLB0FFRC3hu7I+HHyiUkBWs9jasK
 PA5+V7HcQRk=
 =Wp7f
 -----END PGP SIGNATURE-----

Merge tag 'smp-core-2023-04-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull SMP cross-CPU function-call updates from Ingo Molnar:

 - Remove diagnostics and adjust config for CSD lock diagnostics

 - Add a generic IPI-sending tracepoint, as currently there's no easy
   way to instrument IPI origins: it's arch dependent and for some major
   architectures it's not even consistently available.

* tag 'smp-core-2023-04-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  trace,smp: Trace all smp_function_call*() invocations
  trace: Add trace_ipi_send_cpu()
  sched, smp: Trace smp callback causing an IPI
  smp: reword smp call IPI comment
  treewide: Trace IPIs sent via smp_send_reschedule()
  irq_work: Trace self-IPIs sent via arch_irq_work_raise()
  smp: Trace IPIs sent via arch_send_call_function_ipi_mask()
  sched, smp: Trace IPIs sent via send_call_function_single_ipi()
  trace: Add trace_ipi_send_cpumask()
  kernel/smp: Make csdlock_debug= resettable
  locking/csd_lock: Remove per-CPU data indirection from CSD lock debugging
  locking/csd_lock: Remove added data from CSD lock debugging
  locking/csd_lock: Add Kconfig option for csd_debug default
2023-04-28 15:03:43 -07:00
Linus Torvalds
2aff7c706c Objtool changes for v6.4:
- Mark arch_cpu_idle_dead() __noreturn, make all architectures & drivers that did
    this inconsistently follow this new, common convention, and fix all the fallout
    that objtool can now detect statically.
 
  - Fix/improve the ORC unwinder becoming unreliable due to UNWIND_HINT_EMPTY ambiguity,
    split it into UNWIND_HINT_END_OF_STACK and UNWIND_HINT_UNDEFINED to resolve it.
 
  - Fix noinstr violations in the KCSAN code and the lkdtm/stackleak code.
 
  - Generate ORC data for __pfx code
 
  - Add more __noreturn annotations to various kernel startup/shutdown/panic functions.
 
  - Misc improvements & fixes.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmRK1x0RHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1ghxQ/+IkCynMYtdF5OG9YwbcGJqsPSfOPMEcEM
 pUSFYg+gGPBDT/fJfcVSqvUtdnWbLC2kXt9yiswXz3X3J2nmNkBk5YKQftsNDcul
 TmKeqIIAK51XTncpegKH0EGnOX63oZ9Vxa8CTPdDlb+YF23Km2FoudGRI9F5qbUd
 LoraXqGYeiaeySkGyWmZVl6Uc8dIxnMkTN3H/oI9aB6TOrsi059hAtFcSaFfyemP
 c4LqXXCH7k2baiQt+qaLZ8cuZVG/+K5r2N2cmjO5kmJc6ynIaFnfMe4XxZLjp5LT
 /PulYI15bXkvSARKx5CRh/CDHMOx5Blw+ASO0RhWbdy0WH4ZhhcaVF5AeIpPW86a
 1LBcz97rMp72WmvKgrJeVO1r9+ll4SI6/YKGJRsxsCMdP3hgFpqntXyVjTFNdTM1
 0gH6H5v55x06vJHvhtTk8SR3PfMTEM2fRU5jXEOrGowoGifx+wNUwORiwj6LE3KQ
 SKUdT19RNzoW3VkFxhgk65ThK1S7YsJUKRoac3YdhttpqqqtFV//erenrZoR4k/p
 vzvKy68EQ7RCNyD5wNWNFe0YjeJl5G8gQ8bUm4Xmab7djjgz+pn4WpQB8yYKJLAo
 x9dqQ+6eUbw3Hcgk6qQ9E+r/svbulnAL0AeALAWK/91DwnZ2mCzKroFkLN7napKi
 fRho4CqzrtM=
 =NwEV
 -----END PGP SIGNATURE-----

Merge tag 'objtool-core-2023-04-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull objtool updates from Ingo Molnar:

 - Mark arch_cpu_idle_dead() __noreturn, make all architectures &
   drivers that did this inconsistently follow this new, common
   convention, and fix all the fallout that objtool can now detect
   statically

 - Fix/improve the ORC unwinder becoming unreliable due to
   UNWIND_HINT_EMPTY ambiguity, split it into UNWIND_HINT_END_OF_STACK
   and UNWIND_HINT_UNDEFINED to resolve it

 - Fix noinstr violations in the KCSAN code and the lkdtm/stackleak code

 - Generate ORC data for __pfx code

 - Add more __noreturn annotations to various kernel startup/shutdown
   and panic functions

 - Misc improvements & fixes

* tag 'objtool-core-2023-04-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (52 commits)
  x86/hyperv: Mark hv_ghcb_terminate() as noreturn
  scsi: message: fusion: Mark mpt_halt_firmware() __noreturn
  x86/cpu: Mark {hlt,resume}_play_dead() __noreturn
  btrfs: Mark btrfs_assertfail() __noreturn
  objtool: Include weak functions in global_noreturns check
  cpu: Mark nmi_panic_self_stop() __noreturn
  cpu: Mark panic_smp_self_stop() __noreturn
  arm64/cpu: Mark cpu_park_loop() and friends __noreturn
  x86/head: Mark *_start_kernel() __noreturn
  init: Mark start_kernel() __noreturn
  init: Mark [arch_call_]rest_init() __noreturn
  objtool: Generate ORC data for __pfx code
  x86/linkage: Fix padding for typed functions
  objtool: Separate prefix code from stack validation code
  objtool: Remove superfluous dead_end_function() check
  objtool: Add symbol iteration helpers
  objtool: Add WARN_INSN()
  scripts/objdump-func: Support multiple functions
  context_tracking: Fix KCSAN noinstr violation
  objtool: Add stackleak instrumentation to uaccess safe list
  ...
2023-04-28 14:02:54 -07:00
Linus Torvalds
7fa8a8ee94 - Nick Piggin's "shoot lazy tlbs" series, to improve the peformance of
switching from a user process to a kernel thread.
 
 - More folio conversions from Kefeng Wang, Zhang Peng and Pankaj Raghav.
 
 - zsmalloc performance improvements from Sergey Senozhatsky.
 
 - Yue Zhao has found and fixed some data race issues around the
   alteration of memcg userspace tunables.
 
 - VFS rationalizations from Christoph Hellwig:
 
   - removal of most of the callers of write_one_page().
 
   - make __filemap_get_folio()'s return value more useful
 
 - Luis Chamberlain has changed tmpfs so it no longer requires swap
   backing.  Use `mount -o noswap'.
 
 - Qi Zheng has made the slab shrinkers operate locklessly, providing
   some scalability benefits.
 
 - Keith Busch has improved dmapool's performance, making part of its
   operations O(1) rather than O(n).
 
 - Peter Xu adds the UFFD_FEATURE_WP_UNPOPULATED feature to userfaultd,
   permitting userspace to wr-protect anon memory unpopulated ptes.
 
 - Kirill Shutemov has changed MAX_ORDER's meaning to be inclusive rather
   than exclusive, and has fixed a bunch of errors which were caused by its
   unintuitive meaning.
 
 - Axel Rasmussen give userfaultfd the UFFDIO_CONTINUE_MODE_WP feature,
   which causes minor faults to install a write-protected pte.
 
 - Vlastimil Babka has done some maintenance work on vma_merge():
   cleanups to the kernel code and improvements to our userspace test
   harness.
 
 - Cleanups to do_fault_around() by Lorenzo Stoakes.
 
 - Mike Rapoport has moved a lot of initialization code out of various
   mm/ files and into mm/mm_init.c.
 
 - Lorenzo Stoakes removd vmf_insert_mixed_prot(), which was added for
   DRM, but DRM doesn't use it any more.
 
 - Lorenzo has also coverted read_kcore() and vread() to use iterators
   and has thereby removed the use of bounce buffers in some cases.
 
 - Lorenzo has also contributed further cleanups of vma_merge().
 
 - Chaitanya Prakash provides some fixes to the mmap selftesting code.
 
 - Matthew Wilcox changes xfs and afs so they no longer take sleeping
   locks in ->map_page(), a step towards RCUification of pagefaults.
 
 - Suren Baghdasaryan has improved mmap_lock scalability by switching to
   per-VMA locking.
 
 - Frederic Weisbecker has reworked the percpu cache draining so that it
   no longer causes latency glitches on cpu isolated workloads.
 
 - Mike Rapoport cleans up and corrects the ARCH_FORCE_MAX_ORDER Kconfig
   logic.
 
 - Liu Shixin has changed zswap's initialization so we no longer waste a
   chunk of memory if zswap is not being used.
 
 - Yosry Ahmed has improved the performance of memcg statistics flushing.
 
 - David Stevens has fixed several issues involving khugepaged,
   userfaultfd and shmem.
 
 - Christoph Hellwig has provided some cleanup work to zram's IO-related
   code paths.
 
 - David Hildenbrand has fixed up some issues in the selftest code's
   testing of our pte state changing.
 
 - Pankaj Raghav has made page_endio() unneeded and has removed it.
 
 - Peter Xu contributed some rationalizations of the userfaultfd
   selftests.
 
 - Yosry Ahmed has fixed an issue around memcg's page recalim accounting.
 
 - Chaitanya Prakash has fixed some arm-related issues in the
   selftests/mm code.
 
 - Longlong Xia has improved the way in which KSM handles hwpoisoned
   pages.
 
 - Peter Xu fixes a few issues with uffd-wp at fork() time.
 
 - Stefan Roesch has changed KSM so that it may now be used on a
   per-process and per-cgroup basis.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZEr3zQAKCRDdBJ7gKXxA
 jlLoAP0fpQBipwFxED0Us4SKQfupV6z4caXNJGPeay7Aj11/kQD/aMRC2uPfgr96
 eMG3kwn2pqkB9ST2QpkaRbxA//eMbQY=
 =J+Dj
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2023-04-27-15-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

 - Nick Piggin's "shoot lazy tlbs" series, to improve the peformance of
   switching from a user process to a kernel thread.

 - More folio conversions from Kefeng Wang, Zhang Peng and Pankaj
   Raghav.

 - zsmalloc performance improvements from Sergey Senozhatsky.

 - Yue Zhao has found and fixed some data race issues around the
   alteration of memcg userspace tunables.

 - VFS rationalizations from Christoph Hellwig:
     - removal of most of the callers of write_one_page()
     - make __filemap_get_folio()'s return value more useful

 - Luis Chamberlain has changed tmpfs so it no longer requires swap
   backing. Use `mount -o noswap'.

 - Qi Zheng has made the slab shrinkers operate locklessly, providing
   some scalability benefits.

 - Keith Busch has improved dmapool's performance, making part of its
   operations O(1) rather than O(n).

 - Peter Xu adds the UFFD_FEATURE_WP_UNPOPULATED feature to userfaultd,
   permitting userspace to wr-protect anon memory unpopulated ptes.

 - Kirill Shutemov has changed MAX_ORDER's meaning to be inclusive
   rather than exclusive, and has fixed a bunch of errors which were
   caused by its unintuitive meaning.

 - Axel Rasmussen give userfaultfd the UFFDIO_CONTINUE_MODE_WP feature,
   which causes minor faults to install a write-protected pte.

 - Vlastimil Babka has done some maintenance work on vma_merge():
   cleanups to the kernel code and improvements to our userspace test
   harness.

 - Cleanups to do_fault_around() by Lorenzo Stoakes.

 - Mike Rapoport has moved a lot of initialization code out of various
   mm/ files and into mm/mm_init.c.

 - Lorenzo Stoakes removd vmf_insert_mixed_prot(), which was added for
   DRM, but DRM doesn't use it any more.

 - Lorenzo has also coverted read_kcore() and vread() to use iterators
   and has thereby removed the use of bounce buffers in some cases.

 - Lorenzo has also contributed further cleanups of vma_merge().

 - Chaitanya Prakash provides some fixes to the mmap selftesting code.

 - Matthew Wilcox changes xfs and afs so they no longer take sleeping
   locks in ->map_page(), a step towards RCUification of pagefaults.

 - Suren Baghdasaryan has improved mmap_lock scalability by switching to
   per-VMA locking.

 - Frederic Weisbecker has reworked the percpu cache draining so that it
   no longer causes latency glitches on cpu isolated workloads.

 - Mike Rapoport cleans up and corrects the ARCH_FORCE_MAX_ORDER Kconfig
   logic.

 - Liu Shixin has changed zswap's initialization so we no longer waste a
   chunk of memory if zswap is not being used.

 - Yosry Ahmed has improved the performance of memcg statistics
   flushing.

 - David Stevens has fixed several issues involving khugepaged,
   userfaultfd and shmem.

 - Christoph Hellwig has provided some cleanup work to zram's IO-related
   code paths.

 - David Hildenbrand has fixed up some issues in the selftest code's
   testing of our pte state changing.

 - Pankaj Raghav has made page_endio() unneeded and has removed it.

 - Peter Xu contributed some rationalizations of the userfaultfd
   selftests.

 - Yosry Ahmed has fixed an issue around memcg's page recalim
   accounting.

 - Chaitanya Prakash has fixed some arm-related issues in the
   selftests/mm code.

 - Longlong Xia has improved the way in which KSM handles hwpoisoned
   pages.

 - Peter Xu fixes a few issues with uffd-wp at fork() time.

 - Stefan Roesch has changed KSM so that it may now be used on a
   per-process and per-cgroup basis.

* tag 'mm-stable-2023-04-27-15-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (369 commits)
  mm,unmap: avoid flushing TLB in batch if PTE is inaccessible
  shmem: restrict noswap option to initial user namespace
  mm/khugepaged: fix conflicting mods to collapse_file()
  sparse: remove unnecessary 0 values from rc
  mm: move 'mmap_min_addr' logic from callers into vm_unmapped_area()
  hugetlb: pte_alloc_huge() to replace huge pte_alloc_map()
  maple_tree: fix allocation in mas_sparse_area()
  mm: do not increment pgfault stats when page fault handler retries
  zsmalloc: allow only one active pool compaction context
  selftests/mm: add new selftests for KSM
  mm: add new KSM process and sysfs knobs
  mm: add new api to enable ksm per process
  mm: shrinkers: fix debugfs file permissions
  mm: don't check VMA write permissions if the PTE/PMD indicates write permissions
  migrate_pages_batch: fix statistics for longterm pin retry
  userfaultfd: use helper function range_in_vma()
  lib/show_mem.c: use for_each_populated_zone() simplify code
  mm: correct arg in reclaim_pages()/reclaim_clean_pages_from_list()
  fs/buffer: convert create_page_buffers to folio_create_buffers
  fs/buffer: add folio_create_empty_buffers helper
  ...
2023-04-27 19:42:02 -07:00
Linus Torvalds
b6a7828502 modules-6.4-rc1
The summary of the changes for this pull requests is:
 
  * Song Liu's new struct module_memory replacement
  * Nick Alcock's MODULE_LICENSE() removal for non-modules
  * My cleanups and enhancements to reduce the areas where we vmalloc
    module memory for duplicates, and the respective debug code which
    proves the remaining vmalloc pressure comes from userspace.
 
 Most of the changes have been in linux-next for quite some time except
 the minor fixes I made to check if a module was already loaded
 prior to allocating the final module memory with vmalloc and the
 respective debug code it introduces to help clarify the issue. Although
 the functional change is small it is rather safe as it can only *help*
 reduce vmalloc space for duplicates and is confirmed to fix a bootup
 issue with over 400 CPUs with KASAN enabled. I don't expect stable
 kernels to pick up that fix as the cleanups would have also had to have
 been picked up. Folks on larger CPU systems with modules will want to
 just upgrade if vmalloc space has been an issue on bootup.
 
 Given the size of this request, here's some more elaborate details
 on this pull request.
 
 The functional change change in this pull request is the very first
 patch from Song Liu which replaces the struct module_layout with a new
 struct module memory. The old data structure tried to put together all
 types of supported module memory types in one data structure, the new
 one abstracts the differences in memory types in a module to allow each
 one to provide their own set of details. This paves the way in the
 future so we can deal with them in a cleaner way. If you look at changes
 they also provide a nice cleanup of how we handle these different memory
 areas in a module. This change has been in linux-next since before the
 merge window opened for v6.3 so to provide more than a full kernel cycle
 of testing. It's a good thing as quite a bit of fixes have been found
 for it.
 
 Jason Baron then made dynamic debug a first class citizen module user by
 using module notifier callbacks to allocate / remove module specific
 dynamic debug information.
 
 Nick Alcock has done quite a bit of work cross-tree to remove module
 license tags from things which cannot possibly be module at my request
 so to:
 
   a) help him with his longer term tooling goals which require a
      deterministic evaluation if a piece a symbol code could ever be
      part of a module or not. But quite recently it is has been made
      clear that tooling is not the only one that would benefit.
      Disambiguating symbols also helps efforts such as live patching,
      kprobes and BPF, but for other reasons and R&D on this area
      is active with no clear solution in sight.
 
   b) help us inch closer to the now generally accepted long term goal
      of automating all the MODULE_LICENSE() tags from SPDX license tags
 
 In so far as a) is concerned, although module license tags are a no-op
 for non-modules, tools which would want create a mapping of possible
 modules can only rely on the module license tag after the commit
 8b41fc4454 ("kbuild: create modules.builtin without Makefile.modbuiltin
 or tristate.conf").  Nick has been working on this *for years* and
 AFAICT I was the only one to suggest two alternatives to this approach
 for tooling. The complexity in one of my suggested approaches lies in
 that we'd need a possible-obj-m and a could-be-module which would check
 if the object being built is part of any kconfig build which could ever
 lead to it being part of a module, and if so define a new define
 -DPOSSIBLE_MODULE [0]. A more obvious yet theoretical approach I've
 suggested would be to have a tristate in kconfig imply the same new
 -DPOSSIBLE_MODULE as well but that means getting kconfig symbol names
 mapping to modules always, and I don't think that's the case today. I am
 not aware of Nick or anyone exploring either of these options. Quite
 recently Josh Poimboeuf has pointed out that live patching, kprobes and
 BPF would benefit from resolving some part of the disambiguation as
 well but for other reasons. The function granularity KASLR (fgkaslr)
 patches were mentioned but Joe Lawrence has clarified this effort has
 been dropped with no clear solution in sight [1].
 
 In the meantime removing module license tags from code which could never
 be modules is welcomed for both objectives mentioned above. Some
 developers have also welcomed these changes as it has helped clarify
 when a module was never possible and they forgot to clean this up,
 and so you'll see quite a bit of Nick's patches in other pull
 requests for this merge window. I just picked up the stragglers after
 rc3. LWN has good coverage on the motivation behind this work [2] and
 the typical cross-tree issues he ran into along the way. The only
 concrete blocker issue he ran into was that we should not remove the
 MODULE_LICENSE() tags from files which have no SPDX tags yet, even if
 they can never be modules. Nick ended up giving up on his efforts due
 to having to do this vetting and backlash he ran into from folks who
 really did *not understand* the core of the issue nor were providing
 any alternative / guidance. I've gone through his changes and dropped
 the patches which dropped the module license tags where an SPDX
 license tag was missing, it only consisted of 11 drivers.  To see
 if a pull request deals with a file which lacks SPDX tags you
 can just use:
 
   ./scripts/spdxcheck.py -f \
 	$(git diff --name-only commid-id | xargs echo)
 
 You'll see a core module file in this pull request for the above,
 but that's not related to his changes. WE just need to add the SPDX
 license tag for the kernel/module/kmod.c file in the future but
 it demonstrates the effectiveness of the script.
 
 Most of Nick's changes were spread out through different trees,
 and I just picked up the slack after rc3 for the last kernel was out.
 Those changes have been in linux-next for over two weeks.
 
 The cleanups, debug code I added and final fix I added for modules
 were motivated by David Hildenbrand's report of boot failing on
 a systems with over 400 CPUs when KASAN was enabled due to running
 out of virtual memory space. Although the functional change only
 consists of 3 lines in the patch "module: avoid allocation if module is
 already present and ready", proving that this was the best we can
 do on the modules side took quite a bit of effort and new debug code.
 
 The initial cleanups I did on the modules side of things has been
 in linux-next since around rc3 of the last kernel, the actual final
 fix for and debug code however have only been in linux-next for about a
 week or so but I think it is worth getting that code in for this merge
 window as it does help fix / prove / evaluate the issues reported
 with larger number of CPUs. Userspace is not yet fixed as it is taking
 a bit of time for folks to understand the crux of the issue and find a
 proper resolution. Worst come to worst, I have a kludge-of-concept [3]
 of how to make kernel_read*() calls for modules unique / converge them,
 but I'm currently inclined to just see if userspace can fix this
 instead.
 
 [0] https://lore.kernel.org/all/Y/kXDqW+7d71C4wz@bombadil.infradead.org/
 [1] https://lkml.kernel.org/r/025f2151-ce7c-5630-9b90-98742c97ac65@redhat.com
 [2] https://lwn.net/Articles/927569/
 [3] https://lkml.kernel.org/r/20230414052840.1994456-3-mcgrof@kernel.org
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCgAwFiEENnNq2KuOejlQLZofziMdCjCSiKcFAmRG4m0SHG1jZ3JvZkBr
 ZXJuZWwub3JnAAoJEM4jHQowkoinQ2oP/0xlvKwJg6Ey8fHZF0qv8VOskE80zoLF
 hMazU3xfqLA+1TQvouW1YBxt3jwS3t1Ehs+NrV+nY9Yzcm0MzRX/n3fASJVe7nRr
 oqWWQU+voYl5Pw1xsfdp6C8IXpBQorpYby3Vp0MAMoZyl2W2YrNo36NV488wM9KC
 jD4HF5Z6xpnPSZTRR7AgW9mo7FdAtxPeKJ76Bch7lH8U6omT7n36WqTw+5B1eAYU
 YTOvrjRs294oqmWE+LeebyiOOXhH/yEYx4JNQgCwPdxwnRiGJWKsk5va0hRApqF/
 WW8dIqdEnjsa84lCuxnmWgbcPK8cgmlO0rT0DyneACCldNlldCW1LJ0HOwLk9pea
 p3JFAsBL7TKue4Tos6I7/4rx1ufyBGGIigqw9/VX5g0Iif+3BhWnqKRfz+p9wiMa
 Fl7cU6u7yC68CHu1HBSisK16cYMCPeOnTSd89upHj8JU/t74O6k/ARvjrQ9qmNUt
 c5U+OY+WpNJ1nXQydhY/yIDhFdYg8SSpNuIO90r4L8/8jRQYXNG80FDd1UtvVDuy
 eq0r2yZ8C0XHSlOT9QHaua/tWV/aaKtyC/c0hDRrigfUrq8UOlGujMXbUnrmrWJI
 tLJLAc7ePWAAoZXGSHrt0U27l029GzLwRdKqJ6kkDANVnTeOdV+mmBg9zGh3/Mp6
 agiwdHUMVN7X
 =56WK
 -----END PGP SIGNATURE-----

Merge tag 'modules-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux

Pull module updates from Luis Chamberlain:
 "The summary of the changes for this pull requests is:

   - Song Liu's new struct module_memory replacement

   - Nick Alcock's MODULE_LICENSE() removal for non-modules

   - My cleanups and enhancements to reduce the areas where we vmalloc
     module memory for duplicates, and the respective debug code which
     proves the remaining vmalloc pressure comes from userspace.

  Most of the changes have been in linux-next for quite some time except
  the minor fixes I made to check if a module was already loaded prior
  to allocating the final module memory with vmalloc and the respective
  debug code it introduces to help clarify the issue. Although the
  functional change is small it is rather safe as it can only *help*
  reduce vmalloc space for duplicates and is confirmed to fix a bootup
  issue with over 400 CPUs with KASAN enabled. I don't expect stable
  kernels to pick up that fix as the cleanups would have also had to
  have been picked up. Folks on larger CPU systems with modules will
  want to just upgrade if vmalloc space has been an issue on bootup.

  Given the size of this request, here's some more elaborate details:

  The functional change change in this pull request is the very first
  patch from Song Liu which replaces the 'struct module_layout' with a
  new 'struct module_memory'. The old data structure tried to put
  together all types of supported module memory types in one data
  structure, the new one abstracts the differences in memory types in a
  module to allow each one to provide their own set of details. This
  paves the way in the future so we can deal with them in a cleaner way.
  If you look at changes they also provide a nice cleanup of how we
  handle these different memory areas in a module. This change has been
  in linux-next since before the merge window opened for v6.3 so to
  provide more than a full kernel cycle of testing. It's a good thing as
  quite a bit of fixes have been found for it.

  Jason Baron then made dynamic debug a first class citizen module user
  by using module notifier callbacks to allocate / remove module
  specific dynamic debug information.

  Nick Alcock has done quite a bit of work cross-tree to remove module
  license tags from things which cannot possibly be module at my request
  so to:

   a) help him with his longer term tooling goals which require a
      deterministic evaluation if a piece a symbol code could ever be
      part of a module or not. But quite recently it is has been made
      clear that tooling is not the only one that would benefit.
      Disambiguating symbols also helps efforts such as live patching,
      kprobes and BPF, but for other reasons and R&D on this area is
      active with no clear solution in sight.

   b) help us inch closer to the now generally accepted long term goal
      of automating all the MODULE_LICENSE() tags from SPDX license tags

  In so far as a) is concerned, although module license tags are a no-op
  for non-modules, tools which would want create a mapping of possible
  modules can only rely on the module license tag after the commit
  8b41fc4454 ("kbuild: create modules.builtin without
  Makefile.modbuiltin or tristate.conf").

  Nick has been working on this *for years* and AFAICT I was the only
  one to suggest two alternatives to this approach for tooling. The
  complexity in one of my suggested approaches lies in that we'd need a
  possible-obj-m and a could-be-module which would check if the object
  being built is part of any kconfig build which could ever lead to it
  being part of a module, and if so define a new define
  -DPOSSIBLE_MODULE [0].

  A more obvious yet theoretical approach I've suggested would be to
  have a tristate in kconfig imply the same new -DPOSSIBLE_MODULE as
  well but that means getting kconfig symbol names mapping to modules
  always, and I don't think that's the case today. I am not aware of
  Nick or anyone exploring either of these options. Quite recently Josh
  Poimboeuf has pointed out that live patching, kprobes and BPF would
  benefit from resolving some part of the disambiguation as well but for
  other reasons. The function granularity KASLR (fgkaslr) patches were
  mentioned but Joe Lawrence has clarified this effort has been dropped
  with no clear solution in sight [1].

  In the meantime removing module license tags from code which could
  never be modules is welcomed for both objectives mentioned above. Some
  developers have also welcomed these changes as it has helped clarify
  when a module was never possible and they forgot to clean this up, and
  so you'll see quite a bit of Nick's patches in other pull requests for
  this merge window. I just picked up the stragglers after rc3. LWN has
  good coverage on the motivation behind this work [2] and the typical
  cross-tree issues he ran into along the way. The only concrete blocker
  issue he ran into was that we should not remove the MODULE_LICENSE()
  tags from files which have no SPDX tags yet, even if they can never be
  modules. Nick ended up giving up on his efforts due to having to do
  this vetting and backlash he ran into from folks who really did *not
  understand* the core of the issue nor were providing any alternative /
  guidance. I've gone through his changes and dropped the patches which
  dropped the module license tags where an SPDX license tag was missing,
  it only consisted of 11 drivers. To see if a pull request deals with a
  file which lacks SPDX tags you can just use:

    ./scripts/spdxcheck.py -f \
	$(git diff --name-only commid-id | xargs echo)

  You'll see a core module file in this pull request for the above, but
  that's not related to his changes. WE just need to add the SPDX
  license tag for the kernel/module/kmod.c file in the future but it
  demonstrates the effectiveness of the script.

  Most of Nick's changes were spread out through different trees, and I
  just picked up the slack after rc3 for the last kernel was out. Those
  changes have been in linux-next for over two weeks.

  The cleanups, debug code I added and final fix I added for modules
  were motivated by David Hildenbrand's report of boot failing on a
  systems with over 400 CPUs when KASAN was enabled due to running out
  of virtual memory space. Although the functional change only consists
  of 3 lines in the patch "module: avoid allocation if module is already
  present and ready", proving that this was the best we can do on the
  modules side took quite a bit of effort and new debug code.

  The initial cleanups I did on the modules side of things has been in
  linux-next since around rc3 of the last kernel, the actual final fix
  for and debug code however have only been in linux-next for about a
  week or so but I think it is worth getting that code in for this merge
  window as it does help fix / prove / evaluate the issues reported with
  larger number of CPUs. Userspace is not yet fixed as it is taking a
  bit of time for folks to understand the crux of the issue and find a
  proper resolution. Worst come to worst, I have a kludge-of-concept [3]
  of how to make kernel_read*() calls for modules unique / converge
  them, but I'm currently inclined to just see if userspace can fix this
  instead"

Link: https://lore.kernel.org/all/Y/kXDqW+7d71C4wz@bombadil.infradead.org/ [0]
Link: https://lkml.kernel.org/r/025f2151-ce7c-5630-9b90-98742c97ac65@redhat.com [1]
Link: https://lwn.net/Articles/927569/ [2]
Link: https://lkml.kernel.org/r/20230414052840.1994456-3-mcgrof@kernel.org [3]

* tag 'modules-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux: (121 commits)
  module: add debugging auto-load duplicate module support
  module: stats: fix invalid_mod_bytes typo
  module: remove use of uninitialized variable len
  module: fix building stats for 32-bit targets
  module: stats: include uapi/linux/module.h
  module: avoid allocation if module is already present and ready
  module: add debug stats to help identify memory pressure
  module: extract patient module check into helper
  modules/kmod: replace implementation with a semaphore
  Change DEFINE_SEMAPHORE() to take a number argument
  module: fix kmemleak annotations for non init ELF sections
  module: Ignore L0 and rename is_arm_mapping_symbol()
  module: Move is_arm_mapping_symbol() to module_symbol.h
  module: Sync code of is_arm_mapping_symbol()
  scripts/gdb: use mem instead of core_layout to get the module address
  interconnect: remove module-related code
  interconnect: remove MODULE_LICENSE in non-modules
  zswap: remove MODULE_LICENSE in non-modules
  zpool: remove MODULE_LICENSE in non-modules
  x86/mm/dump_pagetables: remove MODULE_LICENSE in non-modules
  ...
2023-04-27 16:36:55 -07:00
Linus Torvalds
556eb8b791 Driver core changes for 6.4-rc1
Here is the large set of driver core changes for 6.4-rc1.
 
 Once again, a busy development cycle, with lots of changes happening in
 the driver core in the quest to be able to move "struct bus" and "struct
 class" into read-only memory, a task now complete with these changes.
 
 This will make the future rust interactions with the driver core more
 "provably correct" as well as providing more obvious lifetime rules for
 all busses and classes in the kernel.
 
 The changes required for this did touch many individual classes and
 busses as many callbacks were changed to take const * parameters
 instead.  All of these changes have been submitted to the various
 subsystem maintainers, giving them plenty of time to review, and most of
 them actually did so.
 
 Other than those changes, included in here are a small set of other
 things:
   - kobject logging improvements
   - cacheinfo improvements and updates
   - obligatory fw_devlink updates and fixes
   - documentation updates
   - device property cleanups and const * changes
   - firwmare loader dependency fixes.
 
 All of these have been in linux-next for a while with no reported
 problems.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZEp7Sw8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ykitQCfamUHpxGcKOAGuLXMotXNakTEsxgAoIquENm5
 LEGadNS38k5fs+73UaxV
 =7K4B
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core updates from Greg KH:
 "Here is the large set of driver core changes for 6.4-rc1.

  Once again, a busy development cycle, with lots of changes happening
  in the driver core in the quest to be able to move "struct bus" and
  "struct class" into read-only memory, a task now complete with these
  changes.

  This will make the future rust interactions with the driver core more
  "provably correct" as well as providing more obvious lifetime rules
  for all busses and classes in the kernel.

  The changes required for this did touch many individual classes and
  busses as many callbacks were changed to take const * parameters
  instead. All of these changes have been submitted to the various
  subsystem maintainers, giving them plenty of time to review, and most
  of them actually did so.

  Other than those changes, included in here are a small set of other
  things:

   - kobject logging improvements

   - cacheinfo improvements and updates

   - obligatory fw_devlink updates and fixes

   - documentation updates

   - device property cleanups and const * changes

   - firwmare loader dependency fixes.

  All of these have been in linux-next for a while with no reported
  problems"

* tag 'driver-core-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (120 commits)
  device property: make device_property functions take const device *
  driver core: update comments in device_rename()
  driver core: Don't require dynamic_debug for initcall_debug probe timing
  firmware_loader: rework crypto dependencies
  firmware_loader: Strip off \n from customized path
  zram: fix up permission for the hot_add sysfs file
  cacheinfo: Add use_arch[|_cache]_info field/function
  arch_topology: Remove early cacheinfo error message if -ENOENT
  cacheinfo: Check cache properties are present in DT
  cacheinfo: Check sib_leaf in cache_leaves_are_shared()
  cacheinfo: Allow early level detection when DT/ACPI info is missing/broken
  cacheinfo: Add arm64 early level initializer implementation
  cacheinfo: Add arch specific early level initializer
  tty: make tty_class a static const structure
  driver core: class: remove struct class_interface * from callbacks
  driver core: class: mark the struct class in struct class_interface constant
  driver core: class: make class_register() take a const *
  driver core: class: mark class_release() as taking a const *
  driver core: remove incorrect comment for device_create*
  MIPS: vpe-cmp: remove module owner pointer from struct class usage.
  ...
2023-04-27 11:53:57 -07:00
Linus Torvalds
34b62f186d pci-v6.4-changes
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmRIKooUHGJoZWxnYWFz
 QGdvb2dsZS5jb20ACgkQWYigwDrT+vxq7A/9G0sInrqvqH2I9/Set/FnmMfCtGDH
 YcEjHYYxL+pztSiXTavDV+ib9iaut83oYtcV9p1bUMhJoZdKNZhrNdIGzRFSemI4
 0/ShtklPzNEu6nPPL24CnEzgbrODBU56ZvzrIE/tShEoOjkKa1triBnOA/JMxYTL
 cUwqDQlDkdpYniCgxy05QfcFZ0mmSOkbl7runGfTMTiUKKC3xSRiaW5YN9KZe3i7
 G5YHu1VVCjeQdQSICHYwyFmkyiqosCoajQNp1IHBkWqSwilzyZMg0NWJobVSA7M/
 mXXnzLtFcC60oT58/9MaggQwDTaSGDE8mG+sWv05bB2u5TQVyZEZqZ4c2FzmZIZT
 WLZYLB6PFRW0zePEuMnVkSLS2npkX+aGaBv28bf88sjorpaYNG01uYijnLEceolQ
 yBPFRN3bsRuOyHvYY/tiZX/BP7z/DS++XXwA8zQWZnYsXSlncJdwCNquV0xIwUt+
 hij4/Yu7o9SgV1LbuwtkMFAn3C9Szc65Eer+IvRRdnMZYphjVHbA5F2msRFyiCeR
 HxECtMQ1jBnVrpQAcBX1Sz+Vu5MrwCqzc2n6tvTQHDvVNjXfkG3NaFhxYPc1IL9Z
 NJMeCKfK1qzw7TtbvWXCluTTIM9N/bNJXrJhQbjNY7V6IaBZY1QNYW0ZFfGgj6Gb
 UUPgndidRy4/hzw=
 =HPXl
 -----END PGP SIGNATURE-----

Merge tag 'pci-v6.4-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci

Pull pci updates from Bjorn Helgaas:
 "Resource management:

   - Add pci_dev_for_each_resource() and pci_bus_for_each_resource()
     iterators

  PCIe native device hotplug:

   - Fix AB-BA deadlock between reset_lock and device_lock

  Power management:

   - Wait longer for devices to become ready after resume (as we do for
     reset) to accommodate Intel Titan Ridge xHCI devices

   - Extend D3hot delay for NVIDIA HDA controllers to avoid
     unrecoverable devices after a bus reset

  Error handling:

   - Clear PCIe Device Status after EDR since generic error recovery now
     only clears it when AER is native

  ASPM:

   - Work around Chromebook firmware defect that clobbers Capability
     list (including ASPM L1 PM Substates Cap) when returning from
     D3cold to D0

  Freescale i.MX6 PCIe controller driver:

   - Install imprecise external abort handler only when DT indicates
     PCIe support

  Freescale Layerscape PCIe controller driver:

   - Add ls1028a endpoint mode support

  Qualcomm PCIe controller driver:

   - Add SM8550 DT binding and driver support

   - Add SDX55 DT binding and driver support

   - Use bulk APIs for clocks of IP 1.0.0, 2.3.2, 2.3.3

   - Use bulk APIs for reset of IP 2.1.0, 2.3.3, 2.4.0

   - Add DT "mhi" register region for supported SoCs

   - Expose link transition counts via debugfs to help debug low power
     issues

   - Support system suspend and resume; reduce interconnect bandwidth
     and turn off clock and PHY if there are no active devices

   - Enable async probe by default to reduce boot time

  Miscellaneous:

   - Sort controller Kconfig entries by vendor"

* tag 'pci-v6.4-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (56 commits)
  PCI: xilinx: Drop obsolete dependency on COMPILE_TEST
  PCI: mobiveil: Sort Kconfig entries by vendor
  PCI: dwc: Sort Kconfig entries by vendor
  PCI: Sort controller Kconfig entries by vendor
  PCI: Use consistent controller Kconfig menu entry language
  PCI: xilinx-nwl: Add 'Xilinx' to Kconfig prompt
  PCI: hv: Add 'Microsoft' to Kconfig prompt
  PCI: meson: Add 'Amlogic' to Kconfig prompt
  PCI: Use of_property_present() for testing DT property presence
  PCI/PM: Extend D3hot delay for NVIDIA HDA controllers
  dt-bindings: PCI: qcom: Document msi-map and msi-map-mask properties
  PCI: qcom: Add SM8550 PCIe support
  dt-bindings: PCI: qcom: Add SM8550 compatible
  PCI: qcom: Add support for SDX55 SoC
  dt-bindings: PCI: qcom-ep: Fix the unit address used in example
  dt-bindings: PCI: qcom: Add SDX55 SoC
  dt-bindings: PCI: qcom: Update maintainers entry
  PCI: qcom: Enable async probe by default
  PCI: qcom: Add support for system suspend and resume
  PCI/PM: Drop pci_bridge_wait_for_secondary_bus() timeout parameter
  ...
2023-04-27 10:45:30 -07:00
Nicholas Piggin
0c993300d5 powerpc: Fix merge conflict between pcrel and copy_thread changes
Fix a conflict between commit 4e991e3c16 ("powerpc: add CFUNC
assembly label annotation") and commit b504b6aade ("powerpc:
differentiate kthread from user kernel thread start").

Fixes: 4e991e3c16 ("powerpc: add CFUNC assembly label annotation")
Fixes: b504b6aade ("powerpc: differentiate kthread from user kernel thread start")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230426055848.402993-2-npiggin@gmail.com
2023-04-26 16:20:16 +10:00
Linus Torvalds
e7989789c6 Timers and timekeeping updates:
- Improve the VDSO build time checks to cover all dynamic relocations
 
     VDSO does not allow dynamic relcations, but the build time check is
     incomplete and fragile.
 
     It's based on architectures specifying the relocation types to search
     for and does not handle R_*_NONE relocation entries correctly.
     R_*_NONE relocations are injected by some GNU ld variants if they fail
     to determine the exact .rel[a]/dyn_size to cover trailing zeros.
     R_*_NONE relocations must be ignored by dynamic loaders, so they
     should be ignored in the build time check too.
 
     Remove the architecture specific relocation types to check for and
     validate strictly that no other relocations than R_*_NONE end up
     in the VSDO .so file.
 
   - Prefer signal delivery to the current thread for
     CLOCK_PROCESS_CPUTIME_ID based posix-timers
 
     Such timers prefer to deliver the signal to the main thread of a
     process even if the context in which the timer expires is the current
     task. This has the downside that it might wake up an idle thread.
 
     As there is no requirement or guarantee that the signal has to be
     delivered to the main thread, avoid this by preferring the current
     task if it is part of the thread group which shares sighand.
 
     This not only avoids waking idle threads, it also distributes the
     signal delivery in case of multiple timers firing in the context
     of different threads close to each other better.
 
   - Align the tick period properly (again)
 
     For a long time the tick was starting at CLOCK_MONOTONIC zero, which
     allowed users space applications to either align with the tick or to
     place a periodic computation so that it does not interfere with the
     tick. The alignement of the tick period was more by chance than by
     intention as the tick is set up before a high resolution clocksource is
     installed, i.e. timekeeping is still tick based and the tick period
     advances from there.
 
     The early enablement of sched_clock() broke this alignement as the time
     accumulated by sched_clock() is taken into account when timekeeping is
     initialized. So the base value now(CLOCK_MONOTONIC) is not longer a
     multiple of tick periods, which breaks applications which relied on
     that behaviour.
 
     Cure this by aligning the tick starting point to the next multiple of
     tick periods, i.e 1000ms/CONFIG_HZ.
 
  - A set of NOHZ fixes and enhancements
 
    - Cure the concurrent writer race for idle and IO sleeptime statistics
 
      The statitic values which are exposed via /proc/stat are updated from
      the CPU local idle exit and remotely by cpufreq, but that happens
      without any form of serialization. As a consequence sleeptimes can be
      accounted twice or worse.
 
      Prevent this by restricting the accumulation writeback to the CPU
      local idle exit and let the remote access compute the accumulated
      value.
 
    - Protect idle/iowait sleep time with a sequence count
 
      Reading idle/iowait sleep time, e.g. from /proc/stat, can race with
      idle exit updates. As a consequence the readout may result in random
      and potentially going backwards values.
 
      Protect this by a sequence count, which fixes the idle time
      statistics issue, but cannot fix the iowait time problem because
      iowait time accounting races with remote wake ups decrementing the
      remote runqueues nr_iowait counter. The latter is impossible to fix,
      so the only way to deal with that is to document it properly and to
      remove the assertion in the selftest which triggers occasionally due
      to that.
 
    - Restructure struct tick_sched for better cache layout
 
    - Some small cleanups and a better cache layout for struct tick_sched
 
  - Implement the missing timer_wait_running() callback for POSIX CPU timers
 
    For unknown reason the introduction of the timer_wait_running() callback
    missed to fixup posix CPU timers, which went unnoticed for almost four
    years.
 
    While initially only targeted to prevent livelocks between a timer
    deletion and the timer expiry function on PREEMPT_RT enabled kernels, it
    turned out that fixing this for mainline is not as trivial as just
    implementing a stub similar to the hrtimer/timer callbacks.
 
    The reason is that for CONFIG_POSIX_CPU_TIMERS_TASK_WORK enabled systems
    there is a livelock issue independent of RT.
 
    CONFIG_POSIX_CPU_TIMERS_TASK_WORK=y moves the expiry of POSIX CPU timers
    out from hard interrupt context to task work, which is handled before
    returning to user space or to a VM. The expiry mechanism moves the
    expired timers to a stack local list head with sighand lock held. Once
    sighand is dropped the task can be preempted and a task which wants to
    delete a timer will spin-wait until the expiry task is scheduled back
    in. In the worst case this will end up in a livelock when the preempting
    task and the expiry task are pinned on the same CPU.
 
    The timer wheel has a timer_wait_running() mechanism for RT, which uses
    a per CPU timer-base expiry lock which is held by the expiry code and the
    task waiting for the timer function to complete blocks on that lock.
 
    This does not work in the same way for posix CPU timers as there is no
    timer base and expiry for process wide timers can run on any task
    belonging to that process, but the concept of waiting on an expiry lock
    can be used too in a slightly different way.
 
    Add a per task mutex to struct posix_cputimers_work, let the expiry task
    hold it accross the expiry function and let the deleting task which
    waits for the expiry to complete block on the mutex.
 
    In the non-contended case this results in an extra mutex_lock()/unlock()
    pair on both sides.
 
    This avoids spin-waiting on a task which is scheduled out, prevents the
    livelock and cures the problem for RT and !RT systems.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmRGrj4THHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoZhdEAC/lwfDWCnTXHC8ExQQRDIVNyXmDlLb
 EHB8ZY7Wc4gNZ8UEXEOLOXJHMG9bsbtPGctVewJwRGnXZWKVhpPwQba6kCRycyX0
 0J6l5DlvUaGGrpoOzOZwgETRmtIZE9tEArZR8xlfRScYd93a7yLhwIjO8JaV9vKs
 IQpAQMeJ/ysp6gHrS59qakYfoHU/ERUAu3Tk4GqHUtPtcyz3nX3eTlLWV8LySqs+
 00qr2yc0bQFUFoKzTCxtM8lcEi9ja9SOj1rw28348O+BXE4d0HC12Ie7eU/CDN2Y
 OAlWYxVjy4LMh24LDrRQKTzoVqx9MXDx2g+09B3t8NK5LgeS+EJIjujDhZF147/H
 5y906nplZUKa8BiZW5Rpm/HKH8tFI80T9XWSQCRBeMgTEJyRyRU1yASAwO4xw+dY
 Dn3tGmFGymcV/72o4ic9JFKQd8cTSxPjEJS3qqzMkEAtyI/zPBmKxj/Tce50OH40
 6FSZq1uU21ZQzszwSHISwgFtNr75laUSK4Z1te5OhPOOz+C7O9YqHvqS/1jwhPj2
 tMd8X17fRW3UTUBlBj+zqxqiEGBl/Yk2AvKrJIXGUtfWYCtjMJ7ieCf0kZ7NSVJx
 9ewubA0gqseMD783YomZsy8LLtMKnhclJeslUOVb1oKs1q/WF1R/k6qjy9vUwYaB
 nIJuHl8mxSetag==
 =SVnj
 -----END PGP SIGNATURE-----

Merge tag 'timers-core-2023-04-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timers and timekeeping updates from Thomas Gleixner:

 - Improve the VDSO build time checks to cover all dynamic relocations

   VDSO does not allow dynamic relocations, but the build time check is
   incomplete and fragile.

   It's based on architectures specifying the relocation types to search
   for and does not handle R_*_NONE relocation entries correctly.
   R_*_NONE relocations are injected by some GNU ld variants if they
   fail to determine the exact .rel[a]/dyn_size to cover trailing zeros.
   R_*_NONE relocations must be ignored by dynamic loaders, so they
   should be ignored in the build time check too.

   Remove the architecture specific relocation types to check for and
   validate strictly that no other relocations than R_*_NONE end up in
   the VSDO .so file.

 - Prefer signal delivery to the current thread for
   CLOCK_PROCESS_CPUTIME_ID based posix-timers

   Such timers prefer to deliver the signal to the main thread of a
   process even if the context in which the timer expires is the current
   task. This has the downside that it might wake up an idle thread.

   As there is no requirement or guarantee that the signal has to be
   delivered to the main thread, avoid this by preferring the current
   task if it is part of the thread group which shares sighand.

   This not only avoids waking idle threads, it also distributes the
   signal delivery in case of multiple timers firing in the context of
   different threads close to each other better.

 - Align the tick period properly (again)

   For a long time the tick was starting at CLOCK_MONOTONIC zero, which
   allowed users space applications to either align with the tick or to
   place a periodic computation so that it does not interfere with the
   tick. The alignement of the tick period was more by chance than by
   intention as the tick is set up before a high resolution clocksource
   is installed, i.e. timekeeping is still tick based and the tick
   period advances from there.

   The early enablement of sched_clock() broke this alignement as the
   time accumulated by sched_clock() is taken into account when
   timekeeping is initialized. So the base value now(CLOCK_MONOTONIC) is
   not longer a multiple of tick periods, which breaks applications
   which relied on that behaviour.

   Cure this by aligning the tick starting point to the next multiple of
   tick periods, i.e 1000ms/CONFIG_HZ.

 - A set of NOHZ fixes and enhancements:

     * Cure the concurrent writer race for idle and IO sleeptime
       statistics

       The statitic values which are exposed via /proc/stat are updated
       from the CPU local idle exit and remotely by cpufreq, but that
       happens without any form of serialization. As a consequence
       sleeptimes can be accounted twice or worse.

       Prevent this by restricting the accumulation writeback to the CPU
       local idle exit and let the remote access compute the accumulated
       value.

     * Protect idle/iowait sleep time with a sequence count

       Reading idle/iowait sleep time, e.g. from /proc/stat, can race
       with idle exit updates. As a consequence the readout may result
       in random and potentially going backwards values.

       Protect this by a sequence count, which fixes the idle time
       statistics issue, but cannot fix the iowait time problem because
       iowait time accounting races with remote wake ups decrementing
       the remote runqueues nr_iowait counter. The latter is impossible
       to fix, so the only way to deal with that is to document it
       properly and to remove the assertion in the selftest which
       triggers occasionally due to that.

     * Restructure struct tick_sched for better cache layout

     * Some small cleanups and a better cache layout for struct
       tick_sched

 - Implement the missing timer_wait_running() callback for POSIX CPU
   timers

   For unknown reason the introduction of the timer_wait_running()
   callback missed to fixup posix CPU timers, which went unnoticed for
   almost four years.

   While initially only targeted to prevent livelocks between a timer
   deletion and the timer expiry function on PREEMPT_RT enabled kernels,
   it turned out that fixing this for mainline is not as trivial as just
   implementing a stub similar to the hrtimer/timer callbacks.

   The reason is that for CONFIG_POSIX_CPU_TIMERS_TASK_WORK enabled
   systems there is a livelock issue independent of RT.

   CONFIG_POSIX_CPU_TIMERS_TASK_WORK=y moves the expiry of POSIX CPU
   timers out from hard interrupt context to task work, which is handled
   before returning to user space or to a VM. The expiry mechanism moves
   the expired timers to a stack local list head with sighand lock held.
   Once sighand is dropped the task can be preempted and a task which
   wants to delete a timer will spin-wait until the expiry task is
   scheduled back in. In the worst case this will end up in a livelock
   when the preempting task and the expiry task are pinned on the same
   CPU.

   The timer wheel has a timer_wait_running() mechanism for RT, which
   uses a per CPU timer-base expiry lock which is held by the expiry
   code and the task waiting for the timer function to complete blocks
   on that lock.

   This does not work in the same way for posix CPU timers as there is
   no timer base and expiry for process wide timers can run on any task
   belonging to that process, but the concept of waiting on an expiry
   lock can be used too in a slightly different way.

   Add a per task mutex to struct posix_cputimers_work, let the expiry
   task hold it accross the expiry function and let the deleting task
   which waits for the expiry to complete block on the mutex.

   In the non-contended case this results in an extra
   mutex_lock()/unlock() pair on both sides.

   This avoids spin-waiting on a task which is scheduled out, prevents
   the livelock and cures the problem for RT and !RT systems

* tag 'timers-core-2023-04-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  posix-cpu-timers: Implement the missing timer_wait_running callback
  selftests/proc: Assert clock_gettime(CLOCK_BOOTTIME) VS /proc/uptime monotonicity
  selftests/proc: Remove idle time monotonicity assertions
  MAINTAINERS: Remove stale email address
  timers/nohz: Remove middle-function __tick_nohz_idle_stop_tick()
  timers/nohz: Add a comment about broken iowait counter update race
  timers/nohz: Protect idle/iowait sleep time under seqcount
  timers/nohz: Only ever update sleeptime from idle exit
  timers/nohz: Restructure and reshuffle struct tick_sched
  tick/common: Align tick period with the HZ tick.
  selftests/timers/posix_timers: Test delivery of signals across threads
  posix-timers: Prefer delivery of signals to the current thread
  vdso: Improve cmd_vdso_check to check all dynamic relocations
2023-04-25 11:22:46 -07:00
Michael Ellerman
6fee130204 powerpc/64: Don't call trace_hardirqs_on() in prep_irq_for_idle()
Since commit a01353cf18 ("cpuidle: Fix ct_idle_*() usage"), the
cpuidle entry code calls trace_hardirqs_on() (actually
trace_hardirqs_on_prepare()) in ct_cpuidle_enter() before calling into
the cpuidle driver.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230406144535.3786008-2-mpe@ellerman.id.au
2023-04-20 13:21:49 +10:00
Michael Ellerman
7640854d96 powerpc/64: Mark prep_irq_for_idle() __cpuidle
Code in the idle path is not allowed to be instrumented because RCU is
disabled, see commit 0e985e9d22 ("cpuidle: Add comments about
noinstr/__cpuidle usage").

Mark prep_irq_for_idle() __cpuidle, which is equivalent to noinstr, to
enforce that.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230406144535.3786008-1-mpe@ellerman.id.au
2023-04-20 13:21:49 +10:00
Michael Ellerman
e5b6634aa1 powerpc/irq: Mark check_return_regs_valid() notrace
check_return_regs_valid() is called from the middle of the irq exit
handling, which is all notrace, so mark it notrace also.

Reported-by: Sachin Sant <sachinp@linux.ibm.com>
Link: https://lore.kernel.org/all/4C073F6A-C812-4C4A-BB7A-ECD10B75FB88@linux.ibm.com/
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230406122118.3760344-1-mpe@ellerman.id.au
2023-04-20 13:21:48 +10:00
Nicholas Piggin
77e69ee7ce powerpc/64: modules support building with PCREL addresing
Build modules using PCREL addressing when CONFIG_PPC_KERNEL_PCREL=y.

- The module loader must handle several new relocation types:

  * R_PPC64_REL24_NOTOC is a function call handled like R_PPC_REL24, but
    does not restore r2 upon return. The external function call stub is
    changed to use pcrel addressing to load the function pointer rather
    than based on the module TOC.

  * R_PPC64_GOT_PCREL34 is a reference to external data. A GOT table
    must be built by hand, because the linker adds this during the final
    link (which is not done for kernel modules). The GOT table is built
    similarly to the way the external function call stub table is. This
    section is called .mygot because .got has a special meaning for the
    linker and can become upset.

  * R_PPC64_PCREL34 is used for local data addressing, but there is a
    special case where the percpu section is moved at load-time to the
    percpu area which is out of range of this relocation. This requires
    the PCREL34 relocations are converted to use GOT_PCREL34 addressing.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Some coding style & formatting fixups]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230408021752.862660-7-npiggin@gmail.com
2023-04-20 13:21:42 +10:00
Nicholas Piggin
7e3a68be42 powerpc/64: vmlinux support building with PCREL addresing
PC-Relative or PCREL addressing is an extension to the ELF ABI which
uses Power ISA v3.1 PC-relative instructions to calculate addresses,
rather than the traditional TOC scheme.

Add an option to build vmlinux using pcrel addressing. Modules continue
to use TOC addressing.

- TOC address helpers and r2 are poisoned with -1 when running vmlinux.
  r2 could be used for something useful once things are ironed out.

- Assembly must call C functions with @notoc annotation, or the linker
  complains aobut a missing nop after the call. This is done with the
  CFUNC macro introduced earlier.

- Boot: with the exception of prom_init, the execution branches to the
  kernel virtual address early in boot, before any addresses are
  generated, which ensures 34-bit pcrel addressing does not miss the
  high PAGE_OFFSET bits. TOC relative addressing has a similar
  requirement. prom_init does not go to the virtual address and its
  addresses should not carry over to the post-prom kernel.

- Ftrace trampolines are converted from TOC addressing to pcrel
  addressing, including module ftrace trampolines that currently use the
  kernel TOC to find ftrace target functions.

- BPF function prologue and function calling generation are converted
  from TOC to pcrel.

- copypage_64.S has an interesting problem, prefixed instructions have
  alignment restrictions so the linker can add padding, which makes the
  assembler treat the difference between two local labels as
  non-constant even if alignment is arranged so padding is not required.
  This may need toolchain help to solve nicely, for now move the prefix
  instruction out of the alternate patch section to work around it.

This reduces kernel text size by about 6%.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230408021752.862660-6-npiggin@gmail.com
2023-04-20 12:59:21 +10:00
Nicholas Piggin
4e991e3c16 powerpc: add CFUNC assembly label annotation
This macro is to be used in assembly where C functions are called.
pcrel addressing mode requires branches to functions with a
localentry value of 1 to have either a trailing nop or @notoc.
This macro permits the latter without changing callers.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Add dummy definitions to fix selftests build]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230408021752.862660-5-npiggin@gmail.com
2023-04-20 12:54:24 +10:00
Nicholas Piggin
dc5dac748a powerpc/64: Add support to build with prefixed instructions
Add an option to build kernel and module with prefixed instructions if
the CPU and toolchain support it.

This is not related to kernel support for userspace execution of
prefixed instructions.

Building with prefixed instructions breaks some extended inline asm
memory addressing, for example it will provide immediates that exceed
the range of simple load/store displacement. Whether this is a
toolchain or a kernel asm problem remains to be seen. For now, these
are replaced with simpler and less efficient direct register addressing
when compiling with prefixed.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230408021752.862660-4-npiggin@gmail.com
2023-04-20 12:54:22 +10:00
Nicholas Piggin
b270bebd34 powerpc/64s: Run at the kernel virtual address earlier in boot
This mostly consolidates the Book3E and Book3S behaviour in boot WRT
executing from the physical or virtual address.

Book3E sets up kernel virtual linear map in start_initialization_book3e
and runs from the virtual linear alias after that. This change makes
Book3S begin to execute from the virtual alias at the same point. Book3S
can not use its MMU for that at this point, but when the MMU is disabled,
the virtual linear address correctly aliases to physical memory because
the top bits of the address are ignored with MMU disabled.

Secondaries execute from the virtual address similarly early.

This reduces the differences between subarchs, but the main motivation
was to enable the PC-relative addressing ABI for Book3S, where pointer
calculations must execute from the virtual address or the top bits of
the pointer will be lost. This is similar to the requirement the TOC
relative addressing already has that the TOC pointer use its virtual
address.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230408021752.862660-3-npiggin@gmail.com
2023-04-20 12:23:14 +10:00
Nicholas Piggin
4f18b9e6ca powerpc/64: Move initial base and TOC pointer calculation
A later change moves the non-prom case to run at the virtual address
earlier, which calls for virtual TOC and kernel base. Split these two
calculations for prom and non-prom to make that change simpler.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Retain relative_toc call for start_initialization_book3e]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230408021752.862660-2-npiggin@gmail.com
2023-04-20 12:23:14 +10:00
Josh Poimboeuf
7412a60dec cpu: Mark panic_smp_self_stop() __noreturn
In preparation for improving objtool's handling of weak noreturn
functions, mark panic_smp_self_stop() __noreturn.

Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/92d76ab5c8bf660f04fdcd3da1084519212de248.1681342859.git.jpoimboe@kernel.org
2023-04-14 17:31:25 +02:00
Nathan Chancellor
8002725b9e powerpc/32: Include thread_info.h in head_booke.h
When building with W=1 after commit 80b6093b55 ("kbuild: add -Wundef
to KBUILD_CPPFLAGS for W=1 builds"), the following warning occurs.

  In file included from arch/powerpc/kvm/bookehv_interrupts.S:26:
  arch/powerpc/kvm/../kernel/head_booke.h:20:6: warning: "THREAD_SHIFT" is not defined, evaluates to 0 [-Wundef]
     20 | #if (THREAD_SHIFT < 15)
        |      ^~~~~~~~~~~~

THREAD_SHIFT is defined in thread_info.h but it is not directly included
in head_booke.h, so it is possible for THREAD_SHIFT to be undefined. Add
the include to ensure that THREAD_SHIFT is always defined.

Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/202304050954.yskLdczH-lkp@intel.com/
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230406-wundef-thread_shift_booke-v1-1-8deffa4d84f9@kernel.org
2023-04-11 23:13:33 +10:00
Nicholas Piggin
89fb39134a powerpc: copy_thread don't set PPR in user interrupt frame regs
syscalls do not set the PPR field in their interrupt frame and
return from syscall always sets the default PPR for userspace,
so setting the value in the ret_from_fork frame is not necessary
and mildly inconsistent. Remove it.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230325122904.2375060-9-npiggin@gmail.com
2023-04-11 23:13:33 +10:00
Nicholas Piggin
d195ce4695 powerpc: copy_thread don't set _TIF_RESTOREALL
In the kernel user thread path, don't set _TIF_RESTOREALL because
the thread is required to call kernel_execve() before it returns,
which will set _TIF_RESTOREALL if necessary via start_thread().

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230325122904.2375060-8-npiggin@gmail.com
2023-04-11 23:13:33 +10:00
Nicholas Piggin
b504b6aade powerpc: differentiate kthread from user kernel thread start
Kernel created user threads start similarly to kernel threads in that
they call a kernel function after first returning from _switch, so
they share ret_from_kernel_thread for this. Kernel threads never return
from that function though, whereas user threads often do (although some
don't, e.g., IO threads).

Split these startup functions in two, and catch kernel threads that
improperly return from their function. This is intended to make the
complicated code a little bit easier to understand.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230325122904.2375060-7-npiggin@gmail.com
2023-04-11 23:13:33 +10:00
Nicholas Piggin
eed7c420aa powerpc: copy_thread differentiate kthreads and user mode threads
When copy_thread is given a kernel function to run in arg->fn, this
does not necessarily mean it is a kernel thread. User threads can be
created this way (e.g., kernel_init, see also x86's copy_thread()).
These threads run a kernel function which may call kernel_execve()
and return, which returns like a userspace exec(2) syscall.

Kernel threads are to  be differentiated with PF_KTHREAD, will always
have arg->fn set, and should never return from that function, instead
calling kthread_exit() to exit.

Create separate paths for the kthread and user kernel thread creation
logic. The kthread path will never exit and does not require a user
interrupt frame, so it gets a minimal stack frame.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230325122904.2375060-6-npiggin@gmail.com
2023-04-11 23:13:33 +10:00
Nicholas Piggin
5088a6246b powerpc/64: ret_from_fork avoid restoring regs twice
If the system call return path always restores NVGPRs then there is no
need for ret_from_fork to do it. The HANDLER_RESTORE_NVGPRS does the
right thing for this.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230325122904.2375060-5-npiggin@gmail.com
2023-04-11 23:13:32 +10:00
Nicholas Piggin
af5ca9d5c8 powerpc: use switch frame for ret_from_kernel_thread parameters
The kernel thread path in copy_thread creates a user interrupt frame on
stack and stores the function and arg parameters there, and
ret_from_kernel_thread loads them. This is a slightly confusing way to
overload that frame. Non-volatile registers are loaded from the switch
frame, so the parameters can be stored there. The user interrupt frame
is now only used by user threads when they return to user.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230325122904.2375060-4-npiggin@gmail.com
2023-04-11 23:13:32 +10:00
Nicholas Piggin
959791e45f powerpc: copy_thread make ret_from_fork register setup consistent
The ret_from_fork code for 64e and 32-bit set r3 for
syscall_exit_prepare the same way that 64s does, so there should
be no need to special-case them in copy_thread.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230325122904.2375060-3-npiggin@gmail.com
2023-04-11 23:13:32 +10:00
Nicholas Piggin
c013e9f2bb powerpc: copy_thread remove unused pkey code
The pkey registers (AMR, IAMR) do not get loaded from the switch frame
so it is pointless to save anything there. Remove the dead code.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230325122904.2375060-2-npiggin@gmail.com
2023-04-11 23:13:32 +10:00
Mika Westerberg
09cc900632 PCI: Introduce pci_dev_for_each_resource()
Instead of open-coding it everywhere introduce a tiny helper that can be
used to iterate over each resource of a PCI device, and convert the most
obvious users into it.

While at it drop doubled empty line before pdev_sort_resources().

No functional changes intended.

Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20230330162434.35055-4-andriy.shevchenko@linux.intel.com
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Krzysztof Wilczyński <kw@linux.com>
2023-04-04 10:43:52 -05:00
Rob Herring
e4ab08be5b powerpc/isa-bridge: Remove open coded "ranges" parsing
"ranges" is a standard property with common parsing functions. Users
shouldn't be implementing their own parsing of it. Reimplement the
ISA brige "ranges" parsing using the common ranges iterator functions.

The common routines are flexible enough to work on PCI and non-PCI to
ISA bridges, so refactor pci_process_ISA_OF_ranges() and
isa_bridge_init_non_pci() into a single implementation.

Signed-off-by: Rob Herring <robh@kernel.org>
[mpe: Unsplit some strings and use pr_xxx()]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230327223045.819852-1-robh@kernel.org
2023-04-04 22:11:03 +10:00
Greg Kroah-Hartman
cd8fe5b6db Merge 6.3-rc5 into driver-core-next
We need the fixes in here for testing, as well as the driver core
changes for documentation updates to build on.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-04-03 09:33:30 +02:00
Linus Torvalds
95d0b9d89d powerpc fixes for 6.3 #4
- Fix a false positive warning in __pte_needs_flush() (with DEBUG_VM=y).
  - Fix oops when a PF_IO_WORKER thread tries to core dump.
  - Don't try to reconfigure VAS when it's disabled.
 
 Thanks to: Benjamin Gray, Haren Myneni, Jens Axboe, Nathan Lynch, Russell Currey.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmQpT20THG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgBHsD/kB7wxwe+h/Xddx2l9le15HfSTlVOiU
 7CO3jDYs//jt7E4ZEeHE0ppPRWcW5ICEttCrqWyT8WDfwMv7f+iBU80enZ5Rg/a7
 tX5jOel1TmsStj2bfh/myI7CX/+mEwRZ1o3J6/UgaZDlI2OLzLP/gKet3rfSh3zJ
 ilqIZxoj+ARsaYIrxnopFHmSmD5oCAsvVM6COPd3epRnIWmfd7NxtKbZU3V4WdLy
 qL9Q9Dk+RmGDSzJe1j4cRR1fA6jRIFU1WYDFz38HIVX1grFKdsm2Eeta6OXwXWGz
 TZDp82IBka6SvFkPx2RE2bGqaWbjZbA8b1mK59s5hB9trU3qFgZjoRBJoR1hp/zc
 MjfKLTxW7qC/TQAwxBKzFgTm1njRY5H5aMDnV9+6fBBC6esTgBy/la8ZygPmqWJj
 fFulUobDul9xjMg7l0CS+cjyO31Bb68fmFJOH9fmXR3fYlkP9VgJh89demD4xbMQ
 UdtAdpI8xhV98e6pX+vSlfbaeUNqzJQ7UWtr6J08nY17dhcoPxJ9RaAZdsJkZIk3
 IM/jwdhvpG++VM5CbiX+4yQHUJkekIXpfnxZIXqOl+rqv5Z0A8gRQkY469FjqZy7
 EJoYIOBGiZEgiU4oqJRkp0ifMSPMt0I0gMmRq5NxS/64mxFHlwCsJvaiHQ/fd2m8
 3DTDKQd6KOT31A==
 =1Sli
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-6.3-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:

 - Fix a false positive warning in __pte_needs_flush() (with DEBUG_VM=y)

 - Fix oops when a PF_IO_WORKER thread tries to core dump

 - Don't try to reconfigure VAS when it's disabled

Thanks to Benjamin Gray, Haren Myneni, Jens Axboe, Nathan Lynch, and
Russell Currey.

* tag 'powerpc-6.3-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/pseries/vas: Ignore VAS update for DLPAR if copy/paste is not enabled
  powerpc: Don't try to copy PPR for task with NULL pt_regs
  powerpc/64s: Fix __pte_needs_flush() false positive warning
2023-04-02 10:01:56 -07:00
Rob Herring
4d57e3515e powerpc: Use of_property_read_bool() for boolean properties
It is preferred to use typed property access functions (i.e.
of_property_read_<type> functions) rather than low-level
of_get_property/of_find_property functions for reading properties.
Convert reading boolean properties to of_property_read_bool().

Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230310144659.1541127-1-robh@kernel.org
2023-03-30 23:36:35 +11:00
Rob Herring
857d423c74 powerpc: Use of_property_present() for testing DT property presence
It is preferred to use typed property access functions (i.e.
of_property_read_<type> functions) rather than low-level
of_get_property/of_find_property functions for reading properties. As
part of this, convert of_get_property/of_find_property calls to the
recently added of_property_present() helper when we just want to test
for presence of a property and nothing more.

Signed-off-by: Rob Herring <robh@kernel.org>
[mpe: Drop change in ppc4xx_probe_pci_bridge(), formatting]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230310144657.1541039-1-robh@kernel.org
2023-03-30 23:36:35 +11:00
Nathan Lynch
af8bc68263 powerpc/rtas: lockdep annotations
Add lockdep annotations for the following properties that must hold:

* Any error log retrieval must be atomically coupled with the prior
  RTAS call, without a window for another RTAS call to occur before the
  error log can be retrieved.

* All users of the core rtas_args parameter block must hold rtas_lock.

Move the definitions of rtas_lock and rtas_args up in the file so that
__do_enter_rtas_trace() can refer to them.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230220-rtas-queue-for-6-4-v1-6-010e4416f13f@linux.ibm.com
2023-03-30 23:36:35 +11:00
Nathan Lynch
32740fce09 powerpc/rtas: fix miswording in rtas_function kerneldoc
The 'filter' member is a pointer, not a bool; fix the wording
accordingly.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230220-rtas-queue-for-6-4-v1-4-010e4416f13f@linux.ibm.com
2023-03-30 23:36:35 +11:00
Nathan Lynch
1792e46ed0 powerpc/rtas: rtas_call_unlocked() kerneldoc
Add documentation for rtas_call_unlocked(), including details on how
it differs from rtas_call().

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230220-rtas-queue-for-6-4-v1-3-010e4416f13f@linux.ibm.com
2023-03-30 23:36:34 +11:00
Nathan Lynch
271208ee5e powerpc/rtas: use memmove for potentially overlapping buffer copy
Using memcpy() isn't safe when buf is identical to rtas_err_buf, which
can happen during boot before slab is up. Full context which may not
be obvious from the diff:

	if (altbuf) {
		buf = altbuf;
	} else {
		buf = rtas_err_buf;
		if (slab_is_available())
			buf = kmalloc(RTAS_ERROR_LOG_MAX, GFP_ATOMIC);
	}
	if (buf)
		memcpy(buf, rtas_err_buf, RTAS_ERROR_LOG_MAX);

This was found by inspection and I'm not aware of it causing problems
in practice. It appears to have been introduced by commit
033ef338b6 ("powerpc: Merge rtas.c into arch/powerpc/kernel"); the
old ppc64 version of this code did not have this problem.

Use memmove() instead.

Fixes: 033ef338b6 ("powerpc: Merge rtas.c into arch/powerpc/kernel")
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230220-rtas-queue-for-6-4-v1-2-010e4416f13f@linux.ibm.com
2023-03-30 23:36:34 +11:00
Nicholas Piggin
aa464ba9a1 lazy tlb: introduce lazy tlb mm refcount helper functions
Add explicit _lazy_tlb annotated functions for lazy tlb mm refcounting. 
This makes the lazy tlb mm references more obvious, and allows the
refcounting scheme to be modified in later changes.  There is no
functional change with this patch.

Link: https://lkml.kernel.org/r/20230203071837.1136453-3-npiggin@gmail.com
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-03-28 16:20:08 -07:00
Jens Axboe
fd72761894 powerpc: Don't try to copy PPR for task with NULL pt_regs
powerpc sets up PF_KTHREAD and PF_IO_WORKER with a NULL pt_regs, which
from my (arguably very short) checking is not commonly done for other
archs. This is fine, except when PF_IO_WORKER's have been created and
the task does something that causes a coredump to be generated. Then we
get this crash:

  Kernel attempted to read user page (160) - exploit attempt? (uid: 1000)
  BUG: Kernel NULL pointer dereference on read at 0x00000160
  Faulting instruction address: 0xc0000000000c3a60
  Oops: Kernel access of bad area, sig: 11 [#1]
  LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=32 NUMA pSeries
  Modules linked in: bochs drm_vram_helper drm_kms_helper xts binfmt_misc ecb ctr syscopyarea sysfillrect cbc sysimgblt drm_ttm_helper aes_generic ttm sg libaes evdev joydev virtio_balloon vmx_crypto gf128mul drm dm_mod fuse loop configfs drm_panel_orientation_quirks ip_tables x_tables autofs4 hid_generic usbhid hid xhci_pci xhci_hcd usbcore usb_common sd_mod
  CPU: 1 PID: 1982 Comm: ppc-crash Not tainted 6.3.0-rc2+ #88
  Hardware name: IBM pSeries (emulated by qemu) POWER9 (raw) 0x4e1202 0xf000005 of:SLOF,HEAD hv:linux,kvm pSeries
  NIP:  c0000000000c3a60 LR: c000000000039944 CTR: c0000000000398e0
  REGS: c0000000041833b0 TRAP: 0300   Not tainted  (6.3.0-rc2+)
  MSR:  800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 88082828  XER: 200400f8
  ...
  NIP memcpy_power7+0x200/0x7d0
  LR  ppr_get+0x64/0xb0
  Call Trace:
    ppr_get+0x40/0xb0 (unreliable)
    __regset_get+0x180/0x1f0
    regset_get_alloc+0x64/0x90
    elf_core_dump+0xb98/0x1b60
    do_coredump+0x1c34/0x24a0
    get_signal+0x71c/0x1410
    do_notify_resume+0x140/0x6f0
    interrupt_exit_user_prepare_main+0x29c/0x320
    interrupt_exit_user_prepare+0x6c/0xa0
    interrupt_return_srr_user+0x8/0x138

Because ppr_get() is trying to copy from a PF_IO_WORKER with a NULL
pt_regs.

Check for a valid pt_regs in both ppc_get/ppr_set, and return an error
if not set. The actual error value doesn't seem to be important here, so
just pick -EINVAL.

Fixes: fa439810cc ("powerpc/ptrace: Enable support for NT_PPPC_TAR, NT_PPC_PPR, NT_PPC_DSCR")
Cc: stable@vger.kernel.org # v4.8+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
[mpe: Trim oops in change log, add Fixes & Cc stable]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/d9f63344-fe7c-56ae-b420-4a1a04a2ae4c@kernel.dk
2023-03-28 22:11:41 +11:00
Valentin Schneider
4c8c3c7f70 treewide: Trace IPIs sent via smp_send_reschedule()
To be able to trace invocations of smp_send_reschedule(), rename the
arch-specific definitions of it to arch_smp_send_reschedule() and wrap it
into an smp_send_reschedule() that contains a tracepoint.

Changes to include the declaration of the tracepoint were driven by the
following coccinelle script:

  @func_use@
  @@
  smp_send_reschedule(...);

  @include@
  @@
  #include <trace/events/ipi.h>

  @no_include depends on func_use && !include@
  @@
    #include <...>
  +
  + #include <trace/events/ipi.h>

[csky bits]
[riscv bits]
Signed-off-by: Valentin Schneider <vschneid@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Guo Ren <guoren@kernel.org>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
Link: https://lore.kernel.org/r/20230307143558.294354-6-vschneid@redhat.com
2023-03-24 11:01:28 +01:00
Marco Elver
90db9dbedd kasan, powerpc: don't rename memintrinsics if compiler adds prefixes
With appropriate compiler support [1], KASAN builds use __asan prefixed
meminstrinsics, and KASAN no longer overrides memcpy/memset/memmove.

If compiler support is detected (CC_HAS_KASAN_MEMINTRINSIC_PREFIX), define
memintrinsics normally (do not prefix '__').

On powerpc, KASAN is the only user of __mem functions, which are used to
define instrumented memintrinsics.  Alias the normal versions for KASAN to
use in its implementation.

Link: https://lore.kernel.org/all/20230224085942.1791837-1-elver@google.com/ [1]
Link: https://lore.kernel.org/oe-kbuild-all/202302271348.U5lvmo0S-lkp@intel.com/
Link: https://lkml.kernel.org/r/20230227094726.3833247-1-elver@google.com
Signed-off-by: Marco Elver <elver@google.com>
Reported-by: kernel test robot <lkp@intel.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>	[powerpc]
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Daniel Axtens <dja@axtens.net>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-03-23 17:18:33 -07:00
Fangrui Song
aff69273af vdso: Improve cmd_vdso_check to check all dynamic relocations
The actual intention is that no dynamic relocation exists in the VDSO. For
this the VDSO build validates that the resulting .so file does not have any
relocations which are specified via $(ARCH_REL_TYPE_ABS) per architecture,
which is fragile as e.g. ARM64 lacks an entry for R_AARCH64_RELATIVE. Aside
of that ARCH_REL_TYPE_ABS is a misnomer as it checks for relative
relocations too.

However, some GNU ld ports produce unneeded R_*_NONE relocation entries. If
a port fails to determine the exact .rel[a].dyn size, the trailing zeros
become R_*_NONE relocations. E.g. ld's powerpc port recently fixed
https://sourceware.org/bugzilla/show_bug.cgi?id=29540). R_*_NONE are
generally a no-op in the dynamic loaders. So just ignore them.

Remove the ARCH_REL_TYPE_ABS defines and just validate that the resulting
.so file does not contain any R_* relocation entries except R_*_NONE.

Signed-off-by: Fangrui Song <maskray@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com> # for aarch64
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> # for vDSO, aarch64
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Link: https://lore.kernel.org/r/20230310190750.3323802-1-maskray@google.com
2023-03-21 21:15:34 +01:00
Greg Kroah-Hartman
1b40189691 powerpc/sysfs: move to use bus_get_dev_root()
Direct access to the struct bus_type dev_root pointer is going away soon
so replace that with a call to bus_get_dev_root() instead, which is what
it is there for.

Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: David Hildenbrand <david@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lore.kernel.org/r/20230313182918.1312597-12-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-17 15:29:34 +01:00
Luis Chamberlain
bfedee5dc4 powerpc: Simplify sysctl registration for powersave_nap_ctl_table
There is no need to declare an extra tables to just create directory,
this can be easily be done with a prefix path with register_sysctl().

Simplify this registration.

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230310232850.3960676-2-mcgrof@kernel.org
2023-03-16 15:18:12 +11:00
Christophe Leroy
0aafbdf35c powerpc: Make generic_calibrate_decr() the default
ppc_md.calibrate_decr() is a mandatory item. Its nullity is never
checked so it must be non null on all platforms.

Most platforms define generic_calibrate_decr() as their
ppc_md.calibrate_decr(). Have time_init() call
generic_calibrate_decr() when ppc_md.calibrate_decr() is NULL,
and remove default assignment from all machines.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/6cb9865d916231c38401ba34ad1a98c249fae135.1676711562.git.christophe.leroy@csgroup.eu
2023-03-16 08:56:48 +11:00
Christophe Leroy
2fc39acfca powerpc/machdep: Define 'compatible' property in ppc_md and use it
Most probe functions do nothing else than checking whether
the machine is compatible to a given string.

Define that string in ppc_md structure and check it directly from
probe_machine() instead of using ppc_md.probe() for that.

Keep checking in ppc_md.probe() only for more complex probing.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/6cb9865d916231c38401ba34ad1a98c249fae135.1676711562.git.christophe.leroy@csgroup.eu
2023-03-15 00:52:10 +11:00
Bo Liu
be99429354 powerpc: Fix a kernel-doc warning
The current code provokes a kernel-doc warnings:
  arch/powerpc/kernel/process.c:1606: warning: This comment starts with
  '/**', but isn't a kernel-doc comment. Refer
  Documentation/doc-guide/kernel-doc.rst

Signed-off-by: Bo Liu <liubo03@inspur.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20221101015452.3216-1-liubo03@inspur.com
2023-03-15 00:52:09 +11:00
Alexey Kardashevskiy
a940904443 powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
Up until now PPC64 managed to avoid using iommu_ops. The VFIO driver
uses a SPAPR TCE sub-driver and all iommu_ops uses were kept in the
Type1 VFIO driver. Recent development added 2 uses of iommu_ops to the
generic VFIO which broke POWER:
  - a coherency capability check;
  - blocking IOMMU domain - iommu_group_dma_owner_claimed()/...

This adds a simple iommu_ops which reports support for cache coherency
and provides a basic support for blocking domains. No other domain types
are implemented so the default domain is NULL.

Since now iommu_ops controls the group ownership, this takes it out of
VFIO.

This adds an IOMMU device into a pci_controller (=PHB) and registers it
in the IOMMU subsystem, iommu_ops is registered at this point. This
setup is done in postcore_initcall_sync.

This replaces iommu_group_add_device() with iommu_probe_device() as the
former misses necessary steps in connecting PCI devices to IOMMU
devices. This adds a comment about why explicit iommu_probe_device() is
still needed.

The previous discussion is here:
  https://lore.kernel.org/r/20220707135552.3688927-1-aik@ozlabs.ru/
  https://lore.kernel.org/r/20220701061751.1955857-1-aik@ozlabs.ru/

Fixes: e8ae0e140c ("vfio: Require that devices support DMA cache coherence")
Fixes: 70693f4708 ("vfio: Set DMA ownership for VFIO devices")
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Timothy Pearson <tpearson@raptorengineering.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
[mpe: Fix CONFIG_IOMMU_API=n build]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/2000135730.16998523.1678123860135.JavaMail.zimbra@raptorengineeringinc.com
2023-03-15 00:51:46 +11:00
Alexey Kardashevskiy
76f351096c powerpc/pci_64: Init pcibios subsys a bit later
Subsequent patches are going to add dependency/use of iommu_ops which is
initialized in subsys_initcall as well.

This moves pciobios_init() to the next initcall level.

This should not cause behavioral change.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Timothy Pearson <tpearson@raptorengineering.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/12303156.16998521.1678123842049.JavaMail.zimbra@raptorengineeringinc.com
2023-03-14 23:36:27 +11:00
Alexey Kardashevskiy
9d67c94335 powerpc/iommu: Add "borrowing" iommu_table_group_ops
PPC64 IOMMU API defines iommu_table_group_ops which handles DMA windows
for PEs: control the ownership, create/set/unset a table the hardware
for dynamic DMA windows (DDW). VFIO uses the API to implement support on
POWER.

So far only PowerNV IODA2 (POWER8 and newer machines) implemented this
and other cases (POWER7 or nested KVM) did not and instead reused
existing iommu_table structs. This means 1) no DDW 2) ownership transfer
is done directly in the VFIO SPAPR TCE driver.

Soon POWER is going to get its own iommu_ops and ownership control is
going to move there. This implements spapr_tce_table_group_ops which
borrows iommu_table tables. The upside is that VFIO needs to know less
about POWER.

The new ops returns the existing table from create_table() and only
checks if the same window is already set. This is only going to work if
the default DMA window starts table_group.tce32_start and as big as
pe->table_group.tce32_size (not the case for IODA2+ PowerNV).

This changes iommu_table_group_ops::take_ownership() to return an error
if borrowing a table failed.

This should not cause any visible change in behavior for PowerNV.
pSeries was not that well tested/supported anyway.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Timothy Pearson <tpearson@raptorengineering.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
[mpe: Fix CONFIG_IOMMU_API=n build (skiroot_defconfig), & formatting]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/525438831.16998517.1678123820075.JavaMail.zimbra@raptorengineeringinc.com
2023-03-14 23:36:27 +11:00
Song Liu
ac3b432839 module: replace module_layout with module_memory
module_layout manages different types of memory (text, data, rodata, etc.)
in one allocation, which is problematic for some reasons:

1. It is hard to enable CONFIG_STRICT_MODULE_RWX.
2. It is hard to use huge pages in modules (and not break strict rwx).
3. Many archs uses module_layout for arch-specific data, but it is not
   obvious how these data are used (are they RO, RX, or RW?)

Improve the scenario by replacing 2 (or 3) module_layout per module with
up to 7 module_memory per module:

        MOD_TEXT,
        MOD_DATA,
        MOD_RODATA,
        MOD_RO_AFTER_INIT,
        MOD_INIT_TEXT,
        MOD_INIT_DATA,
        MOD_INIT_RODATA,

and allocating them separately. This adds slightly more entries to
mod_tree (from up to 3 entries per module, to up to 7 entries per
module). However, this at most adds a small constant overhead to
__module_address(), which is expected to be fast.

Various archs use module_layout for different data. These data are put
into different module_memory based on their location in module_layout.
IOW, data that used to go with text is allocated with MOD_MEM_TYPE_TEXT;
data that used to go with data is allocated with MOD_MEM_TYPE_DATA, etc.

module_memory simplifies quite some of the module code. For example,
ARCH_WANTS_MODULES_DATA_IN_VMALLOC is a lot cleaner, as it just uses a
different allocator for the data. kernel/module/strict_rwx.c is also
much cleaner with module_memory.

Signed-off-by: Song Liu <song@kernel.org>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2023-03-09 12:55:15 -08:00
Josh Poimboeuf
071c44e427 sched/idle: Mark arch_cpu_idle_dead() __noreturn
Before commit 076cbf5d2163 ("x86/xen: don't let xen_pv_play_dead()
return"), in Xen, when a previously offlined CPU was brought back
online, it unexpectedly resumed execution where it left off in the
middle of the idle loop.

There were some hacks to make that work, but the behavior was surprising
as do_idle() doesn't expect an offlined CPU to return from the dead (in
arch_cpu_idle_dead()).

Now that Xen has been fixed, and the arch-specific implementations of
arch_cpu_idle_dead() also don't return, give it a __noreturn attribute.

This will cause the compiler to complain if an arch-specific
implementation might return.  It also improves code generation for both
caller and callee.

Also fixes the following warning:

  vmlinux.o: warning: objtool: do_idle+0x25f: unreachable instruction

Reported-by: Paul E. McKenney <paulmck@kernel.org>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
Link: https://lore.kernel.org/r/60d527353da8c99d4cf13b6473131d46719ed16d.1676358308.git.jpoimboe@kernel.org
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
2023-03-08 08:44:28 -08:00
Linus Torvalds
c29214bc89 powerpc fixes for 6.3 #2
- Drop orphaned VAS MAINTAINERS entry.
 
  - Fix build errors with clang and KCSAN.
 
  - Avoid build errors seen with LD_DEAD_CODE_DATA_ELIMINATION & recordmcount.
 
 Thanks to: Nathan Chancellor
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmQDMKkTHG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgLfAD/4uzHcBQUzhSIUeQfKcONRynNTAi5Wn
 EwAvfEfp0amFUnakKpY+HXcL8Rt2KHzdD83O25lGaqu3MZpaVJZITrwPo/k4ufhy
 8i/YTlHJjYD0GG/HC7q507n6zVXwNCGe/TT8O2Y7zpxp1hrLEAF/eRtYuF6/YTtC
 h/7zLYdZ7D7KRW3+TfVjQ0wRP4CDw519nyQhnCb6R0BWJGOlpBdoKvbTqbeOzRgf
 IjhaYKLBAg1ep69zhBki32e5g8MGRqJGMBsHOBoBMA3/KfPBXhgYSMdl4lf9JO48
 hIVPK8vgS/Tid9u652oXwRF3t7IddTqlUMiXKZzBEIOLiDhu58oDahZ07nurAMfg
 ipnb2lRqES5OrHLhFe8s7zbn4C/spCVZ/AnkW8lDQNrVcHpMUct/0FPc3K7pN7j/
 cQK15Bztwguez8vWaSlNZ/1FPfVGQRHpJOvMFyUg7oiSvJNMivZFReaOKyHR3e1A
 VXJ0HAAOWA2dAY0u6puD7+GsC/WW7QPOdWAfvce+9CXdjZfoGjQvoRDjc/SWnIJz
 +USjEV/AxnAM+tUW7l3LjsC58wGlwMy6vP5HxinLZRpNY2NZAtxTaapzVBlQIpsP
 qK1uN1likSLkxnFsyaj1ecEP8eVOWYnfr97l+QEaJF+RyUvx1xRMPLALOL+M0030
 kkDPCU7MjTxSOg==
 =G47E
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-6.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:

 - Drop orphaned VAS MAINTAINERS entry

 - Fix build errors with clang and KCSAN

 - Avoid build errors seen with LD_DEAD_CODE_DATA_ELIMINATION together
   with recordmcount

Thanks to Nathan Chancellor.

* tag 'powerpc-6.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc: Avoid dead code/data elimination when using recordmcount
  powerpc/vmlinux.lds: Add .text.asan/tsan sections
  powerpc: Drop orphaned VAS MAINTAINERS entry
2023-03-04 11:20:42 -08:00
Michael Ellerman
acd35dbab8 powerpc/vmlinux.lds: Add .text.asan/tsan sections
When KASAN/KCSAN are enabled clang generates .text.asan/tsan sections.
Because they are not mentioned in the linker script warnings are
generated, and when orphan handling is set to error that becomes a build
error, eg:

  ld.lld: error: vmlinux.a(init/main.o):(.text.tsan.module_ctor) is
  being placed in '.text.tsan.module_ctor' ld.lld: error:
  vmlinux.a(init/version.o):(.text.tsan.module_ctor) is being placed in
  '.text.tsan.module_ctor'

Fix it by adding the sections to our linker script, similar to the
generic change made in 848378812e ("vmlinux.lds.h: Handle clang's
module.{c,d}tor sections").

Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230222060037.2897169-1-mpe@ellerman.id.au
2023-02-28 14:32:34 +11:00
Linus Torvalds
498a1cf902 Kbuild updates for v6.3
- Change V=1 option to print both short log and full command log.
 
  - Allow V=1 and V=2 to be combined as V=12.
 
  - Make W=1 detect wrong .gitignore files.
 
  - Tree-wide cleanups for unused command line arguments passed to Clang.
 
  - Stop using -Qunused-arguments with Clang.
 
  - Make scripts/setlocalversion handle only correct release tags instead
    of any arbitrary annotated tag.
 
  - Create Debian and RPM source packages without cleaning the source tree.
 
  - Various cleanups for packaging.
 -----BEGIN PGP SIGNATURE-----
 
 iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAmP7iHoVHG1hc2FoaXJv
 eUBrZXJuZWwub3JnAAoJED2LAQed4NsGL/cQAK9q5rsNL5a2LgTbm89ORA+UV+ST
 hrAoGo5DkJHUbVH53oPzyLynFBZPvUzLK8yjApjXkyAzy2hXYnj+vbTs0s+JVCFL
 owS4NB0YP+tpHGuy8bGpWI0GMZSMwmspUteqxk86zuH8uQVAhnCaeV1/Cr6Aqj1h
 2jk1FZid3/h7qEkEgu5U8soeyFnV6VhAT6Ie5yfZ2O2RdsSqPUh6vfKrgdyW4RWz
 gito0SOUwvjIDfSmTnIIacUibisPRv2OW29OvmDp1aXj5rMhe3UfOznVE3NR86yl
 ZbWDAIm6KYT8V1ASOoAUR80qent9IPKytThLK9BVEQCT6bsujCZMvhYhhEvO30TF
 Lzsdr+FrES//xag3+hgc63FEied2xxWGQG1cRtzAhfRL9tJ03+mY1omoW6SyKqW/
 Gc9PIcTgQbCIrkeL0HuAI1q3I1vkvHXInJKtGkoHh1J9aJ8v5gQpwGA+DDRUnA+A
 LQSeEbT2Hf3MoF4CqZRnConvfhlMuLI+j5v54YPrhokxXmv7u807kjfwMFTiZ/+m
 CJFlEMf9YRv3pi8g/AYyGAg5ZQigCwzOCRUC5kguFqzZdgnjiI907GEL804lm1Mg
 lpx/HtYPyxwWEd2XyU6/C9AEIl3gm7MBd6b1tD54Tb/VmE+AvjS/O9jFYXZqnAnM
 Llv4BfK/cQKwHb6o
 =HpFZ
 -----END PGP SIGNATURE-----

Merge tag 'kbuild-v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild updates from Masahiro Yamada:

 - Change V=1 option to print both short log and full command log

 - Allow V=1 and V=2 to be combined as V=12

 - Make W=1 detect wrong .gitignore files

 - Tree-wide cleanups for unused command line arguments passed to Clang

 - Stop using -Qunused-arguments with Clang

 - Make scripts/setlocalversion handle only correct release tags instead
   of any arbitrary annotated tag

 - Create Debian and RPM source packages without cleaning the source
   tree

 - Various cleanups for packaging

* tag 'kbuild-v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (74 commits)
  kbuild: rpm-pkg: remove unneeded KERNELRELEASE from modules/headers_install
  docs: kbuild: remove description of KBUILD_LDS_MODULE
  .gitattributes: use 'dts' diff driver for *.dtso files
  kbuild: deb-pkg: improve the usability of source package
  kbuild: deb-pkg: fix binary-arch and clean in debian/rules
  kbuild: tar-pkg: use tar rules in scripts/Makefile.package
  kbuild: make perf-tar*-src-pkg work without relying on git
  kbuild: deb-pkg: switch over to source format 3.0 (quilt)
  kbuild: deb-pkg: make .orig tarball a hard link if possible
  kbuild: deb-pkg: hide KDEB_SOURCENAME from Makefile
  kbuild: srcrpm-pkg: create source package without cleaning
  kbuild: rpm-pkg: build binary packages from source rpm
  kbuild: deb-pkg: create source package without cleaning
  kbuild: add a tool to list files ignored by git
  Documentation/llvm: add Chimera Linux, Google and Meta datacenters
  setlocalversion: use only the correct release tag for git-describe
  setlocalversion: clean up the construction of version output
  .gitignore: ignore *.cover and *.mbx
  kbuild: remove --include-dir MAKEFLAG from top Makefile
  kbuild: fix trivial typo in comment
  ...
2023-02-26 11:53:25 -08:00
Linus Torvalds
d0a32f5520 powerpc updates for 6.3
- Support for configuring secure boot with user-defined keys on PowerVM LPARs.
 
  - Simplify the replay of soft-masked IRQs by making it non-recursive.
 
  - Add support for KCSAN on 64-bit Book3S.
 
  - Improvements to the API & code which interacts with RTAS (pseries firmware).
 
  - Change 32-bit powermac to assign PCI bus numbers per domain by default.
 
  - Some improvements to the 32-bit BPF JIT.
 
  - Various other small features and fixes.
 
 Thanks to: Anders Roxell, Andrew Donnellan, Andrew Jeffery, Benjamin Gray, Christophe
 Leroy, Frederic Barrat, Ganesh Goudar, Geoff Levand, Greg Kroah-Hartman, Jan-Benedict
 Glaw, Josh Poimboeuf, Kajol Jain, Laurent Dufour, Mahesh Salgaonkar, Mathieu Desnoyers,
 Mimi Zohar, Murphy Zhou, Nathan Chancellor, Nathan Lynch, Nayna Jain, Nicholas Piggin,
 Pali Rohár, Petr Mladek, Rohan McLure, Russell Currey, Sachin Sant, Sathvika Vasireddy,
 Sourabh Jain, Stefan Berger, Stephen Rothwell, Sudhakar Kuppusamy.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmP4GnkTHG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgEnlEAC9UoE9JM853o9ZzpOJDrbYknHsRQad
 ztQJ9xu5qjkFHHryTmWKYdiAtNDFbcfn7+1aoc5FXrIb6BOfvBo/uRFw6P501Qwv
 Fg0MQyWUnT5WrI7+rBE2q+1+FaHBNKLycLNRSh5JpXtuKe2ubQfiFD80tarBnEnU
 6I4bqXd+xjDtnqtpfiYnil/kdZTu/MzntdkmCne6fMkflgEQFU9EVQEnnE+imqFa
 6BuCwITvZ+NyaaU+cYMeGZT7aoz9PAwkksgTxXW2gQbTIApX9WX4kYU/vbW4aHts
 0bpzMmIbSbAklYIu2PQQhSU0bLfKJ+xly8E8tozHgRX6hrFlqvtmD/T5LHTBD11f
 FFzKb0NUCD8qTIy6Hn0M1tj5egLpxxzATPe/kVTkxxqTlZrzdSEaqzft6syyJHJd
 ueo0QN53AUyBaVMtxLbnB/U/8Vnz6rLqY+8dLKzXhjYjoPJqOZh/Qlc1Tk3syPwf
 E2j4H6wFqGMTOGi453Pijkpj3qpNkNT79FG5DmClcQLJxD/EXDyffLZITrkzQa0S
 FEkcMzz/Hn9Hkf7ZuNo4DN6ss6IF0vlxoi7GNr+MRR53/aVQJUDc8z24c4ICl/3w
 20ETk57XMVJzP++Hb+yn16JyAawfQOOlckBRZ2O8W5YYVoes45hxDQxVoh8EII69
 hb3KOGYEqF5wyA==
 =ECNb
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-6.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Michael Ellerman:

 - Support for configuring secure boot with user-defined keys on PowerVM
   LPARs

 - Simplify the replay of soft-masked IRQs by making it non-recursive

 - Add support for KCSAN on 64-bit Book3S

 - Improvements to the API & code which interacts with RTAS (pseries
   firmware)

 - Change 32-bit powermac to assign PCI bus numbers per domain by
   default

 - Some improvements to the 32-bit BPF JIT

 - Various other small features and fixes

Thanks to Anders Roxell, Andrew Donnellan, Andrew Jeffery, Benjamin
Gray, Christophe Leroy, Frederic Barrat, Ganesh Goudar, Geoff Levand,
Greg Kroah-Hartman, Jan-Benedict Glaw, Josh Poimboeuf, Kajol Jain,
Laurent Dufour, Mahesh Salgaonkar, Mathieu Desnoyers, Mimi Zohar, Murphy
Zhou, Nathan Chancellor, Nathan Lynch, Nayna Jain, Nicholas Piggin, Pali
Rohár, Petr Mladek, Rohan McLure, Russell Currey, Sachin Sant, Sathvika
Vasireddy, Sourabh Jain, Stefan Berger, Stephen Rothwell, and Sudhakar
Kuppusamy.

* tag 'powerpc-6.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (114 commits)
  powerpc/pseries: Avoid hcall in plpks_is_available() on non-pseries
  powerpc: dts: turris1x.dts: Set lower priority for CPLD syscon-reboot
  powerpc/e500: Add missing prototype for 'relocate_init'
  powerpc/64: Fix unannotated intra-function call warning
  powerpc/epapr: Don't use wrteei on non booke
  powerpc: Pass correct CPU reference to assembler
  powerpc/mm: Rearrange if-else block to avoid clang warning
  powerpc/nohash: Fix build with llvm-as
  powerpc/nohash: Fix build error with binutils >= 2.38
  powerpc/pseries: Fix endianness issue when parsing PLPKS secvar flags
  macintosh: windfarm: Use unsigned type for 1-bit bitfields
  powerpc/kexec_file: print error string on usable memory property update failure
  powerpc/machdep: warn when machine_is() used too early
  powerpc/64: Replace -mcpu=e500mc64 by -mcpu=e5500
  powerpc/eeh: Set channel state after notifying the drivers
  selftests/powerpc: Fix incorrect kernel headers search path
  powerpc/rtas: arch-wide function token lookup conversions
  powerpc/rtas: introduce rtas_function_token() API
  powerpc/pseries/lpar: convert to papr_sysparm API
  powerpc/pseries/hv-24x7: convert to papr_sysparm API
  ...
2023-02-25 11:00:06 -08:00
Linus Torvalds
3822a7c409 - Daniel Verkamp has contributed a memfd series ("mm/memfd: add
F_SEAL_EXEC") which permits the setting of the memfd execute bit at
   memfd creation time, with the option of sealing the state of the X bit.
 
 - Peter Xu adds a patch series ("mm/hugetlb: Make huge_pte_offset()
   thread-safe for pmd unshare") which addresses a rare race condition
   related to PMD unsharing.
 
 - Several folioification patch serieses from Matthew Wilcox, Vishal
   Moola, Sidhartha Kumar and Lorenzo Stoakes
 
 - Johannes Weiner has a series ("mm: push down lock_page_memcg()") which
   does perform some memcg maintenance and cleanup work.
 
 - SeongJae Park has added DAMOS filtering to DAMON, with the series
   "mm/damon/core: implement damos filter".  These filters provide users
   with finer-grained control over DAMOS's actions.  SeongJae has also done
   some DAMON cleanup work.
 
 - Kairui Song adds a series ("Clean up and fixes for swap").
 
 - Vernon Yang contributed the series "Clean up and refinement for maple
   tree".
 
 - Yu Zhao has contributed the "mm: multi-gen LRU: memcg LRU" series.  It
   adds to MGLRU an LRU of memcgs, to improve the scalability of global
   reclaim.
 
 - David Hildenbrand has added some userfaultfd cleanup work in the
   series "mm: uffd-wp + change_protection() cleanups".
 
 - Christoph Hellwig has removed the generic_writepages() library
   function in the series "remove generic_writepages".
 
 - Baolin Wang has performed some maintenance on the compaction code in
   his series "Some small improvements for compaction".
 
 - Sidhartha Kumar is doing some maintenance work on struct page in his
   series "Get rid of tail page fields".
 
 - David Hildenbrand contributed some cleanup, bugfixing and
   generalization of pte management and of pte debugging in his series "mm:
   support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all architectures with swap
   PTEs".
 
 - Mel Gorman and Neil Brown have removed the __GFP_ATOMIC allocation
   flag in the series "Discard __GFP_ATOMIC".
 
 - Sergey Senozhatsky has improved zsmalloc's memory utilization with his
   series "zsmalloc: make zspage chain size configurable".
 
 - Joey Gouly has added prctl() support for prohibiting the creation of
   writeable+executable mappings.  The previous BPF-based approach had
   shortcomings.  See "mm: In-kernel support for memory-deny-write-execute
   (MDWE)".
 
 - Waiman Long did some kmemleak cleanup and bugfixing in the series
   "mm/kmemleak: Simplify kmemleak_cond_resched() & fix UAF".
 
 - T.J.  Alumbaugh has contributed some MGLRU cleanup work in his series
   "mm: multi-gen LRU: improve".
 
 - Jiaqi Yan has provided some enhancements to our memory error
   statistics reporting, mainly by presenting the statistics on a per-node
   basis.  See the series "Introduce per NUMA node memory error
   statistics".
 
 - Mel Gorman has a second and hopefully final shot at fixing a CPU-hog
   regression in compaction via his series "Fix excessive CPU usage during
   compaction".
 
 - Christoph Hellwig does some vmalloc maintenance work in the series
   "cleanup vfree and vunmap".
 
 - Christoph Hellwig has removed block_device_operations.rw_page() in ths
   series "remove ->rw_page".
 
 - We get some maple_tree improvements and cleanups in Liam Howlett's
   series "VMA tree type safety and remove __vma_adjust()".
 
 - Suren Baghdasaryan has done some work on the maintainability of our
   vm_flags handling in the series "introduce vm_flags modifier functions".
 
 - Some pagemap cleanup and generalization work in Mike Rapoport's series
   "mm, arch: add generic implementation of pfn_valid() for FLATMEM" and
   "fixups for generic implementation of pfn_valid()"
 
 - Baoquan He has done some work to make /proc/vmallocinfo and
   /proc/kcore better represent the real state of things in his series
   "mm/vmalloc.c: allow vread() to read out vm_map_ram areas".
 
 - Jason Gunthorpe rationalized the GUP system's interface to the rest of
   the kernel in the series "Simplify the external interface for GUP".
 
 - SeongJae Park wishes to migrate people from DAMON's debugfs interface
   over to its sysfs interface.  To support this, we'll temporarily be
   printing warnings when people use the debugfs interface.  See the series
   "mm/damon: deprecate DAMON debugfs interface".
 
 - Andrey Konovalov provided the accurately named "lib/stackdepot: fixes
   and clean-ups" series.
 
 - Huang Ying has provided a dramatic reduction in migration's TLB flush
   IPI rates with the series "migrate_pages(): batch TLB flushing".
 
 - Arnd Bergmann has some objtool fixups in "objtool warning fixes".
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCY/PoPQAKCRDdBJ7gKXxA
 jlvpAPsFECUBBl20qSue2zCYWnHC7Yk4q9ytTkPB/MMDrFEN9wD/SNKEm2UoK6/K
 DmxHkn0LAitGgJRS/W9w81yrgig9tAQ=
 =MlGs
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2023-02-20-13-37' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

 - Daniel Verkamp has contributed a memfd series ("mm/memfd: add
   F_SEAL_EXEC") which permits the setting of the memfd execute bit at
   memfd creation time, with the option of sealing the state of the X
   bit.

 - Peter Xu adds a patch series ("mm/hugetlb: Make huge_pte_offset()
   thread-safe for pmd unshare") which addresses a rare race condition
   related to PMD unsharing.

 - Several folioification patch serieses from Matthew Wilcox, Vishal
   Moola, Sidhartha Kumar and Lorenzo Stoakes

 - Johannes Weiner has a series ("mm: push down lock_page_memcg()")
   which does perform some memcg maintenance and cleanup work.

 - SeongJae Park has added DAMOS filtering to DAMON, with the series
   "mm/damon/core: implement damos filter".

   These filters provide users with finer-grained control over DAMOS's
   actions. SeongJae has also done some DAMON cleanup work.

 - Kairui Song adds a series ("Clean up and fixes for swap").

 - Vernon Yang contributed the series "Clean up and refinement for maple
   tree".

 - Yu Zhao has contributed the "mm: multi-gen LRU: memcg LRU" series. It
   adds to MGLRU an LRU of memcgs, to improve the scalability of global
   reclaim.

 - David Hildenbrand has added some userfaultfd cleanup work in the
   series "mm: uffd-wp + change_protection() cleanups".

 - Christoph Hellwig has removed the generic_writepages() library
   function in the series "remove generic_writepages".

 - Baolin Wang has performed some maintenance on the compaction code in
   his series "Some small improvements for compaction".

 - Sidhartha Kumar is doing some maintenance work on struct page in his
   series "Get rid of tail page fields".

 - David Hildenbrand contributed some cleanup, bugfixing and
   generalization of pte management and of pte debugging in his series
   "mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all architectures with
   swap PTEs".

 - Mel Gorman and Neil Brown have removed the __GFP_ATOMIC allocation
   flag in the series "Discard __GFP_ATOMIC".

 - Sergey Senozhatsky has improved zsmalloc's memory utilization with
   his series "zsmalloc: make zspage chain size configurable".

 - Joey Gouly has added prctl() support for prohibiting the creation of
   writeable+executable mappings.

   The previous BPF-based approach had shortcomings. See "mm: In-kernel
   support for memory-deny-write-execute (MDWE)".

 - Waiman Long did some kmemleak cleanup and bugfixing in the series
   "mm/kmemleak: Simplify kmemleak_cond_resched() & fix UAF".

 - T.J. Alumbaugh has contributed some MGLRU cleanup work in his series
   "mm: multi-gen LRU: improve".

 - Jiaqi Yan has provided some enhancements to our memory error
   statistics reporting, mainly by presenting the statistics on a
   per-node basis. See the series "Introduce per NUMA node memory error
   statistics".

 - Mel Gorman has a second and hopefully final shot at fixing a CPU-hog
   regression in compaction via his series "Fix excessive CPU usage
   during compaction".

 - Christoph Hellwig does some vmalloc maintenance work in the series
   "cleanup vfree and vunmap".

 - Christoph Hellwig has removed block_device_operations.rw_page() in
   ths series "remove ->rw_page".

 - We get some maple_tree improvements and cleanups in Liam Howlett's
   series "VMA tree type safety and remove __vma_adjust()".

 - Suren Baghdasaryan has done some work on the maintainability of our
   vm_flags handling in the series "introduce vm_flags modifier
   functions".

 - Some pagemap cleanup and generalization work in Mike Rapoport's
   series "mm, arch: add generic implementation of pfn_valid() for
   FLATMEM" and "fixups for generic implementation of pfn_valid()"

 - Baoquan He has done some work to make /proc/vmallocinfo and
   /proc/kcore better represent the real state of things in his series
   "mm/vmalloc.c: allow vread() to read out vm_map_ram areas".

 - Jason Gunthorpe rationalized the GUP system's interface to the rest
   of the kernel in the series "Simplify the external interface for
   GUP".

 - SeongJae Park wishes to migrate people from DAMON's debugfs interface
   over to its sysfs interface. To support this, we'll temporarily be
   printing warnings when people use the debugfs interface. See the
   series "mm/damon: deprecate DAMON debugfs interface".

 - Andrey Konovalov provided the accurately named "lib/stackdepot: fixes
   and clean-ups" series.

 - Huang Ying has provided a dramatic reduction in migration's TLB flush
   IPI rates with the series "migrate_pages(): batch TLB flushing".

 - Arnd Bergmann has some objtool fixups in "objtool warning fixes".

* tag 'mm-stable-2023-02-20-13-37' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (505 commits)
  include/linux/migrate.h: remove unneeded externs
  mm/memory_hotplug: cleanup return value handing in do_migrate_range()
  mm/uffd: fix comment in handling pte markers
  mm: change to return bool for isolate_movable_page()
  mm: hugetlb: change to return bool for isolate_hugetlb()
  mm: change to return bool for isolate_lru_page()
  mm: change to return bool for folio_isolate_lru()
  objtool: add UACCESS exceptions for __tsan_volatile_read/write
  kmsan: disable ftrace in kmsan core code
  kasan: mark addr_has_metadata __always_inline
  mm: memcontrol: rename memcg_kmem_enabled()
  sh: initialize max_mapnr
  m68k/nommu: add missing definition of ARCH_PFN_OFFSET
  mm: percpu: fix incorrect size in pcpu_obj_full_size()
  maple_tree: reduce stack usage with gcc-9 and earlier
  mm: page_alloc: call panic() when memoryless node allocation fails
  mm: multi-gen LRU: avoid futile retries
  migrate_pages: move THP/hugetlb migration support check to simplify code
  migrate_pages: batch flushing TLB
  migrate_pages: share more code between _unmap and _move
  ...
2023-02-23 17:09:35 -08:00
Linus Torvalds
1f2d9ffc7a Scheduler updates in this cycle are:
- Improve the scalability of the CFS bandwidth unthrottling logic
    with large number of CPUs.
 
  - Fix & rework various cpuidle routines, simplify interaction with
    the generic scheduler code. Add __cpuidle methods as noinstr to
    objtool's noinstr detection and fix boatloads of cpuidle bugs & quirks.
 
  - Add new ABI: introduce MEMBARRIER_CMD_GET_REGISTRATIONS,
    to query previously issued registrations.
 
  - Limit scheduler slice duration to the sysctl_sched_latency period,
    to improve scheduling granularity with a large number of SCHED_IDLE
    tasks.
 
  - Debuggability enhancement on sys_exit(): warn about disabled IRQs,
    but also enable them to prevent a cascade of followup problems and
    repeat warnings.
 
  - Fix the rescheduling logic in prio_changed_dl().
 
  - Micro-optimize cpufreq and sched-util methods.
 
  - Micro-optimize ttwu_runnable()
 
  - Micro-optimize the idle-scanning in update_numa_stats(),
    select_idle_capacity() and steal_cookie_task().
 
  - Update the RSEQ code & self-tests
 
  - Constify various scheduler methods
 
  - Remove unused methods
 
  - Refine __init tags
 
  - Documentation updates
 
  - ... Misc other cleanups, fixes
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmPzbJwRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1iIvA//ZcEaB8Z6ChLRQjM+bsaudKJu3pdLQbPK
 iYbP8Da+LsAfxbEfYuGV3m+jIp0LlBOtsI/EezxQrXV+V7FvNyAX9Y00eEu/zlj8
 7Jn3LMy/DBYTwH7LwVdcU0MyIVI8ZPc6WNnkx0LOtGZn8n+qfHPSDzcP3CW+a5AV
 UvllPYpYyEmsX0Eby7CF4Ue8mSmbViw/xR3rNr8ZSve0c25XzKabw8O9kE3jiHxP
 d/zERJoAYeDyYUEuZqhfn5dTlB4an4IjNEkAfRE5SQ09RA8Gkxsa5Ar8gob9e9M1
 eQsdd4/bdhnrkM8L5qDZczqmgCTZ2bukQrxkBXhRDhLgoFxwAn77b+2ZjmIW3Lae
 AyGqRcDSg1q2oxaYm5ZiuO/t26aDOZu9vPHyHRDGt95EGbZlrp+GgeePyfCigJYz
 UmPdZAAcHdSymnnnlcvdG37WVvaVkpgWZzd8LbtBi23QR+Zc4WQ2IlgnUS5WKNNf
 VOBcAcP6E1IslDotZDQCc2dPFFQoQQEssVooyUc5oMytm7BsvxXLOeHG+Ncu/8uc
 H+U8Qn8jnqTxJbC5hkWQIJlhVKCq2FJrHxxySYTKROfUNcDgCmxboFeAcXTCIU1K
 T0S+sdoTS/CvtLklRkG0j6B8N4N98mOd9cFwUV3tX+/gMLMep3hCQs5L76JagvC5
 skkQXoONNaM=
 =l1nN
 -----END PGP SIGNATURE-----

Merge tag 'sched-core-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler updates from Ingo Molnar:

 - Improve the scalability of the CFS bandwidth unthrottling logic with
   large number of CPUs.

 - Fix & rework various cpuidle routines, simplify interaction with the
   generic scheduler code. Add __cpuidle methods as noinstr to objtool's
   noinstr detection and fix boatloads of cpuidle bugs & quirks.

 - Add new ABI: introduce MEMBARRIER_CMD_GET_REGISTRATIONS, to query
   previously issued registrations.

 - Limit scheduler slice duration to the sysctl_sched_latency period, to
   improve scheduling granularity with a large number of SCHED_IDLE
   tasks.

 - Debuggability enhancement on sys_exit(): warn about disabled IRQs,
   but also enable them to prevent a cascade of followup problems and
   repeat warnings.

 - Fix the rescheduling logic in prio_changed_dl().

 - Micro-optimize cpufreq and sched-util methods.

 - Micro-optimize ttwu_runnable()

 - Micro-optimize the idle-scanning in update_numa_stats(),
   select_idle_capacity() and steal_cookie_task().

 - Update the RSEQ code & self-tests

 - Constify various scheduler methods

 - Remove unused methods

 - Refine __init tags

 - Documentation updates

 - Misc other cleanups, fixes

* tag 'sched-core-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (110 commits)
  sched/rt: pick_next_rt_entity(): check list_entry
  sched/deadline: Add more reschedule cases to prio_changed_dl()
  sched/fair: sanitize vruntime of entity being placed
  sched/fair: Remove capacity inversion detection
  sched/fair: unlink misfit task from cpu overutilized
  objtool: mem*() are not uaccess safe
  cpuidle: Fix poll_idle() noinstr annotation
  sched/clock: Make local_clock() noinstr
  sched/clock/x86: Mark sched_clock() noinstr
  x86/pvclock: Improve atomic update of last_value in pvclock_clocksource_read()
  x86/atomics: Always inline arch_atomic64*()
  cpuidle: tracing, preempt: Squash _rcuidle tracing
  cpuidle: tracing: Warn about !rcu_is_watching()
  cpuidle: lib/bug: Disable rcu_is_watching() during WARN/BUG
  cpuidle: drivers: firmware: psci: Dont instrument suspend code
  KVM: selftests: Fix build of rseq test
  exit: Detect and fix irq disabled state in oops
  cpuidle, arm64: Fix the ARM64 cpuidle logic
  cpuidle: mvebu: Fix duplicate flags assignment
  sched/fair: Limit sched slice duration
  ...
2023-02-20 17:41:08 -08:00
Sathvika Vasireddy
38d73b671a powerpc/64: Fix unannotated intra-function call warning
objtool throws the following warning:
  arch/powerpc/kernel/head_64.o: warning: objtool: .text+0x6128:
  unannotated intra-function call

Fix the warning by annotating start_initialization_book3s symbol with the
SYM_FUNC_START_LOCAL and SYM_FUNC_END macros.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Sathvika Vasireddy <sv@linux.ibm.com>
Fixes: 58f24eea52 ("powerpc/64s: Refactor initialisation after prom")
Suggested-by: Josh Poimboeuf <jpoimboe@kernel.org>
Tested-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230217043226.1020041-1-sv@linux.ibm.com
2023-02-17 22:11:55 +11:00
Christophe Leroy
3c2ce4912a powerpc/epapr: Don't use wrteei on non booke
wrteei is only for booke. Use the standard mfmsr/ori/mtmsr
when non booke.

Reported-by: Jan-Benedict Glaw <jbglaw@lug-owl.de>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/b29c7f1727433b003eae050e44072741c8ac223b.1671475543.git.christophe.leroy@csgroup.eu
2023-02-17 22:07:04 +11:00
Ganesh Goudar
9efcdaac36 powerpc/eeh: Set channel state after notifying the drivers
When a PCI error is encountered 6th time in an hour we
set the channel state to perm_failure and notify the
driver about the permanent failure.

However, after upstream commit 38ddc01147 ("powerpc/eeh:
Make permanently failed devices non-actionable"), EEH handler
stops calling any routine once the device is marked as
permanent failure. This issue can lead to fatal consequences
like kernel hang with certain PCI devices.

Following log is observed with lpfc driver, with and without
this change, Without this change kernel hangs, If PCI error
is encountered 6 times for a device in an hour.

Without the change

 EEH: Beginning: 'error_detected(permanent failure)'
 PCI 0132:60:00.0#600000: EEH: not actionable (1,1,1)
 PCI 0132:60:00.1#600000: EEH: not actionable (1,1,1)
 EEH: Finished:'error_detected(permanent failure)'

With the change

 EEH: Beginning: 'error_detected(permanent failure)'
 EEH: Invoking lpfc->error_detected(permanent failure)
 EEH: lpfc driver reports: 'disconnect'
 EEH: Invoking lpfc->error_detected(permanent failure)
 EEH: lpfc driver reports: 'disconnect'
 EEH: Finished:'error_detected(permanent failure)'

To fix the issue, set channel state to permanent failure after
notifying the drivers.

Fixes: 38ddc01147 ("powerpc/eeh: Make permanently failed devices non-actionable")
Suggested-by: Mahesh Salgaonkar <mahesh@linux.ibm.com>
Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230209105649.127707-1-ganeshgr@linux.ibm.com
2023-02-15 22:41:11 +11:00
Nathan Lynch
08273c9f61 powerpc/rtas: arch-wide function token lookup conversions
With the tokens for all implemented RTAS functions now available via
rtas_function_token(), which is optimal and safe for arbitrary
contexts, there is no need to use rtas_token() or cache its result.

Most conversions are trivial, but a few are worth describing in more
detail:

* Error injection token comparisons for lockdown purposes are
  consolidated into a simple predicate: token_is_restricted_errinjct().

* A couple of special cases in block_rtas_call() do not use
  rtas_token() but perform string comparisons against names in the
  function table. These are converted to compare against token values
  instead, which is logically equivalent but less expensive.

* The lookup for the ibm,os-term token can be deferred until needed,
  instead of caching it at boot to avoid device tree traversal during
  panic.

* Since rtas_function_token() accesses a read-only data structure
  without taking any locks, xmon's lookup of set-indicator can be
  performed as needed instead of cached at startup.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230125-b4-powerpc-rtas-queue-v3-20-26929c8cce78@linux.ibm.com
2023-02-13 22:35:03 +11:00
Nathan Lynch
716bfc97bd powerpc/rtas: introduce rtas_function_token() API
Users of rtas_token() supply a string argument that can't be validated
at build time. A typo or misspelling has to be caught by inspection or
by observing wrong behavior at runtime.

Since the core RTAS code now has consolidated the names of all
possible RTAS functions and mapped them to their tokens, token lookup
can be implemented using symbolic constants to index a static array.

So introduce rtas_function_token(), a replacement API which does that,
along with a rtas_service_present()-equivalent helper,
rtas_function_implemented(). Callers supply an opaque predefined
function handle which is used internally to index the function
table. Typos or other inappropriate arguments yield build errors, and
the function handle is a type that can't be easily confused with RTAS
tokens or other integer types.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230125-b4-powerpc-rtas-queue-v3-19-26929c8cce78@linux.ibm.com
2023-02-13 22:35:03 +11:00
Nathan Lynch
43033bc62d powerpc/pseries: add RTAS work area allocator
Various pseries-specific RTAS functions take a temporary "work area"
parameter - a buffer in memory accessible to RTAS. Typically such
functions are passed the statically allocated rtas_data_buf buffer as
the argument. This buffer is protected by a global spinlock. So users
of rtas_data_buf cannot perform sleeping operations while accessing
the buffer.

Most RTAS functions that have a work area parameter can return a
status (-2/990x) that indicates that the caller should retry. Before
retrying, the caller may need to reschedule or sleep (see
rtas_busy_delay() for details). This combination of factors
leads to uncomfortable constructions like this:

	do {
		spin_lock(&rtas_data_buf_lock);
		rc = rtas_call(token, __pa(rtas_data_buf, ...);
		if (rc == 0) {
			/* parse or copy out rtas_data_buf contents */
		}
		spin_unlock(&rtas_data_buf_lock);
	} while (rtas_busy_delay(rc));

Another unfortunately common way of handling this is for callers to
blithely ignore the possibility of a -2/990x status and hope for the
best.

If users were allowed to perform blocking operations while owning a
work area, the programming model would become less tedious and
error-prone. Users could schedule away, sleep, or perform other
blocking operations without having to release and re-acquire
resources.

We could continue to use a single work area buffer, and convert
rtas_data_buf_lock to a mutex. But that would impose an unnecessarily
coarse serialization on all users. As awkward as the current design
is, it prevents longer running operations that need to repeatedly use
rtas_data_buf from blocking the progress of others.

There are more considerations. One is that while 4KB is fine for all
current in-kernel uses, some RTAS calls can take much smaller buffers,
and some (VPD, platform dumps) would likely benefit from larger
ones. Another is that at least one RTAS function (ibm,get-vpd)
has *two* work area parameters. And finally, we should expect the
number of work area users in the kernel to increase over time as we
introduce lockdown-compatible ABIs to replace less safe use cases
based on sys_rtas/librtas.

So a special-purpose allocator for RTAS work area buffers seems worth
trying.

Properties:

* The backing memory for the allocator is reserved early in boot in
  order to satisfy RTAS addressing requirements, and then managed with
  genalloc.
* Allocations can block, but they never fail (mempool-like).
* Prioritizes first-come, first-serve fairness over throughput.
* Early boot allocations before the allocator has been initialized are
  served via an internal static buffer.

Intended to replace rtas_data_buf. New code that needs RTAS work area
buffers should prefer this API.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230125-b4-powerpc-rtas-queue-v3-12-26929c8cce78@linux.ibm.com
2023-02-13 22:35:02 +11:00
Nathan Lynch
24098f580e powerpc/rtas: add tracepoints around RTAS entry
Decompose the RTAS entry C code into tracing and non-tracing variants,
calling the just-added tracepoints in the tracing-enabled path. Skip
tracing in contexts known to be unsafe (real mode, CPU offline).

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230125-b4-powerpc-rtas-queue-v3-11-26929c8cce78@linux.ibm.com
2023-02-13 22:35:02 +11:00
Nathan Lynch
77f85f69a9 powerpc/rtas: strengthen do_enter_rtas() type safety, drop inline
Make do_enter_rtas() take a pointer to struct rtas_args and do the
__pa() conversion in one place instead of leaving it to callers. This
also makes it possible to introduce enter/exit tracepoints that access
the rtas_args struct fields.

There's no apparent reason to force inlining of do_enter_rtas()
either, and it seems to bloat the code a bit. Let the compiler decide.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230125-b4-powerpc-rtas-queue-v3-9-26929c8cce78@linux.ibm.com
2023-02-13 22:35:02 +11:00
Nathan Lynch
8252b88294 powerpc/rtas: improve function information lookups
The core RTAS support code and its clients perform two types of lookup
for RTAS firmware function information.

First, mapping a known function name to a token. The typical use case
invokes rtas_token() to retrieve the token value to pass to
rtas_call(). rtas_token() relies on of_get_property(), which performs
a linear search of the /rtas node's property list under a lock with
IRQs disabled.

Second, and less common: given a token value, looking up some
information about the function. The primary example is the sys_rtas
filter path, which linearly scans a small table to match the token to
a rtas_filter struct. Another use case to come is RTAS entry/exit
tracepoints, which will require efficient lookup of function names
from token values. Currently there is no general API for this.

We need something much like the existing rtas_filters table, but more
general and organized to facilitate efficient lookups.

Introduce:

* A new rtas_function type, aggregating function name, token,
  and filter. Other function characteristics could be added in the
  future.

* An array of rtas_function, where each element corresponds to a known
  RTAS function. All information in the table is static save the token
  values, which are derived from the device tree at boot. The array is
  sorted by function name to allow binary search.

* A named constant for each known RTAS function, used to index the
  function array. These also will be used in a client-facing API to be
  added later.

* An xarray that maps valid tokens to rtas_function objects.

Fold the existing rtas_filter table into the new rtas_function array,
with the appropriate adjustments to block_rtas_call(). Remove
now-redundant fields from struct rtas_filter. Preserve the function of
the CONFIG_CPU_BIG_ENDIAN guard in the current filter table by
introducing a per-function flag that is set for the function entries
related to pseries LPAR migration. These have never had working users
via sys_rtas on ppc64le; see commit de0f7349a0 ("powerpc/rtas:
prevent suspend-related sys_rtas use on LE").

Convert rtas_token() to use a lockless binary search on the function
table. Fall back to the old behavior for lookups against names that
are not known to be RTAS functions, but issue a warning. rtas_token()
is for function names; it is not a general facility for accessing
arbitrary properties of the /rtas node. All known misuses of
rtas_token() have been converted to more appropriate of_ APIs in
preceding changes.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230125-b4-powerpc-rtas-queue-v3-8-26929c8cce78@linux.ibm.com
2023-02-13 22:35:02 +11:00
Nathan Lynch
836b5b9fcc powerpc/rtas: ensure 4KB alignment for rtas_data_buf
Some RTAS functions that have work area parameters impose alignment
requirements on the work area passed to them by the OS. Examples
include:

- ibm,configure-connector
- ibm,update-nodes
- ibm,update-properties

4KB is the greatest alignment required by PAPR for such
buffers. rtas_data_buf used to have a __page_aligned attribute in the
arch/ppc64 days, but that was changed to __cacheline_aligned for
unknown reasons by commit 033ef338b6 ("powerpc: Merge rtas.c into
arch/powerpc/kernel"). That works out to 128-byte alignment
on ppc64, which isn't right.

This was found by inspection and I'm not aware of any real problems
caused by this. Either current RTAS implementations don't enforce the
alignment constraints, or rtas_data_buf is always being placed at a
4KB boundary by accident (or both, perhaps).

Use __aligned(SZ_4K) to ensure the rtas_data_buf has alignment
appropriate for all users.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Fixes: 033ef338b6 ("powerpc: Merge rtas.c into arch/powerpc/kernel")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230125-b4-powerpc-rtas-queue-v3-6-26929c8cce78@linux.ibm.com
2023-02-13 22:35:02 +11:00
Nathan Lynch
09d1ea72c8 powerpc/rtas: handle extended delays safely in early boot
Some code that runs early in boot calls RTAS functions that can return
-2 or 990x statuses, which mean the caller should retry. An example is
pSeries_cmo_feature_init(), which invokes ibm,get-system-parameter but
treats these benign statuses as errors instead of retrying.

pSeries_cmo_feature_init() and similar code should be made to retry
until they succeed or receive a real error, using the usual pattern:

	do {
		rc = rtas_call(token, etc...);
	} while (rtas_busy_delay(rc));

But rtas_busy_delay() will perform a timed sleep on any 990x
status. This isn't safe so early in boot, before the CPU scheduler and
timer subsystem have initialized.

The -2 RTAS status is much more likely to occur during single-threaded
boot than 990x in practice, at least on PowerVM. This is because -2
usually means that RTAS made progress but exhausted its self-imposed
timeslice, while 990x is associated with concurrent requests from the
OS causing internal contention. Regardless, according to the language
in PAPR, the OS should be prepared to handle either type of status at
any time.

Add a fallback path to rtas_busy_delay() to handle this as safely as
possible, performing a small delay on 990x. Include a counter to
detect retry loops that aren't making progress and bail out. Add __ref
to rtas_busy_delay() since it now conditionally calls an __init
function.

This was found by inspection and I'm not aware of any real
failures. However, the implementation of rtas_busy_delay() before
commit 38f7b7067d ("powerpc/rtas: rtas_busy_delay() improvements")
was not susceptible to this problem, so let's treat this as a
regression.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Fixes: 38f7b7067d ("powerpc/rtas: rtas_busy_delay() improvements")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230125-b4-powerpc-rtas-queue-v3-1-26929c8cce78@linux.ibm.com
2023-02-13 22:35:01 +11:00
Russell Currey
91361b5175 powerpc/pseries: Pass PLPKS password on kexec
Before interacting with the PLPKS, we ask the hypervisor to generate a
password for the current boot, which is then required for most further
PLPKS operations.

If we kexec into a new kernel, the new kernel will try and fail to
generate a new password, as the password has already been set.

Pass the password through to the new kernel via the device tree, in
/chosen/ibm,plpks-pw. Check for the presence of this property before
trying to generate a new password - if it exists, use the existing
password and remove it from the device tree.

This only works with the kexec_file_load() syscall, not the older
kexec_load() syscall, however if you're using Secure Boot then you want
to be using kexec_file_load() anyway.

Signed-off-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230210080401.345462-24-ajd@linux.ibm.com
2023-02-12 22:12:39 +11:00
Andrew Donnellan
c96db155eb powerpc/secvar: Don't print error on ENOENT when reading variables
If attempting to read the size or data attributes of a  non-existent
variable (which will be possible after a later patch to expose the PLPKS
via the secvar interface), don't spam the kernel log with error messages.
Only print errors for return codes that aren't ENOENT.

Reported-by: Sudhakar Kuppusamy <sudhakar@linux.ibm.com>
Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230210080401.345462-14-ajd@linux.ibm.com
2023-02-12 22:12:37 +11:00
Andrew Donnellan
6d64c497a3 powerpc/secvar: Warn when PAGE_SIZE is smaller than max object size
Due to sysfs constraints, when writing to a variable, we can only handle
writes of up to PAGE_SIZE.

It's possible that the maximum object size is larger than PAGE_SIZE, in
which case, print a warning on boot so that the user is aware.

Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Russell Currey <ruscur@russell.cc>
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230210080401.345462-13-ajd@linux.ibm.com
2023-02-12 22:12:37 +11:00
Andrew Donnellan
50a466bf3e powerpc/secvar: Allow backend to populate static list of variable names
Currently, the list of variables is populated by calling
secvar_ops->get_next() repeatedly, which is explicitly modelled on the
OPAL API (including the keylen parameter).

For the upcoming PLPKS backend, we have a static list of variable names.
It is messy to fit that into get_next(), so instead, let the backend put
a NULL-terminated array of variable names into secvar_ops->var_names,
which will be used if get_next() is undefined.

Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Russell Currey <ruscur@russell.cc>
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230210080401.345462-12-ajd@linux.ibm.com
2023-02-12 22:12:37 +11:00
Russell Currey
86b6c0ae2c powerpc/secvar: Extend sysfs to include config vars
The forthcoming pseries consumer of the secvar API wants to expose a
number of config variables.  Allowing secvar implementations to provide
their own sysfs attributes makes it easy for consumers to expose what
they need to.

This is not being used by the OPAL secvar implementation at present, and
the config directory will not be created if no attributes are set.

Signed-off-by: Russell Currey <ruscur@russell.cc>
Co-developed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230210080401.345462-11-ajd@linux.ibm.com
2023-02-12 22:12:37 +11:00
Andrew Donnellan
caefd3b774 powerpc/secvar: Clean up init error messages
Remove unnecessary prefixes from error messages in secvar_sysfs_init()
(the file defines pr_fmt, so putting "secvar:" in every message is
unnecessary). Make capitalisation and punctuation more consistent.

Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Russell Currey <ruscur@russell.cc>
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230210080401.345462-10-ajd@linux.ibm.com
2023-02-12 22:12:37 +11:00
Russell Currey
e024079440 powerpc/secvar: Handle max object size in the consumer
Currently the max object size is handled in the core secvar code with an
entirely OPAL-specific implementation, so create a new max_size() op and
move the existing implementation into the powernv platform.  Should be
no functional change.

Signed-off-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230210080401.345462-9-ajd@linux.ibm.com
2023-02-12 22:12:37 +11:00
Russell Currey
ec2f40bd00 powerpc/secvar: Handle format string in the consumer
The code that handles the format string in secvar-sysfs.c is entirely
OPAL specific, so create a new "format" op in secvar_operations to make
the secvar code more generic.  No functional change.

Signed-off-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230210080401.345462-8-ajd@linux.ibm.com
2023-02-12 22:12:37 +11:00
Russell Currey
16943a2faf powerpc/secvar: Use sysfs_emit() instead of sprintf()
The secvar format string and object size sysfs files are both ASCII
text, and should use sysfs_emit().  No functional change.

Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230210080401.345462-7-ajd@linux.ibm.com
2023-02-12 22:12:37 +11:00
Russell Currey
26149b0202 powerpc/secvar: Warn and error if multiple secvar ops are set
The secvar code only supports one consumer at a time.

Multiple consumers aren't possible at this point in time, but we'd want
it to be obvious if it ever could happen.

Signed-off-by: Russell Currey <ruscur@russell.cc>
Co-developed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230210080401.345462-6-ajd@linux.ibm.com
2023-02-12 22:12:36 +11:00
Michael Ellerman
53cea34b0a powerpc/secvar: Use u64 in secvar_operations
There's no reason for secvar_operations to use uint64_t vs the more
common kernel type u64.

The types are compatible, but they require different printk format
strings which can lead to confusion.

Change all the secvar related routines to use u64.

Reviewed-by: Russell Currey <ruscur@russell.cc>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230210080401.345462-5-ajd@linux.ibm.com
2023-02-12 22:12:36 +11:00
Russell Currey
c9fd295275 powerpc/secvar: Fix incorrect return in secvar_sysfs_load()
secvar_ops->get_next() returns -ENOENT when there are no more variables
to return, which is expected behaviour.

Fix this by returning 0 if get_next() returns -ENOENT.

This fixes an issue introduced in commit bd5d9c743d ("powerpc: expose
secure variables to userspace via sysfs"), but the return code of
secvar_sysfs_load() was never checked so this issue never mattered.

Signed-off-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230210080401.345462-4-ajd@linux.ibm.com
2023-02-12 22:12:36 +11:00
Michael Ellerman
fc8a898cfd Merge branch 'fixes' into next
Merge our fixes branch to bring in some changes that conflict with
upcoming next content.
2023-02-12 22:11:56 +11:00
Rohan McLure
4f8e09106f powerpc/kcsan: Prevent recursive instrumentation with IRQ save/restores
Instrumented memory accesses provided by KCSAN will access core-local
memories (which will save and restore IRQs) as well as restoring IRQs
directly. Avoid recursive instrumentation by applying __no_kcsan
annotation to IRQ restore routines.

Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
[mpe: Resolve merge conflict with IRQ replay recursion changes]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230206021801.105268-5-rmclure@linux.ibm.com
2023-02-10 22:19:56 +11:00
Rohan McLure
2a7ce82dc4 powerpc/kcsan: Exclude udelay to prevent recursive instrumentation
In order for KCSAN to increase its likelihood of observing a data race,
it sets a watchpoint on memory accesses and stalls, allowing for
detection of conflicting accesses by other kernel threads or interrupts.

Stalls are implemented by injecting a call to udelay in instrumented code.
To prevent recursive instrumentation, exclude udelay from being instrumented.

Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230206021801.105268-3-rmclure@linux.ibm.com
2023-02-10 22:19:56 +11:00
Rohan McLure
2fb857bc9f powerpc/kcsan: Add exclusions from instrumentation
Exclude various incompatible compilation units from KCSAN
instrumentation.

Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230206021801.105268-2-rmclure@linux.ibm.com
2023-02-10 22:19:56 +11:00
Nicholas Piggin
1ee4e35076 powerpc: Skip stack validation checking alternate stacks if they are not allocated
Stack validation in early boot can just bail out of checking alternate
stacks if they are not validated yet. Checking against a NULL stack
could cause NULLish pointer values to be considered valid.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221216115930.2667772-5-npiggin@gmail.com
2023-02-10 22:19:56 +11:00
Nicholas Piggin
dc222fa773 powerpc/64: Move paca allocation to early_setup()
The early paca and boot cpuid dance is complicated and currently does
not quite work as expected for boot cpuid != 0 cases.

early_init_devtree() currently allocates the paca_ptrs and boot cpuid
paca, but until that returns and early_setup() calls setup_paca(), this
thread is currently still executing with smp_processor_id() == 0.

One problem this causes is the paca_ptrs[smp_processor_id()] pointer is
poisoned, so valid_emergency_stack() (any backtrace) and any similar
users will crash.

Another is that the hardware id which is set here will not be returned
by get_hard_smp_processor_id(smp_processor_id()), but it would work
correctly for boot_cpuid == 0, which could lead to difficult to
reproduce or find bugs. The hard id does not seem to be used by the rest
of early_init_devtree(), it just looks like all this code might have
been put here to allocate somewhere to store boot CPU hardware id while
scanning the devtree.

Rearrange things so the hwid is put in a global variable like
boot_cpuid, and do all the paca allocation and boot paca setup in the
64-bit early_setup() after we have everything ready to go.

The paca_ptrs[0] re-poisoning code in early_setup does not seem to have
ever worked, because paca_ptrs[0] was never not-poisoned when boot_cpuid
is not 0.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Fix build error on 32-bit]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221216115930.2667772-4-npiggin@gmail.com
2023-02-10 22:19:56 +11:00
Nicholas Piggin
9fa24404f5 powerpc/64: Fix task_cpu in early boot when booting non-zero cpuid
powerpc/64 can boot on a non-zero SMP processor id. Initially, the boot
CPU is said to be "assumed to be 0" until early_init_devtree() discovers
the id from the device tree. That is not a good description because the
assumption can be wrong and that has to be handled, the better
description is that 0 is used as a placeholder, and things are fixed
after the real id is discovered.

smp_processor_id() is set to the boot cpuid, but task_cpu(current) is
not, which causes the smp_processor_id() == task_cpu(current) invariant
to be broken until init_idle() in sched_init().

This is quite fragile and could lead to subtle bugs in future. One bug
is that validate_sp_size uses task_cpu() to get the process stack, so
any stack trace from the booting CPU between early_init_devtree()
and sched_init() will have problems. Early on paca_ptrs[0] will be
poisoned, so that can cause machine checks dereferencing that memory
in real mode. Later, validating the current stack pointer against the
idle task of a different secondary will probably cause no stack trace
to be printed.

Fix this by setting thread_info->cpu right after smp_processor_id() is
set to the boot cpuid.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Fix SMP=n build as reported by sfr]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221216115930.2667772-3-npiggin@gmail.com
2023-02-10 22:19:38 +11:00
Nicholas Piggin
ffc8e90dec powerpc/64e: Simplify address calculation in secondary hold loop
As the earlier comment explains, __secondary_hold_spinloop does not have
to be accessed at its virtual address, slightly simplifying code.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230203113858.1152093-4-npiggin@gmail.com
2023-02-10 22:17:36 +11:00
Nicholas Piggin
58f24eea52 powerpc/64s: Refactor initialisation after prom
Move some basic Book3S initialisation after prom to a function similar
to what Book3E looks like. Book3E returns from this function at the
virtual address mapping, and Book3S will do the same in a later change,
so making them look similar helps with that.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230203113858.1152093-3-npiggin@gmail.com
2023-02-10 22:17:36 +11:00
Christophe Leroy
d9ab6da64f powerpc: Remove __kernel_text_address() in show_instructions()
That test was introducted in 2006 by
commit 00ae36de49 ("[POWERPC] Better check in show_instructions").
At that time, there was no BPF progs.

As seen in message of commit 89d21e259a ("powerpc/bpf/32: Fix Oops
on tail call tests"), when a page fault occurs in test_bpf.ko for
instance, the code is dumped as XXXXXXXXs. Allthough
__kernel_text_address() checks is_bpf_text_address(), it seems it is
not enough.

Today, show_instructions() uses get_kernel_nofault() to read the code,
so there is no real need for additional verifications.

ARM64 and x86 don't do any additional check before dumping
instructions. Do the same and remove __kernel_text_address()
in show_instructions().

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/4fd69ef7945518c3e27f96b95046a5c1468d35bf.1675245773.git.christophe.leroy@csgroup.eu
2023-02-10 22:17:34 +11:00
Ganesh Goudar
2115732e54 powerpc/mce: log the error for all unrecoverable errors
For all unrecoverable errors we are missing to log the
error, Since machine_check_log_err() is not getting called
for unrecoverable errors. machine_check_log_err() is called
from deferred handler, To run deferred handlers we have to do
irq work raise from the exception handler.

For recoverable errors exception vector code takes care of
running deferred handlers.

For unrecoverable errors raise irq work in save_mce_event(),
So that we log the error from MCE deferred handler.

Log without this change

 MCE: CPU27: machine check (Severe)  Real address Load/Store (foreign/control memory) [Not recovered]
 MCE: CPU27: PID: 10580 Comm: inject-ra-err NIP: [0000000010000df4]
 MCE: CPU27: Initiator CPU
 MCE: CPU27: Unknown

Log with this change

 MCE: CPU24: machine check (Severe)  Real address Load/Store (foreign/control memory) [Not recovered]
 MCE: CPU24: PID: 1589811 Comm: inject-ra-err NIP: [0000000010000e48]
 MCE: CPU24: Initiator CPU
 MCE: CPU24: Unknown
 RTAS: event: 5, Type: Platform Error (224), Severity: 3

Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
Reviewed-by: Mahesh Salgaonkar <mahesh@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230201095933.129482-1-ganeshgr@linux.ibm.com
2023-02-10 22:17:34 +11:00
Baoquan He
728e5ae07d powerpc: mm: add VM_IOREMAP flag to the vmalloc area
Currently, for vmalloc areas with flag VM_IOREMAP set, except of the
specific alignment clamping in __get_vm_area_node(), they will be

 1) Shown as ioremap in /proc/vmallocinfo;

 2) Ignored by /proc/kcore reading via vread()

So for the io mapping in ioremap_phb() of ppc, we should set VM_IOREMAP in
flag to make it handled correctly as above.

Link: https://lkml.kernel.org/r/20230206084020.174506-7-bhe@redhat.com
Signed-off-by: Baoquan He <bhe@redhat.com>
Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: Dan Carpenter <error27@gmail.com>
Cc: Stephen Brennan <stephen.s.brennan@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-09 16:51:43 -08:00
Greg Kroah-Hartman
b505063910 powerpc/iommu: fix memory leak with using debugfs_lookup()
When calling debugfs_lookup() the result must have dput() called on it,
otherwise the memory will leak over time.  To make things simpler, just
call debugfs_lookup_and_remove() instead which handles all of the logic
at once.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230202141919.2298821-1-gregkh@linuxfoundation.org
2023-02-08 21:42:12 +11:00
Pali Rohár
5d2eb73aa0 powerpc/pci: Add option for using pci_to_OF_bus_map
The "pci-OF-bus-map" property was declared deprecated in 2006 [1] and to
the best of everyone's knowledge is not used by anything anymore [2].

The creation of the property was disabled on powermac (arch/powerpc) in
2005 by commit 35499c0195 ("powerpc: Merge in 64-bit powermac
support."). But it is still created by default on CHRP.

On powermac the actual map (pci_to_OF_bus_map) is still used by default,
even though the device tree property is not created.

Add an option to enable/disable use of the pci_to_OF_bus_map, and
creation of the property (on CHRP).

Disabling the option allows enabling CONFIG_PPC_PCI_BUS_NUM_DOMAIN_DEPENDENT
which allows "normal" bus numbering and more than 256 buses, like 64-bit
and other architectures.

Mark the new option as default n, the intention is that the option and
the code will be removed in a future release.

[1]: https://lore.kernel.org/linuxppc-dev/1148016268.13249.14.camel@localhost.localdomain/
[2]: https://lore.kernel.org/all/575f239205e8635add81c9f902b7d9db7beb83ea.camel@kernel.crashing.org/

Signed-off-by: Pali Rohár <pali@kernel.org>
[mpe: Reword commit & help text, shrink option name, rework to fix build errors]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230206113902.1857123-1-mpe@ellerman.id.au
2023-02-07 20:15:23 +11:00
Nicholas Piggin
2ea31e2e62 powerpc/64s/interrupt: Fix interrupt exit race with security mitigation switch
The RFI and STF security mitigation options can flip the
interrupt_exit_not_reentrant static branch condition concurrently with
the interrupt exit code which tests that branch.

Interrupt exit tests this condition to set MSR[EE|RI] for exit, then
again in the case a soft-masked interrupt is found pending, to recover
the MSR so the interrupt can be replayed before attempting to exit
again. If the condition changes between these two tests, the MSR and irq
soft-mask state will become corrupted, leading to warnings and possible
crashes. For example, if the branch is initially true then false,
MSR[EE] will be 0 but PACA_IRQ_HARD_DIS clear and EE may not get
enabled, leading to warnings in irq_64.c.

Fixes: 13799748b9 ("powerpc/64: use interrupt restart table to speed up return from interrupt")
Cc: stable@vger.kernel.org # v5.14+
Reported-by: Sachin Sant <sachinp@linux.ibm.com>
Tested-by: Sachin Sant <sachinp@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230206042240.92103-1-npiggin@gmail.com
2023-02-07 10:13:33 +11:00
Nathan Chancellor
4e3feaad6f powerpc/vdso: Filter clang's auto var init zero enabler when linking
After commit 8d9acfce33 ("kbuild: Stop using '-Qunused-arguments' with
clang"), the PowerPC vDSO shows the following error with clang-13 and
older when CONFIG_INIT_STACK_ALL_ZERO is enabled:

  clang: error: argument unused during compilation: '-enable-trivial-auto-var-init-zero-knowing-it-will-be-removed-from-clang' [-Werror,-Wunused-command-line-argument]

clang-14 added a change to make sure this flag never triggers
-Wunused-command-line-argument, so it is fixed with newer releases. For
older releases that the kernel still supports building with, just filter
out this flag, as has been done for other flags.

Fixes: f0a42fbab4 ("powerpc/vdso: Improve linker flags")
Fixes: 8d9acfce33 ("kbuild: Stop using '-Qunused-arguments' with clang")
Link: ca6d5813d1
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2023-02-05 18:51:22 +09:00
Ingo Molnar
57a30218fa Linux 6.2-rc6
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmPW7E8eHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGf7MIAI0JnHN9WvtEukSZ
 E6j6+cEGWxsvD6q0g3GPolaKOCw7hlv0pWcFJFcUAt0jebspMdxV2oUGJ8RYW7Lg
 nCcHvEVswGKLAQtQSWw52qotW6fUfMPsNYYB5l31sm1sKH4Cgss0W7l2HxO/1LvG
 TSeNHX53vNAZ8pVnFYEWCSXC9bzrmU/VALF2EV00cdICmfvjlgkELGXoLKJJWzUp
 s63fBHYGGURSgwIWOKStoO6HNo0j/F/wcSMx8leY8qDUtVKHj4v24EvSgxUSDBER
 ch3LiSQ6qf4sw/z7pqruKFthKOrlNmcc0phjiES0xwwGiNhLv0z3rAhc4OM2cgYh
 SDc/Y/c=
 =zpaD
 -----END PGP SIGNATURE-----

Merge tag 'v6.2-rc6' into sched/core, to pick up fixes

Pick up fixes before merging another batch of cpuidle updates.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2023-01-31 15:01:20 +01:00
Nicholas Piggin
c28548012e powerpc/64: Fix perf profiling asynchronous interrupt handlers
Interrupt entry sets the soft mask to IRQS_ALL_DISABLED to match the
hard irq disabled state. So when should_hard_irq_enable() returns true
because we want PMI interrupts in irq handlers, MSR[EE] is enabled but
PMIs just get soft-masked. Fix this by clearing IRQS_PMI_DISABLED before
enabling MSR[EE].

This also tidies some of the warnings, no need to duplicate them in
both should_hard_irq_enable() and do_hard_irq_enable().

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230121100156.2824054-1-npiggin@gmail.com
2023-01-30 20:07:42 +11:00
Nicholas Piggin
5746ca131e powerpc/64: Don't recurse irq replay
Interrupt handlers called by soft-pending irq replay code can run
softirqs, softirq replay enables and disables local irqs, which allows
interrupts to come in including soft-masked interrupts, and it can
cause pending irqs to be replayed again. That makes the soft irq replay
state machine and possible races more complicated and fragile than it
needs to be.

Use irq_enter/irq_exit around irq replay to prevent softirqs running
while interrupts are being replayed. Softirqs will now be run at the
irq_exit() call after all the irq replaying is done. This prevents irqs
being replayed while irqs are being replayed, and should hopefully make
things simpler and easier to think about and debug.

A new PACA_IRQ_REPLAYING is added to prevent asynchronous interrupt
handlers hard-enabling EE while pending irqs are being replayed, because
that causes new pending irqs to arrive which is also a complexity. This
means pending irqs won't be profiled quite so well because perf irqs
can't be taken.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230121102618.2824429-1-npiggin@gmail.com
2023-01-30 18:21:41 +11:00
Nathan Lynch
12fd66651d powerpc/rtas: upgrade internal arch spinlocks
At the time commit f97bb36f70 ("powerpc/rtas: Turn rtas lock into a
raw spinlock") was written, the spinlock lockup detection code called
__delay(), which will not make progress if the timebase is not
advancing. Since the interprocessor timebase synchronization sequence
for chrp, cell, and some now-unsupported Power models can temporarily
freeze the timebase through an RTAS function (freeze-time-base), the
lock that serializes most RTAS calls was converted to arch_spinlock_t
to prevent kernel hangs in the lockup detection code.

However, commit bc88c10d7e ("locking/spinlock/debug: Remove spinlock
lockup detection code") removed that inconvenient property from the
lock debug code several years ago. So now it should be safe to
reintroduce generic locks into the RTAS support code, primarily to
increase lockdep coverage.

Making rtas_lock a spinlock_t would violate lock type nesting rules
because it can be acquired while holding raw locks, e.g. pci_lock and
irq_desc->lock. So convert it to raw_spinlock_t. There's no apparent
reason not to upgrade timebase_lock as well.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230124140448.45938-5-nathanl@linux.ibm.com
2023-01-30 17:53:05 +11:00
Nathan Lynch
599af49155 powerpc/rtas: remove lock and args fields from global rtas struct
Only code internal to the RTAS subsystem needs access to the central
lock and parameter block. Remove these from the globally visible
'rtas' struct and make them file-static in rtas.c.

Some changed lines in rtas_call() lack appropriate spacing around
operators and cause checkpatch errors; fix these as well.

Suggested-by: Laurent Dufour <ldufour@linux.ibm.com>
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Laurent Dufour <laurent.dufour@fr.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230124140448.45938-4-nathanl@linux.ibm.com
2023-01-30 17:53:05 +11:00
Nathan Lynch
9bce624384 powerpc/rtas: make all exports GPL
The first symbol exports of RTAS functions and data came with the (now
removed) scanlog driver in 2003:

https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/?id=f92e361842d5251e50562b09664082dcbd0548bb

At the time this was applied, EXPORT_SYMBOL_GPL() was very new, and
the exports of rtas_call() etc have remained non-GPL. As new APIs have
been added to the RTAS subsystem, their symbol exports have followed
the convention set by existing code.

However, the historical evidence is that RTAS function exports have been
added over time only to satisfy the needs of in-kernel users, and these
clients must have fairly intimate knowledge of how the APIs work to use
them safely. No out of tree users are known, and future ones seem
unlikely.

Arguably the default for RTAS symbols should have become
EXPORT_SYMBOL_GPL once it was available. Let's make it so now, and
exceptions can be evaluated as needed.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Laurent Dufour <laurent.dufour@fr.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230124140448.45938-3-nathanl@linux.ibm.com
2023-01-30 17:53:05 +11:00
Michael Ellerman
0d7e812fd2 powerpc/rtas: Drop unused export symbols
Some RTAS symbols are never used by modular code, drop their exports.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Nathan Lynch <nathanl@linux.ibm.com>
Link: https://lore.kernel.org/r/20230127111231.84294-1-mpe@ellerman.id.au
2023-01-30 17:52:57 +11:00
Nathan Lynch
5ff92e2f27 powerpc/rtas: unexport 'rtas' symbol
No modular code needs access to the 'rtas' struct, so remove the
symbol export.

Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230124140448.45938-2-nathanl@linux.ibm.com
2023-01-30 17:42:52 +11:00
Sathvika Vasireddy
8afffce6aa powerpc/85xx: Fix unannotated intra-function call warning
objtool throws the following warning:
  arch/powerpc/kernel/head_85xx.o: warning: objtool: .head.text+0x1a6c:
  unannotated intra-function call

Fix the warning by annotating KernelSPE symbol with SYM_FUNC_START_LOCAL
and SYM_FUNC_END macros.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Sathvika Vasireddy <sv@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230128124138.1066176-1-sv@linux.ibm.com
2023-01-30 15:59:59 +11:00
Josh Poimboeuf
37251c7114 powerpc/module_64: Fix "expected nop" error on module re-patching
When a module with a livepatched function is unloaded and then reloaded,
klp attempts to dynamically re-patch it.  On ppc64, that fails with the
following error:

  module_64: livepatch_nfsd: Expected nop after call, got e8410018 at e_show+0x60/0x548 [livepatch_nfsd]
  livepatch: failed to initialize patch 'livepatch_nfsd' for module 'nfsd' (-8)
  livepatch: patch 'livepatch_nfsd' failed for module 'nfsd', refusing to load module 'nfsd'

The error happens because the restore r2 instruction had already
previously been written into the klp module's replacement function when
the original function was patched the first time.  So the instruction
wasn't a nop as expected.

When the restore r2 instruction has already been patched in, detect that
and skip the warning and the instruction write.

Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Acked-by: Song Liu <song@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/2f6329ffd9674df6ff57e03edeb2ca54414770ab.1674617130.git.jpoimboe@kernel.org
2023-01-26 21:08:04 +11:00
Josh Poimboeuf
bc2c6f5695 powerpc/module_64: Improve restore_r2() return semantics
restore_r2() returns 1 on success, which is surprising for a non-boolean
function.  Change it to return 0 on success and -errno on error to match
kernel coding convention.

Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Acked-by: Song Liu <song@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/15baf76c271a0ae09f7b8556e50f2b4251e7049d.1674617130.git.jpoimboe@kernel.org
2023-01-26 21:08:04 +11:00
Nathan Chancellor
05e05bfc92 powerpc/vdso: Remove an unsupported flag from vgettimeofday-32.o with clang
When clang's -Qunused-arguments is dropped from KBUILD_CPPFLAGS, it
warns:

  clang-16: error: argument unused during compilation: '-fno-stack-clash-protection' [-Werror,-Wunused-command-line-argument]

This warning happens because vgettimeofday-32.c gets its base CFLAGS
from the main kernel, which may contain flags that are only supported on
a 64-bit target but not a 32-bit one, which is the case here.
-fstack-clash-protection and its negation are only suppported by the
64-bit powerpc target but that flag is included in an invocation for a
32-bit powerpc target, so clang points out that while the flag is one
that it recognizes, it is not actually used by this compiler job.

To eliminate the warning, remove -fno-stack-clash-protection from
vgettimeofday-32.c's CFLAGS when using clang, as has been done for other
flags previously.

Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Linux Kernel Functional Testing <lkft@linaro.org>
Tested-by: Anders Roxell <anders.roxell@linaro.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2023-01-26 12:42:40 +09:00
Nathan Chancellor
f0a42fbab4 powerpc/vdso: Improve linker flags
When clang's -Qunused-arguments is dropped from KBUILD_CPPFLAGS, there
are several warnings in the PowerPC vDSO:

  clang-16: error: -Wl,-soname=linux-vdso32.so.1: 'linker' input unused [-Werror,-Wunused-command-line-argument]
  clang-16: error: -Wl,--hash-style=both: 'linker' input unused [-Werror,-Wunused-command-line-argument]
  clang-16: error: argument unused during compilation: '-shared' [-Werror,-Wunused-command-line-argument]

  clang-16: error: argument unused during compilation: '-nostdinc' [-Werror,-Wunused-command-line-argument]
  clang-16: error: argument unused during compilation: '-Wa,-maltivec' [-Werror,-Wunused-command-line-argument]

The first group of warnings point out that linker flags were being added
to all invocations of $(CC), even though they will only be used during
the final vDSO link. Move those flags to ldflags-y.

The second group of warnings are compiler or assembler flags that will
be unused during linking. Filter them out from KBUILD_CFLAGS so that
they are not used during linking.

Additionally, '-z noexecstack' was added directly to the ld_and_check
rule in commit 1d53c0192b ("powerpc/vdso: link with -z noexecstack")
but now that there is a common ldflags variable, it can be moved there.

Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Linux Kernel Functional Testing <lkft@linaro.org>
Tested-by: Anders Roxell <anders.roxell@linaro.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2023-01-26 12:42:27 +09:00
Nathan Chancellor
024734d132 powerpc/vdso: Remove unused '-s' flag from ASFLAGS
When clang's -Qunused-arguments is dropped from KBUILD_CPPFLAGS, it
warns:

  clang-16: error: argument unused during compilation: '-s' [-Werror,-Wunused-command-line-argument]

The compiler's '-s' flag is a linking option (it is passed along to the
linker directly), which means it does nothing when the linker is not
invoked by the compiler. The kernel builds all .o files with '-c', which
stops the compilation pipeline before linking, so '-s' can be safely
dropped from ASFLAGS.

Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org>
Tested-by: Linux Kernel Functional Testing <lkft@linaro.org>
Tested-by: Anders Roxell <anders.roxell@linaro.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2023-01-26 12:42:06 +09:00
Masahiro Yamada
c0d3b83100 kbuild: do not print extra logs for V=2
Some scripts increase the verbose level when V=1, but others when
not V=0.

I think the former is correct because V=2 is not a log level but
a switch to print the reason for rebuilding.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>
2023-01-22 23:43:32 +09:00
Mike Kravetz
e9adcfecf5 mm: remove zap_page_range and create zap_vma_pages
zap_page_range was originally designed to unmap pages within an address
range that could span multiple vmas.  While working on [1], it was
discovered that all callers of zap_page_range pass a range entirely within
a single vma.  In addition, the mmu notification call within zap_page
range does not correctly handle ranges that span multiple vmas.  When
crossing a vma boundary, a new mmu_notifier_range_init/end call pair with
the new vma should be made.

Instead of fixing zap_page_range, do the following:
- Create a new routine zap_vma_pages() that will remove all pages within
  the passed vma.  Most users of zap_page_range pass the entire vma and
  can use this new routine.
- For callers of zap_page_range not passing the entire vma, instead call
  zap_page_range_single().
- Remove zap_page_range.

[1] https://lore.kernel.org/linux-mm/20221114235507.294320-2-mike.kravetz@oracle.com/
Link: https://lkml.kernel.org/r/20230104002732.232573-1-mike.kravetz@oracle.com
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Suggested-by: Peter Xu <peterx@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Peter Xu <peterx@redhat.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>	[s390]
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-01-18 17:12:55 -08:00
Peter Zijlstra
89b3098703 arch/idle: Change arch_cpu_idle() behavior: always exit with IRQs disabled
Current arch_cpu_idle() is called with IRQs disabled, but will return
with IRQs enabled.

However, the very first thing the generic code does after calling
arch_cpu_idle() is raw_local_irq_disable(). This means that
architectures that can idle with IRQs disabled end up doing a
pointless 'enable-disable' dance.

Therefore, push this IRQ disabling into the idle function, meaning
that those architectures can avoid the pointless IRQ state flipping.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Tony Lindgren <tony@atomide.com>
Tested-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Acked-by: Mark Rutland <mark.rutland@arm.com> [arm64]
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Guo Ren <guoren@kernel.org>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20230112195540.618076436@infradead.org
2023-01-13 11:48:15 +01:00
Peter Zijlstra
2b5a0e425e objtool/idle: Validate __cpuidle code as noinstr
Idle code is very like entry code in that RCU isn't available. As
such, add a little validation.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Tony Lindgren <tony@atomide.com>
Tested-by: Ulf Hansson <ulf.hansson@linaro.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20230112195540.373461409@infradead.org
2023-01-13 11:48:15 +01:00
Michael Ellerman
be5f95c877 powerpc/vmlinux.lds: Don't discard .comment
Although the powerpc linker script mentions .comment in the DISCARD
section, that has never actually caused it to be discarded, because the
earlier ELF_DETAILS macro (previously STABS_DEBUG) explicitly includes
.comment.

However commit 99cb0d917f ("arch: fix broken BuildID for arm64 and
riscv") introduced an earlier use of DISCARD as part of the RO_DATA
macro. With binutils < 2.36 that causes the DISCARD directives later in
the script to be applied earlier, causing .comment to actually be
discarded.

It's confusing to explicitly include and discard .comment, and even more
so if the behaviour depends on the toolchain version. So don't discard
.comment in order to maintain the existing behaviour in all cases.

Fixes: 83a092cf95 ("powerpc: Link warning for orphan sections")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230105132349.384666-3-mpe@ellerman.id.au
2023-01-06 00:25:12 +11:00
Michael Ellerman
07b050f929 powerpc/vmlinux.lds: Don't discard .rela* for relocatable builds
Relocatable kernels must not discard relocations, they need to be
processed at runtime. As such they are included for CONFIG_RELOCATABLE
builds in the powerpc linker script (line 340).

However they are also unconditionally discarded later in the
script (line 414). Previously that worked because the earlier inclusion
superseded the discard.

However commit 99cb0d917f ("arch: fix broken BuildID for arm64 and
riscv") introduced an earlier use of DISCARD as part of the RO_DATA
macro (line 137). With binutils < 2.36 that causes the DISCARD
directives later in the script to be applied earlier, causing .rela* to
actually be discarded at link time, leading to build warnings and a
kernel that doesn't boot:

  ld: warning: discarding dynamic section .rela.init.rodata

Fix it by conditionally discarding .rela* only when CONFIG_RELOCATABLE
is disabled.

Fixes: 99cb0d917f ("arch: fix broken BuildID for arm64 and riscv")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

Link: https://lore.kernel.org/r/20230105132349.384666-2-mpe@ellerman.id.au
2023-01-06 00:25:01 +11:00
Michael Ellerman
4b9880dbf3 powerpc/vmlinux.lds: Define RUNTIME_DISCARD_EXIT
The powerpc linker script explicitly includes .exit.text, because
otherwise the link fails due to references from __bug_table and
__ex_table. The code is freed (discarded) at runtime along with
.init.text and data.

That has worked in the past despite powerpc not defining
RUNTIME_DISCARD_EXIT because DISCARDS appears late in the powerpc linker
script (line 410), and the explicit inclusion of .exit.text
earlier (line 280) supersedes the discard.

However commit 99cb0d917f ("arch: fix broken BuildID for arm64 and
riscv") introduced an earlier use of DISCARD as part of the RO_DATA
macro (line 136). With binutils < 2.36 that causes the DISCARD
directives later in the script to be applied earlier [1], causing
.exit.text to actually be discarded at link time, leading to build
errors:

  '.exit.text' referenced in section '__bug_table' of crypto/algboss.o: defined in
  discarded section '.exit.text' of crypto/algboss.o
  '.exit.text' referenced in section '__ex_table' of drivers/nvdimm/core.o: defined in
  discarded section '.exit.text' of drivers/nvdimm/core.o

Fix it by defining RUNTIME_DISCARD_EXIT, which causes the generic
DISCARDS macro to not include .exit.text at all.

1: https://lore.kernel.org/lkml/87fscp2v7k.fsf@igel.home/

Fixes: 99cb0d917f ("arch: fix broken BuildID for arm64 and riscv")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230105132349.384666-1-mpe@ellerman.id.au
2023-01-06 00:24:50 +11:00
Jason A. Donenfeld
6bb20c152b random: do not include <asm/archrandom.h> from random.h
The <asm/archrandom.h> header is a random.c private detail, not
something to be called by other code. As such, don't make it
automatically available by way of random.h.

Cc: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-12-20 03:13:45 +01:00
Linus Torvalds
5f6e430f93 powerpc updates for 6.2
- Add powerpc qspinlock implementation optimised for large system scalability and
    paravirt. See the merge message for more details.
 
  - Enable objtool to be built on powerpc to generate mcount locations.
 
  - Use a temporary mm for code patching with the Radix MMU, so the writable mapping is
    restricted to the patching CPU.
 
  - Add an option to build the 64-bit big-endian kernel with the ELFv2 ABI.
 
  - Sanitise user registers on interrupt entry on 64-bit Book3S.
 
  - Many other small features and fixes.
 
 Thanks to: Aboorva Devarajan, Angel Iglesias, Benjamin Gray, Bjorn Helgaas, Bo Liu, Chen
 Lifu, Christoph Hellwig, Christophe JAILLET, Christophe Leroy, Christopher M. Riedl, Colin
 Ian King, Deming Wang, Disha Goel, Dmitry Torokhov, Finn Thain, Geert Uytterhoeven,
 Gustavo A. R. Silva, Haowen Bai, Joel Stanley, Jordan Niethe, Julia Lawall, Kajol Jain,
 Laurent Dufour, Li zeming, Miaoqian Lin, Michael Jeanson, Nathan Lynch, Naveen N. Rao,
 Nayna Jain, Nicholas Miehlbradt, Nicholas Piggin, Pali Rohár, Randy Dunlap, Rohan McLure,
 Russell Currey, Sathvika Vasireddy, Shaomin Deng, Stephen Kitt, Stephen Rothwell, Thomas
 Weißschuh, Tiezhu Yang, Uwe Kleine-König, Xie Shaowen, Xiu Jianfeng, XueBing Chen, Yang
 Yingliang, Zhang Jiaming, ruanjinjie, Jessica Yu, Wolfram Sang.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmOfrj8THG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgIWtD/9mGF/ze2k+qFTo+30fb7bO8WJIDgsR
 dIASnZjXV7q/45elvymhUdkQv4R7xL3pzC40P1+ZKtWzGTNe+zWUQLoALNwRK85j
 8CsxZbqefGNKE5Z6ZHo9s37wsu3+jJu9yEQpGFo1LINyzeclCn5St5oqfRam+Hd/
 cPF+VfvREwZ0+YOKGBhJ2EgC+Gc9xsFY7DLQsoYlu71iZZr6Z6rgZW/EY5h3RMGS
 YKBoVwDsWaU0FpFWrr/rYTI6DqSr3AHr1+ftDg7ncCZMD6vQva6aMCCt94aLB1aE
 vC+DNdhZlA558bXGa5yA7Wr//7aUBUIwyC60DogOeZ6vw3kD9tdEd1fbH5hmqNKY
 K5bfqm28XU2959CTE8RDgsYYZvwDcfrjBIML14WZGdCQOTcGKpgOGp22o6yNb1Pq
 JKpHHnVpvu2PZ/p2XdKSm9+etr2yI6lXZAEVTS7ehdtMukButjSHEVbSCEZ8tlWz
 KokQt2J23BMHuSrXK6+67wWQBtdsLEk+LBOQmweiwarMocqvL/Zjz/5J7DR2DtH8
 wlY3wOtB1+E5j7xZ+RgK3c3jNg5dH39ZwvFsSATWTI3P+iq6OK/bbk4q4LmZt2l9
 ZIfH/CXPf9BvGCHzHa3AAd3UBbJLFwj17btMEv1wFVPS0T4LPUzkgTNTNUYeP6zL
 h1e5QfgUxvKPuQ==
 =7k3p
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Michael Ellerman:

 - Add powerpc qspinlock implementation optimised for large system
   scalability and paravirt. See the merge message for more details

 - Enable objtool to be built on powerpc to generate mcount locations

 - Use a temporary mm for code patching with the Radix MMU, so the
   writable mapping is restricted to the patching CPU

 - Add an option to build the 64-bit big-endian kernel with the ELFv2
   ABI

 - Sanitise user registers on interrupt entry on 64-bit Book3S

 - Many other small features and fixes

Thanks to Aboorva Devarajan, Angel Iglesias, Benjamin Gray, Bjorn
Helgaas, Bo Liu, Chen Lifu, Christoph Hellwig, Christophe JAILLET,
Christophe Leroy, Christopher M. Riedl, Colin Ian King, Deming Wang,
Disha Goel, Dmitry Torokhov, Finn Thain, Geert Uytterhoeven, Gustavo A.
R. Silva, Haowen Bai, Joel Stanley, Jordan Niethe, Julia Lawall, Kajol
Jain, Laurent Dufour, Li zeming, Miaoqian Lin, Michael Jeanson, Nathan
Lynch, Naveen N. Rao, Nayna Jain, Nicholas Miehlbradt, Nicholas Piggin,
Pali Rohár, Randy Dunlap, Rohan McLure, Russell Currey, Sathvika
Vasireddy, Shaomin Deng, Stephen Kitt, Stephen Rothwell, Thomas
Weißschuh, Tiezhu Yang, Uwe Kleine-König, Xie Shaowen, Xiu Jianfeng,
XueBing Chen, Yang Yingliang, Zhang Jiaming, ruanjinjie, Jessica Yu,
and Wolfram Sang.

* tag 'powerpc-6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (181 commits)
  powerpc/code-patching: Fix oops with DEBUG_VM enabled
  powerpc/qspinlock: Fix 32-bit build
  powerpc/prom: Fix 32-bit build
  powerpc/rtas: mandate RTAS syscall filtering
  powerpc/rtas: define pr_fmt and convert printk call sites
  powerpc/rtas: clean up includes
  powerpc/rtas: clean up rtas_error_log_max initialization
  powerpc/pseries/eeh: use correct API for error log size
  powerpc/rtas: avoid scheduling in rtas_os_term()
  powerpc/rtas: avoid device tree lookups in rtas_os_term()
  powerpc/rtasd: use correct OF API for event scan rate
  powerpc/rtas: document rtas_call()
  powerpc/pseries: unregister VPA when hot unplugging a CPU
  powerpc/pseries: reset the RCU watchdogs after a LPM
  powerpc: Take in account addition CPU node when building kexec FDT
  powerpc: export the CPU node count
  powerpc/cpuidle: Set CPUIDLE_FLAG_POLLING for snooze state
  powerpc/dts/fsl: Fix pca954x i2c-mux node names
  cxl: Remove unnecessary cxl_pci_window_alignment()
  selftests/powerpc: Fix resource leaks
  ...
2022-12-19 07:13:33 -06:00
Linus Torvalds
4f292c4de4 New Feature:
* Randomize the per-cpu entry areas
 Cleanups:
 * Have CR3_ADDR_MASK use PHYSICAL_PAGE_MASK instead of open
   coding it
 * Move to "native" set_memory_rox() helper
 * Clean up pmd_get_atomic() and i386-PAE
 * Remove some unused page table size macros
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEV76QKkVc4xCGURexaDWVMHDJkrAFAmOc53UACgkQaDWVMHDJ
 krCUHw//SGZ+La0hLZLAiAiZTXLZZHpYkOmg1Oj1+11qSU11uZzTFqDpauhaKpRS
 cJCSh+D+RXe5e2ipgt0+Zl0hESLt7pJf8258OE4ra0DL/IlyO9uqruAs9Kn3eRS/
 Fk76nG8gdEU+JKJqpG02GqOLslYQuIy96n9hpuj1x25b614+uezPfC7S4XEat0NT
 MbJQ+jnVDf16aJIJkzT+iSwhubDVeh+bSHeO0SSCzX23WLUqDeg5NvlyxoCHGbBh
 UpUTWggV/0pYAkBKRHToeJs8qTWREwuuH/8JGewpe9A0tjdB5wyZfNL2PuracweN
 9MauXC3T5f0+Ca4yIIaPq1fF7Ny/PR2dBFihk27rOD0N7tjaZxNwal2pB1sZcmvZ
 +PAokjyTPVH5ZXjkMYGGAUe1jyjwr2+TgFSZxhTnDuGtyVQiY4pihGKOifLCX6tv
 x6khvYeTBw7wfaDRtKEAf+2kLHYn+71HszHP/8bNKX9T03h+Zf0i1wdZu5xbM5Gc
 VK2wR7bCC+UftJJYG0pldcHg2qaF19RBHK2tLwp7zngUv7lTbkKfkgKjre73KV2a
 D4b76lrqdUMo6UYwYdw7WtDyarZS4OVLq2DcNhwwMddBCaX8kyN5a4AqwQlZYJ0u
 dM+kuMofE8U3yMxmMhJimkZUsj09yLHIqfynY0jbAcU3nhKZZNY=
 =wwVF
 -----END PGP SIGNATURE-----

Merge tag 'x86_mm_for_6.2_v2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 mm updates from Dave Hansen:
 "New Feature:

   - Randomize the per-cpu entry areas

  Cleanups:

   - Have CR3_ADDR_MASK use PHYSICAL_PAGE_MASK instead of open coding it

   - Move to "native" set_memory_rox() helper

   - Clean up pmd_get_atomic() and i386-PAE

   - Remove some unused page table size macros"

* tag 'x86_mm_for_6.2_v2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (35 commits)
  x86/mm: Ensure forced page table splitting
  x86/kasan: Populate shadow for shared chunk of the CPU entry area
  x86/kasan: Add helpers to align shadow addresses up and down
  x86/kasan: Rename local CPU_ENTRY_AREA variables to shorten names
  x86/mm: Populate KASAN shadow for entire per-CPU range of CPU entry area
  x86/mm: Recompute physical address for every page of per-CPU CEA mapping
  x86/mm: Rename __change_page_attr_set_clr(.checkalias)
  x86/mm: Inhibit _PAGE_NX changes from cpa_process_alias()
  x86/mm: Untangle __change_page_attr_set_clr(.checkalias)
  x86/mm: Add a few comments
  x86/mm: Fix CR3_ADDR_MASK
  x86/mm: Remove P*D_PAGE_MASK and P*D_PAGE_SIZE macros
  mm: Convert __HAVE_ARCH_P..P_GET to the new style
  mm: Remove pointless barrier() after pmdp_get_lockless()
  x86/mm/pae: Get rid of set_64bit()
  x86_64: Remove pointless set_64bit() usage
  x86/mm/pae: Be consistent with pXXp_get_and_clear()
  x86/mm/pae: Use WRITE_ONCE()
  x86/mm/pae: Don't (ab)use atomic64
  mm/gup: Fix the lockless PMD access
  ...
2022-12-17 14:06:53 -06:00
Peter Zijlstra
d48567c9a0 mm: Introduce set_memory_rox()
Because endlessly repeating:

	set_memory_ro()
	set_memory_x()

is getting tedious.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/Y1jek64pXOsougmz@hirez.programming.kicks-ass.net
2022-12-15 10:37:26 -08:00
Linus Torvalds
8702f2c611 Non-MM patches for 6.2-rc1.
- A ptrace API cleanup series from Sergey Shtylyov
 
 - Fixes and cleanups for kexec from ye xingchen
 
 - nilfs2 updates from Ryusuke Konishi
 
 - squashfs feature work from Xiaoming Ni: permit configuration of the
   filesystem's compression concurrency from the mount command line.
 
 - A series from Akinobu Mita which addresses bound checking errors when
   writing to debugfs files.
 
 - A series from Yang Yingliang to address rapido memory leaks
 
 - A series from Zheng Yejian to address possible overflow errors in
   encode_comp_t().
 
 - And a whole shower of singleton patches all over the place.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCY5efRgAKCRDdBJ7gKXxA
 jgvdAP0al6oFDtaSsshIdNhrzcMwfjt6PfVxxHdLmNhF1hX2dwD/SVluS1bPSP7y
 0sZp7Ustu3YTb8aFkMl96Y9m9mY1Nwg=
 =ga5B
 -----END PGP SIGNATURE-----

Merge tag 'mm-nonmm-stable-2022-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull non-MM updates from Andrew Morton:

 - A ptrace API cleanup series from Sergey Shtylyov

 - Fixes and cleanups for kexec from ye xingchen

 - nilfs2 updates from Ryusuke Konishi

 - squashfs feature work from Xiaoming Ni: permit configuration of the
   filesystem's compression concurrency from the mount command line

 - A series from Akinobu Mita which addresses bound checking errors when
   writing to debugfs files

 - A series from Yang Yingliang to address rapidio memory leaks

 - A series from Zheng Yejian to address possible overflow errors in
   encode_comp_t()

 - And a whole shower of singleton patches all over the place

* tag 'mm-nonmm-stable-2022-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (79 commits)
  ipc: fix memory leak in init_mqueue_fs()
  hfsplus: fix bug causing custom uid and gid being unable to be assigned with mount
  rapidio: devices: fix missing put_device in mport_cdev_open
  kcov: fix spelling typos in comments
  hfs: Fix OOB Write in hfs_asc2mac
  hfs: fix OOB Read in __hfs_brec_find
  relay: fix type mismatch when allocating memory in relay_create_buf()
  ocfs2: always read both high and low parts of dinode link count
  io-mapping: move some code within the include guarded section
  kernel: kcsan: kcsan_test: build without structleak plugin
  mailmap: update email for Iskren Chernev
  eventfd: change int to __u64 in eventfd_signal() ifndef CONFIG_EVENTFD
  rapidio: fix possible UAF when kfifo_alloc() fails
  relay: use strscpy() is more robust and safer
  cpumask: limit visibility of FORCE_NR_CPUS
  acct: fix potential integer overflow in encode_comp_t()
  acct: fix accuracy loss for input value of encode_comp_t()
  linux/init.h: include <linux/build_bug.h> and <linux/stringify.h>
  rapidio: rio: fix possible name leak in rio_register_mport()
  rapidio: fix possible name leaks when rio_add_device() fails
  ...
2022-12-12 17:28:58 -08:00
Linus Torvalds
268325bda5 Random number generator updates for Linux 6.2-rc1.
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEq5lC5tSkz8NBJiCnSfxwEqXeA64FAmOU+U8ACgkQSfxwEqXe
 A67NnQ//Y5DltmvibyPd7r1TFT2gUYv+Rx3sUV9ZE1NYptd/SWhhcL8c5FZ70Fuw
 bSKCa1uiWjOxosjXT1kGrWq3de7q7oUpAPSOGxgxzoaNURIt58N/ajItCX/4Au8I
 RlGAScHy5e5t41/26a498kB6qJ441fBEqCYKQpPLINMBAhe8TQ+NVp0rlpUwNHFX
 WrUGg4oKWxdBIW3HkDirQjJWDkkAiklRTifQh/Al4b6QDbOnRUGGCeckNOhixsvS
 waHWTld+Td8jRrA4b82tUb2uVZ2/b8dEvj/A8CuTv4yC0lywoyMgBWmJAGOC+UmT
 ZVNdGW02Jc2T+Iap8ZdsEmeLHNqbli4+IcbY5xNlov+tHJ2oz41H9TZoYKbudlr6
 /ReAUPSn7i50PhbQlEruj3eg+M2gjOeh8OF8UKwwRK8PghvyWQ1ScW0l3kUhPIhI
 PdIG6j4+D2mJc1FIj2rTVB+Bg933x6S+qx4zDxGlNp62AARUFYf6EgyD6aXFQVuX
 RxcKb6cjRuFkzFiKc8zkqg5edZH+IJcPNuIBmABqTGBOxbZWURXzIQvK/iULqZa4
 CdGAFIs6FuOh8pFHLI3R4YoHBopbHup/xKDEeAO9KZGyeVIuOSERDxxo5f/ITzcq
 APvT77DFOEuyvanr8RMqqh0yUjzcddXqw9+ieufsAyDwjD9DTuE=
 =QRhK
 -----END PGP SIGNATURE-----

Merge tag 'random-6.2-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random

Pull random number generator updates from Jason Donenfeld:

 - Replace prandom_u32_max() and various open-coded variants of it,
   there is now a new family of functions that uses fast rejection
   sampling to choose properly uniformly random numbers within an
   interval:

       get_random_u32_below(ceil) - [0, ceil)
       get_random_u32_above(floor) - (floor, U32_MAX]
       get_random_u32_inclusive(floor, ceil) - [floor, ceil]

   Coccinelle was used to convert all current users of
   prandom_u32_max(), as well as many open-coded patterns, resulting in
   improvements throughout the tree.

   I'll have a "late" 6.1-rc1 pull for you that removes the now unused
   prandom_u32_max() function, just in case any other trees add a new
   use case of it that needs to converted. According to linux-next,
   there may be two trivial cases of prandom_u32_max() reintroductions
   that are fixable with a 's/.../.../'. So I'll have for you a final
   conversion patch doing that alongside the removal patch during the
   second week.

   This is a treewide change that touches many files throughout.

 - More consistent use of get_random_canary().

 - Updates to comments, documentation, tests, headers, and
   simplification in configuration.

 - The arch_get_random*_early() abstraction was only used by arm64 and
   wasn't entirely useful, so this has been replaced by code that works
   in all relevant contexts.

 - The kernel will use and manage random seeds in non-volatile EFI
   variables, refreshing a variable with a fresh seed when the RNG is
   initialized. The RNG GUID namespace is then hidden from efivarfs to
   prevent accidental leakage.

   These changes are split into random.c infrastructure code used in the
   EFI subsystem, in this pull request, and related support inside of
   EFISTUB, in Ard's EFI tree. These are co-dependent for full
   functionality, but the order of merging doesn't matter.

 - Part of the infrastructure added for the EFI support is also used for
   an improvement to the way vsprintf initializes its siphash key,
   replacing an sleep loop wart.

 - The hardware RNG framework now always calls its correct random.c
   input function, add_hwgenerator_randomness(), rather than sometimes
   going through helpers better suited for other cases.

 - The add_latent_entropy() function has long been called from the fork
   handler, but is a no-op when the latent entropy gcc plugin isn't
   used, which is fine for the purposes of latent entropy.

   But it was missing out on the cycle counter that was also being mixed
   in beside the latent entropy variable. So now, if the latent entropy
   gcc plugin isn't enabled, add_latent_entropy() will expand to a call
   to add_device_randomness(NULL, 0), which adds a cycle counter,
   without the absent latent entropy variable.

 - The RNG is now reseeded from a delayed worker, rather than on demand
   when used. Always running from a worker allows it to make use of the
   CPU RNG on platforms like S390x, whose instructions are too slow to
   do so from interrupts. It also has the effect of adding in new inputs
   more frequently with more regularity, amounting to a long term
   transcript of random values. Plus, it helps a bit with the upcoming
   vDSO implementation (which isn't yet ready for 6.2).

 - The jitter entropy algorithm now tries to execute on many different
   CPUs, round-robining, in hopes of hitting even more memory latencies
   and other unpredictable effects. It also will mix in a cycle counter
   when the entropy timer fires, in addition to being mixed in from the
   main loop, to account more explicitly for fluctuations in that timer
   firing. And the state it touches is now kept within the same cache
   line, so that it's assured that the different execution contexts will
   cause latencies.

* tag 'random-6.2-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random: (23 commits)
  random: include <linux/once.h> in the right header
  random: align entropy_timer_state to cache line
  random: mix in cycle counter when jitter timer fires
  random: spread out jitter callback to different CPUs
  random: remove extraneous period and add a missing one in comments
  efi: random: refresh non-volatile random seed when RNG is initialized
  vsprintf: initialize siphash key using notifier
  random: add back async readiness notifier
  random: reseed in delayed work rather than on-demand
  random: always mix cycle counter in add_latent_entropy()
  hw_random: use add_hwgenerator_randomness() for early entropy
  random: modernize documentation comment on get_random_bytes()
  random: adjust comment to account for removed function
  random: remove early archrandom abstraction
  random: use random.trust_{bootloader,cpu} command line option only
  stackprotector: actually use get_random_canary()
  stackprotector: move get_random_canary() into stackprotector.h
  treewide: use get_random_u32_inclusive() when possible
  treewide: use get_random_u32_{above,below}() instead of manual loop
  treewide: use get_random_u32_below() instead of deprecated function
  ...
2022-12-12 16:22:22 -08:00
Linus Torvalds
0a1d4434db Updates for timers, timekeeping and drivers:
- Core:
 
    - The timer_shutdown[_sync]() infrastructure:
 
      Tearing down timers can be tedious when there are circular
      dependencies to other things which need to be torn down. A prime
      example is timer and workqueue where the timer schedules work and the
      work arms the timer.
 
      What needs to prevented is that pending work which is drained via
      destroy_workqueue() does not rearm the previously shutdown
      timer. Nothing in that shutdown sequence relies on the timer being
      functional.
 
      The conclusion was that the semantics of timer_shutdown_sync() should
      be:
 
 	- timer is not enqueued
     	- timer callback is not running
     	- timer cannot be rearmed
 
      Preventing the rearming of shutdown timers is done by discarding rearm
      attempts silently. A warning for the case that a rearm attempt of a
      shutdown timer is detected would not be really helpful because it's
      entirely unclear how it should be acted upon. The only way to address
      such a case is to add 'if (in_shutdown)' conditionals all over the
      place. This is error prone and in most cases of teardown not required
      all.
 
    - The real fix for the bluetooth HCI teardown based on
      timer_shutdown_sync().
 
      A larger scale conversion to timer_shutdown_sync() is work in
      progress.
 
    - Consolidation of VDSO time namespace helper functions
 
    - Small fixes for timer and timerqueue
 
  - Drivers:
 
    - Prevent integer overflow on the XGene-1 TVAL register which causes
      an never ending interrupt storm.
 
    - The usual set of new device tree bindings
 
    - Small fixes and improvements all over the place
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmOUuC0THHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYodpZD/9kCDi009n65QFF1J4kE5aZuABbRMtO
 7sy66fJpDyB/MtcbPPH29uzQUEs1VMTQVB+ZM+7e1YGoxSWuSTzeoFH+yK1w4tEZ
 VPbOcvUEjG0esKUehwYFeOjSnIjy6M1Y41aOUaDnq00/azhfTrzLxQA1BbbFbkpw
 S7u2hllbyRJ8KdqQyV9cVpXmze6fcpdtNhdQeoA7qQCsSPnJ24MSpZ/PG9bAovq8
 75IRROT7CQRd6AMKAVpA9Ov8ak9nbY3EgQmoKcp5ZXfXz8kD3nHky9Lste7djgYB
 U085Vwcelt39V5iXevDFfzrBYRUqrMKOXIf2xnnoDNeF5Jlj5gChSNVZwTLO38wu
 RFEVCjCjuC41GQJWSck9LRSYdriW/htVbEE8JLc6uzUJGSyjshgJRn/PK4HjpiLY
 AvH2rd4rAap/rjDKvfWvBqClcfL7pyBvavgJeyJ8oXyQjHrHQwapPcsMFBm0Cky5
 soF0Lr3hIlQ9u+hwUuFdNZkY9mOg09g9ImEjW1AZTKY0DfJMc5JAGjjSCfuopVUN
 Uf/qqcUeQPSEaC+C9xiFs0T3svYFxBqpgPv4B6t8zAnozon9fyZs+lv5KdRg4X77
 qX395qc6PaOSQlA7gcxVw3vjCPd0+hljXX84BORP7z+uzcsomvIH1MxJepIHmgaJ
 JrYbSZ5qzY5TTA==
 =JlDe
 -----END PGP SIGNATURE-----

Merge tag 'timers-core-2022-12-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer updates from Thomas Gleixner:
 "Updates for timers, timekeeping and drivers:

  Core:

   - The timer_shutdown[_sync]() infrastructure:

     Tearing down timers can be tedious when there are circular
     dependencies to other things which need to be torn down. A prime
     example is timer and workqueue where the timer schedules work and
     the work arms the timer.

     What needs to prevented is that pending work which is drained via
     destroy_workqueue() does not rearm the previously shutdown timer.
     Nothing in that shutdown sequence relies on the timer being
     functional.

     The conclusion was that the semantics of timer_shutdown_sync()
     should be:
	- timer is not enqueued
    	- timer callback is not running
    	- timer cannot be rearmed

     Preventing the rearming of shutdown timers is done by discarding
     rearm attempts silently.

     A warning for the case that a rearm attempt of a shutdown timer is
     detected would not be really helpful because it's entirely unclear
     how it should be acted upon. The only way to address such a case is
     to add 'if (in_shutdown)' conditionals all over the place. This is
     error prone and in most cases of teardown not required all.

   - The real fix for the bluetooth HCI teardown based on
     timer_shutdown_sync().

     A larger scale conversion to timer_shutdown_sync() is work in
     progress.

   - Consolidation of VDSO time namespace helper functions

   - Small fixes for timer and timerqueue

  Drivers:

   - Prevent integer overflow on the XGene-1 TVAL register which causes
     an never ending interrupt storm.

   - The usual set of new device tree bindings

   - Small fixes and improvements all over the place"

* tag 'timers-core-2022-12-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (34 commits)
  dt-bindings: timer: renesas,cmt: Add r8a779g0 CMT support
  dt-bindings: timer: renesas,tmu: Add r8a779g0 support
  clocksource/drivers/arm_arch_timer: Use kstrtobool() instead of strtobool()
  clocksource/drivers/timer-ti-dm: Fix missing clk_disable_unprepare in dmtimer_systimer_init_clock()
  clocksource/drivers/timer-ti-dm: Clear settings on probe and free
  clocksource/drivers/timer-ti-dm: Make timer_get_irq static
  clocksource/drivers/timer-ti-dm: Fix warning for omap_timer_match
  clocksource/drivers/arm_arch_timer: Fix XGene-1 TVAL register math error
  clocksource/drivers/timer-npcm7xx: Enable timer 1 clock before use
  dt-bindings: timer: nuvoton,npcm7xx-timer: Allow specifying all clocks
  dt-bindings: timer: rockchip: Add rockchip,rk3128-timer
  clockevents: Repair kernel-doc for clockevent_delta2ns()
  clocksource/drivers/ingenic-ost: Define pm functions properly in platform_driver struct
  clocksource/drivers/sh_cmt: Access registers according to spec
  vdso/timens: Refactor copy-pasted find_timens_vvar_page() helper into one copy
  Bluetooth: hci_qca: Fix the teardown problem for real
  timers: Update the documentation to reflect on the new timer_shutdown() API
  timers: Provide timer_shutdown[_sync]()
  timers: Add shutdown mechanism to the internal functions
  timers: Split [try_to_]del_timer[_sync]() to prepare for shutdown mode
  ...
2022-12-12 12:52:02 -08:00
Michael Ellerman
f24f21c412 Merge branch 'topic/objtool' into next
Merge the powerpc objtool support, which we were keeping in a topic
branch in case of any merge conflicts.
2022-12-08 23:57:47 +11:00
Michael Ellerman
64fdcbcc06 powerpc/prom: Fix 32-bit build
Add an IS_ENABLED() check to fix the build error:

arch/powerpc/kernel/prom.o: in function `early_init_dt_scan_cpus':
  prom.c:(.init.text+0x2ea): undefined reference to `boot_cpu_node_count'

Fixes: e13d23a404 ("powerpc: export the CPU node count")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2022-12-08 09:43:15 +11:00
Nathan Lynch
98c738c8ce powerpc/rtas: mandate RTAS syscall filtering
CONFIG_PPC_RTAS_FILTER has been optional but default-enabled since its
introduction. It's been enabled in enterprise distro kernels for a
while without causing ABI breakage that wasn't easily fixed, and it
prevents harmful abuses of the rtas syscall.

Let's make it unconditional.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221118150751.469393-10-nathanl@linux.ibm.com
2022-12-07 22:40:43 +11:00
Nathan Lynch
f975b6559b powerpc/rtas: define pr_fmt and convert printk call sites
Set pr_fmt to "rtas: " and convert the handful of printk() uses in
rtas.c, adjusting the messages to remove now-redundant "RTAS"
strings.

Note that rtas_restart(), rtas_power_off(), and rtas_halt() all
currently use printk() without specifying a log level. These have been
changed to use pr_emerg(), which matches the behavior of
rtas_os_term().

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221118150751.469393-9-nathanl@linux.ibm.com
2022-12-07 22:40:43 +11:00
Nathan Lynch
9581f8a007 powerpc/rtas: clean up includes
rtas.c used to host complex code related to pseries-specific guest
migration and suspend, which used atomics, completions, hcalls, and
CPU hotplug APIs. That's all been deleted or moved, so remove the
include directives that have been rendered unnecessary. Sort the
remainder (with linux/ before asm/) to impose some order on where
future additions go.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221118150751.469393-8-nathanl@linux.ibm.com
2022-12-07 22:40:42 +11:00
Nathan Lynch
c67a0e411d powerpc/rtas: clean up rtas_error_log_max initialization
The code in rtas_get_error_log_max() doesn't cause problems in
practice, but there are no measures to ensure that the lazy
initialization of the static rtas_error_log_max variable is atomic,
and it's not worth adding them.

Initialize the static rtas_error_log_max variable at boot when we're
single-threaded instead of lazily on first use. Use the more
appropriate of_property_read_u32() API instead of rtas_token() to
consult the "rtas-error-log-max" property, which is not the name of an
RTAS function. Convert use of printk() to pr_warn() and distinguish
the possible error cases.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221118150751.469393-7-nathanl@linux.ibm.com
2022-12-07 22:40:42 +11:00
Nathan Lynch
6c606e57ee powerpc/rtas: avoid scheduling in rtas_os_term()
It's unsafe to use rtas_busy_delay() to handle a busy status from
the ibm,os-term RTAS function in rtas_os_term():

Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
BUG: sleeping function called from invalid context at arch/powerpc/kernel/rtas.c:618
in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1, name: swapper/0
preempt_count: 2, expected: 0
CPU: 7 PID: 1 Comm: swapper/0 Tainted: G      D            6.0.0-rc5-02182-gf8553a572277-dirty #9
Call Trace:
[c000000007b8f000] [c000000001337110] dump_stack_lvl+0xb4/0x110 (unreliable)
[c000000007b8f040] [c0000000002440e4] __might_resched+0x394/0x3c0
[c000000007b8f0e0] [c00000000004f680] rtas_busy_delay+0x120/0x1b0
[c000000007b8f100] [c000000000052d04] rtas_os_term+0xb8/0xf4
[c000000007b8f180] [c0000000001150fc] pseries_panic+0x50/0x68
[c000000007b8f1f0] [c000000000036354] ppc_panic_platform_handler+0x34/0x50
[c000000007b8f210] [c0000000002303c4] notifier_call_chain+0xd4/0x1c0
[c000000007b8f2b0] [c0000000002306cc] atomic_notifier_call_chain+0xac/0x1c0
[c000000007b8f2f0] [c0000000001d62b8] panic+0x228/0x4d0
[c000000007b8f390] [c0000000001e573c] do_exit+0x140c/0x1420
[c000000007b8f480] [c0000000001e586c] make_task_dead+0xdc/0x200

Use rtas_busy_delay_time() instead, which signals without side effects
whether to attempt the ibm,os-term RTAS call again.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221118150751.469393-5-nathanl@linux.ibm.com
2022-12-07 22:23:04 +11:00
Nathan Lynch
ed2213bfb1 powerpc/rtas: avoid device tree lookups in rtas_os_term()
rtas_os_term() is called during panic. Its behavior depends on a couple
of conditions in the /rtas node of the device tree, the traversal of
which entails locking and local IRQ state changes. If the kernel panics
while devtree_lock is held, rtas_os_term() as currently written could
hang.

Instead of discovering the relevant characteristics at panic time,
cache them in file-static variables at boot. Note the lookup for
"ibm,extended-os-term" is converted to of_property_read_bool() since it
is a boolean property, not an RTAS function token.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
[mpe: Incorporate suggested change from Nick]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221118150751.469393-4-nathanl@linux.ibm.com
2022-12-07 22:22:22 +11:00
Nathan Lynch
b10af504a2 powerpc/rtasd: use correct OF API for event scan rate
rtas_token() should be used only for properties that are RTAS function
tokens. "rtas-event-scan-rate" does not contain a function token, but it
has the same size/format as token properties so reading it with
rtas_token() happens to work.

Convert to of_property_read_u32().

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221118150751.469393-3-nathanl@linux.ibm.com
2022-12-07 22:20:33 +11:00
Nathan Lynch
336e2554ec powerpc/rtas: document rtas_call()
rtas_call() has a complex calling convention, non-standard return
values, and many users. Add kernel-doc for it and remove the less
structured commentary from rtas.h.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221118150751.469393-2-nathanl@linux.ibm.com
2022-12-07 22:20:33 +11:00
Laurent Dufour
e13d23a404 powerpc: export the CPU node count
At boot time, the FDT is parsed to compute the number of CPUs.
In addition count the number of CPU nodes and export it.

This is useful when building the FDT for a kexeced kernel since we need to
take in account the CPU node added since the boot time during CPU hotplug
operations.

Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221110180619.15796-2-ldufour@linux.ibm.com
2022-12-07 20:14:49 +11:00
Rohan McLure
efe1691ac8 powerpc/64e: Clear gprs on interrupt routine entry on Book3E
Zero GPRS r14-r31 on entry into the kernel for interrupt sources to
limit influence of user-space values in potential speculation gadgets.
Prior to this commit, all other GPRS are reassigned during the common
prologue to interrupt handlers and so need not be zeroised explicitly.

This may be done safely, without loss of register state prior to the
interrupt, as the common prologue saves the initial values of
non-volatiles, which are unconditionally restored in interrupt_64.S.
Mitigation defaults to enabled by INTERRUPT_SANITIZE_REGISTERS.

Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221201071019.1953023-6-rmclure@linux.ibm.com
2022-12-02 20:46:08 +11:00
Rohan McLure
1df45d78b8 powerpc/64s: Zeroise gprs on interrupt routine entry on Book3S
Zeroise user state in gprs (assign to zero) to reduce the influence of user
registers on speculation within kernel syscall handlers. Clears occur
at the very beginning of the sc and scv 0 interrupt handlers, with
restores occurring following the execution of the syscall handler.

Zeroise GPRS r0, r2-r11, r14-r31, on entry into the kernel for all
other interrupt sources. The remaining gprs are overwritten by
entry macros to interrupt handlers, irrespective of whether or not a
given handler consumes these register values. If an interrupt does not
select the IMSR_R12 IOption, zeroise r12.

Prior to this commit, r14-r31 are restored on a per-interrupt basis at
exit, but now they are always restored on 64bit Book3S. Remove explicit
REST_NVGPRS invocations on 64-bit Book3S. 32-bit systems do not clear
user registers on interrupt, and continue to depend on the return value
of interrupt_exit_user_prepare to determine whether or not to restore
non-volatiles.

The mmap_bench benchmark in selftests should rapidly invoke pagefaults.
See ~0.8% performance regression with this mitigation, but this
indicates the worst-case performance due to heavier-weight interrupt
handlers. This mitigation is able to be enabled/disabled through
CONFIG_INTERRUPT_SANITIZE_REGISTERS.

Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221201071019.1953023-5-rmclure@linux.ibm.com
2022-12-02 20:46:05 +11:00
Rohan McLure
2487fd2e6d powerpc/64s: IOption for MSR stored in r12
Interrupt handlers in asm/exceptions-64s.S contain a great deal of common
code produced by the GEN_COMMON macros. Currently, at the exit point of
the macro, r12 will contain the contents of the MSR. A future patch will
cause these macros to zeroise architected registers to avoid potential
speculation influence of user data.

Provide an IOption that signals that r12 must be retained, as the
interrupt handler assumes it to hold the contents of the MSR.

Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221201071019.1953023-4-rmclure@linux.ibm.com
2022-12-02 20:46:01 +11:00
Rohan McLure
75c5d6b1e1 powerpc/64: Sanitise common exit code for interrupts
Interrupt code is shared between Book3E/S 64-bit systems for interrupt
handlers. Ensure that exit code correctly restores non-volatile gprs on
each system when CONFIG_INTERRUPT_SANITIZE_REGISTERS is enabled.

Also introduce macros for clearing/restoring registers on interrupt
entry for when this configuration option is either disabled or enabled.

Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221201071019.1953023-3-rmclure@linux.ibm.com
2022-12-02 20:46:01 +11:00
Nicholas Piggin
90f1b43196 powerpc: allow minimum sized kernel stack frames
This affects only 64-bit ELFv2 kernels, and reduces the minimum
asm-created stack frame size from 112 to 32 byte on those kernels.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221127124942.1665522-16-npiggin@gmail.com
2022-12-02 17:54:09 +11:00
Nicholas Piggin
4cefb0f6c5 powerpc: split validate_sp into two functions
Most callers just want to validate an arbitrary kernel stack pointer,
some need a particular size. Make the size case the exceptional one
with an extra function.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221127124942.1665522-15-npiggin@gmail.com
2022-12-02 17:54:09 +11:00
Nicholas Piggin
edbd0387f3 powerpc: copy_thread add a back chain to the switch stack frame
Stack unwinders need LR and the back chain as a minimum. The switch
stack uses regs->nip for its return pointer rather than lrsave, so
that was not set in the fork frame, and neither was the back chain.
This change sets those fields in the stack.

With this and the previous change, a stack trace in the switch or
interrupt stack goes from looking like this:

  Oops: Exception in kernel mode, sig: 5 [#1]
  LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
  Modules linked in:
  CPU: 3 PID: 90 Comm: systemd Not tainted
  NIP:  c000000000011060 LR: c000000000010f68 CTR: 0000000000007fff
  [ ... regs ... ]
  NIP [c000000000011060] _switch+0x160/0x17c
  LR [c000000000010f68] _switch+0x68/0x17c
  Call Trace:

To this:

  Oops: Exception in kernel mode, sig: 5 [#1]
  LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
  CPU: 0 PID: 93 Comm: systemd Not tainted
  NIP:  c000000000011060 LR: c000000000010f68 CTR: 0000000000007fff
  [ ... regs ... ]
  NIP [c000000000011060] _switch+0x160/0x17c
  LR [c000000000010f68] _switch+0x68/0x17c
  Call Trace:
  [c000000005a93e10] [c00000000000cdbc] ret_from_fork_scv+0x0/0x54
  --- interrupt: 3000 at 0x7fffa72f56d8
  NIP:  00007fffa72f56d8 LR: 0000000000000000 CTR: 0000000000000000
  [ ... regs ... ]
  NIP [00007fffa72f56d8] 0x7fffa72f56d8
  LR [0000000000000000] 0x0
  --- interrupt: 3000

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221127124942.1665522-14-npiggin@gmail.com
2022-12-02 17:54:08 +11:00
Nicholas Piggin
6895dfc047 powerpc: copy_thread fill in interrupt frame marker and back chain
Backtraces will not recognise the fork system call interrupt without
the regs marker. And regular interrupt entry from userspace creates
the back chain to the user stack, so do this for the initial fork
frame too, to be consistent.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221127124942.1665522-13-npiggin@gmail.com
2022-12-02 17:54:08 +11:00
Nicholas Piggin
6f291a0381 powerpc: add a define for the switch frame size and regs offset
This is open-coded in process.c, ppc32 uses a different define with the
same value, and the C definition is name differently which makes it an
extra indirection to grep for.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221127124942.1665522-12-npiggin@gmail.com
2022-12-02 17:54:08 +11:00
Nicholas Piggin
1223e5a20f powerpc: add a define for the user interrupt frame size
The user interrupt frame is a different size from the kernel frame, so
give it its own name.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221127124942.1665522-11-npiggin@gmail.com
2022-12-02 17:54:08 +11:00
Nicholas Piggin
e856e33692 powerpc: Rename STACK_FRAME_MARKER and derive it from frame offset
This is a count of longs from the stack pointer to the regs marker.
Rename it to make it more distinct from the other byte offsets. It
can be derived from the byte offset definitions just added.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221127124942.1665522-10-npiggin@gmail.com
2022-12-02 17:54:08 +11:00
Nicholas Piggin
d2e8ff9f14 powerpc: add a definition for the marker offset within the interrupt frame
Define a constant rather than open-code the offset for the
"regs" marker.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221127124942.1665522-9-npiggin@gmail.com
2022-12-02 17:54:08 +11:00
Nicholas Piggin
c03be0a3f3 powerpc: add definition for pt_regs offset within an interrupt frame
This is a common offset that currently uses the overloaded
STACK_FRAME_OVERHEAD constant. It's easier to read and more
flexible to use a specific regs offset for this.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221127124942.1665522-8-npiggin@gmail.com
2022-12-02 17:54:08 +11:00
Nicholas Piggin
37195b820d powerpc: simplify ppc_save_regs
Adjust the pt_regs pointer so the interrupt frame offsets can be used
to save registers.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221127124942.1665522-7-npiggin@gmail.com
2022-12-02 17:54:08 +11:00
Nicholas Piggin
bc0677363d powerpc: Rearrange copy_thread child stack creation
This makes it a bit clearer where the stack frame is created, and will
allow easier use of some of the stack offset constants in a later
change.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221127124942.1665522-5-npiggin@gmail.com
2022-12-02 17:54:07 +11:00
Nicholas Piggin
de3d098dd1 powerpc/64: Add module check for ELF ABI version
Override the generic module ELF check to provide a check for the ELF ABI
version. This becomes important if we allow big-endian ELF ABI V2 builds
but it doesn't hurt to check now.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221128041539.1742489-3-npiggin@gmail.com
2022-12-02 17:54:07 +11:00
Jann Horn
d6c494e8ee vdso/timens: Refactor copy-pasted find_timens_vvar_page() helper into one copy
find_timens_vvar_page() is not architecture-specific, as can be seen from
how all five per-architecture versions of it are the same.

(arm64, powerpc and riscv are exactly the same; x86 and s390 have two
characters difference inside a comment, less blank lines, and mark the
!CONFIG_TIME_NS version as inline.)

Refactor the five copies into a central copy in kernel/time/namespace.c.

Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20221130115320.2918447-1-jannh@google.com
2022-12-01 11:35:40 +01:00
Jordan Niethe
3671f4ebe3 powerpc: Allow clearing and restoring registers independent of saved breakpoint state
For the coming temporary mm used for instruction patching, the
breakpoint registers need to be cleared to prevent them from
accidentally being triggered. As soon as the patching is done, the
breakpoints will be restored.

The breakpoint state is stored in the per-cpu variable current_brk[].
Add a suspend_breakpoints() function which will clear the breakpoint
registers without touching the state in current_brk[]. Add a pair
function restore_breakpoints() which will move the state in
current_brk[] back to the registers.

Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221109045112.187069-2-bgray@linux.ibm.com
2022-11-30 21:46:48 +11:00
Michael Ellerman
611c020239 Merge branch 'fixes' into next
Merge our fixes branch to bring in some changes that are prerequisites
for work in next.
2022-11-30 21:46:06 +11:00
Russell Currey
f668027521 powerpc/8xx: Fix warning in hw_breakpoint_handler()
In hw_breakpoint_handler(), ea is set by wp_get_instr_detail() except
for 8xx, leading the variable to be passed uninitialised to
wp_check_constraints().  This is safe as wp_check_constraints() returns
early without using ea, so just set it to make the compiler happy.

Signed-off-by: Russell Currey <ruscur@russell.cc>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221024041346.103608-1-ruscur@russell.cc
2022-11-24 23:31:49 +11:00
Naveen N. Rao
266b1991a4 powerpc/kprobes: Use preempt_enable() rather than the no_resched variant
preempt_enable_no_resched() is just the same as preempt_enable() when we
are in a irqs disabled context. kprobe_handler() and the post/fault
handlers are all called with irqs disabled. As such, convert those to
just use preempt_enable().

Reported-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/72639f75fe66f931ec8c2165276ffbfb0fe1006f.1666262278.git.naveen.n.rao@linux.vnet.ibm.com
2022-11-24 23:31:49 +11:00
Naveen N. Rao
04ec5d5782 powerpc/kprobes: Have optimized_callback() use preempt_enable()
Similar to x86 commit 2e62024c26 ("kprobes/x86: Use preempt_enable()
in optimized_callback()"), change powerpc optprobes to use
preempt_enable() rather than preempt_enable_no_resched() since powerpc
also removed irq disabling for optprobes in commit f72180cc93
("powerpc/kprobes: Do not disable interrupts for optprobes and
kprobes_on_ftrace").

Reported-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1885bab182626c33d9bf6421f430abf924c521a5.1666262278.git.naveen.n.rao@linux.vnet.ibm.com
2022-11-24 23:31:49 +11:00
Naveen N. Rao
2fa9482334 powerpc/kprobes: Remove preempt disable around call to get_kprobe() in arch_prepare_kprobe()
arch_prepare_kprobe() is called from register_kprobe() via
prepare_kprobe(), or through register_aggr_kprobe(), both with the
kprobe_mutex held. Per the comment for get_kprobe():
  /*
   * This routine is called either:
   *	- under the 'kprobe_mutex' - during kprobe_[un]register().
   *				OR
   *	- with preemption disabled - from architecture specific code.
   */

As such, there is no need to disable preemption around the call to
get_kprobe(). Drop the same.

Reported-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1043d06a0affed83a4a46dd29466e72820ee215d.1666262278.git.naveen.n.rao@linux.vnet.ibm.com
2022-11-24 23:31:49 +11:00
Nicholas Piggin
f985adaf2f powerpc: remove the last remnants of cputime_t
cputime_t was a core kernel type, removed by commits
ed5c8c854f2b..b672592f0221. As explained in commit b672592f02
("sched/cputime: Remove generic asm headers"), the final cleanup
is for the arch to provide cputime_to_nsec[s](). Commit ade7667a98
("powerpc: Add cputime_to_nsecs()") did that, but justdidn't remove
the then-unused cputime_to_usecs(), cputime_t type, and associated
remnants.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221006105653.115829-1-npiggin@gmail.com
2022-11-24 23:31:48 +11:00
Michael Ellerman
d90bb7b4fd powerpc: Print instruction dump on a single line
Although the previous commit made the powerpc instruction dump usable
with scripts/decodecode, there are still some problems.

Because the dump is split across multiple lines, the script doesn't cope
with printk timestamps or caller info.

That can be fixed by printing the entire dump on one line, eg:

  [   12.016307][  T112] --- interrupt: c00
  [   12.016605][  T112] Code: 4b7aae15 60000000 3d22016e 3c62ffec 39291160 38639bc0 e8890000 4b7aadf9 60000000 4bfffee8 7c0802a6 60000000 <0fe00000> 60420000 3c4c008f 384268a0
  [   12.017655][  T112] ---[ end trace 0000000000000000 ]---

That output can then be piped directly into scripts/decodecode and
interpreted correctly.

Printing the dump on a single line does produce a very long line, about
173 characters. That is still shorter than x86, which prints nearly 200
characters even without timestamps etc.

All consoles I'm aware of will wrap the line if it's too long, so the
length should not be a functional problem. If anything it should help on
consoles like VGA by using less vertical space.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221006032019.1128624-2-mpe@ellerman.id.au
2022-11-24 23:31:48 +11:00
Michael Ellerman
3e65412709 powerpc: Make instruction dump work with scripts/decodecode
Matt reported that scripts/decodecode doesn't work for the instruction
dump in the powerpc oops output. Although there are scripts around that
can decode it, it would be preferable if the standard in-tree script
worked.

All other arches prefix the instruction dump with "Code:", and that's
what the script looks for, so use that.

The script then works as expected:

  $ CROSS_COMPILE=powerpc64le-linux-gnu- ./scripts/decodecode
  Code:
  fbc1fff0 f821ffc1 7c7d1b78 7c9c2378 ebc30028 7fdff378 48000018 60000000
  60000000 ebff0008 7c3ef840 41820048 <815f0060> e93f0000 5529077c 7d295378
  ^D

  All code
  ========
     0:   f0 ff c1 fb     std     r30,-16(r1)
     4:   c1 ff 21 f8     stdu    r1,-64(r1)
     8:   78 1b 7d 7c     mr      r29,r3
     ...

Note that the script doesn't cope well with printk timestamps or printk
caller info.

Reported-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221006032019.1128624-1-mpe@ellerman.id.au
2022-11-24 23:31:48 +11:00
Disha Goel
2223552256 powerpc/kvm: Remove unused macros from asm-offset
The kvm code was refactored to convert some of kvm assembly routines to C.
This includes commits which moved code path for the kvm guest entry/exit
for p7/8 from aseembly to C. As part of the code changes, usage of some of
the macros were removed. But definitions still exist in the assembly files.
Commits are listed below:

Commit 2e1ae9cd56 ("KVM: PPC: Book3S HV: Implement radix prefetch workaround by disabling MMU")
Commit 9769a7fd79 ("KVM: PPC: Book3S HV: Remove radix guest support from P7/8 path")
Commit fae5c9f366 ("KVM: PPC: Book3S HV: remove ISA v3.0 and v3.1 support from P7/8 path")
Commit 57dc0eed73 ("KVM: PPC: Book3S HV P9: Implement PMU save/restore in C")

Many of the asm-offset macro definitions were missed to remove. Patch
fixes by removing the unused macros.

Signed-off-by: Disha Goel <disgoel@linux.vnet.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220916105736.268153-2-disgoel@linux.vnet.ibm.com
2022-11-24 23:31:47 +11:00
Sathvika Vasireddy
d0160bd5d3 powerpc/vdso: Skip objtool from running on VDSO files
Do not run objtool on VDSO files, by using OBJECT_FILES_NON_STANDARD.

Suggested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Sathvika Vasireddy <sv@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221114175754.1131267-8-sv@linux.ibm.com
2022-11-18 19:00:06 +11:00
Christophe Leroy
2da3776167 powerpc/32: Fix objtool unannotated intra-function call warnings
Fix several annotations in assembly files on PPC32.

[Sathvika Vasireddy: Changed subject line and removed Kconfig change to
 enable objtool, as it is a part of "objtool/powerpc: Enable objtool to
 be built on ppc" patch in this series.]

Tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Sathvika Vasireddy <sv@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221114175754.1131267-7-sv@linux.ibm.com
2022-11-18 19:00:06 +11:00
Jason A. Donenfeld
8032bf1233 treewide: use get_random_u32_below() instead of deprecated function
This is a simple mechanical transformation done by:

@@
expression E;
@@
- prandom_u32_max
+ get_random_u32_below
  (E)

Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Darrick J. Wong <djwong@kernel.org> # for xfs
Reviewed-by: SeongJae Park <sj@kernel.org> # for damon
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> # for infiniband
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> # for arm
Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # for mmc
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-11-18 02:15:15 +01:00
Nicholas Piggin
eb761a1760 powerpc: Fix writable sections being moved into the rodata region
.data.rel.ro*  catches .data.rel.root_cpuacct, and the kernel crashes on
a store in css_clear_dir. At least we know read-only data protection is
working...

Fixes: b6adc6d6d3 ("powerpc/build: move .data.rel.ro, .sdata2 to read-only")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221116043954.3307852-1-npiggin@gmail.com
2022-11-16 21:37:14 +11:00
Sergey Shtylyov
18b9fe54d9 powerpc: ptrace: user_regset_copyin_ignore() always returns 0
user_regset_copyin_ignore() always returns 0, so checking its result seems
pointless -- don't do this anymore...

[akpm@linux-foundation.org: fix gpr32_set_common()]
Link: https://lkml.kernel.org/r/20221014212235.10770-11-s.shtylyov@omp.ru
Signed-off-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Cc: Brian Cain <bcain@quicinc.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: David S. Miller <davem@davemloft.net>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Helge Deller <deller@gmx.de>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.osdn.me>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-15 14:30:40 -08:00
Sathvika Vasireddy
8d0c21b506 powerpc: Curb objtool unannotated intra-function call warnings
objtool throws the following unannotated intra-function call warnings:
arch/powerpc/kernel/entry_64.o: warning: objtool: .text+0x4: unannotated intra-function call
arch/powerpc/kvm/book3s_hv_rmhandlers.o: warning: objtool: .text+0xe64: unannotated intra-function call
arch/powerpc/kvm/book3s_hv_rmhandlers.o: warning: objtool: .text+0xee4: unannotated intra-function call

Fix these warnings by annotating intra-function calls, using
ANNOTATE_INTRA_FUNCTION_CALL macro, to indicate that the branch targets
are valid.

Tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Sathvika Vasireddy <sv@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221114175754.1131267-5-sv@linux.ibm.com
2022-11-15 20:11:47 +11:00
Sathvika Vasireddy
29a011fc79 powerpc: Fix objtool unannotated intra-function call warnings
Objtool throws unannotated intra-function call warnings in the following
assembly files:

arch/powerpc/kernel/vector.o: warning: objtool: .text+0x53c: unannotated intra-function call

arch/powerpc/kvm/book3s_hv_rmhandlers.o: warning: objtool: .text+0x60: unannotated intra-function call
arch/powerpc/kvm/book3s_hv_rmhandlers.o: warning: objtool: .text+0x124: unannotated intra-function call
arch/powerpc/kvm/book3s_hv_rmhandlers.o: warning: objtool: .text+0x5d4: unannotated intra-function call
arch/powerpc/kvm/book3s_hv_rmhandlers.o: warning: objtool: .text+0x5dc: unannotated intra-function call
arch/powerpc/kvm/book3s_hv_rmhandlers.o: warning: objtool: .text+0xcb8: unannotated intra-function call
arch/powerpc/kvm/book3s_hv_rmhandlers.o: warning: objtool: .text+0xd0c: unannotated intra-function call
arch/powerpc/kvm/book3s_hv_rmhandlers.o: warning: objtool: .text+0x1030: unannotated intra-function call

arch/powerpc/kernel/head_64.o: warning: objtool: .text+0x358: unannotated intra-function call
arch/powerpc/kernel/head_64.o: warning: objtool: .text+0x728: unannotated intra-function call
arch/powerpc/kernel/head_64.o: warning: objtool: .text+0x4d94: unannotated intra-function call
arch/powerpc/kernel/head_64.o: warning: objtool: .text+0x4ec4: unannotated intra-function call

arch/powerpc/kvm/book3s_hv_interrupts.o: warning: objtool: .text+0x6c: unannotated intra-function call
arch/powerpc/kernel/misc_64.o: warning: objtool: .text+0x64: unannotated intra-function call

Objtool does not add STT_NOTYPE symbols with size 0 to the rbtree, which
is why find_call_destination() function is not able to find the
destination symbol for 'bl' instruction. For such symbols, objtool is
throwing unannotated intra-function call warnings in assembly files. Fix
these warnings by annotating those symbols with SYM_FUNC_START_LOCAL and
SYM_FUNC_END macros, inorder to set symbol type to STT_FUNC and symbol
size accordingly.

Tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Sathvika Vasireddy <sv@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221114175754.1131267-4-sv@linux.ibm.com
2022-11-15 20:11:47 +11:00
Andreas Schwab
ce883a2ba3 powerpc/32: fix syscall wrappers with 64-bit arguments
With the introduction of syscall wrappers all wrappers for syscalls with
64-bit arguments must be handled specially, not only those that have
unaligned 64-bit arguments. This left out the fallocate() and
sync_file_range2() syscalls.

Fixes: 7e92e01b72 ("powerpc: Provide syscall wrapper")
Fixes: e237506238 ("powerpc/32: fix syscall wrappers with 64-bit arguments of unaligned register-pairs")
Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/87mt9cxd6g.fsf_-_@igel.home
2022-11-01 10:24:09 +11:00
Nicholas Piggin
65722736c3 powerpc/64s/interrupt: Fix clear of PACA_IRQS_HARD_DIS when returning to soft-masked context
Commit a4cb3651a1 ("powerpc/64s/interrupt: Fix lost interrupts when
returning to soft-masked context") fixed the problem of pending irqs
being cleared when clearing the HARD_DIS bit, but then it didn't clear
the bit at all. This change clears HARD_DIS without affecting other bits
in the mask.

When an interrupt hits in a soft-masked section that has MSR[EE]=1, it
can hard disable and set PACA_IRQS_HARD_DIS, which must be cleared when
returning to the EE=1 caller (unless it was set due to a MUST_HARD_MASK
interrupt becoming pending). Failure to clear this leaves the
returned-to context running with MSR[EE]=1 and PACA_IRQS_HARD_DIS, which
confuses irq assertions and could be dangerous for code that might test
the flag.

This was observed in a hash MMU kernel where a kernel hash fault hits in
a local_irqs_disabled region that has EE=1. The hash fault also runs
with EE=1, then as it returns, a decrementer hits in the restart section
and the irq restart code hard-masks which sets the PACA_IRQ_HARD_DIS
flag, which is not clear when the original context is returned to.

Reported-by: Sachin Sant <sachinp@linux.ibm.com>
Fixes: a4cb3651a1 ("powerpc/64s/interrupt: Fix lost interrupts when returning to soft-masked context")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Sachin Sant <sachinp@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221022052207.471328-1-npiggin@gmail.com
2022-10-27 00:38:35 +11:00
Nicholas Piggin
dc398a084d powerpc/64s/interrupt: Perf NMI should not take normal exit path
NMI interrupts should exit with EXCEPTION_RESTORE_REGS not with
interrupt_return_srr, which is what the perf NMI handler currently does.
This breaks if a PMI hits after interrupt_exit_user_prepare_main() has
switched the context tracking to user mode, then the CT_WARN_ON() in
interrupt_exit_kernel_prepare() fires because it returns to kernel with
context set to user.

This could possibly be solved by soft-disabling PMIs in the exit path,
but that reduces our ability to profile that code. The warning could be
removed, but it's potentially useful.

All other NMIs and soft-NMIs return using EXCEPTION_RESTORE_REGS, so
this makes perf interrupts consistent with that and seems like the best
fix.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Squash in fixups from Nick]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221006140413.126443-3-npiggin@gmail.com
2022-10-18 22:46:19 +11:00
Nicholas Piggin
a073672eb0 powerpc/64/interrupt: Prevent NMI PMI causing a dangerous warning
NMI PMIs really should not return using the normal interrupt_return
function. If such a PMI hits in code returning to user with the context
switched to user mode, this warning can fire. This was enough to cause
crashes when reproducing on 64s, because another perf interrupt would
hit while reporting bug, and that would cause another bug, and so on
until smashing the stack.

Work around that particular crash for now by just disabling that context
warning for PMIs. This is a hack and not a complete fix, there could be
other such problems lurking in corners. But it does fix the known crash.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221014030729.2077151-3-npiggin@gmail.com
2022-10-18 22:46:19 +11:00
Linus Torvalds
f1947d7c8a Random number generator fixes for Linux 6.1-rc1.
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEq5lC5tSkz8NBJiCnSfxwEqXeA64FAmNHYD0ACgkQSfxwEqXe
 A655AA//dJK0PdRghqrKQsl18GOCffV5TUw5i1VbJQbI9d8anfxNjVUQiNGZi4et
 qUwZ8OqVXxYx1Z1UDgUE39PjEDSG9/cCvOpMUWqN20/+6955WlNZjwA7Fk6zjvlM
 R30fz5CIJns9RFvGT4SwKqbVLXIMvfg/wDENUN+8sxt36+VD2gGol7J2JJdngEhM
 lW+zqzi0ABqYy5so4TU2kixpKmpC08rqFvQbD1GPid+50+JsOiIqftDErt9Eg1Mg
 MqYivoFCvbAlxxxRh3+UHBd7ZpJLtp1UFEOl2Rf00OXO+ZclLCAQAsTczucIWK9M
 8LCZjb7d4lPJv9RpXFAl3R1xvfc+Uy2ga5KeXvufZtc5G3aMUKPuIU7k28ZyblVS
 XXsXEYhjTSd0tgi3d0JlValrIreSuj0z2QGT5pVcC9utuAqAqRIlosiPmgPlzXjr
 Us4jXaUhOIPKI+Musv/fqrxsTQziT0jgVA3Njlt4cuAGm/EeUbLUkMWwKXjZLTsv
 vDsBhEQFmyZqxWu4pYo534VX2mQWTaKRV1SUVVhQEHm57b00EAiZohoOvweB09SR
 4KiJapikoopmW4oAUFotUXUL1PM6yi+MXguTuc1SEYuLz/tCFtK8DJVwNpfnWZpE
 lZKvXyJnHq2Sgod/hEZq58PMvT6aNzTzSg7YzZy+VabxQGOO5mc=
 =M+mV
 -----END PGP SIGNATURE-----

Merge tag 'random-6.1-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random

Pull more random number generator updates from Jason Donenfeld:
 "This time with some large scale treewide cleanups.

  The intent of this pull is to clean up the way callers fetch random
  integers. The current rules for doing this right are:

   - If you want a secure or an insecure random u64, use get_random_u64()

   - If you want a secure or an insecure random u32, use get_random_u32()

     The old function prandom_u32() has been deprecated for a while
     now and is just a wrapper around get_random_u32(). Same for
     get_random_int().

   - If you want a secure or an insecure random u16, use get_random_u16()

   - If you want a secure or an insecure random u8, use get_random_u8()

   - If you want secure or insecure random bytes, use get_random_bytes().

     The old function prandom_bytes() has been deprecated for a while
     now and has long been a wrapper around get_random_bytes()

   - If you want a non-uniform random u32, u16, or u8 bounded by a
     certain open interval maximum, use prandom_u32_max()

     I say "non-uniform", because it doesn't do any rejection sampling
     or divisions. Hence, it stays within the prandom_*() namespace, not
     the get_random_*() namespace.

     I'm currently investigating a "uniform" function for 6.2. We'll see
     what comes of that.

  By applying these rules uniformly, we get several benefits:

   - By using prandom_u32_max() with an upper-bound that the compiler
     can prove at compile-time is ≤65536 or ≤256, internally
     get_random_u16() or get_random_u8() is used, which wastes fewer
     batched random bytes, and hence has higher throughput.

   - By using prandom_u32_max() instead of %, when the upper-bound is
     not a constant, division is still avoided, because
     prandom_u32_max() uses a faster multiplication-based trick instead.

   - By using get_random_u16() or get_random_u8() in cases where the
     return value is intended to indeed be a u16 or a u8, we waste fewer
     batched random bytes, and hence have higher throughput.

  This series was originally done by hand while I was on an airplane
  without Internet. Later, Kees and I worked on retroactively figuring
  out what could be done with Coccinelle and what had to be done
  manually, and then we split things up based on that.

  So while this touches a lot of files, the actual amount of code that's
  hand fiddled is comfortably small"

* tag 'random-6.1-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random:
  prandom: remove unused functions
  treewide: use get_random_bytes() when possible
  treewide: use get_random_u32() when possible
  treewide: use get_random_{u8,u16}() when possible, part 2
  treewide: use get_random_{u8,u16}() when possible, part 1
  treewide: use prandom_u32_max() when possible, part 2
  treewide: use prandom_u32_max() when possible, part 1
2022-10-16 15:27:07 -07:00
Linus Torvalds
70609c1495 powerpc fixes for 6.1 #2
- Fix 32-bit syscall wrappers with 64-bit arguments of unaligned register-pairs.
    Notably this broke ftruncate64 & pread/write64, which can lead to file corruption.
 
  - Fix lost interrupts when returning to soft-masked context on 64-bit.
 
  - Fix build failure when CONFIG_DTL=n.
 
 Thanks to: Nicholas Piggin, Jason A. Donenfeld, Guenter Roeck, Arnd Bergmann, Sachin Sant.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmNJQp8THG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgPd3D/9+uPapAs6/ZGkrbojseiF/fZNyIfTF
 HcLZ3iWUP9+APuaRxd1m3HbBoWdgFQ9yC5abYnX1z+E/yEhbTaOof9Gu1vXfxbIF
 Orlc+X12uZMrdU1L2qlgk/89QLeKOTlp/b2eUXJ332Vl4DfJDoknMsqTqbMz9NrL
 EDj6vmG3JzJgKHmplwxd7d4CExQNh9khZr3lZ6Ae/hC1cZ1PVXD67akwxsc0i8E2
 dBGzwsP1SYXPrBMXONT1GNs/whXrKBHNyLWFueab3p0FsSkbc4sY/HaAPzjq7YzK
 aPV0FQi2YpIZ5IIunZA+vW+tQ8EvyOpOUPMGb7iiN/31YcFbf/k6Y92AlCFlHRq3
 9vvsNGzDeFBfvH2wWcR1XaAMQHmBpC4L+dYbCKIyuK9DjgZtpZUV3acagRMFXqMN
 TeN9h2GXAYxCdNc27I5v3wAvbNcx69E4Bs5F7voKGD4pb7V/4d/li2oix6v0FKIy
 qAt1PGhD4IgPE3oJOEOYsJRVQNyDfG4eKlFh5ijyyc5hQeGNeKiAH+G+a61R3ueo
 GvVOdkR/1QxK1XkbgrMyclb0nHN1kEjRHeIcMFW1+dMKA9UvOlJ6u8nkOFRvA3js
 r00O1uE7DT5WzdGXgEHCCDd7PybBKxpM32m4/vzgDM7xG+DHxp3j7LTQsDOfkFsH
 bkc31LqrAw3yDQ==
 =B+7s
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-6.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:

 - Fix 32-bit syscall wrappers with 64-bit arguments of unaligned
   register-pairs. Notably this broke ftruncate64 & pread/write64, which
   can lead to file corruption.

 - Fix lost interrupts when returning to soft-masked context on 64-bit.

 - Fix build failure when CONFIG_DTL=n.

Thanks to Nicholas Piggin, Jason A. Donenfeld, Guenter Roeck, Arnd
Bergmann, and Sachin Sant.

* tag 'powerpc-6.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/pseries: Fix CONFIG_DTL=n build
  powerpc/64s/interrupt: Fix lost interrupts when returning to soft-masked context
  powerpc/32: fix syscall wrappers with 64-bit arguments of unaligned register-pairs
2022-10-14 11:16:18 -07:00
Nicholas Piggin
a4cb3651a1 powerpc/64s/interrupt: Fix lost interrupts when returning to soft-masked context
It's possible for an interrupt returning to an irqs-disabled context to
lose a pending soft-masked irq because it branches to part of the exit
code for irqs-enabled contexts, which is meant to clear only the
PACA_IRQS_HARD_DIS flag from PACAIRQHAPPENED by zeroing the byte. This
just looks like a simple thinko from a recent commit (if there was no
hard mask pending, there would be no reason to clear it anyway).

This also adds comment to the code that actually does need to clear the
flag.

Fixes: e485f6c751 ("powerpc/64/interrupt: Fix return to masked context after hard-mask irq becomes pending")
Reported-by: Sachin Sant <sachinp@linux.ibm.com>
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221013064418.1311104-1-npiggin@gmail.com
2022-10-13 22:29:49 +11:00
Linus Torvalds
676cb49573 - hfs and hfsplus kmap API modernization from Fabio Francesco
- Valentin Schneider makes crash-kexec work properly when invoked from
   an NMI-time panic.
 
 - ntfs bugfixes from Hawkins Jiawei
 
 - Jiebin Sun improves IPC msg scalability by replacing atomic_t's with
   percpu counters.
 
 - nilfs2 cleanups from Minghao Chi
 
 - lots of other single patches all over the tree!
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCY0Yf0gAKCRDdBJ7gKXxA
 joapAQDT1d1zu7T8yf9cQXkYnZVuBKCjxKE/IsYvqaq1a42MjQD/SeWZg0wV05B8
 DhJPj9nkEp6R3Rj3Mssip+3vNuceAQM=
 =lUQY
 -----END PGP SIGNATURE-----

Merge tag 'mm-nonmm-stable-2022-10-11' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull non-MM updates from Andrew Morton:

 - hfs and hfsplus kmap API modernization (Fabio Francesco)

 - make crash-kexec work properly when invoked from an NMI-time panic
   (Valentin Schneider)

 - ntfs bugfixes (Hawkins Jiawei)

 - improve IPC msg scalability by replacing atomic_t's with percpu
   counters (Jiebin Sun)

 - nilfs2 cleanups (Minghao Chi)

 - lots of other single patches all over the tree!

* tag 'mm-nonmm-stable-2022-10-11' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (71 commits)
  include/linux/entry-common.h: remove has_signal comment of arch_do_signal_or_restart() prototype
  proc: test how it holds up with mapping'less process
  mailmap: update Frank Rowand email address
  ia64: mca: use strscpy() is more robust and safer
  init/Kconfig: fix unmet direct dependencies
  ia64: update config files
  nilfs2: replace WARN_ONs by nilfs_error for checkpoint acquisition failure
  fork: remove duplicate included header files
  init/main.c: remove unnecessary (void*) conversions
  proc: mark more files as permanent
  nilfs2: remove the unneeded result variable
  nilfs2: delete unnecessary checks before brelse()
  checkpatch: warn for non-standard fixes tag style
  usr/gen_init_cpio.c: remove unnecessary -1 values from int file
  ipc/msg: mitigate the lock contention with percpu counter
  percpu: add percpu_counter_add_local and percpu_counter_sub_local
  fs/ocfs2: fix repeated words in comments
  relay: use kvcalloc to alloc page array in relay_alloc_page_array
  proc: make config PROC_CHILDREN depend on PROC_FS
  fs: uninline inode_maybe_inc_iversion()
  ...
2022-10-12 11:00:22 -07:00
Nicholas Piggin
e237506238 powerpc/32: fix syscall wrappers with 64-bit arguments of unaligned register-pairs
powerpc 32-bit system call (and function) calling convention for 64-bit
arguments requires the next available odd-pair (two sequential registers
with the first being odd-numbered) from the standard register argument
allocation.

The first argument register is r3, so a 64-bit argument that appears at
an even position in the argument list must skip a register (unless there
were preceding 64-bit arguments, which might throw things off). This
requires non-standard compat definitions to deal with the holes in the
argument register allocation.

With pt_regs syscall wrappers which use a standard mapper to map pt_regs
GPRs to function arguments, 32-bit kernels hit the same basic problem,
the standard definitions don't cope with the unused argument registers.

Fix this by having 32-bit kernels share those syscall definitions with
compat.

Thanks to Jason for spending a lot of time finding and bisecting this
and developing a trivial reproducer. The perfect bug report.

Reported-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Fixes: 7e92e01b72 ("powerpc: Provide syscall wrapper")
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221012035335.866440-1-npiggin@gmail.com
2022-10-13 00:49:58 +11:00
Jason A. Donenfeld
81895a65ec treewide: use prandom_u32_max() when possible, part 1
Rather than incurring a division or requesting too many random bytes for
the given range, use the prandom_u32_max() function, which only takes
the minimum required bytes from the RNG and avoids divisions. This was
done mechanically with this coccinelle script:

@basic@
expression E;
type T;
identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
typedef u64;
@@
(
- ((T)get_random_u32() % (E))
+ prandom_u32_max(E)
|
- ((T)get_random_u32() & ((E) - 1))
+ prandom_u32_max(E * XXX_MAKE_SURE_E_IS_POW2)
|
- ((u64)(E) * get_random_u32() >> 32)
+ prandom_u32_max(E)
|
- ((T)get_random_u32() & ~PAGE_MASK)
+ prandom_u32_max(PAGE_SIZE)
)

@multi_line@
identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
identifier RAND;
expression E;
@@

-       RAND = get_random_u32();
        ... when != RAND
-       RAND %= (E);
+       RAND = prandom_u32_max(E);

// Find a potential literal
@literal_mask@
expression LITERAL;
type T;
identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
position p;
@@

        ((T)get_random_u32()@p & (LITERAL))

// Add one to the literal.
@script:python add_one@
literal << literal_mask.LITERAL;
RESULT;
@@

value = None
if literal.startswith('0x'):
        value = int(literal, 16)
elif literal[0] in '123456789':
        value = int(literal, 10)
if value is None:
        print("I don't know how to handle %s" % (literal))
        cocci.include_match(False)
elif value == 2**32 - 1 or value == 2**31 - 1 or value == 2**24 - 1 or value == 2**16 - 1 or value == 2**8 - 1:
        print("Skipping 0x%x for cleanup elsewhere" % (value))
        cocci.include_match(False)
elif value & (value + 1) != 0:
        print("Skipping 0x%x because it's not a power of two minus one" % (value))
        cocci.include_match(False)
elif literal.startswith('0x'):
        coccinelle.RESULT = cocci.make_expr("0x%x" % (value + 1))
else:
        coccinelle.RESULT = cocci.make_expr("%d" % (value + 1))

// Replace the literal mask with the calculated result.
@plus_one@
expression literal_mask.LITERAL;
position literal_mask.p;
expression add_one.RESULT;
identifier FUNC;
@@

-       (FUNC()@p & (LITERAL))
+       prandom_u32_max(RESULT)

@collapse_ret@
type T;
identifier VAR;
expression E;
@@

 {
-       T VAR;
-       VAR = (E);
-       return VAR;
+       return E;
 }

@drop_var@
type T;
identifier VAR;
@@

 {
-       T VAR;
        ... when != VAR
 }

Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Yury Norov <yury.norov@gmail.com>
Reviewed-by: KP Singh <kpsingh@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz> # for ext4 and sbitmap
Reviewed-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> # for drbd
Acked-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Heiko Carstens <hca@linux.ibm.com> # for s390
Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # for mmc
Acked-by: Darrick J. Wong <djwong@kernel.org> # for xfs
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-10-11 17:42:55 -06:00
Joel Stanley
ae5b6779fa powerpc: Fix 85xx build
The merge of the kbuild tree dropped the renaming of the FSL_BOOKE
kconfig option.

Fixes: 8afc66e8d4 ("Merge tag 'kbuild-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild")
Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-10-11 10:13:34 -07:00
Linus Torvalds
27bc50fc90 - Yu Zhao's Multi-Gen LRU patches are here. They've been under test in
linux-next for a couple of months without, to my knowledge, any negative
   reports (or any positive ones, come to that).
 
 - Also the Maple Tree from Liam R.  Howlett.  An overlapping range-based
   tree for vmas.  It it apparently slight more efficient in its own right,
   but is mainly targeted at enabling work to reduce mmap_lock contention.
 
   Liam has identified a number of other tree users in the kernel which
   could be beneficially onverted to mapletrees.
 
   Yu Zhao has identified a hard-to-hit but "easy to fix" lockdep splat
   (https://lkml.kernel.org/r/CAOUHufZabH85CeUN-MEMgL8gJGzJEWUrkiM58JkTbBhh-jew0Q@mail.gmail.com).
   This has yet to be addressed due to Liam's unfortunately timed
   vacation.  He is now back and we'll get this fixed up.
 
 - Dmitry Vyukov introduces KMSAN: the Kernel Memory Sanitizer.  It uses
   clang-generated instrumentation to detect used-unintialized bugs down to
   the single bit level.
 
   KMSAN keeps finding bugs.  New ones, as well as the legacy ones.
 
 - Yang Shi adds a userspace mechanism (madvise) to induce a collapse of
   memory into THPs.
 
 - Zach O'Keefe has expanded Yang Shi's madvise(MADV_COLLAPSE) to support
   file/shmem-backed pages.
 
 - userfaultfd updates from Axel Rasmussen
 
 - zsmalloc cleanups from Alexey Romanov
 
 - cleanups from Miaohe Lin: vmscan, hugetlb_cgroup, hugetlb and memory-failure
 
 - Huang Ying adds enhancements to NUMA balancing memory tiering mode's
   page promotion, with a new way of detecting hot pages.
 
 - memcg updates from Shakeel Butt: charging optimizations and reduced
   memory consumption.
 
 - memcg cleanups from Kairui Song.
 
 - memcg fixes and cleanups from Johannes Weiner.
 
 - Vishal Moola provides more folio conversions
 
 - Zhang Yi removed ll_rw_block() :(
 
 - migration enhancements from Peter Xu
 
 - migration error-path bugfixes from Huang Ying
 
 - Aneesh Kumar added ability for a device driver to alter the memory
   tiering promotion paths.  For optimizations by PMEM drivers, DRM
   drivers, etc.
 
 - vma merging improvements from Jakub Matěn.
 
 - NUMA hinting cleanups from David Hildenbrand.
 
 - xu xin added aditional userspace visibility into KSM merging activity.
 
 - THP & KSM code consolidation from Qi Zheng.
 
 - more folio work from Matthew Wilcox.
 
 - KASAN updates from Andrey Konovalov.
 
 - DAMON cleanups from Kaixu Xia.
 
 - DAMON work from SeongJae Park: fixes, cleanups.
 
 - hugetlb sysfs cleanups from Muchun Song.
 
 - Mike Kravetz fixes locking issues in hugetlbfs and in hugetlb core.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCY0HaPgAKCRDdBJ7gKXxA
 joPjAQDZ5LlRCMWZ1oxLP2NOTp6nm63q9PWcGnmY50FjD/dNlwEAnx7OejCLWGWf
 bbTuk6U2+TKgJa4X7+pbbejeoqnt5QU=
 =xfWx
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2022-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

 - Yu Zhao's Multi-Gen LRU patches are here. They've been under test in
   linux-next for a couple of months without, to my knowledge, any
   negative reports (or any positive ones, come to that).

 - Also the Maple Tree from Liam Howlett. An overlapping range-based
   tree for vmas. It it apparently slightly more efficient in its own
   right, but is mainly targeted at enabling work to reduce mmap_lock
   contention.

   Liam has identified a number of other tree users in the kernel which
   could be beneficially onverted to mapletrees.

   Yu Zhao has identified a hard-to-hit but "easy to fix" lockdep splat
   at [1]. This has yet to be addressed due to Liam's unfortunately
   timed vacation. He is now back and we'll get this fixed up.

 - Dmitry Vyukov introduces KMSAN: the Kernel Memory Sanitizer. It uses
   clang-generated instrumentation to detect used-unintialized bugs down
   to the single bit level.

   KMSAN keeps finding bugs. New ones, as well as the legacy ones.

 - Yang Shi adds a userspace mechanism (madvise) to induce a collapse of
   memory into THPs.

 - Zach O'Keefe has expanded Yang Shi's madvise(MADV_COLLAPSE) to
   support file/shmem-backed pages.

 - userfaultfd updates from Axel Rasmussen

 - zsmalloc cleanups from Alexey Romanov

 - cleanups from Miaohe Lin: vmscan, hugetlb_cgroup, hugetlb and
   memory-failure

 - Huang Ying adds enhancements to NUMA balancing memory tiering mode's
   page promotion, with a new way of detecting hot pages.

 - memcg updates from Shakeel Butt: charging optimizations and reduced
   memory consumption.

 - memcg cleanups from Kairui Song.

 - memcg fixes and cleanups from Johannes Weiner.

 - Vishal Moola provides more folio conversions

 - Zhang Yi removed ll_rw_block() :(

 - migration enhancements from Peter Xu

 - migration error-path bugfixes from Huang Ying

 - Aneesh Kumar added ability for a device driver to alter the memory
   tiering promotion paths. For optimizations by PMEM drivers, DRM
   drivers, etc.

 - vma merging improvements from Jakub Matěn.

 - NUMA hinting cleanups from David Hildenbrand.

 - xu xin added aditional userspace visibility into KSM merging
   activity.

 - THP & KSM code consolidation from Qi Zheng.

 - more folio work from Matthew Wilcox.

 - KASAN updates from Andrey Konovalov.

 - DAMON cleanups from Kaixu Xia.

 - DAMON work from SeongJae Park: fixes, cleanups.

 - hugetlb sysfs cleanups from Muchun Song.

 - Mike Kravetz fixes locking issues in hugetlbfs and in hugetlb core.

Link: https://lkml.kernel.org/r/CAOUHufZabH85CeUN-MEMgL8gJGzJEWUrkiM58JkTbBhh-jew0Q@mail.gmail.com [1]

* tag 'mm-stable-2022-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (555 commits)
  hugetlb: allocate vma lock for all sharable vmas
  hugetlb: take hugetlb vma_lock when clearing vma_lock->vma pointer
  hugetlb: fix vma lock handling during split vma and range unmapping
  mglru: mm/vmscan.c: fix imprecise comments
  mm/mglru: don't sync disk for each aging cycle
  mm: memcontrol: drop dead CONFIG_MEMCG_SWAP config symbol
  mm: memcontrol: use do_memsw_account() in a few more places
  mm: memcontrol: deprecate swapaccounting=0 mode
  mm: memcontrol: don't allocate cgroup swap arrays when memcg is disabled
  mm/secretmem: remove reduntant return value
  mm/hugetlb: add available_huge_pages() func
  mm: remove unused inline functions from include/linux/mm_inline.h
  selftests/vm: add selftest for MADV_COLLAPSE of uffd-minor memory
  selftests/vm: add file/shmem MADV_COLLAPSE selftest for cleared pmd
  selftests/vm: add thp collapse shmem testing
  selftests/vm: add thp collapse file and tmpfs testing
  selftests/vm: modularize thp collapse memory operations
  selftests/vm: dedup THP helpers
  mm/khugepaged: add tracepoint to hpage_collapse_scan_file()
  mm/madvise: add file and shmem support to MADV_COLLAPSE
  ...
2022-10-10 17:53:04 -07:00
Linus Torvalds
d4013bc4d4 bitmap patches for v6.1-rc1
From Phil Auld:
 drivers/base: Fix unsigned comparison to -1 in CPUMAP_FILE_MAX_BYTES
 
 From me:
 cpumask: cleanup nr_cpu_ids vs nr_cpumask_bits mess
 
 This series cleans that mess and adds new config FORCE_NR_CPUS that
 allows to optimize cpumask subsystem if the number of CPUs is known
 at compile-time.
 
 From me:
 lib: optimize find_bit() functions
 
 Reworks find_bit() functions based on new FIND_{FIRST,NEXT}_BIT() macros.
 
 From me:
 lib/find: add find_nth_bit()
 
 Adds find_nth_bit(), which is ~70 times faster than bitcounting with
 for_each() loop:
         for_each_set_bit(bit, mask, size)
                 if (n-- == 0)
                         return bit;
 
 Also adds bitmap_weight_and() to let people replace this pattern:
 	tmp = bitmap_alloc(nbits);
 	bitmap_and(tmp, map1, map2, nbits);
 	weight = bitmap_weight(tmp, nbits);
 	bitmap_free(tmp);
 with a single bitmap_weight_and() call.
 
 From me:
 cpumask: repair cpumask_check()
 
 After switching cpumask to use nr_cpu_ids, cpumask_check() started
 generating many false-positive warnings. This series fixes it.
 
 From Valentin Schneider:
 bitmap,cpumask: Add for_each_cpu_andnot() and for_each_cpu_andnot()
 
 Extends the API with one more function and applies it in sched/core.
 -----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEEi8GdvG6xMhdgpu/4sUSA/TofvsgFAmNBwmUACgkQsUSA/Tof
 vshPRwv+KlqnZlKtuSPgbo/Kgswworpi/7TqfnN9GWlb8AJ2uhjBKI3GFwv4TDow
 7KV6wdKdXYLr4pktcIhWy3qLrT+bDDExfarHRo3QI1A1W42EJ+ZiUaGnQGcnVMzD
 5q/K1YMJYq0oaesHEw5PVUh8mm6h9qRD8VbX1u+riW/VCWBj3bho9Dp4mffQ48Q6
 hVy/SnMGgClQwNYp+sxkqYx38xUqUGYoU5MzeziUmoS6pZQh+4lF33MULnI3EKmc
 /ehXilPPtOV/Tm0RovDWFfm3rjNapV9FXHu8Ob2z/c+1A29EgXnE3pwrBDkAx001
 TQrL9qbCANRDGPLzWQHw0dwFIaXvTdrSttCsfYYfU5hI4JbnJEe0Pqkaaohy7jqm
 r0dW/TlyOG5T+k8Kwdx9w9A+jKs8TbKKZ8HOaN8BpkXswVnpbzpQbj3TITZI4aeV
 6YR4URBQ5UkrVLEXFXbrOzwjL2zqDdyNoBdTJmGLJ+5b/n0HHzmyMVkegNIwLLM3
 GR7sMQae
 =Q/+F
 -----END PGP SIGNATURE-----

Merge tag 'bitmap-6.1-rc1' of https://github.com/norov/linux

Pull bitmap updates from Yury Norov:

 - Fix unsigned comparison to -1 in CPUMAP_FILE_MAX_BYTES (Phil Auld)

 - cleanup nr_cpu_ids vs nr_cpumask_bits mess (me)

   This series cleans that mess and adds new config FORCE_NR_CPUS that
   allows to optimize cpumask subsystem if the number of CPUs is known
   at compile-time.

 - optimize find_bit() functions (me)

   Reworks find_bit() functions based on new FIND_{FIRST,NEXT}_BIT()
   macros.

 - add find_nth_bit() (me)

   Adds find_nth_bit(), which is ~70 times faster than bitcounting with
   for_each() loop:

	for_each_set_bit(bit, mask, size)
		if (n-- == 0)
			return bit;

   Also adds bitmap_weight_and() to let people replace this pattern:

	tmp = bitmap_alloc(nbits);
	bitmap_and(tmp, map1, map2, nbits);
	weight = bitmap_weight(tmp, nbits);
	bitmap_free(tmp);

   with a single bitmap_weight_and() call.

 - repair cpumask_check() (me)

   After switching cpumask to use nr_cpu_ids, cpumask_check() started
   generating many false-positive warnings. This series fixes it.

 - Add for_each_cpu_andnot() and for_each_cpu_andnot() (Valentin
   Schneider)

   Extends the API with one more function and applies it in sched/core.

* tag 'bitmap-6.1-rc1' of https://github.com/norov/linux: (28 commits)
  sched/core: Merge cpumask_andnot()+for_each_cpu() into for_each_cpu_andnot()
  lib/test_cpumask: Add for_each_cpu_and(not) tests
  cpumask: Introduce for_each_cpu_andnot()
  lib/find_bit: Introduce find_next_andnot_bit()
  cpumask: fix checking valid cpu range
  lib/bitmap: add tests for for_each() loops
  lib/find: optimize for_each() macros
  lib/bitmap: introduce for_each_set_bit_wrap() macro
  lib/find_bit: add find_next{,_and}_bit_wrap
  cpumask: switch for_each_cpu{,_not} to use for_each_bit()
  net: fix cpu_max_bits_warn() usage in netif_attrmask_next{,_and}
  cpumask: add cpumask_nth_{,and,andnot}
  lib/bitmap: remove bitmap_ord_to_pos
  lib/bitmap: add tests for find_nth_bit()
  lib: add find_nth{,_and,_andnot}_bit()
  lib/bitmap: add bitmap_weight_and()
  lib/bitmap: don't call __bitmap_weight() in kernel code
  tools: sync find_bit() implementation
  lib/find_bit: optimize find_next_bit() functions
  lib/find_bit: create find_first_zero_bit_le()
  ...
2022-10-10 12:49:34 -07:00
Linus Torvalds
8afc66e8d4 Kbuild updates for v6.1
- Remove potentially incomplete targets when Kbuid is interrupted by
    SIGINT etc. in case GNU Make may miss to do that when stderr is piped
    to another program.
 
  - Rewrite the single target build so it works more correctly.
 
  - Fix rpm-pkg builds with V=1.
 
  - List top-level subdirectories in ./Kbuild.
 
  - Ignore auto-generated __kstrtab_* and __kstrtabns_* symbols in kallsyms.
 
  - Avoid two different modules in lib/zstd/ having shared code, which
    potentially causes building the common code as build-in and modular
    back-and-forth.
 
  - Unify two modpost invocations to optimize the build process.
 
  - Remove head-y syntax in favor of linker scripts for placing particular
    sections in the head of vmlinux.
 
  - Bump the minimal GNU Make version to 3.82.
 
  - Clean up misc Makefiles and scripts.
 -----BEGIN PGP SIGNATURE-----
 
 iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAmM+4vcVHG1hc2FoaXJv
 eUBrZXJuZWwub3JnAAoJED2LAQed4NsGY2IQAInr0JUNnkkxwUSXtOcQuA3IK8RJ
 FbU9HXJRoV9H+7+l3SMlN7mIbrs5eE5fTY3iwQ3CVe139d1+1q7nvTMRv8owywJx
 GBgzswncuu1lk7iQQ//CxiqMwSCG8GJdYn1uDVy4I5jg3o+DtFZJtyq2Wb7pqsMm
 ZhZ4PozRN+idYQJSF6Vx/zEVLHI7quMBwfe4CME8/0Kg2+hnYzbXV/aUf0ED2emq
 zdCMDQgIOK5AhY+8qgMXKYnBUJMTqBp6LoR4p3ApfUkwRFY0sGa0/LK3U/B22OE7
 uWyR4fCUExGyerlcHEVev+9eBfmsLLPyqlchNwpSDOPf5OSdnKmgqJEBR/Cvx0eh
 URerPk7EHxyH3G8yi+cU2GtofNTGc5RHPRgJE2ADsQEi5TAUKGmbXMlsFRL/51Vn
 lTANZObBNa1d4enljF6TfTL5nuccOa+DKvXnH9fQ49t0QdtSikv6J/lGwilwm1Sr
 BctmCsySPuURZfkpI9OQnLuouloMXl9f7Q/+S39haS/tSgvPpyITyO71nxDnXn/s
 BbFObZJUk9QkqOACjBP1hNErTLt83uBxQ9z+rDCw/SbLIe4nw0wyneuygfHI5rI8
 3RZB2DbGauuJHX2Zs6YGS14SLSY33IsLqKR1/Vy3LrPvOHuEvNiOR8LITq5E0YCK
 OffK2Y5cIlXR0QWf
 =DHiN
 -----END PGP SIGNATURE-----

Merge tag 'kbuild-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild updates from Masahiro Yamada:

 - Remove potentially incomplete targets when Kbuid is interrupted by
   SIGINT etc in case GNU Make may miss to do that when stderr is piped
   to another program.

 - Rewrite the single target build so it works more correctly.

 - Fix rpm-pkg builds with V=1.

 - List top-level subdirectories in ./Kbuild.

 - Ignore auto-generated __kstrtab_* and __kstrtabns_* symbols in
   kallsyms.

 - Avoid two different modules in lib/zstd/ having shared code, which
   potentially causes building the common code as build-in and modular
   back-and-forth.

 - Unify two modpost invocations to optimize the build process.

 - Remove head-y syntax in favor of linker scripts for placing
   particular sections in the head of vmlinux.

 - Bump the minimal GNU Make version to 3.82.

 - Clean up misc Makefiles and scripts.

* tag 'kbuild-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (41 commits)
  docs: bump minimal GNU Make version to 3.82
  ia64: simplify esi object addition in Makefile
  Revert "kbuild: Check if linker supports the -X option"
  kbuild: rebuild .vmlinux.export.o when its prerequisite is updated
  kbuild: move modules.builtin(.modinfo) rules to Makefile.vmlinux_o
  zstd: Fixing mixed module-builtin objects
  kallsyms: ignore __kstrtab_* and __kstrtabns_* symbols
  kallsyms: take the input file instead of reading stdin
  kallsyms: drop duplicated ignore patterns from kallsyms.c
  kbuild: reuse mksysmap output for kallsyms
  mksysmap: update comment about __crc_*
  kbuild: remove head-y syntax
  kbuild: use obj-y instead extra-y for objects placed at the head
  kbuild: hide error checker logs for V=1 builds
  kbuild: re-run modpost when it is updated
  kbuild: unify two modpost invocations
  kbuild: move vmlinux.o rule to the top Makefile
  kbuild: move .vmlinux.objs rule to Makefile.modpost
  kbuild: list sub-directories in ./Kbuild
  Makefile.compiler: replace cc-ifversion with compiler-specific macros
  ...
2022-10-10 12:00:45 -07:00
Linus Torvalds
3871d93b82 Perf events updates for v6.1:
- PMU driver updates:
 
      - Add AMD Last Branch Record Extension Version 2 (LbrExtV2)
        feature support for Zen 4 processors.
 
      - Extend the perf ABI to provide branch speculation information,
        if available, and use this on CPUs that have it (eg. LbrExtV2).
 
      - Improve Intel PEBS TSC timestamp handling & integration.
 
      - Add Intel Raptor Lake S CPU support.
 
      - Add 'perf mem' and 'perf c2c' memory profiling support on
        AMD CPUs by utilizing IBS tagged load/store samples.
 
      - Clean up & optimize various x86 PMU details.
 
  - HW breakpoints:
 
      - Big rework to optimize the code for systems with hundreds of CPUs and
        thousands of breakpoints:
 
         - Replace the nr_bp_mutex global mutex with the bp_cpuinfo_sem
 	  per-CPU rwsem that is read-locked during most of the key operations.
 
 	- Improve the O(#cpus * #tasks) logic in toggle_bp_slot()
 	  and fetch_bp_busy_slots().
 
 	- Apply micro-optimizations & cleanups.
 
   - Misc cleanups & enhancements.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmM/2pMRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1iIMA/+J+MCEVTt9kwZeBtHoPX7iZ5gnq1+McoQ
 f6ALX19AO/ZSuA7EBA3cS3Ny5eyGy3ofYUnRW+POezu9CpflLW/5N27R2qkZFrWC
 A09B86WH676ZrmXt+oI05rpZ2y/NGw4gJxLLa4/bWF3g9xLfo21i+YGKwdOnNFpl
 DEdCVHtjlMcOAU3+on6fOYuhXDcYd7PKGcCfLE7muOMOAtwyj0bUDBt7m+hneZgy
 qbZHzDU2DZ5L/LXiMyuZj5rC7V4xUbfZZfXglG38YCW1WTieS3IjefaU2tREhu7I
 rRkCK48ULDNNJR3dZK8IzXJRxusq1ICPG68I+nm/K37oZyTZWtfYZWehW/d/TnPa
 tUiTwimabz7UUqaGq9ZptxwINcAigax0nl6dZ3EseeGhkDE6j71/3kqrkKPz4jth
 +fCwHLOrI3c4Gq5qWgPvqcUlUneKf3DlOMtzPKYg7sMhla2zQmFpYCPzKfm77U/Z
 BclGOH3FiwaK6MIjPJRUXTePXqnUseqCR8PCH/UPQUeBEVHFcMvqCaa15nALed8x
 dFi76VywR9mahouuLNq6sUNePlvDd2B124PygNwegLlBfY9QmKONg9qRKOnQpuJ6
 UprRJjLOOucZ/N/jn6+ShHkqmXsnY2MhfUoHUoMQ0QAI+n++e+2AuePo251kKWr8
 QlqKxd9PMQU=
 =LcGg
 -----END PGP SIGNATURE-----

Merge tag 'perf-core-2022-10-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf events updates from Ingo Molnar:
 "PMU driver updates:

   - Add AMD Last Branch Record Extension Version 2 (LbrExtV2) feature
     support for Zen 4 processors.

   - Extend the perf ABI to provide branch speculation information, if
     available, and use this on CPUs that have it (eg. LbrExtV2).

   - Improve Intel PEBS TSC timestamp handling & integration.

   - Add Intel Raptor Lake S CPU support.

   - Add 'perf mem' and 'perf c2c' memory profiling support on AMD CPUs
     by utilizing IBS tagged load/store samples.

   - Clean up & optimize various x86 PMU details.

  HW breakpoints:

   - Big rework to optimize the code for systems with hundreds of CPUs
     and thousands of breakpoints:

      - Replace the nr_bp_mutex global mutex with the bp_cpuinfo_sem
        per-CPU rwsem that is read-locked during most of the key
        operations.

      - Improve the O(#cpus * #tasks) logic in toggle_bp_slot() and
        fetch_bp_busy_slots().

      - Apply micro-optimizations & cleanups.

  - Misc cleanups & enhancements"

* tag 'perf-core-2022-10-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (75 commits)
  perf/hw_breakpoint: Annotate tsk->perf_event_mutex vs ctx->mutex
  perf: Fix pmu_filter_match()
  perf: Fix lockdep_assert_event_ctx()
  perf/x86/amd/lbr: Adjust LBR regardless of filtering
  perf/x86/utils: Fix uninitialized var in get_branch_type()
  perf/uapi: Define PERF_MEM_SNOOPX_PEER in kernel header file
  perf/x86/amd: Support PERF_SAMPLE_PHY_ADDR
  perf/x86/amd: Support PERF_SAMPLE_ADDR
  perf/x86/amd: Support PERF_SAMPLE_{WEIGHT|WEIGHT_STRUCT}
  perf/x86/amd: Support PERF_SAMPLE_DATA_SRC
  perf/x86/amd: Add IBS OP_DATA2 DataSrc bit definitions
  perf/mem: Introduce PERF_MEM_LVLNUM_{EXTN_MEM|IO}
  perf/x86/uncore: Add new Raptor Lake S support
  perf/x86/cstate: Add new Raptor Lake S support
  perf/x86/msr: Add new Raptor Lake S support
  perf/x86: Add new Raptor Lake S support
  bpf: Check flags for branch stack in bpf_read_branch_records helper
  perf, hw_breakpoint: Fix use-after-free if perf_event_open() fails
  perf: Use sample_flags for raw_data
  perf: Use sample_flags for addr
  ...
2022-10-10 09:27:46 -07:00
Linus Torvalds
4899a36f91 powerpc updates for 6.1
- Remove our now never-true definitions for pgd_huge() and p4d_leaf().
 
  - Add pte_needs_flush() and huge_pmd_needs_flush() for 64-bit.
 
  - Add support for syscall wrappers.
 
  - Add support for KFENCE on 64-bit.
 
  - Update 64-bit HV KVM to use the new guest state entry/exit accounting API.
 
  - Support execute-only memory when using the Radix MMU (P9 or later).
 
  - Implement CONFIG_PARAVIRT_TIME_ACCOUNTING for pseries guests.
 
  - Updates to our linker script to move more data into read-only sections.
 
  - Allow the VDSO to be randomised on 32-bit.
 
  - Many other small features and fixes.
 
 Thanks to: Andrew Donnellan, Aneesh Kumar K.V, Arnd Bergmann, Athira Rajeev, Christophe
 Leroy, David Hildenbrand, Disha Goel, Fabiano Rosas, Gaosheng Cui, Gustavo A. R. Silva,
 Haren Myneni, Hari Bathini, Jilin Yuan, Joel Stanley, Kajol Jain, Kees Cook, Krzysztof
 Kozlowski, Laurent Dufour, Liang He, Li Huafei, Lukas Bulwahn, Madhavan Srinivasan, Nathan
 Chancellor, Nathan Lynch, Nicholas Miehlbradt, Nicholas Piggin, Pali Rohár, Rohan McLure,
 Russell Currey, Sachin Sant, Segher Boessenkool, Shrikanth Hegde, Tyrel Datwyler, Wolfram
 Sang, ye xingchen, Zheng Yongjun.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmNCpBMTHG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgDx3EACCf86iumFF3RyvENtDwoTRgH3H0z2E
 /ZC4LKrtxgaPFJzKUT4F0kLK85Hw5GzMEKK42NIhAB0o5vFwmEzxOtnlHOyEufAm
 EDIZDIfxV2J9Qx/cW2DSojPj/o9O6noXwhw9SBqMwiDWd8gXmNgOUEklAO7aR7Vq
 Ne2N2FLMNthZydCoHR6dAEjfe2ceFXP5cALwzQO+ILDdZQ0UcF2Yq4yw/gEDoCrB
 FH7mmE7UaQQHvYzo85VTZu7XfUys1P7kUcnhVurOg7/07ITnvnQR+itKZXC+bSft
 1K7ULtjd2QiCgxZA/apFc3lO46kqHVFsB3onRQw12/Ku5vfGFfY0L0iK97OgM4s0
 0u4r+J7A+MM5YBJVVjwZ6woYO5CWMHYKBZepxOpcvftPxj1LNkiHsryqKILGISEC
 aIY/lI0hpeNU4QshDMXzSTgeb/VF9O5cGPncTPkOFbXxD4RpVyz8tSngsG1+D8lj
 S6B2h3k4A14rnblLOxP22jcedBlTYQcRQS4vwr0a7+63QTjfSJ12xT3ucIAKU9f7
 65rVSS/igbrfxqHDmrd60WWZBMXeK0Zy7YIG6iYPTxpP31eFpSp9wtDlV7V2+EH2
 F2p+TJY8aTA8UW+2L5gigN3RsBeeEB8zxJkB14ivICM7+XzVu11PxPDqjDZYkfzC
 ueKKvCcHhHAYqQ==
 =TFBA
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-6.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Michael Ellerman:

 - Remove our now never-true definitions for pgd_huge() and p4d_leaf().

 - Add pte_needs_flush() and huge_pmd_needs_flush() for 64-bit.

 - Add support for syscall wrappers.

 - Add support for KFENCE on 64-bit.

 - Update 64-bit HV KVM to use the new guest state entry/exit accounting
   API.

 - Support execute-only memory when using the Radix MMU (P9 or later).

 - Implement CONFIG_PARAVIRT_TIME_ACCOUNTING for pseries guests.

 - Updates to our linker script to move more data into read-only
   sections.

 - Allow the VDSO to be randomised on 32-bit.

 - Many other small features and fixes.

Thanks to Andrew Donnellan, Aneesh Kumar K.V, Arnd Bergmann, Athira
Rajeev, Christophe Leroy, David Hildenbrand, Disha Goel, Fabiano Rosas,
Gaosheng Cui, Gustavo A. R. Silva, Haren Myneni, Hari Bathini, Jilin
Yuan, Joel Stanley, Kajol Jain, Kees Cook, Krzysztof Kozlowski, Laurent
Dufour, Liang He, Li Huafei, Lukas Bulwahn, Madhavan Srinivasan, Nathan
Chancellor, Nathan Lynch, Nicholas Miehlbradt, Nicholas Piggin, Pali
Rohár, Rohan McLure, Russell Currey, Sachin Sant, Segher Boessenkool,
Shrikanth Hegde, Tyrel Datwyler, Wolfram Sang, ye xingchen, and Zheng
Yongjun.

* tag 'powerpc-6.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (214 commits)
  KVM: PPC: Book3S HV: Fix stack frame regs marker
  powerpc: Don't add __powerpc_ prefix to syscall entry points
  powerpc/64s/interrupt: Fix stack frame regs marker
  powerpc/64: Fix msr_check_and_set/clear MSR[EE] race
  powerpc/64s/interrupt: Change must-hard-mask interrupt check from BUG to WARN
  powerpc/pseries: Add firmware details to the hardware description
  powerpc/powernv: Add opal details to the hardware description
  powerpc: Add device-tree model to the hardware description
  powerpc/64: Add logical PVR to the hardware description
  powerpc: Add PVR & CPU name to hardware description
  powerpc: Add hardware description string
  powerpc/configs: Enable PPC_UV in powernv_defconfig
  powerpc/configs: Update config files for removed/renamed symbols
  powerpc/mm: Fix UBSAN warning reported on hugetlb
  powerpc/mm: Always update max/min_low_pfn in mem_topology_setup()
  powerpc/mm/book3s/hash: Rename flush_tlb_pmd_range
  powerpc: Drops STABS_DEBUG from linker scripts
  powerpc/64s: Remove lost/old comment
  powerpc/64s: Remove old STAB comment
  powerpc: remove orphan systbl_chk.sh
  ...
2022-10-09 14:05:15 -07:00
Michael Ellerman
9474689020 powerpc: Don't add __powerpc_ prefix to syscall entry points
When using syscall wrappers the __SYSCALL_DEFINEx() and related macros
add a "__powerpc_" prefix to all syscall entry points.

So for example sys_mmap becomes __powerpc_sys_mmap.

This risks breaking workflows and tools that expect the old naming
scheme. At a minimum setting a breakpoint on eg. sys_mmap with gdb no
longer works.

There seems to be no compelling reason to add the "__powerpc_" prefix,
other than that it follows what some other arches do (x86, arm64, s390).

But unlike other arches powerpc doesn't always enable syscall wrappers,
so the syscall entry points can change name depending on CONFIG options.

For those reasons drop the "__powerpc_" prefix, reverting to the
existing naming.

Doing so reveals two prototypes in signal.h that have the incorrect type
when syscall wrappers are enabled. There are already prototypes for both
functions in syscalls.h, so drop the ones from signal.h.

Fixes: 7e92e01b72 ("powerpc: Provide syscall wrapper")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221006135940.1223988-1-mpe@ellerman.id.au
2022-10-07 00:59:54 +11:00
Nicholas Piggin
b2e82e495a powerpc/64s/interrupt: Fix stack frame regs marker
The value of the stack frame regs marker that gets saved on the stack in
interrupt entry code does not match the regs marker value, which breaks
stack frame marker matching.

This stray instruction looks to have been introduced in a mismerge.

Fixes: bf75a3258a ("powerpc/64s/interrupt: move early boot ILE fixup into a macro")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Mismerge by yours truly -_-]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221004132952.984341-1-npiggin@gmail.com
2022-10-05 11:09:22 +11:00
Nicholas Piggin
0fa6831811 powerpc/64: Fix msr_check_and_set/clear MSR[EE] race
irq soft-masking means that when Linux irqs are disabled, the MSR[EE]
value can change from 1 to 0 asynchronously: if a masked interrupt of
the PACA_IRQ_MUST_HARD_MASK variety fires while irqs are disabled,
the masked handler will return with MSR[EE]=0.

This means a sequence like mtmsr(mfmsr() | MSR_FP) is racy if it can
be called with local irqs disabled, unless a hard_irq_disable has been
done.

Reported-by: Sachin Sant <sachinp@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221004051157.308999-2-npiggin@gmail.com
2022-10-04 23:16:20 +11:00
Nicholas Piggin
8154850b28 powerpc/64s/interrupt: Change must-hard-mask interrupt check from BUG to WARN
This new assertion added is generally harmless and gets fixed up
naturally, but it does indicate a problem with MSR manipulation
somewhere.

Fixes: c39fb71a54 ("powerpc/64s/interrupt: masked handler debug check for previous hard disable")
Reported-by: Sachin Sant <sachinp@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Sachin Sant <sachinp@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221004051157.308999-1-npiggin@gmail.com
2022-10-04 23:16:20 +11:00
Masahiro Yamada
3216484550 kbuild: use obj-y instead extra-y for objects placed at the head
The objects placed at the head of vmlinux need special treatments:

 - arch/$(SRCARCH)/Makefile adds them to head-y in order to place
   them before other archives in the linker command line.

 - arch/$(SRCARCH)/kernel/Makefile adds them to extra-y instead of
   obj-y to avoid them going into built-in.a.

This commit gets rid of the latter.

Create vmlinux.a to collect all the objects that are unconditionally
linked to vmlinux. The objects listed in head-y are moved to the head
of vmlinux.a by using 'ar m'.

With this, arch/$(SRCARCH)/kernel/Makefile can consistently use obj-y
for builtin objects.

There is no *.o that is directly linked to vmlinux. Drop unneeded code
in scripts/clang-tools/gen_compile_commands.py.

$(AR) mPi needs 'T' to workaround the llvm-ar bug. The fix was suggested
by Nathan Chancellor [1].

[1]: https://lore.kernel.org/llvm/YyjjT5gQ2hGMH0ni@dev-arch.thelio-3990X/

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>
2022-10-02 18:04:05 +09:00
Michael Ellerman
5412297079 powerpc: Add device-tree model to the hardware description
Add the model of the machine we're on to the hardware description, which
is printed at boot and in case of an oops.

eg: Hardware name: IBM,8247-22L

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220930082709.55830-4-mpe@ellerman.id.au
2022-09-30 18:35:53 +10:00
Michael Ellerman
48b7019b6a powerpc/64: Add logical PVR to the hardware description
If we detect a logical PVR add that to the hardware description, which
is printed at boot and in case of an oops.

eg: Hardware name: ... 0xf000004

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220930082709.55830-3-mpe@ellerman.id.au
2022-09-30 18:35:53 +10:00
Michael Ellerman
bd649d40e0 powerpc: Add PVR & CPU name to hardware description
Add the PVR and CPU name to the hardware description, which is printed
at boot and in case of an oops.

eg: Hardware name: ... POWER8E (raw) 0x4b0201

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220930082709.55830-2-mpe@ellerman.id.au
2022-09-30 18:35:52 +10:00
Michael Ellerman
41dc056391 powerpc: Add hardware description string
Create a hardware description string, which we will use to record
various details of the hardware platform we are running on.

Print the accumulated description at boot, and use it to set the generic
description which is printed in oopses.

To begin with add ppc_md.name, aka the "machine description".

Example output at boot with the full series applied:

  Linux version 6.0.0-rc2-gcc-11.1.0-00199-g893f9007a5ce-dirty (michael@alpine1-p1) (powerpc64-linux-gcc (GCC) 11.1.0, GNU ld (GNU Binutils) 2.36.1) #844 SMP Thu Sep 29 22:29:53 AEST 2022
  Hardware name: IBM pSeries (emulated by qemu) POWER9 (raw) 0x4e1200 0xf000005 of:SLOF,git-5b4c5a pSeries
  printk: bootconsole [udbg0] enabled

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220930082709.55830-1-mpe@ellerman.id.au
2022-09-30 18:35:52 +10:00
Michael Ellerman
7673335e2a powerpc: Drops STABS_DEBUG from linker scripts
No toolchain we support should be generating stabs debug information
anymore. Drop the sections entirely from our linker scripts.

We removed all the manual stabs annotations in commit
1231816373 ("powerpc/32: Remove remaining .stabs annotations").

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220928130951.1732983-1-mpe@ellerman.id.au
2022-09-30 18:35:52 +10:00
Michael Ellerman
0c36099642 powerpc/64s: Remove lost/old comment
The bulk of this was moved/reworded in:
  57f266497d ("powerpc: Use gas sections for arranging exception vectors")

And now appears around line 700 in arch/powerpc/kernel/exceptions-64s.S.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220928130941.1732818-1-mpe@ellerman.id.au
2022-09-30 18:35:51 +10:00
Michael Ellerman
57a8e4b26e powerpc/64s: Remove old STAB comment
This used to be about the 0x4300 handler, but that was moved in commit
da2bc4644c ("powerpc/64s: Add new exception vector macros").

Note that "STAB" here refers to "Segment Table" not the debug format.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220928130912.1732466-1-mpe@ellerman.id.au
2022-09-30 18:35:51 +10:00
Nicholas Piggin
a08661af4c powerpc: remove orphan systbl_chk.sh
arch/powerpc/kernel/systbl_chk.sh has not been referenced since commit
ab66dcc76d ("powerpc: generate uapi header and system call table
files"). Remove it.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220929032120.3592593-1-npiggin@gmail.com
2022-09-30 18:35:51 +10:00