Commit Graph

3749 Commits

Author SHA1 Message Date
Alexander Gordeev
6cbd7cc2eb s390/smp: call smp_reinit_ipl_cpu() before scheduler is available
Currently smp_reinit_ipl_cpu() is a pre-SMP early initcall.
That ensures no CPU is running in parallel, but still not
enough to assume the code is exclusive, since the scheduling
is already available.

Move the function call to arch_call_rest_init() callback
to ensure no thread could be preempted and allow lockless
allocation of the kernel page tables. That is needed to
allow a follow-up rework of the absolute lowcore access
mechanism.

Suggested-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2022-09-14 16:46:00 +02:00
Vasily Gorbik
d61bb30e43 Merge branch 'fixes' into features
* fixes:
  s390/smp: enforce lowcore protection on CPU restart
  s390/boot: fix absolute zero lowcore corruption on boot
  s390/hugetlb: fix prepare_hugepage_range() check for 2 GB hugepages
  s390: update defconfigs
  s390: fix nospec table alignments
  s390/mm: remove useless hugepage address alignment

Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2022-09-14 16:41:21 +02:00
Alexander Gordeev
8d96bba75a s390/smp: enforce lowcore protection on CPU restart
As result of commit 915fea04f9 ("s390/smp: enable DAT before
CPU restart callback is called") the low-address protection bit
gets mistakenly unset in control register 0 save area of the
absolute zero memory. That area is used when manual PSW restart
happened to hit an offline CPU. In this case the low-address
protection for that CPU will be dropped.

Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Fixes: 915fea04f9 ("s390/smp: enable DAT before CPU restart callback is called")
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2022-09-07 14:04:01 +02:00
Alexander Gordeev
12dd19c159 s390/boot: fix absolute zero lowcore corruption on boot
Crash dump always starts on CPU0. In case CPU0 is offline the
prefix page is not installed and the absolute zero lowcore is
used. However, struct lowcore::mcesad is never assigned and
stays zero. That leads to __machine_kdump() -> save_vx_regs()
call silently stores vector registers to the absolute lowcore
at 0x11b0 offset.

Fixes: a62bc07392 ("s390/kdump: add support for vector extension")
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2022-09-07 14:04:01 +02:00
Wolfram Sang
820109fb11 s390: move from strlcpy with unused retval to strscpy
Follow the advice of the below link and prefer 'strscpy' in this
subsystem. Conversion is 1:1 because the return value is not used.
Generated by a coccinelle script.

Link: https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmknw@mail.gmail.com/
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Benjamin Block <bblock@linux.ibm.com>
Acked-by: Alexandra Winter <wintera@linux.ibm.com>
Link: https://lore.kernel.org/r/20220818205948.6360-1-wsa+renesas@sang-engineering.com
Link: https://lore.kernel.org/r/20220818210102.7301-1-wsa+renesas@sang-engineering.com
[gor@linux.ibm.com: squashed two changes linked above together]
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2022-08-30 22:00:33 +02:00
Josh Poimboeuf
c9305b6c1f s390: fix nospec table alignments
Add proper alignment for .nospec_call_table and .nospec_return_table in
vmlinux.

[hca@linux.ibm.com]: The problem with the missing alignment of the nospec
tables exist since a long time, however only since commit e6ed91fd07
("s390/alternatives: remove padding generation code") and with
CONFIG_RELOCATABLE=n the kernel may also crash at boot time.

The above named commit reduced the size of struct alt_instr by one byte,
so its new size is 11 bytes. Therefore depending on the number of cpu
alternatives the size of the __alt_instructions array maybe odd, which
again also causes that the addresses of the nospec tables will be odd.

If the address of __nospec_call_start is odd and the kernel is compiled
With CONFIG_RELOCATABLE=n the compiler may generate code that loads the
address of __nospec_call_start with a 'larl' instruction.

This will generate incorrect code since the 'larl' instruction only works
with even addresses. In result the members of the nospec tables will be
accessed with an off-by-one offset, which subsequently may lead to
addressing exceptions within __nospec_revert().

Fixes: f19fbd5ed6 ("s390: introduce execute-trampolines for branches")
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Link: https://lore.kernel.org/r/8719bf1ce4a72ebdeb575200290094e9ce047bcc.1661557333.git.jpoimboe@kernel.org
Cc: <stable@vger.kernel.org> # 4.16
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2022-08-30 21:57:07 +02:00
Brian Foster
13cccafe0e s390: fix double free of GS and RI CBs on fork() failure
The pointers for guarded storage and runtime instrumentation control
blocks are stored in the thread_struct of the associated task. These
pointers are initially copied on fork() via arch_dup_task_struct()
and then cleared via copy_thread() before fork() returns. If fork()
happens to fail after the initial task dup and before copy_thread(),
the newly allocated task and associated thread_struct memory are
freed via free_task() -> arch_release_task_struct(). This results in
a double free of the guarded storage and runtime info structs
because the fields in the failed task still refer to memory
associated with the source task.

This problem can manifest as a BUG_ON() in set_freepointer() (with
CONFIG_SLAB_FREELIST_HARDENED enabled) or KASAN splat (if enabled)
when running trinity syscall fuzz tests on s390x. To avoid this
problem, clear the associated pointer fields in
arch_dup_task_struct() immediately after the new task is copied.
Note that the RI flag is still cleared in copy_thread() because it
resides in thread stack memory and that is where stack info is
copied.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Fixes: 8d9047f8b9 ("s390/runtime instrumentation: simplify task exit handling")
Fixes: 7b83c6297d ("s390/guarded storage: simplify task exit handling")
Cc: <stable@vger.kernel.org> # 4.15
Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Link: https://lore.kernel.org/r/20220816155407.537372-1-bfoster@redhat.com
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2022-08-25 15:12:32 +02:00
Linus Torvalds
24cb958695 s390 updates for 5.20 merge window
- Rework copy_oldmem_page() callback to take an iov_iter.
   This includes few prerequisite updates and fixes to the
   oldmem reading code.
 
 - Rework cpufeature implementation to allow for various CPU feature
   indications, which is not only limited to hardware capabilities,
   but also allows CPU facilities.
 
 - Use the cpufeature rework to autoload Ultravisor module when CPU
   facility 158 is available.
 
 - Add ELF note type for encrypted CPU state of a protected virtual CPU.
   The zgetdump tool from s390-tools package will decrypt the CPU state
   using a Customer Communication Key and overwrite respective notes to
   make the data accessible for crash and other debugging tools.
 
 - Use vzalloc() instead of vmalloc() + memset() in ChaCha20 crypto test.
 
 - Fix incorrect recovery of kretprobe modified return address in stacktrace.
 
 - Switch the NMI handler to use generic irqentry_nmi_enter() and
   irqentry_nmi_exit() helper functions.
 
 - Rework the cryptographic Adjunct Processors (AP) pass-through design
   to support dynamic changes to the AP matrix of a running guest as well
   as to implement more of the AP architecture.
 
 - Minor boot code cleanups.
 
 - Grammar and typo fixes to hmcdrv and tape drivers.
 -----BEGIN PGP SIGNATURE-----
 
 iI0EABYIADUWIQQrtrZiYVkVzKQcYivNdxKlNrRb8AUCYu4dRBccYWdvcmRlZXZA
 bGludXguaWJtLmNvbQAKCRDNdxKlNrRb8DnlAP45Sk4cE35T+Z0vdHE2f0uMXE/p
 uHNjS3fDZOQVFJ2jZwEA99xPF5qPCttbR/b1VHsMSb30684IT1A4PC7y05kgfAw=
 =jCc3
 -----END PGP SIGNATURE-----

Merge tag 's390-5.20-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 updates from Alexander Gordeev:

 - Rework copy_oldmem_page() callback to take an iov_iter.

   This includes a few prerequisite updates and fixes to the oldmem
   reading code.

 - Rework cpufeature implementation to allow for various CPU feature
   indications, which is not only limited to hardware capabilities, but
   also allows CPU facilities.

 - Use the cpufeature rework to autoload Ultravisor module when CPU
   facility 158 is available.

 - Add ELF note type for encrypted CPU state of a protected virtual CPU.
   The zgetdump tool from s390-tools package will decrypt the CPU state
   using a Customer Communication Key and overwrite respective notes to
   make the data accessible for crash and other debugging tools.

 - Use vzalloc() instead of vmalloc() + memset() in ChaCha20 crypto
   test.

 - Fix incorrect recovery of kretprobe modified return address in
   stacktrace.

 - Switch the NMI handler to use generic irqentry_nmi_enter() and
   irqentry_nmi_exit() helper functions.

 - Rework the cryptographic Adjunct Processors (AP) pass-through design
   to support dynamic changes to the AP matrix of a running guest as
   well as to implement more of the AP architecture.

 - Minor boot code cleanups.

 - Grammar and typo fixes to hmcdrv and tape drivers.

* tag 's390-5.20-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (46 commits)
  Revert "s390/smp: enforce lowcore protection on CPU restart"
  Revert "s390/smp: rework absolute lowcore access"
  Revert "s390/smp,ptdump: add absolute lowcore markers"
  s390/unwind: fix fgraph return address recovery
  s390/nmi: use irqentry_nmi_enter()/irqentry_nmi_exit()
  s390: add ELF note type for encrypted CPU state of a PV VCPU
  s390/smp,ptdump: add absolute lowcore markers
  s390/smp: rework absolute lowcore access
  s390/setup: rearrange absolute lowcore initialization
  s390/boot: cleanup adjust_to_uv_max() function
  s390/smp: enforce lowcore protection on CPU restart
  s390/tape: fix comment typo
  s390/hmcdrv: fix Kconfig "its" grammar
  s390/docs: fix warnings for vfio_ap driver doc
  s390/docs: fix warnings for vfio_ap driver lock usage doc
  s390/crash: support multi-segment iterators
  s390/crash: use static swap buffer for copy_to_user_real()
  s390/crash: move copy_to_user_real() to crash_dump.c
  s390/zcore: fix race when reading from hardware system area
  s390/crash: fix incorrect number of bytes to copy to user space
  ...
2022-08-06 17:05:21 -07:00
Alexander Gordeev
953503751a Revert "s390/smp: enforce lowcore protection on CPU restart"
This reverts commit 6f5c672d17.

This breaks normal crash dump when CPU0 is offline.

Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2022-08-06 09:29:46 +02:00
Alexander Gordeev
5e441f61f5 Revert "s390/smp: rework absolute lowcore access"
This reverts commit 7d06fed77b.

This introduced vmem_mutex locking from vmem_map_4k_page()
function called from smp_reinit_ipl_cpu() with interrupts
disabled. While it is a pre-SMP early initcall no other CPUs
running in parallel nor other code taking vmem_mutex on this
boot stage - it still needs to be fixed.

Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2022-08-06 09:24:07 +02:00
Linus Torvalds
7c5c3a6177 ARM:
* Unwinder implementations for both nVHE modes (classic and
   protected), complete with an overflow stack
 
 * Rework of the sysreg access from userspace, with a complete
   rewrite of the vgic-v3 view to allign with the rest of the
   infrastructure
 
 * Disagregation of the vcpu flags in separate sets to better track
   their use model.
 
 * A fix for the GICv2-on-v3 selftest
 
 * A small set of cosmetic fixes
 
 RISC-V:
 
 * Track ISA extensions used by Guest using bitmap
 
 * Added system instruction emulation framework
 
 * Added CSR emulation framework
 
 * Added gfp_custom flag in struct kvm_mmu_memory_cache
 
 * Added G-stage ioremap() and iounmap() functions
 
 * Added support for Svpbmt inside Guest
 
 s390:
 
 * add an interface to provide a hypervisor dump for secure guests
 
 * improve selftests to use TAP interface
 
 * enable interpretive execution of zPCI instructions (for PCI passthrough)
 
 * First part of deferred teardown
 
 * CPU Topology
 
 * PV attestation
 
 * Minor fixes
 
 x86:
 
 * Permit guests to ignore single-bit ECC errors
 
 * Intel IPI virtualization
 
 * Allow getting/setting pending triple fault with KVM_GET/SET_VCPU_EVENTS
 
 * PEBS virtualization
 
 * Simplify PMU emulation by just using PERF_TYPE_RAW events
 
 * More accurate event reinjection on SVM (avoid retrying instructions)
 
 * Allow getting/setting the state of the speaker port data bit
 
 * Refuse starting the kvm-intel module if VM-Entry/VM-Exit controls are inconsistent
 
 * "Notify" VM exit (detect microarchitectural hangs) for Intel
 
 * Use try_cmpxchg64 instead of cmpxchg64
 
 * Ignore benign host accesses to PMU MSRs when PMU is disabled
 
 * Allow disabling KVM's "MONITOR/MWAIT are NOPs!" behavior
 
 * Allow NX huge page mitigation to be disabled on a per-vm basis
 
 * Port eager page splitting to shadow MMU as well
 
 * Enable CMCI capability by default and handle injected UCNA errors
 
 * Expose pid of vcpu threads in debugfs
 
 * x2AVIC support for AMD
 
 * cleanup PIO emulation
 
 * Fixes for LLDT/LTR emulation
 
 * Don't require refcounted "struct page" to create huge SPTEs
 
 * Miscellaneous cleanups:
 ** MCE MSR emulation
 ** Use separate namespaces for guest PTEs and shadow PTEs bitmasks
 ** PIO emulation
 ** Reorganize rmap API, mostly around rmap destruction
 ** Do not workaround very old KVM bugs for L0 that runs with nesting enabled
 ** new selftests API for CPUID
 
 Generic:
 
 * Fix races in gfn->pfn cache refresh; do not pin pages tracked by the cache
 
 * new selftests API using struct kvm_vcpu instead of a (vm, id) tuple
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmLnyo4UHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroMtQQf/XjVWiRcWLPR9dqzRM/vvRXpiG+UL
 jU93R7m6ma99aqTtrxV/AE+kHgamBlma3Cwo+AcWk9uCVNbIhFjv2YKg6HptKU0e
 oJT3zRYp+XIjEo7Kfw+TwroZbTlG6gN83l1oBLFMqiFmHsMLnXSI2mm8MXyi3dNB
 vR2uIcTAl58KIprqNNsYJ2dNn74ogOMiXYx9XzoA9/5Xb6c0h4rreHJa5t+0s9RO
 Gz7Io3PxumgsbJngjyL1Ve5oxhlIAcZA8DU0PQmjxo3eS+k6BcmavGFd45gNL5zg
 iLpCh4k86spmzh8CWkAAwWPQE4dZknK6jTctJc0OFVad3Z7+X7n0E8TFrA==
 =PM8o
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "Quite a large pull request due to a selftest API overhaul and some
  patches that had come in too late for 5.19.

  ARM:

   - Unwinder implementations for both nVHE modes (classic and
     protected), complete with an overflow stack

   - Rework of the sysreg access from userspace, with a complete rewrite
     of the vgic-v3 view to allign with the rest of the infrastructure

   - Disagregation of the vcpu flags in separate sets to better track
     their use model.

   - A fix for the GICv2-on-v3 selftest

   - A small set of cosmetic fixes

  RISC-V:

   - Track ISA extensions used by Guest using bitmap

   - Added system instruction emulation framework

   - Added CSR emulation framework

   - Added gfp_custom flag in struct kvm_mmu_memory_cache

   - Added G-stage ioremap() and iounmap() functions

   - Added support for Svpbmt inside Guest

  s390:

   - add an interface to provide a hypervisor dump for secure guests

   - improve selftests to use TAP interface

   - enable interpretive execution of zPCI instructions (for PCI
     passthrough)

   - First part of deferred teardown

   - CPU Topology

   - PV attestation

   - Minor fixes

  x86:

   - Permit guests to ignore single-bit ECC errors

   - Intel IPI virtualization

   - Allow getting/setting pending triple fault with
     KVM_GET/SET_VCPU_EVENTS

   - PEBS virtualization

   - Simplify PMU emulation by just using PERF_TYPE_RAW events

   - More accurate event reinjection on SVM (avoid retrying
     instructions)

   - Allow getting/setting the state of the speaker port data bit

   - Refuse starting the kvm-intel module if VM-Entry/VM-Exit controls
     are inconsistent

   - "Notify" VM exit (detect microarchitectural hangs) for Intel

   - Use try_cmpxchg64 instead of cmpxchg64

   - Ignore benign host accesses to PMU MSRs when PMU is disabled

   - Allow disabling KVM's "MONITOR/MWAIT are NOPs!" behavior

   - Allow NX huge page mitigation to be disabled on a per-vm basis

   - Port eager page splitting to shadow MMU as well

   - Enable CMCI capability by default and handle injected UCNA errors

   - Expose pid of vcpu threads in debugfs

   - x2AVIC support for AMD

   - cleanup PIO emulation

   - Fixes for LLDT/LTR emulation

   - Don't require refcounted "struct page" to create huge SPTEs

   - Miscellaneous cleanups:
      - MCE MSR emulation
      - Use separate namespaces for guest PTEs and shadow PTEs bitmasks
      - PIO emulation
      - Reorganize rmap API, mostly around rmap destruction
      - Do not workaround very old KVM bugs for L0 that runs with nesting enabled
      - new selftests API for CPUID

  Generic:

   - Fix races in gfn->pfn cache refresh; do not pin pages tracked by
     the cache

   - new selftests API using struct kvm_vcpu instead of a (vm, id)
     tuple"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (606 commits)
  selftests: kvm: set rax before vmcall
  selftests: KVM: Add exponent check for boolean stats
  selftests: KVM: Provide descriptive assertions in kvm_binary_stats_test
  selftests: KVM: Check stat name before other fields
  KVM: x86/mmu: remove unused variable
  RISC-V: KVM: Add support for Svpbmt inside Guest/VM
  RISC-V: KVM: Use PAGE_KERNEL_IO in kvm_riscv_gstage_ioremap()
  RISC-V: KVM: Add G-stage ioremap() and iounmap() functions
  KVM: Add gfp_custom flag in struct kvm_mmu_memory_cache
  RISC-V: KVM: Add extensible CSR emulation framework
  RISC-V: KVM: Add extensible system instruction emulation framework
  RISC-V: KVM: Factor-out instruction emulation into separate sources
  RISC-V: KVM: move preempt_disable() call in kvm_arch_vcpu_ioctl_run
  RISC-V: KVM: Make kvm_riscv_guest_timer_init a void function
  RISC-V: KVM: Fix variable spelling mistake
  RISC-V: KVM: Improve ISA extension by using a bitmap
  KVM, x86/mmu: Fix the comment around kvm_tdp_mmu_zap_leafs()
  KVM: SVM: Dump Virtual Machine Save Area (VMSA) to klog
  KVM: x86/mmu: Treat NX as a valid SPTE bit for NPT
  KVM: x86: Do not block APIC write for non ICR registers
  ...
2022-08-04 14:59:54 -07:00
Linus Torvalds
a0b09f2d6f Random number generator updates for Linux 6.0-rc1.
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEq5lC5tSkz8NBJiCnSfxwEqXeA64FAmLnDOwACgkQSfxwEqXe
 A65Fiw//Z0YaPejSslQIGitQ1b0XzdWBhyJArYDieaaiQRXMqlaSKlIUqHz38xb7
 +FykUY51/SJLjHV2riPxq1OK3/MPmk6VlTd0HHihcHVmg77oZcFcv2tPnDpZoqND
 TsBOujLbXKwxP8tNFedRY/4+K7w+ue9BTfDjuH7aCtz7uWd+4cNJmPg3x9FCfkMA
 +hbcRluwE9W3Pg4OCKwv+qxL0JF3qQtNKEOp1wpnjGAZZW/I9gFNgFBEkykvcAsj
 TkIRDc3agPFj6QgDeRIgLdnf9KCsLubKAg5oJneeCvQztJJUCSkn8nQXxpx+4sLo
 GsRgvCdfL/GyJqfSAzQJVYDHKtKMkJiCiWCC/oOALR8dzHJfSlULDAjbY1m/DAr9
 at+vi4678Or7TNx2ZSaUlCXXKZ+UT7yWMlQWax9JuxGk1hGYP5/eT1AH5SGjqUwF
 w1q8oyzxt1vUcnOzEddFXPFirnqqhAk4dQFtu83+xKM4ZssMVyeB4NZdEhAdW0ng
 MX+RjrVj4l5gWWuoS0Cx3LUxDCgV6WT0dN+Vl9axAZkoJJbcXLEmXwQ6NbzTLPWg
 1/MT7qFTxNcTCeAArMdZvvFbeh7pOBXO42pafrK/7vDRnTMUIw9tqXNLQUfvdFQp
 F5flPgiVRHDU2vSzKIFtnPTyXU0RBBGvNb4n0ss2ehH2DSsCxYE=
 =Zy3d
 -----END PGP SIGNATURE-----

Merge tag 'random-6.0-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random

Pull random number generator updates from Jason Donenfeld:
 "Though there's been a decent amount of RNG-related development during
  this last cycle, not all of it is coming through this tree, as this
  cycle saw a shift toward tackling early boot time seeding issues,
  which took place in other trees as well.

  Here's a summary of the various patches:

   - The CONFIG_ARCH_RANDOM .config option and the "nordrand" boot
     option have been removed, as they overlapped with the more widely
     supported and more sensible options, CONFIG_RANDOM_TRUST_CPU and
     "random.trust_cpu". This change allowed simplifying a bit of arch
     code.

   - x86's RDRAND boot time test has been made a bit more robust, with
     RDRAND disabled if it's clearly producing bogus results. This would
     be a tip.git commit, technically, but I took it through random.git
     to avoid a large merge conflict.

   - The RNG has long since mixed in a timestamp very early in boot, on
     the premise that a computer that does the same things, but does so
     starting at different points in wall time, could be made to still
     produce a different RNG state. Unfortunately, the clock isn't set
     early in boot on all systems, so now we mix in that timestamp when
     the time is actually set.

   - User Mode Linux now uses the host OS's getrandom() syscall to
     generate a bootloader RNG seed and later on treats getrandom() as
     the platform's RDRAND-like faculty.

   - The arch_get_random_{seed_,}_long() family of functions is now
     arch_get_random_{seed_,}_longs(), which enables certain platforms,
     such as s390, to exploit considerable performance advantages from
     requesting multiple CPU random numbers at once, while at the same
     time compiling down to the same code as before on platforms like
     x86.

   - A small cleanup changing a cmpxchg() into a try_cmpxchg(), from
     Uros.

   - A comment spelling fix"

More info about other random number changes that come in through various
architecture trees in the full commentary in the pull request:

  https://lore.kernel.org/all/20220731232428.2219258-1-Jason@zx2c4.com/

* tag 'random-6.0-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random:
  random: correct spelling of "overwrites"
  random: handle archrandom with multiple longs
  um: seed rng using host OS rng
  random: use try_cmpxchg in _credit_init_bits
  timekeeping: contribute wall clock to rng on time change
  x86/rdrand: Remove "nordrand" flag in favor of "random.trust_cpu"
  random: remove CONFIG_ARCH_RANDOM
2022-08-02 17:31:35 -07:00
Linus Torvalds
043402495d integrity-v6.0
-----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQQdXVVFGN5XqKr1Hj7LwZzRsCrn5QUCYulqTBQcem9oYXJAbGlu
 dXguaWJtLmNvbQAKCRDLwZzRsCrn5SBBAP9nbAW1SPa/hDqbrclHdDrS59VkSVwv
 6ZO2yAmxJAptHwD+JzyJpJiZsqVN/Tu85V1PqeAt9c8az8f3CfDBp2+w7AA=
 =Ad+c
 -----END PGP SIGNATURE-----

Merge tag 'integrity-v6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity

Pull integrity updates from Mimi Zohar:
 "Aside from the one EVM cleanup patch, all the other changes are kexec
  related.

  On different architectures different keyrings are used to verify the
  kexec'ed kernel image signature. Here are a number of preparatory
  cleanup patches and the patches themselves for making the keyrings -
  builtin_trusted_keyring, .machine, .secondary_trusted_keyring, and
  .platform - consistent across the different architectures"

* tag 'integrity-v6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
  kexec, KEYS, s390: Make use of built-in and secondary keyring for signature verification
  arm64: kexec_file: use more system keyrings to verify kernel image signature
  kexec, KEYS: make the code in bzImage64_verify_sig generic
  kexec: clean up arch_kexec_kernel_verify_sig
  kexec: drop weak attribute from functions
  kexec_file: drop weak attribute from functions
  evm: Use IS_ENABLED to initialize .enabled
2022-08-02 15:21:18 -07:00
Linus Torvalds
22a39c3d86 This was a fairly quiet cycle for the locking subsystem:
- lockdep: Fix a handful of the more complex lockdep_init_map_*() primitives
    that can lose the lock_type & cause false reports. No such mishap was
    observed in the wild.
 
  - jump_label improvements: simplify the cross-arch support of
    initial NOP patching by making it arch-specific code (used on MIPS only),
    and remove the s390 initial NOP patching that was superfluous.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmLn3jERHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1hzeg/7BTC90XeMANhTiL23iiH7dOYZwqdFeB12
 VBqdaPaGC8i+mJzVAdGyPFwCFDww6Ak6P33PcHkemuIO5+DhWis8hfw5krHEOO1k
 AyVSMOZuWJ8/g6ZenjgNFozQ8C+3NqURrpdqN55d7jhMazPWbsNLLqUgvSSqo6DY
 Ah2O+EKrDfGNCxT6/YaTAmUryctotxafSyFDQxv3RKPfCoIIVv9b3WApYqTOqFIu
 VYTPr+aAcMsU20hPMWQI4kbQaoCxFqr3bZiZtAiS/IEunqi+PlLuWjrnCUpLwVTC
 +jOCkNJHt682FPKTWelUnCnkOg9KhHRujRst5mi1+2tWAOEvKltxfe05UpsZYC3b
 jhzddREMwBt3iYsRn65LxxsN4AMK/C/41zjejHjZpf+Q5kwDsc6Ag3L5VifRFURS
 KRwAy9ejoVYwnL7CaVHM2zZtOk4YNxPeXmiwoMJmOufpdmD1LoYbNUbpSDf+goIZ
 yPJpxFI5UN8gi8IRo3DMe4K2nqcFBC3wFn8tNSAu+44gqDwGJAJL6MsLpkLSZkk8
 3QN9O11UCRTJDkURjoEWPgRRuIu9HZ4GKNhiblDy6gNM/jDE/m5OG4OYfiMhojgc
 KlMhsPzypSpeApL55lvZ+AzxH8mtwuUGwm8lnIdZ2kIse1iMwapxdWXWq9wQr8eW
 jLWHgyZ6rcg=
 =4B89
 -----END PGP SIGNATURE-----

Merge tag 'locking-core-2022-08-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking updates from Ingo Molnar:
 "This was a fairly quiet cycle for the locking subsystem:

   - lockdep: Fix a handful of the more complex lockdep_init_map_*()
     primitives that can lose the lock_type & cause false reports. No
     such mishap was observed in the wild.

   - jump_label improvements: simplify the cross-arch support of initial
     NOP patching by making it arch-specific code (used on MIPS only),
     and remove the s390 initial NOP patching that was superfluous"

* tag 'locking-core-2022-08-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  locking/lockdep: Fix lockdep_init_map_*() confusion
  jump_label: make initial NOP patching the special case
  jump_label: mips: move module NOP patching into arch code
  jump_label: s390: avoid pointless initial NOP patching
2022-08-01 12:15:27 -07:00
Paolo Bonzini
63f4b21041 Merge remote-tracking branch 'kvm/next' into kvm-next-5.20
KVM/s390, KVM/x86 and common infrastructure changes for 5.20

x86:

* Permit guests to ignore single-bit ECC errors

* Fix races in gfn->pfn cache refresh; do not pin pages tracked by the cache

* Intel IPI virtualization

* Allow getting/setting pending triple fault with KVM_GET/SET_VCPU_EVENTS

* PEBS virtualization

* Simplify PMU emulation by just using PERF_TYPE_RAW events

* More accurate event reinjection on SVM (avoid retrying instructions)

* Allow getting/setting the state of the speaker port data bit

* Refuse starting the kvm-intel module if VM-Entry/VM-Exit controls are inconsistent

* "Notify" VM exit (detect microarchitectural hangs) for Intel

* Cleanups for MCE MSR emulation

s390:

* add an interface to provide a hypervisor dump for secure guests

* improve selftests to use TAP interface

* enable interpretive execution of zPCI instructions (for PCI passthrough)

* First part of deferred teardown

* CPU Topology

* PV attestation

* Minor fixes

Generic:

* new selftests API using struct kvm_vcpu instead of a (vm, id) tuple

x86:

* Use try_cmpxchg64 instead of cmpxchg64

* Bugfixes

* Ignore benign host accesses to PMU MSRs when PMU is disabled

* Allow disabling KVM's "MONITOR/MWAIT are NOPs!" behavior

* x86/MMU: Allow NX huge pages to be disabled on a per-vm basis

* Port eager page splitting to shadow MMU as well

* Enable CMCI capability by default and handle injected UCNA errors

* Expose pid of vcpu threads in debugfs

* x2AVIC support for AMD

* cleanup PIO emulation

* Fixes for LLDT/LTR emulation

* Don't require refcounted "struct page" to create huge SPTEs

x86 cleanups:

* Use separate namespaces for guest PTEs and shadow PTEs bitmasks

* PIO emulation

* Reorganize rmap API, mostly around rmap destruction

* Do not workaround very old KVM bugs for L0 that runs with nesting enabled

* new selftests API for CPUID
2022-08-01 03:21:00 -04:00
Sven Schnelle
520763a327 s390/nmi: use irqentry_nmi_enter()/irqentry_nmi_exit()
With generic entry in place switch the nmi handler to use
the generic entry helper functions.

Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2022-07-28 18:05:24 +02:00
Alexander Gordeev
7d06fed77b s390/smp: rework absolute lowcore access
Temporary unsetting of the prefix page in memcpy_absolute() routine
poses a risk of executing code path with unexpectedly disabled prefix
page. This rework avoids the prefix page uninstalling and disabling
of normal and machine check interrupts when accessing the absolute
zero memory.

Although memcpy_absolute() routine can access the whole memory, it is
only used to update the absolute zero lowcore. This rework therefore
introduces a new mechanism for the absolute zero lowcore access and
scraps memcpy_absolute() routine for good.

Instead, an area is reserved in the virtual memory that is used for
the absolute lowcore access only. That area holds an array of 8KB
virtual mappings - one per CPU. Whenever a CPU is brought online, the
corresponding item is mapped to the real address of the previously
installed prefix page.

The absolute zero lowcore access works like this: a CPU calls the
new primitive get_abs_lowcore() to obtain its 8KB mapping as a
pointer to the struct lowcore. Virtual address references to that
pointer get translated to the real addresses of the prefix page,
which in turn gets swapped with the absolute zero memory addresses
due to prefixing. Once the pointer is not needed it must be released
with put_abs_lowcore() primitive:

	struct lowcore *abs_lc;
	unsigned long flags;

	abs_lc = get_abs_lowcore(&flags);
	abs_lc->... = ...;
	put_abs_lowcore(abs_lc, flags);

To ensure the described mechanism works large segment- and region-
table entries must be avoided for the 8KB mappings. Failure to do
so results in usage of Region-Frame Absolute Address (RFAA) or
Segment-Frame Absolute Address (SFAA) large page fields. In that
case absolute addresses would be used to address the prefix page
instead of the real ones and the prefixing would get bypassed.

Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2022-07-28 18:05:23 +02:00
Alexander Gordeev
2e2493c675 s390/setup: rearrange absolute lowcore initialization
Make the absolute lowcore assignments immediately follow
the boot CPU lowcore same member assignments. This way
readability improves when reading from up to down, with
no out of order mcck stack allocation in-between.

Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2022-07-28 18:05:23 +02:00
Alexander Gordeev
6f5c672d17 s390/smp: enforce lowcore protection on CPU restart
As result of commit 915fea04f9 ("s390/smp: enable DAT before
CPU restart callback is called") the low-address protection bit
gets mistakenly unset in control register 0 save area of the
absolute zero memory. That area is used when manual PSW restart
happened to hit an offline CPU. In this case the low-address
protection for that CPU will be dropped.

Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Fixes: 915fea04f9 ("s390/smp: enable DAT before CPU restart callback is called")
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2022-07-28 18:05:23 +02:00
Alexander Gordeev
2f089a3846 Merge branch 'vmcore-iov_iter' into features
Pull changes that finalize switching of copy_oldmem_page() callback
to iov_iter interface. These changes were pulled in work.iov_iter of
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git

Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2022-07-28 17:53:11 +02:00
Alexander Gordeev
ebbc957016 s390/crash: support multi-segment iterators
Make it possible to handle not only single-, but also multi-
segment iterators in copy_oldmem_iter() callback. Change the
semantics of called functions to match the iterator model -
instead of an error code the exact number of bytes copied is
returned.

The swap page used to copy data to user space is adopted for
kernel space too. That does not bring any performance impact.

Suggested-by: Matthew Wilcox <willy@infradead.org>
Fixes: cc02e6e21a ("s390/crash: add missing iterator advance in copy_oldmem_page()")
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Tested-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Link: https://lore.kernel.org/r/5af6da3a0bffe48a90b0b7139ecf6a818b2d18e8.1658206891.git.agordeev@linux.ibm.com
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2022-07-20 17:21:41 +02:00
Alexander Gordeev
6d2e5a4a13 s390/crash: use static swap buffer for copy_to_user_real()
Currently a temporary page-size buffer is allocated for copying
oldmem to user space. That limits copy_to_user_real() operation
only to stages when virtual memory is available and still makes
it possible to fail while the system is being dumped.

Instead of reallocating single page on each copy_oldmem_page()
iteration use a statically allocated buffer.

This also paves the way for a further memcpy_real() rework where
no swap buffer is needed altogether.

Acked-by: Heiko Carstens <hca@linux.ibm.com>
Tested-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Link: https://lore.kernel.org/r/334ed359680c4d45dd32feb104909f610312ef0f.1658206891.git.agordeev@linux.ibm.com
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2022-07-20 17:21:41 +02:00
Alexander Gordeev
d6da673781 s390/crash: move copy_to_user_real() to crash_dump.c
Function copy_to_user_real() does not really belong to maccess.c.
It is only used for copying oldmem to user space, so let's move
it to the friends.

Acked-by: Heiko Carstens <hca@linux.ibm.com>
Tested-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Link: https://lore.kernel.org/r/e8de968d40202d87caa09aef12e9c67ec23a1c1a.1658206891.git.agordeev@linux.ibm.com
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2022-07-20 17:21:41 +02:00
Alexander Gordeev
f6749da17a s390/crash: fix incorrect number of bytes to copy to user space
The number of bytes in a chunk is correctly calculated, but instead
the total number of bytes is passed to copy_to_user_real() function.

Reported-by: Matthew Wilcox <willy@infradead.org>
Fixes: df9694c797 ("s390/dump: streamline oldmem copy functions")
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2022-07-20 17:21:41 +02:00
Alexander Gordeev
86caa4b678 s390/crash: remove redundant panic() on save area allocation failure
Make save_area_alloc() return classic NULL on allocation failure.
The only caller smp_save_dump_cpus() does check the return value
already and panics if NULL is returned.

Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2022-07-20 17:21:41 +02:00
Steffen Eiden
5fcd0d8ae2 s390/uvdevice: autoload module based on CPU facility
Make sure the uvdevice driver will be automatically loaded when
facility 158 is available.

Signed-off-by: Steffen Eiden <seiden@linux.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Link: https://lore.kernel.org/r/20220713125644.16121-4-seiden@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2022-07-19 16:18:49 +02:00
Heiko Carstens
e2f39c9f54 s390/cpufeature: allow for facility bits
Allow for facility bits to be used in cpu features.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Steffen Eiden <seiden@linux.ibm.com>
Link: https://lore.kernel.org/r/20220713125644.16121-3-seiden@linux.ibm.com
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2022-07-19 16:18:49 +02:00
Heiko Carstens
0a5f9b382c s390/cpufeature: rework to allow more than only hwcap bits
Rework cpufeature implementation to allow for various cpu feature
indications, which is not only limited to hwcap bits. This is achieved
by adding a sequential list of cpu feature numbers, where each of them
is mapped to an entry which indicates what this number is about.

Each entry contains a type member, which indicates what feature
name space to look into (e.g. hwcap, or cpu facility). If wanted this
allows also to automatically load modules only in e.g. z/VM
configurations.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Steffen Eiden <seiden@linux.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Link: https://lore.kernel.org/r/20220713125644.16121-2-seiden@linux.ibm.com
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2022-07-19 16:18:49 +02:00
Jason A. Donenfeld
9592eef7c1 random: remove CONFIG_ARCH_RANDOM
When RDRAND was introduced, there was much discussion on whether it
should be trusted and how the kernel should handle that. Initially, two
mechanisms cropped up, CONFIG_ARCH_RANDOM, a compile time switch, and
"nordrand", a boot-time switch.

Later the thinking evolved. With a properly designed RNG, using RDRAND
values alone won't harm anything, even if the outputs are malicious.
Rather, the issue is whether those values are being *trusted* to be good
or not. And so a new set of options were introduced as the real
ones that people use -- CONFIG_RANDOM_TRUST_CPU and "random.trust_cpu".
With these options, RDRAND is used, but it's not always credited. So in
the worst case, it does nothing, and in the best case, maybe it helps.

Along the way, CONFIG_ARCH_RANDOM's meaning got sort of pulled into the
center and became something certain platforms force-select.

The old options don't really help with much, and it's a bit odd to have
special handling for these instructions when the kernel can deal fine
with the existence or untrusted existence or broken existence or
non-existence of that CPU capability.

Simplify the situation by removing CONFIG_ARCH_RANDOM and using the
ordinary asm-generic fallback pattern instead, keeping the two options
that are actually used. For now it leaves "nordrand" for now, as the
removal of that will take a different route.

Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Borislav Petkov <bp@suse.de>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-07-18 15:03:37 +02:00
Michal Suchanek
0828c4a39b kexec, KEYS, s390: Make use of built-in and secondary keyring for signature verification
commit e23a8020ce ("s390/kexec_file: Signature verification prototype")
adds support for KEXEC_SIG verification with keys from platform keyring
but the built-in keys and secondary keyring are not used.

Add support for the built-in keys and secondary keyring as x86 does.

Fixes: e23a8020ce ("s390/kexec_file: Signature verification prototype")
Cc: stable@vger.kernel.org
Cc: Philipp Rudo <prudo@linux.ibm.com>
Cc: kexec@lists.infradead.org
Cc: keyrings@vger.kernel.org
Cc: linux-security-module@vger.kernel.org
Signed-off-by: Michal Suchanek <msuchanek@suse.de>
Reviewed-by: "Lee, Chun-Yi" <jlee@suse.com>
Acked-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2022-07-15 12:21:16 -04:00
Claudio Imbrenda
72b1daff26 KVM: s390: pv: add export before import
Due to upcoming changes, it will be possible to temporarily have
multiple protected VMs in the same address space, although only one
will be actually active.

In that scenario, it is necessary to perform an export of every page
that is to be imported, since the hardware does not allow a page
belonging to a protected guest to be imported into a different
protected guest.

This also applies to pages that are shared, and thus accessible by the
host.

Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Link: https://lore.kernel.org/r/20220628135619.32410-7-imbrenda@linux.ibm.com
Message-Id: <20220628135619.32410-7-imbrenda@linux.ibm.com>
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
2022-07-13 14:42:11 +00:00
Claudio Imbrenda
a52c25848e KVM: s390: pv: handle secure storage violations for protected guests
A secure storage violation is triggered when a protected guest tries to
access secure memory that has been mapped erroneously, or that belongs
to a different protected guest or to the ultravisor.

With upcoming patches, protected guests will be able to trigger secure
storage violations in normal operation. This happens for example if a
protected guest is rebooted with deferred destroy enabled and the new
guest is also protected.

When the new protected guest touches pages that have not yet been
destroyed, and thus are accounted to the previous protected guest, a
secure storage violation is raised.

This patch adds handling of secure storage violations for protected
guests.

This exception is handled by first trying to destroy the page, because
it is expected to belong to a defunct protected guest where a destroy
should be possible. Note that a secure page can only be destroyed if
its protected VM does not have any CPUs, which only happens when the
protected VM is being terminated. If that fails, a normal export of
the page is attempted.

This means that pages that trigger the exception will be made
non-secure (in one way or another) before attempting to use them again
for a different secure guest.

Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Acked-by: Janosch Frank <frankja@linux.ibm.com>
Link: https://lore.kernel.org/r/20220628135619.32410-3-imbrenda@linux.ibm.com
Message-Id: <20220628135619.32410-3-imbrenda@linux.ibm.com>
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
2022-07-13 14:42:11 +00:00
Steffen Eiden
1b6abe95b5 s390: Add attestation query information
We have information about the supported attestation header version
and plaintext attestation flag bits.
Let's expose it via the sysfs files.

Signed-off-by: Steffen Eiden <seiden@linux.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Link: https://lore.kernel.org/lkml/20220601100245.3189993-1-seiden@linux.ibm.com/
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
2022-07-11 11:29:51 +02:00
Jason A. Donenfeld
e4f7440030 s390/archrandom: simplify back to earlier design and initialize earlier
s390x appears to present two RNG interfaces:
- a "TRNG" that gathers entropy using some hardware function; and
- a "DRBG" that takes in a seed and expands it.

Previously, the TRNG was wired up to arch_get_random_{long,int}(), but
it was observed that this was being called really frequently, resulting
in high overhead. So it was changed to be wired up to arch_get_random_
seed_{long,int}(), which was a reasonable decision. Later on, the DRBG
was then wired up to arch_get_random_{long,int}(), with a complicated
buffer filling thread, to control overhead and rate.

Fortunately, none of the performance issues matter much now. The RNG
always attempts to use arch_get_random_seed_{long,int}() first, which
means a complicated implementation of arch_get_random_{long,int}() isn't
really valuable or useful to have around. And it's only used when
reseeding, which means it won't hit the high throughput complications
that were faced before.

So this commit returns to an earlier design of just calling the TRNG in
arch_get_random_seed_{long,int}(), and returning false in arch_get_
random_{long,int}().

Part of what makes the simplification possible is that the RNG now seeds
itself using the TRNG at bootup. But this only works if the TRNG is
detected early in boot, before random_init() is called. So this commit
also causes that check to happen in setup_arch().

Cc: stable@vger.kernel.org
Cc: Harald Freudenberger <freude@linux.ibm.com>
Cc: Ingo Franzki <ifranzki@linux.ibm.com>
Cc: Juergen Christ <jchrist@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Link: https://lore.kernel.org/r/20220610222023.378448-1-Jason@zx2c4.com
Reviewed-by: Harald Freudenberger <freude@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2022-06-30 19:40:36 +02:00
Ard Biesheuvel
7e6b9db27d jump_label: make initial NOP patching the special case
Instead of defaulting to patching NOP opcodes at init time, and leaving
it to the architectures to override this if this is not needed, switch
to a model where doing nothing is the default. This is the common case
by far, as only MIPS requires NOP patching at init time. On all other
architectures, the correct encodings are emitted by the compiler and so
no initial patching is needed.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220615154142.1574619-4-ardb@kernel.org
2022-06-24 09:48:55 +02:00
Ard Biesheuvel
fdfd42892f jump_label: mips: move module NOP patching into arch code
MIPS is the only remaining architecture that needs to patch jump label
NOP encodings to initialize them at load time. So let's move the module
patching part of that from generic code into arch/mips, and drop it from
the others.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220615154142.1574619-3-ardb@kernel.org
2022-06-24 09:48:55 +02:00
Ard Biesheuvel
0c3b61e00a jump_label: s390: avoid pointless initial NOP patching
Patching NOPs into other NOPs at boot time serves no purpose, so let's
use the same NOP encodings at compile time and runtime.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220615154142.1574619-2-ardb@kernel.org
2022-06-24 09:48:54 +02:00
Thomas Richter
21e8764487 s390/pai: Fix multiple concurrent event installation
Two different events such as pai_crypto/KM_AES_128/ and
pai_crypto/KM_AES_192/ can be installed multiple times on the same CPU
and the events are executed concurrently:

  # perf stat -e pai_crypto/KM_AES_128/  -C0 -a -- sleep 5 &
  # sleep 2
  # perf stat -e pai_crypto/KM_AES_192/ -C0 -a -- true

This results in the first event being installed two times with two seconds
delay. The kernel does install the second event after the first
event has been deleted and re-added, as can be seen in the traces:

 13:48:47.600350  paicrypt_start event 0x1007 (event KM_AES_128)
 13:48:49.599359  paicrypt_stop event 0x1007  (event KM_AES_128)
 13:48:49.599198  paicrypt_start event 0x1007
 13:48:49.599199  paicrypt_start event 0x1008
 13:48:49.599921  paicrypt_event_destroy event 0x1008
 13:48:52.601507  paicrypt_event_destroy event 0x1007

This is caused by functions event_sched_in() and event_sched_out() which
call the PMU's add() and start() functions on schedule_in and the PMU's
stop() and del() functions on schedule_out. This is correct for events
attached to processes.  The pai_crypto events are system-wide events
and not attached to processes.

Since the kernel common code can not be changed easily, fix this issue
and do not reset the event count value to zero each time the event is
added and started. Instead use a flag and zero the event count value
only when called immediately after the event has been initialized.
Therefore only the first invocation of the the event's add() function
initializes the event count value to zero. The following invocations
of the event's add() function leave the current event count value
untouched.

Fixes: 39d62336f5 ("s390/pai: add support for cryptography counters")

Reported-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2022-06-23 17:24:00 +02:00
Thomas Richter
541a496644 s390/pai: Prevent invalid event number for pai_crypto PMU
The pai_crypto PMU has to check the event number. It has to be in
the supported range. This is not the case, the lower limit is not
checked. Fix this and obey the lower limit.

Fixes: 39d62336f5 ("s390/pai: add support for cryptography counters")

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Suggested-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Reviewed-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2022-06-23 17:23:59 +02:00
Thomas Richter
be857b7f77 s390/cpumf: Handle events cycles and instructions identical
Events CPU_CYCLES and INSTRUCTIONS can be submitted with two different
perf_event attribute::type values:
 - PERF_TYPE_HARDWARE: when invoked via perf tool predefined events name
   cycles or cpu-cycles or instructions.
 - pmu->type: when invoked via perf tool event name cpu_cf/CPU_CYLCES/ or
   cpu_cf/INSTRUCTIONS/. This invocation also selects the PMU to which
   the event belongs.
Handle both type of invocations identical for events CPU_CYLCES and
INSTRUCTIONS. They address the same hardware.
The result is different when event modifier exclude_kernel is also set.
Invocation with event modifier for user space event counting fails.

Output before:

 # perf stat -e cpum_cf/cpu_cycles/u -- true

 Performance counter stats for 'true':

   <not supported>      cpum_cf/cpu_cycles/u

       0.000761033 seconds time elapsed

       0.000076000 seconds user
       0.000725000 seconds sys

 #

Output after:
 # perf stat -e cpum_cf/cpu_cycles/u -- true

 Performance counter stats for 'true':

           349,613      cpum_cf/cpu_cycles/u

       0.000844143 seconds time elapsed

       0.000079000 seconds user
       0.000800000 seconds sys
 #

Fixes: 6a82e23f45 ("s390/cpumf: Adjust registration of s390 PMU device drivers")
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
[agordeev@linux.ibm.com corrected commit ID of Fixes commit]
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2022-06-23 17:23:59 +02:00
Alexander Gordeev
af2debd58b s390/crash: make copy_oldmem_page() return number of bytes copied
Callback copy_oldmem_page() returns either error code or zero.
Instead, it should return the error code or number of bytes copied.

Fixes: df9694c797 ("s390/dump: streamline oldmem copy functions")
Reviewed-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Tested-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2022-06-23 17:23:30 +02:00
Alexander Gordeev
cc02e6e21a s390/crash: add missing iterator advance in copy_oldmem_page()
In case old memory was successfully copied the passed iterator
should be advanced as well. Currently copy_oldmem_page() is
always called with single-segment iterator. Should that ever
change - copy_oldmem_user and copy_oldmem_kernel() functions
would need a rework to deal with multi-segment iterators.

Fixes: 5d8de293c2 ("vmcore: convert copy_oldmem_page() to take an iov_iter")
Reviewed-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Tested-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2022-06-23 17:23:30 +02:00
Paolo Bonzini
5552de7b92 KVM: s390: pvdump and selftest improvements
- add an interface to provide a hypervisor dump for secure guests
 - improve selftests to show tests
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+SKTgaM0CPnbq/vKEXu8gLWmHHwFAmKXf2wACgkQEXu8gLWm
 HHzu1Q//WjEuOX5nBjklMUlDB2oB2+vFSyW9lE7x9m38EnFTH8QTfH695ChVoNN+
 j06Fhd4ENjxqTTYs7z67tP4TSQ/LhB/GsPydKCEOnB/63+k2cnYeS3wsv19213F0
 IyvpN6MkzxoktV4m1EtKhlvXGpEBoXZCczgBLj3FYlNQ7kO8RsSkF9rOnhuP9Yjh
 l2876bWHWlbU0qWmRSAu0spkwHWjtyh/bnQKzXotQyrQ9bo1yMQvhe2HH8HVTSio
 cjRlseWVi01rJKzKcs6D7MFMctLKr5y0onxBgGJnRh27KoBY195ICH2Jz2LfJoor
 EP57YcXZqfxzKCGHTGgVYMgFeixX6nzBgqTpDIHMQzvoM1IrQKl+d5riepO03xpS
 gZxHtJqZi8s+t8w0ZFBHj83VXkzFyLuCIeui9vo3cQ00K7bBrNUSw1BAdqT5HTzW
 K2R4jSQaszjw8mDz3R3G1+yg6PjMS6cDEU1+G2Id7xSYTV3lJnBDVzas7aEUNCC4
 LzIrD5c4dscyZzIjAp9huVwpZoCNLy6jtecRTaGhA2YiE0VMWtJlMJHwbShlSnM7
 5VhEn859namvoYtN8XBaTFa/jRDOxO+LHWuOy172oaBUgaVHBjZQLyrlit1FRQvT
 SVruCmgtJ7u7RD/8uVDfPNR05DTSWQYzklJoKx2avKZj5FIx7ms=
 =/6Ue
 -----END PGP SIGNATURE-----

Merge tag 'kvm-s390-next-5.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD

KVM: s390: pvdump and selftest improvements

- add an interface to provide a hypervisor dump for secure guests
- improve selftests to show tests
2022-06-07 12:28:53 -04:00
Linus Torvalds
1ec6574a3c This set of changes updates init and user mode helper tasks to be
ordinary user mode tasks.
 
 In commit 40966e316f ("kthread: Ensure struct kthread is present for
 all kthreads") caused init and the user mode helper threads that call
 kernel_execve to have struct kthread allocated for them.  This struct
 kthread going away during execve in turned made a use after free of
 struct kthread possible.
 
 The commit 343f4c49f2 ("kthread: Don't allocate kthread_struct for
 init and umh") is enough to fix the use after free and is simple enough
 to be backportable.
 
 The rest of the changes pass struct kernel_clone_args to clean things
 up and cause the code to make sense.
 
 In making init and the user mode helpers tasks purely user mode tasks
 I ran into two complications.  The function task_tick_numa was
 detecting tasks without an mm by testing for the presence of
 PF_KTHREAD.  The initramfs code in populate_initrd_image was using
 flush_delayed_fput to ensuere the closing of all it's file descriptors
 was complete, and flush_delayed_fput does not work in a userspace thread.
 
 I have looked and looked and more complications and in my code review
 I have not found any, and neither has anyone else with the code sitting
 in linux-next.
 
 Link: https://lkml.kernel.org/r/87mtfu4up3.fsf@email.froward.int.ebiederm.org
 
 Eric W. Biederman (8):
       kthread: Don't allocate kthread_struct for init and umh
       fork: Pass struct kernel_clone_args into copy_thread
       fork: Explicity test for idle tasks in copy_thread
       fork: Generalize PF_IO_WORKER handling
       init: Deal with the init process being a user mode process
       fork: Explicitly set PF_KTHREAD
       fork: Stop allowing kthreads to call execve
       sched: Update task_tick_numa to ignore tasks without an mm
 
  arch/alpha/kernel/process.c      | 13 ++++++------
  arch/arc/kernel/process.c        | 13 ++++++------
  arch/arm/kernel/process.c        | 12 ++++++-----
  arch/arm64/kernel/process.c      | 12 ++++++-----
  arch/csky/kernel/process.c       | 15 ++++++-------
  arch/h8300/kernel/process.c      | 10 ++++-----
  arch/hexagon/kernel/process.c    | 12 ++++++-----
  arch/ia64/kernel/process.c       | 15 +++++++------
  arch/m68k/kernel/process.c       | 12 ++++++-----
  arch/microblaze/kernel/process.c | 12 ++++++-----
  arch/mips/kernel/process.c       | 13 ++++++------
  arch/nios2/kernel/process.c      | 12 ++++++-----
  arch/openrisc/kernel/process.c   | 12 ++++++-----
  arch/parisc/kernel/process.c     | 18 +++++++++-------
  arch/powerpc/kernel/process.c    | 15 +++++++------
  arch/riscv/kernel/process.c      | 12 ++++++-----
  arch/s390/kernel/process.c       | 12 ++++++-----
  arch/sh/kernel/process_32.c      | 12 ++++++-----
  arch/sparc/kernel/process_32.c   | 12 ++++++-----
  arch/sparc/kernel/process_64.c   | 12 ++++++-----
  arch/um/kernel/process.c         | 15 +++++++------
  arch/x86/include/asm/fpu/sched.h |  2 +-
  arch/x86/include/asm/switch_to.h |  8 +++----
  arch/x86/kernel/fpu/core.c       |  4 ++--
  arch/x86/kernel/process.c        | 18 +++++++++-------
  arch/xtensa/kernel/process.c     | 17 ++++++++-------
  fs/exec.c                        |  8 ++++---
  include/linux/sched/task.h       |  8 +++++--
  init/initramfs.c                 |  2 ++
  init/main.c                      |  2 +-
  kernel/fork.c                    | 46 +++++++++++++++++++++++++++++++++-------
  kernel/sched/fair.c              |  2 +-
  kernel/umh.c                     |  6 +++---
  33 files changed, 234 insertions(+), 160 deletions(-)
 
 Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEgjlraLDcwBA2B+6cC/v6Eiajj0AFAmKaR/MACgkQC/v6Eiaj
 j0Aayg/7Bx66872d9c6igkJ+MPCTuh+v9QKCGwiYEmiU4Q5sVAFB0HPJO27qC14u
 630X0RFNZTkPzNNEJNIW4kw6Dj8s8YRKf+FgQAVt4SzdRwT7eIPDjk1nGraopPJ3
 O04pjvuTmUyidyViRyFcf2ptx/pnkrwP8jUSc+bGTgfASAKAgAokqKE5ecjewbBc
 Y/EAkQ6QW7KxPjeSmpAHwI+t3BpBev9WEC4PbhRhsBCQFO2+PJiklvqdhVNBnIjv
 qUezll/1xv9UYgniB15Q4Nb722SmnWSU3r8as1eFPugzTHizKhufrrpyP+KMK1A0
 tdtEJNs5t2DZF7ZbGTFSPqJWmyTYLrghZdO+lOmnaSjHxK4Nda1d4NzbefJ0u+FE
 tutewowvHtBX6AFIbx+H3O+DOJM2IgNMf+ReQDU/TyNyVf3wBrTbsr9cLxypIJIp
 zze8npoLMlB7B4yxVo5ES5e63EXfi3iHl0L3/1EhoGwriRz1kWgVLUX/VZOUpscL
 RkJHsW6bT8sqxPWAA5kyWjEN+wNR2PxbXi8OE4arT0uJrEBMUgDCzydzOv5tJB00
 mSQdytxH9LVdsmxBKAOBp5X6WOLGA4yb1cZ6E/mEhlqXMpBDF1DaMfwbWqxSYi4q
 sp5zU3SBAW0qceiZSsWZXInfbjrcQXNV/DkDRDO9OmzEZP4m1j0=
 =x6fy
 -----END PGP SIGNATURE-----

Merge tag 'kthread-cleanups-for-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace

Pull kthread updates from Eric Biederman:
 "This updates init and user mode helper tasks to be ordinary user mode
  tasks.

  Commit 40966e316f ("kthread: Ensure struct kthread is present for
  all kthreads") caused init and the user mode helper threads that call
  kernel_execve to have struct kthread allocated for them. This struct
  kthread going away during execve in turned made a use after free of
  struct kthread possible.

  Here, commit 343f4c49f2 ("kthread: Don't allocate kthread_struct for
  init and umh") is enough to fix the use after free and is simple
  enough to be backportable.

  The rest of the changes pass struct kernel_clone_args to clean things
  up and cause the code to make sense.

  In making init and the user mode helpers tasks purely user mode tasks
  I ran into two complications. The function task_tick_numa was
  detecting tasks without an mm by testing for the presence of
  PF_KTHREAD. The initramfs code in populate_initrd_image was using
  flush_delayed_fput to ensuere the closing of all it's file descriptors
  was complete, and flush_delayed_fput does not work in a userspace
  thread.

  I have looked and looked and more complications and in my code review
  I have not found any, and neither has anyone else with the code
  sitting in linux-next"

* tag 'kthread-cleanups-for-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  sched: Update task_tick_numa to ignore tasks without an mm
  fork: Stop allowing kthreads to call execve
  fork: Explicitly set PF_KTHREAD
  init: Deal with the init process being a user mode process
  fork: Generalize PF_IO_WORKER handling
  fork: Explicity test for idle tasks in copy_thread
  fork: Pass struct kernel_clone_args into copy_thread
  kthread: Don't allocate kthread_struct for init and umh
2022-06-03 16:03:05 -07:00
Linus Torvalds
4ab6cfc4ad more s390 updates for 5.19 merge window
- Add Eric Farman as maintainer for s390 virtio drivers.
 
 - Improve machine check handling, and avoid incorrectly injecting a machine
   check into a kvm guest.
 
 - Add cond_resched() call to gmap page table walker in order to avoid
   possible huge latencies. Also use non-quiesing sske instruction to speed
   up storage key handling.
 
 - Add __GFP_NORETRY to KEXEC_CONTROL_MEMORY_GFP so s390 behaves similar like
   common code.
 
 - Get sie control block address from correct stack slot in perf event
   code. This fixes potential random memory accesses.
 
 - Change uaccess code so that the exception handler sets the result of
   get_user() and __get_kernel_nofault() to zero in case of a fault. Until
   now this was done via input parameters for inline assemblies. Doing it
   via fault handling is what most or even all other architectures are
   doing.
 
 - Couple of other small cleanups and fixes.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEECMNfWEw3SLnmiLkZIg7DeRspbsIFAmKZ2IMACgkQIg7DeRsp
 bsIXgg/+JeKOFIfwWWZuk2/2drITjQxWL6OwAV/vHVL9bYVGGRgFHTfF7Qslr2Hb
 j/kh55hajjMt00bVS6P52I7cC+xB6PcxPK4jF+u2FXcL187QnWZg7VD7qkFO3E7Q
 G5jiWub0eNfd3ijytzSO1yLwv3Rh6GEIOi7lRkk88Fe2B9l0AaXbvnr7rrIoG1SS
 TCPUoCCNKEH+xPmujdN5B6CDK2ldukcPZHtAJ9Qxu6DWAWIxh+hHr/c1zW9/7kDj
 Vogc3gcgApeXTsMZu38c2tFiv6wxvg37cMa0EW+l5zkyeFn+a1CLSxA/qPi0N1UY
 pFcxrlWXWshr7vKn16lJoCe1nBVyfb9ohToLKQtnWB7RZ86lP79tVjOpiQ4qi+jl
 54yx/rdtycEDaC0w4ab5kfZPc9/EeaY4ppaOLjh4+r/Ve3x+EMcAZ9Q2Ei3ltRBy
 75kswnRvHDWMbtS8rNecdk29QNRvLpnzpGLWQ4wGCV7V9FzTtJwO5mdjuibhSoI8
 9dZw4/HGMeA2P1pAE/d1YqKbdgqlNzDyU1dYpk6PVBI0I9mUu36eZXQnjrBN++ki
 8TWQYkBv6vnUHJmkfEK7B/thYwljzlQJSje4ebj0aPWzBJ9An+U5ozoANlAu97MR
 /EVZM67snrT6aSOjhF6SNjWqMGBOlMVecF+GCsVrFEeah+qrYq4=
 =BOi4
 -----END PGP SIGNATURE-----

Merge tag 's390-5.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull more s390 updates from Heiko Carstens:
 "Just a couple of small improvements, bug fixes and cleanups:

   - Add Eric Farman as maintainer for s390 virtio drivers.

   - Improve machine check handling, and avoid incorrectly injecting a
     machine check into a kvm guest.

   - Add cond_resched() call to gmap page table walker in order to avoid
     possible huge latencies. Also use non-quiesing sske instruction to
     speed up storage key handling.

   - Add __GFP_NORETRY to KEXEC_CONTROL_MEMORY_GFP so s390 behaves
     similar like common code.

   - Get sie control block address from correct stack slot in perf event
     code. This fixes potential random memory accesses.

   - Change uaccess code so that the exception handler sets the result
     of get_user() and __get_kernel_nofault() to zero in case of a
     fault. Until now this was done via input parameters for inline
     assemblies. Doing it via fault handling is what most or even all
     other architectures are doing.

   - Couple of other small cleanups and fixes"

* tag 's390-5.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/stack: add union to reflect kvm stack slot usages
  s390/stack: merge empty stack frame slots
  s390/uaccess: whitespace cleanup
  s390/uaccess: use __noreturn instead of __attribute__((noreturn))
  s390/uaccess: use exception handler to zero result on get_user() failure
  s390/uaccess: use symbolic names for inline assembler operands
  s390/mcck: isolate SIE instruction when setting CIF_MCCK_GUEST flag
  s390/mm: use non-quiescing sske for KVM switch to keyed guest
  s390/gmap: voluntarily schedule during key setting
  MAINTAINERS: Update s390 virtio-ccw
  s390/kexec: add __GFP_NORETRY to KEXEC_CONTROL_MEMORY_GFP
  s390/Kconfig.debug: fix indentation
  s390/Kconfig: fix indentation
  s390/perf: obtain sie_block from the right address
  s390: generate register offsets into pt_regs automatically
  s390: simplify early program check handler
  s390/crypto: fix scatterwalk_unmap() callers in AES-GCM
2022-06-03 13:57:50 -07:00
Janosch Frank
38c218259d s390/uv: Add dump fields to query
The new dump feature requires us to know how much memory is needed for
the "dump storage state" and "dump finalize" ultravisor call. These
values are reported via the UV query call.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Steffen Eiden <seiden@linux.ibm.com>
Link: https://lore.kernel.org/r/20220517163629.3443-3-frankja@linux.ibm.com
Message-Id: <20220517163629.3443-3-frankja@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
2022-06-01 16:57:14 +02:00
Janosch Frank
ac640db3a0 s390/uv: Add SE hdr query information
We have information about the supported se header version and pcf bits
so let's expose it via the sysfs files.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Steffen Eiden <seiden@linux.ibm.com>
Link: https://lore.kernel.org/r/20220517163629.3443-2-frankja@linux.ibm.com
Message-Id: <20220517163629.3443-2-frankja@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
2022-06-01 16:57:14 +02:00
Heiko Carstens
e0ffcf3fe1 s390/stack: add union to reflect kvm stack slot usages
Add a union which describes how the empty stack slots are being used
by kvm and perf. This should help to avoid another bug like the one
which was fixed with commit c9bfb460c3 ("s390/perf: obtain sie_block
from the right address").

Reviewed-by: Nico Boehr <nrb@linux.ibm.com>
Tested-by: Nico Boehr <nrb@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2022-06-01 12:03:17 +02:00
Heiko Carstens
f037acb41d s390/stack: merge empty stack frame slots
Merge empty1 and empty2 arrays within the stack frame to one single
array. This is possible since with commit 42b01a553a ("s390: always
use the packed stack layout") the alternative stack frame layout is
gone.

Reviewed-by: Nico Boehr <nrb@linux.ibm.com>
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2022-06-01 12:03:17 +02:00
Alexander Gordeev
29ccaa4b35 s390/mcck: isolate SIE instruction when setting CIF_MCCK_GUEST flag
Commit d768bd892f ("s390: add options to change branch prediction
behaviour for the kernel") introduced .Lsie_exit label - supposedly
to fence off SIE instruction. However, the corresponding address
range length .Lsie_crit_mcck_length was not updated, which led to
BPON code potentionally marked with CIF_MCCK_GUEST flag.

Both .Lsie_exit and .Lsie_crit_mcck_length were removed with commit
0b0ed657fe ("s390: remove critical section cleanup from entry.S"),
but the issue persisted - currently BPOFF and BPENTER macros might
get wrongly considered by the machine check handler as a guest.

Fixes: d768bd892f ("s390: add options to change branch prediction behaviour for the kernel")
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2022-06-01 12:03:16 +02:00
Linus Torvalds
76bfd3de34 tracing updates for 5.19:
- The majority of the changes are for fixes and clean ups.
 
 Noticeable changes:
 
 - Rework trace event triggers code to be easier to interact with.
 
 - Support for embedding bootconfig with the kernel (as suppose to having it
   embedded in initram). This is useful for embedded boards without initram
   disks.
 
 - Speed up boot by parallelizing the creation of tracefs files.
 
 - Allow absolute ring buffer timestamps handle timestamps that use more than
   59 bits.
 
 - Added new tracing clock "TAI" (International Atomic Time)
 
 - Have weak functions show up in available_filter_function list as:
    __ftrace_invalid_address___<invalid-offset>
   instead of using the name of the function before it.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCYpOgXRQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qjkKAQDbpemxvpFyJlZqT8KgEIXubu+ag2/q
 p0XDHaPS0zF9OQEAjTxg6GMEbnFYl6fzxZtOoEbiaQ7ppfdhRI8t6sSMVA8=
 =+nDD
 -----END PGP SIGNATURE-----

Merge tag 'trace-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing updates from Steven Rostedt:
 "The majority of the changes are for fixes and clean ups.

  Notable changes:

   - Rework trace event triggers code to be easier to interact with.

   - Support for embedding bootconfig with the kernel (as suppose to
     having it embedded in initram). This is useful for embedded boards
     without initram disks.

   - Speed up boot by parallelizing the creation of tracefs files.

   - Allow absolute ring buffer timestamps handle timestamps that use
     more than 59 bits.

   - Added new tracing clock "TAI" (International Atomic Time)

   - Have weak functions show up in available_filter_function list as:
     __ftrace_invalid_address___<invalid-offset> instead of using the
     name of the function before it"

* tag 'trace-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (52 commits)
  ftrace: Add FTRACE_MCOUNT_MAX_OFFSET to avoid adding weak function
  tracing: Fix comments for event_trigger_separate_filter()
  x86/traceponit: Fix comment about irq vector tracepoints
  x86,tracing: Remove unused headers
  ftrace: Clean up hash direct_functions on register failures
  tracing: Fix comments of create_filter()
  tracing: Disable kcov on trace_preemptirq.c
  tracing: Initialize integer variable to prevent garbage return value
  ftrace: Fix typo in comment
  ftrace: Remove return value of ftrace_arch_modify_*()
  tracing: Cleanup code by removing init "char *name"
  tracing: Change "char *" string form to "char []"
  tracing/timerlat: Do not wakeup the thread if the trace stops at the IRQ
  tracing/timerlat: Print stacktrace in the IRQ handler if needed
  tracing/timerlat: Notify IRQ new max latency only if stop tracing is set
  kprobes: Fix build errors with CONFIG_KRETPROBES=n
  tracing: Fix return value of trace_pid_write()
  tracing: Fix potential double free in create_var_ref()
  tracing: Use strim() to remove whitespace instead of doing it manually
  ftrace: Deal with error return code of the ftrace_process_locs() function
  ...
2022-05-29 10:31:36 -07:00
Linus Torvalds
6f664045c8 Not a lot of material this cycle. Many singleton patches against various
subsystems.   Most notably some maintenance work in ocfs2 and initramfs.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCYo/6xQAKCRDdBJ7gKXxA
 jkD9AQCPczLBbRWpe1edL+5VHvel9ePoHQmvbHQnufdTh9rB5QEAu0Uilxz4q9cx
 xSZypNhj2n9f8FCYca/ZrZneBsTnAA8=
 =AJEO
 -----END PGP SIGNATURE-----

Merge tag 'mm-nonmm-stable-2022-05-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull misc updates from Andrew Morton:
 "The non-MM patch queue for this merge window.

  Not a lot of material this cycle. Many singleton patches against
  various subsystems. Most notably some maintenance work in ocfs2
  and initramfs"

* tag 'mm-nonmm-stable-2022-05-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (65 commits)
  kcov: update pos before writing pc in trace function
  ocfs2: dlmfs: fix error handling of user_dlm_destroy_lock
  ocfs2: dlmfs: don't clear USER_LOCK_ATTACHED when destroying lock
  fs/ntfs: remove redundant variable idx
  fat: remove time truncations in vfat_create/vfat_mkdir
  fat: report creation time in statx
  fat: ignore ctime updates, and keep ctime identical to mtime in memory
  fat: split fat_truncate_time() into separate functions
  MAINTAINERS: add Muchun as a memcg reviewer
  proc/sysctl: make protected_* world readable
  ia64: mca: drop redundant spinlock initialization
  tty: fix deadlock caused by calling printk() under tty_port->lock
  relay: remove redundant assignment to pointer buf
  fs/ntfs3: validate BOOT sectors_per_clusters
  lib/string_helpers: fix not adding strarray to device's resource list
  kernel/crash_core.c: remove redundant check of ck_cmdline
  ELF, uapi: fixup ELF_ST_TYPE definition
  ipc/mqueue: use get_tree_nodev() in mqueue_get_tree()
  ipc: update semtimedop() to use hrtimer
  ipc/sem: remove redundant assignments
  ...
2022-05-27 11:22:03 -07:00
Li kunyu
3a2bfec0b0 ftrace: Remove return value of ftrace_arch_modify_*()
All instances of the function ftrace_arch_modify_prepare() and
  ftrace_arch_modify_post_process() return zero. There's no point in
  checking their return value. Just have them be void functions.

Link: https://lkml.kernel.org/r/20220518023639.4065-1-kunyu@nfschina.com

Signed-off-by: Li kunyu <kunyu@nfschina.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-05-26 21:13:00 -04:00
Nico Boehr
c9bfb460c3 s390/perf: obtain sie_block from the right address
Since commit 1179f170b6 ("s390: fix fpu restore in entry.S"), the
sie_block pointer is located at empty1[1], but in sie_block() it was
taken from empty1[0].

This leads to a random pointer being dereferenced, possibly causing
system crash.

This problem can be observed when running a simple guest with an endless
loop and recording the cpu-clock event:

  sudo perf kvm --guestvmlinux=<guestkernel> --guest top -e cpu-clock

With this fix, the correct guest address is shown.

Fixes: 1179f170b6 ("s390: fix fpu restore in entry.S")
Cc: stable@vger.kernel.org
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Nico Boehr <nrb@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2022-05-25 11:46:02 +02:00
Heiko Carstens
3384f135e9 s390: generate register offsets into pt_regs automatically
Use asm offsets method to generate register offsets into pt_regs,
instead of open-coding at several places.

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2022-05-25 11:46:02 +02:00
Heiko Carstens
85806016ac s390: simplify early program check handler
Due to historic reasons the base program check handler calls a
configurable function. Given that there is only the early program
check handler left, simplify the code by directly calling that
function.

The only other user was removed with commit d485235b00 ("s390:
assume diag308 set always works").

Also rename all functions and the asm file to reflect this.

Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2022-05-25 11:46:02 +02:00
Heiko Carstens
4c25f0ff63 s390/entry: workaround llvm's IAS limitations
llvm's integrated assembler cannot handle immediate values which are
calculated with two local labels:

<instantiation>:3:13: error: invalid operand for instruction
 clgfi %r14,.Lsie_done - .Lsie_gmap

Workaround this by adding clang specific code which reads the specific
value from memory. Since this code is within the hot paths of the kernel
and adds an additional memory reference, keep the original code, and add
ifdef'ed code.

Acked-by: Alexander Gordeev <agordeev@linux.ibm.com>
Link: https://lore.kernel.org/r/20220511120532.2228616-5-hca@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2022-05-17 15:16:28 +02:00
Heiko Carstens
e6ed91fd07 s390/alternatives: remove padding generation code
clang fails to handle ".if" statements in inline assembly which are heavily
used in the alternatives code.

To work around this remove this code, and enforce that users of
alternatives must specify original and alternative instruction sequences
which have identical sizes. Add a compile time check with two ".org"
statements similar to arm64.

In result not only clang can handle this, but also quite a lot of code can
be removed.

Acked-by: Vasily Gorbik <gor@linux.ibm.com>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Link: https://github.com/ClangBuiltLinux/linux/issues/1356
Link: https://lore.kernel.org/r/20220511120532.2228616-3-hca@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2022-05-17 15:16:28 +02:00
Heiko Carstens
fad442d3ab s390/alternatives: provide identical sized orginal/alternative sequences
Explicitly provide identical sized original/alternative instruction
sequences. This way there is no need for the s390 specific alternatives
infrastructure to generate padding sequences.
The code which generates such sequences will be removed with a follow on
patch.

Acked-by: Vasily Gorbik <gor@linux.ibm.com>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Link: https://lore.kernel.org/r/20220511120532.2228616-2-hca@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2022-05-17 15:16:28 +02:00
Thomas Richter
c9311de716 s390/cpumf: add new extended counter set for IBM z16
Export the extended counter set counters of the IBM z16 via sysfs.

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2022-05-16 10:58:33 +02:00
Sven Schnelle
5ace65ebb5 s390/stp: clock_delta should be signed
clock_delta is declared as unsigned long in various places. However,
the clock sync delta can be negative. This would add a huge positive
offset in clock_sync_global where clock_delta is added to clk.eitod
which is a 72 bit integer. Declare it as signed long to fix this.

Cc: stable@vger.kernel.org
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2022-05-11 14:40:57 +02:00
Sven Schnelle
03780c83c7 s390/stp: fix todoff size
The size of the TOD offset field in the stp info response is 64 bits.

Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@de.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2022-05-11 14:40:57 +02:00
Thomas Richter
39d62336f5 s390/pai: add support for cryptography counters
PMU device driver perf_pai_crypto supports Processor Activity
Instrumentation (PAI), available with IBM z16:
- maps a full page to lowcore address 0x1500.
- uses CR0 bit 13 to turn PAI crypto counting on and off.
- creates a sample with raw data on each context switch out when
  at context switch some mapped counters have a value of nonzero.
This device driver only supports CPU wide context, no task context
is allowed.

Support for counting:
- one or more counters can be specified using
  perf stat -e pai_crypto/xxx/
  where xxx stands for the counter event name. Multiple invocation
  of this command is possible. The counter names are listed in
  /sys/devices/pai_crypto/events directory.
- one special counters can be specified using
  perf stat -e pai_crypto/CRYPTO_ALL/
  which returns the sum of all incremented crypto counters.
- one event pai_crypto/CRYPTO_ALL/ is reserved for sampling.
  No multiple invocations are possible. The event collects data at
  context switch out and saves them in the ring buffer.

Add qpaci assembly instruction to query supported memory mapped crypto
counters. It returns the number of counters (no holes allowed in that
range).

The PAI crypto counter events are system wide and can not be executed
in parallel. Therefore some restrictions documented in function
paicrypt_busy apply.
In particular event CRYPTO_ALL for sampling must run exclusive.
Only counting events can run in parallel.

PAI crypto counter events can not be created when a CPU hot plug
add is processed. This means a CPU hot plug add does not get
the necessary PAI event to record PAI cryptography counter increments
on the newly added CPU. CPU hot plug remove removes the event and
terminates the counting of PAI counters immediately.

Co-developed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Juergen Christ <jchrist@linux.ibm.com>
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Link: https://lore.kernel.org/r/20220504062351.2954280-3-tmricht@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2022-05-09 11:50:01 +02:00
Eric W. Biederman
5bd2e97c86 fork: Generalize PF_IO_WORKER handling
Add fn and fn_arg members into struct kernel_clone_args and test for
them in copy_thread (instead of testing for PF_KTHREAD | PF_IO_WORKER).
This allows any task that wants to be a user space task that only runs
in kernel mode to use this functionality.

The code on x86 is an exception and still retains a PF_KTHREAD test
because x86 unlikely everything else handles kthreads slightly
differently than user space tasks that start with a function.

The functions that created tasks that start with a function
have been updated to set ".fn" and ".fn_arg" instead of
".stack" and ".stack_size".  These functions are fork_idle(),
create_io_thread(), kernel_thread(), and user_mode_thread().

Link: https://lkml.kernel.org/r/20220506141512.516114-4-ebiederm@xmission.com
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2022-05-07 09:01:59 -05:00
Eric W. Biederman
c5febea095 fork: Pass struct kernel_clone_args into copy_thread
With io_uring we have started supporting tasks that are for most
purposes user space tasks that exclusively run code in kernel mode.

The kernel task that exec's init and tasks that exec user mode
helpers are also user mode tasks that just run kernel code
until they call kernel execve.

Pass kernel_clone_args into copy_thread so these oddball
tasks can be supported more cleanly and easily.

v2: Fix spelling of kenrel_clone_args on h8300
Link: https://lkml.kernel.org/r/20220506141512.516114-2-ebiederm@xmission.com
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2022-05-07 09:01:48 -05:00
Heiko Carstens
fcdc03f78d s390/compat: cleanup compat_linux.h header file
Remove various declarations from former s390 specific compat system
calls which have been removed with commit fef747bab3 ("s390: use
generic UID16 implementation"). While at it clean up the whole small
header file.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2022-05-06 20:45:16 +02:00
Heiko Carstens
29b06ad7e8 s390/entry: remove broken and not needed code
LLVM's integrated assembler reports the following error when compiling
entry.S:

<instantiation>:38:5: error: unknown token in expression
 tm %r8,0x0001 # coming from user space?

The correct instruction would have been tmhh instead of tm.
The current code is doing nothing, since (with gas) it get's
translated to a tm instruction which reads from real address 8, which
again contains always zero, and therefore the conditional code is
never executed.
Note that due to the missing displacement gas translates "%r8" into
"8(%r0)".

Also code inspection reveals that this conditional code is not needed.
Therefore remove it.

Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2022-05-06 20:45:15 +02:00
Heiko Carstens
67a9c428ef s390/ptrace: move short psw definitions to ptrace header file
The short psw definitions are contained in compat header files, however
short psws are not compat specific. Therefore move the definitions to
ptrace header file. This also gets rid of a compat header include in kvm
code.

Acked-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2022-05-06 20:45:15 +02:00
Matthew Wilcox (Oracle)
5d8de293c2 vmcore: convert copy_oldmem_page() to take an iov_iter
Patch series "Convert vmcore to use an iov_iter", v5.

For some reason several people have been sending bad patches to fix
compiler warnings in vmcore recently.  Here's how it should be done. 
Compile-tested only on x86.  As noted in the first patch, s390 should take
this conversion a bit further, but I'm not inclined to do that work
myself.


This patch (of 3):

Instead of passing in a 'buf' and 'userbuf' argument, pass in an iov_iter.
s390 needs more work to pass the iov_iter down further, or refactor, but
I'd be more comfortable if someone who can test on s390 did that work.

It's more convenient to convert the whole of read_from_oldmem() to take an
iov_iter at the same time, so rename it to read_from_oldmem_iter() and add
a temporary read_from_oldmem() wrapper that creates an iov_iter.

Link: https://lkml.kernel.org/r/20220408090636.560886-1-bhe@redhat.com
Link: https://lkml.kernel.org/r/20220408090636.560886-2-bhe@redhat.com
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Baoquan He <bhe@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-29 14:37:59 -07:00
Pingfan Liu
6260f6427c s390/irq: utilize RCU instead of irq_lock_sparse() in show_msi_interrupt()
As demonstrated by commit 74bdf7815d ("genirq: Speedup
show_interrupts()"), irq_desc can be accessed safely in RCU read section.

Hence here resorting to rcu read lock to get rid of irq_lock_sparse().

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Link: https://lore.kernel.org/r/20220422100212.22666-1-kernelfans@gmail.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2022-04-27 12:53:34 +02:00
Sven Schnelle
41cd81abaf s390/vdso: add vdso randomization
Randomize the address of vdso if randomize_va_space is enabled.
Note that this keeps the vdso address on the same PMD as the stack
to avoid allocating an extra page table just for vdso.

Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2022-04-25 13:54:15 +02:00
Sven Schnelle
9e37a2e854 s390/vdso: map vdso above stack
In the current code vdso is mapped below the stack. This is
problematic when programs mapped to the top of the address space
are allocating a lot of memory, because the heap will clash with
the vdso. To avoid this map the vdso above the stack and move
STACK_TOP so that it all fits into three level paging.

Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2022-04-25 13:54:14 +02:00
Sven Schnelle
57761da4dc s390/vdso: move vdso mapping to its own function
This is a preparation patch for adding vdso randomization to s390.
It adds a function vdso_size(), which will be used later in calculating
the STACK_TOP value. It also moves the vdso mapping into a new function
vdso_map(), to keep the code similar to other architectures.

Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2022-04-25 13:54:14 +02:00
Alexander Gordeev
7714e16f79 s390/smp: sort out physical vs virtual CPU0 lowcore pointer
SPX instruction called from set_prefix() expects physical
address of the lowcore to be installed, but instead the
virtual address is passed.

Note: this does not fix a bug currently, since virtual and
physical addresses are identical.

Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2022-04-25 13:54:13 +02:00
Alexander Egorenkov
2ba24343bd s390/kexec: set end-of-ipl flag in last diag308 call
If the facility IPL-complete-control is present then the last diag308
call made by kexec shall set the end-of-ipl flag in the subcode register
to signal the hypervisor that this is the last diag308 call made by Linux.
Only the diag308 calls made during a regular kexec need to set
the end-of-ipl flag, in all other cases the hypervisor will ignore it.

Signed-off-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2022-04-25 13:54:12 +02:00
Heiko Carstens
711136bb66 s390/kexec: silence -Warray-bounds warning
Just use absolute_pointer() like e.g. in commit 545c272232 ("alpha:
Silence -Warray-bounds warnings") to get rid of this warning:

arch/s390/kernel/machine_kexec.c:59:9: warning: ‘memcpy’ offset [0, 511] is out of the bounds [0, 0] [-Warray-bounds]

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2022-04-07 12:36:54 +02:00
Heiko Carstens
6203ac3029 s390: add z16 elf platform
Add detection for machine types 0x3931 and 0x3932 and set ELF platform
name to z16.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2022-04-06 13:08:50 +02:00
Linus Torvalds
9ae24d5aa0 s390 updates for the 5.18 merge window #2
- Add kretprobes framepointer verification and return address recovery
   in stacktrace.
 
 - Support control domain masks on custom zcrypt devices and filter admin
   requests.
 
 - Cleanup timer API usage.
 
 - Rework absolute lowcore access helpers.
 
 - Other various small improvements and fixes.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEE3QHqV+H2a8xAv27vjYWKoQLXFBgFAmJG+10ACgkQjYWKoQLX
 FBjvOggAhBOcGniD4LDvyPw6Mwp6x3+bnSL0Hv+UdjyTXEsj2/XpSQeAsG807mq0
 hKicEY44suVYrk6ywLCuG9tA6oaxr+t6g13Z/cuNqvdljMDFdeZw9ptIeAXfoUQb
 19fxDpVyAV9iWc9VfcjwhgOXep9fk6dPDpr5clE81zuAgn94hV31tHpCbMIYDxHa
 mceza1rdW5DnByCz2afYcgF72pW2hnLadOPWddbuqyVZV6jjJeZDDShAzf5MCJ6p
 e91yJh6RYDT1f1Yn+htz2Bw8URb9FKRnJRMHf35h98kHT3r0x6N6VUjYiQ66CAjE
 k02XBdJIwgXwgGtErj6lZsQZFjzT8g==
 =5yyr
 -----END PGP SIGNATURE-----

Merge tag 's390-5.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull more s390 updates from Vasily Gorbik:

 - Add kretprobes framepointer verification and return address recovery
   in stacktrace.

 - Support control domain masks on custom zcrypt devices and filter
   admin requests.

 - Cleanup timer API usage.

 - Rework absolute lowcore access helpers.

 - Other various small improvements and fixes.

* tag 's390-5.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (26 commits)
  s390/alternatives: avoid using jgnop mnemonic
  s390/pci: rename get_zdev_by_bus() to zdev_from_bus()
  s390/pci: improve zpci_dev reference counting
  s390/smp: use physical address for SIGP_SET_PREFIX command
  s390: cleanup timer API use
  s390/zcrypt: fix using the correct variable for sizeof()
  s390/vfio-ap: fix kernel doc and signature of group notifier functions
  s390/maccess: rework absolute lowcore accessors
  s390/smp: cleanup control register update routines
  s390/smp: cleanup target CPU callback starting
  s390/test_unwind: verify __kretprobe_trampoline is replaced
  s390/unwind: avoid duplicated unwinding entries for kretprobes
  s390/unwind: recover kretprobe modified return address in stacktrace
  s390/kprobes: enable kretprobes framepointer verification
  s390/test_unwind: extend kretprobe test
  s390/ap: adjust whitespace
  s390/ap: use insn format for new instructions
  s390/alternatives: use insn format for new instructions
  s390/alternatives: use instructions instead of byte patterns
  s390/traps: improve panic message for translation-specification exception
  ...
2022-04-01 13:26:31 -07:00
Linus Torvalds
b8321ed4a4 Kbuild updates for v5.18
- Add new environment variables, USERCFLAGS and USERLDFLAGS to allow
    additional flags to be passed to user-space programs.
 
  - Fix missing fflush() bugs in Kconfig and fixdep
 
  - Fix a minor bug in the comment format of the .config file
 
  - Make kallsyms ignore llvm's local labels, .L*
 
  - Fix UAPI compile-test for cross-compiling with Clang
 
  - Extend the LLVM= syntax to support LLVM=<suffix> form for using a
    particular version of LLVm, and LLVM=<prefix> form for using custom
    LLVM in a particular directory path.
 
  - Clean up Makefiles
 -----BEGIN PGP SIGNATURE-----
 
 iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAmJFGloVHG1hc2FoaXJv
 eUBrZXJuZWwub3JnAAoJED2LAQed4NsGH0kP/j6Vx5BqEv3tP2Q+UANxLqITleJs
 IFpbSesz/BhlG7I/IapWmCDSqFbYd5uJTO4ko8CsPmZHcxr6Gw3y+DN5yQACKaG/
 p9xiF6GjPyKR8+VdcT2tV50+dVY8ANe/DxCyzKrJd/uyYxgARPKJh0KRMNz+d9lj
 ixUpCXDhx/XlKzPIlcxrvhhjevKz+NnHmN0fe6rzcOw9KzBGBTsf20Q3PqUuBOKa
 rWHsRGcBPA8eKLfWT1Us1jjic6cT2g4aMpWjF20YgUWKHgWVKcNHpxYKGXASVo/z
 ewdDnNfmwo7f7fKMCDDro9iwFWV/BumGtn43U00tnqdBcTpFojPlEOga37UPbZDF
 nmTblGVUhR0vn4PmfBy8WkAkbW+IpVatKwJGV4J3KjSvdWvZOmVj9VUGLVAR0TXW
 /YcgRs6EtG8Hn0IlCj0fvZ5wRWoDLbP2DSZ67R/44EP0GaNQPwUe4FI1izEE4EYX
 oVUAIxcKixWGj4RmdtmtMMdUcZzTpbgS9uloMUmS3u9LK0Ir/8tcWaf2zfMO6Jl2
 p4Q31s1dUUKCnFnj0xDKRyKGUkxYebrHLfuBqi0RIc0xRpSlxoXe3Dynm9aHEQoD
 ZSV0eouQJxnaxM1ck5Bu4AHLgEebHfEGjWVyUHno7jFU5EI9Wpbqpe4pCYEEDTm1
 +LJMEpdZO0dFvpF+
 =84rW
 -----END PGP SIGNATURE-----

Merge tag 'kbuild-v5.18-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild updates from Masahiro Yamada:

 - Add new environment variables, USERCFLAGS and USERLDFLAGS to allow
   additional flags to be passed to user-space programs.

 - Fix missing fflush() bugs in Kconfig and fixdep

 - Fix a minor bug in the comment format of the .config file

 - Make kallsyms ignore llvm's local labels, .L*

 - Fix UAPI compile-test for cross-compiling with Clang

 - Extend the LLVM= syntax to support LLVM=<suffix> form for using a
   particular version of LLVm, and LLVM=<prefix> form for using custom
   LLVM in a particular directory path.

 - Clean up Makefiles

* tag 'kbuild-v5.18-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  kbuild: Make $(LLVM) more flexible
  kbuild: add --target to correctly cross-compile UAPI headers with Clang
  fixdep: use fflush() and ferror() to ensure successful write to files
  arch: syscalls: simplify uapi/kapi directory creation
  usr/include: replace extra-y with always-y
  certs: simplify empty certs creation in certs/Makefile
  certs: include certs/signing_key.x509 unconditionally
  kallsyms: ignore all local labels prefixed by '.L'
  kconfig: fix missing '# end of' for empty menu
  kconfig: add fflush() before ferror() check
  kbuild: replace $(if A,A,B) with $(or A,B)
  kbuild: Add environment variables for userprogs flags
  kbuild: unify cmd_copy and cmd_shipped
2022-03-31 11:59:03 -07:00
Masahiro Yamada
bbc90bc1bd arch: syscalls: simplify uapi/kapi directory creation
$(shell ...) expands to empty. There is no need to assign it to _dummy.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
2022-03-31 12:03:46 +09:00
Linus Torvalds
1930a6e739 ptrace: Cleanups for v5.18
This set of changes removes tracehook.h, moves modification of all of
 the ptrace fields inside of siglock to remove races, adds a missing
 permission check to ptrace.c
 
 The removal of tracehook.h is quite significant as it has been a major
 source of confusion in recent years.  Much of that confusion was
 around task_work and TIF_NOTIFY_SIGNAL (which I have now decoupled
 making the semantics clearer).
 
 For people who don't know tracehook.h is a vestiage of an attempt to
 implement uprobes like functionality that was never fully merged, and
 was later superseeded by uprobes when uprobes was merged.  For many
 years now we have been removing what tracehook functionaly a little
 bit at a time.  To the point where now anything left in tracehook.h is
 some weird strange thing that is difficult to understand.
 
 Eric W. Biederman (15):
       ptrace: Move ptrace_report_syscall into ptrace.h
       ptrace/arm: Rename tracehook_report_syscall report_syscall
       ptrace: Create ptrace_report_syscall_{entry,exit} in ptrace.h
       ptrace: Remove arch_syscall_{enter,exit}_tracehook
       ptrace: Remove tracehook_signal_handler
       task_work: Remove unnecessary include from posix_timers.h
       task_work: Introduce task_work_pending
       task_work: Call tracehook_notify_signal from get_signal on all architectures
       task_work: Decouple TIF_NOTIFY_SIGNAL and task_work
       signal: Move set_notify_signal and clear_notify_signal into sched/signal.h
       resume_user_mode: Remove #ifdef TIF_NOTIFY_RESUME in set_notify_resume
       resume_user_mode: Move to resume_user_mode.h
       tracehook: Remove tracehook.h
       ptrace: Move setting/clearing ptrace_message into ptrace_stop
       ptrace: Return the signal to continue with from ptrace_stop
 
 Jann Horn (1):
       ptrace: Check PTRACE_O_SUSPEND_SECCOMP permission on PTRACE_SEIZE
 
 Yang Li (1):
       ptrace: Remove duplicated include in ptrace.c
 
  MAINTAINERS                          |   1 -
  arch/Kconfig                         |   5 +-
  arch/alpha/kernel/ptrace.c           |   5 +-
  arch/alpha/kernel/signal.c           |   4 +-
  arch/arc/kernel/ptrace.c             |   5 +-
  arch/arc/kernel/signal.c             |   4 +-
  arch/arm/kernel/ptrace.c             |  12 +-
  arch/arm/kernel/signal.c             |   4 +-
  arch/arm64/kernel/ptrace.c           |  14 +--
  arch/arm64/kernel/signal.c           |   4 +-
  arch/csky/kernel/ptrace.c            |   5 +-
  arch/csky/kernel/signal.c            |   4 +-
  arch/h8300/kernel/ptrace.c           |   5 +-
  arch/h8300/kernel/signal.c           |   4 +-
  arch/hexagon/kernel/process.c        |   4 +-
  arch/hexagon/kernel/signal.c         |   1 -
  arch/hexagon/kernel/traps.c          |   6 +-
  arch/ia64/kernel/process.c           |   4 +-
  arch/ia64/kernel/ptrace.c            |   6 +-
  arch/ia64/kernel/signal.c            |   1 -
  arch/m68k/kernel/ptrace.c            |   5 +-
  arch/m68k/kernel/signal.c            |   4 +-
  arch/microblaze/kernel/ptrace.c      |   5 +-
  arch/microblaze/kernel/signal.c      |   4 +-
  arch/mips/kernel/ptrace.c            |   5 +-
  arch/mips/kernel/signal.c            |   4 +-
  arch/nds32/include/asm/syscall.h     |   2 +-
  arch/nds32/kernel/ptrace.c           |   5 +-
  arch/nds32/kernel/signal.c           |   4 +-
  arch/nios2/kernel/ptrace.c           |   5 +-
  arch/nios2/kernel/signal.c           |   4 +-
  arch/openrisc/kernel/ptrace.c        |   5 +-
  arch/openrisc/kernel/signal.c        |   4 +-
  arch/parisc/kernel/ptrace.c          |   7 +-
  arch/parisc/kernel/signal.c          |   4 +-
  arch/powerpc/kernel/ptrace/ptrace.c  |   8 +-
  arch/powerpc/kernel/signal.c         |   4 +-
  arch/riscv/kernel/ptrace.c           |   5 +-
  arch/riscv/kernel/signal.c           |   4 +-
  arch/s390/include/asm/entry-common.h |   1 -
  arch/s390/kernel/ptrace.c            |   1 -
  arch/s390/kernel/signal.c            |   5 +-
  arch/sh/kernel/ptrace_32.c           |   5 +-
  arch/sh/kernel/signal_32.c           |   4 +-
  arch/sparc/kernel/ptrace_32.c        |   5 +-
  arch/sparc/kernel/ptrace_64.c        |   5 +-
  arch/sparc/kernel/signal32.c         |   1 -
  arch/sparc/kernel/signal_32.c        |   4 +-
  arch/sparc/kernel/signal_64.c        |   4 +-
  arch/um/kernel/process.c             |   4 +-
  arch/um/kernel/ptrace.c              |   5 +-
  arch/x86/kernel/ptrace.c             |   1 -
  arch/x86/kernel/signal.c             |   5 +-
  arch/x86/mm/tlb.c                    |   1 +
  arch/xtensa/kernel/ptrace.c          |   5 +-
  arch/xtensa/kernel/signal.c          |   4 +-
  block/blk-cgroup.c                   |   2 +-
  fs/coredump.c                        |   1 -
  fs/exec.c                            |   1 -
  fs/io-wq.c                           |   6 +-
  fs/io_uring.c                        |  11 +-
  fs/proc/array.c                      |   1 -
  fs/proc/base.c                       |   1 -
  include/asm-generic/syscall.h        |   2 +-
  include/linux/entry-common.h         |  47 +-------
  include/linux/entry-kvm.h            |   2 +-
  include/linux/posix-timers.h         |   1 -
  include/linux/ptrace.h               |  81 ++++++++++++-
  include/linux/resume_user_mode.h     |  64 ++++++++++
  include/linux/sched/signal.h         |  17 +++
  include/linux/task_work.h            |   5 +
  include/linux/tracehook.h            | 226 -----------------------------------
  include/uapi/linux/ptrace.h          |   2 +-
  kernel/entry/common.c                |  19 +--
  kernel/entry/kvm.c                   |   9 +-
  kernel/exit.c                        |   3 +-
  kernel/livepatch/transition.c        |   1 -
  kernel/ptrace.c                      |  47 +++++---
  kernel/seccomp.c                     |   1 -
  kernel/signal.c                      |  62 +++++-----
  kernel/task_work.c                   |   4 +-
  kernel/time/posix-cpu-timers.c       |   1 +
  mm/memcontrol.c                      |   2 +-
  security/apparmor/domain.c           |   1 -
  security/selinux/hooks.c             |   1 -
  85 files changed, 372 insertions(+), 495 deletions(-)
 
 Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEgjlraLDcwBA2B+6cC/v6Eiajj0AFAmJCQkoACgkQC/v6Eiaj
 j0DCWQ/5AZVFU+hX32obUNCLackHTwgcCtSOs3JNBmNA/zL/htPiYYG0ghkvtlDR
 Dw5J5DnxC6P7PVAdAqrpvx2uX2FebHYU0bRlyLx8LYUEP5dhyNicxX9jA882Z+vw
 Ud0Ue9EojwGWS76dC9YoKUj3slThMATbhA2r4GVEoof8fSNJaBxQIqath44t0FwU
 DinWa+tIOvZANGBZr6CUUINNIgqBIZCH/R4h6ArBhMlJpuQ5Ufk2kAaiWFwZCkX4
 0LuuAwbKsCKkF8eap5I2KrIg/7zZVgxAg9O3cHOzzm8OPbKzRnNnQClcDe8perqp
 S6e/f3MgpE+eavd1EiLxevZ660cJChnmikXVVh8ZYYoefaMKGqBaBSsB38bNcLjY
 3+f2dB+TNBFRnZs1aCujK3tWBT9QyjZDKtCBfzxDNWBpXGLhHH6j6lA5Lj+Cef5K
 /HNHFb+FuqedlFZh5m1Y+piFQ70hTgCa2u8b+FSOubI2hW9Zd+WzINV0ANaZ2LvZ
 4YGtcyDNk1q1+c87lxP9xMRl/xi6rNg+B9T2MCo4IUnHgpSVP6VEB3osgUmrrrN0
 eQlUI154G/AaDlqXLgmn1xhRmlPGfmenkxpok1AuzxvNJsfLKnpEwQSc13g3oiZr
 disZQxNY0kBO2Nv3G323Z6PLinhbiIIFez6cJzK5v0YJ2WtO3pY=
 =uEro
 -----END PGP SIGNATURE-----

Merge tag 'ptrace-cleanups-for-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace

Pull ptrace cleanups from Eric Biederman:
 "This set of changes removes tracehook.h, moves modification of all of
  the ptrace fields inside of siglock to remove races, adds a missing
  permission check to ptrace.c

  The removal of tracehook.h is quite significant as it has been a major
  source of confusion in recent years. Much of that confusion was around
  task_work and TIF_NOTIFY_SIGNAL (which I have now decoupled making the
  semantics clearer).

  For people who don't know tracehook.h is a vestiage of an attempt to
  implement uprobes like functionality that was never fully merged, and
  was later superseeded by uprobes when uprobes was merged. For many
  years now we have been removing what tracehook functionaly a little
  bit at a time. To the point where anything left in tracehook.h was
  some weird strange thing that was difficult to understand"

* tag 'ptrace-cleanups-for-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  ptrace: Remove duplicated include in ptrace.c
  ptrace: Check PTRACE_O_SUSPEND_SECCOMP permission on PTRACE_SEIZE
  ptrace: Return the signal to continue with from ptrace_stop
  ptrace: Move setting/clearing ptrace_message into ptrace_stop
  tracehook: Remove tracehook.h
  resume_user_mode: Move to resume_user_mode.h
  resume_user_mode: Remove #ifdef TIF_NOTIFY_RESUME in set_notify_resume
  signal: Move set_notify_signal and clear_notify_signal into sched/signal.h
  task_work: Decouple TIF_NOTIFY_SIGNAL and task_work
  task_work: Call tracehook_notify_signal from get_signal on all architectures
  task_work: Introduce task_work_pending
  task_work: Remove unnecessary include from posix_timers.h
  ptrace: Remove tracehook_signal_handler
  ptrace: Remove arch_syscall_{enter,exit}_tracehook
  ptrace: Create ptrace_report_syscall_{entry,exit} in ptrace.h
  ptrace/arm: Rename tracehook_report_syscall report_syscall
  ptrace: Move ptrace_report_syscall into ptrace.h
2022-03-28 17:29:53 -07:00
Alexander Gordeev
7277b4216a s390/smp: use physical address for SIGP_SET_PREFIX command
Signal processor SIGP_SET_PREFIX command expects physical
address of the lowcore to be installed, but instead the
virtual address is provided.

Note: this does not fix a bug currently, since virtual and
physical addresses are identical.

Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2022-03-27 22:18:39 +02:00
Alexander Gordeev
ed0192bc64 s390/maccess: rework absolute lowcore accessors
Macro mem_assign_absolute() is able to access the whole memory, but
is only used and makes sense when updating the absolute lowcore.
Instead, introduce get_abs_lowcore() and put_abs_lowcore() macros
that limit access to absolute lowcore addresses only.

Suggested-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2022-03-27 22:18:39 +02:00
Alexander Gordeev
9097fc793f s390/smp: cleanup control register update routines
Get rid of duplicate code and redundant data.

Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2022-03-27 22:18:39 +02:00
Alexander Gordeev
dc2ab23b99 s390/smp: cleanup target CPU callback starting
Macro mem_assign_absolute() is used to initialize a target
CPU lowcore callback parameters. But despite the macro name
it writes to the absolute lowcore only if the target CPU is
offline. In case the CPU is online the macro does implicitly
write to the normal memory.

That behaviour is correct, but extremely subtle. Sacrifice
few program bits in favour of clarity and distinguish between
online vs offline CPUs and normal vs absolute lowcore pointer.

Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2022-03-27 22:18:39 +02:00
Vasily Gorbik
708b137639 s390/unwind: avoid duplicated unwinding entries for kretprobes
Currently when unwinding starts from pt_regs or encounters pt_regs along
the way unwinder tries to yield 2 unwinding entries:
1. (reliable)     ip1: pt_regs->psw.addr,  sp1: regs->gprs[15]
2. (non-reliable) ip2: sp1->gprs[8] (r14), sp2: regs->gprs[15]

In case of kretprobes those are identical and serves no other purpose
than causing confusion over duplicated entries and cause kprobes tests
to fail. So, skip a duplicate non-reliable entry in this case.

With that kretprobes and unwinder implementation now comply with
ARCH_CORRECT_STACKTRACE_ON_KRETPROBE.

Reviewed-by: Tobias Huschle <huschle@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2022-03-27 22:18:39 +02:00
Vasily Gorbik
d81675b60d s390/unwind: recover kretprobe modified return address in stacktrace
Based on commit cd9bc2c925 ("arm64: Recover kretprobe modified return
address in stacktrace").

"""
Since the kretprobe replaces the function return address with
the __kretprobe_trampoline on the stack, stack unwinder shows it
instead of the correct return address.

This checks whether the next return address is the
__kretprobe_trampoline(), and if so, try to find the correct
return address from the kretprobe instance list.
"""

Original patch series:
https://lore.kernel.org/all/163163030719.489837.2236069935502195491.stgit@devnote2/

Reviewed-by: Tobias Huschle <huschle@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2022-03-27 22:18:39 +02:00
Vasily Gorbik
09bc20c8fb s390/kprobes: enable kretprobes framepointer verification
Use regs->gprs[15] for framepointer verification. This enables
additional sanity checks for nested kretprobes.

Reviewed-by: Tobias Huschle <huschle@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2022-03-27 22:18:39 +02:00
Heiko Carstens
6982dba181 s390/alternatives: use insn format for new instructions
Use insn format with instruction format specifier instead of plain
longs. This way it is also more obvious that code instead of data is
generated.

The generated code is identical.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2022-03-27 22:18:39 +02:00
Heiko Carstens
f09354ffd8 s390/traps: improve panic message for translation-specification exception
There are many different types of translation exceptions but only a
translation-specification exception leads to a kernel panic since it
indicates corrupted page tables, which must never happen.

Improve the panic message so it is a bit more obvious what this is about.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2022-03-27 22:18:38 +02:00
Linus Torvalds
29c8c18363 Merge branch 'akpm' (patches from Andrew)
Merge yet more updates from Andrew Morton:
 "This is the material which was staged after willystuff in linux-next.

  Subsystems affected by this patch series: mm (debug, selftests,
  pagecache, thp, rmap, migration, kasan, hugetlb, pagemap, madvise),
  and selftests"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (113 commits)
  selftests: kselftest framework: provide "finished" helper
  mm: madvise: MADV_DONTNEED_LOCKED
  mm: fix race between MADV_FREE reclaim and blkdev direct IO read
  mm: generalize ARCH_HAS_FILTER_PGPROT
  mm: unmap_mapping_range_tree() with i_mmap_rwsem shared
  mm: warn on deleting redirtied only if accounted
  mm/huge_memory: remove stale locking logic from __split_huge_pmd()
  mm/huge_memory: remove stale page_trans_huge_mapcount()
  mm/swapfile: remove stale reuse_swap_page()
  mm/khugepaged: remove reuse_swap_page() usage
  mm/huge_memory: streamline COW logic in do_huge_pmd_wp_page()
  mm: streamline COW logic in do_swap_page()
  mm: slightly clarify KSM logic in do_swap_page()
  mm: optimize do_wp_page() for fresh pages in local LRU pagevecs
  mm: optimize do_wp_page() for exclusive pages in the swapcache
  mm/huge_memory: make is_transparent_hugepage() static
  userfaultfd/selftests: enable hugetlb remap and remove event testing
  selftests/vm: add hugetlb madvise MADV_DONTNEED MADV_REMOVE test
  mm: enable MADV_DONTNEED for hugetlb mappings
  kasan: disable LOCKDEP when printing reports
  ...
2022-03-25 10:21:20 -07:00
Linus Torvalds
d710d370c4 s390 updates for the 5.18 merge window
- Raise minimum supported machine generation to z10, which comes with
   various cleanups and code simplifications (usercopy/spectre
   mitigation/etc).
 
 - Rework extables and get rid of anonymous out-of-line fixups.
 
 - Page table helpers cleanup. Add set_pXd()/set_pte() helper
   functions. Covert pte_val()/pXd_val() macros to functions.
 
 - Optimize kretprobe handling by avoiding extra kprobe on
   __kretprobe_trampoline.
 
 - Add support for CEX8 crypto cards.
 
 - Allow to trigger AP bus rescan via writing to /sys/bus/ap/scans.
 
 - Add CONFIG_EXPOLINE_EXTERN option to build the kernel without COMDAT
   group sections which simplifies kpatch support.
 
 - Always use the packed stack layout and extend kernel unwinder tests.
 
 - Add sanity checks for ftrace code patching.
 
 - Add s390dbf debug log for the vfio_ap device driver.
 
 - Various virtual vs physical address confusion fixes.
 
 - Various small fixes and improvements all over the code.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEE3QHqV+H2a8xAv27vjYWKoQLXFBgFAmI94dsACgkQjYWKoQLX
 FBiaCggAm9xYJ06Qt9c+T9B7aA4Lt50w7Bnxqx1/Q7UHQQgDpkNhKzI1kt/xeKY4
 JgZQ9lJC4YRLlyfIVzffLI2DWGbl8BcTpuRWVLhPI5D2yHZBXr2ARe7IGFJueddy
 MVqU/r+U3H0r3obQeUc4TSrHtSRX7eQZWIoVuDU75b9fCniee/bmGZqs6yXPXXh4
 pTZQ/gsIhF/o6eBJLEXLjUAcIasxCk15GXWXmkaSwKHAhfYiintwGmtKqQ8etCvw
 17vdlTjA4ce+3ooD/hXGPa8TqeiGKsIB2Xr89x/48f1eJyp2zPJZ1ZvAUBHJBCNt
 b4sF4ql8303Lj7Be+LeqdlbXfa5PZg==
 =meZf
 -----END PGP SIGNATURE-----

Merge tag 's390-5.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 updates from Vasily Gorbik:

 - Raise minimum supported machine generation to z10, which comes with
   various cleanups and code simplifications (usercopy/spectre
   mitigation/etc).

 - Rework extables and get rid of anonymous out-of-line fixups.

 - Page table helpers cleanup. Add set_pXd()/set_pte() helper functions.
   Covert pte_val()/pXd_val() macros to functions.

 - Optimize kretprobe handling by avoiding extra kprobe on
   __kretprobe_trampoline.

 - Add support for CEX8 crypto cards.

 - Allow to trigger AP bus rescan via writing to /sys/bus/ap/scans.

 - Add CONFIG_EXPOLINE_EXTERN option to build the kernel without COMDAT
   group sections which simplifies kpatch support.

 - Always use the packed stack layout and extend kernel unwinder tests.

 - Add sanity checks for ftrace code patching.

 - Add s390dbf debug log for the vfio_ap device driver.

 - Various virtual vs physical address confusion fixes.

 - Various small fixes and improvements all over the code.

* tag 's390-5.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (69 commits)
  s390/test_unwind: add kretprobe tests
  s390/kprobes: Avoid additional kprobe in kretprobe handling
  s390: convert ".insn" encoding to instruction names
  s390: assume stckf is always present
  s390/nospec: move to single register thunks
  s390: raise minimum supported machine generation to z10
  s390/uaccess: Add copy_from/to_user_key functions
  s390/nospec: align and size extern thunks
  s390/nospec: add an option to use thunk-extern
  s390/nospec: generate single register thunks if possible
  s390/pci: make zpci_set_irq()/zpci_clear_irq() static
  s390: remove unused expoline to BC instructions
  s390/irq: use assignment instead of cast
  s390/traps: get rid of magic cast for per code
  s390/traps: get rid of magic cast for program interruption code
  s390/signal: fix typo in comments
  s390/asm-offsets: remove unused defines
  s390/test_unwind: avoid build warning with W=1
  s390: remove .fixup section
  s390/bpf: encode register within extable entry
  ...
2022-03-25 10:01:34 -07:00
Andrey Konovalov
63840de296 kasan, x86, arm64, s390: rename functions for modules shadow
Rename kasan_free_shadow to kasan_free_module_shadow and
kasan_module_alloc to kasan_alloc_module_shadow.

These functions are used to allocate/free shadow memory for kernel modules
when KASAN_VMALLOC is not enabled.  The new names better reflect their
purpose.

Also reword the comment next to their declaration to improve clarity.

Link: https://lkml.kernel.org/r/36db32bde765d5d0b856f77d2d806e838513fe84.1643047180.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Marco Elver <elver@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-24 19:06:47 -07:00
David Hildenbrand
2848a28b0a drivers/base/node: consolidate node device subsystem initialization in node_dev_init()
...  and call node_dev_init() after memory_dev_init() from driver_init(),
so before any of the existing arch/subsys calls.  All online nodes should
be known at that point: early during boot, arch code determines node and
zone ranges and sets the relevant nodes online; usually this happens in
setup_arch().

This is in line with memory_dev_init(), which initializes the memory
device subsystem and creates all memory block devices.

Similar to memory_dev_init(), panic() if anything goes wrong, we don't
want to continue with such basic initialization errors.

The important part is that node_dev_init() gets called after
memory_dev_init() and after cpu_dev_init(), but before any of the relevant
archs call register_cpu() to register the new cpu device under the node
device.  The latter should be the case for the current users of
topology_init().

Link: https://lkml.kernel.org/r/20220203105212.30385-1-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Tested-by: Anatoly Pugachev <matorola@gmail.com> (sparc64)
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-22 15:57:10 -07:00
Eric W. Biederman
355f841a3f tracehook: Remove tracehook.h
Now that all of the definitions have moved out of tracehook.h into
ptrace.h, sched/signal.h, resume_user_mode.h there is nothing left in
tracehook.h so remove it.

Update the few files that were depending upon tracehook.h to bring in
definitions to use the headers they need directly.

Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lkml.kernel.org/r/20220309162454.123006-13-ebiederm@xmission.com
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2022-03-10 16:51:51 -06:00
Eric W. Biederman
8ba62d3794 task_work: Call tracehook_notify_signal from get_signal on all architectures
Always handle TIF_NOTIFY_SIGNAL in get_signal.  With commit 35d0b389f3
("task_work: unconditionally run task_work from get_signal()") always
calling task_work_run all of the work of tracehook_notify_signal is
already happening except clearing TIF_NOTIFY_SIGNAL.

Factor clear_notify_signal out of tracehook_notify_signal and use it in
get_signal so that get_signal only needs one call of task_work_run.

To keep the semantics in sync update xfer_to_guest_mode_work (which
does not call get_signal) to call tracehook_notify_signal if either
_TIF_SIGPENDING or _TIF_NOTIFY_SIGNAL.

Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lkml.kernel.org/r/20220309162454.123006-8-ebiederm@xmission.com
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2022-03-10 16:51:36 -06:00
Tobias Huschle
63bf38ff5b s390/kprobes: Avoid additional kprobe in kretprobe handling
So far, s390 registered a krobe on __kretprobe_trampoline which is
called everytime a kretprobe fires. This kprobe would then determine
the correct return address and adjust the psw accordingly, such that
the kretprobe would branch to the appropriate address after completion.

Some other archs handle kretprobes without such an additional kprobe.
This approach is adopted to s390 with this patch.
Furthermore, the __kretprobe_trampoline now uses an assembler function
to correctly gather the register and psw content to be passed to the
registered kretprobe handler as struct pt_regs. After completion, the
register content and the psw are set based on the contents of said
pt_regs struct.
Note that a change to the psw address in struct pt_regs will not have
an impact, as the probe will still return to the original return
address of the probed function.
The return address is now recovered by using the appropriate function
arch_kretprobe_fixup_return.

The no longer needed kprobe is removed.

Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Tobias Huschle <huschle@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2022-03-10 15:58:17 +01:00
Vasily Gorbik
731efc9613 s390: convert ".insn" encoding to instruction names
With z10 as minimum supported machine generation many ".insn" encodings
could be now converted to instruction names. There are couple of exceptions
- stfle is used from the als code built for z900 and cannot be converted
- few ".insn" directives encode unsupported instruction formats

The generated code is identical before/after this change.

Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2022-03-10 15:58:17 +01:00
Vasily Gorbik
10bc15ba3a s390: assume stckf is always present
With z10 as minimum supported machine generation the store-clock-fast
facility (25) is always present and checked in als code.
Drop alternatives and always use stckf.

Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2022-03-10 15:58:17 +01:00
Vasily Gorbik
4efd417f29 s390: raise minimum supported machine generation to z10
Machine generations up to z9 (released in May 2006) have been officially
out of service for several years now (z9 end of service - January 31, 2019).
No distributions build kernels supporting those old machine generations
anymore, except Debian, which seems to pick the oldest supported
generation. The team supporting Debian on s390 has been notified about
the change.

Raising minimum supported machine generation to z10 helps to reduce
maintenance cost and effectively remove code, which is not getting
enough testing coverage due to lack of older hardware and distributions
support. Besides that this unblocks some optimization opportunities and
allows to use wider instruction set in asm files for future features
implementation. Due to this change spectre mitigation and usercopy
implementations could be drastically simplified and many newer instructions
could be converted from ".insn" encoding to instruction names.

Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2022-03-10 15:58:17 +01:00
Vasily Gorbik
2268169c14 s390: remove unused expoline to BC instructions
This reverts commit 6deaa3bbca ("s390: extend expoline to BC
instructions"). Expolines to BC instructions were added to be utilized
by commit de5cb6eb51 ("s390: use expoline thunks in the BPF JIT"). But
corresponding code has been removed by commit e1cf4befa2 ("bpf, s390x:
remove ld_abs/ld_ind"). And compiler does not generate such expolines as
well.

Compared to regular expolines, expolines to BC instructions contain
displacement and all possible variations cannot be generated in advance,
making kpatch support more complicated. So, remove those to avoid
future usages.

Acked-by: Heiko Carstens <hca@linux.ibm.com>
Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2022-03-08 00:33:01 +01:00
Heiko Carstens
7d8484c415 s390/irq: use assignment instead of cast
Change struct ext_code to contain a union which allows to simply
assign the int_code instead of using a cast.

In order to keep the patch small the anonymous union is embedded
within the existing struct instead of changing the struct ext_code to
a union.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2022-03-08 00:33:01 +01:00
Heiko Carstens
998e78004f s390/traps: get rid of magic cast for per code
Add a proper union in lowcore to reflect architecture and get rid of a
"magic" cast in order to read the full per code.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2022-03-08 00:33:01 +01:00
Heiko Carstens
52b739e278 s390/traps: get rid of magic cast for program interruption code
Add a proper union in lowcore to reflect architecture and get rid of a
"magic" cast in order to read the full program interruption code.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2022-03-08 00:33:01 +01:00
Heiko Carstens
0ecf337fa2 s390/signal: fix typo in comments
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2022-03-08 00:33:01 +01:00
Heiko Carstens
50b7c4688d s390/asm-offsets: remove unused defines
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2022-03-08 00:33:01 +01:00
Heiko Carstens
df5a95f481 s390: remove .fixup section
The only user is gone. Remove the section.

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2022-03-08 00:33:01 +01:00
Heiko Carstens
46fee16f57 s390/extable: add and use fixup_exception helper function
Add and use fixup_exception helper function in order to remove the
duplicated exception handler fixup code at several places.

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2022-03-08 00:33:00 +01:00
Heiko Carstens
cfa45c5e0d s390/base: pass pt_regs to early program check handler
Pass pt_regs to early program check handler like it is done for every
other interrupt and exception handler.

Also the passed pt_regs can be changed by the called function and the
changes register contents and psw contents will be taken into account
when returning. In addition the return psw will not be copied to the
program check old psw in lowcore, but to the usual return psw
location, like it is also done by the regular program check handler.
This allows also to get rid of the code that disabled lowcore
protection when changing the return address.

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2022-03-08 00:33:00 +01:00
Heiko Carstens
d09a307fde s390/extable: move EX_TABLE define to asm-extable.h
Follow arm64 and riscv and move the EX_TABLE define to asm-extable.h
which is a lot less generic than the current linkage.h.

Also make sure that all files which contain EX_TABLE usages actually
include the new header file. This should make sure that the files
always compile and there won't be any random compile breakage due to
other header file dependencies.

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2022-03-08 00:33:00 +01:00
Heiko Carstens
a156f09c90 s390/extable: sort amode31 extable early
The early program check handler is active before the amode31 extable
is sorted. Therefore in case a program check happens early within the
amode31 code the extable entry might not be found.

Fix this by sorting the amode31 extable early.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2022-03-08 00:33:00 +01:00
Vasily Gorbik
f0003a9e4c s390/entry: remove unused expoline thunk
Remove __s390_indirect_jump_r13use_r14 expoline thunk unused since
commit fbbdfca5c5 ("s390/entry.S: factor out SIEEXIT macro").

Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2022-03-01 21:05:10 +01:00
Heiko Carstens
1a5e3f262e s390/ftrace: make use of epsw to get psw mask
Finally use epsw to create a complete psw mask within pt_regs. Without
this only some bits are correct, while other bits are (incorrectly)
always zero.

The epsw instruction is quite heavy weight, however given that this
only effects ftrace_regs_caller this seems to be the right thing, so
we finally get a complete psw mask for ftrace kprobed functions.

Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2022-03-01 21:05:10 +01:00
Heiko Carstens
96f6641a6a s390/ptrace: remove opencoded offsetof
Remove opencoded offsetof and use offsetof instead.
The generated code is identical before/after this change.

Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2022-03-01 21:05:10 +01:00
Alexander Gordeev
4851d22622 s390/smp: sort out physical vs virtual pointers usage
With commit 5789284710 ("s390/smp: reallocate IPL CPU lowcore")
virtual addresses are wrongly passed to memblock_free_late() and
SPX instructions on IPL CPU reinitialization.

Note: this does not fix a bug currently, since virtual and
physical addresses are identical.

Fixes: 5789284710 ("s390/smp: reallocate IPL CPU lowcore")
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2022-03-01 21:05:10 +01:00
Vasily Gorbik
42b01a553a s390: always use the packed stack layout
-mpacked-stack option has been supported by both minimum
gcc and clang versions for a while. With commit e2bc3e91d9
("scripts/min-tool-version.sh: Raise minimum clang version to 13.0.0
for s390") minimum clang version now also supports a combination
of flags -mpacked-stack -mbackchain -pg -mfentry and fulfills
all requirements to always enable the packed stack layout.

Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2022-03-01 21:05:09 +01:00
Vasily Gorbik
9a4f03ad6d Merge branch 'fixes' into features
This helps to avoid several merge conflicts later.

* fixes:
  s390/extable: fix exception table sorting
  s390/ftrace: fix arch_ftrace_get_regs implementation
  s390/ftrace: fix ftrace_caller/ftrace_regs_caller generation
  s390/setup: preserve memory at OLDMEM_BASE and OLDMEM_SIZE
  s390/cio: verify the driver availability for path_event call
  s390/module: fix building test_modules_helpers.o with clang
  MAINTAINERS: downgrade myself to Reviewer for s390
  MAINTAINERS: add Alexander Gordeev as maintainer for s390

Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2022-03-01 21:02:48 +01:00
Heiko Carstens
1389f17937 s390/ftrace: fix arch_ftrace_get_regs implementation
arch_ftrace_get_regs is supposed to return a struct pt_regs pointer
only if the pt_regs structure contains all register contents, which
means it must have been populated when created via ftrace_regs_caller.

If it was populated via ftrace_caller the contents are not complete
(the psw mask part is missing), and therefore a NULL pointer needs be
returned.

The current code incorrectly always returns a struct pt_regs pointer.

Fix this by adding another pt_regs flag which indicates if the
contents are complete, and fix arch_ftrace_get_regs accordingly.

Fixes: 894979689d ("s390/ftrace: provide separate ftrace_caller/ftrace_regs_caller implementations")
Reported-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reported-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2022-03-01 20:41:28 +01:00
Heiko Carstens
9fa881f7e3 s390/ftrace: fix ftrace_caller/ftrace_regs_caller generation
ftrace_caller was used for both ftrace_caller and ftrace_regs_caller,
which means that the target address of the hotpatch trampoline was
never updated.

With commit 894979689d ("s390/ftrace: provide separate
ftrace_caller/ftrace_regs_caller implementations") a separate
ftrace_regs_caller entry point was implemeted, however it was
forgotten to implement the necessary changes for ftrace_modify_call
and ftrace_make_call, where the branch target has to be modified
accordingly.

Therefore add the missing code now.

Fixes: 894979689d ("s390/ftrace: provide separate ftrace_caller/ftrace_regs_caller implementations")
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2022-03-01 20:41:28 +01:00
Alexander Egorenkov
6b4b54c7ca s390/setup: preserve memory at OLDMEM_BASE and OLDMEM_SIZE
We need to preserve the values at OLDMEM_BASE and OLDMEM_SIZE which are
used by zgetdump in case when kdump crashes. In that case zgetdump will
attempt to read OLDMEM_BASE and OLDMEM_SIZE in order to find out where
the memory range [0 - OLDMEM_SIZE] belonging to the production kernel is.

Fixes: f1a5469474 ("s390/setup: don't reserve memory that occupied decompressor's head")
Cc: stable@vger.kernel.org # 5.15+
Signed-off-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Acked-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2022-03-01 20:41:28 +01:00
Alexander Gordeev
303fd988ed s390/maccess: fix semantics of memcpy_real() and its callers
There is a confusion with regard to the source address of
memcpy_real() and calling functions. While the declared
type for a source assumes a virtual address, in fact it
always called with physical address of the source.

This confusion led to bugs in copy_oldmem_kernel() and
copy_oldmem_user() functions, where __pa() macro applied
mistakenly to physical addresses. It does not lead to a
real issue, since virtual and physical addresses are
currently the same.

Fix both the bugs and memcpy_real() prototype by making
type of source address consistent to the function name
and the way it actually used.

Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2022-02-09 22:56:04 +01:00
Alexander Gordeev
dc306186a1 s390/dump: fix old lowcore virtual vs physical address confusion
Virtual addresses of vmcore_info and os_info members are
wrongly passed to copy_oldmem_kernel(), while the function
expects physical address of the source. Instead, __pa()
macro should have been applied.

Yet, use of __pa() macro could be somehow confusing, since
copy_oldmem_kernel() may treat the source as an offset, not
as a direct physical address (that depens from the oldmem
availability and location).

Fix the virtual vs physical address confusion and make the
way the old lowcore is read consistent across all sources.

Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2022-02-09 22:56:03 +01:00
Heiko Carstens
ba2d394c60 s390/lgr: use simple assignment instead of memcpy
It is quite pointless to use memcpy to copy two bytes, besides that
this construct will also partially remove type and size sanity checks.

Therefore simply use an assignment.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2022-02-06 23:31:29 +01:00
Alexander Gordeev
9de209c7d5 s390/dump: fix os_info virtual vs physical address confusion
Due to historical reasons os_info handling functions misuse
the notion of physical vs virtual addresses difference.

Note: this does not fix a bug currently, since virtual
and physical addresses are identical.

Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2022-02-06 23:31:29 +01:00
Sven Schnelle
98c0d24d1e s390/ftrace: verify opcode before applying patch
commit 72b3942a17 ("scripts: ftrace - move the
sort-processing in ftrace_init") had the unexpected
side effect that wrong code locations were patched.
To prevent this from happening again, verify the
opcode before patching it.

Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2022-02-06 23:31:29 +01:00
Heiko Carstens
f36e7c9845 s390: remove invalid email address of Heiko Carstens
Remove my old invalid email address which can be found in a couple of
files. Instead of updating it, just remove my contact data completely
from source files.
We have git and other tools which allow to figure out who is responsible
for what with recent contact data.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2022-02-06 23:31:29 +01:00
Ilya Leoshkevich
f3b7e73b2c s390/module: fix loading modules with a lot of relocations
If the size of the PLT entries generated by apply_rela() exceeds
64KiB, the first ones can no longer reach __jump_r1 with brc. Fix by
using brcl. An alternative solution is to add a __jump_r1 copy after
every 64KiB, however, the space savings are quite small and do not
justify the additional complexity.

Fixes: f19fbd5ed6 ("s390: introduce execute-trampolines for branches")
Cc: stable@vger.kernel.org
Reported-by: Andrea Righi <andrea.righi@canonical.com>
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2022-01-24 09:10:37 +01:00
Christian Borntraeger
f094a39c6b s390/nmi: handle vector validity failures for KVM guests
The machine check validity bit tells about the context. If a KVM guest
was running the bit tells about the guest validity and the host state is
not affected. As a guest can disable the guest validity this might
result in unwanted host errors on machine checks.

Cc: stable@vger.kernel.org
Fixes: c929500d7a ("s390/nmi: s390: New low level handling for machine check happening in guest")
Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2022-01-23 10:20:03 +01:00
Christian Borntraeger
1ea1d6a847 s390/nmi: handle guarded storage validity failures for KVM guests
machine check validity bits reflect the state of the machine check. If a
guest does not make use of guarded storage, the validity bit might be
off. We can not use the host CR bit to decide if the validity bit must
be on. So ignore "invalid" guarded storage controls for KVM guests in
the host and rely on the machine check being forwarded to the guest.  If
no other errors happen from a host perspective everything is fine and no
process must be killed and the host can continue to run.

Cc: stable@vger.kernel.org
Fixes: c929500d7a ("s390/nmi: s390: New low level handling for machine check happening in guest")
Reported-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Tested-by: Carsten Otte <cotte@de.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2022-01-23 10:20:02 +01:00
Linus Torvalds
85e67d56eb more s390 updates for 5.17 merge window
- add Sven Schnelle as reviewer for s390 code
 
 - make uaccess code more readable
 
 - change cpu measurement facility code to also support counter second
   version number 7, and add discard support for limited samples
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEECMNfWEw3SLnmiLkZIg7DeRspbsIFAmHprPAACgkQIg7DeRsp
 bsLqkQ//ZWeZpN8YUS1VRoV0yPl2FX1LC19DXu5kat5UeUgeCSAG4COVv1XejD33
 RP6zLFVBTVncdA4qrAsMJZnPpT/RSUS6fk0t0zETj6n0orjJYqekRnGuhQSATlzK
 yceIamg9tyqZgTCeBlCLF0ThFB5tsHVDQnrqRLECsY/Q24/2q04/97BFIak7iVAv
 D+0xivhf6rLufbw1SfxO7xXvtUBtdZcJUC1y9OhRp5Io1tGNkaKUziYwsnBicePg
 6RGFtYv95QqQ1XqC47sFyntp7FK3RFK0DnQx7cWcAknAEOqNN/IUT/GnJlywSNK+
 4ZtCG7kIIBmCXZbPiF5uhrf5vrRCv9zCoxHmZvubpeNF06SKLVl5Nx9Wbqe4eC0w
 5+CmSX+oO4JNJ4GN6hHURtgf0veYCZPDTtQ4pOuIGYxRtOmeFYlNcrCC3imgbZfx
 JRRFgaaX7mbUkq95acgbWowLMbWJR/TWC/caA9hh7awOzSlkhmAmnHg2s5kTnDjE
 n6+WTH9a9qn7k6mMFaA7Vfot/GYHValgl5FGQO5LXN+Y2/xMi3zS6hhYGi+JXMyR
 NlsQO9CRehUU4ApkyHDqH2q7G04Ko63DJ2DUZAHixrCM+c76EGzEN90bTGtVDwvk
 X72WpRpMvoD5Aqu2RVq0GyrlHH7MTFmURyz9Sqy8T5CtwAa/6As=
 =IQZX
 -----END PGP SIGNATURE-----

Merge tag 's390-5.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull more s390 updates from Heiko Carstens:

 - add Sven Schnelle as reviewer for s390 code

 - make uaccess code more readable

 - change cpu measurement facility code to also support counter second
   version number 7, and add discard support for limited samples

* tag 's390-5.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390: add Sven Schnelle as reviewer
  s390/uaccess: introduce bit field for OAC specifier
  s390/cpumf: Support for CPU Measurement Sampling Facility LS bit
  s390/cpumf: Support for CPU Measurement Facility CSVN 7
2022-01-21 08:57:15 +02:00
Thomas Richter
745f5d20e7 s390/cpumf: Support for CPU Measurement Sampling Facility LS bit
Adds support for the CPU Measurement Sampling Facility limit sampling
bit in the sampling device driver.
Limited samples have no valueable information are not collected.

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2022-01-17 14:13:08 +01:00
Thomas Richter
a87b0fd4f9 s390/cpumf: Support for CPU Measurement Facility CSVN 7
Adds support for the CPU Measurement Counter Facility second version
number 7.

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2022-01-17 14:13:08 +01:00
Linus Torvalds
35ce8ae9ae Merge branch 'signal-for-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull signal/exit/ptrace updates from Eric Biederman:
 "This set of changes deletes some dead code, makes a lot of cleanups
  which hopefully make the code easier to follow, and fixes bugs found
  along the way.

  The end-game which I have not yet reached yet is for fatal signals
  that generate coredumps to be short-circuit deliverable from
  complete_signal, for force_siginfo_to_task not to require changing
  userspace configured signal delivery state, and for the ptrace stops
  to always happen in locations where we can guarantee on all
  architectures that the all of the registers are saved and available on
  the stack.

  Removal of profile_task_ext, profile_munmap, and profile_handoff_task
  are the big successes for dead code removal this round.

  A bunch of small bug fixes are included, as most of the issues
  reported were small enough that they would not affect bisection so I
  simply added the fixes and did not fold the fixes into the changes
  they were fixing.

  There was a bug that broke coredumps piped to systemd-coredump. I
  dropped the change that caused that bug and replaced it entirely with
  something much more restrained. Unfortunately that required some
  rebasing.

  Some successes after this set of changes: There are few enough calls
  to do_exit to audit in a reasonable amount of time. The lifetime of
  struct kthread now matches the lifetime of struct task, and the
  pointer to struct kthread is no longer stored in set_child_tid. The
  flag SIGNAL_GROUP_COREDUMP is removed. The field group_exit_task is
  removed. Issues where task->exit_code was examined with
  signal->group_exit_code should been examined were fixed.

  There are several loosely related changes included because I am
  cleaning up and if I don't include them they will probably get lost.

  The original postings of these changes can be found at:
     https://lkml.kernel.org/r/87a6ha4zsd.fsf@email.froward.int.ebiederm.org
     https://lkml.kernel.org/r/87bl1kunjj.fsf@email.froward.int.ebiederm.org
     https://lkml.kernel.org/r/87r19opkx1.fsf_-_@email.froward.int.ebiederm.org

  I trimmed back the last set of changes to only the obviously correct
  once. Simply because there was less time for review than I had hoped"

* 'signal-for-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (44 commits)
  ptrace/m68k: Stop open coding ptrace_report_syscall
  ptrace: Remove unused regs argument from ptrace_report_syscall
  ptrace: Remove second setting of PT_SEIZED in ptrace_attach
  taskstats: Cleanup the use of task->exit_code
  exit: Use the correct exit_code in /proc/<pid>/stat
  exit: Fix the exit_code for wait_task_zombie
  exit: Coredumps reach do_group_exit
  exit: Remove profile_handoff_task
  exit: Remove profile_task_exit & profile_munmap
  signal: clean up kernel-doc comments
  signal: Remove the helper signal_group_exit
  signal: Rename group_exit_task group_exec_task
  coredump: Stop setting signal->group_exit_task
  signal: Remove SIGNAL_GROUP_COREDUMP
  signal: During coredumps set SIGNAL_GROUP_EXIT in zap_process
  signal: Make coredump handling explicit in complete_signal
  signal: Have prepare_signal detect coredumps using signal->core_state
  signal: Have the oom killer detect coredumps using signal->core_state
  exit: Move force_uaccess back into do_exit
  exit: Guarantee make_task_dead leaks the tsk when calling do_task_exit
  ...
2022-01-17 05:49:30 +02:00
Linus Torvalds
f56caedaf9 Merge branch 'akpm' (patches from Andrew)
Merge misc updates from Andrew Morton:
 "146 patches.

  Subsystems affected by this patch series: kthread, ia64, scripts,
  ntfs, squashfs, ocfs2, vfs, and mm (slab-generic, slab, kmemleak,
  dax, kasan, debug, pagecache, gup, shmem, frontswap, memremap,
  memcg, selftests, pagemap, dma, vmalloc, memory-failure, hugetlb,
  userfaultfd, vmscan, mempolicy, oom-kill, hugetlbfs, migration, thp,
  ksm, page-poison, percpu, rmap, zswap, zram, cleanups, hmm, and
  damon)"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (146 commits)
  mm/damon: hide kernel pointer from tracepoint event
  mm/damon/vaddr: hide kernel pointer from damon_va_three_regions() failure log
  mm/damon/vaddr: use pr_debug() for damon_va_three_regions() failure logging
  mm/damon/dbgfs: remove an unnecessary variable
  mm/damon: move the implementation of damon_insert_region to damon.h
  mm/damon: add access checking for hugetlb pages
  Docs/admin-guide/mm/damon/usage: update for schemes statistics
  mm/damon/dbgfs: support all DAMOS stats
  Docs/admin-guide/mm/damon/reclaim: document statistics parameters
  mm/damon/reclaim: provide reclamation statistics
  mm/damon/schemes: account how many times quota limit has exceeded
  mm/damon/schemes: account scheme actions that successfully applied
  mm/damon: remove a mistakenly added comment for a future feature
  Docs/admin-guide/mm/damon/usage: update for kdamond_pid and (mk|rm)_contexts
  Docs/admin-guide/mm/damon/usage: mention tracepoint at the beginning
  Docs/admin-guide/mm/damon/usage: remove redundant information
  Docs/admin-guide/mm/damon/usage: update for scheme quotas and watermarks
  mm/damon: convert macro functions to static inline functions
  mm/damon: modify damon_rand() macro to static inline function
  mm/damon: move damon_rand() definition into damon.h
  ...
2022-01-15 20:37:06 +02:00
Aneesh Kumar K.V
21b084fdf2 mm/mempolicy: wire up syscall set_mempolicy_home_node
Link: https://lkml.kernel.org/r/20211202123810.267175-4-aneesh.kumar@linux.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Ben Widawsky <ben.widawsky@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Feng Tang <feng.tang@intel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: <linux-api@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-15 16:30:30 +02:00
Kefeng Wang
60115fa54a mm: defer kmemleak object creation of module_alloc()
Yongqiang reports a kmemleak panic when module insmod/rmmod with KASAN
enabled(without KASAN_VMALLOC) on x86[1].

When the module area allocates memory, it's kmemleak_object is created
successfully, but the KASAN shadow memory of module allocation is not
ready, so when kmemleak scan the module's pointer, it will panic due to
no shadow memory with KASAN check.

  module_alloc
    __vmalloc_node_range
      kmemleak_vmalloc
				kmemleak_scan
				  update_checksum
    kasan_module_alloc
      kmemleak_ignore

Note, there is no problem if KASAN_VMALLOC enabled, the modules area
entire shadow memory is preallocated.  Thus, the bug only exits on ARCH
which supports dynamic allocation of module area per module load, for
now, only x86/arm64/s390 are involved.

Add a VM_DEFER_KMEMLEAK flags, defer vmalloc'ed object register of
kmemleak in module_alloc() to fix this issue.

[1] https://lore.kernel.org/all/6d41e2b9-4692-5ec4-b1cd-cbe29ae89739@huawei.com/

[wangkefeng.wang@huawei.com: fix build]
  Link: https://lkml.kernel.org/r/20211125080307.27225-1-wangkefeng.wang@huawei.com
[akpm@linux-foundation.org: simplify ifdefs, per Andrey]
  Link: https://lkml.kernel.org/r/CA+fCnZcnwJHUQq34VuRxpdoY6_XbJCDJ-jopksS5Eia4PijPzw@mail.gmail.com

Link: https://lkml.kernel.org/r/20211124142034.192078-1-wangkefeng.wang@huawei.com
Fixes: 793213a82d ("s390/kasan: dynamic shadow mem allocation for modules")
Fixes: 39d114ddc6 ("arm64: add KASAN support")
Fixes: bebf56a1b1 ("kasan: enable instrumentation of global variables")
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reported-by: Yongqiang Liu <liuyongqiang13@huawei.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-15 16:30:25 +02:00
Linus Torvalds
f0d43b3a38 s390 updates for 5.17 merge window
- add fast vector/SIMD implementation of the ChaCha20 stream cipher,
   which mainly adapts Andy Polyakov's code for the kernel
 
 - add status attribute to AP queue device so users can easily figure
   out its status
 
 - fix race in page table release code, and and lots of documentation
 
 - remove uevent suppress from cio device driver, since it turned out
   that it generated more problems than it solved problems
 
 - quite a lot of virtual vs physical address confusion fixes
 
 - various other small improvements and cleanups all over the place
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEECMNfWEw3SLnmiLkZIg7DeRspbsIFAmHbyk8ACgkQIg7DeRsp
 bsIizA/5AS6ZSDsoiOyuRtBpEJk/lmgRLYcjgqJHVE7ShQVu+CERG1+L6R5Wgecw
 /nKXqsxqt2p8ql/IyaKMzep1I8xQKi1XUW2Nq3ntbJV0NkEfMf/ZNf0mtTfERVP3
 OwB0kHMujFrymLhJvlRFdwuPbdGan5ZsUhoBoQuBW4DZ8ly3tpsgMr5ycPMfICiZ
 0e2zuC84keEp0xYbkAQ1u48u2r7LTrT/8F77WzGYW06JzjscZMQE62i7NCD+RR4Y
 D04IH4EA2fT6CpyIBgZRJia+t5BzEQlASBVjczoT7C16sHY4o239iMhnGemQC2Hz
 TwmXQwjop6eIS1XJ2gF6tvnIrbSNF/3fEV9UHasrF3PuWbWsspHJmz9ciDJqiUCs
 i+FRBdqhe4L6lR4LjTfi1+VQNEIDEFKJ41jpOKiSVWlBVcpX6XTd5bjuWI3YD4O7
 Jz5s0q1go0P0Xg0qY/JdptCAU1VYFYUhrGsvDKtAmLHRgoWjk6D02CF/FFgCyiPK
 hshWikxfFrU0K1lfNf6248PnjTjPbguxDDJlCD6xkCmxWPPaYFf9pR2XJXy9pePB
 9qriNhcflDqSgRs/c2AykERaQymZfuFypNNNIrDoY0tzgxfIa0af+RZl93XwqdiP
 SnVc94381ccHKj0DUq+7Pa0VTx9Q1jBZecVPpE7bXDx5g+IrqiM=
 =Iy+v
 -----END PGP SIGNATURE-----

Merge tag 's390-5.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 updates from Heiko Carstens:
 "Besides all the small improvements and cleanups the most notable part
  is the fast vector/SIMD implementation of the ChaCha20 stream cipher,
  which is an adaptation of Andy Polyakov's code for the kernel.

  Summary:

   - add fast vector/SIMD implementation of the ChaCha20 stream cipher,
     which mainly adapts Andy Polyakov's code for the kernel

   - add status attribute to AP queue device so users can easily figure
     out its status

   - fix race in page table release code, and and lots of documentation

   - remove uevent suppress from cio device driver, since it turned out
     that it generated more problems than it solved problems

   - quite a lot of virtual vs physical address confusion fixes

   - various other small improvements and cleanups all over the place"

* tag 's390-5.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (39 commits)
  s390/dasd: use default_groups in kobj_type
  s390/sclp_sd: use default_groups in kobj_type
  s390/pci: simplify __pciwb_mio() inline asm
  s390: remove unused TASK_SIZE_OF
  s390/crash_dump: fix virtual vs physical address handling
  s390/crypto: fix compile error for ChaCha20 module
  s390/mm: check 2KB-fragment page on release
  s390/mm: better annotate 2KB pagetable fragments handling
  s390/mm: fix 2KB pgtable release race
  s390/sclp: release SCLP early buffer after kernel initialization
  s390/nmi: disable interrupts on extended save area update
  s390/zcrypt: CCA control CPRB sending
  s390/disassembler: update opcode table
  s390/uv: fix memblock virtual vs physical address confusion
  s390/smp: fix memblock_phys_free() vs memblock_free() confusion
  s390/sclp: fix memblock_phys_free() vs memblock_free() confusion
  s390/exit: remove dead reference to do_exit from copy_thread
  s390/ap: add missing virt_to_phys address conversion
  s390/pgalloc: use pointers instead of unsigned long values
  s390/pgalloc: add virt/phys address handling to base asce functions
  ...
2022-01-10 08:58:16 -08:00
Linus Torvalds
9b9e211360 arm64 updates for 5.17:
- KCSAN enabled for arm64.
 
 - Additional kselftests to exercise the syscall ABI w.r.t. SVE/FPSIMD.
 
 - Some more SVE clean-ups and refactoring in preparation for SME support
   (scalable matrix extensions).
 
 - BTI clean-ups (SYM_FUNC macros etc.)
 
 - arm64 atomics clean-up and codegen improvements.
 
 - HWCAPs for FEAT_AFP (alternate floating point behaviour) and
   FEAT_RPRESS (increased precision of reciprocal estimate and reciprocal
   square root estimate).
 
 - Use SHA3 instructions to speed-up XOR.
 
 - arm64 unwind code refactoring/unification.
 
 - Avoid DC (data cache maintenance) instructions when DCZID_EL0.DZP == 1
   (potentially set by a hypervisor; user-space already does this).
 
 - Perf updates for arm64: support for CI-700, HiSilicon PCIe PMU,
   Marvell CN10K LLC-TAD PMU, miscellaneous clean-ups.
 
 - Other fixes and clean-ups; highlights: fix the handling of erratum
   1418040, correct the calculation of the nomap region boundaries,
   introduce io_stop_wc() mapped to the new DGH instruction (data
   gathering hint).
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAmHXNtYACgkQa9axLQDI
 XvHBGw/+OVGdbORxwrU+uRb7N6qIJkrW/mmM4x1KLo1i+REZLb8/VlXm0xC60FG+
 39x6FSVkRr+lLDfTqpQsOez5FpdsvOe9Fc4L3bwniDg+EPo7x65VmP2dw/Ae2q0i
 87xyWCczx5hFEPF/1sb1R1pm3bTXjeklBkdv+OXhwflLOwpCp1J8z8WJK8qJVFX6
 CmuE6Q4fDQr0ghl9Nf8DiAr20mHDh8wMKNUJOg4waaQOOCta6q1oJ3qfz6E9z1eW
 zEE3dfZgBCx7HCRc3KGgzT7H4Ces3BYvhBYP6bJRliVI88XdPiM4MfdGL4UIb27Q
 NLAdr+FVzk/YLzMHtxSfkT10nBqoOPWUTckLu9jIIl5cpBX73Wiz7jfzBvqFmC/y
 opSFMZ3lwQPM5WAPtAlZptA3GPPySeInVmvUgB7IQ+1Q1T1n8ri1y5hzTYC4Sc/g
 amJI1rXf1Al8+2zFBggr6Up+EOnfV9nAwrzLXkRlASsfmvY4dnVWg3NWfBqtEHAq
 VuZCecSgawxuSlpmJ4VGbLrBFaz18bn9EzujR5fFvi5Qcg1CMFOROi2+6IynopNV
 IS0R8j6fwgQPA5lcnNIPeJRRkQoqO4l8bPDzeXEny0BSw313EgBSo9aQtnjyIJbp
 BTuDHARKs+/NvDPvd8GQkxNPgwJnVOL9pdgNAolEu1/k7JtnIS0=
 =ecyi
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Catalin Marinas:

 - KCSAN enabled for arm64.

 - Additional kselftests to exercise the syscall ABI w.r.t. SVE/FPSIMD.

 - Some more SVE clean-ups and refactoring in preparation for SME
   support (scalable matrix extensions).

 - BTI clean-ups (SYM_FUNC macros etc.)

 - arm64 atomics clean-up and codegen improvements.

 - HWCAPs for FEAT_AFP (alternate floating point behaviour) and
   FEAT_RPRESS (increased precision of reciprocal estimate and
   reciprocal square root estimate).

 - Use SHA3 instructions to speed-up XOR.

 - arm64 unwind code refactoring/unification.

 - Avoid DC (data cache maintenance) instructions when DCZID_EL0.DZP ==
   1 (potentially set by a hypervisor; user-space already does this).

 - Perf updates for arm64: support for CI-700, HiSilicon PCIe PMU,
   Marvell CN10K LLC-TAD PMU, miscellaneous clean-ups.

 - Other fixes and clean-ups; highlights: fix the handling of erratum
   1418040, correct the calculation of the nomap region boundaries,
   introduce io_stop_wc() mapped to the new DGH instruction (data
   gathering hint).

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (81 commits)
  arm64: Use correct method to calculate nomap region boundaries
  arm64: Drop outdated links in comments
  arm64: perf: Don't register user access sysctl handler multiple times
  drivers: perf: marvell_cn10k: fix an IS_ERR() vs NULL check
  perf/smmuv3: Fix unused variable warning when CONFIG_OF=n
  arm64: errata: Fix exec handling in erratum 1418040 workaround
  arm64: Unhash early pointer print plus improve comment
  asm-generic: introduce io_stop_wc() and add implementation for ARM64
  arm64: Ensure that the 'bti' macro is defined where linkage.h is included
  arm64: remove __dma_*_area() aliases
  docs/arm64: delete a space from tagged-address-abi
  arm64: Enable KCSAN
  kselftest/arm64: Add pidbench for floating point syscall cases
  arm64/fp: Add comments documenting the usage of state restore functions
  kselftest/arm64: Add a test program to exercise the syscall ABI
  kselftest/arm64: Allow signal tests to trigger from a function
  kselftest/arm64: Parameterise ptrace vector length information
  arm64/sve: Minor clarification of ABI documentation
  arm64/sve: Generalise vector length configuration prctl() for SME
  arm64/sve: Make sysctl interface for SVE reusable by SME
  ...
2022-01-10 08:49:37 -08:00
Heiko Carstens
a0e45d40d5 s390/crash_dump: fix virtual vs physical address handling
Signal processor STORE STATUS requires a physical address where register
contents are supposed to be written to, however the kernel must read the
data via the corresponding virtual address.

Also the allocated save_area, where register contents are copied to,
resides in virtual address space.

Fix this by using proper __pa() conversion, or correct memblock_alloc()
invocation.

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-12-20 10:21:55 +01:00
Alexander Gordeev
c7ed509b21 s390/nmi: disable interrupts on extended save area update
Updating of the pointer to machine check extended save area
on the IPL CPU needs the lowcore protection to be disabled.
Disable interrupts while the protection is off to avoid
unnoticed writes to the lowcore.

Suggested-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-12-16 19:58:07 +01:00
Heiko Carstens
248420797d s390/disassembler: update opcode table
Sync with binutils: update opcode table to reflect the
instruction format update of the lpswey instruction, and
add the qpaci instruction.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-12-16 19:58:07 +01:00
Heiko Carstens
15b5c1833a s390/uv: fix memblock virtual vs physical address confusion
memblock_alloc_try_nid() returns a virtual address, however in error
case the allocated memory is incorrectly freed with memblock_phys_free().
Properly use memblock_free() instead, and pass a physical address to
uv_init() to fix this.

Note: this doesn't fix a bug currently, since virtual and physical
addresses are identical.

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-12-16 19:58:07 +01:00
Heiko Carstens
fcfcba6dfc s390/smp: fix memblock_phys_free() vs memblock_free() confusion
memblock_phys_free() is used on a virtual address. Fix this by using
memblock_free().

Note: this doesn't fix a bug currently, since virtual and physical
addresses are identical.

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-12-16 19:58:07 +01:00
Eric W. Biederman
893d4d9c62 s390/exit: remove dead reference to do_exit from copy_thread
My s390 assembly is not particularly good so I have read the history
of the reference to do_exit copy_thread and have been able to
verify that do_exit is not used.

The general argument is that s390 has been changed to use the generic
kernel_thread and kernel_execve and the generic versions do not call
do_exit.  So it is strange to see a do_exit reference sitting there.

The history of the do_exit reference in s390's version of copy_thread
seems conclusive that the do_exit reference is something that lingers
and should have been removed several years ago.

Up through 8d19f15a60be ("[PATCH] s390 update (1/27): arch.")  the
s390 code made a call to the exit(2) system call when a kernel thread
finished.  Then kernel_thread_starter was added which branched
directly to the value in register 11 when the kernel thread finshed.
The value in register 11 was set in kernel_thread to
"regs.gprs[11] = (unsigned long) do_exit"

In commit 37fe5d41f6 ("s390: fold kernel_thread_helper() into
ret_from_fork()") kernel_thread_starter was moved into entry.S and
entry64.S unchanged (except for the syntax differences between inline
assemly and in the assembly file).

In commit f9a7e025df ("s390: switch to generic kernel_thread()") the
assignment to "gprs[11]" was moved into copy_thread from the old
kernel_thread.  The helper kernel_thread_starter was still being used
and was still branching to "%r11" at the end.

In commit 30dcb0996e ("s390: switch to saner kernel_execve()
semantics") kernel_thread_starter was changed to unconditionally
branch to sysc_tracenogo instead to %r11 which held the value of
do_exit.  Unfortunately copy_thread was not updated to stop passing
do_exit in "gprs[11]".

In commit 56e62a7370 ("s390: convert to generic entry")
kernel_thread_starter was replaced by __ret_from_fork.  And the code
still continued to pass do_exit in "gprs[11]" despite __ret_from_fork
not caring in the slightest.

Remove this dead reference to do_exit to make it clear that s390 is
not doing anything with do_exit in copy_thread.

History Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git

Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Fixes: 30dcb0996e ("s390: switch to saner kernel_execve() semantics")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Link: https://lore.kernel.org/r/20211208202532.16409-1-ebiederm@xmission.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-12-16 19:58:06 +01:00
Eric W. Biederman
0e25498f8c exit: Add and use make_task_dead.
There are two big uses of do_exit.  The first is it's design use to be
the guts of the exit(2) system call.  The second use is to terminate
a task after something catastrophic has happened like a NULL pointer
in kernel code.

Add a function make_task_dead that is initialy exactly the same as
do_exit to cover the cases where do_exit is called to handle
catastrophic failure.  In time this can probably be reduced to just a
light wrapper around do_task_dead. For now keep it exactly the same so
that there will be no behavioral differences introducing this new
concept.

Replace all of the uses of do_exit that use it for catastraphic
task cleanup with make_task_dead to make it clear what the code
is doing.

As part of this rename rewind_stack_do_exit
rewind_stack_and_make_dead.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-12-13 12:04:45 -06:00
Eric W. Biederman
5e354747b2 exit/s390: Remove dead reference to do_exit from copy_thread
My s390 assembly is not particularly good so I have read the history
of the reference to do_exit copy_thread and have been able to
verify that do_exit is not used.

The general argument is that s390 has been changed to use the generic
kernel_thread and kernel_execve and the generic versions do not call
do_exit.  So it is strange to see a do_exit reference sitting there.

The history of the do_exit reference in s390's version of copy_thread
seems conclusive that the do_exit reference is something that lingers
and should have been removed several years ago.

Up through 8d19f15a60be ("[PATCH] s390 update (1/27): arch.")  the
s390 code made a call to the exit(2) system call when a kernel thread
finished.  Then kernel_thread_starter was added which branched
directly to the value in register 11 when the kernel thread finshed.
The value in register 11 was set in kernel_thread to
"regs.gprs[11] = (unsigned long) do_exit"

In commit 37fe5d41f6 ("s390: fold kernel_thread_helper() into
ret_from_fork()") kernel_thread_starter was moved into entry.S and
entry64.S unchanged (except for the syntax differences between inline
assemly and in the assembly file).

In commit f9a7e025df ("s390: switch to generic kernel_thread()") the
assignment to "gprs[11]" was moved into copy_thread from the old
kernel_thread.  The helper kernel_thread_starter was still being used
and was still branching to "%r11" at the end.

In commit 30dcb0996e ("s390: switch to saner kernel_execve()
semantics") kernel_thread_starter was changed to unconditionally
branch to sysc_tracenogo instead to %r11 which held the value of
do_exit.  Unfortunately copy_thread was not updated to stop passing
do_exit in "gprs[11]".

In commit 56e62a7370 ("s390: convert to generic entry")
kernel_thread_starter was replaced by __ret_from_fork.  And the code
still continued to pass do_exit in "gprs[11]" despite __ret_from_fork
not caring in the slightest.

Remove this dead reference to do_exit to make it clear that s390 is
not doing anything with do_exit in copy_thread.

Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Fixes: 30dcb0996e ("s390: switch to saner kernel_execve() semantics")
History Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-12-13 12:03:47 -06:00
Sven Schnelle
c9b12b59e2 s390/entry: fix duplicate tracking of irq nesting level
In the current code, when exiting from idle, rcu_irq_enter() is
called twice during irq entry:

irq_entry_enter()-> rcu_irq_enter()
irq_enter() -> rcu_irq_enter()

This may lead to wrong results from rcu_is_cpu_rrupt_from_idle()
because of a wrong dynticks nmi nesting count. Fix this by only
calling irq_enter_rcu().

Cc: <stable@vger.kernel.org> # 5.12+
Reported-by: Mark Rutland <mark.rutland@arm.com>
Fixes: 56e62a7370 ("s390: convert to generic entry")
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-12-12 18:52:26 +01:00
Alexander Egorenkov
abf0e8e4ef s390/kexec: handle R_390_PLT32DBL rela in arch_kexec_apply_relocations_add()
Starting with gcc 11.3, the C compiler will generate PLT-relative function
calls even if they are local and do not require it. Later on during linking,
the linker will replace all PLT-relative calls to local functions with
PC-relative ones. Unfortunately, the purgatory code of kexec/kdump is
not being linked as a regular executable or shared library would have been,
and therefore, all PLT-relative addresses remain in the generated purgatory
object code unresolved. This leads to the situation where the purgatory
code is being executed during kdump with all PLT-relative addresses
unresolved. And this results in endless loops within the purgatory code.

Furthermore, the clang C compiler has always behaved like described above
and this commit should fix kdump for kernels built with the latter.

Because the purgatory code is no regular executable or shared library,
contains only calls to local functions and has no PLT, all R_390_PLT32DBL
relocation entries can be resolved just like a R_390_PC32DBL one.

* https://refspecs.linuxfoundation.org/ELF/zSeries/lzsabi0_zSeries/x1633.html#AEN1699

Relocation entries of purgatory code generated with gcc 11.3
------------------------------------------------------------

$ readelf -r linux/arch/s390/purgatory/purgatory.o

Relocation section '.rela.text' at offset 0x370 contains 5 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
00000000005c  000c00000013 R_390_PC32DBL     0000000000000000 purgatory_sha_regions + 2
00000000007a  000d00000014 R_390_PLT32DBL    0000000000000000 sha256_update + 2
00000000008c  000e00000014 R_390_PLT32DBL    0000000000000000 sha256_final + 2
000000000092  000800000013 R_390_PC32DBL     0000000000000000 .LC0 + 2
0000000000a0  000f00000014 R_390_PLT32DBL    0000000000000000 memcmp + 2

Relocation entries of purgatory code generated with gcc 11.2
------------------------------------------------------------

$ readelf -r linux/arch/s390/purgatory/purgatory.o

Relocation section '.rela.text' at offset 0x368 contains 5 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
00000000005c  000c00000013 R_390_PC32DBL     0000000000000000 purgatory_sha_regions + 2
00000000007a  000d00000013 R_390_PC32DBL     0000000000000000 sha256_update + 2
00000000008c  000e00000013 R_390_PC32DBL     0000000000000000 sha256_final + 2
000000000092  000800000013 R_390_PC32DBL     0000000000000000 .LC0 + 2
0000000000a0  000f00000013 R_390_PC32DBL     0000000000000000 memcmp + 2

Signed-off-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Reported-by: Tao Liu <ltao@redhat.com>
Suggested-by: Philipp Rudo <prudo@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20211209073817.82196-1-egorenar@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-12-10 16:12:34 +01:00
Jerome Marchand
ac8fc6af1a s390/ftrace: remove preempt_disable()/preempt_enable() pair
It looks like commit ce5e48036c ("ftrace: disable preemption
when recursion locked") missed a spot in kprobe_ftrace_handler() in
arch/s390/kernel/ftrace.c.
Remove the superfluous preempt_disable/enable_notrace() there too.

Fixes: ce5e48036c ("ftrace: disable preemption when recursion locked")
Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
Link: https://lore.kernel.org/r/20211208151503.1510381-1-jmarchan@redhat.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-12-10 16:12:34 +01:00
Philipp Rudo
41967a37b8 s390/kexec_file: fix error handling when applying relocations
arch_kexec_apply_relocations_add currently ignores all errors returned
by arch_kexec_do_relocs. This means that every unknown relocation is
silently skipped causing unpredictable behavior while the relocated code
runs. Fix this by checking for errors and fail kexec_file_load if an
unknown relocation type is encountered.

The problem was found after gcc changed its behavior and used
R_390_PLT32DBL relocations for brasl instruction and relied on ld to
resolve the relocations in the final link in case direct calls are
possible. As the purgatory code is only linked partially (option -r)
ld didn't resolve the relocations leaving them for arch_kexec_do_relocs.
But arch_kexec_do_relocs doesn't know how to handle R_390_PLT32DBL
relocations so they were silently skipped. This ultimately caused an
endless loop in the purgatory as the brasl instructions kept branching
to itself.

Fixes: 71406883fd ("s390/kexec_file: Add kexec_file_load system call")
Reported-by: Tao Liu <ltao@redhat.com>
Signed-off-by: Philipp Rudo <prudo@redhat.com>
Link: https://lore.kernel.org/r/20211208130741.5821-3-prudo@redhat.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-12-10 16:12:33 +01:00
Philipp Rudo
edce10ee21 s390/kexec_file: print some more error messages
Be kind and give some more information on what went wrong.

Signed-off-by: Philipp Rudo <prudo@redhat.com>
Link: https://lore.kernel.org/r/20211208130741.5821-2-prudo@redhat.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-12-10 16:12:33 +01:00
Peter Zijlstra
1614b2b11f arch: Make ARCH_STACKWALK independent of STACKTRACE
Make arch_stack_walk() available for ARCH_STACKWALK architectures
without it being entangled in STACKTRACE.

Link: https://lore.kernel.org/lkml/20211022152104.356586621@infradead.org/
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
[Mark: rebase, drop unnecessary arm change]
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Link: https://lore.kernel.org/r/20211129142849.3056714-2-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-12-10 14:06:03 +00:00
Heiko Carstens
402ff5a338 s390/nmi: add missing __pa/__va address conversion of extended save area
Add missing __pa/__va address conversion of machine check extended
save area designation, which is an absolute address.

Note: this currently doesn't fix a real bug, since virtual addresses
are indentical to physical ones.

Reported-by: Vineeth Vijayan <vneethv@linux.ibm.com>
Tested-by: Vineeth Vijayan <vneethv@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-12-06 14:42:26 +01:00
Linus Torvalds
6b38e2fb70 s390 updates for 5.16-rc2
- Add missing Kconfig option for ftrace direct multi sample, so it can
   be compiled again, and also add s390 support for this sample.
 
 - Update Christian Borntraeger's email address.
 
 - Various fixes for memory layout setup. Besides other this makes it
   possible to load shared DCSS segments again.
 
 - Fix copy to user space of swapped kdump oldmem.
 
 - Remove -mstack-guard and -mstack-size compile options when building
   vdso binaries. This can happen when CONFIG_VMAP_STACK is disabled
   and results in broken vdso code which causes more or less random
   exceptions. Also remove the not needed -nostdlib option.
 
 - Fix memory leak on cpu hotplug and return code handling in kexec
   code.
 
 - Wire up futex_waitv system call.
 
 - Replace snprintf with sysfs_emit where appropriate.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEECMNfWEw3SLnmiLkZIg7DeRspbsIFAmGZBmIACgkQIg7DeRsp
 bsLevQ//XfCEcvJ1sB4OEiN97xyy5me4FoOo5rWuzG/ZN/YmUH0CkzJHIhjDcCg3
 2FslxH5doOA3zLEBCQKXtcW4uaLSgJcqDgFgpE0TZk/6VKB9RD5q2eSjd+akFMGh
 HFge54pfgpR7pYYwWRvbqOJRyzkU5oHAjMmt2UweOoX3qwynhMhTrT/03Y9pGMgK
 VBHhp+ocfdLGQk3nbehAWsh7AWItWwOtKblsTFoyJ6BW0pxb7Yc6+wrpyxLYCaRK
 rCbyXDStvDqjeBSdx2GZDrA7HbVsrZTHA7sSStIW8yIss1/YJXTP0J2PMXmYNbeE
 ou2WCg/iti1DNwN7AOR0OdPu1NfPQkyW6NmV8814Haa8Ub3GUc6RCo+U4wlCXAbo
 ZcHWlb8sgWgfQMzho3WfgkeXuEohO+nOV/x/JFt+NFcwidNTQKO7FQ8GsyylUcYo
 fBhElbn7p44eS1ivMFEwzptBbpH1JVbb30iV7tMWxyjJQ9TkzpsC3Ph14JimSChk
 oZuUnmgMztss/ikEMFcDLhd3DNedXfz10Boq6FucD8x46cW5j7o0scwIomcNtxmx
 C3Y9JCsDdiXAfS6Et6KGbsuWbigT3NjNKETK0+Be65GYNP/NPD5pXLeKywU++cHe
 e+Lucqiej9polcGN3X97lORMDEx0dXpGkM6ZK2rtX66e7rBbB7M=
 =n7BA
 -----END PGP SIGNATURE-----

Merge tag 's390-5.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 updates from Heiko Carstens:

 - Add missing Kconfig option for ftrace direct multi sample, so it can
   be compiled again, and also add s390 support for this sample.

 - Update Christian Borntraeger's email address.

 - Various fixes for memory layout setup. Besides other this makes it
   possible to load shared DCSS segments again.

 - Fix copy to user space of swapped kdump oldmem.

 - Remove -mstack-guard and -mstack-size compile options when building
   vdso binaries. This can happen when CONFIG_VMAP_STACK is disabled and
   results in broken vdso code which causes more or less random
   exceptions. Also remove the not needed -nostdlib option.

 - Fix memory leak on cpu hotplug and return code handling in kexec
   code.

 - Wire up futex_waitv system call.

 - Replace snprintf with sysfs_emit where appropriate.

* tag 's390-5.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  ftrace/samples: add s390 support for ftrace direct multi sample
  ftrace/samples: add missing Kconfig option for ftrace direct multi sample
  MAINTAINERS: update email address of Christian Borntraeger
  s390/kexec: fix memory leak of ipl report buffer
  s390/kexec: fix return code handling
  s390/dump: fix copying to user-space of swapped kdump oldmem
  s390: wire up sys_futex_waitv system call
  s390/vdso: filter out -mstack-guard and -mstack-size
  s390/vdso: remove -nostdlib compiler flag
  s390: replace snprintf in show functions with sysfs_emit
  s390/boot: simplify and fix kernel memory layout setup
  s390/setup: re-arrange memblock setup
  s390/setup: avoid using memblock_enforce_memory_limit
  s390/setup: avoid reserving memory above identity mapping
2021-11-20 10:55:50 -08:00
Eric W. Biederman
fcb116bc43 signal: Replace force_fatal_sig with force_exit_sig when in doubt
Recently to prevent issues with SECCOMP_RET_KILL and similar signals
being changed before they are delivered SA_IMMUTABLE was added.

Unfortunately this broke debuggers[1][2] which reasonably expect
to be able to trap synchronous SIGTRAP and SIGSEGV even when
the target process is not configured to handle those signals.

Add force_exit_sig and use it instead of force_fatal_sig where
historically the code has directly called do_exit.  This has the
implementation benefits of going through the signal exit path
(including generating core dumps) without the danger of allowing
userspace to ignore or change these signals.

This avoids userspace regressions as older kernels exited with do_exit
which debuggers also can not intercept.

In the future is should be possible to improve the quality of
implementation of the kernel by changing some of these force_exit_sig
calls to force_fatal_sig.  That can be done where it matters on
a case-by-case basis with careful analysis.

Reported-by: Kyle Huey <me@kylehuey.com>
Reported-by: kernel test robot <oliver.sang@intel.com>
[1] https://lkml.kernel.org/r/CAP045AoMY4xf8aC_4QU_-j7obuEPYgTcnQQP3Yxk=2X90jtpjw@mail.gmail.com
[2] https://lkml.kernel.org/r/20211117150258.GB5403@xsang-OptiPlex-9020
Fixes: 00b06da29c ("signal: Add SA_IMMUTABLE to ensure forced siganls do not get changed")
Fixes: a3616a3c02 ("signal/m68k: Use force_sigsegv(SIGSEGV) in fpsp040_die")
Fixes: 83a1f27ad7 ("signal/powerpc: On swapcontext failure force SIGSEGV")
Fixes: 9bc508cf07 ("signal/s390: Use force_sigsegv in default_trap_handler")
Fixes: 086ec444f8 ("signal/sparc32: In setup_rt_frame and setup_fram use force_fatal_sig")
Fixes: c317d306d5 ("signal/sparc32: Exit with a fatal signal when try_to_clear_window_buffer fails")
Fixes: 695dd0d634 ("signal/x86: In emulate_vsyscall force a signal instead of calling do_exit")
Fixes: 1fbd60df8a ("signal/vm86_32: Properly send SIGSEGV when the vm86 state cannot be saved.")
Fixes: 941edc5bf1 ("exit/syscall_user_dispatch: Send ordinary signals on failure")
Link: https://lkml.kernel.org/r/871r3dqfv8.fsf_-_@email.froward.int.ebiederm.org
Reviewed-by: Kees Cook <keescook@chromium.org>
Tested-by: Kees Cook <keescook@chromium.org>
Tested-by: Kyle Huey <khuey@kylehuey.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-11-19 09:15:58 -06:00
Baoquan He
4aa9340584 s390/kexec: fix memory leak of ipl report buffer
unreferenced object 0x38000195000 (size 4096):
  comm "kexec", pid 8548, jiffies 4294953647 (age 32443.270s)
  hex dump (first 32 bytes):
    00 00 00 c8 20 00 00 00 00 00 00 c0 02 80 00 00  .... ...........
    40 40 40 40 40 40 40 40 00 00 00 00 00 00 00 00  @@@@@@@@........
  backtrace:
    [<0000000011a2f199>] __vmalloc_node_range+0xc0/0x140
    [<0000000081fa2752>] vzalloc+0x5a/0x70
    [<0000000063a4c92d>] ipl_report_finish+0x2c/0x180
    [<00000000553304da>] kexec_file_add_ipl_report+0xf4/0x150
    [<00000000862d033f>] kexec_file_add_components+0x124/0x160
    [<000000000d2717bb>] arch_kexec_kernel_image_load+0x62/0x90
    [<000000002e0373b6>] kimage_file_alloc_init+0x1aa/0x2e0
    [<0000000060f2d14f>] __do_sys_kexec_file_load+0x17c/0x2c0
    [<000000008c86fe5a>] __s390x_sys_kexec_file_load+0x40/0x50
    [<000000001fdb9dac>] __do_syscall+0x1bc/0x1f0
    [<000000003ee4258d>] system_call+0x78/0xa0

Signed-off-by: Baoquan He <bhe@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
Fixes: 99feaa717e ("s390/kexec_file: Create ipl report and pass to next kernel")
Cc: <stable@vger.kernel.org> # v5.2: 20c76e242e: s390/kexec: fix return code handling
Cc: <stable@vger.kernel.org> # v5.2
Link: https://lore.kernel.org/r/20211116033101.GD21646@MiWiFi-R3L-srv
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-11-18 17:50:07 +01:00
Heiko Carstens
20c76e242e s390/kexec: fix return code handling
kexec_file_add_ipl_report ignores that ipl_report_finish may fail and
can return an error pointer instead of a valid pointer.
Fix this and simplify by returning NULL in case of an error and let
the only caller handle this case.

Fixes: 99feaa717e ("s390/kexec_file: Create ipl report and pass to next kernel")
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-11-18 17:25:35 +01:00
Alexander Egorenkov
3b90954419 s390/dump: fix copying to user-space of swapped kdump oldmem
This commit fixes a bug introduced by commit e9e7870f90 ("s390/dump:
introduce boot data 'oldmem_data'").
OLDMEM_BASE was mistakenly replaced by oldmem_data.size instead of
oldmem_data.start.

This bug caused the following error during kdump:
kdump.sh[878]: No program header covering vaddr 0x3434f5245found kexec bug?

Fixes: e9e7870f90 ("s390/dump: introduce boot data 'oldmem_data'")
Cc: stable@vger.kernel.org # 5.15+
Signed-off-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Reviewed-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-11-18 17:25:34 +01:00
Vasily Gorbik
6c122360cf s390: wire up sys_futex_waitv system call
Tested with futex kselftests.

Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-11-16 12:29:19 +01:00
Sven Schnelle
00b55eaf45 s390/vdso: filter out -mstack-guard and -mstack-size
When CONFIG_VMAP_STACK is disabled, the user can enable CONFIG_STACK_CHECK,
which adds a stack overflow check to each C function in the kernel. This is
also done for functions in the vdso page. These functions are run in user
context and user stack sizes are usually different to what the kernel uses.
This might trigger the stack check although the stack size is valid.
Therefore filter the -mstack-guard and -mstack-size flags when compiling
vdso C files.

Cc: stable@kernel.org # 5.10+
Fixes: 4bff8cb545 ("s390: convert to GENERIC_VDSO")
Reported-by: Janosch Frank <frankja@linux.ibm.com>
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-11-16 12:29:19 +01:00
Masahiro Yamada
7b737adc10 s390/vdso: remove -nostdlib compiler flag
The -nostdlib option requests the compiler to not use the standard
system startup files or libraries when linking. It is effective only
when $(CC) is used as a linker driver.

Since commit 2b2a25845d ("s390/vdso: Use $(LD) instead of $(CC) to
link vDSO"), $(LD) is directly used, hence -nostdlib is unneeded.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Link: https://lore.kernel.org/r/20211107162111.323701-1-masahiroy@kernel.org
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-11-16 12:29:19 +01:00
Vasily Gorbik
6ad5f024d1 s390/setup: re-arrange memblock setup
- Avoid using ULONG_MAX in memblock_remove, it has no functional change
  but makes memblock_dbg output a range which makes sense.

- Actually finish memblock memory setup before doing amode31/cr/uv
  setup.

- Move memblock_dump_all() debug output after memblock memory setup is
  complete. This gives us final "memory" regions if they were trimmed
  due to addressing limits and still "physmem" regions as original info
  which came from mem_detect.

Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-11-16 12:29:19 +01:00
Vasily Gorbik
5dbc4cb466 s390/setup: avoid using memblock_enforce_memory_limit
There is a difference in how architectures treat "mem=" option. For some
that is an amount of online memory, for s390 and x86 this is the limiting
max address. Some memblock api like memblock_enforce_memory_limit()
take limit argument and explicitly treat it as the size of online memory,
and use __find_max_addr to convert it to an actual max address. Current
s390 usage:

memblock_enforce_memory_limit(memblock_end_of_DRAM());

yields different results depending on presence of memory holes (offline
memory blocks in between online memory). If there are no memory holes
limit == max_addr in memblock_enforce_memory_limit() and it does trim
online memory and reserved memory regions. With memory holes present it
actually does nothing.

Since we already use memblock_remove() explicitly to trim online memory
regions to potential limit (think mem=, kdump, addressing limits, etc.)
drop the usage of memblock_enforce_memory_limit() altogether. Trimming
reserved regions should not be required, since we now use
memblock_set_current_limit() to limit allocations and any explicit memory
reservations above the limit is an actual problem we should not hide.

Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-11-16 12:29:18 +01:00
Vasily Gorbik
420f48f636 s390/setup: avoid reserving memory above identity mapping
Such reserved memory region, if not cleaned up later causes problems when
memblock_free_all() is called to release free pages to the buddy allocator
and those reserved regions are carried over to reserve_bootmem_region()
which marks the pages as PageReserved.

Instead use memblock_set_current_limit() to make sure memblock allocations
do not go over identity mapping (which could happen when "mem=" option
is used or during kdump).

Cc: stable@vger.kernel.org
Fixes: 73045a08cf ("s390: unify identity mapping limits handling")
Reported-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-11-16 12:29:18 +01:00
Linus Torvalds
be427a88a3 s390 updates for the 5.16 merge window #2
- Add PCI automatic error recovery.
 
 - Fix tape driver timer initialization broken during timers api cleanup.
 
 - Fix bogus CPU measurement counters values on CPUs offlining.
 
 - Check the validity of subchanel before reading other fields in
   the schib in cio code.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEE3QHqV+H2a8xAv27vjYWKoQLXFBgFAmGPwDMACgkQjYWKoQLX
 FBgCoQf/VBel0vDex9NNVo59OmGTNh9NPPT2cUU8vYEmwHfaBeInZVEx5WOxXijl
 8MIbEgi6Wt3EwnIghjouC50nk8jCiNOJ8Z/wG+01zZpVpLk2GvKGjxoYxKg+5E6T
 sOSr7TMeKOtOp23xKAXGVIzzkrDTSyr3qruTKg/m6TFhQ0XSm/ld2k6tR5AARbuB
 UtsxBOtPWyHm1xPyuhjr+c6riK2vGQwJwYya4vGtIW8ix9uZoPabdqqzWsD3meBc
 B6fe96YQGxA8Tt80FtyJ6pHEhNDr8CE656aJZNJCnd7q1RmLWC1R/aUme+9wqQtO
 i9YmMvc+uzgQonpo+YgWqu9fIQqcXA==
 =OjW1
 -----END PGP SIGNATURE-----

Merge tag 's390-5.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull more s390 updates from Vasily Gorbik:

 - Add PCI automatic error recovery.

 - Fix tape driver timer initialization broken during timers api
   cleanup.

 - Fix bogus CPU measurement counters values on CPUs offlining.

 - Check the validity of subchanel before reading other fields in the
   schib in cio code.

* tag 's390-5.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/cio: check the subchannel validity for dev_busid
  s390/cpumf: cpum_cf PMU displays invalid value after hotplug remove
  s390/tape: fix timer initialization in tape_std_assign()
  s390/pci: implement minimal PCI error recovery
  PCI: Export pci_dev_lock()
  s390/pci: implement reset_slot for hotplug slot
  s390/pci: refresh function handle in iomap
2021-11-13 09:18:06 -08:00
Linus Torvalds
5147da902e Merge branch 'exit-cleanups-for-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull exit cleanups from Eric Biederman:
 "While looking at some issues related to the exit path in the kernel I
  found several instances where the code is not using the existing
  abstractions properly.

  This set of changes introduces force_fatal_sig a way of sending a
  signal and not allowing it to be caught, and corrects the misuse of
  the existing abstractions that I found.

  A lot of the misuse of the existing abstractions are silly things such
  as doing something after calling a no return function, rolling BUG by
  hand, doing more work than necessary to terminate a kernel thread, or
  calling do_exit(SIGKILL) instead of calling force_sig(SIGKILL).

  In the review a deficiency in force_fatal_sig and force_sig_seccomp
  where ptrace or sigaction could prevent the delivery of the signal was
  found. I have added a change that adds SA_IMMUTABLE to change that
  makes it impossible to interrupt the delivery of those signals, and
  allows backporting to fix force_sig_seccomp

  And Arnd found an issue where a function passed to kthread_run had the
  wrong prototype, and after my cleanup was failing to build."

* 'exit-cleanups-for-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (23 commits)
  soc: ti: fix wkup_m3_rproc_boot_thread return type
  signal: Add SA_IMMUTABLE to ensure forced siganls do not get changed
  signal: Replace force_sigsegv(SIGSEGV) with force_fatal_sig(SIGSEGV)
  exit/r8188eu: Replace the macro thread_exit with a simple return 0
  exit/rtl8712: Replace the macro thread_exit with a simple return 0
  exit/rtl8723bs: Replace the macro thread_exit with a simple return 0
  signal/x86: In emulate_vsyscall force a signal instead of calling do_exit
  signal/sparc32: In setup_rt_frame and setup_fram use force_fatal_sig
  signal/sparc32: Exit with a fatal signal when try_to_clear_window_buffer fails
  exit/syscall_user_dispatch: Send ordinary signals on failure
  signal: Implement force_fatal_sig
  exit/kthread: Have kernel threads return instead of calling do_exit
  signal/s390: Use force_sigsegv in default_trap_handler
  signal/vm86_32: Properly send SIGSEGV when the vm86 state cannot be saved.
  signal/vm86_32: Replace open coded BUG_ON with an actual BUG_ON
  signal/sparc: In setup_tsb_params convert open coded BUG into BUG
  signal/powerpc: On swapcontext failure force SIGSEGV
  signal/sh: Use force_sig(SIGKILL) instead of do_group_exit(SIGKILL)
  signal/mips: Update (_save|_restore)_fp_context to fail with -EFAULT
  signal/sparc32: Remove unreachable do_exit in do_sparc_fault
  ...
2021-11-10 16:15:54 -08:00
Thomas Richter
9d48c7afed s390/cpumf: cpum_cf PMU displays invalid value after hotplug remove
When a CPU is hotplugged while the perf stat -e cycles command is
running, a wrong (very large) value is displayed immediately after the
CPU removal:

  Check the values, shouldn't be too high as in
            time             counts unit events
     1.001101919           29261846      cycles
     2.002454499           17523405      cycles
     3.003659292           24361161      cycles
     4.004816983 18446744073638406144      cycles
     5.005671647      <not counted>      cycles
     ...

The CPU hotplug off took place after 3 seconds.
The issue is the read of the event count value after 4 seconds when
the CPU is not available and the read of the counter returns an
error. This is treated as a counter value of zero. This results
in a very large value (0 - previous_value).

Fix this by detecting the hotplugged off CPU and report 0 instead
of a very large number.

Cc: stable@vger.kernel.org
Fixes: a029a4eab3 ("s390/cpumf: Allow concurrent access for CPU Measurement Counter Facility")
Reported-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-11-08 14:17:49 +01:00
Linus Torvalds
0b707e572a s390 updates for the 5.16 merge window
- Add support for ftrace with direct call and ftrace direct call samples.
 
 - Add support for kernel command lines longer than current 896 bytes and
   make its length configurable.
 
 - Add support for BEAR enhancement facility to improve last breaking
   event instruction tracking.
 
 - Add kprobes sanity checks and testcases to prevent kprobe in the mid
   of an instruction.
 
 - Allow concurrent access to /dev/hwc for the CPUMF users.
 
 - Various ftrace / jump label improvements.
 
 - Convert unwinder tests to KUnit.
 
 - Add s390_iommu_aperture kernel parameter to tweak the limits on
   concurrently usable DMA mappings.
 
 - Add ap.useirq AP module option which can be used to disable interrupt
   use.
 
 - Add add_disk() error handling support to block device drivers.
 
 - Drop arch specific and use generic implementation of strlcpy and strrchr.
 
 - Several __pa/__va usages fixes.
 
 - Various cio, crypto, pci, kernel doc and other small fixes and
   improvements all over the code.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEE3QHqV+H2a8xAv27vjYWKoQLXFBgFAmGFW6EACgkQjYWKoQLX
 FBg20Qf/UbohgnKnE6vxbbH3sNTlI2dk3Cw4z3IobcsZgqXAu6AFLgLQGLk/X07F
 DIyUdrgSgCzLIEKLqrLrFXIOMIK44zAGaurIltNt7IrnWWlA+/YVD+YeL2gHwccq
 wT7KXRcrVMZQ1z18djJQ45DpPUC8ErBdL6+P+ftHck90YGFZsfMA5S7jf8X1h08U
 IlqdPTmY8t4unKHWVpHbxx9b+xrUuV6KTEXADsllpMV2jQoTLdDECd3vmefYR6tR
 3lssgop1m/RzH5OCqvia5Sy2D5fOQObNWDMakwOkVMxOD43lmGCTHstzS2Uo2OFE
 QcY79lfZ5NrzKnenUdE5Fd0XJ9kSwQ==
 =k0Ab
 -----END PGP SIGNATURE-----

Merge tag 's390-5.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 updates from Vasily Gorbik:

 - Add support for ftrace with direct call and ftrace direct call
   samples.

 - Add support for kernel command lines longer than current 896 bytes
   and make its length configurable.

 - Add support for BEAR enhancement facility to improve last breaking
   event instruction tracking.

 - Add kprobes sanity checks and testcases to prevent kprobe in the mid
   of an instruction.

 - Allow concurrent access to /dev/hwc for the CPUMF users.

 - Various ftrace / jump label improvements.

 - Convert unwinder tests to KUnit.

 - Add s390_iommu_aperture kernel parameter to tweak the limits on
   concurrently usable DMA mappings.

 - Add ap.useirq AP module option which can be used to disable interrupt
   use.

 - Add add_disk() error handling support to block device drivers.

 - Drop arch specific and use generic implementation of strlcpy and
   strrchr.

 - Several __pa/__va usages fixes.

 - Various cio, crypto, pci, kernel doc and other small fixes and
   improvements all over the code.

[ Merge fixup as per https://lore.kernel.org/all/YXAqZ%2FEszRisunQw@osiris/ ]

* tag 's390-5.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (63 commits)
  s390: make command line configurable
  s390: support command lines longer than 896 bytes
  s390/kexec_file: move kernel image size check
  s390/pci: add s390_iommu_aperture kernel parameter
  s390/spinlock: remove incorrect kernel doc indicator
  s390/string: use generic strlcpy
  s390/string: use generic strrchr
  s390/ap: function rework based on compiler warning
  s390/cio: make ccw_device_dma_* more robust
  s390/vfio-ap: s390/crypto: fix all kernel-doc warnings
  s390/hmcdrv: fix kernel doc comments
  s390/ap: new module option ap.useirq
  s390/cpumf: Allow multiple processes to access /dev/hwc
  s390/bitops: return true/false (not 1/0) from bool functions
  s390: add support for BEAR enhancement facility
  s390: introduce nospec_uses_trampoline()
  s390: rename last_break to pgm_last_break
  s390/ptrace: add last_break member to pt_regs
  s390/sclp: sort out physical vs virtual pointers usage
  s390/setup: convert start and end initrd pointers to virtual
  ...
2021-11-06 14:48:06 -07:00
Linus Torvalds
512b7931ad Merge branch 'akpm' (patches from Andrew)
Merge misc updates from Andrew Morton:
 "257 patches.

  Subsystems affected by this patch series: scripts, ocfs2, vfs, and
  mm (slab-generic, slab, slub, kconfig, dax, kasan, debug, pagecache,
  gup, swap, memcg, pagemap, mprotect, mremap, iomap, tracing, vmalloc,
  pagealloc, memory-failure, hugetlb, userfaultfd, vmscan, tools,
  memblock, oom-kill, hugetlbfs, migration, thp, readahead, nommu, ksm,
  vmstat, madvise, memory-hotplug, rmap, zsmalloc, highmem, zram,
  cleanups, kfence, and damon)"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (257 commits)
  mm/damon: remove return value from before_terminate callback
  mm/damon: fix a few spelling mistakes in comments and a pr_debug message
  mm/damon: simplify stop mechanism
  Docs/admin-guide/mm/pagemap: wordsmith page flags descriptions
  Docs/admin-guide/mm/damon/start: simplify the content
  Docs/admin-guide/mm/damon/start: fix a wrong link
  Docs/admin-guide/mm/damon/start: fix wrong example commands
  mm/damon/dbgfs: add adaptive_targets list check before enable monitor_on
  mm/damon: remove unnecessary variable initialization
  Documentation/admin-guide/mm/damon: add a document for DAMON_RECLAIM
  mm/damon: introduce DAMON-based Reclamation (DAMON_RECLAIM)
  selftests/damon: support watermarks
  mm/damon/dbgfs: support watermarks
  mm/damon/schemes: activate schemes based on a watermarks mechanism
  tools/selftests/damon: update for regions prioritization of schemes
  mm/damon/dbgfs: support prioritization weights
  mm/damon/vaddr,paddr: support pageout prioritization
  mm/damon/schemes: prioritize regions within the quotas
  mm/damon/selftests: support schemes quotas
  mm/damon/dbgfs: support quotas of schemes
  ...
2021-11-06 14:08:17 -07:00
David Hildenbrand
952eea9b01 memblock: allow to specify flags with memblock_add_node()
We want to specify flags when hotplugging memory.  Let's prepare to pass
flags to memblock_add_node() by adjusting all existing users.

Note that when hotplugging memory the system is already up and running
and we might have concurrent memblock users: for example, while we're
hotplugging memory, kexec_file code might search for suitable memory
regions to place kexec images.  It's important to add the memory
directly to memblock via a single call with the right flags, instead of
adding the memory first and apply flags later: otherwise, concurrent
memblock users might temporarily stumble over memblocks with wrong
flags, which will be important in a follow-up patch that introduces a
new flag to properly handle add_memory_driver_managed().

Link: https://lkml.kernel.org/r/20211004093605.5830-4-david@redhat.com
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Shahab Vahedi <shahab@synopsys.com>	[arch/arc]
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Jianyong Wu <Jianyong.Wu@arm.com>
Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06 13:30:42 -07:00
Mike Rapoport
3ecc68349b memblock: rename memblock_free to memblock_phys_free
Since memblock_free() operates on a physical range, make its name
reflect it and rename it to memblock_phys_free(), so it will be a
logical counterpart to memblock_phys_alloc().

The callers are updated with the below semantic patch:

    @@
    expression addr;
    expression size;
    @@
    - memblock_free(addr, size);
    + memblock_phys_free(addr, size);

Link: https://lkml.kernel.org/r/20210930185031.18648-6-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Juergen Gross <jgross@suse.com>
Cc: Shahab Vahedi <Shahab.Vahedi@synopsys.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06 13:30:41 -07:00
Mike Rapoport
fa27717110 memblock: drop memblock_free_early_nid() and memblock_free_early()
memblock_free_early_nid() is unused and memblock_free_early() is an
alias for memblock_free().

Replace calls to memblock_free_early() with calls to memblock_free() and
remove memblock_free_early() and memblock_free_early_nid().

Link: https://lkml.kernel.org/r/20210930185031.18648-4-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Juergen Gross <jgross@suse.com>
Cc: Shahab Vahedi <Shahab.Vahedi@synopsys.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06 13:30:41 -07:00
Linus Torvalds
d7e0a795bf ARM:
* More progress on the protected VM front, now with the full
   fixed feature set as well as the limitation of some hypercalls
   after initialisation.
 
 * Cleanup of the RAZ/WI sysreg handling, which was pointlessly
   complicated
 
 * Fixes for the vgic placement in the IPA space, together with a
   bunch of selftests
 
 * More memcg accounting of the memory allocated on behalf of a guest
 
 * Timer and vgic selftests
 
 * Workarounds for the Apple M1 broken vgic implementation
 
 * KConfig cleanups
 
 * New kvmarm.mode=none option, for those who really dislike us
 
 RISC-V:
 * New KVM port.
 
 x86:
 * New API to control TSC offset from userspace
 
 * TSC scaling for nested hypervisors on SVM
 
 * Switch masterclock protection from raw_spin_lock to seqcount
 
 * Clean up function prototypes in the page fault code and avoid
 repeated memslot lookups
 
 * Convey the exit reason to userspace on emulation failure
 
 * Configure time between NX page recovery iterations
 
 * Expose Predictive Store Forwarding Disable CPUID leaf
 
 * Allocate page tracking data structures lazily (if the i915
 KVM-GT functionality is not compiled in)
 
 * Cleanups, fixes and optimizations for the shadow MMU code
 
 s390:
 * SIGP Fixes
 
 * initial preparations for lazy destroy of secure VMs
 
 * storage key improvements/fixes
 
 * Log the guest CPNC
 
 Starting from this release, KVM-PPC patches will come from
 Michael Ellerman's PPC tree.
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmGBOiEUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroNowwf/axlx3g9sgCwQHr12/6UF/7hL/RwP
 9z+pGiUzjl2YQE+RjSvLqyd6zXh+h4dOdOKbZDLSkSTbcral/8U70ojKnQsXM0XM
 1LoymxBTJqkgQBLm9LjYreEbzrPV4irk4ygEmuk3CPOHZu8xX1ei6c5LdandtM/n
 XVUkXsQY+STkmnGv4P3GcPoDththCr0tBTWrFWtxa0w9hYOxx0ay1AZFlgM4FFX0
 QFuRc8VBLoDJpIUjbkhsIRIbrlHc/YDGjuYnAU7lV/CIME8vf2BW6uBwIZJdYcDj
 0ejozLjodEnuKXQGnc8sXFioLX2gbMyQJEvwCgRvUu/EU7ncFm1lfs7THQ==
 =UxKM
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:
 "ARM:

   - More progress on the protected VM front, now with the full fixed
     feature set as well as the limitation of some hypercalls after
     initialisation.

   - Cleanup of the RAZ/WI sysreg handling, which was pointlessly
     complicated

   - Fixes for the vgic placement in the IPA space, together with a
     bunch of selftests

   - More memcg accounting of the memory allocated on behalf of a guest

   - Timer and vgic selftests

   - Workarounds for the Apple M1 broken vgic implementation

   - KConfig cleanups

   - New kvmarm.mode=none option, for those who really dislike us

  RISC-V:

   - New KVM port.

  x86:

   - New API to control TSC offset from userspace

   - TSC scaling for nested hypervisors on SVM

   - Switch masterclock protection from raw_spin_lock to seqcount

   - Clean up function prototypes in the page fault code and avoid
     repeated memslot lookups

   - Convey the exit reason to userspace on emulation failure

   - Configure time between NX page recovery iterations

   - Expose Predictive Store Forwarding Disable CPUID leaf

   - Allocate page tracking data structures lazily (if the i915 KVM-GT
     functionality is not compiled in)

   - Cleanups, fixes and optimizations for the shadow MMU code

  s390:

   - SIGP Fixes

   - initial preparations for lazy destroy of secure VMs

   - storage key improvements/fixes

   - Log the guest CPNC

  Starting from this release, KVM-PPC patches will come from Michael
  Ellerman's PPC tree"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (227 commits)
  RISC-V: KVM: fix boolreturn.cocci warnings
  RISC-V: KVM: remove unneeded semicolon
  RISC-V: KVM: Fix GPA passed to __kvm_riscv_hfence_gvma_xyz() functions
  RISC-V: KVM: Factor-out FP virtualization into separate sources
  KVM: s390: add debug statement for diag 318 CPNC data
  KVM: s390: pv: properly handle page flags for protected guests
  KVM: s390: Fix handle_sske page fault handling
  KVM: x86: SGX must obey the KVM_INTERNAL_ERROR_EMULATION protocol
  KVM: x86: On emulation failure, convey the exit reason, etc. to userspace
  KVM: x86: Get exit_reason as part of kvm_x86_ops.get_exit_info
  KVM: x86: Clarify the kvm_run.emulation_failure structure layout
  KVM: s390: Add a routine for setting userspace CPU state
  KVM: s390: Simplify SIGP Set Arch handling
  KVM: s390: pv: avoid stalls when making pages secure
  KVM: s390: pv: avoid stalls for kvm_s390_pv_init_vm
  KVM: s390: pv: avoid double free of sida page
  KVM: s390: pv: add macros for UVC CC values
  s390/mm: optimize reset_guest_reference_bit()
  s390/mm: optimize set_guest_storage_key()
  s390/mm: no need for pte_alloc_map_lock() if we know the pmd is present
  ...
2021-11-02 11:24:14 -07:00
Linus Torvalds
d2fac0afe8 audit/stable-5.16 PR 20211101
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmGANdUUHHBhdWxAcGF1
 bC1tb29yZS5jb20ACgkQ6iDy2pc3iXOmihAAgKSTv4Jf0s4yopdcxfuLweiyqHX1
 719QJzdLZohmllrJPq/83FZL9qodCzxy87nAm67Ht0baSKiEjtVgRaVCqJWEE+l6
 oQL+wUsGLP7CmExOP503Uh6tW35AhETQA4Uwu6QtiUYLYG17kAgeR3cTFuekUsJS
 iL4K65PXE2bBxMe7Ta1YIZqcxptbknMgpqYkdne7xs7RS+UiVj8TyRle6ACrfzEX
 IVy4LTk+spHCy1a494g9pt/21xOnbiLHr/FpckALscnvJiUThxbfQHGSQeMpM4uM
 BnwCqFrj860vMeh52M11/GAAXmdPh6AjoLhaSIW2I3M2GbV8ZP2hu1HYUz3osmrT
 f+aeMPJ4feX1xVj6qAC+1G83XRO83tP/YIEuocGiwyepImB25NHPin21xepf6Ru0
 wJX+aXC9O1eG6E2ghT6tBim/MpeNH5OT0hNO3uhGmEQ6xZpArRVVaBwlEdufJiCx
 ZljqEFUT7wA9nGEQif6GdLnGezGr/aNL65caTkIAzHKamd79QIr7VZXYjYIfHSqE
 p74Aro6E8qoQJjsTSkvZceM0u1LRzwS4wPRroE6eGz98oYDpiDm1RPb+9Gw5jyJf
 JN7UjJKO9+iPGAi3KivGBqpBskw4cCp2y/nHrMYmpGUPELcr5kQtDfQ6yp59tVZ8
 Dwo5GeSlG6khmiI=
 =WrEw
 -----END PGP SIGNATURE-----

Merge tag 'audit-pr-20211101' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit

Pull audit updates from Paul Moore:
 "Add some additional audit logging to capture the openat2() syscall
  open_how struct info.

  Previous variations of the open()/openat() syscalls allowed audit
  admins to inspect the syscall args to get the information contained in
  the new open_how struct used in openat2()"

* tag 'audit-pr-20211101' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
  audit: return early if the filter rule has a lower priority
  audit: add OPENAT2 record to list "how" info
  audit: add support for the openat2 syscall
  audit: replace magic audit syscall class numbers with macros
  lsm_audit: avoid overloading the "key" audit field
  audit: Convert to SPDX identifier
  audit: rename struct node to struct audit_node to prevent future name collisions
2021-11-01 21:17:39 -07:00
Linus Torvalds
79ef0c0014 Tracing updates for 5.16:
- kprobes: Restructured stack unwinder to show properly on x86 when a stack
   dump happens from a kretprobe callback.
 
 - Fix to bootconfig parsing
 
 - Have tracefs allow owner and group permissions by default (only denying
   others). There's been pressure to allow non root to tracefs in a
   controlled fashion, and using groups is probably the safest.
 
 - Bootconfig memory managament updates.
 
 - Bootconfig clean up to have the tools directory be less dependent on
   changes in the kernel tree.
 
 - Allow perf to be traced by function tracer.
 
 - Rewrite of function graph tracer to be a callback from the function tracer
   instead of having its own trampoline (this change will happen on an arch
   by arch basis, and currently only x86_64 implements it).
 
 - Allow multiple direct trampolines (bpf hooks to functions) be batched
   together in one synchronization.
 
 - Allow histogram triggers to add variables that can perform calculations
   against the event's fields.
 
 - Use the linker to determine architecture callbacks from the ftrace
   trampoline to allow for proper parameter prototypes and prevent warnings
   from the compiler.
 
 - Extend histogram triggers to key off of variables.
 
 - Have trace recursion use bit magic to determine preempt context over if
   branches.
 
 - Have trace recursion disable preemption as all use cases do anyway.
 
 - Added testing for verification of tracing utilities.
 
 - Various small clean ups and fixes.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCYYBdxhQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qp1sAQD2oYFwaG3sx872gj/myBcHIBSKdiki
 Hry5csd8zYDBpgD+Poylopt5JIbeDuoYw/BedgEXmscZ8Qr7VzjAXdnv/Q4=
 =Loz8
 -----END PGP SIGNATURE-----

Merge tag 'trace-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing updates from Steven Rostedt:

 - kprobes: Restructured stack unwinder to show properly on x86 when a
   stack dump happens from a kretprobe callback.

 - Fix to bootconfig parsing

 - Have tracefs allow owner and group permissions by default (only
   denying others). There's been pressure to allow non root to tracefs
   in a controlled fashion, and using groups is probably the safest.

 - Bootconfig memory managament updates.

 - Bootconfig clean up to have the tools directory be less dependent on
   changes in the kernel tree.

 - Allow perf to be traced by function tracer.

 - Rewrite of function graph tracer to be a callback from the function
   tracer instead of having its own trampoline (this change will happen
   on an arch by arch basis, and currently only x86_64 implements it).

 - Allow multiple direct trampolines (bpf hooks to functions) be batched
   together in one synchronization.

 - Allow histogram triggers to add variables that can perform
   calculations against the event's fields.

 - Use the linker to determine architecture callbacks from the ftrace
   trampoline to allow for proper parameter prototypes and prevent
   warnings from the compiler.

 - Extend histogram triggers to key off of variables.

 - Have trace recursion use bit magic to determine preempt context over
   if branches.

 - Have trace recursion disable preemption as all use cases do anyway.

 - Added testing for verification of tracing utilities.

 - Various small clean ups and fixes.

* tag 'trace-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (101 commits)
  tracing/histogram: Fix semicolon.cocci warnings
  tracing/histogram: Fix documentation inline emphasis warning
  tracing: Increase PERF_MAX_TRACE_SIZE to handle Sentinel1 and docker together
  tracing: Show size of requested perf buffer
  bootconfig: Initialize ret in xbc_parse_tree()
  ftrace: do CPU checking after preemption disabled
  ftrace: disable preemption when recursion locked
  tracing/histogram: Document expression arithmetic and constants
  tracing/histogram: Optimize division by a power of 2
  tracing/histogram: Covert expr to const if both operands are constants
  tracing/histogram: Simplify handling of .sym-offset in expressions
  tracing: Fix operator precedence for hist triggers expression
  tracing: Add division and multiplication support for hist triggers
  tracing: Add support for creating hist trigger variables from literal
  selftests/ftrace: Stop tracing while reading the trace file by default
  MAINTAINERS: Update KPROBES and TRACING entries
  test_kprobes: Move it from kernel/ to lib/
  docs, kprobes: Remove invalid URL and add new reference
  samples/kretprobes: Fix return value if register_kretprobe() failed
  lib/bootconfig: Fix the xbc_get_info kerneldoc
  ...
2021-11-01 20:05:19 -07:00
Eric W. Biederman
e21294a7aa signal: Replace force_sigsegv(SIGSEGV) with force_fatal_sig(SIGSEGV)
Now that force_fatal_sig exists it is unnecessary and a bit confusing
to use force_sigsegv in cases where the simpler force_fatal_sig is
wanted.  So change every instance we can to make the code clearer.

Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Link: https://lkml.kernel.org/r/877de7jrev.fsf@disp2133
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-10-29 14:31:34 -05:00
Eric W. Biederman
9bc508cf07 signal/s390: Use force_sigsegv in default_trap_handler
Reading the history it is unclear why default_trap_handler calls
do_exit.  It is not even menthioned in the commit where the change
happened.  My best guess is that because it is unknown why the
exception happened it was desired to guarantee the process never
returned to userspace.

Using do_exit(SIGSEGV) has the problem that it will only terminate one
thread of a process, leaving the process in an undefined state.

Use force_sigsegv(SIGSEGV) instead which effectively has the same
behavior except that is uses the ordinary signal mechanism and
terminates all threads of a process and is generally well defined.

Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: linux-s390@vger.kernel.org
Fixes: ca2ab03237ec ("[PATCH] s390: core changes")
History Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Link: https://lkml.kernel.org/r/20211020174406.17889-11-ebiederm@xmission.com
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2021-10-29 14:31:05 -05:00
Claudio Imbrenda
380d97bd02 KVM: s390: pv: properly handle page flags for protected guests
Introduce variants of the convert and destroy page functions that also
clear the PG_arch_1 bit used to mark them as secure pages.

The PG_arch_1 flag is always allowed to overindicate; using the new
functions introduced here allows to reduce the extent of overindication
and thus improve performance.

These new functions can only be called on pages for which a reference
is already being held.

Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Link: https://lore.kernel.org/r/20210920132502.36111-7-imbrenda@linux.ibm.com
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2021-10-27 07:55:53 +02:00
Sven Schnelle
5ecb2da660 s390: support command lines longer than 896 bytes
Currently s390 supports a fixed maximum command line length of 896
bytes. This isn't enough as some installers are trying to pass all
configuration data via kernel command line, and even with zfcp alone
it is easy to generate really long command lines. Therefore extend
the command line to 4 kbytes.

In the parm area where the command line is stored there is no indication
of the maximum allowed length, so a new field which contains the maximum
length is added.

The parm area has always been initialized to zero, so with old kernels
this field would read zero. This is important because tools like zipl
could read this field. If it contains a number larger than zero zipl
knows the maximum length that can be stored in the parm area, otherwise
it must assume that it is booting a legacy kernel and only 896 bytes are
available.

The removing of trailing whitespace in head.S is also removed because
code to do this is already present in setup_boot_command_line().

Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-10-26 15:21:31 +02:00
Sven Schnelle
277c838938 s390/kexec_file: move kernel image size check
In preparation of adding support for command lines with variable
sizes on s390, the check whether the new kernel image is at least HEAD_END
bytes long isn't correct. Move the check to kexec_file_add_components()
so we can get the size of the parm area and check the size there.

The '.org HEAD_END' directive can now also be removed from head.S. This
was used in the past to reserve space for the early sccb buffer, but with
commit 9a5131b87cac1 ("s390/boot: move sclp early buffer from fixed address
in asm to C") this is no longer required.

Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-10-26 15:21:30 +02:00
Thomas Richter
453380318e s390/cpumf: Allow multiple processes to access /dev/hwc
Commit a029a4eab3 ("s390/cpumf: Allow concurrent access for CPU Measurement Counter Facility")
added CPU Measurement counter facility access to multiple consumers.
It allows concurrent access to the CPU Measurement counter facility
via several perf_event_open() system call invocations and via ioctl()
system call of device /dev/hwc.  However the access via device /dev/hwc
was exclusive, only one process was able to open this device.

The patch removes this restriction. Now multiple invocations of lshwc
can execute in parallel. They can access different CPUs and counter
sets or CPUs and counter set can overlap.

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Suggested-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-10-26 15:21:29 +02:00
Sven Schnelle
3b051e89da s390: add support for BEAR enhancement facility
The Breaking-Event-Address-Register (BEAR) stores the address of the
last breaking event instruction. Breaking events are usually instructions
that change the program flow - for example branches, and instructions
that modify the address in the PSW like lpswe. This is useful for debugging
wild branches, because one could easily figure out where the wild branch
was originating from.

What is problematic is that lpswe is considered a breaking event, and
therefore overwrites BEAR on kernel exit. The BEAR enhancement facility
adds new instructions that allow to save/restore BEAR and also an lpswey
instruction that doesn't cause a breaking event. So we can save BEAR on
kernel entry and restore it on exit to user space.

Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-10-26 15:21:29 +02:00
Sven Schnelle
5d17d4ed7e s390: introduce nospec_uses_trampoline()
and replace all of the "__is_defined(CC_USING_EXPOLINE) && !nospec_disable"
occurrences.

Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-10-26 15:21:29 +02:00
Sven Schnelle
26c21aa485 s390: rename last_break to pgm_last_break
With the upcoming BEAR enhancements last_break isn't really
unique, so rename it to pgm_last_break. This way it should
be more obvious that this is the last_break value that is
written by the hardware when a program check occurs.

Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-10-26 15:21:28 +02:00
Sven Schnelle
c8f573eccb s390/ptrace: add last_break member to pt_regs
Instead of using args[0] for the value of the last breaking event
address register, add a member to make things more obvious.

Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-10-26 15:21:28 +02:00
Alexander Gordeev
ada1da31ce s390/sclp: sort out physical vs virtual pointers usage
Provide physical addresses whenever the hardware interface
expects it or a 32-bit value used for tracking.

Variable sclp_early_sccb gets initialized in the decompressor
and points to an address in physcal memory. Yet, it is used
as virtual memory pointer and therefore should be converted.

Note, the other two __bootdata variables sclp_info_sccb and
sclp_info_sccb_valid contain plain data, but no pointers and
do need any special care.

Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-10-26 15:21:28 +02:00
Alexander Gordeev
dd9089b654 s390/setup: convert start and end initrd pointers to virtual
Variables initrd_start and initrd_end are expected to hold
virtual memory pointers, not physical.

Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-10-26 15:21:28 +02:00
Alexander Gordeev
04f11ed7d8 s390/setup: use physical pointers for memblock_reserve()
memblock_reserve() function accepts physcal address of a memory
block to be reserved, but provided with virtual memory pointers.

Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-10-26 15:21:28 +02:00
Alexander Gordeev
e035389b73 s390/setup: use virtual address for STSI instruction
Provide virtual memory pointer for system-information block.

Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-10-26 15:21:28 +02:00
Alexander Gordeev
5caca32fba s390/cpcmd: use physical address for command and response
Virtual Console Function DIAGNOSE 8 accepts physical
addresses of command and response strings.

Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-10-26 15:21:28 +02:00
Claudio Imbrenda
f0a1a0615a KVM: s390: pv: avoid stalls when making pages secure
Improve make_secure_pte to avoid stalls when the system is heavily
overcommitted. This was especially problematic in kvm_s390_pv_unpack,
because of the loop over all pages that needed unpacking.

Due to the locks being held, it was not possible to simply replace
uv_call with uv_call_sched. A more complex approach was
needed, in which uv_call is replaced with __uv_call, which does not
loop. When the UVC needs to be executed again, -EAGAIN is returned, and
the caller (or its caller) will try again.

When -EAGAIN is returned, the path is the same as when the page is in
writeback (and the writeback check is also performed, which is
harmless).

Fixes: 214d9bbcd3 ("s390/mm: provide memory management functions for protected KVM guests")
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Link: https://lore.kernel.org/r/20210920132502.36111-5-imbrenda@linux.ibm.com
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2021-10-25 09:20:39 +02:00
David Hildenbrand
46c22ffd27 s390/uv: fully validate the VMA before calling follow_page()
We should not walk/touch page tables outside of VMA boundaries when
holding only the mmap sem in read mode. Evil user space can modify the
VMA layout just before this function runs and e.g., trigger races with
page table removal code since commit dd2283f260 ("mm: mmap: zap pages
with read mmap_sem in munmap").

find_vma() does not check if the address is >= the VMA start address;
use vma_lookup() instead.

Fixes: 214d9bbcd3 ("s390/mm: provide memory management functions for protected KVM guests")
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Link: https://lore.kernel.org/r/20210909162248.14969-6-david@redhat.com
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2021-10-25 09:20:38 +02:00
Eric W. Biederman
9fd5a04d8e exit: Remove calls of do_exit after noreturn versions of die
On nds32, openrisc, s390, sh, and xtensa the function die never
returns.  Mark die __noreturn so that no one expects die to return.
Remove the do_exit calls after die as they will never be reached.

Cc: Jonas Bonn <jonas@southpole.se>
Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
Cc: Stafford Horne <shorne@gmail.com>
Cc: openrisc@lists.librecores.org
Cc: Nick Hu <nickhu@andestech.com>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Vincent Chen <deanbo422@gmail.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: linux-s390@vger.kernel.org
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: linux-sh@vger.kernel.org
Cc: linux-xtensa@linux-xtensa.org
Cc: Chris Zankel <chris@zankel.net>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Fixes: 2.3.16
Fixes: 2.3.99-pre8
Fixes: 3f65ce4d14 ("[PATCH] xtensa: Architecture support for Tensilica Xtensa Part 5")
Fixes: 664eec400b ("nds32: MMU fault handling and page table management")
Fixes: 61e85e3675 ("OpenRISC: Memory management")
Link: https://lkml.kernel.org/r/20211020174406.17889-2-ebiederm@xmission.com
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2021-10-20 13:09:47 -05:00
Heiko Carstens
3d487acf1b s390: make STACK_FRAME_OVERHEAD available via asm-offsets.h
Make STACK_FRAME_OVERHEAD available via asm-offsets.h. This allows to
add s390 specific asm code to e.g. ftrace samples, without requiring
to add random header files, which might cause all sort of problems on
other architectures. asm-offsets.h can be assumed to be non-problematic.

Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20211012133802.2460757-3-hca@linux.ibm.com
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-10-19 15:39:53 +02:00
Heiko Carstens
2ab3a0a9fa s390/ftrace: add HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALL support
This is the s390 variant of commit 562955fe6a ("ftrace/x86: Add
register_ftrace_direct() for custom trampolines").

Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20211012133802.2460757-2-hca@linux.ibm.com
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-10-19 15:39:53 +02:00
Kees Cook
42a20f86dc sched: Add wrapper for get_wchan() to keep task blocked
Having a stable wchan means the process must be blocked and for it to
stay that way while performing stack unwinding.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> [arm]
Tested-by: Mark Rutland <mark.rutland@arm.com> [arm64]
Link: https://lkml.kernel.org/r/20211008111626.332092234@infradead.org
2021-10-15 11:25:14 +02:00
Heiko Carstens
894979689d s390/ftrace: provide separate ftrace_caller/ftrace_regs_caller implementations
ftrace_regs_caller is an alias to ftrace_caller - making ftrace_caller
quite heavyweight. Split the function and provide an ftrace_caller
implementation which comes with fewer instructions. Especially getting
rid of 'stosm' on each function entry should help here, e.g. to
have less performance impact on live patched functions.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-10-11 20:55:58 +02:00
Heiko Carstens
0c14c03795 s390/jump_label: add __init_or_module annotation
Add missing __init_or_module to arch_jump_label_transform_static().

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-10-11 20:55:58 +02:00
Heiko Carstens
acd6c9afc6 s390/jump_label: rename __jump_label_transform()
Trivial patch just to get rid of the leading underscores.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-10-11 20:55:58 +02:00
Heiko Carstens
4e0502b8b3 s390/jump_label: make use of HAVE_JUMP_LABEL_BATCH
Specify HAVE_JUMP_LABEL_BATCH in header file. This allows to make use
of the arch_jump_label_transform_queue()/arch_jump_label_transform_apply()
mechanism.

However unlike on x86, which currently is the only user of this
mechanism, the to be patched instructions are still directly
modified. The only difference to before is that serialization is only
done after all instructions have been modified. This way the number of
serialization/synchronization events is reduced.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-10-11 20:55:58 +02:00
Heiko Carstens
e5873d6f7a s390/ftrace: add missing serialization for graph caller patching
CPUs must be serialized also when ftrace_graph_caller gets patched.
This is missing since ftrace function graph support was added on s390.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-10-11 20:55:58 +02:00
Heiko Carstens
ae2b9a11b4 s390/ftrace: use text_poke_sync_lock()
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-10-11 20:55:58 +02:00
Heiko Carstens
1c27dfb24e s390/jump_label: use text_poke_sync()
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-10-11 20:55:58 +02:00
Heiko Carstens
e16d02ee3f s390: introduce text_poke_sync()
Introduce a text_poke_sync() similar to what x86 has. This can be
used to execute a serializing instruction on all CPUs (including
the current one).

Note: according to the Principles of Operation an IPI (= interrupt)
will already serialize a CPU, however it is better to be explicit. In
addition on_each_cpu() makes sure that also the current CPU get
serialized - just to make sure that possible preemption can prevent
some theoretical case where a CPU will not be serialized.

Therefore text_poke_sync() has to be used whenever code got modified,
just to avoid to rely on implicit serialization.

Also introduce text_poke_sync_lock() which will also disable CPU
hotplug, to prevent that any CPU is just going online with a
prefetched old version of a modified instruction.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-10-11 20:55:58 +02:00
Weizhao Ouyang
6644c654ea ftrace: Cleanup ftrace_dyn_arch_init()
Most of ARCHs use empty ftrace_dyn_arch_init(), introduce a weak common
ftrace_dyn_arch_init() to cleanup them.

Link: https://lkml.kernel.org/r/20210909090216.1955240-1-o451686892@gmail.com

Acked-by: Heiko Carstens <hca@linux.ibm.com> (s390)
Acked-by: Helge Deller <deller@gmx.de> (parisc)
Signed-off-by: Weizhao Ouyang <o451686892@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-08 19:41:39 -04:00
Alexander Gordeev
e3ec8e0f57 s390/boot: allocate amode31 section in decompressor
The memory for amode31 section is allocated from the decompressed
kernel. Instead, allocate that memory from the decompressor. This
is a prerequisite to allow initialization of the virtual memory
before the decompressed kernel takes over.

Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-10-04 09:49:37 +02:00
Alexander Gordeev
584315ed87 s390/boot: initialize control registers in decompressor
Partially revert commit 4555b9f34296 ("s390/boot: move
dma sections from decompressor to decompressed kernel").
This is a prerequisite to allow initialization of virtual
memory in decompressor and avoid overwriting of ASCEs in
the decompressed kernel otherwise.

Since the control registers 2, 5 and 15 are reinitialized
in the decompressed kernel again, this change does not
prevent relocating of amode31 section in any way.

Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-10-04 09:49:37 +02:00
Sven Schnelle
4df898dc06 s390/kprobes: add sanity check
Check whether the specified address points to the start of an
instruction to prevent users from setting a kprobe in the mid of
an instruction which would crash the kernel.

Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-10-04 09:49:36 +02:00
Heiko Carstens
b860b9346e s390/ftrace: remove dead code
ftrace_shared_hotpatch_trampoline() never returns NULL,
therefore quite a bit of code can be removed.

Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-10-04 09:49:36 +02:00
Richard Guy Briggs
1c30e3af8a audit: add support for the openat2 syscall
The openat2(2) syscall was added in kernel v5.6 with commit
fddb5d430a ("open: introduce openat2(2) syscall").

Add the openat2(2) syscall to the audit syscall classifier.

Link: https://github.com/linux-audit/audit-kernel/issues/67
Link: https://lore.kernel.org/r/f5f1a4d8699613f8c02ce762807228c841c2e26f.1621363275.git.rgb@redhat.com
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
[PM: merge fuzz due to previous header rename, commit line wraps]
Signed-off-by: Paul Moore <paul@paul-moore.com>
2021-10-01 16:52:48 -04:00
Richard Guy Briggs
42f355ef59 audit: replace magic audit syscall class numbers with macros
Replace audit syscall class magic numbers with macros.

This required putting the macros into new header file
include/linux/audit_arch.h since the syscall macros were
included for both 64 bit and 32 bit in any compat code, causing
redefinition warnings.

Link: https://lore.kernel.org/r/2300b1083a32aade7ae7efb95826e8f3f260b1df.1621363275.git.rgb@redhat.com
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
[PM: renamed header to audit_arch.h after consulting with Richard]
Signed-off-by: Paul Moore <paul@paul-moore.com>
2021-10-01 16:41:33 -04:00
Masami Hiramatsu
adf8a61a94 kprobes: treewide: Make it harder to refer kretprobe_trampoline directly
Since now there is kretprobe_trampoline_addr() for referring the
address of kretprobe trampoline code, we don't need to access
kretprobe_trampoline directly.

Make it harder to refer by renaming it to __kretprobe_trampoline().

Link: https://lkml.kernel.org/r/163163045446.489837.14510577516938803097.stgit@devnote2

Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-09-30 21:24:06 -04:00
Masami Hiramatsu
96fed8ac2b kprobes: treewide: Remove trampoline_address from kretprobe_trampoline_handler()
The __kretprobe_trampoline_handler() callback, called from low level
arch kprobes methods, has the 'trampoline_address' parameter, which is
entirely superfluous as it basically just replicates:

  dereference_kernel_function_descriptor(kretprobe_trampoline)

In fact we had bugs in arch code where it wasn't replicated correctly.

So remove this superfluous parameter and use kretprobe_trampoline_addr()
instead.

Link: https://lkml.kernel.org/r/163163044546.489837.13505751885476015002.stgit@devnote2

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Tested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-09-30 21:24:06 -04:00
Masami Hiramatsu
9c89bb8e32 kprobes: treewide: Cleanup the error messages for kprobes
This clean up the error/notification messages in kprobes related code.
Basically this defines 'pr_fmt()' macros for each files and update
the messages which describes

 - what happened,
 - what is the kernel going to do or not do,
 - is the kernel fine,
 - what can the user do about it.

Also, if the message is not needed (e.g. the function returns unique
error code, or other error message is already shown.) remove it,
and replace the message with WARN_*() macros if suitable.

Link: https://lkml.kernel.org/r/163163036568.489837.14085396178727185469.stgit@devnote2

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-09-30 21:24:05 -04:00
Linus Torvalds
f154c80667 2nd batch of s390 updates for 5.15 merge window
- Fix topology update on cpu hotplug, so notifiers see expected masks. This bug
   was uncovered with SCHED_CORE support.
 
 - Fix stack unwinding so that the correct number of entries are omitted like
   expected by common code. This fixes KCSAN selftests.
 
 - Add kmemleak annotation to stack_alloc to avoid false positive kmemleak
   warnings.
 
 - Avoid layering violation in common I/O code and don't unregister subchannel
   from child-drivers.
 
 - Remove xpram device driver for which no real use case exists since the kernel
   is 64 bit only. Also all hypervisors got required support removed in the
   meantime, which means the xpram device driver is dead code.
 
 - Fix -ENODEV handling of clp_get_state in our PCI code.
 
 - Enable KFENCE in debug defconfig.
 
 - Cleanup hugetlbfs s390 specific Kconfig dependency.
 
 - Quite a lot of trivial fixes to get rid of "W=1" warnings, and and other
   simple cleanups.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEECMNfWEw3SLnmiLkZIg7DeRspbsIFAmE56jEACgkQIg7DeRsp
 bsI1sQ/+L91zvpjlWGEPjZhQmFJgDufuObLWJlhwOSPsOlezzJTujNscoisTe6Wm
 hfS1I/GzGsgcY3695xgBLgkPS37nrDdDLAgM4CnajOOalEZjbHgH5gcPiCPHfPAD
 QkvVFv2PjCQnaPx81kEIeK6tMFkvi6IRhfwhtGTf1fwoKDyw4IQT1couBsiuAy3n
 28/7NqMidS4gbv5X/BLK1Ez4as9d3PoecNre1debRPOZcdxIjCVDy7OW5MotI3ol
 ENsOHtNJe/orIDCc+QbsEP2xZJZdbZ0D0Zr/RQ4KEue42wKtGLzp/ZuG+UfTPyyx
 vlEDgMRgPHAGnceEImcMwK0XQwOn05sm13jOkbmpIwhmiE46rksAPf3cGL4DjlBP
 3rznDXoLYELX2OAHz2G4jfbrqFWDxbh5rp1NMr8tELvJV5xbdsMC11QFQY28swod
 /sUE39fX+zynwHSSttq0PXtKX4gr/d5ZMDdlhjl7lxlOgwEwDodBL3/xL81+C0qx
 jkQWDsJ6OpZ7iJpGvxaCUhFjlgihdi2InZ942inRGo/A/EaM6/7diExLiyqfaab5
 WEQ2BOlITUey85Fiu2WxeeweRChUwu+XNQt+Nx4hDF454K51htU/GJCUBW5Z5qtN
 Dm+/DolXkPY+joR7xBLHNzivob3ShcsoFiZjoBpTc/Hd18dhSQg=
 =fpJz
 -----END PGP SIGNATURE-----

Merge tag 's390-5.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull more s390 updates from Heiko Carstens:
 "Except for the xpram device driver removal it is all about fixes and
  cleanups.

   - Fix topology update on cpu hotplug, so notifiers see expected
     masks. This bug was uncovered with SCHED_CORE support.

   - Fix stack unwinding so that the correct number of entries are
     omitted like expected by common code. This fixes KCSAN selftests.

   - Add kmemleak annotation to stack_alloc to avoid false positive
     kmemleak warnings.

   - Avoid layering violation in common I/O code and don't unregister
     subchannel from child-drivers.

   - Remove xpram device driver for which no real use case exists since
     the kernel is 64 bit only. Also all hypervisors got required
     support removed in the meantime, which means the xpram device
     driver is dead code.

   - Fix -ENODEV handling of clp_get_state in our PCI code.

   - Enable KFENCE in debug defconfig.

   - Cleanup hugetlbfs s390 specific Kconfig dependency.

   - Quite a lot of trivial fixes to get rid of "W=1" warnings, and and
     other simple cleanups"

* tag 's390-5.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  hugetlbfs: s390 is always 64bit
  s390/ftrace: remove incorrect __va usage
  s390/zcrypt: remove incorrect kernel doc indicators
  scsi: zfcp: fix kernel doc comments
  s390/sclp: add __nonstring annotation
  s390/hmcdrv_ftp: fix kernel doc comment
  s390: remove xpram device driver
  s390/pci: read clp_list_pci_req only once
  s390/pci: fix clp_get_state() handling of -ENODEV
  s390/cio: fix kernel doc comment
  s390/ctrlchar: fix kernel doc comment
  s390/con3270: use proper type for tasklet function
  s390/cpum_cf: move array from header to C file
  s390/mm: fix kernel doc comments
  s390/topology: fix topology information when calling cpu hotplug notifiers
  s390/unwind: use current_frame_address() to unwind current task
  s390/configs: enable CONFIG_KFENCE in debug_defconfig
  s390/entry: make oklabel within CHKSTG macro local
  s390: add kmemleak annotation in stack_alloc()
  s390/cio: dont unregister subchannel from child-drivers
2021-09-09 12:55:12 -07:00
Arnd Bergmann
59ab844eed compat: remove some compat entry points
These are all handled correctly when calling the native system call entry
point, so remove the special cases.

Link: https://lkml.kernel.org/r/20210727144859.4150043-6-arnd@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Feng Tang <feng.tang@intel.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-08 15:32:35 -07:00
Heiko Carstens
9652cb805c s390/ftrace: remove incorrect __va usage
The address of ftrace_graph_caller is already virtual.
Using __va() to translate the address is wrong.

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-09-08 14:23:31 +02:00
Heiko Carstens
5dddfaac4c s390/cpum_cf: move array from header to C file
Move array from header to C file to avoid that it gets defined in
every C file where the header is included:

In file included from arch/s390/kernel/perf_cpum_cf_common.c:19:
./arch/s390/include/asm/cpu_mcf.h:27:18: warning: ‘cpumf_ctr_ctl’ defined but not used [-Wunused-const-variable=]
   27 | static const u64 cpumf_ctr_ctl[CPUMF_CTR_SET_MAX] = {
      |                  ^~~~~~~~~~~~~

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-09-07 13:38:41 +02:00
Sven Schnelle
a052096bdd s390/topology: fix topology information when calling cpu hotplug notifiers
The cpu hotplug notifiers are called without updating the core/thread
masks when a new CPU is added. This causes problems with code setting
up data structures in a cpu hotplug notifier, and relying on that later
in normal code.

This caused a crash in the new core scheduling code (SCHED_CORE),
where rq->core was set up in a notifier depending on cpu masks.

To fix this, add a cpu_setup_mask which is used in update_cpu_masks()
instead of the cpu_online_mask to determine whether the cpu masks should
be set for a certain cpu. Also move update_cpu_masks() to update the
masks before calling notify_cpu_starting() so that the notifiers are
seeing the updated masks.

Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Cc: <stable@vger.kernel.org>
[hca@linux.ibm.com: get rid of cpu_online_mask handling]
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-09-07 13:38:41 +02:00
Linus Torvalds
14726903c8 Merge branch 'akpm' (patches from Andrew)
Merge misc updates from Andrew Morton:
 "173 patches.

  Subsystems affected by this series: ia64, ocfs2, block, and mm (debug,
  pagecache, gup, swap, shmem, memcg, selftests, pagemap, mremap,
  bootmem, sparsemem, vmalloc, kasan, pagealloc, memory-failure,
  hugetlb, userfaultfd, vmscan, compaction, mempolicy, memblock,
  oom-kill, migration, ksm, percpu, vmstat, and madvise)"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (173 commits)
  mm/madvise: add MADV_WILLNEED to process_madvise()
  mm/vmstat: remove unneeded return value
  mm/vmstat: simplify the array size calculation
  mm/vmstat: correct some wrong comments
  mm/percpu,c: remove obsolete comments of pcpu_chunk_populated()
  selftests: vm: add COW time test for KSM pages
  selftests: vm: add KSM merging time test
  mm: KSM: fix data type
  selftests: vm: add KSM merging across nodes test
  selftests: vm: add KSM zero page merging test
  selftests: vm: add KSM unmerge test
  selftests: vm: add KSM merge test
  mm/migrate: correct kernel-doc notation
  mm: wire up syscall process_mrelease
  mm: introduce process_mrelease system call
  memblock: make memblock_find_in_range method private
  mm/mempolicy.c: use in_task() in mempolicy_slab_node()
  mm/mempolicy: unify the create() func for bind/interleave/prefer-many policies
  mm/mempolicy: advertise new MPOL_PREFERRED_MANY
  mm/hugetlb: add support for mempolicy MPOL_PREFERRED_MANY
  ...
2021-09-03 10:08:28 -07:00
Suren Baghdasaryan
dce4910396 mm: wire up syscall process_mrelease
Split off from prev patch in the series that implements the syscall.

Link: https://lkml.kernel.org/r/20210809185259.405936-2-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Jan Engelhardt <jengelh@inai.de>
Cc: Jann Horn <jannh@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Tim Murray <timmurray@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-03 09:58:17 -07:00
Mike Rapoport
a7259df767 memblock: make memblock_find_in_range method private
There are a lot of uses of memblock_find_in_range() along with
memblock_reserve() from the times memblock allocation APIs did not exist.

memblock_find_in_range() is the very core of memblock allocations, so any
future changes to its internal behaviour would mandate updates of all the
users outside memblock.

Replace the calls to memblock_find_in_range() with an equivalent calls to
memblock_phys_alloc() and memblock_phys_alloc_range() and make
memblock_find_in_range() private method of memblock.

This simplifies the callers, ensures that (unlikely) errors in
memblock_reserve() are handled and improves maintainability of
memblock_find_in_range().

Link: https://lkml.kernel.org/r/20210816122622.30279-1-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>		[arm64]
Acked-by: Kirill A. Shutemov <kirill.shtuemov@linux.intel.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	[ACPI]
Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Acked-by: Nick Kossifidis <mick@ics.forth.gr>			[riscv]
Tested-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-03 09:58:17 -07:00
Linus Torvalds
bcfeebbff3 Merge branch 'exit-cleanups-for-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull exit cleanups from Eric Biederman:
 "In preparation of doing something about PTRACE_EVENT_EXIT I have
  started cleaning up various pieces of code related to do_exit. Most of
  that code I did not manage to get tested and reviewed before the merge
  window opened but a handful of very useful cleanups are ready to be
  merged.

  The first change is simply the removal of the bdflush system call. The
  code has now been disabled long enough that even the oldest userspace
  working userspace setups anyone can find to test are fine with the
  bdflush system call being removed.

  Changing m68k fsp040_die to use force_sigsegv(SIGSEGV) instead of
  calling do_exit directly is interesting only in that it is nearly the
  most difficult of the incorrect uses of do_exit to remove.

  The change to the seccomp code to simply send a signal instead of
  calling do_coredump directly is a very nice little cleanup made
  possible by realizing the existing signal sending helpers were missing
  a little bit of functionality that is easy to provide"

* 'exit-cleanups-for-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  signal/seccomp: Dump core when there is only one live thread
  signal/seccomp: Refactor seccomp signal and coredump generation
  signal/m68k: Use force_sigsegv(SIGSEGV) in fpsp040_die
  exit/bdflush: Remove the deprecated bdflush system call
2021-09-01 14:52:05 -07:00
Heiko Carstens
15256194ef s390/entry: make oklabel within CHKSTG macro local
Make the oklabel within the CHKSTG macro local. This makes sure that
tools like objdump and the crash debugging tool still disassemble full
functions where the macro has been used instead of stopping half way
where such a global label is used and one has to guess how to
disassemble the rest of such a function:

E.g.:

0000000000cb0270 <mcck_int_handler>:
  cb0270:       b2 05 03 20             stck    800
  ...
  cb0354:       a7 74 00 97             jne     cb0482 <oklabel270+0xe2>

0000000000cb0358 <oklabel243>:
  cb0358:       c0 e0 00 22 4e 8f       larl    %r14,10fa076 <opcode+0x2558>
  ...

Fixes: d35925b349 ("s390/mcck: move storage error checks to assembler")
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-08-31 14:54:15 +02:00
Sven Schnelle
436fc4feea s390: add kmemleak annotation in stack_alloc()
kmemleak with enabled auto scanning reports that our stack allocation is
lost. This is because we're saving the pointer + STACK_INIT_OFFSET to
lowcore. When kmemleak now scans the objects, it thinks that this one is
lost because it can't find a corresponding pointer.

Reported-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Tested-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-08-31 14:54:14 +02:00
Linus Torvalds
c7a5238ef6 s390 updates for 5.15 merge window
- Improve ftrace code patching so that stop_machine is not required anymore.
   This requires a small common code patch acked by Steven Rostedt:
   https://lore.kernel.org/linux-s390/20210730220741.4da6fdf6@oasis.local.home/
 
 - Enable KCSAN for s390. This comes with a small common code change to fix a
   compile warning. Acked by Marco Elver:
   https://lore.kernel.org/r/20210729142811.1309391-1-hca@linux.ibm.com
 
 - Add KFENCE support for s390. This also comes with a minimal x86 patch from
   Marco Elver who said also this can be carried via the s390 tree:
   https://lore.kernel.org/linux-s390/YQJdarx6XSUQ1tFZ@elver.google.com/
 
 - More changes to prepare the decompressor for relocation.
 
 - Enable DAT also for CPU restart path.
 
 - Final set of register asm removal patches; leaving only three locations where
   needed and sane.
 
 - Add NNPA, Vector-Packed-Decimal-Enhancement Facility 2, PCI MIO support to
   hwcaps flags.
 
 - Cleanup hwcaps implementation.
 
 - Add new instructions to in-kernel disassembler.
 
 - Various QDIO cleanups.
 
 - Add SCLP debug feature.
 
 - Various other cleanups and improvements all over the place.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEECMNfWEw3SLnmiLkZIg7DeRspbsIFAmEs1GsACgkQIg7DeRsp
 bsJ9+A/9FApCECNPgu6jOX4Ee+no+LxpCPUF8rvt56TFTLv7+Dhm7fJl0xQ9utsZ
 FyLMDAr1/FKdm2wBW23QZH4vEIt1bd6e/03DwwK+6IjHKZHRIfB8eGJMsLj/TDzm
 K6/+FI7qXjvpNXxgkCqXf5yESi/y5Dgr+16kTBhPZj5awRiwe5puPamji3uiQ45V
 r4MdGCCC9BnTZvtPpUrr8ImnUqHJ4/TMo1YYdykLbZFuAvvYUyZ5YG5kh0pMa8JZ
 DGJpfLQfy7ZNscIzdVhZtfzzESVtS6/AOeBzDMO1pbM1CGXtvpJJP0Wjlr/PGwoW
 fvuMHpqTlDi+TfNZiPP5lwsFC89xSd6gtZH7vAuI8kFCXgW3RMjABF6h/mzpH1WO
 jXyaSEZROc/83gxPMYyOYiqrKyAFPbpZ/Rnav2bvGQGneqx7RvmpF3GgA9WEo1PW
 rMDoEbLstJuHk0E2uEV+OnQd5F7MHNonzpYfp/7pyQ+PL8w2GExV9yngVc/f3TqG
 HYLC9rc3K6DkxZappcJm0qTb7lDTMFI7xK3g9RiqPQBJE1v1MYE/rai48nW69ypE
 bRNL76AjyXKo+zKR2wlhJVMY1I1+DarMopHhZj6fzQT5te1LLsv8OuTU2gkt6dIq
 YoSYOYvModf3HbKnJul2tszQG9yl+vpE9MiCyBQSsxIYXCriq/c=
 =WDRh
 -----END PGP SIGNATURE-----

Merge tag 's390-5.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 updates from Heiko Carstens:

 - Improve ftrace code patching so that stop_machine is not required
   anymore. This requires a small common code patch acked by Steven
   Rostedt:

     https://lore.kernel.org/linux-s390/20210730220741.4da6fdf6@oasis.local.home/

 - Enable KCSAN for s390. This comes with a small common code change to
   fix a compile warning. Acked by Marco Elver:

     https://lore.kernel.org/r/20210729142811.1309391-1-hca@linux.ibm.com

 - Add KFENCE support for s390. This also comes with a minimal x86 patch
   from Marco Elver who said also this can be carried via the s390 tree:

     https://lore.kernel.org/linux-s390/YQJdarx6XSUQ1tFZ@elver.google.com/

 - More changes to prepare the decompressor for relocation.

 - Enable DAT also for CPU restart path.

 - Final set of register asm removal patches; leaving only three
   locations where needed and sane.

 - Add NNPA, Vector-Packed-Decimal-Enhancement Facility 2, PCI MIO
   support to hwcaps flags.

 - Cleanup hwcaps implementation.

 - Add new instructions to in-kernel disassembler.

 - Various QDIO cleanups.

 - Add SCLP debug feature.

 - Various other cleanups and improvements all over the place.

* tag 's390-5.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (105 commits)
  s390: remove SCHED_CORE from defconfigs
  s390/smp: do not use nodat_stack for secondary CPU start
  s390/smp: enable DAT before CPU restart callback is called
  s390: update defconfigs
  s390/ap: fix state machine hang after failure to enable irq
  KVM: s390: generate kvm hypercall functions
  s390/sclp: add tracing of SCLP interactions
  s390/debug: add early tracing support
  s390/debug: fix debug area life cycle
  s390/debug: keep debug data on resize
  s390/diag: make restart_part2 a local label
  s390/mm,pageattr: fix walk_pte_level() early exit
  s390: fix typo in linker script
  s390: remove do_signal() prototype and do_notify_resume() function
  s390/crypto: fix all kernel-doc warnings in vfio_ap_ops.c
  s390/pci: improve DMA translation init and exit
  s390/pci: simplify CLP List PCI handling
  s390/pci: handle FH state mismatch only on disable
  s390/pci: fix misleading rc in clp_set_pci_fn()
  s390/boot: factor out offset_vmlinux_info() function
  ...
2021-08-30 13:07:15 -07:00
Alexander Gordeev
d6be5d0ad3 s390/smp: do not use nodat_stack for secondary CPU start
The secondary CPU start C routine uses nodat_stack as a
interim stack before finally switching to kernel_stack.
Such scheme is superfluous, since the assembler restart
interrupt handler (that secondary CPU starter is called
from) does not need to use any stack for switching into
DAT mode. Once DAT is on, any stack including virtually-
mapped one could be used.

Avoid the use of nodat_stack and smp_start_secondary()
helper. Instead, initiate kernel_stack directly from
the restart interrupt handler.

Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-08-26 20:22:13 +02:00
Alexander Gordeev
915fea04f9 s390/smp: enable DAT before CPU restart callback is called
The restart interrupt is triggered whenever a secondary CPU is
brought online, a remote function call dispatched from another
CPU or a manual PSW restart is initiated and causes the system
to kdump. The handling routine is always called with DAT turned
off. It then initializes the stack frame and invokes a callback.

The existing callbacks handle DAT as follows:

  * __do_restart() and __machine_kexec() turn in on upon entry;
  * __ipl_run(), __reipl_run() and __dump_run() do not turn it
    right away, but all of them call diag308() - which turns DAT
    on, but only if kasan is enabled;

In addition to the described complexity all callbacks (and the
functions they call) should avoid kasan instrumentation while
DAT is off.

This update enables DAT in the assembler restart handler and
relieves any callbacks (which are mostly C functions) from
dealing with DAT altogether.

There are four types of CPU restart that initialize control
registers in different ways:

  1. Start of secondary CPU on boot - control registers are
     inherited from the IPL CPU;
  2. Restart of online CPU - control registers of the CPU being
     restarted are kept;
  3. Hotplug of offline CPU - control registers are inherited
     from the starting CPU;
  4. Start of offline CPU triggered by manual PSW restart -
     the control registers are read from the absolute lowcore
     and contain the boot time IPL CPU values updated with all
     follow-up calls of smp_ctl_set_bit() and smp_ctl_clear_bit()
     routines;

In first three cases contents of the control registers is the
most recent. In the latter case control registers are good
enough to facilitate successful completion of kdump operation.

Suggested-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-08-26 20:22:12 +02:00
Peter Oberparleiter
70aa5d3982 s390/sclp: add tracing of SCLP interactions
Add tracing of interactions between the SCLP base driver, firmware and
other drivers to support problem determination in case of SCLP-related
issues.

For that purpose this patch introduces two new s390dbf debug areas:

  - sclp: An abbreviated log of all common interactions
  - sclp_err: A full log of failed or abnormal interactions

Tracing of full SCCB contents can be enabled for the sclp area by
setting its debug level to maximum (6).

Overview of added trace events:

  * Firmware interaction:
    - SRV1: Service call about to be issued
    - SRV2: Service call was issued
    - INT:  Interrupt received

  * Driver interaction:
    - RQAD: Request was added
    - RQOK: Request success
    - RQAB: Request aborted
    - RQTM: Request timed out
    - REG:  Event listener registered
    - UREG: Event listener unregistered
    - EVNT: Event callback
    - STCG: State-change callback

  * Abnormal events:
    - TMO:  A timeout occurred
    - UNEX: Unexpected SCCB completion

  * Other (not traced at default level):
    - SYN1: Synchronous wait start
    - SYN2: Synchronous wait end

Since the SCLP interface is used by console drivers this patch also
moves s390dbf printks outside the critical section protected by debug
area locks to prevent a potential deadlock that would otherwise be
introduced between console_owner --> sclp_lock --> sclp_debug.lock.

Signed-off-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-08-25 11:03:35 +02:00
Peter Oberparleiter
d72541f945 s390/debug: add early tracing support
Debug areas can currently only be used after s390dbf initialization
which occurs as a postcore_initcall. This is too late for tracing
earlier code such as that related to console_init().

This patch introduces a macro for defining a statically initialized
debug area that can be used to trace very early code. The macro is made
available for built-in code only because modules are never running
during early boot.

Example usage:

1. Define static debug area:

  DEFINE_STATIC_DEBUG_INFO(my_debug, "my_debug", 4, 1, 16,
			   &debug_hex_ascii_view);

2. Add trace entry:

  debug_event(&my_debug, 0, "DATA", 4);

Note: The debug area is automatically registered in debugfs during boot.
      A driver must not call any of the debug_register()/_unregister()
      functions on a static debug_info_t!

Signed-off-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-08-25 11:03:35 +02:00
Peter Oberparleiter
9372a82892 s390/debug: fix debug area life cycle
Currently allocation and registration of s390dbf debug areas are tied
together. As a result, a debug area cannot be unregistered and
re-registered while any process has an associated debugfs file open.

Fix this by splitting alloc/release from register/unregister.

Signed-off-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-08-25 11:03:35 +02:00
Peter Oberparleiter
1204777867 s390/debug: keep debug data on resize
Any previously recorded s390dbf debug data is reset when a debug area
is resized using the 'pages' sysfs attribute. This can make
live-debugging unnecessarily complex.

Fix this by copying existing debug data to the newly allocated debug
area when resizing.

Signed-off-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-08-25 11:03:35 +02:00
Heiko Carstens
2879048c7e s390/diag: make restart_part2 a local label
Avoid that the "restart_part2" label, which is in the middle of a
function, appears in /proc/kallsyms.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-08-25 11:03:34 +02:00
Heiko Carstens
8b5f08b484 s390: fix typo in linker script
Rename amod31 to amode31 like it was supposed to be.

Fixes: c78d0c7484 ("s390: rename dma section to amode31")
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-08-25 11:03:34 +02:00
Sven Schnelle
28be5743c6 s390: remove do_signal() prototype and do_notify_resume() function
Both are no longer used since the conversion to generic
entry, therefore remove them.

Fixes: 56e62a7370 ("s390: convert to generic entry")
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-08-25 11:03:34 +02:00
Alexander Egorenkov
7c0eaa78b9 s390/sclp: reserve memory occupied by sclp early buffer
The memory block occupied by the SCLP early buffer that is allocated
by the decompressor and then handed over to the decompressed kernel,
must be reserved to prevent it from being reused for other purposes.
This is necessary because the SCLP early buffer is still in use
during kernel initialization.

Fixes: f1d3c53237 ("s390/boot: move sclp early buffer from fixed address in asm to C")
Signed-off-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Reported-by: Alexander Gordeev <agordeev@linux.ibm.com>
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-08-18 10:01:29 +02:00
Heiko Carstens
c78d0c7484 s390: rename dma section to amode31
The dma section name is confusing, since the code which resides within
that section has nothing to do with direct memory access.  Instead the
limitation is that the code has to run in 31 bit addressing mode, and
therefore has to reside below 2GB.  So the name was chosen since
ZONE_DMA is the same region.

To reduce confusion rename the section to amode31, which hopefully
describes better what this is about.

Note: this will also change vmcoreinfo strings
- SDMA=... gets renamed to SAMODE31=...
- EDMA=... gets renamed to EAMODE31=...

Acked-by: Vasily Gorbik <gor@linux.ibm.com>
Reviewed-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-08-05 14:10:53 +02:00
Sebastian Andrzej Siewior
a73de29320 s390: replace deprecated CPU-hotplug functions
The functions get_online_cpus() and put_online_cpus() have been
deprecated during the CPU hotplug rework. They map directly to
cpus_read_lock() and cpus_read_unlock().

Replace deprecated CPU-hotplug functions with the official version.
The behavior remains unchanged.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://lore.kernel.org/r/20210803141621.780504-5-bigeasy@linutronix.de
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-08-05 14:10:53 +02:00
Ilya Leoshkevich
de5012b41e s390/ftrace: implement hotpatching
s390 allows hotpatching the mask of a conditional jump instruction.
Make use of this feature in order to avoid the expensive stop_machine()
call.

The new trampolines are split in 3 stages:

- A first stage is a 6-byte relative conditional long branch located at
  each function's entry point. Its offset always points to the second
  stage for the corresponding function, and its mask is either all 0s
  (ftrace off) or all 1s (ftrace on). The code for flipping the mask is
  borrowed from ftrace_{enable,disable}_ftrace_graph_caller. After
  flipping, ftrace_arch_code_modify_post_process() syncs with all the
  other CPUs by sending SIGPs.

- Second stages for vmlinux are stored in a separate part of the .text
  section reserved by the linker script, and in dynamically allocated
  memory for modules. This prevents the icache pollution. The total
  size of second stages is about 1.5% of that of the kernel image.

  Putting second stages in the .bss section is possible and decreases
  the size of the non-compressed vmlinux, but splits the kernel 1:1
  mapping, which is a bad tradeoff.

  Each second stage contains a call to the third stage, a pointer to
  the part of the intercepted function right after the first stage, and
  a pointer to an interceptor function (e.g. ftrace_caller).

  Second stages are 8-byte aligned for the future direct calls
  implementation.

- There are only two copies of the third stage: in the .text section
  for vmlinux and in dynamically allocated memory for modules. It can be
  an expoline, which is relatively large, so inlining it into each
  second stage is prohibitively expensive.

As a result of this organization, phoronix-test-suite with ftrace off
does not show any performance degradation.

Suggested-by: Sven Schnelle <svens@linux.ibm.com>
Suggested-by: Vasily Gorbik <gor@linux.ibm.com>
Co-developed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Link: https://lore.kernel.org/r/20210728212546.128248-3-iii@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-08-03 14:31:40 +02:00
Ilya Leoshkevich
e37b3dd063 s390: enable KCSAN
s390x GCC and SystemZ Clang have ThreadSanitizer support now [1] [2],
so enable KCSAN for s390.

[1] https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=ea22954e7c58
[2] https://reviews.llvm.org/D105629

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-07-30 17:09:23 +02:00
Sumanth Korikkar
7561c14d8a s390/vdso: add .got.plt in vdso linker script
KCFLAGS="-mno-pic-data-is-text-relative" make leads to bfd assertion
error in s390_got_pointer():

LD      arch/s390/kernel/vdso64/vdso64.so.dbg
ld: BFD version 2.35-18.fc33 assertion fail elf-s390-common.c:74

readelf -Wr vdso64_generic.o | grep GOT
0000000000000032  000000110000001a R_390_GOTENT 0000000000000000 _vdso_data + 2
(...)

Add .got.plt in linker script to avoid this.

Suggested-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-07-30 17:08:21 +02:00
Heiko Carstens
7e82523f25 s390/hwcaps: make sie capability regular hwcap
Commit 7f16d7e787 ("s390: show virtualization support in /proc/cpuinfo")
introduced special handling for sie capability, saying this should not be
exposed via hwcaps, without giving a reason.

However this leads to an inconsistent /proc/cpuinfo features line
where all features except the sie capability are also present in
hwcaps. I really don't see a reason to not add that to hwcaps - it
might be quite pointless, but at least this way it is possible to get
rid of some special handling.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-07-27 09:39:21 +02:00
Heiko Carstens
98ac9169e5 s390/hwcaps: remove hwcap stfle check
Remove the not so obvious "(elf_hwcap & (1UL << 2)" which only checks
if stfle is available. This used to be required for old code before
test_facility() was introduced. test_facility() will do the right
thing, regardless if stfle is available or not.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-07-27 09:39:21 +02:00
Heiko Carstens
487dff5638 s390/hwcaps: remove z/Architecture mode active check
Remove a leftover from the common 31/64 bit code. z/Architecture mode
is now always active, there is no need to check.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-07-27 09:39:21 +02:00
Heiko Carstens
449fbd713f s390/hwcaps: use consistent coding style / remove comments
Use a consistent coding style within setup_hwcaps() and remove obvious
and outdated comments.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-07-27 09:39:21 +02:00
Heiko Carstens
251527c9b0 s390/hwcaps: open code initialization of first six hwcap bits
The first six hwcap bits are initialized in a rather odd way: an array
contains the stfl(e) bits which need to be set, so that the
corresponding bit position (= array index) within hwcaps are set.

Better open code it like it is done for all other bits, making it
obvious which bit is set when.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-07-27 09:39:21 +02:00
Heiko Carstens
873129ca7b s390/hwcaps: split setup_hwcaps()
setup_hwcaps() is a quite large function. Make it smaller by moving
the elf platform setup code into an independent setup function.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-07-27 09:39:20 +02:00
Heiko Carstens
f17a6d5d83 s390/hwcaps: move setup_hwcaps()
Move setup_hwcaps() to processor.c for two reasons:
- make setup.c a bit smaller
- have allmost all of the hwcap code in one file

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-07-27 09:39:20 +02:00
Heiko Carstens
c68d463286 s390/hwcaps: add sanity checks
Add BUILD_BUG_ON() sanity checks to make sure the hwcap string array
contains a string for each hwcap.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-07-27 09:39:20 +02:00
Heiko Carstens
95655495e4 s390/hwcaps: use named initializers for hwcap string arrays
Use named initializers to make it obvious which hwcap string array
element belongs to which hwcap.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-07-27 09:39:20 +02:00
Heiko Carstens
511ad531af s390/hwcaps: shorten HWCAP defines
Remove s390 part of all HWCAP defines, just to make them shorter and
easier to handle. The namespace is anyway per architecture.
This is similar to what arm64 has.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-07-27 09:39:20 +02:00
Niklas Schnelle
7e8403ecaf s390: add HWCAP_S390_PCI_MIO to ELF hwcaps
In order to support the use of enhanced PCI instructions in both kernel-
and userspace we need both hardware support and proper setup in the
kernel. The latter can be toggled off with the pci=nomio command line
option.

Thus availability of this feature in userspace depends on all of kernel
configuration (CONFIG_PCI), hardware support and the current kernel
command line and can thus not rely solely on a facility bit. Instead
let's introduce a new ELF hardware capability bit HWCAP_S390_PCI_MIO to
tell userspace whether these PCI instructions can be used.

Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-07-27 09:39:19 +02:00
Niklas Schnelle
3322ba0d7b s390: make PCI mio support a machine flag
Kernel support for the newer PCI mio instructions can be toggled off
with the pci=nomio command line option which needs to integrate with
common code PCI option parsing. However this option then toggles static
branches which can't be toggled yet in an early_param() call.

Thus commit 9964f396f1 ("s390: fix setting of mio addressing control")
moved toggling the static branches to the PCI init routine.

With this setup however we can't check for mio support outside the PCI
code during early boot, i.e. before switching the static branches, which
we need to be able to export this as an ELF HWCAP.

Improve on this by turning mio availability into a machine flag that
gets initially set based on CONFIG_PCI and the facility bit and gets
toggled off if pci=nomio is found during PCI option parsing allowing
simple access to this machine flag after early init.

Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-07-27 09:39:19 +02:00
Heiko Carstens
196e3c6ad1 s390/disassembler: add instructions
Add more instructions to the kernel disassembler.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-07-27 09:39:19 +02:00
Heiko Carstens
b3bc7980f4 s390: report more CPU capabilities
Add hardware capability bits and feature tags to /proc/cpuinfo
for NNPA and Vector-Packed-Decimal-Enhancement Facility 2.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-07-27 09:39:19 +02:00
Alexander Egorenkov
f1a5469474 s390/setup: don't reserve memory that occupied decompressor's head
There is no useful information within [STARTUP_NORMAL_OFFSET, HEAD_END] now.

But the memory region [0, STARTUP_NORMAL_OFFSET] is used by:
* lowcore
* kdump for swapping memory
* stand-alone zipl dumpers for code, data, stack and heap

Signed-off-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-07-27 09:39:17 +02:00
Alexander Egorenkov
6bda667037 s390/boot: move dma sections from decompressor to decompressed kernel
This change simplifies the task of making the decompressor relocatable.

The decompressor's image contains special DMA sections between _sdma and
_edma. This DMA segment is loaded at boot as part of the decompressor and
then simply handed over to the decompressed kernel. The decompressor itself
never uses it in any way. The primary reason for this is the need to keep
the aforementioned DMA segment below 2GB which is required by architecture,
and because the decompressor is always loaded at a fixed low physical
address, it is guaranteed that the DMA region will not cross the 2GB
memory limit. If the DMA region had been placed in the decompressed kernel,
then KASLR would make this guarantee impossible to fulfill or it would
be restricted to the first 2GB of memory address space.

This commit moves all DMA sections between _sdma and _edma from
the decompressor's image to the decompressed kernel's image. The complete
DMA region is placed in the init section of the decompressed kernel and
immediately relocated below 2GB at start-up before it is needed by other
parts of the decompressed kernel. The relocation of the DMA region happens
even if the decompressed kernel is already located below 2GB in order
to keep the first implementation simple. The relocation should not have
any noticeable impact on boot time because the DMA segment is only a couple
of pages.

After relocating the DMA sections, the kernel has to fix all references
which point into it. In order to automate this, place all variables
pointing into the DMA sections in a special .dma.refs section. All such
variables must be defined using the new __dma_ref macro. Only variables
containing addresses within the DMA sections must be placed in the new
.dma.refs section.

Furthermore, move the initialization of control registers from
the decompressor to the decompressed kernel because some control registers
reference tables that must be placed in the DMA data section to
guarantee that their addresses are below 2G. Because the decompressed
kernel relocates the DMA sections at startup, the content of control
registers CR2, CR5 and CR15 must be updated with new addresses after
the relocation. The decompressed kernel initializes all control registers
early at boot and then updates the content of CR2, CR5 and CR15
as soon as the DMA relocation has occurred. This practically reverts
the commit a80313ff91 ("s390/kernel: introduce .dma sections").

Signed-off-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-07-27 09:39:17 +02:00
Alexander Egorenkov
455cac5028 s390/setup: generate asm offsets from struct parmarea
To reduce duplication, replace error-prone and hard-coded parameter area
offsets with auto-generated ones.

Signed-off-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-07-27 09:39:16 +02:00
Alexander Egorenkov
e9e7870f90 s390/dump: introduce boot data 'oldmem_data'
The new boot data struct shall replace global variables OLDMEM_BASE and
OLDMEM_SIZE. It is initialized in the decompressor and passed
to the decompressed kernel. In comparison to the old solution, this one
doesn't access data at fixed physical addresses which will become important
when the decompressor becomes relocatable.

Signed-off-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-07-27 09:39:16 +02:00
Alexander Egorenkov
84733284f6 s390/boot: introduce boot data 'initrd_data'
The new boot data struct shall replace global variables INITRD_START and
INITRD_SIZE. It is initialized in the decompressor and passed
to the decompressed kernel. In comparison to the old solution, this one
doesn't access data at fixed physical addresses which will become important
when the decompressor becomes relocatable.

Signed-off-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-07-27 09:39:15 +02:00
Alexander Egorenkov
7f33565b25 s390/uv: de-duplicate checks for Protected Host Virtualization
De-duplicate checks for Protected Host Virtualization in decompressor and
kernel.

Set prot_virt_host=0 in the decompressor in *any* of the following cases
and hand it over to the decompressed kernel:
* No explicit prot_virt=1 is given on the kernel command-line
* Protected Guest Virtualization is enabled
* Hardware support not present
* kdump or stand-alone dump

The decompressed kernel needs to use only is_prot_virt_host() instead of
performing again all checks done by the decompressor.

Signed-off-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-07-27 09:39:14 +02:00
Heiko Carstens
5492886c14 s390/jump_label: print real address in a case of a jump label bug
In case of a jump label print the real address of the piece of code
where a mismatch was detected. This is right before the system panics,
so there is nothing revealed.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-07-27 09:39:13 +02:00
Alexander Egorenkov
6040b3f45f s390/cio: remove unused include linux/spinlock.h from cio.h
* The linux/spinlock.h header was included indirectly by the decompressor
  and brought unnecessary build dependencies.
* Use proper includes in files which either directly or indirectly included
  cio.h and were hidden until now by the included linux/spinlock.h, e.g.
  linux/string.h for memcpy() or asm/page.h for PAGE_SIZE.

Signed-off-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Acked-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-07-27 09:39:13 +02:00
Linus Torvalds
3d5895cd35 s390 updates for 5.14-rc3
- fix / add expoline usage in "DMA" code
 
 - fix compat vdso Makefile to avoid permanent rebuild
 
 - fix ftrace_update_ftrace_func to avoid NULL pointer dereference
 
 - update defconfigs
 
 - trivial coding style fix
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEECMNfWEw3SLnmiLkZIg7DeRspbsIFAmD4aBQACgkQIg7DeRsp
 bsJu3Q/9F5roAWqs+L6Us9tCd+rYvUsJYBhYeI7eQDM9eyb860IgJgCSmxj45WGH
 798C0yO01u7DkzN21XOfl8GnnGYSEX7yN1IZkXm7k8rDauxqbk0tc/XsPRJM9wOT
 WckqSuKJrgZYxnEuiMQQgxv3aBYsQBWAG3z1spDE/PlUyfcBKTYrUrR55t8WZgQp
 UPE21ArH8wzx2sycxet5+m3UwttbMxdhN0VY9yEg1tFesVKX0CcjpwhGcX1DUPY1
 +lJRCEjEvryJWg8xBdPXXBeUjV2U5h8035gi2um75IGe5LL+J3uwQlmkGgUXa635
 /oWJ6ST1F7TyJ2wM9Nmcn+MjvwMHs+Sd8vfSgpX040GidNGMlZYL/1494zxQ91XS
 h51JB1b9m/8kP+RAtoYKEpK0k4ELWXWaI5zAh9gsKGisfH25pMOSxsuQIBpTijRu
 o9VDUtRXHkmqoQ3wurxrFzWZLVDneBGv5EOK7SxoKGdkpg1PDqkO+ZLarppMsx9g
 IcLmQwV3mQYQrBttAa061UkgvDxUFhrfUANJraEhI9283aEc8tzc+dRJkNJu48kA
 kHWHhgKqVysKiyppQO2XbidRMp2vVlVHuvB8e0sMbDsC7a7EH5LW2pRq/B+RCB+1
 797j5B9EczsneISU2YNL10JWPMfHeZ8BYnkA5CWRqAbcYdDBAAI=
 =E8AN
 -----END PGP SIGNATURE-----

Merge tag 's390-5.14-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 updates from Heiko Carstens:

 - fix / add expoline usage in "DMA" code

 - fix compat vdso Makefile to avoid permanent rebuild

 - fix ftrace_update_ftrace_func to avoid NULL pointer dereference

 - update defconfigs

 - trivial coding style fix

* tag 's390-5.14-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390: update defconfigs
  s390/cpumf: fix semicolon.cocci warnings
  s390/boot: fix use of expolines in the DMA code
  s390/ftrace: fix ftrace_update_ftrace_func implementation
  s390/defconfig: allow early device mapper disks
  s390/vdso32: add vdso32.lds to targets
2021-07-21 13:51:26 -07:00
kernel test robot
7d24464375 s390/cpumf: fix semicolon.cocci warnings
arch/s390/kernel/perf_cpum_cf.c:748:2-3: Unneeded semicolon

 Remove unneeded semicolon.

Generated by: scripts/coccinelle/misc/semicolon.cocci

Fixes: a029a4eab3 ("s390/cpumf: Allow concurrent access for CPU Measurement Counter Facility")
CC: Thomas Richter <tmricht@linux.ibm.com>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: kernel test robot <lkp@intel.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-07-20 17:59:40 +02:00
Vasily Gorbik
f8c2602733 s390/ftrace: fix ftrace_update_ftrace_func implementation
s390 enforces DYNAMIC_FTRACE if FUNCTION_TRACER is selected.
At the same time implementation of ftrace_caller is not compliant with
HAVE_DYNAMIC_FTRACE since it doesn't provide implementation of
ftrace_update_ftrace_func() and calls ftrace_trace_function() directly.

The subtle difference is that during ftrace code patching ftrace
replaces function tracer via ftrace_update_ftrace_func() and activates
it back afterwards. Unexpected direct calls to ftrace_trace_function()
during ftrace code patching leads to nullptr-dereferences when tracing
is activated for one of functions which are used during code patching.
Those function currently are:
copy_from_kernel_nofault()
copy_from_kernel_nofault_allowed()
preempt_count_sub() [with debug_defconfig]
preempt_count_add() [with debug_defconfig]

Corresponding KASAN report:
 BUG: KASAN: nullptr-dereference in function_trace_call+0x316/0x3b0
 Read of size 4 at addr 0000000000001e08 by task migration/0/15

 CPU: 0 PID: 15 Comm: migration/0 Tainted: G B 5.13.0-41423-g08316af3644d
 Hardware name: IBM 3906 M04 704 (LPAR)
 Stopper: multi_cpu_stop+0x0/0x3e0 <- stop_machine_cpuslocked+0x1e4/0x218
 Call Trace:
  [<0000000001f77caa>] show_stack+0x16a/0x1d0
  [<0000000001f8de42>] dump_stack+0x15a/0x1b0
  [<0000000001f81d56>] print_address_description.constprop.0+0x66/0x2e0
  [<000000000082b0ca>] kasan_report+0x152/0x1c0
  [<00000000004cfd8e>] function_trace_call+0x316/0x3b0
  [<0000000001fb7082>] ftrace_caller+0x7a/0x7e
  [<00000000006bb3e6>] copy_from_kernel_nofault_allowed+0x6/0x10
  [<00000000006bb42e>] copy_from_kernel_nofault+0x3e/0xd0
  [<000000000014605c>] ftrace_make_call+0xb4/0x1f8
  [<000000000047a1b4>] ftrace_replace_code+0x134/0x1d8
  [<000000000047a6e0>] ftrace_modify_all_code+0x120/0x1d0
  [<000000000047a7ec>] __ftrace_modify_code+0x5c/0x78
  [<000000000042395c>] multi_cpu_stop+0x224/0x3e0
  [<0000000000423212>] cpu_stopper_thread+0x33a/0x5a0
  [<0000000000243ff2>] smpboot_thread_fn+0x302/0x708
  [<00000000002329ea>] kthread+0x342/0x408
  [<00000000001066b2>] __ret_from_fork+0x92/0xf0
  [<0000000001fb57fa>] ret_from_fork+0xa/0x30

 The buggy address belongs to the page:
 page:(____ptrval____) refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1
 flags: 0x1ffff00000001000(reserved|node=0|zone=0|lastcpupid=0x1ffff)
 raw: 1ffff00000001000 0000040000000048 0000040000000048 0000000000000000
 raw: 0000000000000000 0000000000000000 ffffffff00000001 0000000000000000
 page dumped because: kasan: bad access detected

 Memory state around the buggy address:
  0000000000001d00: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7
  0000000000001d80: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7
 >0000000000001e00: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7
                       ^
  0000000000001e80: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7
  0000000000001f00: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7
 ==================================================================

To fix that introduce ftrace_func callback to be called from
ftrace_caller and update it in ftrace_update_ftrace_func().

Fixes: 4cc9bed034 ("[S390] cleanup ftrace backend functions")
Cc: stable@vger.kernel.org
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-07-15 12:54:58 +02:00
Gustavo A. R. Silva
d4e81342ea s390: Fix fall-through warnings for Clang
Fix the following fallthrough warnings:

drivers/s390/net/ctcm_fsms.c:1457:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
drivers/s390/net/qeth_l3_main.c:437:3: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
drivers/s390/char/tape_char.c:374:4: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
arch/s390/kernel/uprobes.c:129:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]

Reported-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
2021-07-13 14:43:09 -05:00
Eric W. Biederman
b48c7236b1 exit/bdflush: Remove the deprecated bdflush system call
The bdflush system call has been deprecated for a very long time.
Recently Michael Schmitz tested[1] and found that the last known
caller of of the bdflush system call is unaffected by it's removal.

Since the code is not needed delete it.

[1] https://lkml.kernel.org/r/36123b5d-daa0-6c2b-f2d4-a942f069fd54@gmail.com
Link: https://lkml.kernel.org/r/87sg10quue.fsf_-_@disp2133
Tested-by: Michael Schmitz <schmitzmic@gmail.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Cyril Hrubis <chrubis@suse.cz>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-07-12 15:17:47 -05:00
Sven Schnelle
98f7cd23aa s390/vdso32: add vdso32.lds to targets
This fixes a permanent rebuild of the 32 bit vdso. The RPM build process
was first calling 'make bzImage' and 'make modules' as a second step.
This caused a recompilation of vdso32.so, which in turn also changed
the build-id of vmlinux.

Fixes: 779df22487 ("s390/vdso: add minimal compat vdso")
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-07-12 14:33:53 +02:00
Linus Torvalds
e98e03d075 s390 updates for the 5.14 merge window #2
- Fix preempt_count initialization.
 
 - Rework call_on_stack() macro to add proper type handling and avoid
   possible register corruption.
 
 - More error prone "register asm" removal and fixes.
 
 - Fix syscall restarting when multiple signals are coming in. This adds
   minimalistic trampolines to vdso so we can return from signal without
   using the stack which requires pgm check handler hacks when NX is
   enabled.
 
 - Remove HAVE_IRQ_EXIT_ON_IRQ_STACK since this is no longer true after
   switch to generic entry.
 
 - Fix protected virtualization secure storage access exception handling.
 
 - Make machine check C handler always enter with DAT enabled and move
   register validation to C code.
 
 - Fix tinyconfig boot problem by avoiding MONITOR CALL without CONFIG_BUG.
 
 - Increase asm symbols alignment to 16 to make it consistent with
   compilers.
 
 - Enable concurrent access to the CPU Measurement Counter Facility.
 
 - Add support for dynamic AP bus size limit and rework ap_dqap to deal
   with messages greater than recv buffer.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEE3QHqV+H2a8xAv27vjYWKoQLXFBgFAmDpo78ACgkQjYWKoQLX
 FBhapAf/UkwhYLTrEEOSmrWQaJHyOkdxcexksBPvh/GEQvDkHixHROeytRb5SBMD
 JXGJsWN3g9DjiE9DLFGSK2eQfKYFoPaXWv3Tf+QIPvEwvODKZHRLVJU1VTXiF4EA
 +Xu4Q0o1VtX8ashEN0IHjBycPY9l+v9Ncb2erKG+CzoRa1r5C+A+1E0sLtsNN9TW
 AV+57ATCc0pgNkUfzVoG2S/QdSY+fpMWtjoPeCj/DNCpFq1eRTMRGolTsdnqGpDx
 ueAozHW9Pc9TQXeWpOVM7ZgWRWJThCC4vj2W/B7ygCuyJh4N0y2rthFEg3tWUf0f
 +X6KDXeijFs4Gf0ewYLZjZCpBT8/NA==
 =EgkG
 -----END PGP SIGNATURE-----

Merge tag 's390-5.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull more s390 updates from Vasily Gorbik:

 - Fix preempt_count initialization.

 - Rework call_on_stack() macro to add proper type handling and avoid
   possible register corruption.

 - More error prone "register asm" removal and fixes.

 - Fix syscall restarting when multiple signals are coming in. This adds
   minimalistic trampolines to vdso so we can return from signal without
   using the stack which requires pgm check handler hacks when NX is
   enabled.

 - Remove HAVE_IRQ_EXIT_ON_IRQ_STACK since this is no longer true after
   switch to generic entry.

 - Fix protected virtualization secure storage access exception
   handling.

 - Make machine check C handler always enter with DAT enabled and move
   register validation to C code.

 - Fix tinyconfig boot problem by avoiding MONITOR CALL without
   CONFIG_BUG.

 - Increase asm symbols alignment to 16 to make it consistent with
   compilers.

 - Enable concurrent access to the CPU Measurement Counter Facility.

 - Add support for dynamic AP bus size limit and rework ap_dqap to deal
   with messages greater than recv buffer.

* tag 's390-5.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (41 commits)
  s390: preempt: Fix preempt_count initialization
  s390/linkage: increase asm symbols alignment to 16
  s390: rename CALL_ON_STACK_NORETURN() to call_on_stack_noreturn()
  s390: add type checking to CALL_ON_STACK_NORETURN() macro
  s390: remove old CALL_ON_STACK() macro
  s390/softirq: use call_on_stack() macro
  s390/lib: use call_on_stack() macro
  s390/smp: use call_on_stack() macro
  s390/kexec: use call_on_stack() macro
  s390/irq: use call_on_stack() macro
  s390/mm: use call_on_stack() macro
  s390: introduce proper type handling call_on_stack() macro
  s390/irq: simplify on_async_stack()
  s390/irq: inline do_softirq_own_stack()
  s390/irq: simplify do_softirq_own_stack()
  s390/ap: get rid of register asm in ap_dqap()
  s390: rename PIF_SYSCALL_RESTART to PIF_EXECVE_PGSTE_RESTART
  s390: move restart of execve() syscall
  s390/signal: remove sigreturn on stack
  s390/signal: switch to using vdso for sigreturn and syscall restart
  ...
2021-07-10 10:46:14 -07:00
Valentin Schneider
6a942f5780 s390: preempt: Fix preempt_count initialization
S390's init_idle_preempt_count(p, cpu) doesn't actually let us initialize the
preempt_count of the requested CPU's idle task: it unconditionally writes
to the current CPU's. This clearly conflicts with idle_threads_init(),
which intends to initialize *all* the idle tasks, including their
preempt_count (or their CPU's, if the arch uses a per-CPU preempt_count).

Unfortunately, it seems the way s390 does things doesn't let us initialize
every possible CPU's preempt_count early on, as the pages where this
resides are only allocated when a CPU is brought up and are freed when it
is brought down.

Let the arch-specific code set a CPU's preempt_count when its lowcore is
allocated, and turn init_idle_preempt_count() into an empty stub.

Fixes: f1a0a376ca ("sched/core: Initialize the idle task with preemption disabled")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Link: https://lore.kernel.org/r/20210707163338.1623014-1-valentin.schneider@arm.com
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-07-08 22:12:18 +02:00
Heiko Carstens
b55e692e6b s390: rename CALL_ON_STACK_NORETURN() to call_on_stack_noreturn()
Lower case matches the call_on_stack() macro and is easier to read.

Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-07-08 22:12:18 +02:00
Heiko Carstens
0f541cc201 s390/smp: use call_on_stack() macro
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-07-08 22:12:18 +02:00
Heiko Carstens
845370f47f s390/kexec: use call_on_stack() macro
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-07-08 22:12:18 +02:00
Heiko Carstens
de556892dc s390/irq: use call_on_stack() macro
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-07-08 22:12:18 +02:00
Heiko Carstens
bb250e64e4 s390/irq: simplify on_async_stack()
Make on_async_stack() a bit more readable, even though as usual it
depends if one considers "!!!" readable or not.
At least the new construct to check if the async stack is in use or
not is a bit shorter and generates slightly better code.

Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-07-08 22:12:17 +02:00
Heiko Carstens
2ae6521504 s390/irq: inline do_softirq_own_stack()
Move do_softirq_own_stack() to proper header file so it can be
inlined; saving a few cycles.

Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-07-08 22:12:17 +02:00
Heiko Carstens
938e02beb3 s390/irq: simplify do_softirq_own_stack()
do_softirq_own_stack() is always called from task context and
therefore it is not necessary to check if the async stack is
currently used.
Remove the check and directly switch to async stack.

Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-07-08 22:12:17 +02:00
Sven Schnelle
d26a357fe8 s390: rename PIF_SYSCALL_RESTART to PIF_EXECVE_PGSTE_RESTART
PIF_SYSCALL_RESTART is now only used to restart execve when loading
PGSTE binaries. Rename the flag to reflect that, and avoid people
thinking that this bit has anything to do with generic syscall
restarting.

Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-07-08 22:12:17 +02:00
Sven Schnelle
e3c7a8d7f4 s390: move restart of execve() syscall
On s390, execve might have to be restarted for PGSTE binaries
like kvm. In the past this was done via the PIF_SYSCALL_RESTART
bit. However, with the recent changes, syscalls are now restarted
differently. Now that execve() is the only call that might get
restarted via PIF_SYSCALL_RESTART, move the loop to do_syscall().
This also has the advantage that the restart is no longer visible
to userspace.

Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-07-08 22:12:17 +02:00
Sven Schnelle
fbf50f47ea s390/signal: remove sigreturn on stack
{rt_}sigreturn is now called from the vdso, so we no longer
need the svc on the stack, and therefore no hack to support that
mechanism on machines with non-executable stack.

Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-07-08 22:12:17 +02:00
Sven Schnelle
df29a7440c s390/signal: switch to using vdso for sigreturn and syscall restart
with generic entry, there's a bug when it comes to restarting of signals.
The failing sequence is:

a) a signal is coming in, and no handler is registered, so the lower
   part of arch_do_signal_or_restart() in arch/s390/kernel/signal.c
   sets PIF_SYSCALL_RESTART.

b) a second signal gets pending while the kernel is still in the exit
   loop, and for that one, a handler exists.

c) The first part of arch_do_signal_or_restart() is called. That part
   calls handle_signal(), which sets up stack + registers for handling
   the signal.

d) __do_syscall() in arch/s390/kernel/syscall.c checks for
   PIF_SYSCALL_RESTART right before leaving to userspace. If it is set,
   it restart's the syscall. However, the registers are already setup
   for handling a signal from c). The syscall is now restarted with the
   wrong arguments.

Change the code to:

- use vdso for syscall_restart() instead of PIF_SYSCALL_RESTART because
  we cannot rewind and go back to userspace on s390 because the system call
  number might be encoded in the svc instruction.
- for all other syscalls we rewind the PSW and return to userspace.

Cc: <stable@kernel.org> # v5.12+ d57778feb9: s390/vdso: always enable vdso
Cc: <stable@kernel.org> # v5.12+ 686341f254: s390/vdso64: add sigreturn,rt_sigreturn and restart_syscall
Cc: <stable@kernel.org> # v5.12+ 43e1f76b0b: s390/vdso: rename VDSO64_LBASE to VDSO_LBASE
Cc: <stable@kernel.org> # v5.12+ 779df22487: s390/vdso: add minimal compat vdso
Cc: <stable@kernel.org> # v5.12+
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-07-08 22:09:47 +02:00
Kefeng Wang
638cd5a306 s390: convert to setup_initial_init_mm()
Use setup_initial_init_mm() helper to simplify code.

Link: https://lkml.kernel.org/r/20210608083418.137226-14-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-07-08 11:48:21 -07:00
Sven Schnelle
779df22487 s390/vdso: add minimal compat vdso
Add a small vdso for 31 bit compat application that provides
trampolines for calls to sigreturn,rt_sigreturn,syscall_restart.
This is requird for moving these syscalls away from the signal
frame to the vdso. Note that this patch effectively disables
CONFIG_COMPAT when using clang to compile the kernel. clang
doesn't support 31 bit mode.

We want to redirect sigreturn and restart_syscall to the vdso. However,
the kernel cannot parse the ELF vdso file, so we need to generate header
files which contain the offsets of the syscall instructions in the vdso
page.

Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-07-08 15:37:28 +02:00
Sven Schnelle
43e1f76b0b s390/vdso: rename VDSO64_LBASE to VDSO_LBASE
Will be used by both vdso32 and vdso64, so change the name.

Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-07-08 15:37:28 +02:00
Sven Schnelle
686341f254 s390/vdso64: add sigreturn,rt_sigreturn and restart_syscall
Add minimalistic trampolines to vdso64 so we can return from signal
without using the stack which requires pgm check handler hacks when
NX is enabled.

restart_syscall will be called from vdso to work around the architectural
limitation that the syscall number might be encoded in the svc instruction,
and therefore can not be changed.

Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-07-08 15:37:28 +02:00
Sven Schnelle
d57778feb9 s390/vdso: always enable vdso
With the upcoming move of the svc sigreturn instruction from
the signal frame to vdso we need to have vdso always enabled.

Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-07-08 15:37:28 +02:00
Ilya Leoshkevich
b8e9cc20b8 s390/traps: do not test MONITOR CALL without CONFIG_BUG
tinyconfig fails to boot, because without CONFIG_BUG report_bug()
always returns BUG_TRAP_TYPE_BUG, which causes mc 0,0 in
test_monitor_call() to panic. Fix by skipping the test without
CONFIG_BUG.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-07-08 15:37:27 +02:00
Thomas Richter
a029a4eab3 s390/cpumf: Allow concurrent access for CPU Measurement Counter Facility
Commit cf6acb8bdb ("s390/cpumf: Add support for complete counter set extraction")
allows access to the CPU Measurement Counter Facility via character
device /dev/hwctr. The access was exclusive via this device or
via perf_event_open() system call. Only one path at a time was
permitted. The CPU Measurement Counter Facility device driver blocked
access to other processes.

This patch removes this restriction and allows concurrent access to
the CPU Measurement Counter Facility from multiple processes at the same
time via perf_event_open() SVC and via /dev/hwctr device. The access
via /dev/hwctr device is still exclusive, only one process is allowed to
access this device.

This patch
- moves the /dev/hwctr device access from file perf_cpum_cf_diag.c.
  to file perf_cpum_cf.c.
- use only one trace buffer .../s390dbf/cpum_cf.
- remove cfset_csd structure and includes its members it into the
  structure cpu_cf_events. This results in one data structure and
  simplifies the access.
- rework function familiy ctr_set_enable, ctr_set_disable, ctr_set_start
  and ctr_set_stop which operate on a counter set number.
  Now they operate on a counter set bit mask.
- move CF_DIAG event functionality to file perf_cpum_cf.c. It now
  contains the complete functionality of the CPU Measurement Counter
  Facility:
  - Performance measurement support for counters using perf stat.
  - Support for complete counter set extraction with device /dev/hwctr.
  - Support for counter set extraction event CF_DIAG attached to
    samples using perf record.
- removes file perf_cpum_cf_diag.c

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-07-05 12:44:23 +02:00
Alexander Gordeev
5fa2ea0714 s390/mcck: move register validation to C code
This update partially reverts commit 3037a52f98 ("s390/nmi:
do register validation as early as possible").

Storage error checks and control registers validation are left
in the assembler code, since correct ASCEs and page tables are
required to enable DAT - which is done before the C handler is
entered.

System damage, kernel instruction address and PSW MWP checks
are left in the assembler code as well, since there is no way
to proceed if one of these checks is failed.

The getcpu vdso syscall reads CPU number from the programmable
field of the TOD clock. Disregard the TOD programmable register
validity bit and load the CPU number into the TOD programmable
field unconditionally.

Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-07-05 12:44:23 +02:00
Alexander Egorenkov
9f744abb46 s390/boot: replace magic string check with a bootdata flag
The magic string "S390EP" at offset 0x10008 indicated to the decompressed
kernel that it was booted by the decompressor. Introduce a new bootdata
flag instead which conveys the same information in an explicit and
a cleaner way. But keep the magic string because it is a kernel ABI.

Signed-off-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-07-05 12:44:23 +02:00
Alexander Gordeev
d35925b349 s390/mcck: move storage error checks to assembler
The current storage errors tackling is wrong - the DAT is
enabled in assembler code before the actual storage checks
in C half are executed. In case the page tables themselves
are damaged such approach is not going to work.

With this update unrecoverable storage errors are not
passed to C code for handling, but rather the machine
is stopped right away. The only exception to this flow
is when a machine check occurred in KVM guest - in this
case the errors are reinjected by the handler.

Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-07-05 12:44:23 +02:00
Alexander Gordeev
7f6dc8d4c8 s390/mcck: always enter C handler with DAT enabled
The machine check handler must be entered with DAT disabled
in case control registers are corrupted or a storage error
happened and we can not tell if such error corresponds to a
page table.

Both of described conditions end up in stopping all CPUs and
entering the disabled wait in C half of the handler. However,
the storage errors are still checked after the DAT is enabled
and C code is entered. In case a page table is damaged such
flow is not expected to work.

This update paves the way for moving the storage error checks
from C to assembler half. All fatal errors that can only be
handled with DAT disabled are handled in assembler half also.
As result, the C half is only entered if the DAT is secured.

Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-07-05 12:44:23 +02:00
Alexander Gordeev
e2c13d6420 s390/mcck: optimize user mode check in case of !CONFIG_KVM
In case of the !CONFIG_KVM use "jz" instead of "jnz" when
detecting user mode and get rid of unnecessary jump as result.

Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Reviewed-by: Christia Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-07-05 12:44:23 +02:00
Alexander Gordeev
fbbdfca5c5 s390/entry.S: factor out SIEEXIT macro
Factor out SIEEXIT macro and use it instead of cleanup_sie
routine. As a side effect %r13 and %r14 are spared.

Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Reviewed-by: Christia Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-07-05 12:44:23 +02:00
Janosch Frank
85b18d7b5e s390: mm: Fix secure storage access exception handling
Turns out that the bit 61 in the TEID is not always 1 and if that's
the case the address space ID and the address are
unpredictable. Without an address and its address space ID we can't
export memory and hence we can only send a SIGSEGV to the process or
panic the kernel depending on who caused the exception.

Unfortunately bit 61 is only reliable if we have the "misc" UV feature
bit.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Fixes: 084ea4d611 ("s390/mm: add (non)secure page access exceptions handlers")
Cc: stable@vger.kernel.org
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-07-05 12:44:23 +02:00
Kefeng Wang
47f7c6cf00 s390/kprobes: use is_kernel() helper
Use is_kernel() helper instead of is_kernel_addr().

[hca@linux.ibm.com: add missing unsigned long cast]
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-07-05 12:44:23 +02:00
Linus Torvalds
2bb919b62f s390 updates for the 5.14 merge window
- Rework inline asm to get rid of error prone "register asm" constructs,
   which are problematic especially when code instrumentation is enabled. In
   particular introduce and use register pair union to allocate even/odd
   register pairs. Unfortunately this breaks compatibility with older
   clang compilers and minimum clang version for s390 has been raised to 13.
   https://lore.kernel.org/linux-next/CAK7LNARuSmPCEy-ak0erPrPTgZdGVypBROFhtw+=3spoGoYsyw@mail.gmail.com/
 
 - Fix gcc 11 warnings, which triggered various minor reworks all over
   the code.
 
 - Add zstd kernel image compression support.
 
 - Rework boot CPU lowcore handling.
 
 - De-duplicate and move kernel memory layout setup logic earlier.
 
 - Few fixes in preparation for FORTIFY_SOURCE performing compile-time
   and run-time field bounds checking for mem functions.
 
 - Remove broken and unused power management support leftovers in s390
   drivers.
 
 - Disable stack-protector for decompressor and purgatory to fix buildroot
   build.
 
 - Fix vt220 sclp console name to match the char device name.
 
 - Enable HAVE_IOREMAP_PROT and add zpci_set_irq()/zpci_clear_irq() in
   zPCI code.
 
 - Remove some implausible WARN_ON_ONCEs and remove arch specific counter
   transaction call backs in favour of default transaction handling in
   perf code.
 
 - Extend/add new uevents for online/config/mode state changes of
   AP card / queue device in zcrypt.
 
 - Minor entry and ccwgroup code improvements.
 
 - Other small various fixes and improvements all over the code.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEE3QHqV+H2a8xAv27vjYWKoQLXFBgFAmDhuTEACgkQjYWKoQLX
 FBjVlggAgDFBkDjlyfvrm4xzmHi7BJMmhrTJIONsSz+3tcA4/u5kE+Hrdrqxm0Uh
 ZH4MXBxn4q4Fmoomhu5w5ZDe8o2ip0aN9fFNdsBoP8hurmQbL/IbdTnBETKMrKpV
 XpogU2G7p+2nQ0+9+o6PS/vWlZhI88NVh8dWyRd2+5/XdMycgLv2Qm7NpQoACVw1
 CbUvxP2PlpZ0wltLvNBKPg1xXMZa3GS0wbVUsS2jiWcr/3VzCqfTHenZJ/RadoE6
 axG99QXCbLDMsJgVQcXtlI8K6Z461fAwbNtWZWC+Uq7o5pYuUFW1dovMg9WWF+7T
 lFNqXyyNy5wwITRkvuzjlVTE8yzYYg==
 =ADZ4
 -----END PGP SIGNATURE-----

Merge tag 's390-5.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 updates from Vasily Gorbik:

 - Rework inline asm to get rid of error prone "register asm"
   constructs, which are problematic especially when code
   instrumentation is enabled.

   In particular introduce and use register pair union to allocate
   even/odd register pairs. Unfortunately this breaks compatibility with
   older clang compilers and minimum clang version for s390 has been
   raised to 13.

     https://lore.kernel.org/linux-next/CAK7LNARuSmPCEy-ak0erPrPTgZdGVypBROFhtw+=3spoGoYsyw@mail.gmail.com/

 - Fix gcc 11 warnings, which triggered various minor reworks all over
   the code.

 - Add zstd kernel image compression support.

 - Rework boot CPU lowcore handling.

 - De-duplicate and move kernel memory layout setup logic earlier.

 - Few fixes in preparation for FORTIFY_SOURCE performing compile-time
   and run-time field bounds checking for mem functions.

 - Remove broken and unused power management support leftovers in s390
   drivers.

 - Disable stack-protector for decompressor and purgatory to fix
   buildroot build.

 - Fix vt220 sclp console name to match the char device name.

 - Enable HAVE_IOREMAP_PROT and add zpci_set_irq()/zpci_clear_irq() in
   zPCI code.

 - Remove some implausible WARN_ON_ONCEs and remove arch specific
   counter transaction call backs in favour of default transaction
   handling in perf code.

 - Extend/add new uevents for online/config/mode state changes of AP
   card / queue device in zcrypt.

 - Minor entry and ccwgroup code improvements.

 - Other small various fixes and improvements all over the code.

* tag 's390-5.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (91 commits)
  s390/dasd: use register pair instead of register asm
  s390/qdio: get rid of register asm
  s390/ioasm: use symbolic names for asm operands
  s390/ioasm: get rid of register asm
  s390/cmf: get rid of register asm
  s390/lib,string: get rid of register asm
  s390/lib,uaccess: get rid of register asm
  s390/string: get rid of register asm
  s390/cmpxchg: use register pair instead of register asm
  s390/mm,pages-states: get rid of register asm
  s390/lib,xor: get rid of register asm
  s390/timex: get rid of register asm
  s390/hypfs: use register pair instead of register asm
  s390/zcrypt: Switch to flexible array member
  s390/speculation: Use statically initialized const for instructions
  virtio/s390: get rid of open-coded kvm hypercall
  s390/pci: add zpci_set_irq()/zpci_clear_irq()
  scripts/min-tool-version.sh: Raise minimum clang version to 13.0.0 for s390
  s390/ipl: use register pair instead of register asm
  s390/mem_detect: fix tprot() program check new psw handling
  ...
2021-07-04 12:17:38 -07:00
Linus Torvalds
71bd934101 Merge branch 'akpm' (patches from Andrew)
Merge more updates from Andrew Morton:
 "190 patches.

  Subsystems affected by this patch series: mm (hugetlb, userfaultfd,
  vmscan, kconfig, proc, z3fold, zbud, ras, mempolicy, memblock,
  migration, thp, nommu, kconfig, madvise, memory-hotplug, zswap,
  zsmalloc, zram, cleanups, kfence, and hmm), procfs, sysctl, misc,
  core-kernel, lib, lz4, checkpatch, init, kprobes, nilfs2, hfs,
  signals, exec, kcov, selftests, compress/decompress, and ipc"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (190 commits)
  ipc/util.c: use binary search for max_idx
  ipc/sem.c: use READ_ONCE()/WRITE_ONCE() for use_global_lock
  ipc: use kmalloc for msg_queue and shmid_kernel
  ipc sem: use kvmalloc for sem_undo allocation
  lib/decompressors: remove set but not used variabled 'level'
  selftests/vm/pkeys: exercise x86 XSAVE init state
  selftests/vm/pkeys: refill shadow register after implicit kernel write
  selftests/vm/pkeys: handle negative sys_pkey_alloc() return code
  selftests/vm/pkeys: fix alloc_random_pkey() to make it really, really random
  kcov: add __no_sanitize_coverage to fix noinstr for all architectures
  exec: remove checks in __register_bimfmt()
  x86: signal: don't do sas_ss_reset() until we are certain that sigframe won't be abandoned
  hfsplus: report create_date to kstat.btime
  hfsplus: remove unnecessary oom message
  nilfs2: remove redundant continue statement in a while-loop
  kprobes: remove duplicated strong free_insn_page in x86 and s390
  init: print out unknown kernel parameters
  checkpatch: do not complain about positive return values starting with EPOLL
  checkpatch: improve the indented label test
  checkpatch: scripts/spdxcheck.py now requires python3
  ...
2021-07-02 12:08:10 -07:00
Linus Torvalds
911a2997a5 \n
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAmDcl7AACgkQnJ2qBz9k
 QNnsBQf+LBAPsfykQ/f8EdHErO1lfbVTmwf2g/JzTkjrIVZTZ6Ic47aCIiFxgHU2
 Js9ufaPxpsbbopzpn2PAoCUzxNsZDqgXtnC03MOUAqoSFbAvgLHz2sQwjqeYJUGQ
 P6n7VipEA/qBVpQI5zeCUhHYcahoNrRjSLzaFnE2Z8CrQYQ6Ry9gVEhduvu2OTru
 62cWlAWlTJfx/FcR1Y0F/ZznnNSKMiAHcEe3F6Beztplg2ooq+z6FclJYrkmnxMq
 SXSOsqTCdi1/oFx36NpvLkykrIS9I7N/iqCnKwbm6X+nyZZKyAwYZhWVqkbozPPu
 +u1Ppq8o0IuWwEA6/UAmxgAO3m/Gkw==
 =tn0h
 -----END PGP SIGNATURE-----

Merge tag 'fs_for_v5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs

Pull misc fs updates from Jan Kara:
 "The new quotactl_fd() syscall (remake of quotactl_path() syscall that
  got introduced & disabled in 5.13 cycle), and couple of udf, reiserfs,
  isofs, and writeback fixes and cleanups"

* tag 'fs_for_v5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  writeback: fix obtain a reference to a freeing memcg css
  quota: remove unnecessary oom message
  isofs: remove redundant continue statement
  quota: Wire up quotactl_fd syscall
  quota: Change quotactl_path() systcall to an fd-based one
  reiserfs: Remove unneed check in reiserfs_write_full_page()
  udf: Fix NULL pointer dereference in udf_symlink function
  reiserfs: add check for invalid 1st journal block
2021-07-01 12:06:39 -07:00
Barry Song
66ce75144d kprobes: remove duplicated strong free_insn_page in x86 and s390
free_insn_page() in x86 and s390 is same with the common weak function in
kernel/kprobes.c.  Plus, the comment "Recover page to RW mode before
releasing it" in x86 seems insensible to be there since resetting mapping
is done by common code in vfree() of module_memfree().  So drop these two
duplicated strong functions and related comment, then mark the common one
in kernel/kprobes.c strong.

Link: https://lkml.kernel.org/r/20210608065736.32656-1-song.bao.hua@hisilicon.com
Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: "Naveen N. Rao" <naveen.n.rao@linux.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Qi Liu <liuqi115@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-07-01 11:06:06 -07:00
Andy Shevchenko
f39650de68 kernel.h: split out panic and oops helpers
kernel.h is being used as a dump for all kinds of stuff for a long time.
Here is the attempt to start cleaning it up by splitting out panic and
oops helpers.

There are several purposes of doing this:
- dropping dependency in bug.h
- dropping a loop by moving out panic_notifier.h
- unload kernel.h from something which has its own domain

At the same time convert users tree-wide to use new headers, although for
the time being include new header back to kernel.h to avoid twisted
indirected includes for existing users.

[akpm@linux-foundation.org: thread_info.h needs limits.h]
[andriy.shevchenko@linux.intel.com: ia64 fix]
  Link: https://lkml.kernel.org/r/20210520130557.55277-1-andriy.shevchenko@linux.intel.com

Link: https://lkml.kernel.org/r/20210511074137.33666-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Co-developed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: Corey Minyard <cminyard@mvista.com>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Wei Liu <wei.liu@kernel.org>
Acked-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Sebastian Reichel <sre@kernel.org>
Acked-by: Luis Chamberlain <mcgrof@kernel.org>
Acked-by: Stephen Boyd <sboyd@kernel.org>
Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Acked-by: Helge Deller <deller@gmx.de> # parisc
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-07-01 11:06:04 -07:00
Linus Torvalds
54a728dc5e Scheduler udpates for this cycle:
- Changes to core scheduling facilities:
 
     - Add "Core Scheduling" via CONFIG_SCHED_CORE=y, which enables
       coordinated scheduling across SMT siblings. This is a much
       requested feature for cloud computing platforms, to allow
       the flexible utilization of SMT siblings, without exposing
       untrusted domains to information leaks & side channels, plus
       to ensure more deterministic computing performance on SMT
       systems used by heterogenous workloads.
 
       There's new prctls to set core scheduling groups, which
       allows more flexible management of workloads that can share
       siblings.
 
     - Fix task->state access anti-patterns that may result in missed
       wakeups and rename it to ->__state in the process to catch new
       abuses.
 
  - Load-balancing changes:
 
      - Tweak newidle_balance for fair-sched, to improve
        'memcache'-like workloads.
 
      - "Age" (decay) average idle time, to better track & improve workloads
        such as 'tbench'.
 
      - Fix & improve energy-aware (EAS) balancing logic & metrics.
 
      - Fix & improve the uclamp metrics.
 
      - Fix task migration (taskset) corner case on !CONFIG_CPUSET.
 
      - Fix RT and deadline utilization tracking across policy changes
 
      - Introduce a "burstable" CFS controller via cgroups, which allows
        bursty CPU-bound workloads to borrow a bit against their future
        quota to improve overall latencies & batching. Can be tweaked
        via /sys/fs/cgroup/cpu/<X>/cpu.cfs_burst_us.
 
      - Rework assymetric topology/capacity detection & handling.
 
  - Scheduler statistics & tooling:
 
      - Disable delayacct by default, but add a sysctl to enable
        it at runtime if tooling needs it. Use static keys and
        other optimizations to make it more palatable.
 
      - Use sched_clock() in delayacct, instead of ktime_get_ns().
 
  - Misc cleanups and fixes.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmDZcPoRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1g3yw//WfhIqy7Psa9d/MBMjQDRGbTuO4+w22Dj
 vmWFU44Q4KJxQHWeIgUlrK+dzvYWvNmflUs2CUUOiDVzxFTHMIyBtL4qCBUbx4Ns
 vKAcB9wsWZge2o3WzZqpProRhdoRaSKw8egUr2q7rACVBkckY7eGP/OjWxXU8BdA
 b7D0LPWwuIBFfN4pFYeCDLn32Dqr9s6Chyj+ZecabdG7EE6Gu+f1diVcxy7JE/mc
 4WWL0D1RqdgpGrBEuMJIxPYekdrZiuy4jtEbztz5gbTBteN1cj3BLfqn0Pc/e6rO
 Vyuc5mXCAmzRVi18z6g6bsVl+IA/nrbErENB2OHOhOYtqiZxqGTd4GPWZszMyY17
 5AsEO5+5pcaBsy4gyp09qURggBu9zhJnMVmOI3rIHZkmkhwzc6uUJlyhDCTiFWOz
 3ZF3LjbZEyCKodMD8qMHbs3axIBpIfZqjzkvSKyFnvfXEGVytVse7NUuWtQ36u92
 GnURxVeYY1TDVXvE1Y8owNKMxknKQ6YRlypP7Dtbeo/qG6hShp0xmS7qDLDi0ybZ
 ZlK+bDECiVoDf3nvJo+8v5M82IJ3CBt4UYldeRJsa1YCK/FsbK8tp91fkEfnXVue
 +U6LPX0AmMpXacR5HaZfb3uBIKRw/QMdP/7RFtBPhpV6jqCrEmuqHnpPQiEVtxwO
 UmG7bt94Trk=
 =3VDr
 -----END PGP SIGNATURE-----

Merge tag 'sched-core-2021-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler udpates from Ingo Molnar:

 - Changes to core scheduling facilities:

    - Add "Core Scheduling" via CONFIG_SCHED_CORE=y, which enables
      coordinated scheduling across SMT siblings. This is a much
      requested feature for cloud computing platforms, to allow the
      flexible utilization of SMT siblings, without exposing untrusted
      domains to information leaks & side channels, plus to ensure more
      deterministic computing performance on SMT systems used by
      heterogenous workloads.

      There are new prctls to set core scheduling groups, which allows
      more flexible management of workloads that can share siblings.

    - Fix task->state access anti-patterns that may result in missed
      wakeups and rename it to ->__state in the process to catch new
      abuses.

 - Load-balancing changes:

    - Tweak newidle_balance for fair-sched, to improve 'memcache'-like
      workloads.

    - "Age" (decay) average idle time, to better track & improve
      workloads such as 'tbench'.

    - Fix & improve energy-aware (EAS) balancing logic & metrics.

    - Fix & improve the uclamp metrics.

    - Fix task migration (taskset) corner case on !CONFIG_CPUSET.

    - Fix RT and deadline utilization tracking across policy changes

    - Introduce a "burstable" CFS controller via cgroups, which allows
      bursty CPU-bound workloads to borrow a bit against their future
      quota to improve overall latencies & batching. Can be tweaked via
      /sys/fs/cgroup/cpu/<X>/cpu.cfs_burst_us.

    - Rework assymetric topology/capacity detection & handling.

 - Scheduler statistics & tooling:

    - Disable delayacct by default, but add a sysctl to enable it at
      runtime if tooling needs it. Use static keys and other
      optimizations to make it more palatable.

    - Use sched_clock() in delayacct, instead of ktime_get_ns().

 - Misc cleanups and fixes.

* tag 'sched-core-2021-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (72 commits)
  sched/doc: Update the CPU capacity asymmetry bits
  sched/topology: Rework CPU capacity asymmetry detection
  sched/core: Introduce SD_ASYM_CPUCAPACITY_FULL sched_domain flag
  psi: Fix race between psi_trigger_create/destroy
  sched/fair: Introduce the burstable CFS controller
  sched/uclamp: Fix uclamp_tg_restrict()
  sched/rt: Fix Deadline utilization tracking during policy change
  sched/rt: Fix RT utilization tracking during policy change
  sched: Change task_struct::state
  sched,arch: Remove unused TASK_STATE offsets
  sched,timer: Use __set_current_state()
  sched: Add get_current_state()
  sched,perf,kvm: Fix preemption condition
  sched: Introduce task_is_running()
  sched: Unbreak wakeups
  sched/fair: Age the average idle time
  sched/cpufreq: Consider reduced CPU capacity in energy calculation
  sched/fair: Take thermal pressure into account while estimating energy
  thermal/cpufreq_cooling: Update offline CPUs per-cpu thermal_pressure
  sched/fair: Return early from update_tg_cfs_load() if delta == 0
  ...
2021-06-28 12:14:19 -07:00
Linus Torvalds
28a27cbd86 Perf events updates for this cycle:
- Platform PMU driver updates:
 
      - x86 Intel uncore driver updates for Skylake (SNR) and Icelake (ICX) servers
      - Fix RDPMC support
      - Fix [extended-]PEBS-via-PT support
      - Fix Sapphire Rapids event constraints
      - Fix :ppp support on Sapphire Rapids
      - Fix fixed counter sanity check on Alder Lake & X86_FEATURE_HYBRID_CPU
      - Other heterogenous-PMU fixes
 
  - Kprobes:
 
      - Remove the unused and misguided kprobe::fault_handler callbacks.
      - Warn about kprobes taking a page fault.
      - Fix the 'nmissed' stat counter.
 
  - Misc cleanups and fixes.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmDZaxMRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1hPgw//f9SnGzFoP1uR5TBqM8j/QHulMewew/iD
 dM5lh2emdmqHWYPBeRxUHgag38K2Golr3Y+NxLA3R+RMx+OZQe8Mz/wYvPQcBvsV
 k1HHImU3GRMn4GM7GwxH3vPIottDUx3mNS2J6pzlw3kwRUVqrxUdj/0/pSY/4eJ7
 ZT4uq4yLV83Jd3qioU7o7e/u6MrdNIIcAXRpVDdE9Mm1+kWXSVN7/h3Vsiz4tj5E
 iS+UXEtSc1a2mnmekv63pYkJHHNUb6guD8jgI/wrm1KIFGjDRifM+3TV6R/kB96/
 TfD2LhCcTShfSp8KI191pgV7/NQbB/PmLdSYmff3rTBiii4cqXuCygJCHInZ09z0
 4fTSSqM6aHg7kfTQyOCp+DUQ+9vNVXWo8mxt9c6B8xA0GyCI3zhjQ4UIiSUWRpjs
 Be5ZyF0kNNuPxYrKFnGnBf8+51DURpCz3sDdYRuK4KNkj1+4ZvJo/KzGTMUUIE4B
 IDQG6wDP5Kb388eRDtKrG5X7IXg+L5F/kezin60j0QF5MwDgxirT217teN8H1lNn
 YgWMjRK8Tw0flUJsbCxa51/nl93UtByB+fIRIc88MSeLxcI6/ORW+TxBBEqkYm5Z
 6BLFtmHSuAqAXUuyZXSGLcW7XLJvIaDoHgvbDn6l4g7FMWHqPOIq6nJQY3L8ben2
 e+fQrGh4noI=
 =20Vc
 -----END PGP SIGNATURE-----

Merge tag 'perf-core-2021-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf events updates from Ingo Molnar:

 - Platform PMU driver updates:

     - x86 Intel uncore driver updates for Skylake (SNR) and Icelake (ICX) servers
     - Fix RDPMC support
     - Fix [extended-]PEBS-via-PT support
     - Fix Sapphire Rapids event constraints
     - Fix :ppp support on Sapphire Rapids
     - Fix fixed counter sanity check on Alder Lake & X86_FEATURE_HYBRID_CPU
     - Other heterogenous-PMU fixes

 - Kprobes:

     - Remove the unused and misguided kprobe::fault_handler callbacks.
     - Warn about kprobes taking a page fault.
     - Fix the 'nmissed' stat counter.

 - Misc cleanups and fixes.

* tag 'perf-core-2021-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf: Fix task context PMU for Hetero
  perf/x86/intel: Fix instructions:ppp support in Sapphire Rapids
  perf/x86/intel: Add more events requires FRONTEND MSR on Sapphire Rapids
  perf/x86/intel: Fix fixed counter check warning for some Alder Lake
  perf/x86/intel: Fix PEBS-via-PT reload base value for Extended PEBS
  perf/x86: Reset the dirty counter to prevent the leak for an RDPMC task
  kprobes: Do not increment probe miss count in the fault handler
  x86,kprobes: WARN if kprobes tries to handle a fault
  kprobes: Remove kprobe::fault_handler
  uprobes: Update uprobe_write_opcode() kernel-doc comment
  perf/hw_breakpoint: Fix DocBook warnings in perf hw_breakpoint
  perf/core: Fix DocBook warnings
  perf/core: Make local function perf_pmu_snapshot_aux() static
  perf/x86/intel/uncore: Enable I/O stacks to IIO PMON mapping on ICX
  perf/x86/intel/uncore: Enable I/O stacks to IIO PMON mapping on SNR
  perf/x86/intel/uncore: Generalize I/O stacks to PMON mapping procedure
  perf/x86/intel/uncore: Drop unnecessary NULL checks after container_of()
2021-06-28 12:03:20 -07:00
Kees Cook
c74d3c182a s390/speculation: Use statically initialized const for instructions
In preparation for FORTIFY_SOURCE performing compile-time and run-time
field bounds checking for memcpy(), memmove(), and memset(), avoid
confusing the checks when using a static const source.

Move the static const array into a variable so the compiler can perform
appropriate bounds checking.

Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20210616201823.1245603-1-keescook@chromium.org
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-06-28 11:18:28 +02:00
Sven Schnelle
9e3d62d55b s390/topology: clear thread/group maps for offline cpus
The current code doesn't clear the thread/group maps for offline
CPUs. This may cause kernel crashes like the one bewlow in common
code that assumes if a CPU has sibblings it is online.

Unable to handle kernel pointer dereference in virtual kernel address space

Call Trace:
 [<000000013a4b8c3c>] blk_mq_map_swqueue+0x10c/0x388
([<000000013a4b8bcc>] blk_mq_map_swqueue+0x9c/0x388)
 [<000000013a4b9300>] blk_mq_init_allocated_queue+0x448/0x478
 [<000000013a4b9416>] blk_mq_init_queue+0x4e/0x90
 [<000003ff8019d3e6>] loop_add+0x106/0x278 [loop]
 [<000003ff801b8148>] loop_init+0x148/0x1000 [loop]
 [<0000000139de4924>] do_one_initcall+0x3c/0x1e0
 [<0000000139ef449a>] do_init_module+0x6a/0x2a0
 [<0000000139ef61bc>] __do_sys_finit_module+0xa4/0xc0
 [<0000000139de9e6e>] do_syscall+0x7e/0xd0
 [<000000013a8e0aec>] __do_syscall+0xbc/0x110
 [<000000013a8ee2e8>] system_call+0x78/0xa0

Fixes: 52aeda7acc ("s390/topology: remove offline CPUs from CPU topology masks")
Cc: <stable@kernel.org> # 5.7+
Reported-by: Marius Hillenbrand <mhillen@linux.ibm.com>
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-06-21 11:19:18 +02:00
Sven Schnelle
ca1f4d702d s390: clear pt_regs::flags on irq entry
The current irq entry code doesn't initialize pt_regs::flags. On exit to
user mode arch_do_signal_or_restart() tests whether PIF_SYSCALL is set,
which might yield wrong results.

Fix this by clearing pt_regs::flags in the entry.S irq handler
code.

Reported-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Fixes: 56e62a7370 ("s390: convert to generic entry")
Cc: <stable@vger.kernel.org> # 5.12
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-06-21 11:19:18 +02:00
Sven Schnelle
fc66127dc3 s390: fix system call restart with multiple signals
glibc complained with "The futex facility returned an unexpected error
code.". It turned out that the futex syscall returned -ERESTARTSYS because
a signal is pending. arch_do_signal_or_restart() restored the syscall
parameters (nameley regs->gprs[2]) and set PIF_SYSCALL_RESTART. When
another signal is made pending later in the exit loop
arch_do_signal_or_restart() is called again. This function clears
PIF_SYSCALL_RESTART and checks the return code which is set in
regs->gprs[2]. However, regs->gprs[2] was restored in the previous run
and no longer contains -ERESTARTSYS, so PIF_SYSCALL_RESTART isn't set
again and the syscall is skipped.

Fix this by not clearing PIF_SYSCALL_RESTART - it is already cleared in
__do_syscall() when the syscall is restarted.

Reported-by: Bjoern Walk <bwalk@linux.ibm.com>
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Fixes: 56e62a7370 ("s390: convert to generic entry")
Cc: <stable@vger.kernel.org> # 5.12
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-06-21 11:19:18 +02:00
Heiko Carstens
5a4e0f58e2 s390/ipl: use register pair instead of register asm
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-06-18 16:41:24 +02:00
Heiko Carstens
5fe29839de s390/sysinfo: get rid of register asm
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-06-18 16:41:23 +02:00
Heiko Carstens
0a9d947fbe s390/cpcmd: use register pair instead of register asm
Remove register asm usage from diag8_noresponse() since it wasn't
needed at all. There is no requirement for even/odd register pairs for
diag 0x8.

For diag_response() use register pairs to fulfill the rx+1 and ry+1
requirements as required if a response buffer is specified. Also
change the inline asm to return the condition code of the diagnose
instruction and do the conditional handling of response length
calculation in C.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-06-18 16:41:22 +02:00
Heiko Carstens
6a7b4e4ee1 s390/sthyi: use register pair instead of register asm
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-06-18 16:41:22 +02:00
Heiko Carstens
3c45a07bee s390/diag: use register pair instead of register asm
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-06-18 16:41:21 +02:00
Heiko Carstens
ddd38fd261 s390/smp: use register pair instead of register asm
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-06-18 16:41:21 +02:00
Peter Oberparleiter
d2beeb3bc7 s390/debug: Remove pointer obfuscation
When read via debugfs, s390dbf debug-views print the kernel address of
the call-site that created a trace entry. The kernel's %p pointer
hashing feature obfuscates this address, and commit 860ec7c6e2
("s390/debug: use pK for kernel pointers") made this obfuscation
configurable via the kptr_restrict sysctl.

Obfuscation of kernel address data printed via s390dbf debug-views does
not add any additional protection since the associated debugfs files are
only accessible to the root user that typically has enough other means
to obtain kernel address data.

Also trace payload data may contain binary representations of kernel
addresses as part of logged data structues. Requiring such payload data
to be obfuscated as well would be impractical and greatly diminish the
use of s390dbf.

Therefore completely remove pointer obfuscation from s390dbf
debug-views.

Reviewed-by: Steffen Maier <maier@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-06-18 16:41:19 +02:00
Vasily Gorbik
6a9100ad13 s390/setup: cleanup reserve/remove_oldmem
Since OLDMEM_BASE/OLDMEM_SIZE is already taken into consideration and is
reflected in ident_map_size. reserve/remove_oldmem() is no longer needed
and could be removed.

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-06-18 16:41:19 +02:00
Vasily Gorbik
0c4f2623b9 s390: setup kernel memory layout early
Currently there are two separate places where kernel memory layout has
to be known and adjusted:
1. early kasan setup.
2. paging setup later.

Those 2 places had to be kept in sync and adjusted to reflect peculiar
technical details of one another. With additional factors which influence
kernel memory layout like ultravisor secure storage limit, complexity
of keeping two things in sync grew up even more.

Besides that if we look forward towards creating identity mapping and
enabling DAT before jumping into uncompressed kernel - that would also
require full knowledge of and control over kernel memory layout.

So, de-duplicate and move kernel memory layout setup logic into
the decompressor.

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-06-18 16:41:19 +02:00
Peter Zijlstra
b03fbd4ff2 sched: Introduce task_is_running()
Replace a bunch of 'p->state == TASK_RUNNING' with a new helper:
task_is_running(p).

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Davidlohr Bueso <dave@stgolabs.net>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20210611082838.222401495@infradead.org
2021-06-18 11:43:07 +02:00
Alexander Gordeev
b5415c8f97 s390/entry.S: factor out OUTSIDE macro
Introduce OUTSIDE macro that checks whether an instruction
address is inside or outside of a block of instructions.

Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-06-16 23:46:18 +02:00
Alexander Gordeev
20232b18e5 s390/mcck: cleanup use of cleanup_sie_mcck
cleanup_sie_mcck label is called from a single location only
and thus does not need to be a subroutine. Move the labelled
code to the caller - by doing that the SIE critical section
checks appear next to each other and the SIE cleanup becomes
bit more readable.

Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-06-07 17:12:59 +02:00
Vasily Gorbik
3bd6958136 Merge branch 's390/fixes' into features
This helps to avoid merge conflicts later.

* fixes:
  s390/mcck: fix invalid KVM guest condition check
  s390/mcck: fix calculation of SIE critical section size

Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-06-07 17:11:10 +02:00
Sven Schnelle
0a500447b8 s390: use struct tpi_info in lowcore.h
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-06-07 17:07:01 +02:00
Alexander Gordeev
d2e834c62d s390/smp: remove redundant pcpu::lowcore member
Per-CPU pointer to lowcore is stored in global lowcore_ptr[]
array and duplicated in struct pcpu::lowcore member. This
update removes the redundancy.

Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-06-07 17:07:00 +02:00
Alexander Gordeev
587704efb3 s390/smp: do not preserve boot CPU lowcore on hotplug
Once the kernel is running the boot CPU lowcore becomes
freeable and does not differ from the secondary CPU ones
in any way. Make use of it and do not preserve the boot
CPU lowcore on unplugging. That allows returning unused
memory when the boot CPU is offline and makes the code
more clear.

Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-06-07 17:07:00 +02:00
Alexander Gordeev
5789284710 s390/smp: reallocate IPL CPU lowcore
The lowcore for IPL CPU is special. It is allocated early
in the boot process using memblock and never freed since.
The reason is pcpu_alloc_lowcore() and pcpu_free_lowcore()
routines use page allocator which is not available when
the IPL CPU is getting initialized.

Similar problem is already addressed for stacks - once the
virtual memory is available the early boot stacks get re-
allocated. Doing the same for lowcore will allow freeing
the IPL CPU lowcore and make no difference between the
boot and secondary CPUs.

Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-06-07 17:07:00 +02:00
Heiko Carstens
f73c632d38 s390/ipl: make parameter area accessible via struct parmarea
Since commit 9a965ea95135 ("s390/kexec_file: Simplify parmarea
access") we have struct parmarea which describes the layout of the
kernel parameter area.

Make the kernel parameter area available as global variable parmarea
of type struct parmarea, which allows to easily access its members.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-06-07 17:06:59 +02:00
Valentin Vidic
b7d91d230a s390/sclp_vt220: fix console name to match device
Console name reported in /proc/consoles:

  ttyS1                -W- (EC p  )    4:65

does not match the char device name:

  crw--w----    1 root     root        4,  65 May 17 12:18 /dev/ttysclp0

so debian-installer inside a QEMU s390x instance gets confused and fails
to start with the following error:

  steal-ctty: No such file or directory

Signed-off-by: Valentin Vidic <vvidic@valentin-vidic.from.hr>
Link: https://lore.kernel.org/r/20210427194010.9330-1-vvidic@valentin-vidic.from.hr
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-06-07 17:06:59 +02:00
Sven Schnelle
755112b35c s390/traps: add struct to access transactional diagnostic block
gcc-11 warns:

arch/s390/kernel/traps.c: In function __do_pgm_check:
arch/s390/kernel/traps.c:319:17: warning: memcpy reading 256 bytes from a region of size 0 [-Wstringop-overread]
  319 |                 memcpy(&current->thread.trap_tdb, &S390_lowcore.pgm_tdb, 256);
      |                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Fix this by adding a struct pgm_tdb to struct lowcore and copy that.

Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-06-07 17:06:58 +02:00
Sven Schnelle
6c6a07fc7c s390/irq: add union/struct to access external interrupt parameters
gcc-11 warns:

arch/s390/kernel/irq.c: In function do_ext_irq:
arch/s390/kernel/irq.c:175:9: warning: memcpy reading 4 bytes from a region of size 0 [-Wstringop-overread]
  175 |         memcpy(&regs->int_code, &S390_lowcore.ext_cpu_addr, 4);
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Fix this by adding a struct for int_code to struct lowcore.

Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-06-07 17:06:58 +02:00
Sven Schnelle
17e89e1340 s390/facilities: move stfl information from lowcore to global data
With gcc-11, there are a lot of warnings because the facility functions
are accessing lowcore through a null pointer. Fix this by moving the
facility arrays away from lowcore.

Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-06-07 17:06:58 +02:00
Sven Schnelle
af9ad82290 s390/entry: use assignment to read intcode / asm to copy gprs
arch/s390/kernel/syscall.c: In function __do_syscall:
arch/s390/kernel/syscall.c:147:9: warning: memcpy reading 64 bytes from a region of size 0 [-Wstringop-overread]
  147 |         memcpy(&regs->gprs[8], S390_lowcore.save_area_sync, 8 * sizeof(unsigned long));
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
arch/s390/kernel/syscall.c:148:9: warning: memcpy reading 4 bytes from a region of size 0 [-Wstringop-overread]
  148 |         memcpy(&regs->int_code, &S390_lowcore.svc_ilc, sizeof(regs->int_code));
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Fix this by moving the gprs restore from C to assembly, and use a assignment
for int_code instead of memcpy.

Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-06-07 17:06:58 +02:00
Thomas Richter
15e5b53ff4 s390/cpumf: remove WARN_ON_ONCE in counter start handler
Remove some WARN_ON_ONCE() warnings when a counter is started. Each
counter is installed function calls
event_sched_in() --> cpumf_pmu_add(..., PERF_EF_START).

This is done after the event has been created using
perf_pmu_event_init() which verifies the counter is valid.
Member hwc->config must be valid at this point.

Function cpumf_pmu_start(..., PERF_EF_RELOAD) is called from
function cpumf_pmu_add() for counter events. All other invocations of
cpumf_pmu_start(..., PERF_EF_RELOAD) are from the performance subsystem
for sampling events.

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-06-07 17:06:58 +02:00
Thomas Richter
d552a58d70 s390/cpumf: remove counter transaction call backs
The command 'perf stat -e cycles ...' triggers the following function
sequence in the CPU Measurement Facility counter device driver:

perf_pmu_event_init()
  __hw_perf_event_init()
    validate_ctr_auth()
    validate_ctr_version()

During event creation, the counter number is checked in functions
validate_ctr_auth() and validate_ctr_version() to verify it is a valid
counter and supported by the hardware. If this is not the case, both
functions return an error and the event is not created. System call
perf_event_open() returns an error in this case.

Later on the event is installed in the kernel event subsystem and the
driver functions cpumf_pmu_add() and cpumf_pmu_commit_txn() are called
to install the counter event by the hardware.

Since both events have been verified at event creation, there is no need
to re-evaluate the authorization state. This can not change since on
 * LPARs the authorization change requires a restart of the LPAR (and
   thus a reboot of the kernel)
 * DPMs can not take resources away, just add them.

Also the sequence of CPU Measurement facility counter device driver
calls is
  cpumf_pmu_start_txn
  cpumf_pmu_add
  cpumf_pmu_start
  cpumf_pmu_commit_txn
for every single event. Which means the condition in cpumf_pmu_add()
is never met and validate_ctr_auth() is never called.

This leaves the counter device driver transaction functions with
just one task:
start_txn: Verify a transaction is not in flight and call
	perf_pmu_disable()
cancel_txn, commit_txn: Verify a transaction is in flight and call
	perf_pmu_enable()

The same functionality is provided by the default transaction handling
functions in kernel/events/core.c. Use those by removing the
counter device driver private call back functions.

Suggested-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-06-07 17:06:58 +02:00
Alexander Gordeev
1874cb13d5 s390/mcck: fix invalid KVM guest condition check
Wrong condition check is used to decide if a machine check hit
while in KVM guest. As result of this check the instruction
following the SIE critical section might be considered as still
in KVM guest and _CIF_MCCK_GUEST CPU flag mistakenly set as
result.

Fixes: c929500d7a ("s390/nmi: s390: New low level handling for machine check happening in guest")
Cc: <stable@vger.kernel.org>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-06-07 12:12:03 +02:00
Alexander Gordeev
5bcbe3285f s390/mcck: fix calculation of SIE critical section size
The size of SIE critical section is calculated wrongly
as result of a missed subtraction in commit 0b0ed657fe
("s390: remove critical section cleanup from entry.S")

Fixes: 0b0ed657fe ("s390: remove critical section cleanup from entry.S")
Cc: <stable@vger.kernel.org>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-06-07 12:12:03 +02:00
Jan Kara
65ffb3d69e quota: Wire up quotactl_fd syscall
Wire up the quotactl_fd syscall.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
2021-06-07 12:11:24 +02:00
Ingo Molnar
a9e906b71f Merge branch 'sched/urgent' into sched/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2021-06-03 19:00:49 +02:00
Naveen N. Rao
2e38eb04c9 kprobes: Do not increment probe miss count in the fault handler
Kprobes has a counter 'nmissed', that is used to count the number of
times a probe handler was not called. This generally happens when we hit
a kprobe while handling another kprobe.

However, if one of the probe handlers causes a fault, we are currently
incrementing 'nmissed'. The comment in fault handler indicates that this
can be used to account faults taken by the probe handlers. But, this has
never been the intention as is evident from the comment above 'nmissed'
in 'struct kprobe':

	/*count the number of times this probe was temporarily disarmed */
	unsigned long nmissed;

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Link: https://lkml.kernel.org/r/20210601120150.672652-1-naveen.n.rao@linux.vnet.ibm.com
2021-06-03 15:47:26 +02:00
Peter Zijlstra
ec6aba3d2b kprobes: Remove kprobe::fault_handler
The reason for kprobe::fault_handler(), as given by their comment:

 * We come here because instructions in the pre/post
 * handler caused the page_fault, this could happen
 * if handler tries to access user space by
 * copy_from_user(), get_user() etc. Let the
 * user-specified handler try to fix it first.

Is just plain bad. Those other handlers are ran from non-preemptible
context and had better use _nofault() functions. Also, there is no
upstream usage of this.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Link: https://lore.kernel.org/r/20210525073213.561116662@infradead.org
2021-06-01 16:00:08 +02:00
Jan Kara
5b9fedb31e quota: Disable quotactl_path syscall
In commit fa8b90070a ("quota: wire up quotactl_path") we have wired up
new quotactl_path syscall. However some people in LWN discussion have
objected that the path based syscall is missing dirfd and flags argument
which is mostly standard for contemporary path based syscalls. Indeed
they have a point and after a discussion with Christian Brauner and
Sascha Hauer I've decided to disable the syscall for now and update its
API. Since there is no userspace currently using that syscall and it
hasn't been released in any major release, we should be fine.

CC: Christian Brauner <christian.brauner@ubuntu.com>
CC: Sascha Hauer <s.hauer@pengutronix.de>
Link: https://lore.kernel.org/lkml/20210512153621.n5u43jsytbik4yze@wittgenstein
Signed-off-by: Jan Kara <jack@suse.cz>
2021-05-17 14:39:56 +02:00
Valentin Schneider
f1a0a376ca sched/core: Initialize the idle task with preemption disabled
As pointed out by commit

  de9b8f5dcb ("sched: Fix crash trying to dequeue/enqueue the idle thread")

init_idle() can and will be invoked more than once on the same idle
task. At boot time, it is invoked for the boot CPU thread by
sched_init(). Then smp_init() creates the threads for all the secondary
CPUs and invokes init_idle() on them.

As the hotplug machinery brings the secondaries to life, it will issue
calls to idle_thread_get(), which itself invokes init_idle() yet again.
In this case it's invoked twice more per secondary: at _cpu_up(), and at
bringup_cpu().

Given smp_init() already initializes the idle tasks for all *possible*
CPUs, no further initialization should be required. Now, removing
init_idle() from idle_thread_get() exposes some interesting expectations
with regards to the idle task's preempt_count: the secondary startup always
issues a preempt_disable(), requiring some reset of the preempt count to 0
between hot-unplug and hotplug, which is currently served by
idle_thread_get() -> idle_init().

Given the idle task is supposed to have preemption disabled once and never
see it re-enabled, it seems that what we actually want is to initialize its
preempt_count to PREEMPT_DISABLED and leave it there. Do that, and remove
init_idle() from idle_thread_get().

Secondary startups were patched via coccinelle:

  @begone@
  @@

  -preempt_disable();
  ...
  cpu_startup_entry(CPUHP_AP_ONLINE_IDLE);

Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210512094636.2958515-1-valentin.schneider@arm.com
2021-05-12 13:01:45 +02:00
Linus Torvalds
e48661230c more s390 updates for 5.13 merge window
- add support for system call stack randomization.
 
 - handle stale PCI deconfiguration events.
 
 - couple of defconfig updates.
 
 - some fixes and cleanups.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEECMNfWEw3SLnmiLkZIg7DeRspbsIFAmCURRgACgkQIg7DeRsp
 bsLiVw//ThqXjgP7koJtawL0MFvSo1V69KTw1QNoMmUvrCynZ8nJlt4sHj1LIuEN
 m7kHoWUsvNcg8r8QbxL1eZ2f/Qf43qrFjXIKi5iTdOtO/LF9NNNQYFnA3cT3h9oE
 7hfycj8o5yi+KYY3Ca2HjlQ0i7zKYfPul1+Yms5h0nAgcvOXuPltVAlyYrrtddrM
 cfpolZZd1IB/lMHSa8/qLviRB5ADlrNx4N6Y1ROeCPCWDbO8flrnDOPTDG8a8sCN
 llQ0/vBTmenkGyT7UjG5bx9P/gX1FsMShBtyZMa8t8leIJfruDiwdo87wvSDf5IT
 I612xdbLpMfGy6i/LnJHhnw61FkpwBKJZ3UrVVkrmjY8IVN8tVdAjy5s4Fplhgjj
 BUbk9Ep03YCqfO6fpqh5DkBxCF0dnj4dZrcHA881/DnZuUkxpMJhyNjJDIx1OLup
 PC+y9eILFAnDveFvhZJZeMpH7wAheyrW/WgKZsZNLYZ+61pKPyGn9RrE5UBgI7ra
 CSIi9Km/lAuNCd9o4n5/5wCd3a9dW47kCrRe3S20oF57v5RU3AVtbC2YSUqPYahf
 NR4ZgL+zDhByLPRVij5FJ1LLeaJJftKM9uUO9egHOk1JnSxDc9EyU41x4838SKYv
 CfhQJw1ISTTRNCGeflE7+CBfEYCKX7h+DySVCtxTsg6PelcQHLs=
 =C1HZ
 -----END PGP SIGNATURE-----

Merge tag 's390-5.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull more s390 updates from Heiko Carstens:

 - add support for system call stack randomization

 - handle stale PCI deconfiguration events

 - couple of defconfig updates

 - some fixes and cleanups

* tag 's390-5.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390: fix detection of vector enhancements facility 1 vs. vector packed decimal facility
  s390/entry: add support for syscall stack randomization
  s390/configs: change CONFIG_VIRTIO_CONSOLE to "m"
  s390/cio: remove invalid condition on IO_SCH_UNREG
  s390/cpumf: remove call to perf_event_update_userpage
  s390/cpumf: move counter set size calculation to common place
  s390/cpumf: beautify if-then-else indentation
  s390/configs: enable CONFIG_PCI_IOV
  s390/pci: handle stale deconfiguration events
  s390/pci: rename zpci_configure_device()
2021-05-06 14:39:50 -07:00
David Hildenbrand
b208108638 s390: fix detection of vector enhancements facility 1 vs. vector packed decimal facility
The PoP documents:
	134: The vector packed decimal facility is installed in the
	     z/Architecture architectural mode. When bit 134 is
	     one, bit 129 is also one.
	135: The vector enhancements facility 1 is installed in
	     the z/Architecture architectural mode. When bit 135
	     is one, bit 129 is also one.

Looks like we confuse the vector enhancements facility 1 ("EXT") with the
Vector packed decimal facility ("BCD"). Let's fix the facility checks.

Detected while working on QEMU/tcg z14 support and only unlocking
the vector enhancements facility 1, but not the vector packed decimal
facility.

Fixes: 2583b848ca ("s390: report new vector facilities")
Cc: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Link: https://lore.kernel.org/r/20210503121244.25232-1-david@redhat.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-05-04 19:10:56 +02:00
Linus Torvalds
17ae69aba8 Add Landlock, a new LSM from Mickaël Salaün <mic@linux.microsoft.com>
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEgycj0O+d1G2aycA8rZhLv9lQBTwFAmCInP4ACgkQrZhLv9lQ
 BTza0g//dTeb9woC9H7qlEhK4l9yk62lTss60Q8X7m7ZSNfdL4tiEbi64SgK+iOW
 OOegbrOEb8Kzh4KJJYmVlVZ5YUWyH4szgmee1wnylBdsWiWaPLPF3Cflz77apy6T
 TiiBsJd7rRE29FKheaMt34B41BMh8QHESN+DzjzJWsFoi/uNxjgSs2W16XuSupKu
 bpRmB1pYNXMlrkzz7taL05jndZYE5arVriqlxgAsuLOFOp/ER7zecrjImdCM/4kL
 W6ej0R1fz2Geh6CsLBJVE+bKWSQ82q5a4xZEkSYuQHXgZV5eywE5UKu8ssQcRgQA
 VmGUY5k73rfY9Ofupf2gCaf/JSJNXKO/8Xjg0zAdklKtmgFjtna5Tyg9I90j7zn+
 5swSpKuRpilN8MQH+6GWAnfqQlNoviTOpFeq3LwBtNVVOh08cOg6lko/bmebBC+R
 TeQPACKS0Q0gCDPm9RYoU1pMUuYgfOwVfVRZK1prgi2Co7ZBUMOvYbNoKYoPIydr
 ENBYljlU1OYwbzgR2nE+24fvhU8xdNOVG1xXYPAEHShu+p7dLIWRLhl8UCtRQpSR
 1ofeVaJjgjrp29O+1OIQjB2kwCaRdfv/Gq1mztE/VlMU/r++E62OEzcH0aS+mnrg
 yzfyUdI8IFv1q6FGT9yNSifWUWxQPmOKuC8kXsKYfqfJsFwKmHM=
 =uCN4
 -----END PGP SIGNATURE-----

Merge tag 'landlock_v34' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security

Pull Landlock LSM from James Morris:
 "Add Landlock, a new LSM from Mickaël Salaün.

  Briefly, Landlock provides for unprivileged application sandboxing.

  From Mickaël's cover letter:
    "The goal of Landlock is to enable to restrict ambient rights (e.g.
     global filesystem access) for a set of processes. Because Landlock
     is a stackable LSM [1], it makes possible to create safe security
     sandboxes as new security layers in addition to the existing
     system-wide access-controls. This kind of sandbox is expected to
     help mitigate the security impact of bugs or unexpected/malicious
     behaviors in user-space applications. Landlock empowers any
     process, including unprivileged ones, to securely restrict
     themselves.

     Landlock is inspired by seccomp-bpf but instead of filtering
     syscalls and their raw arguments, a Landlock rule can restrict the
     use of kernel objects like file hierarchies, according to the
     kernel semantic. Landlock also takes inspiration from other OS
     sandbox mechanisms: XNU Sandbox, FreeBSD Capsicum or OpenBSD
     Pledge/Unveil.

     In this current form, Landlock misses some access-control features.
     This enables to minimize this patch series and ease review. This
     series still addresses multiple use cases, especially with the
     combined use of seccomp-bpf: applications with built-in sandboxing,
     init systems, security sandbox tools and security-oriented APIs [2]"

  The cover letter and v34 posting is here:

      https://lore.kernel.org/linux-security-module/20210422154123.13086-1-mic@digikod.net/

  See also:

      https://landlock.io/

  This code has had extensive design discussion and review over several
  years"

Link: https://lore.kernel.org/lkml/50db058a-7dde-441b-a7f9-f6837fe8b69f@schaufler-ca.com/ [1]
Link: https://lore.kernel.org/lkml/f646e1c7-33cf-333f-070c-0a40ad0468cd@digikod.net/ [2]

* tag 'landlock_v34' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
  landlock: Enable user space to infer supported features
  landlock: Add user and kernel documentation
  samples/landlock: Add a sandbox manager example
  selftests/landlock: Add user space tests
  landlock: Add syscall implementations
  arch: Wire up Landlock syscalls
  fs,security: Add sb_delete hook
  landlock: Support filesystem access-control
  LSM: Infrastructure management of the superblock
  landlock: Add ptrace restrictions
  landlock: Set up the security framework and manage credentials
  landlock: Add ruleset and domain management
  landlock: Add object management
2021-05-01 18:50:44 -07:00
Linus Torvalds
152d32aa84 ARM:
- Stage-2 isolation for the host kernel when running in protected mode
 
 - Guest SVE support when running in nVHE mode
 
 - Force W^X hypervisor mappings in nVHE mode
 
 - ITS save/restore for guests using direct injection with GICv4.1
 
 - nVHE panics now produce readable backtraces
 
 - Guest support for PTP using the ptp_kvm driver
 
 - Performance improvements in the S2 fault handler
 
 x86:
 
 - Optimizations and cleanup of nested SVM code
 
 - AMD: Support for virtual SPEC_CTRL
 
 - Optimizations of the new MMU code: fast invalidation,
   zap under read lock, enable/disably dirty page logging under
   read lock
 
 - /dev/kvm API for AMD SEV live migration (guest API coming soon)
 
 - support SEV virtual machines sharing the same encryption context
 
 - support SGX in virtual machines
 
 - add a few more statistics
 
 - improved directed yield heuristics
 
 - Lots and lots of cleanups
 
 Generic:
 
 - Rework of MMU notifier interface, simplifying and optimizing
 the architecture-specific code
 
 - Some selftests improvements
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmCJ13kUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroM1HAgAqzPxEtiTPTFeFJV5cnPPJ3dFoFDK
 y/juZJUQ1AOtvuWzzwuf175ewkv9vfmtG6rVohpNSkUlJYeoc6tw7n8BTTzCVC1b
 c/4Dnrjeycr6cskYlzaPyV6MSgjSv5gfyj1LA5UEM16LDyekmaynosVWY5wJhju+
 Bnyid8l8Utgz+TLLYogfQJQECCrsU0Wm//n+8TWQgLf1uuiwshU5JJe7b43diJrY
 +2DX+8p9yWXCTz62sCeDWNahUv8AbXpMeJ8uqZPYcN1P0gSEUGu8xKmLOFf9kR7b
 M4U1Gyz8QQbjd2lqnwiWIkvRLX6gyGVbq2zH0QbhUe5gg3qGUX7JjrhdDQ==
 =AXUi
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "This is a large update by KVM standards, including AMD PSP (Platform
  Security Processor, aka "AMD Secure Technology") and ARM CoreSight
  (debug and trace) changes.

  ARM:

   - CoreSight: Add support for ETE and TRBE

   - Stage-2 isolation for the host kernel when running in protected
     mode

   - Guest SVE support when running in nVHE mode

   - Force W^X hypervisor mappings in nVHE mode

   - ITS save/restore for guests using direct injection with GICv4.1

   - nVHE panics now produce readable backtraces

   - Guest support for PTP using the ptp_kvm driver

   - Performance improvements in the S2 fault handler

  x86:

   - AMD PSP driver changes

   - Optimizations and cleanup of nested SVM code

   - AMD: Support for virtual SPEC_CTRL

   - Optimizations of the new MMU code: fast invalidation, zap under
     read lock, enable/disably dirty page logging under read lock

   - /dev/kvm API for AMD SEV live migration (guest API coming soon)

   - support SEV virtual machines sharing the same encryption context

   - support SGX in virtual machines

   - add a few more statistics

   - improved directed yield heuristics

   - Lots and lots of cleanups

  Generic:

   - Rework of MMU notifier interface, simplifying and optimizing the
     architecture-specific code

   - a handful of "Get rid of oprofile leftovers" patches

   - Some selftests improvements"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (379 commits)
  KVM: selftests: Speed up set_memory_region_test
  selftests: kvm: Fix the check of return value
  KVM: x86: Take advantage of kvm_arch_dy_has_pending_interrupt()
  KVM: SVM: Skip SEV cache flush if no ASIDs have been used
  KVM: SVM: Remove an unnecessary prototype declaration of sev_flush_asids()
  KVM: SVM: Drop redundant svm_sev_enabled() helper
  KVM: SVM: Move SEV VMCB tracking allocation to sev.c
  KVM: SVM: Explicitly check max SEV ASID during sev_hardware_setup()
  KVM: SVM: Unconditionally invoke sev_hardware_teardown()
  KVM: SVM: Enable SEV/SEV-ES functionality by default (when supported)
  KVM: SVM: Condition sev_enabled and sev_es_enabled on CONFIG_KVM_AMD_SEV=y
  KVM: SVM: Append "_enabled" to module-scoped SEV/SEV-ES control variables
  KVM: SEV: Mask CPUID[0x8000001F].eax according to supported features
  KVM: SVM: Move SEV module params/variables to sev.c
  KVM: SVM: Disable SEV/SEV-ES if NPT is disabled
  KVM: SVM: Free sev_asid_bitmap during init if SEV setup fails
  KVM: SVM: Zero out the VMCB array used to track SEV ASID association
  x86/sev: Drop redundant and potentially misleading 'sev_enabled'
  KVM: x86: Move reverse CPUID helpers to separate header file
  KVM: x86: Rename GPR accessors to make mode-aware variants the defaults
  ...
2021-05-01 10:14:08 -07:00
Sven Schnelle
bae1cd368c s390/entry: add support for syscall stack randomization
This adds support for adding a random offset to the stack while handling
syscalls. The patch uses get_tod_clock_fast() as this is considered good
enough and has much less performance penalty compared to using
get_random_int(). The patch also adds randomization in pgm_check_handler()
as the sigreturn/rt_sigreturn system calls might be called from there.

Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Link: https://lore.kernel.org/r/20210429091451.1062594-1-svens@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-04-30 17:20:39 +02:00
Thomas Richter
b0583ab477 s390/cpumf: remove call to perf_event_update_userpage
The function cpumf_pmu_add and cpumf_pmu_del call function
perf_event_update_userpage(). This calls is obsolete, the calls add and
delete a counter event. Counter events do not sample data and the
event->rb member to access the sampling ring buffer is always NULL.
The function perf_event_update_userpage() simply returns in this case.

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by : Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-04-30 17:17:01 +02:00
Thomas Richter
1eefa4f439 s390/cpumf: move counter set size calculation to common place
The function to calculate the size of counter sets is renamed from
cf_diag_ctrset_size() to cpum_cf_ctrset_size() and moved to the file
containing common functions for the CPU Measurement Counter Facility.
No functional change.

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by : Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-04-30 17:17:00 +02:00
Thomas Richter
0cceeab5a3 s390/cpumf: beautify if-then-else indentation
Beautify if-then-else indentation to match coding guideline.
Also use shorter pointer notation hwc instead of event->hw.

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by : Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-04-30 17:17:00 +02:00
Linus Torvalds
767fcbc80f \n
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAmCJU1UACgkQnJ2qBz9k
 QNk62AgAgp05OIXU/AgObb7DvSyI3ycwCV8PeWBpwD8yoDAh5x0tmT7vnJu974p6
 yHdnF7rr69ZzvbNCHLJ5kRykRlUao9W7cO5fdOW1uTpL7Ic60QuJMks/NfgVTHp1
 2zIQmBDerfn1/LTK8r2pPGcvtcjRcr7Ep4beN0Duw57lfVMJhjsNRPnBbXGBcp0r
 QzKk4/8V3DCZvOw+XNC3nto7avjvf+nU9sJmuh83546eqh0atjWivvO5aAlDOe6W
 rhBiLlmP0in5u2n1fYqzI1OQvtgtleyEZT2G0CrbAZn0xjmV/if9wl+3K6TOwDvR
 778xDEX7sZCaO/xkB+WK3hrd15ftKg==
 =0kYE
 -----END PGP SIGNATURE-----

Merge tag 'for_v5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs

Pull quota, ext2, reiserfs updates from Jan Kara:

 - support for path (instead of device) based quotactl syscall
   (quotactl_path(2))

 - ext2 conversion to kmap_local()

 - other minor cleanups & fixes

* tag 'for_v5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  fs/reiserfs/journal.c: delete useless variables
  fs/ext2: Replace kmap() with kmap_local_page()
  ext2: Match up ext2_put_page() with ext2_dotdot() and ext2_find_entry()
  fs/ext2/: fix misspellings using codespell tool
  quota: report warning limits for realtime space quotas
  quota: wire up quotactl_path
  quota: Add mountpath based quota support
2021-04-29 10:51:29 -07:00
Linus Torvalds
6daa755f81 s390 updates for 5.13 merge window
- fix buffer size for in-kernel disassembler for ebpf programs.
 
 - fix two memory leaks in zcrypt driver.
 
 - expose PCI device UID as index, including an indicator if the uid is
   unique.
 
 - remove some oprofile leftovers.
 
 - improve stack unwinder tests.
 
 - don't use gcc atomic builtins anymore, just like all other
   architectures. Even though I'm sure the current code is ok, I
   totally dislike that s390 is the only architecture being special
   here; especially considering that there was a lengthly discussion
   about this topic and the outcome was not to use the builtins.
   Therefore open-code atomic ops again with inline assembly and switch
   to gcc builtins as soon as other architectures are doing.
 
 - couple of other changes to atomic and cmpxchg, and use
   atomic-instrumented.h for KASAN.
 
 - separate zbus creation, registration, and scanning in our PCI code
   which allows for cleaner and easier handling.
 
 - a rather large change to the vfio-ap code to fix circular locking
   dependencies when updating crypto masks.
 
 - move QAOB handling from qdio layer down to drivers.
 
 - add CRW inject facility to common I/O layer. This adds debugs files
   which allow to generate artificial events from user space for
   testing purposes.
 
 - increase SCLP console line length from 80 to 320 characters to avoid
   odd wrapped lines.
 
 - add protected virtualization guest and host indication files, which
   indicate either that a guest is running in pv mode or if the
   hypervisor is capable of starting pv guests.
 
 - various other small fixes and improvements all over the place.
 -----BEGIN PGP SIGNATURE-----
 
 iQIyBAABCAAdFiEECMNfWEw3SLnmiLkZIg7DeRspbsIFAmCICNwACgkQIg7DeRsp
 bsJgSQ/0Cn3KTymF8SJ7tXLpYBNHmZBL1sQ284pPNYmqluwephUaX643f/48RR5y
 rlbaewyHCbYyR+gwkUykuUQm/d+iwihip/uFmyEktr9JutOOS1RQKd8ujeyE2BUb
 aaDyE0J5VFYdd/ZA92n9FhkuNDOMZRFAK1SOfifT9jNWCl8iYz+pXV1Gx7LbJYVY
 KWfJ9D/zgzLoOTWhj4jWu8LutfLEqK+hq5nqxBII8APCV/QDYnjkwpwW01LoMtOv
 eHhtSz0JboFRk0FYf8oyR7AXQBz76+Ru3aivJNL7sr1S3N2yMSzNQbk/ATVBLER9
 VMQX2TfGGT/Ln3P4rYEoP2vGITRn765wg4KWNB2u3pY2try12G39fmjzOwVfbxQw
 BDAcLwxU7Tw0vJY+yI6ZWkPDXcs+uAWQwNiYoMtfUPfMYLEpLFffbWGwdZPKZRrH
 fy4e5ZFuavZsZr8Zeu4WYILJZoDnvhbs59gPzaLBPKosR0ZGNi8q6bnztnqnrYhi
 Oirt6aPOOyEoN/IT2bO1sDhIzIpCorIwCMZTRIQzerFRcjJgy0xHM9MRYRLMj6iW
 xltgWNt01SbTm6pbimwMUnjN5AdMTbHlTSzD8G34eWO21cVgHGpndbcT/M3uemyy
 Wf034Er2eFqlhXyhiAYTnNhVGbd+YMido7eo/CbGvqCxNvmKbA==
 =e+mh
 -----END PGP SIGNATURE-----

Merge tag 's390-5.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 updates from Heiko Carstens:

 - fix buffer size for in-kernel disassembler for ebpf programs.

 - fix two memory leaks in zcrypt driver.

 - expose PCI device UID as index, including an indicator if the uid is
   unique.

 - remove some oprofile leftovers.

 - improve stack unwinder tests.

 - don't use gcc atomic builtins anymore, just like all other
   architectures. Even though I'm sure the current code is ok, I totally
   dislike that s390 is the only architecture being special here;
   especially considering that there was a lengthly discussion about
   this topic and the outcome was not to use the builtins. Therefore
   open-code atomic ops again with inline assembly and switch to gcc
   builtins as soon as other architectures are doing.

 - couple of other changes to atomic and cmpxchg, and use
   atomic-instrumented.h for KASAN.

 - separate zbus creation, registration, and scanning in our PCI code
   which allows for cleaner and easier handling.

 - a rather large change to the vfio-ap code to fix circular locking
   dependencies when updating crypto masks.

 - move QAOB handling from qdio layer down to drivers.

 - add CRW inject facility to common I/O layer. This adds debugs files
   which allow to generate artificial events from user space for testing
   purposes.

 - increase SCLP console line length from 80 to 320 characters to avoid
   odd wrapped lines.

 - add protected virtualization guest and host indication files, which
   indicate either that a guest is running in pv mode or if the
   hypervisor is capable of starting pv guests.

 - various other small fixes and improvements all over the place.

* tag 's390-5.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (53 commits)
  s390/disassembler: increase ebpf disasm buffer size
  s390/archrandom: add parameter check for s390_arch_random_generate
  s390/zcrypt: fix zcard and zqueue hot-unplug memleak
  s390/pci: expose a PCI device's UID as its index
  s390/atomic,cmpxchg: always inline __xchg/__cmpxchg
  s390/smp: fix do_restart() prototype
  s390: get rid of oprofile leftovers
  s390/atomic,cmpxchg: make constraints work with old compilers
  s390/test_unwind: print test suite start/end info
  s390/cmpxchg: use unsigned long values instead of void pointers
  s390/test_unwind: add WARN if tests failed
  s390/test_unwind: unify error handling paths
  s390: update defconfigs
  s390/spinlock: use R constraint in inline assembly
  s390/atomic,cmpxchg: switch to use atomic-instrumented.h
  s390/cmpxchg: get rid of gcc atomic builtins
  s390/atomic: get rid of gcc atomic builtins
  s390/atomic: use proper constraints
  s390/atomic: move remaining inline assemblies to atomic_ops.h
  s390/bitops: make bitops only work on longs
  ...
2021-04-27 17:54:15 -07:00
Linus Torvalds
ea5bc7b977 Trivial cleanups and fixes all over the place.
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmCGmYIACgkQEsHwGGHe
 VUr45w/8CSXr7MXaFBj4To0hTWJXSZyF6YGqlZOSJXFcFh4cWTNwfVOoFaV47aDo
 +HsCNTkGENcKhLrDUWDRiG/Uo46jxtOtl1vhq7U4pGemSYH871XWOKfb5k5XNMwn
 /uhaHMI4aEfd6bUFnF518NeyRIsD0BdqFj4tB7RbAiyFwdETDX9Tkj/uBKnQ4zon
 4tEDoXgThuK5YKK9zVQg5pa7aFp2zg1CAdX/WzBkS8BHVBPXSV0CF97AJYQOM/V+
 lUHv+BN3wp97GYHPQMPsbkNr8IuFoe2mIvikwjxg8iOFpzEU1G1u09XV9R+PXByX
 LclFTRqK/2uU5hJlcsBiKfUuidyErYMRYImbMAOREt2w0ogWVu2zQ7HkjVve25h1
 sQPwPudbAt6STbqRxvpmB3yoV4TCYwnF91FcWgEy+rcEK2BDsHCnScA45TsK5I1C
 kGR1K17pHXprgMZFPveH+LgxewB6smDv+HllxQdSG67LhMJXcs2Epz0TsN8VsXw8
 dlD3lGReK+5qy9FTgO7mY0xhiXGz1IbEdAPU4eRBgih13puu03+jqgMaMabvBWKD
 wax+BWJUrPtetwD5fBPhlS/XdJDnd8Mkv2xsf//+wT0s4p+g++l1APYxeB8QEehm
 Pd7Mvxm4GvQkfE13QEVIPYQRIXCMH/e9qixtY5SHUZDBVkUyFM0=
 =bO1i
 -----END PGP SIGNATURE-----

Merge tag 'x86_cleanups_for_v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull misc x86 cleanups from Borislav Petkov:
 "Trivial cleanups and fixes all over the place"

* tag 'x86_cleanups_for_v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  MAINTAINERS: Remove me from IDE/ATAPI section
  x86/pat: Do not compile stubbed functions when X86_PAT is off
  x86/asm: Ensure asm/proto.h can be included stand-alone
  x86/platform/intel/quark: Fix incorrect kernel-doc comment syntax in files
  x86/msr: Make locally used functions static
  x86/cacheinfo: Remove unneeded dead-store initialization
  x86/process/64: Move cpu_current_top_of_stack out of TSS
  tools/turbostat: Unmark non-kernel-doc comment
  x86/syscalls: Fix -Wmissing-prototypes warnings from COND_SYSCALL()
  x86/fpu/math-emu: Fix function cast warning
  x86/msr: Fix wr/rdmsr_safe_regs_on_cpu() prototypes
  x86: Fix various typos in comments, take #2
  x86: Remove unusual Unicode characters from comments
  x86/kaslr: Return boolean values from a function returning bool
  x86: Fix various typos in comments
  x86/setup: Remove unused RESERVE_BRK_ARRAY()
  stacktrace: Move documentation for arch_stack_walk_reliable() to header
  x86: Remove duplicate TSC DEADLINE MSR definitions
2021-04-26 09:25:47 -07:00
Paolo Bonzini
c4f71901d5 KVM/arm64 updates for Linux 5.13
New features:
 
 - Stage-2 isolation for the host kernel when running in protected mode
 - Guest SVE support when running in nVHE mode
 - Force W^X hypervisor mappings in nVHE mode
 - ITS save/restore for guests using direct injection with GICv4.1
 - nVHE panics now produce readable backtraces
 - Guest support for PTP using the ptp_kvm driver
 - Performance improvements in the S2 fault handler
 - Alexandru is now a reviewer (not really a new feature...)
 
 Fixes:
 - Proper emulation of the GICR_TYPER register
 - Handle the complete set of relocation in the nVHE EL2 object
 - Get rid of the oprofile dependency in the PMU code (and of the
   oprofile body parts at the same time)
 - Debug and SPE fixes
 - Fix vcpu reset
 -----BEGIN PGP SIGNATURE-----
 
 iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmCCpuAPHG1hekBrZXJu
 ZWwub3JnAAoJECPQ0LrRPXpD2G8QALWQYeBggKnNmAJfuihzZ2WariBmgcENs2R2
 qNZ/Py6dIF+b69P68nmgrEV1x2Kp35cPJbBwXnnrS4FCB5tk0b8YMaj00QbiRIYV
 UXbPxQTmYO1KbevpoEcw8NmR4bZJ/hRYPuzcQG7CCMKIZw0zj2cMcBofzQpTOAp/
 CgItdcv7at3iwamQatfU9vUmC0nDdnjdIwSxTAJOYMVV1ENwtnYSNgZVo4XLTg7n
 xR/5Qx27PKBJw7GyTRAIIxKAzNXG2tDL+GVIHe4AnRp3z3La8sr6PJf7nz9MCmco
 ISgeY7EGQINzmm4LahpnV+2xwwxOWo8QotxRFGNuRTOBazfARyAbp97yJ6eXJUpa
 j0qlg3xK9neyIIn9BQKkKx4sY9V45yqkuVDsK6odmqPq3EE01IMTRh1N/XQi+sTF
 iGrlM3ZW4AjlT5zgtT9US/FRXeDKoYuqVCObJeXZdm3sJSwEqTAs0JScnc0YTsh7
 m30CODnomfR2y5X6GoaubbQ0wcZ2I20K1qtIm+2F6yzD5P1/3Yi8HbXMxsSWyYWZ
 1ldoSa+ZUQlzV9Ot0S3iJ4PkphLKmmO96VlxE2+B5gQG50PZkLzsr8bVyYOuJC8p
 T83xT9xd07cy+FcGgF9veZL99Y6BLHMa6ZwFUolYNbzJxqrmqyR1aiJMEBIcX+aP
 ACeKW1w5
 =fpey
 -----END PGP SIGNATURE-----

Merge tag 'kvmarm-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 updates for Linux 5.13

New features:

- Stage-2 isolation for the host kernel when running in protected mode
- Guest SVE support when running in nVHE mode
- Force W^X hypervisor mappings in nVHE mode
- ITS save/restore for guests using direct injection with GICv4.1
- nVHE panics now produce readable backtraces
- Guest support for PTP using the ptp_kvm driver
- Performance improvements in the S2 fault handler
- Alexandru is now a reviewer (not really a new feature...)

Fixes:
- Proper emulation of the GICR_TYPER register
- Handle the complete set of relocation in the nVHE EL2 object
- Get rid of the oprofile dependency in the PMU code (and of the
  oprofile body parts at the same time)
- Debug and SPE fixes
- Fix vcpu reset
2021-04-23 07:41:17 -04:00
Mickaël Salaün
a49f4f81cb arch: Wire up Landlock syscalls
Wire up the following system calls for all architectures:
* landlock_create_ruleset(2)
* landlock_add_rule(2)
* landlock_restrict_self(2)

Cc: Arnd Bergmann <arnd@arndb.de>
Cc: James Morris <jmorris@namei.org>
Cc: Jann Horn <jannh@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Serge E. Hallyn <serge@hallyn.com>
Signed-off-by: Mickaël Salaün <mic@linux.microsoft.com>
Link: https://lore.kernel.org/r/20210422154123.13086-10-mic@digikod.net
Signed-off-by: James Morris <jamorris@linux.microsoft.com>
2021-04-22 12:22:11 -07:00
Paolo Bonzini
fd49e8ee70 Merge branch 'kvm-sev-cgroup' into HEAD 2021-04-22 13:19:01 -04:00
Marc Zyngier
8c3f7913a1 s390: Get rid of oprofile leftovers
perf_pmu_name() and perf_num_counters() are unused. Drop them.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Link: https://lore.kernel.org/r/20210414134409.1266357-4-maz@kernel.org
2021-04-22 13:32:39 +01:00
Vasily Gorbik
6f3353c2d2 s390/disassembler: increase ebpf disasm buffer size
Current ebpf disassembly buffer size of 64 is too small. E.g. this line
takes 65 bytes:
01fffff8005822e: ec8100ed8065\tclgrj\t%r8,%r1,8,001fffff80058408\n\0

Double the buffer size like it is done for the kernel disassembly buffer.

Fixes the following KASAN finding:

UG: KASAN: stack-out-of-bounds in print_fn_code+0x34c/0x380
Write of size 1 at addr 001fff800ad5f970 by task test_progs/853

CPU: 53 PID: 853 Comm: test_progs Not tainted
5.12.0-rc7-23786-g23457d86b1f0-dirty #19
Hardware name: IBM 3906 M04 704 (LPAR)
Call Trace:
 [<0000000cd8e0538a>] show_stack+0x17a/0x1668
 [<0000000cd8e2a5d8>] dump_stack+0x140/0x1b8
 [<0000000cd8e16e74>] print_address_description.constprop.0+0x54/0x260
 [<0000000cd75a8698>] kasan_report+0xc8/0x130
 [<0000000cd6e26da4>] print_fn_code+0x34c/0x380
 [<0000000cd6ea0f4e>] bpf_int_jit_compile+0xe3e/0xe58
 [<0000000cd72c4c88>] bpf_prog_select_runtime+0x5b8/0x9c0
 [<0000000cd72d1bf8>] bpf_prog_load+0xa78/0x19c0
 [<0000000cd72d7ad6>] __do_sys_bpf.part.0+0x18e/0x768
 [<0000000cd6e0f392>] do_syscall+0x12a/0x220
 [<0000000cd8e333f8>] __do_syscall+0x98/0xc8
 [<0000000cd8e54834>] system_call+0x6c/0x94
1 lock held by test_progs/853:
 #0: 0000000cd9bf7460 (report_lock){....}-{2:2}, at:
     kasan_report+0x96/0x130

addr 001fff800ad5f970 is located in stack of task test_progs/853 at
offset 96 in frame:
 print_fn_code+0x0/0x380
this frame has 1 object:
 [32, 96) 'buffer'

Memory state around the buggy address:
 001fff800ad5f800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 001fff800ad5f880: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>001fff800ad5f900: 00 00 f1 f1 f1 f1 00 00 00 00 00 00 00 00 f3 f3
                                                             ^
 001fff800ad5f980: f3 f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 001fff800ad5fa00: 00 00 00 00 00 00 00 f1 f1 f1 f1 00 00 00 00 00

Cc: <stable@vger.kernel.org>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-04-21 12:32:12 +02:00
Alexander Gordeev
b44913fceb s390/smp: fix do_restart() prototype
Funciton do_restart() is a callback invoked from the
restart CPU routine and passed a single parameter.

Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-04-18 21:32:02 +02:00
Marc Zyngier
ff23f8c970 s390: get rid of oprofile leftovers
perf_pmu_name() and perf_num_counters() are unused. Drop them.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210414134409.1266357-4-maz@kernel.org
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-04-18 21:32:02 +02:00
Paolo Bonzini
6c377b02a8 KVM: s390: Updates for 5.13
- properly handle MVPG in nesting KVM (vsie)
 - allow to forward the yield_to hypercall (diagnose 9c)
 - fixes
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJgdFvgAAoJEBF7vIC1phx8+IIP/0OdF4I5VqBJ1C9Roc3l4P+4
 b95OZX4nBLQ0L1JnPMeJqNo3V6JH/5356dwpIplQXv5wraS3+sQGX2D1xW00QnLE
 M6L3368uT30JmEVWnnrulUdLWwUqExJ17BEX9p4rmJQAm+7rLOJsVsWIKwclupyR
 BacDMG2q5aG+/eaceimBdEPyfE6YHJzbtD9BEBe12/Y+B0PyCyinAOiGALcugDkY
 kSqdqBcHFqXJuF37DsQn2gSlBFGByfvWlaYa0dKhdGFp4ps3TDhmC+qyoBAjHJFu
 nzTNOFdjgMlatUe92OsgwqilV0OUgdNZ+deKSyGHdmht+RknuLsJU0LqCvN66cTA
 H58D5s3PrM8868e/bflX47Lt0fbJSA7ZXZqJuyP84tEqTgQmAH43VvQg8t9bybTp
 dY2UUx19ZHpktVjL+FIylUcxyLXFSX8KTI0a/JxlMUUjE+NAaB22iCyBMMIoogSj
 ozqKGq7VwPJftoxLiUaGEUL4NyXlo7+XivZNTHFIjh0sjDZooH9IZ9LK/17684ra
 GLCAnw2hhB4xegNPuJWawo/vNJ5dAtiKVQ6Hwgr6ORaCEBLGtIlyYhm1XYAwb7f4
 vAfQ60lqbL1dpGtKnf4cMySrgNczotura4KPreXkDJ68eqNJCjbDUVnN+0XsBIC8
 7+SaOJRmJRd0VzeEPBg3
 =8wV0
 -----END PGP SIGNATURE-----

Merge tag 'kvm-s390-next-5.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD

KVM: s390: Updates for 5.13

- properly handle MVPG in nesting KVM (vsie)
- allow to forward the yield_to hypercall (diagnose 9c)
- fixes
2021-04-15 13:02:13 -04:00
Heiko Carstens
17a363dcd2 s390/traps,mm: add conditional trap handlers
Add conditional trap handlers similar to conditional system calls
(COND_SYSCALL), to reduce the number of ifdefs.

Trap handlers which may or may not exist depending on config options
are supposed to have a COND_TRAP entry, which redirects to
default_trap_handler() for non-existent trap handlers during link
time.

This allows to get rid of the secure execution trap handlers for the
!PGSTE case.

Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-04-12 12:46:42 +02:00
Heiko Carstens
6f8daa2953 s390/traps: convert pgm_check.S to C
Convert the program check table to C. Which allows to get rid of yet
another assembler file, and also enables proper type checking for the
table.

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-04-12 12:46:41 +02:00
zhongbaisong
644975179c s390/protvirt: fix error return code in uv_info_init()
Fix to return a negative error code from the error handling
case instead of 0, as done elsewhere in this function.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Baisong Zhong <zhongbaisong@huawei.com>
Fixes: 37564ed834 ("s390/uv: add prot virt guest/host indication files")
Link: https://lore.kernel.org/r/2f7d62a4-3e75-b2b4-951b-75ef8ef59d16@huawei.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-04-12 12:46:41 +02:00
Heiko Carstens
0ee3f73914 Merge branch 'fixes' into features
* fixes:
  s390/entry: save the caller of psw_idle
  s390/entry: avoid setting up backchain in ext|io handlers
  s390/setup: use memblock_free_late() to free old stack
  s390/irq: fix reading of ext_params2 field from lowcore
  s390/unwind: add machine check handler stack
  s390/cpcmd: fix inline assembly register clobbering
  MAINTAINERS: add backups for s390 vfio drivers
  s390/vdso: fix initializing and updating of vdso_data
  s390/vdso: fix tod_steering_delta type
  s390/vdso: copy tod_steering_delta value to vdso_data page

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-04-12 12:45:38 +02:00
Vasily Gorbik
a994eddb94 s390/entry: save the caller of psw_idle
Currently psw_idle does not allocate a stack frame and does not
save its r14 and r15 into the save area. Even though this is valid from
call ABI point of view, because psw_idle does not make any calls
explicitly, in reality psw_idle is an entry point for controlled
transition into serving interrupts. So, in practice, psw_idle stack
frame is analyzed during stack unwinding. Depending on build options
that r14 slot in the save area of psw_idle might either contain a value
saved by previous sibling call or complete garbage.

  [task    0000038000003c28] do_ext_irq+0xd6/0x160
  [task    0000038000003c78] ext_int_handler+0xba/0xe8
  [task   *0000038000003dd8] psw_idle_exit+0x0/0x8 <-- pt_regs
 ([task    0000038000003dd8] 0x0)
  [task    0000038000003e10] default_idle_call+0x42/0x148
  [task    0000038000003e30] do_idle+0xce/0x160
  [task    0000038000003e70] cpu_startup_entry+0x36/0x40
  [task    0000038000003ea0] arch_call_rest_init+0x76/0x80

So, to make a stacktrace nicer and actually point for the real caller of
psw_idle in this frequently occurring case, make psw_idle save its r14.

  [task    0000038000003c28] do_ext_irq+0xd6/0x160
  [task    0000038000003c78] ext_int_handler+0xba/0xe8
  [task   *0000038000003dd8] psw_idle_exit+0x0/0x6 <-- pt_regs
 ([task    0000038000003dd8] arch_cpu_idle+0x3c/0xd0)
  [task    0000038000003e10] default_idle_call+0x42/0x148
  [task    0000038000003e30] do_idle+0xce/0x160
  [task    0000038000003e70] cpu_startup_entry+0x36/0x40
  [task    0000038000003ea0] arch_call_rest_init+0x76/0x80

Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-04-12 12:44:31 +02:00
Vasily Gorbik
b74e409ea1 s390/entry: avoid setting up backchain in ext|io handlers
Currently when interrupt arrives to cpu while in kernel context
INT_HANDLER macro (used for ext_int_handler and io_int_handler)
allocates new stack frame and pt_regs on the kernel stack and
sets up the backchain to jump over the pt_regs to the frame which has
been interrupted. This is not ideal to two reasons:

1. This hides the fact that kernel stack contains interrupt frame in it
   and hence breaks arch_stack_walk_reliable(), which needs to know that to
   guarantee "reliability" and checks that there are no pt_regs on the way.

2. It breaks the backchain unwinder logic, which assumes that the next
   stack frame after an interrupt frame is reliable, while it is not.
   In some cases (when r14 contains garbage) this leads to early unwinding
   termination with an error, instead of marking frame as unreliable
   and continuing.

To address that, only set backchain to 0.

Fixes: 56e62a7370 ("s390: convert to generic entry")
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-04-12 12:44:30 +02:00
Heiko Carstens
ad31a8c051 s390/setup: use memblock_free_late() to free old stack
Use memblock_free_late() to free the old machine check stack to the
buddy allocator instead of leaking it.

Fixes: b61b159512 ("s390: add stack for machine check handler")
Cc: Vasily Gorbik <gor@linux.ibm.com>
Acked-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-04-07 14:37:28 +02:00
Heiko Carstens
85012e764d s390/irq: fix reading of ext_params2 field from lowcore
The contents of the ext_params2 field of the lowcore should just be
copied to the pt_regs structure, not dereferenced.

Fixes crashes / program check loops like this:

Krnl PSW : 0404c00180000000 00000000d6d02b3c (do_ext_irq+0x74/0x170)
           R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
Krnl GPRS: 0000000000000000 80000000000b974e 00000000d71abee0 00000000d71abee0
           0000000080030000 000000000000000f 0000000000000000 0000000000000000
           0000000000000001 00000380000bf918 00000000d73ef780 00000380000bf518
           0000000080348000 00000000d6d13350 00000000d6d02b1e 00000380000bf428
Krnl Code: 00000000d6d02b2e: 58100080            l       %r1,128
           00000000d6d02b32: 5010b0a4            st      %r1,164(%r11)
          #00000000d6d02b36: e31001b80104        lg      %r1,4536
          >00000000d6d02b3c: e31010000004        lg      %r1,0(%r1)
           00000000d6d02b42: e310b0a80024        stg     %r1,168(%r11)
           00000000d6d02b48: c01000242270        larl    %r1,00000000d7187028
           00000000d6d02b4e: d5071000b010        clc     0(8,%r1),16(%r11)
           00000000d6d02b54: a784001b            brc     8,00000000d6d02b8a
Call Trace:
 [<00000000d6d02b3c>] do_ext_irq+0x74/0x170
 [<00000000d6d0ea5c>] ext_int_handler+0xc4/0xf4
 [<00000000d621d266>] die+0x106/0x188
 [<00000000d62305b8>] do_no_context+0xc8/0x100
 [<00000000d6d02790>] __do_pgm_check+0xe0/0x1f0
 [<00000000d6d0e950>] pgm_check_handler+0x118/0x160
 [<00000000d6d02b3c>] do_ext_irq+0x74/0x170
 [<00000000d6d0ea5c>] ext_int_handler+0xc4/0xf4
 [<00000000d621d266>] die+0x106/0x188
 [<00000000d62305b8>] do_no_context+0xc8/0x100
 [<00000000d6d02790>] __do_pgm_check+0xe0/0x1f0
 [<00000000d6d0e950>] pgm_check_handler+0x118/0x160
 [<00000000d6d02b3c>] do_ext_irq+0x74/0x170
 [<00000000d6d0ea5c>] ext_int_handler+0xc4/0xf4
 [<0000000000000000>] 0x0
 [<00000000d6d0e57a>] default_idle_call+0x42/0x110
 [<00000000d629856e>] do_idle+0xce/0x160
 [<00000000d62987be>] cpu_startup_entry+0x36/0x40
 [<00000000d621f2f2>] smp_start_secondary+0x82/0x88

Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Fixes: 56e62a7370 ("s390: convert to generic entry")
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-04-05 11:30:07 +02:00
Vasily Gorbik
08edb9683e s390/unwind: add machine check handler stack
Fixes: b61b159512 ("s390: add stack for machine check handler")
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-04-05 11:30:07 +02:00
Alexander Gordeev
7a2f91441b s390/cpcmd: fix inline assembly register clobbering
Register variables initialized using arithmetic. That leads to
kasan instrumentaton code corrupting the registers contents.
Follow GCC guidlines and use temporary variables for assigning
init values to register variables.

Fixes: 94c12cc7d1 ("[S390] Inline assembly cleanup.")
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
Link: https://gcc.gnu.org/onlinedocs/gcc-10.2.0/gcc/Local-Register-Variables.html
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-04-05 11:30:07 +02:00
Heiko Carstens
5b43bd1845 s390/vdso: fix initializing and updating of vdso_data
Li Wang reported that clock_gettime(CLOCK_MONOTONIC_RAW, ...) returns
incorrect values when time is provided via vdso instead of system call:

vdso_ts_nsec = 4484351380985507, vdso_ts.tv_sec = 4484351, vdso_ts.tv_nsec = 380985507
sys_ts_nsec  = 1446923235377, sys_ts.tv_sec  = 1446, sys_ts.tv_nsec  = 923235377

Within the s390 specific vdso function __arch_get_hw_counter() reads
tod clock steering values from the arch_data member of the passed in
vdso_data structure.

Problem is that only for the CS_HRES_COARSE vdso_data arch_data is
initialized and gets updated. The CS_RAW specific vdso_data does not
contain any valid tod_clock_steering information, which explains the
different values.

Fix this by initializing and updating all vdso_datas.

Reported-by: Li Wang <liwang@redhat.com>
Tested-by: Li Wang <liwang@redhat.com>
Fixes: 1ba2d6c0fd ("s390/vdso: simplify __arch_get_hw_counter()")
Link: https://lore.kernel.org/linux-s390/YFnxr1ZlMIOIqjfq@osiris
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-03-25 21:57:26 +01:00
Heiko Carstens
72bbc226ed s390/vdso: copy tod_steering_delta value to vdso_data page
When converting the vdso assembler code to C it was forgotten to
actually copy the tod_steering_delta value to vdso_data page.

Which in turn means that tod clock steering will not work correctly.

Fix this by simply copying the value whenever it is updated.

Fixes: 4bff8cb545 ("s390: convert to GENERIC_VDSO")
Cc: <stable@vger.kernel.org> # 5.10
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-03-25 21:57:25 +01:00
Janosch Frank
df2e400e07 s390/uv: fix prot virt host indication compilation
prot_virt_host is only available if CONFIG_KVM is enabled. So lets use
a variable initialized to zero and overwrite it when that config
option is set with prot_virt_host.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Fixes: 37564ed834 ("s390/uv: add prot virt guest/host indication files")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-03-24 16:06:19 +01:00
Bhaskar Chowdhury
5671d9718f s390/kernel: fix a typo
s/struture/structure/

Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Link: https://lore.kernel.org/r/20210322062500.3109603-1-unixbhaskar@gmail.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-03-22 11:36:05 +01:00
Janosch Frank
37564ed834 s390/uv: add prot virt guest/host indication files
Let's export the prot_virt_guest and prot_virt_host variables into the
UV sysfs firmware interface to make them easily consumable by
administrators.

prot_virt_host being 1 indicates that we did the UV
initialization (opt-in)

prot_virt_guest being 1 indicates that the UV indicates the share and
unshare ultravisor calls which is an indication that we are running as
a protected guest.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-03-22 11:36:03 +01:00
Ingo Molnar
ca8778c45e Merge branch 'linus' into x86/cleanups, to resolve conflict
Conflicts:
	arch/x86/kernel/kprobes/ftrace.c

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2021-03-21 22:16:08 +01:00
Ingo Molnar
14ff3ed86e Linux 5.12-rc3
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmBOgu4eHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGUd0H/3Ey8aWjVAig9Pe+
 VQVZKwG+LXWH6UmUx5qyaTxophhmGnWLvkigJMn63qIg4eQtfp2gNFHK+T4OJNIP
 ybnkjFZ337x4J9zD6m8mt4Wmelq9iW2wNOS+3YZAyYiGlXfMGM7SlYRCQRQznTED
 2O/JCMsOoP+Z8tr5ah/bzs0dANsXmTZ3QqRP2uzb6irKTgFR3/weOhj+Ht1oJ4Aq
 V+bgdcwhtk20hJhlvVeqws+o74LR789tTDCknlz/YNMv9e6VPfyIQ5vJAcFmZATE
 Ezj9yzkZ4IU+Ux6ikAyaFyBU8d1a4Wqye3eHCZBsEo6tcSAhbTZ90eoU86vh6ajS
 LZjwkNw=
 =6y1u
 -----END PGP SIGNATURE-----

Merge tag 'v5.12-rc3' into x86/cleanups, to refresh the tree

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2021-03-18 15:27:03 +01:00
Sascha Hauer
fa8b90070a quota: wire up quotactl_path
Wire up the quotactl_path syscall added in the previous patch.

Link: https://lore.kernel.org/r/20210304123541.30749-3-s.hauer@pengutronix.de
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
2021-03-17 15:51:17 +01:00
Gerald Schaefer
d54cb7d548 s390/vtime: fix increased steal time accounting
Commit 152e9b8676 ("s390/vtime: steal time exponential moving average")
inadvertently changed the input value for account_steal_time() from
"cputime_to_nsecs(steal)" to just "steal", resulting in broken increased
steal time accounting.

Fix this by changing it back to "cputime_to_nsecs(steal)".

Fixes: 152e9b8676 ("s390/vtime: steal time exponential moving average")
Cc: <stable@vger.kernel.org> # 5.1
Reported-by: Sabine Forkel <sabine.forkel@de.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-03-15 19:09:25 +01:00
Thomas Richter
c79f01b6eb s390/cpumf: disable preemption when accessing per-cpu variable
The following BUG message was triggered repeatedly when complete counter
sets are extracted from the CPUMF:

BUG: using smp_processor_id() in preemptible [00000000]
     code: psvc-readsets/7759
 caller is cf_diag_needspace+0x2c/0x100
 CPU: 7 PID: 7759 Comm: psvc-readsets Not tainted 5.12.0
 Hardware name: IBM 3906 M03 703 (LPAR)
 Call Trace:
  [<00000000c7043f78>] show_stack+0x90/0xf8
  [<00000000c705776a>] dump_stack+0xba/0x108
  [<00000000c705d91c>] check_preemption_disabled+0xec/0xf0
  [<00000000c63eb1c4>] cf_diag_needspace+0x2c/0x100
  [<00000000c63ecbcc>] cf_diag_ioctl_start+0x10c/0x240
  [<00000000c63ece9a>] cf_diag_ioctl+0x19a/0x238
  [<00000000c675f3f4>] __s390x_sys_ioctl+0xc4/0x100
  [<00000000c63ca762>] do_syscall+0x82/0xd0
  [<00000000c705bdd8>] __do_syscall+0xc0/0xd8
  [<00000000c706d532>] system_call+0x72/0x98
 2 locks held by psvc-readsets/7759:
  #0: 00000000c75a57c0 (cpu_hotplug_lock){++++}-{0:0},
      at: cf_diag_ioctl+0x44/0x238
  #1: 00000000c75a3078 (cf_diag_ctrset_mutex){+.+.}-{3:3},
	            at: cf_diag_ioctl+0x54/0x238

This issue is a missing get_cpu_ptr/put_cpu_ptr pair in function
cf_diag_needspace. Add it.

Fixes: cf6acb8bdb ("s390/cpumf: Add support for complete counter set extraction")
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-03-15 19:09:25 +01:00
Mark Brown
b18adee4ce stacktrace: Move documentation for arch_stack_walk_reliable() to header
Currently arch_stack_walk_reliable() is documented with an identical
comment in both x86 and S/390 implementations which is a bit redundant.
Move this to the header and convert to kerneldoc while we're at it.

Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Acked-by: Vasily Gorbik <gor@linux.ibm.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lkml.kernel.org/r/20210309194125.652-1-broonie@kernel.org
2021-03-10 15:52:31 +01:00
Pierre Morel
87e28a15c4 KVM: s390: diag9c (directed yield) forwarding
When we intercept a DIAG_9C from the guest we verify that the
target real CPU associated with the virtual CPU designated by
the guest is running and if not we forward the DIAG_9C to the
target real CPU.

To avoid a diag9c storm we allow a maximal rate of diag9c forwarding.

The rate is calculated as a count per second defined as a new
parameter of the s390 kvm module: diag9c_forwarding_hz .

The default value of 0 is to not forward diag9c.

Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
Link: https://lore.kernel.org/r/1613997661-22525-2-git-send-email-pmorel@linux.ibm.com
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2021-03-09 10:16:26 +01:00
Jiapeng Chong
1c0a9c7997 s390/cpumf: remove unneeded semicolon
Fix the following coccicheck warnings:

./arch/s390/kernel/perf_cpum_cf.c:272:2-3: Unneeded semicolon.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Link: https://lore.kernel.org/r/1614233736-87331-1-git-send-email-jiapeng.chong@linux.alibaba.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-03-08 10:46:29 +01:00
Thomas Richter
46b635b6ab s390/cpumf: rename header file to hwctrset.h
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Suggested-by: Hendrick Brueckner <brueckner@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-03-08 10:46:28 +01:00
Thomas Richter
c41b20de1a s390/cpumf: remove 60 seconds read limit
Remove the 60 seconds read interval limit. Do not impose any limit
at all and allow read of complete counter sets.

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-03-08 10:46:28 +01:00
Heiko Carstens
f9d8cbf33e s390/topology: remove always false if check
The cpumask being checked in cpu_group_map() must have at least one
cpu set; therefore remove the check.

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-03-08 10:46:27 +01:00
Heiko Carstens
eba8e1af5a s390/time,idle: get rid of unsigned long long
Get rid of unsigned long long, and use unsigned long instead
everywhere. The usage of unsigned long long is a leftover from
31 bit kernel support.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-03-08 10:46:27 +01:00
Linus Torvalds
5695e51619 io_uring-worker.v3-2021-02-25
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmA4JRkQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpoWqD/9dbbqe8L701U6May1A/4hRsqL4THTA2flx
 vNCNRBl6XV3l/wBCtL6waKy6tyO4lyM8XdUdEvo3Kxl2kGPb8eVfpyYL/+77HqyH
 ctT4RMrs+84Mxn+5N6cM97hS1qVI2moTxxyvOEl/JTB7BYrutz9gvAoeY3/Dto47
 J66oSaPeuqJ32TyihxfQHVxQopJcqFzDjyoYHGDu6ATio1PXfaIdTu8ywVYSECAh
 pWI4rwnqdurGuHMNpxyL1bA6CT/jC7s+sqU7bUYUCgtYI3eG0u3V0bp5gAQQIgl9
 5sxxE3DidYGAkYZsosrelshBtzGddLdz4Qrt2ungMYv8RsGNpFQ095jDPKDwFaZj
 bSvSsfplCo7iFsJByb1TtpNEOW8eAwi81PmBDVQ9Oq5P5ygTYno9GBDc/20ql0Fk
 q6wcX28coE3IBw44ne0hIwvBOtXV4WJyluG/gqOxfbTH+kOy3pDsN8lWcY/P4X0U
 yzdU2MLHe8BNMyYlUiBF47Amzt4ltr85P4XD3WZ4bX71iwri6HvrdGWLuuKwX+Ie
 66QiIDDQIYZQ6NMMJWS9DGW3y3DBizpSXGxONbOw1J2bQdNmtToR0D2UnK/9UnKp
 msnvkUNk8fkYGS4aptpJ6HxbmjMEG5YtbiGlPj6fz5/7MTvhRjPxt7A0LWrUIdqR
 f88+sHUMqg==
 =oc8u
 -----END PGP SIGNATURE-----

Merge tag 'io_uring-worker.v3-2021-02-25' of git://git.kernel.dk/linux-block

Pull io_uring thread rewrite from Jens Axboe:
 "This converts the io-wq workers to be forked off the tasks in question
  instead of being kernel threads that assume various bits of the
  original task identity.

  This kills > 400 lines of code from io_uring/io-wq, and it's the worst
  part of the code. We've had several bugs in this area, and the worry
  is always that we could be missing some pieces for file types doing
  unusual things (recent /dev/tty example comes to mind, userfaultfd
  reads installing file descriptors is another fun one... - both of
  which need special handling, and I bet it's not the last weird oddity
  we'll find).

  With these identical workers, we can have full confidence that we're
  never missing anything. That, in itself, is a huge win. Outside of
  that, it's also more efficient since we're not wasting space and code
  on tracking state, or switching between different states.

  I'm sure we're going to find little things to patch up after this
  series, but testing has been pretty thorough, from the usual
  regression suite to production. Any issue that may crop up should be
  manageable.

  There's also a nice series of further reductions we can do on top of
  this, but I wanted to get the meat of it out sooner rather than later.
  The general worry here isn't that it's fundamentally broken. Most of
  the little issues we've found over the last week have been related to
  just changes in how thread startup/exit is done, since that's the main
  difference between using kthreads and these kinds of threads. In fact,
  if all goes according to plan, I want to get this into the 5.10 and
  5.11 stable branches as well.

  That said, the changes outside of io_uring/io-wq are:

   - arch setup, simple one-liner to each arch copy_thread()
     implementation.

   - Removal of net and proc restrictions for io_uring, they are no
     longer needed or useful"

* tag 'io_uring-worker.v3-2021-02-25' of git://git.kernel.dk/linux-block: (30 commits)
  io-wq: remove now unused IO_WQ_BIT_ERROR
  io_uring: fix SQPOLL thread handling over exec
  io-wq: improve manager/worker handling over exec
  io_uring: ensure SQPOLL startup is triggered before error shutdown
  io-wq: make buffered file write hashed work map per-ctx
  io-wq: fix race around io_worker grabbing
  io-wq: fix races around manager/worker creation and task exit
  io_uring: ensure io-wq context is always destroyed for tasks
  arch: ensure parisc/powerpc handle PF_IO_WORKER in copy_thread()
  io_uring: cleanup ->user usage
  io-wq: remove nr_process accounting
  io_uring: flag new native workers with IORING_FEAT_NATIVE_WORKERS
  net: remove cmsg restriction from io_uring based send/recvmsg calls
  Revert "proc: don't allow async path resolution of /proc/self components"
  Revert "proc: don't allow async path resolution of /proc/thread-self components"
  io_uring: move SQPOLL thread io-wq forked worker
  io-wq: make io_wq_fork_thread() available to other users
  io-wq: only remove worker from free_list, if it was there
  io_uring: remove io_identity
  io_uring: remove any grabbing of context
  ...
2021-02-27 08:29:02 -08:00
Linus Torvalds
e7270e47a0 s390 updates for the 5.12 merge window #2
- Fix physical vs virtual confusion in some basic mm macros and
   routines. Caused by __pa == __va on s390 currently.
 
 - Get rid of on-stack cpu masks.
 
 - Add support for complete CPU counter set extraction.
 
 - Add arch_irq_work_raise implementation.
 
 - virtio-ccw revision and opcode fixes.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEE3QHqV+H2a8xAv27vjYWKoQLXFBgFAmA5UHMACgkQjYWKoQLX
 FBh/Vgf/ezaC/qx8cPAJKemWTST5tK/cc3g63L5SlvIsiKTTcO08+PYpGEo5ajU8
 0DDpDZdP+DYeDgLatrFFj+MlQyIL4wd602uKiRqD1LwjTR1oA+6HDDQtE41v2Z2a
 t2Kuv32dGVT5361RythnerTdMx18XG6k77JprP6b7zCFa2qCpc8DaKk7FvcqnQt+
 gfebknC/hdL15IJhtpPrIoqqerDXxB+HU+dSouipQwiamtENGFOuJ5csgl9XIJuR
 ILzq3Ad/6u8yKf77c+q9WRvu3DTIsXXSY8xwl3zJjNqXcAnN2sa4apc7+noNJ0YZ
 UkbokSshQomfOYMhIxyLs9AKEEcltg==
 =YVkj
 -----END PGP SIGNATURE-----

Merge tag 's390-5.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull more s390 updates from Vasily Gorbik:

 - Fix physical vs virtual confusion in some basic mm macros and
   routines. Caused by __pa == __va on s390 currently.

 - Get rid of on-stack cpu masks.

 - Add support for complete CPU counter set extraction.

 - Add arch_irq_work_raise implementation.

 - virtio-ccw revision and opcode fixes.

* tag 's390-5.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/cpumf: Add support for complete counter set extraction
  virtio/s390: implement virtio-ccw revision 2 correctly
  s390/smp: implement arch_irq_work_raise()
  s390/topology: move cpumasks away from stack
  s390/smp: smp_emergency_stop() - move cpumask away from stack
  s390/smp: __smp_rescan_cpus() - move cpumask away from stack
  s390/smp: consolidate locking for smp_rescan()
  s390/mm: fix phys vs virt confusion in vmem_*() functions family
  s390/mm: fix phys vs virt confusion in pgtable allocation routines
  s390/mm: fix invalid __pa() usage in pfn_pXd() macros
  s390/mm: make pXd_deref() macros return a pointer
  s390/opcodes: rename selhhhr to selfhr
2021-02-26 14:12:32 -08:00
Linus Torvalds
29c395c77a Rework of the X86 irq stack handling:
The irq stack switching was moved out of the ASM entry code in course of
   the entry code consolidation. It ended up being suboptimal in various
   ways.
 
   - Make the stack switching inline so the stackpointer manipulation is not
     longer at an easy to find place.
 
   - Get rid of the unnecessary indirect call.
 
   - Avoid the double stack switching in interrupt return and reuse the
     interrupt stack for softirq handling.
 
   - A objtool fix for CONFIG_FRAME_POINTER=y builds where it got confused
     about the stack pointer manipulation.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmA21OcTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoaX0D/9S0ud6oqbsIvI8LwhvYub63a2cjKP9
 liHAJ7xwMYYVwzf0skwsPb/QE6+onCzdq0upJkgG/gEYm2KbiaMWZ4GgHdj0O7ER
 qXKJONDd36AGxSEdaVzLY5kPuD/mkomGk5QdaZaTmjruthkNzg4y/N2wXUBIMZR0
 FdpSpp5fGspSZCn/DXDx6FjClwpLI53VclvDs6DcZ2DIBA0K+F/cSLb1UQoDLE1U
 hxGeuNa+GhKeeZ5C+q5giho1+ukbwtjMW9WnKHAVNiStjm0uzdqq7ERGi/REvkcB
 LY62u5uOSW1zIBMmzUjDDQEqvypB0iFxFCpN8g9sieZjA0zkaUioRTQyR+YIQ8Cp
 l8LLir0dVQivR1bHghHDKQJUpdw/4zvDj4mMH10XHqbcOtIxJDOJHC5D00ridsAz
 OK0RlbAJBl9FTdLNfdVReBCoehYAO8oefeyMAG12nZeSh5XVUWl238rvzmzIYNhG
 cEtkSx2wIUNEA+uSuI+xvfmwpxL7voTGvqmiRDCAFxyO7Bl/GBu9OEBFA1eOvHB+
 +wTmPDMswRetQNh4QCRXzk1JzP1Wk5CobUL9iinCWFoTJmnsPPSOWlosN6ewaNXt
 kYFpRLy5xt9EP7dlfgBSjiRlthDhTdMrFjD5bsy1vdm1w7HKUo82lHa4O8Hq3PHS
 tinKICUqRsbjig==
 =Sqr1
 -----END PGP SIGNATURE-----

Merge tag 'x86-entry-2021-02-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 irq entry updates from Thomas Gleixner:
 "The irq stack switching was moved out of the ASM entry code in course
  of the entry code consolidation. It ended up being suboptimal in
  various ways.

  This reworks the X86 irq stack handling:

   - Make the stack switching inline so the stackpointer manipulation is
     not longer at an easy to find place.

   - Get rid of the unnecessary indirect call.

   - Avoid the double stack switching in interrupt return and reuse the
     interrupt stack for softirq handling.

   - A objtool fix for CONFIG_FRAME_POINTER=y builds where it got
     confused about the stack pointer manipulation"

* tag 'x86-entry-2021-02-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  objtool: Fix stack-swizzle for FRAME_POINTER=y
  um: Enforce the usage of asm-generic/softirq_stack.h
  x86/softirq/64: Inline do_softirq_own_stack()
  softirq: Move do_softirq_own_stack() to generic asm header
  softirq: Move __ARCH_HAS_DO_SOFTIRQ to Kconfig
  x86: Select CONFIG_HAVE_IRQ_EXIT_ON_IRQ_STACK
  x86/softirq: Remove indirection in do_softirq_own_stack()
  x86/entry: Use run_sysvec_on_irqstack_cond() for XEN upcall
  x86/entry: Convert device interrupts to inline stack switching
  x86/entry: Convert system vectors to irq stack macro
  x86/irq: Provide macro for inlining irq stack switching
  x86/apic: Split out spurious handling code
  x86/irq/64: Adjust the per CPU irq stack pointer by 8
  x86/irq: Sanitize irq stack tracking
  x86/entry: Fix instrumentation annotation
2021-02-24 16:32:23 -08:00
Thomas Richter
cf6acb8bdb s390/cpumf: Add support for complete counter set extraction
Add support to the CPU Measurement counter facility device driver
to extract complete counter sets per CPU and per counter set from user
space. This includes a new device named /dev/hwctr and support
for the device driver functions open, close and ioctl. Other
functions are not supported.

The ioctl command supports 3 subcommands:
S390_HWCTR_START: enables counter sets on a list of CPUs.
S390_HWCTR_STOP: disables counter sets on a list of CPUs.
S390_HWCTR_READ: reads counter sets on a list of CPUs.

The ioctl(..., S390_HWCTR_READ, ...) is the only subcommand which
returns data.  It requires member data_bytes to be positive and
indicates the maximum amount of data available to store counter set
data. The other ioctl() subcommands do not use this member and it
should be set to zero.
The S390_HWCTR_READ subcommand returns the following data:

The cpuset data is flattened using the following scheme, stored in member
data:

 0x0       0x8   0xc       0x10  0x10      0x18  0x20  0x28         0xU-1
 +---------+-----+---------+-----+---------+-----+-----+------+------+
 | no_cpus | cpu | no_sets | set | no_cnts | cv1 | cv2 | .... | cv_n |
 +---------+-----+---------+-----+---------+-----+-----+------+------+

                           0xU   0xU+4     0xU+8 0xU+10             0xV-1
                           +-----+---------+-----+-----+------+------+
                           | set | no_cnts | cv1 | cv2 | .... | cv_n |
                           +-----+---------+-----+-----+------+------+

           0xV   0xV+4     0xV+8 0xV+c
           +-----+---------+-----+---------+-----+-----+------+------+
           | cpu | no_sets | set | no_cnts | cv1 | cv2 | .... | cv_n |
           +-----+---------+-----+---------+-----+-----+------+------+

U and V denote arbitrary hexadezimal addresses.
The first integer represents the number of CPUs data was extracted
from. This is followed by CPU number and number of counter sets extracted.
Both are two integer values. This is followed by the set identifer
and number of counters extracted. Both are two integer values. This is
followed by the counter values, each element is eight bytes in size.

The S390_HWCTR_READ ioctl subcommand is also limited to one call per
minute. This ensures that an application does not read out the
counter sets too often and reduces the overall CPU performance.
The complete counter set extraction is an expensive operation.

Reviewed-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-02-24 00:31:23 +01:00
Ilya Leoshkevich
55f03123f6 s390/smp: implement arch_irq_work_raise()
The immediate need to have this is to have bpf_send_signal() send the
signal ASAP instead of during the next hrtimer interrupt. However, it
should also improve irq_work_queue() latencies in general, as well as
get s390 out of the lame architectures list [1].

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/irq_work.c?h=v5.11#n45

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-02-24 00:31:22 +01:00
Heiko Carstens
da6d2c289d s390/topology: move cpumasks away from stack
Make cpumasks static variables to avoid potential large stack
frames. There shouldn't be any concurrent callers since all current
callers are serialized with the cpu hotplug lock.

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-02-24 00:31:22 +01:00
Heiko Carstens
f213e5502d s390/smp: smp_emergency_stop() - move cpumask away from stack
Make "cpumask_t cpumask" a static variable to avoid a potential large
stack frame. Also protect against potential concurrent callers by
introducing a local lock.
Note: smp_emergency_stop() gets only called with irqs and machine
checks disabled, therefore a cpu local deadlock is not possible. For
concurrent callers the first cpu which enters the critical section
wins and will stop all other cpus.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-02-24 00:31:22 +01:00
Heiko Carstens
62c8dca9e1 s390/smp: __smp_rescan_cpus() - move cpumask away from stack
Avoid a potentially large stack frame and overflow by making
"cpumask_t avail" a static variable. There is no concurrent
access due to the existing locking.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-02-24 00:31:22 +01:00
Heiko Carstens
588a079ebd s390/smp: consolidate locking for smp_rescan()
Move locking to __smp_rescan() instead of duplicating it to all call sites.

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-02-24 00:31:22 +01:00
Linus Torvalds
7d6beb71da idmapped-mounts-v5.12
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCYCegywAKCRCRxhvAZXjc
 ouJ6AQDlf+7jCQlQdeKKoN9QDFfMzG1ooemat36EpRRTONaGuAD8D9A4sUsG4+5f
 4IU5Lj9oY4DEmF8HenbWK2ZHsesL2Qg=
 =yPaw
 -----END PGP SIGNATURE-----

Merge tag 'idmapped-mounts-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux

Pull idmapped mounts from Christian Brauner:
 "This introduces idmapped mounts which has been in the making for some
  time. Simply put, different mounts can expose the same file or
  directory with different ownership. This initial implementation comes
  with ports for fat, ext4 and with Christoph's port for xfs with more
  filesystems being actively worked on by independent people and
  maintainers.

  Idmapping mounts handle a wide range of long standing use-cases. Here
  are just a few:

   - Idmapped mounts make it possible to easily share files between
     multiple users or multiple machines especially in complex
     scenarios. For example, idmapped mounts will be used in the
     implementation of portable home directories in
     systemd-homed.service(8) where they allow users to move their home
     directory to an external storage device and use it on multiple
     computers where they are assigned different uids and gids. This
     effectively makes it possible to assign random uids and gids at
     login time.

   - It is possible to share files from the host with unprivileged
     containers without having to change ownership permanently through
     chown(2).

   - It is possible to idmap a container's rootfs and without having to
     mangle every file. For example, Chromebooks use it to share the
     user's Download folder with their unprivileged containers in their
     Linux subsystem.

   - It is possible to share files between containers with
     non-overlapping idmappings.

   - Filesystem that lack a proper concept of ownership such as fat can
     use idmapped mounts to implement discretionary access (DAC)
     permission checking.

   - They allow users to efficiently changing ownership on a per-mount
     basis without having to (recursively) chown(2) all files. In
     contrast to chown (2) changing ownership of large sets of files is
     instantenous with idmapped mounts. This is especially useful when
     ownership of a whole root filesystem of a virtual machine or
     container is changed. With idmapped mounts a single syscall
     mount_setattr syscall will be sufficient to change the ownership of
     all files.

   - Idmapped mounts always take the current ownership into account as
     idmappings specify what a given uid or gid is supposed to be mapped
     to. This contrasts with the chown(2) syscall which cannot by itself
     take the current ownership of the files it changes into account. It
     simply changes the ownership to the specified uid and gid. This is
     especially problematic when recursively chown(2)ing a large set of
     files which is commong with the aforementioned portable home
     directory and container and vm scenario.

   - Idmapped mounts allow to change ownership locally, restricting it
     to specific mounts, and temporarily as the ownership changes only
     apply as long as the mount exists.

  Several userspace projects have either already put up patches and
  pull-requests for this feature or will do so should you decide to pull
  this:

   - systemd: In a wide variety of scenarios but especially right away
     in their implementation of portable home directories.

         https://systemd.io/HOME_DIRECTORY/

   - container runtimes: containerd, runC, LXD:To share data between
     host and unprivileged containers, unprivileged and privileged
     containers, etc. The pull request for idmapped mounts support in
     containerd, the default Kubernetes runtime is already up for quite
     a while now: https://github.com/containerd/containerd/pull/4734

   - The virtio-fs developers and several users have expressed interest
     in using this feature with virtual machines once virtio-fs is
     ported.

   - ChromeOS: Sharing host-directories with unprivileged containers.

  I've tightly synced with all those projects and all of those listed
  here have also expressed their need/desire for this feature on the
  mailing list. For more info on how people use this there's a bunch of
  talks about this too. Here's just two recent ones:

      https://www.cncf.io/wp-content/uploads/2020/12/Rootless-Containers-in-Gitpod.pdf
      https://fosdem.org/2021/schedule/event/containers_idmap/

  This comes with an extensive xfstests suite covering both ext4 and
  xfs:

      https://git.kernel.org/brauner/xfstests-dev/h/idmapped_mounts

  It covers truncation, creation, opening, xattrs, vfscaps, setid
  execution, setgid inheritance and more both with idmapped and
  non-idmapped mounts. It already helped to discover an unrelated xfs
  setgid inheritance bug which has since been fixed in mainline. It will
  be sent for inclusion with the xfstests project should you decide to
  merge this.

  In order to support per-mount idmappings vfsmounts are marked with
  user namespaces. The idmapping of the user namespace will be used to
  map the ids of vfs objects when they are accessed through that mount.
  By default all vfsmounts are marked with the initial user namespace.
  The initial user namespace is used to indicate that a mount is not
  idmapped. All operations behave as before and this is verified in the
  testsuite.

  Based on prior discussions we want to attach the whole user namespace
  and not just a dedicated idmapping struct. This allows us to reuse all
  the helpers that already exist for dealing with idmappings instead of
  introducing a whole new range of helpers. In addition, if we decide in
  the future that we are confident enough to enable unprivileged users
  to setup idmapped mounts the permission checking can take into account
  whether the caller is privileged in the user namespace the mount is
  currently marked with.

  The user namespace the mount will be marked with can be specified by
  passing a file descriptor refering to the user namespace as an
  argument to the new mount_setattr() syscall together with the new
  MOUNT_ATTR_IDMAP flag. The system call follows the openat2() pattern
  of extensibility.

  The following conditions must be met in order to create an idmapped
  mount:

   - The caller must currently have the CAP_SYS_ADMIN capability in the
     user namespace the underlying filesystem has been mounted in.

   - The underlying filesystem must support idmapped mounts.

   - The mount must not already be idmapped. This also implies that the
     idmapping of a mount cannot be altered once it has been idmapped.

   - The mount must be a detached/anonymous mount, i.e. it must have
     been created by calling open_tree() with the OPEN_TREE_CLONE flag
     and it must not already have been visible in the filesystem.

  The last two points guarantee easier semantics for userspace and the
  kernel and make the implementation significantly simpler.

  By default vfsmounts are marked with the initial user namespace and no
  behavioral or performance changes are observed.

  The manpage with a detailed description can be found here:

      1d7b902e28

  In order to support idmapped mounts, filesystems need to be changed
  and mark themselves with the FS_ALLOW_IDMAP flag in fs_flags. The
  patches to convert individual filesystem are not very large or
  complicated overall as can be seen from the included fat, ext4, and
  xfs ports. Patches for other filesystems are actively worked on and
  will be sent out separately. The xfstestsuite can be used to verify
  that port has been done correctly.

  The mount_setattr() syscall is motivated independent of the idmapped
  mounts patches and it's been around since July 2019. One of the most
  valuable features of the new mount api is the ability to perform
  mounts based on file descriptors only.

  Together with the lookup restrictions available in the openat2()
  RESOLVE_* flag namespace which we added in v5.6 this is the first time
  we are close to hardened and race-free (e.g. symlinks) mounting and
  path resolution.

  While userspace has started porting to the new mount api to mount
  proper filesystems and create new bind-mounts it is currently not
  possible to change mount options of an already existing bind mount in
  the new mount api since the mount_setattr() syscall is missing.

  With the addition of the mount_setattr() syscall we remove this last
  restriction and userspace can now fully port to the new mount api,
  covering every use-case the old mount api could. We also add the
  crucial ability to recursively change mount options for a whole mount
  tree, both removing and adding mount options at the same time. This
  syscall has been requested multiple times by various people and
  projects.

  There is a simple tool available at

      https://github.com/brauner/mount-idmapped

  that allows to create idmapped mounts so people can play with this
  patch series. I'll add support for the regular mount binary should you
  decide to pull this in the following weeks:

  Here's an example to a simple idmapped mount of another user's home
  directory:

	u1001@f2-vm:/$ sudo ./mount --idmap both:1000:1001:1 /home/ubuntu/ /mnt

	u1001@f2-vm:/$ ls -al /home/ubuntu/
	total 28
	drwxr-xr-x 2 ubuntu ubuntu 4096 Oct 28 22:07 .
	drwxr-xr-x 4 root   root   4096 Oct 28 04:00 ..
	-rw------- 1 ubuntu ubuntu 3154 Oct 28 22:12 .bash_history
	-rw-r--r-- 1 ubuntu ubuntu  220 Feb 25  2020 .bash_logout
	-rw-r--r-- 1 ubuntu ubuntu 3771 Feb 25  2020 .bashrc
	-rw-r--r-- 1 ubuntu ubuntu  807 Feb 25  2020 .profile
	-rw-r--r-- 1 ubuntu ubuntu    0 Oct 16 16:11 .sudo_as_admin_successful
	-rw------- 1 ubuntu ubuntu 1144 Oct 28 00:43 .viminfo

	u1001@f2-vm:/$ ls -al /mnt/
	total 28
	drwxr-xr-x  2 u1001 u1001 4096 Oct 28 22:07 .
	drwxr-xr-x 29 root  root  4096 Oct 28 22:01 ..
	-rw-------  1 u1001 u1001 3154 Oct 28 22:12 .bash_history
	-rw-r--r--  1 u1001 u1001  220 Feb 25  2020 .bash_logout
	-rw-r--r--  1 u1001 u1001 3771 Feb 25  2020 .bashrc
	-rw-r--r--  1 u1001 u1001  807 Feb 25  2020 .profile
	-rw-r--r--  1 u1001 u1001    0 Oct 16 16:11 .sudo_as_admin_successful
	-rw-------  1 u1001 u1001 1144 Oct 28 00:43 .viminfo

	u1001@f2-vm:/$ touch /mnt/my-file

	u1001@f2-vm:/$ setfacl -m u:1001:rwx /mnt/my-file

	u1001@f2-vm:/$ sudo setcap -n 1001 cap_net_raw+ep /mnt/my-file

	u1001@f2-vm:/$ ls -al /mnt/my-file
	-rw-rwxr--+ 1 u1001 u1001 0 Oct 28 22:14 /mnt/my-file

	u1001@f2-vm:/$ ls -al /home/ubuntu/my-file
	-rw-rwxr--+ 1 ubuntu ubuntu 0 Oct 28 22:14 /home/ubuntu/my-file

	u1001@f2-vm:/$ getfacl /mnt/my-file
	getfacl: Removing leading '/' from absolute path names
	# file: mnt/my-file
	# owner: u1001
	# group: u1001
	user::rw-
	user:u1001:rwx
	group::rw-
	mask::rwx
	other::r--

	u1001@f2-vm:/$ getfacl /home/ubuntu/my-file
	getfacl: Removing leading '/' from absolute path names
	# file: home/ubuntu/my-file
	# owner: ubuntu
	# group: ubuntu
	user::rw-
	user:ubuntu:rwx
	group::rw-
	mask::rwx
	other::r--"

* tag 'idmapped-mounts-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: (41 commits)
  xfs: remove the possibly unused mp variable in xfs_file_compat_ioctl
  xfs: support idmapped mounts
  ext4: support idmapped mounts
  fat: handle idmapped mounts
  tests: add mount_setattr() selftests
  fs: introduce MOUNT_ATTR_IDMAP
  fs: add mount_setattr()
  fs: add attr_flags_to_mnt_flags helper
  fs: split out functions to hold writers
  namespace: only take read lock in do_reconfigure_mnt()
  mount: make {lock,unlock}_mount_hash() static
  namespace: take lock_mount_hash() directly when changing flags
  nfs: do not export idmapped mounts
  overlayfs: do not mount on top of idmapped mounts
  ecryptfs: do not mount on top of idmapped mounts
  ima: handle idmapped mounts
  apparmor: handle idmapped mounts
  fs: make helpers idmap mount aware
  exec: handle idmapped mounts
  would_dump: handle idmapped mounts
  ...
2021-02-23 13:39:45 -08:00
Jens Axboe
4727dc20e0 arch: setup PF_IO_WORKER threads like PF_KTHREAD
PF_IO_WORKER are kernel threads too, but they aren't PF_KTHREAD in the
sense that we don't assign ->set_child_tid with our own structure. Just
ensure that every arch sets up the PF_IO_WORKER threads like kthreads
in the arch implementation of copy_thread().

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-21 17:25:22 -07:00
Linus Torvalds
df24212a49 s390 updates for the 5.12 merge window
- Convert to using the generic entry infrastructure.
 
 - Add vdso time namespace support.
 
 - Switch s390 and alpha to 64-bit ino_t. As discussed here
   lkml.kernel.org/r/YCV7QiyoweJwvN+m@osiris
 
 - Get rid of expensive stck (store clock) usages where possible. Utilize
   cpu alternatives to patch stckf when supported.
 
 - Make tod_clock usage less error prone by converting it to a union and
   rework code which is using it.
 
 - Machine check handler fixes and cleanups.
 
 - Drop couple of minor inline asm optimizations to fix clang build.
 
 - Default configs changes notably to make libvirt happy.
 
 - Various changes to rework and improve qdio code.
 
 - Other small various fixes and improvements all over the code.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEE3QHqV+H2a8xAv27vjYWKoQLXFBgFAmAyzcwACgkQjYWKoQLX
 FBjjMwgAmeY3oMkj93bnUF/OnbYTJQ0ZHmlyeboKt7SnFyvNpOVGyRfl7+fPHsNu
 +t9QZQk0f7fSxbcC04gz0ZMw1YbTjWihgZJsN6s+qtrRsv/kVqKr7kvhFrcs8uSZ
 rLiwIRWGVAbprnJZWCNqaGpKkOM0wPYZ5W3Mtnoxe4nTM2LwSu2RWI8ibTGYLQPy
 FybKos2hYOFBTGQdrxmg1zAvpE8DJg4qQNLhYvnmHd8Bw/FNBmoyhx8rS8z06NmS
 dWMk7pfvQaslIIaFC3Yo7/sJVa/JJH33FlBonc+MSO8OZz5O6vG4bk9ZHq6DfHUH
 V1I38xiBdYdSXDq8QqT3N9d+CtjeMQ==
 =Lt/v
 -----END PGP SIGNATURE-----

Merge tag 's390-5.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 updates from Vasily Gorbik:

 - Convert to using the generic entry infrastructure.

 - Add vdso time namespace support.

 - Switch s390 and alpha to 64-bit ino_t. As discussed at

     https://lore.kernel.org/linux-mm/YCV7QiyoweJwvN+m@osiris/

 - Get rid of expensive stck (store clock) usages where possible.
   Utilize cpu alternatives to patch stckf when supported.

 - Make tod_clock usage less error prone by converting it to a union and
   rework code which is using it.

 - Machine check handler fixes and cleanups.

 - Drop couple of minor inline asm optimizations to fix clang build.

 - Default configs changes notably to make libvirt happy.

 - Various changes to rework and improve qdio code.

 - Other small various fixes and improvements all over the code.

* tag 's390-5.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (68 commits)
  s390/qdio: remove 'merge_pending' mechanism
  s390/qdio: improve handling of PENDING buffers for QEBSM devices
  s390/qdio: rework q->qdio_error indication
  s390/qdio: inline qdio_kick_handler()
  s390/time: remove get_tod_clock_ext()
  s390/crypto: use store_tod_clock_ext()
  s390/hypfs: use store_tod_clock_ext()
  s390/debug: use union tod_clock
  s390/kvm: use union tod_clock
  s390/vdso: use union tod_clock
  s390/time: convert tod_clock_base to union
  s390/time: introduce new store_tod_clock_ext()
  s390/time: rename store_tod_clock_ext() and use union tod_clock
  s390/time: introduce union tod_clock
  s390,alpha: switch to 64-bit ino_t
  s390: split cleanup_sie
  s390: use r13 in cleanup_sie as temp register
  s390: fix kernel asce loading when sie is interrupted
  s390: add stack for machine check handler
  s390: use WRITE_ONCE when re-allocating async stack
  ...
2021-02-21 13:40:06 -08:00
Linus Torvalds
591fd30eee Merge branch 'work.elf-compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull ELF compat updates from Al Viro:
 "Sanitizing ELF compat support, especially for triarch architectures:

   - X32 handling cleaned up

   - MIPS64 uses compat_binfmt_elf.c both for O32 and N32 now

   - Kconfig side of things regularized

  Eventually I hope to have compat_binfmt_elf.c killed, with both native
  and compat built from fs/binfmt_elf.c, with -DELF_BITS={64,32} passed
  by kbuild, but that's a separate story - not included here"

* 'work.elf-compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  get rid of COMPAT_ELF_EXEC_PAGESIZE
  compat_binfmt_elf: don't bother with undef of ELF_ARCH
  Kconfig: regularize selection of CONFIG_BINFMT_ELF
  mips compat: switch to compat_binfmt_elf.c
  mips: don't bother with ELF_CORE_EFLAGS
  mips compat: don't bother with ELF_ET_DYN_BASE
  mips: KVM_GUEST makes no sense for 64bit builds...
  mips: kill unused definitions in binfmt_elf[on]32.c
  mips binfmt_elf*32.c: use elfcore-compat.h
  x32: make X32, !IA32_EMULATION setups able to execute x32 binaries
  [amd64] clean PRSTATUS_SIZE/SET_PR_FPVALID up properly
  elf_prstatus: collect the common part (everything before pr_reg) into a struct
  binfmt_elf: partially sanitize PRSTATUS_SIZE and SET_PR_FPVALID
2021-02-21 09:29:23 -08:00
Heiko Carstens
d1deda6f2b s390/debug: use union tod_clock
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-02-13 17:17:54 +01:00
Heiko Carstens
169ceac429 s390/vdso: use union tod_clock
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-02-13 17:17:54 +01:00
Heiko Carstens
f8d8977a3d s390/time: convert tod_clock_base to union
Convert tod_clock_base to union tod_clock. This simplifies quite a bit
of code and also fixes a bug in read_persistent_clock64();

void read_persistent_clock64(struct timespec64 *ts)
{
        __u64 delta;

        delta = initial_leap_seconds + TOD_UNIX_EPOCH;
        get_tod_clock_ext(clk);
        *(__u64 *) &clk[1] -= delta;
        if (*(__u64 *) &clk[1] > delta)
                clk[0]--;
        ext_to_timespec64(clk, ts);
}

Assume &clk[1] == 3 and delta == 2; then after the substraction the if
condition becomes true and the epoch part of the clock is decremented
by one because of an assumed overflow, even though there is none.

Fix this by using 128 bit arithmetics and let the compiler do the
right thing:

void read_persistent_clock64(struct timespec64 *ts)
{
        union tod_clock clk;
        u64 delta;

        delta = initial_leap_seconds + TOD_UNIX_EPOCH;
        store_tod_clock_ext(&clk);
        clk.eitod -= delta;
        ext_to_timespec64(&clk, ts);
}

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-02-13 17:17:54 +01:00
Heiko Carstens
530f639f1e s390/time: rename store_tod_clock_ext() and use union tod_clock
Rename store_tod_clock_ext() to store_tod_clock_ext_cc() to reflect
that it returns a condition code and also use union tod_clock as
parameter.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-02-13 17:17:54 +01:00
Sven Schnelle
efa5473590 s390: split cleanup_sie
The current code uses the address in %r11 to figure out whether
it was called from the machine check handler or from a normal
interrupt handler. Instead of doing this implicit logic (which
is mostly a leftover from the old critical cleanup approach)
just add a second label and use that.

Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-02-13 17:17:53 +01:00
Sven Schnelle
33ea04872d s390: use r13 in cleanup_sie as temp register
Instead of thrashing r11 which is normally our pointer to struct
pt_regs on the stack, use r13 as temporary register in the BR_EX
macro. r13 is already used in cleanup_sie, so no need to thrash
another register.

Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-02-13 17:17:53 +01:00
Sven Schnelle
26521412ae s390: fix kernel asce loading when sie is interrupted
If a machine check is coming in during sie, the PU saves the
control registers to the machine check save area. Afterwards
mcck_int_handler is called, which loads __LC_KERNEL_ASCE into
%cr1. Later the code restores %cr1 from the machine check area,
but that is wrong when SIE was interrupted because the machine
check area still contains the gmap asce. Instead it should return
with either __KERNEL_ASCE in %cr1 when interrupted in SIE or
the previous %cr1 content saved in the machine check save area.

Fixes: 87d5986345 ("s390/mm: remove set_fs / rework address space handling")
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Cc: <stable@kernel.org> # v5.8+
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-02-13 17:17:53 +01:00
Sven Schnelle
b61b159512 s390: add stack for machine check handler
The previous code used the normal kernel stack for machine checks.
This is problematic when a machine check interrupts a system call
or interrupt handler right at the beginning where registers are set up.

Assume system_call is interrupted at the first instruction and a machine
check is triggered. The machine check handler is called, checks the PSW
to see whether it is coming from user space, notices that it is already
in kernel mode but %r15 still contains the user space stack. This would
lead to a kernel crash.

There are basically two ways of fixing that: Either using the 'critical
cleanup' approach which compares the address in the PSW to see whether
it is already at a point where the stack has been set up, or use an extra
stack for the machine check handler.

For simplicity, we will go with the second approach and allocate an extra
stack. This adds some memory overhead for large systems, but usually large
system have plenty of memory so this isn't really a concern. But it keeps
the mchk stack setup simple and less error prone.

Fixes: 0b0ed657fe ("s390: remove critical section cleanup from entry.S")
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Cc: <stable@kernel.org> # v5.8+
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-02-13 17:17:53 +01:00
Sven Schnelle
64985c3a22 s390: use WRITE_ONCE when re-allocating async stack
The code does:

S390_lowcore.async_stack = new + STACK_INIT_OFFSET;

But the compiler is free to first assign one value and
add the other value later. If a IRQ would be coming in
between these two operations, it would run with an invalid
stack. Prevent this by using WRITE_ONCE.

Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-02-13 17:17:53 +01:00
Sven Schnelle
b0d31159a4 s390: open code SWITCH_KERNEL macro
This is a preparation patch for two later bugfixes. In the past both
int_handler and machine check handler used SWITCH_KERNEL to switch to
the kernel stack. However, SWITCH_KERNEL doesn't work properly in machine
check context. So instead of adding more complexity to this macro, just
remove it.

Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Cc: <stable@kernel.org> # v5.8+
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-02-13 17:17:53 +01:00
Ingo Molnar
a3251c1a36 Merge branch 'x86/paravirt' into x86/entry
Merge in the recent paravirt changes to resolve conflicts caused
by objtool annotations.

Conflicts:
	arch/x86/xen/xen-asm.S

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2021-02-12 13:36:43 +01:00
Thomas Gleixner
db1cc7aede softirq: Move do_softirq_own_stack() to generic asm header
To avoid include recursion hell move the do_softirq_own_stack() related
content into a generic asm header and include it from all places in arch/
which need the prototype.

This allows architectures to provide an inline implementation of
do_softirq_own_stack() without introducing a lot of #ifdeffery all over the
place.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20210210002513.289960691@linutronix.de
2021-02-10 23:34:16 +01:00
Heiko Carstens
1c7673476b s390/vtime: use cpu alternative for stck/stckf
Use a cpu alternative to switch between stck and stckf instead of
making it compile time dependent. This will also make kernels compiled
for old machines, but running on newer machines, use stckf.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-02-09 15:57:06 +01:00
Heiko Carstens
78f6570946 s390/entry: use cpu alternative for stck/stckf
Use a cpu alternative to switch between stck and stckf instead of
making it compile time dependent. This will also make kernels compiled
for old machines, but running on newer machines, use stckf.

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-02-09 15:57:05 +01:00
Heiko Carstens
b22446d00a s390/time: use stcke instead of stck
Use STORE CLOCK EXTENDED instead of STORE CLOCK in early tod clock
setup. This is just to remove another usage of stck, trying to remove
all usages of STORE CLOCK.  This doesn't fix anything.

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-02-09 15:57:05 +01:00
Heiko Carstens
683071b02c s390/cpum_cf_diag: use get_tod_clock_fast()
Use get_tod_clock_fast() instead of store_tod_clock(), since
store_tod_clock() can be very slow.

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-02-09 15:57:05 +01:00
Heiko Carstens
b29c509382 s390/vtime: fix inline assembly clobber list
The stck/stckf instruction used within the inline assembly within
do_account_vtime() changes the condition code. This is not reflected
with the clobber list, and therefore might result in incorrect code
generation.

It seems unlikely that the compiler could generate incorrect code
considering the surrounding C code, but it must still be fixed.

Cc: <stable@vger.kernel.org>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-02-09 15:57:05 +01:00
Heiko Carstens
fe8344a092 s390/vdso: on timens page fault prefault also VVAR page
This is the s390 variant of commit e6b28ec65b ("x86/vdso: On timens
page fault prefault also VVAR page").

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-02-09 15:57:05 +01:00
Heiko Carstens
eeab78b05d s390/vdso: implement generic vdso time namespace support
Implement generic vdso time namespace support which also enables time
namespaces for s390. This is quite similar to what arm64 has.

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-02-09 15:57:05 +01:00
Heiko Carstens
214b356486 s390/vdso: move data page before code pages
For consistency with x86 and arm64 move the data page before code
pages. Similar to commit 601255ae3c ("arm64: vdso: move data page
before code pages").

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-02-09 15:57:05 +01:00
Heiko Carstens
5056c2c53a s390/vdso: put vdso datapage in a separate vma
Add a separate "[vvar]" mapping for the vdso datapage, since it
doesn't need to be executable or COW-able.

This is actually the s390 implementation of commit 8715493852
("arm64: vdso: put vdso datapage in a separate vma")

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-02-09 15:57:05 +01:00
Heiko Carstens
dfc11c9876 s390/vdso: get rid of vdso_fault
Implement vdso mapping similar to arm64 and powerpc.

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-02-09 15:57:05 +01:00
Heiko Carstens
8d4be7f318 s390/vdso: misc simple code changes
- remove unneeded includes
- move functions around
- remove obvious and/or incorrect comments
- shorten some if conditions

No functional change.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-02-09 15:57:05 +01:00
Heiko Carstens
6755270b5e s390/vdso: remove superfluous variables
A few local variables exist only so the contents of a global variable
can be copied to them, and use that value only for reading.
Just remove them and rename some global variables. Also change
vdso64_[start|end] to be character arrays to be consistent with other
architectures, and get rid of the global variable vdso64_kbase.

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-02-09 15:57:05 +01:00
Heiko Carstens
5ffd9af0fb s390/vdso: remove superfluous check
vdso_pages (aka vdso64_pages) is never 0, therefore remove the check.

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-02-09 15:57:05 +01:00
Heiko Carstens
e1eac1947b s390/vdso: remove BUG_ON()
Handle allocation error gracefully and simply disable vdso instead of
leaving the system in an undefined state.

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-02-09 15:57:05 +01:00
Heiko Carstens
ea44de691e s390/vdso: simplify vdso size calculation
The vdso is (and must) be page aligned and its size must also be
a multiple of PAGE_SIZE. Therefore no need to round upwards.

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-02-09 15:57:05 +01:00
Heiko Carstens
96c0c7ae52 s390/vdso: convert vdso_init() to arch_initcall
Convert vdso_init() to arch_initcall like it is on all other architectures.
This requires to remove the vdso_getcpu_init() call from vdso_init()
since it must be called before smp is enabled.
vdso_getcpu_init() is now an early_initcall like on powerpc.

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-02-09 15:57:05 +01:00
Heiko Carstens
1432cfe69e s390/vdso: fix vdso data page definition
The vdso data page actually contains an array. Fix that.
This doesn't fix a real bug, just reflects reality.

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-02-09 15:57:05 +01:00
Sven Schnelle
c1971eae30 s390: add missing include to arch/s390/kernel/signal.c
This fixes the following warning:

CHECK   linux/arch/s390/kernel/signal.c
linux/arch/s390/kernel/signal.c:465:6: warning: symbol 'arch_do_signal_or_restart' was not declared. Should it be static?

Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-01-27 13:00:47 +01:00
Janosch Frank
e82080e1f4 s390: uv: Fix sysfs max number of VCPUs reporting
The number reported by the query is N-1 and I think people reading the
sysfs file would expect N instead. For users creating VMs there's no
actual difference because KVM's limit is currently below the UV's
limit.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Fixes: a0f60f8431 ("s390/protvirt: Add sysfs firmware interface for Ultravisor information")
Cc: stable@vger.kernel.org
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-01-27 13:00:04 +01:00
Christian Brauner
2a1867219c
fs: add mount_setattr()
This implements the missing mount_setattr() syscall. While the new mount
api allows to change the properties of a superblock there is currently
no way to change the properties of a mount or a mount tree using file
descriptors which the new mount api is based on. In addition the old
mount api has the restriction that mount options cannot be applied
recursively. This hasn't changed since changing mount options on a
per-mount basis was implemented in [1] and has been a frequent request
not just for convenience but also for security reasons. The legacy
mount syscall is unable to accommodate this behavior without introducing
a whole new set of flags because MS_REC | MS_REMOUNT | MS_BIND |
MS_RDONLY | MS_NOEXEC | [...] only apply the mount option to the topmost
mount. Changing MS_REC to apply to the whole mount tree would mean
introducing a significant uapi change and would likely cause significant
regressions.

The new mount_setattr() syscall allows to recursively clear and set
mount options in one shot. Multiple calls to change mount options
requesting the same changes are idempotent:

int mount_setattr(int dfd, const char *path, unsigned flags,
                  struct mount_attr *uattr, size_t usize);

Flags to modify path resolution behavior are specified in the @flags
argument. Currently, AT_EMPTY_PATH, AT_RECURSIVE, AT_SYMLINK_NOFOLLOW,
and AT_NO_AUTOMOUNT are supported. If useful, additional lookup flags to
restrict path resolution as introduced with openat2() might be supported
in the future.

The mount_setattr() syscall can be expected to grow over time and is
designed with extensibility in mind. It follows the extensible syscall
pattern we have used with other syscalls such as openat2(), clone3(),
sched_{set,get}attr(), and others.
The set of mount options is passed in the uapi struct mount_attr which
currently has the following layout:

struct mount_attr {
	__u64 attr_set;
	__u64 attr_clr;
	__u64 propagation;
	__u64 userns_fd;
};

The @attr_set and @attr_clr members are used to clear and set mount
options. This way a user can e.g. request that a set of flags is to be
raised such as turning mounts readonly by raising MOUNT_ATTR_RDONLY in
@attr_set while at the same time requesting that another set of flags is
to be lowered such as removing noexec from a mount tree by specifying
MOUNT_ATTR_NOEXEC in @attr_clr.

Note, since the MOUNT_ATTR_<atime> values are an enum starting from 0,
not a bitmap, users wanting to transition to a different atime setting
cannot simply specify the atime setting in @attr_set, but must also
specify MOUNT_ATTR__ATIME in the @attr_clr field. So we ensure that
MOUNT_ATTR__ATIME can't be partially set in @attr_clr and that @attr_set
can't have any atime bits set if MOUNT_ATTR__ATIME isn't set in
@attr_clr.

The @propagation field lets callers specify the propagation type of a
mount tree. Propagation is a single property that has four different
settings and as such is not really a flag argument but an enum.
Specifically, it would be unclear what setting and clearing propagation
settings in combination would amount to. The legacy mount() syscall thus
forbids the combination of multiple propagation settings too. The goal
is to keep the semantics of mount propagation somewhat simple as they
are overly complex as it is.

The @userns_fd field lets user specify a user namespace whose idmapping
becomes the idmapping of the mount. This is implemented and explained in
detail in the next patch.

[1]: commit 2e4b7fcd92 ("[PATCH] r/o bind mounts: honor mount writer counts at remount")

Link: https://lore.kernel.org/r/20210121131959.646623-35-christian.brauner@ubuntu.com
Cc: David Howells <dhowells@redhat.com>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-api@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-01-24 14:42:45 +01:00
Sven Schnelle
3a790cc1c9 s390: pass struct pt_regs instead of registers to syscalls
Instead of fetching all registers from struct pt_regs and passing
them to the syscall wrappers, let the system call wrappers only
fetch the values really required.

Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-01-19 12:29:27 +01:00
Sven Schnelle
56e62a7370 s390: convert to generic entry
This patch converts s390 to use the generic entry infrastructure from
kernel/entry/*.

There are a few special things on s390:

- PIF_PER_TRAP is moved to TIF_PER_TRAP as the generic code doesn't
  know about our PIF flags in exit_to_user_mode_loop().

- The old code had several ways to restart syscalls:

  a) PIF_SYSCALL_RESTART, which was only set during execve to force a
     restart after upgrading a process (usually qemu-kvm) to pgste page
     table extensions.

  b) PIF_SYSCALL, which is set by do_signal() to indicate that the
     current syscall should be restarted. This is changed so that
     do_signal() now also uses PIF_SYSCALL_RESTART. Continuing to use
     PIF_SYSCALL doesn't work with the generic code, and changing it
     to PIF_SYSCALL_RESTART makes PIF_SYSCALL and PIF_SYSCALL_RESTART
     more unique.

- On s390 calling sys_sigreturn or sys_rt_sigreturn is implemented by
executing a svc instruction on the process stack which causes a fault.
While handling that fault the fault code sets PIF_SYSCALL to hand over
processing to the syscall code on exit to usermode.

The patch introduces PIF_SYSCALL_RET_SET, which is set if ptrace sets
a return value for a syscall. The s390x ptrace ABI uses r2 both for the
syscall number and return value, so ptrace cannot set the syscall number +
return value at the same time. The flag makes handling that a bit easier.
do_syscall() will just skip executing the syscall if PIF_SYSCALL_RET_SET
is set.

CONFIG_DEBUG_ASCE was removd in favour of the generic CONFIG_DEBUG_ENTRY.
CR1/7/13 will be checked both on kernel entry and exit to contain the
correct asces.

Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-01-19 12:29:26 +01:00
Al Viro
f2485a2dc9 elf_prstatus: collect the common part (everything before pr_reg) into a struct
Preparations to doing i386 compat elf_prstatus sanely - rather than duplicating
the beginning of compat_elf_prstatus, take these fields into a separate
structure (compat_elf_prstatus_common), so that it could be reused.  Due to
the incestous relationship between binfmt_elf.c and compat_binfmt_elf.c we
need the same shape change done to native struct elf_prstatus, gathering the
fields prior to pr_reg into a new structure (struct elf_prstatus_common).

Fortunately, offset of pr_reg is always a multiple of 16 with no padding
right before it, so it's possible to turn all the stuff prior to it into
a single member without disturbing the layout.

[build fix from Geert Uytterhoeven folded in]

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-01-06 08:38:29 -05:00
Linus Torvalds
3913d00ac5 A treewide cleanup of interrupt descriptor (ab)use with all sorts of racy
accesses, inefficient and disfunctional code. The goal is to remove the
 export of irq_to_desc() to prevent these things from creeping up again.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl/ifgsTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoYm6EACAo8sObkuY3oWLagtGj1KHxon53oGZ
 VfDw2LYKM+rgJjDWdiyocyxQU5gtm6loWCrIHjH2adRQ4EisB5r8hfI8NZHxNMyq
 8khUi822NRBfFN6SCpO8eW9o95euscNQwCzqi7gV9/U/BAKoDoSEYzS4y0YmJlup
 mhoikkrFiBuFXplWI0gbP4ihb8S/to2+kTL6o7eBoJY9+fSXIFR3erZ6f3fLjYZG
 CQUUysTywdDhLeDkC9vaesXwgdl2XnaPRwcQqmK8Ez0QYNYpawyILUHLD75cIHDu
 bHdK2ZoDv/wtad/3BoGTK3+wChz20a/4/IAnBIUVgmnSLsPtW8zNEOPWNNc0aGg+
 rtafi5bvJ1lMoSZhkjLWQDOGU6vFaXl9NkC2fpF+dg1skFMT2CyLC8LD/ekmocon
 zHAPBva9j3m2A80hI3dUH9azo/IOl1GHG8ccM6SCxY3S/9vWSQChNhQDLe25xBEO
 VtKZS7DYFCRiL8mIy9GgwZWof8Vy2iMua2ML+W9a3mC9u3CqSLbCFmLMT/dDoXl1
 oHnMdAHk1DRatA8pJAz83C75RxbAS2riGEqtqLEQ6OaNXn6h0oXCanJX9jdKYDBh
 z6ijWayPSRMVktN6FDINsVNFe95N4GwYcGPfagIMqyMMhmJDic6apEzEo7iA76lk
 cko28MDqTIK4UQ==
 =BXv+
 -----END PGP SIGNATURE-----

Merge tag 'irq-core-2020-12-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq updates from Thomas Gleixner:
 "This is the second attempt after the first one failed miserably and
  got zapped to unblock the rest of the interrupt related patches.

  A treewide cleanup of interrupt descriptor (ab)use with all sorts of
  racy accesses, inefficient and disfunctional code. The goal is to
  remove the export of irq_to_desc() to prevent these things from
  creeping up again"

* tag 'irq-core-2020-12-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (30 commits)
  genirq: Restrict export of irq_to_desc()
  xen/events: Implement irq distribution
  xen/events: Reduce irq_info:: Spurious_cnt storage size
  xen/events: Only force affinity mask for percpu interrupts
  xen/events: Use immediate affinity setting
  xen/events: Remove disfunct affinity spreading
  xen/events: Remove unused bind_evtchn_to_irq_lateeoi()
  net/mlx5: Use effective interrupt affinity
  net/mlx5: Replace irq_to_desc() abuse
  net/mlx4: Use effective interrupt affinity
  net/mlx4: Replace irq_to_desc() abuse
  PCI: mobiveil: Use irq_data_get_irq_chip_data()
  PCI: xilinx-nwl: Use irq_data_get_irq_chip_data()
  NTB/msi: Use irq_has_action()
  mfd: ab8500-debugfs: Remove the racy fiddling with irq_desc
  pinctrl: nomadik: Use irq_has_action()
  drm/i915/pmu: Replace open coded kstat_irqs() copy
  drm/i915/lpe_audio: Remove pointless irq_to_desc() usage
  s390/irq: Use irq_desc_kstat_cpu() in show_msi_interrupt()
  parisc/irq: Use irq_desc_kstat_cpu() in show_interrupts()
  ...
2020-12-24 13:50:23 -08:00
Heiko Carstens
450f68e242 epoll: fix compat syscall wire up of epoll_pwait2
Commit b0a0c2615f ("epoll: wire up syscall epoll_pwait2") wired up
the 64 bit syscall instead of the compat variant in a couple of places.

Fixes: b0a0c2615f ("epoll: wire up syscall epoll_pwait2")
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-20 10:01:38 -08:00
Linus Torvalds
1db98bcf56 Merge branch 'akpm' (patches from Andrew)
Merge still more updates from Andrew Morton:
 "18 patches.

  Subsystems affected by this patch series: mm (memcg and cleanups) and
  epoll"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  mm/Kconfig: fix spelling mistake "whats" -> "what's"
  selftests/filesystems: expand epoll with epoll_pwait2
  epoll: wire up syscall epoll_pwait2
  epoll: add syscall epoll_pwait2
  epoll: convert internal api to timespec64
  epoll: eliminate unnecessary lock for zero timeout
  epoll: replace gotos with a proper loop
  epoll: pull all code between fetch_events and send_event into the loop
  epoll: simplify and optimize busy loop logic
  epoll: move eavail next to the list_empty_careful check
  epoll: pull fatal signal checks into ep_send_events()
  epoll: simplify signal handling
  epoll: check for events when removing a timed out thread from the wait queue
  mm/memcontrol:rewrite mem_cgroup_page_lruvec()
  mm, kvm: account kvm_vcpu_mmap to kmemcg
  mm/memcg: remove unused definitions
  mm/memcg: warning on !memcg after readahead page charged
  mm/memcg: bail early from swap accounting if memcg disabled
2020-12-19 11:39:50 -08:00
Willem de Bruijn
b0a0c2615f epoll: wire up syscall epoll_pwait2
Split off from prev patch in the series that implements the syscall.

Link: https://lkml.kernel.org/r/20201121144401.3727659-4-willemdebruijn.kernel@gmail.com
Signed-off-by: Willem de Bruijn <willemb@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-19 11:18:38 -08:00
Linus Torvalds
a087241716 - Always initialize kernel stack backchain when entering the kernel, so
that unwinding works properly.
 
 - Fix stack  unwinder test case to avoid rare interrupt stack corruption.
 
 - Simplify udelay() and just let it busy loop instead of implementing a
   complex logic.
 
 - arch_cpu_idle() cleanup.
 
 - Some other minor improvements.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEECMNfWEw3SLnmiLkZIg7DeRspbsIFAl/c800ACgkQIg7DeRsp
 bsJBTRAAxJz7J4X1CqyBf+exDcWhjc+FXUEgwDCNbmkPRezvOrivKSymXDoVbvVo
 D2ptGGQtpnUsrFqHZ6o0DwEWfcxrDSXlyJV16ABkPDcARuV2bDaor7HzaHJfyuor
 nUD0rb/0dWbzzFMlNo+WAG8htrhmS5mA4f1p5XSOohf9zP8Sm6NTVq0A7pK4oJuw
 AU6723chxE326yoB2DcyFHaNqByn7jNyVLxfZgH1tyCTRGvqi6ERT+kKLb58eSi8
 t1YYEEuwanUUZSjSDHqZeHA2evfJl/ilWAkUdAJWwJL7hoYnCBhqcjexseeinQ7n
 09GEGTVVdv09YPZYgDCU+YpJ853gS5zAHYh2ItC3kluCcXV0XNrNyCDT11OxQ4I4
 s1uoMhx6S2RvEXKuJZTatmEhNpKd5UXTUoiM0NDYgwdpcxKcyE0cA4FH3Ik+KE/1
 np4CsskOYU/XuFxOlu29gB7jJ7R/x2AXyJQdSELU+QXKUuaIF8uINnbzUyCc9mcY
 pG9+NKWycRzTXT/1nbKOTBFEhjQi20XcoWRLqX5T0o9D9wLnq4Q+wVhLTt/e5DMb
 pw94JDK9HNX2QTULd6YDR4gXxPrypiX4IBli8CHvZcwNnm6N5vdz9nMvxX+v4s/B
 lbdo4JHnmIpTsTJf8YdFZPggYlJsxuV4ITNRu4BfFwtdCrZhfc8=
 =1l0g
 -----END PGP SIGNATURE-----

Merge tag 's390-5.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull more s390 updates from Heiko Carstens:
 "This is mainly to decouple udelay() and arch_cpu_idle() and simplify
  both of them.

  Summary:

   - Always initialize kernel stack backchain when entering the kernel,
     so that unwinding works properly.

   - Fix stack unwinder test case to avoid rare interrupt stack
     corruption.

   - Simplify udelay() and just let it busy loop instead of implementing
     a complex logic.

   - arch_cpu_idle() cleanup.

   - Some other minor improvements"

* tag 's390-5.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/zcrypt: convert comma to semicolon
  s390/idle: allow arch_cpu_idle() to be kprobed
  s390/idle: remove raw_local_irq_save()/restore() from arch_cpu_idle()
  s390/idle: merge enabled_wait() and arch_cpu_idle()
  s390/delay: remove udelay_simple()
  s390/irq: select HAVE_IRQ_EXIT_ON_IRQ_STACK
  s390/delay: simplify udelay
  s390/test_unwind: use timer instead of udelay
  s390/test_unwind: fix CALL_ON_STACK tests
  s390: make calls to TRACE_IRQS_OFF/TRACE_IRQS_ON balanced
  s390: always clear kernel stack backchain before calling functions
2020-12-18 11:08:06 -08:00
Linus Torvalds
09c0796adf Tracing updates for 5.11
The major update to this release is that there's a new arch config option called:
 CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS. Currently, only x86_64 enables it.
 All the ftrace callbacks now take a struct ftrace_regs instead of a struct
 pt_regs. If the architecture has HAVE_DYNAMIC_FTRACE_WITH_ARGS enabled, then
 the ftrace_regs will have enough information to read the arguments of the
 function being traced, as well as access to the stack pointer. This way, if
 a user (like live kernel patching) only cares about the arguments, then it
 can avoid using the heavier weight "regs" callback, that puts in enough
 information in the struct ftrace_regs to simulate a breakpoint exception
 (needed for kprobes).
 
 New config option that audits the timestamps of the ftrace ring buffer at
 most every event recorded.  The "check_buffer()" calls will conflict with
 mainline, because I purposely added the check without including the fix that
 it caught, which is in mainline. Running a kernel built from the commit of
 the added check will trigger it.
 
 Ftrace recursion protection has been cleaned up to move the protection to
 the callback itself (this saves on an extra function call for those
 callbacks).
 
 Perf now handles its own RCU protection and does not depend on ftrace to do
 it for it (saving on that extra function call).
 
 New debug option to add "recursed_functions" file to tracefs that lists all
 the places that triggered the recursion protection of the function tracer.
 This will show where things need to be fixed as recursion slows down the
 function tracer.
 
 The eval enum mapping updates done at boot up are now offloaded to a work
 queue, as it caused a noticeable pause on slow embedded boards.
 
 Various clean ups and last minute fixes.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCX9uq8xQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qtrwAQCHevqWMjKc1Q76bnCgwB0AbFKB6vqy
 5b6g/co5+ihv8wD/eJPWlZMAt97zTVW7bdp5qj/GTiCDbAsODMZ597LsxA0=
 =rZEz
 -----END PGP SIGNATURE-----

Merge tag 'trace-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing updates from Steven Rostedt:
 "The major update to this release is that there's a new arch config
  option called CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS.

  Currently, only x86_64 enables it. All the ftrace callbacks now take a
  struct ftrace_regs instead of a struct pt_regs. If the architecture
  has HAVE_DYNAMIC_FTRACE_WITH_ARGS enabled, then the ftrace_regs will
  have enough information to read the arguments of the function being
  traced, as well as access to the stack pointer.

  This way, if a user (like live kernel patching) only cares about the
  arguments, then it can avoid using the heavier weight "regs" callback,
  that puts in enough information in the struct ftrace_regs to simulate
  a breakpoint exception (needed for kprobes).

  A new config option that audits the timestamps of the ftrace ring
  buffer at most every event recorded.

  Ftrace recursion protection has been cleaned up to move the protection
  to the callback itself (this saves on an extra function call for those
  callbacks).

  Perf now handles its own RCU protection and does not depend on ftrace
  to do it for it (saving on that extra function call).

  New debug option to add "recursed_functions" file to tracefs that
  lists all the places that triggered the recursion protection of the
  function tracer. This will show where things need to be fixed as
  recursion slows down the function tracer.

  The eval enum mapping updates done at boot up are now offloaded to a
  work queue, as it caused a noticeable pause on slow embedded boards.

  Various clean ups and last minute fixes"

* tag 'trace-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (33 commits)
  tracing: Offload eval map updates to a work queue
  Revert: "ring-buffer: Remove HAVE_64BIT_ALIGNED_ACCESS"
  ring-buffer: Add rb_check_bpage in __rb_allocate_pages
  ring-buffer: Fix two typos in comments
  tracing: Drop unneeded assignment in ring_buffer_resize()
  tracing: Disable ftrace selftests when any tracer is running
  seq_buf: Avoid type mismatch for seq_buf_init
  ring-buffer: Fix a typo in function description
  ring-buffer: Remove obsolete rb_event_is_commit()
  ring-buffer: Add test to validate the time stamp deltas
  ftrace/documentation: Fix RST C code blocks
  tracing: Clean up after filter logic rewriting
  tracing: Remove the useless value assignment in test_create_synth_event()
  livepatch: Use the default ftrace_ops instead of REGS when ARGS is available
  ftrace/x86: Allow for arguments to be passed in to ftrace_regs by default
  ftrace: Have the callbacks receive a struct ftrace_regs instead of pt_regs
  MAINTAINERS: assign ./fs/tracefs to TRACING
  tracing: Fix some typos in comments
  ftrace: Remove unused varible 'ret'
  ring-buffer: Add recording of ring buffer recursion into recursed_functions
  ...
2020-12-17 13:22:17 -08:00
Linus Torvalds
005b2a9dc8 tif-task_work.arch-2020-12-14
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl/YJxsQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpjpyEACBdW+YjenjTbkUPeEXzQgkBkTZUYw3g007
 DPcUT1g8PQZXYXlQvBKCvGhhIr7/KVcjepKoowiNQfBNGcIPJTVopW58nzpqAfTQ
 goI2WYGn5EKFFKBPvtH04cJD/Wo8muXdxynKtqyZbnGGgZjQxPrE259b8dpHjBSR
 6L7HHkk0D1oU/5b6h6Ocpg9mc/0iIUCZylySAYY3eGO0JaVPJaXgZSJZYgHxCHll
 Lb+/y/fXdtm/0PmQ3ko0ev54g3yEWqZIX0NsZW1asrButIy+KLzQ2Mz1xFLFDMag
 prtIfwb8tzgc4dFPY090C/azjCh5CPpxqYS6FkRwS0p86n6OhkyXrqfily5Hs4/B
 NC7CBPBSH/j+NKUK7CYZcpTzTpxPjUr9p0anUdlvMJz8FhTb/3YEEZ1UTeWOeHmk
 Yo5SxnFghLeZZeZ1ok6rdymnVa7WEX12SCLGQX31BB2mld0tNbKb4b+FsBF6OUMk
 IUaX6OjwDFVRaysC88BQ4hjcIP1HxsViG4/VZDX15gjAAH2Pvb+7tev+lcDcOhjz
 TCD4GNFspTFzRhh9nT7oxQ679qCh9G9zHbzuIRewnrS6iqvo5SJQB3dR2yrWZRRH
 ySkQFiHpYOlnLJYv0jg9COlGwo2FUdcvKhCvkjQKKBz48rzW/IC0LwKdRQWZDFk3
 FKGzP/NBig==
 =cadT
 -----END PGP SIGNATURE-----

Merge tag 'tif-task_work.arch-2020-12-14' of git://git.kernel.dk/linux-block

Pull TIF_NOTIFY_SIGNAL updates from Jens Axboe:
 "This sits on top of of the core entry/exit and x86 entry branch from
  the tip tree, which contains the generic and x86 parts of this work.

  Here we convert the rest of the archs to support TIF_NOTIFY_SIGNAL.

  With that done, we can get rid of JOBCTL_TASK_WORK from task_work and
  signal.c, and also remove a deadlock work-around in io_uring around
  knowing that signal based task_work waking is invoked with the sighand
  wait queue head lock.

  The motivation for this work is to decouple signal notify based
  task_work, of which io_uring is a heavy user of, from sighand. The
  sighand lock becomes a huge contention point, particularly for
  threaded workloads where it's shared between threads. Even outside of
  threaded applications it's slower than it needs to be.

  Roman Gershman <romger@amazon.com> reported that his networked
  workload dropped from 1.6M QPS at 80% CPU to 1.0M QPS at 100% CPU
  after io_uring was changed to use TIF_NOTIFY_SIGNAL. The time was all
  spent hammering on the sighand lock, showing 57% of the CPU time there
  [1].

  There are further cleanups possible on top of this. One example is
  TIF_PATCH_PENDING, where a patch already exists to use
  TIF_NOTIFY_SIGNAL instead. Hopefully this will also lead to more
  consolidation, but the work stands on its own as well"

[1] https://github.com/axboe/liburing/issues/215

* tag 'tif-task_work.arch-2020-12-14' of git://git.kernel.dk/linux-block: (28 commits)
  io_uring: remove 'twa_signal_ok' deadlock work-around
  kernel: remove checking for TIF_NOTIFY_SIGNAL
  signal: kill JOBCTL_TASK_WORK
  io_uring: JOBCTL_TASK_WORK is no longer used by task_work
  task_work: remove legacy TWA_SIGNAL path
  sparc: add support for TIF_NOTIFY_SIGNAL
  riscv: add support for TIF_NOTIFY_SIGNAL
  nds32: add support for TIF_NOTIFY_SIGNAL
  ia64: add support for TIF_NOTIFY_SIGNAL
  h8300: add support for TIF_NOTIFY_SIGNAL
  c6x: add support for TIF_NOTIFY_SIGNAL
  alpha: add support for TIF_NOTIFY_SIGNAL
  xtensa: add support for TIF_NOTIFY_SIGNAL
  arm: add support for TIF_NOTIFY_SIGNAL
  microblaze: add support for TIF_NOTIFY_SIGNAL
  hexagon: add support for TIF_NOTIFY_SIGNAL
  csky: add support for TIF_NOTIFY_SIGNAL
  openrisc: add support for TIF_NOTIFY_SIGNAL
  sh: add support for TIF_NOTIFY_SIGNAL
  um: add support for TIF_NOTIFY_SIGNAL
  ...
2020-12-16 12:33:35 -08:00
Heiko Carstens
8d93b70118 s390/idle: allow arch_cpu_idle() to be kprobed
Remove NOKPROBE_SYMBOL() for arch_cpu_idle(). This might have made
sense when enabled_wait() (aka arch_cpu_idle()) was called from
udelay.
But now there shouldn't be a reason why s390 should be the only
architecture which doesn't allow arch_cpu_idle() to be probed.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2020-12-16 14:55:50 +01:00
Heiko Carstens
7494755a9a s390/idle: remove raw_local_irq_save()/restore() from arch_cpu_idle()
arch_cpu_idle() gets called with interrupts disabled,
and psw_idle() returns with interrupts disabled.
No reason to use raw_local_irq_save() / restore().

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2020-12-16 14:55:49 +01:00
Heiko Carstens
44292c8684 s390/idle: merge enabled_wait() and arch_cpu_idle()
The only caller of enabled_wait() besides arch_cpu_idle() was
udelay(). Since that call doesn't exist anymore, merge enabled_wait()
and arch_cpu_idle().

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2020-12-16 14:55:49 +01:00
Heiko Carstens
e0d62dcb20 s390/delay: remove udelay_simple()
udelay_simple() callers can make use of the now simplified udelay()
implementation. No need to keep it.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2020-12-16 14:55:49 +01:00
Heiko Carstens
dd6cfe5532 s390/delay: simplify udelay
udelay is implemented by using quite subtle details to make it
possible to load an idle psw and waiting for an interrupt even in irq
context or when interrupts are disabled. Also handling (or better: no
handling) of softirqs is taken into account.

All this is done to optimize for something which should in normal
circumstances never happen: calling udelay to busy wait. Therefore get
rid of the whole complexity and just busy loop like other
architectures are doing it also.

It could have been possible to use diag 0x44 instead of cpu_relax() in
the busy loop, however we have seen too many bad things happen with
diag 0x44 that it seems to be better to simply busy loop.

Also note that with this new implementation kernel preemption does
work when within the udelay loop. This did not work before.

To get a feeling what the former code optimizes for: IPL'ing a kernel
with 'defconfig' and afterwards compiling a kernel ends with a total
of zero udelay calls.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2020-12-16 14:55:49 +01:00
Heiko Carstens
f0c7cf13a3 s390: make calls to TRACE_IRQS_OFF/TRACE_IRQS_ON balanced
In case of udelay CIF_IGNORE_IRQ is set. This leads to an unbalanced
call of TRACE_IRQS_OFF and TRACE_IRQS_ON. That is: from lockdep's
point of view TRACE_IRQS_ON is called one time too often.

This doesn't fix any real bug, just makes the calls balanced.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2020-12-16 14:55:48 +01:00
Heiko Carstens
9365965db0 s390: always clear kernel stack backchain before calling functions
Clear the kernel stack backchain before potentially calling the
lockdep trace_hardirqs_off/on functions. Without this walking the
kernel backchain, e.g. during a panic, might stop too early.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2020-12-16 14:55:48 +01:00
Linus Torvalds
2cffa11e2a Generic interrupt and irqchips subsystem:
Core:
 
      - Consolidation and robustness changes for irq time accounting
 
      - Cleanup and consolidation of irq stats
 
      - Remove the fasteoi IPI flow which has been proved useless
 
      - Provide an interface for converting legacy interrupt mechanism into
        irqdomains
 
  Drivers:
 
      The rare event of not having completely new chip driver code, just new
      DT bindings and extensions of existing drivers to accomodate new
      variants!
 
      - Preliminary support for managed interrupts on platform devices
 
      - Correctly identify allocation of MSIs proxyied by another device
 
      - Generalise the Ocelot support to new SoCs
 
      - Improve GICv4.1 vcpu entry, matching the corresponding KVM optimisation
 
      - Work around spurious interrupts on Qualcomm PDC
 
      - Random fixes and cleanups
 
 Thanks,
 
 	tglx
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl/YwZgTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoW4CD/90rTi1OQrMe3nb5okVjUZmktz/K3BN
 Cl5+evFiXiNoH+yJSMIVP+8eMAtBH6RgoaD0EUtSYmgzb9h/JRRQYwtPxobXcMb2
 2xcWyLPJkVJL431JKNM8BBRYjLA2VnQ6Ia+Kx3BxqpgKXn5+cEMh1dwIy27Ll2rj
 +2NHAQe1sHL7o/KcCDhYqbVIDjw5K/d7YPwjEuPeEoNv1DOxrOCdCEfgFN0jBtRE
 CoaRTBskeAaHIzHNp47Mxyz43g4tA/D8kB68X0OjpEykVkPUbgNK1FHSwaPbIsFT
 FTSPU3zg8Q6DZ+RGyjNJykIFgUbirlJxARk2c6Ct8Kc3DN6K1jQt4EsU7CXRCc98
 BTBjUNeFeNj3irZ4GHhyMKOQJCA1Z5nCRfBUGiW6gK8183us3BLfH5DM1zEsAYUh
 DCp+UKsLuXhbB80EWq7kl82/2mNGZ8En8EerE6XJA7Z3JN8FplOHEuLezYYzwzbb
 RIes971Vc50J2u2Wf/M2c3PDz3D/4FzfwUeA4LJfTnmOL09RYZ8CsqSckpx4ku/F
 XiBnjwtGEpDXWJ8z13DC7yONrxFGByV19+sqHTBlub5DmIs0gXjhC0dKAPAruUIS
 iCC+Vx6xLgOpTDu8shFsjibbi9Hb6vuZrF2Te+WR5Rf7d80C0J4b5K5PS4daUjr6
 IuD2tz+3CtPjHw==
 =iytv
 -----END PGP SIGNATURE-----

Merge tag 'irq-core-2020-12-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq updates from Thomas Gleixner:
 "Generic interrupt and irqchips subsystem updates. Unusually, there is
  not a single completely new irq chip driver, just new DT bindings and
  extensions of existing drivers to accomodate new variants!

  Core:

   - Consolidation and robustness changes for irq time accounting

   - Cleanup and consolidation of irq stats

   - Remove the fasteoi IPI flow which has been proved useless

   - Provide an interface for converting legacy interrupt mechanism into
     irqdomains

  Drivers:

   - Preliminary support for managed interrupts on platform devices

   - Correctly identify allocation of MSIs proxyied by another device

   - Generalise the Ocelot support to new SoCs

   - Improve GICv4.1 vcpu entry, matching the corresponding KVM
     optimisation

   - Work around spurious interrupts on Qualcomm PDC

   - Random fixes and cleanups"

* tag 'irq-core-2020-12-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (54 commits)
  irqchip/qcom-pdc: Fix phantom irq when changing between rising/falling
  driver core: platform: Add devm_platform_get_irqs_affinity()
  ACPI: Drop acpi_dev_irqresource_disabled()
  resource: Add irqresource_disabled()
  genirq/affinity: Add irq_update_affinity_desc()
  irqchip/gic-v3-its: Flag device allocation as proxied if behind a PCI bridge
  irqchip/gic-v3-its: Tag ITS device as shared if allocating for a proxy device
  platform-msi: Track shared domain allocation
  irqchip/ti-sci-intr: Fix freeing of irqs
  irqchip/ti-sci-inta: Fix printing of inta id on probe success
  drivers/irqchip: Remove EZChip NPS interrupt controller
  Revert "genirq: Add fasteoi IPI flow"
  irqchip/hip04: Make IPIs use handle_percpu_devid_irq()
  irqchip/bcm2836: Make IPIs use handle_percpu_devid_irq()
  irqchip/armada-370-xp: Make IPIs use handle_percpu_devid_irq()
  irqchip/gic, gic-v3: Make SGIs use handle_percpu_devid_irq()
  irqchip/ocelot: Add support for Jaguar2 platforms
  irqchip/ocelot: Add support for Serval platforms
  irqchip/ocelot: Add support for Luton platforms
  irqchip/ocelot: prepare to support more SoC
  ...
2020-12-15 15:03:31 -08:00
Linus Torvalds
ac73e3dc8a Merge branch 'akpm' (patches from Andrew)
Merge misc updates from Andrew Morton:

 - a few random little subsystems

 - almost all of the MM patches which are staged ahead of linux-next
   material. I'll trickle to post-linux-next work in as the dependents
   get merged up.

Subsystems affected by this patch series: kthread, kbuild, ide, ntfs,
ocfs2, arch, and mm (slab-generic, slab, slub, dax, debug, pagecache,
gup, swap, shmem, memcg, pagemap, mremap, hmm, vmalloc, documentation,
kasan, pagealloc, memory-failure, hugetlb, vmscan, z3fold, compaction,
oom-kill, migration, cma, page-poison, userfaultfd, zswap, zsmalloc,
uaccess, zram, and cleanups).

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (200 commits)
  mm: cleanup kstrto*() usage
  mm: fix fall-through warnings for Clang
  mm: slub: convert sysfs sprintf family to sysfs_emit/sysfs_emit_at
  mm: shmem: convert shmem_enabled_show to use sysfs_emit_at
  mm:backing-dev: use sysfs_emit in macro defining functions
  mm: huge_memory: convert remaining use of sprintf to sysfs_emit and neatening
  mm: use sysfs_emit for struct kobject * uses
  mm: fix kernel-doc markups
  zram: break the strict dependency from lzo
  zram: add stat to gather incompressible pages since zram set up
  zram: support page writeback
  mm/process_vm_access: remove redundant initialization of iov_r
  mm/zsmalloc.c: rework the list_add code in insert_zspage()
  mm/zswap: move to use crypto_acomp API for hardware acceleration
  mm/zswap: fix passing zero to 'PTR_ERR' warning
  mm/zswap: make struct kernel_param_ops definitions const
  userfaultfd/selftests: hint the test runner on required privilege
  userfaultfd/selftests: fix retval check for userfaultfd_open()
  userfaultfd/selftests: always dump something in modes
  userfaultfd: selftests: make __{s,u}64 format specifiers portable
  ...
2020-12-15 12:53:37 -08:00
Dmitry Safonov
871402e05b mm: forbid splitting special mappings
Don't allow splitting of vm_special_mapping's.  It affects vdso/vvar
areas.  Uprobes have only one page in xol_area so they aren't affected.

Those restrictions were enforced by checks in .mremap() callbacks.
Restrict resizing with generic .split() callback.

Link: https://lkml.kernel.org/r/20201013013416.390574-7-dima@arista.com
Signed-off-by: Dmitry Safonov <dima@arista.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Brian Geffon <bgeffon@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15 12:13:41 -08:00
Thomas Gleixner
ba22d0ede3 s390/irq: Use irq_desc_kstat_cpu() in show_msi_interrupt()
The irq descriptor is already there, no need to look it up again.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Link: https://lore.kernel.org/r/20201210194043.769108348@linutronix.de
2020-12-15 16:19:31 +01:00
Thomas Gleixner
3c41e57a1e irqchip updates for Linux 5.11
- Preliminary support for managed interrupts on platform devices
 - Correctly identify allocation of MSIs proxyied by another device
 - Remove the fasteoi IPI flow which has been proved useless
 - Generalise the Ocelot support to new SoCs
 - Improve GICv4.1 vcpu entry, matching the corresponding KVM optimisation
 - Work around spurious interrupts on Qualcomm PDC
 - Random fixes and cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAl/Uxq8PHG1hekBrZXJu
 ZWwub3JnAAoJECPQ0LrRPXpDoW0P/0ZMDvFPxrfnJD46exUgUOPuuFF8jZxAlxD8
 7UExqar7u6yX7bbq394jPgtOOxldDagfCx/jCXgb9ja7DK5EHKRcrfjaDT8knHi2
 Keg5RaRMRi9TVltvWQTxAkXwSv0Atl881qqsndPeZCez0GNZp+HB34s+rNkZwBOu
 MBrWihMQOSv5QE6milsNc7HXLSHM1eLZ7Y2XgumNtKrIGEX9yZI7qwdMofwP8Za3
 ayMOvc1WAWaTJI7Mg5ac1yTCVbqLmRHhCtws6c6DMgaRu6SI0itmbpQzkDuJJIe3
 k9h4KQPaKAFcQsoo3GV0MKTMm63eq82XT3CAdv+htYRY1z95D2+nzNK+mJtsGptX
 gJ2zeJkUb4u+yVtNguL9qjo5ssCXV/6IybJxv6baaEFnSwQMUwqa066NdxmtqfIe
 1BOWnc153a7SRbQ34M9/llje+v8YJbueGMS2RFR2LQ6IjjpaHsXh+YCZokfA/kdk
 zGbOUD5WWFtFD1T3UoaJ4gFt+pzHjNqym4CcEj4S1Vf5y+POUkNmC+GYK+xfm2Fp
 WJMbdIUxJhHFRD9L1ShtfAVUSbp712VOOdILp9rYAkOdqfb51BVUiMUP++s2dGp1
 ZIT78qt7kTKT1CxbDdFAjzsi7RoMqdSGYgKmG4sVprELeZnFwq47nBkBr8XEQ1TT
 0ccEUOY8
 =7Z24
 -----END PGP SIGNATURE-----

Merge tag 'irqchip-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/core

Pull irqchip updates for 5.11 from Marc Zyngier:

  - Preliminary support for managed interrupts on platform devices
  - Correctly identify allocation of MSIs proxyied by another device
  - Remove the fasteoi IPI flow which has been proved useless
  - Generalise the Ocelot support to new SoCs
  - Improve GICv4.1 vcpu entry, matching the corresponding KVM optimisation
  - Work around spurious interrupts on Qualcomm PDC
  - Random fixes and cleanups

Link: https://lore.kernel.org/r/20201212135626.1479884-1-maz@kernel.org
2020-12-15 10:48:07 +01:00
Linus Torvalds
586592478b - Add support for the hugetlb_cma command line option to allocate gigantic
hugepages using CMA:
 
 - Add arch_get_random_long() support.
 
 - Add ap bus userspace notifications.
 
 - Increase default size of vmalloc area to 512GB and otherwise let it increase
   dynamically by the size of physical memory. This should fix all occurrences
   where the vmalloc area was not large enough.
 
 - Completely get rid of set_fs() (aka select SET_FS) and rework address space
   handling while doing that; making address space handling much more simple.
 
 - Reimplement getcpu vdso syscall in C.
 
 - Add support for extended SCLP responses (> 4k). This allows e.g. to handle
   also potential large system configurations.
 
 - Simplify KASAN by removing 3-level page table support and only supporting
   4-levels from now on.
 
 - Improve debug-ability of the kernel decompressor code, which now prints also
   stack traces and symbols in case of problems to the console.
 
 - Remove more power management leftovers.
 
 - Other various fixes and improvements all over the place.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEECMNfWEw3SLnmiLkZIg7DeRspbsIFAl/XQAIACgkQIg7DeRsp
 bsIdYA//TCtSTrka/yW03b4b0FuLtKNpKB5zQgaqtEurbgbZhXdZ7/L3N+KavPQH
 njmKAARxebRIJB0DoZ9w9XpSb+mI3Q5y8GMi5xvUzjtJj/c6ahi3cEXIpuDR0PBv
 bf4UYSUpvndOwVFVOEZLeaJwKciCYvdoOwjBCmoKz9orthNVdVh5vztVRE2dMkNl
 y9C/Pb3w4ZMYxrbETuYnxqzueCxUhVOJmwodkGdP6bxBeemOwKn2TLVZQCbGGe7y
 BZpG+xsTaLZV1dZUZuDSOzVi1CTzJBGaJuYy5ewddWfxi7+mxqwEg/4s6nGKAciX
 Fa3T6aqLpUmDDN842Ql9TZHrwR+GYrlAp3XaQETOusUuEQLvP1dKRj/RXiDXN3MZ
 L+Mfa56dbs9GkVaNN/N+L7Y4z/6tZ2caX4X2S22Cp/QzvRTrG4jXVTn0r4WIcY/2
 vn7fEy71LJ97CLQTDryyfJx7YNMdyIlUZY5ICAk1bt8nz1lB/IoZy0YoCBvPxIzb
 cEKcFTOdOtZR4WY3F8+kU0Nv1HQ8yPBzMaAqSNERvNQhMvoCChxntmyYxuVgH5iB
 SACADqEJKQ3hb4nMnxkeTrmmrhH4e0kdF9lAEytX+VYbjAq/6MY+qYo+QHDYkFWh
 BndxI54d6IiktDcKuBcpKJM7S/7N2t+EsLTS6Dhux7dbDZ2+Upw=
 =UR7j
 -----END PGP SIGNATURE-----

Merge tag 's390-5.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 updates from Heiko Carstens:

 - Add support for the hugetlb_cma command line option to allocate
   gigantic hugepages using CMA

 - Add arch_get_random_long() support.

 - Add ap bus userspace notifications.

 - Increase default size of vmalloc area to 512GB and otherwise let it
   increase dynamically by the size of physical memory. This should fix
   all occurrences where the vmalloc area was not large enough.

 - Completely get rid of set_fs() (aka select SET_FS) and rework address
   space handling while doing that; making address space handling much
   more simple.

 - Reimplement getcpu vdso syscall in C.

 - Add support for extended SCLP responses (> 4k). This allows e.g. to
   handle also potential large system configurations.

 - Simplify KASAN by removing 3-level page table support and only
   supporting 4-levels from now on.

 - Improve debug-ability of the kernel decompressor code, which now
   prints also stack traces and symbols in case of problems to the
   console.

 - Remove more power management leftovers.

 - Other various fixes and improvements all over the place.

* tag 's390-5.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (62 commits)
  s390/mm: add support to allocate gigantic hugepages using CMA
  s390/crypto: add arch_get_random_long() support
  s390/smp: perform initial CPU reset also for SMT siblings
  s390/mm: use invalid asce for user space when switching to init_mm
  s390/idle: fix accounting with machine checks
  s390/idle: add missing mt_cycles calculation
  s390/boot: add build-id to decompressor
  s390/kexec_file: fix diag308 subcode when loading crash kernel
  s390/cio: fix use-after-free in ccw_device_destroy_console
  s390/cio: remove pm support from ccw bus driver
  s390/cio: remove pm support from css-bus driver
  s390/cio: remove pm support from IO subchannel drivers
  s390/cio: remove pm support from chsc subchannel driver
  s390/vmur: remove unused pm related functions
  s390/tape: remove unsupported PM functions
  s390/cio: remove pm support from eadm-sch drivers
  s390: remove pm support from console drivers
  s390/dasd: remove unused pm related functions
  s390/zfcp: remove pm support from zfcp driver
  s390/ap: let bus_register() add the AP bus sysfs attributes
  ...
2020-12-14 16:22:26 -08:00
Gerald Schaefer
343dbdb7cb s390/mm: add support to allocate gigantic hugepages using CMA
Commit cf11e85fc0 ("mm: hugetlb: optionally allocate gigantic hugepages
using cma") added support for allocating gigantic hugepages using CMA,
by specifying the hugetlb_cma= kernel parameter, which will disable any
boot-time allocation of gigantic hugepages.

This patch enables that option also for s390.

Signed-off-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2020-12-10 21:11:01 +01:00
Sven Schnelle
b5e438ebd7 s390/smp: perform initial CPU reset also for SMT siblings
Not resetting the SMT siblings might leave them in unpredictable
state. One of the observed problems was that the CPU timer wasn't
reset and therefore large system time values where accounted during
CPU bringup.

Cc: <stable@kernel.org> # 4.0
Fixes: 10ad34bc76 ("s390: add SMT support")
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2020-12-09 21:02:08 +01:00
Sven Schnelle
454efcf82e s390/idle: fix accounting with machine checks
When a machine check interrupt is triggered during idle, the code
is using the async timer/clock for idle time calculation. It should use
the machine check enter timer/clock which is passed to the macro.

Fixes: 0b0ed657fe ("s390: remove critical section cleanup from entry.S")
Cc: <stable@vger.kernel.org> # 5.8
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2020-12-09 21:02:07 +01:00
Sven Schnelle
e259b3fafa s390/idle: add missing mt_cycles calculation
During removal of the critical section cleanup the calculation
of mt_cycles during idle was removed. This causes invalid
accounting on systems with SMT enabled.

Fixes: 0b0ed657fe ("s390: remove critical section cleanup from entry.S")
Cc: <stable@vger.kernel.org> # 5.8
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2020-12-09 21:02:07 +01:00
Frederic Weisbecker
8a6a5920d3 sched/vtime: Consolidate IRQ time accounting
The 3 architectures implementing CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
all have their own version of irq time accounting that dispatch the
cputime to the appropriate index: hardirq, softirq, system, idle,
guest... from an all-in-one function.

Instead of having these ad-hoc versions, move the cputime destination
dispatch decision to the core code and leave only the actual per-index
cputime accounting to the architecture.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20201202115732.27827-4-frederic@kernel.org
2020-12-02 20:20:05 +01:00
Frederic Weisbecker
2b91ec9f55 s390/vtime: Use the generic IRQ entry accounting
s390 has its own version of IRQ entry accounting because it doesn't
account the idle time the same way the other architectures do. Only
the actual idle sleep time is accounted as idle time, the rest of the
idle task execution is accounted as system time.

Make the generic IRQ entry accounting aware of architectures that have
their own way of accounting idle time and convert s390 to use it.

This prepares s390 to get involved in further consolidations of IRQ
time accounting.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20201202115732.27827-3-frederic@kernel.org
2020-12-02 20:20:04 +01:00
Frederic Weisbecker
7197688b20 sched/cputime: Remove symbol exports from IRQ time accounting
account_irq_enter_time() and account_irq_exit_time() are not called
from modules. EXPORT_SYMBOL_GPL() can be safely removed from the IRQ
cputime accounting functions called from there.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20201202115732.27827-2-frederic@kernel.org
2020-12-02 20:20:04 +01:00
Heiko Carstens
b1cae1f84a s390: fix irq state tracing
With commit 58c644ba51 ("sched/idle: Fix arch_cpu_idle() vs
tracing") common code calls arch_cpu_idle() with a lockdep state that
tells irqs are on.

This doesn't work very well for s390: psw_idle() will enable interrupts
to wait for an interrupt. As soon as an interrupt occurs the interrupt
handler will verify if the old context was psw_idle(). If that is the
case the interrupt enablement bits in the old program status word will
be cleared.

A subsequent test in both the external as well as the io interrupt
handler checks if in the old context interrupts were enabled. Due to
the above patching of the old program status word it is assumed the
old context had interrupts disabled, and therefore a call to
TRACE_IRQS_OFF (aka trace_hardirqs_off_caller) is skipped. Which in
turn makes lockdep incorrectly "think" that interrupts are enabled
within the interrupt handler.

Fix this by unconditionally calling TRACE_IRQS_OFF when entering
interrupt handlers. Also call unconditionally TRACE_IRQS_ON when
leaving interrupts handlers.

This leaves the special psw_idle() case, which now returns with
interrupts disabled, but has an "irqs on" lockdep state. So callers of
psw_idle() must adjust the state on their own, if required. This is
currently only __udelay_disabled().

Fixes: 58c644ba51 ("sched/idle: Fix arch_cpu_idle() vs tracing")
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2020-12-02 18:17:50 +01:00
Heiko Carstens
1ab3001b6e s390/vdso: add missing prototypes for vdso functions
clang W=1 warns about missing prototypes:

>> arch/s390/kernel/vdso64/getcpu.c:8:5: warning: no previous prototype for function '__s390_vdso_getcpu' [-Wmissing-prototypes]
   int __s390_vdso_getcpu(unsigned *cpu, unsigned *node, struct getcpu_cache *unused)
       ^

Add a local header file in order to get rid of this warnings.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2020-11-30 14:10:50 +01:00
Linus Torvalds
f91a3aa6bc Yet two more places which invoke tracing from RCU disabled regions in the
idle path. Similar to the entry path the low level idle functions have to
 be non-instrumentable.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl/DpAUTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoXSLD/9klc0YimnEnROW6Q5Svb2IcyIutmXF
 bOIY1bYYoKILOBj3wyvDUhmdMuq5zh7H9yG11hO8MaVVWVQcLcOMLdHTYm9dcdmF
 xQk33+xqjuhRShB+nEmC9ayYtWogtH6W6uZ6WDtF9ZltMKU85n5ddGJ/Fvo+HoCb
 NbOdHGJdJ3/3ZCeHnxOnxM+5/GwjkBuccTV/tXmb3yXrfU9DBySyQ4/UchcpF43w
 LcEb0kiQbpZsBTByKJOQV8+RR654S0sILlvRwVXpmj94vrgGwhlVk1/9rz7tkOhF
 ksoo1mTVu75LMt22G/hXxE63787yRvFdHjapf0+kCOAuhl992NK+xlGDH8o9DXcu
 9y73D4bI0HnDFs20w6vs20iLvxECJiYHJqlgR5ZwFUToceaNgtiYr8kzuD7Zbae1
 KG2E7BuNSwHWMtf97fGn44GZknPEOaKdDn4Wv6/bvKHxLm77qe11RKF70Stcz2AI
 am13KmQzzsHGF5qNWwpElRUxSdxfJMR66RnOdTQULGrRedaZTFol/y2pnVzTSe3k
 SZnlpL5kE7y92UYDogPb5wWA7b+YkJN0OdSkRFy1FH26ZG8E4M7ZJ2tql5Sw7pGM
 lsTjXpAUphnK5rz7QcYE8KAZWj//fIAcElIrvdklVcBnS3IqjfksYW27B64133vx
 cT1B/lA1PHXj6Q==
 =raED
 -----END PGP SIGNATURE-----

Merge tag 'locking-urgent-2020-11-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking fixes from Thomas Gleixner:
 "Two more places which invoke tracing from RCU disabled regions in the
  idle path.

  Similar to the entry path the low level idle functions have to be
  non-instrumentable"

* tag 'locking-urgent-2020-11-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  intel_idle: Fix intel_idle() vs tracing
  sched/idle: Fix arch_cpu_idle() vs tracing
2020-11-29 11:19:26 -08:00
Linus Torvalds
3913a2bc81 ARM:
- Fix alignment of the new HYP sections
 - Fix GICR_TYPER access from userspace
 
 S390:
 - do not reset the global diag318 data for per-cpu reset
 - do not mark memory as protected too early
 - fix for destroy page ultravisor call
 
 x86:
 - fix for SEV debugging
 - fix incorrect return code
 - fix for "noapic" with PIC in userspace and LAPIC in kernel
 - fix for 5-level paging
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAl/BKpQUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroPrZgf+Jdw1ONU5hFLl5Xz2YneVppqMr3nh
 X/Nr/dGzP+ve2FPNgkMotwqOWb/6jwKYKbliB2Q6fS51/7MiV7TDizna8ZpzEn12
 M0/NMWtW7Luq7yTTnXUhClG4QfRvn90EaflxUYxCBSRRhDleJ9sCl4Ga5b1fDIdQ
 AeDdqJV4ElCGUrPM1my4vemrbFeiiEeDeWZvb6TP5LlJS+EDZeehk9zEAB7PFwAu
 oX3O8WUbRxRYakZR1PPIn8e0qh2zaVDFUk/sZKJLOCCPx2UnOErf3jV6rQEMeSPC
 5aOspfq+gI3jukufdyNxcKxRSj8Jw63f0vDaUgd4H71dsG390gM6onQiQg==
 =IyC5
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "ARM:
   - Fix alignment of the new HYP sections
   - Fix GICR_TYPER access from userspace

  S390:
   - do not reset the global diag318 data for per-cpu reset
   - do not mark memory as protected too early
   - fix for destroy page ultravisor call

  x86:
   - fix for SEV debugging
   - fix incorrect return code
   - fix for 'noapic' with PIC in userspace and LAPIC in kernel
   - fix for 5-level paging"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  kvm: x86/mmu: Fix get_mmio_spte() on CPUs supporting 5-level PT
  KVM: x86: Fix split-irqchip vs interrupt injection window request
  KVM: x86: handle !lapic_in_kernel case in kvm_cpu_*_extint
  MAINTAINERS: Update email address for Sean Christopherson
  MAINTAINERS: add uv.c also to KVM/s390
  s390/uv: handle destroy page legacy interface
  KVM: arm64: vgic-v3: Drop the reporting of GICR_TYPER.Last for userspace
  KVM: SVM: fix error return code in svm_create_vcpu()
  KVM: SVM: Fix offset computation bug in __sev_dbg_decrypt().
  KVM: arm64: Correctly align nVHE percpu data
  KVM: s390: remove diag318 reset code
  KVM: s390: pv: Mark mm as protected after the set secure parameters and improve cleanup
2020-11-27 11:04:13 -08:00
Linus Torvalds
80145ac2f7 - disable interrupts when restoring fpu and vector registers,
otherwise KVM guests might see corrupted register contents
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEECMNfWEw3SLnmiLkZIg7DeRspbsIFAl+9R6cACgkQIg7DeRsp
 bsK7Pw/8CeOCtG+MfVUugdX1AwZhA9h8IztJ5QrPmzp87rTE7spypaAahockzTpz
 GKIJUY6jWluqllOzy1K3+e5VqOSlprz6QAxCipPqP8IPc2J1Pyy7uRqOcATJ6rc+
 N1Yb4sBMtMW38c/fKSiN4DLKZ4bDZHdqAIJXDxmJC2oE1o2X0is6s7mcZje+zFz1
 jwIanfQ9J9NEz1cPeFmdTw1f5vj34M0eLGedWQn5zzomfQwmXWu/Yv1ErZzzFIIi
 Uh9U1m06j+AEKlTiotF+UHwuyfelgynbJ4uexVBglll3d68XlRjA2y12h91Y4+gB
 IxOiFQSMwZjdSKF/Js6xibpt1Wtvxu5keH9ziOu3yduLyY0SXxWQM/SRLe+Gt5Jy
 tJdR574XYbq24PLI0zFdrfjQ2npqJsIGWQDqYMCr0X+hAXz9JDZpzx7qQYRmyWIi
 37ycw2aOMtObD2CyYLX3vDDcZxNOTSxRBt7jYwwMJIfaM5hKTGomJN4itcKZZReN
 iuwPrRZq1cRNlmE6OIOHXTpFHV3hufGHJmhXmmVe5I707dS6L8CyS5XQ6EVn4rL7
 hjdvhywdljvWUg6DoXDQlG4OKMTgJneyrlZHhjoVxE9kANPmf5KJzM8Jga3BwOPO
 3Dp45LGn3lwyTzvF+DK46A9WAVi1ercteCg6yNqUzbHWFshUvOM=
 =Ry2h
 -----END PGP SIGNATURE-----

Merge tag 's390-5.10-5' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 fix from Heiko Carstens:
 "Disable interrupts when restoring fpu and vector registers, otherwise
  KVM guests might see corrupted register contents"

* tag 's390-5.10-5' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390: fix fpu restore in entry.S
2020-11-24 12:15:44 -08:00
Peter Zijlstra
58c644ba51 sched/idle: Fix arch_cpu_idle() vs tracing
We call arch_cpu_idle() with RCU disabled, but then use
local_irq_{en,dis}able(), which invokes tracing, which relies on RCU.

Switch all arch_cpu_idle() implementations to use
raw_local_irq_{en,dis}able() and carefully manage the
lockdep,rcu,tracing state like we do in entry.

(XXX: we really should change arch_cpu_idle() to not return with
interrupts enabled)

Reported-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lkml.kernel.org/r/20201120114925.594122626@infradead.org
2020-11-24 16:47:35 +01:00
Heiko Carstens
80f0630624 s390/vdso: reimplement getcpu vdso syscall
Implement the previously removed getcpu vdso syscall by using the
TOD programmable field to pass the cpu number to user space.

Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2020-11-23 12:01:13 +01:00
Heiko Carstens
062e527956 s390/mm: add debug user asce support
Verify on exit to user space that always
- the primary ASCE (cr1) is set to kernel ASCE
- the secondary ASCE (cr7) is set to user ASCE

If this is not the case: panic since something went terribly wrong.

Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2020-11-23 12:01:12 +01:00
Heiko Carstens
0290c9e328 s390/mm: use invalid asce instead of kernel asce
Create a region 3 page table which contains only invalid entries, and
use that via "s390_invalid_asce" instead of the kernel ASCE whenever
there is either
- no user address space available, e.g. during early startup
- as an intermediate ASCE when address spaces are switched

This makes sure that user space accesses in such situations are
guaranteed to fail.

Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2020-11-23 12:01:12 +01:00
Heiko Carstens
87d5986345 s390/mm: remove set_fs / rework address space handling
Remove set_fs support from s390. With doing this rework address space
handling and simplify it. As a result address spaces are now setup
like this:

CPU running in              | %cr1 ASCE | %cr7 ASCE | %cr13 ASCE
----------------------------|-----------|-----------|-----------
user space                  |  user     |  user     |  kernel
kernel, normal execution    |  kernel   |  user     |  kernel
kernel, kvm guest execution |  gmap     |  user     |  kernel

To achieve this the getcpu vdso syscall is removed in order to avoid
secondary address mode and a separate vdso address space in for user
space. The getcpu vdso syscall will be implemented differently with a
subsequent patch.

The kernel accesses user space always via secondary address space.
This happens in different ways:
- with mvcos in home space mode and directly read/write to secondary
  address space
- with mvcs/mvcp in primary space mode and copy from primary space to
  secondary space or vice versa
- with e.g. cs in secondary space mode and access secondary space

Switching translation modes happens with sacf before and after
instructions which access user space, like before.

Lazy handling of control register reloading is removed in the hope to
make everything simpler, but at the cost of making kernel entry and
exit a bit slower. That is: on kernel entry the primary asce is always
changed to contain the kernel asce, and on kernel exit the primary
asce is changed again so it contains the user asce.

In kernel mode there is only one exception to the primary asce: when
kvm guests are executed the primary asce contains the gmap asce (which
describes the guest address space). The primary asce is reset to
kernel asce whenever kvm guest execution is interrupted, so that this
doesn't has to be taken into account for any user space accesses.

Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2020-11-23 12:01:12 +01:00
Heiko Carstens
77663819d4 Merge branch 'fixes' into features
* fixes:
  s390: fix fpu restore in entry.S

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2020-11-23 12:00:42 +01:00
Sven Schnelle
1179f170b6 s390: fix fpu restore in entry.S
We need to disable interrupts in load_fpu_regs(). Otherwise an
interrupt might come in after the registers are loaded, but before
CIF_FPU is cleared in load_fpu_regs(). When the interrupt returns,
CIF_FPU will be cleared and the registers will never be restored.

The entry.S code usually saves the interrupt state in __SF_EMPTY on the
stack when disabling/restoring interrupts. sie64a however saves the pointer
to the sie control block in __SF_SIE_CONTROL, which references the same
location.  This is non-obvious to the reader. To avoid thrashing the sie
control block pointer in load_fpu_regs(), move the __SIE_* offsets eight
bytes after __SF_EMPTY on the stack.

Cc: <stable@vger.kernel.org> # 5.8
Fixes: 0b0ed657fe ("s390: remove critical section cleanup from entry.S")
Reported-by: Pierre Morel <pmorel@linux.ibm.com>
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2020-11-23 11:52:13 +01:00
Julian Wiedmann
074ff04e27 s390/stp: let subsys_system_register() sysfs attributes
Instead of creating the sysfs attributes for the stp root_dev by hand,
pass them to subsys_system_register() as parameter.

This also ensures that the attributes are available when the KOBJ_ADD
event is raised.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2020-11-20 19:19:11 +01:00
Vasily Gorbik
c9343637d6 s390/ftrace: assume -mhotpatch or -mrecord-mcount always available
Currently the kernel minimal compiler requirement is gcc 4.9 or
clang 10.0.1.
* gcc -mhotpatch option is supported since 4.8.
* A combination of -pg -mrecord-mcount -mnop-mcount -mfentry flags is
supported since gcc 9 and since clang 10.

Drop support for old -pg function prologues. Which leaves binary
compatible -mhotpatch / -mnop-mcount -mfentry prologues in a form:
	brcl	0,0
Which are also do not require initial nop optimization / conversion and
presence of _mcount symbol.

Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2020-11-20 19:19:11 +01:00
Vasily Gorbik
73045a08cf s390: unify identity mapping limits handling
Currently we have to consider too many different values which
in the end only affect identity mapping size. These are:
1. max_physmem_end - end of physical memory online or standby.
   Always <= end of the last online memory block (get_mem_detect_end()).
2. CONFIG_MAX_PHYSMEM_BITS - the maximum size of physical memory the
   kernel is able to support.
3. "mem=" kernel command line option which limits physical memory usage.
4. OLDMEM_BASE which is a kdump memory limit when the kernel is executed as
   crash kernel.
5. "hsa" size which is a memory limit when the kernel is executed during
   zfcp/nvme dump.

Through out kernel startup and run we juggle all those values at once
but that does not bring any amusement, only confusion and complexity.

Unify all those values to a single one we should really care, that is
our identity mapping size.

Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2020-11-20 19:19:10 +01:00
Heiko Carstens
0cd9b7230c s390: add separate program check exit path
System call and program check handler both use the system call exit
path when returning to previous context. However the program check
handler jumps right to the end of the system call exit path if the
previous context is kernel context.

This lead to the quite odd double disabling of interrupts in the
system call exit path introduced with commit ce9dfafe29 ("s390:
fix system call exit path").

To avoid that have a separate program check handler exit path if the
previous context is kernel context.

Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2020-11-20 19:17:24 +01:00
Heiko Carstens
6c81603801 Merge branch 'fixes' into features
* fixes:
  s390/cpum_sf.c: fix file permission for cpum_sfb_size
  s390: update defconfigs
  s390: fix system call exit path

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2020-11-20 19:15:58 +01:00
Paolo Bonzini
79af02af1d KVM: s390: Fix for destroy page ultravisor call
- handle response code from older firmware
 - add uv.c to KVM: s390/s390 maintainer list
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJftQ/EAAoJEBF7vIC1phx8eW0P/RQkoX/307h3Y+ywipqPwx+m
 5sTDbL728pSH3RbJWDm3iSLvuvsfNzUE9gqluM9MSoYmzpVbqfTkwU2f7e1JdJJp
 DMt55XTaeMWliOdrDGiSJ5tlmSDjoQA8OaXfnk9Z5LBt71duej6iMV4o1a0JItv2
 h8cCMyY8HcLceTApNk6arogLZ/S+IftvlsTlPov+pxxwXehIIfdKhpUYdIAGPXhS
 lRQUD9umIBhXPgD+u/NRdH3LsPkvJ31vqAhgqPJMp8TVH13rQjtirYF1rs/zC8Qi
 qVKHNpvxDYtzduBEXCpBbCvu1hOLZC3vyAFaBFiCzoqRzaCDjd8TI4OzjATzYjkX
 lEeRlEzjU0G+mseE2EvSfklGrHXXu5MUJ7x4ojfMOd0ZwZXoOe3Qg1o9+QX+chX/
 IZ1oWaiqm43eF2PNlrMeLbSQwJ65qO4IxtfAtiGAF8U99drEDjObpCVPeCDcy48U
 9jzwek/wI38JNqMY5sIYBF5qJIG3KwY1sGpEnsSLpjw+gsQlSxzscqZmmm+1xR7l
 RaLYKER+68vv7QdQhzK5CSGlHQU77EfZTYKDEhH1Sm/wZ5FNuHVQP2CYbwDEzq1O
 AsdvJh0HXZ+QARseDuQzrEe+70ghLKuF99mIi0NnJANgyhKxjOqySlKaSLYkXrDU
 GkW9isvZU1hnkTLli0iF
 =YfhG
 -----END PGP SIGNATURE-----

Merge tag 'kvm-s390-master-5.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into kvm-master

KVM: s390: Fix for destroy page ultravisor call

- handle response code from older firmware
- add uv.c to KVM: s390/s390 maintainer list
2020-11-18 12:04:05 -05:00
Christian Borntraeger
4c80d05714 s390/uv: handle destroy page legacy interface
Older firmware can return rc=0x107 rrc=0xd for destroy page if the
page is already non-secure. This should be handled like a success
as already done by newer firmware.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Fixes: 1a80b54d1c ("s390/uv: add destroy page call")
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
2020-11-18 13:08:49 +01:00
Linus Torvalds
111e91a6df - fix system call exit path; avoid return to user space with
any TIF/CIF/PIF set
 
 - fix file permission for cpum_sfb_size parameter
 
 - another small defconfig update
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEECMNfWEw3SLnmiLkZIg7DeRspbsIFAl+0EOoACgkQIg7DeRsp
 bsIh5g/+LuitpduJRvbyb9Km1A+CSflhC7USScBnyO0n2MRlo1c4M2rRIoUtOVD4
 ktOV03CcKAjahw3umIx9euHkqYo5qgUVkSgx9q23R0GMf3iSXwh4wKKpe/YAuiad
 qsAunpD6xRtKm2xqnnSGYiZ8gKDRw7N+nZBWNrZa74I/thcNZbq1d7TBmrTJrkYL
 EH8JxTvPohNpjtDYOwVh8XcKPl1tT3R7N9/bmudTiOtGQtDWCsOjg4XisAc10ovw
 thCmr1t32SBifdhk6HE7AQrA73EpazDQlAUZlPVb+E7JJypp5gUDSkJO1wZ+TkhW
 WJgIgJGzeyJ9iqLQQcnwdxQW91spKr/gYw9yy5gZDZ1uvCclqdfKo/sha1N+xX3F
 j67+h/LEOGV3d02mBlDi6+4fnjHbnyWhDUivi3Atp7PHmWGd1qTPjLqZ3NsXZrLT
 8sTX6c77g8YqzC++Q2goXPDmToxqcT1LCPpAVSNYY3BdAsOgvMJSlFVia1If5SAv
 6MU8MUWTORBqh7c/hB0Ka+cVJUxtZ6Pt/HESM9qONhTEmAqNfeWvPgsSSlLytl39
 PS9RDL6vw29rsOvu9kLEaISRl1G31RaLRYLdIZ4HyTl+8m+skQ3VAmursEnLrTnb
 oRFBuNp6Y5jPGWkqXhE6t7z3ozzRNZERXA1AEqM2VozKHOXYtiI=
 =TVeJ
 -----END PGP SIGNATURE-----

Merge tag 's390-5.10-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 fixes from Heiko Carstens:

 - fix system call exit path; avoid return to user space with any
   TIF/CIF/PIF set

 - fix file permission for cpum_sfb_size parameter

 - another small defconfig update

* tag 's390-5.10-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/cpum_sf.c: fix file permission for cpum_sfb_size
  s390: update defconfigs
  s390: fix system call exit path
2020-11-17 11:22:03 -08:00
Steven Rostedt (VMware)
d19ad0775d ftrace: Have the callbacks receive a struct ftrace_regs instead of pt_regs
In preparation to have arguments of a function passed to callbacks attached
to functions as default, change the default callback prototype to receive a
struct ftrace_regs as the forth parameter instead of a pt_regs.

For callbacks that set the FL_SAVE_REGS flag in their ftrace_ops flags, they
will now need to get the pt_regs via a ftrace_get_regs() helper call. If
this is called by a callback that their ftrace_ops did not have a
FL_SAVE_REGS flag set, it that helper function will return NULL.

This will allow the ftrace_regs to hold enough just to get the parameters
and stack pointer, but without the worry that callbacks may have a pt_regs
that is not completely filled.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-11-13 12:14:55 -05:00
Thomas Richter
78d732e1f3 s390/cpum_sf.c: fix file permission for cpum_sfb_size
This file is installed by the s390 CPU Measurement sampling
facility device driver to export supported minimum and
maximum sample buffer sizes.
This file is read by lscpumf tool to display the details
of the device driver capabilities. The lscpumf tool might
be invoked by a non-root user. In this case it does not
print anything because the file contents can not be read.

Fix this by allowing read access for all users. Reading
the file contents is ok, changing the file contents is
left to the root user only.

For further reference and details see:
 [1] https://github.com/ibm-s390-tools/s390-tools/issues/97

Fixes: 69f239ed33 ("s390/cpum_sf: Dynamically extend the sampling buffer if overflows occur")
Cc: <stable@vger.kernel.org> # 3.14
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2020-11-12 12:10:36 +01:00
Peter Zijlstra
76a4efa809 perf/arch: Remove perf_sample_data::regs_user_copy
struct perf_sample_data lives on-stack, we should be careful about it's
size. Furthermore, the pt_regs copy in there is only because x86_64 is a
trainwreck, solve it differently.

Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Steven Rostedt <rostedt@goodmis.org>
Link: https://lkml.kernel.org/r/20201030151955.258178461@infradead.org
2020-11-09 18:12:34 +01:00
Peter Zijlstra
267fb27352 perf: Reduce stack usage of perf_output_begin()
__perf_output_begin() has an on-stack struct perf_sample_data in the
unlikely case it needs to generate a LOST record. However, every call
to perf_output_begin() must already have a perf_sample_data on-stack.

Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20201030151954.985416146@infradead.org
2020-11-09 18:12:33 +01:00
Jens Axboe
75309018a2 s390: add support for TIF_NOTIFY_SIGNAL
Wire up TIF_NOTIFY_SIGNAL handling for s390.

Cc: linux-s390@vger.kernel.org
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Acked-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-11-09 08:16:55 -07:00
Vasily Gorbik
d7e7fbba67 s390/early: rewrite program parameter setup in C
And move it earlier in the decompressor.

Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2020-11-09 11:21:00 +01:00
Vasily Gorbik
97b142b740 s390: make sure vmemmap is top region table entry aligned
Since commit 29d37e5b82 ("s390/protvirt: add ultravisor initialization")
vmax is adjusted to the ultravisor secure storage limit. This limit is
currently applied when 4-level paging is used. Later vmax is also used
to align vmemmap address to the top region table entry border. When vmax
is set to the ultravisor secure storage limit this is no longer the case.

Instead of changing vmax, make only MODULES_END be affected by the
secure storage limit, so that vmax stays intact for further vmemmap
address alignment.

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2020-11-09 11:20:58 +01:00
Vasily Gorbik
f38b0a7439 s390: remove unused s390_base_ext_handler
s390_base_ext_handler_fn haven't been used since its introduction in
commit ab14de6c37 ("[S390] Convert memory detection into C code.").

s390_base_ext_handler itself is currently falsely storing 16 registers
at __LC_SAVE_AREA_ASYNC rewriting several following lowcore values:
cpu_flags, return_psw, return_mcck_psw, sync_enter_timer and
async_enter_timer.

Besides that s390_base_ext_handler itself is only potentially hiding
EXT interrupts which should not have happen in the first place. Any
piece of code which requires EXT interrupts before fully functional
ext_int_handler is enabled has to do it on its own, like this is done
by sclp_early_cmd() which is doing EXT interrupts handling synchronously
in sclp_early_wait_irq().

With s390_base_ext_handler removed unexpected EXT interrupt leads
to disabled wait with the address 0x1b0 (__LC_EXT_NEW_PSW), which is
currently setup in the decompressor.

Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2020-11-09 11:20:58 +01:00
Vasily Gorbik
85cde0192a s390/udelay: make it work for the early code
Currently udelay relies on working EXT interrupts handler, which is not
the case during early startup. In such cases udelay_simple() has to be
used instead.

To avoid mistakes of calling udelay too early, which could happen from
the common code as well - make udelay work for the early code by
introducing static branch and redirecting all udelay calls to
udelay_simple until EXT interrupts handler is fully initialized and
async stack is allocated.

Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2020-11-09 11:20:58 +01:00
Heiko Carstens
ce9dfafe29 s390: fix system call exit path
The system call exit path is running with interrupts enabled while
checking for TIF/PIF/CIF bits which require special handling. If all
bits have been checked interrupts are disabled and the kernel exits to
user space.
The problem is that after checking all bits and before interrupts are
disabled bits can be set already again, due to interrupt handling.

This means that the kernel can exit to user space with some
TIF/PIF/CIF bits set, which should never happen. E.g. TIF_NEED_RESCHED
might be set, which might lead to additional latencies, since that bit
will only be recognized with next exit to user space.

Fix this by checking the corresponding bits only when interrupts are
disabled.

Fixes: 0b0ed657fe ("s390: remove critical section cleanup from entry.S")
Cc: <stable@vger.kernel.org> # 5.8
Acked-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2020-11-09 11:16:11 +01:00
Steven Rostedt (VMware)
773c167050 ftrace: Add recording of functions that caused recursion
This adds CONFIG_FTRACE_RECORD_RECURSION that will record to a file
"recursed_functions" all the functions that caused recursion while a
callback to the function tracer was running.

Link: https://lkml.kernel.org/r/20201106023548.102375687@goodmis.org

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Guo Ren <guoren@kernel.org>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: x86@kernel.org
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Anton Vorontsov <anton@enomsg.org>
Cc: Colin Cross <ccross@android.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Miroslav Benes <mbenes@suse.cz>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Joe Lawrence <joe.lawrence@redhat.com>
Cc: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-csky@vger.kernel.org
Cc: linux-parisc@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-s390@vger.kernel.org
Cc: live-patching@vger.kernel.org
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-11-06 08:42:26 -05:00
Steven Rostedt (VMware)
c536aa1c5b kprobes/ftrace: Add recursion protection to the ftrace callback
If a ftrace callback does not supply its own recursion protection and
does not set the RECURSION_SAFE flag in its ftrace_ops, then ftrace will
make a helper trampoline to do so before calling the callback instead of
just calling the callback directly.

The default for ftrace_ops is going to change. It will expect that handlers
provide their own recursion protection, unless its ftrace_ops states
otherwise.

Link: https://lkml.kernel.org/r/20201028115613.140212174@goodmis.org
Link: https://lkml.kernel.org/r/20201106023546.944907560@goodmis.org

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Josh  Poimboeuf <jpoimboe@redhat.com>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Miroslav Benes <mbenes@suse.cz>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: x86@kernel.org
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Naveen N. Rao" <naveen.n.rao@linux.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: linux-csky@vger.kernel.org
Cc: linux-parisc@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-s390@vger.kernel.org
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-11-06 08:35:44 -05:00
Qian Cai
de5d9dae15 s390/smp: move rcu_cpu_starting() earlier
The call to rcu_cpu_starting() in smp_init_secondary() is not early
enough in the CPU-hotplug onlining process, which results in lockdep
splats as follows:

 WARNING: suspicious RCU usage
 -----------------------------
 kernel/locking/lockdep.c:3497 RCU-list traversed in non-reader section!!

 other info that might help us debug this:

 RCU used illegally from offline CPU!
 rcu_scheduler_active = 1, debug_locks = 1
 no locks held by swapper/1/0.

 Call Trace:
 show_stack+0x158/0x1f0
 dump_stack+0x1f2/0x238
 __lock_acquire+0x2640/0x4dd0
 lock_acquire+0x3a8/0xd08
 _raw_spin_lock_irqsave+0xc0/0xf0
 clockevents_register_device+0xa8/0x528
 init_cpu_timer+0x33e/0x468
 smp_init_secondary+0x11a/0x328
 smp_start_secondary+0x82/0x88

This is avoided by moving the call to rcu_cpu_starting up near the
beginning of the smp_init_secondary() function. Note that the
raw_smp_processor_id() is required in order to avoid calling into
lockdep before RCU has declared the CPU to be watched for readers.

Link: https://lore.kernel.org/lkml/160223032121.7002.1269740091547117869.tip-bot2@tip-bot2/
Signed-off-by: Qian Cai <cai@redhat.com>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2020-11-03 15:12:16 +01:00
Heiko Carstens
cfef9aa69a s390/vdso: remove unused constants
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2020-11-03 15:12:16 +01:00
Linus Torvalds
4a22709e21 arch-cleanup-2020-10-22
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl+SOXIQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgptrcD/93VUDmRAn73ChKNd0TtXUicJlAlNLVjvfs
 VFTXWBDnlJnGkZT7ElkDD9b8dsz8l4xGf/QZ5dzhC/th2OsfObQkSTfe0lv5cCQO
 mX7CRSrDpjaHtW+WGPDa0oQsGgIfpqUz2IOg9NKbZZ1LJ2uzYfdOcf3oyRgwZJ9B
 I3sh1vP6OzjZVVCMmtMTM+sYZEsDoNwhZwpkpiwMmj8tYtOPgKCYKpqCiXrGU0x2
 ML5FtDIwiwU+O3zYYdCBWqvCb2Db0iA9Aov2whEBz/V2jnmrN5RMA/90UOh1E2zG
 br4wM1Wt3hNrtj5qSxZGlF/HEMYJVB8Z2SgMjYu4vQz09qRVVqpGdT/dNvLAHQWg
 w4xNCj071kVZDQdfwnqeWSKYUau9Xskvi8xhTT+WX8a5CsbVrM9vGslnS5XNeZ6p
 h2D3Q+TAYTvT756icTl0qsYVP7PrPY7DdmQYu0q+Lc3jdGI+jyxO2h9OFBRLZ3p6
 zFX2N8wkvvCCzP2DwVnnhIi/GovpSh7ksHnb039F36Y/IhZPqV1bGqdNQVdanv6I
 8fcIDM6ltRQ7dO2Br5f1tKUZE9Pm6x60b/uRVjhfVh65uTEKyGRhcm5j9ztzvQfI
 cCBg4rbVRNKolxuDEkjsAFXVoiiEEsb7pLf4pMO+Dr62wxFG589tQNySySneUIVZ
 J9ILnGAAeQ==
 =aVWo
 -----END PGP SIGNATURE-----

Merge tag 'arch-cleanup-2020-10-22' of git://git.kernel.dk/linux-block

Pull arch task_work cleanups from Jens Axboe:
 "Two cleanups that don't fit other categories:

   - Finally get the task_work_add() cleanup done properly, so we don't
     have random 0/1/false/true/TWA_SIGNAL confusing use cases. Updates
     all callers, and also fixes up the documentation for
     task_work_add().

   - While working on some TIF related changes for 5.11, this
     TIF_NOTIFY_RESUME cleanup fell out of that. Remove some arch
     duplication for how that is handled"

* tag 'arch-cleanup-2020-10-22' of git://git.kernel.dk/linux-block:
  task_work: cleanup notification modes
  tracehook: clear TIF_NOTIFY_RESUME in tracehook_notify_resume()
2020-10-23 10:06:38 -07:00
Linus Torvalds
746b25b1aa Kbuild updates for v5.10
- Support 'make compile_commands.json' to generate the compilation
    database more easily, avoiding stale entries
 
  - Support 'make clang-analyzer' and 'make clang-tidy' for static checks
    using clang-tidy
 
  - Preprocess scripts/modules.lds.S to allow CONFIG options in the module
    linker script
 
  - Drop cc-option tests from compiler flags supported by our minimal
    GCC/Clang versions
 
  - Use always 12-digits commit hash for CONFIG_LOCALVERSION_AUTO=y
 
  - Use sha1 build id for both BFD linker and LLD
 
  - Improve deb-pkg for reproducible builds and rootless builds
 
  - Remove stale, useless scripts/namespace.pl
 
  - Turn -Wreturn-type warning into error
 
  - Fix build error of deb-pkg when CONFIG_MODULES=n
 
  - Replace 'hostname' command with more portable 'uname -n'
 
  - Various Makefile cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAl+RfS0VHG1hc2FoaXJv
 eUBrZXJuZWwub3JnAAoJED2LAQed4NsGG1QP/2hzoMzK1YXErPUhGrhYU1rxz7Nu
 HkLTIkyKF1HPwSJf5XyNW/FTBI4SDlkNoVg/weEDCS1yFxxpvQLIck8ChzA1kIIM
 P+1IfBWOTzqn91XsapU2zwSno3gylphVchVIvYAB3oLUotGeMSluy1cQtBRzyA5D
 rj2Q7H8fzkzk3YoBcBC/BOKDlfo/usqQ1X/gsfRFwN/BJxeZSYoujNBE7KtHaDsd
 8K/ggBIqmST4NBn+M8c11d8CxzvWbtG1gq3EkUL5nG8T13DsGn1EFC0SPt85bkvv
 f9YywfJi37HixhZzK6tXYjN/PWoiEY6z90mhd0NtZghQT7kQMiTQ3sWrM8dX3ssf
 phBzO94uFQDjhyxOaSSsCoI/TIciAPo4+G8PNjcaEtj63IEfhEz/dnlstYwY5Y9P
 Pp3aZtVjSGJwGW2u2EUYj6paFVqjf6DXQjQKPNHnsYCEidIvFTjjguRGvx9gl6mx
 yd8oseOsAtOEf0alRe9MMdvN17O3UrRAxgBdap7fktg02TLVRGxZIbuwKmBf29ho
 ORl9zeFkYBn6XQFyuItJoXy/kYFyHDaBEPYCRQcY4dwqcjZIiAc/FhYbqYthJ59L
 5vLN2etmDIVSuUv1J5nBqHHGCqJChykbqg7riQ651dCNKw4gZB8ctCay2lXhBXMg
 1mqOcoG5WWL7//F+
 =tZRN
 -----END PGP SIGNATURE-----

Merge tag 'kbuild-v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild updates from Masahiro Yamada:

 - Support 'make compile_commands.json' to generate the compilation
   database more easily, avoiding stale entries

 - Support 'make clang-analyzer' and 'make clang-tidy' for static checks
   using clang-tidy

 - Preprocess scripts/modules.lds.S to allow CONFIG options in the
   module linker script

 - Drop cc-option tests from compiler flags supported by our minimal
   GCC/Clang versions

 - Use always 12-digits commit hash for CONFIG_LOCALVERSION_AUTO=y

 - Use sha1 build id for both BFD linker and LLD

 - Improve deb-pkg for reproducible builds and rootless builds

 - Remove stale, useless scripts/namespace.pl

 - Turn -Wreturn-type warning into error

 - Fix build error of deb-pkg when CONFIG_MODULES=n

 - Replace 'hostname' command with more portable 'uname -n'

 - Various Makefile cleanups

* tag 'kbuild-v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (34 commits)
  kbuild: Use uname for LINUX_COMPILE_HOST detection
  kbuild: Only add -fno-var-tracking-assignments for old GCC versions
  kbuild: remove leftover comment for filechk utility
  treewide: remove DISABLE_LTO
  kbuild: deb-pkg: clean up package name variables
  kbuild: deb-pkg: do not build linux-headers package if CONFIG_MODULES=n
  kbuild: enforce -Werror=return-type
  scripts: remove namespace.pl
  builddeb: Add support for all required debian/rules targets
  builddeb: Enable rootless builds
  builddeb: Pass -n to gzip for reproducible packages
  kbuild: split the build log of kallsyms
  kbuild: explicitly specify the build id style
  scripts/setlocalversion: make git describe output more reliable
  kbuild: remove cc-option test of -Werror=date-time
  kbuild: remove cc-option test of -fno-stack-check
  kbuild: remove cc-option test of -fno-strict-overflow
  kbuild: move CFLAGS_{KASAN,UBSAN,KCSAN} exports to relevant Makefiles
  kbuild: remove redundant CONFIG_KASAN check from scripts/Makefile.kasan
  kbuild: do not create built-in objects for external module builds
  ...
2020-10-22 13:13:57 -07:00
Minchan Kim
ecb8ac8b1f mm/madvise: introduce process_madvise() syscall: an external memory hinting API
There is usecase that System Management Software(SMS) want to give a
memory hint like MADV_[COLD|PAGEEOUT] to other processes and in the
case of Android, it is the ActivityManagerService.

The information required to make the reclaim decision is not known to the
app.  Instead, it is known to the centralized userspace
daemon(ActivityManagerService), and that daemon must be able to initiate
reclaim on its own without any app involvement.

To solve the issue, this patch introduces a new syscall
process_madvise(2).  It uses pidfd of an external process to give the
hint.  It also supports vector address range because Android app has
thousands of vmas due to zygote so it's totally waste of CPU and power if
we should call the syscall one by one for each vma.(With testing 2000-vma
syscall vs 1-vector syscall, it showed 15% performance improvement.  I
think it would be bigger in real practice because the testing ran very
cache friendly environment).

Another potential use case for the vector range is to amortize the cost
ofTLB shootdowns for multiple ranges when using MADV_DONTNEED; this could
benefit users like TCP receive zerocopy and malloc implementations.  In
future, we could find more usecases for other advises so let's make it
happens as API since we introduce a new syscall at this moment.  With
that, existing madvise(2) user could replace it with process_madvise(2)
with their own pid if they want to have batch address ranges support
feature.

ince it could affect other process's address range, only privileged
process(PTRACE_MODE_ATTACH_FSCREDS) or something else(e.g., being the same
UID) gives it the right to ptrace the process could use it successfully.
The flag argument is reserved for future use if we need to extend the API.

I think supporting all hints madvise has/will supported/support to
process_madvise is rather risky.  Because we are not sure all hints make
sense from external process and implementation for the hint may rely on
the caller being in the current context so it could be error-prone.  Thus,
I just limited hints as MADV_[COLD|PAGEOUT] in this patch.

If someone want to add other hints, we could hear the usecase and review
it for each hint.  It's safer for maintenance rather than introducing a
buggy syscall but hard to fix it later.

So finally, the API is as follows,

      ssize_t process_madvise(int pidfd, const struct iovec *iovec,
                unsigned long vlen, int advice, unsigned int flags);

    DESCRIPTION
      The process_madvise() system call is used to give advice or directions
      to the kernel about the address ranges from external process as well as
      local process. It provides the advice to address ranges of process
      described by iovec and vlen. The goal of such advice is to improve
      system or application performance.

      The pidfd selects the process referred to by the PID file descriptor
      specified in pidfd. (See pidofd_open(2) for further information)

      The pointer iovec points to an array of iovec structures, defined in
      <sys/uio.h> as:

        struct iovec {
            void *iov_base;         /* starting address */
            size_t iov_len;         /* number of bytes to be advised */
        };

      The iovec describes address ranges beginning at address(iov_base)
      and with size length of bytes(iov_len).

      The vlen represents the number of elements in iovec.

      The advice is indicated in the advice argument, which is one of the
      following at this moment if the target process specified by pidfd is
      external.

        MADV_COLD
        MADV_PAGEOUT

      Permission to provide a hint to external process is governed by a
      ptrace access mode PTRACE_MODE_ATTACH_FSCREDS check; see ptrace(2).

      The process_madvise supports every advice madvise(2) has if target
      process is in same thread group with calling process so user could
      use process_madvise(2) to extend existing madvise(2) to support
      vector address ranges.

    RETURN VALUE
      On success, process_madvise() returns the number of bytes advised.
      This return value may be less than the total number of requested
      bytes, if an error occurred. The caller should check return value
      to determine whether a partial advice occurred.

FAQ:

Q.1 - Why does any external entity have better knowledge?

Quote from Sandeep

"For Android, every application (including the special SystemServer)
are forked from Zygote.  The reason of course is to share as many
libraries and classes between the two as possible to benefit from the
preloading during boot.

After applications start, (almost) all of the APIs end up calling into
this SystemServer process over IPC (binder) and back to the
application.

In a fully running system, the SystemServer monitors every single
process periodically to calculate their PSS / RSS and also decides
which process is "important" to the user for interactivity.

So, because of how these processes start _and_ the fact that the
SystemServer is looping to monitor each process, it does tend to *know*
which address range of the application is not used / useful.

Besides, we can never rely on applications to clean things up
themselves.  We've had the "hey app1, the system is low on memory,
please trim your memory usage down" notifications for a long time[1].
They rely on applications honoring the broadcasts and very few do.

So, if we want to avoid the inevitable killing of the application and
restarting it, some way to be able to tell the OS about unimportant
memory in these applications will be useful.

- ssp

Q.2 - How to guarantee the race(i.e., object validation) between when
giving a hint from an external process and get the hint from the target
process?

process_madvise operates on the target process's address space as it
exists at the instant that process_madvise is called.  If the space
target process can run between the time the process_madvise process
inspects the target process address space and the time that
process_madvise is actually called, process_madvise may operate on
memory regions that the calling process does not expect.  It's the
responsibility of the process calling process_madvise to close this
race condition.  For example, the calling process can suspend the
target process with ptrace, SIGSTOP, or the freezer cgroup so that it
doesn't have an opportunity to change its own address space before
process_madvise is called.  Another option is to operate on memory
regions that the caller knows a priori will be unchanged in the target
process.  Yet another option is to accept the race for certain
process_madvise calls after reasoning that mistargeting will do no
harm.  The suggested API itself does not provide synchronization.  It
also apply other APIs like move_pages, process_vm_write.

The race isn't really a problem though.  Why is it so wrong to require
that callers do their own synchronization in some manner?  Nobody
objects to write(2) merely because it's possible for two processes to
open the same file and clobber each other's writes --- instead, we tell
people to use flock or something.  Think about mmap.  It never
guarantees newly allocated address space is still valid when the user
tries to access it because other threads could unmap the memory right
before.  That's where we need synchronization by using other API or
design from userside.  It shouldn't be part of API itself.  If someone
needs more fine-grained synchronization rather than process level,
there were two ideas suggested - cookie[2] and anon-fd[3].  Both are
applicable via using last reserved argument of the API but I don't
think it's necessary right now since we have already ways to prevent
the race so don't want to add additional complexity with more
fine-grained optimization model.

To make the API extend, it reserved an unsigned long as last argument
so we could support it in future if someone really needs it.

Q.3 - Why doesn't ptrace work?

Injecting an madvise in the target process using ptrace would not work
for us because such injected madvise would have to be executed by the
target process, which means that process would have to be runnable and
that creates the risk of the abovementioned race and hinting a wrong
VMA.  Furthermore, we want to act the hint in caller's context, not the
callee's, because the callee is usually limited in cpuset/cgroups or
even freezed state so they can't act by themselves quick enough, which
causes more thrashing/kill.  It doesn't work if the target process are
ptraced(e.g., strace, debugger, minidump) because a process can have at
most one ptracer.

[1] https://developer.android.com/topic/performance/memory"

[2] process_getinfo for getting the cookie which is updated whenever
    vma of process address layout are changed - Daniel Colascione -
    https://lore.kernel.org/lkml/20190520035254.57579-1-minchan@kernel.org/T/#m7694416fd179b2066a2c62b5b139b14e3894e224

[3] anonymous fd which is used for the object(i.e., address range)
    validation - Michal Hocko -
    https://lore.kernel.org/lkml/20200120112722.GY18451@dhcp22.suse.cz/

[minchan@kernel.org: fix process_madvise build break for arm64]
  Link: http://lkml.kernel.org/r/20200303145756.GA219683@google.com
[minchan@kernel.org: fix build error for mips of process_madvise]
  Link: http://lkml.kernel.org/r/20200508052517.GA197378@google.com
[akpm@linux-foundation.org: fix patch ordering issue]
[akpm@linux-foundation.org: fix arm64 whoops]
[minchan@kernel.org: make process_madvise() vlen arg have type size_t, per Florian]
[akpm@linux-foundation.org: fix i386 build]
[sfr@canb.auug.org.au: fix syscall numbering]
  Link: https://lkml.kernel.org/r/20200905142639.49fc3f1a@canb.auug.org.au
[sfr@canb.auug.org.au: madvise.c needs compat.h]
  Link: https://lkml.kernel.org/r/20200908204547.285646b4@canb.auug.org.au
[minchan@kernel.org: fix mips build]
  Link: https://lkml.kernel.org/r/20200909173655.GC2435453@google.com
[yuehaibing@huawei.com: remove duplicate header which is included twice]
  Link: https://lkml.kernel.org/r/20200915121550.30584-1-yuehaibing@huawei.com
[minchan@kernel.org: do not use helper functions for process_madvise]
  Link: https://lkml.kernel.org/r/20200921175539.GB387368@google.com
[akpm@linux-foundation.org: pidfd_get_pid() gained an argument]
[sfr@canb.auug.org.au: fix up for "iov_iter: transparently handle compat iovecs in import_iovec"]
  Link: https://lkml.kernel.org/r/20200928212542.468e1fef@canb.auug.org.au

Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: Christian Brauner <christian@brauner.io>
Cc: Daniel Colascione <dancol@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Dias <joaodias@google.com>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oleksandr Natalenko <oleksandr@redhat.com>
Cc: Sandeep Patil <sspatil@google.com>
Cc: SeongJae Park <sj38.park@gmail.com>
Cc: SeongJae Park <sjpark@amazon.de>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Sonny Rao <sonnyrao@google.com>
Cc: Tim Murray <timmurray@google.com>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Florian Weimer <fw@deneb.enyo.de>
Cc: <linux-man@vger.kernel.org>
Link: http://lkml.kernel.org/r/20200302193630.68771-3-minchan@kernel.org
Link: http://lkml.kernel.org/r/20200508183320.GA125527@google.com
Link: http://lkml.kernel.org/r/20200622192900.22757-4-minchan@kernel.org
Link: https://lkml.kernel.org/r/20200901000633.1920247-4-minchan@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-18 09:27:10 -07:00
Jens Axboe
3c532798ec tracehook: clear TIF_NOTIFY_RESUME in tracehook_notify_resume()
All the callers currently do this, clean it up and move the clearing
into tracehook_notify_resume() instead.

Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-17 15:04:36 -06:00