Commit Graph

726 Commits

Author SHA1 Message Date
Holger Dengler
9b52ddeb46 s390/pkey_pckmo: Return with success for valid protected key types
The key_to_protkey handler function in module pkey_pckmo should return
with success on all known protected key types, including the new types
introduced by fd197556ee ("s390/pkey: Add AES xts and HMAC clear key
token support").

Fixes: fd197556ee ("s390/pkey: Add AES xts and HMAC clear key token support")
Signed-off-by: Holger Dengler <dengler@linux.ibm.com>
Reviewed-by: Ingo Franzki <ifranzki@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-10-16 11:32:32 +02:00
Harald Freudenberger
78f636e82b s390/ap: Fix CCA crypto card behavior within protected execution environment
A crypto card comes in 3 flavors: accelerator, CCA co-processor or
EP11 co-processor. Within a protected execution environment only the
accelerator and EP11 co-processor is supported. However, it is
possible to set up a KVM guest with a CCA card and run it as a
protected execution guest. There is nothing at the host side which
prevents this. Within such a guest, a CCA card is shown as "illicit"
and you can't do anything with such a crypto card.

Regardless of the unsupported CCA card within a protected execution
guest there are a couple of user space applications which
unconditional try to run crypto requests to the zcrypt device
driver. There was a bug within the AP bus code which allowed such a
request to be forwarded to a CCA card where it is finally
rejected and the driver reacts with -ENODEV but also triggers an AP
bus scan. Together with a retry loop this caused some kind of "hang"
of the KVM guest. On startup it caused timeouts and finally led the
KVM guest startup fail. Fix that by closing the gap and make sure a
CCA card is not usable within a protected execution environment.

Another behavior within an protected execution environment with CCA
cards was that the se_bind and se_associate AP queue sysfs attributes
where shown. The implementation unconditional always added these
attributes. Fix that by checking if the card mode is supported within
a protected execution environment and only if valid, add the attribute
group.

Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-10-10 15:31:55 +02:00
Linus Torvalds
e08d227840 more s390 updates for 6.12 merge window
- Clean up and improve vdso code: use SYM_* macros for function and
   data annotations, add CFI annotations to fix GDB unwinding, optimize
   the chacha20 implementation
 
 - Add vfio-ap driver feature advertisement for use by libvirt and mdevctl
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEE3QHqV+H2a8xAv27vjYWKoQLXFBgFAmb39nkACgkQjYWKoQLX
 FBiacggAlHrwDYIZdr7bd+0bJa8Wq5STrdo1XiohQhCrRg+4rXRaquEQszWVheRk
 qi/9u+MKJFtbd4VMbf+WqEi2P4Pnn5cByv+As3B4RMR0Tp3uQK5TlDbh7wbJ+pIG
 AQQoL5D2z05rvbYgc1Wvt/3lgrfs2EzUY3cAyIMmFwSp0On+psDuwuFVtWzH0jCr
 fxCwX7mrF6OoBMR0QSUxAvoOujUhd4CANk3n6rNxJfi2AC/gX/ZkFUrV5a3FvJ2Z
 X7fQgDc3rivsm5wFuoBvlrH5J1jhmWtsg8Tiygos7BBehpQ5NUsJMDGS8kR+avbF
 iYKg1oMy5/ZXekBjqkEMO6Vp3nCrdg==
 =Idf7
 -----END PGP SIGNATURE-----

Merge tag 's390-6.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull more s390 updates from Vasily Gorbik:

 - Clean up and improve vdso code: use SYM_* macros for function and
   data annotations, add CFI annotations to fix GDB unwinding, optimize
   the chacha20 implementation

 - Add vfio-ap driver feature advertisement for use by libvirt and
   mdevctl

* tag 's390-6.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/vfio-ap: Driver feature advertisement
  s390/vdso: Use one large alternative instead of an alternative branch
  s390/vdso: Use SYM_DATA_START_LOCAL()/SYM_DATA_END() for data objects
  tools: Add additional SYM_*() stubs to linkage.h
  s390/vdso: Use macros for annotation of asm functions
  s390/vdso: Add CFI annotations to __arch_chacha20_blocks_nostack()
  s390/vdso: Fix comment within __arch_chacha20_blocks_nostack()
  s390/vdso: Get rid of permutation constants
2024-09-28 09:11:46 -07:00
Al Viro
cb787f4ac0 [tree-wide] finally take no_llseek out
no_llseek had been defined to NULL two years ago, in commit 868941b144
("fs: remove no_llseek")

To quote that commit,

  At -rc1 we'll need do a mechanical removal of no_llseek -

  git grep -l -w no_llseek | grep -v porting.rst | while read i; do
	sed -i '/\<no_llseek\>/d' $i
  done

  would do it.

Unfortunately, that hadn't been done.  Linus, could you do that now, so
that we could finally put that thing to rest? All instances are of the
form
	.llseek = no_llseek,
so it's obviously safe.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-09-27 08:18:43 -07:00
Jason J. Herne
2d8721364c s390/vfio-ap: Driver feature advertisement
Advertise features of the driver for the benefit of automated tooling
like Libvirt and mdevctl.

Signed-off-by: Jason J. Herne <jjherne@linux.ibm.com>
Reviewed-by: Anthony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
Link: https://lore.kernel.org/r/20240916120123.11484-1-jjherne@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-09-23 17:57:04 +02:00
Linus Torvalds
1ec6d09789 s390 updates for 6.12 merge window
- Optimize ftrace and kprobes code patching and avoid stop machine for
   kprobes if sequential instruction fetching facility is available
 
 - Add hiperdispatch feature to dynamically adjust CPU capacity in
   vertical polarization to improve scheduling efficiency and overall
   performance. Also add infrastructure for handling warning track
   interrupts (WTI), allowing for graceful CPU preemption
 
 - Rework crypto code pkey module and split it into separate, independent
   modules for sysfs, PCKMO, CCA, and EP11, allowing modules to load only
   when the relevant hardware is available
 
 - Add hardware acceleration for HMAC modes and the full AES-XTS cipher,
   utilizing message-security assist extensions (MSA) 10 and 11. It
   introduces new shash implementations for HMAC-SHA224/256/384/512 and
   registers the hardware-accelerated AES-XTS cipher as the preferred
   option. Also add clear key token support
 
 - Add MSA 10 and 11 processor activity instrumentation counters to perf
   and update PAI Extension 1 NNPA counters
 
 - Cleanup cpu sampling facility code and rework debug/WARN_ON_ONCE
   statements
 
 - Add support for SHA3 performance enhancements introduced with MSA 12
 
 - Add support for the query authentication information feature of
   MSA 13 and introduce the KDSA CPACF instruction. Provide query and query
   authentication information in sysfs, enabling tools like cpacfinfo to
   present this data in a human-readable form
 
 - Update kernel disassembler instructions
 
 - Always enable EXPOLINE_EXTERN if supported by the compiler to ensure
   kpatch compatibility
 
 - Add missing warning handling and relocated lowcore support to the
   early program check handler
 
 - Optimize ftrace_return_address() and avoid calling unwinder
 
 - Make modules use kernel ftrace trampolines
 
 - Strip relocs from the final vmlinux ELF file to make it roughly 2
   times smaller
 
 - Dump register contents and call trace for early crashes to the console
 
 - Generate ptdump address marker array dynamically
 
 - Fix rcu_sched stalls that might occur when adding or removing large
   amounts of pages at once to or from the CMM balloon
 
 - Fix deadlock caused by recursive lock of the AP bus scan mutex
 
 - Unify sync and async register save areas in entry code
 
 - Cleanup debug prints in crypto code
 
 - Various cleanup and sanitizing patches for the decompressor
 
 - Various small ftrace cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEE3QHqV+H2a8xAv27vjYWKoQLXFBgFAmbsZawACgkQjYWKoQLX
 FBg+Ogf+NiKPfvI14NcTwnOHB6qz8ApPdGfN9bNVtQxtK3epeAvtj0cMonAuKpRg
 xckTRRd8y0guhCT7Q2+WitSgA5eYDn+u9/Ux5YuKUdUdXolQ0D64BJNtVeEFkmJj
 s+Lesb8cVI9T2VBZOpuF9lJigfsDALBkFroqN4MDudDeahS+qy33bAc0OoqYNXHo
 S6OwPK1/tEG9O/oTN2V4mN+aP0B3/dl7Msezb0gfAXQJA+WUAyMNK0RHvoG9uzaa
 BWAyWWYABj6woGZEAQAzXcbzkQiRPixTqZVe6e4YndXhIlEnB/Z2AQFdTpT9V7En
 eOmmve3QuJa0hkF9q4H/anvOMPntTg==
 =Xagq
 -----END PGP SIGNATURE-----

Merge tag 's390-6.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 updates from Vasily Gorbik:

 - Optimize ftrace and kprobes code patching and avoid stop machine for
   kprobes if sequential instruction fetching facility is available

 - Add hiperdispatch feature to dynamically adjust CPU capacity in
   vertical polarization to improve scheduling efficiency and overall
   performance. Also add infrastructure for handling warning track
   interrupts (WTI), allowing for graceful CPU preemption

 - Rework crypto code pkey module and split it into separate,
   independent modules for sysfs, PCKMO, CCA, and EP11, allowing modules
   to load only when the relevant hardware is available

 - Add hardware acceleration for HMAC modes and the full AES-XTS cipher,
   utilizing message-security assist extensions (MSA) 10 and 11. It
   introduces new shash implementations for HMAC-SHA224/256/384/512 and
   registers the hardware-accelerated AES-XTS cipher as the preferred
   option. Also add clear key token support

 - Add MSA 10 and 11 processor activity instrumentation counters to perf
   and update PAI Extension 1 NNPA counters

 - Cleanup cpu sampling facility code and rework debug/WARN_ON_ONCE
   statements

 - Add support for SHA3 performance enhancements introduced with MSA 12

 - Add support for the query authentication information feature of MSA
   13 and introduce the KDSA CPACF instruction. Provide query and query
   authentication information in sysfs, enabling tools like cpacfinfo to
   present this data in a human-readable form

 - Update kernel disassembler instructions

 - Always enable EXPOLINE_EXTERN if supported by the compiler to ensure
   kpatch compatibility

 - Add missing warning handling and relocated lowcore support to the
   early program check handler

 - Optimize ftrace_return_address() and avoid calling unwinder

 - Make modules use kernel ftrace trampolines

 - Strip relocs from the final vmlinux ELF file to make it roughly 2
   times smaller

 - Dump register contents and call trace for early crashes to the
   console

 - Generate ptdump address marker array dynamically

 - Fix rcu_sched stalls that might occur when adding or removing large
   amounts of pages at once to or from the CMM balloon

 - Fix deadlock caused by recursive lock of the AP bus scan mutex

 - Unify sync and async register save areas in entry code

 - Cleanup debug prints in crypto code

 - Various cleanup and sanitizing patches for the decompressor

 - Various small ftrace cleanups

* tag 's390-6.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (84 commits)
  s390/crypto: Display Query and Query Authentication Information in sysfs
  s390/crypto: Add Support for Query Authentication Information
  s390/crypto: Rework RRE and RRF CPACF inline functions
  s390/crypto: Add KDSA CPACF Instruction
  s390/disassembler: Remove duplicate instruction format RSY_RDRU
  s390/boot: Move boot_printk() code to own file
  s390/boot: Use boot_printk() instead of sclp_early_printk()
  s390/boot: Rename decompressor_printk() to boot_printk()
  s390/boot: Compile all files with the same march flag
  s390: Use MARCH_HAS_*_FEATURES defines
  s390: Provide MARCH_HAS_*_FEATURES defines
  s390/facility: Disable compile time optimization for decompressor code
  s390/boot: Increase minimum architecture to z10
  s390/als: Remove obsolete comment
  s390/sha3: Fix SHA3 selftests failures
  s390/pkey: Add AES xts and HMAC clear key token support
  s390/cpacf: Add MSA 10 and 11 new PCKMO functions
  s390/mm: Add cond_resched() to cmm_alloc/free_pages()
  s390/pai_ext: Update PAI extension 1 counters
  s390/pai_crypto: Add support for MSA 10 and 11 pai counters
  ...
2024-09-21 09:02:54 -07:00
Harald Freudenberger
fd197556ee s390/pkey: Add AES xts and HMAC clear key token support
Add support for deriving protected keys from clear key token for
AES xts and HMAC keys via PCKMO instruction. Add support for
protected key generation and unwrap of protected key tokens for
these key types. Furthermore 4 new sysfs attributes are introduced:

- /sys/devices/virtual/misc/pkey/protkey/protkey_aes_xts_128
- /sys/devices/virtual/misc/pkey/protkey/protkey_aes_xts_256
- /sys/devices/virtual/misc/pkey/protkey/protkey_hmac_512
- /sys/devices/virtual/misc/pkey/protkey/protkey_hmac_1024

Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Ingo Franzki <ifranzki@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-09-05 15:17:23 +02:00
Harald Freudenberger
56199bb956 s390/ap: Fix deadlock caused by recursive lock of the AP bus scan mutex
There is a possibility to deadlock with an recursive
lock of the AP bus scan mutex ap_scan_bus_mutex:

  ... kernel: ============================================
  ... kernel: WARNING: possible recursive locking detected
  ... kernel: 5.14.0-496.el9.s390x #3 Not tainted
  ... kernel: --------------------------------------------
  ... kernel: kworker/12:1/130 is trying to acquire lock:
  ... kernel: 0000000358bc1510 (ap_scan_bus_mutex){+.+.}-{3:3}, at: ap_bus_force_rescan+0x92/0x108
  ... kernel:
	      but task is already holding lock:
  ... kernel: 0000000358bc1510 (ap_scan_bus_mutex){+.+.}-{3:3}, at: ap_scan_bus_wq_callback+0x28/0x60
  ... kernel:
	      other info that might help us debug this:
  ... kernel:  Possible unsafe locking scenario:
  ... kernel:        CPU0
  ... kernel:        ----
  ... kernel:   lock(ap_scan_bus_mutex);
  ... kernel:   lock(ap_scan_bus_mutex);
  ... kernel:
	      *** DEADLOCK ***

Here is how the callstack looks like:

  ... [<00000003576fe9ce>] process_one_work+0x2a6/0x748
  ... [<0000000358150c00>] ap_scan_bus_wq_callback+0x40/0x60   <- mutex locked
  ... [<00000003581506e2>] ap_scan_bus+0x5a/0x3b0
  ... [<000000035815037c>] ap_scan_adapter+0x5b4/0x8c0
  ... [<000000035814fa34>] ap_scan_domains+0x2d4/0x668
  ... [<0000000357d989b4>] device_add+0x4a4/0x6b8
  ... [<0000000357d9bb54>] bus_probe_device+0xb4/0xc8
  ... [<0000000357d9daa8>] __device_attach+0x120/0x1b0
  ... [<0000000357d9a632>] bus_for_each_drv+0x8a/0xd0
  ... [<0000000357d9d548>] __device_attach_driver+0xc0/0x140
  ... [<0000000357d9d3d8>] driver_probe_device+0x40/0xf0
  ... [<0000000357d9cec2>] really_probe+0xd2/0x460
  ... [<000000035814d7b0>] ap_device_probe+0x150/0x208
  ... [<000003ff802a5c46>] zcrypt_cex4_queue_probe+0xb6/0x1c0 [zcrypt_cex4]
  ... [<000003ff7fb2d36e>] zcrypt_queue_register+0xe6/0x1b0 [zcrypt]
  ... [<000003ff7fb2c8ac>] zcrypt_rng_device_add+0x94/0xd8 [zcrypt]
  ... [<0000000357d7bc52>] hwrng_register+0x212/0x228
  ... [<0000000357d7b8c2>] add_early_randomness+0x102/0x110
  ... [<000003ff7fb29c94>] zcrypt_rng_data_read+0x94/0xb8 [zcrypt]
  ... [<0000000358150aca>] ap_bus_force_rescan+0x92/0x108
  ... [<0000000358177572>] mutex_lock_interruptible_nested+0x32/0x40  <- lock again

Note this only happens when the very first random data providing
crypto card appears via hot plug in the system AND is in disabled
state ("deconfig"). Then the initial pull of random data fails and
a re-scan of the AP bus is triggered while already in the middle
of an AP bus scan caused by the appearing new hardware.

The fix is relatively simple once the scenario us understood:
The AP bus force rescan function will immediately return if there
is currently an AP bus scan running with the very same thread id.

Fixes: eacf5b3651 ("s390/ap: introduce mutex to lock the AP bus scan")
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-08-29 22:56:34 +02:00
Harald Freudenberger
177b621bf0 s390/pkey: Add function to enforce pkey handler modules load
There is a use case during early boot with an secure key encrypted
root file system where the paes cipher may try to derive a protected
key from secure key while the AP bus is still in the process of
scanning the bus and building up the zcrypt device drivers. As the
detection of CEX cards also triggers the modprobe of the pkey handler
modules, these modules may come into existence too late.

Yet another use case happening during early boot is for use of an
protected key encrypted swap file(system). There is an ephemeral
protected key read via sysfs to set up the swap file. But this only
works when the pkey_pckmo module is already in - which may happen at a
later time as the load is triggered via CPU feature.

This patch introduces a new function pkey_handler_request_modules()
and invokes it which unconditional tries to load in the pkey handler
modules. This function is called for the in-kernel API to derive a
protected key from whatever and in the sysfs API when the first
attempt to simple invoke the handler function failed.

Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-08-29 22:56:34 +02:00
Harald Freudenberger
2fc401b944 s390/pkey: Add slowpath function to CCA and EP11 handler
For some keys there exists an alternative but usually slower
path to convert the key material into a protected key.
This patch introduces a new handler function
  slowpath_key_to_protkey()
which provides this alternate path for the CCA and EP11
handler code. With that even the knowledge about how
and when this can be used within the pkey API code can
be removed. So now the pkey API just tries the primary
way and if that fails simple tries the alternative way.

Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-08-29 22:56:34 +02:00
Harald Freudenberger
8fcc231ce3 s390/pkey: Introduce pkey base with handler registry and handler modules
Introduce pkey base kernel code with a simple pkey handler registry.
Regroup the pkey code into these kernel modules:
- pkey is the pkey api supporting the ioctls, sysfs and in-kernel api.
  Also the pkey base code which offers the handler registry and
  handler wrapping invocation functions is integrated there. This
  module is automatically loaded in via CPU feature if the MSA feature
  is available.
- pkey-cca is the CCA related handler code kernel module a offering
  CCA specific implementation for pkey. This module is loaded in
  via MODULE_DEVICE_TABLE when a CEX[4-8] card becomes available.
- pkey-ep11 is the EP11 related handler code kernel module offering an
  EP11 specific implementation for pkey. This module is loaded in via
  MODULE_DEVICE_TABLE when a CEX[4-8] card becomes available.
- pkey-pckmo is the PCKMO related handler code kernel module. This
  module is loaded in via CPU feature if the MSA feature is available,
  but on init a check for availability of the pckmo instruction is
  performed.

The handler modules register via a pkey_handler struct at the pkey
base code and the pkey customer (that is currently the pkey api code
fetches a handler via pkey handler registry functions and calls the
unified handler functions via the pkey base handler functions.

As a result the pkey-cca, pkey-ep11 and pkey-pckmo modules get
independent from each other and it becomes possible to write new
handlers which offer another kind of implementation without implicit
dependencies to other handler implementations and/or kernel device
drivers.

For each of these 4 kernel modules there is an individual Kconfig
entry: CONFIG_PKEY for the base and api, CONFIG_PKEY_CCA for the PKEY
CCA support handler, CONFIG_PKEY_EP11 for the EP11 support handler and
CONFIG_PKEY_PCKMO for the pckmo support. The both CEX related handler
modules (PKEY CCA and PKEY EP11) have a dependency to the zcrypt api
of the zcrypt device driver.

Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-08-29 22:56:34 +02:00
Harald Freudenberger
ea88e1710a s390/pkey: Unify pkey cca, ep11 and pckmo functions signatures
As a preparation step for introducing a common function API
between the pkey API module and the handlers (that is the
cca, ep11 and pckmo code) this patch unifies the functions
signatures exposed by the handlers and reworks all the
invocation code of these functions.

Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-08-29 22:56:33 +02:00
Harald Freudenberger
86fbf5e2a0 s390/pkey: Rework and split PKEY kernel module code
This is a huge rework of all the pkey kernel module code.
The goal is to split the code into individual parts with
a dedicated calling interface:
- move all the sysfs related code into pkey_sysfs.c
- all the CCA related code goes to pkey_cca.c
- the EP11 stuff has been moved to pkey_ep11.c
- the PCKMO related code is now in pkey_pckmo.c

The CCA, EP11 and PCKMO code may be seen as "handlers" with
a similar calling interface. The new header file pkey_base.h
declares this calling interface. The remaining code in
pkey_api.c handles the ioctl, the pkey module things and the
"handler" independent code on top of the calling interface
invoking the handlers.

This regrouping of the code will be the base for a real
pkey kernel module split into a pkey base module which acts
as a dispatcher and handler modules providing their service.

Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-08-29 22:56:33 +02:00
Harald Freudenberger
7344eea1b3 s390/pkey: Split pkey_unlocked_ioctl function
Split the very huge ioctl handling function pkey_unlocked_ioctl()
into individual functions per each IOCTL command.

There is no change in functional code coming with this patch.
The work is a simple copy-and-paste with the goal to have
the functionality absolutely untouched.

Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-08-29 22:56:33 +02:00
Holger Dengler
073ef6b204 s390/zcrypt_msgtype6: Cleanup debug code
The dynamic debugging provides function names on request. So remove
all explicit function strings.

Reviewed-by: Harald Freudenberger <freude@linux.ibm.com>
[dengler: fix indent]
Signed-off-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-08-21 16:17:01 +02:00
Holger Dengler
a7a88eeae3 s390/zcrypt_msgtype50: Cleanup debug code
The dynamic debugging provides function names on request. So remove
all explicit function strings.

Reviewed-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-08-21 16:17:01 +02:00
Holger Dengler
1849850e81 s390/zcrypt_api: Cleanup debug code
The dynamic debugging provides function names on request. So remove
all explicit function strings.

Reviewed-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-08-21 16:17:01 +02:00
Holger Dengler
ea31f0f6e2 s390/ap_queue: Cleanup debug code
The dynamic debugging provides function names on request. So remove
all explicit function strings.

Reviewed-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-08-21 16:17:00 +02:00
Holger Dengler
391b8a6ce1 s390/ap_bus: Cleanup debug code
The dynamic debugging provides function names on request. So remove
all explicit function strings.

Reviewed-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-08-21 16:17:00 +02:00
Harald Freudenberger
b4f5bd60d5 s390/ap: Refine AP bus bindings complete processing
With the rework of the AP bus scan and the introduction of
a bindings complete completion also the timing until the
userspace finally receives a AP bus binding complete uevent
had increased. Unfortunately this event triggers some important
jobs for preparation of KVM guests, for example the modification
of card/queue masks to reassign AP resources to the alternate
AP queue device driver (vfio_ap) which is the precondition
for building mediated devices which may be a precondition for
starting KVM guests using AP resources.

This small fix now triggers the check for binding complete
each time an AP device driver has registered. With this patch
the bindings complete may be posted up to 30s earlier as there
is no need to wait for the next AP bus scan any more.

Fixes: 778412ab91 ("s390/ap: rearm APQNs bindings complete completion")
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Cc: stable@vger.kernel.org
Acked-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-08-21 16:14:45 +02:00
Greg Kroah-Hartman
d69d804845 driver core: have match() callback in struct bus_type take a const *
In the match() callback, the struct device_driver * should not be
changed, so change the function callback to be a const *.  This is one
step of many towards making the driver core safe to have struct
device_driver in read-only memory.

Because the match() callback is in all busses, all busses are modified
to handle this properly.  This does entail switching some container_of()
calls to container_of_const() to properly handle the constant *.

For some busses, like PCI and USB and HV, the const * is cast away in
the match callback as those busses do want to modify those structures at
this point in time (they have a local lock in the driver structure.)
That will have to be changed in the future if they wish to have their
struct device * in read-only-memory.

Cc: Rafael J. Wysocki <rafael@kernel.org>
Reviewed-by: Alex Elder <elder@kernel.org>
Acked-by: Sumit Garg <sumit.garg@linaro.org>
Link: https://lore.kernel.org/r/2024070136-wrongdoer-busily-01e8@gregkh
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-07-03 15:16:54 +02:00
Heiko Carstens
c1248638f8 s390/zcrypt: Use kvcalloc() instead of kvmalloc_array()
sparse warns about a large memset() call within
zcrypt_device_status_mask_ext():

drivers/s390/crypto/zcrypt_api.c:1303:15: warning: memset with byte count of 262144

Get rid of this warning by making sure that all callers of this function
allocate memory with __GFP_ZERO, which zeroes memory already at allocation
time, which again allows to remove the memset() call.

Reviewed-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-05-17 10:43:43 +02:00
Harald Freudenberger
306d6bda8f s390/ap: Fix bind complete udev event sent after each AP bus scan
With the mentioned commit (see the fixes tag) on every AP bus scan an
uevent "AP bus change bindings complete" is emitted.  Furthermore if an AP
device switched from one driver to another, for example by manipulating the
apmask, there was never a "bindings complete" uevent generated.

The "bindings complete" event should be sent once when all AP devices have
been bound to device drivers and again if unbind/bind actions take place
and finally all AP devices are bound again. Therefore implement this.

Fixes: 778412ab91 ("s390/ap: rearm APQNs bindings complete completion")
Reported-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-05-16 10:17:11 +02:00
Harald Freudenberger
d4f9d5a99a s390/ap: Fix crash in AP internal function modify_bitmap()
A system crash like this

  Failing address: 200000cb7df6f000 TEID: 200000cb7df6f403
  Fault in home space mode while using kernel ASCE.
  AS:00000002d71bc007 R3:00000003fe5b8007 S:000000011a446000 P:000000015660c13d
  Oops: 0038 ilc:3 [#1] PREEMPT SMP
  Modules linked in: mlx5_ib ...
  CPU: 8 PID: 7556 Comm: bash Not tainted 6.9.0-rc7 #8
  Hardware name: IBM 3931 A01 704 (LPAR)
  Krnl PSW : 0704e00180000000 0000014b75e7b606 (ap_parse_bitmap_str+0x10e/0x1f8)
  R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
  Krnl GPRS: 0000000000000001 ffffffffffffffc0 0000000000000001 00000048f96b75d3
  000000cb00000100 ffffffffffffffff ffffffffffffffff 000000cb7df6fce0
  000000cb7df6fce0 00000000ffffffff 000000000000002b 00000048ffffffff
  000003ff9b2dbc80 200000cb7df6fcd8 0000014bffffffc0 000000cb7df6fbc8
  Krnl Code: 0000014b75e7b5fc: a7840047            brc     8,0000014b75e7b68a
  0000014b75e7b600: 18b2                lr      %r11,%r2
  #0000014b75e7b602: a7f4000a            brc     15,0000014b75e7b616
  >0000014b75e7b606: eb22d00000e6        laog    %r2,%r2,0(%r13)
  0000014b75e7b60c: a7680001            lhi     %r6,1
  0000014b75e7b610: 187b                lr      %r7,%r11
  0000014b75e7b612: 84960021            brxh    %r9,%r6,0000014b75e7b654
  0000014b75e7b616: 18e9                lr      %r14,%r9
  Call Trace:
  [<0000014b75e7b606>] ap_parse_bitmap_str+0x10e/0x1f8
  ([<0000014b75e7b5dc>] ap_parse_bitmap_str+0xe4/0x1f8)
  [<0000014b75e7b758>] apmask_store+0x68/0x140
  [<0000014b75679196>] kernfs_fop_write_iter+0x14e/0x1e8
  [<0000014b75598524>] vfs_write+0x1b4/0x448
  [<0000014b7559894c>] ksys_write+0x74/0x100
  [<0000014b7618a440>] __do_syscall+0x268/0x328
  [<0000014b761a3558>] system_call+0x70/0x98
  INFO: lockdep is turned off.
  Last Breaking-Event-Address:
  [<0000014b75e7b636>] ap_parse_bitmap_str+0x13e/0x1f8
  Kernel panic - not syncing: Fatal exception: panic_on_oops

occured when /sys/bus/ap/a[pq]mask was updated with a relative mask value
(like +0x10-0x12,+60,-90) with one of the numeric values exceeding INT_MAX.

The fix is simple: use unsigned long values for the internal variables. The
correct checks are already in place in the function but a simple int for
the internal variables was used with the possibility to overflow.

Reported-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Tested-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-05-16 10:17:11 +02:00
Holger Dengler
f2ebdadd85 s390/pkey: Wipe copies of protected- and secure-keys
Although the clear-key of neither protected- nor secure-keys is
accessible, this key material should only be visible to the calling
process. So wipe all copies of protected- or secure-keys from stack,
even in case of an error.

Reviewed-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Ingo Franzki <ifranzki@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2024-05-14 20:16:34 +02:00
Holger Dengler
d65d76a44f s390/pkey: Wipe copies of clear-key structures on failure
Wipe all sensitive data from stack for all IOCTLs, which convert a
clear-key into a protected- or secure-key.

Reviewed-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Ingo Franzki <ifranzki@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2024-05-14 20:16:33 +02:00
Holger Dengler
1d8c270de5 s390/pkey: Wipe sensitive data on failure
Wipe sensitive data from stack also if the copy_to_user() fails.

Suggested-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Ingo Franzki <ifranzki@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2024-05-14 20:16:33 +02:00
Jules Irenge
22e6824622 s390/pkey: Use kfree_sensitive() to fix Coccinelle warnings
Replace memzero_explicit() and kfree() with kfree_sensitive() to fix
warnings reported by Coccinelle:

WARNING opportunity for kfree_sensitive/kvfree_sensitive (line 1506)
WARNING opportunity for kfree_sensitive/kvfree_sensitive (line 1643)
WARNING opportunity for kfree_sensitive/kvfree_sensitive (line 1770)

Signed-off-by: Jules Irenge <jbi.octave@gmail.com>
Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Link: https://lore.kernel.org/r/ZjqZkNi_JUJu73Rg@octinomon.home
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2024-05-14 13:46:18 +02:00
Linus Torvalds
d65e1a0f30 - Store AP Query Configuration Information in a static buffer
- Rework the AP initialization and add missing cleanups to the error path
 
 - Swap IRQ and AP bus/device registration to avoid race conditions
 
 - Export prot_virt_guest symbol
 
 - Introduce AP configuration changes notifier interface to facilitate
   modularization of the AP bus
 
 - Add CONFIG_AP kernel configuration option to allow modularization of
   the AP bus
 
 - Rework CONFIG_ZCRYPT_DEBUG kernel configuration option description and
   dependency and rename it to CONFIG_AP_DEBUG
 
 - Convert sprintf() and snprintf() to sysfs_emit() in CIO code
 
 - Adjust indentation of RELOCS command build step
 
 - Make crypto performance counters upward compatible
 
 - Convert make_page_secure() and gmap_make_secure() to use folio
 
 - Rework channel-utilization-block (CUB) handling in preparation of
   introducing additional CUBs
 
 - Use attribute groups to simplify registration, removal and extension
   of measurement-related channel-path sysfs attributes
 
 - Add a per-channel-path binary "ext_measurement" sysfs attribute that
   provides access to extended channel-path measurement data
 
 - Export measurement data for all channel-measurement-groups (CMG), not
   only for a specific ones. This enables support of new CMG data formats
   in userspace without the need for kernel changes
 
 - Add a per-channel-path sysfs attribute "speed_bps" that provides the
   operating speed in bits per second or 0 if the operating speed is not
   available
 
 - The CIO tracepoint subchannel-type field "st" is incorrectly set to
   the value of subchannel-enabled SCHIB "ena" field. Fix that
 
 - Do not forcefully limit vmemmap starting address to MAX_PHYSMEM_BITS
 
 - Consider the maximum physical address available to a DCSS segment
   (512GB) when memory layout is set up
 
 - Simplify the virtual memory layout setup by reducing the size of
   identity mapping vs vmemmap overlap
 
 - Swap vmalloc and Lowcore/Real Memory Copy areas in virtual memory.
   This will allow to place the kernel image next to kernel modules
 
 - Move everyting KASLR related from <asm/setup.h> to <asm/page.h>
 
 - Put virtual memory layout information into a structure to improve
   code generation
 
 - Currently __kaslr_offset is the kernel offset in both physical and
   virtual memory spaces. Uncouple these offsets to allow uncoupling
   of the addresses spaces
 
 - Currently the identity mapping base address is implicit and is always
   set to zero. Make it explicit by putting into __identity_base persistent
   boot variable and use it in proper context
 
 - Introduce .amode31 section start and end macros AMODE31_START and
   AMODE31_END
 
 - Introduce OS_INFO entries that do not reference any data in memory,
   but rather provide only values
 
 - Store virtual memory layout in OS_INFO. It is read out by makedumpfile,
   crash and other tools
 
 - Store virtual memory layout in VMCORE_INFO. It is read out by crash and
   other tools when /proc/kcore device is used
 
 - Create additional PT_LOAD ELF program header that covers kernel image
   only, so that vmcore tools could locate kernel text and data when virtual
   and physical memory spaces are uncoupled
 
 - Uncouple physical and virtual address spaces
 
 - Map kernel at fixed location when KASLR mode is disabled. The location is
   defined by CONFIG_KERNEL_IMAGE_BASE kernel configuration value.
 
 - Rework deployment of kernel image for both compressed and uncompressed
   variants as defined by CONFIG_KERNEL_UNCOMPRESSED kernel configuration
   value
 
 - Move .vmlinux.relocs section in front of the compressed kernel.
   The interim section rescue step is avoided as result
 
 - Correct modules thunk offset calculation when branch target is more
   than 2GB away
 
 - Kernel modules contain their own set of expoline thunks. Now that the
   kernel modules area is less than 4GB away from kernel expoline thunks,
   make modules use kernel expolines. Also make EXPOLINE_EXTERN the default
   if the compiler supports it
 
 - userfaultfd can insert shared zeropages into processes running VMs,
   but that is not allowed for s390. Fallback to allocating a fresh
   zeroed anonymous folio and insert that instead
 
 - Re-enable shared zeropages for non-PV and non-skeys KVM guests
 
 - Rename hex2bitmap() to ap_hex2bitmap() and export it for external use
 
 - Add ap_config sysfs attribute to provide the means for setting or
   displaying adapters, domains and control domains assigned to a vfio-ap
   mediated device in a single operation
 
 - Make vfio_ap_mdev_link_queue() ignore duplicate link requests
 
 - Add write support to ap_config sysfs attribute to allow atomic update
   a vfio-ap mediated device state
 
 - Document ap_config sysfs attribute
 
 - Function os_info_old_init() is expected to be called only from a regular
   kdump kernel. Enable it to be called from a stand-alone dump kernel
 
 - Address gcc -Warray-bounds warning and fix array size in struct os_info
 
 - s390 does not support SMBIOS, so drop unneeded CONFIG_DMI checks
 
 - Use unwinder instead of __builtin_return_address() with ftrace to
   prevent returning of undefined values
 
 - Sections .hash and .gnu.hash are only created when CONFIG_PIE_BUILD
   kernel is enabled. Drop these for the case CONFIG_PIE_BUILD is disabled
 
 - Compile kernel with -fPIC and link with -no-pie to allow kpatch feature
   always succeed and drop the whole CONFIG_PIE_BUILD option-enabled code
 
 - Add missing virt_to_phys() converter for VSIE facility and crypto
   control blocks
 -----BEGIN PGP SIGNATURE-----
 
 iI0EABYIADUWIQQrtrZiYVkVzKQcYivNdxKlNrRb8AUCZjkp5xccYWdvcmRlZXZA
 bGludXguaWJtLmNvbQAKCRDNdxKlNrRb8D99AQCEby+KHssuZe9m0NvvikWREYBC
 myqob4EmdU3KdTEbNAEAt2OB7mzSQc90yjawI+Je7vwVyh3uc2Nb4Qg05yO6owI=
 =eOYN
 -----END PGP SIGNATURE-----

Merge tag 's390-6.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 updates from Alexander Gordeev:

 - Store AP Query Configuration Information in a static buffer

 - Rework the AP initialization and add missing cleanups to the error
   path

 - Swap IRQ and AP bus/device registration to avoid race conditions

 - Export prot_virt_guest symbol

 - Introduce AP configuration changes notifier interface to facilitate
   modularization of the AP bus

 - Add CONFIG_AP kernel configuration option to allow modularization of
   the AP bus

 - Rework CONFIG_ZCRYPT_DEBUG kernel configuration option description
   and dependency and rename it to CONFIG_AP_DEBUG

 - Convert sprintf() and snprintf() to sysfs_emit() in CIO code

 - Adjust indentation of RELOCS command build step

 - Make crypto performance counters upward compatible

 - Convert make_page_secure() and gmap_make_secure() to use folio

 - Rework channel-utilization-block (CUB) handling in preparation of
   introducing additional CUBs

 - Use attribute groups to simplify registration, removal and extension
   of measurement-related channel-path sysfs attributes

 - Add a per-channel-path binary "ext_measurement" sysfs attribute that
   provides access to extended channel-path measurement data

 - Export measurement data for all channel-measurement-groups (CMG), not
   only for a specific ones. This enables support of new CMG data
   formats in userspace without the need for kernel changes

 - Add a per-channel-path sysfs attribute "speed_bps" that provides the
   operating speed in bits per second or 0 if the operating speed is not
   available

 - The CIO tracepoint subchannel-type field "st" is incorrectly set to
   the value of subchannel-enabled SCHIB "ena" field. Fix that

 - Do not forcefully limit vmemmap starting address to MAX_PHYSMEM_BITS

 - Consider the maximum physical address available to a DCSS segment
   (512GB) when memory layout is set up

 - Simplify the virtual memory layout setup by reducing the size of
   identity mapping vs vmemmap overlap

 - Swap vmalloc and Lowcore/Real Memory Copy areas in virtual memory.
   This will allow to place the kernel image next to kernel modules

 - Move everyting KASLR related from <asm/setup.h> to <asm/page.h>

 - Put virtual memory layout information into a structure to improve
   code generation

 - Currently __kaslr_offset is the kernel offset in both physical and
   virtual memory spaces. Uncouple these offsets to allow uncoupling of
   the addresses spaces

 - Currently the identity mapping base address is implicit and is always
   set to zero. Make it explicit by putting into __identity_base
   persistent boot variable and use it in proper context

 - Introduce .amode31 section start and end macros AMODE31_START and
   AMODE31_END

 - Introduce OS_INFO entries that do not reference any data in memory,
   but rather provide only values

 - Store virtual memory layout in OS_INFO. It is read out by
   makedumpfile, crash and other tools

 - Store virtual memory layout in VMCORE_INFO. It is read out by crash
   and other tools when /proc/kcore device is used

 - Create additional PT_LOAD ELF program header that covers kernel image
   only, so that vmcore tools could locate kernel text and data when
   virtual and physical memory spaces are uncoupled

 - Uncouple physical and virtual address spaces

 - Map kernel at fixed location when KASLR mode is disabled. The
   location is defined by CONFIG_KERNEL_IMAGE_BASE kernel configuration
   value.

 - Rework deployment of kernel image for both compressed and
   uncompressed variants as defined by CONFIG_KERNEL_UNCOMPRESSED kernel
   configuration value

 - Move .vmlinux.relocs section in front of the compressed kernel. The
   interim section rescue step is avoided as result

 - Correct modules thunk offset calculation when branch target is more
   than 2GB away

 - Kernel modules contain their own set of expoline thunks. Now that the
   kernel modules area is less than 4GB away from kernel expoline
   thunks, make modules use kernel expolines. Also make EXPOLINE_EXTERN
   the default if the compiler supports it

 - userfaultfd can insert shared zeropages into processes running VMs,
   but that is not allowed for s390. Fallback to allocating a fresh
   zeroed anonymous folio and insert that instead

 - Re-enable shared zeropages for non-PV and non-skeys KVM guests

 - Rename hex2bitmap() to ap_hex2bitmap() and export it for external use

 - Add ap_config sysfs attribute to provide the means for setting or
   displaying adapters, domains and control domains assigned to a
   vfio-ap mediated device in a single operation

 - Make vfio_ap_mdev_link_queue() ignore duplicate link requests

 - Add write support to ap_config sysfs attribute to allow atomic update
   a vfio-ap mediated device state

 - Document ap_config sysfs attribute

 - Function os_info_old_init() is expected to be called only from a
   regular kdump kernel. Enable it to be called from a stand-alone dump
   kernel

 - Address gcc -Warray-bounds warning and fix array size in struct
   os_info

 - s390 does not support SMBIOS, so drop unneeded CONFIG_DMI checks

 - Use unwinder instead of __builtin_return_address() with ftrace to
   prevent returning of undefined values

 - Sections .hash and .gnu.hash are only created when CONFIG_PIE_BUILD
   kernel is enabled. Drop these for the case CONFIG_PIE_BUILD is
   disabled

 - Compile kernel with -fPIC and link with -no-pie to allow kpatch
   feature always succeed and drop the whole CONFIG_PIE_BUILD
   option-enabled code

 - Add missing virt_to_phys() converter for VSIE facility and crypto
   control blocks

* tag 's390-6.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (54 commits)
  Revert "s390: Relocate vmlinux ELF data to virtual address space"
  KVM: s390: vsie: Use virt_to_phys for crypto control block
  s390: Relocate vmlinux ELF data to virtual address space
  s390: Compile kernel with -fPIC and link with -no-pie
  s390: vmlinux.lds.S: Drop .hash and .gnu.hash for !CONFIG_PIE_BUILD
  s390/ftrace: Use unwinder instead of __builtin_return_address()
  s390/pci: Drop unneeded reference to CONFIG_DMI
  s390/os_info: Fix array size in struct os_info
  s390/os_info: Initialize old os_info in standalone dump kernel
  docs: Update s390 vfio-ap doc for ap_config sysfs attribute
  s390/vfio-ap: Add write support to sysfs attr ap_config
  s390/vfio-ap: Ignore duplicate link requests in vfio_ap_mdev_link_queue
  s390/vfio-ap: Add sysfs attr, ap_config, to export mdev state
  s390/ap: Externalize AP bus specific bitmap reading function
  s390/mm: Re-enable the shared zeropage for !PV and !skeys KVM guests
  mm/userfaultfd: Do not place zeropages when zeropages are disallowed
  s390/expoline: Make modules use kernel expolines
  s390/nospec: Correct modules thunk offset calculation
  s390/boot: Do not rescue .vmlinux.relocs section
  s390/boot: Rework deployment of the kernel image
  ...
2024-05-13 08:33:52 -07:00
Harald Freudenberger
da5658320b s390/zcrypt: Use EBUSY to indicate temp unavailability
Use -EBUSY instead of -EAGAIN in zcrypt_ccamisc.c
in cases where the CCA card returns 8/2290 to indicate
a temporarily unavailability of this function.

Fixes: ed6776c96c ("s390/crypto: remove retry loop with sleep from PAES pkey invocation")
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Ingo Franzki <ifranzki@linux.ibm.com>
Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2024-05-01 11:52:54 +02:00
Harald Freudenberger
c0e983b697 s390/zcrypt: Handle ep11 cprb return code
An EP11 reply cprb contains a field ret_code which may
hold an error code different than the error code stored
in the payload of the cprb. As of now all the EP11 misc
functions do not evaluate this field but focus on the
error code in the payload.

Before checking the payload error, first the cprb error
field should be evaluated which is introduced with this
patch.

If the return code value 0x000c0003 is seen, this
indicates a busy situation which is reflected by
-EBUSY in the zcrpyt_ep11misc.c low level function.
A higher level caller should consider to retry after
waiting a dedicated duration (say 1 second).

Fixes: ed6776c96c ("s390/crypto: remove retry loop with sleep from PAES pkey invocation")
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Ingo Franzki <ifranzki@linux.ibm.com>
Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2024-05-01 11:52:54 +02:00
Harald Freudenberger
a4499998c7 s390/zcrypt: Fix wrong format string in debug feature printout
Fix wrong format string debug feature: %04x was used
to print out a 32 bit value. - changed to %08x.

Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Ingo Franzki <ifranzki@linux.ibm.com>
Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2024-05-01 11:52:54 +02:00
Jason J. Herne
8fb456bc9f s390/vfio-ap: Add write support to sysfs attr ap_config
Allow writing a complete set of masks to ap_config. Doing so will
cause the vfio-ap driver to replace the vfio-ap mediated device's
matrix masks with the given set of masks. If the given state cannot
be set, then no changes are made to the vfio-ap mediated device.

The format of the data written to ap_config is as follows:
{amask},{dmask},{cmask}\n

\n is a newline character.

amask, dmask, and cmask are masks identifying which adapters, domains,
and control domains should be assigned to the mediated device.

The format of a mask is as follows:
0xNN..NN

Where NN..NN is 64 hexadecimal characters representing a 256-bit value.
The leftmost (highest order) bit represents adapter/domain 0.

For an example set of masks that represent your mdev's current
configuration, simply cat ap_config.

This attribute is intended to be used by an mdevctl callout script
supporting the mdev type vfio_ap-passthrough to atomically update a
vfio-ap mediated device's state.

Signed-off-by: "Jason J. Herne" <jjherne@linux.ibm.com>
Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Link: https://lore.kernel.org/r/20240415152555.13152-5-jjherne@linux.ibm.com
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2024-04-22 12:49:17 +02:00
Jason J. Herne
f3e3a4008c s390/vfio-ap: Ignore duplicate link requests in vfio_ap_mdev_link_queue
vfio_ap_mdev_link_queue is changed to detect if a matrix_mdev has
already linked the given queue. If so, it bails out.

Signed-off-by: "Jason J. Herne" <jjherne@linux.ibm.com>
Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com>
Link: https://lore.kernel.org/r/20240415152555.13152-4-jjherne@linux.ibm.com
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2024-04-22 12:49:17 +02:00
Jason J. Herne
e12aa0b5b2 s390/vfio-ap: Add sysfs attr, ap_config, to export mdev state
Add ap_config sysfs attribute. This will provide the means for
setting or displaying the adapters, domains and control domains assigned
to the vfio-ap mediated device in a single operation. This sysfs
attribute is comprised of three masks: One for adapters, one for domains,
and one for control domains.

This attribute is intended to be used by mdevctl to query a vfio-ap
mediated device's state.

Signed-off-by: "Jason J. Herne" <jjherne@linux.ibm.com>
Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Link: https://lore.kernel.org/r/20240415152555.13152-3-jjherne@linux.ibm.com
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2024-04-22 12:49:17 +02:00
Jason J. Herne
6e6973948c s390/ap: Externalize AP bus specific bitmap reading function
Rename hex2bitmap() to ap_hex2bitmap() and export it for external
use. This function will be used by the implementation of the vfio-ap
ap_config sysfs attribute.

Signed-off-by: "Jason J. Herne" <jjherne@linux.ibm.com>
Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Harald Freudenberger <freude@linux.ibm.com>
Link: https://lore.kernel.org/r/20240415152555.13152-2-jjherne@linux.ibm.com
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2024-04-22 12:49:17 +02:00
Holger Dengler
b3840c8bfc s390/ap: rename ap debug configuration option
The configuration option ZCRYPT_DEBUG is used only in ap queue code,
so rename it to AP_DEBUG. It also no longer depends on ZCRYPT but on
AP. While at it, also update the help text.

Signed-off-by: Holger Dengler <dengler@linux.ibm.com>
Reviewed-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2024-04-09 17:29:56 +02:00
Holger Dengler
123760841a s390/ap: modularize ap bus
There is no hard requirement to have the ap bus statically in the
kernel, so add an option to compile it as module.

Cc: Tony Krowiak <akrowiak@linux.ibm.com>
Cc: Halil Pasic <pasic@linux.ibm.com>
Signed-off-by: Holger Dengler <dengler@linux.ibm.com>
Reviewed-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Anthony Krowiak <akrowiak@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2024-04-09 17:29:56 +02:00
Holger Dengler
2a483d333f s390/chsc: use notifier for AP configuration changes
The direct dependency of chsc and the AP bus prevents the
modularization of ap bus. Introduce a notifier interface for AP
changes, which decouples the producer of the change events (chsc) from
the consumer (ap_bus).

Remove the ap_cfg_chg() interface and replace it with the notifier
invocation. The ap bus module registers a notification handler, which
triggers the AP bus scan.

Cc: Vineeth Vijayan <vneethv@linux.ibm.com>
Cc: Peter Oberparleiter <oberpar@linux.ibm.com>
Signed-off-by: Holger Dengler <dengler@linux.ibm.com>
Reviewed-by: Harald Freudenberger <freude@linux.ibm.com>
Acked-by: Vineeth Vijayan <vneethv@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2024-04-09 17:29:55 +02:00
Holger Dengler
3c7a377324 s390/ap: swap IRQ and bus/device registration
The IRQ handler may rely on the bus or the root device. Register the
adapter IRQ after setting up the bus and the root device to avoid any
race conditions.

Signed-off-by: Holger Dengler <dengler@linux.ibm.com>
Reviewed-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2024-04-09 17:29:55 +02:00
Holger Dengler
170660ccf8 s390/ap: rework ap initialization
Rework the ap initialization and add missing cleanups to the error path.
Errors during the registration of IRQ handler is now also detected.

Suggested-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Holger Dengler <dengler@linux.ibm.com>
Reviewed-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2024-04-09 17:29:55 +02:00
Holger Dengler
8dec9cb9f5 s390/ap: use static qci information
Since qci is available on most of the current machines, move away from
the dynamic buffers for qci information and store it instead in a
statically defined buffer.

The new flags member in struct ap_config_info is now used as an
indicator, if qci is available in the system (at least one of these
bits is set).

Suggested-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Holger Dengler <dengler@linux.ibm.com>
Reviewed-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2024-04-09 17:29:55 +02:00
Ricardo B. Marliere
b11cc9e6d9 s390/zcrypt: make zcrypt_class constant
Since commit 43a7206b09 ("driver core: class: make class_register() take
a const *"), the driver core allows for struct class to be in read-only
memory, so move the zcrypt_class structure to be declared at build time
placing it into read-only memory, instead of having to be dynamically
allocated at boot time.

Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: "Ricardo B. Marliere" <ricardo@marliere.net>
Acked-by: Harald Freudenberger <freude@linux.ibm.com>
Link: https://lore.kernel.org/r/20240305-class_cleanup-s390-v1-1-c4ff1ec49ffd@marliere.net
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-03-13 09:23:49 +01:00
Harald Freudenberger
50ed48c80f s390/zcrypt: fix reference counting on zcrypt card objects
Tests with hot-plugging crytpo cards on KVM guests with debug
kernel build revealed an use after free for the load field of
the struct zcrypt_card. The reason was an incorrect reference
handling of the zcrypt card object which could lead to a free
of the zcrypt card object while it was still in use.

This is an example of the slab message:

    kernel: 0x00000000885a7512-0x00000000885a7513 @offset=1298. First byte 0x68 instead of 0x6b
    kernel: Allocated in zcrypt_card_alloc+0x36/0x70 [zcrypt] age=18046 cpu=3 pid=43
    kernel:  kmalloc_trace+0x3f2/0x470
    kernel:  zcrypt_card_alloc+0x36/0x70 [zcrypt]
    kernel:  zcrypt_cex4_card_probe+0x26/0x380 [zcrypt_cex4]
    kernel:  ap_device_probe+0x15c/0x290
    kernel:  really_probe+0xd2/0x468
    kernel:  driver_probe_device+0x40/0xf0
    kernel:  __device_attach_driver+0xc0/0x140
    kernel:  bus_for_each_drv+0x8c/0xd0
    kernel:  __device_attach+0x114/0x198
    kernel:  bus_probe_device+0xb4/0xc8
    kernel:  device_add+0x4d2/0x6e0
    kernel:  ap_scan_adapter+0x3d0/0x7c0
    kernel:  ap_scan_bus+0x5a/0x3b0
    kernel:  ap_scan_bus_wq_callback+0x40/0x60
    kernel:  process_one_work+0x26e/0x620
    kernel:  worker_thread+0x21c/0x440
    kernel: Freed in zcrypt_card_put+0x54/0x80 [zcrypt] age=9024 cpu=3 pid=43
    kernel:  kfree+0x37e/0x418
    kernel:  zcrypt_card_put+0x54/0x80 [zcrypt]
    kernel:  ap_device_remove+0x4c/0xe0
    kernel:  device_release_driver_internal+0x1c4/0x270
    kernel:  bus_remove_device+0x100/0x188
    kernel:  device_del+0x164/0x3c0
    kernel:  device_unregister+0x30/0x90
    kernel:  ap_scan_adapter+0xc8/0x7c0
    kernel:  ap_scan_bus+0x5a/0x3b0
    kernel:  ap_scan_bus_wq_callback+0x40/0x60
    kernel:  process_one_work+0x26e/0x620
    kernel:  worker_thread+0x21c/0x440
    kernel:  kthread+0x150/0x168
    kernel:  __ret_from_fork+0x3c/0x58
    kernel:  ret_from_fork+0xa/0x30
    kernel: Slab 0x00000372022169c0 objects=20 used=18 fp=0x00000000885a7c88 flags=0x3ffff00000000a00(workingset|slab|node=0|zone=1|lastcpupid=0x1ffff)
    kernel: Object 0x00000000885a74b8 @offset=1208 fp=0x00000000885a7c88
    kernel: Redzone  00000000885a74b0: bb bb bb bb bb bb bb bb                          ........
    kernel: Object   00000000885a74b8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
    kernel: Object   00000000885a74c8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
    kernel: Object   00000000885a74d8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
    kernel: Object   00000000885a74e8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
    kernel: Object   00000000885a74f8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
    kernel: Object   00000000885a7508: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 68 4b 6b 6b 6b a5  kkkkkkkkkkhKkkk.
    kernel: Redzone  00000000885a7518: bb bb bb bb bb bb bb bb                          ........
    kernel: Padding  00000000885a756c: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a              ZZZZZZZZZZZZ
    kernel: CPU: 0 PID: 387 Comm: systemd-udevd Not tainted 6.8.0-HF #2
    kernel: Hardware name: IBM 3931 A01 704 (KVM/Linux)
    kernel: Call Trace:
    kernel:  [<00000000ca5ab5b8>] dump_stack_lvl+0x90/0x120
    kernel:  [<00000000c99d78bc>] check_bytes_and_report+0x114/0x140
    kernel:  [<00000000c99d53cc>] check_object+0x334/0x3f8
    kernel:  [<00000000c99d820c>] alloc_debug_processing+0xc4/0x1f8
    kernel:  [<00000000c99d852e>] get_partial_node.part.0+0x1ee/0x3e0
    kernel:  [<00000000c99d94ec>] ___slab_alloc+0xaf4/0x13c8
    kernel:  [<00000000c99d9e38>] __slab_alloc.constprop.0+0x78/0xb8
    kernel:  [<00000000c99dc8dc>] __kmalloc+0x434/0x590
    kernel:  [<00000000c9b4c0ce>] ext4_htree_store_dirent+0x4e/0x1c0
    kernel:  [<00000000c9b908a2>] htree_dirblock_to_tree+0x17a/0x3f0
    kernel:  [<00000000c9b919dc>] ext4_htree_fill_tree+0x134/0x400
    kernel:  [<00000000c9b4b3d0>] ext4_dx_readdir+0x160/0x2f0
    kernel:  [<00000000c9b4bedc>] ext4_readdir+0x5f4/0x760
    kernel:  [<00000000c9a7efc4>] iterate_dir+0xb4/0x280
    kernel:  [<00000000c9a7f1ea>] __do_sys_getdents64+0x5a/0x120
    kernel:  [<00000000ca5d6946>] __do_syscall+0x256/0x310
    kernel:  [<00000000ca5eea10>] system_call+0x70/0x98
    kernel: INFO: lockdep is turned off.
    kernel: FIX kmalloc-96: Restoring Poison 0x00000000885a7512-0x00000000885a7513=0x6b
    kernel: FIX kmalloc-96: Marking all objects used

The fix is simple: Before use of the queue not only the queue object
but also the card object needs to increase it's reference count
with a call to zcrypt_card_get(). Similar after use of the queue
not only the queue but also the card object's reference count is
decreased with zcrypt_card_put().

Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Cc: stable@vger.kernel.org
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-03-13 09:23:44 +01:00
Harald Freudenberger
5dabfecad4 s390/pkey: improve pkey retry behavior
This patch reworks and improves the pkey retry behavior for the
pkey_ep11key2pkey() function. In contrast to the pkey_skey2pkey()
function which is used to trigger a protected key derivation from an
CCA secure data or cipher key the EP11 counterpart function had no
proper retry loop implemented. This patch now introduces code which
acts similar to the retry already done for CCA keys for this function
used for EP11 keys.

Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-03-07 14:41:15 +01:00
Harald Freudenberger
c3384369bc s390/zcrypt: improve zcrypt retry behavior
This patch reworks and improves the zcrypt retry behavior:
- The zcrypt_rescan_req counter has been removed. This
  counter variable has been increased on some transport
  errors and was used as a gatekeeper for AP bus rescans.
- Rework of the zcrypt_process_rescan() function to not
  use the above counter variable any more. Instead now
  always the ap_bus_force_rescan() function is called
  (as this has been improved with a previous patch).
- As the zcrpyt_process_rescan() function is called in
  all cprb send functions in case of the first attempt
  to send failed with ENODEV now before the next attempt
  to send an cprb is started.
- Introduce a define ZCRYPT_WAIT_BINDINGS_COMPLETE_MS
  for the amount of milliseconds to have the zcrypt API
  wait for AP bindings complete. This amount has been
  reduced to 30s (was 60s). Some playing around showed
  that 30s is a really fair limit.

The result of the above together with the patches to
improve the AP scan bus functions is that after the
first loop of cprb send retries when the result is a
ENODEV the AP bus scan is always triggered (synchronous).
If the AP bus scan detects changes in the configuration,
all the send functions now retry when the first attempt
was failing with ENODEV in the hope that now a suitable
device has appeared.

About concurrency: The ap_bus_force_rescan() uses a mutex
to ensure only one active AP bus scan is running. Another
caller of this function is blocked as long as the scan is
running but does not cause yet another scan. Instead the
result of the 'other' scan is used. This affects only tasks
which run into an initial ENODEV. Tasks with successful
delivery of cprbs will never invoke the bus scan and thus
never get blocked by the mutex.

Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-03-07 14:41:15 +01:00
Harald Freudenberger
77c51fc6fb s390/zcrypt: introduce retries on in-kernel send CPRB functions
The both functions zcrypt_send_cprb() and zcrypt_send_ep11_cprb()
are used to send CPRBs in-kernel from different sources. For
example the pkey module may call one of the functions in
zcrypt_ep11misc.c to trigger a derive of a protected key from
a secure key blob via an existing crypto card. These both
functions are then the internal API to send the CPRB and
receive the response.

All the ioctl functions to send an CPRB down to the addressed
crypto card use some kind of retry mechanism. When the first
attempt fails with ENODEV, a bus rescan is triggered and a
loop with retries is carried out.

For the both named internal functions there was never any
retry attempt made. This patch now introduces the retry code
even for this both internal functions to have effectively
same behavior on sending an CPRB from an in-kernel source
and sending an CPRB from userspace via ioctl.

Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-03-07 14:41:14 +01:00
Harald Freudenberger
eacf5b3651 s390/ap: introduce mutex to lock the AP bus scan
Rework the invocations around ap_scan_bus():
- Protect ap_scan_bus() with a mutex to make sure only one
  scan at a time is running.
- The workqueue invocation which is triggered by either the
  module init or via AP bus scan timer expiration uses this
  mutex and if there is already a scan running, the work
  is simple aborted (as the job is done by another task).
- The ap_bus_force_rescan() which is invoked by higher level
  layers mostly on failures which indicate a bus scan may
  help is reworked to call ap_scan_bus() direct instead of
  enqueuing work into a system workqueue and waiting for that
  to finish. Of course the mutex is respected and in case of
  another task already running a bus scan the shortcut of
  waiting for this scan to finish and reusing the scan result
  is taken.

Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-03-07 14:41:14 +01:00
Harald Freudenberger
b5caf05ee8 s390/ap: rework ap_scan_bus() to return true on config change
The AP scan bus function now returns true if there have
been any config changes detected. This will become
important in a follow up patch which will exploit this
hint for further actions. This also required to have
the AP scan bus timer callback reworked as the function
signature has changed to bool ap_scan_bus(void).

Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-03-07 14:41:14 +01:00
Harald Freudenberger
99b3126e46 s390/ap: clarify AP scan bus related functions and variables
This patch tries to clarify the functions and variables
around the AP scan bus job. All these variables and
functions start with ap_scan_bus and are declared in
one place now.

No functional changes in this patch - only renaming and
move of code or declarations.

Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-03-07 14:41:14 +01:00
Harald Freudenberger
778412ab91 s390/ap: rearm APQNs bindings complete completion
The APQN bindings complete completion was used to reflect
that 1st the AP bus initial scan is done and 2nd all the
detected APQNs have been bound to a device driver.
This was a single-shot action. However, as the AP bus
supports hot-plug it may be that new APQNs appear reflected
as new AP queue and card devices which need to be bound
to appropriate device drivers. So the condition that
all existing AP queue devices are bound to device drivers
may go away for a certain time.

This patch now checks during AP bus scan for maybe new AP
devices appearing and does a re-init of the internal completion
variable. So the AP bus function ap_wait_apqn_bindings_complete()
now may block on this condition variable even later after
initial scan is through when new APQNs appear which need to
get bound.

This patch also moves the check for binding complete invocation
from the probe function to the end of the AP bus scan function.
This change also covers some weird scenarios where during a
card hotplug the binding of the card device was sufficient for
binding complete but the queue devices where still in the
process of being discovered.

As of now this change has no impact on existing code. The
behavior change in the now later bindings complete should not
impact any code (and has been tested so far). The only
exploiter is the zcrypt function zcrypt_wait_api_operational()
which only initial calls ap_wait_apqn_bindings_complete().

However, this new behavior of the AP bus wait for APQNs bindings
complete function will be used in a later patch exploiting
this for the zcrypt API layer.

Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-03-07 14:41:14 +01:00
Jason J. Herne
a681226c67 s390/vfio-ap: handle hardware checkstop state on queue reset operation
Update vfio_ap_mdev_reset_queue() to handle an unexpected checkstop (hardware error) the
same as the deconfigured case. This prevents unexpected and unhelpful warnings in the
event of a hardware error.

We also stop lying about a queue's reset response code. This was originally done so we
could force vfio_ap_mdev_filter_matrix to pass a deconfigured device through to the guest
for the hotplug scenario. vfio_ap_mdev_filter_matrix is instead modified to allow
passthrough for all queues with reset state normal, deconfigured, or checkstopped. In the
checkstopped case we choose to pass the device through and let the error state be
reflected at the guest level.

Signed-off-by: "Jason J. Herne" <jjherne@linux.ibm.com>
Reviewed-by: Anthony Krowiak <akrowiak@linux.ibm.com>
Link: https://lore.kernel.org/r/20240215153144.14747-1-jjherne@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-03-07 14:41:14 +01:00
Holger Dengler
d065bdb4d1 s390/ap: explicitly include ultravisor header
The ap_bus is using inline functions of the ultravisor (uv) in-kernel
API. The related header file is implicitly included via several other
headers. Replace this by an explicit include of the ultravisor header
in the ap_bus file.

Signed-off-by: Holger Dengler <dengler@linux.ibm.com>
Reviewed-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-20 14:37:31 +01:00
Harald Freudenberger
b69b65f511 s390/zcrypt: add debug possibility for CCA and EP11 messages
This patch introduces dynamic debug hexdump invocation
possibilities to be able to:
- dump an CCA or EP11 CPRB request as early as possible
  when received via ioctl from userspace but after the
  ap message has been collected together.
- dump an CCA or EP11 CPRB reply short before it is
  transferred via ioctl into userspace.

Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-16 14:30:13 +01:00
Harald Freudenberger
6a2892d09d s390/ap: add debug possibility for AP messages
This patch introduces two dynamic debug hexdump
invocation possibilities to be able to a) dump an
AP message immediately before it goes into the
firmware queue and b) dump a fresh from the
firmware queue received AP message.

Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-16 14:30:13 +01:00
Harald Freudenberger
6d749b4e02 s390/pkey: introduce dynamic debugging for pkey
This patch replaces all the s390 debug feature calls with
debug level by dynamic debug calls pr_debug. These calls
are much more flexible and each single invocation can get
enabled/disabled at runtime wheres the s390 debug feature
debug calls have only one knob - enable or disable all in
one bunch.

This patch follows a similar change for the AP bus and
zcrypt device driver code. All this code uses dynamic
debugging with pr_debug and friends for emitting debug
traces now.

Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-16 14:30:13 +01:00
Harald Freudenberger
0ccac45295 s390/pkey: harmonize pkey s390 debug feature calls
Cleanup and harmonize the s390 debug feature calls
and defines for the pkey module to be similar to
the debug feature as it is used in the zcrypt device
driver and AP bus.

More or less only renaming but no functional changes.

Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-16 14:30:13 +01:00
Harald Freudenberger
08b2c3706d s390/zcrypt: introduce dynamic debugging for AP and zcrypt code
This patch replaces all the s390 debug feature calls with
debug level by dynamic debug calls pr_debug. These calls
are much more flexible and each single invocation can get
enabled/disabled at runtime wheres the s390 debug feature
debug calls have only one knob - enable or disable all in
one bunch. The benefit is especially significant with
high frequency called functions like the AP bus scan. In
most debugging scenarios you don't want and need them, but
sometimes it is crucial to know exactly when and how long
the AP bus scan took.

Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-16 14:30:12 +01:00
Harald Freudenberger
88e4c0da9b s390/zcrypt: harmonize debug feature calls and defines
This patch harmonizes the calls and defines around the
s390 debug feature as it is used in the AP bus and
zcrypt device driver code.

More or less cleanup and renaming, no functional changes.

Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-16 14:30:12 +01:00
Ricardo B. Marliere
9e99049a80 s390/vfio-ap: make matrix_bus const
Now that the driver core can properly handle constant struct bus_type,
move the matrix_bus variable to be a constant structure as well,
placing it into read-only memory which can not be modified at runtime.

Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: "Ricardo B. Marliere" <ricardo@marliere.net>
Acked-by: Halil Pasic <pasic@linux.ibm.com>
Reviewed-by: Anthony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: "Jason J. Herne" <jjherne@linux.ibm.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/r/20240203-bus_cleanup-s390-v1-6-ac891afc7282@marliere.net
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-09 13:58:16 +01:00
Ricardo B. Marliere
5b43178754 s390/ap: make ap_bus_type const
Now that the driver core can properly handle constant struct bus_type,
move the ap_bus_type variable to be a constant structure as well,
placing it into read-only memory which can not be modified at runtime.

Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: "Ricardo B. Marliere" <ricardo@marliere.net>
Reviewed-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/r/20240203-bus_cleanup-s390-v1-5-ac891afc7282@marliere.net
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-09 13:58:16 +01:00
Tony Krowiak
b9bd10c434 s390/vfio-ap: do not reset queue removed from host config
When a queue is unbound from the vfio_ap device driver, it is reset to
ensure its crypto data is not leaked when it is bound to another device
driver. If the queue is unbound due to the fact that the adapter or domain
was removed from the host's AP configuration, then attempting to reset it
will fail with response code 01 (APID not valid) getting returned from the
reset command. Let's ensure that the queue is assigned to the host's
configuration before resetting it.

Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: "Jason J. Herne" <jjherne@linux.ibm.com>
Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
Fixes: eeb386aeb5 ("s390/vfio-ap: handle config changed and scan complete notification")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240115185441.31526-7-akrowiak@linux.ibm.com
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2024-01-17 13:53:06 +01:00
Tony Krowiak
f009cfa466 s390/vfio-ap: reset queues associated with adapter for queue unbound from driver
When a queue is unbound from the vfio_ap device driver, if that queue is
assigned to a guest's AP configuration, its associated adapter is removed
because queues are defined to a guest via a matrix of adapters and
domains; so, it is not possible to remove a single queue.

If an adapter is removed from the guest's AP configuration, all associated
queues must be reset to prevent leaking crypto data should any of them be
assigned to a different guest or device driver. The one caveat is that if
the queue is being removed because the adapter or domain has been removed
from the host's AP configuration, then an attempt to reset the queue will
fail with response code 01, AP-queue number not valid; so resetting these
queues should be skipped.

Acked-by: Halil Pasic <pasic@linux.ibm.com>
Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
Fixes: 09d31ff787 ("s390/vfio-ap: hot plug/unplug of AP devices when probed/removed")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240115185441.31526-6-akrowiak@linux.ibm.com
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2024-01-17 13:53:05 +01:00
Tony Krowiak
f848cba767 s390/vfio-ap: reset queues filtered from the guest's AP config
When filtering the adapters from the configuration profile for a guest to
create or update a guest's AP configuration, if the APID of an adapter and
the APQI of a domain identify a queue device that is not bound to the
vfio_ap device driver, the APID of the adapter will be filtered because an
individual APQN can not be filtered due to the fact the APQNs are assigned
to an AP configuration as a matrix of APIDs and APQIs. Consequently, a
guest will not have access to all of the queues associated with the
filtered adapter. If the queues are subsequently made available again to
the guest, they should re-appear in a reset state; so, let's make sure all
queues associated with an adapter unplugged from the guest are reset.

In order to identify the set of queues that need to be reset, let's allow a
vfio_ap_queue object to be simultaneously stored in both a hashtable and a
list: A hashtable used to store all of the queues assigned
to a matrix mdev; and/or, a list used to store a subset of the queues that
need to be reset. For example, when an adapter is hot unplugged from a
guest, all guest queues associated with that adapter must be reset. Since
that may be a subset of those assigned to the matrix mdev, they can be
stored in a list that can be passed to the vfio_ap_mdev_reset_queues
function.

Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
Acked-by: Halil Pasic <pasic@linux.ibm.com>
Fixes: 48cae940c3 ("s390/vfio-ap: refresh guest's APCB by filtering AP resources assigned to mdev")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240115185441.31526-5-akrowiak@linux.ibm.com
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2024-01-17 13:53:05 +01:00
Tony Krowiak
774d10196e s390/vfio-ap: let on_scan_complete() callback filter matrix and update guest's APCB
When adapters and/or domains are added to the host's AP configuration, this
may result in multiple queue devices getting created and probed by the
vfio_ap device driver. For each queue device probed, the matrix of adapters
and domains assigned to a matrix mdev will be filtered to update the
guest's APCB. If any adapters or domains get added to or removed from the
APCB, the guest's AP configuration will be dynamically updated (i.e., hot
plug/unplug). To dynamically update the guest's configuration, its VCPUs
must be taken out of SIE for the period of time it takes to make the
update. This is disruptive to the guest's operation and if there are many
queues probed due to a change in the host's AP configuration, this could be
troublesome. The problem is exacerbated by the fact that the
'on_scan_complete' callback also filters the mdev's matrix and updates
the guest's AP configuration.

In order to reduce the potential amount of disruption to the guest that may
result from a change to the host's AP configuration, let's bypass the
filtering of the matrix and updating of the guest's AP configuration in the
probe callback - if due to a host config change - and defer it until the
'on_scan_complete' callback is invoked after the AP bus finishes its device
scan operation. This way the filtering and updating will be performed only
once regardless of the number of queues added.

Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
Fixes: 48cae940c3 ("s390/vfio-ap: refresh guest's APCB by filtering AP resources assigned to mdev")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240115185441.31526-4-akrowiak@linux.ibm.com
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2024-01-17 13:53:05 +01:00
Tony Krowiak
16fb78cbf5 s390/vfio-ap: loop over the shadow APCB when filtering guest's AP configuration
While filtering the mdev matrix, it doesn't make sense - and will have
unexpected results - to filter an APID from the matrix if the APID or one
of the associated APQIs is not in the host's AP configuration. There are
two reasons for this:

1. An adapter or domain that is not in the host's AP configuration can be
   assigned to the matrix; this is known as over-provisioning. Queue
   devices, however, are only created for adapters and domains in the
   host's AP configuration, so there will be no queues associated with an
   over-provisioned adapter or domain to filter.

2. The adapter or domain may have been externally removed from the host's
   configuration via an SE or HMC attached to a DPM enabled LPAR. In this
   case, the vfio_ap device driver would have been notified by the AP bus
   via the on_config_changed callback and the adapter or domain would
   have already been filtered.

Since the matrix_mdev->shadow_apcb.apm and matrix_mdev->shadow_apcb.aqm are
copied from the mdev matrix sans the APIDs and APQIs not in the host's AP
configuration, let's loop over those bitmaps instead of those assigned to
the matrix.

Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
Fixes: 48cae940c3 ("s390/vfio-ap: refresh guest's APCB by filtering AP resources assigned to mdev")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240115185441.31526-3-akrowiak@linux.ibm.com
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2024-01-17 13:53:05 +01:00
Tony Krowiak
850fb7fa8c s390/vfio-ap: always filter entire AP matrix
The vfio_ap_mdev_filter_matrix function is called whenever a new adapter or
domain is assigned to the mdev. The purpose of the function is to update
the guest's AP configuration by filtering the matrix of adapters and
domains assigned to the mdev. When an adapter or domain is assigned, only
the APQNs associated with the APID of the new adapter or APQI of the new
domain are inspected. If an APQN does not reference a queue device bound to
the vfio_ap device driver, then it's APID will be filtered from the mdev's
matrix when updating the guest's AP configuration.

Inspecting only the APID of the new adapter or APQI of the new domain will
result in passing AP queues through to a guest that are not bound to the
vfio_ap device driver under certain circumstances. Consider the following:

guest's AP configuration (all also assigned to the mdev's matrix):
14.0004
14.0005
14.0006
16.0004
16.0005
16.0006

unassign domain 4
unbind queue 16.0005
assign domain 4

When domain 4 is re-assigned, since only domain 4 will be inspected, the
APQNs that will be examined will be:
14.0004
16.0004

Since both of those APQNs reference queue devices that are bound to the
vfio_ap device driver, nothing will get filtered from the mdev's matrix
when updating the guest's AP configuration. Consequently, queue 16.0005
will get passed through despite not being bound to the driver. This
violates the linux device model requirement that a guest shall only be
given access to devices bound to the device driver facilitating their
pass-through.

To resolve this problem, every adapter and domain assigned to the mdev will
be inspected when filtering the mdev's matrix.

Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
Acked-by: Halil Pasic <pasic@linux.ibm.com>
Fixes: 48cae940c3 ("s390/vfio-ap: refresh guest's APCB by filtering AP resources assigned to mdev")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240115185441.31526-2-akrowiak@linux.ibm.com
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2024-01-17 13:53:05 +01:00
Linus Torvalds
de927f6c0b s390 updates for 6.8 merge window
- Add machine variable capacity information to /proc/sysinfo.
 
 - Limit the waste of page tables and always align vmalloc area size
   and base address on segment boundary.
 
 - Fix a memory leak when an attempt to register interruption sub class
   (ISC) for the adjunct-processor (AP) guest failed.
 
 - Reset response code AP_RESPONSE_INVALID_GISA to understandable
   by guest AP_RESPONSE_INVALID_ADDRESS in response to a failed
   interruption sub class (ISC) registration attempt.
 
 - Improve reaction to adjunct-processor (AP) AP_RESPONSE_OTHERWISE_CHANGED
   response code when enabling interrupts on behalf of a guest.
 
 - Fix incorrect sysfs 'status' attribute of adjunct-processor (AP) queue
   device bound to the vfio_ap device driver when the mediated device is
   attached to a guest, but the queue device is not passed through.
 
 - Rework struct ap_card to hold the whole adjunct-processor (AP) card
   hardware information. As result, all the ugly bit checks are replaced
   by simple evaluations of the required bit fields.
 
 - Improve handling of some weird scenarios between service element (SE)
   host and SE guest with adjunct-processor (AP) pass-through support.
 
 - Change local_ctl_set_bit() and local_ctl_clear_bit() so they return the
   previous value of the to be changed control register. This is useful if
   a bit is only changed temporarily and the previous content needs to be
   restored.
 
 - The kernel starts with machine checks disabled and is expected to enable
   it once trap_init() is called. However the implementation allows machine
   checks early. Consistently enable it in trap_init() only.
 
 - local_mcck_disable() and local_mcck_enable() assume that machine checks
   are always enabled. Instead implement and use local_mcck_save() and
   local_mcck_restore() to disable machine checks and restore the previous
   state.
 
 - Modification of floating point control (FPC) register of a traced
   process using ptrace interface may lead to corruption of the FPC
   register of the tracing process. Fix this.
 
 - kvm_arch_vcpu_ioctl_set_fpu() allows to set the floating point control
   (FPC) register in vCPU, but may lead to corruption of the FPC register
   of the host process. Fix this.
 
 - Use READ_ONCE() to read a vCPU floating point register value from the
   memory mapped area. This avoids that, depending on code generation,
   a different value is tested for validity than the one that is used.
 
 - Get rid of test_fp_ctl(), since it is quite subtle to use it correctly.
   Instead copy a new floating point control register value into its save
   area and test the validity of the new value when loading it.
 
 - Remove superfluous save_fpu_regs() call.
 
 - Remove s390 support for ARCH_WANTS_DYNAMIC_TASK_STRUCT. All machines
   provide the vector facility since many years and the need to make the
   task structure size dependent on the vector facility does not exist.
 
 - Remove the "novx" kernel command line option, as the vector code runs
   without any problems since many years.
 
 - Add the vector facility to the z13 architecture level set (ALS).
   All hypervisors support the vector facility since many years.
   This allows compile time optimizations of the kernel.
 
 - Get rid of MACHINE_HAS_VX and replace it with cpu_has_vx(). As result,
   the compiled code will have less runtime checks and less code.
 
 - Convert pgste_get_lock() and pgste_set_unlock() ASM inlines to C.
 
 - Convert the struct subchannel spinlock from pointer to member.
 -----BEGIN PGP SIGNATURE-----
 
 iI0EABYIADUWIQQrtrZiYVkVzKQcYivNdxKlNrRb8AUCZZxnchccYWdvcmRlZXZA
 bGludXguaWJtLmNvbQAKCRDNdxKlNrRb8PyGAP9Z0Z1Pm3hPf24M4FgY5uuRqBHo
 VoiuohOZRGONwifnsgEAmN/3ba/7d/j0iMwpUdUFouiqtwTUihh15wYx1sH2IA8=
 =F6YD
 -----END PGP SIGNATURE-----

Merge tag 's390-6.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 updates from Alexander Gordeev:

 - Add machine variable capacity information to /proc/sysinfo.

 - Limit the waste of page tables and always align vmalloc area size and
   base address on segment boundary.

 - Fix a memory leak when an attempt to register interruption sub class
   (ISC) for the adjunct-processor (AP) guest failed.

 - Reset response code AP_RESPONSE_INVALID_GISA to understandable by
   guest AP_RESPONSE_INVALID_ADDRESS in response to a failed
   interruption sub class (ISC) registration attempt.

 - Improve reaction to adjunct-processor (AP)
   AP_RESPONSE_OTHERWISE_CHANGED response code when enabling interrupts
   on behalf of a guest.

 - Fix incorrect sysfs 'status' attribute of adjunct-processor (AP)
   queue device bound to the vfio_ap device driver when the mediated
   device is attached to a guest, but the queue device is not passed
   through.

 - Rework struct ap_card to hold the whole adjunct-processor (AP) card
   hardware information. As result, all the ugly bit checks are replaced
   by simple evaluations of the required bit fields.

 - Improve handling of some weird scenarios between service element (SE)
   host and SE guest with adjunct-processor (AP) pass-through support.

 - Change local_ctl_set_bit() and local_ctl_clear_bit() so they return
   the previous value of the to be changed control register. This is
   useful if a bit is only changed temporarily and the previous content
   needs to be restored.

 - The kernel starts with machine checks disabled and is expected to
   enable it once trap_init() is called. However the implementation
   allows machine checks early. Consistently enable it in trap_init()
   only.

 - local_mcck_disable() and local_mcck_enable() assume that machine
   checks are always enabled. Instead implement and use
   local_mcck_save() and local_mcck_restore() to disable machine checks
   and restore the previous state.

 - Modification of floating point control (FPC) register of a traced
   process using ptrace interface may lead to corruption of the FPC
   register of the tracing process. Fix this.

 - kvm_arch_vcpu_ioctl_set_fpu() allows to set the floating point
   control (FPC) register in vCPU, but may lead to corruption of the FPC
   register of the host process. Fix this.

 - Use READ_ONCE() to read a vCPU floating point register value from the
   memory mapped area. This avoids that, depending on code generation, a
   different value is tested for validity than the one that is used.

 - Get rid of test_fp_ctl(), since it is quite subtle to use it
   correctly. Instead copy a new floating point control register value
   into its save area and test the validity of the new value when
   loading it.

 - Remove superfluous save_fpu_regs() call.

 - Remove s390 support for ARCH_WANTS_DYNAMIC_TASK_STRUCT. All machines
   provide the vector facility since many years and the need to make the
   task structure size dependent on the vector facility does not exist.

 - Remove the "novx" kernel command line option, as the vector code runs
   without any problems since many years.

 - Add the vector facility to the z13 architecture level set (ALS). All
   hypervisors support the vector facility since many years. This allows
   compile time optimizations of the kernel.

 - Get rid of MACHINE_HAS_VX and replace it with cpu_has_vx(). As
   result, the compiled code will have less runtime checks and less
   code.

 - Convert pgste_get_lock() and pgste_set_unlock() ASM inlines to C.

 - Convert the struct subchannel spinlock from pointer to member.

* tag 's390-6.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (24 commits)
  Revert "s390: update defconfigs"
  s390/cio: make sch->lock spinlock pointer a member
  s390: update defconfigs
  s390/mm: convert pgste locking functions to C
  s390/fpu: get rid of MACHINE_HAS_VX
  s390/als: add vector facility to z13 architecture level set
  s390/fpu: remove "novx" option
  s390/fpu: remove ARCH_WANTS_DYNAMIC_TASK_STRUCT support
  KVM: s390: remove superfluous save_fpu_regs() call
  s390/fpu: get rid of test_fp_ctl()
  KVM: s390: use READ_ONCE() to read fpc register value
  KVM: s390: fix setting of fpc register
  s390/ptrace: handle setting of fpc register correctly
  s390/nmi: implement and use local_mcck_save() / local_mcck_restore()
  s390/nmi: consistently enable machine checks in trap_init()
  s390/ctlreg: return old register contents when changing bits
  s390/ap: handle outband SE bind state change
  s390/ap: store TAPQ hwinfo in struct ap_card
  s390/vfio-ap: fix sysfs status attribute for AP queue devices
  s390/vfio-ap: improve reaction to response code 07 from PQAP(AQIC) command
  ...
2024-01-10 18:18:20 -08:00
Harald Freudenberger
207022d39d s390/ap: handle outband SE bind state change
This patch addresses some weird scenarios where an outband
manipulation of the SE bind state of a queue assigned and
maybe in use by an SE guest with AP pass-through support
took place. So for example when the guest has bound and
associated a queue and then this domain has been zeroed on
the service element.

Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2023-11-30 16:24:23 +01:00
Harald Freudenberger
d4c53ae8e4 s390/ap: store TAPQ hwinfo in struct ap_card
As of now the AP card struct held only part of the
queue's hwinfo (that is the GR2 register content returned
with an TAPQ invocation). This patch reworks struct ap_card
to hold the whole hwinfo now.

As there is a nice bit field union on top of this
ap_tapq_hwinfo struct, all the ugly bit checkings can
now get replaced by simple evaluations of the required
bit field.

Suggested-by: Ingo Franzki <ifranzki@linux.ibm.com>
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2023-11-30 16:24:23 +01:00
Tony Krowiak
a0d8f4eeb7 s390/vfio-ap: fix sysfs status attribute for AP queue devices
The 'status' attribute for AP queue devices bound to the vfio_ap device
driver displays incorrect status when the mediated device is attached to a
guest, but the queue device is not passed through. In the current
implementation, the status displayed is 'in_use' which is not correct; it
should be 'assigned'. This can happen if one of the queue devices
associated with a given adapter is not bound to the vfio_ap device driver.
For example:

Queues listed in /sys/bus/ap/drivers/vfio_ap:
14.0005
14.0006
14.000d
16.0006
16.000d

Queues listed in /sys/devices/vfio_ap/matrix/$UUID/matrix
14.0005
14.0006
14.000d
16.0005
16.0006
16.000d

Queues listed in /sys/devices/vfio_ap/matrix/$UUID/guest_matrix
14.0005
14.0006
14.000d

The reason no queues for adapter 0x16 are listed in the guest_matrix is
because queue 16.0005 is not bound to the vfio_ap device driver, so no
queue associated with the adapter is passed through to the guest;
therefore, each queue device for adapter 0x16 should display 'assigned'
instead of 'in_use', because those queues are not in use by a guest, but
only assigned to the mediated device.

Let's check the AP configuration for the guest to determine whether a
queue device is passed through before displaying a status of 'in_use'.

Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
Acked-by: Halil Pasic <pasic@linux.ibm.com>
Acked-by: Harald Freudenberger <freude@linux.ibm.com>
Link: https://lore.kernel.org/r/20231108201135.351419-1-akrowiak@linux.ibm.com
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2023-11-30 16:24:23 +01:00
Tony Krowiak
c44ce57924 s390/vfio-ap: improve reaction to response code 07 from PQAP(AQIC) command
Let's improve the vfio_ap driver's reaction to reception of response code
07 from the PQAP(AQIC) command when enabling interrupts on behalf of a
guest:

* Unregister the guest's ISC before the pages containing the notification
  indicator bytes are unpinned.

* Capture the return code from the kvm_s390_gisc_unregister function and
  log a DBF warning if it fails.

Suggested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Link: https://lore.kernel.org/r/20231109164427.460493-4-akrowiak@linux.ibm.com
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2023-11-30 16:24:23 +01:00
Anthony Krowiak
3746d48c55 s390/vfio-ap: set status response code to 06 on gisc registration failure
The interception handler for the PQAP(AQIC) command calls the
kvm_s390_gisc_register function to register the guest ISC with the channel
subsystem. If that call fails, the status response code 08 - indicating
Invalid ZONE/GISA designation - is returned to the guest. This response
code is not valid because setting the ZONE/GISA values is the
responsibility of the hypervisor controlling the guest and there is nothing
that can be done from the guest perspective to correct that problem.

The likelihood of GISC registration failure is nil and there is no status
response code to indicate an invalid ISC value, so let's set the response
code to 06 indicating 'Invalid address of AP-queue notification byte'.
While this is not entirely accurate, it is better than setting a response
code which makes no sense for the guest.

Signed-off-by: Anthony Krowiak <akrowiak@linux.ibm.com>
Suggested-by: Halil Pasic <pasic@linux.ibm.com>
Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
Reviewed-by: Harald Freudenberger <freude@linux.ibm.com>
Link: https://lore.kernel.org/r/20231109164427.460493-3-akrowiak@linux.ibm.com
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2023-11-30 16:24:22 +01:00
Anthony Krowiak
7b2d039da6 s390/vfio-ap: unpin pages on gisc registration failure
In the vfio_ap_irq_enable function, after the page containing the
notification indicator byte (NIB) is pinned, the function attempts
to register the guest ISC. If registration fails, the function sets the
status response code and returns without unpinning the page containing
the NIB. In order to avoid a memory leak, the NIB should be unpinned before
returning from the vfio_ap_irq_enable function.

Co-developed-by: Janosch Frank <frankja@linux.ibm.com>
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Signed-off-by: Anthony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Fixes: 783f0a3ccd ("s390/vfio-ap: add s390dbf logging to the vfio_ap_irq_enable function")
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20231109164427.460493-2-akrowiak@linux.ibm.com
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2023-11-30 16:24:22 +01:00
Christian Brauner
3652117f85 eventfd: simplify eventfd_signal()
Ever since the eventfd type was introduced back in 2007 in commit
e1ad7468c7 ("signal/timer/event: eventfd core") the eventfd_signal()
function only ever passed 1 as a value for @n. There's no point in
keeping that additional argument.

Link: https://lore.kernel.org/r/20231122-vfs-eventfd-signal-v2-2-bd549b14ce0c@kernel.org
Acked-by: Xu Yilun <yilun.xu@intel.com>
Acked-by: Andrew Donnellan <ajd@linux.ibm.com> # ocxl
Acked-by: Eric Farman <farman@linux.ibm.com>  # s390
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-11-28 14:08:38 +01:00
Harald Freudenberger
5cf1a563a3 s390/ap: fix vanishing crypto cards in SE environment
A secure execution (SE, also known as confidential computing)
guest may see asynchronous errors on a crypto firmware queue.
The current implementation to gather information about cards
and queues in ap_queue_info() simple returns if an asynchronous
error is hanging on the firmware queue. If such a situation
happened and it was the only queue visible for a crypto card
within an SE guest, then the card vanished from sysfs as
the AP bus scan function refuses to hold a card without any
type information. As lszcrypt evaluates the sysfs such
a card vanished from the lszcrypt card listing and the
user is baffled and has no way to reset and thus clear the
pending asynchronous error.

This patch improves the named function to also evaluate GR2
of the TAPQ in case of asynchronous error pending. If there
is a not-null value stored in, the info is processed now.
In the end, a queue with pending asynchronous error does not
lead to a vanishing card any more.

Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-11-05 22:34:57 +01:00
Ingo Franzki
cfaef6516e s390/zcrypt: don't report online if card or queue is in check-stop state
If a card or a queue is in check-stop state, it can't be used by
applications to perform crypto operations. Applications check the 'online'
sysfs attribute to check if a card or queue is usable.

Report a card or queue as offline in case it is in check-stop state.
Furthermore, don't allow to set a card or queue online, if it is in
check-stop state.

This is similar to when the card or the queue is in deconfigured state,
there it is also reported as being offline, and it can't be set online.

Reviewed-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Ingo Franzki <ifranzki@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-11-05 22:34:57 +01:00
Harald Freudenberger
e14aec2302 s390/ap: fix AP bus crash on early config change callback invocation
Fix kernel crash in AP bus code caused by very early invocation of the
config change callback function via SCLP.

After a fresh IML of the machine the crypto cards are still offline and
will get switched online only with activation of any LPAR which has the
card in it's configuration. A crypto card coming online is reported
to the LPAR via SCLP and the AP bus offers a callback function to get
this kind of information. However, it may happen that the callback is
invoked before the AP bus init function is complete. As the callback
triggers a synchronous AP bus scan, the scan may already run but some
internal states are not initialized by the AP bus init function resulting
in a crash like this:

  [   11.635859] Unable to handle kernel pointer dereference in virtual kernel address space
  [   11.635861] Failing address: 0000000000000000 TEID: 0000000000000887
  [   11.635862] Fault in home space mode while using kernel ASCE.
  [   11.635864] AS:00000000894c4007 R3:00000001fece8007 S:00000001fece7800 P:000000000000013d
  [   11.635879] Oops: 0004 ilc:1 [#1] SMP
  [   11.635882] Modules linked in:
  [   11.635884] CPU: 5 PID: 42 Comm: kworker/5:0 Not tainted 6.6.0-rc3-00003-g4dbf7cdc6b42 #12
  [   11.635886] Hardware name: IBM 3931 A01 751 (LPAR)
  [   11.635887] Workqueue: events_long ap_scan_bus
  [   11.635891] Krnl PSW : 0704c00180000000 0000000000000000 (0x0)
  [   11.635895]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
  [   11.635897] Krnl GPRS: 0000000001000a00 0000000000000000 0000000000000006 0000000089591940
  [   11.635899]            0000000080000000 0000000000000a00 0000000000000000 0000000000000000
  [   11.635901]            0000000081870c00 0000000089591000 000000008834e4e2 0000000002625a00
  [   11.635903]            0000000081734200 0000038000913c18 000000008834c6d6 0000038000913ac8
  [   11.635906] Krnl Code:>0000000000000000: 0000                illegal
  [   11.635906]            0000000000000002: 0000                illegal
  [   11.635906]            0000000000000004: 0000                illegal
  [   11.635906]            0000000000000006: 0000                illegal
  [   11.635906]            0000000000000008: 0000                illegal
  [   11.635906]            000000000000000a: 0000                illegal
  [   11.635906]            000000000000000c: 0000                illegal
  [   11.635906]            000000000000000e: 0000                illegal
  [   11.635915] Call Trace:
  [   11.635916]  [<0000000000000000>] 0x0
  [   11.635918]  [<000000008834e4e2>] ap_queue_init_state+0x82/0xb8
  [   11.635921]  [<000000008834ba1c>] ap_scan_domains+0x6fc/0x740
  [   11.635923]  [<000000008834c092>] ap_scan_adapter+0x632/0x8b0
  [   11.635925]  [<000000008834c3e4>] ap_scan_bus+0xd4/0x288
  [   11.635927]  [<00000000879a33ba>] process_one_work+0x19a/0x410
  [   11.635930] Discipline DIAG cannot be used without z/VM
  [   11.635930]  [<00000000879a3a2c>] worker_thread+0x3fc/0x560
  [   11.635933]  [<00000000879aea60>] kthread+0x120/0x128
  [   11.635936]  [<000000008792afa4>] __ret_from_fork+0x3c/0x58
  [   11.635938]  [<00000000885ebe62>] ret_from_fork+0xa/0x30
  [   11.635942] Last Breaking-Event-Address:
  [   11.635942]  [<000000008834c6d4>] ap_wait+0xcc/0x148

This patch improves the ap_bus_force_rescan() function which is
invoked by the config change callback by checking if a first
initial AP bus scan has been done. If not, the force rescan request
is simple ignored. Anyhow it does not make sense to trigger AP bus
re-scans even before the very first bus scan is complete.

Cc: stable@vger.kernel.org
Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-11-05 22:34:57 +01:00
Harald Freudenberger
c40284b364 s390/ap: re-enable interrupt for AP queues
This patch introduces some code lines which check
for interrupt support enabled on an AP queue after
a reply has been received. This invocation has been
chosen as there is a good chance to have the queue
empty at that time. As the enablement of the irq
imples a state machine change the queue should not
have any pending requests or unreceived replies.

Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-11-05 22:34:57 +01:00
Harald Freudenberger
01c89ab7f8 s390/ap: rework to use irq info from ap queue status
This patch reworks the irq handling and reporting code
for the AP queue interrupt handling to always use the
irq info from the queue status.

Until now the interrupt status of an AP queue was stored
into a bool variable within the ap_queue struct. This
variable was set on a successful interrupt enablement
and cleared with kicking a reset. However, it may be
that the interrupt state is manipulated outband for
example by a hypervisor. This patch removes this variable
and instead the irq bit from the AP queue status which is
always reflecting the current irq state is used.

Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-11-05 22:34:57 +01:00
Harald Freudenberger
8d6be876bb s390/ap: show APFS value on error reply 0x8B
With Secure Execution the error reply RX 0x8B now carries
an APFS value indicating why the request has been filtered
by a lower layer of the firmware. So display this value
as a warning line in the s390 debug feature trace.

Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-10-16 13:04:09 +02:00
Harald Freudenberger
a19a161482 s390/zcrypt: introduce new internal AP queue se_bound attribute
This patch introduces a new AP queue internal attribute
se_bound which reflects the bound state of an APQN within
a Secure Execution environment.

With introduction of Secure Execution guests now an
AP firmware queue needs to be bound to the guest before
usage. This patch introduces a new internal attribute
reflecting this bound state and some glue code to handle
this new field during lifetime of an AP queue device.

Together with that now the zcrypt scheduler considers
the state of the AP queues when a message is about to be
distributed among the existing queues. There is a new
function ap_queue_usable() which returns true only when
all conditions for using this AP queue device are fulfilled.
In details this means: the AP queue needs to be configured,
not checkstopped and within an SE environment it needs
to be bound. So the new function gives and indication
if the AP queue device is ready to serve requests or not.

Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-10-16 13:04:09 +02:00
Harald Freudenberger
32d1d9204f s390/ap: re-init AP queues on config on
On a state toggle from config off to config on and on the
state toggle from checkstop to not checkstop the queue's
internal states was set but the state machine was not
nudged. This did not care as on the first enqueue of a
request the state machine kick ran.

However, within an Secure Execution guest a queue is
only chosen by the scheduler when it has been bound.
But to bind a queue, it needs to run through the initial
states (reset, enable interrupts, ...). So this is like
a chicken-and-egg problem and the result was in fact
that a queue was unusable after a config off/on toggle.

With some slight rework of the handling of these states
now the new function _ap_queue_init_state() is called
which is the core of the ap_queue_init_state() function
but without locking handling. This has the benefit that
it can be called on all the places where a (re-)init
of the AP queue's state machine is needed.

Fixes: 2d72eaf036 ("s390/ap: implement SE AP bind, unbind and associate")
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-10-16 13:04:09 +02:00
Ingo Franzki
f3cfb875d0 s390/zcrypt: update list of EP11 operation modes
Add additional operation mode strings into the EP11 operation mode table.
These strings are returned by sysfs entries /sys/devices/ap/cardxx/op_modes
and /sys/devices/ap/cardxx/xx.yyyy/op_modes.

Signed-off-by: Ingo Franzki <ifranzki@linux.ibm.com>
Reviewed-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-09-19 13:26:55 +02:00
Linus Torvalds
4a0fc73da9 more s390 updates for 6.6 merge window
- Couple of virtual vs physical address confusion fixes
 
 - Rework locking in dcssblk driver to address a lockdep warning
 
 - Remove support for "noexec" kernel command line option since there
   is no use case where it would make sense
 
 - Simplify kernel mapping setup and get rid of quite a bit of code
 
 - Add architecture specific __set_memory_yy() functions which allow to
   modify kernel mappings. Unlike the set_memory_xx() variants they
   take void pointer start and end parameters, which allows to use them
   without the usual casts, and also to use them on areas larger than
   8TB.
   Note that the set_memory_xx() family comes with an int num_pages
   parameter which overflows with 8TB. This could be addressed by
   changing the num_pages parameter to unsigned long, however requires
   to change all architectures, since the module code expects an int
   parameter (see module_set_memory()).
   This was indeed an issue since for debug_pagealloc() we call
   set_memory_4k() on the whole identity mapping. Therefore address
   this for now with the __set_memory_yy() variant, and address common
   code later
 
 - Use dev_set_name() and also fix memory leak in zcrypt driver error
   handling
 
 - Remove unused lsi_mask from airq_struct
 
 - Add warning for invalid kernel mapping requests
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEECMNfWEw3SLnmiLkZIg7DeRspbsIFAmT5sYkACgkQIg7DeRsp
 bsJQHg//SFJOcVr2uViorvmt+IyiW78H75y8+AC6qoxZ3fuJeuyW2zz/resRsFRO
 UDCjkV2QYT885ADFUMcunf7LQnJYpSFVjAB6WEhgFr+bFdMWtX1crjHsMHkpcobR
 Fo4IFurG5NOMIdunPwKiO/dLU8pNGTm1T430uyLZH3ApVrmJ8stgOczusqTO/h3V
 tjbC5HAm9BBwNyOd2lnZSi8pbUMcfGE2DkRYhFQup1N9NT0BC4S6B8vTsD7NVYVZ
 WbJzqSl3CnTUJxZFX287YDW6cSaPCLAUfCl7r5FJZUoGvUtOVLlqcOjIIm/72epv
 YygMI8RdM0KHuzOd8XTNDbBnyYWTmWnZypCQVgNezY1GiO5nfTCkwYLZZagcc0Lo
 6fx1B7xBbNao9/pURa6wJ9dXsx+NmD52DjM806hP1H+qoL/YP8dLqV6Qh2C1PFzI
 VncrnG2mnxjwry5WoOyl19SKqXDgmDylnFuHQHKA9mdCaqZWNkqovOo/6qE6OdWI
 Q3TinPwTug9vmgwBuQgua/2R1owy7G9mJwwDW9AtZXVzlMG6k06deKIYrZjLxf6Z
 hOtAu+Xy5IeoXOaaOFU92RsRKX481OTG77rWLREC+orAT1dPrfnguPLlQ7KeFbkR
 foduTTHcDPOLiIVL1RbWQC9AAw/i7IjN04Jp6KFlQTZqLATyPbM=
 =+iSZ
 -----END PGP SIGNATURE-----

Merge tag 's390-6.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull more s390 updates from Heiko Carstens:

 - A couple of virtual vs physical address confusion fixes

 - Rework locking in dcssblk driver to address a lockdep warning

 - Remove support for "noexec" kernel command line option since there is
   no use case where it would make sense

 - Simplify kernel mapping setup and get rid of quite a bit of code

 - Add architecture specific __set_memory_yy() functions which allow us
   to modify kernel mappings. Unlike the set_memory_xx() variants they
   take void pointer start and end parameters, which allows using them
   without the usual casts, and also to use them on areas larger than
   8TB.

   Note that the set_memory_xx() family comes with an int num_pages
   parameter which overflows with 8TB. This could be addressed by
   changing the num_pages parameter to unsigned long, however requires
   to change all architectures, since the module code expects an int
   parameter (see module_set_memory()).

   This was indeed an issue since for debug_pagealloc() we call
   set_memory_4k() on the whole identity mapping. Therefore address this
   for now with the __set_memory_yy() variant, and address common code
   later

 - Use dev_set_name() and also fix memory leak in zcrypt driver error
   handling

 - Remove unused lsi_mask from airq_struct

 - Add warning for invalid kernel mapping requests

* tag 's390-6.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/vmem: do not silently ignore mapping limit
  s390/zcrypt: utilize dev_set_name() ability to use a formatted string
  s390/zcrypt: don't leak memory if dev_set_name() fails
  s390/mm: fix MAX_DMA_ADDRESS physical vs virtual confusion
  s390/airq: remove lsi_mask from airq_struct
  s390/mm: use __set_memory() variants where useful
  s390/set_memory: add __set_memory() variant
  s390/set_memory: generate all set_memory() functions
  s390/mm: improve description of mapping permissions of prefix pages
  s390/amode31: change type of __samode31, __eamode31, etc
  s390/mm: simplify kernel mapping setup
  s390: remove "noexec" option
  s390/vmem: fix virtual vs physical address confusion
  s390/dcssblk: fix lockdep warning
  s390/monreader: fix virtual vs physical address confusion
2023-09-07 10:52:13 -07:00
Andy Shevchenko
f59ec04d38 s390/zcrypt: utilize dev_set_name() ability to use a formatted string
With the dev_set_name() prototype it's not obvious that it takes
a formatted string as a parameter. Use its facility instead of
duplicating the same with strncpy()/snprintf() calls.

With this, also prevent return error code to be shadowed.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20230831110000.24279-2-andriy.shevchenko@linux.intel.com
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-09-05 20:12:52 +02:00
Andy Shevchenko
6252f47b78 s390/zcrypt: don't leak memory if dev_set_name() fails
When dev_set_name() fails, zcdn_create() doesn't free the newly
allocated resources. Do it.

Fixes: 00fab2350e ("s390/zcrypt: multiple zcrypt device nodes support")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20230831110000.24279-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-09-05 20:12:51 +02:00
Linus Torvalds
ec0e2dc810 VFIO updates for v6.6-rc1
- VFIO direct character device (cdev) interface support.  This extracts
    the vfio device fd from the container and group model, and is intended
    to be the native uAPI for use with IOMMUFD. (Yi Liu)
 
  - Enhancements to the PCI hot reset interface in support of cdev usage.
    (Yi Liu)
 
  - Fix a potential race between registering and unregistering vfio files
    in the kvm-vfio interface and extend use of a lock to avoid extra
    drop and acquires. (Dmitry Torokhov)
 
  - A new vfio-pci variant driver for the AMD/Pensando Distributed Services
    Card (PDS) Ethernet device, supporting live migration. (Brett Creeley)
 
  - Cleanups to remove redundant owner setup in cdx and fsl bus drivers,
    and simplify driver init/exit in fsl code. (Li Zetao)
 
  - Fix uninitialized hole in data structure and pad capability structures
    for alignment. (Stefan Hajnoczi)
 -----BEGIN PGP SIGNATURE-----
 
 iQJPBAABCAA5FiEEQvbATlQL0amee4qQI5ubbjuwiyIFAmTvnDUbHGFsZXgud2ls
 bGlhbXNvbkByZWRoYXQuY29tAAoJECObm247sIsimEEP/AzG+VRcu5LfYbLGLe0z
 zB8ts6G7S78wXlmfN/LYi3v92XWvMMcm+vYF8oNAMfr1YL5sibWN6UtQfY1KCr7h
 nWKdQdqjajJ5yDDZnOFdhqHJGNfmZw6+fey8Z0j8zRI2oymK4DncWWX3g/7L1SNr
 9tIexGJef+mOdAmC94yOut3YviAaZ+f95T/xrdXHzzoNr50DD0+PD6AJdKJfKggP
 vhiC/DAYH3Fofaa6tRasgWuKCYWdjZLR/kxgNpeEmW6kZnbq/dnzZ+kgn4HH1f9G
 8p7UKVARR6FfG5aLheWu6Y9PDaKnfnqu8y/hobuE/ivXcmqqK+a6xSxrjgbVs8WJ
 94SYnTBRoTlDJaKWa7GxqdgzJnV+s5ZyAgPhjzdi6mLTPWGzkuLhFWGtYL+LZAQ6
 pNeZSM6CFBk+bva/xT0nNPCXxPh+/j/Y0G18FREj8aPFc03HrJQqz0RLydvTnoDz
 nX/by5KdzMSVSVLPr4uDMtAsgxsGqWiFcp7QMw1HhhlLWxqmYbA+mLZaqyMZUUOx
 6b/P8WXT9P2I+qPVKWQ5CWyqpsEqm6P+72yg6LOM9kINvgwDhOa7cagMXIuMWYMH
 Rf97FL+K8p1eIy6AnvRHgFBMM5185uG+0YcJyVqtucDr/k8T/Om6ujAI6JbWtNe6
 cLgaVAqKOYqCR4HC9bfVGSbd
 =eKSR
 -----END PGP SIGNATURE-----

Merge tag 'vfio-v6.6-rc1' of https://github.com/awilliam/linux-vfio

Pull VFIO updates from Alex Williamson:

 - VFIO direct character device (cdev) interface support. This extracts
   the vfio device fd from the container and group model, and is
   intended to be the native uAPI for use with IOMMUFD (Yi Liu)

 - Enhancements to the PCI hot reset interface in support of cdev usage
   (Yi Liu)

 - Fix a potential race between registering and unregistering vfio files
   in the kvm-vfio interface and extend use of a lock to avoid extra
   drop and acquires (Dmitry Torokhov)

 - A new vfio-pci variant driver for the AMD/Pensando Distributed
   Services Card (PDS) Ethernet device, supporting live migration (Brett
   Creeley)

 - Cleanups to remove redundant owner setup in cdx and fsl bus drivers,
   and simplify driver init/exit in fsl code (Li Zetao)

 - Fix uninitialized hole in data structure and pad capability
   structures for alignment (Stefan Hajnoczi)

* tag 'vfio-v6.6-rc1' of https://github.com/awilliam/linux-vfio: (53 commits)
  vfio/pds: Send type for SUSPEND_STATUS command
  vfio/pds: fix return value in pds_vfio_get_lm_file()
  pds_core: Fix function header descriptions
  vfio: align capability structures
  vfio/type1: fix cap_migration information leak
  vfio/fsl-mc: Use module_fsl_mc_driver macro to simplify the code
  vfio/cdx: Remove redundant initialization owner in vfio_cdx_driver
  vfio/pds: Add Kconfig and documentation
  vfio/pds: Add support for firmware recovery
  vfio/pds: Add support for dirty page tracking
  vfio/pds: Add VFIO live migration support
  vfio/pds: register with the pds_core PF
  pds_core: Require callers of register/unregister to pass PF drvdata
  vfio/pds: Initial support for pds VFIO driver
  vfio: Commonize combine_ranges for use in other VFIO drivers
  kvm/vfio: avoid bouncing the mutex when adding and deleting groups
  kvm/vfio: ensure kvg instance stays around in kvm_vfio_group_add()
  docs: vfio: Add vfio device cdev description
  vfio: Compile vfio_group infrastructure optionally
  vfio: Move the IOMMU_CAP_CACHE_COHERENCY check in __vfio_register_dev()
  ...
2023-08-30 20:36:01 -07:00
Heiko Carstens
6daf5a6824 Merge branch 'vfio-ap' into features
Tony Krowiak says:

===================
This patch series is for the changes required in the vfio_ap device
driver to facilitate pass-through of crypto devices to a secure
execution guest. In particular, it is critical that no data from the
queues passed through to the SE guest is leaked when the guest is
destroyed. There are also some new response codes returned from the
PQAP(ZAPQ) and PQAP(TAPQ) commands that have been added to the
architecture in support of pass-through of crypto devices to SE guests;
these need to be accounted for when handling the reset of queues.
===================

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-08-23 14:36:37 +02:00
Tony Krowiak
f88fb13357 s390/vfio-ap: make sure nib is shared
Since the NIB is visible by HW, KVM and the (PV) guest it needs to be
in non-secure or secure but shared storage. Return code 6 is used to
indicate to a PV guest that its NIB would be on secure, unshared
storage and therefore the NIB address is invalid.

Unfortunately we have no easy way to check if a page is unshared after
vfio_pin_pages() since it will automatically export an unshared page
if the UV pin shared call did not succeed due to a page being in
unshared state.

Therefore we use the fact that UV pinning it a second time is a nop
but trying to pin an exported page is an error (0x102). If we
encounter this error, we do a vfio unpin and import the page again,
since vfio_pin_pages() exported it.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
Acked-by: Halil Pasic <pasic@linux.ibm.com>
Tested-by: Viktor Mihajlovski <mihajlov@linux.ibm.com>
Link: https://lore.kernel.org/r/20230815184333.6554-13-akrowiak@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-08-18 15:24:39 +02:00
Tony Krowiak
7847a19b5b s390/vfio-ap: check for TAPQ response codes 0x35 and 0x36
Check for response codes 0x35 and 0x36 which are asynchronous return codes
indicating a failure of the guest to associate a secret with a queue. Since
there can be no interaction with this queue from the guest (i.e., the vcpus
are out of SIE for hot unplug, the guest is being shut down or an emulated
subsystem reset of the guest is taking place), let's go ahead and re-issue
the ZAPQ to reset and zeroize the queue.

Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Jason J. Herne <jjherne@linux.ibm.com>
Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
Tested-by: Viktor Mihajlovski <mihajlov@linux.ibm.com>
Link: https://lore.kernel.org/r/20230815184333.6554-10-akrowiak@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-08-18 15:09:29 +02:00
Tony Krowiak
e1f17f8ea9 s390/vfio-ap: handle queue state change in progress on reset
A new APQSW response code (0xA) indicating the designated queue is in the
process of being bound or associated to a configuration may be returned
from the PQAP(ZAPQ) command. This patch introduces code that will verify
when the PQAP(ZAPQ) command can be re-issued after receiving response code
0xA.

Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Jason J. Herne <jjherne@linux.ibm.com>
Acked-by: Halil Pasic <pasic@linux.ibm.com>
Tested-by: Viktor Mihajlovski <mihajlov@linux.ibm.com>
Link: https://lore.kernel.org/r/20230815184333.6554-9-akrowiak@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-08-18 15:09:29 +02:00
Tony Krowiak
9261f04388 s390/vfio-ap: use work struct to verify queue reset
Instead of waiting to verify that a queue is reset in the
vfio_ap_mdev_reset_queue function, let's use a wait queue to check the
the state of the reset. This way, when resetting all of the queues assigned
to a matrix mdev, we don't have to wait for each queue to be reset before
initiating a reset on the next queue to be reset.

Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Jason J. Herne <jjherne@linux.ibm.com>
Suggested-by: Halil Pasic <pasic@linux.ibm.com>
Acked-by: Janosch Frank <frankja@linux.ibm.com>
Tested-by: Viktor Mihajlovski <mihajlov@linux.ibm.com>
Link: https://lore.kernel.org/r/20230815184333.6554-8-akrowiak@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-08-18 15:09:29 +02:00
Tony Krowiak
62aab082e9 s390/vfio-ap: store entire AP queue status word with the queue object
Store the entire AP queue status word returned from the ZAPQ command with
the struct vfio_ap_queue object instead of just the response code field.
The other information contained in the status word is need by the
apq_reset_check function to display a proper message to indicate that the
vfio_ap driver is waiting for the ZAPQ to complete because the queue is
not empty or IRQs are still enabled.

Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
Tested-by: Viktor Mihajlovski <mihajlov@linux.ibm.com>
Link: https://lore.kernel.org/r/20230815184333.6554-7-akrowiak@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-08-18 15:09:29 +02:00
Tony Krowiak
dd174833e4 s390/vfio-ap: remove upper limit on wait for queue reset to complete
The architecture does not define an upper limit on how long a queue reset
(RAPQ/ZAPQ) can take to complete. In order to ensure both the security
requirements and prevent resource leakage and corruption in the hypervisor,
it is necessary to remove the upper limit (200ms) the vfio_ap driver
currently waits for a reset to complete. This, of course, may result in a
hang which is a less than desirable user experience, but until a firmware
solution is provided, this is a necessary evil.

Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Jason J. Herne <jjherne@linux.ibm.com>
Acked-by: Halil Pasic <pasic@linux.ibm.com>
Tested-by: Viktor Mihajlovski <mihajlov@linux.ibm.com>
Link: https://lore.kernel.org/r/20230815184333.6554-6-akrowiak@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-08-18 15:09:28 +02:00
Tony Krowiak
c51f8c6bb5 s390/vfio-ap: allow deconfigured queue to be passed through to a guest
When a queue is reset, the status response code returned from the reset
operation is stored in the reset_rc field of the vfio_ap_queue structure
representing the queue being reset. This field is later used to decide
whether the queue should be passed through to a guest. If the reset_rc
field is a non-zero value, the queue will be filtered from the list of
queues passed through.

When an adapter is deconfigured, all queues associated with that adapter
are reset. That being the case, it is not necessary to filter those queues;
so, if the status response code returned from a reset operation indicates
the queue is deconfigured, the reset_rc field of the vfio_ap_queue
structure will be set to zero so it will be passed through (i.e., not
filtered).

Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Jason J. Herne <jjherne@linux.ibm.com>
Acked-by: Halil Pasic <pasic@linux.ibm.com>
Tested-by: Viktor Mihajlovski <mihajlov@linux.ibm.com>
Link: https://lore.kernel.org/r/20230815184333.6554-5-akrowiak@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-08-18 15:09:28 +02:00
Tony Krowiak
411b0109da s390/vfio-ap: wait for response code 05 to clear on queue reset
Response code 05, AP busy, is a valid response code for a ZAPQ or TAPQ.
Instead of returning error -EIO when a ZAPQ fails with response code 05,
let's wait until the queue is no longer busy and try the ZAPQ again.

Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
Acked-by: Janosch Frank <frankja@linux.ibm.com>
Tested-by: Viktor Mihajlovski <mihajlov@linux.ibm.com>
Link: https://lore.kernel.org/r/20230815184333.6554-4-akrowiak@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-08-18 15:09:28 +02:00
Tony Krowiak
7aa7b2a80c s390/vfio-ap: clean up irq resources if possible
The architecture does not specify whether interrupts are disabled as part
of the asynchronous reset or upon return from the PQAP/ZAPQ instruction.
If, however, PQAP/ZAPQ completes with APQSW response code 0 and the
interrupt bit in the status word is also 0, we know the interrupts are
disabled and we can go ahead and clean up the corresponding resources;
otherwise, we must wait until the asynchronous reset has completed.

Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
Suggested-by: Halil Pasic <pasic@linux.ibm.com>
Reviewed-by: Jason J. Herne <jjherne@linux.ibm.com>
Acked-by: Halil Pasic <pasic@linux.ibm.com>
Acked-by: Janosch Frank <frankja@linux.ibm.com>
Tested-by: Viktor Mihajlovski <mihajlov@linux.ibm.com>
Link: https://lore.kernel.org/r/20230815184333.6554-3-akrowiak@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-08-18 15:09:28 +02:00
Tony Krowiak
680b7ddd7e s390/vfio-ap: no need to check the 'E' and 'I' bits in APQSW after TAPQ
After a ZAPQ is executed to reset a queue, if the queue is not empty or
interrupts are still enabled, the vfio_ap driver will wait for the reset
operation to complete by repeatedly executing the TAPQ instruction and
checking the 'E' and 'I' bits in the APQSW to verify that the queue is
empty and interrupts are disabled. This is unnecessary because it is
sufficient to check only the response code in the APQSW. If the reset is
still in progress, the response code will be 02; however, if the reset has
completed successfully, the response code will be 00.

Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
Acked-by: Janosch Frank <frankja@linux.ibm.com>
Tested-by: Viktor Mihajlovski <mihajlov@linux.ibm.com>
Link: https://lore.kernel.org/r/20230815184333.6554-2-akrowiak@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-08-18 15:09:28 +02:00
Holger Dengler
386cb81e4b s390/zcrypt_ep11misc: support API ordinal 6 with empty pin-blob
Secure execution guest environments require an empty pinblob in all
key generation and unwrap requests. Empty pinblobs are only available
in EP11 API ordinal 6 or higher.

Add an empty pinblob to key generation and unwrap requests, if the AP
secure binding facility is available. In all other cases, stay with
the empty pin tag (no pinblob) and the current API ordinals.

The EP11 API ordinal also needs to be considered when the pkey module
tries to figure out the list of eligible cards for key operations
with protected keys in secure execution environment.

These changes are transparent to userspace but required for running
an secure execution guest with handling key generate and key derive
(e.g. secure key to protected key) correct. Especially using EP11
secure keys with the kernel dm-crypt layer requires this patch.

Co-developed-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Holger Dengler <dengler@linux.ibm.com>
Reviewed-by: Ingo Franzki <ifranzki@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-08-18 15:07:57 +02:00