- Add an erratum workaround for Intel CPUs which, in certain
circumstances, end up consuming an unrelated uncorrectable memory error
when using fast string copy insns
- Remove the MCE tolerance level control as it is not really needed or
used anymore
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmI7Pe4ACgkQEsHwGGHe
VUpQQRAAjEK4k+iXhWrNaX736WSaVb8qom+JFlAarrOKaJ6UpdQn+IZD8aF7iscr
n1LWGYOyieFvovt69jjTeSprbCVueyhvCmOxxsvH9F2qhNklNwxKEaAPNBXgDuyJ
SOs1fTZO4tS85qZbnZa/Um1keSIacBCVar49sXKsj6Ss+rg6wXnPitQh3ztGOAVn
CBkNE5n6GG2ELjV+fuVOO54NixMtoElj8SIplQ0UOMlQPBO0Z5MkY5VM6LaQVx/e
GGEna6Jo1Z9+b29yf6bR5izWLWcBHTXjvn6i2EIulqKGFRCFmPDBWmuw8YqeyG2a
eT/sxVILKZby0Dj11Q1uxaUcln48WNIM5WPYWojaOelzYNNjJ1Kwa+klrlLOxbnM
j92MSEBe7Nr2w4cukBg+0sIAdtcfRNx5Oov8yXC9VUA0tg4satAoYHdXn35eVJ3z
ZEFo+94H3T0nlCwP+6TayXkTs1k1YICSaCZzp7HcbUdxCsIZQ0kyGknLVtTzydQc
z3GEze35VPeqULeBntoaAb2Vpy76Hs5uBl1lkXv+wEGJuECdDld8IilvqtEzCZy5
vLRizqfXle1PQjlGG+eAqUG/7TPTvDmwuCyHEiCdSf1r3f8WLXevdP4WGyCB/yXy
VYLmz/Rbga1wsFC4w19pe8FM2S6SSeODYqx6zEjiKYgbNjV/thQ=
=oVWo
-----END PGP SIGNATURE-----
Merge tag 'ras_core_for_v5.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RAS updates from Borislav Petkov:
- More noinstr fixes
- Add an erratum workaround for Intel CPUs which, in certain
circumstances, end up consuming an unrelated uncorrectable memory
error when using fast string copy insns
- Remove the MCE tolerance level control as it is not really needed or
used anymore
* tag 'ras_core_for_v5.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mce: Remove the tolerance level control
x86/mce: Work around an erratum on fast string copy instructions
x86/mce: Use arch atomic and bit helpers
Merge yet more updates from Andrew Morton:
"This is the material which was staged after willystuff in linux-next.
Subsystems affected by this patch series: mm (debug, selftests,
pagecache, thp, rmap, migration, kasan, hugetlb, pagemap, madvise),
and selftests"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (113 commits)
selftests: kselftest framework: provide "finished" helper
mm: madvise: MADV_DONTNEED_LOCKED
mm: fix race between MADV_FREE reclaim and blkdev direct IO read
mm: generalize ARCH_HAS_FILTER_PGPROT
mm: unmap_mapping_range_tree() with i_mmap_rwsem shared
mm: warn on deleting redirtied only if accounted
mm/huge_memory: remove stale locking logic from __split_huge_pmd()
mm/huge_memory: remove stale page_trans_huge_mapcount()
mm/swapfile: remove stale reuse_swap_page()
mm/khugepaged: remove reuse_swap_page() usage
mm/huge_memory: streamline COW logic in do_huge_pmd_wp_page()
mm: streamline COW logic in do_swap_page()
mm: slightly clarify KSM logic in do_swap_page()
mm: optimize do_wp_page() for fresh pages in local LRU pagevecs
mm: optimize do_wp_page() for exclusive pages in the swapcache
mm/huge_memory: make is_transparent_hugepage() static
userfaultfd/selftests: enable hugetlb remap and remove event testing
selftests/vm: add hugetlb madvise MADV_DONTNEED MADV_REMOVE test
mm: enable MADV_DONTNEED for hugetlb mappings
kasan: disable LOCKDEP when printing reports
...
Rename kasan_free_shadow to kasan_free_module_shadow and
kasan_module_alloc to kasan_alloc_module_shadow.
These functions are used to allocate/free shadow memory for kernel modules
when KASAN_VMALLOC is not enabled. The new names better reflect their
purpose.
Also reword the comment next to their declaration to improve clarity.
Link: https://lkml.kernel.org/r/36db32bde765d5d0b856f77d2d806e838513fe84.1643047180.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Marco Elver <elver@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
dma-buf:
- rename dma-buf-map to iosys-map
core:
- move buddy allocator to core
- add pci/platform init macros
- improve EDID parser deep color handling
- EDID timing type 7 support
- add GPD Win Max quirk
- add yes/no helpers to string_helpers
- flatten syncobj chains
- add nomodeset support to lots of drivers
- improve fb-helper clipping support
- add default property value interface
fbdev:
- improve fbdev ops speed
ttm:
- add a backpointer from ttm bo->ttm resource
dp:
- move displayport headers
- add a dp helper module
bridge:
- anx7625 atomic support, HDCP support
panel:
- split out panel-lvds and lvds bindings
- find panels in OF subnodes
privacy:
- add chromeos privacy screen support
fb:
- hot unplug fw fb on forced removal
simpledrm:
- request region instead of marking ioresource busy
- add panel oreintation property
udmabuf:
- fix oops with 0 pages
amdgpu:
- power management code cleanup
- Enable freesync video mode by default
- RAS code cleanup
- Improve VRAM access for debug using SDMA
- SR-IOV rework special register access and fixes
- profiling power state request ioctl
- expose IP discovery via sysfs
- Cyan skillfish updates
- GC 10.3.7, SDMA 5.2.7, DCN 3.1.6 updates
- expose benchmark tests via debugfs
- add module param to disable XGMI for testing
- GPU reset debugfs register dumping support
amdkfd:
- CRIU support
- SDMA queue fixes
radeon:
- UVD suspend fix
- iMac backlight fix
i915:
- minimal parallel submission for execlists
- DG2-G12 subplatform added
- DG2 programming workarounds
- DG2 accelerated migration support
- flat CCS and CCS engine support for XeHP
- initial small BAR support
- drop fake LMEM support
- ADL-N PCH support
- bigjoiner updates
- introduce VMA resources and async unbinding
- register definitions cleanups
- multi-FBC refactoring
- DG1 OPROM over SPI support
- ADL-N platform enabling
- opregion mailbox #5 support
- DP MST ESI improvements
- drm device based logging
- async flip optimisation for DG2
- CPU arch abstraction fixes
- improve GuC ADS init to work on aarch64
- tweak TTM LRU priority hint
- GuC 69.0.3 support
- remove short term execbuf pins
nouveau:
- higher DP/eDP bitrates
- backlight fixes
msm:
- dpu + dp support for sc8180x
- dp support for sm8350
- dpu + dsi support for qcm2290
- 10nm dsi phy tuning support
- bridge support for dp encoder
- gpu support for additional 7c3 SKUs
ingenic:
- HDMI support for JZ4780
- aux channel EDID support
ast:
- AST2600 support
- add wide screen support
- create DP/DVI connectors
omapdrm:
- fix implicit dma_buf fencing
vc4:
- add CSC + full range support
- better display firmware handoff
panfrost:
- add initial dual-core GPU support
stm:
- new revision support
- fb handover support
mediatek:
- transfer display binding document to yaml format.
- add mt8195 display device binding.
- allow commands to be sent during video mode.
- add wait_for_event for crtc disable by cmdq.
tegra:
- YUV format support
rcar-du:
- LVDS support for M3-W+ (R8A77961)
exynos:
- BGR pixel format for FIMD device
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmI71h4ACgkQDHTzWXnE
hr6wKg//SvKFiEOhptua8Ao8XYkhXpg1/tgdAs4D7bZ0YgJyF4Im0RuFOKMmF3mN
0Y8AwguqrsmrOAFbK8B1WEysB66DmGlZN/V2Q75X7fui8xs4uGF2Fcxyr+265zhf
vONPwAoxYr+KXqwOI1p1BP2QEL6bJTdu+nrXRsXIBIrWnw8ehXJlw3fDhgvG5QBn
RPdbU7lQnd47hdYxkbe5SiZvWnPC46dJmpqsRJir0xjskR6juU36f34C4IKhTGwO
NDPeWVgusVXtIC/F4X6RebCWG0f66h+CUFa9zeYIleI/2/5yZWXfcw6Obx8HgPkt
gieiI0R4TpkVxeHCApCQ5UpxWgfSOXdoDoyw172bKQw7JCHVEkSwenyMEEwNet6r
SCJrRmlB1PBI/iTWmhm9qgrU46ZZyAnQoTlCsXGzJncdP3hzGlA1embl00yfEl7f
wzM35N20qd5T4VKUEF8QYF0fLZYmKw4cWVASu4hQ3qmGal6frilphz2J8JK8hQNq
KhFqNbVTnZsQNr9LBCbrf0kOPaMzpmW+2vQG9ApdAb1N3gNPZT7ctti0Xq5N2OUR
AipWFAsDPS2NPADKmBtDU55PgFH9MqUIsoHHXLV4Qi76dvCqYoN68qRQxrL7rpSu
b0gr0YKU2QcIB/uytjOPHcgtI5Xvrh+q8JPz/dJ38/Esgjmk4wo=
=uRsT
-----END PGP SIGNATURE-----
Merge tag 'drm-next-2022-03-24' of git://anongit.freedesktop.org/drm/drm
Pull drm updates from Dave Airlie:
"Lots of work all over, Intel improving DG2 support, amdkfd CRIU
support, msm new hw support, and faster fbdev support.
dma-buf:
- rename dma-buf-map to iosys-map
core:
- move buddy allocator to core
- add pci/platform init macros
- improve EDID parser deep color handling
- EDID timing type 7 support
- add GPD Win Max quirk
- add yes/no helpers to string_helpers
- flatten syncobj chains
- add nomodeset support to lots of drivers
- improve fb-helper clipping support
- add default property value interface
fbdev:
- improve fbdev ops speed
ttm:
- add a backpointer from ttm bo->ttm resource
dp:
- move displayport headers
- add a dp helper module
bridge:
- anx7625 atomic support, HDCP support
panel:
- split out panel-lvds and lvds bindings
- find panels in OF subnodes
privacy:
- add chromeos privacy screen support
fb:
- hot unplug fw fb on forced removal
simpledrm:
- request region instead of marking ioresource busy
- add panel oreintation property
udmabuf:
- fix oops with 0 pages
amdgpu:
- power management code cleanup
- Enable freesync video mode by default
- RAS code cleanup
- Improve VRAM access for debug using SDMA
- SR-IOV rework special register access and fixes
- profiling power state request ioctl
- expose IP discovery via sysfs
- Cyan skillfish updates
- GC 10.3.7, SDMA 5.2.7, DCN 3.1.6 updates
- expose benchmark tests via debugfs
- add module param to disable XGMI for testing
- GPU reset debugfs register dumping support
amdkfd:
- CRIU support
- SDMA queue fixes
radeon:
- UVD suspend fix
- iMac backlight fix
i915:
- minimal parallel submission for execlists
- DG2-G12 subplatform added
- DG2 programming workarounds
- DG2 accelerated migration support
- flat CCS and CCS engine support for XeHP
- initial small BAR support
- drop fake LMEM support
- ADL-N PCH support
- bigjoiner updates
- introduce VMA resources and async unbinding
- register definitions cleanups
- multi-FBC refactoring
- DG1 OPROM over SPI support
- ADL-N platform enabling
- opregion mailbox #5 support
- DP MST ESI improvements
- drm device based logging
- async flip optimisation for DG2
- CPU arch abstraction fixes
- improve GuC ADS init to work on aarch64
- tweak TTM LRU priority hint
- GuC 69.0.3 support
- remove short term execbuf pins
nouveau:
- higher DP/eDP bitrates
- backlight fixes
msm:
- dpu + dp support for sc8180x
- dp support for sm8350
- dpu + dsi support for qcm2290
- 10nm dsi phy tuning support
- bridge support for dp encoder
- gpu support for additional 7c3 SKUs
ingenic:
- HDMI support for JZ4780
- aux channel EDID support
ast:
- AST2600 support
- add wide screen support
- create DP/DVI connectors
omapdrm:
- fix implicit dma_buf fencing
vc4:
- add CSC + full range support
- better display firmware handoff
panfrost:
- add initial dual-core GPU support
stm:
- new revision support
- fb handover support
mediatek:
- transfer display binding document to yaml format.
- add mt8195 display device binding.
- allow commands to be sent during video mode.
- add wait_for_event for crtc disable by cmdq.
tegra:
- YUV format support
rcar-du:
- LVDS support for M3-W+ (R8A77961)
exynos:
- BGR pixel format for FIMD device"
* tag 'drm-next-2022-03-24' of git://anongit.freedesktop.org/drm/drm: (1529 commits)
drm/i915/display: Do not re-enable PSR after it was marked as not reliable
drm/i915/display: Fix HPD short pulse handling for eDP
drm/amdgpu: Use drm_mode_copy()
drm/radeon: Use drm_mode_copy()
drm/amdgpu: Use ternary operator in `vcn_v1_0_start()`
drm/amdgpu: Remove pointless on stack mode copies
drm/amd/pm: fix indenting in __smu_cmn_reg_print_error()
drm/amdgpu/dc: fix typos in comments
drm/amdgpu: fix typos in comments
drm/amd/pm: fix typos in comments
drm/amdgpu: Add stolen reserved memory for MI25 SRIOV.
drm/amdgpu: Merge get_reserved_allocation to get_vbios_allocations.
drm/amdkfd: evict svm bo worker handle error
drm/amdgpu/vcn: fix vcn ring test failure in igt reload test
drm/amdgpu: only allow secure submission on rings which support that
drm/amdgpu: fixed the warnings reported by kernel test robot
drm/amd/display: 3.2.177
drm/amd/display: [FW Promotion] Release 0.0.108.0
drm/amd/display: Add save/restore PANEL_PWRSEQ_REF_DIV2
drm/amd/display: Wait for hubp read line for Pollock
...
Merge more updates from Andrew Morton:
"Various misc subsystems, before getting into the post-linux-next
material.
41 patches.
Subsystems affected by this patch series: procfs, misc, core-kernel,
lib, checkpatch, init, pipe, minix, fat, cgroups, kexec, kdump,
taskstats, panic, kcov, resource, and ubsan"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (41 commits)
Revert "ubsan, kcsan: Don't combine sanitizer with kcov on clang"
kernel/resource: fix kfree() of bootmem memory again
kcov: properly handle subsequent mmap calls
kcov: split ioctl handling into locked and unlocked parts
panic: move panic_print before kmsg dumpers
panic: add option to dump all CPUs backtraces in panic_print
docs: sysctl/kernel: add missing bit to panic_print
taskstats: remove unneeded dead assignment
kasan: no need to unset panic_on_warn in end_report()
ubsan: no need to unset panic_on_warn in ubsan_epilogue()
panic: unset panic_on_warn inside panic()
docs: kdump: add scp example to write out the dump file
docs: kdump: update description about sysfs file system support
arm64: mm: use IS_ENABLED(CONFIG_KEXEC_CORE) instead of #ifdef
x86/setup: use IS_ENABLED(CONFIG_KEXEC_CORE) instead of #ifdef
riscv: mm: init: use IS_ENABLED(CONFIG_KEXEC_CORE) instead of #ifdef
kexec: make crashk_res, crashk_low_res and crash_notes symbols always visible
cgroup: use irqsave in cgroup_rstat_flush_locked().
fat: use pointer to simple type in put_user()
minix: fix bug when opening a file with O_DIRECT
...
Core
----
- Introduce XDP multi-buffer support, allowing the use of XDP with
jumbo frame MTUs and combination with Rx coalescing offloads (LRO).
- Speed up netns dismantling (5x) and lower the memory cost a little.
Remove unnecessary per-netns sockets. Scope some lists to a netns.
Cut down RCU syncing. Use batch methods. Allow netdev registration
to complete out of order.
- Support distinguishing timestamp types (ingress vs egress) and
maintaining them across packet scrubbing points (e.g. redirect).
- Continue the work of annotating packet drop reasons throughout
the stack.
- Switch netdev error counters from an atomic to dynamically
allocated per-CPU counters.
- Rework a few preempt_disable(), local_irq_save() and busy waiting
sections problematic on PREEMPT_RT.
- Extend the ref_tracker to allow catching use-after-free bugs.
BPF
---
- Introduce "packing allocator" for BPF JIT images. JITed code is
marked read only, and used to be allocated at page granularity.
Custom allocator allows for more efficient memory use, lower
iTLB pressure and prevents identity mapping huge pages from
getting split.
- Make use of BTF type annotations (e.g. __user, __percpu) to enforce
the correct probe read access method, add appropriate helpers.
- Convert the BPF preload to use light skeleton and drop
the user-mode-driver dependency.
- Allow XDP BPF_PROG_RUN test infra to send real packets, enabling
its use as a packet generator.
- Allow local storage memory to be allocated with GFP_KERNEL if called
from a hook allowed to sleep.
- Introduce fprobe (multi kprobe) to speed up mass attachment (arch
bits to come later).
- Add unstable conntrack lookup helpers for BPF by using the BPF
kfunc infra.
- Allow cgroup BPF progs to return custom errors to user space.
- Add support for AF_UNIX iterator batching.
- Allow iterator programs to use sleepable helpers.
- Support JIT of add, and, or, xor and xchg atomic ops on arm64.
- Add BTFGen support to bpftool which allows to use CO-RE in kernels
without BTF info.
- Large number of libbpf API improvements, cleanups and deprecations.
Protocols
---------
- Micro-optimize UDPv6 Tx, gaining up to 5% in test on dummy netdev.
- Adjust TSO packet sizes based on min_rtt, allowing very low latency
links (data centers) to always send full-sized TSO super-frames.
- Make IPv6 flow label changes (AKA hash rethink) more configurable,
via sysctl and setsockopt. Distinguish between server and client
behavior.
- VxLAN support to "collect metadata" devices to terminate only
configured VNIs. This is similar to VLAN filtering in the bridge.
- Support inserting IPv6 IOAM information to a fraction of frames.
- Add protocol attribute to IP addresses to allow identifying where
given address comes from (kernel-generated, DHCP etc.)
- Support setting socket and IPv6 options via cmsg on ping6 sockets.
- Reject mis-use of ECN bits in IP headers as part of DSCP/TOS.
Define dscp_t and stop taking ECN bits into account in fib-rules.
- Add support for locked bridge ports (for 802.1X).
- tun: support NAPI for packets received from batched XDP buffs,
doubling the performance in some scenarios.
- IPv6 extension header handling in Open vSwitch.
- Support IPv6 control message load balancing in bonding, prevent
neighbor solicitation and advertisement from using the wrong port.
Support NS/NA monitor selection similar to existing ARP monitor.
- SMC
- improve performance with TCP_CORK and sendfile()
- support auto-corking
- support TCP_NODELAY
- MCTP (Management Component Transport Protocol)
- add user space tag control interface
- I2C binding driver (as specified by DMTF DSP0237)
- Multi-BSSID beacon handling in AP mode for WiFi.
- Bluetooth:
- handle MSFT Monitor Device Event
- add MGMT Adv Monitor Device Found/Lost events
- Multi-Path TCP:
- add support for the SO_SNDTIMEO socket option
- lots of selftest cleanups and improvements
- Increase the max PDU size in CAN ISOTP to 64 kB.
Driver API
----------
- Add HW counters for SW netdevs, a mechanism for devices which
offload packet forwarding to report packet statistics back to
software interfaces such as tunnels.
- Select the default NIC queue count as a fraction of number of
physical CPU cores, instead of hard-coding to 8.
- Expose devlink instance locks to drivers. Allow device layer of
drivers to use that lock directly instead of creating their own
which always runs into ordering issues in devlink callbacks.
- Add header/data split indication to guide user space enabling
of TCP zero-copy Rx.
- Allow configuring completion queue event size.
- Refactor page_pool to enable fragmenting after allocation.
- Add allocation and page reuse statistics to page_pool.
- Improve Multiple Spanning Trees support in the bridge to allow
reuse of topologies across VLANs, saving HW resources in switches.
- DSA (Distributed Switch Architecture):
- replay and offload of host VLAN entries
- offload of static and local FDB entries on LAG interfaces
- FDB isolation and unicast filtering
New hardware / drivers
----------------------
- Ethernet:
- LAN937x T1 PHYs
- Davicom DM9051 SPI NIC driver
- Realtek RTL8367S, RTL8367RB-VB switch and MDIO
- Microchip ksz8563 switches
- Netronome NFP3800 SmartNICs
- Fungible SmartNICs
- MediaTek MT8195 switches
- WiFi:
- mt76: MediaTek mt7916
- mt76: MediaTek mt7921u USB adapters
- brcmfmac: Broadcom BCM43454/6
- Mobile:
- iosm: Intel M.2 7360 WWAN card
Drivers
-------
- Convert many drivers to the new phylink API built for split PCS
designs but also simplifying other cases.
- Intel Ethernet NICs:
- add TTY for GNSS module for E810T device
- improve AF_XDP performance
- GTP-C and GTP-U filter offload
- QinQ VLAN support
- Mellanox Ethernet NICs (mlx5):
- support xdp->data_meta
- multi-buffer XDP
- offload tc push_eth and pop_eth actions
- Netronome Ethernet NICs (nfp):
- flow-independent tc action hardware offload (police / meter)
- AF_XDP
- Other Ethernet NICs:
- at803x: fiber and SFP support
- xgmac: mdio: preamble suppression and custom MDC frequencies
- r8169: enable ASPM L1.2 if system vendor flags it as safe
- macb/gem: ZynqMP SGMII
- hns3: add TX push mode
- dpaa2-eth: software TSO
- lan743x: multi-queue, mdio, SGMII, PTP
- axienet: NAPI and GRO support
- Mellanox Ethernet switches (mlxsw):
- source and dest IP address rewrites
- RJ45 ports
- Marvell Ethernet switches (prestera):
- basic routing offload
- multi-chain TC ACL offload
- NXP embedded Ethernet switches (ocelot & felix):
- PTP over UDP with the ocelot-8021q DSA tagging protocol
- basic QoS classification on Felix DSA switch using dcbnl
- port mirroring for ocelot switches
- Microchip high-speed industrial Ethernet (sparx5):
- offloading of bridge port flooding flags
- PTP Hardware Clock
- Other embedded switches:
- lan966x: PTP Hardward Clock
- qca8k: mdio read/write operations via crafted Ethernet packets
- Qualcomm 802.11ax WiFi (ath11k):
- add LDPC FEC type and 802.11ax High Efficiency data in radiotap
- enable RX PPDU stats in monitor co-exist mode
- Intel WiFi (iwlwifi):
- UHB TAS enablement via BIOS
- band disablement via BIOS
- channel switch offload
- 32 Rx AMPDU sessions in newer devices
- MediaTek WiFi (mt76):
- background radar detection
- thermal management improvements on mt7915
- SAR support for more mt76 platforms
- MBSSID and 6 GHz band on mt7915
- RealTek WiFi:
- rtw89: AP mode
- rtw89: 160 MHz channels and 6 GHz band
- rtw89: hardware scan
- Bluetooth:
- mt7921s: wake on Bluetooth, SCO over I2S, wide-band-speed (WBS)
- Microchip CAN (mcp251xfd):
- multiple RX-FIFOs and runtime configurable RX/TX rings
- internal PLL, runtime PM handling simplification
- improve chip detection and error handling after wakeup
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmI7YBcACgkQMUZtbf5S
IrveSBAAmSNJlUK6vPsnNzs7IhsZnfI/AUjm2TCLZnlhKttbpI4A/4Pohk33V7RS
FGX7f8kjEfhUwrIiLDgeCnztNHRECrCmk6aZc/jLEvecmTauJ+f6kjShkDY/wix+
AkPHmrZnQeLPAEVuljDdV+sL6ik08+zQL7PazIYHsaSKKC0MGQptRwcri8PLRAKE
KPBAhVhleq2rAZ/ntprSN52F4Af6rpFTrPIWuN8Bqdbc9dy5094LT0mpOOWYvgr3
/DLvvAPuLemwyIQkjWknVKBRUAQcmNPC+BY3J8K3LRaiNhekGqOFan46BfqP+k2J
6DWu0Qrp2yWt4BMOeEToZR5rA6v5suUAMIBu8PRZIDkINXQMlIxHfGjZyNm0rVfw
7edNri966yus9OdzwPa32MIG3oC6PnVAwYCJAjjBMNS8sSIkp7wgHLkgWN4UFe2H
K/e6z8TLF4UQ+zFM0aGI5WZ+9QqWkTWEDF3R3OhdFpGrznna0gxmkOeV2YvtsgxY
cbS0vV9Zj73o+bYzgBKJsw/dAjyLdXoHUGvus26VLQ78S/VGunVKtItwoxBAYmZo
krW964qcC89YofzSi8RSKLHuEWtNWZbVm8YXr75u6jpr5GhMBu0CYefLs+BuZcxy
dw8c69cGneVbGZmY2J3rBhDkchbuICl8vdUPatGrOJAoaFdYKuw=
=ELpe
-----END PGP SIGNATURE-----
Merge tag 'net-next-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"The sprinkling of SPI drivers is because we added a new one and Mark
sent us a SPI driver interface conversion pull request.
Core
----
- Introduce XDP multi-buffer support, allowing the use of XDP with
jumbo frame MTUs and combination with Rx coalescing offloads (LRO).
- Speed up netns dismantling (5x) and lower the memory cost a little.
Remove unnecessary per-netns sockets. Scope some lists to a netns.
Cut down RCU syncing. Use batch methods. Allow netdev registration
to complete out of order.
- Support distinguishing timestamp types (ingress vs egress) and
maintaining them across packet scrubbing points (e.g. redirect).
- Continue the work of annotating packet drop reasons throughout the
stack.
- Switch netdev error counters from an atomic to dynamically
allocated per-CPU counters.
- Rework a few preempt_disable(), local_irq_save() and busy waiting
sections problematic on PREEMPT_RT.
- Extend the ref_tracker to allow catching use-after-free bugs.
BPF
---
- Introduce "packing allocator" for BPF JIT images. JITed code is
marked read only, and used to be allocated at page granularity.
Custom allocator allows for more efficient memory use, lower iTLB
pressure and prevents identity mapping huge pages from getting
split.
- Make use of BTF type annotations (e.g. __user, __percpu) to enforce
the correct probe read access method, add appropriate helpers.
- Convert the BPF preload to use light skeleton and drop the
user-mode-driver dependency.
- Allow XDP BPF_PROG_RUN test infra to send real packets, enabling
its use as a packet generator.
- Allow local storage memory to be allocated with GFP_KERNEL if
called from a hook allowed to sleep.
- Introduce fprobe (multi kprobe) to speed up mass attachment (arch
bits to come later).
- Add unstable conntrack lookup helpers for BPF by using the BPF
kfunc infra.
- Allow cgroup BPF progs to return custom errors to user space.
- Add support for AF_UNIX iterator batching.
- Allow iterator programs to use sleepable helpers.
- Support JIT of add, and, or, xor and xchg atomic ops on arm64.
- Add BTFGen support to bpftool which allows to use CO-RE in kernels
without BTF info.
- Large number of libbpf API improvements, cleanups and deprecations.
Protocols
---------
- Micro-optimize UDPv6 Tx, gaining up to 5% in test on dummy netdev.
- Adjust TSO packet sizes based on min_rtt, allowing very low latency
links (data centers) to always send full-sized TSO super-frames.
- Make IPv6 flow label changes (AKA hash rethink) more configurable,
via sysctl and setsockopt. Distinguish between server and client
behavior.
- VxLAN support to "collect metadata" devices to terminate only
configured VNIs. This is similar to VLAN filtering in the bridge.
- Support inserting IPv6 IOAM information to a fraction of frames.
- Add protocol attribute to IP addresses to allow identifying where
given address comes from (kernel-generated, DHCP etc.)
- Support setting socket and IPv6 options via cmsg on ping6 sockets.
- Reject mis-use of ECN bits in IP headers as part of DSCP/TOS.
Define dscp_t and stop taking ECN bits into account in fib-rules.
- Add support for locked bridge ports (for 802.1X).
- tun: support NAPI for packets received from batched XDP buffs,
doubling the performance in some scenarios.
- IPv6 extension header handling in Open vSwitch.
- Support IPv6 control message load balancing in bonding, prevent
neighbor solicitation and advertisement from using the wrong port.
Support NS/NA monitor selection similar to existing ARP monitor.
- SMC
- improve performance with TCP_CORK and sendfile()
- support auto-corking
- support TCP_NODELAY
- MCTP (Management Component Transport Protocol)
- add user space tag control interface
- I2C binding driver (as specified by DMTF DSP0237)
- Multi-BSSID beacon handling in AP mode for WiFi.
- Bluetooth:
- handle MSFT Monitor Device Event
- add MGMT Adv Monitor Device Found/Lost events
- Multi-Path TCP:
- add support for the SO_SNDTIMEO socket option
- lots of selftest cleanups and improvements
- Increase the max PDU size in CAN ISOTP to 64 kB.
Driver API
----------
- Add HW counters for SW netdevs, a mechanism for devices which
offload packet forwarding to report packet statistics back to
software interfaces such as tunnels.
- Select the default NIC queue count as a fraction of number of
physical CPU cores, instead of hard-coding to 8.
- Expose devlink instance locks to drivers. Allow device layer of
drivers to use that lock directly instead of creating their own
which always runs into ordering issues in devlink callbacks.
- Add header/data split indication to guide user space enabling of
TCP zero-copy Rx.
- Allow configuring completion queue event size.
- Refactor page_pool to enable fragmenting after allocation.
- Add allocation and page reuse statistics to page_pool.
- Improve Multiple Spanning Trees support in the bridge to allow
reuse of topologies across VLANs, saving HW resources in switches.
- DSA (Distributed Switch Architecture):
- replay and offload of host VLAN entries
- offload of static and local FDB entries on LAG interfaces
- FDB isolation and unicast filtering
New hardware / drivers
----------------------
- Ethernet:
- LAN937x T1 PHYs
- Davicom DM9051 SPI NIC driver
- Realtek RTL8367S, RTL8367RB-VB switch and MDIO
- Microchip ksz8563 switches
- Netronome NFP3800 SmartNICs
- Fungible SmartNICs
- MediaTek MT8195 switches
- WiFi:
- mt76: MediaTek mt7916
- mt76: MediaTek mt7921u USB adapters
- brcmfmac: Broadcom BCM43454/6
- Mobile:
- iosm: Intel M.2 7360 WWAN card
Drivers
-------
- Convert many drivers to the new phylink API built for split PCS
designs but also simplifying other cases.
- Intel Ethernet NICs:
- add TTY for GNSS module for E810T device
- improve AF_XDP performance
- GTP-C and GTP-U filter offload
- QinQ VLAN support
- Mellanox Ethernet NICs (mlx5):
- support xdp->data_meta
- multi-buffer XDP
- offload tc push_eth and pop_eth actions
- Netronome Ethernet NICs (nfp):
- flow-independent tc action hardware offload (police / meter)
- AF_XDP
- Other Ethernet NICs:
- at803x: fiber and SFP support
- xgmac: mdio: preamble suppression and custom MDC frequencies
- r8169: enable ASPM L1.2 if system vendor flags it as safe
- macb/gem: ZynqMP SGMII
- hns3: add TX push mode
- dpaa2-eth: software TSO
- lan743x: multi-queue, mdio, SGMII, PTP
- axienet: NAPI and GRO support
- Mellanox Ethernet switches (mlxsw):
- source and dest IP address rewrites
- RJ45 ports
- Marvell Ethernet switches (prestera):
- basic routing offload
- multi-chain TC ACL offload
- NXP embedded Ethernet switches (ocelot & felix):
- PTP over UDP with the ocelot-8021q DSA tagging protocol
- basic QoS classification on Felix DSA switch using dcbnl
- port mirroring for ocelot switches
- Microchip high-speed industrial Ethernet (sparx5):
- offloading of bridge port flooding flags
- PTP Hardware Clock
- Other embedded switches:
- lan966x: PTP Hardward Clock
- qca8k: mdio read/write operations via crafted Ethernet packets
- Qualcomm 802.11ax WiFi (ath11k):
- add LDPC FEC type and 802.11ax High Efficiency data in radiotap
- enable RX PPDU stats in monitor co-exist mode
- Intel WiFi (iwlwifi):
- UHB TAS enablement via BIOS
- band disablement via BIOS
- channel switch offload
- 32 Rx AMPDU sessions in newer devices
- MediaTek WiFi (mt76):
- background radar detection
- thermal management improvements on mt7915
- SAR support for more mt76 platforms
- MBSSID and 6 GHz band on mt7915
- RealTek WiFi:
- rtw89: AP mode
- rtw89: 160 MHz channels and 6 GHz band
- rtw89: hardware scan
- Bluetooth:
- mt7921s: wake on Bluetooth, SCO over I2S, wide-band-speed (WBS)
- Microchip CAN (mcp251xfd):
- multiple RX-FIFOs and runtime configurable RX/TX rings
- internal PLL, runtime PM handling simplification
- improve chip detection and error handling after wakeup"
* tag 'net-next-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2521 commits)
llc: fix netdevice reference leaks in llc_ui_bind()
drivers: ethernet: cpsw: fix panic when interrupt coaleceing is set via ethtool
ice: don't allow to run ice_send_event_to_aux() in atomic ctx
ice: fix 'scheduling while atomic' on aux critical err interrupt
net/sched: fix incorrect vlan_push_eth dest field
net: bridge: mst: Restrict info size queries to bridge ports
net: marvell: prestera: add missing destroy_workqueue() in prestera_module_init()
drivers: net: xgene: Fix regression in CRC stripping
net: geneve: add missing netlink policy and size for IFLA_GENEVE_INNER_PROTO_INHERIT
net: dsa: fix missing host-filtered multicast addresses
net/mlx5e: Fix build warning, detected write beyond size of field
iwlwifi: mvm: Don't fail if PPAG isn't supported
selftests/bpf: Fix kprobe_multi test.
Revert "rethook: x86: Add rethook x86 implementation"
Revert "arm64: rethook: Add arm64 rethook implementation"
Revert "powerpc: Add rethook support"
Revert "ARM: rethook: Add rethook arm implementation"
netdevice: add missing dm_private kdoc
net: bridge: mst: prevent NULL deref in br_mst_info_size()
selftests: forwarding: Use same VRF for port and VLAN upper
...
-----BEGIN PGP SIGNATURE-----
iQFHBAABCAAxFiEEIbPD0id6easf0xsudhRwX5BBoF4FAmI6MhgTHHdlaS5saXVA
a2VybmVsLm9yZwAKCRB2FHBfkEGgXiBkB/9FEaebcytvBMNVnz/fprfQWfYxdrUB
wmDB9dnJVPX6UcWg6DuDv7KqatimDp7JrnKrNsrqlgF6Wafn/wI3Cdf8ZUi8CgtZ
Walq8m2QM9j8WCm4EtgmluzGkHjAn6llmjE7XoLmo/A8MYp0FZhJzghKfCEu54EB
9DhgqQX+zzy7n32CxoAWrGrHegnGeqMpj3rhQ4qVuNn+bOzZ/wTyJQPmXTu4hSW1
0PrXRnfzqnODkB6QDUantABZM5bh1VsqwIlENp+xAFQVsQpsc35lW7AUg7JXczVn
ES0+UHG7wVWPJCtIfD4FgIFT1xekPKxRYX9U66LMRbWGYhXfohQX+wFt
=u62V
-----END PGP SIGNATURE-----
Merge tag 'hyperv-next-signed-20220322' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
Pull hyperv updates from Wei Liu:
"Minor patches from various people"
* tag 'hyperv-next-signed-20220322' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
x86/hyperv: Output host build info as normal Windows version number
hv_balloon: rate-limit "Unhandled message" warning
drivers: hv: log when enabling crash_kexec_post_notifiers
hv_utils: Add comment about max VMbus packet size in VSS driver
Drivers: hv: Compare cpumasks and not their weights in init_vp_index()
Drivers: hv: Rename 'alloced' to 'allocated'
Drivers: hv: vmbus: Use struct_size() helper in kmalloc()
Replace the conditional compilation using "#ifdef CONFIG_KEXEC_CORE" by a
check for "IS_ENABLED(CONFIG_KEXEC_CORE)", to simplify the code and
increase compile coverage.
Link: https://lkml.kernel.org/r/20211206160514.2000-4-jszhang@kernel.org
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Palmer Dabbelt <palmer@rivosinc.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There are three sets of updates for 5.18 in the asm-generic tree:
- The set_fs()/get_fs() infrastructure gets removed for good. This
was already gone from all major architectures, but now we can
finally remove it everywhere, which loses some particularly
tricky and error-prone code.
There is a small merge conflict against a parisc cleanup, the
solution is to use their new version.
- The nds32 architecture ends its tenure in the Linux kernel. The
hardware is still used and the code is in reasonable shape, but
the mainline port is not actively maintained any more, as all
remaining users are thought to run vendor kernels that would never
be updated to a future release.
There are some obvious conflicts against changes to the removed
files.
- A series from Masahiro Yamada cleans up some of the uapi header
files to pass the compile-time checks.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEo6/YBQwIrVS28WGKmmx57+YAGNkFAmI69BsACgkQmmx57+YA
GNn/zA//f4d5VTT0ThhRxRWTu9BdThGHoB8TUcY7iOhbsWu0X/913NItRC3UeWNl
IdmisaXgVtirg1dcC2pWUmrcHdoWOCEGfK4+Zr2NhSWfuZDWvODHK9pGWk4WLnhe
cQgUNBvIuuAMryGtrOBwHPO4TpfCyy2ioeVP36ZfcsWXdDxTrqfaq/56mk3sxIP6
sUTk1UEjut9NG4C9xIIvcSU50R3l6LryQE/H9kyTLtaSvfvTOvprcVYCq0GPmSzo
DtQ1Wwa9zbJ+4EqoMiP5RrgQwWvOTg2iRByLU8ytwlX3e/SEF0uihvMv1FQbL8zG
G8RhGUOKQSEhaBfc3lIkm8GpOVPh0uHzB6zhn7daVmAWtazRD2Nu59BMjipa+ims
a8Z58iHH7jRAnKeEkVZqXKb1CEiUxaQx/IeVPzN4QlwMhDtwrI76LY7ZJ1zCqTGY
ENG0yRLav1XselYBslOYXGtOEWcY5EZPWqLyWbp4P9vz2g0Fe0gZxoIOvPmNQc89
QnfXpCt7vm/DGkyO255myu08GOLeMkisVqUIzLDB9avlym5mri7T7vk9abBa2YyO
CRpTL5gl1/qKPWuH1UI5mvhT+sbbBE2SUHSuy84btns39ZKKKynwCtdu+hSQkKLE
h9pV30Gf1cLTD4JAE0RWlUgOmbBLVp34loTOexQj4MrLM1noOnw=
=vtCN
-----END PGP SIGNATURE-----
Merge tag 'asm-generic-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic
Pull asm-generic updates from Arnd Bergmann:
"There are three sets of updates for 5.18 in the asm-generic tree:
- The set_fs()/get_fs() infrastructure gets removed for good.
This was already gone from all major architectures, but now we can
finally remove it everywhere, which loses some particularly tricky
and error-prone code. There is a small merge conflict against a
parisc cleanup, the solution is to use their new version.
- The nds32 architecture ends its tenure in the Linux kernel.
The hardware is still used and the code is in reasonable shape, but
the mainline port is not actively maintained any more, as all
remaining users are thought to run vendor kernels that would never
be updated to a future release.
- A series from Masahiro Yamada cleans up some of the uapi header
files to pass the compile-time checks"
* tag 'asm-generic-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: (27 commits)
nds32: Remove the architecture
uaccess: remove CONFIG_SET_FS
ia64: remove CONFIG_SET_FS support
sh: remove CONFIG_SET_FS support
sparc64: remove CONFIG_SET_FS support
lib/test_lockup: fix kernel pointer check for separate address spaces
uaccess: generalize access_ok()
uaccess: fix type mismatch warnings from access_ok()
arm64: simplify access_ok()
m68k: fix access_ok for coldfire
MIPS: use simpler access_ok()
MIPS: Handle address errors for accesses above CPU max virtual user address
uaccess: add generic __{get,put}_kernel_nofault
nios2: drop access_ok() check from __put_user()
x86: use more conventional access_ok() definition
x86: remove __range_not_ok()
sparc64: add __{get,put}_kernel_nofault()
nds32: fix access_ok() checks in get/put_user
uaccess: fix nios2 and microblaze get_user_8()
sparc64: fix building assembly files
...
ARCH_REQ_XCOMP_PERM is supposed to add the requested feature to the
permission bitmap of thread_group_leader()->fpu. But the code overwrites
the bitmap with the requested feature bit only rather than adding it.
Fix the code to add the requested feature bit to the master bitmask.
Fixes: db8268df09 ("x86/arch_prctl: Add controls for dynamic XSTATE components")
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Paolo Bonzini <bonzini@gnu.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20220129173647.27981-2-chang.seok.bae@intel.com
... and call node_dev_init() after memory_dev_init() from driver_init(),
so before any of the existing arch/subsys calls. All online nodes should
be known at that point: early during boot, arch code determines node and
zone ranges and sets the relevant nodes online; usually this happens in
setup_arch().
This is in line with memory_dev_init(), which initializes the memory
device subsystem and creates all memory block devices.
Similar to memory_dev_init(), panic() if anything goes wrong, we don't
want to continue with such basic initialization errors.
The important part is that node_dev_init() gets called after
memory_dev_init() and after cpu_dev_init(), but before any of the relevant
archs call register_cpu() to register the new cpu device under the node
device. The latter should be the case for the current users of
topology_init().
Link: https://lkml.kernel.org/r/20220203105212.30385-1-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Tested-by: Anatoly Pugachev <matorola@gmail.com> (sparc64)
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When the hwpoison page meets the filter conditions, it should not be
regarded as successful memory_failure() processing for mce handler, but
should return a distinct value, otherwise mce handler regards the error
page has been identified and isolated, which may lead to calling
set_mce_nospec() to change page attribute, etc.
Here memory_failure() return -EOPNOTSUPP to indicate that the error
event is filtered, mce handler should not take any action for this
situation and hwpoison injector should treat as correct.
Link: https://lkml.kernel.org/r/20220223082135.2769649-1-luofei@unicloud.com
Signed-off-by: luofei <luofei@unicloud.com>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Cleanups for SCHED_DEADLINE
- Tracing updates/fixes
- CPU Accounting fixes
- First wave of changes to optimize the overhead of the scheduler build,
from the fast-headers tree - including placeholder *_api.h headers for
later header split-ups.
- Preempt-dynamic using static_branch() for ARM64
- Isolation housekeeping mask rework; preperatory for further changes
- NUMA-balancing: deal with CPU-less nodes
- NUMA-balancing: tune systems that have multiple LLC cache domains per node (eg. AMD)
- Updates to RSEQ UAPI in preparation for glibc usage
- Lots of RSEQ/selftests, for same
- Add Suren as PSI co-maintainer
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmI5rg8RHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1hGrw/+M3QOk6fH7G48wjlNnBvcOife6ls+Ni4k
ixOAcF4JKoixO8HieU5vv0A7yf/83tAa6fpeXeMf1hkCGc0NSlmLtuIux+WOmoAL
LzCyDEYfiP8KnVh0A1Tui/lK0+AkGo21O6ADhQE2gh8o2LpslOHQMzvtyekSzeeb
mVxMYQN+QH0m518xdO2D8IQv9ctOYK0eGjmkqdNfntOlytypPZHeNel/tCzwklP/
dElJUjNiSKDlUgTBPtL3DfpoLOI/0mHF2p6NEXvNyULxSOqJTu8pv9Z2ADb2kKo1
0D56iXBDngMi9MHIJLgvzsA8gKzHLFSuPbpODDqkTZCa28vaMB9NYGhJ643NtEie
IXTJEvF1rmNkcLcZlZxo0yjL0fjvPkczjw4Vj27gbrUQeEBfb4mfuI4BRmij63Ep
qEkgQTJhduCqqrQP1rVyhwWZRk1JNcVug+F6N42qWW3fg1xhj0YSrLai2c9nPez6
3Zt98H8YGS1Z/JQomSw48iGXVqfTp/ETI7uU7jqHK8QcjzQ4lFK5H4GZpwuqGBZi
NJJ1l97XMEas+rPHiwMEN7Z1DVhzJLCp8omEj12QU+tGLofxxwAuuOVat3CQWLRk
f80Oya3TLEgd22hGIKDRmHa22vdWnNQyS0S15wJotawBzQf+n3auS9Q3/rh979+t
ES/qvlGxTIs=
=Z8uT
-----END PGP SIGNATURE-----
Merge tag 'sched-core-2022-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
- Cleanups for SCHED_DEADLINE
- Tracing updates/fixes
- CPU Accounting fixes
- First wave of changes to optimize the overhead of the scheduler
build, from the fast-headers tree - including placeholder *_api.h
headers for later header split-ups.
- Preempt-dynamic using static_branch() for ARM64
- Isolation housekeeping mask rework; preperatory for further changes
- NUMA-balancing: deal with CPU-less nodes
- NUMA-balancing: tune systems that have multiple LLC cache domains per
node (eg. AMD)
- Updates to RSEQ UAPI in preparation for glibc usage
- Lots of RSEQ/selftests, for same
- Add Suren as PSI co-maintainer
* tag 'sched-core-2022-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (81 commits)
sched/headers: ARM needs asm/paravirt_api_clock.h too
sched/numa: Fix boot crash on arm64 systems
headers/prep: Fix header to build standalone: <linux/psi.h>
sched/headers: Only include <linux/entry-common.h> when CONFIG_GENERIC_ENTRY=y
cgroup: Fix suspicious rcu_dereference_check() usage warning
sched/preempt: Tell about PREEMPT_DYNAMIC on kernel headers
sched/topology: Remove redundant variable and fix incorrect type in build_sched_domains
sched/deadline,rt: Remove unused parameter from pick_next_[rt|dl]_entity()
sched/deadline,rt: Remove unused functions for !CONFIG_SMP
sched/deadline: Use __node_2_[pdl|dle]() and rb_first_cached() consistently
sched/deadline: Merge dl_task_can_attach() and dl_cpu_busy()
sched/deadline: Move bandwidth mgmt and reclaim functions into sched class source file
sched/deadline: Remove unused def_dl_bandwidth
sched/tracing: Report TASK_RTLOCK_WAIT tasks as TASK_UNINTERRUPTIBLE
sched/tracing: Don't re-read p->state when emitting sched_switch event
sched/rt: Plug rt_mutex_setprio() vs push_rt_task() race
sched/cpuacct: Remove redundant RCU read lock
sched/cpuacct: Optimize away RCU read lock
sched/cpuacct: Fix charge percpu cpuusage
sched/headers: Reorganize, clean up and optimize kernel/sched/sched.h dependencies
...
- Allow device_pm_check_callbacks() to be called from interrupt
context without issues (Dmitry Baryshkov).
- Modify devm_pm_runtime_enable() to automatically handle
pm_runtime_dont_use_autosuspend() at driver exit time (Douglas
Anderson).
- Make the schedutil cpufreq governor use to_gov_attr_set() instead
of open coding it (Kevin Hao).
- Replace acpi_bus_get_device() with acpi_fetch_acpi_dev() in the
cpufreq longhaul driver (Rafael Wysocki).
- Unify show() and store() naming in cpufreq and make it use
__ATTR_XX (Lianjie Zhang).
- Make the intel_pstate driver use the EPP value set by the firmware
by default (Srinivas Pandruvada).
- Re-order the init checks in the powernow-k8 cpufreq driver (Mario
Limonciello).
- Make the ACPI processor idle driver check for architectural
support for LPI to avoid using it on x86 by mistake (Mario
Limonciello).
- Add Sapphire Rapids Xeon support to the intel_idle driver (Artem
Bityutskiy).
- Add 'preferred_cstates' module argument to the intel_idle driver
to work around C1 and C1E handling issue on Sapphire Rapids (Artem
Bityutskiy).
- Add core C6 optimization on Sapphire Rapids to the intel_idle
driver (Artem Bityutskiy).
- Optimize the haltpoll cpuidle driver a bit (Li RongQing).
- Remove leftover text from intel_idle() kerneldoc comment and fix
up white space in intel_idle (Rafael Wysocki).
- Fix load_image_and_restore() error path (Ye Bin).
- Fix typos in comments in the system wakeup hadling code (Tom Rix).
- Clean up non-kernel-doc comments in hibernation code (Jiapeng
Chong).
- Fix __setup handler error handling in system-wide suspend and
hibernation core code (Randy Dunlap).
- Add device name to suspend_report_result() (Youngjin Jang).
- Make virtual guests honour ACPI S4 hardware signature by
default (David Woodhouse).
- Block power off of a parent PM domain unless child is in deepest
state (Ulf Hansson).
- Use dev_err_probe() to simplify error handling for generic PM
domains (Ahmad Fatoum).
- Fix sleep-in-atomic bug caused by genpd_debug_remove() (Shawn Guo).
- Document Intel uncore frequency scaling (Srinivas Pandruvada).
- Add DTPM hierarchy description (Daniel Lezcano).
- Change the locking scheme in DTPM (Daniel Lezcano).
- Fix dtpm_cpu cleanup at exit time and missing virtual DTPM pointer
release (Daniel Lezcano).
- Make dtpm_node_callback[] static (kernel test robot).
- Fix spelling mistake "initialze" -> "initialize" in
dtpm_create_hierarchy() (Colin Ian King).
- Add tracer tool for the amd-pstate driver (Jinzhou Su).
- Fix PC6 displaying in turbostat on some systems (Artem Bityutskiy).
- Add AMD P-State support to the cpupower utility (Huang Rui).
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmI4pM4SHHJqd0Byand5
c29ja2kubmV0AAoJEILEb/54YlRxh5wQAJEz3u55wIHzeov30obtXaD3SxxnvRzR
p96gRcmNoR2so/Q9D+h+JHZKQkVklbnbqExMXQn1qarceAUN7KPjVMRvagjZsC/f
J3LtQmx96yqGTCzOTu5n+Ol2ojKLMCMo++no/2873BYhd60TV6oQxRzkNiZx215n
tT6MKY5ZMX448VKWAWh9vt5rdvbBj9z6cfvpchK/3bziE21lfLz/1iXeFnwqjPGU
XuA7NYbVAHOfsdHZk19+4qAgm8EYkmjd4/J8HDlb7XouyLuUGy8KJZYhSrJKiQ1C
f9f2Zw0925/YpBmFXOwxuYWP9KjFKlq7Cdr3SSgVGDOvgyRtpeV4fU8Y6WPFCtEV
fQdKr9/4KQP6hwUpxJZucSf49wcnyh7hFDMxrwVVcL96yXZef1OqG3ITihJY/n4J
+wDnpR2VqBeiG5NyECjk3mPROZGFfUlHRsqMd3JOswMpGF5phpEI9nNFcayB262S
Rkgcb3MacFVsuo/ZBdzCUTZ6ECvjxZn4FGZPxumkp65SJO18gOPbqs8qfGCZ3Tgb
GDy0CWEOv/KuGnks1CkBGok2Z4q8s2GcZmaOp9BiPjxKJD71i4uPtiGA/5Ahb6cm
Cu0G7Ub/t2Vc93E7mnTE4hh2IuiAN73yB5teM4YNllHw6f+aqVGlvJktIMpShajo
eEBNFlkwljyz
=WlR9
-----END PGP SIGNATURE-----
Merge tag 'pm-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
"These are mostly fixes and cleanups all over the code and a new piece
of documentation for Intel uncore frequency scaling.
Functionality-wise, the intel_idle driver will support Sapphire Rapids
Xeons natively now (with some extra facilities for controlling
C-states more precisely on those systems), virtual guests will take
the ACPI S4 hardware signature into account by default, the
intel_pstate driver will take the defualt EPP value from the firmware,
cpupower utility will support the AMD P-state driver added in the
previous cycle, and there is a new tracer utility for that driver.
Specifics:
- Allow device_pm_check_callbacks() to be called from interrupt
context without issues (Dmitry Baryshkov).
- Modify devm_pm_runtime_enable() to automatically handle
pm_runtime_dont_use_autosuspend() at driver exit time (Douglas
Anderson).
- Make the schedutil cpufreq governor use to_gov_attr_set() instead
of open coding it (Kevin Hao).
- Replace acpi_bus_get_device() with acpi_fetch_acpi_dev() in the
cpufreq longhaul driver (Rafael Wysocki).
- Unify show() and store() naming in cpufreq and make it use
__ATTR_XX (Lianjie Zhang).
- Make the intel_pstate driver use the EPP value set by the firmware
by default (Srinivas Pandruvada).
- Re-order the init checks in the powernow-k8 cpufreq driver (Mario
Limonciello).
- Make the ACPI processor idle driver check for architectural support
for LPI to avoid using it on x86 by mistake (Mario Limonciello).
- Add Sapphire Rapids Xeon support to the intel_idle driver (Artem
Bityutskiy).
- Add 'preferred_cstates' module argument to the intel_idle driver to
work around C1 and C1E handling issue on Sapphire Rapids (Artem
Bityutskiy).
- Add core C6 optimization on Sapphire Rapids to the intel_idle
driver (Artem Bityutskiy).
- Optimize the haltpoll cpuidle driver a bit (Li RongQing).
- Remove leftover text from intel_idle() kerneldoc comment and fix up
white space in intel_idle (Rafael Wysocki).
- Fix load_image_and_restore() error path (Ye Bin).
- Fix typos in comments in the system wakeup hadling code (Tom Rix).
- Clean up non-kernel-doc comments in hibernation code (Jiapeng
Chong).
- Fix __setup handler error handling in system-wide suspend and
hibernation core code (Randy Dunlap).
- Add device name to suspend_report_result() (Youngjin Jang).
- Make virtual guests honour ACPI S4 hardware signature by default
(David Woodhouse).
- Block power off of a parent PM domain unless child is in deepest
state (Ulf Hansson).
- Use dev_err_probe() to simplify error handling for generic PM
domains (Ahmad Fatoum).
- Fix sleep-in-atomic bug caused by genpd_debug_remove() (Shawn Guo).
- Document Intel uncore frequency scaling (Srinivas Pandruvada).
- Add DTPM hierarchy description (Daniel Lezcano).
- Change the locking scheme in DTPM (Daniel Lezcano).
- Fix dtpm_cpu cleanup at exit time and missing virtual DTPM pointer
release (Daniel Lezcano).
- Make dtpm_node_callback[] static (kernel test robot).
- Fix spelling mistake "initialze" -> "initialize" in
dtpm_create_hierarchy() (Colin Ian King).
- Add tracer tool for the amd-pstate driver (Jinzhou Su).
- Fix PC6 displaying in turbostat on some systems (Artem Bityutskiy).
- Add AMD P-State support to the cpupower utility (Huang Rui)"
* tag 'pm-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (58 commits)
cpufreq: powernow-k8: Re-order the init checks
cpuidle: intel_idle: Drop redundant backslash at line end
cpuidle: intel_idle: Update intel_idle() kerneldoc comment
PM: hibernate: Honour ACPI hardware signature by default for virtual guests
cpufreq: intel_pstate: Use firmware default EPP
cpufreq: unify show() and store() naming and use __ATTR_XX
PM: core: keep irq flags in device_pm_check_callbacks()
cpuidle: haltpoll: Call cpuidle_poll_state_init() later
Documentation: amd-pstate: add tracer tool introduction
tools/power/x86/amd_pstate_tracer: Add tracer tool for AMD P-state
tools/power/x86/intel_pstate_tracer: make tracer as a module
cpufreq: amd-pstate: Add more tracepoint for AMD P-State module
PM: sleep: Add device name to suspend_report_result()
turbostat: fix PC6 displaying on some systems
intel_idle: add core C6 optimization for SPR
intel_idle: add 'preferred_cstates' module argument
intel_idle: add SPR support
PM: runtime: Have devm_pm_runtime_enable() handle pm_runtime_dont_use_autosuspend()
ACPI: processor idle: Check for architectural support for LPI
cpuidle: PSCI: Move the `has_lpi` check to the beginning of the function
...
- Use uintptr_t and offsetof() in the ACPICA code to avoid compiler
warnings regarding NULL pointer arithmetic (Rafael Wysocki).
- Fix possible NULL pointer dereference in acpi_ns_walk_namespace()
when passed "acpi=off" in the command line (Rafael Wysocki).
- Fix and clean up acpi_os_read/write_port() (Rafael Wysocki).
- Introduce acpi_bus_for_each_dev() and use it for walking all ACPI
device objects in the Type C code (Rafael Wysocki).
- Fix the _OSC platform capabilities negotioation and prevent CPPC
from being used if the platform firmware indicates that it not
supported via _OSC (Rafael Wysocki).
- Use ida_alloc() instead of ida_simple_get() for ACPI enumeration
of devices (Rafael Wysocki).
- Add AGDI and CEDT to the list of known ACPI table signatures (Ilkka
Koskinen, Robert Kiraly).
- Add power management debug messages related to suspend-to-idle in
two places (Rafael Wysocki).
- Fix __acpi_node_get_property_reference() return value and clean up
that function (Andy Shevchenko, Sakari Ailus).
- Fix return value of the __setup handler in the ACPI PM timer clock
source driver (Randy Dunlap).
- Clean up double words in two comments (Tom Rix).
- Add "skip i2c clients" quirks for Lenovo Yoga Tablet 1050F/L and
Nextbook Ares 8 (Hans de Goede).
- Clean up frequency invariance handling on x86 in the ACPI CPPC
library (Huang Rui).
- Work around broken XSDT on the Advantech DAC-BJ01 board (Mark
Cilissen).
- Make wakeup events checks in the ACPI EC driver more
straightforward and clean up acpi_ec_submit_event() (Rafael
Wysocki).
- Make it possible to obtain the CPU capacity with the help of CPPC
information (Ionela Voinescu).
- Improve fine grained fan control in the ACPI fan driver and
document it (Srinivas Pandruvada).
- Add device HID and quirk for Microsoft Surface Go 3 to the ACPI
battery driver (Maximilian Luz).
- Make the ACPI driver for Intel SoCs (LPSS) let the SPI driver know
the exact type of the controller (Andy Shevchenko).
- Force native backlight mode on Clevo NL5xRU and NL5xNU (Werner
Sembach).
- Fix return value of __setup handlers in the APEI code (Randy
Dunlap).
- Add Arm Generic Diagnostic Dump and Reset device driver (Ilkka
Koskinen).
- Limit printable size of BERT table data (Darren Hart).
- Fix up HEST and GHES initialization (Shuai Xue).
- Update the ACPI device enumeration documentation and unify the ASL
style in GPIO-related examples (Andy Shevchenko).
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmI4pF0SHHJqd0Byand5
c29ja2kubmV0AAoJEILEb/54YlRxrPMP/A8kkgzJegS4CtUCtUpLcCufaggdpQTd
I9GQJeo73wGdmaelCQuXFJ9NUhuA1KHIU0WYqneWX+wifht+wl+KAZYvswPm0/wt
TiypiyRMf8Il0Q9tTTmWKSokK80O7ks8OZEe1HmiJimdEn+F1XUzLLgbQKFqhbbV
NHkVix3xR/7htgSb0ksaijH3XLyStuwPvc4WFueO14Pp5Bkr2Of33Xdd0UYeTCi4
RUqL3qJ4DT5gvgKipg43y6D2igRq/xMKx1bgnBjtwKChtjK23GGR6UB/jAIitIMv
XpxLw7kceY65zjJmmJ1+OKeM6CNAcIbTeyCyffSAH/MYRObj93XpMjnhxXILzjYB
Pz2U/lJy0kgw0PUkFzTdPkuuJlDn5GLY8F2cytvtlQAIhtFVFFcnHZYfhhLRWpoN
Sta2NHpGRejR/jixkQ4JtsjQ/Og02zQ9N344enaC64h3JYPBSyM8mpLH/YoXnuSx
jDPQK1KE/QVXRixKFjrPSXYq2p7w/CH7yZXX7TOo+ScnLhapiSUpyh7wiFslZ729
v11yzjsgBQk27qf1EGSImsh+YoRck9qOTb9tkVXGxcifTUPYzyXGn4T5i/ZwpN9v
nL6imYuiRJjFNAksbWo72hjYfhNwWAoCIXgUuxroCPLGGT394j5djisHYMjDNAsG
x43D1Fd4vEgT
=uB8P
-----END PGP SIGNATURE-----
Merge tag 'acpi-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI updates from Rafael Wysocki:
"From the new functionality perspective, the most significant items
here are the new driver for the 'ARM Generic Diagnostic Dump and
Reset' device, the extension of fine grain fan control in the ACPI fan
driver, and the change making it possible to use CPPC information to
obtain CPU capacity.
There are also a few new quirks, a bunch of fixes, including the
platform-level _OSC handling change to make it actually take the
platform firmware response into account, some code and documentation
cleanups, and a notable update of the ACPI device enumeration
documentation.
Specifics:
- Use uintptr_t and offsetof() in the ACPICA code to avoid compiler
warnings regarding NULL pointer arithmetic (Rafael Wysocki).
- Fix possible NULL pointer dereference in acpi_ns_walk_namespace()
when passed "acpi=off" in the command line (Rafael Wysocki).
- Fix and clean up acpi_os_read/write_port() (Rafael Wysocki).
- Introduce acpi_bus_for_each_dev() and use it for walking all ACPI
device objects in the Type C code (Rafael Wysocki).
- Fix the _OSC platform capabilities negotioation and prevent CPPC
from being used if the platform firmware indicates that it not
supported via _OSC (Rafael Wysocki).
- Use ida_alloc() instead of ida_simple_get() for ACPI enumeration of
devices (Rafael Wysocki).
- Add AGDI and CEDT to the list of known ACPI table signatures (Ilkka
Koskinen, Robert Kiraly).
- Add power management debug messages related to suspend-to-idle in
two places (Rafael Wysocki).
- Fix __acpi_node_get_property_reference() return value and clean up
that function (Andy Shevchenko, Sakari Ailus).
- Fix return value of the __setup handler in the ACPI PM timer clock
source driver (Randy Dunlap).
- Clean up double words in two comments (Tom Rix).
- Add "skip i2c clients" quirks for Lenovo Yoga Tablet 1050F/L and
Nextbook Ares 8 (Hans de Goede).
- Clean up frequency invariance handling on x86 in the ACPI CPPC
library (Huang Rui).
- Work around broken XSDT on the Advantech DAC-BJ01 board (Mark
Cilissen).
- Make wakeup events checks in the ACPI EC driver more
straightforward and clean up acpi_ec_submit_event() (Rafael
Wysocki).
- Make it possible to obtain the CPU capacity with the help of CPPC
information (Ionela Voinescu).
- Improve fine grained fan control in the ACPI fan driver and
document it (Srinivas Pandruvada).
- Add device HID and quirk for Microsoft Surface Go 3 to the ACPI
battery driver (Maximilian Luz).
- Make the ACPI driver for Intel SoCs (LPSS) let the SPI driver know
the exact type of the controller (Andy Shevchenko).
- Force native backlight mode on Clevo NL5xRU and NL5xNU (Werner
Sembach).
- Fix return value of __setup handlers in the APEI code (Randy
Dunlap).
- Add Arm Generic Diagnostic Dump and Reset device driver (Ilkka
Koskinen).
- Limit printable size of BERT table data (Darren Hart).
- Fix up HEST and GHES initialization (Shuai Xue).
- Update the ACPI device enumeration documentation and unify the ASL
style in GPIO-related examples (Andy Shevchenko)"
* tag 'acpi-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (52 commits)
clocksource: acpi_pm: fix return value of __setup handler
ACPI: bus: Avoid using CPPC if not supported by firmware
Revert "ACPI: Pass the same capabilities to the _OSC regardless of the query flag"
ACPI: video: Force backlight native for Clevo NL5xRU and NL5xNU
arm64, topology: enable use of init_cpu_capacity_cppc()
arch_topology: obtain cpu capacity using information from CPPC
x86, ACPI: rename init_freq_invariance_cppc() to arch_init_invariance_cppc()
ACPI: AGDI: Add driver for Arm Generic Diagnostic Dump and Reset device
ACPI: tables: Add AGDI to the list of known table signatures
ACPI/APEI: Limit printable size of BERT table data
ACPI: docs: gpio-properties: Unify ASL style for GPIO examples
ACPI / x86: Work around broken XSDT on Advantech DAC-BJ01 board
ACPI: APEI: fix return value of __setup handlers
x86/ACPI: CPPC: Move init_freq_invariance_cppc() into x86 CPPC
x86: Expose init_freq_invariance() to topology header
x86/ACPI: CPPC: Move AMD maximum frequency ratio setting function into x86 CPPC
x86/ACPI: CPPC: Rename cppc_msr.c to cppc.c
ACPI / x86: Add skip i2c clients quirk for Lenovo Yoga Tablet 1050F/L
ACPI / x86: Add skip i2c clients quirk for Nextbook Ares 8
ACPICA: Avoid walking the ACPI Namespace if it is not there
...
- Simplify the PASID handling to allocate the PASID once, associate it to
the mm of a process and free it on mm_exit(). The previous attempt of
refcounted PASIDs and dynamic alloc()/free() turned out to be error
prone and too complex. The PASID space is 20bits, so the case of
resource exhaustion is a pure academic concern.
- Populate the PASID MSR on demand via #GP to avoid racy updates via IPIs.
- Reenable ENQCMD and let objtool check for the forbidden usage of ENQCMD
in the kernel.
- Update the documentation for Shared Virtual Addressing accordingly.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmI4WpETHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoUfnD/0bY94rgEX4Uuy/mFQ1W8X8XlcyKrha
0/cRATb+4QV/pwJgGr2nClKhGlFMYPdJLvKMC1TCUPCVrLD1RNmluIZoFzeqXwhm
jDdCcFOuGZ2D4ujDPWwOOpKBT1ytovnQa7+lH6QJyKkEqdcC2ncOvGJQoiRxRQIG
8wTVs/OUvQJ5ZhSZQMKQN4uMWMyHEjhbroYS30/uNi/598jTPgzlEoa14XocQ9Os
nS6ALvjuc9MsJ34F61etMaJU1ZMI3Wx75u9QjEvX6hmJs87YdvgwE7lzJUKFDEuh
gewM0wp2fTa8/azzP0eMiHTin56PqFdmllzRqXmilbZMEPOeI29dZVArCdpKcAn0
r9p1kJUT3Xl2G3Oir/OdCaaQHcznD1Y5ZFOyh12wgEucZ/rdeSr7nq7n5HoOL5Bw
Q2o6YvTkE9DOL0nTN1lSXGiPspou7fzX0uUcRBrbJUS3sBv4zGIlaJXUaTVnSdAt
VZj4LeOK7v2BjyeiOY0iaaIQd3xjmLUF0UjozXS5M13SoVcToZRbyWqhDzPvNuKA
imQb/dnFpXhABgmuqAiJLeqM0VtGMFNc780OURkcsBSPng+iSEdV4DzuhK0jpU8x
Uk1RuGMd/vgmrlDFBrw+orQQiiKR1ixpI0LiHfcOBycfJhqTwcnrNZvAN5/do28Z
E23+QzlUbZF0cw==
=Dy8V
-----END PGP SIGNATURE-----
Merge tag 'x86-pasid-2022-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 PASID support from Thomas Gleixner:
"Reenable ENQCMD/PASID support:
- Simplify the PASID handling to allocate the PASID once, associate
it to the mm of a process and free it on mm_exit().
The previous attempt of refcounted PASIDs and dynamic
alloc()/free() turned out to be error prone and too complex. The
PASID space is 20bits, so the case of resource exhaustion is a pure
academic concern.
- Populate the PASID MSR on demand via #GP to avoid racy updates via
IPIs.
- Reenable ENQCMD and let objtool check for the forbidden usage of
ENQCMD in the kernel.
- Update the documentation for Shared Virtual Addressing accordingly"
* tag 'x86-pasid-2022-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
Documentation/x86: Update documentation for SVA (Shared Virtual Addressing)
tools/objtool: Check for use of the ENQCMD instruction in the kernel
x86/cpufeatures: Re-enable ENQCMD
x86/traps: Demand-populate PASID MSR via #GP
sched: Define and initialize a flag to identify valid PASID in the task
x86/fpu: Clear PASID when copying fpstate
iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
kernel/fork: Initialize mm's PASID
iommu/ioasid: Introduce a helper to check for valid PASIDs
mm: Change CONFIG option for mm->pasid field
iommu/sva: Rename CONFIG_IOMMU_SVA_LIB to CONFIG_IOMMU_SVA
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmI4l6UACgkQEsHwGGHe
VUonRw/7BQUx2A8O6mMtQDGzDVJlKobKb/6t+VhmdrC7YHUbMjgl8+Q9EjZIGkE4
EF3Wb/5gUmnVhWzB2y5veE1mX4wlcXs0HyGL5iSn7X4UYxpPLdCSTcAriyj6I/0F
fId9aTcb9dbiXGnK4F+tBRVphoe2uLCJUAdbNoprOBWUsmIjKaMewl+sLg8m1+f5
mtTY83koAtblSCtkUP/sLx/1RM+LO2uep11W20m44eRXkc8CJ3mOiMoLwCZGyyaM
66y/6i2QqPCWEUE8VUklrEM+gI/vGn1yb6DJo465aauHFabteYaiTa0kAiZ+xyOn
UlybAhL2nJlkCJ+mt2+FNujdvJu2z8MGNTuQPgI0CeYlRllGuvvNfkpP6PLhqE+c
HUfNbmJgB163c3w8QOlppkmCImiup+wtm8r3w4h0brD68sqnRv1AkXp83hC4MBNP
k1/S/3GCLOFjq6LHvZJycq8r1NpbNPKGNq81kjNKobfWZHX3fEclVGJGjiDkJQhC
VA4hCtIUnpagpMHwPHZ9fdHROHWCDJjLaEY5L/qiGnrBJPfwVpbmRv86k8kE+hJu
IgoqRF1DSWMlhg3lNKGnoodTvgrWJM/HZgp/exrY0/N83AMatcWmaAlwWkrGQpGY
HdnSKzXSHIwLFlf7WiVCoUDpRU4zRZzUUFm3mMqgdAJV8mwufsA=
=EF36
-----END PGP SIGNATURE-----
Merge tag 'x86_cleanups_for_v5.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cleanups from Borislav Petkov:
- Remove a misleading message and an unused function
* tag 'x86_cleanups_for_v5.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/nmi: Remove the 'strange power saving mode' hint from unknown NMI handler
x86/pat: Remove the unused set_pages_array_wt() function
vendors instead of proliferating home-grown solutions for technologies
which are pretty similar
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmI4kiQACgkQEsHwGGHe
VUojvA//QD5VxsqPq+RAQWAFiWGHCpFed2szc2Q5eAZj6CEmXcqBOdTqaoHpJpVl
L1uvB6oLq8WTOea0V3xGu1kfiLRuq1fo0mqZeTxe3iZ3kUk/SU0wGfTLDECB58mI
P5A+CZFiAk4XJ/kRqJWNxmd5kIDjhlCx4ysVbPl1vm/qfS6FEGb5HUr317kbOYwK
zw5cEajnYu2KA6bI8nGuy30vmvn97gpy98vCiCzKrcBPggO8WHiJ+kqD72BhP5em
z7mh4aFrAPVbIMqd/Xb5La3zvP7Vii4Tz9mSUsKy/Ige+ghFZQ18LPk2yANvmWeN
hIFDqSsESR2go0tKvSrzPln8h93hKx/TPbiF9jVMISBZFdWCGQvzCrYDHqzHFQJ1
zHw0lxdFQimfhs5YlEumZCqq2Dc7w3OGCVfP22+t7pNhnixPT3Dlie0Ya6z/aXV3
VNcqckDDZLijQlf0iPhbw2fBs9ErTcB3OXHKmX78Zxb4hP4WJx8QK4lMPzFkPd9H
bTEquYQWIPsjdRTlMl50nCpNHtAzo56H01G6ZPPx/5Y7Lt38UXJERfdqBhQjNF6F
ILPMrOn/BHU9snlqSCh7SxhRiRdafThIJHsi5zQrDC4rPvlwi5kinIzGnPyOuDbO
qwwnPOzx855/Zw0swKrQRXaxU7lwGKo529yKZWt7r8WB12tSOao=
=zWVD
-----END PGP SIGNATURE-----
Merge tag 'x86_cc_for_v5.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 confidential computing updates from Borislav Petkov:
- Add shared confidential computing code which will be used by both
vendors instead of proliferating home-grown solutions for
technologies (SEV/SNP and TDX) which are pretty similar
* tag 'x86_cc_for_v5.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm/cpa: Generalize __set_memory_enc_pgtable()
x86/coco: Add API to handle encryption mask
x86/coco: Explicitly declare type of confidential computing platform
x86/cc: Move arch/x86/{kernel/cc_platform.c => coco/core.c}
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmI4VSgACgkQEsHwGGHe
VUr2ew/9HyXJu2Yp39Gd/UVz/pAGpnlkw+d57lGTeQ2MilvSmJ6t5pE+F6XQ6qX9
jRzi5VOpWRSNQ6YzZHiVXCdgL49kNvJQTpnDXDsdcYn7zxynSp1d1G7YfJpI0eC2
9dho37eLnoXDY0RsATYiZsN0v1tD92WktYVgUPMZBUj6jysNZK8FeBOkyEhJLaUu
LiFwET7A+Odg3ioek34v4mbGJ0pTobHOxF125b+Rq/wHKu3LjKGvj/tiYdTh32oo
R0HgSszEQBfPn5lfjw1YodDTS3N4BvhWQZwlOdMr8GwlgynVnBC381PSObRVpPT1
nKQizDnyE0YKHPn0Mq58vN92D+ulmfS58vDCotRYHSUwwuUXDt2klr/y62e4Vh90
cWlWxzojrj8Dta44ofw87oESJfu5jJ7zQmeMKSjuud3CLsaqkPKcwXsiBwgV5glu
MA9QBfnbRjoXr9CkDcii17K3S5udY3pfkNSE9P6J+GwBgfOLVeE91WLaGDc3wChf
tC9bLlV5F71gCpoZYSBdCKsVaV+8ZHeHjH9UTmkGH9zxKEkP0HEdxBSkL4uLhrCt
NcU/Xt57O/SrzVu23XPufbI52PanGYHs7kNl1NNLthaIXgg43Mtx+c5rI0uWOFK7
4vFbJEtCvqOBEnVHpPjiUHH30ihSnx6D8N13trOdu67NtUXrxPc=
=PVnz
-----END PGP SIGNATURE-----
Merge tag 'x86_sev_for_v5.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 SEV fix from Borislav Petkov:
- Add a missing function section annotation
* tag 'x86_sev_for_v5.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/head64: Add missing __head annotation to sme_postprocess_startup()
Add the PPIN number to sysfs so that sockets can be identified when
replacement is needed
- Minor fixes and cleanups
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmI4UBcACgkQEsHwGGHe
VUoDQw/9E2WVsLS+iVngYI5hY+LQbbeOLPt9sqgdf8U/4tQPwJfAKYRALQ3FvjbJ
XKEqcHoCIBH0XQSC0TPpBF96VABr5Vrc/knV8x2OWJ82p54beB0rXh5mdsnsrVMQ
7Gi00iEgZ1kw7TwRN7rMzUpudNOr7C/SSQL535hyZA4NT9QkNBObZWHMnojVnEmr
AW1TY54xXSpu7xDY5ari5NGeSgvu1PIsCy0EKMK/SFLEpKQW4lt+lUe1aSieMarj
xgnfIyGW3SUbadwlvLbIqVpR0RQBDLTabx8nyXnJAVZlwuAfioRUGL+Z4GFA0Y7q
uDofxuScBAea3sPPFAAIoh13y595TjowBX7pHA1sqjWLmFKt6Qqz5dq1uBVEvIYw
uTAQ/igJ4N2jq03jwnAw1LUAES5azSseCsiQxR7oqzK9KaRlptxHTAqhjqsgpIp4
VLdYtgkzOEFiOsWsHWP1Dd+vzpMvTh5gtTXZuVcldo2D6scdcj+oaloHQ5XMiFu1
GKuyiY4EbkRcp9ZQ847xOn4knEg+aq9zL0tJoWWEMKfRQn6425TEOLqkIdc9QfeU
t63yqJ1q3NTjjzxzy/FdKwdoyOOQxeDl5YGPX3gZnj9X/0wgs+dHRmKp0o74SIg9
4h2kB69wRwn6rC09P2UkQVGpDL0mnif4ZAh61vRE+mS0zSNCkEA=
=MWVZ
-----END PGP SIGNATURE-----
Merge tag 'x86_cpu_for_v5.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cpu feature updates from Borislav Petkov:
- Merge the AMD and Intel PPIN code into a shared one by both vendors.
Add the PPIN number to sysfs so that sockets can be identified when
replacement is needed
- Minor fixes and cleanups
* tag 'x86_cpu_for_v5.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/cpu: Clear SME feature flag when not in use
x86/cpufeatures: Put the AMX macros in the word 18 block
topology/sysfs: Add PPIN in sysfs under cpu topology
topology/sysfs: Add format parameter to macro defining "show" functions for proc
x86/cpu: Read/save PPIN MSR during initialization
x86/cpu: X86_FEATURE_INTEL_PPIN finally has a CPUID bit
x86/cpu: Merge Intel and AMD ppin_init() functions
x86/CPU/AMD: Use default_groups in kobj_type
Merge changes related to system sleep, PM domains changes and power
management documentation changes for 5.18-rc1:
- Fix load_image_and_restore() error path (Ye Bin).
- Fix typos in comments in the system wakeup hadling code (Tom Rix).
- Clean up non-kernel-doc comments in hibernation code (Jiapeng
Chong).
- Fix __setup handler error handling in system-wide suspend and
hibernation core code (Randy Dunlap).
- Add device name to suspend_report_result() (Youngjin Jang).
- Make virtual guests honour ACPI S4 hardware signature by
default (David Woodhouse).
- Block power off of a parent PM domain unless child is in deepest
state (Ulf Hansson).
- Use dev_err_probe() to simplify error handling for generic PM
domains (Ahmad Fatoum).
- Fix sleep-in-atomic bug caused by genpd_debug_remove() (Shawn Guo).
- Document Intel uncore frequency scaling (Srinivas Pandruvada).
* pm-sleep:
PM: hibernate: Honour ACPI hardware signature by default for virtual guests
PM: sleep: Add device name to suspend_report_result()
PM: suspend: fix return value of __setup handler
PM: hibernate: fix __setup handler error handling
PM: hibernate: Clean up non-kernel-doc comments
PM: sleep: wakeup: Fix typos in comments
PM: hibernate: fix load_image_and_restore() error path
* pm-domains:
PM: domains: Fix sleep-in-atomic bug caused by genpd_debug_remove()
PM: domains: use dev_err_probe() to simplify error handling
PM: domains: Prevent power off for parent unless child is in deepest state
* pm-docs:
Documentation: admin-guide: pm: Document uncore frequency scaling
Merge ACPI power management changes, ACPI device properties handling
changes, x86-specific ACPI changes and miscellaneous ACPI changes for
5.18-rc1:
- Add power management debug messages related to suspend-to-idle in
two places (Rafael Wysocki).
- Fix __acpi_node_get_property_reference() return value and clean up
that function (Andy Shevchenko, Sakari Ailus).
- Fix return value of the __setup handler in the ACPI PM timer clock
source driver (Randy Dunlap).
- Clean up double words in two comments (Tom Rix).
- Add "skip i2c clients" quirks for Lenovo Yoga Tablet 1050F/L and
Nextbook Ares 8 (Hans de Goede).
- Clean up frequency invariance handling on x86 in the ACPI CPPC
library (Huang Rui).
- Work around broken XSDT on the Advantech DAC-BJ01 board (Mark
Cilissen).
* acpi-pm:
ACPI: EC / PM: Print additional debug message in acpi_ec_dispatch_gpe()
ACPI: PM: Print additional debug message in acpi_s2idle_wake()
* acpi-properties:
ACPI: property: Get rid of redundant 'else'
ACPI: properties: Consistently return -ENOENT if there are no more references
* acpi-misc:
clocksource: acpi_pm: fix return value of __setup handler
ACPI: clean up double words in two comments
* acpi-x86:
ACPI / x86: Work around broken XSDT on Advantech DAC-BJ01 board
x86/ACPI: CPPC: Move init_freq_invariance_cppc() into x86 CPPC
x86: Expose init_freq_invariance() to topology header
x86/ACPI: CPPC: Move AMD maximum frequency ratio setting function into x86 CPPC
x86/ACPI: CPPC: Rename cppc_msr.c to cppc.c
ACPI / x86: Add skip i2c clients quirk for Lenovo Yoga Tablet 1050F/L
ACPI / x86: Add skip i2c clients quirk for Nextbook Ares 8
The ACPI specification says that OSPM should refuse to restore from
hibernate if the hardware signature changes, and should boot from
scratch. However, real BIOSes often vary the hardware signature in cases
where we *do* want to resume from hibernate, so Linux doesn't follow the
spec by default.
However, in a virtual environment there's no reason for the VMM to vary
the hardware signature *unless* it wants to trigger a clean reboot as
defined by the ACPI spec. So enable the check by default if a hypervisor
is detected.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The
Do you have a strange power saving mode enabled?
hint when unknown NMI happens dates back to i386 stone age, and isn't
currently really helpful.
Unknown NMIs are coming for many different reasons (broken firmware,
faulty hardware, ...) and rarely have anything to do with 'strange power
saving mode' (whatever that even is).
Just remove it as it's largerly misleading.
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/nycvar.YFH.7.76.2203140924120.24795@cbobk.fhfr.pm
-----BEGIN PGP SIGNATURE-----
iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmIuUskeHHRvcnZhbGRz
QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGCFkH/2n3mpGXuITp0ZXE
TNrpbdZOof5SgLw+w7THswXuo6m5yRGNKQs9fvIvDD8Vf7/OdQQfPOmF1cIE5+nk
wcz6aHKbdrok8Jql2qjJqWXZ5xbGj6qywg3zZrwOUsCKFP5p+AjBJcmZOsvQHjSp
ASODy1moOlK+nO52TrMaJw74a8xQPmQiNa+T2P+FedEYjlcRH/c7hLJ7GEnL6+cC
/R4bATZq3tiInbTBlkC0hR0iVNgRXwXNyv9PEXrYYYHnekh8G1mgSNf06iejLcsG
aAYsW9NyPxu8zPhhHNx79K9o8BMtxGD4YQpsfdfIEnf9Q3euqAKe2evRWqHHlDms
RuSCtsc=
=M9Nc
-----END PGP SIGNATURE-----
Merge tag 'v5.17-rc8' into usb-next
We need the Xen USB fixes as other patches depend on those changes.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The generic earlyprintk= parsing already parses the optional ",keep",
no need to duplicate that in the xdbc driver.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220304152135.975568860@infradead.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently loops_per_jiffy is set in tsc_early_init(), but then don't
switch to delay_tsc, with the result that delay_loop is used with
loops_per_jiffy set for delay_tsc.
Then in (late) tsc_init() lpj_fine is set (which is mostly unused) and
after which use_tsc_delay() is finally called.
Move both loops_per_jiffy and use_tsc_delay() into
tsc_enable_sched_clock() which is called the moment tsc_khz is
determined, be it early or late. Keeping the lot consistent.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220304152135.914397165@infradead.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 0bf6276392 ("x32: Warn and disable rather than error if
binutils too old") added a small test in arch/x86/Makefile because
binutils 2.22 or newer is needed to properly support elf32-x86-64. This
check is no longer necessary, as the minimum supported version of
binutils is 2.23, which is enforced at configuration time with
scripts/min-tool-version.sh.
Remove this check and replace all uses of CONFIG_X86_X32 with
CONFIG_X86_X32_ABI, as two symbols are no longer necessary.
[nathan: Rebase, fix up a few places where CONFIG_X86_X32 was still
used, and simplify commit message to satisfy -tip requirements]
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220314194842.3452-2-nathan@kernel.org
Objtool's --ibt option generates .ibt_endbr_seal which lists
superfluous ENDBR instructions. That is those instructions for which
the function is never indirectly called.
Overwrite these ENDBR instructions with a NOP4 such that these
function can never be indirect called, reducing the number of viable
ENDBR targets in the kernel.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20220308154319.822545231@infradead.org
Find all ENDBR instructions which are never referenced and stick them
in a section such that the kernel can poison them, sealing the
functions from ever being an indirect call target.
This removes about 1-in-4 ENDBR instructions.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20220308154319.763643193@infradead.org
Annotate away some of the generic code references. This is things
where we take the address of a symbol for exception handling or return
addresses (eg. context switch).
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20220308154318.877758523@infradead.org
Similar to ibt_selftest_ip, apply the same pattern.
Suggested-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20220308154318.700456643@infradead.org
The bits required to make the hardware go.. Of note is that, provided
the syscall entry points are covered with ENDBR, #CP doesn't need to
be an IST because we'll never hit the syscall gap.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20220308154318.582331711@infradead.org
In order to allow kprobes to skip the ENDBR instructions at sym+0 for
X86_KERNEL_IBT builds, change _kprobe_addr() to take an architecture
callback to inspect the function at hand and modify the offset if
needed.
This streamlines the existing interface to cover more cases and
require less hooks. Once PowerPC gets fully converted there will only
be the one arch hook.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20220308154318.405947704@infradead.org
Return trampoline must not use indirect branch to return; while this
preserves the RSB, it is fundamentally incompatible with IBT. Instead
use a retpoline like ROP gadget that defeats IBT while not unbalancing
the RSB.
And since ftrace_stub is no longer a plain RET, don't use it to copy
from. Since RET is a trivial instruction, poke it directly.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20220308154318.347296408@infradead.org
Currently a lot of ftrace code assumes __fentry__ is at sym+0. However
with Intel IBT enabled the first instruction of a function will most
likely be ENDBR.
Change ftrace_location() to not only return the __fentry__ location
when called for the __fentry__ location, but also when called for the
sym+0 location.
Then audit/update all callsites of this function to consistently use
these new semantics.
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20220308154318.227581603@infradead.org
Kernel entry points should be having ENDBR on for IBT configs.
The SYSCALL entry points are found through taking their respective
address in order to program them in the MSRs, while the exception
entry points are found through UNWIND_HINT_IRET_REGS.
The rule is that any UNWIND_HINT_IRET_REGS at sym+0 should have an
ENDBR, see the later objtool ibt validation patch.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20220308154317.933157479@infradead.org
Even though Xen currently doesn't advertise IBT, prepare for when it
will eventually do so and sprinkle the ENDBR dust accordingly.
Even though most of the entry points are IRET like, the CPL0
Hypervisor can set WAIT-FOR-ENDBR and demand ENDBR at these sites.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20220308154317.873919996@infradead.org
By doing an early rewrite of 'jmp native_iret` in
restore_regs_and_return_to_kernel() we can get rid of the last
INTERRUPT_RETURN user and paravirt_iret.
Suggested-by: Andrew Cooper <Andrew.Cooper3@citrix.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20220308154317.815039833@infradead.org
swapped back into EPC memory
- Prevent do_int3() from being kprobed, to avoid recursion
- Remap setup_data and setup_indirect structures properly when accessing
their members
- Correct the alternatives patching order for modules too
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmItzJgACgkQEsHwGGHe
VUqaow/8C115xuZEBn+iT+adcQxbqrg3S2en/Hq0aJOEhkNkbOhgAW0OWHvj7Gs3
+2taD35MqzneEOfa0Gv46600V4+SV5K5NAFndr4PA2FVgIw01rEQios2oc4QSQBP
PVJgvGyIMpN71ODKTiZ8w4ihp3J7MWDkCP1z4hbO/lfM4tOXcYzh2Lv1fE8hHr5b
qFtPDyYgfEKUVFa+sv2sE1cJw670UFDcFqGAIjxUUm0r78GKmPz08gZm9YiBTJgV
jrxySdpAh/eaPeHNfFH9RzAD2ZGZppgIkPCp33ZdrMEhnZmwLz7vc76BMbkD2P6w
1fBmBZ5F8yOMaaLHSGh4Ek5Gs3p9DjmaZdEWwz+yiIe1RFLKyOQu6gsmGbAyuQx4
KSfPFnfkOfw/7cz6BSp3Sh6zgrGPqloIVcHkWRth/LJZSV/fVgM8bPg3VLJP6WFi
o4WTcNAq/fNMAmGwtIVpTUW/QJafXvOauKkDGQkMQ87U68QSh6uDrvrvMHPF8W+Y
SPcYrdsAPagLxq0GCCQ6doSvBjWNTolXfTnfAoATZpae0URmrvu9ddgUbIlgeQWY
n/rK+cKk+iuLTEZC55+v5OALwEMOM3Tuz4Ghko8re0pkD/kE61m3Az6w5sKN3Inc
c21tvO/dxHhAnHV+34d2LM27PU4qoFdVO2mPup702x68XT+X0/g=
=YLph
-----END PGP SIGNATURE-----
Merge tag 'x86_urgent_for_v5.17_rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
- Free shmem backing storage for SGX enclave pages when those are
swapped back into EPC memory
- Prevent do_int3() from being kprobed, to avoid recursion
- Remap setup_data and setup_indirect structures properly when
accessing their members
- Correct the alternatives patching order for modules too
* tag 'x86_urgent_for_v5.17_rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/sgx: Free backing memory after faulting the enclave page
x86/traps: Mark do_int3() NOKPROBE_SYMBOL
x86/boot: Add setup_indirect support in early_memremap_is_setup_data()
x86/boot: Fix memremap of setup_indirect structures
x86/module: Fix the paravirt vs alternative order
There is a limited amount of SGX memory (EPC) on each system. When that
memory is used up, SGX has its own swapping mechanism which is similar
in concept but totally separate from the core mm/* code. Instead of
swapping to disk, SGX swaps from EPC to normal RAM. That normal RAM
comes from a shared memory pseudo-file and can itself be swapped by the
core mm code. There is a hierarchy like this:
EPC <-> shmem <-> disk
After data is swapped back in from shmem to EPC, the shmem backing
storage needs to be freed. Currently, the backing shmem is not freed.
This effectively wastes the shmem while the enclave is running. The
memory is recovered when the enclave is destroyed and the backing
storage freed.
Sort this out by freeing memory with shmem_truncate_range(), as soon as
a page is faulted back to the EPC. In addition, free the memory for
PCMD pages as soon as all PCMD's in a page have been marked as unused
by zeroing its contents.
Cc: stable@vger.kernel.org
Fixes: 1728ab54b4 ("x86/sgx: Add a page reclaimer")
Reported-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lkml.kernel.org/r/20220303223859.273187-1-jarkko@kernel.org
Since kprobe_int3_handler() is called in do_int3(), probing do_int3()
can cause a breakpoint recursion and crash the kernel. Therefore,
do_int3() should be marked as NOKPROBE_SYMBOL.
Fixes: 21e28290b3 ("x86/traps: Split int3 handler up")
Signed-off-by: Li Huafei <lihuafei1@huawei.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20220310120915.63349-1-lihuafei1@huawei.com
Now that all of the definitions have moved out of tracehook.h into
ptrace.h, sched/signal.h, resume_user_mode.h there is nothing left in
tracehook.h so remove it.
Update the few files that were depending upon tracehook.h to bring in
definitions to use the headers they need directly.
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lkml.kernel.org/r/20220309162454.123006-13-ebiederm@xmission.com
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Always handle TIF_NOTIFY_SIGNAL in get_signal. With commit 35d0b389f3
("task_work: unconditionally run task_work from get_signal()") always
calling task_work_run all of the work of tracehook_notify_signal is
already happening except clearing TIF_NOTIFY_SIGNAL.
Factor clear_notify_signal out of tracehook_notify_signal and use it in
get_signal so that get_signal only needs one call of task_work_run.
To keep the semantics in sync update xfer_to_guest_mode_work (which
does not call get_signal) to call tracehook_notify_signal if either
_TIF_SIGPENDING or _TIF_NOTIFY_SIGNAL.
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lkml.kernel.org/r/20220309162454.123006-8-ebiederm@xmission.com
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
As documented, the setup_indirect structure is nested inside
the setup_data structures in the setup_data list. The code currently
accesses the fields inside the setup_indirect structure but only
the sizeof(struct setup_data) is being memremapped. No crash
occurred but this is just due to how the area is remapped under the
covers.
Properly memremap both the setup_data and setup_indirect structures
in these cases before accessing them.
Fixes: b3c72fc9a7 ("x86/boot: Introduce setup_indirect")
Signed-off-by: Ross Philipson <ross.philipson@oracle.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/1645668456-22036-2-git-send-email-ross.philipson@oracle.com
Hyper-V provides host version number information that is output in
text form by a Linux guest when it boots. For whatever reason, the
formatting has historically been non-standard. Change it to output
in normal Windows version format for better readability.
Similar code for ARM64 guests already outputs in normal Windows
version format.
Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/1646767364-2234-1-git-send-email-mikelley@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
On this board the ACPI RSDP structure points to both a RSDT and an XSDT,
but the XSDT points to a truncated FADT. This causes all sorts of trouble
and usually a complete failure to boot after the following error occurs:
ACPI Error: Unsupported address space: 0x20 (*/hwregs-*)
ACPI Error: AE_SUPPORT, Unable to initialize fixed events (*/evevent-*)
ACPI: Unable to start ACPI Interpreter
This leaves the ACPI implementation in such a broken state that subsequent
kernel subsystem initialisations go wrong, resulting in among others
mismapped PCI memory, SATA and USB enumeration failures, and freezes.
As this is an older embedded platform that will likely never see any BIOS
updates to address this issue and its default shipping OS only complies to
ACPI 1.0, work around this by forcing `acpi=rsdt`. This patch, applied on
top of Linux 5.10.102, was confirmed on real hardware to fix the issue.
Signed-off-by: Mark Cilissen <mark@yotsuba.nl>
Cc: All applicable <stable@vger.kernel.org>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The init_freq_invariance_cppc code actually doesn't need the SMP
functionality. So setting the CONFIG_SMP as the check condition for
init_freq_invariance_cppc may cause the confusion to misunderstand the
CPPC. And the x86 CPPC file is better space to store the CPPC related
functions, while the init_freq_invariance_cppc is out of smpboot, that
means, the CONFIG_SMP won't be mandatory condition any more. And It's more
clear than before.
Signed-off-by: Huang Rui <ray.huang@amd.com>
[ rjw: Subject adjustment ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The function init_freq_invariance will be used on x86 CPPC, so expose it in
the topology header.
Signed-off-by: Huang Rui <ray.huang@amd.com>
[ rjw: Subject adjustment ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The AMD maximum frequency ratio setting function depends on CPPC, so the
x86 CPPC implementation file is better space for this function.
Signed-off-by: Huang Rui <ray.huang@amd.com>
[ rjw: Subject adjustment ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Rename the cppc_msr.c to cppc.c in x86 ACPI, that expects to use this file
to cover more function implementation for ACPI CPPC beside MSR helpers.
Naming as "cppc" is more straightforward as one of the functionalities
under ACPI subsystem.
Signed-off-by: Huang Rui <ray.huang@amd.com>
[ rjw: Subject ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Ever since commit
4e6292114c ("x86/paravirt: Add new features for paravirt patching")
there is an ordering dependency between patching paravirt ops and
patching alternatives, the module loader still violates this.
Fixes: 4e6292114c ("x86/paravirt: Add new features for paravirt patching")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20220303112825.068773913@infradead.org
which support eIBRS, i.e., the hardware-assisted speculation restriction
after it has been shown that such machines are vulnerable even with the
hardware mitigation.
- Do not use the default LFENCE-based Spectre v2 mitigation on AMD as it
is insufficient to mitigate such attacks. Instead, switch to retpolines
on all AMD by default.
- Update the docs and add some warnings for the obviously vulnerable
cmdline configurations.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmIkktUACgkQEsHwGGHe
VUo7ZQ/+O4hzL/tHY0V/ekkDxCrJ3q3Hp+DcxUl2ee5PC3Qgxv1Z1waH6ppK8jQs
marAGr7FYbvzY039ON7irxhpSIckBCpx9tM2F43zsPxxY8EdxGojkHbmaqso5HtW
l3/O28AcZYoKN/fF8rRAIJy4hrTVascKrNJ2fOiYWYBT62ZIoPm0FusgXbKTZPD+
gT7iUMoyPjBnKdWDT9L6kKOxDF9TivX1Y6JdDHbnnBsgRkeFatkeq9BJ93M73q63
Ziq9c8ZcEXyKez+cGFCfXM7+pNYmfsiL48lilTyf+v+GXahDJQOkFw39j5zXEALm
Nk6yB3PRQ74pEwm5WbK7KO8iwPpblmnDB978mfUcpk+9xWJD8pyoUcItAmCBsXh1
LjIImYPqL6YihUb9udh+PEDISsfzWNzr4T+kgW9/yXXG4ZmGy3TLInhTK+rNAxJa
EshWZExEZj6yJvt83Vu08W9fppYJq976tJvl8LWOYthaxqY7IQz0q7mYd799yxk0
MLPqvZP1+4pHzqn2c9yeHgrwHwMmoqcyMx6B3EA5maYQPdlT7Fk9RCBeCdIA/ieF
OgGxy1WwMH+cvUa5MaBy3Y32LeYU3bUJh0yPFq/7BxEYGG9PJtLhg2xTo1Ui8F1d
fKrcSFcjZKVJ9UE5HaqOcp4ka+Q220I9IDGURXkAFQlnOU7X7CE=
=Athd
-----END PGP SIGNATURE-----
Merge tag 'x86_bugs_for_v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 spectre fixes from Borislav Petkov:
- Mitigate Spectre v2-type Branch History Buffer attacks on machines
which support eIBRS, i.e., the hardware-assisted speculation
restriction after it has been shown that such machines are vulnerable
even with the hardware mitigation.
- Do not use the default LFENCE-based Spectre v2 mitigation on AMD as
it is insufficient to mitigate such attacks. Instead, switch to
retpolines on all AMD by default.
- Update the docs and add some warnings for the obviously vulnerable
cmdline configurations.
* tag 'x86_bugs_for_v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/speculation: Warn about eIBRS + LFENCE + Unprivileged eBPF + SMT
x86/speculation: Warn about Spectre v2 LFENCE mitigation
x86/speculation: Update link to AMD speculation whitepaper
x86/speculation: Use generic retpoline by default on AMD
x86/speculation: Include unprivileged eBPF status in Spectre v2 mitigation reporting
Documentation/hw-vuln: Update spectre doc
x86/speculation: Add eIBRS + Retpoline options
x86/speculation: Rename RETPOLINE_AMD to RETPOLINE_LFENCE
* Tweaks to the paravirtualization code, to avoid using them
when they're pointless or harmful
x86 host:
* Fix for SRCU lockdep splat
* Brown paper bag fix for the propagation of errno
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmIkkdsUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroP15Qf7B8BXNMlNkret5WN/4pGf06gNdIY6
ZqC8t/Lx1+fCkzGk+VtAw0bxRscOF4z1XzvfywO5ZI5bxQB/b2xTyBkVY90SqhsB
shug5QpikejpmvVZJXxwD3+loCUah2T6FUT6QJa0sKVhW+XiqOva8fAmYLG5agaa
VGvqFXTXiVmbiw/O9ZI/CfUC0WNrn+I1iDO+oGWyhv/22tePxGCizVczRFJn6DAD
Vh5P6AfOqXjmzdpUeOiU544FQZPHAZehb7/xYc0T9GSW4fPnTmHwRzwhUqgJnx7d
3E+eWGwny+Q/OrpKf7SbxtB65yn7lHRmdN/YtCHygl4sjs6CdjSPY8/9jQ==
=PPz1
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"x86 guest:
- Tweaks to the paravirtualization code, to avoid using them when
they're pointless or harmful
x86 host:
- Fix for SRCU lockdep splat
- Brown paper bag fix for the propagation of errno"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: pull kvm->srcu read-side to kvm_arch_vcpu_ioctl_run
KVM: x86/mmu: Passing up the error state of mmu_alloc_shadow_roots()
KVM: x86: Yield to IPI target vCPU only if it is busy
x86/kvmclock: Fix Hyper-V Isolated VM's boot issue when vCPUs > 64
x86/kvm: Don't waste memory if kvmclock is disabled
x86/kvm: Don't use PV TLB/yield when mwait is advertised
The commit
44a3918c82 ("x86/speculation: Include unprivileged eBPF status in Spectre v2 mitigation reporting")
added a warning for the "eIBRS + unprivileged eBPF" combination, which
has been shown to be vulnerable against Spectre v2 BHB-based attacks.
However, there's no warning about the "eIBRS + LFENCE retpoline +
unprivileged eBPF" combo. The LFENCE adds more protection by shortening
the speculation window after a mispredicted branch. That makes an attack
significantly more difficult, even with unprivileged eBPF. So at least
for now the logic doesn't warn about that combination.
But if you then add SMT into the mix, the SMT attack angle weakens the
effectiveness of the LFENCE considerably.
So extend the "eIBRS + unprivileged eBPF" warning to also include the
"eIBRS + LFENCE + unprivileged eBPF + SMT" case.
[ bp: Massage commit message. ]
Suggested-by: Alyssa Milburn <alyssa.milburn@linux.intel.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
With:
f8a66d608a ("x86,bugs: Unconditionally allow spectre_v2=retpoline,amd")
it became possible to enable the LFENCE "retpoline" on Intel. However,
Intel doesn't recommend it, as it has some weaknesses compared to
retpoline.
Now AMD doesn't recommend it either.
It can still be left available as a cmdline option. It's faster than
retpoline but is weaker in certain scenarios -- particularly SMT, but
even non-SMT may be vulnerable in some cases.
So just unconditionally warn if the user requests it on the cmdline.
[ bp: Massage commit message. ]
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
AMD retpoline may be susceptible to speculation. The speculation
execution window for an incorrect indirect branch prediction using
LFENCE/JMP sequence may potentially be large enough to allow
exploitation using Spectre V2.
By default, don't use retpoline,lfence on AMD. Instead, use the
generic retpoline.
Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
-----BEGIN PGP SIGNATURE-----
iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmIb/PEeHHRvcnZhbGRz
QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGugMH/R8icwG5gOjAuxBr
fuz9032ECVFS36Cy3ps9Eqgf5EdS4G6yz3joM4aNtp4B7e5FzI9lzBaHS8OAguNL
y7puFtBr9CywsnniJumZzciB9pEHmF/yyEKfMlRZA3JsRfLDacFstETp+duJnXoA
+s49IWsy1ot5zoherhDXcFLqDoFhLVU4hYwE1xpLpW/cllqmSnW8SDZMtWW1Ui/B
Of7zqINR4zBMmRjP4ymGq/ZrPWlFyWLdtOo0xxVoAQkeMgm33kfaaeGzfyK25FqR
JDGUZ7lkKvwz3PYh2hqJ7dc5K+vhJ18I+F1UmOiL6QAAUF/k8jq7i3Qaf6MKqh2Z
2v+A5k0=
=Hiar
-----END PGP SIGNATURE-----
Backmerge tag 'v5.17-rc6' into drm-next
This backmerges v5.17-rc6 so I can merge some amdgpu and some tegra changes on top.
Signed-off-by: Dave Airlie <airlied@redhat.com>
When sending a call-function IPI-many to vCPUs, yield to the
IPI target vCPU which is marked as preempted.
but when emulating HLT, an idling vCPU will be voluntarily
scheduled out and mark as preempted from the guest kernel
perspective. yielding to idle vCPU is pointless and increase
unnecessary vmexit, maybe miss the true preempted vCPU
so yield to IPI target vCPU only if vCPU is busy and preempted
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Message-Id: <1644380201-29423-1-git-send-email-lirongqing@baidu.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When Linux runs as an Isolated VM on Hyper-V, it supports AMD SEV-SNP
but it's partially enlightened, i.e. cc_platform_has(
CC_ATTR_GUEST_MEM_ENCRYPT) is true but sev_active() is false.
Commit 4d96f91091 per se is good, but with it now
kvm_setup_vsyscall_timeinfo() -> kvmclock_init_mem() calls
set_memory_decrypted(), and later gets stuck when trying to zere out
the pages pointed by 'hvclock_mem', if Linux runs as an Isolated VM on
Hyper-V. The cause is that here now the Linux VM should no longer access
the original guest physical addrss (GPA); instead the VM should do
memremap() and access the original GPA + ms_hyperv.shared_gpa_boundary:
see the example code in drivers/hv/connection.c: vmbus_connect() or
drivers/hv/ring_buffer.c: hv_ringbuffer_init(). If the VM tries to
access the original GPA, it keepts getting injected a fault by Hyper-V
and gets stuck there.
Here the issue happens only when the VM has >=65 vCPUs, because the
global static array hv_clock_boot[] can hold 64 "struct
pvclock_vsyscall_time_info" (the sizeof of the struct is 64 bytes), so
kvmclock_init_mem() only allocates memory in the case of vCPUs > 64.
Since the 'hvclock_mem' pages are only useful when the kvm clock is
supported by the underlying hypervisor, fix the issue by returning
early when Linux VM runs on Hyper-V, which doesn't support kvm clock.
Fixes: 4d96f91091 ("x86/sev: Replace occurrences of sev_active() with cc_platform_has()")
Tested-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Message-Id: <20220225084600.17817-1-decui@microsoft.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Even if "no-kvmclock" is passed in cmdline parameter, the guest kernel
still allocates hvclock_mem which is scaled by the number of vCPUs,
let's check kvmclock enable in advance to avoid this memory waste.
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Message-Id: <1645520523-30814-1-git-send-email-wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
MWAIT is advertised in host is not overcommitted scenario, however, PV
TLB/sched yield should be enabled in host overcommitted scenario. Let's
add the MWAIT checking when enabling PV TLB/sched yield.
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Message-Id: <1645777780-2581-1-git-send-email-wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The __range_not_ok() helper is an x86 (and sparc64) specific interface
that does roughly the same thing as __access_ok(), but with different
calling conventions.
Change this to use the normal interface in order for consistency as we
clean up all access_ok() implementations.
This changes the limit from TASK_SIZE to TASK_SIZE_MAX, which Al points
out is the right thing do do here anyway.
The callers have to use __access_ok() instead of the normal access_ok()
though, because on x86 that contains a WARN_ON_IN_IRQ() check that cannot
be used inside of NMI context while tracing.
The check in copy_code() is not needed any more, because this one is
already done by copy_from_user_nmi().
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Suggested-by: Christoph Hellwig <hch@infradead.org>
Link: https://lore.kernel.org/lkml/YgsUKcXGR7r4nINj@zeniv-ca.linux.org.uk/
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
* Expose KVM_CAP_ENABLE_CAP since it is supported
* Disable KVM_HC_CLOCK_PAIRING in TSC catchup mode
* Ensure async page fault token is nonzero
* Fix lockdep false negative
* Fix FPU migration regression from the AMX changes
x86 guest:
* Don't use PV TLB/IPI/yield on uniprocessor guests
PPC:
* reserve capability id (topic branch for ppc/kvm)
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmIXyQAUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroPKJQf/T9NeXOFIPIIlH4ZKM7155qlwX8dx
NR2YV+RNYd27MDkaEm9w4ucXacGpPuBPPx9v7UiLlAqAN+NP7nF3rQKC0SpQMC6H
EKFtm+8al8EzyDYP36fqnwDne/xWHlOeGXRRJMKPGhXBSoXoY5cK35IXmNZjfteQ
hK7siBs2saJ2VFqMCbJ9Pqdu1NDO6OEt8HWz2Dnx6EUd90O0pHWZy5JvWOYfyLjL
Y2pP0dZQxuB/PmqkpVj2gV9jK2Zhj33eerzDV4tVXPV7le8fgGeTaJ8ft+SUIizS
YCcPR89+u5c9yzlwY2i7mvloayKnuqkECiGtRG6VHNlrPZTPijems8tH1w==
=lWjy
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"x86 host:
- Expose KVM_CAP_ENABLE_CAP since it is supported
- Disable KVM_HC_CLOCK_PAIRING in TSC catchup mode
- Ensure async page fault token is nonzero
- Fix lockdep false negative
- Fix FPU migration regression from the AMX changes
x86 guest:
- Don't use PV TLB/IPI/yield on uniprocessor guests
PPC:
- reserve capability id (topic branch for ppc/kvm)"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: nSVM: disallow userspace setting of MSR_AMD64_TSC_RATIO to non default value when tsc scaling disabled
KVM: x86/mmu: make apf token non-zero to fix bug
KVM: PPC: reserve capability 210 for KVM_CAP_PPC_AIL_MODE_3
x86/kvm: Don't use pv tlb/ipi/sched_yield if on 1 vCPU
x86/kvm: Fix compilation warning in non-x86_64 builds
x86/kvm/fpu: Remove kvm_vcpu_arch.guest_supported_xcr0
x86/kvm/fpu: Limit guest user_xfeatures to supported bits of XCR0
kvm: x86: Disable KVM_HC_CLOCK_PAIRING if tsc is in always catchup mode
KVM: Fix lockdep false negative during host resume
KVM: x86: Add KVM_CAP_ENABLE_CAP to x86
The kernel provides infrastructure to set or clear the encryption mask
from the pages for AMD SEV, but TDX requires few tweaks.
- TDX and SEV have different requirements to the cache and TLB
flushing.
- TDX has own routine to notify VMM about page encryption status change.
Modify __set_memory_enc_pgtable() and make it flexible enough to cover
both AMD SEV and Intel TDX. The AMD-specific behavior is isolated in the
callbacks under x86_platform.guest. TDX will provide own version of said
callbacks.
[ bp: Beat into submission. ]
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Link: https://lore.kernel.org/r/20220223043528.2093214-1-brijesh.singh@amd.com
The kernel derives the confidential computing platform
type it is running as from sme_me_mask on AMD or by using
hv_is_isolation_supported() on HyperV isolation VMs. This detection
process will be more complicated as more platforms get added.
Declare a confidential computing vendor variable explicitly and set it
via cc_set_vendor() on the respective platform.
[ bp: Massage commit message, fixup HyperV check. ]
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/r/20220222185740.26228-4-kirill.shutemov@linux.intel.com
Move cc_platform.c to arch/x86/coco/. The directory is going to be the
home space for code related to confidential computing.
Intel TDX code will land here. AMD SEV code will also eventually be
moved there.
No functional changes.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/r/20220222185740.26228-3-kirill.shutemov@linux.intel.com
There is no need to have struct kernfs_root be part of kernfs.h for
the whole kernel to see and poke around it. Move it internal to kernfs
code and provide a helper function, kernfs_root_to_node(), to handle the
one field that kernfs users were directly accessing from the structure.
Cc: Imran Khan <imran.f.khan@oracle.com>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20220222070713.3517679-1-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This is pretty much unused and not really useful. What is more, all
relevant MCA hardware has recoverable machine checks support so there's
no real need to tweak MCA tolerance levels in order to *maybe* extend
machine lifetime.
So rip it out.
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/YcDq8PxvKtTENl/e@zn.tnic
-----BEGIN PGP SIGNATURE-----
iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmISrYgeHHRvcnZhbGRz
QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGg20IAKDZr7rfSHBopjQV
Cocw744tom0XuxpvSZpp2GGOOXF+tkswcNNaRIrbGOl1mkyxA7eBZCTMpDeDS9aQ
wB0D0Gxx8QBAJp4KgB1W7TB+hIGes/rs8Ve+6iO4ulLLdCVWX/q2boI0aZ7QX9O9
qNi8OsoZQtk6falRvciZFHwV5Av1p2Sy1AW57udQ7DvJ4H98AfKf1u8/z208WWW8
1ixC+qJxQcUcM9vI+7P9Tt7NbFSKv8SvAmqjFY7P+DxQAsVw6KXoqVXykDzeOv0t
fUNOE/t0oFZafwtn8h7KBQnwS9lH03+3KkslVZs+iMFyUj/Bar+NVVyKoDhWXtVg
/PuMhEg=
=eU1o
-----END PGP SIGNATURE-----
Merge tag 'v5.17-rc5' into sched/core, to resolve conflicts
New conflicts in sched/core due to the following upstream fixes:
44585f7bc0 ("psi: fix "defined but not used" warnings when CONFIG_PROC_FS=n")
a06247c680 ("psi: Fix uaf issue when psi trigger is destroyed while being polled")
Conflicts:
include/linux/psi_types.h
kernel/sched/psi.c
Signed-off-by: Ingo Molnar <mingo@kernel.org>
With unprivileged eBPF enabled, eIBRS (without retpoline) is vulnerable
to Spectre v2 BHB-based attacks.
When both are enabled, print a warning message and report it in the
'spectre_v2' sysfs vulnerabilities file.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Thanks to the chaps at VUsec it is now clear that eIBRS is not
sufficient, therefore allow enabling of retpolines along with eIBRS.
Add spectre_v2=eibrs, spectre_v2=eibrs,lfence and
spectre_v2=eibrs,retpoline options to explicitly pick your preferred
means of mitigation.
Since there's new mitigations there's also user visible changes in
/sys/devices/system/cpu/vulnerabilities/spectre_v2 to reflect these
new mitigations.
[ bp: Massage commit message, trim error messages,
do more precise eIBRS mode checking. ]
Co-developed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Patrick Colp <patrick.colp@oracle.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
The RETPOLINE_AMD name is unfortunate since it isn't necessarily
AMD only, in fact Hygon also uses it. Furthermore it will likely be
sufficient for some Intel processors. Therefore rename the thing to
RETPOLINE_LFENCE to better describe what it is.
Add the spectre_v2=retpoline,lfence option as an alias to
spectre_v2=retpoline,amd to preserve existing setups. However, the output
of /sys/devices/system/cpu/vulnerabilities/spectre_v2 will be changed.
[ bp: Fix typos, massage. ]
Co-developed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
- Handle poisoned pages properly in the SGX reclaimer code
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmISMboACgkQEsHwGGHe
VUoo0hAAvohA86G6CCfDvQUjSoimYqsKockWCS594/Fz0eEEi7wf+1e48OOtEVnW
4TCqd/dCNXnzOSj0KE2/nY+X3SaOv+2gLr1uyv+A8iX8nkTt+3Hkrzjm6mkSE5Bq
C6aKANwdqfzmyy7efYHYHOb4hbcMe8t1df7rPsAwaT+MVJCOgjnICLAx4+YuhNqP
hvBs6o5wFgPx6/LJzmgaL1QA/qxstmhagLncU1MXZPte6xo+Dp1/6qMm2QFiJgIr
clxZU8Q87NlzIAOGFohwzs0fm3DID529PzTfArqsRJPKB8TX0EUoZ8d2OPzD03I1
aOdw6dhdNxfgORI0+5glffJ6aGbUi076x+1K4r4wYzGHW+hjd/gyNxIfnGOOgKOG
ZZdf01wiOpcvzBHk/yx+jSfmwI5Zt079DCTD9MnutV8k952dtqZNmJ+Vrkmj9aCS
Lsv/AZJDGa4HEBAI+lvYCZb1xFzBIDCinVgl/PrFrcMFuRi737J2C50LzYeZpg/x
gllkk3U5kTlZ1iGiQlRzBlDu28vXVIV9koF5xoOwNLYjpFr6Jog62tiioP5IS9ym
VVxAaYlzYNNMjwwtypS55sMmYK+dnaK/jrmCp9xI3br5e9OXhZCr3dq/od03ijPH
jhFCr0Ec7IPIk5uHj9uvcUDKyvrqjDWCG8aK1Yc65Ed5pLYX7cQ=
=5Bl4
-----END PGP SIGNATURE-----
Merge tag 'x86_urgent_for_v5.17_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
- Fix the ptrace regset xfpregs_set() callback to behave according to
the ABI
- Handle poisoned pages properly in the SGX reclaimer code
* tag 'x86_urgent_for_v5.17_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/ptrace: Fix xfpregs_set()'s incorrect xmm clearing
x86/sgx: Fix missing poison handling in reclaimer
A rare kernel panic scenario can happen when the following conditions
are met due to an erratum on fast string copy instructions:
1) An uncorrected error.
2) That error must be in first cache line of a page.
3) Kernel must execute page_copy from the page immediately before that
page.
The fast string copy instructions ("REP; MOVS*") could consume an
uncorrectable memory error in the cache line _right after_ the desired
region to copy and raise an MCE.
Bit 0 of MSR_IA32_MISC_ENABLE can be cleared to disable fast string
copy and will avoid such spurious machine checks. However, that is less
preferable due to the permanent performance impact. Considering memory
poison is rare, it's desirable to keep fast string copy enabled until an
MCE is seen.
Intel has confirmed the following:
1. The CPU erratum of fast string copy only applies to Skylake,
Cascade Lake and Cooper Lake generations.
Directly return from the MCE handler:
2. Will result in complete execution of the "REP; MOVS*" with no data
loss or corruption.
3. Will not result in another MCE firing on the next poisoned cache line
due to "REP; MOVS*".
4. Will resume execution from a correct point in code.
5. Will result in the same instruction that triggered the MCE firing a
second MCE immediately for any other software recoverable data fetch
errors.
6. Is not safe without disabling the fast string copy, as the next fast
string copy of the same buffer on the same CPU would result in a PANIC
MCE.
This should mitigate the erratum completely with the only caveat that
the fast string copy is disabled on the affected hyper thread thus
performance degradation.
This is still better than the OS crashing on MCEs raised on an
irrelevant process due to "REP; MOVS*' accesses in a kernel context,
e.g., copy_page.
Tested:
Injected errors on 1st cache line of 8 anonymous pages of process
'proc1' and observed MCE consumption from 'proc2' with no panic
(directly returned).
Without the fix, the host panicked within a few minutes on a
random 'proc2' process due to kernel access from copy_page.
[ bp: Fix comment style + touch ups, zap an unlikely(), improve the
quirk function's readability. ]
Signed-off-by: Jue Wang <juew@google.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20220218013209.2436006-1-juew@google.com
xfpregs_set() handles 32-bit REGSET_XFP and 64-bit REGSET_FP. The actual
code treats these regsets as modern FX state (i.e. the beginning part of
XSTATE). The declarations of the regsets thought they were the legacy
i387 format. The code thought they were the 32-bit (no xmm8..15) variant
of XSTATE and, for good measure, made the high bits disappear by zeroing
the wrong part of the buffer. The latter broke ptrace, and everything
else confused anyone trying to understand the code. In particular, the
nonsense definitions of the regsets confused me when I wrote this code.
Clean this all up. Change the declarations to match reality (which
shouldn't change the generated code, let alone the ABI) and fix
xfpregs_set() to clear the correct bits and to only do so for 32-bit
callers.
Fixes: 6164331d15 ("x86/fpu: Rewrite xfpregs_set()")
Reported-by: Luís Ferreira <contact@lsferreira.net>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=215524
Link: https://lore.kernel.org/r/YgpFnZpF01WwR8wU@zn.tnic
Inspired by commit 3553ae5690 (x86/kvm: Don't use pvqspinlock code if
only 1 vCPU), on a VM with only 1 vCPU, there is no need to enable
pv tlb/ipi/sched_yield and we can save the memory for __pv_cpu_mask.
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Message-Id: <1645171838-2855-1-git-send-email-wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The SGX reclaimer code lacks page poison handling in its main
free path. This can lead to avoidable machine checks if a
poisoned page is freed and reallocated instead of being
isolated.
A troublesome scenario is:
1. Machine check (#MC) occurs (asynchronous, !MF_ACTION_REQUIRED)
2. arch_memory_failure() is eventually called
3. (SGX) page->poison set to 1
4. Page is reclaimed
5. Page added to normal free lists by sgx_reclaim_pages()
^ This is the bug (poison pages should be isolated on the
sgx_poison_page_list instead)
6. Page is reallocated by some innocent enclave, a second (synchronous)
in-kernel #MC is induced, probably during EADD instruction.
^ This is the fallout from the bug
(6) is unfortunate and can be avoided by replacing the open coded
enclave page freeing code in the reclaimer with sgx_free_epc_page()
to obtain support for poison page handling that includes placing the
poisoned page on the correct list.
Fixes: d6d261bded ("x86/sgx: Add new sgx_epc_page flag bit to mark free pages")
Fixes: 992801ae92 ("x86/sgx: Initial poison handling for dirty and free pages")
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Link: https://lkml.kernel.org/r/dcc95eb2aaefb042527ac50d0a50738c7c160dac.1643830353.git.reinette.chatre@intel.com
During host/guest switch (like in kvm_arch_vcpu_ioctl_run()), the kernel
swaps the fpu between host/guest contexts, by using fpu_swap_kvm_fpstate().
When xsave feature is available, the fpu swap is done by:
- xsave(s) instruction, with guest's fpstate->xfeatures as mask, is used
to store the current state of the fpu registers to a buffer.
- xrstor(s) instruction, with (fpu_kernel_cfg.max_features &
XFEATURE_MASK_FPSTATE) as mask, is used to put the buffer into fpu regs.
For xsave(s) the mask is used to limit what parts of the fpu regs will
be copied to the buffer. Likewise on xrstor(s), the mask is used to
limit what parts of the fpu regs will be changed.
The mask for xsave(s), the guest's fpstate->xfeatures, is defined on
kvm_arch_vcpu_create(), which (in summary) sets it to all features
supported by the cpu which are enabled on kernel config.
This means that xsave(s) will save to guest buffer all the fpu regs
contents the cpu has enabled when the guest is paused, even if they
are not used.
This would not be an issue, if xrstor(s) would also do that.
xrstor(s)'s mask for host/guest swap is basically every valid feature
contained in kernel config, except XFEATURE_MASK_PKRU.
Accordingto kernel src, it is instead switched in switch_to() and
flush_thread().
Then, the following happens with a host supporting PKRU starts a
guest that does not support it:
1 - Host has XFEATURE_MASK_PKRU set. 1st switch to guest,
2 - xsave(s) fpu regs to host fpustate (buffer has XFEATURE_MASK_PKRU)
3 - xrstor(s) guest fpustate to fpu regs (fpu regs have XFEATURE_MASK_PKRU)
4 - guest runs, then switch back to host,
5 - xsave(s) fpu regs to guest fpstate (buffer now have XFEATURE_MASK_PKRU)
6 - xrstor(s) host fpstate to fpu regs.
7 - kvm_vcpu_ioctl_x86_get_xsave() copy guest fpstate to userspace (with
XFEATURE_MASK_PKRU, which should not be supported by guest vcpu)
On 5, even though the guest does not support PKRU, it does have the flag
set on guest fpstate, which is transferred to userspace via vcpu ioctl
KVM_GET_XSAVE.
This becomes a problem when the user decides on migrating the above guest
to another machine that does not support PKRU: the new host restores
guest's fpu regs to as they were before (xrstor(s)), but since the new
host don't support PKRU, a general-protection exception ocurs in xrstor(s)
and that crashes the guest.
This can be solved by making the guest's fpstate->user_xfeatures hold
a copy of guest_supported_xcr0. This way, on 7 the only flags copied to
userspace will be the ones compatible to guest requirements, and thus
there will be no issue during migration.
As a bonus, it will also fail if userspace tries to set fpu features
(with the KVM_SET_XSAVE ioctl) that are not compatible to the guest
configuration. Such features will never be returned by KVM_GET_XSAVE
or KVM_GET_XSAVE2.
Also, since kvm_vcpu_after_set_cpuid() now sets fpstate->user_xfeatures,
there is not need to set it in kvm_check_cpuid(). So, change
fpstate_realloc() so it does not touch fpstate->user_xfeatures if a
non-NULL guest_fpu is passed, which is the case when kvm_check_cpuid()
calls it.
Signed-off-by: Leonardo Bras <leobras@redhat.com>
Message-Id: <20220217053028.96432-2-leobras@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Currently, the SME CPU feature flag is reflective of whether the CPU
supports the feature but not whether it has been activated by the
kernel.
Change this around to clear the SME feature flag if the kernel is not
using it so userspace can determine if it is available and in use
from /proc/cpuinfo.
As the feature flag is cleared on systems where SME isn't active, use
CPUID 0x8000001f to confirm SME availability before calling
native_wbinvd().
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/r/20220216034446.2430634-1-mario.limonciello@amd.com
Refer to housekeeping APIs using single feature types instead of flags.
This prevents from passing multiple isolation features at once to
housekeeping interfaces, which soon won't be possible anymore as each
isolation features will have their own cpumask.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Juri Lelli <juri.lelli@redhat.com>
Reviewed-by: Phil Auld <pauld@redhat.com>
Link: https://lore.kernel.org/r/20220207155910.527133-5-frederic@kernel.org
All tasks start with PASID state disabled. This means that the first
time they execute an ENQCMD instruction they will take a #GP fault.
Modify the #GP fault handler to check if the "mm" for the task has
already been allocated a PASID. If so, try to fix the #GP fault by
loading the IA32_PASID MSR.
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20220207230254.3342514-9-fenghua.yu@intel.com
The kernel must allocate a Process Address Space ID (PASID) on behalf of
each process which will use ENQCMD and program it into the new MSR to
communicate the process identity to platform hardware. ENQCMD uses the
PASID stored in this MSR to tag requests from this process.
The PASID state must be cleared on fork() since fork creates a
new address space.
For clone(), it would be functionally OK to copy the PASID. However,
clearing it is _also_ functionally OK since any PASID use will trigger
the #GP handler to populate the MSR.
Copying the PASID state has two main downsides:
* It requires differentiating fork() and clone() in the code,
both in the FPU code and keeping tsk->pasid_activated consistent.
* It guarantees that the PASID is out of its init state, which
incurs small but non-zero cost on every XSAVE/XRSTOR.
The main downside of clearing the PASID at fpstate copy is the future,
one-time #GP for the thread.
Use the simplest approach: clear the PASID state both on clone() and
fork(). Rely on the #GP handler for MSR population in children.
Also, just clear the PASID bit from xfeatures if XSAVE is supported.
This will have no effect on systems that do not have PASID support. It
is virtually zero overhead because 'dst_fpu' was just written and
the whole thing is cache hot.
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20220207230254.3342514-7-fenghua.yu@intel.com
-----BEGIN PGP SIGNATURE-----
iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmIJZmoeHHRvcnZhbGRz
QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGZdoH/04d8zUhM3Fd3ACB
V/ONtOXmkfP2mEJSjb7cXTN1EM2SlOBdSnSsEw09FtGhjHABjOnLho4J5ixk9TH8
zNMNI3EMksM2T9KadHwxv8Vvp1LTrWRzMbws8tOCPA0RkOpikJfClC8CzRAyidJ3
cAbbDH/Jl1GnVZ8bpKmv2auYt+kNVGb0cwJ2W8phCwwkL7sLky5tgYeaGiJEXbJf
Tfi/3qtFdmYjD8wtYnCfzjnB7suG5nF7rGEnxCIxNi+IA4DieUv2c1KchuoaBfT9
df364VjKaGT3j+GB07ksQ/8mkwWiRXsCzOXAyMZSZaWjdMD4aAhCTJak5j7/TvGC
wtgHPww=
=/CMW
-----END PGP SIGNATURE-----
Backmerge tag 'v5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux into drm-next
Daniel asked for this for some intel deps, so let's do it now.
Signed-off-by: Dave Airlie <airlied@redhat.com>
The arch helpers do not have explicit KASAN instrumentation. Use them in
noinstr code.
Inline a couple more functions with single call sites, while at it:
mce_severity_amd_smca() has a single call-site which is noinstr so force
the inlining and fix:
vmlinux.o: warning: objtool: mce_severity_amd.constprop.0()+0xca: call to \
mce_severity_amd_smca() leaves .noinstr.text section
Always inline mca_msr_reg():
text data bss dec hex filename
16065240 128031326 36405368 180501934 ac23dae vmlinux.before
16065240 128031294 36405368 180501902 ac23d8e vmlinux.after
and mce_no_way_out() as the latter one is used only once, to fix:
vmlinux.o: warning: objtool: mce_read_aux()+0x53: call to mca_msr_reg() leaves .noinstr.text section
vmlinux.o: warning: objtool: do_machine_check()+0xc9: call to mce_no_way_out() leaves .noinstr.text section
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Marco Elver <elver@google.com>
Link: https://lore.kernel.org/r/20220204083015.17317-4-bp@alien8.de
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmII99AACgkQEsHwGGHe
VUp03RAAo2KE+kb5kbMRqF0eJjGCl34EGbcfkl2Kr+M1tbi7aG41VbnGuZIyHoOQ
U6ZnN8v/ft5tQy7PfXrZW7iNKQgLrdEOfI6lUoyr5LGgJprXj/SSJvFMngfypuQi
mPhkvRs+hTIe4ylQvbqQKgQxFIMgF90+XsdmBz0vaJDun8dkOr8ghS2bBPHf1y4o
1sQ14SgDCU0hVtPhH5HQQIcanqmHXNbYreXuTToHRwgqy4CcoX5vaQQUGgjQK5SK
ektmDxGiBI90jscL+ZoXg730dchXku04WY5tfoYJazUnIMKNSlpmTGQLBsQnlDf5
Cxi94h91GYaLM0u44OICyfHNPUi+xx19o31dzBnNviSpKLKoK/lquFCNUXNqVn9E
lwOhMysSYOuxgtnYqLXMUSQWMwY3rbISrUqPOR7vMYzg/b+LKPTc76I9B7C9/UYW
6lKCchDicAAv/rLh1+0JOOKWTaz1F8dVasxRCRGYreL8ZxT14jsB41sn4xxSRXRZ
d/iEobh/LFL1c37ju0sWdHj5fSK9c4pPIM560o2ftBwGypvryVdBFDpbe3B1W/AD
IJXRsVW3LU7BbnGEXYcobcX5vXeBKULtcTliS9VTQjIQVZkcqPp7t8GSJOMf9qos
k889Fi9NfQktaUQDQztTujmwuQrP7JPaejrmvQU0xwam88ZkvwM=
=yZ3t
-----END PGP SIGNATURE-----
Merge tag 'x86_urgent_for_v5.17_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fix from Borislav Petkov:
"Prevent softlockups when tearing down large SGX enclaves"
* tag 'x86_urgent_for_v5.17_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/sgx: Silence softlockup detection when releasing large enclaves
This function was previously part of __startup_64() which is marked
__head, and is currently only called from there. Mark it __head too.
Signed-off-by: Marco Bonelli <marco@mebeim.net>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20220211162350.11780-1-marco@mebeim.net
Vijay reported that the "unclobbered_vdso_oversubscribed" selftest
triggers the softlockup detector.
Actual SGX systems have 128GB of enclave memory or more. The
"unclobbered_vdso_oversubscribed" selftest creates one enclave which
consumes all of the enclave memory on the system. Tearing down such a
large enclave takes around a minute, most of it in the loop where
the EREMOVE instruction is applied to each individual 4k enclave page.
Spending one minute in a loop triggers the softlockup detector.
Add a cond_resched() to give other tasks a chance to run and placate
the softlockup detector.
Cc: stable@vger.kernel.org
Fixes: 1728ab54b4 ("x86/sgx: Add a page reclaimer")
Reported-by: Vijay Dhanraj <vijay.dhanraj@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Tested-by: Jarkko Sakkinen <jarkko@kernel.org> (kselftest as sanity check)
Link: https://lkml.kernel.org/r/ced01cac1e75f900251b0a4ae1150aa8ebd295ec.1644345232.git.reinette.chatre@intel.com
Daniel Borkmann says:
====================
pull-request: bpf-next 2022-02-09
We've added 126 non-merge commits during the last 16 day(s) which contain
a total of 201 files changed, 4049 insertions(+), 2215 deletions(-).
The main changes are:
1) Add custom BPF allocator for JITs that pack multiple programs into a huge
page to reduce iTLB pressure, from Song Liu.
2) Add __user tagging support in vmlinux BTF and utilize it from BPF
verifier when generating loads, from Yonghong Song.
3) Add per-socket fast path check guarding from cgroup/BPF overhead when
used by only some sockets, from Pavel Begunkov.
4) Continued libbpf deprecation work of APIs/features and removal of their
usage from samples, selftests, libbpf & bpftool, from Andrii Nakryiko
and various others.
5) Improve BPF instruction set documentation by adding byte swap
instructions and cleaning up load/store section, from Christoph Hellwig.
6) Switch BPF preload infra to light skeleton and remove libbpf dependency
from it, from Alexei Starovoitov.
7) Fix architecture-agnostic macros in libbpf for accessing syscall
arguments from BPF progs for non-x86 architectures,
from Ilya Leoshkevich.
8) Rework port members in struct bpf_sk_lookup and struct bpf_sock to be
of 16-bit field with anonymous zero padding, from Jakub Sitnicki.
9) Add new bpf_copy_from_user_task() helper to read memory from a different
task than current. Add ability to create sleepable BPF iterator progs,
from Kenny Yu.
10) Implement XSK batching for ice's zero-copy driver used by AF_XDP and
utilize TX batching API from XSK buffer pool, from Maciej Fijalkowski.
11) Generate temporary netns names for BPF selftests to avoid naming
collisions, from Hangbin Liu.
12) Implement bpf_core_types_are_compat() with limited recursion for
in-kernel usage, from Matteo Croce.
13) Simplify pahole version detection and finally enable CONFIG_DEBUG_INFO_DWARF5
to be selected with CONFIG_DEBUG_INFO_BTF, from Nathan Chancellor.
14) Misc minor fixes to libbpf and selftests from various folks.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (126 commits)
selftests/bpf: Cover 4-byte load from remote_port in bpf_sk_lookup
bpf: Make remote_port field in struct bpf_sk_lookup 16-bit wide
libbpf: Fix compilation warning due to mismatched printf format
selftests/bpf: Test BPF_KPROBE_SYSCALL macro
libbpf: Add BPF_KPROBE_SYSCALL macro
libbpf: Fix accessing the first syscall argument on s390
libbpf: Fix accessing the first syscall argument on arm64
libbpf: Allow overriding PT_REGS_PARM1{_CORE}_SYSCALL
selftests/bpf: Skip test_bpf_syscall_macro's syscall_arg1 on arm64 and s390
libbpf: Fix accessing syscall arguments on riscv
libbpf: Fix riscv register names
libbpf: Fix accessing syscall arguments on powerpc
selftests/bpf: Use PT_REGS_SYSCALL_REGS in bpf_syscall_macro
libbpf: Add PT_REGS_SYSCALL_REGS macro
selftests/bpf: Fix an endianness issue in bpf_syscall_macro test
bpf: Fix bpf_prog_pack build HPAGE_PMD_SIZE
bpf: Fix leftover header->pages in sparc and powerpc code.
libbpf: Fix signedness bug in btf_dump_array_data()
selftests/bpf: Do not export subtest as standalone test
bpf, x86_64: Fail gracefully on bpf_jit_binary_pack_finalize failures
...
====================
Link: https://lore.kernel.org/r/20220209210050.8425-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit 7f7b4236f2 ("x86/PCI: Ignore E820 reservations for bridge windows
on newer systems") fixes the touchpad not working on laptops like
the Lenovo IdeaPad 3 15IIL05 and the Lenovo IdeaPad 5 14IIL05, as well as
fixing thunderbolt hotplug issues on the Lenovo Yoga C940.
Unfortunately it turns out that this is causing issues with suspend/resume
on Lenovo ThinkPad X1 Carbon Gen 2 laptops. So, per the no regressions
policy, rever this. Note I'm looking into another fix for the issues this
fixed.
Fixes: 7f7b4236f2 ("x86/PCI: Ignore E820 reservations for bridge windows on newer systems")
BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=2029207
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This will be used by BPF jit compiler to dump JITed binary to a RX huge
page, and thus allow multiple BPF programs sharing the a huge (2MB) page.
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/bpf/20220204185742.271030-6-song@kernel.org
Currently, the PPIN (Protected Processor Inventory Number) MSR is read
by every CPU that processes a machine check, CMCI, or just polls machine
check banks from a periodic timer. This is not a "fast" MSR, so this
adds to overhead of processing errors.
Add a new "ppin" field to the cpuinfo_x86 structure. Read and save the
PPIN during initialization. Use this copy in mce_setup() instead of
reading the MSR.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20220131230111.2004669-4-tony.luck@intel.com
After nine generations of adding to model specific list of CPUs that
support PPIN (Protected Processor Inventory Number) Intel allocated
a CPUID bit to enumerate the MSRs.
CPUID(EAX=7, ECX=1).EBX bit 0 enumerates presence of MSR_PPIN_CTL and
MSR_PPIN. Add it to the "scattered" CPUID bits and add an entry to the
ppin_cpuids[] x86_match_cpu() array to catch Intel CPUs that implement
it.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20220131230111.2004669-3-tony.luck@intel.com
The code to decide whether a system supports the PPIN (Protected
Processor Inventory Number) MSR was cloned from the Intel
implementation. Apart from the X86_FEATURE bit and the MSR numbers it is
identical.
Merge the two functions into common x86 code, but use x86_match_cpu()
instead of the switch (c->x86_model) that was used by the old Intel
code.
No functional change.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20220131230111.2004669-2-tony.luck@intel.com
There are currently 2 ways to create a set of sysfs files for a
kobj_type, through the default_attrs field, and the default_groups
field. Move the AMD mce sysfs code to use default_groups field which has
been the preferred way since
aa30f47cf6 ("kobject: Add support for default attribute groups to kobj_type")
so that the obsolete default_attrs field can be removed soon.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Yazen Ghannam <yazen.ghannam@amd.com>
Link: https://lore.kernel.org/r/20220106103537.3663852-1-gregkh@linuxfoundation.org
Catch-up with 5.17-rc2 and trying to align with drm-intel-gt-next
for a possible topic branch for merging the split of i915_regs...
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Missed adding the Icelake-D CPU to the list. It uses the same MSRs
to control and read the inventory number as all the other models.
Fixes: dc6b025de9 ("x86/mce: Add Xeon Icelake to list of CPUs that support PPIN")
Reported-by: Ailin Xu <ailin.xu@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20220121174743.1875294-2-tony.luck@intel.com
Changes to the AMD Thresholding sysfs code prevents sysfs writes from
updating the underlying registers once CPU init is completed, i.e.
"threshold_banks" is set.
Allow the registers to be updated if the thresholding interface is
already initialized or if in the init path. Use the "set_lvt_off" value
to indicate if running in the init path, since this value is only set
during init.
Fixes: a037f3ca0e ("x86/mce/amd: Make threshold bank setting hotplug robust")
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20220117161328.19148-1-yazen.ghannam@amd.com
-----BEGIN PGP SIGNATURE-----
iQHJBAABCgAzFiEEi8GdvG6xMhdgpu/4sUSA/TofvsgFAmHi+xgVHHl1cnkubm9y
b3ZAZ21haWwuY29tAAoJELFEgP06H77IxdoMAMf3E+L51Ys/4iAiyJQNVoT3aIBC
A8ZVOB9he1OA3o3wBNIRKmICHk+ovnfCWcXTr9fG/Ade2wJz88NAsGPQ1Phywb+s
iGlpySllFN72RT9ZqtJhLEzgoHHOL0CzTW07TN9GJy4gQA2h2G9CTP+OmsQdnVqE
m9Fn3PSlJ5lhzePlKfnln8rGZFgrriJakfEFPC79n/7an4+2Hvkb5rWigo7KQc4Z
9YNqYUcHWZFUgq80adxEb9LlbMXdD+Z/8fCjOrAatuwVkD4RDt6iKD0mFGjHXGL7
MZ9KRS8AfZXawmetk3jjtsV+/QkeS+Deuu7k0FoO0Th2QV7BGSDhsLXAS5By/MOC
nfSyHhnXHzCsBMyVNrJHmNhEZoN29+tRwI84JX9lWcf/OLANcCofnP6f2UIX7tZY
CAZAgVELp+0YQXdybrfzTQ8BT3TinjS/aZtCrYijRendI1GwUXcyl69vdOKqAHuk
5jy8k/xHyp+ZWu6v+PyAAAEGowY++qhL0fmszA==
=RKW4
-----END PGP SIGNATURE-----
Merge tag 'bitmap-5.17-rc1' of git://github.com/norov/linux
Pull bitmap updates from Yury Norov:
- introduce for_each_set_bitrange()
- use find_first_*_bit() instead of find_next_*_bit() where possible
- unify for_each_bit() macros
* tag 'bitmap-5.17-rc1' of git://github.com/norov/linux:
vsprintf: rework bitmap_list_string
lib: bitmap: add performance test for bitmap_print_to_pagebuf
bitmap: unify find_bit operations
mm/percpu: micro-optimize pcpu_is_populated()
Replace for_each_*_bit_from() with for_each_*_bit() where appropriate
find: micro-optimize for_each_{set,clear}_bit()
include/linux: move for_each_bit() macros from bitops.h to find.h
cpumask: replace cpumask_next_* with cpumask_first_* where appropriate
tools: sync tools/bitmap with mother linux
all: replace find_next{,_zero}_bit with find_first{,_zero}_bit where appropriate
cpumask: use find_first_and_bit()
lib: add find_first_and_bit()
arch: remove GENERIC_FIND_FIRST_BIT entirely
include: move find.h from asm_generic to linux
bitops: move find_bit_*_le functions from le.h to find.h
bitops: protect find_first_{,zero}_bit properly
New driver:
- Sunplus SP7021 RTC
- Nintendo GameCube, Wii and Wii U RTC
Drivers:
- cmos: refactor UIP handling and presence check, fix century
- rs5c372: offset correction support, report low voltage
- rv8803: Epson RX8804 support
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEBqsFVZXh8s/0O5JiY6TcMGxwOjIFAmHp+jkACgkQY6TcMGxw
OjJS+hAArVbXHT0ceVHY5mzqkBXCyVxhXdyLUw/VmedO9NvCTk1+O9VhBwKTRdQX
FLuvPkyjZlkAk3CEfaHBp4u5AW3Xumdkuh9XwG8b3qRs3UGYBgKiNfXQFtOWHs7w
VO+lD7+f2GNNTYbYH2dLPs6qZwVIN0rL7JTwUH7Po2mqBVztppSOYhmo3vmOqqux
FI5jFVDoE8u+h359bIKcaax3/dne9m1uyD1WWxOJFRtY+apbECpUBfo1tmauAEXP
gg+v7On+gboDqVe9/lwqB+xfKzWFJKwYIu6CM8+Mf/dxhtRNRXCgszoiCSFZHUwG
GghD7Kb96xWuGX1pRe6xx5/7wixes1wCkIVGa5JQLb5E/GQ3O9y8D+Tdk6lyAP5m
XUVPWGC/qByj6z9pBHtbHmq9lhd1vPHp3SU0ske1xlCQd3WnPq7bQ3MEw345gf4Z
RGrtpnUPIfKzHWID+2zCzTj6TluW9FnnOjcm2U8jF/kwB+d1/H2O5xeWR7eQYbUJ
BY5gjDRVIaM3aQC9QiiFMUorqv8Q3oE9FtjCSuLhUx6WEg4S0mtYsXtRUaLT7pRw
RRsoeCJd8Abnf1Nl/imZ2IrRCcbNhzLAq2Aw8cHbiroAk8lSFAH2NzvcS8213JfG
muOrIB2Q3XBu+6hq452ZDE1155FGWFLwKnAjvSRy5yRbsN79/YA=
=nwD6
-----END PGP SIGNATURE-----
Merge tag 'rtc-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux
Pull RTC updates from Alexandre Belloni:
"Two new drivers this cycle and a significant rework of the CMOS driver
make the bulk of the changes.
I also carry powerpc changes with the agreement of Michael.
New drivers:
- Sunplus SP7021 RTC
- Nintendo GameCube, Wii and Wii U RTC
Driver updates:
- cmos: refactor UIP handling and presence check, fix century
- rs5c372: offset correction support, report low voltage
- rv8803: Epson RX8804 support"
* tag 'rtc-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: (33 commits)
rtc: sunplus: fix return value in sp_rtc_probe()
rtc: cmos: Evaluate century appropriate
rtc: gamecube: Fix an IS_ERR() vs NULL check
rtc: mc146818-lib: fix signedness bug in mc146818_get_time()
dt-bindings: rtc: qcom-pm8xxx-rtc: update register numbers
rtc: pxa: fix null pointer dereference
rtc: ftrtc010: Use platform_get_irq() to get the interrupt
rtc: Move variable into switch case statement
rtc: pcf2127: Fix typo in comment
dt-bindings: rtc: Add Sunplus RTC json-schema
rtc: Add driver for RTC in Sunplus SP7021
rtc: rs5c372: fix incorrect oscillation value on r2221tl
rtc: rs5c372: add offset correction support
rtc: cmos: avoid UIP when writing alarm time
rtc: cmos: avoid UIP when reading alarm time
rtc: mc146818-lib: refactor mc146818_does_rtc_work
rtc: mc146818-lib: refactor mc146818_get_time
rtc: mc146818-lib: extract mc146818_avoid_UIP
rtc: mc146818-lib: fix RTC presence check
rtc: Check return value from mc146818_get_time()
...
-----BEGIN PGP SIGNATURE-----
iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmHpyGoUHGJoZWxnYWFz
QGdvb2dsZS5jb20ACgkQWYigwDrT+vz08A/8CnBCelFGiHNHwLHfSsz6sKG0+g1T
X3n7q+u5FsyiWl83lmjMUUyZ3C7ubyaFYV6vaSEM68J3NR8HKlcuyixnq88CJVXO
yIgvlzREHx04f5UdCDs4+nIKkv1rSC2GGfN2zBRLUgFpiV9jaF34C9s62jP13hj7
fM9SOr/i9RSEs2nF+UtAlrFt8qDTs7cbtu5Y0asONfLgqZK+mr+Uv06sNZfHtMUy
RyXxFwduslLwunpbkAA9+Jim3u7GK1VaZCb/wXmFvuzV/eEifUtnqxud9l8onKCE
3rHlBHOGpNUmA3iFyAzN/qAZzE/YG/27Z+wHdRGv/XrsvG/g0f1U8DnaoTC2IL9v
u0rp6i8mXwkrpi/kKLwtCwgttA0L1zB/FzUurpFXsuf5xlej8S4XlGuzeHRiDKmi
0oVPOGycBUya0DPyoK0oFA9LhYgVcNjouAM/Z0CjO1xI8f7UW/FfKBfufTM7iW3c
K8Y/GlDeiECjYSTo9iV0NC68ZceB8VIWQlNDimQ55h9SUdZFTvBm0v12c6ScN08+
fdLO4bCEgJE+DFpY0u6e8rvdE06diHJmJMqC35UQDKJnxNEjqrlgSdCjTjQoL9zN
tulNsvrvmmflv7s9A8JnmGZ+3L+nrk48i5fL0r+tHmb9PILty3e6ZkZK3InGZHOO
oDJeLeNQeE4JIiw=
=8wOd
-----END PGP SIGNATURE-----
Merge tag 'pci-v5.17-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull pci fix from Bjorn Helgaas:
- Reserve "stolen memory" for integrated Intel GPU, even if it's not
the first GPU to be enumerated (Lucas De Marchi)
* tag 'pci-v5.17-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
x86/gpu: Reserve stolen memory for first integrated Intel GPU
Merge more updates from Andrew Morton:
"55 patches.
Subsystems affected by this patch series: percpu, procfs, sysctl,
misc, core-kernel, get_maintainer, lib, checkpatch, binfmt, nilfs2,
hfs, fat, adfs, panic, delayacct, kconfig, kcov, and ubsan"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (55 commits)
lib: remove redundant assignment to variable ret
ubsan: remove CONFIG_UBSAN_OBJECT_SIZE
kcov: fix generic Kconfig dependencies if ARCH_WANTS_NO_INSTR
lib/Kconfig.debug: make TEST_KMOD depend on PAGE_SIZE_LESS_THAN_256KB
btrfs: use generic Kconfig option for 256kB page size limit
arch/Kconfig: split PAGE_SIZE_LESS_THAN_256KB from PAGE_SIZE_LESS_THAN_64KB
configs: introduce debug.config for CI-like setup
delayacct: track delays from memory compact
Documentation/accounting/delay-accounting.rst: add thrashing page cache and direct compact
delayacct: cleanup flags in struct task_delay_info and functions use it
delayacct: fix incomplete disable operation when switch enable to disable
delayacct: support swapin delay accounting for swapping without blkio
panic: remove oops_id
panic: use error_report_end tracepoint on warnings
fs/adfs: remove unneeded variable make code cleaner
FAT: use io_schedule_timeout() instead of congestion_wait()
hfsplus: use struct_group_attr() for memcpy() region
nilfs2: remove redundant pointer sbufs
fs/binfmt_elf: use PT_LOAD p_align values for static PIE
const_structs.checkpatch: add frequently used ops structs
...
With NEED_PER_CPU_PAGE_FIRST_CHUNK enabled, we need a function to
populate pte, this patch adds a generic pcpu populate pte function,
pcpu_populate_pte(), which is marked __weak and used on most
architectures, but it is overridden on x86, which has its own
implementation.
Link: https://lkml.kernel.org/r/20211216112359.103822-5-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
With the previous patch, we could add a generic pcpu first chunk
allocate and free function to cleanup the duplicated definations on each
architecture.
Link: https://lkml.kernel.org/r/20211216112359.103822-4-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add pcpu_fc_cpu_to_node_fn_t and pass it into pcpu_fc_alloc_fn_t, pcpu
first chunk allocation will call it to alloc memblock on the
corresponding node by it, this is prepare for the next patch.
Link: https://lkml.kernel.org/r/20211216112359.103822-3-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
"Stolen memory" is memory set aside for use by an Intel integrated GPU.
The intel_graphics_quirks() early quirk reserves this memory when it is
called for a GPU that appears in the intel_early_ids[] table of integrated
GPUs.
Previously intel_graphics_quirks() was marked as QFLAG_APPLY_ONCE, so it
was called only for the first Intel GPU found. If a discrete GPU happened
to be enumerated first, intel_graphics_quirks() was called for it but not
for any integrated GPU found later. Therefore, stolen memory for such an
integrated GPU was never reserved.
For example, this problem occurs in this Alderlake-P (integrated) + DG2
(discrete) topology where the DG2 is found first, but stolen memory is
associated with the integrated GPU:
- 00:01.0 Bridge
`- 03:00.0 DG2 discrete GPU
- 00:02.0 Integrated GPU (with stolen memory)
Remove the QFLAG_APPLY_ONCE flag and call intel_graphics_quirks() for every
Intel GPU. Reserve stolen memory for the first GPU that appears in
intel_early_ids[].
[bhelgaas: commit log, add code comment, squash in
https://lore.kernel.org/r/20220118190558.2ququ4vdfjuahicm@ldmartin-desk2]
Link: https://lore.kernel.org/r/20220114002843.2083382-1-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org
- Add support for the the Platform Firmware Runtime Update and
Telemetry (PFRUT) interface based on ACPI to allow certain pieces
of the platform firmware to be updated without restarting the
system and to provide a mechanism for collecting platform firmware
telemetry data (Chen Yu, Dan Carpenter, Yang Yingliang).
- Ignore E820 reservations covering PCI host bridge windows on
sufficiently recent x86 systems to avoid issues with allocating
PCI BARs on systems where the E820 reservations cover the entire
PCI host bridge memory window returned by the _CRS object in the
system's ACPI tables (Hans de Goede).
- Fix and clean up acpi_scan_init() (Rafael Wysocki).
- Add more sanity checking to ACPI SPCR tables parsing (Mark
Langsdorf).
- Fix up ACPI APD (AMD Soc) driver initialization (Jiasheng Jiang).
- Drop unnecessary "static" from the ACPI PCC address space handling
driver added recently (kernel test robot).
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmHlrTwSHHJqd0Byand5
c29ja2kubmV0AAoJEILEb/54YlRxQ00QAI3CYIyM+TtgxOkun/DQdHEtLOXxVsQG
V9p9zNa0xFjQY2fE3wwa4r03fg/AfM+qqOdgZDg2QnUlHVx+ZqzNlxMVnYz2QK3P
lhZUJ2sX0ziAYC5AJFP44mcU4yAFgqGgUgv6VUGdoG2khngKA5ds1qYPqdNMigGr
lfAoro2tVpKL66czvglBM6Af0RIrZvmh5dpwtja1tfquAy2qLOLfwBIafdFjrTS8
nVckdBNTgd12BxMaTlEIr8ZKX9Qe/YI0s+avzLc/lMI8gBlG0ZkGczDHxjLE48iE
ZHwttNJm6PKR6Sj5AwXjswkW0n0aqVz1BbeDgdeL9zB6WbH9+EQaQFqYIc1pNyGG
f67c8YwGFPMGx3BKdTksFESKb6DZifc1zbye3OKLxQquaWmufmhmQDVuXZzcfRO3
dkhIngX/sieEWwj1P1WdaDO4vC2zG2devJhxu3WGV8HCAHsZUoBduUwk+YZUVyp3
y+imobCCGEKZnHxtqJg7+VnQnH8ZTHFgOyqGa0boI5M8Dkae+qCcPCRDxHNa7L+7
Hb1eCH5RbB4MKQj/dmf01vpy/Q3gZOMHOoOHyMWp+IveculjcCxOIVZN0YQizj+3
Nw+1dcJLM6iR21f63zOQkoOhoFchp88IDc1PJtJnZOvI49Q9YB/2HKQdBGn292ZL
+e3TWGpsfaZW
=I4UB
-----END PGP SIGNATURE-----
Merge tag 'acpi-5.17-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull more ACPI updates from Rafael Wysocki:
"The most significant item here is the Platform Firmware Runtime Update
and Telemetry (PFRUT) support designed to allow certain pieces of the
platform firmware to be updated on the fly, among other things.
Also important is the e820 handling change on x86 that should work
around PCI BAR allocation issues on some systems shipping since 2019.
The rest is just a handful of assorted fixes and cleanups on top of
the ACPI material merged previously.
Specifics:
- Add support for the the Platform Firmware Runtime Update and
Telemetry (PFRUT) interface based on ACPI to allow certain pieces
of the platform firmware to be updated without restarting the
system and to provide a mechanism for collecting platform firmware
telemetry data (Chen Yu, Dan Carpenter, Yang Yingliang).
- Ignore E820 reservations covering PCI host bridge windows on
sufficiently recent x86 systems to avoid issues with allocating PCI
BARs on systems where the E820 reservations cover the entire PCI
host bridge memory window returned by the _CRS object in the
system's ACPI tables (Hans de Goede).
- Fix and clean up acpi_scan_init() (Rafael Wysocki).
- Add more sanity checking to ACPI SPCR tables parsing (Mark
Langsdorf).
- Fix up ACPI APD (AMD Soc) driver initialization (Jiasheng Jiang).
- Drop unnecessary "static" from the ACPI PCC address space handling
driver added recently (kernel test robot)"
* tag 'acpi-5.17-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: PCC: pcc_ctx can be static
ACPI: scan: Rename label in acpi_scan_init()
ACPI: scan: Simplify initialization of power and sleep buttons
ACPI: scan: Change acpi_scan_init() return value type to void
ACPI: SPCR: check if table->serial_port.access_width is too wide
ACPI: APD: Check for NULL pointer after calling devm_ioremap()
x86/PCI: Ignore E820 reservations for bridge windows on newer systems
ACPI: pfr_telemetry: Fix info leak in pfrt_log_ioctl()
ACPI: pfr_update: Fix return value check in pfru_write()
ACPI: tools: Introduce utility for firmware updates/telemetry
ACPI: Introduce Platform Firmware Runtime Telemetry driver
ACPI: Introduce Platform Firmware Runtime Update device driver
efi: Introduce EFI_FIRMWARE_MANAGEMENT_CAPSULE_HEADER and corresponding structures
Pull signal/exit/ptrace updates from Eric Biederman:
"This set of changes deletes some dead code, makes a lot of cleanups
which hopefully make the code easier to follow, and fixes bugs found
along the way.
The end-game which I have not yet reached yet is for fatal signals
that generate coredumps to be short-circuit deliverable from
complete_signal, for force_siginfo_to_task not to require changing
userspace configured signal delivery state, and for the ptrace stops
to always happen in locations where we can guarantee on all
architectures that the all of the registers are saved and available on
the stack.
Removal of profile_task_ext, profile_munmap, and profile_handoff_task
are the big successes for dead code removal this round.
A bunch of small bug fixes are included, as most of the issues
reported were small enough that they would not affect bisection so I
simply added the fixes and did not fold the fixes into the changes
they were fixing.
There was a bug that broke coredumps piped to systemd-coredump. I
dropped the change that caused that bug and replaced it entirely with
something much more restrained. Unfortunately that required some
rebasing.
Some successes after this set of changes: There are few enough calls
to do_exit to audit in a reasonable amount of time. The lifetime of
struct kthread now matches the lifetime of struct task, and the
pointer to struct kthread is no longer stored in set_child_tid. The
flag SIGNAL_GROUP_COREDUMP is removed. The field group_exit_task is
removed. Issues where task->exit_code was examined with
signal->group_exit_code should been examined were fixed.
There are several loosely related changes included because I am
cleaning up and if I don't include them they will probably get lost.
The original postings of these changes can be found at:
https://lkml.kernel.org/r/87a6ha4zsd.fsf@email.froward.int.ebiederm.orghttps://lkml.kernel.org/r/87bl1kunjj.fsf@email.froward.int.ebiederm.orghttps://lkml.kernel.org/r/87r19opkx1.fsf_-_@email.froward.int.ebiederm.org
I trimmed back the last set of changes to only the obviously correct
once. Simply because there was less time for review than I had hoped"
* 'signal-for-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (44 commits)
ptrace/m68k: Stop open coding ptrace_report_syscall
ptrace: Remove unused regs argument from ptrace_report_syscall
ptrace: Remove second setting of PT_SEIZED in ptrace_attach
taskstats: Cleanup the use of task->exit_code
exit: Use the correct exit_code in /proc/<pid>/stat
exit: Fix the exit_code for wait_task_zombie
exit: Coredumps reach do_group_exit
exit: Remove profile_handoff_task
exit: Remove profile_task_exit & profile_munmap
signal: clean up kernel-doc comments
signal: Remove the helper signal_group_exit
signal: Rename group_exit_task group_exec_task
coredump: Stop setting signal->group_exit_task
signal: Remove SIGNAL_GROUP_COREDUMP
signal: During coredumps set SIGNAL_GROUP_EXIT in zap_process
signal: Make coredump handling explicit in complete_signal
signal: Have prepare_signal detect coredumps using signal->core_state
signal: Have the oom killer detect coredumps using signal->core_state
exit: Move force_uaccess back into do_exit
exit: Guarantee make_task_dead leaks the tsk when calling do_task_exit
...
- Use common KVM implementation of MMU memory caches
- SBI v0.2 support for Guest
- Initial KVM selftests support
- Fix to avoid spurious virtual interrupts after clearing hideleg CSR
- Update email address for Anup and Atish
ARM:
- Simplification of the 'vcpu first run' by integrating it into
KVM's 'pid change' flow
- Refactoring of the FP and SVE state tracking, also leading to
a simpler state and less shared data between EL1 and EL2 in
the nVHE case
- Tidy up the header file usage for the nvhe hyp object
- New HYP unsharing mechanism, finally allowing pages to be
unmapped from the Stage-1 EL2 page-tables
- Various pKVM cleanups around refcounting and sharing
- A couple of vgic fixes for bugs that would trigger once
the vcpu xarray rework is merged, but not sooner
- Add minimal support for ARMv8.7's PMU extension
- Rework kvm_pgtable initialisation ahead of the NV work
- New selftest for IRQ injection
- Teach selftests about the lack of default IPA space and
page sizes
- Expand sysreg selftest to deal with Pointer Authentication
- The usual bunch of cleanups and doc update
s390:
- fix sigp sense/start/stop/inconsistency
- cleanups
x86:
- Clean up some function prototypes more
- improved gfn_to_pfn_cache with proper invalidation, used by Xen emulation
- add KVM_IRQ_ROUTING_XEN_EVTCHN and event channel delivery
- completely remove potential TOC/TOU races in nested SVM consistency checks
- update some PMCs on emulated instructions
- Intel AMX support (joint work between Thomas and Intel)
- large MMU cleanups
- module parameter to disable PMU virtualization
- cleanup register cache
- first part of halt handling cleanups
- Hyper-V enlightened MSR bitmap support for nested hypervisors
Generic:
- clean up Makefiles
- introduce CONFIG_HAVE_KVM_DIRTY_RING
- optimize memslot lookup using a tree
- optimize vCPU array usage by converting to xarray
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmHhxvsUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroPZkAf+Nz92UL/5nNGcdHtE4m7AToMmitE9
bYkesf9BMQvAe5wjkABLuoHGi6ay4jabo4fiGzbdkiK7lO5YgfsWiMB3/MT5fl4E
jRPzaVQabp3YZLM8UYCBmfUVuRj524S967SfSRe0AvYjDEH8y7klPf4+7sCsFT0/
Px9Vf2KGuOlf0eM78yKg4rGaF0jS22eLgXm6FfNMY8/e29ZAo/jyUmqBY+Z2xxZG
aWhceDtSheW1jwLHLj3nOlQJvHTn8LVGXBE/R8Gda3ZjrBV2rKaDi4Fh+HD+dz86
2zVXwzQ7uck2CMW73GMoXMTWoKSHMyvlBOs1BdvBm4UsnGcXR+q8IFCeuQ==
=s73m
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
"RISCV:
- Use common KVM implementation of MMU memory caches
- SBI v0.2 support for Guest
- Initial KVM selftests support
- Fix to avoid spurious virtual interrupts after clearing hideleg CSR
- Update email address for Anup and Atish
ARM:
- Simplification of the 'vcpu first run' by integrating it into KVM's
'pid change' flow
- Refactoring of the FP and SVE state tracking, also leading to a
simpler state and less shared data between EL1 and EL2 in the nVHE
case
- Tidy up the header file usage for the nvhe hyp object
- New HYP unsharing mechanism, finally allowing pages to be unmapped
from the Stage-1 EL2 page-tables
- Various pKVM cleanups around refcounting and sharing
- A couple of vgic fixes for bugs that would trigger once the vcpu
xarray rework is merged, but not sooner
- Add minimal support for ARMv8.7's PMU extension
- Rework kvm_pgtable initialisation ahead of the NV work
- New selftest for IRQ injection
- Teach selftests about the lack of default IPA space and page sizes
- Expand sysreg selftest to deal with Pointer Authentication
- The usual bunch of cleanups and doc update
s390:
- fix sigp sense/start/stop/inconsistency
- cleanups
x86:
- Clean up some function prototypes more
- improved gfn_to_pfn_cache with proper invalidation, used by Xen
emulation
- add KVM_IRQ_ROUTING_XEN_EVTCHN and event channel delivery
- completely remove potential TOC/TOU races in nested SVM consistency
checks
- update some PMCs on emulated instructions
- Intel AMX support (joint work between Thomas and Intel)
- large MMU cleanups
- module parameter to disable PMU virtualization
- cleanup register cache
- first part of halt handling cleanups
- Hyper-V enlightened MSR bitmap support for nested hypervisors
Generic:
- clean up Makefiles
- introduce CONFIG_HAVE_KVM_DIRTY_RING
- optimize memslot lookup using a tree
- optimize vCPU array usage by converting to xarray"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (268 commits)
x86/fpu: Fix inline prefix warnings
selftest: kvm: Add amx selftest
selftest: kvm: Move struct kvm_x86_state to header
selftest: kvm: Reorder vcpu_load_state steps for AMX
kvm: x86: Disable interception for IA32_XFD on demand
x86/fpu: Provide fpu_sync_guest_vmexit_xfd_state()
kvm: selftests: Add support for KVM_CAP_XSAVE2
kvm: x86: Add support for getting/setting expanded xstate buffer
x86/fpu: Add uabi_size to guest_fpu
kvm: x86: Add CPUID support for Intel AMX
kvm: x86: Add XCR0 support for Intel AMX
kvm: x86: Disable RDMSR interception of IA32_XFD_ERR
kvm: x86: Emulate IA32_XFD_ERR for guest
kvm: x86: Intercept #NM for saving IA32_XFD_ERR
x86/fpu: Prepare xfd_err in struct fpu_guest
kvm: x86: Add emulation for IA32_XFD
x86/fpu: Provide fpu_update_guest_xfd() for IA32_XFD emulation
kvm: x86: Enable dynamic xfeatures at KVM_SET_CPUID2
x86/fpu: Provide fpu_enable_guest_xfd_features() for KVM
x86/fpu: Add guest support to xfd_enable_feature()
...
-----BEGIN PGP SIGNATURE-----
iQFHBAABCAAxFiEEIbPD0id6easf0xsudhRwX5BBoF4FAmHhw7oTHHdlaS5saXVA
a2VybmVsLm9yZwAKCRB2FHBfkEGgXrjSB/979LV4Dn1PMcFYsSdlFEMeHcjzJdw/
kFnLPXMaPJyfg6QPuf83jxzw9uxw8fcePMdVq/FFBtmVV9fJMAv62B8jaGS1p58c
WnAg+7zsTN+xEoJn+tskSSon8BNMWVrl41zP3K4Ged+5j8UEBk62GB8Orz1qkpwL
fTh3/+xAvczJeD4zZb1dAm4WnmcQJ4vhg45p07jX6owvnwQAikMFl45aSW54I5o8
vAxGzFgdsZ2NtExnRNKh3b3DozA8JUE89KckBSZnDtq4rH8Fyy6Wij56Hc6v6Cml
SUohiNbHX7hsNwit/lxL8wuF97IiA0pQSABobEg3rxfTghTUep51LlaN
=/m4A
-----END PGP SIGNATURE-----
Merge tag 'hyperv-next-signed-20220114' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
Pull hyperv updates from Wei Liu:
- More patches for Hyper-V isolation VM support (Tianyu Lan)
- Bug fixes and clean-up patches from various people
* tag 'hyperv-next-signed-20220114' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
scsi: storvsc: Fix storvsc_queuecommand() memory leak
x86/hyperv: Properly deal with empty cpumasks in hyperv_flush_tlb_multi()
Drivers: hv: vmbus: Initialize request offers message for Isolation VM
scsi: storvsc: Fix unsigned comparison to zero
swiotlb: Add CONFIG_HAS_IOMEM check around swiotlb_mem_remap()
x86/hyperv: Fix definition of hv_ghcb_pg variable
Drivers: hv: Fix definition of hypercall input & output arg variables
net: netvsc: Add Isolation VM support for netvsc driver
scsi: storvsc: Add Isolation VM support for storvsc driver
hyper-v: Enable swiotlb bounce buffer for Isolation VM
x86/hyper-v: Add hyperv Isolation VM check in the cc_platform_has()
swiotlb: Add swiotlb bounce buffer remap function for HV IVM
Merge misc updates from Andrew Morton:
"146 patches.
Subsystems affected by this patch series: kthread, ia64, scripts,
ntfs, squashfs, ocfs2, vfs, and mm (slab-generic, slab, kmemleak,
dax, kasan, debug, pagecache, gup, shmem, frontswap, memremap,
memcg, selftests, pagemap, dma, vmalloc, memory-failure, hugetlb,
userfaultfd, vmscan, mempolicy, oom-kill, hugetlbfs, migration, thp,
ksm, page-poison, percpu, rmap, zswap, zram, cleanups, hmm, and
damon)"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (146 commits)
mm/damon: hide kernel pointer from tracepoint event
mm/damon/vaddr: hide kernel pointer from damon_va_three_regions() failure log
mm/damon/vaddr: use pr_debug() for damon_va_three_regions() failure logging
mm/damon/dbgfs: remove an unnecessary variable
mm/damon: move the implementation of damon_insert_region to damon.h
mm/damon: add access checking for hugetlb pages
Docs/admin-guide/mm/damon/usage: update for schemes statistics
mm/damon/dbgfs: support all DAMOS stats
Docs/admin-guide/mm/damon/reclaim: document statistics parameters
mm/damon/reclaim: provide reclamation statistics
mm/damon/schemes: account how many times quota limit has exceeded
mm/damon/schemes: account scheme actions that successfully applied
mm/damon: remove a mistakenly added comment for a future feature
Docs/admin-guide/mm/damon/usage: update for kdamond_pid and (mk|rm)_contexts
Docs/admin-guide/mm/damon/usage: mention tracepoint at the beginning
Docs/admin-guide/mm/damon/usage: remove redundant information
Docs/admin-guide/mm/damon/usage: update for scheme quotas and watermarks
mm/damon: convert macro functions to static inline functions
mm/damon: modify damon_rand() macro to static inline function
mm/damon: move damon_rand() definition into damon.h
...
A couple of kernel functions call for_each_*_bit_from() with start
bit equal to 0. Replace them with for_each_*_bit().
No functional changes, but might improve on readability.
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Yongqiang reports a kmemleak panic when module insmod/rmmod with KASAN
enabled(without KASAN_VMALLOC) on x86[1].
When the module area allocates memory, it's kmemleak_object is created
successfully, but the KASAN shadow memory of module allocation is not
ready, so when kmemleak scan the module's pointer, it will panic due to
no shadow memory with KASAN check.
module_alloc
__vmalloc_node_range
kmemleak_vmalloc
kmemleak_scan
update_checksum
kasan_module_alloc
kmemleak_ignore
Note, there is no problem if KASAN_VMALLOC enabled, the modules area
entire shadow memory is preallocated. Thus, the bug only exits on ARCH
which supports dynamic allocation of module area per module load, for
now, only x86/arm64/s390 are involved.
Add a VM_DEFER_KMEMLEAK flags, defer vmalloc'ed object register of
kmemleak in module_alloc() to fix this issue.
[1] https://lore.kernel.org/all/6d41e2b9-4692-5ec4-b1cd-cbe29ae89739@huawei.com/
[wangkefeng.wang@huawei.com: fix build]
Link: https://lkml.kernel.org/r/20211125080307.27225-1-wangkefeng.wang@huawei.com
[akpm@linux-foundation.org: simplify ifdefs, per Andrey]
Link: https://lkml.kernel.org/r/CA+fCnZcnwJHUQq34VuRxpdoY6_XbJCDJ-jopksS5Eia4PijPzw@mail.gmail.com
Link: https://lkml.kernel.org/r/20211124142034.192078-1-wangkefeng.wang@huawei.com
Fixes: 793213a82d ("s390/kasan: dynamic shadow mem allocation for modules")
Fixes: 39d114ddc6 ("arm64: add KASAN support")
Fixes: bebf56a1b1 ("kasan: enable instrumentation of global variables")
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reported-by: Yongqiang Liu <liuyongqiang13@huawei.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
KVM can disable the write emulation for the XFD MSR when the vCPU's fpstate
is already correctly sized to reduce the overhead.
When write emulation is disabled the XFD MSR state after a VMEXIT is
unknown and therefore not in sync with the software states in fpstate and
the per CPU XFD cache.
Provide fpu_sync_guest_vmexit_xfd_state() which has to be invoked after a
VMEXIT before enabling interrupts when write emulation is disabled for the
XFD MSR.
It could be invoked unconditionally even when write emulation is enabled
for the price of a pointless MSR read.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jing Liu <jing2.liu@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20220105123532.12586-21-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Userspace needs to inquire KVM about the buffer size to work
with the new KVM_SET_XSAVE and KVM_GET_XSAVE2. Add the size info
to guest_fpu for KVM to access.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Wei Wang <wei.w.wang@intel.com>
Signed-off-by: Jing Liu <jing2.liu@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20220105123532.12586-18-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Guest XFD can be updated either in the emulation path or in the
restore path.
Provide a wrapper to update guest_fpu::fpstate::xfd. If the guest
fpstate is currently in-use, also update the per-cpu xfd cache and
the actual MSR.
Signed-off-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jing Liu <jing2.liu@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20220105123532.12586-10-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Provide a wrapper for expanding the guest fpstate buffer according
to requested xfeatures. KVM wants to call this wrapper to manage
any dynamic xstate used by the guest.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220105123532.12586-8-yang.zhong@intel.com>
[Remove unnecessary 32-bit check. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Guest support for dynamically enabled FPU features requires a few
modifications to the enablement function which is currently invoked from
the #NM handler:
1) Use guest permissions and sizes for the update
2) Update fpu_guest state accordingly
3) Take into account that the enabling can be triggered either from a
running guest via XSETBV and MSR_IA32_XFD write emulation or from
a guest restore. In the latter case the guests fpstate is not the
current tasks active fpstate.
Split the function and implement the guest mechanics throughout the
callchain.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jing Liu <jing2.liu@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20220105123532.12586-7-yang.zhong@intel.com>
[Add 32-bit stub for __xfd_enable_feature. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
vCPU threads are different from native tasks regarding to the initial XFD
value. While all native tasks follow a fixed value (init_fpstate::xfd)
established by the FPU core at boot, vCPU threads need to obey the reset
value (i.e. ZERO) defined by the specification, to meet the expectation of
the guest.
Let the caller supply an argument and adjust the host and guest related
invocations accordingly.
Signed-off-by: Jing Liu <jing2.liu@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jing Liu <jing2.liu@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20220105123532.12586-6-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Treewide cleanup and consolidation of MSI interrupt handling in
preparation for further changes in this area which are necessary to:
- address existing shortcomings in the VFIO area
- support the upcoming Interrupt Message Store functionality which
decouples the message store from the PCI config/MMIO space
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmHf+SETHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYobzGD/wNEFl5qQo5mNZ9thP6JSJFOItm7zMc
2QgzCYOqNwAv4jL6Dqo+EHtbShYqDyWzKdKccgqNjmdIqgW8q7/fubN1OPzRsClV
CZG997AsXDGXYlQcE3tXZjkeCWnWEE2AGLnygSkFV1K/r9ALAtFfTBJAWB+UD+Zc
1P8Kxo0q0Jg+DQAMAA5bWfSSjo/Pmpr/1AFjY7+GA8BBeJJgWOyW7H1S+GYEWVOE
RaQP81Sbd6x1JkopxkNqSJ/lbNJfnPJxi2higB56Y0OYn5CuSarYbZUM7oQ2V61t
jN7pcEEvTpjLd6SJ93ry8WOcJVMTbccCklVfD0AfEwwGUGw2VM6fSyNrZfnrosUN
tGBEO8eflBJzGTAwSkz1EhiGKna4o1NBDWpr0sH2iUiZC5G6V2hUDbM+0PQJhDa8
bICwguZElcUUPOprwjS0HXhymnxghTmNHyoEP1yxGoKLTrwIqkH/9KGustWkcBmM
hNtOCwQNqxcOHg/r3MN0KxttTASgoXgNnmFliAWA7XwseRpLWc95XPQFa5sptRhc
EzwumEz17EW1iI5/NyZQcY+jcZ9BdgCqgZ9ECjZkyN4U+9G6iACUkxVaHUUs77jl
a0ISSEHEvJisFOsOMYyFfeWkpIKGIKP/bpLOJEJ6kAdrUWFvlRGF3qlav3JldXQl
ypFjPapDeB5guw==
=vKzd
-----END PGP SIGNATURE-----
Merge tag 'irq-msi-2022-01-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull MSI irq updates from Thomas Gleixner:
"Rework of the MSI interrupt infrastructure.
This is a treewide cleanup and consolidation of MSI interrupt handling
in preparation for further changes in this area which are necessary
to:
- address existing shortcomings in the VFIO area
- support the upcoming Interrupt Message Store functionality which
decouples the message store from the PCI config/MMIO space"
* tag 'irq-msi-2022-01-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (94 commits)
genirq/msi: Populate sysfs entry only once
PCI/MSI: Unbreak pci_irq_get_affinity()
genirq/msi: Convert storage to xarray
genirq/msi: Simplify sysfs handling
genirq/msi: Add abuse prevention comment to msi header
genirq/msi: Mop up old interfaces
genirq/msi: Convert to new functions
genirq/msi: Make interrupt allocation less convoluted
platform-msi: Simplify platform device MSI code
platform-msi: Let core code handle MSI descriptors
bus: fsl-mc-msi: Simplify MSI descriptor handling
soc: ti: ti_sci_inta_msi: Remove ti_sci_inta_msi_domain_free_irqs()
soc: ti: ti_sci_inta_msi: Rework MSI descriptor allocation
NTB/msi: Convert to msi_on_each_desc()
PCI: hv: Rework MSI handling
powerpc/mpic_u3msi: Use msi_for_each-desc()
powerpc/fsl_msi: Use msi_for_each_desc()
powerpc/pasemi/msi: Convert to msi_on_each_dec()
powerpc/cell/axon_msi: Convert to msi_on_each_desc()
powerpc/4xx/hsta: Rework MSI handling
...
misleading/wrong stacktraces and confuse RELIABLE_STACKTRACE and
LIVEPATCH as the backtrace misses the function which is being fixed up.
- Add Straight Light Speculation mitigation support which uses a new
compiler switch -mharden-sls= which sticks an INT3 after a RET or an
indirect branch in order to block speculation after them. Reportedly,
CPUs do speculate behind such insns.
- The usual set of cleanups and improvements
-----BEGIN PGP SIGNATURE-----
iQIyBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmHfKA0ACgkQEsHwGGHe
VUqLJg/2I2X2xXr5filJVaK+sQgmvDzk67DKnbxRBW2xcPF+B5sSW5yhe3G5UPW7
SJVdhQ3gHcTiliGGlBf/VE7KXbqxFN0vO4/VFHZm78r43g7OrXTxz6WXXQRJ1n67
U3YwRH3b6cqXZNFMs+X4bJt6qsGJM1kdTTZ2as4aERnaFr5AOAfQvfKbyhxLe/XA
3SakfYISVKCBQ2RkTfpMpwmqlsatGFhTC5IrvuDQ83dDsM7O+Dx1J6Gu3fwjKmie
iVzPOjCh+xTpZQp/SIZmt7MzoduZvpSym4YVyHvEnMiexQT4AmyaRthWqrhnEXY/
qOvj8/XIqxmix8EaooGqRIK0Y2ZegxkPckNFzaeC3lsWohwMIGIhNXwHNEeuhNyH
yvNGAW9Cq6NeDRgz5MRUXcimYw4P4oQKYLObS1WqFZhNMqm4sNtoEAYpai/lPYfs
zUDckgXF2AoPOsSqy3hFAVaGovAgzfDaJVzkt0Lk4kzzjX2WQiNLhmiior460w+K
0l2Iej58IajSp3MkWmFH368Jo8YfUVmkjbbpsmjsBppA08e1xamJB7RmswI/Ezj6
s5re6UioCD+UYdjWx41kgbvYdvIkkZ2RLrktoZd/hqHrOLWEIiwEbyFO2nRFJIAh
YjvPkB1p7iNuAeYcP1x9Ft9GNYVIsUlJ+hK86wtFCqy+abV+zQ==
=R52z
-----END PGP SIGNATURE-----
Merge tag 'x86_core_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 core updates from Borislav Petkov:
- Get rid of all the .fixup sections because this generates
misleading/wrong stacktraces and confuse RELIABLE_STACKTRACE and
LIVEPATCH as the backtrace misses the function which is being fixed
up.
- Add Straight Line Speculation mitigation support which uses a new
compiler switch -mharden-sls= which sticks an INT3 after a RET or an
indirect branch in order to block speculation after them. Reportedly,
CPUs do speculate behind such insns.
- The usual set of cleanups and improvements
* tag 'x86_core_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (32 commits)
x86/entry_32: Fix segment exceptions
objtool: Remove .fixup handling
x86: Remove .fixup section
x86/word-at-a-time: Remove .fixup usage
x86/usercopy: Remove .fixup usage
x86/usercopy_32: Simplify __copy_user_intel_nocache()
x86/sgx: Remove .fixup usage
x86/checksum_32: Remove .fixup usage
x86/vmx: Remove .fixup usage
x86/kvm: Remove .fixup usage
x86/segment: Remove .fixup usage
x86/fpu: Remove .fixup usage
x86/xen: Remove .fixup usage
x86/uaccess: Remove .fixup usage
x86/futex: Remove .fixup usage
x86/msr: Remove .fixup usage
x86/extable: Extend extable functionality
x86/entry_32: Remove .fixup usage
x86/entry_64: Remove .fixup usage
x86/copy_mc_64: Remove .fixup usage
...
- set_fs removal
- Devicetree support
- Many cleanups from Al
- Various virtio and build related fixes
-----BEGIN PGP SIGNATURE-----
iQJKBAABCAA0FiEEdgfidid8lnn52cLTZvlZhesYu8EFAmHbPpwWHHJpY2hhcmRA
c2lnbWEtc3Rhci5hdAAKCRBm+VmF6xi7wXkcD/9UfDRFqvgtZIlacQ/0IvN24xeq
+f4aXoXEVyVsCd02jv9pUk3IAezIQyJf3MGtNJ4D/UFXtYfjEYjK5kJpPDP6umaZ
ZDnpTzn29HW2aGlgOxW9gU7a3Yze629QasIRP6x7Ht+Hk5eXrvRYrgcmKtw1mm04
SA5v5ZqP3P5r623fpsFiw4Dvl7l6MhDyFeyA2tabNnmv93HgB76PHDtV2Z+SWrC+
ubjlfBQc87QGHW+eTvce+0qw9APMoJpNFjNN4H8P/9VcDTvw+KL2JqQ02HSMWh4z
HeHKsv6hbty+GskBhbaWDW7867fPJ3e08TFAAAjeEiBP/CDBwjOTSr3eOw1eHgzU
xdAqC2Bz0e5G3shClmVEzzvcP6R2cgNZjeBze5m3wQ1NKHEddk6N9t5K+4NrOpgp
gbNN5Q4FAVOBKeQsZWG81bJKGcu7SbShgiKjlxcaRpMyp6LwyD4naauGjmCzYsbf
Pd4ilLO1Yocf7nFs2C4vWxE4iAZ6hfQtukerIxCQfb/Y2BaWT3bcWWYHFRFy6Lq+
hTFGnjf+Ro65QCoa1idaLaUdhwAGi6U9sjjL6G/JdQCCE3ftcXLVA9TJz9CNdMb5
98IGznxhczOZc7rHNXOF4km5+OUrU6N+C0WRp3yOoUWcI+Ms4PXHzqIwC5cde/V7
O/o9O1BAoBP6LE1pPg==
=5J6E
-----END PGP SIGNATURE-----
Merge tag 'for-linus-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml
Pull UML updates from Richard Weinberger:
- set_fs removal
- Devicetree support
- Many cleanups from Al
- Various virtio and build related fixes
* tag 'for-linus-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml: (31 commits)
um: virtio_uml: Allow probing from devicetree
um: Add devicetree support
um: Extract load file helper from initrd.c
um: remove set_fs
hostfs: Fix writeback of dirty pages
um: Use swap() to make code cleaner
um: header debriding - sigio.h
um: header debriding - os.h
um: header debriding - net_*.h
um: header debriding - mem_user.h
um: header debriding - activate_ipi()
um: common-offsets.h debriding...
um, x86: bury crypto_tfm_ctx_offset
um: unexport handle_page_fault()
um: remove a dangling extern of syscall_trace()
um: kill unused cpu()
uml/i386: missing include in barrier.h
um: stop polluting the namespace with registers.h contents
logic_io instance of iounmap() needs volatile on argument
um: move amd64 variant of mmap(2) to arch/x86/um/syscalls_64.c
...
New drivers:
- PMBus driver for MPS Multi-phase mp5023
- PMBus driver for Delta AHE-50DC fan control module
- Driver for NZXT RGB&Fan Controller/Smart Device v2
- Driver for Texas Instruments INA238
- Driver to support X370 Asus WMI
- Driver to support B550 Asus WMI
Other notable changes:
- Cleanup of ntc_thermistor driver, and added support for Samsung 1404-001221 NTC
- Improve detection of LM84, MAX1617, and MAX1617A in adm1021 driver
- Clean up tmp401 driver, and convert to with_info API
- Add support for regulators and IR38060, IR38164 IR38263 to ir38064 PMBus driver
- Add support for AMD Family 19h Models 10h-1Fh and A0h-AFh to k10temp driver
- Add support for F81966 to f71882fg driver
- Add support for ONSEMI N34TS04 to jc42 driver
- Clean up and simplify dell-smm driver
- Add support for ROG STRIX B550-A/X570-I GAMING to nct6775 driver
Various other minor improvements and fixes.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEiHPvMQj9QTOCiqgVyx8mb86fmYEFAmHbsioACgkQyx8mb86f
mYGL2A//RPRWUgnBqzT6dwrhqiRzYFJXbqFPUIEIeBBFTZ/QbyQWazvQYlxOieDm
8Y8NZlgqCP2UJ1pYI7jh7g7n6GFp4stNHgAO9GsGbkQNZYCR7eyRCSxVTB9oW+L4
Cxt1sXclUwcQXi5h3cMZbBTebAcKwUtg81QaIkOBjpfmLzP6PisS0rB6Dy9WjWnF
aXTNH6wHNY/xqOFpUuS5DkS+FT5eu35kP9DDLvF8yHLUdvn1JzwMOVNV/p0H1VZ8
ZmOyhgDqqwcaYJlHIY5R6M5u1sH/r4I+zUgQPzmuzw3CLYSa3C/yhnmagRAqheT2
HguOJClQuSilO49W4w3iiOOOUy6Qf082Mta62RjQT6/+po1FYNQTfYbAyYR1CkMH
sj0c9TvSOzSXpy2W0jTmCGL38VLy54lXMq+xM2o+dVzSKpN3JyuGpBLt730Z25on
gTRNO8DpytLe2EPZ32R5hu1UcoMe1O+X2tCctyyC0RZMwKwWRcC5N3Kvx98ea61L
dkvjQyTzfI8nZoUoh0ML15OhhOTJV10fQt6NB6oA9/iFjPblE9YqU6jeN/2gCTFL
SkDKUO35vD8d6aBl7YtRNSSdjF1wuC22GSPpj47vh4eM3MU9eCL2Qyqd6hksFTFF
wYm9PzvKxmPSlsCvifZj/oKdFODOgG06+7tGZki/gfCv66uJl4U=
=bh5w
-----END PGP SIGNATURE-----
Merge tag 'hwmon-for-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull hwmon updates from Guenter Roeck:
"New drivers:
- PMBus driver for MPS Multi-phase mp5023
- PMBus driver for Delta AHE-50DC fan control module
- Driver for NZXT RGB&Fan Controller/Smart Device v2
- Driver for Texas Instruments INA238
- Driver to support X370 Asus WMI
- Driver to support B550 Asus WMI
Other notable changes:
- Cleanup of ntc_thermistor driver, and added support for Samsung
1404-001221 NTC
- Improve detection of LM84, MAX1617, and MAX1617A in adm1021 driver
- Clean up tmp401 driver, and convert to with_info API
- Add support for regulators and IR38060, IR38164 IR38263 to ir38064
PMBus driver
- Add support for AMD Family 19h Models 10h-1Fh and A0h-AFh to
k10temp driver
- Add support for F81966 to f71882fg driver
- Add support for ONSEMI N34TS04 to jc42 driver
- Clean up and simplify dell-smm driver
- Add support for ROG STRIX B550-A/X570-I GAMING to nct6775 driver
And various other minor improvements and fixes"
* tag 'hwmon-for-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: (49 commits)
hwmon: (nzxt-smart2) make array detect_fans_report static const
hwmon: (xgene-hwmon) Add free before exiting xgene_hwmon_probe
hwmon: (nzxt-smart2) Fix "unused function" warning
hwmon: (dell-smm) Pack the whole smm_regs struct
hwmon: (nct6775) Additional check for ChipID before ASUS WMI usage
hwmon: (mr75203) fix wrong power-up delay value
hwmon/pmbus: (ir38064) Fix spelling mistake "comaptible" -> "compatible"
hwmon/pmbus: (ir38064) Expose a regulator
hwmon/pmbus: (ir38064) Add of_match_table
hwmon/pmbus: (ir38064) Add support for IR38060, IR38164 IR38263
hwmon: add driver for NZXT RGB&Fan Controller/Smart Device v2.
hwmon: (nct6775) add ROG STRIX B550-A/X570-I GAMING
hwmon: (pmbus) Add support for MPS Multi-phase mp5023
dt-bindings: add Delta AHE-50DC fan control module
hwmon: (pmbus) Add Delta AHE-50DC fan control module driver
hwmon: prefix kernel-doc comments for structs with struct
hwmon: (ntc_thermistor) Add Samsung 1404-001221 NTC
hwmon: (ntc_thermistor) Drop OF dependency
hwmon: (dell-smm) Unify i8k_ioctl() and i8k_ioctl_unlocked()
hwmon: (dell-smm) Simplify ioctl handler
...
Some BIOS-es contain a bug where they add addresses which map to system
RAM in the PCI host bridge window returned by the ACPI _CRS method, see
commit 4dc2287c18 ("x86: avoid E820 regions when allocating address
space").
To work around this bug Linux excludes E820 reserved addresses when
allocating addresses from the PCI host bridge window since 2010.
Recently (2019) some systems have shown-up with E820 reservations which
cover the entire _CRS returned PCI bridge memory window, causing all
attempts to assign memory to PCI BARs which have not been setup by the
BIOS to fail. For example here are the relevant dmesg bits from a
Lenovo IdeaPad 3 15IIL 81WE:
[mem 0x000000004bc50000-0x00000000cfffffff] reserved
pci_bus 0000:00: root bus resource [mem 0x65400000-0xbfffffff window]
The ACPI specifications appear to allow this new behavior:
The relationship between E820 and ACPI _CRS is not really very clear.
ACPI v6.3, sec 15, table 15-374, says AddressRangeReserved means:
This range of addresses is in use or reserved by the system and is
not to be included in the allocatable memory pool of the operating
system's memory manager.
and it may be used when:
The address range is in use by a memory-mapped system device.
Furthermore, sec 15.2 says:
Address ranges defined for baseboard memory-mapped I/O devices, such
as APICs, are returned as reserved.
A PCI host bridge qualifies as a baseboard memory-mapped I/O device,
and its apertures are in use and certainly should not be included in
the general allocatable pool, so the fact that some BIOS-es reports
the PCI aperture as "reserved" in E820 doesn't seem like a BIOS bug.
So it seems that the excluding of E820 reserved addresses is a mistake.
Ideally Linux would fully stop excluding E820 reserved addresses,
but then the old systems this was added for will regress.
Instead keep the old behavior for old systems, while ignoring
the E820 reservations for any systems from now on.
Old systems are defined here as BIOS year < 2018, this was chosen to make
sure that E820 reservations will not be used on the currently affected
systems, while at the same time also taking into account that the systems
for which the E820 checking was originally added may have received BIOS
updates for quite a while (esp. CVE related ones), giving them a more
recent BIOS year then 2010.
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=206459
BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1868899
BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1871793
BugLink: https://bugs.launchpad.net/bugs/1878279
BugLink: https://bugs.launchpad.net/bugs/1931715
BugLink: https://bugs.launchpad.net/bugs/1932069
BugLink: https://bugs.launchpad.net/bugs/1921649
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
- Add new P-state driver for AMD processors (Huang Rui).
- Fix initialization of min and max frequency QoS requests in the
cpufreq core (Rafael Wysocki).
- Fix EPP handling on Alder Lake in intel_pstate (Srinivas Pandruvada).
- Make intel_pstate update cpuinfo.max_freq when notified of HWP
capabilities changes and drop a redundant function call from that
driver (Rafael Wysocki).
- Improve IRQ support in the Qcom cpufreq driver (Ard Biesheuvel,
Stephen Boyd, Vladimir Zapolskiy).
- Fix double devm_remap() in the Mediatek cpufreq driver (Hector Yuan).
- Introduce thermal pressure helpers for cpufreq CPU cooling (Lukasz
Luba).
- Make cpufreq use default_groups in kobj_type (Greg Kroah-Hartman).
- Make cpuidle use default_groups in kobj_type (Greg Kroah-Hartman).
- Fix two comments in cpuidle code (Jason Wang, Yang Li).
- Allow model-specific normal EPB value to be used in the intel_epb
sysfs attribute handling code (Srinivas Pandruvada).
- Simplify locking in pm_runtime_put_suppliers() (Rafael Wysocki).
- Add safety net to supplier device release in the runtime PM core
code (Rafael Wysocki).
- Capture device status before disabling runtime PM for it (Rafael
Wysocki).
- Add new macros for declaring PM operations to allow drivers to
avoid guarding them with CONFIG_PM #ifdefs or __maybe_unused and
update some drivers to use these macros (Paul Cercueil).
- Allow ACPI hardware signature to be honoured during restore from
hibernation (David Woodhouse).
- Update outdated operating performance points (OPP) documentation
(Tang Yizhou).
- Reduce log severity for informative message regarding frequency
transition failures in devfreq (Tzung-Bi Shih).
- Add DRAM frequency controller devfreq driver for Allwinner sunXi
SoCs (Samuel Holland).
- Add missing COMMON_CLK dependency to sun8i devfreq driver (Arnd
Bergmann).
- Add support for new layout of Psys PowerLimit Register on SPR to
the Intel RAPL power capping driver (Zhang Rui).
- Fix typo in a comment in idle_inject.c (Jason Wang).
- Remove unused function definition from the DTPM (Dynamit Thermal
Power Management) power capping framework (Daniel Lezcano).
- Reduce DTPM trace verbosity (Daniel Lezcano).
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmHcgkgSHHJqd0Byand5
c29ja2kubmV0AAoJEILEb/54YlRxs34P/3kFhRk7qrwEekx6F11im6caLKT9+Qap
PuGVqfTbK7TupVQDVGFBEjTjgKY7Ph7Fcr4bqn6wvNOp96cjXyOSk/c1fcpS3Bpr
b1PYsFsb9diNKE462sGGYClyCT3X5qQqtpxzOl3g4I1PWKTC1mKFm4Jm2m6S6cFq
DKhsgYKFzQSZNb1wJM4JjHS9c3BRygqp4nfEAmifu5b9tLZf7stWnFHhbGq63M9m
OwHOrEEnzhf4pOXGZTvIXeczgE6IcuDdlGkIg7XMHnmKSNvj1HqhEgi2lfSRb98z
5eI4S6JymCJGVK+gr8iVCq1iJ+LKqV3YPXRqvI35/+NqIKYxMt2ZivQQf5s3aQLe
26gUulD3O6Pz5tMlwcDElD4/tcClfg35PCD/VzpRR8TAo8vLBb63kZ5v6+HM34ZJ
6QbLTNZJTnGmEqxMccUxP+HhZz8ssqpLAC+R2sE5yXbNpIZq8CbPiGb65RGiX3SG
CmRKqH/xQVNKBYP0ChjmUyhKcBxOnx1Xu8AhsN7gRAy0aht7j7OdjTnJuGiX6gu3
Q5WxvVvkekyfhuFQ5TST9y/fzvMJWzeaA6GhVIr6RoBmshNQGTb0H4HXARxS3Ah5
qjd7ao7BFLa898FCHaHIpmFWp0wF5iljwCJQVP3I2qUpPvDJxEtsxc4CF/AZzyNR
VudoFqLoIV5C
=1egI
-----END PGP SIGNATURE-----
Merge tag 'pm-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
"The most signigicant change here is the addition of a new cpufreq
'P-state' driver for AMD processors as a better replacement for the
venerable acpi-cpufreq driver.
There are also other cpufreq updates (in the core, intel_pstate, ARM
drivers), PM core updates (mostly related to adding new macros for
declaring PM operations which should make the lives of driver
developers somewhat easier), and a bunch of assorted fixes and
cleanups.
Summary:
- Add new P-state driver for AMD processors (Huang Rui).
- Fix initialization of min and max frequency QoS requests in the
cpufreq core (Rafael Wysocki).
- Fix EPP handling on Alder Lake in intel_pstate (Srinivas
Pandruvada).
- Make intel_pstate update cpuinfo.max_freq when notified of HWP
capabilities changes and drop a redundant function call from that
driver (Rafael Wysocki).
- Improve IRQ support in the Qcom cpufreq driver (Ard Biesheuvel,
Stephen Boyd, Vladimir Zapolskiy).
- Fix double devm_remap() in the Mediatek cpufreq driver (Hector
Yuan).
- Introduce thermal pressure helpers for cpufreq CPU cooling (Lukasz
Luba).
- Make cpufreq use default_groups in kobj_type (Greg Kroah-Hartman).
- Make cpuidle use default_groups in kobj_type (Greg Kroah-Hartman).
- Fix two comments in cpuidle code (Jason Wang, Yang Li).
- Allow model-specific normal EPB value to be used in the intel_epb
sysfs attribute handling code (Srinivas Pandruvada).
- Simplify locking in pm_runtime_put_suppliers() (Rafael Wysocki).
- Add safety net to supplier device release in the runtime PM core
code (Rafael Wysocki).
- Capture device status before disabling runtime PM for it (Rafael
Wysocki).
- Add new macros for declaring PM operations to allow drivers to
avoid guarding them with CONFIG_PM #ifdefs or __maybe_unused and
update some drivers to use these macros (Paul Cercueil).
- Allow ACPI hardware signature to be honoured during restore from
hibernation (David Woodhouse).
- Update outdated operating performance points (OPP) documentation
(Tang Yizhou).
- Reduce log severity for informative message regarding frequency
transition failures in devfreq (Tzung-Bi Shih).
- Add DRAM frequency controller devfreq driver for Allwinner sunXi
SoCs (Samuel Holland).
- Add missing COMMON_CLK dependency to sun8i devfreq driver (Arnd
Bergmann).
- Add support for new layout of Psys PowerLimit Register on SPR to
the Intel RAPL power capping driver (Zhang Rui).
- Fix typo in a comment in idle_inject.c (Jason Wang).
- Remove unused function definition from the DTPM (Dynamit Thermal
Power Management) power capping framework (Daniel Lezcano).
- Reduce DTPM trace verbosity (Daniel Lezcano)"
* tag 'pm-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (53 commits)
x86, sched: Fix undefined reference to init_freq_invariance_cppc() build error
cpufreq: amd-pstate: Fix Kconfig dependencies for AMD P-State
cpufreq: amd-pstate: Fix struct amd_cpudata kernel-doc comment
cpuidle: use default_groups in kobj_type
x86: intel_epb: Allow model specific normal EPB value
MAINTAINERS: Add AMD P-State driver maintainer entry
Documentation: amd-pstate: Add AMD P-State driver introduction
cpufreq: amd-pstate: Add AMD P-State performance attributes
cpufreq: amd-pstate: Add AMD P-State frequencies attributes
cpufreq: amd-pstate: Add boost mode support for AMD P-State
cpufreq: amd-pstate: Add trace for AMD P-State module
cpufreq: amd-pstate: Introduce the support for the processors with shared memory solution
cpufreq: amd-pstate: Add fast switch function for AMD P-State
cpufreq: amd-pstate: Introduce a new AMD P-State driver to support future processors
ACPI: CPPC: Add CPPC enable register function
ACPI: CPPC: Check present CPUs for determining _CPC is valid
ACPI: CPPC: Implement support for SystemIO registers
x86/msr: Add AMD CPPC MSR definitions
x86/cpufeatures: Add AMD Collaborative Processor Performance Control feature flag
cpufreq: use default_groups in kobj_type
...
core:
- add privacy screen support
- move nomodeset option into drm subsystem
- clean up nomodeset handling in drivers
- make drm_irq.c legacy
- fix stack_depot name conflicts
- remove DMA_BUF_SET_NAME ioctl restrictions
- sysfs: send hotplug event
- replace several DRM_* logging macros with drm_*
- move hashtable to legacy code
- add error return from gem_create_object
- cma-helper: improve interfaces, drop CONFIG_DRM_KMS_CMA_HELPER
- kernel.h related include cleanups
- support XRGB2101010 source buffers
ttm:
- don't include drm hashtable
- stop pruning fences after wait
- documentation updates
dma-buf:
- add dma_resv selftest
- add debugfs helpers
- remove dma_resv_get_excl_unlocked
- documentation
- make fences mandatory in dma_resv_add_excl_fence
dp:
- add link training delay helpers
gem:
- link shmem/cma helpers into separate modules
- use dma_resv iteratior
- import dma-buf namespace into gem helper modules
scheduler:
- fence grab fix
- lockdep fixes
bridge:
- switch to managed MIPI DSI helpers
- register and attach during probe fixes
- convert to YAML in several places.
panel:
- add bunch of new panesl
simpledrm:
- support FB_DAMAGE_CLIPS
- support virtual screen sizes
- add Apple M1 support
amdgpu:
- enable seamless boot for DCN 3.01
- runtime PM fixes
- use drm_kms_helper_connector_hotplug_event
- get all fences at once
- use generic drm fb helpers
- PSR/DPCD/LTTPR/DSC/PM/RAS/OLED/SRIOV fixes
- add smart trace buffer (STB) for supported GPUs
- display debugfs entries
- new SMU debug option
- Documentation update
amdkfd:
- IP discovery enumeration refactor
- interface between driver fixes
- SVM fixes
- kfd uapi header to define some sysfs bitfields.
i915:
- support VESA panel backlights
- enable ADL-P by default
- add eDP privacy screen support
- add Raptor Lake S (RPL-S) support
- DG2 page table support
- lots of GuC/HuC fw refactoring
- refactored i915->gt interfaces
- CD clock squashing support
- enable 10-bit gamma support
- update ADL-P DMC fw to v2.14
- enable runtime PM autosuspend by default
- ADL-P DSI support
- per-lane DP drive settings for ICL+
- add support for pipe C/D DMC firmware
- Atomic gamma LUT updates
- remove CCS FB stride restrictions on ADL-P
- VRR platform support for display 11
- add support for display audio codec keepalive
- lots of display refactoring
- fix runtime PM handling during PXP suspend
- improved eviction performance with async TTM moves
- async VMA unbinding improvements
- VMA locking refactoring
- improved error capture robustness
- use per device iommu checks
- drop bits stealing from i915_sw_fence function ptr
- remove dma_resv_prune
- add IC cache invalidation on DG2
nouveau:
- crc fixes
- validate LUTs in atomic check
- set HDMI AVI RGB quant to full
tegra:
- buffer objects reworks for dma-buf compat
- NVDEC driver uAPI support
- power management improvements
etnaviv:
- IOMMU enabled system support
- fix > 4GB command buffer mapping
- close a DoS vector
- fix spurious GPU resets
ast:
- fix i2c initialization
rcar-du:
- DSI output support
exynos:
- replace legacy gpio interface
- implement generic GEM object mmap
msm:
- dpu plane state cleanup in prep for multirect
- dpu debugfs cleanups
- dp support for sc7280
- a506 support
- removal of struct_mutex
- remove old eDP sub-driver
anx7625:
- support MIPI DSI input
- support HDMI audio
- fix reading EDID
lvds:
- fix bridge DT bindings
megachips:
- probe both bridges before registering
dw-hdmi:
- allow interlace on bridge
ps8640:
- enable runtime PM
- support aux-bus
tx358768:
- enable reference clock
- add pulse mode support
ti-sn65dsi86:
- use regmap bulk write
- add PWM support
etnaviv:
- get all fences at once
gma500:
- gem object cleanups
kmb:
- enable fb console
radeon:
- use dma_resv_wait_timeout
rockchip:
- add DSP hold timeout
- suspend/resume fixes
- PLL clock fixes
- implement mmap in GEM object functions
- use generic fbdev emulation
sun4i:
- use CMA helpers without vmap support
vc4:
- fix HDMI-CEC hang with display is off
- power on HDMI controller while disabling
- support 4K@60Hz modes
- support 10-bit YUV 4:2:0 output
vmwgfx:
- fix leak on probe errors
- fail probing on broken hosts
- new placement for MOB page tables
- hide internal BOs from userspace
- implement GEM support
- implement GL 4.3 support
virtio:
- overflow fixes
xen:
- implement mmap as GEM object function
omapdrm:
- fix scatterlist export
- support virtual planes
mediatek:
- MT8192 support
- CMDQ refinement
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmHX1vMACgkQDHTzWXnE
hr6rAw/9ES5RO5N3Ku9foFk1CI9bqy1Kh663KLkkEc+rDdhKpiZbBnAsrKkZ9sGu
fNuHmWNN5nWXtDSOqHWuslt3F7Gh+qEBQtlkqC9mZsBm3bWB0aJK6E4QaJxfeSaK
ta6AmyGx8DaV+C69i86dnemQurYSDVjROd7LDPKnCU0Fye/JxiXSXQmXksKMFVxd
x5vmO9yfeDSg3EF+u1yB6nJNUYZBV0vhrAfjPqxPCRBXuQc7akuaglE/SFwlGnEk
vn0GjVHEQcRTqYKrHr64xvQxIoKXcJP0pkDUyT7KYCsyj8GJkvxkb7/ls5pp5DvL
SwyNg3J3vwUVP6w6GEvzf3ffG720qqUZvCbvLmE+A/t2DhGILiAm+HXSo43PTOW8
uagT7Gxma8dy8EovjSxioS9HPX8Gcu+S+XYavgOsevOZ7oeEt4f4TLW7LXsw9d6y
75FrMhiUpreab5hAh8Le0swuLYZHjdnJRdjSTqZJ/T6VdTdVftLT6IfwvSDx5CHy
cWuufgcAjd7xVTXFquHWYXWLTQkiSMGf1M02jx9IWolTd4Cm41LNBhqMEDHZLHJD
7ngGgoaREVDQ+MqjG90yfIwJFIpJPI3YOaHLi/Kznga+iDzFY6cyOQWW2vX7ZdY5
7+LJWsgGT8Feb7/bzD5hX1mYqJLxh1pWUIaqIKMl+7LJL7gTVU8=
=MByd
-----END PGP SIGNATURE-----
Merge tag 'drm-next-2022-01-07' of git://anongit.freedesktop.org/drm/drm
Pull drm updates from Dave Airlie:
"Highlights are support for privacy screens found in new laptops, a
bunch of nomodeset refactoring, and i915 enables ADL-P systems by
default, while starting to add RPL-S support.
vmwgfx adds GEM and support for OpenGL 4.3 features in userspace.
Lots of internal refactorings around dma reservations, and lots of
driver refactoring as well.
Summary:
core:
- add privacy screen support
- move nomodeset option into drm subsystem
- clean up nomodeset handling in drivers
- make drm_irq.c legacy
- fix stack_depot name conflicts
- remove DMA_BUF_SET_NAME ioctl restrictions
- sysfs: send hotplug event
- replace several DRM_* logging macros with drm_*
- move hashtable to legacy code
- add error return from gem_create_object
- cma-helper: improve interfaces, drop CONFIG_DRM_KMS_CMA_HELPER
- kernel.h related include cleanups
- support XRGB2101010 source buffers
ttm:
- don't include drm hashtable
- stop pruning fences after wait
- documentation updates
dma-buf:
- add dma_resv selftest
- add debugfs helpers
- remove dma_resv_get_excl_unlocked
- documentation
- make fences mandatory in dma_resv_add_excl_fence
dp:
- add link training delay helpers
gem:
- link shmem/cma helpers into separate modules
- use dma_resv iteratior
- import dma-buf namespace into gem helper modules
scheduler:
- fence grab fix
- lockdep fixes
bridge:
- switch to managed MIPI DSI helpers
- register and attach during probe fixes
- convert to YAML in several places.
panel:
- add bunch of new panesl
simpledrm:
- support FB_DAMAGE_CLIPS
- support virtual screen sizes
- add Apple M1 support
amdgpu:
- enable seamless boot for DCN 3.01
- runtime PM fixes
- use drm_kms_helper_connector_hotplug_event
- get all fences at once
- use generic drm fb helpers
- PSR/DPCD/LTTPR/DSC/PM/RAS/OLED/SRIOV fixes
- add smart trace buffer (STB) for supported GPUs
- display debugfs entries
- new SMU debug option
- Documentation update
amdkfd:
- IP discovery enumeration refactor
- interface between driver fixes
- SVM fixes
- kfd uapi header to define some sysfs bitfields.
i915:
- support VESA panel backlights
- enable ADL-P by default
- add eDP privacy screen support
- add Raptor Lake S (RPL-S) support
- DG2 page table support
- lots of GuC/HuC fw refactoring
- refactored i915->gt interfaces
- CD clock squashing support
- enable 10-bit gamma support
- update ADL-P DMC fw to v2.14
- enable runtime PM autosuspend by default
- ADL-P DSI support
- per-lane DP drive settings for ICL+
- add support for pipe C/D DMC firmware
- Atomic gamma LUT updates
- remove CCS FB stride restrictions on ADL-P
- VRR platform support for display 11
- add support for display audio codec keepalive
- lots of display refactoring
- fix runtime PM handling during PXP suspend
- improved eviction performance with async TTM moves
- async VMA unbinding improvements
- VMA locking refactoring
- improved error capture robustness
- use per device iommu checks
- drop bits stealing from i915_sw_fence function ptr
- remove dma_resv_prune
- add IC cache invalidation on DG2
nouveau:
- crc fixes
- validate LUTs in atomic check
- set HDMI AVI RGB quant to full
tegra:
- buffer objects reworks for dma-buf compat
- NVDEC driver uAPI support
- power management improvements
etnaviv:
- IOMMU enabled system support
- fix > 4GB command buffer mapping
- close a DoS vector
- fix spurious GPU resets
ast:
- fix i2c initialization
rcar-du:
- DSI output support
exynos:
- replace legacy gpio interface
- implement generic GEM object mmap
msm:
- dpu plane state cleanup in prep for multirect
- dpu debugfs cleanups
- dp support for sc7280
- a506 support
- removal of struct_mutex
- remove old eDP sub-driver
anx7625:
- support MIPI DSI input
- support HDMI audio
- fix reading EDID
lvds:
- fix bridge DT bindings
megachips:
- probe both bridges before registering
dw-hdmi:
- allow interlace on bridge
ps8640:
- enable runtime PM
- support aux-bus
tx358768:
- enable reference clock
- add pulse mode support
ti-sn65dsi86:
- use regmap bulk write
- add PWM support
etnaviv:
- get all fences at once
gma500:
- gem object cleanups
kmb:
- enable fb console
radeon:
- use dma_resv_wait_timeout
rockchip:
- add DSP hold timeout
- suspend/resume fixes
- PLL clock fixes
- implement mmap in GEM object functions
- use generic fbdev emulation
sun4i:
- use CMA helpers without vmap support
vc4:
- fix HDMI-CEC hang with display is off
- power on HDMI controller while disabling
- support 4K@60Hz modes
- support 10-bit YUV 4:2:0 output
vmwgfx:
- fix leak on probe errors
- fail probing on broken hosts
- new placement for MOB page tables
- hide internal BOs from userspace
- implement GEM support
- implement GL 4.3 support
virtio:
- overflow fixes
xen:
- implement mmap as GEM object function
omapdrm:
- fix scatterlist export
- support virtual planes
mediatek:
- MT8192 support
- CMDQ refinement"
* tag 'drm-next-2022-01-07' of git://anongit.freedesktop.org/drm/drm: (1241 commits)
drm/amdgpu: no DC support for headless chips
drm/amd/display: fix dereference before NULL check
drm/amdgpu: always reset the asic in suspend (v2)
drm/amdgpu: put SMU into proper state on runpm suspending for BOCO capable platform
drm/amd/display: Fix the uninitialized variable in enable_stream_features()
drm/amdgpu: fix runpm documentation
amdgpu/pm: Make sysfs pm attributes as read-only for VFs
drm/amdgpu: save error count in RAS poison handler
drm/amdgpu: drop redundant semicolon
drm/amd/display: get and restore link res map
drm/amd/display: support dynamic HPO DP link encoder allocation
drm/amd/display: access hpo dp link encoder only through link resource
drm/amd/display: populate link res in both detection and validation
drm/amd/display: define link res and make it accessible to all link interfaces
drm/amd/display: 3.2.167
drm/amd/display: [FW Promotion] Release 0.0.98
drm/amd/display: Undo ODM combine
drm/amd/display: Add reg defs for DCN303
drm/amd/display: Changed pipe split policy to allow for multi-display pipe split
drm/amd/display: Set optimize_pwr_state for DCN31
...
Pull random number generator updates from Jason Donenfeld:
"These a bit more numerous than usual for the RNG, due to folks
resubmitting patches that had been pending prior and generally renewed
interest.
There are a few categories of patches in here:
1) Dominik Brodowski and I traded a series back and forth for a some
weeks that fixed numerous issues related to seeds being provided
at extremely early boot by the firmware, before other parts of the
kernel or of the RNG have been initialized, both fixing some
crashes and addressing correctness around early boot randomness.
One of these is marked for stable.
2) I replaced the RNG's usage of SHA-1 with BLAKE2s in the entropy
extractor, and made the construction a bit safer and more
standard. This was sort of a long overdue low hanging fruit, as we
were supposed to have phased out SHA-1 usage quite some time ago
(even if all we needed here was non-invertibility). Along the way
it also made extraction 131% faster. This required a bit of
Kconfig and symbol plumbing to make things work well with the
crypto libraries, which is one of the reasons why I'm sending you
this pull early in the cycle.
3) I got rid of a truly superfluous call to RDRAND in the hot path,
which resulted in a whopping 370% increase in performance.
4) Sebastian Andrzej Siewior sent some patches regarding PREEMPT_RT,
the full series of which wasn't ready yet, but the first two
preparatory cleanups were good on their own. One of them touches
files in kernel/irq/, which is the other reason why I'm sending
you this pull early in the cycle.
5) Other assorted correctness fixes from Eric Biggers, Jann Horn,
Mark Brown, Dominik Brodowski, and myself"
* 'random-5.17-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random:
random: don't reset crng_init_cnt on urandom_read()
random: avoid superfluous call to RDRAND in CRNG extraction
random: early initialization of ChaCha constants
random: use IS_ENABLED(CONFIG_NUMA) instead of ifdefs
random: harmonize "crng init done" messages
random: mix bootloader randomness into pool
random: do not throw away excess input to crng_fast_load
random: do not re-init if crng_reseed completes before primary init
random: fix crash on multiple early calls to add_bootloader_randomness()
random: do not sign extend bytes for rotation when mixing
random: use BLAKE2s instead of SHA1 in extraction
lib/crypto: blake2s: include as built-in
random: fix data race on crng init time
random: fix data race on crng_node_pool
irq: remove unused flags argument from __handle_irq_event_percpu()
random: remove unused irq_flags argument from add_interrupt_randomness()
random: document add_hwgenerator_randomness() with other input functions
MAINTAINERS: add git tree for random.c
arch/x86/ to amd64_edac as that is its only user anyway
- Some MCE error injection improvements to the AMD side
- Reorganization of the #MC handler code and the facilities it calls to
make it noinstr-safe
- Add support for new AMD MCA bank types and non-uniform banks layout
- The usual set of cleanups and fixes
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmHcGZ4ACgkQEsHwGGHe
VUr6Zw//WBvNvfV/akQGsvVo94G0DaF+buYB+Tl1p0goMd7QfKA5iHxjB1alEJC2
dTchIr7pjiiE3nr4svuWLLQZamx8kMwQNqipioBHXg3YThj0wD4PbUOC9TlIceBR
3yxVbvwlD7Y7sb2PII6IMlagzTiIeW0/ps29DHFr5vqDBvEanNdAHoV/h2vQi+76
Ma96psIxzTMSk11yGB6l9k66EASCdDGBU7sODjup7wuQmuRaQ/1oJAWY0wIJvJez
frjpaz/YKmlTwTf9bxoJbky2FkeBsD4yXXUGwjDgMq0EyUUaeSbvaQkm8gSHX9Yr
VDDv1WvT6QIw6x7Wc4skS8lWmZghNBbAHOoNS31BPJ2IDmFWkF5Q2bNEuHrtU4EC
0mkNeyN6x48L/F8j/1aE/tm+SjiGexZX4zhi6MNWReTV140I1zqQq/r7CCu5+MEa
PAB1YH/96k2dMPT6mbFrRIFJmkDuBuZOAkuwYWEjO/XjPl2SGBGj1jKolWW3qjRR
Po7vBJnDt7wgigWFh6+R4rJv+fh87XfB7B2wEOt4Yn37jUkK6dNRIy0zFmDaC1J2
bHgsJbWC+Sgs1G57gnYABJYzLj7RRdDyCu1/UUVyBBP7/WfZJw0kjABE7p3AaYTd
15JV1L0c/Ypuv05LJf40LkyF2F5w2fnP5QM2Rr8U4xW/GumEyWs=
=8Hu7
-----END PGP SIGNATURE-----
Merge tag 'ras_core_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RAS updates from Borislav Petkov:
"A relatively big amount of movements in RAS-land this time around:
- First part of a series to move the AMD address translation code
from arch/x86/ to amd64_edac as that is its only user anyway
- Some MCE error injection improvements to the AMD side
- Reorganization of the #MC handler code and the facilities it calls
to make it noinstr-safe
- Add support for new AMD MCA bank types and non-uniform banks layout
- The usual set of cleanups and fixes"
* tag 'ras_core_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
x86/mce: Reduce number of machine checks taken during recovery
x86/mce/inject: Avoid out-of-bounds write when setting flags
x86/MCE/AMD, EDAC/mce_amd: Support non-uniform MCA bank type enumeration
x86/MCE/AMD, EDAC/mce_amd: Add new SMCA bank types
x86/mce: Check regs before accessing it
x86/mce: Mark mce_start() noinstr
x86/mce: Mark mce_timed_out() noinstr
x86/mce: Move the tainting outside of the noinstr region
x86/mce: Mark mce_read_aux() noinstr
x86/mce: Mark mce_end() noinstr
x86/mce: Mark mce_panic() noinstr
x86/mce: Prevent severity computation from being instrumented
x86/mce: Allow instrumentation during task work queueing
x86/mce: Remove noinstr annotation from mce_setup()
x86/mce: Use mce_rdmsrl() in severity checking code
x86/mce: Remove function-local cpus variables
x86/mce: Do not use memset to clear the banks bitmaps
x86/mce/inject: Set the valid bit in MCA_STATUS before error injection
x86/mce/inject: Check if a bank is populated before injecting
x86/mce: Get rid of cpu_missing
...
accesing it in order to prevent any potential data races, and convert
all users to those new accessors
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmHcgFoACgkQEsHwGGHe
VUqXeRAAvcNEfFw6BvXeGfFTxKmOrsRtu2WCkAkjvamyhXMCrjBqqHlygLJFCH5i
2mc6HBohzo4vBFcgi3R5tVkGazqlthY1KUM9Jpk7rUuUzi0phTH7n/MafZOm9Es/
BHYcAAyT/NwZRbCN0geccIzBtbc4xr8kxtec7vkRfGDx8B9/uFN86xm7cKAaL62G
UDs0IquDPKEns3A7uKNuvKztILtuZWD1WcSkbOULJzXgLkb+cYKO1Lm9JK9rx8Ds
8tjezrJgOYGLQyyv0i3pWelm3jCZOKUChPslft0opvVUbrNd8piehvOm9CWopHcB
QsYOWchnULTE9o4ZAs/1PkxC0LlFEWZH8bOLxBMTDVEY+xvmDuj1PdBUpncgJbOh
dunHzsvaWproBSYUXA9nKhZWTVGl+CM8Ks7jXjl3IPynLd6cpYZ/5gyBVWEX7q3e
8htG95NzdPPo7doxMiNSKGSmSm0Np1TJ/i89vsYeGfefsvsq53Fyjhu7dIuTWHmU
2YUe6qHs6dF9x1bkHAAZz6T9Hs4BoGQBcXUnooT9JbzVdv2RfTPsrawdu8dOnzV1
RhwCFdFcll0AIEl0T9fCYzUI/Ga8ZS0roXs5NZ4wl0lwr0BGFwiU8WC1FUdGsZo9
0duaa0Tpv0OWt6rIMMB/E9QsqCDsQ4CMHuQpVVw+GOO5ux9kMms=
=v6Xn
-----END PGP SIGNATURE-----
Merge tag 'core_entry_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull thread_info flag accessor helper updates from Borislav Petkov:
"Add a set of thread_info.flags accessors which snapshot it before
accesing it in order to prevent any potential data races, and convert
all users to those new accessors"
* tag 'core_entry_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
powerpc: Snapshot thread flags
powerpc: Avoid discarding flags in system_call_exception()
openrisc: Snapshot thread flags
microblaze: Snapshot thread flags
arm64: Snapshot thread flags
ARM: Snapshot thread flags
alpha: Snapshot thread flags
sched: Snapshot thread flags
entry: Snapshot thread flags
x86: Snapshot thread flags
thread_info: Add helpers to snapshot thread flags
copy_user_enhanced_fast_string()
- Avoid writing MSR_CSTAR on Intel due to TDX guests raising a #VE trap
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmHcFRcACgkQEsHwGGHe
VUrVYRAAg8hJS/aIMnqr+CDX+iOlx2hxJ2TA2bA45NwWc1A4VTt9kwRB0+NIKkjj
F3uJbidZjxSch9Oza6O5KyjJK8QtOfqxyYcx8TLjSleqJRoJWxl1Ub1/yAfKIX/0
QsqXVc/OuMzgwVGYLUwGSWifJOWMYKy03vSczmXK74zp9vZ56fdot8rOhDm3Xb/R
QSfT5nKlgCvxbvAqgFfbXKoEu/EqT43sTXq4o1C6yDX/G6JOGe6nXZIAvIVm3iKZ
utOqO+tBOmektF/yg3EHZL/7paFgtfETcI1YpmPYqKhG3KvvZgm7yyU6SqrcctSx
vMSPTcgcuZl2I5OF+eesUGfGGhHSfSPBAhkxpCTOb6lHf73PYRC3BnQtlQkQt6g/
UOtm3fQwrVJcKlMu7nem46iDCgbSyvASFa5ZyuOGcrAiFLhJzQNRDlXLpxp/q615
yOYTRgj4YS6vomzc6bL3zNCcF5aJUwAPNVghe3l2zwKXetoOPvtWX8sKlYjiN3GW
DTtEi117IAiWkosDIYY+aFNxLeOqxpNMcOkwd5eHHdpR3rkeFkjOtBctll/eHzPi
NYx++cV5yYW0z4S2uRr6o4k4hdgAQU/p7xhdO28Z+yzWpmXQ//79HhiOf2nNd1iI
dpQAx9roo8vbR3JYLxGYFuJrZsHna+/f6Gqf5teUy7SjVL5M95U=
=zbYM
-----END PGP SIGNATURE-----
Merge tag 'x86_cpu_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cpuid updates from Borislav Petkov:
- Enable the short string copies for CPUs which support them, in
copy_user_enhanced_fast_string()
- Avoid writing MSR_CSTAR on Intel due to TDX guests raising a #VE trap
* tag 'x86_cpu_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/lib: Add fast-short-rep-movs check to copy_user_enhanced_fast_string()
x86/cpu: Don't write CSTAR MSR on Intel CPUs
to use it in SEV and TDX guests
- An include fix and reorg to allow for removing set_fs in UML later
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmHcEl4ACgkQEsHwGGHe
VUqFTQ/+K9Kb6X0+r7wBSRTeAIWaYewmgOdf+7rpFVyFqQtNecKbuSAWGgFnEHc8
8HUB/krNa+odtx7mAy73wNALUaPmR0KUg6O+YKrvT6LHt8DLlGl5u0g/hihzFdAB
PW7auuxqt9TvK1i8PkYAI+W7t93o4mw4LzgDCVvoLPQUutRZEV1gHRht8Tn8SjaN
3EmEiazpFDrXNGWl/3rnS0qIyvtiZu7KNtibE6ljbUgse9cgxOt733mykH6eO9RJ
hXOfewKML72UxmgWig01pElgLaXeYI5rpSoG7usm4FwwYh+tmBIA8S/EoeE24gn0
e82lxwRCcHjqUDRp2//gz16sYhs//K6bcViT/4FtnL33e2CjK2/J4MwHPn9zgimO
VvxSdAes7UFiA/gDIomFt3gJij+hfy4TGKg5d3326Nm9rsQLpxg49WkozYJZ8m/f
75VVlC4BAj9SnYLQYhSm9buF7pIXmfwN3yWkYJsebl18C6/4FXLLomiqOgWpo3mG
D0e+CXhLZsEaU5NTiVuaPySzjtpRUzmfWf3S9GifJZex0rX+et7+mqIuC92aHbtD
Dc+nNFX/D77Fq8Uoe8bIEt8QsnjdACov1TI/S8h2rSjt5R/Lyg73qh0CpN0jtQ+S
9dUooJWwE4RXnuVMpFq/Xea/BYj1lQ72kMeyFiCNc0/hnzYhZNM=
=NBcE
-----END PGP SIGNATURE-----
Merge tag 'x86_misc_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull misc x86 updates from Borislav Petkov:
"The pile which we cannot find the proper topic for so we stick it in
x86/misc:
- Add support for decoding instructions which do MMIO accesses in
order to use it in SEV and TDX guests
- An include fix and reorg to allow for removing set_fs in UML later"
* tag 'x86_misc_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mtrr: Remove the mtrr_bp_init() stub
x86/sev-es: Use insn_decode_mmio() for MMIO implementation
x86/insn-eval: Introduce insn_decode_mmio()
x86/insn-eval: Introduce insn_get_modrm_reg_ptr()
x86/insn-eval: Handle insn_get_opcode() failure
pagetable to prevent any stale entries' presence
- Flush global mappings from the TLB, in addition to the CR3-write,
after switching off of the trampoline_pgd during boot to clear the
identity mappings
- Prevent instrumentation issues resulting from the above changes
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmHcEGIACgkQEsHwGGHe
VUp14RAAgo6BbW9J82Pyl55egIhcQDdGsa16Gdm9S/AFIIW/NhwYo9ydrgtzr/70
3XKpJYX7nH7PUKYRmoca/m3NnzUU+wnjSGS1XMyB3bJvn2/8S1qeuwBty2VP2dYM
iS2eGRLjVjbMWwQUSK7tPJa5wi11zUqLIyCe3t0YiWso6TK7xKaVJTQ3/19Xc+/a
zVQ5VpmzglUTxA6xGCvTDn5IUViUb8QmIuw7Ty6QtQEoI6T3qQvPkdJNXOxDcHNy
9gDGf4O+5YlPCxYsNEkWDDa02zSZ2aWFSq76b98VyMiOK0xts+ktnAwq6oes+as9
ZLIipOu5aIkj8te7he0FelyvPhZAVzrFvvmMf1U+EV3PqbyVkabhk5SBeP5v8CZy
bM4eYNuJ2FLvFpUCC9zQ/MNVQ6ZtxN15rrrsTqk46KLPBHmHp/Aj9W/DP4zpCcNg
Wwh4xbnGNIN8jZBiBJG6R6q7oM/lZt/loEicxm2QFZHtAIYMsiUmE99HnIREjUHd
+0mwo2rHniie9zh6GoybX8OcbZCLYGdfe3iPvlO9fQpyDTn8IUIlnruDlUiTBMDM
fX4J2dynh7xXRH1WW+MwxDv4n400+C08SG9zTD0qPCbGhYwNscMlZhA2JN6mlPep
spuRPOzzwUUxqjXkDloeDDJNUQ8r032OB2LMhWSbLApJrJM9/QA=
=cM+z
-----END PGP SIGNATURE-----
Merge tag 'x86_mm_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mm updates from Borislav Petkov:
- Flush *all* mappings from the TLB after switching to the trampoline
pagetable to prevent any stale entries' presence
- Flush global mappings from the TLB, in addition to the CR3-write,
after switching off of the trampoline_pgd during boot to clear the
identity mappings
- Prevent instrumentation issues resulting from the above changes
* tag 'x86_mm_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm: Prevent early boot triple-faults with instrumentation
x86/mm: Include spinlock_t definition in pgtable.
x86/mm: Flush global TLB when switching to trampoline page-table
x86/mm/64: Flush global TLB on boot and AP bringup
x86/realmode: Add comment for Global bit usage in trampoline_pgd
x86/mm: Add missing <asm/cpufeatures.h> dependency to <asm/page_64.h>
from poison memory and error injection into SGX pages
- A bunch of changes to the SGX selftests to simplify and allow of SGX
features testing without the need of a whole SGX software stack
- Add a sysfs attribute which is supposed to show the amount of SGX
memory in a NUMA node, similar to what /proc/meminfo is to normal
memory
- The usual bunch of fixes and cleanups too
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmHcDQMACgkQEsHwGGHe
VUq42xAAjWM0AFpIxgUBpbE0swV3ZMulnndl3/vA5XN+9Yn7Q52+AFyPRE0s7Zam
Ap+cInh2Il7d/sv54rZ4x/j7+TH4i7s8fWPVU/XiPALQuOuw0/B1wJJ+jmMiPFiU
3jr7DkUPyWjWTHduMY/tk+xMOpkx1XsxJheYnKvsKVW+fjJ0vPuftAZtfu2z2VOh
3JLcp5cAXPxW0UK9gdoF5bCBQhBu0NRguTbhHhbByAixQO2GyVSKLSRovUdj0a+y
QRrQ6hgcvpTOsVHJoWJ7yIX4SBzQTe9Bg6dT9DghOxE4Sc2GH89hu7wRztGawBJO
nLyzWgiW9ttjQutDpBvZANNVcFAPAdtDWczrzZpREbrGKkzT+kOBnIIL1LWITWOy
2YWTO3ytW0KNIK85GzMjSVOKRMgaHJeBaGuYZ7Z0kb3GuUPJ9zRlaRxNapKQFuzA
0PGoA4IDT+2Afy7VYBBNUA2d/WverFQuXKusSxK6b5zJ173o5/DXL2q0d3gn/j8Z
hhxJUJyVOsfRXSG4NKrj4se4FiA0n/RL4oyUZR9iJ8kWzzZTd0eZTAn468bpGIp5
yiOlPOLgsmu0xzVmAtG1+4d2+S2x+Ec5YE0sP1V/JLNciYk3Ebp7UyfnS3tn33Xc
cpdWjELvD1LJVpMEURnbjRrwU6OiiAekYJCP/9lmK9zfOGpwRHc=
=vFTM
-----END PGP SIGNATURE-----
Merge tag 'x86_sgx_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 SGX updates from Borislav Petkov:
- Add support for handling hw errors in SGX pages: poisoning,
recovering from poison memory and error injection into SGX pages
- A bunch of changes to the SGX selftests to simplify and allow of SGX
features testing without the need of a whole SGX software stack
- Add a sysfs attribute which is supposed to show the amount of SGX
memory in a NUMA node, similar to what /proc/meminfo is to normal
memory
- The usual bunch of fixes and cleanups too
* tag 'x86_sgx_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
x86/sgx: Fix NULL pointer dereference on non-SGX systems
selftests/sgx: Fix corrupted cpuid macro invocation
x86/sgx: Add an attribute for the amount of SGX memory in a NUMA node
x86/sgx: Fix minor documentation issues
selftests/sgx: Add test for multiple TCS entry
selftests/sgx: Enable multiple thread support
selftests/sgx: Add page permission and exception test
selftests/sgx: Rename test properties in preparation for more enclave tests
selftests/sgx: Provide per-op parameter structs for the test enclave
selftests/sgx: Add a new kselftest: Unclobbered_vdso_oversubscribed
selftests/sgx: Move setup_test_encl() to each TEST_F()
selftests/sgx: Encpsulate the test enclave creation
selftests/sgx: Dump segments and /proc/self/maps only on failure
selftests/sgx: Create a heap for the test enclave
selftests/sgx: Make data measurement for an enclave segment optional
selftests/sgx: Assign source for each segment
selftests/sgx: Fix a benign linker warning
x86/sgx: Add check for SGX pages to ghes_do_memory_failure()
x86/sgx: Add hook to error injection address validation
x86/sgx: Hook arch_memory_failure() into mainline code
...
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmHcC4cACgkQEsHwGGHe
VUoKAw/5AYM6etR5PHHOkNHycaT7fzZYVhvHbW9FWdRh7rlLlQ2A/khe//hkidOY
auXcc1wy0UfJN/MvoKlZyXpbTzj3HdnuNTaNf24akhKAM9HpC5oYzAxGaZtF9OvD
NpkYC1sFWxF1StvRK5iS2Ns80eL0SU1o7VOeKR6BaabBBiuE9Nq9AAuVddNIqWkW
k5wwjkttBea4wSMUt3kZbac5mncwV5o2vF23db6j6vvsG6lVo+xT/zcKzQk0z/bu
NZsdAqKZwzubV+3zksxiQjsPpy4kS7OGVlPCiNGX1zCRZtOX6wpNUzOcj6S4TDju
FIlnD2zqK9mylBomBDq0n/GWJavMNttojO6u9iwJm3v6bteXb5dT3Cu8OCp6J7B+
/q4t79hwleELVQ5z60Wz8R9oPi2jR/YKjSVW1w5X1yxxJPUIDU905ZF7r50BxeA+
QQK0kn+EkJvOahfwK9RwhlaexSilYB9TBum5E70yzBefp3mbMGwwu/60UkjydrEP
SoD/qlY5vatLjuYMLx/Pe93aE0nK0HOfbZKpgvnxzdm9pxQs+GogXGW5CmutfpHo
9Fv8wH0bAlFnltmylXg+zdJ0hggRlQS97bM/FKJXq4qhj3NgEQ1uPhfGSe/CmnTe
t6TB2dXPPN9+2hnbnZDNKRygVPUnRklnnVweSP7ij2Csksiqj7A=
=eG29
-----END PGP SIGNATURE-----
Merge tag 'x86_cache_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 resource control fixlet from Borislav Petkov:
"A minor code cleanup removing a redundant assignment"
* tag 'x86_cache_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/resctrl: Remove redundant assignment to variable chunks
- Cleanups and generalzation of code shared by SEV and TDX
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmHcCf4ACgkQEsHwGGHe
VUr5GBAAiG5FuPmNBk+9WE/nQLfk/O9J7DJTY3CXuMxDVbCN3x9qZs2PHWwukZ18
XX6NV+OlD6ZhkHTx28WsmoY6b6rePAXBYH0r1kSCwqJ6qujbi81Mmg8jSJK/xsb3
JjXbqt92TSbTcRPZ27Xqz81mjrifR7iTF7wU3o3tUdCfDmnfiaS4DmOJ/bwd1Vvl
wVsjn+zBhuzbQHZtGJuJm4geYAgQNLwh3lPK6ad+V6PmQKXS6QRpbDsxqD5+ZqnT
nfNND8+lEov3DVzvSHgAN6VsA63kfx7gU9PFKxNOLjgz6TN0fmAq/tu50HLh2X9V
tcpgb4SywSifha3sDKSqPzDILIY7+L1S2YBcVt2QYpIYzahmEd6aKkMQ8AZINcXQ
kV6Xrm8FOOWA3+uhqi4XBlIZ5OJ8O47UZszjSN/j1COqWnxiDVF8P+QfmEEJPs8C
BIkgwhvQM5dLR/rRArBieEIe/mgYPKjUeQCfiUhxBOMkZamDvWBQtv4wr/2CdImH
b0Tu2sPkyBDLsv8sxK5LSmzxWpvJJGopu8Aqvu2SOrcUw9M4rppRq/nAfpr2YZw4
AJi4CKEQsLxsiaQQ2duWXQYnWInvAySDJgReDahdAZj2TWSgHNhCiGdEmdzMfTUY
ToRtczTjHd16TxRDgw3JCi2XG8hneJdAK/Kbp6P1AONEZSNWyT8=
=9EUd
-----END PGP SIGNATURE-----
Merge tag 'x86_sev_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 SEV updates from Borislav Petkov:
"The accumulated pile of x86/sev generalizations and cleanups:
- Share the SEV string unrolling logic with TDX as TDX guests need it
too
- Cleanups and generalzation of code shared by SEV and TDX"
* tag 'x86_sev_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/sev: Move common memory encryption code to mem_encrypt.c
x86/sev: Rename mem_encrypt.c to mem_encrypt_amd.c
x86/sev: Use CC_ATTR attribute to generalize string I/O unroll
x86/sev: Remove do_early_exception() forward declarations
x86/head64: Carve out the guest encryption postprocessing into a helper
x86/sev: Get rid of excessive use of defines
x86/sev: Shorten GHCB terminate macro names
don't contribute to frequency throttling
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmHcAjYACgkQEsHwGGHe
VUqgkw//RixgjMiu/YH50mqrpYCtk6WPtecssMeVMth4ATWD/JoDzyh+r4ZqfYXI
owSQ3tZA+bByXT7TRhI8XONYQhj1+45UvG6bI6LNJN0CVzw64g7MkHsh0f1tfjft
Tt2SsKDWYQibMmHLZ01aLi7/ibTevkdKeEIvtbJSTW/qyjXYKKzpcFDBQeoPcP8R
qfUwfLm/YbWR0wCoeaR2WCGwgXwV+6rUk+mHMK0sxws8/jrtz+OQWnTHsmvUg1Re
pJqRSQvwkeXU+j2JMOntz//66Cw00fOrbe1sQ6VWf0+VbHGKdhXXiWjMBXj+Ye8d
53CSsQzkIeNwD505W+Yx+Ju2irxk2H2NgCN7HXokohYYLy8rGpy5jK8AinCHWflU
3HD9ee1nF9laiyDZczu8yL6tD5gWWuHJRgDr3jm5juOPaYVEAbZMEc747JOPpi8V
4HBlQSzXh179vNEEBBiTOuJN7t9PkJD7SHfW9sHo6tQ7wyd42qDCw0gGGAqh+upp
1p+2IoGM9r2fL8aAWlV2HALf0E+ugywIfmb2O5HpKgk+3z+fb/DwaZ6T7posM+gr
r6JwXNOCUFcyJzVeTokmRzWO079qHyxmL+pWF1S5y7mHiVHoCpjM1olDsCkDdsKI
Sp4iZ70soqDSOq4EO6twVyyrt0GsoshNd/p9GHWA2rRQKyJFRhs=
=aPgM
-----END PGP SIGNATURE-----
Merge tag 'x86_fpu_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fpu update from Borislav Petkov:
"A single x86/fpu update for 5.17:
- Exclude AVX opmask registers use from AVX512 state tracking as they
don't contribute to frequency throttling"
* tag 'x86_fpu_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/fpu: Correct AVX512 state tracking
Merge cpuidle updates, PM core updates and one hiberation-related
update for 5.17-rc1:
- Make cpuidle use default_groups in kobj_type (Greg Kroah-Hartman).
- Fix two comments in cpuidle code (Jason Wang, Yang Li).
- Simplify locking in pm_runtime_put_suppliers() (Rafael Wysocki).
- Add safety net to supplier device release in the runtime PM core
code (Rafael Wysocki).
- Capture device status before disabling runtime PM for it (Rafael
Wysocki).
- Add new macros for declaring PM operations to allow drivers to
avoid guarding them with CONFIG_PM #ifdefs or __maybe_unused and
update some drivers to use these macros (Paul Cercueil).
- Allow ACPI hardware signature to be honoured during restore from
hibernation (David Woodhouse).
* pm-cpuidle:
cpuidle: use default_groups in kobj_type
cpuidle: Fix cpuidle_remove_state_sysfs() kerneldoc comment
cpuidle: menu: Fix typo in a comment
* pm-core:
PM: runtime: Simplify locking in pm_runtime_put_suppliers()
mmc: mxc: Use the new PM macros
mmc: jz4740: Use the new PM macros
PM: runtime: Add safety net to supplier device release
PM: runtime: Capture device status before disabling runtime PM
PM: core: Add new *_PM_OPS macros, deprecate old ones
PM: core: Redefine pm_ptr() macro
r8169: Avoid misuse of pm_ptr() macro
* pm-sleep:
PM: hibernate: Allow ACPI hardware signature to be honoured
- KCSAN enabled for arm64.
- Additional kselftests to exercise the syscall ABI w.r.t. SVE/FPSIMD.
- Some more SVE clean-ups and refactoring in preparation for SME support
(scalable matrix extensions).
- BTI clean-ups (SYM_FUNC macros etc.)
- arm64 atomics clean-up and codegen improvements.
- HWCAPs for FEAT_AFP (alternate floating point behaviour) and
FEAT_RPRESS (increased precision of reciprocal estimate and reciprocal
square root estimate).
- Use SHA3 instructions to speed-up XOR.
- arm64 unwind code refactoring/unification.
- Avoid DC (data cache maintenance) instructions when DCZID_EL0.DZP == 1
(potentially set by a hypervisor; user-space already does this).
- Perf updates for arm64: support for CI-700, HiSilicon PCIe PMU,
Marvell CN10K LLC-TAD PMU, miscellaneous clean-ups.
- Other fixes and clean-ups; highlights: fix the handling of erratum
1418040, correct the calculation of the nomap region boundaries,
introduce io_stop_wc() mapped to the new DGH instruction (data
gathering hint).
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAmHXNtYACgkQa9axLQDI
XvHBGw/+OVGdbORxwrU+uRb7N6qIJkrW/mmM4x1KLo1i+REZLb8/VlXm0xC60FG+
39x6FSVkRr+lLDfTqpQsOez5FpdsvOe9Fc4L3bwniDg+EPo7x65VmP2dw/Ae2q0i
87xyWCczx5hFEPF/1sb1R1pm3bTXjeklBkdv+OXhwflLOwpCp1J8z8WJK8qJVFX6
CmuE6Q4fDQr0ghl9Nf8DiAr20mHDh8wMKNUJOg4waaQOOCta6q1oJ3qfz6E9z1eW
zEE3dfZgBCx7HCRc3KGgzT7H4Ces3BYvhBYP6bJRliVI88XdPiM4MfdGL4UIb27Q
NLAdr+FVzk/YLzMHtxSfkT10nBqoOPWUTckLu9jIIl5cpBX73Wiz7jfzBvqFmC/y
opSFMZ3lwQPM5WAPtAlZptA3GPPySeInVmvUgB7IQ+1Q1T1n8ri1y5hzTYC4Sc/g
amJI1rXf1Al8+2zFBggr6Up+EOnfV9nAwrzLXkRlASsfmvY4dnVWg3NWfBqtEHAq
VuZCecSgawxuSlpmJ4VGbLrBFaz18bn9EzujR5fFvi5Qcg1CMFOROi2+6IynopNV
IS0R8j6fwgQPA5lcnNIPeJRRkQoqO4l8bPDzeXEny0BSw313EgBSo9aQtnjyIJbp
BTuDHARKs+/NvDPvd8GQkxNPgwJnVOL9pdgNAolEu1/k7JtnIS0=
=ecyi
-----END PGP SIGNATURE-----
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Catalin Marinas:
- KCSAN enabled for arm64.
- Additional kselftests to exercise the syscall ABI w.r.t. SVE/FPSIMD.
- Some more SVE clean-ups and refactoring in preparation for SME
support (scalable matrix extensions).
- BTI clean-ups (SYM_FUNC macros etc.)
- arm64 atomics clean-up and codegen improvements.
- HWCAPs for FEAT_AFP (alternate floating point behaviour) and
FEAT_RPRESS (increased precision of reciprocal estimate and
reciprocal square root estimate).
- Use SHA3 instructions to speed-up XOR.
- arm64 unwind code refactoring/unification.
- Avoid DC (data cache maintenance) instructions when DCZID_EL0.DZP ==
1 (potentially set by a hypervisor; user-space already does this).
- Perf updates for arm64: support for CI-700, HiSilicon PCIe PMU,
Marvell CN10K LLC-TAD PMU, miscellaneous clean-ups.
- Other fixes and clean-ups; highlights: fix the handling of erratum
1418040, correct the calculation of the nomap region boundaries,
introduce io_stop_wc() mapped to the new DGH instruction (data
gathering hint).
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (81 commits)
arm64: Use correct method to calculate nomap region boundaries
arm64: Drop outdated links in comments
arm64: perf: Don't register user access sysctl handler multiple times
drivers: perf: marvell_cn10k: fix an IS_ERR() vs NULL check
perf/smmuv3: Fix unused variable warning when CONFIG_OF=n
arm64: errata: Fix exec handling in erratum 1418040 workaround
arm64: Unhash early pointer print plus improve comment
asm-generic: introduce io_stop_wc() and add implementation for ARM64
arm64: Ensure that the 'bti' macro is defined where linkage.h is included
arm64: remove __dma_*_area() aliases
docs/arm64: delete a space from tagged-address-abi
arm64: Enable KCSAN
kselftest/arm64: Add pidbench for floating point syscall cases
arm64/fp: Add comments documenting the usage of state restore functions
kselftest/arm64: Add a test program to exercise the syscall ABI
kselftest/arm64: Allow signal tests to trigger from a function
kselftest/arm64: Parameterise ptrace vector length information
arm64/sve: Minor clarification of ABI documentation
arm64/sve: Generalise vector length configuration prctl() for SME
arm64/sve: Make sysctl interface for SVE reusable by SME
...
To support dynamically enabled FPU features for guests prepare the guest
pseudo FPU container to keep track of the currently enabled xfeatures and
the guest permissions.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jing Liu <jing2.liu@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20220105123532.12586-3-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
KVM requires a clear separation of host user space and guest permissions
for dynamic XSTATE components.
Add a guest permissions member to struct fpu and a separate set of prctl()
arguments: ARCH_GET_XCOMP_GUEST_PERM and ARCH_REQ_XCOMP_GUEST_PERM.
The semantics are equivalent to the host user space permission control
except for the following constraints:
1) Permissions have to be requested before the first vCPU is created
2) Permissions are frozen when the first vCPU is created to ensure
consistency. Any attempt to expand permissions via the prctl() after
that point is rejected.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jing Liu <jing2.liu@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20220105123532.12586-2-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
== Problem ==
Nathan Chancellor reported an oops when aceessing the
'sgx_total_bytes' sysfs file:
https://lore.kernel.org/all/YbzhBrimHGGpddDM@archlinux-ax161/
The sysfs output code accesses the sgx_numa_nodes[] array
unconditionally. However, this array is allocated during SGX
initialization, which only occurs on systems where SGX is
supported.
If the sysfs file is accessed on systems without SGX support,
sgx_numa_nodes[] is NULL and an oops occurs.
== Solution ==
To fix this, hide the entire nodeX/x86/ attribute group on
systems without SGX support using the ->is_visible attribute
group callback.
Unfortunately, SGX is initialized via a device_initcall() which
occurs _after_ the ->is_visible() callback. Instead of moving
SGX initialization earlier, call sysfs_update_group() during
SGX initialization to update the group visiblility.
This update requires moving the SGX sysfs code earlier in
sgx/main.c. There are no code changes other than the addition of
arch_update_sysfs_visibility() and a minor whitespace fixup to
arch_node_attr_is_visible() which checkpatch caught.
CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: linux-sgx@vger.kernel.org
Cc: x86@kernel.org
Fixes: 50468e4313 ("x86/sgx: Add an attribute for the amount of SGX memory in a NUMA node")
Reported-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Jarkko Sakkinen <jarkko@kernel.org>
Link: https://lkml.kernel.org/r/20220104171527.5E8416A8@davehans-spike.ostc.intel.com
I made the actual CPU bringup go nice and fast... and then Linux spends
half a minute printing stupid nonsense about clocks and steal time for
each of 256 vCPUs. Don't do that. Nobody cares.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <20211209150938.3518-12-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Since commit
ee3e00e9e7 ("random: use registers from interrupted code for CPU's w/o a cycle counter")
the irq_flags argument is no longer used.
Remove unused irq_flags.
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dexuan Cui <decui@microsoft.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wei Liu <wei.liu@kernel.org>
Cc: linux-hyperv@vger.kernel.org
Cc: x86@kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Wei Liu <wei.liu@kernel.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
The current EPB "normal" is defined as 6 and set whenever power-up EPB
value is 0. This setting resulted in the desired out of box power and
performance for several CPU generations. But this value is not suitable
for AlderLake mobile CPUs, as this resulted in higher uncore power.
Since EPB is model specific, this is not unreasonable to have different
behavior.
Allow a capability where "normal" EPB can be redefined. For AlderLake
mobile CPUs this desired normal value is 7.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
A contrived zero-length write, for example, by using write(2):
...
ret = write(fd, str, 0);
...
to the "flags" file causes:
BUG: KASAN: stack-out-of-bounds in flags_write
Write of size 1 at addr ffff888019be7ddf by task writefile/3787
CPU: 4 PID: 3787 Comm: writefile Not tainted 5.16.0-rc7+ #12
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
due to accessing buf one char before its start.
Prevent such out-of-bounds access.
[ bp: Productize into a proper patch. Link below is the next best
thing because the original mail didn't get archived on lore. ]
Fixes: 0451d14d05 ("EDAC, mce_amd_inj: Modify flags attribute to use string arguments")
Signed-off-by: Zhang Zixun <zhang133010@icloud.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/linux-edac/YcnePfF1OOqoQwrX@zn.tnic/
Add the new PCI Device IDs to support new generation of AMD 19h family of
processors.
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Babu Moger <babu.moger@amd.com>
Acked-by: Krzysztof Wilczyński <kw@linux.com>
Acked-by: Borislav Petkov <bp@suse.de>
Acked-by: Bjorn Helgaas <bhelgaas@google.com> # pci_ids.h
Link: https://lore.kernel.org/r/163640828133.955062.18349019796157170473.stgit@bmoger-ubuntu
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Add an IS_ENABLED() check in setup_arch() and call pat_disable()
directly if MTRRs are not supported. This allows to remove the
<asm/memtype.h> include in <asm/mtrr.h>, which pull in lowlevel x86
headers that should not be included for UML builds and will cause build
warnings with a following patch.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20211215165612.554426-2-hch@lst.de
AMD systems currently lay out MCA bank types such that the type of bank
number "i" is either the same across all CPUs or is Reserved/Read-as-Zero.
For example:
Bank # | CPUx | CPUy
0 LS LS
1 RAZ UMC
2 CS CS
3 SMU RAZ
Future AMD systems will lay out MCA bank types such that the type of
bank number "i" may be different across CPUs.
For example:
Bank # | CPUx | CPUy
0 LS LS
1 RAZ UMC
2 CS NBIO
3 SMU RAZ
Change the structures that cache MCA bank types to be per-CPU and update
smca_get_bank_type() to handle this change.
Move some SMCA-specific structures to amd.c from mce.h, since they no
longer need to be global.
Break out the "count" for bank types from struct smca_hwid, since this
should provide a per-CPU count rather than a system-wide count.
Apply the "const" qualifier to the struct smca_hwid_mcatypes array. The
values in this array should not change at runtime.
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20211216162905.4132657-3-yazen.ghannam@amd.com
Add HWID and McaType values for new SMCA bank types, and add their error
descriptions to edac_mce_amd.
The "PHY" bank types all have the same error descriptions, and the NBIF
and SHUB bank types have the same error descriptions. So reuse the same
arrays where appropriate.
[ bp: Remove useless comments over hwid types. ]
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20211216162905.4132657-2-yazen.ghannam@amd.com
Commit in Fixes added a global TLB flush on the early boot path, after
the kernel switches off of the trampoline page table.
Compiler profiling options enabled with GCOV_PROFILE add additional
measurement code on clang which needs to be initialized prior to
use. The global flush in x86_64_start_kernel() happens before those
initializations can happen, leading to accessing invalid memory.
GCOV_PROFILE builds with gcc are still ok so this is clang-specific.
The second issue this fixes is with KASAN: for a similar reason,
kasan_early_init() needs to have happened before KASAN-instrumented
functions are called.
Therefore, reorder the flush to happen after the KASAN early init
and prevent the compilers from adding profiling instrumentation to
native_write_cr4().
Fixes: f154f29085 ("x86/mm/64: Flush global TLB on boot and AP bringup")
Reported-by: "J. Bruce Fields" <bfields@fieldses.org>
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Carel Si <beibei.si@intel.com>
Tested-by: "J. Bruce Fields" <bfields@fieldses.org>
Link: https://lore.kernel.org/r/20211209144141.GC25654@xsang-OptiPlex-9020
hyperv Isolation VM requires bounce buffer support to copy
data from/to encrypted memory and so enable swiotlb force
mode to use swiotlb bounce buffer for DMA transaction.
In Isolation VM with AMD SEV, the bounce buffer needs to be
accessed via extra address space which is above shared_gpa_boundary
(E.G 39 bit address line) reported by Hyper-V CPUID ISOLATION_CONFIG.
The access physical address will be original physical address +
shared_gpa_boundary. The shared_gpa_boundary in the AMD SEV SNP
spec is called virtual top of memory(vTOM). Memory addresses below
vTOM are automatically treated as private while memory above
vTOM is treated as shared.
Swiotlb bounce buffer code calls set_memory_decrypted()
to mark bounce buffer visible to host and map it in extra
address space via memremap. Populate the shared_gpa_boundary
(vTOM) via swiotlb_unencrypted_base variable.
The map function memremap() can't work in the early place
(e.g ms_hyperv_init_platform()) and so call swiotlb_update_mem_
attributes() in the hyperv_init().
Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/20211213071407.314309-4-ltykernel@gmail.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Hyper-V provides Isolation VM for confidential computing support and
guest memory is encrypted in it. Places checking cc_platform_has()
with GUEST_MEM_ENCRYPT attr should return "True" in Isolation VM.
Hyper-V Isolation VMs need to adjust the SWIOTLB size just like SEV
guests. Add a hyperv_cc_platform_has() variant which enables that.
Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
Acked-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/20211213071407.314309-3-ltykernel@gmail.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Adding PCI device ids and enabling ADL-N platform.
ADL-N from i915 point of view is subplatform of ADL-P.
BSpec: 68397
Changes since V2:
- Added version log history
Changes since V1:
- replace IS_ALDERLAKE_N with IS_ADLP_N - Jani Nikula
Signed-off-by: Tejas Upadhyay <tejaskumarx.surendrakumar.upadhyay@intel.com>
Reviewed-by: Anusha Srivatsa <anusha.srivatsa@intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211210051802.4063958-1-tejaskumarx.surendrakumar.upadhyay@intel.com
Commit in Fixes accesses pt_regs before checking whether it is NULL or
not. Make sure the NULL pointer check happens first.
Fixes: 0a5b288e85 ("x86/mce: Prevent severity computation from being instrumented")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20211217102029.GA29708@kili
drm/i915 feature pull #2 for v5.17:
Features and functionality:
- Add eDP privacy screen support (Hans)
- Add Raptor Lake S (RPL-S) support (Anusha)
- Add CD clock squashing support (Mika)
- Properly support ADL-P without force probe (Clint)
- Enable pipe color support (10 bit gamma) for display 13 platforms (Uma)
- Update ADL-P DMC firmware to v2.14 (Madhumitha)
Refactoring and cleanups:
- More FBC refactoring preparing for multiple FBC instances (Ville)
- Plane register cleanups (Ville)
- Header refactoring and include cleanups (Jani)
- Crtc helper and vblank wait function cleanups (Jani, Ville)
- Move pipe/transcoder/abox masks under intel_device_info.display (Ville)
Fixes:
- Add a delay to let eDP source OUI write take effect (Lyude)
- Use div32 version of MPLLB word clock for UHBR on SNPS PHY (Jani)
- Fix DMC firmware loader overflow check (Harshit Mogalapalli)
- Fully disable FBC on FIFO underruns (Ville)
- Disable FBC with double wide pipe as mutually exclusive (Ville)
- DG2 workarounds (Matt)
- Non-x86 build fixes (Siva)
- Fix HDR plane max width for NV12 (Vidya)
- Disable IRQ for selftest timestamp calculation (Anshuman)
- ADL-P VBT DDC pin mapping fix (Tejas)
Merges:
- Backmerge drm-next for privacy screen plumbing (Jani)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/87ee6f5h9u.fsf@intel.com
instead of fiddling with MSI descriptors.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20211210221813.372357371@linutronix.de
There are 4 users of mc146818_get_time() and none of them was checking
the return value from this function. Change this.
Print the appropriate warnings in callers of mc146818_get_time() instead
of in the function mc146818_get_time() itself, in order not to add
strings to rtc-mc146818-lib.c, which is kind of a library.
The callers of alpha_rtc_read_time() and cmos_read_time() may use the
contents of (struct rtc_time *) even when the functions return a failure
code. Therefore, set the contents of (struct rtc_time *) to 0x00,
which looks more sensible then 0xff and aligns with the (possibly
stale?) comment in cmos_read_time:
/*
* If pm_trace abused the RTC for storage, set the timespec to 0,
* which tells the caller that this RTC value is unusable.
*/
For consistency, do this in mc146818_get_time().
Note: hpet_rtc_interrupt() may call mc146818_get_time() many times a
second. It is very unlikely, though, that the RTC suddenly stops
working and mc146818_get_time() would consistently fail.
Only compile-tested on alpha.
Signed-off-by: Mateusz Jończyk <mat.jonczyk@o2.pl>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: Alexandre Belloni <alexandre.belloni@bootlin.com>
Cc: linux-alpha@vger.kernel.org
Cc: x86@kernel.org
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/r/20211210200131.153887-4-mat.jonczyk@o2.pl
The memory reservation in arch/x86/platform/efi/efi.c depends on at
least two command line parameters. Put it back later in the boot process
and move efi_memblock_x86_reserve_range() out of early_memory_reserve().
An attempt to fix this was done in
8d48bf8206 ("x86/boot: Pull up cmdline preparation and early param parsing")
but that caused other troubles so it got reverted.
The bug this is addressing is:
Dan reports that Anjaneya Chagam can no longer use the efi=nosoftreserve
kernel command line parameter to suppress "soft reservation" behavior.
This is due to the fact that the following call-chain happens at boot:
early_reserve_memory
|-> efi_memblock_x86_reserve_range
|-> efi_fake_memmap_early
which does
if (!efi_soft_reserve_enabled())
return;
and that would have set EFI_MEM_NO_SOFT_RESERVE after having parsed
"nosoftreserve".
However, parse_early_param() gets called *after* it, leading to the boot
cmdline not being taken into account.
See also https://lore.kernel.org/r/e8dd8993c38702ee6dd73b3c11f158617e665607.camel@intel.com
[ bp: Turn into a proper patch. ]
Signed-off-by: Mike Rapoport <rppt@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20211213112757.2612-4-bp@alien8.de
This reverts commit 8d48bf8206.
It turned out to be a bad idea as it broke supplying mem= cmdline
parameters due to parse_memopt() requiring preparatory work like setting
up the e820 table in e820__memory_setup() in order to be able to exclude
the range specified by mem=.
Pulling that up would've broken Xen PV again, see threads at
https://lkml.kernel.org/r/20210920120421.29276-1-jgross@suse.com
due to xen_memory_setup() needing the first reservations in
early_reserve_memory() - kernel and initrd - to have happened already.
This could be fixed again by having Xen do those reservations itself...
Long story short, revert this and do a simpler fix in a later patch.
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20211213112757.2612-3-bp@alien8.de
There are two big uses of do_exit. The first is it's design use to be
the guts of the exit(2) system call. The second use is to terminate
a task after something catastrophic has happened like a NULL pointer
in kernel code.
Add a function make_task_dead that is initialy exactly the same as
do_exit to cover the cases where do_exit is called to handle
catastrophic failure. In time this can probably be reduced to just a
light wrapper around do_task_dead. For now keep it exactly the same so
that there will be no behavioral differences introducing this new
concept.
Replace all of the uses of do_exit that use it for catastraphic
task cleanup with make_task_dead to make it clear what the code
is doing.
As part of this rename rewind_stack_do_exit
rewind_stack_and_make_dead.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
add_taint() is yet another external facility which the #MC handler
calls. Move that tainting call into the instrumentation-allowed part of
the handler.
Fixes
vmlinux.o: warning: objtool: do_machine_check()+0x617: call to add_taint() leaves .noinstr.text section
While at it, allow instrumentation around the mce_log() call.
Fixes
vmlinux.o: warning: objtool: do_machine_check()+0x690: call to mce_log() leaves .noinstr.text section
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20211208111343.8130-11-bp@alien8.de
It is called by the #MC handler which is noinstr.
Fixes
vmlinux.o: warning: objtool: do_machine_check()+0xbd6: call to memset() leaves .noinstr.text section
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20211208111343.8130-9-bp@alien8.de
And allow instrumentation inside it because it does calls to other
facilities which will not be tagged noinstr.
Fixes
vmlinux.o: warning: objtool: do_machine_check()+0xc73: call to mce_panic() leaves .noinstr.text section
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20211208111343.8130-8-bp@alien8.de
Mark all the MCE severity computation logic noinstr and allow
instrumentation when it "calls out".
Fixes
vmlinux.o: warning: objtool: do_machine_check()+0xc5d: call to mce_severity() leaves .noinstr.text section
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20211208111343.8130-7-bp@alien8.de
Instead, sandwitch around the call which is done in noinstr context and
mark the caller - mce_gather_info() - as noinstr.
Also, document what the whole instrumentation strategy with #MC is going
to be in the future and where it all is supposed to be going to.
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20211208111343.8130-5-bp@alien8.de
Create EX_TYPE_FAULT_SGX which does as EX_TYPE_FAULT does, except adds
this extra bit that SGX really fancies having.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20211110101325.961246679@infradead.org
Employ EX_TYPE_EFAULT_REG to store '-EFAULT' into the %[err] register
on exception. All the callers only ever test for 0, so the change
from -1 to -EFAULT is immaterial.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20211110101325.604494664@infradead.org
Make arch_stack_walk() available for ARCH_STACKWALK architectures
without it being entangled in STACKTRACE.
Link: https://lore.kernel.org/lkml/20211022152104.356586621@infradead.org/
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
[Mark: rebase, drop unnecessary arm change]
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Link: https://lore.kernel.org/r/20211129142849.3056714-2-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The variable chunks is being shifted right and re-assinged the shifted
value which is then returned. Since chunks is not being read afterwards
the assignment is redundant and the >>= operator can be replaced with a
shift >> operator instead.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Fenghua Yu <fenghua.yu@intel.com>
Link: https://lkml.kernel.org/r/20211207223735.35173-1-colin.i.king@gmail.com
== Problem ==
The amount of SGX memory on a system is determined by the BIOS and it
varies wildly between systems. It can be as small as dozens of MB's
and as large as many GB's on servers. Just like how applications need
to know how much regular RAM is available, enclave builders need to
know how much SGX memory an enclave can consume.
== Solution ==
Introduce a new sysfs file:
/sys/devices/system/node/nodeX/x86/sgx_total_bytes
to enumerate the amount of SGX memory available in each NUMA node.
This serves the same function for SGX as /proc/meminfo or
/sys/devices/system/node/nodeX/meminfo does for normal RAM.
'sgx_total_bytes' is needed today to help drive the SGX selftests.
SGX-specific swap code is exercised by creating overcommitted enclaves
which are larger than the physical SGX memory on the system. They
currently use a CPUID-based approach which can diverge from the actual
amount of SGX memory available. 'sgx_total_bytes' ensures that the
selftests can work efficiently and do not attempt stupid things like
creating a 100,000 MB enclave on a system with 128 MB of SGX memory.
== Implementation Details ==
Introduce CONFIG_HAVE_ARCH_NODE_DEV_GROUP opt-in flag to expose an
arch specific attribute group, and add an attribute for the amount of
SGX memory in bytes to each NUMA node:
== ABI Design Discussion ==
As opposed to the per-node ABI, a single, global ABI was considered.
However, this would prevent enclaves from being able to size
themselves so that they fit on a single NUMA node. Essentially, a
single value would rule out NUMA optimizations for enclaves.
Create a new "x86/" directory inside each "nodeX/" sysfs directory.
'sgx_total_bytes' is expected to be the first of at least a few
sgx-specific files to be placed in the new directory. Just scanning
/proc/meminfo, these are the no-brainers that we have for RAM, but we
need for SGX:
MemTotal: xxxx kB // sgx_total_bytes (implemented here)
MemFree: yyyy kB // sgx_free_bytes
SwapTotal: zzzz kB // sgx_swapped_bytes
So, at *least* three. I think we will eventually end up needing
something more along the lines of a dozen. A new directory (as
opposed to being in the nodeX/ "root") directory avoids cluttering the
root with several "sgx_*" files.
Place the new file in a new "nodeX/x86/" directory because SGX is
highly x86-specific. It is very unlikely that any other architecture
(or even non-Intel x86 vendor) will ever implement SGX. Using "sgx/"
as opposed to "x86/" was also considered. But, there is a real chance
this can get used for other arch-specific purposes.
[ dhansen: rewrite changelog ]
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211116162116.93081-2-jarkko@kernel.org
Make arch_restore_msi_irqs() return a boolean which indicates whether the
core code should restore the MSI message or not. Get rid of the indirection
in x86.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com> # PCI
Link: https://lore.kernel.org/r/20211206210224.485668098@linutronix.de
The unnamed struct sucks and is in the way of further cleanups. Stick the
PCI related MSI data into a real data structure and cleanup all users.
No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20211206210224.374863119@linutronix.de
Currently, text_poke_bp() is very strict to only allow patching a
single instruction; however with straight-line-speculation it will be
required to patch: ret; int3, which is two instructions.
As such, relax the constraints a little to allow int3 padding for all
instructions that do not imply the execution of the next instruction,
ie: RET, JMP.d8 and JMP.d32.
While there, rename the text_poke_loc::rel32 field to ::disp.
Note: this fills up the text_poke_loc structure which is now a round
16 bytes big.
[ bp: Put comments ontop instead of on the side. ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20211204134908.082342723@infradead.org
For x86 hybrid CPUs like Alder Lake, the order of CPU selection should
be based strictly on CPU priority. Don't include cluster topology for
hybrid CPUs to avoid interference with such CPU selection order.
On Alder Lake, the Atom CPU cluster has more capacity (4 Atom CPUs) vs
Big core cluster (2 hyperthread CPUs). This could potentially bias CPU
selection towards Atom over Big Core, when Big core CPU has higher
priority.
Fixes: 66558b730f ("sched: Add cluster scheduler level for x86")
Suggested-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Tim Chen <tim.c.chen@linux.intel.com>
Tested-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Link: https://lkml.kernel.org/r/20211204091402.GM16608@worktop.programming.kicks-ass.net
Raptor Lake S(RPL-S) is a version 12
Display, Media and Render. For all i915
purposes it is the same as Alder Lake S (ADL-S).
Introduce RPL-S as a subplatform
of ADL-S. This patch adds PCI ids for RPL-S.
BSpec: 53655
Cc: x86@kernel.org
Cc: dri-devel@lists.freedesktop.org
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Signed-off-by: Anusha Srivatsa <anusha.srivatsa@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com> # arch/x86
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211203063545.2254380-2-anusha.srivatsa@intel.com
Replace all ret/retq instructions with ASM_RET in preparation of
making it more than a single instruction.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20211204134907.964635458@infradead.org
INS/OUTS are not supported in TDX guests and cause #UD. Kernel has to
avoid them when running in TDX guest. To support existing usage, string
I/O operations are unrolled using IN/OUT instructions.
AMD SEV platform implements this support by adding unroll
logic in ins#bwl()/outs#bwl() macros with SEV-specific checks.
Since TDX VM guests will also need similar support, use
CC_ATTR_GUEST_UNROLL_STRING_IO and generic cc_platform_has() API to
implement it.
String I/O helpers were the last users of sev_key_active() interface and
sev_enable_key static key. Remove them.
[ bp: Move comment too and do not delete it. ]
Suggested-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Tested-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lkml.kernel.org/r/20211206135505.75045-2-kirill.shutemov@linux.intel.com
Theoretically, when the hardware signature in FACS changes, the OS
is supposed to gracefully decline to attempt to resume from S4:
"If the signature has changed, OSPM will not restore the system
context and can boot from scratch"
In practice, Windows doesn't do this and many laptop vendors do allow
the signature to change especially when docking/undocking, so it would
be a bad idea to simply comply with the specification by default in the
general case.
However, there are use cases where we do want the compliant behaviour
and we know it's safe. Specifically, when resuming virtual machines where
we know the hypervisor has changed sufficiently that resume will fail.
We really want to be able to *tell* the guest kernel not to try, so it
boots cleanly and doesn't just crash. This patch provides a way to opt
in to the spec-compliant behaviour on the command line.
A follow-up patch may do this automatically for certain "known good"
machines based on a DMI match, or perhaps just for all hypervisor
guests since there's no good reason a hypervisor would change the
hardware_signature that it exposes to guests *unless* it wants them
to obey the ACPI specification.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Replace all ret/retq instructions with RET in preparation of making
RET a macro. Since AS is case insensitive it's a big no-op without
RET defined.
find arch/x86/ -name \*.S | while read file
do
sed -i 's/\<ret[q]*\>/RET/' $file
done
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20211204134907.905503893@infradead.org
MCA handlers check the valid bit in each status register
(MCA_STATUS[Val]) and continue processing the error only if the valid
bit is set.
Set the valid bit unconditionally in the corresponding MCA_STATUS
register and correct any Val=0 injections made by the user as such
errors will get ignored and such injections will be largely pointless.
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20211104215846.254012-3-Smita.KoralahalliChannabasappa@amd.com
The MCA_IPID register uniquely identifies a bank's type on Scalable MCA
(SMCA) systems. When an MCA bank is not populated, the MCA_IPID register
will read as zero and writes to it will be ignored.
On a hw-type error injection (injection which writes the actual MCA
registers in an attempt to cause a real MCE) check the value of this
register before trying to inject the error.
Do not impose any limitations on a sw injection and allow the user to
test out all the decoding paths without relying on the available hardware,
as its purpose is to just test the code.
[ bp: Heavily massage. ]
Link: https://lkml.kernel.org/r/20211019233641.140275-2-Smita.KoralahalliChannabasappa@amd.com
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20211104215846.254012-2-Smita.KoralahalliChannabasappa@amd.com
Move the switching code into a function so that it can be re-used and
add a global TLB flush. This makes sure that usage of memory which is
not mapped in the trampoline page-table is reliably caught.
Also move the clearing of CR4.PCIDE before the CR3 switch because the
cr4_clear_bits() function will access data not mapped into the
trampoline page-table.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20211202153226.22946-4-joro@8bytes.org
The AP bringup code uses the trampoline_pgd page-table which
establishes global mappings in the user range of the address space.
Flush the global TLB entries after the indentity mappings are removed so
no stale entries remain in the TLB.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20211202153226.22946-3-joro@8bytes.org
Properly type the operands being passed to __put_user()/__get_user().
Otherwise, these routines truncate data for dependent instructions
(e.g., INSW) and only read/write one byte.
This has been tested by sending a string with REP OUTSW to a port and
then reading it back in with REP INSW on the same port.
Previous behavior was to only send and receive the first char of the
size. For example, word operations for "abcd" would only read/write
"ac". With change, the full string is now written and read back.
Fixes: f980f9c31a (x86/sev-es: Compile early handler code into kernel image)
Signed-off-by: Michael Sterritt <sterritt@google.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Marc Orr <marcorr@google.com>
Reviewed-by: Peter Gonda <pgonda@google.com>
Reviewed-by: Joerg Roedel <jroedel@suse.de>
Link: https://lkml.kernel.org/r/20211119232757.176201-1-sterritt@google.com
There are cases that the TSC clocksource is wrongly judged as unstable by
the clocksource watchdog mechanism which tries to validate the TSC against
HPET, PM_TIMER or jiffies. While there is hardly a general reliable way to
check the validity of a watchdog, Thomas Gleixner proposed [1]:
"I'm inclined to lift that requirement when the CPU has:
1) X86_FEATURE_CONSTANT_TSC
2) X86_FEATURE_NONSTOP_TSC
3) X86_FEATURE_NONSTOP_TSC_S3
4) X86_FEATURE_TSC_ADJUST
5) At max. 4 sockets
After two decades of horrors we're finally at a point where TSC seems
to be halfway reliable and less abused by BIOS tinkerers. TSC_ADJUST
was really key as we can now detect even small modifications reliably
and the important point is that we can cure them as well (not pretty
but better than all other options)."
As feature #3 X86_FEATURE_NONSTOP_TSC_S3 only exists on several generations
of Atom processorz, and is always coupled with X86_FEATURE_CONSTANT_TSC
and X86_FEATURE_NONSTOP_TSC, skip checking it, and also be more defensive
to use maximal 2 sockets.
The check is done inside tsc_init() before registering 'tsc-early' and
'tsc' clocksources, as there were cases that both of them had been
wrongly judged as unreliable.
For more background of tsc/watchdog, there is a good summary in [2]
[tglx} Update vs. jiffies:
On systems where the only remaining clocksource aside of TSC is jiffies
there is no way to make this work because that creates a circular
dependency. Jiffies accuracy depends on not missing a periodic timer
interrupt, which is not guaranteed. That could be detected by TSC, but as
TSC is not trusted this cannot be compensated. The consequence is a
circulus vitiosus which results in shutting down TSC and falling back to
the jiffies clocksource which is even more unreliable.
[1]. https://lore.kernel.org/lkml/87eekfk8bd.fsf@nanos.tec.linutronix.de/
[2]. https://lore.kernel.org/lkml/87a6pimt1f.ffs@nanos.tec.linutronix.de/
[ tglx: Refine comment and amend changelog ]
Fixes: 6e3cd95234 ("x86/hpet: Use another crystalball to evaluate HPET usability")
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20211117023751.24190-2-feng.tang@intel.com
The TSC_ADJUST register is checked every time a CPU enters idle state, but
Thomas Gleixner mentioned there is still a caveat that a system won't enter
idle [1], either because it's too busy or configured purposely to not enter
idle.
Setup a periodic timer (every 10 minutes) to make sure the check is
happening on a regular base.
[1] https://lore.kernel.org/lkml/875z286xtk.fsf@nanos.tec.linutronix.de/
Fixes: 6e3cd95234 ("x86/hpet: Use another crystalball to evaluate HPET usability")
Requested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20211117023751.24190-1-feng.tang@intel.com
Some thread flags can be set remotely, and so even when IRQs are disabled,
the flags can change under our feet. Generally this is unlikely to cause a
problem in practice, but it is somewhat unsound, and KCSAN will
legitimately warn that there is a data race.
To avoid such issues, a snapshot of the flags has to be taken prior to
using them. Some places already use READ_ONCE() for that, others do not.
Convert them all to the new flag accessor helpers.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Link: https://lore.kernel.org/r/20211129130653.2037928-12-mark.rutland@arm.com
Switch SEV implementation to insn_decode_mmio(). The helper is going
to be used by TDX too.
No functional changes.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Tested-by: Joerg Roedel <jroedel@suse.de>
Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lkml.kernel.org/r/20211130184933.31005-5-kirill.shutemov@linux.intel.com
Intel CPUs do not support SYSCALL in 32-bit mode, but the kernel
initializes MSR_CSTAR unconditionally. That MSR write is normally
ignored by the CPU, but in a TDX guest it raises a #VE trap.
Exclude Intel CPUs from the MSR_CSTAR initialization.
[ tglx: Fixed the subject line and removed the redundant comment. ]
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20211119035803.4012145-1-sathyanarayanan.kuppuswamy@linux.intel.com
Fix:
WARNING: modpost: vmlinux.o(.text.unlikely+0x64d0): Section mismatch in reference \
from the function prepare_command_line() to the variable .init.data:command_line
The function prepare_command_line() references
the variable __initdata command_line.
This is often because prepare_command_line lacks a __initdata
annotation or the annotation of command_line is wrong.
Apparently some toolchains do different inlining decisions.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/YZySgpmBcNNM2qca@zn.tnic
- Move the command line preparation and the early command line parsing
earlier so that the command line parameters which affect
early_reserve_memory(), e.g. efi=nosftreserve, are taken into
account. This was broken when the invocation of early_reserve_memory()
was moved recently.
- Use an atomic type for the SGX page accounting, which is read and
written lockless, to plug various race conditions related to it.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmGaYPoTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoTM7D/9bivpPDiNzfjUV7kNKx6aTUwPjdFer
G0RuuDZqkpJm9j7+51VnQNFssIfAFtzKMJn/DuGVoXF0ERxXMEhJVHiSTeOlCjJU
u1760qFYlAQ1mwvKVNLk2SenWuNZwwgUneY3VvvS4qYsSq7PsbYlekuddPeX0Nws
AJ1llOoCoBkm5vNZ5c3/CmhY6iPSRQQkDmbA11cZZUyWl2uouSk21+ax24IDCvW3
E8Aq9QqB6ND2uukB32kQ7Wp7/UZ4inJHTUXF9UF/8P+N1ftDWeKDjQz6y9U19Tsd
ivuMr6NqqAos/Fpo9PhlGns07C8HeKGf4ronnt9cUMqjzYWfdS+pRT+0pQR+vIPa
M8+jyHQplzeOX9/nKOkpV+u0tYP2zgx8e7yeu5Sion8TqsKqiNOy9+D0D2utUDmw
1x3DzuzGx/mK2OX5gjGSx4ZbS4u0DIAWnF8vB9YfgEfcnqpxr6KdbrY0bLatIbKv
ip9mh0rRYeTkTZ4FGmvy3hFgAmadCODWxva/7AhzbWVZoM+AShwnTDsipkRaaj3V
nMdgcVix8qVDg9YIAn9ziZbxkXKQUXFJn7lZj3KBeWjKcV2svA89S/9YL6JTaSeW
TJ4X6wK8EoApKhEasZhufXBNAl9EmQlBS1k9pHiIjKVuRgBGlzMuEhzvrqZM2+rA
KaUQSwBN6Ij6Dg==
=CJUK
-----END PGP SIGNATURE-----
Merge tag 'x86-urgent-2021-11-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
- Move the command line preparation and the early command line parsing
earlier so that the command line parameters which affect
early_reserve_memory(), e.g. efi=nosftreserve, are taken into
account. This was broken when the invocation of
early_reserve_memory() was moved recently.
- Use an atomic type for the SGX page accounting, which is read and
written locklessly, to plug various race conditions related to it.
* tag 'x86-urgent-2021-11-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/sgx: Fix free page accounting
x86/boot: Pull up cmdline preparation and early param parsing
Pull exit-vs-signal handling fixes from Eric Biederman:
"This is a small set of changes where debuggers were no longer able to
intercept synchronous SIGTRAP and SIGSEGV, introduced by the exit
cleanups.
This is essentially the change you suggested with all of i's dotted
and the t's crossed so that ptrace can intercept all of the cases it
has been able to intercept the past, and all of the cases that made it
to exit without giving ptrace a chance still don't give ptrace a
chance"
* 'SA_IMMUTABLE-fixes-for-v5.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
signal: Replace force_fatal_sig with force_exit_sig when in doubt
signal: Don't always set SA_IMMUTABLE for forced signals
Recently to prevent issues with SECCOMP_RET_KILL and similar signals
being changed before they are delivered SA_IMMUTABLE was added.
Unfortunately this broke debuggers[1][2] which reasonably expect
to be able to trap synchronous SIGTRAP and SIGSEGV even when
the target process is not configured to handle those signals.
Add force_exit_sig and use it instead of force_fatal_sig where
historically the code has directly called do_exit. This has the
implementation benefits of going through the signal exit path
(including generating core dumps) without the danger of allowing
userspace to ignore or change these signals.
This avoids userspace regressions as older kernels exited with do_exit
which debuggers also can not intercept.
In the future is should be possible to improve the quality of
implementation of the kernel by changing some of these force_exit_sig
calls to force_fatal_sig. That can be done where it matters on
a case-by-case basis with careful analysis.
Reported-by: Kyle Huey <me@kylehuey.com>
Reported-by: kernel test robot <oliver.sang@intel.com>
[1] https://lkml.kernel.org/r/CAP045AoMY4xf8aC_4QU_-j7obuEPYgTcnQQP3Yxk=2X90jtpjw@mail.gmail.com
[2] https://lkml.kernel.org/r/20211117150258.GB5403@xsang-OptiPlex-9020
Fixes: 00b06da29c ("signal: Add SA_IMMUTABLE to ensure forced siganls do not get changed")
Fixes: a3616a3c02 ("signal/m68k: Use force_sigsegv(SIGSEGV) in fpsp040_die")
Fixes: 83a1f27ad7 ("signal/powerpc: On swapcontext failure force SIGSEGV")
Fixes: 9bc508cf07 ("signal/s390: Use force_sigsegv in default_trap_handler")
Fixes: 086ec444f8 ("signal/sparc32: In setup_rt_frame and setup_fram use force_fatal_sig")
Fixes: c317d306d5 ("signal/sparc32: Exit with a fatal signal when try_to_clear_window_buffer fails")
Fixes: 695dd0d634 ("signal/x86: In emulate_vsyscall force a signal instead of calling do_exit")
Fixes: 1fbd60df8a ("signal/vm86_32: Properly send SIGSEGV when the vm86 state cannot be saved.")
Fixes: 941edc5bf1 ("exit/syscall_user_dispatch: Send ordinary signals on failure")
Link: https://lkml.kernel.org/r/871r3dqfv8.fsf_-_@email.froward.int.ebiederm.org
Reviewed-by: Kees Cook <keescook@chromium.org>
Tested-by: Kees Cook <keescook@chromium.org>
Tested-by: Kyle Huey <khuey@kylehuey.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Get rid of cpu_missing because
7bb39313cd ("x86/mce: Make mce_timed_out() identify holdout CPUs")
provides a more detailed message about which CPUs are missing.
Suggested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Zhaolong Zhang <zhangzl2013@126.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211109112345.2673403-1-zhangzl2013@126.com
The SGX driver maintains a single global free page counter,
sgx_nr_free_pages, that reflects the number of free pages available
across all NUMA nodes. Correspondingly, a list of free pages is
associated with each NUMA node and sgx_nr_free_pages is updated
every time a page is added or removed from any of the free page
lists. The main usage of sgx_nr_free_pages is by the reclaimer
that runs when it (sgx_nr_free_pages) goes below a watermark
to ensure that there are always some free pages available to, for
example, support efficient page faults.
With sgx_nr_free_pages accessed and modified from a few places
it is essential to ensure that these accesses are done safely but
this is not the case. sgx_nr_free_pages is read without any
protection and updated with inconsistent protection by any one
of the spin locks associated with the individual NUMA nodes.
For example:
CPU_A CPU_B
----- -----
spin_lock(&nodeA->lock); spin_lock(&nodeB->lock);
... ...
sgx_nr_free_pages--; /* NOT SAFE */ sgx_nr_free_pages--;
spin_unlock(&nodeA->lock); spin_unlock(&nodeB->lock);
Since sgx_nr_free_pages may be protected by different spin locks
while being modified from different CPUs, the following scenario
is possible:
CPU_A CPU_B
----- -----
{sgx_nr_free_pages = 100}
spin_lock(&nodeA->lock); spin_lock(&nodeB->lock);
sgx_nr_free_pages--; sgx_nr_free_pages--;
/* LOAD sgx_nr_free_pages = 100 */ /* LOAD sgx_nr_free_pages = 100 */
/* sgx_nr_free_pages-- */ /* sgx_nr_free_pages-- */
/* STORE sgx_nr_free_pages = 99 */ /* STORE sgx_nr_free_pages = 99 */
spin_unlock(&nodeA->lock); spin_unlock(&nodeB->lock);
In the above scenario, sgx_nr_free_pages is decremented from two CPUs
but instead of sgx_nr_free_pages ending with a value that is two less
than it started with, it was only decremented by one while the number
of free pages were actually reduced by two. The consequence of
sgx_nr_free_pages not being protected is that its value may not
accurately reflect the actual number of free pages on the system,
impacting the availability of free pages in support of many flows.
The problematic scenario is when the reclaimer does not run because it
believes there to be sufficient free pages while any attempt to allocate
a page fails because there are no free pages available. In the SGX driver
the reclaimer's watermark is only 32 pages so after encountering the
above example scenario 32 times a user space hang is possible when there
are no more free pages because of repeated page faults caused by no
free pages made available.
The following flow was encountered:
asm_exc_page_fault
...
sgx_vma_fault()
sgx_encl_load_page()
sgx_encl_eldu() // Encrypted page needs to be loaded from backing
// storage into newly allocated SGX memory page
sgx_alloc_epc_page() // Allocate a page of SGX memory
__sgx_alloc_epc_page() // Fails, no free SGX memory
...
if (sgx_should_reclaim(SGX_NR_LOW_PAGES)) // Wake reclaimer
wake_up(&ksgxd_waitq);
return -EBUSY; // Return -EBUSY giving reclaimer time to run
return -EBUSY;
return -EBUSY;
return VM_FAULT_NOPAGE;
The reclaimer is triggered in above flow with the following code:
static bool sgx_should_reclaim(unsigned long watermark)
{
return sgx_nr_free_pages < watermark &&
!list_empty(&sgx_active_page_list);
}
In the problematic scenario there were no free pages available yet the
value of sgx_nr_free_pages was above the watermark. The allocation of
SGX memory thus always failed because of a lack of free pages while no
free pages were made available because the reclaimer is never started
because of sgx_nr_free_pages' incorrect value. The consequence was that
user space kept encountering VM_FAULT_NOPAGE that caused the same
address to be accessed repeatedly with the same result.
Change the global free page counter to an atomic type that
ensures simultaneous updates are done safely. While doing so, move
the updating of the variable outside of the spin lock critical
section to which it does not belong.
Cc: stable@vger.kernel.org
Fixes: 901ddbb9ec ("x86/sgx: Add a basic NUMA allocation scheme to sgx_alloc_epc_page()")
Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
Link: https://lkml.kernel.org/r/a95a40743bbd3f795b465f30922dde7f1ea9e0eb.1637004094.git.reinette.chatre@intel.com
Add a separate, local mask for tracking AVX512 usage which does not
include the opmask xfeature set. Opmask registers usage does not cause
frequency throttling so it is a completely unnecessary false positive.
While at it, carve it out into a separate function to keep that
abomination extracted out.
[ bp: Rediff and cleanup ontop of 5.16-rc1. ]
Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20210920053951.4093668-1-goldstein.w.n@gmail.com
There's a perfectly fine prototype in the asm/setup.h header. Use it.
No functional changes.
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211110220731.2396491-8-brijesh.singh@amd.com
Carve it out so that it is abstracted out of the main boot path. All
other encrypted guest-relevant processing should be placed in there.
No functional changes.
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211110220731.2396491-7-brijesh.singh@amd.com
Provide a recovery function sgx_memory_failure(). If the poison was
consumed synchronously then send a SIGBUS. Note that the virtual
address of the access is not included with the SIGBUS as is the case
for poison outside of SGX enclaves. This doesn't matter as addresses
of code/data inside an enclave is of little to no use to code executing
outside the (now dead) enclave.
Poison found in a free page results in the page being moved from the
free list to the per-node poison page list.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lkml.kernel.org/r/20211026220050.697075-5-tony.luck@intel.com
A memory controller patrol scrubber can report poison in a page
that isn't currently being used.
Add "poison" field in the sgx_epc_page that can be set for an
sgx_epc_page. Check for it:
1) When sanitizing dirty pages
2) When freeing epc pages
Poison is a new field separated from flags to avoid having to make all
updates to flags atomic, or integrate poison state changes into some
other locking scheme to protect flags (Currently just sgx_reclaimer_lock
which protects the SGX_EPC_PAGE_RECLAIMER_TRACKED bit in page->flags).
In both cases place the poisoned page on a per-node list of poisoned
epc pages to make sure it will not be reallocated.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lkml.kernel.org/r/20211026220050.697075-4-tony.luck@intel.com
X86 machine check architecture reports a physical address when there
is a memory error. Handling that error requires a method to determine
whether the physical address reported is in any of the areas reserved
for EPC pages by BIOS.
SGX EPC pages do not have Linux "struct page" associated with them.
Keep track of the mapping from ranges of EPC pages to the sections
that contain them using an xarray. N.B. adds CONFIG_XARRAY_MULTI to
the SGX dependecies. So "select" that in arch/x86/Kconfig for X86/SGX.
Create a function arch_is_platform_page() that simply reports whether an
address is an EPC page for use elsewhere in the kernel. The ACPI error
injection code needs this function and is typically built as a module,
so export it.
Note that arch_is_platform_page() will be slower than other similar
"what type is this page" functions that can simply check bits in the
"struct page". If there is some future performance critical user of
this function it may need to be implemented in a more efficient way.
Note also that the current implementation of xarray allocates a few
hundred kilobytes for this usage on a system with 4GB of SGX EPC memory
configured. This isn't ideal, but worth it for the code simplicity.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lkml.kernel.org/r/20211026220050.697075-3-tony.luck@intel.com
SGX EPC pages go through the following life cycle:
DIRTY ---> FREE ---> IN-USE --\
^ |
\-----------------/
Recovery action for poison for a DIRTY or FREE page is simple. Just
make sure never to allocate the page. IN-USE pages need some extra
handling.
Add a new flag bit SGX_EPC_PAGE_IS_FREE that is set when a page
is added to a free list and cleared when the page is allocated.
Notes:
1) These transitions are made while holding the node->lock so that
future code that checks the flags while holding the node->lock
can be sure that if the SGX_EPC_PAGE_IS_FREE bit is set, then the
page is on the free list.
2) Initially while the pages are on the dirty list the
SGX_EPC_PAGE_IS_FREE bit is cleared.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lkml.kernel.org/r/20211026220050.697075-2-tony.luck@intel.com
Explicitly check for MSR_HYPERCALL and MSR_VP_INDEX support when probing
for running as a Hyper-V guest instead of waiting until hyperv_init() to
detect the bogus configuration. Add messages to give the admin a heads
up that they are likely running on a broken virtual machine setup.
At best, silently disabling Hyper-V is confusing and difficult to debug,
e.g. the kernel _says_ it's using all these fancy Hyper-V features, but
always falls back to the native versions. At worst, the half baked setup
will crash/hang the kernel.
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20211104182239.1302956-3-seanjc@google.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Dan reports that Anjaneya Chagam can no longer use the efi=nosoftreserve
kernel command line parameter to suppress "soft reservation" behavior.
This is due to the fact that the following call-chain happens at boot:
early_reserve_memory
|-> efi_memblock_x86_reserve_range
|-> efi_fake_memmap_early
which does
if (!efi_soft_reserve_enabled())
return;
and that would have set EFI_MEM_NO_SOFT_RESERVE after having parsed
"nosoftreserve".
However, parse_early_param() gets called *after* it, leading to the boot
cmdline not being taken into account.
Therefore, carve out the command line preparation into a separate
function which does the early param parsing too. So that it all goes
together.
And then call that function before early_reserve_memory() so that the
params would have been parsed by then.
Fixes: 8aa83e6395 ("x86/setup: Call early_reserve_memory() earlier")
Reported-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Anjaneya Chagam <anjaneya.chagam@intel.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/e8dd8993c38702ee6dd73b3c11f158617e665607.camel@intel.com
by placing explicit signature bytes after the call trampoline to prevent
patching random other jumps like the CFI jump table entries.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmGRDKsTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoZeNEADEFTbUJKd8812O9vkY9we1GDAtH7bY
z6sYkh0/rPvYjdPfHuwqW8tUAl+CO2ne2X8FRPKgEdRLg44BY4HaMHmujdbGh3fh
zpqynUBPoOIgtWxAPGdF+JxjrKlzjFd+WwjG3qBXOF3pjKgCc5knyjTucsl6ced3
wF293rSYrIJ6uRv2TTNbM5hWJdC0arWbdMFnwQTxeZR54WLpu7Wfm+CCK41w0fAU
nrfSsv73WEwpmAZNh04wsZsf7h6yCO7dCrIJD/3mpJtrUVBZXuZAKDzUzJPvHJal
T8LcKwxZQAgPv0ubmOCrolj98Qp6PAPSdDJbzNsCJUYEbBqaB2inJ0PeHcZPspy9
YyW00EHXD2UKm/GNF/DIlhoiNxOSh8Wn4b6H5ZRML50bS7jsMp8YVbticWEjItL6
N4/61c45/uPILBS+Lysj0aqyj4TvagiuffJFWjw3YAQ+Gp/pzlJwRNjrw7/4DxAx
KdpM881IKCR8UowBz3gIiA9FrJv2dGMqq31Rs1fjuauxkIX0gV3c64tAIRWrVscT
k6GKGvHSis5cT97K3yhmNH0BUND+Skeku8G/SnTkefvcB85aU/7HBkLLJpw0w84F
F6PTCaCJOEHrl3ADkilsi3z0sKWrph6aAzDEgp6Q6cmo9ulFAGw0bjuJb59xsvVK
flIvTLUY3n76FA==
=dgiF
-----END PGP SIGNATURE-----
Merge tag 'locking-urgent-2021-11-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 static call update from Thomas Gleixner:
"A single fix for static calls to make the trampoline patching more
robust by placing explicit signature bytes after the call trampoline
to prevent patching random other jumps like the CFI jump table
entries"
* tag 'locking-urgent-2021-11-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
static_call,x86: Robustify trampoline patching
the preemption model
- clear cluster CPU masks too, on the CPU unplug path
- prevent use-after-free in cfs
- Prevent a race condition when updating CPU cache domains
- Factor out common shared part of smp_prepare_cpus() into a common
helper which can be called by both baremetal and Xen, in order to fix a
booting of Xen PV guests
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmGQ8HcACgkQEsHwGGHe
VUouoA//WAZ/dZu7IiM06JhZWswa2yNsdU8qQHys81lEqstaBqiWuZdg1qJTVIir
2d0aN0keiPcsLyAsp1UJ2g/K/7D5vSJWDzsHKfEAToiAm8Tntai2LlSocWWfeSQm
10grDHWpEHbj0hTHTA6HYOr2WbY4/LnR4cdL0WobIzivIrRTx49d0XUOUfWLP5KX
60uM6dSjwpJrQUnvzk+bhGiHVmutFrEJy+UU/0o+nxkdhwraNiSbLi0007BGRCof
6dokRRvLLR09dl1LMG51gVjQch4j/lCx6EWWUhYOFeV3I3gibSCNkmu7dpmMCBTR
QWO01cR9gyFN4xQ2is4I36M5L0/8T+sbGvvXIXNDT/XWr0/p+g6p2mx0cd2XiYIr
ZthGRcxxV/KGmxfPaygKS9tpQseMEIrdd6VjAnGfZ3OS6CtUvYt8d0B2Soj8FALQ
N9fMXDIEP3uUZim8UvCT6HBKlj9LR5uI5n+dAQ6uzsenO9WqeGeldc/N26/+osdN
vo4lNYTqiXJPhJvunYW5t4j5JnUa3grDHioAPWaQRJlWtEZBGKs9SXTcweg/KURb
mNfe1RfSlGJt28RD3E18gXeSS7xWdKgpcVX1rmW/9tUjX04NNDWjq4sAzOj7c+Ir
4sr78XgCY0pUxFaFYxvQWFUy7wcm0zAczo1RGUhcDTf1edDEvjo=
=s2MX
-----END PGP SIGNATURE-----
Merge tag 'sched_urgent_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Borislav Petkov:
- Avoid touching ~100 config files in order to be able to select the
preemption model
- clear cluster CPU masks too, on the CPU unplug path
- prevent use-after-free in cfs
- Prevent a race condition when updating CPU cache domains
- Factor out common shared part of smp_prepare_cpus() into a common
helper which can be called by both baremetal and Xen, in order to fix
a booting of Xen PV guests
* tag 'sched_urgent_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
preempt: Restore preemption model selection configs
arch_topology: Fix missing clear cluster_cpumask in remove_cpu_topology()
sched/fair: Prevent dead task groups from regaining cfs_rq's
sched/core: Mitigate race cpus_share_cache()/update_top_cache_domain()
x86/smp: Factor out parts of native_smp_prepare_cpus()
- Do not log spurious corrected MCEs on SKL too, due to an erratum
- Clarify the path of paravirt ops patches upstream
- Add an optimization to avoid writing out AMX components to sigframes
when former are in init state
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmGQ3CgACgkQEsHwGGHe
VUoLAA/+NXRvcBHYkLaByT9f4OI6B79HzyguIBSfipYiw8ir0H7uEdV5FUCCUgCz
egBRVFpOsXWt1teeuu6ViO+WBHncUxG/ryZ0ka35lri/3kuVYnugZExWDs4MrGR5
vehRXehOxYNRaYc3oLYjubSbxqF1nWz3WWfGfhiBKk0jT/S1T9tX6lsRXlKsJCgj
M4x5aqBWP8HTbFQfqjdHwagNitmSKzgjZvMcC4UWcql33ZCycbjvRdrAzBtw7WRI
UBvgxWVmeMoagu5fqEOoph1oSoFxWuFrweFUjnxJmT6uZrTsfF7BVgXkxdG6eYUy
2Xogcd4bPDBiRgbs0vPEog1tyyrKHOQ6p1pvksySKMPq6ULcSZ6hBpEZRpgr6Y9u
0jB3P6weQgCckx5Hd+iwvX1a+GvEuHSEqAE+j160wFyrsBS5Cir3P1WqthWaPd5I
3nH3h955PokUHPUioUhdf+8cfuP6h6K0nz1gdYI8GR8+fJHhEceT+pLLeyIxj/VM
yr+bq+V7D6Cg62w3z3s9Dzg2XKpxStu1R9L1N/K8MtIGf6Uc7paL6xR27XxhmBp5
Y6bGZw0mxxFhp6AEsFWo3rwLL9Dl5DmFcfgUHHpPK5VP0pVWp48Uapx2Hi2/JzAo
c1o4UkPQa/EZJBPTklmGkS1JNp/2TsEL4Fw7sew+j7DWtsJpCfk=
=Ge2T
-----END PGP SIGNATURE-----
Merge tag 'x86_urgent_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
- Add the model number of a new, Raptor Lake CPU, to intel-family.h
- Do not log spurious corrected MCEs on SKL too, due to an erratum
- Clarify the path of paravirt ops patches upstream
- Add an optimization to avoid writing out AMX components to sigframes
when former are in init state
* tag 'x86_urgent_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/cpu: Add Raptor Lake to Intel family
x86/mce: Add errata workaround for Skylake SKX37
MAINTAINERS: Add some information to PARAVIRT_OPS entry
x86/fpu: Optimize out sigframe xfeatures when in init state
* Guest API and guest kernel support for SEV live migration
* SEV and SEV-ES intra-host migration
Bugfixes and cleanups for x86:
* Fix misuse of gfn-to-pfn cache when recording guest steal time / preempted status
* Fix selftests on APICv machines
* Fix sparse warnings
* Fix detection of KVM features in CPUID
* Cleanups for bogus writes to MSR_KVM_PV_EOI_EN
* Fixes and cleanups for MSR bitmap handling
* Cleanups for INVPCID
* Make x86 KVM_SOFT_MAX_VCPUS consistent with other architectures
Bugfixes for ARM:
* Fix finalization of host stage2 mappings
* Tighten the return value of kvm_vcpu_preferred_target()
* Make sure the extraction of ESR_ELx.EC is limited to architected bits
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmGO1usUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroOsXgf/d2CgZmTxZii/pOt0JKGSDKwRdoti
pfbmtXwTkagqg2tIJtEsxwvcc26Q2VLyg9mlEelX1I8KVlexSa8mDtbGWHLFilsc
JIZY8lE96wtlr9AyuN06K44QingDMIbXzlxcwO+muS3zTlSPsNkPcaVDk5PL35sN
Wy2GA6GCLWv6iTLCtlX5EzbcgkrR+Mypj4lGSdXGRqfBKVFOQjeVt+Q3YzOPuacS
03NBlYOmRxTnAGXfeAVs0/zEa69u/dYjBmwDPe6ERma4u8thHv0U/fDAh5C/ES7q
42Thes/JOYnEZiBQ1xZLsHqRII0achbJKZhmxqPwjRf/u6eXsfz0KY9FBg==
=kN8U
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull more kvm updates from Paolo Bonzini:
"New x86 features:
- Guest API and guest kernel support for SEV live migration
- SEV and SEV-ES intra-host migration
Bugfixes and cleanups for x86:
- Fix misuse of gfn-to-pfn cache when recording guest steal time /
preempted status
- Fix selftests on APICv machines
- Fix sparse warnings
- Fix detection of KVM features in CPUID
- Cleanups for bogus writes to MSR_KVM_PV_EOI_EN
- Fixes and cleanups for MSR bitmap handling
- Cleanups for INVPCID
- Make x86 KVM_SOFT_MAX_VCPUS consistent with other architectures
Bugfixes for ARM:
- Fix finalization of host stage2 mappings
- Tighten the return value of kvm_vcpu_preferred_target()
- Make sure the extraction of ESR_ELx.EC is limited to architected
bits"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (34 commits)
KVM: SEV: unify cgroup cleanup code for svm_vm_migrate_from
KVM: x86: move guest_pv_has out of user_access section
KVM: x86: Drop arbitrary KVM_SOFT_MAX_VCPUS
KVM: Move INVPCID type check from vmx and svm to the common kvm_handle_invpcid()
KVM: VMX: Add a helper function to retrieve the GPR index for INVPCID, INVVPID, and INVEPT
KVM: nVMX: Clean up x2APIC MSR handling for L2
KVM: VMX: Macrofy the MSR bitmap getters and setters
KVM: nVMX: Handle dynamic MSR intercept toggling
KVM: nVMX: Query current VMCS when determining if MSR bitmaps are in use
KVM: x86: Don't update vcpu->arch.pv_eoi.msr_val when a bogus value was written to MSR_KVM_PV_EOI_EN
KVM: x86: Rename kvm_lapic_enable_pv_eoi()
KVM: x86: Make sure KVM_CPUID_FEATURES really are KVM_CPUID_FEATURES
KVM: x86: Add helper to consolidate core logic of SET_CPUID{2} flows
kvm: mmu: Use fast PF path for access tracking of huge pages when possible
KVM: x86/mmu: Properly dereference rcu-protected TDP MMU sptep iterator
KVM: x86: inhibit APICv when KVM_GUESTDBG_BLOCKIRQ active
kvm: x86: Convert return type of *is_valid_rdpmc_ecx() to bool
KVM: x86: Fix recording of guest steal time / preempted status
selftest: KVM: Add intra host migration tests
selftest: KVM: Add open sev dev helper
...
Pull vm86 fix from Eric Biederman:
"Just the removal of an unnecessary (and incorrect) test from a BUG_ON"
* 'exit-cleanups-for-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
signal/vm86_32: Remove pointless test in BUG_ON
kernel test robot <oliver.sang@intel.com> writes[1]:
>
> Greeting,
>
> FYI, we noticed the following commit (built with gcc-9):
>
> commit: 1a4d21a23c ("signal/vm86_32: Replace open coded BUG_ON with an actual BUG_ON")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>
> in testcase: trinity
> version: trinity-static-i386-x86_64-1c734c75-1_2020-01-06
> with following parameters:
>
>
> [ 70.645554][ T3747] kernel BUG at arch/x86/kernel/vm86_32.c:109!
> [ 70.646185][ T3747] invalid opcode: 0000 [#1] SMP
> [ 70.646682][ T3747] CPU: 0 PID: 3747 Comm: trinity-c6 Not tainted 5.15.0-rc1-00009-g1a4d21a23c4c #1
> [ 70.647598][ T3747] EIP: save_v86_state (arch/x86/kernel/vm86_32.c:109 (discriminator 3))
> [ 70.648113][ T3747] Code: 89 c3 64 8b 35 60 b8 25 c2 83 ec 08 89 55 f0 8b 96 10 19 00 00 89 55 ec e8 c6 2d 0c 00 fb 8b 55 ec 85 d2 74 05 83 3a 00 75 02 <0f> 0b 8b 86 10 19 00 00 8b 4b 38 8b 78 48 31 cf 89 f8 8b 7a 4c 81
> [ 70.650136][ T3747] EAX: 00000001 EBX: f5f49fac ECX: 0000000b EDX: f610b600
> [ 70.650852][ T3747] ESI: f5f79cc0 EDI: f5f79cc0 EBP: f5f49f04 ESP: f5f49ef0
> [ 70.651593][ T3747] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00010246
> [ 70.652413][ T3747] CR0: 80050033 CR2: 00004000 CR3: 35fc7000 CR4: 000406d0
> [ 70.653169][ T3747] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> [ 70.653897][ T3747] DR6: fffe0ff0 DR7: 00000400
> [ 70.654382][ T3747] Call Trace:
> [ 70.654719][ T3747] arch_do_signal_or_restart (arch/x86/kernel/signal.c:792 arch/x86/kernel/signal.c:867)
> [ 70.655288][ T3747] exit_to_user_mode_prepare (kernel/entry/common.c:174 kernel/entry/common.c:209)
> [ 70.655854][ T3747] irqentry_exit_to_user_mode (kernel/entry/common.c:126 kernel/entry/common.c:317)
> [ 70.656450][ T3747] irqentry_exit (kernel/entry/common.c:406)
> [ 70.656897][ T3747] exc_page_fault (arch/x86/mm/fault.c:1535)
> [ 70.657369][ T3747] ? sysvec_kvm_asyncpf_interrupt (arch/x86/mm/fault.c:1488)
> [ 70.657989][ T3747] handle_exception (arch/x86/entry/entry_32.S:1085)
vm86_32.c:109 is: "BUG_ON(!vm86 || !vm86->user_vm86)"
When trying to understand the failure Brian Gerst pointed out[2] that
the code does not need protection against vm86->user_vm86 being NULL.
The copy_from_user code will already handles that case if the address
is going to fault.
Looking futher I realized that if we care about not allowing struct
vm86plus_struct at address 0 it should be do_sys_vm86 (the system
call) that does the filtering. Not way down deep when the emulation
has completed in save_v86_state.
So let's just remove the silly case of attempting to filter a
userspace address with a BUG_ON. Existing userspace can't break and
it won't make the kernel any more attackable as the userspace access
helpers will handle it, if it isn't a good userspace pointer.
I have run the reproducer the fuzzer gave me before I made this change
and it reproduced, and after I made this change and I have not seen
the reported failure. So it does looks like this fixes the reported
issue.
[1] https://lkml.kernel.org/r/20211112074030.GB19820@xsang-OptiPlex-9020
[2] https://lkml.kernel.org/r/CAMzpN2jkK5sAv-Kg_kVnCEyVySiqeTdUORcC=AdG1gV6r8nUew@mail.gmail.com
Suggested-by: Brian Gerst <brgerst@gmail.com>
Reported-by: kernel test robot <oliver.sang@intel.com>
Tested-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Errata SKX37 is word-for-word identical to the other errata listed in
this workaround. I happened to notice this after investigating a CMCI
storm on a Skylake host. While I can't confirm this was the root cause,
spurious corrected errors does sound like a likely suspect.
Fixes: 2976908e41 ("x86/mce: Do not log spurious corrected mce errors")
Signed-off-by: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20211029205759.GA7385@codemonkey.org.uk
* Fix misuse of gfn-to-pfn cache when recording guest steal time / preempted status
* Fix selftests on APICv machines
* Fix sparse warnings
* Fix detection of KVM features in CPUID
* Cleanups for bogus writes to MSR_KVM_PV_EOI_EN
* Fixes and cleanups for MSR bitmap handling
* Cleanups for INVPCID
* Make x86 KVM_SOFT_MAX_VCPUS consistent with other architectures