Move the Kconfig symbols related to transparent hugepages (THP) under a
separate config symbol, separate from CONFIG_PPC_BOOK3S_64.
The new symbol is automatically enabled if CONFIG_PPC_BOOK3S_64 is
enabled, so there is no behaviour change, except for the existence of
the new PPC_THP symbol.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240823032911.1238471-1-mpe@ellerman.id.au
'struct kobj_type' is not modified. It is only used in kobject_init()
which takes a 'const struct kobj_type *ktype' parameter.
Constifying this structure moves some data to a read-only section,
so increase over all security.
On a x86_64, compiled with ppc64 defconfig:
Before:
======
text data bss dec hex filename
1885 368 16 2269 8dd arch/powerpc/platforms/pseries/vas-sysfs.o
After:
======
text data bss dec hex filename
1981 272 16 2269 8dd arch/powerpc/platforms/pseries/vas-sysfs.o
Signed-off-by: Huang Xiaojia <huangxiaojia2@huawei.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240826150957.3500237-3-huangxiaojia2@huawei.com
'struct kobj_type' is not modified. It is only used in kobject_init()
which takes a 'const struct kobj_type *ktype' parameter.
Constifying this structure moves some data to a read-only section,
so increase over all security.
On a x86_64, compiled with ppc64 defconfig:
Before:
======
text data bss dec hex filename
3775 256 8 4039 fc7 arch/powerpc/platforms/powernv/opal-dump.o
2679 260 8 2947 b83 arch/powerpc/platforms/powernv/opal-elog.o
After:
======
text data bss dec hex filename
3823 208 8 4039 fc7 arch/powerpc/platforms/powernv/opal-dump.o
2727 212 8 2947 b83 arch/powerpc/platforms/powernv/opal-elog.o
Signed-off-by: Huang Xiaojia <huangxiaojia2@huawei.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240826150957.3500237-2-huangxiaojia2@huawei.com
Besides the obvious (and desired) difference between krealloc() and
kvrealloc(), there is some inconsistency in their function signatures and
behavior:
- krealloc() frees the memory when the requested size is zero, whereas
kvrealloc() simply returns a pointer to the existing allocation.
- krealloc() behaves like kmalloc() if a NULL pointer is passed, whereas
kvrealloc() does not accept a NULL pointer at all and, if passed,
would fault instead.
- krealloc() is self-contained, whereas kvrealloc() relies on the caller
to provide the size of the previous allocation.
Inconsistent behavior throughout allocation APIs is error prone, hence
make kvrealloc() behave like krealloc(), which seems superior in all
mentioned aspects.
Besides that, implementing kvrealloc() by making use of krealloc() and
vrealloc() provides oppertunities to grow (and shrink) allocations more
efficiently. For instance, vrealloc() can be optimized to allocate and
map additional pages to grow the allocation or unmap and free unused pages
to shrink the allocation.
[dakr@kernel.org: document concurrency restrictions]
Link: https://lkml.kernel.org/r/20240725125442.4957-1-dakr@kernel.org
[dakr@kernel.org: disable KASAN when switching to vmalloc]
Link: https://lkml.kernel.org/r/20240730185049.6244-2-dakr@kernel.org
[dakr@kernel.org: properly document __GFP_ZERO behavior]
Link: https://lkml.kernel.org/r/20240730185049.6244-5-dakr@kernel.org
Link: https://lkml.kernel.org/r/20240722163111.4766-3-dakr@kernel.org
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Chandan Babu R <chandan.babu@oracle.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Uladzislau Rezki <urezki@gmail.com>
Cc: Wedson Almeida Filho <wedsonaf@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
In the powerpc-pseries specific implementation, the IO hotplug
event is handled in the user space (drmgr tool). For the DLPAR
IO ADD, the corresponding device tree nodes and properties will
be added to the device tree after the device enable. The user
space (drmgr tool) uses configure_connector RTAS call with the
DRC index to retrieve the device nodes and updates the device
tree by writing to /proc/ppc64/ofdt. Under system lockdown,
/dev/mem access to allocate buffers for configure_connector RTAS
call is restricted which means the user space can not issue this
RTAS call and also can not access to /proc/ppc64/ofdt. The
pseries implementation need user interaction to power-on and add
device to the slot during the ADD event handling. So adds
complexity if the complete hotplug ADD event handling moved to
the kernel.
To overcome /dev/mem access restriction, this patch extends the
/sys/kernel/dlpar interface and provides ‘dt add index <drc_index>’
to the user space. The drmgr tool uses this interface to update
the device tree whenever the device is added. This interface
retrieves device tree nodes for the corresponding DRC index using
the configure_connector RTAS call and adds new device nodes /
properties to the device tree.
Signed-off-by: Scott Cheloha <cheloha@linux.ibm.com>
Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240822025028.938332-3-haren@linux.ibm.com
In the powerpc-pseries specific implementation, the IO hotplug
event is handled in the user space (drmgr tool). But update the
device tree and /dev/mem access to allocate buffers for some
RTAS calls are restricted when the kernel lockdown feature is
enabled. For the DLPAR IO REMOVE, the corresponding device tree
nodes and properties have to be removed from the device tree
after the device disable. The user space removes the device tree
nodes by updating /proc/ppc64/ofdt which is not allowed under
system lockdown is enabled. This restriction can be resolved
by moving the complete IO hotplug handling in the kernel. But
the pseries implementation need user interaction to power off
and to remove device from the slot during hotplug event handling.
To overcome the /proc/ppc64/ofdt restriction, this patch extends
the /sys/kernel/dlpar interface and provides
‘dt remove index <drc_index>’ to the user space so that drmgr
tool can remove the corresponding device tree nodes based on DRC
index from the device tree.
Signed-off-by: Scott Cheloha <cheloha@linux.ibm.com>
Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240822025028.938332-2-haren@linux.ibm.com
_be32 type is defined for some elements in pseries_hp_errorlog
struct but also used them u32 after be32_to_cpu() conversion.
Example: In handle_dlpar_errorlog()
hp_elog->_drc_u.drc_index = be32_to_cpu(hp_elog->_drc_u.drc_index);
And later assigned to u32 type
dlpar_cpu() - u32 drc_index = hp_elog->_drc_u.drc_index;
This incorrect usage is giving the following warnings and the
patch resolve these warnings with the correct assignment.
arch/powerpc/platforms/pseries/dlpar.c:398:53: sparse: sparse:
incorrect type in argument 1 (different base types) @@
expected unsigned int [usertype] drc_index @@
got restricted __be32 [usertype] drc_index @@
...
arch/powerpc/platforms/pseries/dlpar.c:418:43: sparse: sparse:
incorrect type in assignment (different base types) @@
expected restricted __be32 [usertype] drc_count @@
got unsigned int [usertype] @@
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202408182142.wuIKqYae-lkp@intel.com/
Closes: https://lore.kernel.org/oe-kbuild-all/202408182302.o7QRO45S-lkp@intel.com/
Signed-off-by: Haren Myneni <haren@linux.ibm.com>
v3:
- Fix warnings from using incorrect data types in pseries_hp_errorlog
struct
v2:
- Remove pr_info() and TODO comments
- Update more information in the commit logs
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240822025028.938332-1-haren@linux.ibm.com
This reverts commit bccc58986a.
When STRICT_KERNEL_RWX is selected, EXEC memory must stop where
RW memory start. When pinning iTLBs it means an 8M alignment for
RW data start. That may be acceptable on boards with a lot of
memory but one of my supported boards only has 32 Mbytes and this
forced alignment leads to a waste of almost 4 Mbytes with is more
than 10% of the total memory.
So revert commit bccc58986a ("powerpc/8xx: Always pin kernel text
TLB") but don't restore previous behaviour in ITLB miss handler
as now kernel PGD entries are copied into each process PGDIR.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/01b6780b860c8043b51a1ba9d83acfc6f2dde910.1724173828.git.christophe.leroy@csgroup.eu
The pnv_pci_init_ioda_hub() have been removed since
commit 5ac129cdb5 ("powerpc/powernv/pci: Remove ioda1 support"),
and now it is useless, so remove it.
Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240822130043.783756-1-cuigaosheng1@huawei.com
The pas_pci_irq_fixup() have been removed since
commit 771f7404a9 ("pasemi_mac: Move the IRQ mapping from the
PCI layer to the driver"), and now it is useless, so remove it.
Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240822130609.786431-4-cuigaosheng1@huawei.com
The maple_calibrate_decr() have been removed since
commit 10f7e7c15e ("[PATCH] ppc64: consolidate calibrate_decr
implementations"), and now it is useless, so remove it.
Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240822130609.786431-3-cuigaosheng1@huawei.com
for_each_child_of_node can help to iterate through the device_node,
and we don't need to use while loop. No functional change with this
conversion.
Signed-off-by: Zhang Zekun <zhangzekun11@huawei.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240822085430.25753-3-zhangzekun11@huawei.com
for_each_child_of_node() can help to iterate through the device_node,
and we don't need to do it manually. No functional change with this
conversion.
Signed-off-by: Zhang Zekun <zhangzekun11@huawei.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240822085430.25753-2-zhangzekun11@huawei.com
These changes are for patch_instruction() uses on data. Unlike ppc64
these should not be incorrect as-is, but using the patch_uint() alias
better reflects what kind of data being patched and allows for
benchmarking the effect of different patch_* implementations (e.g.,
skipping instruction flushing when patching data).
Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Tested-by: Hari Bathini <hbathini@linux.ibm.com>
Acked-by: Naveen N Rao <naveen@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240515024445.236364-5-bgray@linux.ibm.com
For any changes of struct fd representation we need to
turn existing accesses to fields into calls of wrappers.
Accesses to struct fd::flags are very few (3 in linux/file.h,
1 in net/socket.c, 3 in fs/overlayfs/file.c and 3 more in
explicit initializers).
Those can be dealt with in the commit converting to
new layout; accesses to struct fd::file are too many for that.
This commit converts (almost) all of f.file to
fd_file(f). It's not entirely mechanical ('file' is used as
a member name more than just in struct fd) and it does not
even attempt to distinguish the uses in pointer context from
those in boolean context; the latter will be eventually turned
into a separate helper (fd_empty()).
NOTE: mass conversion to fd_empty(), tempting as it
might be, is a bad idea; better do that piecewise in commit
that convert from fdget...() to CLASS(...).
[conflicts in fs/fhandle.c, kernel/bpf/syscall.c, mm/memcontrol.c
caught by git; fs/stat.c one got caught by git grep]
[fs/xattr.c conflict]
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
This driver doesn't use the driver_data member of struct i2c_device_id,
so don't explicitly initialize this member.
This prepares putting driver_data in an anonymous union which requires
either no initialization or named designators. But it's also a nice
cleanup on its own.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240804112032.3628645-2-u.kleine-koenig@baylibre.com
Use of_property_present() to test for property presence rather than
of_get_property(). This is part of a larger effort to remove callers
of of_get_property() and similar functions. of_get_property() leaks
the DT property data pointer which is a problem for dynamically
allocated nodes which may be freed.
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240731191312.1710417-9-robh@kernel.org
Here is the big set of driver core changes for 6.11-rc1.
Lots of stuff in here, with not a huge diffstat, but apis are evolving
which required lots of files to be touched. Highlights of the changes
in here are:
- platform remove callback api final fixups (Uwe took many releases to
get here, finally!)
- Rust bindings for basic firmware apis and initial driver-core
interactions. It's not all that useful for a "write a whole driver
in rust" type of thing, but the firmware bindings do help out the
phy rust drivers, and the driver core bindings give a solid base on
which others can start their work. There is still a long way to go
here before we have a multitude of rust drivers being added, but
it's a great first step.
- driver core const api changes. This reached across all bus types,
and there are some fix-ups for some not-common bus types that
linux-next and 0-day testing shook out. This work is being done to
help make the rust bindings more safe, as well as the C code, moving
toward the end-goal of allowing us to put driver structures into
read-only memory. We aren't there yet, but are getting closer.
- minor devres cleanups and fixes found by code inspection
- arch_topology minor changes
- other minor driver core cleanups
All of these have been in linux-next for a very long time with no
reported problems.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZqH+aQ8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ymoOQCfVBdLcBjEDAGh3L8qHRGMPy4rV2EAoL/r+zKm
cJEYtJpGtWX6aAtugm9E
=ZyJV
-----END PGP SIGNATURE-----
Merge tag 'driver-core-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg KH:
"Here is the big set of driver core changes for 6.11-rc1.
Lots of stuff in here, with not a huge diffstat, but apis are evolving
which required lots of files to be touched. Highlights of the changes
in here are:
- platform remove callback api final fixups (Uwe took many releases
to get here, finally!)
- Rust bindings for basic firmware apis and initial driver-core
interactions.
It's not all that useful for a "write a whole driver in rust" type
of thing, but the firmware bindings do help out the phy rust
drivers, and the driver core bindings give a solid base on which
others can start their work.
There is still a long way to go here before we have a multitude of
rust drivers being added, but it's a great first step.
- driver core const api changes.
This reached across all bus types, and there are some fix-ups for
some not-common bus types that linux-next and 0-day testing shook
out.
This work is being done to help make the rust bindings more safe,
as well as the C code, moving toward the end-goal of allowing us to
put driver structures into read-only memory. We aren't there yet,
but are getting closer.
- minor devres cleanups and fixes found by code inspection
- arch_topology minor changes
- other minor driver core cleanups
All of these have been in linux-next for a very long time with no
reported problems"
* tag 'driver-core-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (55 commits)
ARM: sa1100: make match function take a const pointer
sysfs/cpu: Make crash_hotplug attribute world-readable
dio: Have dio_bus_match() callback take a const *
zorro: make match function take a const pointer
driver core: module: make module_[add|remove]_driver take a const *
driver core: make driver_find_device() take a const *
driver core: make driver_[create|remove]_file take a const *
firmware_loader: fix soundness issue in `request_internal`
firmware_loader: annotate doctests as `no_run`
devres: Correct code style for functions that return a pointer type
devres: Initialize an uninitialized struct member
devres: Fix memory leakage caused by driver API devm_free_percpu()
devres: Fix devm_krealloc() wasting memory
driver core: platform: Switch to use kmemdup_array()
driver core: have match() callback in struct bus_type take a const *
MAINTAINERS: add Rust device abstractions to DRIVER CORE
device: rust: improve safety comments
MAINTAINERS: add Danilo as FIRMWARE LOADER maintainer
MAINTAINERS: add Rust FW abstractions to FIRMWARE LOADER
firmware: rust: improve safety comments
...
- Remove support for 40x CPUs & platforms.
- Add support to the 64-bit BPF JIT for cpu v4 instructions.
- Fix PCI hotplug driver crash on powernv.
- Fix doorbell emulation for KVM on PAPR guests (nestedv2).
- Fix KVM nested guest handling of some less used SPRs.
- Online NUMA nodes with no CPU/memory if they have a PCI device attached.
- Reduce memory overhead of enabling kfence on 64-bit Radix MMU kernels.
- Reimplement the iommu table_group_ops for pseries for VFIO SPAPR TCE.
Thanks to: Anjali K, Artem Savkov, Athira Rajeev, Breno Leitao, Brian King,
Celeste Liu, Christophe Leroy, Esben Haabendal, Gaurav Batra, Gautam Menghani,
Haren Myneni, Hari Bathini, Jeff Johnson, Krishna Kumar, Krzysztof Kozlowski,
Nathan Lynch, Nicholas Piggin, Nick Bowler, Nilay Shroff, Rob Herring (Arm),
Shawn Anastasio, Shivaprasad G Bhat, Sourabh Jain, Srikar Dronamraju, Timothy
Pearson, Uwe Kleine-König, Vaibhav Jain.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmaaUNITHG1wZUBlbGxl
cm1hbi5pZC5hdQAKCRBR6+o8yOGlgDA+D/4o7OZ+SY0plTlMKSy3hW/SRXVj/byA
CCKdizNY+3Rf/+K7KhuLOUPXhZOemLPE0xfKS3ND4mIEKCswzzXqmi6kjPH0qd8q
qUhkHbt/LNpNJzZOYYw+usaklMTMdZtAl/jD9WEvGwgu2EYHgrujRIq04kEI1b0e
OPiRnXOZcfevRBepQmYZKHvFlCRRa5vvsQcvLfY64yFqD0AsKTHgIi/48Dn33pb2
hqHYyV1tZA3uT86Z1TgF1OG83VOSDsgc19Sb2xn14O9aJJ7lD2TOgVa4P4FfBlXA
TXYYGQwK31ymGVWGcGfebVdC1ECeTem9n28vlk5I0NO9xNgPok/Ov4DAiZ+u1G0E
3CXRDx9Uz2yPcGBJI2dpxfp2iw83Ad2DtBzAdukMD36xnC7xfrQz+W9SQfbcPJ8e
I5SMAstWuLNgrX7YkjAOnXh1N41kht/mdV6KHdcMxPc7jOtAD65gUOZcgwYLeXlT
Av17Ax0PMbiQ1BpFe2KNr/0T9Ba5k5rN7oDSKncDAq4uX8LcZKHj4bSHT9KroT1C
q+GERspoCYp2VDMO742Jm7KTmQDHsS5y4Q+iSdOR8cQBXF613FaryDxSoJZhg2pf
C2zIVED13RGcjIFcWlv73iA6QpBsphM+WWFz7mjULyJhxFQwm6BYt+Wy6jFu84oH
sOgvPH8YyaK2uA==
=eHVd
-----END PGP SIGNATURE-----
Merge tag 'powerpc-6.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
- Remove support for 40x CPUs & platforms
- Add support to the 64-bit BPF JIT for cpu v4 instructions
- Fix PCI hotplug driver crash on powernv
- Fix doorbell emulation for KVM on PAPR guests (nestedv2)
- Fix KVM nested guest handling of some less used SPRs
- Online NUMA nodes with no CPU/memory if they have a PCI device
attached
- Reduce memory overhead of enabling kfence on 64-bit Radix MMU kernels
- Reimplement the iommu table_group_ops for pseries for VFIO SPAPR TCE
Thanks to: Anjali K, Artem Savkov, Athira Rajeev, Breno Leitao, Brian
King, Celeste Liu, Christophe Leroy, Esben Haabendal, Gaurav Batra,
Gautam Menghani, Haren Myneni, Hari Bathini, Jeff Johnson, Krishna
Kumar, Krzysztof Kozlowski, Nathan Lynch, Nicholas Piggin, Nick Bowler,
Nilay Shroff, Rob Herring (Arm), Shawn Anastasio, Shivaprasad G Bhat,
Sourabh Jain, Srikar Dronamraju, Timothy Pearson, Uwe Kleine-König, and
Vaibhav Jain.
* tag 'powerpc-6.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (57 commits)
Documentation/powerpc: Mention 40x is removed
powerpc: Remove 40x leftovers
macintosh/therm_windtunnel: fix module unload.
powerpc: Check only single values are passed to CPU/MMU feature checks
powerpc/xmon: Fix disassembly CPU feature checks
powerpc: Drop clang workaround for builtin constant checks
powerpc64/bpf: jit support for signed division and modulo
powerpc64/bpf: jit support for sign extended mov
powerpc64/bpf: jit support for sign extended load
powerpc64/bpf: jit support for unconditional byte swap
powerpc64/bpf: jit support for 32bit offset jmp instruction
powerpc/pci: Hotplug driver bridge support
pci/hotplug/pnv_php: Fix hotplug driver crash on Powernv
powerpc/configs: Update defconfig with now user-visible CONFIG_FSL_IFC
powerpc: add missing MODULE_DESCRIPTION() macros
macintosh/mac_hid: add MODULE_DESCRIPTION()
KVM: PPC: add missing MODULE_DESCRIPTION() macros
powerpc/kexec: Use of_property_read_reg()
powerpc/64s/radix/kfence: map __kfence_pool at page granularity
powerpc/pseries/iommu: Define spapr_tce_table_group_ops only with CONFIG_IOMMU_API
...
kmsg_dump doesn't forward the panic reason string to the kmsg_dumper
callback.
This patch adds a new struct kmsg_dump_detail, that will hold the
reason and description, and pass it to the dump() callback.
To avoid updating all kmsg_dump() call, it adds a kmsg_dump_desc()
function and a macro for backward compatibility.
I've written this for drm_panic, but it can be useful for other
kmsg_dumper.
It allows to see the panic reason, like "sysrq triggered crash"
or "VFS: Unable to mount root fs on xxxx" on the drm panic screen.
v2:
* Use a struct kmsg_dump_detail to hold the reason and description
pointer, for more flexibility if we want to add other parameters.
(Kees Cook)
* Fix powerpc/nvram_64 build, as I didn't update the forward
declaration of oops_to_nvram()
Signed-off-by: Jocelyn Falempe <jfalempe@redhat.com>
Acked-by: Petr Mladek <pmladek@suse.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Acked-by: Kees Cook <kees@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20240702122639.248110-1-jfalempe@redhat.com
With ARCH=powerpc, make allmodconfig && make W=1 C=1 reports:
WARNING: modpost: missing MODULE_DESCRIPTION() in arch/powerpc/kernel/rtas_flash.o
WARNING: modpost: missing MODULE_DESCRIPTION() in arch/powerpc/sysdev/rtc_cmos_setup.o
WARNING: modpost: missing MODULE_DESCRIPTION() in arch/powerpc/platforms/pseries/papr_scm.o
WARNING: modpost: missing MODULE_DESCRIPTION() in arch/powerpc/platforms/cell/spufs/spufs.o
WARNING: modpost: missing MODULE_DESCRIPTION() in arch/powerpc/platforms/cell/cbe_thermal.o
WARNING: modpost: missing MODULE_DESCRIPTION() in arch/powerpc/platforms/cell/cpufreq_spudemand.o
WARNING: modpost: missing MODULE_DESCRIPTION() in arch/powerpc/platforms/cell/cbe_powerbutton.o
Add the missing invocation of the MODULE_DESCRIPTION() macro to all
files which have a MODULE_LICENSE().
This includes 85xx/t1042rdb_diu.c and chrp/nvram.c which, although
they did not produce a warning with the powerpc allmodconfig
configuration, may cause this warning with other configurations.
Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240615-md-powerpc-arch-powerpc-v1-1-ba4956bea47a@quicinc.com
In the match() callback, the struct device_driver * should not be
changed, so change the function callback to be a const *. This is one
step of many towards making the driver core safe to have struct
device_driver in read-only memory.
Because the match() callback is in all busses, all busses are modified
to handle this properly. This does entail switching some container_of()
calls to container_of_const() to properly handle the constant *.
For some busses, like PCI and USB and HV, the const * is cast away in
the match callback as those busses do want to modify those structures at
this point in time (they have a local lock in the driver structure.)
That will have to be changed in the future if they wish to have their
struct device * in read-only-memory.
Cc: Rafael J. Wysocki <rafael@kernel.org>
Reviewed-by: Alex Elder <elder@kernel.org>
Acked-by: Sumit Garg <sumit.garg@linaro.org>
Link: https://lore.kernel.org/r/2024070136-wrongdoer-busily-01e8@gregkh
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Replace 4xx usage with 44x, and replace 4xx_SOC with 44x.
Also, as pointed out by Christophe, if 44x || BOOKE can be simplified to
just test BOOKE, because 44x always selects BOOKE.
Retain the CONFIG_4xx symbol, as there are drivers that use it to mean
4xx || 44x, those will need updating before CONFIG_4xx can be removed.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240628121201.130802-6-mpe@ellerman.id.au
Now that 40x platforms have gone, remove support
for 40x in the core of powerpc arch.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240628121201.130802-4-mpe@ellerman.id.au
40x platforms have been orphaned for many years.
Remove them.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240628121201.130802-1-mpe@ellerman.id.au
kexec on pseries disables AIL (reloc_on_exc), required for scv
instruction support, before other CPUs have been shut down. This means
they can execute scv instructions after AIL is disabled, which causes an
interrupt at an unexpected entry location that crashes the kernel.
Change the kexec sequence to disable AIL after other CPUs have been
brought down.
As a refresher, the real-mode scv interrupt vector is 0x17000, and the
fixed-location head code probably couldn't easily deal with implementing
such high addresses so it was just decided not to support that interrupt
at all.
Fixes: 7fa95f9ada ("powerpc/64s: system call support for scv/rfscv instructions")
Cc: stable@vger.kernel.org # v5.9+
Reported-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Closes: https://lore.kernel.org/3b4b2943-49ad-4619-b195-bc416f1d1409@linux.ibm.com
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Gautam Menghani <gautam@linux.ibm.com>
Tested-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Link: https://msgid.link/20240625134047.298759-1-npiggin@gmail.com
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
PPC64 IOMMU API defines iommu_table_group_ops which handles DMA
windows for PEs, their ownership transfer, create/set/unset the TCE
tables for the Dynamic DMA wundows(DDW). VFIOS uses these APIs for
support on POWER.
The commit 9d67c94335 ("powerpc/iommu: Add "borrowing"
iommu_table_group_ops") implemented partial support for this API with
"borrow" mechanism wherein the DMA windows if created already by the
host driver, they would be available for VFIO to use. Also, it didn't
have the support to control/modify the window size or the IO page
size.
The current patch implements all the necessary iommu_table_group_ops
APIs there by avoiding the "borrrowing". So, just the way it is on the
PowerNV platform, with this patch the iommu table group ownership is
transferred to the VFIO PPC subdriver, the iommu table, DMA windows
creation/deletion all driven through the APIs.
The pSeries uses the query-pe-dma-window, create-pe-dma-window and
reset-pe-dma-window RTAS calls for DMA window creation, deletion and
reset to defaul. The RTAs calls do show some minor differences to the
way things are to be handled on the pSeries which are listed below.
* On pSeries, the default DMA window size is "fixed" cannot be custom
sized as requested by the user. For non-SRIOV VFs, It is fixed at 2GB
and for SRIOV VFs, its variable sized based on the capacity assigned
to it during the VF assignment to the LPAR. So, for the default DMA
window alone the size if requested less than tce32_size, the smaller
size is enforced using the iommu table->it_size.
* The DMA start address for 32-bit window is 0, and for the 64-bit
window in case of PowerNV is hardcoded to TVE select (bit 59) at 512PiB
offset. This address is returned at the time of create_table() API call
(even before the window is created), the subsequent set_window() call
actually opens the DMA window. On pSeries, the DMA start address for
32-bit window is known from the 'ibm,dma-window' DT property. However,
the 64-bit window start address is not known until the create-pe-dma
RTAS call is made. So, the create_table() which returns the DMA window
start address actually opens the DMA window and returns the DMA start
address as returned by the Hypervisor for the create-pe-dma RTAS call.
* The reset-pe-dma RTAS call resets the DMA windows and restores the
default DMA window, however it does not clear the TCE table entries
if there are any. In case of ownership transfer from platform domain
which used direct mapping, the patch chooses remove-pe-dma instead of
reset-pe for the 64-bit window intentionally so that the
clear_dma_window() is called.
Other than the DMA window management changes mentioned above, the
patch also brings back the userspace view for the single level TCE
as it existed before commit 090bad39b2 ("powerpc/powernv: Add
indirect levels to it_userspace") along with the relavent
refactoring.
Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/171923275958.1397.907964437142542242.stgit@linux.ibm.com
This patch basically brings consistency with PowerNV approach
to use the first freely available iommu table when the default
window is removed.
The pSeries iommu code convention has been that the table[0] is
for the default 32 bit DMA window and the table[1] is for the
64 bit DDW.
With VFs having only 1 DMA window, the default has to be removed
for creating the larger DMA window. The existing code uses the
table[1] for that, while marking the table[0] as NULL. This is
fine as long as the host driver itself uses the device.
For the VFIO user, on pSeries there is no way to skip table[0]
as the VFIO subdriver uses the first freely available table.
The window 0, when created as 64-bit DDW in that context would
still be on table[0], as the maximum number of windows is 1.
This won't have any impact for the host driver as the table is
fetched from the device's iommu_table_base.
Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
[mpe: Rebase and resolve conflicts]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/171923272328.1397.1817843961216868850.stgit@linux.ibm.com
The ioctl VFIO_IOMMU_SPAPR_TCE_GET_INFO is not reporting the
actuals on the platform as not all the details are correctly
collected during the platform probe/scan into the iommu_table_group.
Collect the information during the device setup time as the DMA
window property is already looked up on parent nodes anyway.
Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/171923271138.1397.7908302630061814623.stgit@linux.ibm.com
The PowerNV specific table_group_ops are defined in powernv/pci-ioda.c.
The pSeries specific table_group_ops are sitting in the generic powerpc
file. Move it to where it actually belong(pseries/iommu.c).
The functions are currently defined even for CONFIG_PPC_POWERNV
which are unused on PowerNV.
Only code movement, no functional changes intended.
Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/171923269701.1397.15758640002786937132.stgit@linux.ibm.com
VAS allocate, modify and deallocate HCALLs returns
H_LONG_BUSY_ORDER_1_MSEC or H_LONG_BUSY_ORDER_10_MSEC for busy
delay and expects OS to reissue HCALL after that delay. But using
msleep() will often sleep at least 20 msecs even though the
hypervisor suggests OS reissue these HCALLs after 1 or 10msecs.
The open and close VAS window functions hold mutex and then issue
these HCALLs. So these operations can take longer than the
necessary when multiple threads issue open or close window APIs
simultaneously, especially might affect the performance in the
case of repeat open/close APIs for each compression request.
Multiple tasks can open / close VAS windows at the same time
which depends on the available VAS credits. For example, 240
cores system provides 4800 VAS credits. It means 4800 tasks can
execute open VAS windows HCALLs with the mutex. Since each
msleep() will often sleep more than 20 msecs, some tasks are
waiting more than 120 secs to acquire mutex. It can cause hung
traces for these tasks in dmesg due to mutex contention around
open/close HCALLs.
Instead of msleep(), use usleep_range() to ensure sleep with
the expected value before issuing HCALL again. So since each
task sleep 10 msecs maximum, this patch allow more tasks can
issue open/close VAS calls without any hung traces in the
dmesg.
Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Suggested-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240116055910.421605-1-haren@linux.ibm.com
In the current design, a numa-node is made online only if that node is
attached to cpu/memory. With this design, if any PCI/IO device is found
to be attached to a numa-node which is not online then the numa-node
id of the corresponding PCI/IO device is set to NUMA_NO_NODE(-1). This
design may negatively impact the performance of PCIe device if the
numa-node assigned to PCIe device is -1 because in such case we may not
be able to accurately calculate the distance between two nodes.
The multi-controller NVMe PCIe disk has an issue with calculating the
node distance if the PCIe NVMe controller is attached to a PCI host
bridge which has numa-node id value set to NUMA_NO_NODE. This patch
helps fix this ensuring that a cpu/memory less numa node is made online
if it's attached to PCI host bridge.
Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
Reviewed-by: Srikar Dronamraju <srikar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240517142531.3273464-3-nilay@linux.ibm.com
Dynamic DMA Window (DDW) supports TCEs that are backed by 2MB page
size. In most configurations, DDW is big enough to pre-map all of LPAR
memory for IO. Pre-mapping of memory for DMA results in improvements in
IO performance.
Persistent memory, vPMEM, can be assigned to an LPAR as well. vPMEM is
not contiguous with LPAR memory and usually is assigned at high memory
addresses. This makes is not possible to pre-map both vPMEM and LPAR
memory in the same DDW.
For a dedicated adapter this limitation is not an issue. Dedicated
adapters can have both Default DMA window, which is backed by 4K page
size and a DDW backed by 2MB page size TCEs. In this scenario, LPAR
memory is pre-mapped in the DDW. Any DMA going to the vPMEM is routed
via dynamically allocated TCEs in the default window.
The issue arises with SR-IOV adapters. There is only one DMA window -
either Default or DDW. If an LPAR has vPMEM assigned, memory is not
pre-mapped in the DDW since TCEs needs to be allocated for vPMEM as well.
In this case, DDW is created and TCEs are dynamically allocated for both
vPMEM and LPAR memory.
Today, DDW is only used in single mode - direct mapped TCEs or
dynamically mapped TCEs. This enhancement breaks a single DDW in 2
regions -
1. First region to pre-map LPAR memory
2. Second region to dynamically allocate TCEs for IO to vPMEM
The DDW is split only if it is big enough to pre-map complete LPAR
memory and still have some space left to dynamically map vPMEM. Maximum
size possible DDW is created as permitted by the Hypervisor.
Signed-off-by: Gaurav Batra <gbatra@linux.ibm.com>
Reviewed-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240514014608.35537-1-gbatra@linux.ibm.com
It's not an error or exceptional situation when the hosting
environment does not expose a name for the LP/guest via RTAS or the
device tree. This happens with qemu when run without the '-name'
option. The message also lacks a newline. Remove it.
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Fixes: eddaa9a402 ("powerpc/pseries: read the lpar name from the firmware")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240524-lparcfg-updates-v2-1-62e2e9d28724@linux.ibm.com
Here is the small set of driver core and kernfs changes for 6.10-rc1.
Nothing major here at all, just a small set of changes for some driver
core apis, and minor fixups. Included in here are:
- sysfs_bin_attr_simple_read() helper added and used
- device_show_string() helper added and used
All usages of these were acked by the various maintainers. Also in here
are:
- kernfs minor cleanup
- removed unused functions
- typo fix in documentation
- pay attention to sysfs_create_link() failures in module.c finally.
All of these have been in linux-next for a very long time with no
reported problems.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZk3+hQ8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ylfTwCfUyHWkDZuZ7ehdtjzfmcd4EKZBK8An3AAV99G
ox8PXMxuFTaUEdT/69FQ
=2sEo
-----END PGP SIGNATURE-----
Merge tag 'driver-core-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg KH:
"Here is the small set of driver core and kernfs changes for 6.10-rc1.
Nothing major here at all, just a small set of changes for some driver
core apis, and minor fixups. Included in here are:
- sysfs_bin_attr_simple_read() helper added and used
- device_show_string() helper added and used
All usages of these were acked by the various maintainers. Also in
here are:
- kernfs minor cleanup
- removed unused functions
- typo fix in documentation
- pay attention to sysfs_create_link() failures in module.c finally
All of these have been in linux-next for a very long time with no
reported problems"
* tag 'driver-core-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
device property: Fix a typo in the description of device_get_child_node_count()
kernfs: mount: Remove unnecessary ‘NULL’ values from knparent
scsi: Use device_show_string() helper for sysfs attributes
platform/x86: Use device_show_string() helper for sysfs attributes
perf: Use device_show_string() helper for sysfs attributes
IB/qib: Use device_show_string() helper for sysfs attributes
hwmon: Use device_show_string() helper for sysfs attributes
driver core: Add device_show_string() helper for sysfs attributes
treewide: Use sysfs_bin_attr_simple_read() helper
sysfs: Add sysfs_bin_attr_simple_read() helper
module: don't ignore sysfs_create_link() failures
driver core: Remove unused platform_notify, platform_notify_remove
documented (hopefully adequately) in the respective changelogs. Notable
series include:
- Lucas Stach has provided some page-mapping
cleanup/consolidation/maintainability work in the series "mm/treewide:
Remove pXd_huge() API".
- In the series "Allow migrate on protnone reference with
MPOL_PREFERRED_MANY policy", Donet Tom has optimized mempolicy's
MPOL_PREFERRED_MANY mode, yielding almost doubled performance in one
test.
- In their series "Memory allocation profiling" Kent Overstreet and
Suren Baghdasaryan have contributed a means of determining (via
/proc/allocinfo) whereabouts in the kernel memory is being allocated:
number of calls and amount of memory.
- Matthew Wilcox has provided the series "Various significant MM
patches" which does a number of rather unrelated things, but in largely
similar code sites.
- In his series "mm: page_alloc: freelist migratetype hygiene" Johannes
Weiner has fixed the page allocator's handling of migratetype requests,
with resulting improvements in compaction efficiency.
- In the series "make the hugetlb migration strategy consistent" Baolin
Wang has fixed a hugetlb migration issue, which should improve hugetlb
allocation reliability.
- Liu Shixin has hit an I/O meltdown caused by readahead in a
memory-tight memcg. Addressed in the series "Fix I/O high when memory
almost met memcg limit".
- In the series "mm/filemap: optimize folio adding and splitting" Kairui
Song has optimized pagecache insertion, yielding ~10% performance
improvement in one test.
- Baoquan He has cleaned up and consolidated the early zone
initialization code in the series "mm/mm_init.c: refactor
free_area_init_core()".
- Baoquan has also redone some MM initializatio code in the series
"mm/init: minor clean up and improvement".
- MM helper cleanups from Christoph Hellwig in his series "remove
follow_pfn".
- More cleanups from Matthew Wilcox in the series "Various page->flags
cleanups".
- Vlastimil Babka has contributed maintainability improvements in the
series "memcg_kmem hooks refactoring".
- More folio conversions and cleanups in Matthew Wilcox's series
"Convert huge_zero_page to huge_zero_folio"
"khugepaged folio conversions"
"Remove page_idle and page_young wrappers"
"Use folio APIs in procfs"
"Clean up __folio_put()"
"Some cleanups for memory-failure"
"Remove page_mapping()"
"More folio compat code removal"
- David Hildenbrand chipped in with "fs/proc/task_mmu: convert hugetlb
functions to work on folis".
- Code consolidation and cleanup work related to GUP's handling of
hugetlbs in Peter Xu's series "mm/gup: Unify hugetlb, part 2".
- Rick Edgecombe has developed some fixes to stack guard gaps in the
series "Cover a guard gap corner case".
- Jinjiang Tu has fixed KSM's behaviour after a fork+exec in the series
"mm/ksm: fix ksm exec support for prctl".
- Baolin Wang has implemented NUMA balancing for multi-size THPs. This
is a simple first-cut implementation for now. The series is "support
multi-size THP numa balancing".
- Cleanups to vma handling helper functions from Matthew Wilcox in the
series "Unify vma_address and vma_pgoff_address".
- Some selftests maintenance work from Dev Jain in the series
"selftests/mm: mremap_test: Optimizations and style fixes".
- Improvements to the swapping of multi-size THPs from Ryan Roberts in
the series "Swap-out mTHP without splitting".
- Kefeng Wang has significantly optimized the handling of arm64's
permission page faults in the series
"arch/mm/fault: accelerate pagefault when badaccess"
"mm: remove arch's private VM_FAULT_BADMAP/BADACCESS"
- GUP cleanups from David Hildenbrand in "mm/gup: consistently call it
GUP-fast".
- hugetlb fault code cleanups from Vishal Moola in "Hugetlb fault path to
use struct vm_fault".
- selftests build fixes from John Hubbard in the series "Fix
selftests/mm build without requiring "make headers"".
- Memory tiering fixes/improvements from Ho-Ren (Jack) Chuang in the
series "Improved Memory Tier Creation for CPUless NUMA Nodes". Fixes
the initialization code so that migration between different memory types
works as intended.
- David Hildenbrand has improved follow_pte() and fixed an errant driver
in the series "mm: follow_pte() improvements and acrn follow_pte()
fixes".
- David also did some cleanup work on large folio mapcounts in his
series "mm: mapcount for large folios + page_mapcount() cleanups".
- Folio conversions in KSM in Alex Shi's series "transfer page to folio
in KSM".
- Barry Song has added some sysfs stats for monitoring multi-size THP's
in the series "mm: add per-order mTHP alloc and swpout counters".
- Some zswap cleanups from Yosry Ahmed in the series "zswap same-filled
and limit checking cleanups".
- Matthew Wilcox has been looking at buffer_head code and found the
documentation to be lacking. The series is "Improve buffer head
documentation".
- Multi-size THPs get more work, this time from Lance Yang. His series
"mm/madvise: enhance lazyfreeing with mTHP in madvise_free" optimizes
the freeing of these things.
- Kemeng Shi has added more userspace-visible writeback instrumentation
in the series "Improve visibility of writeback".
- Kemeng Shi then sent some maintenance work on top in the series "Fix
and cleanups to page-writeback".
- Matthew Wilcox reduces mmap_lock traffic in the anon vma code in the
series "Improve anon_vma scalability for anon VMAs". Intel's test bot
reported an improbable 3x improvement in one test.
- SeongJae Park adds some DAMON feature work in the series
"mm/damon: add a DAMOS filter type for page granularity access recheck"
"selftests/damon: add DAMOS quota goal test"
- Also some maintenance work in the series
"mm/damon/paddr: simplify page level access re-check for pageout"
"mm/damon: misc fixes and improvements"
- David Hildenbrand has disabled some known-to-fail selftests ni the
series "selftests: mm: cow: flag vmsplice() hugetlb tests as XFAIL".
- memcg metadata storage optimizations from Shakeel Butt in "memcg:
reduce memory consumption by memcg stats".
- DAX fixes and maintenance work from Vishal Verma in the series
"dax/bus.c: Fixups for dax-bus locking".
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZkgQYwAKCRDdBJ7gKXxA
jrdKAP9WVJdpEcXxpoub/vVE0UWGtffr8foifi9bCwrQrGh5mgEAx7Yf0+d/oBZB
nvA4E0DcPrUAFy144FNM0NTCb7u9vAw=
=V3R/
-----END PGP SIGNATURE-----
Merge tag 'mm-stable-2024-05-17-19-19' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull mm updates from Andrew Morton:
"The usual shower of singleton fixes and minor series all over MM,
documented (hopefully adequately) in the respective changelogs.
Notable series include:
- Lucas Stach has provided some page-mapping cleanup/consolidation/
maintainability work in the series "mm/treewide: Remove pXd_huge()
API".
- In the series "Allow migrate on protnone reference with
MPOL_PREFERRED_MANY policy", Donet Tom has optimized mempolicy's
MPOL_PREFERRED_MANY mode, yielding almost doubled performance in
one test.
- In their series "Memory allocation profiling" Kent Overstreet and
Suren Baghdasaryan have contributed a means of determining (via
/proc/allocinfo) whereabouts in the kernel memory is being
allocated: number of calls and amount of memory.
- Matthew Wilcox has provided the series "Various significant MM
patches" which does a number of rather unrelated things, but in
largely similar code sites.
- In his series "mm: page_alloc: freelist migratetype hygiene"
Johannes Weiner has fixed the page allocator's handling of
migratetype requests, with resulting improvements in compaction
efficiency.
- In the series "make the hugetlb migration strategy consistent"
Baolin Wang has fixed a hugetlb migration issue, which should
improve hugetlb allocation reliability.
- Liu Shixin has hit an I/O meltdown caused by readahead in a
memory-tight memcg. Addressed in the series "Fix I/O high when
memory almost met memcg limit".
- In the series "mm/filemap: optimize folio adding and splitting"
Kairui Song has optimized pagecache insertion, yielding ~10%
performance improvement in one test.
- Baoquan He has cleaned up and consolidated the early zone
initialization code in the series "mm/mm_init.c: refactor
free_area_init_core()".
- Baoquan has also redone some MM initializatio code in the series
"mm/init: minor clean up and improvement".
- MM helper cleanups from Christoph Hellwig in his series "remove
follow_pfn".
- More cleanups from Matthew Wilcox in the series "Various
page->flags cleanups".
- Vlastimil Babka has contributed maintainability improvements in the
series "memcg_kmem hooks refactoring".
- More folio conversions and cleanups in Matthew Wilcox's series:
"Convert huge_zero_page to huge_zero_folio"
"khugepaged folio conversions"
"Remove page_idle and page_young wrappers"
"Use folio APIs in procfs"
"Clean up __folio_put()"
"Some cleanups for memory-failure"
"Remove page_mapping()"
"More folio compat code removal"
- David Hildenbrand chipped in with "fs/proc/task_mmu: convert
hugetlb functions to work on folis".
- Code consolidation and cleanup work related to GUP's handling of
hugetlbs in Peter Xu's series "mm/gup: Unify hugetlb, part 2".
- Rick Edgecombe has developed some fixes to stack guard gaps in the
series "Cover a guard gap corner case".
- Jinjiang Tu has fixed KSM's behaviour after a fork+exec in the
series "mm/ksm: fix ksm exec support for prctl".
- Baolin Wang has implemented NUMA balancing for multi-size THPs.
This is a simple first-cut implementation for now. The series is
"support multi-size THP numa balancing".
- Cleanups to vma handling helper functions from Matthew Wilcox in
the series "Unify vma_address and vma_pgoff_address".
- Some selftests maintenance work from Dev Jain in the series
"selftests/mm: mremap_test: Optimizations and style fixes".
- Improvements to the swapping of multi-size THPs from Ryan Roberts
in the series "Swap-out mTHP without splitting".
- Kefeng Wang has significantly optimized the handling of arm64's
permission page faults in the series
"arch/mm/fault: accelerate pagefault when badaccess"
"mm: remove arch's private VM_FAULT_BADMAP/BADACCESS"
- GUP cleanups from David Hildenbrand in "mm/gup: consistently call
it GUP-fast".
- hugetlb fault code cleanups from Vishal Moola in "Hugetlb fault
path to use struct vm_fault".
- selftests build fixes from John Hubbard in the series "Fix
selftests/mm build without requiring "make headers"".
- Memory tiering fixes/improvements from Ho-Ren (Jack) Chuang in the
series "Improved Memory Tier Creation for CPUless NUMA Nodes".
Fixes the initialization code so that migration between different
memory types works as intended.
- David Hildenbrand has improved follow_pte() and fixed an errant
driver in the series "mm: follow_pte() improvements and acrn
follow_pte() fixes".
- David also did some cleanup work on large folio mapcounts in his
series "mm: mapcount for large folios + page_mapcount() cleanups".
- Folio conversions in KSM in Alex Shi's series "transfer page to
folio in KSM".
- Barry Song has added some sysfs stats for monitoring multi-size
THP's in the series "mm: add per-order mTHP alloc and swpout
counters".
- Some zswap cleanups from Yosry Ahmed in the series "zswap
same-filled and limit checking cleanups".
- Matthew Wilcox has been looking at buffer_head code and found the
documentation to be lacking. The series is "Improve buffer head
documentation".
- Multi-size THPs get more work, this time from Lance Yang. His
series "mm/madvise: enhance lazyfreeing with mTHP in madvise_free"
optimizes the freeing of these things.
- Kemeng Shi has added more userspace-visible writeback
instrumentation in the series "Improve visibility of writeback".
- Kemeng Shi then sent some maintenance work on top in the series
"Fix and cleanups to page-writeback".
- Matthew Wilcox reduces mmap_lock traffic in the anon vma code in
the series "Improve anon_vma scalability for anon VMAs". Intel's
test bot reported an improbable 3x improvement in one test.
- SeongJae Park adds some DAMON feature work in the series
"mm/damon: add a DAMOS filter type for page granularity access recheck"
"selftests/damon: add DAMOS quota goal test"
- Also some maintenance work in the series
"mm/damon/paddr: simplify page level access re-check for pageout"
"mm/damon: misc fixes and improvements"
- David Hildenbrand has disabled some known-to-fail selftests ni the
series "selftests: mm: cow: flag vmsplice() hugetlb tests as
XFAIL".
- memcg metadata storage optimizations from Shakeel Butt in "memcg:
reduce memory consumption by memcg stats".
- DAX fixes and maintenance work from Vishal Verma in the series
"dax/bus.c: Fixups for dax-bus locking""
* tag 'mm-stable-2024-05-17-19-19' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (426 commits)
memcg, oom: cleanup unused memcg_oom_gfp_mask and memcg_oom_order
selftests/mm: hugetlb_madv_vs_map: avoid test skipping by querying hugepage size at runtime
mm/hugetlb: add missing VM_FAULT_SET_HINDEX in hugetlb_wp
mm/hugetlb: add missing VM_FAULT_SET_HINDEX in hugetlb_fault
selftests: cgroup: add tests to verify the zswap writeback path
mm: memcg: make alloc_mem_cgroup_per_node_info() return bool
mm/damon/core: fix return value from damos_wmark_metric_value
mm: do not update memcg stats for NR_{FILE/SHMEM}_PMDMAPPED
selftests: cgroup: remove redundant enabling of memory controller
Docs/mm/damon/maintainer-profile: allow posting patches based on damon/next tree
Docs/mm/damon/maintainer-profile: change the maintainer's timezone from PST to PT
Docs/mm/damon/design: use a list for supported filters
Docs/admin-guide/mm/damon/usage: fix wrong schemes effective quota update command
Docs/admin-guide/mm/damon/usage: fix wrong example of DAMOS filter matching sysfs file
selftests/damon: classify tests for functionalities and regressions
selftests/damon/_damon_sysfs: use 'is' instead of '==' for 'None'
selftests/damon/_damon_sysfs: find sysfs mount point from /proc/mounts
selftests/damon/_damon_sysfs: check errors from nr_schemes file reads
mm/damon/core: initialize ->esz_bp from damos_quota_init_priv()
selftests/damon: add a test for DAMOS quota goal
...
- Enable BPF Kernel Functions (kfuncs) in the powerpc BPF JIT.
- Allow per-process DEXCR (Dynamic Execution Control Register) settings via
prctl, notably NPHIE which controls hashst/hashchk for ROP protection.
- Install powerpc selftests in sub-directories. Note this changes the way
run_kselftest.sh needs to be invoked for powerpc selftests.
- Change fadump (Firmware Assisted Dump) to better handle memory add/remove.
- Add support for passing additional parameters to the fadump kernel.
- Add support for updating the kdump image on CPU/memory add/remove events.
- Other small features, cleanups and fixes.
Thanks to: Andrew Donnellan, Andy Shevchenko, Aneesh Kumar K.V, Arnd Bergmann,
Benjamin Gray, Bjorn Helgaas, Christian Zigotzky, Christophe Jaillet, Christophe
Leroy, Colin Ian King, Cédric Le Goater, Dr. David Alan Gilbert, Erhard Furtner,
Frank Li, GUO Zihua, Ganesh Goudar, Geoff Levand, Ghanshyam Agrawal, Greg Kurz,
Hari Bathini, Joel Stanley, Justin Stitt, Kunwu Chan, Li Yang, Lidong Zhong,
Madhavan Srinivasan, Mahesh Salgaonkar, Masahiro Yamada, Matthias Schiffer,
Naresh Kamboju, Nathan Chancellor, Nathan Lynch, Naveen N Rao, Nicholas
Miehlbradt, Ran Wang, Randy Dunlap, Ritesh Harjani, Sachin Sant, Shirisha Ganta,
Shrikanth Hegde, Sourabh Jain, Stephen Rothwell, sundar, Thorsten Blum, Vaibhav
Jain, Xiaowei Bao, Yang Li, Zhao Chenhui.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmZHLtwTHG1wZUBlbGxl
cm1hbi5pZC5hdQAKCRBR6+o8yOGlgCGdD/0cqQkYl6+E0/K68Y7jnAWF+l0LNFlm
/4jZ+zKXPiPhSdaQq4xo2ZjEooUPsm3c+AHidmrAtOMBULvv4pyciu61hrVu4Y2b
aAudkBMUc+i/Lfaz7fq1KnN4LDFVm7xZZ+i/ju9tOBLMpOZ3YZ+YoOGA6nqsshJF
XuB5h0T+H55he1wBpvyyrsUUyss53Mp3IsajxdwBOsUDDp0fSAg8SLEyhoiK3BsQ
EjEa6iEqJSBheqFEXPvqsMuqM3k51CHe/pCOMODjo7P+u/MNrClZUscZKXGB5xq9
Bu3SPxIYfRmU4XE53517faElEPmlxSBrjQGCD1EGEVXGsjn6r7TD6R5voow3SoUq
CLTy90KNNrS1cIqeomu6bJ/anzYrViqTdekImA7Vb+Ol8f+uT9l+l1D75eYOKPQ3
N0AHoa4rnWIb5kjCAjHaZ54O+B2q2tPlQqFUmt+BrvZyKS13zjE36stnArxP3MPC
Xw6y3huX3AkZiJ4mQYRiBn//xGOLwrRCd/EoTDnoe08yq0Hoor6qIm4uEy2Nu3Kf
0mBsEOxMsmQd6NEq43B/sFgVbbxKhAyxfZ9gHqxDQZcgoxXcMesyj/n4+jM5sRYK
zmavLlykM2Tjlh1evs8+e0mCEwDjDn2GRlqstJQTrmnGhbMKi3jvw9I7gGtZVqbS
kAflTXzsIXvxBA==
=GoCV
-----END PGP SIGNATURE-----
Merge tag 'powerpc-6.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
- Enable BPF Kernel Functions (kfuncs) in the powerpc BPF JIT.
- Allow per-process DEXCR (Dynamic Execution Control Register) settings
via prctl, notably NPHIE which controls hashst/hashchk for ROP
protection.
- Install powerpc selftests in sub-directories. Note this changes the
way run_kselftest.sh needs to be invoked for powerpc selftests.
- Change fadump (Firmware Assisted Dump) to better handle memory
add/remove.
- Add support for passing additional parameters to the fadump kernel.
- Add support for updating the kdump image on CPU/memory add/remove
events.
- Other small features, cleanups and fixes.
Thanks to Andrew Donnellan, Andy Shevchenko, Aneesh Kumar K.V, Arnd
Bergmann, Benjamin Gray, Bjorn Helgaas, Christian Zigotzky, Christophe
Jaillet, Christophe Leroy, Colin Ian King, Cédric Le Goater, Dr. David
Alan Gilbert, Erhard Furtner, Frank Li, GUO Zihua, Ganesh Goudar, Geoff
Levand, Ghanshyam Agrawal, Greg Kurz, Hari Bathini, Joel Stanley, Justin
Stitt, Kunwu Chan, Li Yang, Lidong Zhong, Madhavan Srinivasan, Mahesh
Salgaonkar, Masahiro Yamada, Matthias Schiffer, Naresh Kamboju, Nathan
Chancellor, Nathan Lynch, Naveen N Rao, Nicholas Miehlbradt, Ran Wang,
Randy Dunlap, Ritesh Harjani, Sachin Sant, Shirisha Ganta, Shrikanth
Hegde, Sourabh Jain, Stephen Rothwell, sundar, Thorsten Blum, Vaibhav
Jain, Xiaowei Bao, Yang Li, and Zhao Chenhui.
* tag 'powerpc-6.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (85 commits)
powerpc/fadump: Fix section mismatch warning
powerpc/85xx: fix compile error without CONFIG_CRASH_DUMP
powerpc/fadump: update documentation about bootargs_append
powerpc/fadump: pass additional parameters when fadump is active
powerpc/fadump: setup additional parameters for dump capture kernel
powerpc/pseries/fadump: add support for multiple boot memory regions
selftests/powerpc/dexcr: Fix spelling mistake "predicition" -> "prediction"
KVM: PPC: Book3S HV nestedv2: Fix an error handling path in gs_msg_ops_kvmhv_nestedv2_config_fill_info()
KVM: PPC: Fix documentation for ppc mmu caps
KVM: PPC: code cleanup for kvmppc_book3s_irqprio_deliver
KVM: PPC: Book3S HV nestedv2: Cancel pending DEC exception
powerpc/xmon: Check cpu id in commands "c#", "dp#" and "dx#"
powerpc/code-patching: Use dedicated memory routines for patching
powerpc/code-patching: Test patch_instructions() during boot
powerpc64/kasan: Pass virtual addresses to kasan_init_phys_region()
powerpc: rename SPRN_HID2 define to SPRN_HID2_750FX
powerpc: Fix typos
powerpc/eeh: Fix spelling of the word "auxillary" and update comment
macintosh/ams: Fix unused variable warning
powerpc/Makefile: Remove bits related to the previous use of -mcmodel=large
...
Code cleanups, remove duplicate code, and updates for current
interfaces.
-----BEGIN PGP SIGNATURE-----
iIoEABYKADIWIQSgX9xt+GwmrJEQ+euebuN7TNx1MQUCZkThOhQcaXJhLndlaW55
QGludGVsLmNvbQAKCRCebuN7TNx1MWUvAP9VuWLTc+bCguRMbArOsdE2T60780pR
DT12KXfwm/FvKAD9Hyz8S2CkwvdlwC5hqH2AmdpzDhpJ1f9LBpcsb6GUKwk=
=eZC/
-----END PGP SIGNATURE-----
Merge tag 'libnvdimm-for-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull nvdimm updates from Ira Weiny:
"The changes include removing duplicate code and updating the nvdimm
tree to the current kernel interfaces such as using const for struct
device_type and changing the platform remove callback signature"
* tag 'libnvdimm-for-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
dax: remove redundant assignment to variable rc
ndtest: Convert to platform remove callback returning void
nvdimm/btt: always set max_integrity_segments
nvdimm: remove nd_integrity_init
dax: constify the struct device_type usage
powerpc/papr_scm: Move duplicate definitions to common header files
For fadump case, passing additional parameters to dump capture kernel
helps in minimizing the memory footprint for it and also provides the
flexibility to disable components/modules, like hugepages, that are
hindering the boot process of the special dump capture environment.
Set up a dedicated parameter area to be passed to the capture kernel.
This area type is defined as RTAS_FADUMP_PARAM_AREA. Sysfs attribute
'/sys/kernel/fadump/bootargs_append' is exported to the userspace to
specify the additional parameters to be passed to the capture kernel
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240509115755.519982-3-hbathini@linux.ibm.com
Currently, fadump on pseries assumes a single boot memory region even
though f/w supports more than one boot memory region. Add support for
more boot memory regions to make the implementation flexible for any
enhancements that introduce other region types. For this, rtas memory
structure for fadump is updated to have multiple boot memory regions
instead of just one. Additionally, methods responsible for creating
the fadump memory structure during both the first and second kernel
boot have been modified to take these multiple boot memory regions
into account. Also, a new callback has been added to the fadump_ops
structure to get the maximum boot memory regions supported by the
platform.
Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240509115755.519982-2-hbathini@linux.ibm.com
This register number is hardware-specific, rename it for clarity.
FIXME comments are added in a few places where it seems like the wrong
register is used. As I can't test this, only the rename is done with no
functional change.
Signed-off-by: Matthias Schiffer <matthias.schiffer@ew.tq-group.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240124105031.45734-1-matthias.schiffer@ew.tq-group.com
Fix typos, most reported by "codespell arch/powerpc". Only touches
comments, no code changes.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240103231605.1801364-8-helgaas@kernel.org
All supported compilers today (gcc v5.1+ and clang v11+) have support for
-mcmodel=medium. As such, NO_MINIMAL_TOC is no longer being set. Remove
NO_MINIMAL_TOC as well as the fallback to -mminimal-toc.
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Naveen N Rao <naveen@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240110141237.3179199-1-naveen@kernel.org
This part was commented in about 19 years before.
If there are no plans to enable this part code in the future,
we can remove this dead code.
Signed-off-by: Kunwu Chan <chentao@kylinos.cn>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240126025030.577795-1-chentao@kylinos.cn
This part was commented from commit a33a7d7309
("[PATCH] spufs: implement mfc access for PPE-side DMA")
in about 18 years before.
If there are no plans to enable this part code in the future,
we can remove this dead code.
Signed-off-by: Kunwu Chan <chentao@kylinos.cn>
Suggested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240126021258.574916-1-chentao@kylinos.cn
This part was commented from commit 165785e5c0 ("[POWERPC] Cell
iommu support") in about 17 years before.
If there are no plans to enable this part code in the future,
we can remove this dead code.
Signed-off-by: Kunwu Chan <chentao@kylinos.cn>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240125082637.532826-1-chentao@kylinos.cn
This commit adds kernel-doc style comments with complete parameter
descriptions for the function smp_startup_cpu().
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240408053109.96360-2-yang.lee@linux.alibaba.com
We noticed the following nuisance messages during boot process:
vio vio: uevent: failed to send synthetic uevent
vio 4000: uevent: failed to send synthetic uevent
vio 4001: uevent: failed to send synthetic uevent
vio 4002: uevent: failedto send synthetic uevent
vio 4004: uevent: failed to send synthetic uevent
It's caused by either vio_register_device_node() failing to set
dev->of_node or the node is missing a "compatible" property. To match
the definition of modalias in modalias_show(), remove the return of
ENODEV in such cases. The failure messages is also suppressed with this
change.
Signed-off-by: Lidong Zhong <lidong.zhong@suse.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240411020450.12725-1-lidong.zhong@suse.com
Due to changes in memory resources caused by either memory hotplug or
online/offline events, the elfcorehdr, which describes the CPUs and
memory of the crashed kernel to the kernel that collects the dump (known
as second/fadump kernel), becomes outdated. Consequently, attempting
dump collection with an outdated elfcorehdr can lead to failed or
inaccurate dump collection.
Memory hotplug or online/offline events is referred as memory add/remove
events in reset of the commit message.
The current solution to address the aforementioned issue is as follows:
Monitor memory add/remove events in userspace using udev rules, and
re-register fadump whenever there are changes in memory resources. This
leads to the creation of a new elfcorehdr with updated system memory
information.
There are several notable issues associated with re-registering fadump
for every memory add/remove events.
1. Bulk memory add/remove events with udev-based fadump re-registration
can lead to race conditions and, more importantly, it creates a wide
window during which fadump is inactive until all memory add/remove
events are settled.
2. Re-registering fadump for every memory add/remove event is
inefficient.
3. The memory for elfcorehdr is allocated based on the memblock regions
available during early boot and remains fixed thereafter. However, if
elfcorehdr is later recreated with additional memblock regions, its
size will increase, potentially leading to memory corruption.
Address the aforementioned challenges by shifting the creation of
elfcorehdr from the first kernel (also referred as the crashed kernel),
where it was created and frequently recreated for every memory
add/remove event, to the fadump kernel. As a result, the elfcorehdr only
needs to be created once, thus eliminating the necessity to re-register
fadump during memory add/remove events.
At present, the first kernel prepares fadump header and stores it in the
fadump reserved area. The fadump header includes the start address of
the elfcorehdr, crashing CPU details, and other relevant information. In
the event of a crash in the first kernel, the second/fadump boots and
accesses the fadump header prepared by the first kernel. It then
performs the following steps in a platform-specific function
[rtas|opal]_fadump_process:
1. Sanity check for fadump header
2. Update CPU notes in elfcorehdr
Along with the above, update the setup_fadump()/fadump.c to create
elfcorehdr and set its address to the global variable elfcorehdr_addr
for the vmcore module to process it in the second/fadump kernel.
Section below outlines the information required to create the elfcorehdr
and the changes made to make it available to the fadump kernel if it's
not already.
To create elfcorehdr, the following crashed kernel information is
required: CPU notes, vmcoreinfo, and memory ranges.
At present, the CPU notes are already prepared in the fadump kernel, so
no changes are needed in that regard. The fadump kernel has access to
all crashed kernel memory regions, including boot memory regions that
are relocated by firmware to fadump reserved areas, so no changes for
that either. However, it is necessary to add new members to the fadump
header, i.e., the 'fadump_crash_info_header' structure, in order to pass
the crashed kernel's vmcoreinfo address and its size to fadump kernel.
In addition to the vmcoreinfo address and size, there are a few other
attributes also added to the fadump_crash_info_header structure.
1. version:
It stores the fadump header version, which is currently set to 1.
This provides flexibility to update the fadump crash info header in
the future without changing the magic number. For each change in the
fadump header, the version will be increased. This will help the
updated kernel determine how to handle kernel dumps from older
kernels. The magic number remains relevant for checking fadump header
corruption.
2. pt_regs_sz/cpu_mask_sz:
Store size of pt_regs and cpu_mask structure of first kernel. These
attributes are used to prevent dump processing if the sizes of
pt_regs or cpu_mask structure differ between the first and fadump
kernels.
Note: if either first/crashed kernel or second/fadump kernel do not have
the changes introduced here then kernel fail to collect the dump and
prints relevant error message on the console.
Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240422195932.1583833-2-sourabhjain@linux.ibm.com
Couple of Minor fixes:
- hcall return values are long. Fix that for h_get_mpp, h_get_ppp and
parse_ppp_data
- If hcall fails, values set should be at-least zero. It shouldn't be
uninitialized values. Fix that for h_get_mpp and h_get_ppp
Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240412092047.455483-3-sshegde@linux.ibm.com
When there are no options specified for lparstat, it is expected to
give reports since LPAR(Logical Partition) boot.
APP(Available Processor Pool) is an indicator of how many cores in the
shared pool are free to use in Shared Processor LPAR(SPLPAR). APP is
derived using pool_idle_time which is obtained using H_PIC call.
The interval based reports show correct APP value while since boot
report shows very high APP values. This happens because in that case APP
is obtained by dividing pool idle time by LPAR uptime. Since pool idle
time is reported by the PowerVM hypervisor since its boot, it need not
align with LPAR boot.
To fix that export boot pool idle time in lparcfg and powerpc-utils will
use this info to derive APP as below for since boot reports.
APP = (pool idle time - boot pool idle time) / (uptime * timebase)
Results:: Observe APP values.
====================== Shared LPAR ================================
lparstat
System Configuration
type=Shared mode=Uncapped smt=8 lcpu=12 mem=15573440 kB cpus=37 ent=12.00
reboot
stress-ng --cpu=$(nproc) -t 600
sleep 600
So in this case app is expected to close to 37-6=31.
====== 6.9-rc1 and lparstat 1.3.10 =============
%user %sys %wait %idle physc %entc lbusy app vcsw phint
----- ----- ----- ----- ----- ----- ----- ----- ----- -----
47.48 0.01 0.00 52.51 0.00 0.00 47.49 69099.72 541547 21
=== With this patch and powerpc-utils patch to do the above equation ===
%user %sys %wait %idle physc %entc lbusy app vcsw phint
----- ----- ----- ----- ----- ----- ----- ----- ----- -----
47.48 0.01 0.00 52.51 5.73 47.75 47.49 31.21 541753 21
=====================================================================
Note: physc, purr/idle purr being inaccurate is being handled in a
separate patch in powerpc-utils tree.
Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240412092047.455483-2-sshegde@linux.ibm.com
After redefining alloc_pages, all uses of that name are being replaced.
Change the conflicting names to prevent preprocessor from replacing them
when it's not intended.
Link: https://lkml.kernel.org/r/20240321163705.3067592-18-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Tested-by: Kees Cook <keescook@chromium.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alex Gaynor <alex.gaynor@gmail.com>
Cc: Alice Ryhl <aliceryhl@google.com>
Cc: Andreas Hindborg <a.hindborg@samsung.com>
Cc: Benno Lossin <benno.lossin@proton.me>
Cc: "Björn Roy Baron" <bjorn3_gh@protonmail.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Gary Guo <gary@garyguo.net>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wedson Almeida Filho <wedsonaf@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
papr_scm and ndtest share common PDSM payload structs like
nd_papr_pdsm_health. Presently these structs are duplicated across
papr_pdsm.h and ndtest.h header files. Since 'ndtest' is essentially
arch independent and can run on platforms other than PPC64, a way
needs to be deviced to avoid redundancy and duplication of PDSM
structs in future.
So the patch proposes moving the PDSM header from arch/powerpc/include-
-/uapi/ to the generic include/uapi/linux directory. Also, there
are some #defines common between papr_scm and ndtest which are not
exported to the user space. So, move them to a header file which
can be shared across ndtest and papr_scm via newly introduced
include/linux/papr_scm.h.
Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Suggested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Link: https://lore.kernel.org/r/170638176942.112443.2937254675538057083.stgit@ltcd48-lp2.aus.stglab.ibm.com
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
At the time of LPAR boot up, partition firmware provides Open Firmware
property ibm,dma-window for the PE. This property is provided on the PCI
bus the PE is attached to.
There are execptions where the partition firmware might not provide this
property for the PE at the time of LPAR boot up. One of the scenario is
where the firmware has frozen the PE due to some error condition. This
PE is frozen for 24 hours or unless the whole system is reinitialized.
Within this time frame, if the LPAR is booted, the frozen PE will be
presented to the LPAR but ibm,dma-window property could be missing.
Today, under these circumstances, the LPAR oopses with NULL pointer
dereference, when configuring the PCI bus the PE is attached to.
BUG: Kernel NULL pointer dereference on read at 0x000000c8
Faulting instruction address: 0xc0000000001024c0
Oops: Kernel access of bad area, sig: 7 [#1]
LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
Modules linked in:
Supported: Yes
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.4.0-150600.9-default #1
Hardware name: IBM,9043-MRX POWER10 (raw) 0x800200 0xf000006 of:IBM,FW1060.00 (NM1060_023) hv:phyp pSeries
NIP: c0000000001024c0 LR: c0000000001024b0 CTR: c000000000102450
REGS: c0000000037db5c0 TRAP: 0300 Not tainted (6.4.0-150600.9-default)
MSR: 8000000002009033 <SF,VEC,EE,ME,IR,DR,RI,LE> CR: 28000822 XER: 00000000
CFAR: c00000000010254c DAR: 00000000000000c8 DSISR: 00080000 IRQMASK: 0
...
NIP [c0000000001024c0] pci_dma_bus_setup_pSeriesLP+0x70/0x2a0
LR [c0000000001024b0] pci_dma_bus_setup_pSeriesLP+0x60/0x2a0
Call Trace:
pci_dma_bus_setup_pSeriesLP+0x60/0x2a0 (unreliable)
pcibios_setup_bus_self+0x1c0/0x370
__of_scan_bus+0x2f8/0x330
pcibios_scan_phb+0x280/0x3d0
pcibios_init+0x88/0x12c
do_one_initcall+0x60/0x320
kernel_init_freeable+0x344/0x3e4
kernel_init+0x34/0x1d0
ret_from_kernel_user_thread+0x14/0x1c
Fixes: b1fc44eaa9 ("pseries/iommu/ddw: Fix kdump to work in absence of ibm,dma-window")
Signed-off-by: Gaurav Batra <gbatra@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240422205141.10662-1-gbatra@linux.ibm.com
Currently, plpks_confirm_object_flushed() function polls for 5msec in
total instead of 5sec.
Keep max polling time consistent for all the H_CALLs, which take longer
than expected, to be 5sec. Also, make use of fsleep() everywhere to
insert delay.
Reported-by: Nageswara R Sastry <rnsastry@linux.ibm.com>
Fixes: 2454a7af0f ("powerpc/pseries: define driver for Platform KeyStore")
Signed-off-by: Nayna Jain <nayna@linux.ibm.com>
Tested-by: Nageswara R Sastry <rnsastry@linux.ibm.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240418031230.170954-1-nayna@linux.ibm.com
Deduplicate ->read() callbacks of bin_attributes which are backed by a
simple buffer in memory:
Use the newly introduced sysfs_bin_attr_simple_read() helper instead,
either by referencing it directly or by declaring such bin_attributes
with BIN_ATTR_SIMPLE_RO() or BIN_ATTR_SIMPLE_ADMIN_RO().
Aside from a reduction of LoC, this shaves off a few bytes from vmlinux
(304 bytes on an x86_64 allyesconfig).
No functional change intended.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Acked-by: Zhi Wang <zhiwang@kernel.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/92ee0a0e83a5a3f3474845db6c8575297698933a.1712410202.git.lukas@wunner.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
of_gpio.h is deprecated and subject to remove.
The driver doesn't use it directly, replace it
with what is really being used.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240313135645.2066362-1-andriy.shevchenko@linux.intel.com
The struct ps3_notification_device in the ps3_probe_thread routine
is too large to be on the stack, causing a warning for an
allmodconfig build with clang.
Change the struct ps3_notification_device from a variable on the stack
to a dynamically allocated variable.
Reported-by: Arnd Bergmann <arnd@kernel.org>
Signed-off-by: Geoff Levand <geoff@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/d64f06f4-81ae-4ec5-ab3b-d7f7f091e0ac@infradead.org
Remove CONFIG_CRASH_DUMP dependency on CONFIG_KEXEC. CONFIG_KEXEC_CORE
was used at places where CONFIG_CRASH_DUMP or CONFIG_CRASH_RESERVE was
appropriate. Replace with appropriate #ifdefs to support CONFIG_KEXEC
and !CONFIG_CRASH_DUMP configuration option. Also, make CONFIG_FA_DUMP
dependent on CONFIG_CRASH_DUMP to avoid unmet dependencies for FA_DUMP
with !CONFIG_KEXEC_CORE configuration option.
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240226103010.589537-4-hbathini@linux.ibm.com
- Add AT_HWCAP3 and AT_HWCAP4 aux vector entries for future use by glibc.
- Add support for recognising the Power11 architected and raw PVRs.
- Add support for nr_cpus=n on the command line where the boot CPU is >= n.
- Add ppcxx_allmodconfig targets for all 32-bit sub-arches.
- Other small features, cleanups and fixes.
Thanks to: Akanksha J N, Brian King, Christophe Leroy, Dawei Li, Geoff Levand,
Greg Kroah-Hartman, Jan-Benedict Glaw, Kajol Jain, Kunwu Chan, Li zeming,
Madhavan Srinivasan, Masahiro Yamada, Nathan Chancellor, Nicholas Piggin, Peter
Bergner, Qiheng Lin, Randy Dunlap, Ricardo B. Marliere, Rob Herring, Sathvika
Vasireddy, Shrikanth Hegde, Uwe Kleine-König, Vaibhav Jain, Wen Xiong.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmX01vgTHG1wZUBlbGxl
cm1hbi5pZC5hdQAKCRBR6+o8yOGlgJ4bEACVsxXXjbjl+WKgWNjHsM7sVwUX/sSV
z43iVycLPXDqochSkkgKjyIEFowaWhjgWVHFHmUXWxB5FjjFEEoH4FPo3VB0IY48
VoSFT6PhzqXDrGmt2fWsJ+k6zUyJZa8pNS38DHg1yuuYDAa0KWxd3E/x/r0qzsbr
vcas1uWcDWgjoUDMBuJpyx0sYTl6+mR9HlZuM4+aNQdzhTFU/jK69hAN0RFvryes
K2/fLgI0fgLZpQDogCn4HV1/4uixi1eEFlVNXkwvMYDpQVo2FqiBaWLF0hNLWNCk
kvm/fYIJhdFoNlp38jVKv0KJnBhW7aAs3prF+8B3YL2B23rLnvA6ZLZKHcdBAeLb
8PJMRrbAbmVxOnVSAG0fgU+0dEdkJQ+0ABqa+usMOV7xIPg9uIui1YrKT1KVq6Fs
KyGHM5EQuBC/P6bTsKO6X+1beY2QIfwWxaIkoo8pj6d0WU69qU4u+LzQiDO4XR0L
UQQguB1Qo8yaip3rHXhuv0hlnMNVAVye56Zw63uq1MWGkewRKSkY91Ms02L+pXpF
r6+96xoFB0ulKZFnyxyBdkj2iC0426fHtTiiJFfQ4R1fiibPKtAx9P59WYnqymVh
QsSYqlgC2/jWzRgqJTweLp/XQK8fWqmFkNmCGDN1N9Sij9Xjx/8aZb5dvwJkSBnK
rZ4ObxBoaCPbPA==
=K9Ok
-----END PGP SIGNATURE-----
Merge tag 'powerpc-6.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
- Add AT_HWCAP3 and AT_HWCAP4 aux vector entries for future use
by glibc
- Add support for recognising the Power11 architected and raw PVRs
- Add support for nr_cpus=n on the command line where the
boot CPU is >= n
- Add ppcxx_allmodconfig targets for all 32-bit sub-arches
- Other small features, cleanups and fixes
Thanks to Akanksha J N, Brian King, Christophe Leroy, Dawei Li, Geoff
Levand, Greg Kroah-Hartman, Jan-Benedict Glaw, Kajol Jain, Kunwu Chan,
Li zeming, Madhavan Srinivasan, Masahiro Yamada, Nathan Chancellor,
Nicholas Piggin, Peter Bergner, Qiheng Lin, Randy Dunlap, Ricardo B.
Marliere, Rob Herring, Sathvika Vasireddy, Shrikanth Hegde, Uwe
Kleine-König, Vaibhav Jain, and Wen Xiong.
* tag 'powerpc-6.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (71 commits)
powerpc/macio: Make remove callback of macio driver void returned
powerpc/83xx: Fix build failure with FPU=n
powerpc/64s: Fix get_hugepd_cache_index() build failure
powerpc/4xx: Fix warp_gpio_leds build failure
powerpc/amigaone: Make several functions static
powerpc/embedded6xx: Fix no previous prototype for avr_uart_send() etc.
macintosh/adb: make adb_dev_class constant
powerpc: xor_vmx: Add '-mhard-float' to CFLAGS
powerpc/fsl: Fix mfpmr() asm constraint error
powerpc: Remove cpu-as-y completely
powerpc/fsl: Modernise mt/mfpmr
powerpc/fsl: Fix mfpmr build errors with newer binutils
powerpc/64s: Use .machine power4 around dcbt
powerpc/64s: Move dcbt/dcbtst sequence into a macro
powerpc/mm: Code cleanup for __hash_page_thp
powerpc/hv-gpci: Fix the H_GET_PERF_COUNTER_INFO hcall return value checks
powerpc/irq: Allow softirq to hardirq stack transition
powerpc: Stop using of_root
powerpc/machdep: Define 'compatibles' property in ppc_md and use it
of: Reimplement of_machine_is_compatible() using of_machine_compatible_match()
...
from hotplugged memory rather than only from main memory. Series
"implement "memmap on memory" feature on s390".
- More folio conversions from Matthew Wilcox in the series
"Convert memcontrol charge moving to use folios"
"mm: convert mm counter to take a folio"
- Chengming Zhou has optimized zswap's rbtree locking, providing
significant reductions in system time and modest but measurable
reductions in overall runtimes. The series is "mm/zswap: optimize the
scalability of zswap rb-tree".
- Chengming Zhou has also provided the series "mm/zswap: optimize zswap
lru list" which provides measurable runtime benefits in some
swap-intensive situations.
- And Chengming Zhou further optimizes zswap in the series "mm/zswap:
optimize for dynamic zswap_pools". Measured improvements are modest.
- zswap cleanups and simplifications from Yosry Ahmed in the series "mm:
zswap: simplify zswap_swapoff()".
- In the series "Add DAX ABI for memmap_on_memory", Vishal Verma has
contributed several DAX cleanups as well as adding a sysfs tunable to
control the memmap_on_memory setting when the dax device is hotplugged
as system memory.
- Johannes Weiner has added the large series "mm: zswap: cleanups",
which does that.
- More DAMON work from SeongJae Park in the series
"mm/damon: make DAMON debugfs interface deprecation unignorable"
"selftests/damon: add more tests for core functionalities and corner cases"
"Docs/mm/damon: misc readability improvements"
"mm/damon: let DAMOS feeds and tame/auto-tune itself"
- In the series "mm/mempolicy: weighted interleave mempolicy and sysfs
extension" Rakie Kim has developed a new mempolicy interleaving policy
wherein we allocate memory across nodes in a weighted fashion rather
than uniformly. This is beneficial in heterogeneous memory environments
appearing with CXL.
- Christophe Leroy has contributed some cleanup and consolidation work
against the ARM pagetable dumping code in the series "mm: ptdump:
Refactor CONFIG_DEBUG_WX and check_wx_pages debugfs attribute".
- Luis Chamberlain has added some additional xarray selftesting in the
series "test_xarray: advanced API multi-index tests".
- Muhammad Usama Anjum has reworked the selftest code to make its
human-readable output conform to the TAP ("Test Anything Protocol")
format. Amongst other things, this opens up the use of third-party
tools to parse and process out selftesting results.
- Ryan Roberts has added fork()-time PTE batching of THP ptes in the
series "mm/memory: optimize fork() with PTE-mapped THP". Mainly
targeted at arm64, this significantly speeds up fork() when the process
has a large number of pte-mapped folios.
- David Hildenbrand also gets in on the THP pte batching game in his
series "mm/memory: optimize unmap/zap with PTE-mapped THP". It
implements batching during munmap() and other pte teardown situations.
The microbenchmark improvements are nice.
- And in the series "Transparent Contiguous PTEs for User Mappings" Ryan
Roberts further utilizes arm's pte's contiguous bit ("contpte
mappings"). Kernel build times on arm64 improved nicely. Ryan's series
"Address some contpte nits" provides some followup work.
- In the series "mm/hugetlb: Restore the reservation" Breno Leitao has
fixed an obscure hugetlb race which was causing unnecessary page faults.
He has also added a reproducer under the selftest code.
- In the series "selftests/mm: Output cleanups for the compaction test",
Mark Brown did what the title claims.
- Kinsey Ho has added the series "mm/mglru: code cleanup and refactoring".
- Even more zswap material from Nhat Pham. The series "fix and extend
zswap kselftests" does as claimed.
- In the series "Introduce cpu_dcache_is_aliasing() to fix DAX
regression" Mathieu Desnoyers has cleaned up and fixed rather a mess in
our handling of DAX on archiecctures which have virtually aliasing data
caches. The arm architecture is the main beneficiary.
- Lokesh Gidra's series "per-vma locks in userfaultfd" provides dramatic
improvements in worst-case mmap_lock hold times during certain
userfaultfd operations.
- Some page_owner enhancements and maintenance work from Oscar Salvador
in his series
"page_owner: print stacks and their outstanding allocations"
"page_owner: Fixup and cleanup"
- Uladzislau Rezki has contributed some vmalloc scalability improvements
in his series "Mitigate a vmap lock contention". It realizes a 12x
improvement for a certain microbenchmark.
- Some kexec/crash cleanup work from Baoquan He in the series "Split
crash out from kexec and clean up related config items".
- Some zsmalloc maintenance work from Chengming Zhou in the series
"mm/zsmalloc: fix and optimize objects/page migration"
"mm/zsmalloc: some cleanup for get/set_zspage_mapping()"
- Zi Yan has taught the MM to perform compaction on folios larger than
order=0. This a step along the path to implementaton of the merging of
large anonymous folios. The series is named "Enable >0 order folio
memory compaction".
- Christoph Hellwig has done quite a lot of cleanup work in the
pagecache writeback code in his series "convert write_cache_pages() to
an iterator".
- Some modest hugetlb cleanups and speedups in Vishal Moola's series
"Handle hugetlb faults under the VMA lock".
- Zi Yan has changed the page splitting code so we can split huge pages
into sizes other than order-0 to better utilize large folios. The
series is named "Split a folio to any lower order folios".
- David Hildenbrand has contributed the series "mm: remove
total_mapcount()", a cleanup.
- Matthew Wilcox has sought to improve the performance of bulk memory
freeing in his series "Rearrange batched folio freeing".
- Gang Li's series "hugetlb: parallelize hugetlb page init on boot"
provides large improvements in bootup times on large machines which are
configured to use large numbers of hugetlb pages.
- Matthew Wilcox's series "PageFlags cleanups" does that.
- Qi Zheng's series "minor fixes and supplement for ptdesc" does that
also. S390 is affected.
- Cleanups to our pagemap utility functions from Peter Xu in his series
"mm/treewide: Replace pXd_large() with pXd_leaf()".
- Nico Pache has fixed a few things with our hugepage selftests in his
series "selftests/mm: Improve Hugepage Test Handling in MM Selftests".
- Also, of course, many singleton patches to many things. Please see
the individual changelogs for details.
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZfJpPQAKCRDdBJ7gKXxA
joxeAP9TrcMEuHnLmBlhIXkWbIR4+ki+pA3v+gNTlJiBhnfVSgD9G55t1aBaRplx
TMNhHfyiHYDTx/GAV9NXW84tasJSDgA=
=TG55
-----END PGP SIGNATURE-----
Merge tag 'mm-stable-2024-03-13-20-04' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
- Sumanth Korikkar has taught s390 to allocate hotplug-time page frames
from hotplugged memory rather than only from main memory. Series
"implement "memmap on memory" feature on s390".
- More folio conversions from Matthew Wilcox in the series
"Convert memcontrol charge moving to use folios"
"mm: convert mm counter to take a folio"
- Chengming Zhou has optimized zswap's rbtree locking, providing
significant reductions in system time and modest but measurable
reductions in overall runtimes. The series is "mm/zswap: optimize the
scalability of zswap rb-tree".
- Chengming Zhou has also provided the series "mm/zswap: optimize zswap
lru list" which provides measurable runtime benefits in some
swap-intensive situations.
- And Chengming Zhou further optimizes zswap in the series "mm/zswap:
optimize for dynamic zswap_pools". Measured improvements are modest.
- zswap cleanups and simplifications from Yosry Ahmed in the series
"mm: zswap: simplify zswap_swapoff()".
- In the series "Add DAX ABI for memmap_on_memory", Vishal Verma has
contributed several DAX cleanups as well as adding a sysfs tunable to
control the memmap_on_memory setting when the dax device is
hotplugged as system memory.
- Johannes Weiner has added the large series "mm: zswap: cleanups",
which does that.
- More DAMON work from SeongJae Park in the series
"mm/damon: make DAMON debugfs interface deprecation unignorable"
"selftests/damon: add more tests for core functionalities and corner cases"
"Docs/mm/damon: misc readability improvements"
"mm/damon: let DAMOS feeds and tame/auto-tune itself"
- In the series "mm/mempolicy: weighted interleave mempolicy and sysfs
extension" Rakie Kim has developed a new mempolicy interleaving
policy wherein we allocate memory across nodes in a weighted fashion
rather than uniformly. This is beneficial in heterogeneous memory
environments appearing with CXL.
- Christophe Leroy has contributed some cleanup and consolidation work
against the ARM pagetable dumping code in the series "mm: ptdump:
Refactor CONFIG_DEBUG_WX and check_wx_pages debugfs attribute".
- Luis Chamberlain has added some additional xarray selftesting in the
series "test_xarray: advanced API multi-index tests".
- Muhammad Usama Anjum has reworked the selftest code to make its
human-readable output conform to the TAP ("Test Anything Protocol")
format. Amongst other things, this opens up the use of third-party
tools to parse and process out selftesting results.
- Ryan Roberts has added fork()-time PTE batching of THP ptes in the
series "mm/memory: optimize fork() with PTE-mapped THP". Mainly
targeted at arm64, this significantly speeds up fork() when the
process has a large number of pte-mapped folios.
- David Hildenbrand also gets in on the THP pte batching game in his
series "mm/memory: optimize unmap/zap with PTE-mapped THP". It
implements batching during munmap() and other pte teardown
situations. The microbenchmark improvements are nice.
- And in the series "Transparent Contiguous PTEs for User Mappings"
Ryan Roberts further utilizes arm's pte's contiguous bit ("contpte
mappings"). Kernel build times on arm64 improved nicely. Ryan's
series "Address some contpte nits" provides some followup work.
- In the series "mm/hugetlb: Restore the reservation" Breno Leitao has
fixed an obscure hugetlb race which was causing unnecessary page
faults. He has also added a reproducer under the selftest code.
- In the series "selftests/mm: Output cleanups for the compaction
test", Mark Brown did what the title claims.
- Kinsey Ho has added the series "mm/mglru: code cleanup and
refactoring".
- Even more zswap material from Nhat Pham. The series "fix and extend
zswap kselftests" does as claimed.
- In the series "Introduce cpu_dcache_is_aliasing() to fix DAX
regression" Mathieu Desnoyers has cleaned up and fixed rather a mess
in our handling of DAX on archiecctures which have virtually aliasing
data caches. The arm architecture is the main beneficiary.
- Lokesh Gidra's series "per-vma locks in userfaultfd" provides
dramatic improvements in worst-case mmap_lock hold times during
certain userfaultfd operations.
- Some page_owner enhancements and maintenance work from Oscar Salvador
in his series
"page_owner: print stacks and their outstanding allocations"
"page_owner: Fixup and cleanup"
- Uladzislau Rezki has contributed some vmalloc scalability
improvements in his series "Mitigate a vmap lock contention". It
realizes a 12x improvement for a certain microbenchmark.
- Some kexec/crash cleanup work from Baoquan He in the series "Split
crash out from kexec and clean up related config items".
- Some zsmalloc maintenance work from Chengming Zhou in the series
"mm/zsmalloc: fix and optimize objects/page migration"
"mm/zsmalloc: some cleanup for get/set_zspage_mapping()"
- Zi Yan has taught the MM to perform compaction on folios larger than
order=0. This a step along the path to implementaton of the merging
of large anonymous folios. The series is named "Enable >0 order folio
memory compaction".
- Christoph Hellwig has done quite a lot of cleanup work in the
pagecache writeback code in his series "convert write_cache_pages()
to an iterator".
- Some modest hugetlb cleanups and speedups in Vishal Moola's series
"Handle hugetlb faults under the VMA lock".
- Zi Yan has changed the page splitting code so we can split huge pages
into sizes other than order-0 to better utilize large folios. The
series is named "Split a folio to any lower order folios".
- David Hildenbrand has contributed the series "mm: remove
total_mapcount()", a cleanup.
- Matthew Wilcox has sought to improve the performance of bulk memory
freeing in his series "Rearrange batched folio freeing".
- Gang Li's series "hugetlb: parallelize hugetlb page init on boot"
provides large improvements in bootup times on large machines which
are configured to use large numbers of hugetlb pages.
- Matthew Wilcox's series "PageFlags cleanups" does that.
- Qi Zheng's series "minor fixes and supplement for ptdesc" does that
also. S390 is affected.
- Cleanups to our pagemap utility functions from Peter Xu in his series
"mm/treewide: Replace pXd_large() with pXd_leaf()".
- Nico Pache has fixed a few things with our hugepage selftests in his
series "selftests/mm: Improve Hugepage Test Handling in MM
Selftests".
- Also, of course, many singleton patches to many things. Please see
the individual changelogs for details.
* tag 'mm-stable-2024-03-13-20-04' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (435 commits)
mm/zswap: remove the memcpy if acomp is not sleepable
crypto: introduce: acomp_is_async to expose if comp drivers might sleep
memtest: use {READ,WRITE}_ONCE in memory scanning
mm: prohibit the last subpage from reusing the entire large folio
mm: recover pud_leaf() definitions in nopmd case
selftests/mm: skip the hugetlb-madvise tests on unmet hugepage requirements
selftests/mm: skip uffd hugetlb tests with insufficient hugepages
selftests/mm: dont fail testsuite due to a lack of hugepages
mm/huge_memory: skip invalid debugfs new_order input for folio split
mm/huge_memory: check new folio order when split a folio
mm, vmscan: retry kswapd's priority loop with cache_trim_mode off on failure
mm: add an explicit smp_wmb() to UFFDIO_CONTINUE
mm: fix list corruption in put_pages_list
mm: remove folio from deferred split list before uncharging it
filemap: avoid unnecessary major faults in filemap_fault()
mm,page_owner: drop unnecessary check
mm,page_owner: check for null stack_record before bumping its refcount
mm: swap: fix race between free_swap_and_cache() and swapoff()
mm/treewide: align up pXd_leaf() retval across archs
mm/treewide: drop pXd_large()
...
core:
- EDID cleanups
- scheduler error handling fixes
- managed: add drmm_release_action() with tests
- add ratelimited drm debug print
- DPCD PSR early transport macro
- DP tunneling and bandwidth allocation helpers
- remove built-in edids
- dp: Avoid AUX transfers on powered-down displays
- dp: Add VSC SDP helpers
cross drivers:
- use new drm print helpers
- switch to ->read_edid callback
- gem: add stats for shared buffers plus updates to amdgpu, i915, xe
syncobj:
- fixes to waiting and sleeping
ttm:
- add tests
- fix errno codes
- simply busy-placement handling
- fix page decryption
media:
- tc358743: fix v4l device registration
video:
- move all kernel parameters for video behind CONFIG_VIDEO
sound:
- remove <drm/drm_edid.h> include from header
ci:
- add tests for msm
- fix apq8016 runner
efifb:
- use copy of global screen_info state
vesafb:
- use copy of global screen_info state
simplefb:
- fix logging
bridge:
- ite-6505: fix DP link-training bug
- samsung-dsim: fix error checking in probe
- samsung-dsim: add bsh-smm-s2/pro boards
- tc358767: fix regmap usage
- imx: add i.MX8MP HDMI PVI plus DT bindings
- imx: add i.MX8MP HDMI TX plus DT bindings
- sii902x: fix probing and unregistration
- tc358767: limit pixel PLL input range
- switch to new drm_bridge_read_edid() interface
panel:
- ltk050h3146w: error-handling fixes
- panel-edp: support delay between power-on and enable; use put_sync in
unprepare; support Mediatek MT8173 Chromebooks, BOE NV116WHM-N49 V8.0,
BOE NV122WUM-N41, CSO MNC207QS1-1 plus DT bindings
- panel-lvds: support EDT ETML0700Z9NDHA plus DT bindings
- panel-novatek: FRIDA FRD400B25025-A-CTK plus DT bindings
- add BOE TH101MB31IG002-28A plus DT bindings
- add EDT ETML1010G3DRA plus DT bindings
- add Novatek NT36672E LCD DSI plus DT bindings
- nt36523: support 120Hz timings, fix includes
- simple: fix display timings on RK32FN48H
- visionox-vtdr6130: fix initialization
- add Powkiddy RGB10MAX3 plus DT bindings
- st7703: support panel rotation plus DT bindings
- add Himax HX83112A plus DT bindings
- ltk500hd1829: add support for ltk101b4029w and admatec 9904370
- simple: add BOE BP082WX1-100 8.2" panel plus DT bindungs
panel-orientation-quirks:
- GPD Win Mini
amdgpu:
- Validate DMABuf imports in compute VMs
- Add RAS ACA framework
- PSP 13 fixes
- Misc code cleanups
- Replay fixes
- Atom interpretor PS, WS bounds checking
- DML2 fixes
- Audio fixes
- DCN 3.5 Z state fixes
- Remove deprecated ida_simple usage
- UBSAN fixes
- RAS fixes
- Enable seq64 infrastructure
- DC color block enablement
- Documentation updates
- DC documentation updates
- DMCUB updates
- ATHUB 4.1 support
- LSDMA 7.0 support
- JPEG DPG support
- IH 7.0 support
- HDP 7.0 support
- VCN 5.0 support
- SMU 13.0.6 updates
- NBIO 7.11 updates
- SDMA 6.1 updates
- MMHUB 3.3 updates
- DCN 3.5.1 support
- NBIF 6.3.1 support
- VPE 6.1.1 support
amdkfd:
- Validate DMABuf imports in compute VMs
- SVM fixes
- Trap handler updates and enhancements
- Fix cache size reporting
- Relocate the trap handler
radeon:
- Atom interpretor PS, WS bounds checking
- Misc code cleanups
xe:
- new query for GuC submission version
- Remove unused persistent exec_queues
- Add vram frequency sysfs attributes
- Add the flag XE_VM_BIND_FLAG_DUMPABLE
- Drop pre-production workarounds
- Drop kunit tests for unsupported platforms
- Start pumbling SR-IOV support with memory based interrupts for VF
- Allow to map BO in GGTT with PAT index corresponding to
XE_CACHE_UC to work with memory based interrupts
- Add GuC Doorbells Manager as prep work SR-IOV
- Implement additional workarounds for xe2 and MTL
- Program a few registers according to perfomance guide spec for Xe2
- Fix remaining 32b build issues and enable it back
- Fix build with CONFIG_DEBUG_FS=n
- Fix warnings from GuC ABI headers
- Introduce Relay Communication for SR-IOV for VF <-> GuC <-> PF
- Release mmap mappings on rpm suspend
- Disable mid-thread preemption when not properly supported by hardware
- Fix xe_exec by reserving extra fence slot for CPU bind
- Fix xe_exec with full long running exec queue
- Canonicalize addresses where needed for Xe2 and add to devcoredum
- Toggle USM support for Xe2
- Only allow 1 ufence per exec / bind IOCTL
- Add GuC firmware loading for Lunar Lake
- Add XE_VMA_PTE_64K VMA flag
i915:
- Add more ADL-N PCI IDs
- Enable fastboot also on older platforms
- Early transport for panel replay and PSR
- New ARL PCI IDs
- DP TPS4 PHY test pattern support
- Unify and improve VSC SDP for PSR and non-PSR cases
- Refactor memory regions and improve debug logging
- Rework global state serialization
- Remove unused CDCLK divider fields
- Unify HDCP connector logging format
- Use display instead of graphics version in display code
- Move VBT and opregion debugfs next to the implementation
- Abstract opregion interface, use opaque type
- MTL fixes
- HPD handling fixes
- Add GuC submission interface version query
- Atomically invalidate userptr on mmu-notifier
- Update handling of MMIO triggered reports
- Don't make assumptions about intel_wakeref_t type
- Extend driver code of Xe_LPG to Xe_LPG+
- Add flex arrays to struct i915_syncmap
- Allow for very slow HuC loading
- DP tunneling and bandwidth allocation support
msm:
- Correct bindings for MSM8976 and SM8650 platforms
- Start migration of MDP5 platforms to DPU driver
- X1E80100 MDSS support
- DPU:
- Improve DSC allocation, fixing several important corner cases
- Add support for SDM630/SDM660 platforms
- Simplify dpu_encoder_phys_ops
- Apply fixes targeting DSC support with a single DSC encoder
- Apply fixes for HCTL_EN timing configuration
- X1E80100 support
- Add support for YUV420 over DP
- GPU:
- fix sc7180 UBWC config
- fix a7xx LLC config
- new gpu support: a305B, a750, a702
- machine support: SM7150 (different power levels than other a618)
- a7xx devcoredump support
habanalabs:
- configure IRQ affinity according to NUMA node
- move HBM MMU page tables inside the HBM
- improve device reset
- check extended PCIe errors
ivpu:
- updates to firmware API
- refactor BO allocation
imx:
- use devm_ functions during init
hisilicon:
- fix EDID includes
mgag200:
- improve ioremap usage
- convert to struct drm_edid
- Work around PCI write bursts
nouveau:
- disp: use kmemdup()
- fix EDID includes
- documentation fixes
qaic:
- fixes to BO handling
- make use of DRM managed release
- fix order of remove operations
rockchip:
- analogix_dp: get encoder port from DT
- inno_hdmi: support HDMI for RK3128
- lvds: error-handling fixes
ssd130x:
- support SSD133x plus DT bindings
tegra:
- fix error handling
tilcdc:
- make use of DRM managed release
v3d:
- show memory stats in debugfs
- Support display MMU page size
vc4:
- fix error handling in plane prepare_fb
- fix framebuffer test in plane helpers
virtio:
- add venus capset defines
vkms:
- fix OOB access when programming the LUT
- Kconfig improvements
vmwgfx:
- unmap surface before changing plane state
- fix memory leak in error handling
- documentation fixes
- list command SVGA_3D_CMD_DEFINE_GB_SURFACE_V4 as invalid
- fix null-pointer deref in execbuf
- refactor display-mode probing
- fix fencing for creating cursor MOBs
- fix cursor-memory lifetime
xlnx:
- fix live video input for ZynqMP DPSUB
lima:
- fix memory leak
loongson:
- fail if no VRAM present
meson:
- switch to new drm_bridge_read_edid() interface
renesas:
- add RZ/G2L DU support plus DT bindings
mxsfb:
- Use managed mode config
sun4i:
- HDMI: updates to atomic mode setting
mediatek:
- Add display driver for MT8188 VDOSYS1
- DSI driver cleanups
- Filter modes according to hardware capability
- Fix a null pointer crash in mtk_drm_crtc_finish_page_flip
etnaviv:
- enhancements for NPU and MRT support
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmXxI+AACgkQDHTzWXnE
hr5isxAApZ+DxesDbV8bd91KXL03vTfJtM5xVQuZoDzrr20KdTvu2EfQcCFnAUjl
YtY05U9arDT4Txq5nX70Xc6I5M9HN6lqSUfsWhI6xUcR9TUollPbYwEu8IdoMaCG
TRnspkiheye+DLFY6omLNH2aG1/k1IIefVWKaClFpbNPaaSHREDiY7/rkmErMBIS
hrN13+6IVzX7+6fmNgHugUfdyawDJ8J9Nsc8T3Zlioljq3p+VbtStLsGeABTHSEJ
MX18FwbGllI+QcXvaXM8gIg8NYKvSx/ZtlvKTpyPpTjZT3i3BpY+7yJqWDRQhiGM
VTX7di1f90yWgzlYE5T33MW7Imvw3q04N7qYJ+Z1LHD/A8VyjwPUKLeul8P9ousT
0qQLSQsnuXH5AMLDh8IeLG/i0hAMWJ2UbProFSAFhd/UQHP7QOm2mmCsf79me9It
qKFn6QZKvAKGZk/myTbQIVAmQWrDCpKq4i1aoKXEvcEuQUtM1lPvmMVsStVEfG+y
ACaI24zSJACViH6rfhVzr74giwZX/ay0NSXqwRXfD5kX8fXb050LxLGW93iYZoHv
FpdT2C8oTS1A5nsZpoxwVP35euUsp7D4J5YYbrZder2m0s0DDCVLMqdFrSVNdWDM
4ZQRiY3wCiJjSS8dpwppW0uaVGjtnGQnjQ5sQrIw0vKkwxee0TQ=
=WLj9
-----END PGP SIGNATURE-----
Merge tag 'drm-next-2024-03-13' of https://gitlab.freedesktop.org/drm/kernel
Pull drm updates from Dave Airlie:
"Highlights are usual, more AMD IP blocks for future hw, i915/xe
changes, Displayport tunnelling support for i915, msm YUV over DP
changes, new tests for ttm, but its mostly a lot of stuff all over the
place from lots of people.
core:
- EDID cleanups
- scheduler error handling fixes
- managed: add drmm_release_action() with tests
- add ratelimited drm debug print
- DPCD PSR early transport macro
- DP tunneling and bandwidth allocation helpers
- remove built-in edids
- dp: Avoid AUX transfers on powered-down displays
- dp: Add VSC SDP helpers
cross drivers:
- use new drm print helpers
- switch to ->read_edid callback
- gem: add stats for shared buffers plus updates to amdgpu, i915, xe
syncobj:
- fixes to waiting and sleeping
ttm:
- add tests
- fix errno codes
- simply busy-placement handling
- fix page decryption
media:
- tc358743: fix v4l device registration
video:
- move all kernel parameters for video behind CONFIG_VIDEO
sound:
- remove <drm/drm_edid.h> include from header
ci:
- add tests for msm
- fix apq8016 runner
efifb:
- use copy of global screen_info state
vesafb:
- use copy of global screen_info state
simplefb:
- fix logging
bridge:
- ite-6505: fix DP link-training bug
- samsung-dsim: fix error checking in probe
- samsung-dsim: add bsh-smm-s2/pro boards
- tc358767: fix regmap usage
- imx: add i.MX8MP HDMI PVI plus DT bindings
- imx: add i.MX8MP HDMI TX plus DT bindings
- sii902x: fix probing and unregistration
- tc358767: limit pixel PLL input range
- switch to new drm_bridge_read_edid() interface
panel:
- ltk050h3146w: error-handling fixes
- panel-edp: support delay between power-on and enable; use put_sync
in unprepare; support Mediatek MT8173 Chromebooks, BOE NV116WHM-N49
V8.0, BOE NV122WUM-N41, CSO MNC207QS1-1 plus DT bindings
- panel-lvds: support EDT ETML0700Z9NDHA plus DT bindings
- panel-novatek: FRIDA FRD400B25025-A-CTK plus DT bindings
- add BOE TH101MB31IG002-28A plus DT bindings
- add EDT ETML1010G3DRA plus DT bindings
- add Novatek NT36672E LCD DSI plus DT bindings
- nt36523: support 120Hz timings, fix includes
- simple: fix display timings on RK32FN48H
- visionox-vtdr6130: fix initialization
- add Powkiddy RGB10MAX3 plus DT bindings
- st7703: support panel rotation plus DT bindings
- add Himax HX83112A plus DT bindings
- ltk500hd1829: add support for ltk101b4029w and admatec 9904370
- simple: add BOE BP082WX1-100 8.2" panel plus DT bindungs
panel-orientation-quirks:
- GPD Win Mini
amdgpu:
- Validate DMABuf imports in compute VMs
- Add RAS ACA framework
- PSP 13 fixes
- Misc code cleanups
- Replay fixes
- Atom interpretor PS, WS bounds checking
- DML2 fixes
- Audio fixes
- DCN 3.5 Z state fixes
- Remove deprecated ida_simple usage
- UBSAN fixes
- RAS fixes
- Enable seq64 infrastructure
- DC color block enablement
- Documentation updates
- DC documentation updates
- DMCUB updates
- ATHUB 4.1 support
- LSDMA 7.0 support
- JPEG DPG support
- IH 7.0 support
- HDP 7.0 support
- VCN 5.0 support
- SMU 13.0.6 updates
- NBIO 7.11 updates
- SDMA 6.1 updates
- MMHUB 3.3 updates
- DCN 3.5.1 support
- NBIF 6.3.1 support
- VPE 6.1.1 support
amdkfd:
- Validate DMABuf imports in compute VMs
- SVM fixes
- Trap handler updates and enhancements
- Fix cache size reporting
- Relocate the trap handler
radeon:
- Atom interpretor PS, WS bounds checking
- Misc code cleanups
xe:
- new query for GuC submission version
- Remove unused persistent exec_queues
- Add vram frequency sysfs attributes
- Add the flag XE_VM_BIND_FLAG_DUMPABLE
- Drop pre-production workarounds
- Drop kunit tests for unsupported platforms
- Start pumbling SR-IOV support with memory based interrupts for VF
- Allow to map BO in GGTT with PAT index corresponding to XE_CACHE_UC
to work with memory based interrupts
- Add GuC Doorbells Manager as prep work SR-IOV
- Implement additional workarounds for xe2 and MTL
- Program a few registers according to perfomance guide spec for Xe2
- Fix remaining 32b build issues and enable it back
- Fix build with CONFIG_DEBUG_FS=n
- Fix warnings from GuC ABI headers
- Introduce Relay Communication for SR-IOV for VF <-> GuC <-> PF
- Release mmap mappings on rpm suspend
- Disable mid-thread preemption when not properly supported by
hardware
- Fix xe_exec by reserving extra fence slot for CPU bind
- Fix xe_exec with full long running exec queue
- Canonicalize addresses where needed for Xe2 and add to devcoredum
- Toggle USM support for Xe2
- Only allow 1 ufence per exec / bind IOCTL
- Add GuC firmware loading for Lunar Lake
- Add XE_VMA_PTE_64K VMA flag
i915:
- Add more ADL-N PCI IDs
- Enable fastboot also on older platforms
- Early transport for panel replay and PSR
- New ARL PCI IDs
- DP TPS4 PHY test pattern support
- Unify and improve VSC SDP for PSR and non-PSR cases
- Refactor memory regions and improve debug logging
- Rework global state serialization
- Remove unused CDCLK divider fields
- Unify HDCP connector logging format
- Use display instead of graphics version in display code
- Move VBT and opregion debugfs next to the implementation
- Abstract opregion interface, use opaque type
- MTL fixes
- HPD handling fixes
- Add GuC submission interface version query
- Atomically invalidate userptr on mmu-notifier
- Update handling of MMIO triggered reports
- Don't make assumptions about intel_wakeref_t type
- Extend driver code of Xe_LPG to Xe_LPG+
- Add flex arrays to struct i915_syncmap
- Allow for very slow HuC loading
- DP tunneling and bandwidth allocation support
msm:
- Correct bindings for MSM8976 and SM8650 platforms
- Start migration of MDP5 platforms to DPU driver
- X1E80100 MDSS support
- DPU:
- Improve DSC allocation, fixing several important corner cases
- Add support for SDM630/SDM660 platforms
- Simplify dpu_encoder_phys_ops
- Apply fixes targeting DSC support with a single DSC encoder
- Apply fixes for HCTL_EN timing configuration
- X1E80100 support
- Add support for YUV420 over DP
- GPU:
- fix sc7180 UBWC config
- fix a7xx LLC config
- new gpu support: a305B, a750, a702
- machine support: SM7150 (different power levels than other a618)
- a7xx devcoredump support
habanalabs:
- configure IRQ affinity according to NUMA node
- move HBM MMU page tables inside the HBM
- improve device reset
- check extended PCIe errors
ivpu:
- updates to firmware API
- refactor BO allocation
imx:
- use devm_ functions during init
hisilicon:
- fix EDID includes
mgag200:
- improve ioremap usage
- convert to struct drm_edid
- Work around PCI write bursts
nouveau:
- disp: use kmemdup()
- fix EDID includes
- documentation fixes
qaic:
- fixes to BO handling
- make use of DRM managed release
- fix order of remove operations
rockchip:
- analogix_dp: get encoder port from DT
- inno_hdmi: support HDMI for RK3128
- lvds: error-handling fixes
ssd130x:
- support SSD133x plus DT bindings
tegra:
- fix error handling
tilcdc:
- make use of DRM managed release
v3d:
- show memory stats in debugfs
- Support display MMU page size
vc4:
- fix error handling in plane prepare_fb
- fix framebuffer test in plane helpers
virtio:
- add venus capset defines
vkms:
- fix OOB access when programming the LUT
- Kconfig improvements
vmwgfx:
- unmap surface before changing plane state
- fix memory leak in error handling
- documentation fixes
- list command SVGA_3D_CMD_DEFINE_GB_SURFACE_V4 as invalid
- fix null-pointer deref in execbuf
- refactor display-mode probing
- fix fencing for creating cursor MOBs
- fix cursor-memory lifetime
xlnx:
- fix live video input for ZynqMP DPSUB
lima:
- fix memory leak
loongson:
- fail if no VRAM present
meson:
- switch to new drm_bridge_read_edid() interface
renesas:
- add RZ/G2L DU support plus DT bindings
mxsfb:
- Use managed mode config
sun4i:
- HDMI: updates to atomic mode setting
mediatek:
- Add display driver for MT8188 VDOSYS1
- DSI driver cleanups
- Filter modes according to hardware capability
- Fix a null pointer crash in mtk_drm_crtc_finish_page_flip
etnaviv:
- enhancements for NPU and MRT support"
* tag 'drm-next-2024-03-13' of https://gitlab.freedesktop.org/drm/kernel: (1420 commits)
drm/amd/display: Removed redundant @ symbol to fix kernel-doc warnings in -next repo
drm/amd/pm: wait for completion of the EnableGfxImu message
drm/amdgpu/soc21: add mode2 asic reset for SMU IP v14.0.1
drm/amdgpu: add smu 14.0.1 support
drm/amdgpu: add VPE 6.1.1 discovery support
drm/amdgpu/vpe: add VPE 6.1.1 support
drm/amdgpu/vpe: don't emit cond exec command under collaborate mode
drm/amdgpu/vpe: add collaborate mode support for VPE
drm/amdgpu/vpe: add PRED_EXE and COLLAB_SYNC OPCODE
drm/amdgpu/vpe: add multi instance VPE support
drm/amdgpu/discovery: add nbif v6_3_1 ip block
drm/amdgpu: Add nbif v6_3_1 ip block support
drm/amdgpu: Add pcie v6_1_0 ip headers (v5)
drm/amdgpu: Add nbif v6_3_1 ip headers (v5)
arch/powerpc: Remove <linux/fb.h> from backlight code
macintosh/via-pmu-backlight: Include <linux/backlight.h>
fbdev/chipsfb: Include <linux/backlight.h>
drm/etnaviv: Restore some id values
drm/amdkfd: make kfd_class constant
drm/amdgpu: add ring timeout information in devcoredump
...
Replace <linux/fb.h> with a forward declaration in <asm/backlight.h> to
resolve an unnecessary dependency. Remove pmac_backlight_curve_lookup()
and struct fb_info from source and header files. The function and the
framebuffer struct are unused. No functional changes.
v3:
* Add Fixes tag (Christophe)
* fix typos in commit message (Jani)
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Fixes: d565dd3b08 ("[PATCH] powerpc: More via-pmu backlight fixes")
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> # (powerpc)
Link: https://patchwork.freedesktop.org/patch/msgid/20240306122935.10626-4-tzimmermann@suse.de
Building eg. 83xx/mpc832x_rdb_defconfig with FPU=n, fails with:
arch/powerpc/platforms/83xx/suspend.c: In function 'mpc83xx_suspend_enter':
arch/powerpc/platforms/83xx/suspend.c:209:17: error: implicit declaration of function 'enable_kernel_fp'
209 | enable_kernel_fp();
Fix it by providing an enable_kernel_fp() stub for FPU=n builds,
which allows using IS_ENABLED(CONFIG_PPC_FPU) around the call to
enable_kernel_fp().
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240306125853.3714578-2-mpe@ellerman.id.au
The 44x/warp_defconfig build fails with:
arch/powerpc/platforms/44x/warp.c:109:15: error: variable ‘warp_gpio_leds’ has initializer but incomplete type
109 | static struct platform_device warp_gpio_leds = {
| ^~~~~~~~~~~~~~~
Fix it by including platform_device.h.
Fixes: ef175b29a2 ("of: Stop circularly including of_device.h and of_platform.h")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240305123410.3306253-3-mpe@ellerman.id.au
These functions can all be static. Make them so. That also fixes no
previous prototype warnings:
arch/powerpc/platforms/amigaone/setup.c:28:6: error: no previous prototype for 'amigaone_show_cpuinfo'
arch/powerpc/platforms/amigaone/setup.c:68:13: error: no previous prototype for 'amigaone_setup_arch'
arch/powerpc/platforms/amigaone/setup.c:86:13: error: no previous prototype for 'amigaone_init_IRQ'
arch/powerpc/platforms/amigaone/setup.c:126:17: error: no previous prototype for 'amigaone_restart'
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240305123410.3306253-2-mpe@ellerman.id.au
Move the prototypes into mpc10x.h which is included by all the relevant
C files, fixes:
arch/powerpc/platforms/embedded6xx/ls_uart.c:59:6: error: no previous prototype for 'avr_uart_configure'
arch/powerpc/platforms/embedded6xx/ls_uart.c:82:6: error: no previous prototype for 'avr_uart_send'
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240305123410.3306253-1-mpe@ellerman.id.au
Replace all usages of of_root by of_find_node_by_path("/")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231214103152.12269-5-mpe@ellerman.id.au
Most probe functions that do not use the 'compatible' string do
nothing else than checking whether the machine is compatible with
one of the strings in a NULL terminated table of strings.
Define that table of strings in ppc_md structure and check it directly
from probe_machine() instead of using ppc_md.probe() for that.
Keep checking in ppc_md.probe() only for more complex probing.
All .compatible could be replaced with a single element NULL
terminated list but that's not worth the churn. Can be do incrementaly
in follow-up patches.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231214103152.12269-4-mpe@ellerman.id.au
`buf` is allocated in papr_get_attr(), and krealloc() of `buf`
could fail. We need to free the original `buf` in the case of failure.
Fixes: 3c14b73454 ("powerpc/pseries: Interface to represent PAPR firmware attributes")
Signed-off-by: Qiheng Lin <linqiheng@huawei.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20221208133449.16284-1-linqiheng@huawei.com
Some devices are not capable of addressing 64 bits
via DMA, which includes MSI-X vectors. This allows
us to ensure these devices use MSI-X vectors in
32 bit space.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240117214632.134539-1-brking@linux.vnet.ibm.com
-----BEGIN PGP SIGNATURE-----
iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmXb0T4eHHRvcnZhbGRz
QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiG5YQH/3eCV90sNGch0Y94
8rtTdqFrVx7QPNl0pz+Mo6OUIKUUHvTuwime16ckLxG+3x2Y3I0MjP1edd1NB99C
Kje//JTpaZBPpTZ/jY4u8B1Shov2Drdx/J4NFnE/9rG6yXzKQBtvON/xAxXDCVHT
mLhst2LR0FeCSMk9jAX6CoqUPEgwlylNyAetKxaDQgoHl4GTZC7FDO17WxyjpIxe
1rVHsrV9Eq8kD4uxrzpTYWgZrwTObPmlZjvefa1JfzSwRNABIBJj/C1nra1Zc1oi
b7xVaXS1cMOxrtuuG00fmHsPnWivu0tuND7H3/yLd1mRCZAPSsVbVvrI/KNtoeV4
1euINlY=
=7IFt
-----END PGP SIGNATURE-----
Merge v6.8-rc6 into drm-next
Thomas Zimmermann asked to backmerge -rc6 for drm-misc branches,
there's a few same-area-changed conflicts (xe and amdgpu mostly) that
are getting a bit too annoying.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Now move the relevant codes into separate files:
kernel/crash_reserve.c, include/linux/crash_reserve.h.
And add config item CRASH_RESERVE to control its enabling.
And also update the old ifdeffery of CONFIG_CRASH_CORE, including of
<linux/crash_core.h> and config item dependency on CRASH_CORE
accordingly.
And also do renaming as follows:
- arch/xxx/kernel/{crash_core.c => vmcore_info.c}
because they are only related to vmcoreinfo exporting on x86, arm64,
riscv.
And also Remove config item CRASH_CORE, and rely on CONFIG_KEXEC_CORE to
decide if build in crash_core.c.
[yang.lee@linux.alibaba.com: remove duplicated include in vmcore_info.c]
Link: https://lkml.kernel.org/r/20240126005744.16561-1-yang.lee@linux.alibaba.com
Link: https://lkml.kernel.org/r/20240124051254.67105-3-bhe@redhat.com
Signed-off-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Acked-by: Hari Bathini <hbathini@linux.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Pingfan Liu <piliu@redhat.com>
Cc: Klara Modin <klarasmodin@gmail.com>
Cc: Michael Kelley <mhklinux@outlook.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
When kdump kernel tries to copy dump data over SR-IOV, LPAR panics due
to NULL pointer exception:
Kernel attempted to read user page (0) - exploit attempt? (uid: 0)
BUG: Kernel NULL pointer dereference on read at 0x00000000
Faulting instruction address: 0xc000000020847ad4
Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
Modules linked in: mlx5_core(+) vmx_crypto pseries_wdt papr_scm libnvdimm mlxfw tls psample sunrpc fuse overlay squashfs loop
CPU: 12 PID: 315 Comm: systemd-udevd Not tainted 6.4.0-Test102+ #12
Hardware name: IBM,9080-HEX POWER10 (raw) 0x800200 0xf000006 of:IBM,FW1060.00 (NH1060_008) hv:phyp pSeries
NIP: c000000020847ad4 LR: c00000002083b2dc CTR: 00000000006cd18c
REGS: c000000029162ca0 TRAP: 0300 Not tainted (6.4.0-Test102+)
MSR: 800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 48288244 XER: 00000008
CFAR: c00000002083b2d8 DAR: 0000000000000000 DSISR: 40000000 IRQMASK: 1
...
NIP _find_next_zero_bit+0x24/0x110
LR bitmap_find_next_zero_area_off+0x5c/0xe0
Call Trace:
dev_printk_emit+0x38/0x48 (unreliable)
iommu_area_alloc+0xc4/0x180
iommu_range_alloc+0x1e8/0x580
iommu_alloc+0x60/0x130
iommu_alloc_coherent+0x158/0x2b0
dma_iommu_alloc_coherent+0x3c/0x50
dma_alloc_attrs+0x170/0x1f0
mlx5_cmd_init+0xc0/0x760 [mlx5_core]
mlx5_function_setup+0xf0/0x510 [mlx5_core]
mlx5_init_one+0x84/0x210 [mlx5_core]
probe_one+0x118/0x2c0 [mlx5_core]
local_pci_probe+0x68/0x110
pci_call_probe+0x68/0x200
pci_device_probe+0xbc/0x1a0
really_probe+0x104/0x540
__driver_probe_device+0xb4/0x230
driver_probe_device+0x54/0x130
__driver_attach+0x158/0x2b0
bus_for_each_dev+0xa8/0x130
driver_attach+0x34/0x50
bus_add_driver+0x16c/0x300
driver_register+0xa4/0x1b0
__pci_register_driver+0x68/0x80
mlx5_init+0xb8/0x100 [mlx5_core]
do_one_initcall+0x60/0x300
do_init_module+0x7c/0x2b0
At the time of LPAR dump, before kexec hands over control to kdump
kernel, DDWs (Dynamic DMA Windows) are scanned and added to the FDT.
For the SR-IOV case, default DMA window "ibm,dma-window" is removed from
the FDT and DDW added, for the device.
Now, kexec hands over control to the kdump kernel.
When the kdump kernel initializes, PCI busses are scanned and IOMMU
group/tables created, in pci_dma_bus_setup_pSeriesLP(). For the SR-IOV
case, there is no "ibm,dma-window". The original commit: b1fc44eaa9,
fixes the path where memory is pre-mapped (direct mapped) to the DDW.
When TCEs are direct mapped, there is no need to initialize IOMMU
tables.
iommu_table_setparms_lpar() only considers "ibm,dma-window" property
when initiallizing IOMMU table. In the scenario where TCEs are
dynamically allocated for SR-IOV, newly created IOMMU table is not
initialized. Later, when the device driver tries to enter TCEs for the
SR-IOV device, NULL pointer execption is thrown from iommu_area_alloc().
The fix is to initialize the IOMMU table with DDW property stored in the
FDT. There are 2 points to remember:
1. For the dedicated adapter, kdump kernel would encounter both
default and DDW in FDT. In this case, DDW property is used to
initialize the IOMMU table.
2. A DDW could be direct or dynamic mapped. kdump kernel would
initialize IOMMU table and mark the existing DDW as
"dynamic". This works fine since, at the time of table
initialization, iommu_table_clear() makes some space in the
DDW, for some predefined number of TCEs which are needed for
kdump to succeed.
Fixes: b1fc44eaa9 ("pseries/iommu/ddw: Fix kdump to work in absence of ibm,dma-window")
Signed-off-by: Gaurav Batra <gbatra@linux.vnet.ibm.com>
Reviewed-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240125203017.61014-1-gbatra@linux.ibm.com
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart
from emitting a warning) and this typically results in resource leaks.
To improve here there is a quest to make the remove callback return
void. In the first step of this quest all drivers are converted to
.remove_new(), which already returns void. Eventually after all drivers
are converted, .remove_new() will be renamed to .remove().
Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/34847d756453af2e85e5944a8cc2e2c21aacc905.1708529736.git.u.kleine-koenig@pengutronix.de
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart
from emitting a warning) and this typically results in resource leaks.
To improve here there is a quest to make the remove callback return
void. In the first step of this quest all drivers are converted to
.remove_new(), which already returns void. Eventually after all drivers
are converted, .remove_new() will be renamed to .remove().
Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/28dd12b7cbde4b278b8b1d0ae4382dbd8ce9c9c5.1708529736.git.u.kleine-koenig@pengutronix.de
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart
from emitting a warning) and this typically results in resource leaks.
To improve here there is a quest to make the remove callback return
void. In the first step of this quest all drivers are converted to
.remove_new(), which already returns void. Eventually after all drivers
are converted, .remove_new() will be renamed to .remove().
Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/8a5ac8044578694879e919322dbd46f140b64950.1708529736.git.u.kleine-koenig@pengutronix.de
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart
from emitting a warning) and this typically results in resource leaks.
To improve here there is a quest to make the remove callback return
void. In the first step of this quest all drivers are converted to
.remove_new(), which already returns void. Eventually after all drivers
are converted, .remove_new() will be renamed to .remove().
Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/1e8396078942d9e46e56d70ed2f749a76391c381.1708529736.git.u.kleine-koenig@pengutronix.de
The PS3 hcall assembly code makes ad-hoc stack frames that don't have
a back-chain pointer or meet other requirements like minimum frame size.
This probably confuses stack unwinders. Give all hcalls a real stack
frame.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Geoff Levand <geoff@infradead.org>
[mpe: Add missing \ in LV1_2_IN_4_OUT]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231227072405.63751-4-npiggin@gmail.com
The LRSAVE constant is required for assembly compiled for both 32-bit
and 64-bit, because the value differs there. PS3 is 64-bit only so
this is a noop, but it is nice to abstract stack frame offsets.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Geoff Levand <geoff@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231227072405.63751-3-npiggin@gmail.com
Stack-passed parameters begin at a different offset in the caller's
stack in the ELFv2 ABI.
Reported-by: Geoff Levand <geoff@infradead.org>
Fixes: 8c5fa3b5c4 ("powerpc/64: Make ELFv2 the default for big-endian builds")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Geoff Levand <geoff@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231227072405.63751-2-npiggin@gmail.com
When an ifdef is used in the below manner, second one could be considered
as duplicate.
ifdef DEFINE_A
...code block...
ifdef DEFINE_A <-- This is a duplicate.
...code block...
endif
else
ifndef DEFINE_A <-- This is also duplicate.
...code block...
endif
endif
More details about the script and methods used to find these code
patterns are in cover letter of [1].
Few places in arch/powerpc where this pattern was seen:
paca.h:
Hunk1: Code is under check of CONFIG_PPC64 from line 13, hence the
second CONFIG_PPC64 at line 166 is a duplicate.
Hunk2: CONFIG_PPC_BOOK3S_64 was defined back to back. Merged the two
ifdefs.
asm-offsets.c:
Code is under check of CONFIG_PPC64 from line 176 hence second
CONFIG_PPC64 at line 249 is a duplicate.
powermac/feature.c:
#ifndef CONFIG_PPC64 is used at line 2066. And then in #else again
#ifdef CONFIG_PPC64 is used. Which is a duplicate since in #else means
CONFIG_PPC64 is defined.
xmon.c:
Code is under the check of CONFIG_SMP from line 521 hence the same
check of CONFIG_SMP at line 646 is a duplicate.
No functional change is intended here. It only aims to improve code
readability.
[1] https://lore.kernel.org/all/20240118080326.13137-1-sshegde@linux.ibm.com/
Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240216053016.528906-1-sshegde@linux.ibm.com
Since commit d492cc2573 ("driver core: device.h: make struct
bus_type a const *"), the driver core can properly handle constant
struct bus_type, move the ibmebus_bus_type variable to be a constant
structure as well, placing it into read-only memory which can not be
modified at runtime.
Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: "Ricardo B. Marliere" <ricardo@marliere.net>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240212-bus_cleanup-powerpc2-v2-5-8441b3f77827@marliere.net
Since commit d492cc2573 ("driver core: device.h: make struct
bus_type a const *"), the driver core can properly handle constant
struct bus_type, move the vio_bus_type variable to be a constant
structure as well, placing it into read-only memory which can not be
modified at runtime.
Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: "Ricardo B. Marliere" <ricardo@marliere.net>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240212-bus_cleanup-powerpc2-v2-2-8441b3f77827@marliere.net
In order to make the distinction of the vio_bus_type variable based on
CONFIG_PPC_SMLPAR more explicit, move the required structs into a new
ifdef block. This is needed in order to make vio_bus_type const and
because the distinction is made explicit, there is no need to set the
fields within the vio_cmo_sysfs_init function.
Signed-off-by: "Ricardo B. Marliere" <ricardo@marliere.net>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240212-bus_cleanup-powerpc2-v2-1-8441b3f77827@marliere.net
powerVM hypervisor updates the VPA fields with stolen time data.
It currently reports enqueue_dispatch_tb and ready_enqueue_tb for
this purpose. In linux these two fields are used to report the stolen time.
The VPA fields are updated at the TB frequency. On powerPC its mostly
set at 512Mhz. Hence this needs a conversion to ns when reporting it
back as rest of the kernel timings are in ns. This conversion is already
handled in tb_to_ns function. So use that function to report accurate
stolen time.
Observed this issue and used an Capped Shared Processor LPAR(SPLPAR) to
simplify the experiments. In all these cases, 100% VP Load is run using
stress-ng workload. Values of stolen time is in percentages as reported
by mpstat. With the patch values are close to expected.
6.8.rc1 +Patch
12EC/12VP 0.0 0.0
12EC/24VP 25.7 50.2
12EC/36VP 37.3 69.2
12EC/48VP 38.5 78.3
Fixes: 0e8a631328 ("powerpc/pseries: Implement CONFIG_PARAVIRT_TIME_ACCOUNTING")
Cc: stable@vger.kernel.org # v6.1+
Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240213052635.231597-1-sshegde@linux.ibm.com
When VIDEO is not set, there is a build error. Fix that by selecting
VIDEO for PS3_PS3AV.
ERROR: modpost: ".video_get_options" [drivers/ps3/ps3av_mod.ko] undefined!
Fixes: dae7fbf43f ("driver/ps3: Include <video/cmdline.h> for mode parsing")
Fixes: a3b6792e99 ("video/cmdline: Introduce CONFIG_VIDEO for video= parameter")
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Aneesh Kumar K.V <aneesh.kumar@kernel.org>
Cc: Naveen N. Rao <naveen.n.rao@linux.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Geoff Levand <geoff@infradead.org>
Acked-by: Geoff Levand <geoff@infradead.org>
Cc: linux-fbdev@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20240207161322.8073-1-rdunlap@infradead.org
These functions are either used in only one file and can just be
made static or need an #include statement to avoid a warning:
arch/powerpc/platforms/85xx/mpc8536_ds.c:30:13: error: no previous prototype for 'mpc8536_ds_pic_init' [-Werror=missing-prototypes]
arch/powerpc/platforms/85xx/p1010rdb.c:27:13: error: no previous prototype for 'p1010_rdb_pic_init' [-Werror=missing-prototypes]
arch/powerpc/platforms/85xx/p1022_ds.c:373:6: error: no previous prototype for 'p1022ds_set_pixel_clock' [-Werror=missing-prototypes]
arch/powerpc/platforms/85xx/p1022_ds.c:422:1: error: no previous prototype for 'p1022ds_valid_monitor_port' [-Werror=missing-prototypes]
arch/powerpc/platforms/85xx/p1022_ds.c:435:13: error: no previous prototype for 'p1022_ds_pic_init' [-Werror=missing-prototypes]
arch/powerpc/platforms/85xx/p1022_rdk.c:43:6: error: no previous prototype for 'p1022rdk_set_pixel_clock' [-Werror=missing-prototypes]
arch/powerpc/platforms/85xx/p1022_rdk.c:92:1: error: no previous prototype for 'p1022rdk_valid_monitor_port' [-Werror=missing-prototypes]
arch/powerpc/platforms/85xx/p1022_rdk.c:99:13: error: no previous prototype for 'p1022_rdk_pic_init' [-Werror=missing-prototypes]
arch/powerpc/platforms/85xx/socrates_fpga_pic.c:273:13: error: no previous prototype for 'socrates_fpga_pic_init' [-Werror=missing-prototypes]
arch/powerpc/platforms/85xx/xes_mpc85xx.c:40:13: error: no previous prototype for 'xes_mpc85xx_pic_init' [-Werror=missing-prototypes]
arch/powerpc/platforms/85xx/mvme2500.c:24:13: error: no previous prototype for 'mvme2500_pic_init' [-Werror=missing-prototypes]
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240123125148.2004648-2-arnd@kernel.org
Here is the big set of tty and serial driver changes for 6.8-rc1.
As usual, Jiri has a bunch of refactoring and cleanups for the tty core
and drivers in here, along with the usual set of rs485 updates (someday
this might work properly...) Along with those, in here are changes for:
- sc16is7xx serial driver updates
- platform driver removal api updates
- amba-pl011 driver updates
- tty driver binding updates
- other small tty/serial driver updates and changes
All of these have been in linux-next for a while with no reported
issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZaeUaw8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ykyOgCgp1uhP/b9iW6qM7qL6OYEG6idI0kAnj0VASNm
vSI69HmdKKwo69YLOSBp
=14n1
-----END PGP SIGNATURE-----
Merge tag 'tty-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty / serial updates from Greg KH:
"Here is the big set of tty and serial driver changes for 6.8-rc1.
As usual, Jiri has a bunch of refactoring and cleanups for the tty
core and drivers in here, along with the usual set of rs485 updates
(someday this might work properly...)
Along with those, in here are changes for:
- sc16is7xx serial driver updates
- platform driver removal api updates
- amba-pl011 driver updates
- tty driver binding updates
- other small tty/serial driver updates and changes
All of these have been in linux-next for a while with no reported
issues"
* tag 'tty-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (197 commits)
serial: sc16is7xx: refactor EFR lock
serial: sc16is7xx: reorder code to remove prototype declarations
serial: sc16is7xx: refactor FIFO access functions to increase commonality
serial: sc16is7xx: drop unneeded MODULE_ALIAS
serial: sc16is7xx: replace hardcoded divisor value with BIT() macro
serial: sc16is7xx: add explicit return for some switch default cases
serial: sc16is7xx: add macro for max number of UART ports
serial: sc16is7xx: add driver name to struct uart_driver
serial: sc16is7xx: use i2c_get_match_data()
serial: sc16is7xx: use spi_get_device_match_data()
serial: sc16is7xx: use DECLARE_BITMAP for sc16is7xx_lines bitfield
serial: sc16is7xx: improve do/while loop in sc16is7xx_irq()
serial: sc16is7xx: remove obsolete loop in sc16is7xx_port_irq()
serial: sc16is7xx: set safe default SPI clock frequency
serial: sc16is7xx: add check for unsupported SPI modes during probe
serial: sc16is7xx: fix invalid sc16is7xx_lines bitfield in case of probe error
serial: 8250_exar: Set missing rs485_supported flag
serial: omap: do not override settings for RS485 support
serial: core, imx: do not set RS485 enabled if it is not supported
serial: core: make sure RS485 cannot be enabled when it is not supported
...
change of locking rules for __dentry_kill(), regularized refcounting
rules in that area, assorted cleanups and removal of weird corner
cases (e.g. now ->d_iput() on child is always called before the parent
might hit __dentry_kill(), etc.)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCZZ+sQQAKCRBZ7Krx/gZQ
6ybjAQDM5jiS93IUzfHjCWq0nVBX5YGbDAkZOeqxbmIdQb+2UAEA6elP5r0fBBcA
seo3bry4DirQMDaA/Cjh4+8r71YSOQs=
=7+Hk
-----END PGP SIGNATURE-----
Merge tag 'pull-dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull dcache updates from Al Viro:
"Change of locking rules for __dentry_kill(), regularized refcounting
rules in that area, assorted cleanups and removal of weird corner
cases (e.g. now ->d_iput() on child is always called before the parent
might hit __dentry_kill(), etc)"
* tag 'pull-dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (40 commits)
dcache: remove unnecessary NULL check in dget_dlock()
kill DCACHE_MAY_FREE
__d_unalias() doesn't use inode argument
d_alloc_parallel(): in-lookup hash insertion doesn't need an RCU variant
get rid of DCACHE_GENOCIDE
d_genocide(): move the extern into fs/internal.h
simple_fill_super(): don't bother with d_genocide() on failure
nsfs: use d_make_root()
d_alloc_pseudo(): move setting ->d_op there from the (sole) caller
kill d_instantate_anon(), fold __d_instantiate_anon() into remaining caller
retain_dentry(): introduce a trimmed-down lockless variant
__dentry_kill(): new locking scheme
d_prune_aliases(): use a shrink list
switch select_collect{,2}() to use of to_shrink_list()
to_shrink_list(): call only if refcount is 0
fold dentry_kill() into dput()
don't try to cut corners in shrink_lock_dentry()
fold the call of retain_dentry() into fast_dput()
Call retain_dentry() with refcount 0
dentry_kill(): don't bother with retain_dentry() on slow path
...
are included in this merge do the following:
- Peng Zhang has done some mapletree maintainance work in the
series
"maple_tree: add mt_free_one() and mt_attr() helpers"
"Some cleanups of maple tree"
- In the series "mm: use memmap_on_memory semantics for dax/kmem"
Vishal Verma has altered the interworking between memory-hotplug
and dax/kmem so that newly added 'device memory' can more easily
have its memmap placed within that newly added memory.
- Matthew Wilcox continues folio-related work (including a few
fixes) in the patch series
"Add folio_zero_tail() and folio_fill_tail()"
"Make folio_start_writeback return void"
"Fix fault handler's handling of poisoned tail pages"
"Convert aops->error_remove_page to ->error_remove_folio"
"Finish two folio conversions"
"More swap folio conversions"
- Kefeng Wang has also contributed folio-related work in the series
"mm: cleanup and use more folio in page fault"
- Jim Cromie has improved the kmemleak reporting output in the
series "tweak kmemleak report format".
- In the series "stackdepot: allow evicting stack traces" Andrey
Konovalov to permits clients (in this case KASAN) to cause
eviction of no longer needed stack traces.
- Charan Teja Kalla has fixed some accounting issues in the page
allocator's atomic reserve calculations in the series "mm:
page_alloc: fixes for high atomic reserve caluculations".
- Dmitry Rokosov has added to the samples/ dorectory some sample
code for a userspace memcg event listener application. See the
series "samples: introduce cgroup events listeners".
- Some mapletree maintanance work from Liam Howlett in the series
"maple_tree: iterator state changes".
- Nhat Pham has improved zswap's approach to writeback in the
series "workload-specific and memory pressure-driven zswap
writeback".
- DAMON/DAMOS feature and maintenance work from SeongJae Park in
the series
"mm/damon: let users feed and tame/auto-tune DAMOS"
"selftests/damon: add Python-written DAMON functionality tests"
"mm/damon: misc updates for 6.8"
- Yosry Ahmed has improved memcg's stats flushing in the series
"mm: memcg: subtree stats flushing and thresholds".
- In the series "Multi-size THP for anonymous memory" Ryan Roberts
has added a runtime opt-in feature to transparent hugepages which
improves performance by allocating larger chunks of memory during
anonymous page faults.
- Matthew Wilcox has also contributed some cleanup and maintenance
work against eh buffer_head code int he series "More buffer_head
cleanups".
- Suren Baghdasaryan has done work on Andrea Arcangeli's series
"userfaultfd move option". UFFDIO_MOVE permits userspace heap
compaction algorithms to move userspace's pages around rather than
UFFDIO_COPY'a alloc/copy/free.
- Stefan Roesch has developed a "KSM Advisor", in the series
"mm/ksm: Add ksm advisor". This is a governor which tunes KSM's
scanning aggressiveness in response to userspace's current needs.
- Chengming Zhou has optimized zswap's temporary working memory
use in the series "mm/zswap: dstmem reuse optimizations and
cleanups".
- Matthew Wilcox has performed some maintenance work on the
writeback code, both code and within filesystems. The series is
"Clean up the writeback paths".
- Andrey Konovalov has optimized KASAN's handling of alloc and
free stack traces for secondary-level allocators, in the series
"kasan: save mempool stack traces".
- Andrey also performed some KASAN maintenance work in the series
"kasan: assorted clean-ups".
- David Hildenbrand has gone to town on the rmap code. Cleanups,
more pte batching, folio conversions and more. See the series
"mm/rmap: interface overhaul".
- Kinsey Ho has contributed some maintenance work on the MGLRU
code in the series "mm/mglru: Kconfig cleanup".
- Matthew Wilcox has contributed lruvec page accounting code
cleanups in the series "Remove some lruvec page accounting
functions".
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZZyF2wAKCRDdBJ7gKXxA
jjWjAP42LHvGSjp5M+Rs2rKFL0daBQsrlvy6/jCHUequSdWjSgEAmOx7bc5fbF27
Oa8+DxGM9C+fwqZ/7YxU2w/WuUmLPgU=
=0NHs
-----END PGP SIGNATURE-----
Merge tag 'mm-stable-2024-01-08-15-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
"Many singleton patches against the MM code. The patch series which are
included in this merge do the following:
- Peng Zhang has done some mapletree maintainance work in the series
'maple_tree: add mt_free_one() and mt_attr() helpers'
'Some cleanups of maple tree'
- In the series 'mm: use memmap_on_memory semantics for dax/kmem'
Vishal Verma has altered the interworking between memory-hotplug
and dax/kmem so that newly added 'device memory' can more easily
have its memmap placed within that newly added memory.
- Matthew Wilcox continues folio-related work (including a few fixes)
in the patch series
'Add folio_zero_tail() and folio_fill_tail()'
'Make folio_start_writeback return void'
'Fix fault handler's handling of poisoned tail pages'
'Convert aops->error_remove_page to ->error_remove_folio'
'Finish two folio conversions'
'More swap folio conversions'
- Kefeng Wang has also contributed folio-related work in the series
'mm: cleanup and use more folio in page fault'
- Jim Cromie has improved the kmemleak reporting output in the series
'tweak kmemleak report format'.
- In the series 'stackdepot: allow evicting stack traces' Andrey
Konovalov to permits clients (in this case KASAN) to cause eviction
of no longer needed stack traces.
- Charan Teja Kalla has fixed some accounting issues in the page
allocator's atomic reserve calculations in the series 'mm:
page_alloc: fixes for high atomic reserve caluculations'.
- Dmitry Rokosov has added to the samples/ dorectory some sample code
for a userspace memcg event listener application. See the series
'samples: introduce cgroup events listeners'.
- Some mapletree maintanance work from Liam Howlett in the series
'maple_tree: iterator state changes'.
- Nhat Pham has improved zswap's approach to writeback in the series
'workload-specific and memory pressure-driven zswap writeback'.
- DAMON/DAMOS feature and maintenance work from SeongJae Park in the
series
'mm/damon: let users feed and tame/auto-tune DAMOS'
'selftests/damon: add Python-written DAMON functionality tests'
'mm/damon: misc updates for 6.8'
- Yosry Ahmed has improved memcg's stats flushing in the series 'mm:
memcg: subtree stats flushing and thresholds'.
- In the series 'Multi-size THP for anonymous memory' Ryan Roberts
has added a runtime opt-in feature to transparent hugepages which
improves performance by allocating larger chunks of memory during
anonymous page faults.
- Matthew Wilcox has also contributed some cleanup and maintenance
work against eh buffer_head code int he series 'More buffer_head
cleanups'.
- Suren Baghdasaryan has done work on Andrea Arcangeli's series
'userfaultfd move option'. UFFDIO_MOVE permits userspace heap
compaction algorithms to move userspace's pages around rather than
UFFDIO_COPY'a alloc/copy/free.
- Stefan Roesch has developed a 'KSM Advisor', in the series 'mm/ksm:
Add ksm advisor'. This is a governor which tunes KSM's scanning
aggressiveness in response to userspace's current needs.
- Chengming Zhou has optimized zswap's temporary working memory use
in the series 'mm/zswap: dstmem reuse optimizations and cleanups'.
- Matthew Wilcox has performed some maintenance work on the writeback
code, both code and within filesystems. The series is 'Clean up the
writeback paths'.
- Andrey Konovalov has optimized KASAN's handling of alloc and free
stack traces for secondary-level allocators, in the series 'kasan:
save mempool stack traces'.
- Andrey also performed some KASAN maintenance work in the series
'kasan: assorted clean-ups'.
- David Hildenbrand has gone to town on the rmap code. Cleanups, more
pte batching, folio conversions and more. See the series 'mm/rmap:
interface overhaul'.
- Kinsey Ho has contributed some maintenance work on the MGLRU code
in the series 'mm/mglru: Kconfig cleanup'.
- Matthew Wilcox has contributed lruvec page accounting code cleanups
in the series 'Remove some lruvec page accounting functions'"
* tag 'mm-stable-2024-01-08-15-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (361 commits)
mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER
mm, treewide: introduce NR_PAGE_ORDERS
selftests/mm: add separate UFFDIO_MOVE test for PMD splitting
selftests/mm: skip test if application doesn't has root privileges
selftests/mm: conform test to TAP format output
selftests: mm: hugepage-mmap: conform to TAP format output
selftests/mm: gup_test: conform test to TAP format output
mm/selftests: hugepage-mremap: conform test to TAP format output
mm/vmstat: move pgdemote_* out of CONFIG_NUMA_BALANCING
mm: zsmalloc: return -ENOSPC rather than -EINVAL in zs_malloc while size is too large
mm/memcontrol: remove __mod_lruvec_page_state()
mm/khugepaged: use a folio more in collapse_file()
slub: use a folio in __kmalloc_large_node
slub: use folio APIs in free_large_kmalloc()
slub: use alloc_pages_node() in alloc_slab_page()
mm: remove inc/dec lruvec page state functions
mm: ratelimit stat flush from workingset shrinker
kasan: stop leaking stack trace handles
mm/mglru: remove CONFIG_TRANSPARENT_HUGEPAGE
mm/mglru: add dummy pmd_dirty()
...
- Add initial support to recognise the HeXin C2000 processor.
- Add papr-vpd and papr-sysparm character device drivers for VPD & sysparm
retrieval, so userspace tools can be adapted to avoid doing raw firmware
calls from userspace.
- Sched domains optimisations for shared processor partitions on P9/P10.
- A series of optimisations for KVM running as a nested HV under PowerVM.
- Other small features and fixes.
Thanks to: Aditya Gupta, Aneesh Kumar K.V, Arnd Bergmann, Christophe Leroy,
Colin Ian King, Dario Binacchi, David Heidelberg, Geoff Levand, Gustavo A.
R. Silva, Haoran Liu, Jordan Niethe, Kajol Jain, Kevin Hao, Kunwu Chan, Li
kunyu, Li zeming, Masahiro Yamada, Michal Suchánek, Nathan Lynch, Naveen N Rao,
Nicholas Piggin, Randy Dunlap, Sathvika Vasireddy, Srikar Dronamraju, Stephen
Rothwell, Vaibhav Jain, Zhao Ke.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmWRVf0THG1wZUBlbGxl
cm1hbi5pZC5hdQAKCRBR6+o8yOGlgIfpEACns86LkKuH1wTxbXJFaY2vIdPbBVUO
oh0+y6Bm6ybCVvSp/CcyDPRRWpVlnp4BZlAh4x3gHrdRYEbIaFhI3gUzUtPLxAmf
Oza1qyN570AFOudTNOy3VErtHiMHSuI7ckRshXWCakbAN8VlBDFWje3VJ4vZZ5OB
Ii4RM0a3e/XqUZodLQXvDcqo3GDeIVmf1BnOTvEFFPhjZUZBfJarL6OHuyX7Xp1J
oGSBA3O7UBVGrQsoGS5UAMRqZQnvLc5hn150FU1qDPkHu5X5iLvIMUakTFCYgGYw
mT7DBPpDWKKFSfVjsjIVX2GPv8XSMPnZDmxOl/SIKM1F4aKAL9vmbYP6AMXXmvVB
SpluSmkcp+YujtK5QO8BN4I2SD3xIbhH8yjMUh2CAFP1SBR0QnKpXUGHRiZ0m7fM
SSFAHHLEzKJC46vUsazazoldyWQMAwBHKQzoASHf59yrEP4uta/+pimHdsOeU2UP
IAQEYzw7fTKbEIvqV4qf6sW+5bVUhISS1vSlJ3OEkGqUxVvaUMQ2ePPbX+rfv7lS
hXlxh9vjFzcDK5PYmLi0Agua9ct0ER0MOdY5kRMXAb4+AlVLQi4EgymxRCrjYu2/
XodDf1xJU2w7gdMc4TpiouHRrOtZQ9JWH5j+x0YnN4lG2vmG7lbU22a4myn6PjP9
RLAymXt4/1iHqA==
=LjlQ
-----END PGP SIGNATURE-----
Merge tag 'powerpc-6.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
- Add initial support to recognise the HeXin C2000 processor.
- Add papr-vpd and papr-sysparm character device drivers for VPD &
sysparm retrieval, so userspace tools can be adapted to avoid doing
raw firmware calls from userspace.
- Sched domains optimisations for shared processor partitions on
P9/P10.
- A series of optimisations for KVM running as a nested HV under
PowerVM.
- Other small features and fixes.
Thanks to Aditya Gupta, Aneesh Kumar K.V, Arnd Bergmann, Christophe
Leroy, Colin Ian King, Dario Binacchi, David Heidelberg, Geoff Levand,
Gustavo A. R. Silva, Haoran Liu, Jordan Niethe, Kajol Jain, Kevin Hao,
Kunwu Chan, Li kunyu, Li zeming, Masahiro Yamada, Michal Suchánek,
Nathan Lynch, Naveen N Rao, Nicholas Piggin, Randy Dunlap, Sathvika
Vasireddy, Srikar Dronamraju, Stephen Rothwell, Vaibhav Jain, and
Zhao Ke.
* tag 'powerpc-6.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (96 commits)
powerpc/ps3_defconfig: Disable PPC64_BIG_ENDIAN_ELF_ABI_V2
powerpc/86xx: Drop unused CONFIG_MPC8610
powerpc/powernv: Add error handling to opal_prd_range_is_valid
selftests/powerpc: Fix spelling mistake "EACCESS" -> "EACCES"
powerpc/hvcall: Reorder Nestedv2 hcall opcodes
powerpc/ps3: Add missing set_freezable() for ps3_probe_thread()
powerpc/mpc83xx: Use wait_event_freezable() for freezable kthread
powerpc/mpc83xx: Add the missing set_freezable() for agent_thread_fn()
powerpc/fsl: Fix fsl,tmu-calibration to match the schema
powerpc/smp: Dynamically build Powerpc topology
powerpc/smp: Avoid asym packing within thread_group of a core
powerpc/smp: Add __ro_after_init attribute
powerpc/smp: Disable MC domain for shared processor
powerpc/smp: Enable Asym packing for cores on shared processor
powerpc/sched: Cleanup vcpu_is_preempted()
powerpc: add cpu_spec.cpu_features to vmcoreinfo
powerpc/imc-pmu: Add a null pointer check in update_events_in_group()
powerpc/powernv: Add a null pointer check in opal_powercap_init()
powerpc/powernv: Add a null pointer check in opal_event_init()
powerpc/powernv: Add a null pointer check to scom_debug_init_one()
...
commit 23baf831a3 ("mm, treewide: redefine MAX_ORDER sanely") has
changed the definition of MAX_ORDER to be inclusive. This has caused
issues with code that was not yet upstream and depended on the previous
definition.
To draw attention to the altered meaning of the define, rename MAX_ORDER
to MAX_PAGE_ORDER.
Link: https://lkml.kernel.org/r/20231228144704.14033-2-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The MPC8610 symbol used to be default y if MPC8610_HPCD, but since
MPC8610_HPCD was removed MPC8610 is now never used. Remove it.
Fixes: 248667f8bb ("powerpc: drop HPCD/MPC8610 evaluation platform support")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231123032902.2760818-1-mpe@ellerman.id.au
In the opal_prd_range_is_valid function within opal-prd.c,
error handling was missing for the of_get_address call.
This patch adds necessary error checking, ensuring that the
function gracefully handles scenarios where of_get_address fails.
Signed-off-by: Haoran Liu <liuhaoran14@163.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231127144108.29782-1-liuhaoran14@163.com
The kernel thread function ps3_probe_thread() invokes the try_to_freeze()
in its loop. But all the kernel threads are non-freezable by default.
So if we want to make a kernel thread to be freezable, we have to invoke
set_freezable() explicitly.
Signed-off-by: Kevin Hao <haokexin@gmail.com>
Acked-by: Geoff Levand <geoff@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231221044510.1802429-4-haokexin@gmail.com
A freezable kernel thread can enter frozen state during freezing by
either calling try_to_freeze() or using wait_event_freezable() and its
variants. So for the following snippet of code in a kernel thread loop:
wait_event_interruptible();
try_to_freeze();
We can change it to a simple wait_event_freezable() and then eliminate
a function call.
Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231221044510.1802429-3-haokexin@gmail.com
The kernel thread function agent_thread_fn() invokes the try_to_freeze()
in its loop. But all the kernel threads are non-freezable by default.
So if we want to make a kernel thread to be freezable, we have to invoke
set_freezable() explicitly.
Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231221044510.1802429-2-haokexin@gmail.com
kasprintf() returns a pointer to dynamically allocated memory
which can be NULL upon failure.
Fixes: b9ef7b4b86 ("powerpc: Convert to using %pOFn instead of device_node.name")
Signed-off-by: Kunwu Chan <chentao@kylinos.cn>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231126095739.1501990-1-chentao@kylinos.cn
kasprintf() returns a pointer to dynamically allocated memory
which can be NULL upon failure.
Fixes: 2717a33d60 ("powerpc/opal-irqchip: Use interrupt names if present")
Signed-off-by: Kunwu Chan <chentao@kylinos.cn>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231127030755.1546750-1-chentao@kylinos.cn
kasprintf() returns a pointer to dynamically allocated memory
which can be NULL upon failure.
Add a null pointer check, and release 'ent' to avoid memory leaks.
Fixes: bfd2f0d49a ("powerpc/powernv: Get rid of old scom_controller abstraction")
Signed-off-by: Kunwu Chan <chentao@kylinos.cn>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231208085937.107210-1-chentao@kylinos.cn
The hypervisor returns migration failure if all VAS windows are not
closed. During pre-migration stage, vas_migration_handler() sets
migration_in_progress flag and closes all windows from the list.
The allocate VAS window routine checks the migration flag, setup
the window and then add it to the list. So there is possibility of
the migration handler missing the window that is still in the
process of setup.
t1: Allocate and open VAS t2: Migration event
window
lock vas_pseries_mutex
If migration_in_progress set
unlock vas_pseries_mutex
return
open window HCALL
unlock vas_pseries_mutex
Modify window HCALL lock vas_pseries_mutex
setup window migration_in_progress=true
Closes all windows from the list
// May miss windows that are
// not in the list
unlock vas_pseries_mutex
lock vas_pseries_mutex return
if nr_closed_windows == 0
// No DLPAR CPU or migration
add window to the list
// Window will be added to the
// list after the setup is completed
unlock vas_pseries_mutex
return
unlock vas_pseries_mutex
Close VAS window
// due to DLPAR CPU or migration
return -EBUSY
This patch resolves the issue with the following steps:
- Set the migration_in_progress flag without holding mutex.
- Introduce nr_open_wins_progress counter in VAS capabilities
struct
- This counter tracks the number of open windows are still in
progress
- The allocate setup window thread closes windows if the migration
is set and decrements nr_open_window_progress counter
- The migration handler waits for no in-progress open windows.
The code flow with the fix is as follows:
t1: Allocate and open VAS t2: Migration event
window
lock vas_pseries_mutex
If migration_in_progress set
unlock vas_pseries_mutex
return
open window HCALL
nr_open_wins_progress++
// Window opened, but not
// added to the list yet
unlock vas_pseries_mutex
Modify window HCALL migration_in_progress=true
setup window lock vas_pseries_mutex
Closes all windows from the list
While nr_open_wins_progress {
unlock vas_pseries_mutex
lock vas_pseries_mutex sleep
if nr_closed_windows == 0 // Wait if any open window in
or migration is not started // progress. The open window
// No DLPAR CPU or migration // thread closes the window without
add window to the list // adding to the list and return if
nr_open_wins_progress-- // the migration is in progress.
unlock vas_pseries_mutex
return
Close VAS window
nr_open_wins_progress--
unlock vas_pseries_mutex
return -EBUSY lock vas_pseries_mutex
}
unlock vas_pseries_mutex
return
Fixes: 37e6764895 ("powerpc/pseries/vas: Add VAS migration handler")
Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231125235104.3405008-1-haren@linux.ibm.com
Until now the papr_sysparm APIs have been kernel-internal. But user
space needs access to PAPR system parameters too. The only method
available to user space today to get or set system parameters is using
sys_rtas() and /dev/mem to pass RTAS-addressable buffers between user
space and firmware. This is incompatible with lockdown and should be
deprecated.
So provide an alternative ABI to user space in the form of a
/dev/papr-sysparm character device with just two ioctl commands (get
and set). The data payloads involved are small enough to fit in the
ioctl argument buffer, making the code relatively simple.
Exposing the system parameters through sysfs has been considered but
it would be too awkward:
* The kernel currently does not have to contain an exhaustive list of
defined system parameters. This is a convenient property to maintain
because we don't have to update the kernel whenever a new parameter
is added to PAPR. Exporting a named attribute in sysfs for each
parameter would negate this.
* Some system parameters are text-based and some are not.
* Retrieval of at least one system parameter requires input data,
which a simple read-oriented interface can't support.
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231212-papr-sys_rtas-vs-lockdown-v6-11-e9eafd0c8c6c@linux.ibm.com
The ability to get and set system parameters will be exposed to user
space, so let's get a little more strict about malformed
papr_sysparm_buf objects.
* Create accessors for the length field of struct papr_sysparm_buf.
The length is always stored in MSB order and this is better than
spreading the necessary conversions all over.
* Reject attempts to submit invalid buffers to RTAS.
* Warn if RTAS returns a buffer with an invalid length, clamping the
returned length to a safe value that won't overrun the buffer.
These are meant as precautionary measures to mitigate both firmware
and kernel bugs in this area, should they arise, but I am not aware of
any.
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231212-papr-sys_rtas-vs-lockdown-v6-10-e9eafd0c8c6c@linux.ibm.com
PowerVM LPARs may retrieve Vital Product Data (VPD) for system
components using the ibm,get-vpd RTAS function.
We can expose this to user space with a /dev/papr-vpd character
device, where the programming model is:
struct papr_location_code plc = { .str = "", }; /* obtain all VPD */
int devfd = open("/dev/papr-vpd", O_RDONLY);
int vpdfd = ioctl(devfd, PAPR_VPD_CREATE_HANDLE, &plc);
size_t size = lseek(vpdfd, 0, SEEK_END);
char *buf = malloc(size);
pread(devfd, buf, size, 0);
When a file descriptor is obtained from ioctl(PAPR_VPD_CREATE_HANDLE),
the file contains the result of a complete ibm,get-vpd sequence. The
file contents are immutable from the POV of user space. To get a new
view of the VPD, the client must create a new handle.
This design choice insulates user space from most of the complexities
that ibm,get-vpd brings:
* ibm,get-vpd must be called more than once to obtain complete
results.
* Only one ibm,get-vpd call sequence should be in progress at a time;
interleaved sequences will disrupt each other. Callers must have a
protocol for serializing their use of the function.
* A call sequence in progress may receive a "VPD changed, try again"
status, requiring the client to abandon the sequence and start
over.
The memory required for the VPD buffers seems acceptable, around 20KB
for all VPD on one of my systems. And the value of the
/rtas/ibm,vpd-size DT property (the estimated maximum size of VPD) is
consistently 300KB across various systems I've checked.
I've implemented support for this new ABI in the rtas_get_vpd()
function in librtas, which the vpdupdate command currently uses to
populate its VPD database. I've verified that an unmodified vpdupdate
binary generates an identical database when using a librtas.so that
prefers the new ABI.
Along with the papr-vpd.h header exposed to user space, this
introduces a common papr-miscdev.h uapi header to share a base ioctl
ID with similar drivers to come.
Tested-by: Michal Suchánek <msuchanek@suse.de>
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231212-papr-sys_rtas-vs-lockdown-v6-9-e9eafd0c8c6c@linux.ibm.com
Switch character types to u8 and sizes to size_t. To conform to
characters/sizes in the rest of the tty layer.
Signed-off-by: "Jiri Slaby (SUSE)" <jirislaby@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Amit Shah <amit@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: virtualization@lists.linux.dev
Cc: linux-riscv@lists.infradead.org
Link: https://lore.kernel.org/r/20231206073712.17776-13-jirislaby@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When an add operation for multiple LMBs fails, there is currently
little indication from the kernel of what went wrong. Be a little more
verbose about error conditions in the add paths.
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231114-pseries-memhp-fixes-v1-3-fb8f2bb7c557@linux.ibm.com
dlpar_memory_remove_by_index() may access beyond the bounds of the
drmem lmb array when the LMB lookup fails to match an entry with the
given DRC index. When the search fails, the cursor is left pointing to
&drmem_info->lmbs[drmem_info->n_lmbs], which is one element past the
last valid entry in the array. The debug message at the end of the
function then dereferences this pointer:
pr_debug("Failed to hot-remove memory at %llx\n",
lmb->base_addr);
This was found by inspection and confirmed with KASAN:
pseries-hotplug-mem: Attempting to hot-remove LMB, drc index 1234
==================================================================
BUG: KASAN: slab-out-of-bounds in dlpar_memory+0x298/0x1658
Read of size 8 at addr c000000364e97fd0 by task bash/949
dump_stack_lvl+0xa4/0xfc (unreliable)
print_report+0x214/0x63c
kasan_report+0x140/0x2e0
__asan_load8+0xa8/0xe0
dlpar_memory+0x298/0x1658
handle_dlpar_errorlog+0x130/0x1d0
dlpar_store+0x18c/0x3e0
kobj_attr_store+0x68/0xa0
sysfs_kf_write+0xc4/0x110
kernfs_fop_write_iter+0x26c/0x390
vfs_write+0x2d4/0x4e0
ksys_write+0xac/0x1a0
system_call_exception+0x268/0x530
system_call_vectored_common+0x15c/0x2ec
Allocated by task 1:
kasan_save_stack+0x48/0x80
kasan_set_track+0x34/0x50
kasan_save_alloc_info+0x34/0x50
__kasan_kmalloc+0xd0/0x120
__kmalloc+0x8c/0x320
kmalloc_array.constprop.0+0x48/0x5c
drmem_init+0x2a0/0x41c
do_one_initcall+0xe0/0x5c0
kernel_init_freeable+0x4ec/0x5a0
kernel_init+0x30/0x1e0
ret_from_kernel_user_thread+0x14/0x1c
The buggy address belongs to the object at c000000364e80000
which belongs to the cache kmalloc-128k of size 131072
The buggy address is located 0 bytes to the right of
allocated 98256-byte region [c000000364e80000, c000000364e97fd0)
==================================================================
pseries-hotplug-mem: Failed to hot-remove memory at 0
Log failed lookups with a separate message and dereference the
cursor only when it points to a valid entry.
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Fixes: 51925fb3c5 ("powerpc/pseries: Implement memory hotplug remove in the kernel")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231114-pseries-memhp-fixes-v1-1-fb8f2bb7c557@linux.ibm.com
Fix build errors when CURRITUCK=y and I2C is not builtin (=m or is
not set). Fixes these build errors:
powerpc-linux-ld: arch/powerpc/platforms/44x/ppc476.o: in function `avr_halt_system':
ppc476.c:(.text+0x58): undefined reference to `i2c_smbus_write_byte_data'
powerpc-linux-ld: arch/powerpc/platforms/44x/ppc476.o: in function `ppc47x_device_probe':
ppc476.c:(.init.text+0x18): undefined reference to `i2c_register_driver'
Fixes: 2a2c74b2ef ("IBM Akebono: Add the Akebono platform")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: kernel test robot <lkp@intel.com>
Closes: lore.kernel.org/r/202312010820.cmdwF5X9-lkp@intel.com
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231201055159.8371-1-rdunlap@infradead.org
The 44x/fsp2_defconfig build fails with:
arch/powerpc/platforms/44x/idle.c:30:12: error: no previous prototype for ‘ppc44x_idle_init’ [-Werror=missing-prototypes]
30 | int __init ppc44x_idle_init(void)
| ^~~~~~~~~~~~~~~~
Fix it by making ppc44x_idle_init() static.
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231129131919.2528517-4-mpe@ellerman.id.au
The mpc512x_defconfig build fails with:
arch/powerpc/platforms/512x/mpc5121_ads_cpld.c:142:1: error: no previous prototype for ‘mpc5121_ads_cpld_map’ [-Werror=missing-prototypes]
142 | mpc5121_ads_cpld_map(void)
| ^~~~~~~~~~~~~~~~~~~~
arch/powerpc/platforms/512x/mpc5121_ads_cpld.c:157:1: error: no previous prototype for ‘mpc5121_ads_cpld_pic_init’ [-Werror=missing-prototypes]
157 | mpc5121_ads_cpld_pic_init(void)
| ^~~~~~~~~~~~~~~~~~~~~~~~~
There are prototypes for these functions but the header they are in is
not included by mpc5121_ads_cpld.c. Include it to fix the build error.
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231129131919.2528517-3-mpe@ellerman.id.au
The rtas_read_config() and rtas_write_config() functions in
kernel/rtas_pci.c have external linkage and two users in arch/powerpc:
the rtas_pci code itself and the pseries platform's "enhanced error
handling" (EEH) support code.
The prototypes for these functions in asm/ppc-pci.h have until now
been guarded by CONFIG_EEH since the only external caller is the
pseries EEH code. However, this presumably has always generated
warnings when built with !CONFIG_EEH and -Wmissing-prototypes:
arch/powerpc/kernel/rtas_pci.c:46:5: error: no previous prototype for
function 'rtas_read_config' [-Werror,-Wmissing-prototypes]
46 | int rtas_read_config(struct pci_dn *pdn, int where,
int size, u32 *val)
arch/powerpc/kernel/rtas_pci.c:98:5: error: no previous prototype for
function 'rtas_write_config' [-Werror,-Wmissing-prototypes]
98 | int rtas_write_config(struct pci_dn *pdn, int where,
int size, u32 val)
The introduction of commit c6345dfa6e3e ("Makefile.extrawarn: turn on
missing-prototypes globally") forces the issue.
The efika and chrp platform code have (static) functions with the same
names but different signatures. We may as well eliminate the potential
for conflicts and confusion by renaming the globally visible versions
as their prototypes get moved out of the CONFIG_EEH-guarded region;
their current names are too generic anyway. Since they operate on
objects of the type 'struct pci_dn *', give them the slightly more
verbose prefix "rtas_pci_dn_" and fix up all the call sites.
Fixes: c6345dfa6e3e ("Makefile.extrawarn: turn on missing-prototypes globally")
Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Closes: https://lore.kernel.org/linuxppc-dev/CA+G9fYt0LLXtjSz+Hkf3Fhm-kf0ZQanrhUS+zVZGa3O+Wt2+vg@mail.gmail.com/
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231127-rtas-pci-rw-config-v1-1-385d29ace3df@linux.ibm.com
Saves a pointer per struct dentry and actually makes the things less
clumsy. Cleaned the d_walk() and dcache_readdir() a bit by use
of hlist_for_... iterators.
A couple of new helpers - d_first_child() and d_next_sibling(),
to make the expressions less awful.
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
This is a pseries-specific function declaration that doesn't belong in
rtas.h. Move it to the pseries platform code and adjust
pseries/suspend.c accordingly.
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231106-rtas-trivial-v1-5-61847655c51f@linux.ibm.com
These functions are only called locally and should be static like the
other corresponding functions are:
arch/powerpc/platforms/powermac/smp.c:416:13: error: no previous prototype for 'smp_psurge_take_timebase' [-Werror=missing-prototypes]
416 | void __init smp_psurge_take_timebase(void)
| ^~~~~~~~~~~~~~~~~~~~~~~~
arch/powerpc/platforms/powermac/smp.c:432:13: error: no previous prototype for 'smp_psurge_give_timebase' [-Werror=missing-prototypes]
432 | void __init smp_psurge_give_timebase(void)
| ^~~~~~~~~~~~~~~~~~~~~~~~
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231108125843.3806765-20-arnd@kernel.org
Allmodconfig builds show a warning about one function that is accidentally
marked global:
arch/powerpc/platforms/pasemi/setup.c:67:6: error: no previous prototype for 'pas_shutdown' [-Werror=missing-prototypes]
Fixes: 656fdf3ad8 ("powerpc/pasemi: Add Nemo board device init code.")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231108125843.3806765-19-arnd@kernel.org
Allmodconfig kernels produce a missing-prototypes warning:
arch/powerpc/platforms/ps3/gelic_udbg.c:239:6: error: no previous prototype for 'udbg_shutdown_ps3gelic' [-Werror=missing-prototypes]
Move the declaration from a local header to asm/ps3.h where it can be
seen from both the caller and the definition.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Geoff Levand <geoff@infradead.org>
Acked-by: Jakub Kicinski <kuba@kernel.org>
[mpe: Drop CONFIG_PS3GELIC_UDBG to fix build error]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231108125843.3806765-18-arnd@kernel.org
- Add support for KVM running as a nested hypervisor under development versions
of PowerVM, using the new PAPR nested virtualisation API.
- Add support for the BPF prog pack allocator.
- A rework of the non-server MMU handling to support execute-only on all platforms.
- Some optimisations & cleanups for the powerpc qspinlock code.
- Various other small features and fixes.
Thanks to: Aboorva Devarajan, Aditya Gupta, Amit Machhiwal, Benjamin Gray,
Christophe Leroy, Dr. David Alan Gilbert, Gaurav Batra, Gautam Menghani, Geert
Uytterhoeven, Haren Myneni, Hari Bathini, Joel Stanley, Jordan Niethe, Julia
Lawall, Kautuk Consul, Kuan-Wei Chiu, Michael Neuling, Minjie Du, Muhammad
Muzammil, Naveen N Rao, Nicholas Piggin, Nick Child, Nysal Jan K.A, Peter
Lafreniere, Rob Herring, Sachin Sant, Sebastian Andrzej Siewior, Shrikanth
Hegde, Srikar Dronamraju, Stanislav Kinsburskii, Vaibhav Jain, Wang Yufen, Yang
Yingliang, Yuan Tan.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmVEf38THG1wZUBlbGxl
cm1hbi5pZC5hdQAKCRBR6+o8yOGlgMKgD/4vmPVcBE31xCAuuksrVvmMDRsCoC8N
IJe4A5dHda1tYgdN2YdeK4LBszv5pWICjf2xZHlNh+L0s3Vxpngd4ycAWGPfDAyk
SOlM24NCKl5j3327QZEt+iZVmJeTSnrmjxO0A1y04yvzLrfvFT7mbP4EXoidjShd
GNb/EoH9kkCFn65zulc+lN2itQEX6Ht2GQTAz5z5GKtF6d1zZGM8ftOW+SQ5LeU3
5JOkQtMtwAKhzBiglA4BB3pQyjaOOkPaTaj/WLoxx5tbVaCkV4wrFq48Bmtbm7E3
kYkMNoI3IsC615GqY1CaRs/RSpMt74tIVh3tstSecHWRIwNGnfF6zeZpKLvJSs8k
Qa5greGWMUDuJdDg9oDwAX2AKtO+3byI2v1hKE+sMhMh0eeMtDP9WIrIRg4BDjKL
mq8RffXLTCtepehgfwBpoZbcvFSwFUMwuihBD7+bDMZQeDbtuFdZ2ouMFXBP9M1n
cuv4KySouvKv9Xp5EeCkHlpL7QmSqrtSHOPYjoPeLueJYlmjheWdreLM9p7Nl2ma
5wBxLpdLCGCpDJOyGgWNoQRHXucBNlU97DLx2V70nXG4wvvRyXh9EZ6I2niPSdPx
N3LJnINz4MJ52Gd1KWJvufOyJlLwXxuI07rzCq67ZegpEPh+baWqVcPscuKU8+q0
dSh2DPCht8gw1A==
=ddT4
-----END PGP SIGNATURE-----
Merge tag 'powerpc-6.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
- Add support for KVM running as a nested hypervisor under development
versions of PowerVM, using the new PAPR nested virtualisation API
- Add support for the BPF prog pack allocator
- A rework of the non-server MMU handling to support execute-only on
all platforms
- Some optimisations & cleanups for the powerpc qspinlock code
- Various other small features and fixes
Thanks to Aboorva Devarajan, Aditya Gupta, Amit Machhiwal, Benjamin
Gray, Christophe Leroy, Dr. David Alan Gilbert, Gaurav Batra, Gautam
Menghani, Geert Uytterhoeven, Haren Myneni, Hari Bathini, Joel Stanley,
Jordan Niethe, Julia Lawall, Kautuk Consul, Kuan-Wei Chiu, Michael
Neuling, Minjie Du, Muhammad Muzammil, Naveen N Rao, Nicholas Piggin,
Nick Child, Nysal Jan K.A, Peter Lafreniere, Rob Herring, Sachin Sant,
Sebastian Andrzej Siewior, Shrikanth Hegde, Srikar Dronamraju, Stanislav
Kinsburskii, Vaibhav Jain, Wang Yufen, Yang Yingliang, and Yuan Tan.
* tag 'powerpc-6.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (100 commits)
powerpc/vmcore: Add MMU information to vmcoreinfo
Revert "powerpc: add `cur_cpu_spec` symbol to vmcoreinfo"
powerpc/bpf: use bpf_jit_binary_pack_[alloc|finalize|free]
powerpc/bpf: rename powerpc64_jit_data to powerpc_jit_data
powerpc/bpf: implement bpf_arch_text_invalidate for bpf_prog_pack
powerpc/bpf: implement bpf_arch_text_copy
powerpc/code-patching: introduce patch_instructions()
powerpc/32s: Implement local_flush_tlb_page_psize()
powerpc/pseries: use kfree_sensitive() in plpks_gen_password()
powerpc/code-patching: Perform hwsync in __patch_instruction() in case of failure
powerpc/fsl_msi: Use device_get_match_data()
powerpc: Remove cpm_dp...() macros
powerpc/qspinlock: Rename yield_propagate_owner tunable
powerpc/qspinlock: Propagate sleepy if previous waiter is preempted
powerpc/qspinlock: don't propagate the not-sleepy state
powerpc/qspinlock: propagate owner preemptedness rather than CPU number
powerpc/qspinlock: stop queued waiters trying to set lock sleepy
powerpc/perf: Fix disabling BHRB and instruction sampling
powerpc/trace: Add support for HAVE_FUNCTION_ARG_ACCESS_API
powerpc/tools: Pass -mabi=elfv2 to gcc-check-mprofile-kernel.sh
...
To help make the move of sysctls out of kernel/sysctl.c not incur a size
penalty sysctl has been changed to allow us to not require the sentinel, the
final empty element on the sysctl array. Joel Granados has been doing all this
work. On the v6.6 kernel we got the major infrastructure changes required to
support this. For v6.7-rc1 we have all arch/ and drivers/ modified to remove
the sentinel. Both arch and driver changes have been on linux-next for a bit
less than a month. It is worth re-iterating the value:
- this helps reduce the overall build time size of the kernel and run time
memory consumed by the kernel by about ~64 bytes per array
- the extra 64-byte penalty is no longer inncurred now when we move sysctls
out from kernel/sysctl.c to their own files
For v6.8-rc1 expect removal of all the sentinels and also then the unneeded
check for procname == NULL.
The last 2 patches are fixes recently merged by Krister Johansen which allow
us again to use softlockup_panic early on boot. This used to work but the
alias work broke it. This is useful for folks who want to detect softlockups
super early rather than wait and spend money on cloud solutions with nothing
but an eventual hung kernel. Although this hadn't gone through linux-next it's
also a stable fix, so we might as well roll through the fixes now.
-----BEGIN PGP SIGNATURE-----
iQJGBAABCgAwFiEENnNq2KuOejlQLZofziMdCjCSiKcFAmVCqKsSHG1jZ3JvZkBr
ZXJuZWwub3JnAAoJEM4jHQowkoinEgYQAIpkqRL85DBwems19Uk9A27lkctwZ6Fc
HdslQCObQTsbuKVimZFP4IL2beUfUE0cfLZCXlzp+4nRDOf6vyhyf3w19jPQtI0Q
YdqwTk9y6G5VjDsb35QK0+UBloY/kZ1H3/LW4uCwjXTuksUGmWW2Qvey35696Scv
hDMLADqKQmdpYxLUaNi9QyYbEAjYtOai2ezg3+i7hTG168t1k/Ab2BxIFrPVsCR2
FAiq05L4ugWjNskdsWBjck05JZsx9SK/qcAxpIPoUm4nGiFNHApXE0E0hs3vsnmn
WIHIbxCQw8ZlUDlmw4S+0YH3NFFzFbWfmW8k2b0f2qZTJm/rU4KiJfcJVknkAUVF
raFox6XDW0AUQ9L/NOUJ9ip5rup57GcFrMYocdJ3PPAvvmHKOb1D1O741p75RRcc
9j7zwfIRrzjPUqzhsQS/GFjdJu3lJNmEBK1AcgrVry6WoItrAzJHKPPDC7TwaNmD
eXpjxMl1sYzzHqtVh4hn+xkUYphj/6gTGMV8zdo+/FopFswgeJW9G8kHtlEWKDPk
MRIKwACmfetP6f3ngHunBg+BOipbjCANL7JI0nOhVOQoaULxCCPx+IPJ6GfSyiuH
AbcjH8DGI7fJbUkBFoF0dsRFZ2gH8ds1PYMbWUJ6x3FtuCuv5iIuvQYoaWU6itm7
6f0KvCogg0fU
=Qf50
-----END PGP SIGNATURE-----
Merge tag 'sysctl-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux
Pull sysctl updates from Luis Chamberlain:
"To help make the move of sysctls out of kernel/sysctl.c not incur a
size penalty sysctl has been changed to allow us to not require the
sentinel, the final empty element on the sysctl array. Joel Granados
has been doing all this work. On the v6.6 kernel we got the major
infrastructure changes required to support this. For v6.7-rc1 we have
all arch/ and drivers/ modified to remove the sentinel. Both arch and
driver changes have been on linux-next for a bit less than a month. It
is worth re-iterating the value:
- this helps reduce the overall build time size of the kernel and run
time memory consumed by the kernel by about ~64 bytes per array
- the extra 64-byte penalty is no longer inncurred now when we move
sysctls out from kernel/sysctl.c to their own files
For v6.8-rc1 expect removal of all the sentinels and also then the
unneeded check for procname == NULL.
The last two patches are fixes recently merged by Krister Johansen
which allow us again to use softlockup_panic early on boot. This used
to work but the alias work broke it. This is useful for folks who want
to detect softlockups super early rather than wait and spend money on
cloud solutions with nothing but an eventual hung kernel. Although
this hadn't gone through linux-next it's also a stable fix, so we
might as well roll through the fixes now"
* tag 'sysctl-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux: (23 commits)
watchdog: move softlockup_panic back to early_param
proc: sysctl: prevent aliased sysctls from getting passed to init
intel drm: Remove now superfluous sentinel element from ctl_table array
Drivers: hv: Remove now superfluous sentinel element from ctl_table array
raid: Remove now superfluous sentinel element from ctl_table array
fw loader: Remove the now superfluous sentinel element from ctl_table array
sgi-xp: Remove the now superfluous sentinel element from ctl_table array
vrf: Remove the now superfluous sentinel element from ctl_table array
char-misc: Remove the now superfluous sentinel element from ctl_table array
infiniband: Remove the now superfluous sentinel element from ctl_table array
macintosh: Remove the now superfluous sentinel element from ctl_table array
parport: Remove the now superfluous sentinel element from ctl_table array
scsi: Remove now superfluous sentinel element from ctl_table array
tty: Remove now superfluous sentinel element from ctl_table array
xen: Remove now superfluous sentinel element from ctl_table array
hpet: Remove now superfluous sentinel element from ctl_table array
c-sky: Remove now superfluous sentinel element from ctl_talbe array
powerpc: Remove now superfluous sentinel element from ctl_table arrays
riscv: Remove now superfluous sentinel element from ctl_table array
x86/vdso: Remove now superfluous sentinel element from ctl_table array
...
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmU/vjMQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpqVcEADaNf6X7LVKKrdQ4sA38dBZYGM3kNz0SCYV
vkjQAs0Fyylbu6EhYOLO/R+UCtpytLlnbr4NmFDbhaEG4OJcwoDLDxpMQ7Gda58v
4RBXAiIlhZX3g99/ebvtNtVEvQa9gF4h8k2n/gKsG+PoS+cbkKAI0Na2duI1d/pL
B5nQ31VAHhsyjUv1nIPLrQS6lsL7ZTFvH8L6FLcEVM03poy8PE2H6kN7WoyXwtfo
LN3KK0Nu7B0Wx2nDx0ffisxcDhbChGs7G2c9ndPTvxg6/4HW+2XSeNUwTxXYpyi2
ZCD+AHCzMB/w6GNNWFw4xfau5RrZ4c4HdBnmyR6+fPb1u6nGzjgquzFyLyLu5MkA
n/NvOHP1Cbd3QIXG1TnBi2kDPkQ5FOIAjFSe9IZAGT4dUkZ63wBoDil1jCgMLuCR
C+AFPLhiIg3cFvu9+fdZ6BkCuZYESd3YboBtRKeMionEexrPTKt4QWqIoVJgd/Y7
nwvR8jkIBpVgQZT8ocYqhSycLCYV2lGqEBSq4rlRiEb/W1G9Awmg8UTGuUYFSC1G
vGPCwhGi+SBsbo84aPCfSdUkKDlruNWP0GwIFxo0hsiTOoHP+7UWeenJ2Jw5lNPt
p0Y72TEDDaSMlE4cJx6IWdWM/B+OWzCyRyl3uVcy7bToEsVhIbBSSth7+sh2n7Cy
WgH1lrtMzg==
=sace
-----END PGP SIGNATURE-----
Merge tag 'for-6.7/block-2023-10-30' of git://git.kernel.dk/linux
Pull block updates from Jens Axboe:
- Improvements to the queue_rqs() support, and adding null_blk support
for that as well (Chengming)
- Series improving badblocks support (Coly)
- Key store support for sed-opal (Greg)
- IBM partition string handling improvements (Jan)
- Make number of ublk devices supported configurable (Mike)
- Cancelation improvements for ublk (Ming)
- MD pull requests via Song:
- Handle timeout in md-cluster, by Denis Plotnikov
- Cleanup pers->prepare_suspend, by Yu Kuai
- Rewrite mddev_suspend(), by Yu Kuai
- Simplify md_seq_ops, by Yu Kuai
- Reduce unnecessary locking array_state_store(), by Mariusz
Tkaczyk
- Make rdev add/remove independent from daemon thread, by Yu Kuai
- Refactor code around quiesce() and mddev_suspend(), by Yu Kuai
- NVMe pull request via Keith:
- nvme-auth updates (Mark)
- nvme-tcp tls (Hannes)
- nvme-fc annotaions (Kees)
- Misc cleanups and improvements (Jiapeng, Joel)
* tag 'for-6.7/block-2023-10-30' of git://git.kernel.dk/linux: (95 commits)
block: ublk_drv: Remove unused function
md: cleanup pers->prepare_suspend()
nvme-auth: allow mixing of secret and hash lengths
nvme-auth: use transformed key size to create resp
nvme-auth: alloc nvme_dhchap_key as single buffer
nvmet-tcp: use 'spin_lock_bh' for state_lock()
powerpc/pseries: PLPKS SED Opal keystore support
block: sed-opal: keystore access for SED Opal keys
block:sed-opal: SED Opal keystore
ublk: simplify aborting request
ublk: replace monitor with cancelable uring_cmd
ublk: quiesce request queue when aborting queue
ublk: rename mm_lock as lock
ublk: move ublk_cancel_dev() out of ub->mutex
ublk: make sure io cmd handled in submitter task context
ublk: don't get ublk device reference in ublk_abort_queue()
ublk: Make ublks_max configurable
ublk: Limit dev_id/ub_number values
md-cluster: check for timeout while a new disk adding
nvme: rework NVME_AUTH Kconfig selection
...
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZTppYgAKCRCRxhvAZXjc
okIHAP9anLz1QDyMLH12ASuHjgBc0Of3jcB6NB97IWGpL4O21gEA46ohaD+vcJuC
YkBLU3lXqQ87nfu28ExFAzh10hG2jwM=
=m4pB
-----END PGP SIGNATURE-----
Merge tag 'vfs-6.7.ctime' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs
Pull vfs inode time accessor updates from Christian Brauner:
"This finishes the conversion of all inode time fields to accessor
functions as discussed on list. Changing timestamps manually as we
used to do before is error prone. Using accessors function makes this
robust.
It does not contain the switch of the time fields to discrete 64 bit
integers to replace struct timespec and free up space in struct inode.
But after this, the switch can be trivially made and the patch should
only affect the vfs if we decide to do it"
* tag 'vfs-6.7.ctime' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs: (86 commits)
fs: rename inode i_atime and i_mtime fields
security: convert to new timestamp accessors
selinux: convert to new timestamp accessors
apparmor: convert to new timestamp accessors
sunrpc: convert to new timestamp accessors
mm: convert to new timestamp accessors
bpf: convert to new timestamp accessors
ipc: convert to new timestamp accessors
linux: convert to new timestamp accessors
zonefs: convert to new timestamp accessors
xfs: convert to new timestamp accessors
vboxsf: convert to new timestamp accessors
ufs: convert to new timestamp accessors
udf: convert to new timestamp accessors
ubifs: convert to new timestamp accessors
tracefs: convert to new timestamp accessors
sysv: convert to new timestamp accessors
squashfs: convert to new timestamp accessors
server: convert to new timestamp accessors
client: convert to new timestamp accessors
...
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZTpoQAAKCRCRxhvAZXjc
ovFNAQDgIRjXfZ1Ku+USxsRRdqp8geJVaNc3PuMmYhOYhUenqgEAmC1m+p0y31dS
P6+HlL16Mqgu0tpLCcJK9BibpDZ0Ew4=
=7yD1
-----END PGP SIGNATURE-----
Merge tag 'vfs-6.7.misc' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs
Pull misc vfs updates from Christian Brauner:
"This contains the usual miscellaneous features, cleanups, and fixes
for vfs and individual fses.
Features:
- Rename and export helpers that get write access to a mount. They
are used in overlayfs to get write access to the upper mount.
- Print the pretty name of the root device on boot failure. This
helps in scenarios where we would usually only print
"unknown-block(1,2)".
- Add an internal SB_I_NOUMASK flag. This is another part in the
endless POSIX ACL saga in a way.
When POSIX ACLs are enabled via SB_POSIXACL the vfs cannot strip
the umask because if the relevant inode has POSIX ACLs set it might
take the umask from there. But if the inode doesn't have any POSIX
ACLs set then we apply the umask in the filesytem itself. So we end
up with:
(1) no SB_POSIXACL -> strip umask in vfs
(2) SB_POSIXACL -> strip umask in filesystem
The umask semantics associated with SB_POSIXACL allowed filesystems
that don't even support POSIX ACLs at all to raise SB_POSIXACL
purely to avoid umask stripping. That specifically means NFS v4 and
Overlayfs. NFS v4 does it because it delegates this to the server
and Overlayfs because it needs to delegate umask stripping to the
upper filesystem, i.e., the filesystem used as the writable layer.
This went so far that SB_POSIXACL is raised eve on kernels that
don't even have POSIX ACL support at all.
Stop this blatant abuse and add SB_I_NOUMASK which is an internal
superblock flag that filesystems can raise to opt out of umask
handling. That should really only be the two mentioned above. It's
not that we want any filesystems to do this. Ideally we have all
umask handling always in the vfs.
- Make overlayfs use SB_I_NOUMASK too.
- Now that we have SB_I_NOUMASK, stop checking for SB_POSIXACL in
IS_POSIXACL() if the kernel doesn't have support for it. This is a
very old patch but it's only possible to do this now with the wider
cleanup that was done.
- Follow-up work on fake path handling from last cycle. Citing mostly
from Amir:
When overlayfs was first merged, overlayfs files of regular files
and directories, the ones that are installed in file table, had a
"fake" path, namely, f_path is the overlayfs path and f_inode is
the "real" inode on the underlying filesystem.
In v6.5, we took another small step by introducing of the
backing_file container and the file_real_path() helper. This change
allowed vfs and filesystem code to get the "real" path of an
overlayfs backing file. With this change, we were able to make
fsnotify work correctly and report events on the "real" filesystem
objects that were accessed via overlayfs.
This method works fine, but it still leaves the vfs vulnerable to
new code that is not aware of files with fake path. A recent
example is commit db1d1e8b98 ("IMA: use vfs_getattr_nosec to get
the i_version"). This commit uses direct referencing to f_path in
IMA code that otherwise uses file_inode() and file_dentry() to
reference the filesystem objects that it is measuring.
This contains work to switch things around: instead of having
filesystem code opt-in to get the "real" path, have generic code
opt-in for the "fake" path in the few places that it is needed.
Is it far more likely that new filesystems code that does not use
the file_dentry() and file_real_path() helpers will end up causing
crashes or averting LSM/audit rules if we keep the "fake" path
exposed by default.
This change already makes file_dentry() moot, but for now we did
not change this helper just added a WARN_ON() in ovl_d_real() to
catch if we have made any wrong assumptions.
After the dust settles on this change, we can make file_dentry() a
plain accessor and we can drop the inode argument to ->d_real().
- Switch struct file to SLAB_TYPESAFE_BY_RCU. This looks like a small
change but it really isn't and I would like to see everyone on
their tippie toes for any possible bugs from this work.
Essentially we've been doing most of what SLAB_TYPESAFE_BY_RCU for
files since a very long time because of the nasty interactions
between the SCM_RIGHTS file descriptor garbage collection. So
extending it makes a lot of sense but it is a subtle change. There
are almost no places that fiddle with file rcu semantics directly
and the ones that did mess around with struct file internal under
rcu have been made to stop doing that because it really was always
dodgy.
I forgot to put in the link tag for this change and the discussion
in the commit so adding it into the merge message:
https://lore.kernel.org/r/20230926162228.68666-1-mjguzik@gmail.com
Cleanups:
- Various smaller pipe cleanups including the removal of a spin lock
that was only used to protect against writes without pipe_lock()
from O_NOTIFICATION_PIPE aka watch queues. As that was never
implemented remove the additional locking from pipe_write().
- Annotate struct watch_filter with the new __counted_by attribute.
- Clarify do_unlinkat() cleanup so that it doesn't look like an extra
iput() is done that would cause issues.
- Simplify file cleanup when the file has never been opened.
- Use module helper instead of open-coding it.
- Predict error unlikely for stale retry.
- Use WRITE_ONCE() for mount expiry field instead of just commenting
that one hopes the compiler doesn't get smart.
Fixes:
- Fix readahead on block devices.
- Fix writeback when layztime is enabled and inodes whose timestamp
is the only thing that changed reside on wb->b_dirty_time. This
caused excessively large zombie memory cgroup when lazytime was
enabled as such inodes weren't handled fast enough.
- Convert BUG_ON() to WARN_ON_ONCE() in open_last_lookups()"
* tag 'vfs-6.7.misc' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs: (26 commits)
file, i915: fix file reference for mmap_singleton()
vfs: Convert BUG_ON to WARN_ON_ONCE in open_last_lookups
writeback, cgroup: switch inodes with dirty timestamps to release dying cgwbs
chardev: Simplify usage of try_module_get()
ovl: rely on SB_I_NOUMASK
fs: fix umask on NFS with CONFIG_FS_POSIX_ACL=n
fs: store real path instead of fake path in backing file f_path
fs: create helper file_user_path() for user displayed mapped file path
fs: get mnt_writers count for an open backing file's real path
vfs: stop counting on gcc not messing with mnt_expiry_mark if not asked
vfs: predict the error in retry_estale as unlikely
backing file: free directly
vfs: fix readahead(2) on block devices
io_uring: use files_lookup_fd_locked()
file: convert to SLAB_TYPESAFE_BY_RCU
vfs: shave work on failed file open
fs: simplify misleading code to remove ambiguity regarding ihold()/iput()
watch_queue: Annotate struct watch_filter with __counted_by
fs/pipe: use spinlock in pipe_read() only if there is a watch_queue
fs/pipe: remove unnecessary spinlock from pipe_write()
...
password might contain private information, so better use
kfree_sensitive to free it.
In plpks_gen_password() use kfree_sensitive().
Signed-off-by: Minjie Du <duminjie@vivo.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230717092648.9752-1-duminjie@vivo.com
If the vcpu_associativity alloc memory successfully but the
pcpu_associativity fails to alloc memory, the vcpu_associativity
memory leaks.
Fixes: d62c8deeb6 ("powerpc/pseries: Provide vcpu dispatch statistics")
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Reviewed-by: "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/1671003983-10794-1-git-send-email-wangyufen@huawei.com
The VAS open window call prints error message and returns -EBUSY
after the migration suspend event initiated and until the resume
event completed on the destination system. It can cause the log
buffer filled with these error messages if the user space issues
continuous open window calls. Similar case even for DLPAR CPU
remove event when no credits are available until the credits are
freed or with the other DLPAR CPU add event.
So changes in the patch to use pr_err_ratelimited() instead of
pr_err() to display open window failure and not-available credits
error messages.
Use pr_fmt() and make the corresponding changes to have the
consistencein prefix all pr_*() messages (vas-api.c).
Fixes: 37e6764895 ("powerpc/pseries/vas: Add VAS migration handler")
Signed-off-by: Haren Myneni <haren@linux.ibm.com>
[mpe: Use "vas-api" as the prefix to match the file name.]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231019215033.1335251-1-haren@linux.ibm.com
When a device is initialized, the driver invokes dma_supported() twice -
first for streaming mappings followed by coherent mappings. For an
SR-IOV device, default window is deleted and DDW created. With vPMEM
enabled, TCE mappings are dynamically created for both vPMEM and SR-IOV
device. There are no direct mappings.
First time when dma_supported() is called with 64 bit mask, DDW is created
and marked as dynamic window. The second time dma_supported() is called,
enable_ddw() finds existing window for the device and incorrectly returns
it as "direct mapping".
This only happens when size of DDW is big enough to map max LPAR memory.
This results in streaming TCEs to not get dynamically mapped, since code
incorrently assumes these are already pre-mapped. The adapter initially
comes up but goes down due to EEH.
Fixes: 381ceda88c ("powerpc/pseries/iommu: Make use of DDW for indirect mapping")
Cc: stable@vger.kernel.org # v5.15+
Signed-off-by: Gaurav Batra <gbatra@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231003030802.47914-1-gbatra@linux.vnet.ibm.com
In recent discussions around some performance improvements in the file
handling area we discussed switching the file cache to rely on
SLAB_TYPESAFE_BY_RCU which allows us to get rid of call_rcu() based
freeing for files completely. This is a pretty sensitive change overall
but it might actually be worth doing.
The main downside is the subtlety. The other one is that we should
really wait for Jann's patch to land that enables KASAN to handle
SLAB_TYPESAFE_BY_RCU UAFs. Currently it doesn't but a patch for this
exists.
With SLAB_TYPESAFE_BY_RCU objects may be freed and reused multiple times
which requires a few changes. So it isn't sufficient anymore to just
acquire a reference to the file in question under rcu using
atomic_long_inc_not_zero() since the file might have already been
recycled and someone else might have bumped the reference.
In other words, callers might see reference count bumps from newer
users. For this reason it is necessary to verify that the pointer is the
same before and after the reference count increment. This pattern can be
seen in get_file_rcu() and __files_get_rcu().
In addition, it isn't possible to access or check fields in struct file
without first aqcuiring a reference on it. Not doing that was always
very dodgy and it was only usable for non-pointer data in struct file.
With SLAB_TYPESAFE_BY_RCU it is necessary that callers first acquire a
reference under rcu or they must hold the files_lock of the fdtable.
Failing to do either one of this is a bug.
Thanks to Jann for pointing out that we need to ensure memory ordering
between reallocations and pointer check by ensuring that all subsequent
loads have a dependency on the second load in get_file_rcu() and
providing a fixup that was folded into this patch.
Cc: Jann Horn <jannh@google.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Sparse reports an endianness error with the else case of
val = (cpu_endian ? be64_to_cpu(reg_entry->reg_val) :
(u64)(reg_entry->reg_val));
This is a safe operation because the code is explicitly working with
dynamic endianness, so add the __force annotation to tell Sparse to
ignore it.
Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231011053711.93427-13-bgray@linux.ibm.com
Sparse reports several endianness warnings on variables and functions
that are consistently treated as big endian. There are no
multi-endianness shenanigans going on here so fix these low hanging
fruit up in one patch.
All changes are just type annotations; no endianness switching
operations are introduced by this patch.
Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231011053711.93427-7-bgray@linux.ibm.com
Sparse reports several uses of 0 for pointer arguments and comparisons.
Replace with NULL to better convey the intent. Remove entirely if a
comparison to follow the kernel style of implicit boolean conversions.
Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231011053711.93427-5-bgray@linux.ibm.com
fixmap.h need pgtable.h for [un]map_kernel_page()
pgtable.h need fixmap.h for FIXADDR_TOP.
Untangle the two files by moving FIXADDR_TOP into pgtable.h
Also move VIRT_IMMR_BASE to fixmap.h to avoid fixmap.h in mmu.h
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/5eba12392a018be28ad0a02ed844767b132589e7.1695659959.git.christophe.leroy@csgroup.eu
Define operations for SED Opal to read/write keys
from POWER LPAR Platform KeyStore(PLPKS). This allows
non-volatile storage of SED Opal keys.
Signed-off-by: Greg Joyce <gjoyce@linux.vnet.ibm.com>
Reviewed-by: Jonathan Derrick <jonathan.derrick@linux.dev>
Link: https://lore.kernel.org/r/20231004201957.1451669-4-gjoyce@linux.vnet.ibm.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This commit comes at the tail end of a greater effort to remove the
empty elements at the end of the ctl_table arrays (sentinels) which
will reduce the overall build time size of the kernel and run time
memory bloat by ~64 bytes per sentinel (further information Link :
https://lore.kernel.org/all/ZO5Yx5JFogGi%2FcBo@bombadil.infradead.org/)
Remove sentinel from powersave_nap_ctl_table and nmi_wd_lpm_factor_ctl_table.
This removal is safe because register_sysctl implicitly uses ARRAY_SIZE()
in addition to checking for the sentinel.
Signed-off-by: Joel Granados <j.granados@samsung.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
In the plpar_hcall trace code, currently we use r0
to store the value of r4. But this value is not
used subsequently in the code. Hence remove this unused
save to r0 in plpar_hcall and plpar_hcall9
Suggested-by: Naveen N Rao <naveen@kernel.org>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230929172337.7906-2-atrajeev@linux.vnet.ibm.com
In powerpc pseries system, below behaviour is observed while
enabling tracing on hcall:
# cd /sys/kernel/debug/tracing/
# cat events/powerpc/hcall_exit/enable
0
# echo 1 > events/powerpc/hcall_exit/enable
# ls
-bash: fork: Bad address
Above is from power9 lpar with latest kernel. Past this, softlockup
is observed. Initially while attempting via perf_event_open to
use "PERF_TYPE_TRACEPOINT", kernel panic was observed.
perf config used:
================
memset(&pe[1],0,sizeof(struct perf_event_attr));
pe[1].type=PERF_TYPE_TRACEPOINT;
pe[1].size=96;
pe[1].config=0x26ULL; /* 38 raw_syscalls/sys_exit */
pe[1].sample_type=0; /* 0 */
pe[1].read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING|PERF_FORMAT_ID|PERF_FORMAT_GROUP|0x10ULL; /* 1f */
pe[1].inherit=1;
pe[1].precise_ip=0; /* arbitrary skid */
pe[1].wakeup_events=0;
pe[1].bp_type=HW_BREAKPOINT_EMPTY;
pe[1].config1=0x1ULL;
Kernel panic logs:
==================
Kernel attempted to read user page (8) - exploit attempt? (uid: 0)
BUG: Kernel NULL pointer dereference on read at 0x00000008
Faulting instruction address: 0xc0000000004c2814
Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
Modules linked in: nfnetlink bonding tls rfkill sunrpc dm_service_time dm_multipath pseries_rng xts vmx_crypto xfs libcrc32c sd_mod t10_pi crc64_rocksoft crc64 sg ibmvfc scsi_transport_fc ibmveth dm_mirror dm_region_hash dm_log dm_mod fuse
CPU: 0 PID: 1431 Comm: login Not tainted 6.4.0+ #1
Hardware name: IBM,8375-42A POWER9 (raw) 0x4e0202 0xf000005 of:IBM,FW950.30 (VL950_892) hv:phyp pSeries
NIP page_remove_rmap+0x44/0x320
LR wp_page_copy+0x384/0xec0
Call Trace:
0xc00000001416e400 (unreliable)
wp_page_copy+0x384/0xec0
__handle_mm_fault+0x9d4/0xfb0
handle_mm_fault+0xf0/0x350
___do_page_fault+0x48c/0xc90
hash__do_page_fault+0x30/0x70
do_hash_fault+0x1a4/0x330
data_access_common_virt+0x198/0x1f0
--- interrupt: 300 at 0x7fffae971abc
git bisect tracked this down to below commit:
'commit baa49d81a9 ("powerpc/pseries: hvcall stack frame overhead")'
This commit changed STACK_FRAME_OVERHEAD (112 ) to
STACK_FRAME_MIN_SIZE (32 ) since 32 bytes is the minimum size
for ELFv2 stack. With the latest kernel, when running on ELFv2,
STACK_FRAME_MIN_SIZE is used to allocate stack size.
During plpar_hcall_trace, first call is made to HCALL_INST_PRECALL
which saves the registers and allocates new stack frame. In the
plpar_hcall_trace code, STK_PARAM is accessed at two places.
1. To save r4: std r4,STK_PARAM(R4)(r1)
2. To access r4 back: ld r12,STK_PARAM(R4)(r1)
HCALL_INST_PRECALL precall allocates a new stack frame. So all
the stack parameter access after the precall, needs to be accessed
with +STACK_FRAME_MIN_SIZE. So the store instruction should be:
std r4,STACK_FRAME_MIN_SIZE+STK_PARAM(R4)(r1)
If the "std" is not updated with STACK_FRAME_MIN_SIZE, we will
end up with overwriting stack contents and cause corruption.
But instead of updating 'std', we can instead remove it since
HCALL_INST_PRECALL already saves it to the correct location.
similarly load instruction should be:
ld r12,STACK_FRAME_MIN_SIZE+STK_PARAM(R4)(r1)
Fix the load instruction to correctly access the stack parameter
with +STACK_FRAME_MIN_SIZE and remove the store of r4 since the
precall saves it correctly.
Cc: stable@vger.kernel.org # v6.2+
Fixes: baa49d81a9 ("powerpc/pseries: hvcall stack frame overhead")
Co-developed-by: Naveen N Rao <naveen@kernel.org>
Signed-off-by: Naveen N Rao <naveen@kernel.org>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230929172337.7906-1-atrajeev@linux.vnet.ibm.com
It used to be impossible to select CONFIG_CPM2 without selecting
CONFIG_FSL_SOC at the same time because CONFIG_CPM2 was dependent
on CONFIG_8260 and CONFIG_8260 was selecting CONFIG_FSL_SOC.
But after commit eb5aa21372 ("powerpc/82xx: Remove CONFIG_8260
and CONFIG_8272") CONFIG_CPM2 depends on CONFIG_PPC_82xx instead
but CONFIG_PPC_82xx doesn't directly selects CONFIG_FSL_SOC.
Fix it by forcing CONFIG_PPC_82xx to select CONFIG_FSL_SOC just
like already done by PPC_8xx, PPC_MPC512x, PPC_83xx, PPC_86xx.
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Fixes: eb5aa21372 ("powerpc/82xx: Remove CONFIG_8260 and CONFIG_8272")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/7ab513546148ebe33ddd4b0ea92c7bfd3cce3ad7.1694705016.git.christophe.leroy@csgroup.eu
for_each_node_by_name performs an of_node_get on each
iteration, so a break out of the loop requires an
of_node_put.
This was done using the Coccinelle semantic patch
iterators/for_each_child.cocci
Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230907095521.14053-4-Julia.Lawall@inria.fr
Here is the big set of tty and serial driver changes for 6.6-rc1.
Lots of cleanups in here this cycle, and some driver updates. Short
summary is:
- Jiri's continued work to make the tty code and apis be a bit more
sane with regards to modern kernel coding style and types
- cpm_uart driver updates
- n_gsm updates and fixes
- meson driver updates
- sc16is7xx driver updates
- 8250 driver updates for different hardware types
- qcom-geni driver fixes
- tegra serial driver change
- stm32 driver updates
- synclink_gt driver cleanups
- tty structure size reduction
All of these have been in linux-next this week with no reported issues.
The last bit of cleanups from Jiri and the tty structure size reduction
came in last week, a bit late but as they were just style changes and
size reductions, I figured they should get into this merge cycle so that
others can work on top of them with no merge conflicts.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZPH+jA8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ykKyACgldt6QeenTN+6dXIHS/eQHtTKZwMAn3arSeXI
QrUUnLFjOWyoX87tbMBQ
=LVw0
-----END PGP SIGNATURE-----
Merge tag 'tty-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial driver updates from Greg KH:
"Here is the big set of tty and serial driver changes for 6.6-rc1.
Lots of cleanups in here this cycle, and some driver updates. Short
summary is:
- Jiri's continued work to make the tty code and apis be a bit more
sane with regards to modern kernel coding style and types
- cpm_uart driver updates
- n_gsm updates and fixes
- meson driver updates
- sc16is7xx driver updates
- 8250 driver updates for different hardware types
- qcom-geni driver fixes
- tegra serial driver change
- stm32 driver updates
- synclink_gt driver cleanups
- tty structure size reduction
All of these have been in linux-next this week with no reported
issues. The last bit of cleanups from Jiri and the tty structure size
reduction came in last week, a bit late but as they were just style
changes and size reductions, I figured they should get into this merge
cycle so that others can work on top of them with no merge conflicts"
* tag 'tty-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (199 commits)
tty: shrink the size of struct tty_struct by 40 bytes
tty: n_tty: deduplicate copy code in n_tty_receive_buf_real_raw()
tty: n_tty: extract ECHO_OP processing to a separate function
tty: n_tty: unify counts to size_t
tty: n_tty: use u8 for chars and flags
tty: n_tty: simplify chars_in_buffer()
tty: n_tty: remove unsigned char casts from character constants
tty: n_tty: move newline handling to a separate function
tty: n_tty: move canon handling to a separate function
tty: n_tty: use MASK() for masking out size bits
tty: n_tty: make n_tty_data::num_overrun unsigned
tty: n_tty: use time_is_before_jiffies() in n_tty_receive_overrun()
tty: n_tty: use 'num' for writes' counts
tty: n_tty: use output character directly
tty: n_tty: make flow of n_tty_receive_buf_common() a bool
Revert "tty: serial: meson: Add a earlycon for the T7 SoC"
Documentation: devices.txt: Fix minors for ttyCPM*
Documentation: devices.txt: Remove ttySIOC*
Documentation: devices.txt: Remove ttyIOC*
serial: 8250_bcm7271: improve bcm7271 8250 port
...
- Add HOTPLUG_SMT support (/sys/devices/system/cpu/smt) and honour the
configured SMT state when hotplugging CPUs into the system.
- Combine final TLB flush and lazy TLB mm shootdown IPIs when using the Radix
MMU to avoid a broadcast TLBIE flush on exit.
- Drop the exclusion between ptrace/perf watchpoints, and drop the now unused
associated arch hooks.
- Add support for the "nohlt" command line option to disable CPU idle.
- Add support for -fpatchable-function-entry for ftrace, with GCC >= 13.1.
- Rework memory block size determination, and support 256MB size on systems
with GPUs that have hotpluggable memory.
- Various other small features and fixes.
Thanks to: Andrew Donnellan, Aneesh Kumar K.V, Arnd Bergmann, Athira Rajeev,
Benjamin Gray, Christophe Leroy, Frederic Barrat, Gautam Menghani, Geoff Levand,
Hari Bathini, Immad Mir, Jialin Zhang, Joel Stanley, Jordan Niethe, Justin
Stitt, Kajol Jain, Kees Cook, Krzysztof Kozlowski, Laurent Dufour, Liang He,
Linus Walleij, Mahesh Salgaonkar, Masahiro Yamada, Michal Suchanek, Nageswara
R Sastry, Nathan Chancellor, Nathan Lynch, Naveen N Rao, Nicholas Piggin, Nick
Desaulniers, Omar Sandoval, Randy Dunlap, Reza Arbab, Rob Herring, Russell
Currey, Sourabh Jain, Thomas Gleixner, Trevor Woerner, Uwe Kleine-König, Vaibhav
Jain, Xiongfeng Wang, Yuan Tan, Zhang Rui, Zheng Zengkai.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmTwgbwTHG1wZUBlbGxl
cm1hbi5pZC5hdQAKCRBR6+o8yOGlgFmpD/432vipeoqvkAYsyK0xi/Y3GcY0wcyd
WJApLXXadEbtKQrgXQ6sowWqalg5thYnQCRarg/tXKK/po3KfgwkPjGDpOL+cIdr
12QVN2XJm9VmJ1wYJxzk+yXx4F43AdmMdr94qWAGufbTHezwb4UpzVR1NxtFrOE/
X5TNsC2+2mdZY/ZaNHS5vsTIFv3EhQfqgjZPlIAdLn6CGc8xWT514Q/uHA8+ytM/
HL7Hqs33DoPSvgTa5TT/2E0d0k5nO3P5KObzAjpYlireTPaBi51mpKGewcrtm0o2
v3cBlbfx3C7pe9ZhKBK9BH8cjynfiqsVZ9/lCw/7eBNdm9tHuzG0jeS7Db9tCZXS
fM7G2R7SoIusPTqxlBmkU5DpYslwrHiVgCyy3ijxkoA/fakVwh/GgTcMsRt73IY6
n6DsUvWwuYHCIeIiHmHQJqCqCRtV+aMzU3AbbBHOjtdIanhlW16M686dEsgCirh7
akRVRD5VqKaqXs34PpkRL89Xv3wZRjl6XZ3hZFfCjSYXfpXDXhgSToIskpHYhKL8
gpY7WtG9YQP05Xz5HRCx6EluaZVeKe0lZi6fezX7Mi9AygJQO8FfXqP1mHBlEq40
ThWtvL9D89RV6lADqqFN20XepgvKNOyAXcE4szvsnIZYUSPmZQZSPxx+DHtROaLP
jX3ifxtxJp92pQ==
=5g7K
-----END PGP SIGNATURE-----
Merge tag 'powerpc-6.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
- Add HOTPLUG_SMT support (/sys/devices/system/cpu/smt) and honour the
configured SMT state when hotplugging CPUs into the system
- Combine final TLB flush and lazy TLB mm shootdown IPIs when using the
Radix MMU to avoid a broadcast TLBIE flush on exit
- Drop the exclusion between ptrace/perf watchpoints, and drop the now
unused associated arch hooks
- Add support for the "nohlt" command line option to disable CPU idle
- Add support for -fpatchable-function-entry for ftrace, with GCC >=
13.1
- Rework memory block size determination, and support 256MB size on
systems with GPUs that have hotpluggable memory
- Various other small features and fixes
Thanks to Andrew Donnellan, Aneesh Kumar K.V, Arnd Bergmann, Athira
Rajeev, Benjamin Gray, Christophe Leroy, Frederic Barrat, Gautam
Menghani, Geoff Levand, Hari Bathini, Immad Mir, Jialin Zhang, Joel
Stanley, Jordan Niethe, Justin Stitt, Kajol Jain, Kees Cook, Krzysztof
Kozlowski, Laurent Dufour, Liang He, Linus Walleij, Mahesh Salgaonkar,
Masahiro Yamada, Michal Suchanek, Nageswara R Sastry, Nathan Chancellor,
Nathan Lynch, Naveen N Rao, Nicholas Piggin, Nick Desaulniers, Omar
Sandoval, Randy Dunlap, Reza Arbab, Rob Herring, Russell Currey, Sourabh
Jain, Thomas Gleixner, Trevor Woerner, Uwe Kleine-König, Vaibhav Jain,
Xiongfeng Wang, Yuan Tan, Zhang Rui, and Zheng Zengkai.
* tag 'powerpc-6.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (135 commits)
macintosh/ams: linux/platform_device.h is needed
powerpc/xmon: Reapply "Relax frame size for clang"
powerpc/mm/book3s64: Use 256M as the upper limit with coherent device memory attached
powerpc/mm/book3s64: Fix build error with SPARSEMEM disabled
powerpc/iommu: Fix notifiers being shared by PCI and VIO buses
powerpc/mpc5xxx: Add missing fwnode_handle_put()
powerpc/config: Disable SLAB_DEBUG_ON in skiroot
powerpc/pseries: Remove unused hcall tracing instruction
powerpc/pseries: Fix hcall tracepoints with JUMP_LABEL=n
powerpc: dts: add missing space before {
powerpc/eeh: Use pci_dev_id() to simplify the code
powerpc/64s: Move CPU -mtune options into Kconfig
powerpc/powermac: Fix unused function warning
powerpc/pseries: Rework lppaca_shared_proc() to avoid DEBUG_PREEMPT
powerpc: Don't include lppaca.h in paca.h
powerpc/pseries: Move hcall_vphn() prototype into vphn.h
powerpc/pseries: Move VPHN constants into vphn.h
cxl: Drop unused detach_spa()
powerpc: Drop zalloc_maybe_bootmem()
powerpc/powernv: Use struct opal_prd_msg in more places
...
- Peter Xu has a series (mm/gup: Unify hugetlb, speed up thp") which
reduces the special-case code for handling hugetlb pages in GUP. It
also speeds up GUP handling of transparent hugepages.
- Peng Zhang provides some maple tree speedups ("Optimize the fast path
of mas_store()").
- Sergey Senozhatsky has improved te performance of zsmalloc during
compaction (zsmalloc: small compaction improvements").
- Domenico Cerasuolo has developed additional selftest code for zswap
("selftests: cgroup: add zswap test program").
- xu xin has doe some work on KSM's handling of zero pages. These
changes are mainly to enable the user to better understand the
effectiveness of KSM's treatment of zero pages ("ksm: support tracking
KSM-placed zero-pages").
- Jeff Xu has fixes the behaviour of memfd's
MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED sysctl ("mm/memfd: fix sysctl
MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED").
- David Howells has fixed an fscache optimization ("mm, netfs, fscache:
Stop read optimisation when folio removed from pagecache").
- Axel Rasmussen has given userfaultfd the ability to simulate memory
poisoning ("add UFFDIO_POISON to simulate memory poisoning with UFFD").
- Miaohe Lin has contributed some routine maintenance work on the
memory-failure code ("mm: memory-failure: remove unneeded PageHuge()
check").
- Peng Zhang has contributed some maintenance work on the maple tree
code ("Improve the validation for maple tree and some cleanup").
- Hugh Dickins has optimized the collapsing of shmem or file pages into
THPs ("mm: free retracted page table by RCU").
- Jiaqi Yan has a patch series which permits us to use the healthy
subpages within a hardware poisoned huge page for general purposes
("Improve hugetlbfs read on HWPOISON hugepages").
- Kemeng Shi has done some maintenance work on the pagetable-check code
("Remove unused parameters in page_table_check").
- More folioification work from Matthew Wilcox ("More filesystem folio
conversions for 6.6"), ("Followup folio conversions for zswap"). And
from ZhangPeng ("Convert several functions in page_io.c to use a
folio").
- page_ext cleanups from Kemeng Shi ("minor cleanups for page_ext").
- Baoquan He has converted some architectures to use the GENERIC_IOREMAP
ioremap()/iounmap() code ("mm: ioremap: Convert architectures to take
GENERIC_IOREMAP way").
- Anshuman Khandual has optimized arm64 tlb shootdown ("arm64: support
batched/deferred tlb shootdown during page reclamation/migration").
- Better maple tree lockdep checking from Liam Howlett ("More strict
maple tree lockdep"). Liam also developed some efficiency improvements
("Reduce preallocations for maple tree").
- Cleanup and optimization to the secondary IOMMU TLB invalidation, from
Alistair Popple ("Invalidate secondary IOMMU TLB on permission
upgrade").
- Ryan Roberts fixes some arm64 MM selftest issues ("selftests/mm fixes
for arm64").
- Kemeng Shi provides some maintenance work on the compaction code ("Two
minor cleanups for compaction").
- Some reduction in mmap_lock pressure from Matthew Wilcox ("Handle most
file-backed faults under the VMA lock").
- Aneesh Kumar contributes code to use the vmemmap optimization for DAX
on ppc64, under some circumstances ("Add support for DAX vmemmap
optimization for ppc64").
- page-ext cleanups from Kemeng Shi ("add page_ext_data to get client
data in page_ext"), ("minor cleanups to page_ext header").
- Some zswap cleanups from Johannes Weiner ("mm: zswap: three
cleanups").
- kmsan cleanups from ZhangPeng ("minor cleanups for kmsan").
- VMA handling cleanups from Kefeng Wang ("mm: convert to
vma_is_initial_heap/stack()").
- DAMON feature work from SeongJae Park ("mm/damon/sysfs-schemes:
implement DAMOS tried total bytes file"), ("Extend DAMOS filters for
address ranges and DAMON monitoring targets").
- Compaction work from Kemeng Shi ("Fixes and cleanups to compaction").
- Liam Howlett has improved the maple tree node replacement code
("maple_tree: Change replacement strategy").
- ZhangPeng has a general code cleanup - use the K() macro more widely
("cleanup with helper macro K()").
- Aneesh Kumar brings memmap-on-memory to ppc64 ("Add support for memmap
on memory feature on ppc64").
- pagealloc cleanups from Kemeng Shi ("Two minor cleanups for pcp list
in page_alloc"), ("Two minor cleanups for get pageblock migratetype").
- Vishal Moola introduces a memory descriptor for page table tracking,
"struct ptdesc" ("Split ptdesc from struct page").
- memfd selftest maintenance work from Aleksa Sarai ("memfd: cleanups
for vm.memfd_noexec").
- MM include file rationalization from Hugh Dickins ("arch: include
asm/cacheflush.h in asm/hugetlb.h").
- THP debug output fixes from Hugh Dickins ("mm,thp: fix sloppy text
output").
- kmemleak improvements from Xiaolei Wang ("mm/kmemleak: use
object_cache instead of kmemleak_initialized").
- More folio-related cleanups from Matthew Wilcox ("Remove _folio_dtor
and _folio_order").
- A VMA locking scalability improvement from Suren Baghdasaryan
("Per-VMA lock support for swap and userfaults").
- pagetable handling cleanups from Matthew Wilcox ("New page table range
API").
- A batch of swap/thp cleanups from David Hildenbrand ("mm/swap: stop
using page->private on tail pages for THP_SWAP + cleanups").
- Cleanups and speedups to the hugetlb fault handling from Matthew
Wilcox ("Change calling convention for ->huge_fault").
- Matthew Wilcox has also done some maintenance work on the MM subsystem
documentation ("Improve mm documentation").
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZO1JUQAKCRDdBJ7gKXxA
jrMwAP47r/fS8vAVT3zp/7fXmxaJYTK27CTAM881Gw1SDhFM/wEAv8o84mDenCg6
Nfio7afS1ncD+hPYT8947UnLxTgn+ww=
=Afws
-----END PGP SIGNATURE-----
Merge tag 'mm-stable-2023-08-28-18-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
- Some swap cleanups from Ma Wupeng ("fix WARN_ON in
add_to_avail_list")
- Peter Xu has a series (mm/gup: Unify hugetlb, speed up thp") which
reduces the special-case code for handling hugetlb pages in GUP. It
also speeds up GUP handling of transparent hugepages.
- Peng Zhang provides some maple tree speedups ("Optimize the fast path
of mas_store()").
- Sergey Senozhatsky has improved te performance of zsmalloc during
compaction (zsmalloc: small compaction improvements").
- Domenico Cerasuolo has developed additional selftest code for zswap
("selftests: cgroup: add zswap test program").
- xu xin has doe some work on KSM's handling of zero pages. These
changes are mainly to enable the user to better understand the
effectiveness of KSM's treatment of zero pages ("ksm: support
tracking KSM-placed zero-pages").
- Jeff Xu has fixes the behaviour of memfd's
MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED sysctl ("mm/memfd: fix sysctl
MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED").
- David Howells has fixed an fscache optimization ("mm, netfs, fscache:
Stop read optimisation when folio removed from pagecache").
- Axel Rasmussen has given userfaultfd the ability to simulate memory
poisoning ("add UFFDIO_POISON to simulate memory poisoning with
UFFD").
- Miaohe Lin has contributed some routine maintenance work on the
memory-failure code ("mm: memory-failure: remove unneeded PageHuge()
check").
- Peng Zhang has contributed some maintenance work on the maple tree
code ("Improve the validation for maple tree and some cleanup").
- Hugh Dickins has optimized the collapsing of shmem or file pages into
THPs ("mm: free retracted page table by RCU").
- Jiaqi Yan has a patch series which permits us to use the healthy
subpages within a hardware poisoned huge page for general purposes
("Improve hugetlbfs read on HWPOISON hugepages").
- Kemeng Shi has done some maintenance work on the pagetable-check code
("Remove unused parameters in page_table_check").
- More folioification work from Matthew Wilcox ("More filesystem folio
conversions for 6.6"), ("Followup folio conversions for zswap"). And
from ZhangPeng ("Convert several functions in page_io.c to use a
folio").
- page_ext cleanups from Kemeng Shi ("minor cleanups for page_ext").
- Baoquan He has converted some architectures to use the
GENERIC_IOREMAP ioremap()/iounmap() code ("mm: ioremap: Convert
architectures to take GENERIC_IOREMAP way").
- Anshuman Khandual has optimized arm64 tlb shootdown ("arm64: support
batched/deferred tlb shootdown during page reclamation/migration").
- Better maple tree lockdep checking from Liam Howlett ("More strict
maple tree lockdep"). Liam also developed some efficiency
improvements ("Reduce preallocations for maple tree").
- Cleanup and optimization to the secondary IOMMU TLB invalidation,
from Alistair Popple ("Invalidate secondary IOMMU TLB on permission
upgrade").
- Ryan Roberts fixes some arm64 MM selftest issues ("selftests/mm fixes
for arm64").
- Kemeng Shi provides some maintenance work on the compaction code
("Two minor cleanups for compaction").
- Some reduction in mmap_lock pressure from Matthew Wilcox ("Handle
most file-backed faults under the VMA lock").
- Aneesh Kumar contributes code to use the vmemmap optimization for DAX
on ppc64, under some circumstances ("Add support for DAX vmemmap
optimization for ppc64").
- page-ext cleanups from Kemeng Shi ("add page_ext_data to get client
data in page_ext"), ("minor cleanups to page_ext header").
- Some zswap cleanups from Johannes Weiner ("mm: zswap: three
cleanups").
- kmsan cleanups from ZhangPeng ("minor cleanups for kmsan").
- VMA handling cleanups from Kefeng Wang ("mm: convert to
vma_is_initial_heap/stack()").
- DAMON feature work from SeongJae Park ("mm/damon/sysfs-schemes:
implement DAMOS tried total bytes file"), ("Extend DAMOS filters for
address ranges and DAMON monitoring targets").
- Compaction work from Kemeng Shi ("Fixes and cleanups to compaction").
- Liam Howlett has improved the maple tree node replacement code
("maple_tree: Change replacement strategy").
- ZhangPeng has a general code cleanup - use the K() macro more widely
("cleanup with helper macro K()").
- Aneesh Kumar brings memmap-on-memory to ppc64 ("Add support for
memmap on memory feature on ppc64").
- pagealloc cleanups from Kemeng Shi ("Two minor cleanups for pcp list
in page_alloc"), ("Two minor cleanups for get pageblock
migratetype").
- Vishal Moola introduces a memory descriptor for page table tracking,
"struct ptdesc" ("Split ptdesc from struct page").
- memfd selftest maintenance work from Aleksa Sarai ("memfd: cleanups
for vm.memfd_noexec").
- MM include file rationalization from Hugh Dickins ("arch: include
asm/cacheflush.h in asm/hugetlb.h").
- THP debug output fixes from Hugh Dickins ("mm,thp: fix sloppy text
output").
- kmemleak improvements from Xiaolei Wang ("mm/kmemleak: use
object_cache instead of kmemleak_initialized").
- More folio-related cleanups from Matthew Wilcox ("Remove _folio_dtor
and _folio_order").
- A VMA locking scalability improvement from Suren Baghdasaryan
("Per-VMA lock support for swap and userfaults").
- pagetable handling cleanups from Matthew Wilcox ("New page table
range API").
- A batch of swap/thp cleanups from David Hildenbrand ("mm/swap: stop
using page->private on tail pages for THP_SWAP + cleanups").
- Cleanups and speedups to the hugetlb fault handling from Matthew
Wilcox ("Change calling convention for ->huge_fault").
- Matthew Wilcox has also done some maintenance work on the MM
subsystem documentation ("Improve mm documentation").
* tag 'mm-stable-2023-08-28-18-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (489 commits)
maple_tree: shrink struct maple_tree
maple_tree: clean up mas_wr_append()
secretmem: convert page_is_secretmem() to folio_is_secretmem()
nios2: fix flush_dcache_page() for usage from irq context
hugetlb: add documentation for vma_kernel_pagesize()
mm: add orphaned kernel-doc to the rst files.
mm: fix clean_record_shared_mapping_range kernel-doc
mm: fix get_mctgt_type() kernel-doc
mm: fix kernel-doc warning from tlb_flush_rmaps()
mm: remove enum page_entry_size
mm: allow ->huge_fault() to be called without the mmap_lock held
mm: move PMD_ORDER to pgtable.h
mm: remove checks for pte_index
memcg: remove duplication detection for mem_cgroup_uncharge_swap
mm/huge_memory: work on folio->swap instead of page->private when splitting folio
mm/swap: inline folio_set_swap_entry() and folio_swap_entry()
mm/swap: use dedicated entry for swap in folio
mm/swap: stop using page->private on tail pages for THP_SWAP
selftests/mm: fix WARNING comparing pointer to 0
selftests: cgroup: fix test_kmem_memcg_deletion kernel mem check
...
Core
----
- Increase size limits for to-be-sent skb frag allocations. This
allows tun, tap devices and packet sockets to better cope with large
writes operations.
- Store netdevs in an xarray, to simplify iterating over netdevs.
- Refactor nexthop selection for multipath routes.
- Improve sched class lifetime handling.
- Add backup nexthop ID support for bridge.
- Implement drop reasons support in openvswitch.
- Several data races annotations and fixes.
- Constify the sk parameter of routing functions.
- Prepend kernel version to netconsole message.
Protocols
---------
- Implement support for TCP probing the peer being under memory
pressure.
- Remove hard coded limitation on IPv6 specific info placement
inside the socket struct.
- Get rid of sysctl_tcp_adv_win_scale and use an auto-estimated
per socket scaling factor.
- Scaling-up the IPv6 expired route GC via a separated list of
expiring routes.
- In-kernel support for the TLS alert protocol.
- Better support for UDP reuseport with connected sockets.
- Add NEXT-C-SID support for SRv6 End.X behavior, reducing the SR
header size.
- Get rid of additional ancillary per MPTCP connection struct socket.
- Implement support for BPF-based MPTCP packet schedulers.
- Format MPTCP subtests selftests results in TAP.
- Several new SMC 2.1 features including unique experimental options,
max connections per lgr negotiation, max links per lgr negotiation.
BPF
---
- Multi-buffer support in AF_XDP.
- Add multi uprobe BPF links for attaching multiple uprobes
and usdt probes, which is significantly faster and saves extra fds.
- Implement an fd-based tc BPF attach API (TCX) and BPF link support on
top of it.
- Add SO_REUSEPORT support for TC bpf_sk_assign.
- Support new instructions from cpu v4 to simplify the generated code and
feature completeness, for x86, arm64, riscv64.
- Support defragmenting IPv(4|6) packets in BPF.
- Teach verifier actual bounds of bpf_get_smp_processor_id()
and fix perf+libbpf issue related to custom section handling.
- Introduce bpf map element count and enable it for all program types.
- Add a BPF hook in sys_socket() to change the protocol ID
from IPPROTO_TCP to IPPROTO_MPTCP to cover migration for legacy.
- Introduce bpf_me_mcache_free_rcu() and fix OOM under stress.
- Add uprobe support for the bpf_get_func_ip helper.
- Check skb ownership against full socket.
- Support for up to 12 arguments in BPF trampoline.
- Extend link_info for kprobe_multi and perf_event links.
Netfilter
---------
- Speed-up process exit by aborting ruleset validation if a
fatal signal is pending.
- Allow NLA_POLICY_MASK to be used with BE16/BE32 types.
Driver API
----------
- Page pool optimizations, to improve data locality and cache usage.
- Introduce ndo_hwtstamp_get() and ndo_hwtstamp_set() to avoid the need
for raw ioctl() handling in drivers.
- Simplify genetlink dump operations (doit/dumpit) providing them
the common information already populated in struct genl_info.
- Extend and use the yaml devlink specs to [re]generate the split ops.
- Introduce devlink selective dumps, to allow SF filtering SF based on
handle and other attributes.
- Add yaml netlink spec for netlink-raw families, allow route, link and
address related queries via the ynl tool.
- Remove phylink legacy mode support.
- Support offload LED blinking to phy.
- Add devlink port function attributes for IPsec.
New hardware / drivers
----------------------
- Ethernet:
- Broadcom ASP 2.0 (72165) ethernet controller
- MediaTek MT7988 SoC
- Texas Instruments AM654 SoC
- Texas Instruments IEP driver
- Atheros qca8081 phy
- Marvell 88Q2110 phy
- NXP TJA1120 phy
- WiFi:
- MediaTek mt7981 support
- Can:
- Kvaser SmartFusion2 PCI Express devices
- Allwinner T113 controllers
- Texas Instruments tcan4552/4553 chips
- Bluetooth:
- Intel Gale Peak
- Qualcomm WCN3988 and WCN7850
- NXP AW693 and IW624
- Mediatek MT2925
Drivers
-------
- Ethernet NICs:
- nVidia/Mellanox:
- mlx5:
- support UDP encapsulation in packet offload mode
- IPsec packet offload support in eswitch mode
- improve aRFS observability by adding new set of counters
- extends MACsec offload support to cover RoCE traffic
- dynamic completion EQs
- mlx4:
- convert to use auxiliary bus instead of custom interface logic
- Intel
- ice:
- implement switchdev bridge offload, even for LAG interfaces
- implement SRIOV support for LAG interfaces
- igc:
- add support for multiple in-flight TX timestamps
- Broadcom:
- bnxt:
- use the unified RX page pool buffers for XDP and non-XDP
- use the NAPI skb allocation cache
- OcteonTX2:
- support Round Robin scheduling HTB offload
- TC flower offload support for SPI field
- Freescale:
- add XDP_TX feature support
- AMD:
- ionic: add support for PCI FLR event
- sfc:
- basic conntrack offload
- introduce eth, ipv4 and ipv6 pedit offloads
- ST Microelectronics:
- stmmac: maximze PTP timestamping resolution
- Virtual NICs:
- Microsoft vNIC:
- batch ringing RX queue doorbell on receiving packets
- add page pool for RX buffers
- Virtio vNIC:
- add per queue interrupt coalescing support
- Google vNIC:
- add queue-page-list mode support
- Ethernet high-speed switches:
- nVidia/Mellanox (mlxsw):
- add port range matching tc-flower offload
- permit enslavement to netdevices with uppers
- Ethernet embedded switches:
- Marvell (mv88e6xxx):
- convert to phylink_pcs
- Renesas:
- r8A779fx: add speed change support
- rzn1: enables vlan support
- Ethernet PHYs:
- convert mv88e6xxx to phylink_pcs
- WiFi:
- Qualcomm Wi-Fi 7 (ath12k):
- extremely High Throughput (EHT) PHY support
- RealTek (rtl8xxxu):
- enable AP mode for: RTL8192FU, RTL8710BU (RTL8188GU),
RTL8192EU and RTL8723BU
- RealTek (rtw89):
- Introduce Time Averaged SAR (TAS) support
- Connector:
- support for event filtering
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmTt1ZoSHHBhYmVuaUBy
ZWRoYXQuY29tAAoJECkkeY3MjxOkgFUP/REFaYWdWUvAzmWeezyx9dqgZMfSOjWq
9QvySiA94OAOcjIYkb7wfzQ5BBAZqaBQ/f8XqWwS1EDDDEBs8sP1cxmABKwW7Hsr
qFRu2sOqLzKBk223d0jIgEocfQaFpGbF71gXoTlDivBjBi5UxWm9bF0XnbYWcKgO
/QEvzNosi9uNdi85Fzmv62J6YzAdidEpwGsM7X2CfejwNRmStxAEg/NwvRR0Hyiq
OJCo97omEgTRaUle8nc64PDx33u4h5kQ1BkaeHEv0rbE3hftFC2YPKn/InmqSFGz
6ew2xnrGPR37LCuAiCcIIv6yR7K0eu0iYJ7jXwZxBDqxGavEPuwWGBoCP6qFiitH
ZLWhIrAUrdmSbySkTOCONhJ475qFAuQoYHYpZnX/bJZUHlSsb/9lwDJYJQGpVfd1
/daqJVSb7lhaifmNO1iNd/ibCIXq9zapwtkRwA897M8GkZBTsnVvazFld1Em+Se3
Bx6DSDUVBqVQ9fpZG2IAGD6odDwOzC1lF2IoceFvK9Ff6oE0psI+A0qNLMkHxZbW
Qlo7LsNe53hpoCC+yHTfXX7e/X8eNt0EnCGOQJDusZ0Nr3K7H4LKFA0i8UBUK05n
4lKnnaSQW7GQgdofLWt103OMDR9GoDxpFsm7b1X9+AEk6Fz6tq50wWYeMZETUKYP
DCW8VGFOZjZM
=9CsR
-----END PGP SIGNATURE-----
Merge tag 'net-next-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Paolo Abeni:
"Core:
- Increase size limits for to-be-sent skb frag allocations. This
allows tun, tap devices and packet sockets to better cope with
large writes operations
- Store netdevs in an xarray, to simplify iterating over netdevs
- Refactor nexthop selection for multipath routes
- Improve sched class lifetime handling
- Add backup nexthop ID support for bridge
- Implement drop reasons support in openvswitch
- Several data races annotations and fixes
- Constify the sk parameter of routing functions
- Prepend kernel version to netconsole message
Protocols:
- Implement support for TCP probing the peer being under memory
pressure
- Remove hard coded limitation on IPv6 specific info placement inside
the socket struct
- Get rid of sysctl_tcp_adv_win_scale and use an auto-estimated per
socket scaling factor
- Scaling-up the IPv6 expired route GC via a separated list of
expiring routes
- In-kernel support for the TLS alert protocol
- Better support for UDP reuseport with connected sockets
- Add NEXT-C-SID support for SRv6 End.X behavior, reducing the SR
header size
- Get rid of additional ancillary per MPTCP connection struct socket
- Implement support for BPF-based MPTCP packet schedulers
- Format MPTCP subtests selftests results in TAP
- Several new SMC 2.1 features including unique experimental options,
max connections per lgr negotiation, max links per lgr negotiation
BPF:
- Multi-buffer support in AF_XDP
- Add multi uprobe BPF links for attaching multiple uprobes and usdt
probes, which is significantly faster and saves extra fds
- Implement an fd-based tc BPF attach API (TCX) and BPF link support
on top of it
- Add SO_REUSEPORT support for TC bpf_sk_assign
- Support new instructions from cpu v4 to simplify the generated code
and feature completeness, for x86, arm64, riscv64
- Support defragmenting IPv(4|6) packets in BPF
- Teach verifier actual bounds of bpf_get_smp_processor_id() and fix
perf+libbpf issue related to custom section handling
- Introduce bpf map element count and enable it for all program types
- Add a BPF hook in sys_socket() to change the protocol ID from
IPPROTO_TCP to IPPROTO_MPTCP to cover migration for legacy
- Introduce bpf_me_mcache_free_rcu() and fix OOM under stress
- Add uprobe support for the bpf_get_func_ip helper
- Check skb ownership against full socket
- Support for up to 12 arguments in BPF trampoline
- Extend link_info for kprobe_multi and perf_event links
Netfilter:
- Speed-up process exit by aborting ruleset validation if a fatal
signal is pending
- Allow NLA_POLICY_MASK to be used with BE16/BE32 types
Driver API:
- Page pool optimizations, to improve data locality and cache usage
- Introduce ndo_hwtstamp_get() and ndo_hwtstamp_set() to avoid the
need for raw ioctl() handling in drivers
- Simplify genetlink dump operations (doit/dumpit) providing them the
common information already populated in struct genl_info
- Extend and use the yaml devlink specs to [re]generate the split ops
- Introduce devlink selective dumps, to allow SF filtering SF based
on handle and other attributes
- Add yaml netlink spec for netlink-raw families, allow route, link
and address related queries via the ynl tool
- Remove phylink legacy mode support
- Support offload LED blinking to phy
- Add devlink port function attributes for IPsec
New hardware / drivers:
- Ethernet:
- Broadcom ASP 2.0 (72165) ethernet controller
- MediaTek MT7988 SoC
- Texas Instruments AM654 SoC
- Texas Instruments IEP driver
- Atheros qca8081 phy
- Marvell 88Q2110 phy
- NXP TJA1120 phy
- WiFi:
- MediaTek mt7981 support
- Can:
- Kvaser SmartFusion2 PCI Express devices
- Allwinner T113 controllers
- Texas Instruments tcan4552/4553 chips
- Bluetooth:
- Intel Gale Peak
- Qualcomm WCN3988 and WCN7850
- NXP AW693 and IW624
- Mediatek MT2925
Drivers:
- Ethernet NICs:
- nVidia/Mellanox:
- mlx5:
- support UDP encapsulation in packet offload mode
- IPsec packet offload support in eswitch mode
- improve aRFS observability by adding new set of counters
- extends MACsec offload support to cover RoCE traffic
- dynamic completion EQs
- mlx4:
- convert to use auxiliary bus instead of custom interface
logic
- Intel
- ice:
- implement switchdev bridge offload, even for LAG
interfaces
- implement SRIOV support for LAG interfaces
- igc:
- add support for multiple in-flight TX timestamps
- Broadcom:
- bnxt:
- use the unified RX page pool buffers for XDP and non-XDP
- use the NAPI skb allocation cache
- OcteonTX2:
- support Round Robin scheduling HTB offload
- TC flower offload support for SPI field
- Freescale:
- add XDP_TX feature support
- AMD:
- ionic: add support for PCI FLR event
- sfc:
- basic conntrack offload
- introduce eth, ipv4 and ipv6 pedit offloads
- ST Microelectronics:
- stmmac: maximze PTP timestamping resolution
- Virtual NICs:
- Microsoft vNIC:
- batch ringing RX queue doorbell on receiving packets
- add page pool for RX buffers
- Virtio vNIC:
- add per queue interrupt coalescing support
- Google vNIC:
- add queue-page-list mode support
- Ethernet high-speed switches:
- nVidia/Mellanox (mlxsw):
- add port range matching tc-flower offload
- permit enslavement to netdevices with uppers
- Ethernet embedded switches:
- Marvell (mv88e6xxx):
- convert to phylink_pcs
- Renesas:
- r8A779fx: add speed change support
- rzn1: enables vlan support
- Ethernet PHYs:
- convert mv88e6xxx to phylink_pcs
- WiFi:
- Qualcomm Wi-Fi 7 (ath12k):
- extremely High Throughput (EHT) PHY support
- RealTek (rtl8xxxu):
- enable AP mode for: RTL8192FU, RTL8710BU (RTL8188GU),
RTL8192EU and RTL8723BU
- RealTek (rtw89):
- Introduce Time Averaged SAR (TAS) support
- Connector:
- support for event filtering"
* tag 'net-next-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1806 commits)
net: ethernet: mtk_wed: minor change in wed_{tx,rx}info_show
net: ethernet: mtk_wed: add some more info in wed_txinfo_show handler
net: stmmac: clarify difference between "interface" and "phy_interface"
r8152: add vendor/device ID pair for D-Link DUB-E250
devlink: move devlink_notify_register/unregister() to dev.c
devlink: move small_ops definition into netlink.c
devlink: move tracepoint definitions into core.c
devlink: push linecard related code into separate file
devlink: push rate related code into separate file
devlink: push trap related code into separate file
devlink: use tracepoint_enabled() helper
devlink: push region related code into separate file
devlink: push param related code into separate file
devlink: push resource related code into separate file
devlink: push dpipe related code into separate file
devlink: move and rename devlink_dpipe_send_and_alloc_skb() helper
devlink: push shared buffer related code into separate file
devlink: push port related code into separate file
devlink: push object register/unregister notifications into separate helpers
inet: fix IP_TRANSPARENT error handling
...
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZOXTKAAKCRCRxhvAZXjc
oifJAQCzi/p+AdQu8LA/0XvR7fTwaq64ZDCibU4BISuLGT2kEgEAuGbuoFZa0rs2
XYD/s4+gi64p9Z01MmXm2XO1pu3GPg0=
=eJz5
-----END PGP SIGNATURE-----
Merge tag 'v6.6-vfs.ctime' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs timestamp updates from Christian Brauner:
"This adds VFS support for multi-grain timestamps and converts tmpfs,
xfs, ext4, and btrfs to use them. This carries acks from all relevant
filesystems.
The VFS always uses coarse-grained timestamps when updating the ctime
and mtime after a change. This has the benefit of allowing filesystems
to optimize away a lot of metadata updates, down to around 1 per
jiffy, even when a file is under heavy writes.
Unfortunately, this has always been an issue when we're exporting via
NFSv3, which relies on timestamps to validate caches. A lot of changes
can happen in a jiffy, so timestamps aren't sufficient to help the
client decide to invalidate the cache.
Even with NFSv4, a lot of exported filesystems don't properly support
a change attribute and are subject to the same problems with timestamp
granularity. Other applications have similar issues with timestamps
(e.g., backup applications).
If we were to always use fine-grained timestamps, that would improve
the situation, but that becomes rather expensive, as the underlying
filesystem would have to log a lot more metadata updates.
This introduces fine-grained timestamps that are used when they are
actively queried.
This uses the 31st bit of the ctime tv_nsec field to indicate that
something has queried the inode for the mtime or ctime. When this flag
is set, on the next mtime or ctime update, the kernel will fetch a
fine-grained timestamp instead of the usual coarse-grained one.
As POSIX generally mandates that when the mtime changes, the ctime
must also change the kernel always stores normalized ctime values, so
only the first 30 bits of the tv_nsec field are ever used.
Filesytems can opt into this behavior by setting the FS_MGTIME flag in
the fstype. Filesystems that don't set this flag will continue to use
coarse-grained timestamps.
Various preparatory changes, fixes and cleanups are included:
- Fixup all relevant places where POSIX requires updating ctime
together with mtime. This is a wide-range of places and all
maintainers provided necessary Acks.
- Add new accessors for inode->i_ctime directly and change all
callers to rely on them. Plain accesses to inode->i_ctime are now
gone and it is accordingly rename to inode->__i_ctime and commented
as requiring accessors.
- Extend generic_fillattr() to pass in a request mask mirroring in a
sense the statx() uapi. This allows callers to pass in a request
mask to only get a subset of attributes filled in.
- Rework timestamp updates so it's possible to drop the @now
parameter the update_time() inode operation and associated helpers.
- Add inode_update_timestamps() and convert all filesystems to it
removing a bunch of open-coding"
* tag 'v6.6-vfs.ctime' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (107 commits)
btrfs: convert to multigrain timestamps
ext4: switch to multigrain timestamps
xfs: switch to multigrain timestamps
tmpfs: add support for multigrain timestamps
fs: add infrastructure for multigrain timestamps
fs: drop the timespec64 argument from update_time
xfs: have xfs_vn_update_time gets its own timestamp
fat: make fat_update_time get its own timestamp
fat: remove i_version handling from fat_update_time
ubifs: have ubifs_update_time use inode_update_timestamps
btrfs: have it use inode_update_timestamps
fs: drop the timespec64 arg from generic_update_time
fs: pass the request_mask to generic_fillattr
fs: remove silly warning from current_time
gfs2: fix timestamp handling on quota inodes
fs: rename i_ctime field to __i_ctime
selinux: convert to ctime accessor functions
security: convert to ctime accessor functions
apparmor: convert to ctime accessor functions
sunrpc: convert to ctime accessor functions
...
When JUMP_LABEL=n, the tracepoint refcount test in the pre-call stores
the refcount value to the stack, so the same value can be used for the
post-call (presumably to avoid racing with the value concurrently
changing).
On little-endian (ELFv2) that might have just worked by luck, because
32(r1) is STK_PARAM(R3) there and so the value save gets clobbered by
the tracing code when it's non-zero, but fortunately r3 is the hcall
number and 0 is an invalid hcall number so it should get clobbered by
another non-zero value. In any case, commit cc1adb5f32
("powerpc/pseries: Use jump labels for hcall tracepoints") removed the
code that actually used the value stored, so now it's just dead code.
It's fragile to be storing to the stack like this, and confusing. Better
remove it.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230509091600.70994-2-npiggin@gmail.com
With JUMP_LABEL=n, hcall_tracepoint_refcount's address is being tested
instead of its value. This results in the tracing slowpath always being
taken unnecessarily.
Fixes: 9a10ccb29c ("powerpc/pseries: move hcall_tracepoint_refcount out of .toc")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230509091600.70994-1-npiggin@gmail.com
PCI core API pci_dev_id() can be used to get the BDF number for a pci
device. We don't need to compose it mannually. Use pci_dev_id() to
simplify the code a little bit.
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230815023303.3515503-1-zhangjialin11@huawei.com
Currently the -mtune options are set in the Makefile, depending on what
the compiler supports.
One downside of doing it that way is that the chosen -mtune option is
not recorded in the .config.
Another downside is that if there's ever a need to do more complicated
logic to calculate the correct option, that gets messy in the Makefile.
So move the determination of which -mtune option to use into Kconfig
logic.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230329234308.2215833-1-mpe@ellerman.id.au
Clang reports:
arch/powerpc/platforms/powermac/feature.c:137:19: error: unused function 'simple_feature_tweak'
It's only used inside the #ifndef CONFIG_PPC64 block, so move it in
there to fix the warning. While at it drop the inline, the compiler will
decide whether it should be inlined or not.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202308181501.AR5HMDWC-lkp@intel.com/
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230821140949.491881-1-mpe@ellerman.id.au
lppaca_shared_proc() takes a pointer to the lppaca which is typically
accessed through get_lppaca(). With DEBUG_PREEMPT enabled, this leads
to checking if preemption is enabled, for example:
BUG: using smp_processor_id() in preemptible [00000000] code: grep/10693
caller is lparcfg_data+0x408/0x19a0
CPU: 4 PID: 10693 Comm: grep Not tainted 6.5.0-rc3 #2
Call Trace:
dump_stack_lvl+0x154/0x200 (unreliable)
check_preemption_disabled+0x214/0x220
lparcfg_data+0x408/0x19a0
...
This isn't actually a problem however, as it does not matter which
lppaca is accessed, the shared proc state will be the same.
vcpudispatch_stats_procfs_init() already works around this by disabling
preemption, but the lparcfg code does not, erroring any time
/proc/powerpc/lparcfg is accessed with DEBUG_PREEMPT enabled.
Instead of disabling preemption on the caller side, rework
lppaca_shared_proc() to not take a pointer and instead directly access
the lppaca, bypassing any potential preemption checks.
Fixes: f13c13a005 ("powerpc: Stop using non-architected shared_proc field in lppaca")
Signed-off-by: Russell Currey <ruscur@russell.cc>
[mpe: Rework to avoid needing a definition in paca.h and lppaca.h]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230823055317.751786-4-mpe@ellerman.id.au
Use the newly added struct opal_prd_msg in some other functions that
operate on opal_prd messages, rather than using other types.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230821142820.497107-2-mpe@ellerman.id.au
As reported by Mahesh & Aneesh, opal_prd_msg_notifier() triggers a
FORTIFY_SOURCE warning:
memcpy: detected field-spanning write (size 32) of single field "&item->msg" at arch/powerpc/platforms/powernv/opal-prd.c:355 (size 4)
WARNING: CPU: 9 PID: 660 at arch/powerpc/platforms/powernv/opal-prd.c:355 opal_prd_msg_notifier+0x174/0x188 [opal_prd]
NIP opal_prd_msg_notifier+0x174/0x188 [opal_prd]
LR opal_prd_msg_notifier+0x170/0x188 [opal_prd]
Call Trace:
opal_prd_msg_notifier+0x170/0x188 [opal_prd] (unreliable)
notifier_call_chain+0xc0/0x1b0
atomic_notifier_call_chain+0x2c/0x40
opal_message_notify+0xf4/0x2c0
This happens because the copy is targeting item->msg, which is only 4
bytes in size, even though the enclosing item was allocated with extra
space following the msg.
To fix the warning define struct opal_prd_msg with a union of the header
and a flex array, and have the memcpy target the flex array.
Reported-by: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Reported-by: Mahesh Salgaonkar <mahesh@linux.ibm.com>
Tested-by: Mahesh Salgaonkar <mahesh@linux.ibm.com>
Reviewed-by: Mahesh Salgaonkar <mahesh@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230821142820.497107-1-mpe@ellerman.id.au
The debugfs_create_dir returns ERR_PTR incase of an error and the
correct way of checking it by using the IS_ERR inline function, and
not the simple null comparision. This patch fixes this.
Suggested-by: Ivan Orlov <ivan.orlov0322@gmail.com>
Signed-off-by: Immad Mir <mirimmad17@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/CY5PR12MB64553EE96EBB3927311DB598C6459@CY5PR12MB6455.namprd12.prod.outlook.com
Radix vmemmap mapping can map things correctly at the PMD level or PTE
level based on different device boundary checks. Hence we skip the
restrictions w.r.t vmemmap size to be multiple of PMD_SIZE. This also
makes the feature widely useful because to use PMD_SIZE vmemmap area we
require a memory block size of 2GiB
We can also use MHP_RESERVE_PAGES_MEMMAP_ON_MEMORY to that the feature can
work with a memory block size of 256MB. Using altmap.reserve feature to
align things correctly at pageblock granularity. We can end up losing
some pages in memory with this. For ex: with a 256MiB memory block size,
we require 4 pages to map vmemmap pages, In order to align things
correctly we end up adding a reserve of 28 pages. ie, for every 4096
pages 28 pages get reserved.
Link: https://lkml.kernel.org/r/20230808091501.287660-6-aneesh.kumar@linux.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
This is enabled only with radix translation and 1G hugepage size. This
will be used with devdax device memory with a namespace alignment of 1G.
Anon transparent hugepage is not supported even though we do have helpers
checking pud_trans_huge(). We should never find that return true. The
only expected pte bit combination is _PAGE_PTE | _PAGE_DEVMAP.
Some of the helpers are never expected to get called on hash translation
and hence is marked to call BUG() in such a case.
Link: https://lkml.kernel.org/r/20230724190759.483013-10-aneesh.kumar@linux.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Joao Martins <joao.m.martins@oracle.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Parse the device tree in early init to find the memory block size to be
used by the kernel. Consolidate the memory block size device tree parsing
to one helper and use that on both powernv and pseries. We still want to
use machine-specific callback because on all machine types other than
powernv and pseries we continue to return MIN_MEMORY_BLOCK_SIZE.
pseries_memory_block_size used to look for the second memory
block (memory@x) to determine the memory_block_size value. This patch
changed that to look at all memory blocks and make sure we can map them all
correctly using the computed memory block size value.
Add workaround to force 256MB memory block size if device driver managed
memory such as GPU memory is present. This helps to add GPU memory
that is not aligned to 1G.
Co-developed-by: Reza Arbab <arbab@linux.ibm.com>
Signed-off-by: Reza Arbab <arbab@linux.ibm.com>
Signed-off-by: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230801044447.11275-1-aneesh.kumar@linux.ibm.com
A W=1 build of ppc40x_defconfig throws the followings errors:
CC arch/powerpc/platforms/4xx/uic.o
arch/powerpc/platforms/4xx/uic.c:274:13: warning: no previous prototype for 'uic_init_tree' [-Wmissing-prototypes]
274 | void __init uic_init_tree(void)
| ^~~~~~~~~~~~~
arch/powerpc/platforms/4xx/uic.c:319:14: warning: no previous prototype for 'uic_get_irq' [-Wmissing-prototypes]
319 | unsigned int uic_get_irq(void)
| ^~~~~~~~~~~
CC arch/powerpc/platforms/4xx/machine_check.o
CC arch/powerpc/platforms/4xx/soc.o
arch/powerpc/platforms/4xx/soc.c:193:6: warning: no previous prototype for 'ppc4xx_reset_system' [-Wmissing-prototypes]
193 | void ppc4xx_reset_system(char *cmd)
| ^~~~~~~~~~~~~~~~~~~
Add missing includes to get the missing prototypes.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/c8253017e355638132737ff47936e290df8738d1.1692282432.git.christophe.leroy@csgroup.eu
ppc4xx_defconfig with W=1 results in:
CC arch/powerpc/platforms/44x/warp.o
arch/powerpc/platforms/44x/warp.c:369:5: error: no previous prototype for 'pika_dtm_register_shutdown' [-Werror=missing-prototypes]
369 | int pika_dtm_register_shutdown(void (*func)(void *arg), void *arg)
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
arch/powerpc/platforms/44x/warp.c:374:5: error: no previous prototype for 'pika_dtm_unregister_shutdown' [-Werror=missing-prototypes]
374 | int pika_dtm_unregister_shutdown(void (*func)(void *arg), void *arg)
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
The functions were added by commit 4ebef31fa6 ("[POWERPC] PIKA Warp:
Update platform code to support Rev B boards")
Those functions are not used localy and allthough their symbols are
exported they are not declared in any header file so they can't be used.
Remove them, then remove the associated list as it will now remain empty
hence becomes useless.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/830923f0e0375a14609204246d302c7476a8f948.1692279855.git.christophe.leroy@csgroup.eu
A W=1 build of mpc885_ads_defconfig throws the following error:
CC arch/powerpc/platforms/8xx/m8xx_setup.o
arch/powerpc/platforms/8xx/m8xx_setup.c:41:1: error: no previous prototype for 'init_internal_rtc' [-Werror=missing-prototypes]
41 | init_internal_rtc(void)
| ^~~~~~~~~~~~~~~~~
init_internal_rtc() was introduced by commit df34403dca ("[POWERPC]
8xx: Add mpc885ads support and common mpc8xx files") as a weak
function but has never been defined and/or used outside m8xx_setup.c
As it is called only once there, just fold it into its caller and
remove it.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/0aa1141e18a84d926e199093204b37ec993f0c87.1692275185.git.christophe.leroy@csgroup.eu
Replace printk(KERN_WARN with pr_warn(
Remove a couple of blank lines
Re-align multi-line code.
Replace asm/io.h by linux/io.h
mpc83xx.h doesn't need linux/device.h or asm/pci-bridge.h
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/2cb498f637e082a4af8032311fad3cae84d6aa5d.1692199324.git.christophe.leroy@csgroup.eu
PCI core API pci_dev_id() can be used to get the BDF number for a pci
device. We don't need to compose it mannually. Use pci_dev_id() to
simplify the code a little bit.
Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230804080435.191196-1-wangxiongfeng2@huawei.com
If device_register() returns error in ibmebus_bus_init(), name of kobject
which is allocated in dev_set_name() called in device_add() is leaked.
As comment of device_add() says, it should call put_device() to drop
the reference count that was set in device_initialize() when it fails,
so the name can be freed in kobject_cleanup().
Signed-off-by: ruanjinjie <ruanjinjie@huawei.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20221110011929.3709774-1-ruanjinjie@huawei.com
Since commit fb533d0c5a ("[POWERPC] 8xx: Infrastructure code cleanup.")
immr_map() is just returning mpc8xxx_immr pointer and immr_unmap()
do nothing.
We already have parts of code that use mpc8xxx_immr directly so get rid
of immr_map() and immr_unmap() by using mpc8xxx_immr directly. And avoid
going through local pointers that hide the pointed structure.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/633ed46f6015ff44d5599258647ea517f75d6a1d.1691474658.git.christophe.leroy@csgroup.eu
SIU_INT_IRQ1 is not used anywhere and __IO_BASE is defined in
asm/io.h
Remove m82xx_pci.h
Then the only thing remaining in mpc8260.h is MPC82XX_BCR_PLDP
Move MPC82XX_BCR_PLDP into asm/cpm2.h then remove mpc8260.h
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/afe23bf3624c389ff17e9789884c78c124b7b202.1691474658.git.christophe.leroy@csgroup.eu
PCI core API pci_dev_id() can be used to get the BDF number for a pci
device. We don't need to compose it mannually. Use pci_dev_id() to
simplify the code a little bit.
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Acked-by: Andrew Donnellan <ajd@linux.ibm.com>
Acked-by: Frederic Barrat <fbarrat@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230811102039.17257-1-zhengzengkai@huawei.com
Don't use kernel-doc "/**" comment format for non-kernel-doc comments.
This prevents a kernel-doc warning:
arch/powerpc/platforms/pseries/plpks.c:186: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
* Label is combination of label attributes + name.
Fixes: 2454a7af0f ("powerpc/pseries: define driver for Platform KeyStore")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: kernel test robot <lkp@intel.com>
Link: lore.kernel.org/r/202308040430.GxmPAnwZ-lkp@intel.com
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230810000740.23756-1-rdunlap@infradead.org