Commit Graph

3473 Commits

Author SHA1 Message Date
Coiby Xu
5eb3f60554 x86/crash: pass dm crypt keys to kdump kernel
1st kernel will build up the kernel command parameter dmcryptkeys as
similar to elfcorehdr to pass the memory address of the stored info of dm
crypt key to kdump kernel.

Link: https://lkml.kernel.org/r/20250502011246.99238-8-coxu@redhat.com
Signed-off-by: Coiby Xu <coxu@redhat.com>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: "Daniel P. Berrange" <berrange@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Jan Pazdziora <jpazdziora@redhat.com>
Cc: Liu Pingfan <kernelfans@gmail.com>
Cc: Milan Broz <gmazyland@gmail.com>
Cc: Ondrej Kozina <okozina@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-21 10:48:21 -07:00
Coiby Xu
9ebfa8dcae crash_dump: reuse saved dm crypt keys for CPU/memory hot-plugging
When there are CPU and memory hot un/plugs, the dm crypt keys may need to
be reloaded again depending on the solution for crash hotplug support. 
Currently, there are two solutions.  One is to utilizes udev to instruct
user space to reload the kdump kernel image and initrd, elfcorehdr and etc
again.  The other is to only update the elfcorehdr segment introduced in
commit 2472627561 ("crash: add generic infrastructure for crash hotplug
support").

For the 1st solution, the dm crypt keys need to be reloaded again.  The
user space can write true to /sys/kernel/config/crash_dm_crypt_key/reuse
so the stored keys can be re-used.

For the 2nd solution, the dm crypt keys don't need to be reloaded. 
Currently, only x86 supports the 2nd solution.  If the 2nd solution gets
extended to all arches, this patch can be dropped.

Link: https://lkml.kernel.org/r/20250502011246.99238-5-coxu@redhat.com
Signed-off-by: Coiby Xu <coxu@redhat.com>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: "Daniel P. Berrange" <berrange@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Jan Pazdziora <jpazdziora@redhat.com>
Cc: Liu Pingfan <kernelfans@gmail.com>
Cc: Milan Broz <gmazyland@gmail.com>
Cc: Ondrej Kozina <okozina@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-21 10:48:21 -07:00
Coiby Xu
180cf31af7 crash_dump: make dm crypt keys persist for the kdump kernel
A configfs /sys/kernel/config/crash_dm_crypt_keys is provided for user
space to make the dm crypt keys persist for the kdump kernel.  Take the
case of dumping to a LUKS-encrypted target as an example, here is the life
cycle of the kdump copies of LUKS volume keys,

 1. After the 1st kernel loads the initramfs during boot, systemd uses
    an user-input passphrase to de-crypt the LUKS volume keys or simply
    TPM-sealed volume keys and then save the volume keys to specified
    keyring (using the --link-vk-to-keyring API) and the keys will expire
    within specified time.

 2. A user space tool (kdump initramfs loader like kdump-utils) create
    key items inside /sys/kernel/config/crash_dm_crypt_keys to inform
    the 1st kernel which keys are needed.

 3. When the kdump initramfs is loaded by the kexec_file_load
    syscall, the 1st kernel will iterate created key items, save the
    keys to kdump reserved memory.

 4. When the 1st kernel crashes and the kdump initramfs is booted, the
    kdump initramfs asks the kdump kernel to create a user key using the
    key stored in kdump reserved memory by writing yes to
    /sys/kernel/crash_dm_crypt_keys/restore. Then the LUKS encrypted
    device is unlocked with libcryptsetup's --volume-key-keyring API.

 5. The system gets rebooted to the 1st kernel after dumping vmcore to
    the LUKS encrypted device is finished

Eventually the keys have to stay in the kdump reserved memory for the
kdump kernel to unlock encrypted volumes.  During this process, some
measures like letting the keys expire within specified time are desirable
to reduce security risk.

This patch assumes,
1) there are 128 LUKS devices at maximum to be unlocked thus
   MAX_KEY_NUM=128.

2) a key description won't exceed 128 bytes thus KEY_DESC_MAX_LEN=128.

And here is a demo on how to interact with
/sys/kernel/config/crash_dm_crypt_keys,

    # Add key #1
    mkdir /sys/kernel/config/crash_dm_crypt_keys/7d26b7b4-e342-4d2d-b660-7426b0996720
    # Add key #1's description
    echo cryptsetup:7d26b7b4-e342-4d2d-b660-7426b0996720 > /sys/kernel/config/crash_dm_crypt_keys/description

    # how many keys do we have now?
    cat /sys/kernel/config/crash_dm_crypt_keys/count
    1

    # Add key# 2 in the same way

    # how many keys do we have now?
    cat /sys/kernel/config/crash_dm_crypt_keys/count
    2

    # the tree structure of /crash_dm_crypt_keys configfs
    tree /sys/kernel/config/crash_dm_crypt_keys/
    /sys/kernel/config/crash_dm_crypt_keys/
    ├── 7d26b7b4-e342-4d2d-b660-7426b0996720
    │   └── description
    ├── count
    ├── fce2cd38-4d59-4317-8ce2-1fd24d52c46a
    │   └── description

Link: https://lkml.kernel.org/r/20250502011246.99238-3-coxu@redhat.com
Signed-off-by: Coiby Xu <coxu@redhat.com>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: "Daniel P. Berrange" <berrange@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Jan Pazdziora <jpazdziora@redhat.com>
Cc: Liu Pingfan <kernelfans@gmail.com>
Cc: Milan Broz <gmazyland@gmail.com>
Cc: Ondrej Kozina <okozina@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-21 10:48:20 -07:00
Shashank Balaji
dc9f08bac2 cgroup, docs: be specific about bandwidth control of rt processes
Signed-off-by: Shashank Balaji <shashank.mahadasyam@sony.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-05-21 06:33:47 -10:00
Greg Kroah-Hartman
6381f99504 thunderbolt: Changes for v6.16 merge window
This includes following USB4/Thunderbolt changes for the v6.16 merge
 window:
 
   - Enable wake on connect and disconnect over system suspend.
   - Add mapping between Type-C ports and USB4 ports on non-Chrome systems.
   - Expose tunneling related events to userspace.
 
 All these have been in linux-next with no reported issues.
 -----BEGIN PGP SIGNATURE-----
 
 iQJUBAABCgA+FiEEVTdhRGBbNzLrSUBaAP2fSd+ZWKAFAmgsVNUgHG1pa2Eud2Vz
 dGVyYmVyZ0BsaW51eC5pbnRlbC5jb20ACgkQAP2fSd+ZWKCopQ/9GsI7l9d5gswZ
 w+LE1ouz5lOFlw+RV3EpMeb8nSzkTSoxtlM4gOlRg5zEec4l9MW6LUVQ/n0mBCNY
 R22Jc3KhgFqIX1G0XOW8t4g3piLAG3Q7NauzTSdRrC3bGKV73FjMw3WMSmlEE68E
 jQxmsfPnJdcM9joxCdHxIqVBfmTiv+IKU7+60a8YnIllfd+aaXcrbU4bRkgN/dbN
 f0Hw5av0K5K0qNejn/egaQHxBp9zJwzIitYTnLLncc5s0s44LPauJt+bxakeje92
 bae4oPJUZlJovOXwclT9alcZ78GjRNNx80CyF7QXVFWb6eXweKrOhveouyGaeXWw
 x4kJDZW2LroL5A1f+7i4iX6Ng9tqkCl18/KUGz+NDjghD9YTQtj1VYQlo0HEzX0O
 lnNxXPjkNAiTxxdGmcwhYSWZPGblTWfxYcrnrcnr11EBFWXyNw0i06sq4b1MdGpO
 2yTWlwQLFgLkMp003LUIUTiNVj7aEsAiPmHoApRfwcsehGLhPpPTpBDiFMLrvjwW
 ycZ5obGMsBsvrZMr3hSEACiGIT0j2pjl7IxVCaznjVW0qyaIv56mePBAxHCyL8nu
 wkDilYctsGwjIdBhsN4laZ7uGT3fByjBc6oetx+3VjY4ZO9oLKKQLcH49/ZGrfdP
 lZMUkiwyDR8Qsv8lNg6JXqg0SIFAqQE=
 =yU5Q
 -----END PGP SIGNATURE-----

Merge tag 'thunderbolt-for-v6.16-rc1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/westeri/thunderbolt into usb-next

Mika writes:

thunderbolt: Changes for v6.16 merge window

This includes following USB4/Thunderbolt changes for the v6.16 merge
window:

  - Enable wake on connect and disconnect over system suspend.
  - Add mapping between Type-C ports and USB4 ports on non-Chrome systems.
  - Expose tunneling related events to userspace.

All these have been in linux-next with no reported issues.

* tag 'thunderbolt-for-v6.16-rc1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/westeri/thunderbolt:
  Documentation/admin-guide: Document Thunderbolt/USB4 tunneling events
  thunderbolt: Notify userspace about firmware CM tunneling events
  thunderbolt: Notify userspace about software CM tunneling events
  thunderbolt: Introduce domain event message handler
  usb: typec: Connect Type-C port with associated USB4 port
  thunderbolt: Add Thunderbolt/USB4 <-> USB3 match function
  thunderbolt: Expose usb4_port_index() to other modules
  thunderbolt: Fix a logic error in wake on connect
  thunderbolt: Use wake on connect and disconnect over suspend
2025-05-21 12:26:51 +02:00
Shivam Sharma
95112d977f docs: admin-guide: fix typos in reporting-issues.rst
Correct pin-point to pinpoint, If that the case to If that is the case,
and its only slightly modified to it's only slightly modified in
Documentation/admin-guide/reporting-issues.rst for proper spelling and grammar.

Signed-off-by: Shivam Sharma <10sharmashivam@gmail.com>
Acked-by: Thorsten Leemhuis <linux@leemhuis.info>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <20250518172658.6983-1-10sharmashivam@gmail.com>
2025-05-19 05:29:10 -06:00
Joel Fernandes
aafe12f980 rcutorture: Perform more frequent testing of ->gpwrap
Currently, the ->gpwrap is not tested (at all per my testing) due to the
requirement of a large delta between a CPU's rdp->gp_seq and its node's
rnp->gpseq.

This results in no testing of ->gpwrap being set. This patch by default
adds 5 minutes of testing with ->gpwrap forced by lowering the delta
between rdp->gp_seq and rnp->gp_seq to just 8 GPs. All of this is
configurable, including the active time for the setting and a full
testing cycle.

By default, the first 25 minutes of a test will have the _default_
behavior there is right now (ULONG_MAX / 4) delta. Then for 5 minutes,
we switch to a smaller delta causing 1-2 wraps in 5 minutes. I believe
this is reasonable since we at least add a little bit of testing for
usecases where ->gpwrap is set.

[ Apply fix for Dan Carpenter's bug report on init path cleanup. ]
[ Apply kernel doc warning fix from Akira Yokosawa. ]

Tested-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
2025-05-16 11:13:27 -04:00
Jakub Kicinski
bebd7b2626 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR (net-6.15-rc7).

Conflicts:

tools/testing/selftests/drivers/net/hw/ncdevmem.c
  97c4e094a4 ("tests/ncdevmem: Fix double-free of queue array")
  2f1a805f32 ("selftests: ncdevmem: Implement devmem TCP TX")
https://lore.kernel.org/20250514122900.1e77d62d@canb.auug.org.au

Adjacent changes:

net/core/devmem.c
net/core/devmem.h
  0afc44d8cd ("net: devmem: fix kernel panic when netlink socket close after module unload")
  bd61848900 ("net: devmem: Implement TX path")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-15 11:28:30 -07:00
Srinivas Pandruvada
e636e3f742
Documentation: admin-guide: pm: Add documentation for die_id
Add documentation to describe die_id attribute.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Link: https://lore.kernel.org/r/20250508230250.1186619-6-srinivas.pandruvada@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2025-05-15 14:47:28 +03:00
Srinivas Pandruvada
bfbe7729d6
Documentation: admin-guide: pm: Add documentation for agent_types
Add documentation to describe agent_types attribute.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Link: https://lore.kernel.org/r/20250508230250.1186619-3-srinivas.pandruvada@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2025-05-15 14:47:24 +03:00
Yafang Shao
e7b9cea718
vfs: Add sysctl vfs_cache_pressure_denom for bulk file operations
On our HDFS servers with 12 HDDs per server, a HDFS datanode[0] startup
involves scanning all files and caching their metadata (including dentries
and inodes) in memory. Each HDD contains approximately 2 million files,
resulting in a total of ~20 million cached dentries after initialization.

To minimize dentry reclamation, we set vfs_cache_pressure to 1. Despite
this configuration, memory pressure conditions can still trigger
reclamation of up to 50% of cached dentries, reducing the cache from 20
million to approximately 10 million entries. During the subsequent cache
rebuild period, any HDFS datanode restart operation incurs substantial
latency penalties until full cache recovery completes.

To maintain service stability, we need to preserve more dentries during
memory reclamation. The current minimum reclaim ratio (1/100 of total
dentries) remains too aggressive for our workload. This patch introduces
vfs_cache_pressure_denom for more granular cache pressure control. The
configuration [vfs_cache_pressure=1, vfs_cache_pressure_denom=10000]
effectively maintains the full 20 million dentry cache under memory
pressure, preventing datanode restart performance degradation.

Link: https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html#NameNode+and+DataNodes [0]

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Link: https://lore.kernel.org/20250511083624.9305-1-laoar.shao@gmail.com
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-15 11:12:59 +02:00
Rafael J. Wysocki
f20af84c29 cpufreq: intel_pstate: Document hybrid processor support
Describe the support for hybrid processors in intel_pstate, including
the CAS and EAS support, in the admin-guide documentation.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/1935040.CQOukoFCf9@rjwysocki.net
2025-05-13 14:35:25 +02:00
Ingo Molnar
c4070e1996 Merge commit 'its-for-linus-20250509-merge' into x86/core, to resolve conflicts
Conflicts:
	Documentation/admin-guide/hw-vuln/index.rst
	arch/x86/include/asm/cpufeatures.h
	arch/x86/kernel/alternative.c
	arch/x86/kernel/cpu/bugs.c
	arch/x86/kernel/cpu/common.c
	drivers/base/cpu.c
	include/linux/cpu.h

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-05-13 10:47:10 +02:00
Ingo Molnar
69cb33e2f8 Merge branch 'x86/microcode' into x86/core, to merge dependent commits
Prepare to resolve conflicts with an upstream series of fixes that conflict
with pending x86 changes:

  6f5bf947ba Merge tag 'its-for-linus-20250509' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-05-13 10:37:52 +02:00
Petr Vaněk
678927c0c9
Documentation: fix typo in root= kernel parameter description
Fixes a typo in the root= parameter description, changing
"this a a" to "this is a".

Fixes: c0c1a7dcb6 ("init: move the nfs/cifs/ram special cases out of name_to_dev_t")
Signed-off-by: Petr Vaněk <arkamar@atlas.cz>
Link: https://lore.kernel.org/20250512110827.32530-1-arkamar@atlas.cz
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-13 09:27:57 +02:00
Alexander Graf
3498209ff6 Documentation: add documentation for KHO
With KHO in place, let's add documentation that describes what it is and
how to use it.

Link: https://lkml.kernel.org/r/20250509074635.3187114-17-changyuanl@google.com
Signed-off-by: Alexander Graf <graf@amazon.com>
Co-developed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Co-developed-by: Changyuan Lyu <changyuanl@google.com>
Signed-off-by: Changyuan Lyu <changyuanl@google.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Anthony Yznaga <anthony.yznaga@oracle.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Ashish Kalra <ashish.kalra@amd.com>
Cc: Ben Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Betkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Gowans <jgowans@amazon.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Krzysztof Kozlowski <krzk@kernel.org>
Cc: Marc Rutland <mark.rutland@arm.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Pratyush Yadav <ptyadav@amazon.de>
Cc: Rob Herring <robh@kernel.org>
Cc: Saravana Kannan <saravanak@google.com>
Cc: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: Thomas Lendacky <thomas.lendacky@amd.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-12 23:50:42 -07:00
Zhongkun He
b40599930f mm: add max swappiness arg to lru_gen for anonymous memory only
The MGLRU already supports reclaiming only from anonymous memory via the
/sys/kernel/debug/lru_gen interface.  Now, memory.reclaim also supports
the swappiness=max parameter to enable reclaiming solely from anonymous
memory.  To unify the semantics of proactive reclaiming from anonymous
folios, the max parameter is introduced.

[hezhongkun.hzk@bytedance.com: use strcmp instead of strncmp, if swappiness is not set, use the default value]
  Link: https://lkml.kernel.org/r/20250507071057.3184240-1-hezhongkun.hzk@bytedance.com
[akpm@linux-foundation.org: tweak coding style]
Link: https://lkml.kernel.org/r/65181f7745d657d664d833c26d8a94cae40538b9.1745225696.git.hezhongkun.hzk@bytedance.com
Signed-off-by: Zhongkun He <hezhongkun.hzk@bytedance.com>
Acked-by: Muchun Song <muchun.song@linux.dev>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Yosry Ahmed <yosry.ahmed@linux.dev>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-12 23:50:36 -07:00
Zhongkun He
68a1436bde mm: add swappiness=max arg to memory.reclaim for only anon reclaim
Patch series "add max arg to swappiness in memory.reclaim and lru_gen", v4.

This patchset adds max arg to swappiness in memory.reclaim and lru_gen for
anon only proactive memory reclaim.

With commit <68cd9050d871> ("mm: add swappiness= arg to memory.reclaim")
we can submit an additional swappiness=<val> argument to memory.reclaim. 
It is very useful because we can dynamically adjust the reclamation ratio
based on the anonymous folios and file folios of each cgroup.  For
example,when swappiness is set to 0, we only reclaim from file folios. 
But we can not relciam memory just from anon folios.

This patchset introduces a new macro, SWAPPINESS_ANON_ONLY, defined as
MAX_SWAPPINESS + 1, represent the max arg semantics.  It specifically
indicates that reclamation should occur only from anonymous pages.

Patch 1 adds swappiness=max arg to memory.reclaim suggested-by: Yosry
Ahmed

Patch 2 add more comments for cache_trim_mode from Johannes Weiner in [1].

Patch 3 add max arg to lru_gen for proactive memory reclaim in MGLRU.  The
MGLRU already supports reclaiming exclusively from anonymous pages.  This
patch formalizes that behavior by introducing a max parameter to represent
the corresponding semantics.

Patch 4 using SWAPPINESS_ANON_ONLY in MGLRU Using SWAPPINESS_ANON_ONLY
instead of MAX_SWAPPINESS + 1 to indicate reclaiming only from anonymous
pages makes the code more readable and explicit

Here is the previous discussion:
https://lore.kernel.org/all/20250314033350.1156370-1-hezhongkun.hzk@bytedance.com/
https://lore.kernel.org/all/20250312094337.2296278-1-hezhongkun.hzk@bytedance.com/
https://lore.kernel.org/all/20250318135330.3358345-1-hezhongkun.hzk@bytedance.com/


This patch (of 4):

With commit <68cd9050d871> ("mm: add swappiness= arg to memory.reclaim")
we can submit an additional swappiness=<val> argument to memory.reclaim. 
It is very useful because we can dynamically adjust the reclamation ratio
based on the anonymous folios and file folios of each cgroup.  For
example,when swappiness is set to 0, we only reclaim from file folios.

However,we have also encountered a new issue: when swappiness is set to
the MAX_SWAPPINESS, it may still only reclaim file folios.

So, we hope to add a new arg 'swappiness=max' in memory.reclaim where
proactive memory reclaim only reclaims from anonymous folios when
swappiness is set to max.  The swappiness semantics from a user
perspective remain unchanged.

For example, something like this:

echo "2M swappiness=max" > /sys/fs/cgroup/memory.reclaim

will perform reclaim on the rootcg with a swappiness setting of 'max' (a
new mode) regardless of the file folios.  Users have a more comprehensive
view of the application's memory distribution because there are many
metrics available.  For example, if we find that a certain cgroup has a
large number of inactive anon folios, we can reclaim only those and skip
file folios, because with the zram/zswap, the IO tradeoff that
cache_trim_mode or other file first logic is making doesn't hold - file
refaults will cause IO, whereas anon decompression will not.

With this patch, the swappiness argument of memory.reclaim has a new
mode 'max', means reclaiming just from anonymous folios both in traditional
LRU and MGLRU.

Link: https://lkml.kernel.org/r/cover.1745225696.git.hezhongkun.hzk@bytedance.com
Link: https://lore.kernel.org/all/20250314141833.GA1316033@cmpxchg.org/ [1]
Link: https://lkml.kernel.org/r/519e12b9b1f8c31a01e228c8b4b91a2419684f77.1745225696.git.hezhongkun.hzk@bytedance.com
Signed-off-by: Zhongkun He <hezhongkun.hzk@bytedance.com>
Suggested-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Acked-by: Muchun Song <muchun.song@linux.dev>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-12 23:50:35 -07:00
Shakeel Butt
c6c895cf2d memcg-introduce-non-blocking-limit-setting-option-v3
add more explanation in doc and commit message on O_NONBLOCK side-effects
(Johannes)

Link: https://lkml.kernel.org/r/20250506232833.3109790-1-shakeel.butt@linux.dev
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Greg Thelen <gthelen@google.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Tejun Heo <tj@kernel.org>
Cc: Yosry Ahmed <yosry.ahmed@linux.dev>
Cc: Christian Brauner <brauner@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-12 23:50:35 -07:00
Shakeel Butt
c8e6002bd6 memcg: introduce non-blocking limit setting option
Setting the max and high limits can trigger synchronous reclaim and/or
oom-kill if the usage is higher than the given limit.  This behavior is
fine for newly created cgroups but it can cause issues for the node
controller while setting limits for existing cgroups.

In our production multi-tenant and overcommitted environment, we are
seeing priority inversion when the node controller dynamically adjusts the
limits of running jobs of different priorities.  Based on the system
situation, the node controller may reduce the limits of lower priority
jobs and increase the limits of higher priority jobs.  However we are
seeing node controller getting stuck for long period of time while
reclaiming from lower priority jobs while setting their limits and also
spends a lot of its own CPU.

One of the workaround we are trying is to fork a new process which sets
the limit of the lower priority job along with setting an alarm to get
itself killed if it get stuck in the reclaim for lower priority job. 
However we are finding it very unreliable and costly.  Either we need a
good enough time buffer for the alarm to be delivered after setting limit
and potentialy spend a lot of CPU in the reclaim or be unreliable in
setting the limit for much shorter but cheaper (less reclaim) alarms.

Let's introduce new limit setting option which does not trigger reclaim
and/or oom-kill and let the processes in the target cgroup to trigger
reclaim and/or throttling and/or oom-kill in their next charge request. 
This will make the node controller on multi-tenant overcommitted
environment much more reliable.

Explanation from Johannes on side-effects of O_NONBLOCK limit change:
  It's usually the allocating tasks inside the group bearing the cost of
  limit enforcement and reclaim. This allows a (privileged) updater from
  outside the group to keep that cost in there - instead of having to
  help, from a context that doesn't necessarily make sense.

  I suppose the tradeoff with that - and the reason why this was doing
  sync reclaim in the first place - is that, if the group is idle and
  not trying to allocate more, it can take indefinitely for the new
  limit to actually be met.

  It should be okay in most scenarios in practice. As the capacity is
  reallocated from group A to B, B will exert pressure on A once it
  tries to claim it and thereby shrink it down. If A is idle, that
  shouldn't be hard. If A is running, it's likely to fault/allocate
  soon-ish and then join the effort.

  It does leave a (malicious) corner case where A is just busy-hitting
  its memory to interfere with the clawback. This is comparable to
  reclaiming memory.low overage from the outside, though, which is an
  acceptable risk. Users of O_NONBLOCK just need to be aware.

Link: https://lkml.kernel.org/r/20250419183545.1982187-1-shakeel.butt@linux.dev
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Yosry Ahmed <yosry.ahmed@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-12 23:50:35 -07:00
Christoph Lameter (Ampere)
786d5cc2b9 Update Christoph's Email address and make it consistent
Use cl@gentwo.org throughout and remove the old email addresses.

Link: https://lkml.kernel.org/r/8b962f57-4d98-cbb0-cd82-b6ba456733e8@gentwo.org
Signed-off-by: Christoph Lameter <cl@gentwo.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-12 23:50:31 -07:00
SeongJae Park
a7bb1e7545 Docs/admin-guide/mm/damon/usage: document 'nid' file
Add description of 'nid' file, which is optionally used for specific DAMOS
quota goal metrics such as node_mem_{used,free}_bp on DAMON usage
document.

Link: https://lkml.kernel.org/r/20250420194030.75838-6-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Yunjeong Mun <yunjeong.mun@sk.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-12 23:50:30 -07:00
Zhiquan Li
247021624a crash: export PAGE_UNACCEPTED_MAPCOUNT_VALUE to vmcoreinfo
On Intel TDX guest, unaccepted memory is unusable free memory which is not
managed by buddy, until it's accepted by guest.  Before that, it cannot be
accessed by the first kernel as well as the kexec'ed kernel.  The kexec'ed
kernel will skip these pages and fill in zero data for the reader of
vmcore.

The dump tool like makedumpfile creates a page descriptor (size 24 bytes)
for each non-free page, including zero data page, but it will not create
descriptor for free pages.  If it is not able to distinguish these
unaccepted pages with zero data pages, a certain amount of space will be
wasted in proportion (~1/170).  In fact, as a special kind of free page
the unaccepted pages should be excluded, like the real free pages.

Export the page type PAGE_UNACCEPTED_MAPCOUNT_VALUE to vmcoreinfo, so that
dump tool can identify whether a page is unaccepted.

[zhiquan1.li@intel.com: fix docs: "Title underline too short" warning]
  Link: https://lore.kernel.org/all/20240809114854.3745464-5-kirill.shutemov@linux.intel.com/
  Link: https://lkml.kernel.org/r/20250405060610.860465-1-zhiquan1.li@intel.com
Link: https://lore.kernel.org/all/20240809114854.3745464-5-kirill.shutemov@linux.intel.com/
Link: https://lkml.kernel.org/r/20250403030801.758687-1-zhiquan1.li@intel.com
Signed-off-by: Zhiquan Li <zhiquan1.li@intel.com>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Zhiquan Li <zhiquan1.li@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11 17:54:04 -07:00
Sergey Senozhatsky
b05f8d7e07 Documentation: zram: update IDLE pages tracking documentation
Move IDLE pages tracking into a separate chapter because there are
multiple features that use (or depend on) it either in built-in variant
("mark all") or in extended variant (ac-time tracking).

In addition, recompression doesn't require memory tracking to be enabled
in order to be able to perform idle recompression.

Link: https://lkml.kernel.org/r/20250416042833.3858827-1-senozhatsky@chromium.org
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Reported-by: Shin Kawamura <kawasin@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11 17:48:33 -07:00
Andrei Vagin
a516403787 fs/proc: extend the PAGEMAP_SCAN ioctl to report guard regions
Patch series "fs/proc: extend the PAGEMAP_SCAN ioctl to report guard
regions", v2.

Introduce the PAGE_IS_GUARD flag in the PAGEMAP_SCAN ioctl to expose
information about guard regions.  This allows userspace tools, such as
CRIU, to detect and handle guard regions.

Currently, CRIU utilizes PAGEMAP_SCAN as a more efficient alternative to
parsing /proc/pid/pagemap.  Without this change, guard regions are
incorrectly reported as swap-anon regions, leading CRIU to attempt dumping
them and subsequently failing.

The series includes updates to the documentation and selftests to reflect
the new functionality.


This patch (of 3):

Introduce the PAGE_IS_GUARD flag in the PAGEMAP_SCAN ioctl to expose
information about guard regions.  This allows userspace tools, such as
CRIU, to detect and handle guard regions.

Link: https://lkml.kernel.org/r/20250324065328.107678-1-avagin@google.com
Link: https://lkml.kernel.org/r/20250324065328.107678-2-avagin@google.com
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11 17:48:16 -07:00
Michal Clapinski
98c9389042 mm/compaction: reduce the difference between low and high watermarks
Reduce the diff between low and high watermarks when compaction
proactiveness is set to high.  This allows users who set the proactiveness
really high to have more stable fragmentation score over time.

Link: https://lkml.kernel.org/r/20250404111103.1994507-3-mclapinski@google.com
Signed-off-by: Michal Clapinski <mclapinski@google.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11 17:48:10 -07:00
Sergey Senozhatsky
cf42d4cccf zram: modernize writeback interface
The writeback interface supports a page_index=N parameter which performs
writeback of the given page.  Since we rarely need to writeback just one
single page, the typical use case involves a number of writeback calls,
each performing writeback of one page:

  echo page_index=100 > zram0/writeback
  ...
  echo page_index=200 > zram0/writeback
  echo page_index=500 > zram0/writeback
  ...
  echo page_index=700 > zram0/writeback

One obvious downside of this is that it increases the number of syscalls. 
Less obvious, but a significantly more important downside, is that when
given only one page to post-process zram cannot perform an optimal target
selection.  This becomes a critical limitation when writeback_limit is
enabled, because under writeback_limit we want to guarantee the highest
memory savings hence we first need to writeback pages that release the
highest amount of zsmalloc pool memory.

This patch adds page_indexes=LOW-HIGH parameter to the writeback
interface:

  echo page_indexes=100-200 page_indexes=500-700 > zram0/writeback

This gives zram a chance to apply an optimal target selection strategy on
each iteration of the writeback loop.

We also now permit multiple page_index parameters per call (previously
zram would recognize only one page_index) and a mix or single pages and
page ranges:

  echo page_index=42 page_index=99 page_indexes=100-200 \
       page_indexes=500-700 > zram0/writeback

Apart from that the patch also unifies parameters passing and resembles
other "modern" zram device attributes (e.g.  recompression), while the old
interface used a mixed scheme: values-less parameters for mode and a
key=value format for page_index.  We still support the "old" value-less
format for compatibility reasons.

[senozhatsky@chromium.org: simplify parse_page_index() range checks, per Brian]
  nk: https://lkml.kernel.org/r/20250404015327.2427684-1-senozhatsky@chromium.org
[sozhatsky@chromium.org: fix uninitialized variable in zram_writeback_slots(), per Dan]
  nk: https://lkml.kernel.org/r/20250409112611.1154282-1-senozhatsky@chromium.org
Link: https://lkml.kernel.org/r/20250327015818.4148660-1-senozhatsky@chromium.org
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Reviewed-by: Brian Geffon <bgeffon@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Richard Chang <richardycc@google.com>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11 17:48:09 -07:00
Pawan Gupta
facd226f7e x86/its: Add support for RSB stuffing mitigation
When retpoline mitigation is enabled for spectre-v2, enabling
call-depth-tracking and RSB stuffing also mitigates ITS. Add cmdline option
indirect_target_selection=stuff to allow enabling RSB stuffing mitigation.

When retpoline mitigation is not enabled, =stuff option is ignored, and
default mitigation for ITS is deployed.

Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
2025-05-09 13:22:05 -07:00
Pawan Gupta
2665281a07 x86/its: Add "vmexit" option to skip mitigation on some CPUs
Ice Lake generation CPUs are not affected by guest/host isolation part of
ITS. If a user is only concerned about KVM guests, they can now choose a
new cmdline option "vmexit" that will not deploy the ITS mitigation when
CPU is not affected by guest/host isolation. This saves the performance
overhead of ITS mitigation on Ice Lake gen CPUs.

When "vmexit" option selected, if the CPU is affected by ITS guest/host
isolation, the default ITS mitigation is deployed.

Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
2025-05-09 13:22:05 -07:00
Pawan Gupta
f4818881c4 x86/its: Enable Indirect Target Selection mitigation
Indirect Target Selection (ITS) is a bug in some pre-ADL Intel CPUs with
eIBRS. It affects prediction of indirect branch and RETs in the
lower half of cacheline. Due to ITS such branches may get wrongly predicted
to a target of (direct or indirect) branch that is located in the upper
half of the cacheline.

Scope of impact
===============

Guest/host isolation
--------------------
When eIBRS is used for guest/host isolation, the indirect branches in the
VMM may still be predicted with targets corresponding to branches in the
guest.

Intra-mode
----------
cBPF or other native gadgets can be used for intra-mode training and
disclosure using ITS.

User/kernel isolation
---------------------
When eIBRS is enabled user/kernel isolation is not impacted.

Indirect Branch Prediction Barrier (IBPB)
-----------------------------------------
After an IBPB, indirect branches may be predicted with targets
corresponding to direct branches which were executed prior to IBPB. This is
mitigated by a microcode update.

Add cmdline parameter indirect_target_selection=off|on|force to control the
mitigation to relocate the affected branches to an ITS-safe thunk i.e.
located in the upper half of cacheline. Also add the sysfs reporting.

When retpoline mitigation is deployed, ITS safe-thunks are not needed,
because retpoline sequence is already ITS-safe. Similarly, when call depth
tracking (CDT) mitigation is deployed (retbleed=stuff), ITS safe return
thunk is not used, as CDT prevents RSB-underflow.

To not overcomplicate things, ITS mitigation is not supported with
spectre-v2 lfence;jmp mitigation. Moreover, it is less practical to deploy
lfence;jmp mitigation on ITS affected parts anyways.

Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
2025-05-09 13:22:05 -07:00
Pawan Gupta
1ac116ce64 Documentation: x86/bugs/its: Add ITS documentation
Add the admin-guide for Indirect Target Selection (ITS).

Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
2025-05-09 13:22:04 -07:00
Zihuan Zhang
50c9bb30dc PM: hibernate: add configurable delay for pm_test
Turn the default 5 second test delay for hibernation into a
configurable module parameter, so users can determine how
long to wait in this pseudo-hibernate state before resuming
the system.

The configurable delay parameter has been added for suspend, so
add an analogous one for hibernation.

Example (wait 30 seconds);

  # echo 30 > /sys/module/hibernate/parameters/pm_test_delay
  # echo core > /sys/power/pm_test

Signed-off-by: Zihuan Zhang <zhangzihuan@kylinos.cn>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://patch.msgid.link/20250507063520.419635-1-zhangzihuan@kylinos.cn
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-05-09 15:59:10 +02:00
Keke Li
f8953ee959 Documentation: media: Add documentation file c3-isp.rst
Add the file 'c3-isp.rst' that documents the c3-isp driver.

Signed-off-by: Keke Li <keke.li@amlogic.com>
Signed-off-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com>
Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
2025-05-09 12:08:37 +02:00
Darrick J. Wong
4528b90527 xfs: allow sysadmins to specify a maximum atomic write limit at mount time
Introduce a mount option to allow sysadmins to specify the maximum size
of an atomic write.  If the filesystem can work with the supplied value,
that becomes the new guaranteed maximum.

The value mustn't be too big for the existing filesystem geometry (max
write size, max AG/rtgroup size).  We dynamically recompute the
tr_atomic_write transaction reservation based on the given block size,
check that the current log size isn't less than the new minimum log size
constraints, and set a new maximum.

The actual software atomic write max is still computed based off of
tr_atomic_ioend the same way it has for the past few commits.  Note also
that xfs_calc_atomic_write_log_geometry is non-static because mkfs will
need that.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: John Garry <john.g.garry@oracle.com>
2025-05-07 14:25:33 -07:00
Bagas Sanjaya
4804f5ad5d x86/cpu: Add "Old Microcode" docs to hw-vuln toctree
Sphinx reports missing toctree entry warning:

Documentation/admin-guide/hw-vuln/old_microcode.rst: WARNING: document isn't included in any toctree

Add entry for "Old Microcode" docs to fix the warning.

Fixes: 4e2c719782 ("x86/cpu: Help users notice when running old Intel microcode")
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/all/20250502023358.14846-1-bagasdotme%40gmail.com
2025-05-02 11:33:35 -07:00
Damien Le Moal
9e4f11c122 Documentation: Document the new zoned loop block device driver
Introduce the zoned_loop.rst documentation file under
admin-guide/blockdev to document the zoned loop block device driver.
An overview of the driver is provided and its usage to create and delete
zoned devices described.

Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20250407075222.170336-3-dlemoal@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-01 17:03:56 -06:00
Jakub Kicinski
337079d31f Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR (net-6.15-rc5).

No conflicts or adjacent changes.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-01 15:11:38 -07:00
Arnd Bergmann
118c40b7b5 kbuild: require gcc-8 and binutils-2.30
Commit a3e8fe814a ("x86/build: Raise the minimum GCC version to 8.1")
raised the minimum compiler version as enforced by Kbuild to gcc-8.1
and clang-15 for x86.

This is actually the same gcc version that has been discussed as the
minimum for all architectures several times in the past, with little
objection. A previous concern was the kernel for SLE15-SP7 needing to
be built with gcc-7. As this ended up still using linux-6.4 and there
is no plan for an SP8, this is no longer a problem.

Change it for all architectures and adjust the documentation accordingly.
A few version checks can be removed in the process.  The binutils
version 2.30 is the lowest version used in combination with gcc-8 on
common distros, so use that as the corresponding minimum.

Link: https://lore.kernel.org/lkml/20240925150059.3955569-32-ardb+git@google.com/
Link: https://lore.kernel.org/lkml/871q7yxrgv.wl-tiwai@suse.de/
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-04-30 21:53:35 +02:00
Joel Savitz
c0fe189b59 docs: namespace: Tweak and reword resource control doc
Fix the document title and reword the phrasing to active voice.

Signed-off-by: Joel Savitz <jsavitz@redhat.com>
Message-ID: <20250421161723.1138903-1-jsavitz@redhat.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2025-04-28 18:00:06 -06:00
Martin Tůma
1a27fce0fa docs: media: mgb4: Improve mgb4 driver documentation
Add some basic info about the HW/driver + contact info.

Signed-off-by: Martin Tůma <martin.tuma@digiteqautomotive.com>
Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
2025-04-25 15:14:28 +02:00
Lukas Bulwahn
29d69273fe media: remove STA2x11 media pci driver
With commit dcbb01fbb7 ("x86/pci: Remove old STA2x11 support"), the
STA2X11 Video Input Port driver is not needed and cannot be built anymore.

Remove the driver and its reference in media documentation.

Signed-off-by: Lukas Bulwahn <lukas.bulwahn@redhat.com>
Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
2025-04-25 15:14:25 +02:00
Alan Borzeszkowski
36f6f7e2d4 Documentation/admin-guide: Document Thunderbolt/USB4 tunneling events
Add documentation about the Thunderbolt/USB4 tunneling events to the
user’s and administrator’s guide.

Signed-off-by: Alan Borzeszkowski <alan.borzeszkowski@linux.intel.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
2025-04-24 08:24:39 +03:00
Hans Holmberg
f0447f80ae xfs: remove duplicate Zoned Filesystems sections in admin-guide
Remove the duplicated section and while at it, turn spaces into tabs.

Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Fixes: c7b67ddc3c ("xfs: document zoned rt specifics in admin-guide")
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-04-22 16:05:24 +02:00
Dave Hansen
4e2c719782 x86/cpu: Help users notice when running old Intel microcode
Old microcode is bad for users and for kernel developers.

For users, it exposes them to known fixed security and/or functional
issues. These obviously rarely result in instant dumpster fires in
every environment. But it is as important to keep your microcode up
to date as it is to keep your kernel up to date.

Old microcode also makes kernels harder to debug. A developer looking
at an oops need to consider kernel bugs, known CPU issues and unknown
CPU issues as possible causes. If they know the microcode is up to
date, they can mostly eliminate known CPU issues as the cause.

Make it easier to tell if CPU microcode is out of date. Add a list
of released microcode. If the loaded microcode is older than the
release, tell users in a place that folks can find it:

	/sys/devices/system/cpu/vulnerabilities/old_microcode

Tell kernel kernel developers about it with the existing taint
flag:

	TAINT_CPU_OUT_OF_SPEC

== Discussion ==

When a user reports a potential kernel issue, it is very common
to ask them to reproduce the issue on mainline. Running mainline,
they will (independently from the distro) acquire a more up-to-date
microcode version list. If their microcode is old, they will
get a warning about the taint and kernel developers can take that
into consideration when debugging.

Just like any other entry in "vulnerabilities/", users are free to
make their own assessment of their exposure.

== Microcode Revision Discussion ==

The microcode versions in the table were generated from the Intel
microcode git repo:

	8ac9378a8487 ("microcode-20241112 Release")

which as of this writing lags behind the latest microcode-20250211.

It can be argued that the versions that the kernel picks to call "old"
should be a revision or two old. Which specific version is picked is
less important to me than picking *a* version and enforcing it.

This repository contains only microcode versions that Intel has deemed
to be OS-loadable. It is quite possible that the BIOS has loaded a
newer microcode than the latest in this repo. If this happens, the
system is considered to have new microcode, not old.

Specifically, the sysfs file and taint flag answer the question:

	Is the CPU running on the latest OS-loadable microcode,
	or something even later that the BIOS loaded?

In other words, Intel never publishes an authoritative list of CPUs
and latest microcode revisions. Until it does, this is the best that
Linux can do.

Also note that the "intel-ucode-defs.h" file is simple, ugly and
has lots of magic numbers. That's on purpose and should allow a
single file to be shared across lots of stable kernel regardless of if
they have the new "VFM" infrastructure or not. It was generated with
a dumb script.

== FAQ ==

Q: Does this tell me if my system is secure or insecure?
A: No. It only tells you if your microcode was old when the
   system booted.

Q: Should the kernel warn if the microcode list itself is too old?
A: No. New kernels will get new microcode lists, both mainline
   and stable. The only way to have an old list is to be running
   an old kernel in which case you have bigger problems.

Q: Is this for security or functional issues?
A: Both.

Q: If a given microcode update only has functional problems but
   no security issues, will it be considered old?
A: Yes. All microcode image versions within a microcode release
   are treated identically. Intel appears to make security
   updates without disclosing them in the release notes.  Thus,
   all updates are considered to be security-relevant.

Q: Who runs old microcode?
A: Anybody with an old distro. This happens all the time inside
   of Intel where there are lots of weird systems in labs that
   might not be getting regular distro updates and might also
   be running rather exotic microcode images.

Q: If I update my microcode after booting will it stop saying
   "Vulnerable"?
A: No. Just like all the other vulnerabilies, you need to
   reboot before the kernel will reassess your vulnerability.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: "Ahmed S. Darwish" <darwi@linutronix.de>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/all/20250421195659.CF426C07%40davehans-spike.ostc.intel.com
(cherry picked from commit 9127865b15eb0a1bd05ad7efe29489c44394bdc1)
2025-04-22 08:33:52 +02:00
Jakub Kicinski
240ce924d2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR (net-6.15-rc3).

No conflicts. Adjacent changes:

tools/net/ynl/pyynl/ynl_gen_c.py
  4d07bbf2d4 ("tools: ynl-gen: don't declare loop iterator in place")
  7e8ba0c7de ("tools: ynl: don't use genlmsghdr in classic netlink")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-17 12:26:50 -07:00
Hans Holmberg
c7b67ddc3c xfs: document zoned rt specifics in admin-guide
Document the lifetime, nolifetime and max_open_zones mount options
added for zoned rt file systems.

Also add documentation describing the max_open_zones sysfs attribute
exposed in /sys/fs/xfs/<dev>/zoned/

Fixes: 4e4d520755 ("xfs: add the zoned space allocator")
Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-04-17 08:16:59 +02:00
Artem Bityutskiy
af3a1b6a18 Documentation: admin-guide: pm: Document intel_idle C1 demotion
Document the intel_idle driver sysfs file for enabling/disabling C1
demotion.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Link: https://patch.msgid.link/20250317135541.1471754-3-dedekind1@gmail.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-04-15 20:38:26 +02:00
Randy Dunlap
e54ac58667 cpufreq: editing corrections to cpufreq.rst
Change a few words and abbreviations/punctuation.

Change one echo command to include a trailing '`'.

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Rafael J. Wysocki <rafael@kernel.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: linux-pm@vger.kernel.org
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20250405001447.4039463-1-rdunlap@infradead.org
2025-04-14 10:29:32 -06:00
James Addison
3fa3b20ba1 docs: Disambiguate a pair of rST labels
According to the reStructuredText documentation, internal hyperlink
targets[1] are intended to resolve within the current document.

Sphinx has a bug that causes internal hyperlinks declared with
duplicate names to resolve nondeterministically, producing incorrect
documentation. Sphinx does not yet emit a warning when these
duplicate target names are declared.

To improve the reproducibility and correctness of the HTML
documentation, disambiguate two labels both previously titled
"submit_improvements".

[1] - https://docutils.sourceforge.io/docs/ref/rst/restructuredtext.html#hyperlink-targets

Link: https://github.com/sphinx-doc/sphinx/issues/13383
Signed-off-by: James Addison <jay@jp-hosting.net>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20250407195120.331103-2-jvanderwaa@redhat.com
2025-04-14 10:22:46 -06:00
Hans Holmberg
845abeb1f0 xfs: add tunable threshold parameter for triggering zone GC
Presently we start garbage collection late - when we start running
out of free zones to backfill max_open_zones. This is a reasonable
default as it minimizes write amplification. The longer we wait,
the more blocks are invalidated and reclaim cost less in terms
of blocks to relocate.

Starting this late however introduces a risk of GC being outcompeted
by user writes. If GC can't keep up, user writes will be forced to
wait for free zones with high tail latencies as a result.

This is not a problem under normal circumstances, but if fragmentation
is bad and user write pressure is high (multiple full-throttle
writers) we will "bottom out" of free zones.

To mitigate this, introduce a zonegc_low_space tunable that lets the
user specify a percentage of how much of the unused space that GC
should keep available for writing. A high value will reclaim more of
the space occupied by unused blocks, creating a larger buffer against
write bursts.

This comes at a cost as write amplification is increased. To
illustrate this using a sample workload, setting zonegc_low_space to
60% avoids high (500ms) max latencies while increasing write
amplification by 15%.

Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-04-14 10:41:33 +02:00
Kuniyuki Iwashima
2a63dd0edf net: Retire DCCP socket.
DCCP was orphaned in 2021 by commit 054c4610bd ("MAINTAINERS: dccp:
move Gerrit Renker to CREDITS"), which noted that the last maintainer
had been inactive for five years.

In recent years, it has become a playground for syzbot, and most changes
to DCCP have been odd bug fixes triggered by syzbot.  Apart from that,
the only changes have been driven by treewide or networking API updates
or adjustments related to TCP.

Thus, in 2023, we announced we would remove DCCP in 2025 via commit
b144fcaf46 ("dccp: Print deprecation notice.").

Since then, only one individual has contacted the netdev mailing list. [0]

There is ongoing research for Multipath DCCP.  The repository is hosted
on GitHub [1], and development is not taking place through the upstream
community.  While the repository is published under the GPLv2 license,
the scheduling part remains proprietary, with a LICENSE file [2] stating:

  "This is not Open Source software."

The researcher mentioned a plan to address the licensing issue, upstream
the patches, and step up as a maintainer, but there has been no further
communication since then.

Maintaining DCCP for a decade without any real users has become a burden.

Therefore, it's time to remove it.

Removing DCCP will also provide significant benefits to TCP.  It allows
us to freely reorganize the layout of struct inet_connection_sock, which
is currently shared with DCCP, and optimize it to reduce the number of
cachelines accessed in the TCP fast path.

Note that we keep DCCP netfilter modules as requested.  [3]

Link: https://lore.kernel.org/netdev/20230710182253.81446-1-kuniyu@amazon.com/T/#u #[0]
Link: https://github.com/telekom/mp-dccp #[1]
Link: https://github.com/telekom/mp-dccp/blob/mpdccp_v03_k5.10/net/dccp/non_gpl_scheduler/LICENSE #[2]
Link: https://lore.kernel.org/netdev/Z_VQ0KlCRkqYWXa-@calendula/ #[3]
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Acked-by: Paul Moore <paul@paul-moore.com> (LSM and SELinux)
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Link: https://patch.msgid.link/20250410023921.11307-3-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-11 18:58:10 -07:00
Kurt Borja
3e48767ab5
Documentation: admin-guide: laptops: Add documentation for alienware-wmi
Add driver admin-guide documentation for the alienware-wmi driver.

Reviewed-by: Armin Wolf <W_Armin@gmx.de>
Signed-off-by: Kurt Borja <kuurtb@gmail.com>
Link: https://lore.kernel.org/r/20250329-hwm-v7-11-a14ea39d8a94@gmail.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2025-04-11 13:10:16 +03:00
Koichiro Den
10f94d092b Documentation: gpio: document configfs interface for gpio-aggregator
Add documentation for the newly added configfs-based interface for GPIO
aggregator.

Signed-off-by: Koichiro Den <koichiro.den@canonical.com>
Link: https://lore.kernel.org/r/20250407043019.4105613-9-koichiro.den@canonical.com
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2025-04-09 16:57:29 +02:00
Josh Poimboeuf
83f6665a49 x86/bugs: Add RSB mitigation document
Create a document to summarize hard-earned knowledge about RSB-related
mitigations, with references, and replace the overly verbose yet
incomplete comments with a reference to the document.

Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/ab73f4659ba697a974759f07befd41ae605e33dd.1744148254.git.jpoimboe@kernel.org
2025-04-09 12:42:09 +02:00
Andy Shevchenko
996457176b x86/early_printk: Use 'mmio32' for consistency, fix comments
First of all, using 'mmio' prevents proper implementation of 8-bit accessors.
Second, it's simply inconsistent with uart8250 set of options. Rename it to
'mmio32'. While at it, remove rather misleading comment in the documentation.
From now on mmio32 is self-explanatory and pciserial supports not only 32-bit
MMIO accessors.

Also, while at it, fix the comment for the "pciserial" case. The comment
seems to be a copy'n'paste error when mentioning "serial" instead of
"pciserial" (with double quotes). Fix this.

With that, move it upper, so we don't calculate 'buf' twice.

Fixes: 3181424aea ("x86/early_printk: Add support for MMIO-based UARTs")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Denis Mukhin <dmukhin@ford.com>
Link: https://lore.kernel.org/r/20250407172214.792745-1-andriy.shevchenko@linux.intel.com
2025-04-09 12:27:08 +02:00
Michal Koutný
e34e0131fe sched: Add commadline option for RT_GROUP_SCHED toggling
Only simple implementation with a static key wrapper, it will be wired
in later.

Signed-off-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20250310170442.504716-5-mkoutny@suse.com
2025-04-08 20:55:53 +02:00
Matthew Wilcox (Oracle)
6b0dfabb35
fs: Remove aops->writepage
All callers and implementations are now removed, so remove the operation
and update the documentation to match.

Signed-off-by: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Link: https://lore.kernel.org/r/20250402150005.2309458-10-willy@infradead.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-04-07 09:36:50 +02:00
Linus Torvalds
dd9db3bff8 more s390 updates for 6.15 merge window
- Fix machine check handler _CIF_MCCK_GUEST bit setting by adding the
   missing base register for relocated lowcore address
 
 - Fix build failure on older linkers by conditionally adding the -no-pie
   linker option only when it is supported
 
 - Fix inaccurate kernel messages in vfio-ap by providing descriptive
   error notifications for AP queue sharing violations
 
 - Fix PCI isolation logic by ensuring non-VF devices correctly return
   false in zpci_bus_is_isolated_vf()
 
 - Fix PCI DMA range map setup by using dma_direct_set_offset() to add a
   proper sentinel element, preventing potential overruns and translation
   errors
 
 - Cleanup header dependency problems with asm-offsets.c
 
 - Add fault info for unexpected low-address protection faults in user mode
 
 - Add support for HOTPLUG_SMT, replacing the arch-specific "nosmt"
   handling with common code handling
 
 - Use bitop functions to implement CPU flag helper functions to ensure
   that bits cannot get lost if modified in different contexts on a CPU
 
 - Remove unused machine_flags for the lowcore
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEE3QHqV+H2a8xAv27vjYWKoQLXFBgFAmfwbhwACgkQjYWKoQLX
 FBgmlwf5ASFl51yYziSg47NTOdCgkdE1un69VKCU7g9nzQBCFiK4PT+H/d69JHpt
 g9dHi4sW6aXooUWyqtrQbsDZSnGKvJd48wOkpqAKKKJCBx3On/7s1fi1sJXgJovo
 rSwC1cRD+yfFDNoQgXytSrRAkoPYvIPxf0aQei/8ziVNOrl7iF7nyPYpgNcZl2iz
 rQWt+o4oWhygs6lK4w9t+0V0QLL+CqTIN/tpf7qeQzNFN5WfRJBn0dtxQhK72enG
 WepU41LnZxDF5DTKhH4+PZFdLPP+YwwTGdqojW28leGy5T3/Eg0wrDaU1034Wpgd
 m5Fg03z94Vdul/rAEscaH1PLT3Ufng==
 =G5/g
 -----END PGP SIGNATURE-----

Merge tag 's390-6.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull more s390 updates from Vasily Gorbik:

 - Fix machine check handler _CIF_MCCK_GUEST bit setting by adding the
   missing base register for relocated lowcore address

 - Fix build failure on older linkers by conditionally adding the
   -no-pie linker option only when it is supported

 - Fix inaccurate kernel messages in vfio-ap by providing descriptive
   error notifications for AP queue sharing violations

 - Fix PCI isolation logic by ensuring non-VF devices correctly return
   false in zpci_bus_is_isolated_vf()

 - Fix PCI DMA range map setup by using dma_direct_set_offset() to add a
   proper sentinel element, preventing potential overruns and
   translation errors

 - Cleanup header dependency problems with asm-offsets.c

 - Add fault info for unexpected low-address protection faults in user
   mode

 - Add support for HOTPLUG_SMT, replacing the arch-specific "nosmt"
   handling with common code handling

 - Use bitop functions to implement CPU flag helper functions to ensure
   that bits cannot get lost if modified in different contexts on a CPU

 - Remove unused machine_flags for the lowcore

* tag 's390-6.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/vfio-ap: Fix no AP queue sharing allowed message written to kernel log
  s390/pci: Fix dev.dma_range_map missing sentinel element
  s390/mm: Dump fault info in case of low address protection fault
  s390/smp: Add support for HOTPLUG_SMT
  s390: Fix linker error when -no-pie option is unavailable
  s390/processor: Use bitop functions for cpu flag helper functions
  s390/asm-offsets: Remove ASM_OFFSETS_C
  s390/asm-offsets: Include ftrace_regs.h instead of ftrace.h
  s390/kvm: Split kvm_host header file
  s390/pci: Fix zpci_bus_is_isolated_vf() for non-VFs
  s390/lowcore: Remove unused machine_flags
  s390/entry: Fix setting _CIF_MCCK_GUEST with lowcore relocation
2025-04-04 16:58:34 -07:00
Linus Torvalds
4a1d8ababd RISC-V Patches for the 6.15 Merge Window, Part 1
* The sub-architecture selection Kconfig system has been cleaned up,
   the documentation has been improved, and various detections have been
   fixed.
 * The vector-related extensions dependencies are now validated when
   parsing from device tree and in the DT bindings.
 * Misaligned access probing can be overridden via a kernel command-line
   parameter, along with various fixes to misalign access handling.
 * Support for relocatable !MMU kernels builds.
 * Support for hpge pfnmaps, which should improve TLB utilization.
 * Support for runtime constants, which improves the d_hash()
   performance.
 * Support for bfloat16, Zicbom, Zaamo, Zalrsc, Zicntr, Zihpm.
 * Various fixes, including:
       - We were missing a secondary mmu notifier call when flushing the
 	tlb which is required for IOMMU.
       - Fix ftrace panics by saving the registers as expected by ftrace.
       - Fix a couple of stimecmp usage related to cpu hotplug.
       - purgatory_start is now aligned as per the STVEC requirements.
       - A fix for hugetlb when calculating the size of non-present PTEs.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmfv/soTHHBhbG1lckBk
 YWJiZWx0LmNvbQAKCRAuExnzX7sYierZEACDwI9lJFCEbQPon3z8rAy1moTj0+AZ
 bMfZFqMphUTrJ0cMm2+Bc+XZgck12zHCyu1UljDcZVYMCHA9aOoj5C5NkBBVLCuL
 uLYrhIoQXtJaVIANiFl0SHAZmh2s2OoSgmUzrEZ8JGlHpKCF7EVX5bHEsOvzn9ir
 B2W992W6q3ISuKXHKsTpa7rmTtf7swGYg6zW3pX3l6HmY+EMEQOcQl0tAB383J/T
 lm0K4+YvLpRJdm2ARpNGWlcFXj9/UXUM5hplK3aBAHpPKQ5/83/4tMDsfRvhpEVC
 VJXNgK+H4XLD542aQ8d4ZROguyhwn9e2n6Dkv0OqfNk4lg5pUBcJUZftQ+rB7AWg
 VYB1KVpxhwcruheXJFz8S3EzjZTcS+JrcD80vvx8JmHdXkZwHTfYUgiFwe/TR7yr
 b518fEbXpVwDZiCbaAe3Cmpw0mlNnSVmU4hgNbiwt0fu9DGdPN9WQbyds68RKb7A
 TWwDmmD6kV2BTWl0mHPtu9VhX58CDG+0WYbHA7r82p2T50187766C92GYfN2UPpz
 lH0iMRDkmucclZ3fEoosJ+HsDntc4oe6Bhdzuj52Q7vBpDd/QB6t5cfrlDpEEdgU
 3qoWMN5mb5l1rbvrqENh5ZgmEpzV8K0R5F5quiXh/9wO0y1kepDslTqC2oXK/m0p
 DzsvvD6UnNMOUQ==
 =nCJo
 -----END PGP SIGNATURE-----

Merge tag 'riscv-for-linus-6.15-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V updates from Palmer Dabbelt:

 - The sub-architecture selection Kconfig system has been cleaned up,
   the documentation has been improved, and various detections have been
   fixed

 - The vector-related extensions dependencies are now validated when
   parsing from device tree and in the DT bindings

 - Misaligned access probing can be overridden via a kernel command-line
   parameter, along with various fixes to misalign access handling

 - Support for relocatable !MMU kernels builds

 - Support for hpge pfnmaps, which should improve TLB utilization

 - Support for runtime constants, which improves the d_hash()
   performance

 - Support for bfloat16, Zicbom, Zaamo, Zalrsc, Zicntr, Zihpm

 - Various fixes, including:
      - We were missing a secondary mmu notifier call when flushing the
        tlb which is required for IOMMU
      - Fix ftrace panics by saving the registers as expected by ftrace
      - Fix a couple of stimecmp usage related to cpu hotplug
      - purgatory_start is now aligned as per the STVEC requirements
      - A fix for hugetlb when calculating the size of non-present PTEs

* tag 'riscv-for-linus-6.15-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (65 commits)
  riscv: Add norvc after .option arch in runtime const
  riscv: Make sure toolchain supports zba before using zba instructions
  riscv/purgatory: 4B align purgatory_start
  riscv/kexec_file: Handle R_RISCV_64 in purgatory relocator
  selftests: riscv: fix v_exec_initval_nolibc.c
  riscv: Fix hugetlb retrieval of number of ptes in case of !present pte
  riscv: print hartid on bringup
  riscv: Add norvc after .option arch in runtime const
  riscv: Remove CONFIG_PAGE_OFFSET
  riscv: Support CONFIG_RELOCATABLE on riscv32
  asm-generic: Always define Elf_Rel and Elf_Rela
  riscv: Support CONFIG_RELOCATABLE on NOMMU
  riscv: Allow NOMMU kernels to access all of RAM
  riscv: Remove duplicate CONFIG_PAGE_OFFSET definition
  RISC-V: errata: Use medany for relocatable builds
  dt-bindings: riscv: document vector crypto requirements
  dt-bindings: riscv: add vector sub-extension dependencies
  dt-bindings: riscv: d requires f
  RISC-V: add f & d extension validation checks
  RISC-V: add vector crypto extension validation checks
  ...
2025-04-04 09:49:17 -07:00
Linus Torvalds
6cb0bd94c0 Persistent buffer cleanups and simplifications for v6.15:
It was mistaken that the physical memory returned from "reserve_mem" had to
 be vmap()'d to get to it from a virtual address. But reserve_mem already
 maps the memory to the virtual address of the kernel so a simple
 phys_to_virt() can be used to get to the virtual address from the physical
 memory returned by "reserve_mem". With this new found knowledge, the
 code can be cleaned up and simplified.
 
 - Enforce that the persistent memory is page aligned
 
   As the buffers using the persistent memory are all going to be
   mapped via pages, make sure that the memory given to the tracing
   infrastructure is page aligned. If it is not, it will print a warning
   and fail to map the buffer.
 
 - Use phys_to_virt() to get the virtual address from reserve_mem
 
   Instead of calling vmap() on the physical memory returned from
   "reserve_mem", use phys_to_virt() instead.
 
   As the memory returned by "memmap" or any other means where a physical
   address is given to the tracing infrastructure, it still needs to
   be vmap(). Since this memory can never be returned back to the buddy
   allocator nor should it ever be memmory mapped to user space, flag
   this buffer and up the ref count. The ref count will keep it from
   ever being freed, and the flag will prevent it from ever being memory
   mapped to user space.
 
 - Use vmap_page_range() for memmap virtual address mapping
 
   For the memmap buffer, instead of allocating an array of struct pages,
   assigning them to the contiguous phsycial memory and then passing that to
   vmap(), use vmap_page_range() instead
 
 - Replace flush_dcache_folio() with flush_kernel_vmap_range()
 
   Instead of calling virt_to_folio() and passing that to
   flush_dcache_folio(), just call flush_kernel_vmap_range() directly.
   This also fixes a bug where if a subbuffer was bigger than PAGE_SIZE
   only the PAGE_SIZE portion would be flushed.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZ+6oZRQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qhq6AP481KHAgaowQCg7zrKPkMlbYBIigYoU
 7aqoAg2rSLBRSQEAl8fViHZgZ9Q+O7xdozQWiIR7/KQW8VIaTcP/V7cHkAU=
 =+5JB
 -----END PGP SIGNATURE-----

Merge tag 'trace-ringbuffer-v6.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull ring-buffer updates from Steven Rostedt:
 "Persistent buffer cleanups and simplifications.

  It was mistaken that the physical memory returned from "reserve_mem"
  had to be vmap()'d to get to it from a virtual address. But
  reserve_mem already maps the memory to the virtual address of the
  kernel so a simple phys_to_virt() can be used to get to the virtual
  address from the physical memory returned by "reserve_mem". With this
  new found knowledge, the code can be cleaned up and simplified.

   - Enforce that the persistent memory is page aligned

     As the buffers using the persistent memory are all going to be
     mapped via pages, make sure that the memory given to the tracing
     infrastructure is page aligned. If it is not, it will print a
     warning and fail to map the buffer.

   - Use phys_to_virt() to get the virtual address from reserve_mem

     Instead of calling vmap() on the physical memory returned from
     "reserve_mem", use phys_to_virt() instead.

     As the memory returned by "memmap" or any other means where a
     physical address is given to the tracing infrastructure, it still
     needs to be vmap(). Since this memory can never be returned back to
     the buddy allocator nor should it ever be memmory mapped to user
     space, flag this buffer and up the ref count. The ref count will
     keep it from ever being freed, and the flag will prevent it from
     ever being memory mapped to user space.

   - Use vmap_page_range() for memmap virtual address mapping

     For the memmap buffer, instead of allocating an array of struct
     pages, assigning them to the contiguous phsycial memory and then
     passing that to vmap(), use vmap_page_range() instead

   - Replace flush_dcache_folio() with flush_kernel_vmap_range()

     Instead of calling virt_to_folio() and passing that to
     flush_dcache_folio(), just call flush_kernel_vmap_range() directly.
     This also fixes a bug where if a subbuffer was bigger than
     PAGE_SIZE only the PAGE_SIZE portion would be flushed"

* tag 'trace-ringbuffer-v6.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  ring-buffer: Use flush_kernel_vmap_range() over flush_dcache_folio()
  tracing: Use vmap_page_range() to map memmap ring buffer
  tracing: Have reserve_mem use phys_to_virt() and separate from memmap buffer
  tracing: Enforce the persistent ring buffer to be page aligned
2025-04-03 16:09:29 -07:00
Linus Torvalds
5014bebee0 - dm-crypt: switch to using the crc32 library
- dm-verity, dm-integrity, dm-crypt: documentation improvement
 
 - dm-vdo fixes
 
 - dm-stripe: enable inline crypto passthrough
 
 - dm-integrity: set ti->error on memory allocation failure
 
 - dm-bufio: remove unused return value
 
 - dm-verity: do forward error correction on metadata I/O errors
 
 - dm: fix unconditional IO throttle caused by REQ_PREFLUSH
 
 - dm cache: prevent BUG_ON by blocking retries on failed device resumes
 
 - dm cache: support shrinking the origin device
 
 - dm: restrict dm device size to 2^63-512 bytes
 
 - dm-delay: support zoned devices
 
 - dm-verity: support block number limits for different ioprio classes
 
 - dm-integrity: fix non-constant-time tag verification (security bug)
 
 - dm-verity, dm-ebs: fix prefetch-vs-suspend race
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRnH8MwLyZDhyYfesYTAyx9YGnhbQUCZ+u7shQcbXBhdG9ja2FA
 cmVkaGF0LmNvbQAKCRATAyx9YGnhbZ0JAQDVhbl77u9jjPWjxJvFodMAqw+KPXGC
 MNzkyzG0lu7oPAEA33vt5pHQtr7F3SJj/sDBuZ+rb5bvUtgxeGqpJOQpTAk=
 =tj00
 -----END PGP SIGNATURE-----

Merge tag 'for-6.15/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper updates from Mikulas Patocka:

 - dm-crypt: switch to using the crc32 library

 - dm-verity, dm-integrity, dm-crypt: documentation improvement

 - dm-vdo fixes

 - dm-stripe: enable inline crypto passthrough

 - dm-integrity: set ti->error on memory allocation failure

 - dm-bufio: remove unused return value

 - dm-verity: do forward error correction on metadata I/O errors

 - dm: fix unconditional IO throttle caused by REQ_PREFLUSH

 - dm cache: prevent BUG_ON by blocking retries on failed device resumes

 - dm cache: support shrinking the origin device

 - dm: restrict dm device size to 2^63-512 bytes

 - dm-delay: support zoned devices

 - dm-verity: support block number limits for different ioprio classes

 - dm-integrity: fix non-constant-time tag verification (security bug)

 - dm-verity, dm-ebs: fix prefetch-vs-suspend race

* tag 'for-6.15/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (27 commits)
  dm-ebs: fix prefetch-vs-suspend race
  dm-verity: fix prefetch-vs-suspend race
  dm-integrity: fix non-constant-time tag verification
  dm-verity: support block number limits for different ioprio classes
  dm-delay: support zoned devices
  dm: restrict dm device size to 2^63-512 bytes
  dm cache: support shrinking the origin device
  dm cache: prevent BUG_ON by blocking retries on failed device resumes
  dm vdo indexer: reorder uds_request to reduce padding
  dm: fix unconditional IO throttle caused by REQ_PREFLUSH
  dm vdo: rework processing of loaded refcount byte arrays
  dm vdo: remove remaining ring references
  dm-verity: do forward error correction on metadata I/O errors
  dm-bufio: remove unused return value
  dm-integrity: set ti->error on memory allocation failure
  dm: Enable inline crypto passthrough for striped target
  dm vdo slab-depot: read refcount blocks in large chunks at load time
  dm vdo vio-pool: allow variable-sized metadata vios
  dm vdo vio-pool: support pools with multiple data blocks per vio
  dm vdo vio-pool: add a pool pointer to pooled_vio
  ...
2025-04-02 21:27:59 -07:00
Linus Torvalds
5e17b5c717 fuse update for 6.15
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQSQHSd0lITzzeNWNm3h3BK/laaZPAUCZ+vB5QAKCRDh3BK/laaZ
 PGA2AQCVsyLmZFinaNC10S+Bkmx+a7f9MLhX6u+ILbmio8nT1AD7BKCDFD9pucG0
 pilz+OaCXjXt/og6doyugM4SW/Q3tA0=
 =BhLT
 -----END PGP SIGNATURE-----

Merge tag 'fuse-update-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse

Pull fuse updates from Miklos Szeredi:

 - Allow connection to server to time out (Joanne Koong)

 - If server doesn't support creating a hard link, return EPERM rather
   than ENOSYS (Matt Johnston)

 - Allow file names longer than 1024 chars (Bernd Schubert)

 - Fix a possible race if request on io_uring queue is interrupted
   (Bernd Schubert)

 - Misc fixes and cleanups

* tag 'fuse-update-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: remove unneeded atomic set in uring creation
  fuse: fix uring race condition for null dereference of fc
  fuse: Increase FUSE_NAME_MAX to PATH_MAX
  fuse: Allocate only namelen buf memory in fuse_notify_
  fuse: add default_request_timeout and max_request_timeout sysctls
  fuse: add kernel-enforced timeout option for requests
  fuse: optmize missing FUSE_LINK support
  fuse: Return EPERM rather than ENOSYS from link()
  fuse: removed unused function fuse_uring_create() from header
  fuse: {io-uring} Fix a possible req cancellation race
2025-04-02 16:36:59 -07:00
Steven Rostedt
c44a14f216 tracing: Enforce the persistent ring buffer to be page aligned
Enforce that the address and the size of the memory used by the persistent
ring buffer is page aligned. Also update the documentation to reflect this
requirement.

Link: https://lore.kernel.org/all/CAHk-=whUOfVucfJRt7E0AH+GV41ELmS4wJqxHDnui6Giddfkzw@mail.gmail.com/

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Vincent Donnefort <vdonnefort@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Jann Horn <jannh@google.com>
Link: https://lore.kernel.org/20250402144953.412882844@goodmis.org
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-04-02 11:02:26 -04:00
Linus Torvalds
d6b02199cd - The 7 patch series "powerpc/crash: use generic crashkernel
reservation" from Sourabh Jain changes powerpc's kexec code to use more
   of the generic layers.
 
 - The 2 patch series "get_maintainer: report subsystem status
   separately" from Vlastimil Babka makes some long-requested improvements
   to the get_maintainer output.
 
 - The 4 patch series "ucount: Simplify refcounting with rcuref_t" from
   Sebastian Siewior cleans up and optimizing the refcounting in the ucount
   code.
 
 - The 12 patch series "reboot: support runtime configuration of
   emergency hw_protection action" from Ahmad Fatoum improves the ability
   for a driver to perform an emergency system shutdown or reboot.
 
 - The 16 patch series "Converge on using secs_to_jiffies() part two"
   from Easwar Hariharan performs further migrations from
   msecs_to_jiffies() to secs_to_jiffies().
 
 - The 7 patch series "lib/interval_tree: add some test cases and
   cleanup" from Wei Yang permits more userspace testing of kernel library
   code, adds some more tests and performs some cleanups.
 
 - The 2 patch series "hung_task: Dump the blocking task stacktrace" from
   Masami Hiramatsu arranges for the hung_task detector to dump the stack
   of the blocking task and not just that of the blocked task.
 
 - The 4 patch series "resource: Split and use DEFINE_RES*() macros" from
   Andy Shevchenko provides some cleanups to the resource definition
   macros.
 
 - Plus the usual shower of singleton patches - please see the individual
   changelogs for details.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZ+nuqwAKCRDdBJ7gKXxA
 jtNqAQDxqJpjWkzn4yN9CNSs1ivVx3fr6SqazlYCrt3u89WQvwEA1oRrGpETzUGq
 r6khQUIcQImPPcjFqEFpuiSOU0MBZA0=
 =Kii8
 -----END PGP SIGNATURE-----

Merge tag 'mm-nonmm-stable-2025-03-30-18-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull non-MM updates from Andrew Morton:

 - The series "powerpc/crash: use generic crashkernel reservation" from
   Sourabh Jain changes powerpc's kexec code to use more of the generic
   layers.

 - The series "get_maintainer: report subsystem status separately" from
   Vlastimil Babka makes some long-requested improvements to the
   get_maintainer output.

 - The series "ucount: Simplify refcounting with rcuref_t" from
   Sebastian Siewior cleans up and optimizing the refcounting in the
   ucount code.

 - The series "reboot: support runtime configuration of emergency
   hw_protection action" from Ahmad Fatoum improves the ability for a
   driver to perform an emergency system shutdown or reboot.

 - The series "Converge on using secs_to_jiffies() part two" from Easwar
   Hariharan performs further migrations from msecs_to_jiffies() to
   secs_to_jiffies().

 - The series "lib/interval_tree: add some test cases and cleanup" from
   Wei Yang permits more userspace testing of kernel library code, adds
   some more tests and performs some cleanups.

 - The series "hung_task: Dump the blocking task stacktrace" from Masami
   Hiramatsu arranges for the hung_task detector to dump the stack of
   the blocking task and not just that of the blocked task.

 - The series "resource: Split and use DEFINE_RES*() macros" from Andy
   Shevchenko provides some cleanups to the resource definition macros.

 - Plus the usual shower of singleton patches - please see the
   individual changelogs for details.

* tag 'mm-nonmm-stable-2025-03-30-18-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (77 commits)
  mailmap: consolidate email addresses of Alexander Sverdlin
  fs/procfs: fix the comment above proc_pid_wchan()
  relay: use kasprintf() instead of fixed buffer formatting
  resource: replace open coded variant of DEFINE_RES()
  resource: replace open coded variants of DEFINE_RES_*_NAMED()
  resource: replace open coded variant of DEFINE_RES_NAMED_DESC()
  resource: split DEFINE_RES_NAMED_DESC() out of DEFINE_RES_NAMED()
  samples: add hung_task detector mutex blocking sample
  hung_task: show the blocker task if the task is hung on mutex
  kexec_core: accept unaccepted kexec segments' destination addresses
  watchdog/perf: optimize bytes copied and remove manual NUL-termination
  lib/interval_tree: fix the comment of interval_tree_span_iter_next_gap()
  lib/interval_tree: skip the check before go to the right subtree
  lib/interval_tree: add test case for span iteration
  lib/interval_tree: add test case for interval_tree_iter_xxx() helpers
  lib/rbtree: add random seed
  lib/rbtree: split tests
  lib/rbtree: enable userland test suite for rbtree related data structure
  checkpatch: describe --min-conf-desc-length
  scripts/gdb/symbols: determine KASLR offset on s390
  ...
2025-04-01 10:06:52 -07:00
Linus Torvalds
eb0ece1602 - The 6 patch series "Enable strict percpu address space checks" from
Uros Bizjak uses x86 named address space qualifiers to provide
   compile-time checking of percpu area accesses.
 
   This has caused a small amount of fallout - two or three issues were
   reported.  In all cases the calling code was founf to be incorrect.
 
 - The 4 patch series "Some cleanup for memcg" from Chen Ridong
   implements some relatively monir cleanups for the memcontrol code.
 
 - The 17 patch series "mm: fixes for device-exclusive entries (hmm)"
   from David Hildenbrand fixes a boatload of issues which David found then
   using device-exclusive PTE entries when THP is enabled.  More work is
   needed, but this makes thins better - our own HMM selftests now succeed.
 
 - The 2 patch series "mm: zswap: remove z3fold and zbud" from Yosry
   Ahmed remove the z3fold and zbud implementations.  They have been
   deprecated for half a year and nobody has complained.
 
 - The 5 patch series "mm: further simplify VMA merge operation" from
   Lorenzo Stoakes implements numerous simplifications in this area.  No
   runtime effects are anticipated.
 
 - The 4 patch series "mm/madvise: remove redundant mmap_lock operations
   from process_madvise()" from SeongJae Park rationalizes the locking in
   the madvise() implementation.  Performance gains of 20-25% were observed
   in one MADV_DONTNEED microbenchmark.
 
 - The 12 patch series "Tiny cleanup and improvements about SWAP code"
   from Baoquan He contains a number of touchups to issues which Baoquan
   noticed when working on the swap code.
 
 - The 2 patch series "mm: kmemleak: Usability improvements" from Catalin
   Marinas implements a couple of improvements to the kmemleak user-visible
   output.
 
 - The 2 patch series "mm/damon/paddr: fix large folios access and
   schemes handling" from Usama Arif provides a couple of fixes for DAMON's
   handling of large folios.
 
 - The 3 patch series "mm/damon/core: fix wrong and/or useless
   damos_walk() behaviors" from SeongJae Park fixes a few issues with the
   accuracy of kdamond's walking of DAMON regions.
 
 - The 3 patch series "expose mapping wrprotect, fix fb_defio use" from
   Lorenzo Stoakes changes the interaction between framebuffer deferred-io
   and core MM.  No functional changes are anticipated - this is
   preparatory work for the future removal of page structure fields.
 
 - The 4 patch series "mm/damon: add support for hugepage_size DAMOS
   filter" from Usama Arif adds a DAMOS filter which permits the filtering
   by huge page sizes.
 
 - The 4 patch series "mm: permit guard regions for file-backed/shmem
   mappings" from Lorenzo Stoakes extends the guard region feature from its
   present "anon mappings only" state.  The feature now covers shmem and
   file-backed mappings.
 
 - The 4 patch series "mm: batched unmap lazyfree large folios during
   reclamation" from Barry Song cleans up and speeds up the unmapping for
   pte-mapped large folios.
 
 - The 18 patch series "reimplement per-vma lock as a refcount" from
   Suren Baghdasaryan puts the vm_lock back into the vma.  Our reasons for
   pulling it out were largely bogus and that change made the code more
   messy.  This patchset provides small (0-10%) improvements on one
   microbenchmark.
 
 - The 5 patch series "Docs/mm/damon: misc DAMOS filters documentation
   fixes and improves" from SeongJae Park does some maintenance work on the
   DAMON docs.
 
 - The 27 patch series "hugetlb/CMA improvements for large systems" from
   Frank van der Linden addresses a pile of issues which have been observed
   when using CMA on large machines.
 
 - The 2 patch series "mm/damon: introduce DAMOS filter type for unmapped
   pages" from SeongJae Park enables users of DMAON/DAMOS to filter my the
   page's mapped/unmapped status.
 
 - The 19 patch series "zsmalloc/zram: there be preemption" from Sergey
   Senozhatsky teaches zram to run its compression and decompression
   operations preemptibly.
 
 - The 12 patch series "selftests/mm: Some cleanups from trying to run
   them" from Brendan Jackman fixes a pile of unrelated issues which
   Brendan encountered while runnimg our selftests.
 
 - The 2 patch series "fs/proc/task_mmu: add guard region bit to pagemap"
   from Lorenzo Stoakes permits userspace to use /proc/pid/pagemap to
   determine whether a particular page is a guard page.
 
 - The 7 patch series "mm, swap: remove swap slot cache" from Kairui Song
   removes the swap slot cache from the allocation path - it simply wasn't
   being effective.
 
 - The 5 patch series "mm: cleanups for device-exclusive entries (hmm)"
   from David Hildenbrand implements a number of unrelated cleanups in this
   code.
 
 - The 5 patch series "mm: Rework generic PTDUMP configs" from Anshuman
   Khandual implements a number of preparatoty cleanups to the
   GENERIC_PTDUMP Kconfig logic.
 
 - The 8 patch series "mm/damon: auto-tune aggregation interval" from
   SeongJae Park implements a feedback-driven automatic tuning feature for
   DAMON's aggregation interval tuning.
 
 - The 5 patch series "Fix lazy mmu mode" from Ryan Roberts fixes some
   issues in powerpc, sparc and x86 lazy MMU implementations.  Ryan did
   this in preparation for implementing lazy mmu mode for arm64 to optimize
   vmalloc.
 
 - The 2 patch series "mm/page_alloc: Some clarifications for migratetype
   fallback" from Brendan Jackman reworks some commentary to make the code
   easier to follow.
 
 - The 3 patch series "page_counter cleanup and size reduction" from
   Shakeel Butt cleans up the page_counter code and fixes a size increase
   which we accidentally added late last year.
 
 - The 3 patch series "Add a command line option that enables control of
   how many threads should be used to allocate huge pages" from Thomas
   Prescher does that.  It allows the careful operator to significantly
   reduce boot time by tuning the parallalization of huge page
   initialization.
 
 - The 3 patch series "Fix calculations in trace_balance_dirty_pages()
   for cgwb" from Tang Yizhou fixes the tracing output from the dirty page
   balancing code.
 
 - The 9 patch series "mm/damon: make allow filters after reject filters
   useful and intuitive" from SeongJae Park improves the handling of allow
   and reject filters.  Behaviour is made more consistent and the
   documention is updated accordingly.
 
 - The 5 patch series "Switch zswap to object read/write APIs" from Yosry
   Ahmed updates zswap to the new object read/write APIs and thus permits
   the removal of some legacy code from zpool and zsmalloc.
 
 - The 6 patch series "Some trivial cleanups for shmem" from Baolin Wang
   does as it claims.
 
 - The 20 patch series "fs/dax: Fix ZONE_DEVICE page reference counts"
   from Alistair Popple regularizes the weird ZONE_DEVICE page refcount
   handling in DAX, permittig the removal of a number of special-case
   checks.
 
 - The 4 patch series "refactor mremap and fix bug" from Lorenzo Stoakes
   is a preparatoty refactoring and cleanup of the mremap() code.
 
 - The 20 patch series "mm: MM owner tracking for large folios (!hugetlb)
   + CONFIG_NO_PAGE_MAPCOUNT" from David Hildenbrand reworks the manner in
   which we determine whether a large folio is known to be mapped
   exclusively into a single MM.
 
 - The 8 patch series "mm/damon: add sysfs dirs for managing DAMOS
   filters based on handling layers" from SeongJae Park adds a couple of
   new sysfs directories to ease the management of DAMON/DAMOS filters.
 
 - The 13 patch series "arch, mm: reduce code duplication in mem_init()"
   from Mike Rapoport consolidates many per-arch implementations of
   mem_init() into code generic code, where that is practical.
 
 - The 13 patch series "mm/damon/sysfs: commit parameters online via
   damon_call()" from SeongJae Park continues the cleaning up of sysfs
   access to DAMON internal data.
 
 - The 3 patch series "mm: page_ext: Introduce new iteration API" from
   Luiz Capitulino reworks the page_ext initialization to fix a boot-time
   crash which was observed with an unusual combination of compile and
   cmdline options.
 
 - The 8 patch series "Buddy allocator like (or non-uniform) folio split"
   from Zi Yan reworks the code to split a folio into smaller folios.  The
   main benefit is lessened memory consumption: fewer post-split folios are
   generated.
 
 - The 2 patch series "Minimize xa_node allocation during xarry split"
   from Zi Yan reduces the number of xarray xa_nodes which are generated
   during an xarray split.
 
 - The 2 patch series "drivers/base/memory: Two cleanups" from Gavin Shan
   performs some maintenance work on the drivers/base/memory code.
 
 - The 3 patch series "Add tracepoints for lowmem reserves, watermarks
   and totalreserve_pages" from Martin Liu adds some more tracepoints to
   the page allocator code.
 
 - The 4 patch series "mm/madvise: cleanup requests validations and
   classifications" from SeongJae Park cleans up some warts which SeongJae
   observed during his earlier madvise work.
 
 - The 3 patch series "mm/hwpoison: Fix regressions in memory failure
   handling" from Shuai Xue addresses two quite serious regressions which
   Shuai has observed in the memory-failure implementation.
 
 - The 5 patch series "mm: reliable huge page allocator" from Johannes
   Weiner makes huge page allocations cheaper and more reliable by reducing
   fragmentation.
 
 - The 5 patch series "Minor memcg cleanups & prep for memdescs" from
   Matthew Wilcox is preparatory work for the future implementation of
   memdescs.
 
 - The 4 patch series "track memory used by balloon drivers" from Nico
   Pache introduces a way to track memory used by our various balloon
   drivers.
 
 - The 2 patch series "mm/damon: introduce DAMOS filter type for active
   pages" from Nhat Pham permits users to filter for active/inactive pages,
   separately for file and anon pages.
 
 - The 2 patch series "Adding Proactive Memory Reclaim Statistics" from
   Hao Jia separates the proactive reclaim statistics from the direct
   reclaim statistics.
 
 - The 2 patch series "mm/vmscan: don't try to reclaim hwpoison folio"
   from Jinjiang Tu fixes our handling of hwpoisoned pages within the
   reclaim code.
 -----BEGIN PGP SIGNATURE-----
 
 iHQEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZ+nZaAAKCRDdBJ7gKXxA
 jsOWAPiP4r7CJHMZRK4eyJOkvS1a1r+TsIarrFZtjwvf/GIfAQCEG+JDxVfUaUSF
 Ee93qSSLR1BkNdDw+931Pu0mXfbnBw==
 =Pn2K
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2025-03-30-16-52' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

 - The series "Enable strict percpu address space checks" from Uros
   Bizjak uses x86 named address space qualifiers to provide
   compile-time checking of percpu area accesses.

   This has caused a small amount of fallout - two or three issues were
   reported. In all cases the calling code was found to be incorrect.

 - The series "Some cleanup for memcg" from Chen Ridong implements some
   relatively monir cleanups for the memcontrol code.

 - The series "mm: fixes for device-exclusive entries (hmm)" from David
   Hildenbrand fixes a boatload of issues which David found then using
   device-exclusive PTE entries when THP is enabled. More work is
   needed, but this makes thins better - our own HMM selftests now
   succeed.

 - The series "mm: zswap: remove z3fold and zbud" from Yosry Ahmed
   remove the z3fold and zbud implementations. They have been deprecated
   for half a year and nobody has complained.

 - The series "mm: further simplify VMA merge operation" from Lorenzo
   Stoakes implements numerous simplifications in this area. No runtime
   effects are anticipated.

 - The series "mm/madvise: remove redundant mmap_lock operations from
   process_madvise()" from SeongJae Park rationalizes the locking in the
   madvise() implementation. Performance gains of 20-25% were observed
   in one MADV_DONTNEED microbenchmark.

 - The series "Tiny cleanup and improvements about SWAP code" from
   Baoquan He contains a number of touchups to issues which Baoquan
   noticed when working on the swap code.

 - The series "mm: kmemleak: Usability improvements" from Catalin
   Marinas implements a couple of improvements to the kmemleak
   user-visible output.

 - The series "mm/damon/paddr: fix large folios access and schemes
   handling" from Usama Arif provides a couple of fixes for DAMON's
   handling of large folios.

 - The series "mm/damon/core: fix wrong and/or useless damos_walk()
   behaviors" from SeongJae Park fixes a few issues with the accuracy of
   kdamond's walking of DAMON regions.

 - The series "expose mapping wrprotect, fix fb_defio use" from Lorenzo
   Stoakes changes the interaction between framebuffer deferred-io and
   core MM. No functional changes are anticipated - this is preparatory
   work for the future removal of page structure fields.

 - The series "mm/damon: add support for hugepage_size DAMOS filter"
   from Usama Arif adds a DAMOS filter which permits the filtering by
   huge page sizes.

 - The series "mm: permit guard regions for file-backed/shmem mappings"
   from Lorenzo Stoakes extends the guard region feature from its
   present "anon mappings only" state. The feature now covers shmem and
   file-backed mappings.

 - The series "mm: batched unmap lazyfree large folios during
   reclamation" from Barry Song cleans up and speeds up the unmapping
   for pte-mapped large folios.

 - The series "reimplement per-vma lock as a refcount" from Suren
   Baghdasaryan puts the vm_lock back into the vma. Our reasons for
   pulling it out were largely bogus and that change made the code more
   messy. This patchset provides small (0-10%) improvements on one
   microbenchmark.

 - The series "Docs/mm/damon: misc DAMOS filters documentation fixes and
   improves" from SeongJae Park does some maintenance work on the DAMON
   docs.

 - The series "hugetlb/CMA improvements for large systems" from Frank
   van der Linden addresses a pile of issues which have been observed
   when using CMA on large machines.

 - The series "mm/damon: introduce DAMOS filter type for unmapped pages"
   from SeongJae Park enables users of DMAON/DAMOS to filter my the
   page's mapped/unmapped status.

 - The series "zsmalloc/zram: there be preemption" from Sergey
   Senozhatsky teaches zram to run its compression and decompression
   operations preemptibly.

 - The series "selftests/mm: Some cleanups from trying to run them" from
   Brendan Jackman fixes a pile of unrelated issues which Brendan
   encountered while runnimg our selftests.

 - The series "fs/proc/task_mmu: add guard region bit to pagemap" from
   Lorenzo Stoakes permits userspace to use /proc/pid/pagemap to
   determine whether a particular page is a guard page.

 - The series "mm, swap: remove swap slot cache" from Kairui Song
   removes the swap slot cache from the allocation path - it simply
   wasn't being effective.

 - The series "mm: cleanups for device-exclusive entries (hmm)" from
   David Hildenbrand implements a number of unrelated cleanups in this
   code.

 - The series "mm: Rework generic PTDUMP configs" from Anshuman Khandual
   implements a number of preparatoty cleanups to the GENERIC_PTDUMP
   Kconfig logic.

 - The series "mm/damon: auto-tune aggregation interval" from SeongJae
   Park implements a feedback-driven automatic tuning feature for
   DAMON's aggregation interval tuning.

 - The series "Fix lazy mmu mode" from Ryan Roberts fixes some issues in
   powerpc, sparc and x86 lazy MMU implementations. Ryan did this in
   preparation for implementing lazy mmu mode for arm64 to optimize
   vmalloc.

 - The series "mm/page_alloc: Some clarifications for migratetype
   fallback" from Brendan Jackman reworks some commentary to make the
   code easier to follow.

 - The series "page_counter cleanup and size reduction" from Shakeel
   Butt cleans up the page_counter code and fixes a size increase which
   we accidentally added late last year.

 - The series "Add a command line option that enables control of how
   many threads should be used to allocate huge pages" from Thomas
   Prescher does that. It allows the careful operator to significantly
   reduce boot time by tuning the parallalization of huge page
   initialization.

 - The series "Fix calculations in trace_balance_dirty_pages() for cgwb"
   from Tang Yizhou fixes the tracing output from the dirty page
   balancing code.

 - The series "mm/damon: make allow filters after reject filters useful
   and intuitive" from SeongJae Park improves the handling of allow and
   reject filters. Behaviour is made more consistent and the documention
   is updated accordingly.

 - The series "Switch zswap to object read/write APIs" from Yosry Ahmed
   updates zswap to the new object read/write APIs and thus permits the
   removal of some legacy code from zpool and zsmalloc.

 - The series "Some trivial cleanups for shmem" from Baolin Wang does as
   it claims.

 - The series "fs/dax: Fix ZONE_DEVICE page reference counts" from
   Alistair Popple regularizes the weird ZONE_DEVICE page refcount
   handling in DAX, permittig the removal of a number of special-case
   checks.

 - The series "refactor mremap and fix bug" from Lorenzo Stoakes is a
   preparatoty refactoring and cleanup of the mremap() code.

 - The series "mm: MM owner tracking for large folios (!hugetlb) +
   CONFIG_NO_PAGE_MAPCOUNT" from David Hildenbrand reworks the manner in
   which we determine whether a large folio is known to be mapped
   exclusively into a single MM.

 - The series "mm/damon: add sysfs dirs for managing DAMOS filters based
   on handling layers" from SeongJae Park adds a couple of new sysfs
   directories to ease the management of DAMON/DAMOS filters.

 - The series "arch, mm: reduce code duplication in mem_init()" from
   Mike Rapoport consolidates many per-arch implementations of
   mem_init() into code generic code, where that is practical.

 - The series "mm/damon/sysfs: commit parameters online via
   damon_call()" from SeongJae Park continues the cleaning up of sysfs
   access to DAMON internal data.

 - The series "mm: page_ext: Introduce new iteration API" from Luiz
   Capitulino reworks the page_ext initialization to fix a boot-time
   crash which was observed with an unusual combination of compile and
   cmdline options.

 - The series "Buddy allocator like (or non-uniform) folio split" from
   Zi Yan reworks the code to split a folio into smaller folios. The
   main benefit is lessened memory consumption: fewer post-split folios
   are generated.

 - The series "Minimize xa_node allocation during xarry split" from Zi
   Yan reduces the number of xarray xa_nodes which are generated during
   an xarray split.

 - The series "drivers/base/memory: Two cleanups" from Gavin Shan
   performs some maintenance work on the drivers/base/memory code.

 - The series "Add tracepoints for lowmem reserves, watermarks and
   totalreserve_pages" from Martin Liu adds some more tracepoints to the
   page allocator code.

 - The series "mm/madvise: cleanup requests validations and
   classifications" from SeongJae Park cleans up some warts which
   SeongJae observed during his earlier madvise work.

 - The series "mm/hwpoison: Fix regressions in memory failure handling"
   from Shuai Xue addresses two quite serious regressions which Shuai
   has observed in the memory-failure implementation.

 - The series "mm: reliable huge page allocator" from Johannes Weiner
   makes huge page allocations cheaper and more reliable by reducing
   fragmentation.

 - The series "Minor memcg cleanups & prep for memdescs" from Matthew
   Wilcox is preparatory work for the future implementation of memdescs.

 - The series "track memory used by balloon drivers" from Nico Pache
   introduces a way to track memory used by our various balloon drivers.

 - The series "mm/damon: introduce DAMOS filter type for active pages"
   from Nhat Pham permits users to filter for active/inactive pages,
   separately for file and anon pages.

 - The series "Adding Proactive Memory Reclaim Statistics" from Hao Jia
   separates the proactive reclaim statistics from the direct reclaim
   statistics.

 - The series "mm/vmscan: don't try to reclaim hwpoison folio" from
   Jinjiang Tu fixes our handling of hwpoisoned pages within the reclaim
   code.

* tag 'mm-stable-2025-03-30-16-52' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (431 commits)
  mm/page_alloc: remove unnecessary __maybe_unused in order_to_pindex()
  x86/mm: restore early initialization of high_memory for 32-bits
  mm/vmscan: don't try to reclaim hwpoison folio
  mm/hwpoison: introduce folio_contain_hwpoisoned_page() helper
  cgroup: docs: add pswpin and pswpout items in cgroup v2 doc
  mm: vmscan: split proactive reclaim statistics from direct reclaim statistics
  selftests/mm: speed up split_huge_page_test
  selftests/mm: uffd-unit-tests support for hugepages > 2M
  docs/mm/damon/design: document active DAMOS filter type
  mm/damon: implement a new DAMOS filter type for active pages
  fs/dax: don't disassociate zero page entries
  MM documentation: add "Unaccepted" meminfo entry
  selftests/mm: add commentary about 9pfs bugs
  fork: use __vmalloc_node() for stack allocation
  docs/mm: Physical Memory: Populate the "Zones" section
  xen: balloon: update the NR_BALLOON_PAGES state
  hv_balloon: update the NR_BALLOON_PAGES state
  balloon_compaction: update the NR_BALLOON_PAGES state
  meminfo: add a per node counter for balloon drivers
  mm: remove references to folio in __memcg_kmem_uncharge_page()
  ...
2025-04-01 09:29:18 -07:00
Joanne Koong
9b17cb59a7 fuse: add default_request_timeout and max_request_timeout sysctls
Introduce two new sysctls, "default_request_timeout" and
"max_request_timeout". These control how long (in seconds) a server can
take to reply to a request. If the server does not reply by the timeout,
then the connection will be aborted. The upper bound on these sysctl
values is 65535.

"default_request_timeout" sets the default timeout if no timeout is
specified by the fuse server on mount. 0 (default) indicates no default
timeout should be enforced. If the server did specify a timeout, then
default_request_timeout will be ignored.

"max_request_timeout" sets the max amount of time the server may take to
reply to a request. 0 (default) indicates no maximum timeout. If
max_request_timeout is set and the fuse server attempts to set a
timeout greater than max_request_timeout, the system will use
max_request_timeout as the timeout. Similarly, if default_request_timeout
is greater than max_request_timeout, the system will use
max_request_timeout as the timeout. If the server does not request a
timeout and default_request_timeout is set to 0 but max_request_timeout
is set, then the timeout will be max_request_timeout.

Please note that these timeouts are not 100% precise. The request may
take roughly an extra FUSE_TIMEOUT_TIMER_FREQ seconds beyond the set max
timeout due to how it's internally implemented.

$ sysctl -a | grep fuse.default_request_timeout
fs.fuse.default_request_timeout = 0

$ echo 65536 | sudo tee /proc/sys/fs/fuse/default_request_timeout
tee: /proc/sys/fs/fuse/default_request_timeout: Invalid argument

$ echo 65535 | sudo tee /proc/sys/fs/fuse/default_request_timeout
65535

$ sysctl -a | grep fuse.default_request_timeout
fs.fuse.default_request_timeout = 65535

$ echo 0 | sudo tee /proc/sys/fs/fuse/default_request_timeout
0

$ sysctl -a | grep fuse.default_request_timeout
fs.fuse.default_request_timeout = 0

[Luis Henriques: Limit the timeout to the range [FUSE_TIMEOUT_TIMER_FREQ,
fuse_max_req_timeout]]

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Bernd Schubert <bschubert@ddn.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Reviewed-by: Luis Henriques <luis@igalia.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2025-03-31 14:59:27 +02:00
Heiko Carstens
1018424ace s390/smp: Add support for HOTPLUG_SMT
Add support for HOTPLUG_SMT. With this the s390 specific "nosmt" kernel
command line parameter handling is replaced with common code handling.

This means that just specifying "nosmt" still enables smt from an
architectural point of view, however only the primary (base) cpu can be set
online. Enabling smt during runtime via /sys/devices/system/cpu/smt/control
allows to set secondary cpus online. This way "nosmt" works like on other
architectures where enabling and disabling smt during runtime is possible.

If "nosmt=force" is specified smt is also still enabled from an
architectural point of view, but there is no way to set secondary cpus
online during runtime, also like on other architectures.

In order to disable smt from architectural point of view, which was
previously achieved with the s390 specific "nosmt" command line option,
"smt=1" can be used.

Tested-by: Mete Durlu <meted@linux.ibm.com>
Reviewed-by: Mete Durlu <meted@linux.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2025-03-31 12:20:39 +02:00
Linus Torvalds
7405c0f01a Miscellaneous x86 fixes and updates:
- Fix a large number of x86 Kconfig dependency and help text accuracy
    bugs/problems, by Mateusz Jończyk and David Heideberg.
 
  - Fix a VM_PAT interaction with fork() crash. This also touches
    core kernel code.
 
  - Fix an ORC unwinder bug for interrupt entries
 
  - Fixes and cleanups.
 
  - Fix an AMD microcode loader bug that can promote verification failures
    into success.
 
  - Add early-printk support for MMIO based UARTs on an x86 board that
    had no other serial debugging facility and also experienced early
    boot crashes.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmfnFBERHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1iVDxAAmiB4soT3/WbaWJJdeVyxEL7sOmUNOm04
 5kAVHJVK8QGdje0eWa6h7xmuQD3UOxafE2coCrOxHhZi2qpAAY6CPIIy6oIBRwZK
 gLgT5xn1CHojfm4UFC3YUOyecBRPUF2C5jfkajWdZHumyPP/sOObqvGanpQRAYd5
 bfPHEvrBpeEeS7WkATCdyF2j+I5xYflD4g/MDAsMmqasQHOnjBuFX5VBeVxxkysC
 dMsFkFpxqcA95MnnyOnxXzgOtRTY0UystX07D3Bk1pqhG9zor+mp8OynsTRCU87T
 ZPPbUr2qACNmCqEEXl+F1mAkgj5H66xE2gaJdYx0/jBAIbX8Nwih7mMxhJShVU07
 Lhc0tukmVrDoDaVIr2HsxqI8iokuYLszUjDAqEQmQDrgelL6usPYghN1b2bDSJ9r
 0hCO/s79024H/U9oMrC+CF52D5UH/fE98ipigrbKRIO/hOsoxiiniF3DG2NVWZM2
 n5nPnOdbperqjCEteN1nxQfr7XZkvP95Bwmuqqc90XH+tzKJdHruUkbm4ua7NEEz
 WKgsUIYFjeN5ZrHbJaNtHlQueTyvsyGmL1nlaLi/MaJbSXPsM/WfwvHsaKTh3NrE
 BFwEAhMZVLDHEfnFT0Ev7Mm1MGpW8MbHoRBR1+E5FWWNS4X0yGLKXWRp8diw25Tm
 W3ZVsn65E6U=
 =/qKX
 -----END PGP SIGNATURE-----

Merge tag 'x86-urgent-2025-03-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull misc x86 fixes and updates from Ingo Molnar:

 - Fix a large number of x86 Kconfig dependency and help text accuracy
   bugs/problems, by Mateusz Jończyk and David Heideberg

 - Fix a VM_PAT interaction with fork() crash. This also touches core
   kernel code

 - Fix an ORC unwinder bug for interrupt entries

 - Fixes and cleanups

 - Fix an AMD microcode loader bug that can promote verification
   failures into success

 - Add early-printk support for MMIO based UARTs on an x86 board that
   had no other serial debugging facility and also experienced early
   boot crashes

* tag 'x86-urgent-2025-03-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/microcode/AMD: Fix __apply_microcode_amd()'s return value
  x86/mm/pat: Fix VM_PAT handling when fork() fails in copy_page_range()
  x86/fpu: Update the outdated comment above fpstate_init_user()
  x86/early_printk: Add support for MMIO-based UARTs
  x86/dumpstack: Fix inaccurate unwinding from exception stacks due to misplaced assignment
  x86/entry: Fix ORC unwinder for PUSH_REGS with save_ret=1
  x86/Kconfig: Fix lists in X86_EXTENDED_PLATFORM help text
  x86/Kconfig: Correct X86_X2APIC help text
  x86/speculation: Remove the extra #ifdef around CALL_NOSPEC
  x86/Kconfig: Document release year of glibc 2.3.3
  x86/Kconfig: Make CONFIG_PCI_CNB20LE_QUIRK depend on X86_32
  x86/Kconfig: Document CONFIG_PCI_MMCONFIG
  x86/Kconfig: Update lists in X86_EXTENDED_PLATFORM
  x86/Kconfig: Move all X86_EXTENDED_PLATFORM options together
  x86/Kconfig: Always enable ARCH_SPARSEMEM_ENABLE
  x86/Kconfig: Enable X86_X2APIC by default and improve help text
2025-03-30 15:25:15 -07:00
Linus Torvalds
0ccff074d6 fwctl first pull request
fwctl is a new subsystem intended to bring some common rules and order to
 the growing pattern of exposing a secure FW interface directly to
 userspace. Unlike existing places like RDMA/DRM/VFIO/uacce that are
 exposing a device for datapath operations fwctl is focused on debugging,
 configuration and provisioning of the device. It will not have the
 necessary features like interrupt delivery to support a datapath.
 
 This concept is similar to the long standing practice in the "HW" RAID
 space of having a device specific misc device to manage the RAID
 controller FW. fwctl generalizes this notion of a companion debug and
 management interface that goes along with a dataplane implemented in an
 appropriate subsystem.
 
 There have been three LWN articles written discussing various aspects of
 this:
 
  https://lwn.net/Articles/955001/
  https://lwn.net/Articles/969383/
  https://lwn.net/Articles/990802/
 
 This pull requests includes three drivers to launch the subsystem:
 
  - CXL provides a vendor scheme for executing commands and a way to learn
    the 'command effects' (ie the security properties) of such
    commands. The fwctl driver allows access to these mechanism within the
    fwctl security model
 
  - mlx5 is family of networking products, the driver supports all current
    Mellanox HW still receiving FW feature updates. This includes RDMA
    multiprotocol NICs like ConnectX and the Bluefield family of Smart
    NICs.
 
  - AMD/Pensando Distributed Services card is a multi protocol Smart NIC
    with a multi PCI function design. fwctl works on the management PCI
    function following a 'command effects' model similar to CXL.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCZ939zQAKCRCFwuHvBreF
 YdOoAQCJq59/UC7lXU+sOsR6LISaDVTT5jAweBo0o036P9+DNAEA1iQdZ/GK2yCJ
 Ub33Xo9L+hzIpIbCouI3BtCXqymybAg=
 =f5YG
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-fwctl' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull fwctl subsystem from Jason Gunthorpe:
 "fwctl is a new subsystem intended to bring some common rules and order
  to the growing pattern of exposing a secure FW interface directly to
  userspace.

  Unlike existing places like RDMA/DRM/VFIO/uacce that are exposing a
  device for datapath operations fwctl is focused on debugging,
  configuration and provisioning of the device. It will not have the
  necessary features like interrupt delivery to support a datapath.

  This concept is similar to the long standing practice in the "HW" RAID
  space of having a device specific misc device to manage the RAID
  controller FW. fwctl generalizes this notion of a companion debug and
  management interface that goes along with a dataplane implemented in
  an appropriate subsystem.

  There have been three LWN articles written discussing various aspects
  of this:

	https://lwn.net/Articles/955001/
	https://lwn.net/Articles/969383/
	https://lwn.net/Articles/990802/

  This includes three drivers to launch the subsystem:

   - CXL provides a vendor scheme for executing commands and a way to
     learn the 'command effects' (ie the security properties) of such
     commands. The fwctl driver allows access to these mechanism within
     the fwctl security model

   - mlx5 is family of networking products, the driver supports all
     current Mellanox HW still receiving FW feature updates. This
     includes RDMA multiprotocol NICs like ConnectX and the Bluefield
     family of Smart NICs.

   - AMD/Pensando Distributed Services card is a multi protocol Smart
     NIC with a multi PCI function design. fwctl works on the management
     PCI function following a 'command effects' model similar to CXL"

* tag 'for-linus-fwctl' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (30 commits)
  pds_fwctl: add Documentation entries
  pds_fwctl: add rpc and query support
  pds_fwctl: initial driver framework
  pds_core: add new fwctl auxiliary_device
  pds_core: specify auxiliary_device to be created
  pds_core: make pdsc_auxbus_dev_del() void
  cxl: Fixup kdoc issues for include/cxl/features.h
  fwctl/cxl: Add documentation to FWCTL CXL
  cxl/test: Add Set Feature support to cxl_test
  cxl/test: Add Get Feature support to cxl_test
  cxl: Add support to handle user feature commands for set feature
  cxl: Add support to handle user feature commands for get feature
  cxl: Add support for fwctl RPC command to enable CXL feature commands
  cxl: Move cxl feature command structs to user header
  cxl: Add FWCTL support to CXL
  mlx5: Create an auxiliary device for fwctl_mlx5
  fwctl/mlx5: Support for communicating with mlx5 fw
  fwctl: Add documentation
  fwctl: FWCTL_RPC to execute a Remote Procedure Call to device firmware
  taint: Add TAINT_FWCTL
  ...
2025-03-29 10:45:20 -07:00
Linus Torvalds
7288511606 Landlock update for v6.15-rc1
-----BEGIN PGP SIGNATURE-----
 
 iIYEABYKAC4WIQSVyBthFV4iTW/VU1/l49DojIL20gUCZ+bGgBAcbWljQGRpZ2lr
 b2QubmV0AAoJEOXj0OiMgvbSKmgBAICZsmQTuKMHIXdB7kwA+BX5k++SZcyA+qHN
 0hrJTSMsAP0Uv6NpiPT4CTduqBMRbuMwNhujBczRiok6yaHDbC8eCw==
 =K8XL
 -----END PGP SIGNATURE-----

Merge tag 'landlock-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux

Pull landlock updates from Mickaël Salaün:
 "This brings two main changes to Landlock:

   - A signal scoping fix with a new interface for user space to know if
     it is compatible with the running kernel.

   - Audit support to give visibility on why access requests are denied,
     including the origin of the security policy, missing access rights,
     and description of object(s). This was designed to limit log spam
     as much as possible while still alerting about unexpected blocked
     access.

  With these changes come new and improved documentation, and a lot of
  new tests"

* tag 'landlock-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux: (36 commits)
  landlock: Add audit documentation
  selftests/landlock: Add audit tests for network
  selftests/landlock: Add audit tests for filesystem
  selftests/landlock: Add audit tests for abstract UNIX socket scoping
  selftests/landlock: Add audit tests for ptrace
  selftests/landlock: Test audit with restrict flags
  selftests/landlock: Add tests for audit flags and domain IDs
  selftests/landlock: Extend tests for landlock_restrict_self(2)'s flags
  selftests/landlock: Add test for invalid ruleset file descriptor
  samples/landlock: Enable users to log sandbox denials
  landlock: Add LANDLOCK_RESTRICT_SELF_LOG_SUBDOMAINS_OFF
  landlock: Add LANDLOCK_RESTRICT_SELF_LOG_*_EXEC_* flags
  landlock: Log scoped denials
  landlock: Log TCP bind and connect denials
  landlock: Log truncate and IOCTL denials
  landlock: Factor out IOCTL hooks
  landlock: Log file-related denials
  landlock: Log mount-related denials
  landlock: Add AUDIT_LANDLOCK_DOMAIN and log domain status
  landlock: Add AUDIT_LANDLOCK_ACCESS and log ptrace denials
  ...
2025-03-28 12:37:13 -07:00
LongPing Wei
5c5d0d7050 dm-verity: support block number limits for different ioprio classes
Calling verity_verify_io in bh for IO of all sizes is not suitable for
embedded devices. From our tests, it can improve the performance of 4K
synchronise random reads.
For example:
./fio --name=rand_read --ioengine=psync --rw=randread --bs=4K \
 --direct=1 --numjobs=8 --runtime=60 --time_based --group_reporting \
 --filename=/dev/block/mapper/xx-verity

But it will degrade the performance of 512K synchronise sequential reads
on our devices.
For example:
./fio --name=read --ioengine=psync --rw=read --bs=512K --direct=1 \
 --numjobs=8 --runtime=60 --time_based --group_reporting \
 --filename=/dev/block/mapper/xx-verity

A parameter array is introduced by this change. And users can modify the
default config by /sys/module/dm_verity/parameters/use_bh_bytes.

The default limits for NONE/RT/BE is set to 8192.
The default limits for IDLE is set to 0.

Call verity_verify_io directly when verity_end_io is not in hardirq.

Signed-off-by: LongPing Wei <weilongping@oppo.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2025-03-28 11:32:55 +01:00
Linus Torvalds
7b667acd69 powerpc updates for 6.15
- Removal of support for IBM Cell Blades
 
  - SMP support for microwatt platform
 
  - Support for inline static calls on PPC32
 
  - Enable pmu selftests for power11 platform
 
  - Enable hardware trace macro (HTM) hcall support
 
  - Support for limited address mode capability
 
  - Changes to RMA size from 512 MB to 768 MB to handle fadump
 
  - Misc fixes and cleanups
 
 Thanks to: Abhishek Dubey, Amit Machhiwal, Andreas Schwab, Arnd Bergmann,
 Athira Rajeev, Avnish Chouhan, Christophe Leroy, Disha Goel, Donet Tom, Gaurav
 Batra, Gautam Menghani, Hari Bathini, Kajol Jain, Kees Cook, Mahesh Salgaonkar,
 Michael Ellerman, Paul Mackerras, Ritesh Harjani (IBM), Sathvika Vasireddy,
 Segher Boessenkool, Sourabh Jain, Vaibhav Jain, Venkat Rao Bagalkote.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEqX2DNAOgU8sBX3pRpnEsdPSHZJQFAmfjZnYACgkQpnEsdPSH
 ZJTmKg/+NGW7wyFY9d8Iai9ncYY7GSzsMSDTaan7qg0QWOd5gjHbsdbava7TM/DW
 8p9XsC+17kSeftRNUjtc52bSN8Ei2gBdsXIagQG1alfB2X2e6wkNauifK+dz3Su6
 usMEZZTO5R/jFDotFXNM1nsUj+8dvjnPgOUrji/P8k7PT5295wpza0hz1fy5SrOA
 hM5cliBP36UgFe5Efvgm4OUX2gQIhbc3stt9MVfymW/k0Mit5f41UIPuVGiTWowY
 s0cUJGkhxUlGXT3VfOVKuZfn4u9KMha7UCl9afSceJzXOdnUIKIbskui1VEv6cD/
 iSIxi839uErAobFHlsLYprgYFciYLII3xe2qNZCA/ZxeIMS/Mm6xokESeWLhBnfa
 P7ke6l0z3GDtTvgI2eSeU9BdrVveF1NgbP9GYSKgT6gtw/kRRnxgHF8tzmLON5PT
 KXpQlzz8VuSBRtF2jnLFU89+FFwSA1bRUhDrp89HyYFqw1B5g4N7kFFTUJWHOuKS
 fwPGy+cveKehmCUBedeTRFqHvvqdwpD/WnPlQzCly3WxqdL8U/eTXYftMiAwuK28
 ovLuSs3vRThKRQ8DnUa5oB0UGsjMpRV5LdvYkhw+x8mZKUR59oj4fx2ae4TtPakg
 dbAYuPPkCORdaSga/nV6vQgsLprFpcGX3dq6E19+BVBAY5D+1PE=
 =GFcj
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-6.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Madhavan Srinivasan:

 - Remove support for IBM Cell Blades

 - SMP support for microwatt platform

 - Support for inline static calls on PPC32

 - Enable pmu selftests for power11 platform

 - Enable hardware trace macro (HTM) hcall support

 - Support for limited address mode capability

 - Changes to RMA size from 512 MB to 768 MB to handle fadump

 - Misc fixes and cleanups

Thanks to Abhishek Dubey, Amit Machhiwal, Andreas Schwab, Arnd Bergmann,
Athira Rajeev, Avnish Chouhan, Christophe Leroy, Disha Goel, Donet Tom,
Gaurav Batra, Gautam Menghani, Hari Bathini, Kajol Jain, Kees Cook,
Mahesh Salgaonkar, Michael Ellerman, Paul Mackerras, Ritesh Harjani
(IBM), Sathvika Vasireddy, Segher Boessenkool, Sourabh Jain, Vaibhav
Jain, and Venkat Rao Bagalkote.

* tag 'powerpc-6.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (61 commits)
  powerpc/kexec: fix physical address calculation in clear_utlb_entry()
  crypto: powerpc: Mark ghashp8-ppc.o as an OBJECT_FILES_NON_STANDARD
  powerpc: Fix 'intra_function_call not a direct call' warning
  powerpc/perf: Fix ref-counting on the PMU 'vpa_pmu'
  KVM: PPC: Enable CAP_SPAPR_TCE_VFIO on pSeries KVM guests
  powerpc/prom_init: Fixup missing #size-cells on PowerBook6,7
  powerpc/microwatt: Add SMP support
  powerpc: Define config option for processors with broadcast TLBIE
  powerpc/microwatt: Define an idle power-save function
  powerpc/microwatt: Device-tree updates
  powerpc/microwatt: Select COMMON_CLK in order to get the clock framework
  net: toshiba: Remove reference to PPC_IBM_CELL_BLADE
  net: spider_net: Remove powerpc Cell driver
  cpufreq: ppc_cbe: Remove powerpc Cell driver
  genirq: Remove IRQ_EDGE_EOI_HANDLER
  docs: Remove reference to removed CBE_CPUFREQ_SPU_GOVERNOR
  powerpc: Remove UDBG_RTAS_CONSOLE
  powerpc/io: Use standard barrier macros in io.c
  powerpc/io: Rename _insw_ns() etc.
  powerpc/io: Use generic raw accessors
  ...
2025-03-27 19:39:08 -07:00
Linus Torvalds
96050814a3 printk changes for 6.15
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEESH4wyp42V4tXvYsjUqAMR0iAlPIFAmflJC4ACgkQUqAMR0iA
 lPIEMBAAsIJmQPHdZUQ49tkMGMoQPU9gnrrDZbRzN9IWwMQQymwEobSEkJkrx7O3
 MFQHc5Z29cAxDqrWDCG5hrH0P3rd994pLPgC/XZC9+UbksYIkIrudCqaPaekLVaw
 BQmBZnZRe7PltOiDpx/TxQ8WQgNlAmZpLpIRTB3ePHZpikq/mf4CGcoYXkrNij1j
 E7uzurLZssJwXasRkWOVNvaOFyVcxrjn78H0JoT7mVeh2IUkWyMALX4IyDeD6Uvh
 SCe19/6W3Uyeh/jpJ6+gjuirVy/Vam/2GXeer4yKpgSQBtsqIaGyki4Cc6Xy12gU
 71nRPTggysZXNnqZ40qw68Irb9JGAo63qFDwWf3OQI+yQQWHyGZIinxr+2xupwWF
 ZgmDJtvUsWKHT4+WmB9+MVTGW7Jy8wKq2u/Haf8xCJxs2yhI0J8iUvQi3CPAMdRH
 Q8G9UJP29Lfj0hdKVT2/wVNGxZKIZ8c6v1vR6Rd1hJ0Zqp4Oj+XGr0btE8Ah5LYv
 GI/M00LCdTNT2kBbbw8lekovUIs2stiRl6hfKoQFR11oiZwFDUePLDdJH2GiJAZm
 g9fPs6YuBERfPtzUeo9a9UbSd6oPTLgLNBEklpHUTWilzO8E/Jh+ZXyvQwj7eXyq
 Hi4OnShn1uxL68o6DJfKsF2qW8S2U5g61JrU3D1CaQ98Derrp0o=
 =lNTZ
 -----END PGP SIGNATURE-----

Merge tag 'printk-for-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux

Pull printk updates from Petr Mladek:

 - New option "printk.debug_non_panic_cpus" allows to store printk
   messages from non-panic CPUs during panic. It might be useful when
   panic() fails. It is disabled by default because it increases the
   chance to see the messages printed before panic() and on the
   panic-CPU.

 - New build option "CONFIG_NULL_TTY_DEFAULT_CONSOLE" allows to build
   kernel without the virtual terminal support which prefers ttynull
   over serial console.

 - Do not unblank suspended consoles.

 - Some code clean up.

* tag 'printk-for-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
  printk/panic: Add option to allow non-panic CPUs to write to the ring buffer.
  printk: Add an option to allow ttynull to be a default console device
  printk: Check CON_SUSPEND when unblanking a console
  printk: Rename console_start to console_resume
  printk: Rename console_stop to console_suspend
  printk: Rename resume_console to console_resume_all
  printk: Rename suspend_console to console_suspend_all
2025-03-27 19:22:24 -07:00
Linus Torvalds
744fab2d9f tracing updates for v6.15:
- Add option traceoff_after_boot
 
   In order to debug kernel boot, it sometimes is helpful to enable tracing
   via the kernel command line. Unfortunately, by the time the login prompt
   appears, the trace is overwritten by the init process and other user space
   start up applications. Adding a "traceoff_after_boot" will disable tracing
   when the kernel passes control to init which will allow developers to be
   able to see the traces that occurred during boot.
 
 - Clean up the mmflags macros that display the GFP flags in trace events
 
   The macros to print the GFP flags for trace events had a bit of
   duplication. The code was restructured to remove duplication and in the
   process it also adds some flags that were missed before.
 
 - Removed some dead code and scripts/draw_functrace.py
 
   draw_functrace.py hasn't worked in years and as nobody complained
   about it, remove it.
 
 - Constify struct event_trigger_ops
 
   The event_trigger_ops is just a structure that has function pointers that
   are assigned when the variables are created. These variables should all be
   constants.
 
 - Other minor clean ups and fixes
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZ+V9IhQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qr4RAP9JhE3n69pGuOVaJTN/LGLr2Axl59n4
 KqZSZS1nUM76/gD6AxYpR7nxyxgJ7VjNkLptS9tSjJVdPDxGAl0v3eO04w4=
 =SU30
 -----END PGP SIGNATURE-----

Merge tag 'trace-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing updates from Steven Rostedt:

 - Add option traceoff_after_boot

   In order to debug kernel boot, it sometimes is helpful to enable
   tracing via the kernel command line. Unfortunately, by the time the
   login prompt appears, the trace is overwritten by the init process
   and other user space start up applications.

   Adding a "traceoff_after_boot" will disable tracing when the kernel
   passes control to init which will allow developers to be able to see
   the traces that occurred during boot.

 - Clean up the mmflags macros that display the GFP flags in trace
   events

   The macros to print the GFP flags for trace events had a bit of
   duplication. The code was restructured to remove duplication and in
   the process it also adds some flags that were missed before.

 - Removed some dead code and scripts/draw_functrace.py

   draw_functrace.py hasn't worked in years and as nobody complained
   about it, remove it.

 - Constify struct event_trigger_ops

   The event_trigger_ops is just a structure that has function pointers
   that are assigned when the variables are created. These variables
   should all be constants.

 - Other minor clean ups and fixes

* tag 'trace-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing: Replace strncpy with memcpy for fixed-length substring copy
  tracing: Fix synth event printk format for str fields
  tracing: Do not use PERF enums when perf is not defined
  tracing: Ensure module defining synth event cannot be unloaded while tracing
  tracing: fix return value in __ftrace_event_enable_disable for TRACE_REG_UNREGISTER
  tracing/osnoise: Fix possible recursive locking for cpus_read_lock()
  tracing: Align synth event print fmt
  tracing: gfp: vsprintf: Do not print "none" when using %pGg printf format
  tracepoint: Print the function symbol when tracepoint_debug is set
  tracing: Constify struct event_trigger_ops
  scripts/tracing: Remove scripts/tracing/draw_functrace.py
  tracing: Update MAINTAINERS file to include tracepoint.c
  tracing/user_events: Slightly simplify user_seq_show()
  tracing/user_events: Don't use %pK through printk
  tracing: gfp: Remove duplication of recording GFP flags
  tracing: Remove orphaned event_trace_printk
  ring-buffer: Fix typo in comment about header page pointer
  tracing: Add traceoff_after_boot option
2025-03-27 16:22:12 -07:00
Linus Torvalds
5c2a430e85 Ext4 bug fixes and cleanups, including:
* hardening against maliciously fuzzed file systems
   * backwards compatibility for the brief period when we attempted to
      ignore zero-width characters
   * avoid potentially BUG'ing if there is a file system corruption found
     during the file system unmount
   * fix free space reporting by statfs when project quotas are enabled
     and the free space is less than the remaining project quota
 
 Also improve performance when replaying a journal with a very large
 number of revoke records (applicable for Lustre volumes).
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAmflfY4ACgkQ8vlZVpUN
 gaMx7Qf/akTELvyBZ7iPCCHh2HwayuO8qLhPNqrU0TmYMFvgwgYUPcQ3BLn8CE+/
 j5UeT8XxNaLU4GJn3z+q6yW6PnNHfqZqKry9j/iPc3s1mjTslntr/xENlgu6i4Bp
 Q58xc7Pj45vdmP+xmYhRnJcefgsZMvB/N1SEHxwIP8bntZqsEvP9pI82r9Ouc8SA
 ZLQ1/K4OADmk7f3GhlPr9AtgH7O0CjlAas30h/AW77DXBQl7ZgbDsGDlgTwaGqkR
 jHcvfr6hLnWy+MUVGmlNZ2HY6iUgBPItWlYCP/fsrUdnc+CONyl5E17JPSl1QQtR
 CLYlo4xV8j1+zJ094DjhDWMKI2G7jw==
 =oudL
 -----END PGP SIGNATURE-----

Merge tag 'ext4-for_linus-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 updates from Ted Ts'o:
 "Ext4 bug fixes and cleanups, including:

   - hardening against maliciously fuzzed file systems

   - backwards compatibility for the brief period when we attempted to
     ignore zero-width characters

   - avoid potentially BUG'ing if there is a file system corruption
     found during the file system unmount

   - fix free space reporting by statfs when project quotas are enabled
     and the free space is less than the remaining project quota

  Also improve performance when replaying a journal with a very large
  number of revoke records (applicable for Lustre volumes)"

* tag 'ext4-for_linus-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (71 commits)
  ext4: fix OOB read when checking dotdot dir
  ext4: on a remount, only log the ro or r/w state when it has changed
  ext4: correct the error handle in ext4_fallocate()
  ext4: Make sb update interval tunable
  ext4: avoid journaling sb update on error if journal is destroying
  ext4: define ext4_journal_destroy wrapper
  ext4: hash: simplify kzalloc(n * 1, ...) to kzalloc(n, ...)
  jbd2: add a missing data flush during file and fs synchronization
  ext4: don't over-report free space or inodes in statvfs
  ext4: clear DISCARD flag if device does not support discard
  jbd2: remove jbd2_journal_unfile_buffer()
  ext4: reorder capability check last
  ext4: update the comment about mb_optimize_scan
  jbd2: fix off-by-one while erasing journal
  ext4: remove references to bh->b_page
  ext4: goto right label 'out_mmap_sem' in ext4_setattr()
  ext4: fix out-of-bound read in ext4_xattr_inode_dec_ref_all()
  ext4: introduce ITAIL helper
  jbd2: remove redundant function jbd2_journal_has_csum_v2or3_feature
  ext4: remove redundant function ext4_has_metadata_csum
  ...
2025-03-27 13:27:08 -07:00
Petr Mladek
f49040c7aa Merge branch 'for-6.15-console-suspend-api-cleanup' into for-linus 2025-03-27 11:09:34 +01:00
Linus Torvalds
22093997ac ata changes for 6.15
- Add 'external' to the libata.force module parameter, in order to
    allow a user to workaround broken firmware (me)
 
  - Use the str_up_down() helper in the sata_via driver (Salah Triki)
 
  - Convert the Freescale PowerQUICC SATA device tree binding to YAML
    (J. Neuschäfer)
 
  - Do not use ATAPI DMA for a device that only supports PIO (me)
 
  - Add Marvell 88SE9215 PCI device ID to the ahci driver.
    Since the controller has quirks, it cannot rely on the generic
    AHCI PCI class code entry (Daniel Kral)
 
  - Improve the return value of atapi_check_dma() (Huacai Chen)
 
  - Fix the NCQ Non-Data log not supported print to actually reference
    the correct log (me)
 
  - Make Marvel 88SE9215 prefer DMA for ATAPI devices (Huacai Chen)
 
  - Simplify the AHCI IRQ vector allocations by performing the IRQ
    vector allocations in the same function, regardless of IRQ type
    (Tomas Henzl)
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRN+ES/c4tHlMch3DzJZDGjmcZNcgUCZ+MDSwAKCRDJZDGjmcZN
 ctr2AQCu7c9GPgL6VbVIzgoTdHqN39f7YuyajWxsNKi0832CTAEAkkk2exqa1tMy
 3gFmHcKqm1PoWD2VhjU6odzHy0UtQQY=
 =61mg
 -----END PGP SIGNATURE-----

Merge tag 'ata-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux

Pull ata updates from Niklas Cassel:

 - Add 'external' to the libata.force module parameter, in order to
   allow a user to workaround broken firmware (me)

 - Use the str_up_down() helper in the sata_via driver (Salah Triki)

 - Convert the Freescale PowerQUICC SATA device tree binding to YAML
   (J. Neuschäfer)

 - Do not use ATAPI DMA for a device that only supports PIO (me)

 - Add Marvell 88SE9215 PCI device ID to the ahci driver. Since the
   controller has quirks, it cannot rely on the generic AHCI PCI class
   code entry (Daniel Kral)

 - Improve the return value of atapi_check_dma() (Huacai Chen)

 - Fix the NCQ Non-Data log not supported print to actually reference
   the correct log (me)

 - Make Marvel 88SE9215 prefer DMA for ATAPI devices (Huacai Chen)

 - Simplify the AHCI IRQ vector allocations by performing the IRQ vector
   allocations in the same function, regardless of IRQ type (Tomas
   Henzl)

* tag 'ata-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux:
  ata: ahci: simplify init function
  ahci: Marvell 88SE9215 controllers prefer DMA for ATAPI
  ata: libata: Fix NCQ Non-Data log not supported print
  ata: libata: Improve return value of atapi_check_dma()
  ahci: add PCI ID for Marvell 88SE9215 SATA Controller
  ata: libata-eh: Do not use ATAPI DMA for a device limited to PIO mode
  dt-bindings: ata: Convert fsl,pq-sata to YAML
  ata: sata_via: Use str_up_down() helper in vt6420_prereset()
  ata: libata-core: Add 'external' to the libata.force kernel parameter
2025-03-26 19:49:02 -07:00
Linus Torvalds
fb1ceb29b2 platform-drivers-x86 for v6.15-1
Highlights:
 
  - alienware-wmi: Refactor and split WMAX/legacy drivers
 
  - dell-ddv:
 
     - Correct +0.1 offset in temperature
 
     - Use the power supply extension mechanism for battery temperatures
 
  - intel/pmc:
 
     - Refactor init to mostly use a common init function
 
     - Add support for Arrow Lake U/H
 
     - Add support for Panther Lake
 
  - intel/sst:
 
     - Improve multi die handling
 
     - Prefix header search path with sysroot (fixes cross-compiling)
 
  - lenovo-wmi-hotkey-utilities: Support for mic & audio mute LEDs
 
  - samsung-galaxybook: Add driver for Samsung Galaxy Book series
 
  - wmi:
 
     - Rework WCxx/WExx ACPI method handling
 
     - Enable data block collection when the data block is set
 
  - platform/arm:
 
     - Add Huawei Matebook E Go EC driver
 
  - platform/mellanox:
 
     - Relocate to drivers/platform/mellanox/
 
     - mlxbf-bootctl: RTC battery status sysfs support
 
  - Miscellaneous cleanups / refactoring / improvements
 
 The following is an automated shortlog grouped by driver:
 
 alienware-wmi:
  -  Add a state container for LED control feature
  -  Add a state container for thermal control methods
  -  Add alienware-wmi.h
  -  Add WMI Drivers
  -  Refactor hdmi, amplifier, deepslp methods
  -  Refactor LED control methods
  -  Refactor thermal control methods
  -  Rename alienware-wmi.c
  -  Split DMI table
  -  Split the alienware-wmi driver
  -  Update alienware-wmi config entries
  -  Update header and module information
 
 amd/pmc:
  -  fix leak in probe()
  -  Move macros and structures to the PMC header file
  -  Notify user when platform does not support s0ix transition
  -  Remove unnecessary line breaks
  -  Use managed APIs for mutex
 
 amd/pmf:
  -  convert timeouts to secs_to_jiffies()
 
 amd:
  -  Use *-y instead of *-objs in Makefiles
 
 arm64:
  -  add Huawei Matebook E Go EC driver
 
 arm64: dts: qcom: gaokun3:
  -  Add Embedded Controller node
 
 compal-laptop:
  -  Do not include <linux/fb.h>
 
 dell-ddv:
  -  Fix temperature calculation
  -  Use devm_battery_hook_register
  -  Use the power supply extension mechanism
 
 dell: dell-wmi-sysman:
  -  Use *-y instead of *-objs in Makefile
 
 dell:
  -  Modify Makefile alignment
  -  Use *-y instead of *-objs in Makefile
 
 dell-uart-backlight:
  -  Make dell_uart_bl_serdev_driver static
 
 dt-bindings: platform:
  -  Add Huawei Matebook E Go EC
 
 hp-bioscfg:
  -  Replace deprecated strncpy() with strscpy()
  -  Use wmi_instance_count()
 
 hp:
  -  Use *-y instead of *-objs in Makefile
 
 hwmon:
  -  (hp-wmi-sensors) Use the WMI bus API when accessing sensors
 
 ideapad-laptop:
  -  use dev_groups to register attribute groups
 
 int3472:
  -  Call "func" "con_id" instead
 
 intel/pmc:
  -  Add Arrow Lake U/H support to intel_pmc_core driver
  -  Add Panther Lake support to intel_pmc_core
  -  Create generic_core_init() for all platforms
  -  Make tgl_core_generic_init() static
  -  Move arch specific action to init function
  -  Remove duplicate enum
  -  Remove simple init functions
  -  Remove unnecessary declarations in header
  -  Remove unneeded extern keyword in header
 
 intel:
  -  Use *-y instead of *-objs in Makefile
 
 irqdomain: platform/x86:
  -  Switch to irq_domain_create_linear()
 
 lenovo-wmi-hotkey-utilities.c:
  -  Support for mic and audio mute LEDs
 
 lenovo-yoga-tab2-pro-1380-fastcharger:
  -  Make symbol static
 
 MAINTAINERS:
  -  Add documentation reference for Mellanox platform
  -  Update ALIENWARE WMI DRIVER entry
 
 mellanox:
  -  Relocate mlx-platform driver
 
 mlxbf-bootctl:
  -  Support sysfs entries for RTC battery status
 
 mlx-platform:
  -  Change register name
  -  Cosmetic changes
 
 samsung-galaxybook:
  -  Add samsung-galaxybook driver
  -  Fix block_recording not supported logic
 
 sonypi:
  -  Use str_on_off() helper in sonypi_display_info()
 
 think-lmi:
  -  Use ACPI object when extracting strings
  -  Use WMI bus API when accessing BIOS settings
 
 thinkpad_acpi:
  -  check the return value of devm_mutex_init()
  -  convert timeouts to secs_to_jiffies()
  -  Do not include <linux/fb.h>
  -  Move HWMON initialization to tpacpi_hwmon_pdriver's probe
  -  Move subdriver initialization to tpacpi_pdriver's probe.
 
 tools/power/x86/intel-speed-select:
  -  Die ID for IO dies
  -  Fix the condition to check multi die system
  -  Prefix header search path with sysroot
  -  Prevent increasing MAX_DIE_PER_PACKAGE
  -  v1.22 release
 
 wmi:
  -  Call WCxx methods when setting data blocks
  -  Rework WCxx/WExx ACPI method handling
  -  Update documentation regarding the GUID-based API
  -  Use devres to disable the WMI device
 
 x86-android-tablets:
  -  Add select POWER_SUPPLY to Kconfig
 
 Merges:
  -  Merge branch 'fixes' into for-next
  -  Merge branch 'intel-sst' of https://github.com/spandruvada/linux-kernel into review-ilpo-next
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQSCSUwRdwTNL2MhaBlZrE9hU+XOMQUCZ+QJyQAKCRBZrE9hU+XO
 MUPYAP4sjtB8DKTIHgiQqado7PJZmdgpeJfAplitwKGe4BOcEQEA1mhuqMA5n6S+
 jvbTtRF+jiKxis/5zAtAdChScM/pIgo=
 =W+gy
 -----END PGP SIGNATURE-----

Merge tag 'platform-drivers-x86-v6.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86

Pull x86 platform drivers updates from Ilpo Järvinen:

 - alienware-wmi:
     - Refactor and split WMAX/legacy drivers

 - dell-ddv:
     - Correct +0.1 offset in temperature
     - Use the power supply extension mechanism for battery temperatures

 - intel/pmc:
     - Refactor init to mostly use a common init function
     - Add support for Arrow Lake U/H
     - Add support for Panther Lake

 - intel/sst:
     - Improve multi die handling
     - Prefix header search path with sysroot (fixes cross-compiling)

 - lenovo-wmi-hotkey-utilities:
     - Support for mic & audio mute LEDs

 - samsung-galaxybook:
     - Add driver for Samsung Galaxy Book series

 - wmi:
     - Rework WCxx/WExx ACPI method handling
     - Enable data block collection when the data block is set

 - platform/arm:
     - Add Huawei Matebook E Go EC driver

 - platform/mellanox:
     - Relocate to drivers/platform/mellanox/
     - mlxbf-bootctl:
     - RTC battery status sysfs support

 - Miscellaneous cleanups / refactoring / improvements

* tag 'platform-drivers-x86-v6.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: (75 commits)
  platform/x86: x86-android-tablets: Add select POWER_SUPPLY to Kconfig
  platform/x86/amd/pmf: convert timeouts to secs_to_jiffies()
  platform/x86: thinkpad_acpi: convert timeouts to secs_to_jiffies()
  irqdomain: platform/x86: Switch to irq_domain_create_linear()
  platform/x86/amd/pmc: fix leak in probe()
  tools/power/x86/intel-speed-select: v1.22 release
  tools/power/x86/intel-speed-select: Prefix header search path with sysroot
  tools/power/x86/intel-speed-select: Die ID for IO dies
  tools/power/x86/intel-speed-select: Fix the condition to check multi die system
  tools/power/x86/intel-speed-select: Prevent increasing MAX_DIE_PER_PACKAGE
  platform/x86/amd/pmc: Use managed APIs for mutex
  platform/x86/amd/pmc: Remove unnecessary line breaks
  platform/x86/amd/pmc: Move macros and structures to the PMC header file
  platform/x86/amd/pmc: Notify user when platform does not support s0ix transition
  platform/x86: dell-ddv: Use the power supply extension mechanism
  platform/x86: dell-ddv: Use devm_battery_hook_register
  platform/x86: dell-ddv: Fix temperature calculation
  platform/x86: thinkpad_acpi: check the return value of devm_mutex_init()
  platform/x86: samsung-galaxybook: Fix block_recording not supported logic
  platform/x86: dell-uart-backlight: Make dell_uart_bl_serdev_driver static
  ...
2025-03-26 09:54:40 -07:00
Mickaël Salaün
8e2dd47b10
landlock: Add audit documentation
Because audit is dedicated to the system administrator, create a new
entry in Documentation/admin-guide/LSM .  Extend other Landlock
documentation's pages with this new one.

Extend UAPI with the new log flags.

Extend the guiding principles with logs.

Cc: Günther Noack <gnoack@google.com>
Cc: Paul Moore <paul@paul-moore.com>
Link: https://lore.kernel.org/r/20250320190717.2287696-29-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
2025-03-26 13:59:49 +01:00
Linus Torvalds
1e26c5e28c [GIT PULL for v6.15] media updates
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+QmuaPwR3wnBdVwACF8+vY7k4RUFAmfbCbgACgkQCF8+vY7k
 4RX20A/+Lec4h7TqDC3ctPHVvZ6dN+xMjqx5RnhsJ2yo7NqrbgrZ1HAC6nNkUCAP
 ZbhVUBEVYN3pxCiA1Qj6YuainQkWW5qPIB1ACQ/spdXacluQaPuub03LXLzbn6Qh
 inojtO1v04q2LPtnl7Lv1F0aUbXQK2JPMOlByrgX/XdFYq3B91uV5Z7SeBnsSshW
 usOdh3Dv1QmIlHvlSLFlAVPAE1PdfvDmV3XmhSy4HZM+zhQjYia563F+9lj5tUe9
 F7mX7fGlh9AL2uiucH7GlW3U6+SNc0JwNr3Ra9LdBvqksU8TYp1Dt+lvqpscTyuY
 DPMZS5i/YfZPFukPNGMf7RKKP9/gaKWza684PTq5XsASUAiGwpsiyPHM2rJYXf9u
 kAzLJH7vk7sLdssgjFONwOLJlD93EXclT7myDpNCbAyD7tni7iTfGyxmNYFFLc/w
 NzqfowZfHSxYwDYzoXUGcU2lksMqWrHF+8oO2de0NUxhrLjawcGj3acCe1R4zU0W
 CoKDCI8YXoBI2YNrF1Nvnca/2qkPXi1IKyVdiDGMv4aeuja2hB8Z2IKssSP8Jjlp
 ui97q2MR/WwYAKkNTyH8F0kV6x6E5FqVbVAm6wp+d4uzi2QSGPFD8kuXwTmohYGa
 ZbVXBtjM3ZxO5rb/LsCPAPy9USR3J3Jkt5rVfaRo+EusU+8Iqig=
 =RsqI
 -----END PGP SIGNATURE-----

Merge tag 'media/v6.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media

Pull media updates from Mauro Carvalho Chehab:

 - platform: synopsys: hdmirx: Fix 64-bit division for 32-bit targets

 - vim2m: print device name after registering device

 - Synopsys DesignWare HDMI RX Driver and various fixes

 - cec/printk fixes and the removal of the vidioc_g/s_ctrl and
   vidioc_queryctrl callbacks

 - AVerMedia H789-C PCIe support and rc-core structs padding

 - Several camera sensor patches

 - uvcvideo improvements

 - visl: Fix ERANGE error when setting enum controls

 - codec fixes

 - V4L2 camera sensor patches mostly

 - chips-media: wave5: Fixes

 - Add SDM670 camera subsystem

 - Qualcomm iris video decoder driver

 - dt-bindings: update clocks for sc7280-camss

 - various fixes and enhancements

* tag 'media/v6.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (264 commits)
  media: pci: mgb4: include linux/errno.h
  media: synopsys: hdmirx: Fix signedness bug in hdmirx_parse_dt()
  media: platform: synopsys: hdmirx: Fix 64-bit division for 32-bit targets
  media: vim2m: print device name after registering device
  media: vivid: Introduce VIDEO_VIVID_OSD
  media: vivid: Move all fb_info references into vivid-osd
  media: platform: synopsys: hdmirx: Optimize struct snps_hdmirx_dev
  media: platform: synopsys: hdmirx: Remove unused HDMI audio CODEC relics
  media: platform: synopsys: hdmirx: Remove duplicated header inclusion
  media: qcom: Clean up Kconfig dependencies
  media: dvb-frontends: tda10048: Make the range of z explicit.
  media: platform: stm32: Add check for clk_enable()
  media: xilinx-tpg: fix double put in xtpg_parse_of()
  media: siano: Fix error handling in smsdvb_module_init()
  media: c8sectpfe: Call of_node_put(i2c_bus) only once in c8sectpfe_probe()
  media: i2c: tda1997x: Call of_node_put(ep) only once in tda1997x_parse_dt()
  dt-bindings: media: mediatek,vcodec: Revise description
  dt-bindings: media: mediatek,jpeg: Relax IOMMU max item count
  media: v4l2-dv-timings: prevent possible overflow in v4l2_detect_gtf()
  media: rockchip: rga: fix rga offset lookup
  ...
2025-03-25 21:00:31 -07:00
Linus Torvalds
7d20aa5c32 Power management updates for 6.15-rc1
- Manage sysfs attributes and boost frequencies efficiently from
    cpufreq core to reduce boilerplate code in drivers (Viresh Kumar).
 
  - Minor cleanups to cpufreq drivers (Aaron Kling, Benjamin Schneider,
    Dhananjay Ugwekar, Imran Shaik, zuoqian).
 
  - Migrate some cpufreq drivers to using for_each_present_cpu() (Jacky
    Bai).
 
  - cpufreq-qcom-hw DT binding fixes (Krzysztof Kozlowski).
 
  - Use str_enable_disable() helper in cpufreq_online() (Lifeng Zheng).
 
  - Optimize the amd-pstate driver to avoid cases where call paths end
    up calling the same writes multiple times and needlessly caching
    variables through code reorganization, locking overhaul and tracing
    adjustments (Mario Limonciello, Dhananjay Ugwekar).
 
  - Make it possible to avoid enabling capacity-aware scheduling (CAS) in
    the intel_pstate driver and relocate a check for out-of-band (OOB)
    platform handling in it to make it detect OOB before checking HWP
    availability (Rafael Wysocki).
 
  - Fix dbs_update() to avoid inadvertent conversions of negative integer
    values to unsigned int which causes CPU frequency selection to be
    inaccurate in some cases when the "conservative" cpufreq governor is
    in use (Jie Zhan).
 
  - Update the handling of the most recent idle intervals in the menu
    cpuidle governor to prevent useful information from being discarded
    by it in some cases and improve the prediction accuracy (Rafael
    Wysocki).
 
  - Make it possible to tell the intel_idle driver to ignore its built-in
    table of idle states for the given processor, clean up the handling
    of auto-demotion disabling on Baytrail and Cherrytrail chips in it,
    and update its MAINTAINERS entry (David Arcari, Artem Bityutskiy,
    Rafael Wysocki).
 
  - Make some cpuidle drivers use for_each_present_cpu() instead of
    for_each_possible_cpu() during initialization to avoid issues
    occurring when nosmp or maxcpus=0 are used (Jacky Bai).
 
  - Clean up the Energy Model handling code somewhat (Rafael Wysocki).
 
  - Use kfree_rcu() to simplify the handling of runtime Energy Model
    updates (Li RongQing).
 
  - Add an entry for the Energy Model framework to MAINTAINERS as
    properly maintained (Lukasz Luba).
 
  - Address RCU-related sparse warnings in the Energy Model code (Rafael
    Wysocki).
 
  - Remove ENERGY_MODEL dependency on SMP and allow it to be selected
    when DEVFREQ is set without CPUFREQ so it can be used on a wider
    range of systems (Jeson Gao).
 
  - Unify error handling during runtime suspend and runtime resume in the
    core to help drivers to implement more consistent runtime PM error
    handling (Rafael Wysocki).
 
  - Drop a redundant check from pm_runtime_force_resume() and rearrange
    documentation related to __pm_runtime_disable() (Rafael Wysocki).
 
  - Rework the handling of the "smart suspend" driver flag in the PM core
    to avoid issues hat may occur when drivers using it depend on some
    other drivers and clean up the related PM core code (Rafael Wysocki,
    Colin Ian King).
 
  - Fix the handling of devices with the power.direct_complete flag set
    if device_suspend() returns an error for at least one device to avoid
    situations in which some of them may not be resumed (Rafael Wysocki).
 
  - Use mutex_trylock() in hibernate_compressor_param_set() to avoid a
    possible deadlock that may occur if the "compressor" hibernation
    module parameter is accessed during the registration of a new
    ieee80211 device (Lizhi Xu).
 
  - Suppress sleeping parent warning in device_pm_add() in the case when
    new children are added under a device with the power.direct_complete
    set after it has been processed by device_resume() (Xu Yang).
 
  - Remove needless return in three void functions related to system
    wakeup (Zijun Hu).
 
  - Replace deprecated kmap_atomic() with kmap_local_page() in the
    hibernation core code (David Reaver).
 
  - Remove unused helper functions related to system sleep (David Alan
    Gilbert).
 
  - Clean up s2idle_enter() so it does not lock and unlock CPU offline
    in vain and update comments in it (Ulf Hansson).
 
  - Clean up broken white space in dpm_wait_for_children() (Geert
    Uytterhoeven).
 
  - Update the cpupower utility to fix lib version-ing in it and memory
    leaks in error legs, remove hard-coded values, and implement CPU
    physical core querying (Thomas Renninger, John B. Wyatt IV, Shuah
    Khan, Yiwei Lin, Zhongqiu Han).
 -----BEGIN PGP SIGNATURE-----
 
 iQFGBAABCAAwFiEEcM8Aw/RY0dgsiRUR7l+9nS/U47UFAmfhhTYSHHJqd0Byand5
 c29ja2kubmV0AAoJEO5fvZ0v1OO16/gIAKuRiG1fFgUcUSXC1iFu42vrB/1i4wpA
 02GICACqM3K6/5jd3ct/WOU28GUgDs+xcmqH7CnMaM6y9nXEWjWarmSfFekAO+0q
 TPtQ7xTy0hBCB3he1P2uLKBJBin4Wn47U9/rvs4J7mQd5zDxTINKIiVoHg2lEE+s
 HAeSoNRb2sp5IZDm9+/LfhHNYRP1mJ97cbZlymqctGB3xgDL7qMLid/1+gFPHAQS
 4/LXj3IgyU8DpA/j5nhtpaAqjN5g2QxIUfQgADRIcESK99Y/7aAMs1/G0WhJKaay
 9yx+4/xmkGvVCZQx1DphksFLISEzltY0SFWLsoppPzBTGVEW2GQQsNI=
 =LqVy
 -----END PGP SIGNATURE-----

Merge tag 'pm-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management updates from Rafael Wysocki:
 "These are dominated by cpufreq updates which in turn are dominated by
  updates related to boost support in the core and drivers and
  amd-pstate driver optimizations.

  Apart from the above, there are some cpuidle updates including a
  rework of the most recent idle intervals handling in the venerable
  menu governor that leads to significant improvements in some
  performance benchmarks, as the governor is now more likely to predict
  a shorter idle duration in some cases, and there are updates of the
  core device power management code, mostly related to system suspend
  and resume, that should help to avoid potential issues arising when
  the drivers of devices depending on one another want to use different
  optimizations.

  There is also a usual collection of assorted fixes and cleanups,
  including removal of some unused code.

  Specifics:

   - Manage sysfs attributes and boost frequencies efficiently from
     cpufreq core to reduce boilerplate code in drivers (Viresh Kumar)

   - Minor cleanups to cpufreq drivers (Aaron Kling, Benjamin Schneider,
     Dhananjay Ugwekar, Imran Shaik, zuoqian)

   - Migrate some cpufreq drivers to using for_each_present_cpu() (Jacky
     Bai)

   - cpufreq-qcom-hw DT binding fixes (Krzysztof Kozlowski)

   - Use str_enable_disable() helper in cpufreq_online() (Lifeng Zheng)

   - Optimize the amd-pstate driver to avoid cases where call paths end
     up calling the same writes multiple times and needlessly caching
     variables through code reorganization, locking overhaul and tracing
     adjustments (Mario Limonciello, Dhananjay Ugwekar)

   - Make it possible to avoid enabling capacity-aware scheduling (CAS)
     in the intel_pstate driver and relocate a check for out-of-band
     (OOB) platform handling in it to make it detect OOB before checking
     HWP availability (Rafael Wysocki)

   - Fix dbs_update() to avoid inadvertent conversions of negative
     integer values to unsigned int which causes CPU frequency selection
     to be inaccurate in some cases when the "conservative" cpufreq
     governor is in use (Jie Zhan)

   - Update the handling of the most recent idle intervals in the menu
     cpuidle governor to prevent useful information from being discarded
     by it in some cases and improve the prediction accuracy (Rafael
     Wysocki)

   - Make it possible to tell the intel_idle driver to ignore its
     built-in table of idle states for the given processor, clean up the
     handling of auto-demotion disabling on Baytrail and Cherrytrail
     chips in it, and update its MAINTAINERS entry (David Arcari, Artem
     Bityutskiy, Rafael Wysocki)

   - Make some cpuidle drivers use for_each_present_cpu() instead of
     for_each_possible_cpu() during initialization to avoid issues
     occurring when nosmp or maxcpus=0 are used (Jacky Bai)

   - Clean up the Energy Model handling code somewhat (Rafael Wysocki)

   - Use kfree_rcu() to simplify the handling of runtime Energy Model
     updates (Li RongQing)

   - Add an entry for the Energy Model framework to MAINTAINERS as
     properly maintained (Lukasz Luba)

   - Address RCU-related sparse warnings in the Energy Model code
     (Rafael Wysocki)

   - Remove ENERGY_MODEL dependency on SMP and allow it to be selected
     when DEVFREQ is set without CPUFREQ so it can be used on a wider
     range of systems (Jeson Gao)

   - Unify error handling during runtime suspend and runtime resume in
     the core to help drivers to implement more consistent runtime PM
     error handling (Rafael Wysocki)

   - Drop a redundant check from pm_runtime_force_resume() and rearrange
     documentation related to __pm_runtime_disable() (Rafael Wysocki)

   - Rework the handling of the "smart suspend" driver flag in the PM
     core to avoid issues hat may occur when drivers using it depend on
     some other drivers and clean up the related PM core code (Rafael
     Wysocki, Colin Ian King)

   - Fix the handling of devices with the power.direct_complete flag set
     if device_suspend() returns an error for at least one device to
     avoid situations in which some of them may not be resumed (Rafael
     Wysocki)

   - Use mutex_trylock() in hibernate_compressor_param_set() to avoid a
     possible deadlock that may occur if the "compressor" hibernation
     module parameter is accessed during the registration of a new
     ieee80211 device (Lizhi Xu)

   - Suppress sleeping parent warning in device_pm_add() in the case
     when new children are added under a device with the
     power.direct_complete set after it has been processed by
     device_resume() (Xu Yang)

   - Remove needless return in three void functions related to system
     wakeup (Zijun Hu)

   - Replace deprecated kmap_atomic() with kmap_local_page() in the
     hibernation core code (David Reaver)

   - Remove unused helper functions related to system sleep (David Alan
     Gilbert)

   - Clean up s2idle_enter() so it does not lock and unlock CPU offline
     in vain and update comments in it (Ulf Hansson)

   - Clean up broken white space in dpm_wait_for_children() (Geert
     Uytterhoeven)

   - Update the cpupower utility to fix lib version-ing in it and memory
     leaks in error legs, remove hard-coded values, and implement CPU
     physical core querying (Thomas Renninger, John B. Wyatt IV, Shuah
     Khan, Yiwei Lin, Zhongqiu Han)"

* tag 'pm-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (139 commits)
  PM: sleep: Fix bit masking operation
  dt-bindings: cpufreq: cpufreq-qcom-hw: Narrow properties on SDX75, SA8775p and SM8650
  dt-bindings: cpufreq: cpufreq-qcom-hw: Drop redundant minItems:1
  dt-bindings: cpufreq: cpufreq-qcom-hw: Add missing constraint for interrupt-names
  dt-bindings: cpufreq: cpufreq-qcom-hw: Add QCS8300 compatible
  cpufreq: Init cpufreq only for present CPUs
  PM: sleep: Fix handling devices with direct_complete set on errors
  cpuidle: Init cpuidle only for present CPUs
  PM: clk: Remove unused pm_clk_remove()
  PM: sleep: core: Fix indentation in dpm_wait_for_children()
  PM: s2idle: Extend comment in s2idle_enter()
  PM: s2idle: Drop redundant locks when entering s2idle
  PM: sleep: Remove unused pm_generic_ wrappers
  cpufreq: tegra186: Share policy per cluster
  cpupower: Make lib versioning scheme more obvious and fix version link
  PM: EM: Rework the depends on for CONFIG_ENERGY_MODEL
  PM: EM: Address RCU-related sparse warnings
  cpupower: Implement CPU physical core querying
  pm: cpupower: remove hard-coded topology depth values
  pm: cpupower: Fix cmd_monitor() error legs to free cpu_topology
  ...
2025-03-25 15:00:18 -07:00
Linus Torvalds
21e0ff5b10 ACPI updates for 6.15-rc1
- Use the str_on_off() helper function instead of hard-coded strings in
    the ACPI power resources handling code (Thorsten Blum).
 
  - Add fan speed reporting for ACPI fans that have _FST, but otherwise
    do not support the entire ACPI 4 fan interface (Joshua Grisham).
 
  - Fix a stale comment regarding trip points in acpi_thermal_add() that
    diverged from the commented code after removing _CRT evaluation from
    acpi_thermal_get_trip_points() (xueqin Luo).
 
  - Make ACPI button driver also subscribe to system events (Mario
    Limonciello).
 
  - Use the str_yes_no() helper function instead of hard-coded strings in
    the ACPI backlight (video) driver (Thorsten Blum).
 
  - Add a missing header file include to the x86 arch CPPC code (Mario
    Limonciello).
 
  - Rework the sysfs attributes implementation in the ACPI platform-profile
    driver and improve the unregistration code in it (Nathan Chancellor,
    Kurt Borja).
 
  - Prevent the ACPI HED driver from being built as a module and change
    its initcall level to subsys_initcall to avoid initialization ordering
    issues related to it (Xiaofei Tan).
 
  - Update a maintainer email address in the ACPI PMIC entry in
    MAINTAINERS (Mika Westerberg).
 
  - Address a GCC 15's -Wunterminated-string-initialization warning in
    the core PNP subsystem code and remove some dead code from it (Kees
    Cook, David Alan Gilbert).
 -----BEGIN PGP SIGNATURE-----
 
 iQFGBAABCAAwFiEEcM8Aw/RY0dgsiRUR7l+9nS/U47UFAmfhhZYSHHJqd0Byand5
 c29ja2kubmV0AAoJEO5fvZ0v1OO1yCMH/3IbFftpA2sJg504igRgLdMzDAhTc/Y3
 x8Y37v1e1Psxyp0SQni84H4E11QWaytSXngemnp39+LgN+14KW243z6v+PBGioyI
 +cdJw7kAk8v1aX+Ujel2Z3BIz9QPFqZd6d3R0AsZkZcI/28VW7kHNRXZ6p2kYmxK
 7acx0Y1cM1k0UotzpzQ4RDaTnFNKUGJFQwdKTEJU237gsFrVh8ev2hLm9Hy4FU6R
 zGtjrU2/oBEmoCVgrXG4n6bZYP4dZwCZ8ewckIvepGuTPTHP8tYMxrELw/3A1901
 +lnN6zK6nMpvCd9cl0ongT5iFG4gsuBGanvnP7Mf51YI380jmNSLeCI=
 =/b+r
 -----END PGP SIGNATURE-----

Merge tag 'acpi-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI updates from Rafael Wysocki:
 "From the functional perspective, the most significant changes here are
  the ACPI fan driver update allowing it to handle fans with
  fine-grained state checking supported, but without fine-grained
  control, and the ACPI button driver update making it subscribe to
  system event notifications (in addition to device notifications) which
  on some systems is requisite for waking up the system from sleep.

  The rest is fixes and cleanups including removal of some dead code.

  Specifics:

   - Use the str_on_off() helper function instead of hard-coded strings
     in the ACPI power resources handling code (Thorsten Blum)

   - Add fan speed reporting for ACPI fans that have _FST, but otherwise
     do not support the entire ACPI 4 fan interface (Joshua Grisham)

   - Fix a stale comment regarding trip points in acpi_thermal_add()
     that diverged from the commented code after removing _CRT
     evaluation from acpi_thermal_get_trip_points() (xueqin Luo)

   - Make ACPI button driver also subscribe to system events (Mario
     Limonciello)

   - Use the str_yes_no() helper function instead of hard-coded strings
     in the ACPI backlight (video) driver (Thorsten Blum)

   - Add a missing header file include to the x86 arch CPPC code (Mario
     Limonciello)

   - Rework the sysfs attributes implementation in the ACPI
     platform-profile driver and improve the unregistration code in it
     (Nathan Chancellor, Kurt Borja)

   - Prevent the ACPI HED driver from being built as a module and change
     its initcall level to subsys_initcall to avoid initialization
     ordering issues related to it (Xiaofei Tan)

   - Update a maintainer email address in the ACPI PMIC entry in
     MAINTAINERS (Mika Westerberg)

   - Address a GCC 15's -Wunterminated-string-initialization warning in
     the core PNP subsystem code and remove some dead code from it (Kees
     Cook, David Alan Gilbert)"

* tag 'acpi-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  PNP: Expand length of fixup id string
  PNP: Remove prehistoric deadcode
  ACPI: button: Install notifier for system events as well
  ACPI: fan: Add fan speed reporting for fans with only _FST
  ACPI: HED: Always initialize before evged
  x86/ACPI: CPPC: Add missing include
  ACPI: video: Use str_yes_no() helper in acpi_video_bus_add()
  ACPI: platform_profile: Improve platform_profile_unregister()
  ACPI: platform-profile: Fix CFI violation when accessing sysfs files
  ACPI: power: Use str_on_off() helper function
  ACPI: thermal: Fix stale comment regarding trip points
  MAINTAINERS: Use my kernel.org address for ACPI PMIC work
2025-03-25 14:56:33 -07:00
Linus Torvalds
906174776c - Some preparatory work to convert the mitigations machinery to mitigating
attack vectors instead of single vulnerabilities
 
 - Untangle and remove a now unneeded X86_FEATURE_USE_IBPB flag
 
 - Add support for a Zen5-specific SRSO mitigation
 
 - Cleanups and minor improvements
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmfixS0ACgkQEsHwGGHe
 VUpi1xAAgvH2u8Eo8ibT5dABQpD65w3oQiykO+9aDpObG9w9beDVGlld8DJE61Rz
 6tcE0Clp2H/tMcCbn8zXIJ92TQ3wIX/85uZwLi1VEM1Tx7A6VtAbPv8WKfZE3FCX
 9v92HRKnK3ql+A2ZR+oyy+/8RedUmia7y7/bXH1H7Zf2uozoKkmq5cQnwfq5iU4A
 qNiKuvSlQwjZ8Zz6Ax1ugHUkE4R7mlKh8rccLXl4+mVr63/lkPHSY3OFTjcYf4HW
 Ir92N86Spfo0/l0vsOOsWoYKmoaiVP7ouJh7YbKR3B0BGN0pt2MT476mehkEs427
 m4J6XhRKhIrsYmzEkLvvpsg12zO4/PKk8BEYNS7YPYlRaOwjV4ivyFS2aY6e55rh
 yUHyo9s+16f/Mp+/fNFXll3mdMxYBioPWh3M191nJkdfyKMrtf0MdKPRibaJB8wH
 yMF4D1gMx+hFbs0/VOS6dtqD9DKW7VgPg0LW+RysfhnLTuFFb5iBcH6Of7l7Z/Ca
 vVK+JxrhB1EDVI1+MKnESKPF9c6j3DRa2xrQHi/XYje1TGqnQ1v4CmsEObYBuJDN
 9M9t4QLzNuA/DA5tS7cxxtQ3YUthuJjPLcO4EVHOCvnqCAxkzp0i3dVMUr+YISl+
 2yFqaZdTt8s8FjTI21LOyuloCo30ZLlzaorFa0lp2cIyYup+1vg=
 =btX/
 -----END PGP SIGNATURE-----

Merge tag 'x86_bugs_for_v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 speculation mitigation updates from Borislav Petkov:

 - Some preparatory work to convert the mitigations machinery to
   mitigating attack vectors instead of single vulnerabilities

 - Untangle and remove a now unneeded X86_FEATURE_USE_IBPB flag

 - Add support for a Zen5-specific SRSO mitigation

 - Cleanups and minor improvements

* tag 'x86_bugs_for_v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/bugs: Make spectre user default depend on MITIGATION_SPECTRE_V2
  x86/bugs: Use the cpu_smt_possible() helper instead of open-coded code
  x86/bugs: Add AUTO mitigations for mds/taa/mmio/rfds
  x86/bugs: Relocate mds/taa/mmio/rfds defines
  x86/bugs: Add X86_BUG_SPECTRE_V2_USER
  x86/bugs: Remove X86_FEATURE_USE_IBPB
  KVM: nVMX: Always use IBPB to properly virtualize IBRS
  x86/bugs: Use a static branch to guard IBPB on vCPU switch
  x86/bugs: Remove the X86_FEATURE_USE_IBPB check in ib_prctl_set()
  x86/mm: Remove X86_FEATURE_USE_IBPB checks in cond_mitigation()
  x86/bugs: Move the X86_FEATURE_USE_IBPB check into callers
  x86/bugs: KVM: Add support for SRSO_MSR_FIX
2025-03-25 13:30:18 -07:00
Linus Torvalds
2d09a9449e arm64 updates for 6.15:
Perf and PMUs:
 
  - Support for the "Rainier" CPU PMU from Arm
 
  - Preparatory driver changes and cleanups that pave the way for BRBE
    support
 
  - Support for partial virtualisation of the Apple-M1 PMU
 
  - Support for the second event filter in Arm CSPMU designs
 
  - Minor fixes and cleanups (CMN and DWC PMUs)
 
  - Enable EL2 requirements for FEAT_PMUv3p9
 
 Power, CPU topology:
 
  - Support for AMUv1-based average CPU frequency
 
  - Run-time SMT control wired up for arm64 (CONFIG_HOTPLUG_SMT). It adds
    a generic topology_is_primary_thread() function overridden by x86 and
    powerpc
 
 New(ish) features:
 
  - MOPS (memcpy/memset) support for the uaccess routines
 
 Security/confidential compute:
 
  - Fix the DMA address for devices used in Realms with Arm CCA. The
    CCA architecture uses the address bit to differentiate between shared
    and private addresses
 
  - Spectre-BHB: assume CPUs Linux doesn't know about vulnerable by
    default
 
 Memory management clean-ups:
 
  - Drop the P*D_TABLE_BIT definition in preparation for 128-bit PTEs
 
  - Some minor page table accessor clean-ups
 
  - PIE/POE (permission indirection/overlay) helpers clean-up
 
 Kselftests:
 
  - MTE: skip hugetlb tests if MTE is not supported on such mappings and
    user correct naming for sync/async tag checking modes
 
 Miscellaneous:
 
  - Add a PKEY_UNRESTRICTED definition as 0 to uapi (toolchain people
    request)
 
  - Sysreg updates for new register fields
 
  - CPU type info for some Qualcomm Kryo cores
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAmfjB2QACgkQa9axLQDI
 XvGrfg//W3Bx9+jw1G/XHHEQqGEVFmvltvxZUkvgV0Qki0rPSMnappJhZRL9n0Nm
 V6PvGd2KoKHZuL3g5ViZb3cs2R9BiD2JB6PncwBKuxumHGh3vz3kk1JMkDVfWdHv
 qAceOckFJD9rXjPZn+PDsfYiEi2i3RRWIP5VglZ14ue8j3prHQ6DJXLUQF2GYvzE
 /bgLSq44wp5N59ddy23+qH9rxrHzz3bgpbVv/F56W/LErvE873mRmyFwiuGJm+M0
 Pn8ra572rI6a4sgSwrMTeNPBU+F9o5AbqwauVhkz428RdMvgfEuW6qHUBnGWJDmt
 HotXmu+4Eb2KJks/iQkDo4OTJ38yUqvvZZJtP171ms3E4yqESSJngWP6O2A6LF+y
 xhe0sESF/Ew6jLhM6/hvOmBcE2AyB14JE3ymqLkXbWub4NXddBn2AF1WXFjF4CBw
 F8KSUhNLekrCYKv1k9M3nhvkcpoS9FkTF/TI+zEg546alI/GLPih6uDRkgMAODh1
 RDJYixHsf2NDDRQbfwvt9Xua/KKpDF6qNkHLA4OiqqVUwh1hkas24Lrnp8vmce4o
 wIpWCLqYWey8Rl3XWuWgWz2Xu58fHH4Dl2k72Z8I0pwp3abCDa9xEj79G0Svk7Si
 Q+FCYrNlpKee1RXBC+1MUD/Gl5r/28dEUFkAzPD80F7AgafXPd0=
 =Kc9c
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Catalin Marinas:
 "Nothing major this time around.

  Apart from the usual perf/PMU updates, some page table cleanups, the
  notable features are average CPU frequency based on the AMUv1
  counters, CONFIG_HOTPLUG_SMT and MOPS instructions (memcpy/memset) in
  the uaccess routines.

  Perf and PMUs:

   - Support for the 'Rainier' CPU PMU from Arm

   - Preparatory driver changes and cleanups that pave the way for BRBE
     support

   - Support for partial virtualisation of the Apple-M1 PMU

   - Support for the second event filter in Arm CSPMU designs

   - Minor fixes and cleanups (CMN and DWC PMUs)

   - Enable EL2 requirements for FEAT_PMUv3p9

  Power, CPU topology:

   - Support for AMUv1-based average CPU frequency

   - Run-time SMT control wired up for arm64 (CONFIG_HOTPLUG_SMT). It
     adds a generic topology_is_primary_thread() function overridden by
     x86 and powerpc

  New(ish) features:

   - MOPS (memcpy/memset) support for the uaccess routines

  Security/confidential compute:

   - Fix the DMA address for devices used in Realms with Arm CCA. The
     CCA architecture uses the address bit to differentiate between
     shared and private addresses

   - Spectre-BHB: assume CPUs Linux doesn't know about vulnerable by
     default

  Memory management clean-ups:

   - Drop the P*D_TABLE_BIT definition in preparation for 128-bit PTEs

   - Some minor page table accessor clean-ups

   - PIE/POE (permission indirection/overlay) helpers clean-up

  Kselftests:

   - MTE: skip hugetlb tests if MTE is not supported on such mappings
     and user correct naming for sync/async tag checking modes

  Miscellaneous:

   - Add a PKEY_UNRESTRICTED definition as 0 to uapi (toolchain people
     request)

   - Sysreg updates for new register fields

   - CPU type info for some Qualcomm Kryo cores"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (72 commits)
  arm64: mm: Don't use %pK through printk
  perf/arm_cspmu: Fix missing io.h include
  arm64: errata: Add newer ARM cores to the spectre_bhb_loop_affected() lists
  arm64: cputype: Add MIDR_CORTEX_A76AE
  arm64: errata: Add KRYO 2XX/3XX/4XX silver cores to Spectre BHB safe list
  arm64: errata: Assume that unknown CPUs _are_ vulnerable to Spectre BHB
  arm64: errata: Add QCOM_KRYO_4XX_GOLD to the spectre_bhb_k24_list
  arm64/sysreg: Enforce whole word match for open/close tokens
  arm64/sysreg: Fix unbalanced closing block
  arm64: Kconfig: Enable HOTPLUG_SMT
  arm64: topology: Support SMT control on ACPI based system
  arch_topology: Support SMT control for OF based system
  cpu/SMT: Provide a default topology_is_primary_thread()
  arm64/mm: Define PTDESC_ORDER
  perf/arm_cspmu: Add PMEVFILT2R support
  perf/arm_cspmu: Generalise event filtering
  perf/arm_cspmu: Move register definitons to header
  arm64/kernel: Always use level 2 or higher for early mappings
  arm64/mm: Drop PXD_TABLE_BIT
  arm64/mm: Check pmd_table() in pmd_trans_huge()
  ...
2025-03-25 13:16:16 -07:00
Denis Mukhin
3181424aea x86/early_printk: Add support for MMIO-based UARTs
During the bring-up of an x86 board, the kernel was crashing before
reaching the platform's console driver because of a bug in the firmware,
leaving no trace of the boot progress.

The only available method to debug the kernel boot process was via the
platform's MMIO-based UART, as the board lacked an I/O port-based UART,
PCI UART, or functional video output.

Then it turned out that earlyprintk= does not have a knob to configure
the MMIO-mapped UART.

Extend the early printk facility to support platform MMIO-based UARTs
on x86 systems, enabling debugging during the system bring-up phase.

The command line syntax to enable platform MMIO-based UART is:

  earlyprintk=mmio,membase[,{nocfg|baudrate}][,keep]

Note, the change does not integrate MMIO-based UART support to:

  arch/x86/boot/early_serial_console.c

Also, update kernel parameters documentation with the new syntax and
add the missing 'nocfg' setting to the PCI serial cards description.

Signed-off-by: Denis Mukhin <dmukhin@ford.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20250324-earlyprintk-v3-1-aee7421dc469@ford.com
2025-03-25 08:35:38 +01:00
Linus Torvalds
e34c38057a [ Merge note: this pull request depends on you having merged
two locking commits in the locking tree,
 	      part of the locking-core-2025-03-22 pull request. ]
 
 x86 CPU features support:
   - Generate the <asm/cpufeaturemasks.h> header based on build config
     (H. Peter Anvin, Xin Li)
   - x86 CPUID parsing updates and fixes (Ahmed S. Darwish)
   - Introduce the 'setcpuid=' boot parameter (Brendan Jackman)
   - Enable modifying CPU bug flags with '{clear,set}puid='
     (Brendan Jackman)
   - Utilize CPU-type for CPU matching (Pawan Gupta)
   - Warn about unmet CPU feature dependencies (Sohil Mehta)
   - Prepare for new Intel Family numbers (Sohil Mehta)
 
 Percpu code:
   - Standardize & reorganize the x86 percpu layout and
     related cleanups (Brian Gerst)
   - Convert the stackprotector canary to a regular percpu
     variable (Brian Gerst)
   - Add a percpu subsection for cache hot data (Brian Gerst)
   - Unify __pcpu_op{1,2}_N() macros to __pcpu_op_N() (Uros Bizjak)
   - Construct __percpu_seg_override from __percpu_seg (Uros Bizjak)
 
 MM:
   - Add support for broadcast TLB invalidation using AMD's INVLPGB instruction
     (Rik van Riel)
   - Rework ROX cache to avoid writable copy (Mike Rapoport)
   - PAT: restore large ROX pages after fragmentation
     (Kirill A. Shutemov, Mike Rapoport)
   - Make memremap(MEMREMAP_WB) map memory as encrypted by default
     (Kirill A. Shutemov)
   - Robustify page table initialization (Kirill A. Shutemov)
   - Fix flush_tlb_range() when used for zapping normal PMDs (Jann Horn)
   - Clear _PAGE_DIRTY for kernel mappings when we clear _PAGE_RW
     (Matthew Wilcox)
 
 KASLR:
   - x86/kaslr: Reduce KASLR entropy on most x86 systems,
     to support PCI BAR space beyond the 10TiB region
     (CONFIG_PCI_P2PDMA=y) (Balbir Singh)
 
 CPU bugs:
   - Implement FineIBT-BHI mitigation (Peter Zijlstra)
   - speculation: Simplify and make CALL_NOSPEC consistent (Pawan Gupta)
   - speculation: Add a conditional CS prefix to CALL_NOSPEC (Pawan Gupta)
   - RFDS: Exclude P-only parts from the RFDS affected list (Pawan Gupta)
 
 System calls:
   - Break up entry/common.c (Brian Gerst)
   - Move sysctls into arch/x86 (Joel Granados)
 
 Intel LAM support updates: (Maciej Wieczor-Retman)
   - selftests/lam: Move cpu_has_la57() to use cpuinfo flag
   - selftests/lam: Skip test if LAM is disabled
   - selftests/lam: Test get_user() LAM pointer handling
 
 AMD SMN access updates:
   - Add SMN offsets to exclusive region access (Mario Limonciello)
   - Add support for debugfs access to SMN registers (Mario Limonciello)
   - Have HSMP use SMN through AMD_NODE (Yazen Ghannam)
 
 Power management updates: (Patryk Wlazlyn)
   - Allow calling mwait_play_dead with an arbitrary hint
   - ACPI/processor_idle: Add FFH state handling
   - intel_idle: Provide the default enter_dead() handler
   - Eliminate mwait_play_dead_cpuid_hint()
 
 Bootup:
 
 Build system:
   - Raise the minimum GCC version to 8.1 (Brian Gerst)
   - Raise the minimum LLVM version to 15.0.0
     (Nathan Chancellor)
 
 Kconfig: (Arnd Bergmann)
   - Add cmpxchg8b support back to Geode CPUs
   - Drop 32-bit "bigsmp" machine support
   - Rework CONFIG_GENERIC_CPU compiler flags
   - Drop configuration options for early 64-bit CPUs
   - Remove CONFIG_HIGHMEM64G support
   - Drop CONFIG_SWIOTLB for PAE
   - Drop support for CONFIG_HIGHPTE
   - Document CONFIG_X86_INTEL_MID as 64-bit-only
   - Remove old STA2x11 support
   - Only allow CONFIG_EISA for 32-bit
 
 Headers:
   - Replace __ASSEMBLY__ with __ASSEMBLER__ in UAPI and non-UAPI headers
     (Thomas Huth)
 
 Assembly code & machine code patching:
   - x86/alternatives: Simplify alternative_call() interface (Josh Poimboeuf)
   - x86/alternatives: Simplify callthunk patching (Peter Zijlstra)
   - KVM: VMX: Use named operands in inline asm (Josh Poimboeuf)
   - x86/hyperv: Use named operands in inline asm (Josh Poimboeuf)
   - x86/traps: Cleanup and robustify decode_bug() (Peter Zijlstra)
   - x86/kexec: Merge x86_32 and x86_64 code using macros from <asm/asm.h>
     (Uros Bizjak)
   - Use named operands in inline asm (Uros Bizjak)
   - Improve performance by using asm_inline() for atomic locking instructions
     (Uros Bizjak)
 
 Earlyprintk:
   - Harden early_serial (Peter Zijlstra)
 
 NMI handler:
   - Add an emergency handler in nmi_desc & use it in nmi_shootdown_cpus()
     (Waiman Long)
 
 Miscellaneous fixes and cleanups:
 
   - by Ahmed S. Darwish, Andy Shevchenko, Ard Biesheuvel,
     Artem Bityutskiy, Borislav Petkov, Brendan Jackman, Brian Gerst,
     Dan Carpenter, Dr. David Alan Gilbert, H. Peter Anvin,
     Ingo Molnar, Josh Poimboeuf, Kevin Brodsky, Mike Rapoport,
     Lukas Bulwahn, Maciej Wieczor-Retman, Max Grobecker,
     Patryk Wlazlyn, Pawan Gupta, Peter Zijlstra,
     Philip Redkin, Qasim Ijaz, Rik van Riel, Thomas Gleixner,
     Thorsten Blum, Tom Lendacky, Tony Luck, Uros Bizjak,
     Vitaly Kuznetsov, Xin Li, liuye.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmfenkQRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1g1FRAAi6OFTSn/5aeLMI0IMNBxJ6ddQiFc3imd
 7+C/vU5nul4CyDs8mKyj/+f/DDrbkG9lKz3VG631Yl237lXHjD8XWcVMeC/1z/q0
 3zInDIloE9/nBHRPkF6F7fARBLBZ0LFgaBsGrCo7mwpGybiQdqGcqcxllvTbtXaw
 OHta4q6ok+lBDNlfc0v6H4cRnzhmmlKu6Ng0j6UI3V7uFhi3vtxas32ltDQtzorq
 2+jbV6/+kbrrv+xPC+jlzOFhTEKRupNPQXmvyQteoQg6G3kqAKMDvBthGXd1rHuX
 Qa+BoDIifE/2NiVeRwNrhoqYH/pHCzUzDREW5IW8+ca+4XNKuzAC6EuC8CeCzyK1
 q8ZjZjooQW4zEeVFeJYllHONzJYfxfSH5CLsnbcuhq99yfGlrQhF1qL72/Omn1w/
 DfPJM8Zt5zyKvLqUg3Md+fkVCO2wyDNhB61QPzRgHF+yD+rvuDpoqvUWir+w7cSn
 fwEDVZGXlFx6dumtSrqRaTd1nvFt80s8yP2ll09DMvGQ8D/yruS7hndGAmmJVCSW
 NAfd8pSjq5v2+ux2UR92/Cc3VF3SjaUqHBOp/Nq9rESya18ZVa3cJpHhVYYtPIVf
 THW0h07RIkGVKs1uq+5ekLCr/8uAZg58UPIqmhTuW0ttymRHCNfohR45FQZzy+0M
 tJj1oc2TIZw=
 =Dcb3
 -----END PGP SIGNATURE-----

Merge tag 'x86-core-2025-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull core x86 updates from Ingo Molnar:
 "x86 CPU features support:
   - Generate the <asm/cpufeaturemasks.h> header based on build config
     (H. Peter Anvin, Xin Li)
   - x86 CPUID parsing updates and fixes (Ahmed S. Darwish)
   - Introduce the 'setcpuid=' boot parameter (Brendan Jackman)
   - Enable modifying CPU bug flags with '{clear,set}puid=' (Brendan
     Jackman)
   - Utilize CPU-type for CPU matching (Pawan Gupta)
   - Warn about unmet CPU feature dependencies (Sohil Mehta)
   - Prepare for new Intel Family numbers (Sohil Mehta)

  Percpu code:
   - Standardize & reorganize the x86 percpu layout and related cleanups
     (Brian Gerst)
   - Convert the stackprotector canary to a regular percpu variable
     (Brian Gerst)
   - Add a percpu subsection for cache hot data (Brian Gerst)
   - Unify __pcpu_op{1,2}_N() macros to __pcpu_op_N() (Uros Bizjak)
   - Construct __percpu_seg_override from __percpu_seg (Uros Bizjak)

  MM:
   - Add support for broadcast TLB invalidation using AMD's INVLPGB
     instruction (Rik van Riel)
   - Rework ROX cache to avoid writable copy (Mike Rapoport)
   - PAT: restore large ROX pages after fragmentation (Kirill A.
     Shutemov, Mike Rapoport)
   - Make memremap(MEMREMAP_WB) map memory as encrypted by default
     (Kirill A. Shutemov)
   - Robustify page table initialization (Kirill A. Shutemov)
   - Fix flush_tlb_range() when used for zapping normal PMDs (Jann Horn)
   - Clear _PAGE_DIRTY for kernel mappings when we clear _PAGE_RW
     (Matthew Wilcox)

  KASLR:
   - x86/kaslr: Reduce KASLR entropy on most x86 systems, to support PCI
     BAR space beyond the 10TiB region (CONFIG_PCI_P2PDMA=y) (Balbir
     Singh)

  CPU bugs:
   - Implement FineIBT-BHI mitigation (Peter Zijlstra)
   - speculation: Simplify and make CALL_NOSPEC consistent (Pawan Gupta)
   - speculation: Add a conditional CS prefix to CALL_NOSPEC (Pawan
     Gupta)
   - RFDS: Exclude P-only parts from the RFDS affected list (Pawan
     Gupta)

  System calls:
   - Break up entry/common.c (Brian Gerst)
   - Move sysctls into arch/x86 (Joel Granados)

  Intel LAM support updates: (Maciej Wieczor-Retman)
   - selftests/lam: Move cpu_has_la57() to use cpuinfo flag
   - selftests/lam: Skip test if LAM is disabled
   - selftests/lam: Test get_user() LAM pointer handling

  AMD SMN access updates:
   - Add SMN offsets to exclusive region access (Mario Limonciello)
   - Add support for debugfs access to SMN registers (Mario Limonciello)
   - Have HSMP use SMN through AMD_NODE (Yazen Ghannam)

  Power management updates: (Patryk Wlazlyn)
   - Allow calling mwait_play_dead with an arbitrary hint
   - ACPI/processor_idle: Add FFH state handling
   - intel_idle: Provide the default enter_dead() handler
   - Eliminate mwait_play_dead_cpuid_hint()

  Build system:
   - Raise the minimum GCC version to 8.1 (Brian Gerst)
   - Raise the minimum LLVM version to 15.0.0 (Nathan Chancellor)

  Kconfig: (Arnd Bergmann)
   - Add cmpxchg8b support back to Geode CPUs
   - Drop 32-bit "bigsmp" machine support
   - Rework CONFIG_GENERIC_CPU compiler flags
   - Drop configuration options for early 64-bit CPUs
   - Remove CONFIG_HIGHMEM64G support
   - Drop CONFIG_SWIOTLB for PAE
   - Drop support for CONFIG_HIGHPTE
   - Document CONFIG_X86_INTEL_MID as 64-bit-only
   - Remove old STA2x11 support
   - Only allow CONFIG_EISA for 32-bit

  Headers:
   - Replace __ASSEMBLY__ with __ASSEMBLER__ in UAPI and non-UAPI
     headers (Thomas Huth)

  Assembly code & machine code patching:
   - x86/alternatives: Simplify alternative_call() interface (Josh
     Poimboeuf)
   - x86/alternatives: Simplify callthunk patching (Peter Zijlstra)
   - KVM: VMX: Use named operands in inline asm (Josh Poimboeuf)
   - x86/hyperv: Use named operands in inline asm (Josh Poimboeuf)
   - x86/traps: Cleanup and robustify decode_bug() (Peter Zijlstra)
   - x86/kexec: Merge x86_32 and x86_64 code using macros from
     <asm/asm.h> (Uros Bizjak)
   - Use named operands in inline asm (Uros Bizjak)
   - Improve performance by using asm_inline() for atomic locking
     instructions (Uros Bizjak)

  Earlyprintk:
   - Harden early_serial (Peter Zijlstra)

  NMI handler:
   - Add an emergency handler in nmi_desc & use it in
     nmi_shootdown_cpus() (Waiman Long)

  Miscellaneous fixes and cleanups:
   - by Ahmed S. Darwish, Andy Shevchenko, Ard Biesheuvel, Artem
     Bityutskiy, Borislav Petkov, Brendan Jackman, Brian Gerst, Dan
     Carpenter, Dr. David Alan Gilbert, H. Peter Anvin, Ingo Molnar,
     Josh Poimboeuf, Kevin Brodsky, Mike Rapoport, Lukas Bulwahn, Maciej
     Wieczor-Retman, Max Grobecker, Patryk Wlazlyn, Pawan Gupta, Peter
     Zijlstra, Philip Redkin, Qasim Ijaz, Rik van Riel, Thomas Gleixner,
     Thorsten Blum, Tom Lendacky, Tony Luck, Uros Bizjak, Vitaly
     Kuznetsov, Xin Li, liuye"

* tag 'x86-core-2025-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (211 commits)
  zstd: Increase DYNAMIC_BMI2 GCC version cutoff from 4.8 to 11.0 to work around compiler segfault
  x86/asm: Make asm export of __ref_stack_chk_guard unconditional
  x86/mm: Only do broadcast flush from reclaim if pages were unmapped
  perf/x86/intel, x86/cpu: Replace Pentium 4 model checks with VFM ones
  perf/x86/intel, x86/cpu: Simplify Intel PMU initialization
  x86/headers: Replace __ASSEMBLY__ with __ASSEMBLER__ in non-UAPI headers
  x86/headers: Replace __ASSEMBLY__ with __ASSEMBLER__ in UAPI headers
  x86/locking/atomic: Improve performance by using asm_inline() for atomic locking instructions
  x86/asm: Use asm_inline() instead of asm() in clwb()
  x86/asm: Use CLFLUSHOPT and CLWB mnemonics in <asm/special_insns.h>
  x86/hweight: Use asm_inline() instead of asm()
  x86/hweight: Use ASM_CALL_CONSTRAINT in inline asm()
  x86/hweight: Use named operands in inline asm()
  x86/stackprotector/64: Only export __ref_stack_chk_guard on CONFIG_SMP
  x86/head/64: Avoid Clang < 17 stack protector in startup code
  x86/kexec: Merge x86_32 and x86_64 code using macros from <asm/asm.h>
  x86/runtime-const: Add the RUNTIME_CONST_PTR assembly macro
  x86/cpu/intel: Limit the non-architectural constant_tsc model checks
  x86/mm/pat: Replace Intel x86_model checks with VFM ones
  x86/cpu/intel: Fix fast string initialization for extended Families
  ...
2025-03-24 22:06:11 -07:00
Linus Torvalds
3ba7dfb8da RCU pull request for v6.15
This pull request contains the following branches:
 
 docs.2025.02.04a:
  - Add broken-timing possibility to stallwarn.rst.
  - Improve discussion of this_cpu_ptr(), add raw_cpu_ptr().
  - Document self-propagating callbacks.
  - Point call_srcu() to call_rcu() for detailed memory ordering.
  - Add CONFIG_RCU_LAZY delays to call_rcu() kernel-doc header.
  - Clarify RCU_LAZY and RCU_LAZY_DEFAULT_OFF help text.
  - Remove references to old grace-period-wait primitives.
 
 srcu.2025.02.05a:
  - Introduce srcu_read_{un,}lock_fast(), which is similar to
    srcu_read_{un,}lock_lite(): avoid smp_mb()s in lock and unlock at the
    cost of calling synchronize_rcu() in synchronize_srcu(). Moreover, by
    returning the percpu offset of the counter at srcu_read_lock_fast()
    time, srcu_read_unlock_fast() can save extra pointer dereferencing,
    which makes it faster than srcu_read_{un,}lock_lite().
    srcu_read_{un,}lock_fast() are intended to replace
    rcu_read_{un,}lock_trace() if possible.
 
 torture.2025.02.05a:
  - Add get_torture_init_jiffies() to return the start time of the test.
  - Add a test_boost_holdoff module parameter to allow delaying boosting
    tests when building rcutorture as built-in.
  - Add grace period sequence number logging at the beginning and end of
    failure/close-call results.
  - Switch to hexadecimal for the expedited grace period sequence number
    in the rcu_exp_grace_period trace point.
  - Make cur_ops->format_gp_seqs take buffer length.
  - Move RCU_TORTURE_TEST_{CHK_RDR_STATE,LOG_CPU} to bool.
  - Complain when invalid SRCU reader_flavor is specified.
  - Add FORCE_NEED_SRCU_NMI_SAFE Kconfig for testing, which forces SRCU
    uses atomics even when percpu ops are NMI safe, and use the Kconfig
    for SRCU lockdep testing.
 
 misc.2025.03.04a:
  - Split rcu_report_exp_cpu_mult() mask parameter and use for tracing.
  - Remove READ_ONCE() for rdp->gpwrap access in __note_gp_changes().
  - Fix get_state_synchronize_rcu_full() GP-start detection.
  - Move RCU Tasks self-tests to core_initcall().
  - Print segment lengths in show_rcu_nocb_gp_state().
  - Make RCU watch ct_kernel_exit_state() warning.
  - Flush console log from kernel_power_off().
  - rcutorture: Allow a negative value for nfakewriters.
  - rcu: Update TREE05.boot to test normal synchronize_rcu().
  - rcu: Use _full() API to debug synchronize_rcu().
 
 lazypreempt.2025.03.04a: Make RCU handle PREEMPT_LAZY better:
  - Fix header guard for rcu_all_qs().
  - rcu: Rename PREEMPT_AUTO to PREEMPT_LAZY.
  - Update __cond_resched comment about RCU quiescent states.
  - Handle unstable rdp in rcu_read_unlock_strict().
  - Handle quiescent states for PREEMPT_RCU=n, PREEMPT_COUNT=y.
  - osnoise: Provide quiescent states.
  - Adjust rcutorture with possible PREEMPT_RCU=n && PREEMPT_COUNT=y
    combination.
  - Limit PREEMPT_RCU configurations.
  - Make rcutorture senario TREE07 and senario TREE10 use PREEMPT_LAZY=y.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEj5IosQTPz8XU1wRHSXnow7UH+rgFAmfeBLQACgkQSXnow7UH
 +rh11Qf/Rt6IZJ/YT/V9Sd+8hMx4O0BMh779pr9cD6mbAG+FDk2Yeva1m8vIdFOb
 qId6oc8K/ef2JfFjSn0oHMzQP2D3XUyiJWPNbBDHv/D8Os8GZgjzu8dkxVkSbdbY
 OxtvIflbcqFN1JDJfGKZnTEW0/YxGqfnS9b6R7iyyA7SOGQ/WubGOE5qNCqPufc9
 zJiP+qTUFYQzCIiPlEJul39o9KboPogbt3QAAQjWmi3utd77ehJnm/15FvAjyau4
 uhC2cnGfMY535rQaiaQeBQ/IHIowKripCq0JQFvcUNdyArZM3HOI2x79+2II6ft7
 mjHskNODOIJHfW2o1RzQ0yRYAywFIg==
 =J+mH
 -----END PGP SIGNATURE-----

Merge tag 'rcu-next-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux

Pull RCU updates from Boqun Feng:
 "Documentation:
   - Add broken-timing possibility to stallwarn.rst
   - Improve discussion of this_cpu_ptr(), add raw_cpu_ptr()
   - Document self-propagating callbacks
   - Point call_srcu() to call_rcu() for detailed memory ordering
   - Add CONFIG_RCU_LAZY delays to call_rcu() kernel-doc header
   - Clarify RCU_LAZY and RCU_LAZY_DEFAULT_OFF help text
   - Remove references to old grace-period-wait primitives

  srcu:
   - Introduce srcu_read_{un,}lock_fast(), which is similar to
     srcu_read_{un,}lock_lite(): avoid smp_mb()s in lock and unlock
     at the cost of calling synchronize_rcu() in synchronize_srcu()

     Moreover, by returning the percpu offset of the counter at
     srcu_read_lock_fast() time, srcu_read_unlock_fast() can avoid
     extra pointer dereferencing, which makes it faster than
     srcu_read_{un,}lock_lite()

     srcu_read_{un,}lock_fast() are intended to replace
     rcu_read_{un,}lock_trace() if possible

  RCU torture:
   - Add get_torture_init_jiffies() to return the start time of the test
   - Add a test_boost_holdoff module parameter to allow delaying
     boosting tests when building rcutorture as built-in
   - Add grace period sequence number logging at the beginning and end
     of failure/close-call results
   - Switch to hexadecimal for the expedited grace period sequence
     number in the rcu_exp_grace_period trace point
   - Make cur_ops->format_gp_seqs take buffer length
   - Move RCU_TORTURE_TEST_{CHK_RDR_STATE,LOG_CPU} to bool
   - Complain when invalid SRCU reader_flavor is specified
   - Add FORCE_NEED_SRCU_NMI_SAFE Kconfig for testing, which forces SRCU
     uses atomics even when percpu ops are NMI safe, and use the Kconfig
     for SRCU lockdep testing

  Misc:
   - Split rcu_report_exp_cpu_mult() mask parameter and use for tracing
   - Remove READ_ONCE() for rdp->gpwrap access in __note_gp_changes()
   - Fix get_state_synchronize_rcu_full() GP-start detection
   - Move RCU Tasks self-tests to core_initcall()
   - Print segment lengths in show_rcu_nocb_gp_state()
   - Make RCU watch ct_kernel_exit_state() warning
   - Flush console log from kernel_power_off()
   - rcutorture: Allow a negative value for nfakewriters
   - rcu: Update TREE05.boot to test normal synchronize_rcu()
   - rcu: Use _full() API to debug synchronize_rcu()

  Make RCU handle PREEMPT_LAZY better:
   - Fix header guard for rcu_all_qs()
   - rcu: Rename PREEMPT_AUTO to PREEMPT_LAZY
   - Update __cond_resched comment about RCU quiescent states
   - Handle unstable rdp in rcu_read_unlock_strict()
   - Handle quiescent states for PREEMPT_RCU=n, PREEMPT_COUNT=y
   - osnoise: Provide quiescent states
   - Adjust rcutorture with possible PREEMPT_RCU=n && PREEMPT_COUNT=y
     combination
   - Limit PREEMPT_RCU configurations
   - Make rcutorture senario TREE07 and senario TREE10 use
     PREEMPT_LAZY=y"

* tag 'rcu-next-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux: (59 commits)
  rcutorture: Make scenario TREE07 build CONFIG_PREEMPT_LAZY=y
  rcutorture: Make scenario TREE10 build CONFIG_PREEMPT_LAZY=y
  rcu: limit PREEMPT_RCU configurations
  rcutorture: Update ->extendables check for lazy preemption
  rcutorture: Update rcutorture_one_extend_check() for lazy preemption
  osnoise: provide quiescent states
  rcu: Use _full() API to debug synchronize_rcu()
  rcu: Update TREE05.boot to test normal synchronize_rcu()
  rcutorture: Allow a negative value for nfakewriters
  Flush console log from kernel_power_off()
  context_tracking: Make RCU watch ct_kernel_exit_state() warning
  rcu/nocb: Print segment lengths in show_rcu_nocb_gp_state()
  rcu-tasks: Move RCU Tasks self-tests to core_initcall()
  rcu: Fix get_state_synchronize_rcu_full() GP-start detection
  torture: Make SRCU lockdep testing use srcu_read_lock_nmisafe()
  srcu: Add FORCE_NEED_SRCU_NMI_SAFE Kconfig for testing
  rcutorture: Complain when invalid SRCU reader_flavor is specified
  rcutorture: Move RCU_TORTURE_TEST_{CHK_RDR_STATE,LOG_CPU} to bool
  rcutorture: Make cur_ops->format_gp_seqs take buffer length
  rcutorture: Add ftrace-compatible timestamp to GP# failure/close-call output
  ...
2025-03-24 19:41:37 -07:00
Linus Torvalds
f81c2b8150 It has been a reasonably busy cycle for docs...
- Significant changes throughout the tree to bring Python code up to
   current standards and raise the minimum Python required to 3.9.  Much of
   this is preparatory to replacing the ancient Perl scripts/kernel-doc
   horror with a slightly less horrifying Python implementation, expected
   for 6.16.
 
 - Update the minimum Sphinx required to 3.4.3, allowing us to remove a
   bunch of older compatibility code.
 
 - Rework and improve the generation of the ABI documentation.
 
   (All of the above done by Mauro)
 
 - Lots of translation updates.  Alex Shi and Yanteng Si are taking on
   responsibility for the Chinese translations going forward; that work will
   still get to you via docs-next
 
 - Try to standardize the format for indicating a developer's affiliation in
   commit tags.
 
 - Clarify the TAB's role in CoC enforcement actions.
 
 - Try to spell out the rules for when a commit tag can name another
   developer without their explicit permission.
 
 Plus lots of other typo fixes and updates.
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAmfccxwPHGNvcmJldEBs
 d24ubmV0AAoJEBdDWhNsDH5YoYcH/jL/nS8YAiJ3awF5PH5tR3m5ddt9l+fKXWJx
 PB3KcHtDORbWltTA+Tvo2aP1jxGY9wqsIIvl+nvjJyUcfd72g4HNfTDUDXwP3OFU
 wTkaEAQp3n/hqnLXtJ2AzV3Ir5cIfEL2d7F6QsN1Gnof8iu2OuMk5iMeb0iexUX6
 FYjJq+jknh30VdAp2hxHy8q17R7h7PySh5OsjeAYJJroLv60n3DwQgnzHjXC/FT2
 Qq1UuEzlSpRoso2o2NwVTND6OVW081umo6YrioqD7ZC2G2fhRgLFJJtJGXDNcyUl
 gQv9xLSaTD97V4zaWPm28ObNBpY/GnAd4hMjB17wAH5xUfVS5Aw=
 =Gvdp
 -----END PGP SIGNATURE-----

Merge tag 'docs-6.15' of git://git.lwn.net/linux

Pull documentation updates from Jonathan Corbet:
 "It has been a reasonably busy cycle for docs...

   - Significant changes throughout the tree to bring Python code up to
     current standards and raise the minimum Python required to 3.9

     Much of this is preparatory to replacing the ancient Perl
     scripts/kernel-doc horror with a slightly less horrifying Python
     implementation, expected for 6.16

   - Update the minimum Sphinx required to 3.4.3, allowing us to remove
     a bunch of older compatibility code

   - Rework and improve the generation of the ABI documentation

  (All of the above done by Mauro)

   - Lots of translation updates. Alex Shi and Yanteng Si are taking on
     responsibility for the Chinese translations going forward; that
     work will still get to you via docs-next

   - Try to standardize the format for indicating a developer's
     affiliation in commit tags

   - Clarify the TAB's role in CoC enforcement actions

   - Try to spell out the rules for when a commit tag can name another
     developer without their explicit permission

  Plus lots of other typo fixes and updates"

* tag 'docs-6.15' of git://git.lwn.net/linux: (98 commits)
  docs/zh_CN: fix spelling mistake
  docs/Chinese: change the disclaimer words
  docs/zh_CN: Add snp-tdx-threat-model index Chinese translation
  docs: driver-api: firmware: clarify userspace requirements
  docs: clarify rules wrt tagging other people
  docs: Remove outdated highuid.rst documentation
  Documentation: dma-buf: heaps: Add heap name definitions
  docs/.../submit-checklist: Use Documentation/admin-guide/abi.rst for cross-ref of README
  docs: Correct installation instruction
  Documentation: kcsan: fix "Plain Accesses and Data Races" URL in kcsan.rst
  Documentation/CoC: Spell out the TAB role in enforcement decisions
  Documentation: ocxl.rst: Update consortium site
  scripts: get_feat.pl: substitute s390x with s390
  scripts/kernel-doc: drop dead code for Wcontents_before_sections
  scripts/kernel-doc: don't add not needed new lines
  docs: driver-api/infiniband.rst: fix Kerneldoc markup
  drivers: firewire: firewire-cdev.h: fix identation on a kernel-doc markup
  drivers: media: intel-ipu3.h: fix identation on a kernel-doc markup
  include/asm-generic/io.h: fix kerneldoc markup
  Docs/arch/arm64: Fix spelling in amu.rst
  ...
2025-03-24 18:42:27 -07:00
Linus Torvalds
94dc216ad8 cgroup: Changes for v6.15
- Add deprecation info messages to cgroup1-only features.
 
 - rstat updates including a bug fix and breaking up a critical section to
   reduce interrupt latency impact.
 
 - Other misc and doc updates.
 -----BEGIN PGP SIGNATURE-----
 
 iIQEABYKACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCZ9xO2g4cdGpAa2VybmVs
 Lm9yZwAKCRCxYfJx3gVYGQz4AQDeWKmngRsnddEMkqOV1ArwXSr+8xUQrvCBx0RL
 vcjOQQEAusGCTeGXWJ96kw+N9BXvGwFsfSeoxjOqAnvrBS1EgAc=
 =WvJg
 -----END PGP SIGNATURE-----

Merge tag 'cgroup-for-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup

Pull cgroup updates from Tejun Heo:

 - Add deprecation info messages to cgroup1-only features

 - rstat updates including a bug fix and breaking up a critical section
   to reduce interrupt latency impact

 - Other misc and doc updates

* tag 'cgroup-for-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: rstat: Cleanup flushing functions and locking
  cgroup/rstat: avoid disabling irqs for O(num_cpu)
  mm: Fix a build breakage in memcontrol-v1.c
  blk-cgroup: Simplify policy files registration
  cgroup: Update file naming comment
  cgroup: Add deprecation message to legacy freezer controller
  mm: Add transformation message for per-memcg swappiness
  RFC cgroup/cpuset-v1: Add deprecation messages to sched_relax_domain_level
  cgroup/cpuset-v1: Add deprecation messages to memory_migrate
  cgroup/cpuset-v1: Add deprecation messages to mem_exclusive and mem_hardwall
  cgroup: Print message when /proc/cgroups is read on v2-only system
  cgroup/blkio: Add deprecation messages to reset_stats
  cgroup/cpuset-v1: Add deprecation messages to memory_spread_page and memory_spread_slab
  cgroup/cpuset-v1: Add deprecation messages to sched_load_balance and memory_pressure_enabled
  cgroup, docs: Be explicit about independence of RT_GROUP_SCHED and non-cpu controllers
  cgroup/rstat: Fix forceidle time in cpu.stat
  cgroup/misc: Remove unused misc_cg_res_total_usage
  cgroup/cpuset: Move procfs cpuset attribute under cgroup-v1.c
  cgroup: update comment about dropping cgroup kn refs
2025-03-24 16:49:40 -07:00
Linus Torvalds
fc13a78e1f hardening updates for v6.15-rc1
- loadpin: remove unsupported MODULE_COMPRESS_NONE (Arulpandiyan Vadivel)
 
 - samples/check-exec: Fix script name (Mickaël Salaün)
 
 - yama: remove needless locking in yama_task_prctl() (Oleg Nesterov)
 
 - lib/string_choices: Sort by function name (R Sundar)
 
 - hardening: Allow default HARDENED_USERCOPY to be set at compile time
   (Mel Gorman)
 
 - uaccess: Split out compile-time checks into ucopysize.h
 
 - kbuild: clang: Support building UM with SUBARCH=i386
 
 - x86: Enable i386 FORTIFY_SOURCE on Clang 16+
 
 - ubsan/overflow: Rework integer overflow sanitizer option
 
 - Add missing __nonstring annotations for callers of memtostr*()/strtomem*()
 
 - Add __must_be_noncstr() and have memtostr*()/strtomem*() check for it
 
 - Introduce __nonstring_array for silencing future GCC 15 warnings
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRSPkdeREjth1dHnSE2KwveOeQkuwUCZ9hGrgAKCRA2KwveOeQk
 u1WvAQC3ZxFu3b0Omfmht2pPqCltf2UOQNvUx3egjoeXpUaNSgD+Lxr/T4xksy7E
 jHh7rCYDkruOWs3DHA5JjRQcf0BBLQo=
 =FTQp
 -----END PGP SIGNATURE-----

Merge tag 'hardening-v6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull hardening updates from Kees Cook:
 "As usual, it's scattered changes all over. Patches touching things
  outside of our traditional areas in the tree have been Acked by
  maintainers or were trivial changes:

   - loadpin: remove unsupported MODULE_COMPRESS_NONE (Arulpandiyan
     Vadivel)

   - samples/check-exec: Fix script name (Mickaël Salaün)

   - yama: remove needless locking in yama_task_prctl() (Oleg Nesterov)

   - lib/string_choices: Sort by function name (R Sundar)

   - hardening: Allow default HARDENED_USERCOPY to be set at compile
     time (Mel Gorman)

   - uaccess: Split out compile-time checks into ucopysize.h

   - kbuild: clang: Support building UM with SUBARCH=i386

   - x86: Enable i386 FORTIFY_SOURCE on Clang 16+

   - ubsan/overflow: Rework integer overflow sanitizer option

   - Add missing __nonstring annotations for callers of
     memtostr*()/strtomem*()

   - Add __must_be_noncstr() and have memtostr*()/strtomem*() check for
     it

   - Introduce __nonstring_array for silencing future GCC 15 warnings"

* tag 'hardening-v6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (26 commits)
  compiler_types: Introduce __nonstring_array
  hardening: Enable i386 FORTIFY_SOURCE on Clang 16+
  x86/build: Remove -ffreestanding on i386 with GCC
  ubsan/overflow: Enable ignorelist parsing and add type filter
  ubsan/overflow: Enable pattern exclusions
  ubsan/overflow: Rework integer overflow sanitizer option to turn on everything
  samples/check-exec: Fix script name
  yama: don't abuse rcu_read_lock/get_task_struct in yama_task_prctl()
  kbuild: clang: Support building UM with SUBARCH=i386
  loadpin: remove MODULE_COMPRESS_NONE as it is no longer supported
  lib/string_choices: Rearrange functions in sorted order
  string.h: Validate memtostr*()/strtomem*() arguments more carefully
  compiler.h: Introduce __must_be_noncstr()
  nilfs2: Mark on-disk strings as nonstring
  uapi: stddef.h: Introduce __kernel_nonstring
  x86/tdx: Mark message.bytes as nonstring
  string: kunit: Mark nonstring test strings as __nonstring
  scsi: qla2xxx: Mark device strings as nonstring
  scsi: mpt3sas: Mark device strings as nonstring
  scsi: mpi3mr: Mark device strings as nonstring
  ...
2025-03-24 15:18:08 -07:00
Rafael J. Wysocki
7a9072d859 Merge branch 'pm-cpuidle'
Merge cpuidle updates for 6.15-rc5, including a menu governor update
that is reported to improve some benchmark results quite significantly:

 - Update the handling of the most recent idle intervals in the menu
   cpuidle governor to prevent useful information from being discarded
   by it in some cases and improve the prediction accuracy (Rafael
   Wysocki).

 - Make it possible to tell the intel_idle driver to ignore its built-in
   table of idle states for the given processor, clean up the handling
   of auto-demotion disabling on Baytrail and Cherrytrail chips in it,
   and update its MAINTAINERS entry (David Arcari, Artem Bityutskiy,
   Rafael Wysocki).

 - Make some cpuidle drivers use for_each_present_cpu() instead of
   for_each_possible_cpu() during initialization to avoid issues
   occurring when nosmp or maxcpus=0 are used (Jacky Bai).

* pm-cpuidle:
  cpuidle: Init cpuidle only for present CPUs
  cpuidle: intel_idle: Update MAINTAINERS
  intel_idle: introduce 'no_native' module parameter
  cpuidle: menu: Update documentation after get_typical_interval() changes
  cpuidle: menu: Avoid discarding useful information
  cpuidle: menu: Eliminate outliers on both ends of the sample set
  cpuidle: menu: Tweak threshold use in get_typical_interval()
  cpuidle: menu: Use one loop for average and variance computations
  cpuidle: menu: Drop a redundant local variable
  intel_idle: clean up BYT/CHT auto demotion disable
2025-03-24 14:43:30 +01:00
Rafael J. Wysocki
1774be7cfc Merge branch 'pm-cpufreq'
Merge cpufreq updates for 6.15-rc1:

 - Manage sysfs attributes and boost frequencies efficiently from
   cpufreq core to reduce boilerplate code from drivers (Viresh Kumar).

 - Minor cleanups to cpufreq drivers (Aaron Kling, Benjamin Schneider,
   Dhananjay Ugwekar, Imran Shaik, and zuoqian).

 - Migrate some cpufreq drivers to using for_each_present_cpu() (Jacky
   Bai).

 - cpufreq-qcom-hw DT binding fixes (Krzysztof Kozlowski).

 - Use str_enable_disable() helper in cpufreq_online() (Lifeng Zheng).

 - Optimize the amd-pstate driver to avoid cases where call paths end
   up calling the same writes multiple times and needlessly caching
   variables through code reorganization, locking overhaul and tracing
   adjustments (Mario Limonciello, Dhananjay Ugwekar).

 - Make it possible to avoid enabling capacity-aware scheduling (CAS) in
   the intel_pstate driver and relocate a check for out-of-band (OOB)
   platform handling in it to make it detect OOB before checking HWP
   availability (Rafael Wysocki).

 - Fix dbs_update() to avoid inadvertent conversions of negative integer
   values to unsigned int which causes CPU frequency selection to be
   inaccurate in some cases when the "conservative" cpufreq governor is
   in use (Jie Zhan).

* pm-cpufreq: (91 commits)
  dt-bindings: cpufreq: cpufreq-qcom-hw: Narrow properties on SDX75, SA8775p and SM8650
  dt-bindings: cpufreq: cpufreq-qcom-hw: Drop redundant minItems:1
  dt-bindings: cpufreq: cpufreq-qcom-hw: Add missing constraint for interrupt-names
  dt-bindings: cpufreq: cpufreq-qcom-hw: Add QCS8300 compatible
  cpufreq: Init cpufreq only for present CPUs
  cpufreq: tegra186: Share policy per cluster
  cpufreq/amd-pstate: Drop actions in amd_pstate_epp_cpu_offline()
  cpufreq/amd-pstate: Stop caching EPP
  cpufreq/amd-pstate: Rework CPPC enabling
  cpufreq/amd-pstate: Drop debug statements for policy setting
  cpufreq/amd-pstate: Update cppc_req_cached for shared mem EPP writes
  cpufreq/amd-pstate: Move all EPP tracing into *_update_perf and *_set_epp functions
  cpufreq/amd-pstate: Cache CPPC request in shared mem case too
  cpufreq/amd-pstate: Replace all AMD_CPPC_* macros with masks
  cpufreq/amd-pstate-ut: Adjust variable scope
  cpufreq/amd-pstate-ut: Run on all of the correct CPUs
  cpufreq/amd-pstate-ut: Drop SUCCESS and FAIL enums
  cpufreq/amd-pstate-ut: Allow lowest nonlinear and lowest to be the same
  cpufreq/amd-pstate-ut: Use _free macro to free put policy
  cpufreq/amd-pstate: Drop `cppc_cap1_cached`
  ...
2025-03-24 14:19:50 +01:00
Hao Jia
4c8bc7c4e3 cgroup: docs: add pswpin and pswpout items in cgroup v2 doc
The commit 15ff4d409e ("mm/memcontrol: add per-memcg pgpgin/pswpin
counter") introduced the pswpin and pswpout items in the memory.stat of
cgroup v2.  Therefore, update them accordingly in the cgroup-v2
documentation.

Link: https://lkml.kernel.org/r/20250318075833.90615-3-jiahao.kernel@gmail.com
Fixes: 15ff4d409e ("mm/memcontrol: add per-memcg pgpgin/pswpin counter") 
Signed-off-by: Hao Jia <jiahao1@lixiang.com>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-21 22:03:16 -07:00
Hao Jia
e452872b40 mm: vmscan: split proactive reclaim statistics from direct reclaim statistics
Patch series "Adding Proactive Memory Reclaim Statistics".

These two patches are related to proactive memory reclaim.

Patch 1 Split proactive reclaim statistics from direct reclaim counters
and introduces new counters: pgsteal_proactive, pgdemote_proactive,
and pgscan_proactive.

Patch 2 Adds pswpin and pswpout items to the cgroup-v2 documentation.


This patch (of 2):

In proactive memory reclaim scenarios, it is necessary to accurately track
proactive reclaim statistics to dynamically adjust the frequency and
amount of memory being reclaimed proactively.  Currently, proactive
reclaim is included in direct reclaim statistics, which can make these
direct reclaim statistics misleading.

Therefore, separate proactive reclaim memory from the direct reclaim
counters by introducing new counters: pgsteal_proactive,
pgdemote_proactive, and pgscan_proactive, to avoid confusion with direct
reclaim.

Link: https://lkml.kernel.org/r/20250318075833.90615-1-jiahao.kernel@gmail.com
Link: https://lkml.kernel.org/r/20250318075833.90615-2-jiahao.kernel@gmail.com
Signed-off-by: Hao Jia <jiahao1@lixiang.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-21 22:03:16 -07:00
Donghyeok Choe
c1aa3daa51 printk/panic: Add option to allow non-panic CPUs to write to the ring buffer.
Commit 779dbc2e78 ("printk: Avoid non-panic CPUs writing to ringbuffer")
aimed to isolate panic-related messages. However, when panic() itself
malfunctions, messages from non-panic CPUs become crucial for debugging.

While commit bcc954c6ca ("printk/panic: Allow cpu backtraces to
be written into ringbuffer during panic") enables non-panic CPU
backtraces, it may not provide sufficient diagnostic information.

Introduce the "debug_non_panic_cpus" command-line option, enabling
non-panic CPU messages to be stored in the ring buffer during a panic.
This also prevents discarding non-finalized messages from non-panic CPUs
during console flushing, providing a more comprehensive view of system
state during critical failures.

Link: https://lore.kernel.org/all/Z8cLEkqLL2IOyNIj@pathway/
Signed-off-by: Donghyeok Choe <d7271.choe@samsung.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20250318022320.2428155-1-d7271.choe@samsung.com
[pmladek@suse.com: Added documentation, added module_parameter, removed printk_ prefix.]
Tested-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
2025-03-20 15:48:36 +01:00
Andrew Jones
9fe58530a8
Documentation/kernel-parameters: Add riscv unaligned speed parameters
Document riscv parameters used to select scalar and vector unaligned
access speeds.

Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20250304120014.143628-18-ajones@ventanamicro.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
2025-03-19 14:23:31 +00:00
Pawan Gupta
722fa0dba7 x86/rfds: Exclude P-only parts from the RFDS affected list
The affected CPU table (cpu_vuln_blacklist) marks Alderlake and Raptorlake
P-only parts affected by RFDS. This is not true because only E-cores are
affected by RFDS. With the current family/model matching it is not possible
to differentiate the unaffected parts, as the affected and unaffected
hybrid variants have the same model number.

Add a cpu-type match as well for such parts so as to exclude P-only parts
being marked as affected.

Note, family/model and cpu-type enumeration could be inaccurate in
virtualized environments. In a guest affected status is decided by RFDS_NO
and RFDS_CLEAR bits exposed by VMMs.

Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/r/20250311-add-cpu-type-v8-5-e8514dcaaff2@linux.intel.com
2025-03-19 11:17:23 +01:00
Ingo Molnar
89771319e0 Linux 6.14-rc7
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmfXVtUeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGN/sH/i5423Gt/z51gDjA
 s4v5Z7GaBJ9zOGBahn2RWFe72zytTqKrEJmMnGfguirs0atD1DtQj4WAP7iFKP+e
 WyO663X6HF7i5y37ja0Yd4PZc31hwtqzKH8LjBf8f8tTy8UsEVqumdi5A4sS9KTM
 qm4kTyyVEY9D/s7oRY8ywjDlRJtO6nT0aKMp4kAqNEbrNUYbilT/a0hgXcgSmPyB
 uIjmjL2fZfutxGI5LgfbaSHCa1ElmhvTvivOMpaAmZSGCRVHCKGgT0CTNnHyn/7C
 dB145JkRO4ZOUqirCdO4PE/23id3ajq9fcixJGBzAv7c45y+B3JZ1r2kAfKalE8/
 qrOKLys=
 =8r7a
 -----END PGP SIGNATURE-----

Merge tag 'v6.14-rc7' into x86/core, to pick up fixes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-03-19 11:03:06 +01:00
Johannes Weiner
e3aa7df331 mm: page_alloc: defrag_mode
The page allocator groups requests by migratetype to stave off
fragmentation.  However, in practice this is routinely defeated by the
fact that it gives up *before* invoking reclaim and compaction - which may
well produce suitable pages.  As a result, fragmentation of physical
memory is a common ongoing process in many load scenarios.

Fragmentation deteriorates compaction's ability to produce huge pages. 
Depending on the lifetime of the fragmenting allocations, those effects
can be long-lasting or even permanent, requiring drastic measures like
forcible idle states or even reboots as the only reliable ways to recover
the address space for THP production.

In a kernel build test with supplemental THP pressure, the THP allocation
rate steadily declines over 15 runs:

    thp_fault_alloc
    61988
    56474
    57258
    50187
    52388
    55409
    52925
    47648
    43669
    40621
    36077
    41721
    36685
    34641
    33215

This is a hurdle in adopting THP in any environment where hosts are shared
between multiple overlapping workloads (cloud environments), and rarely
experience true idle periods.  To make THP a reliable and predictable
optimization, there needs to be a stronger guarantee to avoid such
fragmentation.

Introduce defrag_mode.  When enabled, reclaim/compaction is invoked to its
full extent *before* falling back.  Specifically, ALLOC_NOFRAGMENT is
enforced on the allocator fastpath and the reclaiming slowpath.

For now, fallbacks are permitted to avert OOMs.  There is a plan to add
defrag_mode=2 to prefer OOMs over fragmentation, but this requires
additional prep work in compaction and the reserve management to make it
ready for all possible allocation contexts.

The following test results are from a kernel build with periodic bursts of
THP allocations, over 15 runs:

                                        vanilla    defrag_mode=1
@claimer[unmovable]:                        189              103
@claimer[movable]:                           92              103
@claimer[reclaimable]:                      207               61
@pollute[unmovable from movable]:            25                0
@pollute[unmovable from reclaimable]:        28                0
@pollute[movable from unmovable]:         38835                0
@pollute[movable from reclaimable]:      147136                0
@pollute[reclaimable from unmovable]:       178                0
@pollute[reclaimable from movable]:          33                0
@steal[unmovable from movable]:              11                0
@steal[unmovable from reclaimable]:           5                0
@steal[reclaimable from unmovable]:         107                0
@steal[reclaimable from movable]:            90                0
@steal[movable from reclaimable]:           354                0
@steal[movable from unmovable]:             130                0

Both types of polluting fallbacks are eliminated in this workload.

Interestingly, whole block conversions are reduced as well.  This is
because once a block is claimed for a type, its empty space remains
available for future allocations, instead of being padded with fallbacks;
this allows the native type to group up instead of spreading out to new
blocks.  The assumption in the allocator has been that pollution from
movable allocations is less harmful than from other types, since they can
be reclaimed or migrated out should the space be needed.  However, since
fallbacks occur *before* reclaim/compaction is invoked, movable pollution
will still cause non-movable allocations to spread out and claim more
blocks.

Without fragmentation, THP rates hold steady with defrag_mode=1:

    thp_fault_alloc
    32478
    20725
    45045
    32130
    14018
    21711
    40791
    29134
    34458
    45381
    28305
    17265
    22584
    28454
    30850

While the downward trend is eliminated, the keen reader will of course
notice that the baseline rate is much smaller than the vanilla kernel's to
begin with.  This is due to deficiencies in how reclaim and compaction are
currently driven: ALLOC_NOFRAGMENT increases the extent to which smaller
allocations are competing with THPs for pageblocks, while making no effort
themselves to reclaim or compact beyond their own request size.  This
effect already exists with the current usage of ALLOC_NOFRAGMENT, but is
amplified by defrag_mode insisting on whole block stealing much more
strongly.

Subsequent patches will address defrag_mode reclaim strategy to raise the
THP success baseline above the vanilla kernel.

Link: https://lkml.kernel.org/r/20250313210647.1314586-4-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-17 22:07:07 -07:00
SeongJae Park
114b480877 Docs/admin-guide/mm/damon/usage: update for {core,ops}_filters directories
Document {core,ops}_filters directories on usage document.

Link: https://lkml.kernel.org/r/20250305222733.59089-9-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-17 22:06:50 -07:00
David Hildenbrand
749492229e mm: stop maintaining the per-page mapcount of large folios (CONFIG_NO_PAGE_MAPCOUNT)
Everything is in place to stop using the per-page mapcounts in large
folios: the mapcount of tail pages will always be logically 0 (-1 value),
just like it currently is for hugetlb folios already, and the page
mapcount of the head page is either 0 (-1 value) or contains a page type
(e.g., hugetlb).

Maintaining _nr_pages_mapped without per-page mapcounts is impossible, so
that one also has to go with CONFIG_NO_PAGE_MAPCOUNT.

There are two remaining implications:

(1) Per-node, per-cgroup and per-lruvec stats of "NR_ANON_MAPPED"
    ("mapped anonymous memory") and "NR_FILE_MAPPED"
    ("mapped file memory"):

    As soon as any page of the folio is mapped -- folio_mapped() -- we
    now account the complete folio as mapped. Once the last page is
    unmapped -- !folio_mapped() -- we account the complete folio as
    unmapped.

    This implies that ...

    * "AnonPages" and "Mapped" in /proc/meminfo and
      /sys/devices/system/node/*/meminfo
    * cgroup v2: "anon" and "file_mapped" in "memory.stat" and
      "memory.numa_stat"
    * cgroup v1: "rss" and "mapped_file" in "memory.stat" and
      "memory.numa_stat

    ... can now appear higher than before. But note that these folios do
    consume that memory, simply not all pages are actually currently
    mapped.

    It's worth nothing that other accounting in the kernel (esp. cgroup
    charging on allocation) is not affected by this change.

    [why oh why is "anon" called "rss" in cgroup v1]

 (2) Detecting partial mappings

     Detecting whether anon THPs are partially mapped gets a bit more
     unreliable. As long as a single MM maps such a large folio
     ("exclusively mapped"), we can reliably detect it. Especially before
     fork() / after a short-lived child process quit, we will detect
     partial mappings reliably, which is the common case.

     In essence, if the average per-page mapcount in an anon THP is < 1,
     we know for sure that we have a partial mapping.

     However, as soon as multiple MMs are involved, we might miss detecting
     partial mappings: this might be relevant with long-lived child
     processes. If we have a fully-mapped anon folio before fork(), once
     our child processes and our parent all unmap (zap/COW) the same pages
     (but not the complete folio), we might not detect the partial mapping.
     However, once the child processes quit we would detect the partial
     mapping.

     How relevant this case is in practice remains to be seen.
     Swapout/migration will likely mitigate this.

     In the future, RMAP walkers could check for that for that case
     (e.g., when collecting access bits during reclaim) and simply flag
     them for deferred-splitting.

Link: https://lkml.kernel.org/r/20250303163014.1128035-21-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Andy Lutomirks^H^Hski <luto@kernel.org>
Cc: Borislav Betkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Lance Yang <ioworker0@gmail.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Matthew Wilcow (Oracle) <willy@infradead.org>
Cc: Michal Koutn <mkoutny@suse.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: tejun heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Zefan Li <lizefan.x@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-17 22:06:48 -07:00
David Hildenbrand
eb16876971 fs/proc/task_mmu: remove per-page mapcount dependency for PM_MMAP_EXCLUSIVE (CONFIG_NO_PAGE_MAPCOUNT)
Let's implement an alternative when per-page mapcounts in large folios are
no longer maintained -- soon with CONFIG_NO_PAGE_MAPCOUNT.

PM_MMAP_EXCLUSIVE will now be set if folio_likely_mapped_shared() is true
-- when the folio is considered "mapped shared", including when it once
was "mapped shared" but no longer is, as documented.

This might result in and under-indication of "exclusively mapped", which
is considered better than over-indicating it: under-estimating the USS
(Unique Set Size) is better than over-estimating it.

As an alternative, we could simply remove that flag with
CONFIG_NO_PAGE_MAPCOUNT completely, but there might be value to it.  So,
let's keep it like that and document the behavior.

Link: https://lkml.kernel.org/r/20250303163014.1128035-18-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Andy Lutomirks^H^Hski <luto@kernel.org>
Cc: Borislav Betkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Lance Yang <ioworker0@gmail.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Matthew Wilcow (Oracle) <willy@infradead.org>
Cc: Michal Koutn <mkoutny@suse.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: tejun heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Zefan Li <lizefan.x@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-17 22:06:47 -07:00
David Hildenbrand
ae4192b769 fs/proc/page: remove per-page mapcount dependency for /proc/kpagecount (CONFIG_NO_PAGE_MAPCOUNT)
Let's implement an alternative when per-page mapcounts in large folios are
no longer maintained -- soon with CONFIG_NO_PAGE_MAPCOUNT.

For large folios, we'll return the per-page average mapcount within the
folio, whereby we round to the closest integer when calculating the
average: however, we'll always return at least 1 if the folio is mapped.

So assuming a folio with 512 pages, the average would be:
* 0 if not pages are mapped
* 1 if there are 1 .. 767 per-page mappings
* 2 if there are 767 .. 1279 per-page mappings
...

For hugetlb folios and for large folios that are fully mapped into all
address spaces, there is no change.

We'll make use of this helper in other context next.

As an alternative, we could simply return 0 for non-hugetlb large folios,
or disable this legacy interface with CONFIG_NO_PAGE_MAPCOUNT.

But the information exposed by this interface can still be valuable, and
frequently we deal with fully-mapped large folios where the average
corresponds to the actual page mapcount.  So we'll leave it like this for
now and document the new behavior.

Note: this interface is likely not very relevant for performance.  If ever
required, we could try doing a rather expensive rmap walk to collect
precisely how often this folio page is mapped.

Link: https://lkml.kernel.org/r/20250303163014.1128035-17-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Andy Lutomirks^H^Hski <luto@kernel.org>
Cc: Borislav Betkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Lance Yang <ioworker0@gmail.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Matthew Wilcow (Oracle) <willy@infradead.org>
Cc: Michal Koutn <mkoutny@suse.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: tejun heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Zefan Li <lizefan.x@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-17 22:06:47 -07:00
kth
fba374f328 docs: Remove outdated highuid.rst documentation
The highuid.rst document describes a transition that is outdated and no
longer relevant. Additionally, it references filesystems (ncpfs and smbfs),
which have been removed or replaced.

Suggested-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Kang Taeho <kangtaeho2456@gmail.com>
Link: https://lore.kernel.org/r/20250313145650.278346-1-kangtaeho2456@gmail.com
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2025-03-17 16:42:27 -06:00
Adam Simonelli
2f1f7787b6 printk: Add an option to allow ttynull to be a default console device
The new option is CONFIG_NULL_TTY_DEFAULT_CONSOLE.

if enabled, and CONFIG_VT is disabled, ttynull will become the default
primary console device.

ttynull will be the only console device usually with this option enabled.
Some architectures do call add_preferred_console() which may add another
console though.

Motivation:

Many distributions ship with CONFIG_VT enabled. On tested desktop hardware
if CONFIG_VT is disabled, the default console device falls back to
/dev/ttyS0 instead of /dev/tty.

This could cause issues in user space, and hardware problems:

1. The user space issues include the case where  /dev/ttyS0 is
disconnected, and the TCGETS ioctl, which some user space libraries use
as a probe to determine if a file is a tty, is called on /dev/console and
fails. Programs that call isatty() on /dev/console and get an incorrect
false value may skip expected logging to /dev/console.

2. The hardware issues include the case if a user has a science instrument
or other device connected to the /dev/ttyS0 port, and they were to upgrade
to a kernel that is disabling the CONFIG_VT option, kernel logs will then
be sent to the device connected to /dev/ttyS0 unless they edit their
kernel command line manually.

The new CONFIG_NULL_TTY_DEFAULT_CONSOLE option will give users and
distribution maintainers an option to avoid this. Disabling CONFIG_VT and
enabling CONFIG_NULL_TTY_DEFAULT_CONSOLE will ensure the default kernel
console behavior is not dependent on hardware configuration by default, and
avoid unexpected new behavior on devices connected to the /dev/ttyS0 serial
port.

Reviewed-by: Petr Mladek <pmladek@suse.com>
Tested-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Adam Simonelli <adamsimonelli@gmail.com>
Link: https://lore.kernel.org/r/20250314160749.3286153-2-adamsimonelli@gmail.com
[pmladek@suse.com: Fixed indentation of the commit message.]
Signed-off-by: Petr Mladek <pmladek@suse.com>
2025-03-17 16:25:18 +01:00
Thomas Prescher
71f7456889 mm: hugetlb: add hugetlb_alloc_threads cmdline option
Add a command line option that enables control of how many threads should
be used to allocate huge pages.

[akpm@linux-foundation.org: tidy up a comment]
Link: https://lkml.kernel.org/r/20250227-hugepage-parameter-v2-2-7db8c6dc0453@cyberus-technology.de
Signed-off-by: Thomas Prescher <thomas.prescher@cyberus-technology.de>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Muchun Song <muchun.song@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-17 00:05:36 -07:00
SeongJae Park
b243d666d1 Docs/admin-guide/mm/damon/usage: add intervals_goal directory on the hierarchy
Document DAMON sysfs interface usage for DAMON sampling and aggregation
intervals auto-tuning.

Link: https://lkml.kernel.org/r/20250303221726.484227-9-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-17 00:05:34 -07:00
Ahmad Fatoum
e016173f65 reboot: add support for configuring emergency hardware protection action
We currently leave the decision of whether to shutdown or reboot to
protect hardware in an emergency situation to the individual drivers.

This works out in some cases, where the driver detecting the critical
failure has inside knowledge: It binds to the system management controller
for example or is guided by hardware description that defines what to do.

In the general case, however, the driver detecting the issue can't know
what the appropriate course of action is and shouldn't be dictating the
policy of dealing with it.

Therefore, add a global hw_protection toggle that allows the user to
specify whether shutdown or reboot should be the default action when the
driver doesn't set policy.

This introduces no functional change yet as hw_protection_trigger() has no
callers, but these will be added in subsequent commits.

[arnd@arndb.de: hide unused hw_protection_attr]
  Link: https://lkml.kernel.org/r/20250224141849.1546019-1-arnd@kernel.org
Link: https://lkml.kernel.org/r/20250217-hw_protection-reboot-v3-7-e1c09b090c0c@pengutronix.de
Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
Reviewed-by: Tzung-Bi Shih <tzungbi@kernel.org>
Cc: Benson Leung <bleung@chromium.org>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Fabio Estevam <festevam@denx.de>
Cc: Guenter Roeck <groeck@chromium.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Liam Girdwood <lgirdwood@gmail.com>
Cc: Lukasz Luba <lukasz.luba@arm.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Matteo Croce <teknoraver@meta.com>
Cc: Matti Vaittinen <mazziesaccount@gmail.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Rob Herring (Arm) <robh@kernel.org>
Cc: Rui Zhang <rui.zhang@intel.com>
Cc: Sascha Hauer <kernel@pengutronix.de>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16 23:24:14 -07:00
Lorenzo Stoakes
8e2f2aeb8b fs/proc/task_mmu: add guard region bit to pagemap
Patch series "fs/proc/task_mmu: add guard region bit to pagemap".

Currently there is no means of determining whether a given page in a
mapping range is designated a guard region (as installed via madvise()
using the MADV_GUARD_INSTALL flag).

This is generally not an issue, but in some instances users may wish to
determine whether this is the case.

This series adds this ability via /proc/$pid/pagemap, updates the
documentation and adds a self test to assert that this functions
correctly.


This patch (of 2):

Currently there is no means by which users can determine whether a given
page in memory is in fact a guard region, that is having had the
MADV_GUARD_INSTALL madvise() flag applied to it.

This is intentional, as to provide this information in VMA metadata would
contradict the intent of the feature (providing a means to change fault
behaviour at a page table level rather than a VMA level), and would
require VMA metadata operations to scan page tables, which is
unacceptable.

In many cases, users have no need to reflect and determine what regions
have been designated guard regions, as it is the user who has established
them in the first place.

But in some instances, such as monitoring software, or software that
relies upon being able to ascertain the nature of mappings within a remote
process for instance, it becomes useful to be able to determine which
pages have the guard region marker applied.

This patch makes use of an unused pagemap bit (58) to provide this
information.

This patch updates the documentation at the same time as making the change
such that the implementation of the feature and the documentation of it
are tied together.

Link: https://lkml.kernel.org/r/cover.1740139449.git.lorenzo.stoakes@oracle.com
Link: https://lkml.kernel.org/r/521d99c08b975fb06a1e7201e971cc24d68196d1.1740139449.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Matthew Wilcow (Oracle) <willy@infradead.org>
Cc: "Paul E . McKenney" <paulmck@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16 22:06:41 -07:00
Sergey Senozhatsky
4127e13c93 zram: remove max_comp_streams device attr
max_comp_streams device attribute has been defunct since May 2016 when
zram switched to per-CPU compression streams, remove it.

Link: https://lkml.kernel.org/r/20250303022425.285971-5-senozhatsky@chromium.org
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Kairui Song <ryncsn@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Yosry Ahmed <yosry.ahmed@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16 22:06:33 -07:00
Frank van der Linden
f866cfcec2 mm/hugetlb: add hugetlb_cma_only cmdline option
Add an option to force hugetlb gigantic pages to be allocated using CMA
only (if hugetlb_cma is enabled).  This avoids a fallback to allocation
from the rest of system memory if the CMA allocation fails.  This makes
the size of hugetlb_cma a hard upper boundary for gigantic hugetlb page
allocations.

This is useful because, with a large CMA area, the kernel's unmovable
allocations will have less room to work with and it is undesirable for new
hugetlb gigantic page allocations to be done from that remaining area.  It
will eat in to the space available for unmovable allocations, leading to
unwanted system behavior (OOMs because the kernel fails to do unmovable
allocations).

So, with this enabled, an administrator can force a hard upper bound for
runtime gigantic page allocations, and have more predictable system
behavior.

Link: https://lkml.kernel.org/r/20250228182928.2645936-26-fvdl@google.com
Signed-off-by: Frank van der Linden <fvdl@google.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Dan Carpenter <dan.carpenter@linaro.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Joao Martins <joao.m.martins@oracle.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Roman Gushchin (Cruise) <roman.gushchin@linux.dev>
Cc: Usama Arif <usamaarif642@gmail.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16 22:06:31 -07:00
Frank van der Linden
5b47c02967 mm/hugetlb: convert cmdline parameters from setup to early
Convert the cmdline parameters (hugepagesz, hugepages, default_hugepagesz
and hugetlb_free_vmemmap) to early parameters.

Since parse_early_param might run before MMU setups on some platforms
(powerpc), validation of huge page sizes as specified in command line
parameters would fail.  So instead, for the hstate-related values, just
record the them and parse them on demand, from hugetlb_bootmem_alloc.

The allocation of hugetlb bootmem pages is now done in
hugetlb_bootmem_alloc, which is called explicitly at the start of
mm_core_init().  core_initcall would be too late, as that happens with
memblock already torn down.

This change will allow earlier allocation and initialization of bootmem
hugetlb pages later on.

No functional change intended.

Link: https://lkml.kernel.org/r/20250228182928.2645936-8-fvdl@google.com
Signed-off-by: Frank van der Linden <fvdl@google.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Dan Carpenter <dan.carpenter@linaro.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Joao Martins <joao.m.martins@oracle.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Roman Gushchin (Cruise) <roman.gushchin@linux.dev>
Cc: Usama Arif <usamaarif642@gmail.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16 22:06:26 -07:00
Frank van der Linden
c009da4258 mm, cma: support multiple contiguous ranges, if requested
Currently, CMA manages one range of physically contiguous memory. 
Creation of larger CMA areas with hugetlb_cma may run in to gaps in
physical memory, so that they are not able to allocate that contiguous
physical range from memblock when creating the CMA area.

This can happen, for example, on an AMD system with > 1TB of memory, where
there will be a gap just below the 1TB (40bit DMA) line.  If you have set
aside most of memory for potential hugetlb CMA allocation,
cma_declare_contiguous_nid will fail.

hugetlb_cma doesn't need the entire area to be one physically contiguous
range.  It just cares about being able to get physically contiguous chunks
of a certain size (e.g.  1G), and it is fine to have the CMA area backed
by multiple physical ranges, as long as it gets 1G contiguous allocations.

Multi-range support is implemented by introducing an array of ranges,
instead of just one big one.  Each range has its own bitmap.  Effectively,
the allocate and release operations work as before, just per-range.  So,
instead of going through one large bitmap, they now go through a number of
smaller ones.

The maximum number of supported ranges is 8, as defined in CMA_MAX_RANGES.

Since some current users of CMA expect a CMA area to just use one
physically contiguous range, only allow for multiple ranges if a new
interface, cma_declare_contiguous_nid_multi, is used.  The other
interfaces will work like before, creating only CMA areas with 1 range.

cma_declare_contiguous_nid_multi works as follows, mimicking the
default "bottom-up, above 4G" reservation approach:

0) Try cma_declare_contiguous_nid, which will use only one
   region. If this succeeds, return. This makes sure that for
   all the cases that currently work, the behavior remains
   unchanged even if the caller switches from
   cma_declare_contiguous_nid to cma_declare_contiguous_nid_multi.
1) Select the largest free memblock ranges above 4G, with
   a maximum number of CMA_MAX_RANGES.
2) If we did not find at most CMA_MAX_RANGES that add
   up to the total size requested, return -ENOMEM.
3) Sort the selected ranges by base address.
4) Reserve them bottom-up until we get what we wanted.

Link: https://lkml.kernel.org/r/20250228182928.2645936-3-fvdl@google.com
Signed-off-by: Frank van der Linden <fvdl@google.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dan Carpenter <dan.carpenter@linaro.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Joao Martins <joao.m.martins@oracle.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Roman Gushchin (Cruise) <roman.gushchin@linux.dev>
Cc: Usama Arif <usamaarif642@gmail.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16 22:06:25 -07:00
SeongJae Park
0f28583b28 Docs/damon: move DAMOS filter type names and meaning to design doc
DAMON sysfs usage doc is describing DAMOS filter type names and their
meanings in short.  The design doc is providing the short meaning and
detailed descriptions, too.  This is unnecessary duplicates and confuses
where to document new DAMOS filter types and features.  Move the details
from usage to design doc.

Link: https://lkml.kernel.org/r/20250218223708.53437-4-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16 22:06:24 -07:00
Usama Arif
4ddb209268 Docs/admin-guide/mm/damon/usage: document hugepage_size filter type
This includes both the 'hugepage_size' filter type and the min/max files
used to decide range of sizes to filter on.

Link: https://lkml.kernel.org/r/20250211124437.278873-5-usamaarif642@gmail.com
Signed-off-by: Usama Arif <usamaarif642@gmail.com>
Reviewed-by: SeongJae Park <sj@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16 22:06:13 -07:00
Yosry Ahmed
6df8bae8e8 mm: zbud: remove zbud
The zbud compressed pages allocator is rarely used, most users use
zsmalloc.  zbud consumes much more memory (only stores 1 or 2 compressed
pages per physical page).  The only advantage of zbud is a marginal
performance improvement that by no means justify the memory overhead.

Historically, zsmalloc had significantly worse latency than zbud and
z3fold but offered better memory savings.  This is no longer the case as
shown by a simple recent analysis [1].  In a kernel build test on tmpfs in
a limited cgroup, zbud 2-3% less time than zsmalloc, but at the cost of
using ~32% more memory (1.5G vs 1.13G).  The tradeoff does not make sense
for zbud in any practical scenario.

The only alleged advantage of zbud is not having the dependency on
CONFIG_MMU, but CONFIG_SWAP already depends on CONFIG_MMU anyway, and zbud
is only used by zswap.

Remove zbud after z3fold's removal, leaving zsmalloc as the one and only
zpool allocator.  Leave the removal of the zpool API (and its associated
config options) to a followup cleanup after no more allocators show up.

Deprecating zbud for a few cycles before removing it was initially
proposed [2], like z3fold was marked as deprecated for 2 cycles [3]. 
However, Johannes rightfully pointed out that the 2 cycles is too short
for most downstream consumers, and z3fold was deprecated first only as a
courtesy anyway.

[1]https://lore.kernel.org/lkml/CAJD7tkbRF6od-2x_L8-A1QL3=2Ww13sCj4S3i4bNndqF+3+_Vg@mail.gmail.com/
[2]https://lore.kernel.org/lkml/Z5gdnSX5Lv-nfjQL@google.com/
[3]https://lore.kernel.org/lkml/20240904233343.933462-1-yosryahmed@google.com/

Link: https://lkml.kernel.org/r/20250129180633.3501650-3-yosry.ahmed@linux.dev
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Nhat Pham <nphamcs@gmail.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vitaly Wool <vitaly.wool@konsulko.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: WANG Xuerui <kernel@xen0n.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16 22:06:01 -07:00
Baokun Li
62c3da1eac ext4: update the descriptions of data_err=abort and data_err=ignore
We now print error messages in ext4_end_bio() when page writeback
encounters an error. If data_err=abort is set, the journal will also
be aborted in a kworker. This means that we now check all Buffer I/O
in all modes and decide whether to abort the journal based on the
data_err option. Therefore, we remove the ordered mode restriction
in the descriptions of data_err=abort and data_err=ignore.

Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20250122110533.4116662-8-libaokun@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-03-13 10:08:09 -04:00
I Hsin Cheng
e5e6c016fc docs: Correct installation instruction
Ammend missing "install" operation keyword after "apt-get", and fix
"build-essentials" to "build-essential".

Signed-off-by: I Hsin Cheng <richard120310@gmail.com>
Reviewed-by: Charlie Jenkins <charlie@rivosinc.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20250306030708.8133-1-richard120310@gmail.com
2025-03-12 16:40:06 -06:00
Dr. David Alan Gilbert
270247a209 PNP: Remove prehistoric deadcode
pnp_remove_card() is currently unused, it has been since it was
added in 2003's BKrev: 3e6d3f19XSmESWEZnNEReEJOJW5SOw

pnp_unregister_protocol() is currently unused,  it has been since
it was added in 2002's BKrev: 3df0cf6d4FVUKndhbfxjL7pksw5PGA

Remove them, and pnp_remove_card_device() and __pnp_remove_device()
which are now no longer used.

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Link: https://patch.msgid.link/20250307214936.74504-1-linux@treblig.org
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-03-12 20:37:56 +01:00
Michal Koutný
78f6519ed0 cgroup: Add deprecation message to legacy freezer controller
As explained in the commit 76f969e894 ("cgroup: cgroup v2 freezer"),
the original freezer is imperfect, some users may unwittingly rely on it
when there exists the alternative of v2. Print a message when it happens
and explain that in the docs.

Signed-off-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-03-11 09:22:54 -10:00
Michal Koutný
fd4fd0a869 mm: Add transformation message for per-memcg swappiness
The concept of per-memcg swappiness has never landed well in memcg for
cgroup v2. Add a message to users who use it on v1 hierarchy.
Decreased swappiness transforms to memory.swap.max=0 whereas
increased swappiness transforms into active memory.reclaim operation.

Link: https://lore.kernel.org/r/1577252208-32419-1-git-send-email-teawater@gmail.com/
Signed-off-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-03-11 09:22:54 -10:00
Linus Torvalds
9712d38c87 Kbuild fixes for v6.14 (3rd)
- Use the specified $(LD) when building userprogs with Clang
 
  - Pass the correct target triple when compile-testing UAPI headers
    with Clang
 
  - Fix pacman-pkg build error with KBUILD_OUTPUT
 -----BEGIN PGP SIGNATURE-----
 
 iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAmfN23gVHG1hc2FoaXJv
 eUBrZXJuZWwub3JnAAoJED2LAQed4NsGui0P/0SOGcw7fprs7nKLbGKxwj6zVz+Y
 0kf/lVDrabyPc7AznsBVaoRNTJK7XkkpWsOyMLIRbrFRl9rFDZwvJXP9AKP4lSwM
 km7D6r27WBi06GlOyRHDt3bFUxYscS8NbDJ9g4gYqTJ0Rm0G+GwxHKsfSwEjDdg4
 jGaq3+lVqcHSywxmEApDOLyI1rZhcF0qSIqUW4EXY+m/Z5Z+Um0CjAMibN8ljphQ
 nk4I8HmFQgGoE5nVT8/KrrJdZlgEZp1QRU4LD5ToFfg+4ehukqkVg11c5lMsdUke
 P8Blok65r+cNFCLzfBrALHx+PnAefV3c5KdrGLZArNbyjLaPDmMcY02+CbPwzEvY
 UEpC69h9ygOHMUcdyGRxvB4DNS92yiO0GNBnG8giMUjpnrTiThPZ1eIhoUR9T87J
 QYk2RLJ1APwOtvQ87WKA5LnTTSJlxizWhaD/8n5Z2vmleYUdnl9r3JG1OV8Gre4p
 S7izDsckYm13g32R/kAaVDnLWUjVHZMOF6SCq4wpej8uG8LPAPy8Ji7xMLJ7B6Vr
 hlpJM2mkM+NnIAYVLmbXGXWSLu6Xw80PPWJm8dqcKDqdof64DM8/xXNCgTIuQwgl
 ravUau+KLx70R9vG+wPZLgpDTxT4WkUP9rtpxBrdmMyQzB2CrPPF3pteOehfX2BP
 7GJVR2lw0XkCx0rg
 =V4ee
 -----END PGP SIGNATURE-----

Merge tag 'kbuild-fixes-v6.14-3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild fixes from Masahiro Yamada:

 - Use the specified $(LD) when building userprogs with Clang

 - Pass the correct target triple when compile-testing UAPI headers
   with Clang

 - Fix pacman-pkg build error with KBUILD_OUTPUT

* tag 'kbuild-fixes-v6.14-3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  kbuild: install-extmod-build: Fix build when specifying KBUILD_OUTPUT
  docs: Kconfig: fix defconfig description
  kbuild: hdrcheck: fix cross build with clang
  kbuild: userprogs: use correct lld when linking through clang
2025-03-09 09:23:14 -10:00
Jason Gunthorpe
8eea4e7447 taint: Add TAINT_FWCTL
Requesting a fwctl scope of access that includes mutating device debug
data will cause the kernel to be tainted. Changing the device operation
through things in the debug scope may cause the device to malfunction in
undefined ways. This should be reflected in the TAINT flags to help any
debuggers understand that something has been done.

Link: https://patch.msgid.link/r/4-v5-642aa0c94070+4447f-fwctl_jgg@nvidia.com
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Tested-by: Dave Jiang <dave.jiang@intel.com>
Tested-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-03-06 15:13:13 -04:00
Shashank Balaji
c7461cca91 cgroup, docs: Be explicit about independence of RT_GROUP_SCHED and non-cpu controllers
The cgroup v2 cpu controller has a limitation that if
CONFIG_RT_GROUP_SCHED is enabled, the cpu controller can be enabled only
if all the realtime processes are in the root cgroup. The other
controllers have no such restriction. They can be used for the resource
control of realtime processes irrespective of whether
CONFIG_RT_GROUP_SCHED is enabled or not.

Signed-off-by: Shashank Balaji <shashank.mahadasyam@sony.com>
Acked-by: Waiman Long <longman@redhat.com>
Acked-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-03-05 08:43:53 -10:00
Satoru Takeuchi
dd0b7d4a56 docs: Kconfig: fix defconfig description
Commit 2a86f66121 ("kbuild: use KBUILD_DEFCONFIG as the fallback
for DEFCONFIG_LIST") removed arch/$ARCH/defconfig; however,
the document has not been updated to reflect this change yet.

Signed-off-by: Satoru Takeuchi <satoru.takeuchi@gmail.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2025-03-05 04:07:06 +09:00
Breno Leitao
98fdaeb296 x86/bugs: Make spectre user default depend on MITIGATION_SPECTRE_V2
Change the default value of spectre v2 in user mode to respect the
CONFIG_MITIGATION_SPECTRE_V2 config option.

Currently, user mode spectre v2 is set to auto
(SPECTRE_V2_USER_CMD_AUTO) by default, even if
CONFIG_MITIGATION_SPECTRE_V2 is disabled.

Set the spectre_v2 value to auto (SPECTRE_V2_USER_CMD_AUTO) if the
Spectre v2 config (CONFIG_MITIGATION_SPECTRE_V2) is enabled, otherwise
set the value to none (SPECTRE_V2_USER_CMD_NONE).

Important to say the command line argument "spectre_v2_user" overwrites
the default value in both cases.

When CONFIG_MITIGATION_SPECTRE_V2 is not set, users have the flexibility
to opt-in for specific mitigations independently. In this scenario,
setting spectre_v2= will not enable spectre_v2_user=, and command line
options spectre_v2_user and spectre_v2 are independent when
CONFIG_MITIGATION_SPECTRE_V2=n.

Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: David Kaplan <David.Kaplan@amd.com>
Link: https://lore.kernel.org/r/20241031-x86_bugs_last_v2-v2-2-b7ff1dab840e@debian.org
2025-03-03 12:48:41 +01:00
Mel Gorman
d2132f453e mm: security: Allow default HARDENED_USERCOPY to be set at compile time
HARDENED_USERCOPY defaults to on if enabled at compile time. Allow
hardened_usercopy= default to be set at compile time similar to
init_on_alloc= and init_on_free=. The intent is that hardening
options that can be disabled at runtime can set their default at
build time.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Link: https://lore.kernel.org/r/20250123221115.19722-3-mgorman@techsingularity.net
Signed-off-by: Kees Cook <kees@kernel.org>
2025-02-28 11:51:31 -08:00
Arnd Bergmann
0081fdeccb x86/mm: Drop support for CONFIG_HIGHPTE
With the maximum amount of RAM now 4GB, there is very little point
to still have PTE pages in highmem. Drop this for simplification.

The only other architecture supporting HIGHPTE is 32-bit arm, and
once that feature is removed as well, the highpte logic can be
dropped from common code as well.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250226213714.4040853-8-arnd@kernel.org
2025-02-27 11:22:06 +01:00
Arnd Bergmann
bbeb69ce30 x86/mm: Remove CONFIG_HIGHMEM64G support
HIGHMEM64G support was added in linux-2.3.25 to support (then)
high-end Pentium Pro and Pentium III Xeon servers with more than 4GB of
addressing, NUMA and PCI-X slots started appearing.

I have found no evidence of this ever being used in regular dual-socket
servers or consumer devices, all the users seem obsolete these days,
even by i386 standards:

 - Support for NUMA servers (NUMA-Q, IBM x440, unisys) was already
   removed ten years ago.

 - 4+ socket non-NUMA servers based on Intel 450GX/450NX, HP F8 and
   ServerWorks ServerSet/GrandChampion could theoretically still work
   with 8GB, but these were exceptionally rare even 20 years ago and
   would have usually been equipped with than the maximum amount of
   RAM.

 - Some SKUs of the Celeron D from 2004 had 64-bit mode fused off but
   could still work in a Socket 775 mainboard designed for the later
   Core 2 Duo and 8GB. Apparently most BIOSes at the time only allowed
   64-bit CPUs.

 - The rare Xeon LV "Sossaman" came on a few motherboards with
   registered DDR2 memory support up to 16GB.

 - In the early days of x86-64 hardware, there was sometimes the need
   to run a 32-bit kernel to work around bugs in the hardware drivers,
   or in the syscall emulation for 32-bit userspace. This likely still
   works but there should never be a need for this any more.

PAE mode is still required to get access to the 'NX' bit on Atom
'Pentium M' and 'Core Duo' CPUs.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250226213714.4040853-6-arnd@kernel.org
2025-02-27 11:21:53 +01:00
Arnd Bergmann
0abf508675 x86/smp: Drop 32-bit "bigsmp" machine support
The x86-32 kernel used to support multiple platforms with more than eight
logical CPUs, from the 1999-2003 timeframe: Sequent NUMA-Q, IBM Summit,
Unisys ES7000 and HP F8. Support for all except the latter was dropped
back in 2014, leaving only the F8 based DL740 and DL760 G2 machines in
this catery, with up to eight single-core Socket-603 Xeon-MP processors
with hyperthreading.

Like the already removed machines, the HP F8 servers at the time cost
upwards of $100k in typical configurations, but were quickly obsoleted
by their 64-bit Socket-604 cousins and the AMD Opteron.

Earlier servers with up to 8 Pentium Pro or Xeon processors remain
fully supported as they had no hyperthreading. Similarly, the more
common 4-socket Xeon-MP machines with hyperthreading using Intel
or ServerWorks chipsets continue to work without this, and all the
multi-core Xeon processors also run 64-bit kernels.

While the "bigsmp" support can also be used to run on later 64-bit
machines (including VM guests), it seems best to discourage that
and get any remaining users to update their kernels to 64-bit builds
on these. As a side-effect of this, there is also no more need to
support NUMA configurations on 32-bit x86, as all true 32-bit
NUMA platforms are already gone.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250226213714.4040853-3-arnd@kernel.org
2025-02-27 11:19:05 +01:00
Michael Ellerman
215bd64ada docs: Remove reference to removed CBE_CPUFREQ_SPU_GOVERNOR
Remove a reference to CBE_CPUFREQ_SPU_GOVERNOR which has been removed.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20241218105523.416573-20-mpe@ellerman.id.au
2025-02-26 21:15:09 +05:30
Borislav Petkov
8442df2b49 x86/bugs: KVM: Add support for SRSO_MSR_FIX
Add support for

  CPUID Fn8000_0021_EAX[31] (SRSO_MSR_FIX). If this bit is 1, it
  indicates that software may use MSR BP_CFG[BpSpecReduce] to mitigate
  SRSO.

Enable BpSpecReduce to mitigate SRSO across guest/host boundaries.

Switch back to enabling the bit when virtualization is enabled and to
clear the bit when virtualization is disabled because using a MSR slot
would clear the bit when the guest is exited and any training the guest
has done, would potentially influence the host kernel when execution
enters the kernel and hasn't VMRUN the guest yet.

More detail on the public thread in Link below.

Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20241202120416.6054-1-bp@kernel.org
2025-02-26 15:13:06 +01:00
Steven Rostedt
937fbf111a tracing: Add traceoff_after_boot option
Sometimes tracing is used to debug issues during the boot process. Since
the trace buffer has a limited amount of storage, it may be prudent to
disable tracing after the boot is finished, otherwise the critical
information may be overwritten.  With this option, the main tracing buffer
will be turned off at the end of the boot process.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Borislav Petkov <bp@alien8.de>
Link: https://lore.kernel.org/20250208103017.48a7ec83@batman.local.home
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-02-25 13:46:40 -05:00
Rafael J. Wysocki
de585eac08 Merge branch 'cpuidle-menu'
This work had been triggered by a report that commit 0611a640e6
("eventpoll: prefer kfree_rcu() in __ep_remove()") had caused the
critical-jOPS metric of the SPECjbb 2015 benchmark [1] to drop by around
50% even though it generally reduced kernel overhead.  Indeed, it was
found during further investigation that the total interrupt rate while
running the SPECjbb workload had fallen as a result of that commit by
55% and the local timer interrupt rate had fallen by almost 80%.

That turned out to cause the menu cpuidle governor to select the deepest
idle state supplied by the cpuidle driver (intel_idle) much more often
which added significantly more idle state latency to the workload and
that led to the decrease of the critical-jOPS score.

Interestingly enough, this problem was not visible when the teo cpuidle
governor was used instead of menu, so it appeared to be specific to the
latter.  CPU wakeup event statistics collected while running the
workload indicated that the menu governor was effectively ignoring non-
timer wakeup information and all of its idle state selection decisions
appeared to be based on timer wakeups only.  Thus, it appeared that the
reduction of the local timer interrupt rate caused the governor to
predict a idle duration much more often while running the workload and
the deepest idle state was selected significantly more often as a result
of that.

A subsequent inspection of the get_typical_interval() function in the
menu governor indicated that it might return UINT_MAX too often which
then caused the governor's decisions to be based entirely on information
related to timers.

Generally speaking, UINT_MAX is returned by get_typical_interval() if it
cannot make a prediction based on the most recent idle intervals data
with sufficiently high confidence, but at least in some cases this means
that useful information is not taken into account at all which may lead
to significant idle state selection mistakes.  Moreover, this is not
really unlikely to happen.

One issue with get_typical_interval() is that, when it eliminates
outliers from the sample set in an attempt to reduce the standard
deviation (and so improve the prediction confidence), it does that by
dropping high-end samples only, while samples at the low end of the set
are retained.  However, the samples at the low end very well may be the
outliers and they should be eliminated from the sample set instead of
the high-end samples.  Accordingly, the likelihood of making a
meaningful idle duration prediction can be improved by making it also
eliminate low-end samples if they are farther from the average than
high-end samples.

Another issue is that get_typical_interval() gives up after eliminating
1/4 of the samples if the standard deviation is still not as low as
desired (within 1/6 of the average or within 20 us if the average is
close to 0), but the remaining samples in the set still represent useful
information at that point and discarding them altogether may lead to
suboptimal idle state selection.

For instance, the largest idle duration value in the get_typical_interval()
data set is the maximum idle duration observed recently and it is likely
that the upcoming idle duration will not exceed it.  Therefore, in the
absence of a better choice, this value can be used as an upper bound on
the target residency of the idle state to select.

* cpuidle-menu:
  cpuidle: menu: Update documentation after get_typical_interval() changes
  cpuidle: menu: Avoid discarding useful information
  cpuidle: menu: Eliminate outliers on both ends of the sample set
  cpuidle: menu: Tweak threshold use in get_typical_interval()
  cpuidle: menu: Use one loop for average and variance computations
  cpuidle: menu: Drop a redundant local variable
2025-02-25 12:13:52 +01:00
David Arcari
5e7e39ae15 intel_idle: introduce 'no_native' module parameter
Since commit 18734958e9 ("intel_idle: Use ACPI _CST for processor models
without C-state tables") the intel_idle driver has had the ability to use
the ACPI _CST to populate C-states when the processor model is not
recognized.

However, even when the processor model is recognized (native mode) there
are cases where it is useful to make the driver ignore the per-CPU idle
states in lieu of ACPI C-states (such as specific application performance).

Add a new 'no_native' module parameter to provide this functionality.

Signed-off-by: David Arcari <darcari@redhat.com>
Link: https://patch.msgid.link/20250220151120.1131122-1-darcari@redhat.com
Reviewed-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
[ rjw: Spell CPU in capitals ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-02-25 12:13:40 +01:00
Rafael J. Wysocki
5c35041099 cpuidle: menu: Update documentation after get_typical_interval() changes
The documentation of the menu cpuidle governor needs to be updated
to match the code behavior after some changes made recently.

No functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
Link: https://patch.msgid.link/4998484.31r3eYUQgx@rjwysocki.net
[ rjw: More specific subject, two typos fixed in the changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-02-25 12:12:46 +01:00
Kees Cook
39ec9eaaa1 coredump: Only sort VMAs when core_sort_vma sysctl is set
The sorting of VMAs by size in commit 7d442a33bf ("binfmt_elf: Dump
smaller VMAs first in ELF cores") breaks elfutils[1]. Instead, sort
based on the setting of the new sysctl, core_sort_vma, which defaults
to 0, no sorting.

Reported-by: Michael Stapelberg <michael@stapelberg.ch>
Closes: https://lore.kernel.org/all/20250218085407.61126-1-michael@stapelberg.de/ [1]
Fixes: 7d442a33bf ("binfmt_elf: Dump smaller VMAs first in ELF cores")
Signed-off-by: Kees Cook <kees@kernel.org>
2025-02-24 10:51:57 -08:00
Maksimilijan Marosevic
4dd4eef60f Fix typos in admin-guide/gpio
Fixing typos.

Signed-off-by: Maksimilijan Marosevic <maksimilijan.marosevic@proton.me>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20250218193822.1031-1-maksimilijan.marosevic@proton.me
2025-02-21 13:27:44 -07:00
Martin Tůma
7b8b6bdfab media: admin-guide: add mgb4 GMSL modules variants description
Document the (new) mgb4 GMSL modules variants.

Signed-off-by: Martin Tůma <martin.tuma@digiteqautomotive.com>
Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
2025-02-21 10:33:03 +01:00
Chandra Pratap
bf786586ca Documentation: media: fix spelling error in the HDMI CEC documentation
Correct the erroneous spelling of 'responsible' in
Documentation/admin-guide/media/cec.rst. This fixes all errors
pointed out by codespell in the file.

Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com>
Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
2025-02-21 10:33:02 +01:00
Mauro Carvalho Chehab
2234652a73 docs: thunderbolt: Allow creating cross-references for ABI
Now that Documentation/ABI is processed by automarkup, let it
generate cross-references for the corresponding ABI file.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/a655e770e1446f91088f579b79ae890a19771119.1739254867.git.mchehab+huawei@kernel.org
2025-02-18 13:42:46 -07:00
David Reaver
85df12c599 docs: iostats: Rewrite intro, remove outdated formats
The introduction discussed stat file formats for very old kernel versions,
which obscured key information that readers may find useful. Additionally,
the example file contents and the reference to "15 fields" did not account
for the flush fields added in b686631865 ("block: add iostat counters for
flush requests") [1].

Rewrite the introduction to focus only on the current kernel's disk I/O stat
file formats. Also, clean up wording for conciseness.

Link: https://lore.kernel.org/lkml/157433282607.7928.5202409984272248322.stgit@buzz/T/ [1]

Signed-off-by: David Reaver <me@davidreaver.com>
Link: https://lore.kernel.org/r/20250215180114.157948-1-me@davidreaver.com
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2025-02-18 13:28:30 -07:00
Mike Rapoport (Microsoft)
8b2ee518fc Documentation/kernel-parameters: fix typo in description of reserve_mem
The format description of reserve_mem uses [KNG] as units, rather than
[KMG].

Fix it.

Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20250218070845.3769520-1-rppt@kernel.org
2025-02-18 13:25:06 -07:00
Rafael J. Wysocki
7802fce7dc cpufreq: intel_pstate: Make it possible to avoid enabling CAS
Capacity-aware scheduling (CAS) is enabled by default by intel_pstate on
hybrid systems without SMT, but in some usage scenarios it may be more
attractive to place tasks for maximum CPU performance regardless of the
extra cost in terms of energy, which is the case on such systems when
CAS is not enabled, so introduce a command line option to forbid
intel_pstate to enable CAS.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by:Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Link: https://patch.msgid.link/2781262.mvXUDI8C0e@rjwysocki.net
2025-02-18 20:47:33 +01:00
Beata Michalska
fbb4a4759b cpufreq: Introduce an optional cpuinfo_avg_freq sysfs entry
Currently the CPUFreq core exposes two sysfs attributes that can be used
to query current frequency of a given CPU(s): namely cpuinfo_cur_freq
and scaling_cur_freq. Both provide slightly different view on the
subject and they do come with their own drawbacks.

cpuinfo_cur_freq provides higher precision though at a cost of being
rather expensive. Moreover, the information retrieved via this attribute
is somewhat short lived as frequency can change at any point of time
making it difficult to reason from.

scaling_cur_freq, on the other hand, tends to be less accurate but then
the actual level of precision (and source of information) varies between
architectures making it a bit ambiguous.

The new attribute, cpuinfo_avg_freq, is intended to provide more stable,
distinct interface, exposing an average frequency of a given CPU(s), as
reported by the hardware, over a time frame spanning no more than a few
milliseconds. As it requires appropriate hardware support, this
interface is optional.

Note that under the hood, the new attribute relies on the information
provided by arch_freq_get_on_cpu, which, up to this point, has been
feeding data for scaling_cur_freq attribute, being the source of
ambiguity when it comes to interpretation. This has been amended by
restoring the intended behavior for scaling_cur_freq, with a new
dedicated config option to maintain status quo for those, who may need
it.

CC: Jonathan Corbet <corbet@lwn.net>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@redhat.com>
CC: Borislav Petkov <bp@alien8.de>
CC: Dave Hansen <dave.hansen@linux.intel.com>
CC: H. Peter Anvin <hpa@zytor.com>
CC: Phil Auld <pauld@redhat.com>
CC: x86@kernel.org
CC: linux-doc@vger.kernel.org

Signed-off-by: Beata Michalska <beata.michalska@arm.com>
Reviewed-by: Prasanna Kumar T S M <ptsm@linux.microsoft.com>
Reviewed-by: Sumit Gupta <sumitg@nvidia.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Link: https://lore.kernel.org/r/20250131162439.3843071-3-beata.michalska@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-02-17 18:09:31 +00:00
Mauro Carvalho Chehab
de61d6515b docs: ABI: move README contents to the top
The ABI documentation looks a little bit better if it starts
with the contents of the README is placed at the beginning.

Move it to the beginning of the ABI chapter. While here, improve
the README text and change the title that will be shown at the
html/pdf output to be coherent with both ABI file contents and
with the generated documentation output.

Suggested-by: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/20250211055809.1898623-1-mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2025-02-13 09:47:44 -07:00
Mauro Carvalho Chehab
4bb2dbd757 docs: admin-guide/abi: split files from symbols
Now that get_abi has gained support for filtering its output,
split ABI symbols from files at the html output.

That makes pages smaller and easier to navigate.

As an additional bonus, as it will paralelize files handling,
it gives an additional performance improvement.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/30e3cf2a8aeef23ca889de60a90f7de141e0dc0e.1739182025.git.mchehab+huawei@kernel.org
2025-02-10 11:19:57 -07:00
Mauro Carvalho Chehab
5d7871d77f docs: sphinx/kernel_abi: parse ABI files only once
Right now, the logic parses ABI files on 4 steps, one for each
directory. While this is fine in principle, by doing that, not
all symbol cross-references will be created.

Change the logic to do the parsing only once in order to get
a global dictionary to be used when creating ABI cross-references.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/5205c53838b6ea25f4cdd4cc1e3d17c0141e75a6.1739182025.git.mchehab+huawei@kernel.org
2025-02-10 11:19:57 -07:00
Mauro Carvalho Chehab
9d7ec88679 docs: use get_abi.py for ABI generation
Use the new script instead of the old one when generating ABI docs.

For now, execute it via exec. Future changes will better integrate it
by using the class defined there directly.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/e7fcb121c0612c94f6f54f0d742cd3a26a46cd7d.1739182025.git.mchehab+huawei@kernel.org
2025-02-10 11:19:56 -07:00
Mauro Carvalho Chehab
7ceb84b726 docs: admin-guide: abi: add SPDX tags to ABI files
Such files are missing SPDX tags containing the licensing information.

Add them.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/8c7bfe676e7349ea2d1930bf918d54e27d15ae9e.1739182025.git.mchehab+huawei@kernel.org
2025-02-10 11:19:56 -07:00