Commit Graph

3473 Commits

Author SHA1 Message Date
Jonathan Corbet
270beb5b2a docs: admin-guide: bring some order to the "everything else" section
The bulk of the admin guide had become a big pile of stuff haphazardly
tossed together, mostly in the catch-all "everything else" section.  Split
that section into a few broad categories and sort the documents into them
as appropriate.

No documents have been added or removed, they are just reordered.  Note
that many of these documents are severely obsolete and should be considered
for removal.

Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20241213182057.343527-4-corbet@lwn.net
2024-12-17 13:23:43 -07:00
Jonathan Corbet
2eb4e66cdd docs: admin-guide: add some subsection headings
As part of the goal of bringing some order to this file, add subsection
headings to help readers find what they are looking for.

Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20241213182057.343527-3-corbet@lwn.net
2024-12-17 13:23:38 -07:00
Jonathan Corbet
42463d3e89 docs: admin-guide: join the sysfs information in one place
The documents describing sysfs are spread out in the admin guide; bring
them together in one place.

Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20241213182057.343527-2-corbet@lwn.net
2024-12-17 13:23:31 -07:00
Paul E. McKenney
282e06cc8f rcutorture: Add parameters to control polled/conditional wait interval
This commit adds rcutorture module parameters gp_cond_wi, gp_cond_wi_exp,
gp_poll_wi, and gp_poll_wi_exp to control the wait interval for
conditional, conditional expedited, polled, and polled expedited grace
periods, respectively.  When rcu_torture_writer() is testing these types
of grace periods, hrtimers are used to randomly wait up to the specified
number of microseconds, but with nanosecond granularity.

In the case of conditional grace periods (get_state_synchronize_rcu()
and cond_synchronize_rcu(), for example) there is just one
wait.  For polled grace periods (start_poll_synchronize_rcu() and
poll_state_synchronize_rcu(), for example), there is a repeated series
of waits until the grace period ends.

For normal grace periods, the default is 16 jiffies (for example, 16,000
microseconds on a HZ=1000 system) and for expedited grace periods the
default is 128 microseconds.
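
The arithmetic above can be sketched in Python as an illustration only. The helper names are hypothetical, and Python's random module stands in for the kernel's hrtimer-based wait; none of this is taken from rcutorture itself.

```python
import random

NSEC_PER_USEC = 1_000
USEC_PER_SEC = 1_000_000


def jiffies_to_usecs(jiffies: int, hz: int) -> int:
    """Convert a jiffy count to microseconds for a given HZ value."""
    return jiffies * USEC_PER_SEC // hz


def random_wait_ns(max_wait_us: int, rng: random.Random) -> int:
    """Pick a wait of up to max_wait_us microseconds, expressed in nanoseconds.

    Drawing the value in nanoseconds models "up to N microseconds,
    but with nanosecond granularity".
    """
    return rng.randint(0, max_wait_us * NSEC_PER_USEC)


# Defaults described in the commit message.
normal_default_us = jiffies_to_usecs(16, hz=1000)  # 16 jiffies at HZ=1000
expedited_default_us = 128

rng = random.Random(0)  # fixed seed so the sketch is repeatable
wait_ns = random_wait_ns(normal_default_us, rng)
```

With HZ=250 the same 16 jiffies would correspond to 64,000 microseconds, which is why the default is stated in jiffies rather than as a fixed time.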

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
2024-12-14 17:05:27 +01:00
Paul E. McKenney
cae7f6319e rcutorture: Add documentation for recent conditional and polled APIs
This commit adds kernel-parameters.txt documentation for rcutorture's
(relatively) new gp_cond_exp, gp_cond_full, gp_cond_exp_full, gp_poll,
gp_poll_exp, gp_poll_full, and gp_poll_exp_full module parameters.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
2024-12-14 17:05:17 +01:00
Paul E. McKenney
584975ccb7 rcutorture: Add random real-time preemption
This commit adds the rcutorture.preempt_duration kernel module parameter,
which gives the real-time preemption duration in milliseconds (zero to
disable, which is the default) and also the rcutorture.preempt_interval
module parameter, which gives the interval between successive preemptions,
also in milliseconds, defaulting to one second.  The CPU to preempt is
chosen at random from those online at that time.  Races between preempting
a given CPU and that CPU going offline are ignored, and preemption is
forgone when this occurs.
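
A minimal sketch of the selection policy described above, written in Python purely for illustration. The function names and the way the hotplug race is modeled (a second set of CPUs that are "still online") are assumptions of this sketch, not the rcutorture implementation.

```python
import random
from typing import Optional, Set


def pick_cpu(online: Set[int], rng: random.Random) -> Optional[int]:
    """Choose one CPU at random from those currently online."""
    if not online:
        return None
    return rng.choice(sorted(online))


def try_preempt(online: Set[int], still_online: Set[int],
                rng: random.Random) -> Optional[int]:
    """Return the CPU to preempt, or None if this round is forgone.

    `still_online` models the state after the choice was made: if the
    chosen CPU has gone offline in the meantime, the race is ignored and
    no preemption happens.
    """
    cpu = pick_cpu(online, rng)
    if cpu is None or cpu not in still_online:
        return None
    return cpu
```

On a kernel with rcutorture built in, the two parameters named above would presumably be given on the command line as rcutorture.preempt_duration=<ms> and rcutorture.preempt_interval=<ms>.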

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
2024-12-14 17:01:05 +01:00
Saru2003
5c14b68596 Documentation: zram: fix dictionary spelling
Fixes a typo in the ZRAM documentation where 'dictioary' was
misspelled. Corrected it to 'dictionary' in the example usage
of 'algorithm_params'.

Signed-off-by: Sarveshwaar SS <sarvesh20123@gmail.com>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20241125165122.17521-1-sarvesh20123@gmail.com
2024-12-13 08:52:22 -07:00
Guixin Liu
80568f479b docs, nvme: introduce nvme-multipath document
This adds a document about nvme-multipath and policies supported
by the Linux NVMe host driver, and also each policy's best scenario.

Signed-off-by: Guixin Liu <kanie@linux.alibaba.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20241209071127.22922-1-kanie@linux.alibaba.com
2024-12-13 08:37:27 -07:00
Ruffalo Lavoisier
5a3f0a11b2 docs: remove duplicate word
- Remove duplicate word, 'to'.

Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20241120043414.78811-1-RuffaloLavoisier@gmail.com
2024-12-11 09:07:40 -07:00
Cengiz Can
e551bd4109 Documentation: remove :kbd: tags
:kbd: is an extra markup that we should avoid when we can.

It worsens the plain-text reading experience and adds very little value
to rendered views.

Remove all :kbd: tags from Documentation/*

Signed-off-by: Cengiz Can <cengiz@kernel.wtf>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20241202090514.1716-1-cengiz@kernel.wtf
2024-12-11 09:07:39 -07:00
Ingo Molnar
05453d36a2 Merge branch 'linus' into x86/cleanups, to resolve conflict
These two commits interact:

 upstream:     73da582a47 ("x86/cpu/topology: Remove limit of CPUs due to disabled IO/APIC")
 x86/cleanups: 13148e22c1 ("x86/apic: Remove "disablelapic" cmdline option")

Resolve it.

 Conflicts:
	arch/x86/kernel/cpu/topology.c

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2024-12-10 19:33:03 +01:00
Borislav Petkov (AMD)
ab0e7f2076 Documentation: Merge x86-specific boot options doc into kernel-parameters.txt
Documentation/arch/x86/x86_64/boot-options.rst is causing unnecessary
confusion by being a second place where one can put x86 boot options.
Move them into the main one.

Drop removed ones like "acpi=ht", while at it.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>
Link: https://lore.kernel.org/r/20241202190011.11979-1-bp@kernel.org
2024-12-10 18:25:40 +01:00
Mario Limonciello
50a062a762 cpufreq/amd-pstate: Store the boost numerator as highest perf again
commit ad4caad58d ("cpufreq: amd-pstate: Merge
amd_pstate_highest_perf_set() into amd_get_boost_ratio_numerator()")
changed the semantics for highest perf and commit 18d9b52271
("cpufreq/amd-pstate: Use nominal perf for limits when boost is disabled")
worked around those semantic changes.

This however is a confusing result and furthermore makes it awkward to
change frequency limits and boost due to the scaling differences. Restore
the boost numerator to highest perf again.

Suggested-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Fixes: ad4caad58d ("cpufreq: amd-pstate: Merge amd_pstate_highest_perf_set() into amd_get_boost_ratio_numerator()")
Link: https://lore.kernel.org/r/20241209185248.16301-2-mario.limonciello@amd.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
2024-12-10 10:17:43 -06:00
Yicong Yang
3b051bb7cb drivers/perf: hisi: Export associated CPUs of each PMU through sysfs
Although an event of the uncore PMU can only be opened on a single
CPU, some PMUs do have affinity to a range of CPUs. For example,
the L3C PMU is associated with the CPUs sharing the L3T it monitors.
Users may infer this affinity from the PMU name, which may have the SCCL ID
and CCL ID encoded (for L3C etc.), but that is not straightforward.
So export this information by adding an "associated_cpus" sysfs
attribute, so that users can get it directly.
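
As a hedged illustration of how userspace might consume such an attribute, the Python sketch below parses a CPU list. It assumes the attribute uses the common "0-3,8" list format, which the commit message does not state. The PMU directory is passed in by the caller because the exact sysfs path is not given here.

```python
from typing import List


def parse_cpu_list(text: str) -> List[int]:
    """Expand a CPU list such as "0-3,8" into [0, 1, 2, 3, 8]."""
    cpus: List[int] = []
    for part in text.strip().split(","):
        if not part:
            continue  # tolerate an empty attribute
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


def read_associated_cpus(pmu_dir: str) -> List[int]:
    """Read <pmu_dir>/associated_cpus; pmu_dir is supplied by the caller."""
    with open(f"{pmu_dir}/associated_cpus") as f:
        return parse_cpu_list(f.read())
```

If the attribute turned out to be a hexadecimal mask instead of a list, only parse_cpu_list() would need to change.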

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Link: https://lore.kernel.org/r/20241210141525.37788-9-yangyicong@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
2024-12-10 15:57:24 +00:00
Gowthami Thiagarajan
5fcccba118 perf/marvell: Odyssey LLC-TAD performance monitor support
Each TAD provides eight 64-bit counters for monitoring
cache behavior. The driver always configures the same counter for
all the TADs. The user would end up effectively reserving one of
eight counters in every TAD to look across all TADs.
The occurrences of events are aggregated and presented to the user
at the end of running the workload. The driver does not provide a
way for the user to partition TADs so that different TADs are used for
different applications.

The performance events reflect various internal or interface activities.
By combining the values from multiple performance counters, cache
performance can be measured in terms such as: cache miss rate, cache
allocations, interface retry rate, internal resource occupancy, etc.

Each supported counter's event and formatting information is exposed
in sysfs at /sys/devices/tad/. Use the perf tool's stat command to measure
the PMU events. For instance:

perf stat -e tad_hit_ltg,tad_hit_dtg <workload>
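
The idea of combining several counters into one metric can be sketched as follows. This Python sketch is illustrative only: the two hit events are the ones named in the command above, while the miss event name in the usage note is a placeholder, since the commit message does not list one.

```python
from typing import Dict, List


def perf_stat_command(events: List[str], workload: List[str]) -> List[str]:
    """Build the argv for `perf stat -e ev1,ev2 <workload>`."""
    return ["perf", "stat", "-e", ",".join(events), *workload]


def miss_rate(counts: Dict[str, int], hit_events: List[str],
              miss_events: List[str]) -> float:
    """Combine several counters into a miss rate: misses / (hits + misses)."""
    hits = sum(counts[e] for e in hit_events)
    misses = sum(counts[e] for e in miss_events)
    total = hits + misses
    return misses / total if total else 0.0
```

For example, with 60 and 20 hits reported for tad_hit_ltg and tad_hit_dtg and 20 events on a hypothetical miss counter, miss_rate() yields 0.2.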

Signed-off-by: Gowthami Thiagarajan <gthiagarajan@marvell.com>
Link: https://lore.kernel.org/r/20241108040619.753343-6-gthiagarajan@marvell.com
Signed-off-by: Will Deacon <will@kernel.org>
2024-12-09 15:57:49 +00:00
Gowthami Thiagarajan
d950c381dc perf/marvell: Odyssey DDR Performance monitor support
Odyssey DRAM Subsystem supports eight counters for monitoring performance
and software can program those counters to monitor any of the defined
performance events. Supported performance events include those counted
at the interface between the DDR controller and the PHY, interface between
the DDR Controller and the CHI interconnect, or within the DDR Controller.

Additionally, DSS supports two fixed performance event counters, one
for DDR reads and the other for DDR writes.

Signed-off-by: Gowthami Thiagarajan <gthiagarajan@marvell.com>
Link: https://lore.kernel.org/r/20241108040619.753343-4-gthiagarajan@marvell.com
Signed-off-by: Will Deacon <will@kernel.org>
2024-12-09 15:57:39 +00:00
Ilkka Koskinen
8632306e09 Documentation: dwc_pcie_pmu: Fix the mnemonics and eventid
Fix the event id and type in the example. In addition, the recent fix,
which addressed the mnemonics with mixed case, didn't fix the document.
Match the names with the driver.

Signed-off-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Reviewed-by: Shuai Xue <xueshuai@linux.alibaba.com>
Link: https://lore.kernel.org/r/20241205061914.5568-3-ilkka@os.amperecomputing.com
Signed-off-by: Will Deacon <will@kernel.org>
2024-12-09 15:45:21 +00:00
Besar Wicaksono
bce61d5c57 perf: arm_cspmu: nvidia: monitor all ports by default
Some NVIDIA PMUs like the NVLINK-C2C, CNVLINK, and PCIE PMU provide
port filtering. If the port filter is set to zero, the counter of
these PMUs will not capture any events. To avoid meaningless
experiments, the driver sets the port filter value to a default
non-zero value.

Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
Link: https://lore.kernel.org/r/20241031142118.1865965-5-bwicaksono@nvidia.com
Signed-off-by: Will Deacon <will@kernel.org>
2024-12-09 15:07:49 +00:00
Besar Wicaksono
ca26df4b10 perf: arm_cspmu: nvidia: enable NVLINK-C2C port filtering
Enable NVLINK-C2C port filtering to distinguish traffic from
different GPUs connected to NVLINK-C2C.

Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
Link: https://lore.kernel.org/r/20241031142118.1865965-4-bwicaksono@nvidia.com
Signed-off-by: Will Deacon <will@kernel.org>
2024-12-09 15:07:49 +00:00
Besar Wicaksono
5f7cd0dc98 perf: arm_cspmu: nvidia: fix sysfs path in the kernel doc
Fix typos in the sysfs path referenced by the NVIDIA
uncore PMU kernel doc.

Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
Link: https://lore.kernel.org/r/20241031142118.1865965-3-bwicaksono@nvidia.com
Signed-off-by: Will Deacon <will@kernel.org>
2024-12-09 15:07:49 +00:00
Waiman Long
1174b9344b sched/isolation: Make "isolcpus=nohz" equivalent to "nohz_full"
The "isolcpus=nohz" boot parameter and flag were used to disable the tick
when running a single task.  Nowadays, this "nohz" flag is seldom used
as it is included as part of the "nohz_full" parameter.  Extend this
flag to cover other kernel noises disabled by the "nohz_full" parameter
to make them equivalent. This also eliminates the need to use both the
"isolcpus" and the "nohz_full" parameters to fully isolate a given
set of CPUs.
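
To make the flag-plus-CPU-list shape of the parameter concrete, here is a small Python sketch that splits an "isolcpus=" value into its flags and its CPU list. The set of flag names (nohz, domain, managed_irq) is an assumption based on the usual description in kernel-parameters.txt, and the function is hypothetical, not kernel code.

```python
from typing import Set, Tuple

# Assumed flag names; check kernel-parameters.txt for the authoritative list.
KNOWN_FLAGS = {"nohz", "domain", "managed_irq"}


def split_isolcpus(value: str) -> Tuple[Set[str], str]:
    """Split "nohz,domain,2-7" into ({"nohz", "domain"}, "2-7").

    Leading comma-separated items that match a known flag are treated as
    flags; everything after them is returned unchanged as the CPU list.
    """
    flags: Set[str] = set()
    parts = value.split(",")
    while parts and parts[0] in KNOWN_FLAGS:
        flags.add(parts.pop(0))
    return flags, ",".join(parts)
```

Under the change described above, a value such as "nohz,2-7" would be expected to isolate CPUs 2-7 to the same degree as also passing "nohz_full=2-7".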

Suggested-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20241030175253.125248-3-longman@redhat.com
2024-12-02 12:24:28 +01:00
Sebastian Andrzej Siewior
f66e4a9965 sched/core: Update kernel boot parameters for LAZY preempt.
Update the documentation for the `preempt=' parameter which now also
accepts `lazy'.

Fixes: 7c70cb94d2 ("sched: Add Lazy preemption model")
Reported-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Link: https://lore.kernel.org/r/20241122173557.MYOtT95Q@linutronix.de
2024-12-02 12:01:28 +01:00
Linus Torvalds
29caf07e9d Merge tag 'apparmor-pr-2024-11-27' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor

Pull apparmor updates from John Johansen:
 "Features:
   - extend next/check table to add support for 2^24 states to the state
     machine.
   - rework capability audit cache to use broader cred information
     instead of just the profile. Also add a time stamp so old entries
     can be aged out of the cache.

  Bug Fixes:
   - fix 'Do simple duplicate message elimination' to clear previous
     state when updating in capability audit cache
   - Fix memory leak for aa_unpack_strdup()
   - properly handle cx/px lookup failure when in complain mode
   - allocate xmatch for nullpdb inside aa_alloc_null fixing a NULL ptr
     deref of tracking profiles when in complain mode

  Cleanups:
   - Remove everything being reported as deadcode
   - replace misleading 'scrubbing environment' phrase in debug print
   - Remove unnecessary NULL check before kvfree()
   - clean up duplicated parts of handle_onexec()
   - Use IS_ERR_OR_NULL() helper function
   - move new_profile declaration to top of block instead of immediately
     after label to remove C23 extension warning

  Documentation:
   - add comment to document that the capability.c:profile_capable ad ptr
     parameter cannot be NULL
   - add comment to document that the first entry in the packed perms
     struct is reserved for future planned expansion.
   - Update LSM/apparmor.rst add blurb for DEFAULT_SECURITY_APPARMOR"

* tag 'apparmor-pr-2024-11-27' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor:
  apparmor: lift new_profile declaration to remove C23 extension warning
  apparmor: replace misleading 'scrubbing environment' phrase in debug print
  parser: drop dead code for XXX_comb macros
  apparmor: Remove unused parameter L1 in macro next_comb
  Docs: Update LSM/apparmor.rst
  apparmor: audit_cap dedup based on subj_cred instead of profile
  apparmor: add a cache entry expiration time aging out capability audit cache
  apparmor: document capability.c:profile_capable ad ptr not being NULL
  apparmor: fix 'Do simple duplicate message elimination'
  apparmor: document first entry is in packed perms struct is reserved
  apparmor: test: Fix memory leak for aa_unpack_strdup()
  apparmor: Remove deadcode
  apparmor: Remove unnecessary NULL check before kvfree()
  apparmor: domain: clean up duplicated parts of handle_onexec()
  apparmor: Use IS_ERR_OR_NULL() helper function
  apparmor: add support for 2^24 states to the dfa state machine.
  apparmor: properly handle cx/px lookup failure for complain
  apparmor: allocate xmatch for nullpdb inside aa_alloc_null
2024-11-29 11:10:30 -08:00
Linus Torvalds
9ad55a67a7 Merge tag 'soundwire-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire

Pull soundwire updates from Vinod Koul:

 - structure optimization of few bus structures and header updates

 - support for 2.0 disco spec

 - amd driver updates for acp revision, refactoring code and support for
   acp6.3

 - soft reset support for cadence driver

* tag 'soundwire-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire: (24 commits)
  soundwire: Minor formatting fixups in sdw.h header
  soundwire: Update the includes on the sdw.h header
  soundwire: cadence: clear MCP BLOCK_WAKEUP in init
  soundwire: cadence: add soft-reset on startup
  soundwire: intel_auxdevice: add kernel parameter for mclk divider
  soundwire: mipi-disco: add support for DP0/DPn 'lane-list' property
  soundwire: mipi-disco: add new properties from 2.0 spec
  soundwire: mipi-disco: add comment on DP0-supported property
  soundwire: mipi-disco: add support for peripheral channelprepare timeout
  soundwire: mipi_disco: add support for clock-scales property
  soundwire: mipi-disco: add error handling for property array read
  soundwire: mipi-disco: remove DPn audio-modes
  soundwire: optimize sdw_dpn_prop
  soundwire: optimize sdw_dp0_prop
  soundwire: optimize sdw_slave_prop
  soundwire: optimize sdw_bus structure
  soundwire: optimize sdw_master_prop
  soundwire: optimize sdw_stream_runtime memory layout
  soundwire: mipi_disco: add MIPI-specific property_read_bool() helpers
  soundwire: Correct some typos in comments
  ...
2024-11-27 13:38:09 -08:00
Siddharth Menon
d00c2359fc Docs: Update LSM/apparmor.rst
After the deprecation of CONFIG_DEFAULT_SECURITY, it is no longer used
to enable and configure AppArmor. Since kernel 5.0,
`CONFIG_SECURITY_APPARMOR_BOOTPARAM_VALUE` is not used either.
Instead, the CONFIG_LSM parameter manages the order and selection of LSMs.

Signed-off-by: Siddharth Menon <simeddon@gmail.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
2024-11-26 19:21:06 -08:00
Linus Torvalds
1746db26f8 Merge tag 'pci-v6.13-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci

Pull PCI updates from Bjorn Helgaas:
 "Enumeration:

   - Make pci_stop_dev() and pci_destroy_dev() safe so concurrent
     callers can't stop a device multiple times, even as we migrate from
     the global pci_rescan_remove_lock to finer-grained locking (Keith
     Busch)

   - Improve pci_walk_bus() implementation by making it recursive and
     moving locking up to avoid need for a 'locked' parameter (Keith
     Busch)

   - Unexport pci_walk_bus_locked(), which is only used internally by
     the PCI core (Keith Busch)

   - Detect some Thunderbolt chips that are built-in and hence
     'trustworthy' by a heuristic since the 'ExternalFacingPort' and
     'usb4-host-interface' ACPI properties are not quite enough (Esther
     Shimanovich)

  Resource management:

   - Use PCI bus addresses (not CPU addresses) in 'ranges' properties
     when building dynamic DT nodes so systems where PCI and CPU
     addresses differ work correctly (Andrea della Porta)

   - Tidy resource sizing and assignment with helpers to reduce
     redundancy (Ilpo Järvinen)

   - Improve pdev_sort_resources() 'bogus alignment' warning to be more
     specific (Ilpo Järvinen)

  Driver binding:

   - Convert driver .remove_new() callbacks to .remove() again to finish
     the conversion from returning 'int' to being 'void' (Sergio
     Paracuellos)

   - Export pcim_request_all_regions(), a managed interface to request
     all BARs (Philipp Stanner)

   - Replace pcim_iomap_regions_request_all() with
     pcim_request_all_regions(), and pcim_iomap_table()[n] with
     pcim_iomap(n), in the following drivers: ahci, crypto qat, crypto
     octeontx2, intel_th, iwlwifi, ntb idt, serial rp2, ALSA korg1212
     (Philipp Stanner)

   - Remove the now unused pcim_iomap_regions_request_all() (Philipp
     Stanner)

   - Export pcim_iounmap_region(), a managed interface to unmap and
     release a PCI BAR (Philipp Stanner)

   - Replace pcim_iomap_regions(mask) with pcim_iomap_region(n), and
     pcim_iounmap_regions(mask) with pcim_iounmap_region(n), in the
     following drivers: fpga dfl-pci, block mtip32xx, gpio-merrifield,
     cavium (Philipp Stanner)

  Error handling:

   - Add sysfs 'reset_subordinate' to reset the entire hierarchy below a
     bridge; previously Secondary Bus Reset could only be used when
     there was a single device below a bridge (Keith Busch)

   - Warn if we reset a running device where the driver didn't register
     pci_error_handlers notification callbacks (Keith Busch)

  ASPM:

   - Disable ASPM L1 before touching L1 PM Substates to follow the spec
     closer and avoid a CPU load timeout on some platforms (Ajay
     Agarwal)

   - Set devices below Intel VMD to D0 before enabling ASPM L1 Substates
     as required per spec for all L1 Substates changes (Jian-Hong Pan)

  Power management:

   - Enable starfive controller runtime PM before probing host bridge
     (Mayank Rana)

   - Enable runtime power management for host bridges (Krishna chaitanya
     chundru)

  Power control:

   - Use of_platform_device_create() instead of of_platform_populate()
     to create pwrctl platform devices so we can control it based on the
     child nodes (Manivannan Sadhasivam)

   - Create pwrctrl platform devices only if there's a relevant power
     supply property (Manivannan Sadhasivam)

   - Add device link from the pwrctl supplier to the PCI dev to ensure
     pwrctl drivers are probed before the PCI dev driver; this avoids a
     race where pwrctl could change device power state while the PCI
     driver was active (Manivannan Sadhasivam)

   - Find pwrctl device for removal with of_find_device_by_node()
     instead of searching all children of the parent (Manivannan
     Sadhasivam)

   - Rename 'pwrctl' to 'pwrctrl' to match new bandwidth controller
     ('bwctrl') and hotplug files (Bjorn Helgaas)

  Bandwidth control:

   - Add read/modify/write locking for Link Control 2, which is used to
     manage Link speed (Ilpo Järvinen)

   - Extract Link Bandwidth Management Status check into
     pcie_lbms_seen(), where it can be shared between the bandwidth
     controller and quirks that use it to help retrain failed links
     (Ilpo Järvinen)

   - Re-add Link Bandwidth notification support with updates to address
     the reasons it was previously reverted (Alexandru Gagniuc, Ilpo
     Järvinen)

   - Add pcie_set_target_speed() and related functionality so drivers
     can manage PCIe Link speed based on thermal or other constraints
     (Ilpo Järvinen)

   - Add a thermal cooling driver to throttle PCIe Links via the
     existing thermal management framework (Ilpo Järvinen)

   - Add a userspace selftest for the PCIe bandwidth controller (Ilpo
     Järvinen)

  PCI device hotplug:

   - Add hotplug controller driver for Marvell OCTEON multi-function
     device where function 0 has a management console interface to
     enable/disable and provision various personalities for the other
     functions (Shijith Thotton)

   - Retain a reference to the pci_bus for the lifetime of a pci_slot to
     avoid a use-after-free when the thunderbolt driver resets USB4 host
     routers on boot, causing hotplug remove/add of downstream docks or
     other devices (Lukas Wunner)

   - Remove unused cpcihp struct cpci_hp_controller_ops.hardware_test
     (Guilherme Giacomo Simoes)

   - Remove unused cpqphp struct ctrl_dbg.ctrl (Christophe JAILLET)

   - Use pci_bus_read_dev_vendor_id() instead of hand-coded presence
     detection in cpqphp (Ilpo Järvinen)

   - Simplify cpqphp enumeration, which is already simple-minded and
     doesn't handle devices below hot-added bridges (Ilpo Järvinen)

  Virtualization:

   - Add ACS quirk for Wangxun FF5xxx NICs, which don't advertise an ACS
     capability but do isolate functions as though PCI_ACS_RR and
     PCI_ACS_CR were set, so the functions can be in independent IOMMU
     groups (Mengyuan Lou)

  TLP Processing Hints (TPH):

   - Add and document TLP Processing Hints (TPH) support so drivers can
     enable and disable TPH and the kernel can save/restore TPH
     configuration (Wei Huang)

   - Add TPH Steering Tag support so drivers can retrieve Steering Tag
     values associated with specific CPUs via an ACPI _DSM to improve
     performance by directing DMA writes closer to their consumers (Wei
     Huang)

  Data Object Exchange (DOE):

   - Wait up to 1 second for DOE Busy bit to clear before writing a
     request to the mailbox to avoid failures if the mailbox is still
     busy from a previous transfer (Gregory Price)

  Endpoint framework:

   - Skip attempts to allocate from endpoint controller memory window if
     the requested size is larger than the window (Damien Le Moal)

   - Add and document pci_epc_mem_map() and pci_epc_mem_unmap() to
     handle controller-specific size and alignment constraints, and add
     test cases to the endpoint test driver (Damien Le Moal)

   - Implement dwc pci_epc_ops.align_addr() so pci_epc_mem_map() can
     observe DWC-specific alignment requirements (Damien Le Moal)

   - Synchronously cancel command handler work in endpoint test before
     cleaning up DMA and BARs (Damien Le Moal)

   - Respect endpoint page size in dw_pcie_ep_align_addr() (Niklas
     Cassel)

   - Use dw_pcie_ep_align_addr() in dw_pcie_ep_raise_msi_irq() and
     dw_pcie_ep_raise_msix_irq() instead of open coding the equivalent
     (Niklas Cassel)

   - Avoid NULL dereference if Modem Host Interface Endpoint lacks
     'mmio' DT property (Zhongqiu Han)

   - Release PCI domain ID of Endpoint controller parent (not controller
     itself) and before unregistering the controller, to avoid
     use-after-free (Zijun Hu)

   - Clear secondary (not primary) EPC in pci_epc_remove_epf() when
     removing the secondary controller associated with an NTB (Zijun Hu)

  Cadence PCIe controller driver:

   - Lower severity of 'phy-names' message (Bartosz Wawrzyniak)

  Freescale i.MX6 PCIe controller driver:

   - Fix suspend/resume support on i.MX6QDL, which has a hardware
     erratum that prevents use of L2 (Stefan Eichenberger)

  Intel VMD host bridge driver:

   - Add 0xb60b and 0xb06f Device IDs for client SKUs (Nirmal Patel)

  MediaTek PCIe Gen3 controller driver:

   - Update mediatek-gen3 DT binding to require the exact number of
     clocks for each SoC (Fei Shao)

   - Add support for DT 'max-link-speed' and 'num-lanes' properties to
     restrict the link speed and width (AngeloGioacchino Del Regno)

  Microchip PolarFire PCIe controller driver:

   - Add DT and driver support for using either of the two PolarFire
     Root Ports (Conor Dooley)

  NVIDIA Tegra194 PCIe controller driver:

   - Move endpoint controller cleanups that depend on refclk from the
     host to the notifier that tells us the host has deasserted PERST#,
     when refclk should be valid (Manivannan Sadhasivam)

  Qualcomm PCIe controller driver:

   - Add qcom SAR2130P DT binding with an additional clock (Dmitry
     Baryshkov)

   - Enable MSI interrupts if 'global' IRQ is supported, since a
     previous commit unintentionally masked them (Manivannan Sadhasivam)

   - Move endpoint controller cleanups that depend on refclk from the
     host to the notifier that tells us the host has deasserted PERST#,
     when refclk should be valid (Manivannan Sadhasivam)

   - Add DT binding and driver support for IPQ9574, with Synopsys IP
     v5.80a and Qcom IP 1.27.0 (devi priya)

   - Move the OPP "operating-points-v2" table from the
     qcom,pcie-sm8450.yaml DT binding to qcom,pcie-common.yaml, where it
     can be used by other Qcom platforms (Qiang Yu)

   - Add 'global' SPI interrupt for events like link-up, link-down to
     qcom,pcie-x1e80100 DT binding so we can start enumeration when the
     link comes up (Qiang Yu)

   - Disable ASPM L0s for qcom,pcie-x1e80100 since the PHY is not tuned
     to support this (Qiang Yu)

   - Add ops_1_21_0 for SC8280X family SoC, which doesn't use the
     'iommu-map' DT property and doesn't need BDF-to-SID translation
     (Qiang Yu)

  Rockchip PCIe controller driver:

   - Define ROCKCHIP_PCIE_AT_SIZE_ALIGN to replace magic 256 endpoint
     .align value (Damien Le Moal)

   - When unmapping an endpoint window, compute the region index instead
     of searching for it, and verify that the address was mapped (Damien
     Le Moal)

   - When mapping an endpoint window, verify that the address hasn't
     been mapped already (Damien Le Moal)

   - Implement pci_epc_ops.align_addr() for rockchip-ep (Damien Le Moal)

   - Fix MSI IRQ data mapping to observe the alignment constraint, which
     fixes intermittent page faults in memcpy_toio() and memcpy_fromio()
     (Damien Le Moal)

   - Rename rockchip_pcie_parse_ep_dt() to
     rockchip_pcie_ep_get_resources() for consistency with similar DT
     interfaces (Damien Le Moal)

   - Skip the unnecessary link train in rockchip_pcie_ep_probe() and do
     it only in the endpoint start operation (Damien Le Moal)

   - Implement pci_epc_ops.stop_link() to disable link training and
     controller configuration (Damien Le Moal)

   - Attempt link training at 5 GT/s when both partners support it
     (Damien Le Moal)

   - Add a handler for PERST# signal so we can detect host-initiated
     resets and start link training after PERST# is deasserted (Damien
     Le Moal)

  Synopsys DesignWare PCIe controller driver:

   - Clear outbound address on unmap so dw_pcie_find_index() won't match
     an ATU index that was already unmapped (Damien Le Moal)

   - Use of_property_present() instead of of_property_read_bool() when
     testing for presence of non-boolean DT properties (Rob Herring)

   - Advertise 1MB size if endpoint supports Resizable BARs, which was
     inadvertently lost in v6.11 (Niklas Cassel)

  TI J721E PCIe driver:

   - Add PCIe support for J722S SoC (Siddharth Vadapalli)

   - Delay PCIE_T_PVPERL_MS (100 ms), not just PCIE_T_PERST_CLK_US (100
     us), before deasserting PERST# to ensure power and refclk are
     stable (Siddharth Vadapalli)

  TI Keystone PCIe controller driver:

   - Set the 'ti,keystone-pcie' mode so v3.65a devices work in Root
     Complex mode (Kishon Vijay Abraham I)

   - Try to avoid unrecoverable SError for attempts to issue config
     transactions when the link is down; this is racy but the best we
     can do (Kishon Vijay Abraham I)

  Miscellaneous:

   - Reorganize kerneldoc parameter names to match order in function
     signature (Julia Lawall)

   - Fix sysfs reset_method_store() memory leak (Todd Kjos)

   - Simplify pci_create_slot() (Ilpo Järvinen)

   - Fix incorrect printf format specifiers in pcitest (Luo Yifan)"

* tag 'pci-v6.13-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (127 commits)
  PCI: rockchip-ep: Handle PERST# signal in EP mode
  PCI: rockchip-ep: Improve link training
  PCI: rockship-ep: Implement the pci_epc_ops::stop_link() operation
  PCI: rockchip-ep: Refactor endpoint link training enable
  PCI: rockchip-ep: Refactor rockchip_pcie_ep_probe() MSI-X hiding
  PCI: rockchip-ep: Refactor rockchip_pcie_ep_probe() memory allocations
  PCI: rockchip-ep: Rename rockchip_pcie_parse_ep_dt()
  PCI: rockchip-ep: Fix MSI IRQ data mapping
  PCI: rockchip-ep: Implement the pci_epc_ops::align_addr() operation
  PCI: rockchip-ep: Improve rockchip_pcie_ep_map_addr()
  PCI: rockchip-ep: Improve rockchip_pcie_ep_unmap_addr()
  PCI: rockchip-ep: Use a macro to define EP controller .align feature
  PCI: rockchip-ep: Fix address translation unit programming
  PCI/pwrctrl: Rename pwrctrl functions and structures
  PCI/pwrctrl: Rename pwrctl files to pwrctrl
  PCI/pwrctl: Remove pwrctl device without iterating over all children of pwrctl parent
  PCI/pwrctl: Ensure that pwrctl drivers are probed before PCI client drivers
  PCI/pwrctl: Create pwrctl device only if at least one power supply is present
  PCI/pwrctl: Use of_platform_device_create() to create pwrctl devices
  tools: PCI: Fix incorrect printf format specifiers
  ...
2024-11-26 18:05:44 -08:00
Linus Torvalds
e68ce9474a A few late-arriving fixes, plus two more significant changes that were
*almost* ready at the beginning of the merge window:
 
 - A new document on debugging techniques from Sebastian Fricke
 
 - A clarification on MODULE_LICENSE terms meant to head off the sort of
   confusion that led to the recent Tuxedo Computers mess.
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAmdFEf0PHGNvcmJldEBs
 d24ubmV0AAoJEBdDWhNsDH5YLCQH+wY0lGEF5BloFrNOcwKoB96rXQjLMlPVpccP
 lWVprapS+NrlhTq4RZ9b6qbQ1RAdu0JCppew1viwclO8g8SmUoXmqNnlYIFH+3MB
 HZETbPWUHK2BRQqV7h3VkgvO30hUa0kHL3WfmKpGEG1P6FsQQ5o3WDi3YN8GM6xk
 tfHSiR4rgBw40VLyeDtRi++aEgYa/DfWpdtco58poCiAS6soTDDEWCxSBdibeDOQ
 YDuj1NtqieMk963z8CoJm/Qbw/ZLfW2jd3A43cZ0h6g/oloVYSucFcMjXpePvoZr
 9BSkU9OyX5BRfhU/6EbU8eWYjgu0BuBk5uvCwkQgHz1p05MGRDE=
 =MmbD
 -----END PGP SIGNATURE-----

Merge tag 'docs-6.13-2' of git://git.lwn.net/linux

Pull more documentation updates from Jonathan Corbet:
 "A few late-arriving fixes, plus two more significant changes that were
  *almost* ready at the beginning of the merge window:

   - A new document on debugging techniques from Sebastian Fricke

   - A clarification on MODULE_LICENSE terms meant to head off the sort
     of confusion that led to the recent Tuxedo Computers mess"

* tag 'docs-6.13-2' of git://git.lwn.net/linux:
  docs: Add debugging guide for the media subsystem
  docs: Add debugging section to process
  docs/licensing: Clarify wording about "GPL" and "Proprietary"
  docs: core-api/gfp_mask-from-fs-io: indicate that vmalloc supports GFP_NOFS/GFP_NOIO
  Documentation: kernel-doc: enumerate identifier *type*s
  Documentation: pwrseq: Fix trivial misspellings
  Documentation: filesystems: update filename extensions
2024-11-26 13:44:27 -08:00
Linus Torvalds
fb527fc1f3 fuse update for 6.13
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQSQHSd0lITzzeNWNm3h3BK/laaZPAUCZ0Rb/wAKCRDh3BK/laaZ
 PK80AQDAUgA6S5SSrbJxwRFNOhbwtZxZqJ8fomJR5xuWIEQ9pwEAkpFqhBhBW0y1
 0YaREow2aDANQQtSUrfPtgva1ZXFwQU=
 =Cyx5
 -----END PGP SIGNATURE-----

Merge tag 'fuse-update-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse

Pull fuse updates from Miklos Szeredi:

 - Add page -> folio conversions (Joanne Koong, Josef Bacik)

 - Allow max size of fuse requests to be configurable with a sysctl
   (Joanne Koong)

 - Allow FOPEN_DIRECT_IO to take advantage of async code path (yangyun)

 - Fix large kernel reads (like a module load) in virtio_fs (Hou Tao)

 - Fix attribute inconsistency in case readdirplus (and plain lookup in
   corner cases) is racing with inode eviction (Zhang Tianci)

 - Fix a WARN_ON triggered by virtio_fs (Asahi Lina)

* tag 'fuse-update-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: (30 commits)
  virtiofs: dax: remove ->writepages() callback
  fuse: check attributes staleness on fuse_iget()
  fuse: remove pages for requests and exclusively use folios
  fuse: convert direct io to use folios
  mm/writeback: add folio_mark_dirty_lock()
  fuse: convert writebacks to use folios
  fuse: convert retrieves to use folios
  fuse: convert ioctls to use folios
  fuse: convert writes (non-writeback) to use folios
  fuse: convert reads to use folios
  fuse: convert readdir to use folios
  fuse: convert readlink to use folios
  fuse: convert cuse to use folios
  fuse: add support in virtio for requests using folios
  fuse: support folios in struct fuse_args_pages and fuse_copy_pages()
  fuse: convert fuse_notify_store to use folios
  fuse: convert fuse_retrieve to use folios
  fuse: use the folio based vmstat helpers
  fuse: convert fuse_writepage_need_send to take a folio
  fuse: convert fuse_do_readpage to use folios
  ...
2024-11-26 12:41:27 -08:00
Linus Torvalds
e06635e26c slab updates for 6.13
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEe7vIQRWZI0iWSE3xu+CwddJFiJoFAmdERvEACgkQu+CwddJF
 iJre6Af9EBMVQiWJrmoMOjbGLqLgmZzSXRNxR862WGn4D/wesA1HmSlWgEn54hgc
 GIYIeD++v4JaIRNH0yZqb2UBSKjF/rYPDkKstnqgFaVakLoDrwkkwV2n3Gk5BEgR
 m/SzLGgoDWKR65I/oMpL6e2KrMOfMfjpB31qiVvdlaQd2Nv/5rw+gUVylxhNIZEH
 W11N3IC+e9hmgT3ZBpTmHeqNrlXE1+USWPrp/HV05Ndz6yf97JnP4Wr9f9pcyN3R
 aflLHR38+Q9cCfO7y8wNqtYvIV/kbqgdaqD76frSgalC4Lmz9+L+TZ2NuENCPoGj
 Xdbip2z+iffWhvqM+qooOLVxR0XqTA==
 =Sepb
 -----END PGP SIGNATURE-----

Merge tag 'slab-for-6.13-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab

Pull slab updates from Vlastimil Babka:

 - Add new slab_strict_numa boot parameter to enforce per-object memory
   policies on top of slab folio policies, for systems where saving cost
   of remote accesses is more important than minimizing slab allocation
   overhead (Christoph Lameter)

 - Fix for freeptr_offset alignment check being too strict for m68k
   (Geert Uytterhoeven)

 - krealloc() fixes for not violating __GFP_ZERO guarantees on
   krealloc() when slub_debug (redzone and object tracking) is enabled
   (Feng Tang)

 - Fix a memory leak in case sysfs registration fails for a slab cache,
   and also no longer fail to create the cache in that case (Hyeonggon
   Yoo)

 - Fix handling of detected consistency problems (due to buggy slab
   user) with slub_debug enabled, so that it does not cause further list
   corruption bugs (yuan.gao)

 - Code cleanup and kerneldocs polishing (Zhen Lei, Vlastimil Babka)

* tag 'slab-for-6.13-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
  slab: Fix too strict alignment check in create_cache()
  mm/slab: Allow cache creation to proceed even if sysfs registration fails
  mm/slub: Avoid list corruption when removing a slab from the full list
  mm/slub, kunit: Add testcase for krealloc redzone and zeroing
  mm/slub: Improve redzone check and zeroing for krealloc()
  mm/slub: Consider kfence case for get_orig_size()
  SLUB: Add support for per object memory policies
  mm, slab: add kerneldocs for common SLAB_ flags
  mm/slab: remove duplicate check in create_cache()
  mm/slub: Move krealloc() and related code to slub.c
  mm/kasan: Don't store metadata inside kmalloc object when slub_debug_orig_size is on
2024-11-25 16:51:24 -08:00
Linus Torvalds
f5f4745a7f - The series "resource: A couple of cleanups" from Andy Shevchenko
performs some cleanups in the resource management code.
 
 - The series "Improve the copy of task comm" from Yafang Shao addresses
   possible race-induced overflows in the management of task_struct.comm[].
 
 - The series "Remove unnecessary header includes from
   {tools/}lib/list_sort.c" from Kuan-Wei Chiu adds some cleanups and a
   small fix to the list_sort library code and to its selftest.
 
 - The series "Enhance min heap API with non-inline functions and
   optimizations" also from Kuan-Wei Chiu optimizes and cleans up the
   min_heap library code.
 
 - The series "nilfs2: Finish folio conversion" from Ryusuke Konishi
   finishes off nilfs2's folioification.
 
 - The series "add detect count for hung tasks" from Lance Yang adds more
   userspace visibility into the hung-task detector's activity.
 
 - Apart from that, singleton patches in many places - please see the
   individual changelogs for details.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZ0L6lQAKCRDdBJ7gKXxA
 jmEIAPwMSglNPKRIOgzOvHh8MUJW1Dy8iKJ2kWCO3f6QTUIM2AEA+PazZbUd/g2m
 Ii8igH0UBibIgva7MrCyJedDI1O23AA=
 =8BIU
 -----END PGP SIGNATURE-----

Merge tag 'mm-nonmm-stable-2024-11-24-02-05' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull non-MM updates from Andrew Morton:

 - The series "resource: A couple of cleanups" from Andy Shevchenko
   performs some cleanups in the resource management code

 - The series "Improve the copy of task comm" from Yafang Shao addresses
   possible race-induced overflows in the management of
   task_struct.comm[]

 - The series "Remove unnecessary header includes from
   {tools/}lib/list_sort.c" from Kuan-Wei Chiu adds some cleanups and a
   small fix to the list_sort library code and to its selftest

 - The series "Enhance min heap API with non-inline functions and
   optimizations" also from Kuan-Wei Chiu optimizes and cleans up the
   min_heap library code

 - The series "nilfs2: Finish folio conversion" from Ryusuke Konishi
   finishes off nilfs2's folioification

 - The series "add detect count for hung tasks" from Lance Yang adds
   more userspace visibility into the hung-task detector's activity

 - Apart from that, singleton patches in many places - please see the
   individual changelogs for details

* tag 'mm-nonmm-stable-2024-11-24-02-05' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (71 commits)
  gdb: lx-symbols: do not error out on monolithic build
  kernel/reboot: replace sprintf() with sysfs_emit()
  lib: util_macros_kunit: add kunit test for util_macros.h
  util_macros.h: fix/rework find_closest() macros
  Improve consistency of '#error' directive messages
  ocfs2: fix uninitialized value in ocfs2_file_read_iter()
  hung_task: add docs for hung_task_detect_count
  hung_task: add detect count for hung tasks
  dma-buf: use atomic64_inc_return() in dma_buf_getfile()
  fs/proc/kcore.c: fix coccinelle reported ERROR instances
  resource: avoid unnecessary resource tree walking in __region_intersects()
  ocfs2: remove unused errmsg function and table
  ocfs2: cluster: fix a typo
  lib/scatterlist: use sg_phys() helper
  checkpatch: always parse orig_commit in fixes tag
  nilfs2: convert metadata aops from writepage to writepages
  nilfs2: convert nilfs_recovery_copy_block() to take a folio
  nilfs2: convert nilfs_page_count_clean_buffers() to take a folio
  nilfs2: remove nilfs_writepage
  nilfs2: convert checkpoint file to be folio-based
  ...
2024-11-25 16:09:48 -08:00
Linus Torvalds
5c00ff742b - The series "zram: optimal post-processing target selection" from
Sergey Senozhatsky improves zram's post-processing selection algorithm.
   This leads to improved memory savings.
 
 - Wei Yang has gone to town on the mapletree code, contributing several
   series which clean up the implementation:
 
 	- "refine mas_mab_cp()"
 	- "Reduce the space to be cleared for maple_big_node"
 	- "maple_tree: simplify mas_push_node()"
 	- "Following cleanup after introduce mas_wr_store_type()"
 	- "refine storing null"
 
 - The series "selftests/mm: hugetlb_fault_after_madv improvements" from
   David Hildenbrand fixes this selftest for s390.
 
 - The series "introduce pte_offset_map_{ro|rw}_nolock()" from Qi Zheng
   implements some rationalizations and cleanups in the page mapping code.
 
 - The series "mm: optimize shadow entries removal" from Shakeel Butt
   optimizes the file truncation code by speeding up the handling of shadow
   entries.
 
 - The series "Remove PageKsm()" from Matthew Wilcox completes the
   migration of this flag over to being a folio-based flag.
 
 - The series "Unify hugetlb into arch_get_unmapped_area functions" from
   Oscar Salvador implements a bunch of consolidations and cleanups in the
   hugetlb code.
 
 - The series "Do not shatter hugezeropage on wp-fault" from Dev Jain
   takes away the wp-fault time practice of turning a huge zero page into
   small pages.  Instead we replace the whole thing with a THP.  This is
   more consistent and cleaner, and potentially saves a large number of
   page faults.
 
 - The series "percpu: Add a test case and fix for clang" from Andy
   Shevchenko enhances and fixes the kernel's built-in percpu test code.
 
 - The series "mm/mremap: Remove extra vma tree walk" from Liam Howlett
   optimizes mremap() by avoiding doing things which we didn't need to do.
 
 - The series "Improve the tmpfs large folio read performance" from
   Baolin Wang teaches tmpfs to copy data into userspace at the folio size
   rather than as individual pages.  A 20% speedup was observed.
 
 - The series "mm/damon/vaddr: Fix issue in
   damon_va_evenly_split_region()" from Zheng Yejian fixes DAMON splitting.
 
 - The series "memcg-v1: fully deprecate charge moving" from Shakeel Butt
   removes the long-deprecated memcg-v1 charge moving feature.
 
 - The series "fix error handling in mmap_region() and refactor" from
   Lorenzo Stoakes cleans up some of the mmap() error handling and
   addresses some potential performance issues.
 
 - The series "x86/module: use large ROX pages for text allocations" from
   Mike Rapoport teaches x86 to use large pages for read-only-execute
   module text.
 
 - The series "page allocation tag compression" from Suren Baghdasaryan
   is follow-on maintenance work for the new page allocation profiling
   feature.
 
 - The series "page->index removals in mm" from Matthew Wilcox removes
   most references to page->index in mm/.  A slow march towards shrinking
   struct page.
 
 - The series "damon/{self,kunit}tests: minor fixups for DAMON debugfs
   interface tests" from Andrew Paniakin performs maintenance work for
   DAMON's self testing code.
 
 - The series "mm: zswap swap-out of large folios" from Kanchana Sridhar
   improves zswap's batching of compression and decompression.  It is a
   step along the way towards using Intel IAA hardware acceleration for
   this zswap operation.
 
 - The series "kasan: migrate the last module test to kunit" from
   Sabyrzhan Tasbolatov completes the migration of the KASAN built-in tests
   over to the KUnit framework.
 
 - The series "implement lightweight guard pages" from Lorenzo Stoakes
   permits userspace to place fault-generating guard pages within a single
   VMA, rather than requiring that multiple VMAs be created for this.
   Improved efficiencies for userspace memory allocators are expected.
 
 - The series "memcg: tracepoint for flushing stats" from JP Kobryn uses
   tracepoints to provide increased visibility into memcg stats flushing
   activity.
 
 - The series "zram: IDLE flag handling fixes" from Sergey Senozhatsky
   fixes a zram buglet which potentially affected performance.
 
 - The series "mm: add more kernel parameters to control mTHP" from
   Maíra Canal enhances our ability to control/configure multisize THP from
   the kernel boot command line.
 
 - The series "kasan: few improvements on kunit tests" from Sabyrzhan
   Tasbolatov has a couple of fixups for the KASAN KUnit tests.
 
 - The series "mm/list_lru: Split list_lru lock into per-cgroup scope"
   from Kairui Song optimizes list_lru memory utilization when lockdep is
   enabled.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZzwFqgAKCRDdBJ7gKXxA
 jkeuAQCkl+BmeYHE6uG0hi3pRxkupseR6DEOAYIiTv0/l8/GggD/Z3jmEeqnZaNq
 xyyenpibWgUoShU2wZ/Ha8FE5WDINwg=
 =JfWR
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2024-11-18-19-27' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

 - The series "zram: optimal post-processing target selection" from
   Sergey Senozhatsky improves zram's post-processing selection
   algorithm. This leads to improved memory savings.

 - Wei Yang has gone to town on the mapletree code, contributing several
   series which clean up the implementation:
	- "refine mas_mab_cp()"
	- "Reduce the space to be cleared for maple_big_node"
	- "maple_tree: simplify mas_push_node()"
	- "Following cleanup after introduce mas_wr_store_type()"
	- "refine storing null"

 - The series "selftests/mm: hugetlb_fault_after_madv improvements" from
   David Hildenbrand fixes this selftest for s390.

 - The series "introduce pte_offset_map_{ro|rw}_nolock()" from Qi Zheng
   implements some rationalizations and cleanups in the page mapping
   code.

 - The series "mm: optimize shadow entries removal" from Shakeel Butt
   optimizes the file truncation code by speeding up the handling of
   shadow entries.

 - The series "Remove PageKsm()" from Matthew Wilcox completes the
   migration of this flag over to being a folio-based flag.

 - The series "Unify hugetlb into arch_get_unmapped_area functions" from
   Oscar Salvador implements a bunch of consolidations and cleanups in
   the hugetlb code.

 - The series "Do not shatter hugezeropage on wp-fault" from Dev Jain
   takes away the wp-fault time practice of turning a huge zero page
   into small pages. Instead we replace the whole thing with a THP. This
   is more consistent and cleaner, and potentially saves a large number
   of page faults.

 - The series "percpu: Add a test case and fix for clang" from Andy
   Shevchenko enhances and fixes the kernel's built-in percpu test code.

 - The series "mm/mremap: Remove extra vma tree walk" from Liam Howlett
   optimizes mremap() by avoiding doing things which we didn't need to
   do.

 - The series "Improve the tmpfs large folio read performance" from
   Baolin Wang teaches tmpfs to copy data into userspace at the folio
   size rather than as individual pages. A 20% speedup was observed.

 - The series "mm/damon/vaddr: Fix issue in
   damon_va_evenly_split_region()" from Zheng Yejian fixes DAMON
   splitting.

 - The series "memcg-v1: fully deprecate charge moving" from Shakeel
   Butt removes the long-deprecated memcg-v1 charge moving feature.

 - The series "fix error handling in mmap_region() and refactor" from
   Lorenzo Stoakes cleans up some of the mmap() error handling and
   addresses some potential performance issues.

 - The series "x86/module: use large ROX pages for text allocations"
   from Mike Rapoport teaches x86 to use large pages for
   read-only-execute module text.

 - The series "page allocation tag compression" from Suren Baghdasaryan
   is follow-on maintenance work for the new page allocation profiling
   feature.

 - The series "page->index removals in mm" from Matthew Wilcox removes
   most references to page->index in mm/. A slow march towards shrinking
   struct page.

 - The series "damon/{self,kunit}tests: minor fixups for DAMON debugfs
   interface tests" from Andrew Paniakin performs maintenance work for
   DAMON's self testing code.

 - The series "mm: zswap swap-out of large folios" from Kanchana Sridhar
   improves zswap's batching of compression and decompression. It is a
   step along the way towards using Intel IAA hardware acceleration for
   this zswap operation.

 - The series "kasan: migrate the last module test to kunit" from
   Sabyrzhan Tasbolatov completes the migration of the KASAN built-in
   tests over to the KUnit framework.

 - The series "implement lightweight guard pages" from Lorenzo Stoakes
   permits userspace to place fault-generating guard pages within a
   single VMA, rather than requiring that multiple VMAs be created for
   this. Improved efficiencies for userspace memory allocators are
   expected.

 - The series "memcg: tracepoint for flushing stats" from JP Kobryn uses
   tracepoints to provide increased visibility into memcg stats flushing
   activity.

 - The series "zram: IDLE flag handling fixes" from Sergey Senozhatsky
   fixes a zram buglet which potentially affected performance.

 - The series "mm: add more kernel parameters to control mTHP" from
   Maíra Canal enhances our ability to control/configure multisize THP
   from the kernel boot command line.

 - The series "kasan: few improvements on kunit tests" from Sabyrzhan
   Tasbolatov has a couple of fixups for the KASAN KUnit tests.

 - The series "mm/list_lru: Split list_lru lock into per-cgroup scope"
   from Kairui Song optimizes list_lru memory utilization when lockdep
   is enabled.

* tag 'mm-stable-2024-11-18-19-27' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (215 commits)
  cma: enforce non-zero pageblock_order during cma_init_reserved_mem()
  mm/kfence: add a new kunit test test_use_after_free_read_nofault()
  zram: fix NULL pointer in comp_algorithm_show()
  memcg/hugetlb: add hugeTLB counters to memcg
  vmstat: call fold_vm_zone_numa_events() before show per zone NUMA event
  mm: mmap_lock: check trace_mmap_lock_$type_enabled() instead of regcount
  zram: ZRAM_DEF_COMP should depend on ZRAM
  MAINTAINERS/MEMORY MANAGEMENT: add document files for mm
  Docs/mm/damon: recommend academic papers to read and/or cite
  mm: define general function pXd_init()
  kmemleak: iommu/iova: fix transient kmemleak false positive
  mm/list_lru: simplify the list_lru walk callback function
  mm/list_lru: split the lock to per-cgroup scope
  mm/list_lru: simplify reparenting and initial allocation
  mm/list_lru: code clean up for reparenting
  mm/list_lru: don't export list_lru_add
  mm/list_lru: don't pass unnecessary key parameters
  kasan: add kunit tests for kmalloc_track_caller, kmalloc_node_track_caller
  kasan: change kasan_atomics kunit test as KUNIT_CASE_SLOW
  kasan: use EXPORT_SYMBOL_IF_KUNIT to export symbols
  ...
2024-11-23 09:58:07 -08:00
Sebastian Fricke
83a474c11e docs: Add debugging guide for the media subsystem
Provide a guide for developers on how to debug code with a focus on the
media subsystem. This document aims to provide a rough overview of the
possibilities and a rationale to help choose the right tool for the
given circumstances.

Signed-off-by: Sebastian Fricke <sebastian.fricke@collabora.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20241028-media_docs_improve_v3-v3-2-edf5c5b3746f@collabora.com
2024-11-22 10:48:12 -07:00
Linus Torvalds
fcc79e1714 Networking changes for 6.13.
The most significant set of changes is the per netns RTNL. The new
 behavior is disabled by default, so regression risk should be contained.
 
 Notably the new config knob PTP_1588_CLOCK_VMCLOCK will inherit its
 default value from PTP_1588_CLOCK_KVM, as the former is intended to be
 a more reliable replacement for the latter.
 
 Core
 ----
 
  - Started a very large, in-progress, effort to make the RTNL lock
    scope per network-namespace, thus reducing the lock contention
    significantly in the containerized use-case, comprising:
    - RCU-ified some relevant slices of the FIB control path
    - introduce basic per netns locking helpers
    - namespacified the IPv4 address hash table
    - remove rtnl_register{,_module}() in favour of rtnl_register_many()
    - refactor rtnl_{new,del,set}link() moving as much validation as
      possible out of RTNL lock
    - convert all phonet doit() and dumpit() handlers to RCU
    - convert IPv4 addresses manipulation to per-netns RTNL
    - convert virtual interface creation to per-netns RTNL
    the per-netns lock infra is guarded by the CONFIG_DEBUG_NET_SMALL_RTNL
    knob, disabled by default ad interim.
 
  - Introduce NAPI suspension, to efficiently switch between busy
    polling (NAPI processing suspended) and normal processing.
 
  - Migrate the IPv4 routing input, output and control path from direct
    ToS usage to DSCP macros. This is a work in progress to make ECN
    handling consistent and reliable.
 
  - Add drop reasons support to the IPv4 route input path, allowing
    better introspection in case of packet drops.
 
  - Make FIB seqnum lockless, dropping RTNL protection for read
    access.
 
  - Make inet{,v6} addresses hashing less predictable.
 
  - Allow providing timestamp OPT_ID via cmsg, to correlate TX packets
    and timestamps
 
 Things we sprinkled into general kernel code
 --------------------------------------------
 
  - Add small file operations for debugfs, to reduce the struct ops size.
 
  - Refactoring and optimization for the implementation of page_frag API.
    This is preparatory work to consolidate the page_frag
    implementation.
 
 Netfilter
 ---------
 
  - Optimize set element transactions to reduce memory consumption
 
  - Extended netlink error reporting for attribute parser failure.
 
  - Make legacy xtables configs user selectable, giving users
    the option to configure iptables without enabling any other config.
 
  - Address a lot of false-positive RCU issues, pointed out by recent
    CI improvements.
 
 BPF
 ---
 
  - Put xsk sockets on a struct diet and add various cleanups. Overall,
    this helps to bump performance by 12% for some workloads.
 
  - Extend BPF selftests to increase coverage of XDP features in
    combination with BPF cpumap.
 
  - Optimize and homogenize bpf_csum_diff helper for all archs and also
    add a batch of new BPF selftests for it.
 
  - Extend netkit with an option to delegate skb->{mark,priority}
    scrubbing to its BPF program.
 
  - Make the bpf_get_netns_cookie() helper available also to tc(x) BPF
    programs.
 
 Protocols
 ---------
 
  - Introduce a 4-tuple hash for connected UDP sockets, significantly
    speeding up connected socket lookup.
 
  - Add a fastpath for some TCP timers that usually expire after close,
    reducing the socket lock contention.
 
  - Add inbound and outbound xfrm state caches to speed up state lookups.
 
  - Avoid sending MPTCP advertisements on stale subflows, reducing
    the risk of losing them.
 
  - Make neighbours table flushing more scalable, maintaining per device
    neigh lists.
 
 Driver API
 ----------
 
  - Introduce a unified interface to configure transmission H/W shaping,
    and expose it to user-space via generic-netlink.
 
  - Add support for per-NAPI config via netlink. This makes napi
    configuration persistent across queue removal and re-creation.
    Requires driver updates, currently supported drivers are:
    nVidia/Mellanox mlx4 and mlx5, Broadcom brcm and Intel ice.
 
  - Add ethtool support for writing SFP / PHY firmware blocks.
 
  - Track RSS context allocation from ethtool core.
 
  - Implement support for mirroring to DSA CPU port, via TC mirror
    offload.
 
  - Consolidate FDB updates notification, to avoid duplicates on
    device-specific entries.
 
  - Expose DPLL clock quality level to the user-space.
 
  - Support master-slave PHY config via device tree.
 
 Tests and tooling
 -----------------
 
  - forwarding: introduce deferred commands, to simplify
    the cleanup phase
 
 Drivers
 -------
 
  - Updated several drivers - Amazon vNic, Google vNic, Microsoft vNic,
    Intel e1000e and Broadcom Tigon3 - to use netdev-genl to link the
    IRQs and queues to NAPI IDs, allowing busy polling and better
    introspection.
 
  - Ethernet high-speed NICs:
    - nVidia/Mellanox:
      - mlx5:
        - a large refactor to implement support for cross E-Switch
          scheduling
        - refactor H/W counter management to let it scale better
        - H/W GRO cleanups
    - Intel (100G, ice):
      - adds support for ethtool reset
      - implement support for per TX queue H/W shaping
    - AMD/Solarflare:
      - implement per device queue stats support
    - Broadcom (bnxt):
      - improve wildcard l4proto on IPv4/IPv6 ntuple rules
    - Marvell Octeon:
      - Adds representor support for each Resource Virtualization Unit
        (RVU) device.
    - Hisilicon:
      - adds support for the BMC Gigabit Ethernet
    - IBM (EMAC):
      - driver cleanup and modernization
    - Cisco (VIC):
      - raise the queues number limit to 256
 
  - Ethernet virtual:
    - Google vNIC:
      - implements page pool support
    - macsec:
      - inherit lower device's features and TSO limits when offloading
    - virtio_net:
      - enable premapped mode by default
      - support for XDP socket(AF_XDP) zerocopy TX
    - wireguard:
      - set the TSO max size to be GSO_MAX_SIZE, to aggregate larger
        packets.
 
  - Ethernet NICs embedded and virtual:
    - Broadcom ASP:
      - enable software timestamping
    - Freescale:
      - add enetc4 PF driver
    - MediaTek: Airoha SoC:
      - implement BQL support
    - RealTek r8169:
      - enable TSO by default on r8168/r8125
      - implement extended ethtool stats
    - Renesas AVB:
      - enable TX checksum offload
    - Synopsys (stmmac):
      - support header splitting for vlan tagged packets
      - move common code for DWMAC4 and DWXGMAC into a separate FPE
        module.
      - Add the dwmac driver support for T-HEAD TH1520 SoC
    - Synopsys (xpcs):
      - driver refactor and cleanup
    - TI:
      - icssg_prueth: add VLAN offload support
    - Xilinx emaclite:
      - adds clock support
 
  - Ethernet switches:
    - Microchip:
      - implement support for the lan969x Ethernet switch family
      - add LAN9646 switch support to KSZ DSA driver
 
  - Ethernet PHYs:
    - Marvell: 88q2x: enable auto negotiation
    - Microchip: add support for LAN865X Rev B1 and LAN867X Rev C1/C2
 
  - PTP:
    - Add support for the Amazon virtual clock device
    - Add PTP driver for s390 clocks
 
  - WiFi:
    - mac80211
      - EHT 1024 aggregation size for transmissions
      - new operation to indicate that a new interface is to be added
      - support radio separation of multi-band devices
      - move wireless extension spy implementation to libiw
    - Broadcom:
      - brcmfmac: optional LPO clock support
    - Microchip:
      - add support for Atmel WILC3000
    - Qualcomm (ath12k):
      - firmware coredump collection support
      - add debugfs support for a multitude of statistics
    - Qualcomm (ath5k):
      - Arcadyan ARV45XX AR2417 & Gigaset SX76[23] AR241[34]A support
    - Realtek:
      - rtw88: 8821au and 8812au USB adapters support
      - rtw89: add thermal protection
      - rtw89: fine tune BT-coexistence to improve user experience
      - rtw89: firmware secure boot for WiFi 6 chip
 
  - Bluetooth
      - add Qualcomm WCN785x support for ids Foxconn 0xe0fc/0xe0f3 and
        0x13d3:0x3623
      - add Realtek RTL8852BE support for id Foxconn 0xe123
      - add MediaTek MT7920 support for wireless module ids
      - btintel_pcie: add handshake between driver and firmware
      - btintel_pcie: add recovery mechanism
      - btnxpuart: add GPIO support to power save feature
 
 Signed-off-by: Paolo Abeni <pabeni@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmc8sukSHHBhYmVuaUBy
 ZWRoYXQuY29tAAoJECkkeY3MjxOkLEYQAIMM6Qjh0bh3Byr3gOS1xZzXG+APLjP4
 9Jr0p3i+X53i90jvVqzeVO5FTc95MVHSKZ3kvPkDMXSLUaEJxocNHCI5Dzl/2/qL
 wWdpUB6/ou+jKB4Bn6Z8OvVODT7qrr0tVa9M2/fuKWrIsOU/ntIhG8EhnGddk5U/
 vKPSf5PUIb81uNRnF58VusY3wrT1dEoh9VfJYxL+ST+inPxjEAMy6Y+lmlsjGaSX
 jrS+Pp9KYiUwl3Qt0AQs+cG4OHkJdjbnChrfosWwpkiyddO8klVq06+wX/TiSzfF
 b9VZtBfy/GZs3lkE1mQkcILdtX5pP3YHQdpsuxFfVI0JHVszx2ck7WdoRux/8F0v
 kKZsYcO7bH9I1wMFP66Ff9hIbdEQaeucK+KdDkXyPNMfP91Vzmfjii8IBxOC36Ie
 BbOeFUrXyTxxJ2u0vf/X9JtIq8bcrkNrSd1n1jlGPMqG3FVzsY95+Oi4qfsyeUbl
 lS1PlVTqPMPFdX54HnxM3y2rJjhd7iXhkvmtuXNjRFThXlOiK3maAPWlM1aZ3b8u
 Vjs4JFUsW0tleZG+RzANjsGjXbf7AiPUGLZt+acem0K+fcjG4i5aGIAJrxwa/ORx
 eG74IZRt5cOI371W7gNLGHjwnuge8tFPgOWcRP2eozNm7jvMYALBejYS7eWUTvaf
 THcvVM+bupEZ
 =GzPr
 -----END PGP SIGNATURE-----

Merge tag 'net-next-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Paolo Abeni:
 "The most significant set of changes is the per netns RTNL. The new
  behavior is disabled by default, so regression risk should be contained.

  Notably the new config knob PTP_1588_CLOCK_VMCLOCK will inherit its
  default value from PTP_1588_CLOCK_KVM, as the former is intended to be
  a more reliable replacement for the latter.

  Core:

   - Started a very large, in-progress, effort to make the RTNL lock
     scope per network-namespace, thus reducing the lock contention
     significantly in the containerized use-case, comprising:
       - RCU-ified some relevant slices of the FIB control path
       - introduce basic per netns locking helpers
       - namespacified the IPv4 address hash table
       - remove rtnl_register{,_module}() in favour of
         rtnl_register_many()
       - refactor rtnl_{new,del,set}link() moving as much validation as
         possible out of RTNL lock
       - convert all phonet doit() and dumpit() handlers to RCU
       - convert IPv4 addresses manipulation to per-netns RTNL
       - convert virtual interface creation to per-netns RTNL
     the per-netns lock infrastructure is guarded by the
     CONFIG_DEBUG_NET_SMALL_RTNL knob, disabled by default ad interim.

   - Introduce NAPI suspension, to efficiently switch between busy
     polling (NAPI processing suspended) and normal processing.

   - Migrate the IPv4 routing input, output and control path from direct
     ToS usage to DSCP macros. This is a work in progress to make ECN
     handling consistent and reliable.

   - Add drop reasons support to the IPv4 route input path, allowing
     better introspection when packets are dropped.

   - Make FIB seqnum lockless, dropping RTNL protection for read access.

   - Make inet{,v6} addresses hashing less predictable.

   - Allow providing timestamp OPT_ID via cmsg, to correlate TX packets
     and timestamps
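
The ToS-to-DSCP item above rests on how the IPv4 ToS byte is laid out: the upper six bits carry the DSCP and the lower two carry ECN. A minimal sketch of that split follows. The helper names are hypothetical illustrations and are not kernel APIs; only the bit layout is taken from the DS field and ECN definitions.

```python
from typing import Tuple

# Illustrative helpers only; these names are hypothetical, not kernel APIs.
DSCP_MASK = 0xFC  # upper six bits of the ToS byte hold the DSCP
ECN_MASK = 0x03   # lower two bits of the ToS byte hold ECN


def split_tos(tos: int) -> Tuple[int, int]:
    """Return (dscp, ecn) for an 8-bit IPv4 ToS value."""
    if not 0 <= tos <= 0xFF:
        raise ValueError("ToS must fit in one byte")
    return (tos & DSCP_MASK) >> 2, tos & ECN_MASK


def routing_key(tos: int) -> int:
    """Keep only the DSCP bits, so an ECN mark cannot alter a route lookup."""
    return tos & DSCP_MASK
```

Masking the ECN bits before a routing decision is the property the item is after: two packets that differ only in their ECN marking produce the same key.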

  Things we sprinkled into general kernel code:

   - Add small file operations for debugfs, to reduce the struct ops
     size.

   - Refactoring and optimization for the implementation of page_frag
     API. This is preparatory work to consolidate the page_frag
     implementation.

  Netfilter:

   - Optimize set element transactions to reduce memory consumption

   - Extended netlink error reporting for attribute parser failure.

   - Make legacy xtables configs user selectable, giving users the
     option to configure iptables without enabling any other config.

   - Address a lot of false-positive RCU issues, pointed out by recent CI
     improvements.

  BPF:

   - Put xsk sockets on a struct diet and add various cleanups. Overall,
     this helps to bump performance by 12% for some workloads.

   - Extend BPF selftests to increase coverage of XDP features in
     combination with BPF cpumap.

   - Optimize and homogenize bpf_csum_diff helper for all archs and also
     add a batch of new BPF selftests for it.

   - Extend netkit with an option to delegate skb->{mark,priority}
     scrubbing to its BPF program.

   - Make the bpf_get_netns_cookie() helper available also to tc(x) BPF
     programs.

  Protocols:

   - Introduce a 4-tuple hash for connected UDP sockets, significantly
     speeding up connected socket lookup.

   - Add a fastpath for some TCP timers that usually expire after
     close, reducing the socket lock contention.

   - Add inbound and outbound xfrm state caches to speed up state
     lookups.

   - Avoid sending MPTCP advertisements on stale subflows, reducing the
     risk of losing them.

   - Make neighbours table flushing more scalable, maintaining per
     device neigh lists.
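
The idea behind the 4-tuple hash for connected UDP sockets can be sketched as a toy model: connected sockets are found with one exact-match lookup on (local address, local port, remote address, remote port), and only unmatched packets fall back to the table keyed by local address and port. The class and method names below are hypothetical, and this is not the kernel's data structure.

```python
from typing import Dict, List, Optional, Tuple

FourTuple = Tuple[str, int, str, int]
TwoTuple = Tuple[str, int]


class UdpLookupTable:
    """Toy model of socket lookup; all names here are hypothetical."""

    def __init__(self) -> None:
        self.by_4tuple: Dict[FourTuple, str] = {}
        self.by_local: Dict[TwoTuple, List[str]] = {}

    def bind(self, name: str, laddr: str, lport: int) -> None:
        # Unconnected sockets are only known by their local address and port.
        self.by_local.setdefault((laddr, lport), []).append(name)

    def connect(self, name: str, laddr: str, lport: int,
                raddr: str, rport: int) -> None:
        # Connected sockets get an exact-match entry keyed by the full 4-tuple.
        self.by_4tuple[(laddr, lport, raddr, rport)] = name

    def lookup(self, laddr: str, lport: int,
               raddr: str, rport: int) -> Optional[str]:
        # Fast path: a single hash lookup finds a connected socket.
        hit = self.by_4tuple.get((laddr, lport, raddr, rport))
        if hit is not None:
            return hit
        # Slow path: fall back to the sockets bound to the local pair.
        candidates = self.by_local.get((laddr, lport), [])
        return candidates[0] if candidates else None
```

The benefit in this model is that a packet for a connected socket never has to walk the list of sockets sharing the same local port.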

  Driver API:

   - Introduce a unified interface to configure transmission H/W
     shaping, and expose it to user-space via generic-netlink.

   - Add support for per-NAPI config via netlink. This makes napi
     configuration persistent across queues removal and re-creation.
     Requires driver updates, currently supported drivers are:
     nVidia/Mellanox mlx4 and mlx5, Broadcom brcm and Intel ice.

   - Add ethtool support for writing SFP / PHY firmware blocks.

   - Track RSS context allocation from ethtool core.

   - Implement support for mirroring to DSA CPU port, via TC mirror
     offload.

   - Consolidate FDB updates notification, to avoid duplicates on
     device-specific entries.

   - Expose DPLL clock quality level to the user-space.

   - Support master-slave PHY config via device tree.

  Tests and tooling:

   - forwarding: introduce deferred commands, to simplify the cleanup
     phase

  Drivers:

   - Updated several drivers - Amazon vNic, Google vNic, Microsoft vNic,
     Intel e1000e and Broadcom Tigon3 - to use netdev-genl to link the
     IRQs and queues to NAPI IDs, allowing busy polling and better
     introspection.

   - Ethernet high-speed NICs:
      - nVidia/Mellanox:
         - mlx5:
           - a large refactor to implement support for cross E-Switch
             scheduling
           - refactor H/W counter management to let it scale better
           - H/W GRO cleanups
      - Intel (100G, ice):
         - add support for ethtool reset
         - implement support for per TX queue H/W shaping
      - AMD/Solarflare:
         - implement per device queue stats support
      - Broadcom (bnxt):
         - improve wildcard l4proto on IPv4/IPv6 ntuple rules
      - Marvell Octeon:
         - Add representor support for each Resource Virtualization Unit
           (RVU) device.
      - Hisilicon:
         - add support for the BMC Gigabit Ethernet
      - IBM (EMAC):
         - driver cleanup and modernization
      - Cisco (VIC):
         - raise the queues number limit to 256

   - Ethernet virtual:
      - Google vNIC:
         - implement page pool support
      - macsec:
         - inherit lower device's features and TSO limits when
           offloading
      - virtio_net:
         - enable premapped mode by default
         - support for XDP socket (AF_XDP) zerocopy TX
      - wireguard:
         - set the TSO max size to be GSO_MAX_SIZE, to aggregate larger
           packets.

   - Ethernet NICs embedded and virtual:
      - Broadcom ASP:
         - enable software timestamping
      - Freescale:
         - add enetc4 PF driver
      - MediaTek: Airoha SoC:
         - implement BQL support
      - RealTek r8169:
         - enable TSO by default on r8168/r8125
         - implement extended ethtool stats
      - Renesas AVB:
         - enable TX checksum offload
      - Synopsys (stmmac):
         - support header splitting for vlan tagged packets
         - move common code for DWMAC4 and DWXGMAC into a separate FPE
           module.
         - add dwmac driver support for T-HEAD TH1520 SoC
      - Synopsys (xpcs):
         - driver refactor and cleanup
      - TI:
         - icssg_prueth: add VLAN offload support
      - Xilinx emaclite:
         - add clock support

   - Ethernet switches:
      - Microchip:
         - implement support for the lan969x Ethernet switch family
         - add LAN9646 switch support to KSZ DSA driver

   - Ethernet PHYs:
      - Marvell: 88q2x: enable auto negotiation
      - Microchip: add support for LAN865X Rev B1 and LAN867X Rev C1/C2

   - PTP:
      - Add support for the Amazon virtual clock device
      - Add PTP driver for s390 clocks

   - WiFi:
      - mac80211
         - EHT 1024 aggregation size for transmissions
         - new operation to indicate that a new interface is to be added
         - support radio separation of multi-band devices
         - move wireless extension spy implementation to libiw
      - Broadcom:
         - brcmfmac: optional LPO clock support
      - Microchip:
         - add support for Atmel WILC3000
      - Qualcomm (ath12k):
         - firmware coredump collection support
         - add debugfs support for a multitude of statistics
      - Qualcomm (ath5k):
         - Arcadyan ARV45XX AR2417 & Gigaset SX76[23] AR241[34]A support
      - Realtek:
         - rtw88: 8821au and 8812au USB adapters support
         - rtw89: add thermal protection
         - rtw89: fine tune BT-coexistence to improve user experience
         - rtw89: firmware secure boot for WiFi 6 chip

   - Bluetooth
      - add Qualcomm WCN785x support for ids Foxconn 0xe0fc/0xe0f3 and
        0x13d3:0x3623
      - add Realtek RTL8852BE support for id Foxconn 0xe123
      - add MediaTek MT7920 support for wireless module ids
      - btintel_pcie: add handshake between driver and firmware
      - btintel_pcie: add recovery mechanism
      - btnxpuart: add GPIO support to power save feature"

* tag 'net-next-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1475 commits)
  mm: page_frag: fix a compile error when kernel is not compiled
  Documentation: tipc: fix formatting issue in tipc.rst
  selftests: nic_performance: Add selftest for performance of NIC driver
  selftests: nic_link_layer: Add selftest case for speed and duplex states
  selftests: nic_link_layer: Add link layer selftest for NIC driver
  bnxt_en: Add FW trace coredump segments to the coredump
  bnxt_en: Add a new ethtool -W dump flag
  bnxt_en: Add 2 parameters to bnxt_fill_coredump_seg_hdr()
  bnxt_en: Add functions to copy host context memory
  bnxt_en: Do not free FW log context memory
  bnxt_en: Manage the FW trace context memory
  bnxt_en: Allocate backing store memory for FW trace logs
  bnxt_en: Add a 'force' parameter to bnxt_free_ctx_mem()
  bnxt_en: Refactor bnxt_free_ctx_mem()
  bnxt_en: Add mem_valid bit to struct bnxt_ctx_mem_type
  bnxt_en: Update firmware interface spec to 1.10.3.85
  selftests/bpf: Add some tests with sockmap SK_PASS
  bpf: fix recursive lock when verdict program return SK_PASS
  wireguard: device: support big tcp GSO
  wireguard: selftests: load nf_conntrack if not present
  ...
2024-11-21 08:28:08 -08:00
Linus Torvalds
9f5a6a1fe6 media updates for v6.13-rc1
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+QmuaPwR3wnBdVwACF8+vY7k4RUFAmc8Q80ACgkQCF8+vY7k
 4RX+3g//dMBSmu3uC9OiXyfw3aB8w62RMeieRxSVPMdkiacUm1J8HyzHnXPXIUn3
 tfBT9E/YbeFZ+PlrOXRDUi1i8jmN47VuwRe01rxxF/FdlYknC2eGH3Ug9DW90VBh
 wmZ1kSjyjizwDkKAm+Jc2xynTaX+iInJ4Kzp9RStDZPuaqj2Qzd1qVRk2FJwAYRh
 5dTpi0W1PexjxQXDIcnHi/tPapGLSP5PnrunrAJR0tYfp60wrKMaxTO36yJzbnDP
 MxkF8A+9dWtePRqoPWxPIvnOVu/+Twc730xkQp62qPvwEM2HZRtU7cgQFlWos6p/
 ijK2i6sAQslMhQ9oIyKlO7HpXX60rjE3XtdzEtGxBq6DyIqx1riN+OqJB2C4Cdsr
 2qUET8aTIisPURw1ecNAbthvLt8tljBe08/eX0GYaWFjALJx3Pds23ahH8hw295N
 o3SY5NaGmO9Tg6HzYLSwfBmxgGpWDuRic6PDCVKok5mS5D1+uV/tu8fQFiNiFNVe
 Okufjvo7HtZ3+rWR90b/Udpz/lBB/dceppnUX2iKevrG190VHxEwJ2pQKkfdH9ha
 LUZQajikiv5rbGxKIGrjrCnjrJ24TC2vCSPhkgOb1r91LUY4RUV61c3hZbH73rzQ
 2Ykwvmy+gpE4GEYiJRfSR6KlXdBruMa5FToLUHEK0uBMSlG5k7Q=
 =25BS
 -----END PGP SIGNATURE-----

Merge tag 'media/v6.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media

Pull media updates from Mauro Carvalho Chehab:

 - removal of the old omap4iss media driver

 - mantis: remove orphan mantis_core.h

 - add support for Raspberry Pi CFE

 - uvc driver got a co-maintainer

 - main media tree moved to git://linuxtv.org/media.git

 - lots of driver cleanups, updates and fixes

* tag 'media/v6.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (233 commits)
  docs: media: update location of the media patches
  MAINTAINERS: update location of media main tree
  media: MAINTAINERS: Add Hans de Goede as USB VIDEO CLASS co-maintainer
  media: platform: samsung: s5p-jpeg: Remove deadcode
  media: qcom: camss: Add MSM8953 resources
  media: dt-bindings: Add qcom,msm8953-camss
  media: qcom: camss: implement pm domain ops for VFE v4.1
  media: platform: exynos4-is: Fix an OF node reference leak in fimc_md_is_isp_available
  media: adv7180: Also check for "adi,force-bt656-4"
  media: dt-bindings: adv7180: Document 'adi,force-bt656-4'
  media: mgb4: Fix inconsistent input/output alignment in loopback mode
  media: replace obsolete hans.verkuil@cisco.com alias
  Documentation: media: improve V4L2_CID_MIN_BUFFERS_FOR_*, doc
  media: vicodec: add V4L2_CID_MIN_BUFFERS_FOR_* controls
  media: atomisp: Add check for rgby_data memory allocation failure
  media: atomisp: remove redundant re-checking of err
  media: atomisp: Fix spelling errors reported by codespell
  media: atomisp: Remove License information boilerplate
  media: atomisp: Fix typos in comment
  media: atomisp: hmm_bo: Fix spelling errors in hmm_bo.h
  ...
2024-11-20 14:01:15 -08:00
Linus Torvalds
c3cda60e83 Another moderately busy cycle in docsland:
- Work on Chinese translations has picked up again.  Happily, they are
   maintaining the existing translations and not just adding new ones.
 
 - Some maintenance of the Japanese and Italian translations as well.
 
 - The removal of the venerable "dontdiff" file.  It has long outlived its
   usefulness and contained entries ("parse.*") that would actively mask
   actual source change.
 
 - The addition of enforcement information to the code-of-conduct
   documentation.
 
 Along with some build-system fixes and a lot of typo and language fixes.
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAmc7eD4PHGNvcmJldEBs
 d24ubmV0AAoJEBdDWhNsDH5YSp4H/2zknNZNhxAtWbF1L/MprjVgh5OtS0xEI8SR
 Klks8pHm9Dg5sg3EciJ9Jt7C3ZdPANOb7K4ykL2w2TKLgZbIMUa6FIqKbASqbryX
 0t3nTn0gvkVMEtLlNLw4M1QIUox55fxLKUMV0MxcTAkvmFnG6XJl2gzGoL/SrI/h
 19QDAKZZn2+S7Yow8MAdfef+ILu1Y9ms/4pumeUXHgVPJO7HDMCS85zQGU3tAB2n
 HgR4RRSXNsfXvW/rxx2YvGtJ3SZWnZM7NVbWcb25i8Wu/uBDOzoSW7uFRRad67cP
 d0MiHrB9RqltHGaJpEUisKLpTExd/GEZlTL+ILbXDROT+BHdLDQ=
 =ndvR
 -----END PGP SIGNATURE-----

Merge tag 'docs-6.13' of git://git.lwn.net/linux

Pull documentation updates from Jonathan Corbet:
 "Another moderately busy cycle in docsland:

   - Work on Chinese translations has picked up again. Happily, they are
     maintaining the existing translations and not just adding new ones.

   - Some maintenance of the Japanese and Italian translations as well.

   - The removal of the venerable "dontdiff" file. It has long outlived
     its usefulness and contained entries ("parse.*") that would
     actively mask actual source change.

   - The addition of enforcement information to the code-of-conduct
     documentation.

  Along with some build-system fixes and a lot of typo and language
  fixes"

* tag 'docs-6.13' of git://git.lwn.net/linux: (52 commits)
  Documentation/CoC: spell out enforcement for unacceptable behaviors
  docs: fix typos and whitespace in Documentation/process/backporting.rst
  docs/zh_CN: fix one sentence in llvm.rst
  docs: bug-bisect: add a note about bisecting -next
  docs/zh_CN: add the translation of kbuild/llvm.rst
  Documentation: Fix incorrect paths/magic in magic numbers rst
  Documentation/maintainer-tip: Fix typos
  Documentation: Improve crash_kexec_post_notifiers description
  Docs/zh_CN: Translate physical_memory.rst to Simplified Chinese
  Documentation: admin: reorganize kernel-parameters intro
  docs/zh_CN: update the translation of process/programming-language.rst
  docs/zh_CN: update the translation of mm/page_owner.rst
  docs/zh_CN: update the translation of mm/page_table_check.rst
  docs/zh_CN: update the translation of mm/overcommit-accounting.rst
  docs/zh_CN: update the translation of mm/admon/faq.rst
  docs/zh_CN: update the translation of mm/active_mm.rst
  docs/zh_CN: update the translation of mm/hmm.rst
  docs: remove Documentation/dontdiff
  docs/zh_CN: Add a entry in Chinese glossary
  Docs/zh_CN: Fix the pfn calculation error in page_tables.rst
  ...
2024-11-20 09:16:45 -08:00
Linus Torvalds
8cdf2d1903 RCU pull request for v6.13
SRCU:
 
 	- Introduction of the new SRCU-lite flavour with a new pair of
 	  srcu_read_[un]lock_lite() APIs. In practice the read side using
 	  this flavour becomes lighter by removing a full memory barrier on
 	  LOCK and a full memory barrier on UNLOCK. This comes at the
 	  expense of a higher latency write side with two (in the best case
 	  of a snapshot of unused read-sides) or more RCU grace periods on
 	  the update side which now assumes by itself the whole full
 	  ordering guarantee against the LOCK/UNLOCK counters on both
 	  indexes, along with the accesses performed inside.
 
 	  Uretprobes is a known potential user.
 
 	  Note this doesn't replace the default normal flavour of SRCU which
 	  still behaves the same as usual.
 
 	- Add testing of SRCU-lite through rcutorture and rcuscale
 
 	- Various cleanups on the way.
 
 FIXES:
 
 	- Allow short-circuiting RCU-TASKS-RUDE grace periods on architectures
 	  that have sane noinstr boundaries forbidding tracing on low-level
 	  idle and kernel entry code. RCU-TASKS is enough on such configurations
 	  because it involves an RCU grace period that waits for all idle
 	  tasks to either schedule out voluntarily or enter into RCU
 	  unwatched noinstr code.
 
 	- Allow and test start_poll_synchronize_rcu() with IRQs disabled.
 
 	- Mention rcuog kthreads in relevant documentation and Kconfig help
 
 	- Various fixes and consolidations
 
 RCUTORTURE:
 
 	- Add --no-affinity on tools to leave the affinity setting of guests
 	  up to the user.
 
 	- Add guest_os_delay parameter to rcuscale for better warm-up
 	  control.
 
 	- Fix and improve some rcuscale error handling.
 
 	- Various cleanups and fixes
 
 STALL:
 
 	- Remove dead code
 
 	- Stop dumping tasks if a stalled grace period eventually ended
 	  midway as that only produces confusing output.
 
 	- Optimize detection of stalling CPUs and avoid useless node
 	  locking otherwise.
 
 NOCB:
 
 	- Fix rcu_barrier() hang due to a race against callbacks
 	  deoffloading. This is not yet used, except by rcutorture, and
 	  waits for its promised cpusets interface.
 
 	- Remove leftover function declaration
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEd76+gtGM8MbftQlOhSRUR1COjHcFAmc6gP0ACgkQhSRUR1CO
 jHcHfw/5AWg5wiapwJtLO9KNdtELflTTbT/NhhqwYVReHnOSvtPNwWgo984T3jYJ
 xikE4Ccn5Nu4zJVbTOtmwJ/RP6WWP1I28LgoTCdcz9BB9b+CRLogV/dR5r5uZbhD
 +jqXRAzDhEifR0pcfSK28MkXoh+puXMg4C78f7xtT1Oe3Gr67RLf6xvE59gHJrDg
 QrPStdwhOn2bhmbKcflw1bHYqpypL09P2WHuRLmsJJUMUGIHTohK05lJOkD3hV9g
 HTxOecNmeF/r8NyN8l/ERJgKmwDukIG02xih8UMEtqDEl04IxZFHbCfB6yyIsKDT
 fTFxnRCHnm/PxIKRA5ENvyg/6uArMJ0xuSTZRG4K5v0nx7okR8gbCPmwiwn1m5w3
 +/oppjCmG/gRgyiOytuEGKfaN9q/oJqQgeS7j8WruWj9V68FYUKr6COfQByw0xOc
 H6ftaLGeFHgHxk3nua2wFrfMtQhucYAMGAlVK82yd7Q1EFW47kzleO8w/HSvfrBt
 trX+9HZ77GVVmREJMstnIWRr5mbPtUf8yRZdA5bBrlEYz0A/ToNaFACid0fsaMC2
 Dbo9Q+wDqL2wwOpjZy+MA3k1IVyDdUTuOQmPt57LmFTxUNZ+AQQlJcrhrUqWVvdM
 Nne2EHdqCHADKd7g3i17HtvpTsapz+Qakpzx8UsPqNtfo1DSd5A=
 =MWrw
 -----END PGP SIGNATURE-----

Merge tag 'rcu.release.v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux

Pull RCU updates from Frederic Weisbecker:
 "SRCU:

   - Introduction of the new SRCU-lite flavour with a new pair of
     srcu_read_[un]lock_lite() APIs. In practice the read side using
     this flavour becomes lighter by removing a full memory barrier on
     LOCK and a full memory barrier on UNLOCK. This comes at the expense
     of a higher latency write side with two (in the best case of a
     snapshot of unused read-sides) or more RCU grace periods on the
     update side which now assumes by itself the whole full ordering
     guarantee against the LOCK/UNLOCK counters on both indexes, along
     with the accesses performed inside.

     Uretprobes is a known potential user.

     Note this doesn't replace the default normal flavour of SRCU which
     still behaves the same as usual.

   - Add testing of SRCU-lite through rcutorture and rcuscale

   - Various cleanups on the way.

  Fixes:

   - Allow short-circuiting RCU-TASKS-RUDE grace periods on
     architectures that have sane noinstr boundaries forbidding tracing
     on low-level idle and kernel entry code. RCU-TASKS is enough on
     such configurations because it involves an RCU grace period that
     waits for all idle tasks to either schedule out voluntarily or
     enter into RCU unwatched noinstr code.

   - Allow and test start_poll_synchronize_rcu() with IRQs disabled.

   - Mention rcuog kthreads in relevant documentation and Kconfig help

   - Various fixes and consolidations

  rcutorture:

   - Add --no-affinity on tools to leave the affinity setting of guests
     up to the user.

   - Add guest_os_delay parameter to rcuscale for better warm-up
     control.

   - Fix and improve some rcuscale error handling.

   - Various cleanups and fixes

  stall:

   - Remove dead code

   - Stop dumping tasks if a stalled grace period eventually ended
     midway as that only produces confusing output.

   - Optimize detection of stalling CPUs and avoid useless node locking
     otherwise.

  NOCB:

   - Fix rcu_barrier() hang due to a race against callbacks
     deoffloading. This is not yet used, except by rcutorture, and waits
     for its promised cpusets interface.

   - Remove leftover function declaration"

* tag 'rcu.release.v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux: (42 commits)
  rcuscale: Remove redundant WARN_ON_ONCE() splat
  rcuscale: Do a proper cleanup if kfree_scale_init() fails
  srcu: Unconditionally record srcu_read_lock_lite() in ->srcu_reader_flavor
  srcu: Check for srcu_read_lock_lite() across all CPUs
  srcu: Remove smp_mb() from srcu_read_unlock_lite()
  rcutorture: Avoid printing cpu=-1 for no-fault RCU boost failure
  rcuscale: Add guest_os_delay module parameter
  refscale: Correct affinity check
  torture: Add --no-affinity parameter to kvm.sh
  rcu/nocb: Fix missed RCU barrier on deoffloading
  rcu/kvfree: Fix data-race in __mod_timer / kvfree_call_rcu
  rcu/srcutiny: don't return before reenabling preemption
  rcu-tasks: Remove open-coded one-byte cmpxchg() emulation
  doc: Remove kernel-parameters.txt entry for rcutorture.read_exit
  rcutorture: Test start-poll primitives with interrupts disabled
  rcu: Permit start_poll_synchronize_rcu*() with interrupts disabled
  rcu: Allow short-circuiting of synchronize_rcu_tasks_rude()
  doc: Add rcuog kthreads to kernel-per-CPU-kthreads.rst
  rcu: Add rcuog kthreads to RCU_NOCB_CPU help text
  rcu: Use the BITS_PER_LONG macro
  ...
2024-11-19 11:27:07 -08:00
Linus Torvalds
ba1f9c8fe3 arm64 updates for 6.13:
* Support for running Linux in a protected VM under the Arm Confidential
   Compute Architecture (CCA)
 
 * Guarded Control Stack user-space support. Current patches follow the
   x86 ABI of implicitly creating a shadow stack on clone(). Subsequent
   patches (already on the list) will add support for clone3() allowing
   finer-grained control of the shadow stack size and placement from libc
 
 * AT_HWCAP3 support (not running out of HWCAP2 bits yet but we are
   getting close with the upcoming dpISA support)
 
 * Other arch features:
 
   - In-kernel use of the memcpy instructions, FEAT_MOPS (previously only
     exposed to user; uaccess support not merged yet)
 
   - MTE: hugetlbfs support and the corresponding kselftests
 
   - Optimise CRC32 using the PMULL instructions
 
   - Support for FEAT_HAFT enabling ARCH_HAS_NONLEAF_PMD_YOUNG
 
   - Optimise the kernel TLB flushing to use the range operations
 
   - POE/pkey (permission overlays): further cleanups after bringing the
     signal handler in line with the x86 behaviour for 6.12
 
 * arm64 perf updates:
 
   - Support for the NXP i.MX91 PMU in the existing IMX driver
 
   - Support for Ampere SoCs in the Designware PCIe PMU driver
 
   - Support for Marvell's 'PEM' PCIe PMU present in the 'Odyssey' SoC
 
   - Support for Samsung's 'Mongoose' CPU PMU
 
   - Support for PMUv3.9 finer-grained userspace counter access control
 
   - Switch back to platform_driver::remove() now that it returns 'void'
 
   - Add some missing events for the CXL PMU driver
 
 * Miscellaneous arm64 fixes/cleanups:
 
   - Page table accessors cleanup: type updates, drop unused macros,
     reorganise arch_make_huge_pte() and clean up pte_mkcont(), sanity
     check addresses before runtime P4D/PUD folding
 
   - Command line override for ID_AA64MMFR0_EL1.ECV (advertising the
     FEAT_ECV for the generic timers) allowing Linux to boot with
     firmware deployments that don't set SCTLR_EL3.ECVEn
 
   - ACPI/arm64: tighten the check for the array of platform timer
     structures and adjust the error handling procedure in
     gtdt_parse_timer_block()
 
   - Optimise the cache flush for the uprobes xol slot (skip if no
     change) and other uprobes/kprobes cleanups
 
   - Fix the context switching of tpidrro_el0 when kpti is enabled
 
   - Dynamic shadow call stack fixes
 
   - Sysreg updates
 
   - Various arm64 kselftest improvements
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAmc5POIACgkQa9axLQDI
 XvEDYA//a3eeNkgMuGdnSCVcLz+zy+oNwAwboG/4X1DqL8jiCbI4npwugPx95RIA
 YZOUvo9T2aL3OyefpUHll4gFHqx9OwoZIig2F70TEUmlPsGUbh0KBkdfQF3xZPdl
 EwV0kHSGEqMWMBwsGJGwgCYrUaf1MUQzh1GBl7VJ2ts5XsJBaBeOyKkysij26wtZ
 V+aHq2IUx7qQS7+HC/4P6IoHxKziFcsCMovaKaynP4cw9xXBQbDMcNlHEwndOMyk
 pu2zrv7GG0j3KQuVP/2Alf5FKhmI0GVGP/6Nc/zsOmw96w8Kf7HfzEtkHawr2aRq
 rqg/c9ivzDn1p+fUBo4ZYtrRk4IAY+yKu6hdzdLTP5+bQrBTWTO9rjQVBm9FAGYT
 sCdEj1NqzvExvNHD7X6ut/GJ05lmce3K+qeSXSEysN9gqiT3eomYWMXrD2V2lxzb
 rIDDcb/icfaqjt14Mksh19r/rzNeq7noj9CGSmcqw0BHZfHzl38Lai6pdfYzCNyn
 vCM/c4c1D/WWX8/lifO1JZVbhDk1jy82Iphg2KEhL8iKPxDsKBBZLmYuU1oa7tMo
 WryGAz9+GQwd+W9chFuaOEtMnzvW2scEJ5Eb2fEf0Qj0aEurkL+C9dZR6o1GN77V
 DBUxtU628Ef4PJJGfbNCwZzdd8UPYG3a/mKfQQ3dz0oz2LySlW4=
 =wDot
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Catalin Marinas:

 - Support for running Linux in a protected VM under the Arm
   Confidential Compute Architecture (CCA)

 - Guarded Control Stack user-space support. Current patches follow the
   x86 ABI of implicitly creating a shadow stack on clone(). Subsequent
   patches (already on the list) will add support for clone3() allowing
   finer-grained control of the shadow stack size and placement from
   libc

 - AT_HWCAP3 support (not running out of HWCAP2 bits yet but we are
   getting close with the upcoming dpISA support)

 - Other arch features:

     - In-kernel use of the memcpy instructions, FEAT_MOPS (previously
       only exposed to user; uaccess support not merged yet)

     - MTE: hugetlbfs support and the corresponding kselftests

     - Optimise CRC32 using the PMULL instructions

     - Support for FEAT_HAFT enabling ARCH_HAS_NONLEAF_PMD_YOUNG

     - Optimise the kernel TLB flushing to use the range operations

     - POE/pkey (permission overlays): further cleanups after bringing
       the signal handler in line with the x86 behaviour for 6.12

 - arm64 perf updates:

     - Support for the NXP i.MX91 PMU in the existing IMX driver

     - Support for Ampere SoCs in the Designware PCIe PMU driver

     - Support for Marvell's 'PEM' PCIe PMU present in the 'Odyssey' SoC

     - Support for Samsung's 'Mongoose' CPU PMU

     - Support for PMUv3.9 finer-grained userspace counter access
       control

     - Switch back to platform_driver::remove() now that it returns
       'void'

     - Add some missing events for the CXL PMU driver

 - Miscellaneous arm64 fixes/cleanups:

     - Page table accessors cleanup: type updates, drop unused macros,
       reorganise arch_make_huge_pte() and clean up pte_mkcont(), sanity
       check addresses before runtime P4D/PUD folding

     - Command line override for ID_AA64MMFR0_EL1.ECV (advertising the
       FEAT_ECV for the generic timers) allowing Linux to boot with
       firmware deployments that don't set SCTLR_EL3.ECVEn

     - ACPI/arm64: tighten the check for the array of platform timer
       structures and adjust the error handling procedure in
       gtdt_parse_timer_block()

     - Optimise the cache flush for the uprobes xol slot (skip if no
       change) and other uprobes/kprobes cleanups

     - Fix the context switching of tpidrro_el0 when kpti is enabled

     - Dynamic shadow call stack fixes

     - Sysreg updates

     - Various arm64 kselftest improvements

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (168 commits)
  arm64: tls: Fix context-switching of tpidrro_el0 when kpti is enabled
  kselftest/arm64: Try harder to generate different keys during PAC tests
  kselftest/arm64: Don't leak pipe fds in pac.exec_sign_all()
  arm64/ptrace: Clarify documentation of VL configuration via ptrace
  kselftest/arm64: Corrupt P0 in the irritator when testing SSVE
  acpi/arm64: remove unnecessary cast
  arm64/mm: Change protval as 'pteval_t' in map_range()
  kselftest/arm64: Fix missing printf() argument in gcs/gcs-stress.c
  kselftest/arm64: Add FPMR coverage to fp-ptrace
  kselftest/arm64: Expand the set of ZA writes fp-ptrace does
  kselftets/arm64: Use flag bits for features in fp-ptrace assembler code
  kselftest/arm64: Enable build of PAC tests with LLVM=1
  kselftest/arm64: Check that SVCR is 0 in signal handlers
  selftests/mm: Fix unused function warning for aarch64_write_signal_pkey()
  kselftest/arm64: Fix printf() compiler warnings in the arm64 syscall-abi.c tests
  kselftest/arm64: Fix printf() warning in the arm64 MTE prctl() test
  kselftest/arm64: Fix printf() compiler warnings in the arm64 fp tests
  kselftest/arm64: Fix build with stricter assemblers
  arm64/scs: Drop unused prototype __pi_scs_patch_vmlinux()
  arm64/scs: Deal with 64-bit relative offsets in FDE frames
  ...
2024-11-18 18:10:37 -08:00
Linus Torvalds
70e7730c2a vfs-6.13.misc

Merge tag 'vfs-6.13.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull misc vfs updates from Christian Brauner:
 "Features:

   - Fixup and improve NLM and kNFSD file lock callbacks

     Last year both GFS2 and OCFS2 had some work done to make their
     locking more robust when exported over NFS. Unfortunately, part of
     that work caused both NLM (for NFS v3 exports) and kNFSD (for
     NFSv4.1+ exports) to no longer send lock notifications to clients

     This in itself is not a huge problem because most NFS clients will
     still poll the server in order to acquire a conflicted lock

     It's important for NLM and kNFSD that they do not block their
     kernel threads inside filesystem's file_lock implementations
     because that can produce deadlocks. We used to make sure of this by
     only trusting that posix_lock_file() can correctly handle blocking
     lock calls asynchronously, so the lock managers would only set up
     their file_lock requests for async callbacks if the filesystem did
     not define its own lock() file operation

     However, when GFS2 and OCFS2 grew the capability to correctly
     handle blocking lock requests asynchronously, they started
     signalling this behavior with EXPORT_OP_ASYNC_LOCK, and the check
     for also trusting posix_lock_file() was inadvertently dropped, so
     now most filesystems no longer produce lock notifications when
     exported over NFS

     Fix this by using an fop_flag which greatly simplifies the problem
     and paves the way for future uses by both filesystems and lock
     managers alike

   - Add a sysctl to delete the dentry when a file is removed instead of
     making it a negative dentry

     Commit 681ce86235 ("vfs: Delete the associated dentry when
     deleting a file") introduced an unconditional deletion of the
     associated dentry when a file is removed. However, this led to
     performance regressions in specific benchmarks, such as
     filebench.sum_operations/s, prompting a revert in commit
     4a4be1ad3a ("Revert "vfs: Delete the associated dentry when
     deleting a file""). This reintroduces the concept conditionally
     through a sysctl

   - Expand the statmount() system call:

       * Report the filesystem subtype in a new fs_subtype field to
         e.g., report fuse filesystem subtypes

       * Report the superblock source in a new sb_source field

       * Add a new way to return filesystem specific mount options in an
         option array, separated by zero bytes and unescaped. This
         allows callers to retrieve filesystem specific mount options
         and immediately pass them to e.g., fsconfig() without having to
         unescape or split them

       * Report security (LSM) specific mount options in a separate
         security option array. We don't lump them together with
         filesystem specific mount options as security mount options are
         generic and most users aren't interested in them

         The format is the same as for the filesystem specific mount
         option array

   - Support relative paths in fsconfig()'s FSCONFIG_SET_STRING command

   - Optimize acl_permission_check() to avoid costly {g,u}id ownership
     checks if possible

   - Use smp_mb__after_spinlock() to avoid full smp_mb() in evict()

   - Add synchronous wakeup support for ep_poll_callback.

     Currently, epoll only uses wake_up() to wake up tasks. But sometimes
     there are epoll users which want to use the synchronous wakeup flag
     to give a hint to the scheduler, e.g., the Android binder driver.
     So add a wake_up_sync() define, and use wake_up_sync() when sync is
     true in ep_poll_callback()

  Fixes:

   - Fix kernel documentation for inode_insert5() and iget5_locked()

   - Annotate racy epoll check on file->f_ep

   - Make F_DUPFD_QUERY associative

   - Avoid filename buffer overrun in initramfs

   - Don't let statmount() return empty strings

   - Add a cond_resched() to dump_user_range() to avoid hogging the CPU

   - Don't query the device logical blocksize multiple times for hfsplus

   - Make filemap_read() check that the offset is positive or zero

  Cleanups:

   - Various typo fixes

   - Cleanup wbc_attach_fdatawrite_inode()

   - Add __releases annotation to wbc_attach_and_unlock_inode()

   - Add hugetlbfs tracepoints

   - Fix various vfs kernel doc parameters

   - Remove obsolete TODO comment from io_cancel()

   - Convert wbc_account_cgroup_owner() to take a folio

   - Fix comments for BANDWIDTH_INTERVAL and wb_domain_writeout_add()

   - Reorder struct posix_acl to save 8 bytes

   - Annotate struct posix_acl with __counted_by()

   - Replace one-element array with flexible array member in freevxfs

   - Use idiomatic atomic64_inc_return() in alloc_mnt_ns()"
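
The zero-byte-separated option array described in the statmount() item above can be illustrated with a small userspace sketch. This is not the kernel or libc interface: `split_option_array()` and `to_key_value()` are hypothetical helpers, and the sample buffer is made up. The only property taken from the text is that entries are unescaped and separated by zero bytes.

```python
def split_option_array(buffer):
    """Split a zero-byte-separated option buffer into a list of strings.

    Assumption: empty entries carry no meaning and can be dropped.
    """
    return [
        entry.decode("utf-8", errors="surrogateescape")
        for entry in buffer.split(b"\0")
        if entry
    ]


def to_key_value(option):
    """Split 'key=value' into (key, value); flag-style options get value None."""
    key, sep, value = option.partition("=")
    return key, (value if sep else None)


# Hypothetical buffer: note the comma inside the first value needs no escaping.
raw = b"lowerdir=/a,b\0upperdir=/c\0ro\0"
options = [to_key_value(opt) for opt in split_option_array(raw)]
```

Because the entries are already split and unescaped, each pair could be handed to a configuration call one at a time without any further string processing.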

* tag 'vfs-6.13.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (35 commits)
  statmount: retrieve security mount options
  vfs: make evict() use smp_mb__after_spinlock instead of smp_mb
  statmount: add flag to retrieve unescaped options
  fs: add the ability for statmount() to report the sb_source
  writeback: wbc_attach_fdatawrite_inode out of line
  writeback: add a __releases annoation to wbc_attach_and_unlock_inode
  fs: add the ability for statmount() to report the fs_subtype
  fs: don't let statmount return empty strings
  fs:aio: Remove TODO comment suggesting hash or array usage in io_cancel()
  hfsplus: don't query the device logical block size multiple times
  freevxfs: Replace one-element array with flexible array member
  fs: optimize acl_permission_check()
  initramfs: avoid filename buffer overrun
  fs/writeback: convert wbc_account_cgroup_owner to take a folio
  acl: Annotate struct posix_acl with __counted_by()
  acl: Realign struct posix_acl to save 8 bytes
  epoll: Add synchronous wakeup support for ep_poll_callback
  coredump: add cond_resched() to dump_user_range
  mm/page-writeback.c: Fix comment of wb_domain_writeout_add()
  mm/page-writeback.c: Update comment for BANDWIDTH_INTERVAL
  ...
2024-11-18 09:35:30 -08:00
Mauro Carvalho Chehab
72ad4ff638 docs: media: update location of the media patches
Due to recent changes in the way we're maintaining media, the
location of the main tree was updated.

Change docs accordingly.

Cc: stable@vger.kernel.org
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Hans Verkuil <hverkuil@xs4all.nl>
2024-11-18 12:41:22 +01:00
Frederic Weisbecker
d8dfba2c60 Merge branches 'rcu/fixes', 'rcu/nocb', 'rcu/torture', 'rcu/stall' and 'rcu/srcu' into rcu/dev 2024-11-15 22:38:53 +01:00
Joshua Hahn
05d4532b60 memcg/hugetlb: add hugeTLB counters to memcg
This patch introduces a new counter to memory.stat that tracks hugeTLB
usage, only if hugeTLB accounting is done to memory.current.  This feature
is enabled the same way hugeTLB accounting is enabled, via the
memory_hugetlb_accounting mount flag for cgroupsv2.

1. Why is this patch necessary?
Currently, memcg hugeTLB accounting is an opt-in feature [1] that adds
hugeTLB usage to memory.current.  However, the metric is not reported in
memory.stat.  Given that users often interpret memory.stat as a breakdown
of the value reported in memory.current, the disparity between the two
reports can be confusing.  This patch solves this problem by including the
metric in memory.stat as well, but only if it is also reported in
memory.current (it would also be confusing if the value was reported in
memory.stat, but not in memory.current)

Aside from the consistency between the two files, we also see benefits in
observability.  Userspace might be interested in the hugeTLB footprint of
cgroups for many reasons.  For instance, system admins might want to
verify that hugeTLB usage is distributed as expected across tasks: i.e. 
memory-intensive tasks are using more hugeTLB pages than tasks that don't
consume a lot of memory, or are seen to fault frequently.  Note that this
is separate from wanting to inspect the distribution for limiting purposes
(in which case, hugeTLB controller makes more sense).

2. We already have a hugeTLB controller. Why not use that?
It is true that the hugeTLB controller tracks the exact value that we
want.  In fact, by enabling the hugeTLB controller, we get all of the
observability benefits that I mentioned above, and users can check the
total hugeTLB usage, verify if it is distributed as expected, etc.

With this said, there are 2 problems:
(a) They are still not reported in memory.stat, which means the
    disparity between the memcg reports is still there.
(b) We cannot reasonably expect users to enable the hugeTLB controller
    just for the sake of hugeTLB usage reporting, especially since
    they don't have any use for hugeTLB usage enforcing [2].

3. Implementation Details:
In the alloc / free hugetlb functions, we call lruvec_stat_mod_folio
regardless of whether memcg accounts hugetlb.  mem_cgroup_commit_charge
which is called from alloc_hugetlb_folio will set memcg for the folio only
if the CGRP_ROOT_MEMORY_HUGETLB_ACCOUNTING cgroup mount option is used, so
lruvec_stat_mod_folio accounts per-memcg hugetlb counters only if the
feature is enabled.  Regardless of whether memcg accounts for hugetlb, the
newly added global counter is updated and shown in /proc/vmstat.

The global counter is added because vmstats is the preferred framework for
cgroup stats.  It makes stat items consistent between global and cgroups. 
It also provides a per-node breakdown, which is useful.  Because it does
not use cgroup-specific hooks, we also keep generic MM code separate from
memcg code.
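
For readers who want to consume the new counter from userspace, here is a minimal sketch of parsing memory.stat text. The key name `hugetlb` and the sample values are assumptions for illustration (the commit message does not spell out the key), and `parse_memory_stat()` is a hypothetical helper rather than an existing API.

```python
def parse_memory_stat(text):
    """Parse flat 'key value' lines, as found in memory.stat, into a dict."""
    stats = {}
    for line in text.splitlines():
        key, _, value = line.partition(" ")
        if value:
            stats[key] = int(value)
    return stats


def hugetlb_bytes(stats):
    """Return the hugeTLB counter, or None when it is not reported.

    Assumption: the counter appears under the key 'hugetlb', and only when
    hugeTLB accounting is enabled for memory.current.
    """
    return stats.get("hugetlb")


# Made-up sample contents, not real measurements.
sample = "anon 4096\nfile 8192\nhugetlb 2097152\n"
stats = parse_memory_stat(sample)
```

On a real system the text would come from a cgroup's memory.stat file; a result of None would indicate that the counter is not being reported for that cgroup.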

[1] https://lore.kernel.org/all/20231006184629.155543-1-nphamcs@gmail.com/
[2] Of course, we can't make a new patch for every feature that can be
    duplicated. However, since the existing solution of enabling the
    hugeTLB controller is an imperfect solution that still leaves a
    discrepancy between memory.stat and memory.current, I think that it
    is reasonable to isolate the feature in this case.

Link: https://lkml.kernel.org/r/20241101204402.1885383-1-joshua.hahnjy@gmail.com
Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>
Suggested-by: Nhat Pham <nphamcs@gmail.com>
Suggested-by: Shakeel Butt <shakeel.butt@linux.dev>
Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Chris Down <chris@chrisdown.name>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
Reviewed-by: Nhat Pham <nphamcs@gmail.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Zefan Li <lizefan.x@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-14 22:49:19 -08:00
Jakub Kicinski
a79993b5fc Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR (net-6.12-rc8).

Conflicts:

tools/testing/selftests/net/.gitignore
  252e01e682 ("selftests: net: add netlink-dumps to .gitignore")
  be43a6b238 ("selftests: ncdevmem: Move ncdevmem under drivers/net/hw")
https://lore.kernel.org/all/20241113122359.1b95180a@canb.auug.org.au/

drivers/net/phy/phylink.c
  671154f174 ("net: phylink: ensure PHY momentary link-fails are handled")
  7530ea26c8 ("net: phylink: remove "using_mac_select_pcs"")

Adjacent changes:

drivers/net/ethernet/stmicro/stmmac/dwmac-intel-plat.c
  5b366eae71 ("stmmac: dwmac-intel-plat: fix call balance of tx_clk handling routines")
  e96321fad3 ("net: ethernet: Switch back to struct platform_driver::remove()")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-14 11:29:15 -08:00
Catalin Marinas
5a4332062e Merge branches 'for-next/gcs', 'for-next/probes', 'for-next/asm-offsets', 'for-next/tlb', 'for-next/misc', 'for-next/mte', 'for-next/sysreg', 'for-next/stacktrace', 'for-next/hwcap3', 'for-next/kselftest', 'for-next/crc32', 'for-next/guest-cca', 'for-next/haft' and 'for-next/scs', remote-tracking branch 'arm64/for-next/perf' into for-next/core
* arm64/for-next/perf:
  perf: Switch back to struct platform_driver::remove()
  perf: arm_pmuv3: Add support for Samsung Mongoose PMU
  dt-bindings: arm: pmu: Add Samsung Mongoose core compatible
  perf/dwc_pcie: Fix typos in event names
  perf/dwc_pcie: Add support for Ampere SoCs
  ARM: pmuv3: Add missing write_pmuacr()
  perf/marvell: Marvell PEM performance monitor support
  perf/arm_pmuv3: Add PMUv3.9 per counter EL0 access control
  perf/dwc_pcie: Convert the events with mixed case to lowercase
  perf/cxlpmu: Support missing events in 3.1 spec
  perf: imx_perf: add support for i.MX91 platform
  dt-bindings: perf: fsl-imx-ddr: Add i.MX91 compatible
  drivers perf: remove unused field pmu_node

* for-next/gcs: (42 commits)
  : arm64 Guarded Control Stack user-space support
  kselftest/arm64: Fix missing printf() argument in gcs/gcs-stress.c
  arm64/gcs: Fix outdated ptrace documentation
  kselftest/arm64: Ensure stable names for GCS stress test results
  kselftest/arm64: Validate that GCS push and write permissions work
  kselftest/arm64: Enable GCS for the FP stress tests
  kselftest/arm64: Add a GCS stress test
  kselftest/arm64: Add GCS signal tests
  kselftest/arm64: Add test coverage for GCS mode locking
  kselftest/arm64: Add a GCS test program built with the system libc
  kselftest/arm64: Add very basic GCS test program
  kselftest/arm64: Always run signals tests with GCS enabled
  kselftest/arm64: Allow signals tests to specify an expected si_code
  kselftest/arm64: Add framework support for GCS to signal handling tests
  kselftest/arm64: Add GCS as a detected feature in the signal tests
  kselftest/arm64: Verify the GCS hwcap
  arm64: Add Kconfig for Guarded Control Stack (GCS)
  arm64/ptrace: Expose GCS via ptrace and core files
  arm64/signal: Expose GCS state in signal frames
  arm64/signal: Set up and restore the GCS context for signal handlers
  arm64/mm: Implement map_shadow_stack()
  ...

* for-next/probes:
  : Various arm64 uprobes/kprobes cleanups
  arm64: insn: Simulate nop instruction for better uprobe performance
  arm64: probes: Remove probe_opcode_t
  arm64: probes: Cleanup kprobes endianness conversions
  arm64: probes: Move kprobes-specific fields
  arm64: probes: Fix uprobes for big-endian kernels
  arm64: probes: Fix simulate_ldr*_literal()
  arm64: probes: Remove broken LDR (literal) uprobe support

* for-next/asm-offsets:
  : arm64 asm-offsets.c cleanup (remove unused offsets)
  arm64: asm-offsets: remove PREEMPT_DISABLE_OFFSET
  arm64: asm-offsets: remove DMA_{TO,FROM}_DEVICE
  arm64: asm-offsets: remove VM_EXEC and PAGE_SZ
  arm64: asm-offsets: remove MM_CONTEXT_ID
  arm64: asm-offsets: remove COMPAT_{RT_,SIGFRAME_REGS_OFFSET
  arm64: asm-offsets: remove VMA_VM_*
  arm64: asm-offsets: remove TSK_ACTIVE_MM

* for-next/tlb:
  : TLB flushing optimisations
  arm64: optimize flush tlb kernel range
  arm64: tlbflush: add __flush_tlb_range_limit_excess()

* for-next/misc:
  : Miscellaneous patches
  arm64: tls: Fix context-switching of tpidrro_el0 when kpti is enabled
  arm64/ptrace: Clarify documentation of VL configuration via ptrace
  acpi/arm64: remove unnecessary cast
  arm64/mm: Change protval as 'pteval_t' in map_range()
  arm64: uprobes: Optimize cache flushes for xol slot
  acpi/arm64: Adjust error handling procedure in gtdt_parse_timer_block()
  arm64: fix .data.rel.ro size assertion when CONFIG_LTO_CLANG
  arm64/ptdump: Test both PTE_TABLE_BIT and PTE_VALID for block mappings
  arm64/mm: Sanity check PTE address before runtime P4D/PUD folding
  arm64/mm: Drop setting PTE_TYPE_PAGE in pte_mkcont()
  ACPI: GTDT: Tighten the check for the array of platform timer structures
  arm64/fpsimd: Fix a typo
  arm64: Expose ID_AA64ISAR1_EL1.XS to sanitised feature consumers
  arm64: Return early when break handler is found on linked-list
  arm64/mm: Re-organize arch_make_huge_pte()
  arm64/mm: Drop _PROT_SECT_DEFAULT
  arm64: Add command-line override for ID_AA64MMFR0_EL1.ECV
  arm64: head: Drop SWAPPER_TABLE_SHIFT
  arm64: cpufeature: add POE to cpucap_is_possible()
  arm64/mm: Change pgattr_change_is_safe() arguments as pteval_t

* for-next/mte:
  : Various MTE improvements
  selftests: arm64: add hugetlb mte tests
  hugetlb: arm64: add mte support

* for-next/sysreg:
  : arm64 sysreg updates
  arm64/sysreg: Update ID_AA64MMFR1_EL1 to DDI0601 2024-09

* for-next/stacktrace:
  : arm64 stacktrace improvements
  arm64: preserve pt_regs::stackframe during exec*()
  arm64: stacktrace: unwind exception boundaries
  arm64: stacktrace: split unwind_consume_stack()
  arm64: stacktrace: report recovered PCs
  arm64: stacktrace: report source of unwind data
  arm64: stacktrace: move dump_backtrace() to kunwind_stack_walk()
  arm64: use a common struct frame_record
  arm64: pt_regs: swap 'unused' and 'pmr' fields
  arm64: pt_regs: rename "pmr_save" -> "pmr"
  arm64: pt_regs: remove stale big-endian layout
  arm64: pt_regs: assert pt_regs is a multiple of 16 bytes

* for-next/hwcap3:
  : Add AT_HWCAP3 support for arm64 (also wire up AT_HWCAP4)
  arm64: Support AT_HWCAP3
  binfmt_elf: Wire up AT_HWCAP3 at AT_HWCAP4

* for-next/kselftest: (30 commits)
  : arm64 kselftest fixes/cleanups
  kselftest/arm64: Try harder to generate different keys during PAC tests
  kselftest/arm64: Don't leak pipe fds in pac.exec_sign_all()
  kselftest/arm64: Corrupt P0 in the irritator when testing SSVE
  kselftest/arm64: Add FPMR coverage to fp-ptrace
  kselftest/arm64: Expand the set of ZA writes fp-ptrace does
  kselftets/arm64: Use flag bits for features in fp-ptrace assembler code
  kselftest/arm64: Enable build of PAC tests with LLVM=1
  kselftest/arm64: Check that SVCR is 0 in signal handlers
  kselftest/arm64: Fix printf() compiler warnings in the arm64 syscall-abi.c tests
  kselftest/arm64: Fix printf() warning in the arm64 MTE prctl() test
  kselftest/arm64: Fix printf() compiler warnings in the arm64 fp tests
  kselftest/arm64: Fix build with stricter assemblers
  kselftest/arm64: Test signal handler state modification in fp-stress
  kselftest/arm64: Provide a SIGUSR1 handler in the kernel mode FP stress test
  kselftest/arm64: Implement irritators for ZA and ZT
  kselftest/arm64: Remove unused ADRs from irritator handlers
  kselftest/arm64: Correct misleading comments on fp-stress irritators
  kselftest/arm64: Poll less often while waiting for fp-stress children
  kselftest/arm64: Increase frequency of signal delivery in fp-stress
  kselftest/arm64: Fix encoding for SVE B16B16 test
  ...

* for-next/crc32:
  : Optimise CRC32 using PMULL instructions
  arm64/crc32: Implement 4-way interleave using PMULL
  arm64/crc32: Reorganize bit/byte ordering macros
  arm64/lib: Handle CRC-32 alternative in C code

* for-next/guest-cca:
  : Support for running Linux as a guest in Arm CCA
  arm64: Document Arm Confidential Compute
  virt: arm-cca-guest: TSM_REPORT support for realms
  arm64: Enable memory encrypt for Realms
  arm64: mm: Avoid TLBI when marking pages as valid
  arm64: Enforce bounce buffers for realm DMA
  efi: arm64: Map Device with Prot Shared
  arm64: rsi: Map unprotected MMIO as decrypted
  arm64: rsi: Add support for checking whether an MMIO is protected
  arm64: realm: Query IPA size from the RMM
  arm64: Detect if in a realm and set RIPAS RAM
  arm64: rsi: Add RSI definitions

* for-next/haft:
  : Support for arm64 FEAT_HAFT
  arm64: pgtable: Warn unexpected pmdp_test_and_clear_young()
  arm64: Enable ARCH_HAS_NONLEAF_PMD_YOUNG
  arm64: Add support for FEAT_HAFT
  arm64: setup: name 'tcr2' register
  arm64/sysreg: Update ID_AA64MMFR1_EL1 register

* for-next/scs:
  : Dynamic shadow call stack fixes
  arm64/scs: Drop unused prototype __pi_scs_patch_vmlinux()
  arm64/scs: Deal with 64-bit relative offsets in FDE frames
  arm64/scs: Fix handling of DWARF augmentation data in CIE/FDE frames
2024-11-14 12:07:16 +00:00
Linus Torvalds
4ba05b0e85 Hi,
Two bug fixes for TPM bus encryption (the remaining reported issues in
 the feature).
 
 BR, Jarkko

Merge tag 'tpmdd-next-6.12-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd

Pull tpm fixes from Jarkko Sakkinen:
 "Two bug fixes for TPM bus encryption (the remaining reported issues in
  the feature)"

* tag 'tpmdd-next-6.12-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
  tpm: Disable TPM on tpm2_create_primary() failure
  tpm: Opt-in in disable PCR integrity protection
2024-11-13 13:28:58 -08:00
Jarkko Sakkinen
27184f8905 tpm: Opt-in in disable PCR integrity protection
The initial HMAC session feature added TPM bus encryption and/or integrity
protection to various in-kernel TPM operations. This can cause performance
bottlenecks with IMA, as it heavily utilizes PCR extend operations.

In order to mitigate this performance issue, introduce a kernel
command-line parameter to the TPM driver for disabling the integrity
protection for PCR extend operations (i.e. TPM2_PCR_Extend).

Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Link: https://lore.kernel.org/linux-integrity/20241015193916.59964-1-zohar@linux.ibm.com/
Fixes: 6519fea6fd ("tpm: add hmac checks to tpm2_pcr_extend()")
Tested-by: Mimi Zohar <zohar@linux.ibm.com>
Co-developed-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Co-developed-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2024-11-13 21:10:45 +02:00
Paul E. McKenney
0a116dc86d doc: Remove kernel-parameters.txt entry for rcutorture.read_exit
There is only ever the one read-exit task, and there is no module
parameter named rcutorture.read_exit, so remove the bogus documentation.
Instead, use rcutorture.read_exit_burst to enable/disable read-exit
race testing.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: <bpf@vger.kernel.org>
Reviewed-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2024-11-12 21:45:06 +01:00
Paul E. McKenney
4fa7f729ce doc: Add rcuog kthreads to kernel-per-CPU-kthreads.rst
This commit adds the rcuog kthreads to the list of callback-offloading
kthreads that can be affinitied away from worker CPUs.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2024-11-12 21:44:19 +01:00
Thorsten Leemhuis
f5aff6fa64 docs: bug-bisect: add a note about bisecting -next
Explicitly mention how to bisect -next, as nothing in the kernel tree
currently explains that bisects between -next versions won't work well
and it's better to bisect between mainline and -next.

Co-developed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Thorsten Leemhuis <linux@leemhuis.info>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/ec19d5fc503ff7db3d4c4ff9e97fff24cc78f72a.1730808651.git.linux@leemhuis.info
2024-11-12 13:06:07 -07:00
Paul E. McKenney
43349fc4d8 rcutorture: Add srcu_read_lock_lite() support to rcutorture.reader_flavor
This commit causes bit 0x4 of rcutorture.reader_flavor to select the new
srcu_read_lock_lite() and srcu_read_unlock_lite() functions.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: <bpf@vger.kernel.org>
Reviewed-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2024-11-12 15:44:37 +01:00
Paul E. McKenney
95a5de2154 rcutorture: Add reader_flavor parameter for SRCU readers
This commit adds an rcutorture.reader_flavor parameter whose bits
correspond to reader flavors.  For example, SRCU's readers are 0x1 for
normal and 0x2 for NMI-safe.
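
As a rough illustration of how such a bitmask can be read, the sketch below decodes a reader_flavor value into flavor names. The 0x1 and 0x2 assignments come from this commit message, and 0x4 from the srcu_read_lock_lite() commit listed elsewhere in this log; the helper `decode_reader_flavor()` and the short names are hypothetical and not part of rcutorture.

```python
# Bit assignments as described in the rcutorture commit messages in this log.
READER_FLAVORS = {
    0x1: "normal",
    0x2: "NMI-safe",
    0x4: "lite",
}


def decode_reader_flavor(mask):
    """Return (flavor names selected by mask, leftover unrecognised bits)."""
    names = [name for bit, name in sorted(READER_FLAVORS.items()) if mask & bit]
    known = 0
    for bit in READER_FLAVORS:
        known |= bit
    return names, mask & ~known


selected, unknown = decode_reader_flavor(0x5)
```

Returning the unrecognised bits separately makes it easy to spot a value that selects a flavor this table does not know about.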

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: <bpf@vger.kernel.org>
Reviewed-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2024-11-12 15:44:30 +01:00
Breno Leitao
12079a59ce net: Implement fault injection forcing skb reallocation
Introduce a fault injection mechanism to force skb reallocation. The
primary goal is to catch bugs related to pointer invalidation after
potential skb reallocation.

The fault injection mechanism aims to identify scenarios where callers
retain pointers to various headers in the skb but fail to reload these
pointers after calling a function that may reallocate the data. This
type of bug can lead to memory corruption or crashes if the old,
now-invalid pointers are used.

By forcing reallocation through fault injection, we can stress-test code
paths and ensure proper pointer management after potential skb
reallocations.

Add a hook for fault injection in the following functions:

 * pskb_trim_rcsum()
 * pskb_may_pull_reason()
 * pskb_trim()

As with the other fault injection mechanisms, protect it behind a debug
Kconfig option called CONFIG_FAIL_SKB_REALLOC.
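
The class of bug targeted here can be shown with a userspace analogy. This is a toy model in Python, not kernel code: `FakeSkb`, `may_pull()` and the `force_realloc` flag are hypothetical stand-ins for an skb, a helper that may reallocate its data, and the injected fault.

```python
class FakeSkb:
    """Toy stand-in for an skb whose data buffer may be replaced."""

    def __init__(self, payload):
        self.data = bytearray(payload)

    def may_pull(self, force_realloc=False):
        """Pretend to pull data; optionally reallocate as a fault injector would."""
        if force_realloc:
            old = self.data
            self.data = bytearray(old)    # copy the contents into a new buffer
            old[:] = b"\xdd" * len(old)   # poison the old buffer to expose stale users
        return True


def buggy_reader(skb):
    header = skb.data                     # reference taken before the call
    skb.may_pull(force_realloc=True)
    return bytes(header[:4])              # stale: reads the poisoned old buffer


def fixed_reader(skb):
    skb.may_pull(force_realloc=True)
    header = skb.data                     # reference reloaded after the call
    return bytes(header[:4])
```

Without the forced reallocation the buggy reader would usually appear to work, which is why forcing the reallocation on every call is useful for testing.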

This patch was *heavily* inspired by Jakub's proposal from:
https://lore.kernel.org/all/20240719174140.47a868e6@kernel.org/

CC: Akinobu Mita <akinobu.mita@gmail.com>
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Akinobu Mita <akinobu.mita@gmail.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Guillaume Nault <gnault@redhat.com>
Link: https://patch.msgid.link/20241107-fault_v6-v6-1-1b82cb6ecacd@debian.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-12 12:05:33 +01:00
Lance Yang
62bf7065cc hung_task: add docs for hung_task_detect_count
This commit introduces documentation for hung_task_detect_count in
kernel.rst.

Link: https://lkml.kernel.org/r/20241027120747.42833-3-ioworker0@gmail.com
Signed-off-by: Mingzhe Yang <mingzhe.yang@ly.com>
Signed-off-by: Lance Yang <ioworker0@gmail.com>
Cc: Bang Li <libang.li@antgroup.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Huang Cun <cunhuang@tencent.com>
Cc: Joel Granados <j.granados@samsung.com>
Cc: Joel Granados <joel.granados@kernel.org>
Cc: John Siddle <jsiddle@redhat.com>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Thomas Weißschuh <linux@weissschuh.net>
Cc: Yongliang Gao <leonylgao@tencent.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-11 17:17:03 -08:00
Maíra Canal
24f9cd195f mm: shmem: override mTHP shmem default with a kernel parameter
Add the ``thp_shmem=`` kernel command-line parameter to allow specifying the default
policy of each supported shmem hugepage size.  The kernel parameter
accepts the following format:

thp_shmem=<size>[KMG],<size>[KMG]:<policy>;<size>[KMG]-<size>[KMG]:<policy>

For example,

thp_shmem=16K-64K:always;128K,512K:inherit;256K:advise;1M-2M:never;4M-8M:within_size

Some GPUs may benefit from using huge pages.  Since DRM GEM uses shmem to
allocate anonymous pageable memory, it's essential to control the huge
page allocation policy for the internal shmem mount.  This control can be
achieved through the ``transparent_hugepage_shmem=`` parameter.

Beyond just setting the allocation policy, it's crucial to have granular
control over the size of huge pages that can be allocated.  The GPU may
support only specific huge page sizes, and allocating pages larger/smaller
than those sizes would be ineffective.
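
To make the ``thp_shmem=`` format above concrete, here is a small userspace sketch that parses such a string into a size-to-policy mapping. It is not the kernel's parser: `parse_thp_shmem()` and its helpers are hypothetical, and expanding a range by stepping through powers of two is an assumption made for illustration.

```python
import re

_UNITS = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def parse_size(text):
    """Convert a size such as '64K' or '2M' to bytes."""
    match = re.fullmatch(r"(\d+)([KMG]?)", text.strip())
    if match is None:
        raise ValueError(f"bad size: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def expand_sizes(spec):
    """Expand '16K-64K' into each power-of-two step; a single size yields itself."""
    if "-" in spec:
        low, high = (parse_size(part) for part in spec.split("-", 1))
    else:
        low = high = parse_size(spec)
    if low <= 0 or low > high:
        raise ValueError(f"bad range: {spec!r}")
    sizes = []
    size = low
    while size <= high:
        sizes.append(size)
        size *= 2
    return sizes


def parse_thp_shmem(value):
    """Map each huge page size in bytes to its policy string."""
    policies = {}
    for group in value.split(";"):
        sizes_part, policy = group.rsplit(":", 1)
        for spec in sizes_part.split(","):
            for size in expand_sizes(spec):
                policies[size] = policy
    return policies


example = "16K-64K:always;128K,512K:inherit;256K:advise;1M-2M:never;4M-8M:within_size"
policies = parse_thp_shmem(example)
```

With the example string from the commit message, this sketch assigns ``always`` to 16K, 32K and 64K, ``inherit`` to 128K and 512K, ``advise`` to 256K, ``never`` to 1M and 2M, and ``within_size`` to 4M and 8M.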

Link: https://lkml.kernel.org/r/20241101165719.1074234-6-mcanal@igalia.com
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Lance Yang <ioworker0@gmail.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-11 13:09:43 -08:00
Maíra Canal
9490428111 mm: shmem: control THP support through the kernel command line
Patch series "mm: add more kernel parameters to control mTHP", v5.

This series introduces four patches related to the kernel parameters
controlling mTHP and a fifth patch replacing `strcpy()` with `strscpy()` in
the file `mm/huge_memory.c`.

The first patch is a straightforward documentation update, correcting the
format of the kernel parameter ``thp_anon=``.

The second, third, and fourth patches focus on controlling THP support for
shmem via the kernel command line.  The second patch introduces a
parameter to control the global default huge page allocation policy for
the internal shmem mount.  The third patch moves a piece of code to a
shared header to ease the implementation of the fourth patch.  Finally,
the fourth patch implements a parameter similar to ``thp_anon=``, but for
shmem.

The goal of these changes is to simplify the configuration of systems that
rely on mTHP support for shmem.  For instance, a platform with a GPU that
benefits from huge pages may want to enable huge pages for shmem.  Having
these kernel parameters streamlines the configuration process and ensures
consistency across setups.


This patch (of 4):

Add a new kernel command-line parameter to control the hugepage allocation
policy for the internal shmem mount, ``transparent_hugepage_shmem``. The
parameter is similar to ``transparent_hugepage`` and has the following
format:

transparent_hugepage_shmem=<policy>

where ``<policy>`` is one of the seven valid policies available for
shmem.

Configuring the default huge page allocation policy for the internal
shmem mount can be beneficial for DRM GPU drivers. Just like CPU
architectures, GPUs can take advantage of huge pages, but this is
possible only if DRM GEM objects are backed by huge pages.

Since GEM uses shmem to allocate anonymous pageable memory, having control
over the default huge page allocation policy allows for the exploration of
huge pages use on GPUs that rely on GEM objects backed by shmem.

Link: https://lkml.kernel.org/r/20241101165719.1074234-2-mcanal@igalia.com
Link: https://lkml.kernel.org/r/20241101165719.1074234-4-mcanal@igalia.com
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Barry Song <baohua@kernel.org>
Cc: dri-devel@lists.freedesktop.org
Cc: Hugh Dickins <hughd@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: kernel-dev@igalia.com
Cc: Lance Yang <ioworker0@gmail.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-11 13:09:43 -08:00
Mauro Carvalho Chehab
5516200c46 Linux 6.12-rc7

Merge tag 'v6.12-rc7' into __tmp-hansg-linux-tags_media_atomisp_6_13_1

Linux 6.12-rc7

* tag 'v6.12-rc7': (1909 commits)
  Linux 6.12-rc7
  filemap: Fix bounds checking in filemap_read()
  i2c: designware: do not hold SCL low when I2C_DYNAMIC_TAR_UPDATE is not set
  mailmap: add entry for Thorsten Blum
  ocfs2: remove entry once instead of null-ptr-dereference in ocfs2_xa_remove()
  signal: restore the override_rlimit logic
  fs/proc: fix compile warning about variable 'vmcore_mmap_ops'
  ucounts: fix counter leak in inc_rlimit_get_ucounts()
  selftests: hugetlb_dio: check for initial conditions to skip in the start
  mm: fix docs for the kernel parameter ``thp_anon=``
  mm/damon/core: avoid overflow in damon_feed_loop_next_input()
  mm/damon/core: handle zero schemes apply interval
  mm/damon/core: handle zero {aggregation,ops_update} intervals
  mm/mlock: set the correct prev on failure
  objpool: fix to make percpu slot allocation more robust
  mm/page_alloc: keep track of free highatomic
  bcachefs: Fix UAF in __promote_alloc() error path
  bcachefs: Change OPT_STR max to be 1 less than the size of choices array
  bcachefs: btree_cache.freeable list fixes
  bcachefs: check the invalid parameter for perf test
  ...
2024-11-11 12:16:33 +01:00
Barry Song
aaf2914aec mm: add per-order mTHP swpin counters
This helps profile the sizes of folios being swapped in. Currently,
only mTHP swap-out is being counted.

The new interface can be found at:

/sys/kernel/mm/transparent_hugepage/hugepages-<size>/stats/swpin

For example:

$ cat /sys/kernel/mm/transparent_hugepage/hugepages-64kB/stats/swpin
12809
$ cat /sys/kernel/mm/transparent_hugepage/hugepages-32kB/stats/swpin
4763
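
The per-order counters above can also be collected programmatically. The following is a minimal Python sketch, not part of the patch: the helper name `read_mthp_stat` is hypothetical, and it only assumes the directory layout quoted in this message (`hugepages-<size>kB/stats/<counter>`).

```python
from pathlib import Path
import re


def read_mthp_stat(base, stat="swpin"):
    """Return {size_in_kB: counter} for each hugepages-<size>kB directory under base."""
    counters = {}
    for entry in Path(base).glob("hugepages-*kB"):
        match = re.fullmatch(r"hugepages-(\d+)kB", entry.name)
        stat_file = entry / "stats" / stat
        if match is None or not stat_file.is_file():
            continue  # skip unrelated entries and sizes lacking this counter
        counters[int(match.group(1))] = int(stat_file.read_text().strip())
    # Sort by folio size so the output is stable and easy to read.
    return dict(sorted(counters.items()))
```

On a kernel with this patch, calling `read_mthp_stat("/sys/kernel/mm/transparent_hugepage")` should return one entry per supported mTHP size.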

[v-songbaohua@oppo.com: add a blank line in doc]
  Link: https://lkml.kernel.org/r/20241030233423.80759-1-21cnbao@gmail.com
Link: https://lkml.kernel.org/r/20241026082423.26298-1-21cnbao@gmail.com
Signed-off-by: Barry Song <v-songbaohua@oppo.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Chris Li <chrisl@kernel.org>
Cc: Yosry Ahmed <yosryahmed@google.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Kairui Song <kasong@tencent.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>
Cc: Usama Arif <usamaarif642@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-11 00:26:43 -08:00
Kanchana P Sridhar
0c560dd860 mm: swap: count successful large folio zswap stores in hugepage zswpout stats
Added a new MTHP_STAT_ZSWPOUT entry to the sysfs transparent_hugepage
stats so that successful large folio zswap stores can be accounted under
the per-order sysfs "zswpout" stats:

/sys/kernel/mm/transparent_hugepage/hugepages-*kB/stats/zswpout

Other non-zswap swap device swap-out events will be counted under
the existing sysfs "swpout" stats:

/sys/kernel/mm/transparent_hugepage/hugepages-*kB/stats/swpout

Also, added documentation for the newly added sysfs per-order hugepage
"zswpout" stats. The documentation clarifies that only non-zswap swapouts
will be accounted in the existing "swpout" stats.
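
Because zswap stores and other swap-out events land in separate counters, a tool that wants the overall number of large-folio swap-outs for one order has to add the two. A minimal Python sketch, assuming only the two sysfs file names given above (the helper name `swapout_breakdown` is hypothetical):

```python
from pathlib import Path


def swapout_breakdown(order_dir):
    """Return (zswpout, swpout, total) for one hugepages-<size>kB directory."""
    stats = Path(order_dir) / "stats"
    zswpout = int((stats / "zswpout").read_text())  # stored in zswap
    swpout = int((stats / "swpout").read_text())    # non-zswap swap devices
    return zswpout, swpout, zswpout + swpout
```

For example, `swapout_breakdown("/sys/kernel/mm/transparent_hugepage/hugepages-64kB")` would report the split for 64kB folios on a kernel that exposes both counters.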

Link: https://lkml.kernel.org/r/20241001053222.6944-8-kanchana.p.sridhar@intel.com
Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>
Reviewed-by: Nhat Pham <nphamcs@gmail.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Usama Arif <usamaarif642@gmail.com>
Cc: Wajdi Feghali <wajdi.k.feghali@intel.com>
Cc: Yosry Ahmed <yosryahmed@google.com>
Cc: "Zou, Nanhai" <nanhai.zou@intel.com>
Cc: Barry Song <21cnbao@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-11 00:26:43 -08:00
Andrew Morton
2ec0859039 Merge branch 'mm-hotfixes-stable' into mm-stable
Pick up e7ac4daeed ("mm: count zeromap read and set for swapout and
swapin") in order to move

mm: define obj_cgroup_get() if CONFIG_MEMCG is not defined
mm: zswap: modify zswap_compress() to accept a page instead of a folio
mm: zswap: rename zswap_pool_get() to zswap_pool_tryget()
mm: zswap: modify zswap_stored_pages to be atomic_long_t
mm: zswap: support large folios in zswap_store()
mm: swap: count successful large folio zswap stores in hugepage zswpout stats
mm: zswap: zswap_store_page() will initialize entry after adding to xarray.
mm: add per-order mTHP swpin counters

from mm-unstable into mm-stable.
2024-11-11 00:04:10 -08:00
Barry Song
e7ac4daeed mm: count zeromap read and set for swapout and swapin
When the proportion of folios from the zeromap is small, missing their
accounting may not significantly impact profiling.  However, it's easy to
construct a scenario where this becomes an issue—for example, allocating
1 GB of memory, writing zeros from userspace, followed by MADV_PAGEOUT,
and then swapping it back in.  In this case, the swap-out and swap-in
counts seem to vanish into a black hole, potentially causing semantic
ambiguity.

On the other hand, Usama reported that zero-filled pages can exceed 10% in
workloads utilizing zswap, while Hailong noted that some Android apps
have more than 6% zero-filled pages.  Before commit 0ca0c24e32 ("mm:
store zero pages to be swapped out in a bitmap"), both zswap and zRAM
implemented similar optimizations, leading to these optimized-out pages
being counted in either zswap or zRAM counters (with pswpin/pswpout also
increasing for zRAM).  With zeromap functioning prior to both zswap and
zRAM, userspace will no longer detect these swap-out and swap-in actions.

We have three ways to address this:

1. Introduce a dedicated counter specifically for the zeromap.

2. Use pswpin/pswpout accounting, treating the zero map as a standard
   backend.  This approach aligns with zRAM's current handling of
   same-page fills at the device level.  However, it would mean losing the
   optimized-out page counters previously available in zRAM and would not
   align with systems using zswap.  Additionally, as noted by Nhat Pham,
   pswpin/pswpout counters apply only to I/O done directly to the backend
   device.

3. Count zeromap pages under zswap, aligning with system behavior when
   zswap is enabled.  However, this would not be consistent with zRAM, nor
   would it align with systems lacking both zswap and zRAM.

Given the complications with options 2 and 3, this patch selects
option 1.

We can find these counters in /proc/vmstat (counters for the whole
system) and in a memcg's memory.stat (counters for the memcg of interest).

For example:

$ grep -E 'swpin_zero|swpout_zero' /proc/vmstat
swpin_zero 1648
swpout_zero 33536

$ grep -E 'swpin_zero|swpout_zero' /sys/fs/cgroup/system.slice/memory.stat
swpin_zero 3905
swpout_zero 3985
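
A user-space agent could pick up the two new counters with a small parser. The sketch below is illustrative only; the function name `parse_zeromap_counters` is hypothetical, and it assumes the "name value" line format shown in the grep output above, which applies to both /proc/vmstat and memory.stat.

```python
def parse_zeromap_counters(text):
    """Extract swpin_zero and swpout_zero from 'name value' lines."""
    wanted = {"swpin_zero", "swpout_zero"}
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        # Keep only well-formed lines for the two counters of interest.
        if len(fields) == 2 and fields[0] in wanted:
            counters[fields[0]] = int(fields[1])
    return counters
```

Passing the contents of /proc/vmstat (for example `open("/proc/vmstat").read()`) yields a dictionary with both counters on kernels that include this patch, and an empty dictionary otherwise.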

This patch does not address any specific zeromap bug, but the missing
swpout and swpin counts for zero-filled pages can be highly confusing and
may mislead user-space agents that rely on changes in these counters as
indicators.  Therefore, we add a Fixes tag to encourage the inclusion of
this counter in any kernel versions with zeromap.

Many thanks to Kanchana for the contribution of changing
count_objcg_event() to count_objcg_events() to support large folios[1],
which has now been incorporated into this patch.

[1] https://lkml.kernel.org/r/20241001053222.6944-5-kanchana.p.sridhar@intel.com

Link: https://lkml.kernel.org/r/20241107011246.59137-1-21cnbao@gmail.com
Fixes: 0ca0c24e32 ("mm: store zero pages to be swapped out in a bitmap")
Co-developed-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>
Signed-off-by: Barry Song <v-songbaohua@oppo.com>
Reviewed-by: Nhat Pham <nphamcs@gmail.com>
Reviewed-by: Chengming Zhou <chengming.zhou@linux.dev>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Usama Arif <usamaarif642@gmail.com>
Cc: Yosry Ahmed <yosryahmed@google.com>
Cc: Hailong Liu <hailong.liu@oppo.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Chris Li <chrisl@kernel.org>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Kairui Song <kasong@tencent.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-11 00:00:37 -08:00
Maíra Canal
652e1a5146 mm: fix docs for the kernel parameter `thp_anon=`
If we add ``thp_anon=32,64K:always`` to the kernel command line, we
will see the following error:

[    0.000000] huge_memory: thp_anon=32,64K:always: error parsing string, ignoring setting

This happens because the correct format isn't ``thp_anon=<size>,<size>[KMG]:<state>``,
as [KMG] must follow each number to specify its unit. So, the correct
format is ``thp_anon=<size>[KMG],<size>[KMG]:<state>``.

Therefore, adjust the documentation to reflect the correct format of the
parameter ``thp_anon=``.
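
The rule that every size needs its own unit suffix can be illustrated with a small checker. This is a simplified Python sketch, not the kernel's parser: it handles only the comma-separated form shown above (no ranges, no multiple settings), the name `check_thp_anon` is hypothetical, and the list of accepted states is an assumption that does not appear in this message.

```python
import re

# One size token: digits followed by a mandatory unit suffix.
SIZE = re.compile(r"\d+[KMG]")
# Assumption: states accepted for thp_anon=; not listed in the text above.
STATES = {"always", "madvise", "never", "inherit"}


def check_thp_anon(value):
    """Return True if value looks like <size>[KMG],<size>[KMG]:<state>."""
    sizes, sep, state = value.rpartition(":")
    if not sep or state not in STATES:
        return False
    # Every size must carry its own unit; "32,64K" fails on the bare "32".
    return all(SIZE.fullmatch(size) for size in sizes.split(","))
```

With this sketch, `check_thp_anon("32,64K:always")` is rejected while `check_thp_anon("32K,64K:always")` is accepted, matching the behaviour described above.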

Link: https://lkml.kernel.org/r/20241101165719.1074234-3-mcanal@igalia.com
Fixes: dd4d30d1cd ("mm: override mTHP "enabled" defaults at kernel cmdline")
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Acked-by: Barry Song <baohua@kernel.org>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Lance Yang <ioworker0@gmail.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-07 14:14:59 -08:00
Shakeel Butt
aa6b4fdf59 memcg-v1: fully deprecate move_charge_at_immigrate
Patch series "memcg-v1: fully deprecate charge moving".

The memcg v1's charge moving feature has been deprecated for almost 2
years and the kernel warns if someone tries to use it.  This warning has
been backported to all stable kernels and there have not been any reports
of the warning or requests to support this feature anymore.  Let's proceed
to fully deprecate this feature.


This patch (of 6):

Proceed with the complete deprecation of memcg v1's charge moving feature.
The deprecation warning has been in the kernel for almost two years and
has been ported to all stable kernels since.  Now is the time to fully
deprecate this feature.

Link: https://lkml.kernel.org/r/20241025012304.2473312-1-shakeel.butt@linux.dev
Link: https://lkml.kernel.org/r/20241025012304.2473312-2-shakeel.butt@linux.dev
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Yosry Ahmed <yosryahmed@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-06 20:11:18 -08:00
Sergey Senozhatsky
58652f2b6d zram: permit only one post-processing operation at a time
Both recompress and writeback will soon unlock slots during processing,
which makes things too complex with respect to possible race conditions.
We still want to clear PP_SLOT in slot_free, because this is how we figure
out that a slot that was selected for post-processing has been released
under us: when we start post-processing, we check whether the slot still
has PP_SLOT set.  At the same time, theoretically, we can have something
like this:

CPU0			    CPU1

recompress
scan slots
set PP_SLOT
unlock slot
			slot_free
			clear PP_SLOT

			allocate PP_SLOT
			writeback
			scan slots
			set PP_SLOT
			unlock slot
select PP-slot
test PP_SLOT

So recompress will not detect that the slot has been re-used and
re-selected for concurrent writeback post-processing.

Make sure that we permit only one post-processing operation at a time.
Now that recompress and writeback post-processing don't race against each
other, we only need to handle slot re-use (slot_free and write), which is
handled individually by each pp operation.

Having recompress and writeback competing for the same slots is not
exactly good anyway (can't imagine anyone doing that).
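
The "one post-processing operation at a time" rule can be pictured with a user-space analogue. The Python sketch below is not the zram implementation, whose mechanism is not described in this message; the class name `PostProcessGate` and the choice to refuse rather than queue the second operation are assumptions made for illustration.

```python
import threading


class PostProcessGate:
    """Admit at most one post-processing operation (recompress or writeback)."""

    def __init__(self):
        self._busy = threading.Lock()

    def run(self, operation):
        # Non-blocking acquire: a second operation is refused, not queued.
        if not self._busy.acquire(blocking=False):
            raise BlockingIOError("another post-processing operation is in progress")
        try:
            return operation()
        finally:
            # Always reopen the gate, even if the operation raised.
            self._busy.release()
```

With such a gate in place, a writeback request that arrives while recompress is scanning slots fails immediately, so the interleaving in the diagram above cannot occur.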

Link: https://lkml.kernel.org/r/20240917021020.883356-3-senozhatsky@chromium.org
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-05 16:56:22 -08:00
Guilherme G. Piccoli
7d6094e62c Documentation: Improve crash_kexec_post_notifiers description
The crash_kexec_post_notifiers description could be improved a bit
by clarifying its upsides (yes, there are some!) and being more descriptive
about the downsides, especially mentioning code that enables the option
unconditionally, like Hyper-V[0], PowerPC (fadump)[1] and, more recently,
AMD SEV-SNP[2].

[0] Commit a11589563e ("x86/Hyper-V: Report crash register data or kmsg before running crash kernel").
[1] Commit 06e629c25d ("powerpc/fadump: Fix inaccurate CPU state info in vmcore generated with panic").
[2] Commit 8ef979584e ("crypto: ccp: Add panic notifier for SEV/SNP firmware shutdown on kdump").

Reviewed-by: Stephen Brennan <stephen.s.brennan@oracle.com>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20241027204159.985163-1-gpiccoli@igalia.com
2024-11-04 12:28:43 -07:00
Randy Dunlap
32643e10df Documentation: admin: reorganize kernel-parameters intro
Reorganize the introduction to the kernel-parameters file to place
related paragraphs together:

- move module info together and near the beginning
- add a Special Handling section for dashes, underscores, double quotes,
  cpu lists, and metric (KMG) suffixes. Expand the KMG suffixes to
  include TPE as well.
- add a Kernel Build Options section

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20241029180320.412629-1-rdunlap@infradead.org
2024-11-04 11:03:24 -07:00
Christoph Lameter
f7c80fad6c SLUB: Add support for per object memory policies
The old SLAB allocator used to support memory policies on a per
allocation basis. In SLUB the memory policies are applied on a
per page frame / folio basis. Doing so avoids having to check memory
policies in critical code paths for kmalloc and friends.

This generally worked well on Intel/AMD/PowerPC because the
interconnect technology is mature and can minimize the latencies
through intelligent caching even if a small object is not
placed optimally.

However, on ARM we have an emergence of new NUMA interconnect
technology based more on embedded devices. Caching of remote content
can currently be ineffective using the standard building blocks / mesh
available on that platform. Such architectures benefit if each slab
object is individually placed according to memory policies
and other restrictions.

This patch adds another kernel parameter

    slab_strict_numa

If that is set then a static branch is activated that will cause
the hotpaths of the allocator to evaluate the current memory
allocation policy. Each object will be properly placed by
paying the price of extra processing and SLUB will no longer
defer to the page allocator to apply memory policies at the
folio level.

This patch improves performance of memcached running
on Ampere Altra 2P system (ARM Neoverse N1 processor)
by 3.6% due to accurate placement of small kernel objects.

Tested-by: Huang Shijie <shijie@os.amperecomputing.com>
Signed-off-by: Christoph Lameter (Ampere) <cl@gentwo.org>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2024-10-29 10:43:53 +01:00
Gowthami Thiagarajan
e1dce56443 perf/marvell: Marvell PEM performance monitor support
The PCI Express Interface PMU includes various performance counters
to monitor the data that is transmitted over the PCIe link. The
counters track various inbound and outbound transactions, which
include separate counters for posted/non-posted/completion TLPs.
Inbound and outbound memory read requests, along with their
latencies, can also be monitored. Address Translation Services (ATS) events
such as ATS Translation, ATS Page Request, and ATS Invalidation, along with
their corresponding latencies, are also supported.

The performance counters are 64 bits wide.

For instance,
perf stat -e ib_tlp_pr <workload>
tracks the inbound posted TLPs for the workload.

Co-developed-by: Linu Cherian <lcherian@marvell.com>
Signed-off-by: Linu Cherian <lcherian@marvell.com>
Signed-off-by: Gowthami Thiagarajan <gthiagarajan@marvell.com>
Link: https://lore.kernel.org/r/20241028055309.17893-1-gthiagarajan@marvell.com
Signed-off-by: Will Deacon <will@kernel.org>
2024-10-28 17:35:35 +00:00
Pankaj Raghav
30dac24e14 fs/writeback: convert wbc_account_cgroup_owner to take a folio
Most of the callers of wbc_account_cgroup_owner() are converting a folio
to page before calling the function. wbc_account_cgroup_owner() is
converting the page back to a folio to call mem_cgroup_css_from_folio().

Convert wbc_account_cgroup_owner() to take a folio instead of a page,
and convert all callers to pass a folio directly except f2fs.

Convert the page to folio for all the callers from f2fs as they were the
only callers calling wbc_account_cgroup_owner() with a page. As f2fs is
already in the process of converting to folios, these call sites might
also soon be calling wbc_account_cgroup_owner() with a folio directly in
the future.

No functional changes. Only compile tested.

Signed-off-by: Pankaj Raghav <p.raghav@samsung.com>
Link: https://lore.kernel.org/r/20240926140121.203821-1-kernel@pankajraghav.com
Acked-by: David Sterba <dsterba@suse.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-10-28 13:26:54 +01:00
Joanne Koong
2b3933b1e0 fuse: enable dynamic configuration of fuse max pages limit (FUSE_MAX_MAX_PAGES)
Introduce the capability to dynamically configure the max pages limit
(FUSE_MAX_MAX_PAGES) through a sysctl. This allows system administrators
to dynamically set the maximum number of pages that can be used for
servicing requests in fuse.

Previously, this was gated by FUSE_MAX_MAX_PAGES, which is statically set
to 256 pages. One result of this is that the buffer size for a write
request is limited to 1 MiB on a 4k-page system.

The default value for this sysctl is the original limit (256 pages).

$ sysctl -a | grep max_pages_limit
fs.fuse.max_pages_limit = 256

$ sysctl -n fs.fuse.max_pages_limit
256

$ echo 1024 | sudo tee /proc/sys/fs/fuse/max_pages_limit
1024

$ sysctl -n fs.fuse.max_pages_limit
1024

$ echo 65536 | sudo tee /proc/sys/fs/fuse/max_pages_limit
tee: /proc/sys/fs/fuse/max_pages_limit: Invalid argument

$ echo 0 | sudo tee /proc/sys/fs/fuse/max_pages_limit
tee: /proc/sys/fs/fuse/max_pages_limit: Invalid argument

$ echo 65535 | sudo tee /proc/sys/fs/fuse/max_pages_limit
65535

$ sysctl -n fs.fuse.max_pages_limit
65535
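
The relationship between the page limit and the write buffer size, and the range seen in the session above, can be worked through in a few lines. This is an illustrative Python sketch with hypothetical names; the upper bound comes from the session (65535 accepted, 65536 rejected), while the lower bound of 1 is an assumption, since the session only shows that 0 is rejected.

```python
PAGE_SIZE = 4096   # 4k-page system, as in the text
MIN_LIMIT = 1      # assumption: the session only shows that 0 is rejected
MAX_LIMIT = 65535  # largest value accepted in the session above


def is_valid_limit(pages):
    """Mirror the accept/reject behaviour seen when writing the sysctl."""
    return MIN_LIMIT <= pages <= MAX_LIMIT


def max_request_bytes(pages, page_size=PAGE_SIZE):
    """Upper bound on one request's buffer for a given page limit."""
    return pages * page_size
```

With the default of 256 pages, `max_request_bytes(256)` gives 1048576 bytes, the 1 MiB figure quoted above; raising the limit to 1024 pages gives 4 MiB.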

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2024-10-25 17:05:49 +02:00
Yafang Shao
e6957c99dc vfs: Add a sysctl for automated deletion of dentry
Commit 681ce86235 ("vfs: Delete the associated dentry when deleting a
file") introduced an unconditional deletion of the associated dentry when a
file is removed. However, this led to performance regressions in specific
benchmarks, such as filebench.sum_operations/s [0], prompting a revert in
commit 4a4be1ad3a ("Revert "vfs: Delete the associated dentry when
deleting a file"").

This patch seeks to reintroduce the concept conditionally, where the
associated dentry is deleted only when the user explicitly opts for it
during file removal. A new sysctl fs.automated_deletion_of_dentry is
added for this purpose. Its default value is set to 0.

There are practical use cases for this proactive dentry reclamation.
Besides the Elasticsearch use case mentioned in commit 681ce86235,
additional examples have surfaced in our production environment. For
instance, in video rendering services that continuously generate temporary
files, upload them to persistent storage servers, and then delete them, a
large number of negative dentries—serving no useful purpose—accumulate.
Users in such cases would benefit from proactively reclaiming these
negative dentries.

Link: https://lore.kernel.org/linux-fsdevel/202405291318.4dfbb352-oliver.sang@intel.com [0]
Link: https://lore.kernel.org/all/20240912-programm-umgibt-a1145fa73bb6@brauner/
Suggested-by: Christian Brauner <brauner@kernel.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Link: https://lore.kernel.org/r/20240929122831.92515-1-laoar.shao@gmail.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-10-22 11:16:57 +02:00
Christian Loehle
29dcbea924 cpufreq: docs: Reflect latency changes in docs
There were two changes related to transition latency recently.
Namely commit e13aa799c2 ("cpufreq: Change default transition delay
to 2ms") and
commit 37c6dccd68 ("cpufreq: Remove LATENCY_MULTIPLIER").

Both changed the defaults / maximums so let the documentation
reflect that.

Signed-off-by: Christian Loehle <christian.loehle@arm.com>
Link: https://patch.msgid.link/46853b6e-bad5-4ace-9b23-ff157f234ae3@arm.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-10-21 13:20:03 +02:00
Luca Boccassi
02e2f9aa33 ipe: allow secondary and platform keyrings to install/update policies
The current policy management makes it impossible to use IPE
in a general purpose distribution. In such cases the users are not
building the kernel, the distribution is, and access to the private
key included in the trusted keyring is, for obvious reasons, not
available.
This means that users have no way to enable IPE, since there will
be no built-in generic policy, and no access to the key to sign
updates validated by the trusted keyring.

Just as we do for dm-verity, kernel modules and more, allow the
secondary and platform keyrings to also validate policies. This
allows users enrolling their own keys in UEFI db or MOK to also
sign policies, and enroll them. This makes it sensible to enable
IPE in general purpose distributions, as it becomes usable by
any user wishing to do so. Keys in these keyrings can already
load kernels and kernel modules, so there is no security
downgrade.

Add a kconfig each, like dm-verity does, but default to enabled if
the dependencies are available.

Signed-off-by: Luca Boccassi <bluca@debian.org>
Reviewed-by: Serge Hallyn <serge@hallyn.com>
[FW: fixed some style issues]
Signed-off-by: Fan Wu <wufan@kernel.org>
2024-10-17 11:46:10 -07:00
Luca Boccassi
5ceecb301e ipe: also reject policy updates with the same version
Currently IPE accepts an update that has the same version as the policy
being updated, but it neither makes the update a no-op nor checks that the
old and new policies are the same. So it is possible to change the
content of a policy without changing its version. This is very
confusing from userspace when managing policies.
Instead, change the update logic to reject updates that have the same
version with ESTALE, as that is much clearer and more intuitive behaviour.
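
The new rule can be modelled in user space as a version comparison. The Python sketch below is not the IPE code: the function name `check_policy_update` is hypothetical, representing a version as a (major, minor, revision) tuple is an assumption, and so is refusing an older version with the same error, since the text above only covers the equal-version case.

```python
import errno


def check_policy_update(current, new):
    """Raise OSError(ESTALE) unless new is strictly newer than current.

    Versions are (major, minor, revision) tuples compared element by element.
    """
    # Equal versions are rejected per the text above. Treating an older
    # version the same way is an assumption of this sketch.
    if new <= current:
        raise OSError(errno.ESTALE, "policy update is not newer than the active policy")
```

A policy manager can then distinguish "nothing to do, versions match" from other failures by checking for `errno.ESTALE`.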

Signed-off-by: Luca Boccassi <bluca@debian.org>
Reviewed-by: Serge Hallyn <serge@hallyn.com>
Signed-off-by: Fan Wu <wufan@kernel.org>
2024-10-17 11:38:15 -07:00
Tomi Valkeinen
40249b1d5b media: admin-guide: Document the Raspberry Pi CFE (rp1-cfe)
Add documentation for rp1-cfe driver.

Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2024-10-16 09:32:40 +02:00
Steven Rostedt
f7e1d19105 Documentation/tracing: Mention that RESET_ATTACK_MITIGATION can clear memory
At the 2024 Linux Plumbers Conference, I was talking with Hans de Goede
about the persistent buffer to display traces from previous boots. He
mentioned that UEFI can clear memory. In my own tests I have not seen
this. He later informed me that it requires the config option:

 CONFIG_RESET_ATTACK_MITIGATION

It appears that setting this will allow the memory to be cleared on boot
up, which will definitely clear out the trace of the previous boot.

Add this information under the trace_instance in kernel-parameters.txt
to let people know that this can cause issues.

Link: https://lore.kernel.org/all/20170825155019.6740-2-ard.biesheuvel@linaro.org/

Reported-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20241007131653.35837081@gandalf.local.home
2024-10-14 09:50:30 -06:00
Pierre-Louis Bossart
cbcb7edd09 soundwire: intel_auxdevice: add kernel parameter for mclk divider
Add a kernel parameter to work around discrepancies between hardware
and platform firmware. It is not unusual to see e.g. 38.4 MHz listed in
_DSD properties as the SoundWire clock source, while the hardware may be
based on a 19.2 MHz mclk source.

Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com>
Link: https://lore.kernel.org/r/20241004021850.9758-1-yung-chuan.liao@linux.intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
2024-10-10 17:43:55 +05:30
Hans Verkuil
95397784be media: staging: drop omap4iss
The omap4 camera driver has seen no progress since forever, and
now OMAP4 support has also been dropped from u-boot (1). So it is
time to retire this driver.

(1): https://lists.denx.de/pipermail/u-boot/2024-July/558846.html

Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
2024-10-08 13:43:47 +02:00
Mark Brown
a94452112c arm64/idreg: Add override for GCS
Hook up an override for GCS, allowing it to be disabled from the command
line by specifying arm64.nogcs in case there are problems.

Reviewed-by: Thiago Jung Bauermann <thiago.bauermann@linaro.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20241001-arm64-gcs-v13-17-222b78d87eee@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-10-04 12:04:36 +01:00
Wei Huang
f69767a1ad PCI: Add TLP Processing Hints (TPH) support
Add support for PCIe TLP Processing Hints (TPH) (see PCIe r6.2,
sec 6.17).

Add TPH register definitions in pci_regs.h, including the TPH Requester
capability register, TPH Requester control register, TPH Completer
capability, and the ST fields of MSI-X entry.

Introduce pcie_enable_tph() and pcie_disable_tph(), enabling drivers to
toggle TPH support and configure specific ST mode as needed. Also add a new
kernel parameter, "pci=notph", allowing users to disable TPH support across
the entire system.

Link: https://lore.kernel.org/r/20241002165954.128085-2-wei.huang2@amd.com
Co-developed-by: Jing Liu <jing2.liu@intel.com>
Co-developed-by: Paul Luse <paul.e.luse@linux.intel.com>
Co-developed-by: Eric Van Tassell <Eric.VanTassell@amd.com>
Signed-off-by: Jing Liu <jing2.liu@intel.com>
Signed-off-by: Paul Luse <paul.e.luse@linux.intel.com>
Signed-off-by: Eric Van Tassell <Eric.VanTassell@amd.com>
Signed-off-by: Wei Huang <wei.huang2@amd.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
2024-10-02 16:20:01 -05:00
Linus Torvalds
3efc57369a x86:
* KVM currently invalidates the entirety of the page tables, not just
   those for the memslot being touched, when a memslot is moved or deleted.
   The former does not have particularly noticeable overhead, but Intel's
   TDX will require the guest to re-accept private pages if they are
   dropped from the secure EPT, which is a non-starter.  Actually,
   the only reason why this is not already being done is a bug which
   was never fully investigated and caused VM instability with assigned
   GeForce GPUs, so allow userspace to opt into the new behavior.
 
 * Advertise AVX10.1 to userspace (effectively prep work for the "real" AVX10
   functionality that is on the horizon).
 
 * Rework common MSR handling code to suppress errors on userspace accesses to
   unsupported-but-advertised MSRs.  This will allow removing (almost?) all of
   KVM's exemptions for userspace access to MSRs that shouldn't exist based on
   the vCPU model (the actual cleanup is non-trivial future work).
 
 * Rework KVM's handling of x2APIC ICR, again, because AMD (x2AVIC) splits the
   64-bit value into the legacy ICR and ICR2 storage, whereas Intel (APICv)
   stores the entire 64-bit value at the ICR offset.
 
 * Fix a bug where KVM would fail to exit to userspace if one was triggered by
   a fastpath exit handler.
 
 * Add fastpath handling of HLT VM-Exit to expedite re-entering the guest when
   there's already a pending wake event at the time of the exit.
 
 * Fix a WARN caused by RSM entering a nested guest from SMM with invalid guest
   state, by forcing the vCPU out of guest mode prior to signalling SHUTDOWN
   (the SHUTDOWN hits the VM altogether, not the nested guest)
 
 * Overhaul the "unprotect and retry" logic to more precisely identify cases
   where retrying is actually helpful, and to harden all retry paths against
   putting the guest into an infinite retry loop.
 
 * Add support for yielding, e.g. to honor NEED_RESCHED, when zapping rmaps in
   the shadow MMU.
 
 * Refactor pieces of the shadow MMU related to aging SPTEs in preparation for
   adding multi generation LRU support in KVM.
 
 * Don't stuff the RSB after VM-Exit when RETPOLINE=y and AutoIBRS is enabled,
   i.e. when the CPU has already flushed the RSB.
 
 * Trace the per-CPU host save area as a VMCB pointer to improve readability
   and clean up the retrieval of the SEV-ES host save area.
 
 * Remove unnecessary accounting of temporary nested VMCB related allocations.
 
 * Set FINAL/PAGE in the page fault error code for EPT violations if and only
   if the GVA is valid.  If the GVA is NOT valid, there is no guest-side page
   table walk and so stuffing paging related metadata is nonsensical.
 
 * Fix a bug where KVM would incorrectly synthesize a nested VM-Exit instead of
   emulating posted interrupt delivery to L2.
 
 * Add a lockdep assertion to detect unsafe accesses of vmcs12 structures.
 
 * Harden eVMCS loading against an impossible NULL pointer deref (really truly
   should be impossible).
 
 * Minor SGX fix and a cleanup.
 
 * Misc cleanups
 
 Generic:
 
 * Register KVM's cpuhp and syscore callbacks when enabling virtualization in
   hardware, as the sole purpose of said callbacks is to disable and re-enable
   virtualization as needed.
 
 * Enable virtualization when KVM is loaded, not right before the first VM
   is created.  Together with the previous change, this greatly simplifies
   the logic of the callbacks, because their very existence implies
   virtualization is enabled.
 
 * Fix a bug that results in KVM prematurely exiting to userspace for coalesced
   MMIO/PIO in many cases, clean up the related code, and add a testcase.
 
 * Fix a bug in kvm_clear_guest() where it would trigger a buffer overflow _if_
   the gpa+len crosses a page boundary, which thankfully is guaranteed to not
   happen in the current code base.  Add WARNs in more helpers that read/write
   guest memory to detect similar bugs.
 
 Selftests:
 
 * Fix a goof that caused some Hyper-V tests to be skipped when run on bare
   metal, i.e. NOT in a VM.
 
 * Add a regression test for KVM's handling of SHUTDOWN for an SEV-ES guest.
 
 * Explicitly include one-off assets in .gitignore.  Past Sean was completely
   wrong about not being able to detect missing .gitignore entries.
 
 * Verify userspace single-stepping works when KVM happens to handle a VM-Exit
   in its fastpath.
 
 * Misc cleanups
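
As a rough illustration of the x2APIC ICR item above, the sketch below contrasts the two storage layouts: a single 64-bit value versus two 32-bit halves. The function names are hypothetical, and mapping the low half to the legacy ICR and the high half to ICR2 is an assumption made for illustration; this is not KVM's code.

```python
MASK32 = 0xFFFFFFFF


def split_icr(icr64):
    """Split a 64-bit ICR value into (low, high) 32-bit halves.

    Assumption: low models the legacy ICR storage, high models ICR2.
    """
    return icr64 & MASK32, (icr64 >> 32) & MASK32


def join_icr(low, high):
    """Rebuild the single 64-bit value from the two 32-bit halves."""
    return ((high & MASK32) << 32) | (low & MASK32)
```

Code that reads or writes the register has to pick one representation per vendor and convert at the boundary, which is the kind of bookkeeping the rework addresses.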

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull x86 kvm updates from Paolo Bonzini:
 "x86:

   - KVM currently invalidates the entirety of the page tables, not just
     those for the memslot being touched, when a memslot is moved or
     deleted.

     This does not traditionally have particularly noticeable overhead,
     but Intel's TDX will require the guest to re-accept private pages
     if they are dropped from the secure EPT, which is a non starter.

     Actually, the only reason why this is not already being done is a
     bug which was never fully investigated and caused VM instability
     with assigned GeForce GPUs, so allow userspace to opt into the new
     behavior.

   - Advertise AVX10.1 to userspace (effectively prep work for the
     "real" AVX10 functionality that is on the horizon)

   - Rework common MSR handling code to suppress errors on userspace
     accesses to unsupported-but-advertised MSRs

     This will allow removing (almost?) all of KVM's exemptions for
     userspace access to MSRs that shouldn't exist based on the vCPU
     model (the actual cleanup is non-trivial future work)

   - Rework KVM's handling of x2APIC ICR, again, because AMD (x2AVIC)
     splits the 64-bit value into the legacy ICR and ICR2 storage,
     whereas Intel (APICv) stores the entire 64-bit value at the ICR
     offset

   - Fix a bug where KVM would fail to exit to userspace if one was
     triggered by a fastpath exit handler

   - Add fastpath handling of HLT VM-Exit to expedite re-entering the
     guest when there's already a pending wake event at the time of the
     exit

   - Fix a WARN caused by RSM entering a nested guest from SMM with
     invalid guest state, by forcing the vCPU out of guest mode prior to
     signalling SHUTDOWN (the SHUTDOWN hits the VM altogether, not the
     nested guest)

   - Overhaul the "unprotect and retry" logic to more precisely identify
     cases where retrying is actually helpful, and to harden all retry
     paths against putting the guest into an infinite retry loop

   - Add support for yielding, e.g. to honor NEED_RESCHED, when zapping
     rmaps in the shadow MMU

   - Refactor pieces of the shadow MMU related to aging SPTEs in
     preparation for adding multi-generation LRU support in KVM

   - Don't stuff the RSB after VM-Exit when RETPOLINE=y and AutoIBRS is
     enabled, i.e. when the CPU has already flushed the RSB

   - Trace the per-CPU host save area as a VMCB pointer to improve
     readability and clean up the retrieval of the SEV-ES host save area

   - Remove unnecessary accounting of temporary nested VMCB related
     allocations

   - Set FINAL/PAGE in the page fault error code for EPT violations if
     and only if the GVA is valid. If the GVA is NOT valid, there is no
     guest-side page table walk and so stuffing paging related metadata
     is nonsensical

   - Fix a bug where KVM would incorrectly synthesize a nested VM-Exit
     instead of emulating posted interrupt delivery to L2

   - Add a lockdep assertion to detect unsafe accesses of vmcs12
     structures

   - Harden eVMCS loading against an impossible NULL pointer deref
     (really truly should be impossible)

   - Minor SGX fix and a cleanup

   - Misc cleanups

  Generic:

   - Register KVM's cpuhp and syscore callbacks when enabling
     virtualization in hardware, as the sole purpose of said callbacks
     is to disable and re-enable virtualization as needed

   - Enable virtualization when KVM is loaded, not right before the
     first VM is created

     Together with the previous change, this greatly simplifies the
     logic of the callbacks, because their very existence implies
     virtualization is enabled

   - Fix a bug that results in KVM prematurely exiting to userspace for
     coalesced MMIO/PIO in many cases, clean up the related code, and
     add a testcase

   - Fix a bug in kvm_clear_guest() where it would trigger a buffer
     overflow _if_ the gpa+len crosses a page boundary, which thankfully
     is guaranteed to not happen in the current code base. Add WARNs in
     more helpers that read/write guest memory to detect similar bugs

  Selftests:

   - Fix a goof that caused some Hyper-V tests to be skipped when run on
     bare metal, i.e. NOT in a VM

   - Add a regression test for KVM's handling of SHUTDOWN for an SEV-ES
     guest

   - Explicitly include one-off assets in .gitignore. Past Sean was
     completely wrong about not being able to detect missing .gitignore
     entries

   - Verify userspace single-stepping works when KVM happens to handle a
     VM-Exit in its fastpath

   - Misc cleanups"
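
The kvm_clear_guest() item above turns on what happens when gpa+len crosses a page boundary. A minimal sketch of the general technique, splitting a guest-physical range into pieces that each stay within one page, is shown below. PAGE_SIZE and the function name are assumptions for illustration; this is not the kernel's implementation.

```python
PAGE_SIZE = 4096  # assumed page size, for illustration only


def page_chunks(gpa, length):
    """Yield (gfn, offset, seg) pieces that never cross a page boundary."""
    while length > 0:
        offset = gpa % PAGE_SIZE
        # Clamp each piece to what is left in the current page.
        seg = min(PAGE_SIZE - offset, length)
        yield gpa // PAGE_SIZE, offset, seg
        gpa += seg
        length -= seg
```

A helper that writes `length` bytes at `offset` without clamping to `PAGE_SIZE - offset` would run past the end of the page, which is the class of bug the added WARNs are meant to catch.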

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (127 commits)
  Documentation: KVM: fix warning in "make htmldocs"
  s390: Enable KVM_S390_UCONTROL config in debug_defconfig
  selftests: kvm: s390: Add VM run test case
  KVM: SVM: let alternatives handle the cases when RSB filling is required
  KVM: VMX: Set PFERR_GUEST_{FINAL,PAGE}_MASK if and only if the GVA is valid
  KVM: x86/mmu: Use KVM_PAGES_PER_HPAGE() instead of an open coded equivalent
  KVM: x86/mmu: Add KVM_RMAP_MANY to replace open coded '1' and '1ul' literals
  KVM: x86/mmu: Fold mmu_spte_age() into kvm_rmap_age_gfn_range()
  KVM: x86/mmu: Morph kvm_handle_gfn_range() into an aging specific helper
  KVM: x86/mmu: Honor NEED_RESCHED when zapping rmaps and blocking is allowed
  KVM: x86/mmu: Add a helper to walk and zap rmaps for a memslot
  KVM: x86/mmu: Plumb a @can_yield parameter into __walk_slot_rmaps()
  KVM: x86/mmu: Move walk_slot_rmaps() up near for_each_slot_rmap_range()
  KVM: x86/mmu: WARN on MMIO cache hit when emulating write-protected gfn
  KVM: x86/mmu: Detect if unprotect will do anything based on invalid_list
  KVM: x86/mmu: Subsume kvm_mmu_unprotect_page() into the and_retry() version
  KVM: x86: Rename reexecute_instruction()=>kvm_unprotect_and_retry_on_failure()
  KVM: x86: Update retry protection fields when forcing retry on emulation failure
  KVM: x86: Apply retry protection to "unprotect on failure" path
  KVM: x86: Check EMULTYPE_WRITE_PF_TO_SP before unprotecting gfn
  ...
2024-09-28 09:20:14 -07:00
Linus Torvalds
e477dba544 - Misc VDO fixes
- Remove unused declarations dm_get_rq_mapinfo() and dm_zone_map_bio()
 
 - Dm-delay: Improve kernel documentation
 
 - Dm-crypt: Allow specifying the integrity key size as an option
 
 - Dm-bufio: Remove pointless NULL check
 
 - Small code cleanups: Use ERR_CAST; remove unlikely() around IS_ERR; use
   __assign_bit
 
 - Dm-integrity: Fix gcc 5 warning; convert comma to semicolon; fix smatch
   warning
 
 - Dm-integrity: Support recalculation in the 'I' mode
 
 - Revert "dm: requeue IO if mapping table not yet available"
 
 - Dm-crypt: Small refactoring to make the code more readable
 
 - Dm-cache: Remove pointless error check
 
 - Dm: Fix spelling errors
 
 - Dm-verity: Restart or panic on an I/O error if restart or panic was
   requested
 
 - Dm-verity: Fall back to platform keyring also if key in trusted keyring
   is rejected

Merge tag 'for-6.12/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper updates from Mikulas Patocka:

 - Misc VDO fixes

 - Remove unused declarations dm_get_rq_mapinfo() and dm_zone_map_bio()

 - Dm-delay: Improve kernel documentation

 - Dm-crypt: Allow specifying the integrity key size as an option

 - Dm-bufio: Remove pointless NULL check

 - Small code cleanups: Use ERR_CAST; remove unlikely() around IS_ERR;
   use __assign_bit

 - Dm-integrity: Fix gcc 5 warning; convert comma to semicolon; fix
   smatch warning

 - Dm-integrity: Support recalculation in the 'I' mode

 - Revert "dm: requeue IO if mapping table not yet available"

 - Dm-crypt: Small refactoring to make the code more readable

 - Dm-cache: Remove pointless error check

 - Dm: Fix spelling errors

 - Dm-verity: Restart or panic on an I/O error if restart or panic was
   requested

 - Dm-verity: Fall back to platform keyring also if key in trusted
   keyring is rejected

* tag 'for-6.12/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (26 commits)
  dm verity: fallback to platform keyring also if key in trusted keyring is rejected
  dm-verity: restart or panic on an I/O error
  dm: fix spelling errors
  dm-cache: remove pointless error check
  dm vdo: handle unaligned discards correctly
  dm vdo indexer: Convert comma to semicolon
  dm-crypt: Use common error handling code in crypt_set_keyring_key()
  dm-crypt: Use up_read() together with key_put() only once in crypt_set_keyring_key()
  Revert "dm: requeue IO if mapping table not yet available"
  dm-integrity: check mac_size against HASH_MAX_DIGESTSIZE in sb_mac()
  dm-integrity: support recalculation in the 'I' mode
  dm integrity: Convert comma to semicolon
  dm integrity: fix gcc 5 warning
  dm: Make use of __assign_bit() API
  dm integrity: Remove extra unlikely helper
  dm: Convert to use ERR_CAST()
  dm bufio: Remove NULL check of list_entry()
  dm-crypt: Allow to specify the integrity key size as option
  dm: Remove unused declaration and empty definition "dm_zone_map_bio"
  dm delay: enhance kernel documentation
  ...
2024-09-27 09:12:51 -07:00
Linus Torvalds
abf2050f51 media updates for v6.12-rc1

Merge tag 'media/v6.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media

Pull media updates from Mauro Carvalho Chehab:

 - New CEC driver: Extron DA HD 4K Plus

 - Lots of driver fixes, cleanups and improvements

* tag 'media/v6.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (179 commits)
  media: atomisp: Use clamp() in ia_css_eed1_8_vmem_encode()
  media: atomisp: Fix eed1_8 code assigning signed values to an unsigned variable
  media: atomisp: set lock before calling vb2_queue_init()
  media: atomisp: Improve binary finding debug logging
  media: atomisp: Drop dev_dbg() calls from hmm_[alloc|free]()
  media: atomisp: csi2-bridge: Add DMI quirk for t4ka3 on Xiaomi Mipad2
  media: atomisp: add missing wait_prepare/finish ops
  media: atomisp: Remove unused declaration
  media: atomisp: use clamp() in compute_coring()
  media: atomisp: use clamp() in ia_css_eed1_8_encode()
  media: atomisp: Simplify ia_css_pipe_create_cas_scaler_desc_single_output()
  media: atomisp: Replace rarely used macro from math_support.h
  media: atomisp: Remove duplicated leftover, i.e. sh_css_dvs_info.h
  media: atomisp: bnr: fix trailing statement
  media: atomisp: move trailing */ to separate lines
  media: atomisp: move trailing statement to next line.
  media: atomisp: Fix trailing statement in ia_css_de.host.c
  media: atomisp: Fix spelling mistakes in atomisp.h
  media: atomisp: Fix spelling mistakes in atomisp_platform.h
  media: atomisp: Fix spelling mistake in csi_rx_public.h
  ...
2024-09-23 15:27:58 -07:00
Linus Torvalds
af9c191ac2 ring-buffer: Updates for v6.12:
- Merged v6.11-rc3 into trace/ring-buffer/core
 
   The v6.10 ring buffer pull request was not made due to Mathieu Desnoyers
   making a comment to the pull request. Mathieu and I resolved it on IRC,
   but we did not let Linus know that it was resolved. Linus did not do the
   pull thinking it still had some unresolved issues.
 
   The ring buffer work for 6.12 was dependent on both this pull request as
   well as the reserve_mem kernel command line option that was going upstream
   through the memory management tree. The ring buffer repo was being used by
   others so it could not be rebased. In order to continue the work, the
   v6.11-rc3 branch was pulled in to get access to the reserve_mem work.
 
 This has the 6.11 pull request that did not make it into 6.11, which was:
 
   tracing/ring-buffer: Have persistent buffer across reboots
 
   This allows for the tracing instance ring buffer to stay persistent across
   reboots. The way this is done is by adding to the kernel command line:
 
     trace_instance=boot_map@0x285400000:12M
 
   This will reserve 12 megabytes at the address 0x285400000, and then map
   the tracing instance "boot_map" ring buffer to that memory. This will
   appear as a normal instance in the tracefs system:
 
     /sys/kernel/tracing/instances/boot_map
 
   A user could enable tracing in that instance, and on reboot or kernel
   crash, if the memory is not wiped by the firmware, it will recreate the
   trace in that instance. For example, if one was debugging a shutdown of a
   kernel reboot:
 
    # cd /sys/kernel/tracing
    # echo function > instances/boot_map/current_tracer
    # reboot
   [..]
    # cd /sys/kernel/tracing
    # tail instances/boot_map/trace
          swapper/0-1       [000] d..1.   164.549800: restore_boot_irq_mode <-native_machine_shutdown
          swapper/0-1       [000] d..1.   164.549801: native_restore_boot_irq_mode <-native_machine_shutdown
          swapper/0-1       [000] d..1.   164.549802: disconnect_bsp_APIC <-native_machine_shutdown
          swapper/0-1       [000] d..1.   164.549811: hpet_disable <-native_machine_shutdown
          swapper/0-1       [000] d..1.   164.549812: iommu_shutdown_noop <-native_machine_restart
          swapper/0-1       [000] d..1.   164.549813: native_machine_emergency_restart <-__do_sys_reboot
          swapper/0-1       [000] d..1.   164.549813: tboot_shutdown <-native_machine_emergency_restart
          swapper/0-1       [000] d..1.   164.549820: acpi_reboot <-native_machine_emergency_restart
          swapper/0-1       [000] d..1.   164.549821: acpi_reset <-acpi_reboot
          swapper/0-1       [000] d..1.   164.549822: acpi_os_write_port <-acpi_reboot
 
   On reboot, the buffer is examined to make sure it is valid. The validation
   check even steps through every event to make sure the meta data of the
   event is correct. If any test fails, it will simply reset the buffer, and
   the buffer will be empty on boot.
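
To make the command-line syntax above concrete, here is a small sketch that picks apart a `trace_instance=NAME@ADDR:SIZE` argument of the form shown. The function name and the set of size suffixes are assumptions; the kernel's own parser may accept forms that this sketch rejects.

```python
import re

# Assumed size suffixes, for illustration only.
_UNITS = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30}

_PATTERN = re.compile(r"trace_instance=([^@]+)@(0x[0-9a-fA-F]+):(\d+)([KMG]?)")


def parse_trace_instance(arg):
    """Return (name, physical_address, size_in_bytes) for the form shown above."""
    match = _PATTERN.fullmatch(arg)
    if match is None:
        raise ValueError("unsupported trace_instance argument: " + arg)
    name, addr, number, unit = match.groups()
    return name, int(addr, 16), int(number) * _UNITS[unit]
```

For the example in the text, this yields the instance name "boot_map", the address 0x285400000, and a size of 12 * 1024 * 1024 bytes.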
 
 The new changes for 6.12 are:
 
 - Allow the tracing persistent boot buffer to use the "reserve_mem" option
 
   Instead of having the admin find a physical address to store the persistent
   buffer, which can be very tedious if they have to administrate several
   different machines, allow them to use the "reserve_mem" option that will
   find a location for them. It is not as reliable because of KASLR, as the
   loading of the kernel in different locations can cause the memory
   allocated to be inconsistent. Booting with "nokaslr" can make reserve_mem
   more reliable.
 
 - Have function graph tracer handle offsets from a previous boot.
 
   The ring buffer output from a previous boot may have different addresses
   due to kaslr. Have the function graph tracer handle these by using the
   delta from the previous boot to the new boot address space.
 
 - Only reset the saved meta offset when the buffer is started or reset
 
   In the persistent memory meta data, it holds the previous address space
   information, so that it can calculate the delta to have function tracing
   work. But this gets updated after being read to hold the new address
   space. But if the buffer isn't used for that boot, on reboot, the delta is
   now calculated from the previous boot and not the boot that holds the data
   in the ring buffer. This causes the functions not to be shown. Do not save
   the address space information of the current kernel until it is being
   recorded.
 
 - Add a magic variable to test the valid meta data
 
   Add a magic variable in the meta data that can also be used for
   validation. The validator of the previous buffer doesn't need this magic
   data, but it can be used if the meta data is changed by a new kernel, which
   may have the same format that passes the validator but is used
   differently. This magic number can also be used as a "versioning" of the
   meta data.
 
 - Align user space mapped ring buffer sub buffers to improve TLB entries
 
   Linus mentioned that the mapped ring buffer sub buffers were misaligned
   between the meta page and the sub-buffers, so that if the sub-buffers were
   bigger than PAGE_SIZE, it wouldn't allow the TLB to use bigger entries.
 
 - Add new kernel command line "traceoff" to disable tracing on boot for instances
 
   If tracing is enabled for a boot instance, there needs to be a way to
   disable it on boot so that new events do not get entered into the ring
   buffer and mixed with events from a previous boot, as that can be
   confusing.
 
 - Allow trace_printk() to go to other instances
 
   Currently, trace_printk() can only go to the top level instance. When
   debugging with a persistent buffer, it is really useful to be able to add
   trace_printk() to go to that buffer, so that you have access to them after
   a crash.
 
 - Do not use "bin_printk()" for traces to a boot instance
 
   The bin_printk() saves only a pointer to the printk format in the ring
   buffer, as the reader of the buffer can still have access to it. But this
   is not the case if the buffer is from a previous boot. If the
   trace_printk() is going to a "persistent" buffer, it will use the slower
   version that writes the printk format into the buffer.
 
 - Add command line option to allow trace_printk() to go to an instance
 
   Allow the kernel command line to define which instance the trace_printk()
   goes to, instead of forcing the admin to set it for every boot via the
   tracefs options.
 
 - Start a document that explains how to use tracefs to debug the kernel
 
 - Add some more kernel selftests to test user mapped ring buffer
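
The "magic variable" item above can be pictured as a header check that runs before the rest of the meta data is trusted. The sketch below is a loose illustration: the magic value, the two-field header layout, and the function names are all hypothetical and do not reflect the kernel's actual meta data format.

```python
import struct

META_MAGIC = 0x0BADFEED  # hypothetical magic value
HEADER = struct.Struct("<II")  # hypothetical layout: magic, total size


def pack_meta(payload):
    """Prefix a payload with the magic number and the total blob size."""
    return HEADER.pack(META_MAGIC, HEADER.size + len(payload)) + payload


def meta_is_valid(blob):
    """Accept a blob only if both the magic and the recorded size match."""
    if len(blob) < HEADER.size:
        return False
    magic, size = HEADER.unpack_from(blob)
    return magic == META_MAGIC and size == len(blob)
```

Changing the magic value when the layout changes gives a cheap form of versioning: memory written by a kernel with a different layout fails the check and can be reset rather than misread.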

Merge tag 'trace-ring-buffer-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull ring-buffer updates from Steven Rostedt:

 - tracing/ring-buffer: persistent buffer across reboots

   This allows for the tracing instance ring buffer to stay persistent
   across reboots. The way this is done is by adding to the kernel
   command line:

     trace_instance=boot_map@0x285400000:12M

   This will reserve 12 megabytes at the address 0x285400000, and then
   map the tracing instance "boot_map" ring buffer to that memory. This
   will appear as a normal instance in the tracefs system:

     /sys/kernel/tracing/instances/boot_map

   A user could enable tracing in that instance, and on reboot or kernel
   crash, if the memory is not wiped by the firmware, it will recreate
   the trace in that instance. For example, if one was debugging a
   shutdown of a kernel reboot:

     # cd /sys/kernel/tracing
     # echo function > instances/boot_map/current_tracer
     # reboot
     [..]
     # cd /sys/kernel/tracing
     # tail instances/boot_map/trace
           swapper/0-1       [000] d..1.   164.549800: restore_boot_irq_mode <-native_machine_shutdown
           swapper/0-1       [000] d..1.   164.549801: native_restore_boot_irq_mode <-native_machine_shutdown
           swapper/0-1       [000] d..1.   164.549802: disconnect_bsp_APIC <-native_machine_shutdown
           swapper/0-1       [000] d..1.   164.549811: hpet_disable <-native_machine_shutdown
           swapper/0-1       [000] d..1.   164.549812: iommu_shutdown_noop <-native_machine_restart
           swapper/0-1       [000] d..1.   164.549813: native_machine_emergency_restart <-__do_sys_reboot
           swapper/0-1       [000] d..1.   164.549813: tboot_shutdown <-native_machine_emergency_restart
           swapper/0-1       [000] d..1.   164.549820: acpi_reboot <-native_machine_emergency_restart
           swapper/0-1       [000] d..1.   164.549821: acpi_reset <-acpi_reboot
           swapper/0-1       [000] d..1.   164.549822: acpi_os_write_port <-acpi_reboot

   On reboot, the buffer is examined to make sure it is valid. The
   validation check even steps through every event to make sure the meta
   data of the event is correct. If any test fails, it will simply reset
   the buffer, and the buffer will be empty on boot.

 - Allow the tracing persistent boot buffer to use the "reserve_mem"
   option

   Instead of having the admin find a physical address to store the
   persistent buffer, which can be very tedious if they have to
   administrate several different machines, allow them to use the
   "reserve_mem" option that will find a location for them. It is not as
   reliable because of KASLR, as the loading of the kernel in different
   locations can cause the memory allocated to be inconsistent. Booting
   with "nokaslr" can make reserve_mem more reliable.

 - Have function graph tracer handle offsets from a previous boot.

   The ring buffer output from a previous boot may have different
   addresses due to kaslr. Have the function graph tracer handle these
   by using the delta from the previous boot to the new boot address
   space.

 - Only reset the saved meta offset when the buffer is started or reset

   In the persistent memory meta data, it holds the previous address
   space information, so that it can calculate the delta to have
   function tracing work. But this gets updated after being read to hold
   the new address space. But if the buffer isn't used for that boot, on
   reboot, the delta is now calculated from the previous boot and not
   the boot that holds the data in the ring buffer. This causes the
   functions not to be shown. Do not save the address space information
   of the current kernel until it is being recorded.

 - Add a magic variable to test the valid meta data

   Add a magic variable in the meta data that can also be used for
   validation. The validator of the previous buffer doesn't need this
   magic data, but it can be used if the meta data is changed by a new
   kernel, which may have the same format that passes the validator but
   is used differently. This magic number can also be used as a
   "versioning" of the meta data.

 - Align user space mapped ring buffer sub buffers to improve TLB
   entries

   Linus mentioned that the mapped ring buffer sub buffers were
   misaligned between the meta page and the sub-buffers, so that if the
   sub-buffers were bigger than PAGE_SIZE, it wouldn't allow the TLB to
   use bigger entries.

 - Add new kernel command line "traceoff" to disable tracing on boot for
   instances

   If tracing is enabled for a boot instance, there needs to be a way
   to disable it on boot so that new events do not get entered into
   the ring buffer and mixed with events from a previous boot, as
   that can be confusing.

 - Allow trace_printk() to go to other instances

   Currently, trace_printk() can only go to the top level instance. When
   debugging with a persistent buffer, it is really useful to be able to
   add trace_printk() to go to that buffer, so that you have access to
   them after a crash.

 - Do not use "bin_printk()" for traces to a boot instance

   The bin_printk() saves only a pointer to the printk format in the
   ring buffer, as the reader of the buffer can still have access to it.
   But this is not the case if the buffer is from a previous boot. If
   the trace_printk() is going to a "persistent" buffer, it will use the
   slower version that writes the printk format into the buffer.

 - Add command line option to allow trace_printk() to go to an instance

   Allow the kernel command line to define which instance the
   trace_printk() goes to, instead of forcing the admin to set it for
   every boot via the tracefs options.

 - Start a document that explains how to use tracefs to debug the kernel

 - Add some more kernel selftests to test user mapped ring buffer
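
The items above about handling addresses from a previous boot rest on one piece of arithmetic: shift each recorded address by the difference between the two boots' load addresses, then resolve it against the current symbols. A minimal sketch follows; the function names, the base-address parameters, and the symbol table format are assumptions for illustration, not the tracing code itself.

```python
import bisect


def translate_addr(prev_addr, prev_text_base, cur_text_base):
    """Shift an address recorded in the previous boot into the current layout."""
    delta = cur_text_base - prev_text_base
    return prev_addr + delta


def resolve(prev_addr, prev_text_base, cur_text_base, symbols):
    """Name the function containing prev_addr.

    symbols is a list of (start_address, name) pairs for the current boot,
    sorted by start address. Returns None if the address precedes them all.
    """
    addr = translate_addr(prev_addr, prev_text_base, cur_text_base)
    starts = [start for start, _ in symbols]
    index = bisect.bisect_right(starts, addr) - 1
    return symbols[index][1] if index >= 0 else None
```

If the saved base is overwritten by a boot that never recorded anything, the delta is computed against the wrong boot and lookups miss, which matches the failure described in the "Only reset the saved meta offset" item.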

* tag 'trace-ring-buffer-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (28 commits)
  selftests/ring-buffer: Handle meta-page bigger than the system
  selftests/ring-buffer: Verify the entire meta-page padding
  tracing/Documentation: Start a document on how to debug with tracing
  tracing: Add option to set an instance to be the trace_printk destination
  tracing: Have trace_printk not use binary prints if boot buffer
  tracing: Allow trace_printk() to go to other instance buffers
  tracing: Add "traceoff" flag to boot time tracing instances
  ring-buffer: Align meta-page to sub-buffers for improved TLB usage
  ring-buffer: Add magic and struct size to boot up meta data
  ring-buffer: Don't reset persistent ring-buffer meta saved addresses
  tracing/fgraph: Have fgraph handle previous boot function addresses
  tracing: Allow boot instances to use reserve_mem boot memory
  tracing: Fix ifdef of snapshots to not prevent last_boot_info file
  ring-buffer: Use vma_pages() helper function
  tracing: Fix NULL vs IS_ERR() check in enable_instances()
  tracing: Add last boot delta offset for stack traces
  tracing: Update function tracing output for previous boot buffer
  tracing: Handle old buffer mappings for event strings and functions
  tracing/ring-buffer: Add last_boot_info file to boot instance
  ring-buffer: Save text and data locations in mapped meta data
  ...
2024-09-22 09:47:16 -07:00
Linus Torvalds
7856a56541 Many singleton patches - please see the various changelogs for details.
Quite a lot of nilfs2 work this time around.
 
 Notable patch series in this pull request are:
 
 "mul_u64_u64_div_u64: new implementation" by Nicolas Pitre, with
 assistance from Uwe Kleine-König.  Reimplement mul_u64_u64_div_u64() to
 provide (much) more accurate results.  The current implementation was
 causing Uwe some issues in the PWM drivers.
 
 "xz: Updates to license, filters, and compression options" from Lasse
 Collin.  Miscellaneous maintenance and minor feature work to the xz
 decompressor.
 
 "Fix some GDB command error and add some GDB commands" from Kuan-Ying Lee.
 Fixes and enhancements to the gdb scripts.
 
 "treewide: add missing MODULE_DESCRIPTION() macros" from Jeff Johnson.
 Adds lots of MODULE_DESCRIPTIONs, thus fixing lots of warnings about this.
 
 "nilfs2: add support for some common ioctls" from Ryusuke Konishi.  Adds
 various commonly-available ioctls to nilfs2.
 
 "This series fixes a number of formatting issues in kernel doc comments"
 from Ryusuke Konishi does that.
 
 "nilfs2: prevent unexpected ENOENT propagation" from Ryusuke Konishi.  Fix
 issues where -ENOENT was being unintentionally and inappropriately
 returned to userspace.
 
 "nilfs2: assorted cleanups" from Huang Xiaojia.
 
 "nilfs2: fix potential issues with empty b-tree nodes" from Ryusuke
 Konishi fixes some issues which can occur on corrupted nilfs2 filesystems.
 
 "scripts/decode_stacktrace.sh: improve error reporting and usability" from
 Luca Ceresoli does those things.

Merge tag 'mm-nonmm-stable-2024-09-21-07-52' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull non-MM updates from Andrew Morton:
 "Many singleton patches - please see the various changelogs for
  details.

  Quite a lot of nilfs2 work this time around.

  Notable patch series in this pull request are:

   - "mul_u64_u64_div_u64: new implementation" by Nicolas Pitre, with
     assistance from Uwe Kleine-König. Reimplement mul_u64_u64_div_u64()
     to provide (much) more accurate results. The current implementation
     was causing Uwe some issues in the PWM drivers.

   - "xz: Updates to license, filters, and compression options" from
     Lasse Collin. Miscellaneous maintenance and minor feature work to
     the xz decompressor.

   - "Fix some GDB command error and add some GDB commands" from
     Kuan-Ying Lee. Fixes and enhancements to the gdb scripts.

   - "treewide: add missing MODULE_DESCRIPTION() macros" from Jeff
     Johnson. Adds lots of MODULE_DESCRIPTIONs, thus fixing lots of
     warnings about this.

   - "nilfs2: add support for some common ioctls" from Ryusuke Konishi.
     Adds various commonly-available ioctls to nilfs2.

   - "This series fixes a number of formatting issues in kernel doc
     comments" from Ryusuke Konishi does that.

   - "nilfs2: prevent unexpected ENOENT propagation" from Ryusuke
     Konishi. Fix issues where -ENOENT was being unintentionally and
     inappropriately returned to userspace.

   - "nilfs2: assorted cleanups" from Huang Xiaojia.

   - "nilfs2: fix potential issues with empty b-tree nodes" from Ryusuke
     Konishi fixes some issues which can occur on corrupted nilfs2
     filesystems.

   - "scripts/decode_stacktrace.sh: improve error reporting and
     usability" from Luca Ceresoli does those things"

* tag 'mm-nonmm-stable-2024-09-21-07-52' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (103 commits)
  list: test: increase coverage of list_test_list_replace*()
  list: test: fix tests for list_cut_position()
  proc: use __auto_type more
  treewide: correct the typo 'retun'
  ocfs2: cleanup return value and mlog in ocfs2_global_read_info()
  nilfs2: remove duplicate 'unlikely()' usage
  nilfs2: fix potential oob read in nilfs_btree_check_delete()
  nilfs2: determine empty node blocks as corrupted
  nilfs2: fix potential null-ptr-deref in nilfs_btree_insert()
  user_namespace: use kmemdup_array() instead of kmemdup() for multiple allocation
  tools/mm: rm thp_swap_allocator_test when make clean
  squashfs: fix percpu address space issues in decompressor_multi_percpu.c
  lib: glob.c: added null check for character class
  nilfs2: refactor nilfs_segctor_thread()
  nilfs2: use kthread_create and kthread_stop for the log writer thread
  nilfs2: remove sc_timer_task
  nilfs2: do not repair reserved inode bitmap in nilfs_new_inode()
  nilfs2: eliminate the shared counter and spinlock for i_generation
  nilfs2: separate inode type information from i_state field
  nilfs2: use the BITS_PER_LONG macro
  ...
2024-09-21 08:20:50 -07:00
Linus Torvalds
617a814f14 Along with the usual shower of singleton patches, notable patch series in
this pull request are:
 
 "Align kvrealloc() with krealloc()" from Danilo Krummrich.  Adds
 consistency to the APIs and behaviour of these two core allocation
 functions.  This also simplifies/enables Rustification.
 
 "Some cleanups for shmem" from Baolin Wang.  No functional changes - more
 code reuse, better function naming, logic simplifications.
 
 "mm: some small page fault cleanups" from Josef Bacik.  No functional
 changes - code cleanups only.
 
 "Various memory tiering fixes" from Zi Yan.  A small fix and a little
 cleanup.
 
 "mm/swap: remove boilerplate" from Yu Zhao.  Code cleanups and
 simplifications and .text shrinkage.
 
 "Kernel stack usage histogram" from Pasha Tatashin and Shakeel Butt.  This
 is a feature, it adds new fields to /proc/vmstat such as
 
     $ grep kstack /proc/vmstat
     kstack_1k 3
     kstack_2k 188
     kstack_4k 11391
     kstack_8k 243
     kstack_16k 0
 
 which tells us that 11391 processes used 4k of stack while none at all
 used 16k.  Useful for some system tuning things, but particularly useful
 for "the dynamic kernel stack project".
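As a rough illustration of how the sample output above could be consumed, the following Python sketch parses `kstack_*` lines into a histogram. The function name `parse_kstack_histogram` and the `SAMPLE` constant are hypothetical and used only for this example; the sample values are copied from the listing above, and the sketch assumes each field is a single `name value` pair per line.

```python
def parse_kstack_histogram(vmstat_text):
    """Return {field_name: count} for every kstack_* line in vmstat-style text.

    Hypothetical helper for illustration; assumes "name value" lines.
    """
    histogram = {}
    for line in vmstat_text.splitlines():
        parts = line.split()
        # Keep only well-formed "kstack_<size> <count>" lines.
        if len(parts) == 2 and parts[0].startswith("kstack_"):
            histogram[parts[0]] = int(parts[1])
    return histogram


# Sample values copied from the commit message above.
SAMPLE = """\
kstack_1k 3
kstack_2k 188
kstack_4k 11391
kstack_8k 243
kstack_16k 0
"""

if __name__ == "__main__":
    hist = parse_kstack_histogram(SAMPLE)
    total = sum(hist.values())
    for bucket, count in hist.items():
        # Print each bucket with its share of the total.
        share = 100.0 * count / total if total else 0.0
        print(f"{bucket:>10} {count:>6} {share:6.2f}%")
```

On a kernel that exposes these fields, the same function could presumably be fed the contents of /proc/vmstat instead of the sample text.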
 
 "kmemleak: support for percpu memory leak detect" from Pavel Tikhomirov.
 Teaches kmemleak to detect leakage of percpu memory.
 
 "mm: memcg: page counters optimizations" from Roman Gushchin.  "3
 independent small optimizations of page counters".
 
 "mm: split PTE/PMD PT table Kconfig cleanups+clarifications" from David
 Hildenbrand.  Improves PTE/PMD splitlock detection, makes powerpc/8xx work
 correctly by design rather than by accident.
 
 "mm: remove arch_make_page_accessible()" from David Hildenbrand.  Some
 folio conversions which make arch_make_page_accessible() unneeded.
 
 "mm, memcg: cg2 memory{.swap,}.peak write handlers" from David Finkel.
 Cleans up and fixes our handling of the resetting of the cgroup/process
 peak-memory-use detector.
 
 "Make core VMA operations internal and testable" from Lorenzo Stoakes.
 Rationalization and encapsulation of the VMA manipulation APIs, with a
 view to better enabling testing of the VMA functions, even from a
 userspace-only harness.
 
 "mm: zswap: fixes for global shrinker" from Takero Funaki.  Fix issues in
 the zswap global shrinker, resulting in improved performance.
 
 "mm: print the promo watermark in zoneinfo" from Kaiyang Zhao.  Fill in
 some missing info in /proc/zoneinfo.
 
 "mm: replace follow_page() by folio_walk" from David Hildenbrand.  Code
 cleanups and rationalizations (conversion to folio_walk()) resulting in
 the removal of follow_page().
 
 "improving dynamic zswap shrinker protection scheme" from Nhat Pham.  Some
 tuning to improve zswap's dynamic shrinker.  Significant reductions in
 swapin and improvements in performance are shown.
 
 "mm: Fix several issues with unaccepted memory" from Kirill Shutemov.
 Improvements to the new unaccepted memory feature.
 
 "mm/mprotect: Fix dax puds" from Peter Xu.  Implements mprotect on DAX
 PUDs.  This was missing, although nobody seems to have noticed yet.
 
 "Introduce a store type enum for the Maple tree" from Sidhartha Kumar.
 Cleanups and modest performance improvements for the maple tree library
 code.
 
 "memcg: further decouple v1 code from v2" from Shakeel Butt.  Move more
 cgroup v1 remnants away from the v2 memcg code.
 
 "memcg: initiate deprecation of v1 features" from Shakeel Butt.  Adds
 various warnings telling users that memcg v1 features are deprecated.
 
 "mm: swap: mTHP swap allocator base on swap cluster order" from Chris Li.
 Greatly improves the success rate of the mTHP swap allocation.
 
 "mm: introduce numa_memblks" from Mike Rapoport.  Moves various disparate
 per-arch implementations of numa_memblk code into generic code.
 
 "mm: batch free swaps for zap_pte_range()" from Barry Song.  Greatly
 improves the performance of munmap() of swap-filled ptes.
 
 "support large folio swap-out and swap-in for shmem" from Baolin Wang.
 With this series we no longer split shmem large folios into single-page
 folios when swapping out shmem.
 
 "mm/hugetlb: alloc/free gigantic folios" from Yu Zhao.  Nice performance
 improvements and code reductions for gigantic folios.
 
 "support shmem mTHP collapse" from Baolin Wang.  Adds support for
 khugepaged's collapsing of shmem mTHP folios.
 
 "mm: Optimize mseal checks" from Pedro Falcato.  Fixes an mprotect()
 performance regression due to the addition of mseal().
 
 "Increase the number of bits available in page_type" from Matthew Wilcox.
 Increases the number of bits available in page_type!
 
 "Simplify the page flags a little" from Matthew Wilcox.  Many legacy page
 flags are now folio flags, so the page-based flags and their
 accessors/mutators can be removed.
 
 "mm: store zero pages to be swapped out in a bitmap" from Usama Arif.  An
 optimization which permits us to avoid writing/reading zero-filled zswap
 pages to backing store.
 
 "Avoid MAP_FIXED gap exposure" from Liam Howlett.  Fixes a race window
 which occurs when a MAP_FIXED operation is occurring during an unrelated
 vma tree walk.
 
 "mm: remove vma_merge()" from Lorenzo Stoakes.  Major rotorooting of the
 vma_merge() functionality, making it cleaner, more testable and better
 tested.
 
 "misc fixups for DAMON {self,kunit} tests" from SeongJae Park.  Minor
 fixups of DAMON selftests and kunit tests.
 
 "mm: memory_hotplug: improve do_migrate_range()" from Kefeng Wang.  Code
 cleanups and folio conversions.
 
 "Shmem mTHP controls and stats improvements" from Ryan Roberts.  Cleanups
 for shmem controls and stats.
 
 "mm: count the number of anonymous THPs per size" from Barry Song.  Expose
 additional anon THP stats to userspace for improved tuning.
 
 "mm: finish isolate/putback_lru_page()" from Kefeng Wang: more folio
 conversions and removal of now-unused page-based APIs.
 
 "replace per-quota region priorities histogram buffer with per-context
 one" from SeongJae Park.  DAMON histogram rationalization.
 
 "Docs/damon: update GitHub repo URLs and maintainer-profile" from SeongJae
 Park.  DAMON documentation updates.
 
 "mm/vdpa: correct misuse of non-direct-reclaim __GFP_NOFAIL and improve
 related doc and warn" from Jason Wang: fixes usage of page allocator
 __GFP_NOFAIL and GFP_ATOMIC flags.
 
 "mm: split underused THPs" from Yu Zhao.  Improve THP=always policy - this
 was overprovisioning THPs in sparsely accessed memory areas.
 
 "zram: introduce custom comp backends API" from Sergey Senozhatsky.  Add
 support for zram run-time compression algorithm tuning.
 
 "mm: Care about shadow stack guard gap when getting an unmapped area" from
 Mark Brown.  Fix up the various arch_get_unmapped_area() implementations
 to better respect guard areas.
 
 "Improve mem_cgroup_iter()" from Kinsey Ho.  Improve the reliability of
 mem_cgroup_iter() and various code cleanups.
 
 "mm: Support huge pfnmaps" from Peter Xu.  Extends the usage of huge
 pfnmap support.
 
 "resource: Fix region_intersects() vs add_memory_driver_managed()" from
 Huang Ying.  Fix a bug in region_intersects() for systems with CXL memory.
 
 "mm: hwpoison: two more poison recovery" from Kefeng Wang.  Teaches a
 couple more code paths to correctly recover from the encountering of
 poisoned memory.
 
 "mm: enable large folios swap-in support" from Barry Song.  Support the
 swapin of mTHP memory into appropriately-sized folios, rather than into
 single-page folios.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZu1BBwAKCRDdBJ7gKXxA
 jlWNAQDYlqQLun7bgsAN4sSvi27VUuWv1q70jlMXTfmjJAvQqwD/fBFVR6IOOiw7
 AkDbKWP2k0hWPiNJBGwoqxdHHx09Xgo=
 =s0T+
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2024-09-20-02-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:
 "Along with the usual shower of singleton patches, notable patch series
  in this pull request are:

   - "Align kvrealloc() with krealloc()" from Danilo Krummrich. Adds
     consistency to the APIs and behaviour of these two core allocation
     functions. This also simplifies/enables Rustification.

   - "Some cleanups for shmem" from Baolin Wang. No functional changes -
     more code reuse, better function naming, logic simplifications.

   - "mm: some small page fault cleanups" from Josef Bacik. No
     functional changes - code cleanups only.

   - "Various memory tiering fixes" from Zi Yan. A small fix and a
     little cleanup.

   - "mm/swap: remove boilerplate" from Yu Zhao. Code cleanups and
     simplifications and .text shrinkage.

   - "Kernel stack usage histogram" from Pasha Tatashin and Shakeel
     Butt. This is a feature, it adds new fields to /proc/vmstat such as

       $ grep kstack /proc/vmstat
       kstack_1k 3
       kstack_2k 188
       kstack_4k 11391
       kstack_8k 243
       kstack_16k 0

     which tells us that 11391 processes used 4k of stack while none at
     all used 16k. Useful for some system tuning things, but
     particularly useful for "the dynamic kernel stack project".

   - "kmemleak: support for percpu memory leak detect" from Pavel
     Tikhomirov. Teaches kmemleak to detect leakage of percpu memory.

   - "mm: memcg: page counters optimizations" from Roman Gushchin. "3
     independent small optimizations of page counters".

   - "mm: split PTE/PMD PT table Kconfig cleanups+clarifications" from
     David Hildenbrand. Improves PTE/PMD splitlock detection, makes
     powerpc/8xx work correctly by design rather than by accident.

   - "mm: remove arch_make_page_accessible()" from David Hildenbrand.
     Some folio conversions which make arch_make_page_accessible()
     unneeded.

   - "mm, memcg: cg2 memory{.swap,}.peak write handlers" from David
     Finkel. Cleans up and fixes our handling of the resetting of the
     cgroup/process peak-memory-use detector.

   - "Make core VMA operations internal and testable" from Lorenzo
     Stoakes. Rationalization and encapsulation of the VMA manipulation
     APIs, with a view to better enabling testing of the VMA functions,
     even from a userspace-only harness.

   - "mm: zswap: fixes for global shrinker" from Takero Funaki. Fix
     issues in the zswap global shrinker, resulting in improved
     performance.

   - "mm: print the promo watermark in zoneinfo" from Kaiyang Zhao. Fill
     in some missing info in /proc/zoneinfo.

   - "mm: replace follow_page() by folio_walk" from David Hildenbrand.
     Code cleanups and rationalizations (conversion to folio_walk())
     resulting in the removal of follow_page().

   - "improving dynamic zswap shrinker protection scheme" from Nhat
     Pham. Some tuning to improve zswap's dynamic shrinker. Significant
     reductions in swapin and improvements in performance are shown.

   - "mm: Fix several issues with unaccepted memory" from Kirill
     Shutemov. Improvements to the new unaccepted memory feature.

   - "mm/mprotect: Fix dax puds" from Peter Xu. Implements mprotect on
     DAX PUDs. This was missing, although nobody seems to have noticed
     yet.

   - "Introduce a store type enum for the Maple tree" from Sidhartha
     Kumar. Cleanups and modest performance improvements for the maple
     tree library code.

   - "memcg: further decouple v1 code from v2" from Shakeel Butt. Move
     more cgroup v1 remnants away from the v2 memcg code.

   - "memcg: initiate deprecation of v1 features" from Shakeel Butt.
     Adds various warnings telling users that memcg v1 features are
     deprecated.

   - "mm: swap: mTHP swap allocator base on swap cluster order" from
     Chris Li. Greatly improves the success rate of the mTHP swap
     allocation.

   - "mm: introduce numa_memblks" from Mike Rapoport. Moves various
     disparate per-arch implementations of numa_memblk code into generic
     code.

   - "mm: batch free swaps for zap_pte_range()" from Barry Song. Greatly
     improves the performance of munmap() of swap-filled ptes.

   - "support large folio swap-out and swap-in for shmem" from Baolin
     Wang. With this series we no longer split shmem large folios into
     single-page folios when swapping out shmem.

   - "mm/hugetlb: alloc/free gigantic folios" from Yu Zhao. Nice
     performance improvements and code reductions for gigantic folios.

   - "support shmem mTHP collapse" from Baolin Wang. Adds support for
     khugepaged's collapsing of shmem mTHP folios.

   - "mm: Optimize mseal checks" from Pedro Falcato. Fixes an mprotect()
     performance regression due to the addition of mseal().

   - "Increase the number of bits available in page_type" from Matthew
     Wilcox. Increases the number of bits available in page_type!

   - "Simplify the page flags a little" from Matthew Wilcox. Many legacy
     page flags are now folio flags, so the page-based flags and their
     accessors/mutators can be removed.

   - "mm: store zero pages to be swapped out in a bitmap" from Usama
     Arif. An optimization which permits us to avoid writing/reading
     zero-filled zswap pages to backing store.

   - "Avoid MAP_FIXED gap exposure" from Liam Howlett. Fixes a race
     window which occurs when a MAP_FIXED operation is occurring during
     an unrelated vma tree walk.

   - "mm: remove vma_merge()" from Lorenzo Stoakes. Major rotorooting of
     the vma_merge() functionality, making it cleaner, more testable and
     better tested.

   - "misc fixups for DAMON {self,kunit} tests" from SeongJae Park.
     Minor fixups of DAMON selftests and kunit tests.

   - "mm: memory_hotplug: improve do_migrate_range()" from Kefeng Wang.
     Code cleanups and folio conversions.

   - "Shmem mTHP controls and stats improvements" from Ryan Roberts.
     Cleanups for shmem controls and stats.

   - "mm: count the number of anonymous THPs per size" from Barry Song.
     Expose additional anon THP stats to userspace for improved tuning.

   - "mm: finish isolate/putback_lru_page()" from Kefeng Wang: more
     folio conversions and removal of now-unused page-based APIs.

   - "replace per-quota region priorities histogram buffer with
     per-context one" from SeongJae Park. DAMON histogram
     rationalization.

   - "Docs/damon: update GitHub repo URLs and maintainer-profile" from
     SeongJae Park. DAMON documentation updates.

   - "mm/vdpa: correct misuse of non-direct-reclaim __GFP_NOFAIL and
     improve related doc and warn" from Jason Wang: fixes usage of page
     allocator __GFP_NOFAIL and GFP_ATOMIC flags.

   - "mm: split underused THPs" from Yu Zhao. Improve THP=always policy.
     This was overprovisioning THPs in sparsely accessed memory areas.

   - "zram: introduce custom comp backends API" from Sergey Senozhatsky.
     Add support for zram run-time compression algorithm tuning.

   - "mm: Care about shadow stack guard gap when getting an unmapped
     area" from Mark Brown. Fix up the various arch_get_unmapped_area()
     implementations to better respect guard areas.

   - "Improve mem_cgroup_iter()" from Kinsey Ho. Improve the reliability
     of mem_cgroup_iter() and various code cleanups.

   - "mm: Support huge pfnmaps" from Peter Xu. Extends the usage of huge
     pfnmap support.

   - "resource: Fix region_intersects() vs add_memory_driver_managed()"
     from Huang Ying. Fix a bug in region_intersects() for systems with
     CXL memory.

   - "mm: hwpoison: two more poison recovery" from Kefeng Wang. Teaches
     a couple more code paths to correctly recover from the encountering
     of poisoned memory.

   - "mm: enable large folios swap-in support" from Barry Song. Support
     the swapin of mTHP memory into appropriately-sized folios, rather
     than into single-page folios"

* tag 'mm-stable-2024-09-20-02-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (416 commits)
  zram: free secondary algorithms names
  uprobes: turn xol_area->pages[2] into xol_area->page
  uprobes: introduce the global struct vm_special_mapping xol_mapping
  Revert "uprobes: use vm_special_mapping close() functionality"
  mm: support large folios swap-in for sync io devices
  mm: add nr argument in mem_cgroup_swapin_uncharge_swap() helper to support large folios
  mm: fix swap_read_folio_zeromap() for large folios with partial zeromap
  mm/debug_vm_pgtable: Use pxdp_get() for accessing page table entries
  set_memory: add __must_check to generic stubs
  mm/vma: return the exact errno in vms_gather_munmap_vmas()
  memcg: cleanup with !CONFIG_MEMCG_V1
  mm/show_mem.c: report alloc tags in human readable units
  mm: support poison recovery from copy_present_page()
  mm: support poison recovery from do_cow_fault()
  resource, kunit: add test case for region_intersects()
  resource: make alloc_free_mem_region() works for iomem_resource
  mm: z3fold: deprecate CONFIG_Z3FOLD
  vfio/pci: implement huge_fault support
  mm/arm64: support large pfn mappings
  mm/x86: support large pfn mappings
  ...
2024-09-21 07:29:05 -07:00
Linus Torvalds
056f8c437d Lots of cleanups and bug fixes this cycle, primarily in the block
allocation, extent management, fast commit, and journalling.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAmbsGRcACgkQ8vlZVpUN
 gaP+pwgAop3LUpOFQ9dPRTR3+37AJI8adfabfLIDkEkoVA7lyYY/6Q8pcQ0rklq3
 wE1WxrJ7MaE1GaFCwRIDIL6TP+uYRK0pPjqbFBxGakhDc+WXrTcALOWWofb7J7PL
 FLwP264lRRfKfpMHdK8bx6YHnEN8425PR+ZNXGVPsw+wjo72mmnq54w+ct1iOKiw
 dKfIrwwCGKlBsNdYHS/XsSx7MMK8e7nsKoSq0UtpJ4PqF11/asOtlYYODc4hd27U
 E3I3UDKuntmz+meAscDejOJqQk5FT184HIt/Y5JfetKU2zpUFj9IKqXDzMjijdaj
 vGn9RkTXfJdxMPm1ouF2R6KIRJollg==
 =V7+A
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 updates from Ted Ts'o:
 "Lots of cleanups and bug fixes this cycle, primarily in the block
  allocation, extent management, fast commit, and journalling"

* tag 'ext4_for_linus-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (93 commits)
  ext4: convert EXT4_B2C(sbi->s_stripe) users to EXT4_NUM_B2C
  ext4: check stripe size compatibility on remount as well
  ext4: fix i_data_sem unlock order in ext4_ind_migrate()
  ext4: remove the special buffer dirty handling in do_journal_get_write_access
  ext4: fix a potential assertion failure due to improperly dirtied buffer
  ext4: hoist ext4_block_write_begin and replace the __block_write_begin
  ext4: persist the new uptodate buffers in ext4_journalled_zero_new_buffers
  ext4: dax: keep orphan list before truncate overflow allocated blocks
  ext4: fix error message when rejecting the default hash
  ext4: save unnecessary indentation in ext4_ext_create_new_leaf()
  ext4: make some fast commit functions reuse extents path
  ext4: refactor ext4_swap_extents() to reuse extents path
  ext4: get rid of ppath in convert_initialized_extent()
  ext4: get rid of ppath in ext4_ext_handle_unwritten_extents()
  ext4: get rid of ppath in ext4_ext_convert_to_initialized()
  ext4: get rid of ppath in ext4_convert_unwritten_extents_endio()
  ext4: get rid of ppath in ext4_split_convert_extents()
  ext4: get rid of ppath in ext4_split_extent()
  ext4: get rid of ppath in ext4_force_split_extent_at()
  ext4: get rid of ppath in ext4_split_extent_at()
  ...
2024-09-20 19:26:45 -07:00
Linus Torvalds
84bbfe6b64 platform-drivers-x86 for v6.12-1
Highlights:
  -  asus-wmi: Add support for vivobook fan profiles
  -  dell-laptop: Add knobs to change battery charge settings
  -  lg-laptop: Add operation region support
  -  intel-uncore-freq: Add support for efficiency latency control
  -  intel/ifs: Add SBAF test support
  -  intel/pmc: Ignore all LTRs during suspend
  -  platform/surface: Support for arm64 based Surface devices
  -  wmi: Pass event data directly to legacy notify handlers
  -  x86/platform/geode: switch GPIO buttons and LEDs to software properties
  -  bunch of small cleanups, fixes, hw-id additions, etc.
 
 The following is an automated git shortlog grouped by driver:
 
 Documentation:
  -  admin-guide: pm: Add efficiency vs. latency tradeoff to uncore documentation
 
 ISST:
  -  Simplify isst_misc_reg() and isst_misc_unreg()
 
 MAINTAINERS:
  -  adjust file entry in INTEL MID PLATFORM
  -  Add Intel MID section
 
 Merge tag 'hwmon-for-v6.11-rc7' into review-hans:
  - Merge tag 'hwmon-for-v6.11-rc7' into review-hans
 
 Merge tag 'platform-drivers-x86-v6.11-3' into review-hans:
  - Merge tag 'platform-drivers-x86-v6.11-3' into review-hans
 
 acer-wmi:
  -  Use backlight power constants
 
 asus-laptop:
  -  Use backlight power constants
 
 asus-nb-wmi:
  -  Use backlight power constants
 
 asus-wmi:
  -  don't fail if platform_profile already registered
  -  add debug print in more key places
  -  Use backlight power constants
  -  add support for vivobook fan profiles
 
 dell-laptop:
  -  remove duplicate code w/ battery function
  -  Add knobs to change battery charge settings
 
 dt-bindings:
  -  platform: Add Surface System Aggregator Module
  -  serial: Allow embedded-controller as child node
 
 eeepc-laptop:
  -  Use backlight power constants
 
 eeepc-wmi:
  -  Use backlight power constants
 
 fujitsu-laptop:
  -  Use backlight power constants
 
 hid-asus:
  -  use hid for brightness control on keyboard
 
 ideapad-laptop:
  -  Make the scope_guard() clear of its scope
  -  move ACPI helpers from header to source file
  -  Use backlight power constants
 
 int3472:
  -  Use str_high_low()
  -  Use GPIO_LOOKUP() macro
  -  make common part a separate module
 
 intel-hid:
  -  Use string_choices API instead of ternary operator
 
 intel/pmc:
  -  Ignore all LTRs during suspend
  -  Remove unused param idx from pmc_for_each_mode()
 
 intel_scu_ipc:
  -  Move intel_scu_ipc.h out of arch/x86/include/asm
 
 intel_scu_wdt:
  -  Move intel_scu_wdt.h to x86 subfolder
 
 lenovo-ymc:
  -  Ignore the 0x0 state
 
 lg-laptop:
  -  Add operation region support
 
 oaktrail:
  -  Use backlight power constants
 
 panasonic-laptop:
  -  Add support for programmable buttons
 
 platform/mellanox:
  -  mlxbf-pmc: fix lockdep warning
 
 platform/olpc:
  -  Remove redundant null pointer checks in olpc_ec_setup_debugfs()
 
 platform/surface:
  -  Add OF support
 
 platform/x86/amd:
  -  pmf: Add quirk for TUF Gaming A14
 
 platform/x86/amd/pmf:
  -  Update SMU metrics table for 1AH family series
  -  Relocate CPU ID macros to the PMF header
  -  Add support for notifying Smart PC Solution updates
 
 platform/x86/intel-uncore-freq:
  -  Add efficiency latency control to sysfs interface
  -  Add support for efficiency latency control
  -  Do not present separate package-die domain
 
 platform/x86/intel/ifs:
  -  Fix SBAF title underline length
  -  Add SBAF test support
  -  Add SBAF test image loading support
  -  Refactor MSR usage in IFS test code
 
 platform/x86/intel/pmc:
  -  Show live substate requirements
 
 platform/x86/intel/pmt:
  -  Use PMT callbacks
 
 platform/x86/intel/vsec:
  -  Add PMT read callbacks
 
 platform/x86/intel/vsec.h:
  -  Move to include/linux
 
 samsung-laptop:
  -  Use backlight power constants
 
 serial-multi-instantiate:
  -  Don't require both I2C and SPI
 
 thinkpad_acpi:
  -  Fix uninitialized symbol 's' warning
  -  Add Thinkpad Edge E531 fan support
 
 touchscreen_dmi:
  -  add nanote-next quirk
 
 trace:
  -  platform/x86/intel/ifs: Add SBAF trace support
 
 wmi:
  -  Call both legacy and WMI driver notify handlers
  -  Merge get_event_data() with wmi_get_notify_data()
  -  Remove wmi_get_event_data()
  -  Pass event data directly to legacy notify handlers
 
 x86-android-tablets:
  -  Adjust Xiaomi Pad 2 bottom bezel touch buttons LED
  -  Fix spelling in the comments
 
 x86/platform/geode:
  -  switch GPIO buttons and LEDs to software properties
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEEuvA7XScYQRpenhd+kuxHeUQDJ9wFAmbq2tYUHGhkZWdvZWRl
 QHJlZGhhdC5jb20ACgkQkuxHeUQDJ9xKYAgAoXZt1MjBDA1mP813i4bj8CYQHWO+
 YnugVhEccucxgC6sBGzQeRLBNuG/VaBN6tyJ1pKYMpWV5gSthq1Iop+DZbno2ciM
 QAnSSzioHB/dhYBXuKmZatkMsKLjLjtfcexUed9DfwKapqFl3XQMb6cEYasM37hH
 197K4yAFF3oqQImlACwQDxN1q3eCG6bdIbEAByZW7yH644IC5zH8/CiFjTCwUx/F
 aFIHQlLLzt1kjhD8AbRHhRcsGbzG2ejHsC3yrQddEJSOkInDO8baR0aDyhBTUFPE
 lztuekFfaJ1Xcyoc/Zf4pi3ab1Djt+Htck3CHLO/xcl0YYMlM5vcs1QlhQ==
 =sAk7
 -----END PGP SIGNATURE-----

Merge tag 'platform-drivers-x86-v6.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86

Pull x86 platform drivers updates from Hans de Goede:

 - asus-wmi: Add support for vivobook fan profiles

 - dell-laptop: Add knobs to change battery charge settings

 - lg-laptop: Add operation region support

 - intel-uncore-freq: Add support for efficiency latency control

 - intel/ifs: Add SBAF test support

 - intel/pmc: Ignore all LTRs during suspend

 - platform/surface: Support for arm64 based Surface devices

 - wmi: Pass event data directly to legacy notify handlers

 - x86/platform/geode: switch GPIO buttons and LEDs to software
   properties

 - bunch of small cleanups, fixes, hw-id additions, etc.

* tag 'platform-drivers-x86-v6.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: (65 commits)
  MAINTAINERS: adjust file entry in INTEL MID PLATFORM
  platform/x86: x86-android-tablets: Adjust Xiaomi Pad 2 bottom bezel touch buttons LED
  platform/mellanox: mlxbf-pmc: fix lockdep warning
  platform/x86/amd: pmf: Add quirk for TUF Gaming A14
  platform/x86: touchscreen_dmi: add nanote-next quirk
  platform/x86: asus-wmi: don't fail if platform_profile already registered
  platform/x86: asus-wmi: add debug print in more key places
  platform/x86: intel_scu_wdt: Move intel_scu_wdt.h to x86 subfolder
  platform/x86: intel_scu_ipc: Move intel_scu_ipc.h out of arch/x86/include/asm
  MAINTAINERS: Add Intel MID section
  platform/x86: panasonic-laptop: Add support for programmable buttons
  platform/olpc: Remove redundant null pointer checks in olpc_ec_setup_debugfs()
  platform/x86: intel/pmc: Ignore all LTRs during suspend
  platform/x86: wmi: Call both legacy and WMI driver notify handlers
  platform/x86: wmi: Merge get_event_data() with wmi_get_notify_data()
  platform/x86: wmi: Remove wmi_get_event_data()
  platform/x86: wmi: Pass event data directly to legacy notify handlers
  platform/x86: thinkpad_acpi: Fix uninitialized symbol 's' warning
  platform/x86: x86-android-tablets: Fix spelling in the comments
  platform/x86: ideapad-laptop: Make the scope_guard() clear of its scope
  ...
2024-09-19 09:16:04 +02:00
Linus Torvalds
eec91e22fe IOMMU Updates for Linux v6.12
Including:
 
 	- Core changes:
 	  - Allow ATS on VF when parent device is identity mapped.
 	  - Optimize unmap path on ARM io-pagetable implementation.
 	  - Use of_property_present().
 
 	- ARM-SMMU changes:
 	  - SMMUv2:
 	    - Devicetree binding updates for Qualcomm MMU-500 implementations.
 	    - Extend workarounds for broken Qualcomm hypervisor to avoid
 	      touching features that are not available (e.g. 16KiB page
 	      support, reserved context banks).
 	  - SMMUv3:
 	    - Support for NVIDIA's custom virtual command queue hardware.
 	    - Fix Stage-2 stall configuration and extend tests to cover this
 	      area.
 	    - A bunch of driver cleanups, including simplification of the
 	      master rbtree code.
 	  - Plus minor cleanups and fixes across both drivers.
 
 	- Intel VT-d changes:
 	  - Retire si_domain and convert to use static identity domain.
 	  - Batched IOTLB/dev-IOTLB invalidation.
 	  - Small code refactoring and cleanups.
 
 	- AMD-Vi changes:
 	  - Cleanup and refactoring of io-pagetable code.
 	  - Add parameter to limit the used io-pagesizes.
 	  - Other cleanups and fixes.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAmboAtoACgkQK/BELZcB
 GuNidQ//WOhwVQZmdS6vnU2vu//LwFE7Q7PsRYPW2QhFri1eurKo6jxMNBtgUsXu
 fPTSEBM7/lhagRgb29ycrbOYoavkEnUiIMX7vRsjl9tVkqd/GKNTrMuUC+QPiBYQ
 ASkStmEUW6Zvye4rWyUxiCJIFIA5wm74wSOOQ6X2Wg3WMo51njrj1DK/k2H5JenJ
 RTmIA9Ynef2py38xWDd0UE/psvKrzA5uug4IP0E0v014i36cSEVrH7hjztMfd8Sc
 2dUuJ8eUUtLTo1ffTcmxoTvUBjBzJOzeSQrFfaDZDgyqayt6JoSKeX1DV/nCI8kc
 ftg0pe37Zr3mndgQC7wNyUO1GOmkJl+GpMFyJTG8wpnBc0tr+TDn1o6QERymcRxA
 kn62n4vxxjWoRSKt3di7hNM0Uuwj8/z/cIbDSTNbSov4fDuuz0xppdcA/ewKATv0
 VgmpP5OyIFZXM+mR4Vem2hZQQ3wPOsJAFVWS1ROtYQFgiimrGf+w9et8rEU4pmp5
 Ve4rSmka60NLdE6i1JNqx4sRrRsdJJ55knI77nHrt0TZkbMzA/JG1UT3TbbMJTtd
 v5dviMMOXLpcKQLgqlde8QWOEjT6VUw/fbU640iyzhrWAm8fWDBefrSv6JLhevQ4
 fBajoaej89cd9DkkEJiSTiyGig8QkY3HFaqDo3u5g/sBBrMBZas=
 =1QvI
 -----END PGP SIGNATURE-----

Merge tag 'iommu-updates-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux

Pull iommu updates from Joerg Roedel:
 "Core changes:
   - Allow ATS on VF when parent device is identity mapped
   - Optimize unmap path on ARM io-pagetable implementation
   - Use of_property_present()

  ARM-SMMU changes:
   - SMMUv2:
       - Devicetree binding updates for Qualcomm MMU-500 implementations
       - Extend workarounds for broken Qualcomm hypervisor to avoid
         touching features that are not available (e.g. 16KiB page
         support, reserved context banks)
   - SMMUv3:
       - Support for NVIDIA's custom virtual command queue hardware
       - Fix Stage-2 stall configuration and extend tests to cover this
         area
       - A bunch of driver cleanups, including simplification of the
         master rbtree code
   - Minor cleanups and fixes across both drivers

  Intel VT-d changes:
   - Retire si_domain and convert to use static identity domain
   - Batched IOTLB/dev-IOTLB invalidation
   - Small code refactoring and cleanups

  AMD-Vi changes:
   - Cleanup and refactoring of io-pagetable code
   - Add parameter to limit the used io-pagesizes
   - Other cleanups and fixes"

* tag 'iommu-updates-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux: (77 commits)
  dt-bindings: arm-smmu: Add compatible for QCS8300 SoC
  iommu/amd: Test for PAGING domains before freeing a domain
  iommu/amd: Fix argument order in amd_iommu_dev_flush_pasid_all()
  iommu/amd: Add kernel parameters to limit V1 page-sizes
  iommu/arm-smmu-v3: Reorganize struct arm_smmu_ctx_desc_cfg
  iommu/arm-smmu-v3: Add types for each level of the CD table
  iommu/arm-smmu-v3: Shrink the cdtab l1_desc array
  iommu/arm-smmu-v3: Do not use devm for the cd table allocations
  iommu/arm-smmu-v3: Remove strtab_base/cfg
  iommu/arm-smmu-v3: Reorganize struct arm_smmu_strtab_cfg
  iommu/arm-smmu-v3: Add types for each level of the 2 level stream table
  iommu/arm-smmu-v3: Add arm_smmu_strtab_l1/2_idx()
  iommu/arm-smmu-qcom: apply num_context_bank fixes for SDM630 / SDM660
  iommu/arm-smmu-v3: Use the new rb tree helpers
  dt-bindings: arm-smmu: document the support on SA8255p
  iommu/tegra241-cmdqv: Do not allocate vcmdq until dma_set_mask_and_coherent
  iommu/tegra241-cmdqv: Drop static at local variable
  iommu/tegra241-cmdqv: Fix ioremap() error handling in probe()
  iommu/amd: Do not set the D bit on AMD v2 table entries
  iommu/amd: Correct the reported page sizes from the V1 table
  ...
2024-09-18 12:45:52 +02:00
Linus Torvalds
7c9026b2b0 pstore updates for v6.12-rc1
- ramoops: Fix .rst typo (Steven Rostedt)
 
 - pstore: replace spinlock_t by raw_spinlock_t (Wen Yang)
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRSPkdeREjth1dHnSE2KwveOeQkuwUCZufrKwAKCRA2KwveOeQk
 u0PQAQDtABDUAMyV9kgAG3nU9L0En+p8s+rF+9MJLDmFEnmjkQD8C7ky6cW74/3R
 b56tAsSsuFMuHdx4AGGD7rAWrnVUswc=
 =HNdN
 -----END PGP SIGNATURE-----

Merge tag 'pstore-v6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull pstore updates from Kees Cook:

 - ramoops: Fix .rst typo (Steven Rostedt)

 - pstore: replace spinlock_t by raw_spinlock_t (Wen Yang)

* tag 'pstore-v6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  pstore: replace spinlock_t by raw_spinlock_t
  pstore/ramoops: Fix typo as there is no "reserver"
2024-09-18 11:47:03 +02:00
Linus Torvalds
067610ebaa RCU pull request for v6.12
This pull request contains the following branches:
 
 context_tracking.15.08.24a: Rename context tracking state related
         symbols and remove references to "dynticks" in various context
         tracking state variables and related helpers; force
         context_tracking_enabled_this_cpu() to be inlined to avoid
         leaving a noinstr section.
 
 csd.lock.15.08.24a: Enhance CSD-lock diagnostic reports; add an API
         to provide an indication of ongoing CSD-lock stall.
 
 nocb.09.09.24a: Update and simplify RCU nocb code to handle
         (de-)offloading of callbacks only for offline CPUs; fix RT
         throttling hrtimer being armed from offline CPU.
 
 rcutorture.14.08.24a: Remove redundant rcu_torture_ops get_gp_completed
         fields; add SRCU ->same_gp_state and ->get_comp_state
         functions; add generic test for NUM_ACTIVE_*RCU_POLL* for
         testing RCU and SRCU polled grace periods; add CFcommon.arch
         for arch-specific Kconfig options; print number of update types
         in rcu_torture_write_types();
         add rcutree.nohz_full_patience_delay testing to the TREE07
         scenario; add a stall_cpu_repeat module parameter to test
         repeated CPU stalls; add argument to limit number of CPUs a
         guest OS can use in torture.sh;
 
 rcustall.09.09.24a: Abbreviate RCU CPU stall warnings during CSD-lock
         stalls; Allow dump_cpu_task() to be called without disabling
         preemption; defer printing stall-warning backtrace when holding
         rcu_node lock.
 
 srcu.12.08.24a: Make SRCU gp seq wrap-around faster; add KCSAN checks
         for concurrent updates to ->srcu_n_exp_nodelay and
         ->reschedule_count which are used in heuristics governing
         auto-expediting of normal SRCU grace periods and
         grace-period-state-machine delays; mark idle SRCU-barrier
         callbacks to help identify stuck SRCU-barrier callback.
 
 rcu.tasks.14.08.24a: Remove RCU Tasks Rude asynchronous APIs as they
         are no longer used; stop testing RCU Tasks Rude asynchronous
         APIs; fix access to non-existent percpu regions; check
         processor-ID assumptions during chosen CPU calculation for
         callback enqueuing; update description of rtp->tasks_gp_seq
         grace-period sequence number; add rcu_barrier_cb_is_done()
         to identify whether a given rcu_barrier callback is stuck;
         mark idle Tasks-RCU-barrier callbacks; add
         *torture_stats_print() functions to print detailed
         diagnostics for Tasks-RCU variants; capture start time of
         rcu_barrier_tasks*() operation to help distinguish a hung
         barrier operation from a long series of barrier operations.
 
 rcu_scaling_tests.15.08.24a:
         refscale: Add a TINY scenario to support tests of Tiny RCU
         and Tiny SRCU; Optimize process_durations() operation;
 
         rcuscale: Dump stacks of stalled rcu_scale_writer() instances;
         dump grace-period statistics when rcu_scale_writer() stalls;
         mark idle RCU-barrier callbacks to identify stuck RCU-barrier
         callbacks; print detailed grace-period and barrier diagnostics
         on rcu_scale_writer() hangs for Tasks-RCU variants; warn if
         async module parameter is specified for RCU implementations
         that do not have async primitives such as RCU Tasks Rude;
         make all writer tasks report upon hang; tolerate repeated
         GFP_KERNEL failure in rcu_scale_writer(); use special allocator
         for rcu_scale_writer(); NULL out top-level pointers to heap
         memory to avoid double-free bugs on modprobe failures; maintain
         per-task instead of per-CPU callbacks count to avoid any issues
         with migration of either tasks or callbacks; constify struct
         ref_scale_ops.
 
 fixes.12.08.24a: Use system_unbound_wq for kfree_rcu work to avoid
         disturbing isolated CPUs.
 
 misc.11.08.24a: Warn on unexpected rcu_state.srs_done_tail state;
         Better define "atomic" for list_replace_rcu() and
         hlist_replace_rcu() routines; annotate struct
         kvfree_rcu_bulk_data with __counted_by().
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQSi2tPIQIc2VEtjarIAHS7/6Z0wpQUCZt8+8wAKCRAAHS7/6Z0w
 pTqoAPwPN//tlEoJx2PRs6t0q+nD1YNvnZawPaRmdzgdM8zJogD+PiSN+XhqRr80
 jzyvMDU4Aa0wjUNP3XsCoaCxo7L/lQk=
 =bZ9z
 -----END PGP SIGNATURE-----

Merge tag 'rcu.release.v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux

Pull RCU updates from Neeraj Upadhyay:
 "Context tracking:
   - rename context tracking state related symbols and remove references
     to "dynticks" in various context tracking state variables and
     related helpers
   - force context_tracking_enabled_this_cpu() to be inlined to avoid
     leaving a noinstr section

  CSD lock:
   - enhance CSD-lock diagnostic reports
   - add an API to provide an indication of ongoing CSD-lock stall

  nocb:
   - update and simplify RCU nocb code to handle (de-)offloading of
     callbacks only for offline CPUs
   - fix RT throttling hrtimer being armed from offline CPU

  rcutorture:
   - remove redundant rcu_torture_ops get_gp_completed fields
   - add SRCU ->same_gp_state and ->get_comp_state functions
   - add generic test for NUM_ACTIVE_*RCU_POLL* for testing RCU and SRCU
     polled grace periods
   - add CFcommon.arch for arch-specific Kconfig options
   - print number of update types in rcu_torture_write_types()
   - add rcutree.nohz_full_patience_delay testing to the TREE07 scenario
   - add a stall_cpu_repeat module parameter to test repeated CPU stalls
   - add argument to limit number of CPUs a guest OS can use in
     torture.sh

  rcustall:
   - abbreviate RCU CPU stall warnings during CSD-lock stalls
   - Allow dump_cpu_task() to be called without disabling preemption
   - defer printing stall-warning backtrace when holding rcu_node lock

  srcu:
   - make SRCU gp seq wrap-around faster
   - add KCSAN checks for concurrent updates to ->srcu_n_exp_nodelay and
     ->reschedule_count which are used in heuristics governing
     auto-expediting of normal SRCU grace periods and
     grace-period-state-machine delays
   - mark idle SRCU-barrier callbacks to help identify stuck
     SRCU-barrier callback

  rcu tasks:
   - remove RCU Tasks Rude asynchronous APIs as they are no longer used
   - stop testing RCU Tasks Rude asynchronous APIs
   - fix access to non-existent percpu regions
   - check processor-ID assumptions during chosen CPU calculation for
     callback enqueuing
   - update description of rtp->tasks_gp_seq grace-period sequence
     number
   - add rcu_barrier_cb_is_done() to identify whether a given
     rcu_barrier callback is stuck
   - mark idle Tasks-RCU-barrier callbacks
   - add *torture_stats_print() functions to print detailed diagnostics
     for Tasks-RCU variants
   - capture start time of rcu_barrier_tasks*() operation to help
     distinguish a hung barrier operation from a long series of barrier
     operations

  refscale:
   - add a TINY scenario to support tests of Tiny RCU and Tiny
     SRCU
   - optimize process_durations() operation

  rcuscale:
   - dump stacks of stalled rcu_scale_writer() instances and
     grace-period statistics when rcu_scale_writer() stalls
   - mark idle RCU-barrier callbacks to identify stuck RCU-barrier
     callbacks
   - print detailed grace-period and barrier diagnostics on
     rcu_scale_writer() hangs for Tasks-RCU variants
   - warn if async module parameter is specified for RCU implementations
     that do not have async primitives such as RCU Tasks Rude
   - make all writer tasks report upon hang
   - tolerate repeated GFP_KERNEL failure in rcu_scale_writer()
   - use special allocator for rcu_scale_writer()
   - NULL out top-level pointers to heap memory to avoid double-free
     bugs on modprobe failures
   - maintain per-task instead of per-CPU callbacks count to avoid any
     issues with migration of either tasks or callbacks
   - constify struct ref_scale_ops

  Fixes:
   - use system_unbound_wq for kfree_rcu work to avoid disturbing
     isolated CPUs

  Misc:
   - warn on unexpected rcu_state.srs_done_tail state
   - better define "atomic" for list_replace_rcu() and
     hlist_replace_rcu() routines
   - annotate struct kvfree_rcu_bulk_data with __counted_by()"

* tag 'rcu.release.v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux: (90 commits)
  rcu: Defer printing stall-warning backtrace when holding rcu_node lock
  rcu/nocb: Remove superfluous memory barrier after bypass enqueue
  rcu/nocb: Conditionally wake up rcuo if not already waiting on GP
  rcu/nocb: Fix RT throttling hrtimer armed from offline CPU
  rcu/nocb: Simplify (de-)offloading state machine
  context_tracking: Tag context_tracking_enabled_this_cpu() __always_inline
  context_tracking, rcu: Rename rcu_dyntick trace event into rcu_watching
  rcu: Update stray documentation references to rcu_dynticks_eqs_{enter, exit}()
  rcu: Rename rcu_momentary_dyntick_idle() into rcu_momentary_eqs()
  rcu: Rename rcu_implicit_dynticks_qs() into rcu_watching_snap_recheck()
  rcu: Rename dyntick_save_progress_counter() into rcu_watching_snap_save()
  rcu: Rename struct rcu_data .exp_dynticks_snap into .exp_watching_snap
  rcu: Rename struct rcu_data .dynticks_snap into .watching_snap
  rcu: Rename rcu_dynticks_zero_in_eqs() into rcu_watching_zero_in_eqs()
  rcu: Rename rcu_dynticks_in_eqs_since() into rcu_watching_snap_stopped_since()
  rcu: Rename rcu_dynticks_in_eqs() into rcu_watching_snap_in_eqs()
  rcu: Rename rcu_dynticks_eqs_online() into rcu_watching_online()
  context_tracking, rcu: Rename rcu_dynticks_curr_cpu_in_eqs() into rcu_is_watching_curr_cpu()
  context_tracking, rcu: Rename rcu_dynticks_task*() into rcu_task*()
  refscale: Constify struct ref_scale_ops
  ...
2024-09-18 07:52:24 +02:00
Linus Torvalds
85a77db95a workqueue: Changes for v6.12
Nothing major:
 
 - workqueue.panic_on_stall boot param added.
 
 - alloc_workqueue_lockdep_map() added (used by DRM).
 
 - Other cleanups and doc updates.
 -----BEGIN PGP SIGNATURE-----
 
 iIQEABYKACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCZuN3gQ4cdGpAa2VybmVs
 Lm9yZwAKCRCxYfJx3gVYGR1hAP0XObdExeNsVWe1JUUUX061+H+aA6aVffb9+J/t
 b32u3QEAsn+oNWzuvzlGlSQKQMpPk+dT2na0Q0yZNxkNEzUiEQQ=
 =TeDS
 -----END PGP SIGNATURE-----

Merge tag 'wq-for-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq

Pull workqueue updates from Tejun Heo:
 "Nothing major:

   - workqueue.panic_on_stall boot param added

   - alloc_workqueue_lockdep_map() added (used by DRM)

   - Other cleanups and doc updates"

* tag 'wq-for-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  kernel/workqueue.c: fix DEFINE_PER_CPU_SHARED_ALIGNED expansion
  workqueue: Fix another htmldocs build warning
  workqueue: fix null-ptr-deref on __alloc_workqueue() error
  workqueue: Don't call va_start / va_end twice
  workqueue: Fix htmldocs build warning
  workqueue: Add interface for user-defined workqueue lockdep map
  workqueue: Change workqueue lockdep map to pointer
  workqueue: Split alloc_workqueue into internal function and lockdep init
  Documentation: kernel-parameters: add workqueue.panic_on_stall
  workqueue: add cmdline parameter workqueue.panic_on_stall
2024-09-18 06:59:44 +02:00
Linus Torvalds
78567e2bc7 cgroup: Changes for v6.12
- cpuset isolation improvements.
 
 - cpuset cgroup1 support is split into its own file behind the new config
   option CONFIG_CPUSETS_V1. This makes cpuset the second controller, after
   memcg, whose cgroup1 support is optional.
 
 - Handling of unavailable v1 controllers improved during cgroup1
   mount operations.
 
 - union_find applied to cpuset. It makes code simpler and more efficient.
 
 - Reduce spurious events in pids.events.
 
 - Cleanups and other misc changes.
 
 - Contains a merge of cgroup/for-6.11-fixes to receive cpuset fixes that
   further changes build upon.
 -----BEGIN PGP SIGNATURE-----
 
 iIQEABYKACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCZuNU3Q4cdGpAa2VybmVs
 Lm9yZwAKCRCxYfJx3gVYGdMsAP9yqPxu//LiJ3lPWhKcVVKtdwrA3AYDLE81VSJO
 5VZJhAD+Ic+Ly/jZjDtjjQpZ1U3JsBpBRcVBqzeH0gD7eXaJgwk=
 =h/+c
 -----END PGP SIGNATURE-----

Merge tag 'cgroup-for-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup

Pull cgroup updates from Tejun Heo:

 - cpuset isolation improvements

 - cpuset cgroup1 support is split into its own file behind the new
   config option CONFIG_CPUSETS_V1. This makes cpuset the second
   controller, after memcg, whose cgroup1 support is optional

 - Handling of unavailable v1 controllers improved during cgroup1
   mount operations

 - union_find applied to cpuset. It makes code simpler and more
   efficient

 - Reduce spurious events in pids.events

 - Cleanups and other misc changes

 - Contains a merge of cgroup/for-6.11-fixes to receive cpuset fixes
   that further changes build upon

* tag 'cgroup-for-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (34 commits)
  cgroup: Do not report unavailable v1 controllers in /proc/cgroups
  cgroup: Disallow mounting v1 hierarchies without controller implementation
  cgroup/cpuset: Expose cpuset filesystem with cpuset v1 only
  cgroup/cpuset: Move cpu.h include to cpuset-internal.h
  cgroup/cpuset: add sefltest for cpuset v1
  cgroup/cpuset: guard cpuset-v1 code under CONFIG_CPUSETS_V1
  cgroup/cpuset: rename functions shared between v1 and v2
  cgroup/cpuset: move v1 interfaces to cpuset-v1.c
  cgroup/cpuset: move validate_change_legacy to cpuset-v1.c
  cgroup/cpuset: move legacy hotplug update to cpuset-v1.c
  cgroup/cpuset: add callback_lock helper
  cgroup/cpuset: move memory_spread to cpuset-v1.c
  cgroup/cpuset: move relax_domain_level to cpuset-v1.c
  cgroup/cpuset: move memory_pressure to cpuset-v1.c
  cgroup/cpuset: move common code to cpuset-internal.h
  cgroup/cpuset: introduce cpuset-v1.c
  selftest/cgroup: Make test_cpuset_prs.sh deal with pre-isolated CPUs
  cgroup/cpuset: Account for boot time isolated CPUs
  cgroup/cpuset: remove use_parent_ecpus of cpuset
  cgroup/cpuset: remove fetch_xcpus
  ...
2024-09-18 06:39:03 +02:00
Paolo Bonzini
c09dd2bb57 Merge branch 'kvm-redo-enable-virt' into HEAD
Register KVM's cpuhp and syscore callbacks when enabling virtualization in
hardware, as the sole purpose of said callbacks is to disable and re-enable
virtualization as needed.

The primary motivation for this series is to simplify dealing with enabling
virtualization for Intel's TDX, which needs to enable virtualization
when kvm-intel.ko is loaded, i.e. long before the first VM is created.

That said, this is a nice cleanup on its own.  By registering the callbacks
on-demand, the callbacks themselves don't need to check kvm_usage_count,
because their very existence implies a non-zero count.

Patch 1 (re)adds a dedicated lock for kvm_usage_count.  This avoids a
lock ordering issue between cpus_read_lock() and kvm_lock.  The lock
ordering issue still exists in very rare cases, and will be fixed for
good by switching vm_list to an (S)RCU-protected list.
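
The on-demand registration described above can be sketched as a small
model. This is an illustration in Python under stated assumptions, not KVM
code: the class VirtState, its methods, and the callback list are
hypothetical names. Only the idea comes from the text: a dedicated lock for
the usage count, with callbacks registered on the first enable and removed
on the last disable.

```python
import threading


class VirtState:
    """Toy model of on-demand callback registration (hypothetical names)."""

    def __init__(self):
        # Dedicated lock that protects only usage_count, mirroring the idea
        # of a lock separate from the one guarding the VM list.
        self._usage_lock = threading.Lock()
        self.usage_count = 0
        self.callbacks = []  # stands in for the cpuhp/syscore registrations

    def _on_cpu_online(self):
        # No usage_count check is needed here: this callback is registered
        # only while usage_count is non-zero.
        return "enable virtualization on this CPU"

    def enable(self):
        with self._usage_lock:
            self.usage_count += 1
            if self.usage_count == 1:
                # First user: register the callbacks.
                self.callbacks.append(self._on_cpu_online)

    def disable(self):
        with self._usage_lock:
            if self.usage_count == 0:
                raise RuntimeError("disable() without matching enable()")
            self.usage_count -= 1
            if self.usage_count == 0:
                # Last user gone: unregister the callbacks.
                self.callbacks.clear()
```

In this model the callback body stays free of count checks because its
presence in the list already implies at least one user.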

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-09-17 11:38:20 -04:00
Linus Torvalds
d58db3f3a0 Another relatively mundane cycle for docs:
- The beginning of an EEVDF scheduler document
 
 - More Chinese translations
 
 - A rethrashing of our bisection documentation
 
 ...plus the usual array of smaller fixes, and more than the usual number of
 typo fixes.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAmboMnkACgkQF0NaE2wM
 flha/Qf/e8zRinIYQJ7BmombNm39w3wUiNuXr8SWq7afqhsAJJzmOZ3oyyfssL+B
 a1pSjhxb15UrKf1kMKhdBxhDndXvto5UekJRBY5gsTvcBMBmtIovN+ZK5Z5jObsw
 gzHD9of08Ti7N4C2dSBdLPHtvIBX0rVeEK4oAH7AUaQviu1cfTaLQQA0dRYsaJeX
 iXsts2NkGl6ZUF7mk4nlzj8+Y1zot+mCd6B53iSimNKxwsPODrCZUobJAvxg1qVU
 pRCQcnpx2fTBnh4ugrcLZbautyhL9bJ8VQzFeoQgYpODDgDnZyTjN6kxv65LpxAz
 dXi+hx5Vk7lP3BbTp9EeGn305/qQPA==
 =JuBw
 -----END PGP SIGNATURE-----

Merge tag 'docs-6.12' of git://git.lwn.net/linux

Pull documentation update from Jonathan Corbet:
 "Another relatively mundane cycle for docs:

   - The beginning of an EEVDF scheduler document

   - More Chinese translations

   - A rethrashing of our bisection documentation

  ...plus the usual array of smaller fixes, and more than the usual
  number of typo fixes"

* tag 'docs-6.12' of git://git.lwn.net/linux: (48 commits)
  Remove duplicate "and" in 'Linux NVMe docs.
  docs:filesystems: fix spelling and grammar mistakes
  docs:filesystem: fix mispelled words on autofs page
  docs:mm: fixed spelling and grammar mistakes on vmalloc kernel stack page
  Documentation: PCI: fix typo in pci.rst
  docs/zh_CN: add the translation of kbuild/gcc-plugins.rst
  docs/process: fix typos
  docs:mm: fix spelling mistakes in heterogeneous memory management page
  accel/qaic: Fix a typo
  docs/zh_CN: update the translation of security-bugs
  docs: block: Fix grammar and spelling mistakes in bfq-iosched.rst
  Documentation: Fix spelling mistakes
  Documentation/gpu: Fix typo in Documentation/gpu/komeda-kms.rst
  scripts: sphinx-pre-install: remove unnecessary double check for $cur_version
  Loongarch: KVM: Add KVM hypercalls documentation for LoongArch
  Documentation: Document the kernel flag bdev_allow_write_mounted
  docs: scheduler: completion: Update member of struct completion
  docs: kerneldoc-preamble.sty: Suppress extra spaces in CJK literal blocks
  docs: submitting-patches: Advertise b4
  docs: update dev-tools/kcsan.rst url about KTSAN
  ...
2024-09-17 16:44:08 +02:00
Linus Torvalds
9ea925c806 Updates for timers and timekeeping:
- Core:
 
         - Overhaul of posix-timers in preparation of removing the
           workaround for periodic timers which have signal delivery
           ignored.
 
         - Remove the historical extra jiffie in msleep()
 
           msleep() adds an extra jiffie to the timeout value to ensure
           minimal sleep time. The timer wheel ensures minimal sleep
           time since the large rewrite to a non-cascading wheel, but the
           extra jiffie in msleep() remained unnoticed. Remove it.
 
         - Make the timer slack handling correct for realtime tasks.
 
           The procfs interface is inconsistent and neither reflects
           reality nor conforms to the man page. Show the correct 0 slack
           for real time tasks and enforce it at the core level instead of
           having inconsistent individual checks in various timer setup
           functions.
 
         - The usual set of updates and enhancements all over the place.
 
   - Drivers:
 
         - Allow the ACPI PM timer to be turned off during suspend
 
         - No new drivers
 
         - The usual updates and enhancements in various drivers
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmbn7jQTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYobqnD/9COlU0nwsulABI/aNIrsh6iYvnCC9v
 14CcNta7Qn+157Wfw9BWOyHdNhR1/fPCXE8jJ71zTyIOeW27HV2JyTtxTwe9ZcdK
 ViHAaj7YcIjcVUEC3StCoRCPnvLslEw4qJA5AOQuDyMivdQn+YVa2c0baJxKaXZt
 xk4HZdMj4NAS0jRKnoZSwtKW/+Oz6rR4GAWrZo+Zs1/8ur3HfqnQfi8lJ1hJtLLW
 V7XDCVRvamVi6Ah3ocYPPp/1P6yeQDA1ge9aMddqaza5STWISXRtSnFMUmYP3rbS
 FaL8TyL+ilfny8pkGB2WlG6nLuSbtvogtdEh1gG1k1RmZt44kAtk8ba/KiWFPBSb
 zK9cjojRMBS71f9G4kmb5F4rnXoLsg1YbD1Nzhz3wq2Cs1Z90dc2QwMren0zoQ1x
 Fn56ueRyAiagBlnrSaKyso/2RvqJTNoSdi3RkpjYeAph0UoDCqvTvKjGAf1mWiw1
 T/1lUWSVqWHnzZbM7XXzzajIN9bl6A7bbqlcAJ2O9vZIDt7273DG+bQym9Vh6Why
 0LTGGERHxzKBsG7WRg+2Gmvv6S18UPKRo8tLtlA758rHlFuPTZCShWrIriwSNl1K
 Hxon+d4BparSnm1h9W/NHPKJA574UbWRCBjdk58IkAj8DxZZY4ORD9SMP+ggkV7G
 F6p9cgoDNP9KFg==
 =jE0N
 -----END PGP SIGNATURE-----

Merge tag 'timers-core-2024-09-16' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer updates from Thomas Gleixner:
 "Core:

   - Overhaul of posix-timers in preparation of removing the workaround
     for periodic timers which have signal delivery ignored.

   - Remove the historical extra jiffie in msleep()

     msleep() adds an extra jiffie to the timeout value to ensure
     minimal sleep time. The timer wheel ensures minimal sleep time
     since the large rewrite to a non-cascading wheel, but the extra
     jiffie in msleep() remained unnoticed. Remove it.

   - Make the timer slack handling correct for realtime tasks.

     The procfs interface is inconsistent and neither reflects
     reality nor conforms to the man page. Show the correct 0 slack for
     real time tasks and enforce it at the core level instead of having
     inconsistent individual checks in various timer setup functions.

   - The usual set of updates and enhancements all over the place.

  Drivers:

   - Allow the ACPI PM timer to be turned off during suspend

   - No new drivers

   - The usual updates and enhancements in various drivers"
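
The msleep() item above comes down to simple tick arithmetic, which the
following sketch models. It is a simplified illustration, not the kernel
implementation: the function names mirror kernel ones only for readability,
the round-up conversion is an assumption of this model, and HZ=250 is an
arbitrary example tick rate.

```python
def msecs_to_jiffies(msecs: int, hz: int) -> int:
    """Convert milliseconds to timer ticks, rounding up (simplified model)."""
    return (msecs * hz + 999) // 1000


def msleep_timeout(msecs: int, hz: int, extra_jiffy: bool) -> int:
    """Ticks requested by a modelled msleep(), with or without the extra one."""
    return msecs_to_jiffies(msecs, hz) + (1 if extra_jiffy else 0)


if __name__ == "__main__":
    hz = 250  # assumed tick rate: one jiffy is 4 ms
    for ms in (1, 4, 20):
        old = msleep_timeout(ms, hz, extra_jiffy=True)
        new = msleep_timeout(ms, hz, extra_jiffy=False)
        print(f"msleep({ms}): {old} -> {new} jiffies")
```

Under these assumptions the extra tick matters most for short sleeps, where
one additional jiffy is a large fraction of the requested time.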

* tag 'timers-core-2024-09-16' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (43 commits)
  ntp: Make sure RTC is synchronized when time goes backwards
  treewide: Fix wrong singular form of jiffies in comments
  cpu: Use already existing usleep_range()
  timers: Rename next_expiry_recalc() to be unique
  platform/x86:intel/pmc: Fix comment for the pmc_core_acpi_pm_timer_suspend_resume function
  clocksource/drivers/jcore: Use request_percpu_irq()
  clocksource/drivers/cadence-ttc: Add missing clk_disable_unprepare in ttc_setup_clockevent
  clocksource/drivers/asm9260: Add missing clk_disable_unprepare in asm9260_timer_init
  clocksource/drivers/qcom: Add missing iounmap() on errors in msm_dt_timer_init()
  clocksource/drivers/ingenic: Use devm_clk_get_enabled() helpers
  platform/x86:intel/pmc: Enable the ACPI PM Timer to be turned off when suspended
  clocksource: acpi_pm: Add external callback for suspend/resume
  clocksource/drivers/arm_arch_timer: Using for_each_available_child_of_node_scoped()
  dt-bindings: timer: rockchip: Add rk3576 compatible
  timers: Annotate possible non critical data race of next_expiry
  timers: Remove historical extra jiffie for timeout in msleep()
  hrtimer: Use and report correct timerslack values for realtime tasks
  hrtimer: Annotate hrtimer_cpu_base_.*_expiry() for sparse.
  timers: Add sparse annotation for timer_sync_wait_running().
  signal: Replace BUG_ON()s
  ...
2024-09-17 07:25:37 +02:00
Linus Torvalds
a430d95c5e lsm/stable-6.12 PR 20240911
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmbiGGAUHHBhdWxAcGF1
 bC1tb29yZS5jb20ACgkQ6iDy2pc3iXPU8BAA1+A15pmS34I9pq7c8TmRz3rNEs/a
 zrW1aWJ0X/+axNS7sW3Pwtt1EKuaOhskKU8gNSieRhljC8rgXIVjZzLw6Atgcr5k
 upulGbU9TXyVisYN+PWv9/84ito6/nYsKb7Mg3nUVsdodtIFVnsk1fxYLPHQEBig
 Pl3i26U3VqH93Kz0W5vs/QR2uduPB8ZyscdTgcbrY9Vv1Y7IDZ2g9QsJVKLvbQKL
 qcPK1JkHa+sBPJxDqS9A40zgbLbdPQgWQzsXX3dz822w1Ga7FIHSqxMBA6HwHZ+L
 kV4P58wVfavhwt/cQSKMWI/yiGPMMd0B6yD+m8ojOvGfOfRCWxGMmEMqHNuZ3m7k
 Bfll5ZgZTY8phUUhiNf3nxO3F3MM/5bHdhPOj3RReqbAbS6uWr4/fThPDYY/zIo6
 NCY3HGxx3Ae64uQ01gC2p/czC50jDsMwlbXiZbrgdBhjBm/CVk5ozb80mLVcGrLB
 +6XMzzSbC8IaNAH2fDmUJ2ABdwyNPgsSOTGZVzIanpxu1SU2/yk3SMxkp8fv5s36
 wLeODUVcLgsjVV538Mkm6PGTE4TlXaH9yi6apMyJAGp0vPYx5c3Xxk2y5A5cur5p
 hcrbDiX2QgeqFbwsz36incmPmbef2NU2c8feR8XLtPJuwNIeRcMSje0pnkaFlRmb
 TAUJ1sDQAzZ8Fy0=
 =HIAO
 -----END PGP SIGNATURE-----

Merge tag 'lsm-pr-20240911' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm

Pull lsm updates from Paul Moore:

 - Move the LSM framework to static calls

   This transitions the vast majority of the LSM callbacks into static
   calls. Those callbacks which haven't been converted were left as-is
   due to the general ugliness of the changes required to support the
   static call conversion; we can revisit those callbacks at a future
   date.

 - Add the Integrity Policy Enforcement (IPE) LSM

   This adds a new LSM, Integrity Policy Enforcement (IPE). There is
   plenty of documentation about IPE in these patches, so I'll refrain
   from going into too much detail here, but the basic motivation behind
   IPE is to provide a mechanism such that administrators can restrict
   execution to only those binaries which come from integrity protected
   storage, e.g. a dm-verity protected filesystem. You will notice that
   IPE requires additional LSM hooks in the initramfs, dm-verity, and
   fs-verity code, with the associated patches carrying ACK/review tags
   from the associated maintainers. We couldn't find an obvious
   maintainer for the initramfs code, but the IPE patchset has been
   widely posted over several years.

   Both Deven Bowers and Fan Wu have contributed to IPE's development
   over the past several years, with Fan Wu agreeing to serve as the IPE
   maintainer moving forward. Once IPE is accepted into your tree, I'll
   start working with Fan to ensure he has the necessary accounts, keys,
   etc. so that he can start submitting IPE pull requests to you
   directly during the next merge window.

 - Move the lifecycle management of the LSM blobs to the LSM framework

   Management of the LSM blobs (the LSM state buffers attached to
   various kernel structs, typically via a void pointer named "security"
   or similar) has been mixed, some blobs were allocated/managed by
   individual LSMs, others were managed by the LSM framework itself.

   Starting with this pull we move management of all the LSM blobs,
   minus the XFRM blob, into the framework itself, improving consistency
   across LSMs, and reducing the amount of duplicated code across LSMs.
   Due to some additional work required to migrate the XFRM blob, it has
   been left as a todo item for a later date; from a practical
   standpoint this omission should have little impact as only SELinux
   provides a XFRM LSM implementation.

 - Fix problems with the LSM's handling of F_SETOWN

   The LSM hook for the fcntl(F_SETOWN) operation had a couple of
   problems: it was racy with itself, and it was disconnected from the
   associated DAC related logic in such a way that the LSM state could
   be updated in cases where the DAC state would not. We fix both of
   these problems by moving the security_file_set_fowner() hook into the
   same section of code where the DAC attributes are updated. Not only
   does this resolve the DAC/LSM synchronization issue, but as that code
   block is protected by a lock, it also resolves the race condition.

 - Fix potential problems with the security_inode_free() LSM hook

   Due to use of RCU to protect inodes and the placement of the LSM hook
   associated with freeing the inode, there is a bit of a challenge when
   it comes to managing any LSM state associated with an inode. The VFS
   folks are not open to relocating the LSM hook so we have to get
   creative when it comes to releasing an inode's LSM state.
   Traditionally we have used a single LSM callback within the hook that
   is triggered when the inode is "marked for death", but not actually
   released due to RCU.

   Unfortunately, this causes problems for LSMs which want to take an
   action when the inode's associated LSM state is actually released; so
   we add an additional LSM callback, inode_free_security_rcu(), that is
   called when the inode's LSM state is released in the RCU free
   callback.

 - Refactor two LSM hooks to better fit the LSM return value patterns

   The vast majority of the LSM hooks follow the "return 0 on success,
   negative values on failure" pattern, however, there are a small
   handful that have unique return value behaviors which has caused
   confusion in the past and makes it difficult for the BPF verifier to
   properly vet BPF LSM programs. This includes patches to convert two
   of these "special" LSM hooks to the common 0/-ERRNO pattern.

 - Various cleanups and improvements

   A handful of patches to remove redundant code, better leverage the
   IS_ERR_OR_NULL() helper, add missing "static" markings, and do some
   minor style fixups.
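
For readers unfamiliar with the fcntl(F_SETOWN) operation mentioned in the
F_SETOWN item above, here is a minimal user-space sketch. It only shows the
operation that triggers the LSM hook, not the kernel-side fix; the helper
name set_and_get_owner() is hypothetical, and the use of a socket pair is
just a convenient choice of file descriptor.

```python
import fcntl
import os
import socket


def set_and_get_owner(fd: int, pid: int) -> int:
    """Set the owner of fd with F_SETOWN and read it back with F_GETOWN."""
    fcntl.fcntl(fd, fcntl.F_SETOWN, pid)
    return fcntl.fcntl(fd, fcntl.F_GETOWN)


if __name__ == "__main__":
    left, right = socket.socketpair()
    try:
        owner = set_and_get_owner(left.fileno(), os.getpid())
        print("owner matches caller:", owner == os.getpid())
    finally:
        left.close()
        right.close()
```

The owner recorded this way is the process that receives signals such as
SIGIO for the descriptor, which is why both the DAC and the LSM state need
to be updated together.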

* tag 'lsm-pr-20240911' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm: (40 commits)
  security: Update file_set_fowner documentation
  fs: Fix file_set_fowner LSM hook inconsistencies
  lsm: Use IS_ERR_OR_NULL() helper function
  lsm: remove LSM_COUNT and LSM_CONFIG_COUNT
  ipe: Remove duplicated include in ipe.c
  lsm: replace indirect LSM hook calls with static calls
  lsm: count the LSMs enabled at compile time
  kernel: Add helper macros for loop unrolling
  init/main.c: Initialize early LSMs after arch code, static keys and calls.
  MAINTAINERS: add IPE entry with Fan Wu as maintainer
  documentation: add IPE documentation
  ipe: kunit test for parser
  scripts: add boot policy generation program
  ipe: enable support for fs-verity as a trust provider
  fsverity: expose verified fsverity built-in signatures to LSMs
  lsm: add security_inode_setintegrity() hook
  ipe: add support for dm-verity as a trust provider
  dm-verity: expose root hash digest and signature data to LSMs
  block,lsm: add LSM blob and new LSM hooks for block devices
  ipe: add permissive toggle
  ...
2024-09-16 18:19:47 +02:00
Linus Torvalds
e8fc317dfc vfs-6.12.procfs
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZuQEwAAKCRCRxhvAZXjc
 onI2AQDXa5XhIx0VpLWE9uVImVy3QuUKc/5pI1e1DKMgxLhKCgEAh15a4ETqmVaw
 Zp3ZSzoLD8Ez1WwWb6cWQuHFYRSjtwU=
 =+LKG
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.12.procfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull procfs updates from Christian Brauner:
 "This contains the following changes for procfs:

   - Add config options and parameters to block forcing memory writes.

     This adds a Kconfig option and boot param to allow removing the
     FOLL_FORCE flag from /proc/<pid>/mem write calls as this can be
     used in various attacks.

     The traditional forcing behavior is kept as default because it can
     break GDB and some other use cases.

     This is the simpler version that you had requested.

   - Restrict overmounting of ephemeral entities.

     It is currently possible to mount on top of various ephemeral
     entities in procfs. This specifically includes magic links. To
     recap, magic links are links of the form /proc/<pid>/fd/<nr>. They
     serve as references to a target file and during path lookup they
     cause a jump to the target path. Such magic links disappear if the
     corresponding file descriptor is closed.

     Currently it is possible to overmount such magic links. This is
     mostly interesting for an attacker that wants to somehow trick a
     process into e.g., reopening something that it didn't intend to
     reopen or to hide a malicious file descriptor.

     But also it risks leaking mounts for long-running processes. When
     overmounting a magic link like above, the mount will not be
     detached when the file descriptor is closed. Only the target
     mountpoint will disappear. Which has the consequence of making it
     impossible to unmount that mount afterwards. So the mount will
     stick around until the process exits and the /proc/<pid>/ directory
     is cleaned up during proc_flush_pid() when the dentries are pruned
     and invalidated.

     That in turn means it's possible for a program to accidentally leak
     mounts and it's also possible to make a task leak mounts without
     its knowledge if the attacker just keeps overmounting things under
     /proc/<pid>/fd/<nr>.

     Disallow overmounting of such ephemeral entities.

   - Cleanup the readdir method naming in some procfs file operations.

   - Replace kmalloc() and strcpy() with a simple kmemdup() call"
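
The first item in the list above adds a boot parameter for the FOLL_FORCE
behavior. As a rough sketch of how an administrator might check which policy
was requested at boot: the parameter name proc_mem.force_override= and its
values (always, ptrace, never) are recalled from the kernel parameter
documentation and are not stated in this message, so treat them as
assumptions. The helper name proc_mem_force_policy is hypothetical.

```shell
#!/bin/sh
# Sketch only: report the FOLL_FORCE policy requested on a kernel command line.
# Assumption: the parameter is spelled proc_mem.force_override= and accepts
# always, ptrace or never. Verify against kernel-parameters.txt for your tree.

# proc_mem_force_policy CMDLINE
# Prints the last value given for the parameter, or "unset" if it is absent.
# The body runs in a subshell so "set -f" and the variables stay local.
proc_mem_force_policy() (
    set -f                      # do not glob-expand words of the command line
    policy=unset
    for word in $1; do
        case $word in
            proc_mem.force_override=*)
                policy=${word#proc_mem.force_override=} ;;
        esac
    done
    printf '%s\n' "$policy"
)

# Typical use on a running system:
#   proc_mem_force_policy "$(cat /proc/cmdline)"
```

A result of "unset" only means the parameter was not passed; the effective
behavior then depends on how the kernel was configured.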

* tag 'vfs-6.12.procfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  proc: fold kmalloc() + strcpy() into kmemdup()
  proc: block mounting on top of /proc/<pid>/fdinfo/*
  proc: block mounting on top of /proc/<pid>/fd/*
  proc: block mounting on top of /proc/<pid>/map_files/*
  proc: add proc_splice_unmountable()
  proc: proc_readfdinfo() -> proc_fdinfo_iterate()
  proc: proc_readfd() -> proc_fd_iterate()
  proc: add config & param to block forcing mem writes
2024-09-16 09:36:59 +02:00
Linus Torvalds
02824a5fd1 Power management updates for 6.12-rc1
- Remove LATENCY_MULTIPLIER from cpufreq (Qais Yousef).
 
  - Add support for Granite Rapids and Sierra Forest in OOB mode to the
    intel_pstate cpufreq driver (Srinivas Pandruvada).
 
  - Add basic support for CPU capacity scaling on x86 and make the
    intel_pstate driver set asymmetric CPU capacity on hybrid systems
    without SMT (Rafael Wysocki).
 
  - Add missing MODULE_DESCRIPTION() macros to the powerpc cpufreq
    driver (Jeff Johnson).
 
  - Several OF related cleanups in cpufreq drivers (Rob Herring).
 
  - Enable COMPILE_TEST for ARM drivers (Rob Herring).
 
  - Introduce quirks for syscon failures and use socinfo to get revision
    for TI cpufreq driver (Dhruva Gole, Nishanth Menon).
 
  - Minor cleanups in amd-pstate driver (Anastasia Belova, Dhananjay
    Ugwekar).
 
  - Minor cleanups for loongson, cpufreq-dt and powernv cpufreq drivers
    (Danila Tikhonov, Huacai Chen, and Liu Jing).
 
  - Make amd-pstate validate return of any attempt to update EPP limits,
    which fixes the masking hardware problems (Mario Limonciello).
 
  - Move the calculation of the AMD boost numerator outside of amd-pstate,
    correcting acpi-cpufreq on systems with preferred cores (Mario
    Limonciello).
 
  - Harden preferred core detection in amd-pstate to avoid potential
    false positives (Mario Limonciello).
 
  - Add extra unit test coverage for mode state machine (Mario
    Limonciello).
 
  - Fix an "Uninitialized variables" issue in amd-pstate (Qianqiang Liu).
 
  - Add Granite Rapids Xeon support to intel_idle (Artem Bityutskiy).
 
  - Disable promotion to C1E on Jasper Lake and Elkhart Lake in
    intel_idle (Kai-Heng Feng).
 
  - Use scoped device node handling to fix missing of_node_put() and
    simplify walking OF children in the riscv-sbi cpuidle driver (Krzysztof
    Kozlowski).
 
  - Remove dead code from cpuidle_enter_state() (Dhruva Gole).
 
  - Change an error pointer to NULL to fix error handling in the
    intel_rapl power capping driver (Dan Carpenter).
 
  - Fix off by one in get_rpi() in the intel_rapl power capping
    driver (Dan Carpenter).
 
  - Add support for ArrowLake-U to the intel_rapl power capping
    driver (Sumeet Pawnikar).
 
  - Fix the energy-pkg event for AMD CPUs in the intel_rapl power capping
    driver (Dhananjay Ugwekar).
 
  - Add support for AMD family 1Ah processors to the intel_rapl power
    capping driver (Dhananjay Ugwekar).
 
  - Remove unused stub for saveable_highmem_page() and remove deprecated
    macros from power management documentation (Andy Shevchenko).
 
  - Use sysfs_emit() and sysfs_emit_at() in "show" functions in the PM
    sysfs interface (Xueqin Luo).
 
  - Update the maintainers information for the operating-points-v2-ti-cpu DT
    binding (Dhruva Gole).
 
  - Drop unnecessary of_match_ptr() from ti-opp-supply (Rob Herring).
 
  - Add missing MODULE_DESCRIPTION() macros to devfreq governors (Jeff
    Johnson).
 
  - Use devm_clk_get_enabled() in the exynos-bus devfreq driver (Anand
    Moon).
 
  - Use of_property_present() instead of of_get_property() in the imx-bus
    devfreq driver (Rob Herring).
 
  - Update directory handling and installation process in the pm-graph
    Makefile and add .gitignore to ignore sleepgraph.py artifacts to
    pm-graph (Amit Vadhavana, Yo-Jung Lin).
 
  - Make cpupower display residency value in idle-info (Aboorva
    Devarajan).
 
  - Add missing powercap_set_enabled() stub function to cpupower (John
    B. Wyatt IV).
 
  - Add SWIG support to cpupower (John B. Wyatt IV).

Merge tag 'pm-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management updates from Rafael Wysocki:
 "By the number of new lines of code, the most visible change here is
  the addition of hybrid CPU capacity scaling support to the
  intel_pstate driver. Next are the amd-pstate driver changes related to
  the calculation of the AMD boost numerator and preferred core
  detection.

  As far as new hardware support is concerned, the intel_idle driver
  will now handle Granite Rapids Xeon processors natively, the
  intel_rapl power capping driver will recognize family 1Ah of AMD
  processors and Intel ArrowLake-U chips, and intel_pstate will handle
  Granite Rapids and Sierra Forest chips in the out-of-band (OOB) mode.

  Apart from the above, there is a usual collection of assorted fixes
  and code cleanups in many places and there are tooling updates.

  Specifics:

   - Remove LATENCY_MULTIPLIER from cpufreq (Qais Yousef)

   - Add support for Granite Rapids and Sierra Forest in OOB mode to the
     intel_pstate cpufreq driver (Srinivas Pandruvada)

   - Add basic support for CPU capacity scaling on x86 and make the
     intel_pstate driver set asymmetric CPU capacity on hybrid systems
     without SMT (Rafael Wysocki)

   - Add missing MODULE_DESCRIPTION() macros to the powerpc cpufreq
     driver (Jeff Johnson)

   - Several OF related cleanups in cpufreq drivers (Rob Herring)

   - Enable COMPILE_TEST for ARM drivers (Rob Herring)

   - Introduce quirks for syscon failures and use socinfo to get
     revision for TI cpufreq driver (Dhruva Gole, Nishanth Menon)

   - Minor cleanups in amd-pstate driver (Anastasia Belova, Dhananjay
     Ugwekar)

   - Minor cleanups for loongson, cpufreq-dt and powernv cpufreq drivers
     (Danila Tikhonov, Huacai Chen, and Liu Jing)

   - Make amd-pstate validate return of any attempt to update EPP
     limits, which fixes the masking hardware problems (Mario
     Limonciello)

   - Move the calculation of the AMD boost numerator outside of
     amd-pstate, correcting acpi-cpufreq on systems with preferred cores
     (Mario Limonciello)

   - Harden preferred core detection in amd-pstate to avoid potential
     false positives (Mario Limonciello)

   - Add extra unit test coverage for mode state machine (Mario
     Limonciello)

   - Fix an "Uninitialized variables" issue in amd-pstate (Qianqiang
     Liu)

   - Add Granite Rapids Xeon support to intel_idle (Artem Bityutskiy)

   - Disable promotion to C1E on Jasper Lake and Elkhart Lake in
     intel_idle (Kai-Heng Feng)

   - Use scoped device node handling to fix missing of_node_put() and
     simplify walking OF children in the riscv-sbi cpuidle driver
     (Krzysztof Kozlowski)

   - Remove dead code from cpuidle_enter_state() (Dhruva Gole)

   - Change an error pointer to NULL to fix error handling in the
     intel_rapl power capping driver (Dan Carpenter)

   - Fix off by one in get_rpi() in the intel_rapl power capping driver
     (Dan Carpenter)

   - Add support for ArrowLake-U to the intel_rapl power capping driver
     (Sumeet Pawnikar)

   - Fix the energy-pkg event for AMD CPUs in the intel_rapl power
     capping driver (Dhananjay Ugwekar)

   - Add support for AMD family 1Ah processors to the intel_rapl power
     capping driver (Dhananjay Ugwekar)

   - Remove unused stub for saveable_highmem_page() and remove
     deprecated macros from power management documentation (Andy
     Shevchenko)

   - Use sysfs_emit() and sysfs_emit_at() in "show" functions in the PM
     sysfs interface (Xueqin Luo)

   - Update the maintainers information for the
     operating-points-v2-ti-cpu DT binding (Dhruva Gole)

   - Drop unnecessary of_match_ptr() from ti-opp-supply (Rob Herring)

   - Add missing MODULE_DESCRIPTION() macros to devfreq governors (Jeff
     Johnson)

   - Use devm_clk_get_enabled() in the exynos-bus devfreq driver (Anand
     Moon)

   - Use of_property_present() instead of of_get_property() in the
     imx-bus devfreq driver (Rob Herring)

   - Update directory handling and installation process in the pm-graph
     Makefile and add .gitignore to ignore sleepgraph.py artifacts to
     pm-graph (Amit Vadhavana, Yo-Jung Lin)

   - Make cpupower display residency value in idle-info (Aboorva
     Devarajan)

   - Add missing powercap_set_enabled() stub function to cpupower (John
     B. Wyatt IV)

   - Add SWIG support to cpupower (John B. Wyatt IV)"

* tag 'pm-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (62 commits)
  cpufreq/amd-pstate-ut: Fix an "Uninitialized variables" issue
  cpufreq/amd-pstate-ut: Add test case for mode switches
  cpufreq/amd-pstate: Export symbols for changing modes
  amd-pstate: Add missing documentation for `amd_pstate_prefcore_ranking`
  cpufreq: amd-pstate: Add documentation for `amd_pstate_hw_prefcore`
  cpufreq: amd-pstate: Optimize amd_pstate_update_limits()
  cpufreq: amd-pstate: Merge amd_pstate_highest_perf_set() into amd_get_boost_ratio_numerator()
  x86/amd: Detect preferred cores in amd_get_boost_ratio_numerator()
  x86/amd: Move amd_get_highest_perf() out of amd-pstate
  ACPI: CPPC: Adjust debug messages in amd_set_max_freq_ratio() to warn
  ACPI: CPPC: Drop check for non zero perf ratio
  x86/amd: Rename amd_get_highest_perf() to amd_get_boost_ratio_numerator()
  ACPI: CPPC: Adjust return code for inline functions in !CONFIG_ACPI_CPPC_LIB
  x86/amd: Move amd_get_highest_perf() from amd.c to cppc.c
  PM: hibernate: Remove unused stub for saveable_highmem_page()
  pm:cpupower: Add error warning when SWIG is not installed
  MAINTAINERS: Add Maintainers for SWIG Python bindings
  pm:cpupower: Include test_raw_pylibcpupower.py
  pm:cpupower: Add SWIG bindings files for libcpupower
  pm:cpupower: Add missing powercap_set_enabled() stub function
  ...
2024-09-16 07:47:50 +02:00
Linus Torvalds
114143a595 arm64 updates for 6.12
ACPI:
 * Enable PMCG erratum workaround for HiSilicon HIP10 and 11 platforms.
 * Ensure arm64-specific IORT header is covered by MAINTAINERS.
 
 CPU Errata:
 * Enable workaround for hardware access/dirty issue on Ampere-1A cores.
 
 Memory management:
 * Define PHYSMEM_END to fix a crash in the amdgpu driver.
 * Avoid tripping over invalid kernel mappings on the kexec() path.
 * Userspace support for the Permission Overlay Extension (POE) using
   protection keys.
 
 Perf and PMUs:
 * Add support for the "fixed instruction counter" extension in the CPU
   PMU architecture.
 * Extend and fix the event encodings for Apple's M1 CPU PMU.
 * Allow LSM hooks to decide on SPE permissions for physical profiling.
 * Add support for the CMN S3 and NI-700 PMUs.
 
 Confidential Computing:
 * Add support for booting an arm64 kernel as a protected guest under
   Android's "Protected KVM" (pKVM) hypervisor.
 
 Selftests:
 * Fix vector length issues in the SVE/SME sigreturn tests
 * Fix build warning in the ptrace tests.
 
 Timers:
 * Add support for PR_{G,S}ET_TSC so that 'rr' can deal with
   non-determinism arising from the architected counter.
 
 Miscellaneous:
 * Rework our IPI-based CPU stopping code to try NMIs if regular IPIs
   don't succeed.
 * Minor fixes and cleanups.

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Will Deacon:
 "The highlights are support for Arm's "Permission Overlay Extension"
  using memory protection keys, support for running as a protected guest
  on Android as well as perf support for a bunch of new interconnect
  PMUs.

  Summary:

  ACPI:
   - Enable PMCG erratum workaround for HiSilicon HIP10 and 11
     platforms.
   - Ensure arm64-specific IORT header is covered by MAINTAINERS.

  CPU Errata:
   - Enable workaround for hardware access/dirty issue on Ampere-1A
     cores.

  Memory management:
   - Define PHYSMEM_END to fix a crash in the amdgpu driver.
   - Avoid tripping over invalid kernel mappings on the kexec() path.
   - Userspace support for the Permission Overlay Extension (POE) using
     protection keys.

  Perf and PMUs:
   - Add support for the "fixed instruction counter" extension in the
     CPU PMU architecture.
   - Extend and fix the event encodings for Apple's M1 CPU PMU.
   - Allow LSM hooks to decide on SPE permissions for physical
     profiling.
   - Add support for the CMN S3 and NI-700 PMUs.

  Confidential Computing:
   - Add support for booting an arm64 kernel as a protected guest under
     Android's "Protected KVM" (pKVM) hypervisor.

  Selftests:
   - Fix vector length issues in the SVE/SME sigreturn tests
   - Fix build warning in the ptrace tests.

  Timers:
   - Add support for PR_{G,S}ET_TSC so that 'rr' can deal with
     non-determinism arising from the architected counter.

  Miscellaneous:
   - Rework our IPI-based CPU stopping code to try NMIs if regular IPIs
     don't succeed.
   - Minor fixes and cleanups"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (94 commits)
  perf: arm-ni: Fix an NULL vs IS_ERR() bug
  arm64: hibernate: Fix warning for cast from restricted gfp_t
  arm64: esr: Define ESR_ELx_EC_* constants as UL
  arm64: pkeys: remove redundant WARN
  perf: arm_pmuv3: Use BR_RETIRED for HW branch event if enabled
  MAINTAINERS: List Arm interconnect PMUs as supported
  perf: Add driver for Arm NI-700 interconnect PMU
  dt-bindings/perf: Add Arm NI-700 PMU
  perf/arm-cmn: Improve format attr printing
  perf/arm-cmn: Clean up unnecessary NUMA_NO_NODE check
  arm64/mm: use lm_alias() with addresses passed to memblock_free()
  mm: arm64: document why pte is not advanced in contpte_ptep_set_access_flags()
  arm64: Expose the end of the linear map in PHYSMEM_END
  arm64: trans_pgd: mark PTEs entries as valid to avoid dead kexec()
  arm64/mm: Delete __init region from memblock.reserved
  perf/arm-cmn: Support CMN S3
  dt-bindings: perf: arm-cmn: Add CMN S3
  perf/arm-cmn: Refactor DTC PMU register access
  perf/arm-cmn: Make cycle counts less surprising
  perf/arm-cmn: Improve build-time assertion
  ...
2024-09-16 06:55:07 +02:00
Linus Torvalds
963d0d60d6 - Add CONFIG_ option for every hw CPU mitigation. The intent is to support
configurations and scenarios where the mitigations code is irrelevant
 
 - Other small fixlets and improvements

Merge tag 'x86_bugs_for_v6.12_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 hw mitigation updates from Borislav Petkov:

 - Add CONFIG_ option for every hw CPU mitigation. The intent is to
   support configurations and scenarios where the mitigations code is
   irrelevant

 - Other small fixlets and improvements
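
As an illustration of how the per-mitigation options could be inspected in a
kernel configuration file, here is a minimal sketch. It assumes the options
share a CONFIG_MITIGATION_ prefix, which this message does not spell out, so
check arch/x86/Kconfig before relying on it. The helper name
list_mitigation_options is hypothetical.

```shell
#!/bin/sh
# Sketch only: list mitigation-related options from a kernel .config file.
# Assumption: the options use the CONFIG_MITIGATION_ prefix.

# list_mitigation_options CONFIG_FILE
# Prints every matching line, both enabled ("=y") and disabled
# ("# CONFIG_... is not set") entries.
list_mitigation_options() {
    grep -E '^(# )?CONFIG_MITIGATION_[A-Z0-9_]+(=| is not set)' "$1"
}

# Typical use, if the distribution ships the config under /boot:
#   list_mitigation_options "/boot/config-$(uname -r)"
```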

* tag 'x86_bugs_for_v6.12_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/bugs: Fix handling when SRSO mitigation is disabled
  x86/bugs: Add missing NO_SSB flag
  Documentation/srso: Document a method for checking safe RET operates properly
  x86/bugs: Add a separate config for GDS
  x86/bugs: Remove GDS Force Kconfig option
  x86/bugs: Add a separate config for SSB
  x86/bugs: Add a separate config for Spectre V2
  x86/bugs: Add a separate config for SRBDS
  x86/bugs: Add a separate config for Spectre v1
  x86/bugs: Add a separate config for RETBLEED
  x86/bugs: Add a separate config for L1TF
  x86/bugs: Add a separate config for MMIO Stable Data
  x86/bugs: Add a separate config for TAA
  x86/bugs: Add a separate config for MDS
2024-09-16 06:48:38 +02:00
Joerg Roedel
97162f6093 Merge branches 'fixes', 'arm/smmu', 'intel/vt-d', 'amd/amd-vi' and 'core' into next 2024-09-13 12:53:05 +02:00
Rafael J. Wysocki
415dff1c96 Merge branch 'pm-cpufreq'
Merge cpufreq updates for 6.12-rc1:

 - Remove LATENCY_MULTIPLIER from cpufreq (Qais Yousef).

 - Add support for Granite Rapids and Sierra Forest in OOB mode to the
   intel_pstate cpufreq driver (Srinivas Pandruvada).

 - Add basic support for CPU capacity scaling on x86 and make the
   intel_pstate driver set asymmetric CPU capacity on hybrid systems
   without SMT (Rafael Wysocki).

 - Add missing MODULE_DESCRIPTION() macros to the powerpc cpufreq
   driver (Jeff Johnson).

 - Several OF related cleanups in cpufreq drivers (Rob Herring).

 - Enable COMPILE_TEST for ARM drivers (Rob Herring).

 - Introduce quirks for syscon failures and use socinfo to get revision
   for TI cpufreq driver (Dhruva Gole, Nishanth Menon).

 - Minor cleanups in amd-pstate driver (Anastasia Belova, Dhananjay
   Ugwekar).

 - Minor cleanups for loongson, cpufreq-dt and powernv cpufreq drivers
   (Danila Tikhonov, Huacai Chen, and Liu Jing).

 - Make amd-pstate validate return of any attempt to update EPP limits,
   which fixes the masking hardware problems (Mario Limonciello).

 - Move the calculation of the AMD boost numerator outside of amd-pstate,
   correcting acpi-cpufreq on systems with preferred cores (Mario
   Limonciello).

 - Harden preferred core detection in amd-pstate to avoid potential
   false positives (Mario Limonciello).

 - Add extra unit test coverage for mode state machine (Mario
   Limonciello).

 - Fix an "Uninitialized variables" issue in amd-pstate (Qianqiang Liu).

* pm-cpufreq: (35 commits)
  cpufreq/amd-pstate-ut: Fix an "Uninitialized variables" issue
  cpufreq/amd-pstate-ut: Add test case for mode switches
  cpufreq/amd-pstate: Export symbols for changing modes
  amd-pstate: Add missing documentation for `amd_pstate_prefcore_ranking`
  cpufreq: amd-pstate: Add documentation for `amd_pstate_hw_prefcore`
  cpufreq: amd-pstate: Optimize amd_pstate_update_limits()
  cpufreq: amd-pstate: Merge amd_pstate_highest_perf_set() into amd_get_boost_ratio_numerator()
  x86/amd: Detect preferred cores in amd_get_boost_ratio_numerator()
  x86/amd: Move amd_get_highest_perf() out of amd-pstate
  ACPI: CPPC: Adjust debug messages in amd_set_max_freq_ratio() to warn
  ACPI: CPPC: Drop check for non zero perf ratio
  x86/amd: Rename amd_get_highest_perf() to amd_get_boost_ratio_numerator()
  ACPI: CPPC: Adjust return code for inline functions in !CONFIG_ACPI_CPPC_LIB
  x86/amd: Move amd_get_highest_perf() from amd.c to cppc.c
  cpufreq/amd-pstate: Catch failures for amd_pstate_epp_update_limit()
  cpufreq: ti-cpufreq: Use socinfo to get revision in AM62 family
  cpufreq: Fix the cacography in powernv-cpufreq.c
  cpufreq: ti-cpufreq: Introduce quirks to handle syscon fails appropriately
  cpufreq: loongson3: Use raw_smp_processor_id() in do_service_request()
  cpufreq: amd-pstate: add check for cpufreq_cpu_get's return value
  ...
2024-09-11 18:25:54 +02:00
Rafael J. Wysocki
9bcf30348f second round of amd-pstate changes for 6.12 (second try):
* Move the calculation of the AMD boost numerator outside of
   amd-pstate, correcting acpi-cpufreq on systems with preferred cores
 * Harden preferred core detection to avoid potential false positives
 * Add extra unit test coverage for mode state machine

Merge tag 'amd-pstate-v6.12-2024-09-11' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/superm1/linux

Merge the second round of amd-pstate changes for 6.12 from Mario
Limonciello:

"* Move the calculation of the AMD boost numerator outside of
   amd-pstate, correcting acpi-cpufreq on systems with preferred cores
 * Harden preferred core detection to avoid potential false positives
 * Add extra unit test coverage for mode state machine"

* tag 'amd-pstate-v6.12-2024-09-11' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/superm1/linux:
  cpufreq/amd-pstate-ut: Fix an "Uninitialized variables" issue
  cpufreq/amd-pstate-ut: Add test case for mode switches
  cpufreq/amd-pstate: Export symbols for changing modes
  amd-pstate: Add missing documentation for `amd_pstate_prefcore_ranking`
  cpufreq: amd-pstate: Add documentation for `amd_pstate_hw_prefcore`
  cpufreq: amd-pstate: Optimize amd_pstate_update_limits()
  cpufreq: amd-pstate: Merge amd_pstate_highest_perf_set() into amd_get_boost_ratio_numerator()
  x86/amd: Detect preferred cores in amd_get_boost_ratio_numerator()
  x86/amd: Move amd_get_highest_perf() out of amd-pstate
  ACPI: CPPC: Adjust debug messages in amd_set_max_freq_ratio() to warn
  ACPI: CPPC: Drop check for non zero perf ratio
  x86/amd: Rename amd_get_highest_perf() to amd_get_boost_ratio_numerator()
  ACPI: CPPC: Adjust return code for inline functions in !CONFIG_ACPI_CPPC_LIB
  x86/amd: Move amd_get_highest_perf() from amd.c to cppc.c
2024-09-11 18:22:23 +02:00
Mario Limonciello
15a2b764ea amd-pstate: Add missing documentation for amd_pstate_prefcore_ranking
`amd_pstate_prefcore_ranking` reflects the dynamic rankings of a CPU
core based on platform conditions.  Explicitly include it in the
documentation.

Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
2024-09-11 10:23:23 -05:00
Mario Limonciello
b96b82d1af cpufreq: amd-pstate: Add documentation for amd_pstate_hw_prefcore
Explain that the sysfs file represents both preferred core being
enabled by the user and supported by the hardware.

Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
2024-09-11 10:23:23 -05:00
Mario Limonciello
ad4caad58d cpufreq: amd-pstate: Merge amd_pstate_highest_perf_set() into amd_get_boost_ratio_numerator()
The special case in amd_pstate_highest_perf_set() is the value used
for calculating the boost numerator.  Merge this into
amd_get_boost_ratio_numerator() and then use that to calculate boost
ratio.

This allows dropping more special casing of the highest perf value.

Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
2024-09-11 10:23:23 -05:00
Thomas Gleixner
2f7eedca6c Merge branch 'linus' into timers/core
To update with the latest fixes.
2024-09-10 13:49:53 +02:00
Joerg Roedel
f0295913c4 iommu/amd: Add kernel parameters to limit V1 page-sizes
Add two new kernel command line parameters to limit the page-sizes
used for v1 page-tables:

	nohugepages     - Limits page-sizes to 4KiB

	v2_pgsizes_only - Limits page-sizes to 4KiB/2MiB/1GiB, the
	                  same sizes as used with v2 page-tables

This is needed for multiple scenarios. When assigning devices to
SEV-SNP guests the IOMMU page-sizes need to match the sizes in the RMP
table, otherwise the device will not be able to access all shared
memory.

Also, some ATS devices do not work properly with arbitrary IO
page-sizes as supported by AMD-Vi, so limiting the sizes used by the
driver is a suitable workaround.

All-in-all, these parameters are only workarounds until the IOMMU core
and related APIs gather the ability to negotiate the page-sizes in a
better way.
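
To illustrate how the two parameters described above might be detected on a
command line, here is a minimal sketch. It assumes they are passed as
comma-separated options of amd_iommu= (for example
amd_iommu=v2_pgsizes_only), which this message does not state explicitly.
The helper name amd_iommu_pgsize_limit is hypothetical.

```shell
#!/bin/sh
# Sketch only: find out whether a kernel command line limits the page-sizes
# used for AMD IOMMU v1 page-tables.
# Assumption: the options are given as amd_iommu=opt[,opt...].

# amd_iommu_pgsize_limit CMDLINE
# Prints nohugepages, v2_pgsizes_only, or none.
# The body runs in a subshell so "set -f" and the variables stay local.
amd_iommu_pgsize_limit() (
    set -f                      # do not glob-expand words of the command line
    limit=none
    for word in $1; do
        case $word in
            amd_iommu=*)
                # Wrap the option list in commas so each option can be
                # matched as a whole word.
                case ",${word#amd_iommu=}," in
                    *,nohugepages,*)     limit=nohugepages ;;
                    *,v2_pgsizes_only,*) limit=v2_pgsizes_only ;;
                esac
                ;;
        esac
    done
    printf '%s\n' "$limit"
)

# Typical use on a running system:
#   amd_iommu_pgsize_limit "$(cat /proc/cmdline)"
```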

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Link: https://lore.kernel.org/r/20240905072240.253313-1-joro@8bytes.org
2024-09-10 11:48:57 +02:00
Sergey Senozhatsky
e899007a5e zram: support priority parameter in recompression
The recompress device attribute supports an algo=NAME parameter so that we
can specify the one particular algorithm we want to perform recompression
with.  However, with algorithm parameters we can now have several identical
secondary algorithms, each with its own tuning (e.g.  priority 1
configured to use a more aggressive level, and priority 2 configured to use
a pre-trained dictionary).  Support a priority=NUM parameter so that we can
correctly determine which secondary algorithm we want to use.
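
A minimal sketch of how the priority parameter might be used from a script.
The attribute path /sys/block/zram0/recompress and the type=huge selector
follow the zram admin-guide as recalled here and are assumptions, not
something this message states. The helper name zram_recompress is
hypothetical.

```shell
#!/bin/sh
# Sketch only: trigger recompression with the secondary algorithm registered
# under a given priority.
# Assumptions: the attribute accepts "type=<...> priority=<N>", and "huge" is
# a valid type on the running kernel.

# zram_recompress ATTR_PATH PRIORITY [TYPE]
# Writes the request to ATTR_PATH; TYPE defaults to "huge".
zram_recompress() {
    printf 'type=%s priority=%s\n' "${3:-huge}" "$2" > "$1"
}

# Typical use (needs root and a configured zram0 device):
#   zram_recompress /sys/block/zram0/recompress 1
```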

Link: https://lkml.kernel.org/r/20240902105656.1383858-25-senozhatsky@chromium.org
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nick Terrell <terrelln@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-09 16:39:12 -07:00
Sergey Senozhatsky
97ee4842f2 Documentation/zram: add documentation for algorithm parameters
Briefly document the compression algorithms' parameters:
compression level and pre-trained dictionary.

[senozhatsky@chromium.org: trivial fixup]
  Link: https://lkml.kernel.org/r/20240903063722.1603592-1-senozhatsky@chromium.org
Link: https://lkml.kernel.org/r/20240902105656.1383858-24-senozhatsky@chromium.org
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nick Terrell <terrelln@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-09 16:39:11 -07:00
Sergey Senozhatsky
4eac932103 zram: introduce algorithm_params device attribute
This attribute is used to set up compression algorithms' parameters, so we
can tweak algorithms' characteristics.  At this point only 'level' is
supported (to be extended in the future).

Each call sets up parameters for one particular algorithm, which should be
specified either by the algorithm's priority or by its name.  This is
expected to be called after the corresponding algorithm is selected via
comp_algorithm or recomp_algorithm.

 echo "priority=0 level=1" > /sys/block/zram0/algorithm_params
or
 echo "algo=zstd level=1" > /sys/block/zram0/algorithm_params
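
The two commands above can be wrapped in a small helper that rejects
selectors other than the two forms shown. This is only a sketch; the
function name zram_set_level is hypothetical, and the attribute path is
passed in so the helper can be tried against an ordinary file first.

```shell
#!/bin/sh
# Sketch only: set the compression level for one zram algorithm.

# zram_set_level ATTR_PATH SELECTOR LEVEL
# SELECTOR must be "priority=N" or "algo=NAME", matching the two forms in
# the commit message. Returns 1 without writing if the selector is malformed.
zram_set_level() {
    case $2 in
        priority=*|algo=*) ;;
        *)
            echo "selector must be priority=N or algo=NAME" >&2
            return 1
            ;;
    esac
    printf '%s level=%s\n' "$2" "$3" > "$1"
}

# Typical use (needs root, after the algorithm has been selected):
#   zram_set_level /sys/block/zram0/algorithm_params algo=zstd 1
```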

Link: https://lkml.kernel.org/r/20240902105656.1383858-16-senozhatsky@chromium.org
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nick Terrell <terrelln@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-09 16:39:09 -07:00
Sergey Senozhatsky
917a59e81c zram: introduce custom comp backends API
Moving to a custom backends implementation gives us the ability to have our
own minimalistic and extendable API, and makes algorithm tuning possible.

The list of compression backends is empty at this point; we will add
backends in the follow-up patches.

Link: https://lkml.kernel.org/r/20240902105656.1383858-5-senozhatsky@chromium.org
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nick Terrell <terrelln@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-09 16:39:07 -07:00
Usama Arif
81d3ff3c6f mm: add sysfs entry to disable splitting underused THPs
If disabled, THPs faulted in or collapsed will not be added to
_deferred_list, and therefore won't be considered for splitting under
memory pressure if underused.

Link: https://lkml.kernel.org/r/20240830100438.3623486-7-usamaarif642@gmail.com
Signed-off-by: Usama Arif <usamaarif642@gmail.com>
Cc: Alexander Zhu <alexlzhu@fb.com>
Cc: Barry Song <baohua@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kairui Song <ryncsn@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nico Pache <npache@redhat.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Shuang Zhai <zhais@google.com>
Cc: Shuang Zhai <szhai2@cs.rochester.edu>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-09 16:39:04 -07:00
Usama Arif
dafff3f4c8 mm: split underused THPs
This is an attempt to mitigate the issue of running out of memory when THP
is always enabled.  During runtime whenever a THP is being faulted in
(__do_huge_pmd_anonymous_page) or collapsed by khugepaged
(collapse_huge_page), the THP is added to _deferred_list.  Whenever memory
reclaim happens in linux, the kernel runs the deferred_split shrinker
which goes through the _deferred_list.

If the folio was partially mapped, the shrinker attempts to split it.  If
the folio is not partially mapped, the shrinker checks if the THP was
underused, i.e.  how many of the base 4K pages of the entire THP were
zero-filled.  If this number goes above a certain threshold (decided by
/sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_none), the
shrinker will attempt to split that THP.  Then at remap time, the pages
that were zero-filled are mapped to the shared zeropage, hence saving
memory.
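
The threshold rule described above can be illustrated with a tiny sketch.
The numbers are hypothetical: a 2 MiB THP made of 4 KiB base pages has 512
of them, and 409 is just an example value for max_ptes_none. The function
name thp_is_underused is hypothetical, and this models only the comparison,
not the kernel's actual page scanning.

```shell
#!/bin/sh
# Sketch only: the "underused" test as described in the commit message.

# thp_is_underused ZERO_FILLED_PAGES MAX_PTES_NONE
# Succeeds (exit status 0) when the number of zero-filled base pages is
# above the threshold, fails otherwise.
thp_is_underused() {
    [ "$1" -gt "$2" ]
}

# Example with hypothetical numbers: 450 of 512 base pages are zero-filled
# and the threshold is 409, so the THP would be a candidate for splitting.
if thp_is_underused 450 409; then
    echo "underused: candidate for splitting"
fi

# On a running system the threshold could be read with:
#   cat /sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_none
```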

Link: https://lkml.kernel.org/r/20240830100438.3623486-6-usamaarif642@gmail.com
Signed-off-by: Usama Arif <usamaarif642@gmail.com>
Suggested-by: Rik van Riel <riel@surriel.com>
Co-authored-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Alexander Zhu <alexlzhu@fb.com>
Cc: Barry Song <baohua@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kairui Song <ryncsn@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nico Pache <npache@redhat.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Shuang Zhai <zhais@google.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Shuang Zhai <szhai2@cs.rochester.edu>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-09 16:39:04 -07:00
SeongJae Park
23a425aab0 Docs/damon: use damonitor GitHub organization instead of awslabs
Patch series "Docs/damon: update GitHub repo URLs and maintainer-profile".

Replace GitHub URLs in DAMON documents for the non-kernel parts of the DAMON
repos with new ones[1] via the first patch.  The following two patches
wordsmith maintainer-profile for better readability and document the
Google Calendar for bi-weekly meetups, respectively.

[1] https://lore.kernel.org/20240813232158.83903-1-sj@kernel.org


This patch (of 3):

GitHub repos for non-kernel parts of DAMON project including 'damo',
'damon-tests' and 'damoos' will be moved[1] from 'awslabs' org to
'damonitor', by 2024-09-05.  Update related URLs in kernel tree.

[1] https://lore.kernel.org/20240813232158.83903-1-sj@kernel.org

Link: https://lkml.kernel.org/r/20240826015741.80707-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20240826015741.80707-2-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Alex Shi <alexs@kernel.org>
Cc: Hu Haowen <2023002089@link.tyut.edu.cn>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Yanteng Si <siyanteng@loongson.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-09 16:39:00 -07:00
Barry Song
8175ebfd30 mm: count the number of partially mapped anonymous THPs per size
When a THP is added to the deferred_list due to being partially mapped, its
partial pages are unused, leading to wasted memory and potentially
increasing memory reclamation pressure.

Detailing the specifics of how unmapping occurs is quite difficult and not
that useful, so we adopt a simple approach: each time a THP enters the
deferred_list, we increment the count by 1; whenever it leaves for any
reason, we decrement the count by 1.
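
The accounting rule above amounts to a per-size gauge.  A minimal model,
assuming a hypothetical DeferredList class keyed by folio order (none of
these names come from the kernel):

```python
from collections import Counter


class DeferredList:
    """Toy model: count THPs currently on the deferred list, per order."""

    def __init__(self):
        self._orders = {}                      # folio id -> folio order
        self.nr_partially_mapped = Counter()   # order -> current count

    def add(self, folio_id, order):
        # Increment only when the folio actually enters the list.
        if folio_id not in self._orders:
            self._orders[folio_id] = order
            self.nr_partially_mapped[order] += 1

    def remove(self, folio_id):
        # Decrement whenever the folio leaves, whatever the reason
        # (split, free, migration, ...).
        order = self._orders.pop(folio_id, None)
        if order is not None:
            self.nr_partially_mapped[order] -= 1


dl = DeferredList()
dl.add("folio-a", 4)
dl.add("folio-b", 4)
dl.add("folio-a", 4)   # already queued, so not counted twice
dl.remove("folio-b")
```

After this sequence, one order-4 folio remains on the list.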

Link: https://lkml.kernel.org/r/20240824010441.21308-3-21cnbao@gmail.com
Signed-off-by: Barry Song <v-songbaohua@oppo.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Chris Li <chrisl@kernel.org>
Cc: Chuanhua Han <hanchuanhua@oppo.com>
Cc: Kairui Song <kasong@tencent.com>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Lance Yang <ioworker0@gmail.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shuai Yuan <yuanshuai@oppo.com>
Cc: Usama Arif <usamaarif642@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-09 16:38:57 -07:00
Barry Song
5d65c8d758 mm: count the number of anonymous THPs per size
Patch series "mm: count the number of anonymous THPs per size", v4.

Knowing the number of transparent anon THPs in the system is crucial
for performance analysis. It helps in understanding the ratio and
distribution of THPs versus small folios throughout the system.

Additionally, partial unmapping by userspace can lead to significant waste
of THPs over time and increase memory reclamation pressure. We need this
information for comprehensive system tuning.


This patch (of 2):

Let's track for each anonymous THP size, how many of them are currently
allocated.  We'll track the complete lifespan of an anon THP, starting
when it becomes an anon THP ("large anon folio") (->mapping gets set),
until it gets freed (->mapping gets cleared).

Introduce a new "nr_anon" counter per THP size and adjust the
corresponding counter in the following cases:
* We allocate a new THP and call folio_add_new_anon_rmap() to map
   it the first time and turn it into an anon THP.
* We split an anon THP into multiple smaller ones.
* We migrate an anon THP, when we prepare the destination.
* We free an anon THP back to the buddy.

Note that AnonPages in /proc/meminfo currently tracks the total number of
*mapped* anonymous *pages*, and therefore has slightly different
semantics.  In the future, we might also want to track "nr_anon_mapped"
for each THP size, which might be helpful when comparing it to the number
of allocated anon THPs (long-term pinning, stuck in swapcache, memory
leaks, ...).

Further note that for now, we only track anon THPs after they got their
->mapping set, for example via folio_add_new_anon_rmap().  If we would
allocate some in the swapcache, they will only show up in the statistics
for now after they have been mapped to user space the first time, where we
call folio_add_new_anon_rmap().

[akpm@linux-foundation.org: documentation fixups, per David]
  Link: https://lkml.kernel.org/r/3e8add35-e26b-443b-8a04-1078f4bc78f6@redhat.com
Link: https://lkml.kernel.org/r/20240824010441.21308-1-21cnbao@gmail.com
Link: https://lkml.kernel.org/r/20240824010441.21308-2-21cnbao@gmail.com
Signed-off-by: Barry Song <v-songbaohua@oppo.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Chris Li <chrisl@kernel.org>
Cc: Chuanhua Han <hanchuanhua@oppo.com>
Cc: Kairui Song <kasong@tencent.com>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Lance Yang <ioworker0@gmail.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shuai Yuan <yuanshuai@oppo.com>
Cc: Usama Arif <usamaarif642@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-09 16:38:57 -07:00
Anna-Maria Behnsen
bd7c8ff9fe treewide: Fix wrong singular form of jiffies in comments
There are several comments all over the place that use a wrong singular
form of jiffies.

Replace 'jiffie' by 'jiffy'. No functional change.

Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> # m68k
Link: https://lore.kernel.org/all/20240904-devel-anna-maria-b4-timers-flseep-v1-3-e98760256370@linutronix.de
2024-09-08 20:47:40 +02:00
Neeraj Upadhyay
355debb83b Merge branches 'context_tracking.15.08.24a', 'csd.lock.15.08.24a', 'nocb.09.09.24a', 'rcutorture.14.08.24a', 'rcustall.09.09.24a', 'srcu.12.08.24a', 'rcu.tasks.14.08.24a', 'rcu_scaling_tests.15.08.24a', 'fixes.12.08.24a' and 'misc.11.08.24a' into next.09.09.24a 2024-09-09 00:09:47 +05:30
Robin Murphy
4d5a7680f2 perf: Add driver for Arm NI-700 interconnect PMU
The Arm NI-700 Network-on-Chip Interconnect has a relatively
straightforward design with a hierarchy of voltage, power, and clock
domains, where each clock domain then contains a number of interface
units and a PMU which can monitor events thereon. As such, it begets a
relatively straightforward driver to interface those PMUs with perf.

Even more so than with arm-cmn, users will require detailed knowledge of
the wider system topology in order to meaningfully analyse anything,
since the interconnect itself cannot know what lies beyond the boundary
of each inscrutably-numbered interface. Given that, for now they are
also expected to refer to the NI-700 documentation for the relevant
event IDs to provide as well. An identifier is implemented so we can
come back and add jevents if anyone really wants to.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/9933058d0ab8138c78a61cd6852ea5d5ff48e393.1725470837.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2024-09-06 12:58:28 +01:00
Guilherme G. Piccoli
e04eb52bfa Documentation: Document the kernel flag bdev_allow_write_mounted
Commit ed5cc702d3 ("block: Add config option to not allow writing to mounted
devices") added a Kconfig option along with a kernel command-line tuning to
control writes to mounted block devices, as a means to deal with fuzzers like
Syzkaller, which provoke kernel crashes by writing directly to block devices,
bypassing the filesystem (so the FS has no awareness and cannot cope with that).

The patch just missed adding such kernel command-line option to the kernel
documentation, so let's fix that.

Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Darrick J. Wong <djwong@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20240828145045.309835-1-gpiccoli@igalia.com
2024-09-05 14:18:28 -06:00
Jonathan Corbet
d224338aa1 Linux 6.11-rc6
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmbUG7oeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiG7LUH/26M4QJ5UGJHsehd
 bbHlE4or0jibFyMbUiYDOElqLITjCVH6mi3Kv3E7sfyLxSsglVRRNzLCTq/UgTf8
 E1L90q4wCySElzzIhH6cltuQdAhs7pRWs5BETByvIW+g+ayN0LZxUPbvB8yl/nOU
 Zx8flBEuM2isuRlnx+iRccbf2PxNadSkSYg2TlmZr8mfFKCiRxjU7x355Q3UcylQ
 b8S2jVgq69CSDF3IBOzwHZjdq5OceDsO8he0KcfSTvSgyFMcwhntAT397YEnFXnk
 KKjKPNCu3KqHtTxsi4Sc0wOxVcgctDv4OPethaL8yROQ7jdBTkvNpPT1yMf7bca8
 ZLpSo5Y=
 =TBcj
 -----END PGP SIGNATURE-----

Merge tag 'v6.11-rc6' into docs-mw

This is done primarily to get a docs build fix merged via another tree so
that "make htmldocs" stops failing.
2024-09-05 14:01:38 -06:00
Hans Verkuil
056f2821b6 media: cec: extron-da-hd-4k-plus: add the Extron DA HD 4K Plus CEC driver
Add support for the Extron DA HD 4K Plus series of 4K HDMI
Distribution Amplifiers (aka HDMI Splitters).

These devices support CEC and this driver adds support for the
CEC protocol for both the input and all outputs (2, 4 or 6 outputs,
depending on the model).

It also exports the EDID from the outputs and allows reading and
setting the EDID of the input.

Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2024-09-05 20:13:41 +02:00
Hans de Goede
56d8b784c5 hwmon fixes for v6.11-rc7
hp-wmi-sensors: Check if WMI event data exists before accessing it
 
 ltc2991: fix register bits defines
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEiHPvMQj9QTOCiqgVyx8mb86fmYEFAmbYrGEACgkQyx8mb86f
 mYERyw//V+MQSPf+wmRcM7gU0+u+Ylbl98wbyaO6+YyTiq2XQJgbrk2AnNJFxqu9
 tkv2Xb3xFu1EB1V6Xz5kADzaP6Wp9W+4TK+PrlN0GCa6FZZwbGZBuhIkLS1492ex
 Zf8oRUNNiRnIAHb+5+E0aonDbsHSWe7Ff5WVTRfKBn8rrQW8pchPx4h/Y+o54Llo
 BEOQTA0rN51eg0RL7e9ECFUmBLRNuZA6E1w25F15VA50O4Xx2i6lL7PwPDb7KKpa
 g6fy2JTtr7+hKOiJMR4KdodsOZHvW4RgZ0MvWQkOZsccHw8CLp8c4egIxPYQ41Zu
 Ef4imKHPDVsiSXHW6rjQro0IQWBvRu1/6R1ibbDlXTWN5dkxGaByb0tlrRo3QBQn
 CvoHqkQkhioaX2k1I7bBDbyz6AregHLBJ7kpLq4ClcfW4QmxJ9zPPSDekyIyQiTZ
 vU/OljM/Je6CIbzspcaKS8uhNdykw3FX/LWLVwUEFJawJHV8b7XD+S3aJq7zQqEl
 R0hIeDyWiakP2ca7gozey1RjxpFDNQhCeZUqEaTlLhHlDykvZAceEXNuoczrpKNq
 UaBincY64PVBmzLPpjpDVg7C7C023xoqrWKSSd9kFsl9KzbKTH0XzkVPtD9cjQC5
 d34++QqhffAtJuekIbrWmtfhZsvkgQrfRv1MowNsbce0Lz7dRGE=
 =sSGh
 -----END PGP SIGNATURE-----

Merge tag 'hwmon-for-v6.11-rc7' into review-hans

Merge "hwmon fixes for v6.11-rc7" into review-hans to bring in
commit a54da9df75 ("hwmon: (hp-wmi-sensors) Check if WMI event
data exists").

This is a dependency for a set of WMI event data refactoring changes.
2024-09-05 16:57:36 +02:00
Tero Kristo
8022ae2c43 Documentation: admin-guide: pm: Add efficiency vs. latency tradeoff to uncore documentation
Add documentation about the efficiency vs. latency tradeoff control in
Intel Xeon processors, and how it is configured via sysfs.

Signed-off-by: Tero Kristo <tero.kristo@linux.intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Link: https://lore.kernel.org/r/20240828153657.1296410-2-tero.kristo@linux.intel.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2024-09-04 20:17:31 +02:00
Sean Christopherson
b4886fab6f KVM: Add a module param to allow enabling virtualization when KVM is loaded
Add an on-by-default module param, enable_virt_at_load, to let userspace
force virtualization to be enabled in hardware when KVM is initialized,
i.e. just before /dev/kvm is exposed to userspace.  Enabling virtualization
during KVM initialization allows userspace to avoid the additional latency
when creating/destroying the first/last VM (or more specifically, on the
0=>1 and 1=>0 edges of creation/destruction).

Now that KVM uses the cpuhp framework to do per-CPU enabling, the latency
could be non-trivial as the cpuhp bringup/teardown is serialized across
CPUs, e.g. the latency could be problematic for use cases that need to spin
up VMs quickly.

Prior to commit 10474ae894 ("KVM: Activate Virtualization On Demand"),
KVM _unconditionally_ enabled virtualization during load, i.e. there's no
fundamental reason KVM needs to dynamically toggle virtualization.  These
days, the only known argument for not enabling virtualization is to allow
KVM to be autoloaded without blocking other out-of-tree hypervisors, and
such use cases can simply change the module param, e.g. via command line.

Note, the aforementioned commit also mentioned that enabling SVM (AMD's
virtualization extensions) can result in "using invalid TLB entries".
It's not clear whether the changelog was referring to a KVM bug, a CPU
bug, or something else entirely.  Regardless, leaving virtualization off
by default is not a robust "fix", as any protection provided is lost the
instant userspace creates the first VM.

Reviewed-by: Chao Gao <chao.gao@intel.com>
Acked-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Tested-by: Farrah Chen <farrah.chen@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20240830043600.127750-8-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-09-04 11:02:33 -04:00
Mike Yuan
5a53623d0f Documentation/cgroup-v2: clarify that zswap.writeback is ignored if zswap is disabled
As discussed in [1], zswap-related settings naturally lose their effect
when zswap is disabled, specifically zswap.writeback here.  Be explicit
about this behavior.

[1] https://lore.kernel.org/linux-kernel/CAKEwX=Mhbwhh-=xxCU-RjMXS_n=RpV3Gtznb2m_3JgL+jzz++g@mail.gmail.com/

[akpm@linux-foundation.org: fix/simplify text]
Link: https://lkml.kernel.org/r/20240823162506.12117-3-me@yhndnzj.com
Signed-off-by: Mike Yuan <me@yhndnzj.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Yosry Ahmed <yosryahmed@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-03 21:15:48 -07:00
Kaiyang Zhao
f77f0c7514 mm,memcg: provide per-cgroup counters for NUMA balancing operations
The ability to observe the demotion and promotion decisions made by the
kernel on a per-cgroup basis is important for monitoring and tuning
containerized workloads on machines equipped with tiered memory.

Different containers in the system may experience drastically different
memory tiering actions that cannot be distinguished from the global
counters alone.

For example, a container running a workload that has much hotter memory
accesses will likely see more promotions and fewer demotions, potentially
depriving a colocated container of top tier memory to such an extent that
its performance degrades unacceptably.

For another example, some containers may exhibit longer periods between
data reuse, causing much more numa_hint_faults than numa_pages_migrated. 
In this case, tuning hot_threshold_ms may be appropriate, but the signal
can easily be lost if only global counters are available.

In the long term, we hope to introduce per-cgroup control of promotion and
demotion actions to implement memory placement policies in tiering.

This patch set adds seven counters to memory.stat in a cgroup:
numa_pages_migrated, numa_pte_updates, numa_hint_faults, pgdemote_kswapd,
pgdemote_khugepaged, pgdemote_direct and pgpromote_success.  pgdemote_*
and pgpromote_success are also available in memory.numa_stat.
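
As a rough illustration of how a monitoring tool might consume these
counters, the sketch below parses the "key value" lines of a memory.stat
file.  The function name parse_numa_counters() and the sample values are
hypothetical; only the seven counter names come from this patch.

```python
NUMA_COUNTERS = (
    "numa_pages_migrated",
    "numa_pte_updates",
    "numa_hint_faults",
    "pgdemote_kswapd",
    "pgdemote_khugepaged",
    "pgdemote_direct",
    "pgpromote_success",
)


def parse_numa_counters(stat_text: str) -> dict:
    """Pick the NUMA balancing counters out of memory.stat contents."""
    counters = {}
    for line in stat_text.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[0] in NUMA_COUNTERS:
            counters[fields[0]] = int(fields[1])
    return counters


# Made-up excerpt; a real caller would read <cgroup>/memory.stat instead.
sample = """anon 1048576
numa_pages_migrated 120
numa_pte_updates 9000
numa_hint_faults 4500
pgdemote_kswapd 30
pgpromote_success 75
"""
counters = parse_numa_counters(sample)
```

A cgroup showing many more numa_hint_faults than numa_pages_migrated, as
in this made-up sample, is the kind of signal the second example above
refers to.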

count_memcg_events_mm() is added to count multiple event occurrences at
once, and get_mem_cgroup_from_folio() is added because we need to get a
reference to the memcg of a folio before it's migrated to track
numa_pages_migrated.  The accounting of PGDEMOTE_* is moved to
shrink_inactive_list() before being changed to per-cgroup.

[kaiyang2@cs.cmu.edu: add documentation of the memcg counters in cgroup-v2.rst]
  Link: https://lkml.kernel.org/r/20240814235122.252309-1-kaiyang2@cs.cmu.edu
Link: https://lkml.kernel.org/r/20240814174227.30639-1-kaiyang2@cs.cmu.edu
Signed-off-by: Kaiyang Zhao <kaiyang2@cs.cmu.edu>
Cc: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Wei Xu <weixugc@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-03 21:15:36 -07:00
Mike Rapoport (Microsoft)
101d647080 docs: move numa=fake description to kernel-parameters.txt
NUMA emulation can be now enabled on arm64 and riscv in addition to x86.

Move the description of the numa=fake parameters from the x86 documentation
to admin-guide/kernel-parameters.txt.

Link: https://lkml.kernel.org/r/20240807064110.1003856-27-rppt@kernel.org
Suggested-by: Zi Yan <ziy@nvidia.com>
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [arm64 + CXL via QEMU]
Acked-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Rafael J. Wysocki <rafael@kernel.org>
Cc: Rob Herring (Arm) <robh@kernel.org>
Cc: Samuel Holland <samuel.holland@sifive.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-03 21:15:32 -07:00
Sourabh Jain
c91c6062d6 Document/kexec: generalize crash hotplug description
Commit 79365026f8 ("crash: add a new kexec flag for hotplug support")
generalizes the crash hotplug support to allow architectures to update
multiple kexec segments on CPU/Memory hotplug and not just elfcorehdr. 
Therefore, update the relevant kernel documentation to reflect the same.

No functional change.

Link: https://lkml.kernel.org/r/20240812041651.703156-1-sourabhjain@linux.ibm.com
Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Reviewed-by: Petr Tesarik <ptesarik@suse.com>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: Petr Tesarik <petr@tesarici.cz>
Cc: Sourabh Jain <sourabhjain@linux.ibm.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01 20:43:37 -07:00
Shakeel Butt
340afb8027 memcg: initiate deprecation of pressure_level
The pressure_level in memcg v1 provides memory pressure notifications to
the user space.  At the moment it provides notifications for three levels
of memory pressure i.e.  low, medium and critical, which are defined based
on internal memory reclaim implementation details.  More specifically the
ratio of scanned and reclaimed pages during a memory reclaim.  However
this is not robust as there are workloads with mostly unreclaimable user
memory or kernel memory.

For v2, the users can use PSI for memory pressure status of the system or
the cgroup.  Let's start the deprecation process for pressure_level and
add warnings to gather the info on how the current users are using this
interface and how they can be moved to PSI.

Link: https://lkml.kernel.org/r/20240814220021.3208384-5-shakeel.butt@linux.dev
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
Reviewed-by: T.J. Mercier <tjmercier@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Muchun Song <muchun.song@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01 20:26:21 -07:00
Shakeel Butt
6df4ad7047 memcg: initiate deprecation of oom_control
The oom_control provides functionality to disable memcg oom-killer,
notifications on oom-kill and reading the stats regarding oom-kills.  This
interface was mainly introduced to provide functionality for userspace
oom-killers.  However it is not robust enough and only supports OOM
handling in the page fault path.

For v2, the users can use the combination of memory.events notifications,
memory.high and PSI to provide userspace OOM-killing functionality. 
Actually LMKD in Android and OOMd in systemd and Meta infrastructure
already use PSI in combination with other stats to implement userspace
OOM-killing.

Let's start the deprecation process for v1 and gather the info on how the
current users are using this interface and work on providing a more robust
functionality in v2.

Link: https://lkml.kernel.org/r/20240814220021.3208384-4-shakeel.butt@linux.dev
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
Reviewed-by: T.J. Mercier <tjmercier@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Muchun Song <muchun.song@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01 20:26:21 -07:00
Shakeel Butt
569c4f62d8 memcg: initiate deprecation of v1 soft limit
Memcg v1 provides soft limit functionality for the best effort memory
sharing between multiple workloads on a system.  It is usually triggered
through kswapd and at the moment does not reclaim kernel memory.

Memcg v2 provides more straightforward best effort (memory.low) and hard
protection (memory.min) functionalities.  Let's initiate the deprecation
of soft limit from v1 and gather if v2 needs something more to move the
existing v1 users to v2 regarding soft limit.

Link: https://lkml.kernel.org/r/20240814220021.3208384-3-shakeel.butt@linux.dev
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
Reviewed-by: T.J. Mercier <tjmercier@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Muchun Song <muchun.song@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01 20:26:20 -07:00
Shakeel Butt
d046ff46ee memcg: initiate deprecation of v1 tcp accounting
Patch series "memcg: initiate deprecation of v1 features", v2.

Start the deprecation process of the memcg v1 features which we discussed
during LSFMMBPF 2024 [1].  For now add the warnings to collect the
information on how the current users are using these features.  Next we
will work on providing better alternatives in v2 (if needed) and fully
deprecate these features.

Link: https://lwn.net/Articles/974575 [1]


This patch (of 4):

Memcg v1 provides opt-in TCP memory accounting feature.  However it is
mostly unused due to its performance impact on the network traffic.  In
v2, the TCP memory is accounted in the regular memory usage and is
transparent to the users but they can observe the TCP memory usage through
memcg stats.

Let's initiate the deprecation process of memcg v1's tcp accounting
functionality and add warnings to gather if there are any users and if
there are, collect how they are using it and plan to provide them better
alternative in v2.

Link: https://lkml.kernel.org/r/20240814220021.3208384-1-shakeel.butt@linux.dev
Link: https://lkml.kernel.org/r/20240814220021.3208384-2-shakeel.butt@linux.dev
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
Reviewed-by: T.J. Mercier <tjmercier@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Muchun Song <muchun.song@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01 20:26:20 -07:00
Ryan Roberts
dd4d30d1cd mm: override mTHP "enabled" defaults at kernel cmdline
Add thp_anon= cmdline parameter to allow specifying the default enablement
of each supported anon THP size.  The parameter accepts the following
format and can be provided multiple times to configure each size:

thp_anon=<size>,<size>[KMG]:<value>;<size>-<size>[KMG]:<value>

An example:

thp_anon=16K-64K:always;128K,512K:inherit;256K:madvise;1M-2M:never

See Documentation/admin-guide/mm/transhuge.rst for more details.
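
To make the format concrete, here is a small userspace sketch that expands
such a string into a per-size policy table.  It is not the kernel's
parser: the name parse_thp_anon() is hypothetical, and it assumes that
sizes must be powers of two and that a range covers every power-of-two
size between its endpoints.

```python
import re

UNITS = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30}
VALUES = {"always", "inherit", "madvise", "never"}


def parse_size(text: str) -> int:
    """Convert '64K' or '2M' to bytes; reject non-power-of-two sizes."""
    match = re.fullmatch(r"(\d+)([KMG])?", text)
    if not match:
        raise ValueError(f"bad size: {text!r}")
    size = int(match.group(1)) * UNITS.get(match.group(2), 1)
    if size == 0 or size & (size - 1):
        raise ValueError(f"size is not a power of two: {text!r}")
    return size


def parse_thp_anon(arg: str) -> dict:
    """Expand '<sizes>:<value>;...' into a {size_in_bytes: value} dict."""
    policy = {}
    for group in arg.split(";"):
        sizes, _, value = group.partition(":")
        if value not in VALUES:
            raise ValueError(f"bad value: {value!r}")
        for item in sizes.split(","):
            low, dash, high = item.partition("-")
            start = parse_size(low)
            end = parse_size(high) if dash else start
            if start > end:
                raise ValueError(f"bad range: {item!r}")
            size = start
            while size <= end:      # walk the powers of two in the range
                policy[size] = value
                size <<= 1
    return policy


policy = parse_thp_anon(
    "16K-64K:always;128K,512K:inherit;256K:madvise;1M-2M:never")
```

For the example string above, the table ends up with eight entries: 16K,
32K and 64K set to always, 128K and 512K to inherit, 256K to madvise, and
1M and 2M to never.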

Configuring the defaults at boot time is useful to allow early user space
to take advantage of mTHP before it's been configured through sysfs.

[v-songbaohua@oppo.com: use get_order() and check size is is_power_of_2]
  Link: https://lkml.kernel.org/r/20240814224635.43272-1-21cnbao@gmail.com
[ryan.roberts@arm.com: some minor cleanup according to David's comments]
  Link: https://lkml.kernel.org/r/20240820105244.62703-1-21cnbao@gmail.com
Link: https://lkml.kernel.org/r/20240814020247.67297-1-21cnbao@gmail.com
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Co-developed-by: Barry Song <v-songbaohua@oppo.com>
Signed-off-by: Barry Song <v-songbaohua@oppo.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Tested-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Lance Yang <ioworker0@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01 20:26:18 -07:00
David Finkel
c6f53ed8f2 mm, memcg: cg2 memory{.swap,}.peak write handlers
Patch series "mm, memcg: cg2 memory{.swap,}.peak write handlers", v7.


This patch (of 2):

Other mechanisms for querying the peak memory usage of either a process or
v1 memory cgroup allow for resetting the high watermark.  Restore parity
with those mechanisms, but with a less racy API.

For example:
 - Any write to memory.max_usage_in_bytes in a cgroup v1 mount resets
   the high watermark.
 - writing "5" to the clear_refs pseudo-file in a process's proc
   directory resets the peak RSS.

This change is an evolution of a previous patch, which mostly copied the
cgroup v1 behavior, however, there were concerns about races/ownership
issues with a global reset, so instead this change makes the reset
filedescriptor-local.

Writing any non-empty string to the memory.peak and memory.swap.peak
pseudo-files resets the high watermark to the current usage for subsequent
reads through that same FD.

Notably, following Johannes's suggestion, this implementation moves the
O(FDs that have written) behavior onto the FD write(2) path.  Instead, on
the page-allocation path, we simply add one additional watermark to
conditionally bump per-hierarchy level in the page-counter.
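
One way to picture the FD-local reset is the toy model below.  It is a
sketch under assumptions, not the kernel implementation: the class and
attribute names are hypothetical, and a single counter stands in for one
level of the hierarchy.  The charge path only bumps two watermarks, while
the write path does the work proportional to the number of FDs that have
written.

```python
class PageCounter:
    """Toy counter with an all-time peak and a peak since the last reset."""

    def __init__(self):
        self.usage = 0
        self.watermark = 0         # all-time high watermark
        self.local_watermark = 0   # peak since the latest reset by any FD
        self.writers = []          # FDs that have written at least once

    def charge(self, pages):
        self.usage += pages
        # Nested checks: local_watermark never exceeds watermark, so the
        # common case costs a single comparison.
        if self.usage > self.local_watermark:
            self.local_watermark = self.usage
            if self.usage > self.watermark:
                self.watermark = self.usage

    def uncharge(self, pages):
        self.usage -= pages


class PeakFile:
    """Toy model of one open file descriptor on memory.peak."""

    def __init__(self, counter):
        self.counter = counter
        self.value = None          # None until this FD has written

    def write(self, data):
        if not data:
            raise ValueError("empty write")
        c = self.counter
        # Fold the current local peak into every FD that reset earlier.
        for fd in c.writers:
            fd.value = max(fd.value, c.local_watermark)
        c.local_watermark = c.usage
        self.value = c.usage
        if self not in c.writers:
            c.writers.append(self)

    def read(self):
        if self.value is None:     # never reset: report the global peak
            return self.counter.watermark
        return max(self.value, self.counter.local_watermark)


counter = PageCounter()
fd_a, fd_b = PeakFile(counter), PeakFile(counter)
counter.charge(100)
counter.uncharge(60)
fd_a.write("reset")                # fd_a now tracks peaks from usage 40
counter.charge(30)                 # usage 70
counter.uncharge(50)               # usage 20
fd_b.write("reset")                # fd_b now tracks peaks from usage 20
counter.charge(10)                 # usage 30
```

In this run fd_a reads 70, fd_b reads 30, and an FD that never wrote would
still read the all-time peak of 100.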

Additionally, this takes Longman's suggestion of nesting the
page-charging-path checks for the two watermarks to reduce the number of
common-case comparisons.

This behavior is particularly useful for work scheduling systems that need
to track memory usage of worker processes/cgroups per-work-item.  Since
memory can't be squeezed like CPU can (the OOM-killer has opinions), these
systems need to track the peak memory usage to compute system/container
fullness when binpacking workitems.

Most notably, Vimeo's use-case involves a system that's doing global
binpacking across many Kubernetes pods/containers, and while we can use
PSI for some local decisions about overload, we strive to avoid packing
workloads too tightly in the first place.  To facilitate this, we track
the peak memory usage.  However, since we run with long-lived workers (to
amortize startup costs) we need a way to track the high watermark while a
work-item is executing.  Polling runs the risk of missing short spikes
that last for timescales below the polling interval, and peak memory
tracking at the cgroup level is otherwise perfect for this use-case.

As this data is used to ensure that binpacked work ends up with sufficient
headroom, this use-case mostly avoids the inaccuracies surrounding
reclaimable memory.

Link: https://lkml.kernel.org/r/20240730231304.761942-1-davidf@vimeo.com
Link: https://lkml.kernel.org/r/20240729143743.34236-1-davidf@vimeo.com
Link: https://lkml.kernel.org/r/20240729143743.34236-2-davidf@vimeo.com
Signed-off-by: David Finkel <davidf@vimeo.com>
Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
Suggested-by: Waiman Long <longman@redhat.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Michal Koutný <mkoutny@suse.com>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Zefan Li <lizefan.x@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01 20:25:53 -07:00
Mike Yuan
e399257349 mm/memcontrol: respect zswap.writeback setting from parent cg too
Currently, the behavior of zswap.writeback wrt.  the cgroup hierarchy
seems a bit odd.  Unlike zswap.max, it doesn't honor the value from parent
cgroups.  This surfaced when people tried to globally disable zswap
writeback, i.e.  reserve physical swap space only for hibernation [1] -
disabling zswap.writeback only for the root cgroup results in subcgroups
with zswap.writeback=1 still performing writeback.

The inconsistency became more noticeable after I introduced the
MemoryZSwapWriteback= systemd unit setting [2] for controlling the knob.
The patch assumed that the kernel would enforce the value of parent
cgroups.  It could probably be worked around from systemd's side, by going
up the slice unit tree and inheriting the value.  Yet I think it's more
sensible to make it behave consistently with zswap.max and friends.
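
The hierarchical behavior argued for here can be sketched as a walk up the
cgroup tree.  This is an illustrative model with hypothetical names
(Cgroup, writeback_allowed), not the kernel function:

```python
class Cgroup:
    """Toy cgroup node carrying only the zswap.writeback knob."""

    def __init__(self, parent=None, zswap_writeback=True):
        self.parent = parent
        self.zswap_writeback = zswap_writeback


def writeback_allowed(cgroup) -> bool:
    """Writeback is allowed only if no ancestor (or the cgroup itself)
    has disabled zswap.writeback."""
    node = cgroup
    while node is not None:
        if not node.zswap_writeback:
            return False
        node = node.parent
    return True


root = Cgroup(zswap_writeback=False)              # disabled at the top
child = Cgroup(parent=root, zswap_writeback=True)
other_root = Cgroup()
other_child = Cgroup(parent=other_root)
```

Under this model, child is denied writeback even though its own knob is 1,
which is the behavior the patch wants from the kernel.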

[1] https://wiki.archlinux.org/title/Power_management/Suspend_and_hibernate#Disable_zswap_writeback_to_use_the_swap_space_only_for_hibernation
[2] https://github.com/systemd/systemd/pull/31734

Link: https://lkml.kernel.org/r/20240823162506.12117-1-me@yhndnzj.com
Fixes: 501a06fe8e ("zswap: memcontrol: implement zswap writeback disabling")
Signed-off-by: Mike Yuan <me@yhndnzj.com>
Reviewed-by: Nhat Pham <nphamcs@gmail.com>
Acked-by: Yosry Ahmed <yosryahmed@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01 17:59:02 -07:00
Yicong Yang
d1c93d5c67 drivers/perf: hisi_pcie: Export supported Root Ports [bdf_min, bdf_max]
Currently users can find the Root Ports supported by a PCIe PMU through
the "bus" sysfs attribute, which indicates the PCIe bus number where the
Root Ports are located. This may be insufficient, since Root Ports
supported by different PCIe PMUs may be located on the same PCIe bus.
So additionally export the BDF range of the supported Root Ports.
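
To see why a BDF range disambiguates where a bus number cannot, here is a
small sketch. It assumes the conventional 16-bit BDF packing (8-bit bus,
5-bit device, 3-bit function); encode_bdf(), pmu_covers() and the two
example ranges are hypothetical and are not taken from the driver.

```python
def encode_bdf(bus: int, device: int, function: int) -> int:
    """Pack bus/device/function into the conventional 16-bit BDF value."""
    if not (0 <= bus <= 0xFF and 0 <= device <= 0x1F and 0 <= function <= 0x7):
        raise ValueError("bus, device, or function out of range")
    return (bus << 8) | (device << 3) | function


def pmu_covers(bdf: int, bdf_min: int, bdf_max: int) -> bool:
    """Check whether a Root Port's BDF lies in a PMU's [bdf_min, bdf_max]."""
    return bdf_min <= bdf <= bdf_max


# Two hypothetical PMUs whose Root Ports share bus 0x00.
pmu_a = (encode_bdf(0x00, 0x00, 0), encode_bdf(0x00, 0x0F, 7))
pmu_b = (encode_bdf(0x00, 0x10, 0), encode_bdf(0x00, 0x1F, 7))

port = encode_bdf(0x00, 0x12, 0)
print(pmu_covers(port, *pmu_a), pmu_covers(port, *pmu_b))  # False True
```

Both PMUs would report the same "bus" value here, yet only one range
contains the port.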

Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20240829090332.28756-4-yangyicong@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
2024-08-30 11:43:10 +01:00
Adrian Ratiu
41e8149c88 proc: add config & param to block forcing mem writes
This adds a Kconfig option and boot param to allow removing
the FOLL_FORCE flag from /proc/pid/mem write calls because
it can be abused.

The traditional forcing behavior is kept as default because
it can break GDB and some other use cases.

Previously we tried a more sophisticated approach allowing
distributions to fine-tune /proc/pid/mem behavior, however
that got NAK-ed by Linus [1], who prefers this simpler
approach with semantics also easier to understand for users.

Link: https://lore.kernel.org/lkml/CAHk-=wiGWLChxYmUA5HrT5aopZrB7_2VTa0NLZcxORgkUe5tEQ@mail.gmail.com/ [1]
Cc: Doug Anderson <dianders@chromium.org>
Cc: Jeff Xu <jeffxu@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Christian Brauner <brauner@kernel.org>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com>
Link: https://lore.kernel.org/r/20240802080225.89408-1-adrian.ratiu@collabora.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-30 08:19:43 +02:00
Borislav Petkov (AMD)
4015350525 Documentation/srso: Document a method for checking safe RET operates properly
Add a method to quickly verify whether safe RET operates properly on
a given system using perf tool.

Also, add a selftest which does the same thing.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20240731160531.28640-1-bp@kernel.org
2024-08-27 09:16:35 +02:00
Stefan Tauner
736c24e62e Documentation: ext4.rst: remove obsolete descriptions of noacl/nouser_xattr options
These have been deprecated for a decade[1] and removed two years ago[2].
1: f70486055e
2: 2d544ec923

Signed-off-by: Stefan Tauner <stefan.tauner@gmx.at>
Link: https://patch.msgid.link/20240728003433.2566649-1-stefan.tauner@gmx.at
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-08-26 23:40:06 -04:00
Jani Nikula
cd0403adea Documentation: admin-guide: direct people to bug trackers, if specified
Update bug reporting info in bug-hunting.rst to direct people to
driver/subsystem bug trackers, if explicitly specified with the
MAINTAINERS "B:" entry. Use the new get_maintainer.pl --bug option to
print the info.

Cc: Joe Perches <joe@perches.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20240815113450.3397499-2-jani.nikula@intel.com
2024-08-26 16:10:12 -06:00
Thorsten Leemhuis
cbbdb6c625 docs: bug-bisect: rewrite to better match the other bisecting text
Rewrite the short document on bisecting kernel bugs. The new text
improves .config handling, brings a mention of 'git bisect skip', and
explains what to do after the bisection finished -- including trying a
revert to verify the result. The rewrite at the same time removes the
unrelated and outdated section on 'Devices not appearing' and replaces
some sentences about bug reporting with a pointer to the document
covering that topic in detail.

This overall brings the approach close to the one in the recently added
text Documentation/admin-guide/verify-bugs-and-bisect-regressions.rst.
As those two texts serve a similar purpose for different audiences,
mention that document in the head of this one and outline when the
other might be the better one to follow.

Signed-off-by: Thorsten Leemhuis <linux@leemhuis.info>
Reviewed-by: Petr Tesarik <petr@tesarici.cz>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/74dc0137dcc3e2c05648e885a7bc31ffd39a0890.1724312119.git.linux@leemhuis.info
2024-08-26 15:34:51 -06:00
Steven Rostedt
2fcd5aff92 tracing/Documentation: Start a document on how to debug with tracing
Add a new document Documentation/trace/debugging.rst that will hold
various ways to debug with tracing.

This initial version mentions trace_printk and how to create persistent
buffers that can last across bootups.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Vincent Donnefort <vdonnefort@google.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vineeth Pillai <vineeth@bitbyteword.org>
Cc: Beau Belgrave <beaub@linux.microsoft.com>
Cc: Alexander Graf <graf@amazon.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Ross Zwisler <zwisler@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Alexander Aring <aahringo@redhat.com>
Cc: "Luis Claudio R. Goncalves" <lgoncalv@redhat.com>
Cc: Tomas Glozar <tglozar@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "Jonathan Corbet" <corbet@lwn.net>
Link: https://lore.kernel.org/20240823014019.702433486@goodmis.org
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-08-26 13:54:08 -04:00
Steven Rostedt
9b7bdf6f6e tracing: Have trace_printk not use binary prints if boot buffer
If the persistent boot mapped ring buffer is used for trace_printk(),
force it to not use the binary versions. trace_printk() by default uses
bin_printf() that only saves the pointer to the format and not the format
itself inside the ring buffer. But for a persistent buffer that is read
after reboot, the pointers to the format strings may not be the same, or
worse, not even exist! Instead, just force the more robust, but slower,
version that does the formatting before saving into the ring buffer.

The boot mapped buffer can now be used for trace_printk and friends!

trace_printk() together with the persistent buffer was used to debug the
issue with the osnoise tracer:

Link: https://lore.kernel.org/all/20240822103443.6a6ae051@gandalf.local.home/

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Vincent Donnefort <vdonnefort@google.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vineeth Pillai <vineeth@bitbyteword.org>
Cc: Beau Belgrave <beaub@linux.microsoft.com>
Cc: Alexander Graf <graf@amazon.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Ross Zwisler <zwisler@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Alexander Aring <aahringo@redhat.com>
Cc: "Luis Claudio R. Goncalves" <lgoncalv@redhat.com>
Cc: Tomas Glozar <tglozar@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "Jonathan Corbet" <corbet@lwn.net>
Link: https://lore.kernel.org/20240823014019.386925800@goodmis.org
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-08-26 13:54:08 -04:00
Steven Rostedt
ddb8ea9e5a tracing: Allow trace_printk() to go to other instance buffers
Currently, trace_printk() just goes to the top level ring buffer. But
there may be times that it should go to one of the instances created by
the kernel command line.

Add a new trace_instance flag: traceprintk (also can use "printk" or
"trace_printk" as people tend to forget the actual flag name).

  trace_instance=foo^traceprintk

Will assign the trace_printk to this buffer at boot up.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Vincent Donnefort <vdonnefort@google.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vineeth Pillai <vineeth@bitbyteword.org>
Cc: Beau Belgrave <beaub@linux.microsoft.com>
Cc: Alexander Graf <graf@amazon.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Ross Zwisler <zwisler@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Alexander Aring <aahringo@redhat.com>
Cc: "Luis Claudio R. Goncalves" <lgoncalv@redhat.com>
Cc: Tomas Glozar <tglozar@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "Jonathan Corbet" <corbet@lwn.net>
Link: https://lore.kernel.org/20240823014019.226694946@goodmis.org
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-08-26 13:54:08 -04:00
Steven Rostedt
b6fc31b687 tracing: Add "traceoff" flag to boot time tracing instances
Add a "flags" delimiter (^) to the "trace_instance" kernel command line
parameter, and add the "traceoff" flag. The format is:

   trace_instance=<name>[^<flag1>[^<flag2>]][@<memory>][,<events>]

The code allows for more than one flag to be added, but currently only
"traceoff" is implemented.

The motivation for this change came from debugging with the persistent
ring buffer and having trace_printk() writing to it. The trace_printk
calls are always enabled, and the boot after the crash had the unwanted
trace_printks from the current boot injected into the ring buffer
alongside the trace_printks of the crash kernel, making the output very
confusing.
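
The parameter format given above can be illustrated with a small parser.
This is a sketch under stated assumptions, not the kernel's parsing code:
TraceInstance and parse_trace_instance() are hypothetical names, and the
event name in the example is only illustrative.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TraceInstance:
    name: str
    flags: List[str] = field(default_factory=list)
    memory: Optional[str] = None
    events: List[str] = field(default_factory=list)


def parse_trace_instance(value: str) -> TraceInstance:
    """Split <name>[^<flag>...][@<memory>][,<events>] into its parts."""
    # Everything after the first comma is the event list.
    head, _, events = value.partition(",")
    # An optional "@<memory>" follows the name and flags.
    head, _, memory = head.partition("@")
    # The name comes first; each "^" introduces one flag.
    name, *flags = head.split("^")
    return TraceInstance(
        name=name,
        flags=flags,
        memory=memory or None,
        events=[e for e in events.split(",") if e],
    )


inst = parse_trace_instance("foo^traceoff@trace,sched:sched_switch")
print(inst.name, inst.flags, inst.memory, inst.events)
```

Splitting on "," first, then "@", then "^" mirrors the order in which the
optional parts appear in the format string.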

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Vincent Donnefort <vdonnefort@google.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vineeth Pillai <vineeth@bitbyteword.org>
Cc: Beau Belgrave <beaub@linux.microsoft.com>
Cc: Alexander Graf <graf@amazon.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Ross Zwisler <zwisler@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Alexander Aring <aahringo@redhat.com>
Cc: "Luis Claudio R. Goncalves" <lgoncalv@redhat.com>
Cc: Tomas Glozar <tglozar@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "Jonathan Corbet" <corbet@lwn.net>
Link: https://lore.kernel.org/20240823014019.053229958@goodmis.org
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-08-26 13:54:08 -04:00
Krishna chaitanya chundru
96a37ec986 Documentation: dwc_pcie_pmu: Update bdf to sbdf
Update document to reflect the driver change to use sbdf instead
of bdf alone.

Signed-off-by: Krishna chaitanya chundru <quic_krichai@quicinc.com>
Reviewed-by: Yicong Yang <yangyicong@hisilicon.com>
Link: https://lore.kernel.org/r/20240816-dwc_pmu_fix-v2-2-198b8ab1077c@quicinc.com
Signed-off-by: Will Deacon <will@kernel.org>
2024-08-23 16:07:25 +01:00
Ingo Franzki
4441686b24 dm-crypt: Allow to specify the integrity key size as option
For the MAC based integrity operation, the integrity key size (i.e.
key_mac_size) is currently set to the digest size of the used digest.

For wrapped key HMAC algorithms, the key size is independent of the
cryptographic key size. So there is no known size of the mac key in
such cases. The desired key size can optionally be specified as argument
when the dm-crypt device is configured via 'integrity_key_size:%u'.
If no integrity_key_size argument is specified, the mac key size
is still set to the digest size, as before.
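
The selection rule above amounts to "explicit option if present, otherwise
the digest size". A minimal sketch, assuming the optional arguments arrive
as a list of strings; mac_key_size() is a hypothetical helper and not the
dm-crypt code itself.

```python
from typing import Iterable


def mac_key_size(optional_args: Iterable[str], digest_size: int) -> int:
    """Return the explicit integrity key size if given, else the digest size."""
    prefix = "integrity_key_size:"
    for arg in optional_args:
        if arg.startswith(prefix):
            return int(arg[len(prefix):])
    return digest_size


print(mac_key_size(["sector_size:4096"], 32))                           # 32
print(mac_key_size(["sector_size:4096", "integrity_key_size:64"], 32))  # 64
```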

Increase version number to 1.28.0 so that support for the new
argument can be detected by user space (i.e. cryptsetup).

Signed-off-by: Ingo Franzki <ifranzki@linux.ibm.com>
Reviewed-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2024-08-21 15:36:27 +02:00
Heinz Mauelshagen
5d3691a826 dm delay: enhance kernel documentation
This commit improves documentation of the dm-delay target.

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2024-08-21 13:12:12 +02:00
Bruce Johnston
47874c98dc dm vdo: add dmsetup message for returning configuration info
Add a new dmsetup message called config, which will return
useful configuration information for the vdo volume and
the uds index associated with it. The output is a YAML
string, and contains a version number to allow future
additions to the content.

Signed-off-by: Bruce Johnston <bjohnsto@redhat.com>
Signed-off-by: Matthew Sakai <msakai@redhat.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2024-08-21 13:05:56 +02:00
Deven Bowers
ac6731870e documentation: add IPE documentation
Add IPE's admin and developer documentation to the kernel tree.

Co-developed-by: Fan Wu <wufan@linux.microsoft.com>
Signed-off-by: Deven Bowers <deven.desai@linux.microsoft.com>
Signed-off-by: Fan Wu <wufan@linux.microsoft.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2024-08-20 14:03:47 -04:00
Chen Ridong
d1a92d2d6c cgroup: update some statement about delegation
The comment in cgroup_file_write is missing some interfaces, such as
'cgroup.threads'. All delegatable files are listed in
'/sys/kernel/cgroup/delegate', so update the comment in cgroup_file_write.
Besides, add a statement that files outside the namespace shouldn't be
visible from inside the delegated namespace.

tj: Reflowed text for consistency.

Signed-off-by: Chen Ridong <chenridong@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2024-08-19 12:16:17 -10:00
Steven Rostedt (Google)
29a02ec665 tracing: Allow boot instances to use reserve_mem boot memory
Allow boot instances to use memory reserved by the reserve_mem boot
option.

  reserve_mem=12M:4096:trace  trace_instance=boot_mapped@trace

The above will allocate 12 megs with 4096 alignment and label it "trace".
The second parameter will create a "boot_mapped" instance and use the
memory reserved and labeled as "trace" as the memory for the ring buffer.
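
As a rough illustration of how the two parameters relate, the sketch below
splits the reserve_mem value into size, alignment and label, then checks
that the label matches the one named after "@" in trace_instance. The
helper names are hypothetical, and the K/M/G suffixes are assumed to be
powers of 1024.

```python
def parse_size(text: str) -> int:
    """Parse a size with an optional K, M, or G suffix (powers of 1024)."""
    units = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30}
    suffix = text[-1].upper()
    if suffix in units:
        return int(text[:-1]) * units[suffix]
    return int(text)


def parse_reserve_mem(value: str):
    """Split <size>:<align>:<label> into (bytes, alignment, label)."""
    size, align, label = value.split(":")
    return parse_size(size), parse_size(align), label


size, align, label = parse_reserve_mem("12M:4096:trace")
print(size, align, label)  # 12582912 4096 trace

# The instance refers to the reservation by its label.
instance_label = "boot_mapped@trace".split("@", 1)[1]
print(instance_label == label)  # True
```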

That will create an instance called "boot_mapped":

  /sys/kernel/tracing/instances/boot_mapped

Note, because the ring buffer is using a defined memory range, it will
act just like a memory mapped ring buffer. It will not have a snapshot
buffer, as it can't swap out the buffer. The snapshot files as well as any
tracers that use a snapshot will not be present in the boot_mapped
instance.

Also note that reserve_mem is not reliable in acquiring the same physical
memory at each soft reboot. It is possible that KASLR could map the kernel
at the previous boot memory location forcing the reserve_mem to return a
different memory location. In this case, the previous ring buffer will be
lost.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ross Zwisler <zwisler@google.com>
Cc: Vincent Donnefort <vdonnefort@google.com>
Link: https://lore.kernel.org/20240815082811.669f7d8c@gandalf.local.home
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-08-15 08:34:48 -04:00
Steven Rostedt
ee057c8c19 Linux 6.11-rc3

Merge tag 'v6.11-rc3' into trace/ring-buffer/core

The "reserve_mem" kernel command line parameter has been pulled into
v6.11. Merge the latest -rc3 so that the persistent ring buffer memory can
be mapped at the address specified by the "reserve_mem" command
line parameter.

Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-08-14 16:59:28 -04:00
Paul E. McKenney
1dd01c0650 rcu: Summarize RCU CPU stall warnings during CSD-lock stalls
During CSD-lock stalls, the additional information output by RCU CPU
stall warnings is usually redundant, flooding the console for no good
reason.  However, this has been the way things work for a few years.
This commit therefore adds an rcutree.csd_lock_suppress_rcu_stall kernel
boot parameter that causes RCU CPU stall warnings to be abbreviated to
a single line when there is at least one CPU that has been stuck waiting
for CSD lock for more than five seconds.

To make this abbreviated message happen with decent probability:

tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 8 \
	--configs "2*TREE01" --kconfig "CONFIG_CSD_LOCK_WAIT_DEBUG=y" \
	--bootargs "csdlock_debug=1 rcutorture.stall_cpu=200 \
	rcutorture.stall_cpu_holdoff=120 rcutorture.stall_cpu_irqsoff=1 \
	rcutree.csd_lock_suppress_rcu_stall=1 \
	rcupdate.rcu_exp_cpu_stall_timeout=5000" --trust-make

[ paulmck: Apply kernel test robot feedback. ]

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
2024-08-15 00:10:50 +05:30
Hans Verkuil
a043ea54bb Extensible parameters support for the rkisp1 driver

Merge tag 'next-media-rkisp1-20240814' of git://git.kernel.org/pub/scm/linux/kernel/git/pinchartl/linux.git

Extensible parameters support for the rkisp1 driver.

Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
2024-08-14 17:18:47 +02:00
Paul E. McKenney
0ff92d145a doc: Remove RCU Tasks Rude asynchronous APIs
The call_rcu_tasks_rude() and rcu_barrier_tasks_rude() APIs no longer exist.
This commit therefore removes them from the documentation.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
2024-08-14 16:45:07 +05:30
Paul E. McKenney
58cb321054 rcutorture: Add a stall_cpu_repeat module parameter
This commit adds a stall_cpu_repeat module parameter, which is also the
rcutorture.stall_cpu_repeat boot parameter, to test repeated CPU stalls.
Note that only the first stall will pay attention to the stall_cpu_irqsoff
module parameter.  For the second and subsequent stalls, interrupts will
be enabled.  This is helpful when testing the interaction between RCU
CPU stall warnings and CSD-lock stall warnings.

Reported-by: Rik van Riel <riel@surriel.com>
Signed-off-by: "Paul E. McKenney" <paulmck@kernel.org>
Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
2024-08-14 16:23:40 +05:30
Martin Tůma
2b4e497c62 media: admin-guide: mgb4: Outputs DV timings documentation update
Properly document the function of the mgb4 output "frame_rate" sysfs
parameter and update the default DV timings values according to the latest
code changes.

Signed-off-by: Martin Tůma <martin.tuma@digiteqautomotive.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
2024-08-14 10:05:31 +02:00
Daniel Yang
86cfa9a85f Documentation: dm-crypt.rst warning + error fix
While building the kernel documentation using the make htmldocs command, I
was getting an unexpected indentation error. A single description was given
for two module parameters with the wrong indentation. So, I corrected the
indentation of both parameters and the description.

Signed-off-by: Shibu kumar <shibukumar.bit@gmail.com>
Signed-off-by: Daniel Yang <danielyangkang@gmail.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Fixes: 0d815e3400 ("dm-crypt: limit the size of encryption requests")
2024-08-13 16:36:39 +02:00
Jacopo Mondi
1fc379f624 media: uapi: videodev2: Add V4L2_META_FMT_RK_ISP1_EXT_PARAMS
The rkisp1 driver stores ISP configuration parameters in the fixed
rkisp1_params_cfg structure. As the members of the structure are part of
the userspace API, the structure layout is immutable and cannot be
extended further. Introducing new parameters or modifying the existing
ones would change the buffer layout and cause breakages in existing
applications.

To allow for future extensions to the ISP parameters, introduce a new
extensible parameters format, with a new format 4CC. Document usage of
the new format in the rkisp1 admin guide.

Signed-off-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com>
Reviewed-by: Daniel Scally <dan.scally@ideasonboard.com>
Reviewed-by: Paul Elder <paul.elder@ideasonboard.com>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Tested-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
2024-08-12 13:36:32 +03:00
Hans Verkuil
b669f37896 Documentation: media: vivid.rst: update TODO list
The vivid driver has supported the media controller for quite a long
time now, so drop that from the list.

Since commit 4c4dacb052 ("media: vivid: loopback based on
'Connected To' controls") making EDID changes causes correct signaling
to happen, but what is still missing is the 100 ms delay required before
signaling that there is an HPD. Modify this TODO item accordingly.

Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
2024-08-09 07:56:38 +02:00
Steven Rostedt
a7050ca724 pstore/ramoops: Fix typo as there is no "reserver"
For some reason my finger always hits the 'r' after typing "reserve".
Fix the typo in the Documentation example.

Fixes: d9d814eebb ("pstore/ramoops: Add ramoops.mem_name= command line option")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Link: https://lore.kernel.org/r/20240807170029.3c1ff651@gandalf.local.home
Signed-off-by: Kees Cook <kees@kernel.org>
2024-08-08 10:51:33 -07:00
Steve French
1b5487aefb smb3: fix setting SecurityFlags when encryption is required
Setting encryption as required in security flags was broken.
For example, trying to require all mounts to be encrypted by setting:

  "echo 0x400c5 > /proc/fs/cifs/SecurityFlags"

would return "Invalid argument" and log "Unsupported security flags".
This patch fixes that (e.g. allowing the default SecurityFlags value
0x00c5 to be overridden to include 0x40000, which requires seal, i.e.
SMB3.1.1 encryption), so that now works and forces encryption
on subsequent mounts.
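
The value in the example is simply the default flags with the seal bit
added, which can be checked with a line of arithmetic. The constant names
below are made up for illustration; only the two numeric values come from
the text above.

```python
DEFAULT_FLAGS = 0x00C5   # default SecurityFlags value quoted above
REQUIRE_SEAL = 0x40000   # "require seal" bit quoted above

flags = DEFAULT_FLAGS | REQUIRE_SEAL
print(hex(flags))  # 0x400c5
```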

Acked-by: Bharath SM <bharathsm@microsoft.com>
Cc: stable@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
2024-08-08 11:14:53 -05:00
Shibu Kumar
e9c7acd723 docs: dm-crypt: Removal of unexpected indentation error
Add the required indentation to fix this docs build error:

  Documentation/admin-guide/device-mapper/dm-crypt.rst:167: ERROR: Unexpected indentation.

Also split the documentation for read and write into separate blocks.

Signed-off-by: Shibu kumar shibukumar.bit@gmail.com
[jc: rewrote changelog]
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20240803183306.32425-1-shibukumar.bit@gmail.com
2024-08-07 13:22:24 -06:00
Sangmoon Kim
946c57e61d Documentation: kernel-parameters: add workqueue.panic_on_stall
The workqueue.panic_on_stall kernel parameter was added in commit
073107b39e ("workqueue: add cmdline parameter workqueue.panic_on_stall")
but not listed in the kernel-parameters doc. Add it there.

Signed-off-by: Sangmoon Kim <sangmoon.kim@samsung.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2024-08-07 06:16:14 -10:00
Tetsuo Handa
b88f55389a profiling: remove profile=sleep support
The kernel sleep profile is no longer working due to a recursive locking
bug introduced by commit 42a20f86dc ("sched: Add wrapper for get_wchan()
to keep task blocked")

Booting with the 'profile=sleep' kernel command line option added or
executing

  # echo -n sleep > /sys/kernel/profiling

after boot causes the system to lock up.

Lockdep reports

  kthreadd/3 is trying to acquire lock:
  ffff93ac82e08d58 (&p->pi_lock){....}-{2:2}, at: get_wchan+0x32/0x70

  but task is already holding lock:
  ffff93ac82e08d58 (&p->pi_lock){....}-{2:2}, at: try_to_wake_up+0x53/0x370

with the call trace being

   lock_acquire+0xc8/0x2f0
   get_wchan+0x32/0x70
   __update_stats_enqueue_sleeper+0x151/0x430
   enqueue_entity+0x4b0/0x520
   enqueue_task_fair+0x92/0x6b0
   ttwu_do_activate+0x73/0x140
   try_to_wake_up+0x213/0x370
   swake_up_locked+0x20/0x50
   complete+0x2f/0x40
   kthread+0xfb/0x180

However, since nobody noticed this regression for more than two years,
let's remove 'profile=sleep' support based on the assumption that nobody
needs this functionality.

Fixes: 42a20f86dc ("sched: Add wrapper for get_wchan() to keep task blocked")
Cc: stable@vger.kernel.org # v5.16+
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-08-04 13:36:28 -07:00
Waiman Long
ab03125268 cgroup: Show # of subsystem CSSes in cgroup.stat
Cgroup subsystem state (CSS) is an abstraction in the cgroup layer to
help manage different structures in various cgroup subsystems by being
an embedded element inside a larger structure like cpuset or mem_cgroup.

The /proc/cgroups file shows the number of cgroups for each of the
subsystems.  With cgroup v1, the number of CSSes is the same as the
number of cgroups.  That is not the case anymore with cgroup v2. The
/proc/cgroups file cannot show the actual number of CSSes for the
subsystems that are bound to cgroup v2.

So if a v2 cgroup subsystem is leaking cgroups (usually memory cgroup),
we can't tell by looking at /proc/cgroups which cgroup subsystems may
be responsible.

As cgroup v2 has deprecated the use of /proc/cgroups, the hierarchical
cgroup.stat file is now being extended to show the number of live and
dying CSSes associated with all the non-inhibited cgroup subsystems that
have been bound to cgroup v2. The number includes CSSes in the current
cgroup as well as in all the descendants underneath it.  This will help
us pinpoint which subsystems are responsible for the increasing number
of dying (nr_dying_descendants) cgroups.

The CSSes dying counts are stored in the cgroup structure itself
instead of inside the CSS as suggested by Johannes. This will allow
us to accurately track dying counts of cgroup subsystems that have
recently been disabled in a cgroup. It is now possible that a zero
subsystem number is coupled with a non-zero dying subsystem number.

The cgroup-v2.rst file is updated to discuss this new behavior.

With this patch applied, a sample output from the root cgroup.stat file
is shown below.

	nr_descendants 56
	nr_subsys_cpuset 1
	nr_subsys_cpu 43
	nr_subsys_io 43
	nr_subsys_memory 56
	nr_subsys_perf_event 57
	nr_subsys_hugetlb 1
	nr_subsys_pids 56
	nr_subsys_rdma 1
	nr_subsys_misc 1
	nr_dying_descendants 30
	nr_dying_subsys_cpuset 0
	nr_dying_subsys_cpu 0
	nr_dying_subsys_io 0
	nr_dying_subsys_memory 30
	nr_dying_subsys_perf_event 0
	nr_dying_subsys_hugetlb 0
	nr_dying_subsys_pids 0
	nr_dying_subsys_rdma 0
	nr_dying_subsys_misc 0

Another sample output from system.slice/cgroup.stat was:

	nr_descendants 34
	nr_subsys_cpuset 0
	nr_subsys_cpu 32
	nr_subsys_io 32
	nr_subsys_memory 34
	nr_subsys_perf_event 35
	nr_subsys_hugetlb 0
	nr_subsys_pids 34
	nr_subsys_rdma 0
	nr_subsys_misc 0
	nr_dying_descendants 30
	nr_dying_subsys_cpuset 0
	nr_dying_subsys_cpu 0
	nr_dying_subsys_io 0
	nr_dying_subsys_memory 30
	nr_dying_subsys_perf_event 0
	nr_dying_subsys_hugetlb 0
	nr_dying_subsys_pids 0
	nr_dying_subsys_rdma 0
	nr_dying_subsys_misc 0
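
Output in this shape is easy to post-process. The following sketch parses
cgroup.stat text and reports which subsystems have dying CSSes; the
function names are hypothetical and the sample is an abbreviated copy of
the system.slice output above.

```python
def parse_cgroup_stat(text: str) -> dict:
    """Turn 'key value' lines into a dict of integer counters."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def dying_subsystems(stats: dict) -> dict:
    """Return {subsystem: count} for every non-zero nr_dying_subsys_* entry."""
    prefix = "nr_dying_subsys_"
    return {
        key[len(prefix):]: value
        for key, value in stats.items()
        if key.startswith(prefix) and value > 0
    }


sample = """\
nr_descendants 34
nr_subsys_memory 34
nr_dying_descendants 30
nr_dying_subsys_cpu 0
nr_dying_subsys_memory 30
"""
print(dying_subsystems(parse_cgroup_stat(sample)))  # {'memory': 30}
```

Here all 30 dying descendants are attributable to the memory controller,
which is the kind of pinpointing the new counters are meant to allow.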

Note that the 'debug' controller wasn't used to provide this information because
the controller is not recommended in production kernels; also, many of them
won't enable CONFIG_CGROUP_DEBUG by default.

Similar information could be retrieved with debuggers like drgn but that's
also not always available (e.g. lockdown) and the additional cost of runtime
tracking here is deemed marginal.

tj: Added Michal's paragraphs on why this is not added to the debug controller
    to the commit message.

Signed-off-by: Waiman Long <longman@redhat.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Reviewed-by: Kamalesh Babulal <kamalesh.babulal@oracle.com>
Cc: Michal Koutný <mkoutny@suse.com>
Link: http://lkml.kernel.org/r/20240715150034.2583772-1-longman@redhat.com
Signed-off-by: Tejun Heo <tj@kernel.org>
2024-07-31 07:00:02 -10:00
Benjamin Poirier
1b2255db3c Documentation: Add detailed explanation for 'N' taint flag
Every taint flag has an entry in the "More detailed explanation" section
except for the 'N' flag. That omission was probably just an oversight so
add an entry for that flag.

Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com>
Acked-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20240717203521.514348-1-bpoirier@nvidia.com
2024-07-30 07:56:30 -06:00
Linus Torvalds
65ad409e63 more s390 updates for 6.11 merge window
Merge tag 's390-6.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull more s390 updates from Vasily Gorbik:

 - Fix KMSAN build breakage caused by the conflict between s390 and
   mm-stable trees

 - Add KMSAN page markers for ptdump

 - Add runtime constant support

 - Fix __pa/__va for modules under non-GPL licenses by exporting
   necessary vm_layout struct with EXPORT_SYMBOL to prevent linkage
   problems

 - Fix an endless loop in the CF_DIAG event stop in the CPU Measurement
   Counter Facility code when the counter set size is zero

 - Remove the PROTECTED_VIRTUALIZATION_GUEST config option and enable
   its functionality by default

 - Support allocation of multiple MSI interrupts per device and improve
   logging of architecture-specific limitations

 - Add support for lowcore relocation as a debugging feature to catch
   all null ptr dereferences in the kernel address space, improving
   detection beyond the current implementation's limited write access
   protection

 - Clean up and rework CPU alternatives to allow for callbacks and early
   patching for the lowcore relocation

* tag 's390-6.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (39 commits)
  s390: Remove protvirt and kvm config guards for uv code
  s390/boot: Add cmdline option to relocate lowcore
  s390/kdump: Make kdump ready for lowcore relocation
  s390/entry: Make system_call() ready for lowcore relocation
  s390/entry: Make ret_from_fork() ready for lowcore relocation
  s390/entry: Make __switch_to() ready for lowcore relocation
  s390/entry: Make restart_int_handler() ready for lowcore relocation
  s390/entry: Make mchk_int_handler() ready for lowcore relocation
  s390/entry: Make int handlers ready for lowcore relocation
  s390/entry: Make pgm_check_handler() ready for lowcore relocation
  s390/entry: Add base register to CHECK_VMAP_STACK/CHECK_STACK macro
  s390/entry: Add base register to SIEEXIT macro
  s390/entry: Add base register to MBEAR macro
  s390/entry: Make __sie64a() ready for lowcore relocation
  s390/head64: Make startup code ready for lowcore relocation
  s390: Add infrastructure to patch lowcore accesses
  s390/atomic_ops: Disable flag outputs constraint for GCC versions below 14.2.0
  s390/entry: Move SIE indicator flag to thread info
  s390/nmi: Simplify ptregs setup
  s390/alternatives: Remove alternative facility list
  ...
2024-07-26 10:47:53 -07:00
Linus Torvalds
f646429524 parisc architecture fixes and updates for kernel v6.11-rc1:
- Add gettimeofday() and clock_gettime() vDSO functions
 - Enable PCI_MSI_ARCH_FALLBACKS to allow PCI to PCIe bridge adaptor
   with PCIe NVME card to function in parisc machines
 - Allow users to reduce kernel unaligned runtime warnings
 - minor code cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQS86RI+GtKfB8BJu973ErUQojoPXwUCZqFfJgAKCRD3ErUQojoP
 X6YqAQDC6/8icgbscM6dP5m4+oQ4/Nf3qzKM12jt87sAuRUAxAD9G6uyzUxtw7xS
 qcRlVoGrc/SLI18JMi3zvs1sEPsicA8=
 =cYa2
 -----END PGP SIGNATURE-----

Merge tag 'parisc-for-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux

Pull parisc updates from Helge Deller:
 "The gettimeofday() and clock_gettime() syscalls are now available as
  vDSO functions, and Dave added a patch which allows NVMe cards to be
  used in the PCI slots as a fast and easy alternative to SCSI discs.

  Summary:

   - add gettimeofday() and clock_gettime() vDSO functions

   - enable PCI_MSI_ARCH_FALLBACKS to allow PCI to PCIe bridge adaptor
     with PCIe NVME card to function in parisc machines

   - allow users to reduce kernel unaligned runtime warnings

   - minor code cleanups"

* tag 'parisc-for-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
  parisc: Add support for CONFIG_SYSCTL_ARCH_UNALIGN_NO_WARN
  parisc: Use max() to calculate parisc_tlb_flush_threshold
  parisc: Fix warning at drivers/pci/msi/msi.h:121
  parisc: Add 64-bit gettimeofday() and clock_gettime() vDSO functions
  parisc: Add 32-bit gettimeofday() and clock_gettime() vDSO functions
  parisc: Clean up unistd.h file
2024-07-25 12:37:42 -07:00
Helge Deller
cbade82334 parisc: Add support for CONFIG_SYSCTL_ARCH_UNALIGN_NO_WARN
Allow users to disable kernel warnings for unaligned memory
accesses from kernel via the /proc/sys/kernel/ignore-unaligned-usertrap
procfs entry.
That way users can disable those warnings in case they happen too
often.
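
A minimal sketch of toggling that entry from a script, in Python. The
helper name set_ignore_unaligned is hypothetical, and the value semantics
(1 to suppress the repeated warnings, 0 to log every trap) are an
assumption taken from the generic sysctl documentation, not from this
commit. Writing the real file needs root and a kernel built with the
option.

```python
from pathlib import Path

# Path named in the commit message above.
SYSCTL = Path("/proc/sys/kernel/ignore-unaligned-usertrap")


def set_ignore_unaligned(enabled: bool, path: Path = SYSCTL) -> str:
    """Write 1 or 0 to the sysctl file and return the value read back.

    Assumption: 1 suppresses the repeated warnings, 0 restores them.
    """
    path.write_text("1\n" if enabled else "0\n")
    return path.read_text().strip()
```

Calling set_ignore_unaligned(True) as root would then be the scripted
equivalent of echoing 1 into the procfs entry.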

Signed-off-by: Helge Deller <deller@gmx.de>
2024-07-24 02:04:05 +02:00
Sven Schnelle
035248a784 s390/alternatives: Remove noaltinstr option
The current kernel doesn't boot without alternative patching on
z16 machines. To avoid such bugs in the future, remove the option
to disable alternative patching.

Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-07-23 16:02:31 +02:00
Linus Torvalds
fbc90c042c - 875fa64577da ("mm/hugetlb_vmemmap: fix race with speculative PFN
walkers") is known to cause a performance regression
   (https://lore.kernel.org/all/3acefad9-96e5-4681-8014-827d6be71c7a@linux.ibm.com/T/#mfa809800a7862fb5bdf834c6f71a3a5113eb83ff).
   Yu has a fix which I'll send along later via the hotfixes branch.
 
 - In the series "mm: Avoid possible overflows in dirty throttling" Jan
   Kara addresses a couple of issues in the writeback throttling code.
   These fixes are also targeted at -stable kernels.
 
 - Ryusuke Konishi's series "nilfs2: fix potential issues related to
   reserved inodes" does that.  This should actually be in the
   mm-nonmm-stable tree, along with the many other nilfs2 patches.  My bad.
 
 - More folio conversions from Kefeng Wang in the series "mm: convert to
   folio_alloc_mpol()"
 
 - Kemeng Shi has sent some cleanups to the writeback code in the series
   "Add helper functions to remove repeated code and improve readability of
   cgroup writeback"
 
 - Kairui Song has made the swap code a little smaller and a little
   faster in the series "mm/swap: clean up and optimize swap cache index".
 
 - In the series "mm/memory: cleanly support zeropage in
   vm_insert_page*(), vm_map_pages*() and vmf_insert_mixed()" David
   Hildenbrand has reworked the rather sketchy handling of the use of the
   zeropage in MAP_SHARED mappings.  I don't see any runtime effects here -
   more a cleanup/understandability/maintainability thing.
 
 - Dev Jain has improved selftests/mm/va_high_addr_switch.c's handling of
   higher addresses, for aarch64.  The (poorly named) series is
   "Restructure va_high_addr_switch".
 
 - The core TLB handling code gets some cleanups and possible slight
   optimizations in Bang Li's series "Add update_mmu_tlb_range() to
   simplify code".
 
 - Jane Chu has improved the handling of our
   fake-an-unrecoverable-memory-error testing feature MADV_HWPOISON in the
   series "Enhance soft hwpoison handling and injection".
 
 - Jeff Johnson has sent a billion patches everywhere to add
   MODULE_DESCRIPTION() to everything.  Some landed in this pull.
 
 - In the series "mm: cleanup MIGRATE_SYNC_NO_COPY mode", Kefeng Wang has
   simplified migration's use of hardware-offload memory copying.
 
 - Yosry Ahmed performs more folio API conversions in his series "mm:
   zswap: trivial folio conversions".
 
 - In the series "large folios swap-in: handle refault cases first",
   Chuanhua Han inches us forward in the handling of large pages in the
   swap code.  This is a cleanup and optimization, working toward the end
   objective of full support of large folio swapin/out.
 
 - In the series "mm,swap: cleanup VMA based swap readahead window
   calculation", Huang Ying has contributed some cleanups and a possible
   fixlet to his VMA based swap readahead code.
 
 - In the series "add mTHP support for anonymous shmem" Baolin Wang has
   taught anonymous shmem mappings to use multisize THP.  By default this
   is a no-op - users must opt in via sysfs controls.  Dramatic
   improvements in pagefault latency are realized.
 
 - David Hildenbrand has some cleanups to our remaining use of
   page_mapcount() in the series "fs/proc: move page_mapcount() to
   fs/proc/internal.h".
 
 - David also has some highmem accounting cleanups in the series
   "mm/highmem: don't track highmem pages manually".
 
 - Build-time fixes and cleanups from John Hubbard in the series
   "cleanups, fixes, and progress towards avoiding "make headers"".
 
 - Cleanups and consolidation of the core pagemap handling from Barry
   Song in the series "mm: introduce pmd|pte_needs_soft_dirty_wp helpers
   and utilize them".
 
 - Lance Yang's series "Reclaim lazyfree THP without splitting" has
   reduced the latency of the reclaim of pmd-mapped THPs under fairly
   common circumstances.  A 10x speedup is seen in a microbenchmark.
 
   It does this by punting to another CPU but I guess that's a win unless
   all CPUs are pegged.
 
 - hugetlb_cgroup cleanups from Xiu Jianfeng in the series
   "mm/hugetlb_cgroup: rework on cftypes".
 
 - Miaohe Lin's series "Some cleanups for memory-failure" does just that
   thing.
 
 - Is anyone reading this stuff?  If so, email me!
 
 - Someone other than SeongJae has developed a DAMON feature in Honggyu
   Kim's series "DAMON based tiered memory management for CXL memory".
   This adds DAMON features which may be used to help determine the
   efficiency of our placement of CXL/PCIe attached DRAM.
 
 - DAMON user API centralization and simplification work in SeongJae
   Park's series "mm/damon: introduce DAMON parameters online commit
   function".
 
 - In the series "mm: page_type, zsmalloc and page_mapcount_reset()"
   David Hildenbrand does some maintenance work on zsmalloc - partially
   modernizing its use of pageframe fields.
 
 - Kefeng Wang provides more folio conversions in the series "mm: remove
   page_maybe_dma_pinned() and page_mkclean()".
 
 - More cleanup from David Hildenbrand, this time in the series
   "mm/memory_hotplug: use PageOffline() instead of PageReserved() for
   !ZONE_DEVICE".  It "enlightens memory hotplug more about PageOffline()
   pages" and permits the removal of some virtio-mem hacks.
 
 - Barry Song's series "mm: clarify folio_add_new_anon_rmap() and
   __folio_add_anon_rmap()" is a cleanup to the anon folio handling in
   preparation for mTHP (multisize THP) swapin.
 
 - Kefeng Wang's series "mm: improve clear and copy user folio"
   implements more folio conversions, this time in the area of large folio
   userspace copying.
 
 - The series "Docs/mm/damon/maintainer-profile: document a mailing tool
   and community meetup series" tells people how to get better involved
   with other DAMON developers.  From SeongJae Park.
 
 - A large series ("kmsan: Enable on s390") from Ilya Leoshkevich does
   that.
 
 - David Hildenbrand sends along more cleanups, this time against the
   migration code.  The series is "mm/migrate: move NUMA hinting fault
   folio isolation + checks under PTL".
 
 - Jan Kara has found quite a lot of strangenesses and minor errors in
   the readahead code.  He addresses this in the series "mm: Fix various
   readahead quirks".
 
 - SeongJae Park's series "selftests/damon: test DAMOS tried regions and
   {min,max}_nr_regions" adds features and addresses errors in DAMON's self
   testing code.
 
 - Gavin Shan has found a userspace-triggerable WARN in the pagecache
   code.  The series "mm/filemap: Limit page cache size to that supported
   by xarray" addresses this.  The series is marked cc:stable.
 
 - Chengming Zhou's series "mm/ksm: cmp_and_merge_page() optimizations
   and cleanup" cleans up and slightly optimizes KSM.
 
 - Roman Gushchin has separated the memcg-v1 and memcg-v2 code - lots of
   code motion.  The series (which also makes the memcg-v1 code
   Kconfigurable) are
 
   "mm: memcg: separate legacy cgroup v1 code and put under config
   option" and
   "mm: memcg: put cgroup v1-specific memcg data under CONFIG_MEMCG_V1"
 
 - Dan Schatzberg's series "Add swappiness argument to memory.reclaim"
   adds an additional feature to this cgroup-v2 control file.
 
 - The series "Userspace controls soft-offline pages" from Jiaqi Yan
   permits userspace to stop the kernel's automatic treatment of excessive
   correctable memory errors, in order to permit userspace to monitor and
   handle this situation.
 
 - Kefeng Wang's series "mm: migrate: support poison recover from migrate
   folio" teaches the kernel to appropriately handle migration from
   poisoned source folios rather than simply panicking.
 
 - SeongJae Park's series "Docs/damon: minor fixups and improvements"
   does those things.
 
 - In the series "mm/zsmalloc: change back to per-size_class lock"
   Chengming Zhou improves zsmalloc's scalability and memory utilization.
 
 - Vivek Kasireddy's series "mm/gup: Introduce memfd_pin_folios() for
   pinning memfd folios" makes the GUP code use FOLL_PIN rather than bare
   refcount increments.  So these pages can first be moved aside if they
   reside in the movable zone or a CMA block.
 
 - Andrii Nakryiko has added a binary ioctl()-based API to /proc/pid/maps
   for much faster reading of vma information.  The series is "query VMAs
   from /proc/<pid>/maps".
 
 - In the series "mm: introduce per-order mTHP split counters" Lance Yang
   improves the kernel's presentation of developer information related to
   multisize THP splitting.
 
 - Michael Ellerman has developed the series "Reimplement huge pages
   without hugepd on powerpc (8xx, e500, book3s/64)".  This permits
   userspace to use all available huge page sizes.
 
 - In the series "revert unconditional slab and page allocator fault
   injection calls" Vlastimil Babka removes a performance-affecting and not
   very useful feature from slab fault injection.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZp2C+QAKCRDdBJ7gKXxA
 joTkAQDvjqOoFStqk4GU3OXMYB7WCU/ZQMFG0iuu1EEwTVDZ4QEA8CnG7seek1R3
 xEoo+vw0sWWeLV3qzsxnCA1BJ8cTJA8=
 =z0Lf
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2024-07-21-14-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

 - In the series "mm: Avoid possible overflows in dirty throttling" Jan
   Kara addresses a couple of issues in the writeback throttling code.
   These fixes are also targeted at -stable kernels.

 - Ryusuke Konishi's series "nilfs2: fix potential issues related to
   reserved inodes" does that. This should actually be in the
   mm-nonmm-stable tree, along with the many other nilfs2 patches. My
   bad.

 - More folio conversions from Kefeng Wang in the series "mm: convert to
   folio_alloc_mpol()"

 - Kemeng Shi has sent some cleanups to the writeback code in the series
   "Add helper functions to remove repeated code and improve readability
   of cgroup writeback"

 - Kairui Song has made the swap code a little smaller and a little
   faster in the series "mm/swap: clean up and optimize swap cache
   index".

 - In the series "mm/memory: cleanly support zeropage in
   vm_insert_page*(), vm_map_pages*() and vmf_insert_mixed()" David
   Hildenbrand has reworked the rather sketchy handling of the use of
   the zeropage in MAP_SHARED mappings. I don't see any runtime effects
   here - more a cleanup/understandability/maintainability thing.

 - Dev Jain has improved selftests/mm/va_high_addr_switch.c's handling
   of higher addresses, for aarch64. The (poorly named) series is
   "Restructure va_high_addr_switch".

 - The core TLB handling code gets some cleanups and possible slight
   optimizations in Bang Li's series "Add update_mmu_tlb_range() to
   simplify code".

 - Jane Chu has improved the handling of our
   fake-an-unrecoverable-memory-error testing feature MADV_HWPOISON in
   the series "Enhance soft hwpoison handling and injection".

 - Jeff Johnson has sent a billion patches everywhere to add
   MODULE_DESCRIPTION() to everything. Some landed in this pull.

 - In the series "mm: cleanup MIGRATE_SYNC_NO_COPY mode", Kefeng Wang
   has simplified migration's use of hardware-offload memory copying.

 - Yosry Ahmed performs more folio API conversions in his series "mm:
   zswap: trivial folio conversions".

 - In the series "large folios swap-in: handle refault cases first",
   Chuanhua Han inches us forward in the handling of large pages in the
   swap code. This is a cleanup and optimization, working toward the end
   objective of full support of large folio swapin/out.

 - In the series "mm,swap: cleanup VMA based swap readahead window
   calculation", Huang Ying has contributed some cleanups and a possible
   fixlet to his VMA based swap readahead code.

 - In the series "add mTHP support for anonymous shmem" Baolin Wang has
   taught anonymous shmem mappings to use multisize THP. By default this
   is a no-op - users must opt in via sysfs controls. Dramatic
   improvements in pagefault latency are realized.

 - David Hildenbrand has some cleanups to our remaining use of
   page_mapcount() in the series "fs/proc: move page_mapcount() to
   fs/proc/internal.h".

 - David also has some highmem accounting cleanups in the series
   "mm/highmem: don't track highmem pages manually".

 - Build-time fixes and cleanups from John Hubbard in the series
   "cleanups, fixes, and progress towards avoiding "make headers"".

 - Cleanups and consolidation of the core pagemap handling from Barry
   Song in the series "mm: introduce pmd|pte_needs_soft_dirty_wp helpers
   and utilize them".

 - Lance Yang's series "Reclaim lazyfree THP without splitting" has
   reduced the latency of the reclaim of pmd-mapped THPs under fairly
   common circumstances. A 10x speedup is seen in a microbenchmark.

   It does this by punting to another CPU but I guess that's a win unless
   all CPUs are pegged.

 - hugetlb_cgroup cleanups from Xiu Jianfeng in the series
   "mm/hugetlb_cgroup: rework on cftypes".

 - Miaohe Lin's series "Some cleanups for memory-failure" does just that
   thing.

 - Someone other than SeongJae has developed a DAMON feature in Honggyu
   Kim's series "DAMON based tiered memory management for CXL memory".
   This adds DAMON features which may be used to help determine the
   efficiency of our placement of CXL/PCIe attached DRAM.

 - DAMON user API centralization and simplification work in SeongJae
   Park's series "mm/damon: introduce DAMON parameters online commit
   function".

 - In the series "mm: page_type, zsmalloc and page_mapcount_reset()"
   David Hildenbrand does some maintenance work on zsmalloc - partially
   modernizing its use of pageframe fields.

 - Kefeng Wang provides more folio conversions in the series "mm: remove
   page_maybe_dma_pinned() and page_mkclean()".

 - More cleanup from David Hildenbrand, this time in the series
   "mm/memory_hotplug: use PageOffline() instead of PageReserved() for
   !ZONE_DEVICE". It "enlightens memory hotplug more about PageOffline()
   pages" and permits the removal of some virtio-mem hacks.

 - Barry Song's series "mm: clarify folio_add_new_anon_rmap() and
   __folio_add_anon_rmap()" is a cleanup to the anon folio handling in
   preparation for mTHP (multisize THP) swapin.

 - Kefeng Wang's series "mm: improve clear and copy user folio"
   implements more folio conversions, this time in the area of large
   folio userspace copying.

 - The series "Docs/mm/damon/maintainer-profile: document a mailing tool
   and community meetup series" tells people how to get better involved
   with other DAMON developers. From SeongJae Park.

 - A large series ("kmsan: Enable on s390") from Ilya Leoshkevich does
   that.

 - David Hildenbrand sends along more cleanups, this time against the
   migration code. The series is "mm/migrate: move NUMA hinting fault
   folio isolation + checks under PTL".

 - Jan Kara has found quite a lot of strangenesses and minor errors in
   the readahead code. He addresses this in the series "mm: Fix various
   readahead quirks".

 - SeongJae Park's series "selftests/damon: test DAMOS tried regions and
   {min,max}_nr_regions" adds features and addresses errors in DAMON's
   self testing code.

 - Gavin Shan has found a userspace-triggerable WARN in the pagecache
   code. The series "mm/filemap: Limit page cache size to that supported
   by xarray" addresses this. The series is marked cc:stable.

 - Chengming Zhou's series "mm/ksm: cmp_and_merge_page() optimizations
   and cleanup" cleans up and slightly optimizes KSM.

 - Roman Gushchin has separated the memcg-v1 and memcg-v2 code - lots of
   code motion. The series (which also makes the memcg-v1 code
   Kconfigurable) are "mm: memcg: separate legacy cgroup v1 code and put
   under config option" and "mm: memcg: put cgroup v1-specific memcg
   data under CONFIG_MEMCG_V1"

 - Dan Schatzberg's series "Add swappiness argument to memory.reclaim"
   adds an additional feature to this cgroup-v2 control file.

 - The series "Userspace controls soft-offline pages" from Jiaqi Yan
   permits userspace to stop the kernel's automatic treatment of
   excessive correctable memory errors, in order to permit userspace to
   monitor and handle this situation.

 - Kefeng Wang's series "mm: migrate: support poison recover from
   migrate folio" teaches the kernel to appropriately handle migration
   from poisoned source folios rather than simply panicking.

 - SeongJae Park's series "Docs/damon: minor fixups and improvements"
   does those things.

 - In the series "mm/zsmalloc: change back to per-size_class lock"
   Chengming Zhou improves zsmalloc's scalability and memory
   utilization.

 - Vivek Kasireddy's series "mm/gup: Introduce memfd_pin_folios() for
   pinning memfd folios" makes the GUP code use FOLL_PIN rather than
   bare refcount increments. So these pages can first be moved aside if
   they reside in the movable zone or a CMA block.

 - Andrii Nakryiko has added a binary ioctl()-based API to
   /proc/pid/maps for much faster reading of vma information. The series
   is "query VMAs from /proc/<pid>/maps".

 - In the series "mm: introduce per-order mTHP split counters" Lance
   Yang improves the kernel's presentation of developer information
   related to multisize THP splitting.

 - Michael Ellerman has developed the series "Reimplement huge pages
   without hugepd on powerpc (8xx, e500, book3s/64)". This permits
   userspace to use all available huge page sizes.

 - In the series "revert unconditional slab and page allocator fault
   injection calls" Vlastimil Babka removes a performance-affecting and
   not very useful feature from slab fault injection.
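
As an illustration of the memory.reclaim item in the list above, here is
a minimal Python sketch of issuing a proactive reclaim request with the
new swappiness argument. The helper names are hypothetical, the cgroup
directory is whatever cgroup-v2 path the caller manages, and the
"<amount> swappiness=<n>" request format is assumed from the cgroup-v2
documentation rather than spelled out in this pull message.

```python
from pathlib import Path
from typing import Optional


def reclaim_request(amount: str, swappiness: Optional[int] = None) -> str:
    """Build the string written to a cgroup's memory.reclaim file."""
    if swappiness is None:
        return amount
    return f"{amount} swappiness={swappiness}"


def reclaim(cgroup_dir: Path, amount: str,
            swappiness: Optional[int] = None) -> None:
    """Ask the kernel to reclaim `amount` from the given cgroup.

    Hypothetical helper: cgroup_dir is a cgroup-v2 directory, and the
    write needs suitable privileges. An OSError propagates to the
    caller if the kernel rejects or cannot satisfy the request.
    """
    request = reclaim_request(amount, swappiness)
    (cgroup_dir / "memory.reclaim").write_text(request + "\n")
```

For example, reclaim(Path("/sys/fs/cgroup/mygroup"), "1G", 60) would
write "1G swappiness=60", with "mygroup" standing in for a real cgroup.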

* tag 'mm-stable-2024-07-21-14-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (411 commits)
  mm/mglru: fix ineffective protection calculation
  mm/zswap: fix a white space issue
  mm/hugetlb: fix kernel NULL pointer dereference when migrating hugetlb folio
  mm/hugetlb: fix possible recursive locking detected warning
  mm/gup: clear the LRU flag of a page before adding to LRU batch
  mm/numa_balancing: teach mpol_to_str about the balancing mode
  mm: memcg1: convert charge move flags to unsigned long long
  alloc_tag: fix page_ext_get/page_ext_put sequence during page splitting
  lib: reuse page_ext_data() to obtain codetag_ref
  lib: add missing newline character in the warning message
  mm/mglru: fix overshooting shrinker memory
  mm/mglru: fix div-by-zero in vmpressure_calc_level()
  mm/kmemleak: replace strncpy() with strscpy()
  mm, page_alloc: put should_fail_alloc_page() back behing CONFIG_FAIL_PAGE_ALLOC
  mm, slab: put should_failslab() back behind CONFIG_SHOULD_FAILSLAB
  mm: ignore data-race in __swap_writepage
  hugetlbfs: ensure generic_hugetlb_get_unmapped_area() returns higher address than mmap_min_addr
  mm: shmem: rename mTHP shmem counters
  mm: swap_state: use folio_alloc_mpol() in __read_swap_cache_async()
  mm/migrate: putback split folios when numa hint migration fails
  ...
2024-07-21 17:15:46 -07:00
Linus Torvalds
2c9b351240 ARM:
* Initial infrastructure for shadow stage-2 MMUs, as part of nested
   virtualization enablement
 
 * Support for userspace changes to the guest CTR_EL0 value, enabling
   (in part) migration of VMs between heterogeneous hardware
 
 * Fixes + improvements to pKVM's FF-A proxy, adding support for v1.1 of
   the protocol
 
 * FPSIMD/SVE support for nested, including merged trap configuration
   and exception routing
 
 * New command-line parameter to control the WFx trap behavior under KVM
 
 * Introduce kCFI hardening in the EL2 hypervisor
 
 * Fixes + cleanups for handling presence/absence of FEAT_TCRX
 
 * Miscellaneous fixes + documentation updates
 
 LoongArch:
 
 * Add paravirt steal time support.
 
 * Add support for KVM_DIRTY_LOG_INITIALLY_SET.
 
 * Add perf kvm-stat support for loongarch.
 
 RISC-V:
 
 * Redirect AMO load/store access fault traps to guest
 
 * perf kvm stat support
 
 * Use guest files for IMSIC virtualization, when available
 
 ONE_REG support for the Zimop, Zcmop, Zca, Zcf, Zcd, Zcb and Zawrs ISA
 extensions is coming through the RISC-V tree.
 
 s390:
 
 * Assortment of tiny fixes which are not time critical
 
 x86:
 
 * Fixes for Xen emulation.
 
 * Add a global struct to consolidate tracking of host values, e.g. EFER
 
 * Add KVM_CAP_X86_APIC_BUS_CYCLES_NS to allow configuring the effective APIC
   bus frequency, because TDX.
 
 * Print the name of the APICv/AVIC inhibits in the relevant tracepoint.
 
 * Clean up KVM's handling of vendor specific emulation to consistently act on
   "compatible with Intel/AMD", versus checking for a specific vendor.
 
 * Drop MTRR virtualization, and instead always honor guest PAT on CPUs
   that support self-snoop.
 
 * Update to the newfangled Intel CPU FMS infrastructure.
 
 * Don't advertise IA32_PERF_GLOBAL_OVF_CTRL as an MSR-to-be-saved, as it reads
   '0' and writes from userspace are ignored.
 
 * Misc cleanups
 
 x86 - MMU:
 
 * Small cleanups, renames and refactoring extracted from the upcoming
   Intel TDX support.
 
 * Don't allocate kvm_mmu_page.shadowed_translation for shadow pages that can't
   hold leaf SPTEs.
 
 * Unconditionally drop mmu_lock when allocating TDP MMU page tables for eager
   page splitting, to avoid stalling vCPUs when splitting huge pages.
 
 * Bug the VM instead of simply warning if KVM tries to split a SPTE that is
   non-present or not-huge.  KVM is guaranteed to end up in a broken state
   because the callers fully expect a valid SPTE, so it is dangerous
   to let more MMU changes happen afterwards.
 
 x86 - AMD:
 
 * Make per-CPU save_area allocations NUMA-aware.
 
 * Force sev_es_host_save_area() to be inlined to avoid calling into an
   instrumentable function from noinstr code.
 
 * Base support for running SEV-SNP guests.  API-wise, this includes
   a new KVM_X86_SNP_VM type, encrypting/measuring the initial image into
   guest memory, and finalizing it before launching it.  Internally,
   there are some gmem/mmu hooks needed to prepare gmem-allocated pages
   before mapping them into guest private memory ranges.
 
   This includes basic support for attestation guest requests, enough to
   say that KVM supports the GHCB 2.0 specification.
 
   There is no support yet for loading into the firmware those signing
   keys to be used for attestation requests, and therefore no need yet
   for the host to provide certificate data for those keys.  To support
   fetching certificate data from userspace, a new KVM exit type will be
   needed to handle fetching the certificate from userspace. An attempt to
   define a new KVM_EXIT_COCO/KVM_EXIT_COCO_REQ_CERTS exit type to handle
   this was introduced in v1 of this patchset, but is still being discussed
   by the community, so for now this patchset only implements a stub version
   of SNP Extended Guest Requests that does not provide certificate data.
 
 x86 - Intel:
 
 * Remove an unnecessary EPT TLB flush when enabling hardware.
 
 * Fix a series of bugs that cause KVM to fail to detect nested pending posted
   interrupts as valid wake events for a vCPU executing HLT in L2 (with
   HLT-exiting disabled by L1).
 
 * KVM: x86: Suppress MMIO that is triggered during task switch emulation
 
   Explicitly suppress userspace emulated MMIO exits that are triggered when
   emulating a task switch as KVM doesn't support userspace MMIO during
   complex (multi-step) emulation.  Silently ignoring the exit request can
   result in the WARN_ON_ONCE(vcpu->mmio_needed) firing if KVM exits to
   userspace for some other reason prior to purging mmio_needed.
 
   See commit 0dc902267c ("KVM: x86: Suppress pending MMIO write exits if
   emulator detects exception") for more details on KVM's limitations with
   respect to emulated MMIO during complex emulator flows.
 
 Generic:
 
 * Rename the AS_UNMOVABLE flag that was introduced for KVM to AS_INACCESSIBLE,
   because the special casing needed by these pages is not due to just
   unmovability (and in fact they are only unmovable because the CPU cannot
   access them).
 
 * New ioctl to populate the KVM page tables in advance, which is useful to
   mitigate KVM page faults during guest boot or after live migration.
   The code will also be used by TDX, but (probably) not through the ioctl.
 
 * Enable halt poll shrinking by default, as Intel found it to be a clear win.
 
 * Setup empty IRQ routing when creating a VM to avoid having to synchronize
   SRCU when creating a split IRQCHIP on x86.
 
 * Rework the sched_in/out() paths to replace kvm_arch_sched_in() with a flag
   that arch code can use for hooking both sched_in() and sched_out().
 
 * Take the vCPU @id as an "unsigned long" instead of "u32" to avoid
   truncating a bogus value from userspace, e.g. to help userspace detect bugs.
 
 * Mark a vCPU as preempted if and only if it's scheduled out while in the
   KVM_RUN loop, e.g. to avoid marking it preempted and thus writing guest
   memory when retrieving guest state during live migration blackout.
 
 Selftests:
 
 * Remove dead code in the memslot modification stress test.
 
 * Treat "branch instructions retired" as supported on all AMD Family 17h+ CPUs.
 
 * Print the guest pseudo-RNG seed only when it changes, to avoid spamming the
   log for tests that create lots of VMs.
 
 * Make the PMU counters test less flaky when counting LLC cache misses by
   doing CLFLUSH{OPT} in every loop iteration.
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmaZQB0UHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroNkZwf/bv2jiENaLFNGPe/VqTKMQ6PHQLMG
 +sNHx6fJPP35gTM8Jqf0/7/ummZXcSuC1mWrzYbecZm7Oeg3vwNXHZ4LquwwX6Dv
 8dKcUzLbWDAC4WA3SKhi8C8RV2v6E7ohy69NtAJmFWTc7H95dtIQm6cduV2osTC3
 OEuHe1i8d9umk6couL9Qhm8hk3i9v2KgCsrfyNrQgLtS3hu7q6yOTR8nT0iH6sJR
 KE5A8prBQgLmF34CuvYDw4Hu6E4j+0QmIqodovg2884W1gZQ9LmcVqYPaRZGsG8S
 iDdbkualLKwiR1TpRr3HJGKWSFdc7RblbsnHRvHIZgFsMQiimh4HrBSCyQ==
 =zepX
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "ARM:

   - Initial infrastructure for shadow stage-2 MMUs, as part of nested
     virtualization enablement

   - Support for userspace changes to the guest CTR_EL0 value, enabling
     (in part) migration of VMs between heterogeneous hardware

   - Fixes + improvements to pKVM's FF-A proxy, adding support for v1.1
     of the protocol

   - FPSIMD/SVE support for nested, including merged trap configuration
     and exception routing

   - New command-line parameter to control the WFx trap behavior under
     KVM

   - Introduce kCFI hardening in the EL2 hypervisor

   - Fixes + cleanups for handling presence/absence of FEAT_TCRX

   - Miscellaneous fixes + documentation updates

  LoongArch:

   - Add paravirt steal time support

   - Add support for KVM_DIRTY_LOG_INITIALLY_SET

   - Add perf kvm-stat support for loongarch

  RISC-V:

   - Redirect AMO load/store access fault traps to guest

   - perf kvm stat support

   - Use guest files for IMSIC virtualization, when available

  s390:

   - Assortment of tiny fixes which are not time critical

  x86:

   - Fixes for Xen emulation

   - Add a global struct to consolidate tracking of host values, e.g.
     EFER

   - Add KVM_CAP_X86_APIC_BUS_CYCLES_NS to allow configuring the
     effective APIC bus frequency, because TDX

   - Print the name of the APICv/AVIC inhibits in the relevant
     tracepoint

   - Clean up KVM's handling of vendor specific emulation to
     consistently act on "compatible with Intel/AMD", versus checking
     for a specific vendor

   - Drop MTRR virtualization, and instead always honor guest PAT on
     CPUs that support self-snoop

   - Update to the newfangled Intel CPU FMS infrastructure

   - Don't advertise IA32_PERF_GLOBAL_OVF_CTRL as an MSR-to-be-saved, as
     it reads '0' and writes from userspace are ignored

   - Misc cleanups

  x86 - MMU:

   - Small cleanups, renames and refactoring extracted from the upcoming
     Intel TDX support

   - Don't allocate kvm_mmu_page.shadowed_translation for shadow pages
     that can't hold leaf SPTEs

   - Unconditionally drop mmu_lock when allocating TDP MMU page tables
     for eager page splitting, to avoid stalling vCPUs when splitting
     huge pages

   - Bug the VM instead of simply warning if KVM tries to split a SPTE
     that is non-present or not-huge. KVM is guaranteed to end up in a
     broken state because the callers fully expect a valid SPTE, so it
     is dangerous to let more MMU changes happen afterwards

  x86 - AMD:

   - Make per-CPU save_area allocations NUMA-aware

   - Force sev_es_host_save_area() to be inlined to avoid calling into
     an instrumentable function from noinstr code

   - Base support for running SEV-SNP guests. API-wise, this includes a
     new KVM_X86_SNP_VM type, encrypting/measuring the initial image into
     guest memory, and finalizing it before launching it. Internally,
     there are some gmem/mmu hooks needed to prepare gmem-allocated
     pages before mapping them into guest private memory ranges

     This includes basic support for attestation guest requests, enough
     to say that KVM supports the GHCB 2.0 specification

     There is no support yet for loading into the firmware those signing
     keys to be used for attestation requests, and therefore no need yet
     for the host to provide certificate data for those keys.

     To support fetching certificate data from userspace, a new KVM exit
     type will be needed.

     An attempt to define a new KVM_EXIT_COCO / KVM_EXIT_COCO_REQ_CERTS
     exit type to handle this was introduced in v1 of this patchset, but
     is still being discussed by the community, so for now this patchset
     only implements a stub version of SNP Extended Guest Requests that
     does not provide certificate data

  x86 - Intel:

   - Remove an unnecessary EPT TLB flush when enabling hardware

   - Fix a series of bugs that cause KVM to fail to detect nested
     pending posted interrupts as valid wake events for a vCPU executing
     HLT in L2 (with HLT-exiting disabled by L1)

   - KVM: x86: Suppress MMIO that is triggered during task switch
     emulation

     Explicitly suppress userspace emulated MMIO exits that are
     triggered when emulating a task switch as KVM doesn't support
     userspace MMIO during complex (multi-step) emulation

     Silently ignoring the exit request can result in the
     WARN_ON_ONCE(vcpu->mmio_needed) firing if KVM exits to userspace
     for some other reason prior to purging mmio_needed

     See commit 0dc902267c ("KVM: x86: Suppress pending MMIO write
     exits if emulator detects exception") for more details on KVM's
     limitations with respect to emulated MMIO during complex emulator
     flows

  Generic:

   - Rename the AS_UNMOVABLE flag that was introduced for KVM to
     AS_INACCESSIBLE, because the special casing needed by these pages
     is not due to just unmovability (and in fact they are only
     unmovable because the CPU cannot access them)

   - New ioctl to populate the KVM page tables in advance, which is
     useful to mitigate KVM page faults during guest boot or after live
     migration. The code will also be used by TDX, but (probably) not
     through the ioctl

   - Enable halt poll shrinking by default, as Intel found it to be a
     clear win

   - Setup empty IRQ routing when creating a VM to avoid having to
     synchronize SRCU when creating a split IRQCHIP on x86

   - Rework the sched_in/out() paths to replace kvm_arch_sched_in() with
     a flag that arch code can use for hooking both sched_in() and
     sched_out()

   - Take the vCPU @id as an "unsigned long" instead of "u32" to avoid
     truncating a bogus value from userspace, e.g. to help userspace
     detect bugs

   - Mark a vCPU as preempted if and only if it's scheduled out while in
     the KVM_RUN loop, e.g. to avoid marking it preempted and thus
     writing guest memory when retrieving guest state during live
     migration blackout

  Selftests:

   - Remove dead code in the memslot modification stress test

   - Treat "branch instructions retired" as supported on all AMD Family
     17h+ CPUs

   - Print the guest pseudo-RNG seed only when it changes, to avoid
     spamming the log for tests that create lots of VMs

   - Make the PMU counters test less flaky when counting LLC cache
     misses by doing CLFLUSH{OPT} in every loop iteration"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (227 commits)
  crypto: ccp: Add the SNP_VLEK_LOAD command
  KVM: x86/pmu: Add kvm_pmu_call() to simplify static calls of kvm_pmu_ops
  KVM: x86: Introduce kvm_x86_call() to simplify static calls of kvm_x86_ops
  KVM: x86: Replace static_call_cond() with static_call()
  KVM: SEV: Provide support for SNP_EXTENDED_GUEST_REQUEST NAE event
  x86/sev: Move sev_guest.h into common SEV header
  KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event
  KVM: x86: Suppress MMIO that is triggered during task switch emulation
  KVM: x86/mmu: Clean up make_huge_page_split_spte() definition and intro
  KVM: x86/mmu: Bug the VM if KVM tries to split a !hugepage SPTE
  KVM: selftests: x86: Add test for KVM_PRE_FAULT_MEMORY
  KVM: x86: Implement kvm_arch_vcpu_pre_fault_memory()
  KVM: x86/mmu: Make kvm_mmu_do_page_fault() return mapped level
  KVM: x86/mmu: Account pf_{fixed,emulate,spurious} in callers of "do page fault"
  KVM: x86/mmu: Bump pf_taken stat only in the "real" page fault handler
  KVM: Add KVM_PRE_FAULT_MEMORY vcpu ioctl to pre-populate guest memory
  KVM: Document KVM_PRE_FAULT_MEMORY ioctl
  mm, virt: merge AS_UNMOVABLE and AS_INACCESSIBLE
  perf kvm: Add kvm-stat for loongarch64
  LoongArch: KVM: Add PV steal time support in guest side
  ...
2024-07-20 12:41:03 -07:00
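The "Generic" part of the KVM pull above notes that the vCPU @id is now taken as an "unsigned long" instead of a "u32" so that a bogus value from userspace is not truncated. A minimal sketch of why that matters, assuming a 64-bit "unsigned long" (modelled here with uint64_t): once a bogus id has been narrowed to 32 bits it can pass a range check, while the wide check still rejects it. The names demo_check_id_u32(), demo_check_id_wide() and DEMO_MAX_VCPU_ID are hypothetical and are not KVM's code or limits.

```c
#include <stdint.h>

/* Hypothetical upper bound on valid vCPU ids; not KVM's real limit. */
#define DEMO_MAX_VCPU_ID 1024u

/*
 * Narrow variant: the caller's value has already been truncated to
 * 32 bits by the time the range check runs, so high bits are lost.
 */
static int demo_check_id_u32(uint32_t id)
{
	return id < DEMO_MAX_VCPU_ID ? 0 : -1;
}

/* Wide variant: the range check sees every bit userspace passed. */
static int demo_check_id_wide(uint64_t id)
{
	return id < DEMO_MAX_VCPU_ID ? 0 : -1;
}
```

With a bogus id of 0x100000001, the narrow check sees 1 and accepts it, while the wide check rejects the value and so lets userspace notice its own bug.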
Linus Torvalds
d2be38b9a5 - added support for Realtek RTL9302C
- added support for Mobileye EyeQ6H
 - added support for Mobileye EyeQ OLB system controller
 - improved r4k clocksource
 - added mode for emulating ieee754 NAN2008
 - rework for BMIPS CBR address handling
 - fixes for Loongson 2K1000
 - defconfig updates
 - cleanups and fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQJOBAABCAA4FiEEbt46xwy6kEcDOXoUeZbBVTGwZHAFAmabf5oaHHRzYm9nZW5k
 QGFscGhhLmZyYW5rZW4uZGUACgkQeZbBVTGwZHAYOQ//dgWc6RDS5vWKt14goHoR
 m3Qt63oHuxfGJsPCHdAqD4bAjxMa1eaRzbfXZ/cMrCSHsUo6bth8dmqFCDMjjWMT
 ifcCOCwXOf32NUTdm4mNLrKVUvCNeWUN6It8XBBF9r7seogvJPDpDZlEWUzYwfDE
 6e7MaaFIEMZN2Q5OAjb6PozTI0gQ3p3UAHVdvN4Z9jJxkYPzRqVostcFUL9M9iU6
 7OwGypIdZVSzB+6J6k0yv4rqNDei92SmlLjBD1+GK6uLdJG0JXiWn/XEMxOLyRP9
 kKyfpjCwOgAfbTnMoo1N2n1jkP1BqyAPHvGqF2HGpi5mFRW1i25WdcwvF/jImyes
 yQ/gLKt/y3sOqfssayDvK9acRkp0KQltpPfvWxBXM464+8+gKCdYPZ7+81AbXAiL
 Qx+bVVdE3HSoO9T06/b0Lpudue7eNU+jlaO8MLH778heT+5k+mlI/H0Ep7M5U7qO
 5V9xWlvLpceTa/gJ1cc9bUI5MG/2x+imw7COUcnv+wsWBJ3pGX4Jhwwe2hUn7ixd
 0lhrSrQi1ILkFd8gL2REoJ520RNUVfR8yDn7mNuYV1++zlGVb7EAt67v/J6Y1p8l
 9aQP/587oZvLAN2IBlovSzqvc6tHZlK6hO9d+ktqJood5NOjOWEGfT0RCm0eqiFF
 Er6qaWxjROZO1kiGjzo7v+4=
 =/6JH
 -----END PGP SIGNATURE-----

Merge tag 'mips_6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux

Pull MIPS updates from Thomas Bogendoerfer:

 - add support for Realtek RTL9302C

 - add support for Mobileye EyeQ6H

 - add support for Mobileye EyeQ OLB system controller

 - improve r4k clocksource

 - add mode for emulating ieee754 NAN2008

 - rework for BMIPS CBR address handling

 - fixes for Loongson 2K1000

 - defconfig updates

 - cleanups and fixes

* tag 'mips_6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: (58 commits)
  MIPS: config: Add ip30_defconfig
  MIPS: config: lemote2f: Regenerate defconfig
  MIPS: config: generic: Add board-litex
  MIPS: config: Enable MSA and virtualization for MIPS64R6
  MIPS: Fix fallback march for SB1
  mips: dts: realtek: Add RTL9302C board
  mips: generic: add fdt fixup for Realtek reference board
  mips: select REALTEK_OTTO_TIMER for Realtek platforms
  dt-bindings: interrupt-controller: realtek,rtl-intc: Add rtl9300-intc
  dt-bindings: mips: realtek: Add rtl930x-soc compatible
  dt-bindings: vendor-prefixes: Add Cameo Communications
  mips: dts: realtek: add device_type property to cpu node
  mips: dts: realtek: use "serial" instead of "uart" in node name
  MIPS: Implement ieee754 NAN2008 emulation mode
  MIPS: lantiq: improve USB initialization
  MIPS: GIC: Generate redirect block accessors
  MIPS: CPS: Add a couple of multi-cluster utility functions
  MIPS: Octeron: remove source file executable bit
  MAINTAINERS: Mobileye: add OLB drivers and dt-bindings
  MIPS: mobileye: eyeq5: add OLB system-controller node
  ...
2024-07-20 09:03:36 -07:00
Linus Torvalds
3f386cb8ee pci-v6.11-changes
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmaahiEUHGJoZWxnYWFz
 QGdvb2dsZS5jb20ACgkQWYigwDrT+vwypg/+LSzrx0CyyXruwwkjuoMIzqXoEpxV
 SSdJv47E9rnJymQvd0RAeNyc1BPbtRcP1FdEvV/G1ovb8qJSOJgU22PSSiMQsQ0h
 2WGBl1ShubQDDLBdy1AggAsRJhIH4P4tWZ4k5Ftz6WZPWA1UcrDqmjN4d02UIYZb
 A3YYcBEIm6bvrixxy+xq/Ii7S9A2idikabDLLGXOMSliFHx0ehWDNXyQEBONlrDh
 rEHih21rPtOltVEdJl7yF+SIA467HI09NuXfTviHWnJ1hinFoSlEHIhz4j+i+r//
 xOj7iDqtk/UAIToVsxtwgOnElNwY6ab/h/t1AmSSxX4FUEV2TiS1YEpUfX7pByt+
 dytgvepjQyycC/ZHUtRZFZ6+1M0z+Vgb5c3+jXyPh8pQEPqmXt8+KYVIi/wychmJ
 Opo4xniiDoKHSZ4E0bg/wMbe9yVCjTpX0i0S7BbNa/TRjud6vAhXvgx/y092jsdg
 h4lU0ywNCgea/rZFHZYomPjncx9xJ+rtOaH+/dVQhCm/wuRHnj7tJGZnl5LfCWVw
 +yNOcExQaE+lRvKqp6mQvUva3+4UArAL2tnFC00tGd0emRLIvXrxY2lF1sqp9wCZ
 AJu65El4nnpFNU7vJR7x4X31BvcdquFEvfofPxPXbPz09N8hPRhkunKzgd5ftKZS
 mcxMfStvIFXiMEM=
 =vw2i
 -----END PGP SIGNATURE-----

Merge tag 'pci-v6.11-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci

Pull pci updates from Bjorn Helgaas:
 "Enumeration:

   - Define PCIE_RESET_CONFIG_DEVICE_WAIT_MS for the generic 100ms
     required after reset before config access (Kevin Xie)

   - Define PCIE_T_RRS_READY_MS for the generic 100ms required after
     reset before config access (probably should be unified with
     PCIE_RESET_CONFIG_DEVICE_WAIT_MS) (Damien Le Moal)

  Resource management:

   - Rename find_resource() to find_resource_space() to be more
     descriptive (Ilpo Järvinen)

   - Export find_resource_space() for use by PCI core, which needs to
     learn whether there is available space for a bridge window (Ilpo
     Järvinen)

   - Prevent double counting of resources so window size doesn't grow on
     each remove/rescan cycle (Ilpo Järvinen)

   - Relax bridge window sizing algorithm so a device doesn't break
     simply because it was removed and rescanned (Ilpo Järvinen)

   - Evaluate the ACPI PRESERVE_BOOT_CONFIG _DSM in
     pci_register_host_bridge() (not acpi_pci_root_create()) so we can
     unify it with similar DT functionality (Vidya Sagar)

   - Extend use of DT "linux,pci-probe-only" property so it works
     per-host bridge as well as globally (Vidya Sagar)

   - Unify support for ACPI PRESERVE_BOOT_CONFIG _DSM and the DT
     "linux,pci-probe-only" property in pci_preserve_config() (Vidya
     Sagar)

  Driver binding:

   - Add devres infrastructure for managed request and map of partial
     BAR resources (Philipp Stanner)

   - Deprecate pcim_iomap_table() because uses like
     "pcim_iomap_table()[0]" have no good way to return errors (Philipp
     Stanner)

   - Add an always-managed pcim_request_region() for use instead of
     pci_request_region() and similar, which are sometimes managed
     depending on whether pcim_enable_device() has been called
     previously (Philipp Stanner)

   - Reimplement pcim_set_mwi() so it doesn't need to store MWI
     state (Philipp Stanner)

   - Add pcim_intx() for use instead of pci_intx(), which is sometimes
     managed depending on whether pcim_enable_device() has been called
     previously (Philipp Stanner)

   - Add managed pcim_iomap_range() to allow mapping of a partial BAR
     (Philipp Stanner)

   - Fix a devres mapping leak in drm/vboxvideo (Philipp Stanner)

  Error handling:

   - Add missing bridge locking in device reset path and add a warning
     for other possible lock issues (Dan Williams)

   - Fix use-after-free on concurrent DPC and hot-removal (Lukas Wunner)

  Power management:

   - Disable AER and DPC during suspend to avoid spurious wakeups if
     they share an interrupt with PME (Kai-Heng Feng)

  PCIe native device hotplug:

   - Detect if a device was removed or replaced during system sleep so
     we don't assume a new device is the one that used to be there
     (Lukas Wunner)

  Virtualization:

   - Add an ACS quirk for Broadcom BCM5760X multi-function NIC; it
     prevents transactions between functions even though it doesn't
     advertise ACS, so the functions can be attached individually via
     VFIO (Ajit Khaparde)

  Peer-to-peer DMA:

   - Add a "pci=config_acs=" kernel command-line parameter to relax
     default ACS settings to enable additional peer-to-peer
     configurations. Requires expert knowledge of topology and ACS
     operation (Vidya Sagar)

  Endpoint framework:

   - Remove unused struct pci_epf_group.type_group (Christophe JAILLET)

   - Fix error handling in vpci_scan_bus() and epf_ntb_epc_cleanup()
     (Dan Carpenter)

   - Make struct pci_epc_class constant (Greg Kroah-Hartman)

   - Remove unused pci_endpoint_test_bar_{readl,writel} functions
     (Jiapeng Chong)

   - Rename "BME" to "Bus Master Enable" (Manivannan Sadhasivam)

   - Rename struct pci_epc_event_ops.core_init() callback to epc_init()
     (Manivannan Sadhasivam)

   - Move DMA init to MHI .epc_init() callback for uniformity
     (Manivannan Sadhasivam)

   - Cancel EPF test delayed work when link goes down (Manivannan
     Sadhasivam)

   - Add struct pci_epc_event_ops.epc_deinit() callback for cleanup
     needed on fundamental reset (Manivannan Sadhasivam)

   - Add 64KB alignment to endpoint test to support Rockchip rk3588
     (Niklas Cassel)

   - Optimize endpoint test by using memcpy() instead of readl() (Niklas
     Cassel)

  Device tree bindings:

   - Add generic "ats-supported" property to advertise that a PCIe Root
     Complex supports ATS (Jean-Philippe Brucker)

  Amazon Annapurna Labs PCIe controller driver:

   - Validate IORESOURCE_BUS presence to avoid NULL pointer dereference
     (Aleksandr Mishin)

  Axis ARTPEC-6 PCIe controller driver:

   - Rename .cpu_addr_fixup() parameter to reflect that it is a PCI
     address, not a CPU address (Niklas Cassel)

  Freescale i.MX6 PCIe controller driver:

   - Convert to agnostic GPIO API (Andy Shevchenko)

  Freescale Layerscape PCIe controller driver:

   - Make struct mobiveil_rp_ops constant (Christophe JAILLET)

   - Use new generic dw_pcie_ep_linkdown() to handle link-down events
     (Manivannan Sadhasivam)

  HiSilicon Kirin PCIe controller driver:

   - Convert to agnostic GPIO API (Andy Shevchenko)

   - Use _scoped() iterator for OF children to ensure refcounts are
     decremented at loop exit (Javier Carrasco)

  Intel VMD host bridge driver:

   - Create sysfs "domain" symlink before downstream devices are exposed
     to userspace by pci_bus_add_devices() (Jiwei Sun)

  Loongson PCIe controller driver:

   - Enable MSI when LS7A is used with new CPUs that have integrated
     PCIe Root Complex, e.g., Loongson-3C6000, so downstream devices can
     use MSI (Huacai Chen)

  Microchip AXI PolarFire PCIe controller driver:

   - Move pcie-microchip-host.c to a new PLDA directory (Minda Chen)

   - Factor PLDA generic items out to a common
     plda,xpressrich3-axi-common.yaml binding (Minda Chen)

   - Factor PLDA generic data structures and code out to shared
     pcie-plda.h, pcie-plda-host.c (Minda Chen)

   - Add PLDA generic interrupt handling with a .request_event_irq()
     callback for vendor-specific events (Minda Chen)

   - Add PLDA generic host init/deinit and map bus functions for use by
     vendor-specific drivers (Minda Chen)

   - Rework to use PLDA core (Minda Chen)

  Microsoft Hyper-V host bridge driver:

   - Return zero, not garbage, when reading PCI_INTERRUPT_PIN (Wei Liu)

  NVIDIA Tegra194 PCIe controller driver:

   - Remove unused struct tegra_pcie_soc (Dr. David Alan Gilbert)

   - Set 64KB inbound ATU alignment restriction (Jon Hunter)

  Qualcomm PCIe controller driver:

   - Make the MHI reg region mandatory for X1E80100, since all PCIe
     controllers have it (Abel Vesa)

   - Prevent use of uninitialized data and possible error pointer
     dereference (Dan Carpenter)

   - Return error, not success, if dev_pm_opp_find_freq_floor() fails
     (Dan Carpenter)

   - Add Operating Performance Points (OPP) support to scale performance
     state based on aggregate link bandwidth to improve SoC power
     efficiency (Krishna chaitanya chundru)

   - Vote for the CPU-PCIe ICC (interconnect) path to ensure it stays
     active even if other drivers don't vote for it (Krishna chaitanya
     chundru)

   - Use devm_clk_bulk_get_all() to get all the clocks from DT to avoid
     writing out all the clock names (Manivannan Sadhasivam)

   - Add DT binding and driver support for the SA8775P SoC (Mrinmay
     Sarkar)

   - Add HDMA support for the SA8775P SoC (Mrinmay Sarkar)

   - Override the SA8775P NO_SNOOP default to avoid possible memory
     corruption (Mrinmay Sarkar)

   - Make sure resources are disabled during PERST# assertion, even if
     the link is already disabled (Manivannan Sadhasivam)

   - Use new generic dw_pcie_ep_linkdown() to handle link-down events
     (Manivannan Sadhasivam)

   - Add DT and endpoint driver support for the SA8775P SoC (Mrinmay
     Sarkar)

   - Add Hyper DMA (HDMA) support for the SA8775P SoC and enable it in
     the EPF MHI driver (Mrinmay Sarkar)

   - Set PCIE_PARF_NO_SNOOP_OVERIDE to override the default NO_SNOOP
     attribute on the SA8775P SoC (both Root Complex and Endpoint mode)
     to avoid possible memory corruption (Mrinmay Sarkar)

  Renesas R-Car PCIe controller driver:

   - Demote WARN() to dev_warn_ratelimited() in rcar_pcie_wakeup() to
     avoid unnecessary backtrace (Marek Vasut)

   - Add DT and driver support for R-Car V4H (R8A779G0) host and
     endpoint. This requires separate proprietary firmware (Yoshihiro
     Shimoda)

  Rockchip PCIe controller driver:

   - Assert PERST# for 100ms after power is stable (Damien Le Moal)

   - Wait PCIE_T_RRS_READY_MS (100ms) after reset before starting
     configuration (Damien Le Moal)

   - Use GPIOD_OUT_LOW flag while requesting ep_gpio to fix a firmware
     crash on Qcom-based modems with Rockpro64 board (Manivannan
     Sadhasivam)

  Rockchip DesignWare PCIe controller driver:

   - Factor common parts of rockchip-dw-pcie DT binding to be shared by
     Root Complex and Endpoint mode (Niklas Cassel)

   - Add missing INTx signals to common DT binding (Niklas Cassel)

   - Add eDMA items to DT binding for Endpoint controller (Niklas
     Cassel)

   - Fix initial dw-rockchip PERST# GPIO value to prevent unnecessary
     short assert/deassert that causes issues with some WLAN controllers
     (Niklas Cassel)

   - Refactor dw-rockchip and add support for Endpoint mode (Niklas
     Cassel)

   - Call pci_epc_init_notify() and drop dw_pcie_ep_init_notify()
     wrapper (Niklas Cassel)

   - Add error messages in .probe() error paths to improve user
     experience (Uwe Kleine-König)

  Samsung Exynos PCIe controller driver:

   - Use bulk clock APIs to simplify clock setup (Shradha Todi)

  StarFive PCIe controller driver:

   - Add DT binding and driver support for the StarFive JH7110
     PLDA-based PCIe controller (Minda Chen)

  Synopsys DesignWare PCIe controller driver:

   - Add generic support for sending PME_Turn_Off when system suspends
     (Frank Li)

   - Fix incorrect interpretation of iATU slot 0 after PERST#
     assert/deassert (Frank Li)

   - Use msleep() instead of usleep_range() while waiting for link
     (Konrad Dybcio)

   - Refactor dw_pcie_edma_find_chip() to enable adding support for
     Hyper DMA (HDMA) (Manivannan Sadhasivam)

   - Enable drivers to supply the eDMA channel count since some can't
     auto detect this (Manivannan Sadhasivam)

   - Call pci_epc_init_notify() and drop dw_pcie_ep_init_notify()
     wrapper (Manivannan Sadhasivam)

   - Pass the eDMA mapping format directly from drivers instead of
     maintaining a capability for it (Manivannan Sadhasivam)

   - Add generic dw_pcie_ep_linkdown() to notify EPF drivers about
     link-down events and restore non-sticky DWC registers lost on link
     down (Manivannan Sadhasivam)

   - Add vendor-specific "apb" reg name, interrupt names, INTx names to
     generic binding (Niklas Cassel)

   - Enforce DWC restriction that 64-bit BARs must start with an
     even-numbered BAR (Niklas Cassel)

   - Consolidate args of dw_pcie_prog_outbound_atu() into a structure
     (Yoshihiro Shimoda)

   - Add support for endpoints to send Message TLPs, e.g., for INTx
     emulation (Yoshihiro Shimoda)

  TI DRA7xx PCIe controller driver:

   - Rename .cpu_addr_fixup() parameter to reflect that it is a PCI
     address, not a CPU address (Niklas Cassel)

  TI Keystone PCIe controller driver:

   - Validate IORESOURCE_BUS presence to avoid NULL pointer dereference
     (Aleksandr Mishin)

   - Work around AM65x/DRA80xM Errata #i2037 that corrupts TLPs and
     causes processor hangs by limiting Max_Read_Request_Size (MRRS) and
     Max_Payload_Size (MPS) (Kishon Vijay Abraham I)

   - Leave BAR 0 disabled for AM654x to fix a regression caused by
     6ab15b5e70 ("PCI: dwc: keystone: Convert .scan_bus() callback to
     use add_bus"), which caused a 45-second boot delay (Siddharth
     Vadapalli)

  Xilinx Versal CPM PCIe controller driver:

   - Fix overlapping bridge registers and 32-bit BAR addresses in DT
     binding (Thippeswamy Havalige)

  MicroSemi Switchtec management driver:

   - Make struct switchtec_class constant (Greg Kroah-Hartman)

  Miscellaneous:

   - Remove unused struct acpi_handle_node (Dr. David Alan Gilbert)

   - Add missing MODULE_DESCRIPTION() macros (Jeff Johnson)"

* tag 'pci-v6.11-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (154 commits)
  PCI: loongson: Enable MSI in LS7A Root Complex
  PCI: Extend ACS configurability
  PCI: Add missing bridge lock to pci_bus_lock()
  drm/vboxvideo: fix mapping leaks
  PCI: Add managed pcim_iomap_range()
  PCI: Remove legacy pcim_release()
  PCI: Add managed pcim_intx()
  PCI: vmd: Create domain symlink before pci_bus_add_devices()
  PCI: qcom: Prevent use of uninitialized data in qcom_pcie_suspend_noirq()
  PCI: qcom: Prevent potential error pointer dereference
  PCI: qcom: Fix missing error code in qcom_pcie_probe()
  PCI: Give pcim_set_mwi() its own devres cleanup callback
  PCI: Move struct pci_devres.pinned bit to struct pci_dev
  PCI: Remove struct pci_devres.enabled status bit
  PCI: Document hybrid devres hazards
  PCI: Add managed pcim_request_region()
  PCI: Deprecate pcim_iomap_table(), pcim_iomap_regions_request_all()
  PCI: Add managed partial-BAR request and map infrastructure
  PCI: Add devres helpers for iomap table
  PCI: Add and use devres helper for bit masks
  ...
2024-07-19 19:03:18 -07:00
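The Synopsys DesignWare part of the PCI pull above mentions enforcing the restriction that 64-bit BARs must start with an even-numbered BAR. A minimal sketch of such a check, assuming the usual six BAR slots (BAR0..BAR5) of a type 0 configuration header and that a 64-bit BAR consumes two consecutive slots. demo_bar64_start_ok() and DEMO_NUM_BARS are hypothetical names for illustration, not the driver's actual code.

```c
#include <stdbool.h>

/* Assumed number of BAR slots in a type 0 header (BAR0..BAR5). */
#define DEMO_NUM_BARS 6

/*
 * A 64-bit BAR occupies slot 'bar' and slot 'bar + 1'. The check
 * accepts only even start slots that leave room for the upper half,
 * i.e. BAR0, BAR2 and BAR4.
 */
static bool demo_bar64_start_ok(int bar)
{
	if (bar < 0 || bar + 1 >= DEMO_NUM_BARS)
		return false;
	return (bar % 2) == 0;
}
```

Under these assumptions a 64-bit BAR requested at BAR1, BAR3 or BAR5 would be rejected up front rather than silently misconfigured.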
Linus Torvalds
04d17331ca USB/Thunderbolt updates for 6.11-rc1
Here is the big set of USB and Thunderbolt changes for 6.11-rc1.
 Nothing earth-shattering in here, just constant forward progress in
 adding support for new hardware and better debugging functionalities for
 thunderbolt devices and the subsystem.  Included in here are:
   - thunderbolt debugging update and driver additions
   - xhci driver updates
   - typec driver updates
   - kselftest device driver changes (acked by the relevant maintainers,
     depended on other changes in this tree.)
   - cdns3 driver updates
   - gadget driver updates
   - MODULE_DESCRIPTION() additions
   - dwc3 driver updates and fixes
 
 All of these have been in linux-next for a while with no reported
 issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZppaNA8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ylXZwCgrEtIAQw0x6EF7w/iTWVS5UJj9AEAoLCj5UwO
 WX978uThyUctuYYKbw+8
 =Cm7j
 -----END PGP SIGNATURE-----

Merge tag 'usb-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB / Thunderbolt updates from Greg KH:
 "Here is the big set of USB and Thunderbolt changes for 6.11-rc1.

  Nothing earth-shattering in here, just constant forward progress in
  adding support for new hardware and better debugging functionalities
  for thunderbolt devices and the subsystem. Included in here are:

   - thunderbolt debugging update and driver additions

   - xhci driver updates

   - typec driver updates

   - kselftest device driver changes (acked by the relevant maintainers,
     depended on other changes in this tree.)

   - cdns3 driver updates

   - gadget driver updates

   - MODULE_DESCRIPTION() additions

   - dwc3 driver updates and fixes

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'usb-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (112 commits)
  kselftest: devices: Add test to detect device error logs
  kselftest: Move ksft helper module to common directory
  kselftest: devices: Move discoverable devices test to subdirectory
  usb: gadget: f_uac2: fix non-newline-terminated function name
  USB: uas: Implement the new shutdown callback
  USB: core: add 'shutdown' callback to usb_driver
  usb: typec: Drop explicit initialization of struct i2c_device_id::driver_data to 0
  usb: dwc3: enable CCI support for AMD-xilinx DWC3 controller
  usb: dwc2: add support for other Lantiq SoCs
  usb: gadget: Use u16 types for 16-bit fields
  usb: gadget: midi2: Fix incorrect default MIDI2 protocol setup
  usb: dwc3: core: Check all ports when set phy suspend
  usb: typec: tcpci: add support to set connector orientation
  dt-bindings: usb: Convert fsl-usb to yaml
  usb: typec: ucsi: reorder operations in ucsi_run_command()
  usb: typec: ucsi: extract common code for command handling
  usb: typec: ucsi: inline ucsi_read_message_in
  usb: typec: ucsi: rework command execution functions
  usb: typec: ucsi: split read operation
  usb: typec: ucsi: simplify command sending API
  ...
2024-07-19 15:37:48 -07:00
Linus Torvalds
aba9753c06 TTY/Serial updates for 6.11-rc1
Here is a small set of tty and serial driver updates for 6.11-rc1.  Not
 much happened this cycle, unlike the previous kernel release which had
 lots of "excitement" in this part of the kernel.  Included in here are
 the following changes:
   - dt binding updates for new platforms
   - 8250 driver updates
   - various small serial driver fixes and updates
   - printk/console naming and matching attempt #2 (was reverted for
     6.10-final, should be good to go this time around, acked by the
     relevant maintainers).
 
 All of these have been in linux-next for a while with no reported
 issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZppbCQ8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ymV1ACeIY5kgipqY7w4d3/7PcpKMiftrisAn0hr6csj
 Gan+k3cuVGlasGkaQ5/B
 =35VK
 -----END PGP SIGNATURE-----

Merge tag 'tty-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull tty / serial updates from Greg KH:
 "Here is a small set of tty and serial driver updates for 6.11-rc1. Not
  much happened this cycle, unlike the previous kernel release which had
  lots of "excitement" in this part of the kernel. Included in here are
  the following changes:

   - dt binding updates for new platforms

   - 8250 driver updates

   - various small serial driver fixes and updates

   - printk/console naming and matching attempt #2 (was reverted for
     6.10-final, should be good to go this time around, acked by the
     relevant maintainers).

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'tty-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (22 commits)
  Documentation: kernel-parameters: Add DEVNAME:0.0 format for serial ports
  serial: core: Add serial_base_match_and_update_preferred_console()
  printk: Add match_devname_and_update_preferred_console()
  serial: sc16is7xx: hardware reset chip if reset-gpios is defined in DT
  dt-bindings: serial: sc16is7xx: add reset-gpios
  dt-bindings: serial: vt8500-uart: convert to json-schema
  serial: 8250_platform: Explicitly show we initialise ISA ports only once
  tty: add missing MODULE_DESCRIPTION() macros
  dt-bindings: serial: mediatek,uart: add MT7988
  serial: sh-sci: Add support for RZ/V2H(P) SoC
  dt-bindings: serial: Add documentation for Renesas RZ/V2H(P) (R9A09G057) SCIF support
  dt-bindings: serial: renesas,scif: Make 'interrupt-names' property as required
  dt-bindings: serial: renesas,scif: Validate 'interrupts' and 'interrupt-names'
  dt-bindings: serial: renesas,scif: Move ref for serial.yaml at the end
  riscv: dts: starfive: jh7110: Add the core reset and jh7110 compatible for uarts
  serial: 8250_dw: Use reset array API to get resets
  dt-bindings: serial: snps-dw-apb-uart: Add one more reset signal for StarFive JH7110 SoC
  serial: 8250: Extract platform driver
  serial: 8250: Extract RSA bits
  serial: imx: stop casting struct uart_port to struct imx_port
  ...
2024-07-19 15:22:14 -07:00
Linus Torvalds
661fb4e68c - Optimize processing of flush bios in the dm-linear and dm-stripe
targets
 
 - Dm-io cleanups and refactoring
 
 - Remove unused 'struct thunk' in dm-cache
 
 - Handle minor device numbers > 255 in dm-init
 
 - Dm-verity refactoring & enabling platform keyring
 
 - Fix warning in dm-raid
 
 - Improve dm-crypt performance - split bios into smaller pieces, so
   that they can be processed concurrently
 
 - Stop using blk_limits_io_{min,opt}
 
 - Dm-vdo cleanup and refactoring
 
 - Remove max_write_zeroes_granularity and max_secure_erase_granularity
 
 - Dm-multipath cleanup & refactoring
 
 - Add dm-crypt and dm-integrity support for non-power-of-2 sector size
 
 - Fix reshape in dm-raid
 
 - Make dm_block_validator const
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRnH8MwLyZDhyYfesYTAyx9YGnhbQUCZpo+9xQcbXBhdG9ja2FA
 cmVkaGF0LmNvbQAKCRATAyx9YGnhbYKDAQCZP2pJyh9tRZ8GsHtk3l/ZMftmk1/c
 26v6vYlOTObJHAEA3TH2ahVnzhqYs/x3zEW/n91feTSeUJrrJ9DqHxWt+Ac=
 =S3yx
 -----END PGP SIGNATURE-----

Merge tag 'for-6.11/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper updates from Mikulas Patocka:

 - Optimize processing of flush bios in the dm-linear and dm-stripe
   targets

 - Dm-io cleanups and refactoring

 - Remove unused 'struct thunk' in dm-cache

 - Handle minor device numbers > 255 in dm-init

 - Dm-verity refactoring & enabling platform keyring

 - Fix warning in dm-raid

 - Improve dm-crypt performance - split bios into smaller pieces, so that
   they can be processed concurrently

 - Stop using blk_limits_io_{min,opt}

 - Dm-vdo cleanup and refactoring

 - Remove max_write_zeroes_granularity and max_secure_erase_granularity

 - Dm-multipath cleanup & refactoring

 - Add dm-crypt and dm-integrity support for non-power-of-2 sector size

 - Fix reshape in dm-raid

 - Make dm_block_validator const

* tag 'for-6.11/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (33 commits)
  dm vdo: fix a minor formatting issue in vdo.rst
  dm vdo int-map: fix kerneldoc formatting
  dm vdo repair: add missing kerneldoc fields
  dm: Constify struct dm_block_validator
  dm-integrity: introduce the Inline mode
  dm: introduce the target flag mempool_needs_integrity
  dm raid: fix stripes adding reshape size issues
  dm raid: move _get_reshape_sectors() as prerequisite to fixing reshape size issues
  dm-crypt: support for per-sector NVMe metadata
  dm mpath: don't call dm_get_device in multipath_message
  dm: factor out helper function from dm_get_device
  dm-verity: fix dm_is_verity_target() when dm-verity is builtin
  dm: Remove max_secure_erase_granularity
  dm: Remove max_write_zeroes_granularity
  dm vdo indexer: use swap() instead of open coding it
  dm vdo: remove unused struct 'uds_attribute'
  dm: stop using blk_limits_io_{min,opt}
  dm-crypt: limit the size of encryption requests
  dm verity: add support for signature verification with platform keyring
  dm-raid: Fix WARN_ON_ONCE check for sync_thread in raid_resume
  ...
2024-07-19 10:48:44 -07:00
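The "dm: Constify struct dm_block_validator" entry in the pull above describes a common hardening pattern: a table of callbacks that is never written after initialization is declared const. A minimal userspace sketch of that pattern follows. The struct layout, field names, and helper functions here are hypothetical stand-ins chosen for illustration and are not taken from the device mapper code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a block validator: a name plus a check callback. */
struct block_validator {
	const char *name;
	int (*check)(const struct block_validator *v,
		     const uint8_t *data, size_t len);
};

/* Example check: accept only non-empty blocks whose first byte is 0xA5. */
int marker_check(const struct block_validator *v,
		 const uint8_t *data, size_t len)
{
	(void)v;
	return (len > 0 && data[0] == 0xA5) ? 0 : -1;
}

/*
 * Declared const: the table is never modified after initialization,
 * so the toolchain is free to place it in read-only data.
 */
const struct block_validator marker_validator = {
	.name = "marker",
	.check = marker_check,
};

/* Callers take a pointer-to-const, so they cannot modify the table. */
int validate_block(const struct block_validator *v,
		   const uint8_t *data, size_t len)
{
	return v->check(v, data, len);
}
```

With the table const, an accidental assignment such as `marker_validator.check = NULL;` is rejected at compile time rather than discovered at run time.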
Masatake YAMATO
7f1c4909a8 dm vdo: fix a minor formatting issue in vdo.rst
Signed-off-by: Masatake YAMATO <yamato@redhat.com>
Signed-off-by: Matthew Sakai <msakai@redhat.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2024-07-19 12:08:21 +02:00
Linus Torvalds
cf05e93af4 Nothing hugely exciting happening in the documentation tree this time
around, mostly more of the usual:
 
 - More Spanish, Italian, and Chinese translations
 
 - A new script, scripts/checktransupdate.py, can be used to see which
   commits have touched an (English) document since a given translation was
   last updated.
 
 - A couple of "best practices" suggestions (on Link: tags and off-list
   discussions) that were not entirely at consensus level, but I concluded
   they were close enough to accept.
 
 - Some nice cleanups removing documentation for kernel parameters that have
   not been recognized for ... a long time.
 
 ...along with the usual updates, typo fixes, and such.

Merge tag 'docs-6.11' of git://git.lwn.net/linux

Pull documentation updates from Jonathan Corbet:
 "Nothing hugely exciting happening in the documentation tree this time
  around, mostly more of the usual:

   - More Spanish, Italian, and Chinese translations

   - A new script, scripts/checktransupdate.py, can be used to see which
     commits have touched an (English) document since a given
     translation was last updated.

   - A couple of "best practices" suggestions (on Link: tags and
     off-list discussions) that were not entirely at consensus level,
     but I concluded they were close enough to accept.

   - Some nice cleanups removing documentation for kernel parameters
     that have not been recognized for ... a long time.

  ...along with the usual updates, typo fixes, and such"

* tag 'docs-6.11' of git://git.lwn.net/linux: (57 commits)
  Documentation: Document user_events ioctl code
  docs/pinctrl: fix typo in mapping example
  docs: maintainer: discourage taking conversations off-list
  docs: driver-model: platform: update the definition of platform_driver
  docs/sp_SP: Add translation for scheduler/sched-design-CFS.rst
  writing_musb_glue_layer.rst: Fix broken URL
  zh_CN/admin-guide: one typo fix
  docs/zh_CN/virt: Update the translation of guest-halt-polling.rst
  Documentation: add reference from dynamic debug to loglevel kernel params
  Documentation: best practices for using Link trailers
  Documentation: fix links to mailing list services
  Documentation: exception-tables.rst: Fix the wrong steps referenced
  docs/zh_CN: add process/researcher-guidelines Chinese translation
  Documentation/tools/rv: fix document header
  docs/sp_SP: Add translation of process/maintainer-kvm-x86.rst
  docs/admin-guide/mm: correct typo 'quired' to 'queried'
  Add libps2 to the input section of driver-api
  Docs/mm/index: move allocation profiling document to unsorted documents chapter
  Docs/mm/index: rename 'Legacy Documentation' to 'Unsorted Documentation'
  Docs/mm/index: Remove 'Memory Management Guide' chapter marker
  ...
2024-07-18 15:54:16 -07:00
Linus Torvalds
b2fc97c186 memblock: updates for 6.11-rc1
* reserve_mem command line parameter to allow creation of named memory
   reservation at boot time.
   The driving use-case is to improve the ability of pstore to retain
   ramoops data across reboots.
 * cleanups and small improvements in memblock and mm_init
 * new test cases in memblock test suite

Merge tag 'memblock-v6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock

Pull memblock updates from Mike Rapoport:

 - 'reserve_mem' command line parameter to allow creation of named
   memory reservation at boot time.

   The driving use-case is to improve the ability of pstore to retain
   ramoops data across reboots.

 - cleanups and small improvements in memblock and mm_init

 - new test cases in memblock test suite

* tag 'memblock-v6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
  memblock tests: fix implicit declaration of function 'numa_valid_node'
  memblock: Move late alloc warning down to phys alloc
  pstore/ramoops: Add ramoops.mem_name= command line option
  mm/memblock: Add "reserve_mem" to reserved named memory at boot up
  mm/mm_init.c: don't initialize page->lru again
  mm/mm_init.c: not always search next deferred_init_pfn from very beginning
  mm/mm_init.c: use deferred_init_mem_pfn_range_in_zone() to decide loop condition
  mm/mm_init.c: get the highest zone directly
  mm/mm_init.c: move nr_initialised reset down a bit
  mm/memblock: fix a typo in description of for_each_mem_region()
  mm/mm_init.c: use memblock_region_memory_base_pfn() to get startpfn
  mm/memblock: use PAGE_ALIGN_DOWN to get pgend in free_memmap
  mm/memblock: return true directly on finding overlap region
  memblock tests: add memblock_overlaps_region_checks
  mm/memblock: fix comment for memblock_isolate_range()
  memblock tests: add memblock_reserve_many_may_conflict_check()
  memblock tests: add memblock_reserve_all_locations_check()
  mm/memblock: remove empty dummy entry
2024-07-18 14:48:11 -07:00
Linus Torvalds
b1bc554e00 media updates for v6.11-rc1

Merge tag 'media/v6.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media

Pull media updates from Mauro Carvalho Chehab:

 - New sensor drivers: gc05a2, gc08a3 and imx283

 - New serializer/deserializer drivers: max96714 and max96717

 - New JPEG encoder driver: e5010

 - Support for Raspberry Pi PiSP Backend (BE) ISP driver

 - Old documentation for av7110 driver removed, as a new version was
   added as Documentation/userspace-api/media/dvb/legacy*.rst

 - atomisp: Linux firmware is now available, so drop the firmware-related
   task from TODO and update firmware logic

 - The imx258 driver has gained several improvements

 - wave5 driver has gained support for HEVC decoding

 - em28xx gained support for MyGica UTV3

 - av7110 budget-patch driver removed

 - Lots of other cleanups, improvements and fixes

* tag 'media/v6.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (301 commits)
  media: raspberrypi: Switch to remove_new
  media: uapi: pisp_be_config: Add extra config fields
  media: uapi: pisp_be_config: Re-sort pisp_be_tiles_config
  media: uapi: pisp_common: Capitalize all macros
  media: uapi: pisp_common: Add 32 bpp format test
  media: uapi: pisp_be_config: Drop BIT() from uAPI
  media: stm32: dcmipp: correct error handling in dcmipp_create_subdevs
  media: atomisp: Fix spelling mistakes in sh_css_sp.c
  media: atomisp: Fix spelling mistake in ia_css_debug.c
  media: atomisp: Fix spelling mistake in hmm_bo.c
  media: atomisp: Fix spelling mistake in ia_css_eed1_8.host.c
  media: atomisp: Fix spelling mistake in sh_css_internal.h
  media: atomisp: Fix spelling mistake "pipline" -> "pipeline"
  media: atomisp: Remove unused GPIO related defines and APIs
  media: atomisp: Replace COMPILATION_ERROR_IF() by static_assert()
  media: atomisp: Clean up unused macros from math_support.h
  media: atomisp: csi2-bridge: Add DMI quirk for OV5693 on Xiaomi Mipad2
  media: atomisp: Update TODO
  media: atomisp: Prefix firmware paths with "intel/ipu/"
  media: atomisp: Remove firmware_name module parameter
  ...
2024-07-17 18:30:10 -07:00
Linus Torvalds
a5cb6b2bbf platform-drivers-x86 for v6.11-1
Highlights:
  - amd/pmf:		Report system state changes using existing input
 			events
  - asus-wmi:		Zenbook 2023 camera LED disable support and fix
 			TUF laptop keyboard RGB LED sysfs interface
  - dell-pc:		Fan modes / platform profile support
  - hp-wmi:		Fix platform profile switching on Omen/Victus
 			laptops
  - intel/ISST:		Use only TPMI interface when TPMI and legacy
 			interfaces are available
  - intel/pmc:		LTR restore support to pair with LTR ignore
  - intel/tpmi:		Performance Limit Reasons (PLR) and APIC <-> Punit
 			CPU numbering mapping support
  - WMI:			driver override support and docs improvements
  - lenovo-yoga-c630:	Support for EC (platform/arm64)
  - platform/arm64:	Fix build with COMPILE_TEST (broke after addition
 			of C630)
  - tools:		Intel Speed Select Turbo Ratio Limit fix
  - Miscellaneous cleanups / refactoring / improvements
 
 The following is an automated shortlog grouped by driver:
 
 amd/pmf:
  -  Remove update system state document
  -  Use existing input event codes to update system states
  -  Use memdup_user()
 
 arm64:
  -  add Lenovo Yoga C630 WOS EC driver
  -  build drivers even on non-ARM64 platforms
  -  EC_ACER_ASPIRE1 should depend on ARCH_QCOM
  -  EC_LENOVO_YOGA_C630 should depend on ARCH_QCOM
 
 arm64: lenovo-yoga-c630:
  -  select AUXILIARY_BUS
 
 asus-tf103c-dock:
  -  Use 2-argument strscpy()
 
 asus-wmi:
  -  fix TUF laptop RGB variant
  -  support the disable camera LED on F10 of Zenbook 2023
 
 dell-pc:
  -  avoid double free and invalid unregistration
  -  Implement platform_profile
 
 dell-smbios:
  -  Add helper for checking supported class
  -  Move request functions for reuse
 
 Docs/admin-guide:
  -  Remove pmf leftover reference from the index
 
 doc: TPMI:
  -  Add entry for Performance Limit Reasons
 
 dt-bindings: platform:
  -  Add Lenovo Yoga C630 EC
 
 hp: hp-bioscfg:
  -  Use 2-argument strscpy()
 
 hp-wmi:
  -  Fix implementation of the platform_profile_omen_get function
  -  Fix platform profile option switch bug on Omen and Victus laptops
 
 ideapad-laptop:
  -  use cleanup.h
 
 intel: chtwc_int33fe:
  -  Use 2-argument strscpy()
 
 intel/ifs:
  -  Switch to new Intel CPU model defines
 
 intel_ips:
  -  Switch to new Intel CPU model defines
 
 intel/pmc:
  -  Add support to show ltr_ignore value
  -  Add support to undo ltr_ignore
  -  Convert index variables to be unsigned
  -  Move pmc assignment closer to first usage
  -  Remove unneeded min_t check
  -  Simplify mutex usage with cleanup helpers
  -  Switch to new Intel CPU model defines
  -  Use DEFINE_SHOW_STORE_ATTRIBUTE macro
  -  Use the Elvis operator
  -  Use the return value of pmc_core_send_msg
 
 intel_scu_wdt:
  -  Switch to new Intel CPU model defines
 
 intel_speed_select_if:
  -  Switch to new Intel CPU model defines
 
 intel_telemetry:
  -  Switch to new Intel CPU model defines
 
 intel/tpmi:
  -  Add API to get debugfs root
  -  Add new auxiliary driver for performance limits
  -  Add support for performance limit reasons
 
 intel:
  -  TPMI domain id and CPU mapping
 
 intel/tpmi/plr:
  -  Add support for the plr mailbox
  -  Fix output in plr_print_bits()
 
 intel_turbo_max_3:
  -  Switch to new Intel CPU model defines
 
 intel-uncore-freq:
  -  Get rid of magic min_max argument
  -  Get rid of magic values
  -  Get rid of uncore_read_freq driver API
  -  Re-arrange bit masks
  -  Rename the sysfs helper macro names
  -  Switch to new Intel CPU model defines
  -  Use generic helpers for current frequency
  -  Use uncore_index with read_control_freq
 
 ISST:
  -  Add model specific loading for common module
  -  Avoid some SkyLake server models
  -  Use only TPMI interface when present
 
 p2sb:
  -  Switch to new Intel CPU model defines
 
 serial-multi-instantiate:
  -  Use 2-argument strscpy()
 
 think-lmi:
  -  Use 2-argument strscpy()
 
 thinkpad_acpi:
  -  Use 2-argument strscpy()
 
 tools/power/x86/intel-speed-select:
  -  Set TRL MSR in 100 MHz units
  -  v1.20 release
 
 wmi:
  -  Add bus ABI documentation
  -  Add driver_override support
 
 x86/platform/atom:
  -  Switch to new Intel CPU model defines
 
 Merges:
  -  Merge branch 'pdx86/platform-drivers-x86-lenovo-c630' into review-ilpo
  -  Merge branch 'pdx86/platform-drivers-x86-lenovo-c630' into review-ilpo
  -  Merge branch 'pdx86/platform-drivers-x86-lenovo-c630' into review-ilpo
  -  Merge remote-tracking branch 'intel-speed-select/intel-sst' into review-ilpo

Merge tag 'platform-drivers-x86-v6.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86

Pull x86 platform driver updates from Ilpo Järvinen:

 - amd/pmf: Report system state changes using existing input events

 - asus-wmi: Zenbook 2023 camera LED disable support and fix TUF laptop
   keyboard RGB LED sysfs interface

 - dell-pc: Fan modes / platform profile support

 - hp-wmi: Fix platform profile switching on Omen/Victus laptops

 - intel/ISST: Use only TPMI interface when TPMI and legacy interfaces
   are available

 - intel/pmc: LTR restore support to pair with LTR ignore

 - intel/tpmi: Performance Limit Reasons (PLR) and APIC <-> Punit CPU
   numbering mapping support

 - WMI: driver override support and docs improvements

 - lenovo-yoga-c630: Support for EC (platform/arm64)

 - platform/arm64: Fix build with COMPILE_TEST (broke after addition of
   C630)

 - tools: Intel Speed Select Turbo Ratio Limit fix

 - Miscellaneous cleanups / refactoring / improvements

* tag 'platform-drivers-x86-v6.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: (65 commits)
  platform/x86: asus-wmi: fix TUF laptop RGB variant
  platform/x86/intel/tpmi/plr: Fix output in plr_print_bits()
  Docs/admin-guide: Remove pmf leftover reference from the index
  platform/x86: ideapad-laptop: use cleanup.h
  platform/x86: hp-wmi: Fix implementation of the platform_profile_omen_get function
  platform: arm64: EC_LENOVO_YOGA_C630 should depend on ARCH_QCOM
  platform: arm64: EC_ACER_ASPIRE1 should depend on ARCH_QCOM
  platform/x86/amd/pmf: Remove update system state document
  platform/x86/amd/pmf: Use existing input event codes to update system states
  platform/x86: hp-wmi: Fix platform profile option switch bug on Omen and Victus laptops
  platform/x86:intel/pmc: Add support to undo ltr_ignore
  platform/x86:intel/pmc: Use the Elvis operator
  platform/x86:intel/pmc: Use DEFINE_SHOW_STORE_ATTRIBUTE macro
  platform/x86:intel/pmc: Remove unneeded min_t check
  platform/x86:intel/pmc: Add support to show ltr_ignore value
  platform/x86:intel/pmc: Move pmc assignment closer to first usage
  platform/x86:intel/pmc: Convert index variables to be unsigned
  platform/x86:intel/pmc: Simplify mutex usage with cleanup helpers
  platform/x86:intel/pmc: Use the return value of pmc_core_send_msg
  tools/power/x86/intel-speed-select: v1.20 release
  ...
2024-07-17 17:05:21 -07:00
Linus Torvalds
4a996d90b9 Scheduler changes for v6.11:
- Update Daniel Bristot de Oliveira's entry in MAINTAINERS,
    and credit him in CREDITS.
 
  - Harmonize the lock-yielding behavior on dynamically selected
    preemption models with static ones.
 
  - Reorganize the code a bit: split out sched/syscalls.c to reduce
    the size of sched/core.c
 
  - Micro-optimize psi_group_change()
 
  - Fix set_load_weight() for SCHED_IDLE tasks
 
  - Misc cleanups & fixes
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>

Merge tag 'sched-core-2024-07-16' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler updates from Ingo Molnar:

 - Update Daniel Bristot de Oliveira's entry in MAINTAINERS,
   and credit him in CREDITS

 - Harmonize the lock-yielding behavior on dynamically selected
   preemption models with static ones

 - Reorganize the code a bit: split out sched/syscalls.c to reduce
   the size of sched/core.c

 - Micro-optimize psi_group_change()

 - Fix set_load_weight() for SCHED_IDLE tasks

 - Misc cleanups & fixes

* tag 'sched-core-2024-07-16' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched: Update MAINTAINERS and CREDITS
  sched/fair: set_load_weight() must also call reweight_task() for SCHED_IDLE tasks
  sched/psi: Optimise psi_group_change a bit
  sched/core: Drop spinlocks on contention iff kernel is preemptible
  sched/core: Move preempt_model_*() helpers from sched.h to preempt.h
  sched/balance: Skip unnecessary updates to idle load balancer's flags
  idle: Remove stale RCU comment
  sched/headers: Move struct pre-declarations to the beginning of the header
  sched/core: Clean up kernel/sched/sched.h a bit
  sched/core: Simplify prefetch_curr_exec_start()
  sched: Fix spelling in comments
  sched/syscalls: Split out kernel/sched/syscalls.c from kernel/sched/core.c
2024-07-16 17:00:50 -07:00
Linus Torvalds
41906248d0 Power management updates for 6.11-rc1
- Add Loongson-3 CPUFreq driver support (Huacai Chen).
 
  - Add support for the Arrow Lake and Lunar Lake platforms and
    the out-of-band (OOB) mode on Emerald Rapids to the intel_pstate
    cpufreq driver, make it support the highest performance change
    interrupt and clean it up (Srinivas Pandruvada).
 
  - Switch cpufreq to new Intel CPU model defines (Tony Luck).
 
  - Simplify the cpufreq driver interface by switching the .exit() driver
    callback to the void return data type (Lizhe, Viresh Kumar).
 
  - Make cpufreq_boost_enabled() return bool (Dhruva Gole).
 
  - Add fast CPPC support to the amd-pstate cpufreq driver, address
    multiple assorted issues in it and clean it up (Perry Yuan, Mario
    Limonciello, Dhananjay Ugwekar, Meng Li, Xiaojian Du).
 
  - Add Allwinner H700 speed bin to the sun50i cpufreq driver (Ryan
    Walklin).
 
  - Fix memory leaks and of_node_put() usage in the sun50i and qcom-nvmem
    cpufreq drivers (Javier Carrasco).
 
  - Clean up the sti and dt-platdev cpufreq drivers (Jeff Johnson,
    Raphael Gallais-Pou).
 
  - Fix deferred probe handling in the TI cpufreq driver and wrong return
    values of ti_opp_supply_probe(), and add OPP tables for the AM62Ax and
    AM62Px SoCs to it (Bryan Brattlof, Primoz Fiser).
 
  - Avoid overflow of target_freq in .fast_switch() in the SCMI cpufreq
    driver (Jagadeesh Kona).
 
  - Use dev_err_probe() in every error path in probe in the Mediatek
    cpufreq driver (Nícolas Prado).
 
  - Fix kernel-doc param for longhaul_setstate in the longhaul cpufreq
    driver (Yang Li).
 
  - Fix system resume handling in the CPPC cpufreq driver (Riwen Lu).
 
  - Improve the teo cpuidle governor and clean up leftover comments from
    the menu cpuidle governor (Christian Loehle).
 
  - Clean up a comment typo in the teo cpuidle governor (Atul Kumar
    Pant).
 
  - Add missing MODULE_DESCRIPTION() macro to cpuidle haltpoll (Jeff
    Johnson).
 
  - Switch the intel_idle driver to new Intel CPU model defines (Tony
    Luck).
 
  - Switch the Intel RAPL driver to new Intel CPU model defines (Tony Luck).
 
  - Simplify if condition in the idle_inject driver (Thorsten Blum).
 
  - Fix missing cleanup on error in _opp_attach_genpd() (Viresh Kumar).
 
  - Introduce an OF helper function to inform if required-opps is used
    and drop a redundant in-parameter to _set_opp_level() (Ulf Hansson).
 
  - Update pm-graph to v5.12 which includes fixes and major code revamp
    for python3.12 (Todd Brandt).
 
  - Address several assorted issues in the cpupower utility (Roman
    Storozhenko).

Merge tag 'pm-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management updates from Rafael Wysocki:
 "These add a new cpufreq driver for Loongson-3, add support for new
  features in the intel_pstate (Lunar Lake and Arrow Lake platforms, OOB
  mode for Emerald Rapids, highest performance change interrupt),
  amd-pstate (fast CPPC) and sun50i (Allwinner H700 speed bin) cpufreq
  drivers, simplify the cpufreq driver interface, simplify the teo
  cpuidle governor, adjust the pm-graph utility for a new version of
  Python, address issues and clean up code.

  Specifics:

   - Add Loongson-3 CPUFreq driver support (Huacai Chen)

   - Add support for the Arrow Lake and Lunar Lake platforms and the
     out-of-band (OOB) mode on Emerald Rapids to the intel_pstate
     cpufreq driver, make it support the highest performance change
     interrupt and clean it up (Srinivas Pandruvada)

   - Switch cpufreq to new Intel CPU model defines (Tony Luck)

   - Simplify the cpufreq driver interface by switching the .exit()
     driver callback to the void return data type (Lizhe, Viresh Kumar)

   - Make cpufreq_boost_enabled() return bool (Dhruva Gole)

   - Add fast CPPC support to the amd-pstate cpufreq driver, address
     multiple assorted issues in it and clean it up (Perry Yuan, Mario
     Limonciello, Dhananjay Ugwekar, Meng Li, Xiaojian Du)

   - Add Allwinner H700 speed bin to the sun50i cpufreq driver (Ryan
     Walklin)

   - Fix memory leaks and of_node_put() usage in the sun50i and
     qcom-nvmem cpufreq drivers (Javier Carrasco)

   - Clean up the sti and dt-platdev cpufreq drivers (Jeff Johnson,
     Raphael Gallais-Pou)

   - Fix deferred probe handling in the TI cpufreq driver and wrong
     return values of ti_opp_supply_probe(), and add OPP tables for the
     AM62Ax and AM62Px SoCs to it (Bryan Brattlof, Primoz Fiser)

   - Avoid overflow of target_freq in .fast_switch() in the SCMI cpufreq
     driver (Jagadeesh Kona)

   - Use dev_err_probe() in every error path in probe in the Mediatek
     cpufreq driver (Nícolas Prado)

   - Fix kernel-doc param for longhaul_setstate in the longhaul cpufreq
     driver (Yang Li)

   - Fix system resume handling in the CPPC cpufreq driver (Riwen Lu)

   - Improve the teo cpuidle governor and clean up leftover comments
     from the menu cpuidle governor (Christian Loehle)

   - Clean up a comment typo in the teo cpuidle governor (Atul Kumar
     Pant)

   - Add missing MODULE_DESCRIPTION() macro to cpuidle haltpoll (Jeff
     Johnson)

   - Switch the intel_idle driver to new Intel CPU model defines (Tony
     Luck)

   - Switch the Intel RAPL driver to new Intel CPU model defines (Tony
     Luck)

   - Simplify if condition in the idle_inject driver (Thorsten Blum)

   - Fix missing cleanup on error in _opp_attach_genpd() (Viresh Kumar)

   - Introduce an OF helper function to inform if required-opps is used
     and drop a redundant in-parameter to _set_opp_level() (Ulf Hansson)

   - Update pm-graph to v5.12 which includes fixes and major code revamp
     for python3.12 (Todd Brandt)

   - Address several assorted issues in the cpupower utility (Roman
     Storozhenko)"

* tag 'pm-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (77 commits)
  cpufreq: sti: fix build warning
  cpufreq: mediatek: Use dev_err_probe in every error path in probe
  cpufreq: Add Loongson-3 CPUFreq driver support
  cpufreq: Make cpufreq_driver->exit() return void
  cpufreq/amd-pstate: Fix the scaling_max_freq setting on shared memory CPPC systems
  cpufreq/amd-pstate-ut: Convert nominal_freq to khz during comparisons
  cpufreq: pcc: Remove empty exit() callback
  cpufreq: loongson2: Remove empty exit() callback
  cpufreq: nforce2: Remove empty exit() callback
  cpupower: fix lib default installation path
  cpufreq: docs: Add missing scaling_available_frequencies description
  cpuidle: teo: Don't count non-existent intercepts
  cpupower: Disable direct build of the 'bench' subproject
  cpuidle: teo: Remove recent intercepts metric
  Revert: "cpuidle: teo: Introduce util-awareness"
  cpufreq: make cpufreq_boost_enabled() return bool
  cpufreq: intel_pstate: Support highest performance change interrupt
  x86/cpufeatures: Add HWP highest perf change feature flag
  Documentation: cpufreq: amd-pstate: update doc for Per CPU boost control method
  cpufreq: amd-pstate: Cap the CPPC.max_perf to nominal_perf if CPB is off
  ...
2024-07-16 15:54:03 -07:00
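The power management pull above mentions avoiding an overflow of target_freq in .fast_switch() in the SCMI cpufreq driver. The log does not show the code, so the following is only a sketch of the general class of bug, assuming it involves converting a 32-bit frequency in kHz to Hz. The function names are hypothetical, and the example assumes a platform where `unsigned int` is 32 bits wide.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Buggy variant: both operands are 32-bit, so the product is computed
 * in 32 bits and wraps before it is widened to the 64-bit return type.
 */
uint64_t khz_to_hz_wrapping(uint32_t khz)
{
	return khz * 1000u;
}

/*
 * Fixed variant: widen one operand first so that the multiplication
 * itself is carried out in 64 bits.
 */
uint64_t khz_to_hz(uint32_t khz)
{
	return (uint64_t)khz * 1000u;
}
```

Any frequency above roughly 4.29 GHz (4294967 kHz) exceeds what a 32-bit Hz value can hold, so the wrapping variant silently returns a much smaller number for such inputs while agreeing with the fixed variant for lower frequencies.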
Linus Torvalds
f83e38fc9f xen: branch for v6.11-rc1

Merge tag 'for-linus-6.11-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen updates from Juergen Gross:

 - some trivial cleanups

 - a fix for the Xen timer

 - add boot time selectable debug capability to the Xen multicall
   handling

 - two fixes for the recently added Xen irqfd handling

* tag 'for-linus-6.11-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  x86/xen: remove deprecated xen_nopvspin boot parameter
  x86/xen: eliminate some private header files
  x86/xen: make some functions static
  xen: make multicall debug boot time selectable
  xen/arm: Convert comma to semicolon
  xen: privcmd: Fix possible access to a freed kirqfd instance
  xen: privcmd: Switch from mutex to spinlock for irqfds
  xen: add missing MODULE_DESCRIPTION() macros
  x86/xen: Convert comma to semicolon
  x86/xen/time: Reduce Xen timer tick
  xen/manage: Constify struct shutdown_handler
2024-07-16 12:30:57 -07:00
Linus Torvalds
e55037c879 EFI updates for v6.11
- Drop support for the 'fake' EFI memory map on x86
 
 - Add an SMBIOS based tweak to the EFI stub instructing the firmware on
   x86 Macbook Pros to keep both GPUs enabled
 
 - Replace 0-sized array with flexible array in EFI memory attributes
   table handling
 
 - Drop redundant BSS clearing when booting via the native PE entrypoint
   on x86
 
 - Avoid returning EFI_SUCCESS when aborting on an out-of-memory
   condition
 
 - Cosmetic tweak for arm64 KASLR loading logic

Merge tag 'efi-next-for-v6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi

Pull EFI updates from Ard Biesheuvel:
 "Note the removal of the EFI fake memory map support - this is believed
  to be unused and no longer worth supporting. However, we could easily
  bring it back if needed.

  With recent developments regarding confidential VMs and unaccepted
  memory, combined with kexec, creating a known inaccurate view of the
  firmware's memory map and handing it to the OS is a feature we can
  live without, hence the removal. Alternatively, I could imagine making
  this feature mutually exclusive with those confidential VM related
  features, but let's try simply removing it first.

  Summary:

   - Drop support for the 'fake' EFI memory map on x86

   - Add an SMBIOS based tweak to the EFI stub instructing the firmware
     on x86 Macbook Pros to keep both GPUs enabled

   - Replace 0-sized array with flexible array in EFI memory attributes
     table handling

   - Drop redundant BSS clearing when booting via the native PE
     entrypoint on x86

   - Avoid returning EFI_SUCCESS when aborting on an out-of-memory
     condition

   - Cosmetic tweak for arm64 KASLR loading logic"

* tag 'efi-next-for-v6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
  efi: Replace efi_memory_attributes_table_t 0-sized array with flexible array
  efi: Rename efi_early_memdesc_ptr() to efi_memdesc_ptr()
  arm64/efistub: Clean up KASLR logic
  x86/efistub: Drop redundant clearing of BSS
  x86/efistub: Avoid returning EFI_SUCCESS on error
  x86/efistub: Call Apple set_os protocol on dual GPU Intel Macs
  x86/efistub: Enable SMBIOS protocol handling for x86
  efistub/smbios: Simplify SMBIOS enumeration API
  x86/efi: Drop support for fake EFI memory maps
2024-07-16 12:22:07 -07:00
Paolo Bonzini
1c5a0b55ab KVM/arm64 changes for 6.11
- Initial infrastructure for shadow stage-2 MMUs, as part of nested
    virtualization enablement
 
  - Support for userspace changes to the guest CTR_EL0 value, enabling
    (in part) migration of VMs between heterogeneous hardware
 
  - Fixes + improvements to pKVM's FF-A proxy, adding support for v1.1 of
    the protocol
 
  - FPSIMD/SVE support for nested, including merged trap configuration
    and exception routing
 
  - New command-line parameter to control the WFx trap behavior under KVM
 
  - Introduce kCFI hardening in the EL2 hypervisor
 
  - Fixes + cleanups for handling presence/absence of FEAT_TCRX
 
  - Miscellaneous fixes + documentation updates

Merge tag 'kvmarm-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 changes for 6.11

 - Initial infrastructure for shadow stage-2 MMUs, as part of nested
   virtualization enablement

 - Support for userspace changes to the guest CTR_EL0 value, enabling
   (in part) migration of VMs between heterogeneous hardware

 - Fixes + improvements to pKVM's FF-A proxy, adding support for v1.1 of
   the protocol

 - FPSIMD/SVE support for nested, including merged trap configuration
   and exception routing

 - New command-line parameter to control the WFx trap behavior under KVM

 - Introduce kCFI hardening in the EL2 hypervisor

 - Fixes + cleanups for handling presence/absence of FEAT_TCRX

 - Miscellaneous fixes + documentation updates
2024-07-16 09:50:44 -04:00
Ilpo Järvinen
9d20c0535e
Docs/admin-guide: Remove pmf leftover reference from the index
pmf.rst was removed by the commit 2fd66f7d3b ("platform/x86/amd/pmf:
Remove update system state document") but the reference in the
admin-guide index remained in place which triggers this warning:

Documentation/admin-guide/index.rst:75: WARNING: toctree contains
reference to nonexisting document 'admin-guide/pmf'

Remove pmf also from the index to avoid the warning.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/r/20240715104102.4615-1-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2024-07-16 11:41:46 +03:00
Linus Torvalds
2439a5eaa7 - Add a spectre_bhi=vmexit mitigation option aimed at cloud
environments
 
  - Remove duplicated Spectre cmdline option documentation
 
  - Add separate macro definitions for syscall handlers which do not
    return in order to address objtool warnings

Merge tag 'x86_bugs_for_v6.11_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 cpu mitigation updates from Borislav Petkov:

 - Add a spectre_bhi=vmexit mitigation option aimed at cloud
   environments

 - Remove duplicated Spectre cmdline option documentation

 - Add separate macro definitions for syscall handlers which do not
   return in order to address objtool warnings

* tag 'x86_bugs_for_v6.11_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/bugs: Add 'spectre_bhi=vmexit' cmdline option
  x86/bugs: Remove duplicate Spectre cmdline option descriptions
  x86/syscall: Mark exit[_group] syscall handlers __noreturn
2024-07-15 20:07:27 -07:00
Linus Torvalds
181a984b7d - Remove an unused function and the documentation of an already removed
cmdline parameter

Merge tag 'x86_cleanups_for_v6.11_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 cleanups from Borislav Petkov:

 - Remove an unused function and the documentation of an already removed
   cmdline parameter

* tag 'x86_cleanups_for_v6.11_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/boot: Remove unused function __fortify_panic()
  Documentation: Remove "mfgpt_irq=" from the kernel-parameters.txt file
2024-07-15 19:34:20 -07:00
Linus Torvalds
b3c0eccb48 gpio updates for v6.11-rc1
GPIOLIB core:
 - rework kfifo handling rework in the character device code
 - improve the labeling of GPIOs requested as interrupts and show more info
   on interrupt-only GPIOs in debugfs
 - remove unused APIs
 - unexport interfaces that are only used from the core GPIO code
 - drop the return value from gpiochip_set_desc_names() as it cannot fail
 - move a string array definition out of a header and into a specific
   compilation unit
 - convert the last user of gpiochip_get_desc() other than GPIO core to using
   a safer alternative
 - use array_index_nospec() where applicable
 
 New drivers:
 - add a "virtual GPIO consumer" module that allows requesting GPIOs from actual
   hardware and driving tests of the in-kernel GPIO API from user-space over
   debugfs
 - add a GPIO-based "sloppy" logic analyzer module useful for "first glance"
   debugging on remote boards
 
 Driver improvements:
 - add support for a new model to gpio-pca953x
 - lock GPIOs as interrupts in gpio-sim when the lines are requested as irqs
   via the simulator domain + some other minor improvements
 - improve error reporting in gpio-syscon
 - convert gpio-ath79 to using dynamic GPIO base and range
 - use pcibios_err_to_errno() for converting PCIBIOS error codes to errno
   values in gpio-amd8111 and gpio-rdc321x
 - allow building gpio-brcmstb for the BCM2835 architecture
 
 DT bindings:
 - convert DT bindings for lsi,zevio, mpc8xxx, and atmel to DT schema
 - document new properties for aspeed,gpio, fsl,qoriq-gpio and gpio-vf610
 - document new compatibles for pca953x and fsl,qoriq-gpio
 
 Documentation:
 - document stricter behavior of the GPIO character device uAPI with regards to
   reconfiguring requested line without direction set
 - clarify the effect of the active-low flag on line values and edges
 - remove documentation for the legacy GPIO API in order to stop tempting
   people to use it
 - document the preference for using pread() for reading edge events in the
   sysfs API
 
 Other:
 - add an extended initializer to the interrupt simulator that lets callers
   specify callbacks to be notified when irqs are requested and released

Merge tag 'gpio-updates-for-v6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux

Pull gpio updates from Bartosz Golaszewski:
 "The majority of added lines are two new modules: the GPIO virtual
  consumer module that improves our ability to add automated tests for
  the kernel API and the "sloppy" logic analyzer module that uses the
  GPIO API to implement a coarse-grained debugging tool useful for
  remote development.

  Other than that we have the usual assortment of various driver
  extensions, improvements to the core GPIO code, DT-bindings and other
  documentation updates as well as an extension to the interrupt
  simulator:

  GPIOLIB core:
   - rework kfifo handling in the character device code
   - improve the labeling of GPIOs requested as interrupts and show more
     info on interrupt-only GPIOs in debugfs
   - remove unused APIs
   - unexport interfaces that are only used from the core GPIO code
   - drop the return value from gpiochip_set_desc_names() as it cannot
     fail
   - move a string array definition out of a header and into a specific
     compilation unit
   - convert the last user of gpiochip_get_desc() other than GPIO core
     to using a safer alternative
   - use array_index_nospec() where applicable

  New drivers:
   - add a "virtual GPIO consumer" module that allows requesting GPIOs
     from actual hardware and driving tests of the in-kernel GPIO API
     from user-space over debugfs
   - add a GPIO-based "sloppy" logic analyzer module useful for "first
     glance" debugging on remote boards

  Driver improvements:
   - add support for a new model to gpio-pca953x
   - lock GPIOs as interrupts in gpio-sim when the lines are requested
     as irqs via the simulator domain + some other minor improvements
   - improve error reporting in gpio-syscon
   - convert gpio-ath79 to using dynamic GPIO base and range
   - use pcibios_err_to_errno() for converting PCIBIOS error codes to
     errno values in gpio-amd8111 and gpio-rdc321x
   - allow building gpio-brcmstb for the BCM2835 architecture

  DT bindings:
   - convert DT bindings for lsi,zevio, mpc8xxx, and atmel to DT schema
   - document new properties for aspeed,gpio, fsl,qoriq-gpio and
     gpio-vf610
   - document new compatibles for pca953x and fsl,qoriq-gpio

  Documentation:
   - document stricter behavior of the GPIO character device uAPI with
     regards to reconfiguring requested line without direction set
   - clarify the effect of the active-low flag on line values and edges
   - remove documentation for the legacy GPIO API in order to stop
     tempting people to use it
   - document the preference for using pread() for reading edge events
     in the sysfs API

  Other:
   - add an extended initializer to the interrupt simulator that lets
     callers specify callbacks to be notified when irqs are requested
     and released"

* tag 'gpio-updates-for-v6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux: (41 commits)
  gpio: mc33880: Convert comma to semicolon
  gpio: virtuser: actually use the "trimmed" local variable
  dt-bindings: gpio: convert Atmel GPIO to json-schema
  gpio: virtuser: new virtual testing driver for the GPIO API
  dt-bindings: gpio: vf610: Allow gpio-line-names to be set
  gpio: sim: lock GPIOs as interrupts when they are requested
  genirq/irq_sim: add an extended irq_sim initializer
  dt-bindings: gpio: fsl,qoriq-gpio: Add compatible string fsl,ls1046a-gpio
  gpiolib: unexport gpiochip_get_desc()
  gpio: add sloppy logic analyzer using polling
  Documentation: gpio: Reconfiguration with unset direction (uAPI v2)
  Documentation: gpio: Reconfiguration with unset direction (uAPI v1)
  dt-bindings: gpio: fsl,qoriq-gpio: add common property gpio-line-names
  gpio: ath79: convert to dynamic GPIO base allocation
  pinctrl: da9062: replace gpiochip_get_desc() with gpio_device_get_desc()
  gpiolib: put gpio_suffixes in a single compilation unit
  Documentation: gpio: Clarify effect of active low flag on line edges
  Documentation: gpio: Clarify effect of active low flag on line values
  gpiolib: Remove data-less gpiochip_add() function
  gpio: sim: use devm_mutex_init()
  ...
2024-07-15 17:53:24 -07:00
Linus Torvalds
c89d780cc1 arm64 updates for 6.11:
* Virtual CPU hotplug support for arm64 ACPI systems
 
 * cpufeature infrastructure cleanups and making the FEAT_ECBHB ID bits
   visible to guests
 
 * CPU errata: expand the speculative SSBS workaround to more CPUs
 
 * arm64 ACPI:
 
   - acpi=nospcr option to disable SPCR as default console for arm64
 
   - Move some ACPI code (cpuidle, FFH) to drivers/acpi/arm64/
 
 * GICv3, use compile-time PMR values: optimise the way regular IRQs are
   masked/unmasked when GICv3 pseudo-NMIs are used, removing the need for
   a static key in fast paths by using a priority value chosen
   dynamically at boot time
 
 * arm64 perf updates:
 
   - Rework of the IMX PMU driver to enable support for I.MX95
 
   - Enable support for tertiary match groups in the CMN PMU driver
 
   - Initial refactoring of the CPU PMU code to prepare for the fixed
     instruction counter introduced by Arm v9.4
 
   - Add missing PMU driver MODULE_DESCRIPTION() strings
 
   - Hook up DT compatibles for recent CPU PMUs
 
 * arm64 kselftest updates:
 
   - Kernel mode NEON fp-stress
 
   - Cleanups, spelling mistakes
 
 * arm64 Documentation update with a minor clarification on TBI
 
 * Miscellaneous:
 
   - Fix missing IPI statistics
 
   - Implement raw_smp_processor_id() using thread_info rather than a
     per-CPU variable (better code generation)
 
   - Make MTE checking of in-kernel asynchronous tag faults conditional
     on KASAN being enabled
 
   - Minor cleanups, typos

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Catalin Marinas:
 "The biggest part is the virtual CPU hotplug that touches ACPI,
  irqchip. We also have some GICv3 optimisation for pseudo-NMIs that has
  been queued via the arm64 tree. Otherwise the usual perf updates,
  kselftest, various small cleanups.

  Core:

   - Virtual CPU hotplug support for arm64 ACPI systems

   - cpufeature infrastructure cleanups and making the FEAT_ECBHB ID
     bits visible to guests

   - CPU errata: expand the speculative SSBS workaround to more CPUs

   - GICv3, use compile-time PMR values: optimise the way regular IRQs
     are masked/unmasked when GICv3 pseudo-NMIs are used, removing the
     need for a static key in fast paths by using a priority value
     chosen dynamically at boot time

  ACPI:

   - 'acpi=nospcr' option to disable SPCR as default console for arm64

   - Move some ACPI code (cpuidle, FFH) to drivers/acpi/arm64/

  Perf updates:

   - Rework of the IMX PMU driver to enable support for I.MX95

   - Enable support for tertiary match groups in the CMN PMU driver

   - Initial refactoring of the CPU PMU code to prepare for the fixed
     instruction counter introduced by Arm v9.4

   - Add missing PMU driver MODULE_DESCRIPTION() strings

   - Hook up DT compatibles for recent CPU PMUs

  Kselftest updates:

   - Kernel mode NEON fp-stress

   - Cleanups, spelling mistakes

  Miscellaneous:

   - arm64 Documentation update with a minor clarification on TBI

   - Fix missing IPI statistics

   - Implement raw_smp_processor_id() using thread_info rather than a
     per-CPU variable (better code generation)

   - Make MTE checking of in-kernel asynchronous tag faults conditional
     on KASAN being enabled

   - Minor cleanups, typos"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (69 commits)
  selftests: arm64: tags: remove the result script
  selftests: arm64: tags_test: conform test to TAP output
  perf: add missing MODULE_DESCRIPTION() macros
  arm64: smp: Fix missing IPI statistics
  irqchip/gic-v3: Fix 'broken_rdists' unused warning when !SMP and !ACPI
  ACPI: Add acpi=nospcr to disable ACPI SPCR as default console on ARM64
  Documentation: arm64: Update memory.rst for TBI
  arm64/cpufeature: Replace custom macros with fields from ID_AA64PFR0_EL1
  KVM: arm64: Replace custom macros with fields from ID_AA64PFR0_EL1
  perf: arm_pmuv3: Include asm/arm_pmuv3.h from linux/perf/arm_pmuv3.h
  perf: arm_v6/7_pmu: Drop non-DT probe support
  perf/arm: Move 32-bit PMU drivers to drivers/perf/
  perf: arm_pmuv3: Drop unnecessary IS_ENABLED(CONFIG_ARM64) check
  perf: arm_pmuv3: Avoid assigning fixed cycle counter with threshold
  arm64: Kconfig: Fix dependencies to enable ACPI_HOTPLUG_CPU
  perf: imx_perf: add support for i.MX95 platform
  perf: imx_perf: fix counter start and config sequence
  perf: imx_perf: refactor driver for imx93
  perf: imx_perf: let the driver manage the counter usage rather the user
  perf: imx_perf: add macro definitions for parsing config attr
  ...
2024-07-15 17:06:19 -07:00
Linus Torvalds
895b9b1207 cgroup: Changes for v6.11
- Added Michal Koutný as a maintainer.
 
 - Counters in pids.events were behaving inconsistently. pids.events made
   properly hierarchical and pids.events.local added.
 
 - misc.peak and misc.events.local added.
 
 - cpuset remote partition creation and cpuset.cpus.exclusive handling
   improved.
 
 - Code cleanups, non-critical fixes, doc updates.
 
 - for-6.10-fixes is merged in to receive two non-critical fixes that didn't
   trigger pull.

Merge tag 'cgroup-for-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup

Pull cgroup updates from Tejun Heo:

 - Added Michal Koutný as a maintainer

 - Counters in pids.events were behaving inconsistently. pids.events
   made properly hierarchical and pids.events.local added

 - misc.peak and misc.events.local added

 - cpuset remote partition creation and cpuset.cpus.exclusive handling
   improved

 - Code cleanups, non-critical fixes, doc updates

 - for-6.10-fixes is merged in to receive two non-critical fixes that
   didn't trigger pull

* tag 'cgroup-for-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (23 commits)
  cgroup: Add Michal Koutný as a maintainer
  cgroup/misc: Introduce misc.events.local
  cgroup/rstat: add force idle show helper
  cgroup: Protect css->cgroup write under css_set_lock
  cgroup/misc: Introduce misc.peak
  cgroup_misc: add kernel-doc comments for enum misc_res_type
  cgroup/cpuset: Prevent UAF in proc_cpuset_show()
  selftest/cgroup: Update test_cpuset_prs.sh to match changes
  cgroup/cpuset: Make cpuset.cpus.exclusive independent of cpuset.cpus
  cgroup/cpuset: Delay setting of CS_CPU_EXCLUSIVE until valid partition
  selftest/cgroup: Fix test_cpuset_prs.sh problems reported by test robot
  cgroup/cpuset: Fix remote root partition creation problem
  cgroup: avoid the unnecessary list_add(dying_tasks) in cgroup_exit()
  cgroup/cpuset: Optimize isolated partition only generate_sched_domains() calls
  cgroup/cpuset: Reduce the lock protecting CS_SCHED_LOAD_BALANCE
  kernel/cgroup: cleanup cgroup_base_files when fail to add cgroup_psi_files
  selftests: cgroup: Add basic tests for pids controller
  selftests: cgroup: Lexicographic order in Makefile
  cgroup/pids: Add pids.events.local
  cgroup/pids: Make event counters hierarchical
  ...
2024-07-15 16:41:32 -07:00
Linus Torvalds
9855e87328 RCU pull request for v6.11
doc.2024.06.06a: Update Tasks RCU and Tasks Rude RCU description in
 	Requirements.rst and clarify rcu_assign_pointer() and
 	rcu_dereference() ordering properties.
 
 fixes.2024.07.04a: Add lockdep assertions for RCU readers, limit inline
 	wakeups for callback-bypass synchronize_rcu(), add an
 	rcutree.nohz_full_patience_delay to reduce nohz_full OS jitter,
 	add Uladzislau Rezki as RCU maintainer, and fix a subtle
 	callback-migration memory-ordering issue.
 
 mb.2024.06.28a: Remove a number of redundant memory barriers.
 
 nocb.2024.06.03a: Remove unnecessary bypass-list lock-contention
 	mitigation, use parking API instead of open-coded ad-hoc
 	equivalent, and upgrade obsolete comments.
 
 rcu-tasks.2024.06.06a: Revert avoidance of a deadlock that can no
 	longer occur and properly synchronize Tasks Trace RCU checking
 	of runqueues.
 
 rcutorture.2024.06.06a: Add tests for handling of double-call_rcu()
 	bug, add missing MODULE_DESCRIPTION, and add a script that
 	histograms the number of calls to RCU updaters.
 
 srcu.2024.06.18a: Fill out SRCU polled-grace-period API.

Merge tag 'rcu.2024.07.12a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu

Pull RCU updates from Paul McKenney:

 - Update Tasks RCU and Tasks Rude RCU description in Requirements.rst
   and clarify rcu_assign_pointer() and rcu_dereference() ordering
   properties

 - Add lockdep assertions for RCU readers, limit inline wakeups for
   callback-bypass synchronize_rcu(), add an
   rcutree.nohz_full_patience_delay to reduce nohz_full OS jitter, add
   Uladzislau Rezki as RCU maintainer, and fix a subtle
   callback-migration memory-ordering issue

 - Remove a number of redundant memory barriers

 - Remove unnecessary bypass-list lock-contention mitigation, use
   parking API instead of open-coded ad-hoc equivalent, and upgrade
   obsolete comments

 - Revert avoidance of a deadlock that can no longer occur and properly
   synchronize Tasks Trace RCU checking of runqueues

 - Add tests for handling of double-call_rcu() bug, add missing
   MODULE_DESCRIPTION, and add a script that histograms the number of
   calls to RCU updaters

 - Fill out SRCU polled-grace-period API

* tag 'rcu.2024.07.12a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (29 commits)
  rcu: Fix rcu_barrier() VS post CPUHP_TEARDOWN_CPU invocation
  rcu: Eliminate lockless accesses to rcu_sync->gp_count
  MAINTAINERS: Add Uladzislau Rezki as RCU maintainer
  rcu: Add rcutree.nohz_full_patience_delay to reduce nohz_full OS jitter
  rcu/exp: Remove redundant full memory barrier at the end of GP
  rcu: Remove full memory barrier on RCU stall printout
  rcu: Remove full memory barrier on boot time eqs sanity check
  rcu/exp: Remove superfluous full memory barrier upon first EQS snapshot
  rcu: Remove superfluous full memory barrier upon first EQS snapshot
  rcu: Remove full ordering on second EQS snapshot
  srcu: Fill out polled grace-period APIs
  srcu: Update cleanup_srcu_struct() comment
  srcu: Add NUM_ACTIVE_SRCU_POLL_OLDSTATE
  srcu: Disable interrupts directly in srcu_gp_end()
  rcu: Disable interrupts directly in rcu_gp_init()
  rcu/tree: Reduce wake up for synchronize_rcu() common case
  rcu/tasks: Fix stale task snaphot for Tasks Trace
  tools/rcu: Add rcu-updaters.sh script
  rcutorture: Add missing MODULE_DESCRIPTION() macros
  rcutorture: Fix rcu_torture_fwd_cb_cr() data race
  ...
2024-07-15 15:25:27 -07:00
Tejun Heo
9283ff5be1 Merge branch 'for-6.10-fixes' into for-6.11 2024-07-14 18:04:03 -10:00
Steve French
d2346e2836 cifs: fix setting SecurityFlags to true
Writing 1 to /proc/fs/cifs/SecurityFlags sets the flags to
CIFSSEC_MUST_NTLMV2.  That value is no longer relevant (the less
secure mechanisms such as lanman have been removed from cifs.ko),
it omits some flags (such as those for signing and encryption), and
it can even cause mounts to fail.  Change the handling so that
writing 1 selects Kerberos instead.

Also change the description of the SecurityFlags to remove mention
of flags which are no longer supported.

Cc: stable@vger.kernel.org
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2024-07-13 09:24:27 -05:00
Ryan Roberts
63d9866ab0 mm: shmem: rename mTHP shmem counters
The legacy PMD-sized THP counters at /proc/vmstat include thp_file_alloc,
thp_file_fallback and thp_file_fallback_charge, which rather confusingly
refer to shmem THP and do not include any other types of file pages.  This
is inconsistent since in most other places in the kernel, THP counters are
explicitly separated for anon, shmem and file flavours.  However, we are
stuck with it since it constitutes a user ABI.

Recently, commit 66f44583f9 ("mm: shmem: add mTHP counters for anonymous
shmem") added equivalent mTHP stats for shmem, keeping the same "file_"
prefix in the names.  But in future, we may want to add extra stats to
cover actual file pages, at which point, it would all become very
confusing.

So let's take the opportunity to rename these new counters "shmem_" before
the change makes it upstream and the ABI becomes immutable.  While we are
at it, let's improve the documentation for the legacy counters to make it
clear that they count shmem pages only.
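
As an illustration of the legacy counters discussed above, here is a minimal Python sketch that extracts them from /proc/vmstat-style text. The helper name `legacy_shmem_thp_counters()` is hypothetical; only the three counter names come from the commit message, and the "name value" line layout is an assumption about the file format.

```python
def legacy_shmem_thp_counters(vmstat_text: str) -> dict[str, int]:
    """Return the legacy PMD-sized counters that, despite the "file"
    in their names, count shmem THP only."""
    wanted = ("thp_file_alloc", "thp_file_fallback", "thp_file_fallback_charge")
    counters = {}
    for line in vmstat_text.splitlines():
        parts = line.split()
        # Each line is expected to look like "<name> <value>".
        if len(parts) == 2 and parts[0] in wanted:
            counters[parts[0]] = int(parts[1])
    return counters
```

On a live system the function could be fed the contents of /proc/vmstat, for example `legacy_shmem_thp_counters(open("/proc/vmstat").read())`.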

Link: https://lkml.kernel.org/r/20240710095503.3193901-1-ryan.roberts@arm.com
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Lance Yang <ioworker0@gmail.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Barry Song <baohua@kernel.org>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Daniel Gomez <da.gomez@samsung.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-12 15:52:23 -07:00
Ran Xiaokai
4c8763e84a kpageflags: detect isolated KPF_THP folios
When a folio is isolated, the PG_lru bit is cleared.  So the PG_lru check in
stable_page_flags() will miss such isolated folios.  Use
folio_test_large_rmappable() instead to also include isolated folios.

Since pagecache supports large folios and mTHP has been introduced, the
semantics of KPF_THP have expanded: it no longer indicates only
PMD-sized THP.  Update related documentation to clearly state that KPF_THP
indicates THPs of multiple orders.
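
For readers who consume this flag from user space, the following Python sketch decodes KPF_THP from raw /proc/kpageflags data. It assumes the documented layout of one native-endian 64-bit flag word per page frame, with KPF_THP at bit 22; the helper name `thp_pfns()` is hypothetical.

```python
import struct

KPF_THP = 22  # bit position of KPF_THP in each 64-bit flag word


def thp_pfns(raw: bytes, first_pfn: int = 0) -> list[int]:
    """Return the PFNs whose flag word has KPF_THP set.

    `raw` holds consecutive 64-bit flag words, as read from
    /proc/kpageflags starting at byte offset first_pfn * 8.
    """
    count = len(raw) // 8
    words = struct.unpack(f"={count}Q", raw[:count * 8])
    return [first_pfn + i for i, word in enumerate(words)
            if word & (1 << KPF_THP)]
```

Per the commit message, a set bit means the page belongs to a THP of some order, not necessarily a PMD-sized one.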

[ran.xiaokai@zte.com.cn: directly use is_zero_folio(), per David]
  Link: https://lkml.kernel.org/r/20240708062601.165215-1-ranxiaokai627@163.com
Link: https://lkml.kernel.org/r/20240705104343.112680-1-ranxiaokai627@163.com
Signed-off-by: Ran Xiaokai <ran.xiaokai@zte.com.cn>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Andrei Vagin <avagin@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Svetly Todorov <svetly.todorov@memverge.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-12 15:52:21 -07:00
Ryan Roberts
00f5810420 mm: fix khugepaged activation policy
Since the introduction of mTHP, the documentation has stated that
khugepaged would be enabled when any mTHP size is enabled, and disabled
when all mTHP sizes are disabled.  There are two problems with this:
(1) it is not what the code implemented, and (2) it is not the desirable
behavior.

Desirable behavior is for khugepaged to be enabled when any PMD-sized THP
is enabled, anon or file.  (Note that file THP is still controlled by the
top-level control so we must always consider that, as well as the PMD-size
mTHP control for anon).  khugepaged only supports collapsing to PMD-sized
THP so there is no value in enabling it when PMD-sized THP is disabled. 
So let's change the code and documentation to reflect this policy.

Further, changes to the per-size enabled controls were not previously
forwarded to khugepaged to give it an opportunity to start or stop.
Consequently, the following sequence erroneously left khugepaged
inactive:

  echo never > /sys/kernel/mm/transparent_hugepage/enabled
  echo always > /sys/kernel/mm/transparent_hugepage/hugepages-2048kB/enabled
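
The policy described above can be modelled in a few lines of Python. This is an illustrative sketch, not the kernel implementation: `ThpSettings` and `khugepaged_should_run()` are hypothetical names, and treating "madvise" as enabled and resolving "inherit" against the top-level control are assumptions.

```python
from dataclasses import dataclass


@dataclass
class ThpSettings:
    # Value of /sys/kernel/mm/transparent_hugepage/enabled:
    # "always", "madvise" or "never".
    global_enabled: str
    # Value of the PMD-sized hugepages-<size>kB/enabled control:
    # "always", "madvise", "inherit" or "never".
    pmd_anon_enabled: str


def khugepaged_should_run(s: ThpSettings) -> bool:
    """Run khugepaged only if some PMD-sized THP is enabled."""
    top_level_on = s.global_enabled in ("always", "madvise")
    # File THP is controlled by the top-level setting.
    file_pmd = top_level_on
    # Anonymous PMD-sized THP follows the per-size control,
    # which may defer to the top-level setting via "inherit".
    if s.pmd_anon_enabled == "inherit":
        anon_pmd = top_level_on
    else:
        anon_pmd = s.pmd_anon_enabled in ("always", "madvise")
    return file_pmd or anon_pmd
```

In this model the two-command sequence above corresponds to `ThpSettings("never", "always")`, for which `khugepaged_should_run()` returns True.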

[ryan.roberts@arm.com: v3]
  Link: https://lkml.kernel.org/r/20240705102849.2479686-1-ryan.roberts@arm.com
Link: https://lkml.kernel.org/r/20240705102849.2479686-1-ryan.roberts@arm.com
Link: https://lkml.kernel.org/r/20240704091051.2411934-1-ryan.roberts@arm.com
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Fixes: 3485b88390 ("mm: thp: introduce multi-size THP sysfs interface")
Closes: https://lore.kernel.org/linux-mm/7a0bbe69-1e3d-4263-b206-da007791a5c4@redhat.com/
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Lance Yang <ioworker0@gmail.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-12 15:52:20 -07:00
Lance Yang
9b89e01899 mm: add docs for per-order mTHP split counters
This commit introduces documentation for mTHP split counters in
transhuge.rst.

[ioworker0@gmail.com: improve the doc as suggested by Ryan]
  Link: https://lkml.kernel.org/r/20240704012905.42971-3-ioworker0@gmail.com
[ioworker0@gmail.com: tweak Documentation/admin-guide/mm/transhuge.rst]
  Link: https://lkml.kernel.org/r/20240707013659.1151-1-ioworker0@gmail.com
Link: https://lkml.kernel.org/r/20240628130750.73097-3-ioworker0@gmail.com
Signed-off-by: Mingzhe Yang <mingzhe.yang@ly.com>
Signed-off-by: Lance Yang <ioworker0@gmail.com>
Reviewed-by: Barry Song <baohua@kernel.org>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Bang Li <libang.li@antgroup.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-12 15:52:13 -07:00
Vidya Sagar
47c8846a49 PCI: Extend ACS configurability
PCIe ACS settings control the level of isolation and the possible P2P paths
between devices. With greater isolation the kernel will create smaller
iommu_groups and with less isolation there is more HW that can achieve P2P
transfers. From a virtualization perspective all devices in the same
iommu_group must be assigned to the same VM as they lack security
isolation.

There is no way for the kernel to automatically know the correct ACS
settings for any given system and workload. Existing command line options
(e.g., disable_acs_redir) allow only for large scale change, disabling all
isolation, but this is not sufficient for more complex cases.

Add a kernel command-line option 'config_acs' to directly control all the
ACS bits for specific devices, which allows the operator to set up the right
level of isolation to achieve the desired P2P configuration.  The
definition is future proof; when new ACS bits are added to the spec the
open syntax can be extended.

ACS needs to be set up early in the kernel boot as the ACS settings affect
how iommu_groups are formed. iommu_group formation is a one time event
during initial device discovery, so changing ACS bits after kernel boot can
result in an inaccurate view of the iommu_groups compared to the current
isolation configuration.
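
Because groups are formed once during device discovery, it can help to
inspect the grouping that resulted from a given boot.  A minimal sketch,
assuming the usual /sys/kernel/iommu_groups sysfs layout (that path and the
function name are not part of this commit):

```shell
# List each IOMMU group together with the devices it contains.
# $1 is the iommu_groups directory (defaults to the usual sysfs location).
list_iommu_groups() {
    root="${1:-/sys/kernel/iommu_groups}"
    for group in "$root"/*; do
        [ -d "$group/devices" ] || continue
        for dev in "$group"/devices/*; do
            [ -e "$dev" ] || continue
            printf 'group %s: %s\n' "${group##*/}" "${dev##*/}"
        done
    done
}
```

Comparing this listing across boots with different ACS settings shows
whether a change actually split or merged the groups of interest.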

ACS applies to PCIe Downstream Ports and multi-function devices.  The
default ACS settings are strict and deny any direct traffic between two
functions. This results in the smallest iommu_group the HW can support.
Frequently these values result in slow or non-working P2PDMA.

ACS offers a range of security choices controlling how traffic is
allowed to go directly between two devices. Some popular choices:

  - Full prevention

  - Translated requests can be direct, with various options

  - Asymmetric direct traffic, A can reach B but not the reverse

  - All traffic can be direct

Along with some other less common ones for special topologies.

The intention is that this option would be used with expert knowledge of
the HW capability and workload to achieve the desired configuration.

Link: https://lore.kernel.org/r/20240625153150.159310-1-vidyas@nvidia.com
Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
[bhelgaas: add example, tidy printk formats]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2024-07-12 16:51:46 -05:00
Xiu Jianfeng
6a26f9c689 cgroup/misc: Introduce misc.events.local
Currently the event counting provided by misc.events is hierarchical,
which is not practical if the user is only concerned with the events of a
specific cgroup. Therefore, introduce misc.events.local to collect events
specific to the given cgroup.

This is analogous to memory.events.local and pids.events.local.
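
To see the difference between the two files for one cgroup, something
like the following sketch could be used.  The function name and the
example cgroup path are hypothetical; the code only prefixes each line
with the file it came from and makes no assumption about the entries.

```shell
# Show hierarchical and local misc events for one cgroup.
# $1 is the cgroup directory, e.g. /sys/fs/cgroup/mygroup (hypothetical).
show_misc_events() {
    cg="$1"
    for f in misc.events misc.events.local; do
        [ -r "$cg/$f" ] || continue
        # Prefix every line with the name of the file it was read from.
        sed "s|^|$f: |" "$cg/$f"
    done
}
```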

Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2024-07-12 06:45:23 -10:00
Jiaxun Yang
59649de96f MIPS: Implement ieee754 NAN2008 emulation mode
Implement ieee754 NAN2008 emulation mode.

When this mode is enabled, the kernel will accept ELF files
compiled for both NaN 2008 and NaN legacy, but if the hardware
is not capable of matching the ELF's NaN mode, __own_fpu
will fail for the corresponding thread and fpuemu will then kick
in.

This mode trades performance for correctness, while maintaining
support for both NaN modes regardless of hardware capability.
It is useful for multilib installations that have both types
of binary present on the system.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2024-07-12 13:09:25 +02:00
Juergen Gross
9fe6a8c5b2 x86/xen: remove deprecated xen_nopvspin boot parameter
The xen_nopvspin boot parameter has been deprecated since 2019. nopvspin
can be used instead.

Remove the xen_nopvspin boot parameter and replace the xen_pvspin
variable use cases with nopvspin.

This requires moving the nopvspin variable out of the .initdata
section, as it needs to be accessed for cpuhotplug, too.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Message-ID: <20240710110139.22300-1-jgross@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
2024-07-11 16:33:51 +02:00
Juergen Gross
942d917cb9 xen: make multicall debug boot time selectable
Today Xen multicall debugging needs to be enabled via modifying a
define in a source file for getting debug data of multicall errors
encountered by users.

Switch multicall debugging to depend on a boot parameter "xen_mc_debug"
instead, enabling affected users to boot with the new parameter set in
order to get better diagnostics.

With debugging enabled print the following information in case at least
one of the batched calls failed:
- all calls of the batch with operation, result and caller
- all parameters of each call
- all parameters stored in the multicall data for each call
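
When collecting a report from an affected user, it is useful to confirm
that the system really was booted with the parameter.  A small sketch
(the function name is illustrative, and /proc/cmdline is assumed to
reflect the boot command line):

```shell
# Return success if boot parameter $1 appears on the kernel command line,
# either bare or in "name=value" form.
# $2 is the file to read (defaults to /proc/cmdline).
cmdline_has_param() {
    awk -v p="$1" '{
        for (i = 1; i <= NF; i++)
            if ($i == p || index($i, p "=") == 1)
                found = 1
    } END { exit !found }' "${2:-/proc/cmdline}"
}

# Example use:
#   cmdline_has_param xen_mc_debug && echo "multicall debugging requested"
```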

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Message-ID: <20240710092749.13595-1-jgross@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
2024-07-11 16:33:39 +02:00
Ingo Molnar
011b1134b8 Merge branch 'sched/urgent' into sched/core, to pick up fixes and refresh the branch
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2024-07-11 10:42:33 +02:00
Shyam Sundar S K
2fd66f7d3b platform/x86/amd/pmf: Remove update system state document
This commit removes the "pmf.rst" document, which was associated with
the PMF driver that enabled system state updates based on TA output
actions.

The driver now uses existing input events (KEY_SCREENLOCK, KEY_SLEEP,
and KEY_SUSPEND) instead of defining new udev rules in the
"/etc/udev/rules.d/" directory. Consequently, the pmf.rst document is no
longer necessary and is being removed.

Co-developed-by: Patil Rajesh Reddy <Patil.Reddy@amd.com>
Signed-off-by: Patil Rajesh Reddy <Patil.Reddy@amd.com>
Signed-off-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com>
Link: https://lore.kernel.org/r/20240711052047.1531957-2-Shyam-sundar.S-k@amd.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2024-07-11 10:41:53 +03:00
SeongJae Park
752f18b9d9 Docs/admin-guide/mm/damon/start: add access pattern snapshot example
The DAMON user-space tool (damo) provides an access pattern snapshot
feature, which is expected to be frequently used for real-time access
pattern analysis.  The snapshot output also shows what DAMON provides on
its own, including the 'age' information.

In contrast, the recorded access patterns, which are shown as an example
usage in the quick start section, show what users can make from what
DAMON provides.  They include information that is generated outside of
DAMON and make the 'age' concept a bit unclear.  Hence the snapshot output
is easier to use for understanding the raw real-time output of DAMON.  Add
the snapshot usage example to the quick start section.

Link: https://lkml.kernel.org/r/20240701192706.51415-4-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-10 12:14:52 -07:00
Mikulas Patocka
0d815e3400 dm-crypt: limit the size of encryption requests
There was a performance regression reported where dm-crypt would perform
worse on new kernels than on old kernels. The reason is that the old
kernels split the bios to NVMe request size (that is usually 65536 or
131072 bytes) and the new kernels pass the big bios through dm-crypt and
split them underneath.

If a big 1MiB bio is passed to dm-crypt, dm-crypt processes it on a single
core without parallelization and this is what causes the performance
degradation.

This commit introduces new tunable variables
/sys/module/dm_crypt/parameters/max_read_size and
/sys/module/dm_crypt/parameters/max_write_size that specify the maximum
bio size for dm-crypt. Bios larger than this value are split, so that
they can be encrypted in parallel by multiple cores. If these variables
are '0', a default 131072 is used.

Splitting bios may cause performance regressions in other workloads - if
this happens, the user should increase the value in max_read_size and
max_write_size variables.
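
As a sketch of how the tunables might be inspected, the helper below
prints the effective limit, substituting the default described above when
the parameter is 0.  The function name is hypothetical; the paths and the
131072 default are taken from this commit message.

```shell
# Print the effective dm-crypt split limit for "read" or "write".
# $1 is "read" or "write"; $2 is the parameter directory (defaults to sysfs).
dm_crypt_limit() {
    dir="${2:-/sys/module/dm_crypt/parameters}"
    val=$(cat "$dir/max_${1}_size") || return 1
    # Per the commit message, 0 means "use the default of 131072".
    if [ "$val" -eq 0 ]; then
        val=131072
    fi
    echo "$val"
}
```

If splitting hurts a particular workload, the limit can then be raised by
writing a larger value to the same file as root, for example
"echo 262144 > /sys/module/dm_crypt/parameters/max_read_size".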

max_read_size:
128k    2399MiB/s
256k    2368MiB/s
512k    1986MiB/s
1024k   1790MiB/s

max_write_size:
128k    1712MiB/s
256k    1651MiB/s
512k    1537MiB/s
1024k   1332MiB/s

Note that if you run dm-crypt inside a virtual machine, you may need to do
"echo numa >/sys/module/workqueue/parameters/default_affinity_scope" to
improve performance.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Tested-by: Laurence Oberman <loberman@redhat.com>
2024-07-10 13:09:50 +02:00
Rafael J. Wysocki
9dabb5b48f Merge back cpufreq material for 6.11.
2024-07-10 13:03:11 +02:00
Daniel Lublin
9784f29bf5 Documentation: add reference from dynamic debug to loglevel kernel params
This is useful information for somebody who has managed to dig into
enabling debug output, but is wondering why there is no such output
appearing on the console.
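
The situation can be checked with a sketch like the one below.  It assumes
that the first field of /proc/sys/kernel/printk is the current console
loglevel and that debug messages are logged at level 7; the function name
is made up for illustration.

```shell
# Return success if KERN_DEBUG (level 7) messages would reach the console.
# $1 is the file to read (defaults to /proc/sys/kernel/printk); its first
# field is taken to be the current console loglevel.
debug_reaches_console() {
    file="${1:-/proc/sys/kernel/printk}"
    read -r console_level _rest < "$file" || return 2
    # Only messages with a level numerically below the console loglevel
    # are printed on the console.
    [ "$console_level" -gt 7 ]
}
```

When this returns failure, dynamic debug output is still recorded in the
kernel log buffer but is not shown on the console.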

Signed-off-by: Daniel Lublin <daniel@lublin.se>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/4633bdb82c1c7c014d79840887878624a55c59f8.1720533043.git.daniel@lublin.se
2024-07-09 08:57:52 -06:00
Bibo Mao
03779999ac LoongArch: KVM: Add PV steal time support in guest side
Per-cpu struct kvm_steal_time is added here, its size is 64 bytes and
also defined as 64 bytes, so that the whole structure is in one physical
page.

When a VCPU is online, function pv_enable_steal_time() is called. This
function will pass guest physical address of struct kvm_steal_time and
tells hypervisor to enable steal time. When a vcpu is offline, physical
address is set as 0 and tells hypervisor to disable steal time.

Here is an output of vmstat on guest when there is workload on both host
and guest. It shows steal time stat information.

procs -----------memory---------- -----io---- -system-- ------cpu-----
 r  b   swpd   free  inact active   bi    bo   in   cs us sy id wa st
15  1      0 7583616 184112  72208    20    0  162   52 31  6 43  0 20
17  0      0 7583616 184704  72192    0     0 6318 6885  5 60  8  5 22
16  0      0 7583616 185392  72144    0     0 1766 1081  0 49  0  1 50
16  0      0 7583616 184816  72304    0     0 6300 6166  4 62 12  2 20
18  0      0 7583632 184480  72240    0     0 2814 1754  2 58  4  1 35
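
The "st" column above can be cross-checked against the raw counter.  A
minimal sketch, assuming the standard /proc/stat layout in which steal is
the eighth value on the aggregate "cpu" line (the function name is
illustrative):

```shell
# Print cumulative steal time (in clock ticks) summed over all CPUs.
# $1 is the stat file to read (defaults to /proc/stat).
steal_ticks() {
    # Fields: cpu user nice system idle iowait irq softirq steal ...
    awk '$1 == "cpu" { print $9 }' "${1:-/proc/stat}"
}
```

Sampling this twice and taking the difference gives the steal time
accumulated over the interval.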

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-07-09 16:25:51 +08:00
Bartosz Golaszewski
91581c4b3f gpio: virtuser: new virtual testing driver for the GPIO API
The GPIO subsystem used to have a serious problem with undefined behavior
and use-after-free bugs on hot-unplug of GPIO chips. This can be
considered a corner-case by some as most GPIO controllers are enabled
early in the boot process and live until the system goes down but most
GPIO drivers do allow unbind over sysfs, many are loadable modules that
can be (force) unloaded and there are also GPIO devices that can be
dynamically detached, for instance CP2112 which is a USB GPIO expander.

Bugs can be triggered both from user-space as well as by in-kernel users.
We have the means of testing it from user-space via the character device
but the issues manifest themselves differently in the kernel.

This proposes adding a new virtual driver: a GPIO consumer that can be
configured over configfs (similarly to gpio-sim) or described in the
device tree.

This driver is intended as a helper for spotting any regressions in
hot-unplug handling in GPIOLIB.

Link: https://lore.kernel.org/r/20240708142912.120570-1-brgl@bgdev.pl
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2024-07-09 09:39:54 +02:00
Jiaqi Yan
44195d1eba docs: mm: add enable_soft_offline sysctl
Add the documentation for soft offline behaviors / costs, and what the new
enable_soft_offline sysctl is for.

[jiaqiyan@google.com: fix kerneldoc warnings]
  Link: https://lkml.kernel.org/r/CACw3F52=GxTCDw-PqFh3-GDM-fo3GbhGdu0hedxYXOTT4TQSTg@mail.gmail.com
[jiaqiyan@google.com: there are more blank lines needed]
  Link: https://lkml.kernel.org/r/CACw3F52_obAB742XeDRNun4BHBYtrxtbvp5NkUincXdaob0j1g@mail.gmail.com
Link: https://lkml.kernel.org/r/20240626050818.2277273-5-jiaqiyan@google.com
Signed-off-by: Jiaqi Yan <jiaqiyan@google.com>
Acked-by: Oscar Salvador <osalvador@suse.de>
Acked-by: Miaohe Lin <linmiaohe@huawei.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Frank van der Linden <fvdl@google.com>
Cc: Jane Chu <jane.chu@oracle.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Lance Yang <ioworker0@gmail.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-04 18:06:00 -07:00
Dan Schatzberg
68cd9050d8 mm: add swappiness= arg to memory.reclaim
Allow proactive reclaimers to submit an additional swappiness=<val>
argument to memory.reclaim.  This overrides the global or per-memcg
swappiness setting for that reclaim attempt.

For example:

echo "2M swappiness=0" > /sys/fs/cgroup/memory.reclaim

will perform reclaim on the rootcg with a swappiness setting of 0 (no
swap) regardless of the vm.swappiness sysctl setting.
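
A proactive reclaimer could wrap this in a small helper.  The sketch below
only builds the same request string as the example above and writes it to
a given cgroup; the function name is hypothetical.

```shell
# Ask the kernel to reclaim memory from one cgroup with a per-request
# swappiness value.
# $1: cgroup directory, $2: amount (e.g. 2M), $3: swappiness value.
reclaim_with_swappiness() {
    printf '%s swappiness=%s\n' "$2" "$3" > "$1/memory.reclaim"
}
```

Calling "reclaim_with_swappiness /sys/fs/cgroup 2M 0" is equivalent to the
echo command shown above.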

Userspace proactive reclaimers use the memory.reclaim interface to trigger
reclaim.  The memory.reclaim interface does not allow for any way to
effect the balance of file vs anon during proactive reclaim.  The only
approach is to adjust the vm.swappiness setting.  However, there are a few
reasons we look to control the balance of file vs anon during proactive
reclaim, separately from reactive reclaim:

* Swapout should be limited to manage SSD write endurance.  In near-OOM
  situations we are fine with lots of swap-out to avoid OOMs.  As these
  are typically rare events, they have relatively little impact on write
  endurance.  However, proactive reclaim runs continuously and so its
  impact on SSD write endurance is more significant.  Therefore it is
  desirable to control swap-out for proactive reclaim separately from
  reactive reclaim.

* Some userspace OOM killers like systemd-oomd[1] support OOM killing on
  swap exhaustion.  This makes sense if the swap exhaustion is triggered
  due to reactive reclaim but less so if it is triggered due to proactive
  reclaim (e.g.  one could see OOMs when free memory is ample but anon is
  just particularly cold).  Therefore, it's desirable to have proactive
  reclaim reduce or stop swap-out before the threshold at which OOM
  killing occurs.

In the case of Meta's Senpai proactive reclaimer, we adjust vm.swappiness
before writes to memory.reclaim[2].  This has been in production for
nearly two years and has addressed our needs to control proactive vs
reactive reclaim behavior but is still not ideal for a number of reasons:

* vm.swappiness is a global setting, adjusting it can race/interfere
  with other system administration that wishes to control vm.swappiness. 
  In our case, we need to disable Senpai before adjusting vm.swappiness.

* vm.swappiness is stateful - so a crash or restart of Senpai can leave
  a misconfigured setting.  This requires some additional management to
  record the "desired" setting and ensure Senpai always adjusts to it.

With this patch, we avoid these downsides of adjusting vm.swappiness
globally.

[1]https://www.freedesktop.org/software/systemd/man/latest/systemd-oomd.service.html
[2]https://github.com/facebookincubator/oomd/blob/main/src/oomd/plugins/Senpai.cpp#L585-L598

Link: https://lkml.kernel.org/r/20240103164841.2800183-3-schatzberg.dan@gmail.com
Signed-off-by: Dan Schatzberg <schatzberg.dan@gmail.com>
Suggested-by: Yosry Ahmed <yosryahmed@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Chris Li <chrisl@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Yue Zhao <findns94@gmail.com>
Cc: Zefan Li <lizefan.x@bytedance.com>
Cc: Nhat Pham <nphamcs@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-04 18:05:55 -07:00
Paul E. McKenney
68d124b099 rcu: Add rcutree.nohz_full_patience_delay to reduce nohz_full OS jitter
If a CPU is running either a userspace application or a guest OS in
nohz_full mode, it is possible for a system call to occur just as an
RCU grace period is starting.  If that CPU also has the scheduling-clock
tick enabled for any reason (such as a second runnable task), and if the
system was booted with rcutree.use_softirq=0, then RCU can add insult to
injury by awakening that CPU's rcuc kthread, resulting in yet another
task and yet more OS jitter due to switching to that task, running it,
and switching back.

In addition, in the common case where that system call is not of
excessively long duration, awakening the rcuc task is pointless.
This pointlessness is due to the fact that the CPU will enter an extended
quiescent state upon returning to the userspace application or guest OS.
In this case, the rcuc kthread cannot do anything that the main RCU
grace-period kthread cannot do on its behalf, at least if it is given
a few additional milliseconds (for example, given the time duration
specified by rcutree.jiffies_till_first_fqs, give or take scheduling
delays).

This commit therefore adds a rcutree.nohz_full_patience_delay kernel
boot parameter that specifies the grace period age (in milliseconds,
rounded to jiffies) before which RCU will refrain from awakening the
rcuc kthread.  Preliminary experimentation suggests a value of 1000,
that is, one second.  Increasing rcutree.nohz_full_patience_delay will
increase grace-period latency and in turn increase memory footprint,
so systems with constrained memory might choose a smaller value.
Systems with less-aggressive OS-jitter requirements might choose the
default value of zero, which keeps the traditional immediate-wakeup
behavior, thus avoiding increases in grace-period latency.
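
For example, a system following the suggestion above might boot with
"rcutree.use_softirq=0 rcutree.nohz_full_patience_delay=1000".  The sketch
below reads a parameter's value back from the command line; the function
name is illustrative, and it reports only what was requested at boot, not
what the kernel ended up using.

```shell
# Print the value of a name=value boot parameter, or nothing if it is absent.
# $1: parameter name, $2: file to read (defaults to /proc/cmdline).
cmdline_param_value() {
    awk -v p="$1" '{
        for (i = 1; i <= NF; i++)
            if (index($i, p "=") == 1)
                print substr($i, length(p) + 2)
    }' "${2:-/proc/cmdline}"
}
```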

[ paulmck: Apply Leonardo Bras feedback.  ]

Link: https://lore.kernel.org/all/20240328171949.743211-1-leobras@redhat.com/

Reported-by: Leonardo Bras <leobras@redhat.com>
Suggested-by: Leonardo Bras <leobras@redhat.com>
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Leonardo Bras <leobras@redhat.com>
2024-07-04 13:47:39 -07:00
Liu Wei
f5a4af3c75 ACPI: Add acpi=nospcr to disable ACPI SPCR as default console on ARM64
For varying privacy and security reasons, sometimes we would like to
completely silence the _serial_ console, and only enable it when needed.

But there are many existing systems that depend on this _serial_ console,
so add acpi=nospcr to disable the use of the console in the ACPI SPCR
table as the default _serial_ console.

Signed-off-by: Liu Wei <liuwei09@cestc.cn>
Suggested-by: Prarit Bhargava <prarit@redhat.com>
Suggested-by: Will Deacon <will@kernel.org>
Suggested-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
Reviewed-by: Prarit Bhargava <prarit@redhat.com>
Link: https://lore.kernel.org/r/20240625030504.58025-1-liuwei09@cestc.cn
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-07-04 16:33:13 +01:00
Tony Lindgren
17199dfccd Documentation: kernel-parameters: Add DEVNAME:0.0 format for serial ports
Document the console option for DEVNAME:0.0 style addressing for serial
ports.

Suggested-by: Sebastian Reichel <sre@kernel.org>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Reviewed-by: Dhruva Gole <d-gole@ti.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20240703100615.118762-4-tony.lindgren@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-07-04 15:41:44 +02:00
Shubhang Kaushik OS
55ccad6fc1 vmalloc: modify the alloc_vmap_area() error message for better diagnostics
'vmap allocation for size %lu failed: use vmalloc=<size> to increase size'
The above warning is printed by the kernel when allocations from the
restricted virtual memory range fail because the range is exhausted.

This message is misleading because 'vmalloc=' is supported on arm32, x86
platforms and is not a valid kernel parameter on a number of other
platforms (in particular it is not supported on arm64, alpha, loongarch,
arc, csky, hexagon, microblaze, mips, nios2, openrisc, parisc, m68k,
powerpc, riscv, sh, um, xtensa, s390, sparc).  With the update, the output
gets modified to include the function parameters along with the start and
end of the virtual memory range allowed.

The warning message after fix on kernel version 6.10.0-rc1+:

vmalloc_node_range for size 33619968 failed: Address range restricted between 0xffff800082640000 - 0xffff800084650000

Backtrace with the misleading error message:

	vmap allocation for size 33619968 failed: use vmalloc=<size> to increase size
	insmod: vmalloc error: size 33554432, vm_struct allocation failed, mode:0xcc0(GFP_KERNEL), nodemask=(null),cpuset=/,mems_allowed=0
	CPU: 46 PID: 1977 Comm: insmod Tainted: G            E      6.10.0-rc1+ #79
	Hardware name: INGRASYS Yushan Server iSystem TEMP-S000141176+10/Yushan Motherboard, BIOS 2.10.20230517 (SCP: xxx) yyyy/mm/dd
	Call trace:
		dump_backtrace+0xa0/0x128
		show_stack+0x20/0x38
		dump_stack_lvl+0x78/0x90
		dump_stack+0x18/0x28
		warn_alloc+0x12c/0x1b8
		__vmalloc_node_range_noprof+0x28c/0x7e0
		custom_init+0xb4/0xfff8 [test_driver]
		do_one_initcall+0x60/0x290
		do_init_module+0x68/0x250
		load_module+0x236c/0x2428
		init_module_from_file+0x8c/0xd8
		__arm64_sys_finit_module+0x1b4/0x388
		invoke_syscall+0x78/0x108
		el0_svc_common.constprop.0+0x48/0xf0
		do_el0_svc+0x24/0x38
		el0_svc+0x3c/0x130
		el0t_64_sync_handler+0x100/0x130
		el0t_64_sync+0x190/0x198

[Shubhang@os.amperecomputing.com: v5]
  Link: https://lkml.kernel.org/r/CH2PR01MB5894B0182EA0B28DF2EFB916F5C72@CH2PR01MB5894.prod.exchangelabs.com
Link: https://lkml.kernel.org/r/MN2PR01MB59025CC02D1D29516527A693F5C62@MN2PR01MB5902.prod.exchangelabs.com
Signed-off-by: Shubhang Kaushik <shubhang@os.amperecomputing.com>
Reviewed-by: Christoph Lameter (Ampere) <cl@linux.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Guo Ren <guoren@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: Xiongwei Song <xiongwei.song@windriver.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-03 19:30:18 -07:00
Honggyu Kim
83d0d46a80 Docs/damon: document damos_migrate_{hot,cold}
This patch adds DAMON descriptions for the "migrate_hot" and "migrate_cold"
actions to both the usage and design documents, as well as a new
"target_nid" knob to set the migration target node.

[sj@kernel.org: trivial fixups for DAMOS_MIGRATE_{HOT,COLD} documentation]
  Link: https://lkml.kernel.org/r/20240618213630.84846-2-sj@kernel.org
Link: https://lkml.kernel.org/r/20240614030010.751-8-honggyu.kim@sk.com
Signed-off-by: Honggyu Kim <honggyu.kim@sk.com>
Signed-off-by: SeongJae Park <sj@kernel.org>
Reviewed-by: SeongJae Park <sj@kernel.org>
Cc: Gregory Price <gregory.price@memverge.com>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Hyeongtak Ji <hyeongtak.ji@sk.com>
Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Rakie Kim <rakie.kim@sk.com>
Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-03 19:30:13 -07:00
David Hildenbrand
478fd0d8ec Documentation/admin-guide/mm/pagemap.rst: drop "Using pagemap to do something useful"
That example was added in 2008.  In 2015, we restricted access to the PFNs
in the pagemap to CAP_SYS_ADMIN, making that approach much less usable.

It's 2024 now, and using that racy and low-level mechanism to calculate
the USS should not be considered a good example anymore.  /proc/$pid/smaps
and /proc/$pid/smaps_rollup can do a much better job without any of that
low-level handling.
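
As an illustration of the smaps_rollup approach, the sketch below sums the
private clean and private dirty totals, which is one common way of
approximating the USS.  The function name is made up, and the choice of
fields is an assumption rather than something this commit specifies.

```shell
# Print an approximate USS in kB for one process: the sum of its private
# clean and private dirty memory as reported by smaps_rollup.
# $1: path to a smaps_rollup file, e.g. /proc/self/smaps_rollup.
uss_kb() {
    awk '$1 == "Private_Clean:" || $1 == "Private_Dirty:" { sum += $2 }
         END { print sum + 0 }' "$1"
}
```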

Let's just drop that example.

Link: https://lkml.kernel.org/r/20240607122357.115423-7-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Lance Yang <ioworker0@gmail.com>
Cc: Oscar Salvador <osalvador@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-03 19:30:06 -07:00
Baolin Wang
66f44583f9 mm: shmem: add mTHP counters for anonymous shmem
Add mTHP counters for anonymous shmem.

[baolin.wang@linux.alibaba.com: update Documentation/admin-guide/mm/transhuge.rst]
  Link: https://lkml.kernel.org/r/d86e2e7f-4141-432b-b2ba-c6691f36ef0b@linux.alibaba.com
Link: https://lkml.kernel.org/r/4fd9e467d49ae4a747e428bcd821c7d13125ae67.1718090413.git.baolin.wang@linux.alibaba.com
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Lance Yang <ioworker0@gmail.com>
Cc: Barry Song <v-songbaohua@oppo.com>
Cc: Daniel Gomez <da.gomez@samsung.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Pankaj Raghav <p.raghav@samsung.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-03 19:30:04 -07:00
Baolin Wang
4b98995530 mm: shmem: add multi-size THP sysfs interface for anonymous shmem
To support the use of mTHP with anonymous shmem, add a new sysfs interface
'shmem_enabled' in the '/sys/kernel/mm/transparent_hugepage/hugepages-kB/'
directory for each mTHP to control whether shmem is enabled for that mTHP,
with a value similar to the top level 'shmem_enabled', which can be set
to: "always", "inherit (to inherit the top level setting)", "within_size",
"advise", "never".  An 'inherit' option is added to ensure compatibility
with these global settings, and the options 'force' and 'deny' are
dropped, which are rather testing artifacts from the old ages.

By default, PMD-sized hugepages have enabled="inherit" and all other
hugepage sizes have enabled="never" for
'/sys/kernel/mm/transparent_hugepage/hugepages-xxkB/shmem_enabled'.
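
The resulting per-size settings can be listed with a sketch such as the
following.  It assumes the active choice is shown in square brackets, as
in the other THP controls; the function name is illustrative.

```shell
# Print the active shmem_enabled choice for every mTHP size.
# $1 is the THP sysfs root (defaults to the real one).
show_shmem_enabled() {
    root="${1:-/sys/kernel/mm/transparent_hugepage}"
    for f in "$root"/hugepages-*kB/shmem_enabled; do
        [ -r "$f" ] || continue
        dir=${f%/shmem_enabled}
        # The active choice is the one shown in square brackets.
        active=$(sed -n 's/.*\[\([^]]*\)\].*/\1/p' "$f")
        printf '%s: %s\n' "${dir##*/}" "$active"
    done
}
```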

In addition, if the top level value is 'force', then only PMD-sized
hugepages have enabled="inherit", otherwise the configuration will fail,
and vice versa.  That means we now avoid using non-PMD sized THP to
override the global huge allocation.

[baolin.wang@linux.alibaba.com: fix transhuge.rst indentation]
  Link: https://lkml.kernel.org/r/b189d815-998b-4dfd-ba89-218ff51313f8@linux.alibaba.com
[akpm@linux-foundation.org: reflow transhuge.rst addition to 80 cols]
[baolin.wang@linux.alibaba.com: move huge_shmem_orders_lock under CONFIG_SYSFS]
  Link: https://lkml.kernel.org/r/eb34da66-7f12-44f3-a39e-2bcc90c33354@linux.alibaba.com
[akpm@linux-foundation.org: huge_memory.c needs mm_types.h]
Link: https://lkml.kernel.org/r/ffddfa8b3cb4266ff963099ab78cfd7184c57ac7.1718090413.git.baolin.wang@linux.alibaba.com
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <v-songbaohua@oppo.com>
Cc: Daniel Gomez <da.gomez@samsung.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Lance Yang <ioworker0@gmail.com>
Cc: Pankaj Raghav <p.raghav@samsung.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-03 19:30:04 -07:00
Daniel Watson
6b2fa426df docs/admin-guide/mm: correct typo 'quired' to 'queried'
Convert the word "quired" to the word "queried" which makes more
sense in this context.

Signed-off-by: Daniel Watson <ozzloy@each.do>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/878qymrjrg.fsf@trent-reznor
2024-07-03 16:22:36 -06:00
Xiu Jianfeng
1028f391d5 cgroup/misc: Introduce misc.peak
Introduce misc.peak to record the historical maximum usage of the
resource, as in some scenarios the value of misc.max could be
adjusted based on the peak usage of the resource.

Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2024-07-03 08:08:43 -10:00
Raphael Gallais-Pou
29acea1a04 cpufreq: docs: Add missing scaling_available_frequencies description
Add a description of the scaling_available_frequencies attribute in
sysfs to the documentation.

Signed-off-by: Raphael Gallais-Pou <rgallaispou@gmail.com>
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Link: https://patch.msgid.link/20240701171040.369030-1-rgallaispou@gmail.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-07-02 20:51:08 +02:00
Ard Biesheuvel
37aee82c21 x86/efi: Drop support for fake EFI memory maps
Between kexec and confidential VM support, handling the EFI memory maps
correctly on x86 is already proving to be rather difficult (as opposed
to other EFI architectures which manage to never modify the EFI memory
map to begin with).

EFI fake memory map support is essentially a development hack (for
testing new support for the 'special purpose' and 'more reliable' EFI
memory attributes) that leaked into production code. The regions marked
in this manner are not actually recognized as such by the firmware
itself or the EFI stub (and never have been), and marking memory as 'more
reliable' seems rather futile if the underlying memory is just ordinary
RAM.

Marking memory as 'special purpose' in this way is also dubious, but may
be in use in production code nonetheless. However, the same should be
achievable by using the memmap= command line option with the ! operator.

EFI fake memmap support is not enabled by any of the major distros
(Debian, Fedora, SUSE, Ubuntu) and does not exist on other
architectures, so let's drop support for it.

Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2024-07-02 00:26:24 +02:00
Greg Kroah-Hartman
f7697db8b1 Merge 6.10-rc6 into usb-next
We need the USB fixes in here as well for some follow-on patches.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-07-01 13:59:29 +02:00
Linus Torvalds
3e334486ec TTY/Serial/Console fixes for 6.10-rc6
Here are a bunch of fixes/reverts for 6.10-rc6.  Included in here are:
   - revert the bunch of tty/serial/console changes that landed in -rc1
     that didn't quite work properly yet.  Everyone agreed to just revert
     them for now and will work on making them better for a future
     release instead of trying to quick fix the existing changes this
     late in the release cycle
   - 8250 driver port count bugfix
   - Other tiny serial port bugfixes for reported issues
 
 All of these have been in linux-next this week with no reported issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Merge tag 'tty-6.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull tty / serial / console fixes from Greg KH:
 "Here are a bunch of fixes/reverts for 6.10-rc6.  Included in here are:

   - revert the bunch of tty/serial/console changes that landed in -rc1
     that didn't quite work properly yet.

     Everyone agreed to just revert them for now and will work on making
     them better for a future release instead of trying to quick fix the
     existing changes this late in the release cycle

   - 8250 driver port count bugfix

   - Other tiny serial port bugfixes for reported issues

  All of these have been in linux-next this week with no reported
  issues"

* tag 'tty-6.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  Revert "printk: Save console options for add_preferred_console_match()"
  Revert "printk: Don't try to parse DEVNAME:0.0 console options"
  Revert "printk: Flag register_console() if console is set on command line"
  Revert "serial: core: Add support for DEVNAME:0.0 style naming for kernel console"
  Revert "serial: core: Handle serial console options"
  Revert "serial: 8250: Add preferred console in serial8250_isa_init_ports()"
  Revert "Documentation: kernel-parameters: Add DEVNAME:0.0 format for serial ports"
  Revert "serial: 8250: Fix add preferred console for serial8250_isa_init_ports()"
  Revert "serial: core: Fix ifdef for serial base console functions"
  serial: bcm63xx-uart: fix tx after conversion to uart_port_tx_limited()
  serial: core: introduce uart_port_tx_limited_flags()
  Revert "serial: core: only stop transmit when HW fifo is empty"
  serial: imx: set receiver level before starting uart
  tty: mcf: MCF54418 has 10 UARTS
  serial: 8250_omap: Implementation of Errata i2310
  tty: serial: 8250: Fix port count mismatch with the device
2024-06-30 08:57:43 -07:00
Nils Rothaug
ceac017e12 media: em28xx: Add support for MyGica UTV3
The MyGica UTV3 Analog USB2.0 TV Box is a USB video capture card
that has analog TV, composite video, and FM radio inputs, an IR
remote, and provides audio only as Line Out, but not over USB.
Mine is prepared for an FM tuner, but not equipped with one.
Support for FM radio is therefore missing. The device contains:
 - Empia EM2860 USB bridge
 - Philips SAA7113 video decoder
 - NXP TDA9801T demodulator
 - Tena TNF931D-DFDR1 tuner
 - ST HCF4052 demux, switches input audio to Line Out

Signed-off-by: Nils Rothaug <nils.rothaug@gmx.de>
Signed-off-by: Sean Young <sean@mess.org>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
2024-06-29 12:20:05 +02:00
Nils Rothaug
32c1935280 media: tuner-simple: Add support for Tena TNF931D-DFDR1
Tuner ranges were determined by USB capturing the vendor driver of a
MyGica UTV3 video capture card.

Signed-off-by: Nils Rothaug <nils.rothaug@gmx.de>
Signed-off-by: Sean Young <sean@mess.org>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
2024-06-29 12:20:05 +02:00
Josh Poimboeuf
42c141fbb6 x86/bugs: Add 'spectre_bhi=vmexit' cmdline option
In cloud environments it can be useful to *only* enable the vmexit
mitigation and leave syscalls vulnerable.  Add that as an option.

This is similar to the old spectre_bhi=auto option which was removed
with the following commit:

  36d4fe147c ("x86/bugs: Remove CONFIG_BHI_MITIGATION_AUTO and spectre_bhi=auto")

with the main difference being that this has a more descriptive name and
is disabled by default.

Mitigation switch requested by Maksim Davydov <davydov-max@yandex-team.ru>.

  [ bp: Massage. ]

Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Link: https://lore.kernel.org/r/2cbad706a6d5e1da2829e5e123d8d5c80330148c.1719381528.git.jpoimboe@kernel.org
2024-06-28 15:35:54 +02:00
Josh Poimboeuf
4586c93ebf x86/bugs: Remove duplicate Spectre cmdline option descriptions
Duplicating the documentation of all the Spectre kernel cmdline options
in two separate files is unwieldy and error-prone.  Instead just add a
reference to kernel-parameters.txt from spectre.rst.

Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Link: https://lore.kernel.org/r/450b5f4ffe891a8cc9736ec52b0c6f225bab3f4b.1719381528.git.jpoimboe@kernel.org
2024-06-28 15:28:38 +02:00
Dorcas Anono Litunya
f5306b757c documentation: media: vivid: Update documentation on vivid loopback support
Modify section "Video and Sliced VBI Looping" in Documentation to explain
the vivid loopback support for video across multiple vivid instances.
Previous documentation is out-of-date as it was explaining looping in a
single vivid instance only.

Also, in "Some Future Improvements" the item "Add support to loop
from a specific output to a specific input across vivid instances"
can be dropped since that's now implemented.

Signed-off-by: Dorcas Anono Litunya <anonolitunya@gmail.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
2024-06-28 08:00:29 +02:00
Hans Verkuil
4c4dacb052 media: vivid: loopback based on 'Connected To' controls
Instead of using hardwired video loopback limited to a single vivid
instance, use the new 'Connected To' controls to only loopback if an
HDMI or S-Video input is connected to another output, which can be
in another vivid instance. Effectively this emulates connecting and
disconnecting an HDMI/S-Video cable.

The Loop Video control is dropped since it has now been replaced by
the new 'Connected To' controls. The Display Present has also been
dropped since it no longer fits.

Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Co-developed-by: Dorcas Anono Litunya <anonolitunya@gmail.com>
Signed-off-by: Dorcas Anono Litunya <anonolitunya@gmail.com>
2024-06-28 08:00:29 +02:00
Dorcas Anono Litunya
0bc9574a7a media: Documentation: vivid.rst: Remove documentation for Capture Overlay
Modify the documentation to remove the 'Capture Overlay' section, as
destructive capture overlay support was removed.

See commit ccaa9d50ca ("media: vivid: drop overlay support")

Signed-off-by: Dorcas Anono Litunya <anonolitunya@gmail.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
2024-06-28 08:00:29 +02:00
Hans Verkuil
3883822e17 media: Documentation: vivid.rst: add supports_requests
The module option supports_requests was not documented; add it.

Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
2024-06-28 08:00:29 +02:00
Hans Verkuil
50e2eba54d media: Documentation: vivid.rst: drop "Video, VBI and RDS Looping"
Drop the "Video, VBI and RDS Looping" section, instead moving the
Video/VBI info to section "Video and Sliced VBI looping" and the
RDS info to section "Radio & RDS Looping".

Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
2024-06-28 08:00:28 +02:00
Hans Verkuil
2513996024 media: Documentation: vivid.rst: fix confusing section refs
The documentation contained several instances of "section X"
references, which no longer map to whatever X was.

Replace these by the section titles.

Also fix a single confusing typo in the "Radio & RDS Looping" section:
"are regular frequency intervals" -> "at regular frequency intervals"

Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
2024-06-28 08:00:28 +02:00
Rafael J. Wysocki
f53b4bb83d Add support for amd-pstate core performance boost, which
allows controlling which CPU cores can operate above nominal
 frequencies for short periods of time.

Merge tag 'amd-pstate-v6.11-2024-06-26' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/superm1/linux

Merge more amd-pstate driver updates for 6.11 from Mario Limonciello:

"Add support for amd-pstate core performance boost, which
 allows controlling which CPU cores can operate above nominal
 frequencies for short periods of time."

* tag 'amd-pstate-v6.11-2024-06-26' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/superm1/linux:
  Documentation: cpufreq: amd-pstate: update doc for Per CPU boost control method
  cpufreq: amd-pstate: Cap the CPPC.max_perf to nominal_perf if CPB is off
  cpufreq: amd-pstate: initialize core precision boost state
  cpufreq: acpi: move MSR_K7_HWCR_CPB_DIS_BIT into msr-index.h
2024-06-27 21:26:36 +02:00
Rafael J. Wysocki
b11ec63abe Merge back cpufreq material for v6.11. 2024-06-27 21:20:10 +02:00
Thomas Huth
661404644d Documentation: Remove IA-64 from kernel-parameters
IA-64 has been removed from the tree, so we should also remove
the corresponding kernel-parameters documentation now.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20240627162458.387700-1-thuth@redhat.com
2024-06-27 11:31:52 -06:00
Jacopo Mondi
5b683b2030 media: admin-guide: Document the Raspberry Pi PiSP BE
Add documentation for the PiSP Back End memory-to-memory ISP.

Signed-off-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
2024-06-27 13:06:47 +02:00
Diederik de Haas
ddb77059b2 docs: verify/bisect: Fix rendered version URL
When the documentation is rendered, the 'html' file extension replaces the
'rst' file extension rather than being appended to it. So remove the 'rst'
part of the URL.
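
The extension swap can be illustrated with a minimal Python sketch. The
helper name rendered_url and the example base URL are hypothetical and are
not taken from the patch; the sketch only shows '.rst' being replaced by
'.html' rather than kept next to it.

```python
from pathlib import PurePosixPath


def rendered_url(base: str, source_path: str) -> str:
    """Map a documentation source path to its rendered HTML location.

    The '.rst' suffix is replaced by '.html'; it is not kept alongside it.
    Both arguments are illustrative inputs, not values from the patch.
    """
    html_path = PurePosixPath(source_path).with_suffix(".html")
    return f"{base.rstrip('/')}/{html_path}"
```

For example, a source file named example.rst maps to example.html, not to
example.rst.html.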

Signed-off-by: Diederik de Haas <didi.debian@cknow.org>
Reviewed-by: Thorsten Leemhuis <linux@leemhuis.info>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20240620081355.11549-1-didi.debian@cknow.org
2024-06-26 16:54:24 -06:00
Perry Yuan
6d588891a9 Documentation: cpufreq: amd-pstate: update doc for Per CPU boost control method
Updates the documentation in `amd-pstate.rst` to include information about
the per CPU boost control feature. Users can now enable or disable the
Core Performance Boost (CPB) feature on individual CPUs using the `boost`
sysfs attribute.
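
As a rough illustration, toggling the attribute from userspace might look
like the Python sketch below. The path layout
(devices/system/cpu/cpuN/cpufreq/boost under the sysfs root) and the "1"/"0"
values are assumptions made for this example and should be checked against
amd-pstate.rst; the helper names are hypothetical.

```python
from pathlib import Path


def boost_path(cpu: int, sysfs_root: str = "/sys") -> Path:
    # Assumed layout: <root>/devices/system/cpu/cpuN/cpufreq/boost
    return Path(sysfs_root) / "devices/system/cpu" / f"cpu{cpu}" / "cpufreq" / "boost"


def set_boost(cpu: int, enabled: bool, sysfs_root: str = "/sys") -> None:
    # Assumption: writing "1" enables and "0" disables boost for this CPU.
    # On a real system this normally requires root privileges.
    boost_path(cpu, sysfs_root).write_text("1" if enabled else "0")


def get_boost(cpu: int, sysfs_root: str = "/sys") -> bool:
    # Read the attribute back and interpret "1" as enabled.
    return boost_path(cpu, sysfs_root).read_text().strip() == "1"
```

The sysfs_root parameter exists only so the helpers can be exercised against
a scratch directory instead of the live /sys tree.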

Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Signed-off-by: Perry Yuan <perry.yuan@amd.com>
Co-developed-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://lore.kernel.org/r/20240626042733.3747-5-mario.limonciello@amd.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
2024-06-26 15:48:21 -05:00
Greg Kroah-Hartman
12b7210ea8 Revert "Documentation: kernel-parameters: Add DEVNAME:0.0 format for serial ports"
This reverts commit 5c3a766e9f.

Let's roll back all of the serial core and printk console changes that
went into 6.10-rc1 as there still are problems with them that need to be
sorted out.

Link: https://lore.kernel.org/r/ZnpRozsdw6zbjqze@tlindgre-MOBL1
Reported-by: Petr Mladek <pmladek@suse.com>
Reported-by: Tony Lindgren <tony@atomide.com>
Cc: Jiri Slaby <jirislaby@kernel.org>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-06-25 07:57:51 +02:00
Samuel Wein
9123419c3b media: Documentation: ipu6: Fix examples in ipu6-isys admin-guide
Fix flags in X1 Yoga example. MEDIA_LNK_FL_DYNAMIC (0x4 in the link flag)
was removed in V4 Intel IPU6 and IPU6 input system drivers. Add the -V flag
to the media-ctl commands for X1 Yoga: lower-case -v only makes the output
verbose, while upper-case -V sets the format.

Signed-off-by: Samuel Wein <sam@samwein.com>
[Sakari Ailus: Align subject line, rewrap commit message.]
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
2024-06-24 16:47:34 +02:00
Perry Yuan
1d53f30b3a Documentation: PM: amd-pstate: add guided mode to the Operation mode
The guided mode is also supported, so the operation mode should include
that mode as well.

Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Perry Yuan <perry.yuan@amd.com>
Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Link: https://lore.kernel.org/r/a61d825ef71f6aacc8f1624fe9fb982b8446b5a7.1718811234.git.perry.yuan@amd.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
2024-06-20 21:52:05 -05:00
Waiman Long
737bb142a0 cgroup/cpuset: Make cpuset.cpus.exclusive independent of cpuset.cpus
The "cpuset.cpus.exclusive.effective" value is currently limited to a
subset of its "cpuset.cpus". This makes the exclusive CPUs distribution
hierarchy subsumed within the larger "cpuset.cpus" hierarchy. We have to
decide on what CPUs are used locally and what CPUs can be passed down as
exclusive CPUs down the hierarchy and combine them into "cpuset.cpus".

The advantage of the current scheme is to have only one hierarchy to
worry about. However, it makes it harder to use, as all the "cpuset.cpus"
values have to be properly set along the way down to the designated remote
partition root. It also makes it more cumbersome to find out what CPUs
can be used locally.

Make creation of remote partition simpler by breaking the
dependency of "cpuset.cpus.exclusive" on "cpuset.cpus" and make
them independent entities. Now we have two separate hierarchies -
one for setting "cpuset.cpus.effective" and the other one for setting
"cpuset.cpus.exclusive.effective". We may not need to set "cpuset.cpus"
when we activate a partition root anymore.

Also update Documentation/admin-guide/cgroup-v2.rst and cpuset.c comment
to document this change.

Suggested-by: Petr Malat <oss@malat.biz>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2024-06-19 07:37:38 -10:00
Waiman Long
fe8cd2736e cgroup/cpuset: Delay setting of CS_CPU_EXCLUSIVE until valid partition
The CS_CPU_EXCLUSIVE flag is currently set whenever cpuset.cpus.exclusive
is set to make sure that the exclusivity test will be run to ensure its
exclusiveness. At the same time, this flag can be changed whenever the
partition root state is changed. For example, the CS_CPU_EXCLUSIVE flag
will be reset whenever a partition root becomes invalid. This makes
using CS_CPU_EXCLUSIVE to ensure exclusiveness a bit fragile.

The current scheme also makes setting up a cpuset.cpus.exclusive
hierarchy to enable remote partition harder as cpuset.cpus.exclusive
cannot overlap with any cpuset.cpus of sibling cpusets if their
cpuset.cpus.exclusive aren't set.

Solve these issues by deferring the setting of CS_CPU_EXCLUSIVE flag
until the cpuset becomes a valid partition root while adding new checks
in validate_change() to ensure that cpuset.cpus.exclusive of sibling
cpusets cannot overlap.

An additional check is also added to validate_change() to make sure that
cpuset.cpus of one cpuset cannot be a subset of cpuset.cpus.exclusive
of a sibling cpuset to avoid the problem that none of those CPUs will
be available when these exclusive CPUs are extracted out to a newly
enabled partition root. The Documentation/admin-guide/cgroup-v2.rst
file is updated to document the new constraints.
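
The two sibling constraints described above can be modelled in a short
Python sketch. This is a simplified userspace model, not the kernel's
validate_change(); the dictionary keys and function names are hypothetical,
and skipping the subset check when cpuset.cpus is empty is an assumption of
this sketch.

```python
from typing import Optional, Set


def parse_cpulist(text: str) -> Set[int]:
    """Parse a cpulist such as "0-3,8" into a set of CPU numbers."""
    cpus: Set[int] = set()
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def sibling_conflict(a: dict, b: dict) -> Optional[str]:
    """Return a reason string if two sibling cpusets conflict, else None.

    Each dict holds cpulist strings under the illustrative keys
    "cpus" (cpuset.cpus) and "exclusive" (cpuset.cpus.exclusive).
    """
    a_cpus, a_excl = parse_cpulist(a["cpus"]), parse_cpulist(a["exclusive"])
    b_cpus, b_excl = parse_cpulist(b["cpus"]), parse_cpulist(b["exclusive"])
    # Rule 1: exclusive CPU sets of siblings must not overlap.
    if a_excl & b_excl:
        return "cpuset.cpus.exclusive overlaps between siblings"
    # Rule 2: cpuset.cpus must not be a subset of a sibling's exclusive CPUs
    # (an empty cpuset.cpus is not treated as a conflict in this sketch).
    if a_cpus and a_cpus <= b_excl:
        return "cpuset.cpus of first is a subset of sibling's exclusive CPUs"
    if b_cpus and b_cpus <= a_excl:
        return "cpuset.cpus of second is a subset of sibling's exclusive CPUs"
    return None
```

A partial overlap between one cpuset's cpuset.cpus and a sibling's exclusive
CPUs is accepted by this model, since some CPUs remain available after the
exclusive ones are extracted.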

Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2024-06-19 07:37:37 -10:00
Steven Rostedt (Google)
d9d814eebb pstore/ramoops: Add ramoops.mem_name= command line option
Add a method to find a region specified by reserve_mem=nn:align:name for
ramoops. Adding the kernel command line parameters:

  reserve_mem=12M:4096:oops ramoops.mem_name=oops

will use the size and location defined by the reserve_mem parameter, where it
finds the memory and labels it "oops". The "oops" in the ramoops option
is used to search for it.

This allows for arbitrary RAM to be used for ramoops if it is known that
the memory is not cleared on kernel crashes or soft reboots.

Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Kees Cook <kees@kernel.org>
Link: https://lore.kernel.org/r/20240613155527.591647061@goodmis.org
Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
2024-06-19 18:05:14 +03:00
Steven Rostedt (Google)
1e4c64b71c mm/memblock: Add "reserve_mem" to reserve named memory at boot up
In order to allow for requesting a memory region that can be used for
things like pstore on multiple machines where the memory layout is not the
same, add a new option to the kernel command line called "reserve_mem".

The format is:  reserve_mem=nn:align:name

Where it will find nn amount of memory at the given alignment of align.
The name field is to allow another subsystem to retrieve where the memory
was found. For example:

  reserve_mem=12M:4096:oops ramoops.mem_name=oops

Where ramoops.mem_name will tell ramoops that memory was reserved for it
via the reserve_mem option and it can find it by calling:

  if (reserve_mem_find_by_name("oops", &start, &size)) {
	// start holds the start address and size holds the size given
  }

This is typically used for systems that do not wipe the RAM, and this
command line will try to reserve the same physical memory on soft reboots.
Note, it is not guaranteed to be the same location. For example, if KASLR
places the kernel at the location of where the RAM reservation was from a
previous boot, the new reservation will be at a different location.  Any
subsystem using this feature must add a way to verify that the contents of
the physical memory are from a previous boot, as there may be cases where
the memory will not be located at the same location.

Not all systems may work either. There could be bit flips if the reboot
goes through the BIOS. Using kexec to reboot the machine is likely to
have better results in such cases.
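
To make the nn:align:name format concrete, here is a minimal Python sketch
that splits such a value into its three fields. It is a userspace
illustration only; the names ReserveMem, parse_size and parse_reserve_mem
are hypothetical, and the K/M/G suffix handling is a simplified assumption
rather than a copy of the kernel's parser.

```python
from typing import NamedTuple

# Binary suffixes handled by this sketch (assumed, simplified).
_SUFFIX_SHIFT = {"K": 10, "M": 20, "G": 30}


class ReserveMem(NamedTuple):
    size: int    # bytes requested
    align: int   # alignment in bytes
    name: str    # label another subsystem uses to look the region up


def parse_size(text: str) -> int:
    """Parse a decimal or 0x-prefixed number with an optional K/M/G suffix."""
    text = text.strip()
    shift = _SUFFIX_SHIFT.get(text[-1:].upper(), 0)
    if shift:
        text = text[:-1]
    return int(text, 0) << shift


def parse_reserve_mem(value: str) -> ReserveMem:
    """Split "nn:align:name" into its fields."""
    size, align, name = value.split(":", 2)
    if not name:
        raise ValueError("reserve_mem needs a non-empty name")
    return ReserveMem(parse_size(size), parse_size(align), name)
```

With the example from this message, "12M:4096:oops" yields a size of
12 * 1024 * 1024 bytes, an alignment of 4096 bytes and the name "oops".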

Link: https://lore.kernel.org/all/ZjJVnZUX3NZiGW6q@kernel.org/

Suggested-by: Mike Rapoport <rppt@kernel.org>
Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20240613155527.437020271@goodmis.org
Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
2024-06-19 10:59:49 +03:00
Greg Kroah-Hartman
b0fc24f361 Linux 6.10-rc4

Merge tag 'v6.10-rc4' into usb-next

We need the USB / Thunderbolt fixes in here as well.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-06-18 11:06:17 +02:00
Thomas Huth
166d6019f9 Documentation: Remove the unused "tp720" from kernel-parameters.txt
The "tp720" switch once belonged to the ps2esdi driver, but this
driver was already removed in 2008 in commit
2af3e6017e ("The ps2esdi driver was marked as BROKEN more than two years ago due to being no longer working for some time."),
so let's remove it from the documentation now, too.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20240617073322.40679-1-thuth@redhat.com
2024-06-17 16:45:24 -06:00
Thomas Huth
9b8b80b9f6 Documentation: Remove the unused "topology_updates" from kernel-parameters.txt
The "topology_updates" switch was removed four years ago in commit
c30f931e89 ("powerpc/numa: remove ability to enable topology updates"),
so let's remove this from the documentation, too.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20240617060848.38937-1-thuth@redhat.com
2024-06-17 16:44:36 -06:00
Thomas Huth
69bce7f3dc Documentation: Remove unused "nps_mtm_hs_ctr" from kernel-parameters.txt
The "nps_mtm_hs_ctr" parameter has been removed in commit dd7c7ab01a
("ARC: [plat-eznps]: Drop support for EZChip NPS platform"). Remove it
from the documentation now, too.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20240614190804.602970-1-thuth@redhat.com
2024-06-17 16:25:29 -06:00
Thomas Huth
f730144162 Documentation: Remove unused "spia_*" kernel parameters
The kernel module parameters "spia_io_base", "spia_fio_base",
"spia_pedr" and "spia_peddr" have been removed via commit e377ca1e32
("ARM: clps711x: p720t: Special driver for handling NAND memory is removed").
Time to remove them from the documentation now, too.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20240614184041.601056-1-thuth@redhat.com
2024-06-17 16:25:05 -06:00
Thomas Huth
f891e73f96 Documentation: Remove unused "mtdset=" from kernel-parameters.txt
The kernel parameter "mtdset" was removed two years ago in
commit 61b7f8920b ("ARM: s3c: remove all s3c24xx support") and
thus should be removed from the documentation now, too.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20240614182508.600113-1-thuth@redhat.com
2024-06-17 16:24:32 -06:00
Thomas Huth
35a9cbeefd Documentation: Remove the "rhash_entries=" from kernel-parameters.txt
"rhash_entries" belonged to the routing cache that has been removed in
commit 89aef8921b ("ipv4: Delete routing cache.").

Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20240614092134.563082-1-thuth@redhat.com
2024-06-17 16:24:04 -06:00
Thomas Huth
2626f066f8 Documentation: Remove "ltpc=" from the kernel-parameters.txt
The string "ltpc" cannot be found in the source code anymore. This
kernel parameter likely belonged to the LocalTalk PC card module
which has been removed in commit 03dcb90dbf ("net: appletalk:
remove Apple/Farallon LocalTalk PC support"), so we should remove
it from kernel-parameters.txt now, too.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20240614084633.560069-1-thuth@redhat.com
2024-06-17 16:15:00 -06:00
Thomas Huth
6bb955d4fb Documentation: Add "S390" to the swiotlb kernel parameter
The "swiotlb" kernel parameter is used on s390 for protected virt since
commit 64e1f0c531 ("s390/mm: force swiotlb for protected virtualization")
and thus should be marked in kernel-parameters.txt accordingly.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Acked-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20240614081438.553160-1-thuth@redhat.com
2024-06-17 16:12:10 -06:00
David Hildenbrand
384a746bb5 Revert "mm: init_mlocked_on_free_v3"
There was insufficient review and no agreement that this is the right
approach.

There are serious flaws with the implementation that make processes using
mlock() not even work with simple fork() [1] and we get reliable crashes
when rebooting.

Further, simply because we might be unmapping a single PTE of a large
mlocked folio, we shouldn't zero out the whole folio.

... especially because the code can also *corrupt* unrelated memory, because
	kernel_init_pages(page, folio_nr_pages(folio));

could end up writing outside of the actual folio if we work with a tail
page.
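
The overrun can be illustrated with simple page-index arithmetic. The
Python sketch below is a simplified model, assuming that initializing
nr_pages pages "starting at page" touches nr_pages consecutive page frames;
the function names and the integer page indices are hypothetical.

```python
from typing import List


def pages_touched(head: int, nr_pages: int, page: int) -> range:
    """Page frames touched when nr_pages pages are initialized from `page`.

    `head` is the index of the folio's first page, `page` the index of the
    page actually passed in (head or a tail page of the same folio).
    """
    if not head <= page < head + nr_pages:
        raise ValueError("page does not belong to the folio")
    return range(page, page + nr_pages)


def pages_outside_folio(head: int, nr_pages: int, page: int) -> List[int]:
    """Return the touched page frames that lie beyond the folio."""
    folio = range(head, head + nr_pages)
    return [p for p in pages_touched(head, nr_pages, page) if p not in folio]
```

In this model, a 4-page folio starting at index 100 is handled correctly
when the head page is passed in, but passing the tail page at index 102
also touches indices 104 and 105, which belong to something else.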

Let's revert it.  Once there is agreement that this is the right approach,
the issues are fixed, and there has been reasonable review and proper testing,
we can consider it again.

[1] https://lkml.kernel.org/r/4da9da2f-73e4-45fd-b62f-a8a513314057@redhat.com

Link: https://lkml.kernel.org/r/20240605091710.38961-1-david@redhat.com
Fixes: ba42b524a0 ("mm: init_mlocked_on_free_v3")
Signed-off-by: David Hildenbrand <david@redhat.com>
Reported-by: David Wang <00107082@163.com>
Closes: https://lore.kernel.org/lkml/20240528151340.4282-1-00107082@163.com/
Reported-by: Lance Yang <ioworker0@gmail.com>
Closes: https://lkml.kernel.org/r/20240601140917.43562-1-ioworker0@gmail.com
Acked-by: Lance Yang <ioworker0@gmail.com>
Cc: York Jasper Niebuhr <yjnworkstation@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-06-15 10:43:05 -07:00
Colton Lewis
0b5afe0537 KVM: arm64: Add early_param to control WFx trapping
Add early_params to control WFI and WFE trapping. This is to
control the degree to which guests can wait for interrupts on their own
without being trapped by KVM. Options for each param are trap and notrap. trap
enables the trap. notrap disables the trap. Note that when enabled,
traps are allowed but not guaranteed by the CPU architecture. Absent
an explicitly set policy, default to current behavior: disabling the
trap if only a single task is running and enabling otherwise.
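
The policy selection described above can be summarized in a small Python
sketch. It models only the decision, not the KVM code; the enum and function
names are hypothetical. As noted above, enabling the trap only permits
trapping and does not guarantee it.

```python
from enum import Enum


class TrapPolicy(Enum):
    DEFAULT = "default"  # no explicit policy given on the command line
    TRAP = "trap"        # always enable the trap
    NOTRAP = "notrap"    # always disable the trap


def should_enable_trap(policy: TrapPolicy, single_task_running: bool) -> bool:
    """Decide whether the WFI/WFE trap should be enabled."""
    if policy is TrapPolicy.TRAP:
        return True
    if policy is TrapPolicy.NOTRAP:
        return False
    # Default: keep the existing behavior, i.e. disable the trap only when
    # a single task is running and enable it otherwise.
    return not single_task_running
```

Keeping the default case separate from the two explicit values mirrors the
description: an explicit policy overrides the run-queue heuristic entirely.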

Signed-off-by: Colton Lewis <coltonlewis@google.com>
Reviewed-by: Jing Zhang <jingzhangos@google.com>
Link: https://lore.kernel.org/r/20240523174056.1565133-1-coltonlewis@google.com
[ oliver: rework kvm_vcpu_should_clear_tw*() for readability ]
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-06-14 20:11:15 +00:00
Steven Rostedt (Google)
e645535a95 tracing: Add option to use memmapped memory for trace boot instance
Add an option to the trace_instance kernel command line parameter that
allows it to use the reserved memory from memmap boot parameter.

  memmap=12M$0x284500000 trace_instance=boot_mapped@0x284500000:12M

The above will reserve 12 megs at the physical address 0x284500000.
The second parameter will create a "boot_mapped" instance and use the
memory reserved as the memory for the ring buffer.

That will create an instance called "boot_mapped":

  /sys/kernel/tracing/instances/boot_mapped

Note, because the ring buffer is using a defined memory range, it will
act just like a memory mapped ring buffer. It will not have a snapshot
buffer, as it can't swap out the buffer. The snapshot files as well as any
tracers that use a snapshot will not be present in the boot_mapped
instance.
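
The relationship between the two parameters in the example can be checked
with a minimal Python sketch. It handles only the two forms shown above
("size$addr" for memmap and "name@addr:size" for trace_instance); the
function names and the simplified K/M/G suffix parsing are assumptions of
this sketch, not kernel code.

```python
from typing import Tuple

# Binary suffixes handled by this sketch (assumed, simplified).
_SHIFT = {"K": 10, "M": 20, "G": 30}


def parse_size(text: str) -> int:
    """Parse a decimal or 0x-prefixed number with an optional K/M/G suffix."""
    text = text.strip()
    shift = _SHIFT.get(text[-1:].upper(), 0)
    if shift:
        text = text[:-1]
    return int(text, 0) << shift


def parse_memmap_reserve(value: str) -> Tuple[int, int]:
    """Parse "size$addr" and return (start, size)."""
    size, addr = value.split("$", 1)
    return parse_size(addr), parse_size(size)


def parse_boot_mapped(value: str) -> Tuple[str, int, int]:
    """Parse "name@addr:size" and return (name, start, size)."""
    name, location = value.split("@", 1)
    addr, size = location.split(":", 1)
    return name, parse_size(addr), parse_size(size)


def fits(reserved: Tuple[int, int], start: int, size: int) -> bool:
    """Check that [start, start + size) lies inside the reserved region."""
    r_start, r_size = reserved
    return r_start <= start and start + size <= r_start + r_size
```

Applied to the example, the "boot_mapped" instance starts at 0x284500000,
spans 12 MiB, and lies entirely inside the region reserved by memmap.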

Link: https://lkml.kernel.org/r/20240612232026.329660169@goodmis.org

Cc: linux-mm@kvack.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Vincent Donnefort <vdonnefort@google.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vineeth Pillai <vineeth@bitbyteword.org>
Cc: Youssef Esmat <youssefesmat@google.com>
Cc: Beau Belgrave <beaub@linux.microsoft.com>
Cc: Alexander Graf <graf@amazon.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Ross Zwisler <zwisler@google.com>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-06-14 12:28:21 -04:00
Thomas Huth
9b9eec8dc2 Documentation: Remove "mfgpt_irq=" from the kernel-parameters.txt file
The kernel parameter mfgpt_irq was already removed in 2009 by

  c95d1e53ed ("cs5535: drop the Geode-specific MFGPT/GPIO code")

Time to remove it from the documentation now, too.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Link: https://lore.kernel.org/r/20240614090306.561464-1-thuth@redhat.com
2024-06-14 18:06:57 +02:00
Jinjie Ruan
99a021edde Documentation: kernel-parameters: Add RISCV for nohlt
Commit bcf11b5e99 ("riscv: Enable idle generic idle loop") enabled the
generic idle loop for RISCV, but the documentation was not updated at the
same time, so add RISCV to the nohlt entry.

Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Fixes: bcf11b5e99 ("riscv: Enable idle generic idle loop")
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20240604114005.875609-1-ruanjinjie@huawei.com
2024-06-12 15:32:06 -06:00
Linus Torvalds
dc772f8237 14 hotfixes, 6 of which are cc:stable.
All except the nilfs2 fix affect MM and all are singletons - see the
 changelogs for details.

Merge tag 'mm-hotfixes-stable-2024-06-07-15-24' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull misc fixes from Andrew Morton:
 "14 hotfixes, 6 of which are cc:stable.

  All except the nilfs2 fix affect MM and all are singletons - see the
  changelogs for details"

* tag 'mm-hotfixes-stable-2024-06-07-15-24' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  nilfs2: fix nilfs_empty_dir() misjudgment and long loop on I/O errors
  mm: fix xyz_noprof functions calling profiled functions
  codetag: avoid race at alloc_slab_obj_exts
  mm/hugetlb: do not call vma_add_reservation upon ENOMEM
  mm/ksm: fix ksm_zero_pages accounting
  mm/ksm: fix ksm_pages_scanned accounting
  kmsan: do not wipe out origin when doing partial unpoisoning
  vmalloc: check CONFIG_EXECMEM in is_vmalloc_or_module_addr()
  mm: page_alloc: fix highatomic typing in multi-block buddies
  nilfs2: fix potential kernel bug due to lack of writeback flag waiting
  memcg: remove the lockdep assert from __mod_objcg_mlstate()
  mm: arm64: fix the out-of-bounds issue in contpte_clear_young_dirty_ptes
  mm: huge_mm: fix undefined reference to `mthp_stats' for CONFIG_SYSFS=n
  mm: drop the 'anon_' prefix for swap-out mTHP counters
2024-06-07 17:01:10 -07:00
Baolin Wang
0d648dd5c8 mm: drop the 'anon_' prefix for swap-out mTHP counters
The mTHP swap related counters: 'anon_swpout' and 'anon_swpout_fallback'
are confusing with an 'anon_' prefix, since the shmem can swap out
non-anonymous pages.  So drop the 'anon_' prefix to keep consistent with
the old swap counter names.
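
A user-space reader that has to cope with kernels from both before and after this rename could be sketched as follows. This is a minimal sketch, not part of the patch: the sysfs base path and the `hugepages-<size>kB/stats` layout are assumptions to verify against your kernel's documentation, and the helper names (`stats_dir_for`, `read_swap_counters`, `demo`) are hypothetical.

```python
import os
import tempfile

# Assumed sysfs location of the per-order mTHP stats; verify on your kernel.
THP_SYSFS = "/sys/kernel/mm/transparent_hugepage"

# New counter name -> name used before the 'anon_' prefix was dropped.
RENAMED = {
    "swpout": "anon_swpout",
    "swpout_fallback": "anon_swpout_fallback",
}


def stats_dir_for(size_kb):
    """Build the (assumed) stats directory path for one mTHP size."""
    return os.path.join(THP_SYSFS, f"hugepages-{size_kb}kB", "stats")


def read_swap_counters(stats_dir):
    """Return {new_name: value} for the swap-out counters in stats_dir.

    The new name is tried first; the old 'anon_'-prefixed name is only a
    fallback for kernels that predate the rename. Missing files are skipped.
    """
    counters = {}
    for new_name, old_name in RENAMED.items():
        for candidate in (new_name, old_name):
            path = os.path.join(stats_dir, candidate)
            try:
                with open(path) as f:
                    counters[new_name] = int(f.read().strip())
                break
            except FileNotFoundError:
                continue
    return counters


def demo():
    """Exercise the helper against a fake stats directory (no sysfs needed)."""
    with tempfile.TemporaryDirectory() as d:
        with open(os.path.join(d, "swpout"), "w") as f:
            f.write("7\n")
        with open(os.path.join(d, "anon_swpout_fallback"), "w") as f:
            f.write("2\n")
        return read_swap_counters(d)
```

On a real system the call would look like `read_swap_counters(stats_dir_for(64))`, assuming a 64kB mTHP size is available there.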

This is needed in 6.10-rcX to avoid having an inconsistent ABI out in the
field.

Link: https://lkml.kernel.org/r/7a8989c13299920d7589007a30065c3e2c19f0e0.1716431702.git.baolin.wang@linux.alibaba.com
Fixes: d0f048ac39 ("mm: add per-order mTHP anon_swpout and anon_swpout_fallback counters")
Fixes: 42248b9d34 ("mm: add docs for per-order mTHP counters and transhuge_page ABI")
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Suggested-by: "Huang, Ying" <ying.huang@intel.com>
Acked-by: Barry Song <baohua@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Lance Yang <ioworker0@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-06-05 19:19:23 -07:00
Sean Christopherson
c793a62823 sched/core: Drop spinlocks on contention iff kernel is preemptible
Use preempt_model_preemptible() to detect a preemptible kernel when
deciding whether or not to reschedule in order to drop a contended
spinlock or rwlock.  Because PREEMPT_DYNAMIC selects PREEMPTION, kernels
built with PREEMPT_DYNAMIC=y will yield contended locks even if the live
preemption model is "none" or "voluntary".  In short, make kernels with
dynamically selected models behave the same as kernels with statically
selected models.
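
The behavioural difference described above can be illustrated with a toy truth-table model. This is only a sketch of the reasoning, not kernel code: the function name `yields_contended_lock` is hypothetical, only the models "none", "voluntary" and "full" are considered, and PREEMPT_RT is deliberately left out.

```python
def yields_contended_lock(preempt_dynamic, model, fixed):
    """Toy model: is a contended spinlock/rwlock dropped to reschedule?

    preempt_dynamic: True if the kernel was built with PREEMPT_DYNAMIC=y.
    model: build-time model for static kernels, live model for dynamic ones
           ("none", "voluntary" or "full").
    fixed: False models the old CONFIG_PREEMPTION guard, True models the
           preempt_model_preemptible() check introduced by this commit.
    """
    if not preempt_dynamic:
        # Static kernels: PREEMPTION is only set for the "full" model,
        # so old and new checks agree.
        return model == "full"
    if not fixed:
        # PREEMPT_DYNAMIC selects PREEMPTION, so the old compile-time
        # guard passed regardless of the live model.
        return True
    # With the fix, the live model decides.
    return model == "full"


def table():
    """Return the dynamic-kernel outcomes as {(model, fixed): yields}."""
    return {
        (m, f): yields_contended_lock(True, m, f)
        for m in ("none", "voluntary", "full")
        for f in (False, True)
    }
```

In this model the only rows that change with the fix are the dynamic kernels running "none" or "voluntary", which stop yielding contended locks and so match their statically built counterparts.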

Somewhat counter-intuitively, NOT yielding a lock can provide better
latency for the relevant tasks/processes.  E.g. KVM x86's mmu_lock, a
rwlock, is often contended between an invalidation event (takes mmu_lock
for write) and a vCPU servicing a guest page fault (takes mmu_lock for
read).  For _some_ setups, letting the invalidation task complete even
if there is mmu_lock contention provides lower latency for *all* tasks,
i.e. the invalidation completes sooner *and* the vCPU services the guest
page fault sooner.

But even KVM's mmu_lock behavior isn't uniform, e.g. the "best" behavior
can vary depending on the host VMM, the guest workload, the number of
vCPUs, the number of pCPUs in the host, why there is lock contention, etc.

In other words, simply deleting the CONFIG_PREEMPTION guard (or doing the
opposite and removing contention yielding entirely) needs to come with a
big pile of data proving that changing the status quo is a net positive.

Opportunistically document this side effect of preempt=full, as yielding
contended spinlocks can have significant, user-visible impact.

Fixes: c597bfddc9 ("sched: Provide Kconfig support for default dynamic preempt mode")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Ankur Arora <ankur.a.arora@oracle.com>
Reviewed-by: Chen Yu <yu.c.chen@intel.com>
Link: https://lore.kernel.org/kvm/ef81ff36-64bb-4cfe-ae9b-e3acf47bff24@proxmox.com
2024-06-05 16:52:36 +02:00
Norihiko Hama
804da867ad usb-storage: Optimize scan delay more precisely
The storage scan delay was last reduced by the following old commit:

a4a47bc03f ("Lower USB storage settling delay to something more reasonable")

As a result, the delay is at least one second, or zero with delay_use=0.
One second is still a long delay, especially for embedded systems, but
with delay_use=0 (no delay) errors are still observed on some USB drives.

So delay_use should not be set to 0, yet one second is quite long.
On embedded systems in particular, how quickly a USB drive becomes
accessible after it is connected matters to the end user, so it is
worth minimizing this constant long delay.

This patch makes the scan delay more precise, minimizing the delay
without causing problems on USB drives, by storing the 'delay_use'
module parameter in milliseconds internally.
The 'delay_use' parameter is optionally interpreted as milliseconds
if the value ends with 'ms'.
This reduces the usable range of the internal 32-bit value to 1/1000,
which is still enough to set the delay time.
By default, the delay remains one second for backward compatibility.

For example, delay_use=100ms, i.e. a 100 millisecond delay, seems to
work without issues for most USB pen drives.
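
The value interpretation described above can be sketched in user space as follows. This is an illustration of the described rule only (plain number means seconds, an 'ms' suffix means milliseconds, the result is kept in milliseconds); it is not the driver's parser, the name `parse_delay_use` is hypothetical, and the 32-bit range limit is not modelled.

```python
def parse_delay_use(value):
    """Return the delay in milliseconds for a delay_use-style string.

    "100ms" -> 100 (milliseconds given explicitly)
    "1"     -> 1000 (plain numbers are seconds, as before the change)
    Negative values are rejected in this sketch.
    """
    text = value.strip()
    if text.endswith("ms"):
        delay_ms = int(text[:-2])
    else:
        delay_ms = int(text) * 1000
    if delay_ms < 0:
        raise ValueError("delay must not be negative")
    return delay_ms
```

With this rule the default of one second corresponds to `parse_delay_use("1")`, and the suggested setting corresponds to `parse_delay_use("100ms")`.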

Signed-off-by: Norihiko Hama <Norihiko.Hama@alpsalpine.com>
Link: https://lore.kernel.org/r/20240515004339.29892-1-Norihiko.Hama@alpsalpine.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-06-04 15:40:47 +02:00
Tetsuo Handa
c6144a2116 tomoyo: update project links
The TOMOYO project has moved to SourceForge.net.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
2024-06-03 22:43:11 +09:00
Hans de Goede
0b178b0267 platform/x86: touchscreen_dmi: Add support for setting touchscreen properties from cmdline
On x86/ACPI platforms touchscreens mostly just work without needing any
device/model specific configuration. But in some cases (mostly with Silead
and Goodix touchscreens) it is still necessary to manually specify various
touchscreen-properties on a per model basis.

touchscreen_dmi is a special place for DMI quirks for this, but it can be
challenging for users to figure out the right property values, especially
for Silead touchscreens where none of these can be read back from
the touchscreen-controller.

ATM users can only test touchscreen properties by editing touchscreen_dmi.c
and then building a completely new kernel which makes it unnecessarily
difficult for users to test and submit properties when necessary for their
laptop / tablet model.

Add support for specifying properties on the kernel commandline to allow
users to easily figure out the right settings. See the added documentation
in kernel-parameters.txt for the commandline syntax.

Cc: Gregor Riepl <onitake@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20240523143601.47555-1-hdegoede@redhat.com
2024-05-27 11:42:57 +02:00
Michal Koutný
3f26a885a0 cgroup/pids: Add pids.events.local
Hierarchical counting of events is not practical for watching when a
particular pids.max is being hit. Therefore introduce .local flavor of
events file (akin to memory controller) that collects only events
relevant to given cgroup.

The file is only added to the default hierarchy.
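
A monitoring script could read either flavor of the events file with a small parser like the one below. This is a minimal sketch that assumes the files use the usual cgroup v2 flat-keyed layout (one "key value" pair per line); the helper names `events_path` and `parse_flat_keyed` and the example cgroup path are hypothetical.

```python
import os


def events_path(cgroup_dir, local=False):
    """Pick pids.events (hierarchical) or pids.events.local for a cgroup."""
    name = "pids.events.local" if local else "pids.events"
    return os.path.join(cgroup_dir, name)


def parse_flat_keyed(text):
    """Parse 'key value' lines into a dict of integer counters."""
    events = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        key, value = line.split()
        events[key] = int(value)
    return events


def read_events(cgroup_dir, local=False):
    """Read and parse the chosen events file of one cgroup."""
    with open(events_path(cgroup_dir, local)) as f:
        return parse_flat_keyed(f.read())
```

Watching whether a particular pids.max is being hit would then compare successive values of `read_events(path, local=True).get("max", 0)` for the cgroup of interest.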

Signed-off-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2024-05-26 08:45:10 -10:00
Michal Koutný
385a635cac cgroup/pids: Make event counters hierarchical
The pids.events file should honor the hierarchy, so make the events
propagate from their origin up to the root on the unified hierarchy. The
legacy behavior remains non-hierarchical.
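
The propagation described above can be pictured with a toy tree model. This is only a sketch of the counting rule, not the controller's implementation: the `Cgroup` class and `record_event` function are hypothetical, and the per-cgroup `local` count is kept purely for contrast with the non-hierarchical behavior.

```python
class Cgroup:
    """Toy cgroup node holding a local and a hierarchical event count."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.local = 0         # events that originated in this cgroup
        self.hierarchical = 0  # events from this cgroup and its descendants


def record_event(origin):
    """Count one event at its origin and propagate it up to the root."""
    origin.local += 1
    node = origin
    while node is not None:
        node.hierarchical += 1
        node = node.parent


def demo():
    """Build root -> a -> b plus root -> c and record a few events."""
    root = Cgroup("root")
    a = Cgroup("a", root)
    b = Cgroup("b", a)
    c = Cgroup("c", root)
    record_event(b)
    record_event(b)
    record_event(c)
    return {g.name: (g.local, g.hierarchical) for g in (root, a, b, c)}
```

In this model an ancestor's hierarchical count is the sum of the events recorded anywhere in its subtree, while its local count stays at zero unless an event originated in that cgroup itself.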

Signed-off-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2024-05-26 08:45:09 -10:00
Michal Koutný
73e75e6fc3 cgroup/pids: Separate semantics of pids.events related to pids.max
Currently, when pids.max limit is breached in the hierarchy, the event
is counted and reported in the cgroup where the forking task resides.

This decouples the limit and the notification caused by the limit making
it hard to detect when the actual limit was effected.

Redefine the pids.events:max as: the number of times the limit of the
cgroup was hit.

(Implementation differentiates also "forkfail" event but this is
currently not exposed as it would better fit into pids.stat. It also
differs from pids.events:max only when pids.max is configured on
non-leaf cgroups.)

Since it changes semantics of the original "max" event, introduce this
change only in the v2 API of the controller and add a cgroup2 mount
option to revert to the legacy behavior.
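
The change in where the "max" event is attributed can be sketched with a toy model of one fork attempt. This is an illustration under stated assumptions, not the controller's code: the function name `attribute_max_event` is hypothetical, a limit of `None` stands for an unlimited pids.max, and the walk goes from the forking task's cgroup up towards the root.

```python
def attribute_max_event(path, new_semantics=True):
    """Return the name of the cgroup that gets the 'max' event, or None.

    path: list of (name, current_pids, limit) tuples, ordered from the
          cgroup of the forking task up to the root; limit None = no limit.
    new_semantics: True  -> event goes to the cgroup whose limit was hit.
                   False -> event goes to the cgroup where the forking
                            task resides (the previous behavior).
    """
    forking_cgroup = path[0][0]
    for name, current, limit in path:
        if limit is not None and current + 1 > limit:
            return name if new_semantics else forking_cgroup
    return None  # no limit was hit, the fork is allowed


# Example hierarchy: the limit is configured on a non-leaf cgroup.
EXAMPLE = [("leaf", 3, None), ("mid", 10, 10), ("root", 10, None)]
```

For `EXAMPLE` the new semantics report the event on "mid", where pids.max is actually configured, whereas the previous behavior reported it on "leaf"; the two only differ when the limit sits on a non-leaf cgroup.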

Signed-off-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2024-05-26 08:45:09 -10:00
Linus Torvalds
f6b8e86b7a TTY/Serial changes for 6.10-rc1
Here is the big set of tty/serial driver changes for 6.10-rc1.  Included
 in here are:
   - Usual good set of api cleanups and evolution by Jiri Slaby to make
     the serial interfaces move out of the 1990's by using kfifos instead
     of hand-rolling their own logic.
   - 8250_exar driver updates
   - max3100 driver updates
   - sc16is7xx driver updates
   - exar driver updates
   - sh-sci driver updates
   - tty ldisc api addition to help refuse bindings
   - other smaller serial driver updates
 
 All of these have been in linux-next for a while with no reported
 issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZk4Cvg8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ymqpwCgnHU1NeBBUsvoSDOLk5oApIQ4jVgAn102jWlw
 3dNDhA4i3Ay/mZdv8/Kj
 =TI+P
 -----END PGP SIGNATURE-----

Merge tag 'tty-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull tty / serial updates from Greg KH:
 "Here is the big set of tty/serial driver changes for 6.10-rc1.
  Included in here are:

   - Usual good set of api cleanups and evolution by Jiri Slaby to make
     the serial interfaces move out of the 1990's by using kfifos
     instead of hand-rolling their own logic.

   - 8250_exar driver updates

   - max3100 driver updates

   - sc16is7xx driver updates

   - exar driver updates

   - sh-sci driver updates

   - tty ldisc api addition to help refuse bindings

   - other smaller serial driver updates

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'tty-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (113 commits)
  serial: Clear UPF_DEAD before calling tty_port_register_device_attr_serdev()
  serial: imx: Raise TX trigger level to 8
  serial: 8250_pnp: Simplify "line" related code
  serial: sh-sci: simplify locking when re-issuing RXDMA fails
  serial: sh-sci: let timeout timer only run when DMA is scheduled
  serial: sh-sci: describe locking requirements for invalidating RXDMA
  serial: sh-sci: protect invalidating RXDMA on shutdown
  tty: add the option to have a tty reject a new ldisc
  serial: core: Call device_set_awake_path() for console port
  dt-bindings: serial: brcm,bcm2835-aux-uart: convert to dtschema
  tty: serial: uartps: Add support for uartps controller reset
  arm64: zynqmp: Add resets property for UART nodes
  dt-bindings: serial: cdns,uart: Add optional reset property
  serial: 8250_pnp: Switch to DEFINE_SIMPLE_DEV_PM_OPS()
  serial: 8250_exar: Keep the includes sorted
  serial: 8250_exar: Make type of bit the same in exar_ee_*_bit()
  serial: 8250_exar: Use BIT() in exar_ee_read()
  serial: 8250_exar: Switch to use dev_err_probe()
  serial: 8250_exar: Return directly from switch-cases
  serial: 8250_exar: Decrease indentation level
  ...
2024-05-22 11:53:02 -07:00
Linus Torvalds
eb6a9339ef Mainly singleton patches, documented in their respective changelogs.
Notable series include:
 
 - Some maintenance and performance work for ocfs2 in Heming Zhao's
   series "improve write IO performance when fragmentation is high".
 
 - Some ocfs2 bugfixes from Su Yue in the series "ocfs2 bugs fixes
   exposed by fstests".
 
 - kfifo header rework from Andy Shevchenko in the series "kfifo: Clean
   up kfifo.h".
 
 - GDB script fixes from Florian Rommel in the series "scripts/gdb: Fixes
   for $lx_current and $lx_per_cpu".
 
 - After much discussion, a coding-style update from Barry Song
   explaining one reason why inline functions are preferred over macros.
   The series is "codingstyle: avoid unused parameters for a function-like
   macro".
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZkpLYQAKCRDdBJ7gKXxA
 jo9NAQDctSD3TMXqxqCHLaEpCaYTYzi6TGAVHjgkqGzOt7tYjAD/ZIzgcmRwthjP
 R7SSiSgZ7UnP9JRn16DQILmFeaoG1gs=
 =lYhr
 -----END PGP SIGNATURE-----

Merge tag 'mm-nonmm-stable-2024-05-19-11-56' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull non-mm updates from Andrew Morton:
 "Mainly singleton patches, documented in their respective changelogs.
  Notable series include:

   - Some maintenance and performance work for ocfs2 in Heming Zhao's
     series "improve write IO performance when fragmentation is high".

   - Some ocfs2 bugfixes from Su Yue in the series "ocfs2 bugs fixes
     exposed by fstests".

   - kfifo header rework from Andy Shevchenko in the series "kfifo:
     Clean up kfifo.h".

   - GDB script fixes from Florian Rommel in the series "scripts/gdb:
     Fixes for $lx_current and $lx_per_cpu".

   - After much discussion, a coding-style update from Barry Song
     explaining one reason why inline functions are preferred over
     macros. The series is "codingstyle: avoid unused parameters for a
     function-like macro""

* tag 'mm-nonmm-stable-2024-05-19-11-56' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (62 commits)
  fs/proc: fix softlockup in __read_vmcore
  nilfs2: convert BUG_ON() in nilfs_finish_roll_forward() to WARN_ON()
  scripts: checkpatch: check unused parameters for function-like macro
  Documentation: coding-style: ask function-like macros to evaluate parameters
  nilfs2: use __field_struct() for a bitwise field
  selftests/kcmp: remove unused open mode
  nilfs2: remove calls to folio_set_error() and folio_clear_error()
  kernel/watchdog_perf.c: tidy up kerneldoc
  watchdog: allow nmi watchdog to use raw perf event
  watchdog: handle comma separated nmi_watchdog command line
  nilfs2: make superblock data array index computation sparse friendly
  squashfs: remove calls to set the folio error flag
  squashfs: convert squashfs_symlink_read_folio to use folio APIs
  scripts/gdb: fix detection of current CPU in KGDB
  scripts/gdb: make get_thread_info accept pointers
  scripts/gdb: fix parameter handling in $lx_per_cpu
  scripts/gdb: fix failing KGDB detection during probe
  kfifo: don't use "proxy" headers
  media: stih-cec: add missing io.h
  media: rc: add missing io.h
  ...
2024-05-19 14:02:03 -07:00
Linus Torvalds
8dde191aab Misc fixes:
- Fix a sched_balance_newidle setting bug
 
  - Fix bug in the setting of /sys/fs/cgroup/test/cpu.max.burst
 
  - Fix variable-shadowing build warning
 
  - Extend sched-domains debug output
 
  - Fix documentation
 
  - Fix comments
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmZIbj4RHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1hEng/+NlAh7mm4AWckVjUxqyUnJ/omaV9Fe5F+
 koiihntyvhk+4RR40XomXPq37Av3zPo1dnKI4fJ3yioMs1tB+8JD+nVo3DURLGT/
 4k+lYI+K6RXBzUTpzeYZWVfa+ddGwbRu1KA5joI7QvRfjil7QP5rC5AQbAj0AiVO
 Xvor0M9vEcfkqShTttx4h2u7WVR4zqVEhBxkWNMT6dMxN2HnKm4qcAiX39E8p+Vx
 maC2/iO+1rXORRbUh+KBHR40WAwe2CVvh5hCe1sl+/vGfCbAnMK1k+j85UdV1pFD
 aZ1jSBwIERnx9PdD5zK0GCRx9hmux8mkJCeBseZyK/XubYuVOLiwBxfYA/9C3i3O
 1mQizaFBD8zanEiWj10sOxbfry+XhLwcISIiWC+xLpxKb0MvDD1TIeZR1fJv3Oz7
 14iYhq2CuKhfntYmV6fYTzSzXL2s16dMYMH/7m7cLY0P/cJo2vw7GNxkwPeJsOVN
 uX6jnRde2Kp3q+Er3I2u1SGeAZ8fEzXr19MCWRA0qI+wvgYQkaTgoh9zO9AwRNoa
 9hS/jc6Gq+O5xBMMJIPZMfOVai9RhYlPmQavFCGJLd3EFoVi9jp9+/iXgtyARCZp
 rfXFV9Dd9GvpFRzNnsMrLiKswBzUop5+epHYKZhVHJKH7aiHMbGEFD6cgNlf8k9b
 GFda3ay4JHA=
 =2okO
 -----END PGP SIGNATURE-----

Merge tag 'sched-urgent-2024-05-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler fixes from Ingo Molnar:

 - Fix a sched_balance_newidle setting bug

 - Fix bug in the setting of /sys/fs/cgroup/test/cpu.max.burst

 - Fix variable-shadowing build warning

 - Extend sched-domains debug output

 - Fix documentation

 - Fix comments

* tag 'sched-urgent-2024-05-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/core: Fix incorrect initialization of the 'burst' parameter in cpu_max_write()
  sched/fair: Remove stale FREQUENCY_UTIL comment
  sched/fair: Fix initial util_avg calculation
  docs: cgroup-v1: Clarify that domain levels are system-specific
  sched/debug: Dump domains' level
  sched/fair: Allow disabling sched_balance_newidle with sched_relax_domain_level
  arch/topology: Fix variable naming to avoid shadowing
2024-05-19 11:38:15 -07:00
Linus Torvalds
61307b7be4 The usual shower of singleton fixes and minor series all over MM,
documented (hopefully adequately) in the respective changelogs.  Notable
 series include:
 
 - Lucas Stach has provided some page-mapping
   cleanup/consolidation/maintainability work in the series "mm/treewide:
   Remove pXd_huge() API".
 
 - In the series "Allow migrate on protnone reference with
   MPOL_PREFERRED_MANY policy", Donet Tom has optimized mempolicy's
   MPOL_PREFERRED_MANY mode, yielding almost doubled performance in one
   test.
 
 - In their series "Memory allocation profiling" Kent Overstreet and
   Suren Baghdasaryan have contributed a means of determining (via
   /proc/allocinfo) whereabouts in the kernel memory is being allocated:
   number of calls and amount of memory.
 
 - Matthew Wilcox has provided the series "Various significant MM
   patches" which does a number of rather unrelated things, but in largely
   similar code sites.
 
 - In his series "mm: page_alloc: freelist migratetype hygiene" Johannes
   Weiner has fixed the page allocator's handling of migratetype requests,
   with resulting improvements in compaction efficiency.
 
 - In the series "make the hugetlb migration strategy consistent" Baolin
   Wang has fixed a hugetlb migration issue, which should improve hugetlb
   allocation reliability.
 
 - Liu Shixin has hit an I/O meltdown caused by readahead in a
   memory-tight memcg.  Addressed in the series "Fix I/O high when memory
   almost met memcg limit".
 
 - In the series "mm/filemap: optimize folio adding and splitting" Kairui
   Song has optimized pagecache insertion, yielding ~10% performance
   improvement in one test.
 
 - Baoquan He has cleaned up and consolidated the early zone
   initialization code in the series "mm/mm_init.c: refactor
   free_area_init_core()".
 
 - Baoquan has also redone some MM initialization code in the series
   "mm/init: minor clean up and improvement".
 
 - MM helper cleanups from Christoph Hellwig in his series "remove
   follow_pfn".
 
 - More cleanups from Matthew Wilcox in the series "Various page->flags
   cleanups".
 
 - Vlastimil Babka has contributed maintainability improvements in the
   series "memcg_kmem hooks refactoring".
 
 - More folio conversions and cleanups in Matthew Wilcox's series
 
 	"Convert huge_zero_page to huge_zero_folio"
 	"khugepaged folio conversions"
 	"Remove page_idle and page_young wrappers"
 	"Use folio APIs in procfs"
 	"Clean up __folio_put()"
 	"Some cleanups for memory-failure"
 	"Remove page_mapping()"
 	"More folio compat code removal"
 
 - David Hildenbrand chipped in with "fs/proc/task_mmu: convert hugetlb
   functions to work on folios".
 
 - Code consolidation and cleanup work related to GUP's handling of
   hugetlbs in Peter Xu's series "mm/gup: Unify hugetlb, part 2".
 
 - Rick Edgecombe has developed some fixes to stack guard gaps in the
   series "Cover a guard gap corner case".
 
 - Jinjiang Tu has fixed KSM's behaviour after a fork+exec in the series
   "mm/ksm: fix ksm exec support for prctl".
 
 - Baolin Wang has implemented NUMA balancing for multi-size THPs.  This
   is a simple first-cut implementation for now.  The series is "support
   multi-size THP numa balancing".
 
 - Cleanups to vma handling helper functions from Matthew Wilcox in the
   series "Unify vma_address and vma_pgoff_address".
 
 - Some selftests maintenance work from Dev Jain in the series
   "selftests/mm: mremap_test: Optimizations and style fixes".
 
 - Improvements to the swapping of multi-size THPs from Ryan Roberts in
   the series "Swap-out mTHP without splitting".
 
 - Kefeng Wang has significantly optimized the handling of arm64's
   permission page faults in the series
 
 	"arch/mm/fault: accelerate pagefault when badaccess"
 	"mm: remove arch's private VM_FAULT_BADMAP/BADACCESS"
 
 - GUP cleanups from David Hildenbrand in "mm/gup: consistently call it
   GUP-fast".
 
 - hugetlb fault code cleanups from Vishal Moola in "Hugetlb fault path to
   use struct vm_fault".
 
 - selftests build fixes from John Hubbard in the series "Fix
   selftests/mm build without requiring "make headers"".
 
 - Memory tiering fixes/improvements from Ho-Ren (Jack) Chuang in the
   series "Improved Memory Tier Creation for CPUless NUMA Nodes".  Fixes
   the initialization code so that migration between different memory types
   works as intended.
 
 - David Hildenbrand has improved follow_pte() and fixed an errant driver
   in the series "mm: follow_pte() improvements and acrn follow_pte()
   fixes".
 
 - David also did some cleanup work on large folio mapcounts in his
   series "mm: mapcount for large folios + page_mapcount() cleanups".
 
 - Folio conversions in KSM in Alex Shi's series "transfer page to folio
   in KSM".
 
 - Barry Song has added some sysfs stats for monitoring multi-size THP's
   in the series "mm: add per-order mTHP alloc and swpout counters".
 
 - Some zswap cleanups from Yosry Ahmed in the series "zswap same-filled
   and limit checking cleanups".
 
 - Matthew Wilcox has been looking at buffer_head code and found the
   documentation to be lacking.  The series is "Improve buffer head
   documentation".
 
 - Multi-size THPs get more work, this time from Lance Yang.  His series
   "mm/madvise: enhance lazyfreeing with mTHP in madvise_free" optimizes
   the freeing of these things.
 
 - Kemeng Shi has added more userspace-visible writeback instrumentation
   in the series "Improve visibility of writeback".
 
 - Kemeng Shi then sent some maintenance work on top in the series "Fix
   and cleanups to page-writeback".
 
 - Matthew Wilcox reduces mmap_lock traffic in the anon vma code in the
   series "Improve anon_vma scalability for anon VMAs".  Intel's test bot
   reported an improbable 3x improvement in one test.
 
 - SeongJae Park adds some DAMON feature work in the series
 
 	"mm/damon: add a DAMOS filter type for page granularity access recheck"
 	"selftests/damon: add DAMOS quota goal test"
 
 - Also some maintenance work in the series
 
 	"mm/damon/paddr: simplify page level access re-check for pageout"
 	"mm/damon: misc fixes and improvements"
 
 - David Hildenbrand has disabled some known-to-fail selftests in the
   series "selftests: mm: cow: flag vmsplice() hugetlb tests as XFAIL".
 
 - memcg metadata storage optimizations from Shakeel Butt in "memcg:
   reduce memory consumption by memcg stats".
 
 - DAX fixes and maintenance work from Vishal Verma in the series
   "dax/bus.c: Fixups for dax-bus locking".
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZkgQYwAKCRDdBJ7gKXxA
 jrdKAP9WVJdpEcXxpoub/vVE0UWGtffr8foifi9bCwrQrGh5mgEAx7Yf0+d/oBZB
 nvA4E0DcPrUAFy144FNM0NTCb7u9vAw=
 =V3R/
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2024-05-17-19-19' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull mm updates from Andrew Morton:
 "The usual shower of singleton fixes and minor series all over MM,
  documented (hopefully adequately) in the respective changelogs.
  Notable series include:

   - Lucas Stach has provided some page-mapping cleanup/consolidation/
     maintainability work in the series "mm/treewide: Remove pXd_huge()
     API".

   - In the series "Allow migrate on protnone reference with
     MPOL_PREFERRED_MANY policy", Donet Tom has optimized mempolicy's
     MPOL_PREFERRED_MANY mode, yielding almost doubled performance in
     one test.

   - In their series "Memory allocation profiling" Kent Overstreet and
     Suren Baghdasaryan have contributed a means of determining (via
     /proc/allocinfo) whereabouts in the kernel memory is being
     allocated: number of calls and amount of memory.

   - Matthew Wilcox has provided the series "Various significant MM
     patches" which does a number of rather unrelated things, but in
     largely similar code sites.

   - In his series "mm: page_alloc: freelist migratetype hygiene"
     Johannes Weiner has fixed the page allocator's handling of
     migratetype requests, with resulting improvements in compaction
     efficiency.

   - In the series "make the hugetlb migration strategy consistent"
     Baolin Wang has fixed a hugetlb migration issue, which should
     improve hugetlb allocation reliability.

   - Liu Shixin has hit an I/O meltdown caused by readahead in a
     memory-tight memcg. Addressed in the series "Fix I/O high when
     memory almost met memcg limit".

   - In the series "mm/filemap: optimize folio adding and splitting"
     Kairui Song has optimized pagecache insertion, yielding ~10%
     performance improvement in one test.

   - Baoquan He has cleaned up and consolidated the early zone
     initialization code in the series "mm/mm_init.c: refactor
     free_area_init_core()".

   - Baoquan has also redone some MM initialization code in the series
     "mm/init: minor clean up and improvement".

   - MM helper cleanups from Christoph Hellwig in his series "remove
     follow_pfn".

   - More cleanups from Matthew Wilcox in the series "Various
     page->flags cleanups".

   - Vlastimil Babka has contributed maintainability improvements in the
     series "memcg_kmem hooks refactoring".

   - More folio conversions and cleanups in Matthew Wilcox's series:
	"Convert huge_zero_page to huge_zero_folio"
	"khugepaged folio conversions"
	"Remove page_idle and page_young wrappers"
	"Use folio APIs in procfs"
	"Clean up __folio_put()"
	"Some cleanups for memory-failure"
	"Remove page_mapping()"
	"More folio compat code removal"

   - David Hildenbrand chipped in with "fs/proc/task_mmu: convert
     hugetlb functions to work on folios".

   - Code consolidation and cleanup work related to GUP's handling of
     hugetlbs in Peter Xu's series "mm/gup: Unify hugetlb, part 2".

   - Rick Edgecombe has developed some fixes to stack guard gaps in the
     series "Cover a guard gap corner case".

   - Jinjiang Tu has fixed KSM's behaviour after a fork+exec in the
     series "mm/ksm: fix ksm exec support for prctl".

   - Baolin Wang has implemented NUMA balancing for multi-size THPs.
     This is a simple first-cut implementation for now. The series is
     "support multi-size THP numa balancing".

   - Cleanups to vma handling helper functions from Matthew Wilcox in
     the series "Unify vma_address and vma_pgoff_address".

   - Some selftests maintenance work from Dev Jain in the series
     "selftests/mm: mremap_test: Optimizations and style fixes".

   - Improvements to the swapping of multi-size THPs from Ryan Roberts
     in the series "Swap-out mTHP without splitting".

   - Kefeng Wang has significantly optimized the handling of arm64's
     permission page faults in the series
	"arch/mm/fault: accelerate pagefault when badaccess"
	"mm: remove arch's private VM_FAULT_BADMAP/BADACCESS"

   - GUP cleanups from David Hildenbrand in "mm/gup: consistently call
     it GUP-fast".

   - hugetlb fault code cleanups from Vishal Moola in "Hugetlb fault
     path to use struct vm_fault".

   - selftests build fixes from John Hubbard in the series "Fix
     selftests/mm build without requiring "make headers"".

   - Memory tiering fixes/improvements from Ho-Ren (Jack) Chuang in the
     series "Improved Memory Tier Creation for CPUless NUMA Nodes".
     Fixes the initialization code so that migration between different
     memory types works as intended.

   - David Hildenbrand has improved follow_pte() and fixed an errant
     driver in the series "mm: follow_pte() improvements and acrn
     follow_pte() fixes".

   - David also did some cleanup work on large folio mapcounts in his
     series "mm: mapcount for large folios + page_mapcount() cleanups".

   - Folio conversions in KSM in Alex Shi's series "transfer page to
     folio in KSM".

   - Barry Song has added some sysfs stats for monitoring multi-size
     THP's in the series "mm: add per-order mTHP alloc and swpout
     counters".

   - Some zswap cleanups from Yosry Ahmed in the series "zswap
     same-filled and limit checking cleanups".

   - Matthew Wilcox has been looking at buffer_head code and found the
     documentation to be lacking. The series is "Improve buffer head
     documentation".

   - Multi-size THPs get more work, this time from Lance Yang. His
     series "mm/madvise: enhance lazyfreeing with mTHP in madvise_free"
     optimizes the freeing of these things.

   - Kemeng Shi has added more userspace-visible writeback
     instrumentation in the series "Improve visibility of writeback".

   - Kemeng Shi then sent some maintenance work on top in the series
     "Fix and cleanups to page-writeback".

   - Matthew Wilcox reduces mmap_lock traffic in the anon vma code in
     the series "Improve anon_vma scalability for anon VMAs". Intel's
     test bot reported an improbable 3x improvement in one test.

   - SeongJae Park adds some DAMON feature work in the series
	"mm/damon: add a DAMOS filter type for page granularity access recheck"
	"selftests/damon: add DAMOS quota goal test"

   - Also some maintenance work in the series
	"mm/damon/paddr: simplify page level access re-check for pageout"
	"mm/damon: misc fixes and improvements"

   - David Hildenbrand has disabled some known-to-fail selftests in the
     series "selftests: mm: cow: flag vmsplice() hugetlb tests as
     XFAIL".

   - memcg metadata storage optimizations from Shakeel Butt in "memcg:
     reduce memory consumption by memcg stats".

   - DAX fixes and maintenance work from Vishal Verma in the series
     "dax/bus.c: Fixups for dax-bus locking""

* tag 'mm-stable-2024-05-17-19-19' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (426 commits)
  memcg, oom: cleanup unused memcg_oom_gfp_mask and memcg_oom_order
  selftests/mm: hugetlb_madv_vs_map: avoid test skipping by querying hugepage size at runtime
  mm/hugetlb: add missing VM_FAULT_SET_HINDEX in hugetlb_wp
  mm/hugetlb: add missing VM_FAULT_SET_HINDEX in hugetlb_fault
  selftests: cgroup: add tests to verify the zswap writeback path
  mm: memcg: make alloc_mem_cgroup_per_node_info() return bool
  mm/damon/core: fix return value from damos_wmark_metric_value
  mm: do not update memcg stats for NR_{FILE/SHMEM}_PMDMAPPED
  selftests: cgroup: remove redundant enabling of memory controller
  Docs/mm/damon/maintainer-profile: allow posting patches based on damon/next tree
  Docs/mm/damon/maintainer-profile: change the maintainer's timezone from PST to PT
  Docs/mm/damon/design: use a list for supported filters
  Docs/admin-guide/mm/damon/usage: fix wrong schemes effective quota update command
  Docs/admin-guide/mm/damon/usage: fix wrong example of DAMOS filter matching sysfs file
  selftests/damon: classify tests for functionalities and regressions
  selftests/damon/_damon_sysfs: use 'is' instead of '==' for 'None'
  selftests/damon/_damon_sysfs: find sysfs mount point from /proc/mounts
  selftests/damon/_damon_sysfs: check errors from nr_schemes file reads
  mm/damon/core: initialize ->esz_bp from damos_quota_init_priv()
  selftests/damon: add a test for DAMOS quota goal
  ...
2024-05-19 09:21:03 -07:00
Linus Torvalds
0cc6f45cec IOMMU Updates for Linux v6.10
Including:
 
 	- Core:
 	  - IOMMU memory usage observability - This will make the memory used
 	    for IO page tables explicitly visible.
 	  - Simplify arch_setup_dma_ops()
 
 	- Intel VT-d:
 	  - Consolidate domain cache invalidation
 	  - Remove private data from page fault message
 	  - Allocate DMAR fault interrupts locally
 	  - Cleanup and refactoring
 
 	- ARM-SMMUv2:
 	  - Support for fault debugging hardware on Qualcomm implementations
 	  - Re-land support for the ->domain_alloc_paging() callback
 
 	- ARM-SMMUv3:
 	  - Improve handling of MSI allocation failure
 	  - Drop support for the "disable_bypass" cmdline option
 	  - Major rework of the CD creation code, following on directly from the
 	    STE rework merged last time around.
 	  - Add unit tests for the new STE/CD manipulation logic
 
 	- AMD-Vi:
 	  - Final part of SVA changes with generic IO page fault handling
 
 	- Renesas IPMMU:
 	  - Add support for R8A779H0 hardware
 
 	- A couple smaller fixes and updates across the sub-tree

Merge tag 'iommu-updates-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull iommu updates from Joerg Roedel:
 "Core:
   - IOMMU memory usage observability - This will make the memory used
     for IO page tables explicitly visible.
   - Simplify arch_setup_dma_ops()

  Intel VT-d:
   - Consolidate domain cache invalidation
   - Remove private data from page fault message
   - Allocate DMAR fault interrupts locally
   - Cleanup and refactoring

  ARM-SMMUv2:
   - Support for fault debugging hardware on Qualcomm implementations
   - Re-land support for the ->domain_alloc_paging() callback

  ARM-SMMUv3:
   - Improve handling of MSI allocation failure
   - Drop support for the "disable_bypass" cmdline option
   - Major rework of the CD creation code, following on directly from
     the STE rework merged last time around.
   - Add unit tests for the new STE/CD manipulation logic

  AMD-Vi:
   - Final part of SVA changes with generic IO page fault handling

  Renesas IPMMU:
   - Add support for R8A779H0 hardware

  ... and a couple smaller fixes and updates across the sub-tree"

* tag 'iommu-updates-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (80 commits)
  iommu/arm-smmu-v3: Make the kunit into a module
  arm64: Properly clean up iommu-dma remnants
  iommu/amd: Enable Guest Translation after reading IOMMU feature register
  iommu/vt-d: Decouple igfx_off from graphic identity mapping
  iommu/amd: Fix compilation error
  iommu/arm-smmu-v3: Add unit tests for arm_smmu_write_entry
  iommu/arm-smmu-v3: Build the whole CD in arm_smmu_make_s1_cd()
  iommu/arm-smmu-v3: Move the CD generation for SVA into a function
  iommu/arm-smmu-v3: Allocate the CD table entry in advance
  iommu/arm-smmu-v3: Make arm_smmu_alloc_cd_ptr()
  iommu/arm-smmu-v3: Consolidate clearing a CD table entry
  iommu/arm-smmu-v3: Move the CD generation for S1 domains into a function
  iommu/arm-smmu-v3: Make CD programming use arm_smmu_write_entry()
  iommu/arm-smmu-v3: Add an ops indirection to the STE code
  iommu/arm-smmu-qcom: Don't build debug features as a kernel module
  iommu/amd: Add SVA domain support
  iommu: Add ops->domain_alloc_sva()
  iommu/amd: Initial SVA support for AMD IOMMU
  iommu/amd: Add support for enable/disable IOPF
  iommu/amd: Add IO page fault notifier handler
  ...
2024-05-18 10:55:13 -07:00
Vitalii Bursov
0f1c74befa docs: cgroup-v1: Clarify that domain levels are system-specific
Add a clarification that domain levels are system-specific
and where to check for system details.

Signed-off-by: Vitalii Bursov <vitaly@bursov.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Valentin Schneider <vschneid@redhat.com>
Acked-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lore.kernel.org/r/42b177a2e897cdf880caf9c2025f5b609e820334.1714488502.git.vitaly@bursov.com
2024-05-17 09:48:25 +02:00
Linus Torvalds
6fd600d742 media updates for v6.10-rc1

Merge tag 'media/v6.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media

Pull media updates from Mauro Carvalho Chehab:

 - New V4L2 ioctl VIDIOC_REMOVE_BUFS

 - Experimental support for using generic metaformats in the V4L2 core

 - New drivers: Intel IPU6 controller driver, Broadcom BCM283x/BCM271x

 - More cleanups in the atomisp driver

 - Usual bunch of driver cleanups, improvements and fixes

* tag 'media/v6.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (328 commits)
  media: bcm2835-unicam: Depend on COMMON_CLK
  Revert "media: v4l2-ctrls: show all owned controls in log_status"
  media: ov2740: Ensure proper reset sequence on probe()
  media: intel/ipu6: Don't print user-triggerable errors to kernel log
  media: bcm2835-unicam: Fix driver path in MAINTAINERS
  media: bcm2835-unicam: Fix a NULL vs IS_ERR() check
  media: bcm2835-unicam: Do not print error when irq not found
  media: bcm2835-unicam: Do not replace IRQ retcode during probe
  media: bcm2835-unicam: Convert to platform remove callback returning void
  media: media: intel/ipu6: Fix spelling mistake "remappinp" -> "remapping"
  media: intel/ipu6: explicitly include vmalloc.h
  media: cec.h: Fix kerneldoc
  media: uvcvideo: Refactor iterators
  media: v4l: async: refactor v4l2_async_create_ancillary_links
  media: intel/ipu6: Don't re-allocate memory for firmware
  media: dvb-frontends: tda10048: Fix integer overflow
  media: tc358746: Use the correct div_ function
  media: i2c: st-mipid02: Use the correct div function
  media: tegra-vde: Refactor timeout handling
  media: stk1160: Use min macro
  ...
2024-05-16 08:45:44 -07:00
Linus Torvalds
de6fef50ea cgroup: Changes for v6.10
- The locking around cpuset hotplug processing has always been a bit of a mess,
   which was worked around by making hotplug processing asynchronous. The
   asynchrony isn't great and led to other issues. We tried to make the
   behavior synchronous a while ago but that led to lockdep splats. Waiman
   took another stab at cleaning up and making it synchronous. The patch has
   been in -next for well over a month and there haven't been any complaints,
   so fingers crossed.
 
 - Tracepoints added to help understand rstat lock contention.
 
 - A bunch of minor changes - doc updates, code cleanups and selftests.

Merge tag 'cgroup-for-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup

Pull cgroup updates from Tejun Heo:

 - The locking around cpuset hotplug processing has always been a bit of
   a mess, which was worked around by making hotplug processing
   asynchronous. The asynchrony isn't great and led to other issues.

   We tried to make the behavior synchronous a while ago but that led to
   lockdep splats. Waiman took another stab at cleaning up and making it
   synchronous. The patch has been in -next for well over a month and
   there haven't been any complaints, so fingers crossed.

 - Tracepoints added to help understand rstat lock contention.

 - A bunch of minor changes - doc updates, code cleanups and selftests.

* tag 'cgroup-for-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (24 commits)
  cgroup/rstat: add cgroup_rstat_cpu_lock helpers and tracepoints
  selftests/cgroup: Drop define _GNU_SOURCE
  docs: cgroup-v1: Update page cache removal functions
  selftests/cgroup: fix uninitialized variables in test_zswap.c
  selftests/cgroup: cpu_hogger init: use {} instead of {NULL}
  selftests/cgroup: fix clang warnings: uninitialized fd variable
  selftests/cgroup: fix clang build failures for abs() calls
  cgroup/cpuset: Remove outdated comment in sched_partition_write()
  cgroup/cpuset: Fix incorrect top_cpuset flags
  cgroup/cpuset: Avoid clearing CS_SCHED_LOAD_BALANCE twice
  cgroup/cpuset: Statically initialize more members of top_cpuset
  cgroup: Avoid unnecessary looping in cgroup_no_v1()
  cgroup, legacy_freezer: update comment for freezer_css_offline()
  docs, cgroup: add entries for pids to cgroup-v2.rst
  cgroup: don't call cgroup1_pidlist_destroy_all() for v2
  cgroup_freezer: update comment for freezer_css_online()
  cgroup/rstat: desc member cgrp in cgroup_rstat_flush_release
  cgroup/rstat: add cgroup_rstat_lock helpers and tracepoints
  cgroup/pids: Remove superfluous zeroing
  docs: cgroup-v1: Fix description for css_online
  ...
2024-05-15 17:06:08 -07:00
Linus Torvalds
1b294a1f35 Networking changes for 6.10.
Core & protocols
 ----------------
 
  - Complete rework of garbage collection of AF_UNIX sockets.
    AF_UNIX is prone to forming reference count cycles due to fd passing
    functionality. New method based on Tarjan's Strongly Connected Components
    algorithm should be both faster and remove a lot of workarounds
    we accumulated over the years.
 
  - Add TCP fraglist GRO support, allowing chaining multiple TCP packets
    and forwarding them together. Useful for small switches / routers which
    lack basic checksum offload in some scenarios (e.g. PPPoE).
 
  - Support using SMP threads for handling packet backlog i.e. packet
    processing from software interfaces and old drivers which don't
    use NAPI. This helps move the processing out of the softirq jumble.
 
  - Continue work of converting from rtnl lock to RCU protection.
    Don't require rtnl lock when reading: IPv6 routing FIB, IPv6 address
    labels, netdev threaded NAPI sysfs files, bonding driver's sysfs files,
    MPLS devconf, IPv4 FIB rules, netns IDs, tcp metrics, TC Qdiscs,
    neighbor entries, ARP entries via ioctl(SIOCGARP), a lot of the link
    information available via rtnetlink.
 
  - Small optimizations from Eric to UDP wake up handling, memory accounting,
    RPS/RFS implementation, TCP packet sizing etc.
 
  - Allow direct page recycling in the bulk API used by XDP, for +2% PPS.
 
  - Support peek with an offset on TCP sockets.
 
  - Add MPTCP APIs for querying last time packets were received/sent/acked,
    and whether MPTCP "upgrade" succeeded on a TCP socket.
 
  - Add intra-node communication shortcut to improve SMC performance.
 
  - Add IPv6 (and IPv{4,6}-over-IPv{4,6}) support to the GTP protocol driver.
 
  - Add HSR-SAN (RedBOX) mode of operation to the HSR protocol driver.
 
  - Add reset reasons for tracing what caused a TCP reset to be sent.
 
  - Introduce direction attribute for xfrm (IPSec) states.
    State can be used either for input or output packet processing.
 
 Things we sprinkled into general kernel code
 --------------------------------------------
 
  - Add bitmap_{read,write}(), bitmap_size(), expose BYTES_TO_BITS().
    This required touch-ups and renaming of a few existing users.
 
  - Add Endian-dependent __counted_by_{le,be} annotations.
 
  - Make building selftests "quieter" by printing summaries like
    "CC object.o" rather than full commands with all the arguments.
 
 Netfilter
 ---------
 
  - Use GFP_KERNEL to clone elements, to deal better with OOM situations
    and avoid failures in the .commit step.
 
 BPF
 ---
 
  - Add eBPF JIT for ARCv2 CPUs.
 
  - Support attaching kprobe BPF programs through kprobe_multi link in
    a session mode, meaning, a BPF program is attached to both function entry
    and return, the entry program can decide if the return program gets
    executed and the entry program can share u64 cookie value with return
    program. "Session mode" is a common use-case for tetragon and bpftrace.
 
  - Add the ability to specify and retrieve BPF cookie for raw tracepoint
    programs in order to ease migration from classic to raw tracepoints.
 
  - Add an internal-only BPF per-CPU instruction for resolving per-CPU
    memory addresses and implement support in x86, ARM64 and RISC-V JITs.
    This allows inlining functions which need to access per-CPU state.
 
  - Optimize x86 BPF JIT's emit_mov_imm64, and add support for various
    atomics in bpf_arena which can be JITed as a single x86 instruction.
    Support BPF arena on ARM64.
 
  - Add a new bpf_wq API for deferring events and refactor process-context
    bpf_timer code to keep common code where possible.
 
  - Harden the BPF verifier's and/or/xor value tracking.
 
  - Introduce crypto kfuncs to let BPF programs call kernel crypto APIs.
 
  - Support bpf_tail_call_static() helper for BPF programs with GCC 13.
 
  - Add bpf_preempt_{disable,enable}() kfuncs in order to allow a BPF
    program to have code sections where preemption is disabled.
 
 Driver API
 ----------
 
  - Skip software TC processing completely if all installed rules are
    marked as HW-only, instead of checking the HW-only flag rule by rule.
 
  - Add support for configuring PoE (Power over Ethernet), similar to
    the already existing support for PoDL (Power over Data Line) config.
 
  - Initial bits of a queue control API, for now allowing a single queue
    to be reset without disturbing packet flow to other queues.
 
  - Common (ethtool) statistics for hardware timestamping.
 
 Tests and tooling
 -----------------
 
  - Remove the need to create a config file to run the net forwarding tests
    so that a naive "make run_tests" can exercise them.
 
  - Define a method of writing tests which require an external endpoint
    to communicate with (to send/receive data towards the test machine).
    Add a few such tests.
 
  - Create a shared code library for writing Python tests. Expose the YAML
    Netlink library from tools/ to the tests for easy Netlink access.
 
  - Move netfilter tests under net/, extend them, separate performance tests
    from correctness tests, and iron out issues found by running them
    "on every commit".
 
  - Refactor BPF selftests to use common network helpers.
 
  - Further work filling in YAML definitions of Netlink messages for:
    nftables, team driver, bonding interfaces, vlan interfaces, VF info,
    TC u32 mark, TC police action.
 
  - Teach Python YAML Netlink to decode attribute policies.
 
  - Extend the definition of the "indexed array" construct in the specs
    to cover arrays of scalars rather than just nests.
 
  - Add hyperlinks between definitions in generated Netlink docs.
 
 Drivers
 -------
 
  - Make sure unsupported flower control flags are rejected by drivers,
    and make more drivers report errors directly to the application rather
    than dmesg (large number of driver changes from Asbjørn Sloth Tønnesen).
 
  - Ethernet high-speed NICs:
    - Broadcom (bnxt):
      - support multiple RSS contexts and steering traffic to them
      - support XDP metadata
      - make page pool allocations more NUMA aware
    - Intel (100G, ice, idpf):
      - extract datapath code common among Intel drivers into a library
      - use fewer resources in switchdev by sharing queues with the PF
      - add PFCP filter support
      - add Ethernet filter support
      - use a spinlock instead of HW lock in PTP clock ops
      - support 5 layer Tx scheduler topology
    - nVidia/Mellanox:
      - 800G link modes and 100G SerDes speeds
      - per-queue IRQ coalescing configuration
    - Marvell Octeon:
      - support offloading TC packet mark action
 
  - Ethernet NICs consumer, embedded and virtual:
    - stop lying about skb->truesize in USB Ethernet drivers, it messes up
      TCP memory calculations
    - Google cloud vNIC:
      - support changing ring size via ethtool
      - support ring reset using the queue control API
    - VirtIO net:
      - expose flow hash from RSS to XDP
      - per-queue statistics
      - add selftests
    - Synopsys (stmmac):
      - support controllers which require an RX clock signal from the MII
        bus to perform their hardware initialization
    - TI:
      - icssg_prueth: support ICSSG-based Ethernet on AM65x SR1.0 devices
      - icssg_prueth: add SW TX / RX Coalescing based on hrtimers
      - cpsw: minimal XDP support
    - Renesas (ravb):
      - support describing the MDIO bus
    - Realtek (r8169):
      - add support for RTL8168M
    - Microchip Sparx5:
      - matchall and flower actions mirred and redirect
 
  - Ethernet switches:
    - nVidia/Mellanox:
      - improve events processing performance
    - Marvell:
      - add support for MV88E6250 family internal PHYs
    - Microchip:
      - add DCB and DSCP mapping support for KSZ switches
      - vsc73xx: convert to PHYLINK
    - Realtek:
      - rtl8226b/rtl8221b: add C45 instances and SerDes switching
 
  - Many driver changes related to PHYLIB and PHYLINK deprecated API cleanup.
 
  - Ethernet PHYs:
    - Add a new driver for Airoha EN8811H 2.5 Gigabit PHY.
    - micrel: lan8814: add support for PPS out and external timestamp trigger
 
  - WiFi:
    - Disable Wireless Extensions (WEXT) in all Wi-Fi 7 device drivers.
      Modern devices can only be configured using nl80211.
    - mac80211/cfg80211
      - handle color change per link for WiFi 7 Multi-Link Operation
    - Intel (iwlwifi):
      - don't support puncturing in 5 GHz
      - support monitor mode on passive channels
      - BZ-W device support
      - P2P with HE/EHT support
      - re-add support for firmware API 90
      - provide channel survey information for Automatic Channel Selection
    - MediaTek (mt76):
      - mt7921 LED control
      - mt7925 EHT radiotap support
      - mt7920e PCI support
    - Qualcomm (ath11k):
      - P2P support for QCA6390, WCN6855 and QCA2066
      - support hibernation
      - ieee80211-freq-limit Device Tree property support
    - Qualcomm (ath12k):
      - refactoring in preparation of multi-link support
      - suspend and hibernation support
      - ACPI support
      - debugfs support, including dfs_simulate_radar support
    - RealTek:
      - rtw88: RTL8723CS SDIO device support
      - rtw89: RTL8922AE Wi-Fi 7 PCI device support
      - rtw89: complete features of new WiFi 7 chip 8922AE including
        BT-coexistence and Wake-on-WLAN
      - rtw89: use BIOS ACPI settings to set TX power and channels
      - rtl8xxxu: enable Management Frame Protection (MFP) support
 
  - Bluetooth:
    - support for Intel BlazarI and Filmore Peak2 (BE201)
    - support for MediaTek MT7921S SDIO
    - initial support for Intel PCIe BT driver
    - remove HCI_AMP support
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'net-next-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Jakub Kicinski:
 "Core & protocols:

   - Complete rework of garbage collection of AF_UNIX sockets.

     AF_UNIX is prone to forming reference count cycles due to fd
     passing functionality. New method based on Tarjan's Strongly
     Connected Components algorithm should be both faster and remove a
     lot of workarounds we accumulated over the years.

   - Add TCP fraglist GRO support, allowing chaining multiple TCP
     packets and forwarding them together. Useful for small switches /
     routers which lack basic checksum offload in some scenarios (e.g.
     PPPoE).

   - Support using SMP threads for handling packet backlog i.e. packet
     processing from software interfaces and old drivers which don't use
     NAPI. This helps move the processing out of the softirq jumble.

   - Continue work of converting from rtnl lock to RCU protection.

     Don't require rtnl lock when reading: IPv6 routing FIB, IPv6
     address labels, netdev threaded NAPI sysfs files, bonding driver's
     sysfs files, MPLS devconf, IPv4 FIB rules, netns IDs, tcp metrics,
     TC Qdiscs, neighbor entries, ARP entries via ioctl(SIOCGARP), a lot
     of the link information available via rtnetlink.

   - Small optimizations from Eric to UDP wake up handling, memory
     accounting, RPS/RFS implementation, TCP packet sizing etc.

   - Allow direct page recycling in the bulk API used by XDP, for +2%
     PPS.

   - Support peek with an offset on TCP sockets.

   - Add MPTCP APIs for querying last time packets were received/sent/acked
     and whether MPTCP "upgrade" succeeded on a TCP socket.

   - Add intra-node communication shortcut to improve SMC performance.

   - Add IPv6 (and IPv{4,6}-over-IPv{4,6}) support to the GTP protocol
     driver.

   - Add HSR-SAN (RedBOX) mode of operation to the HSR protocol driver.

   - Add reset reasons for tracing what caused a TCP reset to be sent.

   - Introduce direction attribute for xfrm (IPSec) states. State can be
     used either for input or output packet processing.
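
The AF_UNIX garbage-collection item above names Tarjan's Strongly Connected Components algorithm. The following is a minimal Python sketch of that algorithm only, not the kernel implementation. The graph model (an edge u -> v meaning "socket u has a file descriptor for socket v in flight") and every name in the code are assumptions made for illustration.

```python
def tarjan_scc(graph):
    """Return the strongly connected components of a directed graph.

    `graph` maps each node to an iterable of its successor nodes.
    """
    index = {}        # discovery order of each visited node
    lowlink = {}      # smallest index reachable from the node's subtree
    stack = []        # nodes whose component is not yet complete
    on_stack = set()
    components = []
    counter = 0

    def visit(node):
        nonlocal counter
        index[node] = lowlink[node] = counter
        counter += 1
        stack.append(node)
        on_stack.add(node)

        for succ in graph.get(node, ()):
            if succ not in index:
                visit(succ)
                lowlink[node] = min(lowlink[node], lowlink[succ])
            elif succ in on_stack:
                lowlink[node] = min(lowlink[node], index[succ])

        # The node is the root of a component: pop the whole component.
        if lowlink[node] == index[node]:
            component = []
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.append(member)
                if member == node:
                    break
            components.append(component)

    for node in graph:
        if node not in index:
            visit(node)
    return components


# Hypothetical example: "a" and "b" hold each other in flight (a cycle),
# while "c" only references "a".
in_flight = {"a": ["b"], "b": ["a"], "c": ["a"]}
sccs = tarjan_scc(in_flight)
cyclic = [sorted(c) for c in sccs if len(c) > 1]
```

In this model, a component with more than one member is a reference cycle, so a collector would only need to examine those components rather than every socket.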

  Things we sprinkled into general kernel code:

   - Add bitmap_{read,write}(), bitmap_size(), expose BYTES_TO_BITS().

     This required touch-ups and renaming of a few existing users.

   - Add Endian-dependent __counted_by_{le,be} annotations.

   - Make building selftests "quieter" by printing summaries like
     "CC object.o" rather than full commands with all the arguments.

  Netfilter:

   - Use GFP_KERNEL to clone elements, to deal better with OOM
     situations and avoid failures in the .commit step.

  BPF:

   - Add eBPF JIT for ARCv2 CPUs.

   - Support attaching kprobe BPF programs through kprobe_multi link in
     a session mode, meaning, a BPF program is attached to both function
     entry and return, the entry program can decide if the return
     program gets executed and the entry program can share u64 cookie
     value with return program. "Session mode" is a common use-case for
     tetragon and bpftrace.

   - Add the ability to specify and retrieve BPF cookie for raw
     tracepoint programs in order to ease migration from classic to raw
     tracepoints.

   - Add an internal-only BPF per-CPU instruction for resolving per-CPU
     memory addresses and implement support in x86, ARM64 and RISC-V
     JITs. This allows inlining functions which need to access per-CPU
     state.

   - Optimize x86 BPF JIT's emit_mov_imm64, and add support for various
     atomics in bpf_arena which can be JITed as a single x86
     instruction. Support BPF arena on ARM64.

   - Add a new bpf_wq API for deferring events and refactor
     process-context bpf_timer code to keep common code where possible.

   - Harden the BPF verifier's and/or/xor value tracking.

   - Introduce crypto kfuncs to let BPF programs call kernel crypto
     APIs.

   - Support bpf_tail_call_static() helper for BPF programs with GCC 13.

   - Add bpf_preempt_{disable,enable}() kfuncs in order to allow a BPF
     program to have code sections where preemption is disabled.
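
The kprobe session mode described in the BPF list above can be pictured with a small toy model in Python. This is not BPF code and does not use any kernel or libbpf API. The function names, the boolean return convention of the entry probe, and the one-element list standing in for the shared u64 cookie are all assumptions made for illustration.

```python
def run_session(entry_prog, return_prog, func, *args):
    """Call func(*args) with a session-style pair of probes around it.

    entry_prog(args, cookie) runs first and returns True if the return
    probe should run. `cookie` is a one-element list that stands in for
    the value shared between the entry and return probes.
    """
    cookie = [0]
    want_return = entry_prog(args, cookie)
    result = func(*args)
    if want_return:
        return_prog(result, cookie)
    return result


# Hypothetical probes: only trace calls whose first argument exceeds 10.
trace = []

def on_entry(args, cookie):
    if args[0] <= 10:
        return False          # skip the return probe for this call
    cookie[0] = args[0]       # remember the argument for the return probe
    return True

def on_return(result, cookie):
    trace.append((cookie[0], result))

def double(x):
    return 2 * x

run_session(on_entry, on_return, double, 3)
run_session(on_entry, on_return, double, 21)
```

Only the second call is recorded, because the entry probe declined the return probe for the first one. The recorded pair combines a value captured at entry with the result seen at return.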

  Driver API:

   - Skip software TC processing completely if all installed rules are
     marked as HW-only, instead of checking the HW-only flag rule by
     rule.

   - Add support for configuring PoE (Power over Ethernet), similar to
     the already existing support for PoDL (Power over Data Line)
     config.

   - Initial bits of a queue control API, for now allowing a single
     queue to be reset without disturbing packet flow to other queues.

   - Common (ethtool) statistics for hardware timestamping.
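
The first Driver API item above replaces a per-rule HW-only check with a single check for the whole rule set. One way to picture that is a counter kept up to date as rules are added, sketched below in Python. This is a toy model under stated assumptions, not the kernel's TC code. The class, its fields, and the rule representation are hypothetical.

```python
class TcBlock:
    """Toy classifier block that counts how many of its rules are HW-only."""

    def __init__(self):
        self.rules = []          # list of (match, action, hw_only)
        self.hw_only_count = 0

    def add_rule(self, match, action, hw_only):
        self.rules.append((match, action, hw_only))
        if hw_only:
            self.hw_only_count += 1

    def can_bypass_software(self):
        # One comparison instead of inspecting the flag on every rule.
        return self.hw_only_count == len(self.rules)

    def classify(self, packet):
        if self.can_bypass_software():
            return None          # nothing for the software path to do
        for match, action, hw_only in self.rules:
            if not hw_only and match(packet):
                return action
        return None


block = TcBlock()
block.add_rule(lambda pkt: pkt["port"] == 80, "mirror", hw_only=True)
block.add_rule(lambda pkt: pkt["port"] == 443, "mirror", hw_only=True)
fast = block.can_bypass_software()       # every rule is HW-only
block.add_rule(lambda pkt: pkt["port"] == 22, "drop", hw_only=False)
slow = block.can_bypass_software()       # one software rule disables the bypass
```

The design point is that the cost of the check moves from the per-packet path to rule installation, which happens far less often.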

  Tests and tooling:

   - Remove the need to create a config file to run the net forwarding
     tests so that a naive "make run_tests" can exercise them.

   - Define a method of writing tests which require an external endpoint
     to communicate with (to send/receive data towards the test
     machine). Add a few such tests.

   - Create a shared code library for writing Python tests. Expose the
     YAML Netlink library from tools/ to the tests for easy Netlink
     access.

   - Move netfilter tests under net/, extend them, separate performance
     tests from correctness tests, and iron out issues found by running
     them "on every commit".

   - Refactor BPF selftests to use common network helpers.

   - Further work filling in YAML definitions of Netlink messages for:
     nftables, team driver, bonding interfaces, vlan interfaces, VF
     info, TC u32 mark, TC police action.

   - Teach Python YAML Netlink to decode attribute policies.

   - Extend the definition of the "indexed array" construct in the specs
     to cover arrays of scalars rather than just nests.

   - Add hyperlinks between definitions in generated Netlink docs.

  Drivers:

   - Make sure unsupported flower control flags are rejected by drivers,
     and make more drivers report errors directly to the application
     rather than dmesg (large number of driver changes from Asbjørn
     Sloth Tønnesen).

   - Ethernet high-speed NICs:
      - Broadcom (bnxt):
         - support multiple RSS contexts and steering traffic to them
         - support XDP metadata
         - make page pool allocations more NUMA aware
      - Intel (100G, ice, idpf):
         - extract datapath code common among Intel drivers into a library
         - use fewer resources in switchdev by sharing queues with the PF
         - add PFCP filter support
         - add Ethernet filter support
         - use a spinlock instead of HW lock in PTP clock ops
         - support 5 layer Tx scheduler topology
      - nVidia/Mellanox:
         - 800G link modes and 100G SerDes speeds
         - per-queue IRQ coalescing configuration
      - Marvell Octeon:
         - support offloading TC packet mark action

   - Ethernet NICs consumer, embedded and virtual:
      - stop lying about skb->truesize in USB Ethernet drivers, it
        messes up TCP memory calculations
      - Google cloud vNIC:
         - support changing ring size via ethtool
         - support ring reset using the queue control API
      - VirtIO net:
         - expose flow hash from RSS to XDP
         - per-queue statistics
         - add selftests
      - Synopsys (stmmac):
         - support controllers which require an RX clock signal from the
           MII bus to perform their hardware initialization
      - TI:
         - icssg_prueth: support ICSSG-based Ethernet on AM65x SR1.0 devices
         - icssg_prueth: add SW TX / RX Coalescing based on hrtimers
         - cpsw: minimal XDP support
      - Renesas (ravb):
         - support describing the MDIO bus
      - Realtek (r8169):
         - add support for RTL8168M
      - Microchip Sparx5:
         - matchall and flower actions mirred and redirect

   - Ethernet switches:
      - nVidia/Mellanox:
         - improve events processing performance
      - Marvell:
         - add support for MV88E6250 family internal PHYs
      - Microchip:
         - add DCB and DSCP mapping support for KSZ switches
         - vsc73xx: convert to PHYLINK
      - Realtek:
         - rtl8226b/rtl8221b: add C45 instances and SerDes switching

   - Many driver changes related to PHYLIB and PHYLINK deprecated API
     cleanup

   - Ethernet PHYs:
      - Add a new driver for Airoha EN8811H 2.5 Gigabit PHY.
      - micrel: lan8814: add support for PPS out and external timestamp trigger

   - WiFi:
      - Disable Wireless Extensions (WEXT) in all Wi-Fi 7 device
        drivers. Modern devices can only be configured using nl80211.
      - mac80211/cfg80211
         - handle color change per link for WiFi 7 Multi-Link Operation
      - Intel (iwlwifi):
         - don't support puncturing in 5 GHz
         - support monitor mode on passive channels
         - BZ-W device support
         - P2P with HE/EHT support
         - re-add support for firmware API 90
         - provide channel survey information for Automatic Channel Selection
      - MediaTek (mt76):
         - mt7921 LED control
         - mt7925 EHT radiotap support
         - mt7920e PCI support
      - Qualcomm (ath11k):
         - P2P support for QCA6390, WCN6855 and QCA2066
         - support hibernation
         - ieee80211-freq-limit Device Tree property support
      - Qualcomm (ath12k):
         - refactoring in preparation of multi-link support
         - suspend and hibernation support
         - ACPI support
         - debugfs support, including dfs_simulate_radar support
      - RealTek:
         - rtw88: RTL8723CS SDIO device support
         - rtw89: RTL8922AE Wi-Fi 7 PCI device support
         - rtw89: complete features of new WiFi 7 chip 8922AE including
           BT-coexistence and Wake-on-WLAN
         - rtw89: use BIOS ACPI settings to set TX power and channels
         - rtl8xxxu: enable Management Frame Protection (MFP) support

   - Bluetooth:
      - support for Intel BlazarI and Filmore Peak2 (BE201)
      - support for MediaTek MT7921S SDIO
      - initial support for Intel PCIe BT driver
      - remove HCI_AMP support"

* tag 'net-next-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1827 commits)
  selftests: netfilter: fix packetdrill conntrack testcase
  net: gro: fix napi_gro_cb zeroed alignment
  Bluetooth: btintel_pcie: Refactor and code cleanup
  Bluetooth: btintel_pcie: Fix warning reported by sparse
  Bluetooth: hci_core: Fix not handling hdev->le_num_of_adv_sets=1
  Bluetooth: btintel: Fix compiler warning for multi_v7_defconfig config
  Bluetooth: btintel_pcie: Fix compiler warnings
  Bluetooth: btintel_pcie: Add *setup* function to download firmware
  Bluetooth: btintel_pcie: Add support for PCIe transport
  Bluetooth: btintel: Export few static functions
  Bluetooth: HCI: Remove HCI_AMP support
  Bluetooth: L2CAP: Fix div-by-zero in l2cap_le_flowctl_init()
  Bluetooth: qca: Fix error code in qca_read_fw_build_info()
  Bluetooth: hci_conn: Use __counted_by() and avoid -Wfamnae warning
  Bluetooth: btintel: Add support for Filmore Peak2 (BE201)
  Bluetooth: btintel: Add support for BlazarI
  LE Create Connection command timeout increased to 20 secs
  dt-bindings: net: bluetooth: Add MediaTek MT7921S SDIO Bluetooth
  Bluetooth: compute LE flow credits based on recvbuf space
  Bluetooth: hci_sync: Use cmd->num_cis instead of magic number
  ...
2024-05-14 19:42:24 -07:00
Linus Torvalds
4f8b6f25eb - Add a dm-crypt optional "high_priority" flag that enables the crypt
workqueues to use WQ_HIGHPRI.
 
 - Export dm-crypt workqueues via sysfs (by enabling WQ_SYSFS) to allow
   for improved visibility and controls over IO and crypt workqueues.
 
 - Fix dm-crypt to no longer constrain max_segment_size to PAGE_SIZE.
   This limit isn't needed given that the block core provides late bio
   splitting if bio exceeds underlying limits (e.g. max_segment_size).
 
 - Fix dm-crypt crypt_queue's use of WQ_UNBOUND to not use
   WQ_CPU_INTENSIVE because it is meaningless with WQ_UNBOUND.
 
 - Fix various issues with the dm-delay target (including a resource
   teardown fix, a fix for a hung task when using kthread mode, and other
   improvements that followed from code inspection).

Merge tag 'for-6.10/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper updates from Mike Snitzer:

 - Add a dm-crypt optional "high_priority" flag that enables the crypt
   workqueues to use WQ_HIGHPRI.

 - Export dm-crypt workqueues via sysfs (by enabling WQ_SYSFS) to allow
   for improved visibility and controls over IO and crypt workqueues.

 - Fix dm-crypt to no longer constrain max_segment_size to PAGE_SIZE.
   This limit isn't needed given that the block core provides late bio
   splitting if bio exceeds underlying limits (e.g. max_segment_size).

 - Fix dm-crypt crypt_queue's use of WQ_UNBOUND to not use
   WQ_CPU_INTENSIVE because it is meaningless with WQ_UNBOUND.

 - Fix various issues with the dm-delay target (including a resource
   teardown fix, a fix for a hung task when using kthread mode, and other
   improvements that followed from code inspection).

* tag 'for-6.10/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm-delay: remove timer_lock
  dm-delay: change locking to avoid contention
  dm-delay: fix max_delay calculations
  dm-delay: fix hung task introduced by kthread mode
  dm-delay: fix workqueue delay_timer race
  dm-crypt: don't set WQ_CPU_INTENSIVE for WQ_UNBOUND crypt_queue
  dm: use queue_limits_set
  dm-crypt: stop constraining max_segment_size to PAGE_SIZE
  dm-crypt: export sysfs of all workqueues
  dm-crypt: add the optional "high_priority" flag
2024-05-14 18:34:19 -07:00
Linus Torvalds
103916ffe2 arm64 updates for 6.10
ACPI:
 * Support for the Firmware ACPI Control Structure (FACS) signature
   feature which is used to reboot out of hibernation on some systems.
 
 Kbuild:
 * Support for building Flat Image Tree (FIT) images, where the kernel
   Image is compressed alongside a set of devicetree blobs.
 
 Memory management:
 * Optimisation of our early page-table manipulation for creation of the
   linear mapping.
 
 * Support for userfaultfd write protection, which brings along some nice
   cleanups to our handling of invalid but present ptes.
 
 * Extend our use of range TLBI invalidation at EL1.
 
 Perf and PMUs:
 * Ensure that the 'pmu->parent' pointer is correctly initialised by PMU
   drivers.
 
 * Avoid allocating 'cpumask_t' types on the stack in some PMU drivers.
 
 * Fix parsing of the CPU PMU "version" field in assembly code, as it
   doesn't follow the usual architectural rules.
 
 * Add best-effort unwinding support for USER_STACKTRACE
 
 * Minor driver fixes and cleanups.
 
 Selftests:
 * Minor cleanups to the arm64 selftests (missing NULL check, unused
   variable).
 
 Miscellaneous:
 * Add a command-line alias for disabling 32-bit application support.
 
 * Add part number for Neoverse-V2 CPUs.
 
 * Minor fixes and cleanups.

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Will Deacon:
 "The most interesting parts are probably the mm changes from Ryan which
  optimise the creation of the linear mapping at boot and (separately)
  implement write-protect support for userfaultfd.

  Outside of our usual directories, the Kbuild-related changes under
  scripts/ have been acked by Masahiro whilst the drivers/acpi/ parts
  have been acked by Rafael and the addition of cpumask_any_and_but()
  has been acked by Yury.

  ACPI:

   - Support for the Firmware ACPI Control Structure (FACS) signature
     feature which is used to reboot out of hibernation on some systems

  Kbuild:

   - Support for building Flat Image Tree (FIT) images, where the kernel
     Image is compressed alongside a set of devicetree blobs

  Memory management:

   - Optimisation of our early page-table manipulation for creation of
     the linear mapping

   - Support for userfaultfd write protection, which brings along some
     nice cleanups to our handling of invalid but present ptes

   - Extend our use of range TLBI invalidation at EL1

  Perf and PMUs:

   - Ensure that the 'pmu->parent' pointer is correctly initialised by
     PMU drivers

   - Avoid allocating 'cpumask_t' types on the stack in some PMU drivers

   - Fix parsing of the CPU PMU "version" field in assembly code, as it
     doesn't follow the usual architectural rules

   - Add best-effort unwinding support for USER_STACKTRACE

   - Minor driver fixes and cleanups

  Selftests:

   - Minor cleanups to the arm64 selftests (missing NULL check, unused
     variable)

  Miscellaneous:

   - Add a command-line alias for disabling 32-bit application support

   - Add part number for Neoverse-V2 CPUs

   - Minor fixes and cleanups"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (64 commits)
  arm64/mm: Fix pud_user_accessible_page() for PGTABLE_LEVELS <= 2
  arm64/mm: Add uffd write-protect support
  arm64/mm: Move PTE_PRESENT_INVALID to overlay PTE_NG
  arm64/mm: Remove PTE_PROT_NONE bit
  arm64/mm: generalize PMD_PRESENT_INVALID for all levels
  arm64: simplify arch_static_branch/_jump function
  arm64: Add USER_STACKTRACE support
  arm64: Add the arm64.no32bit_el0 command line option
  drivers/perf: hisi: hns3: Actually use devm_add_action_or_reset()
  drivers/perf: hisi: hns3: Fix out-of-bound access when valid event group
  drivers/perf: hisi_pcie: Fix out-of-bound access when valid event group
  kselftest: arm64: Add a null pointer check
  arm64: defer clearing DAIF.D
  arm64: assembler: update stale comment for disable_step_tsk
  arm64/sysreg: Update PIE permission encodings
  kselftest/arm64: Remove unused parameters in abi test
  perf/arm-spe: Assign parents for event_source device
  perf/arm-smmuv3: Assign parents for event_source device
  perf/arm-dsu: Assign parents for event_source device
  perf/arm-dmc620: Assign parents for event_source device
  ...
2024-05-14 11:09:39 -07:00
Linus Torvalds
9776dd3609 X86 interrupt handling update:
Support for posted interrupts on bare metal
 
     Posted interrupts is a virtualization feature which allows interrupts
     to be injected directly into a guest without host interaction. The VT-d
     interrupt remapping hardware sets the bit which corresponds to the
     interrupt vector in a vector bitmap. That bitmap is either used to
     inject the interrupt directly into the guest via a virtualized APIC or,
     in case the guest is scheduled out, provides a host side notification
     interrupt which informs the host that an interrupt has been marked
     pending in the bitmap.
 
     This can be utilized on bare metal for scenarios where multiple
     devices, e.g. NVMe storage, raise interrupts with a high frequency. In
     the default mode these interrupts are handled independently and
     therefore require a full roundtrip of interrupt entry/exit.
 
     Utilizing posted interrupts, this roundtrip overhead can be avoided by
     coalescing these interrupt entries to a single entry for the posted
     interrupt notification. The notification interrupt then demultiplexes
     the pending bits in a memory based bitmap and invokes the corresponding
     device specific handlers.
 
     Depending on the usage scenario and device utilization, throughput
     improvements between 10% and 130% have been measured.
 
     As this is only relevant for high end servers with multiple device
     queues per CPU attached and counterproductive for situations where
     interrupts are arriving at distinct times, the functionality is opt-in
     via a kernel command line parameter.

Merge tag 'x86-irq-2024-05-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 interrupt handling updates from Thomas Gleixner:
 "Add support for posted interrupts on bare metal.

  Posted interrupts is a virtualization feature which allows interrupts
  to be injected directly into a guest without host interaction. The VT-d
  interrupt remapping hardware sets the bit which corresponds to the
  interrupt vector in a vector bitmap. That bitmap is either used to
  inject the interrupt directly into the guest via a virtualized APIC or,
  in case the guest is scheduled out, provides a host side notification
  interrupt which informs the host that an interrupt has been marked
  pending in the bitmap.

  This can be utilized on bare metal for scenarios where multiple
  devices, e.g. NVMe storage, raise interrupts with a high frequency. In
  the default mode these interrupts are handled independently and
  therefore require a full roundtrip of interrupt entry/exit.

  Utilizing posted interrupts, this roundtrip overhead can be avoided by
  coalescing these interrupt entries to a single entry for the posted
  interrupt notification. The notification interrupt then demultiplexes
  the pending bits in a memory based bitmap and invokes the
  corresponding device specific handlers.

  Depending on the usage scenario and device utilization, throughput
  improvements between 10% and 130% have been measured.

  As this is only relevant for high end servers with multiple device
  queues per CPU attached and counterproductive for situations where
  interrupts are arriving at distinct times, the functionality is opt-in
  via a kernel command line parameter"
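
The coalescing described above can be illustrated with a small user-space
model. This is only a sketch under stated assumptions, not the kernel
implementation: the class and function names are hypothetical, a Python
integer stands in for the memory based bitmap, a lock stands in for the
atomic bit operations, and a single boolean models the "notification
outstanding" state.

```python
import threading

NR_VECTORS = 256  # size of the modelled vector bitmap


class PostedInterruptDescriptor:
    """Hypothetical model of a per-CPU posted interrupt descriptor."""

    def __init__(self):
        self._pending = 0               # one bit per interrupt vector
        self._notification_out = False  # a notification is already pending
        self._lock = threading.Lock()   # stands in for atomic operations

    def post(self, vector):
        """Device side: mark a vector pending.

        Returns True only when a new notification interrupt is needed.
        """
        if not 0 <= vector < NR_VECTORS:
            raise ValueError("vector out of range")
        with self._lock:
            self._pending |= 1 << vector
            if self._notification_out:
                return False  # coalesced into the outstanding notification
            self._notification_out = True
            return True

    def fetch_and_clear(self):
        """CPU side: take all pending bits and re-arm the notification."""
        with self._lock:
            pending, self._pending = self._pending, 0
            self._notification_out = False
            return pending


def handle_notification(pid, handlers):
    """Demultiplex pending bits and invoke the per-vector handlers."""
    handled = []
    pending = pid.fetch_and_clear()
    while pending:
        vector = (pending & -pending).bit_length() - 1  # lowest set bit
        pending &= pending - 1                          # clear that bit
        handlers[vector](vector)
        handled.append(vector)
    return handled


pid = PostedInterruptDescriptor()
calls = []
handlers = {v: calls.append for v in (40, 41, 200)}
notifications = [pid.post(v) for v in (200, 40, 41)]
print(notifications, handle_notification(pid, handlers))
```

In this model three posted vectors cause a single notification, and one pass
of the notification handler runs all three device handlers, which is the
entry/exit saving the text describes.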

* tag 'x86-irq-2024-05-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/irq: Use existing helper for pending vector check
  iommu/vt-d: Enable posted mode for device MSIs
  iommu/vt-d: Make posted MSI an opt-in command line option
  x86/irq: Extend checks for pending vectors to posted interrupts
  x86/irq: Factor out common code for checking pending interrupts
  x86/irq: Install posted MSI notification handler
  x86/irq: Factor out handler invocation from common_interrupt()
  x86/irq: Set up per host CPU posted interrupt descriptors
  x86/irq: Reserve a per CPU IDT vector for posted MSIs
  x86/irq: Add a Kconfig option for posted MSI
  x86/irq: Remove bitfields in posted interrupt descriptor
  x86/irq: Unionize PID.PIR for 64bit access w/o casting
  KVM: VMX: Move posted interrupt descriptor out of VMX code
2024-05-14 10:01:29 -07:00
Linus Torvalds
6e5a0c30b6 Scheduler changes for v6.10:
- Add cpufreq pressure feedback for the scheduler
 
  - Rework misfit load-balancing wrt. affinity restrictions
 
  - Clean up and simplify the code around ::overutilized and
    ::overload access.
 
  - Simplify sched_balance_newidle()
 
  - Bump SCHEDSTAT_VERSION to 16 due to a cleanup of CPU_MAX_IDLE_TYPES
    handling that changed the output.
 
  - Rework & clean up <asm/vtime.h> interactions wrt. arch_vtime_task_switch()
 
  - Reorganize, clean up and unify most of the higher level
    scheduler balancing function names around the sched_balance_*()
    prefix.
 
  - Simplify the balancing flag code (sched_balance_running)
 
  - Miscellaneous cleanups & fixes
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>

Merge tag 'sched-core-2024-05-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler updates from Ingo Molnar:

 - Add cpufreq pressure feedback for the scheduler

 - Rework misfit load-balancing wrt affinity restrictions

 - Clean up and simplify the code around ::overutilized and
   ::overload access.

 - Simplify sched_balance_newidle()

 - Bump SCHEDSTAT_VERSION to 16 due to a cleanup of CPU_MAX_IDLE_TYPES
   handling that changed the output.

 - Rework & clean up <asm/vtime.h> interactions wrt arch_vtime_task_switch()

 - Reorganize, clean up and unify most of the higher level
   scheduler balancing function names around the sched_balance_*()
   prefix

 - Simplify the balancing flag code (sched_balance_running)

 - Miscellaneous cleanups & fixes
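
Because the SCHEDSTAT_VERSION bump signals an output format change, tools
that parse the scheduler statistics may want to check the version before
reading further. The following is a minimal sketch, assuming the statistics
text begins with a "version N" header line; the function names and sample
strings are hypothetical.

```python
SCHEDSTAT_VERSION = 16  # value named in the changelog entry above


def parse_schedstat_version(text):
    """Return the integer from the 'version N' header line."""
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[0] == "version":
            return int(fields[1])
    raise ValueError("no 'version' line found")


def is_supported(text, expected=SCHEDSTAT_VERSION):
    """True when the statistics text uses the expected format version."""
    return parse_schedstat_version(text) == expected


# Hypothetical sample input; a real tool would read /proc/schedstat instead.
sample_new = "version 16\ntimestamp 4295123456\n"
sample_old = "version 15\ntimestamp 4295123456\n"
print(is_supported(sample_new), is_supported(sample_old))
```

A parser written for the older layout can use such a check to refuse, or
switch code paths, rather than misread the changed columns.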

* tag 'sched-core-2024-05-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (50 commits)
  sched/pelt: Remove shift of thermal clock
  sched/cpufreq: Rename arch_update_thermal_pressure() => arch_update_hw_pressure()
  thermal/cpufreq: Remove arch_update_thermal_pressure()
  sched/cpufreq: Take cpufreq feedback into account
  cpufreq: Add a cpufreq pressure feedback for the scheduler
  sched/fair: Fix update of rd->sg_overutilized
  sched/vtime: Do not include <asm/vtime.h> header
  s390/irq,nmi: Include <asm/vtime.h> header directly
  s390/vtime: Remove unused __ARCH_HAS_VTIME_TASK_SWITCH leftover
  sched/vtime: Get rid of generic vtime_task_switch() implementation
  sched/vtime: Remove confusing arch_vtime_task_switch() declaration
  sched/balancing: Simplify the sg_status bitmask and use separate ->overloaded and ->overutilized flags
  sched/fair: Rename set_rd_overutilized_status() to set_rd_overutilized()
  sched/fair: Rename SG_OVERLOAD to SG_OVERLOADED
  sched/fair: Rename {set|get}_rd_overload() to {set|get}_rd_overloaded()
  sched/fair: Rename root_domain::overload to ::overloaded
  sched/fair: Use helper functions to access root_domain::overload
  sched/fair: Check root_domain::overload value before update
  sched/fair: Combine EAS check with root_domain::overutilized access
  sched/fair: Simplify the continue_balancing logic in sched_balance_newidle()
  ...
2024-05-13 17:18:51 -07:00
Jakub Kicinski
6e62702feb bpf-next-for-netdev

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2024-05-13

We've added 119 non-merge commits during the last 14 day(s) which contain
a total of 134 files changed, 9462 insertions(+), 4742 deletions(-).

The main changes are:

1) Add BPF JIT support for 32-bit ARCv2 processors, from Shahab Vahedi.

2) Add BPF range computation improvements to the verifier in particular
   around XOR and OR operators, refactoring of checks for range computation
   and relaxing MUL range computation so that src_reg can also be an unknown
   scalar, from Cupertino Miranda.

3) Add support to attach kprobe BPF programs through a kprobe_multi link in
   session mode, meaning that a BPF program is attached to both function
   entry and return. The entry program can decide if the return program gets
   executed and can share a u64 cookie value with the return program.
   Session mode is a common use-case for tetragon and bpftrace,
   from Jiri Olsa.

4) Fix a potential overflow in libbpf's ring__consume_n() and improve libbpf
   as well as BPF selftest's struct_ops handling, from Andrii Nakryiko.

5) Improvements to BPF selftests in context of BPF gcc backend,
   from Jose E. Marchesi & David Faust.

6) Migrate remaining BPF selftests from test_sock_addr.c to prog_test-style
   in order to retire the old test, run them in BPF CI and additionally
   expand test coverage, from Jordan Rife.

7) Big batch for BPF selftest refactoring in order to remove duplicate code
   around common network helpers, from Geliang Tang.

8) Another batch of improvements to BPF selftests to retire obsolete
   bpf_tcp_helpers.h as everything is available in vmlinux.h,
   from Martin KaFai Lau.

9) Fix BPF map tear-down to not walk the map twice on free when both timer
   and wq are used, from Benjamin Tissoires.

10) Fix the BPF verifier's assumption that socket->sk is always non-NULL,
    from Alexei Starovoitov.

11) Change BTF build scripts to use --btf_features for pahole v1.26+,
    from Alan Maguire.

12) Small improvements to BPF reusing struct_size() and krealloc_array(),
    from Andy Shevchenko.

13) Fix s390 JIT to emit a barrier for BPF_FETCH instructions,
    from Ilya Leoshkevich.

14) Extend TCP ->cong_control() callback in order to feed in ack and
    flag parameters and allow write-access to tp->snd_cwnd_stamp
    from BPF program, from Miao Xu.

15) Add support for internal-only per-CPU instructions to inline
    bpf_get_smp_processor_id() helper call for arm64 and riscv64 BPF JITs,
    from Puranjay Mohan.

16) Follow-up to remove the redundant ethtool.h from tooling infrastructure,
    from Tushar Vyavahare.

17) Extend libbpf to support "module:<function>" syntax for tracing
    programs, from Viktor Malik.
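
The session-mode semantics from item 3 can be sketched as a plain user-space
model. This is not BPF code and not the kernel API: all names are
hypothetical, and the convention that a non-zero return from the entry run
suppresses the return run is an assumption made for this illustration.

```python
from dataclasses import dataclass

U64_MASK = (1 << 64) - 1


@dataclass
class SessionCtx:
    """Hypothetical state shared by the entry and return runs of one call."""
    is_return: bool = False
    cookie: int = 0


def run_session(prog, traced_func, *args):
    """Run prog at entry, call traced_func, then optionally run prog at return."""
    ctx = SessionCtx()
    wants_return = prog(ctx, args) == 0  # assumption: non-zero skips return
    ctx.cookie &= U64_MASK               # the cookie is a u64 value
    result = traced_func(*args)
    if wants_return:
        ctx.is_return = True
        prog(ctx, args)
    return result


events = []


def trace_large_reads(ctx, args):
    """One program serving both entry and return, as in session mode."""
    (size,) = args
    if not ctx.is_return:
        if size < 4096:
            return 1          # uninteresting call: do not run at return
        ctx.cookie = size     # hand the size over to the return run
        return 0
    events.append(ctx.cookie)
    return 0


def fake_read(size):
    """Stand-in for the traced kernel function."""
    return size


run_session(trace_large_reads, fake_read, 100)
run_session(trace_large_reads, fake_read, 8192)
print(events)
```

Only the second call is recorded: the entry run filtered out the small read,
and the cookie carried the size of the large one to the return run.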

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (119 commits)
  bpf: make list_for_each_entry portable
  bpf: ignore expected GCC warning in test_global_func10.c
  bpf: disable strict aliasing in test_global_func9.c
  selftests/bpf: Free strdup memory in xdp_hw_metadata
  selftests/bpf: Fix a few tests for GCC related warnings.
  bpf: avoid gcc overflow warning in test_xdp_vlan.c
  tools: remove redundant ethtool.h from tooling infra
  selftests/bpf: Expand ATTACH_REJECT tests
  selftests/bpf: Expand getsockname and getpeername tests
  sefltests/bpf: Expand sockaddr hook deny tests
  selftests/bpf: Expand sockaddr program return value tests
  selftests/bpf: Retire test_sock_addr.(c|sh)
  selftests/bpf: Remove redundant sendmsg test cases
  selftests/bpf: Migrate ATTACH_REJECT test cases
  selftests/bpf: Migrate expected_attach_type tests
  selftests/bpf: Migrate wildcard destination rewrite test
  selftests/bpf: Migrate sendmsg6 v4 mapped address tests
  selftests/bpf: Migrate sendmsg deny test cases
  selftests/bpf: Migrate WILDCARD_IP test
  selftests/bpf: Handle SYSCALL_EPERM and SYSCALL_ENOTSUPP test cases
  ...
====================

Link: https://lore.kernel.org/r/20240513134114.17575-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-13 16:41:10 -07:00
Linus Torvalds
8815da98e0 Another not-too-busy cycle for documentation, including:
- Some build-system changes to detect the variable fonts installed by some
   distributions that can break the PDF build.
 
 - Various updates and additions to the Spanish, Chinese, Italian, and
   Japanese translations.
 
 - Update the stable-kernel rules to match modern practice
 
 ...and the usual array of corrections, updates, and typo fixes.

Merge tag 'docs-6.10' of git://git.lwn.net/linux

Pull documentation updates from Jonathan Corbet:
 "Another not-too-busy cycle for documentation, including:

   - Some build-system changes to detect the variable fonts installed by
     some distributions that can break the PDF build.

   - Various updates and additions to the Spanish, Chinese, Italian, and
     Japanese translations.

   - Update the stable-kernel rules to match modern practice

  ... and the usual array of corrections, updates, and typo fixes"

* tag 'docs-6.10' of git://git.lwn.net/linux: (42 commits)
  cgroup: Add documentation for missing zswap memory.stat
  kernel-doc: Added "*" in $type_constants2 to fix 'make htmldocs' warning.
  docs:core-api: fixed typos and grammar in printk-index page
  Documentation: tracing: Fix spelling mistakes
  docs/zh_CN/rust: Update the translation of quick-start to 6.9-rc4
  docs/zh_CN/rust: Update the translation of general-information to 6.9-rc4
  docs/zh_CN/rust: Update the translation of coding-guidelines to 6.9-rc4
  docs/zh_CN/rust: Update the translation of arch-support to 6.9-rc4
  docs: stable-kernel-rules: fix typo sent->send
  docs/zh_CN: remove two inconsistent spaces
  docs: scripts/check-variable-fonts.sh: Improve commands for detection
  docs: stable-kernel-rules: create special tag to flag 'no backporting'
  docs: stable-kernel-rules: explain use of stable@kernel.org (w/o @vger.)
  docs: stable-kernel-rules: remove code-labels tags and a indention level
  docs: stable-kernel-rules: call mainline by its name and change example
  docs: stable-kernel-rules: reduce redundancy
  docs, kprobes: Add riscv as supported architecture
  Docs: typos/spelling
  docs: kernel_include.py: Cope with docutils 0.21
  docs: ja_JP/howto: Catch up update in v6.8
  ...
2024-05-13 10:51:53 -07:00
Linus Torvalds
c024814828 Hi,
This is a pull request for the trusted keys subsystem containing a new key
 type for the Data Co-Processor (DCP), which is an IP core built into
 many NXP SoCs such as i.mx6ull.
 
 BR, Jarkko

Merge tag 'keys-trusted-next-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd

Pull trusted keys updates from Jarkko Sakkinen:
 "This contains a new key type for the Data Co-Processor (DCP), which is
  an IP core built into many NXP SoCs such as i.mx6ull"

* tag 'keys-trusted-next-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
  docs: trusted-encrypted: add DCP as new trust source
  docs: document DCP-backed trusted keys kernel params
  MAINTAINERS: add entry for DCP-based trusted keys
  KEYS: trusted: Introduce NXP DCP-backed trusted keys
  KEYS: trusted: improve scalability of trust source config
  crypto: mxs-dcp: Add support for hardware-bound keys
2024-05-13 10:38:13 -07:00
Illia Ostapyshyn
62158261a8 docs: cgroup-v1: Update page cache removal functions
Commit 452e9e6992 ("filemap: Add filemap_remove_folio and
__filemap_remove_folio") reimplemented __delete_from_page_cache() as
__filemap_remove_folio() and delete_from_page_cache() as
filemap_remove_folio().  The compatibility wrappers were finally removed
in ece62684dc ("hugetlbfs: convert hugetlb_delete_from_page_cache() to
use folios") and 6ffcd825e7 ("mm: Remove __delete_from_page_cache()").

Update the remaining references to dead functions in the memcg
implementation memo.

Signed-off-by: Illia Ostapyshyn <illia@yshyn.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2024-05-13 07:00:43 -10:00
Linus Torvalds
c0b9620bc3 RCU pull request for v6.10
This pull request contains the following branches:
 
 fixes.2024.04.15a: Fix a lockdep complaint for lazy-preemptible kernels,
 remove a redundant BH disable for TINY_RCU, remove a redundant READ_ONCE()
 in tree.c, fix a false-positive KCSAN splat and fix a buffer overflow in
 print_cpu_stall_info().
 
 misc.2024.04.12a: Misc updates related to bpf and tracing, and an update
 to the MAINTAINERS file.
 
 rcu-sync-normal-improve.2024.04.15a: Reduce the latency of a normal
 synchronize_rcu() call by maintaining a separate track for sync users
 only. This approach bypasses the per-CPU nocb lists, so sync users do not
 depend on nocb-list length or on how fast regular callbacks are processed.
 
 rcu-tasks.2024.04.15a: RCU tasks: switch Tasks RCU grace periods to
 sleep at TASK_IDLE priority, fix some comments, add a diagnostic
 warning to exit_tasks_rcu_start() and fix a buffer overflow in
 show_rcu_tasks_trace_gp_kthread().
 
 rcutorture.2024.04.15a: Increase memory for the guest OS, fix Tasks
 Rude RCU testing, some updates for TREE09, dump mode information
 to debug GP kthread state, remove a redundant READ_ONCE(), fix some
 comments about RCU_TORTURE_PIPE_LEN and pipe_count, remove some
 redundant pointer initialization, fix a hung-task splat when
 the rcutorture tests start to exit, fix an invalid context warning,
 add a '--do-kvfree' parameter to the torture test and use the slow
 register/unregister callbacks only for the rcutype test.

Merge tag 'rcu.next.v6.10' of https://github.com/urezki/linux

Pull RCU updates from Uladzislau Rezki:

 - Fix a lockdep complaint for lazy-preemptible kernels, remove a redundant
   BH disable for TINY_RCU, remove a redundant READ_ONCE() in tree.c, fix
   a false-positive KCSAN splat and fix a buffer overflow in
   print_cpu_stall_info().

 - Misc updates related to bpf and tracing, and an update to the
   MAINTAINERS file.

 - Reduce the latency of a normal synchronize_rcu() call by maintaining
   a separate track for sync users only. This approach bypasses the
   per-CPU nocb lists, so sync users do not depend on nocb-list length or
   on how fast regular callbacks are processed.

 - RCU tasks: switch Tasks RCU grace periods to sleep at TASK_IDLE
   priority, fix some comments, add a diagnostic warning to
   exit_tasks_rcu_start() and fix a buffer overflow in
   show_rcu_tasks_trace_gp_kthread().

 - RCU torture: Increase memory for the guest OS, fix Tasks Rude RCU
   testing, some updates for TREE09, dump mode information to debug GP
   kthread state, remove a redundant READ_ONCE(), fix some comments about
   RCU_TORTURE_PIPE_LEN and pipe_count, remove some redundant pointer
   initialization, fix a hung-task splat when the rcutorture tests
   start to exit, fix an invalid context warning, add a '--do-kvfree'
   parameter to the torture test and use the slow register/unregister
   callbacks only for the rcutype test.

* tag 'rcu.next.v6.10' of https://github.com/urezki/linux: (48 commits)
  rcutorture: Use rcu_gp_slow_register/unregister() only for rcutype test
  torture: Scale --do-kvfree test time
  rcutorture: Fix invalid context warning when enable srcu barrier testing
  rcutorture: Make stall-tasks directly exit when rcutorture tests end
  rcutorture: Removing redundant function pointer initialization
  rcutorture: Make rcutorture support print rcu-tasks gp state
  rcutorture: Use the gp_kthread_dbg operation specified by cur_ops
  rcutorture: Re-use value stored to ->rtort_pipe_count instead of re-reading
  rcutorture: Fix rcu_torture_one_read() pipe_count overflow comment
  rcutorture: Remove extraneous rcu_torture_pipe_update_one() READ_ONCE()
  rcu: Allocate WQ with WQ_MEM_RECLAIM bit set
  rcu: Support direct wake-up of synchronize_rcu() users
  rcu: Add a trace event for synchronize_rcu_normal()
  rcu: Reduce synchronize_rcu() latency
  rcu: Fix buffer overflow in print_cpu_stall_info()
  rcu: Mollify sparse with RCU guard
  rcu-tasks: Fix show_rcu_tasks_trace_gp_kthread buffer overflow
  rcu-tasks: Fix the comments for tasks_rcu_exit_srcu_stall_timer
  rcu-tasks: Replace exit_tasks_rcu_start() initialization with WARN_ON_ONCE()
  rcu: Remove redundant CONFIG_PROVE_RCU #if condition
  ...
2024-05-13 09:49:06 -07:00
Linus Torvalds
d65e1a0f30 - Store AP Query Configuration Information in a static buffer
- Rework the AP initialization and add missing cleanups to the error path
 
 - Swap IRQ and AP bus/device registration to avoid race conditions
 
 - Export prot_virt_guest symbol
 
 - Introduce AP configuration changes notifier interface to facilitate
   modularization of the AP bus
 
 - Add CONFIG_AP kernel configuration option to allow modularization of
   the AP bus
 
 - Rework CONFIG_ZCRYPT_DEBUG kernel configuration option description and
   dependency and rename it to CONFIG_AP_DEBUG
 
 - Convert sprintf() and snprintf() to sysfs_emit() in CIO code
 
 - Adjust indentation of RELOCS command build step
 
 - Make crypto performance counters upward compatible
 
 - Convert make_page_secure() and gmap_make_secure() to use folio
 
 - Rework channel-utilization-block (CUB) handling in preparation of
   introducing additional CUBs
 
 - Use attribute groups to simplify registration, removal and extension
   of measurement-related channel-path sysfs attributes
 
 - Add a per-channel-path binary "ext_measurement" sysfs attribute that
   provides access to extended channel-path measurement data
 
 - Export measurement data for all channel-measurement-groups (CMG), not
   only for specific ones. This enables support of new CMG data formats
   in userspace without the need for kernel changes
 
 - Add a per-channel-path sysfs attribute "speed_bps" that provides the
   operating speed in bits per second or 0 if the operating speed is not
   available
 
 - The CIO tracepoint subchannel-type field "st" is incorrectly set to
   the value of subchannel-enabled SCHIB "ena" field. Fix that
 
 - Do not forcefully limit vmemmap starting address to MAX_PHYSMEM_BITS
 
 - Consider the maximum physical address available to a DCSS segment
   (512GB) when memory layout is set up
 
 - Simplify the virtual memory layout setup by reducing the size of
   identity mapping vs vmemmap overlap
 
 - Swap vmalloc and Lowcore/Real Memory Copy areas in virtual memory.
   This will allow to place the kernel image next to kernel modules
 
 - Move everyting KASLR related from <asm/setup.h> to <asm/page.h>
 
 - Put virtual memory layout information into a structure to improve
   code generation
 
 - Currently __kaslr_offset is the kernel offset in both physical and
   virtual memory spaces. Uncouple these offsets to allow uncoupling
   of the addresses spaces
 
 - Currently the identity mapping base address is implicit and is always
   set to zero. Make it explicit by putting into __identity_base persistent
   boot variable and use it in proper context
 
 - Introduce .amode31 section start and end macros AMODE31_START and
   AMODE31_END
 
 - Introduce OS_INFO entries that do not reference any data in memory,
   but rather provide only values
 
 - Store virtual memory layout in OS_INFO. It is read out by makedumpfile,
   crash and other tools
 
 - Store virtual memory layout in VMCORE_INFO. It is read out by crash and
   other tools when /proc/kcore device is used
 
 - Create additional PT_LOAD ELF program header that covers kernel image
   only, so that vmcore tools could locate kernel text and data when virtual
   and physical memory spaces are uncoupled
 
 - Uncouple physical and virtual address spaces
 
 - Map kernel at fixed location when KASLR mode is disabled. The location is
   defined by CONFIG_KERNEL_IMAGE_BASE kernel configuration value.
 
 - Rework deployment of kernel image for both compressed and uncompressed
   variants as defined by CONFIG_KERNEL_UNCOMPRESSED kernel configuration
   value
 
 - Move .vmlinux.relocs section in front of the compressed kernel.
   The interim section rescue step is avoided as result
 
 - Correct modules thunk offset calculation when branch target is more
   than 2GB away
 
 - Kernel modules contain their own set of expoline thunks. Now that the
   kernel modules area is less than 4GB away from kernel expoline thunks,
   make modules use kernel expolines. Also make EXPOLINE_EXTERN the default
   if the compiler supports it
 
 - userfaultfd can insert shared zeropages into processes running VMs,
   but that is not allowed for s390. Fallback to allocating a fresh
   zeroed anonymous folio and insert that instead
 
 - Re-enable shared zeropages for non-PV and non-skeys KVM guests
 
 - Rename hex2bitmap() to ap_hex2bitmap() and export it for external use
 
 - Add ap_config sysfs attribute to provide the means for setting or
   displaying adapters, domains and control domains assigned to a vfio-ap
   mediated device in a single operation
 
 - Make vfio_ap_mdev_link_queue() ignore duplicate link requests
 
 - Add write support to ap_config sysfs attribute to allow atomic update
   a vfio-ap mediated device state
 
 - Document ap_config sysfs attribute
 
 - Function os_info_old_init() is expected to be called only from a regular
   kdump kernel. Enable it to be called from a stand-alone dump kernel
 
 - Address gcc -Warray-bounds warning and fix array size in struct os_info
 
 - s390 does not support SMBIOS, so drop unneeded CONFIG_DMI checks
 
 - Use unwinder instead of __builtin_return_address() with ftrace to
   prevent returning of undefined values
 
 - Sections .hash and .gnu.hash are only created when CONFIG_PIE_BUILD
   kernel is enabled. Drop these for the case CONFIG_PIE_BUILD is disabled
 
 - Compile kernel with -fPIC and link with -no-pie to allow kpatch feature
   always succeed and drop the whole CONFIG_PIE_BUILD option-enabled code
 
 - Add missing virt_to_phys() converter for VSIE facility and crypto
   control blocks
 -----BEGIN PGP SIGNATURE-----
 
 iI0EABYIADUWIQQrtrZiYVkVzKQcYivNdxKlNrRb8AUCZjkp5xccYWdvcmRlZXZA
 bGludXguaWJtLmNvbQAKCRDNdxKlNrRb8D99AQCEby+KHssuZe9m0NvvikWREYBC
 myqob4EmdU3KdTEbNAEAt2OB7mzSQc90yjawI+Je7vwVyh3uc2Nb4Qg05yO6owI=
 =eOYN
 -----END PGP SIGNATURE-----

Merge tag 's390-6.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 updates from Alexander Gordeev:

 - Store AP Query Configuration Information in a static buffer

 - Rework the AP initialization and add missing cleanups to the error
   path

 - Swap IRQ and AP bus/device registration to avoid race conditions

 - Export prot_virt_guest symbol

 - Introduce AP configuration changes notifier interface to facilitate
   modularization of the AP bus

 - Add CONFIG_AP kernel configuration option to allow modularization of
   the AP bus

 - Rework CONFIG_ZCRYPT_DEBUG kernel configuration option description
   and dependency and rename it to CONFIG_AP_DEBUG

 - Convert sprintf() and snprintf() to sysfs_emit() in CIO code

 - Adjust indentation of RELOCS command build step

 - Make crypto performance counters upward compatible

 - Convert make_page_secure() and gmap_make_secure() to use folio

 - Rework channel-utilization-block (CUB) handling in preparation of
   introducing additional CUBs

 - Use attribute groups to simplify registration, removal and extension
   of measurement-related channel-path sysfs attributes

 - Add a per-channel-path binary "ext_measurement" sysfs attribute that
   provides access to extended channel-path measurement data

 - Export measurement data for all channel-measurement-groups (CMG), not
   only for specific ones. This enables support of new CMG data
   formats in userspace without the need for kernel changes

 - Add a per-channel-path sysfs attribute "speed_bps" that provides the
   operating speed in bits per second or 0 if the operating speed is not
   available

 - The CIO tracepoint subchannel-type field "st" is incorrectly set to
   the value of subchannel-enabled SCHIB "ena" field. Fix that

 - Do not forcefully limit vmemmap starting address to MAX_PHYSMEM_BITS

 - Consider the maximum physical address available to a DCSS segment
   (512GB) when memory layout is set up

 - Simplify the virtual memory layout setup by reducing the size of
   identity mapping vs vmemmap overlap

 - Swap vmalloc and Lowcore/Real Memory Copy areas in virtual memory.
   This allows placing the kernel image next to kernel modules

 - Move everything KASLR related from <asm/setup.h> to <asm/page.h>

 - Put virtual memory layout information into a structure to improve
   code generation

 - Currently __kaslr_offset is the kernel offset in both physical and
   virtual memory spaces. Uncouple these offsets to allow uncoupling of
   the address spaces

 - Currently the identity mapping base address is implicit and is always
   set to zero. Make it explicit by putting it into the __identity_base
   persistent boot variable and use it in the proper context

 - Introduce .amode31 section start and end macros AMODE31_START and
   AMODE31_END

 - Introduce OS_INFO entries that do not reference any data in memory,
   but rather provide only values

 - Store virtual memory layout in OS_INFO. It is read out by
   makedumpfile, crash and other tools

 - Store virtual memory layout in VMCORE_INFO. It is read out by crash
   and other tools when /proc/kcore device is used

 - Create additional PT_LOAD ELF program header that covers kernel image
   only, so that vmcore tools could locate kernel text and data when
   virtual and physical memory spaces are uncoupled

 - Uncouple physical and virtual address spaces

 - Map kernel at fixed location when KASLR mode is disabled. The
   location is defined by CONFIG_KERNEL_IMAGE_BASE kernel configuration
   value.

 - Rework deployment of kernel image for both compressed and
   uncompressed variants as defined by CONFIG_KERNEL_UNCOMPRESSED kernel
   configuration value

 - Move .vmlinux.relocs section in front of the compressed kernel. The
   interim section rescue step is avoided as a result

 - Correct modules thunk offset calculation when branch target is more
   than 2GB away

 - Kernel modules contain their own set of expoline thunks. Now that the
   kernel modules area is less than 4GB away from kernel expoline
   thunks, make modules use kernel expolines. Also make EXPOLINE_EXTERN
   the default if the compiler supports it

 - userfaultfd can insert shared zeropages into processes running VMs,
   but that is not allowed for s390. Fall back to allocating a fresh
   zeroed anonymous folio and insert that instead

 - Re-enable shared zeropages for non-PV and non-skeys KVM guests

 - Rename hex2bitmap() to ap_hex2bitmap() and export it for external use

 - Add ap_config sysfs attribute to provide the means for setting or
   displaying adapters, domains and control domains assigned to a
   vfio-ap mediated device in a single operation

 - Make vfio_ap_mdev_link_queue() ignore duplicate link requests

 - Add write support to ap_config sysfs attribute to allow atomic update
   of a vfio-ap mediated device state

 - Document ap_config sysfs attribute

 - Function os_info_old_init() is expected to be called only from a
   regular kdump kernel. Enable it to be called from a stand-alone dump
   kernel

 - Address gcc -Warray-bounds warning and fix array size in struct
   os_info

 - s390 does not support SMBIOS, so drop unneeded CONFIG_DMI checks

 - Use unwinder instead of __builtin_return_address() with ftrace to
   prevent returning of undefined values

 - Sections .hash and .gnu.hash are only created when CONFIG_PIE_BUILD
   kernel is enabled. Drop these for the case CONFIG_PIE_BUILD is
   disabled

 - Compile kernel with -fPIC and link with -no-pie to allow the kpatch
   feature to always succeed and drop the whole CONFIG_PIE_BUILD
   option-enabled code

 - Add missing virt_to_phys() converter for VSIE facility and crypto
   control blocks

* tag 's390-6.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (54 commits)
  Revert "s390: Relocate vmlinux ELF data to virtual address space"
  KVM: s390: vsie: Use virt_to_phys for crypto control block
  s390: Relocate vmlinux ELF data to virtual address space
  s390: Compile kernel with -fPIC and link with -no-pie
  s390: vmlinux.lds.S: Drop .hash and .gnu.hash for !CONFIG_PIE_BUILD
  s390/ftrace: Use unwinder instead of __builtin_return_address()
  s390/pci: Drop unneeded reference to CONFIG_DMI
  s390/os_info: Fix array size in struct os_info
  s390/os_info: Initialize old os_info in standalone dump kernel
  docs: Update s390 vfio-ap doc for ap_config sysfs attribute
  s390/vfio-ap: Add write support to sysfs attr ap_config
  s390/vfio-ap: Ignore duplicate link requests in vfio_ap_mdev_link_queue
  s390/vfio-ap: Add sysfs attr, ap_config, to export mdev state
  s390/ap: Externalize AP bus specific bitmap reading function
  s390/mm: Re-enable the shared zeropage for !PV and !skeys KVM guests
  mm/userfaultfd: Do not place zeropages when zeropages are disallowed
  s390/expoline: Make modules use kernel expolines
  s390/nospec: Correct modules thunk offset calculation
  s390/boot: Do not rescue .vmlinux.relocs section
  s390/boot: Rework deployment of the kernel image
  ...
2024-05-13 08:33:52 -07:00
Joerg Roedel
2bd5059c6c Merge branches 'arm/renesas', 'arm/smmu', 'x86/amd', 'core' and 'x86/vt-d' into next 2024-05-13 14:06:54 +02:00
Shahab Vahedi
f122668ddc ARC: Add eBPF JIT support
This will add eBPF JIT support to the 32-bit ARCv2 processors. The
implementation is qualified by running the BPF tests on a Synopsys HSDK
board with "ARC HS38 v2.1c at 500 MHz" as the 4-core CPU.

The test_bpf.ko reports 2-10 fold improvements in execution time of its
tests. For instance:

test_bpf: #33 tcpdump port 22 jited:0 704 1766 2104 PASS
test_bpf: #33 tcpdump port 22 jited:1 120  224  260 PASS

test_bpf: #141 ALU_DIV_X: 4294967295 / 4294967295 = 1 jited:0 238 PASS
test_bpf: #141 ALU_DIV_X: 4294967295 / 4294967295 = 1 jited:1  23 PASS

test_bpf: #776 JMP32_JGE_K: all ... magnitudes jited:0 2034681 PASS
test_bpf: #776 JMP32_JGE_K: all ... magnitudes jited:1 1020022 PASS
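
As a quick cross-check of the "2-10 fold" figure, the ratios can be
recomputed from the numbers quoted above. This is only an arithmetic
sketch; the helper name speedups() is made up for illustration and the
values are copied from the test_bpf lines in this message.

```python
# Timings copied from the test_bpf lines above (lower is better):
# first list is jited:0 (interpreter), second list is jited:1 (JIT).
timings = {
    "#33 tcpdump port 22": ([704, 1766, 2104], [120, 224, 260]),
    "#141 ALU_DIV_X": ([238], [23]),
    "#776 JMP32_JGE_K": ([2034681], [1020022]),
}


def speedups(interpreted, jited):
    """Return the interpreter/JIT time ratio for each paired measurement."""
    return [round(i / j, 2) for i, j in zip(interpreted, jited)]


for name, (interp, jit) in timings.items():
    print(name, speedups(interp, jit))
```

The ratios range from roughly 2x (JMP32_JGE_K) to roughly 10x
(ALU_DIV_X), which matches the range stated above.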

Deployment and structure
------------------------
The related code is added to "arch/arc/net":

- bpf_jit.h       -- The interface that a back-end translator must provide
- bpf_jit_core.c  -- Knows how to handle the input eBPF byte stream
- bpf_jit_arcv2.c -- The back-end code that knows the translation logic

The bpf_int_jit_compile() at the end of bpf_jit_core.c is the entrance
to the whole process. Normally, the translation is done in one pass,
namely the "normal pass". In case some relocations are not known during
this pass, some data (arc_jit_data) is allocated for the next pass to
come. This possible next (and last) pass is called the "extra pass".

1. Normal pass       # The necessary pass
     1a. Dry run       # Get the whole JIT length, epilogue offset, etc.
     1b. Emit phase    # Allocate memory and start emitting instructions
2. Extra pass        # Only needed if there are relocations to be fixed
     2a. Patch relocations
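
The idea behind the dry run followed by the emit phase can be
illustrated with a toy model. This is not the ARC JIT code: the
instruction tuples, the function names and the displacement convention
(relative to the end of the jump) are all assumptions chosen for the
sketch, and the extra pass is not modeled.

```python
# Toy model: each "instruction" is (name, size_in_bytes, jump_target_index).
# jump_target_index is None for non-jump instructions.

def dry_run(prog):
    """Step 1a: compute every instruction's byte offset and the total length."""
    offsets, pos = [], 0
    for _name, size, _target in prog:
        offsets.append(pos)
        pos += size
    return offsets, pos


def emit(prog, offsets, total_len):
    """Step 1b: emit instructions; jumps are resolved from the known offsets."""
    out = []
    for idx, (name, size, target) in enumerate(prog):
        if target is None:
            out.append((offsets[idx], name))
        else:
            # Displacement relative to the end of the jump (toy convention).
            disp = offsets[target] - (offsets[idx] + size)
            out.append((offsets[idx], f"{name} {disp:+d}"))
    # The emitted length must agree with what the dry run predicted.
    assert offsets[-1] + prog[-1][1] == total_len
    return out


prog = [("mov", 4, None), ("jmp", 4, 3), ("add", 8, None), ("ret", 4, None)]
offsets, total = dry_run(prog)
listing = emit(prog, offsets, total)
```

The forward jump can only be encoded once the offset of its target is
known, which is why the lengths are collected before anything is
emitted.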

Support status
--------------
The JIT compiler supports BPF instructions up to "cpu=v4". However, it
does not yet provide support for:

- Tail calls
- Atomic operations
- 64-bit division/remainder
- BPF_PROBE_MEM* (exception table)

The result of "test_bpf" test suite on an HSDK board is:

hsdk-lnx# insmod test_bpf.ko test_suite=test_bpf

  test_bpf: Summary: 863 PASSED, 186 FAILED, [851/851 JIT'ed]

All the failing test cases are due to the ones that were not JIT'ed.
Categorically, they can be represented as:

  .-----------.------------.-------------.
  | test type |   opcodes  | # of cases  |
  |-----------+------------+-------------|
  | atomic    | 0xC3, 0xDB |         149 |
  | div64     | 0x37, 0x3F |          22 |
  | mod64     | 0x97, 0x9F |          15 |
  `-----------^------------+-------------|
                           | (total) 186 |
                           `-------------'
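
The table can be checked against the summary line with a few lines of
arithmetic. The variable names are illustrative; the numbers are the
ones quoted in this message.

```python
# Failing cases per test type, copied from the table above.
failed_by_type = {"atomic": 149, "div64": 22, "mod64": 15}

# Totals from the "Summary:" line of test_bpf.
passed, failed = 863, 186

failed_sum = sum(failed_by_type.values())
total_tests = passed + failed

print(failed_sum == failed, total_tests)
```

The per-type counts add up to the 186 reported failures, and the two
summary figures give 1049 test cases in total.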

Setup: build config
-------------------
The following configs must be set to have a working JIT test:

  CONFIG_BPF_JIT=y
  CONFIG_BPF_JIT_ALWAYS_ON=y
  CONFIG_TEST_BPF=m

The following options are not necessary for the tests module,
but are good to have:

  CONFIG_DEBUG_INFO=y             # prerequisite for below
  CONFIG_DEBUG_INFO_BTF=y         # so bpftool can generate vmlinux.h

  CONFIG_FTRACE=y                 #
  CONFIG_BPF_SYSCALL=y            # all these options lead to
  CONFIG_KPROBE_EVENTS=y          # having CONFIG_BPF_EVENTS=y
  CONFIG_PERF_EVENTS=y            #

Some BPF programs provide data through /sys/kernel/debug:
  CONFIG_DEBUG_FS=y
arc# mount -t debugfs debugfs /sys/kernel/debug

Setup: elfutils
---------------
The libdw.{so,a} library that is used by pahole for processing
the final binary must come from elfutils 0.189 or newer. The
support for ARCv2 [1] has been added since that version.

[1]
https://sourceware.org/git/?p=elfutils.git;a=commit;h=de3d46b3e7

Setup: pahole
-------------
The line below in linux/scripts/Makefile.btf must be commented out:

pahole-flags-$(call test-ge, $(pahole-ver), 121) += --btf_gen_floats

Or else, the build will fail:

$ make V=1
  ...
  BTF     .btf.vmlinux.bin.o
pahole -J --btf_gen_floats                    \
       -j --lang_exclude=rust                 \
       --skip_encoding_btf_inconsistent_proto \
       --btf_gen_optimized .tmp_vmlinux.btf
Complex, interval and imaginary float types are not supported
Encountered error while encoding BTF.
  ...
  BTFIDS  vmlinux
./tools/bpf/resolve_btfids/resolve_btfids vmlinux
libbpf: failed to find '.BTF' ELF section in vmlinux
FAILED: load BTF from vmlinux: No data available

This is due to the fact that the ARC toolchains generate
"complex float" DIE entries in libgcc and at the moment, pahole
can't handle such entries.

Running the tests
-----------------
host$ scp /bld/linux/lib/test_bpf.ko arc:
arc # sysctl net.core.bpf_jit_enable=1
arc # insmod test_bpf.ko test_suite=test_bpf
      ...
      test_bpf: #1048 Staggered jumps: JMP32_JSLE_X jited:1 697811 PASS
      test_bpf: Summary: 863 PASSED, 186 FAILED, [851/851 JIT'ed]

Acknowledgments
---------------
- Claudiu Zissulescu for his unwavering support
- Yuriy Kolerov for testing and troubleshooting
- Vladimir Isaev for the pahole workaround
- Sergey Matyukevich for paving the road by adding the interpreter support

Signed-off-by: Shahab Vahedi <shahab@synopsys.com>
Link: https://lore.kernel.org/r/20240430145604.38592-1-list+bpf@vahedi.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-05-12 16:51:36 -07:00
SeongJae Park
14e70e4660 Docs/admin-guide/mm/damon/usage: fix wrong schemes effective quota update command
To update the effective size quota of DAMOS schemes on the DAMON sysfs file
interface, users should write 'update_schemes_effective_quotas' to the
kdamond 'state' file.  But the document mistakenly gives the input
string as 'update_schemes_effective_bytes'.  Fix it (s/bytes/quotas/).
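
A minimal sketch of the corrected usage follows. The function names are
hypothetical, and the directory layout (a kdamond directory such as
/sys/kernel/mm/damon/admin/kdamonds/0 and a scheme's
quotas/effective_bytes file) is an assumption based on the DAMON sysfs
documentation, not something stated in this message.

```python
import os

# Keyword named in the commit message above.
UPDATE_KEYWORD = "update_schemes_effective_quotas"


def request_effective_quota_update(kdamond_dir):
    """Write the update keyword to the 'state' file of one kdamond directory."""
    with open(os.path.join(kdamond_dir, "state"), "w") as f:
        f.write(UPDATE_KEYWORD)


def read_effective_bytes(scheme_dir):
    """Read quotas/effective_bytes of one scheme directory (assumed layout)."""
    with open(os.path.join(scheme_dir, "quotas", "effective_bytes")) as f:
        return int(f.read().strip())
```

On a real system this would need root privileges and a running kdamond;
the directories are passed in as parameters so the helpers can also be
exercised against a scratch directory.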

Link: https://lkml.kernel.org/r/20240503180318.72798-8-sj@kernel.org
Fixes: a6068d6dfa ("Docs/admin-guide/mm/damon/usage: document effective_bytes file")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: <stable@vger.kernel.org>	[6.9.x]
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-11 15:41:34 -07:00
SeongJae Park
da2a061888 Docs/admin-guide/mm/damon/usage: fix wrong example of DAMOS filter matching sysfs file
The example usage of DAMOS filter sysfs files, specifically the part of
'matching' file writing for memcg type filter, is wrong.  The intention is
to exclude pages of a memcg that is already getting enough care from a given
scheme, but the example is setting the filter to apply the scheme to only
the pages of the memcg.  Fix it.

Link: https://lkml.kernel.org/r/20240503180318.72798-7-sj@kernel.org
Fixes: 9b7f9322a5 ("Docs/admin-guide/mm/damon/usage: document DAMOS filters of sysfs")
Closes: https://lore.kernel.org/r/20240317191358.97578-1-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: <stable@vger.kernel.org>	[6.3.x]
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-11 15:41:34 -07:00
Usama Arif
db5b4f3253 cgroup: Add documentation for missing zswap memory.stat
This includes zswpin, zswpout and zswpwb.
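
For illustration, the three counters can be picked out of a cgroup's
memory.stat contents as sketched below. The function name and the sample
values are made up; only the key names come from this message, and the
"key value" line format is assumed.

```python
def zswap_stats(memory_stat_text):
    """Extract zswpin, zswpout and zswpwb from memory.stat-style text."""
    wanted = ("zswpin", "zswpout", "zswpwb")
    stats = {}
    for line in memory_stat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in wanted:
            stats[parts[0]] = int(parts[1])
    return stats


# Made-up sample, not real measurements.
sample = "zswap 4096\nzswpin 12\nzswpout 34\nzswpwb 5\n"
print(zswap_stats(sample))
```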

Signed-off-by: Usama Arif <usamaarif642@gmail.com>
Acked-by: Nhat Pham <nphamcs@gmail.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <20240502185307.3942173-2-usamaarif642@gmail.com>
2024-05-09 10:54:37 -06:00
David Gstir
b85b253e23 docs: document DCP-backed trusted keys kernel params
Document the kernel parameters trusted.dcp_use_otp_key
and trusted.dcp_skip_zk_test for DCP-backed trusted keys.

Co-developed-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Richard Weinberger <richard@nod.at>
Co-developed-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
Signed-off-by: David Gstir <david@sigma-star.at>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2024-05-09 18:29:03 +03:00
Will Deacon
42e7ddbaf1 Merge branch 'for-next/perf' into for-next/core
* for-next/perf: (41 commits)
  arm64: Add USER_STACKTRACE support
  drivers/perf: hisi: hns3: Actually use devm_add_action_or_reset()
  drivers/perf: hisi: hns3: Fix out-of-bound access when valid event group
  drivers/perf: hisi_pcie: Fix out-of-bound access when valid event group
  perf/arm-spe: Assign parents for event_source device
  perf/arm-smmuv3: Assign parents for event_source device
  perf/arm-dsu: Assign parents for event_source device
  perf/arm-dmc620: Assign parents for event_source device
  perf/arm-ccn: Assign parents for event_source device
  perf/arm-cci: Assign parents for event_source device
  perf/alibaba_uncore: Assign parents for event_source device
  perf/arm_pmu: Assign parents for event_source devices
  perf/imx_ddr: Assign parents for event_source devices
  perf/qcom: Assign parents for event_source devices
  Documentation: qcom-pmu: Use /sys/bus/event_source/devices paths
  perf/riscv: Assign parents for event_source devices
  perf/thunderx2: Assign parents for event_source devices
  Documentation: thunderx2-pmu: Use /sys/bus/event_source/devices paths
  perf/xgene: Assign parents for event_source devices
  Documentation: xgene-pmu: Use /sys/bus/event_source/devices paths
  ...
2024-05-09 15:56:10 +01:00
Linus Torvalds
1ab1a19db1 Merge tag 'pci-v6.9-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci

Pull pci fixes from Bjorn Helgaas:

 - Update kernel-parameters doc to describe "pcie_aspm=off" more
   accurately (Bjorn Helgaas)

 - Restore the parent's (not the child's) ASPM state to the parent
   during resume, which fixes a reboot during resume (Kai-Heng Feng)

* tag 'pci-v6.9-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
  PCI/ASPM: Restore parent state to parent, child state to child
  PCI/ASPM: Clarify that pcie_aspm=off means leave ASPM untouched
2024-05-08 09:37:58 -07:00
Song Liu
393fb313a2 watchdog: allow nmi watchdog to use raw perf event
NMI watchdog permanently consumes one hardware counter per CPU on the
system.  For systems that use many hardware counters, this causes more
aggressive time multiplexing of perf events.

OTOH, some CPUs (mostly Intel) support "ref-cycles" event, which is rarely
used.  Add kernel cmdline arg nmi_watchdog=rNNN to configure the watchdog
to use raw event.  For example, on Intel CPUs, we can use "r300" to
configure the watchdog to use ref-cycles event.
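
To show what "r300" stands for, the sketch below splits a raw event
string into its event-select and unit-mask fields. It assumes the usual
x86 raw encoding (event in bits 0-7, umask in bits 8-15); the function
name is made up and the code is not part of the watchdog.

```python
def decode_raw_event(arg):
    """Split an 'rNNN' raw event string into (event, umask), x86 layout assumed."""
    if not arg.startswith("r"):
        raise ValueError("expected a string of the form rNNN (hex)")
    config = int(arg[1:], 16)
    event = config & 0xFF          # bits 0-7: event select
    umask = (config >> 8) & 0xFF   # bits 8-15: unit mask
    return event, umask


event, umask = decode_raw_event("r300")
print(hex(event), hex(umask))
```

Under that assumption, "r300" decodes to event 0x00 with umask 0x03.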

If the raw event does not work, fall back to use "cycles".

[akpm@linux-foundation.org: fix kerneldoc]
Link: https://lkml.kernel.org/r/20240430060236.1878002-2-song@kernel.org
Signed-off-by: Song Liu <song@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-08 08:41:29 -07:00
SeongJae Park
ed13c93b93 Docs/admin-guide/mm/damon/usage: update for young page type DAMOS filter
Update DAMON usage document for the newly added DAMOS filter type, 'young
page'.

Link: https://lkml.kernel.org/r/20240426195247.100306-7-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Honggyu Kim <honggyu.kim@sk.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-05 17:53:55 -07:00
David Hildenbrand
1bafe96e89 mm/khugepaged: replace page_mapcount() check by folio_likely_mapped_shared()
We want to limit the use of page_mapcount() to places where absolutely
required, to prepare for kernel configs where we won't keep track of
per-page mapcounts in large folios.

khugepaged is one of the remaining "more challenging" page_mapcount()
users, but we might be able to move away from page_mapcount() without
resulting in a significant behavior change that would warrant
special-casing based on kernel configs.

In 2020, we first added support to khugepaged for collapsing COW-shared
pages via commit 9445689f3b ("khugepaged: allow to collapse a page
shared across fork"), followed by support for collapsing PTE-mapped THP in
commit 5503fbf2b0 ("khugepaged: allow to collapse PTE-mapped compound
pages") and limiting the memory waste via the "page_count() > 1" check in
commit 71a2c112a0 ("khugepaged: introduce 'max_ptes_shared' tunable").

As a default, khugepaged will allow up to half of the PTEs to map shared
pages: where page_mapcount() > 1.  MADV_COLLAPSE ignores the khugepaged
setting.
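
As a worked example of the "half of the PTEs" default, assume 4 KiB base
pages and a 2 MiB PMD-sized THP (sizes differ on other configurations).
The helper below is a hypothetical model of the limit check, not kernel
code.

```python
PAGE_SIZE = 4096               # assumed base page size
PMD_SIZE = 2 * 1024 * 1024     # assumed PMD-sized THP

ptes_per_pmd = PMD_SIZE // PAGE_SIZE         # PTEs covering one PMD range
default_max_ptes_shared = ptes_per_pmd // 2  # "up to half of the PTEs"


def exceeds_shared_limit(shared_ptes, limit=default_max_ptes_shared):
    """Model of the check: collapse is refused once shared PTEs exceed the limit."""
    return shared_ptes > limit


print(ptes_per_pmd, default_max_ptes_shared)
```

With these sizes a PMD range has 512 PTEs, so up to 256 of them may map
shared pages before khugepaged gives up on the range.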

khugepaged currently does not care about swapcache page references, and
does not check under folio lock: so in some corner cases the "shared vs.
exclusive" detection might be a bit off, making us detect "exclusive" when
it's actually "shared".

Most of our anonymous folios in the system are usually exclusive.  We
frequently see sharing of anonymous folios for a short period of time,
after which our short-lived subprocesses either quit or exec().

There are some famous examples, though, where child processes exist for a
long time, and where memory is COW-shared with a lot of processes
(webservers, webbrowsers, sshd, ...) and COW-sharing is crucial for
reducing the memory footprint.  We don't want to suddenly change the
behavior to result in a significant increase in memory waste.

Interestingly, khugepaged will only collapse an anonymous THP if at least
one PTE is writable.  After fork(), that means that something (usually a
page fault) populated at least a single exclusive anonymous THP in that
PMD range.

So ...  what happens when we switch to "is this folio mapped shared"
instead of "is this page mapped shared" by using
folio_likely_mapped_shared()?

For "not-COW-shared" folios, small folios and for THPs (large folios) that
are completely mapped into at least one process, switching to
folio_likely_mapped_shared() will not result in a change.

We'll only see a change for COW-shared PTE-mapped THPs that are partially
mapped into all involved processes.

There are two cases to consider:

(A) folio_likely_mapped_shared() returns "false" for a PTE-mapped THP

  If the folio is detected as exclusive, and it actually is exclusive,
  there is no change: page_mapcount() == 1. This is the common case
  without fork() or with short-lived child processes.

  folio_likely_mapped_shared() might currently still detect a folio as
  exclusive although it is shared (false negatives): if the first page is
  not mapped multiple times and if the average per-page mapcount is smaller
  than 1, implying that (1) the folio is partially mapped and (2) if we are
  responsible for many mapcounts by mapping many pages others can't
  ("mostly exclusive") (3) if we are not responsible for many mapcounts by
  mapping few pages ("mostly shared") it won't make a big impact on the
  end result.

  So while we might now detect a page as "exclusive" although it isn't,
  it's not expected to make a big difference in common cases.

(B) folio_likely_mapped_shared() returns "true" for a PTE-mapped THP

  folio_likely_mapped_shared() will never detect a large anonymous folio
  as shared although it is exclusive: there are no false positives.

  If we detect a THP as shared, at least one page of the THP is mapped by
  another process. It could well be that some pages are actually exclusive.
  For example, our child processes could have unmapped/COW'ed some pages
  such that they would now be exclusive to our process, which we now
  would treat as still-shared.

  Examples:
  (1) Parent maps all pages of a THP, child maps some pages. We detect
      all pages in the parent as shared although some are actually
      exclusive.
  (2) Parent maps all but some page of a THP, child maps the remainder.
      We detect all pages of the THP that the parent maps as shared
      although they are all exclusive.

  In (1) we wouldn't collapse a THP right now already: no PTE
  is writable, because a write fault would have resulted in COW of a
  single page and the parent would no longer map all pages of that THP.

  For (2) we would have collapsed a THP in the parent so far, now we
  wouldn't as long as the child process is still alive: unless the child
  process unmaps the remaining THP pages or we decide to split that THP.

  Possibly, the child COW'ed many pages, meaning that it's likely that
  we can populate a THP for our child first, and then for our parent.

  For (2), we are making really bad use of the THP in the first
  place (not even mapped completely in at least one process). If the
  THP would be completely partially mapped, it would be on the deferred
  split queue where we would split it lazily later.

  For short-running child processes, we don't particularly care. For
  long-running processes, the expectation is that such scenarios are
  rather rare: further, a THP might be best placed if most data in the
  PMD range is actually written, implying that we'll have to COW more
  pages first before khugepaged would collapse it.

To summarize, in the common case, this change is not expected to matter
much.  The more common application of khugepaged operates on exclusive
pages, either before fork() or after a child quit.

Can we improve (A)?  Yes, if we implement more precise tracking of "mapped
shared" vs.  "mapped exclusively", we could get rid of the false negatives
completely.

Can we improve (B)?  We could count how many pages of a large folio we map
inside the current page table and detect that we are responsible for most
of the folio mapcount and conclude "as good as exclusive", which might
help in some cases.  ...  but likely, some other mechanism should detect
that the THP is not a good use in the scenario (not even mapped completely
in a single process) and try splitting that folio lazily etc.

We'll move the folio_test_anon() check before our "shared" check, so we
might get more expressive results for SCAN_EXCEED_SHARED_PTE: this order
of checks now matches the one in __collapse_huge_page_isolate().  Extend
documentation.

Link: https://lkml.kernel.org/r/20240424122630.495788-1-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Yang Shi <yang.shi@linux.alibaba.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-05 17:53:50 -07:00
Yosry Ahmed
c074e1467f mm: zswap: remove same_filled module params
These knobs offer more fine-grained control to userspace than needed and
directly expose/influence kernel implementation; remove them.

For disabling same_filled handling, there is no logical reason to refuse
storing same-filled pages more efficiently and opt for compression. 
Scanning pages for patterns may be an argument, but the page contents will
be read into the CPU cache anyway during compression.  Also, removing the
same_filled handling code does not move the needle significantly in terms
of performance anyway [1].

For disabling non_same_filled handling, it was added when the compressed
pages in zswap were not being properly charged to memcgs, as workloads
could escape the accounting with compression [2].  This is no longer the
case after commit f4840ccfca ("zswap: memcg accounting"), and using
zswap without compression does not make much sense.

[1]https://lore.kernel.org/lkml/CAJD7tkaySFP2hBQw4pnZHJJwe3bMdjJ1t9VC2VJd=khn1_TXvA@mail.gmail.com/
[2]https://lore.kernel.org/lkml/19d5cdee-2868-41bd-83d5-6da75d72e940@maciej.szmigiero.name/

[yosryahmed@google.com: remove same_filled_pages from docs]
  Link: https://lkml.kernel.org/r/ZhxFVggdyvCo79jc@google.com
Link: https://lkml.kernel.org/r/20240413022407.785696-5-yosryahmed@google.com
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Nhat Pham <nphamcs@gmail.com>
Reviewed-by: Chengming Zhou <chengming.zhou@linux.dev>
Cc: "Maciej S. Szmigiero" <mail@maciej.szmigiero.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-05 17:53:38 -07:00
Barry Song
a14421ae2a mm: correct the docs for thp_fault_alloc and thp_fault_fallback
The documentation does not align with the code.  In
__do_huge_pmd_anonymous_page(), THP_FAULT_FALLBACK is incremented when
mem_cgroup_charge() fails, despite the allocation succeeding, whereas
THP_FAULT_ALLOC is only incremented after a successful charge.
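
To make the relationship between the two counters concrete, here is a minimal sketch that parses /proc/vmstat-style text and derives a fallback ratio. It assumes the counters are exposed as thp_fault_alloc and thp_fault_fallback in /proc/vmstat; the helper names and the sample values are made up for illustration.

```python
def parse_vmstat(text):
    """Parse /proc/vmstat-style text ("name value" per line) into a dict."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def thp_fault_fallback_ratio(counters):
    """Fraction of counted THP fault attempts that ended in a fallback."""
    alloc = counters.get("thp_fault_alloc", 0)
    fallback = counters.get("thp_fault_fallback", 0)
    total = alloc + fallback
    return fallback / total if total else 0.0


# Made-up sample values; on a live system, read the text from /proc/vmstat.
sample = "thp_fault_alloc 90\nthp_fault_fallback 10\nnr_free_pages 12345\n"
counters = parse_vmstat(sample)
print(thp_fault_fallback_ratio(counters))
```

With the corrected documentation in mind, a failed memcg charge shows up in the fallback count even though the allocation itself succeeded, so the ratio reflects charge failures as well as allocation failures.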

Link: https://lkml.kernel.org/r/20240412114858.407208-5-21cnbao@gmail.com
Signed-off-by: Barry Song <v-songbaohua@oppo.com>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Chris Li <chrisl@kernel.org>
Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
Cc: Kairui Song <kasong@tencent.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Yosry Ahmed <yosryahmed@google.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-05 17:53:36 -07:00
Barry Song
42248b9d34 mm: add docs for per-order mTHP counters and transhuge_page ABI
This patch includes documentation for mTHP counters and an ABI file for
sys-kernel-mm-transparent-hugepage, which appears to have been missing for
some time.

[v-songbaohua@oppo.com: fix the name and unexpected indentation]
  Link: https://lkml.kernel.org/r/20240415054538.17071-1-21cnbao@gmail.com
Link: https://lkml.kernel.org/r/20240412114858.407208-4-21cnbao@gmail.com
Signed-off-by: Barry Song <v-songbaohua@oppo.com>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Chris Li <chrisl@kernel.org>
Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
Cc: Kairui Song <kasong@tencent.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Yosry Ahmed <yosryahmed@google.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-05 17:53:36 -07:00
David Hildenbrand
658670607f Documentation/admin-guide/cgroup-v1/memory.rst: don't reference page_mapcount()
Let's stop talking about page_mapcount().

Link: https://lkml.kernel.org/r/20240409192301.907377-19-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Hugh Dickins <hughd@google.com>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Richard Chang <richardycc@google.com>
Cc: Rich Felker <dalias@libc.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Yin Fengwei <fengwei.yin@intel.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-05 17:53:31 -07:00
Bjorn Helgaas
2e0239d47d PCI/ASPM: Clarify that pcie_aspm=off means leave ASPM untouched
Previously we claimed "pcie_aspm=off" meant that ASPM would be disabled,
which is wrong.

Correct this to say that with "pcie_aspm=off", Linux doesn't touch any ASPM
configuration at all.  ASPM may have been enabled by firmware, and that
will be left unchanged.  See "aspm_support_enabled".

Link: https://lore.kernel.org/r/20240429191821.691726-1-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: David E. Box <david.e.box@linux.intel.com>
2024-05-03 11:45:32 -05:00
Andrea della Porta
1279e8d0dc arm64: Add the arm64.no32bit_el0 command line option
Introduce the field 'el0' to the idreg-override for register
ID_AA64PFR0_EL1. This field is also aliased to the new kernel
command line option 'arm64.no32bit_el0' as a more recognizable
and mnemonic name to disable the execution of 32-bit userspace
applications (i.e. avoid the AArch32 execution state in EL0) from
the kernel command line.

Link: https://lore.kernel.org/all/20240207105847.7739-1-andrea.porta@suse.com/
Signed-off-by: Andrea della Porta <andrea.porta@suse.com>
Link: https://lore.kernel.org/r/20240429102833.6426-1-andrea.porta@suse.com
Signed-off-by: Will Deacon <will@kernel.org>
2024-05-03 13:08:06 +01:00
Remington Brasga
da51bbcdba Docs: typos/spelling
Fix spelling and grammar in Docs descriptions

Signed-off-by: Remington Brasga <rbrasga@uci.edu>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20240429225527.2329-1-rbrasga@uci.edu
2024-05-02 10:02:29 -06:00
Jacob Pan
be9be07b22 iommu/vt-d: Make posted MSI an opt-in command line option
Add a command line opt-in option for posted MSI if CONFIG_X86_POSTED_MSI=y.

Also introduce a helper function for testing if posted MSI is supported on
the platform.

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240423174114.526704-12-jacob.jun.pan@linux.intel.com
2024-04-30 00:54:43 +02:00
Bingbu Cao
ba124c8cf3 media: Documentation: add Intel IPU6 ISYS driver admin-guide doc
This document mainly describes the functionality of IPU6 and the IPU6 isys
driver, and gives an example of how users can do image capture with
tools.

Signed-off-by: Bingbu Cao <bingbu.cao@intel.com>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
2024-04-29 14:56:38 +02:00
Linus Torvalds
aec147c188 Misc fixes:
- Make the CPU_MITIGATIONS=n interaction with conflicting
    mitigation-enabling boot parameters a bit saner.
 
  - Re-enable CPU mitigations by default on non-x86
 
  - Fix TDX shared bit propagation on mprotect()
 
  - Fix potential show_regs() system hang when PKE
    initialization is not fully finished yet.
 
  - Add the 0x10-0x1f model IDs to the Zen5 range
 
  - Harden #VC instruction emulation some more
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>

Merge tag 'x86-urgent-2024-04-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Ingo Molnar:

 - Make the CPU_MITIGATIONS=n interaction with conflicting
   mitigation-enabling boot parameters a bit saner.

 - Re-enable CPU mitigations by default on non-x86

 - Fix TDX shared bit propagation on mprotect()

 - Fix potential show_regs() system hang when PKE initialization
   is not fully finished yet.

 - Add the 0x10-0x1f model IDs to the Zen5 range

 - Harden #VC instruction emulation some more

* tag 'x86-urgent-2024-04-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  cpu: Ignore "mitigations" kernel parameter if CPU_MITIGATIONS=n
  cpu: Re-enable CPU mitigations by default for !X86 architectures
  x86/tdx: Preserve shared bit on mprotect()
  x86/cpu: Fix check for RDPKRU in __show_regs()
  x86/CPU/AMD: Add models 0x10-0x1f to the Zen5 range
  x86/sev: Check for MWAITX and MONITORX opcodes in the #VC handler
2024-04-28 11:58:16 -07:00
Baoquan He
b0f970c50d Documentation: kdump: clean up the outdated description
After commit 443cbaf9e2 ("crash: split vmcoreinfo exporting code out
from crash_core.c"), the Kconfig item CRASH_CORE has gone away from the
kernel.  Items VMCORE_INFO and CRASH_RESERVE are used instead.

So clean up the outdated description about CRASH_CORE and update it
accordingly.

Link: https://lkml.kernel.org/r/20240329132825.1102459-3-bhe@redhat.com
Signed-off-by: Baoquan He <bhe@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: WANG Xuerui <kernel@xen0n.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-04-25 21:07:04 -07:00
Sergey Senozhatsky
34efe1c3b6 zram: add max_pages param to recompression
Introduce "max_pages" param to recompress device attribute which sets an
upper limit on the number of entries (pages) zram attempts to recompress
(in this particular recompression call).  S/W recompression can be quite
expensive so limiting the number of pages recompress touches can be quite
helpful.

Link: https://lkml.kernel.org/r/20240329094050.2815699-1-senozhatsky@chromium.org
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Acked-by: Brian Geffon <bgeffon@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-04-25 20:56:30 -07:00
York Jasper Niebuhr
ba42b524a0 mm: init_mlocked_on_free_v3
Implements the "init_mlocked_on_free" boot option. When this boot option
is enabled, any mlock'ed pages are zeroed on free. If
the pages are munlock'ed beforehand, no initialization takes place.
This boot option is meant to combat the performance hit of
"init_on_free" as reported in commit 6471384af2 ("mm: security:
introduce init_on_alloc=1 and init_on_free=1 boot options"). With
"init_mlocked_on_free=1" only relevant data is zeroed while everything
else is left untouched by the kernel. Correspondingly, this patch
introduces no performance hit for unmapping non-mlock'ed memory. The
unmapping overhead for purely mlocked memory was measured to be
approximately 13%. Realistically, most systems mlock only a fraction of
the total memory so the real-world system overhead should be close to
zero.

Optimally, userspace programs clear any key material or other
confidential memory before exit and munlock the corresponding memory
regions. If a program crashes, userspace key managers fail to do this
job. Accordingly, no munlock operations are performed so the data is
caught and zeroed by the kernel. Should the program not crash, all
memory will ideally be munlocked so no overhead is caused.

CONFIG_INIT_MLOCKED_ON_FREE_DEFAULT_ON can be set to enable
"init_mlocked_on_free" by default.

Link: https://lkml.kernel.org/r/20240329145605.149917-1-yjnworkstation@gmail.com
Signed-off-by: York Jasper Niebuhr <yjnworkstation@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: York Jasper Niebuhr <yjnworkstation@gmail.com>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-04-25 20:56:29 -07:00
Matthew Wilcox (Oracle)
4dc7d37370 remove references to page->flags in documentation
Mostly rewording, but remove entirely the copy of page_fixed_fake_head()
in the documentation; we can refer people to the actual source if
necessary.

Link: https://lkml.kernel.org/r/20240326171045.410737-10-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-04-25 20:56:15 -07:00
Baolin Wang
353dc18784 docs: hugetlbpage.rst: add hugetlb migration description
Add some description of the hugetlb migration strategy.

Link: https://lkml.kernel.org/r/63fb16e7a4ebc5cb69ce655af86e29b2d8e9ba34.1709719720.git.baolin.wang@linux.alibaba.com
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-04-25 20:56:06 -07:00
Suren Baghdasaryan
22d407b164 lib: add allocation tagging support for memory allocation profiling
Introduce CONFIG_MEM_ALLOC_PROFILING which provides definitions to easily
instrument memory allocators.  It registers an "alloc_tags" codetag type
with /proc/allocinfo interface to output allocation tag information when
the feature is enabled.

CONFIG_MEM_ALLOC_PROFILING_DEBUG is provided for debugging the memory
allocation profiling instrumentation.

Memory allocation profiling can be enabled or disabled at runtime using
/proc/sys/vm/mem_profiling sysctl when CONFIG_MEM_ALLOC_PROFILING_DEBUG=n.
CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT enables memory allocation
profiling by default.

[surenb@google.com: Documentation/filesystems/proc.rst: fix allocinfo title]
  Link: https://lkml.kernel.org/r/20240326073813.727090-1-surenb@google.com
[surenb@google.com: do limited memory accounting for modules with ARCH_NEEDS_WEAK_PER_CPU]
  Link: https://lkml.kernel.org/r/20240402180933.1663992-2-surenb@google.com
[klarasmodin@gmail.com: explicitly include irqflags.h in alloc_tag.h]
  Link: https://lkml.kernel.org/r/20240407133252.173636-1-klarasmodin@gmail.com
[surenb@google.com: fix alloc_tag_init() to prevent passing NULL to PTR_ERR()]
  Link: https://lkml.kernel.org/r/20240417003349.2520094-1-surenb@google.com
Link: https://lkml.kernel.org/r/20240321163705.3067592-14-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Co-developed-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Klara Modin <klarasmodin@gmail.com>
Tested-by: Kees Cook <keescook@chromium.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alex Gaynor <alex.gaynor@gmail.com>
Cc: Alice Ryhl <aliceryhl@google.com>
Cc: Andreas Hindborg <a.hindborg@samsung.com>
Cc: Benno Lossin <benno.lossin@proton.me>
Cc: "Björn Roy Baron" <bjorn3_gh@protonmail.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Gary Guo <gary@garyguo.net>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wedson Almeida Filho <wedsonaf@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-04-25 20:55:52 -07:00
Sean Christopherson
ce0abef6a1 cpu: Ignore "mitigations" kernel parameter if CPU_MITIGATIONS=n
Explicitly disallow enabling mitigations at runtime for kernels that were
built with CONFIG_CPU_MITIGATIONS=n, as some architectures may omit code
entirely if mitigations are disabled at compile time.

E.g. on x86, a large pile of Kconfigs are buried behind CPU_MITIGATIONS,
and trying to provide sane behavior for retroactively enabling mitigations
is extremely difficult, bordering on impossible.  E.g. page table isolation
and call depth tracking require build-time support, BHI mitigations will
still be off without additional kernel parameters, etc.

  [ bp: Touchups. ]

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20240420000556.2645001-3-seanjc@google.com
2024-04-25 15:47:39 +02:00
Maíra Canal
b413f9cd4c mm: Update shuffle documentation to match its current state
Commit 839195352d ("mm/shuffle: remove dynamic reconfiguration")
removed the dynamic reconfiguration capabilities from the shuffle page
allocator. As a result, there is no longer any "autodetection of
memory-side-cache" that triggers the enablement of the shuffle page
allocator.

Therefore, let the documentation reflect that the only way to enable
the shuffle page allocator is by setting `page_alloc.shuffle=1`.
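
As a rough way to check whether such a boot parameter was passed, the sketch below looks up a name in a kernel command line string. The helper boot_param() and the sample command line are hypothetical; on a live system the string would come from /proc/cmdline. Quoted values containing spaces are not handled.

```python
def boot_param(cmdline, name):
    """Return the value of `name` in a kernel command line.

    Returns the string after "=", "" for a bare flag, or None if absent.
    """
    value = None
    for token in cmdline.split():
        key, _, val = token.partition("=")
        if key == name:
            value = val  # the last occurrence wins in this sketch
    return value


# Made-up command line; on a live system, read /proc/cmdline instead.
sample = "root=/dev/sda1 ro page_alloc.shuffle=1 quiet"
print(boot_param(sample, "page_alloc.shuffle"))
```

A return value of None means the parameter was not given at all, which in the situation described above means the shuffle page allocator stays disabled.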

Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20240422142007.1062231-1-mcanal@igalia.com
2024-04-24 13:05:01 -06:00
Thomas Weißschuh
8af2d1ab78 admin-guide/hw-vuln/core-scheduling: fix return type of PR_SCHED_CORE_GET
sched_core_share_pid() copies the cookie to userspace with
put_user(id, (u64 __user *)uaddr), expecting 64 bits of space.
However, the "unsigned long" datatype that is documented in
core-scheduling.rst is only 32 bits wide on 32-bit architectures.

Document "unsigned long long" as the correct data type, which is always
64 bits wide.

This matches what the selftest cs_prctl_test.c has been doing all along.
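
The size difference can be illustrated from userspace with a small sketch. It only inspects C type sizes on the machine running it through ctypes and does not call prctl(); the cookie value is a hypothetical one chosen to need more than 32 bits.

```python
import ctypes

# Size of the two C types on the platform running this script.
ULONG_BYTES = ctypes.sizeof(ctypes.c_ulong)
ULONGLONG_BYTES = ctypes.sizeof(ctypes.c_ulonglong)


def fits_in(value, nbytes):
    """True if the unsigned integer `value` fits in `nbytes` bytes."""
    return 0 <= value < (1 << (8 * nbytes))


# Hypothetical 64-bit cookie value, chosen to need more than 32 bits.
cookie = 0x1_0000_0001

print("unsigned long:", ULONG_BYTES, "bytes")
print("unsigned long long:", ULONGLONG_BYTES, "bytes")
print("cookie fits in 4 bytes:", fits_in(cookie, 4))
print("cookie fits in 8 bytes:", fits_in(cookie, 8))
```

On a 32-bit build the first size is 4, so a buffer declared as "unsigned long" would be too small for the 64 bits the kernel writes.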

Fixes: 0159bb020c ("Documentation: Add usecases, design and interface for core scheduling")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/util-linux/df7a25a0-7923-4f8b-a527-5e6f0064074d@t-8ch.de/
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Reviewed-by: Chris Hyser <chris.hyser@oracle.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20240423-core-scheduling-cookie-v1-1-5753a35f8dfc@weissschuh.net
2024-04-24 13:04:27 -06:00
Vincent Guittot
97450eb909 sched/pelt: Remove shift of thermal clock
The optional shift of the clock used by thermal/hw load avg was
introduced to handle the case where the signal was not always a high frequency
hw signal. Now that cpufreq provides a signal for firmware and
SW pressure, we can remove this exception and always keep this PELT signal
aligned with other signals.
Mark sysctl_sched_migration_cost boot parameter as deprecated

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Lukasz Luba <lukasz.luba@arm.com>
Reviewed-by: Qais Yousef <qyousef@layalina.io>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://lore.kernel.org/r/20240326091616.3696851-6-vincent.guittot@linaro.org
2024-04-24 12:08:02 +02:00
Greg Kroah-Hartman
660a708098 Merge 6.9-rc5 into tty-next
We want the tty fixes in here as well, and it resolves a merge conflict
in:
	drivers/tty/serial/serial_core.c
as well.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-04-23 13:24:45 +02:00
Linus Torvalds
4d2008430c A set of updates from Thorsten to his (new) guide to verifying bugs and
tracking down regressions.

Merge tag 'docs-6.9-fixes2' of git://git.lwn.net/linux

Pull documentation fixes from Jonathan Corbet:
 "A set of updates from Thorsten to his (new) guide to verifying bugs
  and tracking down regressions"

* tag 'docs-6.9-fixes2' of git://git.lwn.net/linux:
  docs: verify/bisect: stable regressions: first stable, then mainline
  docs: verify/bisect: describe how to use a build host
  docs: verify/bisect: explain testing reverts, patches and newer code
  docs: verify/bisect: proper headlines and more spacing
  docs: verify/bisect: add and fetch stable branches ahead of time
  docs: verify/bisect: use git switch, tag kernel, and various fixes
2024-04-22 09:41:03 -07:00
Jonathan Cameron
556da13434 Documentation: qcom-pmu: Use /sys/bus/event_source/devices paths
To allow setting an appropriate parent for the struct pmu device,
remove existing references to the /sys/devices/ path.
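
For scripts that used to look under /sys/devices/, a minimal sketch of enumerating PMUs through the event_source bus instead might look like this. The helper list_pmus() is hypothetical, and the names it prints depend entirely on the hardware and drivers present.

```python
import os


def list_pmus(base="/sys/bus/event_source/devices"):
    """Return the sorted PMU device names under `base`, or [] if missing."""
    try:
        return sorted(os.listdir(base))
    except OSError:
        return []


for name in list_pmus():
    print(name)
```

Because the entries under the bus directory are symlinks to wherever the device actually lives, this lookup keeps working when the PMU device gains a different parent.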

Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20240412161057.14099-14-Jonathan.Cameron@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
2024-04-19 15:59:30 +01:00
Jonathan Cameron
90b4a1a927 Documentation: thunderx2-pmu: Use /sys/bus/event_source/devices paths
To allow setting an appropriate parent for the struct pmu device,
remove existing references to the /sys/devices/ path.

Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20240412161057.14099-11-Jonathan.Cameron@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
2024-04-19 15:59:29 +01:00
Jonathan Cameron
867ba6d204 Documentation: xgene-pmu: Use /sys/bus/event_source/devices paths
To allow setting an appropriate parent for the struct pmu device,
remove existing references to the /sys/devices/ path.

Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20240412161057.14099-9-Jonathan.Cameron@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
2024-04-19 15:59:29 +01:00
Jonathan Cameron
eff6af5313 Documentation: hns-pmu: Use /sys/bus/event_source/devices paths
To allow setting an appropriate parent for the struct pmu device,
remove existing references to the /sys/devices/ path.

Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20240412161057.14099-5-Jonathan.Cameron@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
2024-04-19 15:59:28 +01:00
Jonathan Cameron
d0412b6ecb Documentation: hisi-pmu: Drop reference to /sys/devices path
Having assigned a parent to the device, the suggested path is
no longer valid.  As a /sys/bus/event_source based path is also
provided, simply drop mention of the alternative.

Reviewed-by: Yicong Yang <yangyicong@hisilicon.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20240412161057.14099-3-Jonathan.Cameron@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
2024-04-19 15:59:28 +01:00
Xiu Jianfeng
c9169291be docs, cgroup: add entries for pids to cgroup-v2.rst
This patch adds two entries (pids.peak and pids.events) for the pids
controller, and also updates pids.current because it exists on non-root
cgroups.
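
A small sketch of how these files could be consumed, assuming pids.events uses the usual cgroup v2 flat-keyed layout ("key value" per line) and pids.peak holds a single number. The helper names and sample contents are made up; on a live system the text would be read from the cgroup's directory.

```python
def parse_flat_keyed(text):
    """Parse a flat-keyed cgroup v2 file ("key value" per line) into a dict."""
    result = {}
    for line in text.splitlines():
        key, _, value = line.partition(" ")
        if key and value.strip().isdigit():
            result[key] = int(value)
    return result


def parse_single_value(text):
    """Parse a single-value cgroup v2 file such as pids.peak."""
    return int(text.strip())


# Made-up file contents for illustration.
events = parse_flat_keyed("max 3\n")
peak = parse_single_value("17\n")
print(events, peak)
```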

Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2024-04-18 06:00:58 -10:00
Alexander Gordeev
54f2ecc318 s390: Map kernel at fixed location when KASLR is disabled
Since kernel virtual and physical address spaces are
uncoupled, the kernel is mapped at the top of the virtual
address space when KASLR is disabled.

That does not pose any issue with regard to the kernel
booting and operation, but makes it difficult to use a
generated vmlinux with some debugging tools (e.g. gdb),
because the exact location of the kernel image in virtual
memory is unknown. Make that location known and introduce
CONFIG_KERNEL_IMAGE_BASE configuration option.

A custom CONFIG_KERNEL_IMAGE_BASE value that would break
the virtual memory layout leads to a build error.

The kernel image size is defined by KERNEL_IMAGE_SIZE
macro and set to 512 MB, by analogy with x86.

Suggested-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2024-04-17 13:38:02 +02:00
Mikulas Patocka
5268de78e1 dm-crypt: add the optional "high_priority" flag
When WQ_HIGHPRI was used for the dm-crypt kcryptd workqueue it was
reported that dm-crypt performs badly when the system is loaded[1].
Because of reports of audio skipping, dm-crypt stopped using
WQ_HIGHPRI with commit f612b2132d (Revert "dm crypt: use WQ_HIGHPRI
for the IO and crypt workqueues").

But it has since been determined that WQ_HIGHPRI provides improved
performance (with reduced latency) for high-end systems that have many more
resources than the laptop/desktop systems whose users suffered from the use
of WQ_HIGHPRI.

As such, add an option "high_priority" that allows the use of
WQ_HIGHPRI for dm-crypt's workqueues and also sets the write_thread to
nice level MIN_NICE (-20). This commit makes it optional, so that
normal users won't be harmed by it.

[1] https://listman.redhat.com/archives/dm-devel/2023-February/053410.html

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2024-04-16 11:34:47 -04:00
Uladzislau Rezki (Sony)
988f569ae0 rcu: Reduce synchronize_rcu() latency
A call to synchronize_rcu() can be optimized from a latency
point of view. Workloads which depend on this can benefit from it.

The delay of wakeme_after_rcu() callback, which unblocks a waiter,
depends on several factors:

- how fast a process of offloading is started. Combination of:
    - !CONFIG_RCU_NOCB_CPU/CONFIG_RCU_NOCB_CPU;
    - !CONFIG_RCU_LAZY/CONFIG_RCU_LAZY;
    - other.
- when started, invoking path is interrupted due to:
    - time limit;
    - need_resched();
    - if limit is reached.
- where in a nocb list it is located;
- how fast previous callbacks completed;

Example:

1. On our embedded devices I can easily trigger a scenario in which
it is the last in a list of ~3600 callbacks:

<snip>
  <...>-29      [001] d..1. 21950.145313: rcu_batch_start: rcu_preempt CBs=3613 bl=28
...
  <...>-29      [001] ..... 21950.152578: rcu_invoke_callback: rcu_preempt rhp=00000000b2d6dee8 func=__free_vm_area_struct.cfi_jt
  <...>-29      [001] ..... 21950.152579: rcu_invoke_callback: rcu_preempt rhp=00000000a446f607 func=__free_vm_area_struct.cfi_jt
  <...>-29      [001] ..... 21950.152580: rcu_invoke_callback: rcu_preempt rhp=00000000a5cab03b func=__free_vm_area_struct.cfi_jt
  <...>-29      [001] ..... 21950.152581: rcu_invoke_callback: rcu_preempt rhp=0000000013b7e5ee func=__free_vm_area_struct.cfi_jt
  <...>-29      [001] ..... 21950.152582: rcu_invoke_callback: rcu_preempt rhp=000000000a8ca6f9 func=__free_vm_area_struct.cfi_jt
  <...>-29      [001] ..... 21950.152583: rcu_invoke_callback: rcu_preempt rhp=000000008f162ca8 func=wakeme_after_rcu.cfi_jt
  <...>-29      [001] d..1. 21950.152625: rcu_batch_end: rcu_preempt CBs-invoked=3612 idle=....
<snip>

2. We use cpuset/cgroup to classify tasks and assign them to
different cgroups. For example, the "background" group binds tasks
only to little CPUs, while "foreground" makes use of all CPUs.
Tasks can be migrated between groups on request if acceleration
is needed.

Below is an example of how the "surfaceflinger" task gets migrated.
Initially it is located in the "system-background" cgroup, which
allows it to run only on little cores. In order to speed it up, it
can be temporarily moved into the "foreground" cgroup, which allows
it to use big/all CPUs:

cgroup_attach_task():
 -> cgroup_migrate_execute()
   -> cpuset_can_attach()
     -> percpu_down_write()
       -> rcu_sync_enter()
         -> synchronize_rcu()
   -> now move tasks to the new cgroup.
 -> cgroup_migrate_finish()

<snip>
         rcuop/1-29      [000] .....  7030.528570: rcu_invoke_callback: rcu_preempt rhp=00000000461605e0 func=wakeme_after_rcu.cfi_jt
    PERFD-SERVER-1855    [000] d..1.  7030.530293: cgroup_attach_task: dst_root=3 dst_id=22 dst_level=1 dst_path=/foreground pid=1900 comm=surfaceflinger
   TimerDispatch-2768    [002] d..5.  7030.537542: sched_migrate_task: comm=surfaceflinger pid=1900 prio=98 orig_cpu=0 dest_cpu=4
<snip>

"Boosting a task" depends on synchronize_rcu() latency:

- first trace shows a completion of synchronize_rcu();
- second shows attaching a task to a new group;
- last shows a final step when migration occurs.

3. To address this drawback, maintain a separate track that consists
of synchronize_rcu() callers only. After completion of a grace period
users are deferred to a dedicated worker to process requests.

4. This patch reduces the latency of synchronize_rcu() by approximately
30-40% on synthetic tests. The real test case, camera launch time,
shows (time is in milliseconds):

1-run 542 vs 489 improvement 9%
2-run 540 vs 466 improvement 13%
3-run 518 vs 468 improvement 9%
4-run 531 vs 457 improvement 13%
5-run 548 vs 475 improvement 13%
6-run 509 vs 484 improvement 4%

Synthetic test(no "noise" from other callbacks):
Hardware: x86_64 64 CPUs, 64GB of memory
Linux-6.6

- 10K tasks(simultaneous);
- each task does(1000 loops)
     synchronize_rcu();
     kfree(p);

default: CONFIG_RCU_NOCB_CPU: takes 54 seconds to complete all users;
patch: CONFIG_RCU_NOCB_CPU: takes 35 seconds to complete all users.

Running 60K tasks gives approximately the same results on my setup. Please note
that this is without any interaction with other types of callbacks; otherwise
the default case would be impacted a lot.
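
The percentages quoted in item 4 can be reproduced with a short calculation. This sketch assumes the figures were truncated (not rounded) to whole percent, which is what matches the listed values; the function name is made up.

```python
def improvement_percent(before, after):
    """Relative reduction from `before` to `after`, truncated to whole percent."""
    return (before - after) * 100 // before


# Camera launch times in milliseconds (before, after), taken from the runs above.
camera_runs_ms = [(542, 489), (540, 466), (518, 468),
                  (531, 457), (548, 475), (509, 484)]
camera = [improvement_percent(b, a) for b, a in camera_runs_ms]

# Synthetic test: 54 seconds by default vs 35 seconds with the patch.
synthetic = improvement_percent(54, 35)

print(camera)     # [9, 13, 9, 13, 13, 4]
print(synthetic)  # 35
```

The synthetic result of 35% falls inside the 30-40% range stated for synthetic tests.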

5. By default it is disabled. To enable it, do one of the
following:

echo 1 > /sys/module/rcutree/parameters/rcu_normal_wake_from_gp
or pass the boot parameter "rcutree.rcu_normal_wake_from_gp=1"
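
To confirm which setting is in effect, the parameter can be read back from sysfs. The sketch below uses a hypothetical helper, read_module_param(), with an overridable sysfs root so it can be tried against a scratch directory; the exact text the kernel reports depends on how the parameter is declared.

```python
import os


def read_module_param(module, name, sysfs_root="/sys/module"):
    """Return a module parameter's value as a stripped string, or None."""
    path = os.path.join(sysfs_root, module, "parameters", name)
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return None


print(read_module_param("rcutree", "rcu_normal_wake_from_gp"))
```

A result of None means the file could not be read, for example on a kernel that predates this patch.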

Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Co-developed-by: Neeraj Upadhyay (AMD) <neeraj.iitr10@gmail.com>
Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.iitr10@gmail.com>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
2024-04-15 19:47:49 +02:00
Thorsten Leemhuis
8d939ae349 docs: verify/bisect: stable regressions: first stable, then mainline
Rearrange the instructions so that readers facing a regression within a
stable or longterm series first test its latest release before testing
mainline. This is less scary for some people. It also reduces the chance
that something goes sideways for readers that compile their first
kernel, as mainline can cause slightly more trouble.

Signed-off-by: Thorsten Leemhuis <linux@leemhuis.info>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/efd3cb9c68db450091021326bf9c334553df0ec2.1712647788.git.linux@leemhuis.info
2024-04-15 09:41:56 -06:00
Thorsten Leemhuis
2bcfd71e8d docs: verify/bisect: describe how to use a build host
Describe how to build kernels on another system (with and without
cross-compiling), as building locally can be quite painful on some
slow systems. This is done in an add-on section, as it would make the
step-by-step guide too complicated if this special case were
described there.

Signed-off-by: Thorsten Leemhuis <linux@leemhuis.info>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/288160cb4769e46a3280250ca71da0abc4aa002d.1712647788.git.linux@leemhuis.info
2024-04-15 09:41:56 -06:00
Thorsten Leemhuis
a421835a2a docs: verify/bisect: explain testing reverts, patches and newer code
Rename 'Supplementary tasks' to 'Complementary tasks' while introducing
a section 'Optional tasks: test reverts, patches, or later versions':
the latter is something readers occasionally will have to do after
reporting a bug and thus is best covered here.

Signed-off-by: Thorsten Leemhuis <linux@leemhuis.info>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/dacf26a4c48e9e8f04ecbc77e0a74c9b2a6a1103.1712647788.git.linux@leemhuis.info
2024-04-15 09:41:56 -06:00
Thorsten Leemhuis
453de3207f docs: verify/bisect: proper headlines and more spacing
Various small improvements and fixes:

* Separate ref links from their target with a space for better
  readability.

* Add a proper heading for the note at the end of the step-by-step
  guide.

* Use proper 3rd and 4th level headlines in the reference section and
  add short intros for the 2nd level headlines that lacked one.

Signed-off-by: Thorsten Leemhuis <linux@leemhuis.info>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/f59f0f235a2192ed93899a7338153e4cb71075f0.1712647788.git.linux@leemhuis.info
2024-04-15 09:41:56 -06:00
Thorsten Leemhuis
932c9a5398 docs: verify/bisect: add and fetch stable branches ahead of time
Add and fetch all required stable branches ahead of time. This fixes a
bug, as readers that wanted to bisect a regression within a stable or
longterm series otherwise did not have them available at the right time.
This way also matches the flow somewhat better and avoids some "if you
haven't already added it" phrases that otherwise become necessary in
future changes.

Signed-off-by: Thorsten Leemhuis <linux@leemhuis.info>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/57dcf312959476abe6151bf3d35eb79e3e9a83d1.1712647788.git.linux@leemhuis.info
2024-04-15 09:41:56 -06:00
Thorsten Leemhuis
abbb99301e docs: verify/bisect: use git switch, tag kernel, and various fixes
Various small improvements and fixes:

* Use the more modern 'git switch' instead of 'git checkout', which
  makes it more obvious what's happening (among other things because
  the --discard-changes parameter is clearer than --force).

* Provide a hint as to what a mainline version number and one from a
  stable series look like.

* When trying to validate the bisection result with a revert, add a
  special tag to facilitate the identification.

* Sync version numbers used in various examples for consistency: stick
  to 6.0.13, 6.0.15, and 6.1.5.

* Fix a few typos and oddities.
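
To illustrate the version-number hint mentioned in the list above, here is a
minimal sketch that tells the two shapes apart. It assumes the numbering used
since Linux 3.0: a mainline release has two components (for example 6.1,
optionally with an -rcN suffix), while a stable or longterm release adds a
third (for example 6.1.5). The function name classify_version and the
optional leading "v" are illustrative choices, not taken from the guide.

```python
import re

# Assumed shapes (Linux 3.0 and later):
#   mainline:         "6.1" or "6.1-rc3"
#   stable/longterm:  "6.1.5"
# A leading "v" (as in git tags) is accepted for convenience.
_MAINLINE = re.compile(r"v?\d+\.\d+(-rc\d+)?")
_STABLE = re.compile(r"v?\d+\.\d+\.\d+")


def classify_version(version: str) -> str:
    """Return 'stable', 'mainline', or 'unknown' for a version string."""
    if _STABLE.fullmatch(version):
        return "stable"
    if _MAINLINE.fullmatch(version):
        return "mainline"
    return "unknown"
```

With the example versions named in this commit, 6.0.13, 6.0.15, and 6.1.5
would all be classified as stable, while 6.1 would be classified as mainline.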

Signed-off-by: Thorsten Leemhuis <linux@leemhuis.info>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/85029aa004447b0eeb5043fb014630f2acafacec.1712647788.git.linux@leemhuis.info
2024-04-15 09:41:55 -06:00
Pasha Tatashin
212c5c078d iommu: account IOMMU allocated memory
In order to be able to limit the amount of memory that is allocated
by the IOMMU subsystem, the memory must be accounted.

Account IOMMU as part of the secondary pagetables as it was discussed
at LPC.

The value of SecPageTables now contains memory allocated by IOMMU
and KVM.

There is a difference between GFP_ACCOUNT and what NR_IOMMU_PAGES shows.
GFP_ACCOUNT is set only where it makes sense to charge to user
processes, i.e. IOMMU Page Tables, but there is more IOMMU shared data
that should not really be charged to a specific process.
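
As a rough illustration of how the accounted value could be inspected, the
sketch below extracts the SecPageTables field from text in the format of
/proc/meminfo. It assumes the usual "Name:   value kB" line layout; the
function name sec_page_tables_kb is hypothetical and not part of this commit.

```python
from typing import Optional


def sec_page_tables_kb(meminfo_text: str) -> Optional[int]:
    """Return the SecPageTables value in kB, or None if the field is absent.

    Assumes /proc/meminfo-style lines such as "SecPageTables:   128 kB".
    """
    for line in meminfo_text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and name.strip() == "SecPageTables":
            fields = rest.split()
            if fields:
                return int(fields[0])
    return None
```

On a Linux system, the function could be fed the contents of /proc/meminfo;
on kernels without the field it simply returns None.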

Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Acked-by: David Rientjes <rientjes@google.com>
Tested-by: Bagas Sanjaya <bagasdotme@gmail.com>
Link: https://lore.kernel.org/r/20240413002522.1101315-12-pasha.tatashin@soleen.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-04-15 14:31:48 +02:00
Josh Poimboeuf
36d4fe147c x86/bugs: Remove CONFIG_BHI_MITIGATION_AUTO and spectre_bhi=auto
Unlike most other mitigations' "auto" options, spectre_bhi=auto only
mitigates newer systems, which is confusing and not particularly useful.

Remove it.

Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/412e9dc87971b622bbbaf64740ebc1f140bff343.1712813475.git.jpoimboe@kernel.org
2024-04-12 12:05:54 +02:00
Sreenath Vijayan
1b743485e2 tty/sysrq: Replay kernel log messages on consoles via sysrq
When the terminal is unresponsive, one cannot use dmesg to view
the printk ring buffer messages. Also, syslog services may be
disabled, especially on embedded systems, so the messages cannot be
checked after a reboot. In this scenario, replay the messages in the
printk ring buffer on consoles via sysrq by pressing sysrq+R.

The console loglevel determines which kernel log messages
are displayed. The messages are displayed only when
console_trylock() succeeds. Users can repeat the sysrq key when
it fails. If the owner of the console subsystem lock is stuck,
repeating the key won't work.
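
On systems without a keyboard attached, the same request can presumably be
made by writing the command character to /proc/sysrq-trigger. The sketch
below rests on that assumption, on the key being the uppercase R named above,
and on root privileges plus a kernel that contains this commit. The function
name request_log_replay is hypothetical.

```python
def request_log_replay(trigger_path: str = "/proc/sysrq-trigger") -> None:
    """Ask the kernel to replay its log messages on the consoles.

    Assumption: writing "R" to the sysrq trigger file is equivalent to
    pressing sysrq+R. If console_trylock() fails inside the kernel, nothing
    is replayed and the caller may simply call this function again.
    """
    with open(trigger_path, "w") as trigger:
        trigger.write("R")
```

The path is a parameter so that the write can be exercised against an
ordinary file before it is pointed at the real trigger.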

Suggested-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Shimoyashiki Taichi <taichi.shimoyashiki@sony.com>
Reviewed-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sreenath Vijayan <sreenath.vijayan@sony.com>
Link: https://lore.kernel.org/r/cc3b9b1aae60a236c6aed1dc7b0ffa2c7cd1f183.1710220326.git.sreenath.vijayan@sony.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-04-11 14:22:52 +02:00
Josh Poimboeuf
5f882f3b0a x86/bugs: Clarify that syscall hardening isn't a BHI mitigation
While syscall hardening helps prevent some BHI attacks, there's still
other low-hanging fruit remaining.  Don't classify it as a mitigation
and make it clear that the system may still be vulnerable if it doesn't
have a HW or SW mitigation enabled.

Fixes: ec9404e40e ("x86/bhi: Add BHI mitigation knob")
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/b5951dae3fdee7f1520d5136a27be3bdfe95f88b.1712813475.git.jpoimboe@kernel.org
2024-04-11 10:30:33 +02:00
Josh Poimboeuf
dfe648903f x86/bugs: Fix BHI documentation
Fix up some inaccuracies in the BHI documentation.

Fixes: ec9404e40e ("x86/bhi: Add BHI mitigation knob")
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/8c84f7451bfe0dd08543c6082a383f390d4aa7e2.1712813475.git.jpoimboe@kernel.org
2024-04-11 10:30:25 +02:00