Currently, amd_pstate_update_freq passes the hardware perf limits as
min/max_perf to amd_pstate_update, which eventually gets programmed into
the min/max_perf fields of the CPPC_REQ register.
Instead pass the effective perf limits i.e. min/max_limit_perf values to
amd_pstate_update as min/max_perf.
Signed-off-by: Dhananjay Ugwekar <dhananjay.ugwekar@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Link: https://lore.kernel.org/r/20250205112523.201101-6-dhananjay.ugwekar@amd.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
des_perf is later on clamped between min_perf and max_perf in
amd_pstate_update. So, remove the redundant clamping from
amd_pstate_adjust_perf.
Signed-off-by: Dhananjay Ugwekar <dhananjay.ugwekar@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Link: https://lore.kernel.org/r/20250205112523.201101-5-dhananjay.ugwekar@amd.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Instead of setting a fixed floor at lowest_nonlinear_perf, use the
min_limit_perf value, so that it gives the user the freedom to lower the
floor further.
There are two minimum frequency/perf limits that we need to consider in
the adjust_perf callback. One provided by schedutil i.e. the sg_cpu->bw_min
value passed in _min_perf arg, another is the effective value of
min_freq_qos request that is updated in cpudata->min_limit_perf. Modify the
code to use the bigger of these two values.
Signed-off-by: Dhananjay Ugwekar <dhananjay.ugwekar@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Link: https://lore.kernel.org/r/20250205112523.201101-4-dhananjay.ugwekar@amd.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Move the invocation of intel_pstate_platform_pwr_mgmt_exists() before
checking whether or not HWP is enabled because it does not depend on
any code running before it except for the vendor check and if CPU
performance scaling is going to be carried out by the platform, all of
the code that runs before that function (again, except for the vendor
check) is redundant.
This is not expected to alter any functionality except for the ordering
of messages printed by intel_pstate_init() when it is going to return an
error before attempting to register the driver.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Link: https://patch.msgid.link/2776745.mvXUDI8C0e@rjwysocki.net
We observed an issue that the CPU frequency can't raise up with a 100% CPU
load when NOHZ is off and the 'conservative' governor is selected.
'idle_time' can be negative if it's obtained from get_cpu_idle_time_jiffy()
when NOHZ is off. This was found and explained in commit 9485e4ca0b
("cpufreq: governor: Fix handling of special cases in dbs_update()").
However, commit 7592019634 ("cpufreq: governors: Fix long idle detection
logic in load calculation") introduced a comparison between 'idle_time' and
'samling_rate' to detect a long idle interval. While 'idle_time' is
converted to int before comparison, it's actually promoted to unsigned
again when compared with an unsigned 'sampling_rate'. Hence, this leads to
wrong idle interval detection when it's in fact 100% busy and sets
policy_dbs->idle_periods to a very large value. 'conservative' adjusts the
frequency to minimum because of the large 'idle_periods', such that the
frequency can't raise up. 'Ondemand' doesn't use policy_dbs->idle_periods
so it fortunately avoids the issue.
Correct negative 'idle_time' to 0 before any use of it in dbs_update().
Fixes: 7592019634 ("cpufreq: governors: Fix long idle detection logic in load calculation")
Signed-off-by: Jie Zhan <zhanjie9@hisilicon.com>
Reviewed-by: Chen Yu <yu.c.chen@intel.com>
Link: https://patch.msgid.link/20250213035510.2402076-1-zhanjie9@hisilicon.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This frequency was disabled because of stability problems whose source could
not be accurately identified[1]. After seven months of testing, the evidence
points to an incorrectly configured bootloader as the source of the historical
instability. Testing was performed on two A3720 devices with this frequency
enabled and the ondemand policy in use. Marvell merged[2] changes to their
bootloader source needed to address the stability issue. This driver should
expose this frequency option to users.
[1] 484f2b7c61
[2] https://github.com/MarvellEmbeddedProcessors/mv-ddr-marvell/pull/44
Signed-off-by: Benjamin Schneider <ben@bens.haus>
Reviewed-by: Pali Rohár <pali@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Capacity-aware scheduling (CAS) is enabled by default by intel_pstate on
hybrid systems without SMT, but in some usage scenarios it may be more
attractive to place tasks for maximum CPU performance regardless of the
extra cost in terms of energy, which is the case on such systems when
CAS is not enabled, so introduce a command line option to forbid
intel_pstate to enable CAS.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by:Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Link: https://patch.msgid.link/2781262.mvXUDI8C0e@rjwysocki.net
Currently the CPUFreq core exposes two sysfs attributes that can be used
to query current frequency of a given CPU(s): namely cpuinfo_cur_freq
and scaling_cur_freq. Both provide slightly different view on the
subject and they do come with their own drawbacks.
cpuinfo_cur_freq provides higher precision though at a cost of being
rather expensive. Moreover, the information retrieved via this attribute
is somewhat short lived as frequency can change at any point of time
making it difficult to reason from.
scaling_cur_freq, on the other hand, tends to be less accurate but then
the actual level of precision (and source of information) varies between
architectures making it a bit ambiguous.
The new attribute, cpuinfo_avg_freq, is intended to provide more stable,
distinct interface, exposing an average frequency of a given CPU(s), as
reported by the hardware, over a time frame spanning no more than a few
milliseconds. As it requires appropriate hardware support, this
interface is optional.
Note that under the hood, the new attribute relies on the information
provided by arch_freq_get_on_cpu, which, up to this point, has been
feeding data for scaling_cur_freq attribute, being the source of
ambiguity when it comes to interpretation. This has been amended by
restoring the intended behavior for scaling_cur_freq, with a new
dedicated config option to maintain status quo for those, who may need
it.
CC: Jonathan Corbet <corbet@lwn.net>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@redhat.com>
CC: Borislav Petkov <bp@alien8.de>
CC: Dave Hansen <dave.hansen@linux.intel.com>
CC: H. Peter Anvin <hpa@zytor.com>
CC: Phil Auld <pauld@redhat.com>
CC: x86@kernel.org
CC: linux-doc@vger.kernel.org
Signed-off-by: Beata Michalska <beata.michalska@arm.com>
Reviewed-by: Prasanna Kumar T S M <ptsm@linux.microsoft.com>
Reviewed-by: Sumit Gupta <sumitg@nvidia.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Link: https://lore.kernel.org/r/20250131162439.3843071-3-beata.michalska@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Allow arch_freq_get_on_cpu to return an error for cases when retrieving
current CPU frequency is not possible, whether that being due to lack of
required arch support or due to other circumstances when the current
frequency cannot be determined at given point of time.
Signed-off-by: Beata Michalska <beata.michalska@arm.com>
Reviewed-by: Prasanna Kumar T S M <ptsm@linux.microsoft.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Link: https://lore.kernel.org/r/20250131162439.3843071-2-beata.michalska@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The boost feature can be controlled at two levels currently, driver
level (applies to all policies) and per-policy.
Currently the driver enables driver level boost support from the
per-policy ->init() callback, which isn't really efficient as that gets
called for each policy and then there is online/offline path too where
this gets done unnecessarily.
Instead set the .set_boost field directly and always enable the boost
support. If a policy doesn't support boost feature, the core will not
enable it for that policy.
Keep the initial state of driver level boost to disabled and let the
user enable it if required as ideally the boost frequencies must be used
only when really required.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
The boost feature can be controlled at two levels currently, driver
level (applies to all policies) and per-policy.
Currently the driver enables driver level boost support from the
per-policy ->init() callback, which isn't really efficient as that gets
called for each policy and then there is online/offline path too where
this gets done unnecessarily.
Instead set the .set_boost field directly and always enable the boost
support. If a policy doesn't support boost feature, the core will not
enable it for that policy.
Keep the initial state of driver level boost to disabled and let the
user enable it if required as ideally the boost frequencies must be used
only when really required.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
The boost feature can be controlled at two levels currently, driver
level (applies to all policies) and per-policy.
Currently the driver enables driver level boost support from the
per-policy ->init() callback, which isn't really efficient as that gets
called for each policy and then there is online/offline path too where
this gets done unnecessarily.
Instead set the .set_boost field directly and always enable the boost
support. If a policy doesn't support boost feature, the core will not
enable it for that policy.
Keep the initial state of driver level boost to disabled and let the
user enable it if required as ideally the boost frequencies must be used
only when really required.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Sudeep Holla <sudeep.holla@arm.com>
The boost feature can be controlled at two levels currently, driver
level (applies to all policies) and per-policy.
Currently the driver enables driver level boost support from the
per-policy ->init() callback, which isn't really efficient as that gets
called for each policy and then there is online/offline path too where
this gets done unnecessarily.
Instead set the .set_boost field directly and always enable the boost
support. If a policy doesn't support boost feature, the core will not
enable it for that policy.
Keep the initial state of driver level boost to disabled and let the
user enable it if required as ideally the boost frequencies must be used
only when really required.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
The boost feature can be controlled at two levels currently, driver
level (applies to all policies) and per-policy.
Currently the driver enables driver level boost support from the
per-policy ->init() callback, which isn't really efficient as that gets
called for each policy and then there is online/offline path too where
this gets done unnecessarily.
Instead set the .set_boost field directly and always enable the boost
support. If a policy doesn't support boost feature, the core will not
enable it for that policy.
Keep the initial state of driver level boost to disabled and let the
user enable it if required as ideally the boost frequencies must be used
only when really required.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
The boost feature can be controlled at two levels currently, driver
level (applies to all policies) and per-policy.
Currently the driver enables driver level boost support from the
per-policy ->init() callback, which isn't really efficient as that gets
called for each policy and then there is online/offline path too where
this gets done unnecessarily.
Instead set the .set_boost field directly and always enable the boost
support. If a policy doesn't support boost feature, the core will not
enable it for that policy.
Keep the initial state of driver level boost to disabled and let the
user enable it if required as ideally the boost frequencies must be used
only when really required.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
It is possible to have a scenario where not all cpufreq policies support
boost frequencies. And letting sysfs (or other parts of the kernel)
enable boost feature for that policy isn't correct.
Now that all drivers (that required a change) are updated to set the
policy->boost_supported properly, check this flag before enabling boost
feature for a policy.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
With a later commit, the cpufreq core will call the ->set_boost()
callback only if the policy supports boost frequency. The
boost_supported flag is set by the cpufreq core if policy->freq_table is
set and one or more boost frequencies are present.
For other drivers, the flag must be set explicitly.
With this, the local variable boost_supported isn't required anymore.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
With a later commit, the cpufreq core will call the ->set_boost()
callback only if the policy supports boost frequency. The
boost_supported flag is set by the cpufreq core if policy->freq_table is
set and one or more boost frequencies are present.
For other drivers, the flag must be set explicitly.
The policy->boost_enabled flag is set by the cpufreq core once the
policy is initialized, don't set it anymore.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
With a later commit, the cpufreq core will call the ->set_boost()
callback only if the policy supports boost frequency. The
boost_supported flag is set by the cpufreq core if policy->freq_table is
set and one or more boost frequencies are present.
For other drivers, the flag must be set explicitly.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
It is possible to have a scenario where not all cpufreq policies support
boost frequencies. And letting sysfs (or other parts of the kernel)
enable boost feature for that policy isn't correct.
Add a new flag, boost_supported, which will be set to true by the
cpufreq core only if the freq table contains valid boost frequencies.
Some cpufreq drivers though don't have boost frequencies in the
freq-table, they can set this flag from their ->init() callbacks.
Once all the drivers are updated to set the flag correctly, we can check
it before enabling boost feature for a policy.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
None of the drivers set these attributes directly now, remove the
unnecessary check.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
All users of cpufreq_generic_attr are migrated now, remove it. While at
it, also stop exporting attributes for available and boost frequencies
as they are only used by cpufreq core now.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
The cpufreq core now handles this for basic attributes, including boost
frequencies, the driver can skip setting them.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
The cpufreq core now handles this for basic attributes, including boost
frequencies, the driver can skip setting them.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
The cpufreq core now handles this for basic attributes, including boost
frequencies, the driver can skip setting them.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
The cpufreq core now handles this for basic attributes, including boost
frequencies, the driver can skip setting them.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
The cpufreq core now handles this for basic attributes, including boost
frequencies, the driver can skip setting them.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
The cpufreq core now handles this for basic attributes, including boost
frequencies, the driver can skip setting them.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
The cpufreq core now handles this for basic attributes, including boost
frequencies, the driver can skip setting them.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
The cpufreq core now handles this for basic attributes, including boost
frequencies, the driver can skip setting them.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
The cpufreq core now handles this for basic attributes, including boost
frequencies, the driver can skip setting them.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
The cpufreq core now handles this for basic attributes, including boost
frequencies, the driver can skip setting them.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
The cpufreq core now handles this for basic attributes, including boost
frequencies, the driver can skip setting them.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
The cpufreq core handles this now, the driver can skip setting it.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
The cpufreq core now handles this for basic attributes, including boost
frequencies, the driver can skip setting them.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
The cpufreq core now handles this for basic attributes, including boost
frequencies, the driver can skip setting them.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
The cpufreq core now handles this for basic attributes, including boost
frequencies, the driver can skip setting them.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
The cpufreq core now handles this for basic attributes, including boost
frequencies, the driver can skip setting them.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
The cpufreq core now handles this for basic attributes, including boost
frequencies, the driver can skip setting them.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
The cpufreq core now handles this for basic attributes, including boost
frequencies, the driver can skip setting them.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
The cpufreq core now handles this for basic attributes, including boost
frequencies, the driver can skip setting them.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
The cpufreq core now handles this for basic attributes, including boost
frequencies, the driver can skip setting them.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
The cpufreq core now handles this for basic attributes, including boost
frequencies, the driver can skip setting them.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
The cpufreq core now handles this for basic attributes, including boost
frequencies, the driver can skip setting them.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
The cpufreq core now handles this for basic attributes, including boost
frequencies, the driver can skip setting them.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
The cpufreq core now handles this for basic attributes, including boost
frequencies, the driver can skip setting them.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
The cpufreq core now handles this for basic attributes, including boost
frequencies, the driver can skip setting them.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
The cpufreq core handles this now, the driver can skip setting it.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
The cpufreq core now handles this for basic attributes, including boost
frequencies, the driver can skip setting them.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
The cpufreq core now handles this for basic attributes, including boost
frequencies, the driver can skip setting them.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
The core handles this now, the driver can skip setting it.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
The cpufreq core now handles this for basic attributes, including boost
frequencies, the driver can skip setting them.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Currently it is left for the individual drivers to set the available and
boost frequencies related attributes in the cpufreq_driver->attr field.
Some drivers provide them, while others don't.
A quick search revealed that only the drivers that set the
policy->freq_table field, enable these attributes. Which makes sense as
well, since the show_available_freqs() helper works only if the
freq_table is present.
In order to simplify drivers, create the relevant sysfs files forcefully
from cpufreq core.
For now, skip adding them twice. This can be removed once all the
drivers are updated.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Commit f994c1cb6c ("cpufreq: Use str_enable_disable()-like helpers") has
already introduced helpers from string_choices.h and replaced ternary
syntax with it. Use str_enable_disable() helper in this line to stay
consistent.
Signed-off-by: Lifeng Zheng <zhenglifeng1@huawei.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
* Fix some error cleanup paths with mutex use and boost
* Fix a ref counting issue
* Fix a schedutil issue
-----BEGIN PGP SIGNATURE-----
iQJOBAABCgA4FiEECwtuSU6dXvs5GA2aLRkspiR3AnYFAmelDV8aHG1hcmlvLmxp
bW9uY2llbGxvQGFtZC5jb20ACgkQLRkspiR3Anbo3g/+IiTRN6zM+LfsdbHKNFzx
X7ksJYlUPIPyF/osIUMuWvphQRucd5NEB44kaM60Iwc+2xcQmZFTL8DadIDEbYKY
0y+Qm9bPPYWTDUrczSwqoYrGU5XP7jWFbtIt6HVfBhiFshL1w0kbj4c7VSxRBI4h
vzMHu2WWE+qJ2520gCMXXlhQy5bu7o3VG+qNQexl32/VnZWCTeFp2G+DKzBZ61sL
hLlXdgokNrR2MIHtEKZG9mMQ6Vvuov/P8d4D6I2wtYKc2I7JMdJd/tjIOdvjraAi
7P9+MOnfGyUejG6m6k6u0bxDR+kWCGQXbZ+JRBJoTdDSI1STyxq4i2UoTFBHQ2M+
xOnGOHFKWyN8OXvRIAhdnfAKTqrIn/wOSxsMMI3/d1yvFkV7CFG79R1hpipVMSxx
WPi9a03mMP9nzt+3MsAu0O7Ma1F6RKe5O4rwDsbIueMZAbkEp54DRJq+mJEeqiEW
mZffJOMkz+XzpEAZ4fCzHjOA09FLu0KnJclBUOzlWVZr+Dyt9dP+k8mNiBJSUoWO
uxKMz7lwC/2Z0zSVJWnMMN66tkkhrCXg2PM9/Akx74UEmHaODUJHrRxam+SoMXta
y1KCIXmNoNHnvPjWEE2H+TjBwVKjB4h22GpyRmpaTD/z968jUXLFcZGmgntryRWS
NfDVzBVrCHOKix8ohNF9IO4=
=xfUQ
-----END PGP SIGNATURE-----
Merge tag 'amd-pstate-v6.14-2025-02-06' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/superm1/linux
Merge amd-pstate driver fixes for 6.14-rc2 from Mario Limonciello:
"* Fix some error cleanup paths with mutex use and boost
* Fix a ref counting issue
* Fix a schedutil issue"
* tag 'amd-pstate-v6.14-2025-02-06' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/superm1/linux:
cpufreq/amd-pstate: Fix cpufreq_policy ref counting
cpufreq/amd-pstate: Fix max_perf updation with schedutil
cpufreq/amd-pstate: Remove the goto label in amd_pstate_update_limits
cpufreq/amd-pstate: Fix per-policy boost flag incorrect when fail
amd_pstate_update_limits() takes a cpufreq_policy reference but doesn't
decrement the refcount in one of the exit paths, fix that.
Fixes: 45722e777f ("cpufreq: amd-pstate: Optimize amd_pstate_update_limits()")
Signed-off-by: Dhananjay Ugwekar <dhananjay.ugwekar@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://lore.kernel.org/r/20250205112523.201101-10-dhananjay.ugwekar@amd.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
set_boost is a per-policy function call, hence a driver wide lock is
unnecessary. Also this mutex_acquire can collide with the mutex_acquire
from the mode-switch path in status_store(), which can lead to a
deadlock. So, remove it.
Signed-off-by: Dhananjay Ugwekar <dhananjay.ugwekar@amd.com>
Acked-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
The CPU rate from clk_get_rate() may not be divisible by 1000
(e.g., 133333333). But the rate calculated from frequency(kHz) is
always divisible by 1000 (e.g., 133333000).
Comparing the rate causes a warning during CPU scaling:
"cpufreq: __target_index: Failed to change cpu frequency: -5".
When we choose to compare kHz here, the issue does not occur.
Fixes: 343a8d17fa ("cpufreq: scpi: remove arm_big_little dependency")
Signed-off-by: zuoqian <zuoqian113@gmail.com>
Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Compile-testing without CONFIG_OF leads to a harmless build warning:
drivers/cpufreq/airoha-cpufreq.c:109:34: error: 'airoha_cpufreq_match_list' defined but not used [-Werror=unused-const-variable=]
109 | static const struct of_device_id airoha_cpufreq_match_list[] __initconst = {
| ^~~~~~~~~~~~~~~~~~~~~~~~~
It would be possible to mark the variable as __maybe_unused to shut up
that warning, but a Kconfig dependency seems more appropriate as this still
allows build testing in allmodconfig and randconfig builds on all
architectures.
An earlier commit, b865a84046 ("cpufreq: airoha: Depends on OF"),
tried to fix it incorrectly. ARCH_AIROHA already requires CONFIG_OF, so
this change does nothing, and the dependency is still missing for the
COMPILE_TEST case.
Fix it properly.
Fixes: 84cf9e541c ("cpufreq: airoha: Add EN7581 CPUFreq SMCCC driver")
Fixes: b865a84046 ("cpufreq: airoha: Depends on OF")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
[ Viresh: updated commit log and fixed rebase conflict ]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Link: https://patch.msgid.link/9d51d2710061dfa7f2568287c6ed125b858b7318.1738580005.git.viresh.kumar@linaro.org
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
In adjust_perf() callback, we are setting the max_perf to highest_perf,
as opposed to the correct limit value i.e. max_limit_perf. Fix that.
Fixes: 3f7b835fa4 ("cpufreq/amd-pstate: Move limit updating code")
Signed-off-by: Dhananjay Ugwekar <dhananjay.ugwekar@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://lore.kernel.org/r/20250205112523.201101-3-dhananjay.ugwekar@amd.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Scope based guard/cleanup macros should not be used together with goto
labels. Hence, remove the goto label.
Fixes: 6c093d5a5b ("cpufreq/amd-pstate: convert mutex use to guard()")
Signed-off-by: Dhananjay Ugwekar <dhananjay.ugwekar@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://lore.kernel.org/r/20250205112523.201101-2-dhananjay.ugwekar@amd.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Commit c8c68c38b5 ("cpufreq: amd-pstate: initialize core precision
boost state") sets per-policy boost flag to false when boost fail.
However, this boost flag will be set to reverse value in
store_local_boost() and cpufreq_boost_trigger_state() in cpufreq.c. This
will cause the per-policy boost flag set to true when fail to set boost.
Remove the extra assignment in amd_pstate_set_boost() and keep all
operations on per-policy boost flag outside of set_boost() to fix this
problem.
Fixes: c8c68c38b5 ("cpufreq: amd-pstate: initialize core precision boost state")
Signed-off-by: Lifeng Zheng <zhenglifeng1@huawei.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://lore.kernel.org/r/20250110091949.3610770-1-zhenglifeng1@huawei.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
The Airoha cpufreq depends on OF and must be marked as such. With the
kernel compiled without OF support, we get following warning:
drivers/cpufreq/airoha-cpufreq.c:109:34: warning: 'airoha_cpufreq_match_list' defined but not used [-Wunused-const-variable=]
109 | static const struct of_device_id airoha_cpufreq_match_list[] __initconst = {
| ^~~~~~~~~~~~~~~~~~~~~~~~~
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202501251941.0fXlcd1D-lkp@intel.com/
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Link: https://patch.msgid.link/455e18c947bd9529701a2f1c796f0f934d1354d7.1738050679.git.viresh.kumar@linaro.org
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
At the end of cpufreq_online() in cpufreq.c, set_boost is executed and
the per-policy boost flag is set to mirror the cpufreq_driver boost, so
it is not necessary to run set_boost in acpi_cpufreq_cpu_init().
Signed-off-by: Lifeng Zheng <zhenglifeng1@huawei.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Link: https://patch.msgid.link/20250117101457.1530653-5-zhenglifeng1@huawei.com
[ rjw: Changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
In policy initialization, policy->max and policy->cpuinfo.max_freq are
always set to the value calculated from caps->nominal_perf.
This will cause the frequency stay on base frequency even if the policy
is already boosted when a CPU is going online.
Fix this by using policy->boost_enabled to determine which value should
be set.
Signed-off-by: Lifeng Zheng <zhenglifeng1@huawei.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Link: https://patch.msgid.link/20250117101457.1530653-4-zhenglifeng1@huawei.com
[ rjw: Changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
In cpufreq_online() of cpufreq.c, the per-policy boost flag is already
set to mirror the cpufreq_driver boost during init but using freq_table
to judge if the policy has boost frequency. There are two drawbacks to
this approach:
1. It doesn't work for the cpufreq drivers that do not use a frequency
table. For now, acpi-cpufreq and amd-pstate have to enable boost in
policy initialization. And cppc_cpufreq never set policy to boost
when going online no matter what the cpufreq_driver boost flag is.
2. If the CPU goes offline when cpufreq_driver boost is enabled and
then goes online when cpufreq_driver boost is disabled, the
per-policy boost flag will incorrectly remain true.
Running set_boost at the end of the online process is a more generic way
for all cpufreq drivers.
Signed-off-by: Lifeng Zheng <zhenglifeng1@huawei.com>
Link: https://patch.msgid.link/20250117101457.1530653-3-zhenglifeng1@huawei.com
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
[ rjw: Changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
It turns out that CPUX will stay on the base frequency after performing
these operations:
1. boost all CPUs: echo 1 > /sys/devices/system/cpu/cpufreq/boost
2. offline one CPU: echo 0 > /sys/devices/system/cpu/cpuX/online
3. deboost all CPUs: echo 0 > /sys/devices/system/cpu/cpufreq/boost
4. online CPUX: echo 1 > /sys/devices/system/cpu/cpuX/online
5. boost all CPUs again: echo 1 > /sys/devices/system/cpu/cpufreq/boost
This is because max_freq_req of the policy is not updated during the
online process, and the value of max_freq_req before the last offline is
retained.
When the CPU is boosted again, freq_qos_update_request() will do nothing
because the old value is the same as the new one. This causes the CPU to
stay at the base frequency. Updating max_freq_req in cpufreq_online()
will solve this problem.
Signed-off-by: Lifeng Zheng <zhenglifeng1@huawei.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Link: https://patch.msgid.link/20250117101457.1530653-2-zhenglifeng1@huawei.com
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The driver generates following warning when regulator support isn't
enabled in the kernel. Fix it.
drivers/cpufreq/s3c64xx-cpufreq.c: In function 's3c64xx_cpufreq_set_target':
>> drivers/cpufreq/s3c64xx-cpufreq.c:55:22: warning: variable 'old_freq' set but not used [-Wunused-but-set-variable]
55 | unsigned int old_freq, new_freq;
| ^~~~~~~~
>> drivers/cpufreq/s3c64xx-cpufreq.c:54:30: warning: variable 'dvfs' set but not used [-Wunused-but-set-variable]
54 | struct s3c64xx_dvfs *dvfs;
| ^~~~
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202501191803.CtfT7b2o-lkp@intel.com/
Cc: 5.4+ <stable@vger.kernel.org> # v5.4+
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Link: https://patch.msgid.link/236b227e929e5adc04d1e9e7af6845a46c8e9432.1737525916.git.viresh.kumar@linaro.org
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
- Use str_enable_disable()-like helpers in cpufreq (Krzysztof
Kozlowski).
- Extend the Apple cpufreq driver to support more SoCs (Hector Martin,
Nick Chan).
- Add new cpufreq driver for Airoha SoCs (Christian Marangi).
- Fix using cpufreq-dt as module (Andreas Kemnade).
- Minor fixes for Sparc, SCMI, and Qcom cpufreq drivers (Ethan Carter
Edwards, Sibi Sankar, Manivannan Sadhasivam).
- Fix the maximum supported frequency computation in the ACPI cpufreq
driver to avoid relying on unfounded assumptions (Gautham Shenoy).
- Fix an amd-pstate driver regression with preferred core rankings not
being used (Mario Limonciello).
- Fix a precision issue with frequency calculation in the amd-pstate
driver (Naresh Solanki).
- Add ftrace event to the amd-pstate driver for active mode (Mario
Limonciello).
- Set default EPP policy on Ryzen processors in amd-pstate (Mario
Limonciello).
- Clean up the amd-pstate cpufreq driver and optimize it to increase
code reuse (Mario Limonciello, Dhananjay Ugwekar).
- Use CPPC to get scaling factors between HWP performance levels and
frequency in the intel_pstate driver and make it stop using a built
-in scaling factor for Arrow Lake processors (Rafael Wysocki).
- Make intel_pstate initialize epp_policy to CPUFREQ_POLICY_UNKNOWN for
consistency with CPU offline (Christian Loehle).
- Fix superfluous updates caused by need_freq_update in the schedutil
cpufreq governor (Sultan Alsawaf).
- Allow configuring the system suspend-resume (DPM) watchdog to warn
earlier than panic (Douglas Anderson).
- Implement devm_device_init_wakeup() helper and introduce a device-
managed variant of dev_pm_set_wake_irq() (Joe Hattori, Peng Fan).
- Remove direct inclusions of 'pm_wakeup.h' which should be only
included via 'device.h' (Wolfram Sang).
- Clean up two comments in the core system-wide PM code (Rafael
Wysocki, Randy Dunlap).
- Add Clearwater Forest processor support to the intel_idle cpuidle
driver (Artem Bityutskiy).
- Clean up the Exynos devfreq driver and devfreq core (Markus Elfring,
Jeongjun Park).
- Minor cleanups and fixes for OPP (Dan Carpenter, Neil Armstrong, Joe
Hattori).
- Implement dev_pm_opp_get_bw() (Neil Armstrong).
- Expose OPP reference counting helpers for Rust (Viresh Kumar).
- Fix TSC MHz calculation in cpupower (He Rongguang).
- Add install and uninstall options to bindings Makefile and add header
changes for cpufreq.h to SWIG bindings in cpupower (John B. Wyatt IV).
- Add missing residency header changes in cpuidle.h to SWIG bindings in
cpupower (John B. Wyatt IV).
- Add output files to .gitignore and clean them up in "make clean" in
selftests/cpufreq (Li Zhijian).
- Fix cross-compilation in cpupower Makefile (Peng Fan).
- Revise the is_valid flag handling for idle_monitor in the cpupower
utility (wangfushuai).
- Extend and clean up AMD processors support in cpupower (Mario
Limonciello).
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmeOthsSHHJqd0Byand5
c29ja2kubmV0AAoJEILEb/54YlRxqQsP/ivDt8nqDnxdKB7cKFQIsEK+tl0RnFVD
o5regvYeRcGWpUXuMaqBtTmCMjsB8bUkcj2yLquM54ubjHAGF6zJuw9ZytMPHVcC
b2xk3RCFlXSBFXVK8eOh3XRviA9nGhuY97ZnPsQOlvoECrxT2xyeL+mWo7s+t+q9
2NUH+yfRoi5FM+nqqDhsm0xXxJuPaNg6eAjIASuMjXap48rNk3L5kW6W/6nw7i0I
xQWd/pKLHaI5e7DRF/QdMKu8+Fm4BbN0jMqLblKPOmTe9KggvBkck5q1Um20sYkJ
vdKMAT02ClGavIC7DtY092Xik84NZfID4ZUchS6e2hJIQ3Uaw/eDvAo/jlT8gIzq
fnXPdApRIzQGDvMxFaAsKaGlwxiVlAGHPDSTH6MVWzsp+1DSkbloSwVPAfeYIn44
Jhov+6Ydux3597sSjo+YmD58acimXl7urVuk8P6m3U5+gb8/jlgbxpIn+vbxH3Ka
o44Vt7axD63gezOQY134sj5gic5JL0GuZovOlvzrF6+FsjvVqcax6FZ4n3uIXu7P
C1nwai+Wdzo7wvuz7RfO0g15Y15wYLQLYsRq/osRlf+sOmGVv7nA9tSzZ0LUdD5D
Pp6PxppF6anM0Kjen8Ppuu+Bcr11JfVvhnVTJqhs6u71XdAy4TnG1JjL4lPWYJ4D
Gfz2hyPNjiQX
=AoMC
-----END PGP SIGNATURE-----
Merge tag 'pm-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
"The majority of changes here are cpufreq updates which are dominated
by amd-pstate driver changes, like in the previous cycle. Moreover,
changes related to amd-pstate are also the majority of cpupower
utility updates.
Included are some pieces of new hardware support, like the addition of
Clearwater Forest processors support to intel_idle, new cpufreq driver
for Airoha SoCs, and Apple cpufreq driver extensions to support more
SoCs. The intel_pstate driver is also extended to be able to support
new platforms by using ACPI CPPC to compute scaling factors between
HWP performance states and frequency.
The rest is mostly fixes and cleanups in assorted pieces of power
management code.
Specifics:
- Use str_enable_disable()-like helpers in cpufreq (Krzysztof
Kozlowski)
- Extend the Apple cpufreq driver to support more SoCs (Hector
Martin, Nick Chan)
- Add new cpufreq driver for Airoha SoCs (Christian Marangi)
- Fix using cpufreq-dt as module (Andreas Kemnade)
- Minor fixes for Sparc, SCMI, and Qcom cpufreq drivers (Ethan Carter
Edwards, Sibi Sankar, Manivannan Sadhasivam)
- Fix the maximum supported frequency computation in the ACPI cpufreq
driver to avoid relying on unfounded assumptions (Gautham Shenoy)
- Fix an amd-pstate driver regression with preferred core rankings
not being used (Mario Limonciello)
- Fix a precision issue with frequency calculation in the amd-pstate
driver (Naresh Solanki)
- Add ftrace event to the amd-pstate driver for active mode (Mario
Limonciello)
- Set default EPP policy on Ryzen processors in amd-pstate (Mario
Limonciello)
- Clean up the amd-pstate cpufreq driver and optimize it to increase
code reuse (Mario Limonciello, Dhananjay Ugwekar)
- Use CPPC to get scaling factors between HWP performance levels and
frequency in the intel_pstate driver and make it stop using a
built-in scaling factor for Arrow Lake processors (Rafael Wysocki)
- Make intel_pstate initialize epp_policy to CPUFREQ_POLICY_UNKNOWN
for consistency with CPU offline (Christian Loehle)
- Fix superfluous updates caused by need_freq_update in the schedutil
cpufreq governor (Sultan Alsawaf)
- Allow configuring the system suspend-resume (DPM) watchdog to warn
earlier than panic (Douglas Anderson)
- Implement devm_device_init_wakeup() helper and introduce a device-
managed variant of dev_pm_set_wake_irq() (Joe Hattori, Peng Fan)
- Remove direct inclusions of 'pm_wakeup.h' which should be only
included via 'device.h' (Wolfram Sang)
- Clean up two comments in the core system-wide PM code (Rafael
Wysocki, Randy Dunlap)
- Add Clearwater Forest processor support to the intel_idle cpuidle
driver (Artem Bityutskiy)
- Clean up the Exynos devfreq driver and devfreq core (Markus
Elfring, Jeongjun Park)
- Minor cleanups and fixes for OPP (Dan Carpenter, Neil Armstrong,
Joe Hattori)
- Implement dev_pm_opp_get_bw() (Neil Armstrong)
- Expose OPP reference counting helpers for Rust (Viresh Kumar)
- Fix TSC MHz calculation in cpupower (He Rongguang)
- Add install and uninstall options to bindings Makefile and add
header changes for cpufreq.h to SWIG bindings in cpupower (John B.
Wyatt IV)
- Add missing residency header changes in cpuidle.h to SWIG bindings
in cpupower (John B. Wyatt IV)
- Add output files to .gitignore and clean them up in "make clean" in
selftests/cpufreq (Li Zhijian)
- Fix cross-compilation in cpupower Makefile (Peng Fan)
- Revise the is_valid flag handling for idle_monitor in the cpupower
utility (wangfushuai)
- Extend and clean up AMD processors support in cpupower (Mario
Limonciello)"
* tag 'pm-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (67 commits)
PM / OPP: Add reference counting helpers for Rust implementation
PM: sleep: wakeirq: Introduce device-managed variant of dev_pm_set_wake_irq()
cpufreq: Use str_enable_disable()-like helpers
cpufreq: airoha: Add EN7581 CPUFreq SMCCC driver
PM: sleep: Allow configuring the DPM watchdog to warn earlier than panic
PM: sleep: convert comment from kernel-doc to plain comment
cpufreq: ACPI: Fix max-frequency computation
pm: cpupower: Add missing residency header changes in cpuidle.h to SWIG
PM / devfreq: exynos: remove unused function parameter
OPP: OF: Fix an OF node leak in _opp_add_static_v2()
cpufreq/amd-pstate: Refactor max frequency calculation
cpufreq/amd-pstate: Fix prefcore rankings
pm: cpupower: Add header changes for cpufreq.h to SWIG bindings
cpufreq: sparc: change kzalloc to kcalloc
cpufreq: qcom: Implement clk_ops::determine_rate() for qcom_cpufreq* clocks
cpufreq: qcom: Fix qcom_cpufreq_hw_recalc_rate() to query LUT if LMh IRQ is not available
cpufreq: apple-soc: Add Apple A7-A8X SoC cpufreq support
cpufreq: apple-soc: Set fallback transition latency to APPLE_DVFS_TRANSITION_TIMEOUT
cpufreq: apple-soc: Increase cluster switch timeout to 400us
cpufreq: apple-soc: Use 32-bit read for status register
...
1) Per-CPU kthreads must stay affine to a single CPU and never execute
relevant code on any other CPU. This is currently handled by smpboot
code which takes care of CPU-hotplug operations. Affinity here is
a correctness constraint.
2) Some kthreads _have_ to be affine to a specific set of CPUs and can't
run anywhere else. The affinity is set through kthread_bind_mask()
and the subsystem takes care by itself to handle CPU-hotplug
operations. Affinity here is assumed to be a correctness constraint.
3) Per-node kthreads _prefer_ to be affine to a specific NUMA node. This
is not a correctness constraint but merely a preference in terms of
memory locality. kswapd and kcompactd both fall into this category.
The affinity is set manually like for any other task and CPU-hotplug
is supposed to be handled by the relevant subsystem so that the task
is properly reaffined whenever a given CPU from the node comes up.
Also care should be taken so that the node affinity doesn't cross
isolated (nohz_full) cpumask boundaries.
4) Similar to the previous point except kthreads have a _preferred_
affinity different than a node. Both RCU boost kthreads and RCU
exp kworkers fall into this category as they refer to "RCU nodes"
from a distinctly distributed tree.
Currently the preferred affinity patterns (3 and 4) have at least 4
identified users, with more or less success when it comes to handle
CPU-hotplug operations and CPU isolation. Each of which do it in its own
ad-hoc way.
This is an infrastructure proposal to handle this with the following API
changes:
_ kthread_create_on_node() automatically affines the created kthread to
its target node unless it has been set as per-cpu or bound with
kthread_bind[_mask]() before the first wake-up.
- kthread_affine_preferred() is a new function that can be called right
after kthread_create_on_node() to specify a preferred affinity
different than the specified node.
When the preferred affinity can't be applied because the possible
targets are offline or isolated (nohz_full), the kthread is affine
to the housekeeping CPUs (which means to all online CPUs most of the
time or only the non-nohz_full CPUs when nohz_full= is set).
kswapd, kcompactd, RCU boost kthreads and RCU exp kworkers have been
converted, along with a few old drivers.
Summary of the changes:
* Consolidate a bunch of ad-hoc implementations of kthread_run_on_cpu()
* Introduce task_cpu_fallback_mask() that defines the default last
resort affinity of a task to become nohz_full aware
* Add some correctness check to ensure kthread_bind() is always called
before the first kthread wake up.
* Default affine kthread to its preferred node.
* Convert kswapd / kcompactd and remove their halfway working ad-hoc
affinity implementation
* Implement kthreads preferred affinity
* Unify kthread worker and kthread API's style
* Convert RCU kthreads to the new API and remove the ad-hoc affinity
implementation.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEd76+gtGM8MbftQlOhSRUR1COjHcFAmeNf8gACgkQhSRUR1CO
jHedQQ/+IxTjjqQiItzrq41TES2S0desHDq8lNJFb7rsR/DtKFyLx3s67cOYV+cM
Yx54QHg2m/Fz4nXMQ7Po5ygOtJGCKBc5C5QQy7y0lVKeTQK+daDfEtBSa3oG7j3C
u+E3tTY6qxkbCzymUyaKkHN4/ay2vLvjFS50luV7KMyI3x47Aji+t7VdCX4LCPP2
eAwOALWD0+7qLJ/VF6gsmQLKA4Qx7PQAzBa3KSBmUN9UcN8Gk1bQHCTIQKDHP9LQ
v8BXrNZtYX1o2+snNYpX2z6/ECjxkdwriOgqqZY5306hd9RAQ1u46Dx3byrIqjGn
ULG/XQ2istPyhTqb/h+RbrobdOcwEUIeqk8hRRbBXE8bPpqUz9EMuaCMxWDbQjgH
NTuKG4ifKJ/IqstkkuDkdOiByE/ysMmwqrTXgSnu2ITNL9yY3BEgFbvA95hgo42s
f7QCxEfZb1MHcNEMENSMwM3xw5lLMGMpxVZcMQ3gLwyotMBRrhFZm1qZJG7TITYW
IDIeCbH4JOMdQwLs3CcWTXio0N5/85NhRNFV+IDn96OrgxObgnMtV8QwNgjXBAJ5
wGeJWt8s34W1Zo3qS9gEuVzEhW4XaxISQQMkHe8faKkK6iHmIB/VjSQikDwwUNQ/
AspYj82RyWBCDZsqhiYh71kpxjvS6Xp0bj39Ce1sNsOnuksxKkQ=
=g8In
-----END PGP SIGNATURE-----
Merge tag 'kthread-for-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks
Pull kthread updates from Frederic Weisbecker:
"Kthreads affinity follow either of 4 existing different patterns:
1) Per-CPU kthreads must stay affine to a single CPU and never
execute relevant code on any other CPU. This is currently handled
by smpboot code which takes care of CPU-hotplug operations.
Affinity here is a correctness constraint.
2) Some kthreads _have_ to be affine to a specific set of CPUs and
can't run anywhere else. The affinity is set through
kthread_bind_mask() and the subsystem takes care by itself to
handle CPU-hotplug operations. Affinity here is assumed to be a
correctness constraint.
3) Per-node kthreads _prefer_ to be affine to a specific NUMA node.
This is not a correctness constraint but merely a preference in
terms of memory locality. kswapd and kcompactd both fall into this
category. The affinity is set manually like for any other task and
CPU-hotplug is supposed to be handled by the relevant subsystem so
that the task is properly reaffined whenever a given CPU from the
node comes up. Also care should be taken so that the node affinity
doesn't cross isolated (nohz_full) cpumask boundaries.
4) Similar to the previous point except kthreads have a _preferred_
affinity different than a node. Both RCU boost kthreads and RCU
exp kworkers fall into this category as they refer to "RCU nodes"
from a distinctly distributed tree.
Currently the preferred affinity patterns (3 and 4) have at least 4
identified users, with more or less success when it comes to handle
CPU-hotplug operations and CPU isolation. Each of which do it in its
own ad-hoc way.
This is an infrastructure proposal to handle this with the following
API changes:
- kthread_create_on_node() automatically affines the created kthread
to its target node unless it has been set as per-cpu or bound with
kthread_bind[_mask]() before the first wake-up.
- kthread_affine_preferred() is a new function that can be called
right after kthread_create_on_node() to specify a preferred
affinity different than the specified node.
When the preferred affinity can't be applied because the possible
targets are offline or isolated (nohz_full), the kthread is affine to
the housekeeping CPUs (which means to all online CPUs most of the time
or only the non-nohz_full CPUs when nohz_full= is set).
kswapd, kcompactd, RCU boost kthreads and RCU exp kworkers have been
converted, along with a few old drivers.
Summary of the changes:
- Consolidate a bunch of ad-hoc implementations of
kthread_run_on_cpu()
- Introduce task_cpu_fallback_mask() that defines the default last
resort affinity of a task to become nohz_full aware
- Add some correctness check to ensure kthread_bind() is always
called before the first kthread wake up.
- Default affine kthread to its preferred node.
- Convert kswapd / kcompactd and remove their halfway working ad-hoc
affinity implementation
- Implement kthreads preferred affinity
- Unify kthread worker and kthread API's style
- Convert RCU kthreads to the new API and remove the ad-hoc affinity
implementation"
* tag 'kthread-for-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks:
kthread: modify kernel-doc function name to match code
rcu: Use kthread preferred affinity for RCU exp kworkers
treewide: Introduce kthread_run_worker[_on_cpu]()
kthread: Unify kthread_create_on_cpu() and kthread_create_worker_on_cpu() automatic format
rcu: Use kthread preferred affinity for RCU boost
kthread: Implement preferred affinity
mm: Create/affine kswapd to its preferred node
mm: Create/affine kcompactd to its preferred node
kthread: Default affine kthread to its preferred NUMA node
kthread: Make sure kthread hasn't started while binding it
sched,arm64: Handle CPU isolation on last resort fallback rq selection
arm64: Exclude nohz_full CPUs from 32bits el0 support
lib: test_objpool: Use kthread_run_on_cpu()
kallsyms: Use kthread_run_on_cpu()
soc/qman: test: Use kthread_run_on_cpu()
arm/bL_switcher: Use kthread_run_on_cpu()
Merge cpufreq updates for 6.14:
- Use str_enable_disable()-like helpers in cpufreq (Krzysztof
Kozlowski).
- Extend the Apple cpufreq driver to support more SoCs (Hector Martin,
Nick Chan).
- Add new cpufreq driver for Airoha SoCs (Christian Marangi).
- Fix using cpufreq-dt as module (Andreas Kemnade).
- Minor fixes for Sparc, SCMI, and Qcom cpufreq drivers (Ethan Carter
Edwards, Sibi Sankar, Manivannan Sadhasivam).
- Fix the maximum supported frequency computation in the ACPI cpufreq
driver to avoid relying on unfounded assumptions (Gautham Shenoy).
- Fix an amd-pstate driver regression with preferred core rankings not
being used (Mario Limonciello).
- Fix a precision issue with frequency calculation in the amd-pstate
driver (Naresh Solanki).
- Add ftrace event to the amd-pstate driver for active mode (Mario
Limonciello).
- Set default EPP policy on Ryzen processors in amd-pstate (Mario
Limonciello).
- Clean up the amd-pstate cpufreq driver and optimize it to increase
code reuse (Mario Limonciello, Dhananjay Ugwekar).
- Use CPPC to get scaling factors between HWP performance levels and
frequency in the intel_pstate driver and make it stop using a built
-in scaling factor for the Arrow Lake processor (Rafael Wysocki).
- Make intel_pstate initialize epp_policy to CPUFREQ_POLICY_UNKNOWN for
consistency with CPU offline (Christian Loehle).
- Fix superfluous updates caused by need_freq_update in the schedutil
cpufreq governor (Sultan Alsawaf).
* pm-cpufreq: (40 commits)
cpufreq: Use str_enable_disable()-like helpers
cpufreq: airoha: Add EN7581 CPUFreq SMCCC driver
cpufreq: ACPI: Fix max-frequency computation
cpufreq/amd-pstate: Refactor max frequency calculation
cpufreq/amd-pstate: Fix prefcore rankings
cpufreq: sparc: change kzalloc to kcalloc
cpufreq: qcom: Implement clk_ops::determine_rate() for qcom_cpufreq* clocks
cpufreq: qcom: Fix qcom_cpufreq_hw_recalc_rate() to query LUT if LMh IRQ is not available
cpufreq: apple-soc: Add Apple A7-A8X SoC cpufreq support
cpufreq: apple-soc: Set fallback transition latency to APPLE_DVFS_TRANSITION_TIMEOUT
cpufreq: apple-soc: Increase cluster switch timeout to 400us
cpufreq: apple-soc: Use 32-bit read for status register
cpufreq: apple-soc: Allow per-SoC configuration of APPLE_DVFS_CMD_PS1
cpufreq: apple-soc: Drop setting the PS2 field on M2+
dt-bindings: cpufreq: apple,cluster-cpufreq: Add A7-A11, T2 compatibles
dt-bindings: cpufreq: Document support for Airoha EN7581 CPUFreq
cpufreq: fix using cpufreq-dt as module
cpufreq: scmi: Register for limit change notifications
cpufreq: schedutil: Fix superfluous updates caused by need_freq_update
cpufreq: intel_pstate: Use CPUFREQ_POLICY_UNKNOWN
...
Merge updates related to system sleep, a cpuidle update and an Energy
Model handling code update for 6.14-rc1:
- Allow configuring the system suspend-resume (DPM) watchdog to warn
earlier than panic (Douglas Anderson).
- Implement devm_device_init_wakeup() helper and introduce a device-
managed variant of dev_pm_set_wake_irq() (Joe Hattori, Peng Fan).
- Remove direct inclusions of 'pm_wakeup.h' which should be only
included via 'device.h' (Wolfram Sang).
- Clean up two comments in the core system-wide PM code (Rafael
Wysocki, Randy Dunlap).
- Add Clearwater Forest processor support to the intel_idle cpuidle
driver (Artem Bityutskiy).
- Move sched domains rebuild function from the schedutil cpufreq
governor to the Energy Model handling code (Rafael Wysocki).
* pm-sleep:
PM: sleep: wakeirq: Introduce device-managed variant of dev_pm_set_wake_irq()
PM: sleep: Allow configuring the DPM watchdog to warn earlier than panic
PM: sleep: convert comment from kernel-doc to plain comment
PM: wakeup: implement devm_device_init_wakeup() helper
PM: sleep: sysfs: don't include 'pm_wakeup.h' directly
PM: sleep: autosleep: don't include 'pm_wakeup.h' directly
PM: sleep: Update stale comment in device_resume()
* pm-cpuidle:
intel_idle: add Clearwater Forest SoC support
* pm-em:
PM: EM: Move sched domains rebuild function from schedutil to EM
- Extended support for more SoCs in apple cpufreq driver (Hector Martin
and Nick Chan).
- Add new cpufreq driver for Airoha SoCs (Christian Marangi).
- Fix using cpufreq-dt as module (Andreas Kemnade).
- Minor fixes for Sparc, scmi, and Qcom drivers (Ethan Carter Edwards,
Sibi Sankar and Manivannan Sadhasivam).
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEx73Crsp7f6M6scA70rkcPK6BEhwFAmeNxBsACgkQ0rkcPK6B
EhyhQBAArv2A97hZz0frOz4PWNpZ7DUSsp8UUhBZ5pzM+ZK3hckgReyxm/n14qbJ
iOvX+S8VenJrartsWDH9ucDiK8r2JvJpc+TNliPKvGkMd1qGDizC3+mgb/zdtJwz
ZrspIdE/5HQ/yVyOylXQr8mpUe6o4Jz/n1vjPZV6CcbxCj8OUvdBriqFvUVAm+Dq
s5EUWeziSdXgholUDlCM2E03yzkLSnf9aJJKzdRhm4raDoMm4rT8RkPXUvPrB+TY
tv1ykTWyLnn6gzRT5o7odAi0lenjWNLJWZcixcY/CAjax6/5RjirYze2vdwZ+g8k
EWd+y84eGSvmcBK5LVvuxN6XZja8pP5sEaZLGknmOsNkIOuNeXKPQXGKBg+SzQlp
atWiHeoTLorkDkCW5Hu6NFnHOKJM+nqPo8vqQPpQty2fUarc3L/k6jZTsyx0+0gX
46axncUenJasvXNQ1Oab3tW4SJ9hpv8AaHUY6TQEwIvGlkBctq/wr37wKDAN0KRy
WhtvCuybIvlaJ2FES61nPMDITV8ZYbKRHYbPP074qqiI2W4sIAjbT+qlx3f/3XWx
Ese7PaPiPg7ZoqUfXPdk77KKSDtbxsg8rCY3eaoKfusgMn5XkVFZgK5N19fAYPtP
5VA2vHQxLpRV8C/yZrmyhrQWFSFdFdtFeMvQrPeBn4QGYR+Rcms=
=J/XA
-----END PGP SIGNATURE-----
Merge tag 'cpufreq-arm-updates-6.14' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/vireshk/pm
Merge ARM cpufreq updates for 6.14 from Viresh Kumar:
"- Extended support for more SoCs in apple cpufreq driver (Hector Martin
and Nick Chan).
- Add new cpufreq driver for Airoha SoCs (Christian Marangi).
- Fix using cpufreq-dt as module (Andreas Kemnade).
- Minor fixes for Sparc, scmi, and Qcom drivers (Ethan Carter Edwards,
Sibi Sankar and Manivannan Sadhasivam)."
* tag 'cpufreq-arm-updates-6.14' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/vireshk/pm:
cpufreq: airoha: Add EN7581 CPUFreq SMCCC driver
cpufreq: sparc: change kzalloc to kcalloc
cpufreq: qcom: Implement clk_ops::determine_rate() for qcom_cpufreq* clocks
cpufreq: qcom: Fix qcom_cpufreq_hw_recalc_rate() to query LUT if LMh IRQ is not available
cpufreq: apple-soc: Add Apple A7-A8X SoC cpufreq support
cpufreq: apple-soc: Set fallback transition latency to APPLE_DVFS_TRANSITION_TIMEOUT
cpufreq: apple-soc: Increase cluster switch timeout to 400us
cpufreq: apple-soc: Use 32-bit read for status register
cpufreq: apple-soc: Allow per-SoC configuration of APPLE_DVFS_CMD_PS1
cpufreq: apple-soc: Drop setting the PS2 field on M2+
dt-bindings: cpufreq: apple,cluster-cpufreq: Add A7-A11, T2 compatibles
dt-bindings: cpufreq: Document support for Airoha EN7581 CPUFreq
cpufreq: fix using cpufreq-dt as module
cpufreq: scmi: Register for limit change notifications
Replace ternary (condition ? "enable" : "disable") syntax with helpers
from string_choices.h because:
1. Simple function call with one argument is easier to read. Ternary
operator has three arguments and with wrapping might lead to quite
long code.
2. Is slightly shorter thus also easier to read.
3. It brings uniformity in the text - same string.
4. Allows deduping by the linker, which results in a smaller binary
file.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Link: https://patch.msgid.link/20250114190600.846651-1-krzysztof.kozlowski@linaro.org
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Add simple CPU Freq driver for Airoha EN7581 SoC that control CPU
frequency scaling with SMC APIs and register a generic "cpufreq-dt"
device.
All CPU share the same frequency and can't be controlled independently.
CPU frequency is controlled by the attached PM domain.
Add SoC compatible to cpufreq-dt-plat block list as a dedicated cpufreq
driver is needed with OPP v2 nodes declared in DTS.
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Commit 3c55e94c0a ("cpufreq: ACPI: Extend frequency tables to cover
boost frequencies") introduced an assumption in acpi_cpufreq_cpu_init()
that the first entry in the P-state table was the nominal frequency.
This assumption is incorrect. The frequency corresponding to the P0
P-State need not be the same as the nominal frequency advertised via
CPPC.
Since the driver is using the CPPC.highest_perf and CPPC.nominal_perf
to compute the boost-ratio, it makes sense to use CPPC.nominal_freq to
compute the max-frequency. CPPC.nominal_freq is advertised on
platforms supporting CPPC revisions 3 or higher.
Hence, fallback to using the first entry in the P-State table only on
platforms that do not advertise CPPC.nominal_freq.
Fixes: 3c55e94c0a ("cpufreq: ACPI: Extend frequency tables to cover boost frequencies")
Tested-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
Signed-off-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://patch.msgid.link/20250113044107.566-1-gautham.shenoy@amd.com
[ rjw: Retain reverse X-mas tree ordering of local variable declarations ]
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
It is possible to enable few cpufreq drivers, without the framework
being enabled. This happened due to a bug while moving the entries
earlier. Fix it.
Fixes: 7ee1378736 ("cpufreq: Move CPPC configs to common Kconfig and add RISC-V")
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Pierre Gondois <pierre.gondois@arm.com>
Reviewed-by: Sunil V L <sunilvl@ventanamicro.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Link: https://patch.msgid.link/84ac7a8fa72a8fe20487bb0a350a758bce060965.1736488384.git.viresh.kumar@linaro.org
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Fix a regression with preferred core rankings not being used.
Fix a precision issue with frequency calculation.
-----BEGIN PGP SIGNATURE-----
iQJOBAABCgA4FiEECwtuSU6dXvs5GA2aLRkspiR3AnYFAmd9nfoaHG1hcmlvLmxp
bW9uY2llbGxvQGFtZC5jb20ACgkQLRkspiR3AnZwjBAA4KrGmCNlV3At/v82MQXx
Bwe+I6ngIweHDglV9UDDGgtqtQJfP3cJfXp36ppZSHn+xidMoEt3TjVpnCF3dP7X
GYQOsMWQlPnsBQfh8TQusPoDGZK+UnS5GQ6F9jHkUxo1p4PsOcajW6fu6EZGQBc9
dAf+roHDZqEcAWwmxAzycpYrFxNU2YWvWW6gX24yYdO+mbNQpkgw6zMmzLKUtlCm
uWBtOXxOepee65dMqPMgsRyjWJPoGXWB4mnAo17147LW0qYQ0mPw4D6wyKJsOj5y
UTLxz4QxHH5WbRPdMBcdX5m85CqxOqj+KeauWHsnQ4XGrA1n+D5SOt0ieAIWgSOm
k9/KCYvw2+ZZ8dFYEvwtuImrDbBrJDQGjAQr1HAiPl5f4TE7vfbZY6SLYO8gQ0RH
U4POkGUDWdGoKwCjyBReZ6dVIerCCH1gYmLp+HJqPK7k4BoX7bHNOCJ9sc+hCop5
x9SZFA5Gmf8TkXRcSMctatm9CxvIawh4THkkzctQErKZs0qJ/+KdTVEwg0i44sKm
XNH3iRjWPttFI3b+wJZMwwoBmUSTvOMycz4oJZ5jN0wYXfvdPh3FcF7XUpdn9wNq
8qWE00zMPy9u9wCpjOiIQ1BPHBaqXxfvTDfcoi2zoxpv/f9QM6tsqtjUhZG0+n5t
oKkupREcIpKqUIvgdCf1v8M=
=vmI+
-----END PGP SIGNATURE-----
Merge tag 'amd-pstate-v6.14-2025-01-07' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/superm1/linux
Merge amd-pstate changes for 6.14 from Mario Limonciello:
"Fix a regression with preferred core rankings not being used.
Fix a precision issue with frequency calculation."
* tag 'amd-pstate-v6.14-2025-01-07' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/superm1/linux:
cpufreq/amd-pstate: Refactor max frequency calculation
cpufreq/amd-pstate: Fix prefcore rankings
kthread_create() creates a kthread without running it yet. kthread_run()
creates a kthread and runs it.
On the other hand, kthread_create_worker() creates a kthread worker and
runs it.
This difference in behaviours is confusing. Also there is no way to
create a kthread worker and affine it using kthread_bind_mask() or
kthread_affine_preferred() before starting it.
Consolidate the behaviours and introduce kthread_run_worker[_on_cpu]()
that behaves just like kthread_run(). kthread_create_worker[_on_cpu]()
will now only create a kthread worker without starting it.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
The previous approach introduced roundoff errors during division when
calculating the boost ratio. This, in turn, affected the maximum
frequency calculation, often resulting in reporting lower frequency
values.
For example, on the Glinda SoC based board with the following
parameters:
max_perf = 208
nominal_perf = 100
nominal_freq = 2600 MHz
The Linux kernel previously calculated the frequency as:
freq = ((max_perf * 1024 / nominal_perf) * nominal_freq) / 1024
freq = 5405 MHz // Integer arithmetic.
With the updated formula:
freq = (max_perf * nominal_freq) / nominal_perf
freq = 5408 MHz
This change ensures more accurate frequency calculations by eliminating
unnecessary shifts and divisions, thereby improving precision.
Signed-off-by: Naresh Solanki <naresh.solanki@9elements.com>
[ML: trim the changelog from commit message]
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://lore.kernel.org/r/20241219201833.2750998-1-naresh.solanki@9elements.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
commit 50a062a762 ("cpufreq/amd-pstate: Store the boost numerator as
highest perf again") updated the value stored for highest perf to no longer
store the highest perf value but instead the boost numerator.
This is a fixed value for systems with preferred cores and not appropriate
for use ITMT rankings. Update the value used for ITMT rankings to be the
preferred core ranking.
Reported-and-tested-by: Sebastian <sobrus@gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219640
Fixes: 50a062a762 ("cpufreq/amd-pstate: Store the boost numerator as highest perf again")
Reviewed-by: Dhananjay Ugwekar <dhananjay.ugwekar@amd.com>
Link: https://lore.kernel.org/r/20250102141204.3413202-1-superm1@kernel.org
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Refactor to use kcalloc instead of kzalloc when multiplying
allocation size by count. This refactor prevents unintentional
memory overflows. Discovered by checkpatch.pl.
Signed-off-by: Ethan Carter Edwards <ethan@ethancedwards.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
determine_rate() callback is used by the clk_set_rate() API to get the
closest rate of the target rate supported by the clock. If this callback
is not implemented (nor round_rate() callback), then the API will assume
that the clock cannot set the requested rate. And since there is no parent,
it will return -EINVAL.
This is not an issue right now as clk_set_rate() mistakenly compares the
target rate with cached rate and bails out early. But once that is fixed
to compare the target rate with the actual rate of the clock (returned by
recalc_rate()), then clk_set_rate() for this clock will start to fail as
below:
cpu cpu0: _opp_config_clk_single: failed to set clock rate: -22
So implement the determine_rate() callback that just returns the actual
rate at which the clock is passed to the CPUs in a domain.
Fixes: 4370232c72 ("cpufreq: qcom-hw: Add CPU clock provider support")
Reported-by: Johan Hovold <johan+linaro@kernel.org>
Suggested-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Reviewed-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Currently, qcom_cpufreq_hw_recalc_rate() returns the LMh throttled
frequency for the domain even if LMh IRQ is not available. But as per
qcom_cpufreq_hw_get(), the driver has to query LUT entries to get the
actual frequency of the domain. So do the same in
qcom_cpufreq_hw_recalc_rate().
While doing so, refactor the existing qcom_cpufreq_hw_get() function so
that qcom_cpufreq_hw_recalc_rate() can make use of the existing code and
avoid code duplication. This also requires setting the
qcom_cpufreq_data::policy even if LMh IRQ is not available.
Fixes: 4370232c72 ("cpufreq: qcom-hw: Add CPU clock provider support")
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
These SoCs only use 3 bits for p-states, and have a different
APPLE_DVFS_CMD_PS1 mask value.
Signed-off-by: Nick Chan <towinchenmi@gmail.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
The driver already assumes transitions will not take longer than
APPLE_DVFS_TRANSITION_TIMEOUT in apple_soc_cpufreq_set_target(), so it
makes little sense to set CPUFREQ_ETERNAL as the transition latency
when the transistion latency is not given by the opp-table.
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
Signed-off-by: Nick Chan <towinchenmi@gmail.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Apple A11 SoC takes a long time to switch. Maximum switch time
observed is 345us, so increase the cluster switch timeout to 400us
to be safe.
Signed-off-by: Nick Chan <towinchenmi@gmail.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Apple A7-A9(X) SoCs requires 32-bit reads on the status register. Newer
SoCs accepts 32-bit reads on the status register as well.
Signed-off-by: Nick Chan <towinchenmi@gmail.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Support for SoC that has a different APPLE_DVFS_CMD_PS1 will be added soon,
so modify the driver first to allow it to be configured per-SoC.
Signed-off-by: Nick Chan <towinchenmi@gmail.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Newer device do not use this. It is not known what this field does,
but change the behavior to be same as macOS to be safe.
Signed-off-by: Hector Martin <marcan@marcan.st>
Signed-off-by: Nick Chan <towinchenmi@gmail.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
This driver can be built as a module since commit 3b062a0869 ("cpufreq:
dt-platdev: Support building as module"), but unfortunately this caused
a regression because the cputfreq-dt-platdev.ko module does not autoload.
Usually, this is solved by just using the MODULE_DEVICE_TABLE() macro to
export all the device IDs as module aliases. But this driver is special
due how matches with devices and decides what platform supports.
There are two of_device_id lists, an allow list that are for CPU devices
that always match and a deny list that's for devices that must not match.
The driver registers a cpufreq-dt platform device for all the CPU device
nodes that either are in the allow list or contain an operating-points-v2
property and are not in the deny list.
Enforce builtin compile of cpufreq-dt-platdev to make autoload work.
Fixes: 3b062a0869 ("cpufreq: dt-platdev: Support building as module")
Link: https://lore.kernel.org/all/20241104201424.2a42efdd@akair/
Link: https://lore.kernel.org/all/20241119111918.1732531-1-javierm@redhat.com/
Cc: stable@vger.kernel.org
Signed-off-by: Andreas Kemnade <andreas@kemnade.info>
Reported-by: Radu Rendec <rrendec@redhat.com>
Reported-by: Javier Martinez Canillas <javierm@redhat.com>
[ Viresh: Picked commit log from Javier, updated tags ]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Register for limit change notifications if supported and use the throttled
frequency from the notification to apply HW pressure.
Signed-off-by: Sibi Sankar <quic_sibis@quicinc.com>
Tested-by: Mike Tipton <quic_mdtipton@quicinc.com>
Reviewed-by: Cristian Marussi <cristian.marussi@arm.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Mostly cleanups and optimizations to increase code reuse by
shuffling around and using helpers.
Notable other changes:
* Add ftrace event for active mode to use
* Set default EPP policy on Ryzen
-----BEGIN PGP SIGNATURE-----
iQJOBAABCgA4FiEECwtuSU6dXvs5GA2aLRkspiR3AnYFAmdjJMEaHG1hcmlvLmxp
bW9uY2llbGxvQGFtZC5jb20ACgkQLRkspiR3AnYr0g/+KbniaTKVcEVa1IVz55+i
X5O72FSJjrKP2plMtK52wVZS7hVSawA/hfKeOqX1+40Ynss1LI/s7NzeUElLbQiZ
QapgY6LyQCxz7lso0F85gbjlU3DTvA5twwUEjk0rUWSwqfWOODkN1dF6NlskcIMQ
9tF+5SDORnQp4eGJbUhFasfxpCLexQP4EprWhq62lgYJcvZePQnm2ccMK/pJWBw0
5OgIX9T0dzpC/lCiHeyVAInB/Zri51o3ZdLwJAUwEOjILrJ6de4b7HeoPxAGV8Y7
21rU5QhWZgowmmGX8tbz3VM8sl5eaPXqQYS1n+b8+sIzzBgoE4rO5Ns6pYAzMHIN
7WehN724gD+RVPs7072OuhzENAKmWbpZEeu6YDaXTe5JA4XJQ/uiez7irFWtcr6k
mxUFhBoiQxEdq1Rjvt7HfWpPSUZrH3syh1OsxAecm5QNGUU+9W/L7Fbyl2OHFL5V
stTufe3i2igd4cczjrczgKheKsCVU6cfulysFnN4u0TiRnn8PHRKxT5K5KZX4O67
U7F+S8nAG2bNgBmVuwHterrypBmsAa9NAvD+3M/IWBGlgxJTDcAizwTugMZT1Nqb
lmupOJnAL7eRmJS0Bdd9Pcd3C62c1ayTVITN3hCKLMTRJPxEgChAPs2DgUOt72cR
ZQT3nbM2FAg0LFKl+kewnl0=
=44/p
-----END PGP SIGNATURE-----
Merge tag 'amd-pstate-v6.14-2024-12-18' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/superm1/linux
Merge amd-pstate changes for 6.14 from Mario Limonciello:
"Mostly cleanups and optimizations to increase code reuse by
shuffling around and using helpers.
Notable other changes:
* Add ftrace event for active mode to use
* Set default EPP policy on Ryzen"
* tag 'amd-pstate-v6.14-2024-12-18' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/superm1/linux: (21 commits)
cpufreq/amd-pstate: Drop boost_state variable
cpufreq/amd-pstate: Set different default EPP policy for Epyc and Ryzen
cpufreq/amd-pstate: Drop ret variable from amd_pstate_set_energy_pref_index()
cpufreq/amd-pstate: Always write EPP value when updating perf
cpufreq/amd-pstate: Cache EPP value and use that everywhere
cpufreq/amd-pstate: Move limit updating code
cpufreq/amd-pstate: Change amd_pstate_update_perf() to return an int
cpufreq/amd-pstate: store all values in cpudata struct in khz
cpufreq/amd-pstate: Only update the cached value in msr_set_epp() on success
cpufreq/amd-pstate: Use FIELD_PREP and FIELD_GET macros
cpufreq/amd-pstate: Drop cached epp_policy variable
cpufreq/amd-pstate: convert mutex use to guard()
cpufreq/amd-pstate: Add trace event for EPP perf updates
cpufreq/amd-pstate: Merge amd_pstate_epp_cpu_offline() and amd_pstate_epp_offline()
cpufreq/amd-pstate: Remove the cppc_state check in offline/online functions
cpufreq/amd-pstate: Refactor amd_pstate_epp_reenable() and amd_pstate_epp_offline()
cpufreq/amd-pstate: Move the invocation of amd_pstate_update_perf()
cpufreq/amd-pstate: Convert the amd_pstate_get/set_epp() to static calls
cpufreq/amd-pstate: Use boost numerator for upper bound of frequencies
cpufreq/amd-pstate: Store the boost numerator as highest perf again
...
Function sugov_eas_rebuild_sd() defined in the schedutil cpufreq governor
implements generic functionality that may be useful in other places. In
particular, there is a plan to use it in the intel_pstate driver in the
future.
For this reason, move it from schedutil to the energy model code and
rename it to em_rebuild_sched_domains().
This also helps to get rid of some #ifdeffery in schedutil which is a
plus.
No intentional functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
epp_policy uses the same values as cpufreq_policy.policy and resets
to CPUFREQ_POLICY_UNKNOWN during offlining. Be consistent about
it and initialize to CPUFREQ_POLICY_UNKNOWN instead of 0, too.
No functional change intended.
Signed-off-by: Christian Loehle <christian.loehle@arm.com>
Link: https://patch.msgid.link/20241211122605.3048503-3-christian.loehle@arm.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Currently boost_state is cached for every processor in cpudata structure
and driver boost state is set for every processor.
Both of these aren't necessary as the driver only needs to set once and
the policy stores whether boost is enabled.
Move the driver boost setting to registration and adjust all references
to cached value to pull from the policy instead.
Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Link: https://lore.kernel.org/r/20241209185248.16301-16-mario.limonciello@amd.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
For Ryzen systems the EPP policy set by the BIOS is generally configured
to performance as this is the default register value for the CPPC request
MSR.
If a user doesn't use additional software to configure EPP then the system
will default biased towards performance and consume extra battery. Instead
configure the default to "balanced_performance" for this case.
Suggested-by: Artem S. Tashkinov <aros@gmx.com>
Reviewed-by: Dhananjay Ugwekar <dhananjay.ugwekar@amd.com>
Tested-by: Dhananjay Ugwekar <dhananjay.ugwekar@amd.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219526
Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Link: https://lore.kernel.org/r/20241209185248.16301-15-mario.limonciello@amd.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
The ret variable is not necessary.
Reviewed-and-tested-by: Dhananjay Ugwekar <dhananjay.ugwekar@amd.com>
Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Link: https://lore.kernel.org/r/20241209185248.16301-14-mario.limonciello@amd.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
For MSR systems the EPP value is in the same register as perf targets
and so divding them into two separate MSR writes is wasteful.
In msr_update_perf(), update both EPP and perf values in one write to
MSR_AMD_CPPC_REQ, and cache them if successful.
To accomplish this plumb the EPP value into the update_perf call and
modify all its callers to check the return value.
As this unifies calls, ensure that the MSR write is necessary before
flushing a write out. Also drop the comparison from the passive flow
tracing.
Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Link: https://lore.kernel.org/r/20241209185248.16301-13-mario.limonciello@amd.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Cache the value in cpudata->epp_cached, and use that for all callers.
As all callers use cached value merge amd_pstate_get_energy_pref_index()
into show_energy_performance_preference().
Check if the EPP value is changed before writing it to MSR or
shared memory region.
Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Link: https://lore.kernel.org/r/20241209185248.16301-12-mario.limonciello@amd.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
The limit updating code in amd_pstate_epp_update_limit() should not
only apply to EPP updates. Move it to amd_pstate_update_min_max_limit()
so other callers can benefit as well.
With this move it's not necessary to have clamp_t calls anymore because
the verify callback is called when setting limits.
Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Link: https://lore.kernel.org/r/20241209185248.16301-11-mario.limonciello@amd.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
As msr_update_perf() calls an MSR it's possible that it fails. Pass
this return code up to the caller.
Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Link: https://lore.kernel.org/r/20241209185248.16301-10-mario.limonciello@amd.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Storing values in the cpudata structure in different units leads
to confusion and hardcoded conversions elsewhere. After ratios are
calculated store everything in khz for any future use. Adjust all
relevant consumers for this change as well.
Suggested-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Link: https://lore.kernel.org/r/20241209185248.16301-9-mario.limonciello@amd.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
If writing the MSR MSR_AMD_CPPC_REQ fails then the cached value in the
amd_cpudata structure should not be updated.
Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Link: https://lore.kernel.org/r/20241209185248.16301-8-mario.limonciello@amd.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
In "active" mode the most important thing for debugging whether
an issue is hardware or software based is to look at what was the
last thing written to the CPPC request MSR or shared memory region.
The 'amd_pstate_epp_perf' trace event shows the values being written
for all CPUs.
Reviewed-by: Perry Yuan <perry.yuan@amd.com>
Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Link: https://lore.kernel.org/r/20241209185248.16301-4-mario.limonciello@amd.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
amd_pstate_epp_offline() is only called from within
amd_pstate_epp_cpu_offline() and doesn't make much sense to have it at all.
Hence, remove it.
Also remove the unncessary debug print in the offline path while at it.
Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://lore.kernel.org/r/20241204144842.164178-6-Dhananjay.Ugwekar@amd.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Only amd_pstate_epp driver (i.e. cppc_state = ACTIVE) enters the
amd_pstate_epp_offline() and amd_pstate_epp_cpu_online() functions,
so remove the unnecessary if condition checking if cppc_state is
equal to AMD_PSTATE_ACTIVE.
Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Link: https://lore.kernel.org/r/20241204144842.164178-5-Dhananjay.Ugwekar@amd.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Replace similar code chunks with amd_pstate_update_perf() and
amd_pstate_set_epp() function calls.
Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Link: https://lore.kernel.org/r/20241204144842.164178-4-Dhananjay.Ugwekar@amd.com
[ML: Fix LKP reported error about unused variable]
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
amd_pstate_update_perf() should not be a part of shmem_set_epp() function,
so move it to the amd_pstate_epp_update_limit() function, where it is needed.
Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Link: https://lore.kernel.org/r/20241204144842.164178-3-Dhananjay.Ugwekar@amd.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
MSR and shared memory based systems have different mechanisms to get and
set the epp value. Split those mechanisms into different functions and
assign them appropriately to the static calls at boot time. This eliminates
the need for the "if(cpu_feature_enabled(X86_FEATURE_CPPC))" checks at
runtime.
Also, propagate the error code from rdmsrl_on_cpu() and cppc_get_epp_perf()
to *_get_epp()'s caller, instead of returning -EIO unconditionally.
Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Link: https://lore.kernel.org/r/20241204144842.164178-2-Dhananjay.Ugwekar@amd.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Since HYBRID_SCALING_FACTOR_MTL is not going to be suitable for Arrow
Lake in general, drop it from the "known hybrid scaling factors" list of
platforms, so the scaling factor for it will be determined with the
help of information provided by the platform firmware via CPPC.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/2307515.iZASKD2KPV@rjwysocki.net
The perf-to-frequency scaling factors are used by intel_pstate on hybrid
platforms to cast performance levels to frequency on different types of
CPUs which is needed because the generic cpufreq sysfs interface works
in the frequency domain.
For some hybrid platforms already in the field, the scaling factors are
known, but for others (including some upcoming ones) they most likely
will be different and the only way to get them that scales is to use
information provided by the platform firmware. In this particular case,
the requisite information can be obtained via CPPC.
If the P-core hybrid scaling factor for the given processor model is not
known, use CPPC to compute hybrid scaling factors for all CPUs.
Since the current default hybrid scaling factor is only suitable for a
few early hybrid platforms, add intel_hybrid_scaling_factor[] entries
for them and initialize the scaling factor to zero ("unknown") by
default.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/8476313.T7Z3S40VBb@rjwysocki.net
commit 18d9b52271 ("cpufreq/amd-pstate: Use nominal perf for limits
when boost is disabled") introduced different semantics for min/max limits
based upon whether the user turned off boost from sysfs.
This however is not necessary when the highest perf value is the boost
numerator.
Suggested-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Fixes: 18d9b52271 ("cpufreq/amd-pstate: Use nominal perf for limits when boost is disabled")
Link: https://lore.kernel.org/r/20241209185248.16301-3-mario.limonciello@amd.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
commit ad4caad58d ("cpufreq: amd-pstate: Merge
amd_pstate_highest_perf_set() into amd_get_boost_ratio_numerator()")
changed the semantics for highest perf and commit 18d9b52271
("cpufreq/amd-pstate: Use nominal perf for limits when boost is disabled")
worked around those semantic changes.
This however is a confusing result and furthermore makes it awkward to
change frequency limits and boost due to the scaling differences. Restore
the boost numerator to highest perf again.
Suggested-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Fixes: ad4caad58d ("cpufreq: amd-pstate: Merge amd_pstate_highest_perf_set() into amd_get_boost_ratio_numerator()")
Link: https://lore.kernel.org/r/20241209185248.16301-2-mario.limonciello@amd.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Booting with amd-pstate on 3rd Generation EPYC system incorrectly
enabled ITMT support despite the system not supporting Preferred Core
ranking. amd_pstate_init_prefcore() called during amd_pstate*_cpu_init()
requires "amd_pstate_prefcore" to be set correctly however the preferred
core support is detected only after driver registration which is too
late.
Swap the function calls around to detect preferred core support before
registring the driver via amd_pstate_register_driver(). This ensures
amd_pstate*_cpu_init() sees the correct value of "amd_pstate_prefcore"
considering the platform support.
Fixes: 279f838a61 ("x86/amd: Detect preferred cores in amd_get_boost_ratio_numerator()")
Fixes: ff2653ded4 ("cpufreq/amd-pstate: Move registration after static function call update")
Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
Acked-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://lore.kernel.org/r/20241210032557.754-1-kprateek.nayak@amd.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
- Rework kfence support for the HPT MMU to work on systems with >= 16TB of RAM.
- Remove the powerpc "maple" platform, used by the "Yellow Dog Powerstation".
- Add support for DYNAMIC_FTRACE_WITH_CALL_OPS,
DYNAMIC_FTRACE_WITH_DIRECT_CALLS & BPF Trampolines.
- Add support for running KVM nested guests on Power11.
- Other small features, cleanups and fixes.
Thanks to: Amit Machhiwal, Arnd Bergmann, Christophe Leroy, Costa Shulyupin,
David Hunter, David Wang, Disha Goel, Gautam Menghani, Geert Uytterhoeven,
Hari Bathini, Julia Lawall, Kajol Jain, Keith Packard, Lukas Bulwahn, Madhavan
Srinivasan, Markus Elfring, Michal Suchanek, Ming Lei, Mukesh Kumar Chaurasiya,
Nathan Chancellor, Naveen N Rao, Nicholas Piggin, Nysal Jan K.A, Paulo Miguel
Almeida, Pavithra Prakash, Ritesh Harjani (IBM), Rob Herring (Arm), Sachin P
Bappalige, Shen Lichuan, Simon Horman, Sourabh Jain, Thomas Weißschuh, Thorsten
Blum, Thorsten Leemhuis, Venkat Rao Bagalkote, Zhang Zekun,
zhang jiao.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRjvi15rv0TSTaE+SIF0oADX8seIQUCZ0Fi5AAKCRAF0oADX8se
IeI0AQCAkNWRYzGNzPM6aMwDpq5qdeZzvp0rZxuNsRSnIKJlxAD+PAOxOietgjbQ
Lxt3oizg+UcH/304Y/iyT8IrwI4n+gE=
=xNtu
-----END PGP SIGNATURE-----
Merge tag 'powerpc-6.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
- Rework kfence support for the HPT MMU to work on systems with >= 16TB
of RAM.
- Remove the powerpc "maple" platform, used by the "Yellow Dog
Powerstation".
- Add support for DYNAMIC_FTRACE_WITH_CALL_OPS,
DYNAMIC_FTRACE_WITH_DIRECT_CALLS & BPF Trampolines.
- Add support for running KVM nested guests on Power11.
- Other small features, cleanups and fixes.
Thanks to Amit Machhiwal, Arnd Bergmann, Christophe Leroy, Costa
Shulyupin, David Hunter, David Wang, Disha Goel, Gautam Menghani, Geert
Uytterhoeven, Hari Bathini, Julia Lawall, Kajol Jain, Keith Packard,
Lukas Bulwahn, Madhavan Srinivasan, Markus Elfring, Michal Suchanek,
Ming Lei, Mukesh Kumar Chaurasiya, Nathan Chancellor, Naveen N Rao,
Nicholas Piggin, Nysal Jan K.A, Paulo Miguel Almeida, Pavithra Prakash,
Ritesh Harjani (IBM), Rob Herring (Arm), Sachin P Bappalige, Shen
Lichuan, Simon Horman, Sourabh Jain, Thomas Weißschuh, Thorsten Blum,
Thorsten Leemhuis, Venkat Rao Bagalkote, Zhang Zekun, and zhang jiao.
* tag 'powerpc-6.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (89 commits)
EDAC/powerpc: Remove PPC_MAPLE drivers
powerpc/perf: Add per-task/process monitoring to vpa_pmu driver
powerpc/kvm: Add vpa latency counters to kvm_vcpu_arch
docs: ABI: sysfs-bus-event_source-devices-vpa-pmu: Document sysfs event format entries for vpa_pmu
powerpc/perf: Add perf interface to expose vpa counters
MAINTAINERS: powerpc: Mark Maddy as "M"
powerpc/Makefile: Allow overriding CPP
powerpc-km82xx.c: replace of_node_put() with __free
ps3: Correct some typos in comments
powerpc/kexec: Fix return of uninitialized variable
macintosh: Use common error handling code in via_pmu_led_init()
powerpc/powermac: Use of_property_match_string() in pmac_has_backlight_type()
powerpc: remove dead config options for MPC85xx platform support
powerpc/xive: Use cpumask_intersects()
selftests/powerpc: Remove the path after initialization.
powerpc/xmon: symbol lookup length fixed
powerpc/ep8248e: Use %pa to format resource_size_t
powerpc/ps3: Reorganize kerneldoc parameter names
KVM: PPC: Book3S HV: Fix kmv -> kvm typo
powerpc/sstep: make emulate_vsx_load and emulate_vsx_store static
...
- Add virtual cpufreq driver for guest kernels (David Dai).
- Minor cleanup to various cpufreq drivers (Andy Shevchenko, Dhruva
Gole, Jie Zhan, Jinjie Ruan, Shuosheng Huang, Sibi Sankar, and Yuan
Can).
- Revert "cpufreq: brcmstb-avs-cpufreq: Fix initial command check" (Colin
Ian King).
- Improve DT bindings for qcom-hw driver (Dmitry Baryshkov, Konrad
Dybcio, and Nikunj Kela).
- Make cpuidle_play_dead() try all idle states with :enter_dead()
callbacks and change their return type to void (Rafael Wysocki).
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmdA6AcSHHJqd0Byand5
c29ja2kubmV0AAoJEILEb/54YlRxSnEP/AkRFAczWdQNqOEw3H7VPZJwqnN6BIpw
0ewhxx5QHP5TfQ/u/VXxgcja0nuTDVIDJSjvnypyxKJnZa5+sl0Kj9ieaIBlPKAl
m0UmiqT4mbw4mEIBse/6pZR6DCHOYNfmYTo6BdA1JDfVPM9fEmmicKmk1NW5BEt3
FOPPZpGACzlMxiFHLhEoDsMJZKGdGZQsu5mLhkHW3pmMVU8+pgKiypO0yVVl535l
vcgvrXjBRgHVdoUi99zK3ORW3n9O/UADvampEV0e4pxpTNd0kC3U3igaOAT+S/EI
1wGOxC7WAlMJju+5aUX8qEwg19kalg8ecU49ahhE9o/dRN9lOzLYm4xurpcVWJLH
pH88XJMTZGHAgfLWD+isR4NRnIYVxdG+K85YpWd560xD1Be+oJXAlrg1PcR2PLea
CcvyH8a4fGqnpvLxx3teYFYZElXSFkARxEXrfDx0KLDsy+q7DGaoiAOf+AonXipP
8HjhuLdh2jUPqWwdZR+4Td3HMh6sS0soLGHiURA+lGVS5ucek8ZLnbNoQGS+9NAU
u64QwAqH9kiW4VxLUC17ErfZLMBU1jj+C79LaMRUoc5wfv6WU9oOZ6Xg9JcGZUGV
cEsCQqtir9o0snRXHdQv5QTij8IFqTr2iNV8vZ0cmCp8UxSMN8/41p016BZ+Ahd/
yb7PJ7hHZGm6
=aB0d
-----END PGP SIGNATURE-----
Merge tag 'pm-6.13-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull more power management updates from Rafael Wysocki:
"These mostly are updates of cpufreq drivers used on ARM platforms plus
one new DT-based cpufreq driver for virtualized guests and two cpuidle
changes that should not make any difference on systems currently in
the field, but will be needed for future development:
- Add virtual cpufreq driver for guest kernels (David Dai)
- Minor cleanup to various cpufreq drivers (Andy Shevchenko, Dhruva
Gole, Jie Zhan, Jinjie Ruan, Shuosheng Huang, Sibi Sankar, and Yuan
Can)
- Revert "cpufreq: brcmstb-avs-cpufreq: Fix initial command check"
(Colin Ian King)
- Improve DT bindings for qcom-hw driver (Dmitry Baryshkov, Konrad
Dybcio, and Nikunj Kela)
- Make cpuidle_play_dead() try all idle states with :enter_dead()
callbacks and change their return type to void (Rafael Wysocki)"
* tag 'pm-6.13-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (22 commits)
cpuidle: Change :enter_dead() driver callback return type to void
cpuidle: Do not return from cpuidle_play_dead() on callback failures
arm64: dts: qcom: sc8180x: Add a SoC-specific compatible to cpufreq-hw
dt-bindings: cpufreq: cpufreq-qcom-hw: Add SC8180X compatible
cpufreq: sun50i: add a100 cpufreq support
cpufreq: mediatek-hw: Fix wrong return value in mtk_cpufreq_get_cpu_power()
cpufreq: CPPC: Fix wrong return value in cppc_get_cpu_power()
cpufreq: CPPC: Fix wrong return value in cppc_get_cpu_cost()
cpufreq: loongson3: Check for error code from devm_mutex_init() call
cpufreq: scmi: Fix cleanup path when boost enablement fails
cpufreq: CPPC: Fix possible null-ptr-deref for cppc_get_cpu_cost()
cpufreq: CPPC: Fix possible null-ptr-deref for cpufreq_cpu_get_raw()
Revert "cpufreq: brcmstb-avs-cpufreq: Fix initial command check"
dt-bindings: cpufreq: cpufreq-qcom-hw: Add SAR2130P compatible
cpufreq: add virtual-cpufreq driver
dt-bindings: cpufreq: add virtual cpufreq device
cpufreq: loongson2: Unregister platform_driver on failure
cpufreq: ti-cpufreq: Remove revision offsets in AM62 family
cpufreq: ti-cpufreq: Allow backward compatibility for efuse syscon
cppc_cpufreq: Remove HiSilicon CPPC workaround
...
- Set the required dev for a required OPP during genpd attach
- Add support for required OPPs to dev_pm_domain_attach_list()
pmdomain providers:
- ti: Enable GENPD_FLAG_ACTIVE_WAKEUP flag for ti_sci PM domains
- mediatek: Add support for MT6735 PM domains
- mediatek: Use OF-specific regulator API to get power domain supply
- qcom: Add support for the SM8750/SAR2130P/qcs615/qcs8300 rpmhpds
pmdomain consumers:
- Convert a couple of consumer drivers to *_pm_domain_attach|detach_list()
opp core:
- Rework and cleanup some code that manages required OPPs
- Remove *_opp_attach|detach_genpd()
-----BEGIN PGP SIGNATURE-----
iQJLBAABCgA1FiEEugLDXPmKSktSkQsV/iaEJXNYjCkFAmc7Y+QXHHVsZi5oYW5z
c29uQGxpbmFyby5vcmcACgkQ/iaEJXNYjCmbrw/9FVdo3+mo6EZne+Zsxc0+3N9G
bRwkgyCB8WSFy6MHD1TRyP9u8bFgFDitSxcUGuBW0l9t1le3IcsYwfbTEEZpUkU4
iasPoZoFKA3Akfr7tvQpIpSNh8MIBMFy7CfxWpsfiHlmwrIH6oT6HmlwWwFsbVxh
Fv8xA5SOE1KRHq0Aos23h7MizPsav/PYSh/4Ga5l6ZBlm40c16cE0i0M4RRUnoNY
FmUBe57HoumDd05ToFR9wrqMEVWbAJHV4xZpZwnfYUhwGrgbxUWI/1FIfwxtob12
OExPr9kiV8/f8Kfp3E3ul0R8q8XYaYZaT3R7nF5QOngLZCCpD5H9aWEcAihUryBh
Ol9Wao0Ku1JqLc776bjwf92ozFDtx0yUN/8LXmQgUu+e+MC3eAOrsls+U4731p+Y
V80mvBpqn8AbX7LOjJvmbOhK6Qnm0cHo2cs0afBSS5c9RBcCWuEy9d7n8dks6JX2
7H6ySDaKoEEK06V6VzEKdHQoRFkmw95w4n7Ei3OL0cxNLcT1ILGA1+O7PPw+6h4T
3UnSik6szZCBXFFaDsw4J53HZACVZWbkyrk5Dbsigte8lh6e2VI+qd9whd46nuCY
Sn4vJ399FJvXBuEIu+8yymGwn/NSpgd6ybpkwdDNJ+fItbes7DQhtxM6k74ANAhf
0irf1o4Yv0S5W+pXBAM=
=SpyF
-----END PGP SIGNATURE-----
Merge tag 'pmdomain-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm
Pull pmdomain updates from Ulf Hansson:
"pmdomain core:
- Set the required dev for a required OPP during genpd attach
- Add support for required OPPs to dev_pm_domain_attach_list()
pmdomain providers:
- ti: Enable GENPD_FLAG_ACTIVE_WAKEUP flag for ti_sci PM domains
- mediatek: Add support for MT6735 PM domains
- mediatek: Use OF-specific regulator API to get power domain supply
- qcom: Add support for the SM8750/SAR2130P/qcs615/qcs8300 rpmhpds
pmdomain consumers:
- Convert a couple of consumer drivers to
*_pm_domain_attach|detach_list()
opp core:
- Rework and cleanup some code that manages required OPPs
- Remove *_opp_attach|detach_genpd()"
* tag 'pmdomain-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm: (25 commits)
pmdomain: qcom: rpmhpd: Add rpmhpd support for SM8750
dt-bindings: power: qcom,rpmpd: document the SM8750 RPMh Power Domains
pmdomain: imx: Use of_property_present() for non-boolean properties
pmdomain: imx: gpcv2: replace dev_err() with dev_err_probe()
pmdomain: ti-sci: Use scope based of_node_put() to simplify code.
pmdomain: ti-sci: Add missing of_node_put() for args.np
pmdomain: ti-sci: set the GENPD_FLAG_ACTIVE_WAKEUP flag for all PM domains
pmdomain: mediatek: Add support for MT6735
pmdomain: qcom: rpmhpd: add support for SAR2130P
dt-bindings: power: Add binding for MediaTek MT6735 power controller
dt-bindings: power: rpmpd: Add SAR2130P compatible
OPP: Drop redundant *_opp_attach|detach_genpd()
cpufreq: qcom-nvmem: Convert to dev_pm_domain_attach|detach_list()
media: venus: Convert into devm_pm_domain_attach_list() for OPP PM domain
drm/tegra: gr3d: Convert into devm_pm_domain_attach_list()
OPP: Drop redundant code in _link_required_opps()
pmdomain: core: Set the required dev for a required OPP during genpd attach
pmdomain: core: Manage the default required OPP from a separate function
PM: domains: Support required OPPs in dev_pm_domain_attach_list()
OPP: Rework _set_required_devs() to manage a single device per call
...
with the purpose of using such hints when making scheduling decisions
- Determine the boost enumerator for each AMD core based on its type: efficiency
or performance, in the cppc driver
- Add the type of a CPU to the topology CPU descriptor with the goal of
supporting and making decisions based on the type of the respective core
- Add a feature flag to denote AMD cores which have heterogeneous topology and
enable SD_ASYM_PACKING for those
- Check microcode revisions before disabling PCID on Intel
- Cleanups and fixlets
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmc7q0UACgkQEsHwGGHe
VUq27Q//TADIn/rZj95OuWLYFXduOpzdyfF6BAOabRjUpIWTGJ5YdKjj1TCA2wUE
6SiHZWQxQropB3NgeICcDT+3OGdGzE2qywzpXspUDsBPraWx+9CA56qREYafpRps
88ZQZJWHla2/0kHN5oM4fYe05mWMLAFgIhG4tPH/7sj54Zqar40nhVksz3WjKAid
yEfzbdVeRI5sNoujyHzGANXI0Fo98nAyi5Qj9kXL9W/UV1JmoQ78Rq7V9IIgOBsc
l6Gv/h0CNtH9voqfrfUb07VHk8ZqSJ37xUnrnKdidncWGCWEAoZRr7wU+I9CHKIs
tzdx+zq6JC3YN0IwsZCjk4me+BqVLJxW2oDgW7esPifye6ElyEo4T9UO9LEpE1qm
ReAByoIMdSXWwXuITwy4NxLPKPCpU7RyJCiqFzpJp0g4qUq2cmlyERDirf6eknXL
s+dmRaglEdcQT/EL+Y+vfFdQtLdwJmOu+nPPjjFxeRcIDB+u1sXJMEFbyvkLL6FE
HOdNxL+5n/3M8Lbh77KIS5uCcjXL2VCkZK2/hyoifUb+JZR/ENoqYjElkMXOplyV
KQIfcTzVCLRVvZApf/MMkTO86cpxMDs7YLYkgFxDsBjRdoq/Mzub8yzWn6kLZtmP
ANNH4uYVtjrHE1nxJSA0JgYQlJKYeNU5yhLiTLKhHL5BwDYfiz8=
=420r
-----END PGP SIGNATURE-----
Merge tag 'x86_cpu_for_v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cpuid updates from Borislav Petkov:
- Add a feature flag which denotes AMD CPUs supporting workload
classification with the purpose of using such hints when making
scheduling decisions
- Determine the boost enumerator for each AMD core based on its type:
efficiency or performance, in the cppc driver
- Add the type of a CPU to the topology CPU descriptor with the goal of
supporting and making decisions based on the type of the respective
core
- Add a feature flag to denote AMD cores which have heterogeneous
topology and enable SD_ASYM_PACKING for those
- Check microcode revisions before disabling PCID on Intel
- Cleanups and fixlets
* tag 'x86_cpu_for_v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/cpu: Remove redundant CONFIG_NUMA guard around numa_add_cpu()
x86/cpu: Fix FAM5_QUARK_X1000 to use X86_MATCH_VFM()
x86/cpu: Fix formatting of cpuid_bits[] in scattered.c
x86/cpufeatures: Add X86_FEATURE_AMD_WORKLOAD_CLASS feature bit
x86/amd: Use heterogeneous core topology for identifying boost numerator
x86/cpu: Add CPU type to struct cpuinfo_topology
x86/cpu: Enable SD_ASYM_PACKING for PKG domain on AMD
x86/cpufeatures: Add X86_FEATURE_AMD_HETEROGENEOUS_CORES
x86/cpufeatures: Rename X86_FEATURE_FAST_CPPC to have AMD prefix
x86/mm: Don't disable PCID when INVLPG has been fixed by microcode
Update EPP default for balance_performance to 32.
This will give better performance out of the box using Intel P-State
powersave governor while still offering power savings compared to
performance governor.
This is in line with what has already been done for Emerald Rapids and
Sapphire Rapids.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Link: https://patch.msgid.link/20241112235946.368082-1-srinivas.pandruvada@linux.intel.com
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This driver is no longer buildable since the PPC_MAPLE platform was
removed, see commit 62f8f307c8 ("powerpc/64: Remove maple platform").
Remove the driver.
Note that the comment in the driver says it supports "SMU & 970FX
based G5 Macs", but that's not true, that comment was copied from
pmac64-cpufreq.c, which still exists and continues to support those
machines.
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://patch.msgid.link/20241112085148.415574-1-mpe@ellerman.id.au
Replace the 32-bit MSR access function with a 64-bit variant to simplify
the call site, eliminating unnecessary 32-bit value manipulations.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Link: https://patch.msgid.link/20241106182313.165297-1-chang.seok.bae@intel.com
[ rjw: Subject edit ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Notice that hybrid_init_cpu_capacity_scaling() only needs to hold
hybrid_capacity_lock around __hybrid_init_cpu_capacity_scaling()
calls, so introduce a "locked" wrapper around the latter and call
it from the former. This allows to drop a local variable and a
label that are not needed any more.
Also, rename __hybrid_init_cpu_capacity_scaling() to
__hybrid_refresh_cpu_capacity_scaling() for consistency.
Interestingly enough, this fixes a locking issue introduced by commit
929ebc93cc ("cpufreq: intel_pstate: Set asymmetric CPU capacity on
hybrid systems") that put an arch_enable_hybrid_capacity_scale() call
under hybrid_capacity_lock, which was a mistake because the latter is
acquired in CPU hotplug paths and so it cannot be held around
cpus_read_lock() calls.
Link: https://lore.kernel.org/linux-pm/SJ1PR11MB6129EDBF22F8A90FC3A3EDC8B9582@SJ1PR11MB6129.namprd11.prod.outlook.com/
Fixes: 929ebc93cc ("cpufreq: intel_pstate: Set asymmetric CPU capacity on hybrid systems")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reported-by: "Borah, Chaitanya Kumar" <chaitanya.kumar.borah@intel.com>
Link: https://patch.msgid.link/12554508.O9o76ZdvQC@rjwysocki.net
[ rjw: Changelog update ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Let's add cpufreq nvmem based for allwinner a100 soc. It's similar to h6,
let us use efuse_xlate to extract the differentiated part.
Signed-off-by: Shuosheng Huang <huangshuosheng@allwinnertech.com>
[masterr3c0rd@epochal.quest: add A100 to opp_match_list]
Signed-off-by: Cody Eksal <masterr3c0rd@epochal.quest>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Tested-by: Andre Przywara <andre.przywara@arm.com>
Tested-by: Parthiban Nallathambi <parthiban@linumiz.com>
Acked-by: Chen-Yu Tsai <wens@csie.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
mtk_cpufreq_get_cpu_power() return 0 if the policy is NULL. Then in
em_create_perf_table(), the later zero check for power is not invalid
as power is uninitialized. As Lukasz suggested, it must return -EINVAL when
the 'policy' is not found. So return -EINVAL to fix it.
Cc: stable@vger.kernel.org
Fixes: 4855e26bcf ("cpufreq: mediatek-hw: Add support for CPUFREQ HW")
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Suggested-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
cppc_get_cpu_power() return 0 if the policy is NULL. Then in
em_create_perf_table(), the later zero check for power is not valid
as power is uninitialized. As Quentin pointed out, kernel energy model
core check the return value of active_power() first, so if the callback
failed it should tell the core. So return -EINVAL to fix it.
Fixes: a78e720756 ("cpufreq: CPPC: Fix possible null-ptr-deref for cpufreq_cpu_get_raw()")
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Suggested-by: Quentin Perret <qperret@google.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
cppc_get_cpu_cost() return 0 if the policy is NULL. Then in
em_compute_costs(), the later zero check for cost is not valid
as cost is uninitialized. As Quentin pointed out, kernel energy model
core check the return value of get_cost() first, so if the callback
failed it should tell the core. Return -EINVAL to fix it.
Fixes: 1a1374bb8c ("cpufreq: CPPC: Fix possible null-ptr-deref for cppc_get_cpu_cost()")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/all/c4765377-7830-44c2-84fa-706b6e304e10@stanley.mountain/
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Suggested-by: Quentin Perret <qperret@google.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Even if it's not critical, the avoidance of checking the error code
from devm_mutex_init() call today diminishes the point of using devm
variant of it. Tomorrow it may even leak something. Add the missed
check.
Fixes: ccf5145414 ("cpufreq: Add Loongson-3 CPUFreq driver support")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Commit 929ebc93cc ("cpufreq: intel_pstate: Set asymmetric CPU
capacity on hybrid systems") overlooked a corner case in which some
CPUs may be offline to start with and brought back online later,
after the intel_pstate driver has been registered, so their asymmetric
capacity will not be set.
Address this by calling hybrid_update_capacity() in the CPU
initialization path that is executed instead of the online path
for those CPUs.
Note that this asymmetric capacity update will be skipped during
driver initialization and mode switches because hybrid_max_perf_cpu
is NULL in those cases.
Fixes: 929ebc93cc ("cpufreq: intel_pstate: Set asymmetric CPU capacity on hybrid systems")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/1913414.tdWV9SEqCh@rjwysocki.net
Modify intel_pstate_register_driver() to clear hybrid_max_perf_cpu
before calling cpufreq_register_driver(), so that asymmetric CPU
capacity scaling is not updated until hybrid_init_cpu_capacity_scaling()
runs down the road. This is done in preparation for a subsequent
change adding asymmetric CPU capacity computation to the CPU init path
to handle CPUs that are initially offline.
The information on whether or not hybrid_max_perf_cpu was NULL before
it has been cleared is passed to hybrid_init_cpu_capacity_scaling(),
so full initialization of CPU capacity scaling can be skipped if it
has been carried out already.
No intentional functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/4616631.LvFx2qVVIh@rjwysocki.net
cpufreq_cpu_get_raw() may return NULL if the cpu is not in
policy->cpus cpu mask and it will cause null pointer dereference,
so check NULL for cppc_get_cpu_cost().
Fixes: 740fcdc2c2 ("cpufreq: CPPC: Register EM based on efficiency class information")
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
cpufreq_cpu_get_raw() may return NULL if the cpu is not in
policy->cpus cpu mask and it will cause null pointer dereference.
Fixes: 740fcdc2c2 ("cpufreq: CPPC: Register EM based on efficiency class information")
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Currently the condition ((rc != -ENOTSUPP) || (rc != -EINVAL)) is always
true because rc cannot be equal to two different values at the same time,
so it must be not equal to at least one of them. Fix the original commit
that introduced the issue.
This reverts commit 22a26cc6a5.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Acked-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
On shared memory designs the static functions need to work before
registration is done or the system can hang at bootup.
Move the registration later in amd_pstate_init() to solve this.
Fixes: b427ac4084 ("cpufreq/amd-pstate: Remove the redundant amd_pstate_set_driver() call")
Reported-by: Klara Modin <klarasmodin@gmail.com>
Closes: https://lore.kernel.org/linux-pm/cf9c146d-bacf-444e-92e2-15ebf513af96@gmail.com/#t
Tested-by: Klara Modin <klarasmodin@gmail.com>
Tested-by: Dhananjay Ugwekar <dhananjay.ugwekar@amd.com>
Link: https://lore.kernel.org/r/20241028145542.1739160-2-superm1@kernel.org
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
As the driver can be changed in and out of different modes it's possible
that adjust_perf is assigned when it shouldn't be.
This could happen if an MSR design is started up in passive mode and then
switches to active mode.
To solve this explicitly clear `adjust_perf` in amd_pstate_epp_cpu_init().
Tested-by: Klara Modin <klarasmodin@gmail.com>
Tested-by: Dhananjay Ugwekar <dhananjay.ugwekar@amd.com>
Link: https://lore.kernel.org/r/20241028145542.1739160-1-superm1@kernel.org
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Introduce a virtualized cpufreq driver for guest kernels to improve
performance and power of workloads within VMs.
This driver does two main things:
1. Sends the frequency of vCPUs as a hint to the host. The host uses the
hint to schedule the vCPU threads and decide physical CPU frequency.
2. If a VM does not support a virtualized FIE(like AMUs), it queries the
host CPU frequency by reading a MMIO region of a virtual cpufreq device
to update the guest's frequency scaling factor periodically. This enables
accurate Per-Entity Load Tracking for tasks running in the guest.
Co-developed-by: Saravana Kannan <saravanak@google.com>
Signed-off-by: Saravana Kannan <saravanak@google.com>
Signed-off-by: David Dai <davidai@google.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Set min_perf to lowest_perf for shared memory systems, similar to the MSR
based systems.
Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Link: https://lore.kernel.org/r/20241023102108.5980-5-Dhananjay.Ugwekar@amd.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
The EPP value being set in perf_ctrls.energy_perf is not being propagated
to the shared memory, fix that.
Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Perry Yuan <perry.yuan@amd.com>
Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Link: https://lore.kernel.org/r/20241023102108.5980-4-Dhananjay.Ugwekar@amd.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
MSR_AMD_CPPC_ENABLE is a write once register, i.e. attempting to clear
it is futile, it will not take effect. Hence, return if disable (0)
argument is passed to the msr_cppc_enable()
Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Link: https://lore.kernel.org/r/20241023102108.5980-3-Dhananjay.Ugwekar@amd.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
commit 642aff3964b0f ("cpufreq/amd-pstate: Set the initial min_freq to
lowest_nonlinear_freq") changed the initial minimum frequency to lowest
nonlinear frequency, but the unit tests weren't updated and now fail.
Update them to match this same change.
Fixes: 642aff3964b0f ("cpufreq/amd-pstate: Set the initial min_freq to lowest_nonlinear_freq")
Link: https://lore.kernel.org/r/20241017173439.4924-1-mario.limonciello@amd.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Currently the default cpufreq driver for all the AMD EPYC servers is
acpi-cpufreq. Going forward, switch to amd-pstate as the default
driver on the AMD EPYC server platforms with CPU family 0x1A or
higher. The default mode will be active mode.
Testing shows that amd-pstate with active mode and performance
governor provides comparable or better performance per-watt against
acpi-cpufreq + performance governor.
Likewise, amd-pstate with active mode and powersave governor with the
energy_performance_preference=power (EPP=255) provides comparable or
better performance per-watt against acpi-cpufreq + schedutil governor
for a wide range of workloads.
Users can still revert to using acpi-cpufreq driver on these platforms
with the "amd_pstate=disable" kernel commandline parameter.
Signed-off-by: Swapnil Sapkal <swapnil.sapkal@amd.com>
Signed-off-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://lore.kernel.org/r/20241021101836.9047-3-gautham.shenoy@amd.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
The amd-pstate driver sets CPPC_REQ.min_perf to CPPC_REQ.max_perf when
in active mode with performance governor. Typically CPPC_REQ.max_perf
is set to CPPC.highest_perf. This causes frequency throttling on
power-limited platforms which causes performance regressions on
certain classes of workloads.
Hence, set the CPPC_REQ.min_perf to the CPPC.nominal_perf or
CPPC_REQ.max_perf, whichever is lower of the two.
Fixes: ffa5096a7c ("cpufreq: amd-pstate: implement Pstate EPP support for the AMD processors")
Signed-off-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://lore.kernel.org/r/20241021101836.9047-2-gautham.shenoy@amd.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
amd_pstate_set_driver() is called twice, once in amd_pstate_init() and once
as part of amd_pstate_register_driver(). Move around code and eliminate
the redundancy.
Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://lore.kernel.org/r/20241017100528.300143-5-Dhananjay.Ugwekar@amd.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Replace the switch case with a more readable if condition.
Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://lore.kernel.org/r/20241017100528.300143-4-Dhananjay.Ugwekar@amd.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Replace a similar chunk of code in amd_pstate_register_driver() with
amd_pstate_set_driver() call.
Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://lore.kernel.org/r/20241017100528.300143-3-Dhananjay.Ugwekar@amd.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Replace a similar chunk of code in amd_pstate_init() with
amd_pstate_register() call.
Suggested-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://lore.kernel.org/r/20241017100528.300143-2-Dhananjay.Ugwekar@amd.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
According to the AMD architectural programmer's manual volume 2 [1], in
section "17.6.4.1 CPPC_CAPABILITY_1" lowest_nonlinear_perf is described
as "Reports the most energy efficient performance level (in terms of
performance per watt). Above this threshold, lower performance levels
generally result in increased energy efficiency. Reducing performance
below this threshold does not result in total energy savings for a given
computation, although it reduces instantaneous power consumption". So
lowest_nonlinear_perf is the most power efficient performance level, and
going below that would lead to a worse performance/watt.
Also, setting the minimum frequency to lowest_nonlinear_freq (instead of
lowest_freq) allows the CPU to idle at a higher frequency which leads
to more time being spent in a deeper idle state (as trivial idle tasks
are completed sooner). This has shown a power benefit in some systems,
in other systems, power consumption has increased but so has the
throughput/watt.
Modify the initial policy_data->min set by cpufreq-core to
lowest_nonlinear_freq, in the ->verify() callback. Also set the
cpudata->req[0] to FREQ_QOS_MIN_DEFAULT_VALUE (i.e. 0), so that it also
gets overriden by the check in verify function.
Link: https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/24593.pdf [1]
Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://lore.kernel.org/r/20241017053927.25285-3-Dhananjay.Ugwekar@amd.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Merge the two verify() callback functions and rename the
cpufreq_policy_data argument for better readability.
Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Link: https://lore.kernel.org/r/20241017053927.25285-2-Dhananjay.Ugwekar@amd.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
The EPP value doesn't need to be cached to the CPPC request in
amd_pstate_epp_update_limit() because it's passed as an argument
at the end to amd_pstate_set_epp() and stored at that time.
Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Reviewed-by: Perry Yuan <perry.yuan@amd.com>
Tested-by: Dhananjay Ugwekar <dhananjay.ugwekar@amd.com>
Link: https://lore.kernel.org/r/20241012174519.897-4-mario.limonciello@amd.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
When the EPP updates are set the maximum capable frequency for the
CPU is used to set the upper limit instead of that of the policy.
Adjust amd_pstate_epp_update_limit() to reuse policy calculation code
from amd_pstate_update_min_max_limit().
Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Reviewed-by: Perry Yuan <perry.yuan@amd.com>
Tested-by: Dhananjay Ugwekar <dhananjay.ugwekar@amd.com>
Link: https://lore.kernel.org/r/20241012174519.897-3-mario.limonciello@amd.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
When boost is changed the CPPC value is changed in amd_pstate_cpu_boost_update()
but then changed again when refresh_frequency_limits() and all it's callbacks
occur. The first is a pointless write, so instead just update the limits for
the policy and let the policy refresh anchor everything properly.
Fixes: c8c68c38b5 ("cpufreq: amd-pstate: initialize core precision boost state")
Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Reviewed-by: Perry Yuan <perry.yuan@amd.com>
Tested-by: Dhananjay Ugwekar <dhananjay.ugwekar@amd.com>
Link: https://lore.kernel.org/r/20241012174519.897-2-mario.limonciello@amd.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Existing function names "cppc_*" and "pstate_*" for shared memory and
MSR based systems are not intuitive enough, replace them with "shmem_*" and
"msr_*" respectively.
Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://lore.kernel.org/r/20240917091434.10685-1-Dhananjay.Ugwekar@amd.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
After commit 0edb555a65 ("platform: Make platform_driver::remove()
return void") .remove() is (again) the right callback to implement for
platform drivers.
Convert all platform drivers below drivers/cpufreq to use .remove(),
with the eventual goal to drop struct platform_driver::remove_new(). As
.remove() and .remove_new() have the same prototypes, conversion is done
by just changing the structure member name in the driver initializer.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Link: https://patch.msgid.link/20241020153910.324096-2-u.kleine-koenig@baylibre.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
When I booted my RK3588 based system I noticed that cpufreq complained
about system clock:
[ +0.007211] cpufreq: cpufreq_online: CPU0: Running at unlisted initial frequency: 816000 KHz, changing to: 1008000 KHz
Then I realized that unit is displayed wrong: "KHz" instead of "kHz".
Signed-off-by: Marcin Juszkiewicz <marcin.juszkiewicz@linaro.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Link: https://patch.msgid.link/20240909095529.2325103-1-marcin.juszkiewicz@linaro.org
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
When cpufreq_register_driver() returns error, the cpufreq_init() returns
without unregister platform_driver, fix by add missing
platform_driver_unregister() when cpufreq_register_driver() failed.
Fixes: f8ede0f700 ("MIPS: Loongson 2F: Add CPU frequency scaling support")
Signed-off-by: Yuan Can <yuancan@huawei.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
When boost has been disabled the limit for perf should be nominal perf not
the highest perf. Using the latter to do calculations will lead to
incorrect values that are still above nominal.
Fixes: ad4caad58d ("cpufreq: amd-pstate: Merge amd_pstate_highest_perf_set() into amd_get_boost_ratio_numerator()")
Reported-by: Peter Jung <ptr1337@cachyos.org>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219348
Reviewed-by: Perry Yuan <perry.yuan@amd.com>
Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Tested-by: Dhananjay Ugwekar <dhananjay.ugwekar@amd.com>
Link: https://lore.kernel.org/r/20241012174519.897-1-mario.limonciello@amd.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Rather than hooking up the PM domains through _opp_attach_genpd() and
manually manage runtime PM for the corresponding virtual devices created by
genpd during attach, let's avoid the boilerplate-code by converting into
dev_pm_domain_attach|detach_list.
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://lore.kernel.org/r/20241002122232.194245-11-ulf.hansson@linaro.org
With the Silicon revision being taken directly from socinfo, there's no
longer any need for reading any SOC register for revision from this driver.
Hence, we do not require any rev_offset for AM62 family of devices.
Signed-off-by: Dhruva Gole <d-gole@ti.com>
Reviewed-by: Nishanth Menon <nm@ti.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
The AM625 syscon for efuse was being taken earlier from the wkup_conf node
where the entire wkup_conf was marked as "syscon". This is wrong and will
be fixed in the devicetree. However, whenever that does happen will end up
breaking this driver for that device because of the change in efuse offset.
Hence, to avoid breaking any sort of backward compatibility of devicetrees
use a quirk to distinguish and accordingly use 0x0 offset for the new
syscon node.
Suggested-by: Nishanth Menon <nm@ti.com>
Signed-off-by: Dhruva Gole <d-gole@ti.com>
Reviewed-by: Nishanth Menon <nm@ti.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
While switching the driver mode between active and passive, Collaborative
Processor Performance Control (CPPC) is disabled in
amd_pstate_unregister_driver(). But, it is not enabled back while registering
the new driver (passive or active). This leads to the new driver mode not
working correctly, so enable it back in amd_pstate_register_driver().
Fixes: 3ca7bc818d ("cpufreq: amd-pstate: Add guided mode control support via sysfs")
Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://lore.kernel.org/r/20241004122303.94283-1-Dhananjay.Ugwekar@amd.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
- Fix CPU device node reference counting in the cpufreq core (Miquel
Sabaté Solà).
- Turn the spinlock used by the intel_pstate driver in hard IRQ
context into a raw one to prevent the driver from crashing when
PREEMPT_RT is enabled (Uwe Kleine-König).
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmcAJMUSHHJqd0Byand5
c29ja2kubmV0AAoJEILEb/54YlRx4o0P/2eVuEGFIOurheROyFs7UrYRwODvWpgd
omE2iyihyi+E85IhOD3pPhwv8G156MhfGQ17Bwmt15jczzAs+j9ijPoX9tNxYMlh
oUq9lsF3B+1sF2VD31iRK6oUjwSQIrjoCzlEuKfW/2VQKmHFFNbj1oj6gYhtWJnb
0NfozwDMVeN+hqRAZ1TNXnQcohgpQJjkaalHJccTgSNlTJ6J3nCvamdrBRMbBJLU
DbqYWkLHw26AzARxV3isTj8dFb9R/ng6g/uCBuHGID9JbYCpQS72n2H/9yff7Msf
wHnRpaDYaAniJPFXlZDyct/LYR9JI477nYIO8jJenfNLr1640hFGBF8DyM7EREcU
ACnnbQTY5omGyhNoTRrEnSdwa54ukzABlNcldCLn1ilIvWqFrR3YFwX/tNsyBYFL
Y2HGwfEHpFoKYdUXs3wLN0prnaV0ILr71Kv9OR+d2lB/KmPCSMvYvJSpYVMrl93R
hsOkZUprd0DXZCRDr4mSFbUZcNzOwMi7c1QgueRcdJuoG8rXzi0nq3RrtV2rgoTz
beOnxUieEzWrl4OVTtbiBu2H2o2N0VM7znaNxr1hWjwzHeXpxs0ChmQRwmY450fW
bvOpqhEkrrBTGuQ0xgn41KpIV0G/X0ULrjsC9W9bmf4Oi6aUxQvtRTTvYDjH9pil
Gy3P1hPfPUkb
=mgVf
-----END PGP SIGNATURE-----
Merge tag 'pm-6.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These fix two cpufreq issues, one in the core and one in the
intel_pstate driver:
- Fix CPU device node reference counting in the cpufreq core (Miquel
Sabaté Solà)
- Turn the spinlock used by the intel_pstate driver in hard IRQ
context into a raw one to prevent the driver from crashing when
PREEMPT_RT is enabled (Uwe Kleine-König)"
* tag 'pm-6.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: Avoid a bad reference count on CPU node
cpufreq: intel_pstate: Make hwp_notify_lock a raw spinlock
Since commit 6c8d750f97 ("cpufreq / cppc: Work around for Hisilicon CPPC
cpufreq"), we introduce a workround for HiSilicon platforms that do not
support performance feedback counters, whereas they can get the actual
frequency from the desired perf register. Later on, FIE is disabled in
that workaround as well.
Now the workround can be handled by the common code. Desired perf would be
read and converted to frequency if feedback counters don't change. FIE
would be disabled if the CPPC regs are in PCC region.
Hence, the workaround is no longer needed and can be safely removed, in an
effort to consolidate the driver procedure.
Signed-off-by: Jie Zhan <zhanjie9@hisilicon.com>
Reviewed-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Reviewed-by: Huisong Li <lihuisong@huawei.com>
[ Viresh: Move fie_disabled withing CONFIG option to fix warning ]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
The CPPC performance feedback counters could be 0 or unchanged when the
target cpu is in a low-power idle state, e.g. power-gated or clock-gated.
When the counters are 0, cppc_cpufreq_get_rate() returns 0 KHz, which makes
cpufreq_online() get a false error and fail to generate a cpufreq policy.
When the counters are unchanged, the existing cppc_perf_from_fbctrs()
returns a cached desired perf, but some platforms may update the real
frequency back to the desired perf reg.
For the above cases in cppc_cpufreq_get_rate(), get the latest desired perf
from the CPPC reg to reflect the frequency because some platforms may
update the actual frequency back there; if failed, use the cached desired
perf.
Fixes: 6a4fec4f6d ("cpufreq: cppc: cppc_cpufreq_get_rate() returns zero in all error cases.")
Signed-off-by: Jie Zhan <zhanjie9@hisilicon.com>
Reviewed-by: Zeng Heng <zengheng4@huawei.com>
Reviewed-by: Ionela Voinescu <ionela.voinescu@arm.com>
Reviewed-by: Huisong Li <lihuisong@huawei.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
asm/unaligned.h is always an include of asm-generic/unaligned.h;
might as well move that thing to linux/unaligned.h and include
that - there's nothing arch-specific in that header.
auto-generated by the following:
for i in `git grep -l -w asm/unaligned.h`; do
sed -i -e "s/asm\/unaligned.h/linux\/unaligned.h/" $i
done
for i in `git grep -l -w asm-generic/unaligned.h`; do
sed -i -e "s/asm-generic\/unaligned.h/linux\/unaligned.h/" $i
done
git mv include/asm-generic/unaligned.h include/linux/unaligned.h
git mv tools/include/asm-generic/unaligned.h tools/include/linux/unaligned.h
sed -i -e "/unaligned.h/d" include/asm-generic/Kbuild
sed -i -e "s/__ASM_GENERIC/__LINUX/" include/linux/unaligned.h tools/include/linux/unaligned.h
notify_hwp_interrupt() is called via sysvec_thermal() ->
smp_thermal_vector() -> intel_thermal_interrupt() in hard irq context.
For this reason it must not use a simple spin_lock that sleeps with
PREEMPT_RT enabled. So convert it to a raw spinlock.
Reported-by: xiao sheng wen <atzlinux@sina.com>
Link: https://bugs.debian.org/1076483
Signed-off-by: Uwe Kleine-König <ukleinek@debian.org>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Tested-by: xiao sheng wen <atzlinux@sina.com>
Link: https://patch.msgid.link/20240919081121.10784-2-ukleinek@debian.org
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
- Implement the SCHED_DEADLINE server infrastructure - Daniel Bristot de Oliveira's
last major contribution to the kernel:
"SCHED_DEADLINE servers can help fixing starvation issues of low priority
tasks (e.g., SCHED_OTHER) when higher priority tasks monopolize CPU
cycles. Today we have RT Throttling; DEADLINE servers should be able to
replace and improve that."
(Daniel Bristot de Oliveira, Peter Zijlstra, Joel Fernandes,
Youssef Esmat, Huang Shijie)
- Preparatory changes for sched_ext integration:
- Use set_next_task(.first) where required
- Fix up set_next_task() implementations
- Clean up DL server vs. core sched
- Split up put_prev_task_balance()
- Rework pick_next_task()
- Combine the last put_prev_task() and the first set_next_task()
- Rework dl_server
- Add put_prev_task(.next)
(Peter Zijlstra, with a fix by Tejun Heo)
- Complete the EEVDF transition and refine EEVDF scheduling:
- Implement delayed dequeue
- Allow shorter slices to wakeup-preempt
- Use sched_attr::sched_runtime to set request/slice suggestion
- Document the new feature flags
- Remove unused and duplicate-functionality fields
- Simplify & unify pick_next_task_fair()
- Misc debuggability enhancements
(Peter Zijlstra, with fixes/cleanups by Dietmar Eggemann,
Valentin Schneider and Chuyi Zhou)
- Initialize the vruntime of a new task when it is first enqueued,
resulting in significant decrease in latency of newly woken tasks.
(Zhang Qiao)
- Introduce SM_IDLE and an idle re-entry fast-path in __schedule()
(K Prateek Nayak, Peter Zijlstra)
- Clean up and clarify the usage of Clean up usage of rt_task()
(Qais Yousef)
- Preempt SCHED_IDLE entities in strict cgroup hierarchies
(Tianchen Ding)
- Clarify the documentation of time units for deadline scheduler
parameters. (Christian Loehle)
- Remove the HZ_BW chicken-bit feature flag introduced a year ago,
the original change seems to be working fine.
(Phil Auld)
- Misc fixes and cleanups (Chen Yu, Dan Carpenter, Huang Shijie,
Peilin He, Qais Yousefm and Vincent Guittot)
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmbr8qcRHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1gdbw/+Mj3zWfYP+dtUkfgrR2FClPAJoo1/9Dz0
LYD8XgYHu8rEJ0Aq+VbdkgYGUt9utvzUFPIxvWFDcldQl57KwhF4hp9Ir+PqJyYC
NolQ1q8ddo1hnslxnEg6SgHVzQq/4FqMM0nDNUkQETCx6zTyFFeRf+q7o/2c2m5B
uI9dSU1Wrx7XrXm2D3kB8+xP+ZRy+qhbFN5Pfuz96mhelfklylgKMfPzgAiCT/7T
JTbQhQ2HdcCNgiLoSrWsHBDy2UYpouP4zb4jyd+lDQzhSUJrj3u4Xy4vVmuTKq+y
sTgWlgKB+MTuh9UuJ4UYzSnMqg161UlMvtXeH84ABmAqDNGHRPtOKrrlcLtJ3D4x
m1SPhNnsvpjOu2pH0XLIS8al3VUesWND5S+rucHRYSq6Nvhivf4MTvRJlicXXurL
Mt2APnIlhGJuKBNWnmyZovVdtO0ZUUPlaZWfr3rCS4txAVo+HwWhsm3uhtTycQqN
gazsCiuGh6Jds90ZqA/BvdLWG+DY8J0xLlV3ex4pCXuQ/HFrabVWTyThJsULhrZ2
5mTdWIsocPctNMO9/RHMy7vJI7G7ljgHEquWVn5kiGGzXhK6VwVwKAMpfgXGw+YA
yVP6/M7a7g2yEzj69gXkcDa8k/kedMVquJ/G/8YhZM7u7sPqsMjpmaGsqsJRfnpT
ChngAzap+kA=
=TEC6
-----END PGP SIGNATURE-----
Merge tag 'sched-core-2024-09-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
- Implement the SCHED_DEADLINE server infrastructure - Daniel Bristot
de Oliveira's last major contribution to the kernel:
"SCHED_DEADLINE servers can help fixing starvation issues of low
priority tasks (e.g., SCHED_OTHER) when higher priority tasks
monopolize CPU cycles. Today we have RT Throttling; DEADLINE
servers should be able to replace and improve that."
(Daniel Bristot de Oliveira, Peter Zijlstra, Joel Fernandes, Youssef
Esmat, Huang Shijie)
- Preparatory changes for sched_ext integration:
- Use set_next_task(.first) where required
- Fix up set_next_task() implementations
- Clean up DL server vs. core sched
- Split up put_prev_task_balance()
- Rework pick_next_task()
- Combine the last put_prev_task() and the first set_next_task()
- Rework dl_server
- Add put_prev_task(.next)
(Peter Zijlstra, with a fix by Tejun Heo)
- Complete the EEVDF transition and refine EEVDF scheduling:
- Implement delayed dequeue
- Allow shorter slices to wakeup-preempt
- Use sched_attr::sched_runtime to set request/slice suggestion
- Document the new feature flags
- Remove unused and duplicate-functionality fields
- Simplify & unify pick_next_task_fair()
- Misc debuggability enhancements
(Peter Zijlstra, with fixes/cleanups by Dietmar Eggemann, Valentin
Schneider and Chuyi Zhou)
- Initialize the vruntime of a new task when it is first enqueued,
resulting in significant decrease in latency of newly woken tasks
(Zhang Qiao)
- Introduce SM_IDLE and an idle re-entry fast-path in __schedule()
(K Prateek Nayak, Peter Zijlstra)
- Clean up and clarify the usage of Clean up usage of rt_task()
(Qais Yousef)
- Preempt SCHED_IDLE entities in strict cgroup hierarchies
(Tianchen Ding)
- Clarify the documentation of time units for deadline scheduler
parameters (Christian Loehle)
- Remove the HZ_BW chicken-bit feature flag introduced a year ago,
the original change seems to be working fine (Phil Auld)
- Misc fixes and cleanups (Chen Yu, Dan Carpenter, Huang Shijie,
Peilin He, Qais Yousefm and Vincent Guittot)
* tag 'sched-core-2024-09-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (64 commits)
sched/cpufreq: Use NSEC_PER_MSEC for deadline task
cpufreq/cppc: Use NSEC_PER_MSEC for deadline task
sched/deadline: Clarify nanoseconds in uapi
sched/deadline: Convert schedtool example to chrt
sched/debug: Fix the runnable tasks output
sched: Fix sched_delayed vs sched_core
kernel/sched: Fix util_est accounting for DELAY_DEQUEUE
kthread: Fix task state in kthread worker if being frozen
sched/pelt: Use rq_clock_task() for hw_pressure
sched/fair: Move effective_cpu_util() and effective_cpu_util() in fair.c
sched/core: Introduce SM_IDLE and an idle re-entry fast-path in __schedule()
sched: Add put_prev_task(.next)
sched: Rework dl_server
sched: Combine the last put_prev_task() and the first set_next_task()
sched: Rework pick_next_task()
sched: Split up put_prev_task_balance()
sched: Clean up DL server vs core sched
sched: Fixup set_next_task() implementations
sched: Use set_next_task(.first) where required
sched/fair: Properly deactivate sched_delayed task upon class change
...
* Move the calculation of the AMD boost numerator outside of
amd-pstate, correcting acpi-cpufreq on systems with preferred cores
* Harden preferred core detection to avoid potential false positives
* Add extra unit test coverage for mode state machine
-----BEGIN PGP SIGNATURE-----
iQJOBAABCgA4FiEECwtuSU6dXvs5GA2aLRkspiR3AnYFAmbhviEaHG1hcmlvLmxp
bW9uY2llbGxvQGFtZC5jb20ACgkQLRkspiR3AnYqDA//TrvmXcpk1mnVJw3Y7MG0
/n8dsLpxqVtEf+USnlGR+iRhgSQ/W/Kr7b5a+jmdCwpHChuWHt2FnNgcHLIxDnZC
vmEJ02/2BCRoPKvcvV4VTh0ATu3O9nqwQiBVWBdNjDy+Dzr0pzA+SQopt1hCIsO2
mzUodhpiBqYKlMf/i6+aM1gZCGGqoRC40aGqnJsgegb61vl7zIc2ZcbTxUQlyTfv
t6J73IXLx8+YtrjejBYc7mRHhMQ2hCKy92C/8cNoGocj5faSKsAA3OUDcWq8qX0U
zK3GGGdW8MLHSbt3VyntstnfiLL7TnzowcjvrMudIWpjC1987GlE9BApbN9VRZ8e
ARN3Y7/ltjut/1fRB97BwjI9aDpzA0122Qzy4UOcK8o+be1eIr+ihV3Z9EN/snWg
0L/oq5+rGHvvIzf1BwGhoPSvgBIu7eMIYDcRxKPlEiKsbXrL4DdJC/nXgaZ/HiGO
eHx1dNy7LFrdnEwVI1frZWC6ZuZcpmOBdhnfU+leVxzB3Z++Qc266rsxKBsc5taZ
PPV18pxfbbl3iL85KDIbuBUCmA0aY8WEdCKtfXpl7zlB5g0fZQLyYeUbvahK08Sk
vyQAnPECbX/4v1Vx54Z70GPk0XD2+TXdg8yApnXrmRc36z/SLdprk5hPKbKhZu/r
iPxFUnvd0HCtjsLrsq/qUiQ=
=R4HZ
-----END PGP SIGNATURE-----
Merge tag 'amd-pstate-v6.12-2024-09-11' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/superm1/linux
Merge the second round of amd-pstate changes for 6.12 from Mario
Limonciello:
"* Move the calculation of the AMD boost numerator outside of
amd-pstate, correcting acpi-cpufreq on systems with preferred cores
* Harden preferred core detection to avoid potential false positives
* Add extra unit test coverage for mode state machine"
* tag 'amd-pstate-v6.12-2024-09-11' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/superm1/linux:
cpufreq/amd-pstate-ut: Fix an "Uninitialized variables" issue
cpufreq/amd-pstate-ut: Add test case for mode switches
cpufreq/amd-pstate: Export symbols for changing modes
amd-pstate: Add missing documentation for `amd_pstate_prefcore_ranking`
cpufreq: amd-pstate: Add documentation for `amd_pstate_hw_prefcore`
cpufreq: amd-pstate: Optimize amd_pstate_update_limits()
cpufreq: amd-pstate: Merge amd_pstate_highest_perf_set() into amd_get_boost_ratio_numerator()
x86/amd: Detect preferred cores in amd_get_boost_ratio_numerator()
x86/amd: Move amd_get_highest_perf() out of amd-pstate
ACPI: CPPC: Adjust debug messages in amd_set_max_freq_ratio() to warn
ACPI: CPPC: Drop check for non zero perf ratio
x86/amd: Rename amd_get_highest_perf() to amd_get_boost_ratio_numerator()
ACPI: CPPC: Adjust return code for inline functions in !CONFIG_ACPI_CPPC_LIB
x86/amd: Move amd_get_highest_perf() from amd.c to cppc.c
Using uninitialized value "mode2" when calling "amd_pstate_get_mode_string".
Set "mode2" to "AMD_PSTATE_DISABLE" by default.
Signed-off-by: Qianqiang Liu <qianqiang.liu@163.com>
Link: https://lore.kernel.org/r/20240910233923.46470-1-qianqiang.liu@163.com
Acked-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
There is a state machine in the amd-pstate driver utilized for
switches for all modes. To make sure that cleanup and setup works
properly for each mode add a unit test case that tries all
combinations.
Reviewed-by: Perry Yuan <perry.yuan@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
In order to effectively test all mode switch combinations export
everything necessarily for amd-pstate-ut to trigger a mode switch.
Reviewed-by: Perry Yuan <Perry.Yuan@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Don't take and release the mutex when prefcore isn't present and
avoid initialization of variables that will be initially set
in the function.
Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Reviewed-by: Perry Yuan <perry.yuan@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
The special case in amd_pstate_highest_perf_set() is the value used
for calculating the boost numerator. Merge this into
amd_get_boost_ratio_numerator() and then use that to calculate boost
ratio.
This allows dropping more special casing of the highest perf value.
Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
AMD systems that support preferred cores will use "166" as their
numerator for max frequency calculations instead of "255".
Add a function for detecting preferred cores by looking at the
highest perf value on all cores.
If preferred cores are enabled return 166 and if disabled the
value in the highest perf register. As the function will be called
multiple times, cache the values for the boost numerator and if
preferred cores will be enabled in global variables.
Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
amd_pstate_get_highest_perf() is a helper used to get the highest perf
value on AMD systems. It's used in amd-pstate as part of preferred
core handling, but applicable for acpi-cpufreq as well.
Move it out to cppc handling code as amd_get_highest_perf().
Reviewed-by: Perry Yuan <perry.yuan@amd.com>
Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
The function name is ambiguous because it returns an intermediate value
for calculating maximum frequency rather than the CPPC 'Highest Perf'
register.
Rename the function to clarify its use and allow the function to return
errors. Adjust the consumer in acpi-cpufreq to catch errors.
Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Convert the cppc deadline task attributes to use the available
definitions to make them more readable.
No functional change.
Signed-off-by: Christian Loehle <christian.loehle@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Juri Lelli <juri.lelli@redhat.com>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Link: https://lore.kernel.org/r/20240813144348.1180344-4-christian.loehle@arm.com
- Several OF related cleanups in cpufreq drivers (Rob Herring).
- Enable COMPILE_TEST for ARM drivers (Rob Herrring).
- Introduce quirks for syscon failures and use socinfo to get revision
for TI cpufreq driver (Dhruva Gole and Nishanth Menon).
- Minor cleanups in amd-pstate driver (Anastasia Belova and Dhananjay
Ugwekar).
- Minor cleanups for loongson, cpufreq-dt and powernv cpufreq drivers
(Danila Tikhonov, Huacai Chen, and Liu Jing).
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEx73Crsp7f6M6scA70rkcPK6BEhwFAmbalyoACgkQ0rkcPK6B
Ehw9IBAAus+BOdYMzU8VT7j8Y98oOfb5FsJCoTU2KaV2RIIpX4k+6daruCOm0BXP
RtRiI+ILV5zLUm8CIC15f2GQE6PtDBFmjky7ItEemcbQPlTkpkZFWNFhBqE1u3hw
jllA4p1LmUwAnr1zkwl2CEUJSRJBxWPeTxPL0Ci6pycFhiNPZwGqOreJQRsIMOh3
pgohKSBebxpzgwES8fhR32CqaHphrEFCryHafZIqzsXSBuyETGEKg57zTmdo6ojy
GDuaIz6kQ9lKvW/q9iwTih93SsBnzDD85AAERDZkUDxey5IBLztrJLH5QT/XN77K
EQOHeygwyKk4su00fXy/LXmMqKHCN/mAHgb6JvWBIm2xbDWx6drBJyV/NdX4YI4w
4m1SqmFH9Cv41UIcynQR83XthGKgIddjEDKPW0GNMQ+LHWlUS6Qm4Kb2q4rXruqD
bUWs3NmZEvYD9P2XOKGHgfSPZ0iNXi0Lt5BBIWbeIPNwaikxHisNsNG1W2pMsfke
n19cvt20aBJgx2s5acIH7Po8qQglrGGK9EKWRg8gInvtB7QRbHBhXVD6ZNwuIk/7
u2+Y42R4R1GzwsD3EUl+RnnUFgRwhg53OIzcE+AaaMDqGeTdxmG42eg0jGSBA7yx
KbljH9PAfsMjjEjsVYReiIYxS28PZNyTBaxZJxD2RyxMz53CV9w=
=BlNP
-----END PGP SIGNATURE-----
Merge tag 'cpufreq-arm-updates-6.12' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/vireshk/pm
Merge ARM cpufreq updates for 6.12 from Viresh Kumar:
"- Several OF related cleanups in cpufreq drivers (Rob Herring).
- Enable COMPILE_TEST for ARM drivers (Rob Herrring).
- Introduce quirks for syscon failures and use socinfo to get revision
for TI cpufreq driver (Dhruva Gole and Nishanth Menon).
- Minor cleanups in amd-pstate driver (Anastasia Belova and Dhananjay
Ugwekar).
- Minor cleanups for loongson, cpufreq-dt and powernv cpufreq drivers
(Danila Tikhonov, Huacai Chen, and Liu Jing)."
* tag 'cpufreq-arm-updates-6.12' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/vireshk/pm:
cpufreq: ti-cpufreq: Use socinfo to get revision in AM62 family
cpufreq: Fix the cacography in powernv-cpufreq.c
cpufreq: ti-cpufreq: Introduce quirks to handle syscon fails appropriately
cpufreq: loongson3: Use raw_smp_processor_id() in do_service_request()
cpufreq: amd-pstate: add check for cpufreq_cpu_get's return value
cpufreq: Add SM7325 to cpufreq-dt-platdev blocklist
cpufreq: Fix warning on unused of_device_id tables for !CONFIG_OF
cpufreq/amd-pstate: Add the missing cpufreq_cpu_put()
cpufreq: Drop CONFIG_ARM and CONFIG_ARM64 dependency on Arm drivers
cpufreq: Enable COMPILE_TEST on Arm drivers
cpufreq: armada-8k: Avoid excessive stack usage
cpufreq: omap: Drop asm includes
cpufreq: qcom: Add explicit io.h include for readl/writel_relaxed
cpufreq: spear: Use of_property_for_each_u32() instead of open coding
cpufreq: Use of_property_present()
* Validate return of any attempt to update EPP limits, which fixes
the masking hardware problems.
-----BEGIN PGP SIGNATURE-----
iQJOBAABCgA4FiEECwtuSU6dXvs5GA2aLRkspiR3AnYFAmbYvoIaHG1hcmlvLmxp
bW9uY2llbGxvQGFtZC5jb20ACgkQLRkspiR3AnY6ZxAAw0vB5p/mC8QfS5VuDm0O
Up+dMtjVf9VcIOx4OoqSPNTY4HmNUnBSmACfeY/LzUa22BhM+7SJ74y8UoXGYGc8
GKzuVDRsbqnuMrGqbOd8u+eJhcc8fln7zJVa1xBOM8V5yHqvnxjAsnwVieVhjmKa
7m5K8ht/tJsCpaY/BjF7iHMgq950UGjO+pUXzrYP5ARV1O367DMWVQ91X9NBKwvN
w/mapWmH+mpp0//2hL7wzrtGIfYjdndFP3xM0+7v5MoEd9KuSl7xCVObGidDOEaz
oxfMCfPPpHMJLwD/L6VOwq/AIomJQVUZS8111ucnbdkIVW5QOlSaFwKJG/KmFvUQ
98lfX2xAaGB26tvWQ3l+vFQ7Qces0NrJfCLbpS2NdQ2oSq1+1ab3Yeh1wDXYph7o
hF+Xa/qJHzIFwzeQ2EmdXwthpoXbF5mmIiBZCTE/v1WpBVkYj+lJGtU+WsyAwFJT
qPASAqrRo8KoCfhdxYT47Rc3AowUWtHoJQlZNzTdbmc7KXrk+7NfDUZ5ifLq3puI
FAPWg76NM8aefoattsC57+6/V2bygbe1wIQaXRrybZiO9cMtY0eQ9bE4ysIm0ZQ8
bSPEHg1iYl3OrMzwRYnBHmev8E4jmKdKSnBBPSqD3ObeR3fNZQs1pxWc3CtHP8LA
p+AEE66CGAzsed5wudcEoOc=
=nPcu
-----END PGP SIGNATURE-----
Merge tag 'amd-pstate-v6.12-2024-09-04' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/superm1/linux
Merge an amd-pstate driver update for 6.12 from Mario Limonciello:
"amd-pstate development for 6.12:
* Validate return of any attempt to update EPP limits, which fixes
the masking hardware problems."
* tag 'amd-pstate-v6.12-2024-09-04' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/superm1/linux:
cpufreq/amd-pstate: Catch failures for amd_pstate_epp_update_limit()
amd_pstate_set_epp() calls cppc_set_epp_perf() which can fail for
a variety of reasons but this is ignored. Change the return flow
to allow failures.
Reviewed-by: Perry Yuan <perry.yuan@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
In the AM62x, AM62Ax, and AM62Px devices, we already have the revision
info within the k3-socinfo driver. Hence, re-use this information from
there instead of re using the offset for 2 drivers trying to get the
same information ie. revision.
Signed-off-by: Dhruva Gole <d-gole@ti.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Commit b4bc9f9e27 ("cpufreq: ti-cpufreq: add support for omap34xx
and omap36xx") introduced special handling for OMAP3 class devices
where syscon node may not be present. However, this also creates a bug
where the syscon node is present, however the offset used to read
is beyond the syscon defined range.
Fix this by providing a quirk option that is populated when such
special handling is required. This allows proper failure for all other
platforms when the syscon node and efuse offsets are mismatched.
Fixes: b4bc9f9e27 ("cpufreq: ti-cpufreq: add support for omap34xx and omap36xx")
Signed-off-by: Nishanth Menon <nm@ti.com>
Tested-by: Dhruva Gole <d-gole@ti.com>
Reviewed-by: Kevin Hilman <khilman@baylibre.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Use raw_smp_processor_id() instead of plain smp_processor_id() in
do_service_request(), otherwise we may get some errors with the driver
enabled:
BUG: using smp_processor_id() in preemptible [00000000] code: (udev-worker)/208
caller is loongson3_cpufreq_probe+0x5c/0x250 [loongson3_cpufreq]
Reported-by: Xi Ruoyao <xry111@xry111.site>
Tested-by: Binbin Zhou <zhoubinbin@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>