Commit Graph

533 Commits

Author SHA1 Message Date
Srinivas Pandruvada
8bdab3c8f2 cpufreq: intel_pstate: Update Meteor Lake EPPs
Update the default balance_performance EPP to 64. This gives better
performance and also better perf/watt compared to the current value
of 115.

For example:

Speedometer 2.1
	score: +19%
	Perf/watt: +5.25%

Webxprt 4 score
	score: +12%
	Perf/watt: +6.12%

3DMark Wildlife extreme unlimited score
	score: +3.2%
	Perf/watt: +11.5%

Geekbench6 MT
	score: +2.14%
	Perf/watt: +0.32%

Also update balance_power EPP default to 179. With this change:
	Video Playback power is reduced by 52%
	Team video conference power is reduced by 35%

Since the power profile daemon now sets the balance_power EPP on DC
instead of balance_performance, updating the balance_power EPP default
will help to extend battery life.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-06-07 21:05:09 +02:00
Tony Luck
ca8752384c cpufreq: intel_pstate: Switch to new Intel CPU model defines
New CPU #defines encode vendor and family as well as model.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-06-07 20:33:46 +02:00
Srinivas Pandruvada
1e24c31351 cpufreq: intel_pstate: Fix unchecked HWP MSR access
Fix an unchecked MSR access error for processors with no HWP support.
On such processors, the maximum frequency can be changed by the system
firmware using the ACPI event ACPI_PROCESSOR_NOTIFY_HIGEST_PERF_CHANGED,
which results in accessing HWP MSR 0x771.

Call Trace:
	<TASK>
	generic_exec_single+0x58/0x120
	smp_call_function_single+0xbf/0x110
	rdmsrl_on_cpu+0x46/0x60
	intel_pstate_get_hwp_cap+0x1b/0x70
	intel_pstate_update_limits+0x2a/0x60
	acpi_processor_notify+0xb7/0x140
	acpi_ev_notify_dispatch+0x3b/0x60

HWP MSR 0x771 can only be read on a CPU on which HWP is supported and
enabled. Hence, intel_pstate_get_hwp_cap() can only be called when
hwp_active is true.
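
For illustration, a sketch of the kind of guard described above (the exact hunk may differ):

    /* Only read MSR_HWP_CAPABILITIES (0x771) when HWP is supported and enabled. */
    if (hwp_active)
        intel_pstate_get_hwp_cap(cpu);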

Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Closes: https://lore.kernel.org/linux-pm/20240529155740.Hq2Hw7be@linutronix.de/
Fixes: e8217b4bec ("cpufreq: intel_pstate: Update the maximum CPU frequency consistently")
Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-06-03 18:00:23 +02:00
Jeff Johnson
0a206fe35d cpufreq: intel_pstate: fix struct cpudata::epp_cached kernel-doc
make C=1 currently gives the following warning:

drivers/cpufreq/intel_pstate.c:262: warning: Function parameter or struct member 'epp_cached' not described in 'cpudata'

Add the missing ":" to fix the trivial kernel-doc syntax error.
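
Illustratively, the fix is of this shape (the member description wording below is only an assumption):

    - * @epp_cached		Cached HWP energy-performance preference value
    + * @epp_cached:		Cached HWP energy-performance preference value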

Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-05-07 15:02:43 +02:00
Arnd Bergmann
8c556541a5 cpufreq: intel_pstate: hide unused intel_pstate_cpu_oob_ids[]
The reference to this variable is hidden in an #ifdef:

drivers/cpufreq/intel_pstate.c:2440:32: error: 'intel_pstate_cpu_oob_ids' defined but not used [-Werror=unused-const-variable=]

Use the same check around the definition.
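
A sketch of the change described above (the guarding condition is an assumption; the point is that the table definition now sits under the same #ifdef as its only user):

    #ifdef CONFIG_ACPI
    static const struct x86_cpu_id intel_pstate_cpu_oob_ids[] __initconst = {
        /* OOB-capable server models */
        {}
    };
    #endif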

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-04-03 16:41:59 +02:00
Rafael J. Wysocki
e8217b4bec cpufreq: intel_pstate: Update the maximum CPU frequency consistently
There are 3 places at which the maximum CPU frequency may change,
store_no_turbo(), intel_pstate_update_limits() (when called by the
cpufreq core) and intel_pstate_notify_work() (when handling a HWP
change notification).  Currently, cpuinfo.max_freq is only updated by
store_no_turbo() and intel_pstate_notify_work(), although in principle
it may be necessary to update it in intel_pstate_update_limits() as well.

Make all of them mutually consistent.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
2024-04-02 12:57:22 +02:00
Rafael J. Wysocki
f32587dcbe cpufreq: intel_pstate: Replace three global.turbo_disabled checks
Replace the global.turbo_disabled check in __intel_pstate_update_max_freq() with
a global.no_turbo one to make store_no_turbo() actually update the maximum
CPU frequency on turbo preference changes, which needs to be consistent
with arch_set_max_freq_ratio() called from there.

For more consistency, replace the global.turbo_disabled checks in
__intel_pstate_cpu_init() and intel_cpufreq_adjust_perf() with
global.no_turbo checks as well.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
2024-04-02 12:57:17 +02:00
Rafael J. Wysocki
9558fae8ce cpufreq: intel_pstate: Read global.no_turbo under READ_ONCE()
Because global.no_turbo is generally not read under intel_pstate_driver_lock,
make store_no_turbo() use WRITE_ONCE() for updating it (this is the only
place at which it is updated except for the initialization) and make the
majority of places reading it use READ_ONCE().

Also remove redundant global.turbo_disabled checks from places that
depend on the 'true' value of global.no_turbo because it can only be
'true' if global.turbo_disabled is also 'true'.
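
The pattern being described is roughly (illustrative fragment; surrounding variables are assumptions):

    /* writer: store_no_turbo(), under intel_pstate_driver_lock */
    WRITE_ONCE(global.no_turbo, no_turbo);

    /* readers that may run without the lock */
    if (READ_ONCE(global.no_turbo))
        max_pstate = cpu->pstate.max_pstate;   /* cap at the non-turbo maximum */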

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
2024-04-02 12:57:10 +02:00
Rafael J. Wysocki
c626a43845 cpufreq: intel_pstate: Rearrange show_no_turbo() and store_no_turbo()
Now that global.turbo_disabled can only change at the cpufreq driver
registration time, initialize global.no_turbo at that time too so they
are in sync to start with (if the former is set, the latter cannot be
updated later anyway).

That allows show_no_turbo() to be simplified, because it does not need
to check global.turbo_disabled, and store_no_turbo() can be rearranged
to avoid doing anything if the new value of global.no_turbo is equal
to the current one and to only return an error on attempts to clear
global.no_turbo when global.turbo_disabled is set.

While at it, eliminate the redundant ret variable from store_no_turbo().

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
2024-04-02 12:57:04 +02:00
Rafael J. Wysocki
0940f1a801 cpufreq: intel_pstate: Do not update global.turbo_disabled after initialization
The global.turbo_disabled flag is updated quite often, especially in the
passive mode, in which case it is updated every time the scheduler calls
into the driver.  However, this is generally not necessary and it adds
MSR read overhead to scheduler code paths (and that particular MSR is
slow to read).

For this reason, make the driver read MSR_IA32_MISC_ENABLE_TURBO_DISABLE
just once at the cpufreq driver registration time and remove all of the
in-flight updates of global.turbo_disabled.
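
The one-time read amounts to something like this, done at driver registration (sketch):

    u64 misc_en;

    rdmsrl(MSR_IA32_MISC_ENABLE, misc_en);
    global.turbo_disabled = !!(misc_en & MSR_IA32_MISC_ENABLE_TURBO_DISABLE);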

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
2024-04-02 12:56:58 +02:00
Rafael J. Wysocki
032c5565eb cpufreq: intel_pstate: Fold intel_pstate_max_within_limits() into caller
Fold intel_pstate_max_within_limits() into its only caller.

No functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
2024-04-02 12:56:51 +02:00
Rafael J. Wysocki
e97a98238d cpufreq: intel_pstate: Use __ro_after_init for three variables
There are at least 3 variables in intel_pstate that do not get updated
after they have been initialized, so annotate them with __ro_after_init.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-04-02 12:56:43 +02:00
Rafael J. Wysocki
0f2828e17b cpufreq: intel_pstate: Get rid of unnecessary READ_ONCE() annotations
Drop two redundant checks involving READ_ONCE() from notify_hwp_interrupt()
and make it check hwp_active without READ_ONCE() which is not necessary,
because that variable is only set once during the early initialization of
the driver.

In order to make that clear, annotate hwp_active with __ro_after_init.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-04-02 12:56:40 +02:00
Rafael J. Wysocki
432acb219a cpufreq: intel_pstate: Wait for canceled delayed work to complete
Make intel_pstate_disable_hwp_interrupt() wait for canceled delayed work
to complete in order to avoid leftover work items running when it
returns, which may be during driver unregistration and may confuse
things going forward.
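
In practice this is the difference between canceling and canceling-and-waiting (sketch; the work item name is an assumption):

    /* wait for a handler that may already be running on another CPU */
    cancel_delayed_work_sync(&cpudata->hwp_notify_work);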

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-04-02 12:56:36 +02:00
Rafael J. Wysocki
12ebba42d2 cpufreq: intel_pstate: Simplify spinlock locking
Because intel_pstate_enable/disable_hwp_interrupt() are only called from
thread context, they need not save the IRQ flags when using a spinlock
as interrupts are guaranteed to be enabled when they run, so make them
use spin_lock/unlock_irq().
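
Sketch of the simplification (lock name assumed):

    /* before: usable from any context */
    spin_lock_irqsave(&hwp_notify_lock, flags);
    spin_unlock_irqrestore(&hwp_notify_lock, flags);

    /* after: thread context only, interrupts known to be enabled */
    spin_lock_irq(&hwp_notify_lock);
    spin_unlock_irq(&hwp_notify_lock);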

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-04-02 12:56:33 +02:00
Rafael J. Wysocki
f186b2dace cpufreq: intel_pstate: Drop redundant locking from intel_pstate_driver_cleanup()
Remove the spinlock locking from intel_pstate_driver_cleanup() as it is
not necessary because no other code accessing all_cpu_data[] can run in
parallel with that function.

Had the locking been necessary, though, it would have been incorrect
because the lock in question is acquired from a hardirq handler and
it cannot be acquired from thread context without disabling interrupts.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-04-02 12:56:08 +02:00
Rafael J. Wysocki
e4d0d7f194 Merge back cpufreq material for 6.9-rc1. 2024-03-11 15:08:45 +01:00
Srinivas Pandruvada
1f4b7fdd71 cpufreq: intel_pstate: Update default EPPs for Meteor Lake
Update the default balance_performance EPP to 115 and the performance EPP
to 16.

Changing the balance_performance EPP gives better performance/watt
compared to the default power-up EPP value of 128.

Changing the performance EPP to 0x10 shows reduced power for similar
performance as EPP 0. On small form factor devices this is beneficial
as lower power results in lower CPU and skin temperature. This
results in reduced thermal throttling and higher performance.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-02-24 15:02:33 +01:00
Srinivas Pandruvada
240a8da623 cpufreq: intel_pstate: Allow model specific EPPs
The current implementation allows a model-specific EPP override for
balance_performance only. Add a feature to allow a model-specific EPP for
every predefined EPP string. For example, for some CPU models, even
changing the performance EPP has benefits.

Use a mask of EPPs as driver_data instead of just balance_performance.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-02-24 15:02:33 +01:00
Doug Smythies
f0a0fc10ab cpufreq: intel_pstate: fix pstate limits enforcement for adjust_perf call back
There is a loophole in P-state limit clamping for the intel_cpufreq CPU
frequency scaling driver (intel_pstate in passive mode) with the schedutil
CPU frequency scaling governor and HWP (HardWare P-state) control enabled,
when the adjust_perf callback path is used.

Fix it.

Fixes: a365ab6b9d ("cpufreq: intel_pstate: Implement the ->adjust_perf() callback")
Signed-off-by: Doug Smythies <dsmythies@telus.net>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-02-24 15:01:59 +01:00
Jiri Slaby (SUSE)
4615ac9010 cpufreq: intel_pstate: remove cpudata::prev_cummulative_iowait
Commit 09c448d3c6 ("cpufreq: intel_pstate: Use IOWAIT flag in Atom
algorithm") removed the last user of cpudata::prev_cummulative_iowait.
Remove the member too.

Found by https://github.com/jirislaby/clang-struct.

Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-02-13 13:47:35 +01:00
Rafael J. Wysocki
192cdb1c90 cpufreq: intel_pstate: Refine computation of P-state for given frequency
On systems using HWP, if a given frequency is equal to the maximum turbo
frequency or the maximum non-turbo frequency, the HWP performance level
corresponding to it is already known and can be used directly without
any computation.

Accordingly, adjust the code to use the known HWP performance levels in
the cases mentioned above.

This also helps to avoid limiting CPU capacity artificially in some
cases when the BIOS produces the HWP_CAP numbers using a different
E-core-to-P-core performance scaling factor than expected by the kernel.

Fixes: f5c8cf2a49 ("cpufreq: intel_pstate: hybrid: Use known scaling factor for P-cores")
Cc: 6.1+ <stable@vger.kernel.org> # 6.1+
Tested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-01-22 15:18:11 +01:00
Srinivas Pandruvada
bde4f5ff82 cpufreq: intel_pstate: Update hybrid scaling factor for Meteor Lake
On some Meteor Lake platforms, the maximum one-core turbo frequency is
not observed. During the hybrid performance-to-frequency conversion, the
maximum frequency comes out 100 MHz lower, which results in requesting a
maximum frequency 100 MHz below the intended one.

For example, when the max one-core turbo is 4.9 GHz:
MSR HWP_CAPABILITIES shows that the highest performance ratio for a
P-core is 0x3E. With the current scaling factor of 78741 (1.27x for
converting frequency to performance), the resulting max frequency is
4.8 GHz. This caps the max scaling frequency at 4.8 GHz, which is
100 MHz less than desired.

Add the capability to define a per-CPU-model scaling factor and define
a scaling factor of 80000 (1.25x for converting frequency to performance
for P-cores) for Meteor Lake.
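
Working through the example above (assuming the computed frequency is rounded down to a multiple of the 100000 kHz base scaling unit):

    0x3E = 62 performance units
    62 * 78741 kHz = 4881942 kHz  -> rounds down to 4.8 GHz
    62 * 80000 kHz = 4960000 kHz  -> rounds down to 4.9 GHz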

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
[ rjw: Debug message adjustment, subject edit ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-01-10 15:02:25 +01:00
Zhenguo Yao
e95013156a cpufreq: intel_pstate: Add Emerald Rapids support in no-HWP mode
Users may disable HWP in firmware, in which case intel_pstate will give up
unless the CPU model is explicitly supported.

See also the following past commits:

 - commit df51f287b5 ("cpufreq: intel_pstate: Add Sapphire Rapids support
   in no-HWP mode")
 - commit d8de7a44e1 ("cpufreq: intel_pstate: Add Skylake servers support")
 - commit 706c532885 ("cpufreq: intel_pstate: Add Cometlake support in
   no-HWP mode")
 - commit fbdc21e9b0 ("cpufreq: intel_pstate: Add Icelake servers support in
   no-HWP mode")
 - commit 71bb5c82aa ("cpufreq: intel_pstate: Add Tigerlake support in
   no-HWP mode")

Signed-off-by: Zhenguo Yao <yaozhenguo1@gmail.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
[ rjw: Changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-12-19 21:38:09 +01:00
Srinivas Pandruvada
2719675fa8 cpufreq: intel_pstate: Prioritize firmware-provided balance performance EPP
The platform firmware can provide a balance performance EPP value by
enabling HWP and programming the EPP to the desired value.

However, currently this only takes effect for processors listed in
intel_epp_balance_perf[], so in order to enable a new processor model
to utilize this mechanism, that table needs to be updated.  It arguably
should not be necessary to modify the kernel to work properly with
every new generation of processors, though, and distributions that don't
always ship the most recent kernels should be able to run reasonably well
on new hardware without code changes.

For this reason, move the check to avoid updating the EPP when the balance
performance EPP is unmodified from the power-up default of 0x80 after the
check that allows the firmware-provided balance performance EPP value to
be retrieved.  This will cause the code to always look for the firmware-
provided value before consulting intel_epp_balance_perf[] and the handling
of new hardware will not depend on whether or not that table has been
updated yet.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-12-05 21:29:47 +01:00
Srinivas Pandruvada
37b6ddba96 cpufreq: intel_pstate: Revise global turbo disable check
Setting the global turbo flag based on the CPU 0 P-state limits is
problematic, as it limits the max P-state request on every CPU on the
system based solely on the CPU 0 P-state limits.

There are two cases in which the global.turbo_disabled flag is set:
- When the MSR_IA32_MISC_ENABLE_TURBO_DISABLE bit is set to 1 in the
MSR MSR_IA32_MISC_ENABLE. This bit can only be changed by the system
BIOS before power up.
- When the max non-turbo P-state is the same as the max turbo P-state
for CPU 0.

The second check is not valid for deciding the global turbo state based
on CPU 0. The CPU 0 max turbo P-state can be the same as its max
non-turbo P-state, but for other CPUs this may not be true.

There is no guarantee that the max P-state limits are the same for every
CPU. It is possible that during fusing the max P-state for a CPU is
constrained. Also, with an Intel Speed Select performance profile, CPU 0
may not be present in all profiles. In that case the max non-turbo and
turbo P-states can be set to the lowest possible P-state by the hardware
when switching to such a profile. Since the max non-turbo and turbo
P-states are then the same, the global.turbo_disabled flag will be set.

Once global.turbo_disabled is set, any scaling max and min frequency
update for any CPU will result in its max P-state being constrained to
the max non-turbo P-state.

Hence, remove the check that compares the max non-turbo P-state with the
max turbo P-state of CPU 0 when setting the global turbo disabled flag.
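
Conceptually the flag computation loses its second term (sketch; variable names are illustrative):

    /* before */
    global.turbo_disabled = (misc_en & MSR_IA32_MISC_ENABLE_TURBO_DISABLE) ||
                            cpu->pstate.max_pstate == cpu->pstate.turbo_pstate;

    /* after: rely on the MSR bit only */
    global.turbo_disabled = !!(misc_en & MSR_IA32_MISC_ENABLE_TURBO_DISABLE);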

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
[ rjw: Subject edit ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-09-11 20:18:17 +02:00
Doug Smythies
d51847acb0 cpufreq: intel_pstate: set stale CPU frequency to minimum
The intel_pstate CPU frequency scaling driver does not
use policy->cur, so it stays 0.
When the CPU frequency is outdated, arch_freq_get_on_cpu()
will default to the nominal clock frequency when its call to
cpufreq_quick_getpolicy_cur returns the never-updated 0.
Thus, the listed frequency might be outside of the currently
set limits. Some users are complaining about the high
reported frequency, albeit stale, when their system is
idle and/or it is above the reduced maximum they have set.

This patch maintains policy->cur for the intel_pstate
driver at the current minimum CPU frequency.

Reported-by: Yang Jie <yang.jie@linux.intel.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217597
Signed-off-by: Doug Smythies <dsmythies@telus.net>
[ rjw: White space damage fixes and comment adjustment ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-08-21 21:11:40 +02:00
Srinivas Pandruvada
0fcfc9e519 cpufreq: intel_pstate: Fix scaling for hybrid-capable systems with disabled E-cores
Some system BIOS configurations may provide an option to disable E-cores.
When E-cores are disabled that way, the CPUID feature for hybrid (Leaf 7
sub-leaf 0, EDX[15] = 0) may not be set. But the HWP performance limits
will still be using a scaling factor like any other hybrid-enabled system.

The current check for applying the scaling factor fails when the hybrid
CPUID feature is not set, and the only way to determine whether scaling
should be applied is to check the CPPC nominal frequency and nominal
performance.

First, for systems predating Alder Lake, the CPPC nominal frequency and
nominal performance are 0, which can be used to distinguish those
systems from hybrid systems with disabled E-cores.

Second, if the CPPC nominal frequency and nominal performance are
defined, which indicates the need to use a special scaling factor, and
the nominal performance value multiplied by 100 is not equal to the
nominal frequency one, use the hybrid scaling factor.

This can be done for all HWP systems without an additional CPU model check.
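
The decision described above boils down to something like this (sketch; variable names are assumptions):

    /* Pre-Alder-Lake parts report 0 for both CPPC values. */
    if (!nominal_perf || !nominal_freq)
        return false;

    /* A non-1:100 perf-to-frequency ratio means the hybrid scaling factor is needed. */
    return nominal_perf * 100 != nominal_freq;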

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
[ rjw: Subject and changelog edits, removal of unneeded parens, comment
  edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-06-30 10:55:43 +02:00
Tero Kristo
03f44ffb3d cpufreq: intel_pstate: Fix energy_performance_preference for passive
If the intel_pstate driver is set to passive mode, then writing the
same value to the energy_performance_preference sysfs attribute twice
fails. This is caused by using the wrong return value (the index of the
matched energy_perf_string) instead of the length of the passed-in
parameter. Fix this by forcing the internal return value to zero when
the same preference is passed in by the user. The same issue is not
present when the driver is in active mode.
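
Schematically, the fix makes the store path report success instead of the match index when nothing changes (sketch with assumed variable names):

    if (epp == cpu->epp_cached)
        ret = 0;                /* same value: don't return the match index */

    return ret ?: count;        /* a sysfs ->store() must return the buffer length */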

Fixes: f6ebbcf08f ("cpufreq: intel_pstate: Implement passive mode with HWP enabled")
Reported-by: Niklas Neronin <niklas.neronin@intel.com>
Signed-off-by: Tero Kristo <tero.kristo@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-06-21 19:42:58 +02:00
Linus Torvalds
556eb8b791 Driver core changes for 6.4-rc1
Here is the large set of driver core changes for 6.4-rc1.
 
 Once again, a busy development cycle, with lots of changes happening in
 the driver core in the quest to be able to move "struct bus" and "struct
 class" into read-only memory, a task now complete with these changes.
 
 This will make the future rust interactions with the driver core more
 "provably correct" as well as providing more obvious lifetime rules for
 all busses and classes in the kernel.
 
 The changes required for this did touch many individual classes and
 busses as many callbacks were changed to take const * parameters
 instead.  All of these changes have been submitted to the various
 subsystem maintainers, giving them plenty of time to review, and most of
 them actually did so.
 
 Other than those changes, included in here are a small set of other
 things:
   - kobject logging improvements
   - cacheinfo improvements and updates
   - obligatory fw_devlink updates and fixes
   - documentation updates
   - device property cleanups and const * changes
   - firmware loader dependency fixes.
 
 All of these have been in linux-next for a while with no reported
 problems.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZEp7Sw8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ykitQCfamUHpxGcKOAGuLXMotXNakTEsxgAoIquENm5
 LEGadNS38k5fs+73UaxV
 =7K4B
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core updates from Greg KH:
 "Here is the large set of driver core changes for 6.4-rc1.

  Once again, a busy development cycle, with lots of changes happening
  in the driver core in the quest to be able to move "struct bus" and
  "struct class" into read-only memory, a task now complete with these
  changes.

  This will make the future rust interactions with the driver core more
  "provably correct" as well as providing more obvious lifetime rules
  for all busses and classes in the kernel.

  The changes required for this did touch many individual classes and
  busses as many callbacks were changed to take const * parameters
  instead. All of these changes have been submitted to the various
  subsystem maintainers, giving them plenty of time to review, and most
  of them actually did so.

  Other than those changes, included in here are a small set of other
  things:

   - kobject logging improvements

   - cacheinfo improvements and updates

   - obligatory fw_devlink updates and fixes

   - documentation updates

   - device property cleanups and const * changes

   - firmware loader dependency fixes.

  All of these have been in linux-next for a while with no reported
  problems"

* tag 'driver-core-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (120 commits)
  device property: make device_property functions take const device *
  driver core: update comments in device_rename()
  driver core: Don't require dynamic_debug for initcall_debug probe timing
  firmware_loader: rework crypto dependencies
  firmware_loader: Strip off \n from customized path
  zram: fix up permission for the hot_add sysfs file
  cacheinfo: Add use_arch[|_cache]_info field/function
  arch_topology: Remove early cacheinfo error message if -ENOENT
  cacheinfo: Check cache properties are present in DT
  cacheinfo: Check sib_leaf in cache_leaves_are_shared()
  cacheinfo: Allow early level detection when DT/ACPI info is missing/broken
  cacheinfo: Add arm64 early level initializer implementation
  cacheinfo: Add arch specific early level initializer
  tty: make tty_class a static const structure
  driver core: class: remove struct class_interface * from callbacks
  driver core: class: mark the struct class in struct class_interface constant
  driver core: class: make class_register() take a const *
  driver core: class: mark class_release() as taking a const *
  driver core: remove incorrect comment for device_create*
  MIPS: vpe-cmp: remove module owner pointer from struct class usage.
  ...
2023-04-27 11:53:57 -07:00
Srinivas Pandruvada
1f5e62f5fb cpufreq: intel_pstate: Enable HWP IO boost for all servers
The HWP IO boost results in slight improvements for IO performance on
both Ice Lake and Sapphire Rapid servers.

Currently there is a CPU model check for Skylake desktop and server along
with the ACPI PM profile for performance and enterprise servers to enable
IO boost.

Remove the CPU model check, so that all current server models enable HWP
IO boost by default.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-03-17 18:37:30 +01:00
Greg Kroah-Hartman
2744a63c1a cpufreq: move to use bus_get_dev_root()
Direct access to the struct bus_type dev_root pointer is going away soon
so replace that with a call to bus_get_dev_root() instead, which is what
it is there for.
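
The replacement pattern looks roughly like this (sketch):

    struct device *dev_root = bus_get_dev_root(&cpu_subsys);

    if (dev_root) {
        intel_pstate_kobject = kobject_create_and_add("intel_pstate",
                                                      &dev_root->kobj);
        put_device(dev_root);   /* bus_get_dev_root() takes a reference */
    }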

Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Len Brown <lenb@kernel.org>
Cc: linux-pm@vger.kernel.org
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/r/20230313182918.1312597-3-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-17 15:29:01 +01:00
Linus Torvalds
c8b4accf86 More power management updates for 6.3-rc1
- Fix error handling in the apple-soc cpufreq driver (Dan Carpenter).
 
  - Change the log level of a message in the amd-pstate cpufreq driver
    so it is more visible to users (Kai-Heng Feng).
 
  - Adjust the balance_performance EPP value for Sapphire Rapids in the
    intel_pstate cpufreq driver (Srinivas Pandruvada).
 
  - Remove MODULE_LICENSE from 3 pieces of non-modular code (Nick Alcock).
 
  - Make a read-only kobj_type structure in the schedutil cpufreq governor
    constant (Thomas Weißschuh).
 
  - Add Power Limit4 support for Meteor Lake SoC to the Intel RAPL
    power capping driver (Sumeet Pawnikar).
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmQCNm0SHHJqd0Byand5
 c29ja2kubmV0AAoJEILEb/54YlRx0P8QAJdcjEg4cY/OJq9VTqiUo92mt63aqWXr
 vOMYgp5QDuDRPKiVMMuCJNCJSrA/xPwAz4pcSOj/59CZn1UZqzg6Ap/vdmV7pQV1
 vDp6N9E2sgCL0QMQqueyv3yEN9meleGrQPnAulOZf29Q9PC0rGiAUdivvgDXZq/9
 9wo+/11EzZpKyaDTfLzotIvNOkmmkp5FtxaCR64D5w3e6gehGrL/wL3FgSefDnsa
 fNlhxgi66FYuamSgXqQTkuIuig0Rbvlp0fmhllPaIOkNMI4o7rvP5rB7FbAKZm1E
 XI+M3aVlZsImPpEEJ1dTqbc4y9WU9HakLfRRSiUnnWHSXpwBI8ncEwP0oulqqVoF
 elA9kd7Sv1DwLiUGMy3GaOwscTN6NDUwICH4UJeISWlMZfrP7YR3Bb7gVtzlCrKC
 b99oA92OFazzWliYPSzzSsFQTI1PczNLZKnelzgF1+5g76q3sIUa3tu6Ts6Z84UK
 rHCLDVw8TUFpsEQxKOvM3oLUdpvE0mmQpdG8fSeHVxon47jVzmkbDjgOBFp1i19F
 HBI7LfOhDkkzO35qb6+DfkQoij0mpn8ldbVVLd1XQe4F0WjfSe8D4oGTgB3ErTK7
 8cOKGoez9FsQRHbSVCaZcaTSemDHdB3UMRUHG2hvU1jbwXOaQcLfMAF5McyECJiN
 U8uI28JVGVPI
 =lRKS
 -----END PGP SIGNATURE-----

Merge tag 'pm-6.3-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull more power management updates from Rafael Wysocki:
 "These update power capping (new hardware support and cleanup) and
  cpufreq (bug fixes, cleanups and intel_pstate adjustment for a new
  platform).

  Specifics:

   - Fix error handling in the apple-soc cpufreq driver (Dan Carpenter)

   - Change the log level of a message in the amd-pstate cpufreq driver
     so it is more visible to users (Kai-Heng Feng)

   - Adjust the balance_performance EPP value for Sapphire Rapids in the
     intel_pstate cpufreq driver (Srinivas Pandruvada)

   - Remove MODULE_LICENSE from 3 pieces of non-modular code (Nick
     Alcock)

   - Make a read-only kobj_type structure in the schedutil cpufreq
     governor constant (Thomas Weißschuh)

   - Add Power Limit4 support for Meteor Lake SoC to the Intel RAPL
     power capping driver (Sumeet Pawnikar)"

* tag 'pm-6.3-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  cpufreq: apple-soc: Fix an IS_ERR() vs NULL check
  powercap: remove MODULE_LICENSE in non-modules
  cpufreq: intel_pstate: remove MODULE_LICENSE in non-modules
  powercap: RAPL: Add Power Limit4 support for Meteor Lake SoC
  cpufreq: amd-pstate: remove MODULE_LICENSE in non-modules
  cpufreq: schedutil: make kobj_type structure constant
  cpufreq: amd-pstate: Let user know amd-pstate is disabled
  cpufreq: intel_pstate: Adjust balance_performance EPP for Sapphire Rapids
2023-03-03 10:30:58 -08:00
Nick Alcock
5bd289f69b cpufreq: intel_pstate: remove MODULE_LICENSE in non-modules
Since commit 8b41fc4454 ("kbuild: create modules.builtin without
Makefile.modbuiltin or tristate.conf"), MODULE_LICENSE declarations
are used to identify modules. As a consequence, uses of the macro
in non-modules will cause modprobe to misidentify their containing
object file as a module when it is not (false positives), and modprobe
might succeed rather than failing with a suitable error message.

So remove it in the files in this commit, none of which can be built as
modules.

Signed-off-by: Nick Alcock <nick.alcock@oracle.com>
Suggested-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-02-28 21:28:23 +01:00
Srinivas Pandruvada
60675225eb cpufreq: intel_pstate: Adjust balance_performance EPP for Sapphire Rapids
While the majority of server OS distributions are deployed with the
"performance" governor as the default, some distributions like Ubuntu use
the "powersave" governor by default.

While using the "powersave" governor in its default configuration on
Sapphire Rapids systems leads to much lower power, the performance is
lower by more than 25% for several workloads relative to the
"performance" governor.

A 37% difference has been reported by www.Phoronix.com [1].

This is a consequence of using a relatively high EPP value in the
default configuration of the "powersave" governor and the performance
can be made much closer to the "performance" governor's level by
adjusting the default EPP value. Based on experiments, with EPP of 0x00,
0x10, 0x20, the performance delta between the "powersave" governor and
the "performance" one is around 12%. However, the EPP of 0x20 reduces
average power by 18% with respect to the lower EPP values.

[Note that raising min_perf_pct in sysfs as high as 50% in addition to
 adjusting EPP does not improve the performance any further.]

For this reason, change the EPP value corresponding to the default
balance_performance setting for Sapphire Rapids to 0x20, which is
straightforward, because analogous default EPP adjustment has been
applied to Alder Lake and there is a way to set the balance_performance
EPP value in intel_pstate based on the processor model already.

The goal here is to limit the mean performance delta between the
"powersave" governor in the default configuration and the "performance"
governor for a wide variety of server workloads to around 10-12%. For
some bursty workloads, this delta can be still large, as the frequency
ramp-up will still lag when the "powersave" governor is in use
irrespective of the EPP setting, because the performance governor always
requests the maximum possible frequency.

Link: https://www.phoronix.com/review/centos-clear-spr/6 # [1]
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-02-23 19:52:51 +01:00
Rafael J. Wysocki
e8a0e30b74 cpufreq: intel_pstate: Drop ACPI _PSS states table patching
After making acpi_processor_get_platform_limit() use the "no limit"
value for its frequency QoS request when _PPC returns 0, it is not
necessary to replace the frequency corresponding to the first _PSS
return package entry with the maximum turbo frequency of the given
CPU in intel_pstate_init_acpi_perf_limits() any more, so drop the
code doing that along with the comment explaining it.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-12-30 19:10:02 +01:00
Giovanni Gherdovich
df51f287b5 cpufreq: intel_pstate: Add Sapphire Rapids support in no-HWP mode
Users may disable HWP in firmware, in which case intel_pstate wouldn't load
unless the CPU model is explicitly supported.

See also the following past commits:

commit d8de7a44e1 ("cpufreq: intel_pstate: Add Skylake servers support")
commit 706c532885 ("cpufreq: intel_pstate: Add Cometlake support in
no-HWP mode")
commit fbdc21e9b0 ("cpufreq: intel_pstate: Add Icelake servers support in
no-HWP mode")
commit 71bb5c82aa ("cpufreq: intel_pstate: Add Tigerlake support in
no-HWP mode")

Signed-off-by: Giovanni Gherdovich <ggherdovich@suse.cz>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-11-30 19:56:17 +01:00
Srinivas Pandruvada
21cdb6c18f cpufreq: intel_pstate: Allow EPP 0x80 setting by the firmware
Since commit 3d13058ed2 ("cpufreq: intel_pstate: Use firmware default
EPP"), the firmware can set an EPP which the driver will not overwrite.
But the driver only accepts a firmware EPP within a validity range
between 0x40 and 0x80, exclusive of 0x80, so the firmware can't specify
an EPP of 0x80.

If the firmware doesn't specify an EPP in the valid range, the driver
uses a hard-coded EPP of 102. But some Chrome hardware vendors don't
want this overwrite and want to boot with the chipset default EPP of
0x80, as this improves battery life.

In this case they want the capability to specify an EPP of 0x80 via the
firmware. This requires the valid range to include 0x80 as well. But
the valid range can't simply be extended to include 0x80, because this
is the chipset default EPP. Even without any firmware specifying an
EPP, the chipset will always boot with an EPP of 0x80.

To make sure that an EPP of 0x80 was specified by the firmware and is
not just the chipset default, an additional check is required to verify
that HWP was enabled by the firmware before boot. The only way the
firmware can update the EPP is to enable HWP and update the EPP via
MSR_HWP_REQUEST.

This driver already checks whether HWP was enabled by the firmware. Use
the same flag and extend the valid range to include 0x80.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-11-03 19:21:52 +01:00
Rafael J. Wysocki
f5c8cf2a49 cpufreq: intel_pstate: hybrid: Use known scaling factor for P-cores
Commit 46573fd636 ("cpufreq: intel_pstate: hybrid: Rework HWP
calibration") attempted to use the information from CPPC (the nominal
performance in particular) to obtain the scaling factor allowing the
frequency to be computed if the HWP performance level of the given CPU
is known or vice versa.

However, it turns out that on some platforms this doesn't work, because
the CPPC information on them does not align with the contents of the
MSR_HWP_CAPABILITIES registers.

This basically means that the only way to make intel_pstate work on all
of the hybrid platforms to date is to use the observation that on all
of them the scaling factor between the HWP performance levels and
frequency for P-cores is 78741 (approximately 100000/1.27).  For
E-cores it is 100000, which is the same as for all of the non-hybrid
"core" platforms and does not require any changes.

Accordingly, make intel_pstate use 78741 as the scaling factor between
HWP performance levels and frequency for P-cores on all hybrid platforms
and drop the dependency of the HWP calibration code on CPPC.

Fixes: 46573fd636 ("cpufreq: intel_pstate: hybrid: Rework HWP calibration")
Reported-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Tested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: 5.15+ <stable@vger.kernel.org> # 5.15+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-10-25 15:09:23 +02:00
Rafael J. Wysocki
8dbab94d45 cpufreq: intel_pstate: Read all MSRs on the target CPU
Some of the MSR accesses in intel_pstate are carried out on the CPU
that is running the code, but the values coming from them are used
for the performance scaling of the other CPUs.

This is problematic, for example, on hybrid platforms where
MSR_TURBO_RATIO_LIMIT for P-cores and E-cores is different, so the
values read from it on a P-core are generally not applicable to E-cores
and the other way around.

For this reason, make the driver access all MSRs on the target CPU on
platforms using the "core" pstate_funcs callbacks which is the case for
all of the hybrid platforms released to date.  For this purpose, pass
a CPU argument to the ->get_max(), ->get_max_physical(), ->get_min()
and ->get_turbo() pstate_funcs callbacks and from there pass it to
rdmsrl_on_cpu() or rdmsrl_safe_on_cpu() to access the MSR on the target
CPU.
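
For example, a callback following that description might look like this (sketch; the actual callback bodies differ in detail):

    static int core_get_min_pstate(int cpu)
    {
        u64 value;

        /* Read the MSR on the CPU the value is for, not on the local CPU. */
        rdmsrl_on_cpu(cpu, MSR_PLATFORM_INFO, &value);
        return (value >> 40) & 0xFF;    /* maximum efficiency ratio */
    }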

Fixes: 46573fd636 ("cpufreq: intel_pstate: hybrid: Rework HWP calibration")
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Tested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: 5.15+ <stable@vger.kernel.org> # 5.15+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-10-25 15:09:23 +02:00
Doug Smythies
71bb5c82aa cpufreq: intel_pstate: Add Tigerlake support in no-HWP mode
Users may disable HWP in firmware, in which case intel_pstate wouldn't load
unless the CPU model is explicitly supported.

Add TIGERLAKE to the list of CPUs that can register intel_pstate while not
advertising the HWP capability. Without this change, a TIGERLAKE in no-HWP
mode could only use the acpi_cpufreq frequency scaling driver.

See also commits:
d8de7a44e1: cpufreq: intel_pstate: Add Skylake servers support
fbdc21e9b0: cpufreq: intel_pstate: Add Icelake servers support in no-HWP mode
706c532885: cpufreq: intel_pstate: Add Cometlake support in no-HWP mode

Reported by: M. Cargi Ari <cagriari@pm.me>
Signed-off-by: Doug Smythies <dsmythies@telus.net>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-09-10 18:40:40 +02:00
Srinivas Pandruvada
bbd67f1b5a cpufreq: intel_pstate: Support Sapphire Rapids OOB mode
Prevent intel_pstate from loading when the OOB (Out Of Band) P-states
mode is enabled on Sapphire Rapids. The OOB identifying bits are the
same as on prior-generation CPUs like Ice Lake servers, so also add
Sapphire Rapids to the intel_pstate_cpu_oob_ids list.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-05-11 19:17:43 +02:00
Chen Yu
addca28512 cpufreq: intel_pstate: Handle no_turbo in frequency invariance
Problem statement:

Once the user has disabled turbo frequency by

# echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo

the cfs_rq's util_avg becomes quite small when compared with
CPU capacity.

Step to reproduce:

# echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo

# ./x86_cpuload --count 1 --start 3 --timeout 100 --busy 99

would launch 1 thread and bind it to CPU3, lasting for 100 seconds,
with a CPU utilization of 99%. [1]

top result:
%Cpu3  : 98.4 us,  0.0 sy,  0.0 ni,  1.6 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st

check util_avg:
cat /sys/kernel/debug/sched/debug | grep "cfs_rq\[3\]" -A 20 | grep util_avg
  .util_avg                      : 611

So the util_avg/cpu capacity is 611/1024, which is much smaller than
98.4% shown in the top result.

This might impact some logic in the scheduler. For example,
group_is_overloaded() would compare the group_capacity and group_util
in the sched group, to check if this sched group is overloaded or not.
With this gap, even when there is a nearly 100% workload, the sched
group will not be regarded as overloaded. Besides group_is_overloaded(),
there are also other victims. There is ongoing work that aims to
optimize the task wakeup in an LLC domain. The main idea is to stop
searching idle CPUs if the sched domain is overloaded[2]. This proposal
also relies on the util_avg/CPU capacity to decide whether the LLC
domain is overloaded.

Analysis:

CPU frequency invariance has caused this difference. In summary,
the util_sum of the cfs rq decays quite fast while the CPU is
idle when CPU frequency invariance is enabled.

The details are as follows:

As depicted in update_rq_clock_pelt(), when the frequency invariance
is enabled, there would be two clock variables on each rq, clock_task
and clock_pelt:

   The clock_pelt scales the time to reflect the effective amount of
   computation done during the running delta time but then syncs back to
   clock_task when rq is idle.

   absolute time    | 1| 2| 3| 4| 5| 6| 7| 8| 9|10|11|12|13|14|15|16
   @ max frequency  ------******---------------******---------------
   @ half frequency ------************---------************---------
   clock pelt       | 1| 2|    3|    4| 7| 8| 9|   10|   11|14|15|16

The fast decay of util_sum during idle is due to:

 1. rq->clock_pelt is always behind rq->clock_task
 2. rq->last_update is updated to rq->clock_pelt' after invoking
    ___update_load_sum()
 3. Then the CPU becomes idle, the rq->clock_pelt' would be suddenly
    increased a lot to rq->clock_task
 4. Enters ___update_load_sum() again, the idle period is calculated by
    rq->clock_task - rq->last_update, AKA, rq->clock_task - rq->clock_pelt'.
    The lower the CPU frequency is, the larger the delta =
    rq->clock_task - rq->clock_pelt' will be. Since the idle period will be
    used to decay the util_sum only, the util_sum drops significantly during
    idle period.

Proposal:

This symptom is not only caused by disabling turbo frequency, but it
would also appear if the user limits the max frequency at runtime.

Because, if the frequency is always lower than the max frequency,
CPU frequency invariance would decay the util_sum quite fast during
idle.

As some end users would disable turbo after boot up, this patch aims to
address this symptom and deal with turbo scenarios for now.

It might be ideal if CPU frequency invariance is aware of the max CPU
frequency (user specified) at runtime in the future.

Link: https://github.com/yu-chen-surf/x86_cpuload.git #1
Link: https://lore.kernel.org/lkml/20220310005228.11737-1-yu.c.chen@intel.com/ #2
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Giovanni Gherdovich <ggherdovich@suse.cz>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-04-13 17:38:04 +02:00
Srinivas Pandruvada
3d13058ed2 cpufreq: intel_pstate: Use firmware default EPP
For some specific platforms (e.g. AlderLake) the balance performance
EPP is updated from a hard-coded value in the driver. This acts as the
default and balance_performance EPP. The purpose of this EPP update is
to reach the maximum 1-core turbo frequency (when possible) out of the
box.

Although this objective can be achieved with a hard-coded value in the
driver, there can be other EPPs that are better in terms of power. But
that is very subjective, depending on the platform and use cases. It is
not practical to have a per-platform default hard-coded in the driver.

If a platform wants to specify a default EPP, it can be set in the
firmware. If this EPP is not the chipset default of 0x80 (the
balance_perf EPP unless the driver changed it), is more performance
oriented, and is not 0, the driver can use it as the default and
balance_perf EPP. In this case no driver update is required every time
there is a new platform and default EPP.

If the firmware didn't update the EPP from the chipset default, then the
hard-coded value is used as per the existing implementation.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-03-16 19:14:55 +01:00
Rafael J. Wysocki
dfeeedc1bf cpufreq: intel_pstate: Update cpuinfo.max_freq on HWP_CAP changes
With HWP enabled, when the turbo range of performance levels is
disabled by the platform firmware, the CPU capacity is given by
the "guaranteed performance" field in MSR_HWP_CAPABILITIES which
is generally dynamic.  When it changes, the kernel receives an HWP
notification interrupt handled by notify_hwp_interrupt().

When the "guaranteed performance" value changes in the above
configuration, the CPU performance scaling needs to be adjusted so
as to use the new CPU capacity in computations, which means that
the cpuinfo.max_freq value needs to be updated for that CPU.

Accordingly, modify intel_pstate_notify_work() to read
MSR_HWP_CAPABILITIES and update cpuinfo.max_freq to reflect the
new configuration (this update can be carried out even if the
configuration doesn't actually change, because it simply doesn't
matter then and it takes less time to update it than to do extra
checks to decide whether or not a change has really occurred).

Reported-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Tested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-12-22 18:36:07 +01:00
Srinivas Pandruvada
b6e6f8beec cpufreq: intel_pstate: Update EPP for AlderLake mobile
There is an expectation from users that they can get the frequency
specified by cpufreq/cpuinfo_max_freq when conditions permit. But with
AlderLake mobile this may not be possible, because the frequency may be
clipped based on the system power-up EPP value. In this case users can
update cpufreq/energy_performance_preference to some performance-oriented
EPP to limit the clipping of frequencies.

To get the same out-of-the-box behavior as prior generations of CPUs,
update the EPP for AlderLake mobile CPUs on boot. On prior generations
of CPUs EPP = 128 was enough to get the maximum frequency, but with
AlderLake mobile the equivalent EPP is 102. Since the EPP is model
specific, the same value may have a different meaning on each generation
of CPU.

The current EPP string "balance_performance" corresponds to EPP = 128.
Change the EPP corresponding to "balance_performance" to 102 for
AlderLake mobile CPUs only and update it on each CPU during boot.

To implement this, reuse the epp_values[] array and update the modified
EPP at the index for BALANCE_PERFORMANCE. Add a dummy EPP_INDEX_DEFAULT
to epp_values[] to match the indexes in energy_perf_strings[].
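
The index mapping described above is roughly as follows (sketch; the exact table contents are an assumption):

    static const char * const energy_perf_strings[] = {
        "default", "performance", "balance_performance", "balance_power", "power", NULL
    };

    static unsigned int epp_values[] = {
        [EPP_INDEX_DEFAULT] = 0,        /* dummy entry, never written to hardware */
        [EPP_INDEX_PERFORMANCE] = HWP_EPP_PERFORMANCE,
        [EPP_INDEX_BALANCE_PERFORMANCE] = HWP_EPP_BALANCE_PERFORMANCE, /* 128; 102 on AlderLake mobile */
        [EPP_INDEX_BALANCE_POWERSAVE] = HWP_EPP_BALANCE_POWERSAVE,
        [EPP_INDEX_POWERSAVE] = HWP_EPP_POWERSAVE,
    };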

After HWP PM is enabled also update EPP when "balance_performance" is
redefined for the very first time after the boot on each CPU. On
subsequent suspend/resume or offline/online the old EPP is restored,
so no specific action is needed.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-12-17 16:44:15 +01:00
Rafael J. Wysocki
458b03f81a cpufreq: intel_pstate: Drop redundant intel_pstate_get_hwp_cap() call
It is not necessary to call intel_pstate_get_hwp_cap() from
intel_pstate_update_perf_limits(), because it gets called from
intel_pstate_verify_cpu_policy() which is either invoked directly
right before intel_pstate_update_perf_limits(), in
intel_cpufreq_verify_policy() in the passive mode, or called
from driver callbacks in a sequence that causes it to be followed
by an immediate intel_pstate_update_perf_limits().

Namely, in the active mode intel_cpufreq_verify_policy() is called
by intel_pstate_verify_policy() which is the ->verify() callback
routine of intel_pstate and gets called by the cpufreq core right
before intel_pstate_set_policy(), which is the driver's ->setpolicy()
callback routine, where intel_pstate_update_perf_limits() is called.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-12-17 16:26:50 +01:00
Srinivas Pandruvada
03c83982a0 cpufreq: intel_pstate: ITMT support for overclocked system
On systems with overclocking enabled, CPPC Highest Performance can be
hard coded to 0xff. In this case even if we have cores with different
highest performance, ITMT can't be enabled as the current implementation
depends on CPPC Highest Performance.

On such systems we can use MSR_HWP_CAPABILITIES maximum performance field
when CPPC.Highest Performance is 0xff.

Due to legacy reasons, we can't solely depend on MSR_HWP_CAPABILITIES as
in some older systems CPPC Highest Performance is the only way to identify
different performing cores.
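
A sketch of the fallback described above (variable names are assumptions):

    cppc_get_perf_caps(cpu, &caps);
    highest_perf = caps.highest_perf;

    /* Overclocked parts may hard-code CPPC Highest Performance to 0xff. */
    if (highest_perf == 0xff)
        highest_perf = HWP_HIGHEST_PERF(hwp_cap);   /* fall back to MSR_HWP_CAPABILITIES */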

Reported-by: Michael Larabel <Michael@MichaelLarabel.com>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Tested-by: Michael Larabel <Michael@MichaelLarabel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-11-23 14:11:18 +01:00
Rafael J. Wysocki
ed38eb49d1 cpufreq: intel_pstate: Fix active mode offline/online EPP handling
After commit 4adcf2e582 ("cpufreq: intel_pstate: Add ->offline and
->online callbacks") the EPP value set by the "performance" scaling
algorithm in the active mode is not restored after an offline/online
cycle which replaces it with the saved EPP value coming from user
space.

Address this issue by forcing intel_pstate_hwp_set() to set a new
EPP value when it runs first time after online.

Fixes: 4adcf2e582 ("cpufreq: intel_pstate: Add ->offline and ->online callbacks")
Link: https://lore.kernel.org/linux-pm/adc7132c8655bd4d1c8b6129578e931a14fe1db2.camel@linux.intel.com/
Reported-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: 5.9+ <stable@vger.kernel.org> # 5.9+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-11-23 14:02:11 +01:00
Adamos Ttofari
cd23f02f16 cpufreq: intel_pstate: Add Ice Lake server to out-of-band IDs
Commit fbdc21e9b0 ("cpufreq: intel_pstate: Add Icelake servers
support in no-HWP mode") enabled the use of Intel P-State driver
for Ice Lake servers.

But it doesn't cover the case when the OS can't control P-States.

Therefore, for Ice Lake servers, if MSR_MISC_PWR_MGMT bit 8 or 18 is
set, then the Intel P-State driver should exit, as the OS can't
control P-States.
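
The OOB check being extended to Ice Lake servers is of this general shape (sketch):

    #define BITMASK_OOB     (BIT(8) | BIT(18))

    rdmsrl(MSR_MISC_PWR_MGMT, misc_pwr);
    if (misc_pwr & BITMASK_OOB)
        return true;    /* P-states are managed out of band: don't register the driver */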

Fixes: fbdc21e9b0 ("cpufreq: intel_pstate: Add Icelake servers support in no-HWP mode")
Signed-off-by: Adamos Ttofari <attofari@amazon.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-11-23 14:02:10 +01:00
Srinivas Pandruvada
074d0cdfbb cpufreq: intel_pstate: Clear HWP Status during HWP Interrupt enable
It is possible that some performance excursions happened before the OS
booted or enabled HWP interrupts. So clear the MSR_HWP_STATUS bits when
the HWP interrupt is enabled. This way the next excursion will result in
an HWP interrupt.

The status bits of MSR_HWP_STATUS must be cleared (0) by software so
that a new status condition change will cause the hardware to set the
bit again and issue the notification.
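
In code form this is essentially (sketch):

    /* Clear stale status bits before enabling the HWP interrupt. */
    wrmsrl_on_cpu(cpudata->cpu, MSR_HWP_STATUS, 0);
    wrmsrl_on_cpu(cpudata->cpu, MSR_HWP_INTERRUPT, 0x01);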

Fixes: 57577c996d ("cpufreq: intel_pstate: Process HWP Guaranteed change notification")
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-11-04 19:48:47 +01:00
Srinivas Pandruvada
5521055670 cpufreq: intel_pstate: Fix unchecked MSR 0x773 access
It is possible that on some platforms HWP interrupts are disabled. In
that case accessing MSR 0x773 will result in a warning.

So check the X86_FEATURE_HWP_NOTIFY feature before accessing MSR 0x773.
The other places in the code where this MSR is accessed already check
this feature, except for the disable path called from the cpufreq
offline and suspend callbacks.
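
Sketch of the guard added to the disable path:

    /* MSR 0x773 (MSR_HWP_INTERRUPT) is only implemented with HWP notifications. */
    if (!boot_cpu_has(X86_FEATURE_HWP_NOTIFY))
        return;

    wrmsrl_on_cpu(cpudata->cpu, MSR_HWP_INTERRUPT, 0);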

Fixes: 57577c996d ("cpufreq: intel_pstate: Process HWP Guaranteed change notification")
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Tested-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-11-04 19:48:47 +01:00
Rafael J. Wysocki
dbea75fe18 cpufreq: intel_pstate: Clear HWP desired on suspend/shutdown and offline
Commit a365ab6b9d ("cpufreq: intel_pstate: Implement the
->adjust_perf() callback") caused intel_pstate to use nonzero HWP
desired values in certain usage scenarios, but it did not prevent
them from being leaked into the configurations in which HWP desired
is expected to be 0.

The failing scenarios are switching the driver from the passive
mode to the active mode and starting a new kernel via kexec() while
intel_pstate is running in the passive mode.

To address this issue, ensure that HWP desired will be cleared on
offline and suspend/shutdown.

Fixes: a365ab6b9d ("cpufreq: intel_pstate: Implement the ->adjust_perf() callback")
Reported-by: Julia Lawall <julia.lawall@inria.fr>
Tested-by: Julia Lawall <julia.lawall@inria.fr>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-11-04 19:48:47 +01:00
Zhang Rui
c72bcf0ab8 cpufreq: intel_pstate: Fix cpu->pstate.turbo_freq initialization
Fix a problem in the active mode where cpu->pstate.turbo_freq is
initialized only if the HWP-to-frequency scaling factor is refined.

In passive mode, this problem is not exposed, because
cpu->pstate.turbo_freq is set again, later in
intel_cpufreq_cpu_init()->intel_pstate_get_hwp_cap().

Fixes: eb3693f052 ("cpufreq: intel_pstate: hybrid: CPU-specific scaling factor")
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-10-26 16:00:50 +02:00
Srinivas Pandruvada
57577c996d cpufreq: intel_pstate: Process HWP Guaranteed change notification
It is possible that the HWP guaranteed ratio changes in response to
changes in power and thermal limits. For example, when the Intel Speed
Select performance profile is changed or there is a change in TDP, the
hardware can send notifications. It is possible that the guaranteed
ratio is increased. This creates an issue when turbo is disabled, as the
old limits set in MSR_HWP_REQUEST are still lower and the hardware will
clip to the older limits.

This change enables the HWP interrupt and processes HWP interrupts. When
the guaranteed ratio changes, cpufreq_update_policy() is called so that
the driver callbacks run to update to the new HWP limits. This update is
done from a delayed work item with a 10 ms delay to avoid frequent
updates.

Although the scope of IA32_HWP_INTERRUPT is per logical CPU, on some
platforms the interrupt is generated on all CPUs. This is particularly a
problem during initialization, when the driver hasn't allocated data for
the other CPUs yet. So this change uses a cpumask of enabled CPUs and
processes interrupts on those CPUs only.

When the cpufreq offline() or suspend() callback is called, the HWP
interrupt is disabled on those CPUs and any pending work item is
canceled.

A spinlock is used to protect the data and processing shared with the
interrupt handler. The READ_ONCE() and WRITE_ONCE() macros are used to
designate shared data, even though the spinlock acts as an optimization
barrier here.
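
A sketch of the interrupt-to-policy-update flow described above (names assumed where the text does not mention them):

    /* interrupt handler: only for CPUs in the enabled cpumask */
    schedule_delayed_work(&cpudata->hwp_notify_work, msecs_to_jiffies(10));

    /* delayed work handler */
    static void intel_pstate_notify_work(struct work_struct *work)
    {
        struct cpudata *cpudata =
            container_of(to_delayed_work(work), struct cpudata, hwp_notify_work);

        /* Re-evaluate the policy against the new guaranteed ratio. */
        cpufreq_update_policy(cpudata->cpu);
    }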

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Tested-by: pablomh@gmail.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-10-05 15:30:44 +02:00
Doug Smythies
d9a7e9df73 cpufreq: intel_pstate: Override parameters if HWP forced by BIOS
If HWP has already been enabled by the BIOS, it may be necessary to
override some kernel command line parameters. Once it has been enabled,
it requires a reset to be disabled.

Suggested-by: Rafael J. Wysocki <rafael@kernel.org>
Signed-off-by: Doug Smythies <dsmythies@telus.net>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-09-13 19:26:08 +02:00
Rafael J. Wysocki
46573fd636 cpufreq: intel_pstate: hybrid: Rework HWP calibration
The current HWP calibration for hybrid processors in intel_pstate is
fragile, because it depends too much on the information provided by
the platform firmware via CPPC which may not be reliable enough.  It
also need not be so complicated.

In order to improve that mechanism and make it more resistant to
platform firmware issues, make it only use the CPPC nominal_perf
values to compute the HWP-to-frequency scaling factors for all
CPUs and possibly use the HWP_CAP highest_perf values to recompute
them if the ones derived from the CPPC nominal_perf values alone
appear to be too high.

Namely, fetch CPC.nominal_perf for all CPUs present in the system,
find the minimum one and use it as a reference for computing all of
the CPUs' scaling factors (using the observation that for the CPUs
having the minimum CPC.nominal_perf the HWP range of available
performance levels should be the same as the range of available
"legacy" P-states and so the HWP-to-frequency scaling factor for
them should be the same as the corresponding scaling factor used
for representing the P-state values in kHz).
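
As a rough standalone illustration of that arithmetic (the 100000 kHz per
P-state value and the function names are assumptions of this sketch, not
taken from the driver):

    #include <stdio.h>

    /* kHz per legacy P-state step, assumed here for illustration. */
    #define PERF_CTL_SCALING 100000

    /*
     * HWP-to-frequency scaling factor (kHz per HWP performance level) for
     * one CPU, using the minimum CPPC nominal_perf in the system as the
     * reference, as described above.
     */
    static unsigned int hwp_scaling(unsigned int min_nominal_perf,
                                    unsigned int cpu_nominal_perf)
    {
        /*
         * CPUs with the minimum nominal_perf keep the legacy scaling;
         * faster core types get a proportionally smaller factor.
         */
        return (PERF_CTL_SCALING * min_nominal_perf +
                cpu_nominal_perf - 1) / cpu_nominal_perf;
    }

    int main(void)
    {
        /* Hypothetical nominal_perf values: E-core 18, P-core 24. */
        printf("E-core scaling: %u kHz/level\n", hwp_scaling(18, 18));
        printf("P-core scaling: %u kHz/level\n", hwp_scaling(18, 24));
        return 0;
    }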

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Zhang Rui <rui.zhang@intel.com>
2021-09-07 21:15:16 +02:00
Rafael J. Wysocki
dd7c46d6e5 Revert "cpufreq: intel_pstate: Process HWP Guaranteed change notification"
Revert commit d0e936adbd ("cpufreq: intel_pstate: Process HWP
Guaranteed change notification"), because it causes a NULL pointer
dereference to occur on Lenovo X1 gen9 laptops due to an HWP
guaranteed performance change interrupt arriving prematurely.

This feature will be revisited in the next cycle.

Reported-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-09-07 15:39:45 +02:00
Srinivas Pandruvada
d0e936adbd cpufreq: intel_pstate: Process HWP Guaranteed change notification
It is possible that HWP guaranteed ratio is changed in response to
change in power and thermal limits. For example when Intel Speed Select
performance profile is changed or there is change in TDP, hardware can
send notifications. It is possible that the guaranteed ratio is
increased. This creates an issue when turbo is disabled, as the old
limits set in MSR_HWP_REQUEST are still lower and hardware will clip
to older limits.

This change enables HWP interrupt and process HWP interrupts. When
guaranteed is changed, calls cpufreq_update_policy() so that driver
callbacks are called to update to new HWP limits. This callback
is called from a delayed workqueue of 10ms to avoid frequent updates.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-08-25 20:09:37 +02:00
Sebastian Andrzej Siewior
09681a0772 cpufreq: Replace deprecated CPU-hotplug functions
The functions get_online_cpus() and put_online_cpus() have been
deprecated during the CPU hotplug rework. They map directly to
cpus_read_lock() and cpus_read_unlock().

Replace deprecated CPU-hotplug functions with the official version.
The behavior remains unchanged.
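
The replacement pattern, as a short sketch (the surrounding function is made
up for illustration):

    #include <linux/cpu.h>

    static void example_walk_online_cpus(void)
    {
        /* get_online_cpus()/put_online_cpus() map directly to these: */
        cpus_read_lock();
        /* ... work that must not race with CPU hotplug ... */
        cpus_read_unlock();
    }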

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-08-04 20:16:32 +02:00
Rafael J. Wysocki
49d6feef94 cpufreq: intel_pstate: Combine ->stop_cpu() and ->offline()
Combine the ->stop_cpu() and ->offline() callback routines for
intel_pstate in the active mode so as to avoid setting the
->stop_cpu callback pointer which is going to be dropped from
the framework.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2021-06-30 18:44:46 +02:00
Rafael J. Wysocki
8df71a7dc5 cpufreq: intel_pstate: hybrid: Fix build with CONFIG_ACPI unset
One of the previous commits introducing hybrid processor support to
intel_pstate broke build with CONFIG_ACPI unset.

Fix that and, while at it, make the empty stubs of two ACPI CPPC related
functions static inline and fix a spelling mistake in the name of one of
them.

Fixes: eb3693f052 ("cpufreq: intel_pstate: hybrid: CPU-specific scaling factor")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
2021-06-07 13:47:57 +02:00
Giovanni Gherdovich
706c532885 cpufreq: intel_pstate: Add Cometlake support in no-HWP mode
Users may disable HWP in firmware, in which case intel_pstate wouldn't load
unless the CPU model is explicitly supported.

See also commit d8de7a44e1 ("cpufreq: intel_pstate: Add Skylake servers
support").

Suggested-by: Doug Smythies <dsmythies@telus.net>
Tested-by: Doug Smythies <dsmythies@telus.net>
Signed-off-by: Giovanni Gherdovich <ggherdovich@suse.cz>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-05-21 18:47:18 +02:00
Giovanni Gherdovich
fbdc21e9b0 cpufreq: intel_pstate: Add Icelake servers support in no-HWP mode
Users may disable HWP in firmware, in which case intel_pstate wouldn't load
unless the CPU model is explicitly supported.

Add ICELAKE_X to the list of CPUs that can register intel_pstate while not
advertising the HWP capability. Without this change, an ICELAKE_X in no-HWP
mode could only use the acpi_cpufreq frequency scaling driver.

See also commit d8de7a44e1 ("cpufreq: intel_pstate: Add Skylake servers
support").

Signed-off-by: Giovanni Gherdovich <ggherdovich@suse.cz>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-05-21 18:47:18 +02:00
Rafael J. Wysocki
eb3693f052 cpufreq: intel_pstate: hybrid: CPU-specific scaling factor
The scaling factor between HWP performance levels and CPU frequency
may be different for different types of CPUs in a hybrid processor
and in general the HWP performance levels need not correspond to
"P-states" representing values that would be written to
MSR_IA32_PERF_CTL if HWP was disabled.

However, the policy limits control in cpufreq is defined in terms
of CPU frequency, so it is necessary to map the frequency limits set
through that interface to HWP performance levels with reasonable
accuracy and the behavior of that interface on hybrid processors
has to be compatible with its behavior on non-hybrid ones.

To address this problem, use the observations that (1) on hybrid
processors the sysfs interface can operate by mapping frequency
to "P-states" and translating those "P-states" to specific HWP
performance levels of the given CPU and (2) the scaling factor
between the MSR_IA32_PERF_CTL "P-states" and CPU frequency can be
regarded as a known value.  Moreover, the mapping between the
HWP performance levels and CPU frequency can be assumed to be
linear and such that HWP performance level 0 corresponds to the
frequency value of 0, so it is only necessary to know the
frequency corresponding to one specific HWP performance level
to compute the scaling factor applicable to all of them.

One possibility is to take the nominal performance value from CPPC,
if available, and use cpu_khz as the corresponding frequency.  If
the CPPC capabilities interface is not there or the nominal
performance value provided by it is out of range, though, something
else needs to be done.

Namely, the guaranteed performance level either from CPPC or from
MSR_HWP_CAPABILITIES can be used instead, but the corresponding
frequency needs to be determined.  That can be done by computing the
product of the (known) scaling factor between the MSR_IA32_PERF_CTL
P-states and CPU frequency (the PERF_CTL scaling factor) and the
P-state value referred to as the "TDP ratio".

If the HWP-to-frequency scaling factor value obtained in one of the
ways above turns out to be equal to the PERF_CTL scaling factor, it
can be assumed that the number of HWP performance levels is equal to
the number of P-states and the given CPU can be handled as though
this was not a hybrid processor.

Otherwise, one more adjustment may still need to be made, because the
HWP-to-frequency scaling factor computed so far may not be accurate
enough (e.g. because the CPPC information does not match the exact
behavior of the processor).  Specifically, in that case the frequency
corresponding to the highest HWP performance value from
MSR_HWP_CAPABILITIES (computed as the product of that value and the
HWP-to-frequency scaling factor) cannot exceed the frequency that
corresponds to the maximum 1-core turbo P-state value from
MSR_TURBO_RATIO_LIMIT (computed as the product of that value and the
PERF_CTL scaling factor) and the HWP-to-frequency scaling factor may
need to be adjusted accordingly.
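
A standalone sketch of that final clamp (the variable and function names are
assumptions, not the driver's):

    #include <stdint.h>

    /*
     * Clamp the HWP-to-frequency scaling factor so that the highest HWP
     * performance level does not map above the maximum 1-core turbo
     * frequency derived from MSR_TURBO_RATIO_LIMIT.
     */
    static unsigned int clamp_hwp_scaling(unsigned int hwp_scaling,
                                          unsigned int hwp_highest_perf,
                                          unsigned int turbo_pstate,
                                          unsigned int perf_ctl_scaling)
    {
        uint64_t turbo_freq = (uint64_t)turbo_pstate * perf_ctl_scaling;
        uint64_t hwp_max_freq = (uint64_t)hwp_highest_perf * hwp_scaling;

        if (hwp_max_freq <= turbo_freq)
            return hwp_scaling;

        /* Scale down so that highest_perf maps to the turbo frequency. */
        return (unsigned int)(turbo_freq / hwp_highest_perf);
    }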

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-05-21 18:44:55 +02:00
Rafael J. Wysocki
c3d175e485 cpufreq: intel_pstate: hybrid: Avoid exposing two global attributes
The turbo_pct and num_pstates sysfs attributes represent CPU
properties that may be different for different types of CPUs in
a hybrid processor, so avoid exposing them in that case.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-05-21 18:44:55 +02:00
Rafael J. Wysocki
e5af36b2ad cpufreq: intel_pstate: Use HWP if enabled by platform firmware
It turns out that there are systems where HWP is enabled during
initialization by the platform firmware (BIOS), but HWP EPP support
is not advertised.

After commit 7aa1031223 ("cpufreq: intel_pstate: Avoid enabling HWP
if EPP is not supported") intel_pstate refuses to use HWP on those
systems, but the fallback PERF_CTL interface does not work on them
either because HWP is enabled, and once enabled, HWP cannot be
disabled.  Consequently, the users of those systems cannot control
CPU performance scaling.

Address this issue by making intel_pstate use HWP unconditionally if
it is enabled already when the driver starts.

Fixes: 7aa1031223 ("cpufreq: intel_pstate: Avoid enabling HWP if EPP is not supported")
Reported-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Tested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: 5.9+ <stable@vger.kernel.org> # 5.9+
2021-05-10 13:22:39 +02:00
Rafael J. Wysocki
b989bc0f3c cpufreq: intel_pstate: Simplify intel_pstate_update_perf_limits()
Because pstate.max_freq is always equal to the product of
pstate.max_pstate and pstate.scaling and, analogously,
pstate.turbo_freq is always equal to the product of
pstate.turbo_pstate and pstate.scaling, the result of the
max_policy_perf computation in intel_pstate_update_perf_limits() is
always equal to the quotient of policy_max and pstate.scaling,
regardless of whether or not turbo is disabled.  Analogously, the
result of min_policy_perf in intel_pstate_update_perf_limits() is
always equal to the quotient of policy_min and pstate.scaling.

Accordingly, intel_pstate_update_perf_limits() need not check
whether or not turbo is enabled at all and in order to compute
max_policy_perf and min_policy_perf it can always divide policy_max
and policy_min, respectively, by pstate.scaling.  Make it do so.
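
In sketch form (standalone, with assumed names), the simplified computation
is just:

    /* Sketch of the simplified limit computation described above. */
    static void update_perf_limits(unsigned int policy_min_khz,
                                   unsigned int policy_max_khz,
                                   unsigned int pstate_scaling,
                                   unsigned int *min_policy_perf,
                                   unsigned int *max_policy_perf)
    {
        /* Valid whether or not turbo is disabled. */
        *max_policy_perf = policy_max_khz / pstate_scaling;
        *min_policy_perf = policy_min_khz / pstate_scaling;
    }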

While at it, move the definition and initialization of the
turbo_max local variable to the code branch using it.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Chen Yu <yu.c.chen@intel.com>
2021-04-09 17:16:12 +02:00
Rafael J. Wysocki
de5bcf404a cpufreq: intel_pstate: Clean up frequency computations
Notice that some computations related to frequency in intel_pstate
can be simplified if (a) intel_pstate_get_hwp_max() updates the
relevant members of struct cpudata by itself and (b) the "turbo
disabled" check is moved from it to its callers, so modify the code
accordingly and while at it rename intel_pstate_get_hwp_max() to
intel_pstate_get_hwp_cap() which better reflects its purpose and
provide a simplified variant of it, __intel_pstate_get_hwp_cap(),
suitable for the initialization path.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Chen Yu <yu.c.chen@intel.com>
2021-03-23 19:41:14 +01:00
Nigel Christian
75a8d877d6 cpufreq: intel_pstate: Remove repeated word
In the comment for trace in passive mode there is an
unnecessary "the". Eradicate it.

Signed-off-by: Nigel Christian <nigel.l.christian@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-01-22 17:04:56 +01:00
Chen Yu
6f67e06008 cpufreq: intel_pstate: Get per-CPU max freq via MSR_HWP_CAPABILITIES if available
Currently, when turbo is disabled (either by BIOS or by the user),
the intel_pstate driver reads the max non-turbo frequency from the
package-wide MSR_PLATFORM_INFO(0xce) register.

However, on asymmetric platforms it is possible in theory that small
and big cores with HWP enabled might have different max non-turbo CPU
frequencies, because MSR_HWP_CAPABILITIES has per-CPU scope according
to the Intel Software Developer Manual.

The turbo max frequency is already per-CPU in the current code, so make
a similar change for the max non-turbo frequency as well.

Reported-by: Wendy Wang <wendy.wang@intel.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
[ rjw: Subject and changelog edits ]
Cc: 4.18+ <stable@vger.kernel.org> # 4.18+: a45ee4d4e1: cpufreq: intel_pstate: Change intel_pstate_get_hwp_max() argument
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-01-12 19:44:49 +01:00
Rafael J. Wysocki
597ffbc8d0 cpufreq: intel_pstate: Rename two functions
Rename intel_cpufreq_adjust_hwp() and intel_cpufreq_adjust_perf_ctl()
to intel_cpufreq_hwp_update() and intel_cpufreq_perf_ctl_update(),
respectively, to avoid possible confusion with the ->adjust_perf()
callback function, intel_cpufreq_adjust_perf().

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Chen Yu <yu.c.chen@intel.com>
2021-01-12 19:34:05 +01:00
Rafael J. Wysocki
a45ee4d4e1 cpufreq: intel_pstate: Change intel_pstate_get_hwp_max() argument
All of the callers of intel_pstate_get_hwp_max() access the struct
cpudata object that corresponds to the given CPU already and the
function itself needs to access that object (in order to update
hwp_cap_cached), so modify the code to pass a struct cpudata pointer
to it instead of the CPU number.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Chen Yu <yu.c.chen@intel.com>
2021-01-12 19:34:04 +01:00
Rafael J. Wysocki
9dd04ec6bc cpufreq: intel_pstate: Always read hwp_cap_cached with READ_ONCE()
Because intel_pstate_get_hwp_max() which updates hwp_cap_cached
may run in parallel with the readers of it, annotate all of the
read accesses to it with READ_ONCE().
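
The annotation pattern, sketched (the struct and function names here are
stand-ins, not the driver's):

    #include <linux/compiler.h>
    #include <linux/types.h>

    struct cpu_data {
        u64 hwp_cap_cached;
    };

    /* Writer: runs when the HWP capabilities are (re)read. */
    static void cache_hwp_cap(struct cpu_data *cpu, u64 cap)
    {
        WRITE_ONCE(cpu->hwp_cap_cached, cap);
    }

    /* Reader: may run in parallel with the writer. */
    static u64 read_hwp_cap(struct cpu_data *cpu)
    {
        return READ_ONCE(cpu->hwp_cap_cached);
    }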

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Chen Yu <yu.c.chen@intel.com>
2021-01-12 19:34:04 +01:00
Lukas Bulwahn
c4151604f0 cpufreq: intel_pstate: remove obsolete functions
percent_fp() was used in intel_pstate_pid_reset(), which was removed in
commit 9d0ef7af1f ("cpufreq: intel_pstate: Do not use PID-based P-state
selection") and hence, percent_fp() is unused since then.

percent_ext_fp() was last used in intel_pstate_update_perf_limits(), which
was refactored in commit 1a4fe38add ("cpufreq: intel_pstate: Remove
max/min fractions to limit performance"), and hence, percent_ext_fp() is
unused since then.

make CC=clang W=1 points out those unused functions:

drivers/cpufreq/intel_pstate.c:79:23: warning: unused function 'percent_fp' [-Wunused-function]
static inline int32_t percent_fp(int percent)
                      ^

drivers/cpufreq/intel_pstate.c:94:23: warning: unused function 'percent_ext_fp' [-Wunused-function]
static inline int32_t percent_ext_fp(int percent)
                      ^

Remove those obsolete functions.

Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-01-07 18:22:46 +01:00
Rafael J. Wysocki
17ffd35809 cpufreq: intel_pstate: Use HWP capabilities in intel_cpufreq_adjust_perf()
If turbo P-states cannot be used, either due to the configuration of
the processor, or because intel_pstate is not allowed to use them,
the maximum available P-state with HWP enabled corresponds to the
HWP_CAP.GUARANTEED value which is not static.  It can be adjusted by
an out-of-band agent or during an Intel Speed Select performance
level change, so long as it remains less than or equal to
HWP_CAP.MAX.

However, if turbo P-states cannot be used, intel_cpufreq_adjust_perf()
always uses pstate.max_pstate (set during the initialization of the
driver only) as the maximum available P-state, so it may miss a change
of the HWP_CAP.GUARANTEED value.

Prevent that from happening by modifying intel_cpufreq_adjust_perf()
to always read the "guaranteed" and "maximum turbo" performance
levels from the cached HWP_CAP value.
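
A standalone sketch of reading those two levels from a cached HWP_CAP value;
the bit positions below follow the architectural IA32_HWP_CAPABILITIES
layout, while the macros and the function name are defined locally for
illustration:

    #include <stdint.h>

    /* IA32_HWP_CAPABILITIES, low 32 bits: highest[7:0], guaranteed[15:8]. */
    #define HWP_HIGHEST_PERF(cap)    ((cap) & 0xff)
    #define HWP_GUARANTEED_PERF(cap) (((cap) >> 8) & 0xff)

    static unsigned int max_available_perf(uint64_t hwp_cap_cached,
                                           int turbo_disabled)
    {
        return turbo_disabled ? HWP_GUARANTEED_PERF(hwp_cap_cached)
                              : HWP_HIGHEST_PERF(hwp_cap_cached);
    }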

Fixes: a365ab6b9d ("cpufreq: intel_pstate: Implement the ->adjust_perf() callback")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
2021-01-07 17:34:32 +01:00
Rafael J. Wysocki
be1283454b cpufreq: intel_pstate: Fix fast-switch fallback path
When sugov_update_single_perf() falls back to the "frequency"
path due to the missing scale-invariance, it will call
cpufreq_driver_fast_switch() via sugov_fast_switch()
and the driver's ->fast_switch() callback will be invoked,
so it must not be NULL.

However, after commit a365ab6b9d ("cpufreq: intel_pstate: Implement
the ->adjust_perf() callback") intel_pstate sets ->fast_switch() to
NULL when it is going to use intel_cpufreq_adjust_perf(), which is a
mistake, because on x86 the scale-invariance may be turned off
dynamically, so modify it to retain the original ->fast_switch()
callback pointer.

Fixes: a365ab6b9d ("cpufreq: intel_pstate: Implement the ->adjust_perf() callback")
Reported-by: Kenneth R. Crudup <kenny@panix.com>
Tested-by: Kenneth R. Crudup <kenny@panix.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-12-30 18:22:17 +01:00
Rafael J. Wysocki
e40ad84c26 cpufreq: intel_pstate: Use most recent guaranteed performance values
When turbo has been disabled by the BIOS, but HWP_CAP.GUARANTEED is
changed later, user space may want to take advantage of this increased
guaranteed performance.

HWP_CAP.GUARANTEED is not a static value.  It can be adjusted by an
out-of-band agent or during an Intel Speed Select performance level
change.  The HWP_CAP.MAX is still the maximum achievable performance
with turbo disabled by the BIOS, so HWP_CAP.GUARANTEED can still
change as long as it remains less than or equal to HWP_CAP.MAX.

When HWP_CAP.GUARANTEED is changed, the sysfs base_frequency
attribute shows the most recent guaranteed frequency value. This
attribute can be used by user space software to update the scaling
min/max limits of the CPU.

Currently, the ->setpolicy() callback already uses the latest
HWP_CAP values when setting HWP_REQ, but the ->verify() callback will
restrict the user settings to the old guaranteed performance value,
which prevents user space from making use of the extra CPU capacity
theoretically available to it after increasing HWP_CAP.GUARANTEED.

To address this, read HWP_CAP in intel_pstate_verify_cpu_policy()
to obtain the maximum P-state that can be used and use that to
confine the policy max limit instead of using the cached and
possibly stale pstate.max_freq value for this purpose.

For consistency, update intel_pstate_update_perf_limits() to use the
maximum available P-state returned by intel_pstate_get_hwp_max() to
compute the maximum frequency instead of using the return value of
intel_pstate_get_max_freq() which, again, may be stale.

This issue is a side-effect of fixing the scaling frequency limits in
commit eacc9c5a92 ("cpufreq: intel_pstate: Fix intel_pstate_get_hwp_max()
for turbo disabled") which corrected the setting of the reduced scaling
frequency values, but caused stale HWP_CAP.GUARANTEED to be used in
the case at hand.

Fixes: eacc9c5a92 ("cpufreq: intel_pstate: Fix intel_pstate_get_hwp_max() for turbo disabled")
Reported-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Tested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: 5.8+ <stable@vger.kernel.org> # 5.8+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-12-21 10:51:00 +01:00
Rafael J. Wysocki
a365ab6b9d cpufreq: intel_pstate: Implement the ->adjust_perf() callback
Make intel_pstate expose the ->adjust_perf() callback when it
operates in the passive mode with HWP enabled which causes the
schedutil governor to use that callback instead of ->fast_switch().

The minimum and target performance-level values passed by the
governor to ->adjust_perf() are converted to HWP.REQ.MIN and
HWP.REQ.DESIRED, respectively, which allows the processor to
adjust its configuration to maximize energy-efficiency while
providing sufficient capacity.
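
As a sketch of that conversion (standalone; the IA32_HWP_REQUEST bit layout
is architectural, while the helper name is made up):

    #include <stdint.h>

    /* IA32_HWP_REQUEST: min[7:0], max[15:8], desired[23:16], epp[31:24]. */
    static uint64_t build_hwp_request(uint8_t min_perf, uint8_t max_perf,
                                      uint8_t desired_perf, uint8_t epp)
    {
        return (uint64_t)min_perf |
               ((uint64_t)max_perf << 8) |
               ((uint64_t)desired_perf << 16) |
               ((uint64_t)epp << 24);
    }

    /*
     * ->adjust_perf(min, target) then maps to:
     *   HWP.REQ.MIN     = min
     *   HWP.REQ.DESIRED = target
     */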

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2020-12-15 19:24:19 +01:00
Rafael J. Wysocki
2554c32f0b cpufreq: intel_pstate: Simplify intel_cpufreq_update_pstate()
Avoid doing the same assignment in both branches of a conditional,
do it after the whole conditional instead.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-12-11 19:53:58 +01:00
Rafael J. Wysocki
fcb3a1ab79 cpufreq: intel_pstate: Take CPUFREQ_GOV_STRICT_TARGET into account
Make intel_pstate take the new CPUFREQ_GOV_STRICT_TARGET governor
flag into account when it operates in the passive mode with HWP
enabled, so as to fix the "powersave" governor behavior in that
case (currently, HWP is allowed to scale the performance all the
way up to the policy max limit when the "powersave" governor is
used, but it should be constrained to the policy min limit then).

Fixes: f6ebbcf08f ("cpufreq: intel_pstate: Implement passive mode with HWP enabled")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: 5.9+ <stable@vger.kernel.org> # 5.9+: 9a2a9ebc0a cpufreq: Introduce governor flags
Cc: 5.9+ <stable@vger.kernel.org> # 5.9+: 218f668701 cpufreq: Introduce CPUFREQ_GOV_STRICT_TARGET
Cc: 5.9+ <stable@vger.kernel.org> # 5.9+: ea9364bbad cpufreq: Add strict_target to struct cpufreq_policy
2020-11-10 18:36:17 +01:00
Rafael J. Wysocki
e0be38ed4a cpufreq: intel_pstate: Avoid missing HWP max updates in passive mode
If the cpufreq policy max limit is changed when intel_pstate operates
in the passive mode with HWP enabled and the "powersave" governor is
used on top of it, the HWP max limit is not updated as appropriate.

Namely, in the "powersave" governor case, the target P-state
is always equal to the policy min limit, so if the latter does
not change, intel_cpufreq_adjust_hwp() is not invoked to update
the HWP Request MSR due to the "target_pstate != old_pstate" check
in intel_cpufreq_update_pstate(), so the HWP max limit is not
updated as a result.

Also, if the CPUFREQ_NEED_UPDATE_LIMITS flag is not set for the
driver and the target frequency does not change along with the
policy max limit, the "target_freq == policy->cur" check in
__cpufreq_driver_target() prevents the driver's ->target() callback
from being invoked at all, so the HWP max limit is not updated.

To prevent that occurring, set the CPUFREQ_NEED_UPDATE_LIMITS flag
in the intel_cpufreq driver structure if HWP is enabled and modify
intel_cpufreq_update_pstate() to do the "target_pstate != old_pstate"
check only in the non-HWP case and let intel_cpufreq_adjust_hwp()
always run in the HWP case (it will update HWP Request only if the
cached value of the register is different from the new one including
the limits, so if neither the target P-state value nor the max limit
changes, the register write will still be avoided).
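
The "write only when the full request changes" idea, sketched with assumed
names:

    #include <stdint.h>

    /*
     * Update MSR_HWP_REQUEST only if the new value (limits included)
     * differs from the cached copy, as described above.
     */
    static void hwp_update(uint64_t *hwp_req_cached, uint64_t new_req,
                           void (*write_hwp_req_msr)(uint64_t))
    {
        if (new_req == *hwp_req_cached)
            return; /* nothing changed, skip the register write */

        *hwp_req_cached = new_req;
        write_hwp_req_msr(new_req);
    }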

Fixes: f6ebbcf08f ("cpufreq: intel_pstate: Implement passive mode with HWP enabled")
Reported-by: Zhang Rui <rui.zhang@intel.com>
Cc: 5.9+ <stable@vger.kernel.org> # 5.9+: 1c534352f4 cpufreq: Introduce CPUFREQ_NEED_UPDATE_LIMITS ...
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Tested-by: Zhang Rui <rui.zhang@intel.com>
2020-10-27 18:53:35 +01:00
Chen Yu
cdc1719cd8 cpufreq: intel_pstate: Delete intel_pstate sysfs if failed to register the driver
There is a corner case in which, if the intel_pstate driver fails to be
registered (possibly due to invalid MSR access) and acpi_cpufreq takes
over, the intel_pstate sysfs interface is still populated, which may
confuse user space (turbostat, for example):

grep . /sys/devices/system/cpu/cpu0/cpufreq/scaling_driver
acpi-cpufreq

grep . /sys/devices/system/cpu/intel_pstate/*
/sys/devices/system/cpu/intel_pstate/max_perf_pct:0
/sys/devices/system/cpu/intel_pstate/min_perf_pct:0
grep: /sys/devices/system/cpu/intel_pstate/no_turbo: Resource temporarily unavailable
grep: /sys/devices/system/cpu/intel_pstate/num_pstates: Resource temporarily unavailable
/sys/devices/system/cpu/intel_pstate/status:off
grep: /sys/devices/system/cpu/intel_pstate/turbo_pct: Resource temporarily unavailable

The mere presence of the intel_pstate sysfs interface does not mean
that intel_pstate is in use (for example, "off" may have been written
to "status"), but that interface should not be created at all when the
driver registration fails.

Fix this issue by deleting the intel_pstate sysfs if the driver
registration fails.

Reported-by: Wendy Wang <wendy.wang@intel.com>
Suggested-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
[ rjw: Refactor code to avoid jumps, change function name, changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-10-16 16:33:12 +02:00
Zhang Rui
fc7d17551f cpufreq: intel_pstate: Fix missing return statement
Fix missing return statement when writing "off" to intel_pstate status
sysfs I/F.

Fixes: 55671ea325 ("cpufreq: intel_pstate: Free memory only when turning off")
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-09-30 17:37:23 +02:00
Francisco Jerez
eacc9c5a92 cpufreq: intel_pstate: Fix intel_pstate_get_hwp_max() for turbo disabled
This fixes the behavior of the scaling_max_freq and scaling_min_freq
sysfs files in systems which had turbo disabled by the BIOS.

Caleb noticed that the HWP is programmed to operate in the wrong
P-state range on his system when the CPUFREQ policy min/max frequency
is set via sysfs.  This seems to be because in his system
intel_pstate_get_hwp_max() is returning the maximum turbo P-state even
though turbo was disabled by the BIOS, which causes intel_pstate to
scale kHz frequencies incorrectly e.g. setting the maximum turbo
frequency whenever the maximum guaranteed frequency is requested via
sysfs.

Tested-by: Caleb Callaway <caleb.callaway@intel.com>
Signed-off-by: Francisco Jerez <currojerez@riseup.net>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
[ rjw: Minor subject edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-09-01 21:15:00 +02:00
Rafael J. Wysocki
55671ea325 cpufreq: intel_pstate: Free memory only when turning off
When intel_pstate switches the operation mode from "active" to
"passive" or the other way around, freeing its data structures
representing CPUs and allocating them again from scratch is unnecessary
and wasteful.  Moreover, if these data structures are preserved, the
cached HWP Request MSR value stored in them can be written back to the
MSR on reinitialization, which helps to restore the EPP value set
previously (it is set to 0xFF when CPUs go offline to allow their SMT
siblings to use the full range of EPP values, and that also happens
when the driver gets unregistered).

Accordingly, modify the driver to only do a full cleanup on driver
object registration errors and when its status is changed to "off"
via sysfs and to write the cached HWP Request MSR value back to
the MSR on CPU init if the data structure representing the given
CPU is still there.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
2020-09-01 21:14:52 +02:00
Rafael J. Wysocki
4adcf2e582 cpufreq: intel_pstate: Add ->offline and ->online callbacks
Add ->offline and ->online driver callbacks to prepare for taking a
CPU offline and to restore its working configuration when it goes
back online, respectively, to avoid invoking the ->init callback on
every CPU online which is quite a bit of unnecessary overhead.

Define ->offline and ->online so that they can be used in the
passive mode as well as in the active mode and because ->offline
will do the majority of ->stop_cpu work, the passive mode does
not need that callback any more, so drop it from there.

Also modify the active mode ->suspend and ->resume callbacks to
prevent them from interfering with the new ->offline and ->online
ones in case the latter are invoked within the system-wide suspend
and resume code flow and make the passive mode use them too.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
2020-09-01 21:14:45 +02:00
Rafael J. Wysocki
b388eb58ce cpufreq: intel_pstate: Tweak the EPP sysfs interface
Modify the EPP sysfs interface to reject attempts to change the EPP
to values different from 0 ("performance") in the active mode with
the "performance" policy (ie. scaling_governor set to "performance"),
to avoid situations in which the kernel appears to discard data
passed to it via the EPP sysfs attribute.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
2020-09-01 21:14:06 +02:00
Rafael J. Wysocki
c27a0ccc3c cpufreq: intel_pstate: Update cached EPP in the active mode
Make intel_pstate update the cached EPP value when setting the EPP
via sysfs in the active mode just like it is the case in the passive
mode, for consistency, but also for the benefit of subsequent
changes.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
2020-09-01 21:13:59 +02:00
Rafael J. Wysocki
43298db300 cpufreq: intel_pstate: Refuse to turn off with HWP enabled
After commit f6ebbcf08f ("cpufreq: intel_pstate: Implement passive
mode with HWP enabled") it is possible to change the driver status
to "off" via sysfs with HWP enabled, which effectively causes the
driver to unregister itself, but HWP remains active and it forces the
minimum performance, so even if another cpufreq driver is loaded,
it will not be able to control the CPU frequency.

For this reason, make the driver refuse to change the status to
"off" with HWP enabled.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
2020-09-01 21:13:32 +02:00
Rafael J. Wysocki
f6ebbcf08f cpufreq: intel_pstate: Implement passive mode with HWP enabled
Allow intel_pstate to work in the passive mode with HWP enabled and
make it set the HWP minimum performance limit (HWP floor) to the
P-state value given by the target frequency supplied by the cpufreq
governor, so as to prevent the HWP algorithm and the CPU scheduler
from working against each other, at least when the schedutil governor
is in use, and update the intel_pstate documentation accordingly.

Among other things, this allows utilization clamps to be taken
into account, at least to a certain extent, when intel_pstate is
in use and makes it more likely that sufficient capacity for
deadline tasks will be provided.

After this change, the resulting behavior of an HWP system with
intel_pstate in the passive mode should be close to the behavior
of the analogous non-HWP system with intel_pstate in the passive
mode, except that the HWP algorithm is generally allowed to make the
CPU run at a frequency above the floor P-state set by intel_pstate in
the entire available range of P-states, while without HWP a CPU can
run in a P-state above the requested one if the latter falls into the
range of turbo P-states (referred to as the turbo range) or if the
P-states of all CPUs in one package are coordinated with each other
at the hardware level.

[Note that in principle the HWP floor may not be taken into account
 by the processor if it falls into the turbo range, in which case the
 processor has a license to choose any P-state, either below or above
 the HWP floor, just like a non-HWP processor in the case when the
 target P-state falls into the turbo range.]

With this change applied, intel_pstate in the passive mode assumes
complete control over the HWP request MSR and concurrent changes of
that MSR (eg. via the direct MSR access interface) are overridden by
it.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2020-08-11 17:29:45 +02:00
Srinivas Pandruvada
4daca379c7 cpufreq: intel_pstate: Fix cpuinfo_max_freq when MSR_TURBO_RATIO_LIMIT is 0
The MSR_TURBO_RATIO_LIMIT can be 0. This is not an error. The user can
update this MSR via BIOS settings on some systems or with msr tools.
Also, some systems boot with the value 0.

This results in cpufreq/cpuinfo_max_freq being displayed incorrectly.
That value will be equal to cpufreq/base_frequency, even though turbo is
enabled.

But the platform will still function normally in HWP mode, as the max
1-core frequency is obtained from MSR_HWP_CAPABILITIES. This MSR is
already used to calculate cpu->pstate.turbo_freq, which is used to set
policy->cpuinfo.max_freq. But in some other places cpu->pstate.turbo_pstate
is used, for example to set policy->max.

To fix this, also update cpu->pstate.turbo_pstate when updating
cpu->pstate.turbo_freq.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-08-04 12:43:07 +02:00
Rafael J. Wysocki
de002c55ca cpufreq: intel_pstate: Fix EPP setting via sysfs in active mode
Because intel_pstate_set_energy_pref_index() reads and writes the
MSR_HWP_REQUEST register without using the cached value of it used by
intel_pstate_hwp_boost_up() and intel_pstate_hwp_boost_down(), those
functions may overwrite the value written by it and so the EPP value
set via sysfs may be lost.

To avoid that, make intel_pstate_set_energy_pref_index() take the
cached value of MSR_HWP_REQUEST just like the other two routines
mentioned above and update it with the new EPP value coming from
user space in addition to updating the MSR.

Note that the MSR itself still needs to be updated too in case
hwp_boost is unset or the boosting mechanism is not active at the
EPP change time.
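
A standalone sketch of the resulting read-modify-write on the cached value
(the names are illustrative; EPP occupies bits 31:24 of IA32_HWP_REQUEST):

    #include <stdint.h>

    #define HWP_EPP_MASK (0xffULL << 24)

    /* Returns the value to be written back to MSR_HWP_REQUEST. */
    static uint64_t set_epp(uint64_t *hwp_req_cached, uint8_t epp)
    {
        uint64_t value = *hwp_req_cached;

        value &= ~HWP_EPP_MASK;
        value |= (uint64_t)epp << 24;

        *hwp_req_cached = value; /* keep the cached copy in sync */
        return value;
    }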

Fixes: e0efd5be63 ("cpufreq: intel_pstate: Add HWP boost utility and sched util hooks")
Reported-by: Francisco Jerez <currojerez@riseup.net>
Cc: 4.18+ <stable@vger.kernel.org> # 4.18+: 3da97d4db8ee cpufreq: intel_pstate: Rearrange ...
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2020-07-30 18:20:23 +02:00
Rafael J. Wysocki
3a95717606 cpufreq: intel_pstate: Rearrange the storing of new EPP values
Move the locking away from intel_pstate_set_energy_pref_index()
into its only caller and drop the (now redundant) return_pref label
from it.

Also move the "raw" EPP value check into the caller of that function,
so as to do it before acquiring the mutex, and reduce code duplication
related to the "raw" EPP values processing somewhat.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2020-07-30 18:19:52 +02:00
Rafael J. Wysocki
80e3036866 Merge back cpufreq material for v5.9. 2020-07-27 12:34:55 +02:00
Rafael J. Wysocki
7aa1031223 cpufreq: intel_pstate: Avoid enabling HWP if EPP is not supported
Although there are processors supporting hardware-managed P-states
(HWP) without the energy-performance preference (EPP) feature, they
are not expected to be run with HWP enabled (the BIOS should disable
HWP on those systems).  Missing EPP support generally indicates an
incomplete HWP implementation and so it is better to avoid using
HWP on those systems in production.

However, intel_pstate currently enables HWP on such systems, which
is questionable, so prevent it from doing that by making it check
EPP support before enabling HWP and avoid enabling it if EPP is not
supported by the processor at hand.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-07-16 17:16:51 +02:00
Rafael J. Wysocki
23a522e388 cpufreq: intel_pstate: Clean up aperf_mperf_shift description
The kerneldoc description of the aperf_mperf_shift field in
struct global_params is unclear and there is a typo in it, so
simplify it and clean it up.

Reported-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Lee Jones <lee.jones@linaro.org>
2020-07-16 17:15:13 +02:00
Lee Jones
8f23d1f12c cpufreq: intel_pstate: Supply struct attribute description for get_aperf_mperf_shift()
Fixes the following W=1 kernel build warning(s):

 drivers/cpufreq/intel_pstate.c:293: warning: Function parameter or member 'get_aperf_mperf_shift' not described in 'pstate_funcs'

Suggested-by: "Rafael J. Wysocki" <rafael@kernel.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
[ rjw: Remove line break ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-07-15 15:17:06 +02:00
Rafael J. Wysocki
39a188b883 cpufreq: intel_pstate: Fix active mode setting from command line
If intel_pstate starts in the passive mode by default (that happens
when the processor in the system doesn't support HWP), passing
intel_pstate=active in the kernel command line doesn't work, so
fix that.

Fixes: 33aa46f252 ("cpufreq: intel_pstate: Use passive mode by default without HWP")
Reported-by: Doug Smythies <dsmythies@telus.net>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Doug Smythies <dsmythies@telus.net>
2020-07-13 17:55:57 +02:00
Srinivas Pandruvada
3ff79754d7 cpufreq: intel_pstate: Fix static checker warning for epp variable
Fix warning for:
drivers/cpufreq/intel_pstate.c:731 store_energy_performance_preference()
error: uninitialized symbol 'epp'.

This warning is for the case in which the energy_performance_preference
attribute matches one of the predefined strings. In that case the raw epp
value will not be used to set the EPP bits in MSR_HWP_REQUEST, so
initializing it with any value is fine.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-07-13 15:50:53 +02:00