linux-loongson

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson synced 2025-08-31 06:09:56 +00:00

Author	SHA1	Message	Date
Rafael J. Wysocki	3598b30bd9	cpufreq: Fix typo in cpufreq.h s/internale/internal/ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2021-10-21 12:30:17 +02:00
Vincent Donnefort	b894d20e68	cpufreq: Use CPUFREQ_RELATION_E in DVFS governors Let the governors schedutil, conservative and ondemand to work, if possible on efficient frequencies only. Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2021-10-05 16:33:05 +02:00
Vincent Donnefort	1f39fa0dcc	cpufreq: Introducing CPUFREQ_RELATION_E This newly introduced flag can be applied by a governor to a CPUFreq relation, when looking for a frequency within the policy table. The resolution would then only walk through efficient frequencies. Even with the flag set, the policy max limit will still be honoured. If no efficient frequencies can be found within the limits of the policy, an inefficient one would be returned. Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2021-10-05 16:33:05 +02:00
Vincent Donnefort	442d24a5c4	cpufreq: Add an interface to mark inefficient frequencies Some SoCs such as the sd855 have OPPs within the same policy whose cost is higher than others with a higher frequency. Those OPPs are inefficients and it might be interesting for a governor to not use them. cpufreq_table_set_inefficient() allows the caller to identify a specified frequency as being inefficient. Inefficient frequencies are only supported on sorted tables. Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2021-10-05 16:33:05 +02:00
Rafael J. Wysocki	27de8d5970	Merge branch 'cpufreq/arm/linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm Pull more ARM cpufreq changes for v5.15-rc1 from Viresh Kumar: "This adds a new cpufreq driver for Mediatek, which had been going through reviews since last one year." * 'cpufreq/arm/linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm: cpufreq: mediatek-hw: Add support for CPUFREQ HW cpufreq: Add of_perf_domain_get_sharing_cpumask dt-bindings: cpufreq: add bindings for MediaTek cpufreq HW	2021-09-07 15:42:10 +02:00
Hector.Yuan	8486a32dd4	cpufreq: Add of_perf_domain_get_sharing_cpumask Add of_perf_domain_get_sharing_cpumask function to group cpu to specific performance domain. Signed-off-by: Hector.Yuan <hector.yuan@mediatek.com> [ Viresh: create separate routine parse_perf_domain() and always set the cpumask. ] Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>	2021-09-06 15:14:35 +05:30
Viresh Kumar	4bf8e58211	cpufreq: Remove ready() callback This isn't used anymore, get rid of it. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2021-09-02 18:04:17 +02:00
Viresh Kumar	c17495b01b	cpufreq: Add callback to register with energy model Many cpufreq drivers register with the energy model for each policy and do exactly the same thing. Follow the footsteps of thermal-cooling, to get it done from the cpufreq core itself. Provide a new callback, which will be called, if present, by the cpufreq core at the right moment (more on that in the code's comment). Also provide a generic implementation that uses dev_pm_opp_of_register_em(). This also allows us to register with the EM at a later point of time, compared to ->init(), from where the EM core can access cpufreq policy directly using cpufreq_cpu_get() type of helpers and perform other work, like marking few frequencies inefficient, this will be done separately. Reviewed-by: Quentin Perret <qperret@google.com> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>	2021-08-12 09:54:07 +05:30
Viresh Kumar	b3beca7618	cpufreq: Remove ->resolve_freq() Commit `e3c0623608` ("cpufreq: add cpufreq_driver_resolve_freq()") introduced this callback, back in 2016, for drivers that provide the ->target() callback. The kernel hasn't seen a single user of it in the past 5 years and it is not likely to be used any time soon. Remove it for now. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> [ rjw: Changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2021-06-30 19:45:42 +02:00
Viresh Kumar	3e0f897fd9	cpufreq: Remove the ->stop_cpu() driver callback Now that all users of ->stop_cpu() have been migrated to using other callbacks, drop it from the core. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> [ rjw: Minor edits in the subject and changelog ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2021-06-30 18:54:11 +02:00
Viresh Kumar	2f0531869f	cpufreq: Remove unused flag CPUFREQ_PM_NO_WARN This flag is set by one of the drivers but it isn't used in the code otherwise. Remove the unused flag and update the driver. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2021-02-04 19:25:47 +01:00
Viresh Kumar	5ae4a4b45d	cpufreq: Remove CPUFREQ_STICKY flag During cpufreq driver's registration, if the ->init() callback for all the CPUs fail then there is not much point in keeping the driver around as it will only account for more of unnecessary noise, for example cpufreq core will try to suspend/resume the driver which never got registered properly. The removal of such a driver is avoided if the driver carries the CPUFREQ_STICKY flag. This was added way back [1] in 2004 and perhaps no one should ever need it now. A lot of drivers do set this flag, probably because they just copied it from other drivers. This was added earlier for some platforms [2] because their cpufreq drivers were getting registered before the CPUs were registered with subsys framework. And hence they used to fail. The same isn't true anymore though. The current code flow in the kernel is: start_kernel() -> kernel_init() -> kernel_init_freeable() -> do_basic_setup() -> driver_init() -> cpu_dev_init() -> subsys_system_register() //For CPUs -> do_initcalls() -> cpufreq_register_driver() Clearly, the CPUs will always get registered with subsys framework before any cpufreq driver can get probed. Remove the flag and update the relevant drivers. Link: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/include/linux/cpufreq.h?id=7cc9f0d9a1ab04cedc60d64fd8dcf7df224a3b4d # [1] Link: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/arch/arm/mach-sa1100/cpu-sa1100.c?id=f59d3bbe35f6268d729f51be82af8325d62f20f5 # [2] Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2021-02-04 19:23:20 +01:00
Rafael J. Wysocki	ee2cc4276b	cpufreq: Add special-purpose fast-switching callback for drivers First off, some cpufreq drivers (eg. intel_pstate) can pass hints beyond the current target frequency to the hardware and there are no provisions for doing that in the cpufreq framework. In particular, today the driver has to assume that it should not allow the frequency to fall below the one requested by the governor (or the required capacity may not be provided) which may not be the case and which may lead to excessive energy usage in some scenarios. Second, the hints passed by these drivers to the hardware need not be in terms of the frequency, so representing the utilization numbers coming from the scheduler as frequency before passing them to those drivers is not really useful. Address the two points above by adding a special-purpose replacement for the ->fast_switch callback, called ->adjust_perf, allowing the governor to pass abstract performance level (rather than frequency) values for the minimum (required) and target (desired) performance along with the CPU capacity to compare them to. Also update the schedutil governor to use the new callback instead of ->fast_switch if present and if the utilization mertics are frequency-invariant (that is requisite for the direct mapping between the utilization and the CPU performance levels to be a reasonable approximation). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2020-12-15 19:24:18 +01:00
Rafael J. Wysocki	ef7ece9a9b	Merge back cpufreq updates for v5.11.	2020-11-16 13:20:31 +01:00
Rafael J. Wysocki	ea9364bbad	cpufreq: Add strict_target to struct cpufreq_policy Add a new field to be set when the CPUFREQ_GOV_STRICT_TARGET flag is set for the current governor to struct cpufreq_policy, so that the drivers needing to check CPUFREQ_GOV_STRICT_TARGET do not have to access the governor object during every frequency transition. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2020-11-10 18:31:17 +01:00
Rafael J. Wysocki	218f668701	cpufreq: Introduce CPUFREQ_GOV_STRICT_TARGET Introduce a new governor flag, CPUFREQ_GOV_STRICT_TARGET, for the governors that want the target frequency to be set exactly to the given value without leaving any room for adjustments on the hardware side and set this flag for the powersave and performance governors. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2020-11-10 18:31:17 +01:00
Rafael J. Wysocki	9a2a9ebc0a	cpufreq: Introduce governor flags A new cpufreq governor flag will be added subsequently, so replace the bool dynamic_switching fleid in struct cpufreq_governor with a flags field and introduce CPUFREQ_GOV_DYNAMIC_SWITCHING to set for the "dynamic switching" governors instead of it. No intentional functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2020-11-10 18:31:17 +01:00
Rafael J. Wysocki	56a7ff75cd	cpufreq: Drop restore_freq from struct cpufreq_policy The restore_freq field in struct cpufreq_policy is only used by __target_index() in one place and a local variable in that function may as well be used instead of it, so drop it and modify __target_index() accordingly. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2020-11-02 18:32:39 +01:00
Rafael J. Wysocki	a62f68f5ca	cpufreq: Introduce cpufreq_driver_test_flags() Add a helper function to test the flags of the cpufreq driver in use againt a given flags mask. In particular, this will be needed to test the CPUFREQ_NEED_UPDATE_LIMITS cpufreq driver flag in the schedutil governor. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2020-10-29 14:07:30 +01:00
Rafael J. Wysocki	1c534352f4	cpufreq: Introduce CPUFREQ_NEED_UPDATE_LIMITS driver flag Generally, a cpufreq driver may need to update some internal upper and lower frequency boundaries on policy max and min changes, respectively, but currently this does not work if the target frequency does not change along with the policy limit. Namely, if the target frequency does not change along with the policy min or max, the "target_freq == policy->cur" check in __cpufreq_driver_target() prevents driver callbacks from being invoked and they do not even have a chance to update the corresponding internal boundary. This particularly affects the "powersave" and "performance" governors that always set the target frequency to one of the policy limits and it never changes when the other limit is updated. To allow cpufreq the drivers needing to update internal frequency boundaries on policy limits changes to avoid this issue, introduce a new driver flag, CPUFREQ_NEED_UPDATE_LIMITS, that (when set) will neutralize the check mentioned above. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2020-10-27 18:47:40 +01:00
Ionela Voinescu	a20b7053b5	cpufreq,arm,arm64: restructure definitions of arch_set_freq_scale() Compared to other arch_* functions, arch_set_freq_scale() has an atypical weak definition that can be replaced by a strong architecture specific implementation. The more typical support for architectural functions involves defining an empty stub in a header file if the symbol is not already defined in architecture code. Some examples involve: - #define arch_scale_freq_capacity topology_get_freq_scale - #define arch_scale_freq_invariant topology_scale_freq_invariant - #define arch_scale_cpu_capacity topology_get_cpu_scale - #define arch_update_cpu_topology topology_update_cpu_topology - #define arch_scale_thermal_pressure topology_get_thermal_pressure - #define arch_set_thermal_pressure topology_set_thermal_pressure Bring arch_set_freq_scale() in line with these functions by renaming it to topology_set_freq_scale() in the arch topology driver, and by defining the arch_set_freq_scale symbol to point to the new function for arm and arm64. While there are other users of the arch_topology driver, this patch defines arch_set_freq_scale for arm and arm64 only, due to their existing definitions of arch_scale_freq_capacity. This is the getter function of the frequency invariance scale factor and without a getter function, the setter function - arch_set_freq_scale() has not purpose. Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Sudeep Holla <sudeep.holla@arm.com> (BL_SWITCHER and topology parts) Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2020-10-08 17:17:27 +02:00
Valentin Schneider	ecddc3a0d5	arch_topology, cpufreq: constify arch_* cpumasks The passed cpumask arguments to arch_set_freq_scale() and arch_freq_counters_available() are only iterated over, so reflect this in the prototype. This also allows to pass system cpumasks like cpu_online_mask without getting a warning. Signed-off-by: Valentin Schneider <valentin.schneider@arm.com> Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2020-09-18 19:11:04 +02:00
Ionela Voinescu	874f635310	cpufreq: report whether cpufreq supports Frequency Invariance (FI) Now that the update of the FI scale factor is done in cpufreq core for selected functions - target(), target_index() and fast_switch(), we can provide feedback to the task scheduler and architecture code on whether cpufreq supports FI. For this purpose provide an external function to expose whether the cpufreq drivers support FI, by using a static key. The logic behind the enablement of cpufreq-based invariance is as follows: - cpufreq-based invariance is disabled by default - cpufreq-based invariance is enabled if any of the callbacks above is implemented while the unsupported setpolicy() is not The cpufreq_supports_freq_invariance() function only returns whether cpufreq is instrumented with the arch_set_freq_scale() calls that result in support for frequency invariance. Due to the lack of knowledge on whether the implementation of arch_set_freq_scale() actually results in the setting of a scale factor based on cpufreq information, it is up to the architecture code to ensure the setting and provision of the scale factor to the scheduler. Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2020-09-18 19:10:56 +02:00
Viresh Kumar	30b8e6b22f	cpufreq: Use WARN_ON_ONCE() for invalid relation The relation can't be invalid here, so if it turns out to be invalid, just WARN_ON_ONCE() and return 0. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> [ rjw: Subject and changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2020-08-27 12:51:25 +02:00
Rafael J. Wysocki	f6ebbcf08f	cpufreq: intel_pstate: Implement passive mode with HWP enabled Allow intel_pstate to work in the passive mode with HWP enabled and make it set the HWP minimum performance limit (HWP floor) to the P-state value given by the target frequency supplied by the cpufreq governor, so as to prevent the HWP algorithm and the CPU scheduler from working against each other, at least when the schedutil governor is in use, and update the intel_pstate documentation accordingly. Among other things, this allows utilization clamps to be taken into account, at least to a certain extent, when intel_pstate is in use and makes it more likely that sufficient capacity for deadline tasks will be provided. After this change, the resulting behavior of an HWP system with intel_pstate in the passive mode should be close to the behavior of the analogous non-HWP system with intel_pstate in the passive mode, except that the HWP algorithm is generally allowed to make the CPU run at a frequency above the floor P-state set by intel_pstate in the entire available range of P-states, while without HWP a CPU can run in a P-state above the requested one if the latter falls into the range of turbo P-states (referred to as the turbo range) or if the P-states of all CPUs in one package are coordinated with each other at the hardware level. [Note that in principle the HWP floor may not be taken into account by the processor if it falls into the turbo range, in which case the processor has a license to choose any P-state, either below or above the HWP floor, just like a non-HWP processor in the case when the target P-state falls into the turbo range.] With this change applied, intel_pstate in the passive mode assumes complete control over the HWP request MSR and concurrent changes of that MSR (eg. via the direct MSR access interface) are overridden by it. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2020-08-11 17:29:45 +02:00
Rafael J. Wysocki	9ac1fb156a	Merge branch 'cpufreq/arm/linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm Pull ARM cpufreq driver changes for v5.9-rc1 from Viresh Kumar: "Here are the details: - Adaptive voltage scaling (AVS) support and minor cleanups for brcmstb driver (Florian Fainelli and Markus Mayer). - A new tegra driver and cleanup for the existing one (Sumit Gupta and Jon Hunter). - Bandwidth level support for Qcom driver along with OPP changes (Sibi Sankar). - Cleanups to sti, cpufreq-dt, ap806, CPPC drivers (Viresh Kumar, Lee Jones, Ivan Kokshaysky, Sven Auhagen, and Xin Hao). - Make schedutil default governor for ARM (Valentin Schneider). - Fix dependency issues for imx (Walter Lozano). - Cleanup around cached_resolved_idx in cpufreq core (Viresh Kumar)." * 'cpufreq/arm/linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm: cpufreq: make schedutil the default for arm and arm64 cpufreq: cached_resolved_idx can not be negative cpufreq: Add Tegra194 cpufreq driver dt-bindings: arm: Add NVIDIA Tegra194 CPU Complex binding cpufreq: imx: Select NVMEM_IMX_OCOTP cpufreq: sti-cpufreq: Fix some formatting and misspelling issues cpufreq: tegra186: Simplify probe return path cpufreq: CPPC: Reuse caps variable in few routines cpufreq: ap806: fix cpufreq driver needs ap cpu clk cpufreq: cppc: Reorder code and remove apply_hisi_workaround variable cpufreq: dt: fix oops on armada37xx cpufreq: brcmstb-avs-cpufreq: send S2_ENTER / S2_EXIT commands to AVS cpufreq: brcmstb-avs-cpufreq: Support polling AVS firmware cpufreq: brcmstb-avs-cpufreq: more flexible interface for __issue_avs_command() cpufreq: qcom: Disable fast switch when scaling DDR/L3 cpufreq: qcom: Update the bandwidth levels on frequency change OPP: Add and export helper to set bandwidth cpufreq: blacklist SC7180 in cpufreq-dt-platdev cpufreq: blacklist SDM845 in cpufreq-dt-platdev	2020-08-04 12:44:53 +02:00
Viresh Kumar	292072c387	cpufreq: cached_resolved_idx can not be negative It is not possible for cached_resolved_idx to be invalid here as the cpufreq core always sets index to a positive value. Change its type to unsigned int and fix qcom usage a bit. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>	2020-07-30 11:40:16 +05:30
Quentin Perret	10dd8573b0	cpufreq: Register governors at core_initcall Currently, most CPUFreq governors are registered at the core_initcall time when the given governor is the default one, and the module_init time otherwise. In preparation for letting users specify the default governor on the kernel command line, change all of them to be registered at the core_initcall unconditionally, as it is already the case for the schedutil and performance governors. This will allow us to assume that builtin governors have been registered before the built-in CPUFreq drivers probe. And since all governors have similar init/exit patterns now, introduce two new macros, cpufreq_governor_{init,exit}(), to factorize the code. Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Quentin Perret <qperret@google.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> [ rjw: Changelog ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2020-07-02 13:03:30 +02:00
Xiongfeng Wang	cf6fada715	cpufreq: change '.set_boost' to act on one policy Macro 'for_each_active_policy()' is defined internally. To avoid some cpufreq driver needing this macro to iterate over all the policies in '.set_boost' callback, we redefine '.set_boost' to act on only one policy and pass the policy as an argument. 'cpufreq_boost_trigger_state()' iterates over all the policies to set boost for the system. This is preparation for adding SW BOOST support for CPPC. To protect Boost enable/disable by sysfs from CPU online/offline, add 'cpu_hotplug_lock' before calling '.set_boost' for each CPU. Also move the lock from 'set_boost()' to 'store_cpb()' in acpi_cpufreq. Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com> Suggested-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> [ rjw: Subject & changelog ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2020-06-05 14:20:02 +02:00
Wang Wenhu	2909438d4d	cpufreq: fix minor typo in struct cpufreq_driver doc comment Delete the duplicate "to", possibly double-typed. Signed-off-by: Wang Wenhu <wenhu.wang@vivo.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2020-05-14 13:23:09 +02:00
Ionela Voinescu	bbce8eaa60	cpufreq: add function to get the hardware max frequency Add weak function to return the hardware maximum frequency of a CPU, with the default implementation returning cpuinfo.max_freq, which is the best information we can generically get from the cpufreq framework. The default can be overwritten by a strong function in platforms that want to provide an alternative implementation, with more accurate information, obtained either from hardware or firmware. Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2020-03-06 16:02:50 +00:00
Yangtao Li	183edb20e6	cpufreq: Make cpufreq_global_kobject static The cpufreq_global_kobject is only used internally by cpufreq.c after commit `2361be2366` ("cpufreq: Don't create empty /sys/devices/system/cpu/cpufreq directory"). Make it static. Signed-off-by: Yangtao Li <tiny.windzz@gmail.com> [ rjw: Add empty line after cpufreq_global_kobject definition ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2020-02-03 16:56:48 +01:00
Rafael J. Wysocki	1e4f63aecb	cpufreq: Avoid creating excessively large stack frames In the process of modifying a cpufreq policy, the cpufreq core makes a copy of it including all of the internals which is stored on the CPU stack. Because struct cpufreq_policy is relatively large, this may cause the size of the stack frame to exceed the 2 KB limit and so the GCC complains when -Wframe-larger-than= is used. In fact, it is not necessary to copy the entire policy structure in order to modify it, however. First, because cpufreq_set_policy() obtains the min and max policy limits from frequency QoS now, it is not necessary to pass the limits to it from the callers. The only things that need to be passed to it from there are the new governor pointer or (if there is a built-in governor in the driver) the "policy" value representing the governor choice. They both can be passed as individual arguments, though, so make cpufreq_set_policy() take them this way and rework its callers accordingly. This avoids making copies of cpufreq policies in the callers of cpufreq_set_policy(). Second, cpufreq_set_policy() still needs to pass the new policy data to the ->verify() callback of the cpufreq driver whose task is to sanitize the min and max policy limits. It still does not need to make a full copy of struct cpufreq_policy for this purpose, but it needs to pass a few items from it to the driver in case they are needed (different drivers have different needs in that respect and all of them have to be covered). For this reason, introduce struct cpufreq_policy_data to hold copies of the members of struct cpufreq_policy used by the existing ->verify() driver callbacks and pass a pointer to a temporary structure of that type to ->verify() (instead of passing a pointer to full struct cpufreq_policy to it). While at it, notice that intel_pstate and longrun don't really need to verify the "policy" value in struct cpufreq_policy, so drop those check from them to avoid copying "policy" into struct cpufreq_policy_data (which allows it to be slightly smaller). Also while at it fix up white space in a couple of places and make cpufreq_set_policy() static (as it can be so). Fixes: `3000ce3c52` ("cpufreq: Use per-policy frequency QoS") Link: https://lore.kernel.org/linux-pm/CAMuHMdX6-jb1W8uC2_237m8ctCpsnGp=JCxqt8pCWVqNXHmkVg@mail.gmail.com Reported-by: kbuild test robot <lkp@intel.com> Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: 5.4+ <stable@vger.kernel.org> # 5.4+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2020-01-27 10:33:33 +01:00
Rafael J. Wysocki	85572c2c4a	cpufreq: Avoid leaving stale IRQ work items during CPU offline The scheduler code calling cpufreq_update_util() may run during CPU offline on the target CPU after the IRQ work lists have been flushed for it, so the target CPU should be prevented from running code that may queue up an IRQ work item on it at that point. Unfortunately, that may not be the case if dvfs_possible_from_any_cpu is set for at least one cpufreq policy in the system, because that allows the CPU going offline to run the utilization update callback of the cpufreq governor on behalf of another (online) CPU in some cases. If that happens, the cpufreq governor callback may queue up an IRQ work on the CPU running it, which is going offline, and the IRQ work may not be flushed after that point. Moreover, that IRQ work cannot be flushed until the "offlining" CPU goes back online, so if any other CPU calls irq_work_sync() to wait for the completion of that IRQ work, it will have to wait until the "offlining" CPU is back online and that may not happen forever. In particular, a system-wide deadlock may occur during CPU online as a result of that. The failing scenario is as follows. CPU0 is the boot CPU, so it creates a cpufreq policy and becomes the "leader" of it (policy->cpu). It cannot go offline, because it is the boot CPU. Next, other CPUs join the cpufreq policy as they go online and they leave it when they go offline. The last CPU to go offline, say CPU3, may queue up an IRQ work while running the governor callback on behalf of CPU0 after leaving the cpufreq policy because of the dvfs_possible_from_any_cpu effect described above. Then, CPU0 is the only online CPU in the system and the stale IRQ work is still queued on CPU3. When, say, CPU1 goes back online, it will run irq_work_sync() to wait for that IRQ work to complete and so it will wait for CPU3 to go back online (which may never happen even in principle), but (worse yet) CPU0 is waiting for CPU1 at that point too and a system-wide deadlock occurs. To address this problem notice that CPUs which cannot run cpufreq utilization update code for themselves (for example, because they have left the cpufreq policies that they belonged to), should also be prevented from running that code on behalf of the other CPUs that belong to a cpufreq policy with dvfs_possible_from_any_cpu set and so in that case the cpufreq_update_util_data pointer of the CPU running the code must not be NULL as well as for the CPU which is the target of the cpufreq utilization update in progress. Accordingly, change cpufreq_this_cpu_can_update() into a regular function in kernel/sched/cpufreq.c (instead of a static inline in a header file) and make it check the cpufreq_update_util_data pointer of the local CPU if dvfs_possible_from_any_cpu is set for the target cpufreq policy. Also update the schedutil governor to do the cpufreq_this_cpu_can_update() check in the non-fast-switch case too to avoid the stale IRQ work issues. Fixes: `99d14d0e16` ("cpufreq: Process remote callbacks from any CPU if the platform permits") Link: https://lore.kernel.org/linux-pm/20191121093557.bycvdo4xyinbc5cb@vireshk-i7/ Reported-by: Anson Huang <anson.huang@nxp.com> Tested-by: Anson Huang <anson.huang@nxp.com> Cc: 4.14+ <stable@vger.kernel.org> # 4.14+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Tested-by: Peng Fan <peng.fan@nxp.com> (i.MX8QXP-MEK) Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2019-12-12 17:59:43 +01:00
Rafael J. Wysocki	3000ce3c52	cpufreq: Use per-policy frequency QoS Replace the CPU device PM QoS used for the management of min and max frequency constraints in cpufreq (and its users) with per-policy frequency QoS to avoid problems with cpufreq policies covering more then one CPU. Namely, a cpufreq driver is registered with the subsys interface which calls cpufreq_add_dev() for each CPU, starting from CPU0, so currently the PM QoS notifiers are added to the first CPU in the policy (i.e. CPU0 in the majority of cases). In turn, when the cpufreq driver is unregistered, the subsys interface doing that calls cpufreq_remove_dev() for each CPU, starting from CPU0, and the PM QoS notifiers are only removed when cpufreq_remove_dev() is called for the last CPU in the policy, say CPUx, which as a rule is not CPU0 if the policy covers more than one CPU. Then, the PM QoS notifiers cannot be removed, because CPUx does not have them, and they are still there in the device PM QoS notifiers list of CPU0, which prevents new PM QoS notifiers from being registered for CPU0 on the next attempt to register the cpufreq driver. The same issue occurs when the first CPU in the policy goes offline before unregistering the driver. After this change it does not matter which CPU is the policy CPU at the driver registration time and whether or not it is online all the time, because the frequency QoS is per policy and not per CPU. Fixes: `67d874c3b2` ("cpufreq: Register notifiers with the PM QoS framework") Reported-by: Dmitry Osipenko <digetx@gmail.com> Tested-by: Dmitry Osipenko <digetx@gmail.com> Reported-by: Sudeep Holla <sudeep.holla@arm.com> Tested-by: Sudeep Holla <sudeep.holla@arm.com> Diagnosed-by: Viresh Kumar <viresh.kumar@linaro.org> Link: https://lore.kernel.org/linux-pm/5ad2624194baa2f53acc1f1e627eb7684c577a19.1562210705.git.viresh.kumar@linaro.org/T/#md2d89e95906b8c91c15f582146173dce2e86e99f Link: https://lore.kernel.org/linux-pm/20191017094612.6tbkwoq4harsjcqv@vireshk-i7/T/#m30d48cc23b9a80467fbaa16e30f90b3828a5a29b Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2019-10-21 02:05:21 +02:00
Viresh Kumar	df0eea4488	cpufreq: Remove CPUFREQ_ADJUST and CPUFREQ_NOTIFY policy notifier events No driver makes reference to these events now, remove them and the code related to them. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2019-09-02 22:44:04 +02:00
Viresh Kumar	6a1490367c	cpufreq: Add policy create/remove notifiers back This effectively reverts some changes made by commit `f9f41e3ef9` ("cpufreq: Remove policy create/remove notifiers"). We have a new use case for policy create/remove notifiers (for allocating/freeing QoS requests per policy), so add them back. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> [ rjw: Subject & changelog ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2019-08-10 14:05:48 +02:00
Rafael J. Wysocki	918e162e6a	Merge branch 'pm-cpufreq' * pm-cpufreq: cpufreq: Make cpufreq_generic_init() return void cpufreq: imx-cpufreq-dt: Add i.MX8MN support cpufreq: Add QoS requests for userspace constraints cpufreq: intel_pstate: Reuse refresh_frequency_limits() cpufreq: Register notifiers with the PM QoS framework PM / QoS: Add support for MIN/MAX frequency constraints PM / QOS: Pass request type to dev_pm_qos_read_value() PM / QOS: Rename __dev_pm_qos_read_value() and dev_pm_qos_raw_read_value() PM / QOS: Pass request type to dev_pm_qos_{add\|remove}_notifier()	2019-07-18 09:49:30 +02:00
Viresh Kumar	c4dcc8a162	cpufreq: Make cpufreq_generic_init() return void It always returns 0 (success) and its return type should really be void. Over that, many drivers have added error handling code based on its return value, which is not required at all. Change its return type to void and update all the callers. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2019-07-16 10:20:11 +02:00
Viresh Kumar	18c49926c4	cpufreq: Add QoS requests for userspace constraints This implements QoS requests to manage userspace configuration of min and max frequency. Reviewed-by: Matthias Kaehlcke <mka@chromium.org> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Tested-by: syzbot <syzbot+de771ae9390dffed7266@syzkaller.appspotmail.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2019-07-08 23:56:39 +02:00
Viresh Kumar	c57b25bdf7	cpufreq: intel_pstate: Reuse refresh_frequency_limits() The implementation of intel_pstate_update_max_freq() is quite similar to refresh_frequency_limits(), lets reuse it. Finding minimum of policy->user_policy.max and policy->cpuinfo.max_freq in intel_pstate_update_max_freq() is redundant as cpufreq_set_policy() will call the ->verify() callback of intel-pstate driver, which will do this comparison anyway and so dropping it from intel_pstate_update_max_freq() doesn't harm. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2019-07-08 23:56:39 +02:00
Viresh Kumar	67d874c3b2	cpufreq: Register notifiers with the PM QoS framework Register notifiers for min/max frequency constraints with the PM QoS framework. The constraints are also taken into consideration in cpufreq_set_policy(). This also relocates cpufreq_policy_put_kobj() as it is required to be called from cpufreq_policy_alloc() now. refresh_frequency_limits() is updated to avoid calling cpufreq_set_policy() for inactive policies and handle_update() is updated to have proper locking in place. No constraints are added until now though. Reviewed-by: Matthias Kaehlcke <mka@chromium.org> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Tested-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2019-07-08 23:56:13 +02:00
Rafael J. Wysocki	586a07dca8	Merge branch 'pm-cpufreq' * pm-cpufreq: cpufreq: Avoid calling cpufreq_verify_current_freq() from handle_update() cpufreq: Consolidate cpufreq_update_current_freq() and __cpufreq_get() cpufreq: Don't skip frequency validation for has_target() drivers cpufreq: Use has_target() instead of !setpolicy cpufreq: Remove redundant !setpolicy check cpufreq: Move the IS_ENABLED(CPU_THERMAL) macro into a stub cpufreq: s5pv210: Don't flood kernel log after cpufreq change cpufreq: pcc-cpufreq: Fail initialization if driver cannot be registered cpufreq: add driver for Raspberry Pi cpufreq: Switch imx7d to imx-cpufreq-dt for speed grading cpufreq: imx-cpufreq-dt: Remove global platform match list cpufreq: brcmstb-avs-cpufreq: Fix types for voltage/frequency cpufreq: brcmstb-avs-cpufreq: Fix initial command check cpufreq: armada-37xx: Remove set but not used variable 'freq' cpufreq: imx-cpufreq-dt: Fix no OPPs available on unfused parts dt-bindings: imx-cpufreq-dt: Document opp-supported-hw usage cpufreq: Add imx-cpufreq-dt driver	2019-07-08 11:00:02 +02:00
Daniel Lezcano	bcc6156999	cpufreq: Move the IS_ENABLED(CPU_THERMAL) macro into a stub cpufreq_online() and cpufreq_offline() [un]register the driver as a cooling device. This is done if the driver is flagged as a cooling device in addition with an IS_ENABLED() check to compile out the branching code. Group this test in a stub function added in the cpufreq header instead of having the IS_ENABLED() in the code. Suggested-by: Rafael J. Wysocki <rafael@kernel.org> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2019-06-26 10:59:57 +02:00
Thomas Gleixner	d2912cb15b	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 Based on 2 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation # extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 4122 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Enrico Weigelt <info@metux.net> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190604081206.933168790@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-06-19 17:09:55 +02:00
Viresh Kumar	df24014abe	cpufreq: Call transition notifier only once for each policy Currently, the notifiers are called once for each CPU of the policy->cpus cpumask. It would be more optimal if the notifier can be called only once and all the relevant information be provided to it. Out of the 23 drivers that register for the transition notifiers today, only 4 of them do per-cpu updates and the callback for the rest can be called only once for the policy without any impact. This would also avoid multiple function calls to the notifier callbacks and reduce multiple iterations of notifier core's code (which does locking as well). This patch adds pointer to the cpufreq policy to the struct cpufreq_freqs, so the notifier callback has all the information available to it with a single call. The five drivers which perform per-cpu updates are updated to use the cpufreq policy. The freqs->cpu field is redundant now and is removed. Acked-by: David S. Miller <davem@davemloft.net> (sparc) Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2019-05-10 12:20:36 +02:00
Rafael J. Wysocki	9083e49861	cpufreq: intel_pstate: Update max frequency on global turbo changes While the cpuinfo.max_freq value doesn't really matter for intel_pstate in the active mode, in the passive mode it is used by governors as the maximum physical frequency of the CPU and the results of governor computations generally depend on it. Also it is made available to user space via sysfs and it should match the current HW configuration. For this reason, make intel_pstate update cpuinfo.max_freq for all CPUs if it detects a global change of turbo frequency settings from "disable" to "enable" or the other way associated with a _PPC change notification from the platform firmware. Note that policy_is_inactive(), cpufreq_cpu_acquire(), cpufreq_cpu_release(), and cpufreq_set_policy() need to be made available to it for this purpose. Link: https://bugzilla.kernel.org/show_bug.cgi?id=200759 Reported-by: Gabriele Mazzotta <gabriele.mzt@gmail.com> Tested-by: Gabriele Mazzotta <gabriele.mzt@gmail.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2019-04-08 11:26:09 +02:00
Rafael J. Wysocki	5a25e3f7cc	cpufreq: intel_pstate: Driver-specific handling of _PPC updates In some cases, the platform firmware disables or enables turbo frequencies for all CPUs globally before triggering a _PPC change notification for one of them. Obviously, that global change affects all CPUs, not just the notified one, and it needs to be acted upon by cpufreq. The intel_pstate driver is able to detect such global changes of the settings, but it also needs to update policy limits for all CPUs if that happens, in particular if turbo frequencies are enabled globally - to allow them to be used. For this reason, introduce a new cpufreq driver callback to be invoked on _PPC notifications, if present, instead of simply calling cpufreq_update_policy() for the notified CPU and make intel_pstate use it to trigger policy updates for all CPUs in the system if global settings change. Link: https://bugzilla.kernel.org/show_bug.cgi?id=200759 Reported-by: Gabriele Mazzotta <gabriele.mzt@gmail.com> Tested-by: Gabriele Mazzotta <gabriele.mzt@gmail.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2019-04-01 23:43:05 +02:00
Viresh Kumar	91a12e91dc	cpufreq: Allow light-weight tear down and bring up of CPUs The cpufreq core doesn't remove the cpufreq policy anymore on CPU offline operation, rather that happens when the CPU device gets unregistered from the kernel. This allows faster recovery when the CPU comes back online. This is also very useful during system wide suspend/resume where we offline all non-boot CPUs during suspend and then bring them back on resume. This commit takes the same idea a step ahead to allow drivers to do light weight tear-down and bring-up during CPU offline and online operations. A new set of callbacks is introduced, online/offline(). online() gets called when the first CPU of an inactive policy is brought up and offline() gets called when all the CPUs of a policy are offlined. The existing init/exit() callback get called on policy creation/destruction. They also get called instead of online/offline() callbacks if the online/offline() callbacks aren't provided. This also moves around some code to get executed only for the new-policy case going forward. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2019-02-12 23:47:42 +01:00
Amit Kucheria	5c238a8b59	cpufreq: Auto-register the driver as a thermal cooling device if asked All cpufreq drivers do similar things to register as a cooling device. Provide a cpufreq driver flag so drivers can just ask the cpufreq core to register the cooling device on their behalf. This allows us to get rid of duplicated code in the drivers. In order to allow this, we add a struct thermal_cooling_device pointer to struct cpufreq_policy so that drivers don't need to store it in a private data structure. Suggested-by: Stephen Boyd <swboyd@chromium.org> Suggested-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Amit Kucheria <amit.kucheria@linaro.org> Reviewed-by: Matthias Kaehlcke <mka@chromium.org> Tested-by: Matthias Kaehlcke <mka@chromium.org> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2019-01-30 23:02:26 +01:00
Viresh Kumar	625c85a62c	cpufreq: Use struct kobj_attribute instead of struct global_attr The cpufreq_global_kobject is created using kobject_create_and_add() helper, which assigns the kobj_type as dynamic_kobj_ktype and show/store routines are set to kobj_attr_show() and kobj_attr_store(). These routines pass struct kobj_attribute as an argument to the show/store callbacks. But all the cpufreq files created using the cpufreq_global_kobject expect the argument to be of type struct attribute. Things work fine currently as no one accesses the "attr" argument. We may not see issues even if the argument is used, as struct kobj_attribute has struct attribute as its first element and so they will both get same address. But this is logically incorrect and we should rather use struct kobj_attribute instead of struct global_attr in the cpufreq core and drivers and the show/store callbacks should take struct kobj_attribute as argument instead. This bug is caught using CFI CLANG builds in android kernel which catches mismatch in function prototypes for such callbacks. Reported-by: Donghee Han <dh.han@samsung.com> Reported-by: Sangkyu Kim <skwith.kim@samsung.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2019-01-29 11:44:30 +01:00
Amit Kucheria	8321be6a9d	cpufreq: Replace open-coded << with BIT() Minor clean-up to use BIT() and keep checkpatch happy. Clean up the comment formatting while we're at it to make it easier to read. Signed-off-by: Amit Kucheria <amit.kucheria@linaro.org> Reviewed-by: Stephen Boyd <swboyd@chromium.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2019-01-21 11:02:09 +01:00
Quentin Perret	531b5c9f5c	sched/topology: Make Energy Aware Scheduling depend on schedutil Energy Aware Scheduling (EAS) is designed with the assumption that frequencies of CPUs follow their utilization value. When using a CPUFreq governor other than schedutil, the chances of this assumption being true are small, if any. When schedutil is being used, EAS' predictions are at least consistent with the frequency requests. Although those requests have no guarantees to be honored by the hardware, they should at least guide DVFS in the right direction and provide some hope in regards to the EAS model being accurate. To make sure EAS is only used in a sane configuration, create a strong dependency on schedutil being used. Since having sugov compiled-in does not provide that guarantee, make CPUFreq call a scheduler function on governor changes hence letting it rebuild the scheduling domains, check the governors of the online CPUs, and enable/disable EAS accordingly. Signed-off-by: Quentin Perret <quentin.perret@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: adharmap@codeaurora.org Cc: chris.redpath@arm.com Cc: currojerez@riseup.net Cc: dietmar.eggemann@arm.com Cc: edubezval@gmail.com Cc: gregkh@linuxfoundation.org Cc: javi.merino@kernel.org Cc: joel@joelfernandes.org Cc: juri.lelli@redhat.com Cc: morten.rasmussen@arm.com Cc: patrick.bellasi@arm.com Cc: pkondeti@codeaurora.org Cc: skannan@codeaurora.org Cc: smuckle@google.com Cc: srinivas.pandruvada@linux.intel.com Cc: thara.gopinath@linaro.org Cc: tkjos@google.com Cc: valentin.schneider@arm.com Cc: vincent.guittot@linaro.org Cc: viresh.kumar@linaro.org Link: https://lkml.kernel.org/r/20181203095628.11858-9-quentin.perret@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-12-11 15:17:00 +01:00
Viresh Kumar	036399782b	cpufreq: Rename cpufreq_can_do_remote_dvfs() This routine checks if the CPU running this code belongs to the policy of the target CPU or if not, can it do remote DVFS for it remotely. But the current name of it implies as if it is only about doing remote updates. Rename it to make it more relevant. Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2018-05-23 10:37:08 +02:00
Viresh Kumar	2dd0df8472	cpufreq: Drop cpufreq_table_validate_and_show() This isn't used anymore. Remove the helper and update documentation accordingly. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2018-04-10 08:40:45 +02:00
Viresh Kumar	d417e0691a	cpufreq: Validate frequency table in the core By design, cpufreq drivers are responsible for calling cpufreq_frequency_table_cpuinfo() from their ->init() callbacks to validate the frequency table. However, if a cpufreq driver is buggy and fails to do so properly, it lead to unexpected behavior of the driver or the cpufreq core at a later point in time. It would be better if the core could validate the frequency table during driver initialization. To that end, introduce cpufreq_table_validate_and_sort() and make the cpufreq core call it right after invoking the ->init() callback of the driver and destroy the cpufreq policy if the table is invalid. For the time being the validation of the table happens twice, once from the driver and then from the core. The individual drivers will be updated separately to drop table validation if they don't need it for other reasons. The frequency table is marked "sorted" or "unsorted" by the new helper now instead of in cpufreq_table_validate_and_show(), as it should only be done after validating the table (which the drivers won't do going forward). Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> [ rjw: Subject/changelog ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2018-02-27 18:22:12 +01:00
Dominik Brodowski	ffd81dcfef	cpufreq: Add and use cpufreq_for_each_{valid_,}entry_idx() Pointer subtraction is slow and tedious. Therefore, replace all instances where cpufreq_for_each_{valid_,}entry loops contained such substractions with an iteration macro providing an index to the frequency_table entry. Suggested-by: Al Viro <viro@ZenIV.linux.org.uk> Link: http://lkml.kernel.org/r/20180120020237.GM13338@ZenIV.linux.org.uk Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2018-02-08 10:21:39 +01:00
Rafael J. Wysocki	7d5905dc14	x86 / CPU: Always show current CPU frequency in /proc/cpuinfo After commit `890da9cf09` (Revert "x86: do not use cpufreq_quick_get() for /proc/cpuinfo "cpu MHz"") the "cpu MHz" number in /proc/cpuinfo on x86 can be either the nominal CPU frequency (which is constant) or the frequency most recently requested by a scaling governor in cpufreq, depending on the cpufreq configuration. That is somewhat inconsistent and is different from what it was before 4.13, so in order to restore the previous behavior, make it report the current CPU frequency like the scaling_cur_freq sysfs file in cpufreq. To that end, modify the /proc/cpuinfo implementation on x86 to use aperfmperf_snapshot_khz() to snapshot the APERF and MPERF feedback registers, if available, and use their values to compute the CPU frequency to be reported as "cpu MHz". However, do that carefully enough to avoid accumulating delays that lead to unacceptable access times for /proc/cpuinfo on systems with many CPUs. Run aperfmperf_snapshot_khz() once on all CPUs asynchronously at the /proc/cpuinfo open time, add a single delay upfront (if necessary) at that point and simply compute the current frequency while running show_cpuinfo() for each individual CPU. Also, to avoid slowing down /proc/cpuinfo accesses too much, reduce the default delay between consecutive APERF and MPERF reads to 10 ms, which should be sufficient to get large enough numbers for the frequency computation in all cases. Fixes: `890da9cf09` (Revert "x86: do not use cpufreq_quick_get() for /proc/cpuinfo "cpu MHz"") Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Ingo Molnar <mingo@kernel.org>	2017-11-15 19:46:50 +01:00
Dietmar Eggemann	e7d5459dfa	cpufreq: provide default frequency-invariance setter function Frequency-invariant accounting support based on the ratio of current frequency and maximum supported frequency is an optional feature an arch can implement. Since there are cpufreq drivers (e.g. cpufreq-dt) which can be build for different arch's a default implementation of the frequency-invariance setter function arch_set_freq_scale() is needed. This default implementation is an empty weak function which will be overwritten by a strong function in case the arch provides one. The setter function passes the cpumask of related (to the frequency change) cpus (online and offline cpus), the (new) current frequency and the maximum supported frequency. Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2017-10-03 02:37:53 +02:00
Rafael J. Wysocki	08a10002be	Merge branch 'pm-cpufreq-sched' * pm-cpufreq-sched: cpufreq: schedutil: Always process remote callback with slow switching cpufreq: schedutil: Don't restrict kthread to related_cpus unnecessarily cpufreq: Return 0 from ->fast_switch() on errors cpufreq: Simplify cpufreq_can_do_remote_dvfs() cpufreq: Process remote callbacks from any CPU if the platform permits sched: cpufreq: Allow remote cpufreq callbacks cpufreq: schedutil: Use unsigned int for iowait boost cpufreq: schedutil: Make iowait boost more energy efficient	2017-09-04 00:05:22 +02:00
Rafael J. Wysocki	d6344d4b56	cpufreq: Simplify cpufreq_can_do_remote_dvfs() The if () in cpufreq_can_do_remote_dvfs() is superfluous, so drop it and simply return the value of the expression under it. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2017-08-08 17:09:02 +02:00
Viresh Kumar	99d14d0e16	cpufreq: Process remote callbacks from any CPU if the platform permits On many platforms, CPUs can do DVFS across cpufreq policies. i.e CPU from policy-A can change frequency of CPUs belonging to policy-B. This is quite common in case of ARM platforms where we don't configure any per-cpu register. Add a flag to identify such platforms and update cpufreq_can_do_remote_dvfs() to allow remote callbacks if this flag is set. Also enable the flag for cpufreq-dt driver which is used only on ARM platforms currently. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Saravana Kannan <skannan@codeaurora.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2017-08-01 14:24:54 +02:00
Viresh Kumar	674e75411f	sched: cpufreq: Allow remote cpufreq callbacks With Android UI and benchmarks the latency of cpufreq response to certain scheduling events can become very critical. Currently, callbacks into cpufreq governors are only made from the scheduler if the target CPU of the event is the same as the current CPU. This means there are certain situations where a target CPU may not run the cpufreq governor for some time. One testcase to show this behavior is where a task starts running on CPU0, then a new task is also spawned on CPU0 by a task on CPU1. If the system is configured such that the new tasks should receive maximum demand initially, this should result in CPU0 increasing frequency immediately. But because of the above mentioned limitation though, this does not occur. This patch updates the scheduler core to call the cpufreq callbacks for remote CPUs as well. The schedutil, ondemand and conservative governors are updated to process cpufreq utilization update hooks called for remote CPUs where the remote CPU is managed by the cpufreq policy of the local CPU. The intel_pstate driver is updated to always reject remote callbacks. This is tested with couple of usecases (Android: hackbench, recentfling, galleryfling, vellamo, Ubuntu: hackbench) on ARM hikey board (64 bit octa-core, single policy). Only galleryfling showed minor improvements, while others didn't had much deviation. The reason being that this patch only targets a corner case, where following are required to be true to improve performance and that doesn't happen too often with these tests: - Task is migrated to another CPU. - The task has high demand, and should take the target CPU to higher OPPs. - And the target CPU doesn't call into the cpufreq governor until the next tick. Based on initial work from Steve Muckle. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Saravana Kannan <skannan@codeaurora.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2017-08-01 14:24:53 +02:00
Viresh Kumar	fe829ed8ef	cpufreq: Add CPUFREQ_NO_AUTO_DYNAMIC_SWITCHING cpufreq driver flag The policy->transition_latency field is used for multiple purposes today and its not straight forward at all. This is how it is used: A. Set the correct transition_latency value. B. Set it to CPUFREQ_ETERNAL because: 1. We don't want automatic dynamic switching (with ondemand/conservative) to happen at all. 2. We don't know the transition latency. This patch handles the B.1. case in a more readable way. A new flag for the cpufreq drivers is added to disallow use of cpufreq governors which have dynamic_switching flag set. All the current cpufreq drivers which are setting transition_latency unconditionally to CPUFREQ_ETERNAL are updated to use it. They don't need to set transition_latency anymore. There shouldn't be any functional change after this patch. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2017-07-26 00:15:46 +02:00
Viresh Kumar	ed4676e254	cpufreq: Replace "max_transition_latency" with "dynamic_switching" There is no limitation in the ondemand or conservative governors which disallow the transition_latency to be greater than 10 ms. The max_transition_latency field is rather used to disallow automatic dynamic frequency switching for platforms which didn't wanted these governors to run. Replace max_transition_latency with a boolean (dynamic_switching) and check for transition_latency == CPUFREQ_ETERNAL along with that. This makes it pretty straight forward to read/understand now. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2017-07-26 00:15:45 +02:00
Viresh Kumar	aa7519af45	cpufreq: Use transition_delay_us for legacy governors as well The policy->transition_delay_us field is used only by the schedutil governor currently, and this field describes how fast the driver wants the cpufreq governor to change CPUs frequency. It should rather be a common thing across all governors, as it doesn't have any schedutil dependency here. Create a new helper cpufreq_policy_transition_delay_us() to get the transition delay across all governors. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2017-07-22 02:25:20 +02:00
Viresh Kumar	2d04503632	cpufreq: governor: Drop min_sampling_rate The cpufreq core and governors aren't supposed to set a limit on how fast we want to try changing the frequency. This is currently done for the legacy governors with help of min_sampling_rate. At worst, we may end up setting the sampling rate to a value lower than the rate at which frequency can be changed and then one of the CPUs in the policy will be only changing frequency for ever. But that is something for the user to decide and there is no need to have special handling for such cases in the core. Leave it for the user to figure out. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2017-07-22 02:25:20 +02:00
Linus Torvalds	4d25ec1966	Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux Pull thermal management updates from Zhang Rui: - Improve thermal cpu_cooling interaction with cpufreq core. The cpu_cooling driver is designed to use CPU frequency scaling to avoid high thermal states for a platform. But it wasn't glued really well with cpufreq core. For example clipped-cpus is copied from the policy structure and its much better to use the policy->cpus (or related_cpus) fields directly as they may have got updated. Not that things were broken before this series, but they can be optimized a bit more. This series tries to improve interactions between cpufreq core and cpu_cooling driver and does some fixes/cleanups to the cpu_cooling driver. (Viresh Kumar) - A couple of fixes and cleanups in thermal core and imx, hisilicon, bcm_2835, int340x thermal drivers. (Arvind Yadav, Dan Carpenter, Sumeet Pawnikar, Srinivas Pandruvada, Willy WOLFF) * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux: (24 commits) thermal: bcm2835: fix an error code in probe() thermal: hisilicon: Handle return value of clk_prepare_enable thermal: imx: Handle return value of clk_prepare_enable thermal: int340x: check for sensor when PTYP is missing Thermal/int340x: Fix few typos and kernel-doc style thermal: fix source code documentation for parameters thermal: cpu_cooling: Replace kmalloc with kmalloc_array thermal: cpu_cooling: Rearrange struct cpufreq_cooling_device thermal: cpu_cooling: 'freq' can't be zero in cpufreq_state2power() thermal: cpu_cooling: don't store cpu_dev in cpufreq_cdev thermal: cpu_cooling: get_level() can't fail thermal: cpu_cooling: create structure for idle time stats thermal: cpu_cooling: merge frequency and power tables thermal: cpu_cooling: get rid of 'allowed_cpus' thermal: cpu_cooling: OPPs are registered for all CPUs thermal: cpu_cooling: store cpufreq policy cpufreq: create cpufreq_table_count_valid_entries() thermal: cpu_cooling: use cpufreq_policy to register cooling device thermal: cpu_cooling: get rid of a variable in cpufreq_set_cur_state() thermal: cpu_cooling: remove cpufreq_cooling_get_level() ...	2017-07-14 13:12:32 -07:00
Len Brown	f8475cef90	x86: use common aperfmperf_khz_on_cpu() to calculate KHz using APERF/MPERF The goal of this change is to give users a uniform and meaningful result when they read /sys/...cpufreq/scaling_cur_freq on modern x86 hardware, as compared to what they get today. Modern x86 processors include the hardware needed to accurately calculate frequency over an interval -- APERF, MPERF, and the TSC. Here we provide an x86 routine to make this calculation on supported hardware, and use it in preference to any driver driver-specific cpufreq_driver.get() routine. MHz is computed like so: MHz = base_MHz * delta_APERF / delta_MPERF MHz is the average frequency of the busy processor over a measurement interval. The interval is defined to be the time between successive invocations of aperfmperf_khz_on_cpu(), which are expected to to happen on-demand when users read sysfs attribute cpufreq/scaling_cur_freq. As with previous methods of calculating MHz, idle time is excluded. base_MHz above is from TSC calibration global "cpu_khz". This x86 native method to calculate MHz returns a meaningful result no matter if P-states are controlled by hardware or firmware and/or if the Linux cpufreq sub-system is or is-not installed. When this routine is invoked more frequently, the measurement interval becomes shorter. However, the code limits re-computation to 10ms intervals so that average frequency remains meaningful. Discerning users are encouraged to take advantage of the turbostat(8) utility, which can gracefully handle concurrent measurement intervals of arbitrary length. Signed-off-by: Len Brown <len.brown@intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2017-06-27 01:47:32 +02:00
Viresh Kumar	55d8529313	cpufreq: create cpufreq_table_count_valid_entries() We need such a routine at two places already, lets create one. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com> Tested-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Eduardo Valentin <edubezval@gmail.com>	2017-05-27 17:32:28 -07:00
Rafael J. Wysocki	1b72e7fd30	cpufreq: schedutil: Use policy-dependent transition delays Make the schedutil governor take the initial (default) value of the rate_limit_us sysfs attribute from the (new) transition_delay_us policy parameter (to be set by the scaling driver). That will allow scaling drivers to make schedutil use smaller default values of rate_limit_us and reduce the default average time interval between consecutive frequency changes. Make intel_pstate set transition_delay_us to 500. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2017-04-17 18:37:27 +02:00
Viresh Kumar	565ebe8073	cpufreq: Fix typos in comments - s/freqnency/frequency/ - s/accomodating/accommodating/ Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2017-02-04 00:47:59 +01:00
Viresh Kumar	052f573f5c	cpufreq: Remove CPUFREQ_START notifier event Its not used anymore, remove it. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2017-02-04 00:05:30 +01:00
Viresh Kumar	f9f41e3ef9	cpufreq: Remove policy create/remove notifiers Those were added by: commit `fcd7af917a` ("cpufreq: stats: handle cpufreq_unregister_driver() and suspend/resume properly") but aren't used anymore since: commit `1aefc75b24` ("cpufreq: stats: Make the stats code non-modular"). Remove them. Also remove the redundant parameter to the respective routines. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2017-02-03 23:59:38 +01:00
Rafael J. Wysocki	30248feff5	cpufreq: Make cpufreq_update_policy() void The return value of cpufreq_update_policy() is never used, so make it void. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-11-21 14:35:43 +01:00
Markus Mayer	ee7930ee27	cpufreq: stats: New sysfs attribute for clearing statistics Allow CPUfreq statistics to be cleared by writing anything to /sys/.../cpufreq/stats/reset. Signed-off-by: Markus Mayer <mmayer@broadcom.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-11-11 01:51:11 +01:00
Sergey Senozhatsky	c6fe46a79e	cpufreq: fix overflow in cpufreq_table_find_index_dl() 'best' is always less or equals to 'pos', so `best - pos' returns a negative value which is then getting casted to `unsigned int' and passed to __cpufreq_driver_target()->acpi_cpufreq_target() for policy->freq_table selection. This results in BUG: unable to handle kernel paging request at ffff881019b469f8 IP: [<ffffffffa00356c1>] acpi_cpufreq_target+0x4f/0x190 [acpi_cpufreq] PGD 267f067 PUD 0 Oops: 0000 [#1] PREEMPT SMP CPU: 6 PID: 70 Comm: kworker/6:1 Not tainted 4.9.0-rc1-next-20161017-dbg-dirty Workqueue: events dbs_work_handler task: ffff88041b808000 task.stack: ffff88041b810000 RIP: 0010:[<ffffffffa00356c1>] [<ffffffffa00356c1>] acpi_cpufreq_target+0x4f/0x190 [acpi_cpufreq] RSP: 0018:ffff88041b813c60 EFLAGS: 00010282 RAX: ffff880419b46a00 RBX: ffff88041b848400 RCX: ffff880419b20f80 RDX: 00000000001dff38 RSI: 00000000ffffffff RDI: ffff88041b848400 RBP: ffff88041b813cb0 R08: 0000000000000006 R09: 0000000000000040 R10: ffffffff8207f9e0 R11: ffffffff8173595b R12: 0000000000000000 R13: ffff88041f1dff38 R14: 0000000000262900 R15: 0000000bfffffff4 FS: 0000000000000000(0000) GS:ffff88041f000000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffff881019b469f8 CR3: 000000041a2d3000 CR4: 00000000001406e0 Stack: ffff88041b813cb0 ffffffff813347f9 ffff88041b813ca0 ffffffff81334663 ffff88041f1d4bc0 ffff88041b848400 0000000000000000 0000000000000000 0000000000262900 0000000000000000 ffff88041b813d00 ffffffff813355dc Call Trace: [<ffffffff813347f9>] ? cpufreq_freq_transition_begin+0xf1/0xfc [<ffffffff81334663>] ? get_cpu_idle_time+0x97/0xa6 [<ffffffff813355dc>] __cpufreq_driver_target+0x3b6/0x44e [<ffffffff81336ca3>] cs_dbs_timer+0x11a/0x135 [<ffffffff81336fda>] dbs_work_handler+0x39/0x62 [<ffffffff81057823>] process_one_work+0x280/0x4a5 [<ffffffff81058719>] worker_thread+0x24f/0x397 [<ffffffff810584ca>] ? rescuer_thread+0x30b/0x30b [<ffffffff81418380>] ? nl80211_get_key+0x29/0x36a [<ffffffff8105d2b7>] kthread+0xfc/0x104 [<ffffffff8107ceea>] ? put_lock_stats.isra.9+0xe/0x20 [<ffffffff8105d1bb>] ? kthread_create_on_node+0x3f/0x3f [<ffffffff814b2092>] ret_from_fork+0x22/0x30 Code: 56 4d 6b ff 0c 41 55 41 54 53 48 83 ec 28 48 8b 15 ad 1e 00 00 44 8b 41 08 48 8b 87 c8 00 00 00 49 89 d5 4e 03 2c c5 80 b2 78 81 <46> 8b 74 38 04 45 3b 75 00 75 11 31 c0 83 39 00 0f 84 1c 01 00 RIP [<ffffffffa00356c1>] acpi_cpufreq_target+0x4f/0x190 [acpi_cpufreq] RSP <ffff88041b813c60> CR2: ffff881019b469f8 ---[ end trace 16d9fc7a17897d37 ]--- [ rjw: In some cases this bug may also cause incorrect frequencies to be selected by cpufreq governors. ] Fixes: `899bb6642f` (cpufreq: skip invalid entries when searching the frequency) Link: http://marc.info/?l=linux-kernel&m=147672030714331&w=2 Reported-and-tested-by: Sedat Dilek <sedat.dilek@gmail.com> Reported-and-tested-by: Jörg Otte <jrg.otte@gmail.com> Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Cc: 4.8+ <stable@vger.kernel.org> # 4.8+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-10-20 16:35:50 +02:00
Aaro Koskinen	899bb6642f	cpufreq: skip invalid entries when searching the frequency Skip invalid entries when searching the frequency. This fixes cpufreq at least on loongson2 MIPS board. Fixes: `da0c6dc00c` (cpufreq: Handle sorted frequency tables more efficiently) Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Cc: 4.8+ <stable@vger.kernel.org> # 4.8+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-10-12 21:01:18 +02:00
Steve Muckle	e3c0623608	cpufreq: add cpufreq_driver_resolve_freq() Cpufreq governors may need to know what a particular target frequency maps to in the driver without necessarily wanting to set the frequency. Support this operation via a new cpufreq API, cpufreq_driver_resolve_freq(). This API returns the lowest driver frequency equal or greater than the target frequency (CPUFREQ_RELATION_L), subject to any policy (min/max) or driver limitations. The mapping is also cached in the policy so that a subsequent fast_switch operation can avoid repeating the same lookup. The API will call a new cpufreq driver callback, resolve_freq(), if it has been registered by the driver. Otherwise the frequency is resolved via cpufreq_frequency_table_target(). Rather than require ->target() style drivers to provide a resolve_freq() callback it is left to the caller to ensure that the driver implements this callback if necessary to use cpufreq_driver_resolve_freq(). Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Steve Muckle <smuckle@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-07-21 14:46:08 +02:00
Viresh Kumar	da0c6dc00c	cpufreq: Handle sorted frequency tables more efficiently cpufreq drivers aren't required to provide a sorted frequency table today, and even the ones which provide a sorted table aren't handled efficiently by cpufreq core. This patch adds infrastructure to verify if the freq-table provided by the drivers is sorted or not, and use efficient helpers if they are sorted. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-07-07 00:13:20 +02:00
Viresh Kumar	d218ed7739	cpufreq: Return index from cpufreq_frequency_table_target() This routine can't fail unless the frequency table is invalid and doesn't contain any valid entries. Make it return the index and WARN() in case it is used for an invalid table. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-06-09 00:58:06 +02:00
Viresh Kumar	7ab4aabbaa	cpufreq: Drop freq-table param to cpufreq_frequency_table_target() The policy already has this pointer set, use it instead. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-06-09 00:58:06 +02:00
Viresh Kumar	f8bfc116ca	cpufreq: Remove cpufreq_frequency_get_table() Most of the callers of cpufreq_frequency_get_table() already have the pointer to a valid 'policy' structure and they don't really need to go through the per-cpu variable first and then a check to validate the frequency, in order to find the freq-table for the policy. Directly use the policy->freq_table field instead for them. Only one user of that API is left after above changes, cpu_cooling.c and it accesses the freq_table in a racy way as the policy can get freed in between. Fix it by using cpufreq_cpu_get() properly. Since there are no more users of cpufreq_frequency_get_table() left, get rid of it. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Javi Merino <javi.merino@arm.com> (cpu_cooling.c) Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-06-09 00:58:05 +02:00
Rafael J. Wysocki	1aefc75b24	cpufreq: stats: Make the stats code non-modular The modularity of cpufreq_stats is quite problematic. First off, the usage of policy notifiers for the initialization and cleanup in the cpufreq_stats module is inherently racy with respect to CPU offline/online and the initialization and cleanup of the cpufreq driver. Second, fast frequency switching (used by the schedutil governor) cannot be enabled if any transition notifiers are registered, so if the cpufreq_stats module (that registers a transition notifier for updating transition statistics) is loaded, the schedutil governor cannot use fast frequency switching. On the other hand, allowing cpufreq_stats to be built as a module doesn't really add much value. Arguably, there's not much reason for that code to be modular at all. For the above reasons, make the cpufreq stats code non-modular, modify the core to invoke functions provided by that code directly and drop the notifiers from it. Make the stats sysfs attributes appear empty if fast frequency switching is enabled as the statistics will not be updated in that case anyway (and returning -EBUSY from those attributes breaks powertop). While at it, clean up Kconfig help for the CPU_FREQ_STAT and CPU_FREQ_STAT_DETAILS options. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-06-02 23:24:41 +02:00
Rafael J. Wysocki	9a15fb2c79	cpufreq: Drop the 'initialized' field from struct cpufreq_governor The 'initialized' field in struct cpufreq_governor is only used by the conservative governor (as a usage counter) and the way that happens is far from straightforward and arguably incorrect. Namely, the value of 'initialized' is checked by cpufreq_dbs_governor_init() and cpufreq_dbs_governor_exit() and the results of those checks are passed (as the second argument) to the ->init() and ->exit() callbacks in struct dbs_governor. Those callbacks are only implemented by the ondemand and conservative governors and ondemand doesn't use their second argument at all. In turn, the conservative governor uses it to decide whether or not to either register or unregister a transition notifier. That whole mechanism is not only unnecessarily convoluted, but also racy, because the 'initialized' field of struct cpufreq_governor is updated in cpufreq_init_governor() and cpufreq_exit_governor() under policy->rwsem which doesn't help if one of these functions is run twice in parallel for different policies (which isn't impossible in principle), for example. Instead of it, add a proper usage counter to the conservative governor and update it from cs_init() and cs_exit() which is guaranteed to be non-racy, as those functions are only called under gov_dbs_data_mutex which is global. With that in place, drop the 'initialized' field from struct cpufreq_governor as it is not used any more. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-06-02 23:24:39 +02:00
Viresh Kumar	bf2be2de84	cpufreq: governor: Create cpufreq_policy_apply_limits() Create a new helper to avoid code duplication across governors. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-06-02 23:24:39 +02:00
Rafael J. Wysocki	e788892ba3	cpufreq: governor: Get rid of governor events The design of the cpufreq governor API is not very straightforward, as struct cpufreq_governor provides only one callback to be invoked from different code paths for different purposes. The purpose it is invoked for is determined by its second "event" argument, causing it to act as a "callback multiplexer" of sorts. Unfortunately, that leads to extra complexity in governors, some of which implement the ->governor() callback as a switch statement that simply checks the event argument and invokes a separate function to handle that specific event. That extra complexity can be eliminated by replacing the all-purpose ->governor() callback with a family of callbacks to carry out specific governor operations: initialization and exit, start and stop and policy limits updates. That also turns out to reduce the code size too, so do it. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-06-02 23:24:15 +02:00
Rafael J. Wysocki	6c9d9c8192	cpufreq: Call cpufreq_disable_fast_switch() in sugov_exit() Due to differences in the cpufreq core's handling of runtime CPU offline and nonboot CPUs disabling during system suspend-to-RAM, fast frequency switching gets disabled after a suspend-to-RAM and resume cycle on all of the nonboot CPUs. To prevent that from happening, move the invocation of cpufreq_disable_fast_switch() from cpufreq_exit_governor() to sugov_exit(), as the schedutil governor is the only user of fast frequency switching today anyway. That simply prevents cpufreq_disable_fast_switch() from being called without invoking the ->governor callback for the CPUFREQ_GOV_POLICY_EXIT event (which happens during system suspend now). Fixes: `b7898fda5b` (cpufreq: Support for fast frequency switching) Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-04-08 22:41:36 +02:00
Rafael J. Wysocki	b7898fda5b	cpufreq: Support for fast frequency switching Modify the ACPI cpufreq driver to provide a method for switching CPU frequencies from interrupt context and update the cpufreq core to support that method if available. Introduce a new cpufreq driver callback, ->fast_switch, to be invoked for frequency switching from interrupt context by (future) governors supporting that feature via (new) helper function cpufreq_driver_fast_switch(). Add two new policy flags, fast_switch_possible, to be set by the cpufreq driver if fast frequency switching can be used for the given policy and fast_switch_enabled, to be set by the governor if it is going to use fast frequency switching for the given policy. Also add a helper for setting the latter. Since fast frequency switching is inherently incompatible with cpufreq transition notifiers, make it possible to set the fast_switch_enabled only if there are no transition notifiers already registered and make the registration of new transition notifiers fail if fast_switch_enabled is set for at least one policy. Implement the ->fast_switch callback in the ACPI cpufreq driver and make it set fast_switch_possible during policy initialization as appropriate. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-04-02 01:09:03 +02:00
Rafael J. Wysocki	379480d825	cpufreq: Move governor symbols to cpufreq.h Move definitions of symbols related to transition latency and sampling rate to include/linux/cpufreq.h so they can be used by (future) goverernors located outside of drivers/cpufreq/. No functional changes. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-04-02 01:09:02 +02:00
Rafael J. Wysocki	66893b6ac9	cpufreq: Move governor attribute set headers to cpufreq.h Move definitions and function headers related to struct gov_attr_set to include/linux/cpufreq.h so they can be used by (future) goverernors located outside of drivers/cpufreq/. No functional changes. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-04-02 01:09:02 +02:00
Rafael J. Wysocki	a5acbfbd70	Merge branch 'pm-cpufreq-governor' into pm-cpufreq	2016-03-10 20:46:03 +01:00
Rafael J. Wysocki	adaf9fcd13	cpufreq: Move scheduler-related code to the sched directory Create cpufreq.c under kernel/sched/ and move the cpufreq code related to the scheduler to that file and to sched.h. Redefine cpufreq_update_util() as a static inline function to avoid function calls at its call sites in the scheduler code (as suggested by Peter Zijlstra). Also move the definition of struct update_util_data and declaration of cpufreq_set_update_util_data() from include/linux/cpufreq.h to include/linux/sched.h. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>	2016-03-10 20:44:47 +01:00
Viresh Kumar	242aa883a6	cpufreq: Remove 'policy->governor_enabled' The entire sequence of events (like INIT/START or STOP/EXIT) for which cpufreq_governor() is called, is guaranteed to be protected by policy->rwsem now. The additional checks that were added earlier (as we were forced to drop policy->rwsem before calling cpufreq_governor() for EXIT event), aren't required anymore. Over that, they weren't sufficient really. They just take care of START/STOP events, but not INIT/EXIT and the state machine was never maintained properly by them. Kill the unnecessary checks and policy->governor_enabled field. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-03-09 14:41:12 +01:00
Viresh Kumar	68e80dae09	Revert "cpufreq: Drop rwsem lock around CPUFREQ_GOV_POLICY_EXIT" Earlier, when the struct freq-attr was used to represent governor attributes, the standard cpufreq show/store sysfs attribute callbacks were applied to the governor tunable attributes and they always acquire the policy->rwsem lock before carrying out the operation. That could have resulted in an ABBA deadlock if governor tunable attributes are removed under policy->rwsem while one of them is being accessed concurrently (if sysfs attributes removal wins the race, it will wait for the access to complete with policy->rwsem held while the attribute callback will block on policy->rwsem indefinitely). We attempted to address this issue by dropping policy->rwsem around governor tunable attributes removal (that is, around invocations of the ->governor callback with the event arg equal to CPUFREQ_GOV_POLICY_EXIT) in cpufreq_set_policy(), but that opened up race conditions that had not been possible with policy->rwsem held all the time. The previous commit, "cpufreq: governor: New sysfs show/store callbacks for governor tunables", fixed the original ABBA deadlock by adding new governor specific show/store callbacks. We don't have to drop rwsem around invocations of governor event CPUFREQ_GOV_POLICY_EXIT anymore, and original fix can be reverted now. Fixes: `955ef48335` (cpufreq: Drop rwsem lock around CPUFREQ_GOV_POLICY_EXIT) Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reported-by: Juri Lelli <juri.lelli@arm.com> Tested-by: Juri Lelli <juri.lelli@arm.com> Tested-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-03-09 14:40:59 +01:00
Rafael J. Wysocki	34e2c555f3	cpufreq: Add mechanism for registering utilization update callbacks Introduce a mechanism by which parts of the cpufreq subsystem ("setpolicy" drivers or the core) can register callbacks to be executed from cpufreq_update_util() which is invoked by the scheduler's update_load_avg() on CPU utilization changes. This allows the "setpolicy" drivers to dispense with their timers and do all of the computations they need and frequency/voltage adjustments in the update_load_avg() code path, among other things. The update_load_avg() changes were suggested by Peter Zijlstra. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Ingo Molnar <mingo@kernel.org>	2016-03-09 14:39:19 +01:00
Rafael J. Wysocki	34b0870515	cpufreq: Simplify the cpufreq_for_each_valid_entry() That macro uses an internal static inline function that is first totally unnecessary and second hard to read, so simplify it and get rid of that monster. No functional changes. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-02-26 22:11:56 +01:00
Rafael J. Wysocki	de1df26b7c	cpufreq: Clean up default and fallback governor setup The preprocessor magic used for setting the default cpufreq governor (and for using the performance governor as a fallback one for that matter) is really nasty, so replace it with __weak functions and overrides. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Saravana Kannan <skannan@codeaurora.org> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-02-05 02:37:42 +01:00
Rafael J. Wysocki	7a6c79f2fe	cpufreq: Simplify core code related to boost support Notice that the boost_supported field in struct cpufreq_driver is redundant, because the driver's ->set_boost callback may be left unset if "boost" is not supported. Moreover, the only driver populating the ->set_boost callback is acpi_cpufreq, so make it avoid populating that callback if "boost" is not supported, rework the core to check ->set_boost instead of boost_supported to verify "boost" support and drop boost_supported which isn't used any more. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-01-01 03:49:51 +01:00
Rafael J. Wysocki	41669da030	cpufreq: Make cpufreq_boost_supported() static cpufreq_boost_supported() is not used outside of cpufreq.c, so make it static. While at it, refactor it as a one-liner (which it really is). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-01-01 03:49:51 +01:00

1 2 3 4 5 ...

284 Commits