The intel_idle_init_cstates_icpu() function includes a loop that iterates
over every C-state. Inside the loop, the same C-state data is referenced 2
ways:
1. as cpuidle_state_table[cstate]
2. as drv->states[drv->state_count] (but it is a copy of #1, not the same
object).
Make the code be more consistent and easier to read by using only the 2nd
way. So the code structure would be as follows:
1. Use cpuidle_state_table[cstate]
2. Copy cpuidle_state_table[cstate] to drv->states[drv->state_count]
3. Use only drv->states[drv->state_count] from this point.
Note, this change introduces a checkpatch.pl warning (too long line), but it
will be addressed in the next patch.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Substitute 'printk()' with 'pr_info()', because 'intel_idle' already uses
'pr_debug()', so using 'pr_info()' will be more consistent.
In addition to this, this patch addresses the following checkpatch.pl
warning:
WARNING: printk() should include KERN_<LEVEL> facility level
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
- Add EPP support to the AMD P-state cpufreq driver (Perry Yuan, Wyes
Karny, Arnd Bergmann, Bagas Sanjaya).
- Drop the custom cpufreq driver for loongson1 that is not necessary
any more and the corresponding cpufreq platform device (Keguang
Zhang).
- Remove "select SRCU" from system sleep, cpufreq and OPP Kconfig
entries (Paul E. McKenney).
- Enable thermal cooling for Tegra194 (Yi-Wei Wang).
- Register module device table and add missing compatibles for
cpufreq-qcom-hw (Nícolas F. R. A. Prado, Abel Vesa and Luca Weiss).
- Various dt binding updates for qcom-cpufreq-nvmem and opp-v2-kryo-cpu
(Christian Marangi).
- Make kobj_type structure in the cpufreq core constant (Thomas
Weißschuh).
- Make cpufreq_unregister_driver() return void (Uwe Kleine-König).
- Make the TEO cpuidle governor check CPU utilization in order to refine
idle state selection (Kajetan Puchalski).
- Make Kconfig select the haltpoll cpuidle governor when the haltpoll
cpuidle driver is selected and replace a default_idle() call in that
driver with arch_cpu_idle() to allow MWAIT to be used (Li RongQing).
- Add Emerald Rapids Xeon support to the intel_idle driver (Artem
Bityutskiy).
- Add ARCH_SUSPEND_POSSIBLE dependencies for ARMv4 cpuidle drivers to
avoid randconfig build failures (Arnd Bergmann).
- Make kobj_type structures used in the cpuidle sysfs interface
constant (Thomas Weißschuh).
- Make the cpuidle driver registration code update microsecond values
of idle state parameters in accordance with their nanosecond values
if they are provided (Rafael Wysocki).
- Make the PSCI cpuidle driver prevent topology CPUs from being
suspended on PREEMPT_RT (Krzysztof Kozlowski).
- Document that pm_runtime_force_suspend() cannot be used with
DPM_FLAG_SMART_SUSPEND (Richard Fitzgerald).
- Add EXPORT macros for exporting PM functions from drivers (Richard
Fitzgerald).
- Remove /** from non-kernel-doc comments in hibernation code (Randy
Dunlap).
- Fix possible name leak in powercap_register_zone() (Yang Yingliang).
- Add Meteor Lake and Emerald Rapids support to the intel_rapl power
capping driver (Zhang Rui).
- Modify the idle_inject power capping facility to support 100% idle
injection (Srinivas Pandruvada).
- Fix large time windows handling in the intel_rapl power capping
driver (Zhang Rui).
- Fix memory leaks with using debugfs_lookup() in the generic PM
domains and Energy Model code (Greg Kroah-Hartman).
- Add missing 'cache-unified' property in the example for kryo OPP
bindings (Rob Herring).
- Fix error checking in opp_migrate_dentry() (Qi Zheng).
- Let qcom,opp-fuse-level be a 2-long array for qcom SoCs (Konrad
Dybcio).
- Modify some power management utilities to use the canonical ftrace
path (Ross Zwisler).
- Correct spelling problems for Documentation/power/ as reported by
codespell (Randy Dunlap).
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmPuJfMSHHJqd0Byand5
c29ja2kubmV0AAoJEILEb/54YlRx/5kQAJNOVImLEPLerLP8xufw30//LuDU5Gi0
STsyDOMql/I2MpkeqeCcgrSbpy6NlEglOvg16gfpQ3qqTCLF9ypENxs9E5BGGvW0
aEdCzvaoqmvi9PCr/jmj0EPP70/U+rIX5m/k0QdjLh9x0aLoAEe3uRJTfR9QVqXf
I7JX0N9kjKi7YxpA5DlkHrS7J7GPPiWlesJ3p4wXuHMo3jf+6fgkoPFt8yRrGWeh
AHzGT2BLrsy7aAUjGZB65Qx9q3fnSXMmXOjmn0Xh2njQah+zRZDwrNzwoY2HTLL/
KQ6/Ww16USYRZtCS1fmGwAj9I+ddq6AOvhPCMn0vLXXmKVAMUrVVWnQS/0+vpm9y
suUMK9Tndkgxd1vjby2246ThJn27uDd/ERFan4ouQo2j22uICY+SDo3osj2hMXka
wq4zthXkY8KgjZ+MuXnZxPhcOvo8KRvfxAU0fy5efQnSkbtwY9UlMvjPBMBHm/RA
21/6kjQNtq5vMmI37oC8DH+oPrRQ7sUKuY7HNqwO9P3QNKWVmNe7cF5UtXXxME7Q
ULvP1d+u+TNNdHFLryPwCSzBO34wQEccdRZBjalZ8tBe6JiDWUFHC3giSURZSuzZ
GDvzVaNX6PkgToyv4inBTB8lTp6pAuUjaWNvNJzVvUXiEKHB0ihzg5vpJW5NdwlH
15Tn8cjH7pp0
=lZLx
-----END PGP SIGNATURE-----
Merge tag 'pm-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
"These add EPP support to the AMD P-state cpufreq driver, add support
for new platforms to the Intel RAPL power capping driver, intel_idle
and the Qualcomm cpufreq driver, enable thermal cooling for Tegra194,
drop the custom cpufreq driver for loongson1 that is not necessary any
more (and the corresponding cpufreq platform device), fix assorted
issues and clean up code.
Specifics:
- Add EPP support to the AMD P-state cpufreq driver (Perry Yuan, Wyes
Karny, Arnd Bergmann, Bagas Sanjaya)
- Drop the custom cpufreq driver for loongson1 that is not necessary
any more and the corresponding cpufreq platform device (Keguang
Zhang)
- Remove "select SRCU" from system sleep, cpufreq and OPP Kconfig
entries (Paul E. McKenney)
- Enable thermal cooling for Tegra194 (Yi-Wei Wang)
- Register module device table and add missing compatibles for
cpufreq-qcom-hw (Nícolas F. R. A. Prado, Abel Vesa and Luca Weiss)
- Various dt binding updates for qcom-cpufreq-nvmem and
opp-v2-kryo-cpu (Christian Marangi)
- Make kobj_type structure in the cpufreq core constant (Thomas
Weißschuh)
- Make cpufreq_unregister_driver() return void (Uwe Kleine-König)
- Make the TEO cpuidle governor check CPU utilization in order to
refine idle state selection (Kajetan Puchalski)
- Make Kconfig select the haltpoll cpuidle governor when the haltpoll
cpuidle driver is selected and replace a default_idle() call in
that driver with arch_cpu_idle() to allow MWAIT to be used (Li
RongQing)
- Add Emerald Rapids Xeon support to the intel_idle driver (Artem
Bityutskiy)
- Add ARCH_SUSPEND_POSSIBLE dependencies for ARMv4 cpuidle drivers to
avoid randconfig build failures (Arnd Bergmann)
- Make kobj_type structures used in the cpuidle sysfs interface
constant (Thomas Weißschuh)
- Make the cpuidle driver registration code update microsecond values
of idle state parameters in accordance with their nanosecond values
if they are provided (Rafael Wysocki)
- Make the PSCI cpuidle driver prevent topology CPUs from being
suspended on PREEMPT_RT (Krzysztof Kozlowski)
- Document that pm_runtime_force_suspend() cannot be used with
DPM_FLAG_SMART_SUSPEND (Richard Fitzgerald)
- Add EXPORT macros for exporting PM functions from drivers (Richard
Fitzgerald)
- Remove /** from non-kernel-doc comments in hibernation code (Randy
Dunlap)
- Fix possible name leak in powercap_register_zone() (Yang Yingliang)
- Add Meteor Lake and Emerald Rapids support to the intel_rapl power
capping driver (Zhang Rui)
- Modify the idle_inject power capping facility to support 100% idle
injection (Srinivas Pandruvada)
- Fix large time windows handling in the intel_rapl power capping
driver (Zhang Rui)
- Fix memory leaks with using debugfs_lookup() in the generic PM
domains and Energy Model code (Greg Kroah-Hartman)
- Add missing 'cache-unified' property in the example for kryo OPP
bindings (Rob Herring)
- Fix error checking in opp_migrate_dentry() (Qi Zheng)
- Let qcom,opp-fuse-level be a 2-long array for qcom SoCs (Konrad
Dybcio)
- Modify some power management utilities to use the canonical ftrace
path (Ross Zwisler)
- Correct spelling problems for Documentation/power/ as reported by
codespell (Randy Dunlap)"
* tag 'pm-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (53 commits)
Documentation: amd-pstate: disambiguate user space sections
cpufreq: amd-pstate: Fix invalid write to MSR_AMD_CPPC_REQ
dt-bindings: opp: opp-v2-kryo-cpu: enlarge opp-supported-hw maximum
dt-bindings: cpufreq: qcom-cpufreq-nvmem: make cpr bindings optional
dt-bindings: cpufreq: qcom-cpufreq-nvmem: specify supported opp tables
PM: Add EXPORT macros for exporting PM functions
cpuidle: psci: Do not suspend topology CPUs on PREEMPT_RT
MIPS: loongson32: Drop obsolete cpufreq platform device
powercap: intel_rapl: Fix handling for large time window
cpuidle: driver: Update microsecond values of state parameters as needed
cpuidle: sysfs: make kobj_type structures constant
cpuidle: add ARCH_SUSPEND_POSSIBLE dependencies
PM: EM: fix memory leak with using debugfs_lookup()
PM: domains: fix memory leak with using debugfs_lookup()
cpufreq: Make kobj_type structure constant
cpufreq: davinci: Fix clk use after free
cpufreq: amd-pstate: avoid uninitialized variable use
cpufreq: Make cpufreq_unregister_driver() return void
OPP: fix error checking in opp_migrate_dentry()
dt-bindings: cpufreq: cpufreq-qcom-hw: Add SM8550 compatible
...
Emerald Rapids (EMR) is the next Intel Xeon processor after Sapphire
Rapids (SPR).
EMR C-states are the same as SPR C-states, and we expect that EMR
C-state characteristics (latency and target residency) will be the
same as in SPR. Therefore, add EMR support by using SPR C-states table.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
So objtool found this bug:
vmlinux.o: warning: objtool: intel_idle_irq+0x10c: call to trace_hardirqs_off() leaves .noinstr.text section
As per commit 32d4fd5751 ("cpuidle,intel_idle: Fix CPUIDLE_FLAG_IRQ_ENABLE"):
"must not have tracing in idle functions"
Clearly people can't read and tinker along until splat dissapears.
This straight up reverts commit d295ad34f2 ("intel_idle: Fix false
positive RCU splats due to incorrect hardirqs state").
It doesn't re-introduce the problem because preceding patches fixed it
properly.
Fixes: d295ad34f2 ("intel_idle: Fix false positive RCU splats due to incorrect hardirqs state")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Tony Lindgren <tony@atomide.com>
Tested-by: Ulf Hansson <ulf.hansson@linaro.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20230112195540.434302128@infradead.org
Similar to the other other AlderLake platforms, the C1 and C1E states on
ADL-N are mutually exclusive. Only one of them can be enabled at a time.
C1E is preferred on ADL-N for better energy efficiency.
C6S is also supported on this platform. Its latency is far bigger than
C6, but really close to C8 (PC8), thus it is not exposed as a separate
state.
Suggested-by: Baieswara Reddy Sagili <baieswara.reddy.sagili@intel.com>
Suggested-by: Vinay Kumar <vinay.kumar@intel.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
[ rjw: Changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Follow the advice of the below link and prefer 'strscpy' in this
subsystem. Conversion is 1:1 because the return value is not used.
Generated by a coccinelle script.
Link: https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmknw@mail.gmail.com/
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
- Make cpufreq_show_cpus() more straightforward (Viresh Kumar).
- Drop unnecessary CPU hotplug locking from store() used by cpufreq
sysfs attributes (Viresh Kumar).
- Make the ACPI cpufreq driver support the boost control interface on
Zhaoxin/Centaur processors (Tony W Wang-oc).
- Print a warning message on attempts to free an active cpufreq policy
which should never happen (Viresh Kumar).
- Fix grammar in the Kconfig help text for the loongson2 cpufreq
driver (Randy Dunlap).
- Use cpumask_var_t for an on-stack CPU mask in the ondemand cpufreq
governor (Zhao Liu).
- Add trace points for guest_halt_poll_ns grow/shrink to the haltpoll
cpuidle driver (Eiichi Tsukata).
- Modify intel_idle to treat C1 and C1E as independent idle states on
Sapphire Rapids (Artem Bityutskiy).
- Extend support for wakeirq to callback wrappers used during system
suspend and resume (Ulf Hansson).
- Defer waiting for device probe before loading a hibernation image
till the first actual device access to avoid possible deadlocks
reported by syzbot (Tetsuo Handa).
- Unify device_init_wakeup() for PM_SLEEP and !PM_SLEEP (Bjorn
Helgaas).
- Add Raptor Lake-P to the list of processors supported by the Intel
RAPL driver (George D Sworo).
- Add Alder Lake-N and Raptor Lake-P to the list of processors for
which Power Limit4 is supported in the Intel RAPL driver (Sumeet
Pawnikar).
- Make pm_genpd_remove() check genpd_debugfs_dir against NULL before
attempting to remove it (Hsin-Yi Wang).
- Change the Energy Model code to represent power in micro-Watts and
adjust its users accordingly (Lukasz Luba).
- Add new devfreq driver for Mediatek CCI (Cache Coherent
Interconnect) (Johnson Wang).
- Convert the Samsung Exynos SoC Bus bindings to DT schema of
exynos-bus.c (Krzysztof Kozlowski).
- Address kernel-doc warnings by adding the description for unused
fucntion parameters in devfreq core (Mauro Carvalho Chehab).
- Use NULL to pass a null pointer rather than zero according to the
function propotype in imx-bus.c (Colin Ian King).
- Print error message instead of error interger value in
tegra30-devfreq.c (Dmitry Osipenko).
- Add checks to prevent setting negative frequency QoS limits for
CPUs (Shivnandan Kumar).
- Update the pm-graph suite of utilities to the latest revision 5.9
including multiple improvements (Todd Brandt).
- Drop pme_interrupt reference from the PCI power management
documentation (Mario Limonciello).
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmLoKy8SHHJqd0Byand5
c29ja2kubmV0AAoJEILEb/54YlRx3+oQAJNVU+W14EaRPWXQRMuwBC5zk3hb6T9q
JqmMd8coEd+9/4ABAeRAWso1B26rUzB6JyBvw3lGH9OXInpYmvnJEhEPrTpK2h0D
U9HxEARuGJolrDm0X9NAkn7tKKMC9GnvPS9z2s7s+N97VFFWC/QiU+PHB0SypGNb
JxRfbVJZQCuxmNG9UeK+xeHFQ9lM2Z9ZdTxR71G0n7nQPPR+sUvnFufFby3Aogf3
XnBYfia+YNqkUlefxxwB5a0cFwPXOUGsQkIf4d64gZnq1TgZ+71kht1GEF08PDFl
wV8v1rOWuXEae8dozuf5xszp/eVyAqzgB+IShT9APREOO3Wg6I16XdBm8R1TGwCK
JTdZqnm6HVKBNqchEwYViJILX69rrNUT+AwHBWhtKKDNh3qeTuwi/JGTeDVN++en
xf3TNKx3LV31Nq6nWJFzDGLehfZMnAPkhfYohUBI7FNyblpk4mJRVcZ0bYI7UNnS
als77uoipvb5KdFCtdhxYBHd/y867NvXKa1qsAuDxusAsfJHf4SnlMdbgOepBH2y
jJg06CGrMDU3TZ8BL+WpqUYk4irQnAMs/159Txh7A6/dOnTjE7S9NHrENCwmt2og
FrHSLH1eLX6Sa4RSibiGHPC7mNULP2/TOtryf3zFdlIVcjm3NEU3bnfzx7nlJn05
8t6ObMxgMhWT
=XeLV
-----END PGP SIGNATURE-----
Merge tag 'pm-5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
"These are mostly minor improvements all over including new CPU IDs for
the Intel RAPL driver, an Energy Model rework to use micro-Watt as the
power unit, cpufreq fixes and cleanus, cpuidle updates, devfreq
updates, documentation cleanups and a new version of the pm-graph
suite of utilities.
Specifics:
- Make cpufreq_show_cpus() more straightforward (Viresh Kumar).
- Drop unnecessary CPU hotplug locking from store() used by cpufreq
sysfs attributes (Viresh Kumar).
- Make the ACPI cpufreq driver support the boost control interface on
Zhaoxin/Centaur processors (Tony W Wang-oc).
- Print a warning message on attempts to free an active cpufreq
policy which should never happen (Viresh Kumar).
- Fix grammar in the Kconfig help text for the loongson2 cpufreq
driver (Randy Dunlap).
- Use cpumask_var_t for an on-stack CPU mask in the ondemand cpufreq
governor (Zhao Liu).
- Add trace points for guest_halt_poll_ns grow/shrink to the haltpoll
cpuidle driver (Eiichi Tsukata).
- Modify intel_idle to treat C1 and C1E as independent idle states on
Sapphire Rapids (Artem Bityutskiy).
- Extend support for wakeirq to callback wrappers used during system
suspend and resume (Ulf Hansson).
- Defer waiting for device probe before loading a hibernation image
till the first actual device access to avoid possible deadlocks
reported by syzbot (Tetsuo Handa).
- Unify device_init_wakeup() for PM_SLEEP and !PM_SLEEP (Bjorn
Helgaas).
- Add Raptor Lake-P to the list of processors supported by the Intel
RAPL driver (George D Sworo).
- Add Alder Lake-N and Raptor Lake-P to the list of processors for
which Power Limit4 is supported in the Intel RAPL driver (Sumeet
Pawnikar).
- Make pm_genpd_remove() check genpd_debugfs_dir against NULL before
attempting to remove it (Hsin-Yi Wang).
- Change the Energy Model code to represent power in micro-Watts and
adjust its users accordingly (Lukasz Luba).
- Add new devfreq driver for Mediatek CCI (Cache Coherent
Interconnect) (Johnson Wang).
- Convert the Samsung Exynos SoC Bus bindings to DT schema of
exynos-bus.c (Krzysztof Kozlowski).
- Address kernel-doc warnings by adding the description for unused
function parameters in devfreq core (Mauro Carvalho Chehab).
- Use NULL to pass a null pointer rather than zero according to the
function propotype in imx-bus.c (Colin Ian King).
- Print error message instead of error interger value in
tegra30-devfreq.c (Dmitry Osipenko).
- Add checks to prevent setting negative frequency QoS limits for
CPUs (Shivnandan Kumar).
- Update the pm-graph suite of utilities to the latest revision 5.9
including multiple improvements (Todd Brandt).
- Drop pme_interrupt reference from the PCI power management
documentation (Mario Limonciello)"
* tag 'pm-5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (27 commits)
powercap: RAPL: Add Power Limit4 support for Alder Lake-N and Raptor Lake-P
PM: QoS: Add check to make sure CPU freq is non-negative
PM: hibernate: defer device probing when resuming from hibernation
intel_idle: make SPR C1 and C1E be independent
cpufreq: ondemand: Use cpumask_var_t for on-stack cpu mask
cpufreq: loongson2: fix Kconfig "its" grammar
pm-graph v5.9
cpufreq: Warn users while freeing active policy
cpufreq: scmi: Support the power scale in micro-Watts in SCMI v3.1
firmware: arm_scmi: Get detailed power scale from perf
Documentation: EM: Switch to micro-Watts scale
PM: EM: convert power field to micro-Watts precision and align drivers
PM / devfreq: tegra30: Add error message for devm_devfreq_add_device()
PM / devfreq: imx-bus: use NULL to pass a null pointer rather than zero
PM / devfreq: shut up kernel-doc warnings
dt-bindings: interconnect: samsung,exynos-bus: convert to dtschema
PM / devfreq: mediatek: Introduce MediaTek CCI devfreq driver
dt-bindings: interconnect: Add MediaTek CCI dt-bindings
PM: domains: Ensure genpd_debugfs_dir exists before remove
PM: runtime: Extend support for wakeirq for force_suspend|resume
...
be able to enter deeper low-power state
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmLnsksACgkQEsHwGGHe
VUpOOw//WAfkouWFd7kmACSiWtkgEQfXgImhhM7tw5Zzks+aEMtL2RrKqFYzkFg5
hJK+lMI8QDkBFU/bgI/nAZfFiAS7iBMPY4T2Uw4+jZCPLr3TmUheJ2Pe1CxlIzQC
MfjXQm/j5uTZcB2jEORjPT5dVE3p6k1KpSbvf5ZKCc9YTwdylv3VeYcfv5WEkihR
61bWU+T7Yse4A3Bx32ewabLmk7lwOcdS1vbfsqdvkpI1vE1gI8CThgTuNAt8JWij
27GIxiF2BQkyw3d/IPt3wGIPOgVowISXWdtMgpCr17Mw1m+44vXG9cjSuAKfqAUY
wNXrBzirdqzJgN85WVJEFIoJasFJicrz/oNLYbcHQa8+AruRu6in22cSkPYPvVGc
iNgSlQOZdoY9Vl6izEV4OawCccYnKjskEW7nEVIqfENrwRPYWB/IAnGxkla7q3Ch
q+T8dyOAWToumuPK13c5VoX0nd02bfwSJACYRxN+M22zq8s7+Jv1fNtQeAGLnmD1
jG3HR0wJWBOVVyira7AbFI7Mx667HayslIesftEGU33FfY0gZTcwZ7jsZ9GTSyOi
AgHN3PvHyJYQ648T8JzbyuNJe3dyDKf81OLaPHP6+nV9Dy3aCrERTML0jo8xWv2N
rDA61BV/q+hdQS3vzmLRVPzLLZksGRNCS2ZzIbkR4dGxLQAAB2M=
=w/wH
-----END PGP SIGNATURE-----
Merge tag 'x86_fpu_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fpu update from Borislav Petkov:
- Add machinery to initialize AMX register state in order for
AMX-capable CPUs to be able to enter deeper low-power state
* tag 'x86_fpu_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
intel_idle: Add a new flag to initialize the AMX state
x86/fpu: Add a helper to prepare AMX state for low-power CPU idle
This patch partially reverts the changes made by the following commit:
da0e58c038 intel_idle: add 'preferred_cstates' module argument
As that commit describes, on early Sapphire Rapids Xeon platforms the C1 and
C1E states were mutually exclusive, so that users could only have either C1 and
C6, or C1E and C6.
However, Intel firmware engineers managed to remove this limitation and make C1
and C1E to be completely independent, just like on previous Xeon platforms.
Therefore, this patch:
* Removes commentary describing the old, and now non-existing SPR C1E
limitation.
* Marks SPR C1E as available by default.
* Removes the 'preferred_cstates' parameter handling for SPR. Both C1 and
C1E will be available regardless of 'preferred_cstates' value.
We expect that all SPR systems are shipping with new firmware, which includes
the C1/C1E improvement.
Cc: v5.18+ <stable@vger.kernel.org> # v5.18+
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Commit 32d4fd5751 ("cpuidle,intel_idle: Fix CPUIDLE_FLAG_IRQ_ENABLE")
uses raw_local_irq_enable/local_irq_disable() around call to
__intel_idle() in intel_idle_irq().
With interrupt enabled, timer tick interrupt can happen and a
subsequently call to __do_softirq() may change the lockdep hardirqs state
of a debug kernel back to 'on'. This will result in a mismatch between
the cpu hardirqs state (off) and the lockdep hardirqs state (on) causing
a number of false positive "WARNING: suspicious RCU usage" splats.
Fix that by using local_irq_disable() to disable interrupt in
intel_idle_irq().
Fixes: 32d4fd5751 ("cpuidle,intel_idle: Fix CPUIDLE_FLAG_IRQ_ENABLE")
Signed-off-by: Waiman Long <longman@redhat.com>
Cc: 5.16+ <stable@vger.kernel.org> # 5.16+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The non-initialized AMX state can be the cause of C-state demotion from C6
to C1E. This low-power idle state may improve power savings and thus result
in a higher available turbo frequency budget.
This behavior is implementation-specific. Initialize the state for the C6
entrance of Sapphire Rapids as needed.
Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Link: https://lkml.kernel.org/r/20220614164116.5196-1-chang.seok.bae@intel.com
Having IBRS enabled while the SMT sibling is idle unnecessarily slows
down the running sibling. OTOH, disabling IBRS around idle takes two
MSR writes, which will increase the idle latency.
Therefore, only disable IBRS around deeper idle states. Shallow idle
states are bounded by the tick in duration, since NOHZ is not allowed
for them by virtue of their short target residency.
Only do this for mwait-driven idle, since that keeps interrupts disabled
across idle, which makes disabling IBRS vs IRQ-entry a non-issue.
Note: C6 is a random threshold, most importantly C1 probably shouldn't
disable IBRS, benchmarking needed.
Suggested-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Commit c227233ad6 ("intel_idle: enable interrupts before C1 on
Xeons") wrecked intel_idle in two ways:
- must not have tracing in idle functions
- must return with IRQs disabled
Additionally, it added a branch for no good reason.
Fixes: c227233ad6 ("intel_idle: enable interrupts before C1 on Xeons")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
[ rjw: Moved the intel_idle() kerneldoc comment next to the function ]
Cc: 5.16+ <stable@vger.kernel.org> # 5.16+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Similar to SPR, the C1 and C1E states on ADL are mutually exclusive.
Only one of them can be enabled at a time.
But contrast to SPR, which usually has a strong latency requirement
as a Xeon processor, C1E is preferred on ADL for better energy
efficiency.
Add custom C-state tables for ADL with both C1 and C1E, and
1. Enable the "C1E promotion" bit in MSR_IA32_POWER_CTL and mark C1
with the CPUIDLE_FLAG_UNUSABLE flag, so C1 is not available by
default.
2. Add support for the "preferred_cstates" module parameter, so that
users can choose to use C1 instead of C1E by booting with
"intel_idle.preferred_cstates=2".
Separate custom C-state tables are introduced for the ADL mobile and
desktop processors, because of the exit latency differences between
these two variants, especially with respect to PC10.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
[ rjw: Changelog edits, code rearrangement ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The Sapphire Rapids (SPR) C6 optimization was added to the end of the
'spr_idle_state_table_update()' function. However, the function has a
'return' which may happen before the optimization has a chance to run.
And this may prevent the optimization from happening.
This is an unlikely scenario, but possible if user boots with, say,
the 'intel_idle.preferred_cstates=6' kernel boot option.
This patch fixes the issue by eliminating the problematic 'return'
statement.
Fixes: 3a9cf77b60 ("intel_idle: add core C6 optimization for SPR")
Suggested-by: Jan Beulich <jbeulich@suse.com>
Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
[ rjw: Minor changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Problem description.
When user boots kernel up with the 'intel_idle.preferred_cstates=4' option,
we enable C1E and disable C1 states on Sapphire Rapids Xeon (SPR). In order
for C1E to work on SPR, we have to enable the C1E promotion bit on all
CPUs. However, we enable it only on one CPU.
Fix description.
The 'intel_idle' driver already has the infrastructure for disabling C1E
promotion on every CPU. This patch uses the same infrastructure for
enabling C1E promotion on every CPU. It changes the boolean
'disable_promotion_to_c1e' variable to a tri-state 'c1e_promotion'
variable.
Tested on a 2-socket SPR system. I verified the following combinations:
* C1E promotion enabled and disabled in BIOS.
* Booted with and without the 'intel_idle.preferred_cstates=4' kernel
argument.
In all 4 cases C1E promotion was correctly set on all CPUs.
Also tested on an old Broadwell system, just to make sure it does not cause
a regression. C1E promotion was correctly disabled on that system, both C1
and C1E were exposed (as expected).
Fixes: da0e58c038 ("intel_idle: add 'preferred_cstates' module argument")
Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
[ rjw: Minor changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Drop a redundant backslash character at the end of a line in the
spr_cstates[] definition.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Commit bf9282dc26 ("cpuidle: Make CPUIDLE_FLAG_TLB_FLUSHED generic")
moved the leave_mm() call away from intel_idle(), but it didn't update
its kerneldoc comment accordingly, so do that now.
Fixes: bf9282dc26 ("cpuidle: Make CPUIDLE_FLAG_TLB_FLUSHED generic")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Add a Sapphire Rapids Xeon C6 optimization, similar to what we have for Sky Lake
Xeon: if package C6 is disabled, adjust C6 exit latency and target residency to
match core C6 values, instead of using the default package C6 values.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
On Sapphire Rapids Xeon (SPR) the C1 and C1E states are basically mutually
exclusive - only one of them can be enabled. By default, 'intel_idle' driver
enables C1 and disables C1E. However, some users prefer to use C1E instead of
C1, because it saves more energy.
This patch adds a new module parameter ('preferred_cstates') for enabling C1E
and disabling C1. Here is the idea behind it.
1. This option has effect only for "mutually exclusive" C-states like C1 and
C1E on SPR.
2. It does not have any effect on independent C-states, which do not require
other C-states to be disabled (most states on most platforms as of today).
3. For mutually exclusive C-states, the 'intel_idle' driver always has a
reasonable default, such as enabling C1 on SPR by default. On other
platforms, the default may be different.
4. Users can override the default using the 'preferred_cstates' parameter.
5. The parameter accepts the preferred C-states bit-mask, similarly to the
existing 'states_off' parameter.
6. This parameter is not limited to C1/C1E, and leaves room for supporting
other mutually exclusive C-states, if they come in the future.
Today 'intel_idle' can only be compiled-in, which means that on SPR, in order
to disable C1 and enable C1E, users should boot with the following kernel
argument: intel_idle.preferred_cstates=4
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Add Sapphire Rapids Xeon support.
Up until very recently, the C1 and C1E C-states were independent, but this
has changed in some new chips, including Sapphire Rapids Xeon (SPR). In these
chips the C1 and C1E states cannot be enabled at the same time. The "C1E
promotion" bit in 'MSR_IA32_POWER_CTL' also has its semantics changed a bit.
Here are the C1, C1E, and "C1E promotion" bit rules on Xeons before SPR.
1. If C1E promotion bit is disabled.
a. C1 requests end up with C1 C-state.
b. C1E requests end up with C1E C-state.
2. If C1E promotion bit is enabled.
a. C1 requests end up with C1E C-state.
b. C1E requests end up with C1E C-state.
Here are the C1, C1E, and "C1E promotion" bit rules on Sapphire Rapids Xeon.
1. If C1E promotion bit is disabled.
a. C1 requests end up with C1 C-state.
b. C1E requests end up with C1 C-state.
2. If C1E promotion bit is enabled.
a. C1 requests end up with C1E C-state.
b. C1E requests end up with C1E C-state.
Before SPR Xeon, the 'intel_idle' driver was disabling C1E promotion and was
exposing C1 and C1E as independent C-states. But on SPR, C1 and C1E cannot be
enabled at the same time.
This patch adds both C1 and C1E states. However, C1E is marked as with the
"CPUIDLE_FLAG_UNUSABLE" flag, which means that in won't be registered by
default. The C1E promotion bit will be cleared, which means that by default
only C1 and C6 will be registered on SPR.
The next patch will add an option for enabling C1E and disabling C1 on SPR.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Enable local interrupts before requesting C1 on the last two generations
of Intel Xeon platforms: Sky Lake, Cascade Lake, Cooper Lake, Ice Lake.
This decreases average C1 interrupt latency by about 5-10%, as measured
with the 'wult' tool.
The '->enter()' function of the driver enters C-states with local
interrupts disabled by executing the 'monitor' and 'mwait' pair of
instructions. If an interrupt happens, the CPU exits the C-state and
continues executing instructions after 'mwait'. It does not jump to
the interrupt handler, because local interrupts are disabled. The
cpuidle subsystem enables interrupts a bit later, after doing some
housekeeping.
With this patch, we enable local interrupts before requesting C1. In
this case, if the CPU wakes up because of an interrupt, it will jump
to the interrupt handler right away. The cpuidle housekeeping will be
done after the pending interrupt(s) are handled.
Enabling interrupts before entering a C-state has measurable impact
for faster C-states, like C1. Deeper, but slower C-states like C6 do
not really benefit from this sort of change, because their latency is
a lot higher comparing to the delay added by cpuidle housekeeping.
This change was also tested with cyclictest and dbench. In case of Ice
Lake, the average cyclictest latency decreased by 5.1%, and the average
'dbench' throughput increased by about 0.8%. Both tests were run for 4
hours with only C1 enabled (all other idle states, including 'POLL',
were disabled). CPU frequency was pinned to HFM, and uncore frequency
was pinned to the maximum value. The other platforms had similar
single-digit percentage improvements.
It is worth noting that this patch affects 'cpuidle' statistics a tiny
bit. Before this patch, C1 residency did not include the interrupt
handling time, but with this patch, it will include it. This is similar
to what happens in case of the 'POLL' state, which also runs with
interrupts enabled.
Suggested-by: Len Brown <len.brown@intel.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Because cpuidle assumes worst-case C-state parameters, PC6 parameters
are used for describing C6, which is worst-case for requesting CC6.
When PC6 is enabled, this is appropriate. But if PC6 is disabled
in the BIOS, the exit latency and target residency should be adjusted
accordingly.
Exit latency:
Previously the C6 exit latency was measured as the PC6 exit latency.
With PC6 disabled, the C6 exit latency should be the one of CC6.
Target residency:
With PC6 disabled, the idle duration within [CC6, PC6) would make the
idle governor choose C1E over C6. This would cause low energy-efficiency.
We should lower the bar to request C6 when PC6 is disabled.
To fill this gap, check if PC6 is disabled in the BIOS in the
MSR_PKG_CST_CONFIG_CONTROL(0xe2) register. If so, use the CC6 exit latency
for C6 and set target_residency to 3 times of the new exit latency. [This
is consistent with how intel_idle driver uses _CST to calculate the
target_residency.] As a result, the OS would be more likely to choose C6
over C1E when PC6 is disabled, which is reasonable, because if C6 is
enabled, it implies that the user cares about energy, so choosing C6 more
frequently makes sense.
The new CC6 exit latency of 92us was measured with wult[1] on SKX via NIC
wakeup as the 99.99th percentile. Also CLX and CPX both have the same CPU
model number as SkX, but their CC6 exit latencies are similar to the SKX
one, 96us and 89us respectively, so reuse the SKX value for them.
There is a concern that it might be better to use a more generic approach
instead of optimizing every platform. However, if the required code
complexity and different PC6 bit interpretation on different platforms
are taken into account, tuning the code per platform seems to be an
acceptable tradeoff.
Link: https://intel.github.io/wult/ # [1]
Suggested-by: Len Brown <len.brown@intel.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Reviewed-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This patch adds Icelake Xeon D support to the intel_idle driver.
Since Icelake D and Icelake SP C-state characteristics the same,
we use Icelake SP C-states table for Icelake D as well.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Acked-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Change IceLake Xeon C6 latency from 128 us to 170 us. The latency
was measured with the "wult" tool and corresponds to the 99.99th
percentile when measuring with the "nic" method. Note, the 128 us
figure correspond to the median latency, but in intel_idle we use
the "worst case" latency figure instead.
C6 target residency was increased from 384 us to 600 us, which may
result in less C6 residency in some workloads. This value was tested
and compared to values 384, and 1000. Value 600 is a reasonable
tradeoff between power and performance.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Acked-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Defining DEBUG should only be done in development.
So remove DEBUG.
Signed-off-by: Tom Rix <trix@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Add C-state table for the SnowRidge SoC which is found on Intel Jacobsville
platforms.
The following has been changed.
1. C1E latency changed from 10us to 15us. It was measured using the
open source "wult" tool (the "nic" method, 15us is the 99.99th
percentile).
2. C1E power break even changed from 20us to 25us, which may result
in less C1E residency in some workloads.
3. C6 latency changed from 50us to 130us. Measured the same way as C1E.
The C6 C-state is supported only by some SnowRidge revisions, so add a C-state
table commentary about this.
On SnowRidge, C6 support is enumerated via the usual mechanism: "mwait" leaf of
the "cpuid" instruction. The 'intel_idle' driver does check this leaf, so even
though C6 is present in the table, the driver will only use it if the CPU does
support it.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
cpuidle->enter() callbacks should not call into tracing because RCU
has already been disabled. Instead of doing the broadcast thing
itself, simply advertise to the cpuidle core that those states stop
the timer.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lkml.kernel.org/r/20201123143510.GR3021@hirez.programming.kicks-ass.net
Currently intel_idle driver gets the c-state information from ACPI
_CST if the processor model is not recognized by it. However the
c-state in _CST starts with index 1 which is different from the
index in intel_idle driver's internal c-state table.
While intel_idle_max_cstate_reached() was previously introduced to
deal with intel_idle driver's internal c-state table, re-using
this function directly on _CST is incorrect.
Fix this by subtracting 1 from the index when checking max_cstate
in the _CST case.
For example, append intel_idle.max_cstate=1 in boot command line,
Before the patch:
grep . /sys/devices/system/cpu/cpu0/cpuidle/state*/name
POLL
After the patch:
grep . /sys/devices/system/cpu/cpu0/cpuidle/state*/name
/sys/devices/system/cpu/cpu0/cpuidle/state0/name:POLL
/sys/devices/system/cpu/cpu0/cpuidle/state1/name:C1_ACPI
Fixes: 18734958e9 ("intel_idle: Use ACPI _CST for processor models without C-state tables")
Reported-by: Pengfei Xu <pengfei.xu@intel.com>
Cc: 5.6+ <stable@vger.kernel.org> # 5.6+
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
[ rjw: Changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
e6d4f08a67 ("intel_idle: Use ACPI _CST on server systems") avoids
enabling c-states that have been disabled by the platform with the
exception of C1E.
Unfortunately, BIOS implementations are not always consistent in terms
of how capabilities are advertised and control cannot always be handed
over. If control cannot be handed over then intel_idle reports that "ACPI
_CST not found or not usable" but does not clear acpi_state_table.count
meaning the information is still partially used.
This patch ignores ACPI information if CST control cannot be requested from
the platform. This was only observed on a number of Haswell platforms that
had identical CPUs but not identical BIOS versions. While this problem
may be rare overall, 24 separate test cases bisected to this specific
commit across 4 separate test machines and is worth addressing. If the
situation occurs, the kernel behaves as it did before commit e6d4f08a67
and uses any c-states that are discovered.
The affected test cases were all ones that involved a small number of
processes -- exec microbenchmark, pipe microbenchmark, git test suite,
netperf, tbench with one client and system call microbenchmark. Each
case benefits from being able to use turboboost which is prevented if the
lower c-states are unavailable. This may mask real regressions specific
to older hardware so it is worth addressing.
C-state status before and after the patch
5.9.0-vanilla POLL latency:0 disabled:0 default:enabled
5.9.0-vanilla C1 latency:2 disabled:0 default:enabled
5.9.0-vanilla C1E latency:10 disabled:0 default:enabled
5.9.0-vanilla C3 latency:33 disabled:1 default:disabled
5.9.0-vanilla C6 latency:133 disabled:1 default:disabled
5.9.0-ignore-cst-v1r1 POLL latency:0 disabled:0 default:enabled
5.9.0-ignore-cst-v1r1 C1 latency:2 disabled:0 default:enabled
5.9.0-ignore-cst-v1r1 C1E latency:10 disabled:0 default:enabled
5.9.0-ignore-cst-v1r1 C3 latency:33 disabled:0 default:enabled
5.9.0-ignore-cst-v1r1 C6 latency:133 disabled:0 default:enabled
Patch enables C3/C6.
Netperf UDP_STREAM
netperf-udp
5.5.0 5.9.0
vanilla ignore-cst-v1r1
Hmean send-64 193.41 ( 0.00%) 226.54 * 17.13%*
Hmean send-128 392.16 ( 0.00%) 450.54 * 14.89%*
Hmean send-256 769.94 ( 0.00%) 881.85 * 14.53%*
Hmean send-1024 2994.21 ( 0.00%) 3468.95 * 15.85%*
Hmean send-2048 5725.60 ( 0.00%) 6628.99 * 15.78%*
Hmean send-3312 8468.36 ( 0.00%) 10288.02 * 21.49%*
Hmean send-4096 10135.46 ( 0.00%) 12387.57 * 22.22%*
Hmean send-8192 17142.07 ( 0.00%) 19748.11 * 15.20%*
Hmean send-16384 28539.71 ( 0.00%) 30084.45 * 5.41%*
Fixes: e6d4f08a67 ("intel_idle: Use ACPI _CST on server systems")
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Cc: 5.6+ <stable@vger.kernel.org> # 5.6+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Intel SDM does not explicitly say that entering a C-state via MWAIT will
implicitly flush CPU caches as appropriate for that C-state. However,
documentation for individual Intel CPU generations does mention this
behavior.
Since intel_idle binds to any Intel CPU with MWAIT, list this assumption
of MWAIT behavior.
In passing, reword opening comment to make it clear that the driver can
load on any old and future Intel CPU with MWAIT.
Signed-off-by: Alexander Monakov <amonakov@ispras.ru>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This allows moving the leave_mm() call into generic code before
rcu_idle_enter(). Gets rid of more trace_*_rcuidle() users.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Marco Elver <elver@google.com>
Link: https://lkml.kernel.org/r/20200821085348.369441600@infradead.org
On ICX platform, the C1E auto-promotion is enabled by default.
As a result, the CPU might fall into C1E more offen than previous
platforms. Besides, the C1E is not exposed to sysfs on ICX, which
is inconsistent with previous server platforms.
So disable C1E auto-promotion and expose C1E as a separate idle
state, so the C1E and C6 can be disabled via sysfs when necessary.
Beside C1 and C1E, the exit latency of C6 was measured
by a dedicated tool. However the exit latency(41us) exposed
by _CST is much smaller than the one we measured(128us). This
is probably due to the _CST uses the exit latency when woken
up from PC0+C6, rather than PC6+C6 when C6 was measured. Choose
the latter as we need the longest latency in theory.
Reported-by: kernel test robot <lkp@intel.com>
Tested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Acked-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Control Flow Integrity(CFI) is a security mechanism that disallows
changes to the original control flow graph of a compiled binary,
making it significantly harder to perform such attacks.
init_state_node() assign same function callback to different
function pointer declarations.
static int init_state_node(struct cpuidle_state *idle_state,
const struct of_device_id *matches,
struct device_node *state_node) { ...
idle_state->enter = match_id->data; ...
idle_state->enter_s2idle = match_id->data; }
Function declarations:
struct cpuidle_state { ...
int (*enter) (struct cpuidle_device *dev,
struct cpuidle_driver *drv,
int index);
void (*enter_s2idle) (struct cpuidle_device *dev,
struct cpuidle_driver *drv,
int index); };
In this case, either enter() or enter_s2idle() would cause CFI check
failed since they use same callee.
Align function prototype of enter() since it needs return value for
some use cases. The return value of enter_s2idle() is no
need currently.
Signed-off-by: Neal Liu <neal.liu@mediatek.com>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Using uninitialized_var() is dangerous as it papers over real bugs[1]
(or can in the future), and suppresses unrelated compiler warnings
(e.g. "unused variable"). If the compiler thinks it is uninitialized,
either simply initialize the variable or make compiler changes.
In preparation for removing[2] the[3] macro[4], remove all remaining
needless uses with the following script:
git grep '\buninitialized_var\b' | cut -d: -f1 | sort -u | \
xargs perl -pi -e \
's/\buninitialized_var\(([^\)]+)\)/\1/g;
s:\s*/\* (GCC be quiet|to make compiler happy) \*/$::g;'
drivers/video/fbdev/riva/riva_hw.c was manually tweaked to avoid
pathological white-space.
No outstanding warnings were found building allmodconfig with GCC 9.3.0
for x86_64, i386, arm64, arm, powerpc, powerpc64le, s390x, mips, sparc64,
alpha, and m68k.
[1] https://lore.kernel.org/lkml/20200603174714.192027-1-glider@google.com/
[2] https://lore.kernel.org/lkml/CA+55aFw+Vbj0i=1TGqCR5vQkCzWJ0QxK6CernOU6eedsudAixw@mail.gmail.com/
[3] https://lore.kernel.org/lkml/CA+55aFwgbgqhbp1fkxvRKEpzyR5J8n1vKT1VZdz9knmPuXhOeg@mail.gmail.com/
[4] https://lore.kernel.org/lkml/CA+55aFz2500WfbKXAx8s67wrm9=yVJu65TpLgN_ybYNv0VEOKA@mail.gmail.com/
Reviewed-by: Leon Romanovsky <leonro@mellanox.com> # drivers/infiniband and mlx4/mlx5
Acked-by: Jason Gunthorpe <jgg@mellanox.com> # IB
Acked-by: Kalle Valo <kvalo@codeaurora.org> # wireless drivers
Reviewed-by: Chao Yu <yuchao0@huawei.com> # erofs
Signed-off-by: Kees Cook <keescook@chromium.org>
The value of the lapic_timer_always_reliable static variable in
the intel_idle driver reflects the boot_cpu_has(X86_FEATURE_ARAT)
value and so it also reflects the static_cpu_has(X86_FEATURE_ARAT)
value.
Hence, the lapic_timer_always_reliable check in intel_idle() is
redundant and apart from this lapic_timer_always_reliable is only
used in two places in which boot_cpu_has(X86_FEATURE_ARAT) can be
used directly.
Eliminate the lapic_timer_always_reliable variable in accordance
with the above observations.
No intentional functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Pull perf updates from Ingo Molnar:
"The main changes in this cycle were:
Kernel side changes:
- A couple of x86/cpu cleanups and changes were grandfathered in due
to patch dependencies. These clean up the set of CPU model/family
matching macros with a consistent namespace and C99 initializer
style.
- A bunch of updates to various low level PMU drivers:
* AMD Family 19h L3 uncore PMU
* Intel Tiger Lake uncore support
* misc fixes to LBR TOS sampling
- optprobe fixes
- perf/cgroup: optimize cgroup event sched-in processing
- misc cleanups and fixes
Tooling side changes are to:
- perf {annotate,expr,record,report,stat,test}
- perl scripting
- libapi, libperf and libtraceevent
- vendor events on Intel and S390, ARM cs-etm
- Intel PT updates
- Documentation changes and updates to core facilities
- misc cleanups, fixes and other enhancements"
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (89 commits)
cpufreq/intel_pstate: Fix wrong macro conversion
x86/cpu: Cleanup the now unused CPU match macros
hwrng: via_rng: Convert to new X86 CPU match macros
crypto: Convert to new CPU match macros
ASoC: Intel: Convert to new X86 CPU match macros
powercap/intel_rapl: Convert to new X86 CPU match macros
PCI: intel-mid: Convert to new X86 CPU match macros
mmc: sdhci-acpi: Convert to new X86 CPU match macros
intel_idle: Convert to new X86 CPU match macros
extcon: axp288: Convert to new X86 CPU match macros
thermal: Convert to new X86 CPU match macros
hwmon: Convert to new X86 CPU match macros
platform/x86: Convert to new CPU match macros
EDAC: Convert to new X86 CPU match macros
cpufreq: Convert to new X86 CPU match macros
ACPI: Convert to new X86 CPU match macros
x86/platform: Convert to new CPU match macros
x86/kernel: Convert to new CPU match macros
x86/kvm: Convert to new CPU match macros
x86/perf/events: Convert to new CPU match macros
...
The new macro set has a consistent namespace and uses C99 initializers
instead of the grufty C89 ones.
Get rid the of the local macro wrappers for consistency.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lkml.kernel.org/r/20200320131510.193755545@linutronix.de
Update the copyright notice in intel_idle.c to cover the recent
changes, drop the description of a "known limitation" that is not
a limitation any more and bump up the driver version number.
No functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Use the BIT() macro for defining CPUIDLE_FLAG_TLB_FLUSHED instead of
the hex bit encoding of the same value.
No intentional functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Turn the description comments of some functions in the intel_idle
driver into proper kerneldoc ones and clean them up.
No functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reorder declarations of static variables so that the __initdata ones
are declared together.
No functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Add __initdata or __initconst annotations to the static data
structures that are only used during the initialization of the
driver.
No intentional functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Annotate static variables cpuidle_state_table and mwait_substates
with __initdata, because they are only used during the initialization
of the driver.
Also notice that static variable icpu could be annotated analogously
and the structure pointed to by it could be __initconst, but two of
its fields are accessed via icpu in intel_idle_cpu_init() and
auto_demotion_disable(), so introduce two new static variables,
auto_demotion_disable_flags and disable_promotion_to_c1e, to hold
the values of these fields, set them during the initialization and
use them in those functions instead of accessing the source data
structure via icpu.
That allows icpu to be annotated with __initdata, so do that,
and it will also allow some __initconst annotations to be added
subsequently.
No intentional functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Move the definitions of intel_idle() and intel_idle_s2idle() before
the definitions of cpuidle_state structures referring to them to
avoid having to use additional declarations of them (and drop those
declarations).
No functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Add proper kerneldoc descriptions to intel_idle() and
intel_idle_s2idle(), annotate the latter with __cpuidle and
reorder the declarations of local variables in both of them to
reflect the mwait_idle_with_hints() arguments order.
No intentional functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The lapic_timer_always_reliable variable really takes only two values
and some arithmetic in intel_idle() related to comparing it with the
target C-state's MWAIT hint value is unnecessary.
Simplify the code by replacing lapic_timer_always_reliable with
a bool variable lapic_timer_always_reliable and dropping the
LAPIC_TIMER_ALWAYS_RELIABLE symbol along with the excess
computations in intel_idle().
While at it, add a comment explaining the branch taken in intel_idle()
if the LAPIC timer is only reliable in C1 and modify the related debug
message in intel_idle_init() accordingly (the modification of this
message in the only expected functional impact of the change made
here).
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
In certain system configurations it may not be desirable to use some
C-states assumed to be available by intel_idle and the driver needs
to be prevented from using them even before the cpuidle sysfs
interface becomes accessible to user space. Currently, the only way
to achieve that is by setting the 'max_cstate' module parameter to a
value lower than the index of the shallowest of the C-states in
question, but that may be overly intrusive, because it effectively
makes all of the idle states deeper than the 'max_cstate' one go
away (and the C-state to avoid may be in the middle of the range
normally regarded as available).
To allow that limitation to be overcome, introduce a new module
parameter called 'states_off' to represent a list of idle states to
be disabled by default in the form of a bitmask and update the
documentation to cover it.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
For diagnostics, it is generally useful to be able to make intel_idle
take the system's ACPI tables into consideration even if that is not
required for the processor model in there, so introduce a new module
parameter, 'use_acpi', to make that happen and update the documentation
to cover it.
While at it, fix the 'no_acpi' module parameter name in the
documentation.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Pull x86 cpu-features updates from Ingo Molnar:
"The biggest change in this cycle was a large series from Sean
Christopherson to clean up the handling of VMX features. This both
fixes bugs/inconsistencies and makes the code more coherent and
future-proof.
There are also two cleanups and a minor TSX syslog messages
enhancement"
* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
x86/cpu: Remove redundant cpu_detect_cache_sizes() call
x86/cpu: Print "VMX disabled" error message iff KVM is enabled
KVM: VMX: Allow KVM_INTEL when building for Centaur and/or Zhaoxin CPUs
perf/x86: Provide stubs of KVM helpers for non-Intel CPUs
KVM: VMX: Use VMX_FEATURE_* flags to define VMCS control bits
KVM: VMX: Check for full VMX support when verifying CPU compatibility
KVM: VMX: Use VMX feature flag to query BIOS enabling
KVM: VMX: Drop initialization of IA32_FEAT_CTL MSR
x86/cpufeatures: Add flag to track whether MSR IA32_FEAT_CTL is configured
x86/cpu: Set synthetic VMX cpufeatures during init_ia32_feat_ctl()
x86/cpu: Print VMX flags in /proc/cpuinfo using VMX_FEATURES_*
x86/cpu: Detect VMX features on Intel, Centaur and Zhaoxin CPUs
x86/vmx: Introduce VMX_FEATURES_*
x86/cpu: Clear VMX feature flag if VMX is not fully enabled
x86/zhaoxin: Use common IA32_FEAT_CTL MSR initialization
x86/centaur: Use common IA32_FEAT_CTL MSR initialization
x86/mce: WARN once if IA32_FEAT_CTL MSR is left unlocked
x86/intel: Initialize IA32_FEAT_CTL MSR at boot
tools/x86: Sync msr-index.h from kernel sources
selftests, kvm: Replace manual MSR defs with common msr-index.h
...
Move the irtl_ns_units[] definition into irtl_2_usec() which is the
only user of it, use div_u64() for the division in there (as the
divisor is small enough) and use the NSEC_PER_USEC symbol for the
divisor. Also convert the irtl_2_usec() comment to a proper
kerneldo one.
No intentional functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Move intel_idle_verify_cstate(), auto_demotion_disable() and
c1e_promotion_disable() closer to their callers.
While at it, annotate intel_idle_verify_cstate() with __init,
as it is only used during the initialization of the driver.
No intentional functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Annotate the functions that are only used at the initialization time
with __init and the data structures used by them with __initdata or
__initconst.
No intentional functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Move intel_idle_cpuidle_devices_uninit() closer to its caller,
intel_idle_init(), add the __init modifier to its header, drop a
redundant local variable from it and fix up its kerneldoc comment.
No intentional functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Notice that intel_idle_state_table_update() only needs to be called
if icpu is not NULL, so fold it into intel_idle_init_cstates_icpu(),
and pass a pointer to the driver object to
intel_idle_cpuidle_driver_init() as an argument instead of
referencing it locally in there.
No intentional functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Instead of comparing intel_idle_cpuidle_devices with NULL apply
the "!" (not) operator to it when checking it against NULL.
No intentional functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
There is no particular reason why intel_idle_probe() needs to be
a separate function and folding it into intel_idle_init() causes
the code to be somewhat easier to follow, so do just that.
No intentional functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The __setup_broadcast_timer() static function is only called in one
place and "true" is passed to it as the argument in there, so
effectively it is a wrapper arround tick_broadcast_enable().
To simplify the code, call tick_broadcast_enable() directly instead
of __setup_broadcast_timer() and drop the latter.
No intentional functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
As pointed out by Boris, the defines for bits in IA32_FEATURE_CONTROL
are quite a mouthful, especially the VMX bits which must differentiate
between enabling VMX inside and outside SMX (TXT) operation. Rename the
MSR and its bit defines to abbreviate FEATURE_CONTROL as FEAT_CTL to
make them a little friendlier on the eyes.
Arguably, the MSR itself should keep the full IA32_FEATURE_CONTROL name
to match Intel's SDM, but a future patch will add a dedicated Kconfig,
file and functions for the MSR. Using the full name for those assets is
rather unwieldy, so bite the bullet and use IA32_FEAT_CTL so that its
nomenclature is consistent throughout the kernel.
Opportunistically, fix a few other annoyances with the defines:
- Relocate the bit defines so that they immediately follow the MSR
define, e.g. aren't mistaken as belonging to MISC_FEATURE_CONTROL.
- Add whitespace around the block of feature control defines to make
it clear they're all related.
- Use BIT() instead of manually encoding the bit shift.
- Use "VMX" instead of "VMXON" to match the SDM.
- Append "_ENABLED" to the LMCE (Local Machine Check Exception) bit to
be consistent with the kernel's verbiage used for all other feature
control bits. Note, the SDM refers to the LMCE bit as LMCE_ON,
likely to differentiate it from IA32_MCG_EXT_CTL.LMCE_EN. Ignore
the (literal) one-off usage of _ON, the SDM is simply "wrong".
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20191221044513.21680-2-sean.j.christopherson@intel.com
In many cases, especially on server systems, it is desirable to avoid
enabling C-states that have been disabled in the platform firmware
(BIOS) setup, except for C1E.
As a rule, the C-states disabled this way are not listed by ACPI
_CST, so if that is used by intel_idle along with the specific
table of C-states that it has for the given processor, the C-states
disabled through the platform firmware will not be enabled by default
by intel_idle.
Accordingly, set the use_acpi flag (introduced previously) in all
server processor profiles defined in intel_idle (so as to make it use
ACPI _CST to decide which C-states to enable by default) and set
the CPUIDLE_FLAG_ALWAYS_ENABLE flag (also introduced previously)
for C1E in all C-states tables in intel_idle that contain C1 too
(so that C1E is enabled regardless of whether or not it is listed
by ACPI _CST).
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Add a new module parameter called "no_acpi" to the intel_idle driver
to allow the driver to be prevented from using ACPI _CST via kernel
command line.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Update the intel_idle driver to get the C-states information from ACPI
_CST in some cases in which the processor is known to the driver, as long as
that information is available and the new use_acpi flag is set in the
profile of the processor in question.
In the cases when there is a specific table of C-states for the given
processor in the driver, that table is used as the primary source of
information on the available C-states, but if ACPI _CST is present,
the C-states that are not listed by it will not be enabled by default
(they still can be enabled later by user space via sysfs, though).
The new CPUIDLE_FLAG_ALWAYS_ENABLE flag can be used for marking
C-states that should be enabled by default even if they are not
listed by ACPI _CST.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Modify the intel_idle driver to get the C-states information from ACPI
_CST if the processor model is not recognized by it.
The processor is still required to support MWAIT and the information
from ACPI _CST will only be used if all of the C-states listed by
_CST are of the ACPI_CSTATE_FFH type (which means that they are
expected to be entered via MWAIT).
Moreover, the driver assumes that the _CST information is the same
for all CPUs in the system, so it is sufficient to evaluate _CST for
one of them and extract the common list of C-states from there.
Also _CST is evaluated once at the system initialization time and
the driver does not respond to _CST change notifications (that can
be changed in the future).
The main functional difference between intel_idle with this change
and the ACPI processor driver is that the former sets the target
residency to be equal to the exit latency (provided by _CST) for
C1-type C-states and to 3 times the exit latency value for the other
C-state types, whereas the latter obtains the target residency by
multiplying the exit latency by the same number (2 by default) for
all C-state types. Therefore it is expected that in general using
the former instead of the latter on the same system will lead to
improved energy-efficiency.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Move the C-state verification and checks from
intel_idle_cpuidle_driver_init() to a separate function,
intel_idle_verify_cstate(), and make the former call it after
checking the CPUIDLE_FLAG_UNUSABLE state flag.
Also combine the drv->states[] updates with the incrementation of
drv->state_count.
No intentional functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
After recent cpuidle updates the "disabled" field in struct
cpuidle_state is only used by two drivers (intel_idle and shmobile
cpuidle) for marking unusable idle states, but that may as well be
achieved with the help of a state flag, so define an "unusable" idle
state flag, CPUIDLE_FLAG_UNUSABLE, make the drivers in question use
it instead of the "disabled" field and make the core set
CPUIDLE_STATE_DISABLED_BY_DRIVER for the idle states with that flag
set.
After the above changes, the "disabled" field in struct cpuidle_state
is not used any more, so drop it.
No intentional functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Currently big microservers have _XEON_D while small microservers have
_X, Make it uniformly: _D.
for i in `git grep -l "\(INTEL_FAM6_\|VULNWL_INTEL\|INTEL_CPU_FAM6\).*_\(X\|XEON_D\)"`
do
sed -i -e 's/\(\(INTEL_FAM6_\|VULNWL_INTEL\|INTEL_CPU_FAM6\).*ATOM.*\)_X/\1_D/g' \
-e 's/\(\(INTEL_FAM6_\|VULNWL_INTEL\|INTEL_CPU_FAM6\).*\)_XEON_D/\1_D/g' ${i}
done
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: x86@kernel.org
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Link: https://lkml.kernel.org/r/20190827195122.677152989@infradead.org
Currently big core clients with extra graphics on have:
- _G
- _GT3E
Make it uniformly: _G
for i in `git grep -l "\(INTEL_FAM6_\|VULNWL_INTEL\|INTEL_CPU_FAM6\).*_GT3E"`
do
sed -i -e 's/\(\(INTEL_FAM6_\|VULNWL_INTEL\|INTEL_CPU_FAM6\).*\)_GT3E/\1_G/g' ${i}
done
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: x86@kernel.org
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Link: https://lkml.kernel.org/r/20190827195122.622802314@infradead.org
Currently big core mobile chips have either:
- _L
- _ULT
- _MOBILE
Make it uniformly: _L.
for i in `git grep -l "\(INTEL_FAM6_\|VULNWL_INTEL\|INTEL_CPU_FAM6\).*_\(MOBILE\|ULT\)"`
do
sed -i -e 's/\(\(INTEL_FAM6_\|VULNWL_INTEL\|INTEL_CPU_FAM6\).*\)_\(MOBILE\|ULT\)/\1_L/g' ${i}
done
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: x86@kernel.org
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190827195122.568978530@infradead.org
Currently the big core client models either have:
- no OPTDIFF
- _CORE
- _DESKTOP
Make it uniformly: 'no OPTDIFF'.
for i in `git grep -l "\(INTEL_FAM6_\|VULNWL_INTEL\|INTEL_CPU_FAM6\).*_\(CORE\|DESKTOP\)"`
do
sed -i -e 's/\(\(INTEL_FAM6_\|VULNWL_INTEL\|INTEL_CPU_FAM6\).*\)_\(CORE\|DESKTOP\)/\1/g' ${i}
done
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: x86@kernel.org
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190827195122.513945586@infradead.org
Based on 1 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms and conditions of the gnu general public license
version 2 as published by the free software foundation this program
is distributed in the hope it will be useful but without any
warranty without even the implied warranty of merchantability or
fitness for a particular purpose see the gnu general public license
for more details you should have received a copy of the gnu general
public license along with this program if not write to the free
software foundation inc 51 franklin st fifth floor boston ma 02110
1301 usa
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-only
has been chosen to replace the boilerplate/reference in 111 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexios Zavras <alexios.zavras@intel.com>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190530000436.567572064@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jacobsville uses the same C-states as Denverton.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Pull perf updates from Ingo Molnar:
"The main updates in this cycle were:
- Lots of perf tooling changes too voluminous to list (big perf trace
and perf stat improvements, lots of libtraceevent reorganization,
etc.), so I'll list the authors and refer to the changelog for
details:
Benjamin Peterson, Jérémie Galarneau, Kim Phillips, Peter
Zijlstra, Ravi Bangoria, Sangwon Hong, Sean V Kelley, Steven
Rostedt, Thomas Gleixner, Ding Xiang, Eduardo Habkost, Thomas
Richter, Andi Kleen, Sanskriti Sharma, Adrian Hunter, Tzvetomir
Stoyanov, Arnaldo Carvalho de Melo, Jiri Olsa.
... with the bulk of the changes written by Jiri Olsa, Tzvetomir
Stoyanov and Arnaldo Carvalho de Melo.
- Continued intel_rdt work with a focus on playing well with perf
events. This also imported some non-perf RDT work due to
dependencies. (Reinette Chatre)
- Implement counter freezing for Arch Perfmon v4 (Skylake and newer).
This allows to speed up the PMI handler by avoiding unnecessary MSR
writes and make it more accurate. (Andi Kleen)
- kprobes cleanups and simplification (Masami Hiramatsu)
- Intel Goldmont PMU updates (Kan Liang)
- ... plus misc other fixes and updates"
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (155 commits)
kprobes/x86: Use preempt_enable() in optimized_callback()
x86/intel_rdt: Prevent pseudo-locking from using stale pointers
kprobes, x86/ptrace.h: Make regs_get_kernel_stack_nth() not fault on bad stack
perf/x86/intel: Export mem events only if there's PEBS support
x86/cpu: Drop pointless static qualifier in punit_dev_state_show()
x86/intel_rdt: Fix initial allocation to consider CDP
x86/intel_rdt: CBM overlap should also check for overlap with CDP peer
x86/intel_rdt: Introduce utility to obtain CDP peer
tools lib traceevent, perf tools: Move struct tep_handler definition in a local header file
tools lib traceevent: Separate out tep_strerror() for strerror_r() issues
perf python: More portable way to make CFLAGS work with clang
perf python: Make clang_has_option() work on Python 3
perf tools: Free temporary 'sys' string in read_event_files()
perf tools: Avoid double free in read_event_file()
perf tools: Free 'printk' string in parse_ftrace_printk()
perf tools: Cleanup trace-event-info 'tdata' leak
perf strbuf: Match va_{add,copy} with va_end
perf test: S390 does not support watchpoints in test 22
perf auxtrace: Include missing asm/bitsperlong.h to get BITS_PER_LONG
tools include: Adopt linux/bits.h
...
Going primarily by:
https://en.wikipedia.org/wiki/List_of_Intel_Atom_microprocessors
with additional information gleaned from other related pages; notably:
- Bonnell shrink was called Saltwell
- Moorefield is the Merriefield refresh which makes it Airmont
The general naming scheme is: FAM6_ATOM_UARCH_SOCTYPE
for i in `git grep -l FAM6_ATOM` ; do
sed -i -e 's/ATOM_PINEVIEW/ATOM_BONNELL/g' \
-e 's/ATOM_LINCROFT/ATOM_BONNELL_MID/' \
-e 's/ATOM_PENWELL/ATOM_SALTWELL_MID/g' \
-e 's/ATOM_CLOVERVIEW/ATOM_SALTWELL_TABLET/g' \
-e 's/ATOM_CEDARVIEW/ATOM_SALTWELL/g' \
-e 's/ATOM_SILVERMONT1/ATOM_SILVERMONT/g' \
-e 's/ATOM_SILVERMONT2/ATOM_SILVERMONT_X/g' \
-e 's/ATOM_MERRIFIELD/ATOM_SILVERMONT_MID/g' \
-e 's/ATOM_MOOREFIELD/ATOM_AIRMONT_MID/g' \
-e 's/ATOM_DENVERTON/ATOM_GOLDMONT_X/g' \
-e 's/ATOM_GEMINI_LAKE/ATOM_GOLDMONT_PLUS/g' ${i}
done
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: dave.hansen@linux.intel.com
Cc: len.brown@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Replace custom grown macro with generic INTEL_CPU_FAM6() one.
No functional change intended.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* pm-cpuidle:
intel_idle: Graceful probe failure when MWAIT is disabled
cpuidle: Avoid assignment in if () argument
cpuidle: Clean up cpuidle_enable_device() error handling a bit
cpuidle: ladder: Add per CPU PM QoS resume latency support
ARM: cpuidle: Refactor rollback operations if init fails
ARM: cpuidle: Correct driver unregistration if init fails
intel_idle: replace conditionals with static_cpu_has(X86_FEATURE_ARAT)
cpuidle: fix broadcast control when broadcast can not be entered
Conflicts:
drivers/idle/intel_idle.c
When MWAIT is disabled, intel_idle refuses to probe.
But it may mis-lead the user by blaming this on the model number:
intel_idle: does not run on family 6 modesl 79
So defer the check for MWAIT until after the model# white-list check succeeds,
and if the MWAIT check fails, tell the user how to fix it:
intel_idle: Please enable MWAIT in BIOS SETUP
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This reverts commit 43858b4f25.
The reason I removed the leave_mm() calls in question is because the
heuristic wasn't needed after that patch. With the original version
of my PCID series, we never flushed a "lazy cpu" (i.e. a CPU running
kernel thread) due a flush on the loaded mm.
Unfortunately, that caused architectural issues, so now I've
reinstated these flushes on non-PCID systems in:
commit b956575bed ("x86/mm: Flush more aggressively in lazy TLB mode").
That, in turn, gives us a power management and occasionally
performance regression as compared to old kernels: a process that
goes into a deep idle state on a given CPU and gets its mm flushed
due to activity on a different CPU will wake the idle CPU.
Reinstate the old ugly heuristic: if a CPU goes into ACPI C3 or an
intel_idle state that is likely to cause a TLB flush gets its mm
switched to init_mm before going idle.
FWIW, this heuristic is lousy. Whether we should change CR3 before
idle isn't a good hint except insofar as the performance hit is a bit
lower if the TLB is getting flushed by the idle code anyway. What we
really want to know is whether we anticipate being idle long enough
that the mm is likely to be flushed before we wake up. This is more a
matter of the expected latency than the idle state that gets chosen.
This heuristic also completely fails on systems that don't know
whether the TLB will be flushed (e.g. AMD systems?). OTOH it may be a
bit obsolete anyway -- PCID systems don't presently benefit from this
heuristic at all.
We also shouldn't do this callback from innermost bit of the idle code
due to the RCU nastiness it causes. All the information need is
available before rcu_idle_enter() needs to happen.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 43858b4f25 "x86/mm: Stop calling leave_mm() in idle code"
Link: http://lkml.kernel.org/r/c513bbd4e653747213e05bc7062de000bf0202a5.1509793738.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
If the 'arat' cpu flag is set, then the conditionals in intel_idle() that
guard calling tick_broadcast_enter()/exit() will never be true. Use
static_cpu_has(X86_FEATURE_ARAT) to create a fast path to replace
the conditional.
Signed-off-by: Jason Baron <jbaron@akamai.com>
Acked-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
- Drop the P-state selection algorithm based on a PID controller
from intel_pstate and make it use the same P-state selection
method (based on the CPU load) for all types of systems in the
active mode (Rafael Wysocki, Srinivas Pandruvada).
- Rework the cpufreq core and governors to make it possible to
take cross-CPU utilization updates into account and modify the
schedutil governor to actually do so (Viresh Kumar).
- Clean up the handling of transition latency information in the
cpufreq core and untangle it from the information on which drivers
cannot do dynamic frequency switching (Viresh Kumar).
- Add support for new SoCs (MT2701/MT7623 and MT7622) to the
mediatek cpufreq driver and update its DT bindings (Sean Wang).
- Modify the cpufreq dt-platdev driver to autimatically create
cpufreq devices for the new (v2) Operating Performance Points
(OPP) DT bindings and update its whitelist of supported systems
(Viresh Kumar, Shubhrajyoti Datta, Marc Gonzalez, Khiem Nguyen,
Finley Xiao).
- Add support for Ux500 to the cpufreq-dt driver and drop the
obsolete dbx500 cpufreq driver (Linus Walleij, Arnd Bergmann).
- Add new SoC (R8A7795) support to the cpufreq rcar driver (Khiem
Nguyen).
- Fix and clean up assorted issues in the cpufreq drivers and core
(Arvind Yadav, Christophe Jaillet, Colin Ian King, Gustavo Silva,
Julia Lawall, Leonard Crestez, Rob Herring, Sudeep Holla).
- Update the IO-wait boost handling in the schedutil governor to
make it less aggressive (Joel Fernandes).
- Rework system suspend diagnostics to make it print fewer messages
to the kernel log by default, add a sysfs knob to allow more
suspend-related messages to be printed and add Low Power S0 Idle
constraints checks to the ACPI suspend-to-idle code (Rafael
Wysocki, Srinivas Pandruvada).
- Prefer suspend-to-idle over S3 on ACPI-based systems with the
ACPI_FADT_LOW_POWER_S0 flag set and the Low Power Idle S0 _DSM
interface present in the ACPI tables (Rafael Wysocki).
- Update documentation related to system sleep and rename a number
of items in the code to make it cleare that they are related to
suspend-to-idle (Rafael Wysocki).
- Export a variable allowing device drivers to check the target
system sleep state from the core system suspend code (Florian
Fainelli).
- Clean up the cpuidle subsystem to handle the polling state on
x86 in a more straightforward way and to use %pOF instead of
full_name (Rafael Wysocki, Rob Herring).
- Update the devfreq framework to fix and clean up a few minor
issues (Chanwoo Choi, Rob Herring).
- Extend diagnostics in the generic power domains (genpd) framework
and clean it up slightly (Thara Gopinath, Rob Herring).
- Fix and clean up a couple of issues in the operating performance
points (OPP) framework (Viresh Kumar, Waldemar Rymarkiewicz).
- Add support for RV1108 to the rockchip-io Adaptive Voltage Scaling
(AVS) driver (David Wu).
- Fix the usage of notifiers in CPU power management on some
platforms (Alex Shi).
- Update the pm-graph system suspend/hibernation and boot profiling
utility (Todd Brandt).
- Make it possible to run the cpupower utility without CPU0 (Prarit
Bhargava).
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJZrcDJAAoJEILEb/54YlRx9FUQAIUKvWBAARc61ZIZXjbqZF1v
aEMOBuksFns0CMekdptSic6n4wc81E/XYMS8yDhOOMpyDzfAZsTWjmu+gKwN7w3l
E/yf/NVlhob9JZ7MqGgqD4EUFfFIaKBXPlWFdDi2rdCUXE2L8xJ7rla8i7zyZlc5
pYHfAppBbF4qUcEY4OoOVOOGRZCfMdiLXj0iZOhMX8Y6yLBRk/AjnVADYsF33hoj
gBEfomU+H0K5V8nQEp0ZFKDArPwL+oElHQj6i+nxBpGfPM5evvLXhHOyR6AsldJ5
J4YI1kMuQNSCmvHMqOTxTYyJf8Jcf3Fj4wcjwaVMVGceY1lz6McAKknnFnCqCvz+
mskn84gFCBCM8EoJDqRf0b9MQHcuRyQKM+yw4tjnR9r8yd32erb85ZWFHcPWYhCT
fZatNOwFFv2MU+2vo5J3yeUNSWIKT+uBjy+tKPbrDkUwpKZVRj3Oj+hP3Mq9NE8U
YBqltsj7tmrdA634zI8C7jfS6wF221S0fId/iPszwmPJaVn/lq8Ror7pWL5YI8U7
SCJFjiqDiGmAcQEkuWwFAQnscZkyHpO+Y3A+jfXl/izoaZETaI5+ceIHBaocm3+5
XrOOpHS3ik8EHf9ji0KFCKZ/pYDwllday3cBQPWo3sMIzpQ2lrjbqdnE1cVnBrld
OtHZAeD/jLUXuY6XW2jN
=mAiV
-----END PGP SIGNATURE-----
Merge tag 'pm-4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
"This time (again) cpufreq gets the majority of changes which mostly
are driver updates (including a major consolidation of intel_pstate),
some schedutil governor modifications and core cleanups.
There also are some changes in the system suspend area, mostly related
to diagnostics and debug messages plus some renames of things related
to suspend-to-idle. One major change here is that suspend-to-idle is
now going to be preferred over S3 on systems where the ACPI tables
indicate to do so and provide requsite support (the Low Power Idle S0
_DSM in particular). The system sleep documentation and the tools
related to it are updated too.
The rest is a few cpuidle changes (nothing major), devfreq updates,
generic power domains (genpd) framework updates and a few assorted
modifications elsewhere.
Specifics:
- Drop the P-state selection algorithm based on a PID controller from
intel_pstate and make it use the same P-state selection method
(based on the CPU load) for all types of systems in the active mode
(Rafael Wysocki, Srinivas Pandruvada).
- Rework the cpufreq core and governors to make it possible to take
cross-CPU utilization updates into account and modify the schedutil
governor to actually do so (Viresh Kumar).
- Clean up the handling of transition latency information in the
cpufreq core and untangle it from the information on which drivers
cannot do dynamic frequency switching (Viresh Kumar).
- Add support for new SoCs (MT2701/MT7623 and MT7622) to the mediatek
cpufreq driver and update its DT bindings (Sean Wang).
- Modify the cpufreq dt-platdev driver to autimatically create
cpufreq devices for the new (v2) Operating Performance Points (OPP)
DT bindings and update its whitelist of supported systems (Viresh
Kumar, Shubhrajyoti Datta, Marc Gonzalez, Khiem Nguyen, Finley
Xiao).
- Add support for Ux500 to the cpufreq-dt driver and drop the
obsolete dbx500 cpufreq driver (Linus Walleij, Arnd Bergmann).
- Add new SoC (R8A7795) support to the cpufreq rcar driver (Khiem
Nguyen).
- Fix and clean up assorted issues in the cpufreq drivers and core
(Arvind Yadav, Christophe Jaillet, Colin Ian King, Gustavo Silva,
Julia Lawall, Leonard Crestez, Rob Herring, Sudeep Holla).
- Update the IO-wait boost handling in the schedutil governor to make
it less aggressive (Joel Fernandes).
- Rework system suspend diagnostics to make it print fewer messages
to the kernel log by default, add a sysfs knob to allow more
suspend-related messages to be printed and add Low Power S0 Idle
constraints checks to the ACPI suspend-to-idle code (Rafael
Wysocki, Srinivas Pandruvada).
- Prefer suspend-to-idle over S3 on ACPI-based systems with the
ACPI_FADT_LOW_POWER_S0 flag set and the Low Power Idle S0 _DSM
interface present in the ACPI tables (Rafael Wysocki).
- Update documentation related to system sleep and rename a number of
items in the code to make it cleare that they are related to
suspend-to-idle (Rafael Wysocki).
- Export a variable allowing device drivers to check the target
system sleep state from the core system suspend code (Florian
Fainelli).
- Clean up the cpuidle subsystem to handle the polling state on x86
in a more straightforward way and to use %pOF instead of full_name
(Rafael Wysocki, Rob Herring).
- Update the devfreq framework to fix and clean up a few minor issues
(Chanwoo Choi, Rob Herring).
- Extend diagnostics in the generic power domains (genpd) framework
and clean it up slightly (Thara Gopinath, Rob Herring).
- Fix and clean up a couple of issues in the operating performance
points (OPP) framework (Viresh Kumar, Waldemar Rymarkiewicz).
- Add support for RV1108 to the rockchip-io Adaptive Voltage Scaling
(AVS) driver (David Wu).
- Fix the usage of notifiers in CPU power management on some
platforms (Alex Shi).
- Update the pm-graph system suspend/hibernation and boot profiling
utility (Todd Brandt).
- Make it possible to run the cpupower utility without CPU0 (Prarit
Bhargava)"
* tag 'pm-4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (87 commits)
cpuidle: Make drivers initialize polling state
cpuidle: Move polling state initialization code to separate file
cpuidle: Eliminate the CPUIDLE_DRIVER_STATE_START symbol
cpufreq: imx6q: Fix imx6sx low frequency support
cpufreq: speedstep-lib: make several arrays static, makes code smaller
PM: docs: Delete the obsolete states.txt document
PM: docs: Describe high-level PM strategies and sleep states
PM / devfreq: Fix memory leak when fail to register device
PM / devfreq: Add dependency on PM_OPP
PM / devfreq: Move private devfreq_update_stats() into devfreq
PM / devfreq: Convert to using %pOF instead of full_name
PM / AVS: rockchip-io: add io selectors and supplies for RV1108
cpufreq: ti: Fix 'of_node_put' being called twice in error handling path
cpufreq: dt-platdev: Drop few entries from whitelist
cpufreq: dt-platdev: Automatically create cpufreq device with OPP v2
ARM: ux500: don't select CPUFREQ_DT
cpuidle: Convert to using %pOF instead of full_name
cpufreq: Convert to using %pOF instead of full_name
PM / Domains: Convert to using %pOF instead of full_name
cpufreq: Cap the default transition delay value to 10 ms
...
* pm-sleep:
ACPI / PM: Check low power idle constraints for debug only
PM / s2idle: Rename platform operations structure
PM / s2idle: Rename ->enter_freeze to ->enter_s2idle
PM / s2idle: Rename freeze_state enum and related items
PM / s2idle: Rename PM_SUSPEND_FREEZE to PM_SUSPEND_TO_IDLE
ACPI / PM: Prefer suspend-to-idle over S3 on some systems
platform/x86: intel-hid: Wake up Dell Latitude 7275 from suspend-to-idle
PM / suspend: Define pr_fmt() in suspend.c
PM / suspend: Use mem_sleep_labels[] strings in messages
PM / sleep: Put pm_test under CONFIG_PM_SLEEP_DEBUG
PM / sleep: Check pm_wakeup_pending() in __device_suspend_noirq()
PM / core: Add error argument to dpm_show_time()
PM / core: Split dpm_suspend_noirq() and dpm_resume_noirq()
PM / s2idle: Rearrange the main suspend-to-idle loop
PM / timekeeping: Print debug messages when requested
PM / sleep: Mark suspend/hibernation start and finish
PM / sleep: Do not print debug messages by default
PM / suspend: Export pm_suspend_target_state
Make the drivers that want to include the polling state into their
states table initialize it explicitly and drop the initialization of
it (which in fact is conditional, but that is not obvious from the
code) from the core.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Rename the ->enter_freeze cpuidle driver callback to ->enter_s2idle
to make it clear that it is used for entering suspend-to-idle and
rename the related functions, variables and so on accordingly.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The SME patches we are about to apply add some E820 logic, so merge in
pending E820 code changes first, to have a single code base.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Now that lazy TLB suppresses all flush IPIs (as opposed to all but
the first), there's no need to leave_mm() when going idle.
This means we can get rid of the rcuidle hack in
switch_mm_irqs_off() and we can unexport leave_mm().
This also removes acpi_unlazy_tlb() from the x86 and ia64 headers,
since it has no callers any more.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Nadav Amit <nadav.amit@gmail.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/03c699cfd6021e467be650d6b73deaccfe4b4bd7.1498751203.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Remove #define PREFIX and add #define pr_fmt to use more common logging.
Miscellanea:
o Add missing newline to format
o Convert a single printk without KERN_<LEVEL> to pr_info
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Gemini Lake uses the same C-states as Broxton and also uses the
IRTL MSR's to determine maximum C-state latency.
Signed-off-by: David E. Box <david.e.box@linux.intel.com>
Acked-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
These update turbostat significantly and in particular:
- Default output is now verbose, --debug is no longer required to
get all counters. As a result, some options have been added to
specify exactly what output is wanted.
- Added --quiet to skip system configuration output
- Added --list, --show and --hide parameters
- Added --cpu parameter
- Enhanced Baytrail SoC support
- Added Gemini Lake SoC support
- Added sysfs C-state columns
Also the symbol definitions in arch/x86/include/asm/intel-family.h
and arch/x86/include/asm/msr-index.h are updated and the intel_idle
and intel_pstate drivers are modified to use the updated symbols.
Credits to Len Brown for all of these changes.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJYuLyNAAoJEILEb/54YlRxEvkQAJsggzpgGrlhrO6KHSm4yC9M
CqhBVsdeppX1ZTAVPiMk/pcXQYtL5fZ97ELk2So/CjT5Nh3jwDPMA/ux5n3uiob+
O2BTdtxnpNLxPQPQM1mW7Dr/uAIRlJug9gSMxKDbFSU9Oe3aET58PUdUTs7xaT59
nbtLxVSvzrdGk/bX6WO4ic+7F2licJLZPfDGhYidnoika8LxD4M+cIO73gFpgqQi
yoKrTZyLimvneFT0eAUUvHIyKjkJIxeMfslW57uBpz8rW5my+3UwsdpRG4AIVeWc
wSBlsNqj+TuR4BBiZ2VR2RoHF3qbH/SceI+k864BqyThfyK/g2q/vV/GvLZQCR/R
yWcajWD9kvLKvnm1D3XYOIQDBeP4l60j3vVwHytSvmaPYjn5Ms3jq6b+2K6zkXMM
8y3leW/hgw+rGCacdXPrKIlpBykSV7h+TnD2iMxeeDISNkbefWWDe/WB6HncocAg
HDtKRvU9ntRq6/MlnTKbCFM5c0oCXWRw4QNjDy3AsjJELgeAIwiqpHWMKO6XltFj
qU/rdyW/BTCuAlIjWVbjooAIJZ268geupeug3zvE3uGzrxT4DaVIo8W1wtJ+XQrt
By7sOW/gMQ2EcTJQiuFjS/Gz5gOKQ2F8OLCm6T8Prjh6SxrCUAiuIvP0LmxUCa8i
KMlx+8c9E2f9j+TTt9AP
=oMZe
-----END PGP SIGNATURE-----
Merge tag 'pm-turbostat-4.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull turbostat utility updates from Rafael Wysocki:
"Power management turbostat utility updates.
These update turbostat significantly and in particular:
- default output is now verbose, --debug is no longer required to get
all counters. As a result, some options have been added to specify
exactly what output is wanted.
- added --quiet to skip system configuration output
- added --list, --show and --hide parameters
- added --cpu parameter
- enhanced Baytrail SoC support
- added Gemini Lake SoC support
- added sysfs C-state columns
Also the symbol definitions in arch/x86/include/asm/intel-family.h and
arch/x86/include/asm/msr-index.h are updated and the intel_idle and
intel_pstate drivers are modified to use the updated symbols.
Credits to Len Brown for all of these changes"
* tag 'pm-turbostat-4.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (44 commits)
tools/power turbostat: version 17.02.24
tools/power turbostat: bugfix: --add u32 was printed as u64
tools/power turbostat: show error on exec
tools/power turbostat: dump p-state software config
tools/power turbostat: show package number, even without --debug
tools/power turbostat: support "--hide C1" etc.
tools/power turbostat: move --Package and --processor into the --cpu option
tools/power turbostat: turbostat.8 update
tools/power turbostat: update --list feature
tools/power turbostat: use wide columns to display large numbers
tools/power turbostat: Add --list option to show available header names
tools/power turbostat: fix zero IRQ count shown in one-shot command mode
tools/power turbostat: add --cpu parameter
tools/power turbostat: print sysfs C-state stats
tools/power turbostat: extend --add option to accept /sys path
tools/power turbostat: skip unused counters on BDX
tools/power turbostat: fix decoding for GLM, DNV, SKX turbo-ratio limits
tools/power turbostat: skip unused counters on SKX
tools/power turbostat: Denverton: use HW CC1 counter, skip C3, C7
tools/power turbostat: initial Gemini Lake SOC support
...
Cosmetic only -- no functional change in this patch.
sysfs before:
state4/desc:MWAIT 0x20
state4/name:C6-HSW
sysfs after:
state4/desc:MWAIT 0x20
state4/name:C6
We remove the platform acronyms from the end of the state name
(-HSW in this case) for three reasonse.
1. more consistency with acpi_idle, which prints C1, C2, C3 etc.
2. users know what platform they are on already
an acronym for the processor code name here
seems to cause more confusion than clarity.
3. less clutter in "cpupower monitor" output,
which truncates the names to 4 columns.
The precise definition of the state continues to be available in "desc".
Reported-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Install the callbacks via the state machine and let the core invoke the
callbacks on the already online CPUs.
The two smp_call_function_single() invocations in intel_idle_cpu_init() have
been removed because intel_idle_cpu_init() is now invoked via the hotplug
callback which runs on the target CPU. The IRQ-off calling convention for
auto_demotion_disable() and c1e_promotion_disable() has not been preserved
because only those two modify the MSR during CPU intialization.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Since commit 1cf4f629d9 ("cpu/hotplug: Move online calls to
hotplugged cpu") the CPU_ONLINE and CPU_DOWN_PREPARE notifiers are
always run on the hot plugged CPU, and as of commit 3b9d6da67e
("cpu/hotplug: Fix rollback during error-out in __cpu_disable()") the
CPU_DOWN_FAILED notifier also runs on the hot plugged CPU. This patch
converts the SMP functional calls into direct calls.
smp_function_call_single() executes the function with interrupts
disabled. This calling convention is not preserved, because
tick_broadcast_enable() and tick_braodcast_disable() handle
interrupts themselves.
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Add Knights Mill (KNM) to the list of CPUIDs supported by intel_idle.
Signed-off-by: Piotr Luc <piotr.luc@intel.com>
Reviewed-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Add CPU ID for Atom Z34xx processors. Datasheets indicate support for this,
detailed information about potential quirks or limitations are missing, though.
So we just reuse the definition from official BSP code.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
When doing an nmi backtrace of many cores, most of which are idle, the
output is a little overwhelming and very uninformative. Suppress
messages for cpus that are idling when they are interrupted and just
emit one line, "NMI backtrace for N skipped: idling at pc 0xNNN".
We do this by grouping all the cpuidle code together into a new
.cpuidle.text section, and then checking the address of the interrupted
PC to see if it lies within that section.
This commit suitably tags x86 and tile idle routines, and only adds in
the minimal framework for other architectures.
Link: http://lkml.kernel.org/r/1472487169-14923-5-git-send-email-cmetcalf@mellanox.com
Signed-off-by: Chris Metcalf <cmetcalf@mellanox.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Daniel Thompson <daniel.thompson@linaro.org> [arm]
Tested-by: Petr Mladek <pmladek@suse.com>
Cc: Aaron Tomlin <atomlin@redhat.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>