Commit Graph

280 Commits

Author SHA1 Message Date
Artem Bityutskiy
a78032e94b intel_idle: clean up intel_idle_init_cstates_icpu()
The intel_idle_init_cstates_icpu() function includes a loop that iterates
over every C-state. Inside the loop, the same C-state data is referenced 2
ways:
 1. as cpuidle_state_table[cstate]
 2. as drv->states[drv->state_count] (but it is a copy of #1, not the same
    object).

Make the code be more consistent and easier to read by using only the 2nd
way. So the code structure would be as follows:

 1. Use cpuidle_state_table[cstate]
 2. Copy cpuidle_state_table[cstate] to drv->states[drv->state_count]
 3. Use only drv->states[drv->state_count] from this point.

Note, this change introduces a checkpatch.pl warning (too long line), but it
will be addressed in the next patch.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-04-27 19:37:36 +02:00
Artem Bityutskiy
91048ce422 intel_idle: use pr_info() instead of printk()
Substitute 'printk()' with 'pr_info()', because 'intel_idle' already uses
'pr_debug()', so using 'pr_info()' will be more consistent.

In addition to this, this patch addresses  the following checkpatch.pl
warning:
  WARNING: printk() should include KERN_<LEVEL> facility level

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-04-27 19:37:36 +02:00
Linus Torvalds
2504ba8b01 Power management updates for 6.3-rc1
- Add EPP support to the AMD P-state cpufreq driver (Perry Yuan, Wyes
    Karny, Arnd Bergmann, Bagas Sanjaya).
 
  - Drop the custom cpufreq driver for loongson1 that is not necessary
    any more and the corresponding cpufreq platform device (Keguang
    Zhang).
 
  - Remove "select SRCU" from system sleep, cpufreq and OPP Kconfig
    entries (Paul E. McKenney).
 
  - Enable thermal cooling for Tegra194 (Yi-Wei Wang).
 
  - Register module device table and add missing compatibles for
    cpufreq-qcom-hw (Nícolas F. R. A. Prado, Abel Vesa and Luca Weiss).
 
  - Various dt binding updates for qcom-cpufreq-nvmem and opp-v2-kryo-cpu
    (Christian Marangi).
 
  - Make kobj_type structure in the cpufreq core constant (Thomas
    Weißschuh).
 
  - Make cpufreq_unregister_driver() return void (Uwe Kleine-König).
 
  - Make the TEO cpuidle governor check CPU utilization in order to refine
   idle state selection (Kajetan Puchalski).
 
  - Make Kconfig select the haltpoll cpuidle governor when the haltpoll
    cpuidle driver is selected and replace a default_idle() call in that
    driver with arch_cpu_idle() to allow MWAIT to be used (Li RongQing).
 
  - Add Emerald Rapids Xeon support to the intel_idle driver (Artem
    Bityutskiy).
 
  - Add ARCH_SUSPEND_POSSIBLE dependencies for ARMv4 cpuidle drivers to
    avoid randconfig build failures (Arnd Bergmann).
 
  - Make kobj_type structures used in the cpuidle sysfs interface
    constant (Thomas Weißschuh).
 
  - Make the cpuidle driver registration code update microsecond values
    of idle state parameters in accordance with their nanosecond values
    if they are provided (Rafael Wysocki).
 
  - Make the PSCI cpuidle driver prevent topology CPUs from being
    suspended on PREEMPT_RT (Krzysztof Kozlowski).
 
  - Document that pm_runtime_force_suspend() cannot be used with
    DPM_FLAG_SMART_SUSPEND (Richard Fitzgerald).
 
  - Add EXPORT macros for exporting PM functions from drivers (Richard
    Fitzgerald).
 
  - Remove /** from non-kernel-doc comments in hibernation code (Randy
    Dunlap).
 
  - Fix possible name leak in powercap_register_zone() (Yang Yingliang).
 
  - Add Meteor Lake and Emerald Rapids support to the intel_rapl power
    capping driver (Zhang Rui).
 
  - Modify the idle_inject power capping facility to support 100% idle
    injection (Srinivas Pandruvada).
 
  - Fix large time windows handling in the intel_rapl power capping
    driver (Zhang Rui).
 
  - Fix memory leaks with using debugfs_lookup() in the generic PM
    domains and Energy Model code (Greg Kroah-Hartman).
 
  - Add missing 'cache-unified' property in the example for kryo OPP
    bindings (Rob Herring).
 
  - Fix error checking in opp_migrate_dentry() (Qi Zheng).
 
  - Let qcom,opp-fuse-level be a 2-long array for qcom SoCs (Konrad
    Dybcio).
 
  - Modify some power management utilities to use the canonical ftrace
    path (Ross Zwisler).
 
  - Correct spelling problems for Documentation/power/ as reported by
    codespell (Randy Dunlap).
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmPuJfMSHHJqd0Byand5
 c29ja2kubmV0AAoJEILEb/54YlRx/5kQAJNOVImLEPLerLP8xufw30//LuDU5Gi0
 STsyDOMql/I2MpkeqeCcgrSbpy6NlEglOvg16gfpQ3qqTCLF9ypENxs9E5BGGvW0
 aEdCzvaoqmvi9PCr/jmj0EPP70/U+rIX5m/k0QdjLh9x0aLoAEe3uRJTfR9QVqXf
 I7JX0N9kjKi7YxpA5DlkHrS7J7GPPiWlesJ3p4wXuHMo3jf+6fgkoPFt8yRrGWeh
 AHzGT2BLrsy7aAUjGZB65Qx9q3fnSXMmXOjmn0Xh2njQah+zRZDwrNzwoY2HTLL/
 KQ6/Ww16USYRZtCS1fmGwAj9I+ddq6AOvhPCMn0vLXXmKVAMUrVVWnQS/0+vpm9y
 suUMK9Tndkgxd1vjby2246ThJn27uDd/ERFan4ouQo2j22uICY+SDo3osj2hMXka
 wq4zthXkY8KgjZ+MuXnZxPhcOvo8KRvfxAU0fy5efQnSkbtwY9UlMvjPBMBHm/RA
 21/6kjQNtq5vMmI37oC8DH+oPrRQ7sUKuY7HNqwO9P3QNKWVmNe7cF5UtXXxME7Q
 ULvP1d+u+TNNdHFLryPwCSzBO34wQEccdRZBjalZ8tBe6JiDWUFHC3giSURZSuzZ
 GDvzVaNX6PkgToyv4inBTB8lTp6pAuUjaWNvNJzVvUXiEKHB0ihzg5vpJW5NdwlH
 15Tn8cjH7pp0
 =lZLx
 -----END PGP SIGNATURE-----

Merge tag 'pm-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management updates from Rafael Wysocki:
 "These add EPP support to the AMD P-state cpufreq driver, add support
  for new platforms to the Intel RAPL power capping driver, intel_idle
  and the Qualcomm cpufreq driver, enable thermal cooling for Tegra194,
  drop the custom cpufreq driver for loongson1 that is not necessary any
  more (and the corresponding cpufreq platform device), fix assorted
  issues and clean up code.

  Specifics:

   - Add EPP support to the AMD P-state cpufreq driver (Perry Yuan, Wyes
     Karny, Arnd Bergmann, Bagas Sanjaya)

   - Drop the custom cpufreq driver for loongson1 that is not necessary
     any more and the corresponding cpufreq platform device (Keguang
     Zhang)

   - Remove "select SRCU" from system sleep, cpufreq and OPP Kconfig
     entries (Paul E. McKenney)

   - Enable thermal cooling for Tegra194 (Yi-Wei Wang)

   - Register module device table and add missing compatibles for
     cpufreq-qcom-hw (Nícolas F. R. A. Prado, Abel Vesa and Luca Weiss)

   - Various dt binding updates for qcom-cpufreq-nvmem and
     opp-v2-kryo-cpu (Christian Marangi)

   - Make kobj_type structure in the cpufreq core constant (Thomas
     Weißschuh)

   - Make cpufreq_unregister_driver() return void (Uwe Kleine-König)

   - Make the TEO cpuidle governor check CPU utilization in order to
     refine idle state selection (Kajetan Puchalski)

   - Make Kconfig select the haltpoll cpuidle governor when the haltpoll
     cpuidle driver is selected and replace a default_idle() call in
     that driver with arch_cpu_idle() to allow MWAIT to be used (Li
     RongQing)

   - Add Emerald Rapids Xeon support to the intel_idle driver (Artem
     Bityutskiy)

   - Add ARCH_SUSPEND_POSSIBLE dependencies for ARMv4 cpuidle drivers to
     avoid randconfig build failures (Arnd Bergmann)

   - Make kobj_type structures used in the cpuidle sysfs interface
     constant (Thomas Weißschuh)

   - Make the cpuidle driver registration code update microsecond values
     of idle state parameters in accordance with their nanosecond values
     if they are provided (Rafael Wysocki)

   - Make the PSCI cpuidle driver prevent topology CPUs from being
     suspended on PREEMPT_RT (Krzysztof Kozlowski)

   - Document that pm_runtime_force_suspend() cannot be used with
     DPM_FLAG_SMART_SUSPEND (Richard Fitzgerald)

   - Add EXPORT macros for exporting PM functions from drivers (Richard
     Fitzgerald)

   - Remove /** from non-kernel-doc comments in hibernation code (Randy
     Dunlap)

   - Fix possible name leak in powercap_register_zone() (Yang Yingliang)

   - Add Meteor Lake and Emerald Rapids support to the intel_rapl power
     capping driver (Zhang Rui)

   - Modify the idle_inject power capping facility to support 100% idle
     injection (Srinivas Pandruvada)

   - Fix large time windows handling in the intel_rapl power capping
     driver (Zhang Rui)

   - Fix memory leaks with using debugfs_lookup() in the generic PM
     domains and Energy Model code (Greg Kroah-Hartman)

   - Add missing 'cache-unified' property in the example for kryo OPP
     bindings (Rob Herring)

   - Fix error checking in opp_migrate_dentry() (Qi Zheng)

   - Let qcom,opp-fuse-level be a 2-long array for qcom SoCs (Konrad
     Dybcio)

   - Modify some power management utilities to use the canonical ftrace
     path (Ross Zwisler)

   - Correct spelling problems for Documentation/power/ as reported by
     codespell (Randy Dunlap)"

* tag 'pm-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (53 commits)
  Documentation: amd-pstate: disambiguate user space sections
  cpufreq: amd-pstate: Fix invalid write to MSR_AMD_CPPC_REQ
  dt-bindings: opp: opp-v2-kryo-cpu: enlarge opp-supported-hw maximum
  dt-bindings: cpufreq: qcom-cpufreq-nvmem: make cpr bindings optional
  dt-bindings: cpufreq: qcom-cpufreq-nvmem: specify supported opp tables
  PM: Add EXPORT macros for exporting PM functions
  cpuidle: psci: Do not suspend topology CPUs on PREEMPT_RT
  MIPS: loongson32: Drop obsolete cpufreq platform device
  powercap: intel_rapl: Fix handling for large time window
  cpuidle: driver: Update microsecond values of state parameters as needed
  cpuidle: sysfs: make kobj_type structures constant
  cpuidle: add ARCH_SUSPEND_POSSIBLE dependencies
  PM: EM: fix memory leak with using debugfs_lookup()
  PM: domains: fix memory leak with using debugfs_lookup()
  cpufreq: Make kobj_type structure constant
  cpufreq: davinci: Fix clk use after free
  cpufreq: amd-pstate: avoid uninitialized variable use
  cpufreq: Make cpufreq_unregister_driver() return void
  OPP: fix error checking in opp_migrate_dentry()
  dt-bindings: cpufreq: cpufreq-qcom-hw: Add SM8550 compatible
  ...
2023-02-21 12:13:58 -08:00
Artem Bityutskiy
74528edfbc intel_idle: add Emerald Rapids Xeon support
Emerald Rapids (EMR) is the next Intel Xeon processor after Sapphire
Rapids (SPR).

EMR C-states are the same as SPR C-states, and we expect that EMR
C-state characteristics (latency and target residency) will be the
same as in SPR. Therefore, add EMR support by using SPR C-states table.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-01-20 18:39:42 +01:00
Peter Zijlstra
365bd03ff6 intel_idle: Add force_irq_on module param
For testing purposes.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Tony Lindgren <tony@atomide.com>
Tested-by: Ulf Hansson <ulf.hansson@linaro.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20230112195541.967699392@infradead.org
2023-01-13 11:48:17 +01:00
Peter Zijlstra
9b461a6faa cpuidle, intel_idle: Fix CPUIDLE_FLAG_IBRS
objtool to the rescue:

  vmlinux.o: warning: objtool: intel_idle_ibrs+0x17: call to spec_ctrl_current() leaves .noinstr.text section
  vmlinux.o: warning: objtool: intel_idle_ibrs+0x27: call to wrmsrl.constprop.0() leaves .noinstr.text section

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Tony Lindgren <tony@atomide.com>
Tested-by: Ulf Hansson <ulf.hansson@linaro.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20230112195540.556912863@infradead.org
2023-01-13 11:48:15 +01:00
Peter Zijlstra
6d9c7f51b1 cpuidle, intel_idle: Fix CPUIDLE_FLAG_IRQ_ENABLE *again*
So objtool found this bug:

  vmlinux.o: warning: objtool: intel_idle_irq+0x10c: call to trace_hardirqs_off() leaves .noinstr.text section

As per commit 32d4fd5751 ("cpuidle,intel_idle: Fix CPUIDLE_FLAG_IRQ_ENABLE"):

  "must not have tracing in idle functions"

Clearly people can't read and tinker along until splat dissapears.
This straight up reverts commit d295ad34f2 ("intel_idle: Fix false
positive RCU splats due to incorrect hardirqs state").

It doesn't re-introduce the problem because preceding patches fixed it
properly.

Fixes: d295ad34f2 ("intel_idle: Fix false positive RCU splats due to incorrect hardirqs state")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Tony Lindgren <tony@atomide.com>
Tested-by: Ulf Hansson <ulf.hansson@linaro.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20230112195540.434302128@infradead.org
2023-01-13 11:48:15 +01:00
Zhang Rui
65c0c2367e intel_idle: Add AlderLake-N support
Similar to the other other AlderLake platforms, the C1 and C1E states on
ADL-N are mutually exclusive. Only one of them can be enabled at a time.

C1E is preferred on ADL-N for better energy efficiency.

C6S is also supported on this platform. Its latency is far bigger than
C6, but really close to C8 (PC8), thus it is not exposed as a separate
state.

Suggested-by: Baieswara Reddy Sagili <baieswara.reddy.sagili@intel.com>
Suggested-by: Vinay Kumar <vinay.kumar@intel.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
[ rjw: Changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-09-21 20:33:49 +02:00
Wolfram Sang
0dbc0f49d6 intel_idle: move from strlcpy() with unused retval to strscpy()
Follow the advice of the below link and prefer 'strscpy' in this
subsystem. Conversion is 1:1 because the return value is not used.
Generated by a coccinelle script.

Link: https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmknw@mail.gmail.com/
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-08-31 21:12:03 +02:00
Linus Torvalds
a771ea6413 Power management updates for 5.20-rc1
- Make cpufreq_show_cpus() more straightforward (Viresh Kumar).
 
  - Drop unnecessary CPU hotplug locking from store() used by cpufreq
    sysfs attributes (Viresh Kumar).
 
  - Make the ACPI cpufreq driver support the boost control interface on
    Zhaoxin/Centaur processors (Tony W Wang-oc).
 
  - Print a warning message on attempts to free an active cpufreq policy
    which should never happen (Viresh Kumar).
 
  - Fix grammar in the Kconfig help text for the loongson2 cpufreq
    driver (Randy Dunlap).
 
  - Use cpumask_var_t for an on-stack CPU mask in the ondemand cpufreq
    governor (Zhao Liu).
 
  - Add trace points for guest_halt_poll_ns grow/shrink to the haltpoll
    cpuidle driver (Eiichi Tsukata).
 
  - Modify intel_idle to treat C1 and C1E as independent idle states on
    Sapphire Rapids (Artem Bityutskiy).
 
  - Extend support for wakeirq to callback wrappers used during system
    suspend and resume (Ulf Hansson).
 
  - Defer waiting for device probe before loading a hibernation image
    till the first actual device access to avoid possible deadlocks
    reported by syzbot (Tetsuo Handa).
 
  - Unify device_init_wakeup() for PM_SLEEP and !PM_SLEEP (Bjorn
    Helgaas).
 
  - Add Raptor Lake-P to the list of processors supported by the Intel
    RAPL driver (George D Sworo).
 
  - Add Alder Lake-N and Raptor Lake-P to the list of processors for
    which Power Limit4 is supported in the Intel RAPL driver (Sumeet
    Pawnikar).
 
  - Make pm_genpd_remove() check genpd_debugfs_dir against NULL before
    attempting to remove it (Hsin-Yi Wang).
 
  - Change the Energy Model code to represent power in micro-Watts and
    adjust its users accordingly (Lukasz Luba).
 
  - Add new devfreq driver for Mediatek CCI (Cache Coherent
    Interconnect) (Johnson Wang).
 
  - Convert the Samsung Exynos SoC Bus bindings to DT schema of
    exynos-bus.c (Krzysztof Kozlowski).
 
  - Address kernel-doc warnings by adding the description for unused
    fucntion parameters in devfreq core (Mauro Carvalho Chehab).
 
  - Use NULL to pass a null pointer rather than zero according to the
    function propotype in imx-bus.c (Colin Ian King).
 
  - Print error message instead of error interger value in
    tegra30-devfreq.c (Dmitry Osipenko).
 
  - Add checks to prevent setting negative frequency QoS limits for
    CPUs (Shivnandan Kumar).
 
  - Update the pm-graph suite of utilities to the latest revision 5.9
    including multiple improvements (Todd Brandt).
 
  - Drop pme_interrupt reference from the PCI power management
    documentation (Mario Limonciello).
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmLoKy8SHHJqd0Byand5
 c29ja2kubmV0AAoJEILEb/54YlRx3+oQAJNVU+W14EaRPWXQRMuwBC5zk3hb6T9q
 JqmMd8coEd+9/4ABAeRAWso1B26rUzB6JyBvw3lGH9OXInpYmvnJEhEPrTpK2h0D
 U9HxEARuGJolrDm0X9NAkn7tKKMC9GnvPS9z2s7s+N97VFFWC/QiU+PHB0SypGNb
 JxRfbVJZQCuxmNG9UeK+xeHFQ9lM2Z9ZdTxR71G0n7nQPPR+sUvnFufFby3Aogf3
 XnBYfia+YNqkUlefxxwB5a0cFwPXOUGsQkIf4d64gZnq1TgZ+71kht1GEF08PDFl
 wV8v1rOWuXEae8dozuf5xszp/eVyAqzgB+IShT9APREOO3Wg6I16XdBm8R1TGwCK
 JTdZqnm6HVKBNqchEwYViJILX69rrNUT+AwHBWhtKKDNh3qeTuwi/JGTeDVN++en
 xf3TNKx3LV31Nq6nWJFzDGLehfZMnAPkhfYohUBI7FNyblpk4mJRVcZ0bYI7UNnS
 als77uoipvb5KdFCtdhxYBHd/y867NvXKa1qsAuDxusAsfJHf4SnlMdbgOepBH2y
 jJg06CGrMDU3TZ8BL+WpqUYk4irQnAMs/159Txh7A6/dOnTjE7S9NHrENCwmt2og
 FrHSLH1eLX6Sa4RSibiGHPC7mNULP2/TOtryf3zFdlIVcjm3NEU3bnfzx7nlJn05
 8t6ObMxgMhWT
 =XeLV
 -----END PGP SIGNATURE-----

Merge tag 'pm-5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management updates from Rafael Wysocki:
 "These are mostly minor improvements all over including new CPU IDs for
  the Intel RAPL driver, an Energy Model rework to use micro-Watt as the
  power unit, cpufreq fixes and cleanus, cpuidle updates, devfreq
  updates, documentation cleanups and a new version of the pm-graph
  suite of utilities.

  Specifics:

   - Make cpufreq_show_cpus() more straightforward (Viresh Kumar).

   - Drop unnecessary CPU hotplug locking from store() used by cpufreq
     sysfs attributes (Viresh Kumar).

   - Make the ACPI cpufreq driver support the boost control interface on
     Zhaoxin/Centaur processors (Tony W Wang-oc).

   - Print a warning message on attempts to free an active cpufreq
     policy which should never happen (Viresh Kumar).

   - Fix grammar in the Kconfig help text for the loongson2 cpufreq
     driver (Randy Dunlap).

   - Use cpumask_var_t for an on-stack CPU mask in the ondemand cpufreq
     governor (Zhao Liu).

   - Add trace points for guest_halt_poll_ns grow/shrink to the haltpoll
     cpuidle driver (Eiichi Tsukata).

   - Modify intel_idle to treat C1 and C1E as independent idle states on
     Sapphire Rapids (Artem Bityutskiy).

   - Extend support for wakeirq to callback wrappers used during system
     suspend and resume (Ulf Hansson).

   - Defer waiting for device probe before loading a hibernation image
     till the first actual device access to avoid possible deadlocks
     reported by syzbot (Tetsuo Handa).

   - Unify device_init_wakeup() for PM_SLEEP and !PM_SLEEP (Bjorn
     Helgaas).

   - Add Raptor Lake-P to the list of processors supported by the Intel
     RAPL driver (George D Sworo).

   - Add Alder Lake-N and Raptor Lake-P to the list of processors for
     which Power Limit4 is supported in the Intel RAPL driver (Sumeet
     Pawnikar).

   - Make pm_genpd_remove() check genpd_debugfs_dir against NULL before
     attempting to remove it (Hsin-Yi Wang).

   - Change the Energy Model code to represent power in micro-Watts and
     adjust its users accordingly (Lukasz Luba).

   - Add new devfreq driver for Mediatek CCI (Cache Coherent
     Interconnect) (Johnson Wang).

   - Convert the Samsung Exynos SoC Bus bindings to DT schema of
     exynos-bus.c (Krzysztof Kozlowski).

   - Address kernel-doc warnings by adding the description for unused
     function parameters in devfreq core (Mauro Carvalho Chehab).

   - Use NULL to pass a null pointer rather than zero according to the
     function propotype in imx-bus.c (Colin Ian King).

   - Print error message instead of error interger value in
     tegra30-devfreq.c (Dmitry Osipenko).

   - Add checks to prevent setting negative frequency QoS limits for
     CPUs (Shivnandan Kumar).

   - Update the pm-graph suite of utilities to the latest revision 5.9
     including multiple improvements (Todd Brandt).

   - Drop pme_interrupt reference from the PCI power management
     documentation (Mario Limonciello)"

* tag 'pm-5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (27 commits)
  powercap: RAPL: Add Power Limit4 support for Alder Lake-N and Raptor Lake-P
  PM: QoS: Add check to make sure CPU freq is non-negative
  PM: hibernate: defer device probing when resuming from hibernation
  intel_idle: make SPR C1 and C1E be independent
  cpufreq: ondemand: Use cpumask_var_t for on-stack cpu mask
  cpufreq: loongson2: fix Kconfig "its" grammar
  pm-graph v5.9
  cpufreq: Warn users while freeing active policy
  cpufreq: scmi: Support the power scale in micro-Watts in SCMI v3.1
  firmware: arm_scmi: Get detailed power scale from perf
  Documentation: EM: Switch to micro-Watts scale
  PM: EM: convert power field to micro-Watts precision and align drivers
  PM / devfreq: tegra30: Add error message for devm_devfreq_add_device()
  PM / devfreq: imx-bus: use NULL to pass a null pointer rather than zero
  PM / devfreq: shut up kernel-doc warnings
  dt-bindings: interconnect: samsung,exynos-bus: convert to dtschema
  PM / devfreq: mediatek: Introduce MediaTek CCI devfreq driver
  dt-bindings: interconnect: Add MediaTek CCI dt-bindings
  PM: domains: Ensure genpd_debugfs_dir exists before remove
  PM: runtime: Extend support for wakeirq for force_suspend|resume
  ...
2022-08-02 11:17:00 -07:00
Linus Torvalds
650ea1f626 - Add machinery to initialize AMX register state in order for CPUs to
be able to enter deeper low-power state
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmLnsksACgkQEsHwGGHe
 VUpOOw//WAfkouWFd7kmACSiWtkgEQfXgImhhM7tw5Zzks+aEMtL2RrKqFYzkFg5
 hJK+lMI8QDkBFU/bgI/nAZfFiAS7iBMPY4T2Uw4+jZCPLr3TmUheJ2Pe1CxlIzQC
 MfjXQm/j5uTZcB2jEORjPT5dVE3p6k1KpSbvf5ZKCc9YTwdylv3VeYcfv5WEkihR
 61bWU+T7Yse4A3Bx32ewabLmk7lwOcdS1vbfsqdvkpI1vE1gI8CThgTuNAt8JWij
 27GIxiF2BQkyw3d/IPt3wGIPOgVowISXWdtMgpCr17Mw1m+44vXG9cjSuAKfqAUY
 wNXrBzirdqzJgN85WVJEFIoJasFJicrz/oNLYbcHQa8+AruRu6in22cSkPYPvVGc
 iNgSlQOZdoY9Vl6izEV4OawCccYnKjskEW7nEVIqfENrwRPYWB/IAnGxkla7q3Ch
 q+T8dyOAWToumuPK13c5VoX0nd02bfwSJACYRxN+M22zq8s7+Jv1fNtQeAGLnmD1
 jG3HR0wJWBOVVyira7AbFI7Mx667HayslIesftEGU33FfY0gZTcwZ7jsZ9GTSyOi
 AgHN3PvHyJYQ648T8JzbyuNJe3dyDKf81OLaPHP6+nV9Dy3aCrERTML0jo8xWv2N
 rDA61BV/q+hdQS3vzmLRVPzLLZksGRNCS2ZzIbkR4dGxLQAAB2M=
 =w/wH
 -----END PGP SIGNATURE-----

Merge tag 'x86_fpu_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fpu update from Borislav Petkov:

 - Add machinery to initialize AMX register state in order for
   AMX-capable CPUs to be able to enter deeper low-power state

* tag 'x86_fpu_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  intel_idle: Add a new flag to initialize the AMX state
  x86/fpu: Add a helper to prepare AMX state for low-power CPU idle
2022-08-01 09:36:18 -07:00
Rafael J. Wysocki
f611b33af2 Merge back cpuidle material for 5.20. 2022-07-29 17:15:30 +02:00
Artem Bityutskiy
1548fac47a intel_idle: make SPR C1 and C1E be independent
This patch partially reverts the changes made by the following commit:

da0e58c038 intel_idle: add 'preferred_cstates' module argument

As that commit describes, on early Sapphire Rapids Xeon platforms the C1 and
C1E states were mutually exclusive, so that users could only have either C1 and
C6, or C1E and C6.

However, Intel firmware engineers managed to remove this limitation and make C1
and C1E to be completely independent, just like on previous Xeon platforms.

Therefore, this patch:
 * Removes commentary describing the old, and now non-existing SPR C1E
   limitation.
 * Marks SPR C1E as available by default.
 * Removes the 'preferred_cstates' parameter handling for SPR. Both C1 and
   C1E will be available regardless of 'preferred_cstates' value.

We expect that all SPR systems are shipping with new firmware, which includes
the C1/C1E improvement.

Cc: v5.18+ <stable@vger.kernel.org> # v5.18+
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-07-25 20:40:29 +02:00
Waiman Long
d295ad34f2 intel_idle: Fix false positive RCU splats due to incorrect hardirqs state
Commit 32d4fd5751 ("cpuidle,intel_idle: Fix CPUIDLE_FLAG_IRQ_ENABLE")
uses raw_local_irq_enable/local_irq_disable() around call to
__intel_idle() in intel_idle_irq().

With interrupt enabled, timer tick interrupt can happen and a
subsequently call to __do_softirq() may change the lockdep hardirqs state
of a debug kernel back to 'on'. This will result in a mismatch between
the cpu hardirqs state (off) and the lockdep hardirqs state (on) causing
a number of false positive "WARNING: suspicious RCU usage" splats.

Fix that by using local_irq_disable() to disable interrupt in
intel_idle_irq().

Fixes: 32d4fd5751 ("cpuidle,intel_idle: Fix CPUIDLE_FLAG_IRQ_ENABLE")
Signed-off-by: Waiman Long <longman@redhat.com>
Cc: 5.16+ <stable@vger.kernel.org> # 5.16+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-07-25 19:39:07 +02:00
Chang S. Bae
9f01129382 intel_idle: Add a new flag to initialize the AMX state
The non-initialized AMX state can be the cause of C-state demotion from C6
to C1E. This low-power idle state may improve power savings and thus result
in a higher available turbo frequency budget.

This behavior is implementation-specific. Initialize the state for the C6
entrance of Sapphire Rapids as needed.

Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Link: https://lkml.kernel.org/r/20220614164116.5196-1-chang.seok.bae@intel.com
2022-07-19 19:17:28 +02:00
Peter Zijlstra
bf5835bcdb intel_idle: Disable IBRS during long idle
Having IBRS enabled while the SMT sibling is idle unnecessarily slows
down the running sibling. OTOH, disabling IBRS around idle takes two
MSR writes, which will increase the idle latency.

Therefore, only disable IBRS around deeper idle states. Shallow idle
states are bounded by the tick in duration, since NOHZ is not allowed
for them by virtue of their short target residency.

Only do this for mwait-driven idle, since that keeps interrupts disabled
across idle, which makes disabling IBRS vs IRQ-entry a non-issue.

Note: C6 is a random threshold, most importantly C1 probably shouldn't
disable IBRS, benchmarking needed.

Suggested-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-27 10:33:59 +02:00
Peter Zijlstra
32d4fd5751 cpuidle,intel_idle: Fix CPUIDLE_FLAG_IRQ_ENABLE
Commit c227233ad6 ("intel_idle: enable interrupts before C1 on
Xeons") wrecked intel_idle in two ways:

 - must not have tracing in idle functions
 - must return with IRQs disabled

Additionally, it added a branch for no good reason.

Fixes: c227233ad6 ("intel_idle: enable interrupts before C1 on Xeons")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
[ rjw: Moved the intel_idle() kerneldoc comment next to the function ]
Cc: 5.16+ <stable@vger.kernel.org> # 5.16+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-06-08 18:05:45 +02:00
Zhang Rui
d1cf8bbfed intel_idle: Add AlderLake support
Similar to SPR, the C1 and C1E states on ADL are mutually exclusive.
Only one of them can be enabled at a time.

But contrast to SPR, which usually has a strong latency requirement
as a Xeon processor, C1E is preferred on ADL for better energy
efficiency.

Add custom C-state tables for ADL with both C1 and C1E, and

 1. Enable the "C1E promotion" bit in MSR_IA32_POWER_CTL and mark C1
    with the CPUIDLE_FLAG_UNUSABLE flag, so C1 is not available by
    default.

 2. Add support for the "preferred_cstates" module parameter, so that
    users can choose to use C1 instead of C1E by booting with
    "intel_idle.preferred_cstates=2".

Separate custom C-state tables are introduced for the ADL mobile and
desktop processors, because of the exit latency differences between
these two variants, especially with respect to PC10.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
[ rjw: Changelog edits, code rearrangement ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-04-28 14:50:09 +02:00
Artem Bityutskiy
7eac3bd38d intel_idle: Fix SPR C6 optimization
The Sapphire Rapids (SPR) C6 optimization was added to the end of the
'spr_idle_state_table_update()' function. However, the function has a
'return' which may happen before the optimization has a chance to run.
And this may prevent the optimization from happening.

This is an unlikely scenario, but possible if user boots with, say,
the 'intel_idle.preferred_cstates=6' kernel boot option.

This patch fixes the issue by eliminating the problematic 'return'
statement.

Fixes: 3a9cf77b60 ("intel_idle: add core C6 optimization for SPR")
Suggested-by: Jan Beulich <jbeulich@suse.com>
Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
[ rjw: Minor changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-04-27 20:36:47 +02:00
Artem Bityutskiy
39c184a6a9 intel_idle: Fix the 'preferred_cstates' module parameter
Problem description.

When user boots kernel up with the 'intel_idle.preferred_cstates=4' option,
we enable C1E and disable C1 states on Sapphire Rapids Xeon (SPR). In order
for C1E to work on SPR, we have to enable the C1E promotion bit on all
CPUs.  However, we enable it only on one CPU.

Fix description.

The 'intel_idle' driver already has the infrastructure for disabling C1E
promotion on every CPU. This patch uses the same infrastructure for
enabling C1E promotion on every CPU. It changes the boolean
'disable_promotion_to_c1e' variable to a tri-state 'c1e_promotion'
variable.

Tested on a 2-socket SPR system. I verified the following combinations:

 * C1E promotion enabled and disabled in BIOS.
 * Booted with and without the 'intel_idle.preferred_cstates=4' kernel
   argument.

In all 4 cases C1E promotion was correctly set on all CPUs.

Also tested on an old Broadwell system, just to make sure it does not cause
a regression. C1E promotion was correctly disabled on that system, both C1
and C1E were exposed (as expected).

Fixes: da0e58c038 ("intel_idle: add 'preferred_cstates' module argument")
Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
[ rjw: Minor changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-04-27 20:36:47 +02:00
Rafael J. Wysocki
03eb65224e cpuidle: intel_idle: Drop redundant backslash at line end
Drop a redundant backslash character at the end of a line in the
spr_cstates[] definition.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
2022-03-17 14:32:59 +01:00
Rafael J. Wysocki
a335b1e6bb cpuidle: intel_idle: Update intel_idle() kerneldoc comment
Commit bf9282dc26 ("cpuidle: Make CPUIDLE_FLAG_TLB_FLUSHED generic")
moved the leave_mm() call away from intel_idle(), but it didn't update
its kerneldoc comment accordingly, so do that now.

Fixes: bf9282dc26 ("cpuidle: Make CPUIDLE_FLAG_TLB_FLUSHED generic")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-03-17 14:32:59 +01:00
Artem Bityutskiy
3a9cf77b60 intel_idle: add core C6 optimization for SPR
Add a Sapphire Rapids Xeon C6 optimization, similar to what we have for Sky Lake
Xeon: if package C6 is disabled, adjust C6 exit latency and target residency to
match core C6 values, instead of using the default package C6 values.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-03-04 19:54:32 +01:00
Artem Bityutskiy
da0e58c038 intel_idle: add 'preferred_cstates' module argument
On Sapphire Rapids Xeon (SPR) the C1 and C1E states are basically mutually
exclusive - only one of them can be enabled. By default, 'intel_idle' driver
enables C1 and disables C1E. However, some users prefer to use C1E instead of
C1, because it saves more energy.

This patch adds a new module parameter ('preferred_cstates') for enabling C1E
and disabling C1. Here is the idea behind it.

1. This option has effect only for "mutually exclusive" C-states like C1 and
   C1E on SPR.
2. It does not have any effect on independent C-states, which do not require
   other C-states to be disabled (most states on most platforms as of today).
3. For mutually exclusive C-states, the 'intel_idle' driver always has a
   reasonable default, such as enabling C1 on SPR by default. On other
   platforms, the default may be different.
4. Users can override the default using the 'preferred_cstates' parameter.
5. The parameter accepts the preferred C-states bit-mask, similarly to the
   existing 'states_off' parameter.
6. This parameter is not limited to C1/C1E, and leaves room for supporting
   other mutually exclusive C-states, if they come in the future.

Today 'intel_idle' can only be compiled-in, which means that on SPR, in order
to disable C1 and enable C1E, users should boot with the following kernel
argument: intel_idle.preferred_cstates=4

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-03-04 19:54:32 +01:00
Artem Bityutskiy
9edf3c0ffe intel_idle: add SPR support
Add Sapphire Rapids Xeon support.

Up until very recently, the C1 and C1E C-states were independent, but this
has changed in some new chips, including Sapphire Rapids Xeon (SPR). In these
chips the C1 and C1E states cannot be enabled at the same time. The "C1E
promotion" bit in 'MSR_IA32_POWER_CTL' also has its semantics changed a bit.

Here are the C1, C1E, and "C1E promotion" bit rules on Xeons before SPR.

1. If C1E promotion bit is disabled.
   a. C1  requests end up with C1  C-state.
   b. C1E requests end up with C1E C-state.
2. If C1E promotion bit is enabled.
   a. C1  requests end up with C1E C-state.
   b. C1E requests end up with C1E C-state.

Here are the C1, C1E, and "C1E promotion" bit rules on Sapphire Rapids Xeon.
1. If C1E promotion bit is disabled.
   a. C1  requests end up with C1 C-state.
   b. C1E requests end up with C1 C-state.
2. If C1E promotion bit is enabled.
   a. C1  requests end up with C1E C-state.
   b. C1E requests end up with C1E C-state.

Before SPR Xeon, the 'intel_idle' driver was disabling C1E promotion and was
exposing C1 and C1E as independent C-states. But on SPR, C1 and C1E cannot be
enabled at the same time.

This patch adds both C1 and C1E states. However, C1E is marked as with the
"CPUIDLE_FLAG_UNUSABLE" flag, which means that in won't be registered by
default. The C1E promotion bit will be cleared, which means that by default
only C1 and C6 will be registered on SPR.

The next patch will add an option for enabling C1E and disabling C1 on SPR.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-03-04 19:54:31 +01:00
Artem Bityutskiy
c227233ad6 intel_idle: enable interrupts before C1 on Xeons
Enable local interrupts before requesting C1 on the last two generations
of Intel Xeon platforms: Sky Lake, Cascade Lake, Cooper Lake, Ice Lake.
This decreases average C1 interrupt latency by about 5-10%, as measured
with the 'wult' tool.

The '->enter()' function of the driver enters C-states with local
interrupts disabled by executing the 'monitor' and 'mwait' pair of
instructions. If an interrupt happens, the CPU exits the C-state and
continues executing instructions after 'mwait'. It does not jump to
the interrupt handler, because local interrupts are disabled. The
cpuidle subsystem enables interrupts a bit later, after doing some
housekeeping.

With this patch, we enable local interrupts before requesting C1. In
this case, if the CPU wakes up because of an interrupt, it will jump
to the interrupt handler right away. The cpuidle housekeeping will be
done after the pending interrupt(s) are handled.

Enabling interrupts before entering a C-state has measurable impact
for faster C-states, like C1. Deeper, but slower C-states like C6 do
not really benefit from this sort of change, because their latency is
a lot higher comparing to the delay added by cpuidle housekeeping.

This change was also tested with cyclictest and dbench. In case of Ice
Lake, the average cyclictest latency decreased by 5.1%, and the average
'dbench' throughput increased by about 0.8%. Both tests were run for 4
hours with only C1 enabled (all other idle states, including 'POLL',
were disabled). CPU frequency was pinned to HFM, and uncore frequency
was pinned to the maximum value. The other platforms had similar
single-digit percentage improvements.

It is worth noting that this patch affects 'cpuidle' statistics a tiny
bit.  Before this patch, C1 residency did not include the interrupt
handling time, but with this patch, it will include it. This is similar
to what happens in case of the 'POLL' state, which also runs with
interrupts enabled.

Suggested-by: Len Brown <len.brown@intel.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-09-24 18:44:18 +02:00
Chen Yu
6423333849 intel_idle: Adjust the SKX C6 parameters if PC6 is disabled
Because cpuidle assumes worst-case C-state parameters, PC6 parameters
are used for describing C6, which is worst-case for requesting CC6.
When PC6 is enabled, this is appropriate. But if PC6 is disabled
in the BIOS, the exit latency and target residency should be adjusted
accordingly.

Exit latency:
Previously the C6 exit latency was measured as the PC6 exit latency.
With PC6 disabled, the C6 exit latency should be the one of CC6.

Target residency:
With PC6 disabled, the idle duration within [CC6, PC6) would make the
idle governor choose C1E over C6. This would cause low energy-efficiency.
We should lower the bar to request C6 when PC6 is disabled.

To fill this gap, check if PC6 is disabled in the BIOS in the
MSR_PKG_CST_CONFIG_CONTROL(0xe2) register. If so, use the CC6 exit latency
for C6 and set target_residency to 3 times of the new exit latency. [This
is consistent with how intel_idle driver uses _CST to calculate the
target_residency.] As a result, the OS would be more likely to choose C6
over C1E when PC6 is disabled, which is reasonable, because if C6 is
enabled, it implies that the user cares about energy, so choosing C6 more
frequently makes sense.

The new CC6 exit latency of 92us was measured with wult[1] on SKX via NIC
wakeup as the 99.99th percentile. Also CLX and CPX both have the same CPU
model number as SkX, but their CC6 exit latencies are similar to the SKX
one, 96us and 89us respectively, so reuse the SKX value for them.

There is a concern that it might be better to use a more generic approach
instead of optimizing every platform. However, if the required code
complexity and different PC6 bit interpretation on different platforms
are taken into account, tuning the code per platform seems to be an
acceptable tradeoff.

Link: https://intel.github.io/wult/ # [1]
Suggested-by: Len Brown <len.brown@intel.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Reviewed-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-09 17:51:07 +02:00
Artem Bityutskiy
22141d5f41 intel_idle: add Iclelake-D support
This patch adds Icelake Xeon D support to the intel_idle driver.

Since Icelake D and Icelake SP C-state characteristics the same,
we use Icelake SP C-states table for Icelake D as well.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Acked-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-04-08 19:18:07 +02:00
Artem Bityutskiy
d484b8bfc6 intel_idle: update ICX C6 data
Change IceLake Xeon C6 latency from 128 us to 170 us. The latency
was measured with the "wult" tool and corresponds to the 99.99th
percentile when measuring with the "nic" method. Note, the 128 us
figure correspond to the median latency, but in intel_idle we use
the "worst case" latency figure instead.

C6 target residency was increased from 384 us to 600 us, which may
result in less C6 residency in some workloads. This value was tested
and compared to values 384, and 1000. Value 600 is a reasonable
tradeoff between power and performance.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Acked-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-03-18 19:36:34 +01:00
Tom Rix
651bc5816c intel_idle: remove definition of DEBUG
Defining DEBUG should only be done in development.
So remove DEBUG.

Signed-off-by: Tom Rix <trix@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-01-22 16:46:16 +01:00
Artem Bityutskiy
9cf93f056f intel_idle: add SnowRidge C-state table
Add C-state table for the SnowRidge SoC which is found on Intel Jacobsville
platforms.

The following has been changed.

 1. C1E latency changed from 10us to 15us. It was measured using the
    open source "wult" tool (the "nic" method, 15us is the 99.99th
    percentile).

 2. C1E power break even changed from 20us to 25us, which may result
    in less C1E residency in some workloads.

 3. C6 latency changed from 50us to 130us. Measured the same way as C1E.

The C6 C-state is supported only by some SnowRidge revisions, so add a C-state
table commentary about this.

On SnowRidge, C6 support is enumerated via the usual mechanism: "mwait" leaf of
the "cpuid" instruction. The 'intel_idle' driver does check this leaf, so even
though C6 is present in the table, the driver will only use it if the CPU does
support it.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-12-30 18:25:34 +01:00
Peter Zijlstra
4d916140bf intel_idle: Build fix
Because CONFIG_ soup.

Fixes: 6e1d2bc675 ("intel_idle: Fix intel_idle() vs tracing")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20201130115402.GO3040@hirez.programming.kicks-ass.net
2020-12-03 10:00:23 +01:00
Peter Zijlstra
6e1d2bc675 intel_idle: Fix intel_idle() vs tracing
cpuidle->enter() callbacks should not call into tracing because RCU
has already been disabled. Instead of doing the broadcast thing
itself, simply advertise to the cpuidle core that those states stop
the timer.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lkml.kernel.org/r/20201123143510.GR3021@hirez.programming.kicks-ass.net
2020-11-24 16:47:36 +01:00
Chen Yu
4e0ba5577d intel_idle: Fix max_cstate for processor models without C-state tables
Currently intel_idle driver gets the c-state information from ACPI
_CST if the processor model is not recognized by it. However the
c-state in _CST starts with index 1 which is different from the
index in intel_idle driver's internal c-state table.

While intel_idle_max_cstate_reached() was previously introduced to
deal with intel_idle driver's internal c-state table, re-using
this function directly on _CST is incorrect.

Fix this by subtracting 1 from the index when checking max_cstate
in the _CST case.

For example, append intel_idle.max_cstate=1 in boot command line,
Before the patch:
grep . /sys/devices/system/cpu/cpu0/cpuidle/state*/name
POLL
After the patch:
grep . /sys/devices/system/cpu/cpu0/cpuidle/state*/name
/sys/devices/system/cpu/cpu0/cpuidle/state0/name:POLL
/sys/devices/system/cpu/cpu0/cpuidle/state1/name:C1_ACPI

Fixes: 18734958e9 ("intel_idle: Use ACPI _CST for processor models without C-state tables")
Reported-by: Pengfei Xu <pengfei.xu@intel.com>
Cc: 5.6+ <stable@vger.kernel.org> # 5.6+
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
[ rjw: Changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-10-27 19:03:53 +01:00
Mel Gorman
75af76d0a3 intel_idle: Ignore _CST if control cannot be taken from the platform
e6d4f08a67 ("intel_idle: Use ACPI _CST on server systems") avoids
enabling c-states that have been disabled by the platform with the
exception of C1E.

Unfortunately, BIOS implementations are not always consistent in terms
of how capabilities are advertised and control cannot always be handed
over. If control cannot be handed over then intel_idle reports that "ACPI
_CST not found or not usable" but does not clear acpi_state_table.count
meaning the information is still partially used.

This patch ignores ACPI information if CST control cannot be requested from
the platform. This was only observed on a number of Haswell platforms that
had identical CPUs but not identical BIOS versions.  While this problem
may be rare overall, 24 separate test cases bisected to this specific
commit across 4 separate test machines and is worth addressing. If the
situation occurs, the kernel behaves as it did before commit e6d4f08a67
and uses any c-states that are discovered.

The affected test cases were all ones that involved a small number of
processes -- exec microbenchmark, pipe microbenchmark, git test suite,
netperf, tbench with one client and system call microbenchmark. Each
case benefits from being able to use turboboost which is prevented if the
lower c-states are unavailable. This may mask real regressions specific
to older hardware so it is worth addressing.

C-state status before and after the patch

5.9.0-vanilla            POLL     latency:0      disabled:0 default:enabled
5.9.0-vanilla            C1       latency:2      disabled:0 default:enabled
5.9.0-vanilla            C1E      latency:10     disabled:0 default:enabled
5.9.0-vanilla            C3       latency:33     disabled:1 default:disabled
5.9.0-vanilla            C6       latency:133    disabled:1 default:disabled
5.9.0-ignore-cst-v1r1    POLL     latency:0      disabled:0 default:enabled
5.9.0-ignore-cst-v1r1    C1       latency:2      disabled:0 default:enabled
5.9.0-ignore-cst-v1r1    C1E      latency:10     disabled:0 default:enabled
5.9.0-ignore-cst-v1r1    C3       latency:33     disabled:0 default:enabled
5.9.0-ignore-cst-v1r1    C6       latency:133    disabled:0 default:enabled

Patch enables C3/C6.

Netperf UDP_STREAM

netperf-udp
                                      5.5.0                  5.9.0
                                    vanilla        ignore-cst-v1r1
Hmean     send-64         193.41 (   0.00%)      226.54 *  17.13%*
Hmean     send-128        392.16 (   0.00%)      450.54 *  14.89%*
Hmean     send-256        769.94 (   0.00%)      881.85 *  14.53%*
Hmean     send-1024      2994.21 (   0.00%)     3468.95 *  15.85%*
Hmean     send-2048      5725.60 (   0.00%)     6628.99 *  15.78%*
Hmean     send-3312      8468.36 (   0.00%)    10288.02 *  21.49%*
Hmean     send-4096     10135.46 (   0.00%)    12387.57 *  22.22%*
Hmean     send-8192     17142.07 (   0.00%)    19748.11 *  15.20%*
Hmean     send-16384    28539.71 (   0.00%)    30084.45 *   5.41%*

Fixes: e6d4f08a67 ("intel_idle: Use ACPI _CST on server systems")
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Cc: 5.6+ <stable@vger.kernel.org> # 5.6+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-10-16 17:28:32 +02:00
Alexander Monakov
8bb2e2a887 intel_idle: mention assumption that WBINVD is not needed
Intel SDM does not explicitly say that entering a C-state via MWAIT will
implicitly flush CPU caches as appropriate for that C-state. However,
documentation for individual Intel CPU generations does mention this
behavior.

Since intel_idle binds to any Intel CPU with MWAIT, list this assumption
of MWAIT behavior.

In passing, reword opening comment to make it clear that the driver can
load on any old and future Intel CPU with MWAIT.

Signed-off-by: Alexander Monakov <amonakov@ispras.ru>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-10-16 17:09:14 +02:00
Peter Zijlstra
bf9282dc26 cpuidle: Make CPUIDLE_FLAG_TLB_FLUSHED generic
This allows moving the leave_mm() call into generic code before
rcu_idle_enter(). Gets rid of more trace_*_rcuidle() users.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Marco Elver <elver@google.com>
Link: https://lkml.kernel.org/r/20200821085348.369441600@infradead.org
2020-08-26 12:41:53 +02:00
Linus Torvalds
99ea1521a0 Remove uninitialized_var() macro for v5.9-rc1
- Clean up non-trivial uses of uninitialized_var()
 - Update documentation and checkpatch for uninitialized_var() removal
 - Treewide removal of uninitialized_var()
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAl8oYLQWHGtlZXNjb29r
 QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJsfjEACvf0D3WL3H7sLHtZ2HeMwOgAzq
 il08t6vUscINQwiIIK3Be43ok3uQ1Q+bj8sr2gSYTwunV2IYHFferzgzhyMMno3o
 XBIGd1E+v1E4DGBOiRXJvacBivKrfvrdZ7AWiGlVBKfg2E0fL1aQbe9AYJ6eJSbp
 UGqkBkE207dugS5SQcwrlk1tWKUL089lhDAPd7iy/5RK76OsLRCJFzIerLHF2ZK2
 BwvA+NWXVQI6pNZ0aRtEtbbxwEU4X+2J/uaXH5kJDszMwRrgBT2qoedVu5LXFPi8
 +B84IzM2lii1HAFbrFlRyL/EMueVFzieN40EOB6O8wt60Y4iCy5wOUzAdZwFuSTI
 h0xT3JI8BWtpB3W+ryas9cl9GoOHHtPA8dShuV+Y+Q2bWe1Fs6kTl2Z4m4zKq56z
 63wQCdveFOkqiCLZb8s6FhnS11wKtAX4czvXRXaUPgdVQS1Ibyba851CRHIEY+9I
 AbtogoPN8FXzLsJn7pIxHR4ADz+eZ0dQ18f2hhQpP6/co65bYizNP5H3h+t9hGHG
 k3r2k8T+jpFPaddpZMvRvIVD8O2HvJZQTyY6Vvneuv6pnQWtr2DqPFn2YooRnzoa
 dbBMtpon+vYz6OWokC5QNWLqHWqvY9TmMfcVFUXE4AFse8vh4wJ8jJCNOFVp8On+
 drhmmImUr1YylrtVOw==
 =xHmk
 -----END PGP SIGNATURE-----

Merge tag 'uninit-macro-v5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull uninitialized_var() macro removal from Kees Cook:
 "This is long overdue, and has hidden too many bugs over the years. The
  series has several "by hand" fixes, and then a trivial treewide
  replacement.

   - Clean up non-trivial uses of uninitialized_var()

   - Update documentation and checkpatch for uninitialized_var() removal

   - Treewide removal of uninitialized_var()"

* tag 'uninit-macro-v5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  compiler: Remove uninitialized_var() macro
  treewide: Remove uninitialized_var() usage
  checkpatch: Remove awareness of uninitialized_var() macro
  mm/debug_vm_pgtable: Remove uninitialized_var() usage
  f2fs: Eliminate usage of uninitialized_var() macro
  media: sur40: Remove uninitialized_var() usage
  KVM: PPC: Book3S PR: Remove uninitialized_var() usage
  clk: spear: Remove uninitialized_var() usage
  clk: st: Remove uninitialized_var() usage
  spi: davinci: Remove uninitialized_var() usage
  ide: Remove uninitialized_var() usage
  rtlwifi: rtl8192cu: Remove uninitialized_var() usage
  b43: Remove uninitialized_var() usage
  drbd: Remove uninitialized_var() usage
  x86/mm/numa: Remove uninitialized_var() usage
  docs: deprecated.rst: Add uninitialized_var()
2020-08-04 13:49:43 -07:00
Chen Yu
a472ad2bce intel_idle: Customize IceLake server support
On ICX platform, the C1E auto-promotion is enabled by default.
As a result, the CPU might fall into C1E more offen than previous
platforms. Besides, the C1E is not exposed to sysfs on ICX, which
is inconsistent with previous server platforms.

So disable C1E auto-promotion and expose C1E as a separate idle
state, so the C1E and C6 can be disabled via sysfs when necessary.

Beside C1 and C1E, the exit latency of C6 was measured
by a dedicated tool. However the exit latency(41us) exposed
by _CST is much smaller than the one we measured(128us). This
is probably due to the _CST uses the exit latency when woken
up from PC0+C6, rather than PC6+C6 when C6 was measured. Choose
the latter as we need the longest latency in theory.

Reported-by: kernel test robot <lkp@intel.com>
Tested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Acked-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-07-30 17:36:10 +02:00
Neal Liu
efe9711214 cpuidle: change enter_s2idle() prototype
Control Flow Integrity(CFI) is a security mechanism that disallows
changes to the original control flow graph of a compiled binary,
making it significantly harder to perform such attacks.

init_state_node() assign same function callback to different
function pointer declarations.

static int init_state_node(struct cpuidle_state *idle_state,
                           const struct of_device_id *matches,
                           struct device_node *state_node) { ...
        idle_state->enter = match_id->data; ...
        idle_state->enter_s2idle = match_id->data; }

Function declarations:

struct cpuidle_state { ...
        int (*enter) (struct cpuidle_device *dev,
                      struct cpuidle_driver *drv,
                      int index);

        void (*enter_s2idle) (struct cpuidle_device *dev,
                              struct cpuidle_driver *drv,
                              int index); };

In this case, either enter() or enter_s2idle() would cause CFI check
failed since they use same callee.

Align function prototype of enter() since it needs return value for
some use cases. The return value of enter_s2idle() is no
need currently.

Signed-off-by: Neal Liu <neal.liu@mediatek.com>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-07-29 18:38:30 +02:00
Kees Cook
3f649ab728 treewide: Remove uninitialized_var() usage
Using uninitialized_var() is dangerous as it papers over real bugs[1]
(or can in the future), and suppresses unrelated compiler warnings
(e.g. "unused variable"). If the compiler thinks it is uninitialized,
either simply initialize the variable or make compiler changes.

In preparation for removing[2] the[3] macro[4], remove all remaining
needless uses with the following script:

git grep '\buninitialized_var\b' | cut -d: -f1 | sort -u | \
	xargs perl -pi -e \
		's/\buninitialized_var\(([^\)]+)\)/\1/g;
		 s:\s*/\* (GCC be quiet|to make compiler happy) \*/$::g;'

drivers/video/fbdev/riva/riva_hw.c was manually tweaked to avoid
pathological white-space.

No outstanding warnings were found building allmodconfig with GCC 9.3.0
for x86_64, i386, arm64, arm, powerpc, powerpc64le, s390x, mips, sparc64,
alpha, and m68k.

[1] https://lore.kernel.org/lkml/20200603174714.192027-1-glider@google.com/
[2] https://lore.kernel.org/lkml/CA+55aFw+Vbj0i=1TGqCR5vQkCzWJ0QxK6CernOU6eedsudAixw@mail.gmail.com/
[3] https://lore.kernel.org/lkml/CA+55aFwgbgqhbp1fkxvRKEpzyR5J8n1vKT1VZdz9knmPuXhOeg@mail.gmail.com/
[4] https://lore.kernel.org/lkml/CA+55aFz2500WfbKXAx8s67wrm9=yVJu65TpLgN_ybYNv0VEOKA@mail.gmail.com/

Reviewed-by: Leon Romanovsky <leonro@mellanox.com> # drivers/infiniband and mlx4/mlx5
Acked-by: Jason Gunthorpe <jgg@mellanox.com> # IB
Acked-by: Kalle Valo <kvalo@codeaurora.org> # wireless drivers
Reviewed-by: Chao Yu <yuchao0@huawei.com> # erofs
Signed-off-by: Kees Cook <keescook@chromium.org>
2020-07-16 12:35:15 -07:00
Rafael J. Wysocki
dab20177b6 intel_idle: Eliminate redundant static variable
The value of the lapic_timer_always_reliable static variable in
the intel_idle driver reflects the boot_cpu_has(X86_FEATURE_ARAT)
value and so it also reflects the static_cpu_has(X86_FEATURE_ARAT)
value.

Hence, the lapic_timer_always_reliable check in intel_idle() is
redundant and apart from this lapic_timer_always_reliable is only
used in two places in which boot_cpu_has(X86_FEATURE_ARAT) can be
used directly.

Eliminate the lapic_timer_always_reliable variable in accordance
with the above observations.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-06-29 13:58:28 +02:00
Linus Torvalds
9b82f05f86 Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
 "The main changes in this cycle were:

  Kernel side changes:

   - A couple of x86/cpu cleanups and changes were grandfathered in due
     to patch dependencies. These clean up the set of CPU model/family
     matching macros with a consistent namespace and C99 initializer
     style.

   - A bunch of updates to various low level PMU drivers:
       * AMD Family 19h L3 uncore PMU
       * Intel Tiger Lake uncore support
       * misc fixes to LBR TOS sampling

   - optprobe fixes

   - perf/cgroup: optimize cgroup event sched-in processing

   - misc cleanups and fixes

  Tooling side changes are to:

   - perf {annotate,expr,record,report,stat,test}

   - perl scripting

   - libapi, libperf and libtraceevent

   - vendor events on Intel and S390, ARM cs-etm

   - Intel PT updates

   - Documentation changes and updates to core facilities

   - misc cleanups, fixes and other enhancements"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (89 commits)
  cpufreq/intel_pstate: Fix wrong macro conversion
  x86/cpu: Cleanup the now unused CPU match macros
  hwrng: via_rng: Convert to new X86 CPU match macros
  crypto: Convert to new CPU match macros
  ASoC: Intel: Convert to new X86 CPU match macros
  powercap/intel_rapl: Convert to new X86 CPU match macros
  PCI: intel-mid: Convert to new X86 CPU match macros
  mmc: sdhci-acpi: Convert to new X86 CPU match macros
  intel_idle: Convert to new X86 CPU match macros
  extcon: axp288: Convert to new X86 CPU match macros
  thermal: Convert to new X86 CPU match macros
  hwmon: Convert to new X86 CPU match macros
  platform/x86: Convert to new CPU match macros
  EDAC: Convert to new X86 CPU match macros
  cpufreq: Convert to new X86 CPU match macros
  ACPI: Convert to new X86 CPU match macros
  x86/platform: Convert to new CPU match macros
  x86/kernel: Convert to new CPU match macros
  x86/kvm: Convert to new CPU match macros
  x86/perf/events: Convert to new CPU match macros
  ...
2020-03-30 16:40:08 -07:00
Thomas Gleixner
4a9f45a053 intel_idle: Convert to new X86 CPU match macros
The new macro set has a consistent namespace and uses C99 initializers
instead of the grufty C89 ones.

Get rid the of the local macro wrappers for consistency.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lkml.kernel.org/r/20200320131510.193755545@linutronix.de
2020-03-24 21:34:32 +01:00
Rafael J. Wysocki
317e5ec3ec intel_idle: Update copyright notice, known limitations and version
Update the copyright notice in intel_idle.c to cover the recent
changes, drop the description of a "known limitation" that is not
a limitation any more and bump up the driver version number.

No functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-02-11 23:34:58 +01:00
Rafael J. Wysocki
a472e4b592 intel_idle: Define CPUIDLE_FLAG_TLB_FLUSHED as BIT(16)
Use the BIT() macro for defining CPUIDLE_FLAG_TLB_FLUSHED instead of
the hex bit encoding of the same value.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-02-11 23:34:58 +01:00
Rafael J. Wysocki
6eacb15fef intel_idle: Clean up kerneldoc comments for multiple functions
Turn the description comments of some functions in the intel_idle
driver into proper kerneldoc ones and clean them up.

No functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-02-11 23:34:58 +01:00
Rafael J. Wysocki
6eb0443ac8 intel_idle: Reorder declarations of static variables
Reorder declarations of static variables so that the __initdata ones
are declared together.

No functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-02-11 23:34:57 +01:00
Rafael J. Wysocki
ab1a8522d8 intel_idle: Annotate init time data structures
Add __initdata or __initconst annotations to the static data
structures that are only used during the initialization of the
driver.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-02-11 23:34:57 +01:00
Rafael J. Wysocki
7f843dd712 intel_idle: Add __initdata annotations to init time variables
Annotate static variables cpuidle_state_table and mwait_substates
with __initdata, because they are only used during the initialization
of the driver.

Also notice that static variable icpu could be annotated analogously
and the structure pointed to by it could be __initconst, but two of
its fields are accessed via icpu in intel_idle_cpu_init() and
auto_demotion_disable(), so introduce two new static variables,
auto_demotion_disable_flags and disable_promotion_to_c1e, to hold
the values of these fields, set them during the initialization and
use them in those functions instead of accessing the source data
structure via icpu.

That allows icpu to be annotated with __initdata, so do that,
and it will also allow some __initconst annotations to be added
subsequently.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-02-11 23:34:57 +01:00
Rafael J. Wysocki
30a996fbb3 intel_idle: Relocate definitions of cpuidle callbacks
Move the definitions of intel_idle() and intel_idle_s2idle() before
the definitions of cpuidle_state structures referring to them to
avoid having to use additional declarations of them (and drop those
declarations).

No functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-02-11 23:34:57 +01:00
Rafael J. Wysocki
bc721c1e45 intel_idle: Clean up definitions of cpuidle callbacks
Add proper kerneldoc descriptions to intel_idle() and
intel_idle_s2idle(), annotate the latter with __cpuidle and
reorder the declarations of local variables in both of them to
reflect the mwait_idle_with_hints() arguments order.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-02-11 23:34:57 +01:00
Rafael J. Wysocki
40ab82e08d intel_idle: Simplify LAPIC timer reliability checks
The lapic_timer_always_reliable variable really takes only two values
and some arithmetic in intel_idle() related to comparing it with the
target C-state's MWAIT hint value is unnecessary.

Simplify the code by replacing lapic_timer_always_reliable with
a bool variable lapic_timer_always_reliable and dropping the
LAPIC_TIMER_ALWAYS_RELIABLE symbol along with the excess
computations in intel_idle().

While at it, add a comment explaining the branch taken in intel_idle()
if the LAPIC timer is only reliable in C1 and modify the related debug
message in intel_idle_init() accordingly (the modification of this
message in the only expected functional impact of the change made
here).

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-02-11 23:34:57 +01:00
Rafael J. Wysocki
4dcb78ee57 intel_idle: Introduce 'states_off' module parameter
In certain system configurations it may not be desirable to use some
C-states assumed to be available by intel_idle and the driver needs
to be prevented from using them even before the cpuidle sysfs
interface becomes accessible to user space.  Currently, the only way
to achieve that is by setting the 'max_cstate' module parameter to a
value lower than the index of the shallowest of the C-states in
question, but that may be overly intrusive, because it effectively
makes all of the idle states deeper than the 'max_cstate' one go
away (and the C-state to avoid may be in the middle of the range
normally regarded as available).

To allow that limitation to be overcome, introduce a new module
parameter called 'states_off' to represent a list of idle states to
be disabled by default in the form of a bitmask and update the
documentation to cover it.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-02-03 11:57:18 +01:00
Rafael J. Wysocki
3a5be9b8f4 intel_idle: Introduce 'use_acpi' module parameter
For diagnostics, it is generally useful to be able to make intel_idle
take the system's ACPI tables into consideration even if that is not
required for the processor model in there, so introduce a new module
parameter, 'use_acpi', to make that happen and update the documentation
to cover it.

While at it, fix the 'no_acpi' module parameter name in the
documentation.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-02-03 11:57:08 +01:00
Linus Torvalds
c0275ae758 Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cpu-features updates from Ingo Molnar:
 "The biggest change in this cycle was a large series from Sean
  Christopherson to clean up the handling of VMX features. This both
  fixes bugs/inconsistencies and makes the code more coherent and
  future-proof.

  There are also two cleanups and a minor TSX syslog messages
  enhancement"

* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
  x86/cpu: Remove redundant cpu_detect_cache_sizes() call
  x86/cpu: Print "VMX disabled" error message iff KVM is enabled
  KVM: VMX: Allow KVM_INTEL when building for Centaur and/or Zhaoxin CPUs
  perf/x86: Provide stubs of KVM helpers for non-Intel CPUs
  KVM: VMX: Use VMX_FEATURE_* flags to define VMCS control bits
  KVM: VMX: Check for full VMX support when verifying CPU compatibility
  KVM: VMX: Use VMX feature flag to query BIOS enabling
  KVM: VMX: Drop initialization of IA32_FEAT_CTL MSR
  x86/cpufeatures: Add flag to track whether MSR IA32_FEAT_CTL is configured
  x86/cpu: Set synthetic VMX cpufeatures during init_ia32_feat_ctl()
  x86/cpu: Print VMX flags in /proc/cpuinfo using VMX_FEATURES_*
  x86/cpu: Detect VMX features on Intel, Centaur and Zhaoxin CPUs
  x86/vmx: Introduce VMX_FEATURES_*
  x86/cpu: Clear VMX feature flag if VMX is not fully enabled
  x86/zhaoxin: Use common IA32_FEAT_CTL MSR initialization
  x86/centaur: Use common IA32_FEAT_CTL MSR initialization
  x86/mce: WARN once if IA32_FEAT_CTL MSR is left unlocked
  x86/intel: Initialize IA32_FEAT_CTL MSR at boot
  tools/x86: Sync msr-index.h from kernel sources
  selftests, kvm: Replace manual MSR defs with common msr-index.h
  ...
2020-01-28 12:46:42 -08:00
Rafael J. Wysocki
86e9466ae6 intel_idle: Clean up irtl_2_usec()
Move the irtl_ns_units[] definition into irtl_2_usec() which is the
only user of it, use div_u64() for the division in there (as the
divisor is small enough) and use the NSEC_PER_USEC symbol for the
divisor.  Also convert the irtl_2_usec() comment to a proper
kerneldo one.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-01-23 00:37:02 +01:00
Rafael J. Wysocki
1aefbd7aeb intel_idle: Move 3 functions closer to their callers
Move intel_idle_verify_cstate(), auto_demotion_disable() and
c1e_promotion_disable() closer to their callers.

While at it, annotate intel_idle_verify_cstate() with __init,
as it is only used during the initialization of the driver.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-01-23 00:37:01 +01:00
Rafael J. Wysocki
095928ae48 intel_idle: Annotate initialization code and data structures
Annotate the functions that are only used at the initialization time
with __init and the data structures used by them with __initdata or
__initconst.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-01-23 00:37:01 +01:00
Rafael J. Wysocki
0755a9bd99 intel_idle: Move and clean up intel_idle_cpuidle_devices_uninit()
Move intel_idle_cpuidle_devices_uninit() closer to its caller,
intel_idle_init(), add the __init modifier to its header, drop a
redundant local variable from it and fix up its kerneldoc comment.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-01-23 00:37:01 +01:00
Rafael J. Wysocki
3d3a1ae9b4 intel_idle: Rearrange intel_idle_cpuidle_driver_init()
Notice that intel_idle_state_table_update() only needs to be called
if icpu is not NULL, so fold it into intel_idle_init_cstates_icpu(),
and pass a pointer to the driver object to
intel_idle_cpuidle_driver_init() as an argument instead of
referencing it locally in there.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-01-23 00:37:00 +01:00
Rafael J. Wysocki
533da74a8c intel_idle: Clean up NULL pointer check in intel_idle_init()
Instead of comparing intel_idle_cpuidle_devices with NULL apply
the "!" (not) operator to it when checking it against NULL.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-01-23 00:37:00 +01:00
Rafael J. Wysocki
a6c86e3362 intel_idle: Fold intel_idle_probe() into intel_idle_init()
There is no particular reason why intel_idle_probe() needs to be
a separate function and folding it into intel_idle_init() causes
the code to be somewhat easier to follow, so do just that.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-01-23 00:37:00 +01:00
Rafael J. Wysocki
cbd2c4c25d intel_idle: Eliminate __setup_broadcast_timer()
The __setup_broadcast_timer() static function is only called in one
place and "true" is passed to it as the argument in there, so
effectively it is a wrapper arround tick_broadcast_enable().

To simplify the code, call tick_broadcast_enable() directly instead
of __setup_broadcast_timer() and drop the latter.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-01-23 00:36:59 +01:00
Sean Christopherson
32ad73db7f x86/msr-index: Clean up bit defines for IA32_FEATURE_CONTROL MSR
As pointed out by Boris, the defines for bits in IA32_FEATURE_CONTROL
are quite a mouthful, especially the VMX bits which must differentiate
between enabling VMX inside and outside SMX (TXT) operation.  Rename the
MSR and its bit defines to abbreviate FEATURE_CONTROL as FEAT_CTL to
make them a little friendlier on the eyes.

Arguably, the MSR itself should keep the full IA32_FEATURE_CONTROL name
to match Intel's SDM, but a future patch will add a dedicated Kconfig,
file and functions for the MSR. Using the full name for those assets is
rather unwieldy, so bite the bullet and use IA32_FEAT_CTL so that its
nomenclature is consistent throughout the kernel.

Opportunistically, fix a few other annoyances with the defines:

  - Relocate the bit defines so that they immediately follow the MSR
    define, e.g. aren't mistaken as belonging to MISC_FEATURE_CONTROL.
  - Add whitespace around the block of feature control defines to make
    it clear they're all related.
  - Use BIT() instead of manually encoding the bit shift.
  - Use "VMX" instead of "VMXON" to match the SDM.
  - Append "_ENABLED" to the LMCE (Local Machine Check Exception) bit to
    be consistent with the kernel's verbiage used for all other feature
    control bits.  Note, the SDM refers to the LMCE bit as LMCE_ON,
    likely to differentiate it from IA32_MCG_EXT_CTL.LMCE_EN.  Ignore
    the (literal) one-off usage of _ON, the SDM is simply "wrong".

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20191221044513.21680-2-sean.j.christopherson@intel.com
2020-01-13 17:23:08 +01:00
Rafael J. Wysocki
e6d4f08a67 intel_idle: Use ACPI _CST on server systems
In many cases, especially on server systems, it is desirable to avoid
enabling C-states that have been disabled in the platform firmware
(BIOS) setup, except for C1E.

As a rule, the C-states disabled this way are not listed by ACPI
_CST, so if that is used by intel_idle along with the specific
table of C-states that it has for the given processor, the C-states
disabled through the platform firmware will not be enabled by default
by intel_idle.

Accordingly, set the use_acpi flag (introduced previously) in all
server processor profiles defined in intel_idle (so as to make it use
ACPI _CST to decide which C-states to enable by default) and set
the CPUIDLE_FLAG_ALWAYS_ENABLE flag (also introduced previously)
for C1E in all C-states tables in intel_idle that contain C1 too
(so that C1E is enabled regardless of whether or not it is listed
by ACPI _CST).

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-12-27 11:02:09 +01:00
Rafael J. Wysocki
4ec32d9e8e intel_idle: Add module parameter to prevent ACPI _CST from being used
Add a new module parameter called "no_acpi" to the intel_idle driver
to allow the driver to be prevented from using ACPI _CST via kernel
command line.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-12-27 11:02:09 +01:00
Rafael J. Wysocki
bff8e60a86 intel_idle: Allow ACPI _CST to be used for selected known processors
Update the intel_idle driver to get the C-states information from ACPI
_CST in some cases in which the processor is known to the driver, as long as
that information is available and the new use_acpi flag is set in the
profile of the processor in question.

In the cases when there is a specific table of C-states for the given
processor in the driver, that table is used as the primary source of
information on the available C-states, but if ACPI _CST is present,
the C-states that are not listed by it will not be enabled by default
(they still can be enabled later by user space via sysfs, though).

The new CPUIDLE_FLAG_ALWAYS_ENABLE flag can be used for marking
C-states that should be enabled by default even if they are not
listed by ACPI _CST.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-12-27 11:02:08 +01:00
Rafael J. Wysocki
18734958e9 intel_idle: Use ACPI _CST for processor models without C-state tables
Modify the intel_idle driver to get the C-states information from ACPI
_CST if the processor model is not recognized by it.

The processor is still required to support MWAIT and the information
from ACPI _CST will only be used if all of the C-states listed by
_CST are of the ACPI_CSTATE_FFH type (which means that they are
expected to be entered via MWAIT).

Moreover, the driver assumes that the _CST information is the same
for all CPUs in the system, so it is sufficient to evaluate _CST for
one of them and extract the common list of C-states from there.
Also _CST is evaluated once at the system initialization time and
the driver does not respond to _CST change notifications (that can
be changed in the future).

The main functional difference between intel_idle with this change
and the ACPI processor driver is that the former sets the target
residency to be equal to the exit latency (provided by _CST) for
C1-type C-states and to 3 times the exit latency value for the other
C-state types, whereas the latter obtains the target residency by
multiplying the exit latency by the same number (2 by default) for
all C-state types.  Therefore it is expected that in general using
the former instead of the latter on the same system will lead to
improved energy-efficiency.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-12-27 11:02:08 +01:00
Rafael J. Wysocki
9f3d6daf61 intel_idle: Refactor intel_idle_cpuidle_driver_init()
Move the C-state verification and checks from
intel_idle_cpuidle_driver_init() to a separate function,
intel_idle_verify_cstate(), and make the former call it after
checking the CPUIDLE_FLAG_UNUSABLE state flag.

Also combine the drv->states[] updates with the incrementation of
drv->state_count.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-12-27 11:02:07 +01:00
Rafael J. Wysocki
ba1e78a1dc cpuidle: Drop disabled field from struct cpuidle_state
After recent cpuidle updates the "disabled" field in struct
cpuidle_state is only used by two drivers (intel_idle and shmobile
cpuidle) for marking unusable idle states, but that may as well be
achieved with the help of a state flag, so define an "unusable" idle
state flag, CPUIDLE_FLAG_UNUSABLE, make the drivers in question use
it instead of the "disabled" field and make the core set
CPUIDLE_STATE_DISABLED_BY_DRIVER for the idle states with that flag
set.

After the above changes, the "disabled" field in struct cpuidle_state
is not used any more, so drop it.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-11-29 11:48:39 +01:00
Peter Zijlstra
5ebb34edbe x86/intel: Aggregate microserver naming
Currently big microservers have _XEON_D while small microservers have
_X, Make it uniformly: _D.

for i in `git grep -l "\(INTEL_FAM6_\|VULNWL_INTEL\|INTEL_CPU_FAM6\).*_\(X\|XEON_D\)"`
do
	sed -i -e 's/\(\(INTEL_FAM6_\|VULNWL_INTEL\|INTEL_CPU_FAM6\).*ATOM.*\)_X/\1_D/g' \
	       -e 's/\(\(INTEL_FAM6_\|VULNWL_INTEL\|INTEL_CPU_FAM6\).*\)_XEON_D/\1_D/g' ${i}
done

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: x86@kernel.org
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Link: https://lkml.kernel.org/r/20190827195122.677152989@infradead.org
2019-08-28 11:29:32 +02:00
Peter Zijlstra
5e741407ea x86/intel: Aggregate big core graphics naming
Currently big core clients with extra graphics on have:

 - _G
 - _GT3E

Make it uniformly: _G

for i in `git grep -l "\(INTEL_FAM6_\|VULNWL_INTEL\|INTEL_CPU_FAM6\).*_GT3E"`
do
	sed -i -e 's/\(\(INTEL_FAM6_\|VULNWL_INTEL\|INTEL_CPU_FAM6\).*\)_GT3E/\1_G/g' ${i}
done

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: x86@kernel.org
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Link: https://lkml.kernel.org/r/20190827195122.622802314@infradead.org
2019-08-28 11:29:31 +02:00
Peter Zijlstra
af239c44e3 x86/intel: Aggregate big core mobile naming
Currently big core mobile chips have either:

 - _L
 - _ULT
 - _MOBILE

Make it uniformly: _L.

for i in `git grep -l "\(INTEL_FAM6_\|VULNWL_INTEL\|INTEL_CPU_FAM6\).*_\(MOBILE\|ULT\)"`
do
	sed -i -e 's/\(\(INTEL_FAM6_\|VULNWL_INTEL\|INTEL_CPU_FAM6\).*\)_\(MOBILE\|ULT\)/\1_L/g' ${i}
done

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: x86@kernel.org
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190827195122.568978530@infradead.org
2019-08-28 11:29:31 +02:00
Peter Zijlstra
c66f78a6de x86/intel: Aggregate big core client naming
Currently the big core client models either have:

 - no OPTDIFF
 - _CORE
 - _DESKTOP

Make it uniformly: 'no OPTDIFF'.

for i in `git grep -l "\(INTEL_FAM6_\|VULNWL_INTEL\|INTEL_CPU_FAM6\).*_\(CORE\|DESKTOP\)"`
do
	sed -i -e 's/\(\(INTEL_FAM6_\|VULNWL_INTEL\|INTEL_CPU_FAM6\).*\)_\(CORE\|DESKTOP\)/\1/g' ${i}
done

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: x86@kernel.org
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190827195122.513945586@infradead.org
2019-08-28 11:29:31 +02:00
Thomas Gleixner
a61127c213 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 335
Based on 1 normalized pattern(s):

  this program is free software you can redistribute it and or modify
  it under the terms and conditions of the gnu general public license
  version 2 as published by the free software foundation this program
  is distributed in the hope it will be useful but without any
  warranty without even the implied warranty of merchantability or
  fitness for a particular purpose see the gnu general public license
  for more details you should have received a copy of the gnu general
  public license along with this program if not write to the free
  software foundation inc 51 franklin st fifth floor boston ma 02110
  1301 usa

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-only

has been chosen to replace the boilerplate/reference in 111 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexios Zavras <alexios.zavras@intel.com>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190530000436.567572064@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-05 17:37:06 +02:00
Zhang Rui
04b1d5d098 intel_idle: add support for Jacobsville
Jacobsville uses the same C-states as Denverton.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-02-15 10:49:14 +01:00
Linus Torvalds
c05f3642f4 Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
 "The main updates in this cycle were:

   - Lots of perf tooling changes too voluminous to list (big perf trace
     and perf stat improvements, lots of libtraceevent reorganization,
     etc.), so I'll list the authors and refer to the changelog for
     details:

       Benjamin Peterson, Jérémie Galarneau, Kim Phillips, Peter
       Zijlstra, Ravi Bangoria, Sangwon Hong, Sean V Kelley, Steven
       Rostedt, Thomas Gleixner, Ding Xiang, Eduardo Habkost, Thomas
       Richter, Andi Kleen, Sanskriti Sharma, Adrian Hunter, Tzvetomir
       Stoyanov, Arnaldo Carvalho de Melo, Jiri Olsa.

     ... with the bulk of the changes written by Jiri Olsa, Tzvetomir
     Stoyanov and Arnaldo Carvalho de Melo.

   - Continued intel_rdt work with a focus on playing well with perf
     events. This also imported some non-perf RDT work due to
     dependencies. (Reinette Chatre)

   - Implement counter freezing for Arch Perfmon v4 (Skylake and newer).
     This allows to speed up the PMI handler by avoiding unnecessary MSR
     writes and make it more accurate. (Andi Kleen)

   - kprobes cleanups and simplification (Masami Hiramatsu)

   - Intel Goldmont PMU updates (Kan Liang)

   - ... plus misc other fixes and updates"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (155 commits)
  kprobes/x86: Use preempt_enable() in optimized_callback()
  x86/intel_rdt: Prevent pseudo-locking from using stale pointers
  kprobes, x86/ptrace.h: Make regs_get_kernel_stack_nth() not fault on bad stack
  perf/x86/intel: Export mem events only if there's PEBS support
  x86/cpu: Drop pointless static qualifier in punit_dev_state_show()
  x86/intel_rdt: Fix initial allocation to consider CDP
  x86/intel_rdt: CBM overlap should also check for overlap with CDP peer
  x86/intel_rdt: Introduce utility to obtain CDP peer
  tools lib traceevent, perf tools: Move struct tep_handler definition in a local header file
  tools lib traceevent: Separate out tep_strerror() for strerror_r() issues
  perf python: More portable way to make CFLAGS work with clang
  perf python: Make clang_has_option() work on Python 3
  perf tools: Free temporary 'sys' string in read_event_files()
  perf tools: Avoid double free in read_event_file()
  perf tools: Free 'printk' string in parse_ftrace_printk()
  perf tools: Cleanup trace-event-info 'tdata' leak
  perf strbuf: Match va_{add,copy} with va_end
  perf test: S390 does not support watchpoints in test 22
  perf auxtrace: Include missing asm/bitsperlong.h to get BITS_PER_LONG
  tools include: Adopt linux/bits.h
  ...
2018-10-23 13:32:18 +01:00
Peter Zijlstra
f2c4db1bd8 x86/cpu: Sanitize FAM6_ATOM naming
Going primarily by:

  https://en.wikipedia.org/wiki/List_of_Intel_Atom_microprocessors

with additional information gleaned from other related pages; notably:

 - Bonnell shrink was called Saltwell
 - Moorefield is the Merriefield refresh which makes it Airmont

The general naming scheme is: FAM6_ATOM_UARCH_SOCTYPE

  for i in `git grep -l FAM6_ATOM` ; do
	sed -i  -e 's/ATOM_PINEVIEW/ATOM_BONNELL/g'		\
		-e 's/ATOM_LINCROFT/ATOM_BONNELL_MID/'		\
		-e 's/ATOM_PENWELL/ATOM_SALTWELL_MID/g'		\
		-e 's/ATOM_CLOVERVIEW/ATOM_SALTWELL_TABLET/g'	\
		-e 's/ATOM_CEDARVIEW/ATOM_SALTWELL/g'		\
		-e 's/ATOM_SILVERMONT1/ATOM_SILVERMONT/g'	\
		-e 's/ATOM_SILVERMONT2/ATOM_SILVERMONT_X/g'	\
		-e 's/ATOM_MERRIFIELD/ATOM_SILVERMONT_MID/g'	\
		-e 's/ATOM_MOOREFIELD/ATOM_AIRMONT_MID/g'	\
		-e 's/ATOM_DENVERTON/ATOM_GOLDMONT_X/g'		\
		-e 's/ATOM_GEMINI_LAKE/ATOM_GOLDMONT_PLUS/g' ${i}
  done

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: dave.hansen@linux.intel.com
Cc: len.brown@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-02 10:14:32 +02:00
Andy Shevchenko
a4a008e53c intel_idle: Get rid of custom ICPU() macro
Replace custom grown macro with generic INTEL_CPU_FAM6() one.

No functional change intended.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-09-10 12:02:01 +02:00
Rafael J. Wysocki
622ade3a2f Merge branch 'pm-cpuidle'
* pm-cpuidle:
  intel_idle: Graceful probe failure when MWAIT is disabled
  cpuidle: Avoid assignment in if () argument
  cpuidle: Clean up cpuidle_enable_device() error handling a bit
  cpuidle: ladder: Add per CPU PM QoS resume latency support
  ARM: cpuidle: Refactor rollback operations if init fails
  ARM: cpuidle: Correct driver unregistration if init fails
  intel_idle: replace conditionals with static_cpu_has(X86_FEATURE_ARAT)
  cpuidle: fix broadcast control when broadcast can not be entered

 Conflicts:
	drivers/idle/intel_idle.c
2017-11-13 01:34:14 +01:00
Len Brown
a4c447533a intel_idle: Graceful probe failure when MWAIT is disabled
When MWAIT is disabled, intel_idle refuses to probe.
But it may mis-lead the user by blaming this on the model number:

intel_idle: does not run on family 6 modesl 79

So defer the check for MWAIT until after the model# white-list check succeeds,
and if the MWAIT check fails, tell the user how to fix it:

intel_idle: Please enable MWAIT in BIOS SETUP

Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-11-09 14:13:21 +01:00
Andy Lutomirski
675357362a Revert "x86/mm: Stop calling leave_mm() in idle code"
This reverts commit 43858b4f25.

The reason I removed the leave_mm() calls in question is because the
heuristic wasn't needed after that patch.  With the original version
of my PCID series, we never flushed a "lazy cpu" (i.e. a CPU running
kernel thread) due a flush on the loaded mm.

Unfortunately, that caused architectural issues, so now I've
reinstated these flushes on non-PCID systems in:

    commit b956575bed ("x86/mm: Flush more aggressively in lazy TLB mode").

That, in turn, gives us a power management and occasionally
performance regression as compared to old kernels: a process that
goes into a deep idle state on a given CPU and gets its mm flushed
due to activity on a different CPU will wake the idle CPU.

Reinstate the old ugly heuristic: if a CPU goes into ACPI C3 or an
intel_idle state that is likely to cause a TLB flush gets its mm
switched to init_mm before going idle.

FWIW, this heuristic is lousy.  Whether we should change CR3 before
idle isn't a good hint except insofar as the performance hit is a bit
lower if the TLB is getting flushed by the idle code anyway.  What we
really want to know is whether we anticipate being idle long enough
that the mm is likely to be flushed before we wake up.  This is more a
matter of the expected latency than the idle state that gets chosen.
This heuristic also completely fails on systems that don't know
whether the TLB will be flushed (e.g. AMD systems?).  OTOH it may be a
bit obsolete anyway -- PCID systems don't presently benefit from this
heuristic at all.

We also shouldn't do this callback from innermost bit of the idle code
due to the RCU nastiness it causes.  All the information need is
available before rcu_idle_enter() needs to happen.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 43858b4f25 "x86/mm: Stop calling leave_mm() in idle code"
Link: http://lkml.kernel.org/r/c513bbd4e653747213e05bc7062de000bf0202a5.1509793738.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-11-04 15:01:50 +01:00
Jason Baron
0563bb7ba6 intel_idle: replace conditionals with static_cpu_has(X86_FEATURE_ARAT)
If the 'arat' cpu flag is set, then the conditionals in intel_idle() that
guard calling tick_broadcast_enter()/exit() will never be true. Use
static_cpu_has(X86_FEATURE_ARAT) to create a fast path to replace
the conditional.

Signed-off-by: Jason Baron <jbaron@akamai.com>
Acked-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-10-11 02:02:16 +02:00
Linus Torvalds
439644096c Power management updates for v4.14-rc1
- Drop the P-state selection algorithm based on a PID controller
    from intel_pstate and make it use the same P-state selection
    method (based on the CPU load) for all types of systems in the
    active mode (Rafael Wysocki, Srinivas Pandruvada).
 
  - Rework the cpufreq core and governors to make it possible to
    take cross-CPU utilization updates into account and modify the
    schedutil governor to actually do so (Viresh Kumar).
 
  - Clean up the handling of transition latency information in the
    cpufreq core and untangle it from the information on which drivers
    cannot do dynamic frequency switching (Viresh Kumar).
 
  - Add support for new SoCs (MT2701/MT7623 and MT7622) to the
    mediatek cpufreq driver and update its DT bindings (Sean Wang).
 
  - Modify the cpufreq dt-platdev driver to autimatically create
    cpufreq devices for the new (v2) Operating Performance Points
    (OPP) DT bindings and update its whitelist of supported systems
    (Viresh Kumar, Shubhrajyoti Datta, Marc Gonzalez, Khiem Nguyen,
    Finley Xiao).
 
  - Add support for Ux500 to the cpufreq-dt driver and drop the
    obsolete dbx500 cpufreq driver (Linus Walleij, Arnd Bergmann).
 
  - Add new SoC (R8A7795) support to the cpufreq rcar driver (Khiem
    Nguyen).
 
  - Fix and clean up assorted issues in the cpufreq drivers and core
    (Arvind Yadav, Christophe Jaillet, Colin Ian King, Gustavo Silva,
    Julia Lawall, Leonard Crestez, Rob Herring, Sudeep Holla).
 
  - Update the IO-wait boost handling in the schedutil governor to
    make it less aggressive (Joel Fernandes).
 
  - Rework system suspend diagnostics to make it print fewer messages
    to the kernel log by default, add a sysfs knob to allow more
    suspend-related messages to be printed and add Low Power S0 Idle
    constraints checks to the ACPI suspend-to-idle code (Rafael
    Wysocki, Srinivas Pandruvada).
 
  - Prefer suspend-to-idle over S3 on ACPI-based systems with the
    ACPI_FADT_LOW_POWER_S0 flag set and the Low Power Idle S0 _DSM
    interface present in the ACPI tables (Rafael Wysocki).
 
  - Update documentation related to system sleep and rename a number
    of items in the code to make it cleare that they are related to
    suspend-to-idle (Rafael Wysocki).
 
  - Export a variable allowing device drivers to check the target
    system sleep state from the core system suspend code (Florian
    Fainelli).
 
  - Clean up the cpuidle subsystem to handle the polling state on
    x86 in a more straightforward way and to use %pOF instead of
    full_name (Rafael Wysocki, Rob Herring).
 
  - Update the devfreq framework to fix and clean up a few minor
    issues (Chanwoo Choi, Rob Herring).
 
  - Extend diagnostics in the generic power domains (genpd) framework
    and clean it up slightly (Thara Gopinath, Rob Herring).
 
  - Fix and clean up a couple of issues in the operating performance
    points (OPP) framework (Viresh Kumar, Waldemar Rymarkiewicz).
 
  - Add support for RV1108 to the rockchip-io Adaptive Voltage Scaling
    (AVS) driver (David Wu).
 
  - Fix the usage of notifiers in CPU power management on some
    platforms (Alex Shi).
 
  - Update the pm-graph system suspend/hibernation and boot profiling
    utility (Todd Brandt).
 
  - Make it possible to run the cpupower utility without CPU0 (Prarit
    Bhargava).
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJZrcDJAAoJEILEb/54YlRx9FUQAIUKvWBAARc61ZIZXjbqZF1v
 aEMOBuksFns0CMekdptSic6n4wc81E/XYMS8yDhOOMpyDzfAZsTWjmu+gKwN7w3l
 E/yf/NVlhob9JZ7MqGgqD4EUFfFIaKBXPlWFdDi2rdCUXE2L8xJ7rla8i7zyZlc5
 pYHfAppBbF4qUcEY4OoOVOOGRZCfMdiLXj0iZOhMX8Y6yLBRk/AjnVADYsF33hoj
 gBEfomU+H0K5V8nQEp0ZFKDArPwL+oElHQj6i+nxBpGfPM5evvLXhHOyR6AsldJ5
 J4YI1kMuQNSCmvHMqOTxTYyJf8Jcf3Fj4wcjwaVMVGceY1lz6McAKknnFnCqCvz+
 mskn84gFCBCM8EoJDqRf0b9MQHcuRyQKM+yw4tjnR9r8yd32erb85ZWFHcPWYhCT
 fZatNOwFFv2MU+2vo5J3yeUNSWIKT+uBjy+tKPbrDkUwpKZVRj3Oj+hP3Mq9NE8U
 YBqltsj7tmrdA634zI8C7jfS6wF221S0fId/iPszwmPJaVn/lq8Ror7pWL5YI8U7
 SCJFjiqDiGmAcQEkuWwFAQnscZkyHpO+Y3A+jfXl/izoaZETaI5+ceIHBaocm3+5
 XrOOpHS3ik8EHf9ji0KFCKZ/pYDwllday3cBQPWo3sMIzpQ2lrjbqdnE1cVnBrld
 OtHZAeD/jLUXuY6XW2jN
 =mAiV
 -----END PGP SIGNATURE-----

Merge tag 'pm-4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management updates from Rafael Wysocki:
 "This time (again) cpufreq gets the majority of changes which mostly
  are driver updates (including a major consolidation of intel_pstate),
  some schedutil governor modifications and core cleanups.

  There also are some changes in the system suspend area, mostly related
  to diagnostics and debug messages plus some renames of things related
  to suspend-to-idle. One major change here is that suspend-to-idle is
  now going to be preferred over S3 on systems where the ACPI tables
  indicate to do so and provide requsite support (the Low Power Idle S0
  _DSM in particular). The system sleep documentation and the tools
  related to it are updated too.

  The rest is a few cpuidle changes (nothing major), devfreq updates,
  generic power domains (genpd) framework updates and a few assorted
  modifications elsewhere.

  Specifics:

   - Drop the P-state selection algorithm based on a PID controller from
     intel_pstate and make it use the same P-state selection method
     (based on the CPU load) for all types of systems in the active mode
     (Rafael Wysocki, Srinivas Pandruvada).

   - Rework the cpufreq core and governors to make it possible to take
     cross-CPU utilization updates into account and modify the schedutil
     governor to actually do so (Viresh Kumar).

   - Clean up the handling of transition latency information in the
     cpufreq core and untangle it from the information on which drivers
     cannot do dynamic frequency switching (Viresh Kumar).

   - Add support for new SoCs (MT2701/MT7623 and MT7622) to the mediatek
     cpufreq driver and update its DT bindings (Sean Wang).

   - Modify the cpufreq dt-platdev driver to autimatically create
     cpufreq devices for the new (v2) Operating Performance Points (OPP)
     DT bindings and update its whitelist of supported systems (Viresh
     Kumar, Shubhrajyoti Datta, Marc Gonzalez, Khiem Nguyen, Finley
     Xiao).

   - Add support for Ux500 to the cpufreq-dt driver and drop the
     obsolete dbx500 cpufreq driver (Linus Walleij, Arnd Bergmann).

   - Add new SoC (R8A7795) support to the cpufreq rcar driver (Khiem
     Nguyen).

   - Fix and clean up assorted issues in the cpufreq drivers and core
     (Arvind Yadav, Christophe Jaillet, Colin Ian King, Gustavo Silva,
     Julia Lawall, Leonard Crestez, Rob Herring, Sudeep Holla).

   - Update the IO-wait boost handling in the schedutil governor to make
     it less aggressive (Joel Fernandes).

   - Rework system suspend diagnostics to make it print fewer messages
     to the kernel log by default, add a sysfs knob to allow more
     suspend-related messages to be printed and add Low Power S0 Idle
     constraints checks to the ACPI suspend-to-idle code (Rafael
     Wysocki, Srinivas Pandruvada).

   - Prefer suspend-to-idle over S3 on ACPI-based systems with the
     ACPI_FADT_LOW_POWER_S0 flag set and the Low Power Idle S0 _DSM
     interface present in the ACPI tables (Rafael Wysocki).

   - Update documentation related to system sleep and rename a number of
     items in the code to make it cleare that they are related to
     suspend-to-idle (Rafael Wysocki).

   - Export a variable allowing device drivers to check the target
     system sleep state from the core system suspend code (Florian
     Fainelli).

   - Clean up the cpuidle subsystem to handle the polling state on x86
     in a more straightforward way and to use %pOF instead of full_name
     (Rafael Wysocki, Rob Herring).

   - Update the devfreq framework to fix and clean up a few minor issues
     (Chanwoo Choi, Rob Herring).

   - Extend diagnostics in the generic power domains (genpd) framework
     and clean it up slightly (Thara Gopinath, Rob Herring).

   - Fix and clean up a couple of issues in the operating performance
     points (OPP) framework (Viresh Kumar, Waldemar Rymarkiewicz).

   - Add support for RV1108 to the rockchip-io Adaptive Voltage Scaling
     (AVS) driver (David Wu).

   - Fix the usage of notifiers in CPU power management on some
     platforms (Alex Shi).

   - Update the pm-graph system suspend/hibernation and boot profiling
     utility (Todd Brandt).

   - Make it possible to run the cpupower utility without CPU0 (Prarit
     Bhargava)"

* tag 'pm-4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (87 commits)
  cpuidle: Make drivers initialize polling state
  cpuidle: Move polling state initialization code to separate file
  cpuidle: Eliminate the CPUIDLE_DRIVER_STATE_START symbol
  cpufreq: imx6q: Fix imx6sx low frequency support
  cpufreq: speedstep-lib: make several arrays static, makes code smaller
  PM: docs: Delete the obsolete states.txt document
  PM: docs: Describe high-level PM strategies and sleep states
  PM / devfreq: Fix memory leak when fail to register device
  PM / devfreq: Add dependency on PM_OPP
  PM / devfreq: Move private devfreq_update_stats() into devfreq
  PM / devfreq: Convert to using %pOF instead of full_name
  PM / AVS: rockchip-io: add io selectors and supplies for RV1108
  cpufreq: ti: Fix 'of_node_put' being called twice in error handling path
  cpufreq: dt-platdev: Drop few entries from whitelist
  cpufreq: dt-platdev: Automatically create cpufreq device with OPP v2
  ARM: ux500: don't select CPUFREQ_DT
  cpuidle: Convert to using %pOF instead of full_name
  cpufreq: Convert to using %pOF instead of full_name
  PM / Domains: Convert to using %pOF instead of full_name
  cpufreq: Cap the default transition delay value to 10 ms
  ...
2017-09-05 12:19:08 -07:00
Rafael J. Wysocki
7b01463e51 Merge branch 'pm-sleep'
* pm-sleep:
  ACPI / PM: Check low power idle constraints for debug only
  PM / s2idle: Rename platform operations structure
  PM / s2idle: Rename ->enter_freeze to ->enter_s2idle
  PM / s2idle: Rename freeze_state enum and related items
  PM / s2idle: Rename PM_SUSPEND_FREEZE to PM_SUSPEND_TO_IDLE
  ACPI / PM: Prefer suspend-to-idle over S3 on some systems
  platform/x86: intel-hid: Wake up Dell Latitude 7275 from suspend-to-idle
  PM / suspend: Define pr_fmt() in suspend.c
  PM / suspend: Use mem_sleep_labels[] strings in messages
  PM / sleep: Put pm_test under CONFIG_PM_SLEEP_DEBUG
  PM / sleep: Check pm_wakeup_pending() in __device_suspend_noirq()
  PM / core: Add error argument to dpm_show_time()
  PM / core: Split dpm_suspend_noirq() and dpm_resume_noirq()
  PM / s2idle: Rearrange the main suspend-to-idle loop
  PM / timekeeping: Print debug messages when requested
  PM / sleep: Mark suspend/hibernation start and finish
  PM / sleep: Do not print debug messages by default
  PM / suspend: Export pm_suspend_target_state
2017-09-04 00:06:02 +02:00
Rafael J. Wysocki
1b39e3f813 cpuidle: Make drivers initialize polling state
Make the drivers that want to include the polling state into their
states table initialize it explicitly and drop the initialization of
it (which in fact is conditional, but that is not obvious from the
code) from the core.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2017-08-30 03:06:33 +02:00
Rafael J. Wysocki
28ba086ed3 PM / s2idle: Rename ->enter_freeze to ->enter_s2idle
Rename the ->enter_freeze cpuidle driver callback to ->enter_s2idle
to make it clear that it is used for entering suspend-to-idle and
rename the related functions, variables and so on accordingly.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-08-11 01:29:56 +02:00
Ingo Molnar
1ed7d32763 Merge branch 'x86/boot' into x86/mm, to pick up interacting changes
The SME patches we are about to apply add some E820 logic, so merge in
pending E820 code changes first, to have a single code base.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-07-18 11:36:53 +02:00
Andy Lutomirski
43858b4f25 x86/mm: Stop calling leave_mm() in idle code
Now that lazy TLB suppresses all flush IPIs (as opposed to all but
the first), there's no need to leave_mm() when going idle.

This means we can get rid of the rcuidle hack in
switch_mm_irqs_off() and we can unexport leave_mm().

This also removes acpi_unlazy_tlb() from the x86 and ia64 headers,
since it has no callers any more.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Nadav Amit <nadav.amit@gmail.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/03c699cfd6021e467be650d6b73deaccfe4b4bd7.1498751203.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-07-05 10:52:57 +02:00
Joe Perches
654d08a42a intel_idle: Use more common logging style
Remove #define PREFIX and add #define pr_fmt to use more common logging.

Miscellanea:

o Add missing newline to format
o Convert a single printk without KERN_<LEVEL> to pr_info

Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-06-29 22:58:35 +02:00
David E. Box
1b2e87687d x86/intel_idle: add Gemini Lake support
Gemini Lake uses the same C-states as Broxton and also uses the
IRTL MSR's to determine maximum C-state latency.

Signed-off-by: David E. Box <david.e.box@linux.intel.com>
Acked-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-05-01 23:17:37 +02:00
Linus Torvalds
c82be9d224 Power management turbostat utility updates for v4.11-rc1
These update turbostat significantly and in particular:
 
  - Default output is now verbose, --debug is no longer required to
    get all counters.  As a result, some options have been added to
    specify exactly what output is wanted.
  - Added --quiet to skip system configuration output
  - Added --list, --show and --hide parameters
  - Added --cpu parameter
  - Enhanced Baytrail SoC support
  - Added Gemini Lake SoC support
  - Added sysfs C-state columns
 
 Also the symbol definitions in arch/x86/include/asm/intel-family.h
 and arch/x86/include/asm/msr-index.h are updated and the intel_idle
 and intel_pstate drivers are modified to use the updated symbols.
 
 Credits to Len Brown for all of these changes.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJYuLyNAAoJEILEb/54YlRxEvkQAJsggzpgGrlhrO6KHSm4yC9M
 CqhBVsdeppX1ZTAVPiMk/pcXQYtL5fZ97ELk2So/CjT5Nh3jwDPMA/ux5n3uiob+
 O2BTdtxnpNLxPQPQM1mW7Dr/uAIRlJug9gSMxKDbFSU9Oe3aET58PUdUTs7xaT59
 nbtLxVSvzrdGk/bX6WO4ic+7F2licJLZPfDGhYidnoika8LxD4M+cIO73gFpgqQi
 yoKrTZyLimvneFT0eAUUvHIyKjkJIxeMfslW57uBpz8rW5my+3UwsdpRG4AIVeWc
 wSBlsNqj+TuR4BBiZ2VR2RoHF3qbH/SceI+k864BqyThfyK/g2q/vV/GvLZQCR/R
 yWcajWD9kvLKvnm1D3XYOIQDBeP4l60j3vVwHytSvmaPYjn5Ms3jq6b+2K6zkXMM
 8y3leW/hgw+rGCacdXPrKIlpBykSV7h+TnD2iMxeeDISNkbefWWDe/WB6HncocAg
 HDtKRvU9ntRq6/MlnTKbCFM5c0oCXWRw4QNjDy3AsjJELgeAIwiqpHWMKO6XltFj
 qU/rdyW/BTCuAlIjWVbjooAIJZ268geupeug3zvE3uGzrxT4DaVIo8W1wtJ+XQrt
 By7sOW/gMQ2EcTJQiuFjS/Gz5gOKQ2F8OLCm6T8Prjh6SxrCUAiuIvP0LmxUCa8i
 KMlx+8c9E2f9j+TTt9AP
 =oMZe
 -----END PGP SIGNATURE-----

Merge tag 'pm-turbostat-4.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull turbostat utility updates from Rafael Wysocki:
 "Power management turbostat utility updates.

  These update turbostat significantly and in particular:

   - default output is now verbose, --debug is no longer required to get
     all counters. As a result, some options have been added to specify
     exactly what output is wanted.

   - added --quiet to skip system configuration output

   - added --list, --show and --hide parameters

   - added --cpu parameter

   - enhanced Baytrail SoC support

   - added Gemini Lake SoC support

   - added sysfs C-state columns

  Also the symbol definitions in arch/x86/include/asm/intel-family.h and
  arch/x86/include/asm/msr-index.h are updated and the intel_idle and
  intel_pstate drivers are modified to use the updated symbols.

  Credits to Len Brown for all of these changes"

* tag 'pm-turbostat-4.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (44 commits)
  tools/power turbostat: version 17.02.24
  tools/power turbostat: bugfix: --add u32 was printed as u64
  tools/power turbostat: show error on exec
  tools/power turbostat: dump p-state software config
  tools/power turbostat: show package number, even without --debug
  tools/power turbostat: support "--hide C1" etc.
  tools/power turbostat: move --Package and --processor into the --cpu option
  tools/power turbostat: turbostat.8 update
  tools/power turbostat: update --list feature
  tools/power turbostat: use wide columns to display large numbers
  tools/power turbostat: Add --list option to show available header names
  tools/power turbostat: fix zero IRQ count shown in one-shot command mode
  tools/power turbostat: add --cpu parameter
  tools/power turbostat: print sysfs C-state stats
  tools/power turbostat: extend --add option to accept /sys path
  tools/power turbostat: skip unused counters on BDX
  tools/power turbostat: fix decoding for GLM, DNV, SKX turbo-ratio limits
  tools/power turbostat: skip unused counters on SKX
  tools/power turbostat: Denverton: use HW CC1 counter, skip C3, C7
  tools/power turbostat: initial Gemini Lake SOC support
  ...
2017-03-02 17:41:27 -08:00
Len Brown
6cfb2374f8 intel_idle: use new name for MSR_PKG_CST_CONFIG_CONTROL
previously known as  MSR_NHM_SNB_PKG_CST_CFG_CTL

Signed-off-by: Len Brown <len.brown@intel.com>
2017-03-01 00:13:17 -05:00
Len Brown
de09cdd09f intel_idle: stop exposing platform acronyms in sysfs
Cosmetic only -- no functional change in this patch.

sysfs before:

state4/desc:MWAIT 0x20
state4/name:C6-HSW

sysfs after:

state4/desc:MWAIT 0x20
state4/name:C6

We remove the platform acronyms from the end of the state name
(-HSW in this case) for three reasonse.

 1. more consistency with acpi_idle, which prints C1, C2, C3 etc.

 2. users know what platform they are on already
    an acronym for the processor code name here
    seems to cause more confusion than clarity.

 3. less clutter in "cpupower monitor" output,
    which truncates the names to 4 columns.

The precise definition of the state continues to be available in "desc".

Reported-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-03-01 00:54:05 +01:00
Sebastian Andrzej Siewior
fb1013a016 intel_idle: Convert to hotplug state machine
Install the callbacks via the state machine and let the core invoke the
callbacks on the already online CPUs.

The two smp_call_function_single() invocations in intel_idle_cpu_init() have
been removed because intel_idle_cpu_init() is now invoked via the hotplug
callback which runs on the target CPU. The IRQ-off calling convention for
auto_demotion_disable() and c1e_promotion_disable() has not been preserved
because only those two modify the MSR during CPU intialization.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-12-01 23:01:36 +01:00
Anna-Maria Gleixner
29d7bbada9 intel_idle: Remove superfluous SMP fuction call
Since commit 1cf4f629d9 ("cpu/hotplug: Move online calls to
hotplugged cpu") the CPU_ONLINE and CPU_DOWN_PREPARE notifiers are
always run on the hot plugged CPU, and as of commit 3b9d6da67e
("cpu/hotplug: Fix rollback during error-out in __cpu_disable()") the
CPU_DOWN_FAILED notifier also runs on the hot plugged CPU. This patch
converts the SMP functional calls into direct calls.

smp_function_call_single() executes the function with interrupts
disabled. This calling convention is not preserved, because
tick_broadcast_enable() and tick_braodcast_disable() handle
interrupts themselves.

Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-12-01 23:01:36 +01:00
Piotr Luc
a2c1bc645e x86/intel_idle: Add Knights Mill CPUID
Add Knights Mill (KNM) to the list of CPUIDs supported by intel_idle.

Signed-off-by: Piotr Luc <piotr.luc@intel.com>
Reviewed-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2016-12-01 00:29:07 -05:00
Andy Shevchenko
5e7ec268fd x86/intel_idle: Add CPU model 0x4a (Atom Z34xx series)
Add CPU ID for Atom Z34xx processors. Datasheets indicate support for this,
detailed information about potential quirks or limitations are missing, though.
So we just reuse the definition from official BSP code.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2016-12-01 00:16:31 -05:00
Chris Metcalf
6727ad9e20 nmi_backtrace: generate one-line reports for idle cpus
When doing an nmi backtrace of many cores, most of which are idle, the
output is a little overwhelming and very uninformative.  Suppress
messages for cpus that are idling when they are interrupted and just
emit one line, "NMI backtrace for N skipped: idling at pc 0xNNN".

We do this by grouping all the cpuidle code together into a new
.cpuidle.text section, and then checking the address of the interrupted
PC to see if it lies within that section.

This commit suitably tags x86 and tile idle routines, and only adds in
the minimal framework for other architectures.

Link: http://lkml.kernel.org/r/1472487169-14923-5-git-send-email-cmetcalf@mellanox.com
Signed-off-by: Chris Metcalf <cmetcalf@mellanox.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Daniel Thompson <daniel.thompson@linaro.org> [arm]
Tested-by: Petr Mladek <pmladek@suse.com>
Cc: Aaron Tomlin <atomlin@redhat.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-07 18:46:30 -07:00