The 'automatic_cstate_conversion_probe()' function has a too long 'if'
statement, convert it to a 'switch' statement in order to improve code
readability a bit.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Before this patch, SPR platform was considered identical to ICX platform. This
patch separates SPR support from ICX.
This patch is a preparation for adding SPR-specific package C-state limits
support.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Reviewed-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Intel Performance Hybrid processors have a 2nd MSR
describing the turbo limits enforced on the Ecores.
Note, TRL and Secondary-TRL are usually R/O information,
but on overclock-capable parts, they can be written.
Signed-off-by: Len Brown <len.brown@intel.com>
When CONFIG_INTEL_UNCORE_FREQ_CONTROL is effective,
(Linux 5.9 and later), print the current (and default)
min and max uncore frequency limits.
When that driver provides the current uncore frequency
(Linux 5.18 and later), print a UncMHz column
reflecting the current uncore frequency.
Note that UncMHz is an instantaneous sample, not an average.
eg.
$ sudo ./turbostat -S --show frequency
...
Uncore Frequency pkg0 die0: 800 - 3900 MHz (800 - 3900 MHz)
...
Avg_MHz Busy% Bzy_MHz TSC_MHz UncMHz
28 0.70 4049 3095 3900
Signed-off-by: Len Brown <len.brown@intel.com>
Currently if a fscanf fails then an early return leaks an open
file pointer. Fix this by fclosing the file before the return.
Detected using static analysis with cppcheck:
tools/power/x86/turbostat/turbostat.c:2039:3: error: Resource leak: fp [resourceLeak]
Fixes: eae97e053f ("tools/power turbostat: Support thermal throttle count print")
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Acked-by: Chen Yu <yu.c.chen@intel.com>
Reviewed-by: Tom Rix <trix@redhat.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Using strncmp for a single character comparison is overly complicated,
just use a simpler single character comparison instead. Also stops
static analyzers (such as cppcheck) from complaining about strncmp on
non-null terminated strings.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Len Brown <len.brown@intel.com>
It would be handy to have cmdline in turbostat output. For example,
according to the turbostat output, there are no C-states requested.
In this case the user is very curious if something like
intel_idle.max_cstate=0 was used, or may be idle=none too. It is
also curious whether things like intel_pstate=nohwp were used.
Print the boot command line accordingly:
turbostat version 21.05.04 - Len Brown <lenb@kernel.org>
Kernel command line: BOOT_IMAGE=/boot/vmlinuz-5.16.0+ root=UUID=
b42359ed-1e05-42eb-8757-6bf2a1c19070 ro quiet splash vt.handoff=7
Suggested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Pull turbostat changes for 5.19 from Len Brown:
"Chen Yu (1):
tools/power turbostat: Support thermal throttle count print
Dan Merillat (1):
tools/power turbostat: fix dump for AMD cpus
Len Brown (5):
tools/power turbostat: tweak --show and --hide capability
tools/power turbostat: fix ICX DRAM power numbers
tools/power turbostat: be more useful as non-root
tools/power turbostat: No build warnings with -Wextra
tools/power turbostat: version 2022.04.16
Sumeet Pawnikar (2):
tools/power turbostat: Add Power Limit4 support
tools/power turbostat: print power values upto three decimal
Zephaniah E. Loss-Cutler-Hull (2):
tools/power turbostat: Allow -e for all names.
tools/power turbostat: Allow printing header every N iterations"
* 'turbostat' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
tools/power turbostat: version 2022.04.16
tools/power turbostat: No build warnings with -Wextra
tools/power turbostat: be more useful as non-root
tools/power turbostat: fix ICX DRAM power numbers
tools/power turbostat: Support thermal throttle count print
tools/power turbostat: Allow printing header every N iterations
tools/power turbostat: Allow -e for all names.
tools/power turbostat: print power values upto three decimal
tools/power turbostat: Add Power Limit4 support
tools/power turbostat: fix dump for AMD cpus
tools/power turbostat: tweak --show and --hide capability
Don't exit if used this way:
sudo setcap cap_sys_nice,cap_sys_rawio=+ep ./turbostat
sudo chmod +r /dev/cpu/*/msr
./turbostat
note: cap_sys_admin is now also needed for the perf IPC counter:
sudo setcap cap_sys_admin,cap_sys_nice,cap_sys_rawio=+ep ./turbostat
Reported-by: Artem S. Tashkinov <aros@gmx.com>
Reported-by: Toby Broom <tbroom@outlook.com>
Signed-off-by: Len Brown <len.brown@intel.com>
ICX (and its duplicates) require special hard-coded DRAM RAPL units,
rather than using the generic RAPL energy units.
Reported-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
The turbostat data is collected by end user for power evaluationit. However
it looks like we are missing enough thermal context there. Already a couple of
time we found that power management developer asking something like this:
grep -r . /sys/devices/system/cpu/cpu*/thermal_throttle/*
Print the per core thermal throttle count so as to get suffificent thermal
context.
turbostat -i 5 -s Core,CPU,CoreThr
Core CPU CoreThr
- - 104
0 0 61
0 4
1 1 0
1 5
2 2 104
2 6
3 3 7
3 7
Suggested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
This gives the ability to reprint the header every N iterations, so you
can ensure that a scrolling display always has the header visible
somewhere on the screen.
Signed-off-by: Zephaniah E. Loss-Cutler-Hull <zephaniah@gmail.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Currently, there are a number of variables which are displayed by
default, enabled with -e all, and listed by --list, but which you can
not give to --enable/-e.
So you can enable CPU0c1 (in the bic array), but you can't enable C1 or
C1% (not in the bic array, but exists in sysfs).
This runs counter to both the documentation and user expectations, and
it's just not very user friendly.
As such, the mechanism used by --hide has been duplicated, and is now
also used by --enable, so we can handle unknown names gracefully.
Note: One impact of this is that truly unknown fields given to --enable
will no longer generate errors, they will be silently ignored, as --hide
does.
Signed-off-by: Zephaniah E. Loss-Cutler-Hull <zephaniah@gmail.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Print power values upto three decimal places in watts.
Suggested-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Sumeet Pawnikar <sumeet.r.pawnikar@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
turbostat --Dump exits early with status 243 (-13)
get_counters() calls get_msr_sum() on zen CPUS
for MSR_PKG_ENERGY_STAT, but per_cpu_msr_sum
has not been initialized.
Signed-off-by: Dan Merillat <git@dan.eginity.com>
Signed-off-by: Len Brown <len.brown@intel.com>
'MSR_PKG_CST_CONFIG_CONTROL' encodes the deepest allowed package C-state limit,
and turbostat decodes it.
Before this patch: turbostat does not recognize value "3" on Ice Lake Xeon
(ICX) and Sapphire Rapids Xeon (SPR), treats it as "unknown", and does not
display any package C-states in the results table.
After this patch: turbostat recognizes value 3 on ICX and SPR, treats it as
"PC6", and correctly displays package C-states in the results table.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
As idle, in particular, can have many columns on some machines...
Make it easy to ignore them all at once.
Signed-off-by: Len Brown <len.brown@intel.com>
There are two TCC activation temeprature.
One is the default TCC activation temperature, also known as TJ_MAX.
Another one is the effective TCC activation temperature, which is the
subtraction of default TCC activation temperature and TCC offset.
The name of variable tcc_activation_temp might be misleading here.
Thus rename tcc_activation_temp to tj_max, and use tcc_default and
tcc_offset to calculate the effective TCC activation temperature.
No functional change in this patch.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
The length of TCC Offset bits varies on different platforms.
Decode TCC Offset bits only for the platforms that we have verified.
For the others, only show default TCC activation temperature.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
CPU model may get changed in intel_model_duplicates() for code reuse.
But there are still some cases we need the original CPU model to handle
minor differences between generations.
Thus save the original CPU model.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
For Atom CPUs that have core cstate deeper than C6,
MSR_CORE_C6_RESIDENCY actually returns the residency for both CC6 and
deeper Core cstates.
Thus, the real Core C6 residency should be the subtraction of
MSR_CORE_C6_RESIDENCY return value and MSR_CORE_C6_RESIDENCY return value.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
C-state pre-wake setting[1] is an optimization for some Intel CPUs to
be woken up from deep C-states in order to reduce latency. According to
the spec, the BIT30 is the C-state Pre-wake Disable. Expose this setting
accordingly.
Sample output from turbostat:
...
cpu51: MSR_IA32_POWER_CTL: 0x1a00a40059 (C1E auto-promotion: DISabled)
C-state Pre-wake: ENabled
cpu51: MSR_TURBO_RATIO_LIMIT: 0x2021212121212224
...
[1] https://intel.github.io/wult/#c-state-pre-wake
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
It was found that on Elkhart Lake the TSC frequency is driven by
a separate crystal-clock domain, which is different from the
BCLK domain which includes mperf. This has result in small different
speed thus inconsistence between TSC and the mperf, which caused the
Busy% to be higher than 100%. On this platform it seems that the mperf
runs faster than tsc when the CPU is 100% utilized:
delta tsc(18815473183) < delta mperf(18958403680) for 10 seconds.
To align TSC with mperf, leverage the tsc_tweak mechanism introduced for
cores newer than Skylake, so that TSC and mperf would be calculated in
the same domain.
Reported-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Do not mark a comment as kernel-doc notation when it is not
meant to be in kernel-doc notation.
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Len Brown <len.brown@intel.com>
Currently the turbostat treats ICX the same way as SKX and shares the
code among them. But one difference is that ICX does not support Package
C6 Retention, unlike SKX and CLX.
So this patch:
1. Splitting SKX and ICX in turbostat.
2. Removing Package C6 Rentention for ICX.
And after this split, it would be easier to cutomize Ice Lake Server
in turbostat in the future.
Suggested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Reviewed-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Tested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
The idx_to_offset() function returns type int (32-bit signed), but
MSR_PKG_ENERGY_STAT is u32 and would be interpreted as a negative number.
The end result is that it hits the if (offset < 0) check in update_msr_sum()
which prevents the timer callback from updating the stat in the background when
long durations are used. The similar issue exists in offset_to_idx() and
update_msr_sum(). Fix this issue by converting the 'int' to 'off_t' accordingly.
Fixes: 9972d5d84d ("tools/power turbostat: Enable accumulate RAPL display")
Signed-off-by: Calvin Walton <calvin.walton@kepstin.ca>
Signed-off-by: Len Brown <len.brown@intel.com>
It was reported that on Zen+ system turbostat started exiting,
which was tracked down to the MSR_PKG_ENERGY_STAT read failing because
offset_to_idx wasn't returning a non-negative index.
This patch combined the modification from Bingsong Si and
Bas Nieuwenhuizen and addd the MSR to the index system as alternative for
MSR_PKG_ENERGY_STATUS.
Fixes: 9972d5d84d ("tools/power turbostat: Enable accumulate RAPL display")
Reported-by: youling257 <youling257@gmail.com>
Tested-by: youling257 <youling257@gmail.com>
Tested-by: Kurt Garloff <kurt@garloff.de>
Tested-by: Bingsong Si <owen.si@ucloud.cn>
Tested-by: Artem S. Tashkinov <aros@gmx.com>
Co-developed-by: Bingsong Si <owen.si@ucloud.cn>
Co-developed-by: Terry Bowman <terry.bowman@amd.com>
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
This reverts commit 6ff7cb371c.
Apparently the TCC offset should not be used to adjust what temperature
we show the user after all.
(on most systems, TCC offset is 0, FWIW)
Fixes: 6ff7cb371c
Signed-off-by: Len Brown <len.brown@intel.com>
Ice Lake D is low-end server version of Ice Lake X, reuse
the code accordingly.
Tested-by: Wendy Wang <wendy.wang@intel.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Share the code between Alder Lake Mobile and Alder Lake Desktop.
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Use linux-perf to access the hardware instructions-retired counter.
This is necessary because the counter is not enabled by default,
and also the counter is prone to roll-over -- both of which
perf manages.
It is not necessary to use perf for the cycle counter,
because turbostat already needs to collect delta-aperf
to calcuate frequency.
Signed-off-by: Len Brown <len.brown@intel.com>
Commit
6d6501d912 ("tools/power/turbostat: Read energy_perf_bias from sysfs")
converted turbostat to read the energy_perf_bias value from sysfs.
However, older kernels which do not have that file yet, would fail. For
those, fall back to the MSR reading.
Fixes: 6d6501d912 ("tools/power/turbostat: Read energy_perf_bias from sysfs")
Reported-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Link: https://lkml.kernel.org/r/20210127132444.981120-1-dedekind1@gmail.com
an attempt to have userspace tools not poke at naked MSRs. This round
deals with MSR_IA32_ENERGY_PERF_BIAS and removes direct poking into it
by our in-tree tools in favor of the proper "energy_perf_bias" sysfs
interface which we already have.
In addition, the msr.ko write filtering's error message points to a new
summary page which contains the info we collected from helpful reporters
about which userspace tools write MSRs:
https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/about
along with the current status of their conversion.
Rest is the usual small fixes and improvements.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAl/XVKYACgkQEsHwGGHe
VUondg//fv3aQM3KtWE7sxv6BjpiUNozPBELRuKo+EskHSxHudRhBxzdSMM7WgKq
2uojb2CQtzRzYhHuiXjXKfbB7Ci/Jo4EDCJW2otpiqit7/UgXu15Q5ypCUMIteiV
u9A2w3oN3GPR5TuofLWCffaotVMpFok3u7jX7RxEQPWmZqJItTwZpqYLeyniHaKM
c6taAxZVyV13iejRhxim2zkl/hMXpjA8I+8CqWIL25J7GYlYeWLWxWYmHIQTs0NM
zSIyr47RD8RRXVeRdeJMxnQblKE1zrObIV1fUXXu1dSW47DkrrcOQwEMorNjPtPA
FR5Xhi+TX8JrBasMpwCnV/CTj6Ua8UsMfwQcPOFnXALPj87HfFSypa5BpnBH5xTW
PaiatRmiNJm3g79ncaTvXCksMbb4WANqOYK+gsGYvtKbfLR+caWT6vytjZA6sC6x
laynstV9PFUyewdwjjAjilhArzV+y+5RsRudBK8xSjcawbyV4ZEorNKYS9qrhm+y
7CAM9A8fCQiO6POr6W7HcfmkUOHC9PLhtyjdJH89tAmaf+sfvaczzx3awwSuKx7P
0rJlDiJP1v7yEpOMWHbpGIqjMBaWK4y3mb4g3UwFpHpo8cTl+WXZQppOPIBn9GA9
ASLYT/ze7zk1Ua2V88qoXiC5AEvqBnSq4fp2pmf06ROZgBnYT6o=
=ISyk
-----END PGP SIGNATURE-----
Merge tag 'x86_misc_for_v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull misc x86 updates from Borislav Petkov:
"The main part of this branch is the ongoing fight against windmills in
an attempt to have userspace tools not poke at naked MSRs.
This round deals with MSR_IA32_ENERGY_PERF_BIAS and removes direct
poking into it by our in-tree tools in favor of the proper
"energy_perf_bias" sysfs interface which we already have.
In addition, the msr.ko write filtering's error message points to a
new summary page which contains the info we collected from helpful
reporters about which userspace tools write MSRs:
https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/about
along with the current status of their conversion.
The rest is the usual small fixes and improvements"
* tag 'x86_misc_for_v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/msr: Add a pointer to an URL which contains further details
x86/pci: Fix the function type for check_reserved_t
selftests/x86: Add missing .note.GNU-stack sections
selftests/x86/fsgsbase: Fix GS == 1, 2, and 3 tests
x86/msr: Downgrade unrecognized MSR message
x86/msr: Do not allow writes to MSR_IA32_ENERGY_PERF_BIAS
tools/power/x86_energy_perf_policy: Read energy_perf_bias from sysfs
tools/power/turbostat: Read energy_perf_bias from sysfs
tools/power/cpupower: Read energy_perf_bias from sysfs
MAINTAINERS: Cleanup SGI-related entries
turbostat tends to get confused when CPUs are added and removed
while it is running.
There are races, such as checking the current cpu, and then
reading a sysfs file that depends on that cpu number.
Close the two issues that seem to come up the most.
First, there is an infinite reset loop detector --
change that to allow more resets before giving up.
Secondly, one of those file reads didn't really need
to exit the program on failure...
Signed-off-by: Len Brown <len.brown@intel.com>
cpu1: MSR_IA32_TEMPERATURE_TARGET: 0x05640000 (95 C) (100 default - 5 offset)
Account for the new "offset" field in MSR_TEMPERATURE_TARGET.
While this field is usually zero, ignoring it results in over-stating
the current temperature, both per-core and per-package.
Signed-off-by: Len Brown <len.brown@intel.com>
For compatibility reasons, Glibc off_t is a 32-bit type on 32-bit x86
unless _FILE_OFFSET_BITS=64 is defined. Add this define, as otherwise
reading MSRs with index 0x80000000 and above attempts a pread with a
negative offset, which fails.
Signed-off-by: Alexander Monakov <amonakov@ispras.ru>
Tested-by: Liwei Song <liwei.song@windriver.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Family 19h processors have the same RAPL (Running average power limit)
hardware register interface as Family 17h processors.
Change the family checks to succeed for Family 17h and above to enable
core and package energy measurement on Family 19h machines.
Also update the TDP to the largest found at the bottom of the page at
amd.com->processors->servers->epyc->2nd-gen-epyc, i.e., the EPYC 7H12.
Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Len Brown <lenb@kernel.org>
Cc: linux-pm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Len Brown <len.brown@intel.com>
Jacobsville doesn't have Package C2 and C6. Also
Core and DRAM RAPL are not available. Adjust output
accordingly.
Signed-off-by: Antti Laakso <antti.laakso@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
The column already present called GFXMHz reads from gt_cur_freq_mhz,
which represents the GT frequency that was requested, but power
management might not be able to do that. So the new column will display
what the actual frequency GT is running at.
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Like we skip PC3 and PC6 columns when the package C-state limit
disables them, skip PC8/PC9/CP10 under analogous conditions.
Reported-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
turbostat formatting is broken with ACPI CST for enumeration. The
problem is that the CX_ACPI% is eight characters long which does not
work with tab formatting. One simple solution is to remove the underbar
from the state name such that C1_ACPI will be displayed as C1ACPI.
Signed-off-by: David Arcari <darcari@redhat.com>
Cc: Len Brown <lenb@kernel.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Len Brown <len.brown@intel.com>
Rationale:
Reduces attack surface on kernel devs opening the links for MITM
as HTTPS traffic is much harder to manipulate.
Deterministic algorithm:
For each file:
If not .svg:
For each line:
If doesn't contain `\bxmlns\b`:
For each link, `\bhttp://[^# \t\r\n]*(?:\w|/)`:
If neither `\bgnu\.org/license`, nor `\bmozilla\.org/MPL\b`:
If both the HTTP and HTTPS versions
return 200 OK and serve the same content:
Replace HTTP with HTTPS.
Signed-off-by: Alexander A. Klimov <grandmaster@al2klimov.de>
Signed-off-by: Len Brown <len.brown@intel.com>
Disabling cpu 0 results in an error
turbostat: /sys/devices/system/cpu/cpu0/topology/thread_siblings: open failed: No such file or directory
Use sched_getcpu() instead of a hardcoded cpu 0 to get the max cpu number.
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Since the RAPL Joule Counter is 32 bit, turbostat would
only print a *star* instead of printing the actual energy
consumed to indicate the overflow due to long duration.
This does not meet the requirement from servers as the
sampling time of turbostat is usually very long on servers.
So maintain a set of MSR buffer, and update them
periodically before the 32bit MSR register is wrapped round,
so as to avoid the overflow.
The idea is similar to the implementation of ktime_get():
Periodical MSR timer:
total_rapl_sum += (current_rapl_msr - last_rapl_msr);
Using get_msr_sum() to get the accumulated RAPL:
return (current_rapl_msr - last_rapl_msr) + total_rapl_sum;
The accumulated RAPL mechanism will be turned on in next patch.
Originally-by: Aaron Lu <aaron.lwe@gmail.com>
Reviewed-by: Doug Smythies <dsmythies@telus.net>
Tested-by: Doug Smythies <dsmythies@telus.net>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Change the energy variable from 32bit to 64bit,
so that it can record long time duration.
After this conversion, adjust the DELTA_WRAP32() accordingly.
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
If the --quiet option is not used, turbostat prints a useful system
configuration header during startup.
But inclusion of idle system configuration information in this header
is currently a function of inclusion in the columns chosen to be displayed.
Always list this idle system configuration.
Signed-off-by: Doug Smythies <dsmythies@telus.net>
Signed-off-by: Len Brown <len.brown@intel.com>
Users are puzzled when they use tuned performance and all their
C-states vanish. Dump /dev/cpu_dma_latency and state
whether the value is default, or constraining,
to explain this situation.
Signed-off-by: Len Brown <len.brown@intel.com>
Some Chromebook BIOS' do not export an ACPI LPIT, which is how
Linux finds the residency counter for CPU and SYSTEM low power states,
that is exports in /sys/devices/system/cpu/cpuidle/*residency_us
When these sysfs attributes are missing, check the debugfs attrubte
from the pmc_core driver, which accesses the same counter value.
Signed-off-by: Len Brown <len.brown@intel.com>
From a turbostat point of view the Tremont-based Elkhart Lake
is very similar to Goldmont, reuse the code of Goldmont.
Elkhart Lake does not support 'group turbo limit counter'
nor C3, adjust the code accordingly.
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Jasper Lake, like Elkhart Lake, uses a Tremont CPU.
So reuse the code.
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
From a turbostat point of view, Ice Lake server looks like Sky Lake server.
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
From a turbostat point of view, Tiger Lake looks like Ice Lake.
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Warning: ‘__builtin_strncpy’ specified bound 20 equals destination size
[-Wstringop-truncation]
reduce param to strncpy, to guarantee that a null byte is always copied
into destination buffer.
Signed-off-by: Len Brown <len.brown@intel.com>
From a turbostat point of view, Cometlake is like Kabylake.
Suggested-by: Rui Zhang <rui.zhang@intel.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Sync msr-index.h to pull in recent renames of the IA32_FEATURE_CONTROL
MSR definitions. Update KVM's VMX selftest and turbostat accordingly.
Keep the full name in turbostat's output to avoid breaking someone's
workflow, e.g. if a script is looking for the full name.
While using the renamed defines is by no means necessary, do the sync
now to avoid leaving a landmine that will get stepped on the next time
msr-index.h needs to be refreshed for some other reason.
No functional change intended.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20191221044513.21680-4-sean.j.christopherson@intel.com
Conflicts:
tools/power/x86/turbostat/turbostat.c
Recent turbostat changes conflicted with a pending rename of x86 model names in tip:x86/cpu,
sort it out.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Commit 9392bd98bb ("tools/power turbostat: Add support for AMD
Fam 17h (Zen) RAPL") and the commit 3316f99a9f ("tools/power
turbostat: Also read package power on AMD F17h (Zen)") add AMD Fam 17h
RAPL support.
Hygon Family 18h(Dhyana) support RAPL in bit 14 of CPUID 0x80000007 EDX,
and has MSRs RAPL_PWR_UNIT/CORE_ENERGY_STAT/PKG_ENERGY_STAT. So add Hygon
Dhyana Family 18h support for RAPL.
Already tested on Hygon multi-node systems and it shows correct per-core
energy usage and the total package power.
Signed-off-by: Pu Wen <puwen@hygon.cn>
Reviewed-by: Calvin Walton <calvin.walton@kepstin.ca>
Signed-off-by: Len Brown <len.brown@intel.com>
Commit 9392bd98bb ("tools/power turbostat: Add support for AMD
Fam 17h (Zen) RAPL") add a function get_tdp_amd(), the parameter is CPU
family. But the rapl_probe_amd() function use wrong model parameter.
Fix the wrong caller parameter of get_tdp_amd() to use family.
Cc: <stable@vger.kernel.org> # v5.1+
Signed-off-by: Pu Wen <puwen@hygon.cn>
Reviewed-by: Calvin Walton <calvin.walton@kepstin.ca>
Signed-off-by: Len Brown <len.brown@intel.com>
In some case C1% will be wrong value, when platform doesn't have MSR for
C1 residency.
For example:
Core CPU CPU%c1
- - 100.00
0 0 100.00
0 2 100.00
1 1 100.00
1 3 100.00
But adding Busy% will fix this
Core CPU Busy% CPU%c1
- - 99.77 0.23
0 0 99.77 0.23
0 2 99.77 0.23
1 1 99.77 0.23
1 3 99.77 0.23
This issue can be reproduced on most of the recent systems including
Broadwell, Skylake and later.
This is because if we don't select Busy% or Avg_MHz or Bzy_MHz then
mperf value will not be read from MSR, so it will be 0. But this
is required for C1% calculation when MSR for C1 residency is not present.
Same is true for C3, C6 and C7 column selection.
So add another define DO_BIC_READ(), which doesn't depend on user
column selection and use for mperf, C3, C6 and C7 related counters.
So when there is no platform support for C1 residency counters,
we still read these counters, if the CPU has support and user selected
display of CPU%c1.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Turbostat works by taking a snapshot of counters, sleeping, taking another
snapshot, calculating deltas, and printing out the table.
The sleep time is controlled via -i option or by user sending a signal or a
character to stdin. In the latter case, turbostat always adds 1 ms
sleep before it reads the counters, in order to avoid larger imprecisions
in the results in prints.
While the 1 ms delay may be a good idea for a "dumb" user, it is a
problem for an "aware" user. I do thousands and thousands of measurements
over a short period of time (like 2ms), and turbostat unconditionally adds
a 1ms to my interval, so I cannot get what I really need.
This patch removes the unconditional 1ms sleep. This is an expert user
tool, after all, and non-experts will unlikely ever use it in the non-fixed
interval mode anyway, so I think it is OK to remove the 1ms delay.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Commit '47936f944e78 tools/power turbostat: fix printing on input' make
a valid fix, but it completely disabled piped stdin support, which is
a valuable use-case. Indeed, if stdin is a pipe, turbostat won't read
anything from it, so it becomes impossible to get turbostat output at
user-defined moments, instead of the regular intervals.
There is no reason why this should works for terminals, but not for
pipes. This patch improves the situation. Instead of ignoring pipes, we
read data from them but gracefully handle the EOF case.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
This enables turbostat utility on Ice Lake NNPI SoC.
Link: https://lkml.org/lkml/2019/6/5/1034
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Perhaps if this more descriptive name had been used,
then we wouldn't have had the HSW ULT vs HSW CORE bug,
fixed by the previous commit.
Signed-off-by: Len Brown <len.brown@intel.com>
turbostat: cpu0: msr offset 0x630 read failed: Input/output error
because Haswell Core does not have C8-C10.
Output C8-C10 only on Haswell ULT.
Fixes: f5a4c76ad7 ("tools/power turbostat: consolidate duplicate model numbers")
Reported-by: Prarit Bhargava <prarit@redhat.com>
Suggested-by: Kosuke Tatsukawa <tatsu@ab.jp.nec.com>
Signed-off-by: Len Brown <len.brown@intel.com>
turbostat could be terminated by general protection fault on some latest
hardwares which (for example) support 9 levels of C-states and show 18
"tADDED" lines. That bloats the total output and finally causes buffer
overrun. So let's extend the buffer to avoid this.
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Currently the error return path does not close the file fp and leaks
a file descriptor. Fix this by closing the file.
Fixes: 5ea7647b33 ("tools/power turbostat: Warn on bad ACPI LPIT data")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Turbostat currently normalizes TSC and other values by dividing by an
interval. This interval is the delta between the start of one global
(all counters on all CPUs) sampling and the start of another. However,
this introduces a lot of jitter into the data.
In order to reduce jitter, the interval calculation should be based on
timestamps taken per thread and close to the start of the thread's
sampling.
Define a per thread time value to hold the delta between samples taken
on the thread.
Use the timestamp taken at the beginning of sampling to calculate the
delta.
Move the thread's beginning timestamp to after the CPU migration to
avoid jitter due to the migration.
Use the global time delta for the average time delta.
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Compiling without optimisations is silly, especially since some
warnings depend on the optimiser. Use -O2.
Fortify adds warnings for unchecked I/O (among other things), which
seems to be a good idea for user-space code. Enable that too.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Len Brown <len.brown@intel.com>
Currently big microservers have _XEON_D while small microservers have
_X, Make it uniformly: _D.
for i in `git grep -l "\(INTEL_FAM6_\|VULNWL_INTEL\|INTEL_CPU_FAM6\).*_\(X\|XEON_D\)"`
do
sed -i -e 's/\(\(INTEL_FAM6_\|VULNWL_INTEL\|INTEL_CPU_FAM6\).*ATOM.*\)_X/\1_D/g' \
-e 's/\(\(INTEL_FAM6_\|VULNWL_INTEL\|INTEL_CPU_FAM6\).*\)_XEON_D/\1_D/g' ${i}
done
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: x86@kernel.org
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Link: https://lkml.kernel.org/r/20190827195122.677152989@infradead.org
Currently big core clients with extra graphics on have:
- _G
- _GT3E
Make it uniformly: _G
for i in `git grep -l "\(INTEL_FAM6_\|VULNWL_INTEL\|INTEL_CPU_FAM6\).*_GT3E"`
do
sed -i -e 's/\(\(INTEL_FAM6_\|VULNWL_INTEL\|INTEL_CPU_FAM6\).*\)_GT3E/\1_G/g' ${i}
done
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: x86@kernel.org
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Link: https://lkml.kernel.org/r/20190827195122.622802314@infradead.org
Currently big core mobile chips have either:
- _L
- _ULT
- _MOBILE
Make it uniformly: _L.
for i in `git grep -l "\(INTEL_FAM6_\|VULNWL_INTEL\|INTEL_CPU_FAM6\).*_\(MOBILE\|ULT\)"`
do
sed -i -e 's/\(\(INTEL_FAM6_\|VULNWL_INTEL\|INTEL_CPU_FAM6\).*\)_\(MOBILE\|ULT\)/\1_L/g' ${i}
done
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: x86@kernel.org
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190827195122.568978530@infradead.org
Currently the big core client models either have:
- no OPTDIFF
- _CORE
- _DESKTOP
Make it uniformly: 'no OPTDIFF'.
for i in `git grep -l "\(INTEL_FAM6_\|VULNWL_INTEL\|INTEL_CPU_FAM6\).*_\(CORE\|DESKTOP\)"`
do
sed -i -e 's/\(\(INTEL_FAM6_\|VULNWL_INTEL\|INTEL_CPU_FAM6\).*\)_\(CORE\|DESKTOP\)/\1/g' ${i}
done
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: x86@kernel.org
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190827195122.513945586@infradead.org
Based on 1 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms and conditions of the gnu general public license
version 2 as published by the free software foundation this program
is distributed in the hope it will be useful but without any
warranty without even the implied warranty of merchantability or
fitness for a particular purpose see the gnu general public license
for more details you should have received a copy of the gnu general
public license along with this program if not write to the free
software foundation inc 51 franklin st fifth floor boston ma 02110
1301 usa
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-only
has been chosen to replace the boilerplate/reference in 111 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexios Zavras <alexios.zavras@intel.com>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190530000436.567572064@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pull x86 MDS mitigations from Thomas Gleixner:
"Microarchitectural Data Sampling (MDS) is a hardware vulnerability
which allows unprivileged speculative access to data which is
available in various CPU internal buffers. This new set of misfeatures
has the following CVEs assigned:
CVE-2018-12126 MSBDS Microarchitectural Store Buffer Data Sampling
CVE-2018-12130 MFBDS Microarchitectural Fill Buffer Data Sampling
CVE-2018-12127 MLPDS Microarchitectural Load Port Data Sampling
CVE-2019-11091 MDSUM Microarchitectural Data Sampling Uncacheable Memory
MDS attacks target microarchitectural buffers which speculatively
forward data under certain conditions. Disclosure gadgets can expose
this data via cache side channels.
Contrary to other speculation based vulnerabilities the MDS
vulnerability does not allow the attacker to control the memory target
address. As a consequence the attacks are purely sampling based, but
as demonstrated with the TLBleed attack samples can be postprocessed
successfully.
The mitigation is to flush the microarchitectural buffers on return to
user space and before entering a VM. It's bolted on the VERW
instruction and requires a microcode update. As some of the attacks
exploit data structures shared between hyperthreads, full protection
requires to disable hyperthreading. The kernel does not do that by
default to avoid breaking unattended updates.
The mitigation set comes with documentation for administrators and a
deeper technical view"
* 'x86-mds-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
x86/speculation/mds: Fix documentation typo
Documentation: Correct the possible MDS sysfs values
x86/mds: Add MDSUM variant to the MDS documentation
x86/speculation/mds: Add 'mitigations=' support for MDS
x86/speculation/mds: Print SMT vulnerable on MSBDS with mitigations off
x86/speculation/mds: Fix comment
x86/speculation/mds: Add SMT warning message
x86/speculation: Move arch_smt_update() call to after mitigation decisions
x86/speculation/mds: Add mds=full,nosmt cmdline option
Documentation: Add MDS vulnerability documentation
Documentation: Move L1TF to separate directory
x86/speculation/mds: Add mitigation mode VMWERV
x86/speculation/mds: Add sysfs reporting for MDS
x86/speculation/mds: Add mitigation control for MDS
x86/speculation/mds: Conditionally clear CPU buffers on idle entry
x86/kvm/vmx: Add MDS protection when L1D Flush is not active
x86/speculation/mds: Clear CPU buffers on exit to user
x86/speculation/mds: Add mds_clear_cpu_buffers()
x86/kvm: Expose X86_FEATURE_MD_CLEAR to guests
x86/speculation/mds: Add BUG_MSBDS_ONLY
...
Pull turbostat utility updates for 5.1 from Len Brown:
"Misc fixes and updates."
* 'turbostat' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
tools/power turbostat: update version number
tools/power turbostat: Warn on bad ACPI LPIT data
tools/power turbostat: Add checks for failure of fgets() and fscanf()
tools/power turbostat: Also read package power on AMD F17h (Zen)
tools/power turbostat: Add support for AMD Fam 17h (Zen) RAPL
tools/power turbostat: Do not display an error on systems without a cpufreq driver
tools/power turbostat: Add Die column
tools/power turbostat: Add Icelake support
tools/power turbostat: Cleanup CNL-specific code
tools/power turbostat: Cleanup CC3-skip code
tools/power turbostat: Restore ability to execute in topology-order
On some systems /sys/devices/system/cpu/cpuidle/low_power_idle_cpu_residency_us
or /sys/devices/system/cpu/cpuidle/low_power_idle_system_residency_us
return a file error because of bad ACPI LPIT data from a misconfigured BIOS.
turbostat interprets this failure as a fatal error and outputs
turbostat: CPU LPI: No data available
If the ACPI LPIT sysfs files return an error output a warning instead of
a fatal error, disable the ACPI LPIT evaluation code, and continue.
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Most calls to fgets() and fscanf() are followed by error checks.
Add an exit-on-error in the remaining cases.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Len Brown <len.brown@intel.com>
The package power can also be read from an MSR. It's not clear exactly
what is included, and whether it's aggregated over all nodes or
reported separately.
It does look like this is reported separately per CCX (I get a single
value on the Ryzen R7 1700), but it might be reported separately per-
die (node?) on larger processors. If that's the case, it would have to
be recorded per node and aggregated for the socket.
Note that although Zen has these MSRs reporting power, it looks like
the actual RAPL configuration (power limits, configured TDP) is done
through PCI configuration space. I have not yet found any public
documentation for this.
Signed-off-by: Calvin Walton <calvin.walton@kepstin.ca>
Signed-off-by: Len Brown <len.brown@intel.com>
Based on the Open-Source Register Reference for AMD Family 17h
Processors Models 00h-2Fh:
https://support.amd.com/TechDocs/56255_OSRR.pdf
These processors report RAPL support in bit 14 of CPUID 0x80000007 EDX,
and the following MSRs are present:
0xc0010299 (RAPL_PWR_UNIT), like Intel's RAPL_POWER_UNIT
0xc001029a (CORE_ENERGY_STAT), kind of like Intel's PP0_ENERGY_STATUS
0xc001029b (PKG_ENERGY_STAT), like Intel's PKG_ENERGY_STATUS
A notable difference from the Intel implementation is that AMD reports
the "Cores" energy usage separately for each core, rather than a
per-package total. The code has been adjusted to handle either case in a
generic way.
I haven't yet enabled collection of package power, due to being unable
to test it on multi-node systems (TR, EPYC).
Signed-off-by: Calvin Walton <calvin.walton@kepstin.ca>
Signed-off-by: Len Brown <len.brown@intel.com>
Running without a cpufreq driver is a valid case so warnings output in
this case should not be to stderr.
Use outf instead of stderr for these warnings.
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Len Brown <len.brown@intel.com>
turbostat executes on CPUs in "topology order".
This is an optimization for measuring profoundly idle systems --
as the closest hardware is woken next...
Fix a typo that was added with the sub-die-node support,
that broke topology ordering on multi-node systems.
Signed-off-by: Len Brown <len.brown@intel.com>
turbostat failed to return a non-zero exit status even though the
supplied command (turbostat <command>) failed. Currently when turbostat
forks a command it returns zero instead of the actual exit status of the
command. Modify the code to return the exit status.
Signed-off-by: David Arcari <darcari@redhat.com>
Acked-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Greg pointed out that speculation related bit defines are using (1 << N)
format instead of BIT(N). Aside of that (1 << N) is wrong as it should use
1UL at least.
Clean it up.
[ Josh Poimboeuf: Fix tools build ]
Reported-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Jon Masters <jcm@redhat.com>
Tested-by: Jon Masters <jcm@redhat.com>
perf c2c:
Jiri Olsa:
- Change the default coalesce setup to from '--coalesce pid,iaddr' to just '--coalesce iaddr'.
- Increase the HITM ratio limit for displayed cachelines.
perf script:
Andi Kleen:
- Fix LBR skid dump problems in brstackinsn.
perf trace:
Arnaldo Carvalho de Melo:
- Check if the raw_syscalls:sys_{enter,exit} are setup before setting tp filter.
- Do not hardcode the size of the tracepoint common_ fields.
- Beautify USBDEFFS_ ioctl commands.
Colin Ian King:
- Use correct SECCOMP prefix spelling, "SECOMP_*" -> "SECCOMP_*".
perf python:
Jiri Olsa:
- Do not force closing original perf descriptor in evlist.get_pollfd().
tools misc:
Jiri Olsa:
- Allow overriding CFLAGS and LDFLAGS.
perf build:
Stanislav Fomichev:
- Don't unconditionally link the libbfd feature test to -liberty and -lz
thread-stack:
Adrian Hunter:
- Fix processing for the idle task, having a stack per cpu.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCXC4BOgAKCRCyPKLppCJ+
Jy1BAP0Vr6fC+mv/ul0x3WC4dlF0UG9p9+GxBoKsXPpG5vojCgEAqX7F+Pmx+6HK
FIBjbOWIL5NYRViskwPlQy5+qkKmJgA=
=I4gE
-----END PGP SIGNATURE-----
Merge tag 'perf-core-for-mingo-4.21-20190103' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
perf c2c:
Jiri Olsa:
- Change the default coalesce setup to from '--coalesce pid,iaddr' to just '--coalesce iaddr'.
- Increase the HITM ratio limit for displayed cachelines.
perf script:
Andi Kleen:
- Fix LBR skid dump problems in brstackinsn.
perf trace:
Arnaldo Carvalho de Melo:
- Check if the raw_syscalls:sys_{enter,exit} are setup before setting tp filter.
- Do not hardcode the size of the tracepoint common_ fields.
- Beautify USBDEFFS_ ioctl commands.
Colin Ian King:
- Use correct SECCOMP prefix spelling, "SECOMP_*" -> "SECCOMP_*".
perf python:
Jiri Olsa:
- Do not force closing original perf descriptor in evlist.get_pollfd().
tools misc:
Jiri Olsa:
- Allow overriding CFLAGS and LDFLAGS.
perf build:
Stanislav Fomichev:
- Don't unconditionally link the libbfd feature test to -liberty and -lz
thread-stack:
Adrian Hunter:
- Fix processing for the idle task, having a stack per cpu.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
So that the user can specify outside CFLAGS/LDFLAGS values.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Herton Krzesinski <herton@redhat.com>
Cc: Len Brown <len.brown@intel.com>
Link: http://lkml.kernel.org/r/20181212102537.25902-5-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Often a new processor gets a new model number, but from a turbostat
point of view, it is the same as a previous model. Support duplicates
with 1-line updates, rather than error-prone scattering of model #'s.
Signed-off-by: Len Brown <len.brown@intel.com>
When the C-state limit is 8 on Goldmont, PC10 is enabled.
Previously turbostat saw this as "undefined", and thus assumed
it should not show some counters, such as pc3, pc6, pc7.
Signed-off-by: Len Brown <len.brown@intel.com>
A recent turbostat release increased topo.max_cpu_num
to make it convenient to handle sysfs bitmaps of 32-cpus.
But users, who regularly make use of "--debug", then saw a bunch of output
for cpus that were not present.
Remove that extra output by checking a cpu is online before dumping its info.
Signed-off-by: Len Brown <len.brown@intel.com>
Cc: Prarit Bhargava <prarit@redhat.com>
turbostat recently gained a feature adding APIC and X2APIC columns.
While they are disabled by-default, they are enabled with --debug
or when explicitly requested, eg.
$ sudo turbostat --quiet --show Package,Node,Core,CPU,APIC,X2APIC date
But these columns erroneously showed zeros on AMD hardware.
This patch corrects the APIC and X2APIC [sic] columns on AMD.
Signed-off-by: Len Brown <len.brown@intel.com>
Going primarily by:
https://en.wikipedia.org/wiki/List_of_Intel_Atom_microprocessors
with additional information gleaned from other related pages; notably:
- Bonnell shrink was called Saltwell
- Moorefield is the Merriefield refresh which makes it Airmont
The general naming scheme is: FAM6_ATOM_UARCH_SOCTYPE
for i in `git grep -l FAM6_ATOM` ; do
sed -i -e 's/ATOM_PINEVIEW/ATOM_BONNELL/g' \
-e 's/ATOM_LINCROFT/ATOM_BONNELL_MID/' \
-e 's/ATOM_PENWELL/ATOM_SALTWELL_MID/g' \
-e 's/ATOM_CLOVERVIEW/ATOM_SALTWELL_TABLET/g' \
-e 's/ATOM_CEDARVIEW/ATOM_SALTWELL/g' \
-e 's/ATOM_SILVERMONT1/ATOM_SILVERMONT/g' \
-e 's/ATOM_SILVERMONT2/ATOM_SILVERMONT_X/g' \
-e 's/ATOM_MERRIFIELD/ATOM_SILVERMONT_MID/g' \
-e 's/ATOM_MOOREFIELD/ATOM_AIRMONT_MID/g' \
-e 's/ATOM_DENVERTON/ATOM_GOLDMONT_X/g' \
-e 's/ATOM_GEMINI_LAKE/ATOM_GOLDMONT_PLUS/g' ${i}
done
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: dave.hansen@linux.intel.com
Cc: len.brown@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This fixes the reported family on modern AMD processors (e.g. Ryzen,
which is family 0x17). Previously these processors all showed up as
family 0xf.
See the document
https://support.amd.com/TechDocs/56255_OSRR.pdf
section CPUID_Fn00000001_EAX for how to calculate the family
from the BaseFamily and ExtFamily values.
This matches the code in arch/x86/lib/cpu.c
Signed-off-by: Calvin Walton <calvin.walton@kepstin.ca>
Signed-off-by: Len Brown <len.brown@intel.com>
turbostat fails on some multi-package topologies because the logical node
enumeration assumes that the nodes are sequentially numbered,
which causes the logical numa nodes to not be enumerated, or enumerated incorrectly.
Use a more robust enumeration algorithm which allows for non-seqential physical nodes.
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Len Brown <len.brown@intel.com>
This patch fixes a regression introduced in
commit 8cb48b32a5 ("tools/power turbostat: track thread ID in cpu_topology")
Turbostat uses incorrect cores number ('topo.num_cores') - its value is count
of logical CPUs, instead of count of physical cores. So it is twice as large as
it should be on a typical Intel system. For example, on a 6 core Xeon system
'topo.num_cores' is 12, and on a 52 core Xeon system 'topo.num_cores' is 104.
And interestingly, on a 68-core Knights Landing Intel system 'topo.num_cores'
is 272, because this system has 4 logical CPUs per core.
As a result, some of the turbostat calculations are incorrect. For example,
on idle 52-core Xeon system when all cores are ~99% in Core C6 (CPU%c6), the
summary (very first) line shows ~48% Core C6, while it should be ~99%.
This patch fixes the problem by fixing 'topo.num_cores' calculation.
Was:
1. Init 'thread_id' for all CPUs to -1
2. Run 'get_thread_siblings()' which sets it to 0 or 1
3. Increment 'topo.num_cores' when thread_id != -1 (bug!)
Now:
1. Init 'thread_id' for all CPUs to -1
2. Run 'get_thread_siblings()' which sets it to 0 or 1
3. Increment 'topo.num_cores' when thread_id is not 0
I did not have a chance to test this on an AMD machine, and only tested on a
couple of Intel Xeons (6 and 52 cores).
Reported-by: Vladislav Govtva <vladislav.govtva@intel.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
The -S (system summary) option failed to print any data on a 1-processor system.
Reported-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Document the missing command line tokens in the help() function.
Signed-off-by: Nathan Ciobanu <nathan.d.ciobanu@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Improve the help() output by adding the single character
tokens (e.g -a).
Signed-off-by: Nathan Ciobanu <nathan.d.ciobanu@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Sort the command line arguments output of help() in
alphabetical order in line with other linux tools.
Signed-off-by: Nathan Ciobanu <nathan.d.ciobanu@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Running turbostat on machines that don't expose nodes
in sysfs (no /sys/bus/node) causes a segfault or a -nan
value diesplayed in the log. This is caused by
physical_node_id being reported as -1 and logical_node_id
being calculated as a negative number resulting in the new
GET_THREAD/GET_CORE returning an incorrect address.
Signed-off-by: Nathan Ciobanu <nathan.d.ciobanu@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Add APIC and X2APIC columns to the topology section.
They are disabled-by-default -- enable like so:
--debug
or
--enable APIC,X2APIC
Signed-off-by: Len Brown <len.brown@intel.com>
The --show and --hide options failed on "Node", which was listed as "Node%".
The --show and --hide options were generally fouled-up do due to come
content merges that scrambled the list of column name indexes.
Signed-off-by: Len Brown <len.brown@intel.com>
Output a Node column if there is more than one node/socket.
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Len Brown <len.brown@intel.com>
The previous patches have added node information to turbostat, but the
counters code does not take it into account.
Add node information from cpu_topology calculations to turbostat
counters.
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Cleanup, remove num_ from num_nodes_per_pkg, num_cores_per_node, and
num_threads_per_node.
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Len Brown <len.brown@intel.com>
turbostat incorrectly assumes that there is one node per package. As a
result num_cores_per_pkg is not correctly named and is actually
num_cores_per_node.
Rename num_cores_per_pkg to num_cores_per_node.
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Len Brown <len.brown@intel.com>
The code can be simplified if the cpu_topology *cpus tracks the thread
IDs. This removes an additional file lookup and simplifies the counter
initialization code.
Add thread ID to cpu_topology information and cleanup the counter
initialization code.
v2: prevent thread_id from being overwritten
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Len Brown <len.brown@intel.com>
The code currently assumes each package has exactly one node. This is not
the case for AMD systems and Intel systems with COD. AMD systems also
may re-enumerate each node's core IDs starting at 0 (for example, an AMD
processor may have two nodes, each with core IDs from 0 to 7). In order
to properly enumerate the cores we need to track both the physical and
logical node IDs.
Add physical_node_id to track the node ID assigned by the kernel, and
logical_node_id used by turbostat to track the nodes per package ie) a
0-based count within the package.
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Len Brown <len.brown@intel.com>
The turbostat code only looks at thread_siblings_list to determine if
processing units/threads are on the same the core. This works well on
Intel systems which have a shared L1 instruction and data cache. This
does not work on AMD systems which have shared L1 instruction cache but
separate L1 data caches. Other utilities also check sibling's core ID
to determine if the processing unit shares the same core.
Additionally, the cpu_topology *cpus list used in topology_probe() can
be used elsewhere in the code to simplify things.
Export *cpus to the entire turbostat code, and add Processing Unit/Thread
IDs information to each cpu_topology struct. Confirm that the thread
is on the same core as indicated by thread_siblings_list.
[v2]: Fixup CPU_* usage that caused gcc malloc error.
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Future fixes will use sysfs files that contain cpumask output. The code
needs to know the length of the cpumask in order to determine which cpus
are set in a cpumask. Currently topo.max_cpu_num is the maximum cpu
number. It can be increased the the maximum value of cpus represented in
cpumasks.
Set max_num_cpus to the length of a cpumask.
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Len Brown <len.brown@intel.com>
There's a use case during test to only print specific round of iterations
if --num_iterations is specified, for example, with this patch applied:
turbostat -i 5 -n 4
will capture 4 samples with 5 seconds interval.
[lenb: renamed to --num_iterations from --iterations]
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Reviewed-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
All MSRs related to turbostat are same as Kabylake.
Even though SDM claims that core C3 residency can be read from MSR 0x662,
the read on this MSR fails on CNL platform. Hence disabled C3 MSR read
and display.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
The SNB_C1_AUTO_UNDEMOTE definition should have been deleted once
it was copied into msr-index.h. One copy of the truth is better --
particularly when Matt needs to fix it:-)
Signed-off-by: Len Brown <len.brown@intel.com>
According to the Intel Software Developers' Manual, Vol. 4, Order No.
335592, these macros have been reversed since they were added.
Fixes: 889facbee3 ("tools/power turbostat: v3.0: monitor Watts and Temperature")
Signed-off-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Like the "C1" and "C1%" column, the new POLL and POLL% columns
show invocations and residency% during the measurement interval.
While it didn't seem important to track in the past,
we've recently found some Linux cpuidle bugs related to POLL%.
Signed-off-by: Len Brown <len.brown@intel.com>
The column header for PC10 residency is "Pk%pc10"
This is missing the 'g' that others have, eg Pkg%pc6,
to allow tab-delimited columns to fit into 8-columns.
However, --hide Pk%pc10 did not work, it was still looking for the 'g'.
This was confusing, because --list shows the correct "Pk%pc10"
Reported-by: Wendy Wang <wendy.wang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Linux 4.15 exports the ACPI Low Power Idle Table's
counters in /sys/devices/system/cpu/cpuidle/
low_power_idle_cpu_residency_us
Show this in the "CPU%LPI" column.
Today this reflects the "North Complex"
residency in PC10, so expect it to
closely follow "Pk%pc10".
low_power_idle_system_residency_us
Show this in the "SYS%LPI" column.
Today, this reflects the North is in PC10,
plus the PCH is sufficiently quiescent
to save additional power via the "S0ix"
system state, as measured by the
PCH SLP_S0 counter.
Signed-off-by: Len Brown <len.brown@intel.com>
rpm-lint flagged these as being executable:
kernel-tools.x86_64: W: spurious-executable-perm /usr/share/man/man8/turbostat.8.gz
kernel-tools.x86_64: W: spurious-executable-perm /usr/share/man/man8/x86_energy_perf_policy.8.gz
Fix this
Signed-off-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Len Brown <len.brown@intel.com>
When the user reuests to collect and show columns
that are not present on every row (eg. for every CPU)
turbostat still prints an (empty) line for every CPU.
Update so no blank lines are printed.
old:
# turbostat --quiet --show Pkg%pc6
Pkg%pc6
9.12
9.12
Pkg%pc6
9.12
9.12
new:
# turbostat --quiet --show Pkg%pc6
Pkg%pc6
9.12
9.12
Pkg%pc6
9.12
9.12
Reported-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Improve readability a little bit by changing this output:
MSR_PKG_CST_CONFIG_CONTROL: 0x00008407 (locked: pkg-cstate-limit=7: unlimited, automatic-c-state-conversion=off)
with this output:
MSR_PKG_CST_CONFIG_CONTROL: 0x00008407 (locked, pkg-cstate-limit=7 (unlimited), automatic-c-state-conversion=off)
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
BDX and SKX have a bit that tells them to PROMOTE shallow
C-states requests to MWAIT(C6). It is generally a BIOS bug
if this bit is set. As we have encountered that BIOS bug,
let's print this bit in turbostat debug output.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Some SKX use a 24 MHz crystal, so do not hard code 25 MHz.
Also, SKX crystal is not exact, because SKX uses an EMI reduction
circuit that costs a fraction of a percent.
Signed-off-by: Len Brown <len.brown@intel.com>
MSR_IA32_MISC_ENABLE[18] is the MWAIT ENABLE bit, not DISABLE bit...
so
MSR_IA32_MISC_ENABLE: 0x00850089 (TCC EIST No-MWAIT PREFETCH TURBO)
should print as:
MSR_IA32_MISC_ENABLE: 0x00850089 (TCC EIST MWAIT PREFETCH TURBO)
Signed-off-by: Len Brown <len.brown@intel.com>
The recent patch that implements table printing on a keypress introduced a
regression - turbostat prints the table almost continuously if it is run from a
daemon program.
The problem is also easy to reproduce like this:
echo | turbostat
The reason is that we cannot assume that stdin is always a TTY. It can be many
things.
This patch adds fixes the problem by limiting the new keypress functionality to
TTYs only. If stdin is not a TTY, we just sleep for the full interval time.
While on it, clean-up 'do_sleep()' to return no value, as callers do not expect
that anyway.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
In turbostat interval mode, a newline typed on standard input
will now conclude the current interval. Data will immediately
be collected and printed for that interval, and the next interval
will be started.
This is similar to the recently added SIGUSR1 feature.
But that is for use by programs, while this is for interactive use.
Signed-off-by: Len Brown <len.brown@intel.com>
Interval-mode turbostat now catches and discards SIGUSR1.
Thus, SIGUSR1 can be used to tell turbostat to cut short
the current measurement interval. Turbostat will then start
the next measurement interval using the regular interval length.
This can be used to give turbostat variable intervals.
Invoke turbostat with --interval LARGE_NUMBER_SEC
and have a program that has permission to send it a SIGUSR1
always before LARGE_NUMBER_SEC expires.
It may also be useful to use "--enable Time_Of_Day_Seconds"
to observe the actual interval length.
Signed-off-by: Len Brown <len.brown@intel.com>
Add a Time_Of_Day_Seconds column showing when measurement
for each row was completed. Units are [sec.subsec] since Epoch,
as reported by gettimeofday(2).
While useful to correlate turbostat output with other tools,
this built-in column is disabled, by default.
Add the "--enable" option to enable such disabled-by-default
built-in columns:
"--enable Time_Of_Day_Seconds"
"--enable usec"
"--enable all", will enable all disabled-by-defauilt built-in counters.
When "--debug" is used, all disabled-by-default columns are enabled,
unless explicitly skipped using "--hide"
Signed-off-by: Len Brown <len.brown@intel.com>
Turbostat neglects to display all package C-states for some Skylake Xeon BIOS configurations.
This is due to a typo in the table decoding MSR_PKG_CST_CONFIG_CONTROL (0x000000e2)
Here we fix that typo, according to Intel SDM, vol 4, Table 2-41 -
"MSRs Supported by Intel® Xeon® Processor Scalable Family with DisplayFamily_DisplayModel 06_55H".
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.
The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.
Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.
- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.
For non */uapi/* files that summary was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 11139
and resulted in the first patch in this series.
If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note 930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:
SPDX license identifier # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note 270
GPL-2.0+ WITH Linux-syscall-note 169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17
LGPL-2.1+ WITH Linux-syscall-note 15
GPL-1.0+ WITH Linux-syscall-note 14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5
LGPL-2.0+ WITH Linux-syscall-note 4
LGPL-2.1 WITH Linux-syscall-note 3
((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3
((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became
the concluded license(s).
- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.
In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.
Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.
In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This reverts commit c91fc8519d.
That change caused a C6 and PC6 residency regression on large idle systems.
Users also complained about new output indicating jitter:
turbostat: cpu6 jitter 3794 9142
Signed-off-by: Len Brown <len.brown@intel.com>
Cc: 4.13+ <stable@vger.kernel.org> # v4.13+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Turbostat has the capability to set its own affinity to
each CPU so that its MSR accesses are on the local CPU.
However, using the in-kernel cross-call in the msr driver
tends to be less invasive, so do that -- by-default.
'-m' remains to get the old behaviour.
Signed-off-by: Len Brown <len.brown@intel.com>
The --debug option now pre-pends each row with
the number of micro-seconds [usec] to collect
the finishing snapshot for that row.
Signed-off-by: Len Brown <len.brown@intel.com>
Skylake has some new counters, and they were erroneously
exempt from --show and --hide
eg.
turbostat --quiet --show CPU
CPU Totl%C0 Any%C0 GFX%C0 CPUGFX%
- 116.73 90.56 85.69 79.00
0 117.78 91.38 86.47 79.71
2
1
3
is now
CPU
-
0
2
1
3
Signed-off-by: Len Brown <len.brown@intel.com>
Most CPUs do not have a hardware c1 counter,
and so turbostat derives c1 residency:
c1 = TSC - MPERF - other_core_cstate_counters
As it is not possible to atomically read these coutners,
measurement jitter can case this calcuation to "go negative"
when very close to 0. Turbostat detect that case and
simply prints c1 = 0.00%
But that check neglected to account for systems where the TSC
crystal clock domain and the MPERF BCLK domain are differ by
a small amount. That allowed very small negative c1 numbers
to escape this check and be printed as huge positve numbers.
This code begs for a bit of cleanup, but this patch
is the minimal change to fix the issue.
Signed-off-by: Len Brown <len.brown@intel.com>
Add GFX%rc6 and GFXMHz to the column descriptions section
of the turbostat man page.
Signed-off-by: Doug Smythies <dsmythies@telus.net>
Signed-off-by: Len Brown <len.brown@intel.com>
cpu0: MSR_IA32_TEMPERATURE_TARGET: 0x00641400 (100 C)
cpu0: MSR_IA32_PACKAGE_THERM_STATUS: 0x884b0800 (25 C)
cpu0: MSR_IA32_PACKAGE_THERM_INTERRUPT: 0x00000003 (100 C, 100 C)
Enable the same per-core output, but hide it behind --debug
because it is too verbose on big systems.
Signed-off-by: Len Brown <len.brown@intel.com>
While the current SDM is silent on the matter, the Core and GFX
RAPL power meters on SKL and KBL appear to work -- so show them.
Reported-by: Yaroslav Isakov <yaroslav.isakov@gmail.com>
Signed-off-by: Len Brown <len.brown@intel.com>
turbostat displays a GFXMHz column, which comes from reading
/sys/class/graphics/fb0/device/drm/card0/gt_cur_freq_mhz
But GFXMHz was not changing, even when a manual
cat /sys/class/graphics/fb0/device/drm/card0/gt_cur_freq_mhz
showed a new value.
It turns out that a rewind() on the open file is not sufficient,
fflush() (or a close/open) is needed to read fresh values.
Reported-by: Yaroslav Isakov <yaroslav.isakov@gmail.com>
Signed-off-by: Len Brown <len.brown@intel.com>
The turbostat before this last set of changes is obsolete.
This new version can do a lot more, but it also has
some different defaults, that might catch some off-guard.
So it seems a good time to give a new version number.
Signed-off-by: Len Brown <len.brown@intel.com>
When the "u32" keyword is used with --add, it means that
the output should be truncated to 32-bits. This was not
happening and all 64-bits were printed.
Also, when no column name was used for an added MSR,
The default column name was in deximal, eg. MSR16.
Users report that they tend to use hex MSR numbers,
so print them in hex. To always fit into the columns,
use the syntax M0x10. Note that the user can always
supply any column header that they want.
eg --add msr0x10,MY_TSC
Signed-off-by: Len Brown <len.brown@intel.com>
When turbostat is run in one-shot command mode,
the parent takes the 'before' counter snapshot,
fork/exec/wait for the child to exit,
takes the 'after' counter snapshot,
and prints the results.
however, if the child fails to exec the command,
it immediately returns, without indicating that
anythign was wrong.
Add an error message showing that exec failed:
sudo turbostat sleeeep 4
...
turbostat: exec sleeeep: No such file or directory
...
Note that the parent will still print out the statistics,
because it can't tell the difference between the failed
exec and a command that is purposefully returning
the same status. Unfortunately, this may obscure the
error message. However, if the --out parameter is used,
the error message is evident on stderr.
Reported-by: Wendy Wang <wendy.wang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
On multi-package systems, the "Package" column was being displayed
only if --debug was used. Show it always.
Signed-off-by: Len Brown <len.brown@intel.com>
Originally, the only way to hide the sysfs C-state statistics columns
was with "--hide sysfs". This was because we process "--hide" before
we probe for those columns.
hack --hide to remember deferred hide requests, and apply
them when sysfs is probed.
"--hide sysfs" is still available as short-hand to refer to
the entire group of counters.
The down-side of this change is that we no longer error check for
bogus --hide column names. But the user will quickly figure that
out if a column they mean to hide is still there...
Signed-off-by: Len Brown <len.brown@intel.com>
--Package is now "--cpu package",
which will display just the 1st CPU in each package
--processor is not "--cpu core"
which will display just the 1st CPU in each core
Signed-off-by: Len Brown <len.brown@intel.com>
Make it possible to take the entire un-edited output
from `turbostat --list` and feed it to "turbostat --show"
or "turbostat --hide".
To do this, the leading comma was removed
(no mater what columns are active)
and also they dynamic C-state "C1, C2, C3" etc are replaced
by the string "sysfs", which refers to them as a group.
Signed-off-by: Len Brown <len.brown@intel.com>
When a counter overlfows 7 columns, it shifts the remaining
columns to the right, so they no longer line up under
their column header.
Update turbostat to dectect when it is handling large
numbers, and switch to wider columns where, necessary.
Reported-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
It is handy to know the list of column header names,
so that they can be used with --add and --skip
The new --list option shows them:
sudo ./turbostat --list --hide sysfs
,Core,CPU,Avg_MHz,Busy%,Bzy_MHz,TSC_MHz,IRQ,SMI,CPU%c1,CPU%c3,CPU%c6,CPU%c7,CoreTmp,PkgTmp,GFX%rc6,GFXMHz,PkgWatt,CorWatt,GFXWatt
Signed-off-by: Len Brown <len.brown@intel.com>
The IRQ column has been working for periodic mode,
but not in one-shot command mode, it shows only 0.
until now.
Signed-off-by: Len Brown <len.brown@intel.com>
With the --cpu parameter, turbostat prints only lines
for the specified set of CPUs:
sudo ./turbostat --quiet --show Core,CPU --cpu 0,1,3..5,6-7
Core CPU
- -
0 0
0 4
1 1
1 5
2 6
3 3
3 7
Signed-off-by: Len Brown <len.brown@intel.com>
When turbostat shows % of time in a CPU idle power state,
it has always been showing information from underlying
hardware residency counters.
While this reflects what the hardware is doing, and is thus
useful for understanding the hardware,
it doesn't directly tell us what Linux requested --
which is useful for tuning Linux itself.
Here we add columns to turbostat to show the
Linux cpuidle sub-system statistics:
/sys/devices/system/cpu/cpu*/cpuidle/state*/*
The first group of columns are the "usage", which is the
number of times software requested that C-state in the
measurement interval. eg C1 below.
The second group of columns are the "time", which is the percentage
of the measurement interval time that software has requested
the specified C-state. eg C1% below.
These software counters can be compared to the underlying
hardware residency counters (eg CPU%c1 CPU%c3 CPU%c6 CPU%c7)
to compare what sofware requested to what the hardware delivered.
These sysfs attributes are discovered when turbostat starts,
rather than being "built in". So the --show and --hide
parameters do not know about these dynamic column names.
However "--show sysfs" and "--hide sysfs" act on the
entire group of columns:
turbostat --show sysfs
...
cpu4: POLL: CPUIDLE CORE POLL IDLE
cpu4: C1: MWAIT 0x00
cpu4: C1E: MWAIT 0x01
cpu4: C3: MWAIT 0x10
cpu4: C6: MWAIT 0x20
cpu4: C7s: MWAIT 0x32
...
C1 C1E C3 C6 C7s C1% C1E% C3% C6% C7s%
3 6 5 1 188 0.00 0.02 0.00 0.00 99.93
0 6 5 0 58 0.00 0.16 0.02 0.00 99.70
0 0 0 0 9 0.00 0.00 0.00 0.00 99.96
0 0 0 1 24 0.00 0.00 0.00 0.02 99.93
0 0 0 0 9 0.00 0.00 0.00 0.00 99.97
0 0 0 0 32 0.00 0.00 0.00 0.00 99.96
0 0 0 0 7 0.00 0.00 0.00 0.00 99.98
2 0 0 0 36 0.00 0.00 0.00 0.00 99.97
1 0 0 0 13 0.00 0.00 0.00 0.00 99.98
Signed-off-by: Len Brown <len.brown@intel.com>
Previously, the --add option could specify only an MSR.
Here is is extended so an arbitrary /sys attribute,
as specified by an absolute file path name.
sudo ./turbostat --add /sys/devices/system/cpu/cpu0/cpuidle/state5/usage
Signed-off-by: Len Brown <len.brown@intel.com>
Newer processors do not hard-code the the number of cpus in each bin
to {1, 2, 3, 4, 5, 6, 7, 8} Rather, they can specify any number
of CPUS in each of the 8 bins:
eg.
...
37 * 100.0 = 3600.0 MHz max turbo 4 active cores
38 * 100.0 = 3700.0 MHz max turbo 3 active cores
39 * 100.0 = 3800.0 MHz max turbo 2 active cores
39 * 100.0 = 3900.0 MHz max turbo 1 active cores
could now look something like this:
...
37 * 100.0 = 3600.0 MHz max turbo 16 active cores
38 * 100.0 = 3700.0 MHz max turbo 8 active cores
39 * 100.0 = 3800.0 MHz max turbo 4 active cores
39 * 100.0 = 3900.0 MHz max turbo 2 active cores
Signed-off-by: Len Brown <len.brown@intel.com>
The CC1 column in tubostat can be computed by subtracting
the core c-state residency countes from the total Cx residency.
CC1 = (Idle_time_as_measured by MPERF) - (all core C-states with
residency counters)
However, as the underlying counter reads are not atomic,
error can be noticed in this calculations, especially
when the numbers are small.
Denverton has a hardware CC1 residency counter
to improve the accuracy of the cc1 statistic -- use it.
At the same time, Denverton has no concept of CC3, PC3, CC7, PC7,
so skip collecting and printing those columns.
Finally, a note of clarification.
Turbostat prints the standard PC2 residency counter,
but on Denverton hardware, that actually means PC1E.
Turbostat prints the standard PC6 residency counter,
but on Denverton hardware, that actually means PC2.
At this point, we document that differnce in this commit message,
rather than adding a quirk to the software.
Signed-off-by: Len Brown <len.brown@intel.com>
Fix a bug with --add, where the title of the column
is un-initialized if not specified by the user.
The initial implementation of --show and --hide
neglected to handle the pc8/pc9/pc10 counters.
Fix a bug where "--show Core" only worked with --debug
Reported-by: Wendy Wang <wendy.wang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
The CPU ticks at a rate in the "bus clock" domain.
eg. 100 MHz * bus_ratio.
On newer processors, the TSC has been moved out of this BCLK
domain and into a separate crystal-clock domain.
While the TSC ticks "close to" the base frequency, those that look
closely at the numbers will notice small errors in calculations that
mix units of TSC clocks and bus clocks.
"tsc_tweak" was introduced to address the most visible
mixing -- the %Busy and the the Busy_MHz calculations.
(A simplification as since removed TSC from the BusyMHz calculation)
Here we apply the tsc_tweak to everyplace where BCLK
and TSC units are mixed. The results is that
on a system which is 100% idle, the sum of the C-states
are now much more likely to be closer to 100%.
Reported-by: Travis Downs <travis.downs@gmail.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Some users want turbostat to tell them everything, by default.
Some users want turbostat to be quiet, by default.
I find that I'm in the 1st camp, and so I've never liked
needing to type the --debug parameter to decode the system
configuration.
So here we change the default and print the system configuration,
by default. (The --debug option is now un-documented, though
it does still exist for debugging turbostat internals)
When you do not want to see the system configuration
header, use the new "--quiet" option.
Signed-off-by: Len Brown <len.brown@intel.com>
Some time ago, turbostat overflowed 80 columns.
So on the assumption that a "casual" user would always
want topology and frequency columns, we hid the rest
of the columns and the system configuration decoding
behind the --debug option.
Not everybody liked that change -- including me.
I use --debug 99% of the time...
Well, now we have "-o file" to put turbostat output into a file,
so unless you are watching real-time in a small window,
column count is less frequently a factor.
And more recently, we got the "--hide columnA,columnB" option
to specify columns to skip.
So now we "un-hide" the rest of the columns from behind --debug,
and show them all, by default.
Signed-off-by: Len Brown <len.brown@intel.com>
useful for observing if the BIOS disabled prefetch
Not architectural, but docuemented as present on NHM, SNB
and is present on others.
Signed-off-by: Len Brown <len.brown@intel.com>
show the CPUID feature for turbo to clarify the case
when it may not be shown in MISC_ENABLE
CPUID(6): APERF, TURBO, DTS, PTM, No-HWP, No-HWPnotify, No-HWPwindow, No-HWPepp, No-HWPpkg, EPB
cpu4: MSR_IA32_MISC_ENABLE: 0x00850089 (TCC EIST MWAIT TURBO)
Signed-off-by: Len Brown <len.brown@intel.com>
Turbostat dumps MSR_TURBO_RATIO_LIMIT on Core Architecture.
But Atom Architecture uses MSR_ATOM_CORE_RATIOS and
MSR_ATOM_CORE_TURBO_RATIOS.
Signed-off-by: Len Brown <len.brown@intel.com>
Decode MISC_ENABLE.NO_TURBO,
also use the #defines in msr-index.h for decoding this register
cpu0: MSR_IA32_MISC_ENABLE: 0x00850089 (TCC EIST MWAIT TURBO)
Although it is not architectural, decode also
MSR_IA32_MISC_ENABLE.prefetch-disable (bit-9).
documented to be present on: Core, P4, Intel-Xeon
reserved on: Atom, Silvermont, Nehalem, SNB, PHI ec.
Signed-off-by: Len Brown <len.brown@intel.com>
Add a digit of precision to the --debug output for frequency range.
This is useful when BCLK is not an integer.
old:
6 * 83 = 500 MHz max efficiency frequency
26 * 83 = 2166 MHz base frequency
new:
6 * 83.3 = 499.8 MHz max efficiency frequency
26 * 83.3 = 2165.8 MHz base frequency
Signed-off-by: Len Brown <len.brown@intel.com>
The Baytrail SOC, with its Silvermont core, has some unique properties:
1. a hardware CC1 residency counter
2. a module-c6 residency counter
3. a package-c6 counter at traditional package-c7 counter address.
The SOC does not support c3, pc3, c7 or pc7 counters.
Signed-off-by: Len Brown <len.brown@intel.com>
with --debug, see:
cpu0: MSR_CC6_DEMOTION_POLICY_CONFIG: 0x00000000 (DISable-CC6-Demotion)
cpu0: MSR_MC6_DEMOTION_POLICY_CONFIG: 0x00000000 (DISable-MC6-Demotion)
Note that the hardware default is to enable demotion,
and Linux started clearing these registers in 3.17.
Signed-off-by: Len Brown <len.brown@intel.com>
and so --debug fails with:
turbostat: msr 1 offset 0x1aa read failed: Input/output error
It seems that baytrail, and airmont do not have this MSR.
It is included in subsequent Goldmont Atom.
Signed-off-by: Len Brown <len.brown@intel.com>
Add the "--show" and "--hide" cmdline parameters.
By default, turbostat shows all columns.
turbostat --hide counter_list
will continue showing all columns, except for those listed.
turbostat --show counter_list
will show _only_ the listed columns
These features work for built-in counters, and have no effect
on columns added with the --add parameter.
Signed-off-by: Len Brown <len.brown@intel.com>
When --add was used more than once, overflowed buffers
caused some counters to be stored on top of others,
corrupting the results. Simplify the code by simply
reserving space for up to 16 added counters per each
cpu, core, package.
Per-cpu added counters were being printed only per-core.
Signed-off-by: Len Brown <len.brown@intel.com>
The new --add option has replaced the -M, -m, -C, -c options
Eg.
-M 0x10 is now --add msr0x10,raw
-m 0x10 is now --add msr0x10,raw,u32
-C 0x10 is now --add msr0x10,delta
-c 0x10 is now --add msr0x10,delta,u32
The --add option can be repeated to add any number of counters,
while the previous options were limited to adding one of each type.
In addition, the --add option can accept a column label,
and can also display a counter as a percentage of elapsed cycles.
Eg. --add msr0x3fe,core,percent,MY_CC3
Signed-off-by: Len Brown <len.brown@intel.com>
Create the "--add" parameter. This can be used to teach an existing
turbostat binary about any number of any type of counter.
turbostat(8) details the syntax for --add.
Signed-off-by: Len Brown <len.brown@intel.com>
This changes only the TSC frequency decoding line seen with --debug
old: TSC: 1382 MHz (19200000 Hz * 216 / 3 / 1000000)
new: TSC: 1800 MHz (25000000 Hz * 216 / 3 / 1000000)
Signed-off-by: Len Brown <len.brown@intel.com>
The -M option adds an 18-column item, and the header
needs to be wide enough to keep the header aligned
with the columns.
Signed-off-by: Len Brown <len.brown@intel.com>
SKX has fewer package C-states than previous generations,
and so the decoding of PKG_CSTATE_LIMIT has changed.
This changes the line ending with pkg-cstate-limit=XXX: pcYYY
Signed-off-by: Len Brown <len.brown@intel.com>
Display if the HWP is enabled in OOB (Out of band) mode.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Add Denverton to the group of SandyBridge and later processors,
to let the bclk be recognized as 100MHz rather than 133MHz,
then avoid the wrong value of the frequencies based on it,
including Bzy_MHz, max efficiency freuency, base frequency,
and turbo mode frequencies.
Signed-off-by: Xiaolong Wang <xiaolong.wang@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
All except for model 1F, a Nehalem, which is currently incorrectly
indentified as a Westmere in that new header.
Signed-off-by: Len Brown <len.brown@intel.com>
The Denverton CPU RAPL supports package, core, and DRAM domains.
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Denverton is an Atom based micro server which shares the same
Goldmont architecture as Broxton. The available C-states on
Denverton is a subset of Broxton with only C1, C1e, and C6.
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Some CPUs may not have PP0/Core domain power limit MSRs. We
should still allow its domain energy status to be used. This
patch splits PP0/Core RAPL into two separate flags for power
limit and energy status such that energy status can continue
to be reported without power limit.
Without this patch, turbostat will not be able to use the
remaining RAPL features if some PL MSRs are not present.
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
When i >= SLM_BCLK_FREQS, the frequency read from the slm_freq_table
is off the end of the array because msr is set to 3 rather than the
actual array index i. Set i to 3 rather than msr to fix this.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Len Brown <len.brown@intel.com>
The tool uses topo.max_cpu_num to determine number of entries needed for
fd_percpu[] and irqs_per_cpu[]. For example on a system with 4 CPUs
topo.max_cpu_num is 3 so we get too small array for holding per-CPU items.
Fix this to use right number of entries, which is topo.max_cpu_num + 1.
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Switch to tab-delimited output from fixed-width columns
to make it simpler to import into spreadsheets.
As the fixed width columnns were 8-spaces wide,
the output on the screen should not change.
Signed-off-by: Len Brown <len.brown@intel.com>
turbostat gives valid results across suspend to idle, aka freeze,
whether invoked in interval mode, or in command mode.
Indeed, this can be used to measure suspend to idle:
turbostat echo freeze > /sys/power/state
But this does not work across suspend to ACPI S3, because the
processor counters, including the TSC, are reset on resume.
Further, when turbostat detects a problem, it does't forgive
the hardware, and interval mode will print *'s from there on out.
Instead, upon detecting counters going backwards, simply
reset and start over.
Interval mode across ACPI S3: (observe TSC going backwards)
root@sharkbay:/home/lenb/turbostat-src# ./turbostat -M 0x10
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x010
- 1 0.06 858 2294 0x0000000000000000
0 0 0.06 847 2294 0x0000002a254b98ac
1 1 0.06 878 2294 0x0000002a254efa3a
2 1 0.07 843 2294 0x0000002a2551df65
3 0 0.05 863 2294 0x0000002a2553fea2
turbostat: re-initialized with num_cpus 4
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x010
- 2 0.20 849 2294 0x0000000000000000
0 2 0.26 856 2294 0x0000000449abb60d
1 2 0.20 844 2294 0x0000000449b087ec
2 2 0.21 850 2294 0x0000000449b35d5d
3 1 0.12 839 2294 0x0000000449b5fd5a
^C
Command mode across ACPI S3:
root@sharkbay:/home/lenb/turbostat-src# ./turbostat -M 0x10 sleep 10
./turbostat: Counter reset detected
14.196299 sec
Signed-off-by: Len Brown <len.brown@intel.com>
The RAPL Joules counter is limited in capacity.
Turbostat estimates how soon it can roll-over
based on the max TDP of the processor --
which tells us the maximum increment rate.
eg.
RAPL: 2759 sec. Joule Counter Range, at 95 Watts
So if a sample duration is longer than 2759 seconds on this system,
'**' replace the decimal place in the display to indicate
that the results may be suspect.
But the display had an extra ' ' in this case, throwing off the columns.
Also, the -J "Joules" option appended an extra "time" column
to the display. While this may be useful, it printed the interval time,
which may not be the accurate time per processor. Remove this column,
which appeared only when using '-J',
as we plan to add accurate per-cpu interval times in a future commit.
Signed-off-by: Len Brown <len.brown@intel.com>
Replace MSR_NHM_TURBO_RATIO_LIMIT with MSR_TURBO_RATIO_LIMIT.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
When run
make -C tools DESTDIR=/my/nice/dir turbostat_install
get a message
install: cannot create regular file '/usr/bin/turbostat': Permission denied
Allow user to alter DESTDIR and PREFIX variables.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* pm-core:
PM / wakeirq: fix wakeirq setting after wakup re-configuration from sysfs
PM / runtime: Document steps for device removal
* powercap:
powercap: intel_rapl: Add missing Haswell model
* pm-tools:
tools/power turbostat: work around RC6 counter wrap
tools/power turbostat: initial KBL support
tools/power turbostat: initial SKX support
tools/power turbostat: decode BXT TSC frequency via CPUID
tools/power turbostat: initial BXT support
tools/power turbostat: print IRTL MSRs
tools/power turbostat: SGX state should print only if --debug
Sometimes the rc6 sysfs counter spontaneously resets,
causing turbostat prints a very large number
as it tries to calcuate % = 100 * (old - new) / interval
When we see (old > new), print ***.**% instead
of a bogus huge number.
Note that this detection is not fool-proof, as the counter
could reset several times and still result in new > old.
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Hard-code BXT ART to 19200MHz, so turbostat --debug
can fully enumerate TSC:
CPUID(0x15): eax_crystal: 3 ebx_tsc: 186 ecx_crystal_hz: 0
TSC: 1190 MHz (19200000 Hz * 186 / 3 / 1000000)
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Some processors use the Interrupt Response Time Limit (IRTL) MSR value
to describe the maximum IRQ response time latency for deep
package C-states. (Though others have the register, but do not use it)
Lets print it out to give insight into the cases where it is used.
IRTL begain in SNB, with PC3/PC6/PC7, and HSW added PC8/PC9/PC10.
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The CPUID.SGX bit was printed, even if --debug was used
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
- Redesign of cpufreq governors and the intel_pstate driver to
make them use callbacks invoked by the scheduler to trigger CPU
frequency evaluation instead of using per-CPU deferrable timers
for that purpose (Rafael Wysocki).
- Reorganization and cleanup of cpufreq governor code to make it
more straightforward and fix some concurrency problems in it
(Rafael Wysocki, Viresh Kumar).
- Cleanup and improvements of locking in the cpufreq core (Viresh
Kumar).
- Assorted cleanups in the cpufreq core (Rafael Wysocki, Viresh
Kumar, Eric Biggers).
- intel_pstate driver updates including fixes, optimizations and a
modification to make it enable enable hardware-coordinated P-state
selection (HWP) by default if supported by the processor (Philippe
Longepe, Srinivas Pandruvada, Rafael Wysocki, Viresh Kumar, Felipe
Franciosi).
- Operating Performance Points (OPP) framework updates to improve
its handling of voltage regulators and device clocks and updates
of the cpufreq-dt driver on top of that (Viresh Kumar, Jon Hunter).
- Updates of the powernv cpufreq driver to fix initialization
and cleanup problems in it and correct its worker thread handling
with respect to CPU offline, new powernv_throttle tracepoint
(Shilpasri Bhat).
- ACPI cpufreq driver optimization and cleanup (Rafael Wysocki).
- ACPICA updates including one fix for a regression introduced
by previos changes in the ACPICA code (Bob Moore, Lv Zheng,
David Box, Colin Ian King).
- Support for installing ACPI tables from initrd (Lv Zheng).
- Optimizations of the ACPI CPPC code (Prashanth Prakash, Ashwin
Chaugule).
- Support for _HID(ACPI0010) devices (ACPI processor containers)
and ACPI processor driver cleanups (Sudeep Holla).
- Support for ACPI-based enumeration of the AMBA bus (Graeme Gregory,
Aleksey Makarov).
- Modification of the ACPI PCI IRQ management code to make it treat
255 in the Interrupt Line register as "not connected" on x86 (as
per the specification) and avoid attempts to use that value as
a valid interrupt vector (Chen Fan).
- ACPI APEI fixes related to resource leaks (Josh Hunt).
- Removal of modularity from a few ACPI drivers (BGRT, GHES,
intel_pmic_crc) that cannot be built as modules in practice (Paul
Gortmaker).
- PNP framework update to make it treat ACPI_RESOURCE_TYPE_SERIAL_BUS
as a valid resource type (Harb Abdulhamid).
- New device ID (future AMD I2C controller) in the ACPI driver for
AMD SoCs (APD) and in the designware I2C driver (Xiangliang Yu).
- Assorted ACPI cleanups (Colin Ian King, Kaiyen Chang, Oleg Drokin).
- cpuidle menu governor optimization to avoid a square root
computation in it (Rasmus Villemoes).
- Fix for potential use-after-free in the generic device properties
framework (Heikki Krogerus).
- Updates of the generic power domains (genpd) framework including
support for multiple power states of a domain, fixes and debugfs
output improvements (Axel Haslam, Jon Hunter, Laurent Pinchart,
Geert Uytterhoeven).
- Intel RAPL power capping driver updates to reduce IPI overhead in
it (Jacob Pan).
- System suspend/hibernation code cleanups (Eric Biggers, Saurabh
Sengar).
- Year 2038 fix for the process freezer (Abhilash Jindal).
- turbostat utility updates including new features (decoding of more
registers and CPUID fields, sub-second intervals support, GFX MHz
and RC6 printout, --out command line option), fixes (syscall jitter
detection and workaround, reductioin of the number of syscalls made,
fixes related to Xeon x200 processors, compiler warning fixes) and
cleanups (Len Brown, Hubert Chrzaniuk, Chen Yu).
/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABCAAGBQJW50NXAAoJEILEb/54YlRxvr8QAIktC9+ft0y5AmU46hDcBWcK
QutyWJL9X9BS6DWBJZA2qclDYFmhMfi5Fza1se0gQ9TnLB/KrBwHWLsiYoTsb1k+
nPKf214aPk+qAhkVuyB4leNWML9Qz9n9jwku/EYxWWpgtbSRf3+0ioIKZeWWc/8V
JvuaOu4O+g/tkmL7QTrnGWBwhIIssAAV85QPsHkx+g68MrCj4UMMzm7z9G21SPXX
bmP8yIHsczX/XnRsY0W2NSno7Vdk6ImHpDJ26IAZg28WRNPWICHgGYHvB0TTWMvb
tts+yqfF7/7QLRjT/M8k9CzDBDE/DnVqoZ0fNJ+aYr7hNKF32mtAN+jH9ZB9dl/P
fEFapJkPxnWyzAoVoB9Dz0rkcZkYMlbxlLWzUGpaPq0JflUUTzLk0ApSjmMn4HRO
UddwCDdyHTaYThp3gn6GbOb0pIP0SdOVbI1M2QV2x/4PLcT2Ft8Np1+1RFWOeinZ
Bdl9AE890big0808mqbBzw/buETwr9FjHtCdDPXpP0vJpkBLu3nIYRNb0LCt39es
mWMp6dFhGgvGj3D3ahTuV3GI8hdpDkh9SObexa11RCjkTKrXcwEmFxHxLeFXwKYq
alG278bo6cSChRMziS1lis+W/3tsJRN4TXUSv1PPzJHrFgptQVFRStU9ngBKP+pN
WB+itPc4Fw0YHOrAFsrx
=cfty
-----END PGP SIGNATURE-----
Merge tag 'pm+acpi-4.6-rc1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management and ACPI updates from Rafael Wysocki:
"This time the majority of changes go into cpufreq and they are
significant.
First off, the way CPU frequency updates are triggered is different
now. Instead of having to set up and manage a deferrable timer for
each CPU in the system to evaluate and possibly change its frequency
periodically, cpufreq governors set up callbacks to be invoked by the
scheduler on a regular basis (basically on utilization updates). The
"old" governors, "ondemand" and "conservative", still do all of their
work in process context (although that is triggered by the scheduler
now), but intel_pstate does it all in the callback invoked by the
scheduler with no need for any additional asynchronous processing.
Of course, this eliminates the overhead related to the management of
all those timers, but also it allows the cpufreq governor code to be
simplified quite a bit. On top of that, the common code and data
structures used by the "ondemand" and "conservative" governors are
cleaned up and made more straightforward and some long-standing and
quite annoying problems are addressed. In particular, the handling of
governor sysfs attributes is modified and the related locking becomes
more fine grained which allows some concurrency problems to be avoided
(particularly deadlocks with the core cpufreq code).
In principle, the new mechanism for triggering frequency updates
allows utilization information to be passed from the scheduler to
cpufreq. Although the current code doesn't make use of it, in the
works is a new cpufreq governor that will make decisions based on the
scheduler's utilization data. That should allow the scheduler and
cpufreq to work more closely together in the long run.
In addition to the core and governor changes, cpufreq drivers are
updated too. Fixes and optimizations go into intel_pstate, the
cpufreq-dt driver is updated on top of some modification in the
Operating Performance Points (OPP) framework and there are fixes and
other updates in the powernv cpufreq driver.
Apart from the cpufreq updates there is some new ACPICA material,
including a fix for a problem introduced by previous ACPICA updates,
and some less significant changes in the ACPI code, like CPPC code
optimizations, ACPI processor driver cleanups and support for loading
ACPI tables from initrd.
Also updated are the generic power domains framework, the Intel RAPL
power capping driver and the turbostat utility and we have a bunch of
traditional assorted fixes and cleanups.
Specifics:
- Redesign of cpufreq governors and the intel_pstate driver to make
them use callbacks invoked by the scheduler to trigger CPU
frequency evaluation instead of using per-CPU deferrable timers for
that purpose (Rafael Wysocki).
- Reorganization and cleanup of cpufreq governor code to make it more
straightforward and fix some concurrency problems in it (Rafael
Wysocki, Viresh Kumar).
- Cleanup and improvements of locking in the cpufreq core (Viresh
Kumar).
- Assorted cleanups in the cpufreq core (Rafael Wysocki, Viresh
Kumar, Eric Biggers).
- intel_pstate driver updates including fixes, optimizations and a
modification to make it enable enable hardware-coordinated P-state
selection (HWP) by default if supported by the processor (Philippe
Longepe, Srinivas Pandruvada, Rafael Wysocki, Viresh Kumar, Felipe
Franciosi).
- Operating Performance Points (OPP) framework updates to improve its
handling of voltage regulators and device clocks and updates of the
cpufreq-dt driver on top of that (Viresh Kumar, Jon Hunter).
- Updates of the powernv cpufreq driver to fix initialization and
cleanup problems in it and correct its worker thread handling with
respect to CPU offline, new powernv_throttle tracepoint (Shilpasri
Bhat).
- ACPI cpufreq driver optimization and cleanup (Rafael Wysocki).
- ACPICA updates including one fix for a regression introduced by
previos changes in the ACPICA code (Bob Moore, Lv Zheng, David Box,
Colin Ian King).
- Support for installing ACPI tables from initrd (Lv Zheng).
- Optimizations of the ACPI CPPC code (Prashanth Prakash, Ashwin
Chaugule).
- Support for _HID(ACPI0010) devices (ACPI processor containers) and
ACPI processor driver cleanups (Sudeep Holla).
- Support for ACPI-based enumeration of the AMBA bus (Graeme Gregory,
Aleksey Makarov).
- Modification of the ACPI PCI IRQ management code to make it treat
255 in the Interrupt Line register as "not connected" on x86 (as
per the specification) and avoid attempts to use that value as a
valid interrupt vector (Chen Fan).
- ACPI APEI fixes related to resource leaks (Josh Hunt).
- Removal of modularity from a few ACPI drivers (BGRT, GHES,
intel_pmic_crc) that cannot be built as modules in practice (Paul
Gortmaker).
- PNP framework update to make it treat ACPI_RESOURCE_TYPE_SERIAL_BUS
as a valid resource type (Harb Abdulhamid).
- New device ID (future AMD I2C controller) in the ACPI driver for
AMD SoCs (APD) and in the designware I2C driver (Xiangliang Yu).
- Assorted ACPI cleanups (Colin Ian King, Kaiyen Chang, Oleg Drokin).
- cpuidle menu governor optimization to avoid a square root
computation in it (Rasmus Villemoes).
- Fix for potential use-after-free in the generic device properties
framework (Heikki Krogerus).
- Updates of the generic power domains (genpd) framework including
support for multiple power states of a domain, fixes and debugfs
output improvements (Axel Haslam, Jon Hunter, Laurent Pinchart,
Geert Uytterhoeven).
- Intel RAPL power capping driver updates to reduce IPI overhead in
it (Jacob Pan).
- System suspend/hibernation code cleanups (Eric Biggers, Saurabh
Sengar).
- Year 2038 fix for the process freezer (Abhilash Jindal).
- turbostat utility updates including new features (decoding of more
registers and CPUID fields, sub-second intervals support, GFX MHz
and RC6 printout, --out command line option), fixes (syscall jitter
detection and workaround, reductioin of the number of syscalls
made, fixes related to Xeon x200 processors, compiler warning
fixes) and cleanups (Len Brown, Hubert Chrzaniuk, Chen Yu)"
* tag 'pm+acpi-4.6-rc1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (182 commits)
tools/power turbostat: bugfix: TDP MSRs print bits fixing
tools/power turbostat: correct output for MSR_NHM_SNB_PKG_CST_CFG_CTL dump
tools/power turbostat: call __cpuid() instead of __get_cpuid()
tools/power turbostat: indicate SMX and SGX support
tools/power turbostat: detect and work around syscall jitter
tools/power turbostat: show GFX%rc6
tools/power turbostat: show GFXMHz
tools/power turbostat: show IRQs per CPU
tools/power turbostat: make fewer systems calls
tools/power turbostat: fix compiler warnings
tools/power turbostat: add --out option for saving output in a file
tools/power turbostat: re-name "%Busy" field to "Busy%"
tools/power turbostat: Intel Xeon x200: fix turbo-ratio decoding
tools/power turbostat: Intel Xeon x200: fix erroneous bclk value
tools/power turbostat: allow sub-sec intervals
ACPI / APEI: ERST: Fixed leaked resources in erst_init
ACPI / APEI: Fix leaked resources
intel_pstate: Do not skip samples partially
intel_pstate: Remove freq calculation from intel_pstate_calc_busy()
intel_pstate: Move intel_pstate_calc_busy() into get_target_pstate_use_performance()
...
MSR_CONFIG_TDP_NOMINAL:
should print all 8 bits of base_ratio (bit 0:7) 0xFF
MSR_CONFIG_TDP_LEVEL_1:
should print all 15 bits of PKG_MIN_PWR_LVL1 (bit 48:62) 0x7FFF
should print all 15 bits of PKG_MAX_PWR_LVL1 (bit 32:46) 0x7FFF
should print all 8 bits of LVL1_RATIO (bit 16:23) 0xFF
should print all 15 bits of PKG_TDP_LVL1 (bit 0:14) 0x7FFF
And the same modification to MSR_CONFIG_TDP_LEVEL_2.
MSR_TURBO_ACTIVATION_RATIO:
should print all 8 bits of MAX_NON_TURBO_RATIO (bit 0:7) 0xFF
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
MSR_NHM_SNB_PKG_CST_CFG_CTL: 0x1e008008 (...pkg-cstate-limit=0: unlimited)
should print as
MSR_NHM_SNB_PKG_CST_CFG_CTL: 0x1e008008 (...pkg-cstate-limit=8: unlimited)
Signed-off-by: Len Brown <len.brown@intel.com>
turbostat already checks whether calling each cpuid leavf is legal,
and it doesn't look at the function return value,
so call the simpler gcc intrinsic __cpuid() instead of __get_cpuid().
syntax only, no functional change
Signed-off-by: Len Brown <len.brown@intel.com>
The accuracy of Bzy_Mhz and Busy% depend on reading
the TSC, APERF, and MPERF close together in time.
When there is a very short measurement interval,
or a large system is profoundly idle, the changes
in APERF and MPERF may be very small.
They can be small enough that an expensive interrupt
between reading APERF and MPERF can cause the APERF/MPERF
ratio to become inaccurate, resulting in invalid
calculation and display of Bzy_MHz.
A dummy APERF read of APERF makes this problem
much more rare. Apparently this 1st systemn call
after exiting a long stretch of idle is when we
typically see expensive timer interrupts that cause
large jitter.
For the cases that dummy APERF read fails to prevent,
we compare the latency of the APERF and MPERF reads.
If they differ by more than 2x, we re-issue them.
Signed-off-by: Len Brown <len.brown@intel.com>
The column "GFX%c6" show the percentage of time the GPU
is in the "render C6" state, rc6. Deep package C-states on several
systems depend on the GPU being in RC6.
This information comes from the counter
/sys/class/drm/card0/power/rc6_residency_ms,
as read before and after the measurement interval.
Signed-off-by: Len Brown <len.brown@intel.com>
Under the column "GFXMHz", show a snapshot of this attribute:
/sys/class/graphics/fb0/device/drm/card0/gt_cur_freq_mhz
This is an instantaneous snapshot of what sysfs presents
at the end of the measurement interval. turbostat does
not average or otherwise perform any math on this value.
Signed-off-by: Len Brown <len.brown@intel.com>
The new IRQ column shows how many interrupts have occurred on each CPU
during the measurement inteval. This information comes from
the difference between /proc/interrupts shapshots made before
and after the measurement interval.
The first row, the system summary, shows the sum of the IRQS
for all CPUs during that interval.
Signed-off-by: Len Brown <len.brown@intel.com>
skip the open(2)/close(2) on each msr read
by keeping the /dev/cpu/*/msr files open.
The remaining read(2) is generally far fewer cycles
than the removed open(2) system call.
Signed-off-by: Len Brown <len.brown@intel.com>
By default...
Turbostat --debug gconfiguration info goes to stderr.
In FORK mode, turbostat statistics go to stderr.
In PERIODIC mode, turbostat statistics go to stdout.
These defaults do not change, but an option "--out file"
will send all output above only to the specified file.
Signed-off-by: Len Brown <len.brown@intel.com>
some tools processing turbostat output
have difficulty with items that begin with %...
Reported-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Following changes have been made:
- changed MSR_NHM_TURBO_RATIO_LIMIT to MSR_TURBO_RATIO_LIMIT in debug print
for consistency with Developer Manual
- updated definition of bitfields in MSR_TURBO_RATIO_LIMIT and appropriate
parsing code
- added x200 to list of architectures that do not support Nahlem compatible
definition of MSR_TURBO_RATIO_LIMIT register (x200 has the register but
bits definition is custom)
- fixed typo in code that parses MSR_TURBO_RATIO_LIMIT
(logical instead of bitwise operator)
- changed MSR_TURBO_RATIO_LIMIT parsing algorithm so the print out had the
same order as implementations for other platforms
Signed-off-by: Hubert Chrzaniuk <hubert.chrzaniuk@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
x200 does not enable any way to programmatically obtain bus clock
speed. Bclk for the architecture has a fixed value of 100 MHz.
At the same time x200 cannot be included in has_snb_msrs since
it does not support C7 idle state.
prior to this patch, MHz values reported on this chip
were erroneously calculated using bclk of 133MHz,
causing MHz values to be reported 33% higher than actual.
Signed-off-by: Hubert Chrzaniuk <hubert.chrzaniuk@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
turbostat -i interval_sec
will sample and display statistics every interval_sec.
interval_sec used to be a whole number of seconds,
but now we accept a decimal, as small as 0.001 sec (1 ms).
Signed-off-by: Len Brown <len.brown@intel.com>
When building with gcc 6 we're getting various build warnings that just
require some trivial function declaration and call fixes:
turbostat.c: In function ‘dump_cstate_pstate_config_info’:
turbostat.c:1973:1: warning: type of ‘family’ defaults to ‘int’
dump_cstate_pstate_config_info(family, model)
turbostat.c:1973:1: warning: type of ‘model’ defaults to ‘int’
turbostat.c: In function ‘get_tdp’:
turbostat.c:2145:8: warning: type of ‘model’ defaults to ‘int’
double get_tdp(model)
turbostat.c: In function ‘perf_limit_reasons_probe’:
turbostat.c:2259:6: warning: type of ‘family’ defaults to ‘int’
void perf_limit_reasons_probe(family, model)
turbostat.c:2259:6: warning: type of ‘model’ defaults to ‘int’
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-wbicer8n0s9qe6ql8h9x478e@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
for debugging, dump a few more fields:
CPUID(1): SSE3 MONITOR EIST TM2 TSC MSR ACPI-TM TM
cpu0: MSR_IA32_MISC_ENABLE: 0x00850089 (TCC EIST MONITOR)
Signed-off-by: Len Brown <len.brown@intel.com>
MSR_PLATFORM_INFO is the new name for MSR_NHM_PLATFORM_INFO
no functional change
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
MSR_TURBO_ACTIVATION_RATIO: 0x00000016 (MAX_NON_TURBO_RATIO=6 lock=0)
should print all 7 bits of MAX_NON_TURBO_RATIO (in decimal):
MSR_TURBO_ACTIVATION_RATIO: 0x00000016 (MAX_NON_TURBO_RATIO=22 lock=0)
Signed-off-by: Len Brown <len.brown@intel.com>
Bzy_MHz = TSC_delta*tsc_tweak/APERF_delta/MPERF_delta/measurement_interval
becomes
Bzy_MHz = base_mhz/APERF_delta/MPERF_delta
on systems which support MSR_NHM_PLATFORM_INFO.
base_mhz is calculated directly from the base_ratio
reported in MSR_NHM_PLATFORM_INFO * bclk,
and bclk is discovered via MSR or cpuid.
This reduces the dependency of Bzy_MHz calculation on the TSC.
Previously, there were 4 TSC readings required in each caculation,
the raw TSC delta combined with the measurement_interval.
This also removes the "tsc_tweak" correction factor used when
TSC runs on a different base clock from the CPU's bclk.
After this change, tsc_tweak is used only for %Busy.
The end-result should be a Bzy_MHz result slightly less prone to jitter.
Signed-off-by: Len Brown <len.brown@intel.com>
On a Skylake with 1500MHz base frequency,
the TSC runs at 1512MHz.
This is because the TSC is no longer in the n*100 MHz BCLK domain,
but is now in the m*24MHz crystal clock domain. (24 MHz * 63 = 1512 MHz)
This adds error to several calculations in turbostat,
unless the TSC sample sizes are adjusted for this difference.
Note that calculations in the time domain are immune
from this issue, as the timing sub-system has already
calibrated the TSC against a known wall clock.
AVG_MHz = APERF_delta/measurement_interval
need no adjustment. APERF_delta is in the BCLK domain,
and measurement_interval is in the time domain.
TSC_MHz = TSC_delta/measurement_interval
needs no adjustment -- as we really do want to report
the actual measured TSC delta here, and measurement_interval
is in the accurate time domain.
%Busy = MPERF_delta/TSC_delta
needs adjustment to use TSC_BCLK_DOMAIN_delta.
TSC_BCLK_DOMAIN_delta = TSC_delta * base_hz / tsc_hz
Bzy_MHz = TSC_delta/APERF_delta/MPERF_delta/measurement_interval
need adjustment as above.
No other metrics in turbostat need to be adjusted.
Before:
CPU Avg_MHz %Busy Bzy_MHz TSC_MHz
- 550 24.84 2216 1512
0 2191 98.73 2219 1514
2 0 0.01 2130 1512
1 9 0.43 2016 1512
3 2 0.08 2016 1512
After:
CPU Avg_MHz %Busy Bzy_MHz TSC_MHz
- 550 25.05 2198 1512
0 2190 99.62 2199 1512
2 0 0.01 2152 1512
1 9 0.46 2000 1512
3 2 0.10 2000 1512
Note that in this example, the "Before" Bzy_MHz
was reported as exceeding the 2200 max turbo rate.
Also, even a pinned spin loop would not be reported
as over 99% busy.
Signed-off-by: Len Brown <len.brown@intel.com>
KNL increments APERF and MPERF every 1024 clocks.
This is compliant with the architecture specification,
which requires that only the ratio of APERF/MPERF need be valid.
However, turbostat takes advantage of the fact that these
two MSRs increment every un-halted clock
at the actual and base frequency:
AVG_MHz = APERF_delta/measurement_interval
%Busy = MPERF_delta/TSC_delta
This quirk is needed for these calculations to also work on KNL,
which would otherwise show a value 1024x smaller than expected.
Signed-off-by: Hubert Chrzaniuk <hubert.chrzaniuk@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Staring in Linux-4.3-rc1,
commit 6fb3143b56 ("tools/power turbostat: dump CONFIG_TDP")
touches MSR 0x648, which is not supported on IVB-Xeon.
This results in "turbostat --debug" exiting on those systems:
turbostat: /dev/cpu/2/msr offset 0x648 read failed: Input/output error
Remove IVB-Xeon from the list of machines supporting with that MSR.
Signed-off-by: Len Brown <len.brown@intel.com>
Pull turbostat changes for v4.3 from Len Brown.
* 'turbostat' of https://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
tools/power turbostat: fix typo on DRAM column in Joules-mode
tools/power turbostat: fix parameter passing for forked command
tools/power turbostat: dump CONFIG_TDP
tools/power turbostat: cpu0 is no longer hard-coded, so update output
tools/power turbostat: update turbostat(8)
turbostat supports forked command when sampling cpu state. However,
the forked command is not allowed to be executed with options, otherwise
turbostat might regard these options as invalid turbostat options.
For example:
./turbostat stress -c 4 -t 10
./turbostat: unrecognized option '-t'
Reported-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Config TDP is a feature that allows parts to be configured
for different thermal limits after they have left the factory.
This can have an effect on the operation of the part,
particularly in determiniing...
Max Non-turbo Ratio
Turbo Activation Ratio
Signed-off-by: Len Brown <len.brown@intel.com>
The --debug option reads a number of per-package MSRs.
Previously we explicitly read them on cpu0, but recently
turbostat changed to read them on the current "base_cpu".
Update the print-out to reflect base_cpu, rather than
the hard-coded cpu0.
Signed-off-by: Len Brown <len.brown@intel.com>
This header containing all MSRs and respective bit definitions
got exported to userspace in conjunction with the big UAPI
shuffle.
But, it doesn't belong in the UAPI headers because userspace can
do its own MSR defines and exporting them from the kernel blocks
us from doing cleanups/renames in that header. Which is
ridiculous - it is not kernel's job to export such a header and
keep MSRs list and their names stable.
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1433436928-31903-19-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Remove reference to the original Nehalem Turbo white paper,
since it has moved, and these mechanisms have now long since
been documented in the Software Developer's Manual.
Reported-by: Jeremie Lagraviere <jeremie@simula.no>
Signed-off-by: Len Brown <len.brown@intel.com>
Linux-3.7 added CONFIG_BOOTPARAM_HOTPLUG_CPU0,
allowing systems to offline cpu0.
But when cpu0 is offline, turbostat will not run:
# turbostat ls
turbostat: no /dev/cpu/0/msr
This patch replaces the hard-coded use of cpu0 in turbostat
with the current cpu, allowing it to run without a cpu0.
Fewer cross-calls may also be needed due to use of current cpu,
though this hard-coding was used only for the --debug preamble.
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Len Brown <len.brown@intel.com>
When EPB is 0xF, turbosat was incorrectly describing it as "custom"
instead of calling it "powersave":
< cpu0: MSR_IA32_ENERGY_PERF_BIAS: 0x0000000f (custom)
> cpu0: MSR_IA32_ENERGY_PERF_BIAS: 0x0000000f (powersave)
Signed-off-by: Len Brown <len.brown@intel.com>
Changes mainly to account for minor differences in Knights Landing(KNL):
1. KNL supports C1 and C6 core states.
2. KNL supports PC2, PC3 and PC6 package states.
3. KNL has a different encoding of the TURBO_RATIO_LIMIT MSR
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Without this update, turbostat displays only 2 threads per core.
Some processors, such as Xeon Phi, have more.
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
HSW expanded MSR_PKG_CST_CONFIG_CONTROL.Package-C-State-Limit,
from bits[2:0] used by previous implementations, to [3:0].
The value 1000b is unlimited, and is used by BDW and SKL too.
Signed-off-by: Len Brown <len.brown@intel.com>
While not yet documented in the Software Developer's Manual,
the data-sheet for modern Xeon states that DRAM RAPL ENERGY units
are fixed at 15.3 uJ, rather than being discovered via MSR.
Before this patch, DRAM energy on these products is over-stated by turbostat
because the RAPL units are 4x larger.
ref: "Xeon E5-2600 v3/E5-1600 v3 Datasheet Volume 2"
http://www.intel.com/content/dam/www/public/us/en/documents/datasheets/xeon-e5-v3-datasheet-vol-2.pdf
Signed-off-by: Andrey Semin <andrey.semin@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Skylake adds some additional residency counters.
Skylake supports a different mix of RAPL registers
from any previous product.
In most other ways, Skylake is like Broadwell.
Signed-off-by: Len Brown <len.brown@intel.com>
Since commit ee0778a301
("tools/power: turbostat: make Makefile a bit more capable")
turbostat's Makefile is using
[...]
BUILD_OUTPUT := $(PWD)
[...]
which obviously causes trouble when building "turbostat" with
make -C /usr/src/linux/tools/power/x86/turbostat ARCH=x86 turbostat
because GNU make does not update nor guarantee that $PWD is set.
This patch changes the Makefile to use $CURDIR instead, which GNU make
guarantees to set and update (i.e. when using "make -C ...") and also
adds support for the O= option (see "make help" in your root of your
kernel source tree for more details).
Link: https://bugs.gentoo.org/show_bug.cgi?id=533918
Fixes: ee0778a301 ("tools/power: turbostat: make Makefile a bit more capable")
Signed-off-by: Thomas D. <whissi@whissi.de>
Cc: Mark Asselstine <mark.asselstine@windriver.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Some distros (Ubuntu) ship the msr driver as a module.
If turbosat is run as root on those systems, and discovers
that there is no /dev/cpu/cpu0/msr, it will now "modprobe msr"
for the user.
If not root, the modprobe attempt will fail, and turbostat will exit as before:
turbostat: no /dev/cpu/0/msr, Try "# modprobe msr" : No such file or directory
Signed-off-by: Len Brown <len.brown@intel.com>
s/MSR_NHM_TURBO_RATIO_LIMIT/MSR_TURBO_RATIO_LIMIT/
s/MSR_IVT_TURBO_RATIO_LIMIT/MSR_TURBO_RATIO_LIMIT1/
syntax only -- use the documented strings describing these registers.
Signed-off-by: Len Brown <len.brown@intel.com>
syntax only.
The cool kids are now using the phrase "base frequency",
where in the past we used "max non-turbo frequency" or "TSC frequency".
This distinction becomes important when a processor has a TSC
that runs at a different speed than the "base frequency".
Signed-off-by: Len Brown <len.brown@intel.com>
cosmetic only.
order the decoding of MSR_PERF_LIMIT_REASONS bits
from MSB to LSB -- which you notice when more than 1 bit is set
and you are, say, comparing the output to the documentation...
Signed-off-by: Len Brown <len.brown@intel.com>
Casual turbostat users generally just want to know MHz.
So by default, just print enough information to make sense of MHz.
All the other configuration data and columns for C-states and temperature etc,
are printed with the --debug option.
Signed-off-by: Len Brown <len.brown@intel.com>
Long format options added, though the short ones should still work.
eg. the new "--Counter 0x10" is the same as the old "-C 0x10"
Note this Incompatibility:
Old:
-v displayed verbose debug output
New:
-v and --version simpaly display version
Additional parameters:
-d and --debug display verbose debug output
-h and --help display a help message
Updated turbosat.8 man page accordingly.
Signed-off-by: Len Brown <len.brown@intel.com>
Replaced previously open-coded Package C-state Limit decoding
with table-driven decoding. In doing so, updated to match January 2015
"Intel(R) 64 and IA-23 Architectures Software Developer's Manual".
In the past, turbostat would print package C-state residency columns
for all package states supported by the model's architecture, even though
a particular SKU may not support them, or they may be disabled by the BIOS.
Now turbostat will skip printing colunns if MSRs indicate that they are not enabled.
eg. many SKUs don't support PC7, and so that column will no longer be printed.
Signed-off-by: Len Brown <len.brown@intel.com>