cpu_tlbstate is exported because various TLB-related functions need
access to it, but cpu_tlbstate is sensitive information which should
only be accessed by well-contained kernel functions and not be directly
exposed to modules.
As a first step, move __flush_tlb() out of line and hide the native
function. The latter can be static when CONFIG_PARAVIRT is disabled.
Consolidate the namespace while at it and remove the pointless extra
wrapper in the paravirt code.
No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200421092559.246130908@linutronix.de
Pull x86 mtrr updates from Ingo Molnar:
"Two changes: restrict /proc/mtrr to CAP_SYS_ADMIN, plus a cleanup"
* 'x86-mtrr-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mtrr: Require CAP_SYS_ADMIN for all access
x86/mtrr: Get rid of mtrr_seq_show() forward declaration
pat.h is a file whose main purpose is to provide the memtype_*() APIs.
PAT is the low level hardware mechanism - but the high level abstraction
is memtype.
So name the header <memtype.h> as well - this goes hand in hand with memtype.c
and memtype_interval.c.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Zhang Xiaoxu noted that physical address locations for MTRR were visible
to non-root users, which could be considered an information leak.
In discussing[1] the options for solving this, it sounded like just
moving the capable check into open() was the first step.
If this breaks userspace, then we will have a test case for the more
conservative approaches discussed in the thread. In summary:
- MTRR should check capabilities at open time (or retain the
checks on the opener's permissions for later checks).
- changing the DAC permissions might break something that expects to
open mtrr when not uid 0.
- if we leave the DAC permissions alone and just move the capable check
to the opener, we should get the desired protection. (i.e. check
against CAP_SYS_ADMIN not just the wider uid 0.)
- if that still breaks things, as in userspace expects to be able to
read other parts of the file as non-uid-0 and non-CAP_SYS_ADMIN, then
we need to censor the contents using the opener's permissions. For
example, as done in other /proc cases, like commit
51d7b12041 ("/proc/iomem: only expose physical resource addresses to privileged users").
[1] https://lore.kernel.org/lkml/201911110934.AC5BA313@keescook/
Reported-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: James Morris <jamorris@linux.microsoft.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: linux-security-module@vger.kernel.org
Cc: Matthew Garrett <mjg59@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tyler Hicks <tyhicks@canonical.com>
Cc: x86-ml <x86@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/201911181308.63F06502A1@keescook
Mark switch cases where we are expecting to fall through.
Fix the following warning (Building: i386_defconfig i386):
arch/x86/kernel/cpu/mtrr/cyrix.c:99:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lkml.kernel.org/r/20190805201712.GA19927@embeddedor
Programming MTRR registers in multi-processor systems is a rather lengthy
process. Furthermore, all processors must program these registers in lock
step and with interrupts disabled; the process also involves flushing
caches and TLBs twice. As a result, the process may take a considerable
amount of time.
On some platforms, this can lead to a large skew of the refined-jiffies
clock source. Early when booting, if no other clock is available (e.g.,
booting with hpet=disabled), the refined-jiffies clock source is used to
monitor the TSC clock source. If the skew of refined-jiffies is too large,
Linux wrongly assumes that the TSC is unstable:
clocksource: timekeeping watchdog on CPU1: Marking clocksource
'tsc-early' as unstable because the skew is too large:
clocksource: 'refined-jiffies' wd_now: fffedc10 wd_last:
fffedb90 mask: ffffffff
clocksource: 'tsc-early' cs_now: 5eccfddebc cs_last: 5e7e3303d4
mask: ffffffffffffffff
tsc: Marking TSC unstable due to clocksource watchdog
As per measurements, around 98% of the time needed by the procedure to
program MTRRs in multi-processor systems is spent flushing caches with
wbinvd(). As per the Section 11.11.8 of the Intel 64 and IA 32
Architectures Software Developer's Manual, it is not necessary to flush
caches if the CPU supports cache self-snooping. Thus, skipping the cache
flushes can reduce by several tens of milliseconds the time needed to
complete the programming of the MTRR registers:
Platform Before After
104-core (208 Threads) Skylake 1437ms 28ms
2-core ( 4 Threads) Haswell 114ms 2ms
Reported-by: Mohammad Etemadi <mohammad.etemadi@intel.com>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Alan Cox <alan.cox@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Hans de Goede <hdegoede@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jordan Borgner <mail@jordan-borgner.de>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: Ricardo Neri <ricardo.neri@intel.com>
Cc: Andy Shevchenko <andriy.shevchenko@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Peter Feiner <pfeiner@google.com>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Link: https://lkml.kernel.org/r/1561689337-19390-3-git-send-email-ricardo.neri-calderon@linux.intel.com
Add SPDX license identifiers to all Make/Kconfig files which:
- Have no license information of any form
These files fall under the project license, GPL v2 only. The resulting SPDX
license identifier is:
GPL-2.0-only
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add SPDX license identifiers to all files which:
- Have no license information of any form
- Have EXPORT_.*_SYMBOL_GPL inside which was used in the
initial scan/conversion to ignore the file
These files fall under the project license, GPL v2 only. The resulting SPDX
license identifier is:
GPL-2.0-only
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Compiling the kernel with W=1 results in the following warning:
arch/x86/kernel/cpu/mtrr/cleanup.c:299:16: warning: variable ‘second_basek’ set but not used [-Wunused-but-set-variable]
unsigned long second_basek, second_sizek;
Remove the unused variable.
[ tglx: Massaged changelog ]
Signed-off-by: Bo Yu <tsu.yubo@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: bp@alien8.de
Cc: hpa@zytor.com
Cc: puwen@hygon.cn
Link: https://lkml.kernel.org/r/20190208125343.11451-1-tsu.yubo@gmail.com
Currently the copy_to_user of data in the gentry struct is copying
uninitiaized data in field _pad from the stack to userspace.
Fix this by explicitly memset'ing gentry to zero, this also will zero any
compiler added padding fields that may be in struct (currently there are
none).
Detected by CoverityScan, CID#200783 ("Uninitialized scalar variable")
Fixes: b263b31e8a ("x86, mtrr: Use explicit sizing and padding for the 64-bit ioctls")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Tyler Hicks <tyhicks@canonical.com>
Cc: security@kernel.org
Link: https://lkml.kernel.org/r/20181218172956.1440-1-colin.king@canonical.com
"sizeof(x)" is the canonical coding style used in arch/x86 most of the time.
Fix the few places that didn't follow the convention.
(Also do some whitespace cleanups in a few places while at it.)
[ mingo: Rewrote the changelog. ]
Signed-off-by: Jordan Borgner <mail@jordan-borgner.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20181028125828.7rgammkgzep2wpam@JordanDesktop
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Don't access the provided buffer out of bounds - this can cause a kernel
out-of-bounds read when invoked through sys_splice() or other things that
use kernel_write()/__kernel_write().
Fixes: 7f8ec5a4f0 ("x86/mtrr: Convert to use strncpy_from_user() helper")
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180706215003.156702-1-jannh@google.com
Pull x86 updates and fixes from Thomas Gleixner:
- Fix the (late) fallout from the vector management rework causing
hlist corruption and irq descriptor reference leaks caused by a
missing sanity check.
The straight forward fix triggered another long standing issue to
surface. The pre rework code hid the issue due to being way slower,
but now the chance that user space sees an EBUSY error return when
updating irq affinities is way higher, though quite a bunch of
userspace tools do not handle it properly despite the fact that EBUSY
could be returned for at least 10 years.
It turned out that the EBUSY return can be avoided completely by
utilizing the existing delayed affinity update mechanism for irq
remapped scenarios as well. That's a bit more error handling in the
kernel, but avoids fruitless fingerpointing discussions with tool
developers.
- Decouple PHYSICAL_MASK from AMD SME as its going to be required for
the upcoming Intel memory encryption support as well.
- Handle legacy device ACPI detection properly for newer platforms
- Fix the wrong argument ordering in the vector allocation tracepoint
- Simplify the IDT setup code for the APIC=n case
- Use the proper string helpers in the MTRR code
- Remove a stale unused VDSO source file
- Convert the microcode update lock to a raw spinlock as its used in
atomic context.
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/intel_rdt: Enable CMT and MBM on new Skylake stepping
x86/apic/vector: Print APIC control bits in debugfs
genirq/affinity: Defer affinity setting if irq chip is busy
x86/platform/uv: Use apic_ack_irq()
x86/ioapic: Use apic_ack_irq()
irq_remapping: Use apic_ack_irq()
x86/apic: Provide apic_ack_irq()
genirq/migration: Avoid out of line call if pending is not set
genirq/generic_pending: Do not lose pending affinity update
x86/apic/vector: Prevent hlist corruption and leaks
x86/vector: Fix the args of vector_alloc tracepoint
x86/idt: Simplify the idt_setup_apic_and_irq_gates()
x86/platform/uv: Remove extra parentheses
x86/mm: Decouple dynamic __PHYSICAL_MASK from AMD SME
x86: Mark native_set_p4d() as __always_inline
x86/microcode: Make the late update update_lock a raw lock for RT
x86/mtrr: Convert to use strncpy_from_user() helper
x86/mtrr: Convert to use match_string() helper
x86/vdso: Remove unused file
x86/i8237: Register device based on FADT legacy boot flag
Pull x86 cleanups from Ingo Molnar:
"Misc cleanups"
* 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/apm: Fix spelling mistake: "caculate" -> "calculate"
x86/mtrr: Rename main.c to mtrr.c and remove duplicate prefixes
x86: Remove pr_fmt duplicate logging prefixes
x86/early-quirks: Rename duplicate define of dev_err
x86/bpf: Clean up non-standard comments, to make the code more readable
The x86/mtrr code does horrific things because hardware. It uses
stop_machine_from_inactive_cpu(), which does a wakeup (of the stopper
thread on another CPU), which uses RCU, all before the CPU is onlined.
RCU complains about this, because wakeups use RCU and RCU does
(rightfully) not consider offline CPUs for grace-periods.
Fix this by initializing RCU way early in the MTRR case.
Tested-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Add !SMP support, per 0day Test Robot report. ]
Replace the open coded string fetch from user-space with strncpy_from_user().
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Link: http://lkml.kernel.org/r/20180515180535.89703-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The helper returns index of the matching string in an array.
Replace the open coded array lookup with match_string().
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Link: http://lkml.kernel.org/r/20180515175759.89315-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Kbuild uses the first file as the name for KBUILD_MODNAME.
mtrr uses main.c as its first file, so rename that file to mtrr.c
and fixup the Makefile.
Remove the now duplicate "mtrr: " prefixes from the logging calls.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: https://lkml.kernel.org/r/ae1fa81a0d1fad87571967b91ea90f70f486e853.1525964384.git.joe@perches.com
x86_mask is a confusing name which is hard to associate with the
processor's stepping.
Additionally, correct an indent issue in lib/cpu.c.
Signed-off-by: Jia Zhang <qianyue.zj@alibaba-inc.com>
[ Updated it to more recent kernels. ]
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bp@alien8.de
Cc: tony.luck@intel.com
Link: http://lkml.kernel.org/r/1514771530-70829-1-git-send-email-qianyue.zj@alibaba-inc.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.
The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.
Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.
- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.
For non */uapi/* files that summary was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 11139
and resulted in the first patch in this series.
If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note 930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:
SPDX license identifier # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note 270
GPL-2.0+ WITH Linux-syscall-note 169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17
LGPL-2.1+ WITH Linux-syscall-note 15
GPL-1.0+ WITH Linux-syscall-note 14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5
LGPL-2.0+ WITH Linux-syscall-note 4
LGPL-2.1 WITH Linux-syscall-note 3
((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3
((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became
the concluded license(s).
- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.
In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.
Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.
In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Larry reported a CPU hotplug lock recursion in the MTRR code.
============================================
WARNING: possible recursive locking detected
systemd-udevd/153 is trying to acquire lock:
(cpu_hotplug_lock.rw_sem){.+.+.+}, at: [<c030fc26>] stop_machine+0x16/0x30
but task is already holding lock:
(cpu_hotplug_lock.rw_sem){.+.+.+}, at: [<c0234353>] mtrr_add_page+0x83/0x470
....
cpus_read_lock+0x48/0x90
stop_machine+0x16/0x30
mtrr_add_page+0x18b/0x470
mtrr_add+0x3e/0x70
mtrr_add_page() holds the hotplug rwsem already and calls stop_machine()
which acquires it again.
Call stop_machine_cpuslocked() instead.
Reported-and-tested-by: Larry Finger <Larry.Finger@lwfinger.net>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1708140920250.1865@nanos
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Borislav Petkov <bp@suse.de>
mtrr_save_state() is invoked from native_cpu_up() which is in the context
of a CPU hotplug operation and therefor calling get_online_cpus() is
pointless.
While this works in the current get_online_cpus() implementation it
prevents from converting the hotplug locking to percpu rwsems.
Remove it.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20170524081547.651378834@linutronix.de
So there's a number of constants that start with "E820" but which
are not types - these create a confusing mixture when seen together
with 'enum e820_type' values:
E820MAP
E820NR
E820_X_MAX
E820MAX
To better differentiate the 'enum e820_type' values prefix them
with E820_TYPE_.
No change in functionality.
Cc: Alex Thorlton <athorlton@sgi.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Huang, Ying <ying.huang@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Jackson <pj@sgi.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We have these three related functions:
extern void e820_add_region(u64 start, u64 size, int type);
extern u64 e820_update_range(u64 start, u64 size, unsigned old_type, unsigned new_type);
extern u64 e820_remove_range(u64 start, u64 size, unsigned old_type, int checktype);
But it's not clear from the naming that they are 3 operations based around the
same 'memory range' concept. Rename them to better signal this, and move
the prototypes next to each other:
extern void e820__range_add (u64 start, u64 size, int type);
extern u64 e820__range_update(u64 start, u64 size, unsigned old_type, unsigned new_type);
extern u64 e820__range_remove(u64 start, u64 size, unsigned old_type, int checktype);
Note that this improved organization of the functions shows another problem that was easy
to miss before: sometimes the E820 entry type is 'int', sometimes 'unsigned int' - but this
will be fixed in a separate patch.
No change in functionality.
Cc: Alex Thorlton <athorlton@sgi.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Huang, Ying <ying.huang@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Jackson <pj@sgi.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
update_e820() should have 'e820' as a prefix as most of the other E820
functions have - but it's also a bit unclear about its purpose, as
it's unclear what is updated - the whole table, or an entry?
Also, the name does not express that it's a trivial wrapper
around sanitize_e820_table() that also prints out the resulting
table.
So rename it to e820__update_table_print(). This also makes it
harmonize with the e820__update_table_firmware() function which
has a very similar purpose.
No change in functionality.
Cc: Alex Thorlton <athorlton@sgi.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Huang, Ying <ying.huang@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Jackson <pj@sgi.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
In line with asm/e820/types.h, move the e820 API declarations to
asm/e820/api.h and update all usage sites.
This is just a mechanical, obviously correct move & replace patch,
there will be subsequent changes to clean up the code and to make
better use of the new header organization.
Cc: Alex Thorlton <athorlton@sgi.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Huang, Ying <ying.huang@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Jackson <pj@sgi.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Guided by grsecurity's analogous __read_only markings in arch/x86,
this applies several uses of __ro_after_init to structures that are
only updated during __init, and const for some structures that are
never updated. Additionally extends __init markings to some functions
that are only used during __init, and cleans up some missing C99 style
static initializers.
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brad Spengler <spender@grsecurity.net>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Brown <david.brown@linaro.org>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Emese Revfy <re.emese@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mathias Krause <minipli@googlemail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: PaX Team <pageexec@freemail.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kernel-hardening@lists.openwall.com
Link: http://lkml.kernel.org/r/20160808232906.GA29731@www.outflux.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Historically a lot of these existed because we did not have
a distinction between what was modular code and what was providing
support to modules via EXPORT_SYMBOL and friends. That changed
when we forked out support for the latter into the export.h file.
This means we should be able to reduce the usage of module.h
in code that is obj-y Makefile or bool Kconfig. In the case of
some of these which are modular, we can extend that to also include
files that are building basic support functionality but not related
to loading or registering the final module; such files also have
no need whatsoever for module.h
The advantage in removing such instances is that module.h itself
sources about 15 other headers; adding significantly to what we feed
cpp, and it can obscure what headers we are effectively using.
Since module.h was the source for init.h (for __init) and for
export.h (for EXPORT_SYMBOL) we consider each instance for the
presence of either and replace as needed.
In the case of crypto/glue_helper.c we delete a redundant instance
of MODULE_LICENSE in order to delete module.h -- the license info
is already present at the top of the file.
The uncore change warrants a mention too; it is uncore.c that uses
module.h and not uncore.h; hence the relocation done there.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160714001901.31603-9-paul.gortmaker@windriver.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Historically a lot of these existed because we did not have
a distinction between what was modular code and what was providing
support to modules via EXPORT_SYMBOL and friends. That changed
when we forked out support for the latter into the export.h file.
This means we should be able to reduce the usage of module.h
in code that is obj-y Makefile or bool Kconfig. The advantage
in doing so is that module.h itself sources about 15 other headers;
adding significantly to what we feed cpp, and it can obscure what
headers we are effectively using.
Since module.h was the source for init.h (for __init) and for
export.h (for EXPORT_SYMBOL) we consider each obj-y/bool instance
for the presence of either and replace as needed. Build testing
revealed some implicit header usage that was fixed up accordingly.
Note that some bool/obj-y instances remain since module.h is
the header for some exception table entry stuff, and for things
like __init_or_module (code that is tossed when MODULES=n).
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160714001901.31603-4-paul.gortmaker@windriver.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Use static_cpu_has() in __flush_tlb_all() due to the time-sensitivity of
this one.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1459266123-21878-10-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
get_mtrr_state() calls pat_init() on BSP even if MTRR is disabled.
This results in calling pat_init() on BSP only since APs do not call
pat_init() when MTRR is disabled. This inconsistency between BSP
and APs leads to undefined behavior.
Make BSP's calling condition to pat_init() consistent with AP's,
mtrr_ap_init() and mtrr_aps_init().
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: elliott@hpe.com
Cc: konrad.wilk@oracle.com
Cc: paul.gortmaker@windriver.com
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1458769323-24491-6-git-send-email-toshi.kani@hpe.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
A Xorg failure on qemu32 was reported as a regression [1] caused by
commit 9cd25aac1f ("x86/mm/pat: Emulate PAT when it is disabled").
This patch fixes the Xorg crash.
Negative effects of this regression were the following two failures [2]
in Xorg on QEMU with QEMU CPU model "qemu32" (-cpu qemu32), which were
triggered by the fact that its virtual CPU does not support MTRRs.
#1. copy_process() failed in the check in reserve_pfn_range()
copy_process
copy_mm
dup_mm
dup_mmap
copy_page_range
track_pfn_copy
reserve_pfn_range
A WC map request was tracked as WC in memtype, which set a PTE as
UC (pgprot) per __cachemode2pte_tbl[]. This led to this error in
reserve_pfn_range() called from track_pfn_copy(), which obtained
a pgprot from a PTE. It converts pgprot to page_cache_mode, which
does not necessarily result in the original page_cache_mode since
__cachemode2pte_tbl[] redirects multiple types to UC.
#2. error path in copy_process() then hit WARN_ON_ONCE in
untrack_pfn().
x86/PAT: Xorg:509 map pfn expected mapping type uncached-
minus for [mem 0xfd000000-0xfdffffff], got write-combining
Call Trace:
dump_stack
warn_slowpath_common
? untrack_pfn
? untrack_pfn
warn_slowpath_null
untrack_pfn
? __kunmap_atomic
unmap_single_vma
? pagevec_move_tail_fn
unmap_vmas
exit_mmap
mmput
copy_process.part.47
_do_fork
SyS_clone
do_syscall_32_irqs_on
entry_INT80_32
These negative effects are caused by two separate bugs, but they
can be addressed in separate patches. Fixing the pat_init() issue
described below addresses the root cause, and avoids Xorg to hit
these cases.
When the CPU does not support MTRRs, MTRR does not call pat_init(),
which leaves PAT enabled without initializing PAT. This pat_init()
issue is a long-standing issue, but manifested as issue #1 (and then
hit issue #2) with the above-mentioned commit because the memtype
now tracks cache attribute with 'page_cache_mode'.
This pat_init() issue existed before the commit, but we used pgprot
in memtype. Hence, we did not have issue #1 before. But WC request
resulted in WT in effect because WC pgrot is actually WT when PAT
is not initialized. This is not how it was designed to work. When
PAT is set to disable properly, WC is converted to UC. The use of
WT can result in a system crash if the target range does not support
WT. Fortunately, nobody ran into such issue before.
To fix this pat_init() issue, PAT code has been enhanced to provide
pat_disable() interface. Call this interface when MTRRs are disabled.
By setting PAT to disable properly, PAT bypasses the memtype check,
and avoids issue #1.
[1]: https://lkml.org/lkml/2016/3/3/828
[2]: https://lkml.org/lkml/2016/3/4/775
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: elliott@hpe.com
Cc: konrad.wilk@oracle.com
Cc: paul.gortmaker@windriver.com
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1458769323-24491-5-git-send-email-toshi.kani@hpe.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull x86 asm updates from Ingo Molnar:
"This is another big update. Main changes are:
- lots of x86 system call (and other traps/exceptions) entry code
enhancements. In particular the complex parts of the 64-bit entry
code have been migrated to C code as well, and a number of dusty
corners have been refreshed. (Andy Lutomirski)
- vDSO special mapping robustification and general cleanups (Andy
Lutomirski)
- cpufeature refactoring, cleanups and speedups (Borislav Petkov)
- lots of other changes ..."
* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (64 commits)
x86/cpufeature: Enable new AVX-512 features
x86/entry/traps: Show unhandled signal for i386 in do_trap()
x86/entry: Call enter_from_user_mode() with IRQs off
x86/entry/32: Change INT80 to be an interrupt gate
x86/entry: Improve system call entry comments
x86/entry: Remove TIF_SINGLESTEP entry work
x86/entry/32: Add and check a stack canary for the SYSENTER stack
x86/entry/32: Simplify and fix up the SYSENTER stack #DB/NMI fixup
x86/entry: Only allocate space for tss_struct::SYSENTER_stack if needed
x86/entry: Vastly simplify SYSENTER TF (single-step) handling
x86/entry/traps: Clear DR6 early in do_debug() and improve the comment
x86/entry/traps: Clear TIF_BLOCKSTEP on all debug exceptions
x86/entry/32: Restore FLAGS on SYSEXIT
x86/entry/32: Filter NT and speed up AC filtering in SYSENTER
x86/entry/compat: In SYSENTER, sink AC clearing below the existing FLAGS test
selftests/x86: In syscall_nt, test NT|TF as well
x86/asm-offsets: Remove PARAVIRT_enabled
x86/entry/32: Introduce and use X86_BUG_ESPFIX instead of paravirt_enabled
uprobes: __create_xol_area() must nullify xol_mapping.fault
x86/cpufeature: Create a new synthetic cpu capability for machine check recovery
...
- Use the more current logging style pr_<level>(...) instead of the old
printk(KERN_<LEVEL> ...).
- Convert pr_warning() to pr_warn().
Signed-off-by: Chen Yucong <slaoub@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1454384702-21707-1-git-send-email-slaoub@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Move them to a separate header and have the following
dependency:
x86/cpufeatures.h <- x86/processor.h <- x86/cpufeature.h
This makes it easier to use the header in asm code and not
include the whole cpufeature.h and add guards for asm.
Suggested-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1453842730-28463-5-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull x86 mm updates from Ingo Molnar:
"The main changes in this cycle were:
- make the debugfs 'kernel_page_tables' file read-only, as it only
has read ops. (Borislav Petkov)
- micro-optimize clflush_cache_range() (Chris Wilson)
- swiotlb enhancements, which fixes certain KVM emulated devices
(Igor Mammedov)
- fix an LDT related debug message (Jan Beulich)
- modularize CONFIG_X86_PTDUMP (Kees Cook)
- tone down an overly alarming warning (Laura Abbott)
- Mark variable __initdata (Rasmus Villemoes)
- PAT additions (Toshi Kani)"
* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm: Micro-optimise clflush_cache_range()
x86/mm/pat: Change free_memtype() to support shrinking case
x86/mm/pat: Add untrack_pfn_moved for mremap
x86/mm: Drop WARN from multi-BAR check
x86/LDT: Print the real LDT base address
x86/mm/64: Enable SWIOTLB if system has SRAT memory regions above MAX_DMA32_PFN
x86/mm: Introduce max_possible_pfn
x86/mm/ptdump: Make (debugfs)/kernel_page_tables read-only
x86/mm/mtrr: Mark the 'range_new' static variable in mtrr_calc_range_state() as __initdata
x86/mm: Turn CONFIG_X86_PTDUMP into a module
Those are stupid and code should use static_cpu_has_safe() or
boot_cpu_has() instead. Kill the least used and unused ones.
The remaining ones need more careful inspection before a conversion can
happen. On the TODO.
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1449481182-27541-4-git-send-email-bp@alien8.de
Cc: David Sterba <dsterba@suse.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Chris Mason <clm@fb.com>
Cc: Josef Bacik <jbacik@fb.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
'range_new' doesn't seem to be used after init. It is only passed
to memset(), sum_ranges(), memcmp() and x86_get_mtrr_mem_range(), the
latter of which also only passes it on to various *range*
library functions.
So mark it __initdata to free up an extra page after init.
Its contents are wiped at every call to mtrr_calc_range_state(),
so it being static is not about preserving state between calls,
but simply to avoid a 4k+ stack frame. While there, add a
comment explaining this and why it's safe.
We could also mark nr_range_new as __initdata, but since it's
just a single int and also doesn't carry state between calls (it
is unconditionally assigned to before it is read), we might as
well make it an ordinary automatic variable.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Link: http://lkml.kernel.org/r/1449002691-20783-1-git-send-email-linux@rasmusvillemoes.dk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The effort to replace mtrr_add() with architecture agnostic
arch_phys_wc_add() is complete, this will ensure write-combining
implementations (PAT on x86) is taken advantage instead of using
MTRR. With the effort done now, hide direct MTRR access for
drivers.
The legacy user-space /proc/mtrr ABI is not affected.
Update x86 documentation on MTRR to reflect the completion of
the phasing out of direct access to MTRR, also add a note on
platform firmware code use of MTRRs based on the obituary
discussion of MTRRs on Linux [0].
[0] http://lkml.kernel.org/r/1438991330.3109.196.camel@hp.com
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Cc: <syrjala@sci.fi>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Walls <awalls@md.metrocast.net>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Doug Ledford <dledford@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suresh Siddha <sbsiddha@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: Ville Syrjälä <syrjala@sci.fi>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: airlied@linux.ie
Cc: benh@kernel.crashing.org
Cc: bhelgaas@google.com
Cc: dan.j.williams@intel.com
Cc: konrad.wilk@oracle.com
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-media@vger.kernel.org
Cc: mst@redhat.com
Cc: netdev@vger.kernel.org
Cc: vinod.koul@intel.com
Cc: xen-devel@lists.xensource.com
Link: http://lkml.kernel.org/r/1440443613-13696-12-git-send-email-mcgrof@do-not-panic.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We use pat_enabled in x86-specific code to see if PAT is enabled
or not but we're granting full access to it even though readers
do not need to set it. If, for instance, we granted access to it
to modules later they then could override the variable
setting... no bueno.
This renames pat_enabled to a new static variable __pat_enabled.
Folks are redirected to use pat_enabled() now.
Code that sets this can only be internal to pat.c. Apart from
the early kernel parameter "nopat" to disable PAT, we also have
a few cases that disable it later and make use of a helper
pat_disable(). It is wrapped under an ifdef but since that code
cannot run unless PAT was enabled its not required to wrap it
with ifdefs, unwrap that. Likewise, since "nopat" doesn't really
change non-PAT systems just remove that ifdef as well.
Although we could add and use an early_param_off(), these
helpers don't use __read_mostly but we want to keep
__read_mostly for __pat_enabled as this is a hot path -- upon
boot, for instance, a simple guest may see ~4k accesses to
pat_enabled(). Since __read_mostly early boot params are not
that common we don't add a helper for them just yet.
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Walls <awalls@md.metrocast.net>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Doug Ledford <dledford@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kyle McMartin <kyle@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1430425520-22275-3-git-send-email-mcgrof@do-not-panic.com
Link: http://lkml.kernel.org/r/1432628901-18044-13-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
It is possible to enable CONFIG_MTRR and CONFIG_X86_PAT and end
up with a system with MTRR functionality disabled but PAT
functionality enabled. This can happen, for instance, when the
Xen hypervisor is used where MTRRs are not supported but PAT is.
This can happen on Linux as of commit
47591df505 ("xen: Support Xen pv-domains using PAT")
by Juergen, introduced in v3.19.
Technically, we should assume the proper CPU bits would be set
to disable MTRRs but we can't always rely on this. At least on
the Xen Hypervisor, for instance, only X86_FEATURE_MTRR was
disabled as of Xen 4.4 through Xen commit 586ab6a [0], but not
X86_FEATURE_K6_MTRR, X86_FEATURE_CENTAUR_MCR, or
X86_FEATURE_CYRIX_ARR for instance.
Roger Pau Monné has clarified though that although this is
technically true we will never support PVH on these CPU types so
Xen has no need to disable these bits on those systems. As per
Roger, AMD K6, Centaur and VIA chips don't have the necessary
hardware extensions to allow running PVH guests [1].
As per Toshi it is also possible for the BIOS to disable MTRR
support, in such cases get_mtrr_state() would update the MTRR
state as per the BIOS, we need to propagate this information as
well.
x86 MTRR code relies on quite a bit of checks for mtrr_if being
set to check to see if MTRRs did get set up. Instead, lets
provide a generic getter for that. This also adds a few checks
where they were not before which could potentially safeguard
ourselves against incorrect usage of MTRR where this was not
desirable.
Where possible match error codes as if MTRRs were disabled on
arch/x86/include/asm/mtrr.h.
Lastly, since disabling MTRRs can happen at run time and we
could end up with PAT enabled, best record now in our logs when
MTRRs are disabled.
[0] ~/devel/xen (git::stable-4.5)$ git describe --contains 586ab6a 4.4.0-rc1~18
[1] http://lists.xenproject.org/archives/html/xen-devel/2015-03/msg03460.html
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Roger Pau Monné <roger.pau@citrix.com>
Cc: Stefan Bader <stefan.bader@canonical.com>
Cc: Suresh Siddha <sbsiddha@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: Ville Syrjälä <syrjala@sci.fi>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: bhelgaas@google.com
Cc: david.vrabel@citrix.com
Cc: jbeulich@suse.com
Cc: konrad.wilk@oracle.com
Cc: venkatesh.pallipadi@intel.com
Cc: ville.syrjala@linux.intel.com
Cc: xen-devel@lists.xensource.com
Link: http://lkml.kernel.org/r/1426893517-2511-3-git-send-email-mcgrof@do-not-panic.com
Link: http://lkml.kernel.org/r/1432628901-18044-12-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
There is only one user but since we're going to bury MTRR next
out of access to drivers, expose this last piece of API to
drivers in a general fashion only needing io.h for access to
helpers.
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Abhilash Kesavan <a.kesavan@samsung.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Cristian Stoica <cristian.stoica@freescale.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthias Brugger <matthias.bgg@gmail.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suresh Siddha <sbsiddha@gmail.com>
Cc: Thierry Reding <treding@nvidia.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: Ville Syrjälä <syrjala@sci.fi>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will.deacon@arm.com>
Cc: dri-devel@lists.freedesktop.org
Link: http://lkml.kernel.org/r/1429722736-4473-1-git-send-email-mcgrof@do-not-panic.com
Link: http://lkml.kernel.org/r/1432628901-18044-11-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
As part of the effort to phase out MTRR use document
write-combining MTRR effects on pages with different non-PAT
page attributes flags and different PAT entry values. Extend
arch_phys_wc_add() documentation to clarify power of two sizes /
boundary requirements as we phase out mtrr_add() use.
Lastly hint towards ioremap_uc() for corner cases on device
drivers working with devices with mixed regions where MTRR size
requirements would otherwise not enable write-combining
effective memory types.
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suresh Siddha <sbsiddha@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: Ville Syrjälä <syrjala@sci.fi>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: linux-fbdev@vger.kernel.org
Link: http://lkml.kernel.org/r/1430343851-967-3-git-send-email-mcgrof@do-not-panic.com
Link: http://lkml.kernel.org/r/1432628901-18044-10-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This patch adds the argument 'uniform' to mtrr_type_lookup(),
which gets set to 1 when a given range is covered uniformly by
MTRRs, i.e. the range is fully covered by a single MTRR entry or
the default type.
Change pud_set_huge() and pmd_set_huge() to honor the 'uniform'
flag to see if it is safe to create a huge page mapping in the
range.
This allows them to create a huge page mapping in a range
covered by a single MTRR entry of any memory type. It also
detects a non-optimal request properly. They continue to check
with the WB type since it does not effectively change the
uniform mapping even if a request spans multiple MTRR entries.
pmd_set_huge() logs a warning message to a non-optimal request
so that driver writers will be aware of such a case. Drivers
should make a mapping request aligned to a single MTRR entry
when the range is covered by MTRRs.
Signed-off-by: Toshi Kani <toshi.kani@hp.com>
[ Realign, flesh out comments, improve warning message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Elliott@hp.com
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dave.hansen@intel.com
Cc: linux-mm <linux-mm@kvack.org>
Cc: pebolle@tiscali.nl
Link: http://lkml.kernel.org/r/1431714237-880-7-git-send-email-toshi.kani@hp.com
Link: http://lkml.kernel.org/r/1432628901-18044-8-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
MTRRs contain fixed and variable entries. mtrr_type_lookup() may
repeatedly call __mtrr_type_lookup() to handle a request that
overlaps with variable entries.
However, __mtrr_type_lookup() also handles the fixed entries,
which do not have to be repeated. Therefore, this patch creates
separate functions, mtrr_type_lookup_fixed() and
mtrr_type_lookup_variable(), to handle the fixed and variable
ranges respectively.
The patch also updates the function headers to clarify the
return values and output argument. It updates comments to
clarify that the repeating is necessary to handle overlaps with
the default type, since overlaps with multiple entries alone can
be handled without such repeating.
There is no functional change in this patch.
Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Elliott@hp.com
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dave.hansen@intel.com
Cc: linux-mm <linux-mm@kvack.org>
Cc: pebolle@tiscali.nl
Link: http://lkml.kernel.org/r/1431714237-880-6-git-send-email-toshi.kani@hp.com
Link: http://lkml.kernel.org/r/1432628901-18044-6-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
mtrr_type_lookup() returns verbatim 0xFF when MTRRs are
disabled. This patch defines MTRR_TYPE_INVALID to clarify the
meaning of this value, and documents its usage.
Document the return values of the kernel virtual address mapping
helpers pud_set_huge(), pmd_set_huge, pud_clear_huge() and
pmd_clear_huge().
There is no functional change in this patch.
Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Elliott@hp.com
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dave.hansen@intel.com
Cc: linux-mm <linux-mm@kvack.org>
Cc: pebolle@tiscali.nl
Link: http://lkml.kernel.org/r/1431714237-880-5-git-send-email-toshi.kani@hp.com
Link: http://lkml.kernel.org/r/1432628901-18044-5-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
'mtrr_state.enabled' contains the FE (fixed MTRRs enabled)
and E (MTRRs enabled) flags in MSR_MTRRdefType. Intel SDM,
section 11.11.2.1, defines these flags as follows:
- All MTRRs are disabled when the E flag is clear.
The FE flag has no affect when the E flag is clear.
- The default type is enabled when the E flag is set.
- MTRR variable ranges are enabled when the E flag is set.
- MTRR fixed ranges are enabled when both E and FE flags
are set.
MTRR state checks in __mtrr_type_lookup() do not match with SDM.
Hence, this patch makes the following changes:
- The current code detects MTRRs disabled when both E and
FE flags are clear in mtrr_state.enabled. Fix to detect
MTRRs disabled when the E flag is clear.
- The current code does not check if the FE bit is set in
mtrr_state.enabled when looking at the fixed entries.
Fix to check the FE flag.
- The current code returns the default type when the E flag
is clear in mtrr_state.enabled. However, the default type
is UC when the E flag is clear. Remove the code as this
case is handled as MTRR disabled with the 1st change.
In addition, this patch defines the E and FE flags in
mtrr_state.enabled as follows.
- FE flag: MTRR_STATE_MTRR_FIXED_ENABLED
- E flag: MTRR_STATE_MTRR_ENABLED
print_mtrr_state() and x86_get_mtrr_mem_range() are also updated
accordingly.
Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Elliott@hp.com
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dave.hansen@intel.com
Cc: linux-mm <linux-mm@kvack.org>
Cc: pebolle@tiscali.nl
Link: http://lkml.kernel.org/r/1431714237-880-4-git-send-email-toshi.kani@hp.com
Link: http://lkml.kernel.org/r/1432628901-18044-4-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
When an MTRR entry is inclusive to a requested range, i.e. the
start and end of the request are not within the MTRR entry range
but the range contains the MTRR entry entirely:
range_start ... [mtrr_start ... mtrr_end] ... range_end
__mtrr_type_lookup() ignores such a case because both
start_state and end_state are set to zero.
This bug can cause the following issues:
1) reserve_memtype() tracks an effective memory type in case
a request type is WB (ex. /dev/mem blindly uses WB). Missing
to track with its effective type causes a subsequent request
to map the same range with the effective type to fail.
2) pud_set_huge() and pmd_set_huge() check if a requested range
has any overlap with MTRRs. Missing to detect an overlap may
cause a performance penalty or undefined behavior.
This patch fixes the bug by adding a new flag, 'inclusive',
to detect the inclusive case. This case is then handled in
the same way as end_state:1 since the first region is the same.
With this fix, __mtrr_type_lookup() handles the inclusive case
properly.
Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Elliott@hp.com
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dave.hansen@intel.com
Cc: linux-mm <linux-mm@kvack.org>
Cc: pebolle@tiscali.nl
Link: http://lkml.kernel.org/r/1431714237-880-3-git-send-email-toshi.kani@hp.com
Link: http://lkml.kernel.org/r/1432628901-18044-3-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
__mtrr_type_lookup() checks MTRR fixed ranges when mtrr_state.have_fixed
is set and start is less than 0x100000.
However, the 'else if (start < 0x1000000)' in the code checks with an
incorrect address as it has an extra-zero in the address.
The code still runs correctly as this check is meaningless, though.
This patch replaces the incorrect address check with 'else' with no
condition.
Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Elliott@hp.com
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dave.hansen@intel.com
Cc: linux-mm <linux-mm@kvack.org>
Cc: pebolle@tiscali.nl
Link: http://lkml.kernel.org/r/1427234921-19737-4-git-send-email-toshi.kani@hp.com
Link: http://lkml.kernel.org/r/1431332153-18566-8-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The seq_printf return value, because it's frequently misused,
will eventually be converted to void.
See: commit 1f33c41c03 ("seq_file: Rename seq_overflow() to
seq_has_overflowed() and make public")
Signed-off-by: Joe Perches <joe@perches.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Context switches and TLB flushes can change individual bits of CR4.
CR4 reads take several cycles, so store a shadow copy of CR4 in a
per-cpu variable.
To avoid wasting a cache line, I added the CR4 shadow to
cpu_tlbstate, which is already touched in switch_mm. The heaviest
users of the cr4 shadow will be switch_mm and __switch_to_xtra, and
__switch_to_xtra is called shortly after switch_mm during context
switch, so the cacheline is likely to be hot.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Vince Weaver <vince@deater.net>
Cc: "hillf.zj" <hillf.zj@alibaba-inc.com>
Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/3a54dd3353fffbf84804398e00dfdc5b7c1afd7d.1414190806.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The original motivation for these patches was for an Intel CPU
feature called MPX. The patch to add a disabled feature for it
will go in with the other parts of the support.
But, in the meantime, there are a few other features than MPX
that we can make assumptions about at compile-time based on
compile options. Add them to disabled-features.h and check them
with cpu_feature_enabled().
Note that this gets rid of the last things that needed an #ifdef
CONFIG_X86_64 in cpufeature.h. Yay!
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: http://lkml.kernel.org/r/20140911211524.C0EC332A@viggo.jf.intel.com
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Bisection between 3.11 and 3.12 fingered commit 9824cf97 ("mm:
vmstats: tlb flush counters") to cause overhead problems.
The counters are undeniably useful but how often do we really
need to debug TLB flush related issues? It does not justify
taking the penalty everywhere so make it a debugging option.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Tested-by: Davidlohr Bueso <davidlohr@hp.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Alex Shi <alex.shi@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-XzxjntugxuwpxXhcrxqqh53b@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The previous patch doing vmstats for TLB flushes ("mm: vmstats: tlb flush
counters") effectively missed UP since arch/x86/mm/tlb.c is only compiled
for SMP.
UP systems do not do remote TLB flushes, so compile those counters out on
UP.
arch/x86/kernel/cpu/mtrr/generic.c calls __flush_tlb() directly. This is
probably an optimization since both the mtrr code and __flush_tlb() write
cr4. It would probably be safe to make that a flush_tlb_all() (and then
get these statistics), but the mtrr code is ancient and I'm hesitant to
touch it other than to just stick in the counters.
[akpm@linux-foundation.org: tweak comments]
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull drm updates from Dave Airlie:
"Okay this is the big one, I was stalled on the fbdev pull req as I
stupidly let fbdev guys merge a patch I required to fix a warning with
some patches I had, they ended up merging the patch from the wrong
place, but the warning should be fixed. In future I'll just take the
patch myself!
Outside drm:
There are some snd changes for the HDMI audio interactions on haswell,
they've been acked for inclusion via my tree. This relies on the
wound/wait tree from Ingo which is already merged.
Major changes:
AMD finally released the dynamic power management code for all their
GPUs from r600->present day, this is great, off by default for now but
also a huge amount of code, in fact it is most of this pull request.
Since it landed there has been a lot of community testing and Alex has
sent a lot of fixes for any bugs found so far. I suspect radeon might
now be the biggest kernel driver ever :-P p.s. radeon.dpm=1 to enable
dynamic powermanagement for anyone.
New drivers:
Renesas r-car display unit.
Other highlights:
- core: GEM CMA prime support, use new w/w mutexs for TTM
reservations, cursor hotspot, doc updates
- dvo chips: chrontel 7010B support
- i915: Haswell (fbc, ips, vecs, watermarks, audio powerwell),
Valleyview (enabled by default, rc6), lots of pll reworking, 30bpp
support (this time for sure)
- nouveau: async buffer object deletion, context/register init
updates, kernel vp2 engine support, GF117 support, GK110 accel
support (with external nvidia ucode), context cleanups.
- exynos: memory leak fixes, Add S3C64XX SoC series support, device
tree updates, common clock framework support,
- qxl: cursor hotspot support, multi-monitor support, suspend/resume
support
- mgag200: hw cursor support, g200 mode limiting
- shmobile: prime support
- tegra: fixes mostly
I've been banging on this quite a lot due to the size of it, and it
seems to okay on everything I've tested it on."
* 'drm-next' of git://people.freedesktop.org/~airlied/linux: (811 commits)
drm/radeon/dpm: implement vblank_too_short callback for si
drm/radeon/dpm: implement vblank_too_short callback for cayman
drm/radeon/dpm: implement vblank_too_short callback for btc
drm/radeon/dpm: implement vblank_too_short callback for evergreen
drm/radeon/dpm: implement vblank_too_short callback for 7xx
drm/radeon/dpm: add checks against vblank time
drm/radeon/dpm: add helper to calculate vblank time
drm/radeon: remove stray line in old pm code
drm/radeon/dpm: fix display_gap programming on rv7xx
drm/nvc0/gr: fix gpc firmware regression
drm/nouveau: fix minor thinko causing bo moves to not be async on kepler
drm/radeon/dpm: implement force performance level for TN
drm/radeon/dpm: implement force performance level for ON/LN
drm/radeon/dpm: implement force performance level for SI
drm/radeon/dpm: implement force performance level for cayman
drm/radeon/dpm: implement force performance levels for 7xx/eg/btc
drm/radeon/dpm: add infrastructure to force performance levels
drm/radeon: fix surface setup on r1xx
drm/radeon: add support for 3d perf states on older asics
drm/radeon: set default clocks for SI when DPM is disabled
...
Pull x86 mm changes from Ingo Molnar:
"Misc improvements:
- Fix /proc/mtrr reporting
- Fix ioremap printout
- Remove the unused pvclock fixmap entry on 32-bit
- misc cleanups"
* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/ioremap: Correct function name output
x86: Fix /proc/mtrr with base/size more than 44bits
ix86: Don't waste fixmap entries
x86/mm: Drop unneeded include <asm/*pgtable, page*_types.h>
x86_64: Correct phys_addr in cleanup_highmap comment
Pull asm/x86 changes from Ingo Molnar:
"Misc changes, with a bigger processor-flags cleanup/reorganization by
Peter Anvin"
* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, asm, cleanup: Replace open-coded control register values with symbolic
x86, processor-flags: Fix the datatypes and add bit number defines
x86: Rename X86_CR4_RDWRGSFS to X86_CR4_FSGSBASE
x86, flags: Rename X86_EFLAGS_BIT1 to X86_EFLAGS_FIXED
linux/const.h: Add _BITUL() and _BITULL()
x86/vdso: Convert use of typedef ctl_table to struct ctl_table
x86: __force_order doesn't need to be an actual variable
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.13 (GNU/Linux)
iQEbBAABAgAGBQJRxf9cAAoJEHm+PkMAQRiGMWkH911xM4gRmFgE7SqVW4F4AWBm
ngcqMqNy9IdqKfibORUUDvVfEa5gjD5ai2quIKpfQiaukbpQJ696H90ijuAkajLn
DQBrN243s0pzhhc/quWINnWxsFQ613JjdUMUMaD7e9A1aKjYzWrPGt/tSjrFXGCP
tArTupVzc/iOmnEQDKiROI/Nokq44QJ36aTGPM7n08xMtpKmkCXM+9/UosBteB0O
HVI33dmjwz7i55fI53XAWyuZCE+gSEnA4z8spJ9LfXso2W14V+roc+GuL6OyeeTI
pCn/+4niVPb4B0ROZlpyVmdZjbPPcMMEK5o+BSJI68SH6LHZTQh2iVuqYfpSyA==
=uUH5
-----END PGP SIGNATURE-----
Merge tag 'v3.10-rc7' into drm-next
Linux 3.10-rc7
The sdvo lvds fix in this -fixes pull
commit c3456fb3e4
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date: Mon Jun 10 09:47:58 2013 +0200
drm/i915: prefer VBT modes for SVDO-LVDS over EDID
has a silent functional conflict with
commit 990256aec2
Author: Ville Syrjälä <ville.syrjala@linux.intel.com>
Date: Fri May 31 12:17:07 2013 +0000
drm: Add probed modes in probe order
in drm-next. W simply need to add the vbt modes before edid modes, i.e. the
other way round than now.
Conflicts:
drivers/gpu/drm/drm_prime.c
drivers/gpu/drm/i915/intel_sdvo.c
On one sytem that mtrr range is more then 44bits, in dmesg we have
[ 0.000000] MTRR default type: write-back
[ 0.000000] MTRR fixed ranges enabled:
[ 0.000000] 00000-9FFFF write-back
[ 0.000000] A0000-BFFFF uncachable
[ 0.000000] C0000-DFFFF write-through
[ 0.000000] E0000-FFFFF write-protect
[ 0.000000] MTRR variable ranges enabled:
[ 0.000000] 0 [000080000000-0000FFFFFFFF] mask 3FFF80000000 uncachable
[ 0.000000] 1 [380000000000-38FFFFFFFFFF] mask 3F0000000000 uncachable
[ 0.000000] 2 [000099000000-000099FFFFFF] mask 3FFFFF000000 write-through
[ 0.000000] 3 [00009A000000-00009AFFFFFF] mask 3FFFFF000000 write-through
[ 0.000000] 4 [381FFA000000-381FFBFFFFFF] mask 3FFFFE000000 write-through
[ 0.000000] 5 [381FFC000000-381FFC0FFFFF] mask 3FFFFFF00000 write-through
[ 0.000000] 6 [0000AD000000-0000ADFFFFFF] mask 3FFFFF000000 write-through
[ 0.000000] 7 [0000BD000000-0000BDFFFFFF] mask 3FFFFF000000 write-through
[ 0.000000] 8 disabled
[ 0.000000] 9 disabled
but /proc/mtrr report wrong:
reg00: base=0x080000000 ( 2048MB), size= 2048MB, count=1: uncachable
reg01: base=0x80000000000 (8388608MB), size=1048576MB, count=1: uncachable
reg02: base=0x099000000 ( 2448MB), size= 16MB, count=1: write-through
reg03: base=0x09a000000 ( 2464MB), size= 16MB, count=1: write-through
reg04: base=0x81ffa000000 (8519584MB), size= 32MB, count=1: write-through
reg05: base=0x81ffc000000 (8519616MB), size= 1MB, count=1: write-through
reg06: base=0x0ad000000 ( 2768MB), size= 16MB, count=1: write-through
reg07: base=0x0bd000000 ( 3024MB), size= 16MB, count=1: write-through
reg08: base=0x09b000000 ( 2480MB), size= 16MB, count=1: write-combining
so bit 44 and bit 45 get cut off.
We have problems in arch/x86/kernel/cpu/mtrr/generic.c::generic_get_mtrr().
1. for base, we miss cast base_lo to 64bit before shifting.
Fix that by adding u64 casting.
2. for size, it only can handle 44 bits aka 32bits + page_shift
Fix that with 64bit mask instead of 32bit mask_lo, then range could be
more than 44bits.
At the same time, we need to update size_or_mask for old cpus that does
support cpuid 0x80000008 to get phys_addr. Need to set high 32bits
to all 1s, otherwise will not get correct size for them.
Also fix mtrr_add_page: it should check base and (base + size - 1)
instead of base and size, as base and size could be small but
base + size could bigger enough to be out of boundary. We can
use boot_cpu_data.x86_phys_bits directly to avoid size_or_mask.
So When are we going to have size more than 44bits? that is 16TiB.
after patch we have right ouput:
reg00: base=0x080000000 ( 2048MB), size= 2048MB, count=1: uncachable
reg01: base=0x380000000000 (58720256MB), size=1048576MB, count=1: uncachable
reg02: base=0x099000000 ( 2448MB), size= 16MB, count=1: write-through
reg03: base=0x09a000000 ( 2464MB), size= 16MB, count=1: write-through
reg04: base=0x381ffa000000 (58851232MB), size= 32MB, count=1: write-through
reg05: base=0x381ffc000000 (58851264MB), size= 1MB, count=1: write-through
reg06: base=0x0ad000000 ( 2768MB), size= 16MB, count=1: write-through
reg07: base=0x0bd000000 ( 3024MB), size= 16MB, count=1: write-through
reg08: base=0x09b000000 ( 2480MB), size= 16MB, count=1: write-combining
-v2: simply checking in mtrr_add_page according to hpa.
[ hpa: This probably wants to go into -stable only after having sat in
mainline for a bit. It is not a regression. ]
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1371162815-29931-1-git-send-email-yinghai@kernel.org
Cc: <stable@vger.kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Joshua reported: Commit cd7b304dfaf1 (x86, range: fix missing merge
during add range) broke mtrr cleanup on his setup in 3.9.5.
corresponding commit in upstream is fbe06b7bae.
*BAD*gran_size: 64K chunk_size: 16M num_reg: 6 lose cover RAM: -0G
https://bugzilla.kernel.org/show_bug.cgi?id=59491
So it rejects new var mtrr layout.
It turns out we have some problem with initial mtrr range retrieval.
The current sequence is:
x86_get_mtrr_mem_range
==> bunchs of add_range_with_merge
==> bunchs of subract_range
==> clean_sort_range
add_range_with_merge for [0,1M)
sort_range()
add_range_with_merge could have blank slots, so we can not just
sort only, that will have final result have extra blank slot in head.
So move that calling add_range_with_merge for [0,1M), with that we
could avoid extra clean_sort_range calling.
Reported-by: Joshua Covington <joshuacov@googlemail.com>
Tested-by: Joshua Covington <joshuacov@googlemail.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1371154622-8929-2-git-send-email-yinghai@kernel.org
Cc: <stable@vger.kernel.org> v3.9
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Several drivers currently use mtrr_add through various #ifdef guards
and/or drm wrappers. The vast majority of them want to add WC MTRRs
on x86 systems and don't actually need the MTRR if PAT (i.e.
ioremap_wc, etc) are working.
arch_phys_wc_add and arch_phys_wc_del are new functions, available
on all architectures and configurations, that add WC MTRRs on x86 if
needed (and handle errors) and do nothing at all otherwise. They're
also easier to use than mtrr_add and mtrr_del, so the call sites can
be simplified.
As an added benefit, this will avoid wasting MTRRs and possibly
warning pointlessly on PAT-supporting systems.
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Fix up all callers as they were before, with make one change: an
unsigned module taints the kernel, but doesn't turn off lockdep.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Pull trivial branch from Jiri Kosina:
"Usual stuff -- comment/printk typo fixes, documentation updates, dead
code elimination."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (39 commits)
HOWTO: fix double words typo
x86 mtrr: fix comment typo in mtrr_bp_init
propagate name change to comments in kernel source
doc: Update the name of profiling based on sysfs
treewide: Fix typos in various drivers
treewide: Fix typos in various Kconfig
wireless: mwifiex: Fix typo in wireless/mwifiex driver
messages: i2o: Fix typo in messages/i2o
scripts/kernel-doc: check that non-void fcts describe their return value
Kernel-doc: Convention: Use a "Return" section to describe return values
radeon: Fix typo and copy/paste error in comments
doc: Remove unnecessary declarations from Documentation/accounting/getdelays.c
various: Fix spelling of "asynchronous" in comments.
Fix misspellings of "whether" in comments.
eisa: Fix spelling of "asynchronous".
various: Fix spelling of "registered" in comments.
doc: fix quite a few typos within Documentation
target: iscsi: fix comment typos in target/iscsi drivers
treewide: fix typo of "suport" in various comments and Kconfig
treewide: fix typo of "suppport" in various comments
...
high_width can be easily calculated in a single expression when
making use of __ffs64().
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/4FF71053020000780008E1B5@nat28.tlf.novell.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
With the variable operated on being of "unsigned long" type,
neither ffs() nor fls() are suitable to use on them, as those
truncate their arguments to 32 bits. Using __ffs() and __fls()
respectively at once eliminates the need to subtract 1 from their
results.
Additionally, with the alignment value subsequently used as a
shift count, it must be enforced to be less than BITS_PER_LONG
(and on 64-bit there's no need for it to be any smaller).
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/4FF70D54020000780008E179@nat28.tlf.novell.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
When boot on sun G5+ with 4T mem, see an overflow in mtrr cleanup as below.
*BAD*gran_size: 2G chunk_size: 2G num_reg: 10 lose cover RAM:
-18014398505283592M
This is because 1<<31 sign extended. Use an unsigned long constant to
fix it. Useful for mem larger than or equal to 4T.
-v2: Use 64bit constant instead of explicit type conversion as suggested
by Yinghai. Description updated too.
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Link: http://lkml.kernel.org/r/4FC5A77F.6060505@oracle.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Pull x32 support for x86-64 from Ingo Molnar:
"This tree introduces the X32 binary format and execution mode for x86:
32-bit data space binaries using 64-bit instructions and 64-bit kernel
syscalls.
This allows applications whose working set fits into a 32 bits address
space to make use of 64-bit instructions while using a 32-bit address
space with shorter pointers, more compressed data structures, etc."
Fix up trivial context conflicts in arch/x86/{Kconfig,vdso/vma.c}
* 'x86-x32-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (71 commits)
x32: Fix alignment fail in struct compat_siginfo
x32: Fix stupid ia32/x32 inversion in the siginfo format
x32: Add ptrace for x32
x32: Switch to a 64-bit clock_t
x32: Provide separate is_ia32_task() and is_x32_task() predicates
x86, mtrr: Use explicit sizing and padding for the 64-bit ioctls
x86/x32: Fix the binutils auto-detect
x32: Warn and disable rather than error if binutils too old
x32: Only clear TIF_X32 flag once
x32: Make sure TS_COMPAT is cleared for x32 tasks
fs: Remove missed ->fds_bits from cessation use of fd_set structs internally
fs: Fix close_on_exec pointer in alloc_fdtable
x32: Drop non-__vdso weak symbols from the x32 VDSO
x32: Fix coding style violations in the x32 VDSO code
x32: Add x32 VDSO support
x32: Allow x32 to be configured
x32: If configured, add x32 system calls to system call tables
x32: Handle process creation
x32: Signal-related system calls
x86: Add #ifdef CONFIG_COMPAT to <asm/sys_ia32.h>
...
Specify the data structures for the 64-bit ioctls with explicit sizing
and padding so that the x32 kernel will correctly use the 64-bit forms
of these ioctls. Note that these ioctls are bogus in both forms on
both 32 and 64 bits; even on 64 bits the maximum MTRR size is only 44
bits long.
Note that nothing really is supposed to use these ioctls and that the
preferred interface is text strings on /proc/mtrr, or better yet,
nothing at all (use /sys/bus/pci/devices/*/resource*_wc for write
combining; that uses PAT not MTRRs.)
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: H. J. Lu <hjl.tools@gmail.com>
Tested-by: Nitin A. Kamble <nitin.a.kamble@intel.com>
Link: http://lkml.kernel.org/n/tip-vwvnlu3hjmtkwvij4qxtm90l@git.kernel.org
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Following is from Notes of section 11.5.3 of Intel processor
manual available at:
http://www.intel.com/Assets/PDF/manual/325384.pdf
For the Pentium 4 and Intel Xeon processors, after the sequence of
steps given above has been executed, the cache lines containing the
code between the end of the WBINVD instruction and before the
MTRRS have actually been disabled may be retained in the cache
hierarchy. Here, to remove code from the cache completely, a
second WBINVD instruction must be executed after the MTRRs have
been disabled.
This patch provides resolution for that.
Ideally, I will like to make changes only for Pentium 4 and Xeon
processors. But, I am not finding easier way to do it.
And, extra wbinvd() instruction does not hurt much for other
processors.
Signed-off-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Lucas De Marchi <lucas.demarchi@profusion.mobi>
Link: http://lkml.kernel.org/r/4EBD1CC5.3030008@oracle.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
While removing custom rendezvous code and switching to stop_machine,
commit 192d885742 ("x86, mtrr: use stop_machine APIs for doing MTRR
rendezvous") completely dropped mtrr setting code on !CONFIG_SMP
breaking MTRR settting on UP.
Fix it by removing the incorrect CONFIG_SMP.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Anders Eriksson <aeriksson@fastmail.fm>
Tested-and-acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This code uses PCI_CLASS_REVISION instead of PCI_REVISION_ID, so
it wasn't converted by commit 44c10138fd ("PCI: Change all
drivers to use pci_device->revision") before being moved to
arch/x86/...
Do it now at last -- and save one level of indentation...
Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Link: http://lkml.kernel.org/r/201107012242.08347.sshtylyov@ru.mvista.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
MTRR rendezvous sequence is not implemened using stop_machine() before, as this
gets called both from the process context aswell as the cpu online paths
(where the cpu has not come online and the interrupts are disabled etc).
Now that we have a new stop_machine_from_inactive_cpu() API, use it for
rendezvous during mtrr init of a logical processor that is coming online.
For the rest (runtime MTRR modification, system boot, resume paths), use
stop_machine() to implement the rendezvous sequence. This will consolidate and
cleanup the code.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Link: http://lkml.kernel.org/r/20110623182057.076997177@sbsiddha-MOBL3.sc.intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
MTRR rendezvous sequence using stop_one_cpu_nowait() can potentially
happen in parallel with another system wide rendezvous using
stop_machine(). This can lead to deadlock (The order in which
works are queued can be different on different cpu's. Some cpu's
will be running the first rendezvous handler and others will be running
the second rendezvous handler. Each set waiting for the other set to join
for the system wide rendezvous, leading to a deadlock).
MTRR rendezvous sequence is not implemented using stop_machine() as this
gets called both from the process context aswell as the cpu online paths
(where the cpu has not come online and the interrupts are disabled etc).
stop_machine() works with only online cpus.
For now, take the stop_machine mutex in the MTRR rendezvous sequence that
gets called from an online cpu (here we are in the process context
and can potentially sleep while taking the mutex). And the MTRR rendezvous
that gets triggered during cpu online doesn't need to take this stop_machine
lock (as the stop_machine() already ensures that there is no cpu hotplug
going on in parallel by doing get_online_cpus())
TBD: Pursue a cleaner solution of extending the stop_machine()
infrastructure to handle the case where the calling cpu is
still not online and use this for MTRR rendezvous sequence.
fixes: https://bugzilla.novell.com/show_bug.cgi?id=672008
Reported-by: Vadim Kotelnikov <vadimuzzz@inbox.ru>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Link: http://lkml.kernel.org/r/20110623182056.807230326@sbsiddha-MOBL3.sc.intel.com
Cc: stable@kernel.org # 2.6.35+, backport a week or two after this gets more testing in mainline
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
On laptops with core i5/i7, there were reports that after resume
graphics workloads were performing poorly on a specific AP, while
the other cpu's were ok. This was observed on a 32bit kernel
specifically.
Debug showed that the PAT init was not happening on that AP
during resume and hence it contributing to the poor workload
performance on that cpu.
On this system, resume flow looked like this:
1. BP starts the resume sequence and we reinit BP's MTRR's/PAT
early on using mtrr_bp_restore()
2. Resume sequence brings all AP's online
3. Resume sequence now kicks off the MTRR reinit on all the AP's.
4. For some reason, between point 2 and 3, we moved from BP
to one of the AP's. My guess is that printk() during resume
sequence is contributing to this. We don't see similar
behavior with the 64bit kernel but there is no guarantee that
at this point the remaining resume sequence (after AP's bringup)
has to happen on BP.
5. set_mtrr() was assuming that we are still on BP and skipped the
MTRR/PAT init on that cpu (because of 1 above)
6. But we were on an AP and this led to not reprogramming PAT
on this cpu leading to bad performance.
Fix this by doing unconditional mtrr_if->set_all() in set_mtrr()
during MTRR/PAT init. This might be unnecessary if we are still
running on BP. But it is of no harm and will guarantee that after
resume, all the cpu's will be in sync with respect to the
MTRR/PAT registers.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <1301438292-28370-1-git-send-email-eric@anholt.net>
Signed-off-by: Eric Anholt <eric@anholt.net>
Tested-by: Keith Packard <keithp@keithp.com>
Cc: stable@kernel.org [v2.6.32+]
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Some subsystems in the x86 tree need to carry out suspend/resume and
shutdown operations with one CPU on-line and interrupts disabled and
they define sysdev classes and sysdevs or sysdev drivers for this
purpose. This leads to unnecessarily complicated code and excessive
memory usage, so switch them to using struct syscore_ops objects for
this purpose instead.
Generally, there are three categories of subsystems that use
sysdevs for implementing PM operations: (1) subsystems whose
suspend/resume callbacks ignore their arguments entirely (the
majority), (2) subsystems whose suspend/resume callbacks use their
struct sys_device argument, but don't really need to do that,
because they can be implemented differently in an arguably simpler
way (io_apic.c), and (3) subsystems whose suspend/resume callbacks
use their struct sys_device argument, but the value of that argument
is always the same and could be ignored (microcode_core.c). In all
of these cases the subsystems in question may be readily converted to
using struct syscore_ops objects for power management and shutdown.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
They were generated by 'codespell' and then manually reviewed.
Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
Cc: trivial@kernel.org
LKML-Reference: <1300389856-1099-3-git-send-email-lucas.demarchi@profusion.mobi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Markus Kohn ran into a hard hang regression on an acer aspire
1310, when acpi is enabled. git bisect showed the following
commit as the bad one that introduced the boot regression.
commit d0af9eed5a
Author: Suresh Siddha <suresh.b.siddha@intel.com>
Date: Wed Aug 19 18:05:36 2009 -0700
x86, pat/mtrr: Rendezvous all the cpus for MTRR/PAT init
Because of the UP configuration of that platform,
native_smp_prepare_cpus() bailed out (in smp_sanity_check())
before doing the set_mtrr_aps_delayed_init()
Further down the boot path, native_smp_cpus_done() will call the
delayed MTRR initialization for the AP's (mtrr_aps_init()) with
mtrr_aps_delayed_init not set. This resulted in the boot
processor reprogramming its MTRR's to the values seen during the
start of the OS boot. While this is not needed ideally, this
shouldn't have caused any side-effects. This is because the
reprogramming of MTRR's (set_mtrr_state() that gets called via
set_mtrr()) will check if the live register contents are
different from what is being asked to write and will do the actual
write only if they are different.
BP's mtrr state is read during the start of the OS boot and
typically nothing would have changed when we ask to reprogram it
on BP again because of the above scenario on an UP platform. So
on a normal UP platform no reprogramming of BP MTRR MSR's
happens and all is well.
However, on this platform, bios seems to be modifying the fixed
mtrr range registers between the start of OS boot and when we
double check the live registers for reprogramming BP MTRR
registers. And as the live registers are modified, we end up
reprogramming the MTRR's to the state seen during the start of
the OS boot.
During ACPI initialization, something in the bios (probably smi
handler?) don't like this fact and results in a hard lockup.
We didn't see this boot hang issue on this platform before the
commit d0af9eed5a, because only
the AP's (if any) will program its MTRR's to the value that BP
had at the start of the OS boot.
Fix this issue by checking mtrr_aps_delayed_init before
continuing further in the mtrr_aps_init(). Now, only AP's (if
any) will program its MTRR's to the BP values during boot.
Addresses https://bugzilla.novell.com/show_bug.cgi?id=623393
[ By the way, this behavior of the bios modifying MTRR's after the start
of the OS boot is not common and the kernel is not prepared to
handle this situation well. Irrespective of this issue, during
suspend/resume, linux kernel will try to reprogram the BP's MTRR values
to the values seen during the start of the OS boot. So suspend/resume might
be already broken on this platform for all linux kernel versions. ]
Reported-and-bisected-by: Markus Kohn <jabber@gmx.org>
Tested-by: Markus Kohn <jabber@gmx.org>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Thomas Renninger <trenn@novell.com>
Cc: Rafael Wysocki <rjw@novell.com>
Cc: Venkatesh Pallipadi <venki@google.com>
Cc: stable@kernel.org # [v2.6.32+]
LKML-Reference: <1296694975.4418.402.camel@sbsiddha-MOBL3.sc.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'x86-mtrr-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, mtrr: Support mtrr lookup for range spanning across MTRR range
x86, mtrr: Refactor MTRR type overlap check code
Instead of adapting the CPU family check in amd_special_default_mtrr()
for each new CPU family assume that all new AMD CPUs support the
necessary bits in SYS_CFG MSR.
Tom2Enabled is architectural (defined in APM Vol.2).
Tom2ForceMemTypeWB is defined in all BKDGs starting with K8 NPT.
In pre K8-NPT BKDG this bit is reserved (read as zero).
W/o this adaption Linux would unnecessarily complain about bad MTRR
settings on every new AMD CPU family, e.g.
[ 0.000000] WARNING: BIOS bug: CPU MTRRs don't cover all of memory, losing 4863MB of RAM.
Cc: stable@kernel.org # .32.x, .35.x
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
LKML-Reference: <20100930123235.GB20545@loge.amd.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
mtrr_type_lookup [start:end] looked up the resultant MTRR type for that
range, based on fixed and all variable MTRR ranges. It did check for multiple
MTRR var ranges overlapping [start:end] and returned the net type.
However, if the [start:end] range spanned across any var MTRR range,
mtrr_type_lookup would return an error return of 0xFE. This was based on
typical usage of mtrr_type_lookup in PAT mapping, where region being
mapped would not normally span across MTRR ranges and also trying
to keep the code simple.
Mark recently reported the problem with this limitation. When there are
two continguous MTRR's of type "writeback" and if there is a memory mapping
over a region starting in one MTRR range and ending in another MTRR range,
such mapping will fallback to "uncached" due to the above limitation.
Change below adds support for such lookups spanning multiple MTRR ranges.
We now have a wrapper mtrr_type_lookup that dynamically splits such a region
into smaller chunks that fit within one MTRR range and does a
__mtrr_type_lookup on it and combine the results later.
Reported-by: Mark Langsdorf <mark.langsdorf@amd.com>
Signed-off-by: Venkatesh Pallipadi <venki@google.com>
LKML-Reference: <1284159350-19841-3-git-send-email-venki@google.com>
Reviewed-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Move the MTRR type overlap check into a new function. No functional change in
this patch. Just making it easier to add multiple region overlap check in
the following patch.
Signed-off-by: Venkatesh Pallipadi <venki@google.com>
LKML-Reference: <1284159350-19841-2-git-send-email-venki@google.com>
Reviewed-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
* 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: Clean up arch/x86/kernel/cpu/mtrr/cleanup.c: use ";" not "," to terminate statements
* 'x86-vmware-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, vmware: Preset lpj values when on VMware.
* 'x86-mtrr-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, mtrr: Use stop machine context to rendezvous all the cpu's
* 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86/apic/es7000_32: Remove unused variable
* 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: Avoid unnecessary __clear_user() and xrstor in signal handling
* 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, vdso: Unmap vdso pages
Use the stop machine context rather than IPI's to rendezvous all the cpus for
MTRR initialization that happens during cpu bringup or for MTRR modifications
during runtime.
This avoids deadlock scenario (reported by Prarit) like:
cpu A holds a read_lock (tasklist_lock for example) with irqs enabled
cpu B waits for the same lock with irqs disabled using write_lock_irq
cpu C doing set_mtrr() (during AP bringup for example), which will try to
rendezvous all the cpus using IPI's
This will result in C and A come to the rendezvous point and waiting
for B. B is stuck forever waiting for the lock and thus not
reaching the rendezvous point.
Using stop cpu (run in the process context of per cpu based keventd) to do
this rendezvous, avoids this deadlock scenario.
Also make sure all the cpu's are in the rendezvous handler before we proceed
with the local_irq_save() on each cpu. This lock step disabling irqs on all
the cpus will avoid other deadlock scenarios (for example involving
with the blocking smp_call_function's etc).
[ This problem is very old. Marking -stable only for 2.6.35 as the
stop_one_cpu_nowait() API is present only in 2.6.35. Any older
kernel interested in this fix need to do some more work in backporting
this patch. ]
Reported-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <1280515602.2682.10.camel@sbsiddha-MOBL3.sc.intel.com>
Acked-by: Prarit Bhargava <prarit@redhat.com>
Cc: stable@kernel.org [2.6.35]
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Just some dead code, no real bugs.
Found by gcc 4.6 -Wall
Signed-off-by: Andi Kleen <ak@linux.intel.com>
LKML-Reference: <201007202219.o6KMJnQ0021072@imap1.linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Also needed if pr_<level> becomes a bit more space efficient.
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
LKML-Reference: <1277768808.29157.280.camel@Joe-Laptop.home>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.
percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.
http://userweb.kernel.org/~tj/misc/slabh-sweep.py
The script does the followings.
* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.
* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.
* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.
The conversion was done in the following steps.
1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.
2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.
3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.
4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.
5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.
6. percpu.h was updated not to include slab.h.
7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).
* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig
8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.
Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Fix missing kernel-doc notation in mtrr/main.c:
Warning(arch/x86/kernel/cpu/mtrr/main.c:152): No description found for parameter 'info'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>