Use instrument_atomic_read_write() for atomic RMW ops.
Cc: Will Deacon <will@kernel.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: <linux-arch@vger.kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Use the new instrument_read_write() where appropriate.
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
quite a few architectures have the same csum_partial_copy_nocheck() -
simply memcpy() the data and then return the csum of the copy.
hexagon, parisc, ia64, s390, um: explicitly spelled out that way.
arc, arm64, csky, h8300, m68k/nommu, microblaze, mips/GENERIC_CSUM, nds32,
nios2, openrisc, riscv, unicore32: end up picking the same thing spelled
out in lib/checksum.h (with varying amounts of perversions along the way).
everybody else (alpha, arm, c6x, m68k/mmu, mips/!GENERIC_CSUM, powerpc,
sh, sparc, x86, xtensa) have non-generic variants. For all except c6x
the declaration is in their asm/checksum.h. c6x uses the wrapper
from asm-generic/checksum.h that would normally lead to the lib/checksum.h
instance, but in case of c6x we end up using an asm function from arch/c6x
instead.
Screw that mess - have architectures with private instances define
_HAVE_ARCH_CSUM_AND_COPY in their asm/checksum.h and have the default
one right in net/checksum.h conditional on _HAVE_ARCH_CSUM_AND_COPY
*not* defined.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Patch series "iomap: Constify ioreadX() iomem argument", v3.
The ioread8/16/32() and others have inconsistent interface among the
architectures: some taking address as const, some not.
It seems there is nothing really stopping all of them to take pointer to
const.
This patch (of 4):
The ioreadX() and ioreadX_rep() helpers have inconsistent interface. On
some architectures void *__iomem address argument is a pointer to const,
on some not.
Implementations of ioreadX() do not modify the memory under the address so
they can be converted to a "const" version for const-safety and
consistency among architectures.
[krzk@kernel.org: sh: clk: fix assignment from incompatible pointer type for ioreadX()]
Link: http://lkml.kernel.org/r/20200723082017.24053-1-krzk@kernel.org
[akpm@linux-foundation.org: fix drivers/mailbox/bcm-pdc-mailbox.c]
Link: http://lkml.kernel.org/r/202007132209.Rxmv4QyS%25lkp@intel.com
Suggested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: Kalle Valo <kvalo@codeaurora.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Jon Mason <jdmason@kudzu.us>
Cc: Allen Hubbe <allenbh@gmail.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Link: http://lkml.kernel.org/r/20200709072837.5869-1-krzk@kernel.org
Link: http://lkml.kernel.org/r/20200709072837.5869-2-krzk@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Since the patch [1], building the kernel using a toolchain built with
binutils 2.33.1 prevents booting a sh4 system under Qemu. Apply the patch
provided by Alan Modra [2] that fix alignment of rodata.
[1] https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=ebd2263ba9a9124d93bbc0ece63d7e0fae89b40e
[2] https://www.sourceware.org/ml/binutils/2019-12/msg00112.html
Signed-off-by: Romain Naour <romain.naour@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Alan Modra <amodra@gmail.com>
Cc: Bin Meng <bin.meng@windriver.com>
Cc: Chen Zhou <chenzhou10@huawei.com>
Cc: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Krzysztof Kozlowski <krzk@kernel.org>
Cc: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Cc: Rich Felker <dalias@libc.org>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: <stable@vger.kernel.org>
Link: https://marc.info/?l=linux-sh&m=158429470221261
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The #ifdef statement that guards the generic version of pud_alloc_one() by
mistake used __HAVE_ARCH_PUD_FREE instead of __HAVE_ARCH_PUD_ALLOC_ONE.
Fix it.
Fixes: d9e8b92967 ("asm-generic: pgalloc: provide generic pud_alloc_one() and pud_free_one()")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20200812191415.GE163101@linux.ibm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A few patches all over the place during this cycle, mostly bug and
sparse warning fixes for OpenRISC, but a few enhancements too. Note,
there are 2 non OpenRISC specific fixups.
Non OpenRISC fixes:
- In init we need to align the init_task correctly to fix an issue with
MUTEX_FLAGS, reviewed by Peter Z. No one picked this up so I kept it
on my tree.
- In asm-generic/io.h I fixed up some sparse warnings, OK'd by Arnd.
Arnd asked to merge it via my tree.
OpenRISC fixes:
- Many fixes for OpenRISC sprase warnings.
- Add support OpenRISC SMP tlb flushing rather than always flushing the
entire TLB on every CPU.
- Fix bug when dumping stack via /proc/xxx/stack of user threads.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE2cRzVK74bBA6Je/xw7McLV5mJ+QFAl829wgACgkQw7McLV5m
J+RvMg/+Ik9jmHiCoDilVzB5yqJ0Ea8oLjg75V9eBE3YtnYJMAbDHb8ye2OsYrlp
QhrAHFi8PB7nJQphod3XXt8Y5JWMYjKIgdazybVQtUlD5IAXgYAR6/IxJ1DVzxa0
AzJ7TGmYSxnhW7GzbRU5xjgdIi5cKQjBUcVM/blDQB6/GZ4wY3OBxK1pn0kNXMPU
gnS+0yPDlwXaZw67YmbF5kF34lvEe0knkOaxxP/S0t2ROb6Xu0PJCEDTbdcGApsB
2xdm0dJwK50ulS0/HWxC18vC/R7d1b0qjR+xvisjydHbZawEN2Kcf3mOCSAETSTk
ST1WFxuTAObqdyc4F15tdsqFvbchPtCH9UAjkkSbmRxGVOKQa88NmW1A+s0hj4BX
enf6I9SYzqiU/WkuFDwSnJ4NETOpPaUVqZbi3WTUfljyXmOdqXbT+416YxViOXpx
OtSyGVN18qs8wjsWlWiGyhM/eAnHwr9q0q1kJ8VZTh+nQSnQFmuWjHSfRan2PkmQ
nnbvXHXJcgWYVlk+JZLOnhOB3zrkH5xmlM2UakVUvP92ESnnSmBCC0RLA0k6kGZ3
PkFBbY4etbA7Ug8r1KueOaqHKwJpTpIb4tU75y3KXyi05FeLEln1doC5M4EQUPDy
eXzdWj6afuEKmAPILiEYlSVXO3t8iIncVBkK7isaR37dURNnJWE=
=0MlF
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://github.com/openrisc/linux
Pull OpenRISC updates from Stafford Horne:
"A few patches all over the place during this cycle, mostly bug and
sparse warning fixes for OpenRISC, but a few enhancements too. Note,
there are 2 non OpenRISC specific fixups.
Non OpenRISC fixes:
- In init we need to align the init_task correctly to fix an issue
with MUTEX_FLAGS, reviewed by Peter Z. No one picked this up so I
kept it on my tree.
- In asm-generic/io.h I fixed up some sparse warnings, OK'd by Arnd.
Arnd asked to merge it via my tree.
OpenRISC fixes:
- Many fixes for OpenRISC sprase warnings.
- Add support OpenRISC SMP tlb flushing rather than always flushing
the entire TLB on every CPU.
- Fix bug when dumping stack via /proc/xxx/stack of user threads"
* tag 'for-linus' of git://github.com/openrisc/linux:
openrisc: uaccess: Add user address space check to access_ok
openrisc: signal: Fix sparse address space warnings
openrisc: uaccess: Remove unused macro __addr_ok
openrisc: uaccess: Use static inline function in access_ok
openrisc: uaccess: Fix sparse address space warnings
openrisc: io: Fixup defines and move include to the end
asm-generic/io.h: Fix sparse warnings on big-endian architectures
openrisc: Implement proper SMP tlb flushing
openrisc: Fix oops caused when dumping stack
openrisc: Add support for external initrd images
init: Align init_task to avoid conflict with MUTEX_FLAGS
openrisc: fix __user in raw_copy_to_user()'s prototype
-----BEGIN PGP SIGNATURE-----
iQFHBAABCAAxFiEEIbPD0id6easf0xsudhRwX5BBoF4FAl82Y6cTHHdlaS5saXVA
a2VybmVsLm9yZwAKCRB2FHBfkEGgXlcxCACP21ZI7RZvQBcFtTj5MWa0uwoofFqF
JDG0MvZ5zFKIJFX0pwlZIUrZY5aVJ1NwCDgCI0EXbZEazTaNCD2knFPqrLe3WUFY
mSDF9df7oW9UvTe9L4g3rYAdqsrkbgqhBypm9Vpbcazg/Ki6QVCgAhIo1lbq62+m
J2/0kLO1lVY6opr6vyobaWbm/Y4b0fbrx7N6KwUDhZUYGLGKaOc+WvsZinNl4XW6
VPiEVQUApvVxwG43rLNXjPe83DtassJ2GevSS1whXnZ+K0bViWhyYicbqEl9iV1i
nlNIkEMX5A1rdwV1zEAGyY/zWi+fi2+IdKGGEbtyUsely1vHtZuaDCiQ
=DE2Y
-----END PGP SIGNATURE-----
Merge tag 'hyperv-fixes-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
Pull hyper-v fixes from Wei Liu:
- fix oops reporting on Hyper-V
- make objtool happy
* tag 'hyperv-fixes-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
x86/hyperv: Make hv_setup_sched_clock inline
Drivers: hv: vmbus: Only notify Hyper-V for die events that are oops
Pull more parisc updates from Helge Deller:
- Oscar Carter contributed a patch which fixes parisc's usage of
dereference_function_descriptor() and thus will allow using the
-Wcast-function-type compiler option in the top-level Makefile
- Sven Schnelle fixed a bug in the SBA code to prevent crashes during
kexec
- John David Anglin provided implementations for __smp_store_release()
and __smp_load_acquire barriers() which avoids using the sync
assembler instruction and thus speeds up barrier paths
- Some whitespace cleanups in parisc's atomic.h header file
* 'parisc-5.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Implement __smp_store_release and __smp_load_acquire barriers
parisc: mask out enable and reserved bits from sba imask
parisc: Whitespace cleanups in atomic.h
parisc/kernel/ftrace: Remove function callback casts
sections.h: dereference_function_descriptor() returns void pointer
segment_eq is only used to implement uaccess_kernel. Just open code
uaccess_kernel in the arch uaccess headers and remove one layer of
indirection.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Greentime Hu <green.hu@gmail.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Vincent Chen <deanbo422@gmail.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Link: http://lkml.kernel.org/r/20200710135706.537715-5-hch@lst.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The function dereference_function_descriptor() takes on hppa64, ppc64
and ia64 a pointer to a function descriptor and returns a (void) pointer
to the dereferenced function.
To make cross-arch coding easier, on all other architectures the
dereference_function_descriptor() macro should return a void pointer
too.
Signed-off-by: Helge Deller <deller@gmx.de>
Merge misc updates from Andrew Morton:
- a few MM hotfixes
- kthread, tools, scripts, ntfs and ocfs2
- some of MM
Subsystems affected by this patch series: kthread, tools, scripts, ntfs,
ocfs2 and mm (hofixes, pagealloc, slab-generic, slab, slub, kcsan,
debug, pagecache, gup, swap, shmem, memcg, pagemap, mremap, mincore,
sparsemem, vmalloc, kasan, pagealloc, hugetlb and vmscan).
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (162 commits)
mm: vmscan: consistent update to pgrefill
mm/vmscan.c: fix typo
khugepaged: khugepaged_test_exit() check mmget_still_valid()
khugepaged: retract_page_tables() remember to test exit
khugepaged: collapse_pte_mapped_thp() protect the pmd lock
khugepaged: collapse_pte_mapped_thp() flush the right range
mm/hugetlb: fix calculation of adjust_range_if_pmd_sharing_possible
mm: thp: replace HTTP links with HTTPS ones
mm/page_alloc: fix memalloc_nocma_{save/restore} APIs
mm/page_alloc.c: skip setting nodemask when we are in interrupt
mm/page_alloc: fallbacks at most has 3 elements
mm/page_alloc: silence a KASAN false positive
mm/page_alloc.c: remove unnecessary end_bitidx for [set|get]_pfnblock_flags_mask()
mm/page_alloc.c: simplify pageblock bitmap access
mm/page_alloc.c: extract the common part in pfn_to_bitidx()
mm/page_alloc.c: replace the definition of NR_MIGRATETYPE_BITS with PB_migratetype_bits
mm/shuffle: remove dynamic reconfiguration
mm/memory_hotplug: document why shuffle_zone() is relevant
mm/page_alloc: remove nr_free_pagecache_pages()
mm: remove vm_total_pages
...
Most architectures define pgd_free() as a wrapper for free_page().
Provide a generic version in asm-generic/pgalloc.h and enable its use for
most architectures.
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k]
Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Matthew Wilcox <willy@infradead.org>
Link: http://lkml.kernel.org/r/20200627143453.31835-7-rppt@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Several architectures define pud_alloc_one() as a wrapper for
__get_free_page() and pud_free() as a wrapper for free_page().
Provide a generic implementation in asm-generic/pgalloc.h and use it where
appropriate.
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Matthew Wilcox <willy@infradead.org>
Link: http://lkml.kernel.org/r/20200627143453.31835-6-rppt@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
For most architectures that support >2 levels of page tables,
pmd_alloc_one() is a wrapper for __get_free_pages(), sometimes with
__GFP_ZERO and sometimes followed by memset(0) instead.
More elaborate versions on arm64 and x86 account memory for the user page
tables and call to pgtable_pmd_page_ctor() as the part of PMD page
initialization.
Move the arm64 version to include/asm-generic/pgalloc.h and use the
generic version on several architectures.
The pgtable_pmd_page_ctor() is a NOP when ARCH_ENABLE_SPLIT_PMD_PTLOCK is
not enabled, so there is no functional change for most architectures
except of the addition of __GFP_ACCOUNT for allocation of user page
tables.
The pmd_free() is a wrapper for free_page() in all the cases, so no
functional change here.
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Link: http://lkml.kernel.org/r/20200627143453.31835-5-rppt@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "mm: cleanup usage of <asm/pgalloc.h>"
Most architectures have very similar versions of pXd_alloc_one() and
pXd_free_one() for intermediate levels of page table. These patches add
generic versions of these functions in <asm-generic/pgalloc.h> and enable
use of the generic functions where appropriate.
In addition, functions declared and defined in <asm/pgalloc.h> headers are
used mostly by core mm and early mm initialization in arch and there is no
actual reason to have the <asm/pgalloc.h> included all over the place.
The first patch in this series removes unneeded includes of
<asm/pgalloc.h>
In the end it didn't work out as neatly as I hoped and moving
pXd_alloc_track() definitions to <asm-generic/pgalloc.h> would require
unnecessary changes to arches that have custom page table allocations, so
I've decided to move lib/ioremap.c to mm/ and make pgalloc-track.h local
to mm/.
This patch (of 8):
In most cases <asm/pgalloc.h> header is required only for allocations of
page table memory. Most of the .c files that include that header do not
use symbols declared in <asm/pgalloc.h> and do not require that header.
As for the other header files that used to include <asm/pgalloc.h>, it is
possible to move that include into the .c file that actually uses symbols
from <asm/pgalloc.h> and drop the include from the header file.
The process was somewhat automated using
sed -i -E '/[<"]asm\/pgalloc\.h/d' \
$(grep -L -w -f /tmp/xx \
$(git grep -E -l '[<"]asm/pgalloc\.h'))
where /tmp/xx contains all the symbols defined in
arch/*/include/asm/pgalloc.h.
[rppt@linux.ibm.com: fix powerpc warning]
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k]
Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Matthew Wilcox <willy@infradead.org>
Link: http://lkml.kernel.org/r/20200627143453.31835-1-rppt@kernel.org
Link: http://lkml.kernel.org/r/20200627143453.31835-2-rppt@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Add support for (optionally) using queued spinlocks & rwlocks.
- Support for a new faster system call ABI using the scv instruction on Power9
or later.
- Drop support for the PROT_SAO mmap/mprotect flag as it will be unsupported on
Power10 and future processors, leaving us with no way to implement the
functionality it requests. This risks breaking userspace, though we believe
it is unused in practice.
- A bug fix for, and then the removal of, our custom stack expansion checking.
We now allow stack expansion up to the rlimit, like other architectures.
- Remove the remnants of our (previously disabled) topology update code, which
tried to react to NUMA layout changes on virtualised systems, but was prone
to crashes and other problems.
- Add PMU support for Power10 CPUs.
- A change to our signal trampoline so that we don't unbalance the link stack
(branch return predictor) in the signal delivery path.
- Lots of other cleanups, refactorings, smaller features and so on as usual.
Thanks to:
Abhishek Goel, Alastair D'Silva, Alexander A. Klimov, Alexey Kardashevskiy,
Alistair Popple, Andrew Donnellan, Aneesh Kumar K.V, Anju T Sudhakar, Anton
Blanchard, Arnd Bergmann, Athira Rajeev, Balamuruhan S, Bharata B Rao, Bill
Wendling, Bin Meng, Cédric Le Goater, Chris Packham, Christophe Leroy,
Christoph Hellwig, Daniel Axtens, Dan Williams, David Lamparter, Desnes A.
Nunes do Rosario, Erhard F., Finn Thain, Frederic Barrat, Ganesh Goudar,
Gautham R. Shenoy, Geoff Levand, Greg Kurz, Gustavo A. R. Silva, Hari Bathini,
Harish, Imre Kaloz, Joel Stanley, Joe Perches, John Crispin, Jordan Niethe,
Kajol Jain, Kamalesh Babulal, Kees Cook, Laurent Dufour, Leonardo Bras, Li
RongQing, Madhavan Srinivasan, Mahesh Salgaonkar, Mark Cave-Ayland, Michal
Suchanek, Milton Miller, Mimi Zohar, Murilo Opsfelder Araujo, Nathan
Chancellor, Nathan Lynch, Naveen N. Rao, Nayna Jain, Nicholas Piggin, Oliver
O'Halloran, Palmer Dabbelt, Pedro Miraglia Franco de Carvalho, Philippe
Bergheaud, Pingfan Liu, Pratik Rajesh Sampat, Qian Cai, Qinglang Miao, Randy
Dunlap, Ravi Bangoria, Sachin Sant, Sam Bobroff, Sandipan Das, Santosh
Sivaraj, Satheesh Rajendran, Shirisha Ganta, Sourabh Jain, Srikar Dronamraju,
Stan Johnson, Stephen Rothwell, Thadeu Lima de Souza Cascardo, Thiago Jung
Bauermann, Tom Lane, Vaibhav Jain, Vladis Dronov, Wei Yongjun, Wen Xiong,
YueHaibing.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAl8tOxATHG1wZUBlbGxl
cm1hbi5pZC5hdQAKCRBR6+o8yOGlgDQfEAClXHWf6hnxB84bEu39D51NkVotL1IG
BRWFvyix+xHuUkHIouBPAAMl6ngY5X6wkYd+Z+CY9zHNtdSDoVlJE30YXdMQA/dE
L/rYxR1884yGR/uU/3wusboO68ReXwcKQPmKOymUfh0zH7ujyJsSWLpXFK1YDC5d
2TVVTi0Q+P5ucMHDh0L+AHirIxZvtZSp43+J7xLtywsj+XAxJWCTGo5WCJbdgbCA
Qbv3aOkVyUa3EgsbdM/STPpv82ebqT+PHxeSIO4Jw6ZODtKRH0R5YsWCApuY9eZ+
ebY9RLmgv9ZAhJqB2fv9A5NDcMoGpZNmjM7HrWpXwULKQpkBGHCzJ9FcSdHVMOx8
nbVMFjt4uzLwV1w8lFYslQ2tNH/uH2o9BlryV1RLpiiKokDAJO/NOsWN9y0u/I4J
EmAM5DSX2LgVvvas96IlGK8KX4xkOkf8FLX/H5UDvvAfloH8J4CZXk/CWCab/nqY
KEHPnMmYvQZ1w9SzyZg9sO/1p6Bl1Gmm75Jv2F1lBiRW/42VcGBI/qLsJ4lC59Fc
KbwufYNYYG38wbxDLW1HAPJhRonxIcaZj3EEqk7aTiLZ55nNbu8e2k32CpNXTGqt
npOhzJHimcq7L6+878ZW+xpbZwogIEUdRSsmwb6aT8za3ShnYwSA2Q3LYxh9xyGH
j3GifvPq6Efp3Q==
=QMY1
-----END PGP SIGNATURE-----
Merge tag 'powerpc-5.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
- Add support for (optionally) using queued spinlocks & rwlocks.
- Support for a new faster system call ABI using the scv instruction on
Power9 or later.
- Drop support for the PROT_SAO mmap/mprotect flag as it will be
unsupported on Power10 and future processors, leaving us with no way
to implement the functionality it requests. This risks breaking
userspace, though we believe it is unused in practice.
- A bug fix for, and then the removal of, our custom stack expansion
checking. We now allow stack expansion up to the rlimit, like other
architectures.
- Remove the remnants of our (previously disabled) topology update
code, which tried to react to NUMA layout changes on virtualised
systems, but was prone to crashes and other problems.
- Add PMU support for Power10 CPUs.
- A change to our signal trampoline so that we don't unbalance the link
stack (branch return predictor) in the signal delivery path.
- Lots of other cleanups, refactorings, smaller features and so on as
usual.
Thanks to: Abhishek Goel, Alastair D'Silva, Alexander A. Klimov, Alexey
Kardashevskiy, Alistair Popple, Andrew Donnellan, Aneesh Kumar K.V, Anju
T Sudhakar, Anton Blanchard, Arnd Bergmann, Athira Rajeev, Balamuruhan
S, Bharata B Rao, Bill Wendling, Bin Meng, Cédric Le Goater, Chris
Packham, Christophe Leroy, Christoph Hellwig, Daniel Axtens, Dan
Williams, David Lamparter, Desnes A. Nunes do Rosario, Erhard F., Finn
Thain, Frederic Barrat, Ganesh Goudar, Gautham R. Shenoy, Geoff Levand,
Greg Kurz, Gustavo A. R. Silva, Hari Bathini, Harish, Imre Kaloz, Joel
Stanley, Joe Perches, John Crispin, Jordan Niethe, Kajol Jain, Kamalesh
Babulal, Kees Cook, Laurent Dufour, Leonardo Bras, Li RongQing, Madhavan
Srinivasan, Mahesh Salgaonkar, Mark Cave-Ayland, Michal Suchanek, Milton
Miller, Mimi Zohar, Murilo Opsfelder Araujo, Nathan Chancellor, Nathan
Lynch, Naveen N. Rao, Nayna Jain, Nicholas Piggin, Oliver O'Halloran,
Palmer Dabbelt, Pedro Miraglia Franco de Carvalho, Philippe Bergheaud,
Pingfan Liu, Pratik Rajesh Sampat, Qian Cai, Qinglang Miao, Randy
Dunlap, Ravi Bangoria, Sachin Sant, Sam Bobroff, Sandipan Das, Santosh
Sivaraj, Satheesh Rajendran, Shirisha Ganta, Sourabh Jain, Srikar
Dronamraju, Stan Johnson, Stephen Rothwell, Thadeu Lima de Souza
Cascardo, Thiago Jung Bauermann, Tom Lane, Vaibhav Jain, Vladis Dronov,
Wei Yongjun, Wen Xiong, YueHaibing.
* tag 'powerpc-5.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (337 commits)
selftests/powerpc: Fix pkey syscall redefinitions
powerpc: Fix circular dependency between percpu.h and mmu.h
powerpc/powernv/sriov: Fix use of uninitialised variable
selftests/powerpc: Skip vmx/vsx/tar/etc tests on older CPUs
powerpc/40x: Fix assembler warning about r0
powerpc/papr_scm: Add support for fetching nvdimm 'fuel-gauge' metric
powerpc/papr_scm: Fetch nvdimm performance stats from PHYP
cpuidle: pseries: Fixup exit latency for CEDE(0)
cpuidle: pseries: Add function to parse extended CEDE records
cpuidle: pseries: Set the latency-hint before entering CEDE
selftests/powerpc: Fix online CPU selection
powerpc/perf: Consolidate perf_callchain_user_[64|32]()
powerpc/pseries/hotplug-cpu: Remove double free in error path
powerpc/pseries/mobility: Add pr_debug() for device tree changes
powerpc/pseries/mobility: Set pr_fmt()
powerpc/cacheinfo: Warn if cache object chain becomes unordered
powerpc/cacheinfo: Improve diagnostics about malformed cache lists
powerpc/cacheinfo: Use name@unit instead of full DT path in debug messages
powerpc/cacheinfo: Set pr_fmt()
powerpc: fix function annotations to avoid section mismatch warnings with gcc-10
...
x86:
* Report last CPU for debugging
* Emulate smaller MAXPHYADDR in the guest than in the host
* .noinstr and tracing fixes from Thomas
* nested SVM page table switching optimization and fixes
Generic:
* Unify shadow MMU cache data structures across architectures
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAl8pC+oUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroNcOwgAjomqtEqQNlp7DdZT7VyyklzbxX1/
ud7v+oOJ8K4sFlf64lSthjPo3N9rzZCcw+yOXmuyuITngXOGc3tzIwXpCzpLtuQ1
WO1Ql3B/2dCi3lP5OMmsO1UAZqy9pKLg1dfeYUPk48P5+p7d/NPmk+Em5kIYzKm5
JsaHfCp2EEXomwmljNJ8PQ1vTjIQSSzlgYUBZxmCkaaX7zbEUMtxAQCStHmt8B84
33LczwXBm3viSWrzsoBV37I70+tseugiSGsCfUyupXOvq55d6D9FCqtCb45Hn4Vh
Ik8ggKdalsk/reiGEwNw1/3nr6mRMkHSbl+Mhc4waOIFf9dn0urgQgOaDg==
=YVx0
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Paolo Bonzini:
"s390:
- implement diag318
x86:
- Report last CPU for debugging
- Emulate smaller MAXPHYADDR in the guest than in the host
- .noinstr and tracing fixes from Thomas
- nested SVM page table switching optimization and fixes
Generic:
- Unify shadow MMU cache data structures across architectures"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (127 commits)
KVM: SVM: Fix sev_pin_memory() error handling
KVM: LAPIC: Set the TDCR settable bits
KVM: x86: Specify max TDP level via kvm_configure_mmu()
KVM: x86/mmu: Rename max_page_level to max_huge_page_level
KVM: x86: Dynamically calculate TDP level from max level and MAXPHYADDR
KVM: VXM: Remove temporary WARN on expected vs. actual EPTP level mismatch
KVM: x86: Pull the PGD's level from the MMU instead of recalculating it
KVM: VMX: Make vmx_load_mmu_pgd() static
KVM: x86/mmu: Add separate helper for shadow NPT root page role calc
KVM: VMX: Drop a duplicate declaration of construct_eptp()
KVM: nSVM: Correctly set the shadow NPT root level in its MMU role
KVM: Using macros instead of magic values
MIPS: KVM: Fix build error caused by 'kvm_run' cleanup
KVM: nSVM: remove nonsensical EXITINFO1 adjustment on nested NPF
KVM: x86: Add a capability for GUEST_MAXPHYADDR < HOST_MAXPHYADDR support
KVM: VMX: optimize #PF injection when MAXPHYADDR does not match
KVM: VMX: Add guest physical address check in EPT violation and misconfig
KVM: VMX: introduce vmx_need_pf_intercept
KVM: x86: update exception bitmap on CPUID changes
KVM: x86: rename update_bp_intercept to update_exception_bitmap
...
Pull networking updates from David Miller:
1) Support 6Ghz band in ath11k driver, from Rajkumar Manoharan.
2) Support UDP segmentation in code TSO code, from Eric Dumazet.
3) Allow flashing different flash images in cxgb4 driver, from Vishal
Kulkarni.
4) Add drop frames counter and flow status to tc flower offloading,
from Po Liu.
5) Support n-tuple filters in cxgb4, from Vishal Kulkarni.
6) Various new indirect call avoidance, from Eric Dumazet and Brian
Vazquez.
7) Fix BPF verifier failures on 32-bit pointer arithmetic, from
Yonghong Song.
8) Support querying and setting hardware address of a port function via
devlink, use this in mlx5, from Parav Pandit.
9) Support hw ipsec offload on bonding slaves, from Jarod Wilson.
10) Switch qca8k driver over to phylink, from Jonathan McDowell.
11) In bpftool, show list of processes holding BPF FD references to
maps, programs, links, and btf objects. From Andrii Nakryiko.
12) Several conversions over to generic power management, from Vaibhav
Gupta.
13) Add support for SO_KEEPALIVE et al. to bpf_setsockopt(), from Dmitry
Yakunin.
14) Various https url conversions, from Alexander A. Klimov.
15) Timestamping and PHC support for mscc PHY driver, from Antoine
Tenart.
16) Support bpf iterating over tcp and udp sockets, from Yonghong Song.
17) Support 5GBASE-T i40e NICs, from Aleksandr Loktionov.
18) Add kTLS RX HW offload support to mlx5e, from Tariq Toukan.
19) Fix the ->ndo_start_xmit() return type to be netdev_tx_t in several
drivers. From Luc Van Oostenryck.
20) XDP support for xen-netfront, from Denis Kirjanov.
21) Support receive buffer autotuning in MPTCP, from Florian Westphal.
22) Support EF100 chip in sfc driver, from Edward Cree.
23) Add XDP support to mvpp2 driver, from Matteo Croce.
24) Support MPTCP in sock_diag, from Paolo Abeni.
25) Commonize UDP tunnel offloading code by creating udp_tunnel_nic
infrastructure, from Jakub Kicinski.
26) Several pci_ --> dma_ API conversions, from Christophe JAILLET.
27) Add FLOW_ACTION_POLICE support to mlxsw, from Ido Schimmel.
28) Add SK_LOOKUP bpf program type, from Jakub Sitnicki.
29) Refactor a lot of networking socket option handling code in order to
avoid set_fs() calls, from Christoph Hellwig.
30) Add rfc4884 support to icmp code, from Willem de Bruijn.
31) Support TBF offload in dpaa2-eth driver, from Ioana Ciornei.
32) Support XDP_REDIRECT in qede driver, from Alexander Lobakin.
33) Support PCI relaxed ordering in mlx5 driver, from Aya Levin.
34) Support TCP syncookies in MPTCP, from Flowian Westphal.
35) Fix several tricky cases of PMTU handling wrt. briding, from Stefano
Brivio.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2056 commits)
net: thunderx: initialize VF's mailbox mutex before first usage
usb: hso: remove bogus check for EINPROGRESS
usb: hso: no complaint about kmalloc failure
hso: fix bailout in error case of probe
ip_tunnel_core: Fix build for archs without _HAVE_ARCH_IPV6_CSUM
selftests/net: relax cpu affinity requirement in msg_zerocopy test
mptcp: be careful on subflow creation
selftests: rtnetlink: make kci_test_encap() return sub-test result
selftests: rtnetlink: correct the final return value for the test
net: dsa: sja1105: use detected device id instead of DT one on mismatch
tipc: set ub->ifindex for local ipv6 address
ipv6: add ipv6_dev_find()
net: openvswitch: silence suspicious RCU usage warning
Revert "vxlan: fix tos value before xmit"
ptp: only allow phase values lower than 1 period
farsync: switch from 'pci_' to 'dma_' API
wan: wanxl: switch from 'pci_' to 'dma_' API
hv_netvsc: do not use VF device if link is down
dpaa2-eth: Fix passing zero to 'PTR_ERR' warning
net: macb: Properly handle phylink on at91sam9x
...
Here is the large set of char and misc and other driver subsystem
patches for 5.9-rc1. Lots of new driver submissions in here, and
cleanups and features for existing drivers.
Highlights are:
- habanalabs driver updates
- coresight driver updates
- nvmem driver updates
- huge number of "W=1" build warning cleanups from Lee Jones
- dyndbg updates
- virtbox driver fixes and updates
- soundwire driver updates
- mei driver updates
- phy driver updates
- fpga driver updates
- lots of smaller individual misc/char driver cleanups and fixes
Full details are in the shortlog.
All of these have been in linux-next with no reported issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXylccQ8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ymofgCfZ1CxNWd0ZVM0YIn8cY9gO6ON7MsAnRq48hvn
Vjf4rKM73GC11bVF4Gyy
=Xq1R
-----END PGP SIGNATURE-----
Merge tag 'char-misc-5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver updates from Greg KH:
"Here is the large set of char and misc and other driver subsystem
patches for 5.9-rc1. Lots of new driver submissions in here, and
cleanups and features for existing drivers.
Highlights are:
- habanalabs driver updates
- coresight driver updates
- nvmem driver updates
- huge number of "W=1" build warning cleanups from Lee Jones
- dyndbg updates
- virtbox driver fixes and updates
- soundwire driver updates
- mei driver updates
- phy driver updates
- fpga driver updates
- lots of smaller individual misc/char driver cleanups and fixes
Full details are in the shortlog.
All of these have been in linux-next with no reported issues"
* tag 'char-misc-5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (322 commits)
habanalabs: remove unused but set variable 'ctx_asid'
nvmem: qcom-spmi-sdam: Enable multiple devices
dt-bindings: nvmem: SID: add binding for A100's SID controller
nvmem: update Kconfig description
nvmem: qfprom: Add fuse blowing support
dt-bindings: nvmem: Add properties needed for blowing fuses
dt-bindings: nvmem: qfprom: Convert to yaml
nvmem: qfprom: use NVMEM_DEVID_AUTO for multiple instances
nvmem: core: add support to auto devid
nvmem: core: Add nvmem_cell_read_u8()
nvmem: core: Grammar fixes for help text
nvmem: sc27xx: add sc2730 efuse support
nvmem: Enforce nvmem stride in the sysfs interface
MAINTAINERS: Add git tree for NVMEM FRAMEWORK
nvmem: sprd: Fix return value of sprd_efuse_probe()
drivers: android: Fix the SPDX comment style
drivers: android: Fix a variable declaration coding style issue
drivers: android: Remove braces for a single statement if-else block
drivers: android: Remove the use of else after return
drivers: android: Fix a variable declaration coding style issue
...
- Improved selftest coverage, timeouts, and reporting
- Add EPOLLHUP support for SECCOMP_RET_USER_NOTIF (Christian Brauner)
- Refactor __scm_install_fd() into __receive_fd() and fix buggy callers
- Introduce "addfd" command for SECCOMP_RET_USER_NOTIF (Sargun Dhillon)
-----BEGIN PGP SIGNATURE-----
iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAl8oZcQWHGtlZXNjb29r
QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJomDD/4x3j7eXREcXDsHOmlgEaHWGx4l
JldHFQhV5GjmD7gOkPcoZSG7NfG7F6VpwAJg7ZoR3qUkem7K8DFucxqgo1RldCot
nigleeLX6JeMS0Z+iwjAVZd+5t4xG4J/7GGDHIIMiG5qvwJ0Yf64o1bkjaB2Q/Bv
tluBg0WF32kFMG/ZwyY/V2QDbbue97CFPflybOh1o2nWbVzmUlFEEum3UUvZsxc8
smMsattJyuAV7kcEKzKrs8b010NdFZqwdbub5Np9W3XEXGBYMdIPoNsOQGmB9wby
j2ui0lzboXRG997jM7TCd1l/XZAv8aAwvPplw3FJRybzkOGs9NDyLMoz87yJpR1T
xp511vnMyMbyKIGdungkt7cIyzaictHwaYzznsmuNdCPEjTaIQJr1ctsa4GEgtqf
pnkktZ9YbMCcHU0CtZ8GlOVqA9wE+FUm0/u0zgikzJQsB+HcNItiARTTTHRyco7p
VJCqK8o4Zx4ELV7QNkSH4nhFkVgRopvrvBiPAGro/qwGOofBg8W8wM8O1+V/MDmp
zSU22v4SncT1Xb7dtmdJqDEeHfDikhaCAb4Je2hsGQWzbdAqwHGlpa7vpk9x3Q5r
L+XyP+Z+rPHlXYyypJwUvvOQhXOmP0zYxcEHxByqIBfXiwy+3dN4tDDfatWbccwl
uTlTDM8kmQn6QzSztA==
=yb55
-----END PGP SIGNATURE-----
Merge tag 'seccomp-v5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull seccomp updates from Kees Cook:
"There are a bunch of clean ups and selftest improvements along with
two major updates to the SECCOMP_RET_USER_NOTIF filter return:
EPOLLHUP support to more easily detect the death of a monitored
process, and being able to inject fds when intercepting syscalls that
expect an fd-opening side-effect (needed by both container folks and
Chrome). The latter continued the refactoring of __scm_install_fd()
started by Christoph, and in the process found and fixed a handful of
bugs in various callers.
- Improved selftest coverage, timeouts, and reporting
- Add EPOLLHUP support for SECCOMP_RET_USER_NOTIF (Christian Brauner)
- Refactor __scm_install_fd() into __receive_fd() and fix buggy
callers
- Introduce 'addfd' command for SECCOMP_RET_USER_NOTIF (Sargun
Dhillon)"
* tag 'seccomp-v5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (30 commits)
selftests/seccomp: Test SECCOMP_IOCTL_NOTIF_ADDFD
seccomp: Introduce addfd ioctl to seccomp user notifier
fs: Expand __receive_fd() to accept existing fd
pidfd: Replace open-coded receive_fd()
fs: Add receive_fd() wrapper for __receive_fd()
fs: Move __scm_install_fd() to __receive_fd()
net/scm: Regularize compat handling of scm_detach_fds()
pidfd: Add missing sock updates for pidfd_getfd()
net/compat: Add missing sock updates for SCM_RIGHTS
selftests/seccomp: Check ENOSYS under tracing
selftests/seccomp: Refactor to use fixture variants
selftests/harness: Clean up kern-doc for fixtures
seccomp: Use -1 marker for end of mode 1 syscall list
seccomp: Fix ioctl number for SECCOMP_IOCTL_NOTIF_ID_VALID
selftests/seccomp: Rename user_trap_syscall() to user_notif_syscall()
selftests/seccomp: Make kcmp() less required
seccomp: Use pr_fmt
selftests/seccomp: Improve calibration loop
selftests/seccomp: use 90s as timeout
selftests/seccomp: Expand benchmark to per-filter measurements
...
On big-endian architectures like OpenRISC, sparse outputs below warnings on
asm-generic/io.h. This is due to io statements like:
__raw_writel(cpu_to_le32(value), PCI_IOBASE + addr);
The __raw_writel() function expects native endianness, however
cpu_to_le32() returns __le32. On little-endian machines these match up
and there is no issue. However, on big-endian we get warnings, for IO
that is defined as little-endian the mismatch is expected.
The fix I propose is to __force to native endian.
Warnings:
./include/asm-generic/io.h:166:15: warning: cast to restricted __le16
./include/asm-generic/io.h:166:15: warning: cast to restricted __le16
./include/asm-generic/io.h:166:15: warning: cast to restricted __le16
./include/asm-generic/io.h:166:15: warning: cast to restricted __le16
./include/asm-generic/io.h:179:15: warning: cast to restricted __le32
./include/asm-generic/io.h:179:15: warning: cast to restricted __le32
./include/asm-generic/io.h:179:15: warning: cast to restricted __le32
./include/asm-generic/io.h:179:15: warning: cast to restricted __le32
./include/asm-generic/io.h:179:15: warning: cast to restricted __le32
./include/asm-generic/io.h:179:15: warning: cast to restricted __le32
./include/asm-generic/io.h:215:22: warning: incorrect type in argument 1 (different base types)
./include/asm-generic/io.h:215:22: expected unsigned short [usertype] value
./include/asm-generic/io.h:215:22: got restricted __le16 [usertype]
./include/asm-generic/io.h:225:22: warning: incorrect type in argument 1 (different base types)
./include/asm-generic/io.h:225:22: expected unsigned int [usertype] value
./include/asm-generic/io.h:225:22: got restricted __le32 [usertype]
Signed-off-by: Stafford Horne <shorne@gmail.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
- Improve uclamp performance by using a static key for the fast path
- Add the "sched_util_clamp_min_rt_default" sysctl, to optimize for
better power efficiency of RT tasks on battery powered devices.
(The default is to maximize performance & reduce RT latencies.)
- Improve utime and stime tracking accuracy, which had a fixed boundary
of error, which created larger and larger relative errors as the values
become larger. This is now replaced with more precise arithmetics,
using the new mul_u64_u64_div_u64() helper in math64.h.
- Improve the deadline scheduler, such as making it capacity aware
- Improve frequency-invariant scheduling
- Misc cleanups in energy/power aware scheduling
- Add sched_update_nr_running tracepoint to track changes to nr_running
- Documentation additions and updates
- Misc cleanups and smaller fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAl8oJDURHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1ixLg//bqWzFlfWirvngTgDxDnplwUTyKXmMCcq
R1IYhlyK2O5FxvhbRmdmW11W3yzyTPvgCs6Q/70negGaPNe2w1OxfxiK9NMKz5eu
M1LoXas7pL5g7Pr/ZxxHk/8VqJLV4t9MkodiiInmV6lTaznT3sU6a/kpYQjJyFnG
Tuu9jd6JhdRKmePDJnNmUBoGQ7JiOQDcX4HtkcQ3OA+An3624tmJzbW1yts+uj7J
ZWo2EY60RfbA9MxQXGPOaR/nAjngWs4Q6tddAh10mftsPq1gR2iFUKju1d31MQt/
RHLdiqJf+AyUC4popKG7a+7ilCKMBwPociSreTJNPyEUQ1X4AM3vUVk4yjUoiDph
k2WdsCF8/JRdhXg0NnrpPUqOaAbQj53EeXnitEb92E7WyTZgLOvAtpV//xZo6utp
2QHerfrQ9SoGQjz/ho78za5vQtV1x25yDhd+X4XV4QEhIy85G9/2JCpC/Kc/TXLf
OO7A4X69XztKTEJhP60g8ldCPUe4N2vbh1vKY6oAD8AFQVVNZ6n7375/Qa//b0/k
++hcYkPc2EK97/aBFdvzDgqb7aUo7Mtn2ibke16sQU4szulaoRuAHQG4jdGKMwbD
dk2VBoxyxeYFXWHsNneSe87+ha3sd0dSN0ul1EB/SlFrVELMvy634YXnMYGW8ima
PzyPB0ezpuA=
=PbO7
-----END PGP SIGNATURE-----
Merge tag 'sched-core-2020-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
- Improve uclamp performance by using a static key for the fast path
- Add the "sched_util_clamp_min_rt_default" sysctl, to optimize for
better power efficiency of RT tasks on battery powered devices.
(The default is to maximize performance & reduce RT latencies.)
- Improve utime and stime tracking accuracy, which had a fixed boundary
of error, which created larger and larger relative errors as the
values become larger. This is now replaced with more precise
arithmetics, using the new mul_u64_u64_div_u64() helper in math64.h.
- Improve the deadline scheduler, such as making it capacity aware
- Improve frequency-invariant scheduling
- Misc cleanups in energy/power aware scheduling
- Add sched_update_nr_running tracepoint to track changes to nr_running
- Documentation additions and updates
- Misc cleanups and smaller fixes
* tag 'sched-core-2020-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (54 commits)
sched/doc: Factorize bits between sched-energy.rst & sched-capacity.rst
sched/doc: Document capacity aware scheduling
sched: Document arch_scale_*_capacity()
arm, arm64: Fix selection of CONFIG_SCHED_THERMAL_PRESSURE
Documentation/sysctl: Document uclamp sysctl knobs
sched/uclamp: Add a new sysctl to control RT default boost value
sched/uclamp: Fix a deadlock when enabling uclamp static key
sched: Remove duplicated tick_nohz_full_enabled() check
sched: Fix a typo in a comment
sched/uclamp: Remove unnecessary mutex_init()
arm, arm64: Select CONFIG_SCHED_THERMAL_PRESSURE
sched: Cleanup SCHED_THERMAL_PRESSURE kconfig entry
arch_topology, sched/core: Cleanup thermal pressure definition
trace/events/sched.h: fix duplicated word
linux/sched/mm.h: drop duplicated words in comments
smp: Fix a potential usage of stale nr_cpus
sched/fair: update_pick_idlest() Select group with lowest group_util when idle_cpus are equal
sched: nohz: stop passing around unused "ticks" parameter.
sched: Better document ttwu()
sched: Add a tracepoint to track rq->nr_running
...
- LKMM updates: mostly documentation changes, but also some new litmus tests for atomic ops.
- KCSAN updates: the most important change is that GCC 11 now has all fixes in place
to support KCSAN, so GCC support can be enabled again. Also more annotations.
- futex updates: minor cleanups and simplifications
- seqlock updates: merge preparatory changes/cleanups for the 'associated locks' facilities.
- lockdep updates:
- simplify IRQ trace event handling
- add various new debug checks
- simplify header dependencies, split out <linux/lockdep_types.h>, decouple
lockdep from other low level headers some more
- fix NMI handling
- misc cleanups and smaller fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAl8n9/wRHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1hZFQ//dD+AKw9Nym+WbylovmeD0qxWxPyeN/jG
vBVDTOJIJLtZTkZf6YHcYOJlPwaMDYUQluqTPQhsaQZy/NoEb5NM2cFAj2R9gjyT
O8665T1dvhW9Sh353mBpuwviqdrnvCeHTBEcglSlFY7hxToYAflUN0+DXGVtNys8
PFNf3L9SHT0GLVC8+di/eJzQaRqxiB0Pq7kvh2RvPJM/dcQNA9Ho3CCNO5j6qGoY
u7OnMT8xJXkgbdjjUO4RO0v9VjMuNthZ2JiONDgvgKtJfIL2wt5YXIv1EYX0GuWp
WZgIzE4o1G7GJOOzKpFfZFyK8grHu2fWgK1plvodWjlLkBmltJZ1qyOM+wngd/m2
TgtPo73/YFbxFUbbBpkb0eiIaH2t99kMvfCWd05+GiPCtzn9UL9GfFRWd42vonwc
sQWjFrHKlnuzifUfNcLmKg7R2nUtF3Dm/SydiTJ+9NtH/QA17YJKWnlE1moulNtQ
p7H7+8UdcvSQ7F38A74v2IYNIyDsv5qcE8ar4QHdaanBBX/LCyD0UlfgsgxEReXf
GDKkpx7LFQlI6Y2YB+dZgkCwhNBl3/OQ3v6hC95B37fA67dAIQyPIWHiHbaM+029
gghqU4GcUcbjSnHPzl9PPL+hi9MyXrMjpb7CBXytg4NI4EE1waHR+0kX14V8ndRj
MkWQOKPUgB0=
=3MTT
-----END PGP SIGNATURE-----
Merge tag 'locking-core-2020-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
- LKMM updates: mostly documentation changes, but also some new litmus
tests for atomic ops.
- KCSAN updates: the most important change is that GCC 11 now has all
fixes in place to support KCSAN, so GCC support can be enabled again.
Also more annotations.
- futex updates: minor cleanups and simplifications
- seqlock updates: merge preparatory changes/cleanups for the
'associated locks' facilities.
- lockdep updates:
- simplify IRQ trace event handling
- add various new debug checks
- simplify header dependencies, split out <linux/lockdep_types.h>,
decouple lockdep from other low level headers some more
- fix NMI handling
- misc cleanups and smaller fixes
* tag 'locking-core-2020-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (60 commits)
kcsan: Improve IRQ state trace reporting
lockdep: Refactor IRQ trace events fields into struct
seqlock: lockdep assert non-preemptibility on seqcount_t write
lockdep: Add preemption enabled/disabled assertion APIs
seqlock: Implement raw_seqcount_begin() in terms of raw_read_seqcount()
seqlock: Add kernel-doc for seqcount_t and seqlock_t APIs
seqlock: Reorder seqcount_t and seqlock_t API definitions
seqlock: seqcount_t latch: End read sections with read_seqcount_retry()
seqlock: Properly format kernel-doc code samples
Documentation: locking: Describe seqlock design and usage
locking/qspinlock: Do not include atomic.h from qspinlock_types.h
locking/atomic: Move ATOMIC_INIT into linux/types.h
lockdep: Move list.h inclusion into lockdep.h
locking/lockdep: Fix TRACE_IRQFLAGS vs. NMIs
futex: Remove unused or redundant includes
futex: Consistently use fshared as boolean
futex: Remove needless goto's
futex: Remove put_futex_key()
rwsem: fix commas in initialisation
docs: locking: Replace HTTP links with HTTPS ones
...
- Removal of the tremendously unpopular read_barrier_depends() barrier,
which is a NOP on all architectures apart from Alpha, in favour of
allowing architectures to override READ_ONCE() and do whatever dance
they need to do to ensure address dependencies provide LOAD ->
LOAD/STORE ordering. This work also offers a potential solution if
compilers are shown to convert LOAD -> LOAD address dependencies into
control dependencies (e.g. under LTO), as weakly ordered architectures
will effectively be able to upgrade READ_ONCE() to smp_load_acquire().
The latter case is not used yet, but will be discussed further at LPC.
- Make the MSI/IOMMU input/output ID translation PCI agnostic, augment
the MSI/IOMMU ACPI/OF ID mapping APIs to accept an input ID
bus-specific parameter and apply the resulting changes to the device
ID space provided by the Freescale FSL bus.
- arm64 support for TLBI range operations and translation table level
hints (part of the ARMv8.4 architecture version).
- Time namespace support for arm64.
- Export the virtual and physical address sizes in vmcoreinfo for
makedumpfile and crash utilities.
- CPU feature handling cleanups and checks for programmer errors
(overlapping bit-fields).
- ACPI updates for arm64: disallow AML accesses to EFI code regions and
kernel memory.
- perf updates for arm64.
- Miscellaneous fixes and cleanups, most notably PLT counting
optimisation for module loading, recordmcount fix to ignore
relocations other than R_AARCH64_CALL26, CMA areas reserved for
gigantic pages on 16K and 64K configurations.
- Trivial typos, duplicate words.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAl8oTcsACgkQa9axLQDI
XvEj6hAAkn39mO5xrR/Vhpg3DyFPk63ZlMSX9SsOeVyaLbovT6stTs1XAZXPpnkt
rV3gwACyGSrqH6+uey9pHgHJuPF2TdrGEVK08yVKo9KGW/6yXSIncdKFE4jUJ/WJ
wF5j7eMET2aGzcpm5AlzMmq6HOrKB8nZac9H8/x6H+Ox2WdgJkEjOkDvyqACUyum
N3FsTZkWj2pIkTXHNgDZ8KjxVLO8HlFaB2hkxFDl9NPlX2UTCQJ8Tg1KiPLafKaK
gUvH4usQDFdb5RU/UWogre37J4emO0ZTApZOyju+U+PMMWlWVHjZ4isUIS9zz/AE
JNZ23dnKZX2HrYa5p8HZx175zwj/vXUqUHCZPLvQXaAudCEhF8BVljPiG0e80FV5
GHFUgUbylKspp01I/9L+2JvsG96Mr0e+P3Sx7L2HTI42cmtoSa14+MpoSRj7zlft
Qcl8hfrVOjCjUnFRHa/1y1cGvnD9GbgnKJR7zgVxl9bD/Jd48r1HUtwRORZCzWFr
mRPVbPS72fWxMzMV9DZYJm02jJY9kLX2BMl49njbB8MhAhzOvrMVzoVVtMMeRFLR
XHeJpmg36W09FiRGe7LRXlkXIhCQzQG2bJfiphuupCfhjRAitPoq8I925G6Pig60
c8RWaXGU7PrEsdMNrL83vekvGKgqrkoFkRVtsCoQ2X6Hvu/XdYI=
=mh79
-----END PGP SIGNATURE-----
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 and cross-arch updates from Catalin Marinas:
"Here's a slightly wider-spread set of updates for 5.9.
Going outside the usual arch/arm64/ area is the removal of
read_barrier_depends() series from Will and the MSI/IOMMU ID
translation series from Lorenzo.
The notable arm64 updates include ARMv8.4 TLBI range operations and
translation level hint, time namespace support, and perf.
Summary:
- Removal of the tremendously unpopular read_barrier_depends()
barrier, which is a NOP on all architectures apart from Alpha, in
favour of allowing architectures to override READ_ONCE() and do
whatever dance they need to do to ensure address dependencies
provide LOAD -> LOAD/STORE ordering.
This work also offers a potential solution if compilers are shown
to convert LOAD -> LOAD address dependencies into control
dependencies (e.g. under LTO), as weakly ordered architectures will
effectively be able to upgrade READ_ONCE() to smp_load_acquire().
The latter case is not used yet, but will be discussed further at
LPC.
- Make the MSI/IOMMU input/output ID translation PCI agnostic,
augment the MSI/IOMMU ACPI/OF ID mapping APIs to accept an input ID
bus-specific parameter and apply the resulting changes to the
device ID space provided by the Freescale FSL bus.
- arm64 support for TLBI range operations and translation table level
hints (part of the ARMv8.4 architecture version).
- Time namespace support for arm64.
- Export the virtual and physical address sizes in vmcoreinfo for
makedumpfile and crash utilities.
- CPU feature handling cleanups and checks for programmer errors
(overlapping bit-fields).
- ACPI updates for arm64: disallow AML accesses to EFI code regions
and kernel memory.
- perf updates for arm64.
- Miscellaneous fixes and cleanups, most notably PLT counting
optimisation for module loading, recordmcount fix to ignore
relocations other than R_AARCH64_CALL26, CMA areas reserved for
gigantic pages on 16K and 64K configurations.
- Trivial typos, duplicate words"
Link: http://lkml.kernel.org/r/20200710165203.31284-1-will@kernel.org
Link: http://lkml.kernel.org/r/20200619082013.13661-1-lorenzo.pieralisi@arm.com
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (82 commits)
arm64: use IRQ_STACK_SIZE instead of THREAD_SIZE for irq stack
arm64/mm: save memory access in check_and_switch_context() fast switch path
arm64: sigcontext.h: delete duplicated word
arm64: ptrace.h: delete duplicated word
arm64: pgtable-hwdef.h: delete duplicated words
bus: fsl-mc: Add ACPI support for fsl-mc
bus/fsl-mc: Refactor the MSI domain creation in the DPRC driver
of/irq: Make of_msi_map_rid() PCI bus agnostic
of/irq: make of_msi_map_get_device_domain() bus agnostic
dt-bindings: arm: fsl: Add msi-map device-tree binding for fsl-mc bus
of/device: Add input id to of_dma_configure()
of/iommu: Make of_map_rid() PCI agnostic
ACPI/IORT: Add an input ID to acpi_dma_configure()
ACPI/IORT: Remove useless PCI bus walk
ACPI/IORT: Make iort_msi_map_rid() PCI agnostic
ACPI/IORT: Make iort_get_device_domain IRQ domain agnostic
ACPI/IORT: Make iort_match_node_callback walk the ACPI namespace for NC
arm64: enable time namespace support
arm64/vdso: Restrict splitting VVAR VMA
arm64/vdso: Handle faults on timens page
...
* for-next/read-barrier-depends:
: Allow architectures to override __READ_ONCE()
arm64: Reduce the number of header files pulled into vmlinux.lds.S
compiler.h: Move compiletime_assert() macros into compiler_types.h
checkpatch: Remove checks relating to [smp_]read_barrier_depends()
include/linux: Remove smp_read_barrier_depends() from comments
tools/memory-model: Remove smp_read_barrier_depends() from informal doc
Documentation/barriers/kokr: Remove references to [smp_]read_barrier_depends()
Documentation/barriers: Remove references to [smp_]read_barrier_depends()
locking/barriers: Remove definitions for [smp_]read_barrier_depends()
alpha: Replace smp_read_barrier_depends() usage with smp_[r]mb()
vhost: Remove redundant use of read_barrier_depends() barrier
asm/rwonce: Don't pull <asm/barrier.h> into 'asm-generic/rwonce.h'
asm/rwonce: Remove smp_read_barrier_depends() invocation
alpha: Override READ_ONCE() with barriered implementation
asm/rwonce: Allow __READ_ONCE to be overridden by the architecture
compiler.h: Split {READ,WRITE}_ONCE definitions out into rwonce.h
tools: bpf: Use local copy of headers including uapi/linux/filter.h
Conflicts:
arch/arm/include/asm/percpu.h
As Stephen Rothwell noted, there's a conflict between this commit
in locking/core:
a21ee6055c ("lockdep: Change hardirq{s_enabled,_context} to per-cpu variables")
and this fresh upstream commit:
aa54ea903a ("ARM: percpu.h: fix build error")
a21ee6055c is a simpler solution to the dependency problem and doesn't
further increase header hell - so this conflict resolution effectively
reverts aa54ea903a and uses the a21ee6055c solution.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This patch breaks a header loop involving qspinlock_types.h.
The issue is that qspinlock_types.h includes atomic.h, which then
eventually includes kernel.h which could lead back to the original
file via spinlock_types.h.
As ATOMIC_INIT is now defined by linux/types.h, there is no longer
any need to include atomic.h from qspinlock_types.h. This also
allows the CONFIG_PARAVIRT hack to be removed since it was trying
to prevent exactly this loop.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Waiman Long <longman@redhat.com>
Link: https://lkml.kernel.org/r/20200729123316.GC7047@gondor.apana.org.au
This patch moves ATOMIC_INIT from asm/atomic.h into linux/types.h.
This allows users of atomic_t to use ATOMIC_INIT without having to
include atomic.h as that way may lead to header loops.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Waiman Long <longman@redhat.com>
Link: https://lkml.kernel.org/r/20200729123105.GB7047@gondor.apana.org.au
This is a single bugfix for a regression introduced through a
typo in the v5.8 merge window, leading to incorrect data
returned from inl() on some architectures.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEo6/YBQwIrVS28WGKmmx57+YAGNkFAl8gGBEACgkQmmx57+YA
GNldaQ//efCqrYB0DYmj0NVuSWnO5ol84agWMjGM6lfBPRetxpVfprM1u8Auwe8Q
GAGsEFn3thMj4DTIWaBGc4iCbXikjWTKQUzzsz1U05+JVf+lXVwmWTWpfZ1wAdnl
UNjeP9m0pvCgbCmfmPo3vjwc6UYlHDcBIXxzagpLEjPqtvQL/SpEgT3GJGoThrqR
bH/l2vJ+ajsw/cIaVd6xwLdasaPBqvl1gelUnkeMvPJddb9K7DKeNbU0N+yz4m5b
9+rb5o3rXdIIEfS1njqCZoHEs7vEoRvGrSggEDXJyHjD/wd2+1C6sEyF7vOJBrmC
gVcFuu3vNLtrHPUo+vI16ZWcS6rpxRJ+5QC7/Lyj3aC8QtbYoGUoHzNLPPYBraUH
tqB/gBqOrh8FDcRs1WqrvXaflhVxWnU9119c98faLI6W1ZthU2/9XH1ZUfF3LjEp
7O3kqrgbzpZp/QrbvPI4hG3xCMh+1whkh4aTw7cCVxu1io/eGjG2bjA2Yutnib3c
0EdBZ58W4bjxO4Os3UB9zlrg4B1wGDZWwu4F5vcJfnU8ghHpcOlSmwutzGVuFQLZ
tJlZLJysFilVxdnkiAIuJzh7uj8AjGWeHvUwwn+X/UQ19p5FwhqHDDIjOJtJuak6
+idjl37eRPzY6/IDoUUZwURpUxFA57nx8AvvBGmjgZ4vswsfF6c=
=XnT4
-----END PGP SIGNATURE-----
Merge tag 'asm-generic-fixes-5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic into master
Pull asm-generic bugfix from Arnd Bergmann:
"A single bugfix for a regression introduced through a typo in the v5.8
merge window, leading to incorrect data returned from inl() on some
architectures"
* tag 'asm-generic-fixes-5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
io: Fix return type of _inb and _inl
The return type of functions _inb, _inw and _inl are all u16 which looks
wrong. This patch makes them u8, u16 and u32 respectively.
The original commit text for these does not indicate that these should
be all forced to u16.
Fixes: f009c89df7 ("io: Provide _inX() and _outX()")
Signed-off-by: Stafford Horne <shorne@gmail.com>
Reviewed-by: John Garry <john.garry@huawei.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
This implements the generic paravirt qspinlocks using H_PROD and
H_CONFER to kick and wait.
This uses an un-directed yield to any CPU rather than the directed
yield to a pre-empted lock holder that paravirtualised simple
spinlocks use, that requires no kick hcall. This is something that
could be investigated and improved in future.
Performance results can be found in the commit which added queued
spinlocks.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Waiman Long <longman@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200724131423.1362108-5-npiggin@gmail.com
These have shown significantly improved performance and fairness when
spinlock contention is moderate to high on very large systems.
With this series including subsequent patches, on a 16 socket 1536
thread POWER9, a stress test such as same-file open/close from all
CPUs gets big speedups, 11620op/s aggregate with simple spinlocks vs
384158op/s (33x faster), where the difference in throughput between
the fastest and slowest thread goes from 7x to 1.4x.
Thanks to the fast path being identical in terms of atomics and
barriers (after a subsequent optimisation patch), single threaded
performance is not changed (no measurable difference).
On smaller systems, performance and fairness seems to be generally
improved. Using dbench on tmpfs as a test (that starts to run into
kernel spinlock contention), a 2-socket OpenPOWER POWER9 system was
tested with bare metal and KVM guest configurations. Results can be
found here:
https://github.com/linuxppc/issues/issues/305#issuecomment-663487453
Observations are:
- Queued spinlocks are equal when contention is insignificant, as
expected and as measured with microbenchmarks.
- When there is contention, on bare metal queued spinlocks have better
throughput and max latency at all points.
- When virtualised, queued spinlocks are slightly worse approaching
peak throughput, but significantly better throughput and max latency
at all points beyond peak, until queued spinlock maximum latency
rises when clients are 2x vCPUs.
The regressions haven't been analysed very well yet, there are a lot
of things that can be tuned, particularly the paravirtualised locking,
but the numbers already look like a good net win even on relatively
small systems.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Waiman Long <longman@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200724131423.1362108-4-npiggin@gmail.com
The UDP reuseport conflict was a little bit tricky.
The net-next code, via bpf-next, extracted the reuseport handling
into a helper so that the BPF sk lookup code could invoke it.
At the same time, the logic for reuseport handling of unconnected
sockets changed via commit efc6b6f6c3
which changed the logic to carry on the reuseport result into the
rest of the lookup loop if we do not return immediately.
This requires moving the reuseport_has_conns() logic into the callers.
While we are here, get rid of inline directives as they do not belong
in foo.c files.
The other changes were cases of more straightforward overlapping
modifications.
Signed-off-by: David S. Miller <davem@davemloft.net>
dyndbg populates its callsite info into __verbose section, change that
to a more specific and descriptive name, __dyndbg.
Also, per checkpatch:
simplify __attribute(..) to __section(__dyndbg) declaration.
and 1 spelling fix, decriptor
Acked-by: <jbaron@akamai.com>
Signed-off-by: Jim Cromie <jim.cromie@gmail.com>
Link: https://lore.kernel.org/r/20200719231058.1586423-6-jim.cromie@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Linus pointed out that compiler.h - which is a key header that gets included in every
single one of the 28,000+ kernel files during a kernel build - was bloated in:
6553896666: ("vmlinux.lds.h: Create section for protection against instrumentation")
Linus noted:
> I have pulled this, but do we really want to add this to a header file
> that is _so_ core that it gets included for basically every single
> file built?
>
> I don't even see those instrumentation_begin/end() things used
> anywhere right now.
>
> It seems excessive. That 53 lines is maybe not a lot, but it pushed
> that header file to over 12kB, and while it's mostly comments, it's
> extra IO and parsing basically for _every_ single file compiled in the
> kernel.
>
> For what appears to be absolutely zero upside right now, and I really
> don't see why this should be in such a core header file!
Move these primitives into a new header: <linux/instrumentation.h>, and include that
header in the headers that make use of it.
Unfortunately one of these headers is asm-generic/bug.h, which does get included
in a lot of places, similarly to compiler.h. So the de-bloating effect isn't as
good as we'd like it to be - but at least the interfaces are defined separately.
No change to functionality intended.
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200604071921.GA1361070@gmail.com
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Peter Zijlstra <peterz@infradead.org>
On x86-32 the idt_table with 256 entries needs only 2048 bytes. It is
page-aligned, but the end of the .bss..page_aligned section is not
guaranteed to be page-aligned.
As a result, objects from other .bss sections may end up on the same 4k
page as the idt_table, and will accidentially get mapped read-only during
boot, causing unexpected page-faults when the kernel writes to them.
This could be worked around by making the objects in the page aligned
sections page sized, but that's wrong.
Explicit sections which store only page aligned objects have an implicit
guarantee that the object is alone in the page in which it is placed. That
works for all objects except the last one. That's inconsistent.
Enforcing page sized objects for these sections would wreckage memory
sanitizers, because the object becomes artificially larger than it should
be and out of bound access becomes legit.
Align the end of the .bss..page_aligned and .data..page_aligned section on
page-size so all objects places in these sections are guaranteed to have
their own page.
[ tglx: Amended changelog ]
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20200721093448.10417-1-joro@8bytes.org
There are no remaining users of [smp_]read_barrier_depends(), so
remove it from the generic implementation of 'barrier.h'.
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Now that 'smp_read_barrier_depends()' has gone the way of the Norwegian
Blue, drop the inclusion of <asm/barrier.h> in 'asm-generic/rwonce.h'.
This requires fixups to some architecture vdso headers which were
previously relying on 'asm/barrier.h' coming in via 'linux/compiler.h'.
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Will Deacon <will@kernel.org>
Alpha overrides __READ_ONCE() directly, so there's no need to use
smp_read_barrier_depends() in the core code. This also means that
__READ_ONCE() can be relied upon to provide dependency ordering.
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
The meat and potatoes of READ_ONCE() is defined by the __READ_ONCE()
macro, which uses a volatile casts in an attempt to avoid tearing of
byte, halfword, word and double-word accesses. Allow this to be
overridden by the architecture code in the case that things like memory
barriers are also required.
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
In preparation for allowing architectures to define their own
implementation of the READ_ONCE() macro, move the generic
{READ,WRITE}_ONCE() definitions out of the unwieldy 'linux/compiler.h'
file and into a new 'rwonce.h' header under 'asm-generic'.
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Most of the tree only uses and implements csum_partial_copy_nocheck,
but the c6x and lib/checksum.c implement a csum_partial_copy that
isn't used anywere except to define csum_partial_copy. Get rid of
this pointless alias.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
- Fix kernel text addresses for relocatable images booting using EFI
and with KASLR disabled so that they match the vmlinux ELF binary.
- Fix unloading and unbinding of PMU driver modules.
- Fix generic mmiowb() when writeX() is called from preemptible context
(reported by the riscv folks).
- Fix ptrace hardware single-step interactions with signal handlers,
system calls and reverse debugging.
- Fix reporting of 64-bit x0 register for 32-bit tasks via 'perf_regs'.
- Add comments describing syscall entry/exit tracing ABI.
-----BEGIN PGP SIGNATURE-----
iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAl8RgvsQHHdpbGxAa2Vy
bmVsLm9yZwAKCRC3rHDchMFjNKNcB/9wsRJDxQDsCbV83xn5LrpR2qCs6G1UkVWT
7peEQ21Brh60DamHlr9FdwPrIO/C62tQItU/hjCyk5oXZP3soW4J5vAXujP8wPrL
bPe5933HuYkgRnnInCcrACmOnIacO9HGns8OoOKtSdZ6HCaKarL9V4hOfzWVSn7L
RicX+xdn89lzZ+AD2MXYq1Q6mLcpKWx9wa0PSiYL+rGjsUqhwHvJcsYcSMp95/Ay
ZSK27jmxjjTXNW56hE/svz4dzkBvL+8ezwodhjZtz2co8PdGhH2Azbq3QtHeICy+
JB7lSx8A1sYIF3ASAhDYglCOCNlTb1dDN5LYfRwMWZ8cQfnRVdeV
=o4Ve
-----END PGP SIGNATURE-----
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux into master
Pull arm64 fixes from Will Deacon:
"A batch of arm64 fixes.
Although the diffstat is a bit larger than we'd usually have at this
stage, a decent amount of it is the addition of comments describing
our syscall tracing behaviour, and also a sweep across all the modular
arm64 PMU drivers to make them rebust against unloading and unbinding.
There are a couple of minor things kicking around at the moment (CPU
errata and module PLTs for very large modules), but I'm not expecting
any significant changes now for us in 5.8.
- Fix kernel text addresses for relocatable images booting using EFI
and with KASLR disabled so that they match the vmlinux ELF binary.
- Fix unloading and unbinding of PMU driver modules.
- Fix generic mmiowb() when writeX() is called from preemptible
context (reported by the riscv folks).
- Fix ptrace hardware single-step interactions with signal handlers,
system calls and reverse debugging.
- Fix reporting of 64-bit x0 register for 32-bit tasks via
'perf_regs'.
- Add comments describing syscall entry/exit tracing ABI"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
drivers/perf: Prevent forced unbinding of PMU drivers
asm-generic/mmiowb: Allow mmiowb_set_pending() when preemptible()
arm64: Use test_tsk_thread_flag() for checking TIF_SINGLESTEP
arm64: ptrace: Use NO_SYSCALL instead of -1 in syscall_trace_enter()
arm64: syscall: Expand the comment about ptrace and syscall(-1)
arm64: ptrace: Add a comment describing our syscall entry/exit trap ABI
arm64: compat: Ensure upper 32 bits of x0 are zero on syscall return
arm64: ptrace: Override SPSR.SS when single-stepping is enabled
arm64: ptrace: Consistently use pseudo-singlestep exceptions
drivers/perf: Fix kernel panic when rmmod PMU modules during perf sampling
efi/libstub/arm64: Retain 2MB kernel Image alignment if !KASLR
Although mmiowb() is concerned only with serialising MMIO writes occuring
in contexts where a spinlock is held, the call to mmiowb_set_pending()
from the MMIO write accessors can occur in preemptible contexts, such
as during driver probe() functions where ordering between CPUs is not
usually a concern, assuming that the task migration path provides the
necessary ordering guarantees.
Unfortunately, the default implementation of mmiowb_set_pending() is not
preempt-safe, as it makes use of a a per-cpu variable to track its
internal state. This has been reported to generate the following splat
on riscv:
| BUG: using smp_processor_id() in preemptible [00000000] code: swapper/0/1
| caller is regmap_mmio_write32le+0x1c/0x46
| CPU: 3 PID: 1 Comm: swapper/0 Not tainted 5.8.0-rc3-hfu+ #1
| Call Trace:
| walk_stackframe+0x0/0x7a
| dump_stack+0x6e/0x88
| regmap_mmio_write32le+0x18/0x46
| check_preemption_disabled+0xa4/0xaa
| regmap_mmio_write32le+0x18/0x46
| regmap_mmio_write+0x26/0x44
| regmap_write+0x28/0x48
| sifive_gpio_probe+0xc0/0x1da
Although it's possible to fix the driver in this case, other splats have
been seen from other drivers, including the infamous 8250 UART, and so
it's better to address this problem in the mmiowb core itself.
Fix mmiowb_set_pending() by using the raw_cpu_ptr() to get at the mmiowb
state and then only updating the 'mmiowb_pending' field if we are not
preemptible (i.e. we have a non-zero nesting count).
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Guo Ren <guoren@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Reported-by: Palmer Dabbelt <palmer@dabbelt.com>
Reported-by: Emil Renner Berthing <kernel@esmil.dk>
Tested-by: Emil Renner Berthing <kernel@esmil.dk>
Reviewed-by: Palmer Dabbelt <palmerdabbelt@google.com>
Acked-by: Palmer Dabbelt <palmerdabbelt@google.com>
Link: https://lore.kernel.org/r/20200716112816.7356-1-will@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
Architectures like ppc64 provide persistent memory specific barriers
that will ensure that all stores for which the modifications are
written to persistent storage by preceding dcbfps and dcbstps
instructions have updated persistent storage before any data
access or data transfer caused by subsequent instructions is initiated.
This is in addition to the ordering done by wmb()
Update nvdimm core such that architecture can use barriers other than
wmb to ensure all previous writes are architecturally visible for
the platform buffer flush.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200701072235.223558-5-aneesh.kumar@linux.ibm.com
Alexei Starovoitov says:
====================
pull-request: bpf-next 2020-07-13
The following pull-request contains BPF updates for your *net-next* tree.
We've added 36 non-merge commits during the last 7 day(s) which contain
a total of 62 files changed, 2242 insertions(+), 468 deletions(-).
The main changes are:
1) Avoid trace_printk warning banner by switching bpf_trace_printk to use
its own tracing event, from Alan.
2) Better libbpf support on older kernels, from Andrii.
3) Additional AF_XDP stats, from Ciara.
4) build time resolution of BTF IDs, from Jiri.
5) BPF_CGROUP_INET_SOCK_RELEASE hook, from Stanislav.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Adding support to generate .BTF_ids section that will hold BTF
ID lists for verifier.
Adding macros that will help to define lists of BTF ID values
placed in .BTF_ids section. They are initially filled with zeros
(during compilation) and resolved later during the linking phase
by resolve_btfids tool.
Following defines list of one BTF ID value:
BTF_ID_LIST(bpf_skb_output_btf_ids)
BTF_ID(struct, sk_buff)
It also defines following variable to access the list:
extern u32 bpf_skb_output_btf_ids[];
The BTF_ID_UNUSED macro defines 4 zero bytes. It's used when we
want to define 'unused' entry in BTF_ID_LIST, like:
BTF_ID_LIST(bpf_skb_output_btf_ids)
BTF_ID(struct, sk_buff)
BTF_ID_UNUSED
BTF_ID(struct, task_struct)
Suggested-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200711215329.41165-4-jolsa@kernel.org
The terminator for the mode 1 syscalls list was a 0, but that could be
a valid syscall number (e.g. x86_64 __NR_read). By luck, __NR_read was
listed first and the loop construct would not test it, so there was no
bug. However, this is fragile. Replace the terminator with -1 instead,
and make the variable name for mode 1 syscall lists more descriptive.
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Will Drewry <wad@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Move x86's 'struct kvm_mmu_memory_cache' to common code in anticipation
of moving the entire x86 implementation code to common KVM and reusing
it for arm64 and MIPS. Add a new architecture specific asm/kvm_types.h
to control the existence and parameters of the struct. The new header
is needed to avoid a chicken-and-egg problem with asm/kvm_host.h as all
architectures define instances of the struct in their vCPU structs.
Add an asm-generic version of kvm_types.h to avoid having empty files on
PPC and s390 in the long term, and for arm64 and mips in the short term.
Suggested-by: Christoffer Dall <christoffer.dall@arm.com>
Reviewed-by: Ben Gardon <bgardon@google.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200703023545.8771-15-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
For some mysterious reason GCC-4.9 has a 64 byte section alignment for
structures, all other GCC versions (and Clang) tested (including 4.8
and 5.0) are fine with the 32 bytes alignment.
Getting this right is important for the new SCHED_DATA macro that
creates an explicitly ordered array of 'struct sched_class' in the
linker script and expect pointer arithmetic to work.
Fixes: c3a340f7e7 ("sched: Have sched_class_highest define by vmlinux.lds.h")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200630144905.GX4817@hirez.programming.kicks-ass.net
tlb_flush_{pte|pmd|pud|p4d}_range() adjust the tlb->start and
tlb->end, then set corresponding cleared_*.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Zhenyu Ye <yezhenyu2@huawei.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20200625080314.230-5-yezhenyu2@huawei.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Some s390 builds get these warnings:
include/asm-generic/cacheflush.h:16:42: warning: 'struct mm_struct' declared inside parameter list will not be visible outside of this definition or declaration
include/asm-generic/cacheflush.h:22:46: warning: 'struct mm_struct' declared inside parameter list will not be visible outside of this definition or declaration
include/asm-generic/cacheflush.h:28:45: warning: 'struct vm_area_struct' declared inside parameter list will not be visible outside of this definition or declaration
include/asm-generic/cacheflush.h:36:44: warning: 'struct vm_area_struct' declared inside parameter list will not be visible outside of this definition or declaration
include/asm-generic/cacheflush.h:44:45: warning: 'struct page' declared inside parameter list will not be visible outside of this definition or declaration
include/asm-generic/cacheflush.h:52:50: warning: 'struct address_space' declared inside parameter list will not be visible outside of this definition or declaration
include/asm-generic/cacheflush.h:58:52: warning: 'struct address_space' declared inside parameter list will not be visible outside of this definition or declaration
include/asm-generic/cacheflush.h:75:17: warning: 'struct page' declared inside parameter list will not be visible outside of this definition or declaration
include/asm-generic/cacheflush.h:74:45: warning: 'struct vm_area_struct' declared inside parameter list will not be visible outside of this definition or declaration
include/asm-generic/cacheflush.h:82:16: warning: 'struct page' declared inside parameter list will not be visible outside of this definition or declaration
include/asm-generic/cacheflush.h:81:50: warning: 'struct vm_area_struct' declared inside parameter list will not be visible outside of this definition or declaration
Forward declare the named structs to get rid of these.
Link: http://lkml.kernel.org/r/20200623135714.4dae4b8a@canb.auug.org.au
Fixes: e0cf615d72 ("asm-generic: don't include <linux/mm.h> in cacheflush.h")
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Now that the sched_class descriptors are defined by the linker script, and
this needs to be aware of the existance of stop_sched_class when SMP is
enabled or not, as it is used as the "highest" priority when defined. Move
the declaration of sched_class_highest to the same location in the linker
script that inserts stop_sched_class, and this will also make it easier to
see what should be defined as the highest class, as this linker script
location defines the priorities as well.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20191219214558.682913590@goodmis.org
In order to make a micro optimization in pick_next_task(), the order of the
sched class descriptor address must be in the same order as their priority
to each other. That is:
&idle_sched_class < &fair_sched_class < &rt_sched_class <
&dl_sched_class < &stop_sched_class
In order to guarantee this order of the sched class descriptors, add each
one into their own data section and force the order in the linker script.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/157675913272.349305.8936736338884044103.stgit@localhost.localdomain
Since commit 9e343b467c ("READ_ONCE: Enforce atomicity for
{READ,WRITE}_ONCE() memory accesses") it is not possible anymore to
use READ_ONCE() to access complex page table entries like the one
defined for powerpc 8xx with 16k size pages.
Define a ptep_get() helper that architectures can override instead
of performing a READ_ONCE() on the page table entry pointer.
Fixes: 9e343b467c ("READ_ONCE: Enforce atomicity for {READ,WRITE}_ONCE() memory accesses")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/087fa12b6e920e32315136b998aa834f99242695.1592225558.git.christophe.leroy@csgroup.eu
This all started about 6 month ago with the attempt to move the Posix CPU
timer heavy lifting out of the timer interrupt code and just have lockless
quick checks in that code path. Trivial 5 patches.
This unearthed an inconsistency in the KVM handling of task work and the
review requested to move all of this into generic code so other
architectures can share.
Valid request and solved with another 25 patches but those unearthed
inconsistencies vs. RCU and instrumentation.
Digging into this made it obvious that there are quite some inconsistencies
vs. instrumentation in general. The int3 text poke handling in particular
was completely unprotected and with the batched update of trace events even
more likely to expose to endless int3 recursion.
In parallel the RCU implications of instrumenting fragile entry code came
up in several discussions.
The conclusion of the X86 maintainer team was to go all the way and make
the protection against any form of instrumentation of fragile and dangerous
code pathes enforcable and verifiable by tooling.
A first batch of preparatory work hit mainline with commit d5f744f9a2.
The (almost) full solution introduced a new code section '.noinstr.text'
into which all code which needs to be protected from instrumentation of all
sorts goes into. Any call into instrumentable code out of this section has
to be annotated. objtool has support to validate this. Kprobes now excludes
this section fully which also prevents BPF from fiddling with it and all
'noinstr' annotated functions also keep ftrace off. The section, kprobes
and objtool changes are already merged.
The major changes coming with this are:
- Preparatory cleanups
- Annotating of relevant functions to move them into the noinstr.text
section or enforcing inlining by marking them __always_inline so the
compiler cannot misplace or instrument them.
- Splitting and simplifying the idtentry macro maze so that it is now
clearly separated into simple exception entries and the more
interesting ones which use interrupt stacks and have the paranoid
handling vs. CR3 and GS.
- Move quite some of the low level ASM functionality into C code:
- enter_from and exit to user space handling. The ASM code now calls
into C after doing the really necessary ASM handling and the return
path goes back out without bells and whistels in ASM.
- exception entry/exit got the equivivalent treatment
- move all IRQ tracepoints from ASM to C so they can be placed as
appropriate which is especially important for the int3 recursion
issue.
- Consolidate the declaration and definition of entry points between 32
and 64 bit. They share a common header and macros now.
- Remove the extra device interrupt entry maze and just use the regular
exception entry code.
- All ASM entry points except NMI are now generated from the shared header
file and the corresponding macros in the 32 and 64 bit entry ASM.
- The C code entry points are consolidated as well with the help of
DEFINE_IDTENTRY*() macros. This allows to ensure at one central point
that all corresponding entry points share the same semantics. The
actual function body for most entry points is in an instrumentable
and sane state.
There are special macros for the more sensitive entry points,
e.g. INT3 and of course the nasty paranoid #NMI, #MCE, #DB and #DF.
They allow to put the whole entry instrumentation and RCU handling
into safe places instead of the previous pray that it is correct
approach.
- The INT3 text poke handling is now completely isolated and the
recursion issue banned. Aside of the entry rework this required other
isolation work, e.g. the ability to force inline bsearch.
- Prevent #DB on fragile entry code, entry relevant memory and disable
it on NMI, #MC entry, which allowed to get rid of the nested #DB IST
stack shifting hackery.
- A few other cleanups and enhancements which have been made possible
through this and already merged changes, e.g. consolidating and
further restricting the IDT code so the IDT table becomes RO after
init which removes yet another popular attack vector
- About 680 lines of ASM maze are gone.
There are a few open issues:
- An escape out of the noinstr section in the MCE handler which needs
some more thought but under the aspect that MCE is a complete
trainwreck by design and the propability to survive it is low, this was
not high on the priority list.
- Paravirtualization
When PV is enabled then objtool complains about a bunch of indirect
calls out of the noinstr section. There are a few straight forward
ways to fix this, but the other issues vs. general correctness were
more pressing than parawitz.
- KVM
KVM is inconsistent as well. Patches have been posted, but they have
not yet been commented on or picked up by the KVM folks.
- IDLE
Pretty much the same problems can be found in the low level idle code
especially the parts where RCU stopped watching. This was beyond the
scope of the more obvious and exposable problems and is on the todo
list.
The lesson learned from this brain melting exercise to morph the evolved
code base into something which can be validated and understood is that once
again the violation of the most important engineering principle
"correctness first" has caused quite a few people to spend valuable time on
problems which could have been avoided in the first place. The "features
first" tinkering mindset really has to stop.
With that I want to say thanks to everyone involved in contributing to this
effort. Special thanks go to the following people (alphabetical order):
Alexandre Chartre
Andy Lutomirski
Borislav Petkov
Brian Gerst
Frederic Weisbecker
Josh Poimboeuf
Juergen Gross
Lai Jiangshan
Macro Elver
Paolo Bonzini
Paul McKenney
Peter Zijlstra
Vitaly Kuznetsov
Will Deacon
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl7j510THHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoU2WD/4refvaNm08fG7aiVYem3JJzr0+Pq5O
/opwnI/1D973ApApj5W/Nd53sN5tVqOiXncSKgywRBWZxRCAGjVYypl9rjpvXu4l
HlMjhEKBmWkDryxxrM98Vr7hl3hnId5laR56oFfH+G4LUsItaV6Uak/HfXZ4Mq1k
iYVbEtl2CN+KJjvSgZ6Y1l853Ab5mmGvmeGNHHWCj8ZyjF3cOLoelDTQNnsb0wXM
crKXBcXJSsCWKYyJ5PTvB82crQCET7Su+LgwK06w/ZbW1//2hVIjSCiN5o/V+aRJ
06BZNMj8v9tfglkN8LEQvRIjTlnEQ2sq3GxbrVtA53zxkzbBCBJQ96w8yYzQX0ux
yhqQ/aIZJ1wTYEjJzSkftwLNMRHpaOUnKvJndXRKAYi+eGI7syF61qcZSYGKuAQ/
bK3b/CzU6QWr1235oTADxh4isEwxA0Pg5wtJCfDDOG0MJ9ALMSOGUkhoiz5EqpkU
mzFAwfG/Uj7hRjlkms7Yj2OjZfnU7iypj63GgpXghLjr5ksRFKEOMw8e1GXltVHs
zzwghUjqp2EPq0VOOQn3lp9lol5Prc3xfFHczKpO+CJW6Rpa4YVdqJmejBqJy/on
Hh/T/ST3wa2qBeAw89vZIeWiUJZZCsQ0f//+2hAbzJY45Y6DuR9vbTAPb9agRgOM
xg+YaCfpQqFc1A==
=llba
-----END PGP SIGNATURE-----
Merge tag 'x86-entry-2020-06-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 entry updates from Thomas Gleixner:
"The x86 entry, exception and interrupt code rework
This all started about 6 month ago with the attempt to move the Posix
CPU timer heavy lifting out of the timer interrupt code and just have
lockless quick checks in that code path. Trivial 5 patches.
This unearthed an inconsistency in the KVM handling of task work and
the review requested to move all of this into generic code so other
architectures can share.
Valid request and solved with another 25 patches but those unearthed
inconsistencies vs. RCU and instrumentation.
Digging into this made it obvious that there are quite some
inconsistencies vs. instrumentation in general. The int3 text poke
handling in particular was completely unprotected and with the batched
update of trace events even more likely to expose to endless int3
recursion.
In parallel the RCU implications of instrumenting fragile entry code
came up in several discussions.
The conclusion of the x86 maintainer team was to go all the way and
make the protection against any form of instrumentation of fragile and
dangerous code pathes enforcable and verifiable by tooling.
A first batch of preparatory work hit mainline with commit
d5f744f9a2 ("Pull x86 entry code updates from Thomas Gleixner")
That (almost) full solution introduced a new code section
'.noinstr.text' into which all code which needs to be protected from
instrumentation of all sorts goes into. Any call into instrumentable
code out of this section has to be annotated. objtool has support to
validate this.
Kprobes now excludes this section fully which also prevents BPF from
fiddling with it and all 'noinstr' annotated functions also keep
ftrace off. The section, kprobes and objtool changes are already
merged.
The major changes coming with this are:
- Preparatory cleanups
- Annotating of relevant functions to move them into the
noinstr.text section or enforcing inlining by marking them
__always_inline so the compiler cannot misplace or instrument
them.
- Splitting and simplifying the idtentry macro maze so that it is
now clearly separated into simple exception entries and the more
interesting ones which use interrupt stacks and have the paranoid
handling vs. CR3 and GS.
- Move quite some of the low level ASM functionality into C code:
- enter_from and exit to user space handling. The ASM code now
calls into C after doing the really necessary ASM handling and
the return path goes back out without bells and whistels in
ASM.
- exception entry/exit got the equivivalent treatment
- move all IRQ tracepoints from ASM to C so they can be placed as
appropriate which is especially important for the int3
recursion issue.
- Consolidate the declaration and definition of entry points between
32 and 64 bit. They share a common header and macros now.
- Remove the extra device interrupt entry maze and just use the
regular exception entry code.
- All ASM entry points except NMI are now generated from the shared
header file and the corresponding macros in the 32 and 64 bit
entry ASM.
- The C code entry points are consolidated as well with the help of
DEFINE_IDTENTRY*() macros. This allows to ensure at one central
point that all corresponding entry points share the same
semantics. The actual function body for most entry points is in an
instrumentable and sane state.
There are special macros for the more sensitive entry points, e.g.
INT3 and of course the nasty paranoid #NMI, #MCE, #DB and #DF.
They allow to put the whole entry instrumentation and RCU handling
into safe places instead of the previous pray that it is correct
approach.
- The INT3 text poke handling is now completely isolated and the
recursion issue banned. Aside of the entry rework this required
other isolation work, e.g. the ability to force inline bsearch.
- Prevent #DB on fragile entry code, entry relevant memory and
disable it on NMI, #MC entry, which allowed to get rid of the
nested #DB IST stack shifting hackery.
- A few other cleanups and enhancements which have been made
possible through this and already merged changes, e.g.
consolidating and further restricting the IDT code so the IDT
table becomes RO after init which removes yet another popular
attack vector
- About 680 lines of ASM maze are gone.
There are a few open issues:
- An escape out of the noinstr section in the MCE handler which needs
some more thought but under the aspect that MCE is a complete
trainwreck by design and the propability to survive it is low, this
was not high on the priority list.
- Paravirtualization
When PV is enabled then objtool complains about a bunch of indirect
calls out of the noinstr section. There are a few straight forward
ways to fix this, but the other issues vs. general correctness were
more pressing than parawitz.
- KVM
KVM is inconsistent as well. Patches have been posted, but they
have not yet been commented on or picked up by the KVM folks.
- IDLE
Pretty much the same problems can be found in the low level idle
code especially the parts where RCU stopped watching. This was
beyond the scope of the more obvious and exposable problems and is
on the todo list.
The lesson learned from this brain melting exercise to morph the
evolved code base into something which can be validated and understood
is that once again the violation of the most important engineering
principle "correctness first" has caused quite a few people to spend
valuable time on problems which could have been avoided in the first
place. The "features first" tinkering mindset really has to stop.
With that I want to say thanks to everyone involved in contributing to
this effort. Special thanks go to the following people (alphabetical
order): Alexandre Chartre, Andy Lutomirski, Borislav Petkov, Brian
Gerst, Frederic Weisbecker, Josh Poimboeuf, Juergen Gross, Lai
Jiangshan, Macro Elver, Paolo Bonzin,i Paul McKenney, Peter Zijlstra,
Vitaly Kuznetsov, and Will Deacon"
* tag 'x86-entry-2020-06-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (142 commits)
x86/entry: Force rcu_irq_enter() when in idle task
x86/entry: Make NMI use IDTENTRY_RAW
x86/entry: Treat BUG/WARN as NMI-like entries
x86/entry: Unbreak __irqentry_text_start/end magic
x86/entry: __always_inline CR2 for noinstr
lockdep: __always_inline more for noinstr
x86/entry: Re-order #DB handler to avoid *SAN instrumentation
x86/entry: __always_inline arch_atomic_* for noinstr
x86/entry: __always_inline irqflags for noinstr
x86/entry: __always_inline debugreg for noinstr
x86/idt: Consolidate idt functionality
x86/idt: Cleanup trap_init()
x86/idt: Use proper constants for table size
x86/idt: Add comments about early #PF handling
x86/idt: Mark init only functions __init
x86/entry: Rename trace_hardirqs_off_prepare()
x86/entry: Clarify irq_{enter,exit}_rcu()
x86/entry: Remove DBn stacks
x86/entry: Remove debug IDT frobbing
x86/entry: Optimize local_db_save() for virt
...
Merge the state of the locking kcsan branch before the read/write_once()
and the atomics modifications got merged.
Squash the fallout of the rebase on top of the read/write once and atomic
fallback work into the merge. The history of the original branch is
preserved in tag locking-kcsan-2020-06-02.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Warnings, bugs and stack protection fails from noinstr sections, e.g. low
level and early entry code, are likely to be fatal.
Mark them as "safe" to be invoked from noinstr protected code to avoid
annotating all usage sites. Getting the information out is important.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200505134100.376598577@linutronix.de
Pull READ/WRITE_ONCE rework from Will Deacon:
"This the READ_ONCE rework I've been working on for a while, which
bumps the minimum GCC version and improves code-gen on arm64 when
stack protector is enabled"
[ Side note: I'm _really_ tempted to raise the minimum gcc version to
4.9, so that we can just say that we require _Generic() support.
That would allow us to more cleanly handle a lot of the cases where we
depend on very complex macros with 'sizeof' or __builtin_choose_expr()
with __builtin_types_compatible_p() etc.
This branch has a workaround for sparse not handling _Generic(),
either, but that was already fixed in the sparse development branch,
so it's really just gcc-4.9 that we'd require. - Linus ]
* 'rwonce/rework' of git://git.kernel.org/pub/scm/linux/kernel/git/will/linux:
compiler_types.h: Use unoptimized __unqual_scalar_typeof for sparse
compiler_types.h: Optimize __unqual_scalar_typeof compilation time
compiler.h: Enforce that READ_ONCE_NOCHECK() access size is sizeof(long)
compiler-types.h: Include naked type in __pick_integer_type() match
READ_ONCE: Fix comment describing 2x32-bit atomicity
gcov: Remove old GCC 3.4 support
arm64: barrier: Use '__unqual_scalar_typeof' for acquire/release macros
locking/barriers: Use '__unqual_scalar_typeof' for load-acquire macros
READ_ONCE: Drop pointer qualifiers when reading from scalar types
READ_ONCE: Enforce atomicity for {READ,WRITE}_ONCE() memory accesses
READ_ONCE: Simplify implementations of {READ,WRITE}_ONCE()
arm64: csum: Disable KASAN for do_csum()
fault_inject: Don't rely on "return value" from WRITE_ONCE()
net: tls: Avoid assigning 'const' pointer to non-const pointer
netfilter: Avoid assigning 'const' pointer to non-const pointer
compiler/gcc: Raise minimum GCC version for kernel builds to 4.8
All architectures define pte_index() as
(address >> PAGE_SHIFT) & (PTRS_PER_PTE - 1)
and all architectures define pte_offset_kernel() as an entry in the array
of PTEs indexed by the pte_index().
For the most architectures the pte_offset_kernel() implementation relies
on the availability of pmd_page_vaddr() that converts a PMD entry value to
the virtual address of the page containing PTEs array.
Let's move x86 definitions of the PTE accessors to the generic place in
<linux/pgtable.h> and then simply drop the respective definitions from the
other architectures.
The architectures that didn't provide pmd_page_vaddr() are updated to have
that defined.
The generic implementation of pte_offset_kernel() can be overridden by an
architecture and alpha makes use of this because it has special ordering
requirements for its version of pte_offset_kernel().
[rppt@linux.ibm.com: v2]
Link: http://lkml.kernel.org/r/20200514170327.31389-11-rppt@kernel.org
[rppt@linux.ibm.com: update]
Link: http://lkml.kernel.org/r/20200514170327.31389-12-rppt@kernel.org
[rppt@linux.ibm.com: update]
Link: http://lkml.kernel.org/r/20200514170327.31389-13-rppt@kernel.org
[akpm@linux-foundation.org: fix x86 warning]
[sfr@canb.auug.org.au: fix powerpc build]
Link: http://lkml.kernel.org/r/20200607153443.GB738695@linux.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Cain <bcain@codeaurora.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Guo Ren <guoren@kernel.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ley Foon Tan <ley.foon.tan@intel.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vincent Chen <deanbo422@gmail.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: http://lkml.kernel.org/r/20200514170327.31389-10-rppt@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The include/linux/pgtable.h is going to be the home of generic page table
manipulation functions.
Start with moving asm-generic/pgtable.h to include/linux/pgtable.h and
make the latter include asm/pgtable.h.
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Cain <bcain@codeaurora.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Guo Ren <guoren@kernel.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ley Foon Tan <ley.foon.tan@intel.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vincent Chen <deanbo422@gmail.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: http://lkml.kernel.org/r/20200514170327.31389-3-rppt@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Define flush_icache_user_range to flush_icache_range unless the
architecture provides its own implementation.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Link: http://lkml.kernel.org/r/20200515143646.3857579-21-hch@lst.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The function currently known as flush_icache_user_range only operates on
a single page. Rename it to flush_icache_user_page as we'll need the
name flush_icache_user_range for something else soon.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Vincent Chen <deanbo422@gmail.com>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@sifive.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/20200515143646.3857579-20-hch@lst.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There is a magic ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE cpp symbol that
guards non-stub availability of flush_dcache_pagge. Use that to check
if flush_dcache_pagg is implemented.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Link: http://lkml.kernel.org/r/20200515143646.3857579-8-hch@lst.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This seems to lead to some crazy include loops when using
asm-generic/cacheflush.h on more architectures, so leave it to the arch
header for now.
[hch@lst.de: fix warning]
Link: http://lkml.kernel.org/r/20200520173520.GA11199@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Will Deacon <will@kernel.org>
Cc: Nick Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Link: http://lkml.kernel.org/r/20200515143646.3857579-7-hch@lst.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
cacheflush.h uses a somewhat to generic include guard name that clashes
with various arch files. Use a more specific one.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Link: http://lkml.kernel.org/r/20200515143646.3857579-6-hch@lst.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Support for userspace to send requests directly to the on-chip GZIP
accelerator on Power9.
- Rework of our lockless page table walking (__find_linux_pte()) to make it
safe against parallel page table manipulations without relying on an IPI for
serialisation.
- A series of fixes & enhancements to make our machine check handling more
robust.
- Lots of plumbing to add support for "prefixed" (64-bit) instructions on
Power10.
- Support for using huge pages for the linear mapping on 8xx (32-bit).
- Remove obsolete Xilinx PPC405/PPC440 support, and an associated sound driver.
- Removal of some obsolete 40x platforms and associated cruft.
- Initial support for booting on Power10.
- Lots of other small features, cleanups & fixes.
Thanks to:
Alexey Kardashevskiy, Alistair Popple, Andrew Donnellan, Andrey Abramov,
Aneesh Kumar K.V, Balamuruhan S, Bharata B Rao, Bulent Abali, Cédric Le
Goater, Chen Zhou, Christian Zigotzky, Christophe JAILLET, Christophe Leroy,
Dmitry Torokhov, Emmanuel Nicolet, Erhard F., Gautham R. Shenoy, Geoff Levand,
George Spelvin, Greg Kurz, Gustavo A. R. Silva, Gustavo Walbon, Haren Myneni,
Hari Bathini, Joel Stanley, Jordan Niethe, Kajol Jain, Kees Cook, Leonardo
Bras, Madhavan Srinivasan., Mahesh Salgaonkar, Markus Elfring, Michael
Neuling, Michal Simek, Nathan Chancellor, Nathan Lynch, Naveen N. Rao,
Nicholas Piggin, Oliver O'Halloran, Paul Mackerras, Pingfan Liu, Qian Cai, Ram
Pai, Raphael Moreira Zinsly, Ravi Bangoria, Sam Bobroff, Sandipan Das, Segher
Boessenkool, Stephen Rothwell, Sukadev Bhattiprolu, Tyrel Datwyler, Wolfram
Sang, Xiongfeng Wang.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAl7aYZ8THG1wZUBlbGxl
cm1hbi5pZC5hdQAKCRBR6+o8yOGlgPiKD/9zNCuZLFMAFrIdbm0HlYA2RGYZFT75
GUHsqYyei1pxA7PgM3KwJiXELVODsBv0eQbgNh1tbecKrxPRegN/cywd1KLjPZ7I
v5/qweQP8MvR0RhzjbhvUcO0jq/f8u2LbJr5mUfVzjU6tAvrvcWo3oZqDElsekCS
kgyOH3r1vZ2PLTMiGFhb0gWi2iqc+6BHU1AFCGPCMjB1Vu5d5+54VvZ/6lllGsOF
yg9CBXmmVvQ+Bn6tH4zdEB78FYxnAIwBqlbmL79i5ca+HQJ0Sw6HuPRy9XYq35p6
2EiXS4Wrgp7i7+1TN3HO362u5Onb8TSyQU7NS6yCFPoJ6JQxcJMBIw6mHhnXOPuZ
CrjgcdwUMjx8uDoKmX1Epbfuex2w+AysW+4yBHPFiSgl3klKC3D0wi95mR485w2F
rN8uzJtrDeFKcYZJG7IoB/cgFCCPKGf9HaXr8q0S/jBKMffx91ul3cfzlfdIXOCw
FDNw/+ZX7UD6ddFEG12ZTO+vdL8yf1uCRT/DIZwUiDMIA0+M6F4nc7j3lfyZfoO1
65f9UlhoLxScq7VH2fKH4UtZatO9cPID2z1CmiY4UbUIPtFDepSuYClgLF+Duf4b
rkfxhKU0+Ja1zNH5XNc+L+Bc5/W4lFiJXz02dYIjtHoUpWkc1aToOETVwzggYFNM
G3PXIBOI0jRgRw==
=o0WU
-----END PGP SIGNATURE-----
Merge tag 'powerpc-5.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
- Support for userspace to send requests directly to the on-chip GZIP
accelerator on Power9.
- Rework of our lockless page table walking (__find_linux_pte()) to
make it safe against parallel page table manipulations without
relying on an IPI for serialisation.
- A series of fixes & enhancements to make our machine check handling
more robust.
- Lots of plumbing to add support for "prefixed" (64-bit) instructions
on Power10.
- Support for using huge pages for the linear mapping on 8xx (32-bit).
- Remove obsolete Xilinx PPC405/PPC440 support, and an associated sound
driver.
- Removal of some obsolete 40x platforms and associated cruft.
- Initial support for booting on Power10.
- Lots of other small features, cleanups & fixes.
Thanks to: Alexey Kardashevskiy, Alistair Popple, Andrew Donnellan,
Andrey Abramov, Aneesh Kumar K.V, Balamuruhan S, Bharata B Rao, Bulent
Abali, Cédric Le Goater, Chen Zhou, Christian Zigotzky, Christophe
JAILLET, Christophe Leroy, Dmitry Torokhov, Emmanuel Nicolet, Erhard F.,
Gautham R. Shenoy, Geoff Levand, George Spelvin, Greg Kurz, Gustavo A.
R. Silva, Gustavo Walbon, Haren Myneni, Hari Bathini, Joel Stanley,
Jordan Niethe, Kajol Jain, Kees Cook, Leonardo Bras, Madhavan
Srinivasan., Mahesh Salgaonkar, Markus Elfring, Michael Neuling, Michal
Simek, Nathan Chancellor, Nathan Lynch, Naveen N. Rao, Nicholas Piggin,
Oliver O'Halloran, Paul Mackerras, Pingfan Liu, Qian Cai, Ram Pai,
Raphael Moreira Zinsly, Ravi Bangoria, Sam Bobroff, Sandipan Das, Segher
Boessenkool, Stephen Rothwell, Sukadev Bhattiprolu, Tyrel Datwyler,
Wolfram Sang, Xiongfeng Wang.
* tag 'powerpc-5.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (299 commits)
powerpc/pseries: Make vio and ibmebus initcalls pseries specific
cxl: Remove dead Kconfig options
powerpc: Add POWER10 architected mode
powerpc/dt_cpu_ftrs: Add MMA feature
powerpc/dt_cpu_ftrs: Enable Prefixed Instructions
powerpc/dt_cpu_ftrs: Advertise support for ISA v3.1 if selected
powerpc: Add support for ISA v3.1
powerpc: Add new HWCAP bits
powerpc/64s: Don't set FSCR bits in INIT_THREAD
powerpc/64s: Save FSCR to init_task.thread.fscr after feature init
powerpc/64s: Don't let DT CPU features set FSCR_DSCR
powerpc/64s: Don't init FSCR_DSCR in __init_FSCR()
powerpc/32s: Fix another build failure with CONFIG_PPC_KUAP_DEBUG
powerpc/module_64: Use special stub for _mcount() with -mprofile-kernel
powerpc/module_64: Simplify check for -mprofile-kernel ftrace relocations
powerpc/module_64: Consolidate ftrace code
powerpc/32: Disable KASAN with pages bigger than 16k
powerpc/uaccess: Don't set KUEP by default on book3s/32
powerpc/uaccess: Don't set KUAP by default on book3s/32
powerpc/8xx: Reduce time spent in allow_user_access() and friends
...
These are updates to SoC specific drivers that did not have
another subsystem maintainer tree to go through for some
reason:
- Some bus and memory drivers for the MIPS P5600 based
Baikal-T1 SoC that is getting added through the MIPS tree.
- There are new soc_device identification drivers for TI K3,
Qualcomm MSM8939
- New reset controller drivers for NXP i.MX8MP, Renesas
RZ/G1H, and Hisilicon hi6220
- The SCMI firmware interface can now work across ARM SMC/HVC
as a transport.
- Mediatek platforms now use a new driver for their "MMSYS"
hardware block that controls clocks and some other aspects
in behalf of the media and gpu drivers.
- Some Tegra processors have improved power management
support, including getting woken up by the PMIC and cluster
power down during idle.
- A new v4l staging driver for Tegra is added.
- Cleanups and minor bugfixes for TI, NXP, Hisilicon,
Mediatek, and Tegra.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEo6/YBQwIrVS28WGKmmx57+YAGNkFAl7XvgAACgkQmmx57+YA
GNmj/hAAnAJ/hYehLfgCe711HUntgeRkaoTVpCt8BJNMdxsa23sn3V6k5+WYn1uG
PtlgpefZEMHLUEEVDegR4nZXLG0Pzu1SR12KW34YPcQKkNo/+vlQ9zYUajnJ/KX6
10zdLSIzHfk1VtXKvvQQ8xFyE+S/trGmjC57E6gfoCUT3rl1maD+ccVXUBaz9oob
wuMxGXQAl57mio5yT1OfSk6Fev39xRE2dN1hzP7KUYhsemZajBwBBW5wVJZCsCB8
LCGmxVkavM7BV4r2NokbBDs5rlfedBl/P/IPd9Is5a5tuGUkSsVRG9zqShxYLGM3
S06az6POQFwXKFJoUKW0dK/Koy0D7BK+vhUBPzFv4HZ8iDCVf6Jju2MJ02GMqHPj
OOrXaCbLYrvN/edVUWeeFywqwMbYTRwC4DxyTq5m7HxEB004xTOhs0rX5aR0u4n1
bbsR97LguolwH9iEMzd3F3jCiKBcMecH3lAh5WcrtwlFIRrNhbWoGDoA/4TuORFS
b11rgsTRIJ5Vc++D1HnSnx0ZZvUzyluMvygdALnSgVah6xYe6KVw9Kg/wioAJ04G
uSTidqP3qRhsyET2HQo7CxdVfZbKfP25iKCKrdhQziztKvhF8qrUmZKloXOodRw+
ewYSRmv8c324OYYit1X43oAdW8dntq1XbSIauaqxEb4JC7x/xRY=
=44Jc
-----END PGP SIGNATURE-----
Merge tag 'arm-drivers-5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM/SoC driver updates from Arnd Bergmann:
"These are updates to SoC specific drivers that did not have another
subsystem maintainer tree to go through for some reason:
- Some bus and memory drivers for the MIPS P5600 based Baikal-T1 SoC
that is getting added through the MIPS tree.
- There are new soc_device identification drivers for TI K3, Qualcomm
MSM8939
- New reset controller drivers for NXP i.MX8MP, Renesas RZ/G1H, and
Hisilicon hi6220
- The SCMI firmware interface can now work across ARM SMC/HVC as a
transport.
- Mediatek platforms now use a new driver for their "MMSYS" hardware
block that controls clocks and some other aspects in behalf of the
media and gpu drivers.
- Some Tegra processors have improved power management support,
including getting woken up by the PMIC and cluster power down
during idle.
- A new v4l staging driver for Tegra is added.
- Cleanups and minor bugfixes for TI, NXP, Hisilicon, Mediatek, and
Tegra"
* tag 'arm-drivers-5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (155 commits)
clk: sprd: fix compile-testing
bus: bt1-axi: Build the driver into the kernel
bus: bt1-apb: Build the driver into the kernel
bus: bt1-axi: Use sysfs_streq instead of strncmp
bus: bt1-axi: Optimize the return points in the driver
bus: bt1-apb: Use sysfs_streq instead of strncmp
bus: bt1-apb: Use PTR_ERR_OR_ZERO to return from request-regs method
bus: bt1-apb: Fix show/store callback identations
bus: bt1-apb: Include linux/io.h
dt-bindings: memory: Add Baikal-T1 L2-cache Control Block binding
memory: Add Baikal-T1 L2-cache Control Block driver
bus: Add Baikal-T1 APB-bus driver
bus: Add Baikal-T1 AXI-bus driver
dt-bindings: bus: Add Baikal-T1 APB-bus binding
dt-bindings: bus: Add Baikal-T1 AXI-bus binding
staging: tegra-video: fix V4L2 dependency
tee: fix crypto select
drivers: soc: ti: knav_qmss_queue: Make knav_gp_range_ops static
soc: ti: add k3 platforms chipid module driver
dt-bindings: soc: ti: add binding for k3 platforms chipid module
...
There are no architectures that use include/asm-generic/5level-fixup.h
therefore it can be removed along with __ARCH_HAS_5LEVEL_HACK define and
the code it surrounds
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Brian Cain <bcain@codeaurora.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: James Morse <james.morse@arm.com>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Julien Thierry <julien.thierry.kdev@gmail.com>
Cc: Ley Foon Tan <ley.foon.tan@intel.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: http://lkml.kernel.org/r/20200414153455.21744-15-rppt@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge more updates from Andrew Morton:
"More mm/ work, plenty more to come
Subsystems affected by this patch series: slub, memcg, gup, kasan,
pagealloc, hugetlb, vmscan, tools, mempolicy, memblock, hugetlbfs,
thp, mmap, kconfig"
* akpm: (131 commits)
arm64: mm: use ARCH_HAS_DEBUG_WX instead of arch defined
x86: mm: use ARCH_HAS_DEBUG_WX instead of arch defined
riscv: support DEBUG_WX
mm: add DEBUG_WX support
drivers/base/memory.c: cache memory blocks in xarray to accelerate lookup
mm/thp: rename pmd_mknotpresent() as pmd_mkinvalid()
powerpc/mm: drop platform defined pmd_mknotpresent()
mm: thp: don't need to drain lru cache when splitting and mlocking THP
hugetlbfs: get unmapped area below TASK_UNMAPPED_BASE for hugetlbfs
sparc32: register memory occupied by kernel as memblock.memory
include/linux/memblock.h: fix minor typo and unclear comment
mm, mempolicy: fix up gup usage in lookup_node
tools/vm/page_owner_sort.c: filter out unneeded line
mm: swap: memcg: fix memcg stats for huge pages
mm: swap: fix vmstats for huge pages
mm: vmscan: limit the range of LRU type balancing
mm: vmscan: reclaim writepage is IO cost
mm: vmscan: determine anon/file pressure balance at the reclaim root
mm: balance LRU lists based on relative thrashing
mm: only count actual rotations as LRU reclaim cost
...
Patch series "mm/hugetlb: Add some new generic fallbacks", v3.
This series adds the following new generic fallbacks. Before that it
drops __HAVE_ARCH_HUGE_PTEP_GET from arm64 platform.
1. is_hugepage_only_range()
2. arch_clear_hugepage_flags()
After this arm (32 bit) remains the sole platform defining it's own
huge_ptep_get() via __HAVE_ARCH_HUGE_PTEP_GET.
This patch (of 3):
Platform specific huge_ptep_get() is required only when fetching the huge
PTE involves more than just dereferencing the page table pointer. This is
not the case on arm64 platform. Hence huge_ptep_pte() can be dropped
along with it's __HAVE_ARCH_HUGE_PTEP_GET subscription. Before that, it
updates the generic huge_ptep_get() with READ_ONCE() which will prevent
known page table issues with THP on arm64.
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: http://lkml.kernel.org/r/1588907271-11920-1-git-send-email-anshuman.khandual@arm.com
Link: http://lkml.kernel.org/r//1506527369-19535-1-git-send-email-will.deacon@arm.com/
Link: http://lkml.kernel.org/r/1588907271-11920-2-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Move the arch-specific code into arch/arm64/kvm
- Start the post-32bit cleanup
- Cherry-pick a few non-invasive pre-NV patches
x86:
- Rework of TLB flushing
- Rework of event injection, especially with respect to nested virtualization
- Nested AMD event injection facelift, building on the rework of generic code
and fixing a lot of corner cases
- Nested AMD live migration support
- Optimization for TSC deadline MSR writes and IPIs
- Various cleanups
- Asynchronous page fault cleanups (from tglx, common topic branch with tip tree)
- Interrupt-based delivery of asynchronous "page ready" events (host side)
- Hyper-V MSRs and hypercalls for guest debugging
- VMX preemption timer fixes
s390:
- Cleanups
Generic:
- switch vCPU thread wakeup from swait to rcuwait
The other architectures, and the guest side of the asynchronous page fault
work, will come next week.
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAl7VJcYUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroPf6QgAq4wU5wdd1lTGz/i3DIhNVJNJgJlp
ozLzRdMaJbdbn5RpAK6PEBd9+pt3+UlojpFB3gpJh2Nazv2OzV4yLQgXXXyyMEx1
5Hg7b4UCJYDrbkCiegNRv7f/4FWDkQ9dx++RZITIbxeskBBCEI+I7GnmZhGWzuC4
7kj4ytuKAySF2OEJu0VQF6u0CvrNYfYbQIRKBXjtOwuRK4Q6L63FGMJpYo159MBQ
asg3B1jB5TcuGZ9zrjL5LkuzaP4qZZHIRs+4kZsH9I6MODHGUxKonrkablfKxyKy
CFK+iaHCuEXXty5K0VmWM3nrTfvpEjVjbMc7e1QGBQ5oXsDM0pqn84syRg==
=v7Wn
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
"ARM:
- Move the arch-specific code into arch/arm64/kvm
- Start the post-32bit cleanup
- Cherry-pick a few non-invasive pre-NV patches
x86:
- Rework of TLB flushing
- Rework of event injection, especially with respect to nested
virtualization
- Nested AMD event injection facelift, building on the rework of
generic code and fixing a lot of corner cases
- Nested AMD live migration support
- Optimization for TSC deadline MSR writes and IPIs
- Various cleanups
- Asynchronous page fault cleanups (from tglx, common topic branch
with tip tree)
- Interrupt-based delivery of asynchronous "page ready" events (host
side)
- Hyper-V MSRs and hypercalls for guest debugging
- VMX preemption timer fixes
s390:
- Cleanups
Generic:
- switch vCPU thread wakeup from swait to rcuwait
The other architectures, and the guest side of the asynchronous page
fault work, will come next week"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (256 commits)
KVM: selftests: fix rdtsc() for vmx_tsc_adjust_test
KVM: check userspace_addr for all memslots
KVM: selftests: update hyperv_cpuid with SynDBG tests
x86/kvm/hyper-v: Add support for synthetic debugger via hypercalls
x86/kvm/hyper-v: enable hypercalls regardless of hypercall page
x86/kvm/hyper-v: Add support for synthetic debugger interface
x86/hyper-v: Add synthetic debugger definitions
KVM: selftests: VMX preemption timer migration test
KVM: nVMX: Fix VMX preemption timer migration
x86/kvm/hyper-v: Explicitly align hcall param for kvm_hyperv_exit
KVM: x86/pmu: Support full width counting
KVM: x86/pmu: Tweak kvm_pmu_get_msr to pass 'struct msr_data' in
KVM: x86: announce KVM_FEATURE_ASYNC_PF_INT
KVM: x86: acknowledgment mechanism for async pf page ready notifications
KVM: x86: interrupt based APF 'page ready' event delivery
KVM: introduce kvm_read_guest_offset_cached()
KVM: rename kvm_arch_can_inject_async_page_present() to kvm_arch_can_dequeue_async_page_present()
KVM: x86: extend struct kvm_vcpu_pv_apf_data with token info
Revert "KVM: async_pf: Fix #DF due to inject "Page not Present" and "Page Ready" exceptions simultaneously"
KVM: VMX: Replace zero-length array with flexible-array
...
-----BEGIN PGP SIGNATURE-----
iQFHBAABCAAxFiEEIbPD0id6easf0xsudhRwX5BBoF4FAl7WhbkTHHdlaS5saXVA
a2VybmVsLm9yZwAKCRB2FHBfkEGgXlUnB/0R8dBVSeRfNmyJaadBWKFc/LffwKLD
CQ8PVv22ffkCaEYV2tpnhS6NmkERLNdson4Uo02tVUsjOJ4CrWHTn7aKqYWZyA+O
qv/PiD9TBXJVYMVP2kkyaJlK5KoqeAWBr2kM16tT0cxQmlhE7g0Xo2wU9vhRbU+4
i4F0jffe4lWps65TK392CsPr6UEv1HSel191Py5zLzYqChT+L8WfahmBt3chhsV5
TIUJYQvBwxecFRla7yo+4sUn37ZfcTqD1hCWSr0zs4psW0ge7d80kuaNZS+EqxND
fGm3Bp1BlUuDKsJ/D+AaHLCR47PUZ9t9iMDjZS/ovYglLFwi+h3tAV+W
=LwVR
-----END PGP SIGNATURE-----
Merge tag 'hyperv-next-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
Pull hyper-v updates from Wei Liu:
- a series from Andrea to support channel reassignment
- a series from Vitaly to clean up Vmbus message handling
- a series from Michael to clean up and augment hyperv-tlfs.h
- patches from Andy to clean up GUID usage in Hyper-V code
- a few other misc patches
* tag 'hyperv-next-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: (29 commits)
Drivers: hv: vmbus: Resolve more races involving init_vp_index()
Drivers: hv: vmbus: Resolve race between init_vp_index() and CPU hotplug
vmbus: Replace zero-length array with flexible-array
Driver: hv: vmbus: drop a no long applicable comment
hyper-v: Switch to use UUID types directly
hyper-v: Replace open-coded variant of %*phN specifier
hyper-v: Supply GUID pointer to printf() like functions
hyper-v: Use UUID API for exporting the GUID (part 2)
asm-generic/hyperv: Add definitions for Get/SetVpRegister hypercalls
x86/hyperv: Split hyperv-tlfs.h into arch dependent and independent files
x86/hyperv: Remove HV_PROCESSOR_POWER_STATE #defines
KVM: x86: hyperv: Remove duplicate definitions of Reference TSC Page
drivers: hv: remove redundant assignment to pointer primary_channel
scsi: storvsc: Re-init stor_chns when a channel interrupt is re-assigned
Drivers: hv: vmbus: Introduce the CHANNELMSG_MODIFYCHANNEL message type
Drivers: hv: vmbus: Synchronize init_vp_index() vs. CPU hotplug
Drivers: hv: vmbus: Remove the unused HV_LOCALIZED channel affinity logic
PCI: hv: Prepare hv_compose_msi_msg() for the VMBus-channel-interrupt-to-vCPU reassignment functionality
Drivers: hv: vmbus: Use a spin lock for synchronizing channel scheduling vs. channel removal
hv_utils: Always execute the fcopy and vss callbacks in a tasklet
...
- added support for MIPSr5 and P5600 cores
- converted Loongson PCI driver into a PCI host driver using the generic
PCI framework
- added emulation of CPUCFG command for Loogonson64 cpus
- removed of LASAT, PMC MSP71xx and NEC MARKEINS/EMMA
- ioremap cleanup
- fix for a race between two threads faulting the same page
- various cleanups and fixes
-----BEGIN PGP SIGNATURE-----
iQJOBAABCAA4FiEEbt46xwy6kEcDOXoUeZbBVTGwZHAFAl7WK54aHHRzYm9nZW5k
QGFscGhhLmZyYW5rZW4uZGUACgkQeZbBVTGwZHAbjA/9EEFeqNg9UNUH6/TS18QV
qkxKp0+LC4Jk+SduzLyYsYy6l/dSaKYl8m9jyJsWjM6BvBZTcMJJOnzIPRafI0s+
MK8GCSZunAkm25DsDvfobQUkbQ/UHjY/fuRpNslbDcsYqIKv90hUMd21ccXY6KC5
RY+aMlpjgksg1X8JJ7k1Rs05sXyUPqpESteyqehF1b/+Iyv7H2L3v5EvQwvPDs6f
TyVgNJU2B3RCU6/uAcWmHdVLxXd+Y8fM0vC8DCO0pg0rGf4be0FbZztHmeq6r2wy
g7wsO7acKWGzulFQD5ftVSQ6i8KHIDNDePmDMtU5oFcXkzUDdGvd3j3Gst19/nve
ZftNmQHOY1JqGUOhdq1fDG/4M3Vc5bvh3W6eMG22TuMLEWsOF8teY8uUa/vxOb+B
2NsJ9q6ylRS7RDWWOrApJWfFYPvhr5wlLxT+azWNa9y3bjV8vDLjNdU0mRLA1nsu
yLzYMwIhtWfZhkJZ+xJVSmQ6LjAHDN5TF/LEx/9itLg5t9wrEosFPAtOv8V15hy4
KBNvvWeoy7RRmBTNuKh7r9Ui4jw7GgxL4D1OwzCsF//GAiGyuuh0zMuUE8EXA6K5
MpdGt+bSOcLl8ILTtGir8e4MXLawDH8n94f8QWLb9FcOvU4KHUjRKU7EQ6dyD5dk
a7xskGLXWdVO3IJ/Xvxcaeo=
=eAtN
-----END PGP SIGNATURE-----
Merge tag 'mips_5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux
Pull MIPS updates from Thomas Bogendoerfer:
- added support for MIPSr5 and P5600 cores
- converted Loongson PCI driver into a PCI host driver using the
generic PCI framework
- added emulation of CPUCFG command for Loogonson64 cpus
- removed of LASAT, PMC MSP71xx and NEC MARKEINS/EMMA
- ioremap cleanup
- fix for a race between two threads faulting the same page
- various cleanups and fixes
* tag 'mips_5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: (143 commits)
MIPS: ralink: drop ralink_clk_init for mt7621
MIPS: ralink: bootrom: mark a function as __init to save some memory
MIPS: Loongson64: Reorder CPUCFG model match arms
MIPS: Expose Loongson CPUCFG availability via HWCAP
MIPS: Loongson64: Guard against future cores without CPUCFG
MIPS: Fix build warning about "PTR_STR" redefinition
MIPS: Loongson64: Remove not used pci.c
MIPS: Loongson64: Define PCI_IOBASE
MIPS: CPU_LOONGSON2EF need software to maintain cache consistency
MIPS: DTS: Fix build errors used with various configs
MIPS: Loongson64: select NO_EXCEPT_FILL
MIPS: Fix IRQ tracing when call handle_fpe() and handle_msa_fpe()
MIPS: mm: add page valid judgement in function pte_modify
mm/memory.c: Add memory read privilege on page fault handling
mm/memory.c: Update local TLB if PTE entry exists
MIPS: Do not flush tlb page when updating PTE entry
MIPS: ingenic: Default to a generic board
MIPS: ingenic: Add support for GCW Zero prototype
MIPS: ingenic: DTS: Add memory info of GCW Zero
MIPS: Loongson64: Switch to generic PCI driver
...
Merge updates from Andrew Morton:
"A few little subsystems and a start of a lot of MM patches.
Subsystems affected by this patch series: squashfs, ocfs2, parisc,
vfs. With mm subsystems: slab-generic, slub, debug, pagecache, gup,
swap, memcg, pagemap, memory-failure, vmalloc, kasan"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (128 commits)
kasan: move kasan_report() into report.c
mm/mm_init.c: report kasan-tag information stored in page->flags
ubsan: entirely disable alignment checks under UBSAN_TRAP
kasan: fix clang compilation warning due to stack protector
x86/mm: remove vmalloc faulting
mm: remove vmalloc_sync_(un)mappings()
x86/mm/32: implement arch_sync_kernel_mappings()
x86/mm/64: implement arch_sync_kernel_mappings()
mm/ioremap: track which page-table levels were modified
mm/vmalloc: track which page-table levels were modified
mm: add functions to track page directory modifications
s390: use __vmalloc_node in stack_alloc
powerpc: use __vmalloc_node in alloc_vm_stack
arm64: use __vmalloc_node in arch_alloc_vmap_stack
mm: remove vmalloc_user_node_flags
mm: switch the test_vmalloc module to use __vmalloc_node
mm: remove __vmalloc_node_flags_caller
mm: remove both instances of __vmalloc_node_flags
mm: remove the prot argument to __vmalloc_node
mm: remove the pgprot argument to __vmalloc
...
Patch series "mm: Get rid of vmalloc_sync_(un)mappings()", v3.
After the recent issue with vmalloc and tracing code[1] on x86 and a
long history of previous issues related to the vmalloc_sync_mappings()
interface, I thought the time has come to remove it. Please see [2],
[3], and [4] for some other issues in the past.
The patches add tracking of page-table directory changes to the vmalloc
and ioremap code. Depending on which page-table levels changes have
been made, a new per-arch function is called:
arch_sync_kernel_mappings().
On x86-64 with 4-level paging, this function will not be called more
than 64 times in a systems runtime (because vmalloc-space takes 64 PGD
entries which are only populated, but never cleared).
As a side effect this also allows to get rid of vmalloc faults on x86,
making it safe to touch vmalloc'ed memory in the page-fault handler.
Note that this potentially includes per-cpu memory.
This patch (of 7):
Add page-table allocation functions which will keep track of changed
directory entries. They are needed for new PGD, P4D, PUD, and PMD
entries and will be used in vmalloc and ioremap code to decide whether
any changes in the kernel mappings need to be synchronized between
page-tables in the system.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Christoph Hellwig <hch@lst.de>
Link: http://lkml.kernel.org/r/20200515140023.25469-1-joro@8bytes.org
Link: http://lkml.kernel.org/r/20200515140023.25469-2-joro@8bytes.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull uaccess/csum updates from Al Viro:
"Regularize the sitation with uaccess checksum primitives:
- fold csum_partial_... into csum_and_copy_..._user()
- on x86 collapse several access_ok()/stac()/clac() into
user_access_begin()/user_access_end()"
* 'uaccess.csum' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
default csum_and_copy_to_user(): don't bother with access_ok()
take the dummy csum_and_copy_from_user() into net/checksum.h
arm: switch to csum_and_copy_from_user()
sh32: convert to csum_and_copy_from_user()
m68k: convert to csum_and_copy_from_user()
xtensa: switch to providing csum_and_copy_from_user()
sparc: switch to providing csum_and_copy_from_user()
parisc: turn csum_partial_copy_from_user() into csum_and_copy_from_user()
alpha: turn csum_partial_copy_from_user() into csum_and_copy_from_user()
ia64: turn csum_partial_copy_from_user() into csum_and_copy_from_user()
ia64: csum_partial_copy_nocheck(): don't abuse csum_partial_copy_from_user()
x86: switch 32bit csum_and_copy_to_user() to user_access_{begin,end}()
x86: switch both 32bit and 64bit to providing csum_and_copy_from_user()
x86_64: csum_..._copy_..._user(): switch to unsafe_..._user()
get rid of csum_partial_copy_to_user()
logic, instead of the current per debug facility blacklist, use the more generic
.noinstr.text approach, combined with a 'noinstr' marker for functions.
Also add instrumentation_begin()/end() to better manage the exact place in entry
code where instrumentation may be used.
Also add a kprobes blacklist for modules.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAl7U/KERHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1h6xg//bnWhJzrxlOr89d7c5pEUeZehTscZ4OxU
HyiWnfgd6bHJGHiB8TRHZInJFys/Y0UG+xzQvCP2YCIHW42tguD3u0wQ1rOrA6im
VkDxUwHn72avqnBq+knMwtqiKQjxJrPe+YpikWOgb4B+9jQwLARzTArhs+aoWBRn
a9jRP1jcuS26F/9wxctFoHVvKZ7Vv+HCgtNzequHsd1e0J8ElvDRk+QkfkaZopl5
cQ44TIfzR8xjJuGqW45hXwOw5PPjhZHwytSoFquSMb57txoWL2devn7S38VaCWv7
/fqmQAnQqlW5eG5ipJ0zWY1n0uLZLRrIecfA1INY8fdJeFFr6cxaN6FM1GhVZ93I
GjZZFYwxDv9IftpeSyCaIzF1zISV+as3r9sMKMt89us77XazRiobjWCi1aE9a1rX
QRv1nTjmypWg65IMV+nfIT26riP6YXSZ3uXQJPwm+kzEjJJl0LSi2AfjWQadcHeZ
Z8svSIepP4oJBJ9tJlZ3K7kHBV3E0G4SV3fnHaUYGrp9gheqhe33U0VWfILcvq7T
zIhtZXzqRGaMKuw0IFy2xITCQyEZAXwTedtSSeyXt0CN/hwhaxbrd38HhKOBw8WH
k+OAmXZ+lgSO5ZvkoxgV6QgHtjsif3ICcHNelJtcbRA80/3oj/QwJ5dAVR61EDZa
3Jn8mMxvCn0=
=25Vr
-----END PGP SIGNATURE-----
Merge tag 'core-kprobes-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull kprobes updates from Ingo Molnar:
"Various kprobes updates, mostly centered around cleaning up the
no-instrumentation logic.
Instead of the current per debug facility blacklist, use the more
generic .noinstr.text approach, combined with a 'noinstr' marker for
functions.
Also add instrumentation_begin()/end() to better manage the exact
place in entry code where instrumentation may be used.
And add a kprobes blacklist for modules"
* tag 'core-kprobes-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
kprobes: Prevent probes in .noinstr.text section
vmlinux.lds.h: Create section for protection against instrumentation
samples/kprobes: Add __kprobes and NOKPROBE_SYMBOL() for handlers.
kprobes: Support NOKPROBE_SYMBOL() in modules
kprobes: Support __kprobes blacklist in modules
kprobes: Lock kprobe_mutex while showing kprobe_blacklist
now that can be done conveniently - all non-trivial cases have
_HAVE_ARCH_COPY_AND_CSUM_FROM_USER defined, so the fallback in
net/checksum.h is used only for dummy (copy_from_user, then
csum_partial) implementation. Allowing us to get rid of all
dummy instances, both of csum_and_copy_from_user() and
csum_partial_copy_from_user().
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Here add pte_sw_mkyoung function to make page readable on MIPS
platform during page fault handling. This patch improves page
fault latency about 10% on my MIPS machine with lmbench
lat_pagefault case.
It is noop function on other arches, there is no negative
influence on those architectures.
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
If two threads concurrently fault at the same page, the thread that
won the race updates the PTE and its local TLB. For now, the other
thread gives up, simply does nothing, and continues.
It could happen that this second thread triggers another fault, whereby
it only updates its local TLB while handling the fault. Instead of
triggering another fault, let's directly update the local TLB of the
second thread. Function update_mmu_tlb is used here to update local
TLB on the second thread, and it is defined as empty on other arches.
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
- Generate consistent behaviour for logic_pio by defining and using
generic _inX() and _outX() in asm-generic/io.h which have per-arch
overrideable barriers.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJevjd1AAoJEAvIV27ZiWZcyoUQAIDN4NeFM7lIpjDCNgS3P7W6
WG+5EP/BHdnnCBarRYqaZR2B2pnwiTkHKMkDpe8aoaE51udT9bJA8dp162V2eEy0
j/0rOOxR14k2Nj+BoNpsHbJllPQLVPkXehOpwsUhm1qGiE5w8nwZ4iG/BaW97Nbj
TcPYnesviuRf1ghNLNyjYmkuT2zou7shJqCpKHt58o+O3S3RT7c+fbbfBy+DVSmx
3yBersTVOklHQIqCuGPFzy/+qtMQh1iAiRVefkrYG/WOyare4Rc8DZfs0hf8Ljdr
AJdIGBB/12H+jPUn3wdpdi0vrJ6ClUaacRvGz3Ba30eMDRFsH9ZStwWZXIO8Zoz9
wl7kOFwX8UaHO881jvF/LTSYm+7cs/FfCV4MLCHQBx2zepxxQwM3/LAvV5zgVUA0
ZUqgUVeEOTS/WTEE8420oHtL7p/zidG1K1cZSZoh727xoBsC9r5aGw14ceLK+Y9j
FknxIhqITdO9BRp5vkBgQwBMDDK5SkBFdB3ZT2oZkXYzJzFxD2BHQAsFNcR7oaio
zyQVPL7RdK40EN/WASF69m/SB4uefPPxvvSUvP77ucjMdvF006CPE+H8oS+OWZPr
HUhp0rLZNP3bgYv2R4Ln+1CzOS8Llfb8o+tqg/C0qWFp5kadW6469xitkvlppBJA
uBLmjjElHLg4mWG9vhvf
=kbAg
-----END PGP SIGNATURE-----
Merge tag 'hisi-drivers-for-5.8' of git://github.com/hisilicon/linux-hisi into arm/drivers
ARM64: hisi: SoC driver updates for 5.8
- Generate consistent behaviour for logic_pio by defining and using
generic _inX() and _outX() in asm-generic/io.h which have per-arch
overrideable barriers.
* tag 'hisi-drivers-for-5.8' of git://github.com/hisilicon/linux-hisi:
logic_pio: Use _inX() and _outX()
logic_pio: Improve macro argument name
io: Provide _inX() and _outX()
Add definitions for GetVpRegister and SetVpRegister hypercalls, which
are implemented for both x86 and ARM64.
Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/20200422195737.10223-5-mikelley@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
In preparation for adding ARM64 support, split hyperv-tlfs.h into
architecture dependent and architecture independent files, similar
to what has been done with mshyperv.h. Move architecture independent
definitions into include/asm-generic/hyperv-tlfs.h. The split will
avoid duplicating significant lines of code in the ARM64 version of
hyperv-tlfs.h. The split has no functional impact.
Some of the common definitions have "X64" in the symbol name. Change
these to remove the "X64" in the architecture independent version of
hyperv-tlfs.h, but add aliases with the "X64" in the x86 version so
that x86 code will continue to compile. A later patch set will
change all the references and allow removal of the aliases.
Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/20200422195737.10223-4-mikelley@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Some code pathes, especially the low level entry code, must be protected
against instrumentation for various reasons:
- Low level entry code can be a fragile beast, especially on x86.
- With NO_HZ_FULL RCU state needs to be established before using it.
Having a dedicated section for such code allows to validate with tooling
that no unsafe functions are invoked.
Add the .noinstr.text section and the noinstr attribute to mark
functions. noinstr implies notrace. Kprobes will gain a section check
later.
Provide also a set of markers: instrumentation_begin()/end()
These are used to mark code inside a noinstr function which calls
into regular instrumentable text section as safe.
The instrumentation markers are only active when CONFIG_DEBUG_ENTRY is
enabled as the end marker emits a NOP to prevent the compiler from merging
the annotation points. This means the objtool verification requires a
kernel compiled with this option.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200505134100.075416272@linutronix.de
Since commit a7851aa54c ("io: change outX() to have their own IO barrier
overrides") and commit 87fe2d543f ("io: change inX() to have their own
IO barrier overrides"), the outX and inX functions have memory barriers
which can be overridden.
However, the generic logic_pio lib has continued to use readl/writel et al
for IO port accesses, which has weaker barriers on arm64.
Provide generic _inX() and _outX(), which can be used by logic pio.
For consistency, we check for !defined({in,out}X) && !defined(_{in,out}X),
for defining _{in,out}X, while a check for just !defined({in,out}X) should
suffice.
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Wei Xu <xuwei5@hisilicon.com>
As the bug report [1] pointed out, <linux/vermagic.h> must be included
after <linux/module.h>.
I believe we should not impose any include order restriction. We often
sort include directives alphabetically, but it is just coding style
convention. Technically, we can include header files in any order by
making every header self-contained.
Currently, arch-specific MODULE_ARCH_VERMAGIC is defined in
<asm/module.h>, which is not included from <linux/vermagic.h>.
Hence, the straight-forward fix-up would be as follows:
|--- a/include/linux/vermagic.h
|+++ b/include/linux/vermagic.h
|@@ -1,5 +1,6 @@
| /* SPDX-License-Identifier: GPL-2.0 */
| #include <generated/utsrelease.h>
|+#include <linux/module.h>
|
| /* Simply sanity version stamp for modules. */
| #ifdef CONFIG_SMP
This works enough, but for further cleanups, I split MODULE_ARCH_VERMAGIC
definitions into <asm/vermagic.h>.
With this, <linux/module.h> and <linux/vermagic.h> will be orthogonal,
and the location of MODULE_ARCH_VERMAGIC definitions will be consistent.
For arc and ia64, MODULE_PROC_FAMILY is only used for defining
MODULE_ARCH_VERMAGIC. I squashed it.
For hexagon, nds32, and xtensa, I removed <asm/modules.h> entirely
because they contained nothing but MODULE_ARCH_VERMAGIC definition.
Kbuild will automatically generate <asm/modules.h> at build-time,
wrapping <asm-generic/module.h>.
[1] https://lore.kernel.org/lkml/20200411155623.GA22175@zn.tnic
Reported-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Jessica Yu <jeyu@kernel.org>
Passing volatile-qualified pointers to the asm-generic implementations of
the load-acquire macros results in a re-load from the stack due to the
temporary result variable inheriting the volatile semantics thanks to the
use of 'typeof()'.
Define these temporary variables using 'unqual_scalar_typeof' to drop
the volatile qualifier in the case that they are scalar types.
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Will Deacon <will@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQFHBAABCAAxFiEEIbPD0id6easf0xsudhRwX5BBoF4FAl6ViNsTHHdlaS5saXVA
a2VybmVsLm9yZwAKCRB2FHBfkEGgXuXIB/4nuYRCt4d/XaeHF6dCWU45ThG+tNs7
p/OnBPZmknI0SnZ4uR/XW5caHEFj7g9ndYh+M1afZ/zKdsc+syMSDT5XhuhC/GKV
fQRW0qO8N+IAqXbLzJxyBg6fH2anwfe3w2uy2cKDEZk6d4FD5atTWhRY6R4ISq0l
g7pUyvQN1q+G6KH2snmOaZL8mybFkbHrmwtAZzcjzdzqasdLFiQB8EEFkONG66t9
HeNTyUF0mnbGBIePQLSZSHLj5p4yHG/9pa3jgqO5dsmIdsBvoaVNqEi3pCm1s/5n
BH9FWn6fTwpcKvtF385yzBiFFlzBVgXbetxuSmxxOkWW4P+db5B/GL2Y
=fjSF
-----END PGP SIGNATURE-----
Merge tag 'hyperv-fixes-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
Pull hyperv fixes from Wei Liu:
- a series from Tianyu Lan to fix crash reporting on Hyper-V
- three miscellaneous cleanup patches
* tag 'hyperv-fixes-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
x86/Hyper-V: Report crash data in die() when panic_on_oops is set
x86/Hyper-V: Report crash register data when sysctl_record_panic_msg is not set
x86/Hyper-V: Report crash register data or kmsg before running crash kernel
x86/Hyper-V: Trigger crash enlightenment only once during system crash.
x86/Hyper-V: Free hv_panic_page when fail to register kmsg dump
x86/Hyper-V: Unload vmbus channel in hv panic callback
x86: hyperv: report value of misc_features
hv_debugfs: Make hv_debug_root static
hv: hyperv_vmbus.h: Replace zero-length array with flexible-array member
-----BEGIN PGP SIGNATURE-----
iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl6TbaUeHHRvcnZhbGRz
QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGhgkH/iWpiKvosA20HJjC
rBqYeJPxQsgZTuBieWJ+MeVxbpcF7RlM4c+glyvg3QJhHwIEG58dl6LBrQbAyBAR
aFHNojr1iAYOruVCGnU3pA008YZiwUIDv/ZQ4DF8fmIU2vI2mJ6qHBv3XDl4G2uR
Nwz8Eu9AgIwZM5coomVOSmoWyFy7Vxmb7W+3t5VmKsvOWx4ib9kyQtOIkvQDEl7j
XCbWfI0xDQr6LFOm4jnCi5R/LhJ2LIqqIvHHrunbpszM8IwK797jCXz4im+dmd5Y
+km46N7a8pDqri36xXz1gdBAU3eG7Pt1NyvfjwRVTdX4GquQ2MT0GoojxbLxUP3y
3pEsQuE=
=whbL
-----END PGP SIGNATURE-----
Merge tag 'v5.7-rc1' into locking/kcsan, to resolve conflicts and refresh
Resolve these conflicts:
arch/x86/Kconfig
arch/x86/kernel/Makefile
Do a minor "evil merge" to move the KCSAN entry up a bit by a few lines
in the Kconfig to reduce the probability of future conflicts.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
When oops happens with panic_on_oops unset, the oops
thread is killed by die() and system continues to run.
In such case, guest should not report crash register
data to host since system still runs. Check panic_on_oops
and return directly in hyperv_report_panic() when the function
is called in the die() and panic_on_oops is unset. Fix it.
Fixes: 7ed4325a44 ("Drivers: hv: vmbus: Make panic reporting to be more useful")
Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/20200406155331.2105-7-Tianyu.Lan@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Accurate userfaultfd WP tracking is possible by tracking exactly which
virtual memory ranges were writeprotected by userland. We can't relay
only on the RW bit of the mapped pagetable because that information is
destroyed by fork() or KSM or swap. If we were to relay on that, we'd
need to stay on the safe side and generate false positive wp faults for
every swapped out page.
[peterx@redhat.com: append _PAGE_UFD_WP to _PAGE_CHG_MASK]
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Reviewed-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Martin Cracauer <cracauer@cons.org>
Cc: Marty McFadden <mcfadden8@llnl.gov>
Cc: Maya Gokhale <gokhale2@llnl.gov>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Shaohua Li <shli@fb.com>
Link: http://lkml.kernel.org/r/20200220163112.11409-4-peterx@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Core and userspace API:
- The userspace API KFIFOs have been imoproved with locks that
do not block interrupts. This makes us better at getting
events to userspace without blocking or disturbing new events
arriving in the same time. This was reviewed by the KFIFO
maintainer Stefani. This is a generic improvement which
paves the road for similar improvements in other subsystems.
- We provide a new ioctl() for monitoring changes in the line
information, such as when multiple clients are taking lines
and giving them back, possibly reconfiguring them in the
process: we can now monitor that and not get stuck with stale
static information.
- An example tool 'gpio-watch' is provided to showcase this
functionality.
- Timestamps for events are switched to ktime_get_ns() which is
monotonic. We previously had a 'realtime' stamp which could
move forward and *backward* in time, which probably would just
cause silent bugs and weird behaviour. In the long run we
see two relevant timestamps: ktime_get_ns() or the timestamp
sometimes provided by the GPIO hardware itself, if that
exists.
- Device Tree overlay support for GPIO hogs. On systems that
load overlays, these overlays can now contain hogs, and will
then be respected.
- Handle pin control interaction with nonexisting pin ranges
in the GPIO library core instead of in the individual
drivers.
New drivers:
- New driver for the Mellanox BlueField 2 GPIO controller.
Driver improvements:
- Introduce the BGPIOF_NO_SET_ON_INPUT flag to the generic
MMIO GPIO library and use this flag in the MT7621 driver.
- Texas Instruments OMAP CPU power management improvements,
such as blocking of idle on pending GPIO interrupts.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEElDRnuGcz/wPCXQWMQRCzN7AZXXMFAl6IWUAACgkQQRCzN7AZ
XXPReQ//WUHALqbLOL/DZR8kVyygVZXV8JC4jcuMQgX+Mvm1DQuGp1T8VPlG1wC7
vy5eTz3oPc+jAaHGQgGiSGdPRQtDkAW5+Wo4hzcHJ5ATlzT7cgNywK3Jk6+YlPlF
CuhHB6aPEymQ45/GseUwEb5AuWWKS03P3XdOBj2IdDhuRHRfyjHi+31s0lh6bfNs
kC2eP+VtddVqLbOdPE1GKjZi+Iq9ffFRmiNTduHbl3o6ZoE6WnMvn2wsC6zPpJr4
AM8zpj9JFUq1DYaQf8bJJLTm8lItcyyUUhq8oKUxBpecjWfhUYwEYHGxqWIk9fbM
oyJ3zBlU0G7bc4gu1VG3cTacUf//BTmQG3iZNSSw9w17I1hOMX9wNJan22VkEbJD
8X+6AcgkldNwnAVWcVgg1lkfgdxlWURMYXl1uQPQZSj2SQL6gFcdkAhKwlgl8jOi
RctUfZ69tD+4ul2b+MhL6BtIk16o2RIBJF53ESxrR4qftRZ3ywW4JQ/AwNoQalxM
Xg8U3axFm8kitOzemVoXGr/AIee9KpcA4NK8yQLEPmSFgBaVzaVlP9Q7Klr7QGXg
RvK45uRULrWnvSyoVY7Ox+ie1NiG8JAbaxKBxkV8n+z+gVMh82xcyQX+5SgWxwW0
dzeEx1B0mCW4WDAK29CcdQVE6m4K2u/fabhZq5IhgS1mQoTC894=
=EFm/
-----END PGP SIGNATURE-----
Merge tag 'gpio-v5.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio
Pull GPIO updates from Linus Walleij:
"This is the bulk of GPIO development for the v5.7 kernel cycle.
Core and userspace API:
- The userspace API KFIFOs have been imoproved with locks that do not
block interrupts. This makes us better at getting events to
userspace without blocking or disturbing new events arriving in the
same time. This was reviewed by the KFIFO maintainer Stefani. This
is a generic improvement which paves the road for similar
improvements in other subsystems.
- We provide a new ioctl() for monitoring changes in the line
information, such as when multiple clients are taking lines and
giving them back, possibly reconfiguring them in the process: we
can now monitor that and not get stuck with stale static
information.
- An example tool 'gpio-watch' is provided to showcase this
functionality.
- Timestamps for events are switched to ktime_get_ns() which is
monotonic. We previously had a 'realtime' stamp which could move
forward and *backward* in time, which probably would just cause
silent bugs and weird behaviour. In the long run we see two
relevant timestamps: ktime_get_ns() or the timestamp sometimes
provided by the GPIO hardware itself, if that exists.
- Device Tree overlay support for GPIO hogs. On systems that load
overlays, these overlays can now contain hogs, and will then be
respected.
- Handle pin control interaction with nonexisting pin ranges in the
GPIO library core instead of in the individual drivers.
New drivers:
- New driver for the Mellanox BlueField 2 GPIO controller.
Driver improvements:
- Introduce the BGPIOF_NO_SET_ON_INPUT flag to the generic MMIO GPIO
library and use this flag in the MT7621 driver.
- Texas Instruments OMAP CPU power management improvements, such as
blocking of idle on pending GPIO interrupts"
* tag 'gpio-v5.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: (59 commits)
Revert "gpio: eic-sprd: Use devm_platform_ioremap_resource()"
pinctrl: Unconditionally assign .request()/.free()
gpio: Unconditionally assign .request()/.free()
gpio: export of_pinctrl_get to modules
pinctrl: Define of_pinctrl_get() dummy for !PINCTRL
gpio: Rename variable in core APIs
gpio: Avoid using pin ranges with !PINCTRL
gpiolib: Remove unused gpio_chip parameter from gpio_set_bias()
gpiolib: Pass gpio_desc to gpio_set_config()
gpiolib: Introduce gpiod_set_config()
tools: gpio: Fix out-of-tree build regression
gpio: gpiolib: fix a doc warning
gpio: tegra186: Add Tegra194 pin ranges for GG.0 and GG.1
gpio: tegra186: Add support for pin ranges
gpio: Support GPIO controllers without pin-ranges
ARM: integrator: impd1: Use GPIO_LOOKUP() helper macro
gpio: brcmstb: support gpio-line-names property
tools: gpio: Fix typo in gpio-utils
tools: gpio-hammer: Apply scripts/Lindent and retain good changes
gpiolib: gpio_name_to_desc: factor out !name check
...
Change a header to mandatory-y if both of the following are met:
[1] At least one architecture (except um) specifies it as generic-y in
arch/*/include/asm/Kbuild
[2] Every architecture (except um) either has its own implementation
(arch/*/include/asm/*.h) or specifies it as generic-y in
arch/*/include/asm/Kbuild
This commit was generated by the following shell script.
----------------------------------->8-----------------------------------
arches=$(cd arch; ls -1 | sed -e '/Kconfig/d' -e '/um/d')
tmpfile=$(mktemp)
grep "^mandatory-y +=" include/asm-generic/Kbuild > $tmpfile
find arch -path 'arch/*/include/asm/Kbuild' |
xargs sed -n 's/^generic-y += \(.*\)/\1/p' | sort -u |
while read header
do
mandatory=yes
for arch in $arches
do
if ! grep -q "generic-y += $header" arch/$arch/include/asm/Kbuild &&
! [ -f arch/$arch/include/asm/$header ]; then
mandatory=no
break
fi
done
if [ "$mandatory" = yes ]; then
echo "mandatory-y += $header" >> $tmpfile
for arch in $arches
do
sed -i "/generic-y += $header/d" arch/$arch/include/asm/Kbuild
done
fi
done
sed -i '/^mandatory-y +=/d' include/asm-generic/Kbuild
LANG=C sort $tmpfile >> include/asm-generic/Kbuild
----------------------------------->8-----------------------------------
One obvious benefit is the diff stat:
25 files changed, 52 insertions(+), 557 deletions(-)
It is tedious to list generic-y for each arch that needs it.
So, mandatory-y works like a fallback default (by just wrapping
asm-generic one) when arch does not have a specific header
implementation.
See the following commits:
def3f7cefea1b39bae16
It is tedious to convert headers one by one, so I processed by a shell
script.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Simek <michal.simek@xilinx.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Link: http://lkml.kernel.org/r/20200210175452.5030-1-masahiroy@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull networking updates from David Miller:
"Highlights:
1) Fix the iwlwifi regression, from Johannes Berg.
2) Support BSS coloring and 802.11 encapsulation offloading in
hardware, from John Crispin.
3) Fix some potential Spectre issues in qtnfmac, from Sergey
Matyukevich.
4) Add TTL decrement action to openvswitch, from Matteo Croce.
5) Allow paralleization through flow_action setup by not taking the
RTNL mutex, from Vlad Buslov.
6) A lot of zero-length array to flexible-array conversions, from
Gustavo A. R. Silva.
7) Align XDP statistics names across several drivers for consistency,
from Lorenzo Bianconi.
8) Add various pieces of infrastructure for offloading conntrack, and
make use of it in mlx5 driver, from Paul Blakey.
9) Allow using listening sockets in BPF sockmap, from Jakub Sitnicki.
10) Lots of parallelization improvements during configuration changes
in mlxsw driver, from Ido Schimmel.
11) Add support to devlink for generic packet traps, which report
packets dropped during ACL processing. And use them in mlxsw
driver. From Jiri Pirko.
12) Support bcmgenet on ACPI, from Jeremy Linton.
13) Make BPF compatible with RT, from Thomas Gleixnet, Alexei
Starovoitov, and your's truly.
14) Support XDP meta-data in virtio_net, from Yuya Kusakabe.
15) Fix sysfs permissions when network devices change namespaces, from
Christian Brauner.
16) Add a flags element to ethtool_ops so that drivers can more simply
indicate which coalescing parameters they actually support, and
therefore the generic layer can validate the user's ethtool
request. Use this in all drivers, from Jakub Kicinski.
17) Offload FIFO qdisc in mlxsw, from Petr Machata.
18) Support UDP sockets in sockmap, from Lorenz Bauer.
19) Fix stretch ACK bugs in several TCP congestion control modules,
from Pengcheng Yang.
20) Support virtual functiosn in octeontx2 driver, from Tomasz
Duszynski.
21) Add region operations for devlink and use it in ice driver to dump
NVM contents, from Jacob Keller.
22) Add support for hw offload of MACSEC, from Antoine Tenart.
23) Add support for BPF programs that can be attached to LSM hooks,
from KP Singh.
24) Support for multiple paths, path managers, and counters in MPTCP.
From Peter Krystad, Paolo Abeni, Florian Westphal, Davide Caratti,
and others.
25) More progress on adding the netlink interface to ethtool, from
Michal Kubecek"
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2121 commits)
net: ipv6: rpl_iptunnel: Fix potential memory leak in rpl_do_srh_inline
cxgb4/chcr: nic-tls stats in ethtool
net: dsa: fix oops while probing Marvell DSA switches
net/bpfilter: remove superfluous testing message
net: macb: Fix handling of fixed-link node
net: dsa: ksz: Select KSZ protocol tag
netdevsim: dev: Fix memory leak in nsim_dev_take_snapshot_write
net: stmmac: add EHL 2.5Gbps PCI info and PCI ID
net: stmmac: add EHL PSE0 & PSE1 1Gbps PCI info and PCI ID
net: stmmac: create dwmac-intel.c to contain all Intel platform
net: dsa: bcm_sf2: Support specifying VLAN tag egress rule
net: dsa: bcm_sf2: Add support for matching VLAN TCI
net: dsa: bcm_sf2: Move writing of CFP_DATA(5) into slicing functions
net: dsa: bcm_sf2: Check earlier for FLOW_EXT and FLOW_MAC_EXT
net: dsa: bcm_sf2: Disable learning for ASP port
net: dsa: b53: Deny enslaving port 7 for 7278 into a bridge
net: dsa: b53: Prevent tagged VLAN on port 7 for 7278
net: dsa: b53: Restore VLAN entries upon (re)configuration
net: dsa: bcm_sf2: Fix overflow checks
hv_netvsc: Remove unnecessary round_up for recv_completion_cnt
...
Pull x86 build updates from Ingo Molnar:
"A handful of updates: two linker script cleanups and a stock
defconfig+allmodconfig bootability fix"
* 'x86-build-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/vdso: Discard .note.gnu.property sections in vDSO
x86, vmlinux.lds: Add RUNTIME_DISCARD_EXIT to generic DISCARDS
x86/Kconfig: Make CMDLINE_OVERRIDE depend on non-empty CMDLINE
Core:
- Consolidation of the vDSO build infrastructure to address the
difficulties of cross-builds for ARM64 compat vDSO libraries by
restricting the exposure of header content to the vDSO build.
This is achieved by splitting out header content into separate
headers. which contain only the minimaly required information which is
necessary to build the vDSO. These new headers are included from the
kernel headers and the vDSO specific files.
- Enhancements to the generic vDSO library allowing more fine grained
control over the compiled in code, further reducing architecture
specific storage and preparing for adopting the generic library by PPC.
- Cleanup and consolidation of the exit related code in posix CPU timers.
- Small cleanups and enhancements here and there
Drivers:
- The obligatory new drivers: Ingenic JZ47xx and X1000 TCU support
- Correct the clock rate of PIT64b global clock
- setup_irq() cleanup
- Preparation for PWM and suspend support for the TI DM timer
- Expand the fttmr010 driver to support ast2600 systems
- The usual small fixes, enhancements and cleanups all over the place
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl6B+QETHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYofJ5D/94s5fpaqiuNcaAsLq2D3DRIrTnqxx7
yEeAOPcbYV1bM1SgY/M83L5yGc2S8ny787e26abwRTCZhZV3eAmRTphIFFIZR0Xk
xS+i67odscbdJTRtztKj3uQ9rFxefszRuphyaa89pwSY9nnyMWLcahGSQOGs0LJK
hvmgwPjyM1drNfPxgPiaFg7vDr2XxNATpQr/FBt+BhelvVan8TlAfrkcNPiLr++Y
Axz925FP7jMaRRbZ1acji34gLiIAZk0jLCUdbix7YkPrqDB4GfO+v8Vez+fGClbJ
uDOYeR4r1+Be/BtSJtJ2tHqtsKCcAL6agtaE2+epZq5HbzaZFRvBFaxgFNF8WVcn
3FFibdEMdsRNfZTUVp5wwgOLN0UIqE/7LifE12oLEL2oFB5H2PiNEUw3E02XHO11
rL3zgHhB6Ke1sXKPCjSGdmIQLbxZmV5kOlQFy7XuSeo5fmRapVzKNffnKcftIliF
1HNtZbgdA+3tdxMFCqoo1QX+kotl9kgpslmdZ0qHAbaRb3xqLoSskbqEjFRMuSCC
8bjJrwboD9T5GPfwodSCgqs/58CaSDuqPFbIjCay+p90Fcg6wWAkZtyG04ZLdPRc
GgNNdN4gjTD9bnrRi8cH47z1g8OO4vt4K4SEbmjo8IlDW+9jYMxuwgR88CMeDXd7
hu7aKsr2I2q/WQ==
=5o9G
-----END PGP SIGNATURE-----
Merge tag 'timers-core-2020-03-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timekeeping and timer updates from Thomas Gleixner:
"Core:
- Consolidation of the vDSO build infrastructure to address the
difficulties of cross-builds for ARM64 compat vDSO libraries by
restricting the exposure of header content to the vDSO build.
This is achieved by splitting out header content into separate
headers. which contain only the minimaly required information which
is necessary to build the vDSO. These new headers are included from
the kernel headers and the vDSO specific files.
- Enhancements to the generic vDSO library allowing more fine grained
control over the compiled in code, further reducing architecture
specific storage and preparing for adopting the generic library by
PPC.
- Cleanup and consolidation of the exit related code in posix CPU
timers.
- Small cleanups and enhancements here and there
Drivers:
- The obligatory new drivers: Ingenic JZ47xx and X1000 TCU support
- Correct the clock rate of PIT64b global clock
- setup_irq() cleanup
- Preparation for PWM and suspend support for the TI DM timer
- Expand the fttmr010 driver to support ast2600 systems
- The usual small fixes, enhancements and cleanups all over the
place"
* tag 'timers-core-2020-03-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (80 commits)
Revert "clocksource/drivers/timer-probe: Avoid creating dead devices"
vdso: Fix clocksource.h macro detection
um: Fix header inclusion
arm64: vdso32: Enable Clang Compilation
lib/vdso: Enable common headers
arm: vdso: Enable arm to use common headers
x86/vdso: Enable x86 to use common headers
mips: vdso: Enable mips to use common headers
arm64: vdso32: Include common headers in the vdso library
arm64: vdso: Include common headers in the vdso library
arm64: Introduce asm/vdso/processor.h
arm64: vdso32: Code clean up
linux/elfnote.h: Replace elf.h with UAPI equivalent
scripts: Fix the inclusion order in modpost
common: Introduce processor.h
linux/ktime.h: Extract common header for vDSO
linux/jiffies.h: Extract common header for vDSO
linux/time64.h: Extract common header for vDSO
linux/time32.h: Extract common header for vDSO
linux/time.h: Extract common header for vDSO
...
Move access_ok() in and pagefault_enable()/pagefault_disable() out.
Mechanical conversion only - some instances don't really need
a separate access_ok() at all (e.g. the ones only using
get_user()/put_user(), or architectures where access_ok()
is always true); we'll deal with that in followups.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
In the x86 kernel, .exit.text and .exit.data sections are discarded at
runtime, not by the linker. Add RUNTIME_DISCARD_EXIT to generic DISCARDS
and define it in the x86 kernel linker script to keep them.
The sections are added before the DISCARD directive so document here
only the situation explicitly as this change doesn't have any effect on
the generated kernel. Also, other architectures like ARM64 will use it
too so generalize the approach with the RUNTIME_DISCARD_EXIT define.
[ bp: Massage and extend commit message. ]
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lkml.kernel.org/r/20200326193021.255002-1-hjl.tools@gmail.com
This switches atomic-instrumented.h to use the generic instrumentation
wrappers provided by instrumented.h.
No functional change intended.
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Marco Elver <elver@google.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Simplify gen_btf logic to make it work with llvm-objcopy. The existing
'file format' and 'architecture' parsing logic is brittle and does not
work with llvm-objcopy/llvm-objdump.
'file format' output of llvm-objdump>=11 will match GNU objdump, but
'architecture' (bfdarch) may not.
.BTF in .tmp_vmlinux.btf is non-SHF_ALLOC. Add the SHF_ALLOC flag
because it is part of vmlinux image used for introspection. C code
can reference the section via linker script defined __start_BTF and
__stop_BTF. This fixes a small problem that previous .BTF had the
SHF_WRITE flag (objcopy -I binary -O elf* synthesized .data).
Additionally, `objcopy -I binary` synthesized symbols
_binary__btf_vmlinux_bin_start and _binary__btf_vmlinux_bin_stop (not
used elsewhere) are replaced with more commonplace __start_BTF and
__stop_BTF.
Add 2>/dev/null because GNU objcopy (but not llvm-objcopy) warns
"empty loadable segment detected at vaddr=0xffffffff81000000, is this intentional?"
We use a dd command to change the e_type field in the ELF header from
ET_EXEC to ET_REL so that lld will accept .btf.vmlinux.bin.o. Accepting
ET_EXEC as an input file is an extremely rare GNU ld feature that lld
does not intend to support, because this is error-prone.
The output section description .BTF in include/asm-generic/vmlinux.lds.h
avoids potential subtle orphan section placement issues and suppresses
--orphan-handling=warn warnings.
Fixes: df786c9b94 ("bpf: Force .BTF section start to zero when dumping from vmlinux")
Fixes: cb0cc635c7 ("powerpc: Include .BTF section")
Reported-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Fangrui Song <maskray@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Stanislav Fomichev <sdf@google.com>
Tested-by: Andrii Nakryiko <andriin@fb.com>
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Link: https://github.com/ClangBuiltLinux/linux/issues/871
Link: https://lore.kernel.org/bpf/20200318222746.173648-1-maskray@google.com
The comment in 'asm-generic/bitops.h' states that you should "recode
these in the native assembly language, if at all possible". This is
pretty crappy advice now that the generic implementation is defined in
terms of atomic_long_t rather than a spinlock, so update the comment and
hopefully save future architecture maintainers a bit of work.
Reported-by: Stefan Asserhall <stefana@xilinx.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200213093927.1836-1-will@kernel.org
If the current clocksource is not VDSO capable there is no point in
updating the high resolution parts of the VDSO data.
Replace the architecture specific check with a check for a VDSO capable
clocksource and skip the update if there is none.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Link: https://lkml.kernel.org/r/20200207124403.563379423@linutronix.de
Now that all architectures are converted to use the generic storage the
helpers and conditionals can be removed.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Link: https://lkml.kernel.org/r/20200207124403.470699892@linutronix.de
Inclusion of kernel.h increases the mess with the header dependencies.
Avoid kernel.h inclusion where it's possible.
Besides that, clean up a bit other inclusions inside GPIO subsystem headers.
It includes:
- removal pin control bits (forward declaration and header) from linux/gpio.h
- removal of.h from asm-generic/gpio.h
- use of explicit headers in gpio/consumer.h
- add FIXME note with regard to gpio.h inclusion in of_gpio,h
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20200205134336.20197-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
- Enable CMA
- Add support for MB v11
- Defconfig updates
- Minor fixes
-----BEGIN PGP SIGNATURE-----
iF0EABECAB0WIQQbPNTMvXmYlBPRwx7KSWXLKUoMIQUCXjlJ1gAKCRDKSWXLKUoM
IWy9AJ4tauV9sUb+zNadrYxI+2zemRstUwCfQ49LG4kHpFCv8ldSTmhBPJY/3MI=
=QpT4
-----END PGP SIGNATURE-----
Merge tag 'microblaze-v5.6-rc1' of git://git.monstr.eu/linux-2.6-microblaze
Pull Microblaze update from Michal Simek:
- enable CMA
- add support for MB v11
- defconfig updates
- minor fixes
* tag 'microblaze-v5.6-rc1' of git://git.monstr.eu/linux-2.6-microblaze:
microblaze: Add ID for Microblaze v11
microblaze: Prevent the overflow of the start
microblaze: Wire CMA allocator
asm-generic: Make dma-contiguous.h a mandatory include/asm header
microblaze: Sync defconfig with latest Kconfig layout
microblaze: defconfig: Disable EXT2 driver and Enable EXT3 & EXT4 drivers
microblaze: Align comments with register usage
dma-continuguous.h is generic for all architectures except arm32 which has
its own version.
Similar change was done for msi.h by commit a1b39bae16
("asm-generic: Make msi.h a mandatory include/asm header")
Suggested-by: Christoph Hellwig <hch@infradead.org>
Link: https://lore.kernel.org/linux-arm-kernel/20200117080446.GA8980@lst.de/T/#m92bb56b04161057635d4142e1b3b9b6b0a70122e
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Paul Walmsley <paul.walmsley@sifive.com> # for arch/riscv
Merge more updates from Andrew Morton:
"The rest of MM and the rest of everything else: hotfixes, ipc, misc,
procfs, lib, cleanups, arm"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (67 commits)
ARM: dma-api: fix max_pfn off-by-one error in __dma_supported()
treewide: remove redundant IS_ERR() before error code check
include/linux/cpumask.h: don't calculate length of the input string
lib: new testcases for bitmap_parse{_user}
lib: rework bitmap_parse()
lib: make bitmap_parse_user a wrapper on bitmap_parse
lib: add test for bitmap_parse()
bitops: more BITS_TO_* macros
lib/string: add strnchrnul()
proc: convert everything to "struct proc_ops"
proc: decouple proc from VFS with "struct proc_ops"
asm-generic/tlb: provide MMU_GATHER_TABLE_FREE
asm-generic/tlb: rename HAVE_MMU_GATHER_NO_GATHER
asm-generic/tlb: rename HAVE_MMU_GATHER_PAGE_SIZE
asm-generic/tlb: rename HAVE_RCU_TABLE_FREE
asm-generic/tlb: add missing CONFIG symbol
asm-gemeric/tlb: remove stray function declarations
asm-generic/tlb: avoid potential double flush
mm/mmu_gather: invalidate TLB correctly on batch allocation failure and flush
powerpc/mmu_gather: enable RCU_TABLE_FREE even for !SMP case
...
As described in the comment, the correct order for freeing pages is:
1) unhook page
2) TLB invalidate page
3) free page
This order equally applies to page directories.
Currently there are two correct options:
- use tlb_remove_page(), when all page directores are full pages and
there are no futher contraints placed by things like software
walkers (HAVE_FAST_GUP).
- use MMU_GATHER_RCU_TABLE_FREE and tlb_remove_table() when the
architecture does not do IPI based TLB invalidate and has
HAVE_FAST_GUP (or software TLB fill).
This however leaves architectures that don't have page based directories
but don't need RCU in a bind. For those, provide MMU_GATHER_TABLE_FREE,
which provides the independent batching for directories without the
additional RCU freeing.
Link: http://lkml.kernel.org/r/20200116064531.483522-10-aneesh.kumar@linux.ibm.com
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Towards a more consistent naming scheme.
Link: http://lkml.kernel.org/r/20200116064531.483522-9-aneesh.kumar@linux.ibm.com
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Towards a more consistent naming scheme.
Link: http://lkml.kernel.org/r/20200116064531.483522-8-aneesh.kumar@linux.ibm.com
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We removed the actual functions a while ago.
Link: http://lkml.kernel.org/r/20200116064531.483522-5-aneesh.kumar@linux.ibm.com
Fixes: 1808d65b55 ("asm-generic/tlb: Remove arch_tlb*_mmu()")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Aneesh reported that:
tlb_flush_mmu()
tlb_flush_mmu_tlbonly()
tlb_flush() <-- #1
tlb_flush_mmu_free()
tlb_table_flush()
tlb_table_invalidate()
tlb_flush_mmu_tlbonly()
tlb_flush() <-- #2
does two TLBIs when tlb->fullmm, because __tlb_reset_range() will not
clear tlb->end in that case.
Observe that any caller to __tlb_adjust_range() also sets at least one of
the tlb->freed_tables || tlb->cleared_p* bits, and those are
unconditionally cleared by __tlb_reset_range().
Change the condition for actually issuing TLBI to having one of those bits
set, as opposed to having tlb->end != 0.
Link: http://lkml.kernel.org/r/20200116064531.483522-4-aneesh.kumar@linux.ibm.com
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Reported-by: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Architectures for which we have hardware walkers of Linux page table
should flush TLB on mmu gather batch allocation failures and batch flush.
Some architectures like POWER supports multiple translation modes (hash
and radix) and in the case of POWER only radix translation mode needs the
above TLBI. This is because for hash translation mode kernel wants to
avoid this extra flush since there are no hardware walkers of linux page
table. With radix translation, the hardware also walks linux page table
and with that, kernel needs to make sure to TLB invalidate page walk cache
before page table pages are freed.
More details in commit d86564a2f0 ("mm/tlb, x86/mm: Support invalidating
TLB caches for RCU_TABLE_FREE")
The changes to sparc are to make sure we keep the old behavior since we
are now removing HAVE_RCU_TABLE_NO_INVALIDATE. The default value for
tlb_needs_table_invalidate is to always force an invalidate and sparc can
avoid the table invalidate. Hence we define tlb_needs_table_invalidate to
false for sparc architecture.
Link: http://lkml.kernel.org/r/20200116064531.483522-3-aneesh.kumar@linux.ibm.com
Fixes: a46cc7a90f ("powerpc/mm/radix: Improve TLB/PWC flushes")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc]
Cc: <stable@vger.kernel.org> [4.14+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "Generic page walk and ptdump", v17.
Many architectures current have a debugfs file for dumping the kernel page
tables. Currently each architecture has to implement custom functions for
this because the details of walking the page tables used by the kernel are
different between architectures.
This series extends the capabilities of walk_page_range() so that it can
deal with the page tables of the kernel (which have no VMAs and can
contain larger huge pages than exist for user space). A generic PTDUMP
implementation is the implemented making use of the new functionality of
walk_page_range() and finally arm64 and x86 are switch to using it,
removing the custom table walkers.
To enable a generic page table walker to walk the unusual mappings of the
kernel we need to implement a set of functions which let us know when the
walker has reached the leaf entry. After a suggestion from Will Deacon
I've chosen the name p?d_leaf() as this (hopefully) describes the purpose
(and is a new name so has no historic baggage). Some architectures have
p?d_large macros but this is easily confused with "large pages".
This series ends with a generic PTDUMP implemention for arm64 and x86.
Mostly this is a clean up and there should be very little functional
change. The exceptions are:
* arm64 PTDUMP debugfs now displays pages which aren't present (patch 22).
* arm64 has the ability to efficiently process KASAN pages (which
previously only x86 implemented). This means that the combination of
KASAN and DEBUG_WX is now useable.
This patch (of 23):
Exposing the pud/pgd levels of the page tables to walk_page_range() means
we may come across the exotic large mappings that come with large areas of
contiguous memory (such as the kernel's linear map).
For architectures that don't provide all p?d_leaf() macros, provide
generic do nothing default that are suitable where there cannot be leaf
pages at that level. Futher patches will add implementations for
individual architectures.
The name p?d_leaf() is chosen to minimize the confusion with existing uses
of "large" pages and "huge" pages which do not necessary mean that the
entry is a leaf (for example it may be a set of contiguous entries that
only take 1 TLB slot). For the purpose of walking the page tables we
don't need to know how it will be represented in the TLB, but we do need
to know for sure if it is a leaf of the tree.
Link: http://lkml.kernel.org/r/20191218162402.45610-2-steven.price@arm.com
Signed-off-by: Steven Price <steven.price@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Morse <james.morse@arm.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Liang, Kan" <kan.liang@linux.intel.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: James Hogan <jhogan@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Zong Li <zong.li@sifive.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull percpu updates from Dennis Zhou:
"Separate out variables that can be decrypted into their own page
anytime encryption can be enabled and fix __percpu annotations in
asm-generic for sparse"
* 'for-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu:
percpu: Separate decrypted varaibles anytime encryption can be enabled
percpu: fix __percpu annotation in asm-generic
Summary of modules changes for the 5.6 merge window:
- Add "MS" (SHF_MERGE|SHF_STRINGS) section flags to __ksymtab_strings to
indicate to the linker that it can perform string deduplication (i.e.,
duplicate strings are reduced to a single copy in the string table).
This means any repeated namespace string would be merged to just one
entry in __ksymtab_strings.
- Various code cleanups and small fixes (fix small memleak in error path,
improve moduleparam docs, silence rcu warnings, improve error logging)
Signed-off-by: Jessica Yu <jeyu@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEVrp26glSWYuDNrCUwEV+OM47wXIFAl40TvwQHGpleXVAa2Vy
bmVsLm9yZwAKCRDARX44zjvBcigxD/4/ksGeXvf3tcsRc5M5S33Tws25vcHeByz/
WEX1f7ZnXukCApFdnpUbVkjiH7EM0+T6lGumv4NPJht+ggP8JoY9hMkBqMmd0js/
+R9U6o0vB4LW8zU68RwE0TS4qphpmpJz16HlhTPtIk4Vo0GBxnEYMMMcVWIeqq1W
m3KcEUudv9/Y7IFawDNRJcUWI1jD2vcfaavbU6XbTw82ARiiScZFrWYzf1PGYJ6L
XvJNwCVh8TDbS4C5kaNWp2LiGXegjKClosdisCIjkQr/3e+Rg1jOGHpa6B2+Vow2
ttq6lmcikNpcCkCV1tFz+ex2LLsLBMAO939c2C0LIhnnIxVgSkDU0pWn3psAxiOl
lRqHtQN42dRlOtBwZ9JoKTT9Wi3H/Lx0FCxg5OdblrSlOqH+GxQjBLkgtvmn/ZAh
/dReehUoqbL55GieZuPPyostg3upCDE27IQZdFrZLWbE0VGiIyU9p6GYo7Tssuo2
Tr8kmhYUF9o1AnlzVQgGgZF73PpM6vhmEnn/dipZrgFI//2A3xkAfi5JdhGLKsFi
UsaeTX3q/AmnC8dqaNayiftSgaK/4hdSboW1hgWLLD98H608s7Bl1reTmXPxSyWj
RvBVP0vp5+u9EItfkAG6jbEpM5ZtyFDUc+5KNfJhym6vaplp5H+krIrT2Li+oLUu
d/eifJ/1vA==
=boqg
-----END PGP SIGNATURE-----
Merge tag 'modules-for-v5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux
Pull module updates from Jessica Yu:
"Summary of modules changes for the 5.6 merge window:
- Add "MS" (SHF_MERGE|SHF_STRINGS) section flags to __ksymtab_strings
to indicate to the linker that it can perform string deduplication
(i.e., duplicate strings are reduced to a single copy in the string
table). This means any repeated namespace string would be merged to
just one entry in __ksymtab_strings.
- Various code cleanups and small fixes (fix small memleak in error
path, improve moduleparam docs, silence rcu warnings, improve error
logging)"
* tag 'modules-for-v5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux:
module.h: Annotate mod_kallsyms with __rcu
module: avoid setting info->name early in case we can fall back to info->mod->name
modsign: print module name along with error message
kernel/module: Fix memleak in module_add_modinfo_attrs()
export.h: reduce __ksymtab_strings string duplication by using "MS" section flags
moduleparam: fix kerneldoc
modules: lockdep: Suppress suspicious RCU usage warning
The generic implementation of raw_cpu_generic_add_return() is:
#define raw_cpu_generic_add_return(pcp, val) \
({ \
typeof(&(pcp)) __p = raw_cpu_ptr(&(pcp)); \
\
*__p += val; \
*__p; \
})
where the 'pcp' argument is a __percpu lvalue.
There, the variable '__p' is declared as a __percpu pointer
the type of the address of 'pcp') but:
1) the value assigned to it, the return value of raw_cpu_ptr(), is
a plain (__kernel) pointer, not a __percpu one.
2) this variable is dereferenced just after while a __percpu
pointer is implicitly __noderef.
So, fix the declaration of the 'pcp' variable to its correct type:
the plain (non-percpu) pointer corresponding to pcp's address,
using the fact that typeof() ignores the address space and the
'noderef' attribute of its agument.
Same for raw_cpu_generic_xchg(), raw_cpu_generic_cmpxchg() &
raw_cpu_generic_cmpxchg_double().
This removes 209 warnings on ARM, 525 on ARM64, 220 on x86 &
more than 2600 on ppc64 (all of them with the default config).
Cc: Tejun Heo <tj@kernel.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Reported-by: Ben Dooks <ben.dooks@codethink.co.uk>
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Acked-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Dennis Zhou <dennis@kernel.org>
Unfortunately, GCC 9.1 is expected to be be released without support for
MPX. This means that there was only a relatively small window where
folks could have ever used MPX. It failed to gain wide adoption in the
industry, and Linux was the only mainstream OS to ever support it widely.
Support for the feature may also disappear on future processors.
This set completes the process that we started during the 5.4 merge window.
-----BEGIN PGP SIGNATURE-----
iQIcBAABCAAGBQJeK1/pAAoJEGg1lTBwyZKwgC8QAIiVn1d7A9Uj/WpnpgfCChCZ
9XiV6Ak999qD9fbAcrgNfPjieaD4mtokocSRVJuRgJu5iLnIJCINlozLPe4yVl7P
7zebnxkLq0CIA8d56bEUoFlC0J+oWYlDVQePZzNQsSk5KHVGXVLpF6U4vDVzZeQy
cprgvdeY+ehB7G6IIo0MWTg5ylKYAsOAyVvK8NIGpKY2k6/YqCnsptnsVE7bvlHy
TrEOiUWLv+hh0bMkZdP1PwKQKEuMO/IZly0HtviFbMN7T4TB1spfg7ELoBucEq3T
s4EVbYRe+nIE4tuEAveaX3CgxJek8cY5MlticskdaKSEACBwabdOF55qsZy0u+WA
PYC4iUIXfbOH8OgieKWtGX4IuSkRYdQ2nP4BOpe4ZX4+zvU7zOCIyVSKRrwkX8cc
ADtWI5FAtB36KCgUuWnHGHNZpOxPTbTLBuBataFY4Q2uBNJEBJpscZ5H9ObtyGFU
ZjlzqFnM0nFNDKEI1EEtv9jLzgZTU1RQ46s7EFeSeEQ2/s9wJ3+s5sBlVbljsmus
o658bLOEaRWC/aF15dgmEXW9GAO6uifNdmbzGnRn7oEMYyFQPTWbZvi1zGz58QaG
Y6WTtigVtsSrHS4wpYd+p+n1W06VnB6J3BpBM4G1VQv1Vm0dNd1tUOfkqOzPjg7c
33Itmsz2LaW1mb67GlgZ
=g4cC
-----END PGP SIGNATURE-----
Merge tag 'mpx-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/daveh/x86-mpx
Pull x86 MPX removal from Dave Hansen:
"MPX requires recompiling applications, which requires compiler
support. Unfortunately, GCC 9.1 is expected to be be released without
support for MPX. This means that there was only a relatively small
window where folks could have ever used MPX. It failed to gain wide
adoption in the industry, and Linux was the only mainstream OS to ever
support it widely.
Support for the feature may also disappear on future processors.
This set completes the process that we started during the 5.4 merge
window when the MPX prctl()s were removed. XSAVE support is left in
place, which allows MPX-using KVM guests to continue to function"
* tag 'mpx-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/daveh/x86-mpx:
x86/mpx: remove MPX from arch/x86
mm: remove arch_bprm_mm_init() hook
x86/mpx: remove bounds exception code
x86/mpx: remove build infrastructure
x86/alternatives: add missing insn.h include
- Make the update to the coarse timekeeper unconditional. This is required
because the coarse timekeeper interfaces in the VDSO do not depend on a
VDSO capable clocksource. If the system does not have a VDSO capable
clocksource and the update is depending on the VDSO capable clocksource,
the coarse VDSO interfaces would operate on stale data forever.
- Invert the logic of __arch_update_vdso_data() to avoid further head
scratching. Tripped over this several times while analyzing the update
problem above.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl4vXzUTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYodbPD/4km+XOhsbefcn1Xo6SAQV9akPhKSHY
h1gfjpe4UD+Uj4WfmpERHcCJA3sYtZSjNyEWkwagH1XjB+rcLc3JE8XvhPCZTXCx
g/OQlww1ef6mBZ5nslpPUZs8i0HppoV7Sa955QxR/jWuOIEssg5c+XGqP8xX8AhX
TqBOUcJd0LhqCGt76Gb6LHnOEshE8e6ptZ0xayzMZsab3LJTEaJCrsoDpADQ1q8A
hMjiL3CG9/e12qKYhODFTbyc/wgyGQYK8g6sb9E1Twd2Tw2+ikRbtZuQd3HQv4jV
SiVtmMqLu6IH+G608zeNIn/67/WX9zYqUZ3fZgSjBwXWoB84Gyj11KLnjmCgS6SH
0ddOQKPn8VyQc2anG4obRtMNB+TjJvGnB4QSL2ROJB7Zx6EYMsduhXwIbaNZDDro
nIh6Xvl6iyb0lkhd9zCR7ak7UHJg4ECJsVKK3kAMIHJM4f53d/DwT+ZaHbJZa/2a
OLoBGpBkJoE1X40dXou+0FUyUFRla42+ho99nCU580EyK/ZAuZEqKjjez9QIh4vN
L/I6uEHGBw9myB40nb0DFhRIFR97BUkRTRA3VhyX0CYIE3gUL43zNFsdvcugsxRy
4/Cf7tqhQcSjYjJxpLTRRWt2t6QvDoWfTnrwiPqSepcO17uV8WHLrxK4mT2i8Vjc
PIq7OgZlp09gQA==
=ONO4
-----END PGP SIGNATURE-----
Merge tag 'timers-urgent-2020-01-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Thomas Gleixner:
"Two fixes for the generic VDSO code which missed 5.5:
- Make the update to the coarse timekeeper unconditional.
This is required because the coarse timekeeper interfaces in the
VDSO do not depend on a VDSO capable clocksource. If the system
does not have a VDSO capable clocksource and the update is
depending on the VDSO capable clocksource, the coarse VDSO
interfaces would operate on stale data forever.
- Invert the logic of __arch_update_vdso_data() to avoid further head
scratching.
Tripped over this several times while analyzing the update problem
above"
* tag 'timers-urgent-2020-01-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
lib/vdso: Update coarse timekeeper unconditionally
lib/vdso: Make __arch_update_vdso_data() logic understandable
- remove ioremap_nocache given that is is equivalent to
ioremap everywhere
-----BEGIN PGP SIGNATURE-----
iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAl4vKHwLHGhjaEBsc3Qu
ZGUACgkQD55TZVIEUYMPGBAAuVNUZaZfWYHpiVP2oRcUQUguFiD3NTbknsyzV2oH
J9P0GfeENSKwE9OOhZ7XIjnCZAJwQgTK/ppQY5yiQ/KAtYyyXjXEJ6jqqjiTDInr
+3+I3t/LhkgrK7tMrb7ylTGa/d7KhaciljnOXC8+b75iddvM9I1z2pbHDbppZMS9
wT4RXL/cFtRb85AfOyPLybcka3f5P2gGvQz38qyimhJYEzHDXZu9VO1Bd20f8+Xf
eLBKX0o6yWMhcaPLma8tm0M0zaXHEfLHUKLSOkiOk+eHTWBZ3b/w5nsOQZYZ7uQp
25yaClbameAn7k5dHajduLGEJv//ZjLRWcN3HJWJ5vzO111aHhswpE7JgTZJSVWI
ggCVkytD3ESXapvswmACSeCIDMmiJMzvn6JvwuSMVB7a6e5mcqTuGo/FN+DrBF/R
IP+/gY/T7zIIOaljhQVkiEIIwiD/akYo0V9fheHTBnqcKEDTHV4WjKbeF6aCwcO+
b8inHyXZSKSMG//UlDuN84/KH/o1l62oKaB1uDIYrrL8JVyjAxctWt3GOt5KgSFq
wVz1lMw4kIvWtC/Sy2H4oB+RtODLp6yJDqmvmPkeJwKDUcd/1JKf0KsZ8j3FpGei
/rEkBEss0KBKyFAgBSRO2jIpdj2epgcBcsdB/r5mlhcn8L77AS6mHbA173kY4pQ/
Kdg=
=TUCJ
-----END PGP SIGNATURE-----
Merge tag 'ioremap-5.6' of git://git.infradead.org/users/hch/ioremap
Pull ioremap updates from Christoph Hellwig:
"Remove the ioremap_nocache API (plus wrappers) that are always
identical to ioremap"
* tag 'ioremap-5.6' of git://git.infradead.org/users/hch/ioremap:
remove ioremap_nocache and devm_ioremap_nocache
MIPS: define ioremap_nocache to ioremap
From: Dave Hansen <dave.hansen@linux.intel.com>
MPX is being removed from the kernel due to a lack of support
in the toolchain going forward (gcc).
arch_bprm_mm_init() is used at execve() time. The only non-stub
implementation is on x86 for MPX. Remove the hook entirely from
all architectures and generic code.
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: x86@kernel.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-arch@vger.kernel.org
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
The function name suggests that this is a boolean checking whether the
architecture asks for an update of the VDSO data, but it works the other
way round. To spare further confusion invert the logic.
Fixes: 44f57d788e ("timekeeping: Provide a generic update_vsyscall() implementation")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20200114185946.656652824@linutronix.de
Prefer __always_inline for atomic wrappers. When building for size
(CC_OPTIMIZE_FOR_SIZE), some compilers appear to be less inclined to
inline even relatively small static inline functions that are assumed to
be inlinable such as atomic ops. This can cause problems, for example in
UACCESS regions.
By using __always_inline, we let the real implementation and not the
wrapper determine the final inlining preference.
For x86 tinyconfig we observe:
- vmlinux baseline: 1316204
- vmlinux with patch: 1315988 (-216 bytes)
This came up when addressing UACCESS warnings with CC_OPTIMIZE_FOR_SIZE
in the KCSAN runtime:
http://lkml.kernel.org/r/58708908-84a0-0a81-a836-ad97e33dbb62@infradead.org
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Marco Elver <elver@google.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
ioremap has provided non-cached semantics by default since the Linux 2.6
days, so remove the additional ioremap_nocache interface.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Arnd Bergmann <arnd@arndb.de>
The commit c296d4dc13 ("asm-generic: fix a compilation warning") changed
asm-generic/cachflush.h to use static inlines instead of macros and as a
result the nds32 build with CONFIG_CPU_CACHE_ALIASING=n fails:
CC init/main.o
In file included from arch/nds32/include/asm/cacheflush.h:43,
from include/linux/highmem.h:12,
from include/linux/pagemap.h:11,
from include/linux/blkdev.h:16,
from include/linux/blk-cgroup.h:23,
from include/linux/writeback.h:14,
from init/main.c:44:
include/asm-generic/cacheflush.h:50:20: error: static declaration of 'flush_icache_range' follows non-static declaration
static inline void flush_icache_range(unsigned long start, unsigned long end)
^~~~~~~~~~~~~~~~~~
In file included from include/linux/highmem.h:12,
from include/linux/pagemap.h:11,
from include/linux/blkdev.h:16,
from include/linux/blk-cgroup.h:23,
from include/linux/writeback.h:14,
from init/main.c:44:
arch/nds32/include/asm/cacheflush.h:11:6: note: previous declaration of 'flush_icache_range' was here
void flush_icache_range(unsigned long start, unsigned long end);
^~~~~~~~~~~~~~~~~~
Surround the inline functions in asm-generic/cacheflush.h by ifdef's so
that architectures could override them and add the required overrides to
nds32.
Fixes: c296d4dc13 ("asm-generic: fix a compilation warning")
Link: https://lore.kernel.org/lkml/201912212139.yptX8CsV%25lkp@intel.com/
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Greentime Hu <green.hu@gmail.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Commit c3a6cf19e6 ("export: avoid code duplication in
include/linux/export.h") refactors export.h quite nicely, but introduces
a slight increase in memory usage due to using the empty string ""
instead of NULL to indicate that an exported symbol has no namespace. As
mentioned in that commit, this meant an increase of 1 byte per exported
symbol without a namespace. For example, if a kernel configuration has
about 10k exported symbols, this would mean that the size of
__ksymtab_strings would increase by roughly 10kB.
We can alleviate this situation by utilizing the SHF_MERGE and
SHF_STRING section flags. SHF_MERGE|SHF_STRING indicate to the linker
that the data in the section are null-terminated strings that can be
merged to eliminate duplication. More specifically, from the binutils
documentation - "for sections with both M and S, a string which is a
suffix of a larger string is considered a duplicate. Thus "def" will be
merged with "abcdef"; A reference to the first "def" will be changed to
a reference to "abcdef"+3". Thus, all the empty strings would be merged
as well as any strings that can be merged according to the cited method
above. For example, "memset" and "__memset" would be merged to just
"__memset" in __ksymtab_strings.
As of v5.4-rc5, the following statistics were gathered with x86
defconfig with approximately 10.7k exported symbols.
Size of __ksymtab_strings in vmlinux:
-------------------------------------
v5.4-rc5: 213834 bytes
v5.4-rc5 with commit c3a6cf19e6: 224455 bytes
v5.4-rc5 with this patch: 205759 bytes
So, we already see memory savings of ~8kB compared to vanilla -rc5 and
savings of nearly 18.7kB compared to -rc5 with commit c3a6cf19e6 on top.
Unfortunately, as of this writing, strings will not get deduplicated for
kernel modules, as ld does not do the deduplication for
SHF_MERGE|SHF_STRINGS sections for relocatable files (ld -r), which
kernel modules are. A patch for ld is currently being worked on to
hopefully allow for string deduplication in relocatable files in the
future.
Suggested-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Reviewed-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Matthias Maennich <maennich@google.com>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
A few commits splitting the KASAN instrumented bitops header in
three, to match the split of the asm-generic bitops headers.
This is needed on powerpc because we use asm-generic/bitops/non-atomic.h,
for the non-atomic bitops, whereas the existing KASAN instrumented
bitops assume all the underlying operations are provided by the arch
as arch_foo() versions.
Thanks to:
Daniel Axtens & Christophe Leroy.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAl3qN0wTHG1wZUBlbGxl
cm1hbi5pZC5hdQAKCRBR6+o8yOGlgLWFD/4/e3CO1oWjQdDGjgvhydhmhjMQqI9m
EINQjg3hl0ceig1HzsgjVdWI4TOf7bXbaQRf8m2pFWpA+iSx4KpjlnXzNjfUDapO
JqzKYfVSdi0o6OAFigDYuN1V5F2jAgrM7/w5PKpVuiAABcgJNcEY59tgEMTdj9r0
9H3OekYv0UnZ5ZNsUhCibKVvVLdbtys3ALrm1YETAauCkY/lpNk6afcr33t0iw2l
WAdd2sfWvw4tBn+35ZrNnv7z4hQ423Imd+lyuI5zhMBOOGspgMxlGGeIn370WyAi
00Jl66TRot6EtWGDVzV10bjB53qDhHrtNIk0NG2QMBB8vdTjBMtXfJnVc3Q8iVZ9
GkpdvMLNAlmxa4AuzpdHAOUQK48aggQzDmJkGp/JdYT6+WwTa5SLZcsy+njHGyjU
It+728FnStilM1XvjnaN9pljqANEcN4/YIycK55XGDsZS9fVRMY/QMAQZbdLHfzc
nB54Q/8vtPc9H69ws3U3g0ogVtYi5ca5RpTPiYU6WUQfEe9mjZdEgglsHC1y8ef2
9J3Muv4ASDGLjKN+G4dvKLCIgF+QKtsvwrtSztQq6wVnXUxuUQBEQSCaMPN9PN5g
Hlkjk95WevXcHyXeE2r8cXndUTboIgWRMZp0AHao44du3EziPMCvE0tFAHOEWfV0
8Oty9Fo5QSb1Mw==
=FCVX
-----END PGP SIGNATURE-----
Merge tag 'powerpc-5.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull more powerpc updates from Michael Ellerman:
"A few commits splitting the KASAN instrumented bitops header in three,
to match the split of the asm-generic bitops headers.
This is needed on powerpc because we use the generic bitops for the
non-atomic case only, whereas the existing KASAN instrumented bitops
assume all the underlying operations are provided by the arch as
arch_foo() versions.
Thanks to: Daniel Axtens & Christophe Leroy"
* tag 'powerpc-5.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
docs/core-api: Remove possibly confusing sub-headings from Bit Operations
powerpc: support KASAN instrumentation of bitops
kasan: support instrumented bitops combined with generic bitops
There are no architectures that use include/asm-generic/4level-fixup.h
therefore it can be removed along with __ARCH_HAS_4LEVEL_HACK define.
Link: http://lkml.kernel.org/r/1572938135-31886-14-git-send-email-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Anatoly Pugachev <matorola@gmail.com>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: Helge Deller <deller@gmx.de>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Mark Salter <msalter@redhat.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Peter Rosin <peda@axentia.se>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rolf Eike Beer <eike-kernel@sf-tec.de>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Russell King <rmk+kernel@armlinux.org.uk>
Cc: Sam Creasey <sammy@sammy.net>
Cc: Vincent Chen <deanbo422@gmail.com>
Cc: Vineet Gupta <Vineet.Gupta1@synopsys.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pach series "Introduce the for_each_set_clump8 macro", v18.
While adding GPIO get_multiple/set_multiple callback support for various
drivers, I noticed a pattern of looping manifesting that would be useful
standardized as a macro.
This patchset introduces the for_each_set_clump8 macro and utilizes it
in several GPIO drivers. The for_each_set_clump macro8 facilitates a
for-loop syntax that iterates over a memory region entire groups of set
bits at a time.
For example, suppose you would like to iterate over a 32-bit integer 8
bits at a time, skipping over 8-bit groups with no set bit, where
XXXXXXXX represents the current 8-bit group:
Example: 10111110 00000000 11111111 00110011
First loop: 10111110 00000000 11111111 XXXXXXXX
Second loop: 10111110 00000000 XXXXXXXX 00110011
Third loop: XXXXXXXX 00000000 11111111 00110011
Each iteration of the loop returns the next 8-bit group that has at
least one set bit.
The for_each_set_clump8 macro has four parameters:
* start: set to the bit offset of the current clump
* clump: set to the current clump value
* bits: bitmap to search within
* size: bitmap size in number of bits
In this version of the patchset, the for_each_set_clump macro has been
reimplemented and simplified based on the suggestions provided by Rasmus
Villemoes and Andy Shevchenko in the version 4 submission.
In particular, the function of the for_each_set_clump macro has been
restricted to handle only 8-bit clumps; the drivers that use the
for_each_set_clump macro only handle 8-bit ports so a generic
for_each_set_clump implementation is not necessary. Thus, a solution
for large clumps (i.e. those larger than the width of a bitmap word)
can be postponed until a driver appears that actually requires such a
generic for_each_set_clump implementation.
For what it's worth, a semi-generic for_each_set_clump (i.e. for clumps
smaller than the width of a bitmap word) can be implemented by simply
replacing the hardcoded '8' and '0xFF' instances with respective
variables. I have not yet had a need for such an implementation, and
since it falls short of a true generic for_each_set_clump function, I
have decided to forgo such an implementation for now.
In addition, the bitmap_get_value8 and bitmap_set_value8 functions are
introduced to get and set 8-bit values respectively. Their use is based
on the behavior suggested in the patchset version 4 review.
This patch (of 14):
This macro iterates for each 8-bit group of bits (clump) with set bits,
within a bitmap memory region. For each iteration, "start" is set to
the bit offset of the found clump, while the respective clump value is
stored to the location pointed by "clump". Additionally, the
bitmap_get_value8 and bitmap_set_value8 functions are introduced to
respectively get and set an 8-bit value in a bitmap memory region.
[gustavo@embeddedor.com: fix potential sign-extension overflow]
Link: http://lkml.kernel.org/r/20191015184657.GA26541@embeddedor
[akpm@linux-foundation.org: s/ULL/UL/, per Joe]
[vilhelm.gray@gmail.com: add for_each_set_clump8 documentation]
Link: http://lkml.kernel.org/r/20191016161825.301082-1-vilhelm.gray@gmail.com
Link: http://lkml.kernel.org/r/893c3b4f03266c9496137cc98ac2b1bd27f92c73.1570641097.git.vilhelm.gray@gmail.com
Signed-off-by: William Breathitt Gray <vilhelm.gray@gmail.com>
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Suggested-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Suggested-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Suggested-by: Lukas Wunner <lukas@wunner.de>
Tested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Phil Reid <preid@electromag.com.au>
Cc: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: Mathias Duckeck <m.duckeck@kunbus.de>
Cc: Morten Hein Tiljeset <morten.tiljeset@prevas.dk>
Cc: Sean Nyekjaer <sean.nyekjaer@prevas.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Here is a single SPDX fixup for 5.5-rc1
It resolves an issue where we had missed a few .h files with the
auto-tagging scripts because they had "GPL" text in strings within the
file themselves. This single patch fixes up the issue and provides the
proper SPDX tags at the top of them as needed.
This patch has been in linux-next for many many weeks now with no
reported issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXebGqw8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+yn31wCePtjLIQCW2PCrub9zeZCqUJ62V0sAoK53hYv+
f3LHw9fd2dzhNCcly7nW
=MuKu
-----END PGP SIGNATURE-----
Merge tag 'spdx-5.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx
Pull SPDX fix from Greg KH:
"Here is a single SPDX fixup for 5.5-rc1
It resolves an issue where we had missed a few .h files with the
auto-tagging scripts because they had "GPL" text in strings within the
file themselves. This single patch fixes up the issue and provides the
proper SPDX tags at the top of them as needed.
This patch has been in linux-next for many many weeks now with no
reported issues"
* tag 'spdx-5.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx:
export,module: add SPDX GPL-2.0 license identifier to headers with no license
-----BEGIN PGP SIGNATURE-----
iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAl3leXUUHGJoZWxnYWFz
QGdvb2dsZS5jb20ACgkQWYigwDrT+vyY3g/9FAVVdPEaadNtAhQ/zIxcjozDovKq
0q7yOA3aTBTUoNEinm88an6p0dcC4gNKtGukXmzVH2Hhxm9kLRdtpZGYY00tpLUB
9rI7XsgwwHa+hLwsHbIs507sKGFGy5FLr0ChTTGLDEMppnEvjA2hZooYmcB/OgrC
LlFcwbNKGOk/Si9u2bF2nLO0JDoVHnwzpF99saew/nqc7Lfj9e9IPZFom+VjPBUh
AOvRp2H7uBN+WQlpLeFeMDDoeXh34lX0kYqIV/cVkXVnknDGYKV2CBTg2aeX7jd0
QiPHZh6zlW8zNQgaCZRiBAbatVEOnRMRJ++yiqB8hBYp1LMXm6kJ01YSQpXkugoY
Vp9dtzzTARWV/XkKwD4brw9ZEmIDnO+Ed2x2VbUkPJVcXAvzSQWAx82IU0Iuqmcb
9qr6U2Zf/Xk5aFlGPYVH8QOG+QqzIbZNRQ7NlhDlITyW4P6QPu0mw374yYP2wDGL
sP5YSS3YGa0sQcEgDtVnd4z+WTZI4AwXLPaeaLkDhdfHp2FsERUY4TrPs33J99xw
og4EyokVFzjYzlnBPU6WWn7LL+jj5ccXkL3MA4DR4FJOnNGHh7NXfQUH56rrgsq7
F9/8shL5DuTbQkde1uSyUG9Iq/RigVLlV5DQavFm3dSXvZi0E16t5alC5URNTzk7
at8Bogn53QhlmYc=
=uUXw
-----END PGP SIGNATURE-----
Merge tag 'pci-v5.5-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI updates from Bjorn Helgaas:
"Enumeration:
- Warn if a host bridge has no NUMA info (Yunsheng Lin)
- Add PCI_STD_NUM_BARS for the number of standard BARs (Denis
Efremov)
Resource management:
- Fix boot-time Embedded Controller GPE storm caused by incorrect
resource assignment after ACPI Bus Check Notification (Mika
Westerberg)
- Protect pci_reassign_bridge_resources() against concurrent
addition/removal (Benjamin Herrenschmidt)
- Fix bridge dma_ranges resource list cleanup (Rob Herring)
- Add "pci=hpmmiosize" and "pci=hpmmioprefsize" parameters to control
the MMIO and prefetchable MMIO window sizes of hotplug bridges
independently (Nicholas Johnson)
- Fix MMIO/MMIO_PREF window assignment that assigned more space than
desired (Nicholas Johnson)
- Only enforce bus numbers from bridge EA if the bridge has EA
devices downstream (Subbaraya Sundeep)
- Consolidate DT "dma-ranges" parsing and convert all host drivers to
use shared parsing (Rob Herring)
Error reporting:
- Restore AER capability after resume (Mayurkumar Patel)
- Add PoisonTLPBlocked AER counter (Rajat Jain)
- Use for_each_set_bit() to simplify AER code (Andy Shevchenko)
- Fix AER kernel-doc (Andy Shevchenko)
- Add "pcie_ports=dpc-native" parameter to allow native use of DPC
even if platform didn't grant control over AER (Olof Johansson)
Hotplug:
- Avoid returning prematurely from sysfs requests to enable or
disable a PCIe hotplug slot (Lukas Wunner)
- Don't disable interrupts twice when suspending hotplug ports (Mika
Westerberg)
- Fix deadlocks when PCIe ports are hot-removed while suspended (Mika
Westerberg)
Power management:
- Remove unnecessary ASPM locking (Bjorn Helgaas)
- Add support for disabling L1 PM Substates (Heiner Kallweit)
- Allow re-enabling Clock PM after it has been disabled (Heiner
Kallweit)
- Add sysfs attributes for controlling ASPM link states (Heiner
Kallweit)
- Remove CONFIG_PCIEASPM_DEBUG, including "link_state" and "clk_ctl"
sysfs files (Heiner Kallweit)
- Avoid AMD FCH XHCI USB PME# from D0 defect that prevents wakeup on
USB 2.0 or 1.1 connect events (Kai-Heng Feng)
- Move power state check out of pci_msi_supported() (Bjorn Helgaas)
- Fix incorrect MSI-X masking on resume and revert related nvme quirk
for Kingston NVME SSD running FW E8FK11.T (Jian-Hong Pan)
- Always return devices to D0 when thawing to fix hibernation with
drivers like mlx4 that used legacy power management (previously we
only did it for drivers with new power management ops) (Dexuan Cui)
- Clear PCIe PME Status even for legacy power management (Bjorn
Helgaas)
- Fix PCI PM documentation errors (Bjorn Helgaas)
- Use dev_printk() for more power management messages (Bjorn Helgaas)
- Apply D2 delay as milliseconds, not microseconds (Bjorn Helgaas)
- Convert xen-platform from legacy to generic power management (Bjorn
Helgaas)
- Removed unused .resume_early() and .suspend_late() legacy power
management hooks (Bjorn Helgaas)
- Rearrange power management code for clarity (Rafael J. Wysocki)
- Decode power states more clearly ("4" or "D4" really refers to
"D3cold") (Bjorn Helgaas)
- Notice when reading PM Control register returns an error (~0)
instead of interpreting it as being in D3hot (Bjorn Helgaas)
- Add missing link delays required by the PCIe spec (Mika Westerberg)
Virtualization:
- Move pci_prg_resp_pasid_required() to CONFIG_PCI_PRI (Bjorn
Helgaas)
- Allow VFs to use PRI (the PF PRI is shared by the VFs, but the code
previously didn't recognize that) (Kuppuswamy Sathyanarayanan)
- Allow VFs to use PASID (the PF PASID capability is shared by the
VFs, but the code previously didn't recognize that) (Kuppuswamy
Sathyanarayanan)
- Disconnect PF and VF ATS enablement, since ATS in PFs and
associated VFs can be enabled independently (Kuppuswamy
Sathyanarayanan)
- Cache PRI and PASID capability offsets (Kuppuswamy Sathyanarayanan)
- Cache the PRI PRG Response PASID Required bit (Bjorn Helgaas)
- Consolidate ATS declarations in linux/pci-ats.h (Krzysztof
Wilczynski)
- Remove unused PRI and PASID stubs (Bjorn Helgaas)
- Removed unnecessary EXPORT_SYMBOL_GPL() from ATS, PRI, and PASID
interfaces that are only used by built-in IOMMU drivers (Bjorn
Helgaas)
- Hide PRI and PASID state restoration functions used only inside the
PCI core (Bjorn Helgaas)
- Add a DMA alias quirk for the Intel VCA NTB (Slawomir Pawlowski)
- Serialize sysfs sriov_numvfs reads vs writes (Pierre Crégut)
- Update Cavium ACS quirk for ThunderX2 and ThunderX3 (George
Cherian)
- Fix the UPDCR register address in the Intel ACS quirk (Steffen
Liebergeld)
- Unify ACS quirk implementations (Bjorn Helgaas)
Amlogic Meson host bridge driver:
- Fix meson PERST# GPIO polarity problem (Remi Pommarel)
- Add DT bindings for Amlogic Meson G12A (Neil Armstrong)
- Fix meson clock names to match DT bindings (Neil Armstrong)
- Add meson support for Amlogic G12A SoC with separate shared PHY
(Neil Armstrong)
- Add meson extended PCIe PHY functions for Amlogic G12A USB3+PCIe
combo PHY (Neil Armstrong)
- Add arm64 DT for Amlogic G12A PCIe controller node (Neil Armstrong)
- Add commented-out description of VIM3 USB3/PCIe mux in arm64 DT
(Neil Armstrong)
Broadcom iProc host bridge driver:
- Invalidate iProc PAXB address mapping before programming it
(Abhishek Shah)
- Fix iproc-msi and mvebu __iomem annotations (Ben Dooks)
Cadence host bridge driver:
- Refactor Cadence PCIe host controller to use as a library for both
host and endpoint (Tom Joseph)
Freescale Layerscape host bridge driver:
- Add layerscape LS1028a support (Xiaowei Bao)
Intel VMD host bridge driver:
- Add VMD bus 224-255 restriction decode (Jon Derrick)
- Add VMD 8086:9A0B device ID (Jon Derrick)
- Remove Keith from VMD maintainer list (Keith Busch)
Marvell ARMADA 3700 / Aardvark host bridge driver:
- Use LTSSM state to build link training flag since Aardvark doesn't
implement the Link Training bit (Remi Pommarel)
- Delay before training Aardvark link in case PERST# was asserted
before the driver probe (Remi Pommarel)
- Fix Aardvark issues with Root Control reads and writes (Remi
Pommarel)
- Don't rely on jiffies in Aardvark config access path since
interrupts may be disabled (Remi Pommarel)
- Fix Aardvark big-endian support (Grzegorz Jaszczyk)
Marvell ARMADA 370 / XP host bridge driver:
- Make mvebu_pci_bridge_emul_ops static (Ben Dooks)
Microsoft Hyper-V host bridge driver:
- Add hibernation support for Hyper-V virtual PCI devices (Dexuan
Cui)
- Track Hyper-V pci_protocol_version per-hbus, not globally (Dexuan
Cui)
- Avoid kmemleak false positive on hv hbus buffer (Dexuan Cui)
Mobiveil host bridge driver:
- Change mobiveil csr_read()/write() function names that conflict
with riscv arch functions (Kefeng Wang)
NVIDIA Tegra host bridge driver:
- Fix Tegra CLKREQ dependency programming (Vidya Sagar)
Renesas R-Car host bridge driver:
- Remove unnecessary header include from rcar (Andrew Murray)
- Tighten register index checking for rcar inbound range programming
(Marek Vasut)
- Fix rcar inbound range alignment calculation to improve packing of
multiple entries (Marek Vasut)
- Update rcar MACCTLR setting to match documentation (Yoshihiro
Shimoda)
- Clear bit 0 of MACCTLR before PCIETCTLR.CFINIT per manual
(Yoshihiro Shimoda)
- Add Marek Vasut and Yoshihiro Shimoda as R-Car maintainers (Simon
Horman)
Rockchip host bridge driver:
- Make rockchip 0V9 and 1V8 power regulators non-optional (Robin
Murphy)
Socionext UniPhier host bridge driver:
- Set uniphier to host (RC) mode always (Kunihiko Hayashi)
Endpoint drivers:
- Fix endpoint driver sign extension problem when shifting page
number to phys_addr_t (Alan Mikhak)
Misc:
- Add NumaChip SPDX header (Krzysztof Wilczynski)
- Replace EXTRA_CFLAGS with ccflags-y (Krzysztof Wilczynski)
- Remove unused includes (Krzysztof Wilczynski)
- Removed unused sysfs attribute groups (Ben Dooks)
- Remove PTM and ASPM dependencies on PCIEPORTBUS (Bjorn Helgaas)
- Add PCIe Link Control 2 register field definitions to replace magic
numbers in AMDGPU and Radeon CIK/SI (Bjorn Helgaas)
- Fix incorrect Link Control 2 Transmit Margin usage in AMDGPU and
Radeon CIK/SI PCIe Gen3 link training (Bjorn Helgaas)
- Use pcie_capability_read_word() instead of pci_read_config_word()
in AMDGPU and Radeon CIK/SI (Frederick Lawler)
- Remove unused pci_irq_get_node() Greg Kroah-Hartman)
- Make asm/msi.h mandatory and simplify PCI_MSI_IRQ_DOMAIN Kconfig
(Palmer Dabbelt, Michal Simek)
- Read all 64 bits of Switchtec part_event_bitmap (Logan Gunthorpe)
- Fix erroneous intel-iommu dependency on CONFIG_AMD_IOMMU (Bjorn
Helgaas)
- Fix bridge emulation big-endian support (Grzegorz Jaszczyk)
- Fix dwc find_next_bit() usage (Niklas Cassel)
- Fix pcitest.c fd leak (Hewenliang)
- Fix typos and comments (Bjorn Helgaas)
- Fix Kconfig whitespace errors (Krzysztof Kozlowski)"
* tag 'pci-v5.5-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (160 commits)
PCI: Remove PCI_MSI_IRQ_DOMAIN architecture whitelist
asm-generic: Make msi.h a mandatory include/asm header
Revert "nvme: Add quirk for Kingston NVME SSD running FW E8FK11.T"
PCI/MSI: Fix incorrect MSI-X masking on resume
PCI/MSI: Move power state check out of pci_msi_supported()
PCI/MSI: Remove unused pci_irq_get_node()
PCI: hv: Avoid a kmemleak false positive caused by the hbus buffer
PCI: hv: Change pci_protocol_version to per-hbus
PCI: hv: Add hibernation support
PCI: hv: Reorganize the code in preparation of hibernation
MAINTAINERS: Remove Keith from VMD maintainer
PCI/ASPM: Remove PCIEASPM_DEBUG Kconfig option and related code
PCI/ASPM: Add sysfs attributes for controlling ASPM link states
PCI: Fix indentation
drm/radeon: Prefer pcie_capability_read_word()
drm/radeon: Replace numbers with PCI_EXP_LNKCTL2 definitions
drm/radeon: Correct Transmit Margin masks
drm/amdgpu: Prefer pcie_capability_read_word()
PCI: uniphier: Set mode register to host mode
drm/amdgpu: Replace numbers with PCI_EXP_LNKCTL2 definitions
...
- remove unneeded asm headers from hexagon, ia64
- add 'dir-pkg' target, which works like 'tar-pkg' but skips archiving
- add 'helpnewconfig' target, which shows help for new CONFIG options
- support 'make nsdeps' for external modules
- make rebuilds faster by deleting $(wildcard $^) checks
- remove compile tests for kernel-space headers
- refactor modpost to simplify modversion handling
- make single target builds faster
- optimize and clean up scripts/kallsyms.c
- refactor various Makefiles and scripts
-----BEGIN PGP SIGNATURE-----
iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAl3lKCUVHG1hc2FoaXJv
eUBrZXJuZWwub3JnAAoJED2LAQed4NsGu9sP/iTW/RjDxbAsu0aP8jFqzLK/xKB/
NQn/+dD76TjEmjgew9AXszf2rJL+ixKVymGM08FV59Bbguvi8XmAB/QXK21Sjb5j
rVl3N97TWNkvXM+QJyly23G2UtbubRSPo3g+e70BZrw3lcmrsK+sAmTOL5KtIrNX
9BHM803JwqsMJyvBwTBBw3UFeeBqb38Qx6gmigfSihuDf6pvjoVDKskpsDno3wX7
rdiXYxAsKQLQ/P2ym/bV/Oqe90RqRtV/2/WCpLshlwHkiM9huflv6GjgCkkbAx5H
N3TSptlS7l/2B/XKHgA5ALjHjUlxTGBzLLoevarCd8loKcQXFlgx+vd3nM/WJlHJ
x9UpTklDwGP9eUBsa9W980tEyUVsFGMAC8EcTdW6NN2IRtuCOSA5N2FYYt8/SDd0
2b3PhElTJIp4pTWSYN6JZxB1R8n/YBgxLqOJ6N2U6B9CdKFUCHlwGH23QfN89km/
WEMP85bsaab/dnyxbwelkoYYYyPgUHsC13AbpkHdrDxMbAGO+G1PwpHxC6ErF2en
wRGrcUxWTfHRykO5aJIQtCB9b1fv73134mTzB5fTYd6GtjepGBSBCO9xb2Iy4sc9
Y+nHVVDUrihvSOpJgqh677PcLDutOZR8fFCoc1ZMDAbBsDvrb0Qsee6oEidj98xc
5kXp9YZh/tdh/tdo
=zUaB
-----END PGP SIGNATURE-----
Merge tag 'kbuild-v5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild updates from Masahiro Yamada:
- remove unneeded asm headers from hexagon, ia64
- add 'dir-pkg' target, which works like 'tar-pkg' but skips archiving
- add 'helpnewconfig' target, which shows help for new CONFIG options
- support 'make nsdeps' for external modules
- make rebuilds faster by deleting $(wildcard $^) checks
- remove compile tests for kernel-space headers
- refactor modpost to simplify modversion handling
- make single target builds faster
- optimize and clean up scripts/kallsyms.c
- refactor various Makefiles and scripts
* tag 'kbuild-v5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (59 commits)
MAINTAINERS: update Kbuild/Kconfig maintainer's email address
scripts/kallsyms: remove redundant initializers
scripts/kallsyms: put check_symbol_range() calls close together
scripts/kallsyms: make check_symbol_range() void function
scripts/kallsyms: move ignored symbol types to is_ignored_symbol()
scripts/kallsyms: move more patterns to the ignored_prefixes array
scripts/kallsyms: skip ignored symbols very early
scripts/kallsyms: add const qualifiers where possible
scripts/kallsyms: make find_token() return (unsigned char *)
scripts/kallsyms: replace prefix_underscores_count() with strspn()
scripts/kallsyms: add sym_name() to mitigate cast ugliness
scripts/kallsyms: remove unneeded length check for prefix matching
scripts/kallsyms: remove redundant is_arm_mapping_symbol()
scripts/kallsyms: set relative_base more effectively
scripts/kallsyms: shrink table before sorting it
scripts/kallsyms: fix definitely-lost memory leak
scripts/kallsyms: remove unneeded #ifndef ARRAY_SIZE
kbuild: make single target builds even faster
modpost: respect the previous export when 'exported twice' is warned
modpost: do not set ->preloaded for symbols from Module.symvers
...
Merge updates from Andrew Morton:
"Incoming:
- a small number of updates to scripts/, ocfs2 and fs/buffer.c
- most of MM
I still have quite a lot of material (mostly not MM) staged after
linux-next due to -next dependencies. I'll send those across next week
as the preprequisites get merged up"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (135 commits)
mm/page_io.c: annotate refault stalls from swap_readpage
mm/Kconfig: fix trivial help text punctuation
mm/Kconfig: fix indentation
mm/memory_hotplug.c: remove __online_page_set_limits()
mm: fix typos in comments when calling __SetPageUptodate()
mm: fix struct member name in function comments
mm/shmem.c: cast the type of unmap_start to u64
mm: shmem: use proper gfp flags for shmem_writepage()
mm/shmem.c: make array 'values' static const, makes object smaller
userfaultfd: require CAP_SYS_PTRACE for UFFD_FEATURE_EVENT_FORK
fs/userfaultfd.c: wp: clear VM_UFFD_MISSING or VM_UFFD_WP during userfaultfd_register()
userfaultfd: wrap the common dst_vma check into an inlined function
userfaultfd: remove unnecessary WARN_ON() in __mcopy_atomic_hugetlb()
userfaultfd: use vma_pagesize for all huge page size calculation
mm/madvise.c: use PAGE_ALIGN[ED] for range checking
mm/madvise.c: replace with page_size() in madvise_inject_error()
mm/mmap.c: make vma_merge() comment more easy to understand
mm/hwpoison-inject: use DEFINE_DEBUGFS_ATTRIBUTE to define debugfs fops
autonuma: reduce cache footprint when scanning page tables
autonuma: fix watermark checking in migrate_balanced_pgdat()
...
A huge pud page can theoretically be faulted in racing with pmd_alloc()
in __handle_mm_fault(). That will lead to pmd_alloc() returning an
invalid pmd pointer.
Fix this by adding a pud_trans_unstable() function similar to
pmd_trans_unstable() and check whether the pud is really stable before
using the pmd pointer.
Race:
Thread 1: Thread 2: Comment
create_huge_pud() Fallback - not taken.
create_huge_pud() Taken.
pmd_alloc() Returns an invalid pointer.
This will result in user-visible huge page data corruption.
Note that this was caught during a code audit rather than a real
experienced problem. It looks to me like the only implementation that
currently creates huge pud pagetable entries is dev_dax_huge_fault()
which doesn't appear to care much about private (COW) mappings or
write-tracking which is, I believe, a prerequisite for create_huge_pud()
falling back on thread 1, but not in thread 2.
Link: http://lkml.kernel.org/r/20191115115808.21181-2-thomas_os@shipmail.org
Fixes: a00cc7d9dd ("mm, x86: add support for PUD-sized transparent hugepages")
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The asm-generic/pgtable.h include file appears to be the correct place for
the backup x_devmap() inline functions. Moving them here is also
necessary if we want to include x_devmap() in the [pmd|pud]_unstable
functions. So move the x_devmap() functions to asm-generic/pgtable.h
Link: http://lkml.kernel.org/r/20191115115808.21181-1-thomas_os@shipmail.org
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This came up when removing __ARCH_HAS_5LEVEL_HACK for ARC as code bloat.
With this patch we see the following code reduction.
| bloat-o-meter2 vmlinux-D-elide-p4d_free_tlb vmlinux-E-elide-p?d_clear_bad
| add/remove: 0/2 grow/shrink: 0/0 up/down: 0/-40 (-40)
| function old new delta
| pud_clear_bad 20 - -20
| p4d_clear_bad 20 - -20
| Total: Before=4136930, After=4136890, chg -1.000000%
Link: http://lkml.kernel.org/r/20191016162400.14796-6-vgupta@synopsys.com
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Will Deacon <will@kernel.org>
Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>
Cc: Nick Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This came up when removing __ARCH_HAS_5LEVEL_HACK for ARC as code bloat.
With this patch we see the following code reduction.
| bloat-o-meter2 vmlinux-E-elide-p?d_clear_bad vmlinux-F-elide-pmd_free_tlb
| add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-112 (-112)
| function old new delta
| free_pgd_range 422 310 -112
| Total: Before=4137042, After=4136930, chg -1.000000%
Note that pmd folding can be tricky: In 2-level setup (where pmd is
conceptually folded) most pmd routines are valid and refer to upper levels.
In this patch we can, but see next patch for example where we can't
Link: http://lkml.kernel.org/r/20191016162400.14796-5-vgupta@synopsys.com
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Nick Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
... independent of __ARCH_HAS_5LEVEL_HACK
This came up when removing __ARCH_HAS_5LEVEL_HACK for ARC as code bloat.
With this patch we see the following code reduction
| bloat-o-meter2 vmlinux-C-elide-pud_free_tlb vmlinux-D-elide-p4d_free_tlb
| add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-104 (-104)
| function old new delta
| free_pgd_range 552 422 -130
| Total: Before=4137172, After=4137042, chg -1.000000%
Link: http://lkml.kernel.org/r/20191016162400.14796-4-vgupta@synopsys.com
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Nick Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
... independent of __ARCH_HAS_4LEVEL_HACK
This came up when removing __ARCH_HAS_5LEVEL_HACK for ARC as code bloat.
With this patch we see the following code reduction
| bloat-o-meter2 vmlinux-B-elide-ARCH_USE_5LEVEL_HACK vmlinux-C-elide-pud_free_tlb
| add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-104 (-104)
| function old new delta
| free_pgd_range 656 552 -104
| Total: Before=4137276, After=4137172, chg -1.000000%
Note: The primary change is alternate defintion for pud_free_tlb() but
while there also removed empty stubs for __pud_free_tlb, which is anyhow
called only from pud_free_tlb()
Link: http://lkml.kernel.org/r/20191016162400.14796-3-vgupta@synopsys.com
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Nick Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Hibernation support (Dexuan Cui).
- Latency testing framework (Branden Bonaby).
- Decoupling Hyper-V page size from guest page size (Himadri Pandya).
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE4n5dijQDou9mhzu83qZv95d3LNwFAl3f5YIACgkQ3qZv95d3
LNzBww/8Cpv/BnOs2cp56OhC+2++3YlWfmxGnvQb9h52weElgr1AZF33lAynp8BZ
YssOcDnS/G2iAkNDffbQA7s3WTwIjP1weJibOeKbtcXp4SuhNR3gnJafufNddNDv
bw8ZReLQV7hy3sHb3OUx0aJk5Mssp0N9ZpxRilyIpLELPfVp63gFebq6s1MQYljk
BAiNO4SKqsGQGZApt2F4Cc3hX2wU2ZfiDm6SifXiLYITGnvilIn7XFIht+2jJBWS
CdzRoGXcwhQhlj68XWlc89SOzJb7vVUMO1sr84psfbQ2LbhJU8lfJKRJ4b4lR07Z
Uv5FYxjr14S65fv7DkzCfWU+uPN/sObG4pPXihlfqcTraOvYLQ6/x8cw+9tGZg4H
aTtnF40hnO81aKsvPAeIsSzVkoyPaSrt7KKhk+Bw/5EUDTTNp6EbIuL4xwnKt6Rt
2UpA5HM9guQqNb6OZrjlpZfJgd9bNP4CZLBTfOukmnZpONKr2Wv3wubcwQJ8ibQc
1WZ5SfN2Wmg999Ski7j9qzHk0tWJxa6SX+2NLEHRKxy2nJSJ1zlAr//bznMyMgH/
yKPDaSkOFoy0aqiTKV2WzuOY6FGXTrSo5vq8YAgYRgp3xB+5+7zLeqlj3ipXhLYE
HH/eqB27eSnvi0jpub4TbszGJG0o4Z1aYx3aHYYqrOfWX/A5Vls=
=oJGE
-----END PGP SIGNATURE-----
Merge tag 'hyperv-next-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
Pull Hyper-V updates from Sasha Levin:
- support for new VMBus protocols (Andrea Parri)
- hibernation support (Dexuan Cui)
- latency testing framework (Branden Bonaby)
- decoupling Hyper-V page size from guest page size (Himadri Pandya)
* tag 'hyperv-next-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: (22 commits)
Drivers: hv: vmbus: Fix crash handler reset of Hyper-V synic
drivers/hv: Replace binary semaphore with mutex
drivers: iommu: hyperv: Make HYPERV_IOMMU only available on x86
HID: hyperv: Add the support of hibernation
hv_balloon: Add the support of hibernation
x86/hyperv: Implement hv_is_hibernation_supported()
Drivers: hv: balloon: Remove dependencies on guest page size
Drivers: hv: vmbus: Remove dependencies on guest page size
x86: hv: Add function to allocate zeroed page for Hyper-V
Drivers: hv: util: Specify ring buffer size using Hyper-V page size
Drivers: hv: Specify receive buffer size using Hyper-V page size
tools: hv: add vmbus testing tool
drivers: hv: vmbus: Introduce latency testing
video: hyperv: hyperv_fb: Support deferred IO for Hyper-V frame buffer driver
video: hyperv: hyperv_fb: Obtain screen resolution from Hyper-V host
hv_netvsc: Add the support of hibernation
hv_sock: Add the support of hibernation
video: hyperv_fb: Add the support of hibernation
scsi: storvsc: Add the support of hibernation
Drivers: hv: vmbus: Add module parameter to cap the VMBus version
...
Highlights:
- Infrastructure for secure boot on some bare metal Power9 machines. The
firmware support is still in development, so the code here won't actually
activate secure boot on any existing systems.
- A change to xmon (our crash handler / pseudo-debugger) to restrict it to
read-only mode when the kernel is lockdown'ed, otherwise it's trivial to drop
into xmon and modify kernel data, such as the lockdown state.
- Support for KASLR on 32-bit BookE machines (Freescale / NXP).
- Fixes for our flush_icache_range() and __kernel_sync_dicache() (VDSO) to work
with memory ranges >4GB.
- Some reworks of the pseries CMM (Cooperative Memory Management) driver to
make it behave more like other balloon drivers and enable some cleanups of
generic mm code.
- A series of fixes to our hardware breakpoint support to properly handle
unaligned watchpoint addresses.
Plus a bunch of other smaller improvements, fixes and cleanups.
Thanks to:
Alastair D'Silva, Andrew Donnellan, Aneesh Kumar K.V, Anthony Steinhauser,
Cédric Le Goater, Chris Packham, Chris Smart, Christophe Leroy, Christopher M.
Riedl, Christoph Hellwig, Claudio Carvalho, Daniel Axtens, David Hildenbrand,
Deb McLemore, Diana Craciun, Eric Richter, Geert Uytterhoeven, Greg
Kroah-Hartman, Greg Kurz, Gustavo L. F. Walbon, Hari Bathini, Harish, Jason
Yan, Krzysztof Kozlowski, Leonardo Bras, Mathieu Malaterre, Mauro S. M.
Rodrigues, Michal Suchanek, Mimi Zohar, Nathan Chancellor, Nathan Lynch, Nayna
Jain, Nick Desaulniers, Oliver O'Halloran, Qian Cai, Rasmus Villemoes, Ravi
Bangoria, Sam Bobroff, Santosh Sivaraj, Scott Wood, Thomas Huth, Tyrel
Datwyler, Vaibhav Jain, Valentin Longchamp, YueHaibing.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAl3hBycTHG1wZUBlbGxl
cm1hbi5pZC5hdQAKCRBR6+o8yOGlgApBEACk91MEQDYJ9MF9I6uN+85qb5p4pMsp
rGzqnpt+XFidbDAc3eP63pYfIDSo3jtkQ2YL7shAnDOTvkO0md+Vqkl9Aq/G6FIf
lDBlwbgkXMSxS/O2Lpvfn4NZAoK6dKmiV55LSgfliM62X3e2Saeg6TR55wBTgJ6/
SlYPDwZfcVHOAiFS3UmfB+hkiIZk+AI5Zr5VAZvT2ZmeH36yAWkq4JgJI1uAk6m1
/7iCnlfUjx/nl/BhnA3kjjmAgGCJ5s/WuVgwFMz47XpMBWGBhLWpMh/NqDTFb8ca
kpiVQoVPQe2xyO3pL/kOwBy6sii26ftfHDhLKMy1hJdEhVQzS5LerPIMeh1qsU8Q
hV/Cj+jfsrS/vBDOehj3jwx93t+861PmTOqgLnpYQ6Ltrt+2B/74+fufGMHE1kI3
Ffo7xvNw4sw6bSziDxDFqUx2P1dFN5D5EJsJsYM98ekkVAAkzNqCDRvfD2QI8Pif
VXWPYXqtNJTrVPJA0D7Yfo9FDNwhANd0f1zi7r/U5mVXBFUyKOlGqTQSkXgMrVeK
3I7wHPOVGgdA5UUkfcd3pcuqsY081U9E//o5PUfj8ybO5JCwly8NoatbG+xHmKia
a72uJT8MjCo9mGCHKDrwi9l/kqms6ZSv8RP+yMhGuB52YoiGc6PpVyab5jXIUd1N
yTtBlC0YGW1JYw==
=JHzg
-----END PGP SIGNATURE-----
Merge tag 'powerpc-5.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
"Highlights:
- Infrastructure for secure boot on some bare metal Power9 machines.
The firmware support is still in development, so the code here
won't actually activate secure boot on any existing systems.
- A change to xmon (our crash handler / pseudo-debugger) to restrict
it to read-only mode when the kernel is lockdown'ed, otherwise it's
trivial to drop into xmon and modify kernel data, such as the
lockdown state.
- Support for KASLR on 32-bit BookE machines (Freescale / NXP).
- Fixes for our flush_icache_range() and __kernel_sync_dicache()
(VDSO) to work with memory ranges >4GB.
- Some reworks of the pseries CMM (Cooperative Memory Management)
driver to make it behave more like other balloon drivers and enable
some cleanups of generic mm code.
- A series of fixes to our hardware breakpoint support to properly
handle unaligned watchpoint addresses.
Plus a bunch of other smaller improvements, fixes and cleanups.
Thanks to: Alastair D'Silva, Andrew Donnellan, Aneesh Kumar K.V,
Anthony Steinhauser, Cédric Le Goater, Chris Packham, Chris Smart,
Christophe Leroy, Christopher M. Riedl, Christoph Hellwig, Claudio
Carvalho, Daniel Axtens, David Hildenbrand, Deb McLemore, Diana
Craciun, Eric Richter, Geert Uytterhoeven, Greg Kroah-Hartman, Greg
Kurz, Gustavo L. F. Walbon, Hari Bathini, Harish, Jason Yan, Krzysztof
Kozlowski, Leonardo Bras, Mathieu Malaterre, Mauro S. M. Rodrigues,
Michal Suchanek, Mimi Zohar, Nathan Chancellor, Nathan Lynch, Nayna
Jain, Nick Desaulniers, Oliver O'Halloran, Qian Cai, Rasmus Villemoes,
Ravi Bangoria, Sam Bobroff, Santosh Sivaraj, Scott Wood, Thomas Huth,
Tyrel Datwyler, Vaibhav Jain, Valentin Longchamp, YueHaibing"
* tag 'powerpc-5.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (144 commits)
powerpc/fixmap: fix crash with HIGHMEM
x86/efi: remove unused variables
powerpc: Define arch_is_kernel_initmem_freed() for lockdep
powerpc/prom_init: Use -ffreestanding to avoid a reference to bcmp
powerpc: Avoid clang warnings around setjmp and longjmp
powerpc: Don't add -mabi= flags when building with Clang
powerpc: Fix Kconfig indentation
powerpc/fixmap: don't clear fixmap area in paging_init()
selftests/powerpc: spectre_v2 test must be built 64-bit
powerpc/powernv: Disable native PCIe port management
powerpc/kexec: Move kexec files into a dedicated subdir.
powerpc/32: Split kexec low level code out of misc_32.S
powerpc/sysdev: drop simple gpio
powerpc/83xx: map IMMR with a BAT.
powerpc/32s: automatically allocate BAT in setbat()
powerpc/ioremap: warn on early use of ioremap()
powerpc: Add support for GENERIC_EARLY_IOREMAP
powerpc/fixmap: Use __fix_to_virt() instead of fix_to_virt()
powerpc/8xx: use the fixmapped IMMR in cpm_reset()
powerpc/8xx: add __init to cpm1 init functions
...
- clean up various obsolete ioremap and iounmap variants
- add a new generic ioremap implementation and switch csky, nds32 and
riscv over to it
-----BEGIN PGP SIGNATURE-----
iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAl3cKcsLHGhjaEBsc3Qu
ZGUACgkQD55TZVIEUYO1CRAAwFQigsbi0CqqshPWnP0owKV+HA4Xfz/lQZsd7SM/
BVXhKyDJQum6gp73dW025HCfjidTknsbdCUIP/LNUgAnop3lOlnB31/munDnJJ1H
6hB1pc+zB9VgbOe0A6TxtxPRm5aE33k1hZIZS99lOh7mY3FvF7mbkkbVoCjdS3Cq
a9bTX+X+esfUQ5GgaIc2zmz2GLkyFXIeVGs8/CoOX58ESCWQcVZrsQRompo4SgrI
jqwf47NzdmK8hW4mZ+jdQUiWiAmNs5+2om7Bvi/deFAIFUo1/hLHvQzqEGramq/j
5SPHax2gWAN3uWYP91QISkUAJWFydwgmUDoTO1M04ov4xLuBrqIQmc43tLjHo2UT
RwMozWJWN+gkB9zTIboqMPi2qcuDaWcCij7LwHl5zLxPTcOKsrALarL55BQ8MipQ
x6fpvskrQQvlArNTsRWFRUq0mCtkzE3wMZ9RR3AIETQL2hlAzB1S4gzhD+Z6WTYY
pXNgkunonVGxwyN/7iJTEl/mvF/+MynGcWqhrwHZLqncyhn/WJJ2USH3nAD1+yjp
v8v6UUeMXIjUsGAyfTjXy/WXAfwRuSC038AAFcmWKDdh08h4XvPHRficT4U8wr34
7WzGizHP9f1CqrhYL/4exhPY9X2Yb7HhsFd0bZGG0rRvSillPUp0b8s++m12QuQU
+VY=
=ooiA
-----END PGP SIGNATURE-----
Merge tag 'ioremap-5.5' of git://git.infradead.org/users/hch/ioremap
Pull generic ioremap support from Christoph Hellwig:
"This adds the remaining bits for an entirely generic ioremap and
iounmap to lib/ioremap.c. To facilitate that, it cleans up the giant
mess of weird ioremap variants we had with no users outside the arch
code.
For now just the three newest ports use the code, but there is more
than a handful others that can be converted without too much work.
Summary:
- clean up various obsolete ioremap and iounmap variants
- add a new generic ioremap implementation and switch csky, nds32 and
riscv over to it"
* tag 'ioremap-5.5' of git://git.infradead.org/users/hch/ioremap: (21 commits)
nds32: use generic ioremap
csky: use generic ioremap
csky: remove ioremap_cache
riscv: use the generic ioremap code
lib: provide a simple generic ioremap implementation
sh: remove __iounmap
nios2: remove __iounmap
hexagon: remove __iounmap
m68k: rename __iounmap and mark it static
arch: rely on asm-generic/io.h for default ioremap_* definitions
asm-generic: don't provide ioremap for CONFIG_MMU
asm-generic: ioremap_uc should behave the same with and without MMU
xtensa: clean up ioremap
x86: Clean up ioremap()
parisc: remove __ioremap
nios2: remove __ioremap
alpha: remove the unused __ioremap wrapper
hexagon: clean up ioremap
ia64: rename ioremap_nocache to ioremap_uc
unicore32: remove ioremap_cached
...
- PERAMAENT flag to ftrace_ops when attaching a callback to a function
As /proc/sys/kernel/ftrace_enabled when set to zero will disable all
attached callbacks in ftrace, this has a detrimental impact on live
kernel tracing, as it disables all that it patched. If a ftrace_ops
is registered to ftrace with the PERMANENT flag set, it will prevent
ftrace_enabled from being disabled, and if ftrace_enabled is already
disabled, it will prevent a ftrace_ops with PREMANENT flag set from
being registered.
- New register_ftrace_direct(). As eBPF would like to register its own
trampolines to be called by the ftrace nop locations directly,
without going through the ftrace trampoline, this function has been
added. This allows for eBPF trampolines to live along side of
ftrace, perf, kprobe and live patching. It also utilizes the ftrace
enabled_functions file that keeps track of functions that have been
modified in the kernel, to allow for security auditing.
- Allow for kernel internal use of ftrace instances. Subsystems in
the kernel can now create and destroy their own tracing instances
which allows them to have their own tracing buffer, and be able
to record events without worrying about other users from writing over
their data.
- New seq_buf_hex_dump() that lets users use the hex_dump() in their
seq_buf usage.
- Notifications now added to tracing_max_latency to allow user space
to know when a new max latency is hit by one of the latency tracers.
- Wider spread use of generic compare operations for use of bsearch and
friends.
- More synthetic event fields may be defined (32 up from 16)
- Use of xarray for architectures with sparse system calls, for the
system call trace events.
This along with small clean ups and fixes.
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCXdwv4BQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qnB5AP91vsdHQjwE1+/UWG/cO+qFtKvn2QJK
QmBRIJNH/s+1TAD/fAOhgw+ojSK3o/qc+NpvPTEW9AEwcJL1wacJUn+XbQc=
=ztql
-----END PGP SIGNATURE-----
Merge tag 'trace-v5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing updates from Steven Rostedt:
"New tracing features:
- New PERMANENT flag to ftrace_ops when attaching a callback to a
function.
As /proc/sys/kernel/ftrace_enabled when set to zero will disable
all attached callbacks in ftrace, this has a detrimental impact on
live kernel tracing, as it disables all that it patched. If a
ftrace_ops is registered to ftrace with the PERMANENT flag set, it
will prevent ftrace_enabled from being disabled, and if
ftrace_enabled is already disabled, it will prevent a ftrace_ops
with PREMANENT flag set from being registered.
- New register_ftrace_direct().
As eBPF would like to register its own trampolines to be called by
the ftrace nop locations directly, without going through the ftrace
trampoline, this function has been added. This allows for eBPF
trampolines to live along side of ftrace, perf, kprobe and live
patching. It also utilizes the ftrace enabled_functions file that
keeps track of functions that have been modified in the kernel, to
allow for security auditing.
- Allow for kernel internal use of ftrace instances.
Subsystems in the kernel can now create and destroy their own
tracing instances which allows them to have their own tracing
buffer, and be able to record events without worrying about other
users from writing over their data.
- New seq_buf_hex_dump() that lets users use the hex_dump() in their
seq_buf usage.
- Notifications now added to tracing_max_latency to allow user space
to know when a new max latency is hit by one of the latency
tracers.
- Wider spread use of generic compare operations for use of bsearch
and friends.
- More synthetic event fields may be defined (32 up from 16)
- Use of xarray for architectures with sparse system calls, for the
system call trace events.
This along with small clean ups and fixes"
* tag 'trace-v5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (51 commits)
tracing: Enable syscall optimization for MIPS
tracing: Use xarray for syscall trace events
tracing: Sample module to demonstrate kernel access to Ftrace instances.
tracing: Adding new functions for kernel access to Ftrace instances
tracing: Fix Kconfig indentation
ring-buffer: Fix typos in function ring_buffer_producer
ftrace: Use BIT() macro
ftrace: Return ENOTSUPP when DYNAMIC_FTRACE_WITH_DIRECT_CALLS is not configured
ftrace: Rename ftrace_graph_stub to ftrace_stub_graph
ftrace: Add a helper function to modify_ftrace_direct() to allow arch optimization
ftrace: Add helper find_direct_entry() to consolidate code
ftrace: Add another check for match in register_ftrace_direct()
ftrace: Fix accounting bug with direct->count in register_ftrace_direct()
ftrace/selftests: Fix spelling mistake "wakeing" -> "waking"
tracing: Increase SYNTH_FIELDS_MAX for synthetic_events
ftrace/samples: Add a sample module that implements modify_ftrace_direct()
ftrace: Add modify_ftrace_direct()
tracing: Add missing "inline" in stub function of latency_fsnotify()
tracing: Remove stray tab in TRACE_EVAL_MAP_FILE's help text
tracing: Use seq_buf_hex_dump() to dump buffers
...
msi.h is generic for all architectures except x86, which has its own
version. Enabling MSI by adding msi.h to every architecture's Kbuild is
just an additional step which doesn't need to be done.
Make msi.h mandatory in the asm-generic/Kbuild so we don't have to do it
for each architecture.
Suggested-by: Christoph Hellwig <hch@infradead.org>
Link: https://lore.kernel.org/r/c991669e29a79b1a8e28c3b4b3a125801a693de8.1571983829.git.michal.simek@xilinx.com
Tested-by: Paul Walmsley <paul.walmsley@sifive.com> # build only, rv32/rv64
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Waiman Long <longman@redhat.com>
Acked-by: Paul Walmsley <paul.walmsley@sifive.com> # arch/riscv
Pull x86 asm updates from Ingo Molnar:
"The main changes in this cycle were:
- Cross-arch changes to move the linker sections for NOTES and
EXCEPTION_TABLE into the RO_DATA area, where they belong on most
architectures. (Kees Cook)
- Switch the x86 linker fill byte from x90 (NOP) to 0xcc (INT3), to
trap jumps into the middle of those padding areas instead of
sliding execution. (Kees Cook)
- A thorough cleanup of symbol definitions within x86 assembler code.
The rather randomly named macros got streamlined around a
(hopefully) straightforward naming scheme:
SYM_START(name, linkage, align...)
SYM_END(name, sym_type)
SYM_FUNC_START(name)
SYM_FUNC_END(name)
SYM_CODE_START(name)
SYM_CODE_END(name)
SYM_DATA_START(name)
SYM_DATA_END(name)
etc - with about three times of these basic primitives with some
label, local symbol or attribute variant, expressed via postfixes.
No change in functionality intended. (Jiri Slaby)
- Misc other changes, cleanups and smaller fixes"
* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (67 commits)
x86/entry/64: Remove pointless jump in paranoid_exit
x86/entry/32: Remove unused resume_userspace label
x86/build/vdso: Remove meaningless CFLAGS_REMOVE_*.o
m68k: Convert missed RODATA to RO_DATA
x86/vmlinux: Use INT3 instead of NOP for linker fill bytes
x86/mm: Report actual image regions in /proc/iomem
x86/mm: Report which part of kernel image is freed
x86/mm: Remove redundant address-of operators on addresses
xtensa: Move EXCEPTION_TABLE to RO_DATA segment
powerpc: Move EXCEPTION_TABLE to RO_DATA segment
parisc: Move EXCEPTION_TABLE to RO_DATA segment
microblaze: Move EXCEPTION_TABLE to RO_DATA segment
ia64: Move EXCEPTION_TABLE to RO_DATA segment
h8300: Move EXCEPTION_TABLE to RO_DATA segment
c6x: Move EXCEPTION_TABLE to RO_DATA segment
arm64: Move EXCEPTION_TABLE to RO_DATA segment
alpha: Move EXCEPTION_TABLE to RO_DATA segment
x86/vmlinux: Move EXCEPTION_TABLE to RO_DATA segment
x86/vmlinux: Actually use _etext for the end of the text segment
vmlinux.lds.h: Allow EXCEPTION_TABLE to live in RO_DATA
...
- On ARMv8 CPUs without hardware updates of the access flag, avoid
failing cow_user_page() on PFN mappings if the pte is old. The patches
introduce an arch_faults_on_old_pte() macro, defined as false on x86.
When true, cow_user_page() makes the pte young before attempting
__copy_from_user_inatomic().
- Covert the synchronous exception handling paths in
arch/arm64/kernel/entry.S to C.
- FTRACE_WITH_REGS support for arm64.
- ZONE_DMA re-introduced on arm64 to support Raspberry Pi 4
- Several kselftest cases specific to arm64, together with a MAINTAINERS
update for these files (moved to the ARM64 PORT entry).
- Workaround for a Neoverse-N1 erratum where the CPU may fetch stale
instructions under certain conditions.
- Workaround for Cortex-A57 and A72 errata where the CPU may
speculatively execute an AT instruction and associate a VMID with the
wrong guest page tables (corrupting the TLB).
- Perf updates for arm64: additional PMU topologies on HiSilicon
platforms, support for CCN-512 interconnect, AXI ID filtering in the
IMX8 DDR PMU, support for the CCPI2 uncore PMU in ThunderX2.
- GICv3 optimisation to avoid a heavy barrier when accessing the
ICC_PMR_EL1 register.
- ELF HWCAP documentation updates and clean-up.
- SMC calling convention conduit code clean-up.
- KASLR diagnostics printed during boot
- NVIDIA Carmel CPU added to the KPTI whitelist
- Some arm64 mm clean-ups: use generic free_initrd_mem(), remove stale
macro, simplify calculation in __create_pgd_mapping(), typos.
- Kconfig clean-ups: CMDLINE_FORCE to depend on CMDLINE, choice for
endinanness to help with allmodconfig.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAl3YJswACgkQa9axLQDI
XvFwYg//aTGhNLew3ADgW2TYal7LyqetRROixPBrzqHLu2A8No1+QxHMaKxpZVyf
pt25tABuLtPHql3qBzE0ltmfbLVsPj/3hULo404EJb9HLRfUnVGn7gcPkc+p4YAr
IYkYPXJbk6OlJ84vI+4vXmDEF12bWCqamC9qZ+h99qTpMjFXFO17DSJ7xQ8Xic3A
HHgCh4uA7gpTVOhLxaS6KIw+AZNYwvQxLXch2+wj6agbGX79uw9BeMhqVXdkPq8B
RTDJpOdS970WOT4cHWOkmXwsqqGRqgsgyu+bRUJ0U72+0y6MX0qSHIUnVYGmNc5q
Dtox4rryYLvkv/hbpkvjgVhv98q3J1mXt/CalChWB5dG4YwhJKN2jMiYuoAvB3WS
6dR7Dfupgai9gq1uoKgBayS2O6iFLSa4g58vt3EqUBqmM7W7viGFPdLbuVio4ycn
CNF2xZ8MZR6Wrh1JfggO7Hc11EJdSqESYfHO6V/pYB4pdpnqJLDoriYHXU7RsZrc
HvnrIvQWKMwNbqBvpNbWvK5mpBMMX2pEienA3wOqKNH7MbepVsG+npOZTVTtl9tN
FL0ePb/mKJu/2+gW8ntiqYn7EzjKprRmknOiT2FjWWo0PxgJ8lumefuhGZZbaOWt
/aTAeD7qKd/UXLKGHF/9v3q4GEYUdCFOXP94szWVPyLv+D9h8L8=
=TPL9
-----END PGP SIGNATURE-----
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Catalin Marinas:
"Apart from the arm64-specific bits (core arch and perf, new arm64
selftests), it touches the generic cow_user_page() (reviewed by
Kirill) together with a macro for x86 to preserve the existing
behaviour on this architecture.
Summary:
- On ARMv8 CPUs without hardware updates of the access flag, avoid
failing cow_user_page() on PFN mappings if the pte is old. The
patches introduce an arch_faults_on_old_pte() macro, defined as
false on x86. When true, cow_user_page() makes the pte young before
attempting __copy_from_user_inatomic().
- Covert the synchronous exception handling paths in
arch/arm64/kernel/entry.S to C.
- FTRACE_WITH_REGS support for arm64.
- ZONE_DMA re-introduced on arm64 to support Raspberry Pi 4
- Several kselftest cases specific to arm64, together with a
MAINTAINERS update for these files (moved to the ARM64 PORT entry).
- Workaround for a Neoverse-N1 erratum where the CPU may fetch stale
instructions under certain conditions.
- Workaround for Cortex-A57 and A72 errata where the CPU may
speculatively execute an AT instruction and associate a VMID with
the wrong guest page tables (corrupting the TLB).
- Perf updates for arm64: additional PMU topologies on HiSilicon
platforms, support for CCN-512 interconnect, AXI ID filtering in
the IMX8 DDR PMU, support for the CCPI2 uncore PMU in ThunderX2.
- GICv3 optimisation to avoid a heavy barrier when accessing the
ICC_PMR_EL1 register.
- ELF HWCAP documentation updates and clean-up.
- SMC calling convention conduit code clean-up.
- KASLR diagnostics printed during boot
- NVIDIA Carmel CPU added to the KPTI whitelist
- Some arm64 mm clean-ups: use generic free_initrd_mem(), remove
stale macro, simplify calculation in __create_pgd_mapping(), typos.
- Kconfig clean-ups: CMDLINE_FORCE to depend on CMDLINE, choice for
endinanness to help with allmodconfig"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (93 commits)
arm64: Kconfig: add a choice for endianness
kselftest: arm64: fix spelling mistake "contiguos" -> "contiguous"
arm64: Kconfig: make CMDLINE_FORCE depend on CMDLINE
MAINTAINERS: Add arm64 selftests to the ARM64 PORT entry
arm64: kaslr: Check command line before looking for a seed
arm64: kaslr: Announce KASLR status on boot
kselftest: arm64: fake_sigreturn_misaligned_sp
kselftest: arm64: fake_sigreturn_bad_size
kselftest: arm64: fake_sigreturn_duplicated_fpsimd
kselftest: arm64: fake_sigreturn_missing_fpsimd
kselftest: arm64: fake_sigreturn_bad_size_for_magic0
kselftest: arm64: fake_sigreturn_bad_magic
kselftest: arm64: add helper get_current_context
kselftest: arm64: extend test_init functionalities
kselftest: arm64: mangle_pstate_invalid_mode_el[123][ht]
kselftest: arm64: mangle_pstate_invalid_daif_bits
kselftest: arm64: mangle_pstate_invalid_compat_toggle and common utils
kselftest: arm64: extend toplevel skeleton Makefile
drivers/perf: hisi: update the sccl_id/ccl_id for certain HiSilicon platform
arm64: mm: reserve CMA and crashkernel in ZONE_DMA32
...
The API will be used by the hv_balloon and hv_vmbus drivers.
Balloon up/down and hot-add of memory must not be active if the user
wants the Linux VM to support hibernation, because they are incompatible
with hibernation according to Hyper-V team, e.g. upon suspend the
balloon VSP doesn't save any info about the ballooned-out pages (if any);
so, after Linux resumes, Linux balloon VSC expects that the VSP will
return the pages if Linux is under memory pressure, but the VSP will
never do that, since the VSP thinks it never stole the pages from the VM.
So, if the user wants Linux VM to support hibernation, Linux must forbid
balloon up/down and hot-add, and the only functionality of the balloon VSC
driver is reporting the VM's memory pressure to the host.
Ideally, when Linux detects that the user wants it to support hibernation,
the balloon VSC should tell the VSP that it does not support ballooning
and hot-add. However, the current version of the VSP requires the VSC
should support these capabilities, otherwise the capability negotiation
fails and the VSC can not load at all, so with the later changes to the
VSC driver, Linux VM still reports to the VSP that the VSC supports these
capabilities, but the VSC ignores the VSP's requests of balloon up/down
and hot add, and reports an error to the VSP, when applicable. BTW, in
the future the balloon VSP driver will allow the VSC to not support the
capabilities of balloon up/down and hot add.
The ACPI S4 state is not a must for hibernation to work, because Linux is
able to hibernate as long as the system can shut down. However in practice
we decide to artificially use the presence of the virtual ACPI S4 state as
an indicator of the user's intent of using hibernation, because Linux VM
must find a way to know if the user wants to use the hibernation feature
or not.
By default, Hyper-V does not enable the virtual ACPI S4 state; on recent
Hyper-V hosts (e.g. RS5, 19H1), the administrator is able to enable the
state for a VM by WMI commands.
Once all the vmbus and VSC patches for the hibernation feature are
accepted, an extra patch will be submitted to forbid hibernation if the
virtual ACPI S4 state is absent, i.e. hv_is_hibernation_supported() is
false.
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
The ftrace_graph_stub was created and points to ftrace_stub as a way to
assign the functon graph tracer function pointer to a stub function with a
different prototype than what ftrace_stub has and not trigger the C
verifier. The ftrace_graph_stub was created via the linker script
vmlinux.lds.h. Unfortunately, powerpc already uses the name
ftrace_graph_stub for its internal implementation of the function graph
tracer, and even though powerpc would still build, the change via the linker
script broke function tracer on powerpc from working.
By using the name ftrace_stub_graph, which does not exist anywhere else in
the kernel, this should not be a problem.
Link: https://lore.kernel.org/r/1573849732.5937.136.camel@lca.pw
Fixes: b83b43ffc6 ("fgraph: Fix function type mismatches of ftrace_graph_return using ftrace_stub")
Reorted-by: Qian Cai <cai@lca.pw>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
This adds KCSAN instrumentation to atomic-instrumented.h.
Signed-off-by: Marco Elver <elver@google.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
The C compiler is allowing more checks to make sure that function pointers
are assigned to the correct prototype function. Unfortunately, the function
graph tracer uses a special name with its assigned ftrace_graph_return
function pointer that maps to a stub function used by the function tracer
(ftrace_stub). The ftrace_graph_return variable is compared to the
ftrace_stub in some archs to know if the function graph tracer is enabled or
not. This means we can not just simply create a new function stub that
compares it without modifying all the archs.
Instead, have the linker script create a function_graph_stub that maps to
ftrace_stub, and this way we can define the prototype for it to match the
prototype of ftrace_graph_return, and make the compiler checks all happy!
Link: http://lkml.kernel.org/r/20191015090055.789a0aed@gandalf.local.home
Cc: linux-sh@vger.kernel.org
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Reported-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Commit b24413180f ("License cleanup: add SPDX GPL-2.0 license
identifier to files with no license") took care of a lot of files
without any license information.
These headers were not processed by the tool perhaps because they
contain "GPL" in the code.
I do not see any license boilerplate in them, so they fall back to
GPL version 2 only, which is the project default.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Link: https://lore.kernel.org/r/20191018045053.8424-1-yamada.masahiro@socionext.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
A lot of architectures reuse the same simple ioremap implementation, so
start lifting the most simple variant to lib/ioremap.c. It provides
ioremap_prot and iounmap, plus a default ioremap that uses prot_noncached,
although that can be overridden by asm/io.h.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Palmer Dabbelt <palmer@dabbelt.com>
Various architectures that use asm-generic/io.h still defined their
own default versions of ioremap_nocache, ioremap_wt and ioremap_wc
that point back to plain ioremap directly or indirectly. Remove these
definitions and rely on asm-generic/io.h instead. For this to work
the backup ioremap_* defintions needs to be changed to purely cpp
macros instea of inlines to cover for architectures like openrisc
that only define ioremap after including <asm-generic/io.h>.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Palmer Dabbelt <palmer@dabbelt.com>
All MMU-enabled ports have a non-trivial ioremap and should thus provide
the prototype for their implementation instead of providing a generic
one unless a different symbol is not defined. Note that this only
affects sparc32 nds32 as all others do provide their own version.
Also update the kerneldoc comments in asm-generic/io.h to explain the
situation around the default ioremap* implementations correctly.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Whatever reason there is for the existence of ioremap_uc, and the fact
that it returns NULL by default on architectures with an MMU applies
equally to nommu architectures, so don't provide different defaults.
In practice the difference is meaningless as the only portable driver
that uses ioremap_uc is atyfb which probably doesn't show up on nommu
devices.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Palmer Dabbelt <palmer@dabbelt.com>
EXPORT_SYMBOL from assembly code produces an unused symbol __kcrctab_*.
kcrctab is used as a section name (prefixed with three underscores),
but never used as a symbol.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
For EXPORT_SYMBOL from C files, <linux/export.h> defines __ksymtab_*
as local symbols.
For EXPORT_SYMBOL from assembly, in contrast, <asm-generic/export.h>
produces globally-visible __ksymtab_* symbols due to this .globl
directive.
I do not know why this must be global. It still works without this.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Currently bitops-instrumented.h assumes that the architecture provides
atomic, non-atomic and locking bitops (e.g. both set_bit and __set_bit).
This is true on x86 and s390, but is not always true: there is a
generic bitops/non-atomic.h header that provides generic non-atomic
operations, and also a generic bitops/lock.h for locking operations.
powerpc uses the generic non-atomic version, so it does not have it's
own e.g. __set_bit that could be renamed arch___set_bit.
Split up bitops-instrumented.h to mirror the atomic/non-atomic/lock
split. This allows arches to only include the headers where they
have arch-specific versions to rename. Update x86 and s390.
(The generic operations are automatically instrumented because they're
written in C, not asm.)
Suggested-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Daniel Axtens <dja@axtens.net>
Acked-by: Marco Elver <elver@google.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190820024941.12640-1-dja@axtens.net
When using patchable-function-entry, the compiler will record the
callsites into a section named "__patchable_function_entries" rather
than "__mcount_loc". Let's abstract this difference behind a new
FTRACE_CALLSITE_SECTION, so that architectures don't have to handle this
explicitly (e.g. with custom module linker scripts).
As parisc currently handles this explicitly, it is fixed up accordingly,
with its custom linker script removed. Since FTRACE_CALLSITE_SECTION is
only defined when DYNAMIC_FTRACE is selected, the parisc module loading
code is updated to only use the definition in that case. When
DYNAMIC_FTRACE is not selected, modules shouldn't have this section, so
this removes some redundant work in that case.
To make sure that this is keep up-to-date for modules and the main
kernel, a comment is added to vmlinux.lds.h, with the existing ifdeffery
simplified for legibility.
I built parisc generic-{32,64}bit_defconfig with DYNAMIC_FTRACE enabled,
and verified that the section made it into the .ko files for modules.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Helge Deller <deller@gmx.de>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Torsten Duwe <duwe@suse.de>
Tested-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
Tested-by: Sven Schnelle <svens@stackframe.org>
Tested-by: Torsten Duwe <duwe@suse.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Jessica Yu <jeyu@kernel.org>
Cc: linux-parisc@vger.kernel.org
With the previous patch, we should now not be using need_flush_all for
powerpc. But then make sure we force a PID tlbie flush with RIC=2 if
we ever find need_flush_all set. Also don't reset it after a mmu
gather flush.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191024075801.22434-3-aneesh.kumar@linux.ibm.com
The update of the VDSO data is depending on __arch_use_vsyscall() returning
True. This is a leftover from the attempt to map the features of various
architectures 1:1 into generic code.
The usage of __arch_use_vsyscall() in the actual vsyscall implementations
got dropped and replaced by the requirement for the architecture code to
return U64_MAX if the global clocksource is not usable in the VDSO.
But the __arch_use_vsyscall() check in the update code stayed which causes
the VDSO data to be stale or invalid when an architecture actually
implements that function and returns False when the current clocksource is
not usable in the VDSO.
As a consequence the VDSO implementations of clock_getres(), time(),
clock_gettime(CLOCK_.*_COARSE) operate on invalid data and return bogus
information.
Remove the __arch_use_vsyscall() check from the VDSO update function and
update the VDSO data unconditionally.
[ tglx: Massaged changelog and removed the now useless implementations in
asm-generic/ARM64/MIPS ]
Fixes: 44f57d788e ("timekeeping: Provide a generic update_vsyscall() implementation")
Signed-off-by: Huacai Chen <chenhc@lemote.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Paul Burton <paul.burton@mips.com>
Cc: linux-mips@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/1571887709-11447-1-git-send-email-chenhc@lemote.com
Pull kernel lockdown mode from James Morris:
"This is the latest iteration of the kernel lockdown patchset, from
Matthew Garrett, David Howells and others.
From the original description:
This patchset introduces an optional kernel lockdown feature,
intended to strengthen the boundary between UID 0 and the kernel.
When enabled, various pieces of kernel functionality are restricted.
Applications that rely on low-level access to either hardware or the
kernel may cease working as a result - therefore this should not be
enabled without appropriate evaluation beforehand.
The majority of mainstream distributions have been carrying variants
of this patchset for many years now, so there's value in providing a
doesn't meet every distribution requirement, but gets us much closer
to not requiring external patches.
There are two major changes since this was last proposed for mainline:
- Separating lockdown from EFI secure boot. Background discussion is
covered here: https://lwn.net/Articles/751061/
- Implementation as an LSM, with a default stackable lockdown LSM
module. This allows the lockdown feature to be policy-driven,
rather than encoding an implicit policy within the mechanism.
The new locked_down LSM hook is provided to allow LSMs to make a
policy decision around whether kernel functionality that would allow
tampering with or examining the runtime state of the kernel should be
permitted.
The included lockdown LSM provides an implementation with a simple
policy intended for general purpose use. This policy provides a coarse
level of granularity, controllable via the kernel command line:
lockdown={integrity|confidentiality}
Enable the kernel lockdown feature. If set to integrity, kernel features
that allow userland to modify the running kernel are disabled. If set to
confidentiality, kernel features that allow userland to extract
confidential information from the kernel are also disabled.
This may also be controlled via /sys/kernel/security/lockdown and
overriden by kernel configuration.
New or existing LSMs may implement finer-grained controls of the
lockdown features. Refer to the lockdown_reason documentation in
include/linux/security.h for details.
The lockdown feature has had signficant design feedback and review
across many subsystems. This code has been in linux-next for some
weeks, with a few fixes applied along the way.
Stephen Rothwell noted that commit 9d1f8be5cf ("bpf: Restrict bpf
when kernel lockdown is in confidentiality mode") is missing a
Signed-off-by from its author. Matthew responded that he is providing
this under category (c) of the DCO"
* 'next-lockdown' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (31 commits)
kexec: Fix file verification on S390
security: constify some arrays in lockdown LSM
lockdown: Print current->comm in restriction messages
efi: Restrict efivar_ssdt_load when the kernel is locked down
tracefs: Restrict tracefs when the kernel is locked down
debugfs: Restrict debugfs when the kernel is locked down
kexec: Allow kexec_file() with appropriate IMA policy when locked down
lockdown: Lock down perf when in confidentiality mode
bpf: Restrict bpf when kernel lockdown is in confidentiality mode
lockdown: Lock down tracing and perf kprobes when in confidentiality mode
lockdown: Lock down /proc/kcore
x86/mmiotrace: Lock down the testmmiotrace module
lockdown: Lock down module params that specify hardware parameters (eg. ioport)
lockdown: Lock down TIOCSSERIAL
lockdown: Prohibit PCMCIA CIS storage when the kernel is locked down
acpi: Disable ACPI table override if the kernel is locked down
acpi: Ignore acpi_rsdp kernel param when the kernel has been locked down
ACPI: Limit access to custom_method when the kernel is locked down
x86/msr: Restrict MSR access when the kernel is locked down
x86: Lock down IO port access when the kernel is locked down
...
The naming of pgtable_page_{ctor,dtor}() seems to have confused a few
people, and until recently arm64 used these erroneously/pointlessly for
other levels of page table.
To make it incredibly clear that these only apply to the PTE level, and to
align with the naming of pgtable_pmd_page_{ctor,dtor}(), let's rename them
to pgtable_pte_page_{ctor,dtor}().
These changes were generated with the following shell script:
----
git grep -lw 'pgtable_page_.tor' | while read FILE; do
sed -i '{s/pgtable_page_ctor/pgtable_pte_page_ctor/}' $FILE;
sed -i '{s/pgtable_page_dtor/pgtable_pte_page_dtor/}' $FILE;
done
----
... with the documentation re-flowed to remain under 80 columns, and
whitespace fixed up in macros to keep backslashes aligned.
There should be no functional change as a result of this patch.
Link: http://lkml.kernel.org/r/20190722141133.3116-1-mark.rutland@arm.com
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k]
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The original clean up of "cut here" missed the WARN_ON() case (that does
not have a printk message), which was fixed recently by adding an explicit
printk of "cut here". This had the downside of adding a printk() to every
WARN_ON() caller, which reduces the utility of using an instruction
exception to streamline the resulting code. By making this a new BUGFLAG,
all of these can be removed and "cut here" can be handled by the exception
handler.
This was very pronounced on PowerPC, but the effect can be seen on x86 as
well. The resulting text size of a defconfig build shows some small
savings from this patch:
text data bss dec hex filename
19691167 5134320 1646664 26472151 193eed7 vmlinux.before
19676362 5134260 1663048 26473670 193f4c6 vmlinux.after
This change also opens the door for creating something like BUG_MSG(),
where a custom printk() before issuing BUG(), without confusing the "cut
here" line.
Link: http://lkml.kernel.org/r/201908200943.601DD59DCE@keescook
Fixes: 6b15f678fb ("include/asm-generic/bug.h: fix "cut here" for WARN_ON for __WARN_TAINT architectures")
Signed-off-by: Kees Cook <keescook@chromium.org>
Reported-by: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Drew Davenport <ddavenport@chromium.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
Cc: Feng Tang <feng.tang@intel.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In preparation for cleaning up "cut here" even more, this removes the
__WARN_*TAINT() helpers, as they limit the ability to add new BUGFLAG_*
flags to call sites. They are removed by expanding them into full
__WARN_FLAGS() calls.
Link: http://lkml.kernel.org/r/20190819234111.9019-6-keescook@chromium.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Drew Davenport <ddavenport@chromium.org>
Cc: Feng Tang <feng.tang@intel.com>
Cc: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
Cc: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "Clean up WARN() "cut here" handling", v2.
Christophe Leroy noticed that the fix for missing "cut here" in the WARN()
case was adding explicit printk() calls instead of teaching the exception
handler to add it. This refactors the bug/warn infrastructure to pass
this information as a new BUGFLAG.
Longer details repeated from the last patch in the series:
bug: move WARN_ON() "cut here" into exception handler
The original cleanup of "cut here" missed the WARN_ON() case (that does
not have a printk message), which was fixed recently by adding an explicit
printk of "cut here". This had the downside of adding a printk() to every
WARN_ON() caller, which reduces the utility of using an instruction
exception to streamline the resulting code. By making this a new BUGFLAG,
all of these can be removed and "cut here" can be handled by the exception
handler.
This was very pronounced on PowerPC, but the effect can be seen on x86 as
well. The resulting text size of a defconfig build shows some small
savings from this patch:
text data bss dec hex filename
19691167 5134320 1646664 26472151 193eed7 vmlinux.before
19676362 5134260 1663048 26473670 193f4c6 vmlinux.after
This change also opens the door for creating something like BUG_MSG(),
where a custom printk() before issuing BUG(), without confusing the "cut
here" line.
This patch (of 7):
There's no reason to have specialized helpers for passing the warn taint
down to __warn(). Consolidate and refactor helper macros, removing
__WARN_printf() and warn_slowpath_fmt_taint().
Link: http://lkml.kernel.org/r/20190819234111.9019-2-keescook@chromium.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Drew Davenport <ddavenport@chromium.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
Cc: Feng Tang <feng.tang@intel.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
According to 78ddc53473 ("thp: rename split_huge_page_pmd() to
split_huge_pmd()"), update related comment.
Link: http://lkml.kernel.org/r/20190731033406.185285-1-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Both pgtable_cache_init() and pgd_cache_init() are used to initialize kmem
cache for page table allocations on several architectures that do not use
PAGE_SIZE tables for one or more levels of the page table hierarchy.
Most architectures do not implement these functions and use __weak default
NOP implementation of pgd_cache_init(). Since there is no such default
for pgtable_cache_init(), its empty stub is duplicated among most
architectures.
Rename the definitions of pgd_cache_init() to pgtable_cache_init() and
drop empty stubs of pgtable_cache_init().
Link: http://lkml.kernel.org/r/1566457046-22637-1-git-send-email-rppt@linux.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: Will Deacon <will@kernel.org> [arm64]
Acked-by: Thomas Gleixner <tglx@linutronix.de> [x86]
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "mm: remove quicklist page table caches".
A while ago Nicholas proposed to remove quicklist page table caches [1].
I've rebased his patch on the curren upstream and switched ia64 and sh to
use generic versions of PTE allocation.
[1] https://lore.kernel.org/linux-mm/20190711030339.20892-1-npiggin@gmail.com
This patch (of 3):
Remove page table allocator "quicklists". These have been around for a
long time, but have not got much traction in the last decade and are only
used on ia64 and sh architectures.
The numbers in the initial commit look interesting but probably don't
apply anymore. If anybody wants to resurrect this it's in the git
history, but it's unhelpful to have this code and divergent allocator
behaviour for minor archs.
Also it might be better to instead make more general improvements to page
allocator if this is still so slow.
Link: http://lkml.kernel.org/r/1565250728-21721-2-git-send-email-rppt@linux.ibm.com
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Summary of modules changes for the 5.4 merge window:
- Introduce exported symbol namespaces.
This new feature allows subsystem maintainers to partition and
categorize their exported symbols into explicit namespaces. Module
authors are now required to import the namespaces they need.
Some of the main motivations of this feature include: allowing kernel
developers to better manage the export surface, allow subsystem
maintainers to explicitly state that usage of some exported symbols
should only be limited to certain users (think: inter-module or
inter-driver symbols, debugging symbols, etc), as well as more easily
limiting the availability of namespaced symbols to other parts of the
kernel. With the module import requirement, it is also easier to spot
the misuse of exported symbols during patch review. Two new macros are
introduced: EXPORT_SYMBOL_NS() and EXPORT_SYMBOL_NS_GPL(). The API is
thoroughly documented in Documentation/kbuild/namespaces.rst.
- Some small code and kbuild cleanups here and there.
-----BEGIN PGP SIGNATURE-----
iQIcBAABCgAGBQJdh3n8AAoJEMBFfjjOO8Fy94kP+QHZF39QDvLbxAzEYAETAS+o
CFu6wix/DrAwFkTU/kX1eAsAwDBEz0xkMciR4BsLX3sIafUVERxtDXVAui/dA1+6
zfw2c3ObyVwPEk6aUPFprgkj+08gxujsJFlYTsQQUhtRbmxg6R7hD6t6ANxiHaY2
AQe5TzOWXoIa2hHO+7rPMqf8l6qiFCaL0s3v5jrmBXa5mHmc4PVy95h1J6xQVw2u
b+SlvKeylHv+OtCtvthkAJS3hfS35J/1TNb/RNYIvh60IfEguEuFsGuQ9JiSSAZS
pv1cJ+I5d4v8Y/md1rZpdjTJL9gCrq/UUC67+UkejCOn0C+7XM2eR4Bu/jWvdMSn
ZQDHcPhFSIfmP7FaKomPogaBbw1sI1FvM5930pPJzHnyO9+cefBXe7rWaaB+y0At
GAxOtmk1dKh01BT7YO/C0oVuX87csWd74NHypVsbs0TgQo5jBFdZRheyDrq5YB+s
tVK+5H0nqQrCcfo/TvhcsZlgITTGtgTPenaW99/i7qNa9mRUtxC/VkE+aob6HNRF
1iBxxopOTxGN8akyKOVumtkuTQH3EJfouZee//pWbXLzyDmScg/k67vuao8kxbyq
NA1piFAGJAHFsHATxrbvNOq6jZ5bfUT8pwSTs83JppuR++8Hxk7zaShS3/LvsvHt
6ist/epOwTZ7oiNQ04nj
=72Uy
-----END PGP SIGNATURE-----
Merge tag 'modules-for-v5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux
Pull modules updates from Jessica Yu:
"The main bulk of this pull request introduces a new exported symbol
namespaces feature. The number of exported symbols is increasingly
growing with each release (we're at about 31k exports as of 5.3-rc7)
and we currently have no way of visualizing how these symbols are
"clustered" or making sense of this huge export surface.
Namespacing exported symbols allows kernel developers to more
explicitly partition and categorize exported symbols, as well as more
easily limiting the availability of namespaced symbols to other parts
of the kernel. For starters, we have introduced the USB_STORAGE
namespace to demonstrate the API's usage. I have briefly summarized
the feature and its main motivations in the tag below.
Summary:
- Introduce exported symbol namespaces.
This new feature allows subsystem maintainers to partition and
categorize their exported symbols into explicit namespaces. Module
authors are now required to import the namespaces they need.
Some of the main motivations of this feature include: allowing
kernel developers to better manage the export surface, allow
subsystem maintainers to explicitly state that usage of some
exported symbols should only be limited to certain users (think:
inter-module or inter-driver symbols, debugging symbols, etc), as
well as more easily limiting the availability of namespaced symbols
to other parts of the kernel.
With the module import requirement, it is also easier to spot the
misuse of exported symbols during patch review.
Two new macros are introduced: EXPORT_SYMBOL_NS() and
EXPORT_SYMBOL_NS_GPL(). The API is thoroughly documented in
Documentation/kbuild/namespaces.rst.
- Some small code and kbuild cleanups here and there"
* tag 'modules-for-v5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux:
module: Remove leftover '#undef' from export header
module: remove unneeded casts in cmp_name()
module: move CONFIG_UNUSED_SYMBOLS to the sub-menu of MODULES
module: remove redundant 'depends on MODULES'
module: Fix link failure due to invalid relocation on namespace offset
usb-storage: export symbols in USB_STORAGE namespace
usb-storage: remove single-use define for debugging
docs: Add documentation for Symbol Namespaces
scripts: Coccinelle script for namespace dependencies.
modpost: add support for generating namespace dependencies
export: allow definition default namespaces in Makefiles or sources
module: add config option MODULE_ALLOW_MISSING_NAMESPACE_IMPORTS
modpost: add support for symbol namespaces
module: add support for symbol namespaces.
export: explicitly align struct kernel_symbol
module: support reading multiple values per modinfo tag
Pull crypto updates from Herbert Xu:
"API:
- Add the ability to abort a skcipher walk.
Algorithms:
- Fix XTS to actually do the stealing.
- Add library helpers for AES and DES for single-block users.
- Add library helpers for SHA256.
- Add new DES key verification helper.
- Add surrounding bits for ESSIV generator.
- Add accelerations for aegis128.
- Add test vectors for lzo-rle.
Drivers:
- Add i.MX8MQ support to caam.
- Add gcm/ccm/cfb/ofb aes support in inside-secure.
- Add ofb/cfb aes support in media-tek.
- Add HiSilicon ZIP accelerator support.
Others:
- Fix potential race condition in padata.
- Use unbound workqueues in padata"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (311 commits)
crypto: caam - Cast to long first before pointer conversion
crypto: ccree - enable CTS support in AES-XTS
crypto: inside-secure - Probe transform record cache RAM sizes
crypto: inside-secure - Base RD fetchcount on actual RD FIFO size
crypto: inside-secure - Base CD fetchcount on actual CD FIFO size
crypto: inside-secure - Enable extended algorithms on newer HW
crypto: inside-secure: Corrected configuration of EIP96_TOKEN_CTRL
crypto: inside-secure - Add EIP97/EIP197 and endianness detection
padata: remove cpu_index from the parallel_queue
padata: unbind parallel jobs from specific CPUs
padata: use separate workqueues for parallel and serial work
padata, pcrypt: take CPU hotplug lock internally in padata_alloc_possible
crypto: pcrypt - remove padata cpumask notifier
padata: make padata_do_parallel find alternate callback CPU
workqueue: require CPU hotplug read exclusion for apply_workqueue_attrs
workqueue: unconfine alloc/apply/free_workqueue_attrs()
padata: allocate workqueue internally
arm64: dts: imx8mq: Add CAAM node
random: Use wait_event_freezable() in add_hwgenerator_randomness()
crypto: ux500 - Fix COMPILE_TEST warnings
...
Here are three small cleanup patches for the include/asm-generic
directory. Christoph removes the __ioremap as part of a cleanup,
Nico improves the constant do_div() optimization, and Denis
changes BUG_ON() to be consistent with other implementations.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJdf97EAAoJEJpsee/mABjZwTAP+wXle7qDBAkT+ddDwJ9IM+5x
j3oJ/eo4YN3II20xTbjmPfXBByb4sBnyZ1EWMtkwcMTjTbyuEwRLOHy810NWn0Yc
zKfrUGuPN2J54WGbEtzlCSf5hWOPdYij2JbkqZmwbCgnn4vHWOtOlL0m7RZ4sKWt
PjjdjZ+7B7ZdVIqkwCmyv7kY+0IQtdekjQDeqXSmoNeT2zeLS/5Di3inD53mi3Q0
1B6Eh1VEETqnnFke3GtHr5euo2EzglNnOpIP5V0VyN/t0HSz4mD9ZwGdANKPUij7
NSNUpCkbyvU6TNY82TMDP81mGH/QBF9Dy/1SKHFc258VJvLcB3Dtdp47fS9rYZ/N
XqxKZ+BVhZSdDV9JKJXX1Bl4vwP6IOF8LalQM50qOIw2kdxA8ZDJ17CPQ3a4kM6Y
QBNepCTIPZdjXpT5OaJdQiX8prsRMEBiD7I+Z9kO54ImHt94IB2MeZqMuuFG0GmL
/o3Ia1FYd2TF3SWHCxe2n2kFsLxQFvtwGP7zPvQpjl0Bh1UGEo0XhBERVe5sKkGm
xYfzXAbi38R6mY25lsqie7ibUQTY6Z2ZFYK5km9oGiAyNO0r9ioy56ROPWxZG2Kv
f74cQiEIORfc5ePz17NRsD8Mppe0g9lSwZ1RV29GtnaXNLET6gnxGQDLdzvitgF4
ZWPIGg+qOM9fmrxdNdCr
=9VXy
-----END PGP SIGNATURE-----
Merge tag 'asm-generic-5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic
Pull asm-generic updates from Arnd Bergmann:
"Here are three small cleanup patches for the include/asm-generic
directory.
Christoph removes the __ioremap as part of a cleanup, Nico improves
the constant do_div() optimization, and Denis changes BUG_ON() to be
consistent with other implementations"
* tag 'asm-generic-5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
asm-generic: add unlikely to default BUG_ON(x)
__div64_const32(): improve the generic C version
asm-generic: don't provide __ioremap
Pull core timer updates from Thomas Gleixner:
"Timers and timekeeping updates:
- A large overhaul of the posix CPU timer code which is a preparation
for moving the CPU timer expiry out into task work so it can be
properly accounted on the task/process.
An update to the bogus permission checks will come later during the
merge window as feedback was not complete before heading of for
travel.
- Switch the timerqueue code to use cached rbtrees and get rid of the
homebrewn caching of the leftmost node.
- Consolidate hrtimer_init() + hrtimer_init_sleeper() calls into a
single function
- Implement the separation of hrtimers to be forced to expire in hard
interrupt context even when PREEMPT_RT is enabled and mark the
affected timers accordingly.
- Implement a mechanism for hrtimers and the timer wheel to protect
RT against priority inversion and live lock issues when a (hr)timer
which should be canceled is currently executing the callback.
Instead of infinitely spinning, the task which tries to cancel the
timer blocks on a per cpu base expiry lock which is held and
released by the (hr)timer expiry code.
- Enable the Hyper-V TSC page based sched_clock for Hyper-V guests
resulting in faster access to timekeeping functions.
- Updates to various clocksource/clockevent drivers and their device
tree bindings.
- The usual small improvements all over the place"
* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (101 commits)
posix-cpu-timers: Fix permission check regression
posix-cpu-timers: Always clear head pointer on dequeue
hrtimer: Add a missing bracket and hide `migration_base' on !SMP
posix-cpu-timers: Make expiry_active check actually work correctly
posix-timers: Unbreak CONFIG_POSIX_TIMERS=n build
tick: Mark sched_timer to expire in hard interrupt context
hrtimer: Add kernel doc annotation for HRTIMER_MODE_HARD
x86/hyperv: Hide pv_ops access for CONFIG_PARAVIRT=n
posix-cpu-timers: Utilize timerqueue for storage
posix-cpu-timers: Move state tracking to struct posix_cputimers
posix-cpu-timers: Deduplicate rlimit handling
posix-cpu-timers: Remove pointless comparisons
posix-cpu-timers: Get rid of 64bit divisions
posix-cpu-timers: Consolidate timer expiry further
posix-cpu-timers: Get rid of zero checks
rlimit: Rewrite non-sensical RLIMIT_CPU comment
posix-cpu-timers: Respect INFINITY for hard RTTIME limit
posix-cpu-timers: Switch thread group sampling to array
posix-cpu-timers: Restructure expiry array
posix-cpu-timers: Remove cputime_expires
...
Pull scheduler updates from Ingo Molnar:
- MAINTAINERS: Add Mark Rutland as perf submaintainer, Juri Lelli and
Vincent Guittot as scheduler submaintainers. Add Dietmar Eggemann,
Steven Rostedt, Ben Segall and Mel Gorman as scheduler reviewers.
As perf and the scheduler is getting bigger and more complex,
document the status quo of current responsibilities and interests,
and spread the review pain^H^H^H^H fun via an increase in the Cc:
linecount generated by scripts/get_maintainer.pl. :-)
- Add another series of patches that brings the -rt (PREEMPT_RT) tree
closer to mainline: split the monolithic CONFIG_PREEMPT dependencies
into a new CONFIG_PREEMPTION category that will allow the eventual
introduction of CONFIG_PREEMPT_RT. Still a few more hundred patches
to go though.
- Extend the CPU cgroup controller with uclamp.min and uclamp.max to
allow the finer shaping of CPU bandwidth usage.
- Micro-optimize energy-aware wake-ups from O(CPUS^2) to O(CPUS).
- Improve the behavior of high CPU count, high thread count
applications running under cpu.cfs_quota_us constraints.
- Improve balancing with SCHED_IDLE (SCHED_BATCH) tasks present.
- Improve CPU isolation housekeeping CPU allocation NUMA locality.
- Fix deadline scheduler bandwidth calculations and logic when cpusets
rebuilds the topology, or when it gets deadline-throttled while it's
being offlined.
- Convert the cpuset_mutex to percpu_rwsem, to allow it to be used from
setscheduler() system calls without creating global serialization.
Add new synchronization between cpuset topology-changing events and
the deadline acceptance tests in setscheduler(), which were broken
before.
- Rework the active_mm state machine to be less confusing and more
optimal.
- Rework (simplify) the pick_next_task() slowpath.
- Improve load-balancing on AMD EPYC systems.
- ... and misc cleanups, smaller fixes and improvements - please see
the Git log for more details.
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (53 commits)
sched/psi: Correct overly pessimistic size calculation
sched/fair: Speed-up energy-aware wake-ups
sched/uclamp: Always use 'enum uclamp_id' for clamp_id values
sched/uclamp: Update CPU's refcount on TG's clamp changes
sched/uclamp: Use TG's clamps to restrict TASK's clamps
sched/uclamp: Propagate system defaults to the root group
sched/uclamp: Propagate parent clamps
sched/uclamp: Extend CPU's cgroup controller
sched/topology: Improve load balancing on AMD EPYC systems
arch, ia64: Make NUMA select SMP
sched, perf: MAINTAINERS update, add submaintainers and reviewers
sched/fair: Use rq_lock/unlock in online_fair_sched_group
cpufreq: schedutil: fix equation in comment
sched: Rework pick_next_task() slow-path
sched: Allow put_prev_task() to drop rq->lock
sched/fair: Expose newidle_balance()
sched: Add task_struct pointer to sched_class::set_curr_task
sched: Rework CPU hotplug task selection
sched/{rt,deadline}: Fix set_next_task vs pick_next_task
sched: Fix kerneldoc comment for ia64_set_curr_task
...
- 52-bit virtual addressing in the kernel
- New ABI to allow tagged user pointers to be dereferenced by syscalls
- Early RNG seeding by the bootloader
- Improve robustness of SMP boot
- Fix TLB invalidation in light of recent architectural clarifications
- Support for i.MX8 DDR PMU
- Remove direct LSE instruction patching in favour of static keys
- Function error injection using kprobes
- Support for the PPTT "thread" flag introduced by ACPI 6.3
- Move PSCI idle code into proper cpuidle driver
- Relaxation of implicit I/O memory barriers
- Build with RELR relocations when toolchain supports them
- Numerous cleanups and non-critical fixes
-----BEGIN PGP SIGNATURE-----
iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAl1yYREQHHdpbGxAa2Vy
bmVsLm9yZwAKCRC3rHDchMFjNAM3CAChqDFQkryXoHwdeEcaukMRVNxtxOi4pM4g
5xqkb7PoqRJssIblsuhaXjrSD97yWCgaqCmFe6rKoes++lP4bFcTe22KXPPyPBED
A+tK4nTuKKcZfVbEanUjI+ihXaHJmKZ/kwAxWsEBYZ4WCOe3voCiJVNO2fHxqg1M
8TskZ2BoayTbWMXih0eJg2MCy/xApBq4b3nZG4bKI7Z9UpXiKN1NYtDh98ZEBK4V
d/oNoHsJ2ZvIQsztoBJMsvr09DTCazCijWZiECadm6l41WEPFizngrACiSJLLtYo
0qu4qxgg9zgFlvBCRQmIYSggTuv35RgXSfcOwChmW5DUjHG+f9GK
=Ru4B
-----END PGP SIGNATURE-----
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Will Deacon:
"Although there isn't tonnes of code in terms of line count, there are
a fair few headline features which I've noted both in the tag and also
in the merge commits when I pulled everything together.
The part I'm most pleased with is that we had 35 contributors this
time around, which feels like a big jump from the usual small group of
core arm64 arch developers. Hopefully they all enjoyed it so much that
they'll continue to contribute, but we'll see.
It's probably worth highlighting that we've pulled in a branch from
the risc-v folks which moves our CPU topology code out to where it can
be shared with others.
Summary:
- 52-bit virtual addressing in the kernel
- New ABI to allow tagged user pointers to be dereferenced by
syscalls
- Early RNG seeding by the bootloader
- Improve robustness of SMP boot
- Fix TLB invalidation in light of recent architectural
clarifications
- Support for i.MX8 DDR PMU
- Remove direct LSE instruction patching in favour of static keys
- Function error injection using kprobes
- Support for the PPTT "thread" flag introduced by ACPI 6.3
- Move PSCI idle code into proper cpuidle driver
- Relaxation of implicit I/O memory barriers
- Build with RELR relocations when toolchain supports them
- Numerous cleanups and non-critical fixes"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (114 commits)
arm64: remove __iounmap
arm64: atomics: Use K constraint when toolchain appears to support it
arm64: atomics: Undefine internal macros after use
arm64: lse: Make ARM64_LSE_ATOMICS depend on JUMP_LABEL
arm64: asm: Kill 'asm/atomic_arch.h'
arm64: lse: Remove unused 'alt_lse' assembly macro
arm64: atomics: Remove atomic_ll_sc compilation unit
arm64: avoid using hard-coded registers for LSE atomics
arm64: atomics: avoid out-of-line ll/sc atomics
arm64: Use correct ll/sc atomic constraints
jump_label: Don't warn on __exit jump entries
docs/perf: Add documentation for the i.MX8 DDR PMU
perf/imx_ddr: Add support for AXI ID filtering
arm64: kpti: ensure patched kernel text is fetched from PoU
arm64: fix fixmap copy for 16K pages and 48-bit VA
perf/smmuv3: Validate groups for global filtering
perf/smmuv3: Validate group size
arm64: Relax Documentation/arm64/tagged-pointers.rst
arm64: kvm: Replace hardcoded '1' with SYS_PAR_EL1_F
arm64: mm: Ignore spurious translation faults taken from the kernel
...
Commit 7290d58095 ("module: use relative references for __ksymtab
entries") converted the '__put' #define into an assembly macro in
asm-generic/export.h but forgot to remove the corresponding '#undef'.
Remove the leftover '#undef'.
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Jessica Yu <jeyu@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
Commit 8651ec01da ("module: add support for symbol namespaces.")
broke linking for arm64 defconfig:
| lib/crypto/arc4.o: In function `__ksymtab_arc4_setkey':
| arc4.c:(___ksymtab+arc4_setkey+0x8): undefined reference to `no symbol'
| lib/crypto/arc4.o: In function `__ksymtab_arc4_crypt':
| arc4.c:(___ksymtab+arc4_crypt+0x8): undefined reference to `no symbol'
This is because the dummy initialisation of the 'namespace_offset' field
in 'struct kernel_symbol' when using EXPORT_SYMBOL on architectures with
support for PREL32 locations uses an offset from an absolute address (0)
in an effort to trick 'offset_to_pointer' into behaving as a NOP,
allowing non-namespaced symbols to be treated in the same way as those
belonging to a namespace.
Unfortunately, place-relative relocations require a symbol reference
rather than an absolute value and, although x86 appears to get away with
this due to placing the kernel text at the top of the address space, it
almost certainly results in a runtime failure if the kernel is relocated
dynamically as a result of KASLR.
Rework 'namespace_offset' so that a value of 0, which cannot occur for a
valid namespaced symbol, indicates that the corresponding symbol does
not belong to a namespace.
Cc: Matthias Maennich <maennich@google.com>
Cc: Jessica Yu <jeyu@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Fixes: 8651ec01da ("module: add support for symbol namespaces.")
Reported-by: kbuild test robot <lkp@intel.com>
Tested-by: Matthias Maennich <maennich@google.com>
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Matthias Maennich <maennich@google.com>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
The EXPORT_SYMBOL_NS() and EXPORT_SYMBOL_NS_GPL() macros can be used to
export a symbol to a specific namespace. There are no _GPL_FUTURE and
_UNUSED variants because these are currently unused, and I'm not sure
they are necessary.
I didn't add EXPORT_SYMBOL_NS() for ASM exports; this patch sets the
namespace of ASM exports to NULL by default. In case of relative
references, it will be relocatable to NULL. If there's a need, this
should be pretty easy to add.
A module that wants to use a symbol exported to a namespace must add a
MODULE_IMPORT_NS() statement to their module code; otherwise, modpost
will complain when building the module, and the kernel module loader
will emit an error and fail when loading the module.
MODULE_IMPORT_NS() adds a modinfo tag 'import_ns' to the module. That
tag can be observed by the modinfo command, modpost and kernel/module.c
at the time of loading the module.
The ELF symbols are renamed to include the namespace with an asm label;
for example, symbol 'usb_stor_suspend' in namespace USB_STORAGE becomes
'usb_stor_suspend.USB_STORAGE'. This allows modpost to do namespace
checking, without having to go through all the effort of parsing ELF and
relocation records just to get to the struct kernel_symbols.
On x86_64 I saw no difference in binary size (compression), but at
runtime this will require a word of memory per export to hold the
namespace. An alternative could be to store namespaced symbols in their
own section and use a separate 'struct namespaced_kernel_symbol' for
that section, at the cost of making the module loader more complex.
Co-developed-by: Martijn Coenen <maco@android.com>
Signed-off-by: Martijn Coenen <maco@android.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Matthias Maennich <maennich@google.com>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
This change allows growing struct kernel_symbol without wasting bytes to
alignment. It also concretized the alignment of ksymtab entries if
relative references are used for ksymtab entries.
struct kernel_symbol was already implicitly being aligned to the word
size, except on x86_64 and m68k, where it is aligned to 16 and 2 bytes,
respectively.
As far as I can tell there is no requirement for aligning struct
kernel_symbol to 16 bytes on x86_64, but gcc aligns structs to their
size, and the linker aligns the custom __ksymtab sections to the largest
data type contained within, so setting KSYM_ALIGN to 16 was necessary to
stay consistent with the code generated for non-ASM EXPORT_SYMBOL(). Now
that non-ASM EXPORT_SYMBOL() explicitly aligns to word size (8),
KSYM_ALIGN is no longer necessary.
In case of relative references, the alignment has been changed
accordingly to not waste space when adding new struct members.
As for m68k, struct kernel_symbol is aligned to 2 bytes even though the
structure itself is 8 bytes; using a 4-byte alignment shouldn't hurt.
I manually verified the output of the __ksymtab sections didn't change
on x86, x86_64, arm, arm64 and m68k. As expected, the section contents
didn't change, and the ELF section alignment only changed on x86_64 and
m68k. Feedback from other archs more than welcome.
Co-developed-by: Martijn Coenen <maco@android.com>
Signed-off-by: Martijn Coenen <maco@android.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Matthias Maennich <maennich@google.com>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
Add unlikely to default BUG_ON(x) in !CONFIG_BUG. It makes
the define consistent with BUG_ON(x) in CONFIG_BUG.
Signed-off-by: Denis Efremov <efremov@linux.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Let's rework that code to avoid large immediate values and convert some
64-bit variables to 32-bit ones when possible. This allows gcc to
produce smaller and better code. This even produces optimal code on
RISC-V.
Signed-off-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
__ioremap is not a kernel API, but used for helpers with differing
semantics in arch code. We should not provide it in as-generic.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Paul Walmsley <paul.walmsley@sifive.com>
Tested-by: Paul Walmsley <paul.walmsley@sifive.com> # rv32, rv64 boot
Acked-by: Paul Walmsley <paul.walmsley@sifive.com> # arch/riscv
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Contrary to the description, the first parameter (n) should not be passed
as a pointer, but directly as an lvalue. This is possible because do_div() is
a macro.
Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lkml.kernel.org/r/20190808181948.27659-1-j.neuschaefer@gmx.net
Hyper-V guests use the default native_sched_clock() in
pv_ops.time.sched_clock on x86. But native_sched_clock() directly uses the
raw TSC value, which can be discontinuous in a Hyper-V VM.
Add the generic hv_setup_sched_clock() to set the sched clock function
appropriately. On x86, this sets pv_ops.time.sched_clock to read the
Hyper-V reference TSC value that is scaled and adjusted to be continuous.
Also move the Hyper-V reference TSC initialization much earlier in the boot
process so no discontinuity is observed when pv_ops.time.sched_clock
calculates its offset.
[ tglx: Folded build fix ]
Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lkml.kernel.org/r/20190814123216.32245-3-Tianyu.Lan@microsoft.com
The lockdown module is intended to allow for kernels to be locked down
early in boot - sufficiently early that we don't have the ability to
kmalloc() yet. Add support for early initialisation of some LSMs, and
then add them to the list of names when we do full initialisation later.
Early LSMs are initialised in link order and cannot be overridden via
boot parameters, and cannot make use of kmalloc() (since the allocator
isn't initialised yet).
(Fixed by Stephen Rothwell to include a stub to fix builds when
!CONFIG_SECURITY)
Signed-off-by: Matthew Garrett <mjg59@google.com>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: James Morris <jmorris@namei.org>
A compiler throws a warning on an arm64 system since commit 9849a5697d
("arch, mm: convert all architectures to use 5level-fixup.h"),
mm/kasan/init.c: In function 'kasan_free_p4d':
mm/kasan/init.c:344:9: warning: variable 'p4d' set but not used [-Wunused-but-set-variable]
p4d_t *p4d;
^~~
because p4d_none() in "5level-fixup.h" is compiled away while it is a
static inline function in "pgtable-nopud.h".
However, if converted p4d_none() to a static inline there, powerpc would
be unhappy as it reads those in assembler language in
"arch/powerpc/include/asm/book3s/64/pgtable.h", so it needs to skip
assembly include for the static inline C function.
While at it, converted a few similar functions to be consistent with the
ones in "pgtable-nopud.h".
Link: http://lkml.kernel.org/r/20190806232917.881-1-cai@lca.pw
Signed-off-by: Qian Cai <cai@lca.pw>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The function override_function_with_return() is defined separately for
each architecture and every architecture's definition is almost same
with each other. E.g. x86 and powerpc both define function in its own
asm/error-injection.h header and override_function_with_return() has
the same definition, the only difference is that x86 defines an extra
function just_return_func() but it is specific for x86 and is only used
by x86's override_function_with_return(), so don't need to export this
function.
This patch consolidates override_function_with_return() definition into
asm-generic/error-injection.h header, thus all architectures can use the
common definition. As result, the architecture specific headers are
removed; the include/linux/error-injection.h header also changes to
include asm-generic/error-injection.h header rather than architecture
header, furthermore, it includes linux/compiler.h for successful
compilation.
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Will Deacon <will@kernel.org>
Commit d66acc39c7 ("bitops: Optimise get_order()") introduced a
compilation warning because "rx_frag_size" is an "ushort" while
PAGE_SHIFT here is 16.
The commit changed the get_order() to be a multi-line macro where
compilers insist to check all statements in the macro even when
__builtin_constant_p(rx_frag_size) will return false as "rx_frag_size"
is a module parameter.
In file included from ./arch/powerpc/include/asm/page_64.h:107,
from ./arch/powerpc/include/asm/page.h:242,
from ./arch/powerpc/include/asm/mmu.h:132,
from ./arch/powerpc/include/asm/lppaca.h:47,
from ./arch/powerpc/include/asm/paca.h:17,
from ./arch/powerpc/include/asm/current.h:13,
from ./include/linux/thread_info.h:21,
from ./arch/powerpc/include/asm/processor.h:39,
from ./include/linux/prefetch.h:15,
from drivers/net/ethernet/emulex/benet/be_main.c:14:
drivers/net/ethernet/emulex/benet/be_main.c: In function 'be_rx_cqs_create':
./include/asm-generic/getorder.h:54:9: warning: comparison is always
true due to limited range of data type [-Wtype-limits]
(((n) < (1UL << PAGE_SHIFT)) ? 0 : \
^
drivers/net/ethernet/emulex/benet/be_main.c:3138:33: note: in expansion
of macro 'get_order'
adapter->big_page_size = (1 << get_order(rx_frag_size)) * PAGE_SIZE;
^~~~~~~~~
Fix it by moving all of this multi-line macro into a proper function,
and killing __get_order() off.
[akpm@linux-foundation.org: remove __get_order() altogether]
[cai@lca.pw: v2]
Link: http://lkml.kernel.org/r/1564000166-31428-1-git-send-email-cai@lca.pw
Link: http://lkml.kernel.org/r/1563914986-26502-1-git-send-email-cai@lca.pw
Fixes: d66acc39c7 ("bitops: Optimise get_order()")
Signed-off-by: Qian Cai <cai@lca.pw>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Jakub Jelinek <jakub@redhat.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Bill Wendling <morbo@google.com>
Cc: James Y Knight <jyknight@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
CONFIG_PREEMPTION is selected by CONFIG_PREEMPT and by
CONFIG_PREEMPT_RT. Both PREEMPT and PREEMPT_RT require the same
functionality which today depends on CONFIG_PREEMPT.
Switch the preemption code, scheduler and init task over to use
CONFIG_PREEMPTION.
That's the first step towards RT in that area. The more complex changes are
coming separately.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Paul E. McKenney <paulmck@linux.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20190726212124.117528401@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The generic aegis128 software crypto driver recently gained support
for using SIMD intrinsics to increase performance, for which it
uncondionally #include's the <asm/simd.h> header. Unfortunately,
this header does not exist on many architectures, resulting in
build failures.
Since asm-generic already has a version of simd.h, let's make it
a mandatory header so that it gets instantiated on all architectures
that don't provide their own version.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The generic SMP variant of arch_futex_atomic_op_inuser() returns always
-ENOSYS so the switch case and surrounding code are pointless. Remove it
and just return -ENOSYS.
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lkml.kernel.org/r/12bdaca8-99eb-e576-f842-5970ab1d6a92@virtuozzo.com
For architectures using __WARN_TAINT, the WARN_ON macro did not print
out the "cut here" string. The other WARN_XXX macros would print "cut
here" inside __warn_printk, which is not called for WARN_ON since it
doesn't have a message to print.
Link: http://lkml.kernel.org/r/20190624154831.163888-1-ddavenport@chromium.org
Fixes: a7bed27af1 ("bug: fix "cut here" location for __WARN_TAINT architectures")
Signed-off-by: Drew Davenport <ddavenport@chromium.org>
Acked-by: Kees Cook <keescook@chromium.org>
Tested-by: Kees Cook <keescook@chromium.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix this compilation warning on x86 by making flush_cache_vmap() inline.
lib/ioremap.c: In function 'ioremap_page_range':
lib/ioremap.c:214:16: warning: variable 'start' set but not used [-Wunused-but-set-variable]
unsigned long start;
^~~~~
While at it, convert all other similar functions to inline for
consistency.
Link: http://lkml.kernel.org/r/1562594592-15228-1-git-send-email-cai@lca.pw
Signed-off-by: Qian Cai <cai@lca.pw>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The asm-generic changes for 5.3 consist of a cleanup series from
Christoph Hellwig, who explains:
"asm-generic/ptrace.h is a little weird in that it doesn't actually
implement any functionality, but it provided multiple layers of macros
that just implement trivial inline functions. We implement those
directly in the few architectures and be off with a much simpler
design."
Link: https://lore.kernel.org/lkml/20190624054728.30966-1-hch@lst.de/
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJdKPURAAoJEJpsee/mABjZalgP/idFZKlL9jb32p9eVacW1ngm
CzwKk+49UBpLlimTh3ZtpiSJEHQyXP/QYlJ0/kV65YJriq5FsqlBrkPnWgsgMG9x
HBhJEEfVtXolXK3yNEsFIt/0j0Xh7+uCyBNZNuJrIRy/9x2z2nhBgDAenWPpZNuT
qjpArBAVEWQMsWgmgZUlCKOT7ziSx5+w1bfqiiUZDjwjqimPhLUBfoZmUWHtO49M
4/95RVOIMoLlIcaCUfqsvfkf7v6mfFAADhTrB/FZWVNX839fnpifqQL9BmOlgrEM
kxn5wM/dxRDwRT8+mVRyB8ax4/rIgMIFoaA7Hrv+hoUsiOVD7AkNXynZKQh1hhjl
449j68esoA6vlfdFIhagpKKTiQcWXJDbEgAoSJcM0WIl3JAjc+3nVWShTAAEW65r
Z+Bgy1OczoCsRXbYR/TwpThHj3197xMRQEluzaLnd5Zx5feUDUKuDcxhPpev/ceO
qmV5FeGqxRlZhJjVK8lmcHNZP0e4pkodwrNKC/2NIlIp6EKmMNI0nCjVqINigHGC
97Kc7N94WHdQ3tA7GB8YaUfd8w86W5ZOgRh+uuZ0brPziL1MR5lD/NvzjVSfyvVp
7UHNP7stNbavg20vDhlWGIsWiwoDlJf0YLUA6kXHryb9i/fh8sqWjz99QFu6QIfs
BTgeLtNP8hKhMkgew2XL
=jkfI
-----END PGP SIGNATURE-----
Merge tag 'asm-generic-5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic
Pull asm-generic updates from Arnd Bergmann:
"The asm-generic changes for 5.3 consist of a cleanup series to remove
ptrace.h from Christoph Hellwig, who explains:
'asm-generic/ptrace.h is a little weird in that it doesn't actually
implement any functionality, but it provided multiple layers of
macros that just implement trivial inline functions. We implement
those directly in the few architectures and be off with a much
simpler design.'
at https://lore.kernel.org/lkml/20190624054728.30966-1-hch@lst.de/"
* tag 'asm-generic-5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
asm-generic: remove ptrace.h
x86: don't use asm-generic/ptrace.h
sh: don't use asm-generic/ptrace.h
powerpc: don't use asm-generic/ptrace.h
arm64: don't use asm-generic/ptrace.h
- Rework some vmbus code to separate architecture specifics out to
arch/x86/. This is part of the work of adding arm64 support to Hyper-V.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE4n5dijQDou9mhzu83qZv95d3LNwFAl0nfBEACgkQ3qZv95d3
LNyf8Q//fn+Eb+Pun3a5Cus8qYBQIYrVpt/bUmMuisV8A+MdRIOSwUU/oEbzCEwa
RhKKGKkNSkItKHz0QyqC9FwHZ18+6iLmSF1VwHzKVV+GJ1kQqI9+r4dC23yO/OCz
cuB6Uc+Jd7eLV9021U0jgmYI2HOLOfd5IUNqnyxuw6BPYYaJhrf3J9qSnlKmJ2tW
qhzSfKC2nn0+QbYSC5ww990o10vPMw3mcoXu23HKOU7Q2aqKkdK0HWjNrv49eent
NPbtjFoSABSVS+4eTzW1BrK2Rkwa+TVpAC75a4OPOd4KPUvibqCWjmEl8IE55Pre
BXwUoZnZN5cC1ayUGYuhyOmzFJtwxSWqgXlZRrmr+/XDgNk4k4V2bv253yFegZw2
1hMTbGwYrk4MrGNQB3YvnzJz8ObrTmNR98JvJz/bKPtHKmKmIbHcCWkR5gYAApuy
cY9PqKR5wWQ5A+leePzpXIXDtWlDHy7SDhUU14mbEj2bvQ3tC0yodHabpzl0Lx8e
feXGuqt/g5Lop+KGcRJj32tx23JBRkwfgIyagfRAdzVCs4urp9pt+tKezPkEpWW6
dd+Gc43Rnjd94neTCoo1seS9KH7n56ldnmeygMEDzI0rgW0q1m/Qwys0+6BIQR2G
lmcL1qJlquy3Fjfs1aYkkPqkQYoIqDwGQivSUgIDZ0TLBIsWAfY=
=wIj3
-----END PGP SIGNATURE-----
Merge tag 'hyperv-next-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
Pull hyper-v updates from Sasha Levin:
- Add a module description to the Hyper-V vmbus module.
- Rework some vmbus code to separate architecture specifics out to
arch/x86/. This is part of the work of adding arm64 support to
Hyper-V.
* tag 'hyperv-next-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
Drivers: hv: vmbus: Break out ISA independent parts of mshyperv.h
drivers: hv: Add a module description line to the hv_vmbus driver
Most architectures have identical or very similar implementation of
pte_alloc_one_kernel(), pte_alloc_one(), pte_free_kernel() and
pte_free().
Add a generic implementation that can be reused across architectures and
enable its use on x86.
The generic implementation uses
GFP_KERNEL | __GFP_ZERO
for the kernel page tables and
GFP_KERNEL | __GFP_ZERO | __GFP_ACCOUNT
for the user page tables.
The "base" functions for PTE allocation, namely __pte_alloc_one_kernel()
and __pte_alloc_one() are intended for the architectures that require
additional actions after actual memory allocation or must use non-default
GFP flags.
x86 is switched to use generic pte_alloc_one_kernel(), pte_free_kernel() and
pte_free().
x86 still implements pte_alloc_one() to allow run-time control of GFP
flags required for "userpte" command line option.
Link: http://lkml.kernel.org/r/1557296232-15361-2-git-send-email-rppt@linux.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Guo Ren <guoren@kernel.org>
Cc: Guo Ren <ren_guo@c-sky.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Ley Foon Tan <lftan@altera.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Palmer Dabbelt <palmer@sifive.com>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Sam Creasey <sammy@sammy.net>
Cc: Vincent Chen <deanbo422@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This adds a new header to asm-generic to allow optionally instrumenting
architecture-specific asm implementations of bitops.
This change includes the required change for x86 as reference and
changes the kernel API doc to point to bitops-instrumented.h instead.
Rationale: the functions in x86's bitops.h are no longer the kernel API
functions, but instead the arch_ prefixed functions, which are then
instrumented via bitops-instrumented.h.
Other architectures can similarly add support for asm implementations of
bitops.
The documentation text was derived from x86 and existing bitops
asm-generic versions: 1) references to x86 have been removed; 2) as a
result, some of the text had to be reworded for clarity and consistency.
Tested using lib/test_kasan with bitops tests (pre-requisite patch).
Bugzilla ref: https://bugzilla.kernel.org/show_bug.cgi?id=198439
Link: http://lkml.kernel.org/r/20190613125950.197667-4-elver@google.com
Signed-off-by: Marco Elver <elver@google.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull m68nommu updates from Greg Ungerer:
"A series of cleanups for the FLAT format binary loader, binfmt_flat,
from Christoph.
The end goal is to support no-MMU on RISC-V, and the last patch
enables that"
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
riscv: add binfmt_flat support
binfmt_flat: don't offset the data start
binfmt_flat: move the MAX_SHARED_LIBS definition to binfmt_flat.c
binfmt_flat: remove the persistent argument from flat_get_addr_from_rp
binfmt_flat: provide an asm-generic/flat.h
binfmt_flat: make support for old format binaries optional
binfmt_flat: add a ARCH_HAS_BINFMT_FLAT option
binfmt_flat: add endianess annotations
binfmt_flat: use fixed size type for the on-disk format
binfmt_flat: consolidate two version of flat_v2_reloc_t
binfmt_flat: remove the unused OLD_FLAT_FLAG_RAM definition
binfmt_flat: remove the uapi <linux/flat.h> header
binfmt_flat: replace flat_argvp_envp_on_stack with a Kconfig variable
binfmt_flat: remove flat_old_ram_flag
binfmt_flat: provide a default version of flat_get_relocate_addr
binfmt_flat: remove flat_set_persistent
binfmt_flat: remove flat_reloc_valid
Pull parisc updates from Helge Deller:
"Dynamic ftrace support by Sven Schnelle and a header guard fix by
Denis Efremov"
* 'parisc-5.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: asm: psw.h: missing header guard
parisc: add dynamic ftrace
compiler.h: add CC_USING_PATCHABLE_FUNCTION_ENTRY
parisc: use pr_debug() in kernel/module.c
parisc: add WARN_ON() to clear_fixmap
parisc: add spinlock to patch function
parisc: add support for patching multiple words
Pull locking updates from Ingo Molnar:
"The main changes in this cycle are:
- rwsem scalability improvements, phase #2, by Waiman Long, which are
rather impressive:
"On a 2-socket 40-core 80-thread Skylake system with 40 reader
and writer locking threads, the min/mean/max locking operations
done in a 5-second testing window before the patchset were:
40 readers, Iterations Min/Mean/Max = 1,807/1,808/1,810
40 writers, Iterations Min/Mean/Max = 1,807/50,344/151,255
After the patchset, they became:
40 readers, Iterations Min/Mean/Max = 30,057/31,359/32,741
40 writers, Iterations Min/Mean/Max = 94,466/95,845/97,098"
There's a lot of changes to the locking implementation that makes
it similar to qrwlock, including owner handoff for more fair
locking.
Another microbenchmark shows how across the spectrum the
improvements are:
"With a locking microbenchmark running on 5.1 based kernel, the
total locking rates (in kops/s) on a 2-socket Skylake system
with equal numbers of readers and writers (mixed) before and
after this patchset were:
# of Threads Before Patch After Patch
------------ ------------ -----------
2 2,618 4,193
4 1,202 3,726
8 802 3,622
16 729 3,359
32 319 2,826
64 102 2,744"
The changes are extensive and the patch-set has been through
several iterations addressing various locking workloads. There
might be more regressions, but unless they are pathological I
believe we want to use this new implementation as the baseline
going forward.
- jump-label optimizations by Daniel Bristot de Oliveira: the primary
motivation was to remove IPI disturbance of isolated RT-workload
CPUs, which resulted in the implementation of batched jump-label
updates. Beyond the improvement of the real-time characteristics
kernel, in one test this patchset improved static key update
overhead from 57 msecs to just 1.4 msecs - which is a nice speedup
as well.
- atomic64_t cross-arch type cleanups by Mark Rutland: over the last
~10 years of atomic64_t existence the various types used by the
APIs only had to be self-consistent within each architecture -
which means they became wildly inconsistent across architectures.
Mark puts and end to this by reworking all the atomic64
implementations to use 's64' as the base type for atomic64_t, and
to ensure that this type is consistently used for parameters and
return values in the API, avoiding further problems in this area.
- A large set of small improvements to lockdep by Yuyang Du: type
cleanups, output cleanups, function return type and othr cleanups
all around the place.
- A set of percpu ops cleanups and fixes by Peter Zijlstra.
- Misc other changes - please see the Git log for more details"
* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (82 commits)
locking/lockdep: increase size of counters for lockdep statistics
locking/atomics: Use sed(1) instead of non-standard head(1) option
locking/lockdep: Move mark_lock() inside CONFIG_TRACE_IRQFLAGS && CONFIG_PROVE_LOCKING
x86/jump_label: Make tp_vec_nr static
x86/percpu: Optimize raw_cpu_xchg()
x86/percpu, sched/fair: Avoid local_clock()
x86/percpu, x86/irq: Relax {set,get}_irq_regs()
x86/percpu: Relax smp_processor_id()
x86/percpu: Differentiate this_cpu_{}() and __this_cpu_{}()
locking/rwsem: Guard against making count negative
locking/rwsem: Adaptive disabling of reader optimistic spinning
locking/rwsem: Enable time-based spinning on reader-owned rwsem
locking/rwsem: Make rwsem->owner an atomic_long_t
locking/rwsem: Enable readers spinning on writer
locking/rwsem: Clarify usage of owner's nonspinaable bit
locking/rwsem: Wake up almost all readers in wait queue
locking/rwsem: More optimal RT task handling of null owner
locking/rwsem: Always release wait_lock before waking up tasks
locking/rwsem: Implement lock handoff to prevent lock starvation
locking/rwsem: Make rwsem_spin_on_owner() return owner state
...
Break out parts of mshyperv.h that are ISA independent into a
separate file in include/asm-generic. This move facilitates
ARM64 code reusing these definitions and avoids code
duplication. No functionality or behavior is changed.
Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
No one is using this header anymore.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Paul Burton <paul.burton@mips.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Currently the governors are declared in their respective files but they
export their [un]register functions which in turn call the [un]register
governors core's functions. That implies a cyclic dependency which is
not desirable. There is a way to self-encapsulate the governors by letting
them to declare themselves in a __init section table.
Define the table in the asm generic linker description like the other
tables and provide the specific macros to deal with.
Reviewed-by: Amit Kucheria <amit.kucheria@linaro.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
This file implements the flat get/put reloc helpers for architectures
that do not need to overload the relocs by simply using get_user/put_user.
Note that many nommu architectures currently use {get,put}_unaligned, which
looks a little bogus and should probably later be switched over to this
version as well.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Greg Ungerer <gerg@linux-m68k.org>
The new generic VDSO library allows to unify the update_vsyscall[_tz]()
implementations.
Provide a generic implementation based on the x86 code and the bindings
which need to be implemented in architecture specific code.
[ tglx: Moved it into kernel/time where it belongs. Removed the pointless
line breaks in the stub functions. Massaged changelog ]
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Shijith Thotton <sthotton@marvell.com>
Tested-by: Andre Przywara <andre.przywara@arm.com>
Cc: linux-arch@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-mips@vger.kernel.org
Cc: linux-kselftest@vger.kernel.org
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Mark Salyzyn <salyzyn@android.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Huw Davies <huw@codeweavers.com>
Link: https://lkml.kernel.org/r/20190621095252.32307-4-vincenzo.frascino@arm.com
Based on 2 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license version 2 as
published by the free software foundation
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license version 2 as
published by the free software foundation #
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-only
has been chosen to replace the boilerplate/reference in 4122 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Enrico Weigelt <info@metux.net>
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190604081206.933168790@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch implements dynamic ftrace for PA-RISC. The required mcount
call sequences can get pretty long, so instead of patching the
whole call sequence out of the functions, we are using
-fpatchable-function-entry from gcc. This puts a configurable amount of
NOPS before/at the start of the function. Taking do_sys_open() as example,
which would look like this when the call is patched out:
1036b248: 08 00 02 40 nop
1036b24c: 08 00 02 40 nop
1036b250: 08 00 02 40 nop
1036b254: 08 00 02 40 nop
1036b258 <do_sys_open>:
1036b258: 08 00 02 40 nop
1036b25c: 08 03 02 41 copy r3,r1
1036b260: 6b c2 3f d9 stw rp,-14(sp)
1036b264: 08 1e 02 43 copy sp,r3
1036b268: 6f c1 01 00 stw,ma r1,80(sp)
When ftrace gets enabled for this function the kernel will patch these
NOPs to:
1036b248: 10 19 57 20 <address of ftrace>
1036b24c: 6f c1 00 80 stw,ma r1,40(sp)
1036b250: 48 21 3f d1 ldw -18(r1),r1
1036b254: e8 20 c0 02 bv,n r0(r1)
1036b258 <do_sys_open>:
1036b258: e8 3f 1f df b,l,n .-c,r1
1036b25c: 08 03 02 41 copy r3,r1
1036b260: 6b c2 3f d9 stw rp,-14(sp)
1036b264: 08 1e 02 43 copy sp,r3
1036b268: 6f c1 01 00 stw,ma r1,80(sp)
So the first NOP in do_sys_open() will be patched to jump backwards into
some minimal trampoline code which pushes a stackframe, saves r1 which
holds the return address, loads the address of the real ftrace function,
and branches to that location. For 64 Bit things are getting a bit more
complicated (and longer) because we must make sure that the address of
ftrace location is 8 byte aligned, and the offset passed to ldd for
fetching the address is 8 byte aligned as well.
Note that gcc has a bug which misplaces the function label, and needs a
patch to make dynamic ftrace work. See
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90751 for details.
Signed-off-by: Sven Schnelle <svens@stackframe.org>
Signed-off-by: Helge Deller <deller@gmx.de>
Based on 1 normalized pattern(s):
this file is released under the gplv2
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-only
has been chosen to replace the boilerplate/reference in 68 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Armijn Hemel <armijn@tjaldur.nl>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190531190114.292346262@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Based on 1 normalized pattern(s):
this copyrighted material is made available to anyone wishing to use
modify copy or redistribute it subject to the terms and conditions
of the gnu general public license v 2
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-only
has been chosen to replace the boilerplate/reference in 45 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Richard Fontana <rfontana@redhat.com>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Steve Winslow <swinslow@gmail.com>
Reviewed-by: Alexios Zavras <alexios.zavras@intel.com>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190528170027.342746075@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Based on 3 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license as published by
the free software foundation either version 2 of the license or at
your option any later version this program is distributed in the
hope that it will be useful but without any warranty without even
the implied warranty of merchantability or fitness for a particular
purpose see the gnu general public license for more details
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license as published by
the free software foundation either version 2 of the license or at
your option any later version [author] [kishon] [vijay] [abraham]
[i] [kishon]@[ti] [com] this program is distributed in the hope that
it will be useful but without any warranty without even the implied
warranty of merchantability or fitness for a particular purpose see
the gnu general public license for more details
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license as published by
the free software foundation either version 2 of the license or at
your option any later version [author] [graeme] [gregory]
[gg]@[slimlogic] [co] [uk] [author] [kishon] [vijay] [abraham] [i]
[kishon]@[ti] [com] [based] [on] [twl6030]_[usb] [c] [author] [hema]
[hk] [hemahk]@[ti] [com] this program is distributed in the hope
that it will be useful but without any warranty without even the
implied warranty of merchantability or fitness for a particular
purpose see the gnu general public license for more details
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-or-later
has been chosen to replace the boilerplate/reference in 1105 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Richard Fontana <rfontana@redhat.com>
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190527070033.202006027@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Based on 1 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license as published by
the free software foundation either version 2 of the license or at
your option any later version
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-or-later
has been chosen to replace the boilerplate/reference in 3029 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Based on 1 normalized pattern(s):
licensed under the gpl 2 or later
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-or-later
has been chosen to replace the boilerplate/reference in 82 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Richard Fontana <rfontana@redhat.com>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190524100845.150836982@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Based on 1 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license as published by
the free software foundation either version 2 or at your option any
later version you should have received a copy of the gnu general
public license for example usr src linux copying if not write to the
free software foundation inc 675 mass ave cambridge ma 02139 usa
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-or-later
has been chosen to replace the boilerplate/reference in 20 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190520170858.552543146@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Based on 1 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public licence as published by
the free software foundation either version 2 of the licence or at
your option any later version
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-or-later
has been chosen to replace the boilerplate/reference in 114 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190520170857.552531963@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pull core fixes from Ingo Molnar:
"This fixes a particularly thorny munmap() bug with MPX, plus fixes a
host build environment assumption in objtool"
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
objtool: Allow AR to be overridden with HOSTAR
x86/mpx, mm/core: Fix recursive munmap() corruption
Christoph Hellwig writes:
This is a series doing two somewhat interwinded things. It improves
the asm-generic nommu uaccess helper to optionally be entirely generic
and not require any arch helpers for the actual uaccess. For the
generic uaccess.h to actually be generically useful I also had to kill
off the mess we made of <asm/segment.h>, which really shouldn't exist
on most architectures.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJcv22JAAoJEGCrR//JCVInH3YP/i7hVIEo4azoRB3/PwFTPel8
Buq9BUrGy1kptLWyi1yuOZf874gF6351hkVUo4EYq/uZ3x41ciojxFgbsIriAU0p
2xzJAWY0YaVuWM+PIVj6KjyDA0N7/U5PcAG+03pl1Lhf0GHldmMbQIWt8D8HXbB8
gyaOeBGM9BneKd8Xu0COUaId9/3GXUwsy2zYc77+PxaHHYJzGDPB5lqNdU0gbB54
P9uXI2mpoAepFMsFgP6/FBvT/jCiMifRIdTXPD94NtjfG+Q4lo+LBQ6bpcLfw4ZD
VNh0982Cyl5n7FNetlTK4CPBn0RZsmBRriMotYfXeghFg0mmNTLwNEMg1u6RQ+uq
VYoBrVGhnx4SFB8xdkqO4md6UwprR2SERIIKwuCTbhwSgs+NkB7t4ftOwTzyQ1/6
7WQjclAIxQK9J7uwAeRGCvyrNJplqSfOA/hRjuq/Z0BCE/+m26Gxfv4aDztU5wFO
FWj2uFGTMuufp+DKoh5Vj5aJiFwfqK5/w1VYWSQdaoiWsHlmmu5IkTrrZyz+S3Tj
cifO9Ghe75Pt+rDllc8yqzKYXa5mL89sWyKiy+2ItOvGVh5EqXBbPXtCrFFIHRFL
WojVPu0yO+OoP0sEdamT/4FxbWO9VrV5YXdaRg/GjVA9ARlKNpFrZbuHtvEitwpi
7AbhxMZwBSSb9R3cz67J
=X2CH
-----END PGP SIGNATURE-----
Merge tag 'asm-generic-nommu' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic
Pull nommu generic uaccess updates from Arnd Bergmann:
"asm-generic: kill <asm/segment.h> and improve nommu generic uaccess helpers
Christoph Hellwig writes:
This is a series doing two somewhat interwinded things. It improves
the asm-generic nommu uaccess helper to optionally be entirely
generic and not require any arch helpers for the actual uaccess.
For the generic uaccess.h to actually be generically useful I also
had to kill off the mess we made of <asm/segment.h>, which really
shouldn't exist on most architectures"
* tag 'asm-generic-nommu' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
asm-generic: optimize generic uaccess for 8-byte loads and stores
asm-generic: provide entirely generic nommu uaccess
arch: mostly remove <asm/segment.h>
asm-generic: don't include <asm/segment.h> from <asm/uaccess.h>
Merge more updates from Andrew Morton:
- a couple of hotfixes
- almost all of the rest of MM
- lib/ updates
- binfmt_elf updates
- autofs updates
- quite a lot of misc fixes and updates
- reiserfs, fatfs
- signals
- exec
- cpumask
- rapidio
- sysctl
- pids
- eventfd
- gcov
- panic
- pps
- gdb script updates
- ipc updates
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (126 commits)
mm: memcontrol: fix NUMA round-robin reclaim at intermediate level
mm: memcontrol: fix recursive statistics correctness & scalabilty
mm: memcontrol: move stat/event counting functions out-of-line
mm: memcontrol: make cgroup stats and events query API explicitly local
drivers/virt/fsl_hypervisor.c: prevent integer overflow in ioctl
drivers/virt/fsl_hypervisor.c: dereferencing error pointers in ioctl
mm, memcg: rename ambiguously named memory.stat counters and functions
arch: remove <asm/sizes.h> and <asm-generic/sizes.h>
treewide: replace #include <asm/sizes.h> with #include <linux/sizes.h>
fs/block_dev.c: Remove duplicate header
fs/cachefiles/namei.c: remove duplicate header
include/linux/sched/signal.h: replace `tsk' with `task'
fs/coda/psdev.c: remove duplicate header
ipc: do cyclic id allocation for the ipc object.
ipc: conserve sequence numbers in ipcmni_extend mode
ipc: allow boot time extension of IPCMNI from 32k to 16M
ipc/mqueue: optimize msg_get()
ipc/mqueue: remove redundant wq task assignment
ipc: prevent lockup on alloc_msg and free_msg
scripts/gdb: print cached rate in lx-clk-summary
...
Now that all instances of #include <asm/sizes.h> have been replaced with
#include <linux/sizes.h>, we can remove these.
Link: http://lkml.kernel.org/r/1553267665-27228-2-git-send-email-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The "WITH Linux-syscall-note" should be added to headers exported to the
user-space.
Some kernel-space headers have "WITH Linux-syscall-note", which seems a
mistake.
[1] arch/x86/include/asm/hyperv-tlfs.h
Commit 5a48580322 ("x86/hyper-v: move hyperv.h out of uapi") moved
this file out of uapi, but missed to update the SPDX License tag.
[2] include/asm-generic/shmparam.h
Commit 76ce2a80a2 ("Rename include/{uapi => }/asm-generic/shmparam.h
really") moved this file out of uapi, but missed to update the SPDX
License tag.
[3] include/linux/qcom-geni-se.h
Commit eddac5af06 ("soc: qcom: Add GENI based QUP Wrapper driver")
added this file, but I do not see a good reason why its license tag must
include "WITH Linux-syscall-note".
Link: http://lkml.kernel.org/r/1554196104-3522-1-git-send-email-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Summary of modules changes for the 5.2 merge window:
- Use a separate table to store symbol types instead of
hijacking fields in struct Elf_Sym
- Trivial code cleanups
Signed-off-by: Jessica Yu <jeyu@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIcBAABCgAGBQJc2W9gAAoJEMBFfjjOO8FyAqkP+wbCB42LJ8kD2H+pbxpsf7vU
T1+5fZ3rfyLqHDc9Pzt1K6SBR04aLq8OXW3nOIQ5kmirSstiznokAExQBjeMegVJ
14Q5IxOSQ44rBPg5ggCA6s9qNyw2fqgwKWQKXLW+VmYAHiBfIBV0mO576KWIEgE1
vOCQPqW0ccfmjEssumEYMSePg01LATjso3PMcdJm6yI74xSGd8n872y36BleAP8M
232L8B4fuZEuHJ1QaBnXdapKibyMX3k3WbfK9U2RzSWaGjmuBXwihCB+ZA6HCiRX
MVwXQwh63UGCZLYkXnGMwPNgX5TTZRiPT4oIhB0tGIXkEFX/DlA+NGfuinEAYuKS
M0zhjgjSOKYEhNy1GtirbHjNCC4ULF0OZvD/dB8InQON0/t5Duq3sOHSp9+rWMg7
yuHy3NJGHHd65nGd7u7vXmdZXct/NVDdahKhqRA8i+HOxUBdF9vvMMnSASXufWFt
j5GSFA5/IZKpYofho/jvJK2rmQQVkVpD9zWCZcv8BNT124W4cZ5swOOSVYOnSAJb
Wvy1oknjF/HrKIft3UIFR0FU8uIueP0mtnna2B9SBVfZm7rM+/3+a1D/UOHaLtuO
++b1sIOpyApevRZcXpTnH63ZaO2tRC7KnDzaBT8FTZimEyV5KdD6Ec2MKJdi8Nkb
Pr099LJ2i/i8rQbkjM1i
=NO31
-----END PGP SIGNATURE-----
Merge tag 'modules-for-v5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux
Pull modules updates from Jessica Yu:
- Use a separate table to store symbol types instead of hijacking
fields in struct Elf_Sym
- Trivial code cleanups
* tag 'modules-for-v5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux:
module: add stubs for within_module functions
kallsyms: store type information in its own array
vmlinux.lds.h: drop unused __vermagic
On systems without CONTIG_ALLOC activated but that support gigantic pages,
boottime reserved gigantic pages can not be freed at all. This patch
simply enables the possibility to hand back those pages to memory
allocator.
Link: http://lkml.kernel.org/r/20190327063626.18421-5-alex@ghiti.fr
Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
Acked-by: David S. Miller <davem@davemloft.net> [sparc]
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Andy Lutomirsky <luto@kernel.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is a bit of a mess, to put it mildly. But, it's a bug
that only seems to have showed up in 4.20 but wasn't noticed
until now, because nobody uses MPX.
MPX has the arch_unmap() hook inside of munmap() because MPX
uses bounds tables that protect other areas of memory. When
memory is unmapped, there is also a need to unmap the MPX
bounds tables. Barring this, unused bounds tables can eat 80%
of the address space.
But, the recursive do_munmap() that gets called vi arch_unmap()
wreaks havoc with __do_munmap()'s state. It can result in
freeing populated page tables, accessing bogus VMA state,
double-freed VMAs and more.
See the "long story" further below for the gory details.
To fix this, call arch_unmap() before __do_unmap() has a chance
to do anything meaningful. Also, remove the 'vma' argument
and force the MPX code to do its own, independent VMA lookup.
== UML / unicore32 impact ==
Remove unused 'vma' argument to arch_unmap(). No functional
change.
I compile tested this on UML but not unicore32.
== powerpc impact ==
powerpc uses arch_unmap() well to watch for munmap() on the
VDSO and zeroes out 'current->mm->context.vdso_base'. Moving
arch_unmap() makes this happen earlier in __do_munmap(). But,
'vdso_base' seems to only be used in perf and in the signal
delivery that happens near the return to userspace. I can not
find any likely impact to powerpc, other than the zeroing
happening a little earlier.
powerpc does not use the 'vma' argument and is unaffected by
its removal.
I compile-tested a 64-bit powerpc defconfig.
== x86 impact ==
For the common success case this is functionally identical to
what was there before. For the munmap() failure case, it's
possible that some MPX tables will be zapped for memory that
continues to be in use. But, this is an extraordinarily
unlikely scenario and the harm would be that MPX provides no
protection since the bounds table got reset (zeroed).
I can't imagine anyone doing this:
ptr = mmap();
// use ptr
ret = munmap(ptr);
if (ret)
// oh, there was an error, I'll
// keep using ptr.
Because if you're doing munmap(), you are *done* with the
memory. There's probably no good data in there _anyway_.
This passes the original reproducer from Richard Biener as
well as the existing mpx selftests/.
The long story:
munmap() has a couple of pieces:
1. Find the affected VMA(s)
2. Split the start/end one(s) if neceesary
3. Pull the VMAs out of the rbtree
4. Actually zap the memory via unmap_region(), including
freeing page tables (or queueing them to be freed).
5. Fix up some of the accounting (like fput()) and actually
free the VMA itself.
This specific ordering was actually introduced by:
dd2283f260 ("mm: mmap: zap pages with read mmap_sem in munmap")
during the 4.20 merge window. The previous __do_munmap() code
was actually safe because the only thing after arch_unmap() was
remove_vma_list(). arch_unmap() could not see 'vma' in the
rbtree because it was detached, so it is not even capable of
doing operations unsafe for remove_vma_list()'s use of 'vma'.
Richard Biener reported a test that shows this in dmesg:
[1216548.787498] BUG: Bad rss-counter state mm:0000000017ce560b idx:1 val:551
[1216548.787500] BUG: non-zero pgtables_bytes on freeing mm: 24576
What triggered this was the recursive do_munmap() called via
arch_unmap(). It was freeing page tables that has not been
properly zapped.
But, the problem was bigger than this. For one, arch_unmap()
can free VMAs. But, the calling __do_munmap() has variables
that *point* to VMAs and obviously can't handle them just
getting freed while the pointer is still in use.
I tried a couple of things here. First, I tried to fix the page
table freeing problem in isolation, but I then found the VMA
issue. I also tried having the MPX code return a flag if it
modified the rbtree which would force __do_munmap() to re-walk
to restart. That spiralled out of control in complexity pretty
fast.
Just moving arch_unmap() and accepting that the bonkers failure
case might eat some bounds tables seems like the simplest viable
fix.
This was also reported in the following kernel bugzilla entry:
https://bugzilla.kernel.org/show_bug.cgi?id=203123
There are some reports that this commit triggered this bug:
dd2283f260 ("mm: mmap: zap pages with read mmap_sem in munmap")
While that commit certainly made the issues easier to hit, I believe
the fundamental issue has been with us as long as MPX itself, thus
the Fixes: tag below is for one of the original MPX commits.
[ mingo: Minor edits to the changelog and the patch. ]
Reported-by: Richard Biener <rguenther@suse.de>
Reported-by: H.J. Lu <hjl.tools@gmail.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Yang Shi <yang.shi@linux.alibaba.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rik van Riel <riel@surriel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: linux-um@lists.infradead.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: stable@vger.kernel.org
Fixes: dd2283f260 ("mm: mmap: zap pages with read mmap_sem in munmap")
Link: http://lkml.kernel.org/r/20190419194747.5E1AD6DC@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
- allow users to invoke 'make' out of the source tree
- refactor scripts/mkmakefile
- deprecate KBUILD_SRC, which was used to track the source tree
location for O= build.
- fix recordmcount.pl in case objdump output is localized
- turn unresolved symbols in external modules to errors from warnings
by default; pass KBUILD_MODPOST_WARN=1 to get them back to warnings
- generate modules.builtin.modinfo to collect .modinfo data from
built-in modules
- misc Makefile cleanups
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJc0upmAAoJED2LAQed4NsGreoP/jkKQojVdzTjM/nn/Xe9FrE9
elQ59Wu/gW/q7jouN733WTJ2fRPAEVaGPw/CT/cCRWjNJjd6MM4voDKVxifZE+u4
qz5XUGq4Mjozd0MJdU1LA1dIRJrEaPofOj7E0JEIerEdLPKpv4kjCCbbHOZwN2Kl
YrpPFJkYspCRVG+txEWY8YaZeU/+OAdNckJnMkX8hnUwsNplAlPw4L/5Y12Uncuz
7g/3T1f701pJhStoO4OPR/+Rivi0EX5AP7TZyv2/FOwTO+Oau5G1rF0j1BT0+ceg
d+bApuaHLR4ocZ6GvWfACMYhH7ais0BUgAwi6HY/b78SGMmNB6xlD5biVqcklx6c
mjrPNaj6WJEQIflZ3yr83tRNhTu7xXRKZkrGHHUjOGNsRULuD8RFrXtusl06A22S
hA3vt2SF4FuLBabPw5yFkKNOTX+P5dG79BC1fWIQjglBGoYqWTRQNYoottr/7t2T
Si+E6J2lAD0EDNuc/EFKWfrgONfatgcdgQuTibwCcE2KfYnMG/C+9DvxRV4lAQtn
fYpap9gws+0AisbVx0m1W088NMFiwrMbN3n3KITGY/15XDmeySEA1QgPKFN8JHJd
Do0duaMx8BHjkA5ms/Bf0AZd0tWUkWVdt0epwU2KafWNGK7VaPnhKxn5eQOGkWV+
gT3J3XZc3b7OEOQ4XX3Z
=zj9w
-----END PGP SIGNATURE-----
Merge tag 'kbuild-v5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild updates from Masahiro Yamada:
- allow users to invoke 'make' out of the source tree
- refactor scripts/mkmakefile
- deprecate KBUILD_SRC, which was used to track the source tree
location for O= build.
- fix recordmcount.pl in case objdump output is localized
- turn unresolved symbols in external modules to errors from warnings
by default; pass KBUILD_MODPOST_WARN=1 to get them back to warnings
- generate modules.builtin.modinfo to collect .modinfo data from
built-in modules
- misc Makefile cleanups
* tag 'kbuild-v5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (21 commits)
.gitignore: add more all*.config patterns
moduleparam: Save information about built-in modules in separate file
Remove MODULE_ALIAS() calls that take undefined macro
.gitignore: add leading and trailing slashes to generated directories
scripts/tags.sh: fix direct execution of scripts/tags.sh
scripts: override locale from environment when running recordmcount.pl
samples: kobject: allow CONFIG_SAMPLE_KOBJECT to become y
samples: seccomp: turn CONFIG_SAMPLE_SECCOMP into a bool option
kbuild: move Documentation to vmlinux-alldirs
kbuild: move samples/ to KBUILD_VMLINUX_OBJS
modpost: make KBUILD_MODPOST_WARN also configurable for external modules
kbuild: check arch/$(SRCARCH)/include/generated before out-of-tree build
kbuild: remove unneeded dependency for include/config/kernel.release
memory: squash drivers/memory/Makefile.asm-offsets
kbuild: use $(srctree) instead of KBUILD_SRC to check out-of-tree build
kbuild: mkmakefile: generate a simple wrapper of top Makefile
kbuild: mkmakefile: do not check the generated Makefile marker
kbuild: allow Kbuild to start from any directory
kbuild: pass $(MAKECMDGOALS) to sub-make as is
kbuild: fix warning "overriding recipe for target 'Makefile'"
...
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAlzRrzoUHHBhdWxAcGF1
bC1tb29yZS5jb20ACgkQ6iDy2pc3iXNc7hAApgsi+3Jf9i29mgrKdrTciZ35TegK
C8pTlOIndpBcmdwDakR50/PgfMHdHll8M9TReVNEjbe0S+Ww5GTE7eWtL3YqoPC2
MuXEqcriz6UNi5Xma6vCZrDznWLXkXnzMDoDoYGDSoKuUYxef0fuqxDBnERM60Ht
s52+0XvR5ZseBw7I1KIv/ix2fXuCGq6eCdqassm0rvLPQ7bq6nWzFAlNXOLud303
DjIWu6Op2EL0+fJSmG+9Z76zFjyEbhMIhw5OPDeH4eO3pxX29AIv0m0JlI7ZXxfc
/VVC3r5G4WrsWxwKMstOokbmsQxZ5pB3ZaceYpco7U+9N2e3SlpsNM9TV+Y/0ac/
ynhYa//GK195LpMXx1BmWmLpjBHNgL8MvQkVTIpDia0GT+5sX7+haDxNLGYbocmw
A/mR+KM2jAU3QzNseGh6c659j3K4tbMIFMNxt7pUBxVPLafcccNngFGTpzCwu5GU
b7y4d21g6g/3Irj14NYU/qS8dTjW0rYrCMDquTpxmMfZ2xYuSvQmnBw91NQzVBp2
98L2/fsUG3yOa5MApgv+ryJySsIM+SW+7leKS5tjy/IJINzyPEZ85l3o8ck8X4eT
nohpKc/ELmeyi3omFYq18ecvFf2YRS5jRnz89i9q65/3ESgGiC0wyGOhNTvjvsyv
k4jT0slIK614aGk=
=p8Fp
-----END PGP SIGNATURE-----
Merge tag 'audit-pr-20190507' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit
Pull audit updates from Paul Moore:
"We've got a reasonably broad set of audit patches for the v5.2 merge
window, the highlights are below:
- The biggest change, and the source of all the arch/* changes, is
the patchset from Dmitry to help enable some of the work he is
doing around PTRACE_GET_SYSCALL_INFO.
To be honest, including this in the audit tree is a bit of a
stretch, but it does help move audit a little further along towards
proper syscall auditing for all arches, and everyone else seemed to
agree that audit was a "good" spot for this to land (or maybe they
just didn't want to merge it? dunno.).
- We can now audit time/NTP adjustments.
- We continue the work to connect associated audit records into a
single event"
* tag 'audit-pr-20190507' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit: (21 commits)
audit: fix a memory leak bug
ntp: Audit NTP parameters adjustment
timekeeping: Audit clock adjustments
audit: purge unnecessary list_empty calls
audit: link integrity evm_write_xattrs record to syscall event
syscall_get_arch: add "struct task_struct *" argument
unicore32: define syscall_get_arch()
Move EM_UNICORE to uapi/linux/elf-em.h
nios2: define syscall_get_arch()
nds32: define syscall_get_arch()
Move EM_NDS32 to uapi/linux/elf-em.h
m68k: define syscall_get_arch()
hexagon: define syscall_get_arch()
Move EM_HEXAGON to uapi/linux/elf-em.h
h8300: define syscall_get_arch()
c6x: define syscall_get_arch()
arc: define syscall_get_arch()
Move EM_ARCOMPACT and EM_ARCV2 to uapi/linux/elf-em.h
audit: Make audit_log_cap and audit_copy_inode static
audit: connect LOGIN record to its syscall record
...
Problem:
When a kernel module is compiled as a separate module, some important
information about the kernel module is available via .modinfo section of
the module. In contrast, when the kernel module is compiled into the
kernel, that information is not available.
Information about built-in modules is necessary in the following cases:
1. When it is necessary to find out what additional parameters can be
passed to the kernel at boot time.
2. When you need to know which module names and their aliases are in
the kernel. This is very useful for creating an initrd image.
Proposal:
The proposed patch does not remove .modinfo section with module
information from the vmlinux at the build time and saves it into a
separate file after kernel linking. So, the kernel does not increase in
size and no additional information remains in it. Information is stored
in the same format as in the separate modules (null-terminated string
array). Because the .modinfo section is already exported with a separate
modules, we are not creating a new API.
It can be easily read in the userspace:
$ tr '\0' '\n' < modules.builtin.modinfo
ext4.softdep=pre: crc32c
ext4.license=GPL
ext4.description=Fourth Extended Filesystem
ext4.author=Remy Card, Stephen Tweedie, Andrew Morton, Andreas Dilger, Theodore Ts'o and others
ext4.alias=fs-ext4
ext4.alias=ext3
ext4.alias=fs-ext3
ext4.alias=ext2
ext4.alias=fs-ext2
md_mod.alias=block-major-9-*
md_mod.alias=md
md_mod.description=MD RAID framework
md_mod.license=GPL
md_mod.parmtype=create_on_open:bool
md_mod.parmtype=start_dirty_degraded:int
...
Co-Developed-by: Gleb Fotengauer-Malinovskiy <glebfm@altlinux.org>
Signed-off-by: Gleb Fotengauer-Malinovskiy <glebfm@altlinux.org>
Signed-off-by: Alexey Gladkov <gladkov.alexey@gmail.com>
Acked-by: Jessica Yu <jeyu@kernel.org>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Mostly just incremental improvements here:
- Introduce AT_HWCAP2 for advertising CPU features to userspace
- Expose SVE2 availability to userspace
- Support for "data cache clean to point of deep persistence" (DC PODP)
- Honour "mitigations=off" on the cmdline and advertise status via sysfs
- CPU timer erratum workaround (Neoverse-N1 #1188873)
- Introduce perf PMU driver for the SMMUv3 performance counters
- Add config option to disable the kuser helpers page for AArch32 tasks
- Futex modifications to ensure liveness under contention
- Rework debug exception handling to seperate kernel and user handlers
- Non-critical fixes and cleanup
-----BEGIN PGP SIGNATURE-----
iQEzBAABCgAdFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAlzMFGgACgkQt6xw3ITB
YzTicAf/TX1h1+ecbx4WJAa4qeiOCPoNpG9efldQumqJhKL44MR5bkhuShna5mwE
ptm5qUXkZCxLTjzssZKnbdbgwa3t+emW8Of3D91IfI9akiZbMoDx5FGgcNbqjazb
RLrhOFHwgontA38yppZN+DrL+sXbvif/CVELdHahkEx6KepSGaS2lmPXRmz/W56v
4yIRy/zxc3Dhjgfm3wKh72nBwoZdLiIc4mchd5pthNlR9E2idrYkQegG1C+gA00r
o8uZRVOWgoh7H+QJE+xLUc8PaNCg8xqRRXOuZYg9GOz6hh7zSWhm+f1nRz9S2tIR
gIgsCHNqoO2I3E1uJpAQXDGtt2kFhA==
=ulpJ
-----END PGP SIGNATURE-----
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Will Deacon:
"Mostly just incremental improvements here:
- Introduce AT_HWCAP2 for advertising CPU features to userspace
- Expose SVE2 availability to userspace
- Support for "data cache clean to point of deep persistence" (DC PODP)
- Honour "mitigations=off" on the cmdline and advertise status via
sysfs
- CPU timer erratum workaround (Neoverse-N1 #1188873)
- Introduce perf PMU driver for the SMMUv3 performance counters
- Add config option to disable the kuser helpers page for AArch32 tasks
- Futex modifications to ensure liveness under contention
- Rework debug exception handling to seperate kernel and user
handlers
- Non-critical fixes and cleanup"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (92 commits)
Documentation: Add ARM64 to kernel-parameters.rst
arm64/speculation: Support 'mitigations=' cmdline option
arm64: ssbs: Don't treat CPUs with SSBS as unaffected by SSB
arm64: enable generic CPU vulnerabilites support
arm64: add sysfs vulnerability show for speculative store bypass
arm64: Fix size of __early_cpu_boot_status
clocksource/arm_arch_timer: Use arch_timer_read_counter to access stable counters
clocksource/arm_arch_timer: Remove use of workaround static key
clocksource/arm_arch_timer: Drop use of static key in arch_timer_reg_read_stable
clocksource/arm_arch_timer: Direcly assign set_next_event workaround
arm64: Use arch_timer_read_counter instead of arch_counter_get_cntvct
watchdog/sbsa: Use arch_timer_read_counter instead of arch_counter_get_cntvct
ARM: vdso: Remove dependency with the arch_timer driver internals
arm64: Apply ARM64_ERRATUM_1188873 to Neoverse-N1
arm64: Add part number for Neoverse N1
arm64: Make ARM64_ERRATUM_1188873 depend on COMPAT
arm64: Restrict ARM64_ERRATUM_1188873 mitigation to AArch32
arm64: mm: Remove pte_unmap_nested()
arm64: Fix compiler warning from pte_unmap() with -Wunused-but-set-variable
arm64: compat: Reduce address limit for 64K pages
...
Remove mmiowb() from the kernel memory barrier API and instead, for
architectures that need it, hide the barrier inside spin_unlock() when
MMIO has been performed inside the critical section.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCgAdFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAlzMFaUACgkQt6xw3ITB
YzRICQgAiv7wF/yIbBhDOmCNCAKDO59chvFQWxXWdGk/aAB56kwKAMXJgLOvlMG/
VRuuLyParTFQETC3jaxKgnO/1hb+PZLDt2Q2KqixtjIzBypKUPWvK2sf6THhSRF1
GK0DBVUd1rCrWrR815+SPb8el4xXtdBzvAVB+Fx35PXVNpdRdqCkK+EQ6UnXGokm
rXXHbnfsnquBDtmb4CR4r2beH+aNElXbdt0Kj8VcE5J7f7jTdW3z6Q9WFRvdKmK7
yrsxXXB2w/EsWXOwFp0SLTV5+fgeGgTvv8uLjDw+SG6t0E0PebxjNAflT7dPrbYL
WecjKC9WqBxrGY+4ew6YJP70ijLBCw==
=aC8m
-----END PGP SIGNATURE-----
Merge tag 'arm64-mmiowb' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull mmiowb removal from Will Deacon:
"Remove Mysterious Macro Intended to Obscure Weird Behaviours (mmiowb())
Remove mmiowb() from the kernel memory barrier API and instead, for
architectures that need it, hide the barrier inside spin_unlock() when
MMIO has been performed inside the critical section.
The only relatively recent changes have been addressing review
comments on the documentation, which is in a much better shape thanks
to the efforts of Ben and Ingo.
I was initially planning to split this into two pull requests so that
you could run the coccinelle script yourself, however it's been plain
sailing in linux-next so I've just included the whole lot here to keep
things simple"
* tag 'arm64-mmiowb' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (23 commits)
docs/memory-barriers.txt: Update I/O section to be clearer about CPU vs thread
docs/memory-barriers.txt: Fix style, spacing and grammar in I/O section
arch: Remove dummy mmiowb() definitions from arch code
net/ethernet/silan/sc92031: Remove stale comment about mmiowb()
i40iw: Redefine i40iw_mmiowb() to do nothing
scsi/qla1280: Remove stale comment about mmiowb()
drivers: Remove explicit invocations of mmiowb()
drivers: Remove useless trailing comments from mmiowb() invocations
Documentation: Kill all references to mmiowb()
riscv/mmiowb: Hook up mmwiob() implementation to asm-generic code
powerpc/mmiowb: Hook up mmwiob() implementation to asm-generic code
ia64/mmiowb: Add unconditional mmiowb() to arch_spin_unlock()
mips/mmiowb: Add unconditional mmiowb() to arch_spin_unlock()
sh/mmiowb: Add unconditional mmiowb() to arch_spin_unlock()
m68k/io: Remove useless definition of mmiowb()
nds32/io: Remove useless definition of mmiowb()
x86/io: Remove useless definition of mmiowb()
arm64/io: Remove useless definition of mmiowb()
ARM/io: Remove useless definition of mmiowb()
mmiowb: Hook up mmiowb helpers to spinlocks and generic I/O accessors
...
- Support for kernel address space layout randomization
- Add support for kernel image signature verification
- Convert s390 to the generic get_user_pages_fast code
- Convert s390 to the stack unwind API analog to x86
- Add support for CPU directed interrupts for PCI devices
- Provide support for MIO instructions to the PCI base layer, this
will allow the use of direct PCI mappings in user space code
- Add the basic KVM guest ultravisor interface for protected VMs
- Add AT_HWCAP bits for several new hardware capabilities
- Update the CPU measurement facility counter definitions to SVN 6
- Arnds cleanup patches for his quest to get LLVM compiles working
- A vfio-ccw update with bug fixes and support for halt and clear
- Improvements for the hardware TRNG code
- Another round of cleanup for the QDIO layer
- Numerous cleanups and bug fixes
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQEcBAABCAAGBQJc0CCEAAoJEDjwexyKj9rgjmkH/A3e2drvuP/hSF3xfCKTQFdx
/PoLHQVCqENB3HU3FA/ljoXuG6jMgwj61looqlxBNumXFpIfTg0E1JC5S4wRGJ+K
cOVhIKV53gcuZkRcCJQp0WMnGzpk1Daf7iYXYmAl+7e+mREUPxOuJ0Ei6vXvRGZS
8cQrUCGrtPgkAeLlndypHI2M2TDDGJIMczOGbOZau8+8Lo7Wq9zt5y0h/v0ew37g
ogA0eGh6koU1435dt2pclZRiZ1XOcar3Uin9ioT+RnSgJ4pr1Pza/F6IGO0RdQa+
rva990lqGFp5r9lE4rMCwK9LWb/rfHdVPd35t9XPwphnQ/ORoWUwLk3uc5XOHow=
=dbuy
-----END PGP SIGNATURE-----
Merge tag 's390-5.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Martin Schwidefsky:
- Support for kernel address space layout randomization
- Add support for kernel image signature verification
- Convert s390 to the generic get_user_pages_fast code
- Convert s390 to the stack unwind API analog to x86
- Add support for CPU directed interrupts for PCI devices
- Provide support for MIO instructions to the PCI base layer, this will
allow the use of direct PCI mappings in user space code
- Add the basic KVM guest ultravisor interface for protected VMs
- Add AT_HWCAP bits for several new hardware capabilities
- Update the CPU measurement facility counter definitions to SVN 6
- Arnds cleanup patches for his quest to get LLVM compiles working
- A vfio-ccw update with bug fixes and support for halt and clear
- Improvements for the hardware TRNG code
- Another round of cleanup for the QDIO layer
- Numerous cleanups and bug fixes
* tag 's390-5.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (98 commits)
s390/vdso: drop unnecessary cc-ldoption
s390: fix clang -Wpointer-sign warnigns in boot code
s390: drop CONFIG_VIRT_TO_BUS
s390: boot, purgatory: pass $(CLANG_FLAGS) where needed
s390: only build for new CPUs with clang
s390: simplify disabled_wait
s390/ftrace: use HAVE_FUNCTION_GRAPH_RET_ADDR_PTR
s390/unwind: introduce stack unwind API
s390/opcodes: add missing instructions to the disassembler
s390/bug: add entry size to the __bug_table section
s390: use proper expoline sections for .dma code
s390/nospec: rename assembler generated expoline thunks
s390: add missing ENDPROC statements to assembler functions
locking/lockdep: check for freed initmem in static_obj()
s390/kernel: add support for kernel address space layout randomization (KASLR)
s390/kernel: introduce .dma sections
s390/sclp: do not use static sccbs
s390/kprobes: use static buffer for insn_page
s390/kernel: convert SYSCALL and PGM_CHECK handlers to .quad
s390/kernel: build a relocatable kernel
...
Pull x86 mm updates from Ingo Molnar:
"The changes in here are:
- text_poke() fixes and an extensive set of executability lockdowns,
to (hopefully) eliminate the last residual circumstances under
which we are using W|X mappings even temporarily on x86 kernels.
This required a broad range of surgery in text patching facilities,
module loading, trampoline handling and other bits.
- tweak page fault messages to be more informative and more
structured.
- remove DISCONTIGMEM support on x86-32 and make SPARSEMEM the
default.
- reduce KASLR granularity on 5-level paging kernels from 512 GB to
1 GB.
- misc other changes and updates"
* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
x86/mm: Initialize PGD cache during mm initialization
x86/alternatives: Add comment about module removal races
x86/kprobes: Use vmalloc special flag
x86/ftrace: Use vmalloc special flag
bpf: Use vmalloc special flag
modules: Use vmalloc special flag
mm/vmalloc: Add flag for freeing of special permsissions
mm/hibernation: Make hibernation handle unmapped pages
x86/mm/cpa: Add set_direct_map_*() functions
x86/alternatives: Remove the return value of text_poke_*()
x86/jump-label: Remove support for custom text poker
x86/modules: Avoid breaking W^X while loading modules
x86/kprobes: Set instruction page as executable
x86/ftrace: Set trampoline pages as executable
x86/kgdb: Avoid redundant comparison of patched code
x86/alternatives: Use temporary mm for text poking
x86/alternatives: Initialize temporary mm for patching
fork: Provide a function for copying init_mm
uprobes: Initialize uprobes earlier
x86/mm: Save debug registers when loading a temporary mm
...
Pull locking updates from Ingo Molnar:
"Here are the locking changes in this cycle:
- rwsem unification and simpler micro-optimizations to prepare for
more intrusive (and more lucrative) scalability improvements in
v5.3 (Waiman Long)
- Lockdep irq state tracking flag usage cleanups (Frederic
Weisbecker)
- static key improvements (Jakub Kicinski, Peter Zijlstra)
- misc updates, cleanups and smaller fixes"
* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (26 commits)
locking/lockdep: Remove unnecessary unlikely()
locking/static_key: Don't take sleeping locks in __static_key_slow_dec_deferred()
locking/static_key: Factor out the fast path of static_key_slow_dec()
locking/static_key: Add support for deferred static branches
locking/lockdep: Test all incompatible scenarios at once in check_irq_usage()
locking/lockdep: Avoid bogus Clang warning
locking/lockdep: Generate LOCKF_ bit composites
locking/lockdep: Use expanded masks on find_usage_*() functions
locking/lockdep: Map remaining magic numbers to lock usage mask names
locking/lockdep: Move valid_state() inside CONFIG_TRACE_IRQFLAGS && CONFIG_PROVE_LOCKING
locking/rwsem: Prevent unneeded warning during locking selftest
locking/rwsem: Optimize rwsem structure for uncontended lock acquisition
locking/rwsem: Enable lock event counting
locking/lock_events: Don't show pvqspinlock events on bare metal
locking/lock_events: Make lock_events available for all archs & other locks
locking/qspinlock_stat: Introduce generic lockevent_*() counting APIs
locking/rwsem: Enhance DEBUG_RWSEMS_WARN_ON() macro
locking/rwsem: Add debug check for __down_read*()
locking/rwsem: Micro-optimize rwsem_try_read_lock_unqueued()
locking/rwsem: Move rwsem internal function declarations to rwsem-xadd.h
...
Pull unified TLB flushing from Ingo Molnar:
"This contains the generic mmu_gather feature from Peter Zijlstra,
which is an all-arch unification of TLB flushing APIs, via the
following (broad) steps:
- enhance the <asm-generic/tlb.h> APIs to cover more arch details
- convert most TLB flushing arch implementations to the generic
<asm-generic/tlb.h> APIs.
- remove leftovers of per arch implementations
After this series every single architecture makes use of the unified
TLB flushing APIs"
* 'core-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
mm/resource: Use resource_overlaps() to simplify region_intersects()
ia64/tlb: Eradicate tlb_migrate_finish() callback
asm-generic/tlb: Remove tlb_table_flush()
asm-generic/tlb: Remove tlb_flush_mmu_free()
asm-generic/tlb: Remove CONFIG_HAVE_GENERIC_MMU_GATHER
asm-generic/tlb: Remove arch_tlb*_mmu()
s390/tlb: Convert to generic mmu_gather
asm-generic/tlb: Introduce CONFIG_HAVE_MMU_GATHER_NO_GATHER=y
arch/tlb: Clean up simple architectures
um/tlb: Convert to generic mmu_gather
sh/tlb: Convert SH to generic mmu_gather
ia64/tlb: Convert to generic mmu_gather
arm/tlb: Convert to generic mmu_gather
asm-generic/tlb, arch: Invert CONFIG_HAVE_RCU_TABLE_INVALIDATE
asm-generic/tlb, ia64: Conditionally provide tlb_migrate_finish()
asm-generic/tlb: Provide generic tlb_flush() based on flush_tlb_mm()
asm-generic/tlb, arch: Provide generic tlb_flush() based on flush_tlb_range()
asm-generic/tlb, arch: Provide generic VIPT cache flush
asm-generic/tlb, arch: Provide CONFIG_HAVE_MMU_GATHER_PAGE_SIZE
asm-generic/tlb: Provide a comment
Poking-mm initialization might require to duplicate the PGD in early
stage. Initialize the PGD cache earlier to prevent boot failures.
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Nadav Amit <namit@vmware.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 4fc19708b1 ("x86/alternatives: Initialize temporary mm for patching")
Link: http://lkml.kernel.org/r/20190505011124.39692-1-namit@vmware.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
x86 has an nmi_uaccess_okay(), but other architectures do not.
Arch-independent code might need to know whether access to user
addresses is ok in an NMI context or in other code whose execution
context is unknown. Specifically, this function is needed for
bpf_probe_write_user().
Add a default implementation of nmi_uaccess_okay() for architectures
that do not have such a function.
Signed-off-by: Nadav Amit <namit@vmware.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <akpm@linux-foundation.org>
Cc: <ard.biesheuvel@linaro.org>
Cc: <deneen.t.dock@intel.com>
Cc: <kernel-hardening@lists.openwall.com>
Cc: <kristen@linux.intel.com>
Cc: <linux_dti@icloud.com>
Cc: <will.deacon@arm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190426001143.4983-23-namit@vmware.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The following warning occurred on s390:
WARNING: CPU: 0 PID: 804 at kernel/locking/lockdep.c:1025 lockdep_register_key+0x30/0x150
This is because the check in static_obj() assumes that all memory within
[_stext, _end] belongs to static objects, which at least for s390 isn't
true. The init section is also part of this range, and freeing it allows
the buddy allocator to allocate memory from it. We have virt == phys for
the kernel on s390, so that such allocations would then have addresses
within the range [_stext, _end].
To fix this, introduce arch_is_kernel_initmem_freed(), similar to
arch_is_kernel_text/data(), and add it to the checks in static_obj().
This will always return 0 on architectures that do not define
arch_is_kernel_initmem_freed. On s390, it will return 1 if initmem has
been freed and the address is in the range [__init_begin, __init_end].
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The architecture implementations of 'arch_futex_atomic_op_inuser()' and
'futex_atomic_cmpxchg_inatomic()' are permitted to return only -EFAULT,
-EAGAIN or -ENOSYS in the case of failure.
Update the comments in the asm-generic/ implementation and also a stray
reference in the robust futex documentation.
Signed-off-by: Will Deacon <will.deacon@arm.com>
On 64-bit architectures we can also use the direct load/store trick for
8-byte values.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Move the code to implement uaccess using memcpy or direct loads and
stores to asm-generic/uaccess.h and make it selectable kconfig option.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
A few architectures use <asm/segment.h> internally, but nothing in
common code does. Remove all the empty or almost empty versions of it,
including the asm-generic one.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
<asm/segment.h> is an odd x86 legacy that we shouldn't force on other
architectures. arc used it to bring in mm_context_t, but we can do
that inside the arc code easily.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Now that no driver code is using mmiowb() directly, remove the dummy
definitions remaining in architectures that don't make use of
asm-generic/io.h, as well as the definition in asm-generic/io.h itself.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Removing explicit calls to mmiowb() from driver code means that we must
now call into the generic mmiowb_spin_{lock,unlock}() functions from the
core spinlock code. In order to elide barriers following critical
sections without any I/O writes, we also hook into the asm-generic I/O
routines.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
In preparation for removing all explicit mmiowb() calls from driver
code, implement a tracking system in asm-generic based loosely on the
PowerPC implementation. This allows architectures with a non-empty
mmiowb() definition to have the barrier automatically inserted in
spin_unlock() following a critical section containing an I/O write.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
As the generic rwsem-xadd code is using the appropriate acquire and
release versions of the atomic operations, the arch specific rwsem.h
files will not be that much faster than the generic code as long as the
atomic functions are properly implemented. So we can remove those arch
specific rwsem.h and stop building asm/rwsem.h to reduce maintenance
effort.
Currently, only x86, alpha and ia64 have implemented architecture
specific fast paths. I don't have access to alpha and ia64 systems for
testing, but they are legacy systems that are not likely to be updated
to the latest kernel anyway.
By using a rwsem microbenchmark, the total locking rates on a 4-socket
56-core 112-thread x86-64 system before and after the patch were as
follows (mixed means equal # of read and write locks):
Before Patch After Patch
# of Threads wlock rlock mixed wlock rlock mixed
------------ ----- ----- ----- ----- ----- -----
1 29,201 30,143 29,458 28,615 30,172 29,201
2 6,807 13,299 1,171 7,725 15,025 1,804
4 6,504 12,755 1,520 7,127 14,286 1,345
8 6,762 13,412 764 6,826 13,652 726
16 6,693 15,408 662 6,599 15,938 626
32 6,145 15,286 496 5,549 15,487 511
64 5,812 15,495 60 5,858 15,572 60
There were some run-to-run variations for the multi-thread tests. For
x86-64, using the generic C code fast path seems to be a little bit
faster than the assembly version with low lock contention. Looking at
the assembly version of the fast paths, there are assembly to/from C
code wrappers that save and restore all the callee-clobbered registers
(7 registers on x86-64). The assembly generated from the generic C
code doesn't need to do that. That may explain the slight performance
gain here.
The generic asm rwsem.h can also be merged into kernel/locking/rwsem.h
with no code change as no other code other than those under
kernel/locking needs to access the internal rwsem macros and functions.
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-c6x-dev@linux-c6x.org
Cc: linux-m68k@lists.linux-m68k.org
Cc: linux-riscv@lists.infradead.org
Cc: linux-um@lists.infradead.org
Cc: linux-xtensa@linux-xtensa.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: nios2-dev@lists.rocketboards.org
Cc: openrisc@lists.librecores.org
Cc: uclinux-h8-devel@lists.sourceforge.jp
Link: https://lkml.kernel.org/r/20190322143008.21313-2-longman@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Only ia64-sn2 uses this as an optimization, and there it is of
questionable correctness due to the mm_users==1 test.
Remove it entirely.
No change in behavior intended.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
There are no external users of this API (nor should there be); remove it.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
As the comment notes; it is a potentially dangerous operation. Just
use tlb_flush_mmu(), that will skip the (double) TLB invalidate if
it really isn't needed anyway.
No change in behavior intended.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Since all architectures are now using it, it is redundant.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Add the Kconfig option HAVE_MMU_GATHER_NO_GATHER to the generic
mmu_gather code. If the option is set the mmu_gather will not
track individual pages for delayed page free anymore. A platform
that enables the option needs to provide its own implementation
of the __tlb_remove_page_size() function to free pages.
No change in behavior intended.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: aneesh.kumar@linux.vnet.ibm.com
Cc: heiko.carstens@de.ibm.com
Cc: linux@armlinux.org.uk
Cc: npiggin@gmail.com
Link: http://lkml.kernel.org/r/20180918125151.31744-2-schwidefsky@de.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Make issuing a TLB invalidate for page-table pages the normal case.
The reason is twofold:
- too many invalidates is safer than too few,
- most architectures use the linux page-tables natively
and would thus require this.
Make it an opt-out, instead of an opt-in.
No change in behavior intended.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Needed for ia64 -- alternatively we drop the entire hook.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nick Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
When an architecture does not have (an efficient) flush_tlb_range(),
but instead always uses full TLB invalidates, the current generic
tlb_flush() is sub-optimal, for it will generate extra flushes in
order to keep the range small.
But if we cannot do range flushes, that is a moot concern. Optionally
provide this simplified default.
No change in behavior intended.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Provide a generic tlb_flush() implementation that relies on
flush_tlb_range(). This is a little awkward because flush_tlb_range()
assumes a VMA for range invalidation, but we no longer have one.
Audit of all flush_tlb_range() implementations shows only vma->vm_mm
and vma->vm_flags are used, and of the latter only VM_EXEC (I-TLB
invalidates) and VM_HUGETLB (large TLB invalidate) are used.
Therefore, track VM_EXEC and VM_HUGETLB in two more bits, and create a
'fake' VMA.
This allows architectures that have a reasonably efficient
flush_tlb_range() to not require any additional effort.
No change in behavior intended.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nick Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The one obvious thing SH and ARM want is a sensible default for
tlb_start_vma(). (also: https://lkml.org/lkml/2004/1/15/6 )
Avoid all VIPT architectures providing their own tlb_start_vma()
implementation and rely on architectures to provide a no-op
flush_cache_range() when it is not relevant.
This patch makes tlb_start_vma() default to flush_cache_range(), which
should be right and sufficient. The only exceptions that I found where
(oddly):
- m68k-mmu
- sparc64
- unicore
Those architectures appear to have flush_cache_range(), but their
current tlb_start_vma() does not call it.
No change in behavior intended.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Miller <davem@davemloft.net>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nick Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Move the mmu_gather::page_size things into the generic code instead of
PowerPC specific bits.
No change in behavior intended.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nick Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Write a comment explaining some of this..
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nick Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The reference to '__vermagic' is a relict from v2.5 times. And even
there it had a very short life time, from v2.5.59 (commit 1d411b80ee18
("Module Sanity Check") in the historic tree) to v2.5.69 (commit
67ac5b866bda ("[PATCH] complete modinfo section")).
Neither current kernels nor modules contain a '__vermagic' section any
more, so get rid of it.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jessica Yu <jeyu@kernel.org>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
Currently, every arch/*/include/uapi/asm/Kbuild explicitly includes
the common Kbuild.asm file. Factor out the duplicated include directives
to scripts/Makefile.asm-generic so that no architecture would opt out
of the mandatory-y mechanism.
um is not forced to include mandatory-y since it is a very exceptional
case which does not support UAPI.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Pull x86 fixes from Thomas Gleixner:
"A set of fixes for x86:
- Make the unwinder more robust when it encounters a NULL pointer
call, so the backtrace becomes more useful
- Fix the bogus ORC unwind table alignment
- Prevent kernel panic during kexec on HyperV caused by a cleared but
not disabled hypercall page.
- Remove the now pointless stacksize increase for KASAN_EXTRA, as
KASAN_EXTRA is gone.
- Remove unused variables from the x86 memory management code"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/hyperv: Fix kernel panic when kexec on HyperV
x86/mm: Remove unused variable 'old_pte'
x86/mm: Remove unused variable 'cpu'
Revert "x86_64: Increase stack size for KASAN_EXTRA"
x86/unwind: Add hardcoded ORC entry for NULL
x86/unwind: Handle NULL pointer calls better in frame unwinder
x86/unwind/orc: Fix ORC unwind table alignment
- Pseudo NMI support for arm64 using GICv3 interrupt priorities
- uaccess macros clean-up (unsafe user accessors also merged but
reverted, waiting for objtool support on arm64)
- ptrace regsets for Pointer Authentication (ARMv8.3) key management
- inX() ordering w.r.t. delay() on arm64 and riscv (acks in place by the
riscv maintainers)
- arm64/perf updates: PMU bindings converted to json-schema, unused
variable and misleading comment removed
- arm64/debug fixes to ensure checking of the triggering exception level
and to avoid the propagation of the UNKNOWN FAR value into the si_code
for debug signals
- Workaround for Fujitsu A64FX erratum 010001
- lib/raid6 ARM NEON optimisations
- NR_CPUS now defaults to 256 on arm64
- Minor clean-ups (documentation/comments, Kconfig warning, unused
asm-offsets, clang warnings)
- MAINTAINERS update for list information to the ARM64 ACPI entry
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAlyCl0cACgkQa9axLQDI
XvEyKxAAiogBZLbyhcy8bTUHVzVoJE0FyAkdO2wWnnaff2Ohkhy1Y/npv33IeK2q
RknxqDIx2DUUVPJNRZGoI/WwBtTZdKaAnW4rIKG84yC1eAkFcd96WQasaZzcp1qY
HmvbJiYXM0bh+0J7i3Wgry/QzOkrltJFJW2kp6Wd5aFE+R1WyWyxT6d+Fp0J3vlA
bT70jlpBK6LXEOmmBS+04Ml02+8MvaGxIl8EInBHSfDLRLErj5E8n41rRHKUiSWz
maWI+kVoLYwOE68xiZlDftUBEeQpUSWgg2nxeK+640QSl1wJmVcRcY9nm6TZeMG2
AiZTR9a7cP5rrdSN5suUmb7d4AMMVlVMisGDlwb+9oCxeTRDzg0uwACaVgHfPqQr
UeBdHbL9nStN7uBH23H8L9mKk+tqpFmk0sgzdrKejOwysAiqWV8aazb/Na3qnVRl
J1B5opxMnGOsjXmHvtG/tiZl281Uwz5ZmzfLmIY3gUZgUgdA3511Egp0ry5y1dzJ
SkYC4Hmzb2ybQvXGIDDa3OzCwXXiqyqKsO+O8Egg1k4OIwbp3w+NHE7gKeA+dMgD
gjN7zEalCUi46Q28xiCPEb+88BpQ18czIWGQLb9mAnmYeZPjqqenXKXuRHr4lgVe
jPURJ/vqvFEglZJN1RDuQHKzHEcm5f2XE566sMZYdSoeiUCb0QM=
=2U56
-----END PGP SIGNATURE-----
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Catalin Marinas:
- Pseudo NMI support for arm64 using GICv3 interrupt priorities
- uaccess macros clean-up (unsafe user accessors also merged but
reverted, waiting for objtool support on arm64)
- ptrace regsets for Pointer Authentication (ARMv8.3) key management
- inX() ordering w.r.t. delay() on arm64 and riscv (acks in place by
the riscv maintainers)
- arm64/perf updates: PMU bindings converted to json-schema, unused
variable and misleading comment removed
- arm64/debug fixes to ensure checking of the triggering exception
level and to avoid the propagation of the UNKNOWN FAR value into the
si_code for debug signals
- Workaround for Fujitsu A64FX erratum 010001
- lib/raid6 ARM NEON optimisations
- NR_CPUS now defaults to 256 on arm64
- Minor clean-ups (documentation/comments, Kconfig warning, unused
asm-offsets, clang warnings)
- MAINTAINERS update for list information to the ARM64 ACPI entry
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (54 commits)
arm64: mmu: drop paging_init comments
arm64: debug: Ensure debug handlers check triggering exception level
arm64: debug: Don't propagate UNKNOWN FAR into si_code for debug signals
Revert "arm64: uaccess: Implement unsafe accessors"
arm64: avoid clang warning about self-assignment
arm64: Kconfig.platforms: fix warning unmet direct dependencies
lib/raid6: arm: optimize away a mask operation in NEON recovery routine
lib/raid6: use vdupq_n_u8 to avoid endianness warnings
arm64: io: Hook up __io_par() for inX() ordering
riscv: io: Update __io_[p]ar() macros to take an argument
asm-generic/io: Pass result of I/O accessor to __io_[p]ar()
arm64: Add workaround for Fujitsu A64FX erratum 010001
arm64: Rename get_thread_info()
arm64: Remove documentation about TIF_USEDFPU
arm64: irqflags: Fix clang build warnings
arm64: Enable the support of pseudo-NMIs
arm64: Skip irqflags tracing for NMI in IRQs disabled context
arm64: Skip preemption when exiting an NMI
arm64: Handle serror in NMI context
irqchip/gic-v3: Allow interrupts to be set as pseudo-NMI
...
Here is the big char/misc driver patch pull request for 5.1-rc1.
The largest thing by far is the new habanalabs driver for their AI
accelerator chip. For now it is in the drivers/misc directory but will
probably move to a new directory soon along with other drivers of this
type.
Other than that, just the usual set of individual driver updates and
fixes. There's an "odd" merge in here from the DRM tree that they asked
me to do as the MEI driver is starting to interact with the i915 driver,
and it needed some coordination. All of those patches have been
properly acked by the relevant subsystem maintainers.
All of these have been in linux-next with no reported issues, most for
quite some time.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXH+dPQ8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ym1fACgvpZAxjNzoRQJ6f06tc8ujtPk9rUAnR+tCtrZ
9e3l7H76oe33o96Qjhor
=8A2k
-----END PGP SIGNATURE-----
Merge tag 'char-misc-5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver updates from Greg KH:
"Here is the big char/misc driver patch pull request for 5.1-rc1.
The largest thing by far is the new habanalabs driver for their AI
accelerator chip. For now it is in the drivers/misc directory but will
probably move to a new directory soon along with other drivers of this
type.
Other than that, just the usual set of individual driver updates and
fixes. There's an "odd" merge in here from the DRM tree that they
asked me to do as the MEI driver is starting to interact with the i915
driver, and it needed some coordination. All of those patches have
been properly acked by the relevant subsystem maintainers.
All of these have been in linux-next with no reported issues, most for
quite some time"
* tag 'char-misc-5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (219 commits)
habanalabs: adjust Kconfig to fix build errors
habanalabs: use %px instead of %p in error print
habanalabs: use do_div for 64-bit divisions
intel_th: gth: Fix an off-by-one in output unassigning
habanalabs: fix little-endian<->cpu conversion warnings
habanalabs: use NULL to initialize array of pointers
habanalabs: fix little-endian<->cpu conversion warnings
habanalabs: soft-reset device if context-switch fails
habanalabs: print pointer using %p
habanalabs: fix memory leak with CBs with unaligned size
habanalabs: return correct error code on MMU mapping failure
habanalabs: add comments in uapi/misc/habanalabs.h
habanalabs: extend QMAN0 job timeout
habanalabs: set DMA0 completion to SOB 1007
habanalabs: fix validation of WREG32 to DMA completion
habanalabs: fix mmu cache registers init
habanalabs: disable CPU access on timeouts
habanalabs: add MMU DRAM default page mapping
habanalabs: Dissociate RAZWI info from event types
misc/habanalabs: adjust Kconfig to fix build errors
...
The .orc_unwind section is a packed array of 6-byte structs. It's
currently aligned to 6 bytes, which is causing warnings in the LLD
linker.
Six isn't a power of two, so it's not a valid alignment value. The
actual alignment doesn't matter much because it's an array of packed
structs. An alignment of two is sufficient. In reality it always gets
aligned to four bytes because it comes immediately after the
4-byte-aligned .orc_unwind_ip section.
Fixes: ee9f8fce99 ("x86/unwind: Add the ORC unwinder")
Reported-by: Nick Desaulniers <ndesaulniers@google.com>
Reported-by: Dmitry Golovin <dima@golovin.in>
Reported-by: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://github.com/ClangBuiltLinux/linux/issues/218
Link: https://lkml.kernel.org/r/d55027ee95fe73e952dcd8be90aebd31b0095c45.1551892041.git.jpoimboe@redhat.com
Merge misc updates from Andrew Morton:
- a few misc things
- ocfs2 updates
- most of MM
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (159 commits)
tools/testing/selftests/proc/proc-self-syscall.c: remove duplicate include
proc: more robust bulk read test
proc: test /proc/*/maps, smaps, smaps_rollup, statm
proc: use seq_puts() everywhere
proc: read kernel cpu stat pointer once
proc: remove unused argument in proc_pid_lookup()
fs/proc/thread_self.c: code cleanup for proc_setup_thread_self()
fs/proc/self.c: code cleanup for proc_setup_self()
proc: return exit code 4 for skipped tests
mm,mremap: bail out earlier in mremap_to under map pressure
mm/sparse: fix a bad comparison
mm/memory.c: do_fault: avoid usage of stale vm_area_struct
writeback: fix inode cgroup switching comment
mm/huge_memory.c: fix "orig_pud" set but not used
mm/hotplug: fix an imbalance with DEBUG_PAGEALLOC
mm/memcontrol.c: fix bad line in comment
mm/cma.c: cma_declare_contiguous: correct err handling
mm/page_ext.c: fix an imbalance with kmemleak
mm/compaction: pass pgdat to too_many_isolated() instead of zone
mm: remove zone_lru_lock() function, access ->lru_lock directly
...
Only a few small changes this time:
- Michael S. Tsirkin cleans up linux/mman.h
- Mike Rapoport found a typo
I had originally merged another cleanup series for I/O accessors from
Hugo Lefeuvre as well, but dropped it after the discussion of the barrier
semantics and some conflicts. I expect this series to get merged for a
later release though.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJcf9yrAAoJEGCrR//JCVInohIP+wcCZ3XEym0oyb9C5BgIdnMF
qq69wbOT9ypAnB469PPjsvGhyCsqGo1Fm2Cludac0C/OfmB/lYJCqklAJrc6a7Cp
jXD5rQkXQZGdmFHS81i24ejcNV9F5fh16vyDwqCgUQDCgg9MeDPirvwaDl958rhq
5dUsoUu+CJRI35jZ8NPWbsSU5Wa2BpckWnuTs97PtlLnMH/RzEyxmSRysxtHNA7S
PgYjvWBHbRK6mTNgcY/eojAoQuQmBgiOppYX1XTffHZgwY/VBIjYL3Wxwe1qbvSN
vvLgE6nvvmRYjdq4VOoVbi9BOtWiCtXw29TWXfxSetN9CWCEwIQJdIm8lHn1CJRh
6GnOdb8ADqAzraxo7GOf78kNsekl3mxh+QYN5gDaGS6eQASCNlGx70zUh7Y6JcE5
6NJ8VMy0oEEFzHAsE0mxluu8LHL3F1hwS832D6mQ697Z71T+IoHgIcMT2jHqp9fw
7/D2taoBAXdGqhToY3hhMMWsi4lFCvxjVVlxCxhp7Ik+cyOgW0O5vG2afLOHrtti
vWEcn7M6nuKvb3MxBVjg8sK2ln6vIXNgZGjVmkxn70ZAmvZJX+KE+G1cvB2gMGc0
S/ogtpbETMgMCwuAf2hdYgiBFJpZL725DEWFfS+p+02PTwjdKF+EWAS4swZM0sCE
U1v1N7yCe/Wb/ijEm0M1
=C9p7
-----END PGP SIGNATURE-----
Merge tag 'asm-generic-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic
Pull asm-generic updates from Arnd Bergmann:
"Only a few small changes this time:
- Michael S. Tsirkin cleans up linux/mman.h
- Mike Rapoport found a typo
I had originally merged another cleanup series for I/O accessors from
Hugo Lefeuvre as well, but dropped it after the discussion of the
barrier semantics and some conflicts. I expect this series to get
merged for a later release though"
* tag 'asm-generic-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
asm-generic/page.h: fix typo in #error text requiring a real asm/page.h
arch: move common mmap flags to linux/mman.h
drm: tweak header name
x86/mpx: tweak header name
Pull locking updates from Ingo Molnar:
"The biggest part of this tree is the new auto-generated atomics API
wrappers by Mark Rutland.
The primary motivation was to allow instrumentation without uglifying
the primary source code.
The linecount increase comes from adding the auto-generated files to
the Git space as well:
include/asm-generic/atomic-instrumented.h | 1689 ++++++++++++++++--
include/asm-generic/atomic-long.h | 1174 ++++++++++---
include/linux/atomic-fallback.h | 2295 +++++++++++++++++++++++++
include/linux/atomic.h | 1241 +------------
I preferred this approach, so that the full call stack of the (already
complex) locking APIs is still fully visible in 'git grep'.
But if this is excessive we could certainly hide them.
There's a separate build-time mechanism to determine whether the
headers are out of date (they should never be stale if we do our job
right).
Anyway, nothing from this should be visible to regular kernel
developers.
Other changes:
- Add support for dynamic keys, which removes a source of false
positives in the workqueue code, among other things (Bart Van
Assche)
- Updates to tools/memory-model (Andrea Parri, Paul E. McKenney)
- qspinlock, wake_q and lockdep micro-optimizations (Waiman Long)
- misc other updates and enhancements"
* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (48 commits)
locking/lockdep: Shrink struct lock_class_key
locking/lockdep: Add module_param to enable consistency checks
lockdep/lib/tests: Test dynamic key registration
lockdep/lib/tests: Fix run_tests.sh
kernel/workqueue: Use dynamic lockdep keys for workqueues
locking/lockdep: Add support for dynamic keys
locking/lockdep: Verify whether lock objects are small enough to be used as class keys
locking/lockdep: Check data structure consistency
locking/lockdep: Reuse lock chains that have been freed
locking/lockdep: Fix a comment in add_chain_cache()
locking/lockdep: Introduce lockdep_next_lockchain() and lock_chain_count()
locking/lockdep: Reuse list entries that are no longer in use
locking/lockdep: Free lock classes that are no longer in use
locking/lockdep: Update two outdated comments
locking/lockdep: Make it easy to detect whether or not inside a selftest
locking/lockdep: Split lockdep_free_key_range() and lockdep_reset_lock()
locking/lockdep: Initialize the locks_before and locks_after lists earlier
locking/lockdep: Make zap_class() remove all matching lock order entries
locking/lockdep: Reorder struct lock_class members
locking/lockdep: Avoid that add_chain_cache() adds an invalid chain to the cache
...
Architectures like ppc64 require to do a conditional tlb flush based on
the old and new value of pte. Enable that by passing old pte value as
the arg.
Link: http://lkml.kernel.org/r/20190116085035.29729-3-aneesh.kumar@linux.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "NestMMU pte upgrade workaround for mprotect", v5.
We can upgrade pte access (R -> RW transition) via mprotect. We need to
make sure we follow the recommended pte update sequence as outlined in
commit bd5050e38a ("powerpc/mm/radix: Change pte relax sequence to
handle nest MMU hang") for such updates. This patch series does that.
This patch (of 5):
Some architectures may want to call flush_tlb_range from these helpers.
Link: http://lkml.kernel.org/r/20190116085035.29729-2-aneesh.kumar@linux.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Every in-kernel use of this function defined it to KERNEL_DS (either as
an actual define, or as an inline function). It's an entirely
historical artifact, and long long long ago used to actually read the
segment selector valueof '%ds' on x86.
Which in the kernel is always KERNEL_DS.
Inspired by a patch from Jann Horn that just did this for a very small
subset of users (the ones in fs/), along with Al who suggested a script.
I then just took it to the logical extreme and removed all the remaining
gunk.
Roughly scripted with
git grep -l '(get_ds())' -- :^tools/ | xargs sed -i 's/(get_ds())/(KERNEL_DS)/'
git grep -lw 'get_ds' -- :^tools/ | xargs sed -i '/^#define get_ds()/d'
plus manual fixups to remove a few unusual usage patterns, the couple of
inline function cases and to fix up a comment that had become stale.
The 'get_ds()' function remains in an x86 kvm selftest, since in user
space it actually does something relevant.
Inspired-by: Jann Horn <jannh@google.com>
Inspired-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The inX() and readX() I/O accessors must enforce ordering against
subsequent calls to the delay() routines, so that a read-back from a
device can be used to postpone a subsequent write to the same device.
On some architectures, including arm64, this ordering can only be
achieved by creating a dependency on the value returned by the I/O
accessor operation, so we need to pass the value we read to the
__io_par() and __io_ar() macros in these cases.
Acked-by: Arnd Bergmann <arnd@arndb.de>
Reported-by: Andrew Murray <andrew.murray@arm.com>
Reviewed-by: Palmer Dabbelt <palmer@sifive.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Replace "need to prove a real asm/page.h" with "need to provide a real
asm/page.h"
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
We currently check the atomic headers at build-time to ensure they
haven't been modified directly, and these checks require regenerating
the headers in full. As this takes a few seconds, even when
parallelized, this is too slow to run for every kernel build.
Instead, we can generate a hash of each header as we generate them,
which we can cheaply check at build time (~0.16s for all headers).
This patch does so, updating headers with their hashes using the new
gen-atomics.sh script. As some users apparently build the kernel wihout
coreutils, lacking sha1sum, the checks are skipped in this case.
Presumably, most developers have a working coreutils installation.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: anders.roxell@linaro.org
Cc: linux-kernel@vger.kernel.rg
Cc: naresh.kamboju@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The WARN_ON_SMP() comment header suggests using spin_is_locked() to
check for locks being held. But these days we prefer lockdep_assert_held(),
so this commit removes that suggestion.
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: <linux-arch@vger.kernel.org>
In order to provide non-atomic functions for io{read|write}64 that will
use readq and writeq when appropriate. We define a number of variants
of these functions in the generic iomap that will do non-atomic
operations on pio but atomic operations on mmio.
These functions are only defined if readq and writeq are defined. If
they are not, then the wrappers that always use non-atomic operations
from include/linux/io-64-nonatomic*.h will be used.
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Suresh Warrier <warrier@linux.vnet.ibm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Merge more updates from Andrew Morton:
- procfs updates
- various misc bits
- lib/ updates
- epoll updates
- autofs
- fatfs
- a few more MM bits
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (58 commits)
mm/page_io.c: fix polled swap page in
checkpatch: add Co-developed-by to signature tags
docs: fix Co-Developed-by docs
drivers/base/platform.c: kmemleak ignore a known leak
fs: don't open code lru_to_page()
fs/: remove caller signal_pending branch predictions
mm/: remove caller signal_pending branch predictions
arch/arc/mm/fault.c: remove caller signal_pending_branch predictions
kernel/sched/: remove caller signal_pending branch predictions
kernel/locking/mutex.c: remove caller signal_pending branch predictions
mm: select HAVE_MOVE_PMD on x86 for faster mremap
mm: speed up mremap by 20x on large regions
mm: treewide: remove unused address argument from pte_alloc functions
initramfs: cleanup incomplete rootfs
scripts/gdb: fix lx-version string output
kernel/kcov.c: mark write_comp_data() as notrace
kernel/sysctl: add panic_print into sysctl
panic: add options to print system info when panic happens
bfs: extra sanity checking and static inode bitmap
exec: separate MM_ANONPAGES and RLIMIT_STACK accounting
...
When testing in userspace, UBSAN pointed out that shifting into the sign
bit is undefined behaviour. It doesn't really make sense to ask for the
highest set bit of a negative value, so just turn the argument type into
an unsigned int.
Some architectures (eg ppc) already had it declared as an unsigned int,
so I don't expect too many problems.
Link: http://lkml.kernel.org/r/20181105221117.31828-1-willy@infradead.org
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Nobody has actually used the type (VERIFY_READ vs VERIFY_WRITE) argument
of the user address range verification function since we got rid of the
old racy i386-only code to walk page tables by hand.
It existed because the original 80386 would not honor the write protect
bit when in kernel mode, so you had to do COW by hand before doing any
user access. But we haven't supported that in a long time, and these
days the 'type' argument is a purely historical artifact.
A discussion about extending 'user_access_begin()' to do the range
checking resulted this patch, because there is no way we're going to
move the old VERIFY_xyz interface to that model. And it's best done at
the end of the merge window when I've done most of my merges, so let's
just get this done once and for all.
This patch was mostly done with a sed-script, with manual fix-ups for
the cases that weren't of the trivial 'access_ok(VERIFY_xyz' form.
There were a couple of notable cases:
- csky still had the old "verify_area()" name as an alias.
- the iter_iov code had magical hardcoded knowledge of the actual
values of VERIFY_{READ,WRITE} (not that they mattered, since nothing
really used it)
- microblaze used the type argument for a debug printout
but other than those oddities this should be a total no-op patch.
I tried to fix up all architectures, did fairly extensive grepping for
access_ok() uses, and the changes are trivial, but I may have missed
something. Any missed conversion should be trivially fixable, though.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge misc updates from Andrew Morton:
- large KASAN update to use arm's "software tag-based mode"
- a few misc things
- sh updates
- ocfs2 updates
- just about all of MM
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (167 commits)
kernel/fork.c: mark 'stack_vm_area' with __maybe_unused
memcg, oom: notify on oom killer invocation from the charge path
mm, swap: fix swapoff with KSM pages
include/linux/gfp.h: fix typo
mm/hmm: fix memremap.h, move dev_page_fault_t callback to hmm
hugetlbfs: Use i_mmap_rwsem to fix page fault/truncate race
hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization
memory_hotplug: add missing newlines to debugging output
mm: remove __hugepage_set_anon_rmap()
include/linux/vmstat.h: remove unused page state adjustment macro
mm/page_alloc.c: allow error injection
mm: migrate: drop unused argument of migrate_page_move_mapping()
blkdev: avoid migration stalls for blkdev pages
mm: migrate: provide buffer_migrate_page_norefs()
mm: migrate: move migrate_page_lock_buffers()
mm: migrate: lock buffers before migrate_page_move_mapping()
mm: migration: factor out code to compute expected number of page references
mm, page_alloc: enable pcpu_drain with zone capability
kmemleak: add config to select auto scan
mm/page_alloc.c: don't call kasan_free_pages() at deferred mem init
...
A huge update this time, but a lot of that is just consolidating or
removing code:
- provide a common DMA_MAPPING_ERROR definition and avoid indirect
calls for dma_map_* error checking
- use direct calls for the DMA direct mapping case, avoiding huge
retpoline overhead for high performance workloads
- merge the swiotlb dma_map_ops into dma-direct
- provide a generic remapping DMA consistent allocator for architectures
that have devices that perform DMA that is not cache coherent. Based
on the existing arm64 implementation and also used for csky now.
- improve the dma-debug infrastructure, including dynamic allocation
of entries (Robin Murphy)
- default to providing chaining scatterlist everywhere, with opt-outs
for the few architectures (alpha, parisc, most arm32 variants) that
can't cope with it
- misc sparc32 dma-related cleanups
- remove the dma_mark_clean arch hook used by swiotlb on ia64 and
replace it with the generic noncoherent infrastructure
- fix the return type of dma_set_max_seg_size (Niklas Söderlund)
- move the dummy dma ops for not DMA capable devices from arm64 to
common code (Robin Murphy)
- ensure dma_alloc_coherent returns zeroed memory to avoid kernel data
leaks through userspace. We already did this for most common
architectures, but this ensures we do it everywhere.
dma_zalloc_coherent has been deprecated and can hopefully be
removed after -rc1 with a coccinelle script.
-----BEGIN PGP SIGNATURE-----
iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAlwctQgLHGhjaEBsc3Qu
ZGUACgkQD55TZVIEUYMxgQ//dBpAfS4/J76CdAbYry2zqgcOUU9hIrD6NHiEMWov
ltJxyvEl3LsUmIdEj3aCrYL9jZN0qsnCzn5BVj2c3jDIVgD64fAr7HDf/PbEEfKb
j6/GgEnVLPZV+sQMvhNA5jOzHrkseaqPa4/pNLFZ/l8jnuZ2d+btusDWJpMoVDer
TXVwtIfgeIu0gTygYOShLYXd5qptWKWsZEpbTZOO2sE6+x+ZJX7yQYUxYDTlcOIj
JWVO2l5QNHPc5T9o2at+6L5aNUvnZOxT79sWgyZLn0Kc+FagKAVwfLqUEl0v7foG
8k/xca5/8p3afB1DfrIrtplJqis7cVgdyGxriwuuoO8X4F0nPyWwpGmxsBhrWwwl
xTqC4UorEJ7QwoP6Azopk/vYI2QXIUBLjuCJCuFXZj9+2BGf4IfvBY1S2cLM9qLs
HMcxQonuXJii044KEFS96ePEuiT+igVINweIFBKWcgNCEG0UQtyL6RQ1U5297ipF
JiWZAqD+p9X52UdKS+oKfAiZEekMXn6Xyo97+YCiNpfOo0GP5eEcwhL+JpY4AiRq
apPXtsRy2o1s8yfjdraUIM2Mc2n62vFKb35oUbGCd/QO9piPrFQHl6T0HHcHk4YR
XrUXcHieFZBCYqh7ZVa4RL8Msq1wvGuTL4Dxl43mXdsMoUFRR6eSNWLoAV4IpOLZ
WgA=
=in72
-----END PGP SIGNATURE-----
Merge tag 'dma-mapping-4.21' of git://git.infradead.org/users/hch/dma-mapping
Pull DMA mapping updates from Christoph Hellwig:
"A huge update this time, but a lot of that is just consolidating or
removing code:
- provide a common DMA_MAPPING_ERROR definition and avoid indirect
calls for dma_map_* error checking
- use direct calls for the DMA direct mapping case, avoiding huge
retpoline overhead for high performance workloads
- merge the swiotlb dma_map_ops into dma-direct
- provide a generic remapping DMA consistent allocator for
architectures that have devices that perform DMA that is not cache
coherent. Based on the existing arm64 implementation and also used
for csky now.
- improve the dma-debug infrastructure, including dynamic allocation
of entries (Robin Murphy)
- default to providing chaining scatterlist everywhere, with opt-outs
for the few architectures (alpha, parisc, most arm32 variants) that
can't cope with it
- misc sparc32 dma-related cleanups
- remove the dma_mark_clean arch hook used by swiotlb on ia64 and
replace it with the generic noncoherent infrastructure
- fix the return type of dma_set_max_seg_size (Niklas Söderlund)
- move the dummy dma ops for not DMA capable devices from arm64 to
common code (Robin Murphy)
- ensure dma_alloc_coherent returns zeroed memory to avoid kernel
data leaks through userspace. We already did this for most common
architectures, but this ensures we do it everywhere.
dma_zalloc_coherent has been deprecated and can hopefully be
removed after -rc1 with a coccinelle script"
* tag 'dma-mapping-4.21' of git://git.infradead.org/users/hch/dma-mapping: (73 commits)
dma-mapping: fix inverted logic in dma_supported
dma-mapping: deprecate dma_zalloc_coherent
dma-mapping: zero memory returned from dma_alloc_*
sparc/iommu: fix ->map_sg return value
sparc/io-unit: fix ->map_sg return value
arm64: default to the direct mapping in get_arch_dma_ops
PCI: Remove unused attr variable in pci_dma_configure
ia64: only select ARCH_HAS_DMA_COHERENT_TO_PFN if swiotlb is enabled
dma-mapping: bypass indirect calls for dma-direct
vmd: use the proper dma_* APIs instead of direct methods calls
dma-direct: merge swiotlb_dma_ops into the dma_direct code
dma-direct: use dma_direct_map_page to implement dma_direct_map_sg
dma-direct: improve addressability error reporting
swiotlb: remove dma_mark_clean
swiotlb: remove SWIOTLB_MAP_ERROR
ACPI / scan: Refactor _CCA enforcement
dma-mapping: factor out dummy DMA ops
dma-mapping: always build the direct mapping code
dma-mapping: move dma_cache_sync out of line
dma-mapping: move various slow path functions out of line
...
Model call chain after should_failslab(). Likewise, we can now use a
kprobe to override the return value of should_fail_alloc_page() and inject
allocation failures into alloc_page*().
This will allow injecting allocation failures using the BCC tools even
without building kernel with CONFIG_FAIL_PAGE_ALLOC and booting it with a
fail_page_alloc= parameter, which incurs some overhead even when failures
are not being injected. On the other hand, this patch adds an
unconditional call to should_fail_alloc_page() from page allocation
hotpath. That overhead should be rather negligible with
CONFIG_FAIL_PAGE_ALLOC=n when there's no kprobe attached, though.
[vbabka@suse.cz: changelog addition]
Link: http://lkml.kernel.org/r/20181214074330.18917-1-bpoirier@suse.com
Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Whilst no architectures actually enable support for huge p4d mappings in
the vmap area, the code that is implemented should be using
break-before-make, as we do for pud and pmd huge entries.
Link: http://lkml.kernel.org/r/1544120495-17438-6-git-send-email-will.deacon@arm.com
Signed-off-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Toshi Kani <toshi.kani@hpe.com>
Cc: Chintan Pandya <cpandya@codeaurora.org>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull x86 mm updates from Ingo Molnar:
"The main changes in this cycle were:
- Update and clean up x86 fault handling, by Andy Lutomirski.
- Drop usage of __flush_tlb_all() in kernel_physical_mapping_init()
and related fallout, by Dan Williams.
- CPA cleanups and reorganization by Peter Zijlstra: simplify the
flow and remove a few warts.
- Other misc cleanups"
* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (29 commits)
x86/mm/dump_pagetables: Use DEFINE_SHOW_ATTRIBUTE()
x86/mm/cpa: Rename @addrinarray to @numpages
x86/mm/cpa: Better use CLFLUSHOPT
x86/mm/cpa: Fold cpa_flush_range() and cpa_flush_array() into a single cpa_flush() function
x86/mm/cpa: Make cpa_data::numpages invariant
x86/mm/cpa: Optimize cpa_flush_array() TLB invalidation
x86/mm/cpa: Simplify the code after making cpa->vaddr invariant
x86/mm/cpa: Make cpa_data::vaddr invariant
x86/mm/cpa: Add __cpa_addr() helper
x86/mm/cpa: Add ARRAY and PAGES_ARRAY selftests
x86/mm: Drop usage of __flush_tlb_all() in kernel_physical_mapping_init()
x86/mm: Validate kernel_physical_mapping_init() PTE population
generic/pgtable: Introduce set_pte_safe()
generic/pgtable: Introduce {p4d,pgd}_same()
generic/pgtable: Make {pmd, pud}_same() unconditionally available
x86/fault: Clean up the page fault oops decoder a bit
x86/fault: Decode page fault OOPSes better
x86/vsyscall/64: Use X86_PF constants in the simulated #PF error code
x86/oops: Show the correct CS value in show_regs()
x86/fault: Don't try to recover from an implicit supervisor access
...
Pull x86 fixes from Ingo Molnar:
"The biggest part is a series of reverts for the macro based GCC
inlining workarounds. It caused regressions in distro build and other
kernel tooling environments, and the GCC project was very receptive to
fixing the underlying inliner weaknesses - so as time ran out we
decided to do a reasonably straightforward revert of the patches. The
plan is to rely on the 'asm inline' GCC 9 feature, which might be
backported to GCC 8 and could thus become reasonably widely available
on modern distros.
Other than those reverts, there's misc fixes from all around the
place.
I wish our final x86 pull request for v4.20 was smaller..."
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
Revert "kbuild/Makefile: Prepare for using macros in inline assembly code to work around asm() related GCC inlining bugs"
Revert "x86/objtool: Use asm macros to work around GCC inlining bugs"
Revert "x86/refcount: Work around GCC inlining bug"
Revert "x86/alternatives: Macrofy lock prefixes to work around GCC inlining bugs"
Revert "x86/bug: Macrofy the BUG table section handling, to work around GCC inlining bugs"
Revert "x86/paravirt: Work around GCC inlining bugs when compiling paravirt ops"
Revert "x86/extable: Macrofy inline assembly code to work around GCC inlining bugs"
Revert "x86/cpufeature: Macrofy inline assembly code to work around GCC inlining bugs"
Revert "x86/jump-labels: Macrofy inline assembly code to work around GCC inlining bugs"
x86/mtrr: Don't copy uninitialized gentry fields back to userspace
x86/fsgsbase/64: Fix the base write helper functions
x86/mm/cpa: Fix cpa_flush_array() TLB invalidation
x86/vdso: Pass --eh-frame-hdr to the linker
x86/mm: Fix decoy address handling vs 32-bit builds
x86/intel_rdt: Ensure a CPU remains online for the region's pseudo-locking sequence
x86/dump_pagetables: Fix LDT remap address marker
x86/mm: Fix guard hole handling
This define is used by arm64 to calculate the size of the vmemmap
region. It is defined as the log2 of the upper bound on the size of a
struct page.
We move it into mm_types.h so it can be defined properly instead of set
and checked with a build bug. This also allows us to use the same
define for riscv.
Link: http://lkml.kernel.org/r/20181107205433.3875-2-logang@deltatee.com
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Avoid expensive indirect calls in the fast path DMA mapping
operations by directly calling the dma_direct_* ops if we are using
the directly mapped DMA operations.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Tested-by: Jesper Dangaard Brouer <brouer@redhat.com>
Tested-by: Tony Luck <tony.luck@intel.com>
The usage of __flush_tlb_all() in the kernel_physical_mapping_init()
path is not necessary. In general flushing the TLB is not required when
updating an entry from the !present state. However, to give confidence
in the future removal of TLB flushing in this path, use the new
set_pte_safe() family of helpers to assert that the !present assumption
is true in this path.
[ mingo: Minor readability edits. ]
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Suggested-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/154395944177.32119.8524957429632012270.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Commit:
f77084d963 "x86/mm/pat: Disable preemption around __flush_tlb_all()"
introduced a warning to capture cases __flush_tlb_all() is called without
pre-emption disabled. It triggers a false positive warning in the memory
hotplug path.
On investigation it was found that the __flush_tlb_all() calls are not
necessary. However, they are only "not necessary" in practice provided
the ptes are being initially populated from the !present state.
Introduce set_pte_safe() as a sanity check that the pte is being updated
in a way that does not require a TLB flush.
Forgive the macro, the availability of the various of set_pte() levels
is hit and miss across architectures.
[ mingo: Minor readability edits. ]
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Suggested-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/279dadae-9148-465c-7ec6-3f37e026c6c9@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
In preparation for introducing '_safe' versions of page table entry 'set'
helpers, introduce generic versions of p4d_same() and pgd_same().
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/154395943153.32119.1733586547359626311.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
In preparation for {pmd,pud}_same() to be used outside of transparent
huge page code paths, make them unconditionally available. This enables
them to be used in the definition of a new family of
set_{pte,pmd,pud,p4d,pgd}_safe() helpers.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/154395942644.32119.10238934183949418128.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
My main motivation of this commit is to clean up scripts/Kbuild.include
and scripts/Makefile.build.
Currently, CONFIG_TRIM_UNUSED_KSYMS works with a tricky gimmick;
possibly exported symbols are detected by letting $(CPP) replace
EXPORT_SYMBOL* with a special string '=== __KSYM_*===', which is
post-processed by sed, and passed to fixdep. The extra preprocessing
is costly, and hacking cmd_and_fixdep is ugly.
I came up with a new way to find exported symbols; insert a dummy
symbol __ksym_marker_* to each potentially exported symbol. Those
dummy symbols are picked up by $(NM), post-processed by sed, then
appended to .*.cmd files. I collected the post-process part to a
new shell script scripts/gen_ksymdeps.sh for readability. The dummy
symbols are put into the .discard.* section so that the linker
script rips them off the final vmlinux or modules.
A nice side-effect is building with CONFIG_TRIM_UNUSED_KSYMS will
be much faster.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Nicolas Pitre <nico@linaro.org>
- A fix for the pgtable_bytes misaccounting on s390. The patch changes
common code part in regard to page table folding and adds extra
checks to mm_[inc|dec]_nr_[pmds|puds].
- Add FORCE for all build targets using if_changed
- Use non-loadable phdr for the .vmlinux.info section to avoid
a segment overlap that confuses kexec
- Cleanup the attribute definition for the diagnostic sampling
- Increase stack size for CONFIG_KASAN=y builds
- Export __node_distance to fix a build error
- Correct return code of a PMU event init function
- An update for the default configs
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQEcBAABCAAGBQJb5TIMAAoJEDjwexyKj9rgIH8H/0daZTyxcLwY9gbigaq1Qs4R
/ScmAJJc2U/Qj8b9UskhsmHAUuAufF2oljU16SquP7CBGhtkLRrjPtdh1AMiiZGM
reVF7X5LU8MH0QUoNnKPWAL4DD1q2E99IAEH5TeGIODUG6srqvIHBNtXDWNLPtBf
fpOhJ/NssgxyuYUXi/WnoEjIyP8KABeG6SlpcLzYbmY1hUOIXcixuv39UrL0G5OO
P8ciL+W5rTcPZCnpJ1Xk9hKploT8gWXhMT5QhNnakgMF/25v80+TZy5xRZMuLAmQ
T5SFP6B71o05nLK7fLi3VAIKPv/QibjiyJOEf9uUHdo1XZcD5uRu0EQ/LklLUBU=
=4H06
-----END PGP SIGNATURE-----
Merge tag 's390-4.20-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Martin Schwidefsky:
- A fix for the pgtable_bytes misaccounting on s390. The patch changes
common code part in regard to page table folding and adds extra
checks to mm_[inc|dec]_nr_[pmds|puds].
- Add FORCE for all build targets using if_changed
- Use non-loadable phdr for the .vmlinux.info section to avoid a
segment overlap that confuses kexec
- Cleanup the attribute definition for the diagnostic sampling
- Increase stack size for CONFIG_KASAN=y builds
- Export __node_distance to fix a build error
- Correct return code of a PMU event init function
- An update for the default configs
* tag 's390-4.20-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/perf: Change CPUM_CF return code in event init function
s390: update defconfigs
s390/mm: Fix ERROR: "__node_distance" undefined!
s390/kasan: increase instrumented stack size to 64k
s390/cpum_sf: Rework attribute definition for diagnostic sampling
s390/mm: fix mis-accounting of pgtable_bytes
mm: add mm_pxd_folded checks to pgtable_bytes accounting functions
mm: introduce mm_[p4d|pud|pmd]_folded
mm: make the __PAGETABLE_PxD_FOLDED defines non-empty
s390: avoid vmlinux segments overlap
s390/vdso: add missing FORCE to build targets
s390/decompressor: add missing FORCE to build targets
Add three architecture overrideable functions to test if the
p4d, pud, or pmd layer of a page table is folded or not.
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Change the currently empty defines for __PAGETABLE_PMD_FOLDED,
__PAGETABLE_PUD_FOLDED and __PAGETABLE_P4D_FOLDED to return 1.
This makes it possible to use __is_defined() to test if the
preprocessor define exists.
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
As a step towards ensuring the atomic* APIs are consistent, let's switch
to wrappers generated by gen-atomic-instrumented.h, using the same table
used to generate the fallbacks and atomic-long wrappers.
These are checked in rather than generated with Kbuild, since:
* This allows inspection of the atomics with git grep and ctags on a
pristine tree, which Linus strongly prefers being able to do.
* The fallbacks are not affected by machine details or configuration
options, so it is not necessary to regenerate them to take these into
account.
* These are included by files required *very* early in the build process
(e.g. for generating bounds.h), and we'd rather not complicate the
top-level Kbuild file with dependencies.
Generating the atomic headers means that the instrumented wrappers will
remain in sync with the rest of the atomic APIs, and we gain all the
ordering variants of each atomic without having to manually expanded
them all.
The KASAN checks are automatically generated based on the function
parameters defined in atomics.tbl. Note that try_cmpxchg() now correctly
treats 'old' as a parameter that may be written to, and not only read as
the hand-written instrumentation assumed.
Other than the change to try_cmpxchg(), existing code should not be
affected by this patch. The patch introduces instrumentation for all
optional atomics (and ordering variants), along with the ifdeffery this
requires, enabling other architectures to make use of the instrumented
atomics.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: catalin.marinas@arm.com
Cc: linuxdrivers@attotech.com
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Link: http://lkml.kernel.org/r/20180904104830.2975-5-mark.rutland@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
As a step towards ensuring the atomic* APIs are consistent, let's switch
to wrappers generated by gen-atomic-long.h, using the same table that
gen-atomic-fallbacks.h uses to fill in gaps in the atomic_* and
atomic64_* APIs.
These are checked in rather than generated with Kbuild, since:
* This allows inspection of the atomics with git grep and ctags on a
pristine tree, which Linus strongly prefers being able to do.
* The fallbacks are not affected by machine details or configuration
options, so it is not necessary to regenerate them to take these into
account.
* These are included by files required *very* early in the build process
(e.g. for generating bounds.h), and we'd rather not complicate the
top-level Kbuild file with dependencies.
Other than *_INIT() and *_cond_read_acquire(), all API functions are
implemented as static inline C functions, ensuring consistent type
promotion and/or truncation without requiring explicit casts to be
applied to parameters or return values.
Since we typedef atomic_long_t to either atomic_t or atomic64_t, we know
these types are equivalent, and don't require explicit casts between
them. However, as the try_cmpxchg*() functions take a pointer for the
'old' parameter, which may be an int or s64, an explicit cast is
generated for this.
There should be no functional change as a result of this patch (i.e.
existing code should not be affected). However, this introduces a number
of functions into the atomic_long_* API, bringing it into line with the
atomic_* and atomic64_* APIs.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: catalin.marinas@arm.com
Cc: linuxdrivers@attotech.com
Cc: dvyukov@google.com
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: aryabinin@virtuozzo.com
Cc: glider@google.com
Link: http://lkml.kernel.org/r/20180904104830.2975-4-mark.rutland@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
ia64, mips, parisc, powerpc, sh, sparc, x86 architectures use the same
version of huge_ptep_get, so move this generic implementation into
asm-generic/hugetlb.h.
[arnd@arndb.de: fix ARM 3level page tables]
Link: http://lkml.kernel.org/r/20181005161722.904274-1-arnd@arndb.de
Link: http://lkml.kernel.org/r/20180920060358.16606-12-alex@ghiti.fr
Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Tested-by: Helge Deller <deller@gmx.de> [parisc]
Acked-by: Catalin Marinas <catalin.marinas@arm.com> [arm64]
Acked-by: Paul Burton <paul.burton@mips.com> [MIPS]
Acked-by: Ingo Molnar <mingo@kernel.org> [x86]
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James E.J. Bottomley <jejb@parisc-linux.org>
Cc: James Hogan <jhogan@kernel.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
arm, ia64, sh, x86 architectures use the same version
of huge_ptep_set_access_flags, so move this generic implementation
into asm-generic/hugetlb.h.
Link: http://lkml.kernel.org/r/20180920060358.16606-11-alex@ghiti.fr
Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Tested-by: Helge Deller <deller@gmx.de> [parisc]
Acked-by: Catalin Marinas <catalin.marinas@arm.com> [arm64]
Acked-by: Paul Burton <paul.burton@mips.com> [MIPS]
Acked-by: Ingo Molnar <mingo@kernel.org> [x86]
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James E.J. Bottomley <jejb@parisc-linux.org>
Cc: James Hogan <jhogan@kernel.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
arm, ia64, mips, powerpc, sh, x86 architectures use the same version of
huge_ptep_set_wrprotect, so move this generic implementation into
asm-generic/hugetlb.h.
Link: http://lkml.kernel.org/r/20180920060358.16606-10-alex@ghiti.fr
Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Tested-by: Helge Deller <deller@gmx.de> [parisc]
Acked-by: Catalin Marinas <catalin.marinas@arm.com> [arm64]
Acked-by: Paul Burton <paul.burton@mips.com> [MIPS]
Acked-by: Ingo Molnar <mingo@kernel.org> [x86]
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James E.J. Bottomley <jejb@parisc-linux.org>
Cc: James Hogan <jhogan@kernel.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
arm, arm64, powerpc, sparc, x86 architectures use the same version of
prepare_hugepage_range, so move this generic implementation into
asm-generic/hugetlb.h.
Link: http://lkml.kernel.org/r/20180920060358.16606-9-alex@ghiti.fr
Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Tested-by: Helge Deller <deller@gmx.de> [parisc]
Acked-by: Catalin Marinas <catalin.marinas@arm.com> [arm64]
Acked-by: Paul Burton <paul.burton@mips.com> [MIPS]
Acked-by: Ingo Molnar <mingo@kernel.org> [x86]
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James E.J. Bottomley <jejb@parisc-linux.org>
Cc: James Hogan <jhogan@kernel.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
arm, arm64, ia64, mips, parisc, powerpc, sh, sparc, x86 architectures use
the same version of huge_pte_wrprotect, so move this generic
implementation into asm-generic/hugetlb.h.
Link: http://lkml.kernel.org/r/20180920060358.16606-8-alex@ghiti.fr
Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Tested-by: Helge Deller <deller@gmx.de> [parisc]
Acked-by: Catalin Marinas <catalin.marinas@arm.com> [arm64]
Acked-by: Paul Burton <paul.burton@mips.com> [MIPS]
Acked-by: Ingo Molnar <mingo@kernel.org> [x86]
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James E.J. Bottomley <jejb@parisc-linux.org>
Cc: James Hogan <jhogan@kernel.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
arm, arm64, ia64, mips, parisc, powerpc, sh, sparc, x86 architectures use
the same version of huge_pte_none, so move this generic implementation
into asm-generic/hugetlb.h.
Link: http://lkml.kernel.org/r/20180920060358.16606-7-alex@ghiti.fr
Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Tested-by: Helge Deller <deller@gmx.de> [parisc]
Acked-by: Catalin Marinas <catalin.marinas@arm.com> [arm64]
Acked-by: Paul Burton <paul.burton@mips.com> [MIPS]
Acked-by: Ingo Molnar <mingo@kernel.org> [x86]
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James E.J. Bottomley <jejb@parisc-linux.org>
Cc: James Hogan <jhogan@kernel.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
arm, x86 architectures use the same version of huge_ptep_clear_flush, so
move this generic implementation into asm-generic/hugetlb.h.
Link: http://lkml.kernel.org/r/20180920060358.16606-6-alex@ghiti.fr
Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Tested-by: Helge Deller <deller@gmx.de> [parisc]
Acked-by: Catalin Marinas <catalin.marinas@arm.com> [arm64]
Acked-by: Paul Burton <paul.burton@mips.com> [MIPS]
Acked-by: Ingo Molnar <mingo@kernel.org> [x86]
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James E.J. Bottomley <jejb@parisc-linux.org>
Cc: James Hogan <jhogan@kernel.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
arm, ia64, sh, x86 architectures use the same version of
huge_ptep_get_and_clear, so move this generic implementation into
asm-generic/hugetlb.h.
Link: http://lkml.kernel.org/r/20180920060358.16606-5-alex@ghiti.fr
Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Tested-by: Helge Deller <deller@gmx.de> [parisc]
Acked-by: Catalin Marinas <catalin.marinas@arm.com> [arm64]
Acked-by: Paul Burton <paul.burton@mips.com> [MIPS]
Acked-by: Ingo Molnar <mingo@kernel.org> [x86]
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James E.J. Bottomley <jejb@parisc-linux.org>
Cc: James Hogan <jhogan@kernel.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
arm, ia64, mips, powerpc, sh, x86 architectures use the same version of
set_huge_pte_at, so move this generic implementation into
asm-generic/hugetlb.h.
Link: http://lkml.kernel.org/r/20180920060358.16606-4-alex@ghiti.fr
Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Tested-by: Helge Deller <deller@gmx.de> [parisc]
Acked-by: Catalin Marinas <catalin.marinas@arm.com> [arm64]
Acked-by: Paul Burton <paul.burton@mips.com> [MIPS]
Acked-by: Ingo Molnar <mingo@kernel.org> [x86]
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James E.J. Bottomley <jejb@parisc-linux.org>
Cc: James Hogan <jhogan@kernel.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
arm, arm64, mips, parisc, sh, x86 architectures use the same version of
hugetlb_free_pgd_range, so move this generic implementation into
asm-generic/hugetlb.h.
Link: http://lkml.kernel.org/r/20180920060358.16606-3-alex@ghiti.fr
Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Tested-by: Helge Deller <deller@gmx.de> [parisc]
Acked-by: Catalin Marinas <catalin.marinas@arm.com> [arm64]
Acked-by: Paul Burton <paul.burton@mips.com> [MIPS]
Acked-by: Ingo Molnar <mingo@kernel.org> [x86]
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James E.J. Bottomley <jejb@parisc-linux.org>
Cc: James Hogan <jhogan@kernel.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In order to reduce copy/paste of functions across architectures and then
make riscv hugetlb port (and future ports) simpler and smaller, this
patchset intends to factorize the numerous hugetlb primitives that are
defined across all the architectures.
Except for prepare_hugepage_range, this patchset moves the versions that
are just pass-through to standard pte primitives into
asm-generic/hugetlb.h by using the same #ifdef semantic that can be found
in asm-generic/pgtable.h, i.e. __HAVE_ARCH_***.
s390 architecture has not been tackled in this serie since it does not use
asm-generic/hugetlb.h at all.
This patchset has been compiled on all addressed architectures with
success (except for parisc, but the problem does not come from this
series).
This patch (of 11):
asm-generic/hugetlb.h proposes generic implementations of hugetlb related
functions: use __HAVE_ARCH_HUGE* defines in order to make arch specific
implementations of hugetlb functions consistent with pgtable.h scheme.
Link: http://lkml.kernel.org/r/20180920060358.16606-2-alex@ghiti.fr
Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com> [arm64]
Cc: Russell King <linux@armlinux.org.uk>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Burton <paul.burton@mips.com>
Cc: James Hogan <jhogan@kernel.org>
Cc: James E.J. Bottomley <jejb@parisc-linux.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org> [x86]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Driver core patches for 4.20-rc1
Here is a small number of driver core patches for 4.20-rc1.
Not much happened here this merge window, only a very tiny number of
patches that do:
- add BUS_ATTR_WO() for use by drivers
- component error path fixes
- kernfs range check fix
- other tiny error path fixes and const changes
All of these have been in linux-next with no reported issues for a
while.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCW9Lhtw8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ykHTgCguaJ3SgRefuC/WijjqboTC/SikCoAnRVTUxfU
v8BisSN22kR3jmxwsXud
=/IvY
-----END PGP SIGNATURE-----
Merge tag 'driver-core-4.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg KH:
"Here is a small number of driver core patches for 4.20-rc1.
Not much happened here this merge window, only a very tiny number of
patches that do:
- add BUS_ATTR_WO() for use by drivers
- component error path fixes
- kernfs range check fix
- other tiny error path fixes and const changes
All of these have been in linux-next with no reported issues for a
while"
* tag 'driver-core-4.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
devres: provide devm_kstrdup_const()
mm: move is_kernel_rodata() to asm-generic/sections.h
devres: constify p in devm_kfree()
driver core: add BUS_ATTR_WO() macro
kernfs: Fix range checks in kernfs_get_target_path
component: fix loop condition to call unbind() if bind() fails
drivers/base/devtmpfs.c: don't pretend path is const in delete_path
kernfs: update comment about kernfs_path() return value
Pull timekeeping updates from Thomas Gleixner:
"The timers and timekeeping departement provides:
- Another large y2038 update with further preparations for providing
the y2038 safe timespecs closer to the syscalls.
- An overhaul of the SHCMT clocksource driver
- SPDX license identifier updates
- Small cleanups and fixes all over the place"
* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (31 commits)
tick/sched : Remove redundant cpu_online() check
clocksource/drivers/dw_apb: Add reset control
clocksource: Remove obsolete CLOCKSOURCE_OF_DECLARE
clocksource/drivers: Unify the names to timer-* format
clocksource/drivers/sh_cmt: Add R-Car gen3 support
dt-bindings: timer: renesas: cmt: document R-Car gen3 support
clocksource/drivers/sh_cmt: Properly line-wrap sh_cmt_of_table[] initializer
clocksource/drivers/sh_cmt: Fix clocksource width for 32-bit machines
clocksource/drivers/sh_cmt: Fixup for 64-bit machines
clocksource/drivers/sh_tmu: Convert to SPDX identifiers
clocksource/drivers/sh_mtu2: Convert to SPDX identifiers
clocksource/drivers/sh_cmt: Convert to SPDX identifiers
clocksource/drivers/renesas-ostm: Convert to SPDX identifiers
clocksource: Convert to using %pOFn instead of device_node.name
tick/broadcast: Remove redundant check
RISC-V: Request newstat syscalls
y2038: signal: Change rt_sigtimedwait to use __kernel_timespec
y2038: socket: Change recvmmsg to use __kernel_timespec
y2038: sched: Change sched_rr_get_interval to use __kernel_timespec
y2038: utimes: Rework #ifdef guards for compat syscalls
...
Pull security subsystem updates from James Morris:
"In this patchset, there are a couple of minor updates, as well as some
reworking of the LSM initialization code from Kees Cook (these prepare
the way for ordered stackable LSMs, but are a valuable cleanup on
their own)"
* 'next-general' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
LSM: Don't ignore initialization failures
LSM: Provide init debugging infrastructure
LSM: Record LSM name in struct lsm_info
LSM: Convert security_initcall() into DEFINE_LSM()
vmlinux.lds.h: Move LSM_TABLE into INIT_DATA
LSM: Convert from initcall to struct lsm_info
LSM: Remove initcall tracing
LSM: Rename .security_initcall section to .lsm_info
vmlinux.lds.h: Avoid copy/paste of security_init section
LSM: Correctly announce start of LSM initialization
security: fix LSM description location
keys: Fix the use of the C++ keyword "private" in uapi/linux/keyctl.h
seccomp: remove unnecessary unlikely()
security: tomoyo: Fix obsolete function
security/capabilities: remove check for -EINVAL
Pull locking and misc x86 updates from Ingo Molnar:
"Lots of changes in this cycle - in part because locking/core attracted
a number of related x86 low level work which was easier to handle in a
single tree:
- Linux Kernel Memory Consistency Model updates (Alan Stern, Paul E.
McKenney, Andrea Parri)
- lockdep scalability improvements and micro-optimizations (Waiman
Long)
- rwsem improvements (Waiman Long)
- spinlock micro-optimization (Matthew Wilcox)
- qspinlocks: Provide a liveness guarantee (more fairness) on x86.
(Peter Zijlstra)
- Add support for relative references in jump tables on arm64, x86
and s390 to optimize jump labels (Ard Biesheuvel, Heiko Carstens)
- Be a lot less permissive on weird (kernel address) uaccess faults
on x86: BUG() when uaccess helpers fault on kernel addresses (Jann
Horn)
- macrofy x86 asm statements to un-confuse the GCC inliner. (Nadav
Amit)
- ... and a handful of other smaller changes as well"
* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (57 commits)
locking/lockdep: Make global debug_locks* variables read-mostly
locking/lockdep: Fix debug_locks off performance problem
locking/pvqspinlock: Extend node size when pvqspinlock is configured
locking/qspinlock_stat: Count instances of nested lock slowpaths
locking/qspinlock, x86: Provide liveness guarantee
x86/asm: 'Simplify' GEN_*_RMWcc() macros
locking/qspinlock: Rework some comments
locking/qspinlock: Re-order code
locking/lockdep: Remove duplicated 'lock_class_ops' percpu array
x86/defconfig: Enable CONFIG_USB_XHCI_HCD=y
futex: Replace spin_is_locked() with lockdep
locking/lockdep: Make class->ops a percpu counter and move it under CONFIG_DEBUG_LOCKDEP=y
x86/jump-labels: Macrofy inline assembly code to work around GCC inlining bugs
x86/cpufeature: Macrofy inline assembly code to work around GCC inlining bugs
x86/extable: Macrofy inline assembly code to work around GCC inlining bugs
x86/paravirt: Work around GCC inlining bugs when compiling paravirt ops
x86/bug: Macrofy the BUG table section handling, to work around GCC inlining bugs
x86/alternatives: Macrofy lock prefixes to work around GCC inlining bugs
x86/refcount: Work around GCC inlining bug
x86/objtool: Use asm macros to work around GCC inlining bugs
...
- mostly more consolidation of the direct mapping code, including
converting over hexagon, and merging the coherent and non-coherent
code into a single dma_map_ops instance (me)
- cleanups for the dma_configure/dma_unconfigure callchains (me)
- better handling of dma_masks in odd setups (me, Alexander Duyck)
- better debugging of passing vmalloc address to the DMA API
(Stephen Boyd)
- CMA command line parsing fix (He Zhe)
-----BEGIN PGP SIGNATURE-----
iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAlvNg6YLHGhjaEBsc3Qu
ZGUACgkQD55TZVIEUYMm/Q/9FFVOH73Nc3rT40N2HdaPbzV2hXmI1//hEJcImDP5
mLGq8XqieGuo8Pmu9+xp1tC2UnfUkhK4FjhQbWM+qKER/RNYES2BD50xVFmt6ICS
9d8IaRcs+ceggljfdwszkkucJspBsYNxpiKjjao0OsHn6UDatu6elZs/yvb2nXci
HCJUvs9vYm9MkAtVXEtOQtij3YRaJ/9xYY4h5Dy5vBtHPp+kjUMF0mWAwA2+Ec1V
8iqKjUY3c8nr8Kf6WE9tzJ0wrMFijc4HJlE3W1ud8YsKdfCkCf8XiIuS6PgTzOeK
0cn9h8dVrV1ZXJ/D/9JZDivmYvIsoKWAYVQHNzAiq7PI3uOJY1ggCxyZpWtTHZhM
ATHF0sJGpIenkSWybYpKee8e8RsS7L9dUgu6bYpK5pVkirNYnR9IOGVJNmS63L7Q
B0uUtqjBKDG2yNGZGY9zqBQFgxiPO0wxFLeKyHbIsC0b7FBti3rXGAimch5WiBuL
zlDV0zEfMH0BW6gNPrjfFur84duKtGZ/0DBSxQ0E1Mvk8B1LBr78MgZt8OfJEuoe
dx1FYU70u8PYi+hjmn386YnNNMTjd1GT5XW7AWedM2wCjRYmNy0yMGmm9cACMneN
5eBv/SYr7X1zKNL7w7H6KQVZilTJcBoj3f/lmjL7i22m9FXYQpcUP61L8wHNM8H2
iJo=
=AVSD
-----END PGP SIGNATURE-----
Merge tag 'dma-mapping-4.20' of git://git.infradead.org/users/hch/dma-mapping
Pull dma mapping updates from Christoph Hellwig:
"First batch of dma-mapping changes for 4.20.
There will be a second PR as some big changes were only applied just
before the end of the merge window, and I want to give them a few more
days in linux-next.
Summary:
- mostly more consolidation of the direct mapping code, including
converting over hexagon, and merging the coherent and non-coherent
code into a single dma_map_ops instance (me)
- cleanups for the dma_configure/dma_unconfigure callchains (me)
- better handling of dma_masks in odd setups (me, Alexander Duyck)
- better debugging of passing vmalloc address to the DMA API (Stephen
Boyd)
- CMA command line parsing fix (He Zhe)"
* tag 'dma-mapping-4.20' of git://git.infradead.org/users/hch/dma-mapping: (27 commits)
dma-direct: respect DMA_ATTR_NO_WARN
dma-mapping: translate __GFP_NOFAIL to DMA_ATTR_NO_WARN
dma-direct: document the zone selection logic
dma-debug: Check for drivers mapping invalid addresses in dma_map_single()
dma-direct: fix return value of dma_direct_supported
dma-mapping: move dma_default_get_required_mask under ifdef
dma-direct: always allow dma mask <= physiscal memory size
dma-direct: implement complete bus_dma_mask handling
dma-direct: refine dma_direct_alloc zone selection
dma-direct: add an explicit dma_direct_get_required_mask
dma-mapping: make the get_required_mask method available unconditionally
unicore32: remove swiotlb support
Revert "dma-mapping: clear dev->dma_ops in arch_teardown_dma_ops"
dma-mapping: support non-coherent devices in dma_common_get_sgtable
dma-mapping: consolidate the dma mmap implementations
dma-mapping: merge direct and noncoherent ops
dma-mapping: move the dma_coherent flag to struct device
MIPS: don't select DMA_MAYBE_COHERENT from DMA_PERDEV_COHERENT
dma-mapping: add the missing ARCH_HAS_SYNC_DMA_FOR_CPU_ALL declaration
dma-mapping: fix panic caused by passing empty cma command line argument
...
- Core mmu_gather changes which allow tracking the levels of page-table
being cleared together with the arm64 low-level flushing routines
- Support for the new ARMv8.5 PSTATE.SSBS bit which can be used to
mitigate Spectre-v4 dynamically without trapping to EL3 firmware
- Introduce COMPAT_SIGMINSTKSZ for use in compat_sys_sigaltstack
- Optimise emulation of MRS instructions to ID_* registers on ARMv8.4
- Support for Common Not Private (CnP) translations allowing threads of
the same CPU to share the TLB entries
- Accelerated crc32 routines
- Move swapper_pg_dir to the rodata section
- Trap WFI instruction executed in user space
- ARM erratum 1188874 workaround (arch_timer)
- Miscellaneous fixes and clean-ups
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAlvKGdEACgkQa9axLQDI
XvGSQBAAiOH6aQABL4TB7c5KIc7C+Unjm6QCFCoaeGWoHuemnM6cFJ7RQsi0GqnP
dVEX5V/FKfmeTWO5g24Ah+MbTm3Bt6+81gywAmi1rrHhmCaCIPjT7xDqy/WsLlvt
7WtgegSGvQ7DIMj2dbfFav6+ra67qAiYZTc46jvuynVl6DrE3BCiyTDbXAWt2nzP
Xf3un4AHRbg3UEMUZTLqU5q4z0tbM6rEAZru8O0UOTnD2q7uttUqW3Ab7fpuEkkj
lEVrMWD3h8SJg+Df9CbXmCNOjh4VhwBwDb5LgO8vA/AcyV/YLEF5b2OUAk/28qwo
0GBwjqRyI4+YQ9LPg41MhGzrlnta0HCdYoeNLgLQZiDcUkuSfGhoA+MNZNOR8B08
sCWF7F6f8UIQm8KMMBiYYdlVyUYgHLsWE/1+CyeLV0oIoWT5k3c+Xe3pho9KpVb0
Co04TqMlqalry0sbevHz5c55H7iWIjB1Tpo3SxM105dVJVibXRPXkz+WZ5iPO+xa
ex2j1kjNdA/AUzrSCZ5lh22zhg0WsfwD++E5meAaJMxieim8FeZDRga43rowJ0BA
zMbSNB/+NDFZ9EhC40VaUfKk8Tkgiug9J5swv0+v7hy1QLDyydHhbOecTuIueauM
6taiT2Iuov5yFng1eonYj4htvouVF4WOhPGthFPJMOcrB9mLMhs=
=3Mc8
-----END PGP SIGNATURE-----
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Catalin Marinas:
"Apart from some new arm64 features and clean-ups, this also contains
the core mmu_gather changes for tracking the levels of the page table
being cleared and a minor update to the generic
compat_sys_sigaltstack() introducing COMPAT_SIGMINSKSZ.
Summary:
- Core mmu_gather changes which allow tracking the levels of
page-table being cleared together with the arm64 low-level flushing
routines
- Support for the new ARMv8.5 PSTATE.SSBS bit which can be used to
mitigate Spectre-v4 dynamically without trapping to EL3 firmware
- Introduce COMPAT_SIGMINSTKSZ for use in compat_sys_sigaltstack
- Optimise emulation of MRS instructions to ID_* registers on ARMv8.4
- Support for Common Not Private (CnP) translations allowing threads
of the same CPU to share the TLB entries
- Accelerated crc32 routines
- Move swapper_pg_dir to the rodata section
- Trap WFI instruction executed in user space
- ARM erratum 1188874 workaround (arch_timer)
- Miscellaneous fixes and clean-ups"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (78 commits)
arm64: KVM: Guests can skip __install_bp_hardening_cb()s HYP work
arm64: cpufeature: Trap CTR_EL0 access only where it is necessary
arm64: cpufeature: Fix handling of CTR_EL0.IDC field
arm64: cpufeature: ctr: Fix cpu capability check for late CPUs
Documentation/arm64: HugeTLB page implementation
arm64: mm: Use __pa_symbol() for set_swapper_pgd()
arm64: Add silicon-errata.txt entry for ARM erratum 1188873
Revert "arm64: uaccess: implement unsafe accessors"
arm64: mm: Drop the unused cpu parameter
MAINTAINERS: fix bad sdei paths
arm64: mm: Use #ifdef for the __PAGETABLE_P?D_FOLDED defines
arm64: Fix typo in a comment in arch/arm64/mm/kasan_init.c
arm64: xen: Use existing helper to check interrupt status
arm64: Use daifflag_restore after bp_hardening
arm64: daifflags: Use irqflags functions for daifflags
arm64: arch_timer: avoid unused function warning
arm64: Trap WFI executed in userspace
arm64: docs: Document SSBS HWCAP
arm64: docs: Fix typos in ELF hwcaps
arm64/kprobes: remove an extra semicolon in arch_prepare_kprobe
...
Export this routine so that we can use it later in devm_kstrdup_const()
and devm_kfree().
Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Acked-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Enabling both CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=y and
CONFIG_GCOV_PROFILE_ALL=y results in linker warnings:
warning: orphan section `.data..LPBX1' being placed in
section `.data..LPBX1'.
LD_DEAD_CODE_DATA_ELIMINATION adds compiler flag -fdata-sections. This
option causes GCC to create separate data sections for data objects,
including those generated by GCC internally for gcov profiling. The
names of these objects start with a dot (.LPBX0, .LPBX1), resulting in
section names starting with 'data..'.
As section names starting with 'data..' are used for specific purposes
in the Linux kernel, the linker script does not automatically include
them in the output data section, resulting in the "orphan section"
linker warnings.
Fix this by specifically including sections named "data..LPBX*" in the
data section.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Tested-by: Stephen Rothwell <sfr@canb.auug.org.au>
Tested-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Enabling CONFIG_GCOV_PROFILE_ALL=y causes linker errors on ARM:
`.text.exit' referenced in section `.ARM.exidx.text.exit':
defined in discarded section `.text.exit'
`.text.exit' referenced in section `.fini_array.00100':
defined in discarded section `.text.exit'
And related errors on NDS32:
`.text.exit' referenced in section `.dtors.65435':
defined in discarded section `.text.exit'
The gcov compiler flags cause certain compiler versions to generate
additional destructor-related sections that are not yet handled by the
linker script, resulting in references between discarded and
non-discarded sections.
Since destructors are not used in the Linux kernel, fix this by
discarding these additional sections.
Reported-by: Arnd Bergmann <arnd@arndb.de>
Tested-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Reported-by: Greentime Hu <green.hu@gmail.com>
Tested-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Since the struct lsm_info table is not an initcall, we can just move it
into INIT_DATA like all the other tables.
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Casey Schaufler <casey@schaufler-ca.com>
Reviewed-by: John Johansen <john.johansen@canonical.com>
Reviewed-by: James Morris <james.morris@microsoft.com>
Signed-off-by: James Morris <james.morris@microsoft.com>
In preparation for switching from initcall to just a regular set of
pointers in a section, rename the internal section name.
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Casey Schaufler <casey@schaufler-ca.com>
Reviewed-by: James Morris <james.morris@microsoft.com>
Reviewed-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: James Morris <james.morris@microsoft.com>
Avoid copy/paste by defining SECURITY_INIT in terms of SECURITY_INITCALL.
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Casey Schaufler <casey@schaufler-ca.com>
Reviewed-by: James Morris <james.morris@microsoft.com>
Reviewed-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: James Morris <james.morris@microsoft.com>
As described in:
77b0bf55bc: ("kbuild/Makefile: Prepare for using macros in inline assembly code to work around asm() related GCC inlining bugs")
GCC's inlining heuristics are broken with common asm() patterns used in
kernel code, resulting in the effective disabling of inlining.
The workaround is to set an assembly macro and call it from the inline
assembly block. As a result GCC considers the inline assembly block as
a single instruction. (Which it isn't, but that's the best we can get.)
This patch increases the kernel size:
text data bss dec hex filename
18146889 10225380 2957312 31329581 1de0d2d ./vmlinux before
18147336 10226688 2957312 31331336 1de1408 ./vmlinux after (+1755)
But enables more aggressive inlining (and probably better branch decisions).
The number of static text symbols in vmlinux is much lower:
Before: 40218
After: 40053 (-165)
The assembly code gets harder to read due to the extra macro layer.
[ mingo: Rewrote the changelog. ]
Tested-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Nadav Amit <namit@vmware.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20181003213100.189959-7-namit@vmware.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Both spin locks and write locks currently do:
f0 0f b1 17 lock cmpxchg %edx,(%rdi)
85 c0 test %eax,%eax
75 05 jne [slowpath]
This 'test' insn is superfluous; the cmpxchg insn sets the Z flag
appropriately. Peter pointed out that using atomic_try_cmpxchg_acquire()
will let the compiler know this is true. Comparing before/after
disassemblies show the only effect is to remove this insn.
Take this opportunity to make the spin & write lock code resemble each
other more closely and have similar likely() hints.
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Waiman Long <longman@redhat.com>
Link: http://lkml.kernel.org/r/20180820162639.GC25153@bombadil.infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The __jump_table sections emitted into the core kernel and into
each module consist of statically initialized references into
other parts of the code, and with the exception of entries that
point into init code, which are defused at post-init time, these
data structures are never modified.
So let's move them into the ro_after_init section, to prevent them
from being corrupted inadvertently by buggy code, or deliberately
by an attacker.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Jessica Yu <jeyu@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-s390@vger.kernel.org
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Link: https://lkml.kernel.org/r/20180919065144.25010-9-ard.biesheuvel@linaro.org
All the cache maintainance is already stubbed out when not enabled,
but merging the two allows us to nicely handle the case where
cache maintainance is required for some devices, but not others.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Paul Burton <paul.burton@mips.com> # MIPS parts
The !CONFIG_GENERIC_IOMAP version of ioport_map uses MMIO_UPPER_LIMIT to
prevent users from making I/O accesses outside the expected I/O range -
however it erroneously treats MMIO_UPPER_LIMIT as a mask which is
contradictory to its other users.
The introduction of CONFIG_INDIRECT_PIO, which subtracts an arbitrary
amount from IO_SPACE_LIMIT to form MMIO_UPPER_LIMIT, results in ioport_map
mangling the given port rather than capping it.
We address this by aligning more closely with the CONFIG_GENERIC_IOMAP
implementation of ioport_map by using the comparison operator and
returning NULL where the port exceeds MMIO_UPPER_LIMIT. Though note that
we preserve the existing behavior of masking with IO_SPACE_LIMIT such that
we don't break existing buggy drivers that somehow rely on this masking.
Fixes: 5745392e0c ("PCI: Apply the new generic I/O management on PCI IO hosts")
Reported-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Murray <andrew.murray@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
In preparation for maintaining the mmu_gather code as its own entity,
move the implementation out of memory.c and into its own file.
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
It is common for architectures with hugepage support to require only a
single TLB invalidation operation per hugepage during unmap(), rather than
iterating through the mapping at a PAGE_SIZE increment. Currently,
however, the level in the page table where the unmap() operation occurs
is not stored in the mmu_gather structure, therefore forcing
architectures to issue additional TLB invalidation operations or to give
up and over-invalidate by e.g. invalidating the entire TLB.
Ideally, we could add an interval rbtree to the mmu_gather structure,
which would allow us to associate the correct mapping granule with the
various sub-mappings within the range being invalidated. However, this
is costly in terms of book-keeping and memory management, so instead we
approximate by keeping track of the page table levels that are cleared
and provide a means to query the smallest granule required for invalidation.
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Some architectures require different TLB invalidation instructions
depending on whether it is only the last-level of page table being
changed, or whether there are also changes to the intermediate
(directory) entries higher up the tree.
Add a new bit to the flags bitfield in struct mmu_gather so that the
architecture code can operate accordingly if it's the intermediate
levels being invalidated.
Acked-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The inner workings of the mmu_gather-based TLB invalidation mechanism
are not relevant to nommu configurations, so guard them with an #ifdef.
This allows us to implement future functions using static inlines
without breaking the build.
Acked-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Nothing is left in asm/unistd.h except for the redirect to
uapi/asm/unistd.h, so removing the file simply leads to that one being
used directly. The linux/export.h inclusion is a leftover from commit
e1b5bb6d12 ("consolidate cond_syscall and SYSCALL_ALIAS declarations")
and should not be used anyway.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
The sys_llseek sytem call is needed on all 32-bit architectures and
none of the 64-bit ones, so we can remove the __ARCH_WANT_SYS_LLSEEK guard
and simplify the include/asm-generic/unistd.h header further.
Since 32-bit tasks can run either natively or in compat mode on 64-bit
architectures, we have to check for both !CONFIG_64BIT and CONFIG_COMPAT.
There are a few 64-bit architectures that also reference sys_llseek
in their 64-bit ABI (e.g. sparc), but I verified that those all
select CONFIG_COMPAT, so the #if check is still correct here. It's
a bit odd to include it in the syscall table though, as it's the
same as sys_lseek() on 64-bit, but with strange calling conventions.
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
While converting compat system call handlers to work on 32-bit
architectures, I found a number of types used in those handlers
that are identical between all architectures.
Let's move all the identical ones into asm-generic/compat.h to avoid
having to add even more identical definitions of those types.
For unknown reasons, mips defines __compat_gid32_t, __compat_uid32_t
and compat_caddr_t as signed, while all others have them unsigned.
This seems to be a mistake, but I'm leaving it alone here. The other
types all differ by size or alignment on at least on architecture.
compat_aio_context_t is currently defined in linux/compat.h but
also needed for compat_sys_io_getevents(), so let's move it into
the same place.
While we still have not decided whether the 32-bit time handling
will always use the compat syscalls, or in which form, I think this
is a useful cleanup that we can merge regardless.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
New architectures should no longer need stat64, which is not y2038
safe and has been replaced by statx(). This removes the 'select
__ARCH_WANT_STAT64' statement from asm-generic/unistd.h and instead
moves it into the respective asm/unistd.h UAPI header files for each
architecture that uses it today.
In the generic file, the system call number and entry points are now
made conditional, so newly added architectures (e.g. riscv32 or csky)
will never need to carry backwards compatiblity for it.
arm64 is the only 64-bit architecture using the asm-generic/unistd.h
file, and it already sets __ARCH_WANT_NEW_STAT in its headers, and I
use the same #ifdef here: future 64-bit architectures therefore won't
see newstat or stat64 any more. They don't suffer from the y2038 time_t
overflow, but for consistency it seems best to also let them use statx().
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
- add build_{menu,n,g,x}config targets for compile-testing Kconfig
- fix and improve recursive dependency detection in Kconfig
- fix parallel building of menuconfig/nconfig
- fix syntax error in clang-version.sh
- suppress distracting log from syncconfig
- remove obsolete "rpm" target
- remove VMLINUX_SYMBOL(_STR) macro entirely
- fix microblaze build with CONFIG_DYNAMIC_FTRACE
- move compiler test for dead code/data elimination to Kconfig
- rename well-known LDFLAGS variable to KBUILD_LDFLAGS
- misc fixes and cleanups
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJbgYhCAAoJED2LAQed4NsGErAP/jt7gt76+N0PZmADBZqyVR/H
4k286g3OiT7DIcdvwqE5BRvu+zNOamDujnnXw63/jwu2RjrkLX/JnhzTbC0IZleZ
KeO4bU4ZH0WFa0Ny9pp0LAnzbXGMnQjDXygcUd5BFoEd5JSLKW2PISEEjRh6b5B7
swJRdgySFaMrUBRNf13FwH5EvX/D0xZQe/wFhFCOv6L4gJZFMmpGUIepgTjTUmxZ
wcNN6xxXg+ulLHVcPdPQ9EYssNHN5xNys02+IdIrhhXuNHji/TFm4dGYuU+dDGeE
Eu4O6Qs7pg0PFGrZ5gLxXDJEp75W+uaTNOqV+jcjq8MRxJuWxyy2biUeelKRT/KH
0iv4ZQJVOMOhl8fZgLtQaXHyQ++5uwd6kvPPf+XFdkogGAIXK0wKWLoALFEOXwb6
z1BBnFx09LrKPGt0ZlKX624OEczedv/UAFiSh3Ic2S3PFEpq4oHrEGhTnyKRobPv
OEcF3RqKjmAdK7PLy4kVpTLhkutkWWhw6Giy9qXUkXYJWonJR7NTQ1mIan2LoGZC
sGi+qKae/8xgO2Nerx59tZpkiHYTMfYeAo8frzWurOxm3YzEfaxNNGPl+IMW7VKz
cNPzQZ5tMUy4i4PAhk/gIWibnUTPfjDbWsZSMtIbO0GFcao56EvllwD8/awuy7lO
QkaAeZHFcF+qgU3muaYK
=Vsb2
-----END PGP SIGNATURE-----
Merge tag 'kbuild-v4.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull more Kbuild updates from Masahiro Yamada:
- add build_{menu,n,g,x}config targets for compile-testing Kconfig
- fix and improve recursive dependency detection in Kconfig
- fix parallel building of menuconfig/nconfig
- fix syntax error in clang-version.sh
- suppress distracting log from syncconfig
- remove obsolete "rpm" target
- remove VMLINUX_SYMBOL(_STR) macro entirely
- fix microblaze build with CONFIG_DYNAMIC_FTRACE
- move compiler test for dead code/data elimination to Kconfig
- rename well-known LDFLAGS variable to KBUILD_LDFLAGS
- misc fixes and cleanups
* tag 'kbuild-v4.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kbuild: rename LDFLAGS to KBUILD_LDFLAGS
kbuild: pass LDFLAGS to recordmcount.pl
kbuild: test dead code/data elimination support in Kconfig
initramfs: move gen_initramfs_list.sh from scripts/ to usr/
vmlinux.lds.h: remove stale <linux/export.h> include
export.h: remove VMLINUX_SYMBOL() and VMLINUX_SYMBOL_STR()
Coccinelle: remove pci_alloc_consistent semantic to detect in zalloc-simple.cocci
kbuild: make sorting initramfs contents independent of locale
kbuild: remove "rpm" target, which is alias of "rpm-pkg"
kbuild: Fix LOADLIBES rename in Documentation/kbuild/makefiles.txt
kconfig: suppress "configuration written to .config" for syncconfig
kconfig: fix "Can't open ..." in parallel build
kbuild: Add a space after `!` to prevent parsing as file pattern
scripts: modpost: check memory allocation results
kconfig: improve the recursive dependency report
kconfig: report recursive dependency involving 'imply'
kconfig: error out when seeing recursive dependency
kconfig: add build-only configurator targets
scripts/dtc: consolidate include path options in Makefile
The generic tlb_end_vma does not call invalidate_range mmu notifier, and
it resets resets the mmu_gather range, which means the notifier won't be
called on part of the range in case of an unmap that spans multiple
vmas.
ARM64 seems to be the only arch I could see that has notifiers and uses
the generic tlb_end_vma. I have not actually tested it.
[ Catalin and Will point out that ARM64 currently only uses the
notifiers for KVM, which doesn't use the ->invalidate_range()
callback right now, so it's a bug, but one that happens to
not affect them. So not necessary for stable. - Linus ]
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Revert commits:
95b0e6357d x86/mm/tlb: Always use lazy TLB mode
64482aafe5 x86/mm/tlb: Only send page table free TLB flush to lazy TLB CPUs
ac03158969 x86/mm/tlb: Make lazy TLB mode lazier
61d0beb579 x86/mm/tlb: Restructure switch_mm_irqs_off()
2ff6ddf19c x86/mm/tlb: Leave lazy TLB mode at page table free time
In order to simplify the TLB invalidate fixes for x86 and unify the
parts that need backporting. We'll try again later.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Rik van Riel <riel@surriel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
An ordinary arm64 defconfig build has ~64 KB worth of __ksymtab entries,
each consisting of two 64-bit fields containing absolute references, to
the symbol itself and to a char array containing its name, respectively.
When we build the same configuration with KASLR enabled, we end up with an
additional ~192 KB of relocations in the .init section, i.e., one 24 byte
entry for each absolute reference, which all need to be processed at boot
time.
Given how the struct kernel_symbol that describes each entry is completely
local to module.c (except for the references emitted by EXPORT_SYMBOL()
itself), we can easily modify it to contain two 32-bit relative references
instead. This reduces the size of the __ksymtab section by 50% for all
64-bit architectures, and gets rid of the runtime relocations entirely for
architectures implementing KASLR, either via standard PIE linking (arm64)
or using custom host tools (x86).
Note that the binary search involving __ksymtab contents relies on each
section being sorted by symbol name. This is implemented based on the
input section names, not the names in the ksymtab entries, so this patch
does not interfere with that.
Given that the use of place-relative relocations requires support both in
the toolchain and in the module loader, we cannot enable this feature for
all architectures. So make it dependent on whether
CONFIG_HAVE_ARCH_PREL32_RELOCATIONS is defined.
Link: http://lkml.kernel.org/r/20180704083651.24360-4-ard.biesheuvel@linaro.org
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Jessica Yu <jeyu@kernel.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morris <james.morris@microsoft.com>
Cc: James Morris <jmorris@namei.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Nicolas Pitre <nico@linaro.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Explicitly state that WARN*() should be used only for recoverable kernel
issues/bugs and that it should not be used for any kind of invalid
external inputs or transient conditions.
Motivation: it's a very useful capability to be able to understand if a
particular kernel splat means a kernel bug or simply an invalid user-space
program. For the former one wants to notify kernel developers, while
notifying kernel developers for the latter is annoying. Even a kernel
developer may not know what to do with a WARNING in an unfamiliar
subsystem. This is especially critical for any automated testing systems
that may use panic_on_warn and mail kernel developers.
The clear separation also serves as an additional documentation: is it a
condition that must never occur because of additional checks/logic
elsewhere? or is it simply a check for invalid inputs or unfortunate
conditions?
Use of pr_err() for user messages also leads to better error messages.
"Something is wrong in file foo on line X" is not particularly useful
message for end user. pr_err() forces developers to write more meaningful
error messages for user.
As of now we are almost there. We are doing systematic kernel testing
with panic_on_warn and are not seeing massive amounts of false positives.
But every now and then another WARN on ENOMEM or invalid inputs pops up
and leads to a lengthy argument each time. The goal of this change is to
officially document the rules.
Link: http://lkml.kernel.org/r/20180620103716.61636-1-dvyukov@gmail.com
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is unneeded since commit a621438500 ("vmlinux.lds.h: remove
no-op macro VMLINUX_SYMBOL()").
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Here are all of the driver core and related patches for 4.19-rc1.
Nothing huge here, just a number of small cleanups and the ability to
now stop the deferred probing after init happens.
All of these have been in linux-next for a while with only a merge issue
reported. That merge issue is in fs/sysfs/group.c and Stephen has
posted the diff of what it should be to resolve this. I'll follow up
with that diff to this pull request.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCW3g86Q8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ynyXQCePaZSW8wft4b7nLN8RdZ98ATBru0Ani10lrJa
HQeQJRNbWU1AZ0ym7695
=tOaH
-----END PGP SIGNATURE-----
Merge tag 'driver-core-4.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg KH:
"Here are all of the driver core and related patches for 4.19-rc1.
Nothing huge here, just a number of small cleanups and the ability to
now stop the deferred probing after init happens.
All of these have been in linux-next for a while with only a merge
issue reported"
* tag 'driver-core-4.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (21 commits)
base: core: Remove WARN_ON from link dependencies check
drivers/base: stop new probing during shutdown
drivers: core: Remove glue dirs from sysfs earlier
driver core: remove unnecessary function extern declare
sysfs.h: fix non-kernel-doc comment
PM / Domains: Stop deferring probe at the end of initcall
iommu: Remove IOMMU_OF_DECLARE
iommu: Stop deferring probe at end of initcalls
pinctrl: Support stopping deferred probe after initcalls
dt-bindings: pinctrl: add a 'pinctrl-use-default' property
driver core: allow stopping deferred probe after init
driver core: add a debugfs entry to show deferred devices
sysfs: Fix internal_create_group() for named group updates
base: fix order of OF initialization
linux/device.h: fix kernel-doc notation warning
Documentation: update firmware loader fallback reference
kobject: Replace strncpy with memcpy
drivers: base: cacheinfo: use OF property_read_u32 instead of get_property,read_number
kernfs: Replace strncpy with memcpy
device: Add #define dev_fmt similar to #define pr_fmt
...
Some architectures just don't have PAGE_KERNEL_EXEC. The mm/nommu.c and
mm/vmalloc.c code have been using PAGE_KERNEL as a fallback for years.
Move this fallback to asm-generic.
Link: http://lkml.kernel.org/r/20180510185507.2439-3-mcgrof@kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Some architectures do not define certain PAGE_KERNEL_* flags, this is
either because:
a) The way to implement some of these flags is *not yet ported*, or
b) The architecture *has no way* to describe them
Over time we have accumulated a few PAGE_KERNEL_* fallback workarounds
for architectures in the kernel which do not define them using
*relatively safe* equivalents. Move these scattered fallback hacks into
asm-generic.
We start off with PAGE_KERNEL_RO using PAGE_KERNEL as a fallback. This
has been in place on the firmware loader for years. Move the fallback
into the respective asm-generic header.
Link: http://lkml.kernel.org/r/20180510185507.2439-2-mcgrof@kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge L1 Terminal Fault fixes from Thomas Gleixner:
"L1TF, aka L1 Terminal Fault, is yet another speculative hardware
engineering trainwreck. It's a hardware vulnerability which allows
unprivileged speculative access to data which is available in the
Level 1 Data Cache when the page table entry controlling the virtual
address, which is used for the access, has the Present bit cleared or
other reserved bits set.
If an instruction accesses a virtual address for which the relevant
page table entry (PTE) has the Present bit cleared or other reserved
bits set, then speculative execution ignores the invalid PTE and loads
the referenced data if it is present in the Level 1 Data Cache, as if
the page referenced by the address bits in the PTE was still present
and accessible.
While this is a purely speculative mechanism and the instruction will
raise a page fault when it is retired eventually, the pure act of
loading the data and making it available to other speculative
instructions opens up the opportunity for side channel attacks to
unprivileged malicious code, similar to the Meltdown attack.
While Meltdown breaks the user space to kernel space protection, L1TF
allows to attack any physical memory address in the system and the
attack works across all protection domains. It allows an attack of SGX
and also works from inside virtual machines because the speculation
bypasses the extended page table (EPT) protection mechanism.
The assoicated CVEs are: CVE-2018-3615, CVE-2018-3620, CVE-2018-3646
The mitigations provided by this pull request include:
- Host side protection by inverting the upper address bits of a non
present page table entry so the entry points to uncacheable memory.
- Hypervisor protection by flushing L1 Data Cache on VMENTER.
- SMT (HyperThreading) control knobs, which allow to 'turn off' SMT
by offlining the sibling CPU threads. The knobs are available on
the kernel command line and at runtime via sysfs
- Control knobs for the hypervisor mitigation, related to L1D flush
and SMT control. The knobs are available on the kernel command line
and at runtime via sysfs
- Extensive documentation about L1TF including various degrees of
mitigations.
Thanks to all people who have contributed to this in various ways -
patches, review, testing, backporting - and the fruitful, sometimes
heated, but at the end constructive discussions.
There is work in progress to provide other forms of mitigations, which
might be less horrible performance wise for a particular kind of
workloads, but this is not yet ready for consumption due to their
complexity and limitations"
* 'l1tf-final' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (75 commits)
x86/microcode: Allow late microcode loading with SMT disabled
tools headers: Synchronise x86 cpufeatures.h for L1TF additions
x86/mm/kmmio: Make the tracer robust against L1TF
x86/mm/pat: Make set_memory_np() L1TF safe
x86/speculation/l1tf: Make pmd/pud_mknotpresent() invert
x86/speculation/l1tf: Invert all not present mappings
cpu/hotplug: Fix SMT supported evaluation
KVM: VMX: Tell the nested hypervisor to skip L1D flush on vmentry
x86/speculation: Use ARCH_CAPABILITIES to skip L1D flush on vmentry
x86/speculation: Simplify sysfs report of VMX L1TF vulnerability
Documentation/l1tf: Remove Yonah processors from not vulnerable list
x86/KVM/VMX: Don't set l1tf_flush_l1d from vmx_handle_external_intr()
x86/irq: Let interrupt handlers set kvm_cpu_l1tf_flush_l1d
x86: Don't include linux/irq.h from asm/hardirq.h
x86/KVM/VMX: Introduce per-host-cpu analogue of l1tf_flush_l1d
x86/irq: Demote irq_cpustat_t::__softirq_pending to u16
x86/KVM/VMX: Move the l1tf_flush_l1d test to vmx_l1d_flush()
x86/KVM/VMX: Replace 'vmx_l1d_flush_always' with 'vmx_l1d_flush_cond'
x86/KVM/VMX: Don't set l1tf_flush_l1d to true from vmx_l1d_flush()
cpu/hotplug: detect SMT disabled by BIOS
...
Pull x86 mm updates from Thomas Gleixner:
- Make lazy TLB mode even lazier to avoid pointless switch_mm()
operations, which reduces CPU load by 1-2% for memcache workloads
- Small cleanups and improvements all over the place
* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm: Remove redundant check for kmem_cache_create()
arm/asm/tlb.h: Fix build error implicit func declaration
x86/mm/tlb: Make clear_asid_other() static
x86/mm/tlb: Skip atomic operations for 'init_mm' in switch_mm_irqs_off()
x86/mm/tlb: Always use lazy TLB mode
x86/mm/tlb: Only send page table free TLB flush to lazy TLB CPUs
x86/mm/tlb: Make lazy TLB mode lazier
x86/mm/tlb: Restructure switch_mm_irqs_off()
x86/mm/tlb: Leave lazy TLB mode at page table free time
mm: Allocate the mm_cpumask (mm->cpu_bitmap[]) dynamically based on nr_cpu_ids
x86/mm: Add TLB purge to free pmd/pte page interfaces
ioremap: Update pgtable free interfaces with addr
x86/mm: Disable ioremap free page handling on x86-PAE
We currently don't instrument cmpxchg_double() and
cmpxchg_double_local() due to compilation issues reported in the past,
which are supposedly related to GCC bug 72873 [1], reported when GCC 7
was not yet released. This bug only applies to x86-64, and does not
apply to other architectures.
While the test case for GCC bug 72873 triggers issues with released
versions of GCC, the instrumented kernel code compiles fine for all
configurations I have tried, and it is unclear how the two cases
are/were related.
As we can't reproduce the kernel build failures, let's instrument
cmpxchg_double*() again. We can revisit the issue if build failures
reappear.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: andy.shevchenko@gmail.com
Cc: aryabinin@virtuozzo.com
Cc: catalin.marinas@arm.com
Cc: glider@google.com
Cc: linux-arm-kernel@lists.infradead.org
Cc: parri.andrea@gmail.com
Cc: peter@hurleysoftware.com
Link: http://lkml.kernel.org/r/20180716113017.3909-6-mark.rutland@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Currently we define some fairly verbose wrappers for the cmpxchg()
family so that we can pass a pointer and size into kasan_check_write().
The wrappers duplicate the size-switching logic necessary in arch code,
and only work for scalar types. On some architectures, (cmp)xchg are
used on non-scalar types, and thus the instrumented wrappers need to be
able to handle this.
We could take the type-punning logic from {READ,WRITE}_ONCE(), but this
makes the wrappers even more verbose, and requires several local
variables in the macros.
Instead, let's simplify the wrappers into simple macros which:
* snapshot the pointer into a single local variable, called __ai_ptr to
avoid conflicts with variables in the scope of the caller.
* call kasan_check_write() on __ai_ptr.
* invoke the relevant arch_*() function, passing the original arguments,
bar __ai_ptr being substituted for ptr.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: andy.shevchenko@gmail.com
Cc: arnd@arndb.de
Cc: aryabinin@virtuozzo.com
Cc: catalin.marinas@arm.com
Cc: glider@google.com
Cc: linux-arm-kernel@lists.infradead.org
Cc: parri.andrea@gmail.com
Cc: peter@hurleysoftware.com
Link: http://lkml.kernel.org/r/20180716113017.3909-4-mark.rutland@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Andy discovered that speculative memory accesses while in lazy
TLB mode can crash a system, when a CPU tries to dereference a
speculative access using memory contents that used to be valid
page table memory, but have since been reused for something else
and point into la-la land.
The latter problem can be prevented in two ways. The first is to
always send a TLB shootdown IPI to CPUs in lazy TLB mode, while
the second one is to only send the TLB shootdown at page table
freeing time.
The second should result in fewer IPIs, since operationgs like
mprotect and madvise are very common with some workloads, but
do not involve page table freeing. Also, on munmap, batching
of page table freeing covers much larger ranges of virtual
memory than the batching of unmapped user pages.
Tested-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Rik van Riel <riel@surriel.com>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: efault@gmx.de
Cc: kernel-team@fb.com
Cc: luto@kernel.org
Link: http://lkml.kernel.org/r/20180716190337.26133-3-riel@surriel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
pfn_modify_allowed() and arch_has_pfn_modify_check() are outside of the
!__ASSEMBLY__ section in include/asm-generic/pgtable.h, which confuses
assembler on archs that don't have __HAVE_ARCH_PFN_MODIFY_ALLOWED (e.g.
ia64) and breaks build:
include/asm-generic/pgtable.h: Assembler messages:
include/asm-generic/pgtable.h:538: Error: Unknown opcode `static inline bool pfn_modify_allowed(unsigned long pfn,pgprot_t prot)'
include/asm-generic/pgtable.h:540: Error: Unknown opcode `return true'
include/asm-generic/pgtable.h:543: Error: Unknown opcode `static inline bool arch_has_pfn_modify_check(void)'
include/asm-generic/pgtable.h:545: Error: Unknown opcode `return false'
arch/ia64/kernel/entry.S:69: Error: `mov' does not fit into bundle
Move those two static inlines into the !__ASSEMBLY__ section so that they
don't confuse the asm build pass.
Fixes: 42e4089c78 ("x86/speculation/l1tf: Disallow non privileged high MMIO PROT_NONE mappings")
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The mmu_gather APIs keep track of the invalidated address range
including the span covered by invalidated page table pages. Ranges
covered by page tables but not ptes (and therefore no TLBs) still need
to be invalidated because some architectures (x86) can cache
intermediate page table entries, and invalidate those with normal TLB
invalidation instructions to be almost-backward-compatible.
Architectures which don't cache intermediate page table entries, or
which invalidate these caches separately from TLB invalidation, do not
require TLB invalidation range expanded over page tables.
Allow architectures to supply their own p??_free_tlb functions, which
can avoid the __tlb_adjust_range.
Link: http://lkml.kernel.org/r/20180703013131.2807-1-npiggin@gmail.com
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: "Aneesh Kumar K. V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Nadav Amit <nadav.amit@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Now that we use the driver core to stop deferred probe for missing
drivers, IOMMU_OF_DECLARE can be removed.
This is slightly less optimal than having a list of built-in drivers in
that we'll now defer probe twice before giving up. This shouldn't have a
significant impact on boot times as past discussions about deferred
probe have given no evidence of deferred probe having a substantial
impact.
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Kukjin Kim <kgene@kernel.org>
Cc: Krzysztof Kozlowski <krzk@kernel.org>
Cc: Rob Clark <robdclark@gmail.com>
Cc: Heiko Stuebner <heiko@sntech.de>
Cc: Frank Rowand <frowand.list@gmail.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: iommu@lists.linux-foundation.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-arm-msm@vger.kernel.org
Cc: linux-rockchip@lists.infradead.org
Cc: devicetree@vger.kernel.org
Acked-by: Will Deacon <will.deacon@arm.com>
Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The following kernel panic was observed on ARM64 platform due to a stale
TLB entry.
1. ioremap with 4K size, a valid pte page table is set.
2. iounmap it, its pte entry is set to 0.
3. ioremap the same address with 2M size, update its pmd entry with
a new value.
4. CPU may hit an exception because the old pmd entry is still in TLB,
which leads to a kernel panic.
Commit b6bdb7517c ("mm/vmalloc: add interfaces to free unmapped page
table") has addressed this panic by falling to pte mappings in the above
case on ARM64.
To support pmd mappings in all cases, TLB purge needs to be performed
in this case on ARM64.
Add a new arg, 'addr', to pud_free_pmd_page() and pmd_free_pte_page()
so that TLB purge can be added later in seprate patches.
[toshi.kani@hpe.com: merge changes, rewrite patch description]
Fixes: 28ee90fe60 ("x86/mm: implement free pmd/pte page interfaces")
Signed-off-by: Chintan Pandya <cpandya@codeaurora.org>
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: mhocko@suse.com
Cc: akpm@linux-foundation.org
Cc: hpa@zytor.com
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: Will Deacon <will.deacon@arm.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: stable@vger.kernel.org
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20180627141348.21777-3-toshi.kani@hpe.com
One of my tests compiles the kernel with gcc 4.5.3, and I hit the
following build error:
include/linux/semaphore.h: In function 'sema_init':
include/linux/semaphore.h:35:17: error: unknown field 'val' specified in initializer
include/linux/semaphore.h:35:17: warning: missing braces around initializer
include/linux/semaphore.h:35:17: warning: (near initialization for '(anonymous).raw_lock.<anonymous>.val')
I bisected it down to:
625e88be1f ("locking/qspinlock: Merge 'struct __qspinlock' into 'struct qspinlock'")
... which makes qspinlock have an anonymous union, which makes initializing it special
for older compilers. By adding strategic brackets, it makes the build
happy again.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Acked-by: Waiman Long <longman@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Fixes: 625e88be1f ("locking/qspinlock: Merge 'struct __qspinlock' into 'struct qspinlock'")
Link: http://lkml.kernel.org/r/20180621203526.172ab5c4@vmware.local.home
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The conditional inc/dec ops differ for atomic_t and atomic64_t:
- atomic_inc_unless_positive() is optional for atomic_t, and doesn't exist for atomic64_t.
- atomic_dec_unless_negative() is optional for atomic_t, and doesn't exist for atomic64_t.
- atomic_dec_if_positive is optional for atomic_t, and is mandatory for atomic64_t.
Let's make these consistently optional for both. At the same time, let's
clean up the existing fallbacks to use atomic_try_cmpxchg().
The instrumented atomics are updated accordingly.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/lkml/20180621121321.4761-18-mark.rutland@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Many of the inc/dec ops are mandatory, but for most architectures inc/dec are
simply trivial wrappers around their corresponding add/sub ops.
Let's make all the inc/dec ops optional, so that we can get rid of these
boilerplate wrappers.
The instrumented atomics are updated accordingly.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Palmer Dabbelt <palmer@sifive.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/lkml/20180621121321.4761-17-mark.rutland@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Some of the atomics return the result of a test applied after the atomic
operation, and almost all architectures implement these as trivial
wrappers around the underlying atomic. Specifically:
* <atomic>_inc_and_test(v) is (<atomic>_inc_return(v) == 0)
* <atomic>_dec_and_test(v) is (<atomic>_dec_return(v) == 0)
* <atomic>_sub_and_test(i, v) is (<atomic>_sub_return(i, v) == 0)
* <atomic>_add_negative(i, v) is (<atomic>_add_return(i, v) < 0)
Rather than have these definitions duplicated in all architectures, with
minor inconsistencies in formatting and documentation, let's make these
operations optional, with default fallbacks as above. Implementations
must now provide a preprocessor symbol.
The instrumented atomics are updated accordingly.
Both x86 and m68k have custom implementations, which are left as-is,
given preprocessor symbols to avoid being overridden.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Palmer Dabbelt <palmer@sifive.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/lkml/20180621121321.4761-16-mark.rutland@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Architectures with atomic64_fetch_add_unless() provide a preprocessor
symbol if they do so, and all other architectures have trivial C
implementations of atomic64_add_unless() which are near-identical.
Let's unify the trivial definitions of atomic64_fetch_add_unless() in
<linux/atomic.h>, so that we always have both
atomic64_fetch_add_unless() and atomic64_add_unless() with less
boilerplate code.
This means that atomic64_add_unless() is always implemented in core
code, and the instrumented atomics are updated accordingly.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/lkml/20180621121321.4761-15-mark.rutland@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
As a step towards unifying the atomic/atomic64/atomic_long APIs, this
patch converts the generic implementation of atomic64_add_unless() into
a generic implementation of atomic64_fetch_add_unless().
A wrapper in <linux/atomic.h> will build atomic_add_unless() atop of
this, provided it is given a preprocessor definition.
No functional change is intended as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/lkml/20180621121321.4761-9-mark.rutland@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Currently all architectures must implement atomic_fetch_add_unless(),
with common code providing atomic_add_unless(). Architectures must also
implement atomic64_add_unless() directly, with no corresponding
atomic64_fetch_add_unless().
This divergence is unfortunate, and means that the APIs for atomic_t,
atomic64_t, and atomic_long_t differ.
In preparation for unifying things, with architectures providing
atomic64_fetch_add_unless, this patch adds a generic
atomic64_add_unless() which will use atomic64_fetch_add_unless(). The
instrumented atomics are updated to take this case into account.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Albert Ou <albert@sifive.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Palmer Dabbelt <palmer@sifive.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vineet Gupta <vgupta@synopsys.com>
Link: https://lore.kernel.org/lkml/20180621121321.4761-8-mark.rutland@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Several architectures these have a near-identical implementation based
on atomic_read() and atomic_cmpxchg() which we can instead define in
<linux/atomic.h>, so let's do so, using something close to the existing
x86 implementation with try_cmpxchg().
Where an architecture provides its own atomic_fetch_add_unless(), it
must define a preprocessor symbol for it. The instrumented atomics are
updated accordingly.
Note that arch/arc's existing atomic_fetch_add_unless() had redundant
barriers, as these are already present in its atomic_cmpxchg()
implementation.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Palmer Dabbelt <palmer@sifive.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vineet Gupta <vgupta@synopsys.com>
Link: https://lore.kernel.org/lkml/20180621121321.4761-7-mark.rutland@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We define a trivial fallback for atomic_inc_not_zero(), but don't do
the same for atomic64_inc_not_zero(), leading most architectures to
define the same boilerplate.
Let's add a fallback in <linux/atomic.h>, and remove the redundant
implementations. Note that atomic64_add_unless() is always defined in
<linux/atomic.h>, and promotes its arguments to the requisite types, so
we need not do this explicitly.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Palmer Dabbelt <palmer@sifive.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/lkml/20180621121321.4761-6-mark.rutland@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Some of the atomics return a status value, which is a boolean value
describing whether the operation was performed. To make it clear that
this is a boolean value, let's update the common fallbacks to return
bool, fixing up the return values and comments likewise.
At the same time, let's simplify the description of the operations in
their respective comments.
The instrumented atomics and generic atomic64 implementation are updated
accordingly.
Note that atomic64_dec_if_positive() doesn't follow the usual test op
pattern, and returns the would-be decremented value. This is not
changed.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/lkml/20180621121321.4761-5-mark.rutland@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
While __atomic_add_unless() was originally intended as a building-block
for atomic_add_unless(), it's now used in a number of places around the
kernel. It's the only common atomic operation named __atomic*(), rather
than atomic_*(), and for consistency it would be better named
atomic_fetch_add_unless().
This lack of consistency is slightly confusing, and gets in the way of
scripting atomics. Given that, let's clean things up and promote it to
an official part of the atomics API, in the form of
atomic_fetch_add_unless().
This patch converts definitions and invocations over to the new name,
including the instrumented version, using the following script:
----
git grep -w __atomic_add_unless | while read line; do
sed -i '{s/\<__atomic_add_unless\>/atomic_fetch_add_unless/}' "${line%%:*}";
done
git grep -w __arch_atomic_add_unless | while read line; do
sed -i '{s/\<__arch_atomic_add_unless\>/arch_atomic_fetch_add_unless/}' "${line%%:*}";
done
----
Note that we do not have atomic{64,_long}_fetch_add_unless(), which will
be introduced by later patches.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Palmer Dabbelt <palmer@sifive.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/lkml/20180621121321.4761-2-mark.rutland@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The lock bitops can be implemented more efficiently using the atomic_fetch_*()
ops, which provide finer-grained control over the memory ordering semantics
than the bitops.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arm-kernel@lists.infradead.org
Cc: yamada.masahiro@socionext.com
Link: https://lore.kernel.org/lkml/1529412794-17720-8-git-send-email-will.deacon@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
For L1TF PROT_NONE mappings are protected by inverting the PFN in the page
table entry. This sets the high bits in the CPU's address space, thus
making sure to point to not point an unmapped entry to valid cached memory.
Some server system BIOSes put the MMIO mappings high up in the physical
address space. If such an high mapping was mapped to unprivileged users
they could attack low memory by setting such a mapping to PROT_NONE. This
could happen through a special device driver which is not access
protected. Normal /dev/mem is of course access protected.
To avoid this forbid PROT_NONE mappings or mprotect for high MMIO mappings.
Valid page mappings are allowed because the system is then unsafe anyways.
It's not expected that users commonly use PROT_NONE on MMIO. But to
minimize any impact this is only enforced if the mapping actually refers to
a high MMIO address (defined as the MAX_PA-1 bit being set), and also skip
the check for root.
For mmaps this is straight forward and can be handled in vm_insert_pfn and
in remap_pfn_range().
For mprotect it's a bit trickier. At the point where the actual PTEs are
accessed a lot of state has been changed and it would be difficult to undo
on an error. Since this is a uncommon case use a separate early page talk
walk pass for MMIO PROT_NONE mappings that checks for this condition
early. For non MMIO and non PROT_NONE there are no changes.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Dave Hansen <dave.hansen@intel.com>
<uapi/asm-generic/int-ll64.h> has the same typedefs except that it
prefixes them with double-underscore for user space. Use them for
the kernel space typedefs.
Link: http://lkml.kernel.org/r/1526350925-14922-1-git-send-email-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Lihao Liang <lianglihao@huawei.com>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- improve fixdep to coalesce consecutive slashes in dep-files
- fix some issues of the maintainer string generation in deb-pkg script
- remove unused CONFIG_HAVE_UNDERSCORE_SYMBOL_PREFIX and clean-up
several tools and linker scripts
- clean-up modpost
- allow to enable the dead code/data elimination for PowerPC in EXPERT mode
- improve two coccinelle scripts for better performance
- pass endianness and machine size flags to sparse for all architecture
- misc fixes
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJbF/yvAAoJED2LAQed4NsGEPgP/2qBg7w4raGvQtblqGY1qo6j
3xGKYUKdg3GhIRf1zB9lPwkAmQcyLKzKlet/gYoTUTLKbfRUX8wDzJf/3TV0kpLW
QQ2HM1/jsqrD1HSO21OPJ1rzMSNn1NcOSLWSeOLWUBorHkkvAHlenJcJSOo6szJr
tTgEN78T/9id/artkFqdG+1Q3JhnI5FfH3u0lE20Eqxk5AAxrUKArHYsgRjgOg9o
8DlHDTRsnTiUd4TtmC+VYSZK1BHz1ORlANaRiL69T+BGFZGNCvRSV09QkaD+ObxT
dB4TTJne32Qg6g5qYX0bzLqfRdfJ8tpmJGQkycf3OT1rLgmDbWFaaOEDQTAe3mSw
nT6ZbpQB1OoTgMD2An9ApWfUQRfsMnujm/pRP+BkRdKKkMJvXJCH7PvFw8rjqTt3
PjK6DGbpG6H0G+DePtthMHrz/TU6wi5MFf7kQxl0AtFmpa3R0q67VhdM04BEYNCq
Dbs1YaXWKKi101k14oSQ0kmRasZ9Jz5tvyfZ7wvy1LpGONXxtEbc6JQyBJ6tmf4f
fCAxvHLSb/TQSmJhk9Rch7uPYT9B9hC16dseMrF9Pab8yR346fz70L1UdFE10j3q
iKFbYkueq8uJCJDxNktsgHzbOF6Le5vaWauOafRN26K7p7+CRpVOy0O2bknX3yDa
hKOGzCfQjT8sfdMmtyIH
=2LYT
-----END PGP SIGNATURE-----
Merge tag 'kbuild-v4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild updates from Masahiro Yamada:
- improve fixdep to coalesce consecutive slashes in dep-files
- fix some issues of the maintainer string generation in deb-pkg script
- remove unused CONFIG_HAVE_UNDERSCORE_SYMBOL_PREFIX and clean-up
several tools and linker scripts
- clean-up modpost
- allow to enable the dead code/data elimination for PowerPC in EXPERT
mode
- improve two coccinelle scripts for better performance
- pass endianness and machine size flags to sparse for all architecture
- misc fixes
* tag 'kbuild-v4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (25 commits)
kbuild: add machine size to CHECKFLAGS
kbuild: add endianness flag to CHEKCFLAGS
kbuild: $(CHECK) doesnt need NOSTDINC_FLAGS twice
scripts: Fixed printf format mismatch
scripts/tags.sh: use `find` for $ALLSOURCE_ARCHS generation
coccinelle: deref_null: improve performance
coccinelle: mini_lock: improve performance
powerpc: Allow LD_DEAD_CODE_DATA_ELIMINATION to be selected
kbuild: Allow LD_DEAD_CODE_DATA_ELIMINATION to be selectable if enabled
kbuild: LD_DEAD_CODE_DATA_ELIMINATION no -ffunction-sections/-fdata-sections for module build
kbuild: Fix asm-generic/vmlinux.lds.h for LD_DEAD_CODE_DATA_ELIMINATION
modpost: constify *modname function argument where possible
modpost: remove redundant is_vmlinux() test
modpost: use strstarts() helper more widely
modpost: pass struct elf_info pointer to get_modinfo()
checkpatch: remove VMLINUX_SYMBOL() check
vmlinux.lds.h: remove no-op macro VMLINUX_SYMBOL()
kbuild: remove CONFIG_HAVE_UNDERSCORE_SYMBOL_PREFIX
export.h: remove code for prefixing symbols with underscore
depmod.sh: remove symbol prefix support
...
Pull timers and timekeeping updates from Thomas Gleixner:
- Core infrastucture work for Y2038 to address the COMPAT interfaces:
+ Add a new Y2038 safe __kernel_timespec and use it in the core
code
+ Introduce config switches which allow to control the various
compat mechanisms
+ Use the new config switch in the posix timer code to control the
32bit compat syscall implementation.
- Prevent bogus selection of CPU local clocksources which causes an
endless reselection loop
- Remove the extra kthread in the clocksource code which has no value
and just adds another level of indirection
- The usual bunch of trivial updates, cleanups and fixlets all over the
place
- More SPDX conversions
* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
clocksource/drivers/mxs_timer: Switch to SPDX identifier
clocksource/drivers/timer-imx-tpm: Switch to SPDX identifier
clocksource/drivers/timer-imx-gpt: Switch to SPDX identifier
clocksource/drivers/timer-imx-gpt: Remove outdated file path
clocksource/drivers/arc_timer: Add comments about locking while read GFRC
clocksource/drivers/mips-gic-timer: Add pr_fmt and reword pr_* messages
clocksource/drivers/sprd: Fix Kconfig dependency
clocksource: Move inline keyword to the beginning of function declarations
timer_list: Remove unused function pointer typedef
timers: Adjust a kernel-doc comment
tick: Prefer a lower rating device only if it's CPU local device
clocksource: Remove kthread
time: Change nanosleep to safe __kernel_* types
time: Change types to new y2038 safe __kernel_* types
time: Fix get_timespec64() for y2038 safe compat interfaces
time: Add new y2038 safe __kernel_timespec
posix-timers: Make compat syscalls depend on CONFIG_COMPAT_32BIT_TIME
time: Introduce CONFIG_COMPAT_32BIT_TIME
time: Introduce CONFIG_64BIT_TIME in architectures
compat: Enable compat_get/put_timespec64 always
...
Pull locking updates from Ingo Molnar:
- Lots of tidying up changes all across the map for Linux's formal
memory/locking-model tooling, by Alan Stern, Akira Yokosawa, Andrea
Parri, Paul E. McKenney and SeongJae Park.
Notable changes beyond an overall update in the tooling itself is the
tidying up of spin_is_locked() semantics, which spills over into the
kernel proper as well.
- qspinlock improvements: the locking algorithm now guarantees forward
progress whereas the previous implementation in mainline could starve
threads indefinitely in cmpxchg() loops. Also other related cleanups
to the qspinlock code (Will Deacon)
- misc smaller improvements, cleanups and fixes all across the locking
subsystem
* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (51 commits)
locking/rwsem: Simplify the is-owner-spinnable checks
tools/memory-model: Add reference for 'Simplifying ARM concurrency'
tools/memory-model: Update ASPLOS information
MAINTAINERS, tools/memory-model: Update e-mail address for Andrea Parri
tools/memory-model: Fix coding style in 'lock.cat'
tools/memory-model: Remove out-of-date comments and code from lock.cat
tools/memory-model: Improve mixed-access checking in lock.cat
tools/memory-model: Improve comments in lock.cat
tools/memory-model: Remove duplicated code from lock.cat
tools/memory-model: Flag "cumulativity" and "propagation" tests
tools/memory-model: Add model support for spin_is_locked()
tools/memory-model: Add scripts to test memory model
tools/memory-model: Fix coding style in 'linux-kernel.def'
tools/memory-model: Model 'smp_store_mb()'
tools/memory-order: Update the cheat-sheet to show that smp_mb__after_atomic() orders later RMW operations
tools/memory-order: Improve key for SELF and SV
tools/memory-model: Fix cheat sheet typo
tools/memory-model: Update required version of herdtools7
tools/memory-model: Redefine rb in terms of rcu-fence
tools/memory-model: Rename link and rcu-path to rcu-link and rb
...
Add a new dma_map_ops implementation that uses dma-direct for the
address mapping of streaming mappings, and which requires arch-specific
implemenations of coherent allocate/free.
Architectures have to provide flushing helpers to ownership trasnfers
to the device and/or CPU, and can provide optional implementations of
the coherent mmap functionality, and the cache_flush routines for
non-coherent long term allocations.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Alexey Brodkin <abrodkin@synopsys.com>
Acked-by: Vineet Gupta <vgupta@synopsys.com>
KEEP more tables, and add the function/data section wildcard to more
section selections.
This is a little ad-hoc at the moment, but kernel code should be moved
to consistently use .text..x (note: double dots) for explicit sections
and all references to it in the linker script can be made with
TEXT_MAIN, and similarly for other sections.
For now, let's see if major architectures move to enabling this option
then we can do some refactoring passes. Otherwise if it remains unused
or superseded by LTO, this may not be required.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Now that VMLINUX_SYMBOL() is no-op, clean up the linker script.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Sam Ravnborg <sam@ravnborg.org>
CONFIG_HAVE_UNDERSCORE_SYMBOL_PREFIX was selected by BLACKFIN, METAG.
They were removed by commit 4ba66a9760 ("arch: remove blackfin port"),
commit bb6fb6dfcc ("metag: Remove arch/metag/"), respectively.
No more architecture enables CONFIG_HAVE_UNDERSCORE_SYMBOL_PREFIX.
Clean up the export.h headers. I am keeping VMLINUX_SYMBOL() and
VMLINUX_SYMBOL_STR() because they are widely used.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Sam Ravnborg <sam@ravnborg.org>
This was used by the ide, scsi and networking code in the past to
determine if they should bounce payloads. Now that the dma mapping
always have to support dma to all physical memory (thanks to swiotlb
for non-iommu systems) there is no need to this crude hack any more.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Palmer Dabbelt <palmer@sifive.com> (for riscv)
Reviewed-by: Jens Axboe <axboe@kernel.dk>
A qspinlock can be unlocked simply by writing zero to the locked byte.
This can be implemented in the generic code, so do that and remove the
arch-specific override for x86 in the !PV case.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Waiman Long <longman@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: boqun.feng@gmail.com
Cc: linux-arm-kernel@lists.infradead.org
Cc: paulmck@linux.vnet.ibm.com
Link: http://lkml.kernel.org/r/1524738868-31318-11-git-send-email-will.deacon@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
'struct __qspinlock' provides a handy union of fields so that
subcomponents of the lockword can be accessed by name, without having to
manage shifts and masks explicitly and take endianness into account.
This is useful in qspinlock.h and also potentially in arch headers, so
move the 'struct __qspinlock' into 'struct qspinlock' and kill the extra
definition.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Waiman Long <longman@redhat.com>
Acked-by: Boqun Feng <boqun.feng@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arm-kernel@lists.infradead.org
Cc: paulmck@linux.vnet.ibm.com
Link: http://lkml.kernel.org/r/1524738868-31318-3-git-send-email-will.deacon@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Whilst we currently provide smp_cond_load_acquire() and
atomic_cond_read_acquire(), there are cases where the ACQUIRE semantics are
not required because of a subsequent fence or release operation once the
conditional loop has exited.
This patch adds relaxed versions of the conditional spinning primitives
to avoid unnecessary barrier overhead on architectures such as arm64.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Waiman Long <longman@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: boqun.feng@gmail.com
Cc: linux-arm-kernel@lists.infradead.org
Cc: paulmck@linux.vnet.ibm.com
Link: http://lkml.kernel.org/r/1524738868-31318-2-git-send-email-will.deacon@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Commit 99492c39f3 ("earlycon: Fix __earlycon_table stride") tried to fix
__earlycon_table stride by forcing the earlycon_id struct alignment to 32
and asking the linker to 32-byte align the __earlycon_table symbol. This
fix was based on commit 07fca0e57f ("tracing: Properly align linker
defined symbols") which tried a similar fix for the tracing subsystem.
However, this fix doesn't quite work because there is no guarantee that
gcc will place structures packed into an array format. In fact, gcc 4.9
chooses to 64-byte align these structs by inserting additional padding
between the entries because it has no clue that they are supposed to be in
an array. If we are unlucky, the linker will assign symbol
"__earlycon_table" to a 32-byte aligned address which does not correspond
to the 64-byte aligned contents of section "__earlycon_table".
To address this same problem, the fix to the tracing system was
subsequently re-implemented using a more robust table of pointers approach
by commits:
3d56e331b6 ("tracing: Replace syscall_meta_data struct array with pointer array")
6549864629 ("tracepoints: Fix section alignment using pointer array")
e4a9ea5ee7 ("tracing: Replace trace_event struct array with pointer array")
Let's use this same "array of pointers to structs" approach for
EARLYCON_TABLE.
Fixes: 99492c39f3 ("earlycon: Fix __earlycon_table stride")
Signed-off-by: Daniel Kurtz <djkurtz@chromium.org>
Suggested-by: Aaron Durbin <adurbin@chromium.org>
Reviewed-by: Rob Herring <robh@kernel.org>
Tested-by: Guenter Roeck <groeck@chromium.org>
Reviewed-by: Guenter Roeck <groeck@chromium.org>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We have a couple of files that try to include asm/compat.h on
architectures where this is available. Those should generally use the
higher-level linux/compat.h file, but that in turn fails to include
asm/compat.h when CONFIG_COMPAT is disabled, unless we can provide
that header on all architectures.
This adds the asm/compat.h for all remaining architectures to
simplify the dependencies.
Architectures that are getting removed in linux-4.17 are not changed
here, to avoid needless conflicts with the removal patches. Those
architectures are broken by this patch, but we have already shown
that they have no users.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
I have one regression fix for a minor build problem after the architecture
removal series, plus a rework of the barriers in the readl/writel
functions, thanks to work by Sinan Kaya:
This started from a discussion on the linuxpcc and rdma mailing lists
[1]. To summarize, we decided that architectures are responsible to
serialize readl() and writel() accesses on a device MMIO space relative
to DMA performed by that device.
This series provides a pessimistic implementation of that behavior for
asm-generic/io.h, which is in turn used by a number of architectures
(h8300, microblaze, nios2, openrisc, s390, sparc, um, unicore32, and
xtensa). Some of those presumably need no extra barriers, or something
weaker than rmb()/wmb(), and they are advised to override the new default
for better performance.
For inb()/outb(), the same barriers are used, but architectures might
want to add another barrier to outb() here if that can guarantee
non-posted behavior (some architectures can, others cannot do that).
The readl_relaxed()/writel_relaxed() family of functions retains the
existing behavior with no extra barriers.
[1]: https://lists.ozlabs.org/pipermail/linuxppc-dev/2018-March/170481.html
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJazitHAAoJEGCrR//JCVInd0wP/iMzr1HWDgMjeeuxekFjwWDg
9fL+BFt1afeYb4wniqJcF7ymLow/H5Fbhj4dwM1p34De+CZ3+3JGNyK8qzoeKPjR
I2U5QqjWCHWDqpWRGWxO28dbs5/1EoW1zgctTNMUPHiamnomz9XIn0xaVKpu4HZ3
OtaeJm8seKTSj1+A2fye9sDpqMUJuVcnZAWJgqMJ8T98uMBOiJYWHftnFEJpSlwG
SJSt4AYsJnE+3BFawX1g3VWrHn9WN1uwVasJ1INFkLYNuLMYaK7RYjoBWNwHW+RQ
luq4xZE+HZehyZptilfs05x2IlhGSOVN5m0nVM2if9aXoEoO1UdaySbwO6Ukq085
VyfCzY+k4l0v44o4JqaSyAFLEae0809E6cQcGg3cjdstQv1Q3cgAJ96myP0x+QTw
b0xJGoo46eOfqpK4njARyjTSceYPgzkB5Dqngg9rCuh+EogotWpRRDB6zoeGGRK8
oOzMp0qLsAZFcYvjft5h0Cp6X51qfyJpBkJkvnASmF4yJPZlpCRGux+HM3jFb9bV
zbH+KPqTa47OmOK8MNIaFHMR1yMgZU6B2oEwFDEaG0M+6FC5irMSkgcDwIIMJXlJ
wLp7+4WhwFzFDe1mp/tKM5V4h9D6vQtSUjgOJffhxRXqCMkxc7eABmYBBkjMCsca
ibKXyZN16d1kRU9j7upb
=oBQh
-----END PGP SIGNATURE-----
Merge tag 'asm-generic' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic
Pull asm-generic fixes from Arnd Bergmann:
"I have one regression fix for a minor build problem after the
architecture removal series, plus a rework of the barriers in the
readl/writel functions, thanks to work by Sinan Kaya:
This started from a discussion on the linuxpcc and rdma mailing
lists[1]. To summarize, we decided that architectures are responsible
to serialize readl() and writel() accesses on a device MMIO space
relative to DMA performed by that device.
This series provides a pessimistic implementation of that behavior for
asm-generic/io.h, which is in turn used by a number of architectures
(h8300, microblaze, nios2, openrisc, s390, sparc, um, unicore32, and
xtensa). Some of those presumably need no extra barriers, or something
weaker than rmb()/wmb(), and they are advised to override the new
default for better performance.
For inb()/outb(), the same barriers are used, but architectures might
want to add another barrier to outb() here if that can guarantee
non-posted behavior (some architectures can, others cannot do that).
The readl_relaxed()/writel_relaxed() family of functions retains the
existing behavior with no extra barriers"
[1] https://lists.ozlabs.org/pipermail/linuxppc-dev/2018-March/170481.html
* tag 'asm-generic' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
io: change writeX_relaxed() to remove barriers
io: change readX_relaxed() to remove barriers
dts: remove cris & metag dts hard link file
io: change inX() to have their own IO barrier overrides
io: change outX() to have their own IO barrier overrides
io: define stronger ordering for the default writeX() implementation
io: define stronger ordering for the default readX() implementation
io: define several IO & PIO barrier types for the asm-generic version
Now that we hardened writeX() API in asm-generic version, writeX_relaxed()
API is violating the rules when writeX_relaxed() == writeX() in the default
implementation.
The relaxed API shouldn't have any barriers in it and it doesn't provide
any ordering with respect to the memory transactions. The only requirement
is for writes to be ordered with respect to each other. This is achieved
by the volatile in the __raw_writeX() API.
Open code the relaxed API and remove any barriers in it.
Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Now that we hardened readX() API in asm-generic version, readX_relaxed()
API is violating the rules when readX_relaxed() == readX() in the default
implementation.
The relaxed API shouldn't have any barriers in it and it doesn't provide
any ordering with respect to the memory transactions. The only requirement
is for reads to be ordered with respect to each other. This is achieved
by the volatile in the __raw_readX() API.
Open code the relaxed API and remove any barriers in it.
Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
- VHE optimizations
- EL2 address space randomization
- speculative execution mitigations ("variant 3a", aka execution past invalid
privilege register access)
- bugfixes and cleanups
PPC:
- improvements for the radix page fault handler for HV KVM on POWER9
s390:
- more kvm stat counters
- virtio gpu plumbing
- documentation
- facilities improvements
x86:
- support for VMware magic I/O port and pseudo-PMCs
- AMD pause loop exiting
- support for AMD core performance extensions
- support for synchronous register access
- expose nVMX capabilities to userspace
- support for Hyper-V signaling via eventfd
- use Enlightened VMCS when running on Hyper-V
- allow userspace to disable MWAIT/HLT/PAUSE vmexits
- usual roundup of optimizations and nested virtualization bugfixes
Generic:
- API selftest infrastructure (though the only tests are for x86 as of now)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJay19UAAoJEL/70l94x66DGKYIAIu9PTHAEwaX0et15fPW5y2x
rrtS355lSAmMrPJ1nePRQ+rProD/1B0Kizj3/9O+B9OTKKRsorRYNa4CSu9neO2k
N3rdE46M1wHAPwuJPcYvh3iBVXtgbMayk1EK5aVoSXaMXEHh+PWZextkl+F+G853
kC27yDy30jj9pStwnEFSBszO9ua/URdKNKBATNx8WUP6d9U/dlfm5xv3Dc3WtKt2
UMGmog2wh0i7ecXo7hRkMK4R7OYP3ZxAexq5aa9BOPuFp+ZdzC/MVpN+jsjq2J/M
Zq6RNyA2HFyQeP0E9QgFsYS2BNOPeLZnT5Jg1z4jyiD32lAZ/iC51zwm4oNKcDM=
=bPlD
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
"ARM:
- VHE optimizations
- EL2 address space randomization
- speculative execution mitigations ("variant 3a", aka execution past
invalid privilege register access)
- bugfixes and cleanups
PPC:
- improvements for the radix page fault handler for HV KVM on POWER9
s390:
- more kvm stat counters
- virtio gpu plumbing
- documentation
- facilities improvements
x86:
- support for VMware magic I/O port and pseudo-PMCs
- AMD pause loop exiting
- support for AMD core performance extensions
- support for synchronous register access
- expose nVMX capabilities to userspace
- support for Hyper-V signaling via eventfd
- use Enlightened VMCS when running on Hyper-V
- allow userspace to disable MWAIT/HLT/PAUSE vmexits
- usual roundup of optimizations and nested virtualization bugfixes
Generic:
- API selftest infrastructure (though the only tests are for x86 as
of now)"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (174 commits)
kvm: x86: fix a prototype warning
kvm: selftests: add sync_regs_test
kvm: selftests: add API testing infrastructure
kvm: x86: fix a compile warning
KVM: X86: Add Force Emulation Prefix for "emulate the next instruction"
KVM: X86: Introduce handle_ud()
KVM: vmx: unify adjacent #ifdefs
x86: kvm: hide the unused 'cpu' variable
KVM: VMX: remove bogus WARN_ON in handle_ept_misconfig
Revert "KVM: X86: Fix SMRAM accessing even if VM is shutdown"
kvm: Add emulation for movups/movupd
KVM: VMX: raise internal error for exception during invalid protected mode state
KVM: nVMX: Optimization: Dont set KVM_REQ_EVENT when VMExit with nested_run_pending
KVM: nVMX: Require immediate-exit when event reinjected to L2 and L1 event pending
KVM: x86: Fix misleading comments on handling pending exceptions
KVM: x86: Rename interrupt.pending to interrupt.injected
KVM: VMX: No need to clear pending NMI/interrupt on inject realmode interrupt
x86/kvm: use Enlightened VMCS when running on Hyper-V
x86/hyper-v: detect nested features
x86/hyper-v: define struct hv_enlightened_vmcs and clean field bits
...
-----BEGIN PGP SIGNATURE-----
iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAlrHeY8UHGJoZWxnYWFz
QGdvb2dsZS5jb20ACgkQWYigwDrT+vxhLRAAndV/0NDyWZU0eZNM6twri2SEFnF7
E4ar+YthxDxxJG4TLJbIA12jc5NgHZy4WuttDa6Jb99KreBXIHJFlNi/V/tme6zf
+yXUuxWae7wJzBiaay57VqLGSc80gt/LTgjLa1siwQqjTbO3wSXR6JJXNaE9FtQ4
/jL61t8bD1Peb5cWTpt9p0hrnKI0/pHwASdReyFS4F/HDKdvpof7BxE/OU3HSxxA
XKC2v6RjY4S93vkzvApDXQ+vhKquVRK7/ojyTXQUO/GIzcARprO7H4k62N4ar0x/
qbXLkR8IMkwA8ecsNmcL92ftb/cXoHfd+wdK8WpijqzF4kW4SdteVWbIhUzI0gbr
0gjDYIzjplvH3pZGv/qvx+8sFtAP95OdPjuAAW2qJ9TCVfmiS8naNFCvcxg87RhD
gjyQD3If1X7F8wy309lhq7VNyRexTHgIMgTXHyFvuZMzn/Qe1huL2XCwDcEAg/OX
AvU2iuSE5tWAh7gIUMF/aWi3uoeJUyyoru5ZR//gqdFfx9YxpSimO1UDXnpPi8SR
Iz/jzHJc0aWGYdQ9l6HiSbJF3P/QQcWYs9igt0A7BRGB05SPdWCh7sSO70FJa8ME
f4WID5/qEiaH26kiSRX4cUqpc8Amk8bT0DXw2OT57qy3JM0ZdV5ENQX11pSpr9hv
uLEf0DU7AEmdvzQ=
=T++R
-----END PGP SIGNATURE-----
Merge tag 'pci-v4.17-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI updates from Bjorn Helgaas:
- move pci_uevent_ers() out of pci.h (Michael Ellerman)
- skip ASPM common clock warning if BIOS already configured it (Sinan
Kaya)
- fix ASPM Coverity warning about threshold_ns (Gustavo A. R. Silva)
- remove last user of pci_get_bus_and_slot() and the function itself
(Sinan Kaya)
- add decoding for 16 GT/s link speed (Jay Fang)
- add interfaces to get max link speed and width (Tal Gilboa)
- add pcie_bandwidth_capable() to compute max supported link bandwidth
(Tal Gilboa)
- add pcie_bandwidth_available() to compute bandwidth available to
device (Tal Gilboa)
- add pcie_print_link_status() to log link speed and whether it's
limited (Tal Gilboa)
- use PCI core interfaces to report when device performance may be
limited by its slot instead of doing it in each driver (Tal Gilboa)
- fix possible cpqphp NULL pointer dereference (Shawn Lin)
- rescan more of the hierarchy on ACPI hotplug to fix Thunderbolt/xHCI
hotplug (Mika Westerberg)
- add support for PCI I/O port space that's neither directly accessible
via CPU in/out instructions nor directly mapped into CPU physical
memory space. This is fairly intrusive and includes minor changes to
interfaces used for I/O space on most platforms (Zhichang Yuan, John
Garry)
- add support for HiSilicon Hip06/Hip07 LPC I/O space (Zhichang Yuan,
John Garry)
- use PCI_EXP_DEVCTL2_COMP_TIMEOUT in rapidio/tsi721 (Bjorn Helgaas)
- remove possible NULL pointer dereference in of_pci_bus_find_domain_nr()
(Shawn Lin)
- report quirk timings with dev_info (Bjorn Helgaas)
- report quirks that take longer than 10ms (Bjorn Helgaas)
- add and use Altera Vendor ID (Johannes Thumshirn)
- tidy Makefiles and comments (Bjorn Helgaas)
- don't set up INTx if MSI or MSI-X is enabled to align cris, frv,
ia64, and mn10300 with x86 (Bjorn Helgaas)
- move pcieport_if.h to drivers/pci/pcie/ to encapsulate it (Frederick
Lawler)
- merge pcieport_if.h into portdrv.h (Bjorn Helgaas)
- move workaround for BIOS PME issue from portdrv to PCI core (Bjorn
Helgaas)
- completely disable portdrv with "pcie_ports=compat" (Bjorn Helgaas)
- remove portdrv link order dependency (Bjorn Helgaas)
- remove support for unused VC portdrv service (Bjorn Helgaas)
- simplify portdrv feature permission checking (Bjorn Helgaas)
- remove "pcie_hp=nomsi" parameter (use "pci=nomsi" instead) (Bjorn
Helgaas)
- remove unnecessary "pcie_ports=auto" parameter (Bjorn Helgaas)
- use cached AER capability offset (Frederick Lawler)
- don't enable DPC if BIOS hasn't granted AER control (Mika Westerberg)
- rename pcie-dpc.c to dpc.c (Bjorn Helgaas)
- use generic pci_mmap_resource_range() instead of powerpc and xtensa
arch-specific versions (David Woodhouse)
- support arbitrary PCI host bridge offsets on sparc (Yinghai Lu)
- remove System and Video ROM reservations on sparc (Bjorn Helgaas)
- probe for device reset support during enumeration instead of runtime
(Bjorn Helgaas)
- add ACS quirk for Ampere (née APM) root ports (Feng Kan)
- add function 1 DMA alias quirk for Marvell 88SE9220 (Thomas
Vincent-Cross)
- protect device restore with device lock (Sinan Kaya)
- handle failure of FLR gracefully (Sinan Kaya)
- handle CRS (config retry status) after device resets (Sinan Kaya)
- skip various config reads for SR-IOV VFs as an optimization
(KarimAllah Ahmed)
- consolidate VPD code in vpd.c (Bjorn Helgaas)
- add Tegra dependency on PCI_MSI_IRQ_DOMAIN (Arnd Bergmann)
- add DT support for R-Car r8a7743 (Biju Das)
- fix a PCI_EJECT vs PCI_BUS_RELATIONS race condition in Hyper-V host
bridge driver that causes a general protection fault (Dexuan Cui)
- fix Hyper-V host bridge hang in MSI setup on 1-vCPU VMs with SR-IOV
(Dexuan Cui)
- fix Hyper-V host bridge hang when ejecting a VF before setting up MSI
(Dexuan Cui)
- make several structures static (Fengguang Wu)
- increase number of MSI IRQs supported by Synopsys DesignWare bridges
from 32 to 256 (Gustavo Pimentel)
- implemented multiplexed IRQ domain API and remove obsolete MSI IRQ
API from DesignWare drivers (Gustavo Pimentel)
- add Tegra power management support (Manikanta Maddireddy)
- add Tegra loadable module support (Manikanta Maddireddy)
- handle 64-bit BARs correctly in endpoint support (Niklas Cassel)
- support optional regulator for HiSilicon STB (Shawn Guo)
- use regulator bulk API for Qualcomm apq8064 (Srinivas Kandagatla)
- support power supplies for Qualcomm msm8996 (Srinivas Kandagatla)
* tag 'pci-v4.17-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (123 commits)
MAINTAINERS: Add John Garry as maintainer for HiSilicon LPC driver
HISI LPC: Add ACPI support
ACPI / scan: Do not enumerate Indirect IO host children
ACPI / scan: Rename acpi_is_serial_bus_slave() for more general use
HISI LPC: Support the LPC host on Hip06/Hip07 with DT bindings
of: Add missing I/O range exception for indirect-IO devices
PCI: Apply the new generic I/O management on PCI IO hosts
PCI: Add fwnode handler as input param of pci_register_io_range()
PCI: Remove __weak tag from pci_register_io_range()
MAINTAINERS: Add missing /drivers/pci/cadence directory entry
fm10k: Report PCIe link properties with pcie_print_link_status()
net/mlx5e: Use pcie_bandwidth_available() to compute bandwidth
net/mlx5: Report PCIe link properties with pcie_print_link_status()
net/mlx4_core: Report PCIe link properties with pcie_print_link_status()
PCI: Add pcie_print_link_status() to log link speed and whether it's limited
PCI: Add pcie_bandwidth_available() to compute bandwidth available to device
misc: pci_endpoint_test: Handle 64-bit BARs properly
PCI: designware-ep: Make dw_pcie_ep_reset_bar() handle 64-bit BARs properly
PCI: endpoint: Make sure that BAR_5 does not have 64-bit flag set when clearing
PCI: endpoint: Make epc->ops->clear_bar()/pci_epc_clear_bar() take struct *epf_bar
...
Open code readX() inside inX() so that inX() variants have their own
overrideable Port IO barrier combinations as __io_pbr() and __io_par() for
actions to be taken before port IO and after port IO read.
Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Open code writeX() inside outX() so that outX() variants have their own
overrideable Port IO barrier combinations as __io_pbw() and __io_paw() for
actions to be taken before port IO and after port IO write.
Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
The default implementation of mapping writeX() to __raw_writeX() is wrong.
writeX() has stronger ordering semantics. Compiler is allowed to reorder
memory writes against __raw_writeX().
Use the previously defined __io_aw() and __io_bw() macros to harden
code generation according to architecture support.
Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
The default implementation of mapping readX() to __raw_readX() is wrong.
readX() has stronger ordering semantics. Compiler is allowed to reorder
__raw_readX() against the memory accesses following register read.
Use the previously defined __io_ar() and __io_br() macros to harden
code generation according to architecture support.
Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Getting ready to harden readX()/writeX() and inX()/outX() semantics for the
generic implementation.
Defining two set of macros as __io_br() and __io_ar() to indicate actions
to be taken before and after MMIO read.
Defining two set of macros as __io_bw() and __io_aw() to indicate actions
to be taken before and after MMIO write.
Defining two set of macros as __io_pbw() and __io_paw() to indicate actions
to be taken before and after Port IO write.
Defining two set of macros as __io_pbr() and __io_par() to indicate actions
to be taken before and after Port IO read.
If rmb() is available for the architecture, prefer rmb() as the default
implementation of __io_ar()/__io_par().
If wmb() is available for the architecture, prefer wmb() as the default
implementation of __io_bw()/__io_pbw().
Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Nothing particularly stands out here, probably because people were tied
up with spectre/meltdown stuff last time around. Still, the main pieces
are:
- Rework of our CPU features framework so that we can whitelist CPUs that
don't require kpti even in a heterogeneous system
- Support for the IDC/DIC architecture extensions, which allow us to elide
instruction and data cache maintenance when writing out instructions
- Removal of the large memory model which resulted in suboptimal codegen
by the compiler and increased the use of literal pools, which could
potentially be used as ROP gadgets since they are mapped as executable
- Rework of forced signal delivery so that the siginfo_t is well-formed
and handling of show_unhandled_signals is consolidated and made
consistent between different fault types
- More siginfo cleanup based on the initial patches from Eric Biederman
- Workaround for Cortex-A55 erratum #1024718
- Some small ACPI IORT updates and cleanups from Lorenzo Pieralisi
- Misc cleanups and non-critical fixes
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABCgAGBQJaw1TCAAoJELescNyEwWM0gyQIAJVMK4QveBW+LwF96NYdZo16
p90Aa+nqKelh/s93govQArDMv1gxyuXdFlQZVOGPQHfqpz6RhJWmBA2tFsUbQrUc
OBcioPrRihqTmKBe+1r1XORwZxkVX6GGmCn0LYpPR7I3TjxXZpvxqaxGxiUvHkci
yVxWlDTyN/7eL3akhCpCDagN3Fxwk3QnJLqE3fxOFMlY7NvQcmUxcITiUl/s469q
xK6SWH9SRH1JK8jTHPitwUBiU//3FfCqSI9HLEdDIDoTuPcVM8UetWvi4QzrzJL1
UYg8lmU0CXNmflDzZJDaMf+qFApOrGxR0YVPpBzlQvxe0JIY69g48f+JzDPz8nc=
=+gNa
-----END PGP SIGNATURE-----
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Will Deacon:
"Nothing particularly stands out here, probably because people were
tied up with spectre/meltdown stuff last time around. Still, the main
pieces are:
- Rework of our CPU features framework so that we can whitelist CPUs
that don't require kpti even in a heterogeneous system
- Support for the IDC/DIC architecture extensions, which allow us to
elide instruction and data cache maintenance when writing out
instructions
- Removal of the large memory model which resulted in suboptimal
codegen by the compiler and increased the use of literal pools,
which could potentially be used as ROP gadgets since they are
mapped as executable
- Rework of forced signal delivery so that the siginfo_t is
well-formed and handling of show_unhandled_signals is consolidated
and made consistent between different fault types
- More siginfo cleanup based on the initial patches from Eric
Biederman
- Workaround for Cortex-A55 erratum #1024718
- Some small ACPI IORT updates and cleanups from Lorenzo Pieralisi
- Misc cleanups and non-critical fixes"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (70 commits)
arm64: uaccess: Fix omissions from usercopy whitelist
arm64: fpsimd: Split cpu field out from struct fpsimd_state
arm64: tlbflush: avoid writing RES0 bits
arm64: cmpxchg: Include linux/compiler.h in asm/cmpxchg.h
arm64: move percpu cmpxchg implementation from cmpxchg.h to percpu.h
arm64: cmpxchg: Include build_bug.h instead of bug.h for BUILD_BUG
arm64: lse: Include compiler_types.h and export.h for out-of-line LL/SC
arm64: fpsimd: include <linux/init.h> in fpsimd.h
drivers/perf: arm_pmu_platform: do not warn about affinity on uniprocessor
perf: arm_spe: include linux/vmalloc.h for vmap()
Revert "arm64: Revert L1_CACHE_SHIFT back to 6 (64-byte cache line size)"
arm64: cpufeature: Avoid warnings due to unused symbols
arm64: Add work around for Arm Cortex-A55 Erratum 1024718
arm64: Delay enabling hardware DBM feature
arm64: Add MIDR encoding for Arm Cortex-A55 and Cortex-A35
arm64: capabilities: Handle shared entries
arm64: capabilities: Add support for checks based on a list of MIDRs
arm64: Add helpers for checking CPU MIDR against a range
arm64: capabilities: Clean up midr range helpers
arm64: capabilities: Change scope of VHE to Boot CPU feature
...
After introducing the new generic I/O space management (Logical PIO), the
original PCI MMIO relevant helpers need to be updated based on the new
interfaces defined in logical PIO.
Adapt the corresponding code to match the changes introduced by logical
PIO.
Tested-by: dann frazier <dann.frazier@canonical.com>
Signed-off-by: Zhichang Yuan <yuanzhichang@hisilicon.com>
Signed-off-by: Gabriele Paoloni <gabriele.paoloni@huawei.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de> # earlier draft
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Pull sparc updates from David Miller:
1) Add support for ADI (Application Data Integrity) found in more
recent sparc64 cpus. Essentially this is keyed based access to
virtual memory, and if the key encoded in the virual address is
wrong you get a trap.
The mm changes were reviewed by Andrew Morton and others.
Work by Khalid Aziz.
2) Validate DAX completion index range properly, from Rob Gardner.
3) Add proper Kconfig deps for DAX driver. From Guenter Roeck.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-next:
sparc64: Make atomic_xchg() an inline function rather than a macro.
sparc64: Properly range check DAX completion index
sparc: Make auxiliary vectors for ADI available on 32-bit as well
sparc64: Oracle DAX driver depends on SPARC64
sparc64: Update signal delivery to use new helper functions
sparc64: Add support for ADI (Application Data Integrity)
mm: Allow arch code to override copy_highpage()
mm: Clear arch specific VM flags on protection change
mm: Add address parameter to arch_validate_prot()
sparc64: Add auxiliary vectors to report platform ADI properties
sparc64: Add handler for "Memory Corruption Detected" trap
sparc64: Add HV fault type handlers for ADI related faults
sparc64: Add support for ADI register fields, ASIs and traps
mm, swap: Add infrastructure for saving page metadata on swap
signals, sparc: Add signal codes for ADI violations
Pull networking updates from David Miller:
1) Support offloading wireless authentication to userspace via
NL80211_CMD_EXTERNAL_AUTH, from Srinivas Dasari.
2) A lot of work on network namespace setup/teardown from Kirill Tkhai.
Setup and cleanup of namespaces now all run asynchronously and thus
performance is significantly increased.
3) Add rx/tx timestamping support to mv88e6xxx driver, from Brandon
Streiff.
4) Support zerocopy on RDS sockets, from Sowmini Varadhan.
5) Use denser instruction encoding in x86 eBPF JIT, from Daniel
Borkmann.
6) Support hw offload of vlan filtering in mvpp2 dreiver, from Maxime
Chevallier.
7) Support grafting of child qdiscs in mlxsw driver, from Nogah
Frankel.
8) Add packet forwarding tests to selftests, from Ido Schimmel.
9) Deal with sub-optimal GSO packets better in BBR congestion control,
from Eric Dumazet.
10) Support 5-tuple hashing in ipv6 multipath routing, from David Ahern.
11) Add path MTU tests to selftests, from Stefano Brivio.
12) Various bits of IPSEC offloading support for mlx5, from Aviad
Yehezkel, Yossi Kuperman, and Saeed Mahameed.
13) Support RSS spreading on ntuple filters in SFC driver, from Edward
Cree.
14) Lots of sockmap work from John Fastabend. Applications can use eBPF
to filter sendmsg and sendpage operations.
15) In-kernel receive TLS support, from Dave Watson.
16) Add XDP support to ixgbevf, this is significant because it should
allow optimized XDP usage in various cloud environments. From Tony
Nguyen.
17) Add new Intel E800 series "ice" ethernet driver, from Anirudh
Venkataramanan et al.
18) IP fragmentation match offload support in nfp driver, from Pieter
Jansen van Vuuren.
19) Support XDP redirect in i40e driver, from Björn Töpel.
20) Add BPF_RAW_TRACEPOINT program type for accessing the arguments of
tracepoints in their raw form, from Alexei Starovoitov.
21) Lots of striding RQ improvements to mlx5 driver with many
performance improvements, from Tariq Toukan.
22) Use rhashtable for inet frag reassembly, from Eric Dumazet.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1678 commits)
net: mvneta: improve suspend/resume
net: mvneta: split rxq/txq init and txq deinit into SW and HW parts
ipv6: frags: fix /proc/sys/net/ipv6/ip6frag_low_thresh
net: bgmac: Fix endian access in bgmac_dma_tx_ring_free()
net: bgmac: Correctly annotate register space
route: check sysctl_fib_multipath_use_neigh earlier than hash
fix typo in command value in drivers/net/phy/mdio-bitbang.
sky2: Increase D3 delay to sky2 stops working after suspend
net/mlx5e: Set EQE based as default TX interrupt moderation mode
ibmvnic: Disable irqs before exiting reset from closed state
net: sched: do not emit messages while holding spinlock
vlan: also check phy_driver ts_info for vlan's real device
Bluetooth: Mark expected switch fall-throughs
Bluetooth: Set HCI_QUIRK_SIMULTANEOUS_DISCOVERY for BTUSB_QCA_ROME
Bluetooth: btrsi: remove unused including <linux/version.h>
Bluetooth: hci_bcm: Remove DMI quirk for the MINIX Z83-4
sh_eth: kill useless check in __sh_eth_get_regs()
sh_eth: add sh_eth_cpu_data::no_xdfar flag
ipv6: factorize sk_wmem_alloc updates done by __ip6_append_data()
ipv4: factorize sk_wmem_alloc updates done by __ip_append_data()
...
This removes the entire architecture code for blackfin, cris, frv, m32r,
metag, mn10300, score, and tile, including the associated device drivers.
I have been working with the (former) maintainers for each one to ensure
that my interpretation was right and the code is definitely unused in
mainline kernels. Many had fond memories of working on the respective
ports to start with and getting them included in upstream, but also saw
no point in keeping the port alive without any users.
In the end, it seems that while the eight architectures are extremely
different, they all suffered the same fate: There was one company
in charge of an SoC line, a CPU microarchitecture and a software
ecosystem, which was more costly than licensing newer off-the-shelf
CPU cores from a third party (typically ARM, MIPS, or RISC-V). It seems
that all the SoC product lines are still around, but have not used the
custom CPU architectures for several years at this point. In contrast,
CPU instruction sets that remain popular and have actively maintained
kernel ports tend to all be used across multiple licensees.
The removal came out of a discussion that is now documented at
https://lwn.net/Articles/748074/. Unlike the original plans, I'm not
marking any ports as deprecated but remove them all at once after I made
sure that they are all unused. Some architectures (notably tile, mn10300,
and blackfin) are still being shipped in products with old kernels,
but those products will never be updated to newer kernel releases.
After this series, we still have a few architectures without mainline
gcc support:
- unicore32 and hexagon both have very outdated gcc releases, but the
maintainers promised to work on providing something newer. At least
in case of hexagon, this will only be llvm, not gcc.
- openrisc, risc-v and nds32 are still in the process of finishing their
support or getting it added to mainline gcc in the first place.
They all have patched gcc-7.3 ports that work to some degree, but
complete upstream support won't happen before gcc-8.1. Csky posted
their first kernel patch set last week, their situation will be similar.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJawdL2AAoJEGCrR//JCVInuH0P/RJAZh1nTD+TR34ZhJq2TBoo
PgygwDU7Z2+tQVU+EZ453Gywz9/NMRFk1RWAZqrLix4ZtyIMvC6A1qfT2yH1Y7Fb
Qh6tccQeLe4ezq5u4S/46R/fQXu3Txr92yVwzJJUuPyU0arF9rv5MmI8e6p7L1en
yb74kSEaCe+/eMlsEj1Cc1dgthDNXGKIURHkRsILoweysCpesjiTg4qDcL+yTibV
FP2wjVbniKESMKS6qL71tiT5sexvLsLwMNcGiHPj94qCIQuI7DLhLdBVsL5Su6gI
sbtgv0dsq4auRYAbQdMaH1hFvu6WptsuttIbOMnz2Yegi2z28H8uVXkbk2WVLbqG
ZESUwutGh8MzOL2RJ4jyyQq5sfo++CRGlfKjr6ImZRv03dv0pe/W85062cK5cKNs
cgDDJjGRorOXW7dyU6jG2gRqODOQBObIv3w5efdq5OgzOWlbI4EC+Y5u1Z0JF/76
pSwtGXA6YhwC+9LLAlnVTHG+yOwuLmAICgoKcTbzTVDKA2YQZG/cYuQfI5S1wD8e
X6urPx3Md2GCwLXQ9mzKBzKZUpu/Tuhx0NvwF4qVxy6x1PELjn68zuP7abDHr46r
57/09ooVN+iXXnEGMtQVS/OPvYHSa2NgTSZz6Y86lCRbZmUOOlK31RDNlMvYNA+s
3iIVHovno/JuJnTOE8LY
=fQ8z
-----END PGP SIGNATURE-----
Merge tag 'arch-removal' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic
Pul removal of obsolete architecture ports from Arnd Bergmann:
"This removes the entire architecture code for blackfin, cris, frv,
m32r, metag, mn10300, score, and tile, including the associated device
drivers.
I have been working with the (former) maintainers for each one to
ensure that my interpretation was right and the code is definitely
unused in mainline kernels. Many had fond memories of working on the
respective ports to start with and getting them included in upstream,
but also saw no point in keeping the port alive without any users.
In the end, it seems that while the eight architectures are extremely
different, they all suffered the same fate: There was one company in
charge of an SoC line, a CPU microarchitecture and a software
ecosystem, which was more costly than licensing newer off-the-shelf
CPU cores from a third party (typically ARM, MIPS, or RISC-V). It
seems that all the SoC product lines are still around, but have not
used the custom CPU architectures for several years at this point. In
contrast, CPU instruction sets that remain popular and have actively
maintained kernel ports tend to all be used across multiple licensees.
[ See the new nds32 port merged in the previous commit for the next
generation of "one company in charge of an SoC line, a CPU
microarchitecture and a software ecosystem" - Linus ]
The removal came out of a discussion that is now documented at
https://lwn.net/Articles/748074/. Unlike the original plans, I'm not
marking any ports as deprecated but remove them all at once after I
made sure that they are all unused. Some architectures (notably tile,
mn10300, and blackfin) are still being shipped in products with old
kernels, but those products will never be updated to newer kernel
releases.
After this series, we still have a few architectures without mainline
gcc support:
- unicore32 and hexagon both have very outdated gcc releases, but the
maintainers promised to work on providing something newer. At least
in case of hexagon, this will only be llvm, not gcc.
- openrisc, risc-v and nds32 are still in the process of finishing
their support or getting it added to mainline gcc in the first
place. They all have patched gcc-7.3 ports that work to some
degree, but complete upstream support won't happen before gcc-8.1.
Csky posted their first kernel patch set last week, their situation
will be similar
[ Palmer Dabbelt points out that RISC-V support is in mainline gcc
since gcc-7, although gcc-7.3.0 is the recommended minimum - Linus ]"
This really says it all:
2498 files changed, 95 insertions(+), 467668 deletions(-)
* tag 'arch-removal' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: (74 commits)
MAINTAINERS: UNICORE32: Change email account
staging: iio: remove iio-trig-bfin-timer driver
tty: hvc: remove tile driver
tty: remove bfin_jtag_comm and hvc_bfin_jtag drivers
serial: remove tile uart driver
serial: remove m32r_sio driver
serial: remove blackfin drivers
serial: remove cris/etrax uart drivers
usb: Remove Blackfin references in USB support
usb: isp1362: remove blackfin arch glue
usb: musb: remove blackfin port
usb: host: remove tilegx platform glue
pwm: remove pwm-bfin driver
i2c: remove bfin-twi driver
spi: remove blackfin related host drivers
watchdog: remove bfin_wdt driver
can: remove bfin_can driver
mmc: remove bfin_sdh driver
input: misc: remove blackfin rotary driver
input: keyboard: remove bf54x driver
...
driver and timer driver), which has been through 7 rounds of review on mailing
list.
It is able to boot to shell and passes most LTP-2017 testsuites in nds32 AE3XX
platform.
Total Tests: 1901
Total Skipped Tests: 618
Total Failures: 78
Copied below is the ChangeLog that contains the history of this patch set:
Changes in v7:
- Update cpu binding document to add "andestech,nds32v3" as fallback
- Remove unnecessary configs of arch/nds32/Kconfig
- Use GENERIC_CALIBRATE_DELAY
- Add more help texts for minimum CPU type config
- Update defconfig because of Kconfig changed and bug fixed
- Move early_trap_init() declaration to nds32.h
- Refine dma.c
- Remove apply_relocate() in module.c and include <linux/moduleloader.h> to catch it
- Add do_kernel_restart() in machine_restart()
- Clean up setup.c to remove CONFIG_VGA_CONSOLE and some extern declaration functions
- Add negative dependency for VGA_CONSOLE on nds32
- Refine ptrace.c and arch/nds32/include/asm/ptrace.h
- Refine syscall restart flow and arch/nds32/kernel/signal.c
- Fix a bug in VDSO
- Remove the handling for kernel code unaligned accessing
- Add a description for unaligned access handling in git commit message.
- Rebase to v4.16-rc1
- Replace ACCESS_ONCE with READ_ONCE
- Replace atomic_long_dec(&mm->nr_ptes) with mm_dec_nr_ptes(mm)
- Remove print_symbol(%s) with printk(%pS)
- Add bpf_perf_event.h
- Remove init_stack and init_thread_info
Changes in v6:
- Refine naming for atl2c
- Refine ae3xx.dts
- Remove CONFIG_TIMER_ATCPIT100 in defconfig
- Refine elf.h
- Fix a vdso bug
- Separate arch patchset and timer patchset
- To select TIMER_OF in drivers/clocksource/Kconfig instead of arch/nds32/Kconfig
Changes in v5:
- Remove __NR__llseek and sys_mmap()
- Add a comment to explain that we don't have clocksource cycle counter in the CPU
- Add volatile in iounmap()
- Fix typo Featuretures to Features
- Replace CPU_CACHE_NONALIASING with !CPU_CACHE_ALIASING
- Fix a endian bug when we try to get val = of_get_property(cpu,"clock-frequency", NULL)
- Add screen_info to fix the building error when CONFIG_ VGA_CONSOLE is enabled
- Remove unnecessary msync()
- Add depends on !64BIT || BROKEN for faraday Kconfig because the descriptor only supports 32bit
- Add atl2c binding document
- Remove unnecessary include headers
- Fix a vector table bug. It placed wrong vector handlers for 2 exceptions.
- Fix a vdso bug. It may encounter TLB multi-hit exception because we accidently set it as a global page.
- Add proper isb and barrier after some cache operations
- Fix a bug in system call restart flow. $r0 ~ $r5 does not be recovered before restarting system call
- Fix the build errors for OpenRISC and SPARC because io.h changed.
- Update ae3xx.dts to support atl2c.
Changes in v4:
- Add atcpit100 timer driver due to it include vdso implementations and sent
them together with nds32 may help reviewer to review.
- Update ae3xx.dts for atcpit100 clock setting and remove vdso settings.
- To get cycle counter register by timer driver instead of dts.
- Use "depends on NDS32 || COMPILE_TEST" in atcpit100 driver because it is needed for nds32 vdso
- Update defconfig becasue kconfig rename from CONFIG_CLKSRC_ATCPIT100 to CONFIG_TIMER_ATCPIT100
- Remove ag101p.dts because we are not yet ready for ag101p platform.
- Update copyright style to SPDX-License-Identifier
- Include <linux/uaccess.h> instead of <asm/uaccess.h>
- Add local_irq_save()/local_irq_restore() to protect SR_TLB_VPN in update_mmu_cache().
- Update cpu_dcache_inval_all implementation to make sure all level cache are writeback.
Changes in v3:
- Use arch's io.h instead of generic one
- Add andestech-boards binding document
- Update nds32/cpus.txt binding document
- Remove atcpit100 timer drivers
- Select NO_BOOTMEM and delete HAVE_MEMBLOCK_NODE_MAP
- make CPU_BIG_ENDIAN and CPU_LITTLE_ENDIAN are dependent
- Add cpu type to select HWZOL/CPU_CACHE_ALIASING
- Change CPU_CACHE_NONALIASING to CPU_CACHE_ALIASING
- Remove bootarg from device tree script
- Update ag101p.dts and ae3xx.dts for correct board name.
- Clear and simplify defconfig
- Implement L2C_R_REG/ L2C_W_REG with readl/writel instead of __raw_readl/__raw_writel for endian save
- Remove early_init_dt_add_memory_arch/early_init_dt_alloc_memory_arch to use the generic ones
- Refine devicetree.c
- Fix bug https://lkml.kernel.org/r/1499782590-31366-1-git-send-ema...
- Refine irqchip/irq-ativic32.c implementations
- Add COMPILE_TEST in drivers/net/ethernet/faraday/Kconfig
- Refine cache operations
- Add CONFIG_HW_SUPPORT_UNALIGNMENT_ACCESS
- Fix ZERO_PAGE define
- Remove SA_RESTORER
- Remove uapi/asm/signal.h
- Redefine user_pt_regs
- Remove spinlock.h
- Remove __ARCH_WANT_RENAMEAT and __ARCH_WANT_SYSCALL_OFF_T from unistd.h
- Remove set_fs(USER_DS) because flush_old_exec() will do this setting
- Replace in_atomic() with faulthandler_disabled()
- Add barrier.h
- Select COMMON_CLK
- Add clk_pll in dts
- Add of_clk_init() in arch/nds32/kernel/time.c
Changes in v2:
- Set GENERIC_CALIBRATE_DELAY default n
- Add earlycon support
- Remove earlyprintk
- Add CPU_BIG_ENDIAN, CPU_LITTLE_ENDIAN support
- Refine unalignment access exception handler
- Add VMSPLIT support
- Use only one defconfig
- Change interrupt-cells from 2 to 1
- Refine andestech cpu names in bindings/nds32/cpus.txt
- Get clock frequency in dts because fpga bitmap doesn't include this feature
- Update MAINTAINERS for bindings
- Remove unused configs in Kconfig
- Refine device tree scripts
- Refine coding style
- Use generic ioremap_nocache
- Remove L2CC_PA_BASE define and its codes in head.S. It will be moved to bootloader.
- Set PHYS_OFFSET to 0x0 instead of CONFIG_MEMORY_START
- Remove unused macros
- Simplify cpu_cache_* API
- Change __asm__ __volatile__ to asm volatile
- Refine uaccess.h
- Remove unused/deprecated syscall
- Use generic posix_types.h
- Remove arch_trace_hardirqs_on/arch_trace_hardirqs_off
- Fix bug of restart syscall
- Refine syscall implementations
- Use IS_ENABLED to replace ifdef as possible
- Remove device_initcall(nds32_device_probe)
- Refine vdso implementations
- Refine copy_from_user()/copy_to_user()/clear_user()/get_user()/memmove()/memcpy()
- Refine ioremap.c
- Refine irq-ativic32.c
- Fix a bug of earlycon.c
- Export ioremap_nocache/ioremap_uc/ioremap_wc/ioremap_wt
- Add atcpit100 driver
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.17 (GNU/Linux)
iQIcBAABAgAGBQJawZ28AAoJEHfB0l0b2JxExKYQAJ7btaPeIplIndrphlTQkfzW
d1AhVBhwAlvqsOcFf+kwVIqOnfcLCtzVvgc63qc6mAroDZKcd+uuqOtkC801b33i
jxcfSX802PciT3VhE8xz9OWFY9D3in/UCBhGe2OvY/cD/eC34gZhUqJhML/ioR5q
DNPvua6lgYIN9VrFds19MjRzl7RwDBqNccQoFTXWc9Hl1Vs0YdKdbkOz0IWNtoLQ
crP0v/UuHMC++WdU+MvDIEFqNVuXikg/NA+odPIbp3eF3xcmQBM0blWAi37eOKFo
rzTw7TKtL8xObjvhyzx3aYFKPpLBfuYwk8onoZlthlqcwFClZy4lzdDdDxJhKiJ6
5hilzCSqEWXB9osQsrWgAuK1rNRvroChIp6/rcdGAq33mTPLVydx7hSKELhE7wuN
UUaiJSSNRG1ZrR8tkccQpaRBjJ/gfXWGC3ys723oWz8A4bDzMkvZVzdOGOEZ+CsI
w4HKNHLeY50wztV6dDSiVPhvUXQjBH9qd2zVHlutbfulPI/XNkGRfWpEGVT1zD4y
pO3aHVJfsv+8aeyVBcXyN74O34a9HYa7811v7V5RI+uftdPhkzOwxuMQVMMJPe3s
4u4NglP7fekeWyDGCXFKOGoVOuCAUPHuzBUCFjL/cStHjrkBGVQvL1I925sJNabu
IatFh62x8Ez4m2hIf5fg
=J/YF
-----END PGP SIGNATURE-----
Merge tag 'nds32-for-linus-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/greentime/linux
Pull nds32 architecture support from Greentime Hu:
"This contains the core nds32 Linux port (including interrupt
controller driver and timer driver), which has been through seven
rounds of review on mailing list.
It is able to boot to shell and passes most LTP-2017 testsuites in
nds32 AE3XX platform:
Total Tests: 1901
Total Skipped Tests: 618
Total Failures: 78"
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
* tag 'nds32-for-linus-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/greentime/linux: (44 commits)
nds32: To use the generic dump_stack()
nds32: fix building failed if using elf toolchain.
nios2: add ioremap_nocache declaration before include asm-generic/io.h.
nds32: fix building failed if using older version gcc.
dt-bindings: timer: Add andestech atcpit100 timer binding doc
clocksource/drivers/atcpit100: VDSO support
clocksource/drivers/atcpit100: Add andestech atcpit100 timer
net: faraday add nds32 support.
irqchip: Andestech Internal Vector Interrupt Controller driver
dt-bindings: interrupt-controller: Andestech Internal Vector Interrupt Controller
dt-bindings: nds32 SoC Bindings
dt-bindings: nds32 L2 cache controller Bindings
dt-bindings: nds32 CPU Bindings
MAINTAINERS: Add nds32
nds32: Build infrastructure
nds32: defconfig
nds32: Miscellaneous header files
nds32: Device tree support
nds32: Generic timers support
nds32: Loadable modules
...
Pull x86 mm updates from Ingo Molnar:
- Extend the memmap= boot parameter syntax to allow the redeclaration
and dropping of existing ranges, and to support all e820 range types
(Jan H. Schönherr)
- Improve the W+X boot time security checks to remove false positive
warnings on Xen (Jan Beulich)
- Support booting as Xen PVH guest (Juergen Gross)
- Improved 5-level paging (LA57) support, in particular it's possible
now to have a single kernel image for both 4-level and 5-level
hardware (Kirill A. Shutemov)
- AMD hardware RAM encryption support (SME/SEV) fixes (Tom Lendacky)
- Preparatory commits for hardware-encrypted RAM support on Intel CPUs.
(Kirill A. Shutemov)
- Improved Intel-MID support (Andy Shevchenko)
- Show EFI page tables in page_tables debug files (Andy Lutomirski)
- ... plus misc fixes and smaller cleanups
* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (56 commits)
x86/cpu/tme: Fix spelling: "configuation" -> "configuration"
x86/boot: Fix SEV boot failure from change to __PHYSICAL_MASK_SHIFT
x86/mm: Update comment in detect_tme() regarding x86_phys_bits
x86/mm/32: Remove unused node_memmap_size_bytes() & CONFIG_NEED_NODE_MEMMAP_SIZE logic
x86/mm: Remove pointless checks in vmalloc_fault
x86/platform/intel-mid: Add special handling for ACPI HW reduced platforms
ACPI, x86/boot: Introduce the ->reduced_hw_early_init() ACPI callback
ACPI, x86/boot: Split out acpi_generic_reduce_hw_init() and export
x86/pconfig: Provide defines and helper to run MKTME_KEY_PROG leaf
x86/pconfig: Detect PCONFIG targets
x86/tme: Detect if TME and MKTME is activated by BIOS
x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G
x86/boot/compressed/64: Use page table in trampoline memory
x86/boot/compressed/64: Use stack from trampoline memory
x86/boot/compressed/64: Make sure we have a 32-bit code segment
x86/mm: Do not use paravirtualized calls in native_set_p4d()
kdump, vmcoreinfo: Export pgtable_l5_enabled value
x86/boot/compressed/64: Prepare new top-level page table for trampoline
x86/boot/compressed/64: Set up trampoline memory
x86/boot/compressed/64: Save and restore trampoline memory
...
Introduce BPF_PROG_TYPE_RAW_TRACEPOINT bpf program type to access
kernel internal arguments of the tracepoints in their raw form.
>From bpf program point of view the access to the arguments look like:
struct bpf_raw_tracepoint_args {
__u64 args[0];
};
int bpf_prog(struct bpf_raw_tracepoint_args *ctx)
{
// program can read args[N] where N depends on tracepoint
// and statically verified at program load+attach time
}
kprobe+bpf infrastructure allows programs access function arguments.
This feature allows programs access raw tracepoint arguments.
Similar to proposed 'dynamic ftrace events' there are no abi guarantees
to what the tracepoints arguments are and what their meaning is.
The program needs to type cast args properly and use bpf_probe_read()
helper to access struct fields when argument is a pointer.
For every tracepoint __bpf_trace_##call function is prepared.
In assembler it looks like:
(gdb) disassemble __bpf_trace_xdp_exception
Dump of assembler code for function __bpf_trace_xdp_exception:
0xffffffff81132080 <+0>: mov %ecx,%ecx
0xffffffff81132082 <+2>: jmpq 0xffffffff811231f0 <bpf_trace_run3>
where
TRACE_EVENT(xdp_exception,
TP_PROTO(const struct net_device *dev,
const struct bpf_prog *xdp, u32 act),
The above assembler snippet is casting 32-bit 'act' field into 'u64'
to pass into bpf_trace_run3(), while 'dev' and 'xdp' args are passed as-is.
All of ~500 of __bpf_trace_*() functions are only 5-10 byte long
and in total this approach adds 7k bytes to .text.
This approach gives the lowest possible overhead
while calling trace_xdp_exception() from kernel C code and
transitioning into bpf land.
Since tracepoint+bpf are used at speeds of 1M+ events per second
this is valuable optimization.
The new BPF_RAW_TRACEPOINT_OPEN sys_bpf command is introduced
that returns anon_inode FD of 'bpf-raw-tracepoint' object.
The user space looks like:
// load bpf prog with BPF_PROG_TYPE_RAW_TRACEPOINT type
prog_fd = bpf_prog_load(...);
// receive anon_inode fd for given bpf_raw_tracepoint with prog attached
raw_tp_fd = bpf_raw_tracepoint_open("xdp_exception", prog_fd);
Ctrl-C of tracing daemon or cmdline tool that uses this feature
will automatically detach bpf program, unload it and
unregister tracepoint probe.
On the kernel side the __bpf_raw_tp_map section of pointers to
tracepoint definition and to __bpf_trace_*() probe function is used
to find a tracepoint with "xdp_exception" name and
corresponding __bpf_trace_xdp_exception() probe function
which are passed to tracepoint_probe_register() to connect probe
with tracepoint.
Addition of bpf_raw_tracepoint doesn't interfere with ftrace and perf
tracepoint mechanisms. perf_event_open() can be used in parallel
on the same tracepoint.
Multiple bpf_raw_tracepoint_open("xdp_exception", prog_fd) are permitted.
Each with its own bpf program. The kernel will execute
all tracepoint probes and all attached bpf programs.
In the future bpf_raw_tracepoints can be extended with
query/introspection logic.
__bpf_raw_tp_map section logic was contributed by Steven Rostedt
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
On architectures with CONFIG_HAVE_ARCH_HUGE_VMAP set, ioremap() may
create pud/pmd mappings. A kernel panic was observed on arm64 systems
with Cortex-A75 in the following steps as described by Hanjun Guo.
1. ioremap a 4K size, valid page table will build,
2. iounmap it, pte0 will set to 0;
3. ioremap the same address with 2M size, pgd/pmd is unchanged,
then set the a new value for pmd;
4. pte0 is leaked;
5. CPU may meet exception because the old pmd is still in TLB,
which will lead to kernel panic.
This panic is not reproducible on x86. INVLPG, called from iounmap,
purges all levels of entries associated with purged address on x86. x86
still has memory leak.
The patch changes the ioremap path to free unmapped page table(s) since
doing so in the unmap path has the following issues:
- The iounmap() path is shared with vunmap(). Since vmap() only
supports pte mappings, making vunmap() to free a pte page is an
overhead for regular vmap users as they do not need a pte page freed
up.
- Checking if all entries in a pte page are cleared in the unmap path
is racy, and serializing this check is expensive.
- The unmap path calls free_vmap_area_noflush() to do lazy TLB purges.
Clearing a pud/pmd entry before the lazy TLB purges needs extra TLB
purge.
Add two interfaces, pud_free_pmd_page() and pmd_free_pte_page(), which
clear a given pud/pmd entry and free up a page for the lower level
entries.
This patch implements their stub functions on x86 and arm64, which work
as workaround.
[akpm@linux-foundation.org: fix typo in pmd_free_pte_page() stub]
Link: http://lkml.kernel.org/r/20180314180155.19492-2-toshi.kani@hpe.com
Fixes: e61ce6ade4 ("mm: change ioremap to set up huge I/O mappings")
Reported-by: Lei Li <lious.lilei@hisilicon.com>
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Wang Xuefeng <wxf.wang@hisilicon.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Hanjun Guo <guohanjun@huawei.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Chintan Pandya <cpandya@codeaurora.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
41f8bba7f5 ("of/pci: Add pci_register_io_range() and
pci_pio_to_address()") added support for PCI I/O space mapped into CPU
physical memory space. With that support, the I/O ranges configured for
PCI/PCIe hosts on some architectures can be mapped to logical PIO and
converted easily between CPU address and the corresponding logical PIO.
Based on this, PCI I/O port space can be accessed via in/out accessors that
use memory read/write.
But on some platforms, there are bus hosts that access I/O port space with
host-local I/O port addresses rather than memory addresses.
Add a more generic I/O mapping method to support those devices. With this
patch, both the CPU addresses and the host-local port can be mapped into
the logical PIO space with different logical/fake PIOs. After this, all
the I/O accesses to either PCI MMIO devices or host-local I/O peripherals
can be unified into the existing I/O accessors defined in asm-generic/io.h
and be redirected to the right device-specific hooks based on the input
logical PIO.
Tested-by: dann frazier <dann.frazier@canonical.com>
Signed-off-by: Zhichang Yuan <yuanzhichang@hisilicon.com>
Signed-off-by: Gabriele Paoloni <gabriele.paoloni@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
[bhelgaas: remove -EFAULT return from logic_pio_register_range() per
https://lkml.kernel.org/r/20180403143909.GA21171@ulmo, fix NULL pointer
checking per https://lkml.kernel.org/r/20180403211505.GA29612@embeddedor.com]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
If a processor supports special metadata for a page, for example ADI
version tags on SPARC M7, this metadata must be saved when the page is
swapped out. The same metadata must be restored when the page is swapped
back in. This patch adds two new architecture specific functions -
arch_do_swap_page() to be called when a page is swapped in, and
arch_unmap_one() to be called when a page is being unmapped for swap
out. These architecture hooks allow page metadata to be saved if the
architecture supports it.
Signed-off-by: Khalid Aziz <khalid.aziz@oracle.com>
Cc: Khalid Aziz <khalid@gonehiking.org>
Acked-by: Jerome Marchand <jmarchan@redhat.com>
Reviewed-by: Anthony Yznaga <anthony.yznaga@oracle.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The comments are factored out from the code changes to make them
easier to read. Add them separately to explain some non-obvious
aspects.
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: kasan-dev@googlegroups.com
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/cc595efc644bb905407012d82d3eb8bac3368e7a.1517246437.git.dvyukov@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
KASAN uses compiler instrumentation to intercept all memory accesses. But it does
not see memory accesses done in assembly code. One notable user of assembly code
is atomic operations. Frequently, for example, an atomic reference decrement is
the last access to an object and a good candidate for a racy use-after-free.
Add manual KASAN checks to atomic operations.
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>,
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>,
Cc: kasan-dev@googlegroups.com
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/2fa6e7f0210fd20fe404e5b67e6e9213af2b69a1.1517246437.git.dvyukov@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
In commit 316ca8804e ("ACPI/IORT: Remove linker section for IORT entries
probing"), iort entries was removed in vmlinux.lds.h. But in
commit 2fcc112af3 ("clocksource/drivers: Rename clksrc table to timer"),
this line was back incorrectly.
It does no harm except for adding some useless symbols, so fix it.
Signed-off-by: Jia He <jia.he@hxt-semitech.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
This patch introduces kvm_para_has_hint() to query for hints about
the configuration of the guests. The first hint KVM_HINTS_DEDICATED,
is set if the guest has dedicated physical CPUs for each vCPU (i.e.
pinning and no over-commitment). This allows optimizing spinlocks
and tells the guest to avoid PV TLB flush.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
It allows some architectures to use this generic macro instead of
defining theirs.
sparc: io: To use the define of ioremap_[nocache|wc|wb] in asm-generic/io.h
It will move the ioremap_nocache out of the CONFIG_MMU ifdef. This means that
in order to suppress re-definition errors we need to remove the #define
in arch/sparc/include/asm/io_32.h. Also, the change adds a prototype for
ioremap where size is size_t and offset is phys_addr_t so fix that as well.
Signed-off-by: Greentime Hu <greentime@andestech.com>
Looking at functions with large stack frames across all architectures
led me discovering that BUG() suffers from the same problem as
fortify_panic(), which I've added a workaround for already.
In short, variables that go out of scope by calling a noreturn function
or __builtin_unreachable() keep using stack space in functions
afterwards.
A workaround that was identified is to insert an empty assembler
statement just before calling the function that doesn't return. I'm
adding a macro "barrier_before_unreachable()" to document this, and
insert calls to that in all instances of BUG() that currently suffer
from this problem.
The files that saw the largest change from this had these frame sizes
before, and much less with my patch:
fs/ext4/inode.c:82:1: warning: the frame size of 1672 bytes is larger than 800 bytes [-Wframe-larger-than=]
fs/ext4/namei.c:434:1: warning: the frame size of 904 bytes is larger than 800 bytes [-Wframe-larger-than=]
fs/ext4/super.c:2279:1: warning: the frame size of 1160 bytes is larger than 800 bytes [-Wframe-larger-than=]
fs/ext4/xattr.c:146:1: warning: the frame size of 1168 bytes is larger than 800 bytes [-Wframe-larger-than=]
fs/f2fs/inode.c:152:1: warning: the frame size of 1424 bytes is larger than 800 bytes [-Wframe-larger-than=]
net/netfilter/ipvs/ip_vs_core.c:1195:1: warning: the frame size of 1068 bytes is larger than 800 bytes [-Wframe-larger-than=]
net/netfilter/ipvs/ip_vs_core.c:395:1: warning: the frame size of 1084 bytes is larger than 800 bytes [-Wframe-larger-than=]
net/netfilter/ipvs/ip_vs_ftp.c:298:1: warning: the frame size of 928 bytes is larger than 800 bytes [-Wframe-larger-than=]
net/netfilter/ipvs/ip_vs_ftp.c:418:1: warning: the frame size of 908 bytes is larger than 800 bytes [-Wframe-larger-than=]
net/netfilter/ipvs/ip_vs_lblcr.c:718:1: warning: the frame size of 960 bytes is larger than 800 bytes [-Wframe-larger-than=]
drivers/net/xen-netback/netback.c:1500:1: warning: the frame size of 1088 bytes is larger than 800 bytes [-Wframe-larger-than=]
In case of ARC and CRIS, it turns out that the BUG() implementation
actually does return (or at least the compiler thinks it does),
resulting in lots of warnings about uninitialized variable use and
leaving noreturn functions, such as:
block/cfq-iosched.c: In function 'cfq_async_queue_prio':
block/cfq-iosched.c:3804:1: error: control reaches end of non-void function [-Werror=return-type]
include/linux/dmaengine.h: In function 'dma_maxpq':
include/linux/dmaengine.h:1123:1: error: control reaches end of non-void function [-Werror=return-type]
This makes them call __builtin_trap() instead, which should normally
dump the stack and kill the current process, like some of the other
architectures already do.
I tried adding barrier_before_unreachable() to panic() and
fortify_panic() as well, but that had very little effect, so I'm not
submitting that patch.
Vineet said:
: For ARC, it is double win.
:
: 1. Fixes 3 -Wreturn-type warnings
:
: | ../net/core/ethtool.c:311:1: warning: control reaches end of non-void function
: [-Wreturn-type]
: | ../kernel/sched/core.c:3246:1: warning: control reaches end of non-void function
: [-Wreturn-type]
: | ../include/linux/sunrpc/svc_xprt.h:180:1: warning: control reaches end of
: non-void function [-Wreturn-type]
:
: 2. bloat-o-meter reports code size improvements as gcc elides the
: generated code for stack return.
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82365
Link: http://lkml.kernel.org/r/20171219114112.939391-1-arnd@arndb.de
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Vineet Gupta <vgupta@synopsys.com> [arch/arc]
Tested-by: Vineet Gupta <vgupta@synopsys.com> [arch/arc]
Cc: Mikael Starvik <starvik@axis.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Christopher Li <sparse@chrisli.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
For boot-time switching between 4- and 5-level paging we need to be able
to fold p4d page table level at runtime. It requires variable
PGDIR_SHIFT and PTRS_PER_P4D.
The change doesn't affect the kernel image size much:
text data bss dec hex filename
8628091 4734304 1368064 14730459 e0c4db vmlinux.before
8628393 4734340 1368064 14730797 e0c62d vmlinux.after
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180214111656.88514-7-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
A test_and_{}_bit() operation fails if the value of the bit is such that
the modification does not take place. For example, if test_and_set_bit()
returns 1. In these cases, follow the behaviour of cmpxchg and allow the
operation to be unordered. This also applies to test_and_set_bit_lock()
if the lock is found to be be taken already.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1518528619-20049-1-git-send-email-will.deacon@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This tag contains the fixes we'd like to target for the 4.16 merge
window. It's not as much as I was originally hoping to do but between
glibc, the chip, and FOSDEM there just wasn't enough time to get
everything put together. As such, this merge window is essentially just
going to be small changes. This includes mostly cleanups:
* A build fix failure to the audit test cases. RISC-V doesn't have
renameat because the generic syscall ABI moved to renameat2 by the
time of our port. The syscall audit test cases don't understand this,
so I added a trivial fix. This went through mailing list review
during the 4.15 merge window, but nobody has picked it up so I think
it's best to just do this here.
* The removal of our command-line argument processing code. The
"mem_end" stuff was broken and the rest duplicated generic device tree
code. The generic code was already being called.
* Some unused/redundant code has been removed, including
__ARCH_HAVE_MMU, current_pgdir, and the initialization of init_mm.pgd.
* SUM is disabled upon taking a trap, which means that user memory is
protected during traps taking inside copy_{to,from}_user().
* The sptbr CSR has been renamed to satp in C code. We haven't changed
the assembly code in order to maintain compatibility with binutils
2.29, which doesn't understand the new name.
Additionally, we're adding some new features:
* Basic ftrace support, thanks to Alan Kao!
* Support for ZONE_DMA32. This is necessary for all the normal reasons,
but also to deal with a deficiency in the Xilinx PCIe controller we're
using on our FPGA-based systems. While the ZONE_DMA32 addition should
be sufficient for most uses, it doesn't complete the fix for the
Xilinx controller.
* TLB shootdowns now only target the harts where they're necessary,
instead of applying to all harts in the system.
These patches have all been sitting on our linux-next branch for a while
now. Due to time constraints this is all I feel comfortable submitting
during the 4.16 merge window, hopefully we'll do better next time!
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEAM520YNJYN/OiG3470yhUCzLq0EFAlp7N2gTHHBhbG1lckBk
YWJiZWx0LmNvbQAKCRDvTKFQLMurQX8kD/4xxw6TuuESmDXxAQPQ+S8J98uKRfAF
9kMMzJJARcW5sT1vo3pKpE8+Ss0Hy2fIcaYsw5Je/Yl7vdAy/Dk7X3/mx7mxf5BP
8m2cSd7DFLLLhntZTbr1Y5fJ6awFLtzI46zn/SzTdTatLWKXNLS5wmPKE33ddq/C
iTi4k/as8E/vuNtuPy1GsOF0gICpZ2xB4YoMwTgWfpxTekBkUktO3EOHmZTwQEEM
U1muB+4WoqusbBt6cP3Q7cUF3b6aMVSevWnywZGkD+yWOGRXTVzMgT7R4YlKEOre
OQypZocYUbRmZQMZACKpgHIcOZpePaSTIQ2zzhXEPVGB0XAHtMRnAaVtwPxwG6c4
EThDCN9ldShutKqT4XilHrh5gf0sy7qG0PIidPhMmXH9LCeTSAU4VdISJP1jkq19
chiMHlf6+/DhikyiH0+lK/MX8vQMt6UJL1SlRKO/c2FxxKAZKnENJ+tuAlkAlwoC
gnvZsE5BUYw1ptRHXR0d5C4m8M2M9LPZfpWYcg+1mRO9EA+kt0XCupL7RsrdFuoa
FCVEhP/JMaiX0JtmAHfVIU0yNGjH3b5xi3FoGk2Aoj/c8O3F5YcwT5C5nO+jpv32
n9vyMR20/721+yA2dFIlq4DnelwdZczOTqrcDYJrLxXzk8OXUFFffbe4kbDCxp34
WniBxwnY9BF25g==
=cNRH
-----END PGP SIGNATURE-----
Merge tag 'riscv-for-linus-4.16-merge_window' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux
Pull RISC-V updates from Palmer Dabbelt:
"This contains the fixes we'd like to target for the 4.16 merge window.
It's not as much as I was originally hoping to do but between glibc,
the chip, and FOSDEM there just wasn't enough time to get everything
put together. As such, this merge window is essentially just going to
be small changes. This includes mostly cleanups:
- A build fix failure to the audit test cases.
RISC-V doesn't have renameat because the generic syscall ABI moved
to renameat2 by the time of our port. The syscall audit test cases
don't understand this, so I added a trivial fix. This went through
mailing list review during the 4.15 merge window, but nobody has
picked it up so I think it's best to just do this here.
- The removal of our command-line argument processing code. The
"mem_end" stuff was broken and the rest duplicated generic device
tree code. The generic code was already being called.
- Some unused/redundant code has been removed, including
__ARCH_HAVE_MMU, current_pgdir, and the initialization of
init_mm.pgd.
- SUM is disabled upon taking a trap, which means that user memory is
protected during traps taking inside copy_{to,from}_user().
- The sptbr CSR has been renamed to satp in C code. We haven't
changed the assembly code in order to maintain compatibility with
binutils 2.29, which doesn't understand the new name.
Additionally, we're adding some new features:
- Basic ftrace support, thanks to Alan Kao!
- Support for ZONE_DMA32.
This is necessary for all the normal reasons, but also to deal with
a deficiency in the Xilinx PCIe controller we're using on our
FPGA-based systems. While the ZONE_DMA32 addition should be
sufficient for most uses, it doesn't complete the fix for the
Xilinx controller.
- TLB shootdowns now only target the harts where they're necessary,
instead of applying to all harts in the system.
These patches have all been sitting on our linux-next branch for a
while now. Due to time constraints this is all I feel comfortable
submitting during the 4.16 merge window, hopefully we'll do better
next time!"
[ Note to self: "harts" is RISC-V speak for "hardware threads". I had
to look that up. - Linus ]
* tag 'riscv-for-linus-4.16-merge_window' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux:
riscv: inline set_pgdir into its only caller
riscv: rename sptbr to satp
riscv: don't read back satp in paging_init
riscv: remove the unused current_pgdir function
riscv: add ZONE_DMA32
RISC-V: Limit the scope of TLB shootdowns
riscv: disable SUM in the exception handler
riscv: remove redundant unlikely()
riscv: remove unused __ARCH_HAVE_MMU define
riscv/ftrace: Add basic support
RISC-V: Remove mem_end command line processing
RISC-V: Remove duplicate command-line parsing logic
audit: Avoid build failures on systems without renameat
Pull locking fixlets from Ingo Molnar:
"An endianness fix and a jump labels branch hint update"
* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/qrwlock: include asm/byteorder.h as needed
jump_label: Add branch hints to static_branch_{un,}likely()
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJad5lgAAoJEFmIoMA60/r8s2kQAI3PztawDpaCP9Z12pkbBHSt
Ho0xTyk9rCZi9kQJbNjc+a+QrlA3QmTHXIXerB3LSWoh7M+XhsECjem92eHpgLNS
JvYPhTfOrCr0vdiAmOz6hD0AqN/psrbfzgiJhSwomsGEFS77k7kERSJckRv81sxb
Aj5F/WjucAgLorwm4auveAJEQ7atE7/6pkXzoqYm4G6NLOb46jUcRGndrnvXZBlz
fws8fBM4BHyi7i25CYQl24tFq1CGax1rIPgLg+4KnH76bQk/N6Ju0sGVSzfh+hG8
SIerK9bJbzGRAuNKoxB3aO1dyzsK3x9WztE2mG98w5trOISPIR1FqnvC/225FWAU
d6eIXiC7wKnEx+DElNTzCjzfHc7SAJoupO32H7CoiTe5zPUlWlxJ1zLYkK1gt50q
m8PRBiYTglxyznzrO0drtcdjEzvbdZNRrsYnul4wi1vSHzjk6F6XLtzT10XWM1M1
1pXLB8384FTj0Hu4bq6Y3Aivkmz0Sf+eQM2NaOwe+Zj7/1VV0d3lvi4LUXkqzLCA
FoXPJSMxG2Qu+iflCeYRQBJjExaZH3eNLZ3dT6QpcJrjaFVedd9u5DeeFqNL27zV
bhr8TdqrR4p4rc8EBAGoCapw96IxLZROKB3gxbrZVOpfIZpzthwHbElHX6aqUgF4
w/EV1JWs36WXWaxFk8wd
=ttq9
-----END PGP SIGNATURE-----
Merge tag 'pci-v4.16-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI updates from Bjorn Helgaas:
- skip AER driver error recovery callbacks for correctable errors
reported via ACPI APEI, as we already do for errors reported via the
native path (Tyler Baicar)
- fix DPC shared interrupt handling (Alex Williamson)
- print full DPC interrupt number (Keith Busch)
- enable DPC only if AER is available (Keith Busch)
- simplify DPC code (Bjorn Helgaas)
- calculate ASPM L1 substate parameter instead of hardcoding it (Bjorn
Helgaas)
- enable Latency Tolerance Reporting for ASPM L1 substates (Bjorn
Helgaas)
- move ASPM internal interfaces out of public header (Bjorn Helgaas)
- allow hot-removal of VGA devices (Mika Westerberg)
- speed up unplug and shutdown by assuming Thunderbolt controllers
don't support Command Completed events (Lukas Wunner)
- add AtomicOps support for GPU and Infiniband drivers (Felix Kuehling,
Jay Cornwall)
- expose "ari_enabled" in sysfs to help NIC naming (Stuart Hayes)
- clean up PCI DMA interface usage (Christoph Hellwig)
- remove PCI pool API (replaced with DMA pool) (Romain Perier)
- deprecate pci_get_bus_and_slot(), which assumed PCI domain 0 (Sinan
Kaya)
- move DT PCI code from drivers/of/ to drivers/pci/ (Rob Herring)
- add PCI-specific wrappers for dev_info(), etc (Frederick Lawler)
- remove warnings on sysfs mmap failure (Bjorn Helgaas)
- quiet ROM validation messages (Alex Deucher)
- remove redundant memory alloc failure messages (Markus Elfring)
- fill in types for compile-time VGA and other I/O port resources
(Bjorn Helgaas)
- make "pci=pcie_scan_all" work for Root Ports as well as Downstream
Ports to help AmigaOne X1000 (Bjorn Helgaas)
- add SPDX tags to all PCI files (Bjorn Helgaas)
- quirk Marvell 9128 DMA aliases (Alex Williamson)
- quirk broken INTx disable on Ceton InfiniTV4 (Bjorn Helgaas)
- fix CONFIG_PCI=n build by adding dummy pci_irqd_intx_xlate() (Niklas
Cassel)
- use DMA API to get MSI address for DesignWare IP (Niklas Cassel)
- fix endpoint-mode DMA mask configuration (Kishon Vijay Abraham I)
- fix ARTPEC-6 incorrect IS_ERR() usage (Wei Yongjun)
- add support for ARTPEC-7 SoC (Niklas Cassel)
- add endpoint-mode support for ARTPEC (Niklas Cassel)
- add Cadence PCIe host and endpoint controller driver (Cyrille
Pitchen)
- handle multiple INTx status bits being set in dra7xx (Vignesh R)
- translate dra7xx hwirq range to fix INTD handling (Vignesh R)
- remove deprecated Exynos PHY initialization code (Jaehoon Chung)
- fix MSI erratum workaround for HiSilicon Hip06/Hip07 (Dongdong Liu)
- fix NULL pointer dereference in iProc BCMA driver (Ray Jui)
- fix Keystone interrupt-controller-node lookup (Johan Hovold)
- constify qcom driver structures (Julia Lawall)
- rework Tegra config space mapping to increase space available for
endpoints (Vidya Sagar)
- simplify Tegra driver by using bus->sysdata (Manikanta Maddireddy)
- remove PCI_REASSIGN_ALL_BUS usage on Tegra (Manikanta Maddireddy)
- add support for Global Fabric Manager Server (GFMS) event to
Microsemi Switchtec switch driver (Logan Gunthorpe)
- add IDs for Switchtec PSX 24xG3 and PSX 48xG3 (Kelvin Cao)
* tag 'pci-v4.16-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (140 commits)
PCI: cadence: Add EndPoint Controller driver for Cadence PCIe controller
dt-bindings: PCI: cadence: Add DT bindings for Cadence PCIe endpoint controller
PCI: endpoint: Fix EPF device name to support multi-function devices
PCI: endpoint: Add the function number as argument to EPC ops
PCI: cadence: Add host driver for Cadence PCIe controller
dt-bindings: PCI: cadence: Add DT bindings for Cadence PCIe host controller
PCI: Add vendor ID for Cadence
PCI: Add generic function to probe PCI host controllers
PCI: generic: fix missing call of pci_free_resource_list()
PCI: OF: Add generic function to parse and allocate PCI resources
PCI: Regroup all PCI related entries into drivers/pci/Makefile
PCI/DPC: Reformat DPC register definitions
PCI/DPC: Add and use DPC Status register field definitions
PCI/DPC: Squash dpc_rp_pio_get_info() into dpc_process_rp_pio_error()
PCI/DPC: Remove unnecessary RP PIO register structs
PCI/DPC: Push dpc->rp_pio_status assignment into dpc_rp_pio_get_info()
PCI/DPC: Squash dpc_rp_pio_print_error() into dpc_rp_pio_get_info()
PCI/DPC: Make RP PIO log size check more generic
PCI/DPC: Rename local "status" to "dpc_status"
PCI/DPC: Squash dpc_rp_pio_print_tlp_header() into dpc_rp_pio_print_error()
...
Moving the qrwlock struct definition into a header file introduced
a subtle bug on all little-endian machines, where some files in some
configurations would see the fields in an incorrect order. This was
found by building with an LTO enabled compiler that warns every time we
try to link together files with incompatible data structures.
A second patch changes linux/kconfig.h to always define the symbols,
but this seems to be the root cause of most of the issues, so I'd suggest
we do both.
On a current linux-next kernel, I verified that this header is
responsible for all type mismatches as a result from the endianess
confusion.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Babu Moger <babu.moger@oracle.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nicolas Pitre <nico@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Fixes: e0d02285f1 ("locking/qrwlock: Use 'struct qrwlock' instead of 'struct __qrwlock'")
Link: http://lkml.kernel.org/r/20180202154104.1522809-1-arnd@arndb.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
to the clk rate protection support added by Jerome Brunet. This feature
will allow consumers to lock in a certain rate on the output of a clk so
that things like audio playback don't hear pops when the clk frequency
changes due to shared parent clks changing rates. Currently the clk
API doesn't guarantee the rate of a clk stays at the rate you request
after clk_set_rate() is called, so this new API will allow drivers
to express that requirement. Beyond this, the core got some debugfs
pretty printing patches and a couple minor non-critical fixes.
Looking outside of the core framework diff we have some new driver
additions and the removal of a legacy TI clk driver. Both of these hit
high in the dirstat. Also, the removal of the asm-generic/clkdev.h file
causes small one-liners in all the architecture Kbuild files. Overall, the
driver diff seems to be the normal stuff that comes all the time to
fix little problems here and there and to support new hardware.
Core:
- Clk rate protection
- Symbolic clk flags in debugfs output
- Clk registration enabled clks while doing bookkeeping updates
New Drivers:
- Spreadtrum SC9860
- HiSilicon hi3660 stub
- Qualcomm A53 PLL, SPMI clkdiv, and MSM8916 APCS
- Amlogic Meson-AXG
- ASPEED BMC
Removed Drivers:
- TI OMAP 3xxx legacy clk (non-DT) support
- asm*/clkdev.h got removed (not really a driver)
Updates:
- Renesas FDP1-0 module clock on R-Car M3-W
- Renesas LVDS module clock on R-Car V3M
- Misc fixes to pr_err() prints
- Qualcomm MSM8916 audio fixes
- Qualcomm IPQ8074 rounded out support for more peripherals
- Qualcomm Alpha PLL variants
- Divider code was using container_of() on bad pointers
- Allwinner DE2 clks on H3
- Amlogic minor data fixes and dropping of CLK_IGNORE_UNUSED
- Mediatek clk driver compile test support
- AT91 PMC clk suspend/resume restoration support
- PLL issues fixed on si5351
- Broadcom IProc PLL calculation updates
- DVFS support for Armada mvebu CPU clks
- Allwinner fixed post-divider support
- TI clkctrl fixes and support for newer SoCs
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABCAAGBQJac5vRAAoJEK0CiJfG5JUlUaIP/Riq0tbApfc4k4GMvSvaieR/
AwZFIMCxOxO+KGdUsBWj7UUoDfBYmxyknHZkVUA/m+Lm7cRH/YHHMghEceZLaBYW
zPQmDfkTl/QkwysXZMCw9vg4vO0tt5gWbHljQnvVhxVVTCkIRpaE8Vkktj1RZzpY
WU/TkvPbVGY3SNm504TRXKWC9KpMTEXVvzqlg6zLDJ/jE7PGzBKtewqMoLDCBH2L
q6b50BSXDo2Hep0vm6e5xneXKjLNR4kgN4PkbM4Yoi4iWLLbgAu79NfyOvvr/imS
HxOHRms9tejtyaiR6bQSF0pbLOERZ3QSbMFEbxdxnCTuPEfy3Nw/2W7mNJlhJa8g
EGLMnLL4WdloL4Z83dAcMrj9OmxYf7Yobf5dMidLrQT5EYuafdj0ParbI8TQpWSB
eTqaffSUGPE/7xuKouYBcbvocpXXWCcokrP/mEn3OEHXkIeeut1Jd3RmEvsi3gtJ
pNraJTIpvt4c05rj6yLUOhWfyqlA+fH3p4Fx3rrH1tmKEiG+lrhKoxF26uALZe0V
OvarhG+LPIE10pCIYlQjZjQVnYLGCxsGAIoK1uz7VYvFPh2T0cxQlzzeqFgrlTyN
32hMj3LhkQw82FG9xZqjTX1935R35mySRlx63x7HStI1YFief2X9+RHjJR/lofG0
nC0JWTp5sC/pKf54QBXj
=bGPp
-----END PGP SIGNATURE-----
Merge tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull clk updates from Stephen Boyd:
"The core framework has a handful of patches this time around, mostly
due to the clk rate protection support added by Jerome Brunet.
This feature will allow consumers to lock in a certain rate on the
output of a clk so that things like audio playback don't hear pops
when the clk frequency changes due to shared parent clks changing
rates. Currently the clk API doesn't guarantee the rate of a clk stays
at the rate you request after clk_set_rate() is called, so this new
API will allow drivers to express that requirement.
Beyond this, the core got some debugfs pretty printing patches and a
couple minor non-critical fixes.
Looking outside of the core framework diff we have some new driver
additions and the removal of a legacy TI clk driver. Both of these hit
high in the dirstat. Also, the removal of the asm-generic/clkdev.h
file causes small one-liners in all the architecture Kbuild files.
Overall, the driver diff seems to be the normal stuff that comes all
the time to fix little problems here and there and to support new
hardware.
Summary:
Core:
- Clk rate protection
- Symbolic clk flags in debugfs output
- Clk registration enabled clks while doing bookkeeping updates
New Drivers:
- Spreadtrum SC9860
- HiSilicon hi3660 stub
- Qualcomm A53 PLL, SPMI clkdiv, and MSM8916 APCS
- Amlogic Meson-AXG
- ASPEED BMC
Removed Drivers:
- TI OMAP 3xxx legacy clk (non-DT) support
- asm*/clkdev.h got removed (not really a driver)
Updates:
- Renesas FDP1-0 module clock on R-Car M3-W
- Renesas LVDS module clock on R-Car V3M
- Misc fixes to pr_err() prints
- Qualcomm MSM8916 audio fixes
- Qualcomm IPQ8074 rounded out support for more peripherals
- Qualcomm Alpha PLL variants
- Divider code was using container_of() on bad pointers
- Allwinner DE2 clks on H3
- Amlogic minor data fixes and dropping of CLK_IGNORE_UNUSED
- Mediatek clk driver compile test support
- AT91 PMC clk suspend/resume restoration support
- PLL issues fixed on si5351
- Broadcom IProc PLL calculation updates
- DVFS support for Armada mvebu CPU clks
- Allwinner fixed post-divider support
- TI clkctrl fixes and support for newer SoCs"
* tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: (125 commits)
clk: aspeed: Handle inverse polarity of USB port 1 clock gate
clk: aspeed: Fix return value check in aspeed_cc_init()
clk: aspeed: Add reset controller
clk: aspeed: Register gated clocks
clk: aspeed: Add platform driver and register PLLs
clk: aspeed: Register core clocks
clk: Add clock driver for ASPEED BMC SoCs
clk: mediatek: adjust dependency of reset.c to avoid unexpectedly being built
clk: fix reentrancy of clk_enable() on UP systems
clk: meson-axg: fix potential NULL dereference in axg_clkc_probe()
clk: Simplify debugfs registration
clk: Fix debugfs_create_*() usage
clk: Show symbolic clock flags in debugfs
clk: renesas: r8a7796: Add FDP clock
clk: Move __clk_{get,put}() into private clk.h API
clk: sunxi: Use CLK_IS_CRITICAL flag for critical clks
clk: Improve flags doc for of_clk_detect_critical()
arch: Remove clkdev.h asm-generic from Kbuild
clk: sunxi-ng: a83t: Add M divider to TCON1 clock
clk: Prepare to remove asm-generic/clkdev.h
...
Pull printk updates from Petr Mladek:
- Add a console_msg_format command line option:
The value "default" keeps the old "[time stamp] text\n" format. The
value "syslog" allows to see the syslog-like "<log
level>[timestamp] text" format.
This feature was requested by people doing regression tests, for
example, 0day robot. They want to have both filtered and full logs
at hands.
- Reduce the risk of softlockup:
Pass the console owner in a busy loop.
This is a new approach to the old problem. It was first proposed by
Steven Rostedt on Kernel Summit 2017. It marks a context in which
the console_lock owner calls console drivers and could not sleep.
On the other side, printk() callers could detect this state and use
a busy wait instead of a simple console_trylock(). Finally, the
console_lock owner checks if there is a busy waiter at the end of
the special context and eventually passes the console_lock to the
waiter.
The hand-off works surprisingly well and helps in many situations.
Well, there is still a possibility of the softlockup, for example,
when the flood of messages stops and the last owner still has too
much to flush.
There is increasing number of people having problems with
printk-related softlockups. We might eventually need to get better
solution. Anyway, this looks like a good start and promising
direction.
- Do not allow to schedule in console_unlock() called from printk():
This reverts an older controversial commit. The reschedule helped
to avoid softlockups. But it also slowed down the console output.
This patch is obsoleted by the new console waiter logic described
above. In fact, the reschedule made the hand-off less effective.
- Deprecate "%pf" and "%pF" format specifier:
It was needed on ia64, ppc64 and parisc64 to dereference function
descriptors and show the real function address. It is done
transparently by "%ps" and "pS" format specifier now.
Sergey Senozhatsky found that all the function descriptors were in
a special elf section and could be easily detected.
- Remove printk_symbol() API:
It has been obsoleted by "%pS" format specifier, and this change
helped to remove few continuous lines and a less intuitive old API.
- Remove redundant memsets:
Sergey removed unnecessary memset when processing printk.devkmsg
command line option.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk: (27 commits)
printk: drop redundant devkmsg_log_str memsets
printk: Never set console_may_schedule in console_trylock()
printk: Hide console waiter logic into helpers
printk: Add console owner and waiter logic to load balance console writes
kallsyms: remove print_symbol() function
checkpatch: add pF/pf deprecation warning
symbol lookup: introduce dereference_symbol_descriptor()
parisc64: Add .opd based function descriptor dereference
powerpc64: Add .opd based function descriptor dereference
ia64: Add .opd based function descriptor dereference
sections: split dereference_function_descriptor()
openrisc: Fix conflicting types for _exext and _stext
lib: do not use print_symbol()
irq debug: do not use print_symbol()
sysfs: do not use print_symbol()
drivers: do not use print_symbol()
x86: do not use print_symbol()
unicore32: do not use print_symbol()
sh: do not use print_symbol()
mn10300: do not use print_symbol()
...
Merge updates from Andrew Morton:
- misc fixes
- ocfs2 updates
- most of MM
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (118 commits)
mm: remove PG_highmem description
tools, vm: new option to specify kpageflags file
mm/swap.c: make functions and their kernel-doc agree
mm, memory_hotplug: fix memmap initialization
mm: correct comments regarding do_fault_around()
mm: numa: do not trap faults on shared data section pages.
hugetlb, mbind: fall back to default policy if vma is NULL
hugetlb, mempolicy: fix the mbind hugetlb migration
mm, hugetlb: further simplify hugetlb allocation API
mm, hugetlb: get rid of surplus page accounting tricks
mm, hugetlb: do not rely on overcommit limit during migration
mm, hugetlb: integrate giga hugetlb more naturally to the allocation path
mm, hugetlb: unify core page allocation accounting and initialization
mm/memcontrol.c: try harder to decrease [memory,memsw].limit_in_bytes
mm/memcontrol.c: make local symbol static
mm/hmm: fix uninitialized use of 'entry' in hmm_vma_walk_pmd()
include/linux/mmzone.h: fix explanation of lower bits in the SPARSEMEM mem_map pointer
mm/compaction.c: fix comment for try_to_compact_pages()
mm/page_ext.c: make page_ext_init a noop when CONFIG_PAGE_EXTENSION but nothing uses it
zsmalloc: use U suffix for negative literals being shifted
...
Instead of marking the pmd ready for split, invalidate the pmd. This
should take care of powerpc requirement. Only side effect is that we
mark the pmd invalid early. This can result in us blocking access to
the page a bit longer if we race against a thp split.
[kirill.shutemov@linux.intel.com: rebased, dirty THP once]
Link: http://lkml.kernel.org/r/20171213105756.69879-13-kirill.shutemov@linux.intel.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Daney <david.daney@cavium.com>
Cc: David Miller <davem@davemloft.net>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Nitin Gupta <nitin.m.gupta@oracle.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vlastimil noted that pmdp_invalidate() is not atomic and we can lose
dirty and access bits if CPU sets them after pmdp dereference, but
before set_pmd_at().
The patch change pmdp_invalidate() to make the entry non-present
atomically and return previous value of the entry. This value can be
used to check if CPU set dirty/accessed bits under us.
The race window is very small and I haven't seen any reports that can be
attributed to the bug. For this reason, I don't think backporting to
stable trees needed.
Link: http://lkml.kernel.org/r/20171213105756.69879-11-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Hugh Dickins <hughd@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Daney <david.daney@cavium.com>
Cc: David Miller <davem@davemloft.net>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Nitin Gupta <nitin.m.gupta@oracle.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "Do not lose dirty bit on THP pages", v4.
Vlastimil noted that pmdp_invalidate() is not atomic and we can lose
dirty and access bits if CPU sets them after pmdp dereference, but
before set_pmd_at().
The bug can lead to data loss, but the race window is tiny and I haven't
seen any reports that suggested that it happens in reality. So I don't
think it worth sending it to stable.
Unfortunately, there's no way to address the issue in a generic way. We
need to fix all architectures that support THP one-by-one.
All architectures that have THP supported have to provide atomic
pmdp_invalidate() that returns previous value.
If generic implementation of pmdp_invalidate() is used, architecture
needs to provide atomic pmdp_estabish().
pmdp_estabish() is not used out-side generic implementation of
pmdp_invalidate() so far, but I think this can change in the future.
This patch (of 12):
This is an implementation of pmdp_establish() that is only suitable for
an architecture that doesn't have hardware dirty/accessed bits. In this
case we can't race with CPU which sets these bits and non-atomic
approach is fine.
Link: http://lkml.kernel.org/r/20171213105756.69879-2-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Daney <david.daney@cavium.com>
Cc: David Miller <davem@davemloft.net>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Nitin Gupta <nitin.m.gupta@oracle.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull networking updates from David Miller:
1) Significantly shrink the core networking routing structures. Result
of http://vger.kernel.org/~davem/seoul2017_netdev_keynote.pdf
2) Add netdevsim driver for testing various offloads, from Jakub
Kicinski.
3) Support cross-chip FDB operations in DSA, from Vivien Didelot.
4) Add a 2nd listener hash table for TCP, similar to what was done for
UDP. From Martin KaFai Lau.
5) Add eBPF based queue selection to tun, from Jason Wang.
6) Lockless qdisc support, from John Fastabend.
7) SCTP stream interleave support, from Xin Long.
8) Smoother TCP receive autotuning, from Eric Dumazet.
9) Lots of erspan tunneling enhancements, from William Tu.
10) Add true function call support to BPF, from Alexei Starovoitov.
11) Add explicit support for GRO HW offloading, from Michael Chan.
12) Support extack generation in more netlink subsystems. From Alexander
Aring, Quentin Monnet, and Jakub Kicinski.
13) Add 1000BaseX, flow control, and EEE support to mvneta driver. From
Russell King.
14) Add flow table abstraction to netfilter, from Pablo Neira Ayuso.
15) Many improvements and simplifications to the NFP driver bpf JIT,
from Jakub Kicinski.
16) Support for ipv6 non-equal cost multipath routing, from Ido
Schimmel.
17) Add resource abstration to devlink, from Arkadi Sharshevsky.
18) Packet scheduler classifier shared filter block support, from Jiri
Pirko.
19) Avoid locking in act_csum, from Davide Caratti.
20) devinet_ioctl() simplifications from Al viro.
21) More TCP bpf improvements from Lawrence Brakmo.
22) Add support for onlink ipv6 route flag, similar to ipv4, from David
Ahern.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1925 commits)
tls: Add support for encryption using async offload accelerator
ip6mr: fix stale iterator
net/sched: kconfig: Remove blank help texts
openvswitch: meter: Use 64-bit arithmetic instead of 32-bit
tcp_nv: fix potential integer overflow in tcpnv_acked
r8169: fix RTL8168EP take too long to complete driver initialization.
qmi_wwan: Add support for Quectel EP06
rtnetlink: enable IFLA_IF_NETNSID for RTM_NEWLINK
ipmr: Fix ptrdiff_t print formatting
ibmvnic: Wait for device response when changing MAC
qlcnic: fix deadlock bug
tcp: release sk_frag.page in tcp_disconnect
ipv4: Get the address of interface correctly.
net_sched: gen_estimator: fix lockdep splat
net: macb: Handle HRESP error
net/mlx5e: IPoIB, Fix copy-paste bug in flow steering refactoring
ipv6: addrconf: break critical section in addrconf_verify_rtnl()
ipv6: change route cache aging logic
i40e/i40evf: Update DESC_NEEDED value to reflect larger value
bnxt_en: cleanup DIM work on device shutdown
...
This pull requests contains a consolidation of the generic no-IOMMU code,
a well as the glue code for swiotlb. All the code is based on the x86
implementation with hooks to allow all architectures that aren't cache
coherent to use it. The x86 conversion itself has been deferred because
the x86 maintainers were a little busy in the last months.
-----BEGIN PGP SIGNATURE-----
iQI/BAABCAApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAlpxcVoLHGhjaEBsc3Qu
ZGUACgkQD55TZVIEUYN/Lw/+Je9teM4NPQ8lU/ncbJN/bUzCFGJ6dFt2eVX/6xs3
sfl8vBdeHt6CBM02rRNecEr31z3+orjQes5JnlEJFYeG3jumV0zCPw/zbxqjzbJ1
3n6cckLxbxzy8Ca1G/BVjHLAUX5eWp1ujn/Q4d03VKVQZhJvFYlqDbP3TrNVx7xn
k86u37p/o+ngjwX66UdZ3C4iIBF8zqy6n2kkpv4HUQtHHzPwEvliN39eNilovb56
iGOzjDX1UWHAu4xCTVnPHSG4fA4XU41NWzIN3DIVPE25lYSISSl9TFAdR8GeZA0G
0Yj6sW53pRSoUwco1ocoS44/FgrPOB5/vHIL06pABvicXBiomje1QylqcK7zAczk
esjkfPEZrmZuu99GtqFyDNKEvKKdy+aBGaTZ3y+NxsuBs+0xS2Owz1IE4Tk28xaw
xh7zn+CVdk2fJh6ZIdw5Eu9b9VN08UriqDmDzO/ylDlcNGcDi7wcxiSTEkHJ1ON/
g9nletV6f3egL0wljDcOnhCJCHTvmWEeq3z8lE55QzPzSH0hHpnGQ2WD0tKrroxz
kjOZp0TdXa4F5iysOHe2xl2sftOH0zIkBQJ+oBcK12mTaLu21+yeuCggQXJ/CBdk
1Ol7l9g9T0TDuZPfiTHt5+6jmECQs92LElWA8x7uF7Fpix3BpnafWaaSMSsosF3F
D1Y=
=Nrl9
-----END PGP SIGNATURE-----
Merge tag 'dma-mapping-4.16' of git://git.infradead.org/users/hch/dma-mapping
Pull dma mapping updates from Christoph Hellwig:
"Except for a runtime warning fix from Christian this is all about
consolidation of the generic no-IOMMU code, a well as the glue code
for swiotlb.
All the code is based on the x86 implementation with hooks to allow
all architectures that aren't cache coherent to use it.
The x86 conversion itself has been deferred because the x86
maintainers were a little busy in the last months"
* tag 'dma-mapping-4.16' of git://git.infradead.org/users/hch/dma-mapping: (57 commits)
MAINTAINERS: add the iommu list for swiotlb and xen-swiotlb
arm64: use swiotlb_alloc and swiotlb_free
arm64: replace ZONE_DMA with ZONE_DMA32
mips: use swiotlb_{alloc,free}
mips/netlogic: remove swiotlb support
tile: use generic swiotlb_ops
tile: replace ZONE_DMA with ZONE_DMA32
unicore32: use generic swiotlb_ops
ia64: remove an ifdef around the content of pci-dma.c
ia64: clean up swiotlb support
ia64: use generic swiotlb_ops
ia64: replace ZONE_DMA with ZONE_DMA32
swiotlb: remove various exports
swiotlb: refactor coherent buffer allocation
swiotlb: refactor coherent buffer freeing
swiotlb: wire up ->dma_supported in swiotlb_dma_ops
swiotlb: add common swiotlb_map_ops
swiotlb: rename swiotlb_free to swiotlb_exit
x86: rename swiotlb_dma_ops
powerpc: rename swiotlb_dma_ops
...
renameat has been deprecated in favor of renameat2 for new ports. This
allows the audit tests to build on RISC-V.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
Add SPDX GPL-2.0+ to all PCI files that specified the GPL and allowed
either GPL version 2 or any later version.
Remove the boilerplate GPL version 2 or later language, relying on the
assertion in b24413180f ("License cleanup: add SPDX GPL-2.0 license
identifier to files with no license") that the SPDX identifier may be used
instead of the full boilerplate text.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The trivial direct mapping implementation already does a virtual to
physical translation which isn't strictly a noop, and will soon learn
to do non-direct but linear physical to dma translations through the
device offset and a few small tricks. Rename it to a better fitting
name.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Vladimir Murzin <vladimir.murzin@arm.com>
For architectures that just use the generic dma_noop_ops we can provide
a generic version of dma-mapping.h.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Add injectable error types for each error-injectable function.
One motivation of error injection test is to find software flaws,
mistakes or mis-handlings of expectable errors. If we find such
flaws by the test, that is a program bug, so we need to fix it.
But if the tester miss input the error (e.g. just return success
code without processing anything), it causes unexpected behavior
even if the caller is correctly programmed to handle any errors.
That is not what we want to test by error injection.
To clarify what type of errors the caller must expect for each
injectable function, this introduces injectable error types:
- EI_ETYPE_NULL : means the function will return NULL if it
fails. No ERR_PTR, just a NULL.
- EI_ETYPE_ERRNO : means the function will return -ERRNO
if it fails.
- EI_ETYPE_ERRNO_NULL : means the function will return -ERRNO
(ERR_PTR) or NULL.
ALLOW_ERROR_INJECTION() macro is expanded to get one of
NULL, ERRNO, ERRNO_NULL to record the error type for
each function. e.g.
ALLOW_ERROR_INJECTION(open_ctree, ERRNO)
This error types are shown in debugfs as below.
====
/ # cat /sys/kernel/debug/error_injection/list
open_ctree [btrfs] ERRNO
io_ctl_init [btrfs] ERRNO
====
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Since error-injection framework is not limited to be used
by kprobes, nor bpf. Other kernel subsystems can use it
freely for checking safeness of error-injection, e.g.
livepatch, ftrace etc.
So this separate error-injection framework from kprobes.
Some differences has been made:
- "kprobe" word is removed from any APIs/structures.
- BPF_ALLOW_ERROR_INJECTION() is renamed to
ALLOW_ERROR_INJECTION() since it is not limited for BPF too.
- CONFIG_FUNCTION_ERROR_INJECTION is the config item of this
feature. It is automatically enabled if the arch supports
error injection feature for kprobe or ftrace etc.
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Construct the init thread stack in the linker script rather than doing it
by means of a union so that ia64's init_task.c can be got rid of.
The following symbols are then made available from INIT_TASK_DATA() linker
script macro:
init_thread_union
init_stack
INIT_TASK_DATA() also expands the region to THREAD_SIZE to accommodate the
size of the init stack. init_thread_union is given its own section so that
it can be placed into the stack space in the right order. I'm assuming
that the ia64 ordering is correct and that the task_struct is first and the
thread_info second.
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Tested-by: Will Deacon <will.deacon@arm.com> (arm64)
Tested-by: Palmer Dabbelt <palmer@sifive.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
There are two format specifiers to print out a pointer in symbolic
format: '%pS/%ps' and '%pF/%pf'. On most architectures, the two
mean exactly the same thing, but some architectures (ia64, ppc64,
parisc64) use an indirect pointer for C function pointers, where
the function pointer points to a function descriptor (which in
turn contains the actual pointer to the code). The '%pF/%pf, when
used appropriately, automatically does the appropriate function
descriptor dereference on such architectures.
The "when used appropriately" part is tricky. Basically this is
a subtle ABI detail, specific to some platforms, that made it to
the API level and people can be unaware of it and miss the whole
"we need to dereference the function" business out. [1] proves
that point (note that it fixes only '%pF' and '%pS', there might
be '%pf' and '%ps' cases as well).
It appears that we can handle everything within the affected
arches and make '%pS/%ps' smart enough to retire '%pF/%pf'.
Function descriptors live in .opd elf section and all affected
arches (ia64, ppc64, parisc64) handle it properly for kernel
and modules. So we, technically, can decide if the dereference
is needed by simply looking at the pointer: if it belongs to
.opd section then we need to dereference it.
The kernel and modules have their own .opd sections, obviously,
that's why we need to split dereference_function_descriptor()
and use separate kernel and module dereference arch callbacks.
This patch does the first step, it
a) adds dereference_kernel_function_descriptor() function.
b) adds a weak alias to dereference_module_function_descriptor()
function.
So, for the time being, we will have:
1) dereference_function_descriptor()
A generic function, that simply dereferences the pointer. There is
bunch of places that call it: kgdbts, init/main.c, extable, etc.
2) dereference_kernel_function_descriptor()
A function to call on kernel symbols that does kernel .opd section
address range test.
3) dereference_module_function_descriptor()
A function to call on modules' symbols that does modules' .opd
section address range test.
[1] https://marc.info/?l=linux-kernel&m=150472969730573
Link: http://lkml.kernel.org/r/20171109234830.5067-2-sergey.senozhatsky@gmail.com
To: Fenghua Yu <fenghua.yu@intel.com>
To: Benjamin Herrenschmidt <benh@kernel.crashing.org>
To: Paul Mackerras <paulus@samba.org>
To: Michael Ellerman <mpe@ellerman.id.au>
To: James Bottomley <jejb@parisc-linux.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jessica Yu <jeyu@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: linux-ia64@vger.kernel.org
Cc: linux-parisc@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Tested-by: Tony Luck <tony.luck@intel.com> #ia64
Tested-by: Santosh Sivaraj <santosh@fossix.org> #powerpc
Tested-by: Helge Deller <deller@gmx.de> #parisc64
Signed-off-by: Petr Mladek <pmladek@suse.com>
Now that every architecture is using the generic clkdev.h file
and we no longer include asm/clkdev.h anywhere in the tree, we
can remove it.
Cc: Russell King <linux@armlinux.org.uk>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Cc: <linux-arch@vger.kernel.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k]
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
net/ipv6/ip6_gre.c is a case of parallel adds.
include/trace/events/tcp.h is a little bit more tricky. The removal
of in-trace-macro ifdefs in 'net' paralleled with moving
show_tcp_state_name and friends over to include/trace/events/sock.h
in 'net-next'.
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull x86 PTI preparatory patches from Thomas Gleixner:
"Todays Advent calendar window contains twentyfour easy to digest
patches. The original plan was to have twenty three matching the date,
but a late fixup made that moot.
- Move the cpu_entry_area mapping out of the fixmap into a separate
address space. That's necessary because the fixmap becomes too big
with NRCPUS=8192 and this caused already subtle and hard to
diagnose failures.
The top most patch is fresh from today and cures a brain slip of
that tall grumpy german greybeard, who ignored the intricacies of
32bit wraparounds.
- Limit the number of CPUs on 32bit to 64. That's insane big already,
but at least it's small enough to prevent address space issues with
the cpu_entry_area map, which have been observed and debugged with
the fixmap code
- A few TLB flush fixes in various places plus documentation which of
the TLB functions should be used for what.
- Rename the SYSENTER stack to CPU_ENTRY_AREA stack as it is used for
more than sysenter now and keeping the name makes backtraces
confusing.
- Prevent LDT inheritance on exec() by moving it to arch_dup_mmap(),
which is only invoked on fork().
- Make vysycall more robust.
- A few fixes and cleanups of the debug_pagetables code. Check
PAGE_PRESENT instead of checking the PTE for 0 and a cleanup of the
C89 initialization of the address hint array which already was out
of sync with the index enums.
- Move the ESPFIX init to a different place to prepare for PTI.
- Several code moves with no functional change to make PTI
integration simpler and header files less convoluted.
- Documentation fixes and clarifications"
* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
x86/cpu_entry_area: Prevent wraparound in setup_cpu_entry_area_ptes() on 32bit
init: Invoke init_espfix_bsp() from mm_init()
x86/cpu_entry_area: Move it out of the fixmap
x86/cpu_entry_area: Move it to a separate unit
x86/mm: Create asm/invpcid.h
x86/mm: Put MMU to hardware ASID translation in one place
x86/mm: Remove hard-coded ASID limit checks
x86/mm: Move the CR3 construction functions to tlbflush.h
x86/mm: Add comments to clarify which TLB-flush functions are supposed to flush what
x86/mm: Remove superfluous barriers
x86/mm: Use __flush_tlb_one() for kernel memory
x86/microcode: Dont abuse the TLB-flush interface
x86/uv: Use the right TLB-flush API
x86/entry: Rename SYSENTER_stack to CPU_ENTRY_AREA_entry_stack
x86/doc: Remove obvious weirdnesses from the x86 MM layout documentation
x86/mm/64: Improve the memory map documentation
x86/ldt: Prevent LDT inheritance on exec
x86/ldt: Rework locking
arch, mm: Allow arch_dup_mmap() to fail
x86/vsyscall/64: Warn and fail vsyscall emulation in NATIVE mode
...
init_espfix_bsp() needs to be invoked before the page table isolation
initialization. Move it into mm_init() which is the place where pti_init()
will be added.
While at it get rid of the #ifdeffery and provide proper stub functions.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
In order to sanitize the LDT initialization on x86 arch_dup_mmap() must be
allowed to fail. Fix up all instances.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirsky <luto@kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: dan.j.williams@intel.com
Cc: hughd@google.com
Cc: keescook@google.com
Cc: kirill.shutemov@linux.intel.com
Cc: linux-mm@kvack.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Using BPF we can override kprob'ed functions and return arbitrary
values. Obviously this can be a bit unsafe, so make this feature opt-in
for functions. Simply tag a function with KPROBE_ERROR_INJECT_SYMBOL in
order to give BPF access to that function for error injection purposes.
Signed-off-by: Josef Bacik <jbacik@fb.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
In response to compile breakage introduced by a series that added the
pud_write helper to x86, Stephen notes:
did you consider using the other paradigm:
In arch include files:
#define pud_write pud_write
static inline int pud_write(pud_t pud)
.....
Then in include/asm-generic/pgtable.h:
#ifndef pud_write
tatic inline int pud_write(pud_t pud)
{
....
}
#endif
If you had, then the powerpc code would have worked ... ;-) and many
of the other interfaces in include/asm-generic/pgtable.h are
protected that way ...
Given that some architecture already define pmd_write() as a macro, it's
a net reduction to drop the definition of __HAVE_ARCH_PMD_WRITE.
Link: http://lkml.kernel.org/r/151129126721.37405.13339850900081557813.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Suggested-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Oliver OHalloran <oliveroh@au1.ibm.com>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently only get_user_pages_fast() can safely handle the writable gup
case due to its use of pud_access_permitted() to check whether the pud
entry is writable. In the gup slow path pud_write() is used instead of
pud_access_permitted() and to date it has been unimplemented, just calls
BUG_ON().
kernel BUG at ./include/linux/hugetlb.h:244!
[..]
RIP: 0010:follow_devmap_pud+0x482/0x490
[..]
Call Trace:
follow_page_mask+0x28c/0x6e0
__get_user_pages+0xe4/0x6c0
get_user_pages_unlocked+0x130/0x1b0
get_user_pages_fast+0x89/0xb0
iov_iter_get_pages_alloc+0x114/0x4a0
nfs_direct_read_schedule_iovec+0xd2/0x350
? nfs_start_io_direct+0x63/0x70
nfs_file_direct_read+0x1e0/0x250
nfs_file_read+0x90/0xc0
For now this just implements a simple check for the _PAGE_RW bit similar
to pmd_write. However, this implies that the gup-slow-path check is
missing the extra checks that the gup-fast-path performs with
pud_access_permitted. Later patches will align all checks to use the
'access_permitted' helper if the architecture provides it.
Note that the generic 'access_permitted' helper fallback is the simple
_PAGE_RW check on architectures that do not define the
'access_permitted' helper(s).
[dan.j.williams@intel.com: fix powerpc compile error]
Link: http://lkml.kernel.org/r/151129126165.37405.16031785266675461397.stgit@dwillia2-desk3.amr.corp.intel.com
Link: http://lkml.kernel.org/r/151043109938.2842.14834662818213616199.stgit@dwillia2-desk3.amr.corp.intel.com
Fixes: a00cc7d9dd ("mm, x86: add support for PUD-sized transparent hugepages")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Thomas Gleixner <tglx@linutronix.de> [x86]
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit a7be6e5a7f ("mm: drop useless local parameters of
__register_one_node()") removed the last user of parent_node().
The parent_node() macro in generic situation is unnecessary.
Remove it for cleanup.
Link: http://lkml.kernel.org/r/1504234599-29533-8-git-send-email-douly.fnst@cn.fujitsu.com
Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Prior to v4.11, x86 used warn_slowpath_fmt() for handling WARN()s.
After WARN() was moved to using UD0 on x86, the warning text started
appearing _before_ the "cut here" line. This appears to have been a
long-standing bug on architectures that used __WARN_TAINT, but it didn't
get fixed.
v4.11 and earlier on x86:
------------[ cut here ]------------
WARNING: CPU: 0 PID: 2956 at drivers/misc/lkdtm_bugs.c:65 lkdtm_WARNING+0x21/0x30
This is a warning message
Modules linked in:
v4.12 and later on x86:
This is a warning message
------------[ cut here ]------------
WARNING: CPU: 1 PID: 2982 at drivers/misc/lkdtm_bugs.c:68 lkdtm_WARNING+0x15/0x20
Modules linked in:
With this fix:
------------[ cut here ]------------
This is a warning message
WARNING: CPU: 3 PID: 3009 at drivers/misc/lkdtm_bugs.c:67 lkdtm_WARNING+0x15/0x20
Since the __FILE__ reporting happens as part of the UD0 handler, it
isn't trivial to move the message to after the WARNING line, but at
least we can fix the position of the "cut here" line so all the various
logging tools will start including the actual runtime warning message
again, when they follow the instruction and "cut here".
Link: http://lkml.kernel.org/r/1510100869-73751-4-git-send-email-keescook@chromium.org
Fixes: 9a93848fe7 ("x86/debug: Implement __WARN() using UD0")
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The "cut here" string is used in a few paths. Define it in a single
place.
Link: http://lkml.kernel.org/r/1510100869-73751-3-git-send-email-keescook@chromium.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I like _ONCE warnings because it's guaranteed that they don't flood the
log.
During testing I find it useful to reset the state of the once warnings,
so that I can rerun tests and see if they trigger again, or can
guarantee that a test run always hits the same warnings.
This patch adds a debugfs interface to reset all the _ONCE warnings so
that they appear again:
echo 1 > /sys/kernel/debug/clear_warn_once
This is implemented by putting all the warning booleans into a special
section, and clearing it.
[akpm@linux-foundation.org: coding-style fixes]
Link: http://lkml.kernel.org/r/20171017221455.6740-1-andi@firstfloor.org
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Tested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEEcQCq365ubpQNLgrWVeRaWujKfIoFAloJ+P8UHHBhdWxAcGF1
bC1tb29yZS5jb20ACgkQVeRaWujKfIqbhg//S7a4vVnOz9ttK3lR+w4rXHQnsH4a
h+8X9+FYwtcfXNqfki/MGbmOerh1pqJZstWGVJ1kVizJxjPGsF8ZzcaBXFmlfflu
E6kX17L9n//1A0VZTYKCqqqh4rRPFqNdF0NDySO6p0I6PRZ1xHBWw6f1aIGabDru
ZpwWkiF/PliP3/vEPJkux84ovwc0+vg8DNhwtlN5StT6ty5YhM8KMuYCnaGtdbvC
qBZYWNxOSxdf6Lw7Yq7yeziLa7+yvgUbGCrZDF40T/WDLl/CZILX5KuVkDlgKFl2
wgZz5cesAEKFLalEZ3w2EAhZ7a941BWwkbcR0NJiCznTnUUAOD2EZVpapnSqb5/5
XcOGWKetx2BZvpj1hZ8KJa9Nd4Qr/w0iZoreUDnGeItAepNXLQ3Cu0wTdjPGybn5
N8EtL0FSlnUe6FAva+HXNXmrt9fDo9Vj4X3b32un1c0JiFULCBAb6QO83UG/FCzv
svFjIG4HnrjnEHa3B4pxN083ljSZGJttCjnjErEM0oGDmUCfQ8b9WGvkTJEQpPA6
wEAZWPL5MwwWhwZM+qiq+vFTVNLKFXFZO1aPerw8InANoMIqqkC4M4zaOSW2yZd9
ixrATqmikuBK2mG9msG3lOB+XEs6Yz0qF10VQP0NzdeSKVmoZQROQS9Y1nacmKOC
jWgvyVPQ9jajByM=
=SKna
-----END PGP SIGNATURE-----
Merge tag 'audit-pr-20171113' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit
Pull audit updates from Paul Moore:
"Another relatively small pull request for audit, nine patches total.
The only real new bit of functionality is the patch from Richard which
adds the ability to filter records based on the filesystem type.
The remainder are bug fixes and cleanups; the bug fix highlights
include:
- ensuring that we properly audit init/PID-1 (me)
- allowing the audit daemon to shutdown the kernel/auditd connection
cleanly by setting the audit PID to zero (Steve)"
* tag 'audit-pr-20171113' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
audit: filter PATH records keyed on filesystem magic
Audit: remove unused audit_log_secctx function
audit: Allow auditd to set pid to 0 to end auditing
audit: Add new syscalls to the perm=w filter
audit: use audit_set_enabled() in audit_enable()
audit: convert audit_ever_enabled to a boolean
audit: don't use simple_strtol() anymore
audit: initialize the audit subsystem as early as possible
audit: ensure that 'audit=1' actually enables audit for PID 1
Pull x86 core updates from Ingo Molnar:
"Note that in this cycle most of the x86 topics interacted at a level
that caused them to be merged into tip:x86/asm - but this should be a
temporary phenomenon, hopefully we'll back to the usual patterns in
the next merge window.
The main changes in this cycle were:
Hardware enablement:
- Add support for the Intel UMIP (User Mode Instruction Prevention)
CPU feature. This is a security feature that disables certain
instructions such as SGDT, SLDT, SIDT, SMSW and STR. (Ricardo Neri)
[ Note that this is disabled by default for now, there are some
smaller enhancements in the pipeline that I'll follow up with in
the next 1-2 days, which allows this to be enabled by default.]
- Add support for the AMD SEV (Secure Encrypted Virtualization) CPU
feature, on top of SME (Secure Memory Encryption) support that was
added in v4.14. (Tom Lendacky, Brijesh Singh)
- Enable new SSE/AVX/AVX512 CPU features: AVX512_VBMI2, GFNI, VAES,
VPCLMULQDQ, AVX512_VNNI, AVX512_BITALG. (Gayatri Kammela)
Other changes:
- A big series of entry code simplifications and enhancements (Andy
Lutomirski)
- Make the ORC unwinder default on x86 and various objtool
enhancements. (Josh Poimboeuf)
- 5-level paging enhancements (Kirill A. Shutemov)
- Micro-optimize the entry code a bit (Borislav Petkov)
- Improve the handling of interdependent CPU features in the early
FPU init code (Andi Kleen)
- Build system enhancements (Changbin Du, Masahiro Yamada)
- ... plus misc enhancements, fixes and cleanups"
* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (118 commits)
x86/build: Make the boot image generation less verbose
selftests/x86: Add tests for the STR and SLDT instructions
selftests/x86: Add tests for User-Mode Instruction Prevention
x86/traps: Fix up general protection faults caused by UMIP
x86/umip: Enable User-Mode Instruction Prevention at runtime
x86/umip: Force a page fault when unable to copy emulated result to user
x86/umip: Add emulation code for UMIP instructions
x86/cpufeature: Add User-Mode Instruction Prevention definitions
x86/insn-eval: Add support to resolve 16-bit address encodings
x86/insn-eval: Handle 32-bit address encodings in virtual-8086 mode
x86/insn-eval: Add wrapper function for 32 and 64-bit addresses
x86/insn-eval: Add support to resolve 32-bit address encodings
x86/insn-eval: Compute linear address in several utility functions
resource: Fix resource_size.cocci warnings
X86/KVM: Clear encryption attribute when SEV is active
X86/KVM: Decrypt shared per-cpu variables when SEV is active
percpu: Introduce DEFINE_PER_CPU_DECRYPTED
x86: Add support for changing memory encryption attribute in early boot
x86/io: Unroll string I/O when SEV is active
x86/boot: Add early boot support when running with SEV active
...
Pull core locking updates from Ingo Molnar:
"The main changes in this cycle are:
- Another attempt at enabling cross-release lockdep dependency
tracking (automatically part of CONFIG_PROVE_LOCKING=y), this time
with better performance and fewer false positives. (Byungchul Park)
- Introduce lockdep_assert_irqs_enabled()/disabled() and convert
open-coded equivalents to lockdep variants. (Frederic Weisbecker)
- Add down_read_killable() and use it in the VFS's iterate_dir()
method. (Kirill Tkhai)
- Convert remaining uses of ACCESS_ONCE() to
READ_ONCE()/WRITE_ONCE(). Most of the conversion was Coccinelle
driven. (Mark Rutland, Paul E. McKenney)
- Get rid of lockless_dereference(), by strengthening Alpha atomics,
strengthening READ_ONCE() with smp_read_barrier_depends() and thus
being able to convert users of lockless_dereference() to
READ_ONCE(). (Will Deacon)
- Various micro-optimizations:
- better PV qspinlocks (Waiman Long),
- better x86 barriers (Michael S. Tsirkin)
- better x86 refcounts (Kees Cook)
- ... plus other fixes and enhancements. (Borislav Petkov, Juergen
Gross, Miguel Bernal Marin)"
* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (70 commits)
locking/x86: Use LOCK ADD for smp_mb() instead of MFENCE
rcu: Use lockdep to assert IRQs are disabled/enabled
netpoll: Use lockdep to assert IRQs are disabled/enabled
timers/posix-cpu-timers: Use lockdep to assert IRQs are disabled/enabled
sched/clock, sched/cputime: Use lockdep to assert IRQs are disabled/enabled
irq_work: Use lockdep to assert IRQs are disabled/enabled
irq/timings: Use lockdep to assert IRQs are disabled/enabled
perf/core: Use lockdep to assert IRQs are disabled/enabled
x86: Use lockdep to assert IRQs are disabled/enabled
smp/core: Use lockdep to assert IRQs are disabled/enabled
timers/hrtimer: Use lockdep to assert IRQs are disabled/enabled
timers/nohz: Use lockdep to assert IRQs are disabled/enabled
workqueue: Use lockdep to assert IRQs are disabled/enabled
irq/softirqs: Use lockdep to assert IRQs are disabled/enabled
locking/lockdep: Add IRQs disabled/enabled assertion APIs: lockdep_assert_irqs_enabled()/disabled()
locking/pvqspinlock: Implement hybrid PV queued/unfair locks
locking/rwlocks: Fix comments
x86/paravirt: Set up the virt_spin_lock_key after static keys get initialized
block, locking/lockdep: Assign a lock_class per gendisk used for wait_for_completion()
workqueue: Remove now redundant lock acquisitions wrt. workqueue flushes
...
- The old driver statement has been added to the kernel docs.
- We have a couple of new helper scripts. find-unused-docs.sh from Sayli
Karnic will point out kerneldoc comments that are not actually used in
the documentation. Jani Nikula's documentation-file-ref-check finds
references to non-existing files.
- A new ftrace document from Steve Rostedt.
- Vinod Koul converted the dmaengine docs to RST
Beyond that, it's mostly simple fixes.
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJaCK37AAoJEI3ONVYwIuV63nwQALeqzVwGqqTwiyRyMqgEwMQM
je/6IurEwTHtyfwtW/mztCfNid1CLTiYZg7RET3/zlHjcUI/9VlV2dbBksGFgoQo
muHGqhwTJjXYREwjK3FkzrGckRsVZKJgdzmZYgukCCY6Ir7IffwJKYaLOCZN1S/l
4nBHQpt2nITo0WhdmZjaNRKOQxMA8nN5yGpOIl0neGE6ywIUMgauCCCHhxnOPVWg
ant1HliS8WR8Tizqt9wQgLCvs5lvklsBFibZPO9LBTPG2Zy3HIO9kb+npUAh2MTl
j0Wg39zzOFvVVErqErqUIwIuQ9IrfltHrEHYYoruTvDBXBiMKIcwApF+DS+H3WSp
TnDu3Qif4llM5SZsZGvcjawXNnbck+7SYOe9cyqpylV3SWMWrEX1tbUv6zVuVk+7
fencYBvEZgkJmWbjDeO/Z4S50STxRTzIxFwZgLft7g/RiHo9HvlubjjwQTqBFjxA
fVkolN7h69MGkrD8TF19eapyujqSXaNYH0pFYo87JNOjLgYmezUHyvHd8YeZJL31
Ll0h10HqSNVzJsjFolBMgrC3CcVjsEXdBufu0yVk45sAg9ZiMYOCpwa6Rtp+tfxa
uIBf1LKzfWSa0ocKx7+sMJt0B/CXwU3AMtsbYGyDhFhR2r3cp1NWBHf5nisz9etD
2Md9RDFAMLELZurewB9Q
=H6ud
-----END PGP SIGNATURE-----
Merge tag 'docs-4.15' of git://git.lwn.net/linux
Pull documentation updates from Jonathan Corbet:
"A relatively calm cycle for the docs tree again.
- The old driver statement has been added to the kernel docs.
- We have a couple of new helper scripts. find-unused-docs.sh from
Sayli Karnic will point out kerneldoc comments that are not actually
used in the documentation. Jani Nikula's
documentation-file-ref-check finds references to non-existing files.
- A new ftrace document from Steve Rostedt.
- Vinod Koul converted the dmaengine docs to RST
Beyond that, it's mostly simple fixes.
This set reaches outside of Documentation/ a bit more than most. In
all cases, the changes are to comment docs, mostly from Randy, in
places where there didn't seem to be anybody better to take them"
* tag 'docs-4.15' of git://git.lwn.net/linux: (52 commits)
documentation: fb: update list of available compiled-in fonts
MAINTAINERS: update DMAengine documentation location
dmaengine: doc: ReSTize pxa_dma doc
dmaengine: doc: ReSTize dmatest doc
dmaengine: doc: ReSTize client API doc
dmaengine: doc: ReSTize provider doc
dmaengine: doc: Add ReST style dmaengine document
ftrace/docs: Add documentation on how to use ftrace from within the kernel
bug-hunting.rst: Fix an example and a typo in a Sphinx tag
scripts: Add a script to find unused documentation
samples: Convert timers to use timer_setup()
documentation: kernel-api: add more info on bitmap functions
Documentation: fix selftests related file refs
Documentation: fix ref to power basic-pm-debugging
Documentation: fix ref to trace stm content
Documentation: fix ref to coccinelle content
Documentation: fix ref to workqueue content
Documentation: fix ref to sphinx/kerneldoc.py
Documentation: fix locking rt-mutex doc refs
docs: dev-tools: correct Coccinelle version number
...
The audit subsystem allows selecting audit events based on watches for
a particular behavior like writing to a file. A lot of syscalls have
been added without updating the list. This patch adds 2 syscalls to the
write filters: fallocate and renameat2.
Signed-off-by: Steve Grubb <sgrubb@redhat.com>
Reviewed-by: Richard Guy Briggs <rgb@redhat.com>
[PM: cleaned up some whitespace errors]
Signed-off-by: Paul Moore <paul@paul-moore.com>
KVM guest defines three per-CPU variables (steal-time, apf_reason, and
kvm_pic_eoi) which are shared between a guest and a hypervisor.
When SEV is active, memory is encrypted with a guest-specific key, and if
the guest OS wants to share the memory region with the hypervisor then it
must clear the C-bit (i.e set decrypted) before sharing it.
DEFINE_PER_CPU_DECRYPTED can be used to define the per-CPU variables
which will be shared between a guest and a hypervisor.
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Borislav Petkov <bp@suse.de>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: linux-arch@vger.kernel.org
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: kvm@vger.kernel.org
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christoph Lameter <cl@linux.com>
Link: https://lkml.kernel.org/r/20171020143059.3291-16-brijesh.singh@amd.com
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.
The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.
Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.
- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.
For non */uapi/* files that summary was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 11139
and resulted in the first patch in this series.
If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note 930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:
SPDX license identifier # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note 270
GPL-2.0+ WITH Linux-syscall-note 169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17
LGPL-2.1+ WITH Linux-syscall-note 15
GPL-1.0+ WITH Linux-syscall-note 14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5
LGPL-2.0+ WITH Linux-syscall-note 4
LGPL-2.1 WITH Linux-syscall-note 3
((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3
((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became
the concluded license(s).
- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.
In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.
Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.
In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When a prospective writer takes the qrwlock locking slowpath due to the
lock being held, it attempts to cmpxchg the wmode field from 0 to
_QW_WAITING so that concurrent lockers also take the slowpath and queue
on the spinlock accordingly, allowing the lockers to drain.
Unfortunately, this isn't fair, because a fastpath writer that comes in
after the lock is made available but before the _QW_WAITING flag is set
can effectively jump the queue. If there is a steady stream of prospective
writers, then the waiter will be held off indefinitely.
This patch restores fairness by separating _QW_WAITING and _QW_LOCKED
into two distinct fields: _QW_LOCKED continues to occupy the bottom byte
of the lockword so that it can be cleared unconditionally when unlocking,
but _QW_WAITING now occupies what used to be the bottom bit of the reader
count. This then forces the slow-path for concurrent lockers.
Tested-by: Waiman Long <longman@redhat.com>
Tested-by: Jeremy Linton <jeremy.linton@arm.com>
Tested-by: Adam Wallis <awallis@codeaurora.org>
Tested-by: Jan Glauber <jglauber@cavium.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Jeremy.Linton@arm.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/1507810851-306-6-git-send-email-will.deacon@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The qrwlock slowpaths involve spinning when either a prospective reader
is waiting for a concurrent writer to drain, or a prospective writer is
waiting for concurrent readers to drain. In both of these situations,
atomic_cond_read_acquire() can be used to avoid busy-waiting and make use
of any backoff functionality provided by the architecture.
This patch replaces the open-code loops and rspin_until_writer_unlock()
implementation with atomic_cond_read_acquire(). The write mode transition
zero to _QW_WAITING is left alone, since (a) this doesn't need acquire
semantics and (b) should be fast.
Tested-by: Waiman Long <longman@redhat.com>
Tested-by: Jeremy Linton <jeremy.linton@arm.com>
Tested-by: Adam Wallis <awallis@codeaurora.org>
Tested-by: Jan Glauber <jglauber@cavium.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Jeremy.Linton@arm.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/1507810851-306-4-git-send-email-will.deacon@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
smp_cond_load_acquire() provides a way to spin on a variable with acquire
semantics until some conditional expression involving the variable is
satisfied. Architectures such as arm64 can potentially enter a low-power
state, waking up only when the value of the variable changes, which
reduces the system impact of tight polling loops.
This patch makes the same interface available to users of atomic_t,
atomic64_t and atomic_long_t, rather than require messy accesses to the
structure internals.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Jeremy.Linton@arm.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Waiman Long <longman@redhat.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/1507810851-306-3-git-send-email-will.deacon@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
There's no good reason to keep the internal structure of struct qrwlock
hidden from qrwlock.h, particularly as it's actually needed for unlock
and ends up being abstracted independently behind the __qrwlock_write_byte()
function.
Stop pretending we can hide this stuff, and move the __qrwlock definition
into qrwlock, removing the __qrwlock_write_byte() nastiness and using the
same struct definition everywhere instead.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Jeremy.Linton@arm.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Waiman Long <longman@redhat.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/1507810851-306-2-git-send-email-will.deacon@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Rename the unwinder config options from:
CONFIG_ORC_UNWINDER
CONFIG_FRAME_POINTER_UNWINDER
CONFIG_GUESS_UNWINDER
to:
CONFIG_UNWINDER_ORC
CONFIG_UNWINDER_FRAME_POINTER
CONFIG_UNWINDER_GUESS
... in order to give them a more logical config namespace.
Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/73972fc7e2762e91912c6b9584582703d6f1b8cc.1507924831.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The arch_{read,spin,write}_lock_flags() macros are simply mapped to the
non-flags versions by the majority of architectures, so do this in core
code and remove the dummy implementations. Also remove the implementation
in spinlock_up.h, since all callers of do_raw_spin_lock_flags() call
local_irq_save(flags) anyway.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: paulmck@linux.vnet.ibm.com
Link: http://lkml.kernel.org/r/1507055129-12300-4-git-send-email-will.deacon@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Outside of the locking code itself, {read,spin,write}_can_lock() have no
users in tree. Apparmor (the last remaining user of write_can_lock()) got
moved over to lockdep by the previous patch.
This patch removes the use of {read,spin,write}_can_lock() from the
BUILD_LOCK_OPS macro, deferring to the trylock operation for testing the
lock status, and subsequently removes the unused macros altogether. They
aren't guaranteed to work in a concurrent environment and can give
incorrect results in the case of qrwlock.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: paulmck@linux.vnet.ibm.com
Link: http://lkml.kernel.org/r/1507055129-12300-2-git-send-email-will.deacon@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Using .text.unlikely for refcount exceptions isn't safe because gcc may
move entire functions into .text.unlikely (e.g. in6_dev_dev()), which
would cause any uses of a protected refcount_t function to stay inline
with the function, triggering the protection unconditionally:
.section .text.unlikely,"ax",@progbits
.type in6_dev_get, @function
in6_dev_getx:
.LFB4673:
.loc 2 4128 0
.cfi_startproc
...
lock; incl 480(%rbx)
js 111f
.pushsection .text.unlikely
111: lea 480(%rbx), %rcx
112: .byte 0x0f, 0xff
.popsection
113:
This creates a unique .text..refcount section and adds an additional
test to the exception handler to WARN in the case of having none of OF,
SF, nor ZF set so we can see things like this more easily in the future.
The double dot for the section name keeps it out of the TEXT_MAIN macro
namespace, to avoid collisions and so it can be put at the end with
text.unlikely to keep the cold code together.
See commit:
cb87481ee8 ("kbuild: linker script do not match C names unless LD_DEAD_CODE_DATA_ELIMINATION is configured")
... which matches C names: [a-zA-Z0-9_] but not ".".
Reported-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Elena <elena.reshetova@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch <linux-arch@vger.kernel.org>
Fixes: 7a46ec0e2f ("locking/refcounts, x86/asm: Implement fast refcount overflow protection")
Link: http://lkml.kernel.org/r/1504382986-49301-2-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
As raw_cpu_generic_read() is a plain read from a raw_cpu_ptr() address,
it's possible (albeit unlikely) that the compiler will split the access
across multiple instructions.
In this_cpu_generic_read() we disable preemption but not interrupts
before calling raw_cpu_generic_read(). Thus, an interrupt could be taken
in the middle of the split load instructions. If a this_cpu_write() or
RMW this_cpu_*() op is made to the same variable in the interrupt
handling path, this_cpu_read() will return a torn value.
For native word types, we can avoid tearing using READ_ONCE(), but this
won't work in all cases (e.g. 64-bit types on most 32-bit platforms).
This patch reworks this_cpu_generic_read() to use READ_ONCE() where
possible, otherwise falling back to disabling interrupts.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christoph Lameter <cl@linux.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Pranith Kumar <bobby.prani@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Tejun Heo <tj@kernel.org>
Pull misc leftovers from Al Viro.
* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fix the __user misannotations in asm-generic get_user/put_user
fput: Don't reinvent the wheel but use existing llist API
namespace.c: Don't reinvent the wheel but use existing llist API
General updates:
* Constify pci_device_id in various drivers
* Constify device_type
* Remove pad control code from the Gemini driver
* Use %pOF to print OF node full_name
* Various fixes in the physmap_of driver
* Remove unused vars in mtdswap
* Check devm_kzalloc() return value in the spear_smi driver
* Check clk_prepare_enable() return code in the st_spi_fsm driver
* Create per MTD device debugfs enties
NAND updates, from Boris Brezillon:
* Fix memory leaks in the core
* Remove unused NAND locking support
* Rename nand.h into rawnand.h (preparing support for spi NANDs)
* Use NAND_MAX_ID_LEN where appropriate
* Fix support for 20nm Hynix chips
* Fix support for Samsung and Hynix SLC NANDs
* Various cleanup, improvements and fixes in the qcom driver
* Fixes for bugs detected by various static code analysis tools
* Fix mxc ooblayout definition
* Add a new part_parsers to tmio and sharpsl platform data in order to
define a custom list of partition parsers
* Request the reset line in exclusive mode in the sunxi driver
* Fix a build error in the orion-nand driver when compiled for ARMv4
* Allow 64-bit mvebu platforms to select the PXA3XX driver
SPI NOR updates, from Cyrille Pitchen and Marek Vasut:
* add support to the JEDEC JESD216B specification (SFDP tables).
* add support to the Intel Denverton SPI flash controller.
* fix error recovery for Spansion/Cypress SPI NOR memories.
* fix 4-byte address management for the Aspeed SPI controller.
* add support to some Microchip SST26 memory parts
* remove unneeded pinctrl header Write a message for tag:
-----BEGIN PGP SIGNATURE-----
iQJABAABCAAqBQJZrav6Ixxib3Jpcy5icmV6aWxsb25AZnJlZS1lbGVjdHJvbnMu
Y29tAAoJEGXtNgF+CLcABwkP/joDrq09RIC9n5gP+ubJe6O1jKvNWDd6bIVXD3Ke
73R0a0ANwwWlNYWTChTdrb8UeewVS1bzutyy5O2Sbdb6Jc6s7xkfQDTsbET2HWOK
S7Lt/zjlC6/6cow59B6h43PGS6wmIFaZD3K+70sGhvFnV8epVUzS2Aa783xS8LXm
so2djZOdUYnW+yE0eho24VQR6nS4YP4Vc+7Mm9skjU0ifjB9mJiWRkzoQnqIgORO
M+Iab+qjDs9KR/edWh6mZtnvjps0VSW4I40YsClpcgIn550w1DSXe4u6/8Nk+2Bp
gfrALls91gob0ocxmEdIyLID+M0410HcN/Lvh36nw+tkkGTaXf0D6mkqzdKNrZ3w
yz+UV9uf19kr1c6zFGcCvUlD0btn9KT+F2legnhgURtwUyDFQcaYQlkpDIeEzUMV
ZrtzKbSE2v9810YKXjtCnseewdP+Eph/ewN6ODX5yg/fs8K0fyQYTRtYYM50U69X
md8zznBBDPhJVu5T2Of7my9V1SxvCP8a7LrKjAXuFHpZ/CHiPe+QOWBgG2L+zXXT
e10/rTg7T2pcyKpBvL/3/mCYeJ+Iup3lKT1EHGCXcKnLGecVgOsbvdG+JnvQMI2J
FLmu1exvrzi0Gcrs/05hqwyUvkHZ5FB1a+heNOtmQ+h1U0ElXqILyu7brzghupRe
3phO
=UgCd
-----END PGP SIGNATURE-----
Merge tag 'for-linus-20170904' of git://git.infradead.org/linux-mtd
Pull MTD updates from Boris Brezillon:
"General updates:
- Constify pci_device_id in various drivers
- Constify device_type
- Remove pad control code from the Gemini driver
- Use %pOF to print OF node full_name
- Various fixes in the physmap_of driver
- Remove unused vars in mtdswap
- Check devm_kzalloc() return value in the spear_smi driver
- Check clk_prepare_enable() return code in the st_spi_fsm driver
- Create per MTD device debugfs enties
NAND updates, from Boris Brezillon:
- Fix memory leaks in the core
- Remove unused NAND locking support
- Rename nand.h into rawnand.h (preparing support for spi NANDs)
- Use NAND_MAX_ID_LEN where appropriate
- Fix support for 20nm Hynix chips
- Fix support for Samsung and Hynix SLC NANDs
- Various cleanup, improvements and fixes in the qcom driver
- Fixes for bugs detected by various static code analysis tools
- Fix mxc ooblayout definition
- Add a new part_parsers to tmio and sharpsl platform data in order
to define a custom list of partition parsers
- Request the reset line in exclusive mode in the sunxi driver
- Fix a build error in the orion-nand driver when compiled for ARMv4
- Allow 64-bit mvebu platforms to select the PXA3XX driver
SPI NOR updates, from Cyrille Pitchen and Marek Vasut:
- add support to the JEDEC JESD216B specification (SFDP tables).
- add support to the Intel Denverton SPI flash controller.
- fix error recovery for Spansion/Cypress SPI NOR memories.
- fix 4-byte address management for the Aspeed SPI controller.
- add support to some Microchip SST26 memory parts
- remove unneeded pinctrl header Write a message for tag:"
* tag 'for-linus-20170904' of git://git.infradead.org/linux-mtd: (74 commits)
mtd: nand: complain loudly when chip->bits_per_cell is not correctly initialized
mtd: nand: make Samsung SLC NAND usable again
mtd: nand: tmio: Register partitions using the parsers
mfd: tmio: Add partition parsers platform data
mtd: nand: sharpsl: Register partitions using the parsers
mtd: nand: sharpsl: Add partition parsers platform data
mtd: nand: qcom: Support for IPQ8074 QPIC NAND controller
mtd: nand: qcom: support for IPQ4019 QPIC NAND controller
dt-bindings: qcom_nandc: IPQ8074 QPIC NAND documentation
dt-bindings: qcom_nandc: IPQ4019 QPIC NAND documentation
dt-bindings: qcom_nandc: fix the ipq806x device tree example
mtd: nand: qcom: support for different DEV_CMD register offsets
mtd: nand: qcom: QPIC data descriptors handling
mtd: nand: qcom: enable BAM or ADM mode
mtd: nand: qcom: erased codeword detection configuration
mtd: nand: qcom: support for read location registers
mtd: nand: qcom: support for passing flags in DMA helper functions
mtd: nand: qcom: add BAM DMA descriptor handling
mtd: nand: qcom: allocate BAM transaction
mtd: nand: qcom: DMA mapping support for register read buffer
...
Soft dirty bit is designed to keep tracked over page migration. This
patch makes it work in the same manner for thp migration too.
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Zi Yan <zi.yan@cs.rutgers.edu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: David Nellans <dnellans@nvidia.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When THP migration is being used, memory management code needs to handle
pmd migration entries properly. This patch uses !pmd_present() or
is_swap_pmd() (depending on whether pmd_none() needs separate code or
not) to check pmd migration entries at the places where a pmd entry is
present.
Since pmd-related code uses split_huge_page(), split_huge_pmd(),
pmd_trans_huge(), pmd_trans_unstable(), or
pmd_none_or_trans_huge_or_clear_bad(), this patch:
1. adds pmd migration entry split code in split_huge_pmd(),
2. takes care of pmd migration entries whenever pmd_trans_huge() is present,
3. makes pmd_none_or_trans_huge_or_clear_bad() pmd migration entry aware.
Since split_huge_page() uses split_huge_pmd() and pmd_trans_unstable()
is equivalent to pmd_none_or_trans_huge_or_clear_bad(), we do not change
them.
Until this commit, a pmd entry should be:
1. pointing to a pte page,
2. is_swap_pmd(),
3. pmd_trans_huge(),
4. pmd_devmap(), or
5. pmd_none().
Signed-off-by: Zi Yan <zi.yan@cs.rutgers.edu>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: David Nellans <dnellans@nvidia.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull x86 mm changes from Ingo Molnar:
"PCID support, 5-level paging support, Secure Memory Encryption support
The main changes in this cycle are support for three new, complex
hardware features of x86 CPUs:
- Add 5-level paging support, which is a new hardware feature on
upcoming Intel CPUs allowing up to 128 PB of virtual address space
and 4 PB of physical RAM space - a 512-fold increase over the old
limits. (Supercomputers of the future forecasting hurricanes on an
ever warming planet can certainly make good use of more RAM.)
Many of the necessary changes went upstream in previous cycles,
v4.14 is the first kernel that can enable 5-level paging.
This feature is activated via CONFIG_X86_5LEVEL=y - disabled by
default.
(By Kirill A. Shutemov)
- Add 'encrypted memory' support, which is a new hardware feature on
upcoming AMD CPUs ('Secure Memory Encryption', SME) allowing system
RAM to be encrypted and decrypted (mostly) transparently by the
CPU, with a little help from the kernel to transition to/from
encrypted RAM. Such RAM should be more secure against various
attacks like RAM access via the memory bus and should make the
radio signature of memory bus traffic harder to intercept (and
decrypt) as well.
This feature is activated via CONFIG_AMD_MEM_ENCRYPT=y - disabled
by default.
(By Tom Lendacky)
- Enable PCID optimized TLB flushing on newer Intel CPUs: PCID is a
hardware feature that attaches an address space tag to TLB entries
and thus allows to skip TLB flushing in many cases, even if we
switch mm's.
(By Andy Lutomirski)
All three of these features were in the works for a long time, and
it's coincidence of the three independent development paths that they
are all enabled in v4.14 at once"
* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (65 commits)
x86/mm: Enable RCU based page table freeing (CONFIG_HAVE_RCU_TABLE_FREE=y)
x86/mm: Use pr_cont() in dump_pagetable()
x86/mm: Fix SME encryption stack ptr handling
kvm/x86: Avoid clearing the C-bit in rsvd_bits()
x86/CPU: Align CR3 defines
x86/mm, mm/hwpoison: Clear PRESENT bit for kernel 1:1 mappings of poison pages
acpi, x86/mm: Remove encryption mask from ACPI page protection type
x86/mm, kexec: Fix memory corruption with SME on successive kexecs
x86/mm/pkeys: Fix typo in Documentation/x86/protection-keys.txt
x86/mm/dump_pagetables: Speed up page tables dump for CONFIG_KASAN=y
x86/mm: Implement PCID based optimization: try to preserve old TLB entries using PCID
x86: Enable 5-level paging support via CONFIG_X86_5LEVEL=y
x86/mm: Allow userspace have mappings above 47-bit
x86/mm: Prepare to expose larger address space to userspace
x86/mpx: Do not allow MPX if we have mappings above 47-bit
x86/mm: Rename tasksize_32bit/64bit to task_size_32bit/64bit()
x86/xen: Redefine XEN_ELFNOTE_INIT_P2M using PUD_SIZE * PTRS_PER_PUD
x86/mm/dump_pagetables: Fix printout of p4d level
x86/mm/dump_pagetables: Generalize address normalization
x86/boot: Fix memremap() related build failure
...
Pull locking updates from Ingo Molnar:
- Add 'cross-release' support to lockdep, which allows APIs like
completions, where it's not the 'owner' who releases the lock, to be
tracked. It's all activated automatically under
CONFIG_PROVE_LOCKING=y.
- Clean up (restructure) the x86 atomics op implementation to be more
readable, in preparation of KASAN annotations. (Dmitry Vyukov)
- Fix static keys (Paolo Bonzini)
- Add killable versions of down_read() et al (Kirill Tkhai)
- Rework and fix jump_label locking (Marc Zyngier, Paolo Bonzini)
- Rework (and fix) tlb_flush_pending() barriers (Peter Zijlstra)
- Remove smp_mb__before_spinlock() and convert its usages, introduce
smp_mb__after_spinlock() (Peter Zijlstra)
* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (56 commits)
locking/lockdep/selftests: Fix mixed read-write ABBA tests
sched/completion: Avoid unnecessary stack allocation for COMPLETION_INITIALIZER_ONSTACK()
acpi/nfit: Fix COMPLETION_INITIALIZER_ONSTACK() abuse
locking/pvqspinlock: Relax cmpxchg's to improve performance on some architectures
smp: Avoid using two cache lines for struct call_single_data
locking/lockdep: Untangle xhlock history save/restore from task independence
locking/refcounts, x86/asm: Disable CONFIG_ARCH_HAS_REFCOUNT for the time being
futex: Remove duplicated code and fix undefined behaviour
Documentation/locking/atomic: Finish the document...
locking/lockdep: Fix workqueue crossrelease annotation
workqueue/lockdep: 'Fix' flush_work() annotation
locking/lockdep/selftests: Add mixed read-write ABBA tests
mm, locking/barriers: Clarify tlb_flush_pending() barriers
locking/lockdep: Make CONFIG_LOCKDEP_CROSSRELEASE and CONFIG_LOCKDEP_COMPLETIONS truly non-interactive
locking/lockdep: Explicitly initialize wq_barrier::done::map
locking/lockdep: Rename CONFIG_LOCKDEP_COMPLETE to CONFIG_LOCKDEP_COMPLETIONS
locking/lockdep: Reword title of LOCKDEP_CROSSRELEASE config
locking/lockdep: Make CONFIG_LOCKDEP_CROSSRELEASE part of CONFIG_PROVE_LOCKING
locking/refcounts, x86/asm: Implement fast refcount overflow protection
locking/lockdep: Fix the rollback and overwrite detection logic in crossrelease
...
Pull x86 asm updates from Ingo Molnar:
- Introduce the ORC unwinder, which can be enabled via
CONFIG_ORC_UNWINDER=y.
The ORC unwinder is a lightweight, Linux kernel specific debuginfo
implementation, which aims to be DWARF done right for unwinding.
Objtool is used to generate the ORC unwinder tables during build, so
the data format is flexible and kernel internal: there's no
dependency on debuginfo created by an external toolchain.
The ORC unwinder is almost two orders of magnitude faster than the
(out of tree) DWARF unwinder - which is important for perf call graph
profiling. It is also significantly simpler and is coded defensively:
there has not been a single ORC related kernel crash so far, even
with early versions. (knock on wood!)
But the main advantage is that enabling the ORC unwinder allows
CONFIG_FRAME_POINTERS to be turned off - which speeds up the kernel
measurably:
With frame pointers disabled, GCC does not have to add frame pointer
instrumentation code to every function in the kernel. The kernel's
.text size decreases by about 3.2%, resulting in better cache
utilization and fewer instructions executed, resulting in a broad
kernel-wide speedup. Average speedup of system calls should be
roughly in the 1-3% range - measurements by Mel Gorman [1] have shown
a speedup of 5-10% for some function execution intense workloads.
The main cost of the unwinder is that the unwinder data has to be
stored in RAM: the memory cost is 2-4MB of RAM, depending on kernel
config - which is a modest cost on modern x86 systems.
Given how young the ORC unwinder code is it's not enabled by default
- but given the performance advantages the plan is to eventually make
it the default unwinder on x86.
See Documentation/x86/orc-unwinder.txt for more details.
- Remove lguest support: its intended role was that of a temporary
proof of concept for virtualization, plus its removal will enable the
reduction (removal) of the paravirt API as well, so Rusty agreed to
its removal. (Juergen Gross)
- Clean up and fix FSGS related functionality (Andy Lutomirski)
- Clean up IO access APIs (Andy Shevchenko)
- Enhance the symbol namespace (Jiri Slaby)
* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (47 commits)
objtool: Handle GCC stack pointer adjustment bug
x86/entry/64: Use ENTRY() instead of ALIGN+GLOBAL for stub32_clone()
x86/fpu/math-emu: Add ENDPROC to functions
x86/boot/64: Extract efi_pe_entry() from startup_64()
x86/boot/32: Extract efi_pe_entry() from startup_32()
x86/lguest: Remove lguest support
x86/paravirt/xen: Remove xen_patch()
objtool: Fix objtool fallthrough detection with function padding
x86/xen/64: Fix the reported SS and CS in SYSCALL
objtool: Track DRAP separately from callee-saved registers
objtool: Fix validate_branch() return codes
x86: Clarify/fix no-op barriers for text_poke_bp()
x86/switch_to/64: Rewrite FS/GS switching yet again to fix AMD CPUs
selftests/x86/fsgsbase: Test selectors 1, 2, and 3
x86/fsgsbase/64: Report FSBASE and GSBASE correctly in core dumps
x86/fsgsbase/64: Fully initialize FS and GS state in start_thread_common
x86/asm: Fix UNWIND_HINT_REGS macro for older binutils
x86/asm/32: Fix regs_get_register() on segment registers
x86/xen/64: Rearrange the SYSCALL entries
x86/asm/32: Remove a bunch of '& 0xffff' from pt_regs segment reads
...
Pull perf updates from Ingo Molnar:
"Kernel side changes:
- Add branch type profiling/tracing support. (Jin Yao)
- Add the PERF_SAMPLE_PHYS_ADDR ABI to allow the tracing/profiling of
physical memory addresses, where the PMU supports it. (Kan Liang)
- Export some PMU capability details in the new
/sys/bus/event_source/devices/cpu/caps/ sysfs directory. (Andi
Kleen)
- Aux data fixes and updates (Will Deacon)
- kprobes fixes and updates (Masami Hiramatsu)
- AMD uncore PMU driver fixes and updates (Janakarajan Natarajan)
On the tooling side, here's a (limited!) list of highlights - there
were many other changes that I could not list, see the shortlog and
git history for details:
UI improvements:
- Implement a visual marker for fused x86 instructions in the
annotate TUI browser, available now in 'perf report', more work
needed to have it available as well in 'perf top' (Jin Yao)
Further explanation from one of Jin's patches:
│ ┌──cmpl $0x0,argp_program_version_hook
81.93 │ ├──je 20
│ │ lock cmpxchg %esi,0x38a9a4(%rip)
│ │↓ jne 29
│ │↓ jmp 43
11.47 │20:└─→cmpxch %esi,0x38a999(%rip)
That means the cmpl+je is a fused instruction pair and they should
be considered together.
- Record the branch type and then show statistics and info about in
callchain entries (Jin Yao)
Example from one of Jin's patches:
# perf record -g -j any,save_type
# perf report --branch-history --stdio --no-children
38.50% div.c:45 [.] main div
|
---main div.c:42 (RET CROSS_2M cycles:2)
compute_flag div.c:28 (cycles:2)
compute_flag div.c:27 (RET CROSS_2M cycles:1)
rand rand.c:28 (cycles:1)
rand rand.c:28 (RET CROSS_2M cycles:1)
__random random.c:298 (cycles:1)
__random random.c:297 (COND_BWD CROSS_2M cycles:1)
__random random.c:295 (cycles:1)
__random random.c:295 (COND_BWD CROSS_2M cycles:1)
__random random.c:295 (cycles:1)
__random random.c:295 (RET CROSS_2M cycles:9)
namespaces support:
- Add initial support for namespaces, using setns to access files in
namespaces, grabbing their build-ids, etc. (Krister Johansen)
perf trace enhancements:
- Beautify pkey_{alloc,free,mprotect} arguments in 'perf trace'
(Arnaldo Carvalho de Melo)
- Add initial 'clone' syscall args beautifier in 'perf trace'
(Arnaldo Carvalho de Melo)
- Ignore 'fd' and 'offset' args for MAP_ANONYMOUS in 'perf trace'
(Arnaldo Carvalho de Melo)
- Beautifiers for the 'cmd' arg of several ioctl types, including:
sound, DRM, KVM, vhost virtio and perf_events. (Arnaldo Carvalho de
Melo)
- Add PERF_SAMPLE_CALLCHAIN and PERF_RECORD_MMAP[2] to 'perf data'
CTF conversion, allowing CTF trace visualization tools to show
callchains and to resolve symbols (Geneviève Bastien)
- Beautify the fcntl syscall, which is an interesting one in the
sense that infrastructure had to be put in place to change the
formatters of some arguments according to the value in a previous
one, i.e. cmd dictates how arg and the syscall return will be
formatted. (Arnaldo Carvalho de Melo
perf stat enhancements:
- Use group read for event groups in 'perf stat', reducing overhead
when groups are defined in the event specification, i.e. when using
{} to enclose a list of events, asking them to be read at the same
time, e.g.: "perf stat -e '{cycles,instructions}'" (Jiri Olsa)
pipe mode improvements:
- Process tracing data in 'perf annotate' pipe mode (David
Carrillo-Cisneros)
- Add header record types to pipe-mode, now this command:
$ perf record -o - -e cycles sleep 1 | perf report --stdio --header
Will show the same as in non-pipe mode, i.e. involving a perf.data
file (David Carrillo-Cisneros)
Vendor specific hardware event support updates/enhancements:
- Update POWER9 vendor events tables (Sukadev Bhattiprolu)
- Add POWER9 PMU events Sukadev (Bhattiprolu)
- Support additional POWER8+ PVR in PMU mapfile (Shriya)
- Add Skylake server uncore JSON vendor events (Andi Kleen)
- Support exporting Intel PT data to sqlite3 with python perf
scripts, this is in addition to the postgresql support that was
already there (Adrian Hunter)"
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (253 commits)
perf symbols: Fix plt entry calculation for ARM and AARCH64
perf probe: Fix kprobe blacklist checking condition
perf/x86: Fix caps/ for !Intel
perf/core, x86: Add PERF_SAMPLE_PHYS_ADDR
perf/core, pt, bts: Get rid of itrace_started
perf trace beauty: Beautify pkey_{alloc,free,mprotect} arguments
tools headers: Sync cpu features kernel ABI headers with tooling headers
perf tools: Pass full path of FEATURES_DUMP
perf tools: Robustify detection of clang binary
tools lib: Allow external definition of CC, AR and LD
perf tools: Allow external definition of flex and bison binary names
tools build tests: Don't hardcode gcc name
perf report: Group stat values on global event id
perf values: Zero value buffers
perf values: Fix allocation check
perf values: Fix thread index bug
perf report: Add dump_read function
perf record: Set read_format for inherit_stat
perf c2c: Fix remote HITM detection for Skylake
perf tools: Fix static build with newer toolchains
...
Pull RCU updates from Ingo Molnad:
"The main RCU related changes in this cycle were:
- Removal of spin_unlock_wait()
- SRCU updates
- RCU torture-test updates
- RCU Documentation updates
- Extend the sys_membarrier() ABI with the MEMBARRIER_CMD_PRIVATE_EXPEDITED variant
- Miscellaneous RCU fixes
- CPU-hotplug fixes"
* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (63 commits)
arch: Remove spin_unlock_wait() arch-specific definitions
locking: Remove spin_unlock_wait() generic definitions
drivers/ata: Replace spin_unlock_wait() with lock/unlock pair
ipc: Replace spin_unlock_wait() with lock/unlock pair
exit: Replace spin_unlock_wait() with lock/unlock pair
completion: Replace spin_unlock_wait() with lock/unlock pair
doc: Set down RCU's scheduling-clock-interrupt needs
doc: No longer allowed to use rcu_dereference on non-pointers
doc: Add RCU files to docbook-generation files
doc: Update memory-barriers.txt for read-to-write dependencies
doc: Update RCU documentation
membarrier: Provide expedited private command
rcu: Remove exports from rcu_idle_exit() and rcu_idle_enter()
rcu: Add warning to rcu_idle_enter() for irqs enabled
rcu: Make rcu_idle_enter() rely on callers disabling irqs
rcu: Add assertions verifying blocked-tasks list
rcu/tracing: Set disable_rcu_irq_enter on rcu_eqs_exit()
rcu: Add TPS() protection for _rcu_barrier_trace strings
rcu: Use idle versions of swait to make idle-hack clear
swait: Add idle variants which don't contribute to load average
...
When !NUMA, cpumask_of_node(@node) equals cpu_online_mask regardless of
@node. The assumption seems that if !NUMA, there shouldn't be more than
one node and thus reporting cpu_online_mask regardless of @node is
correct. However, that assumption was broken years ago to support
DISCONTIGMEM and whether a system has multiple nodes or not is
separately controlled by NEED_MULTIPLE_NODES.
This means that, on a system with !NUMA && NEED_MULTIPLE_NODES,
cpumask_of_node() will report cpu_online_mask for all possible nodes,
indicating that the CPUs are associated with multiple nodes which is an
impossible configuration.
This bug has been around forever but doesn't look like it has caused any
noticeable symptoms. However, it triggers a WARN recently added to
workqueue to verify NUMA affinity configuration.
Fix it by reporting empty cpumask on non-zero nodes if !NUMA.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-and-tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-----BEGIN PGP SIGNATURE-----
iQEcBAABAgAGBQJZo2HiAAoJEHm+PkMAQRiG3OcIAJqSeVK2uQ/QhmqFN1ExYay4
bdTjSTtSk7GH6PxI2C0cqfZvsxOUU7ICDHG8bYM1LA0S0SxfOtoFhHGKc/BcFLX8
MiKJWlF51ZbX0mkIEpKF+C8pRrXPgSqtk3N450/k2BzG9qCZSM93A2NCOB7v9T9w
XOBUIYHqfTS2tdmCinjwu8Ls+w8oPOGH1gLjxZyGnBlg4lTqHMcUufmHeVEAh11d
giGByqqqXH69kGD1HNC7H6quzXN9rz4n0gEwEG0mIhfkJ98b+ESSWwSEXXypOAQD
QT5/6+2YizXf5DPCqR46xasQCPjRsS6Sv0cF2cntW2PEAb4jBjhx5gTFlJcoOC8=
=efWJ
-----END PGP SIGNATURE-----
Merge tag 'v4.13-rc7' into mtd/next
Merge v4.13-rc7 back to resolve merge conflicts in
drivers/mtd/nand/nandsim.c and include/asm-generic/vmlinux.lds.h.
There is code duplicated over all architecture's headers for
futex_atomic_op_inuser. Namely op decoding, access_ok check for uaddr,
and comparison of the result.
Remove this duplication and leave up to the arches only the needed
assembly which is now in arch_futex_atomic_op_inuser.
This effectively distributes the Will Deacon's arm64 fix for undefined
behaviour reported by UBSAN to all architectures. The fix was done in
commit 5f16a046f8 (arm64: futex: Fix undefined behaviour with
FUTEX_OP_OPARG_SHIFT usage). Look there for an example dump.
And as suggested by Thomas, check for negative oparg too, because it was
also reported to cause undefined behaviour report.
Note that s390 removed access_ok check in d12a29703 ("s390/uaccess:
remove pointless access_ok() checks") as access_ok there returns true.
We introduce it back to the helper for the sake of simplicity (it gets
optimized away anyway).
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> [s390]
Acked-by: Chris Metcalf <cmetcalf@mellanox.com> [for tile]
Reviewed-by: Darren Hart (VMware) <dvhart@infradead.org>
Reviewed-by: Will Deacon <will.deacon@arm.com> [core/arm64]
Cc: linux-mips@linux-mips.org
Cc: Rich Felker <dalias@libc.org>
Cc: linux-ia64@vger.kernel.org
Cc: linux-sh@vger.kernel.org
Cc: peterz@infradead.org
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: sparclinux@vger.kernel.org
Cc: Jonas Bonn <jonas@southpole.se>
Cc: linux-s390@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: linux-hexagon@vger.kernel.org
Cc: Helge Deller <deller@gmx.de>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: linux-snps-arc@lists.infradead.org
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: linux-xtensa@linux-xtensa.org
Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
Cc: openrisc@lists.librecores.org
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Stafford Horne <shorne@gmail.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Richard Henderson <rth@twiddle.net>
Cc: Chris Zankel <chris@zankel.net>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-parisc@vger.kernel.org
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: linux-alpha@vger.kernel.org
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: "David S. Miller" <davem@davemloft.net>
Link: http://lkml.kernel.org/r/20170824073105.3901-1-jslaby@suse.cz
- fix linker script regression caused by dead code elimination support
- fix typos and outdated comments
- specify kselftest-clean as a PHONY target
- fix "make dtbs_install" when $(srctree) includes shell special
characters like '~'
- Move -fshort-wchar to the global option list because defining it
partially emits warnings
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJZnvezAAoJED2LAQed4NsGTK0P/jZY3jNT3tzgldH5ggCeA7Z2
qekRVR3z0CCm3FU+NlWLVNsiyWDRxuN93lIVML8f7wt/kuJDNxycx5G7+Yiw60yH
h41gibJJdS3GL+VzfGUk1WeZkoAiGVVbLvENJzOQRIeHblsi8Ris+pBwT3MY/WKl
XM0k+xsc/abB3dnhgws8WiUvrNhRiqsciq7IBkDPzxGa4JC53+9Yjjc1HA6gah+g
gKX2qZmsjKm9Hsueot9clTttT+iDHmJ0QetHeMGy0T6bqjlRzG+H37ZGtGv3ZP4p
wcsct8QJt+8/TSEtvkROg2apBUlrEkblvsL/G2UFew42Fz4Z3XLariHTcrDoYuHY
IiUc/tUbmx6Ft72ZvepD+qN+rxQiWKqRzTD+7sDDNPPbdgC4pwZ2Z6m7Oih8mRT5
jGlex4PN/rA5ZvTDFF8c/1GhStZrfxM7A7t6k4Snui6hOpxcpK9vHDgKrfGjk83h
xkwvrPIXnreaAvUYz18JhTU1Zuzj5vwIZgPI1bRl3fi4JmagsAwl+IJawWXMFODA
WONJnqavCYRcdffD0cGfJqA8YSmpxfiDxtP7NiOt05jagDHDGH6bbsUR5Oni/9wU
9D8wdn4XdrrYGG2kEVMDa0x4Yn8vVuPfyxy5DgcRciMcYCUfNHOIHFutRvFhwcqm
rV7ege5oIH8DdsWoE5pj
=m8Zm
-----END PGP SIGNATURE-----
Merge tag 'kbuild-fixes-v4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:
- fix linker script regression caused by dead code elimination support
- fix typos and outdated comments
- specify kselftest-clean as a PHONY target
- fix "make dtbs_install" when $(srctree) includes shell special
characters like '~'
- Move -fshort-wchar to the global option list because defining it
partially emits warnings
* tag 'kbuild-fixes-v4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kbuild: update comments of Makefile.asm-generic
kbuild: Do not use hyphen in exported variable name
Makefile: add kselftest-clean to PHONY target list
Kbuild: use -fshort-wchar globally
fixdep: trivial: typo fix and correction
kbuild: trivial cleanups on the comments
kbuild: linker script do not match C names unless LD_DEAD_CODE_DATA_ELIMINATION is configured
There is no agreed-upon definition of spin_unlock_wait()'s semantics,
and it appears that all callers could do just as well with a lock/unlock
pair. This commit therefore removes spin_unlock_wait() and related
definitions from core code.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Andrea Parri <parri.andrea@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
When XIP_KERNEL is enabled, some functions are defined in the .data
ELF section because we require them to be in RAM whenever we communicate
with the flash chip. However this causes problems when FTRACE is
enabled and gcc emits calls to __gnu_mcount_nc in the function
prolog:
drivers/built-in.o: In function `cfi_chip_setup':
:(.data+0x272fc): relocation truncated to fit: R_ARM_CALL against symbol `__gnu_mcount_nc' defined in .text section in arch/arm/kernel/built-in.o
drivers/built-in.o: In function `cfi_probe_chip':
:(.data+0x27de8): relocation truncated to fit: R_ARM_CALL against symbol `__gnu_mcount_nc' defined in .text section in arch/arm/kernel/built-in.o
/tmp/ccY172rP.s: Assembler messages:
/tmp/ccY172rP.s:70: Warning: ignoring changed section attributes for .data
/tmp/ccY172rP.s: Error: 1 warning, treating warnings as errors
make[5]: *** [drivers/mtd/chips/cfi_probe.o] Error 1
/tmp/ccK4rjeO.s: Assembler messages:
/tmp/ccK4rjeO.s:421: Warning: ignoring changed section attributes for .data
/tmp/ccK4rjeO.s: Error: 1 warning, treating warnings as errors
make[5]: *** [drivers/mtd/chips/cfi_util.o] Error 1
/tmp/ccUvhCYR.s: Assembler messages:
/tmp/ccUvhCYR.s:1895: Warning: ignoring changed section attributes for .data
/tmp/ccUvhCYR.s: Error: 1 warning, treating warnings as errors
Specifically, this does not work because the .data section is not
marked executable, which leads LD to not generate trampolines for
long calls.
This moves the __xipram functions into their own .xiptext section instead.
The section is still placed next to .data and located in RAM but is marked
executable, which avoids the build errors.
Also, we only need to place the XIP functions into a separate section
if both CONFIG_XIP_KERNEL and CONFIG_MTD_XIP are set: When only MTD_XIP
is used, the whole kernel is still in RAM and we do not need to worry
about pulling out the rug under it. When only XIP_KERNEL but not MTD_XIP
is set, the kernel is in some form of ROM, but we never write to it.
Note that MTD_XIP has been broken on ARM since around 2011 or 2012. I
have sent another patch[2] to fix compilation, which I plan to merge
through arm-soc unless there are objections. The obvious alternative
to that would be to completely rip out the MTD_XIP support from the
kernel, since obviously nobody has been using it in a long while.
Link: [1] https://patchwork.kernel.org/patch/8109771/
Link: [2] https://patchwork.kernel.org/patch/9855225/
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Conflicts:
include/linux/mm_types.h
mm/huge_memory.c
I removed the smp_mb__before_spinlock() like the following commit does:
8b1b436dd1 ("mm, locking: Rework {set,clear,mm}_tlb_flush_pending()")
and fixed up the affected commits.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Nadav reported parallel MADV_DONTNEED on same range has a stale TLB
problem and Mel fixed it[1] and found same problem on MADV_FREE[2].
Quote from Mel Gorman:
"The race in question is CPU 0 running madv_free and updating some PTEs
while CPU 1 is also running madv_free and looking at the same PTEs.
CPU 1 may have writable TLB entries for a page but fail the pte_dirty
check (because CPU 0 has updated it already) and potentially fail to
flush.
Hence, when madv_free on CPU 1 returns, there are still potentially
writable TLB entries and the underlying PTE is still present so that a
subsequent write does not necessarily propagate the dirty bit to the
underlying PTE any more. Reclaim at some unknown time at the future
may then see that the PTE is still clean and discard the page even
though a write has happened in the meantime. I think this is possible
but I could have missed some protection in madv_free that prevents it
happening."
This patch aims for solving both problems all at once and is ready for
other problem with KSM, MADV_FREE and soft-dirty story[3].
TLB batch API(tlb_[gather|finish]_mmu] uses [inc|dec]_tlb_flush_pending
and mmu_tlb_flush_pending so that when tlb_finish_mmu is called, we can
catch there are parallel threads going on. In that case, forcefully,
flush TLB to prevent for user to access memory via stale TLB entry
although it fail to gather page table entry.
I confirmed this patch works with [4] test program Nadav gave so this
patch supersedes "mm: Always flush VMA ranges affected by zap_page_range
v2" in current mmotm.
NOTE:
This patch modifies arch-specific TLB gathering interface(x86, ia64,
s390, sh, um). It seems most of architecture are straightforward but
s390 need to be careful because tlb_flush_mmu works only if
mm->context.flush_mm is set to non-zero which happens only a pte entry
really is cleared by ptep_get_and_clear and friends. However, this
problem never changes the pte entries but need to flush to prevent
memory access from stale tlb.
[1] http://lkml.kernel.org/r/20170725101230.5v7gvnjmcnkzzql3@techsingularity.net
[2] http://lkml.kernel.org/r/20170725100722.2dxnmgypmwnrfawp@suse.de
[3] http://lkml.kernel.org/r/BD3A0EBE-ECF4-41D4-87FA-C755EA9AB6BD@gmail.com
[4] https://patchwork.kernel.org/patch/9861621/
[minchan@kernel.org: decrease tlb flush pending count in tlb_finish_mmu]
Link: http://lkml.kernel.org/r/20170808080821.GA31730@bbox
Link: http://lkml.kernel.org/r/20170802000818.4760-7-namit@vmware.com
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Nadav Amit <namit@vmware.com>
Reported-by: Nadav Amit <namit@vmware.com>
Reported-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch is a preparatory patch for solving race problems caused by
TLB batch. For that, we will increase/decrease TLB flush pending count
of mm_struct whenever tlb_[gather|finish]_mmu is called.
Before making it simple, this patch separates architecture specific part
and rename it to arch_tlb_[gather|finish]_mmu and generic part just
calls it.
It shouldn't change any behavior.
Link: http://lkml.kernel.org/r/20170802000818.4760-5-namit@vmware.com
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Nadav Amit <namit@vmware.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Generate irqentry and softirqentry text sections without
any Kconfig dependencies. This will add extra sections, but
there should be no performace impact.
Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: David S . Miller <davem@davemloft.net>
Cc: Francis Deslauriers <francis.deslauriers@efficios.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: linux-arch@vger.kernel.org
Cc: linux-cris-kernel@axis.com
Cc: mathieu.desnoyers@efficios.com
Link: http://lkml.kernel.org/r/150172789110.27216.3955739126693102122.stgit@devbox
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Those architectures that have a special atomic_set implementation also
need a special atomic_set_release(), because for the very same reason
WRITE_ONCE() is broken for them, smp_store_release() is too.
The vast majority is architectures that have spinlock hash based atomic
implementation except hexagon which seems to have a hardware 'feature'.
The spinlock based atomics should be SC, that is, none of them appear to
place extra barriers in atomic_cmpxchg() or any of the other SC atomic
primitives and therefore seem to rely on their spinlock implementation
being SC (I did not fully validate all that).
Therefore, the normal atomic_set() is SC and can be used at
atomic_set_release().
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Chris Metcalf <cmetcalf@mellanox.com> [for tile]
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: davem@davemloft.net
Cc: james.hogan@imgtec.com
Cc: jejb@parisc-linux.org
Cc: rkuo@codeaurora.org
Cc: vgupta@synopsys.com
Link: http://lkml.kernel.org/r/20170609110506.yod47flaav3wgoj5@hirez.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The .data and .bss sections were modified in the generic linker script to
pull in sections named .data.<C identifier>, which are generated by gcc with
-ffunction-sections and -fdata-sections options.
The problem with this pattern is it can also match section names that Linux
defines explicitly, e.g., .data.unlikely. This can cause Linux sections to
get moved into the wrong place.
The way to avoid this is to use ".." separators for explicit section names
(the dot character is valid in a section name but not a C identifier).
However currently there are sections which don't follow this rule, so for
now just disable the wild card by default.
Example: http://marc.info/?l=linux-arm-kernel&m=150106824024221&w=2
Cc: <stable@vger.kernel.org> # 4.9
Fixes: b67067f117 ("kbuild: allow archs to select link dead code/data elimination")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Add the new ORC unwinder which is enabled by CONFIG_ORC_UNWINDER=y.
It plugs into the existing x86 unwinder framework.
It relies on objtool to generate the needed .orc_unwind and
.orc_unwind_ip sections.
For more details on why ORC is used instead of DWARF, see
Documentation/x86/orc-unwinder.txt - but the short version is
that it's a simplified, fundamentally more robust debugninfo
data structure, which also allows up to two orders of magnitude
faster lookups than the DWARF unwinder - which matters to
profiling workloads like perf.
Thanks to Andy Lutomirski for the performance improvement ideas:
splitting the ORC unwind table into two parallel arrays and creating a
fast lookup table to search a subset of the unwind table.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: live-patching@vger.kernel.org
Link: http://lkml.kernel.org/r/0a6cbfb40f8da99b7a45a1a8302dc6aef16ec812.1500938583.git.jpoimboe@redhat.com
[ Extended the changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Generic header defines xlate_dev_kmem_ptr().
Reuse it from generic header and remove in x86 code.
Move a description to the generic header as well.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Baolin Wang <baolin.wang@spreadtrum.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: intel-gfx@lists.freedesktop.org
Cc: linux-i2c@vger.kernel.org
Cc: wsa@the-dreams.de
Link: http://lkml.kernel.org/r/20170630170934.83028-5-andriy.shevchenko@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Generic header defines memset_io, memcpy_fromio(). and memcpy_toio().
Reuse them from generic header and remove in x86 code.
Move the descriptions to the generic header as well.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Baolin Wang <baolin.wang@spreadtrum.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: intel-gfx@lists.freedesktop.org
Cc: linux-i2c@vger.kernel.org
Cc: wsa@the-dreams.de
Link: http://lkml.kernel.org/r/20170630170934.83028-4-andriy.shevchenko@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Add early_memremap() support to be able to specify encrypted and
decrypted mappings with and without write-protection. The use of
write-protection is necessary when encrypting data "in place". The
write-protect attribute is considered cacheable for loads, but not
stores. This implies that the hardware will never give the core a
dirty line with this memtype.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Larry Woodman <lwoodman@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Toshimitsu Kani <toshi.kani@hpe.com>
Cc: kasan-dev@googlegroups.com
Cc: kvm@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-doc@vger.kernel.org
Cc: linux-efi@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/479b5832c30fae3efa7932e48f81794e86397229.1500319216.git.thomas.lendacky@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Changes to the existing page table macros will allow the SME support to
be enabled in a simple fashion with minimal changes to files that use these
macros. Since the memory encryption mask will now be part of the regular
pagetable macros, we introduce two new macros (_PAGE_TABLE_NOENC and
_KERNPG_TABLE_NOENC) to allow for early pagetable creation/initialization
without the encryption mask before SME becomes active. Two new pgprot()
macros are defined to allow setting or clearing the page encryption mask.
The FIXMAP_PAGE_NOCACHE define is introduced for use with MMIO. SME does
not support encryption for MMIO areas so this define removes the encryption
mask from the page attribute.
Two new macros are introduced (__sme_pa() / __sme_pa_nodebug()) to allow
creating a physical address with the encryption mask. These are used when
working with the cr3 register so that the PGD can be encrypted. The current
__va() macro is updated so that the virtual address is generated based off
of the physical address without the encryption mask thus allowing the same
virtual address to be generated regardless of whether encryption is enabled
for that physical location or not.
Also, an early initialization function is added for SME. If SME is active,
this function:
- Updates the early_pmd_flags so that early page faults create mappings
with the encryption mask.
- Updates the __supported_pte_mask to include the encryption mask.
- Updates the protection_map entries to include the encryption mask so
that user-space allocations will automatically have the encryption mask
applied.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Larry Woodman <lwoodman@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Toshimitsu Kani <toshi.kani@hpe.com>
Cc: kasan-dev@googlegroups.com
Cc: kvm@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-doc@vger.kernel.org
Cc: linux-efi@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/b36e952c4c39767ae7f0a41cf5345adf27438480.1500319216.git.thomas.lendacky@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull uacess-unaligned removal from Al Viro:
"That stuff had just one user, and an exotic one, at that - binfmt_flat
on arm and m68k"
* 'work.uaccess-unaligned' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
kill {__,}{get,put}_user_unaligned()
binfmt_flat: flat_{get,put}_addr_from_rp() should be able to fail
This series of patches splits BUILD_BUG related macros out of
"include/linux/bug.h" into new file "include/linux/build_bug.h" (patch
5), and changes the pointer type checking in the `container_of()` macro
to deal with pointers of array type better (patch 6). Patches 1 to 4
are prerequisites.
Patches 2, 3, 4, and 5 have been inserted since the previous version of
this patch series. Patch 6 here corresponds to v3 and v4's patch 2.
Patch 1 was a prerequisite in v3 of this series to avoid a lot of
warnings when <linux/bug.h> was included by <linux/kernel.h>. That is
no longer relevant for v5 of the series, but I left it in because it was
acked by a Arnd Bergmann and Michal Nazarewicz.
Patches 2, 3, and 4 are some checkpatch clean-ups on
"include/linux/bug.h" before splitting out the BUILD_BUG stuff in patch
5.
Patch 5 splits the BUILD_BUG related macros out of "include/linux/bug.h"
into new file "include/linux/build_bug.h" because including
<linux/bug.h> in "include/linux/kernel.h" would result in build failures
due to circular dependencies.
Patch 6 changes the pointer type checking by `container_of()` to avoid
some incompatible pointer warnings when the dereferenced pointer has
array type.
1) asm-generic/bug.h: declare struct pt_regs; before function prototype
2) linux/bug.h: correct formatting of block comment
3) linux/bug.h: correct "(foo*)" should be "(foo *)"
4) linux/bug.h: correct "space required before that '-'"
5) bug: split BUILD_BUG stuff out into <linux/build_bug.h>
6) kernel.h: handle pointers to arrays better in container_of()
This patch (of 6):
The declaration of `__warn()` has `struct pt_regs *regs` as one of its
parameters. This can result in compiler warnings if `struct regs` is not
already declared. Add an empty declaration of `struct pt_regs` to avoid
the warnings.
Link: http://lkml.kernel.org/r/20170525120316.24473-2-abbotti@mev.co.uk
Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Highlights include:
- Support for STRICT_KERNEL_RWX on 64-bit server CPUs.
- Platform support for FSP2 (476fpe) board
- Enable ZONE_DEVICE on 64-bit server CPUs.
- Generic & powerpc spin loop primitives to optimise busy waiting
- Convert VDSO update function to use new update_vsyscall() interface
- Optimisations to hypercall/syscall/context-switch paths
- Improvements to the CPU idle code on Power8 and Power9.
As well as many other fixes and improvements.
Thanks to:
Akshay Adiga, Andrew Donnellan, Andrew Jeffery, Anshuman Khandual, Anton
Blanchard, Balbir Singh, Benjamin Herrenschmidt, Christophe Leroy, Christophe
Lombard, Colin Ian King, Dan Carpenter, Gautham R. Shenoy, Hari Bathini, Ian
Munsie, Ivan Mikhaylov, Javier Martinez Canillas, Madhavan Srinivasan,
Masahiro Yamada, Matt Brown, Michael Neuling, Michal Suchanek, Murilo
Opsfelder Araujo, Naveen N. Rao, Nicholas Piggin, Oliver O'Halloran, Paul
Mackerras, Pavel Machek, Russell Currey, Santosh Sivaraj, Stephen Rothwell,
Thiago Jung Bauermann, Yang Li.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJZXyPCAAoJEFHr6jzI4aWAI9QQAISf2x5y//cqCi4ISyQB5pTq
KLS/yQajNkQOw7c0fzBZOaH5Xd/SJ6AcKWDg8yDlpDR3+sRRsr98iIRECgKS5I7/
DxD9ywcbSoMXFQQo1ZMCp5CeuMUIJRtugBnUQM+JXCSUCPbznCY5DchDTLyTBTpO
MeMVhI//JxthhoOMA9MudiEGaYCU9ho442Z4OJUSiLUv8WRbvQX9pTqoc4vx1fxA
BWf2mflztBVcIfKIyxIIIlDLukkMzix6gSYPMCbC7lzkbnU7JSqKiheJXjo1gJS2
ePHKDxeNR2/QP0g/j3aT/MR1uTt9MaNBSX3gANE1xQ9OoJ8m1sOtCO4gNbSdLWae
eXhDnoiEp930DRZOeEioOItuWWoxFaMyYk3BMmRKV4QNdYL3y3TRVeL2/XmRGqYL
Lxz4IY/x5TteFEJNGcRX90uizNSi8AaEXPF16pUk8Ctt6eH3ZSwPMv2fHeYVCMr0
KFlKHyaPEKEoztyzLcUR6u9QB56yxDN58bvLpd32AeHvKhqyxFoySy59x9bZbatn
B2y8mmDItg860e0tIG6jrtplpOVvL8i5jla5RWFVoQDuxxrLAds3vG9JZQs+eRzx
Fiic93bqeUAS6RzhXbJ6QQJYIyhE7yqpcgv7ME1W87SPef3HPBk9xlp3yIDwdA2z
bBDvrRnvupusz8qCWrxe
=w8rj
-----END PGP SIGNATURE-----
Merge tag 'powerpc-4.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
"Highlights include:
- Support for STRICT_KERNEL_RWX on 64-bit server CPUs.
- Platform support for FSP2 (476fpe) board
- Enable ZONE_DEVICE on 64-bit server CPUs.
- Generic & powerpc spin loop primitives to optimise busy waiting
- Convert VDSO update function to use new update_vsyscall() interface
- Optimisations to hypercall/syscall/context-switch paths
- Improvements to the CPU idle code on Power8 and Power9.
As well as many other fixes and improvements.
Thanks to: Akshay Adiga, Andrew Donnellan, Andrew Jeffery, Anshuman
Khandual, Anton Blanchard, Balbir Singh, Benjamin Herrenschmidt,
Christophe Leroy, Christophe Lombard, Colin Ian King, Dan Carpenter,
Gautham R. Shenoy, Hari Bathini, Ian Munsie, Ivan Mikhaylov, Javier
Martinez Canillas, Madhavan Srinivasan, Masahiro Yamada, Matt Brown,
Michael Neuling, Michal Suchanek, Murilo Opsfelder Araujo, Naveen N.
Rao, Nicholas Piggin, Oliver O'Halloran, Paul Mackerras, Pavel Machek,
Russell Currey, Santosh Sivaraj, Stephen Rothwell, Thiago Jung
Bauermann, Yang Li"
* tag 'powerpc-4.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (158 commits)
powerpc/Kconfig: Enable STRICT_KERNEL_RWX for some configs
powerpc/mm/radix: Implement STRICT_RWX/mark_rodata_ro() for Radix
powerpc/mm/hash: Implement mark_rodata_ro() for hash
powerpc/vmlinux.lds: Align __init_begin to 16M
powerpc/lib/code-patching: Use alternate map for patch_instruction()
powerpc/xmon: Add patch_instruction() support for xmon
powerpc/kprobes/optprobes: Use patch_instruction()
powerpc/kprobes: Move kprobes over to patch_instruction()
powerpc/mm/radix: Fix execute permissions for interrupt_vectors
powerpc/pseries: Fix passing of pp0 in updatepp() and updateboltedpp()
powerpc/64s: Blacklist rtas entry/exit from kprobes
powerpc/64s: Blacklist functions invoked on a trap
powerpc/64s: Un-blacklist system_call() from kprobes
powerpc/64s: Move system_call() symbol to just after setting MSR_EE
powerpc/64s: Blacklist system_call() and system_call_common() from kprobes
powerpc/64s: Convert .L__replay_interrupt_return to a local label
powerpc64/elfv1: Only dereference function descriptor for non-text symbols
cxl: Export library to support IBM XSL
powerpc/dts: Use #include "..." to include local DT
powerpc/perf/hv-24x7: Aggregate result elements on POWER9 SMT8
...
Merge misc updates from Andrew Morton:
- a few hotfixes
- various misc updates
- ocfs2 updates
- most of MM
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (108 commits)
mm, memory_hotplug: move movable_node to the hotplug proper
mm, memory_hotplug: drop CONFIG_MOVABLE_NODE
mm, memory_hotplug: drop artificial restriction on online/offline
mm: memcontrol: account slab stats per lruvec
mm: memcontrol: per-lruvec stats infrastructure
mm: memcontrol: use generic mod_memcg_page_state for kmem pages
mm: memcontrol: use the node-native slab memory counters
mm: vmstat: move slab statistics from zone to node counters
mm/zswap.c: delete an error message for a failed memory allocation in zswap_dstmem_prepare()
mm/zswap.c: improve a size determination in zswap_frontswap_init()
mm/zswap.c: delete an error message for a failed memory allocation in zswap_pool_create()
mm/swapfile.c: sort swap entries before free
mm/oom_kill: count global and memory cgroup oom kills
mm: per-cgroup memory reclaim stats
mm: kmemleak: treat vm_struct as alternative reference to vmalloc'ed objects
mm: kmemleak: factor object reference updating out of scan_block()
mm: kmemleak: slightly reduce the size of some structures on 64-bit architectures
mm, mempolicy: don't check cpuset seqlock where it doesn't matter
mm, cpuset: always use seqlock when changing task's nodemask
mm, mempolicy: simplify rebinding mempolicies when updating cpusets
...
Pull user access str* updates from Al Viro:
"uaccess str...() dead code removal"
* 'uaccess.strlen' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
s390 keyboard.c: don't open-code strndup_user()
mips: get rid of unused __strnlen_user()
get rid of unused __strncpy_from_user() instances
kill strlen_user()
- Added TRACE_DEFINE_SIZEOF() which allows trace events that use
sizeof() it the TP_printk() to be converted to the actual size such
that trace-cmd and perf can parse them correctly.
- Some rework of the TRACE_DEFINE_ENUM() such that the above
TRACE_DEFINE_SIZEOF() could reuse the same code.
- Recording of tgid (Thread Group ID). This is similar to how
task COMMs are recorded (cached at sched_switch), where it is
in a table and used on output of the trace and trace_pipe files.
- Have ":mod:<module>" be cached when written into set_ftrace_filter.
Then the functions of the module will be traced at module load.
- Some random clean ups and small fixes.
-----BEGIN PGP SIGNATURE-----
iQExBAABCAAbBQJZXjYuFBxyb3N0ZWR0QGdvb2RtaXMub3JnAAoJEMm5BfJq2Y3L
fsgIAKUvhpn2igoYCR9tWqu+DovEmwxCIumbCzmCFQcRKlLttRte94yY5+W9hnV0
JPzd9T9zBDVqq1fI7iIop1SuTwEfKW6lJom0usZ8AFpK+YKm6FHnQ28POlvHzre2
lzO41tpRWiehLQsITZ47eByhsvEfhx86mYT/oM1JSR6Pii1OpjyNYmDMw6BaMNBT
kSCQFgIhzAhVuHjwAnB/S++E/ou7M5bCwCb5CNh7MubKubV5upHpoJcgYGO+WWa6
56H/iEhff4EECTGJVefd8e78MtJPL8EsuM0nAcMPlnl8AaiOpP7XCdlgTwdefLvP
b3o+nP15voSHkARGXC6eM6gH0po=
=rvGB
-----END PGP SIGNATURE-----
Merge tag 'trace-v4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing updates from Steven Rostedt:
"The new features of this release:
- Added TRACE_DEFINE_SIZEOF() which allows trace events that use
sizeof() it the TP_printk() to be converted to the actual size such
that trace-cmd and perf can parse them correctly.
- Some rework of the TRACE_DEFINE_ENUM() such that the above
TRACE_DEFINE_SIZEOF() could reuse the same code.
- Recording of tgid (Thread Group ID). This is similar to how task
COMMs are recorded (cached at sched_switch), where it is in a table
and used on output of the trace and trace_pipe files.
- Have ":mod:<module>" be cached when written into set_ftrace_filter.
Then the functions of the module will be traced at module load.
- Some random clean ups and small fixes"
* tag 'trace-v4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (26 commits)
ftrace: Test for NULL iter->tr in regex for stack_trace_filter changes
ftrace: Decrement count for dyn_ftrace_total_info for init functions
ftrace: Unlock hash mutex on failed allocation in process_mod_list()
tracing: Add support for display of tgid in trace output
tracing: Add support for recording tgid of tasks
ftrace: Decrement count for dyn_ftrace_total_info file
ftrace: Remove unused function ftrace_arch_read_dyn_info()
sh/ftrace: Remove only user of ftrace_arch_read_dyn_info()
ftrace: Have cached module filters be an active filter
ftrace: Implement cached modules tracing on module load
ftrace: Have the cached module list show in set_ftrace_filter
ftrace: Add :mod: caching infrastructure to trace_array
tracing: Show address when function names are not found
ftrace: Add missing comment for FTRACE_OPS_FL_RCU
tracing: Rename update the enum_map file
tracing: Add TRACE_DEFINE_SIZEOF() macros
tracing: define TRACE_DEFINE_SIZEOF() macro to map sizeof's to their values
tracing: Rename enum_replace to eval_replace
trace: rename enum_map functions
trace: rename trace.c enum functions
...
When unmapping a hugepage range, huge_pte_clear() is used to clear the
page table entries that are marked as not present. huge_pte_clear()
internally just ends up calling pte_clear() which does not correctly
deal with hugepages consisting of contiguous page table entries.
Add a size argument to address this issue and allow architectures to
override huge_pte_clear() by wrapping it in a #ifndef block.
Update s390 implementation with the size parameter as well.
Note that the change only affects huge_pte_clear() - the other generic
hugetlb functions don't need any change.
Link: http://lkml.kernel.org/r/20170522162555.4313-1-punit.agrawal@arm.com
Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> [s390 bits]
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Steve Capper <steve.capper@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The config option name is now renamed to 'TIMER_OF' for consistency with
the CLOCKSOURCE_OF_DECLARE => TIMER_OF_DECLARE change.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
The table name is now renamed to 'timer' for consistency with
the CLOCKSOURCE_OF_DECLARE => TIMER_OF_DECLARE change.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
The kernel and its modules have sections containing the enum
string to value conversions. Rename this section because we
intend to store more than enums in it.
Link: http://lkml.kernel.org/r/20170531215653.3240-2-jeremy.linton@arm.com
Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
After discussing it, this feature is dropped as it is not considered
adequate:
https://patchwork.kernel.org/patch/9639317/
There is no user of this macro yet, so there is no impact on the drivers.
This reverts commit 376bc27150.
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
By moving the kernel side __SI_* defintions right next to the userspace
ones we can kill the non-uapi versions of <asm/siginfo.h> include
include/asm-generic/siginfo.h and untangle the unholy mess of includes.
[ tglx: Removed uapi/asm/siginfo.h from m32r, microblaze, mn10300 and score ]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-ia64@vger.kernel.org
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: sparclinux@vger.kernel.org
Cc: "David S. Miller" <davem@davemloft.net>
Link: http://lkml.kernel.org/r/20170603190102.28866-6-hch@lst.de
Having it in asm-generic/siginfo.h doesn't make any sense as it is in no way
architecture specific. Move it to signal.h instead where several related
functions already reside.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-ia64@vger.kernel.org
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: sparclinux@vger.kernel.org
Cc: "David S. Miller" <davem@davemloft.net>
Link: http://lkml.kernel.org/r/20170603190102.28866-5-hch@lst.de
Add --orphan-handling=warn to final link flags. This ensures we can
handle all sections explicitly. This would have caught subtle breakage
such as 7de3b27bac at build-time.
Also bring existing orphan sections into the fold:
- .text.hot and .text.unlikely are compiler generated sections.
- .sdata2, .dynsbss, .plt are used by PPC32
- We previously did not specify DWARF_DEBUG or STABS_DEBUG
- DWARF_DEBUG did not include all DWARF sections that can be emitted
- A number of sections are unused and can be discarded.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Improvement of headers_install by Nicolas Dichtel.
It has been long since the introduction of uapi directories,
but the de-coupling of exported headers has not been completed.
Headers listed in header-y are exported whether they exist in
uapi directories or not. His work fixes this inconsistency.
All (and only) headers under uapi directories are now exported.
The asm-generic wrappers are still exceptions, but this is a big
step forward.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJZE7MBAAoJED2LAQed4NsGroAP/iARejrIFmxuH96D5h2aiP1j
c8KHQ+5fuq4w2KBmfbfkNvWbazlVheT6RrYWBUh/GABGsSqQC07d8New6B8TaUkE
K0E48RsuYxouP18Ys6BOO4/zyRhEFD7Ta72PGQ/gDQY+6hAu4jYQnMdG0wipTblS
QWgnUxTqfCbTjnRpRKXpcwRff+OeTWtOv3s0V8UashJUxnFVQ7Br2uRsm/KKkU/k
jQC65KyHL4HlsFeeAiMmQ9IQPVwLsd6+d5crs0nydHaJ2XrFlNNQ7EEMyG8FxPdx
9b/VpS+XY6DO+jeqkcpFrdL9IgcmCn72Qc5/4vrHuQO2dpWW5mVaVPq9RAGP0Yq/
FB0vZRTp/tOIkD+0esirZW2gJtU3DWMY1A9rc5jjLRabdnRXVTdLfhEnksYJEfES
yPbDEuKyzo6a+zBSqNtMquJPmYVYEDS2mcmgxY5sB58qtXkUN2Yr+uUALxC8XhXW
SHHwIAV3a+UX5ZU9Ys8dp2hI4EXYXtdvsz2zvl4qPIn/Q9d1YoEJRe7/Y0p8gBXM
5pVJ1yohKoYrNZVGBe0LO/gHGVAVgMj0cKn0Xg51bbvjxY2U5djUbMY0uw1gFrrM
O9ld3C6O8zH5BsExCfwp9iPz2SW5W9N80kgnKfjCHBRUKuMTkm02DJf8Hx+pyfVQ
DCy9lYTi76IgZ1uflKq9
=Rqdo
-----END PGP SIGNATURE-----
Merge tag 'kbuild-uapi-v4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild UAPI updates from Masahiro Yamada:
"Improvement of headers_install by Nicolas Dichtel.
It has been long since the introduction of uapi directories, but the
de-coupling of exported headers has not been completed. Headers listed
in header-y are exported whether they exist in uapi directories or
not. His work fixes this inconsistency.
All (and only) headers under uapi directories are now exported. The
asm-generic wrappers are still exceptions, but this is a big step
forward"
* tag 'kbuild-uapi-v4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
arch/include: remove empty Kbuild files
uapi: export all arch specifics directories
uapi: export all headers under uapi directories
smc_diag.h: fix include from userland
btrfs_tree.h: fix include from userland
uapi: includes linux/types.h before exporting files
Makefile.headersinst: remove destination-y option
Makefile.headersinst: cleanup input files
x86: stop exporting msr-index.h to userland
nios2: put setup.h in uapi
h8300: put bitsperlong.h in uapi
Regularly, when a new header is created in include/uapi/, the developer
forgets to add it in the corresponding Kbuild file. This error is usually
detected after the release is out.
In fact, all headers under uapi directories should be exported, thus it's
useless to have an exhaustive list.
After this patch, the following files, which were not exported, are now
exported (with make headers_install_all):
asm-arc/kvm_para.h
asm-arc/ucontext.h
asm-blackfin/shmparam.h
asm-blackfin/ucontext.h
asm-c6x/shmparam.h
asm-c6x/ucontext.h
asm-cris/kvm_para.h
asm-h8300/shmparam.h
asm-h8300/ucontext.h
asm-hexagon/shmparam.h
asm-m32r/kvm_para.h
asm-m68k/kvm_para.h
asm-m68k/shmparam.h
asm-metag/kvm_para.h
asm-metag/shmparam.h
asm-metag/ucontext.h
asm-mips/hwcap.h
asm-mips/reg.h
asm-mips/ucontext.h
asm-nios2/kvm_para.h
asm-nios2/ucontext.h
asm-openrisc/shmparam.h
asm-parisc/kvm_para.h
asm-powerpc/perf_regs.h
asm-sh/kvm_para.h
asm-sh/ucontext.h
asm-tile/shmparam.h
asm-unicore32/shmparam.h
asm-unicore32/ucontext.h
asm-x86/hwcap2.h
asm-xtensa/kvm_para.h
drm/armada_drm.h
drm/etnaviv_drm.h
drm/vgem_drm.h
linux/aspeed-lpc-ctrl.h
linux/auto_dev-ioctl.h
linux/bcache.h
linux/btrfs_tree.h
linux/can/vxcan.h
linux/cifs/cifs_mount.h
linux/coresight-stm.h
linux/cryptouser.h
linux/fsmap.h
linux/genwqe/genwqe_card.h
linux/hash_info.h
linux/kcm.h
linux/kcov.h
linux/kfd_ioctl.h
linux/lightnvm.h
linux/module.h
linux/nbd-netlink.h
linux/nilfs2_api.h
linux/nilfs2_ondisk.h
linux/nsfs.h
linux/pr.h
linux/qrtr.h
linux/rpmsg.h
linux/sched/types.h
linux/sed-opal.h
linux/smc.h
linux/smc_diag.h
linux/stm.h
linux/switchtec_ioctl.h
linux/vfio_ccw.h
linux/wil6210_uapi.h
rdma/bnxt_re-abi.h
Note that I have removed from this list the files which are generated in every
exported directories (like .install or .install.cmd).
Thanks to Julien Floret <julien.floret@6wind.com> for the tip to get all
subdirs with a pure makefile command.
For the record, note that exported files for asm directories are a mix of
files listed by:
- include/uapi/asm-generic/Kbuild.asm;
- arch/<arch>/include/uapi/asm/Kbuild;
- arch/<arch>/include/asm/Kbuild.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
Acked-by: Mark Salter <msalter@redhat.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
This includes:
* Some code optimizations for the Intel VT-d driver
* Code to switch off a previously enabled Intel IOMMU
* Support for 'struct iommu_device' for OMAP, Rockchip and
Mediatek IOMMUs
* Some header optimizations for IOMMU core code headers and a
few fixes that became necessary in other parts of the kernel
because of that
* ACPI/IORT updates and fixes
* Some Exynos IOMMU optimizations
* Code updates for the IOMMU dma-api code to bring it closer to
use per-cpu iova caches
* New command-line option to set default domain type allocated
by the iommu core code
* Another command line option to allow the Intel IOMMU switched
off in a tboot environment
* ARM/SMMU: TLB sync optimisations for SMMUv2, Support for using
an IDENTITY domain in conjunction with DMA ops, Support for
SMR masking, Support for 16-bit ASIDs (was previously broken)
* Various other small fixes and improvements
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABAgAGBQJZEY4XAAoJECvwRC2XARrjth0QAKV56zjnFclv39aDo6eCq9CT
51+XT4bPY5VKQ2+Jx76TBNObHmGK+8KEMHfT9khpWJtFCDyy25SGckLry1nYqmZs
tSTsbj4sOeCyKzOLITlRN9/OzKXkjKAxYuq+sQZZFDFYf3kCM/eag0dGAU6aVLNp
tkIal3CSpGjCQ9M5JohrtQ1mwiGqCIkMIgvnBjRw+bfpLnQNG+VL6VU2G3RAkV2b
5Vbdoy+P7ZQnJSZr/bibYL2BaQs2diR4gOppT5YbsfniMq4QYSjheu1xBboGX8b7
sx8yuPi4370irSan0BDvlvdQdjBKIRiDjfGEKDhRwPhtvN6JREGakhEOC8MySQ37
mP96B72Lmd+a7DEl5udOL7tQILA0DcUCX0aOyF714khnZuFU5tVlCotb/36xeJ+T
FPc3RbEVQ90m8dYU6MNJ+ahtb/ZapxGTRfisIigB6wlnZa0Evabp9EJSce6oJMkm
whbBhDubeEU18n9XAaofMbu+P2LAzq8cxiRMlsDvT4mIy7jO86jjCmhpu1Tfn2GY
4wrEQZdWOMvhUsIhObXA0aC3BzC506uvnKPW3qy041RaxBuelWiBi29qzYbhxzkr
DLDpWbUZNYPyFJjttpavyQb2/XRduBTJdVP1pQpkJNDsW5jLiBkpSqm9xNADapRY
vLSYRX0JCIquaD+PAuxn
=3aE8
-----END PGP SIGNATURE-----
Merge tag 'iommu-updates-v4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull IOMMU updates from Joerg Roedel:
- code optimizations for the Intel VT-d driver
- ability to switch off a previously enabled Intel IOMMU
- support for 'struct iommu_device' for OMAP, Rockchip and Mediatek
IOMMUs
- header optimizations for IOMMU core code headers and a few fixes that
became necessary in other parts of the kernel because of that
- ACPI/IORT updates and fixes
- Exynos IOMMU optimizations
- updates for the IOMMU dma-api code to bring it closer to use per-cpu
iova caches
- new command-line option to set default domain type allocated by the
iommu core code
- another command line option to allow the Intel IOMMU switched off in
a tboot environment
- ARM/SMMU: TLB sync optimisations for SMMUv2, Support for using an
IDENTITY domain in conjunction with DMA ops, Support for SMR masking,
Support for 16-bit ASIDs (was previously broken)
- various other small fixes and improvements
* tag 'iommu-updates-v4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (63 commits)
soc/qbman: Move dma-mapping.h include to qman_priv.h
soc/qbman: Fix implicit header dependency now causing build fails
iommu: Remove trace-events include from iommu.h
iommu: Remove pci.h include from trace/events/iommu.h
arm: dma-mapping: Don't override dma_ops in arch_setup_dma_ops()
ACPI/IORT: Fix CONFIG_IOMMU_API dependency
iommu/vt-d: Don't print the failure message when booting non-kdump kernel
iommu: Move report_iommu_fault() to iommu.c
iommu: Include device.h in iommu.h
x86, iommu/vt-d: Add an option to disable Intel IOMMU force on
iommu/arm-smmu: Return IOVA in iova_to_phys when SMMU is bypassed
iommu/arm-smmu: Correct sid to mask
iommu/amd: Fix incorrect error handling in amd_iommu_bind_pasid()
iommu: Make iommu_bus_notifier return NOTIFY_DONE rather than error code
omap3isp: Remove iommu_group related code
iommu/omap: Add iommu-group support
iommu/omap: Make use of 'struct iommu_device'
iommu/omap: Store iommu_dev pointer in arch_data
iommu/omap: Move data structures to omap-iommu.h
iommu/omap: Drop legacy-style device support
...
Patch series "set_memory_* functions header refactor", v3.
The set_memory_* APIs came out of a desire to have a better way to
change memory attributes. Many of these attributes were linked to cache
functionality so the prototypes were put in cacheflush.h. These days,
the APIs have grown and have a much wider use than just cache APIs. To
support this growth, split off set_memory_* and friends into a separate
header file to avoid growing cacheflush.h for APIs that have nothing to
do with caches.
Link: http://lkml.kernel.org/r/1488920133-27229-2-git-send-email-labbott@redhat.com
Signed-off-by: Laura Abbott <labbott@redhat.com>
Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull x86 mm updates from Ingo Molnar:
"The main x86 MM changes in this cycle were:
- continued native kernel PCID support preparation patches to the TLB
flushing code (Andy Lutomirski)
- various fixes related to 32-bit compat syscall returning address
over 4Gb in applications, launched from 64-bit binaries - motivated
by C/R frameworks such as Virtuozzo. (Dmitry Safonov)
- continued Intel 5-level paging enablement: in particular the
conversion of x86 GUP to the generic GUP code. (Kirill A. Shutemov)
- x86/mpx ABI corner case fixes/enhancements (Joerg Roedel)
- ... plus misc updates, fixes and cleanups"
* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (62 commits)
mm, zone_device: Replace {get, put}_zone_device_page() with a single reference to fix pmem crash
x86/mm: Fix flush_tlb_page() on Xen
x86/mm: Make flush_tlb_mm_range() more predictable
x86/mm: Remove flush_tlb() and flush_tlb_current_task()
x86/vm86/32: Switch to flush_tlb_mm_range() in mark_screen_rdonly()
x86/mm/64: Fix crash in remove_pagetable()
Revert "x86/mm/gup: Switch GUP to the generic get_user_page_fast() implementation"
x86/boot/e820: Remove a redundant self assignment
x86/mm: Fix dump pagetables for 4 levels of page tables
x86/mpx, selftests: Only check bounds-vs-shadow when we keep shadow
x86/mpx: Correctly report do_mpx_bt_fault() failures to user-space
Revert "x86/mm/numa: Remove numa_nodemask_from_meminfo()"
x86/espfix: Add support for 5-level paging
x86/kasan: Extend KASAN to support 5-level paging
x86/mm: Add basic defines/helpers for CONFIG_X86_5LEVEL=y
x86/paravirt: Add 5-level support to the paravirt code
x86/mm: Define virtual memory map for 5-level paging
x86/asm: Remove __VIRTUAL_MASK_SHIFT==47 assert
x86/boot: Detect 5-level paging support
x86/mm/numa: Remove numa_nodemask_from_meminfo()
...
Pull x86 asm updates from Ingo Molnar:
"The main changes in this cycle were:
- unwinder fixes and enhancements
- improve ftrace interaction with the unwinder
- optimize the code footprint of WARN() and related debugging
constructs
- ... plus misc updates, cleanups and fixes"
* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
x86/unwind: Dump all stacks in unwind_dump()
x86/unwind: Silence more entry-code related warnings
x86/ftrace: Fix ebp in ftrace_regs_caller that screws up unwinder
x86/unwind: Remove unused 'sp' parameter in unwind_dump()
x86/unwind: Prepend hex mask value with '0x' in unwind_dump()
x86/unwind: Properly zero-pad 32-bit values in unwind_dump()
x86/unwind: Ensure stack pointer is aligned
debug: Avoid setting BUGFLAG_WARNING twice
x86/unwind: Silence entry-related warnings
x86/unwind: Read stack return address in update_stack_state()
x86/unwind: Move common code into update_stack_state()
debug: Fix __bug_table[] in arch linker scripts
debug: Add _ONCE() logic to report_bug()
x86/debug: Define BUG() again for !CONFIG_BUG
x86/debug: Implement __WARN() using UD0
x86/ftrace: Use Makefile logic instead of #ifdef for compiling ftrace_*.o
x86/ftrace: Add -mfentry support to x86_32 with DYNAMIC_FTRACE set
x86/ftrace: Clean up ftrace_regs_caller
x86/ftrace: Add stack frame pointer to ftrace_caller
x86/ftrace: Move the ftrace specific code out of entry_32.S
...
Pull uaccess unification updates from Al Viro:
"This is the uaccess unification pile. It's _not_ the end of uaccess
work, but the next batch of that will go into the next cycle. This one
mostly takes copy_from_user() and friends out of arch/* and gets the
zero-padding behaviour in sync for all architectures.
Dealing with the nocache/writethrough mess is for the next cycle;
fortunately, that's x86-only. Same for cleanups in iov_iter.c (I am
sold on access_ok() in there, BTW; just not in this pile), same for
reducing __copy_... callsites, strn*... stuff, etc. - there will be a
pile about as large as this one in the next merge window.
This one sat in -next for weeks. -3KLoC"
* 'work.uaccess' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (96 commits)
HAVE_ARCH_HARDENED_USERCOPY is unconditional now
CONFIG_ARCH_HAS_RAW_COPY_USER is unconditional now
m32r: switch to RAW_COPY_USER
hexagon: switch to RAW_COPY_USER
microblaze: switch to RAW_COPY_USER
get rid of padding, switch to RAW_COPY_USER
ia64: get rid of copy_in_user()
ia64: sanitize __access_ok()
ia64: get rid of 'segment' argument of __do_{get,put}_user()
ia64: get rid of 'segment' argument of __{get,put}_user_check()
ia64: add extable.h
powerpc: get rid of zeroing, switch to RAW_COPY_USER
esas2r: don't open-code memdup_user()
alpha: fix stack smashing in old_adjtimex(2)
don't open-code kernel_setsockopt()
mips: switch to RAW_COPY_USER
mips: get rid of tail-zeroing in primitives
mips: make copy_from_user() zero tail explicitly
mips: clean and reorder the forest of macros...
mips: consolidate __invoke_... wrappers
...
The IORT linker section introduced by commit 34ceea275f
("ACPI/IORT: Introduce linker section for IORT entries probing")
was needed to make sure SMMU drivers are registered (and therefore
probed) in the kernel before devices using the SMMU have a chance
to probe in turn.
Through the introduction of deferred IOMMU configuration the linker
section based IORT probing infrastructure is not needed any longer, in
that device/SMMU probe dependencies are managed through the probe
deferral mechanism, making the IORT linker section infrastructure
unused, so that it can be removed.
Remove the unused IORT linker section probing infrastructure
from the kernel to complete the ACPI IORT IOMMU configure probe
deferral mechanism implementation.
Tested-by: Hanjun Guo <hanjun.guo@linaro.org>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Sricharan R <sricharan@codeaurora.org>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Dan reported that his static checking complains about BUGFLAG_WARNING
being set on both sides of the bitwise-or, it figures that that might've
been an unintentional mistake.
Since there are no architectures that implement __WARN_TAINT() (I
converted them all to implement __WARN_FLAGS()), and all __WARN_FLAGS()
implementations already set BUGFLAG_WARNING, we can remove the bit from
BUGFLAG_TAINT() and make Dan's checker happy.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170410084939.4bwhrvpmauwfzauq@hirez.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
When __{start,end}_ro_after_init is referenced from C code, we run into
the following build errors on blackfin:
kernel/extable.c:169: undefined reference to `__start_ro_after_init'
kernel/extable.c:169: undefined reference to `__end_ro_after_init'
The build error is due to the fact that blackfin is one of the few
arches that prepends an underscore '_' to all symbols defined in C.
Fix this by wrapping __{start,end}_ro_after_init in vmlinux.lds.h with
VMLINUX_SYMBOL(), which adds the necessary prefix for arches that have
HAVE_UNDERSCORE_SYMBOL_PREFIX.
Link: http://lkml.kernel.org/r/1491259387-15869-1-git-send-email-jeyu@redhat.com
Signed-off-by: Jessica Yu <jeyu@redhat.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Eddie Kovsky <ewk@edkovsky.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull timer fixes from Thomas Gleixner:
"Two small fixes for the new CLKEVT_OF infrastructure"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
vmlinux.lds: Add __clkevt_of_table to kernel
clockevents: Fix syntax error in clkevt-of macro
A section name for .data..ro_after_init was added by both:
commit d07a980c1b ("s390: add proper __ro_after_init support")
and
commit d7c19b066d ("mm: kmemleak: scan .data.ro_after_init")
The latter adds incorrect wrapping around the existing s390 section, and
came later. I'd prefer the s390 naming, so this moves the s390-specific
name up to the asm-generic/sections.h and renames the section as used by
kmemleak (and in the future, kernel/extable.c).
Link: http://lkml.kernel.org/r/20170327192213.GA129375@beast
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> [s390 parts]
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Cc: Eddie Kovsky <ewk@edkovsky.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Josh suggested moving the _ONCE logic inside the trap handler, using a
bit in the bug_entry::flags field, avoiding the need for the extra
variable.
Sadly this only works for WARN_ON_ONCE(), since the others have
printk() statements prior to triggering the trap.
Still, this saves a fair amount of text and some data:
text data filename
10682460 4530992 defconfig-build/vmlinux.orig
10665111 4530096 defconfig-build/vmlinux.patched
Suggested-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The code introduced by commit 0c8893c9095d ("clockevents: Add a
clkevt-of mechanism like clksrc-of") refer to __clkevt_of_table
what doesn't exist in the vmlinux. As a result kernel build
failed with error: "clkevt-probe.c:63: undefined reference to
`__clkevt_of_table’"
Fixes: 0c8893c9095d ("clockevents: Add a clkevt-of mechanism like clksrc-of")
Signed-off-by: Alexander Kochetkov <al.kochet@gmail.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
This is a preparation patch for the transition of x86 to the generic GUP_fast()
implementation.
On x86, we would need to do additional permission checks to determine if
access is allowed.
Let's abstract it out into separate helpers.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Aneesh Kumar K . V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dann Frazier <dann.frazier@canonical.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Steve Capper <steve.capper@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20170316152655.37789-3-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The only arch that defines it to something meaningful is x86.
But x86 doesn't use the generic GUP_fast() implementation -- the
only place where the callback is called.
Let's drop it.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Aneesh Kumar K . V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dann Frazier <dann.frazier@canonical.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Steve Capper <steve.capper@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20170316152655.37789-2-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Convert all non-architecture-specific code to 5-level paging.
It's mostly mechanical adding handling one more page table level in
places where we deal with pud_t.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Like with pgtable-nopud.h for 4-level paging, this new header is base
for converting an architectures to properly folded p4d_t level.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We are going to introduce <asm-generic/pgtable-nop4d.h> to provide
abstraction for properly (in opposite to 5level-fixup.h hack) folded
p4d level. The new header will be included from pgtable-nopud.h.
If an architecture uses <asm-generic/nop*d.h>, we cannot use
5level-fixup.h directly to quickly convert the architecture to 5-level
paging as it would conflict with pgtable-nop4d.h.
With this patch an architecture can define __ARCH_USE_5LEVEL_HACK before
inclusion <asm-genenric/nop*d.h> to use 5level-fixup.h.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We are going to switch core MM to 5-level paging abstraction.
This is preparation step which adds <asm-generic/5level-fixup.h>
As with 4level-fixup.h, the new header allows quickly make all
architectures compatible with 5-level paging in core MM.
In long run we would like to switch architectures to properly folded p4d
level by using <asm-generic/pgtable-nop4d.h>, but it requires more
changes to arch-specific code.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Often all is needed is these small helpers, instead of compiler.h or a
full kprobes.h. This is important for asm helpers, in fact even some
asm/kprobes.h make use of these helpers... instead just keep a generic
asm file with helpers useful for asm code with the least amount of
clutter as possible.
Likewise we need now to also address what to do about this file for both
when architectures have CONFIG_HAVE_KPROBES, and when they do not. Then
for when architectures have CONFIG_HAVE_KPROBES but have disabled
CONFIG_KPROBES.
Right now most asm/kprobes.h do not have guards against CONFIG_KPROBES,
this means most architecture code cannot include asm/kprobes.h safely.
Correct this and add guards for architectures missing them.
Additionally provide architectures that not have kprobes support with
the default asm-generic solution. This lets us force asm/kprobes.h on
the header include/linux/kprobes.h always, but most importantly we can
now safely include just asm/kprobes.h on architecture code without
bringing the full kitchen sink of header files.
Two architectures already provided a guard against CONFIG_KPROBES on its
kprobes.h: sh, arch. The rest of the architectures needed gaurds added.
We avoid including any not-needed headers on asm/kprobes.h unless
kprobes have been enabled.
In a subsequent atomic change we can try now to remove compiler.h from
include/linux/kprobes.h.
During this sweep I've also identified a few architectures defining a
common macro needed for both kprobes and ftrace, that of the definition
of the breakput instruction up. Some refer to this as
BREAKPOINT_INSTRUCTION. This must be kept outside of the #ifdef
CONFIG_KPROBES guard.
[mcgrof@kernel.org: fix arm64 build]
Link: http://lkml.kernel.org/r/CAB=NE6X1WMByuARS4mZ1g9+W=LuVBnMDnh_5zyN0CLADaVh=Jw@mail.gmail.com
[sfr@canb.auug.org.au: fixup for kprobes declarations moving]
Link: http://lkml.kernel.org/r/20170214165933.13ebd4f4@canb.auug.org.au
Link: http://lkml.kernel.org/r/20170203233139.32682-1-mcgrof@kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge more updates from Andrew Morton:
- almost all of the rest of MM
- misc bits
- KASAN updates
- procfs
- lib/ updates
- checkpatch updates
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (124 commits)
checkpatch: remove false unbalanced braces warning
checkpatch: notice unbalanced else braces in a patch
checkpatch: add another old address for the FSF
checkpatch: update $logFunctions
checkpatch: warn on logging continuations
checkpatch: warn on embedded function names
lib/lz4: remove back-compat wrappers
fs/pstore: fs/squashfs: change usage of LZ4 to work with new LZ4 version
crypto: change LZ4 modules to work with new LZ4 module version
lib/decompress_unlz4: change module to work with new LZ4 module version
lib: update LZ4 compressor module
lib/test_sort.c: make it explicitly non-modular
lib: add CONFIG_TEST_SORT to enable self-test of sort()
rbtree: use designated initializers
linux/kernel.h: fix DIV_ROUND_CLOSEST to support negative divisors
lib/find_bit.c: micro-optimise find_next_*_bit
lib: add module support to atomic64 tests
lib: add module support to glob tests
lib: add module support to crc32 tests
kernel/ksysfs.c: add __ro_after_init to bin_attribute structure
...
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJYsL7RAAoJEMOzHC1eZifkXmgP/1HI2c+UeFbV9RVy0AjwQZr0
wgE0qM3ICD1ZV+gNqP4zN0NAIECq1p3qLAV94P+mkfpFc04FZcdysME8ZgvNyzTN
BlELpElVKpE9KqlEE7beoybpyc2GlCaieEFB1ouj8lW/nUSc4apqHDh/OVqRssfu
QyQ3FDIt6ejvMzq6XRFYJtxL3e8qwAhlIiQBxFYR2b+9MoPgz7Ctf0j0a2SGeTFD
UmUz5IGkMXicv91YJeD7vb0iP0ujpbFrXGXneSLS6InzS7h5cZGw1O9B2cyVbBMN
unA02MVc/9rz/96J6MIC1GnlI5IuWvsbC0nIRkTukHFVQEcrxmX5BO9iQ47dJ0+n
XpCvhRm7djrdkqRsGx8y9xG9hAVXu8a2EyWDz91dJLU8JpTkHkErOLBvmKBEp3kE
/K+k75SGIVKpy7hU9buyMmlOhQY09sdzdzfyqCaXD7H4nImShZGmcrZV10mBlZtJ
7k+Ke+fm0EpEewOVyOzwEGjTPr3LDzYCpm2qjOG9/zZd6MX0da8eH2X7DzB9GDWW
yR5UiGbtFVE0M13tfE58aVueVRt0o+wLwRZZYur643aLPyypxHWTl3RAhuhv9NzD
TaB8n3J6kDt+E5ivzPm2439AM/p9ZcyNGScUSxo84lwXW74ozcpfz029Yvhw52F9
cD4kHEFG7+OYdbZmwV3C
=wmBO
-----END PGP SIGNATURE-----
Merge tag 'openrisc-for-linus' of git://github.com/openrisc/linux
Pull OpenRISC updates from Stafford Horne:
"Highlights include:
- optimized memset and memcpy routines, ~20% boot time saving
- support for cpu idling
- adding support for l.swa and l.lwa atomic operations (in spec from
2014)
- use atomics to implement: bitops, cmpxchg, futex
- the atomics are in preparation for SMP support"
* tag 'openrisc-for-linus' of git://github.com/openrisc/linux: (25 commits)
openrisc: head: Init r0 to 0 on start
openrisc: Export ioremap symbols used by modules
arch/openrisc/lib/memcpy.c: use correct OR1200 option
openrisc: head: Remove unused strings
openrisc: head: Move init strings to rodata section
openrisc: entry: Fix delay slot detection
openrisc: entry: Whitespace and comment cleanups
scripts/checkstack.pl: Add openrisc support
MAINTAINERS: Add the openrisc official repository
openrisc: Add .gitignore
openrisc: Add optimized memcpy routine
openrisc: Add optimized memset
openrisc: Initial support for the idle state
openrisc: Fix the bitmask for the unit present register
openrisc: remove unnecessary stddef.h include
openrisc: add futex_atomic_* implementations
openrisc: add optimized atomic operations
openrisc: add cmpxchg and xchg implementations
openrisc: add atomic bitops
openrisc: add l.lwa/l.swa emulation
...
Without this KSM will consider the page write protected, but a numa
fault can later mark the page writable. This can result in memory
corruption.
Link: http://lkml.kernel.org/r/1487498625-10891-3-git-send-email-aneesh.kumar@linux.vnet.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "Numabalancing preserve write fix", v2.
This patch series address an issue w.r.t THP migration and autonuma
preserve write feature. migrate_misplaced_transhuge_page() cannot deal
with concurrent modification of the page. It does a page copy without
following the migration pte sequence. IIUC, this was done to keep the
migration simpler and at the time of implemenation we didn't had THP
page cache which would have required a more elaborate migration scheme.
That means thp autonuma migration expect the protnone with saved write
to be done such that both kernel and user cannot update the page
content. This patch series enables archs like ppc64 to do that. We are
good with the hash translation mode with the current code, because we
never create a hardware page table entry for a protnone pte.
This patch (of 2):
Autonuma preserves the write permission across numa fault to avoid
taking a writefault after a numa fault (Commit: b191f9b106 " mm: numa:
preserve PTE write permissions across a NUMA hinting fault").
Architecture can implement protnone in different ways and some may
choose to implement that by clearing Read/ Write/Exec bit of pte.
Setting the write bit on such pte can result in wrong behaviour. Fix
this up by allowing arch to override how to save the write bit on a
protnone pte.
[aneesh.kumar@linux.vnet.ibm.com: don't mark pte saved write in case of dirty_accountable]
Link: http://lkml.kernel.org/r/1487942884-16517-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com
[aneesh.kumar@linux.vnet.ibm.com: v3]
Link: http://lkml.kernel.org/r/1487498625-10891-2-git-send-email-aneesh.kumar@linux.vnet.ibm.com
Link: http://lkml.kernel.org/r/1487050314-3892-2-git-send-email-aneesh.kumar@linux.vnet.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Michael Neuling <mikey@neuling.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michael Ellerman <michaele@au1.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The current transparent hugepage code only supports PMDs. This patch
adds support for transparent use of PUDs with DAX. It does not include
support for anonymous pages. x86 support code also added.
Most of this patch simply parallels the work that was done for huge
PMDs. The only major difference is how the new ->pud_entry method in
mm_walk works. The ->pmd_entry method replaces the ->pte_entry method,
whereas the ->pud_entry method works along with either ->pmd_entry or
->pte_entry. The pagewalk code takes care of locking the PUD before
calling ->pud_walk, so handlers do not need to worry whether the PUD is
stable.
[dave.jiang@intel.com: fix SMP x86 32bit build for native_pud_clear()]
Link: http://lkml.kernel.org/r/148719066814.31111.3239231168815337012.stgit@djiang5-desk3.ch.intel.com
[dave.jiang@intel.com: native_pud_clear missing on i386 build]
Link: http://lkml.kernel.org/r/148640375195.69754.3315433724330910314.stgit@djiang5-desk3.ch.intel.com
Link: http://lkml.kernel.org/r/148545059381.17912.8602162635537598445.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Tested-by: Alexander Kapshuk <alexander.kapshuk@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Jan Kara <jack@suse.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Nilesh Choudhury <nilesh.choudhury@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Using the l.lwa and l.swa atomic instruction pair.
Most openrisc processor cores provide these instructions now. If the
instructions are not available emulation is provided.
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
[shorne@gmail.com: remove OPENRISC_HAVE_INST_LWA_SWA config suggesed by
Alan Cox https://lkml.org/lkml/2014/7/23/666]
[shorne@gmail.com: expand to implement all ops suggested by Peter
Zijlstra https://lkml.org/lkml/2017/2/20/317]
Signed-off-by: Stafford Horne <shorne@gmail.com>
Pull locking updates from Ingo Molnar:
"The main changes in this cycle were:
- Implement wraparound-safe refcount_t and kref_t types based on
generic atomic primitives (Peter Zijlstra)
- Improve and fix the ww_mutex code (Nicolai Hähnle)
- Add self-tests to the ww_mutex code (Chris Wilson)
- Optimize percpu-rwsems with the 'rcuwait' mechanism (Davidlohr
Bueso)
- Micro-optimize the current-task logic all around the core kernel
(Davidlohr Bueso)
- Tidy up after recent optimizations: remove stale code and APIs,
clean up the code (Waiman Long)
- ... plus misc fixes, updates and cleanups"
* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (50 commits)
fork: Fix task_struct alignment
locking/spinlock/debug: Remove spinlock lockup detection code
lockdep: Fix incorrect condition to print bug msgs for MAX_LOCKDEP_CHAIN_HLOCKS
lkdtm: Convert to refcount_t testing
kref: Implement 'struct kref' using refcount_t
refcount_t: Introduce a special purpose refcount type
sched/wake_q: Clarify queue reinit comment
sched/wait, rcuwait: Fix typo in comment
locking/mutex: Fix lockdep_assert_held() fail
locking/rtmutex: Flip unlikely() branch to likely() in __rt_mutex_slowlock()
locking/rwsem: Reinit wake_q after use
locking/rwsem: Remove unnecessary atomic_long_t casts
jump_labels: Move header guard #endif down where it belongs
locking/atomic, kref: Implement kref_put_lock()
locking/ww_mutex: Turn off __must_check for now
locking/atomic, kref: Avoid more abuse
locking/atomic, kref: Use kref_get_unless_zero() more
locking/atomic, kref: Kill kref_sub()
locking/atomic, kref: Add kref_read()
locking/atomic, kref: Add KREF_INIT()
...
Pull scheduler updates from Ingo Molnar:
"The main changes in this (fairly busy) cycle were:
- There was a class of scheduler bugs related to forgetting to update
the rq-clock timestamp which can cause weird and hard to debug
problems, so there's a new debug facility for this: which uncovered
a whole lot of bugs which convinced us that we want to keep the
debug facility.
(Peter Zijlstra, Matt Fleming)
- Various cputime related updates: eliminate cputime and use u64
nanoseconds directly, simplify and improve the arch interfaces,
implement delayed accounting more widely, etc. - (Frederic
Weisbecker)
- Move code around for better structure plus cleanups (Ingo Molnar)
- Move IO schedule accounting deeper into the scheduler plus related
changes to improve the situation (Tejun Heo)
- ... plus a round of sched/rt and sched/deadline fixes, plus other
fixes, updats and cleanups"
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (85 commits)
sched/core: Remove unlikely() annotation from sched_move_task()
sched/autogroup: Rename auto_group.[ch] to autogroup.[ch]
sched/topology: Split out scheduler topology code from core.c into topology.c
sched/core: Remove unnecessary #include headers
sched/rq_clock: Consolidate the ordering of the rq_clock methods
delayacct: Include <uapi/linux/taskstats.h>
sched/core: Clean up comments
sched/rt: Show the 'sched_rr_timeslice' SCHED_RR timeslice tuning knob in milliseconds
sched/clock: Add dummy clear_sched_clock_stable() stub function
sched/cputime: Remove generic asm headers
sched/cputime: Remove unused nsec_to_cputime()
s390, sched/cputime: Remove unused cputime definitions
powerpc, sched/cputime: Remove unused cputime definitions
s390, sched/cputime: Make arch_cpu_idle_time() to return nsecs
ia64, sched/cputime: Remove unused cputime definitions
ia64: Convert vtime to use nsec units directly
ia64, sched/cputime: Move the nsecs based cputime headers to the last arch using it
sched/cputime: Remove jiffies based cputime
sched/cputime, vtime: Return nsecs instead of cputime_t to account
sched/cputime: Complete nsec conversion of tick based accounting
...
The modversion symbol CRCs are emitted as ELF symbols, which allows us
to easily populate the kcrctab sections by relying on the linker to
associate each kcrctab slot with the correct value.
This has a couple of downsides:
- Given that the CRCs are treated as memory addresses, we waste 4 bytes
for each CRC on 64 bit architectures,
- On architectures that support runtime relocation, a R_<arch>_RELATIVE
relocation entry is emitted for each CRC value, which identifies it
as a quantity that requires fixing up based on the actual runtime
load offset of the kernel. This results in corrupted CRCs unless we
explicitly undo the fixup (and this is currently being handled in the
core module code)
- Such runtime relocation entries take up 24 bytes of __init space
each, resulting in a x8 overhead in [uncompressed] kernel size for
CRCs.
Switching to explicit 32 bit values on 64 bit architectures fixes most
of these issues, given that 32 bit values are not treated as quantities
that require fixing up based on the actual runtime load offset. Note
that on some ELF64 architectures [such as PPC64], these 32-bit values
are still emitted as [absolute] runtime relocatable quantities, even if
the value resolves to a build time constant. Since relative relocations
are always resolved at build time, this patch enables MODULE_REL_CRCS on
powerpc when CONFIG_RELOCATABLE=y, which turns the absolute CRC
references into relative references into .rodata where the actual CRC
value is stored.
So redefine all CRC fields and variables as u32, and redefine the
__CRC_SYMBOL() macro for 64 bit builds to emit the CRC reference using
inline assembler (which is necessary since 64-bit C code cannot use
32-bit types to hold memory addresses, even if they are ultimately
resolved using values that do not exceed 0xffffffff). To avoid
potential problems with legacy 32-bit architectures using legacy
toolchains, the equivalent C definition of the kcrctab entry is retained
for 32-bit architectures.
Note that this mostly reverts commit d4703aefdb ("module: handle ppc64
relocating kcrctabs when CONFIG_RELOCATABLE=y")
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
cputime_t is now only used by two architectures:
* powerpc (when CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y)
* s390
And since the core doesn't use it anymore, we don't need any arch support
from the others. So we can remove their stub implementations.
A final cleanup would be to provide an efficient pure arch
implementation of cputime_to_nsec() for s390 and powerpc and finally
remove include/linux/cputime.h .
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Link: http://lkml.kernel.org/r/1485832191-26889-36-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Initially, nsec based cputime_t implementation belonged to IA64. It got
exported later for CONFIG_VIRT_CPU_ACCOUNTING_GEN but now it is again
only used by IA64. So let's move it back there.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Link: http://lkml.kernel.org/r/1485832191-26889-29-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This cputime_t implementation is now unused, we can remove it.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Link: http://lkml.kernel.org/r/1485832191-26889-28-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Since sem->count had been changed to a atomic_long_t type, it is no
longer necessary to use the atomic_long_t cast anymore. So remove them.
Signed-off-by: Waiman Long <longman@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Link: http://lkml.kernel.org/r/1484836312-6656-1-git-send-email-longman@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The asm-prototypes.h file is used to provide dummy function declarations
for genksyms, when processing asm files with EXPORT_SYMBOL. Make sure
that any architecture defines get out of our way. x86 currently has an
issue with memcpy on 64bit with CONFIG_KMEMCHECK=y and with
memset/__memset on 32bit:
$ cat init/test.c
#include <asm/asm-prototypes.h>
$ make -s init/test.o
In file included from ./arch/x86/include/asm/string.h:4:0,
from ./include/linux/string.h:18,
from ./include/linux/bitmap.h:8,
from ./include/linux/cpumask.h:11,
from ./arch/x86/include/asm/cpumask.h:4,
from ./arch/x86/include/asm/msr.h:10,
from ./arch/x86/include/asm/processor.h:20,
from ./arch/x86/include/asm/cpufeature.h:4,
from ./arch/x86/include/asm/thread_info.h:52,
from ./include/linux/thread_info.h:25,
from ./arch/x86/include/asm/preempt.h:6,
from ./include/linux/preempt.h:59,
from ./include/linux/spinlock.h:50,
from ./include/linux/seqlock.h:35,
from ./include/linux/time.h:5,
from ./include/uapi/linux/timex.h:56,
from ./include/linux/timex.h:56,
from ./include/linux/sched.h:19,
from ./include/linux/uaccess.h:4,
from ./arch/x86/include/asm/asm-prototypes.h:2,
from init/test.c:1:
./arch/x86/include/asm/string_64.h:52:47: error: expected declaration specifiers or ‘...’ before ‘(’ token
#define memcpy(dst, src, len) __inline_memcpy((dst), (src), (len))
./include/asm-generic/asm-prototypes.h:6:14: note: in expansion of macro ‘memcpy’
extern void *memcpy(void *, const void *, __kernel_size_t);
^
...
During real build, this manifests itself by genksyms segfaulting.
Fixes: 334bb77387 ("x86/kbuild: enable modversions for symbols exported from asm")
Reported-and-tested-by: Borislav Petkov <bp@alien8.de>
Cc: Adam Borowski <kilobyte@angband.pl>
Signed-off-by: Michal Marek <mmarek@suse.com>
This was entirely automated, using the script by Al:
PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>'
sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \
$(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h)
to do the replacement at the end of the merge window.
Requested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull kbuild updates from Michal Marek:
- prototypes for x86 asm-exported symbols (Adam Borowski) and a warning
about missing CRCs (Nick Piggin)
- asm-exports fix for LTO (Nicolas Pitre)
- thin archives improvements (Nick Piggin)
- linker script fix for CONFIG_LD_DEAD_CODE_DATA_ELIMINATION (Nick
Piggin)
- genksyms support for __builtin_va_list keyword
- misc minor fixes
* 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
x86/kbuild: enable modversions for symbols exported from asm
kbuild: fix scripts/adjust_autoksyms.sh* for the no modules case
scripts/kallsyms: remove last remnants of --page-offset option
make use of make variable CURDIR instead of calling pwd
kbuild: cmd_export_list: tighten the sed script
kbuild: minor improvement for thin archives build
kbuild: modpost warn if export version crc is missing
kbuild: keep data tables through dead code elimination
kbuild: improve linker compatibility with lib-ksyms.o build
genksyms: Regenerate parser
kbuild/genksyms: handle va_list type
kbuild: thin archives for multi-y targets
kbuild: kallsyms allow 3-pass generation if symbols size has changed
These changes include:
* Support for the ACPI IORT table on ARM systems and patches to
make the ARM-SMMU driver make use of it
* Conversion of the Exynos IOMMU driver to device dependency
links and implementation of runtime pm support based on that
conversion
* Update the Mediatek IOMMU driver to use the new
struct device->iommu_fwspec member
* Implementation of dma_map/unmap_resource in the generic ARM
dma-iommu layer
* A number of smaller fixes and improvements all over the place
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABAgAGBQJYUskLAAoJECvwRC2XARrjGnQQALfaIJffrh+5vjs08h3J86TF
lUEvxu1/5PGCwXEWbIH+NQh4pMgBuOL0exa6sR55CqtHPkUKMndv+sVXWUkCrl9Z
Q4jro+IuPxbHHdQvULjdqn1gtvPvBXN/OFCbQsfj6G+GblAZCIn0OtkKm2MqdGa4
xHPTAhW+QAt7X8O7PoMsvCNa0loZJx05ToIPbhN23U0H12TwU4aHaMg13+KSGgz/
wEXGDYDU8EnGubcM/JEpy84qywRU1D+TDwzaIFT2sy7Ewhmgz6rSf+SOa+vn4u5z
G9xhc6b8Vq+2ZKr1j28pPdrCYMRpQxBWru7t2rcdV0Xa+J1JEPKJfyEPxeldDeXJ
4bqld/D/0nq7MFvGL73qff+oiU2qOJ1VnPrQO8SbHyueeFd4axMYlPsU7GeDWqoz
LX1gKuHE1t2GqsNgn/dLouyJGDKjyYzZbio0HuPHGjegkx+dkTXEM0yzLtZJG8hw
cZSlYijrRJeEROpM8/5BGid3BOAIx8qlOXO/QzI41e0KnQikkgQBgr1L2+Ki8ZaB
mNs/S/BqDr6LSUP5ArQHlZ0wu/pk1ehwYkmzp9j7Ui1jGdSWwbqw7KkLDXd0pL4g
NMhsprJLYlnY2GUdrbcDOEWHS9Ex0vRsPKNWoqCggzg/8h8cQmuiDZO346vhlvq/
ORMwDO7LNZuPHqDM0WA+
=EYLL
-----END PGP SIGNATURE-----
Merge tag 'iommu-updates-v4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull IOMMU updates from Joerg Roedel:
"These changes include:
- support for the ACPI IORT table on ARM systems and patches to make
the ARM-SMMU driver make use of it
- conversion of the Exynos IOMMU driver to device dependency links
and implementation of runtime pm support based on that conversion
- update the Mediatek IOMMU driver to use the new struct
device->iommu_fwspec member
- implementation of dma_map/unmap_resource in the generic ARM
dma-iommu layer
- a number of smaller fixes and improvements all over the place"
* tag 'iommu-updates-v4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (44 commits)
ACPI/IORT: Make dma masks set-up IORT specific
iommu/amd: Missing error code in amd_iommu_init_device()
iommu/s390: Drop duplicate header pci.h
ACPI/IORT: Introduce iort_iommu_configure
ACPI/IORT: Add single mapping function
ACPI/IORT: Replace rid map type with type mask
iommu/arm-smmu: Add IORT configuration
iommu/arm-smmu: Split probe functions into DT/generic portions
iommu/arm-smmu-v3: Add IORT configuration
iommu/arm-smmu-v3: Split probe functions into DT/generic portions
ACPI/IORT: Add support for ARM SMMU platform devices creation
ACPI/IORT: Add node match function
ACPI: Implement acpi_dma_configure
iommu/arm-smmu-v3: Convert struct device of_node to fwnode usage
iommu/arm-smmu: Convert struct device of_node to fwnode usage
iommu: Make of_iommu_set/get_ops() DT agnostic
ACPI/IORT: Add support for IOMMU fwnode registration
ACPI/IORT: Introduce linker section for IORT entries probing
ACPI: Add FWNODE_ACPI_STATIC fwnode type
iommu/arm-smmu: Set SMTNMB_TLBEN in ACR to enable caching of bypass entries
...
Commit 4efca4ed ("kbuild: modversions for EXPORT_SYMBOL() for asm") adds
modversion support for symbols exported from asm files. Architectures
must include C-style declarations for those symbols in asm/asm-prototypes.h
in order for them to be versioned.
Add these declarations for x86, and an architecture-independent file that
can be used for common symbols.
With f27c2f6 reverting 8ab2ae6 ("default exported asm symbols to zero") we
produce a scary warning on x86, this commit fixes that.
Signed-off-by: Adam Borowski <kilobyte@angband.pl>
Tested-by: Kalle Valo <kvalo@codeaurora.org>
Acked-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Peter Wu <peter@lekensteyn.nl>
Tested-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Michal Marek <mmarek@suse.com>
Add arch specific callback in the generic THP page cache code that will
deposit and withdarw preallocated page table. Archs like ppc64 use this
preallocated table to store the hash pte slot information.
Testing:
kernel build of the patch series on tmpfs mounted with option huge=always
The related thp stat:
thp_fault_alloc 72939
thp_fault_fallback 60547
thp_collapse_alloc 603
thp_collapse_alloc_failed 0
thp_file_alloc 253763
thp_file_mapped 4251
thp_split_page 51518
thp_split_page_failed 1
thp_deferred_split_page 73566
thp_split_pmd 665
thp_zero_page_alloc 3
thp_zero_page_alloc_failed 0
[akpm@linux-foundation.org: remove unneeded parentheses, per Kirill]
Link: http://lkml.kernel.org/r/20161113150025.17942-2-aneesh.kumar@linux.vnet.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Independent of whether the vma is for anonymous memory, some arches like
ppc64 would like to override pmd_move_must_withdraw().
One option is to encapsulate the vma_is_anonymous() check for general
architectures inside pmd_move_must_withdraw() so that is always called
and architectures that need unconditional overriding can override this
function. ppc64 needs to override the function when the MMU is
configured to use hash PTE's.
[bsingharora@gmail.com: reworked changelog]
Link: http://lkml.kernel.org/r/20161113150025.17942-1-aneesh.kumar@linux.vnet.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Now that we check for page size change early in the loop, we can
partially revert e9d55e1570 ("mm: change the interface for
__tlb_remove_page").
This simplies the code much, by removing the need to track the last
address with which we adjusted the range. We also go back to the older
way of filling the mmu_gather array, ie, we add an entry and then check
whether the gather batch is full.
Link: http://lkml.kernel.org/r/20161026084839.27299-6-aneesh.kumar@linux.vnet.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
With commit e77b0852b5 ("mm/mmu_gather: track page size with mmu
gather and force flush if page size change") we added the ability to
force a tlb flush when the page size change in a mmu_gather loop. We
did that by checking for a page size change every time we added a page
to mmu_gather for lazy flush/remove. We can improve that by moving the
page size change check early and not doing it every time we add a page.
This also helps us to do tlb flush when invalidating a range covering
dax mapping. Wrt dax mapping we don't have a backing struct page and
hence we don't call tlb_remove_page, which earlier forced the tlb flush
on page size change. Moving the page size change check earlier means we
will do the same even for dax mapping.
We also avoid doing this check on architecture other than powerpc.
In a later patch we will remove page size check from tlb_remove_page().
Link: http://lkml.kernel.org/r/20161026084839.27299-5-aneesh.kumar@linux.vnet.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This add tlb_remove_hugetlb_entry similar to tlb_remove_pmd_tlb_entry.
Link: http://lkml.kernel.org/r/20161026084839.27299-4-aneesh.kumar@linux.vnet.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We use __tlb_adjust_range to update range convered by mmu_gather struct.
We later use the 'start' and 'end' to do a mmu_notifier_invalidate_range
in tlb_flush_mmu_tlbonly(). Update the 'end' correctly in
__tlb_adjust_range so that we call mmu_notifier_invalidate_range with
the correct range values.
Wrt tlbflush, this should not have any impact, because a flush with
correct start address will flush tlb mapping for the range.
Also add comment w.r.t updating the range when we free pagetable pages.
For now we don't support a range based page table cache flush.
Link: http://lkml.kernel.org/r/20161026084839.27299-3-aneesh.kumar@linux.vnet.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull scheduler updates from Ingo Molnar:
"The main scheduler changes in this cycle were:
- support Intel Turbo Boost Max Technology 3.0 (TBM3) by introducig a
notion of 'better cores', which the scheduler will prefer to
schedule single threaded workloads on. (Tim Chen, Srinivas
Pandruvada)
- enhance the handling of asymmetric capacity CPUs further (Morten
Rasmussen)
- improve/fix load handling when moving tasks between task groups
(Vincent Guittot)
- simplify and clean up the cputime code (Stanislaw Gruszka)
- improve mass fork()ed task spread a.k.a. hackbench speedup (Vincent
Guittot)
- make struct kthread kmalloc()ed and related fixes (Oleg Nesterov)
- add uaccess atomicity debugging (when using access_ok() in the
wrong context), under CONFIG_DEBUG_ATOMIC_SLEEP=y (Peter Zijlstra)
- implement various fixes, cleanups and other enhancements (Daniel
Bristot de Oliveira, Martin Schwidefsky, Rafael J. Wysocki)"
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (41 commits)
sched/core: Use load_avg for selecting idlest group
sched/core: Fix find_idlest_group() for fork
kthread: Don't abuse kthread_create_on_cpu() in __kthread_create_worker()
kthread: Don't use to_live_kthread() in kthread_[un]park()
kthread: Don't use to_live_kthread() in kthread_stop()
Revert "kthread: Pin the stack via try_get_task_stack()/put_task_stack() in to_live_kthread() function"
kthread: Make struct kthread kmalloc'ed
x86/uaccess, sched/preempt: Verify access_ok() context
sched/x86: Make CONFIG_SCHED_MC_PRIO=y easier to enable
sched/x86: Change CONFIG_SCHED_ITMT to CONFIG_SCHED_MC_PRIO
x86/sched: Use #include <linux/mutex.h> instead of #include <asm/mutex.h>
cpufreq/intel_pstate: Use CPPC to get max performance
acpi/bus: Set _OSC for diverse core support
acpi/bus: Enable HWP CPPC objects
x86/sched: Add SD_ASYM_PACKING flags to x86 ITMT CPU
x86/sysctl: Add sysctl for ITMT scheduling feature
x86: Enable Intel Turbo Boost Max Technology 3.0
x86/topology: Define x86's arch_update_cpu_topology
sched: Extend scheduler's asym packing
sched/fair: Clean up the tunable parameter definitions
...
Pull mm/PAT cleanup from Ingo Molnar:
"A single cleanup for a generic interface that was originally
introduced for PAT"
* 'mm-pat-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/pat, mm: Make track_pfn_insert() return void
With binutils-2.26 and before, a weak missing symbol was kept during the
final link, and a missing CRC for an export would lead to that CRC being
treated as zero implicitly. With binutils-2.27, the crc symbol gets
dropped, and any module trying to use it will fail to load.
This sets the weak CRC symbol to zero explicitly, making it defined in
vmlinux, which in turn lets us load the modules referring to that CRC.
The comment above the __CRC_SYMBOL macro suggests that this was always
the intention, although it also seems that all symbols defined in C have
a correct CRC these days, and only the exports that are now done in
assembly need this.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Tested-by: Adam Borowski <kilobyte@angband.pl>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Since commit e647b53227 ("ACPI: Add early device probing
infrastructure") the kernel has gained the infrastructure that allows
adding linker script section entries to execute ACPI driver callbacks
(ie probe routines) for all subsystems that register a table entry
in the respective kernel section (eg clocksource, irqchip).
Since ARM IOMMU devices data is described through IORT tables when
booting with ACPI, the ARM IOMMU drivers must be made able to hook ACPI
callback routines that are called to probe IORT entries and initialize
the respective IOMMU devices.
To avoid adding driver specific hooks into IORT table initialization
code (breaking therefore code modularity - ie ACPI IORT code must be made
aware of ARM SMMU drivers ACPI init callbacks), this patch adds code
that allows ARM SMMU drivers to take advantage of the ACPI early probing
infrastructure, so that they can add linker script section entries
containing drivers callback to be executed on IORT tables detection.
Since IORT nodes are differentiated by a type, the callback routines
can easily parse the IORT table entries, check the IORT nodes and
carry out some actions whenever the IORT node type associated with
the driver specific callback is matched.
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Hanjun Guo <hanjun.guo@linaro.org>
Reviewed-by: Tomasz Nowicki <tn@semihalf.com>
Tested-by: Hanjun Guo <hanjun.guo@linaro.org>
Tested-by: Tomasz Nowicki <tn@semihalf.com>
Cc: Tomasz Nowicki <tn@semihalf.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
When CONFIG_LD_DEAD_CODE_DATA_ELIMINATION is enabled we must ensure
that we still keep various programatically-accessed tables.
[npiggin: Fold Paul's patches into one, and add a few more tables.
diff symbol tables of allyesconfig with/without -gc-sections shows up
lost tables quite easily.]
Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michal Marek <mmarek@suse.com>
Currently cputime_to_scaled() just return it's argument on
all implementations, we don't need to call this function.
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Reviewed-by: Paul Mackerras <paulus@ozlabs.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1479175612-14718-3-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Limit the number of kmemleak false positives by including
.data.ro_after_init in memory scanning. To achieve this we need to add
symbols for start and end of the section to the linker scripts.
The problem was been uncovered by commit 56989f6d85 ("genetlink: mark
families as __ro_after_init").
Link: http://lkml.kernel.org/r/1478274173-15218-1-git-send-email-jakub.kicinski@netronome.com
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 345ddcc882 ("ftrace: Have set_ftrace_pid use the bitmap like
events do") added a couple of this_cpu_read calls to the ftrace code.
On x86 this is not a problem, since it has single instructions to read
percpu data. Other architectures which use the generic variant now
have additional preempt_disable and preempt_enable calls in the core
ftrace code. This may lead to recursive calls and in result to a dead
machine, e.g. if preemption and debugging options are enabled.
To fix this use the notrace variant of preempt_disable and
preempt_enable within the generic percpu code.
Reported-and-bisected-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Tested-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Fixes: 345ddcc882 ("ftrace: Have set_ftrace_pid use the bitmap like events do")
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The use of config_enabled() is ambiguous. For config options,
IS_ENABLED(), IS_REACHABLE(), etc. will make intention clearer.
Sometimes config_enabled() has been used for non-config options because
it is useful to check whether the given symbol is defined or not.
I have been tackling on deprecating config_enabled(), and now is the
time to finish this work.
Some new users have appeared for v4.9-rc1, but it is trivial to replace
them:
- arch/x86/mm/kaslr.c
replace config_enabled() with IS_ENABLED() because
CONFIG_X86_ESPFIX64 and CONFIG_EFI are boolean.
- include/asm-generic/export.h
replace config_enabled() with __is_defined().
Then, config_enabled() can be removed now.
Going forward, please use IS_ENABLED(), IS_REACHABLE(), etc. for config
options, and __is_defined() for non-config symbols.
Link: http://lkml.kernel.org/r/1476616078-32252-1-git-send-email-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
Cc: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Michal Marek <mmarek@suse.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Its all generic atomic_long_t stuff now.
Tested-by: Jason Low <jason.low2@hpe.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull kbuild updates from Michal Marek:
- EXPORT_SYMBOL for asm source by Al Viro.
This does bring a regression, because genksyms no longer generates
checksums for these symbols (CONFIG_MODVERSIONS). Nick Piggin is
working on a patch to fix this.
Plus, we are talking about functions like strcpy(), which rarely
change prototypes.
- Fixes for PPC fallout of the above by Stephen Rothwell and Nick
Piggin
- fixdep speedup by Alexey Dobriyan.
- preparatory work by Nick Piggin to allow architectures to build with
-ffunction-sections, -fdata-sections and --gc-sections
- CONFIG_THIN_ARCHIVES support by Stephen Rothwell
- fix for filenames with colons in the initramfs source by me.
* 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild: (22 commits)
initramfs: Escape colons in depfile
ppc: there is no clear_pages to export
powerpc/64: whitelist unresolved modversions CRCs
kbuild: -ffunction-sections fix for archs with conflicting sections
kbuild: add arch specific post-link Makefile
kbuild: allow archs to select link dead code/data elimination
kbuild: allow architectures to use thin archives instead of ld -r
kbuild: Regenerate genksyms lexer
kbuild: genksyms fix for typeof handling
fixdep: faster CONFIG_ search
ia64: move exports to definitions
sparc32: debride memcpy.S a bit
[sparc] unify 32bit and 64bit string.h
sparc: move exports to definitions
ppc: move exports to definitions
arm: move exports to definitions
s390: move exports to definitions
m68k: move exports to definitions
alpha: move exports to actual definitions
x86: move exports to actual definitions
...
Pull percpu updates from Tejun Heo:
- Nick improved generic implementations of percpu operations which
modify the variable and return so that they calculate the physical
address only once.
- percpu_ref percpu <-> atomic mode switching improvements. The
patchset was originally posted about a year ago but fell through the
crack.
- misc non-critical fixes.
* 'for-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
mm/percpu.c: fix potential memory leakage for pcpu_embed_first_chunk()
mm/percpu.c: correct max_distance calculation for pcpu_embed_first_chunk()
percpu: eliminate two sparse warnings
percpu: improve generic percpu modify-return implementation
percpu-refcount: init ->confirm_switch member properly
percpu_ref: allow operation mode switching operations to be called concurrently
percpu_ref: restructure operation mode switching
percpu_ref: unify staggered atomic switching wait behavior
percpu_ref: reorganize __percpu_ref_switch_to_atomic() and relocate percpu_ref_switch_to_atomic()
percpu_ref: remove unnecessary RCU grace period for staggered atomic switching confirmation
Pull libata updates from Tejun Heo:
- Write same support added
- Minor ahci MSIX irq handling updates
- Non-critical SCSI command translation fixes
- Controller specific changes
* 'for-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
ahci: qoriq: Revert "ahci: qoriq: Disable NCQ on ls2080a SoC"
libata: remove <asm-generic/libata-portmap.h>
libata: remove unused definitions from <asm/libata-portmap.h>
pata_at91: Use PTR_ERR_OR_ZERO rather than if(IS_ERR(...)) + PTR_ERR
ata: Replace BUG() with BUG_ON().
ata: sata_mv: Replacing dma_pool_alloc and memset with a single call dma_pool_zalloc.
libata: Some drives failing on SCT Write Same
ahci: use pci_alloc_irq_vectors
libata: SCT Write Same handle ATA_DFLAG_PIO
libata: SCT Write Same / DSM Trim
libata: Add support for SCT Write Same
libata: Safely overwrite attached page in WRITE SAME xlat
ahci: also use a per-port lock for the multi-MSIX case
ARM: dts: STiH407-family: Add ports-implemented property in sata nodes
ahci: st: Add ports-implemented property in support
ahci: qoriq: enable snoopable sata read and write
ahci: qoriq: adjust sata parameter
libata-scsi: fix MODE SELECT translation for Control mode page
libata-scsi: use u8 array to store mode page copy
Pull uaccess.h prepwork from Al Viro:
"Preparations to tree-wide switch to use of linux/uaccess.h (which,
obviously, will allow to start unifying stuff for real). The last step
there, ie
PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>'
sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \
`git grep -l "$PATT"|grep -v ^include/linux/uaccess.h`
is not taken here - I would prefer to do it once just before or just
after -rc1. However, everything should be ready for it"
* 'work.uaccess2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
remove a stray reference to asm/uaccess.h in docs
sparc64: separate extable_64.h, switch elf_64.h to it
score: separate extable.h, switch module.h to it
mips: separate extable.h, switch module.h to it
x86: separate extable.h, switch sections.h to it
remove stray include of asm/uaccess.h from cacheflush.h
mn10300: remove a bogus processor.h->uaccess.h include
xtensa: split uaccess.h into C and asm sides
bonding: quit messing with IOCTL
kill __kernel_ds_p off
mn10300: finish verify_area() off
frv: move HAVE_ARCH_UNMAPPED_AREA to pgtable.h
exceptions: detritus removal
When doing an nmi backtrace of many cores, most of which are idle, the
output is a little overwhelming and very uninformative. Suppress
messages for cpus that are idling when they are interrupted and just
emit one line, "NMI backtrace for N skipped: idling at pc 0xNNN".
We do this by grouping all the cpuidle code together into a new
.cpuidle.text section, and then checking the address of the interrupted
PC to see if it lies within that section.
This commit suitably tags x86 and tile idle routines, and only adds in
the minimal framework for other architectures.
Link: http://lkml.kernel.org/r/1472487169-14923-5-git-send-email-cmetcalf@mellanox.com
Signed-off-by: Chris Metcalf <cmetcalf@mellanox.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Daniel Thompson <daniel.thompson@linaro.org> [arm]
Tested-by: Petr Mladek <pmladek@suse.com>
Cc: Aaron Tomlin <atomlin@redhat.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We get 1 warning when building kernel with W=1:
drivers/char/mem.c:220:12: warning: no previous prototype for 'phys_mem_access_prot_allowed' [-Wmissing-prototypes]
int __weak phys_mem_access_prot_allowed(struct file *file,
In fact, its declaration is spreading to several header files in
different architecture, but need to be declare in common header file.
So this patch moves phys_mem_access_prot_allowed() to pgtable.h.
Link: http://lkml.kernel.org/r/1473751597-12139-1-git-send-email-baoyou.xie@linaro.org
Signed-off-by: Baoyou Xie <baoyou.xie@linaro.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Some architectures require an additional load to find the address of
percpu pointers. In some implemenatations, the C aliasing rules do not
allow the result of that load to be kept over the store that modifies
the percpu variable, which causes additional loads.
Work around this by finding the pointer first, then operating on that.
It's also possible to mark things as restrict and those kind of games,
but that can require larger and arch specific changes.
On powerpc, __this_cpu_inc_return compiles to:
ld 10,48(13)
ldx 9,3,10
addi 9,9,1
stdx 9,3,10
ld 9,48(13)
ldx 3,9,3
With this patch it compiles to:
ld 10,48(13)
ldx 9,3,10
addi 9,9,1
stdx 9,3,10
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
To: Tejun Heo <tj@kernel.org>
To: Christoph Lameter <cl@linux.com>
Cc: linux-kernel@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Signed-off-by: Tejun Heo <tj@kernel.org>
asm-generic is only intended for architecture defaults, and we can simply
kill it off by moving the two defintions directly to <linux/libata.h>.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Tejun Heo <tj@kernel.org>
Enabling -ffunction-sections modified the generic linker script to
pull .text.* sections into regular TEXT_TEXT section, conflicting
with some architectures. Revert that change and require archs that
enable the option to ensure they have no conflicting section names,
and do the appropriate merging.
Reported-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Fixes: b67067f117 ("kbuild: allow archs to select link dead code/data elimination")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michal Marek <mmarek@suse.com>
Pull uaccess fixes from Al Viro:
"Fixes for broken uaccess primitives - mostly lack of proper zeroing
in copy_from_user()/get_user()/__get_user(), but for several
architectures there's more (broken clear_user() on frv and
strncpy_from_user() on hexagon)"
* 'uaccess-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (28 commits)
avr32: fix copy_from_user()
microblaze: fix __get_user()
microblaze: fix copy_from_user()
m32r: fix __get_user()
blackfin: fix copy_from_user()
sparc32: fix copy_from_user()
sh: fix copy_from_user()
sh64: failing __get_user() should zero
score: fix copy_from_user() and friends
score: fix __get_user/get_user
s390: get_user() should zero on failure
ppc32: fix copy_from_user()
parisc: fix copy_from_user()
openrisc: fix copy_from_user()
nios2: fix __get_user()
nios2: copy_from_user() should zero the tail of destination
mn10300: copy_from_user() should zero on access_ok() failure...
mn10300: failing __get_user() and get_user() should zero
mips: copy_from_user() must zero the destination on access_ok() failure
ARC: uaccess: get_user to zero out dest in cause of fault
...
... in all cases, including the failing access_ok()
Note that some architectures using asm-generic/uaccess.h have
__copy_from_user() not zeroing the tail on failure halfway
through. This variant works either way.
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Introduce LD_DEAD_CODE_DATA_ELIMINATION option for architectures to
select to build with -ffunction-sections, -fdata-sections, and link
with --gc-sections. It requires some work (documented) to ensure all
unreferenced entrypoints are live, and requires toolchain and build
verification, so it is made a per-arch option for now.
On a random powerpc64le build, this yelds a significant size saving,
it boots and runs fine, but there is a lot I haven't tested as yet, so
these savings may be reduced if there are bugs in the link.
text data bss dec filename
11169741 1180744 1923176 14273661 vmlinux
10445269 1004127 1919707 13369103 vmlinux.dce
~700K text, ~170K data, 6% removed from kernel image size.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michal Marek <mmarek@suse.com>
There are three usercopy warnings which are currently being silenced for
gcc 4.6 and newer:
1) "copy_from_user() buffer size is too small" compile warning/error
This is a static warning which happens when object size and copy size
are both const, and copy size > object size. I didn't see any false
positives for this one. So the function warning attribute seems to
be working fine here.
Note this scenario is always a bug and so I think it should be
changed to *always* be an error, regardless of
CONFIG_DEBUG_STRICT_USER_COPY_CHECKS.
2) "copy_from_user() buffer size is not provably correct" compile warning
This is another static warning which happens when I enable
__compiletime_object_size() for new compilers (and
CONFIG_DEBUG_STRICT_USER_COPY_CHECKS). It happens when object size
is const, but copy size is *not*. In this case there's no way to
compare the two at build time, so it gives the warning. (Note the
warning is a byproduct of the fact that gcc has no way of knowing
whether the overflow function will be called, so the call isn't dead
code and the warning attribute is activated.)
So this warning seems to only indicate "this is an unusual pattern,
maybe you should check it out" rather than "this is a bug".
I get 102(!) of these warnings with allyesconfig and the
__compiletime_object_size() gcc check removed. I don't know if there
are any real bugs hiding in there, but from looking at a small
sample, I didn't see any. According to Kees, it does sometimes find
real bugs. But the false positive rate seems high.
3) "Buffer overflow detected" runtime warning
This is a runtime warning where object size is const, and copy size >
object size.
All three warnings (both static and runtime) were completely disabled
for gcc 4.6 with the following commit:
2fb0815c9e ("gcc4: disable __compiletime_object_size for GCC 4.6+")
That commit mistakenly assumed that the false positives were caused by a
gcc bug in __compiletime_object_size(). But in fact,
__compiletime_object_size() seems to be working fine. The false
positives were instead triggered by #2 above. (Though I don't have an
explanation for why the warnings supposedly only started showing up in
gcc 4.6.)
So remove warning #2 to get rid of all the false positives, and re-enable
warnings #1 and #3 by reverting the above commit.
Furthermore, since #1 is a real bug which is detected at compile time,
upgrade it to always be an error.
Having done all that, CONFIG_DEBUG_STRICT_USER_COPY_CHECKS is no longer
needed.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch aims to get rid of endianness in queued_write_unlock(). We
want to set __qrwlock->wmode to NULL, however the address is not
&lock->cnts in big endian machine. That causes queued_write_unlock()
write NULL to the wrong field of __qrwlock.
So implement __qrwlock_write_byte() which returns the correct
__qrwlock->wmode address.
Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Waiman.Long@hpe.com
Cc: arnd@arndb.de
Cc: boqun.feng@gmail.com
Cc: will.deacon@arm.com
Link: http://lkml.kernel.org/r/1468835259-4486-1-git-send-email-xinhui.pan@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Add asm-usable variants of EXPORT_SYMBOL/EXPORT_SYMBOL_GPL. This
commit just adds the default implementation; most of the architectures
can simply add export.h to asm/Kbuild and start using <asm/export.h>
from assembler. The rest needs to have their <asm/export.h> define
everal macros and then explicitly include <asm-generic/export.h>
One area where the things might diverge from default is the alignment;
normally it's 8 bytes on 64bit targets and 4 on 32bit ones, both for
unsigned long and for struct kernel_symbol. Unfortunately, amd64 and
m68k are unusual - m68k aligns to 2 bytes (for both) and amd64 aligns
struct kernel_symbol to 16 bytes. For those we'll need asm/export.h to
override the constants used by generic version - KSYM_ALIGN and KCRC_ALIGN
for kernel_symbol and unsigned long resp. And no, __alignof__ would not
do the trick - on amd64 __alignof__ of struct kernel_symbol is 8, not 16.
More serious source of unpleasantness is treatment of function
descriptors on architectures that have those. Things like ppc64,
parisc, ia64, etc. need more than the address of the first insn to
call an arbitrary function. As the result, their representation of
pointers to functions is not the typical "address of the entry point" -
it's an address of a small static structure containing all the required
information (including the entry point, of course). Sadly, the asm-side
conventions differ in what the function name refers to - entry point or
the function descriptor. On ppc64 we do the latter;
bar: .quad foo
is what void (*bar)(void) = foo; turns into and the rare places where
we need to explicitly work with the label of entry point are dealt with
as DOTSYM(foo). For our purposes it's ideal - generic macros are usable.
However, parisc would have foo and P%foo used for label of entry point
and address of the function descriptor and
bar: .long P%foo
woudl be used instead. ia64 goes similar to parisc in that respect,
except that there it's @fptr(foo) rather than P%foo. Such architectures
need to define KSYM_FUNC that would turn a function name into whatever
is needed to refer to function descriptor.
What's more, on such architectures we need to know whether we are exporting
a function or an object - in assembler we have to tell that explicitly, to
decide whether we want EXPORT_SYMBOL(foo) produce e.g.
__ksymtab_foo: .quad foo
or
__ksymtab_foo: .quad @fptr(foo)
For that reason we introduce EXPORT_DATA_SYMBOL{,_GPL}(), to be used for
exports of data objects. On normal architectures it's the same thing
as EXPORT_SYMBOL{,_GPL}(), but on parisc-like ones they differ and the
right one needs to be used. Most of the exports are functions, so we
keep EXPORT_SYMBOL for those...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Cleanups:
- huge cleanup of rtc-generic and char/genrtc this allowed to cleanup rtc-cmos,
rtc-sh, rtc-m68k, rtc-powerpc and rtc-parisc
- move mn10300 to rtc-cmos
Subsystem:
- fix wakealarms after hibernate
- multiples fixes for rctest
- simplify implementations of .read_alarm
New drivers:
- Maxim MAX6916
Drivers:
- ds1307: fix weekday
- m41t80: add wakeup support
- pcf85063: add support for PCF85063A variant
- rv8803: extend i2c fix and other fixes
- s35390a: fix alarm reading, this fixes instant reboot after shutdown for QNAP
TS-41x
- s3c: clock fixes
-----BEGIN PGP SIGNATURE-----
iQIcBAABCgAGBQJXokhIAAoJENiigzvaE+LCZqQP+wWzintN/N1u3dKiVB7iSdwq
+S/jAXD9wW8OK9PI60/YUGRYeUXmZW9t4XYg1VKCxU9KpVC17LgOtDyXD8BufP1V
uREJEzZw9O7zCCjeHp/ICFjBkc62Net6ZDOO+ZyXPNfddpS1Xq1uUgXLZc/202UR
ID/kewu0pJRDnoxyqznWn9+8D33w/ygXs2slY2Ive0ONtjdgxGcsj2rNbb2RYn2z
OP7br3lLg7qkFh4TtXb61eh/9GYIk6wzP/CrX5l/jH4SjQnrIk5g/X/Cd1qQ/qso
JZzFoonOKvIp5Gw/+fZ9NP3YFcnkoRMv4NjZV8PAmsYLds+ibRiBcoB8u6FmiJV7
WW5uopgPkfCGN5BV3+QHwJDVe+WlgnlzaT5zPUCcP5KWusDts4fWIgzP7vrtAzf4
3OJLrgSGdBeOqWnJD21nxKUD27JOseX7D+BFtwxR4lMsXHqlHJfETpZ8gts1ZGH3
2U353j/jkZvGWmc6dMcuxOXT2K4VqpYeIIqs0IcLu6hM9crtR89zPR2Iu1AilfDW
h2NroF+Q//SgMMzWoTEG6Tn7RAc7MthgA/tRCFZF9CBMzNs988w0CTHnKsIHmjpU
UKkMeJGAC9YrPYIcqrg0oYsmLUWXc8JuZbGJBnei3BzbaMTlcwIN9qj36zfq6xWc
TMLpbWEoIsgFIZMP/hAP
=rpGB
-----END PGP SIGNATURE-----
Merge tag 'rtc-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux
Pull RTC updates from Alexandre Belloni:
"RTC for 4.8
Cleanups:
- huge cleanup of rtc-generic and char/genrtc this allowed to cleanup
rtc-cmos, rtc-sh, rtc-m68k, rtc-powerpc and rtc-parisc
- move mn10300 to rtc-cmos
Subsystem:
- fix wakealarms after hibernate
- multiples fixes for rctest
- simplify implementations of .read_alarm
New drivers:
- Maxim MAX6916
Drivers:
- ds1307: fix weekday
- m41t80: add wakeup support
- pcf85063: add support for PCF85063A variant
- rv8803: extend i2c fix and other fixes
- s35390a: fix alarm reading, this fixes instant reboot after
shutdown for QNAP TS-41x
- s3c: clock fixes"
* tag 'rtc-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: (65 commits)
rtc: rv8803: Clear V1F when setting the time
rtc: rv8803: Stop the clock while setting the time
rtc: rv8803: Always apply the I²C workaround
rtc: rv8803: Fix read day of week
rtc: rv8803: Remove the check for valid time
rtc: rv8803: Kconfig: Indicate rx8900 support
rtc: asm9260: remove .owner field for driver
rtc: at91sam9: Fix missing spin_lock_init()
rtc: m41t80: add suspend handlers for alarm IRQ
rtc: m41t80: make it a real error message
rtc: pcf85063: Add support for the PCF85063A device
rtc: pcf85063: fix year range
rtc: hym8563: in .read_alarm set .tm_sec to 0 to signal minute accuracy
rtc: explicitly set tm_sec = 0 for drivers with minute accurancy
rtc: s3c: Add s3c_rtc_{enable/disable}_clk in s3c_rtc_setfreq()
rtc: s3c: Remove unnecessary call to disable already disabled clock
rtc: abx80x: use devm_add_action_or_reset()
rtc: m41t80: use devm_add_action_or_reset()
rtc: fix a typo and reduce three empty lines to one
rtc: s35390a: improve two comments in .set_alarm
...
Pull kbuild updates from Michal Marek:
- GCC plugin support by Emese Revfy from grsecurity, with a fixup from
Kees Cook. The plugins are meant to be used for static analysis of
the kernel code. Two plugins are provided already.
- reduction of the gcc commandline by Arnd Bergmann.
- IS_ENABLED / IS_REACHABLE macro enhancements by Masahiro Yamada
- bin2c fix by Michael Tautschnig
- setlocalversion fix by Wolfram Sang
* 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
gcc-plugins: disable under COMPILE_TEST
kbuild: Abort build on bad stack protector flag
scripts: Fix size mismatch of kexec_purgatory_size
kbuild: make samples depend on headers_install
Kbuild: don't add obj tree in additional includes
Kbuild: arch: look for generated headers in obtree
Kbuild: always prefix objtree in LINUXINCLUDE
Kbuild: avoid duplicate include path
Kbuild: don't add ../../ to include path
vmlinux.lds.h: replace config_enabled() with IS_ENABLED()
kconfig.h: allow to use IS_{ENABLE,REACHABLE} in macro expansion
kconfig.h: use already defined macros for IS_REACHABLE() define
export.h: use __is_defined() to check if __KSYM_* is defined
kconfig.h: use __is_defined() to check if MODULE is defined
kbuild: setlocalversion: print error to STDERR
Add sancov plugin
Add Cyclomatic complexity GCC plugin
GCC plugin infrastructure
Shared library support
Merge updates from Andrew Morton:
- a few misc bits
- ocfs2
- most(?) of MM
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (125 commits)
thp: fix comments of __pmd_trans_huge_lock()
cgroup: remove unnecessary 0 check from css_from_id()
cgroup: fix idr leak for the first cgroup root
mm: memcontrol: fix documentation for compound parameter
mm: memcontrol: remove BUG_ON in uncharge_list
mm: fix build warnings in <linux/compaction.h>
mm, thp: convert from optimistic swapin collapsing to conservative
mm, thp: fix comment inconsistency for swapin readahead functions
thp: update Documentation/{vm/transhuge,filesystems/proc}.txt
shmem: split huge pages beyond i_size under memory pressure
thp: introduce CONFIG_TRANSPARENT_HUGE_PAGECACHE
khugepaged: add support of collapse for tmpfs/shmem pages
shmem: make shmem_inode_info::lock irq-safe
khugepaged: move up_read(mmap_sem) out of khugepaged_alloc_page()
thp: extract khugepaged from mm/huge_memory.c
shmem, thp: respect MADV_{NO,}HUGEPAGE for file mappings
shmem: add huge pages support
shmem: get_unmapped_area align huge page
shmem: prepare huge= mount option and sysfs knob
mm, rmap: account shmem thp pages
...
This allows an arch which needs to do special handing with respect to
different page size when flushing tlb to implement the same in mmu
gather.
Link: http://lkml.kernel.org/r/1465049193-22197-3-git-send-email-aneesh.kumar@linux.vnet.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Hugh Dickins <hughd@google.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Minchan Kim <minchan.kim@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This updates the generic and arch specific implementation to return true
if we need to do a tlb flush. That means if a __tlb_remove_page
indicate a flush is needed, the page we try to remove need to be tracked
and added again after the flush. We need to track it because we have
already update the pte to none and we can't just loop back.
This change is done to enable us to do a tlb_flush when we try to flush
a range that consists of different page sizes. For architectures like
ppc64, we can do a range based tlb flush and we need to track page size
for that. When we try to remove a huge page, we will force a tlb flush
and starts a new mmu gather.
[aneesh.kumar@linux.vnet.ibm.com: mm-change-the-interface-for-__tlb_remove_page-v3]
Link: http://lkml.kernel.org/r/1465049193-22197-2-git-send-email-aneesh.kumar@linux.vnet.ibm.com
Link: http://lkml.kernel.org/r/1464860389-29019-2-git-send-email-aneesh.kumar@linux.vnet.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Hugh Dickins <hughd@google.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Minchan Kim <minchan.kim@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull crypto updates from Herbert Xu:
"Here is the crypto update for 4.8:
API:
- first part of skcipher low-level conversions
- add KPP (Key-agreement Protocol Primitives) interface.
Algorithms:
- fix IPsec/cryptd reordering issues that affects aesni
- RSA no longer does explicit leading zero removal
- add SHA3
- add DH
- add ECDH
- improve DRBG performance by not doing CTR by hand
Drivers:
- add x86 AVX2 multibuffer SHA256/512
- add POWER8 optimised crc32c
- add xts support to vmx
- add DH support to qat
- add RSA support to caam
- add Layerscape support to caam
- add SEC1 AEAD support to talitos
- improve performance by chaining requests in marvell/cesa
- add support for Araneus Alea I USB RNG
- add support for Broadcom BCM5301 RNG
- add support for Amlogic Meson RNG
- add support Broadcom NSP SoC RNG"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (180 commits)
crypto: vmx - Fix aes_p8_xts_decrypt build failure
crypto: vmx - Ignore generated files
crypto: vmx - Adding support for XTS
crypto: vmx - Adding asm subroutines for XTS
crypto: skcipher - add comment for skcipher_alg->base
crypto: testmgr - Print akcipher algorithm name
crypto: marvell - Fix wrong flag used for GFP in mv_cesa_dma_add_iv_op
crypto: nx - off by one bug in nx_of_update_msc()
crypto: rsa-pkcs1pad - fix rsa-pkcs1pad request struct
crypto: scatterwalk - Inline start/map/done
crypto: scatterwalk - Remove unnecessary BUG in scatterwalk_start
crypto: scatterwalk - Remove unnecessary advance in scatterwalk_pagedone
crypto: scatterwalk - Fix test in scatterwalk_done
crypto: api - Optimise away crypto_yield when hard preemption is on
crypto: scatterwalk - add no-copy support to copychunks
crypto: scatterwalk - Remove scatterwalk_bytes_sglen
crypto: omap - Stop using crypto scatterwalk_bytes_sglen
crypto: skcipher - Remove top-level givcipher interface
crypto: user - Remove crypto_lookup_skcipher call
crypto: cts - Convert to skcipher
...
Pull s390 updates from Martin Schwidefsky:
"There are a couple of new things for s390 with this merge request:
- a new scheduling domain "drawer" is added to reflect the unusual
topology found on z13 machines. Performance tests showed up to 8
percent gain with the additional domain.
- the new crc-32 checksum crypto module uses the vector-galois-field
multiply and sum SIMD instruction to speed up crc-32 and crc-32c.
- proper __ro_after_init support, this requires RO_AFTER_INIT_DATA in
the generic vmlinux.lds linker script definitions.
- kcov instrumentation support. A prerequisite for that is the
inline assembly basic block cleanup, which is the reason for the
net/iucv/iucv.c change.
- support for 2GB pages is added to the hugetlbfs backend.
Then there are two removals:
- the oprofile hardware sampling support is dead code and is removed.
The oprofile user space uses the perf interface nowadays.
- the ETR clock synchronization is removed, this has been superseeded
be the STP clock synchronization. And it always has been
"interesting" code..
And the usual bug fixes and cleanups"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (82 commits)
s390/pci: Delete an unnecessary check before the function call "pci_dev_put"
s390/smp: clean up a condition
s390/cio/chp : Remove deprecated create_singlethread_workqueue
s390/chsc: improve channel path descriptor determination
s390/chsc: sanitize fmt check for chp_desc determination
s390/cio: make fmt1 channel path descriptor optional
s390/chsc: fix ioctl CHSC_INFO_CU command
s390/cio/device_ops: fix kernel doc
s390/cio: allow to reset channel measurement block
s390/console: Make preferred console handling more consistent
s390/mm: fix gmap tlb flush issues
s390/mm: add support for 2GB hugepages
s390: have unique symbol for __switch_to address
s390/cpuinfo: show maximum thread id
s390/ptrace: clarify bits in the per_struct
s390: stack address vs thread_info
s390: remove pointless load within __switch_to
s390: enable kcov support
s390/cpumf: use basic block for ecctr inline assembly
s390/hypfs: use basic block for diag inline assembly
...
Pull NOHZ updates from Ingo Molnar:
- fix system/idle cputime leaked on cputime accounting (all nohz
configs) (Rik van Riel)
- remove the messy, ad-hoc irqtime account on nohz-full and make it
compatible with CONFIG_IRQ_TIME_ACCOUNTING=y instead (Rik van Riel)
- cleanups (Frederic Weisbecker)
- remove unecessary irq disablement in the irqtime code (Rik van Riel)
* 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/cputime: Drop local_irq_save/restore from irqtime_account_irq()
sched/cputime: Reorganize vtime native irqtime accounting headers
sched/cputime: Clean up the old vtime gen irqtime accounting completely
sched/cputime: Replace VTIME_GEN irq time code with IRQ_TIME_ACCOUNTING code
sched/cputime: Count actually elapsed irq & softirq time
Pull locking updates from Ingo Molnar:
"The locking tree was busier in this cycle than the usual pattern - a
couple of major projects happened to coincide.
The main changes are:
- implement the atomic_fetch_{add,sub,and,or,xor}() API natively
across all SMP architectures (Peter Zijlstra)
- add atomic_fetch_{inc/dec}() as well, using the generic primitives
(Davidlohr Bueso)
- optimize various aspects of rwsems (Jason Low, Davidlohr Bueso,
Waiman Long)
- optimize smp_cond_load_acquire() on arm64 and implement LSE based
atomic{,64}_fetch_{add,sub,and,andnot,or,xor}{,_relaxed,_acquire,_release}()
on arm64 (Will Deacon)
- introduce smp_acquire__after_ctrl_dep() and fix various barrier
mis-uses and bugs (Peter Zijlstra)
- after discovering ancient spin_unlock_wait() barrier bugs in its
implementation and usage, strengthen its semantics and update/fix
usage sites (Peter Zijlstra)
- optimize mutex_trylock() fastpath (Peter Zijlstra)
- ... misc fixes and cleanups"
* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (67 commits)
locking/atomic: Introduce inc/dec variants for the atomic_fetch_$op() API
locking/barriers, arch/arm64: Implement LDXR+WFE based smp_cond_load_acquire()
locking/static_keys: Fix non static symbol Sparse warning
locking/qspinlock: Use __this_cpu_dec() instead of full-blown this_cpu_dec()
locking/atomic, arch/tile: Fix tilepro build
locking/atomic, arch/m68k: Remove comment
locking/atomic, arch/arc: Fix build
locking/Documentation: Clarify limited control-dependency scope
locking/atomic, arch/rwsem: Employ atomic_long_fetch_add()
locking/atomic, arch/qrwlock: Employ atomic_fetch_add_acquire()
locking/atomic, arch/mips: Convert to _relaxed atomics
locking/atomic, arch/alpha: Convert to _relaxed atomics
locking/atomic: Remove the deprecated atomic_{set,clear}_mask() functions
locking/atomic: Remove linux/atomic.h:atomic_fetch_or()
locking/atomic: Implement atomic{,64,_long}_fetch_{add,sub,and,andnot,or,xor}{,_relaxed,_acquire,_release}()
locking/atomic: Fix atomic64_relaxed() bits
locking/atomic, arch/xtensa: Implement atomic_fetch_{add,sub,and,or,xor}()
locking/atomic, arch/x86: Implement atomic{,64}_fetch_{add,sub,and,or,xor}()
locking/atomic, arch/tile: Implement atomic{,64}_fetch_{add,sub,and,or,xor}()
locking/atomic, arch/sparc: Implement atomic{,64}_fetch_{add,sub,and,or,xor}()
...
If CONFIG_KASAN is enabled and gcc is configured with
--disable-initfini-array and/or gold linker is used, gcc emits
.ctors/.dtors and .text.startup/.text.exit sections instead of
.init_array/.fini_array. .dtors section is not explicitly accounted in
the linker script and messes vvar/percpu layout.
We want:
ffffffff822bfd80 D _edata
ffffffff822c0000 D __vvar_beginning_hack
ffffffff822c0000 A __vvar_page
ffffffff822c0080 0000000000000098 D vsyscall_gtod_data
ffffffff822c1000 A __init_begin
ffffffff822c1000 D init_per_cpu__irq_stack_union
ffffffff822c1000 A __per_cpu_load
ffffffff822d3000 D init_per_cpu__gdt_page
We got:
ffffffff8279a600 D _edata
ffffffff8279b000 A __vvar_page
ffffffff8279c000 A __init_begin
ffffffff8279c000 D init_per_cpu__irq_stack_union
ffffffff8279c000 A __per_cpu_load
ffffffff8279e000 D __vvar_beginning_hack
ffffffff8279e080 0000000000000098 D vsyscall_gtod_data
ffffffff827ae000 D init_per_cpu__gdt_page
This happens because __vvar_page and .vvar get different addresses in
arch/x86/kernel/vmlinux.lds.S:
. = ALIGN(PAGE_SIZE);
__vvar_page = .;
.vvar : AT(ADDR(.vvar) - LOAD_OFFSET) {
/* work around gold bug 13023 */
__vvar_beginning_hack = .;
Discard .dtors/.fini_array/.text.exit, since we don't call dtors.
Merge .text.startup into init text.
Link: http://lkml.kernel.org/r/1467386363-120030-1-git-send-email-dvyukov@google.com
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: <stable@vger.kernel.org> [4.0+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently, if there was any irq or softirq time during 'ticks'
jiffies, the entire period will be accounted as irq or softirq
time.
This is inaccurate if only a subset of the time was actually spent
handling irqs, and could conceivably mis-count all of the ticks during
a period as irq time, when there was some irq and some softirq time.
This can actually happen when irqtime_account_process_tick is called
from account_idle_ticks, which can pass a larger number of ticks down
all at once.
Fix this by changing irqtime_account_hi_update(), irqtime_account_si_update(),
and steal_account_process_ticks() to work with cputime_t time units, and
return the amount of time spent in each mode.
Rename steal_account_process_ticks() to steal_account_process_time(), to
reflect that time is now accounted in cputime_t, instead of ticks.
Additionally, have irqtime_account_process_tick() take into account how
much time was spent in each of steal, irq, and softirq time.
The latter could help improve the accuracy of cputime
accounting when returning from idle on a NO_HZ_IDLE CPU.
Properly accounting how much time was spent in hardirq and
softirq time will also allow the NO_HZ_FULL code to re-use
these same functions for hardirq and softirq accounting.
Signed-off-by: Rik van Riel <riel@redhat.com>
[ Make nsecs_to_cputime64() actually return cputime64_t. ]
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Radim Krcmar <rkrcmar@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Link: http://lkml.kernel.org/r/1468421405-20056-2-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The use of config_enabled() against config options is ambiguous.
Now, IS_ENABLED() is implemented purely with macro expansion, so
let's replace config_enabled() with IS_ENABLED().
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Michal Marek <mmarek@suse.com>
Since all architectures have this implemented now natively, remove this
dead code.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Now that all the architectures have implemented support for these new
atomic primitives add on the generic infrastructure to expose and use
it.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-arch@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This patch updates/fixes all spin_unlock_wait() implementations.
The update is in semantics; where it previously was only a control
dependency, we now upgrade to a full load-acquire to match the
store-release from the spin_unlock() we waited on. This ensures that
when spin_unlock_wait() returns, we're guaranteed to observe the full
critical section we waited on.
This fixes a number of spin_unlock_wait() users that (not
unreasonably) rely on this.
I also fixed a number of ticket lock versions to only wait on the
current lock holder, instead of for a full unlock, as this is
sufficient.
Furthermore; again for ticket locks; I added an smp_rmb() in between
the initial ticket load and the spin loop testing the current value
because I could not convince myself the address dependency is
sufficient, esp. if the loads are of different sizes.
I'm more than happy to remove this smp_rmb() again if people are
certain the address dependency does indeed work as expected.
Note: PPC32 will be fixed independently
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: chris@zankel.net
Cc: cmetcalf@mellanox.com
Cc: davem@davemloft.net
Cc: dhowells@redhat.com
Cc: james.hogan@imgtec.com
Cc: jejb@parisc-linux.org
Cc: linux@armlinux.org.uk
Cc: mpe@ellerman.id.au
Cc: ralf@linux-mips.org
Cc: realmz6@gmail.com
Cc: rkuo@codeaurora.org
Cc: rth@twiddle.net
Cc: schwidefsky@de.ibm.com
Cc: tony.luck@intel.com
Cc: vgupta@synopsys.com
Cc: ysato@users.sourceforge.jp
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Since all asm/barrier.h should/must include asm-generic/barrier.h the
latter is a good place for generic infrastructure like this.
This also allows archs to override the new smp_acquire__after_ctrl_dep().
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
commit c74ba8b348 ("arch: Introduce post-init read-only memory")
introduced the __ro_after_init attribute which allows to add variables
to the ro_after_init data section.
This new section was added to rodata, even though it contains writable
data. This in turn causes problems on architectures which mark the
page table entries read-only that point to rodata very early.
This patch allows architectures to implement an own handling of the
.data..ro_after_init section.
Usually that would be:
- mark the rodata section read-only very early
- mark the ro_after_init section read-only within mark_rodata_ro
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The existing version uses a heavy barrier while only release semantics
is required. So use atomic_sub_return_release() instead.
Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: arnd@arndb.de
Cc: waiman.long@hp.com
Link: http://lkml.kernel.org/r/1464943094-3129-1-git-send-email-xinhui.pan@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
A while back Viro posted a number of 'interesting' mutex_is_locked()
users on IRC, one of those was RCU.
RCU seems to use mutex_is_locked() to avoid doing mutex_trylock(), the
regular load before modify pattern.
While the use isn't wrong per se, its curious in that its needed at all,
mutex_trylock() should be good enough on its own to avoid the pointless
cacheline bounces.
So fix those and remove the mutex_is_locked() (ab)use from RCU.
Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Paul McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Davidlohr Bueso <dave@stgolabs.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Waiman Long <Waiman.Long@hpe.com>
Link: http://lkml.kernel.org/r/20160601185815.GW3190@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The rwsem-xadd count has been converted to an atomic variable and the
rwsem code now directly uses atomic_long_add() and
atomic_long_add_return(), so we can remove the arch implementations of
rwsem_atomic_add() and rwsem_atomic_update().
Signed-off-by: Jason Low <jason.low2@hpe.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christoph Lameter <cl@linux.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Jason Low <jason.low2@hp.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Hurley <peter@hurleysoftware.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Terry Rudd <terry.rudd@hpe.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Waiman Long <Waiman.Long@hpe.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Convert the rwsem count variable to an atomic_long_t since we use it
as an atomic variable. This also allows us to remove the
rwsem_atomic_{add,update}() "abstraction" which would now be an unnecesary
level of indirection. In follow up patches, we also remove the
rwsem_atomic_{add,update}() definitions across the various architectures.
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Jason Low <jason.low2@hpe.com>
[ Build warning fixes on various architectures. ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Jason Low <jason.low2@hp.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Hurley <peter@hurleysoftware.com>
Cc: Terry Rudd <terry.rudd@hpe.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Waiman Long <Waiman.Long@hpe.com>
Link: http://lkml.kernel.org/r/1465017963-4839-2-git-send-email-jason.low2@hpe.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
While this prior commit:
54cf809b95 ("locking,qspinlock: Fix spin_is_locked() and spin_unlock_wait()")
... fixes spin_is_locked() and spin_unlock_wait() for the usage
in ipc/sem and netfilter, it does not in fact work right for the
usage in task_work and futex.
So while the 2 locks crossed problem:
spin_lock(A) spin_lock(B)
if (!spin_is_locked(B)) spin_unlock_wait(A)
foo() foo();
... works with the smp_mb() injected by both spin_is_locked() and
spin_unlock_wait(), this is not sufficient for:
flag = 1;
smp_mb(); spin_lock()
spin_unlock_wait() if (!flag)
// add to lockless list
// iterate lockless list
... because in this scenario, the store from spin_lock() can be delayed
past the load of flag, uncrossing the variables and loosing the
guarantee.
This patch reworks spin_is_locked() and spin_unlock_wait() to work in
both cases by exploiting the observation that while the lock byte
store can be delayed, the contender must have registered itself
visibly in other state contained in the word.
It also allows for architectures to override both functions, as PPC
and ARM64 have an additional issue for which we currently have no
generic solution.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Giovanni Gherdovich <ggherdovich@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Waiman Long <waiman.long@hpe.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: stable@vger.kernel.org # v4.2 and later
Fixes: 54cf809b95 ("locking,qspinlock: Fix spin_is_locked() and spin_unlock_wait()")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
No architecture uses the genrtc driver any more, so let's kill it off
for good. This now also includes asm-generic/rtc.h, which is otherwise
completely unused.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Drivers should not really include stuff from asm-generic directly,
and the PC-style cmos rtc driver does this in order to reuse the
mc146818 implementation of get_rtc_time/set_rtc_time rather than
the architecture specific one for the architecture it gets built for.
To make it more obvious what is going on, this moves and renames the
two functions into include/linux/mc146818rtc.h, which holds the
other mc146818 specific code. Ideally it would be in a .c file,
but that would require extra infrastructure as the functions are
called by multiple drivers with conflicting dependencies.
With this change, the asm-generic/rtc.h header also becomes much
more generic, so it can be reused more easily across any architecture
that still relies on the genrtc driver.
The only caller of the internal __get_rtc_time/__set_rtc_time
functions is in arch/alpha/kernel/rtc.c, and we just change those
over to the new naming.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
This will allow device drivers to consistently use io{read,write}XX
also for 64-bit accesses.
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Horia Geantă <horia.geanta@nxp.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
While reviewing the addition of io{read,write}64be accessors, Arnd
-finds a potential problem:
"If an architecture overrides readq/writeq to have barriers but does
not override ioread64be/iowrite64be, this will lack the barriers and
behave differently from the little-endian version. I think the only
affected architecture is ARC, since ARM and ARM64 both override the
big-endian accessors to have the correct barriers, and all others
don't use barriers at all."
-suggests a fix for the same problem in existing code (16/32-bit
accessors); the fix leads "to a double-swap on architectures that
don't override the io{read,write}{16,32}be accessors, but it will
work correctly on all architectures without them having to override
these accessors."
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Horia Geantă <horia.geanta@nxp.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Pull scheduler fixes from Ingo Molnar:
"Two fixes: one for a lost wakeup, the other to fix the compiler
optimizing out preempt operations on ARM64 (and possibly other non-x86
architectures)"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/core: Fix remote wakeups
sched/preempt: Fix preempt_count manipulations
Similar to commits:
51d7d5205d ("powerpc: Add smp_mb() to arch_spin_is_locked()")
d86b8da04d ("arm64: spinlock: serialise spin_unlock_wait against concurrent lockers")
qspinlock suffers from the fact that the _Q_LOCKED_VAL store is
unordered inside the ACQUIRE of the lock.
And while this is not a problem for the regular mutual exclusive
critical section usage of spinlocks, it breaks creative locking like:
spin_lock(A) spin_lock(B)
spin_unlock_wait(B) if (!spin_is_locked(A))
do_something() do_something()
In that both CPUs can end up running do_something at the same time,
because our _Q_LOCKED_VAL store can drop past the spin_unlock_wait()
spin_is_locked() loads (even on x86!!).
To avoid making the normal case slower, add smp_mb()s to the less used
spin_unlock_wait() / spin_is_locked() side of things to avoid this
problem.
Reported-and-tested-by: Davidlohr Bueso <dave@stgolabs.net>
Reported-by: Giovanni Gherdovich <ggherdovich@suse.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org # v4.2 and later
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge updates from Andrew Morton:
- fsnotify fix
- poll() timeout fix
- a few scripts/ tweaks
- debugobjects updates
- the (small) ocfs2 queue
- Minor fixes to kernel/padata.c
- Maybe half of the MM queue
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (117 commits)
mm, page_alloc: restore the original nodemask if the fast path allocation failed
mm, page_alloc: uninline the bad page part of check_new_page()
mm, page_alloc: don't duplicate code in free_pcp_prepare
mm, page_alloc: defer debugging checks of pages allocated from the PCP
mm, page_alloc: defer debugging checks of freed pages until a PCP drain
cpuset: use static key better and convert to new API
mm, page_alloc: inline pageblock lookup in page free fast paths
mm, page_alloc: remove unnecessary variable from free_pcppages_bulk
mm, page_alloc: pull out side effects from free_pages_check
mm, page_alloc: un-inline the bad part of free_pages_check
mm, page_alloc: check multiple page fields with a single branch
mm, page_alloc: remove field from alloc_context
mm, page_alloc: avoid looking up the first zone in a zonelist twice
mm, page_alloc: shortcut watermark checks for order-0 pages
mm, page_alloc: reduce cost of fair zone allocation policy retry
mm, page_alloc: shorten the page allocator fast path
mm, page_alloc: check once if a zone has isolated pageblocks
mm, page_alloc: move __GFP_HARDWALL modifications out of the fastpath
mm, page_alloc: simplify last cpupid reset
mm, page_alloc: remove unnecessary initialisation from __alloc_pages_nodemask()
...
I've just discovered that the useful-sounding has_transparent_hugepage()
is actually an architecture-dependent minefield: on some arches it only
builds if CONFIG_TRANSPARENT_HUGEPAGE=y, on others it's also there when
not, but on some of those (arm and arm64) it then gives the wrong
answer; and on mips alone it's marked __init, which would crash if
called later (but so far it has not been called later).
Straighten this out: make it available to all configs, with a sensible
default in asm-generic/pgtable.h, removing its definitions from those
arches (arc, arm, arm64, sparc, tile) which are served by the default,
adding #define has_transparent_hugepage has_transparent_hugepage to
those (mips, powerpc, s390, x86) which need to override the default at
runtime, and removing the __init from mips (but maybe that kind of code
should be avoided after init: set a static variable the first time it's
called).
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andres Lagar-Cavilla <andreslc@google.com>
Cc: Yang Shi <yang.shi@linaro.org>
Cc: Ning Qu <quning@gmail.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Vineet Gupta <vgupta@synopsys.com> [arch/arc]
Acked-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> [arch/s390]
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The updates include:
* Rate limiting for the VT-d fault handler
* Remove statistics code from the AMD IOMMU driver. It is unused
and should be replaced by something more generic if needed
* Per-domain pagesize-bitmaps in IOMMU core code to support
systems with different types of IOMMUs
* Support for ACPI devices in the AMD IOMMU driver
* 4GB mode support for Mediatek IOMMU driver
* ARM-SMMU updates from Will Deacon:
- Support for 64k pages with SMMUv1 implementations
(e.g MMU-401)
- Remove open-coded 64-bit MMIO accessors
- Initial support for 16-bit VMIDs, as supported by some
ThunderX SMMU implementations
- A couple of errata workarounds for silicon in the
field
* Various fixes here and there
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABAgAGBQJXPeM1AAoJECvwRC2XARrjA2QP/2Cz+pVkpQCuvhAse57eN4rB
wWXKTjqSFZ4PcA3Vu5yvX6XMv15g46xXFJAhf2spE5//8+xgFfYBgkBRpnqu1brw
SL6f8A912MnfMRgWqcdKkJNeHbiN0kOvcIQv1J8GNfciqMiyYFhiLP6fFiRmWR/F
XDBjUeFZ5+Uwf1BAGqw0cVPexeakEbsLHUGqxFsh5g2T4i43aHzO2HJT3IdwWHDt
F2ivs8gNFGBeJEyzhW8TD0rOEEyHAnM3N18qPEU9+dD0UmjnTQPymEZSbsGW5d4j
Cn40QYlA+Zmbwgx6LaDVChzQyRJu6O3uvFThyRviiYKCri/Nc9cUT4vHsFGU4MXb
1d3bqrgzaw7vw31BN7S1Py3MV+WpVnEYjFm2O+hW28OjtSpm6ZvbI8wc0rF4UT/I
KgL0gSeA8tp25uVISM+ktpIrObYsAcoCz8nvurpDv2AGkKRzhyoSze0Jg43rusD8
BH7iFWu1LRPlulTGlrHMtNmbZeEApUPbObcQAOcrBOj9vjuFaZ8qduZmB+hwS2iV
p9atn+54LmGO0LuzqsGrhApIeXTeTZSrGyjlbUADWBJlTw8Xyk/CR39Wf3m/Xmpr
DiJ/5oa8SKQtNbwvbScn1+sInNWP/pH/JgnRO3Yvqth8HWF/DlpzNj5XxAB8czwr
qjk9WjpEXun50ocPFQeS
=jpPD
-----END PGP SIGNATURE-----
Merge tag 'iommu-updates-v4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull IOMMU updates from Joerg Roedel:
"The updates include:
- rate limiting for the VT-d fault handler
- remove statistics code from the AMD IOMMU driver. It is unused and
should be replaced by something more generic if needed
- per-domain pagesize-bitmaps in IOMMU core code to support systems
with different types of IOMMUs
- support for ACPI devices in the AMD IOMMU driver
- 4GB mode support for Mediatek IOMMU driver
- ARM-SMMU updates from Will Deacon:
- support for 64k pages with SMMUv1 implementations (e.g MMU-401)
- remove open-coded 64-bit MMIO accessors
- initial support for 16-bit VMIDs, as supported by some ThunderX
SMMU implementations
- a couple of errata workarounds for silicon in the field
- various fixes here and there"
* tag 'iommu-updates-v4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (44 commits)
iommu/arm-smmu: Use per-domain page sizes.
iommu/amd: Remove statistics code
iommu/dma: Finish optimising higher-order allocations
iommu: Allow selecting page sizes per domain
iommu: of: enforce const-ness of struct iommu_ops
iommu: remove unused priv field from struct iommu_ops
iommu/dma: Implement scatterlist segment merging
iommu/arm-smmu: Clear cache lock bit of ACR
iommu/arm-smmu: Support SMMUv1 64KB supplement
iommu/arm-smmu: Decouple context format from kernel config
iommu/arm-smmu: Tidy up 64-bit/atomic I/O accesses
io-64-nonatomic: Add relaxed accessor variants
iommu/arm-smmu: Work around MMU-500 prefetch errata
iommu/arm-smmu: Convert ThunderX workaround to new method
iommu/arm-smmu: Differentiate specific implementations
iommu/arm-smmu: Workaround for ThunderX erratum #27704
iommu/arm-smmu: Add support for 16 bit VMID
iommu/amd: Move get_device_id() and friends to beginning of file
iommu/amd: Don't use IS_ERR_VALUE to check integer values
iommu/amd: Signedness bug in acpihid_device_group()
...
Pull MIPS updates from Ralf Baechle:
"This is the main pull request for MIPS for 4.7. Here's the summary of
the changes:
- ATH79: Support for DTB passuing using the UHI boot protocol
- ATH79: Remove support for builtin DTB.
- ATH79: Add zboot debug serial support.
- ATH79: Add initial support for Dragino MS14 (Dragine 2), Onion Omega
and DPT-Module.
- ATH79: Update devicetree clock support for AR9132 and AR9331.
- ATH79: Cleanup the DT code.
- ATH79: Support newer SOCs in ath79_ddr_ctrl_init.
- ATH79: Fix regression in PCI window initialization.
- BCM47xx: Move SPROM driver to drivers/firmware/
- BCM63xx: Enable partition parser in defconfig.
- BMIPS: BMIPS5000 has I cache filing from D cache
- BMIPS: BMIPS: Add cpu-feature-overrides.h
- BMIPS: Add Whirlwind support
- BMIPS: Adjust mips-hpt-frequency for BCM7435
- BMIPS: Remove maxcpus from BCM97435SVMB DTS
- BMIPS: Add missing 7038 L1 register cells to BCM7435
- BMIPS: Various tweaks to initialization code.
- BMIPS: Enable partition parser in defconfig.
- BMIPS: Cache tweaks.
- BMIPS: Add UART, I2C and SATA devices to DT.
- BMIPS: Add BCM6358 and BCM63268support
- BMIPS: Add device tree example for BCM6358.
- BMIPS: Improve Improve BCM6328 and BCM6368 device trees
- Lantiq: Add support for device tree file from boot loader
- Lantiq: Allow build with no built-in DT.
- Loongson 3: Reserve 32MB for RS780E integrated GPU.
- Loongson 3: Fix build error after ld-version.sh modification
- Loongson 3: Move chipset ACPI code from drivers to arch.
- Loongson 3: Speedup irq processing.
- Loongson 3: Add basic Loongson 3A support.
- Loongson 3: Set cache flush handlers to nop.
- Loongson 3: Invalidate special TLBs when needed.
- Loongson 3: Fast TLB refill handler.
- MT7620: Fallback strategy for invalid syscfg0.
- Netlogic: Fix CP0_EBASE redefinition warnings
- Octeon: Initialization fixes
- Octeon: Add DTS files for the D-Link DSR-1000N and EdgeRouter Lite
- Octeon: Enable add Octeon-drivers in cavium_octeon_defconfig
- Octeon: Correctly handle endian-swapped initramfs images.
- Octeon: Support CN73xx, CN75xx and CN78xx.
- Octeon: Remove dead code from cvmx-sysinfo.
- Octeon: Extend number of supported CPUs past 32.
- Octeon: Remove some code limiting NR_IRQS to 255.
- Octeon: Simplify octeon_irq_ciu_gpio_set_type.
- Octeon: Mark some functions __init in smp.c
- Octeon: Octeon: Add Octeon III CN7xxx interface detection
- PIC32: Add serial driver and bindings for it.
- PIC32: Add PIC32 deadman timer driver and bindings.
- PIC32: Add PIC32 clock timer driver and bindings.
- Pistachio: Determine SoC revision during boot
- Sibyte: Fix Kconfig dependencies of SIBYTE_BUS_WATCHER.
- Sibyte: Strip redundant comments from bcm1480_regs.h.
- Panic immediately if panic_on_oops is set.
- module: fix incorrect IS_ERR_VALUE macro usage.
- module: Make consistent use of pr_*
- Remove no longer needed work_on_cpu() call.
- Remove CONFIG_IPV6_PRIVACY from defconfigs.
- Fix registers of non-crashing CPUs in dumps.
- Handle MIPSisms in new vmcore_elf32_check_arch.
- Select CONFIG_HANDLE_DOMAIN_IRQ and make it work.
- Allow RIXI to be used on non-R2 or R6 cores.
- Reserve nosave data for hibernation
- Fix siginfo.h to use strict POSIX types.
- Don't unwind user mode with EVA.
- Fix watchpoint restoration
- Ptrace watchpoints for R6.
- Sync icache when it fills from dcache
- I6400 I-cache fills from dcache.
- Various MSA fixes.
- Cleanup MIPS_CPU_* definitions.
- Signal: Move generic copy_siginfo to signal.h
- Signal: Fix uapi include in exported asm/siginfo.h
- Timer fixes for sake of KVM.
- XPA TLB refill fixes.
- Treat perf counter feature
- Update John Crispin's email address
- Add PIC32 watchdog and bindings.
- Handle R10000 LL/SC bug in set_pte()
- cpufreq: Various fixes for Longson1.
- R6: Fix R2 emulation.
- mathemu: Cosmetic fix to ADDIUPC emulation, plenty of other small fixes
- ELF: ABI and FP fixes.
- Allow for relocatable kernel and use that to support KASLR.
- Fix CPC_BASE_ADDR mask
- Plenty fo smp-cps, CM, R6 and M6250 fixes.
- Make reset_control_ops const.
- Fix kernel command line handling of leading whitespace.
- Cleanups to cache handling.
- Add brcm, bcm6345-l1-intc device tree bindings.
- Use generic clkdev.h header
- Remove CLK_IS_ROOT usage.
- Misc small cleanups.
- CM: Fix compilation error when !MIPS_CM
- oprofile: Fix a preemption issue
- Detect DSP ASE v3 support:1"
* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: (275 commits)
MIPS: pic32mzda: fix getting timer clock rate.
MIPS: ath79: fix regression in PCI window initialization
MIPS: ath79: make ath79_ddr_ctrl_init() compatible for newer SoCs
MIPS: Fix VZ probe gas errors with binutils <2.24
MIPS: perf: Fix I6400 event numbers
MIPS: DEC: Export `ioasic_ssr_lock' to modules
MIPS: MSA: Fix a link error on `_init_msa_upper' with older GCC
MIPS: CM: Fix compilation error when !MIPS_CM
MIPS: Fix genvdso error on rebuild
USB: ohci-jz4740: Remove obsolete driver
MIPS: JZ4740: Probe OHCI platform device via DT
MIPS: JZ4740: Qi LB60: Remove support for AVT2 variant
MIPS: pistachio: Determine SoC revision during boot
MIPS: BMIPS: Adjust mips-hpt-frequency for BCM7435
mips: mt7620: fallback to SDRAM when syscfg0 does not have a valid value for the memory type
MIPS: Prevent "restoration" of MSA context in non-MSA kernels
MIPS: cevt-r4k: Dynamically calculate min_delta_ns
MIPS: malta-time: Take seconds into account
MIPS: malta-time: Start GIC count before syncing to RTC
MIPS: Force CPUs to lose FP context during mode switches
...
Vikram reported that his ARM64 compiler managed to 'optimize' away the
preempt_count manipulations in code like:
preempt_enable_no_resched();
put_user();
preempt_disable();
Irrespective of that fact that that is horrible code that should be
fixed for many reasons, it does highlight a deficiency in the generic
preempt_count manipulators. As it is never right to combine/elide
preempt_count manipulations like this.
Therefore sprinkle some volatile in the two generic accessors to
ensure the compiler is aware of the fact that the preempt_count is
observed outside of the regular program-order view and thus cannot be
optimized away like this.
x86; the only arch not using the generic code is not affected as we
do all this in asm in order to use the segment base per-cpu stuff.
Reported-by: Vikram Mulukutla <markivx@codeaurora.org>
Tested-by: Vikram Mulukutla <markivx@codeaurora.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: a787870924 ("sched, arch: Create asm/preempt.h")
Link: http://lkml.kernel.org/r/20160516131751.GH3205@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull x86 asm updates from Ingo Molnar:
"The main changes in this cycle were:
- MSR access API fixes and enhancements (Andy Lutomirski)
- early exception handling improvements (Andy Lutomirski)
- user-space FS/GS prctl usage fixes and improvements (Andy
Lutomirski)
- Remove the cpu_has_*() APIs and replace them with equivalents
(Borislav Petkov)
- task switch micro-optimization (Brian Gerst)
- 32-bit entry code simplification (Denys Vlasenko)
- enhance PAT handling in enumated CPUs (Toshi Kani)
... and lots of other cleanups/fixlets"
* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (70 commits)
x86/arch_prctl/64: Restore accidentally removed put_cpu() in ARCH_SET_GS
x86/entry/32: Remove asmlinkage_protect()
x86/entry/32: Remove GET_THREAD_INFO() from entry code
x86/entry, sched/x86: Don't save/restore EFLAGS on task switch
x86/asm/entry/32: Simplify pushes of zeroed pt_regs->REGs
selftests/x86/ldt_gdt: Test set_thread_area() deletion of an active segment
x86/tls: Synchronize segment registers in set_thread_area()
x86/asm/64: Rename thread_struct's fs and gs to fsbase and gsbase
x86/arch_prctl/64: Remove FSBASE/GSBASE < 4G optimization
x86/segments/64: When load_gs_index fails, clear the base
x86/segments/64: When loadsegment(fs, ...) fails, clear the base
x86/asm: Make asm/alternative.h safe from assembly
x86/asm: Stop depending on ptrace.h in alternative.h
x86/entry: Rename is_{ia32,x32}_task() to in_{ia32,x32}_syscall()
x86/asm: Make sure verify_cpu() has a good stack
x86/extable: Add a comment about early exception handlers
x86/msr: Set the return value to zero when native_rdmsr_safe() fails
x86/paravirt: Make "unsafe" MSR accesses unsafe even if PARAVIRT=y
x86/paravirt: Add paravirt_{read,write}_msr()
x86/msr: Carry on after a non-"safe" MSR access fails
...
Pull support for killable rwsems from Ingo Molnar:
"This, by Michal Hocko, implements down_write_killable().
The main usecase will be to update mm_sem usage sites to use this new
API, to allow the mm-reaper introduced in commit aac4536355 ("mm,
oom: introduce oom reaper") to tear down oom victim address spaces
asynchronously with minimum latencies and without deadlock worries"
[ The vfs will want it too as the inode lock is changed from a mutex to
a rwsem due to the parallel lookup and readdir updates ]
* 'locking-rwsem-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/rwsem: Fix comment on register clobbering
locking/rwsem: Fix down_write_killable()
locking/rwsem, x86: Add frame annotation for call_rwsem_down_write_failed_killable()
locking/rwsem: Provide down_write_killable()
locking/rwsem, x86: Provide __down_write_killable()
locking/rwsem, s390: Provide __down_write_killable()
locking/rwsem, ia64: Provide __down_write_killable()
locking/rwsem, alpha: Provide __down_write_killable()
locking/rwsem: Introduce basis for down_write_killable()
locking/rwsem, sparc: Drop superfluous arch specific implementation
locking/rwsem, sh: Drop superfluous arch specific implementation
locking/rwsem, xtensa: Drop superfluous arch specific implementation
locking/rwsem: Drop explicit memory barriers
locking/rwsem: Get rid of __down_write_nested()
The generic copy_siginfo() is currently defined in
asm-generic/siginfo.h, after including uapi/asm-generic/siginfo.h which
defines the generic struct siginfo. However this makes it awkward for an
architecture to use it if it has to define its own struct siginfo (e.g.
MIPS and potentially IA64), since it means that asm-generic/siginfo.h
can only be included after defining the arch-specific siginfo, which may
be problematic if the arch-specific definition needs definitions from
uapi/asm-generic/siginfo.h.
It is possible to work around this by first including
uapi/asm-generic/siginfo.h to get the constants before defining the
arch-specific siginfo, and include asm-generic/siginfo.h after. However
uapi headers can't be included by other uapi headers, so that first
include has to be in an ifdef __kernel__, with the non __kernel__ case
including the non-UAPI header instead.
Instead of that mess, move the generic copy_siginfo() definition into
linux/signal.h, which allows an arch-specific uapi/asm/siginfo.h to
include asm-generic/siginfo.h and define the arch-specific siginfo, and
for the generic copy_siginfo() to see that arch-specific definition.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Petr Malat <oss@malat.biz>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Christopher Ferris <cferris@google.com>
Cc: linux-arch@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: linux-ia64@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: <stable@vger.kernel.org> # 4.0-
Patchwork: https://patchwork.linux-mips.org/patch/12478/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Move retrieval of compat syscall numbers into inline function defined in
asm-generic header so that arches may override it.
[ralf@linux-mips.org: Resolve merge conflict.]
Suggested-by: Paul Burton <paul.burton@imgtec.com>
Signed-off-by: Matt Redfearn <matt.redfearn@imgtec.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: IMG-MIPSLinuxKerneldevelopers@imgtec.com
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Will Drewry <wad@chromium.org>
Cc: linux-arch@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/12978/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Whilst commit 9439eb3ab9 ("asm-generic: io: implement relaxed
accessor macros as conditional wrappers") makes the *_relaxed forms of
I/O accessors universally available to drivers, in cases where writeq()
is implemented via the io-64-nonatomic helpers, writeq_relaxed() will
end up falling back to writel() regardless of whether writel_relaxed()
is available (identically for s/write/read/).
Add corresponding relaxed forms of the nonatomic helpers to delegate
to the equivalent 32-bit accessors as appropriate. We also need to fix
io.h to avoid defining default relaxed variants if the basic accessors
themselves don't exist.
CC: Christoph Hellwig <hch@lst.de>
CC: Darren Hart <dvhart@linux.intel.com>
CC: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The recent decoupling of pagefault disable and preempt disable added an
explicit preempt_disable/enable() pair to the futex_atomic_cmpxchg_inatomic()
implementation in asm-generic/futex.h. But it forgot to add preempt_enable()
calls to the error handling code pathes, which results in a preemption count
imbalance.
This is observable on boot when the test for atomic_cmpxchg() is calling
futex_atomic_cmpxchg_inatomic() on a NULL pointer.
Add the missing preempt_enable() calls to the error handling code pathes.
[ tglx: Massaged changelog ]
Fixes: d9b9ff8c18 ("sched/preempt, futex: Disable preemption in UP futex_atomic_cmpxchg_inatomic() explicitly")
Signed-off-by: Romain Perier <romain.perier@free-electrons.com>
Cc: linux-arch@vger.kernel.org
Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1460640963-690-1-git-send-email-romain.perier@free-electrons.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
04633df0c4 ("x86/cpu: Call verify_cpu() after having entered long mode too")
added the call to verify_cpu() for sanitizing CPU configuration.
The latter uses the stack minimally and it can happen that we land in
startup_64() directly from a 64-bit bootloader. Then we want to use our
own, known good stack.
Do that.
APs don't need this as the trampoline sets up a stack for them.
Reported-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mika Penttilä <mika.penttila@nextfour.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1459434062-31055-1-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Introduce a generic implementation necessary for down_write_killable().
This is a trivial extension of the already existing down_write() call
which can be interrupted by SIGKILL. This patch doesn't provide
down_write_killable() yet because arches have to provide the necessary
pieces before.
rwsem_down_write_failed() which is a generic slow path for the
write lock is extended to take a task state and renamed to
__rwsem_down_write_failed_common(). The return value is either a valid
semaphore pointer or ERR_PTR(-EINTR).
rwsem_down_write_failed_killable() is exported as a new way to wait for
the lock and be killable.
For rwsem-spinlock implementation the current __down_write() it updated
in a similar way as __rwsem_down_write_failed_common() except it doesn't
need new exports just visible __down_write_killable().
Architectures which are not using the generic rwsem implementation are
supposed to provide their __down_write_killable() implementation and
use rwsem_down_write_failed_killable() for the slow path.
Signed-off-by: Michal Hocko <mhocko@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Chris Zankel <chris@zankel.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Signed-off-by: Jason Low <jason.low2@hp.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-alpha@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-ia64@vger.kernel.org
Cc: linux-s390@vger.kernel.org
Cc: linux-sh@vger.kernel.org
Cc: linux-xtensa@linux-xtensa.org
Cc: sparclinux@vger.kernel.org
Link: http://lkml.kernel.org/r/1460041951-22347-7-git-send-email-mhocko@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This is no longer used anywhere and all callers (__down_write()) use
0 as a subclass. Ditch __down_write_nested() to make the code easier
to follow.
This shouldn't introduce any functional change.
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Davidlohr Bueso <dave@stgolabs.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Chris Zankel <chris@zankel.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Signed-off-by: Jason Low <jason.low2@hp.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-alpha@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-ia64@vger.kernel.org
Cc: linux-s390@vger.kernel.org
Cc: linux-sh@vger.kernel.org
Cc: linux-xtensa@linux-xtensa.org
Cc: sparclinux@vger.kernel.org
Link: http://lkml.kernel.org/r/1460041951-22347-2-git-send-email-mhocko@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
KASAN needs to know whether the allocation happens in an IRQ handler.
This lets us strip everything below the IRQ entry point to reduce the
number of unique stack traces needed to be stored.
Move the definition of __irq_entry to <linux/interrupt.h> so that the
users don't need to pull in <linux/ftrace.h>. Also introduce the
__softirq_entry macro which is similar to __irq_entry, but puts the
corresponding functions to the .softirqentry.text section.
Signed-off-by: Alexander Potapenko <glider@google.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrey Konovalov <adech.fo@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Konstantin Serebryany <kcc@google.com>
Cc: Dmitry Chernenkov <dmitryc@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There are only three patches this time, most other changes to
files in include/asm-generic tend to go through the tree of whoever
depends on the change.
Two patches are cleanups for stuff that is no longer needed,
the main change is to adapt the generic version of BUG_ON()
for CONFIG_BUG=n to make it behave consistently with BUG().
This avoids undefined behavior along with a number of warnings
about that undefined behavior in randconfig builds when
we keep going on after hitting a BUG_ON().
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIVAwUAVvRXDGCrR//JCVInAQKUFRAAmp23pohv08LZzXL8Qu7XfFN+b1RkZ936
WYBeiA9PEWufQs2hgXaEUXy0onO7ah4cs2NWfkaBPyxT+I9mN+ThdzqVrlTE+AEO
2K0f2RaZANC238zB86Yv/YvTj7FegH0DDdMBq/P06vlYdgBegx49U3pMpguxl3d0
/q9MyqTzo9j4uOEK4ix4/Dko+4eKIS5Y/xeb0TkeKA6HiBVzAhGLZFl+eMku07Bf
ap8B705hBDXSBFeWcK9AvKjHZCM+FCkb+C3TXo9x5tUu8g5OIG1t962OQvT9ldsP
rvo5ppRh/TAY2Z9chN3cKrsvshbHiZ9uRzeksCunL+SK+dOhEIPCVzLXndQpi3RD
NgeNKgo6gKYdle44pEj0EH2ktuvr0u8sbjQg9SY2miC1H4DmEbCakSqtQegHXTKd
chJ6xyNiQXktdfo0pFOtCA2gjqiAriugttBqUtGcK9zRqjGGpP5hOUQVm3jR7UMp
Hjb+oj5o+Gjz5J1t5zsjbhFINDCHAgXRzqqaoT9RfE9+QlUftUhu+N9KVFgzhe9I
93VHaqgGIRoi856BO7UZSaMGhy7ljm1nQ18jP9aZl/tBco0kpd3AO8og9dJ0u2j+
3fEqAHH30ia8GJCfIDnolxTL6uaqcCIeAoLgGcmn+QZS7ka+tD+000rtgd2pdy9/
gy/VPpFG064=
=8tPL
-----END PGP SIGNATURE-----
Merge tag 'asm-generic-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic
Pull asm-generic updates from Arnd Bergmann:
"There are only three patches this time, most other changes to files in
include/asm-generic tend to go through the tree of whoever depends on
the change.
Two patches are cleanups for stuff that is no longer needed, the main
change is to adapt the generic version of BUG_ON() for CONFIG_BUG=n to
make it behave consistently with BUG().
This avoids undefined behavior along with a number of warnings about
that undefined behavior in randconfig builds when we keep going on
after hitting a BUG_ON()"
* tag 'asm-generic-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
asm-generic: remove old nonatomic-io wrapper files
asm-generic: default BUG_ON(x) to if(x)BUG()
asm-generic: page.h: Remove useless get_user_page and free_user_page
Pull locking fixes from Ingo Molnar:
"Documentation updates and a bitops ordering fix"
* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
bitops: Do not default to __clear_bit() for __clear_bit_unlock()
documentation: Clarify compiler store-fusion example
documentation: Transitivity is not cumulativity
documentation: Add alternative release-acquire outcome
documentation: Distinguish between local and global transitivity
documentation: Subsequent writes ordered by rcu_dereference()
documentation: Remove obsolete reference to RCU-protected indexes
documentation: Fix memory-barriers.txt section references
documentation: Fix control dependency and identical stores
__clear_bit_unlock() is a special little snowflake. While it carries the
non-atomic '__' prefix, it is specifically documented to pair with
test_and_set_bit() and therefore should be 'somewhat' atomic.
Therefore the generic implementation of __clear_bit_unlock() cannot use
the fully non-atomic __clear_bit() as a default.
If an arch is able to do better; is must provide an implementation of
__clear_bit_unlock() itself.
Specifically, this came up as a result of hackbench livelock'ing in
slab_lock() on ARC with SMP + SLUB + !LLSC.
The issue was incorrect pairing of atomic ops.
slab_lock() -> bit_spin_lock() -> test_and_set_bit()
slab_unlock() -> __bit_spin_unlock() -> __clear_bit()
The non serializing __clear_bit() was getting "lost"
80543b8e: ld_s r2,[r13,0] <--- (A) Finds PG_locked is set
80543b90: or r3,r2,1 <--- (B) other core unlocks right here
80543b94: st_s r3,[r13,0] <--- (C) sets PG_locked (overwrites unlock)
Fixes ARC STAR 9000817404 (and probably more).
Reported-by: Vineet Gupta <Vineet.Gupta1@synopsys.com>
Tested-by: Vineet Gupta <Vineet.Gupta1@synopsys.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Helge Deller <deller@gmx.de>
Cc: James E.J. Bottomley <jejb@parisc-linux.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Noam Camus <noamc@ezchip.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20160309114054.GJ6356@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull x86 protection key support from Ingo Molnar:
"This tree adds support for a new memory protection hardware feature
that is available in upcoming Intel CPUs: 'protection keys' (pkeys).
There's a background article at LWN.net:
https://lwn.net/Articles/643797/
The gist is that protection keys allow the encoding of
user-controllable permission masks in the pte. So instead of having a
fixed protection mask in the pte (which needs a system call to change
and works on a per page basis), the user can map a (handful of)
protection mask variants and can change the masks runtime relatively
cheaply, without having to change every single page in the affected
virtual memory range.
This allows the dynamic switching of the protection bits of large
amounts of virtual memory, via user-space instructions. It also
allows more precise control of MMU permission bits: for example the
executable bit is separate from the read bit (see more about that
below).
This tree adds the MM infrastructure and low level x86 glue needed for
that, plus it adds a high level API to make use of protection keys -
if a user-space application calls:
mmap(..., PROT_EXEC);
or
mprotect(ptr, sz, PROT_EXEC);
(note PROT_EXEC-only, without PROT_READ/WRITE), the kernel will notice
this special case, and will set a special protection key on this
memory range. It also sets the appropriate bits in the Protection
Keys User Rights (PKRU) register so that the memory becomes unreadable
and unwritable.
So using protection keys the kernel is able to implement 'true'
PROT_EXEC on x86 CPUs: without protection keys PROT_EXEC implies
PROT_READ as well. Unreadable executable mappings have security
advantages: they cannot be read via information leaks to figure out
ASLR details, nor can they be scanned for ROP gadgets - and they
cannot be used by exploits for data purposes either.
We know about no user-space code that relies on pure PROT_EXEC
mappings today, but binary loaders could start making use of this new
feature to map binaries and libraries in a more secure fashion.
There is other pending pkeys work that offers more high level system
call APIs to manage protection keys - but those are not part of this
pull request.
Right now there's a Kconfig that controls this feature
(CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS) that is default enabled
(like most x86 CPU feature enablement code that has no runtime
overhead), but it's not user-configurable at the moment. If there's
any serious problem with this then we can make it configurable and/or
flip the default"
* 'mm-pkeys-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (38 commits)
x86/mm/pkeys: Fix mismerge of protection keys CPUID bits
mm/pkeys: Fix siginfo ABI breakage caused by new u64 field
x86/mm/pkeys: Fix access_error() denial of writes to write-only VMA
mm/core, x86/mm/pkeys: Add execute-only protection keys support
x86/mm/pkeys: Create an x86 arch_calc_vm_prot_bits() for VMA flags
x86/mm/pkeys: Allow kernel to modify user pkey rights register
x86/fpu: Allow setting of XSAVE state
x86/mm: Factor out LDT init from context init
mm/core, x86/mm/pkeys: Add arch_validate_pkey()
mm/core, arch, powerpc: Pass a protection key in to calc_vm_flag_bits()
x86/mm/pkeys: Actually enable Memory Protection Keys in the CPU
x86/mm/pkeys: Add Kconfig prompt to existing config option
x86/mm/pkeys: Dump pkey from VMA in /proc/pid/smaps
x86/mm/pkeys: Dump PKRU with other kernel registers
mm/core, x86/mm/pkeys: Differentiate instruction fetches
x86/mm/pkeys: Optimize fault handling in access_error()
mm/core: Do not enforce PKEY permissions on remote mm access
um, pkeys: Add UML arch_*_access_permitted() methods
mm/gup, x86/mm/pkeys: Check VMAs and PTEs for protection keys
x86/mm/gup: Simplify get_user_pages() PTE bit handling
...
Pull networking updates from David Miller:
"Highlights:
1) Support more Realtek wireless chips, from Jes Sorenson.
2) New BPF types for per-cpu hash and arrap maps, from Alexei
Starovoitov.
3) Make several TCP sysctls per-namespace, from Nikolay Borisov.
4) Allow the use of SO_REUSEPORT in order to do per-thread processing
of incoming TCP/UDP connections. The muxing can be done using a
BPF program which hashes the incoming packet. From Craig Gallek.
5) Add a multiplexer for TCP streams, to provide a messaged based
interface. BPF programs can be used to determine the message
boundaries. From Tom Herbert.
6) Add 802.1AE MACSEC support, from Sabrina Dubroca.
7) Avoid factorial complexity when taking down an inetdev interface
with lots of configured addresses. We were doing things like
traversing the entire address less for each address removed, and
flushing the entire netfilter conntrack table for every address as
well.
8) Add and use SKB bulk free infrastructure, from Jesper Brouer.
9) Allow offloading u32 classifiers to hardware, and implement for
ixgbe, from John Fastabend.
10) Allow configuring IRQ coalescing parameters on a per-queue basis,
from Kan Liang.
11) Extend ethtool so that larger link mode masks can be supported.
From David Decotigny.
12) Introduce devlink, which can be used to configure port link types
(ethernet vs Infiniband, etc.), port splitting, and switch device
level attributes as a whole. From Jiri Pirko.
13) Hardware offload support for flower classifiers, from Amir Vadai.
14) Add "Local Checksum Offload". Basically, for a tunneled packet
the checksum of the outer header is 'constant' (because with the
checksum field filled into the inner protocol header, the payload
of the outer frame checksums to 'zero'), and we can take advantage
of that in various ways. From Edward Cree"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1548 commits)
bonding: fix bond_get_stats()
net: bcmgenet: fix dma api length mismatch
net/mlx4_core: Fix backward compatibility on VFs
phy: mdio-thunder: Fix some Kconfig typos
lan78xx: add ndo_get_stats64
lan78xx: handle statistics counter rollover
RDS: TCP: Remove unused constant
RDS: TCP: Add sysctl tunables for sndbuf/rcvbuf on rds-tcp socket
net: smc911x: convert pxa dma to dmaengine
team: remove duplicate set of flag IFF_MULTICAST
bonding: remove duplicate set of flag IFF_MULTICAST
net: fix a comment typo
ethernet: micrel: fix some error codes
ip_tunnels, bpf: define IP_TUNNEL_OPTS_MAX and use it
bpf, dst: add and use dst_tclassid helper
bpf: make skb->tc_classid also readable
net: mvneta: bm: clarify dependencies
cls_bpf: reset class and reuse major in da
ldmvsw: Checkpatch sunvnet.c and sunvnet_common.c
ldmvsw: Add ldmvsw.c driver code
...
Merge second patch-bomb from Andrew Morton:
- a couple of hotfixes
- the rest of MM
- a new timer slack control in procfs
- a couple of procfs fixes
- a few misc things
- some printk tweaks
- lib/ updates, notably to radix-tree.
- add my and Nick Piggin's old userspace radix-tree test harness to
tools/testing/radix-tree/. Matthew said it was a godsend during the
radix-tree work he did.
- a few code-size improvements, switching to __always_inline where gcc
screwed up.
- partially implement character sets in sscanf
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (118 commits)
sscanf: implement basic character sets
lib/bug.c: use common WARN helper
param: convert some "on"/"off" users to strtobool
lib: add "on"/"off" support to kstrtobool
lib: update single-char callers of strtobool()
lib: move strtobool() to kstrtobool()
include/linux/unaligned: force inlining of byteswap operations
include/uapi/linux/byteorder, swab: force inlining of some byteswap operations
include/asm-generic/atomic-long.h: force inlining of some atomic_long operations
usb: common: convert to use match_string() helper
ide: hpt366: convert to use match_string() helper
ata: hpt366: convert to use match_string() helper
power: ab8500: convert to use match_string() helper
power: charger_manager: convert to use match_string() helper
drm/edid: convert to use match_string() helper
pinctrl: convert to use match_string() helper
device property: convert to use match_string() helper
lib/string: introduce match_string() helper
radix-tree tests: add test for radix_tree_iter_next
radix-tree tests: add regression3 test
...
Core changes:
- The gpio_chip is now a *real device*. Until now the gpio chips
were just piggybacking the parent device or (gasp) floating in
space outside of the device model. We now finally make GPIO chips
devices. The gpio_chip will create a gpio_device which contains
a struct device, and this gpio_device struct is kept private.
Anything that needs to be kept private from the rest of the kernel
will gradually be moved over to the gpio_device.
- As a result of making the gpio_device a real device, we have added
resource management, so devm_gpiochip_add_data() will cut down on
overhead and reduce code lines. A huge slew of patches convert
almost all drivers in the subsystem to use this.
- Building on making the GPIO a real device, we add the first step
of a new userspace ABI: the GPIO character device. We take small
steps here, so we first add a pure *information* ABI and the tool
"lsgpio" that will list all GPIO devices on the system and all
lines on these devices. We can now discover GPIOs properly from
userspace. We still have not come up with a way to actually *use*
GPIOs from userspace.
- To encourage people to use the character device for the future,
we have it always-enabled when using GPIO. The old sysfs ABI is
still opt-in (and can be used in parallel), but is marked as
deprecated. We will keep it around for the foreseeable future,
but it will not be extended to cover ever more use cases.
Cleanup:
- Bjorn Helgaas removed a whole slew of per-architecture <asm/gpio.h>
includes. This dates back to when GPIO was an opt-in feature and
no shared library even existed: just a header file with proper
prototypes was provided and all semantics were up to the arch to
implement. These patches make the GPIO chip even more a proper
device and cleans out leftovers of the old in-kernel API here
and there. Still some cruft is left but it's very little now.
- There is still some clamping of return values for .get() going
on, but we now return sane values in the vast majority of drivers
and the errorpath is sanitized. Some patches for powerpc, blackfin
and unicore still drop in.
- We continue to switch the ARM, MIPS, blackfin, m68k local GPIO
implementations to use gpiochip_add_data() and cut down on code
lines.
- MPC8xxx is converted to use the generic GPIO helpers.
- ATH79 is converted to use the generic GPIO helpers.
New drivers:
- WinSystems WS16C48
- Acces 104-DIO-48E
- F81866 (a F7188x variant)
- Qoric (a MPC8xxx variant)
- TS-4800
- SPI serializers (pisosr): simple 74xx shift registers connected
to SPI to obtain a dirt-cheap output-only GPIO expander.
- Texas Instruments TPIC2810
- Texas Instruments TPS65218
- Texas Instruments TPS65912
- X-Gene (ARM64) standby GPIO controller
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJW6m24AAoJEEEQszewGV1zUasP/RpTrjRcNI5QFHjudd2oioDx
R/IljC06Q072ZqVy/MR7QxwhoU8jUnCgKgv4rgMa1OcfHblxC2R1+YBKOUSij831
E+SYmYDYmoMhN7j5Aslr66MXg1rLdFSdCZWemuyNruAK8bx6cTE1AWS8AELQzzTn
Re/CPpCDbujLy0ZK2wJHgr9ZkdcBGICtDRCrOR3Kyjpwk/DSZcruK1PDN+VQMI3k
bJlwgtGenOHINgCq/16edpwj/hzmoJXhTOZXJHI5XVR6czTwb3SvCYACvCkauI/a
/N7b3quG88b5y0OPQPVxp5+VVl9GyVcv5oGzIfTNat/g5QinShZIT4kVV9r0xu6/
TQHh1HlXleh+QI3yX0oRv9ztHreMf+vdpw1dhIwLqHqfJ7AWdOGk7BbKjwCrsOoq
t/qUVFnyvooLpyr53Z5JY8+LqyynHF68G+jUQyHLgTZ0GCE+z+1jqNl1T501n3kv
3CSlNYxSN/YUBN3cnroAIU/ZWcV4YRdxmOtEWP+7xgcdzTE6s/JHb2fuEfVHzWPf
mHWtJGy8U0IR4VSSEln5RtjhRr0PAjTHeTOGAmivUnaIGDziTowyUVF+X5hwC77E
DGTuLVx/Kniv173DK7xNAsUZNAETBa3fQZTgu+RfOpMiM1FZc7tI1rd7K7PjbyCc
d2M0gcq+d11ITJTxC7OM
=9AJ4
-----END PGP SIGNATURE-----
Merge tag 'gpio-v4.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio
Pull GPIO updates from Linus Walleij:
"This is the bulk of GPIO changes for kernel v4.6. There is quite a
lot of interesting stuff going on.
The patches to other subsystems and arch-wide are ACKed as far as
possible, though I consider things like per-arch <asm/gpio.h> as
essentially a part of the GPIO subsystem so it should not be needed.
Core changes:
- The gpio_chip is now a *real device*. Until now the gpio chips
were just piggybacking the parent device or (gasp) floating in
space outside of the device model.
We now finally make GPIO chips devices. The gpio_chip will create
a gpio_device which contains a struct device, and this gpio_device
struct is kept private. Anything that needs to be kept private
from the rest of the kernel will gradually be moved over to the
gpio_device.
- As a result of making the gpio_device a real device, we have added
resource management, so devm_gpiochip_add_data() will cut down on
overhead and reduce code lines. A huge slew of patches convert
almost all drivers in the subsystem to use this.
- Building on making the GPIO a real device, we add the first step of
a new userspace ABI: the GPIO character device. We take small
steps here, so we first add a pure *information* ABI and the tool
"lsgpio" that will list all GPIO devices on the system and all
lines on these devices.
We can now discover GPIOs properly from userspace. We still have
not come up with a way to actually *use* GPIOs from userspace.
- To encourage people to use the character device for the future, we
have it always-enabled when using GPIO. The old sysfs ABI is still
opt-in (and can be used in parallel), but is marked as deprecated.
We will keep it around for the foreseeable future, but it will not
be extended to cover ever more use cases.
Cleanup:
- Bjorn Helgaas removed a whole slew of per-architecture <asm/gpio.h>
includes.
This dates back to when GPIO was an opt-in feature and no shared
library even existed: just a header file with proper prototypes was
provided and all semantics were up to the arch to implement. These
patches make the GPIO chip even more a proper device and cleans out
leftovers of the old in-kernel API here and there.
Still some cruft is left but it's very little now.
- There is still some clamping of return values for .get() going on,
but we now return sane values in the vast majority of drivers and
the errorpath is sanitized. Some patches for powerpc, blackfin and
unicore still drop in.
- We continue to switch the ARM, MIPS, blackfin, m68k local GPIO
implementations to use gpiochip_add_data() and cut down on code
lines.
- MPC8xxx is converted to use the generic GPIO helpers.
- ATH79 is converted to use the generic GPIO helpers.
New drivers:
- WinSystems WS16C48
- Acces 104-DIO-48E
- F81866 (a F7188x variant)
- Qoric (a MPC8xxx variant)
- TS-4800
- SPI serializers (pisosr): simple 74xx shift registers connected to
SPI to obtain a dirt-cheap output-only GPIO expander.
- Texas Instruments TPIC2810
- Texas Instruments TPS65218
- Texas Instruments TPS65912
- X-Gene (ARM64) standby GPIO controller"
* tag 'gpio-v4.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: (194 commits)
Revert "Share upstreaming patches"
gpio: mcp23s08: Fix clearing of interrupt.
gpiolib: Fix comment referring to gpio_*() in gpiod_*()
gpio: pca953x: Fix pca953x_gpio_set_multiple() on 64-bit
gpio: xgene: Fix kconfig for standby GIPO contoller
gpio: Add generic serializer DT binding
gpio: uapi: use 0xB4 as ioctl() major
gpio: tps65912: fix bad merge
Revert "gpio: lp3943: Drop pin_used and lp3943_gpio_request/lp3943_gpio_free"
gpio: omap: drop dev field from gpio_bank structure
gpio: mpc8xxx: Slightly update the code for better readability
gpio: mpc8xxx: Remove *read_reg and *write_reg from struct mpc8xxx_gpio_chip
gpio: mpc8xxx: Fixup setting gpio direction output
gpio: mcp23s08: Add support for mcp23s18
dt-bindings: gpio: altera: Fix altr,interrupt-type property
gpio: add driver for MEN 16Z127 GPIO controller
gpio: lp3943: Drop pin_used and lp3943_gpio_request/lp3943_gpio_free
gpio: timberdale: Switch to devm_ioremap_resource()
gpio: ts4800: Add IMX51 dependency
gpiolib: rewrite gpiodev_add_to_list
...
- Initial page table creation reworked to avoid breaking large block
mappings (huge pages) into smaller ones. The ARM architecture requires
break-before-make in such cases to avoid TLB conflicts but that's not
always possible on live page tables
- Kernel virtual memory layout: the kernel image is no longer linked to
the bottom of the linear mapping (PAGE_OFFSET) but at the bottom of
the vmalloc space, allowing the kernel to be loaded (nearly) anywhere
in physical RAM
- Kernel ASLR: position independent kernel Image and modules being
randomly mapped in the vmalloc space with the randomness is provided
by UEFI (efi_get_random_bytes() patches merged via the arm64 tree,
acked by Matt Fleming)
- Implement relative exception tables for arm64, required by KASLR
(initial code for ARCH_HAS_RELATIVE_EXTABLE added to lib/extable.c but
actual x86 conversion to deferred to 4.7 because of the merge
dependencies)
- Support for the User Access Override feature of ARMv8.2: this allows
uaccess functions (get_user etc.) to be implemented using LDTR/STTR
instructions. Such instructions, when run by the kernel, perform
unprivileged accesses adding an extra level of protection. The
set_fs() macro is used to "upgrade" such instruction to privileged
accesses via the UAO bit
- Half-precision floating point support (part of ARMv8.2)
- Optimisations for CPUs with or without a hardware prefetcher (using
run-time code patching)
- copy_page performance improvement to deal with 128 bytes at a time
- Sanity checks on the CPU capabilities (via CPUID) to prevent
incompatible secondary CPUs from being brought up (e.g. weird
big.LITTLE configurations)
- valid_user_regs() reworked for better sanity check of the sigcontext
information (restored pstate information)
- ACPI parking protocol implementation
- CONFIG_DEBUG_RODATA enabled by default
- VDSO code marked as read-only
- DEBUG_PAGEALLOC support
- ARCH_HAS_UBSAN_SANITIZE_ALL enabled
- Erratum workaround Cavium ThunderX SoC
- set_pte_at() fix for PROT_NONE mappings
- Code clean-ups
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJW6u95AAoJEGvWsS0AyF7xMyoP/3x2O6bgreSQ84BdO4JChN4+
RQ9OVdX8u2ItO9sgaCY2AA6KoiBuEjGmPl/XRuK0I7DpODTtRjEXQHuNNhz8AelC
hn4AEVqamY6Z5BzHFIjs8G9ydEbq+OXcKWEdwSsBhP/cMvI7ss3dps1f5iNPT5Vv
50E/kUz+aWYy7pKlB18VDV7TUOA3SuYuGknWV8+bOY5uPb8hNT3Y3fHOg/EuNNN3
DIuYH1V7XQkXtF+oNVIGxzzJCXULBE7egMcWAm1ydSOHK0JwkZAiL7OhI7ceVD0x
YlDxBnqmi4cgzfBzTxITAhn3OParwN6udQprdF1WGtFF6fuY2eRDSH/L/iZoE4DY
OulL951OsBtF8YC3+RKLk908/0bA2Uw8ftjCOFJTYbSnZBj1gWK41VkCYMEXiHQk
EaN8+2Iw206iYIoyvdjGCLw7Y0oakDoVD9vmv12SOaHeQljTkjoN8oIlfjjKTeP7
3AXj5v9BDMDVh40nkVayysRNvqe48Kwt9Wn0rhVTLxwdJEiFG/OIU6HLuTkretdN
dcCNFSQrRieSFHpBK9G0vKIpIss1ZwLm8gjocVXH7VK4Mo/TNQe4p2/wAF29mq4r
xu1UiXmtU3uWxiqZnt72LOYFCarQ0sFA5+pMEvF5W+NrVB0wGpXhcwm+pGsIi4IM
LepccTgykiUBqW5TRzPz
=/oS+
-----END PGP SIGNATURE-----
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Catalin Marinas:
"Here are the main arm64 updates for 4.6. There are some relatively
intrusive changes to support KASLR, the reworking of the kernel
virtual memory layout and initial page table creation.
Summary:
- Initial page table creation reworked to avoid breaking large block
mappings (huge pages) into smaller ones. The ARM architecture
requires break-before-make in such cases to avoid TLB conflicts but
that's not always possible on live page tables
- Kernel virtual memory layout: the kernel image is no longer linked
to the bottom of the linear mapping (PAGE_OFFSET) but at the bottom
of the vmalloc space, allowing the kernel to be loaded (nearly)
anywhere in physical RAM
- Kernel ASLR: position independent kernel Image and modules being
randomly mapped in the vmalloc space with the randomness is
provided by UEFI (efi_get_random_bytes() patches merged via the
arm64 tree, acked by Matt Fleming)
- Implement relative exception tables for arm64, required by KASLR
(initial code for ARCH_HAS_RELATIVE_EXTABLE added to lib/extable.c
but actual x86 conversion to deferred to 4.7 because of the merge
dependencies)
- Support for the User Access Override feature of ARMv8.2: this
allows uaccess functions (get_user etc.) to be implemented using
LDTR/STTR instructions. Such instructions, when run by the kernel,
perform unprivileged accesses adding an extra level of protection.
The set_fs() macro is used to "upgrade" such instruction to
privileged accesses via the UAO bit
- Half-precision floating point support (part of ARMv8.2)
- Optimisations for CPUs with or without a hardware prefetcher (using
run-time code patching)
- copy_page performance improvement to deal with 128 bytes at a time
- Sanity checks on the CPU capabilities (via CPUID) to prevent
incompatible secondary CPUs from being brought up (e.g. weird
big.LITTLE configurations)
- valid_user_regs() reworked for better sanity check of the
sigcontext information (restored pstate information)
- ACPI parking protocol implementation
- CONFIG_DEBUG_RODATA enabled by default
- VDSO code marked as read-only
- DEBUG_PAGEALLOC support
- ARCH_HAS_UBSAN_SANITIZE_ALL enabled
- Erratum workaround Cavium ThunderX SoC
- set_pte_at() fix for PROT_NONE mappings
- Code clean-ups"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (99 commits)
arm64: kasan: Fix zero shadow mapping overriding kernel image shadow
arm64: kasan: Use actual memory node when populating the kernel image shadow
arm64: Update PTE_RDONLY in set_pte_at() for PROT_NONE permission
arm64: Fix misspellings in comments.
arm64: efi: add missing frame pointer assignment
arm64: make mrs_s prefixing implicit in read_cpuid
arm64: enable CONFIG_DEBUG_RODATA by default
arm64: Rework valid_user_regs
arm64: mm: check at build time that PAGE_OFFSET divides the VA space evenly
arm64: KVM: Move kvm_call_hyp back to its original localtion
arm64: mm: treat memstart_addr as a signed quantity
arm64: mm: list kernel sections in order
arm64: lse: deal with clobbered IP registers after branch via PLT
arm64: mm: dump: Use VA_START directly instead of private LOWEST_ADDR
arm64: kconfig: add submenu for 8.2 architectural features
arm64: kernel: acpi: fix ioremap in ACPI parking protocol cpu_postboot
arm64: Add support for Half precision floating point
arm64: Remove fixmap include fragility
arm64: Add workaround for Cavium erratum 27456
arm64: mm: Mark .rodata as RO
...
The traceoff_on_warning option doesn't have any effect on s390, powerpc,
arm64, parisc, and sh because there are two different types of WARN
implementations:
1) The above mentioned architectures treat WARN() as a special case of a
BUG() exception. They handle warnings in report_bug() in lib/bug.c.
2) All other architectures just call warn_slowpath_*() directly. Their
warnings are handled in warn_slowpath_common() in kernel/panic.c.
Support traceoff_on_warning on all architectures and prevent any future
divergence by using a single common function to emit the warning.
Also remove the '()' from '%pS()', because the parentheses look funky:
[ 45.607629] WARNING: at /root/warn_mod/warn_mod.c:17 .init_dummy+0x20/0x40 [warn_mod]()
Reported-by: Chunyu Hu <chuhu@redhat.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Tested-by: Prarit Bhargava <prarit@redhat.com>
Acked-by: Prarit Bhargava <prarit@redhat.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sometimes gcc mysteriously doesn't inline
very small functions we expect to be inlined. See
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66122
With this .config:
http://busybox.net/~vda/kernel_config_OPTIMIZE_INLINING_and_Os,
atomic_long_inc(), atomic_long_dec() and atomic_long_add()
functions get deinlined about 40 times. Examples of disassembly:
<atomic_long_inc> (21 copies, 147 calls):
55 push %rbp
48 89 e5 mov %rsp,%rbp
f0 48 ff 07 lock incq (%rdi)
5d pop %rbp
c3 retq
<atomic_long_dec> (4 copies, 14 calls) is similar to inc.
<atomic_long_add> (11 copies, 41 calls):
55 push %rbp
48 89 e5 mov %rsp,%rbp
f0 48 01 3e lock add %rdi,(%rsi)
5d pop %rbp
c3 retq
This patch fixes this via s/inline/__always_inline/.
Code size decrease after the patch is ~1.3k:
text data bss dec hex filename
92203657 20826112 36417536 149447305 8e86289 vmlinux
92202377 20826112 36417536 149446025 8e85d89 vmlinux4_atomiclong_after
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Graf <tgraf@suug.ch>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This has hit me a couple of times already. I would be debugging code
and the system would simply hang and then reboot. Finally, I found that
the problem was caused by WARN_ON_ONCE() and friends.
The macro WARN_ON_ONCE(condition) is defined as:
static bool __section(.data.unlikely) __warned;
int __ret_warn_once = !!(condition);
if (unlikely(__ret_warn_once))
if (WARN_ON(!__warned))
__warned = true;
unlikely(__ret_warn_once);
Which looks great and all. But what I have hit, is an issue when
WARN_ON() itself hits the same WARN_ON_ONCE() code. Because, the
variable __warned is not yet set. Then it too calls WARN_ON() and that
triggers the warning again. It keeps doing this until the stack is
overflowed and the system crashes.
By setting __warned first before calling WARN_ON() makes the original
WARN_ON_ONCE() really only warn once, and not an infinite amount of
times if the WARN_ON() also triggers the warning.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We remove one instace of flush_tlb_range here. That was added by commit
f714f4f20e ("mm: numa: call MMU notifiers on THP migration"). But the
pmdp_huge_clear_flush_notify should have done the require flush for us.
Hence remove the extra flush.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Vineet Gupta <Vineet.Gupta1@synopsys.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Here's the big tty/serial driver pull request for 4.6-rc1.
Lots of changes in here, Peter has been on a tear again, with lots of
refactoring and bugs fixes, many thanks to the great work he has been
doing. Lots of driver updates and fixes as well, full details in the
shortlog.
All have been in linux-next for a while with no reported issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iEYEABECAAYFAlbp8z8ACgkQMUfUDdst+ym1vwCgnOOCORaZyeQ4QrcxPAK5pHFn
VrMAoNHvDgNYtG+Hmzv25Lgp3HnysPin
=MLRG
-----END PGP SIGNATURE-----
Merge tag 'tty-4.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial updates from Greg KH:
"Here's the big tty/serial driver pull request for 4.6-rc1.
Lots of changes in here, Peter has been on a tear again, with lots of
refactoring and bugs fixes, many thanks to the great work he has been
doing. Lots of driver updates and fixes as well, full details in the
shortlog.
All have been in linux-next for a while with no reported issues"
* tag 'tty-4.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (220 commits)
serial: 8250: describe CONFIG_SERIAL_8250_RSA
serial: samsung: optimize UART rx fifo access routine
serial: pl011: add mark/space parity support
serial: sa1100: make sa1100_register_uart_fns a function
tty: serial: 8250: add MOXA Smartio MUE boards support
serial: 8250: convert drivers to use up_to_u8250p()
serial: 8250/mediatek: fix building with SERIAL_8250=m
serial: 8250/ingenic: fix building with SERIAL_8250=m
serial: 8250/uniphier: fix modular build
Revert "drivers/tty/serial: make 8250/8250_ingenic.c explicitly non-modular"
Revert "drivers/tty/serial: make 8250/8250_mtk.c explicitly non-modular"
serial: mvebu-uart: initial support for Armada-3700 serial port
serial: mctrl_gpio: Add missing module license
serial: ifx6x60: avoid uninitialized variable use
tty/serial: at91: fix bad offset for UART timeout register
tty/serial: at91: restore dynamic driver binding
serial: 8250: Add hardware dependency to RT288X option
TTY, devpts: document pty count limiting
tty: goldfish: support platform_device with id -1
drivers: tty: goldfish: Add device tree bindings
...
Pull read-only kernel memory updates from Ingo Molnar:
"This tree adds two (security related) enhancements to the kernel's
handling of read-only kernel memory:
- extend read-only kernel memory to a new class of formerly writable
kernel data: 'post-init read-only memory' via the __ro_after_init
attribute, and mark the ARM and x86 vDSO as such read-only memory.
This kind of attribute can be used for data that requires a once
per bootup initialization sequence, but is otherwise never modified
after that point.
This feature was based on the work by PaX Team and Brad Spengler.
(by Kees Cook, the ARM vDSO bits by David Brown.)
- make CONFIG_DEBUG_RODATA always enabled on x86 and remove the
Kconfig option. This simplifies the kernel and also signals that
read-only memory is the default model and a first-class citizen.
(Kees Cook)"
* 'mm-readonly-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
ARM/vdso: Mark the vDSO code read-only after init
x86/vdso: Mark the vDSO code read-only after init
lkdtm: Verify that '__ro_after_init' works correctly
arch: Introduce post-init read-only memory
x86/mm: Always enable CONFIG_DEBUG_RODATA and remove the Kconfig option
mm/init: Add 'rodata=off' boot cmdline parameter to disable read-only kernel mappings
asm-generic: Consolidate mark_rodata_ro()
This patch updates all instances of csum_tcpudp_magic and
csum_tcpudp_nofold to reflect the types that are usually used as the source
inputs. For example the protocol field is populated based on nexthdr which
is actually an unsigned 8 bit value. The length is usually populated based
on skb->len which is an unsigned integer.
This addresses an issue in which the IPv6 function csum_ipv6_magic was
generating a checksum using the full 32b of skb->len while
csum_tcpudp_magic was only using the lower 16 bits. As a result we could
run into issues when attempting to adjust the checksum as there was no
protocol agnostic way to update it.
With this change the value is still truncated as many architectures use
"(len + proto) << 8", however this truncation only occurs for values
greater than 16776960 in length and as such is unlikely to occur as we stop
the inner headers at ~64K in size.
I did have to make a few minor changes in the arm, mn10300, nios2, and
score versions of the function in order to support these changes as they
were either using things such as an OR to combine the protocol and length,
or were using ntohs to convert the length which would have truncated the
value.
I also updated a few spots in terms of whitespace and type differences for
the addresses. Most of this was just to make sure all of the definitions
were in sync going forward.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For a long time all architectures implement the pci_dma_* functions using
the generic DMA API, and they all use the same header to do so.
Move this header, pci-dma-compat.h, to include/linux and include it from
the generic pci.h instead of having each arch duplicate this include.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
The two header files got moved to include/linux, and most
users were already converted, this changes the remaining drivers
and removes the files.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Vinod Koul <vinod.koul@intel.com>
Acked-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Yisen Zhuang <yisen.zhuang@huawei.com>
When CONFIG_BUG is disabled, BUG_ON() will only evaluate the condition,
but will not actually stop the current thread. GCC warns about a couple
of BUG_ON() users where this actually leads to further undefined
behavior:
include/linux/ceph/osdmap.h: In function 'ceph_can_shift_osds':
include/linux/ceph/osdmap.h:54:1: warning: control reaches end of non-void function
fs/ext4/inode.c: In function 'ext4_map_blocks':
fs/ext4/inode.c:548:5: warning: 'retval' may be used uninitialized in this function
drivers/mfd/db8500-prcmu.c: In function 'prcmu_config_clkout':
drivers/mfd/db8500-prcmu.c:762:10: warning: 'div_mask' may be used uninitialized in this function
drivers/mfd/db8500-prcmu.c:769:13: warning: 'mask' may be used uninitialized in this function
drivers/mfd/db8500-prcmu.c:757:7: warning: 'bits' may be used uninitialized in this function
drivers/tty/serial/8250/8250_core.c: In function 'univ8250_release_irq':
drivers/tty/serial/8250/8250_core.c:252:18: warning: 'i' may be used uninitialized in this function
drivers/tty/serial/8250/8250_core.c:235:19: note: 'i' was declared here
There is an obvious conflict of interest here: on the one hand, someone
who disables CONFIG_BUG() will want the kernel to be as small as possible
and doesn't care about printing error messages to a console that nobody
looks at. On the other hand, running into a BUG_ON() condition means that
something has gone wrong, and we probably want to also stop doing things
that might cause data corruption.
This patch picks the second choice, and changes the NOP to BUG(), which
normally stops the execution of the current thread in some form (endless
loop or a trap). This follows the logic we applied in a4b5d580e0 ("bug:
Make BUG() always stop the machine").
For ARM multi_v7_defconfig, the size slightly increases:
section CONFIG_BUG=y CONFIG_BUG=n CONFIG_BUG=n+patch
.text 8320248 | 8180944 | 8207688
.rodata 3633720 | 3567144 | 3570648
__bug_table 32508 | --- | ---
__modver 692 | 1584 | 2176
.init.text 558132 | 548300 | 550088
.exit.text 12380 | 12256 | 12380
.data 1016672 | 1016064 | 1016128
Total 14622556 | 14374510 | 14407326
So instead of saving 1.70% of the total image size, we only save 1.48%
by turning off CONFIG_BUG, but in return we can ensure that we don't run
into cases of uninitialized variable or return code uses when something
bad happens. Aside from that, we significantly reduce the number of
warnings in randconfig builds, which makes it easier to fix the warnings
about other problems.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Move the __ARCH_SPIN_LOCK_UNLOCKED definition from qspinlock.h into
qspinlock_types.h.
The definition of __ARCH_SPIN_LOCK_UNLOCKED comes from the build arch's
include files; but on x86 when CONFIG_QUEUED_SPINLOCKS=y, it just
it's defined in asm-generic/qspinlock.h. In most cases, this doesn't
matter because linux/spinlock.h includes asm/spinlock.h, which for x86
includes asm-generic/qspinlock.h. However, any code that only includes
linux/mutex.h will break, because it only includes asm/spinlock_types.h.
For example, this breaks systemtap, which only includes mutex.h.
Signed-off-by: Dan Streetman <dan.streetman@canonical.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Waiman Long <Waiman.Long@hpe.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1455907767-17821-1-git-send-email-dan.streetman@canonical.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
They are not symmetric with each other, neither are used in real world
(can not be found by grep command in source code root directory), so
remove them.
Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Acked-by: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
One of the easiest ways to protect the kernel from attack is to reduce
the internal attack surface exposed when a "write" flaw is available. By
making as much of the kernel read-only as possible, we reduce the
attack surface.
Many things are written to only during __init, and never changed
again. These cannot be made "const" since the compiler will do the wrong
thing (we do actually need to write to them). Instead, move these items
into a memory region that will be made read-only during mark_rodata_ro()
which happens after all kernel __init code has finished.
This introduces __ro_after_init as a way to mark such memory, and adds
some documentation about the existing __read_mostly marking.
This improves the security of the Linux kernel by marking formerly
read-write memory regions as read-only on a fully booted up system.
Based on work by PaX Team and Brad Spengler.
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brad Spengler <spender@grsecurity.net>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Brown <david.brown@linaro.org>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Emese Revfy <re.emese@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mathias Krause <minipli@googlemail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: PaX Team <pageexec@freemail.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kernel-hardening@lists.openwall.com
Cc: linux-arch <linux-arch@vger.kernel.org>
Link: http://lkml.kernel.org/r/1455748879-21872-5-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
- Fix build error on 32-bit with checkpoint restart from Aneesh Kumar
- Fix dedotify for binutils >= 2.26 from Andreas Schwab
- Don't trace hcalls on offline CPUs from Denis Kirjanov
- eeh: Fix stale cached primary bus from Gavin Shan
- eeh: Fix stale PE primary bus from Gavin Shan
- mm: Fix Multi hit ERAT cause by recent THP update from Aneesh Kumar K.V
- ioda: Set "read" permission when "write" is set from Alexey Kardashevskiy
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJWx8l5AAoJEFHr6jzI4aWAeY0P/AomeQCRieoBMKJi36WX4+gU
Cm1iBgM593VEM/KFsYtedm5+4QaCmPE+1tVm4/u0wbLeEQ8TqNLSZLniB9USE0hb
9655gGQyFE95BZa8WfbqHOI7+BK+TkUOWGY0CfyqPVrknzSN2MCDHjUaNo1wge6l
zmIYIkKhaQAinFSFovOdjQ63rYdk6CxsfgbP1Gl2aX0cmzWW1n07AvZLqNmLFJ+4
L3uBXPcrEKY/nfkRi+FutoTb86ggt9J9dqCfJHHfWKn60qwhpKwiva84k3jI/BOu
yBTFeNzlobXt0ceDSWx1ITXzKmJQokWGC5+Lo+0mDb4veAbhLgHlXdx7NUcZIB6+
YGYGSOkeKCnbnInIOGLz45LlevJFviI94y0YY4tt++Csq/IjnBhDeTkGx7zcqRLG
v5hl7AhykHd3Me5iRuyRRVoVyk6+318OZW450Oxxj7EtkzpSeLfHCMKxk5w1Nxuk
tenWQeApdkTVr91m5VJNFOsrmtFDLJv51C8duiUFWzc195ejSMYDR86K+qBeaebs
39obXHVYRnCrn9TzODIR9SnEd7dHImekQ4a3G3F54mLJ4qqUN089TDqBGY2GNuT8
j8QVBttp3SWuZSvtvDJLxoFt2QTKxcuiMQ4FX/OAS4qWRjSR8v2WTCyBZt68l7er
kpUnIelJSuIDVLdNuFlf
=7Yzi
-----END PGP SIGNATURE-----
Merge tag 'powerpc-4.5-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- Fix build error on 32-bit with checkpoint restart from Aneesh Kumar
- Fix dedotify for binutils >= 2.26 from Andreas Schwab
- Don't trace hcalls on offline CPUs from Denis Kirjanov
- eeh: Fix stale cached primary bus from Gavin Shan
- eeh: Fix stale PE primary bus from Gavin Shan
- mm: Fix Multi hit ERAT cause by recent THP update from Aneesh Kumar K.V
- ioda: Set "read" permission when "write" is set from Alexey Kardashevskiy
* tag 'powerpc-4.5-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/ioda: Set "read" permission when "write" is set
powerpc/mm: Fix Multi hit ERAT cause by recent THP update
powerpc/powernv: Fix stale PE primary bus
powerpc/eeh: Fix stale cached primary bus
powerpc/pseries: Don't trace hcalls on offline CPUs
powerpc: Fix dedotify for binutils >= 2.26
powerpc/book3s_32: Fix build error with checkpoint restart
The ARM version of asm/gpio.h basically just contains the same definitions
as the gpiolib version, with the exception of ARCH_NR_GPIOS.
This adds the option for overriding the constant through Kconfig to
the architecture-independent header, so we can remove the ARM specific
file later.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
As discussed earlier, we attempt to enforce protection keys in
software.
However, the code checks all faults to ensure that they are not
violating protection key permissions. It was assumed that all
faults are either write faults where we check PKRU[key].WD (write
disable) or read faults where we check the AD (access disable)
bit.
But, there is a third category of faults for protection keys:
instruction faults. Instruction faults never run afoul of
protection keys because they do not affect instruction fetches.
So, plumb the PF_INSTR bit down in to the
arch_vma_access_permitted() function where we do the protection
key checks.
We also add a new FAULT_FLAG_INSTRUCTION. This is because
handle_mm_fault() is not passed the architecture-specific
error_code where we keep PF_INSTR, so we need to encode the
instruction fetch information in to the arch-generic fault
flags.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20160212210224.96928009@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We try to enforce protection keys in software the same way that we
do in hardware. (See long example below).
But, we only want to do this when accessing our *own* process's
memory. If GDB set PKRU[6].AD=1 (disable access to PKEY 6), then
tried to PTRACE_POKE a target process which just happened to have
some mprotect_pkey(pkey=6) memory, we do *not* want to deny the
debugger access to that memory. PKRU is fundamentally a
thread-local structure and we do not want to enforce it on access
to _another_ thread's data.
This gets especially tricky when we have workqueues or other
delayed-work mechanisms that might run in a random process's context.
We can check that we only enforce pkeys when operating on our *own* mm,
but delayed work gets performed when a random user context is active.
We might end up with a situation where a delayed-work gup fails when
running randomly under its "own" task but succeeds when running under
another process. We want to avoid that.
To avoid that, we use the new GUP flag: FOLL_REMOTE and add a
fault flag: FAULT_FLAG_REMOTE. They indicate that we are
walking an mm which is not guranteed to be the same as
current->mm and should not be subject to protection key
enforcement.
Thanks to Jerome Glisse for pointing out this scenario.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Boaz Harrosh <boaz@plexistor.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dominik Dingel <dingel@linux.vnet.ibm.com>
Cc: Dominik Vogt <vogt@linux.vnet.ibm.com>
Cc: Eric B Munson <emunson@akamai.com>
Cc: Geliang Tang <geliangtang@163.com>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jason Low <jason.low2@hp.com>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Shachar Raindel <raindel@mellanox.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Xie XiuQi <xiexiuqi@huawei.com>
Cc: iommu@lists.linux-foundation.org
Cc: linux-arch@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: linux-s390@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Today, for normal faults and page table walks, we check the VMA
and/or PTE to ensure that it is compatible with the action. For
instance, if we get a write fault on a non-writeable VMA, we
SIGSEGV.
We try to do the same thing for protection keys. Basically, we
try to make sure that if a user does this:
mprotect(ptr, size, PROT_NONE);
*ptr = foo;
they see the same effects with protection keys when they do this:
mprotect(ptr, size, PROT_READ|PROT_WRITE);
set_pkey(ptr, size, 4);
wrpkru(0xffffff3f); // access disable pkey 4
*ptr = foo;
The state to do that checking is in the VMA, but we also
sometimes have to do it on the page tables only, like when doing
a get_user_pages_fast() where we have no VMA.
We add two functions and expose them to generic code:
arch_pte_access_permitted(pte_flags, write)
arch_vma_access_permitted(vma, write)
These are, of course, backed up in x86 arch code with checks
against the PTE or VMA's protection key.
But, there are also cases where we do not want to respect
protection keys. When we ptrace(), for instance, we do not want
to apply the tracer's PKRU permissions to the PTEs from the
process being traced.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Boaz Harrosh <boaz@plexistor.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: David Hildenbrand <dahi@linux.vnet.ibm.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dominik Dingel <dingel@linux.vnet.ibm.com>
Cc: Dominik Vogt <vogt@linux.vnet.ibm.com>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jason Low <jason.low2@hp.com>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Shachar Raindel <raindel@mellanox.com>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: linux-arch@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: linux-s390@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/20160212210219.14D5D715@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Currently __set_fixmap_offset is a macro function which has a local
variable called 'addr'. If a caller passes a 'phys' parameter which is
derived from a variable also called 'addr', the local variable will
shadow this, and the compiler will complain about the use of an
uninitialized variable. To avoid the issue with namespace clashes,
'addr' is prefixed with a liberal sprinkling of underscores.
Turning __set_fixmap_offset into a static inline breaks the build for
several architectures. Fixing this properly requires updates to a number
of architectures to make them agree on the prototype of __set_fixmap (it
could be done as a subsequent patch series).
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
[catalin.marinas@arm.com: squashed the original function patch and macro fixup]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
With ppc64 we use the deposited pgtable_t to store the hash pte slot
information. We should not withdraw the deposited pgtable_t without
marking the pmd none. This ensure that low level hash fault handling
will skip this huge pte and we will handle them at upper levels.
Recent change to pmd splitting changed the above in order to handle the
race between pmd split and exit_mmap. The race is explained below.
Consider following race:
CPU0 CPU1
shrink_page_list()
add_to_swap()
split_huge_page_to_list()
__split_huge_pmd_locked()
pmdp_huge_clear_flush_notify()
// pmd_none() == true
exit_mmap()
unmap_vmas()
zap_pmd_range()
// no action on pmd since pmd_none() == true
pmd_populate()
As result the THP will not be freed. The leak is detected by check_mm():
BUG: Bad rss-counter state mm:ffff880058d2e580 idx:1 val:512
The above required us to not mark pmd none during a pmd split.
The fix for ppc is to clear the huge pte of _PAGE_USER, so that low
level fault handling code skip this pte. At higher level we do take ptl
lock. That should serialze us against the pmd split. Once the lock is
acquired we do check the pmd again using pmd_same. That should always
return false for us and hence we should retry the access. We do the
pmd_same check in all case after taking plt with
THP (do_huge_pmd_wp_page, do_huge_pmd_numa_page and
huge_pmd_set_accessed)
Also make sure we wait for irq disable section in other cpus to finish
before flipping a huge pte entry with a regular pmd entry. Code paths
like find_linux_pte_or_hugepte depend on irq disable to get
a stable pte_t pointer. A parallel thp split need to make sure we
don't convert a pmd pte to a regular pmd entry without waiting for the
irq disable section to finish.
Fixes: eef1b3ba05 ("thp: implement split_huge_pmd()")
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Use a single common table of struct earlycon_id for both command line
and devicetree. Re-define OF_EARLYCON_DECLARE() macro to instance a
unique earlycon declaration (the declaration is only guaranteed to be
unique within a compilation unit; separate compilation units must still
use unique earlycon names).
The semantics of OF_EARLYCON_DECLARE() is different; it declares an
earlycon which can matched either on the command line or by devicetree.
EARLYCON_DECLARE() is semantically unchanged; it declares an earlycon
which is matched by command line only. Remove redundant instances of
EARLYCON_DECLARE().
This enables all earlycons to properly initialize struct console
with the appropriate name and index, which improves diagnostics and
enables direct earlycon-to-console handoff.
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The PCI flag management constants and functions were previously declared in
include/asm-generic/pci-bridge.h. But they are not specific to bridges,
and arches did not include pci-bridge.h consistently.
Move the following interfaces and related constants to include/linux/pci.h
and remove pci-bridge.h:
pci_set_flags()
pci_add_flags()
pci_clear_flags()
pci_has_flag()
This fixes these warnings when building for some arches:
drivers/pci/host/pcie-designware.c:562:20: error: 'PCI_PROBE_ONLY' undeclared (first use in this function)
drivers/pci/host/pcie-designware.c:562:7: error: implicit declaration of function 'pci_has_flag' [-Werror=implicit-function-declaration]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
The datatype __kernel_time_t is u32 on 32bit platform, so its subject to
overflows in the timeval/timespec to cputime conversion.
Currently the following functions are affected:
1. setitimer()
2. timer_create/timer_settime()
3. sys_clock_nanosleep
This can happen on MIPS32 and ARM32 with "Full dynticks CPU time accounting"
enabled, which is required for CONFIG_NO_HZ_FULL.
Enforce u64 conversion to prevent the overflow.
Fixes: 31c1fc8187 ("ARM: Kconfig: allow full nohz CPU accounting")
Signed-off-by: zengtao <prime.zeng@huawei.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Cc: <fweisbec@gmail.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1454384314-154784-1-git-send-email-prime.zeng@huawei.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Merge third patch-bomb from Andrew Morton:
"I'm pretty much done for -rc1 now:
- the rest of MM, basically
- lib/ updates
- checkpatch, epoll, hfs, fatfs, ptrace, coredump, exit
- cpu_mask simplifications
- kexec, rapidio, MAINTAINERS etc, etc.
- more dma-mapping cleanups/simplifications from hch"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (109 commits)
MAINTAINERS: add/fix git URLs for various subsystems
mm: memcontrol: add "sock" to cgroup2 memory.stat
mm: memcontrol: basic memory statistics in cgroup2 memory controller
mm: memcontrol: do not uncharge old page in page cache replacement
Documentation: cgroup: add memory.swap.{current,max} description
mm: free swap cache aggressively if memcg swap is full
mm: vmscan: do not scan anon pages if memcg swap limit is hit
swap.h: move memcg related stuff to the end of the file
mm: memcontrol: replace mem_cgroup_lruvec_online with mem_cgroup_online
mm: vmscan: pass memcg to get_scan_count()
mm: memcontrol: charge swap to cgroup2
mm: memcontrol: clean up alloc, online, offline, free functions
mm: memcontrol: flatten struct cg_proto
mm: memcontrol: rein in the CONFIG space madness
net: drop tcp_memcontrol.c
mm: memcontrol: introduce CONFIG_MEMCG_LEGACY_KMEM
mm: memcontrol: allow to disable kmem accounting for cgroup2
mm: memcontrol: account "kmem" consumers in cgroup2 memory controller
mm: memcontrol: move kmem accounting code to CONFIG_MEMCG
mm: memcontrol: separate kmem code from legacy tcp accounting code
...
The asm-generic tree this time contains one series from Nicolas Pitre
that makes the optimized do_div() implementation from the ARM
architecture available to all architectures. This also adds stricter
type checking for callers of do_div, which has uncovered a number
of bugs in existing code, and fixes up the ones we have found.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIVAwUAVqARKWCrR//JCVInAQJrBhAAlwZL0IiVGFfDXWtvQGOm+yC5j4vdIhMf
1scsvRbk3ln1xUk5+NM61NpxbQotro78K5HxFZFhaVGUTbbFXM9w2VZSyI8ZaGAJ
Od6lBUUyLQmzlbHDJ3v/zrZn8Up7qZlRApmXcbUVDtssfnEfKk4xA2RG9JwIMS1c
uZMvnD7N3P9vxDPl+CsYlB2osi6Yks3VQ1tXYe2z6siO+H67zHaF08+ls7fbsd3d
oyKjZqlaQ02MIOr+AdR0h9iKyJJ6SXT0DQlsMyzB6aBWmeBCNLNALNIiukDk9Qc1
VV3sF1MOS3LtfU2TeOx4Na7hcd2iC6WYLb271iApO2Ww7t16n+de3i6AipZxLUJ0
08jiRlisTzUhXDobRSqI3mcQlxrB5UGfyblab2z/MqGGmIGJSPPRdTPRQUgi0ZKg
jksSmsaPwOQp64FhTgECLJthlYX7h6ULjkvJ9h60gZHa4jhGZbGPeMwHPf1uSm95
EvQE971Ssgm4jwhvxZ/kt1ruuZI/fxxG1Qfw+C25QkXZGKye2nB+icLWeMwz+FXG
HLqkmaAjasf5MAV1GiK8U6zoC6bCOLU0Lea83hOwRPZ999v3Nym1giSatNv4/pB+
QmkXRvFi93cdQ643l7xcUEDT2zpk4pogF3xREiBhyaXtqLlT7pPMKsBQOgdWvFuu
Ou0ZbEAwIVo=
=4psa
-----END PGP SIGNATURE-----
Merge tag 'asm-generic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic
Pull asm-generic updates from Arnd Bergmann:
"The asm-generic tree this time contains one series from Nicolas Pitre
that makes the optimized do_div() implementation from the ARM
architecture available to all architectures.
This also adds stricter type checking for callers of do_div, which has
uncovered a number of bugs in existing code, and fixes up the ones we
have found"
* tag 'asm-generic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
ARM: asm/div64.h: adjust to generic codde
__div64_32(): make it overridable at compile time
__div64_const32(): abstract out the actual 128-bit cross product code
do_div(): generic optimization for constant divisor on 32-bit machines
div64.h: optimize do_div() for power-of-two constant divisors
mtd/sm_ftl.c: fix wrong do_div() usage
drm/mgag200/mgag200_mode.c: fix wrong do_div() usage
hid-sensor-hub.c: fix wrong do_div() usage
ti/fapll: fix wrong do_div() usage
ti/clkt_dpll: fix wrong do_div() usage
tegra/clk-divider: fix wrong do_div() usage
imx/clk-pllv2: fix wrong do_div() usage
imx/clk-pllv1: fix wrong do_div() usage
nouveau/nvkm/subdev/clk/gk20a.c: fix wrong do_div() usage
Move the generic implementation to <linux/dma-mapping.h> now that all
architectures support it and remove the HAVE_DMA_ATTR Kconfig symbol now
that everyone supports them.
[valentinrothberg@gmail.com: remove leftovers in Kconfig]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Cc: Helge Deller <deller@gmx.de>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
Cc: Ley Foon Tan <lftan@altera.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Steven Miao <realmz6@gmail.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Valentin Rothberg <valentinrothberg@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This adds a new kind of barrier, and reworks virtio and xen
to use it.
Plus some fixes here and there.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJWlU2kAAoJECgfDbjSjVRpZ6IH/Ra19ecG8sCQo9zskr4zo22Z
DZXC3u0sJDBYjjBAiw3IY1FKh7wx2Fr1RhUOj1bteBgcFCMCV1zInP5ITiCyzd1H
YYh1w9C2tZaj2T4t9L4hIrAdtIF8fGS+oI2IojXPjOuDLEt6pfFBEjHp/sfl3UJq
ZmZvw4OXviSNej7jBw8Xni3Uv18yfmLGXvMdkvMSPC1/XL29voGDqTVwhqJwxLVz
k/ZLcKFOzIs9N7Nja0Jl1EiZtC2Y9cpItqweicNAzszlpkSL44vQxmCSefB+WyQ4
gt0O3+AxYkLfrxzCBhUA4IpRex3/XPW1b+1e/V1XjfR2n/FlyLe+AIa8uPJElFc=
=ukaV
-----END PGP SIGNATURE-----
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio barrier rework+fixes from Michael Tsirkin:
"This adds a new kind of barrier, and reworks virtio and xen to use it.
Plus some fixes here and there"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (44 commits)
checkpatch: add virt barriers
checkpatch: check for __smp outside barrier.h
checkpatch.pl: add missing memory barriers
virtio: make find_vqs() checkpatch.pl-friendly
virtio_balloon: fix race between migration and ballooning
virtio_balloon: fix race by fill and leak
s390: more efficient smp barriers
s390: use generic memory barriers
xen/events: use virt_xxx barriers
xen/io: use virt_xxx barriers
xenbus: use virt_xxx barriers
virtio_ring: use virt_store_mb
sh: move xchg_cmpxchg to a header by itself
sh: support 1 and 2 byte xchg
virtio_ring: update weak barriers to use virt_xxx
Revert "virtio_ring: Update weak barriers to use dma_wmb/rmb"
asm-generic: implement virt_xxx memory barriers
x86: define __smp_xxx
xtensa: define __smp_xxx
tile: define __smp_xxx
...
Add a helper to check if an object (given an address and a size) is part
of a section (given beginning and end addresses). For convenience, also
provide a helper that performs this check for __init data using the
__init_begin and __init_end limits.
Signed-off-by: Thierry Reding <treding@nvidia.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Similar to the conversion of vm_insert_mixed() use pfn_t in the
vmf_insert_pfn_pmd() to tag the resulting pte with _PAGE_DEVICE when the
pfn is backed by a devm_memremap_pages() mapping.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
__phys_to_pfn and __pfn_to_phys are symmetric, PHYS_PFN and PFN_PHYS are
semmetric:
- y = (phys_addr_t)x << PAGE_SHIFT
- y >> PAGE_SHIFT = (phys_add_t)x
- (unsigned long)(y >> PAGE_SHIFT) = x
[akpm@linux-foundation.org: use macro arg name `x']
[arnd@arndb.de: include linux/pfn.h for PHYS_PFN definition]
Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Guests running within virtual machines might be affected by SMP effects even if
the guest itself is compiled without SMP support. This is an artifact of
interfacing with an SMP host while running an UP kernel. Using mandatory
barriers for this use-case would be possible but is often suboptimal.
In particular, virtio uses a bunch of confusing ifdefs to work around
this, while xen just uses the mandatory barriers.
To better handle this case, low-level virt_mb() etc macros are made available.
These are implemented trivially using the low-level __smp_xxx macros,
the purpose of these wrappers is to annotate those specific cases.
These have the same effect as smp_mb() etc when SMP is enabled, but generate
identical code for SMP and non-SMP systems. For example, virtual machine guests
should use virt_mb() rather than smp_mb() when synchronizing against a
(possibly SMP) host.
Suggested-by: David Miller <davem@davemloft.net>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
On !SMP, most architectures define their
barriers as compiler barriers.
On SMP, most need an actual barrier.
Make it possible to remove the code duplication for
!SMP by defining low-level __smp_xxx barriers
which do not depend on the value of SMP, then
use them from asm-generic conditionally.
Besides reducing code duplication, these low level APIs will also be
useful for virtualization, where a barrier is sometimes needed even if
!SMP since we might be talking to another kernel on the same SMP system.
Both virtio and Xen drivers will benefit.
The smp_xxx variants should use __smp_XXX ones or barrier() depending on
SMP, identically for all architectures.
We keep ifndef guards around them for now - once/if all
architectures are converted to use the generic
code, we'll be able to remove these.
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Allow architectures to override smp_store_release
and smp_load_acquire by guarding the defines
in asm-generic/barrier.h with ifndef directives.
This is in preparation to reusing asm-generic/barrier.h
on architectures which have their own definition
of these macros.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
With commit b92b8b35a2 ("locking/arch: Rename set_mb() to smp_store_mb()")
it was made clear that the context of this call (and thus set_mb)
is strictly for CPU ordering, as opposed to IO. As such all archs
should use the smp variant of mb(), respecting the semantics and
saving a mandatory barrier on UP.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <linux-arch@vger.kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: dave@stgolabs.net
Link: http://lkml.kernel.org/r/1445975631-17047-3-git-send-email-dave@stgolabs.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Pull x86 mm updates from Ingo Molnar:
"The main changes in this cycle were:
- make the debugfs 'kernel_page_tables' file read-only, as it only
has read ops. (Borislav Petkov)
- micro-optimize clflush_cache_range() (Chris Wilson)
- swiotlb enhancements, which fixes certain KVM emulated devices
(Igor Mammedov)
- fix an LDT related debug message (Jan Beulich)
- modularize CONFIG_X86_PTDUMP (Kees Cook)
- tone down an overly alarming warning (Laura Abbott)
- Mark variable __initdata (Rasmus Villemoes)
- PAT additions (Toshi Kani)"
* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm: Micro-optimise clflush_cache_range()
x86/mm/pat: Change free_memtype() to support shrinking case
x86/mm/pat: Add untrack_pfn_moved for mremap
x86/mm: Drop WARN from multi-BAR check
x86/LDT: Print the real LDT base address
x86/mm/64: Enable SWIOTLB if system has SRAT memory regions above MAX_DMA32_PFN
x86/mm: Introduce max_possible_pfn
x86/mm/ptdump: Make (debugfs)/kernel_page_tables read-only
x86/mm/mtrr: Mark the 'range_new' static variable in mtrr_calc_range_state() as __initdata
x86/mm: Turn CONFIG_X86_PTDUMP into a module
Pull locking updates from Ingo Molnar:
"So we have a laundry list of locking subsystem changes:
- continuing barrier API and code improvements
- futex enhancements
- atomics API improvements
- pvqspinlock enhancements: in particular lock stealing and adaptive
spinning
- qspinlock micro-enhancements"
* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
futex: Allow FUTEX_CLOCK_REALTIME with FUTEX_WAIT op
futex: Cleanup the goto confusion in requeue_pi()
futex: Remove pointless put_pi_state calls in requeue()
futex: Document pi_state refcounting in requeue code
futex: Rename free_pi_state() to put_pi_state()
futex: Drop refcount if requeue_pi() acquired the rtmutex
locking/barriers, arch: Remove ambiguous statement in the smp_store_mb() documentation
lcoking/barriers, arch: Use smp barriers in smp_store_release()
locking/cmpxchg, arch: Remove tas() definitions
locking/pvqspinlock: Queue node adaptive spinning
locking/pvqspinlock: Allow limited lock stealing
locking/pvqspinlock: Collect slowpath lock statistics
sched/core, locking: Document Program-Order guarantees
locking, sched: Introduce smp_cond_acquire() and use it
locking/pvqspinlock, x86: Optimize the PV unlock code path
locking/qspinlock: Avoid redundant read of next pointer
locking/qspinlock: Prefetch the next node cacheline
locking/qspinlock: Use _acquire/_release() versions of cmpxchg() & xchg()
atomics: Add test for atomic operations with _relaxed variants
mremap() with MREMAP_FIXED on a VM_PFNMAP range causes the following
WARN_ON_ONCE() message in untrack_pfn().
WARNING: CPU: 1 PID: 3493 at arch/x86/mm/pat.c:985 untrack_pfn+0xbd/0xd0()
Call Trace:
[<ffffffff817729ea>] dump_stack+0x45/0x57
[<ffffffff8109e4b6>] warn_slowpath_common+0x86/0xc0
[<ffffffff8109e5ea>] warn_slowpath_null+0x1a/0x20
[<ffffffff8106a88d>] untrack_pfn+0xbd/0xd0
[<ffffffff811d2d5e>] unmap_single_vma+0x80e/0x860
[<ffffffff811d3725>] unmap_vmas+0x55/0xb0
[<ffffffff811d916c>] unmap_region+0xac/0x120
[<ffffffff811db86a>] do_munmap+0x28a/0x460
[<ffffffff811dec33>] move_vma+0x1b3/0x2e0
[<ffffffff811df113>] SyS_mremap+0x3b3/0x510
[<ffffffff817793ee>] entry_SYSCALL_64_fastpath+0x12/0x71
MREMAP_FIXED moves a pfnmap from old vma to new vma. untrack_pfn() is
called with the old vma after its pfnmap page table has been removed,
which causes follow_phys() to fail. The new vma has a new pfnmap to
the same pfn & cache type with VM_PAT set. Therefore, we only need to
clear VM_PAT from the old vma in this case.
Add untrack_pfn_moved(), which clears VM_PAT from a given old vma.
move_vma() is changed to call this function with the old vma when
VM_PFNMAP is set. move_vma() then calls do_munmap(), and untrack_pfn()
is a no-op since VM_PAT is cleared.
Reported-by: Stas Sergeev <stsp@list.ru>
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/1450832064-10093-2-git-send-email-toshi.kani@hpe.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
With commit b92b8b35a2 ("locking/arch: Rename set_mb() to smp_store_mb()")
it was made clear that the context of this call (and thus set_mb)
is strictly for CPU ordering, as opposed to IO. As such all archs
should use the smp variant of mb(), respecting the semantics and
saving a mandatory barrier on UP.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <linux-arch@vger.kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: dave@stgolabs.net
Link: http://lkml.kernel.org/r/1445975631-17047-3-git-send-email-dave@stgolabs.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This patch replaces the cmpxchg() and xchg() calls in the native
qspinlock code with the more relaxed _acquire or _release versions of
those calls to enable other architectures to adopt queued spinlocks
with less memory barrier performance overhead.
Signed-off-by: Waiman Long <Waiman.Long@hpe.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Douglas Hatch <doug.hatch@hpe.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Scott J Norton <scott.norton@hpe.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1447114167-47185-2-git-send-email-Waiman.Long@hpe.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
There were still a number of references to my old Red Hat email
address in the kernel source. Remove these while keeping the
Red Hat copyright notices intact.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Some architectures may want to override the default implementation
at compile time to do things inline. For example, ARM uses a
non-standard calling convention for better efficiency in this case.
Signed-off-by: Nicolas Pitre <nico@linaro.org>
The default C implementation for the 128-bit cross product is abstracted
into the __arch_xprod_64() macro that can be overridden to let
architectures provide their own assembly optimized implementation.
There are many advantages to an assembly version for this operation.
Carry bit handling becomes trivial, and 32-bit shifts may be achieved
simply by inverting register pairs on some architectures. This has the
potential to be quite faster and use much fewer instructions.
Signed-off-by: Nicolas Pitre <nico@linaro.org>
64-by-32-bit divisions are prominent in the kernel, even on 32-bit
machines. Luckily, many of them use a constant divisor that allows
for a much faster multiplication by the divisor's reciprocal.
The compiler already performs this optimization when compiling a 32-by-32
division with a constant divisor. Unfortunately, on 32-bit machines, gcc
does not optimize 64-by-32 divisions in that case, except for constant
divisors that happen to be a power of 2.
Let's avoid the slow path whenever the divisor is constant by manually
computing the reciprocal ourselves and performing the multiplication
inline. In most cases, this improves performance of 64-by-32 divisions
by about two orders of magnitude compared to the __div64_32() fallback,
especially on architectures lacking a native div instruction.
The algorithm used here comes from the existing ARM code.
The __div64_const32_is_OK macro can be predefined by architectures to
disable this optimization in some cases. For example, some ancient gcc
version on ARM would crash with an ICE when fed this code.
Signed-off-by: Nicolas Pitre <nico@linaro.org>
Acked-by: Alexey Brodkin <abrodkin@synopsys.com>
Let's perform the obvious mask and shift operation in this case.
On 32-bit targets, gcc is able to do the same thing with a constant
divisor that happens to be a power of two i.e. it turns the division
into an inline shift, but it doesn't hurt to be explicit.
Signed-off-by: Nicolas Pitre <nico@linaro.org>
The asm-generic changes for 4.4 are mostly a series from Christoph Hellwig
to clean up various abuses of headers in there. The patch to rename the
io-64-nonatomic-*.h headers caused some conflicts with new users, so I
added a workaround that we can remove in the next merge window.
The only other patch is a warning fix from Marek Vasut
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIVAwUAVjzaf2CrR//JCVInAQImmhAA20fZ91sUlnA5skKNPT1phhF6Z7UF2Sx5
nPKcHQD3HA3lT1OKfPBYvCo+loYflvXFLaQThVylVcnE/8ecAEMtft4nnGW2nXvh
sZqHIZ8fszTB53cynAZKTjdobD1wu33Rq7XRzg0ugn1mdxFkOzCHW/xDRvWRR5TL
rdQjzzgvn2PNlqFfHlh6cZ5ykShM36AIKs3WGA0H0Y/aYsE9GmDOAUp41q1mLXnA
4lKQaIxoeOa+kmlsUB0wEHUecWWWJH4GAP+CtdKzTX9v12bGNhmiKUMCETG78BT3
uL8irSqaViNwSAS9tBxSpqvmVUsa5aCA5M3MYiO+fH9ifd7wbR65g/wq39D3Pc01
KnZ3BNVRW5XSA3c86pr8vbg/HOynUXK8TN0lzt6rEk8bjoPBNEDy5YWzy0t6reVe
wX65F+ver8upjOKe9yl2Jsg+5Kcmy79GyYjLUY3TU2mZ+dIdScy/jIWatXe/OTKZ
iB4Ctc4MDe9GDECmlPOWf98AXqsBUuKQiWKCN/OPxLtFOeWBvi4IzvFuO8QvnL9p
jZcRDmIlIWAcDX/2wMnLjV+Hqi3EeReIrYznxTGnO7HHVInF555GP51vFaG5k+SN
smJQAB0/sostmC1OCCqBKq5b6/li95/No7+0v0SUhJJ5o76AR5CcNsnolXesw1fu
vTUkB/I66Hk=
=dQKG
-----END PGP SIGNATURE-----
Merge tag 'asm-generic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic
Pull asm-generic cleanups from Arnd Bergmann:
"The asm-generic changes for 4.4 are mostly a series from Christoph
Hellwig to clean up various abuses of headers in there. The patch to
rename the io-64-nonatomic-*.h headers caused some conflicts with new
users, so I added a workaround that we can remove in the next merge
window.
The only other patch is a warning fix from Marek Vasut"
* tag 'asm-generic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
asm-generic: temporarily add back asm-generic/io-64-nonatomic*.h
asm-generic: cmpxchg: avoid warnings from macro-ized cmpxchg() implementations
gpio-mxc: stop including <asm-generic/bug>
n_tracesink: stop including <asm-generic/bug>
n_tracerouter: stop including <asm-generic/bug>
mlx5: stop including <asm-generic/kmap_types.h>
hifn_795x: stop including <asm-generic/kmap_types.h>
drbd: stop including <asm-generic/kmap_types.h>
move count_zeroes.h out of asm-generic
move io-64-nonatomic*.h out of asm-generic
- ACPICA update to upstream revision 20150930 (Bob Moore, Lv Zheng).
The most significant change is to allow the AML debugger to be
built into the kernel. On top of that there is an update related
to the NFIT table (the ACPI persistent memory interface)
and a few fixes and cleanups.
- ACPI CPPC2 (Collaborative Processor Performance Control v2)
support along with a cpufreq frontend (Ashwin Chaugule).
This can only be enabled on ARM64 at this point.
- New ACPI infrastructure for the early probing of IRQ chips and
clock sources (Marc Zyngier).
- Support for a new hierarchical properties extension of the ACPI
_DSD (Device Specific Data) device configuration object allowing
the kernel to handle hierarchical properties (provided by the
platform firmware this way) automatically and make them available
to device drivers via the generic device properties interface
(Rafael Wysocki).
- Generic device properties API extension to obtain an index of
certain string value in an array of strings, along the lines of
of_property_match_string(), but working for all of the supported
firmware node types, and support for the "dma-names" device
property based on it (Mika Westerberg).
- ACPI core fix to parse the MADT (Multiple APIC Description Table)
entries in the order expected by platform firmware (and mandated
by the specification) to avoid confusion on systems with more than
255 logical CPUs (Lukasz Anaczkowski).
- Consolidation of the ACPI-based handling of PCI host bridges
on x86 and ia64 (Jiang Liu).
- ACPI core fixes to ensure that the correct IRQ number is used to
represent the SCI (System Control Interrupt) in the cases when
it has been re-mapped (Chen Yu).
- New ACPI backlight quirk for Lenovo IdeaPad S405 (Hans de Goede).
- ACPI EC driver fixes (Lv Zheng).
- Assorted ACPI fixes and cleanups (Dan Carpenter, Insu Yun, Jiri
Kosina, Rami Rosen, Rasmus Villemoes).
- New mechanism in the PM core allowing drivers to check if the
platform firmware is going to be involved in the upcoming system
suspend or if it has been involved in the suspend the system is
resuming from at the moment (Rafael Wysocki).
This should allow drivers to optimize their suspend/resume
handling in some cases and the changes include a couple of users
of it (the i8042 input driver, PCI PM).
- PCI PM fix to prevent runtime-suspended devices with PME enabled
from being resumed during system suspend even if they aren't
configured to wake up the system from sleep (Rafael Wysocki).
- New mechanism to report the number of a wakeup IRQ that woke up
the system from sleep last time (Alexandra Yates).
- Removal of unused interfaces from the generic power domains
framework and fixes related to latency measurements in that
code (Ulf Hansson, Daniel Lezcano).
- cpufreq core sysfs interface rework to make it handle CPUs that
share performance scaling settings (represented by a common
cpufreq policy object) more symmetrically (Viresh Kumar).
This should help to simplify the CPU offline/online handling among
other things.
- cpufreq core fixes and cleanups (Viresh Kumar).
- intel_pstate fixes related to the Turbo Activation Ratio (TAR)
mechanism on client platforms which causes the turbo P-states
range to vary depending on platform firmware settings (Srinivas
Pandruvada).
- intel_pstate sysfs interface fix (Prarit Bhargava).
- Assorted cpufreq driver (imx, tegra20, powernv, integrator) fixes
and cleanups (Bai Ping, Bartlomiej Zolnierkiewicz, Shilpasri G
Bhat, Luis de Bethencourt).
- cpuidle mvebu driver cleanups (Russell King).
- OPP (Operating Performance Points) framework code reorganization
to make it more maintainable (Viresh Kumar).
- Intel Broxton support for the RAPL (Running Average Power Limits)
power capping driver (Amy Wiles).
- Assorted power management code fixes and cleanups (Dan Carpenter,
Geert Uytterhoeven, Geliang Tang, Luis de Bethencourt, Rasmus
Villemoes).
/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABCAAGBQJWOC9oAAoJEILEb/54YlRx/c8P/joflwoFsISwJccG62YTQMuc
bMQKM4Kw0vl5La8+pkLpe5t6+mW7l81UFtYF6Dzd8LOKlD9sszD34z1lHmCeT/oR
wn0uZpHagRyLMUfoyiEtlU/VRU6WQNNtS3EgjwUi7xgFz9Q0pjcCZ9OQ6vKov1j5
+6j40ODif5sgo+2vl+rztJiV0SIMkYdkgNqgfN1FE9bdLA2Zkk+PxxJbtGQORuDu
O/K+XhQT2xWquVWi/1p+VtQxs5glBS1oKm0kogV5bElCvNTRNIVABUNcjogITQwo
QSAKgoCKIoaIl5jtDT6u5dc0y67q/dMtqOY9fOCcOz1Z7jbWQzR8D7mpFWIsJUPK
K2LClI3t85ynpN6Jref246A6+C9nwB8JMAiAR11oBw7WbBlkd6tbRgcT5B+iz8UE
FuCCif7pha/Fs+Jt1YRazscIqteQ2bAhhxikuIPMfw2M6M67MNfVNeKA1bAoSM34
dH7JsilblitvV7shrwJHwXPXCOF2jEPoK8I4/q2+TR5qUxEpRJjelQxXGSaQScMZ
iNnjeTgv8H8q+rY5Yjzsl4pxP0Fvf7IuqkptWOJbgepg4cQc9pS87wOpY3uEeQzr
H7ruaQJFCnLO4aXbPNClsiJARhrBk+qMlsh4vBEyCJ2T0ucb+nIUcN4BTi8t85yl
X97BfHHUiDoUrnIsNids
=1gaH
-----END PGP SIGNATURE-----
Merge tag 'pm+acpi-4.4-rc1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management and ACPI updates from Rafael Wysocki:
"Quite a new features are included this time.
First off, the Collaborative Processor Performance Control interface
(version 2) defined by ACPI will now be supported on ARM64 along with
a cpufreq frontend for CPU performance scaling.
Second, ACPI gets a new infrastructure for the early probing of IRQ
chips and clock sources (along the lines of the existing similar
mechanism for DT).
Next, the ACPI core and the generic device properties API will now
support a recently introduced hierarchical properties extension of the
_DSD (Device Specific Data) ACPI device configuration object. If the
ACPI platform firmware uses that extension to organize device
properties in a hierarchical way, the kernel will automatically handle
it and make those properties available to device drivers via the
generic device properties API.
It also will be possible to build the ACPICA's AML interpreter
debugger into the kernel now and use that to diagnose AML-related
problems more efficiently. In the future, this should make it
possible to single-step AML execution and do similar things.
Interesting stuff, although somewhat experimental at this point.
Finally, the PM core gets a new mechanism that can be used by device
drivers to distinguish between suspend-to-RAM (based on platform
firmware support) and suspend-to-idle (or other variants of system
suspend the platform firmware is not involved in) and possibly
optimize their device suspend/resume handling accordingly.
In addition to that, some existing features are re-organized quite
substantially.
First, the ACPI-based handling of PCI host bridges on x86 and ia64 is
unified and the common code goes into the ACPI core (so as to reduce
code duplication and eliminate non-essential differences between the
two architectures in that area).
Second, the Operating Performance Points (OPP) framework is
reorganized to make the code easier to find and follow.
Next, the cpufreq core's sysfs interface is reorganized to get rid of
the "primary CPU" concept for configurations in which the same
performance scaling settings are shared between multiple CPUs.
Finally, some interfaces that aren't necessary any more are dropped
from the generic power domains framework.
On top of the above we have some minor extensions, cleanups and bug
fixes in multiple places, as usual.
Specifics:
- ACPICA update to upstream revision 20150930 (Bob Moore, Lv Zheng).
The most significant change is to allow the AML debugger to be
built into the kernel. On top of that there is an update related
to the NFIT table (the ACPI persistent memory interface) and a few
fixes and cleanups.
- ACPI CPPC2 (Collaborative Processor Performance Control v2) support
along with a cpufreq frontend (Ashwin Chaugule).
This can only be enabled on ARM64 at this point.
- New ACPI infrastructure for the early probing of IRQ chips and
clock sources (Marc Zyngier).
- Support for a new hierarchical properties extension of the ACPI
_DSD (Device Specific Data) device configuration object allowing
the kernel to handle hierarchical properties (provided by the
platform firmware this way) automatically and make them available
to device drivers via the generic device properties interface
(Rafael Wysocki).
- Generic device properties API extension to obtain an index of
certain string value in an array of strings, along the lines of
of_property_match_string(), but working for all of the supported
firmware node types, and support for the "dma-names" device
property based on it (Mika Westerberg).
- ACPI core fix to parse the MADT (Multiple APIC Description Table)
entries in the order expected by platform firmware (and mandated by
the specification) to avoid confusion on systems with more than 255
logical CPUs (Lukasz Anaczkowski).
- Consolidation of the ACPI-based handling of PCI host bridges on x86
and ia64 (Jiang Liu).
- ACPI core fixes to ensure that the correct IRQ number is used to
represent the SCI (System Control Interrupt) in the cases when it
has been re-mapped (Chen Yu).
- New ACPI backlight quirk for Lenovo IdeaPad S405 (Hans de Goede).
- ACPI EC driver fixes (Lv Zheng).
- Assorted ACPI fixes and cleanups (Dan Carpenter, Insu Yun, Jiri
Kosina, Rami Rosen, Rasmus Villemoes).
- New mechanism in the PM core allowing drivers to check if the
platform firmware is going to be involved in the upcoming system
suspend or if it has been involved in the suspend the system is
resuming from at the moment (Rafael Wysocki).
This should allow drivers to optimize their suspend/resume handling
in some cases and the changes include a couple of users of it (the
i8042 input driver, PCI PM).
- PCI PM fix to prevent runtime-suspended devices with PME enabled
from being resumed during system suspend even if they aren't
configured to wake up the system from sleep (Rafael Wysocki).
- New mechanism to report the number of a wakeup IRQ that woke up the
system from sleep last time (Alexandra Yates).
- Removal of unused interfaces from the generic power domains
framework and fixes related to latency measurements in that code
(Ulf Hansson, Daniel Lezcano).
- cpufreq core sysfs interface rework to make it handle CPUs that
share performance scaling settings (represented by a common cpufreq
policy object) more symmetrically (Viresh Kumar).
This should help to simplify the CPU offline/online handling among
other things.
- cpufreq core fixes and cleanups (Viresh Kumar).
- intel_pstate fixes related to the Turbo Activation Ratio (TAR)
mechanism on client platforms which causes the turbo P-states range
to vary depending on platform firmware settings (Srinivas
Pandruvada).
- intel_pstate sysfs interface fix (Prarit Bhargava).
- Assorted cpufreq driver (imx, tegra20, powernv, integrator) fixes
and cleanups (Bai Ping, Bartlomiej Zolnierkiewicz, Shilpasri G
Bhat, Luis de Bethencourt).
- cpuidle mvebu driver cleanups (Russell King).
- OPP (Operating Performance Points) framework code reorganization to
make it more maintainable (Viresh Kumar).
- Intel Broxton support for the RAPL (Running Average Power Limits)
power capping driver (Amy Wiles).
- Assorted power management code fixes and cleanups (Dan Carpenter,
Geert Uytterhoeven, Geliang Tang, Luis de Bethencourt, Rasmus
Villemoes)"
* tag 'pm+acpi-4.4-rc1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (108 commits)
cpufreq: postfix policy directory with the first CPU in related_cpus
cpufreq: create cpu/cpufreq/policyX directories
cpufreq: remove cpufreq_sysfs_{create|remove}_file()
cpufreq: create cpu/cpufreq at boot time
cpufreq: Use cpumask_copy instead of cpumask_or to copy a mask
cpufreq: ondemand: Drop unnecessary locks from update_sampling_rate()
PM / Domains: Merge measurements for PM QoS device latencies
PM / Domains: Don't measure ->start|stop() latency in system PM callbacks
PM / clk: Fix broken build due to non-matching code and header #ifdefs
ACPI / Documentation: add copy_dsdt to ACPI format options
ACPI / sysfs: correctly check failing memory allocation
ACPI / video: Add a quirk to force native backlight on Lenovo IdeaPad S405
ACPI / CPPC: Fix potential memory leak
ACPI / CPPC: signedness bug in register_pcc_channel()
ACPI / PAD: power_saving_thread() is not freezable
ACPI / PM: Fix incorrect wakeup IRQ setting during suspend-to-idle
ACPI: Using correct irq when waiting for events
ACPI: Use correct IRQ when uninstalling ACPI interrupt handler
cpuidle: mvebu: disable the bind/unbind attributes and use builtin_platform_driver
cpuidle: mvebu: clean up multiple platform drivers
...
Pull s390 updates from Martin Schwidefsky:
"There is only one new feature in this pull for the 4.4 merge window,
most of it is small enhancements, cleanup and bug fixes:
- Add the s390 backend for the software dirty bit tracking. This
adds two new pgtable functions pte_clear_soft_dirty and
pmd_clear_soft_dirty which is why there is a hit to
arch/x86/include/asm/pgtable.h in this pull request.
- A series of cleanup patches for the AP bus, this includes the
removal of the support for two outdated crypto cards (PCICC and
PCICA).
- The irq handling / signaling on buffer full in the runtime
instrumentation code is dropped.
- Some micro optimizations: remove unnecessary memory barriers for a
couple of functions: [smb_]rmb, [smb_]wmb, atomics, bitops, and for
spin_unlock. Use the builtin bswap if available and make
test_and_set_bit_lock more cache friendly.
- Statistics and a tracepoint for the diagnose calls to the
hypervisor.
- The CPU measurement facility support to sample KVM guests is
improved.
- The vector instructions are now always enabled for user space
processes if the hardware has the vector facility. This simplifies
the FPU handling code. The fpu-internal.h header is split into fpu
internals, api and types just like x86.
- Cleanup and improvements for the common I/O layer.
- Rework udelay to solve a problem with kprobe. udelay has busy loop
semantics but still uses an idle processor state for the wait"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (66 commits)
s390: remove runtime instrumentation interrupts
s390/cio: de-duplicate subchannel validation
s390/css: unneeded initialization in for_each_subchannel
s390/Kconfig: use builtin bswap
s390/dasd: fix disconnected device with valid path mask
s390/dasd: fix invalid PAV assignment after suspend/resume
s390/dasd: fix double free in dasd_eckd_read_conf
s390/kernel: fix ptrace peek/poke for floating point registers
s390/cio: move ccw_device_stlck functions
s390/cio: move ccw_device_call_handler
s390/topology: reduce per_cpu() invocations
s390/nmi: reduce size of percpu variable
s390/nmi: fix terminology
s390/nmi: remove casts
s390/nmi: remove pointless error strings
s390: don't store registers on disabled wait anymore
s390: get rid of __set_psw_mask()
s390/fpu: split fpu-internal.h into fpu internals, api, and type headers
s390/dasd: fix list_del corruption after lcu changes
s390/spinlock: remove unneeded serializations at unlock
...
Pull scheduler changes from Ingo Molnar:
"The main changes in this cycle were:
- sched/fair load tracking fixes and cleanups (Byungchul Park)
- Make load tracking frequency scale invariant (Dietmar Eggemann)
- sched/deadline updates (Juri Lelli)
- stop machine fixes, cleanups and enhancements for bugs triggered by
CPU hotplug stress testing (Oleg Nesterov)
- scheduler preemption code rework: remove PREEMPT_ACTIVE and related
cleanups (Peter Zijlstra)
- Rework the sched_info::run_delay code to fix races (Peter Zijlstra)
- Optimize per entity utilization tracking (Peter Zijlstra)
- ... misc other fixes, cleanups and smaller updates"
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (57 commits)
sched: Don't scan all-offline ->cpus_allowed twice if !CONFIG_CPUSETS
sched: Move cpu_active() tests from stop_two_cpus() into migrate_swap_stop()
sched: Start stopper early
stop_machine: Kill cpu_stop_threads->setup() and cpu_stop_unpark()
stop_machine: Kill smp_hotplug_thread->pre_unpark, introduce stop_machine_unpark()
stop_machine: Change cpu_stop_queue_two_works() to rely on stopper->enabled
stop_machine: Introduce __cpu_stop_queue_work() and cpu_stop_queue_two_works()
stop_machine: Ensure that a queued callback will be called before cpu_stop_park()
sched/x86: Fix typo in __switch_to() comments
sched/core: Remove a parameter in the migrate_task_rq() function
sched/core: Drop unlikely behind BUG_ON()
sched/core: Fix task and run queue sched_info::run_delay inconsistencies
sched/numa: Fix task_tick_fair() from disabling numa_balancing
sched/core: Add preempt_count invariant check
sched/core: More notrace annotations
sched/core: Kill PREEMPT_ACTIVE
sched/core, sched/x86: Kill thread_info::saved_preempt_count
sched/core: Simplify preempt_count tests
sched/core: Robustify preemption leak checks
sched/core: Stop setting PREEMPT_ACTIVE
...
This seems to be a mis-reading of how alpha memory ordering works, and
is not backed up by the alpha architecture manual. The helper functions
don't do anything special on any other architectures, and the arguments
that support them being safe on other architectures also argue that they
are safe on alpha.
Basically, the "control dependency" is between a previous read and a
subsequent write that is dependent on the value read. Even if the
subsequent write is actually done speculatively, there is no way that
such a speculative write could be made visible to other cpu's until it
has been committed, which requires validating the speculation.
Note that most weakely ordered architectures (very much including alpha)
do not guarantee any ordering relationship between two loads that depend
on each other on a control dependency:
read A
if (val == 1)
read B
because the conditional may be predicted, and the "read B" may be
speculatively moved up to before reading the value A. So we require the
user to insert a smp_rmb() between the two accesses to be correct:
read A;
if (A == 1)
smp_rmb()
read B
Alpha is further special in that it can break that ordering even if the
*address* of B depends on the read of A, because the cacheline that is
read later may be stale unless you have a memory barrier in between the
pointer read and the read of the value behind a pointer:
read ptr
read offset(ptr)
whereas all other weakly ordered architectures guarantee that the data
dependency (as opposed to just a control dependency) will order the two
accesses. As a result, alpha needs a "smp_read_barrier_depends()" in
between those two reads for them to be ordered.
The coontrol dependency that "READ_ONCE_CTRL()" and "atomic_read_ctrl()"
had was a control dependency to a subsequent *write*, however, and
nobody can finalize such a subsequent write without having actually done
the read. And were you to write such a value to a "stale" cacheline
(the way the unordered reads came to be), that would seem to lose the
write entirely.
So the things that make alpha able to re-order reads even more
aggressively than other weak architectures do not seem to be relevant
for a subsequent write. Alpha memory ordering may be strange, but
there's no real indication that it is *that* strange.
Also, the alpha architecture reference manual very explicitly talks
about the definition of "Dependence Constraints" in section 5.6.1.7,
where a preceding read dominates a subsequent write.
Such a dependence constraint admittedly does not impose a BEFORE (alpha
architecture term for globally visible ordering), but it does guarantee
that there can be no "causal loop". I don't see how you could avoid
such a loop if another cpu could see the stored value and then impact
the value of the first read. Put another way: the read and the write
could not be seen as being out of order wrt other cpus.
So I do not see how these "x_ctrl()" functions can currently be necessary.
I may have to eat my words at some point, but in the absense of clear
proof that alpha actually needs this, or indeed even an explanation of
how alpha could _possibly_ need it, I do not believe these functions are
called for.
And if it turns out that alpha really _does_ need a barrier for this
case, that barrier still should not be "smp_read_barrier_depends()".
We'd have to make up some new speciality barrier just for alpha, along
with the documentation for why it really is necessary.
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul E McKenney <paulmck@us.ibm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull locking changes from Ingo Molnar:
"The main changes in this cycle were:
- More gradual enhancements to atomic ops: new atomic*_read_ctrl()
ops, synchronize atomic_{read,set}() ordering requirements between
architectures, add atomic_long_t bitops. (Peter Zijlstra)
- Add _{relaxed|acquire|release}() variants for inc/dec atomics and
use them in various locking primitives: mutex, rtmutex, mcs, rwsem.
This enables weakly ordered architectures (such as arm64) to make
use of more locking related optimizations. (Davidlohr Bueso)
- Implement atomic[64]_{inc,dec}_relaxed() on ARM. (Will Deacon)
- Futex kernel data cache footprint micro-optimization. (Rasmus
Villemoes)
- pvqspinlock runtime overhead micro-optimization. (Waiman Long)
- misc smaller fixlets"
* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
ARM, locking/atomics: Implement _relaxed variants of atomic[64]_{inc,dec}
locking/rwsem: Use acquire/release semantics
locking/mcs: Use acquire/release semantics
locking/rtmutex: Use acquire/release semantics
locking/mutex: Use acquire/release semantics
locking/asm-generic: Add _{relaxed|acquire|release}() variants for inc/dec atomics
atomic: Implement atomic_read_ctrl()
atomic, arch: Audit atomic_{read,set}()
atomic: Add atomic_long_t bitops
futex: Force hot variables into a single cache line
locking/pvqspinlock: Kick the PV CPU unconditionally when _Q_SLOW_VAL
locking/osq: Relax atomic semantics
locking/qrwlock: Rename ->lock to ->wait_lock
locking/Documentation/lockstat: Fix typo - lokcing -> locking
locking/atomics, cmpxchg: Privatize the inclusion of asm/cmpxchg.h
- pgtable-generic.c: Fold individual #ifdef for each helper into a top
level #ifdef. Makes code more readable
- Converted the stub helpers for !THP to BUILD_BUG() vs. runtime BUG()
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Link: http://lkml.kernel.org/r/20151009133450.GA8597@node
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
New users of these files still start showing up in linux-next, so it's
better to have a migration strategy. All existing users as of 4.3-rc4
are converted to use linux/io-64-nonatomic-*.h, and after 4.4-rc1
we can change all the new ones that have come in since, and then
remove this file again.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reported-by: LKP project <lkp@linux.intel.com>
This change is similar to e001bbae71
ARM: cmpxchg: avoid warnings from macro-ized cmpxchg() implementations
A recent change in kernel/acct.c added a new warning for many
configurations using generic __xchg() implementation:
In file included from ./arch/nios2/include/asm/cmpxchg.h:12:0,
from include/asm-generic/atomic.h:18,
from arch/nios2/include/generated/asm/atomic.h:1,
from include/linux/atomic.h:4,
from include/linux/spinlock.h:406,
from include/linux/mmzone.h:7,
from include/linux/gfp.h:5,
from include/linux/mm.h:9,
from kernel/acct.c:46:
kernel/acct.c: In function 'acct_pin_kill':
include/asm-generic/cmpxchg.h:94:3: warning: value computed is not used [-Wunused-value]
((__typeof__(*(ptr)))__cmpxchg_local_generic((ptr), (unsigned long)(o),\
^
include/asm-generic/cmpxchg.h:102:28: note: in expansion of macro 'cmpxchg_local'
#define cmpxchg(ptr, o, n) cmpxchg_local((ptr), (o), (n))
^
kernel/acct.c:177:2: note: in expansion of macro 'cmpxchg'
cmpxchg(&acct->ns->bacct, pin, NULL);
^
The code is in fact correct, it's just a cmpxchg() call that
intentionally ignores the result, and no other code does that. The
warning does not show up on x86 because of the way that its cmpxchg()
macro is written. This changes the asm-ggeneric implementation to use
a similar construct with a compound expression instead of a typecast,
which causes the compiler to not complain about an unused result.
Fix the other macros in this file in a similar way, and place them
just below their function implementations.
Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
This header contains a few helpers currenly only used by the mpi
implementation, and not default implementation of architecture code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
These are not implementations of default architecture code but helpers
for drivers. Move them to the place they belong to.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Darren Hart <dvhart@linux.intel.com>
Acked-by: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
There are primitives to create and query the software dirty bits
in a pte or pmd. But the clearing of the software dirty bits is done
in common code with x86 specific page table functions.
Add the missing architecture primitives to clear the software dirty
bits to allow the feature to be used on non-x86 systems, e.g. the
s390 architecture.
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
As of 654672d4ba (locking/atomics: Add _{acquire|release|relaxed}()
variants of some atomic operations) and 6d79ef2d30 (locking, asm-generic:
Add _{relaxed|acquire|release}() variants for 'atomic_long_t'), weakly
ordered archs can benefit from more relaxed use of barriers when locking
and unlocking, instead of regular full barrier semantics. While currently
only arm64 supports such optimizations, updating corresponding locking
primitives serves for other archs to immediately benefit as well, once the
necessary machinery is implemented of course.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul E.McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/1443643395-17016-6-git-send-email-dave@stgolabs.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
As of 654672d4ba (locking/atomics: Add _{acquire|release|relaxed}()
variants of some atomic operations) and 6d79ef2d30 (locking, asm-generic:
Add _{relaxed|acquire|release}() variants for 'atomic_long_t'), weakly
ordered archs can benefit from more relaxed use of barriers when locking
and unlocking, instead of regular full barrier semantics. While currently
only arm64 supports such optimizations, updating corresponding locking
primitives serves for other archs to immediately benefit as well, once the
necessary machinery is implemented of course.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul E.McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/1443643395-17016-3-git-send-email-dave@stgolabs.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Similar to what we have for regular add/sub calls. For now, no actual arch
implements them, so everyone falls back to the default atomics... iow,
nothing changes. These will be used in future primitives.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul E.McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/1443643395-17016-2-git-send-email-dave@stgolabs.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Assuming units of PREEMPT_DISABLE_OFFSET for preempt_count() numbers.
Now that TASK_DEAD no longer results in preempt_count() == 3 during
scheduling, we will always call context_switch() with preempt_count()
== 2.
However, we don't always end up with preempt_count() == 2 in
finish_task_switch() because new tasks get created with
preempt_count() == 1.
Create FORK_PREEMPT_COUNT and set it to 2 and use that in the right
places. Note that we cannot use INIT_PREEMPT_COUNT as that serves
another purpose (boot).
After this, preempt_count() is invariant across the context switch,
with exception of PREEMPT_ACTIVE.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull strscpy string copy function implementation from Chris Metcalf.
Chris sent this during the merge window, but I waffled back and forth on
the pull request, which is why it's going in only now.
The new "strscpy()" function is definitely easier to use and more secure
than either strncpy() or strlcpy(), both of which are horrible nasty
interfaces that have serious and irredeemable problems.
strncpy() has a useless return value, and doesn't NUL-terminate an
overlong result. To make matters worse, it pads a short result with
zeroes, which is a performance disaster if you have big buffers.
strlcpy(), by contrast, is a mis-designed "fix" for strlcpy(), lacking
the insane NUL padding, but having a differently broken return value
which returns the original length of the source string. Which means
that it will read characters past the count from the source buffer, and
you have to trust the source to be properly terminated. It also makes
error handling fragile, since the test for overflow is unnecessarily
subtle.
strscpy() avoids both these problems, guaranteeing the NUL termination
(but not excessive padding) if the destination size wasn't zero, and
making the overflow condition very obvious by returning -E2BIG. It also
doesn't read past the size of the source, and can thus be used for
untrusted source data too.
So why did I waffle about this for so long?
Every time we introduce a new-and-improved interface, people start doing
these interminable series of trivial conversion patches.
And every time that happens, somebody does some silly mistake, and the
conversion patch to the improved interface actually makes things worse.
Because the patch is mindnumbing and trivial, nobody has the attention
span to look at it carefully, and it's usually done over large swatches
of source code which means that not every conversion gets tested.
So I'm pulling the strscpy() support because it *is* a better interface.
But I will refuse to pull mindless conversion patches. Use this in
places where it makes sense, but don't do trivial patches to fix things
that aren't actually known to be broken.
* 'strscpy' of git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile:
tile: use global strscpy() rather than private copy
string: provide strscpy()
Make asm/word-at-a-time.h available on all architectures
DT enjoys a rather nice probing infrastructure for clocksources,
while ACPI is so far stuck into a very distant past.
This patch introduces a declarative API, allowing clocksources
to be self-contained and be called when parsing the GTDT table.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
DT enjoys a rather nice probing infrastructure for irqchips, while
ACPI is so far stuck into a very distant past.
This patch introduces a declarative API, allowing irqchips to be
self-contained and be called when a particular entry is matched
in the MADT table.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Hanjun Guo <hanjun.guo@linaro.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
IRQ controllers and timers are the two types of device the kernel
requires before being able to use the device driver model.
ACPI so far lacks a proper probing infrastructure similar to the one
we have with DT, where we're able to declare IRQ chips and
clocksources inside the driver code, and let the core code pick it up
and call us back on a match. This leads to all kind of really ugly
hacks all over the arm64 code and even in the ACPI layer.
In order to allow some basic probing based on the ACPI tables,
introduce "struct acpi_probe_entry" which contains just enough
data and callbacks to match a table, an optional subtable, and
call a probe function. A driver can, at build time, register itself
and expect being called if the right entry exists in the ACPI
table.
A acpi_probe_device_table() is provided, taking an identifier for
a set of acpi_prove_entries, and iterating over the registered
entries.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Hanjun Guo <hanjun.guo@linaro.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Provide atomic_read_ctrl() to mirror READ_ONCE_CTRL(), such that we can
more conveniently use atomics in control dependencies.
Since we can assume atomic_read() implies a READ_ONCE(), we must only
emit an extra smp_read_barrier_depends() in order to upgrade to
READ_ONCE_CTRL() semantics.
Requested-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Cc: oleg@redhat.com
Link: http://lkml.kernel.org/r/20150918115637.GM3604@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This patch makes sure that atomic_{read,set}() are at least
{READ,WRITE}_ONCE().
We already had the 'requirement' that atomic_read() should use
ACCESS_ONCE(), and most archs had this, but a few were lacking.
All are now converted to use READ_ONCE().
And, by a symmetry and general paranoia argument, upgrade atomic_set()
to use WRITE_ONCE().
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: james.hogan@imgtec.com
Cc: linux-kernel@vger.kernel.org
Cc: oleg@redhat.com
Cc: will.deacon@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
When adding the atomic bitops, I seem to have forgotten about
atomic_long_t, fix this.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull libnvdimm fixes from Dan Williams:
- a boot regression (since v4.2) fix for some ARM configurations from
Tyler
- regression (since v4.1) fixes for mkfs.xfs on a DAX enabled device
from Jeff. These are tagged for -stable.
- a pair of locking fixes from Axel that are hidden from lockdep since
they involve device_lock(). The "btt" one is tagged for -stable, the
other only applies to the new "pfn" mechanism in v4.3.
- a fix for the pmem ->rw_page() path to use wmb_pmem() from Ross.
* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
mm: fix type cast in __pfn_to_phys()
pmem: add proper fencing to pmem_rw_page()
libnvdimm: pfn_devs: Fix locking in namespace_store
libnvdimm: btt_devs: Fix locking in namespace_store
blockdev: don't set S_DAX for misaligned partitions
dax: fix O_DIRECT I/O to the last block of a blockdev
The various definitions of __pfn_to_phys() have been consolidated to
use a generic macro in include/asm-generic/memory_model.h. This hit
mainline in the form of 012dcef3f0 "mm: move __phys_to_pfn and
__pfn_to_phys to asm/generic/memory_model.h". When the generic macro
was implemented the type cast to phys_addr_t was dropped which caused
boot regressions on ARM platforms with more than 4GB of memory and
LPAE enabled.
It was suggested to use PFN_PHYS() defined in include/linux/pfn.h
as provides the correct logic and avoids further duplication.
Reported-by: kernelci.org bot <bot@kernelci.org>
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Tyler Baker <tyler.baker@linaro.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>