Commit Graph

322 Commits

Author SHA1 Message Date
Huacai Chen
7f2a83f1c2 KVM: MIPS: Add CPUCFG emulation for Loongson-3
Loongson-3 overrides lwc2 instructions to implement CPUCFG and CSR
read/write functions. These instructions all cause guest exit so CSR
doesn't benefit KVM guests (and there are always legacy methods to
provide the same functions as CSR). So, we only emulate CPUCFG and let
it return a reduced feature list (which means the virtual CPU doesn't
have any other advanced features, including CSR) in KVM.

Reviewed-by: Aleksandar Markovic <aleksandar.qemu.devel@gmail.com>
Signed-off-by: Huacai Chen <chenhc@lemote.com>
Co-developed-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Message-Id: <1590220602-3547-12-git-send-email-chenhc@lemote.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-04 13:51:33 -04:00
WANG Xuerui
f06da27eb8 MIPS: Expose Loongson CPUCFG availability via HWCAP
The point is to allow userspace to probe for CPUCFG without possibly
triggering invalid instructions. In addition to that, future Loongson
feature bits could all be stuffed into CPUCFG bit fields (or "leaves"
in x86-speak) if Loongson does not make mistakes, so ELF HWCAP bits are
conserved.

Userspace can determine native CPUCFG availability by checking the LCSRP
(Loongson CSR Present) bit in CPUCFG output after seeing CPUCFG bit in
HWCAP. Native CPUCFG always sets the LCSRP bit, as CPUCFG is part of the
Loongson CSR ASE, while the emulation intentionally leaves this bit
clear.
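
The two-step probe described above can be sketched in C. This is only an illustration: the bit positions `DEMO_HWCAP_CPUCFG` and `DEMO_CPUCFG_LCSRP` and the helper `classify_cpucfg()` are hypothetical names and values chosen for the example, and the CPUCFG word is passed in as a parameter instead of being read with the instruction itself.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: hypothetical bit positions, for illustration only. The real
 * values are defined by the kernel's MIPS uapi headers. */
#define DEMO_HWCAP_CPUCFG (1UL << 14)
#define DEMO_CPUCFG_LCSRP (UINT32_C(1) << 27)

enum cpucfg_kind {
	CPUCFG_ABSENT,   /* HWCAP bit clear: do not execute CPUCFG at all */
	CPUCFG_EMULATED, /* HWCAP bit set, LCSRP clear: kernel emulation */
	CPUCFG_NATIVE    /* HWCAP bit set, LCSRP set: hardware CPUCFG */
};

/* hwcap: the AT_HWCAP value; cfg_word: the CPUCFG output word that carries
 * LCSRP, which should only be read after the HWCAP check has passed. */
static enum cpucfg_kind classify_cpucfg(unsigned long hwcap, uint32_t cfg_word)
{
	if (!(hwcap & DEMO_HWCAP_CPUCFG))
		return CPUCFG_ABSENT;
	return (cfg_word & DEMO_CPUCFG_LCSRP) ? CPUCFG_NATIVE : CPUCFG_EMULATED;
}
```

In a real program the first argument would typically come from `getauxval(AT_HWCAP)`, so that CPUCFG is never executed on a system that would trap on it.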

The other existing Loongson-specific HWCAP bits are, to my best
knowledge, unused, as

(1) they are fairly recent additions,
(2) Loongson never back-ported the patch into their kernel fork, and
(3) Loongson's existing installed base rarely upgrades, if ever.

However, they are still considered userspace ABI, hence unfortunately
unremovable. But hopefully at least we could stop adding new Loongson
HWCAP bits in the future.

Cc: Paul Burton <paulburton@kernel.org>
Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
Cc: Huacai Chen <chenhc@lemote.com>
Signed-off-by: WANG Xuerui <git@xen0n.name>
Reviewed-by: Huacai Chen <chenhc@lemote.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2020-05-31 10:53:56 +02:00
Huacai Chen
f83e4f9896 MIPS: Loongson-3: Add some unaligned instructions emulation
1, Add unaligned gslq, gssq, gslqc1, gssqc1 emulation;
2, Add unaligned gsl{h, w, d}x, gss{h, w, d}x emulation;
3, Add unaligned gslwxc1, gsswxc1, gsldxc1, gssdxc1 emulation.

Signed-off-by: Huacai Chen <chenhc@lemote.com>
Signed-off-by: Pei Huang <huangpei@loongson.cn>
Reviewed-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2020-04-24 18:18:38 +02:00
Masahiro Yamada
0fb9dc2867 arch: sembuf.h: make uapi asm/sembuf.h self-contained
Userspace cannot compile <asm/sembuf.h> due to some missing type
definitions.  For example, building it for x86 fails as follows:

    CC      usr/include/asm/sembuf.h.s
  In file included from <command-line>:32:0:
  usr/include/asm/sembuf.h:17:20: error: field `sem_perm' has incomplete type
    struct ipc64_perm sem_perm; /* permissions .. see ipc.h */
                      ^~~~~~~~
  usr/include/asm/sembuf.h:24:2: error: unknown type name `__kernel_time_t'
    __kernel_time_t sem_otime; /* last semop time */
    ^~~~~~~~~~~~~~~
  usr/include/asm/sembuf.h:25:2: error: unknown type name `__kernel_ulong_t'
    __kernel_ulong_t __unused1;
    ^~~~~~~~~~~~~~~~
  usr/include/asm/sembuf.h:26:2: error: unknown type name `__kernel_time_t'
    __kernel_time_t sem_ctime; /* last change time */
    ^~~~~~~~~~~~~~~
  usr/include/asm/sembuf.h:27:2: error: unknown type name `__kernel_ulong_t'
    __kernel_ulong_t __unused2;
    ^~~~~~~~~~~~~~~~
  usr/include/asm/sembuf.h:29:2: error: unknown type name `__kernel_ulong_t'
    __kernel_ulong_t sem_nsems; /* no. of semaphores in array */
    ^~~~~~~~~~~~~~~~
  usr/include/asm/sembuf.h:30:2: error: unknown type name `__kernel_ulong_t'
    __kernel_ulong_t __unused3;
    ^~~~~~~~~~~~~~~~
  usr/include/asm/sembuf.h:31:2: error: unknown type name `__kernel_ulong_t'
    __kernel_ulong_t __unused4;
    ^~~~~~~~~~~~~~~~

It is just a matter of a missing include directive.

Include <asm/ipcbuf.h> to make it self-contained, and add it to
the compile-test coverage.

Link: http://lkml.kernel.org/r/20191030063855.9989-3-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-12-04 19:44:14 -08:00
Masahiro Yamada
9ef0e00418 arch: msgbuf.h: make uapi asm/msgbuf.h self-contained
Userspace cannot compile <asm/msgbuf.h> due to some missing type
definitions.  For example, building it for x86 fails as follows:

    CC      usr/include/asm/msgbuf.h.s
  In file included from usr/include/asm/msgbuf.h:6:0,
                   from <command-line>:32:
  usr/include/asm-generic/msgbuf.h:25:20: error: field `msg_perm' has incomplete type
    struct ipc64_perm msg_perm;
                      ^~~~~~~~
  usr/include/asm-generic/msgbuf.h:27:2: error: unknown type name `__kernel_time_t'
    __kernel_time_t msg_stime; /* last msgsnd time */
    ^~~~~~~~~~~~~~~
  usr/include/asm-generic/msgbuf.h:28:2: error: unknown type name `__kernel_time_t'
    __kernel_time_t msg_rtime; /* last msgrcv time */
    ^~~~~~~~~~~~~~~
  usr/include/asm-generic/msgbuf.h:29:2: error: unknown type name `__kernel_time_t'
    __kernel_time_t msg_ctime; /* last change time */
    ^~~~~~~~~~~~~~~
  usr/include/asm-generic/msgbuf.h:41:2: error: unknown type name `__kernel_pid_t'
    __kernel_pid_t msg_lspid; /* pid of last msgsnd */
    ^~~~~~~~~~~~~~
  usr/include/asm-generic/msgbuf.h:42:2: error: unknown type name `__kernel_pid_t'
    __kernel_pid_t msg_lrpid; /* last receive pid */
    ^~~~~~~~~~~~~~

It is just a matter of a missing include directive.

Include <asm/ipcbuf.h> to make it self-contained, and add it to
the compile-test coverage.

Link: http://lkml.kernel.org/r/20191030063855.9989-2-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-12-04 19:44:14 -08:00
Arnd Bergmann
1bf883c1a9 y2038: stat: avoid 'time_t' in 'struct stat'
The time_t definition may differ between user space and kernel space,
so replace time_t with an unambiguous 'long' on mips and sparc.

The same structures also contain 'off_t', which has the same problem,
so replace that as well on those two architectures and powerpc.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2019-11-15 14:38:28 +01:00
Arnd Bergmann
caf5e32d4e y2038: ipc: remove __kernel_time_t reference from headers
There are two structures based on time_t that conflict between libc and
kernel: timeval and timespec. Both are now renamed to __kernel_old_timeval
and __kernel_old_timespec.

For time_t, the old typedef is still __kernel_time_t. There is nothing
wrong with that name, but it would be nice to not use that going forward
as this type is used almost only in deprecated interfaces because of
the y2038 overflow.

In the IPC headers (msgbuf.h, sembuf.h, shmbuf.h), __kernel_time_t is only
used for the 64-bit variants, which are not deprecated.

Change these to a plain 'long', which is the same type as __kernel_time_t
on all 64-bit architectures anyway, to reduce the number of users of the
old type.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2019-11-15 14:38:28 +01:00
Jiaxun Yang
38dffe1e4d MIPS: elf_hwcap: Export userspace ASEs
A Golang developer reported that the MIPS hwcap doesn't reflect the
instructions that the processor actually supports, so programs can't
select optimized code at runtime.

Thus we export the ASEs that can be used in userspace programs.
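
A minimal sketch of the runtime selection this enables, assuming a program that picks between a generic and an ASE-specific routine. The bit `DEMO_HWCAP_ASE` and both routines are hypothetical stand-ins; the "accelerated" variant is plain C so that the example runs on any machine.

```c
#include <assert.h>
#include <stddef.h>

/* ASSUMPTION: hypothetical bit for some userspace-usable ASE; the real
 * HWCAP_* values come from the kernel's uapi hwcap header. */
#define DEMO_HWCAP_ASE (1UL << 1)

typedef unsigned long (*sum_fn)(const unsigned char *buf, size_t len);

/* Portable fallback. */
static unsigned long sum_generic(const unsigned char *buf, size_t len)
{
	unsigned long total = 0;

	for (size_t i = 0; i < len; i++)
		total += buf[i];
	return total;
}

/* Stand-in for an ASE-accelerated routine. Written in plain C (four bytes
 * per iteration) so the sketch stays runnable everywhere. */
static unsigned long sum_ase(const unsigned char *buf, size_t len)
{
	unsigned long total = 0;
	size_t i = 0;

	for (; i + 4 <= len; i += 4)
		total += buf[i] + buf[i + 1] + buf[i + 2] + buf[i + 3];
	for (; i < len; i++)
		total += buf[i];
	return total;
}

/* Pick an implementation once, based on the hwcap word. */
static sum_fn select_sum(unsigned long hwcap)
{
	return (hwcap & DEMO_HWCAP_ASE) ? sum_ase : sum_generic;
}
```

A caller would pass the value of `getauxval(AT_HWCAP)` to `select_sum()` at startup and keep the returned function pointer.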

Reported-by: Meng Zhuo <mengzhuo1203@gmail.com>
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Cc: linux-mips@vger.kernel.org
Cc: Paul Burton <paul.burton@mips.com>
Cc: <stable@vger.kernel.org> # 4.14+
Signed-off-by: Paul Burton <paul.burton@mips.com>
2019-10-10 11:57:36 -07:00
Minchan Kim
1a4e58cce8 mm: introduce MADV_PAGEOUT
When a process expects no accesses to a certain memory range for a long
time, it can hint to the kernel that the pages can be reclaimed instantly
while the data is preserved for future use.  This could reduce workingset
eviction and so end up increasing performance.

This patch introduces the new MADV_PAGEOUT hint to the madvise(2) syscall.
MADV_PAGEOUT can be used by a process to mark a memory range as not
expected to be used for a long time so that the kernel reclaims *any LRU*
pages instantly.  The hint can help the kernel decide which pages to
evict proactively.

A note: it intentionally doesn't apply the SWAP_CLUSTER_MAX LRU page
isolation limit because it is automatically bounded by PMD size.  If the
PMD size (e.g., 256) causes trouble, we could fix it later by limiting it
to SWAP_CLUSTER_MAX[1].

- man-page material

MADV_PAGEOUT (since Linux x.x)

The application does not expect to access the specified regions in the
near future, so their pages can be reclaimed instantly regardless of
memory pressure.  Thus, an access to the range after a successful
operation may cause a major page fault but, unlike with MADV_DONTNEED,
never loses the up-to-date contents.  Pages belonging to a shared mapping
are only processed if write access is allowed for the calling process.

MADV_PAGEOUT cannot be applied to locked pages, Huge TLB pages, or
VM_PFNMAP pages.
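
A minimal userspace sketch of the non-destructive behaviour described above: fill an anonymous mapping, issue the hint, and check that the contents are still there. The helper name `pageout_preserves_data()` is made up for this example, and the fallback value 21 for `MADV_PAGEOUT` is an assumption for libc headers that predate the flag.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

#ifndef MADV_PAGEOUT
#define MADV_PAGEOUT 21 /* ASSUMPTION: generic Linux value, for older headers */
#endif

/* Returns 1 if the buffer still holds its data after the hint, 0 if it does
 * not, and -1 if the mapping could not be created. */
static int pageout_preserves_data(size_t len)
{
	unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
				MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	int ok = 1;

	if (p == MAP_FAILED)
		return -1;

	memset(p, 0xA5, len);

	/* Only a hint: a kernel without MADV_PAGEOUT rejects it, and the data
	 * must survive whether or not any page was actually reclaimed. */
	(void)madvise(p, len, MADV_PAGEOUT);

	/* Touching the range again may fault pages back in from swap. */
	for (size_t i = 0; i < len; i++) {
		if (p[i] != 0xA5) {
			ok = 0;
			break;
		}
	}

	munmap(p, len);
	return ok;
}
```

The same check with MADV_DONTNEED on a private anonymous mapping would be expected to fail, since that hint discards the contents.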

[1] https://lore.kernel.org/lkml/20190710194719.GS29695@dhcp22.suse.cz/

[minchan@kernel.org: clear PG_active on MADV_PAGEOUT]
  Link: http://lkml.kernel.org/r/20190802200643.GA181880@google.com
[akpm@linux-foundation.org: resolve conflicts with hmm.git]
Link: http://lkml.kernel.org/r/20190726023435.214162-5-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reported-by: kbuild test robot <lkp@intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Chris Zankel <chris@zankel.net>
Cc: Daniel Colascione <dancol@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Oleksandr Natalenko <oleksandr@redhat.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Sonny Rao <sonnyrao@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Tim Murray <timmurray@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-25 17:51:41 -07:00
Minchan Kim
9c276cc65a mm: introduce MADV_COLD
Patch series "Introduce MADV_COLD and MADV_PAGEOUT", v7.

- Background

The Android terminology used for forking a new process and starting an app
from scratch is a cold start, while resuming an existing app is a hot
start.  While we continually try to improve the performance of cold
starts, hot starts will always be significantly less power hungry as well
as faster so we are trying to make hot start more likely than cold start.

To make hot starts more likely, Android userspace manages the order in
which apps should be killed in a process called ActivityManagerService.
ActivityManagerService tracks every Android app or service that the user
could be interacting with at any time and translates that into a ranked
list for lmkd (low memory killer daemon).  The listed apps are likely to
be killed by lmkd if the system has to reclaim memory.  In that sense they
are similar to entries in any other cache.  Those apps are kept alive for
opportunistic performance improvements, but those performance improvements
will vary based on the memory requirements of individual workloads.

- Problem

Naturally, cached apps were dominant consumers of memory on the system.
However, they were not significant consumers of swap even though they are
good candidates for swap.  On investigation, swapping out only begins
once the low zone watermark is hit and kswapd wakes up, but the overall
allocation rate in the system might trip lmkd thresholds and cause a
cached process to be killed (we measured the performance of swapping out
vs. zapping the memory by killing a process.  Unsurprisingly, zapping is
10x faster even though we use zram, which is much faster than real
storage), so a kill from lmkd will often satisfy the high zone watermark,
resulting in very few pages actually being moved to swap.

- Approach

The approach we chose was to use a new interface to allow userspace to
proactively reclaim entire processes by leveraging platform information.
This allowed us to bypass the inaccuracy of the kernel’s LRUs for pages
that are known to be cold from userspace and to avoid races with lmkd by
reclaiming apps as soon as they entered the cached state.  Additionally,
it gives the platform many opportunities to use the information it has to
optimize memory efficiency.

To achieve the goal, the patchset introduces two new options for madvise.
One is MADV_COLD which will deactivate activated pages and the other is
MADV_PAGEOUT which will reclaim private pages instantly.  These new
options complement MADV_DONTNEED and MADV_FREE by adding non-destructive
ways to gain some free memory space.  MADV_PAGEOUT is similar to
MADV_DONTNEED in a way that it hints the kernel that memory region is not
currently needed and should be reclaimed immediately; MADV_COLD is similar
to MADV_FREE in a way that it hints the kernel that memory region is not
currently needed and should be reclaimed when memory pressure rises.

This patch (of 5):

When a process expects no accesses to a certain memory range, it can
give a hint to the kernel that the pages can be reclaimed when memory
pressure happens but the data should be preserved for future use.  This
could reduce workingset eviction and so end up increasing performance.

This patch introduces the new MADV_COLD hint to the madvise(2) syscall.
MADV_COLD can be used by a process to mark a memory range as not expected
to be used in the near future.  The hint can help the kernel decide which
pages to evict early during memory pressure.

It works for every LRU page, like MADV_[DONTNEED|FREE].  IOW, it moves

	active file page -> inactive file LRU
	active anon page -> inactive anon LRU

Unlike MADV_FREE, it doesn't move active anonymous pages to the head of
the inactive file LRU, because MADV_COLD has slightly different semantics.
MADV_FREE means it's okay to discard the page under memory pressure
because the content of the page is *garbage*, so freeing such pages has
almost zero overhead: we don't need to swap out, and a later access causes
just a minor fault.  Thus, it makes sense to put those freeable pages on
the inactive file LRU to compete with other used-once pages.  It makes
sense from the implementation point of view, too, because the memory is no
longer swap-backed until it is re-dirtied.  It even gives a bonus: the
pages can be reclaimed on a swapless system.  However, MADV_COLD doesn't
mean garbage, so reclaiming those pages requires swap-out/in in the end,
which is a bigger cost.  Since we have designed VM LRU aging based on a
cost model, anonymous cold pages are better placed on the inactive anon
LRU list, not the file LRU.  Furthermore, it helps to avoid unnecessary
scanning if the system doesn't have a swap device.  Let's start the simpler
way without adding complexity at this moment.  However, keep in mind the
caveat that workloads with a lot of page cache are likely to ignore
MADV_COLD on anonymous memory because we rarely age anonymous LRU lists.
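
The list movements described above can be written down as a toy model. This is not kernel code: the enum and both functions are hypothetical, and any case the text does not describe is simply left unchanged.

```c
#include <assert.h>

/* Toy model of the four LRU lists mentioned in the text. */
enum lru_list { INACTIVE_ANON, ACTIVE_ANON, INACTIVE_FILE, ACTIVE_FILE };

/* MADV_COLD: deactivate the page but keep it on the LRU of its own type. */
static enum lru_list lru_after_madv_cold(enum lru_list cur)
{
	switch (cur) {
	case ACTIVE_FILE:
		return INACTIVE_FILE;
	case ACTIVE_ANON:
		return INACTIVE_ANON;
	default:
		return cur; /* already inactive */
	}
}

/* MADV_FREE: an active anonymous page goes to the inactive *file* LRU,
 * because its contents are garbage and need no swap-out. Other cases are
 * not described in the text and are left unchanged in this model. */
static enum lru_list lru_after_madv_free(enum lru_list cur)
{
	return cur == ACTIVE_ANON ? INACTIVE_FILE : cur;
}
```

The only difference between the two functions is the destination of an active anonymous page, which is the design point the paragraph above argues for.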

* man-page material

MADV_COLD (since Linux x.x)

Pages in the specified regions will be treated as less-recently-accessed
compared to pages in the system with similar access frequencies.  In
contrast to MADV_FREE, the contents of the region are preserved regardless
of subsequent writes to pages.

MADV_COLD cannot be applied to locked pages, Huge TLB pages, or VM_PFNMAP
pages.

[akpm@linux-foundation.org: resolve conflicts with hmm.git]
Link: http://lkml.kernel.org/r/20190726023435.214162-2-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reported-by: kbuild test robot <lkp@intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Chris Zankel <chris@zankel.net>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Daniel Colascione <dancol@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Oleksandr Natalenko <oleksandr@redhat.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Sonny Rao <sonnyrao@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Tim Murray <timmurray@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-25 17:51:41 -07:00
David S. Miller
dca73a65a6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Alexei Starovoitov says:

====================
pull-request: bpf-next 2019-06-19

The following pull-request contains BPF updates for your *net-next* tree.

The main changes are:

1) new SO_DETACH_REUSEPORT_BPF setsockopt, from Martin.

2) BTF based map definition, from Andrii.

3) support bpf_map_lookup_elem for xskmap, from Jonathan.

4) bounded loops and scalar precision logic in the verifier, from Alexei.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-20 00:06:27 -04:00
Martin KaFai Lau
99f3a064bc bpf: net: Add SO_DETACH_REUSEPORT_BPF
There is SO_ATTACH_REUSEPORT_[CE]BPF but there is no DETACH.
This patch adds SO_DETACH_REUSEPORT_BPF sockopt.  The same
sockopt can be used to undo both SO_ATTACH_REUSEPORT_[CE]BPF.

reuseport_detach_prog() is added and it is mostly a mirror
of the existing reuseport_attach_prog().  The differences are
that it does not call reuseport_alloc() and that it returns
-ENOENT when there is no old prog.
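
From userspace, the new option might be used roughly as follows. This is a sketch under assumptions: the helper names are invented, the fallback value 68 for `SO_DETACH_REUSEPORT_BPF` is assumed for headers that lack the definition, and the int-sized dummy option value is passed on the assumption that the generic setsockopt path expects one.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef SO_DETACH_REUSEPORT_BPF
#define SO_DETACH_REUSEPORT_BPF 68 /* ASSUMPTION: value for older headers */
#endif

/* Detach whatever reuseport program is attached to fd.
 * Returns 0 on success or a negative errno value. */
static int detach_reuseport_prog(int fd)
{
	int unused = 0; /* the option itself carries no argument */

	if (setsockopt(fd, SOL_SOCKET, SO_DETACH_REUSEPORT_BPF,
		       &unused, sizeof(unused)) < 0)
		return -errno;
	return 0;
}

/* Try the detach on a fresh UDP socket that never had a program attached,
 * so an error return is the expected outcome. */
static int detach_on_fresh_socket(void)
{
	int fd = socket(AF_INET, SOCK_DGRAM, 0);
	int rc;

	if (fd < 0)
		return -errno;
	rc = detach_reuseport_prog(fd);
	close(fd);
	return rc;
}
```

The same call is meant to undo either of the two attach options, so a caller does not need to remember which kind of program was attached.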

Cc: Craig Gallek <kraig@google.com>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-15 01:21:19 +02:00
Greg Kroah-Hartman
96ac6d4351 treewide: Add SPDX license identifier - Kbuild
Add SPDX license identifiers to all Make/Kconfig files which:

 - Have no license information of any form

These files fall under the project license, GPL v2 only. The resulting SPDX
license identifier is:

      GPL-2.0

Reported-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-30 11:32:33 -07:00
Linus Torvalds
92fab77b6b Main MIPS changes for v5.2:

Merge tag 'mips_5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux

Pull MIPS updates from Paul Burton:

 - A set of memblock initialization improvements thanks to Serge Semin,
   tidying up after our conversion from bootmem to memblock back in
   v4.20.

 - Our eBPF JIT, which previously supported only MIPS64r2 through MIPS64r5,
   is improved to also support MIPS64r6. Support for MIPS32 systems is
   introduced, with the caveat that it only works for programs that
   don't use 64 bit registers or operations - those will bail out & need
   to be interpreted.

 - Improvements to the allocation & configuration of our exception
   vector that should fix issues seen on some platforms using recent
   versions of U-Boot.

 - Some minor improvements to code generated for jump labels, along with
   enabling them by default for generic kernels.

* tag 'mips_5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: (27 commits)
  mips: Manually call fdt_init_reserved_mem() method
  mips: Make sure dt memory regions are valid
  mips: Perform early low memory test
  mips: Dump memblock regions for debugging
  mips: Add reserve-nomap memory type support
  mips: Use memblock to reserve the __nosave memory range
  mips: Discard post-CMA-init foreach loop
  mips: Reserve memory for the kernel image resources
  MIPS: Remove duplicate EBase configuration
  MIPS: Sync icache for whole exception vector
  MIPS: Always allocate exception vector for MIPSr2+
  MIPS: Use memblock_phys_alloc() for exception vector
  mips: Combine memblock init and memory reservation loops
  mips: Discard rudiments from bootmem_init
  mips: Make sure kernel .bss exists in boot mem pool
  mips: vdso: drop unnecessary cc-ldoption
  Revert "MIPS: ralink: fix cpu clock of mt7621 and add dt clk devices"
  MIPS: generic: Enable CONFIG_JUMP_LABEL
  MIPS: jump_label: Use compact branches for >= r6
  MIPS: jump_label: Remove redundant nops
  ...
2019-05-08 16:41:47 -07:00
Arnd Bergmann
0768e17073 net: socket: implement 64-bit timestamps
The 'timeval' and 'timespec' data structures used for socket timestamps
are going to be redefined in user space based on 64-bit time_t in future
versions of the C library to deal with the y2038 overflow problem,
which breaks the ABI definition.

Unlike many modern ioctl commands, SIOCGSTAMP and SIOCGSTAMPNS do not
use the _IOR() macro to encode the size of the transferred data, so it
remains ambiguous whether the application uses the old or new layout.

The best workaround I could find is rather ugly: we redefine the command
code based on the size of the respective data structure with a ternary
operator. This lets it get evaluated as late as possible, hopefully after
that structure is visible to the caller. We cannot use an #ifdef here,
because linux/sockios.h might have been included before any libc header
that could determine the size of time_t.
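
The ternary trick can be illustrated in isolation. Everything in this sketch is hypothetical (the command codes, the two struct layouts and the `DEMO_GSTAMP()` macro); it only shows how a size-dependent constant is resolved at the point of use instead of at the point of definition.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: made-up command codes, not the real ioctl numbers. */
#define DEMO_GSTAMP_OLD 0x1001u
#define DEMO_GSTAMP_NEW 0x1002u

/* Stand-ins for the old and the new timestamp layout. */
struct demo_timeval32 { int32_t tv_sec; int32_t tv_usec; }; /* 8 bytes */
struct demo_timeval64 { int64_t tv_sec; int64_t tv_usec; }; /* 16 bytes */

/* The sizeof comparison is evaluated where the macro is expanded, so it
 * sees whichever definition of the type is visible at that point. An
 * #ifdef would have to be decided when this header is first included. */
#define DEMO_GSTAMP(type) \
	(sizeof(type) == sizeof(struct demo_timeval32) ? \
	 DEMO_GSTAMP_OLD : DEMO_GSTAMP_NEW)
```

Because both branches are integer constants, the whole expression is still a compile-time constant and the compiler folds it away.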

The ioctl implementation now interprets the new command codes as always
referring to the 64-bit structure on all architectures, while the old
architecture specific command code still refers to the old architecture
specific layout. The new command number is only used when they are
actually different.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-19 14:07:40 -07:00
Paul Burton
ec86e545c1 A small batch of MIPS fixes for 5.1:

Merge tag 'mips_fixes_5.1_1' into mips-next

A small batch of MIPS fixes for 5.1:

- An interrupt masking fix for Loongson-based Lemote 2F systems (fixing
  a regression from v3.19).

- A relocation fix for configurations in which the devicetree is stored
  in an ELF section (fixing a regression from v4.7).

- Fix jump labels for MIPSr6 kernels where they previously could
  inadvertently place a control transfer instruction in a forbidden slot
  & take unexpected exceptions (fixing MIPSr6 support added in v4.0).

- Extend an existing USB power workaround for the Netgear WNDR3400 to v2
  boards in addition to the v3 ones that already used it.

- Remove the custom MIPS32 definition of __kernel_fsid_t to make it
  consistent with MIPS64 & every other architecture, in particular
  resolving issues for code which tries to print the val field whose
  type previously differed (though had identical memory layout).

Merged into mips-next to gain the MIPSr6 jump label fix before enabling
jump labels by default for generic kernel builds.

Signed-off-by: Paul Burton <paul.burton@mips.com>
2019-04-09 16:21:13 -07:00
Hassan Naveed
0d1d17b9ff MIPS: uasm: Add div, mul and sel instructions for mipsr6
Add the following instructions for use by eBPF on mipsr6:
insn_ddivu_r6, insn_divu_r6, insn_dmodu, insn_dmulu, insn_modu,
insn_mulu, insn_seleqz, insn_selnez
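
For readers unfamiliar with the r6 select instructions, their effect can be modelled in C. This is a sketch of the instruction semantics as commonly documented for MIPSr6, not of the uasm encoders added here; the `model_*` names are invented, and the multiply and remainder models cover only the low 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* seleqz rd, rs, rt: rd = rs if rt is zero, otherwise 0. */
static uint64_t model_seleqz(uint64_t rs, uint64_t rt)
{
	return rt == 0 ? rs : 0;
}

/* selnez rd, rs, rt: rd = rs if rt is non-zero, otherwise 0. */
static uint64_t model_selnez(uint64_t rs, uint64_t rt)
{
	return rt != 0 ? rs : 0;
}

/* A branch-free "cond ? a : b" built from the pair, which is the kind of
 * sequence a JIT can emit in place of a conditional move. */
static uint64_t model_select(uint64_t cond, uint64_t a, uint64_t b)
{
	return model_selnez(a, cond) | model_seleqz(b, cond);
}

/* mulu: low 32 bits of the unsigned product. */
static uint32_t model_mulu(uint32_t rs, uint32_t rt)
{
	return rs * rt;
}

/* modu: unsigned remainder; the divisor must not be zero in this model. */
static uint32_t model_modu(uint32_t rs, uint32_t rt)
{
	assert(rt != 0);
	return rs % rt;
}
```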

Signed-off-by: Hassan Naveed <hnaveed@wavecomp.com>
Reviewed-by: Paul Burton <paul.burton@mips.com>
Signed-off-by: Paul Burton <paul.burton@mips.com>
Cc: kafai@fb.com
Cc: songliubraving@fb.com
Cc: yhs@fb.com
Cc: netdev@vger.kernel.org
Cc: bpf@vger.kernel.org
Cc: linux-mips@vger.kernel.org
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: James Hogan <jhogan@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: open list:MIPS <linux-mips@linux-mips.org>
Cc: open list <linux-kernel@vger.kernel.org>
2019-03-19 15:26:06 -07:00
Linus Torvalds
b7a42146dc A small batch of MIPS fixes for 5.1:

Merge tag 'mips_fixes_5.1_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux

Pull MIPS fixes from Paul Burton:
 "A small batch of MIPS fixes for 5.1:

   - An interrupt masking fix for Loongson-based Lemote 2F systems
     (fixing a regression from v3.19)

   - A relocation fix for configurations in which the devicetree is
     stored in an ELF section (fixing a regression from v4.7)

   - Fix jump labels for MIPSr6 kernels where they previously could
     inadvertently place a control transfer instruction in a forbidden
     slot & take unexpected exceptions (fixing MIPSr6 support added in
     v4.0)

   - Extend an existing USB power workaround for the Netgear WNDR3400 to
     v2 boards in addition to the v3 ones that already used it

   - Remove the custom MIPS32 definition of __kernel_fsid_t to make it
     consistent with MIPS64 & every other architecture, in particular
     resolving issues for code which tries to print the val field whose
     type previously differed (though had identical memory layout)"

* tag 'mips_fixes_5.1_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
  MIPS: Remove custom MIPS32 __kernel_fsid_t type
  mips: bcm47xx: Enable USB power on Netgear WNDR3400v2
  MIPS: Fix kernel crash for R6 in jump label branch function
  MIPS: Ensure ELF appended dtb is relocated
  mips: loongson64: lemote-2f: Add IRQF_NO_SUSPEND to "cascade" irqaction.
2019-03-19 10:50:15 -07:00
Linus Torvalds
28d747f266 Kbuild updates for v5.1 (2nd)

Merge tag 'kbuild-v5.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull more Kbuild updates from Masahiro Yamada:

 - add more Build-Depends to Debian source package

 - prefix header search paths with $(srctree)/

 - make modpost show verbose section mismatch warnings

 - avoid hard-coded CROSS_COMPILE for h8300

 - fix regression for Debian make-kpkg command

 - add semantic patch to detect missing put_device()

 - fix some warnings of 'make deb-pkg'

 - optimize NOSTDINC_FLAGS evaluation

 - add warnings about redundant generic-y

 - clean up Makefiles and scripts

* tag 'kbuild-v5.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  kconfig: remove stale lxdialog/.gitignore
  kbuild: force all architectures except um to include mandatory-y
  kbuild: warn redundant generic-y
  Revert "modsign: Abort modules_install when signing fails"
  kbuild: Make NOSTDINC_FLAGS a simply expanded variable
  kbuild: deb-pkg: avoid implicit effects
  coccinelle: semantic code search for missing put_device()
  kbuild: pkg: grep include/config/auto.conf instead of $KCONFIG_CONFIG
  kbuild: deb-pkg: introduce is_enabled and if_enabled_echo to builddeb
  kbuild: deb-pkg: add CONFIG_ prefix to kernel config options
  kbuild: add workaround for Debian make-kpkg
  kbuild: source include/config/auto.conf instead of ${KCONFIG_CONFIG}
  unicore32: simplify linker script generation for decompressor
  h8300: use cc-cross-prefix instead of hardcoding h8300-unknown-linux-
  kbuild: move archive command to scripts/Makefile.lib
  modpost: always show verbose warning for section mismatch
  ia64: prefix header search path with $(srctree)/
  libfdt: prefix header search paths with $(srctree)/
  deb-pkg: generate correct build dependencies
2019-03-17 13:25:26 -07:00
Masahiro Yamada
037fc3368b kbuild: force all architectures except um to include mandatory-y
Currently, every arch/*/include/uapi/asm/Kbuild explicitly includes
the common Kbuild.asm file. Factor out the duplicated include directives
to scripts/Makefile.asm-generic so that no architecture would opt out
of the mandatory-y mechanism.

um is not forced to include mandatory-y since it is a very exceptional
case which does not support UAPI.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2019-03-17 12:56:32 +09:00
Paul Burton
f6cab793d4 MIPS: Remove custom MIPS32 __kernel_fsid_t type
For MIPS32 kernels we have a custom definition of __kernel_fsid_t. This
differs from the asm-generic version used by all other architectures &
MIPS64 in one way - it declares the val field as an array of long,
rather than an array of int. Since int & long have identical size &
alignment when targeting MIPS32 anyway, this makes little sense.

Beyond the pointlessness this causes problems for code which prints
entries from the val array, for example the fanotify_encode_fid()
function [1]. If such code uses a format specifier suited to an int then
it encounters compiler warnings when building for MIPS32, such as:

  In file included from include/linux/kernel.h:14:0,
                   from include/linux/list.h:9,
                   from include/linux/preempt.h:11,
                   from include/linux/spinlock.h:51,
                   from include/linux/fdtable.h:11,
                   from fs/notify/fanotify/fanotify.c:3:
  fs/notify/fanotify/fanotify.c: In function 'fanotify_encode_fid':
  include/linux/kern_levels.h:5:18: warning: format '%x' expects argument
    of type 'unsigned int', but argument 2 has type 'long int' [-Wformat=]

Remove the custom __kernel_fsid_t definition & make use of the
asm-generic version which will have an identical layout in memory
anyway, in order to remove the inconsistency with other architectures.

One possible regression this could cause is if any code is attempting to
print entries from the val array with a long-sized format specifier, in
which case it would begin seeing compiler warnings when built against
kernel headers including this change. Since such code is exceedingly
rare, and would have to be MIPS32-specific to expect a long, this seems
to be a problem that it's extremely unlikely anyone will encounter.
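
The mismatch can be illustrated in user space with a minimal sketch. The
struct names below (fsid_int, fsid_long) and the helper format_fsid() are
hypothetical stand-ins for the asm-generic and old MIPS32 layouts; they are
not the kernel's definitions.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the asm-generic layout: val is an array of int. */
struct fsid_int {
	int val[2];
};

/* Hypothetical stand-in for the old MIPS32 layout: val is an array of long. */
struct fsid_long {
	long val[2];
};

/*
 * Format both words of an int-based fsid as hex. "%x" matches an
 * unsigned int argument, so -Wformat stays quiet. Passing the entries
 * of a fsid_long here without a cast would produce the warning quoted
 * above.
 */
static int format_fsid(char *buf, size_t len, const struct fsid_int *fsid)
{
	return snprintf(buf, len, "%x:%x",
			(unsigned int)fsid->val[0],
			(unsigned int)fsid->val[1]);
}
```

Code that must handle either layout can cast each entry explicitly, as the
helper does, so that the format specifier no longer depends on the
architecture's choice of type.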

[1] https://lore.kernel.org/linux-mips/CAOQ4uxiEkczB7PNCXegFC-eYb9zAGaio_o=OgHAJHFd7eavBxA@mail.gmail.com/T/#mb43103277c79ef06b884359209e817db1c136140

Signed-off-by: Paul Burton <paul.burton@mips.com>
Cc: Amir Goldstein <amir73il@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Jan Kara <jack@suse.cz>
Cc: linux-arch@vger.kernel.org
Cc: linux-mips@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
2019-03-14 11:31:20 -07:00
Linus Torvalds
f3ca4c55a6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
 "More fixes in the queue:

  1) Netfilter nat can erroneously register the device notifier twice,
     fix from Florian Westphal.

  2) Use after free in nf_tables, from Pablo Neira Ayuso.

  3) Parallel update of steering rule fix in mlx5 driver, from Eli
     Britstein.

  4) RX processing panic in lan743x, fix from Bryan Whitehead.

  5) Use before initialization of TCP_SKB_CB, fix from Christoph Paasch.

  6) Fix locking in SRIOV mode of mlx4 driver, from Jack Morgenstein.

  7) Fix TX stalls in lan743x due to mishandling of interrupt ACKing
     modes, from Bryan Whitehead.

  8) Fix infoleak in l2tp_ip6_recvmsg(), from Eric Dumazet"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (43 commits)
  pptp: dst_release sk_dst_cache in pptp_sock_destruct
  MAINTAINERS: GENET & SYSTEMPORT: Add internal Broadcom list
  l2tp: fix infoleak in l2tp_ip6_recvmsg()
  net/tls: Inform user space about send buffer availability
  net_sched: return correct value for *notify* functions
  lan743x: Fix TX Stall Issue
  net/mlx4_core: Fix qp mtt size calculation
  net/mlx4_core: Fix locking in SRIOV mode when switching between events and polling
  net/mlx4_core: Fix reset flow when in command polling mode
  mlxsw: minimal: Initialize base_mac
  mlxsw: core: Prevent duplication during QSFP module initialization
  net: dwmac-sun8i: fix a missing check of of_get_phy_mode
  net: sh_eth: fix a missing check of of_get_phy_mode
  net: 8390: fix potential NULL pointer dereferences
  net: fujitsu: fix a potential NULL pointer dereference
  net: qlogic: fix a potential NULL pointer dereference
  isdn: hfcpci: fix potential NULL pointer dereference
  Documentation: devicetree: add a new optional property for port mac address
  net: rocker: fix a potential NULL pointer dereference
  net: qlge: fix a potential NULL pointer dereference
  ...
2019-03-14 09:28:12 -07:00
Arnd Bergmann
a623a7a1a5 y2038: fix socket.h header inclusion
Referencing the __kernel_long_t type caused some user space applications
to stop compiling when they had not already included linux/posix_types.h,
e.g.

s/multicast.c -o ext/sockets/multicast.lo
In file included from /builddir/build/BUILD/php-7.3.3/main/php.h:468,
                 from /builddir/build/BUILD/php-7.3.3/ext/sockets/sockets.c:27:
/builddir/build/BUILD/php-7.3.3/ext/sockets/sockets.c: In function 'zm_startup_sockets':
/builddir/build/BUILD/php-7.3.3/ext/sockets/sockets.c:776:40: error: '__kernel_long_t' undeclared (first use in this function)
  776 |  REGISTER_LONG_CONSTANT("SO_SNDTIMEO", SO_SNDTIMEO, CONST_CS | CONST_PERSISTENT);

It is safe to include that header here, since it only contains kernel
internal types that do not conflict with other user space types.

It's still possible that some related build failures remain, but those
are likely to be for code that is not already y2038 safe.

Reported-by: Laura Abbott <labbott@redhat.com>
Fixes: a9beb86ae6 ("sock: Add SO_RCVTIMEO_NEW and SO_SNDTIMEO_NEW")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-11 11:06:00 -07:00
Linus Torvalds
fa29f5ba42 asm-generic changes for v5.1
Only a few small changes this time:
 
 - Michael S. Tsirkin cleans up linux/mman.h
 - Mike Rapoport found a typo
 
 I had originally merged another cleanup series for I/O accessors from
 Hugo Lefeuvre as well, but dropped it after the discussion of the barrier
 semantics and some conflicts. I expect this series to get merged for a
 later release though.
 
 Signed-off-by: Arnd Bergmann <arnd@arndb.de>

Merge tag 'asm-generic-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic

Pull asm-generic updates from Arnd Bergmann:
 "Only a few small changes this time:

   - Michael S. Tsirkin cleans up linux/mman.h

   - Mike Rapoport found a typo

  I had originally merged another cleanup series for I/O accessors from
  Hugo Lefeuvre as well, but dropped it after the discussion of the
  barrier semantics and some conflicts. I expect this series to get
  merged for a later release though"

* tag 'asm-generic-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
  asm-generic/page.h: fix typo in #error text requiring a real asm/page.h
  arch: move common mmap flags to linux/mman.h
  drm: tweak header name
  x86/mpx: tweak header name
2019-03-06 09:18:43 -08:00
Michael S. Tsirkin
746c9398f5 arch: move common mmap flags to linux/mman.h
Now that we have 3 mmap flags shared by all architectures,
let's move them into the common header.

This will help discourage future architectures from duplicating code.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2019-02-18 17:49:30 +01:00
Deepa Dinamani
a9beb86ae6 sock: Add SO_RCVTIMEO_NEW and SO_SNDTIMEO_NEW
Add new socket timeout options that are y2038 safe.

Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Cc: ccaulfie@redhat.com
Cc: davem@davemloft.net
Cc: deller@gmx.de
Cc: paulus@samba.org
Cc: ralf@linux-mips.org
Cc: rth@twiddle.net
Cc: cluster-devel@redhat.com
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-alpha@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-mips@vger.kernel.org
Cc: linux-parisc@vger.kernel.org
Cc: sparclinux@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-03 11:17:31 -08:00
Deepa Dinamani
45bdc66159 socket: Rename SO_RCVTIMEO/ SO_SNDTIMEO with _OLD suffixes
SO_RCVTIMEO and SO_SNDTIMEO socket options use struct timeval
as the time format. struct timeval is not y2038 safe.
The subsequent patches in the series add support for new socket
timeout options with _NEW suffix that will use y2038 safe
data structures. Although the existing struct timeval layout
is sufficiently wide to represent timeouts, because of the way
libc will interpret time_t based on user defined flag, these
new flags provide a way of having a structure that is the same
for all architectures consistently.
Rename the existing options with _OLD suffix forms so that the
right option is enabled for userspace applications according
to the architecture and time_t definition of libc.
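
From an application's point of view the option name stays SO_RCVTIMEO; which
of the _OLD or _NEW values it maps to is decided by the headers. A minimal
sketch of the unchanged user-space usage, assuming a Linux system (the helper
name set_rcv_timeout() is hypothetical):

```c
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Set a receive timeout on fd and read it back into *out.
 * Returns 0 on success, -1 on failure (errno is set by the failing call).
 */
static int set_rcv_timeout(int fd, long sec, long usec, struct timeval *out)
{
	struct timeval tv = { .tv_sec = sec, .tv_usec = usec };
	socklen_t len = sizeof(*out);

	if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) < 0)
		return -1;
	return getsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, out, &len);
}
```

The caller passes whatever struct timeval its libc defines, and the option
value selected at compile time tells the kernel how wide the fields are.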

Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Cc: ccaulfie@redhat.com
Cc: deller@gmx.de
Cc: paulus@samba.org
Cc: ralf@linux-mips.org
Cc: rth@twiddle.net
Cc: cluster-devel@redhat.com
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-alpha@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-mips@vger.kernel.org
Cc: linux-parisc@vger.kernel.org
Cc: sparclinux@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-03 11:17:31 -08:00
Deepa Dinamani
9718475e69 socket: Add SO_TIMESTAMPING_NEW
Add SO_TIMESTAMPING_NEW variant of socket timestamp options.
This is the y2038 safe version of SO_TIMESTAMPING_OLD
for all architectures.

Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Cc: chris@zankel.net
Cc: fenghua.yu@intel.com
Cc: rth@twiddle.net
Cc: tglx@linutronix.de
Cc: ubraun@linux.ibm.com
Cc: linux-alpha@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-ia64@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: linux-s390@vger.kernel.org
Cc: linux-xtensa@linux-xtensa.org
Cc: sparclinux@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-03 11:17:31 -08:00
Deepa Dinamani
887feae36a socket: Add SO_TIMESTAMP[NS]_NEW
Add SO_TIMESTAMP_NEW and SO_TIMESTAMPNS_NEW variants of
socket timestamp options.
These are the y2038 safe versions of SO_TIMESTAMP_OLD
and SO_TIMESTAMPNS_OLD for all architectures.

Note that the format of scm_timestamping.ts[0] is not changed
in this patch.

Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Cc: jejb@parisc-linux.org
Cc: ralf@linux-mips.org
Cc: rth@twiddle.net
Cc: linux-alpha@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: linux-parisc@vger.kernel.org
Cc: linux-rdma@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: sparclinux@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-03 11:17:31 -08:00
Deepa Dinamani
7f1bc6e95d sockopt: Rename SO_TIMESTAMP* to SO_TIMESTAMP*_OLD
SO_TIMESTAMP, SO_TIMESTAMPNS and SO_TIMESTAMPING options, the
way they are currently defined, are not y2038 safe.
Subsequent patches in the series add new y2038 safe versions
of these options which provide 64 bit timestamps on all
architectures uniformly.
Hence, rename existing options with OLD tag suffixes.

Also note that the kernel will not use the untagged SO_TIMESTAMP*
and SCM_TIMESTAMP* options internally anymore.

Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Cc: deller@gmx.de
Cc: dhowells@redhat.com
Cc: jejb@parisc-linux.org
Cc: ralf@linux-mips.org
Cc: rth@twiddle.net
Cc: linux-afs@lists.infradead.org
Cc: linux-alpha@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: linux-parisc@vger.kernel.org
Cc: linux-rdma@vger.kernel.org
Cc: sparclinux@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-03 11:17:30 -08:00
David Herrmann
f5dd3d0c96 net: introduce SO_BINDTOIFINDEX sockopt
This introduces a new generic SOL_SOCKET-level socket option called
SO_BINDTOIFINDEX. It behaves similarly to SO_BINDTODEVICE, but takes a
network interface index as argument, rather than the network interface
name.

User-space often refers to network-interfaces via their index, but has
to temporarily resolve it to a name for a call into SO_BINDTODEVICE.
This might pose problems when the network-device is renamed
asynchronously by other parts of the system. When this happens, the
SO_BINDTODEVICE might either fail, or worse, it might bind to the wrong
device.

In most cases user-space only ever operates on devices which they
either manage themselves, or otherwise have a guarantee that the device
name will not change (e.g., devices that are UP cannot be renamed).
However, particularly in libraries this guarantee is non-obvious and it
would be nice if that race-condition would simply not exist. It would
make it easier for those libraries to operate even in situations where
the device-name might change under the hood.

A real use-case that we recently hit is trying to start the network
stack early in the initrd but make it survive into the real system.
Existing distributions rename network-interfaces during the transition
from initrd into the real system. This, obviously, cannot affect
devices that are up and running (unless you also consider moving them
between network-namespaces). However, the network manager now has to
make sure its management engine for dormant devices will not run in
parallel to these renames. Particularly, when you offload operations
like DHCP into separate processes, these might set up their sockets
early, and thus have to resolve the device-name possibly running into
this race-condition.

By avoiding a call to resolve the device-name, we no longer depend on
the name and can run network setup of dormant devices in parallel to
the transition off the initrd. The SO_BINDTOIFINDEX socket option plugs
this race.
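
A minimal user-space sketch of the call described above. The helper name
bind_to_ifindex() is hypothetical, and the fallback value 62 is the
asm-generic number, which is an assumption on architectures that define
their own socket option values.

```c
#include <errno.h>
#include <net/if.h>
#include <sys/socket.h>

/*
 * Fallback for older libc headers. 62 is the asm-generic value; some
 * architectures use a different number, so treat this as an assumption.
 */
#ifndef SO_BINDTOIFINDEX
#define SO_BINDTOIFINDEX 62
#endif

/*
 * Bind fd to the interface with the given index, without ever resolving
 * the index to a name. Returns 0 on success or a negative errno value.
 */
static int bind_to_ifindex(int fd, unsigned int ifindex)
{
	int idx = (int)ifindex;

	if (setsockopt(fd, SOL_SOCKET, SO_BINDTOIFINDEX, &idx, sizeof(idx)) < 0)
		return -errno;
	return 0;
}
```

Because only the index is passed, a concurrent rename of the interface cannot
redirect the bind to a different device. Depending on the kernel version the
call may need CAP_NET_RAW, so callers should be prepared for -EPERM.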

Reviewed-by: Tom Gundersen <teg@jklm.no>
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-17 14:55:51 -08:00
Masahiro Yamada
d6e4b3e326 arch: remove redundant UAPI generic-y defines
Now that Kbuild automatically creates asm-generic wrappers for missing
mandatory headers, it is redundant to list the same headers in
generic-y and mandatory-y.

Suggested-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
2019-01-06 10:22:15 +09:00
Masahiro Yamada
d4ce5458ea arch: remove stale comments "UAPI Header export list"
These comments are leftovers of commit fcc8487d47 ("uapi: export all
headers under uapi directories").

Prior to that commit, exported headers must be explicitly added to
header-y. Now, all headers under the uapi/ directories are exported.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2019-01-06 09:46:51 +09:00
Linus Torvalds
e0c38a4d1f Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:

 1) New ipset extensions for matching on destination MAC addresses, from
    Stefano Brivio.

 2) Add ipv4 ttl and tos, plus ipv6 flow label and hop limit offloads to
    nfp driver. From Stefano Brivio.

 3) Implement GRO for plain UDP sockets, from Paolo Abeni.

 4) Lots of work from Michał Mirosław to eliminate the VLAN_TAG_PRESENT
    bit so that we could support the entire vlan_tci value.

 5) Rework the IPSEC policy lookups to better optimize more usecases,
    from Florian Westphal.

 6) Infrastructure changes eliminating direct manipulation of SKB lists
    wherever possible, and to always use the appropriate SKB list
    helpers. This work is still ongoing...

 7) Lots of PHY driver and state machine improvements and
    simplifications, from Heiner Kallweit.

 8) Various TSO deferral refinements, from Eric Dumazet.

 9) Add ntuple filter support to aquantia driver, from Dmitry Bogdanov.

10) Batch dropping of XDP packets in tuntap, from Jason Wang.

11) Lots of cleanups and improvements to the r8169 driver from Heiner
    Kallweit, including support for ->xmit_more. This driver has been
    getting some much needed love since he started working on it.

12) Lots of new forwarding selftests from Petr Machata.

13) Enable VXLAN learning in mlxsw driver, from Ido Schimmel.

14) Packed ring support for virtio, from Tiwei Bie.

15) Add new Aquantia AQtion USB driver, from Dmitry Bezrukov.

16) Add XDP support to dpaa2-eth driver, from Ioana Ciocoi Radulescu.

17) Implement coalescing on TCP backlog queue, from Eric Dumazet.

18) Implement carrier change in tun driver, from Nicolas Dichtel.

19) Support msg_zerocopy in UDP, from Willem de Bruijn.

20) Significantly improve garbage collection of neighbor objects when
    the table has many PERMANENT entries, from David Ahern.

21) Remove egdev usage from nfp and mlx5, and remove the facility
    completely from the tree as it no longer has any users. From Oz
    Shlomo and others.

22) Add a NETDEV_PRE_CHANGEADDR so that drivers can veto the change and
    therefore abort the operation before the commit phase (which is the
    NETDEV_CHANGEADDR event). From Petr Machata.

23) Add indirect call wrappers to avoid retpoline overhead, and use them
    in the GRO code paths. From Paolo Abeni.

24) Add support for netlink FDB get operations, from Roopa Prabhu.

25) Support bloom filter in mlxsw driver, from Nir Dotan.

26) Add SKB extension infrastructure. This consolidates the handling of
    the auxiliary SKB data used by IPSEC and bridge netfilter, and is
    designed to support the needs of MPTCP which could be integrated in
    the future.

27) Lots of XDP TX optimizations in mlx5 from Tariq Toukan.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1845 commits)
  net: dccp: fix kernel crash on module load
  drivers/net: appletalk/cops: remove redundant if statement and mask
  bnx2x: Fix NULL pointer dereference in bnx2x_del_all_vlans() on some hw
  net/net_namespace: Check the return value of register_pernet_subsys()
  net/netlink_compat: Fix a missing check of nla_parse_nested
  ieee802154: lowpan_header_create check must check daddr
  net/mlx4_core: drop useless LIST_HEAD
  mlxsw: spectrum: drop useless LIST_HEAD
  net/mlx5e: drop useless LIST_HEAD
  iptunnel: Set tun_flags in the iptunnel_metadata_reply from src
  net/mlx5e: fix semicolon.cocci warnings
  staging: octeon: fix build failure with XFRM enabled
  net: Revert recent Spectre-v1 patches.
  can: af_can: Fix Spectre v1 vulnerability
  packet: validate address length if non-zero
  nfc: af_nfc: Fix Spectre v1 vulnerability
  phonet: af_phonet: Fix Spectre v1 vulnerability
  net: core: Fix Spectre v1 vulnerability
  net: minor cleanup in skb_ext_add()
  net: drop the unused helper skb_ext_get()
  ...
2018-12-27 13:04:52 -08:00
Firoz Khan
99bf73ebf9 mips: generate uapi header and system call table files
The system call table generation script must be run to generate
the unistd_(nr_)n64/n32/o32.h and
syscall_table_32_o32/64_n64/64_n32/64-o32.h files. This patch
contains the changes which invoke the script.

This patch will generate the unistd_(nr_)n64/n32/o32.h and
syscall_table_32_o32/64_n64/64-n32/64-o32.h files using the
syscall table generation script invoked by parisc/Makefile,
and the generated files must be identical to the removed
files.

The generated uapi header file will be included in
uapi/asm/unistd.h and the generated system call table header
file will be included by the
kernel/scall32-o32/64-n64/64-n32/64-o32.S files.

Signed-off-by: Firoz Khan <firoz.khan@linaro.org>
Signed-off-by: Paul Burton <paul.burton@mips.com>
Cc: linux-mips@vger.kernel.org
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: James Hogan <jhogan@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: y2038@lists.linaro.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: arnd@arndb.de
Cc: deepa.kernel@gmail.com
Cc: marcin.juszkiewicz@linaro.org
2018-12-14 11:19:02 -08:00
Firoz Khan
be856439c9 mips: add +1 to __NR_syscalls in uapi header
On all other architectures, __NR_syscalls holds a value equal to
the last system call number +1.

But on the mips architecture, __NR_syscalls holds a value equal to
the total number of system calls in the architecture. One of the
patches in this patch series will generate the uapi header files.

In order to make the implementation common across all
architectures, add +1 to __NR_syscalls, so that it equals the last
system call number +1.

Signed-off-by: Firoz Khan <firoz.khan@linaro.org>
Signed-off-by: Paul Burton <paul.burton@mips.com>
Cc: linux-mips@vger.kernel.org
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: James Hogan <jhogan@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: y2038@lists.linaro.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: arnd@arndb.de
Cc: deepa.kernel@gmail.com
Cc: marcin.juszkiewicz@linaro.org
2018-12-14 11:19:01 -08:00
Firoz Khan
a5ee2be91a mips: remove unused macros
Remove __NR_Linux_syscalls from uapi/asm/unistd.h as
there are no users of the NR_syscalls macro in the mips
kernel.

MAX_SYSCALL_NO can also be removed following commit
2957c9e61e ("[MIPS] IRIX: Goodbye and thanks for
all the fish"), eight years ago.

Signed-off-by: Firoz Khan <firoz.khan@linaro.org>
[paul.burton@mips.com:
 - Drop the removal of NR_syscalls which is used by
   kernel/trace/trace.h.]
Signed-off-by: Paul Burton <paul.burton@mips.com>
Cc: linux-mips@vger.kernel.org
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: James Hogan <jhogan@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: y2038@lists.linaro.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: arnd@arndb.de
Cc: deepa.kernel@gmail.com
Cc: marcin.juszkiewicz@linaro.org
2018-12-14 11:13:40 -08:00
Firoz Khan
ef2512c826 mips: add __NR_syscalls along with __NR_Linux_syscalls
The __NR_Linux_syscalls macro holds the number of system calls
that exist in the mips architecture. We have to change the value
of __NR_Linux_syscalls if we add or delete a system call.

One of the patches in this patch series has a script which
will generate a uapi header based on the syscall.tbl file.
The syscall.tbl file contains the information on the total
number of system calls. So we have two options for updating
the __NR_Linux_syscalls value.

1. Update __NR_Linux_syscalls in asm/unistd.h manually
   by counting the number of system calls. No need to update
   __NR_Linux_syscalls until we either add a new system
   call or delete an existing system call.

2. We can keep this feature in the above-mentioned script,
   which will count the number of syscalls and keep it in
   a generated file. In this case we don't need to explicitly
   update __NR_Linux_syscalls in the asm/unistd.h file.

The 2nd option is the recommended one. For that, I
added the __NR_syscalls macro in uapi/asm/unistd.h along
with __NR_Linux_syscalls. The __NR_syscalls macro was also
added to keep the naming convention the same across all
architectures. While __NR_syscalls isn't strictly part of
the uapi, having it as part of the generated header
simplifies the implementation. We also need to enclose
this macro with #ifdef __KERNEL__ to avoid side effects.
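
The counting rule of the 2nd option can be sketched in C. This is only an
illustration: the real generator is a script, and both the helper name
nr_syscalls_from_table() and the assumed table format (one
"<number> <abi> <name> ..." entry per line, comment lines starting with '#')
are assumptions made for this sketch.

```c
#include <stdio.h>
#include <string.h>

/*
 * Scan a syscall.tbl-style text and return the value __NR_syscalls
 * would take under the "last system call number + 1" rule.
 * Only lines that begin with a digit are treated as entries, so
 * comment lines and empty lines are skipped.
 * Returns 0 if the table holds no entries.
 */
static unsigned int nr_syscalls_from_table(const char *tbl)
{
	unsigned int last = 0;
	int seen = 0;

	while (*tbl) {
		unsigned int nr;
		const char *eol = strchr(tbl, '\n');

		if (*tbl >= '0' && *tbl <= '9' && sscanf(tbl, "%u", &nr) == 1) {
			if (!seen || nr > last)
				last = nr;
			seen = 1;
		}
		if (!eol)
			break;
		tbl = eol + 1;	/* continue with the next line */
	}
	return seen ? last + 1 : 0;
}
```

With entries numbered 0, 1 and 5 the function returns 6, not 3, which is the
difference between "last number + 1" and a plain count when the numbering has
holes.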

Signed-off-by: Firoz Khan <firoz.khan@linaro.org>
Signed-off-by: Paul Burton <paul.burton@mips.com>
Cc: linux-mips@vger.kernel.org
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: James Hogan <jhogan@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: y2038@lists.linaro.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: arnd@arndb.de
Cc: deepa.kernel@gmail.com
Cc: marcin.juszkiewicz@linaro.org
2018-12-13 11:06:46 -08:00
Jiong Wang
ee94b90c8a mips: bpf: implement jitting of BPF_ALU | BPF_ARSH | BPF_X
Jitting of BPF_K is supported already, but not BPF_X. This patch completes
the support for the latter on both MIPS and microMIPS.

Cc: Paul Burton <paul.burton@mips.com>
Cc: linux-mips@vger.kernel.org
Acked-by: Paul Burton <paul.burton@mips.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-12-07 13:30:48 -08:00
Jiong Wang
17f6c83fb5 mips: bpf: fix encoding bug for mm_srlv32_op
For microMIPS, srlv inside the POOL32A encoding space should use the 0x50
sub-opcode, NOT 0x90.

Some early versions of the ISA doc describe the encoding as 0x90 for both
srlv and srav; this looks to me like a typo. I checked the Binutils libopcode
implementation, which uses 0x50 for srlv and 0x90 for srav.
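
Why the two sub-opcodes must not be confused can be seen from what the
instructions compute. The sketch below models the two shifts in portable C;
the constant and function names are hypothetical, and only the values 0x50
and 0x90 come from the text above.

```c
#include <stdint.h>

/* POOL32A sub-opcodes as stated in the commit message above. */
enum {
	DEMO_MM_SRLV32_OP = 0x50,	/* logical shift right, variable amount */
	DEMO_MM_SRAV_OP   = 0x90,	/* arithmetic shift right, variable amount */
};

/* srlv: shift right, filling the vacated high bits with zeros. */
static uint32_t srlv32(uint32_t value, uint32_t amount)
{
	return value >> (amount & 31);
}

/*
 * srav (the operation BPF_ARSH needs): shift right, filling the vacated
 * high bits with copies of the sign bit. Written without relying on the
 * implementation-defined behaviour of >> on negative signed values.
 */
static uint32_t srav32(uint32_t value, uint32_t amount)
{
	uint32_t shift = amount & 31;
	uint32_t result = value >> shift;

	if ((value & 0x80000000u) && shift)
		result |= ~(0xffffffffu >> shift);
	return result;
}
```

For a value with the top bit set the two results differ, so emitting the srav
encoding where srlv was intended would silently turn a logical shift into an
arithmetic one.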

v1->v2:
  - Keep mm_srlv32_op sorted by value.

Fixes: f31318fdf3 ("MIPS: uasm: Add srlv uasm instruction")
Cc: Markos Chandras <markos.chandras@imgtec.com>
Cc: Paul Burton <paul.burton@mips.com>
Cc: linux-mips@vger.kernel.org
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-12-07 13:29:48 -08:00
Sean Young
1287533d3d MIPS: Remove superfluous check for __linux__
When building BPF code using "clang -target bpf -c", clang does not
define __linux__.

To build BPF IR decoders, the header linux/lirc.h is needed, which
includes linux/types.h. Currently this workaround is needed:

https://git.linuxtv.org/v4l-utils.git/commit/?id=dd3ff81f58c4e1e6f33765dc61ad33c48ae6bb07

This check might otherwise be useful to stop users from using a non-linux
compiler, but if you're doing that you are going to have a lot more
trouble anyway.

Signed-off-by: Sean Young <sean@mess.org>
Signed-off-by: Paul Burton <paul.burton@mips.com>
Patchwork: https://patchwork.linux-mips.org/patch/21149/
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: James Hogan <jhogan@kernel.org>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
2018-11-19 12:55:02 -08:00
Linus Torvalds
5bd4af34a0 TTY/Serial patches for 4.20-rc1
Here is the big tty and serial pull request for 4.20-rc1
 
 Lots of little things here, including a merge from the SPI tree in order
 to keep things simpler for everyone to sync around for one platform.
 
 Major stuff is:
 	- tty buffer clearing after use
 	- atmel_serial fixes and additions
 	- xilinx uart driver updates
 and of course, lots of tiny fixes and additions to individual serial
 drivers.
 
 All of these have been in linux-next with no reported issues for a
 while.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Merge tag 'tty-4.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull tty/serial updates from Greg KH:
 "Here is the big tty and serial pull request for 4.20-rc1

  Lots of little things here, including a merge from the SPI tree in
  order to keep things simpler for everyone to sync around for one
  platform.

  Major stuff is:

   - tty buffer clearing after use

   - atmel_serial fixes and additions

   - xilinx uart driver updates

  and of course, lots of tiny fixes and additions to individual serial
  drivers.

  All of these have been in linux-next with no reported issues for a
  while"

* tag 'tty-4.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (66 commits)
  of: base: Change logic in of_alias_get_alias_list()
  of: base: Fix english spelling in of_alias_get_alias_list()
  serial: sh-sci: do not warn if DMA transfers are not supported
  serial: uartps: Do not allow use aliases >= MAX_UART_INSTANCES
  tty: check name length in tty_find_polling_driver()
  serial: sh-sci: Add r8a77990 support
  tty: wipe buffer if not echoing data
  tty: wipe buffer.
  serial: fsl_lpuart: Remove the alias node dependence
  TTY: sn_console: Replace spin_is_locked() with spin_trylock()
  Revert "serial:serial_core: Allow use of CTS for PPS line discipline"
  serial: 8250_uniphier: add auto-flow-control support
  serial: 8250_uniphier: flatten probe function
  serial: 8250_uniphier: remove unused "fifo-size" property
  dt-bindings: serial: sh-sci: Document r8a7744 bindings
  serial: uartps: Fix missing unlock on error in cdns_get_id()
  tty/serial: atmel: add ISO7816 support
  tty/serial_core: add ISO7816 infrastructure
  serial:serial_core: Allow use of CTS for PPS line discipline
  serial: docs: Fix filename for serial reference implementation
  ...
2018-10-29 10:42:20 -07:00
Eric W. Biederman
f283801851 signal: Remove the need for __ARCH_SI_PREAMBLE_SIZE and SI_PAD_SIZE
Rework the definition of struct siginfo so that the array padding
struct siginfo to SI_MAX_SIZE can be placed in a union alongside
the rest of the struct siginfo members.  The result is that we no
longer need the __ARCH_SI_PREAMBLE_SIZE or SI_PAD_SIZE definitions.
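
The layout idea can be sketched with a simplified structure. Everything named
demo_* or DEMO_* below is hypothetical, the field set is illustrative only,
and the value 128 for the maximum size is an assumption of this sketch.

```c
#include <assert.h>

/* Stand-in for SI_MAX_SIZE: the size the structure must always occupy. */
#define DEMO_SI_MAX_SIZE 128

/*
 * Simplified stand-in for struct siginfo. The padding array shares a
 * union with the real members, so the overall size is fixed at
 * DEMO_SI_MAX_SIZE without computing a separate pad length from the
 * size of the leading fields.
 */
struct demo_siginfo {
	union {
		struct {
			int signo;
			int err;
			int code;
			long payload[4];	/* placeholder for per-signal data */
		} fields;
		int pad[DEMO_SI_MAX_SIZE / sizeof(int)];
	};
};

static_assert(sizeof(struct demo_siginfo) == DEMO_SI_MAX_SIZE,
	      "the padding union must fix the size at DEMO_SI_MAX_SIZE");
```

Because the pad array spans the whole structure rather than only its tail,
no architecture-specific preamble size is needed to work out how much
padding to add.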

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-10-03 16:46:43 +02:00
Nicolas Ferre
ad8c0eaa0a tty/serial_core: add ISO7816 infrastructure
Add the ISO7816 ioctl and associated accessors and data structure.
Drivers can then use this common implementation to handle ISO7816
(smart cards).

Signed-off-by: Nicolas Ferre <nicolas.ferre@microchip.com>
[ludovic.desroches@microchip.com: squash and rebase, removal of gpios, checkpatch fixes]
Signed-off-by: Ludovic Desroches <ludovic.desroches@microchip.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-02 13:38:55 -07:00
Richard Cochran
80b14dee2b net: Add a new socket option for a future transmit time.
This patch introduces SO_TXTIME. User space enables this option in
order to pass a desired future transmit time in a CMSG when calling
sendmsg(2). The argument to this socket option is an 8-byte struct
provided by the uapi header net_tstamp.h defined as:

struct sock_txtime {
	clockid_t 	clockid;
	u32		flags;
};

Note that new fields were added to struct sock by filling a 2-byte
hole found in the struct. For that reason, neither the struct size nor
the number of cachelines was altered.
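
A hedged user-space sketch of how the option might be used. The demo_*
names are hypothetical, the local struct only mirrors the layout quoted
above (real code would take struct sock_txtime from the uapi header), the
fallback value 61 for SO_TXTIME is the asm-generic one and is an
assumption for older headers, and the control message payload is assumed
to be a 64-bit time in nanoseconds:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <time.h>

#ifndef SO_TXTIME
#define SO_TXTIME 61		/* assumption: asm-generic value for old headers */
#endif
#ifndef SCM_TXTIME
#define SCM_TXTIME SO_TXTIME
#endif

/* Local mirror of the 8-byte struct quoted in the commit message. */
struct demo_sock_txtime {
	clockid_t clockid;
	uint32_t flags;
};

/* Enable SO_TXTIME on a socket; returns setsockopt()'s result. */
int demo_enable_txtime(int fd, clockid_t clockid)
{
	struct demo_sock_txtime cfg = { .clockid = clockid, .flags = 0 };

	return setsockopt(fd, SOL_SOCKET, SO_TXTIME, &cfg, sizeof(cfg));
}

/* Attach a transmit time to msg as a control message for sendmsg(2). */
int demo_put_txtime(struct msghdr *msg, void *ctrl, size_t ctrl_len,
		    uint64_t txtime_ns)
{
	struct cmsghdr *cm;

	assert(msg != NULL && ctrl != NULL);
	if (ctrl_len < CMSG_SPACE(sizeof(txtime_ns)))
		return -1;
	memset(ctrl, 0, ctrl_len);
	msg->msg_control = ctrl;
	msg->msg_controllen = CMSG_SPACE(sizeof(txtime_ns));
	cm = CMSG_FIRSTHDR(msg);
	cm->cmsg_level = SOL_SOCKET;
	cm->cmsg_type = SCM_TXTIME;
	cm->cmsg_len = CMSG_LEN(sizeof(txtime_ns));
	memcpy(CMSG_DATA(cm), &txtime_ns, sizeof(txtime_ns));
	return 0;
}
```

A caller would run demo_enable_txtime() once per socket and then
demo_put_txtime() before each sendmsg(2) call that needs a deadline.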

Signed-off-by: Richard Cochran <rcochran@linutronix.de>
Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-04 22:30:27 +09:00
Paul Burton
4337aac1e1 MIPS: Wire up io_pgetevents syscall
Wire up the io_pgetevents syscall that was introduced by commit
7a074e96de ("aio: implement io_pgetevents").

Signed-off-by: Paul Burton <paul.burton@mips.com>
Patchwork: https://patchwork.linux-mips.org/patch/19593/
Cc: James Hogan <jhogan@kernel.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
2018-06-19 21:14:29 -07:00
Paul Burton
e426b3754a MIPS: Wire up the restartable sequences (rseq) syscall
Wire up the restartable sequences (rseq) syscall for MIPS. This was
introduced by commit d7822b1e24 ("rseq: Introduce restartable
sequences system call") & MIPS now supports the prerequisites.

Signed-off-by: Paul Burton <paul.burton@mips.com>
Reviewed-by: James Hogan <jhogan@kernel.org>
Patchwork: https://patchwork.linux-mips.org/patch/19525/
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
2018-06-19 21:14:09 -07:00
Arnd Bergmann
3f3a4b3fbf y2038: mips: Extend sysvipc data structures
MIPS is the weirdest case for sysvipc, because each of the
three data structures is done differently:

* msqid64_ds has padding in the right place so we could in theory
  extend this one to just have 64-bit values instead of time_t.
  As this does not work for most of the other combinations,
  we just handle it in the common manner though.

* semid64_ds has no padding for 64-bit time_t, but has two reserved
  'long' fields, which are sufficient to extend the sem_otime
  and sem_ctime fields to 64 bit. In order to do this, the libc
  implementation will have to copy the data into another structure
  that has the fields in a different order. MIPS is the only
  architecture with this problem, so this is best done in MIPS
  specific libc code.

* shmid64_ds is slightly worse than that, because it has three
  time_t fields but only two unused 32-bit words. As a workaround,
  we extend each field only by 16 bits, ending up with 48-bit
  timestamps that user space again has to work around by itself.

The compat versions of the data structures are changed in the
same way.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2018-04-20 16:20:04 +02:00
Michal Hocko
a4ff8e8620 mm: introduce MAP_FIXED_NOREPLACE
Patch series "mm: introduce MAP_FIXED_NOREPLACE", v2.

This started as a follow-up to the discussion [3][4] of the runtime
failure caused by hardening patch [5], which removes MAP_FIXED from the
elf loader because MAP_FIXED is inherently dangerous: it might silently
clobber an existing underlying mapping (e.g. the stack).
The reason for the failure is that some architectures enforce an
alignment for the given address hint without MAP_FIXED used (e.g.  for
shared or file backed mappings).

One way around this would be excluding those archs which do alignment
tricks from the hardening [6].  The patch is really trivial but it has
been objected, rightfully so, that this screams for a more generic
solution.  We basically want a non-destructive MAP_FIXED.

The first patch introduced MAP_FIXED_NOREPLACE which enforces the given
address but unlike MAP_FIXED it fails with EEXIST if the given range
conflicts with an existing one.  The flag is introduced as a completely
new one rather than a MAP_FIXED extension because of the backward
compatibility.  We really want a never-clobber semantic even on older
kernels which do not recognize the flag.  Unfortunately mmap sucks
wrt flags evaluation because we do not EINVAL on unknown flags.  On
those kernels we would simply use the traditional hint based semantic so
the caller can still get a different address (which sucks) but at least
not silently corrupt an existing mapping.  I do not see a good way
around that.  Except we won't expose the new semantic to the
userspace at all.

It seems there are users who would like to have something like that.
Jemalloc has been mentioned by Michael Ellerman [7]

Florian Weimer has mentioned the following:
: glibc ld.so currently maps DSOs without hints.  This means that the kernel
: will map right next to each other, and the offsets between them are completely
: predictable.  We would like to change that and supply a random address in a
: window of the address space.  If there is a conflict, we do not want the
: kernel to pick a non-random address. Instead, we would try again with a
: random address.

John Hubbard has mentioned a CUDA example:
: a) Searches /proc/<pid>/maps for a "suitable" region of available
: VA space.  "Suitable" generally means it has to have a base address
: within a certain limited range (a particular device model might
: have odd limitations, for example), it has to be large enough, and
: alignment has to be large enough (again, various devices may have
: constraints that lead us to do this).
:
: This is of course subject to races with other threads in the process.
:
: Let's say it finds a region starting at va.
:
: b) Next it does:
:     p = mmap(va, ...)
:
: *without* setting MAP_FIXED, of course (so va is just a hint), to
: attempt to safely reserve that region. If p != va, then in most cases,
: this is a failure (almost certainly due to another thread getting a
: mapping from that region before we did), and so this layer now has to
: call munmap(), before returning a "failure: retry" to upper layers.
:
:     IMPROVEMENT: --> if instead, we could call this:
:
:             p = mmap(va, ... MAP_FIXED_NOREPLACE ...)
:
:         , then we could skip the munmap() call upon failure. This
:         is a small thing, but it is useful here. (Thanks to Piotr
:         Jaroszynski and Mark Hairgrove for helping me get that detail
:         exactly right, btw.)
:
: c) After that, CUDA suballocates from p, via:
:
:      q = mmap(sub_region_start, ... MAP_FIXED ...)
:
: Interestingly enough, "freeing" is also done via MAP_FIXED, and
: setting PROT_NONE to the subregion. Anyway, I just included (c) for
: general interest.

Atomic address range probing in the multithreaded programs in general
sounds like an interesting thing to me.

The second patch simply replaces MAP_FIXED use in elf loader by
MAP_FIXED_NOREPLACE.  I believe other places which rely on MAP_FIXED
should follow.  Actually real MAP_FIXED usages should be documented
properly and they should be more of an exception.

[1] http://lkml.kernel.org/r/20171116101900.13621-1-mhocko@kernel.org
[2] http://lkml.kernel.org/r/20171129144219.22867-1-mhocko@kernel.org
[3] http://lkml.kernel.org/r/20171107162217.382cd754@canb.auug.org.au
[4] http://lkml.kernel.org/r/1510048229.12079.7.camel@abdul.in.ibm.com
[5] http://lkml.kernel.org/r/20171023082608.6167-1-mhocko@kernel.org
[6] http://lkml.kernel.org/r/20171113094203.aofz2e7kueitk55y@dhcp22.suse.cz
[7] http://lkml.kernel.org/r/87efp1w7vy.fsf@concordia.ellerman.id.au

This patch (of 2):

MAP_FIXED is used quite often to enforce mapping at the particular range.
The main problem of this flag is, however, that it is inherently dangerous
because it unmaps existing mappings covered by the requested range.  This
can cause silent memory corruptions.  Some of them even with serious
security implications.  While the current semantic might be really
desirable in many cases there are others which would want to enforce the
given range but rather see a failure than a silent memory corruption on a
clashing range.  Please note that there is no guarantee that a given range
is obeyed by the mmap even when it is free - e.g.  arch specific code is
allowed to apply an alignment.

Introduce a new MAP_FIXED_NOREPLACE flag for mmap to achieve this
behavior.  It has the same semantic as MAP_FIXED wrt.  the given address
request with a single exception that it fails with EEXIST if the requested
address is already covered by an existing mapping.  We still do rely on
get_unmapped_area to handle all the arch specific MAP_FIXED treatment and
check for a conflicting vma after it returns.

The flag is introduced as a completely new one rather than a MAP_FIXED
extension because of the backward compatibility.  We really want a
never-clobber semantic even on older kernels which do not recognize the
flag.  Unfortunately mmap sucks wrt.  flags evaluation because we do not
EINVAL on unknown flags.  On those kernels we would simply use the
traditional hint based semantic so the caller can still get a different
address (which sucks) but at least not silently corrupt an existing
mapping.  I do not see a good way around that.
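
A caller that wants the never-clobber behavior on both new and old
kernels might wrap the flag roughly as follows. This is a sketch under
stated assumptions: demo_map_exactly_at is a hypothetical helper, and the
fallback value 0x100000 is the asm-generic one, used only when the libc
headers do not define the flag:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

#ifndef MAP_FIXED_NOREPLACE
#define MAP_FIXED_NOREPLACE 0x100000	/* assumption: asm-generic value */
#endif

/* Map len bytes exactly at addr, or fail without touching other mappings. */
void *demo_map_exactly_at(void *addr, size_t len)
{
	void *p = mmap(addr, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED_NOREPLACE,
		       -1, 0);

	if (p == MAP_FAILED)
		return NULL;	/* EEXIST when the range is already mapped */
	if (p != addr) {
		/* Older kernel: flag ignored, addr was only a hint. */
		munmap(p, len);
		errno = EEXIST;
		return NULL;
	}
	return p;
}
```

The p != addr branch is what covers kernels that do not recognize the
flag: the caller gets a failure it can retry instead of a mapping at an
unexpected address.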

[mpe@ellerman.id.au: fix whitespace]
[fail on clashing range with EEXIST as per Florian Weimer]
[set MAP_FIXED before round_hint_to_min as per Khalid Aziz]
Link: http://lkml.kernel.org/r/20171213092550.2774-2-mhocko@kernel.org
Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Khalid Aziz <khalid.aziz@oracle.com>
Cc: Russell King - ARM Linux <linux@armlinux.org.uk>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Cc: Joel Stanley <joel@jms.id.au>
Cc: Kees Cook <keescook@chromium.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Jason Evans <jasone@google.com>
Cc: David Goldblatt <davidtgoldblatt@gmail.com>
Cc: Edward Tomasz Napierała <trasz@FreeBSD.org>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11 10:28:38 -07:00
Marcin Nowakowski
256211f2b0 MIPS: Add crc instruction support flag to elf_hwcap
Indicate that CRC32 and CRC32C instructions are supported by the CPU
through elf_hwcap flags.

This will be used by a follow-up commit that introduces crc32(c) crypto
acceleration modules and is required by the GENERIC_CPU_AUTOPROBE feature.
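
From user space, such a flag would typically be probed through the
auxiliary vector. A rough sketch follows; the demo_* names are
hypothetical and the bit position used for the CRC32 flag is an
assumption that should be checked against the arch's uapi asm/hwcap.h:

```c
#include <sys/auxv.h>

/* Assumption: bit position of the MIPS CRC32 flag in AT_HWCAP. */
#define DEMO_HWCAP_MIPS_CRC32 (1UL << 2)

/* Pure helper so the bit test can be checked without real hardware. */
int demo_hwcap_has(unsigned long hwcap, unsigned long bit)
{
	return (hwcap & bit) != 0;
}

/* Ask the kernel-provided auxiliary vector about the running CPU. */
int demo_cpu_has_crc32(void)
{
	return demo_hwcap_has(getauxval(AT_HWCAP), DEMO_HWCAP_MIPS_CRC32);
}
```

On a non-MIPS machine the same bit means something else, so
demo_cpu_has_crc32() is only meaningful when built for MIPS.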

Signed-off-by: Marcin Nowakowski <marcin.nowakowski@mips.com>
Signed-off-by: James Hogan <jhogan@kernel.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/18600/
2018-02-19 20:50:35 +00:00
Al Viro
7a163b2195 unify {de,}mangle_poll(), get rid of kernel-side POLL...
except, again, POLLFREE and POLL_BUSY_LOOP.

With this, we finally get to the promised end result:

 - POLL{IN,OUT,...} are plain integers and *not* in __poll_t, so any
   stray instances of ->poll() still using those will be caught by
   sparse.

 - eventpoll.c and select.c warning-free wrt __poll_t

 - no more kernel-side definitions of POLL... - userland ones are
   visible through the entire kernel (and used pretty much only for
   mangle/demangle)

 - same behavior as after the first series (i.e. sparc et al. epoll(2)
   working correctly).

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-11 14:37:22 -08:00
Linus Torvalds
168fe32a07 Merge branch 'misc.poll' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull poll annotations from Al Viro:
 "This introduces a __bitwise type for POLL### bitmap, and propagates
  the annotations through the tree. Most of that stuff is as simple as
  'make ->poll() instances return __poll_t and do the same to local
  variables used to hold the future return value'.

  Some of the obvious brainos found in the process are fixed (e.g. POLLIN
  misspelled as POLL_IN). At that point the amount of sparse warnings is
  low and most of them are for genuine bugs - e.g. ->poll() instance
  deciding to return -EINVAL instead of a bitmap. I hadn't touched those
  in this series - it's large enough as it is.

  Another problem it has caught was eventpoll() ABI mess; select.c and
  eventpoll.c assumed that corresponding POLL### and EPOLL### were
  equal. That's true for some, but not all of them - EPOLL### are
  arch-independent, but POLL### are not.

  The last commit in this series separates userland POLL### values from
  the (now arch-independent) kernel-side ones, converting between them
  in the few places where they are copied to/from userland. AFAICS, this
  is the least disruptive fix preserving poll(2) ABI and making epoll()
  work on all architectures.

  As it is, it's simply broken on sparc - try to give it EPOLLWRNORM and
  it will trigger only on what would've triggered EPOLLWRBAND on other
  architectures. EPOLLWRBAND and EPOLLRDHUP, OTOH, are never triggered
  at all on sparc. With this patch they should work consistently on all
  architectures"

* 'misc.poll' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (37 commits)
  make kernel-side POLL... arch-independent
  eventpoll: no need to mask the result of epi_item_poll() again
  eventpoll: constify struct epoll_event pointers
  debugging printk in sg_poll() uses %x to print POLL... bitmap
  annotate poll(2) guts
  9p: untangle ->poll() mess
  ->si_band gets POLL... bitmap stored into a user-visible long field
  ring_buffer_poll_wait() return value used as return value of ->poll()
  the rest of drivers/*: annotate ->poll() instances
  media: annotate ->poll() instances
  fs: annotate ->poll() instances
  ipc, kernel, mm: annotate ->poll() instances
  net: annotate ->poll() instances
  apparmor: annotate ->poll() instances
  tomoyo: annotate ->poll() instances
  sound: annotate ->poll() instances
  acpi: annotate ->poll() instances
  crypto: annotate ->poll() instances
  block: annotate ->poll() instances
  x86: annotate ->poll() instances
  ...
2018-01-30 17:58:07 -08:00
Al Viro
09d1415d24 signal/mips: switch mips to generic siginfo
... having taught the latter that si_errno and si_code might be
swapped.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2018-01-12 14:34:48 -06:00
Hendrik Brueckner
c895f6f703 bpf: correct broken uapi for BPF_PROG_TYPE_PERF_EVENT program type
Commit 0515e5999a ("bpf: introduce BPF_PROG_TYPE_PERF_EVENT
program type") introduced the bpf_perf_event_data structure which
exports the pt_regs structure.  This is OK for multiple architectures
but fails for s390 and arm64, which do not export pt_regs.  Programs
using it, for example the bpf selftests, fail to compile on these
architectures.

For s390, exporting the pt_regs is not an option because s390 wants
to allow changes to it.  For arm64, there is a user_pt_regs structure
that covers parts of the pt_regs structure for use by user space.

To solve the broken uapi for s390 and arm64, introduce an abstract
type for pt_regs and add an asm/bpf_perf_event.h file that makes
the type concrete.  An asm-generic header file covers the architectures that
export pt_regs today.
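
The shape of that indirection can be sketched as below. Every name here
is hypothetical (demo_ prefix) and the register layouts are made up; the
point is only that the shared struct names an abstract type and a
per-arch header decides what stands behind it:

```c
#include <stdint.h>

/* Kernel-internal register layout: made up, free to change. */
struct demo_pt_regs {
	unsigned long gpr[16];
	unsigned long internal_state;
};

/* Stable subset an arch is willing to show user space: made up. */
struct demo_user_pt_regs {
	unsigned long gpr[16];
};

/* The per-arch header picks the concrete type behind the abstract name. */
#ifdef DEMO_ARCH_HIDES_PT_REGS
typedef struct demo_user_pt_regs demo_bpf_user_pt_regs_t;
#else
typedef struct demo_pt_regs demo_bpf_user_pt_regs_t;	/* generic default */
#endif

/* The shared struct only ever refers to the abstract type. */
struct demo_bpf_perf_event_data {
	demo_bpf_user_pt_regs_t regs;
	uint64_t sample_period;
};
```

Building with -DDEMO_ARCH_HIDES_PT_REGS switches the member type without
touching the shared struct definition.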

The arch-specific enablement for s390 and arm64 follows in separate
commits.

Reported-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Fixes: 0515e5999a ("bpf: introduce BPF_PROG_TYPE_PERF_EVENT program type")
Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Reviewed-and-tested-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-12-05 15:02:40 +01:00
Al Viro
c71d227fc4 make kernel-side POLL... arch-independent
mangle/demangle on the way to/from userland
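
As a rough illustration of the translation step, with a made-up bit table
and hypothetical demo_* names rather than the kernel's real values or
macros:

```c
#include <stddef.h>
#include <stdint.h>

/* One user-visible bit and the arch-independent bit it maps to. */
struct demo_poll_map {
	uint16_t user_bit;
	uint32_t kernel_bit;
};

/* Made-up table for an arch whose userland values differ. */
static const struct demo_poll_map demo_map[] = {
	{ 0x0001, 0x0001 },	/* "IN": same value on both sides */
	{ 0x0004, 0x0100 },	/* "WRNORM": differs on this made-up arch */
	{ 0x0100, 0x0200 },	/* "WRBAND": differs on this made-up arch */
};

#define DEMO_MAP_LEN (sizeof(demo_map) / sizeof(demo_map[0]))

/* Userland -> kernel: applied when a bitmap is copied in. */
uint32_t demo_demangle_poll(uint16_t user)
{
	uint32_t kernel = 0;

	for (size_t i = 0; i < DEMO_MAP_LEN; i++)
		if (user & demo_map[i].user_bit)
			kernel |= demo_map[i].kernel_bit;
	return kernel;
}

/* Kernel -> userland: applied when a bitmap is copied out. */
uint16_t demo_mangle_poll(uint32_t kernel)
{
	uint16_t user = 0;

	for (size_t i = 0; i < DEMO_MAP_LEN; i++)
		if (kernel & demo_map[i].kernel_bit)
			user |= demo_map[i].user_bit;
	return user;
}
```

Bits that have no table entry are dropped in both directions, so the
table has to list every bit that may cross the boundary.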

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-11-29 19:00:41 -05:00
Al Viro
8ced390c2b define __poll_t, annotate constants
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-11-27 16:19:52 -05:00
Linus Torvalds
a3841f94c7 libnvdimm for 4.15

Merge tag 'libnvdimm-for-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull libnvdimm and dax updates from Dan Williams:
 "Save for a few late fixes, all of these commits have shipped in -next
  releases since before the merge window opened, and 0day has given a
  build success notification.

  The ext4 touches came from Jan, and the xfs touches have Darrick's
  reviewed-by. An xfstest for the MAP_SYNC feature has been through
  a few rounds of review and is on track to be merged.

   - Introduce MAP_SYNC and MAP_SHARED_VALIDATE, a mechanism to enable
     'userspace flush' of persistent memory updates via filesystem-dax
     mappings. It arranges for any filesystem metadata updates that may
     be required to satisfy a write fault to also be flushed ("on disk")
     before the kernel returns to userspace from the fault handler.
     Effectively every write-fault that dirties metadata completes an
     fsync() before returning from the fault handler. The new
     MAP_SHARED_VALIDATE mapping type guarantees that the MAP_SYNC flag
     is validated as supported by the filesystem's ->mmap() file
     operation.

   - Add support for the standard ACPI 6.2 label access methods that
     replace the NVDIMM_FAMILY_INTEL (vendor specific) label methods.
     This enables interoperability with environments that only implement
     the standardized methods.

   - Add support for the ACPI 6.2 NVDIMM media error injection methods.

   - Add support for the NVDIMM_FAMILY_INTEL v1.6 DIMM commands for
     latch last shutdown status, firmware update, SMART error injection,
     and SMART alarm threshold control.

   - Cleanup physical address information disclosures to be root-only.

   - Fix revalidation of the DIMM "locked label area" status to support
     dynamic unlock of the label area.

   - Expand unit test infrastructure to mock the ACPI 6.2 Translate SPA
     (system-physical-address) command and error injection commands.

  Acknowledgements that came after the commits were pushed to -next:

   - 957ac8c421 ("dax: fix PMD faults on zero-length files"):
       Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>

   - a39e596baa ("xfs: support for synchronous DAX faults") and
     7b565c9f96 ("xfs: Implement xfs_filemap_pfn_mkwrite() using __xfs_filemap_fault()")
        Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>"

* tag 'libnvdimm-for-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (49 commits)
  acpi, nfit: add 'Enable Latch System Shutdown Status' command support
  dax: fix general protection fault in dax_alloc_inode
  dax: fix PMD faults on zero-length files
  dax: stop requiring a live device for dax_flush()
  brd: remove dax support
  dax: quiet bdev_dax_supported()
  fs, dax: unify IOMAP_F_DIRTY read vs write handling policy in the dax core
  tools/testing/nvdimm: unit test clear-error commands
  acpi, nfit: validate commands against the device type
  tools/testing/nvdimm: stricter bounds checking for error injection commands
  xfs: support for synchronous DAX faults
  xfs: Implement xfs_filemap_pfn_mkwrite() using __xfs_filemap_fault()
  ext4: Support for synchronous DAX faults
  ext4: Simplify error handling in ext4_dax_huge_fault()
  dax: Implement dax_finish_sync_fault()
  dax, iomap: Add support for synchronous faults
  mm: Define MAP_SYNC and VM_SYNC flags
  dax: Allow tuning whether dax_insert_mapping_entry() dirties entry
  dax: Allow dax_iomap_fault() to return pfn
  dax: Fix comment describing dax_iomap_fault()
  ...
2017-11-17 09:51:57 -08:00
Dan Williams
1c97259740 mm: introduce MAP_SHARED_VALIDATE, a mechanism to safely define new mmap flags
The mmap(2) syscall suffers from the ABI anti-pattern of not validating
unknown flags. However, proposals like MAP_SYNC need a mechanism to
define new behavior that is known to fail on older kernels without the
support. Define a new MAP_SHARED_VALIDATE flag pattern that is
guaranteed to fail on all legacy mmap implementations.

It is worth noting that the original proposal was for a standalone
MAP_VALIDATE flag. However, when that could not be supported by all
archs Linus observed:

    I see why you *think* you want a bitmap. You think you want
    a bitmap because you want to make MAP_VALIDATE be part of MAP_SYNC
    etc, so that people can do

    ret = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED
		    | MAP_SYNC, fd, 0);

    and "know" that MAP_SYNC actually takes.

    And I'm saying that whole wish is bogus. You're fundamentally
    depending on special semantics, just make it explicit. It's already
    not portable, so don't try to make it so.

    Rename that MAP_VALIDATE as MAP_SHARED_VALIDATE, make it have a value
    of 0x3, and make people do

    ret = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED_VALIDATE
		    | MAP_SYNC, fd, 0);

    and then the kernel side is easier too (none of that random garbage
    playing games with looking at the "MAP_VALIDATE bit", but just another
    case statement in that map type thing.

    Boom. Done.

Similar to ->fallocate() we also want the ability to validate the
support for new flags on a per ->mmap() 'struct file_operations'
instance basis.  Towards that end arrange for flags to be generically
validated against a mmap_supported_flags exported by 'struct
file_operations'. By default all existing flags are implicitly
supported, but new flags require MAP_SHARED_VALIDATE and
per-instance-opt-in.
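
From the caller's side, the pattern might be wrapped as in this sketch.
demo_map_maybe_sync is a hypothetical helper; the 0x03 fallback for
MAP_SHARED_VALIDATE follows the value quoted above, while the 0x080000
fallback for MAP_SYNC is an assumption taken from the asm-generic headers
and only applies when libc does not define the flag:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

#ifndef MAP_SHARED_VALIDATE
#define MAP_SHARED_VALIDATE 0x03	/* value quoted in the commit message */
#endif
#ifndef MAP_SYNC
#define MAP_SYNC 0x080000		/* assumption: asm-generic value */
#endif

/* Try a MAP_SYNC mapping first; fall back to plain MAP_SHARED if refused. */
void *demo_map_maybe_sync(int fd, size_t len, int *is_sync)
{
	void *p;

	assert(is_sync != NULL);
	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);
	if (p != MAP_FAILED) {
		*is_sync = 1;
		return p;
	}
	/* Unsupported flag or legacy kernel: retry without MAP_SYNC. */
	if (errno != EOPNOTSUPP && errno != EINVAL)
		return NULL;
	p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED)
		return NULL;
	*is_sync = 0;
	return p;
}
```

Because the first attempt either succeeds with MAP_SYNC honored or fails
outright, the caller never has to guess whether the flag was silently
ignored.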

Cc: Jan Kara <jack@suse.cz>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Suggested-by: Christoph Hellwig <hch@lst.de>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-11-03 06:26:22 -07:00
Greg Kroah-Hartman
e2be04c7f9 License cleanup: add SPDX license identifier to uapi header files with a license
Many user space API headers have licensing information, which is either
incomplete, badly formatted or just a shorthand for referring to the
license under which the file is supposed to be.  This makes it hard for
compliance tools to determine the correct license.

Update these files with an SPDX license identifier.  The identifier was
chosen based on the license information in the file.

GPL/LGPL licensed headers get the matching GPL/LGPL SPDX license
identifier with the added 'WITH Linux-syscall-note' exception, which is
the officially assigned exception identifier for the kernel syscall
exception:

   NOTE! This copyright does *not* cover user programs that use kernel
   services by normal system calls - this is merely considered normal use
   of the kernel, and does *not* fall under the heading of "derived work".

This exception makes it possible to include GPL headers into non GPL
code, without confusing license compliance tools.

Headers which have either explicit dual licensing or are just licensed
under a non GPL license are updated with the corresponding SPDX
identifier and the GPLv2 with syscall exception identifier.  The format
is:
        ((GPL-2.0 WITH Linux-syscall-note) OR SPDX-ID-OF-OTHER-LICENSE)

SPDX license identifiers are a legally binding shorthand, which can be
used instead of the full boiler plate text.  The update does not remove
existing license information as this has to be done on a case by case
basis and the copyright holders might have to be consulted. This will
happen in a separate step.

This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.  See the previous patch in this series for the
methodology of how this patch was researched.

Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-02 11:20:11 +01:00
Greg Kroah-Hartman
6f52b16c5b License cleanup: add SPDX license identifier to uapi header files with no license
Many user space API headers are missing licensing information, which
makes it hard for compliance tools to determine the correct license.

By default, files without license information are under the default
license of the kernel, which is GPLV2.  Marking them GPLV2 would exclude
them from being included in non GPLV2 code, which is obviously not
intended. The user space API headers fall under the syscall exception
which is in the kernel's COPYING file:

   NOTE! This copyright does *not* cover user programs that use kernel
   services by normal system calls - this is merely considered normal use
   of the kernel, and does *not* fall under the heading of "derived work".

otherwise syscall usage would not be possible.

Update the files which contain no license information with an SPDX
license identifier.  The chosen identifier is 'GPL-2.0 WITH
Linux-syscall-note' which is the officially assigned identifier for the
Linux syscall exception.  SPDX license identifiers are a legally binding
shorthand, which can be used instead of the full boiler plate text.

This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.  See the previous patch in this series for the
methodology of how this patch was researched.

Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-02 11:19:54 +01:00
Linus Torvalds
7318413077 Merge branch '4.14-features' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus
Pull MIPS updates from Ralf Baechle:
 "This is the main pull request for 4.14 for MIPS; below a summary of
  the non-merge commits:

  CM:
   - Rename mips_cm_base to mips_gcr_base
   - Specify register size when generating accessors
   - Use BIT/GENMASK for register fields, order & drop shifts
   - Add cluster & block args to mips_cm_lock_other()

  CPC:
   - Use common CPS accessor generation macros
   - Use BIT/GENMASK for register fields, order & drop shifts
   - Introduce register modify (set/clear/change) accessors
   - Use change_*, set_* & clear_* where appropriate
   - Add CM/CPC 3.5 register definitions
   - Use GlobalNumber macros rather than magic numbers
   - Have asm/mips-cps.h include CM & CPC headers
   - Cluster support for topology functions
   - Detect CPUs in secondary clusters

  CPS:
   - Read GIC_VL_IDENT directly, not via irqchip driver

  DMA:
   - Consolidate coherent and non-coherent dma_alloc code
   - Don't use dma_cache_sync to implement fd_cacheflush

  FPU emulation / FP assist code:
   - Another series of 14 commits fixing corner cases such as NaN
     propagation and other special input values.
   - Zero bits 32-63 of the result for a CLASS.D instruction.
   - Enhanced statistics via debugfs
   - Do not use bools for arithmetic. GCC 7.1 moans about this.
   - Correct user fault_addr type

  Generic MIPS:
   - Enhancement of stack backtraces
   - Cleanup from non-existing options
   - Handle non word sized instructions when examining frame
   - Fix detection and decoding of ADDIUSP instruction
   - Fix decoding of SWSP16 instruction
   - Refactor handling of stack pointer in get_frame_info
   - Remove unreachable code from force_fcr31_sig()
   - Convert to using %pOF instead of full_name
   - Remove the R6000 support.
   - Move FP code from *_switch.S to *_fpu.S
   - Remove unused ST_OFF from r2300_switch.S
   - Allow platform to specify multiple its.S files
   - Add #includes to various files to ensure code builds reliably and
     without warnings.
   - Remove __invalidate_kernel_vmap_range
   - Remove plat_timer_setup
   - Declare various variables & functions static
   - Abstract CPU core & VP(E) ID access through accessor functions
   - Store core & VP IDs in GlobalNumber-style variable
   - Unify checks for sibling CPUs
   - Add CPU cluster number accessors
   - Prevent direct use of generic_defconfig
   - Make CONFIG_MIPS_MT_SMP default y
   - Add __ioread64_copy
   - Remove unnecessary inclusions of linux/irqchip/mips-gic.h

  GIC:
   - Introduce asm/mips-gic.h with accessor functions
   - Use new GIC accessor functions in mips-gic-timer
   - Remove counter access functions from irq-mips-gic.c
   - Remove gic_read_local_vp_id() from irq-mips-gic.c
   - Simplify shared interrupt pending/mask reads in irq-mips-gic.c
   - Simplify gic_local_irq_domain_map() in irq-mips-gic.c
   - Drop gic_(re)set_mask() functions in irq-mips-gic.c
   - Remove gic_set_polarity(), gic_set_trigger(), gic_set_dual_edge(),
     gic_map_to_pin() and gic_map_to_vpe() from irq-mips-gic.c.
   - Convert remaining shared reg access, local int mask access and
     remaining local reg access to new accessors
   - Move GIC_LOCAL_INT_* to asm/mips-gic.h
   - Remove GIC_CPU_INT* macros from irq-mips-gic.c
   - Move various definitions to the driver
   - Remove gic_get_usm_range()
   - Remove __gic_irq_dispatch() forward declaration
   - Remove gic_init()
   - Use mips_gic_present() in place of gic_present and remove
     gic_present
   - Move gic_get_c0_*_int() to asm/mips-gic.h
   - Remove linux/irqchip/mips-gic.h
   - Inline __gic_init()
   - Inline gic_basic_init()
   - Make pcpu_masks a per-cpu variable
   - Use pcpu_masks to avoid reading GIC_SH_MASK*
   - Clean up mti, reserved-cpu-vectors handling
   - Use cpumask_first_and() in gic_set_affinity()
   - Let the core set struct irq_common_data affinity

  microMIPS:
   - Fix microMIPS stack unwinding on big endian systems

  MIPS-GIC:
   - SYNC after enabling GIC region

  NUMA:
   - Remove the unused parent_node() macro

  R6:
   - Constify r2_decoder_tables
   - Add accessor & bit definitions for GlobalNumber

  SMP:
   - Constify smp ops
   - Allow boot_secondary SMP op to return errors

  VDSO:
   - Drop gic_get_usm_range() usage
   - Avoid use of linux/irqchip/mips-gic.h

  Platform changes:

  Alchemy:
   - Add devboard machine type to cpuinfo
   - Update CPU feature overrides
   - Threaded carddetect irqs for devboards

  AR7:
   - Allow NULL clock for clk_get_rate

  BCM63xx:
   - Fix ENETDMA_6345_MAXBURST_REG offset
   - Allow NULL clock for clk_get_rate

  CI20:
   - Enable GPIO and RTC drivers in defconfig
   - Add ethernet and fixed-regulator nodes to DTS

  Generic platform:
   - Move Boston and NI 169445 FIT image source to their own files
   - Include asm/bootinfo.h for plat_fdt_relocated()
   - Include asm/time.h for get_c0_*_int()
   - Allow filtering enabled boards by requirements
   - Don't explicitly disable CONFIG_USB_SUPPORT
   - Bump default NR_CPUS to 16

  JZ4780:
   - Probe the jz4740-rtc driver from devicetree

  Lantiq:
   - Drop check of boot select from the spi-falcon driver.
   - Drop check of boot select from the lantiq-flash MTD driver.
   - Access boot cause register in the watchdog driver through regmap
   - Add device tree binding documentation for the watchdog driver
   - Add docs for the RCU DT bindings.
   - Convert the fpi bus driver to a platform_driver
   - Remove ltq_reset_cause() and ltq_boot_select()
   - Switch to a proper reset driver
   - Switch to a new drivers/soc GPHY driver
   - Add a USB PHY driver for the Lantiq SoCs using the RCU module
   - Use of_platform_default_populate instead of __dt_register_buses
   - Enable MFD_SYSCON to be able to use it for the RCU MFD
   - Replace ltq_boot_select() with dummy implementation.

  Loongson 2F:
   - Allow NULL clock for clk_get_rate

  Malta:
   - Use new GIC accessor functions

  NI 169445:
   - Add support for NI 169445 board.
   - Only include in 32r2el kernels

  Octeon:
   - Add support for watchdog of 78XX SOCs.
   - Add support for watchdog of CN68XX SOCs.
   - Expose support for mips32r1, mips32r2 and mips64r1
   - Enable more drivers in config file
   - Add support for accessing the boot vector.
   - Remove old boot vector code from watchdog driver
   - Define watchdog registers for 70xx, 73xx, 78xx, F75xx.
   - Make CSR functions node aware.
   - Allow access to CIU3 IRQ domains.
   - Misc cleanups in the watchdog driver

  Omega2+:
   - New board, add support and defconfig

  Pistachio:
   - Enable Root FS on NFS in defconfig

  Ralink:
   - Add Mediatek MT7628A SoC
   - Allow NULL clock for clk_get_rate
   - Explicitly request exclusive reset control in the pci-mt7620 PCI driver.

  SEAD3:
   - Only include in 32 bit kernels by default

  VoCore:
   - Add VoCore as a vendor to dt-bindings
   - Add defconfig file"

* '4.14-features' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: (167 commits)
  MIPS: Refactor handling of stack pointer in get_frame_info
  MIPS: Stacktrace: Fix microMIPS stack unwinding on big endian systems
  MIPS: microMIPS: Fix decoding of swsp16 instruction
  MIPS: microMIPS: Fix decoding of addiusp instruction
  MIPS: microMIPS: Fix detection of addiusp instruction
  MIPS: Handle non word sized instructions when examining frame
  MIPS: ralink: allow NULL clock for clk_get_rate
  MIPS: Loongson 2F: allow NULL clock for clk_get_rate
  MIPS: BCM63XX: allow NULL clock for clk_get_rate
  MIPS: AR7: allow NULL clock for clk_get_rate
  MIPS: BCM63XX: fix ENETDMA_6345_MAXBURST_REG offset
  mips: Save all registers when saving the frame
  MIPS: Add DWARF unwinding to assembly
  MIPS: Make SAVE_SOME more standard
  MIPS: Fix issues in backtraces
  MIPS: jz4780: DTS: Probe the jz4740-rtc driver from devicetree
  MIPS: Ci20: Enable RTC driver
  watchdog: octeon-wdt: Add support for 78XX SOCs.
  watchdog: octeon-wdt: Add support for cn68XX SOCs.
  watchdog: octeon-wdt: File cleaning.
  ...
2017-09-15 20:43:33 -07:00
Linus Torvalds
dd198ce714 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull namespace updates from Eric Biederman:
 "Life has been busy and I have not gotten half as much done this round
  as I would have liked. I delayed it so that a minor conflict
  resolution with the mips tree could spend a little time in linux-next
  before I sent this pull request.

  This includes two long delayed user namespace changes from Kirill
  Tkhai. It also includes a very useful change from Serge Hallyn that
  allows the security capability attribute to be used inside of user
  namespaces. The practical effect of this is people can now untar
  tarballs and install rpms in user namespaces. It had been suggested to
  generalize this and encode some of the namespace information
  in the xattr name. Upon close inspection that makes the
  things that should be hard easy and the things that should be easy
  more expensive.

  Then there is my bugfix/cleanup for signal injection that removes the
  magic encoding of the siginfo union member from the kernel internal
  si_code. The mips folks reported that the case where I had used
  FPE_FIXME is impossible, so I have removed FPE_FIXME from mips, while
  at the same time including a return statement in that case to keep gcc
  from complaining about uninitialized variables.

  I almost finished the work to make copy_siginfo_to_user a trivial
  copy to user. The code is available at:

     git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git neuter-copy_siginfo_to_user-v3

  But I did not have time/energy to get the code posted and reviewed
  before the merge window opened.

  I was able to see that the security excuse for just copying fields
  that we know are initialized doesn't work in practice: there are buggy
  initializations that don't initialize the proper fields in siginfo. So
  we still sometimes copy uninitialized data to userspace"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  Introduce v3 namespaced file capabilities
  mips/signal: In force_fcr31_sig return in the impossible case
  signal: Remove kernel internal si_code magic
  fcntl: Don't use ambiguous SIG_POLL si_codes
  prctl: Allow local CAP_SYS_ADMIN changing exe_file
  security: Use user_namespace::level to avoid redundant iterations in cap_capable()
  userns,pidns: Verify the userns for new pid namespaces
  signal/testing: Don't look for __SI_FAULT in userspace
  signal/mips: Document a conflict with SI_USER with SIGFPE
  signal/sparc: Document a conflict with SI_USER with SIGFPE
  signal/ia64: Document a conflict with SI_USER with SIGFPE
  signal/alpha: Document a conflict with SI_USER for SIGTRAP
2017-09-11 18:34:47 -07:00
Linus Torvalds
d34fc1adf0 Merge branch 'akpm' (patches from Andrew)
Merge updates from Andrew Morton:

 - various misc bits

 - DAX updates

 - OCFS2

 - most of MM

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (119 commits)
  mm,fork: introduce MADV_WIPEONFORK
  x86,mpx: make mpx depend on x86-64 to free up VMA flag
  mm: add /proc/pid/smaps_rollup
  mm: hugetlb: clear target sub-page last when clearing huge page
  mm: oom: let oom_reap_task and exit_mmap run concurrently
  swap: choose swap device according to numa node
  mm: replace TIF_MEMDIE checks by tsk_is_oom_victim
  mm, oom: do not rely on TIF_MEMDIE for memory reserves access
  z3fold: use per-cpu unbuddied lists
  mm, swap: don't use VMA based swap readahead if HDD is used as swap
  mm, swap: add sysfs interface for VMA based swap readahead
  mm, swap: VMA based swap readahead
  mm, swap: fix swap readahead marking
  mm, swap: add swap readahead hit statistics
  mm/vmalloc.c: don't reinvent the wheel but use existing llist API
  mm/vmstat.c: fix wrong comment
  selftests/memfd: add memfd_create hugetlbfs selftest
  mm/shmem: add hugetlbfs support to memfd_create()
  mm, devm_memremap_pages: use multi-order radix for ZONE_DEVICE lookups
  mm/vmalloc.c: halve the number of comparisons performed in pcpu_get_vm_areas()
  ...
2017-09-06 20:49:49 -07:00
Rik van Riel
d2cd9ede6e mm,fork: introduce MADV_WIPEONFORK
Introduce MADV_WIPEONFORK semantics, which result in a VMA being empty
in the child process after fork.  This differs from MADV_DONTFORK in one
important way.

If a child process accesses memory that was MADV_WIPEONFORK, it will get
zeroes.  The address ranges are still valid, they are just empty.

If a child process accesses memory that was MADV_DONTFORK, it will get a
segmentation fault, since those address ranges are no longer valid in
the child after fork.

Since MADV_DONTFORK also seems to be used to allow very large programs
to fork in systems with strict memory overcommit restrictions, changing
the semantics of MADV_DONTFORK might break existing programs.

MADV_WIPEONFORK only works on private, anonymous VMAs.

The use case is libraries that store or cache information, and want to
know that they need to regenerate it in the child process after fork.

Examples of this would be:
 - systemd/pulseaudio API checks (fail after fork) (replacing a getpid
   check, which is too slow without a PID cache)
 - PKCS#11 API reinitialization check (mandated by specification)
 - glibc's upcoming PRNG (reseed after fork)
 - OpenSSL PRNG (reseed after fork)

The security benefits of a forking server having a re-initialized PRNG in
every child process are pretty obvious.  However, due to libraries
having all kinds of internal state, and programs getting compiled with
many different versions of each library, it is unreasonable to expect
calling programs to re-initialize everything manually after fork.

A further complication is the proliferation of clone flags, programs
bypassing glibc's functions to call clone directly, and programs calling
unshare, causing the glibc pthread_atfork hook to not get called.

It would be better to have the kernel take care of this automatically.

The patch also adds MADV_KEEPONFORK, to undo the effects of a prior
MADV_WIPEONFORK.

This is similar to the OpenBSD minherit syscall with MAP_INHERIT_ZERO:

    https://man.openbsd.org/minherit.2

[akpm@linux-foundation.org: numerically order arch/parisc/include/uapi/asm/mman.h #defines]
Link: http://lkml.kernel.org/r/20170811212829.29186-3-riel@redhat.com
Signed-off-by: Rik van Riel <riel@redhat.com>
Reported-by: Florian Weimer <fweimer@redhat.com>
Reported-by: Colm MacCártaigh <colm@allcosts.net>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Drewry <wad@chromium.org>
Cc: <linux-api@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-06 17:27:30 -07:00
Mike Kravetz
aafd4562df mm: arch: consolidate mmap hugetlb size encodings
A non-default huge page size can be encoded in the flags argument of the
mmap system call.  The definitions for these encodings are in arch
specific header files.  However, all architectures use the same values.

Consolidate all the definitions in the primary user header file
(uapi/linux/mman.h).  Include definitions for all known huge page sizes.
Use the generic encoding definitions in hugetlb_encode.h as the basis
for these definitions.

Link: http://lkml.kernel.org/r/1501527386-10736-3-git-send-email-mike.kravetz@oracle.com
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-06 17:27:28 -07:00
Matt Redfearn
cea8cd498f MIPS: microMIPS: Fix decoding of swsp16 instruction
When the immediate encoded in the instruction is accessed, it is sign
extended because it is a signed value assigned to a signed integer.
The ISA specifies that this operation is an unsigned operation.
The sign extension leads us to incorrectly decode:

801e9c8e:       cbf1            sw      ra,68(sp)

As having an immediate of 1073741809.

Since the instruction format does not specify signed/unsigned, and this
is currently the only location to use this instruction format, change it
to an unsigned immediate.

Fixes: bb9bc4689b ("MIPS: Calculate microMIPS ra properly when unwinding the stack")
Suggested-by: Paul Burton <paul.burton@imgtec.com>
Signed-off-by: Matt Redfearn <matt.redfearn@imgtec.com>
Reviewed-by: James Hogan <james.hogan@imgtec.com>
Cc: Marcin Nowakowski <marcin.nowakowski@imgtec.com>
Cc: Miodrag Dinic <miodrag.dinic@imgtec.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: David Daney <david.daney@cavium.com>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/16957/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2017-09-06 13:20:25 +02:00
Eric W. Biederman
20229305af mips/signal: In force_fcr31_sig return in the impossible case
In a recent discussion Maciej Rozycki reported that this case is
impossible.

Handle the impossible case by just returning instead of trying to
handle it.  This makes static analysis simpler as it means nothing
needs to consider the impossible case after the return statement.

As the code no longer has to deal with this case remove FPE_FIXME from
the mips siginfo.h

Cc: "Maciej W. Rozycki" <macro@imgtec.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Link: http://lkml.kernel.org/r/20170718140651.15973-4-ebiederm@xmission.com
Ref: ea1b75cf91 ("signal/mips: Document a conflict with SI_USER with SIGFPE")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2017-08-17 17:31:27 -05:00
Willem de Bruijn
76851d1212 sock: add SOCK_ZEROCOPY sockopt
The send call ignores unknown flags. Legacy applications may already
unwittingly pass MSG_ZEROCOPY. Continue to ignore this flag unless a
socket opts in to zerocopy.

Introduce socket option SO_ZEROCOPY to enable MSG_ZEROCOPY processing.
Processes can also query this socket option to detect kernel support
for the feature. Older kernels will return ENOPROTOOPT.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-03 21:37:29 -07:00
Eric W. Biederman
cc731525f2 signal: Remove kernel internal si_code magic
struct siginfo is a union and the kernel since 2.4 has been hiding a union
tag in the high 16 bits of si_code using the values:
__SI_KILL
__SI_TIMER
__SI_POLL
__SI_FAULT
__SI_CHLD
__SI_RT
__SI_MESGQ
__SI_SYS

While this looks plausible on the surface, in practice this situation has
not worked well.

- Injected positive signals are not copied to user space properly
  unless they have these magic high bits set.

- Injected positive signals are not reported properly by signalfd
  unless they have these magic high bits set.

- These kernel internal values leaked to userspace via ptrace_peek_siginfo

- It was possible to inject these kernel internal values and cause the
  kernel to misbehave.

- Kernel developers got confused and expected these kernel internal values
  in userspace in kernel self tests.

- Kernel developers got confused and set si_code to __SI_FAULT which
  is SI_USER in userspace which causes userspace to think an ordinary user
  sent the signal and that it was not kernel generated.

- The values make it impossible to reorganize the code to transform
  siginfo_copy_to_user into a plain copy_to_user, as si_code must
  be massaged before being passed to userspace.

So remove these kernel internal si codes and make the kernel code simpler
and more maintainable.

To replace these kernel internal magic si_codes introduce the helper
function siginfo_layout, that takes a signal number and an si_code and
computes which union member of siginfo is being used.  Have
siginfo_layout return an enumeration so that gcc will have enough
information to warn if a switch statement does not handle all of the union
members.

A couple of architectures have a messed up ABI that defines signal
specific duplications of SI_USER which causes more special cases in
siginfo_layout than I would like.  The good news is only problem
architectures pay the cost.

Update all of the code that used the previous magic __SI_ values to
use the new SIL_ values and to call siginfo_layout to get those
values.  Except where not all of the cases are handled, remove the
defaults in the switch statements so that if a new case is missed in
the future the lack will show up at compile time.

Modify the code that copies siginfo si_code to userspace to just copy
the value and not cast si_code to a short first.  The high bits are no
longer used to hold a magic union member.

Fixup the siginfo header files to stop including the __SI_ values in
their constants and for the headers that were missing it to properly
update the number of si_codes for each signal type.

The fixes to copy_siginfo_from_user32 implementations have the
interesting property that several of them previously should never have
worked as the __SI_ values they depended upon were kernel internal.
With that dependency gone those implementations should work much
better.

The idea of not passing the __SI_ values out to userspace and then
not reinserting them has been tested with criu and criu worked without
changes.

Ref: 2.4.0-test1
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2017-07-24 14:30:28 -05:00
Eric W. Biederman
ea1b75cf91 signal/mips: Document a conflict with SI_USER with SIGFPE
Setting si_code to __SI_FAULT results in userspace seeing
an si_code of 0.  This is the same si_code as SI_USER.  POSIX
and common sense require that SI_USER not be a signal specific
si_code.  As such this use of 0 for the si_code is a pretty
horribly broken ABI.

This use of __SI_FAULT is only a decade old.  Which compared
to the other pieces of kernel code that have made this mistake
is almost yesterday.

This is probably worth fixing but I don't know mips well enough
to know which si_code would be the proper one to use.

Cc: Ralf Baechle <ralf@linux-mips.org>
Ref: 948a34cf39 ("[MIPS] Maintain si_code field properly for FP exceptions")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2017-07-19 19:13:15 -05:00
Gleb Fotengauer-Malinovskiy
c632517923 tty: Fix TIOCGPTPEER ioctl definition
This ioctl does nothing to justify an _IOC_READ or _IOC_WRITE flag
because it doesn't copy anything from/to userspace to access the
argument.

Fixes: 54ebbfb160 ("tty: add TIOCGPTPEER ioctl")
Signed-off-by: Gleb Fotengauer-Malinovskiy <glebfm@altlinux.org>
Acked-by: Aleksa Sarai <asarai@suse.de>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-17 17:04:41 +02:00
Linus Torvalds
568d135d33 Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus
Pull MIPS updates from Ralf Baechle:
 "Boston platform support:
   - Document DT bindings
   - Add CLK driver for board clocks

  CM:
   - Avoid per-core locking with CM3 & higher
   - WARN on attempt to lock invalid VP, not BUG

  CPS:
   - Select CONFIG_SYS_SUPPORTS_SCHED_SMT for MIPSr6
   - Prevent multi-core with dcache aliasing
   - Handle cores not powering down more gracefully
   - Handle spurious VP starts more gracefully

  DSP:
   - Add lwx & lhx misaligned access support

  eBPF:
   - Add MIPS support along with many supporting changes to add the
     required infrastructure

  Generic arch code:
   - Misc sysmips MIPS_ATOMIC_SET fixes
   - Drop duplicate HAVE_SYSCALL_TRACEPOINTS
   - Negate error syscall return in trace
   - Correct forced syscall errors
   - Traced negative syscalls should return -ENOSYS
   - Allow samples/bpf/tracex5 to access syscall arguments for sane
     traces
   - Cleanup from old Kconfig options in defconfigs
   - Fix PREF instruction usage by memcpy for MIPS R6
   - Fix various special cases in the FPU emulation
   - Fix some special cases in MIPS16e2 support
   - Fix MIPS I ISA /proc/cpuinfo reporting
   - Sort MIPS Kconfig alphabetically
   - Fix minimum alignment requirement of IRQ stack as required by
     ABI / GCC
   - Fix special cases in the module loader
   - Perform post-DMA cache flushes on systems with MAARs
   - Probe the I6500 CPU
   - Cleanup cmpxchg and add support for 1 and 2 byte operations
   - Use queued read/write locks (qrwlock)
   - Use queued spinlocks (qspinlock)
   - Add CPU shared FTLB feature detection
   - Handle tlbex-tlbp race condition
   - Allow storing pgd in C0_CONTEXT for MIPSr6
   - Use current_cpu_type() in m4kc_tlbp_war()
   - Support Boston in the generic kernel

  Generic platform:
   - yamon-dt: Pull YAMON DT shim code out of SEAD-3 board
   - yamon-dt: Support > 256MB of RAM
   - yamon-dt: Use serial* rather than uart* aliases
   - Abstract FDT fixup application
   - Set RTC_ALWAYS_BCD to 0
   - Add a MAINTAINERS entry

  core kernel:
   - qspinlock.c: include linux/prefetch.h

  Loongson 3:
   - Add support

  Perf:
   - Add I6500 support

  SEAD-3:
   - Remove GIC timer from DT
   - Set interrupt-parent per-device, not at root node
   - Fix GIC interrupt specifiers

  SMP:
   - Skip IPI setup if we only have a single CPU

  VDSO:
   - Make comment match reality
   - Improvements to time code in VDSO"

* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: (86 commits)
  locking/qspinlock: Include linux/prefetch.h
  MIPS: Fix MIPS I ISA /proc/cpuinfo reporting
  MIPS: Fix minimum alignment requirement of IRQ stack
  MIPS: generic: Support MIPS Boston development boards
  MIPS: DTS: img: Don't attempt to build-in all .dtb files
  clk: boston: Add a driver for MIPS Boston board clocks
  dt-bindings: Document img,boston-clock binding
  MIPS: Traced negative syscalls should return -ENOSYS
  MIPS: Correct forced syscall errors
  MIPS: Negate error syscall return in trace
  MIPS: Drop duplicate HAVE_SYSCALL_TRACEPOINTS select
  MIPS16e2: Provide feature overrides for non-MIPS16 systems
  MIPS: MIPS16e2: Report ASE presence in /proc/cpuinfo
  MIPS: MIPS16e2: Subdecode extended LWSP/SWSP instructions
  MIPS: MIPS16e2: Identify ASE presence
  MIPS: VDSO: Fix a mismatch between comment and preprocessor constant
  MIPS: VDSO: Add implementation of gettimeofday() fallback
  MIPS: VDSO: Add implementation of clock_gettime() fallback
  MIPS: VDSO: Fix conversions in do_monotonic()/do_monotonic_coarse()
  MIPS: Use current_cpu_type() in m4kc_tlbp_war()
  ...
2017-07-15 10:59:54 -07:00
Linus Torvalds
5518b69b76 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:
 "Reasonably busy this cycle, but perhaps not as busy as in the 4.12
  merge window:

   1) Several optimizations for UDP processing under high load from
      Paolo Abeni.

   2) Support pacing internally in TCP when using the sch_fq packet
      scheduler for this is not practical. From Eric Dumazet.

   3) Support multiple filter chains per qdisc, from Jiri Pirko.

   4) Move to 1ms TCP timestamp clock, from Eric Dumazet.

   5) Add batch dequeueing to vhost_net, from Jason Wang.

   6) Flesh out more completely SCTP checksum offload support, from
      Davide Caratti.

   7) More plumbing of extended netlink ACKs, from David Ahern, Pablo
      Neira Ayuso, and Matthias Schiffer.

   8) Add devlink support to nfp driver, from Simon Horman.

   9) Add RTM_F_FIB_MATCH flag to RTM_GETROUTE queries, from Roopa
      Prabhu.

  10) Add stack depth tracking to BPF verifier and use this information
      in the various eBPF JITs. From Alexei Starovoitov.

  11) Support XDP on qed device VFs, from Yuval Mintz.

  12) Introduce BPF PROG ID for better introspection of installed BPF
      programs. From Martin KaFai Lau.

  13) Add bpf_set_hash helper for TC bpf programs, from Daniel Borkmann.

  14) For loads, allow narrower accesses in bpf verifier checking, from
      Yonghong Song.

  15) Support MIPS in the BPF selftests and samples infrastructure, the
      MIPS eBPF JIT will be merged in via the MIPS GIT tree. From David
      Daney.

  16) Support kernel based TLS, from Dave Watson and others.

  17) Remove completely DST garbage collection, from Wei Wang.

  18) Allow installing TCP MD5 rules using prefixes, from Ivan
      Delalande.

  19) Add XDP support to Intel i40e driver, from Björn Töpel

  20) Add support for TC flower offload in nfp driver, from Simon
      Horman, Pieter Jansen van Vuuren, Benjamin LaHaise, Jakub
      Kicinski, and Bert van Leeuwen.

  21) IPSEC offloading support in mlx5, from Ilan Tayari.

  22) Add HW PTP support to macb driver, from Rafal Ozieblo.

  23) Networking refcount_t conversions, From Elena Reshetova.

  24) Add sock_ops support to BPF, from Lawrence Brakmo. This is useful
      for tuning the TCP sockopt settings of a group of applications,
      currently via CGROUPs"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1899 commits)
  net: phy: dp83867: add workaround for incorrect RX_CTRL pin strap
  dt-bindings: phy: dp83867: provide a workaround for incorrect RX_CTRL pin strap
  cxgb4: Support for get_ts_info ethtool method
  cxgb4: Add PTP Hardware Clock (PHC) support
  cxgb4: time stamping interface for PTP
  nfp: default to chained metadata prepend format
  nfp: remove legacy MAC address lookup
  nfp: improve order of interfaces in breakout mode
  net: macb: remove extraneous return when MACB_EXT_DESC is defined
  bpf: add missing break in for the TCP_BPF_SNDCWND_CLAMP case
  bpf: fix return in load_bpf_file
  mpls: fix rtm policy in mpls_getroute
  net, ax25: convert ax25_cb.refcount from atomic_t to refcount_t
  net, ax25: convert ax25_route.refcount from atomic_t to refcount_t
  net, ax25: convert ax25_uid_assoc.refcount from atomic_t to refcount_t
  net, sctp: convert sctp_ep_common.refcnt from atomic_t to refcount_t
  net, sctp: convert sctp_transport.refcnt from atomic_t to refcount_t
  net, sctp: convert sctp_chunk.refcnt from atomic_t to refcount_t
  net, sctp: convert sctp_datamsg.refcnt from atomic_t to refcount_t
  net, sctp: convert sctp_auth_bytes.refcnt from atomic_t to refcount_t
  ...
2017-07-05 12:31:59 -07:00
Miodrag Dinic
3f88ec6333 MIPS: unaligned: Add DSP lwx & lhx misaligned access support
Add handling of misaligned access for DSP load instructions
lwx & lhx.

Since DSP instructions share SPECIAL3 opcode with other non-DSP
instructions, necessary logic was inserted for distinguishing
between instructions with SPECIAL3 opcode. For that purpose,
the instruction format for DSP instructions is added to
arch/mips/include/uapi/asm/inst.h.

Signed-off-by: Miodrag Dinic <miodrag.dinic@imgtec.com>
Signed-off-by: Aleksandar Markovic <aleksandar.markovic@imgtech.com>
Cc: James.Hogan@imgtec.com
Cc: Paul.Burton@imgtec.com
Cc: Raghu.Gandham@imgtec.com
Cc: Leonid.Yegoshin@imgtec.com
Cc: Douglas.Leung@imgtec.com
Cc: Petar.Jovanovic@imgtec.com
Cc: Goran.Ferenc@imgtec.com
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/16511/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2017-06-29 02:42:24 +02:00
David Daney
1f22d599c9 MIPS: Correctly define DBSHFL type instruction opcodes.
DSHD was incorrectly classified as being BSHFL, and DSBH was missing
altogether.

Signed-off-by: David Daney <david.daney@cavium.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Matt Redfearn <matt.redfearn@imgtec.com>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/16366/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2017-06-28 12:22:38 +02:00
David Herrmann
28b5ba2aa0 net: introduce SO_PEERGROUPS getsockopt
This adds the new getsockopt(2) option SO_PEERGROUPS on SOL_SOCKET to
retrieve the auxiliary groups of the remote peer. It is designed to
naturally extend SO_PEERCRED. That is, the underlying data is from the
same credentials. Regarding its syntax, it is based on SO_PEERSEC. That
is, if the provided buffer is too small, ERANGE is returned and @optlen
is updated. Otherwise, the information is copied, @optlen is set to the
actual size, and 0 is returned.

While SO_PEERCRED (and thus `struct ucred') already returns the primary
group, it lacks the auxiliary group vector. However, nearly all access
controls (including kernel side VFS and SYSVIPC, but also user-space
polkit, DBus, ...) consider the entire set of groups, rather than just
the primary group. But this is currently not possible with pure
SO_PEERCRED. Instead, user-space has to work around this and query the
system database for the auxiliary groups of a UID retrieved via
SO_PEERCRED.

Unfortunately, there is no race-free way to query the auxiliary groups
of the PID/UID retrieved via SO_PEERCRED. Hence, the current user-space
solution is to use getgrouplist(3p), which itself falls back to NSS and
whatever is configured in nsswitch.conf(3). This effectively checks
which groups we *would* assign to the user if it logged in *now*. On
normal systems it is as easy as reading /etc/group, but with NSS it can
resort to querying network databases (e.g., LDAP), using IPC or network
communication.

Long story short: Whenever we want to use auxiliary groups for access
checks on IPC, we need further IPC to talk to the user/group databases,
rather than just relying on SO_PEERCRED and the incoming socket. This
is unfortunate, and might even result in dead-locks if the database
query uses the same IPC as the original request.

So far, those recursions / dead-locks have been avoided by using
primitive IPC for all crucial NSS modules. However, we want to avoid
re-inventing the wheel for each NSS module that might be involved in
user/group queries. Hence, we would preferably make DBus (and other IPC
that supports access-management based on groups) work without resorting
to the user/group database. This new SO_PEERGROUPS option would allow us
to make dbus-daemon work without ever calling into NSS.

Cc: Michal Sekletar <msekleta@redhat.com>
Cc: Simon McVittie <simon.mcvittie@collabora.co.uk>
Reviewed-by: Tom Gundersen <teg@jklm.no>
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-21 11:38:41 -04:00
Aleksa Sarai
54ebbfb160 tty: add TIOCGPTPEER ioctl
When opening the slave end of a PTY, it is not possible for userspace to
safely ensure that /dev/pts/$num is actually a slave (in cases where the
mount namespace in which devpts was mounted is controlled by an
untrusted process). In addition, there are several unresolvable
race conditions if userspace were to attempt to detect attacks through
stat(2) and other similar methods [in addition it is not clear how
userspace could detect attacks involving FUSE].

Resolve this by providing an interface for userspace to safely open the
"peer" end of a PTY file descriptor by using the dentry cached by
devpts. Since it is not possible to have an open master PTY without
having its slave exposed in /dev/pts, this interface is safe. This
interface currently does not provide a way to get the master pty (since
it is not clear whether such an interface is safe or even useful).
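
A minimal userspace sketch of the intended use, assuming the ioctl takes
open(2)-style flags as its argument and returns the new file descriptor
(as documented in ioctl_tty(2)). open_pty_pair() is a hypothetical
helper name, and the _IO('T', 0x41) fallback is only there for older
headers.

```c
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>

#ifndef TIOCGPTPEER
#define TIOCGPTPEER _IO('T', 0x41)	/* fallback for older headers */
#endif

/*
 * Open a PTY master and its peer without ever resolving a /dev/pts/$num
 * path. Returns 0 and fills *master / *peer, or -1 on failure.
 */
int open_pty_pair(int *master, int *peer)
{
	int unlock = 0;
	int m, p;

	m = open("/dev/ptmx", O_RDWR | O_NOCTTY);
	if (m < 0)
		return -1;

	/* Clear the lock on the peer, as unlockpt(3) would. */
	if (ioctl(m, TIOCSPTLCK, &unlock) < 0) {
		close(m);
		return -1;
	}

	/* Ask the kernel for the peer of this exact master fd. */
	p = ioctl(m, TIOCGPTPEER, O_RDWR | O_NOCTTY);
	if (p < 0) {
		close(m);
		return -1;
	}

	*master = m;
	*peer = p;
	return 0;
}
```

On kernels without the ioctl the call fails and the caller has to fall
back to ptsname(3) plus open(2), with the caveats described above.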

Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Valentin Rothberg <vrothberg@suse.com>
Signed-off-by: Aleksa Sarai <asarai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-09 12:27:54 +02:00
David S. Miller
1c4f676a68 net: Define SCM_TIMESTAMPING_PKTINFO on all architectures.
A definition was only provided for platforms using
asm-generic/socket.h; define it for the others as well.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-21 23:13:37 -04:00
Nicolas Dichtel
fcc8487d47 uapi: export all headers under uapi directories
Regularly, when a new header is created in include/uapi/, the developer
forgets to add it in the corresponding Kbuild file. This error is usually
detected after the release is out.

In fact, all headers under uapi directories should be exported, thus it's
useless to have an exhaustive list.

After this patch, the following files, which were not exported, are now
exported (with make headers_install_all):
asm-arc/kvm_para.h
asm-arc/ucontext.h
asm-blackfin/shmparam.h
asm-blackfin/ucontext.h
asm-c6x/shmparam.h
asm-c6x/ucontext.h
asm-cris/kvm_para.h
asm-h8300/shmparam.h
asm-h8300/ucontext.h
asm-hexagon/shmparam.h
asm-m32r/kvm_para.h
asm-m68k/kvm_para.h
asm-m68k/shmparam.h
asm-metag/kvm_para.h
asm-metag/shmparam.h
asm-metag/ucontext.h
asm-mips/hwcap.h
asm-mips/reg.h
asm-mips/ucontext.h
asm-nios2/kvm_para.h
asm-nios2/ucontext.h
asm-openrisc/shmparam.h
asm-parisc/kvm_para.h
asm-powerpc/perf_regs.h
asm-sh/kvm_para.h
asm-sh/ucontext.h
asm-tile/shmparam.h
asm-unicore32/shmparam.h
asm-unicore32/ucontext.h
asm-x86/hwcap2.h
asm-xtensa/kvm_para.h
drm/armada_drm.h
drm/etnaviv_drm.h
drm/vgem_drm.h
linux/aspeed-lpc-ctrl.h
linux/auto_dev-ioctl.h
linux/bcache.h
linux/btrfs_tree.h
linux/can/vxcan.h
linux/cifs/cifs_mount.h
linux/coresight-stm.h
linux/cryptouser.h
linux/fsmap.h
linux/genwqe/genwqe_card.h
linux/hash_info.h
linux/kcm.h
linux/kcov.h
linux/kfd_ioctl.h
linux/lightnvm.h
linux/module.h
linux/nbd-netlink.h
linux/nilfs2_api.h
linux/nilfs2_ondisk.h
linux/nsfs.h
linux/pr.h
linux/qrtr.h
linux/rpmsg.h
linux/sched/types.h
linux/sed-opal.h
linux/smc.h
linux/smc_diag.h
linux/stm.h
linux/switchtec_ioctl.h
linux/vfio_ccw.h
linux/wil6210_uapi.h
rdma/bnxt_re-abi.h

Note that I have removed from this list the files which are generated in every
exported directory (like .install or .install.cmd).

Thanks to Julien Floret <julien.floret@6wind.com> for the tip to get all
subdirs with a pure makefile command.

For the record, note that exported files for asm directories are a mix of
files listed by:
 - include/uapi/asm-generic/Kbuild.asm;
 - arch/<arch>/include/uapi/asm/Kbuild;
 - arch/<arch>/include/asm/Kbuild.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
Acked-by: Mark Salter <msalter@redhat.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2017-05-11 00:21:54 +09:00
Linus Torvalds
2d3e4866de Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:
 "ARM:
   - HYP mode stub supports kexec/kdump on 32-bit
   - improved PMU support
   - virtual interrupt controller performance improvements
   - support for userspace virtual interrupt controller (slower, but
     necessary for KVM on the weird Broadcom SoCs used by the Raspberry
     Pi 3)

  MIPS:
   - basic support for hardware virtualization (ImgTec P5600/P6600/I6400
     and Cavium Octeon III)

  PPC:
   - in-kernel acceleration for VFIO

  s390:
   - support for guests without storage keys
   - adapter interruption suppression

  x86:
   - usual range of nVMX improvements, notably nested EPT support for
     accessed and dirty bits
   - emulation of CPL3 CPUID faulting

  generic:
   - first part of VCPU thread request API
   - kvm_stat improvements"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (227 commits)
  kvm: nVMX: Don't validate disabled secondary controls
  KVM: put back #ifndef CONFIG_S390 around kvm_vcpu_kick
  Revert "KVM: Support vCPU-based gfn->hva cache"
  tools/kvm: fix top level makefile
  KVM: x86: don't hold kvm->lock in KVM_SET_GSI_ROUTING
  KVM: Documentation: remove VM mmap documentation
  kvm: nVMX: Remove superfluous VMX instruction fault checks
  KVM: x86: fix emulation of RSM and IRET instructions
  KVM: mark requests that need synchronization
  KVM: return if kvm_vcpu_wake_up() did wake up the VCPU
  KVM: add explicit barrier to kvm_vcpu_kick
  KVM: perform a wake_up in kvm_make_all_cpus_request
  KVM: mark requests that do not need a wakeup
  KVM: remove #ifndef CONFIG_S390 around kvm_vcpu_wake_up
  KVM: x86: always use kvm_make_request instead of set_bit
  KVM: add kvm_{test,clear}_request to replace {test,clear}_bit
  s390: kvm: Cpu model support for msa6, msa7 and msa8
  KVM: x86: remove irq disablement around KVM_SET_CLOCK/KVM_GET_CLOCK
  kvm: better MWAIT emulation for guests
  KVM: x86: virtualize cpuid faulting
  ...
2017-05-08 12:37:56 -07:00
David S. Miller
6b6cbc1471 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts were simply overlapping changes.  In the net/ipv4/route.c
case the code had simply moved around a little bit and the same fix
was made in both 'net' and 'net-next'.

In the net/sched/sch_generic.c case a fix in 'net' happened at
the same time that a new argument was added to qdisc_hash_add().

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-15 21:16:30 -04:00
Chenbo Feng
5daab9db7b New getsockopt option to get socket cookie
Introduce a new getsockopt operation to retrieve the socket cookie
for a specific socket based on the socket fd.  It returns a unique
non-decreasing cookie for each socket.
Tested: https://android-review.googlesource.com/#/c/358163/
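
For illustration only, reading the cookie from userspace could look like
the sketch below. socket_cookie() is a hypothetical helper name; the
option is assumed to be called SO_COOKIE and to return a 64-bit value,
and the fallback number 57 is the asm-generic one.

```c
#include <stdint.h>
#include <sys/socket.h>

#ifndef SO_COOKIE
#define SO_COOKIE 57	/* assumption: asm-generic value, some arches differ */
#endif

/* Store the socket cookie of fd in *cookie. Returns 0 on success, -1 on error. */
int socket_cookie(int fd, uint64_t *cookie)
{
	socklen_t len = sizeof(*cookie);

	if (getsockopt(fd, SOL_SOCKET, SO_COOKIE, cookie, &len) < 0)
		return -1;
	/* Anything other than a full 64-bit value is unexpected. */
	return len == sizeof(*cookie) ? 0 : -1;
}
```

Two calls on the same fd are expected to yield the same value, so the
cookie can serve as a stable key for per-socket accounting.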

Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Chenbo Feng <fengc@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-08 08:07:01 -07:00
Paolo Bonzini
4b4357e025 kvm: make KVM_COALESCED_MMIO_PAGE_OFFSET public
Its value has never changed; we might as well make it part of the ABI instead
of using the return value of KVM_CHECK_EXTENSION(KVM_CAP_COALESCED_MMIO).

Because PPC does not always make MMIO available, the code has to be made
dependent on CONFIG_KVM_MMIO rather than KVM_COALESCED_MMIO_PAGE_OFFSET.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-04-07 16:49:01 +02:00
James Hogan
d42a008f86 KVM: MIPS/VZ: Emulate MAARs when necessary
Add emulation of Memory Accessibility Attribute Registers (MAARs) when
necessary. We can't actually do anything with whatever the guest
provides, but it may not be possible to clear Guest.Config5.MRP so we
have to emulate at least a pair of MAARs.

Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Cc: linux-doc@vger.kernel.org
2017-03-28 14:53:58 +01:00
James Hogan
955d8dc3ee KVM: MIPS: Implement HYPCALL emulation
Emulate the HYPCALL instruction added in the VZ ASE and used by the MIPS
paravirtualised guest support that is already merged. The new hypcall.c
handles arguments and the return value. No actual hypercalls are yet
supported, but this still allows us to safely step over hypercalls and
set an error code in the return value for forward compatibility.

Non-zero HYPCALL codes are not handled.

We also document the hypercall ABI which asm/kvm_para.h uses.

Signed-off-by: James Hogan <james.hogan@imgtec.com>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Andreas Herrmann <andreas.herrmann@caviumnetworks.com>
Cc: David Daney <david.daney@cavium.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Cc: linux-doc@vger.kernel.org
2017-03-28 14:53:33 +01:00
Sridhar Samudrala
6d4339028b net: Introduce SO_INCOMING_NAPI_ID
This socket option returns the NAPI ID associated with the queue on which
the last frame was received. This information can be used by the apps to
split the incoming flows among the threads based on the Rx queue on which
they are received.

If the NAPI ID actually represents a sender_cpu then the value is ignored
and 0 is returned.
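
A sketch of how an application might use the value, under stated
assumptions: incoming_napi_id() and pick_worker() are hypothetical
names, the modulo policy is just one possible way to split flows, and
the fallback number 56 is the asm-generic one.

```c
#include <sys/socket.h>

#ifndef SO_INCOMING_NAPI_ID
#define SO_INCOMING_NAPI_ID 56	/* assumption: asm-generic value */
#endif

/* Return the NAPI ID recorded for fd, or 0 if there is none or on error. */
unsigned int incoming_napi_id(int fd)
{
	unsigned int id = 0;
	socklen_t len = sizeof(id);

	if (getsockopt(fd, SOL_SOCKET, SO_INCOMING_NAPI_ID, &id, &len) < 0)
		return 0;
	return id;
}

/*
 * Hypothetical policy: sockets sharing an Rx queue land on the same worker.
 * Sockets with no NAPI ID (0) all go to worker 0.
 */
unsigned int pick_worker(unsigned int napi_id, unsigned int nworkers)
{
	return nworkers ? napi_id % nworkers : 0;
}
```

An accept loop could call incoming_napi_id() on each new connection and
hand the socket to the worker chosen by pick_worker().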

Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-24 20:49:31 -07:00
Josh Hunt
a2d133b1d4 sock: introduce SO_MEMINFO getsockopt
Allows reading of SK_MEMINFO_VARS via socket option. This way an
application can get all meminfo related information in a single socket
option call instead of multiple calls.

Adds helper function, sk_get_meminfo(), and uses that for both
getsockopt and sock_diag_put_meminfo().
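
A userspace sketch, assuming the option fills an array of 32-bit values
indexed by the SK_MEMINFO_* constants from <linux/sock_diag.h>.
sock_meminfo() is a hypothetical helper name and the fallback number 55
is the asm-generic one.

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/sock_diag.h>	/* SK_MEMINFO_* indices */

#ifndef SO_MEMINFO
#define SO_MEMINFO 55	/* assumption: asm-generic value, some arches differ */
#endif

/*
 * Read all meminfo counters of fd in one call.
 * Returns the number of values the kernel filled in, or -1 on error.
 */
int sock_meminfo(int fd, uint32_t vals[SK_MEMINFO_VARS])
{
	socklen_t len = SK_MEMINFO_VARS * sizeof(uint32_t);

	memset(vals, 0, len);
	if (getsockopt(fd, SOL_SOCKET, SO_MEMINFO, vals, &len) < 0)
		return -1;
	return (int)(len / sizeof(uint32_t));
}
```

After a successful call, vals[SK_MEMINFO_RCVBUF] and
vals[SK_MEMINFO_SNDBUF] hold the buffer limits that would otherwise need
separate SO_RCVBUF and SO_SNDBUF queries.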

Suggested by Eric Dumazet.

Signed-off-by: Josh Hunt <johunt@akamai.com>
Reviewed-by: Jason Baron <jbaron@akamai.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22 11:18:58 -07:00
James Hogan
9cb74b5e13 MIPS: Wire up statx system call
Wire up the statx system call for MIPS, which was introduced in commit
a528d35e8b ("statx: Add a system call to make enhanced file info
available").

Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/15387/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2017-03-08 12:11:13 +01:00
James Hogan
230c57244c KVM: MIPS: Claim KVM_CAP_READONLY_MEM support
Now that load/store faults due to read only memory regions are treated
as MMIO accesses it is safe to claim support for read only memory
regions (KVM_CAP_READONLY_MEM).

Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
2017-02-03 15:21:29 +00:00
Francis Yan
1c885808e4 tcp: SOF_TIMESTAMPING_OPT_STATS option for SO_TIMESTAMPING
This patch exports the sender chronograph stats via the socket
SO_TIMESTAMPING channel. Currently we can instrument how long a
particular application unit of data was queued in TCP by tracking
SOF_TIMESTAMPING_TX_SOFTWARE and SOF_TIMESTAMPING_TX_SCHED. Having
these sender chronograph stats exported simultaneously along with
these timestamps allows the various sender limitations to be broken
down further.  For example, a video server can tell if a particular
chunk of video on a connection takes a long time to deliver because
TCP was experiencing a small receive window. This was not possible
to tell before this patch without packet traces.

To prepare these stats, the user needs to set
SOF_TIMESTAMPING_OPT_STATS and SOF_TIMESTAMPING_OPT_TSONLY flags
while requesting other SOF_TIMESTAMPING TX timestamps. When the
timestamps are available in the error queue, the stats are returned
in a separate control message of type SCM_TIMESTAMPING_OPT_STATS,
in a list of TLVs (struct nlattr) of types: TCP_NLA_BUSY_TIME,
TCP_NLA_RWND_LIMITED, TCP_NLA_SNDBUF_LIMITED. Unit is microsecond.
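
As an illustration of walking the TLV list on the receive side:
stats_find_u64() is a hypothetical helper, each stat is assumed to be
carried as a 64-bit attribute, and the caller is assumed to pass the
payload of the SCM_TIMESTAMPING_OPT_STATS control message together with
a type taken from the TCP_NLA_* enum in <linux/tcp.h>.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>	/* struct nlattr, NLA_ALIGN, NLA_HDRLEN */

/*
 * Scan a buffer of netlink attributes for a 64-bit attribute of the given
 * type. Returns 0 and stores the value in *val, or -1 if it is missing
 * or the buffer is malformed.
 */
int stats_find_u64(const void *buf, size_t len, uint16_t type, uint64_t *val)
{
	const unsigned char *p = buf;

	while (len >= (size_t)NLA_HDRLEN) {
		struct nlattr nla;
		size_t step;

		memcpy(&nla, p, sizeof(nla));	/* avoid unaligned access */
		if (nla.nla_len < NLA_HDRLEN || nla.nla_len > len)
			return -1;		/* malformed attribute */
		if ((nla.nla_type & NLA_TYPE_MASK) == type &&
		    nla.nla_len == NLA_HDRLEN + sizeof(*val)) {
			memcpy(val, p + NLA_HDRLEN, sizeof(*val));
			return 0;
		}
		step = NLA_ALIGN(nla.nla_len);	/* attributes are 4-byte aligned */
		if (step >= len)
			break;
		p += step;
		len -= step;
	}
	return -1;
}
```

The helper skips attributes it does not recognise (padding included), so
it should keep working if further stats are appended later.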

Signed-off-by: Francis Yan <francisyyan@gmail.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-30 10:04:25 -05:00
Ralf Baechle
11ed3e0ef3 MIPS: Wire up new pkey_{mprotect,alloc,free} syscalls
Signed-off-by: Marcin Nowakowski <marcin.nowakowski@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/14380/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-14 11:05:13 -07:00
Dave Hansen
e8c24d3a23 x86/pkeys: Allocation/free syscalls
This patch adds two new system calls:

	int pkey_alloc(unsigned long flags, unsigned long init_access_rights);
	int pkey_free(int pkey);

These implement an "allocator" for the protection keys
themselves, which can be thought of as analogous to the allocator
that the kernel has for file descriptors.  The kernel tracks
which numbers are in use, and only allows operations on keys that
are valid.  A key which was not obtained by pkey_alloc() may not,
for instance, be passed to pkey_mprotect().

These system calls are also very important given the kernel's use
of pkeys to implement execute-only support.  These help ensure
that userspace can never assume that it has control of a key
unless it first asks the kernel.  The kernel does not promise to
preserve PKRU (rights register) contents except for allocated
pkeys.

The 'init_access_rights' argument to pkey_alloc() specifies the
rights that will be established for the returned pkey.  For
instance:

	pkey = pkey_alloc(flags, PKEY_DENY_WRITE);

will allocate 'pkey', but also sets the bits in PKRU[1] such that
writing to 'pkey' is already denied.

The kernel does not prevent pkey_free() from successfully freeing
in-use pkeys (those still assigned to a memory range by
pkey_mprotect()).  It would be expensive to implement the checks
for this, so we instead say, "Just don't do it" since sane
software will never do it anyway.

Any piece of userspace calling pkey_alloc() needs to be prepared
for it to fail.  Why?  pkey_alloc() returns the same error code
(ENOSPC) when there are no pkeys and when pkeys are unsupported.
They can be unsupported for a whole host of reasons, so apps must
be prepared for this.  Also, libraries or LD_PRELOADs might steal
keys before an application gets access to them.
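
A defensive caller could be sketched as follows. try_alloc_pkey() and
free_pkey() are hypothetical wrappers around the raw syscalls, used here
because a libc wrapper may not be available. Note that the uapi headers
spell the write-deny flag PKEY_DISABLE_WRITE (0x2), which is the name
assumed below.

```c
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef PKEY_DISABLE_WRITE
#define PKEY_DISABLE_WRITE 0x2	/* assumption: value from the uapi headers */
#endif

/*
 * Try to allocate a pkey with the given initial rights.
 * Returns the key, or -1 if none is available (no keys left, pkeys
 * unsupported, or a kernel without the syscall).
 */
int try_alloc_pkey(unsigned long init_rights)
{
#ifdef SYS_pkey_alloc
	long pkey = syscall(SYS_pkey_alloc, 0UL, init_rights);

	return pkey < 0 ? -1 : (int)pkey;
#else
	(void)init_rights;
	errno = ENOSYS;
	return -1;
#endif
}

/* Release a key obtained from try_alloc_pkey(). Returns 0 on success. */
int free_pkey(int pkey)
{
#ifdef SYS_pkey_free
	return (int)syscall(SYS_pkey_free, pkey);
#else
	(void)pkey;
	errno = ENOSYS;
	return -1;
#endif
}
```

A caller that gets -1 back should carry on without pkey protection, for
example by falling back to plain mprotect().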

This allocation mechanism could be implemented in userspace.
Even if we did it in userspace, we would still need additional
user/kernel interfaces to tell userspace which keys are being
used by the kernel internally (such as for execute-only
mappings).  Having the kernel provide this facility completely
removes the need for these additional interfaces, or having an
implementation of this in userspace at all.

Note that we have to make changes to all of the architectures
that do not use mman-common.h because we use the new
PKEY_DENY_ACCESS/WRITE macros in arch-independent code.

1. PKRU is the Protection Key Rights User register.  It is a
   usermode-accessible register that controls whether writes
   and/or access to each individual pkey is allowed or denied.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: linux-arch@vger.kernel.org
Cc: Dave Hansen <dave@sr71.net>
Cc: arnd@arndb.de
Cc: linux-api@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: luto@kernel.org
Cc: akpm@linux-foundation.org
Cc: torvalds@linux-foundation.org
Link: http://lkml.kernel.org/r/20160729163015.444FE75F@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-09 13:02:27 +02:00
Linus Torvalds
4305f42401 Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus
Pull MIPS updates from Ralf Baechle:
 "This is the main pull request for MIPS for 4.8.  Also included is a
  minor SSB cleanup as SSB code traditionally is merged through the MIPS
  tree:

  ATH25:
    - MIPS: Add default configuration for ath25

  Boot:
    - For zboot, copy appended dtb to the end of the kernel
    - store the appended dtb address in a variable

  BPF:
    - Fix off by one error in offset allocation

  Cobalt code:
    - Fix typos

  Core code:
    - debugfs_create_file returns NULL on error, so don't use IS_ERR for
      testing for errors.
    - Fix double locking issue in RM7000 S-cache code.  This would only
      affect RM7000 ARC systems on reboot.
    - Fix page table corruption on THP permission changes.
    - Use compat_sys_keyctl for 32 bit userspace on 64 bit kernels.
      David says, there are no compatibility issues raised by this fix.
    - Move some signal code around.
    - Rewrite r4k count/compare clockevent device registration such that
      min_delta_ticks/max_delta_ticks files are guaranteed to be
      initialized.
    - Only register r4k count/compare as clockevent device if we can
      assume the clock to be constant.
    - Fix MSA asm warnings in control reg accessors
    - uasm and tlbex fixes and tweaking.
    - Print segment physical address when EU=1.
    - Define AT_VECTOR_SIZE_ARCH for ARCH_DLINFO.
    - CP: Allow booting by VP other than VP 0
    - Cache handling fixes and optimizations for r4k class caches
    - Add hotplug support for R6 processors
    - Cleanup hotplug bits in kconfig
    - traps: return correct si code for accessing nonmapped addresses
    - Remove cpu_has_safe_index_cacheops

  Lantiq:
    - Register IRQ handler for virtual IRQ number
    - Fix EIU interrupt loading code
    - Use the real EXIN count
    - Fix build error.

  Loongson 3:
    - Increase HPET_MIN_PROG_DELTA and decrease HPET_MIN_CYCLES

  Octeon:
    - Delete built-in DTB pruning code for D-Link DSR-1000N.
    - Clean up GPIO definitions in dlink_dsr-1000n.dts.
    - Add more LEDs to the DSR-1000N DTS
    - Fix off by one in octeon_irq_gpio_map()
    - Typo fixes
    - Enable SATA by default in cavium_octeon_defconfig
    - Support readq/writeq()
    - Remove forced mappings of USB interrupts.
    - Ensure DMA descriptors are always in the low 4GB
    - Improve USB reset code for OCTEON II.

  Pistachio:
    - Add maintainers entry for pistachio SoC Support
    - Remove plat_setup_iocoherency

  Ralink:
    - Fix pwm UART in spis group pinmux.

  SSB:
    - Change bare unsigned to unsigned int to suit coding style

  Tools:
    - Fix reloc tool compiler warnings.

  Other:
    - Delete use of ARCH_WANT_OPTIONAL_GPIOLIB"

* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: (61 commits)
  MIPS: mm: Fix definition of R6 cache instruction
  MIPS: tools: Fix relocs tool compiler warnings
  MIPS: Cobalt: Fix typo
  MIPS: Octeon: Fix typo
  MIPS: Lantiq: Fix build failure
  MIPS: Use CPHYSADDR to implement mips32 __pa
  MIPS: Octeon: Dlink_dsr-1000n.dts: add more leds.
  MIPS: Octeon: Clean up GPIO definitions in dlink_dsr-1000n.dts.
  MIPS: Octeon: Delete built-in DTB pruning code for D-Link DSR-1000N.
  MIPS: store the appended dtb address in a variable
  MIPS: ZBOOT: copy appended dtb to the end of the kernel
  MIPS: ralink: fix spis group pinmux
  MIPS: Factor o32 specific code into signal_o32.c
  MIPS: non-exec stack & heap when non-exec PT_GNU_STACK is present
  MIPS: Use per-mm page to execute branch delay slot instructions
  MIPS: Modify error handling
  MIPS: c-r4k: Use SMP calls for CM indexed cache ops
  MIPS: c-r4k: Avoid small flush_icache_range SMP calls
  MIPS: c-r4k: Local flush_icache_range cache op override
  MIPS: c-r4k: Split r4k_flush_kernel_vmap_range()
  ...
2016-08-06 09:13:11 -04:00
James Hogan
233b2ca181 MIPS: Define AT_VECTOR_SIZE_ARCH for ARCH_DLINFO
AT_VECTOR_SIZE_ARCH should be defined with the maximum number of
NEW_AUX_ENT entries that ARCH_DLINFO can contain, but it wasn't defined
for MIPS at all even though ARCH_DLINFO will contain one NEW_AUX_ENT for
the VDSO address.

This shouldn't be a problem as AT_VECTOR_SIZE_BASE includes space for
AT_BASE_PLATFORM which MIPS doesn't use, but let's define it now and add
the comment above ARCH_DLINFO as found in several other architectures to
remind future modifiers of ARCH_DLINFO to keep AT_VECTOR_SIZE_ARCH up to
date.

Fixes: ebb5e78cc6 ("MIPS: Initial implementation of a VDSO")
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: linux-mips@linux-mips.org
Cc: linux-arch@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/13823/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2016-07-28 12:06:16 +02:00
Paul Burton
1b49260006 MIPS: inst.h: Rename cbcond{0,1}_op to pop{1,3}0_op
The opcodes currently defined in inst.h as cbcond0_op & cbcond1_op are
actually defined in the MIPS base instruction set manuals as pop10 &
pop30 respectively. Rename them as such, for consistency with the
documentation.

Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-05 16:09:07 +02:00
Paul Burton
1c66b79bb3 MIPS: inst.h: Rename b{eq,ne}zcji[al]c_op to pop{6,7}6_op
The opcodes currently defined in inst.h as beqzcjic_op & bnezcjialc_op
are actually defined in the MIPS base instruction set manuals as pop66 &
pop76 respectively. Rename them as such, for consistency with the
documentation.

Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-05 16:08:59 +02:00
James Hogan
6f63405cb6 MIPS: uasm: Add r6 MUL encoding
Add the R6 MUL instruction encoding for 3 operand signed multiply to
uasm so that KVM can use uasm for generating its entry point code at
runtime on R6.

Signed-off-by: James Hogan <james.hogan@imgtec.com>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-05 16:08:40 +02:00
James Hogan
9f730a60e5 MIPS: uasm: Add MTHI/MTLO instructions
Add MTHI/MTLO instructions for writing to the hi & lo registers to uasm
so that KVM can use uasm for generating its entry point code at runtime.

Signed-off-by: James Hogan <james.hogan@imgtec.com>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-05 16:08:35 +02:00
James Hogan
61c64cf99a MIPS: uasm: Add DI instruction
Add DI instruction for disabling interrupts to uasm so that KVM can use
uasm for generating its entry point code at runtime.

Signed-off-by: James Hogan <james.hogan@imgtec.com>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-05 16:08:29 +02:00
James Hogan
59e3559f48 MIPS: uasm: Add CFCMSA/CTCMSA instructions
Add CFCMSA/CTCMSA instructions for accessing MSA control registers to
uasm so that KVM can use uasm for generating its entry point code at
runtime.

Signed-off-by: James Hogan <james.hogan@imgtec.com>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-05 16:08:20 +02:00