Commit Graph

7588 Commits

Author SHA1 Message Date
Jim Mattson
1aa561b1a4 kvm: x86: Add "last CPU" to some KVM_EXIT information
More often than not, a failed VM-entry in an x86 production
environment is induced by a defective CPU. To help identify the bad
hardware, include the id of the last logical CPU to run a vCPU in the
information provided to userspace on a KVM exit for failed VM-entry or
for KVM internal errors not associated with emulation. The presence of
this additional information is indicated by a new capability,
KVM_CAP_LAST_CPU.

Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Oliver Upton <oupton@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
Message-Id: <20200603235623.245638-5-jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-08 16:21:45 -04:00
David S. Miller
e80a07b244 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter/IPVS updates for net-next

The following patchset contains Netfilter updates for net-next:

1) Support for rejecting packets from the prerouting chain, from
   Laura Garcia Liebana.

2) Remove useless assignment in pipapo, from Stefano Brivio.

3) On demand hook registration in IPVS, from Julian Anastasov.

4) Expire IPVS connection from process context to not overload
   timers, also from Julian.

5) Fallback to conntrack TCP tracker to handle connection reuse
   in IPVS, from Julian Anastasov.

6) Several patches to support for chain bindings.

7) Expose enum nft_chain_flags through UAPI.

8) Reject unsupported chain flags from the netlink control plane.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-08 12:42:40 -07:00
Matias Bjørling
82394db738 block: add capacity field to zone descriptors
In the zoned storage model, the sectors within a zone are typically all
writeable. With the introduction of the Zoned Namespace (ZNS) Command
Set in the NVM Express organization, the model was extended to have a
specific writeable capacity.

Extend the zone descriptor data structure with a zone capacity field to
indicate to the user how many sectors in a zone are writeable.

Introduce backward compatibility in the zone report ioctl by extending
the zone report header data structure with a flags field to indicate if
the capacity field is available.

Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Javier González <javier.gonz@samsung.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Matias Bjørling <matias.bjorling@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-08 16:16:19 +02:00
Stanislav Fomichev
f5836749c9 bpf: Add BPF_CGROUP_INET_SOCK_RELEASE hook
Sometimes it's handy to know when the socket gets freed. In
particular, we'd like to try to use a smarter allocation of
ports for bpf_bind and explore the possibility of limiting
the number of SOCK_DGRAM sockets the process can have.

Implement BPF_CGROUP_INET_SOCK_RELEASE hook that triggers on
inet socket release. It triggers only for userspace sockets
(not in-kernel ones) and therefore has the same semantics as
the existing BPF_CGROUP_INET_SOCK_CREATE.

Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200706230128.4073544-2-sdf@google.com
2020-07-08 01:03:31 +02:00
Daniel Lezcano
1ce50e7d40 thermal: core: genetlink support for events/cmd/sampling
Initially the thermal framework had a very simple notification
mechanism to send generic netlink messages to the userspace.

The notification function was never called from anywhere and the
corresponding dead code was removed. It was probably a first attempt
to introduce the netlink notification.

At LPC2018, the presentation "Linux thermal: User kernel interface",
proposed to create the notifications to the userspace via a kfifo.

The advantage of the kfifo is the performance. It is usually used from
a 1:1 communication channel where a driver captures data and sends it
as fast as possible to a userspace process.

The drawback is that only one process uses the notification channel
exclusively, thus no other process is allowed to use the channel to
get temperature or notifications.

This patch defines a generic netlink API to discover the current
thermal setup and adds event notifications as well as temperature
sampling. As any genetlink protocol, it can evolve and the versioning
allows to keep the backward compatibility.

In order to prevent the user from getting flooded with data on a
single channel, there are two multicast channels, one for the
temperature sampling when the thermal zone is updated and another one
for the events, so the user can get the events only without the
thermal zone temperature sampling.

Also, a list of commands to discover the thermal setup is added and
can be extended when needed.

Reviewed-by: Amit Kucheria <amit.kucheria@linaro.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Zhang Rui <rui.zhang@intel.com>
Link: https://lore.kernel.org/r/20200706105538.2159-3-daniel.lezcano@linaro.org
2020-07-07 15:55:21 +02:00
Xu Yilun
09d8615014 fpga: dfl: afu: add AFU interrupt support
AFU (Accelerated Function Unit) is dynamic region of the DFL based FPGA,
and always defined by users. Some DFL based FPGA cards allow users to
implement their own interrupts in AFU. In order to support this,
hardware implements a new UINT (AFU Interrupt) private feature with
related capability register which describes the number of supported
AFU interrupts as well as the local index of the interrupts for
software enumeration, and from software side, driver follows the common
DFL interrupt notification and handling mechanism, and it implements
two ioctls below for user to query number of irqs supported and set/unset
interrupt triggers.

 Ioctls:
 * DFL_FPGA_PORT_UINT_GET_IRQ_NUM
   get the number of irqs, which is used to determine how many interrupts
   UINT feature supports.

 * DFL_FPGA_PORT_UINT_SET_IRQ
   set/unset eventfds as AFU interrupt triggers.

Signed-off-by: Luwei Kang <luwei.kang@intel.com>
Signed-off-by: Wu Hao <hao.wu@intel.com>
Signed-off-by: Xu Yilun <yilun.xu@intel.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Acked-by: Wu Hao <hao.wu@intel.com>
Signed-off-by: Moritz Fischer <mdf@kernel.org>
2020-07-06 21:37:08 -07:00
Xu Yilun
d43f20bae5 fpga: dfl: fme: add interrupt support for global error reporting
Error reporting interrupt is very useful to notify users that some
errors are detected by the hardware. Once users are notified, they
could query hardware logged error states, no need to continuously
poll on these states.

This patch adds interrupt support for fme global error reporting sub
feature. It follows the common DFL interrupt notification and handling
mechanism. And it implements two ioctls below for user to query
number of irqs supported, and set/unset interrupt triggers.

 Ioctls:
 * DFL_FPGA_FME_ERR_GET_IRQ_NUM
   get the number of irqs, which is used to determine whether/how many
   interrupts fme error reporting feature supports.

 * DFL_FPGA_FME_ERR_SET_IRQ
   set/unset given eventfds as fme error reporting interrupt triggers.

Signed-off-by: Luwei Kang <luwei.kang@intel.com>
Signed-off-by: Wu Hao <hao.wu@intel.com>
Signed-off-by: Xu Yilun <yilun.xu@intel.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Acked-by: Wu Hao <hao.wu@intel.com>
Signed-off-by: Moritz Fischer <mdf@kernel.org>
2020-07-06 21:35:42 -07:00
Xu Yilun
fe6a3d6521 fpga: dfl: afu: add interrupt support for port error reporting
Error reporting interrupt is very useful to notify users that some
errors are detected by the hardware. Once users are notified, they
could query hardware logged error states, no need to continuously
poll on these states.

This patch adds interrupt support for port error reporting sub feature.
It follows the common DFL interrupt notification and handling mechanism,
implements two ioctl commands below for user to query number of irqs
supported, and set/unset interrupt triggers.

 Ioctls:
 * DFL_FPGA_PORT_ERR_GET_IRQ_NUM
   get the number of irqs, which is used to determine whether/how many
   interrupts error reporting feature supports.

 * DFL_FPGA_PORT_ERR_SET_IRQ
   set/unset given eventfds as error interrupt triggers.

Signed-off-by: Luwei Kang <luwei.kang@intel.com>
Signed-off-by: Wu Hao <hao.wu@intel.com>
Signed-off-by: Xu Yilun <yilun.xu@intel.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Acked-by: Wu Hao <hao.wu@intel.com>
Signed-off-by: Moritz Fischer <mdf@kernel.org>
2020-07-06 21:34:46 -07:00
Tiezhu Yang
ea1be1e59b serial: Remove duplicated macro definition of port type
There exists the same macro definition of port type from 0 to 13
in include/uapi/linux/serial.h, remove these duplicated code in
include/uapi/linux/serial_core.h which includes the former header.

Acked-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Link: https://lore.kernel.org/r/1588853015-28392-1-git-send-email-yangtiezhu@loongson.cn
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-06 14:06:08 +02:00
David S. Miller
f91c031e65 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2020-07-04

The following pull-request contains BPF updates for your *net-next* tree.

We've added 73 non-merge commits during the last 17 day(s) which contain
a total of 106 files changed, 5233 insertions(+), 1283 deletions(-).

The main changes are:

1) bpftool ability to show PIDs of processes having open file descriptors
   for BPF map/program/link/BTF objects, relying on BPF iterator progs
   to extract this info efficiently, from Andrii Nakryiko.

2) Addition of BPF iterator progs for dumping TCP and UDP sockets to
   seq_files, from Yonghong Song.

3) Support access to BPF map fields in struct bpf_map from programs
   through BTF struct access, from Andrey Ignatov.

4) Add a bpf_get_task_stack() helper to be able to dump /proc/*/stack
   via seq_file from BPF iterator progs, from Song Liu.

5) Make SO_KEEPALIVE and related options available to bpf_setsockopt()
   helper, from Dmitry Yakunin.

6) Optimize BPF sk_storage selection of its caching index, from Martin
   KaFai Lau.

7) Removal of redundant synchronize_rcu()s from BPF map destruction which
   has been a historic leftover, from Alexei Starovoitov.

8) Several improvements to test_progs to make it easier to create a shell
   loop that invokes each test individually which is useful for some CIs,
   from Jesper Dangaard Brouer.

9) Fix bpftool prog dump segfault when compiled without skeleton code on
   older clang versions, from John Fastabend.

10) Bunch of cleanups and minor improvements, from various others.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-04 17:48:34 -07:00
Pablo Neira Ayuso
c1f79a2eef netfilter: nf_tables: reject unsupported chain flags
Bail out if userspace sends unsupported chain flags.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-07-04 02:51:28 +02:00
Pablo Neira Ayuso
d0e2c7de92 netfilter: nf_tables: add NFT_CHAIN_BINDING
This new chain flag specifies that:

* the kernel dynamically allocates the chain name, if no chain name
  is specified.

* If the immediate expression that refers to this chain is removed,
  then this bound chain (and its content) is destroyed.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-07-04 01:22:14 +02:00
Pablo Neira Ayuso
67c49de4ad netfilter: nf_tables: expose enum nft_chain_flags through UAPI
This enum definition was never exposed through UAPI. Rename
NFT_BASE_CHAIN to NFT_CHAIN_BASE for consistency.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-07-04 01:18:41 +02:00
Pablo Neira Ayuso
51d70f181f netfilter: nf_tables: add NFTA_VERDICT_CHAIN_ID attribute
This netlink attribute allows you to identify the chain to jump/goto by
means of the chain ID.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-07-04 01:18:41 +02:00
Pablo Neira Ayuso
837830a4b4 netfilter: nf_tables: add NFTA_RULE_CHAIN_ID attribute
This new netlink attribute allows you to add rules to chains by the
chain ID.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-07-04 01:18:41 +02:00
Pablo Neira Ayuso
74cccc3d38 netfilter: nf_tables: add NFTA_CHAIN_ID attribute
This netlink attribute allows you to refer to chains inside a
transaction as an alternative to the name and the handle. The chain
binding support requires this new chain ID approach.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-07-04 01:18:41 +02:00
Horatiu Vultur
36a8e8e265 bridge: Extend br_fill_ifinfo to return MPR status
This patch extends the function br_fill_ifinfo to return also the MRP
status for each instance on a bridge. It also adds a new filter
RTEXT_FILTER_MRP to return the MRP status only when this is set, not to
interfer with the vlans. The MRP status is return only on the bridge
interfaces.

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-02 14:19:15 -07:00
Horatiu Vultur
e4266b991f bridge: uapi: mrp: Extend MRP attributes to get the status
Add MRP attribute IFLA_BRIDGE_MRP_INFO to allow the userspace to get the
current state of the MRP instances. This is a nested attribute that
contains other attributes like, ring id, index of primary and secondary
port, priority, ring state, ring role.

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-02 14:19:15 -07:00
Peter Zijlstra
78c2141b65 Merge branch 'perf/vlbr' 2020-07-02 15:51:48 +02:00
Song Liu
fa28dcb82a bpf: Introduce helper bpf_get_task_stack()
Introduce helper bpf_get_task_stack(), which dumps stack trace of given
task. This is different to bpf_get_stack(), which gets stack track of
current task. One potential use case of bpf_get_task_stack() is to call
it from bpf_iter__task and dump all /proc/<pid>/stack to a seq_file.

bpf_get_task_stack() uses stack_trace_save_tsk() instead of
get_perf_callchain() for kernel stack. The benefit of this choice is that
stack_trace_save_tsk() doesn't require changes in arch/. The downside of
using stack_trace_save_tsk() is that stack_trace_save_tsk() dumps the
stack trace to unsigned long array. For 32-bit systems, we need to
translate it to u64 array.

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200630062846.664389-3-songliubraving@fb.com
2020-07-01 08:23:19 -07:00
David S. Miller
e708e2bd55 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:

====================
pull-request: bpf 2020-06-30

The following pull-request contains BPF updates for your *net* tree.

We've added 28 non-merge commits during the last 9 day(s) which contain
a total of 35 files changed, 486 insertions(+), 232 deletions(-).

The main changes are:

1) Fix an incorrect verifier branch elimination for PTR_TO_BTF_ID pointer
   types, from Yonghong Song.

2) Fix UAPI for sockmap and flow_dissector progs that were ignoring various
   arguments passed to BPF_PROG_{ATTACH,DETACH}, from Lorenz Bauer & Jakub Sitnicki.

3) Fix broken AF_XDP DMA hacks that are poking into dma-direct and swiotlb
   internals and integrate it properly into DMA core, from Christoph Hellwig.

4) Fix RCU splat from recent changes to avoid skipping ingress policy when
   kTLS is enabled, from John Fastabend.

5) Fix BPF ringbuf map to enforce size to be the power of 2 in order for its
   position masking to work, from Andrii Nakryiko.

6) Fix regression from CAP_BPF work to re-allow CAP_SYS_ADMIN for loading
   of network programs, from Maciej Żenczykowski.

7) Fix libbpf section name prefix for devmap progs, from Jesper Dangaard Brouer.

8) Fix formatting in UAPI documentation for BPF helpers, from Quentin Monnet.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-30 14:20:45 -07:00
David S. Miller
d9b8b9845f This feature/cleanup patchset includes the following patches:
- bump version strings, by Simon Wunderlich
 
  - update mailing list URL, by Sven Eckelmann
 
  - fix typos and grammar in documentation, by Sven Eckelmann
 
  - introduce a configurable per interface hop penalty,
    by Linus Luessing
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEE1ilQI7G+y+fdhnrfoSvjmEKSnqEFAl769w8WHHN3QHNpbW9u
 d3VuZGVybGljaC5kZQAKCRChK+OYQpKeoatoEACN9O+fCwDIlIdxw/bUlGFw1J+V
 4rDReo5grIkl3BSdnLkkaPXTcazCWRoFhCdH6esTsF4/R/nwUa5AZuAbVd2NnCFU
 H+hVc7q/w0Hqppx5APT0AghgLcp8mFuHV9jqtJOjv9Kvd05Il6XwGkALT8QX2e2P
 ypMD+PBuQT1XCMl0v02lA0ki2TxvCtFYs1UznbAdl86GOGMua3t6fuRWQ2fpUMou
 t9251NnwSvSyeYuI8Zws7Wza2S4FOilX/CAjjGi02D0xf7T/JDs3FiPd6obdNfe/
 4j6R2dHWjxz1sb6bEdM3wfmUlU4StMQ+Vywpu1fh3gihLc2tmtMZhn6tJk3YjALt
 zlwDO2p/si7FLrizfMS7mUesZJH7Ji79BZtZWXsdW3p1zU1jlfJh5dc2ZaKwL9qk
 MSpd9DH2F459YC6ELcITLnhCymc7O/g3n3voNnSxi7uX4M5LsGt0mcx6g/iiFN16
 nPNHzXo7WJjgRcmLEQ3kYeD7Kjqh5XEogKLnwhThBxCFIQkNFF3OYqWtJyj5Sg5o
 +EredOAcq1Rc5CR//V43rQZLk4o1I9qQysKFqbjHohMIgLccz+47QAeBXrhku0ZT
 XU6k09Q0pkuo3iVPYnIUTDUkHtvFtsCKR98+8tB5xDR5F155OpFg7AXaKIx5Iym8
 4/eGrx22T+6TIeSGGg==
 =ceu6
 -----END PGP SIGNATURE-----

Merge tag 'batadv-next-for-davem-20200630' of git://git.open-mesh.org/linux-merge

Simon Wunderlich says:

====================
This feature/cleanup patchset includes the following patches:

 - bump version strings, by Simon Wunderlich

 - update mailing list URL, by Sven Eckelmann

 - fix typos and grammar in documentation, by Sven Eckelmann

 - introduce a configurable per interface hop penalty,
   by Linus Luessing
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-30 12:59:15 -07:00
Merlijn Wajer
c463bb2a8f Input: add SW_MACHINE_COVER
This event code represents the state of a removable cover of a device.
Value 0 means that the cover is open or removed, value 1 means that the
cover is closed.

Reviewed-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Acked-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Merlijn Wajer <merlijn@wizzup.org>
Link: https://lore.kernel.org/r/20200612125402.18393-2-merlijn@wizzup.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2020-06-30 12:06:49 -07:00
Jean-Philippe Brucker
970471914c iommu: Allow page responses without PASID
Some PCIe devices do not expect a PASID value in PRI Page Responses.
If the "PRG Response PASID Required" bit in the PRI capability is zero,
then the OS should not set the PASID field. Similarly on Arm SMMU,
responses to stall events do not have a PASID.

Currently iommu_page_response() systematically checks that the PASID in
the page response corresponds to the one in the page request. This can't
work with virtualization because a page response coming from a guest OS
won't have a PASID if the passed-through device does not require one.

Add a flag to page requests that declares whether the corresponding
response needs to have a PASID. When this flag isn't set, allow page
responses without PASID.

Reported-by: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Link: https://lore.kernel.org/r/20200616144712.748818-1-jean-philippe@linaro.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2020-06-30 11:49:21 +02:00
Amit Cohen
ecc31c6024 ethtool: Add link extended state
Currently, drivers can only tell whether the link is up/down using
LINKSTATE_GET, but no additional information is given.

Add attributes to LINKSTATE_GET command in order to allow drivers
to expose the user more information in addition to link state to ease
the debug process, for example, reason for link down state.

Extended state consists of two attributes - link_ext_state and
link_ext_substate. The idea is to avoid 'vendor specific' states in order
to prevent drivers to use specific link_ext_state that can be in the future
common link_ext_state.

The substates allows drivers to add more information to the common
link_ext_state. For example, vendor can expose 'Autoneg' as link_ext_state
and add 'No partner detected during force mode' as link_ext_substate.

If a driver cannot pinpoint the extended state with the substate
accuracy, it is free to expose only the extended state and omit the
substate attribute.

Signed-off-by: Amit Cohen <amitc@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-29 17:45:02 -07:00
Petr Machata
aee9caa03f net: sched: sch_red: Add qevents "early_drop" and "mark"
In order to allow acting on dropped and/or ECN-marked packets, add two new
qevents to the RED qdisc: "early_drop" and "mark". Filters attached at
"early_drop" block are executed as packets are early-dropped, those
attached at the "mark" block are executed as packets are ECN-marked.

Two new attributes are introduced: TCA_RED_EARLY_DROP_BLOCK with the block
index for the "early_drop" qevent, and TCA_RED_MARK_BLOCK for the "mark"
qevent. Absence of these attributes signifies "don't care": no block is
allocated in that case, or the existing blocks are left intact in case of
the change callback.

For purposes of offloading, blocks attached to these qevents appear with
newly-introduced binder types, FLOW_BLOCK_BINDER_TYPE_RED_EARLY_DROP and
FLOW_BLOCK_BINDER_TYPE_RED_MARK.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-29 17:08:28 -07:00
Greg Kroah-Hartman
9cf6ffae38 Merge 5.8-rc3 into usb-next
We want the USB fixes in here, and this resolves a merge issue found in
linux-next.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-06-29 08:22:27 +02:00
Martin
fe80536acf bareudp: Added attribute to enable & disable rx metadata collection
Metadata need not be collected in receive if the packet from bareudp
device is not targeted to openvswitch.

Signed-off-by: Martin <martin.varghese@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-28 20:48:20 -07:00
Xu Yilun
322b598be4 fpga: dfl: introduce interrupt trigger setting API
FPGA user applications may be interested in interrupts generated by
DFL features. For example, users can implement their own FPGA
logics with interrupts enabled in AFU (Accelerated Function Unit,
dynamic region of DFL based FPGA). So user applications need to be
notified to handle these interrupts.

In order to allow userspace applications to monitor interrupts,
driver requires userspace to provide eventfds as interrupt
notification channels. Applications then poll/select on the eventfds
to get notified.

This patch introduces a generic helper functions to do eventfds binding
with given interrupts.

Sub feature drivers are expected to use XXX_GET_IRQ_NUM to query irq
info, and XXX_SET_IRQ to set eventfds for interrupts. This patch also
introduces helper functions for these 2 ioctls.

Signed-off-by: Luwei Kang <luwei.kang@intel.com>
Signed-off-by: Wu Hao <hao.wu@intel.com>
Signed-off-by: Xu Yilun <yilun.xu@intel.com>
Signed-off-by: Tom Rix <trix@redhat.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Acked-by: Wu Hao <hao.wu@intel.com>
Signed-off-by: Moritz Fischer <mdf@kernel.org>
2020-06-28 12:43:16 -07:00
Linus Torvalds
c322f5399f VFIO fixes for v5.8-rc3
- Fix double free of eventfd ctx (Alex Williamson)
 
  - Fix duplicate use of capability ID (Alex Williamson)
 
  - Fix SR-IOV VF memory enable handling (Alex Williamson)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.14 (GNU/Linux)
 
 iQIcBAABAgAGBQJe91WIAAoJECObm247sIsiYxUP+wSMBfCKLaI/DWnfylBFGWXP
 2RKFhPYHBr7PYu3PS35UUK5mddpCAxjXjjrmSFy7VLbwGv1pBQylCqVdYO+z/P83
 x4ciasUxIN96Y14IBctZ18XCcv5KKsEjoyQYxtpdnShmMMBrLfmM6kqQmsEmYlod
 VkLYYJx2GcX+hz7ThZxh0FcemBv8I6kdkxAX5/xM7X07vu3xpA/Z5qbjREfNaNWS
 PSmBuneEB+4JVeJoyOffIwCC7wtPL2gNCdS3AeIWLHrLTUb3JK+cPxYaqzj3nfHg
 FipOcoN5q0R4Wc1sZRmvkn9cBE2qLDPwwdtNL8NApZpsOeA05/QwyEe/X/0wNqaF
 rqhjnQYTQ6I6yqmg7APiO7vTz7YAsplJtWoxSmsT2oLPn93ObCeDunDn/pBYuRWF
 tdc5G74ewOA/ee1jJqsMx9xK8U0I+pQQ9ig/opROEutV1qmG57/13tusAaSvNGy4
 OjFYx4LcR0zGjSF7YKFXs5EVPLt4EYLb4xnNKqunedlNvtjItuEhonSs8rpMKvK/
 MC/Aa3xA3RtbDOhnAH20PRYuGFzExUWosUVXdDfdViVZ+S6L2ZGfVocJUm6fF/WY
 zxDLQC9OaenXhHNop2egRqbZTTvvzI3O78sFGhDhBLdJa/gQbDh9Asy9n/0+BpX2
 TyW8LMfNzi70bX1LG64p
 =kuUL
 -----END PGP SIGNATURE-----

Merge tag 'vfio-v5.8-rc3' of git://github.com/awilliam/linux-vfio

Pull VFIO fixes from Alex Williamson:

 - Fix double free of eventfd ctx (Alex Williamson)

 - Fix duplicate use of capability ID (Alex Williamson)

 - Fix SR-IOV VF memory enable handling (Alex Williamson)

* tag 'vfio-v5.8-rc3' of git://github.com/awilliam/linux-vfio:
  vfio/pci: Fix SR-IOV VF handling with MMIO blocking
  vfio/type1: Fix migration info capability ID
  vfio/pci: Clear error and request eventfd ctx after releasing
2020-06-27 15:17:21 -07:00
Linus Torvalds
6a6c9b220a drm fixes for v5.8-rc3
core:
 - fix VT registration regression
 
 ttm:
 - fix two fence leaks
 
 amdgpu:
 - Fix missed mutex unlock in DC error path
 - Fix firmware leak for sdma5
 - DC bpc property fixes
 
 amdkfd:
 - Fix memleak in an error path
 
 radeon:
 - Fix copy paste typo in NI DPM spll validation
 
 rcar-du:
 - build fix
 
 tegra:
 - add missing zpos property
 - child driver registeration fix
 - debugfs cleanup fix
 - doc fix
 
 mcde:
 - reorder fbdev setup
 
 panel:
 - fix connector type
 - fix orienation for some panels
 
 sun4i:
 - fix dma/iommu configuration
 
 uvesafb:
 - respect blank flag
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJe9XhdAAoJEAx081l5xIa+9xEP/0xKWygNTFenNgr3cBTta882
 OrUXn4T3Kss4ZPsax6UnPUXEMnBs5/+98S2QKc1EHK7EBHBtZ2EKJqz/xSkeYaOK
 RecM0zCjJfr5ujQwUlCN7EoGbNYhBUfJWLCvvsO5/DaOf7m1p2gwoa9ElXkDB3/v
 r8zoWbkyAcWfTJrZmzJU8/h/ilVat19Z2PRpRA5lkne/dPLVJ36zF/Yi+yXZuVXN
 knU7f0fVa7u+xBxm0xMtKAD02HmLoIOnCkNj2X7vUbkxGcJK2OcRoUDk2rWel4Km
 IZkkzLKPwtEkvyD7sZNPU7EsJWKX2Kmy0PYYWRIKTwOV3JhgtOZkj2D5zbXAGzBr
 bE+tyJS0bhwuG87IeEaensZ2vM+hFzAa7tFaYNm2lYJnuJyLvYbFwABlGkTJ91VI
 ivciHzUga4yW20iSW+0udiL4ADOvgVLVtlu4GlspGUssWI6mWg7U9wu6WyX8zPE6
 j+cpMR4+x9KpmV92opU7BTiENE0Rn4AeKagOM54/th7R02TElHcKvgSnaE+JAPVz
 9B7VNWDBoPoEqPmeNWzsseNp850RMVylb/f1js3rVtjigfsrJyVIFZhdkJ0H33Iq
 4ZlOeX6wc4vvcx0OgNq7Gxt1zn1XRQqTLV+OSi4cmrABYhqNfpaInABIJi9AnZhg
 6B1xvnZ0lmSSmyWuFkP+
 =boAF
 -----END PGP SIGNATURE-----

Merge tag 'drm-fixes-2020-06-26' of git://anongit.freedesktop.org/drm/drm

Pull drm fixes from Dave Airlie:
 "Usual rc3 pickup, lots of little fixes all over.

  The core VT registration regression fix is probably the largest,
  otherwise ttm, amdgpu and tegra are the bulk, with some minor driver
  fixes.

  No i915 pull this week which may or may not mean I get 2x of it next
  week, we'll see how it goes.

  core:
   - fix VT registration regression

  ttm:
   - fix two fence leaks

  amdgpu:
   - Fix missed mutex unlock in DC error path
   - Fix firmware leak for sdma5
   - DC bpc property fixes

  amdkfd:
   - Fix memleak in an error path

  radeon:
   - Fix copy paste typo in NI DPM spll validation

  rcar-du:
   - build fix

  tegra:
   - add missing zpos property
   - child driver registeration fix
   - debugfs cleanup fix
   - doc fix

  mcde:
   - reorder fbdev setup

  panel:
   - fix connector type
   - fix orienation for some panels

  sun4i:
   - fix dma/iommu configuration

  uvesafb:
   - respect blank flag"

* tag 'drm-fixes-2020-06-26' of git://anongit.freedesktop.org/drm/drm: (25 commits)
  drm/amd: fix potential memleak in err branch
  drm/amd/display: Fix ineffective setting of max bpc property
  drm/amd/display: Enable output_bpc property on all outputs
  drm/amdgpu: add fw release for sdma v5_0
  drm/fb-helper: Fix vt restore
  drm/radeon: fix fb_div check in ni_init_smc_spll_table()
  drm/amdgpu/display: Unlock mutex on error
  drm/sun4i: mixer: Call of_dma_configure if there's an IOMMU
  drm: panel-orientation-quirks: Use generic orientation-data for Acer S1003
  drm: panel-orientation-quirks: Add quirk for Asus T101HA panel
  video: fbdev: uvesafb: fix "noblank" option handling
  drm/panel-simple: fix connector type for newhaven_nhd_43_480272ef_atxl
  drm/panel-simple: fix connector type for LogicPD Type28 Display
  drm: rcar-du: Fix build error
  drm: mcde: Fix forgotten user of drm->dev_private
  drm: mcde: Fix display initialization problem
  drm/tegra: Add zpos property for cursor planes
  gpu: host1x: Detach driver on unregister
  gpu: host1x: Correct trivial kernel-doc inconsistencies
  drm/tegra: hub: Register child devices
  ...
2020-06-26 12:27:25 -07:00
Linus Lüssing
3bda14d09d batman-adv: Introduce a configurable per interface hop penalty
In some setups multiple hard interfaces with similar link qualities
or throughput values are available. But people have expressed the desire
to consider one of them as a backup only.

Some creative solutions are currently in use: Such people are
configuring multiple batman-adv mesh/soft interfaces, wire them
together with some veth pairs and then tune the hop penalty to achieve
an effect similar to a tunable per interface hop penalty.

This patch introduces a new, configurable, per hard interface hop penalty
to simplify such setups.

Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2020-06-26 10:37:11 +02:00
Sven Eckelmann
bccb48c89f batman-adv: Fix typos and grammar in documentation
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2020-06-26 10:36:30 +02:00
Dave Airlie
687a0ed337 Short summary of fixes pull (less than what git shortlog provides):
* In mcde, set up fbdev after device registration and removde the last access
 to dev->dev_private. Fixes an error message and a segmentation fault.
 
  * Set the connector type for LogicPT Type 28 and newhaven_nhd_43_480272ef_atxl
 panels.
 
  * In uvesafb, fix the handling of the noblank option.
 
  * Fix panel orientation for Asus T101HA and Acer S1003.
 
  * Fix DMA configuration for sun4i if IOMMU is present.
 
  * Fix regression in VT restoration. Unbreaks userspace (i.e., Xorg) VT handling.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEchf7rIzpz2NEoWjlaA3BHVMLeiMFAl70X3MACgkQaA3BHVML
 eiPP+ggAnquBWHn4QYVrw4tlcI8supVJurQflcYYYBzPksiijheJAE+74nfCWwTB
 aijbyiPSp6tq8aP2NQpasW3pQdo0gQ2I9CwYzCxPTDk3X0iSxGEIToDieBTonmKn
 UrvUZCQ5L775XJASus2gH4/b34Y0zIrQfCLj1ECOOSNjUWvTLntWrbJDl8S9SISh
 teIAFZ1qAlcrTTNuAZ1lp+e+AlXEXKhi8+CqY0ltiV1q4JtO1SdOMnb5y0rxsYOy
 H/KVWbZWdWkgIrZpJ1v/+jio0Mkh0jDi1ldJ6UDxWAv5viQnExICUY1d88Qw2xVo
 73w4iJ/ujNIK3kDZQmvLMlVTjc1oSw==
 =GRqS
 -----END PGP SIGNATURE-----

Merge tag 'drm-misc-fixes-2020-06-25' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes

Short summary of fixes pull (less than what git shortlog provides):

 * In mcde, set up fbdev after device registration and removde the last access
to dev->dev_private. Fixes an error message and a segmentation fault.

 * Set the connector type for LogicPT Type 28 and newhaven_nhd_43_480272ef_atxl
panels.

 * In uvesafb, fix the handling of the noblank option.

 * Fix panel orientation for Asus T101HA and Acer S1003.

 * Fix DMA configuration for sun4i if IOMMU is present.

 * Fix regression in VT restoration. Unbreaks userspace (i.e., Xorg) VT handling.

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20200625082717.GA14856@linux-uq9g
2020-06-26 13:49:17 +10:00
David S. Miller
7bed145516 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Minor overlapping changes in xfrm_device.c, between the double
ESP trailing bug fix setting the XFRM_INIT flag and the changes
in net-next preparing for bonding encryption support.

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25 19:29:51 -07:00
Linus Torvalds
4a21185cda Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from David Miller:

 1) Don't insert ESP trailer twice in IPSEC code, from Huy Nguyen.

 2) The default crypto algorithm selection in Kconfig for IPSEC is out
    of touch with modern reality, fix this up. From Eric Biggers.

 3) bpftool is missing an entry for BPF_MAP_TYPE_RINGBUF, from Andrii
    Nakryiko.

 4) Missing init of ->frame_sz in xdp_convert_zc_to_xdp_frame(), from
    Hangbin Liu.

 5) Adjust packet alignment handling in ax88179_178a driver to match
    what the hardware actually does. From Jeremy Kerr.

 6) register_netdevice can leak in the case one of the notifiers fail,
    from Yang Yingliang.

 7) Use after free in ip_tunnel_lookup(), from Taehee Yoo.

 8) VLAN checks in sja1105 DSA driver need adjustments, from Vladimir
    Oltean.

 9) tg3 driver can sleep forever when we get enough EEH errors, fix from
    David Christensen.

10) Missing {READ,WRITE}_ONCE() annotations in various Intel ethernet
    drivers, from Ciara Loftus.

11) Fix scanning loop break condition in of_mdiobus_register(), from
    Florian Fainelli.

12) MTU limit is incorrect in ibmveth driver, from Thomas Falcon.

13) Endianness fix in mlxsw, from Ido Schimmel.

14) Use after free in smsc95xx usbnet driver, from Tuomas Tynkkynen.

15) Missing bridge mrp configuration validation, from Horatiu Vultur.

16) Fix circular netns references in wireguard, from Jason A. Donenfeld.

17) PTP initialization on recovery is not done properly in qed driver,
    from Alexander Lobakin.

18) Endian conversion of L4 ports in filters of cxgb4 driver is wrong,
    from Rahul Lakkireddy.

19) Don't clear bound device TX queue of socket prematurely otherwise we
    get problems with ktls hw offloading, from Tariq Toukan.

20) ipset can do atomics on unaligned memory, fix from Russell King.

21) Align ethernet addresses properly in bridging code, from Thomas
    Martitz.

22) Don't advertise ipv4 addresses on SCTP sockets having ipv6only set,
    from Marcelo Ricardo Leitner.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (149 commits)
  rds: transport module should be auto loaded when transport is set
  sch_cake: fix a few style nits
  sch_cake: don't call diffserv parsing code when it is not needed
  sch_cake: don't try to reallocate or unshare skb unconditionally
  ethtool: fix error handling in linkstate_prepare_data()
  wil6210: account for napi_gro_receive never returning GRO_DROP
  hns: do not cast return value of napi_gro_receive to null
  socionext: account for napi_gro_receive never returning GRO_DROP
  wireguard: receive: account for napi_gro_receive never returning GRO_DROP
  vxlan: fix last fdb index during dump of fdb with nhid
  sctp: Don't advertise IPv4 addresses if ipv6only is set on the socket
  tc-testing: avoid action cookies with odd length.
  bpf: tcp: bpf_cubic: fix spurious HYSTART_DELAY exit upon drop in min RTT
  tcp_cubic: fix spurious HYSTART_DELAY exit upon drop in min RTT
  net: dsa: sja1105: fix tc-gate schedule with single element
  net: dsa: sja1105: recalculate gating subschedule after deleting tc-gate rules
  net: dsa: sja1105: unconditionally free old gating config
  net: dsa: sja1105: move sja1105_compose_gating_subschedule at the top
  net: macb: free resources on failure path of at91ether_open()
  net: macb: call pm_runtime_put_sync on failure path
  ...
2020-06-25 18:27:40 -07:00
Rao Shoaib
4c342f778f rds: transport module should be auto loaded when transport is set
This enhancement auto loads transport module when the transport
is set via SO_RDS_TRANSPORT socket option.

Reviewed-by: Ka-Cheong Poon <ka-cheong.poon@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Rao Shoaib <rao.shoaib@oracle.com>
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25 16:26:25 -07:00
Yonghong Song
0d4fad3e57 bpf: Add bpf_skc_to_udp6_sock() helper
The helper is used in tracing programs to cast a socket
pointer to a udp6_sock pointer.
The return value could be NULL if the casting is illegal.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Cc: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/bpf/20200623230815.3988481-1-yhs@fb.com
2020-06-24 18:37:59 -07:00
Yonghong Song
478cfbdf5f bpf: Add bpf_skc_to_{tcp, tcp_timewait, tcp_request}_sock() helpers
Three more helpers are added to cast a sock_common pointer to
an tcp_sock, tcp_timewait_sock or a tcp_request_sock for
tracing programs.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200623230811.3988277-1-yhs@fb.com
2020-06-24 18:37:59 -07:00
Yonghong Song
af7ec13833 bpf: Add bpf_skc_to_tcp6_sock() helper
The helper is used in tracing programs to cast a socket
pointer to a tcp6_sock pointer.
The return value could be NULL if the casting is illegal.

A new helper return type RET_PTR_TO_BTF_ID_OR_NULL is added
so the verifier is able to deduce proper return types for the helper.

Different from the previous BTF_ID based helpers,
the bpf_skc_to_tcp6_sock() argument can be several possible
btf_ids. More specifically, all possible socket data structures
with sock_common appearing in the first in the memory layout.
This patch only added socket types related to tcp and udp.

All possible argument btf_id and return value btf_id
for helper bpf_skc_to_tcp6_sock() are pre-calculcated and
cached. In the future, it is even possible to precompute
these btf_id's at kernel build time.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200623230809.3988195-1-yhs@fb.com
2020-06-24 18:37:59 -07:00
Nikolay Aleksandrov
b5f1d9ec28 net: bridge: add a flag to avoid refreshing fdb when changing/adding
When we modify or create a new fdb entry sometimes we want to avoid
refreshing its activity in order to track it properly. One example is
when a mac is received from EVPN multi-homing peer by FRR, which doesn't
want to change local activity accounting. It makes it static and sets a
flag to track its activity.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-24 14:36:33 -07:00
Nikolay Aleksandrov
31cbc39b63 net: bridge: add option to allow activity notifications for any fdb entries
This patch adds the ability to notify about activity of any entries
(static, permanent or ext_learn). EVPN multihoming peers need it to
properly and efficiently handle mac sync (peer active/locally active).
We add a new NFEA_ACTIVITY_NOTIFY attribute which is used to dump the
current activity state and to control if static entries should be monitored
at all. We use 2 bits - one to activate fdb entry tracking (disabled by
default) and the second to denote that an entry is inactive. We need
the second bit in order to avoid multiple notifications of inactivity.
Obviously this makes no difference for dynamic entries since at the time
of inactivity they get deleted, while the tracked non-dynamic entries get
the inactive bit set and get a notification.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-24 14:36:33 -07:00
Nikolay Aleksandrov
899426b3bd net: neighbor: add fdb extended attribute
Add an attribute to NDA which will contain all future fdb-specific
attributes in order to avoid polluting the NDA namespace with e.g.
bridge or vxlan specific attributes. The attribute is called
NDA_FDB_EXT_ATTRS and the structure would look like:
 [NDA_FDB_EXT_ATTRS] = {
    [NFEA_xxx]
 }

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-24 14:36:33 -07:00
Daniel Vetter
dc5bdb68b5 drm/fb-helper: Fix vt restore
In the past we had a pile of hacks to orchestrate access between fbdev
emulation and native kms clients. We've tried to streamline this, by
always preferring the kms side above fbdev calls when a drm master
exists, because drm master controls access to the display resources.

Unfortunately this breaks existing userspace, specifically Xorg. When
exiting Xorg first restores the console to text mode using the KDSET
ioctl on the vt. This does nothing, because a drm master is still
around. Then it drops the drm master status, which again does nothing,
because logind is keeping additional drm fd open to be able to
orchestrate vt switches. In the past this is the point where fbdev was
restored, as part of the ->lastclose hook on the drm side.

Now to fix this regression we don't want to go back to letting fbdev
restore things whenever it feels like, or to the pile of hacks we've
had before. Instead try and go with a minimal exception to make the
KDSET case work again, and nothing else.

This means that if userspace does a KDSET call when switching between
graphical compositors, there will be some flickering with fbcon
showing up for a bit. But a) that's not a regression and b) userspace
can fix it by improving the vt switching dance - logind should have
all the information it needs.

While pondering all this I'm also wondering wheter we should have a
SWITCH_MASTER ioctl to allow race-free master status handover. But
that's for another day.

v2: Somehow forgot to cc all the fbdev people.

v3: Fix typo Alex spotted.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=208179
Cc: shlomo@fastmail.com
Reported-and-Tested-by: shlomo@fastmail.com
Cc: Michel Dänzer <michel@daenzer.net>
Fixes: 64914da24e ("drm/fbdev-helper: don't force restores")
Cc: Noralf Trønnes <noralf@tronnes.org>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Maxime Ripard <mripard@kernel.org>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: dri-devel@lists.freedesktop.org
Cc: <stable@vger.kernel.org> # v5.7+
Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Nathan Chancellor <natechancellor@gmail.com>
Cc: Qiujun Huang <hqjagain@gmail.com>
Cc: Peter Rosin <peda@axentia.se>
Cc: linux-fbdev@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200624092910.3280448-1-daniel.vetter@ffwll.ch
2020-06-24 21:34:11 +02:00
Dmitry Yakunin
f9bcf96837 bpf: Add SO_KEEPALIVE and related options to bpf_setsockopt
This patch adds support of SO_KEEPALIVE flag and TCP related options
to bpf_setsockopt() routine. This is helpful if we want to enable or tune
TCP keepalive for applications which don't do it in the userspace code.

v3:
  - update kernel-doc in uapi (Nikita Vetoshkin <nekto0n@yandex-team.ru>)

v4:
  - update kernel-doc in tools too (Alexei Starovoitov)
  - add test to selftests (Alexei Starovoitov)

Signed-off-by: Dmitry Yakunin <zeil@yandex-team.ru>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200620153052.9439-3-zeil@yandex-team.ru
2020-06-24 11:21:03 -07:00
Greg Kroah-Hartman
62fb45d317 USB: ch9: add "USB_" prefix in front of TEST defines
For some reason, the TEST_ defines in the usb/ch9.h files did not have
the USB_ prefix on it, making it a bit confusing when reading the file,
as well as not the nicest thing to do in a uapi file.

So fix that up and add the USB_ prefix on to them, and fix up all
in-kernel usages.  This included deleting the duplicate copy in the
net2272.h file.

Cc: Felipe Balbi <balbi@kernel.org>
Cc: Michal Simek <michal.simek@xilinx.com>
Cc: Mathias Nyman <mathias.nyman@intel.com>
Cc: Pawel Laszczak <pawell@cadence.com>
Cc: YueHaibing <yuehaibing@huawei.com>
Cc: Nathan Chancellor <natechancellor@gmail.com>
Cc: Jason Yan <yanaijie@huawei.com>
Cc: Jia-Ju Bai <baijiaju1990@gmail.com>
Cc: Stephen Boyd <swboyd@chromium.org>
Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Jules Irenge <jbi.octave@gmail.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Thinh Nguyen <Thinh.Nguyen@synopsys.com>
Cc: Rob Gill <rrobgill@protonmail.com>
Cc: Macpaul Lin <macpaul.lin@mediatek.com>
Acked-by: Minas Harutyunyan <hminas@synopsys.com>
Acked-by: Bin Liu <b-liu@ti.com>
Acked-by: Chunfeng Yun <chunfeng.yun@mediatek.com>
Acked-by: Peter Chen <peter.chen@nxp.com>
Link: https://lore.kernel.org/r/20200618144206.2655890-1-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-06-24 15:01:24 +02:00
Dave Jiang
0b8975bdc0 dmaengine: idxd: fix hw descriptor fields for delta record
Fix the hw descriptor fields for delta record in user exported idxd.h
header. Missing the "expected result mask" field.

Reported-by: Mona Hossain <mona.hossain@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/159120526866.65385.536565786678052944.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
2020-06-24 11:53:34 +05:30
Petr Vaněk
428d2459cc xfrm: introduce oseq-may-wrap flag
RFC 4303 in section 3.3.3 suggests to disable anti-replay for manually
distributed ICVs in which case the sender does not need to monitor or
reset the counter. However, the sender still increments the counter and
when it reaches the maximum value, the counter rolls over back to zero.

This patch introduces new extra_flag XFRM_SA_XFLAG_OSEQ_MAY_WRAP which
allows sequence number to cycle in outbound packets if set. This flag is
used only in legacy and bmp code, because esn should not be negotiated
if anti-replay is disabled (see note in 3.3.3 section).

Signed-off-by: Petr Vaněk <pv@excello.cz>
Acked-by: Christophe Gouault <christophe.gouault@6wind.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2020-06-24 07:51:01 +02:00
Quentin Monnet
bcc7f554cf bpf: Fix formatting in documentation for BPF helpers
When producing the bpf-helpers.7 man page from the documentation from
the BPF user space header file, rst2man complains:

    <stdin>:2636: (ERROR/3) Unexpected indentation.
    <stdin>:2640: (WARNING/2) Block quote ends without a blank line; unexpected unindent.

Let's fix formatting for the relevant chunk (item list in
bpf_ringbuf_query()'s description), and for a couple other functions.

Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200623153935.6215-1-quentin@isovalent.com
2020-06-23 17:57:02 -07:00
Alexei Starovoitov
4e608675e7 Merge up to bpf_probe_read_kernel_str() fix into bpf-next 2020-06-23 15:33:41 -07:00
Andrii Nakryiko
bdb7b79b4c bpf: Switch most helper return values from 32-bit int to 64-bit long
Switch most of BPF helper definitions from returning int to long. These
definitions are coming from comments in BPF UAPI header and are used to
generate bpf_helper_defs.h (under libbpf) to be later included and used from
BPF programs.

In actual in-kernel implementation, all the helpers are defined as returning
u64, but due to some historical reasons, most of them are actually defined as
returning int in UAPI (usually, to return 0 on success, and negative value on
error).

This actually causes Clang to quite often generate sub-optimal code, because
compiler believes that return value is 32-bit, and in a lot of cases has to be
up-converted (usually with a pair of 32-bit bit shifts) to 64-bit values,
before they can be used further in BPF code.

Besides just "polluting" the code, these 32-bit shifts quite often cause
problems for cases in which return value matters. This is especially the case
for the family of bpf_probe_read_str() functions. There are few other similar
helpers (e.g., bpf_read_branch_records()), in which return value is used by
BPF program logic to record variable-length data and process it. For such
cases, BPF program logic carefully manages offsets within some array or map to
read variable-length data. For such uses, it's crucial for BPF verifier to
track possible range of register values to prove that all the accesses happen
within given memory bounds. Those extraneous zero-extending bit shifts,
inserted by Clang (and quite often interleaved with other code, which makes
the issues even more challenging and sometimes requires employing extra
per-variable compiler barriers), throws off verifier logic and makes it mark
registers as having unknown variable offset. We'll study this pattern a bit
later below.

Another common pattern is to check return of BPF helper for non-zero state to
detect error conditions and attempt alternative actions in such case. Even in
this simple and straightforward case, this 32-bit vs BPF's native 64-bit mode
quite often leads to sub-optimal and unnecessary extra code. We'll look at
this pattern as well.

Clang's BPF target supports two modes of code generation: ALU32, in which it
is capable of using lower 32-bit parts of registers, and no-ALU32, in which
only full 64-bit registers are being used. ALU32 mode somewhat mitigates the
above described problems, but not in all cases.

This patch switches all the cases in which BPF helpers return 0 or negative
error from returning int to returning long. It is shown below that such change
in definition leads to equivalent or better code. No-ALU32 mode benefits more,
but ALU32 mode doesn't degrade or still gets improved code generation.

Another class of cases switched from int to long are bpf_probe_read_str()-like
helpers, which encode successful case as non-negative values, while still
returning negative value for errors.

In all of such cases, correctness is preserved due to two's complement
encoding of negative values and the fact that all helpers return values with
32-bit absolute value. Two's complement ensures that for negative values
higher 32 bits are all ones and when truncated, leave valid negative 32-bit
value with the same value. Non-negative values have upper 32 bits set to zero
and similarly preserve value when high 32 bits are truncated. This means that
just casting to int/u32 is correct and efficient (and in ALU32 mode doesn't
require any extra shifts).

To minimize the chances of regressions, two code patterns were investigated,
as mentioned above. For both patterns, BPF assembly was analyzed in
ALU32/NO-ALU32 compiler modes, both with current 32-bit int return type and
new 64-bit long return type.

Case 1. Variable-length data reading and concatenation. This is quite
ubiquitous pattern in tracing/monitoring applications, reading data like
process's environment variables, file path, etc. In such case, many pieces of
string-like variable-length data are read into a single big buffer, and at the
end of the process, only a part of array containing actual data is sent to
user-space for further processing. This case is tested in test_varlen.c
selftest (in the next patch). Code flow is roughly as follows:

  void *payload = &sample->payload;
  u64 len;

  len = bpf_probe_read_kernel_str(payload, MAX_SZ1, &source_data1);
  if (len <= MAX_SZ1) {
      payload += len;
      sample->len1 = len;
  }
  len = bpf_probe_read_kernel_str(payload, MAX_SZ2, &source_data2);
  if (len <= MAX_SZ2) {
      payload += len;
      sample->len2 = len;
  }
  /* and so on */
  sample->total_len = payload - &sample->payload;
  /* send over, e.g., perf buffer */

There could be two variations with slightly different code generated: when len
is 64-bit integer and when it is 32-bit integer. Both variations were analysed.
BPF assembly instructions between two successive invocations of
bpf_probe_read_kernel_str() were used to check code regressions. Results are
below, followed by short analysis. Left side is using helpers with int return
type, the right one is after the switch to long.

ALU32 + INT                                ALU32 + LONG
===========                                ============

64-BIT (13 insns):                         64-BIT (10 insns):
------------------------------------       ------------------------------------
  17:   call 115                             17:   call 115
  18:   if w0 > 256 goto +9 <LBB0_4>         18:   if r0 > 256 goto +6 <LBB0_4>
  19:   w1 = w0                              19:   r1 = 0 ll
  20:   r1 <<= 32                            21:   *(u64 *)(r1 + 0) = r0
  21:   r1 s>>= 32                           22:   r6 = 0 ll
  22:   r2 = 0 ll                            24:   r6 += r0
  24:   *(u64 *)(r2 + 0) = r1              00000000000000c8 <LBB0_4>:
  25:   r6 = 0 ll                            25:   r1 = r6
  27:   r6 += r1                             26:   w2 = 256
00000000000000e0 <LBB0_4>:                   27:   r3 = 0 ll
  28:   r1 = r6                              29:   call 115
  29:   w2 = 256
  30:   r3 = 0 ll
  32:   call 115

32-BIT (11 insns):                         32-BIT (12 insns):
------------------------------------       ------------------------------------
  17:   call 115                             17:   call 115
  18:   if w0 > 256 goto +7 <LBB1_4>         18:   if w0 > 256 goto +8 <LBB1_4>
  19:   r1 = 0 ll                            19:   r1 = 0 ll
  21:   *(u32 *)(r1 + 0) = r0                21:   *(u32 *)(r1 + 0) = r0
  22:   w1 = w0                              22:   r0 <<= 32
  23:   r6 = 0 ll                            23:   r0 >>= 32
  25:   r6 += r1                             24:   r6 = 0 ll
00000000000000d0 <LBB1_4>:                   26:   r6 += r0
  26:   r1 = r6                            00000000000000d8 <LBB1_4>:
  27:   w2 = 256                             27:   r1 = r6
  28:   r3 = 0 ll                            28:   w2 = 256
  30:   call 115                             29:   r3 = 0 ll
                                             31:   call 115

In ALU32 mode, the variant using 64-bit length variable clearly wins and
avoids unnecessary zero-extension bit shifts. In practice, this is even more
important and good, because BPF code won't need to do extra checks to "prove"
that payload/len are within good bounds.

32-bit len is one instruction longer. Clang decided to do 64-to-32 casting
with two bit shifts, instead of equivalent `w1 = w0` assignment. The former
uses extra register. The latter might potentially lose some range information,
but not for 32-bit value. So in this case, verifier infers that r0 is [0, 256]
after check at 18:, and shifting 32 bits left/right keeps that range intact.
We should probably look into Clang's logic and see why it chooses bitshifts
over sub-register assignments for this.

NO-ALU32 + INT                             NO-ALU32 + LONG
==============                             ===============

64-BIT (14 insns):                         64-BIT (10 insns):
------------------------------------       ------------------------------------
  17:   call 115                             17:   call 115
  18:   r0 <<= 32                            18:   if r0 > 256 goto +6 <LBB0_4>
  19:   r1 = r0                              19:   r1 = 0 ll
  20:   r1 >>= 32                            21:   *(u64 *)(r1 + 0) = r0
  21:   if r1 > 256 goto +7 <LBB0_4>         22:   r6 = 0 ll
  22:   r0 s>>= 32                           24:   r6 += r0
  23:   r1 = 0 ll                          00000000000000c8 <LBB0_4>:
  25:   *(u64 *)(r1 + 0) = r0                25:   r1 = r6
  26:   r6 = 0 ll                            26:   r2 = 256
  28:   r6 += r0                             27:   r3 = 0 ll
00000000000000e8 <LBB0_4>:                   29:   call 115
  29:   r1 = r6
  30:   r2 = 256
  31:   r3 = 0 ll
  33:   call 115

32-BIT (13 insns):                         32-BIT (13 insns):
------------------------------------       ------------------------------------
  17:   call 115                             17:   call 115
  18:   r1 = r0                              18:   r1 = r0
  19:   r1 <<= 32                            19:   r1 <<= 32
  20:   r1 >>= 32                            20:   r1 >>= 32
  21:   if r1 > 256 goto +6 <LBB1_4>         21:   if r1 > 256 goto +6 <LBB1_4>
  22:   r2 = 0 ll                            22:   r2 = 0 ll
  24:   *(u32 *)(r2 + 0) = r0                24:   *(u32 *)(r2 + 0) = r0
  25:   r6 = 0 ll                            25:   r6 = 0 ll
  27:   r6 += r1                             27:   r6 += r1
00000000000000e0 <LBB1_4>:                 00000000000000e0 <LBB1_4>:
  28:   r1 = r6                              28:   r1 = r6
  29:   r2 = 256                             29:   r2 = 256
  30:   r3 = 0 ll                            30:   r3 = 0 ll
  32:   call 115                             32:   call 115

In NO-ALU32 mode, for the case of 64-bit len variable, Clang generates much
superior code, as expected, eliminating unnecessary bit shifts. For 32-bit
len, code is identical.

So overall, only ALU-32 32-bit len case is more-or-less equivalent and the
difference stems from internal Clang decision, rather than compiler lacking
enough information about types.

Case 2. Let's look at the simpler case of checking return result of BPF helper
for errors. The code is very simple:

  long bla;
  if (bpf_probe_read_kenerl(&bla, sizeof(bla), 0))
      return 1;
  else
      return 0;

ALU32 + CHECK (9 insns)                    ALU32 + CHECK (9 insns)
====================================       ====================================
  0:    r1 = r10                             0:    r1 = r10
  1:    r1 += -8                             1:    r1 += -8
  2:    w2 = 8                               2:    w2 = 8
  3:    r3 = 0                               3:    r3 = 0
  4:    call 113                             4:    call 113
  5:    w1 = w0                              5:    r1 = r0
  6:    w0 = 1                               6:    w0 = 1
  7:    if w1 != 0 goto +1 <LBB2_2>          7:    if r1 != 0 goto +1 <LBB2_2>
  8:    w0 = 0                               8:    w0 = 0
0000000000000048 <LBB2_2>:                 0000000000000048 <LBB2_2>:
  9:    exit                                 9:    exit

Almost identical code, the only difference is the use of full register
assignment (r1 = r0) vs half-registers (w1 = w0) in instruction #5. On 32-bit
architectures, new BPF assembly might be slightly less optimal, in theory. But
one can argue that's not a big issue, given that use of full registers is
still prevalent (e.g., for parameter passing).

NO-ALU32 + CHECK (11 insns)                NO-ALU32 + CHECK (9 insns)
====================================       ====================================
  0:    r1 = r10                             0:    r1 = r10
  1:    r1 += -8                             1:    r1 += -8
  2:    r2 = 8                               2:    r2 = 8
  3:    r3 = 0                               3:    r3 = 0
  4:    call 113                             4:    call 113
  5:    r1 = r0                              5:    r1 = r0
  6:    r1 <<= 32                            6:    r0 = 1
  7:    r1 >>= 32                            7:    if r1 != 0 goto +1 <LBB2_2>
  8:    r0 = 1                               8:    r0 = 0
  9:    if r1 != 0 goto +1 <LBB2_2>        0000000000000048 <LBB2_2>:
 10:    r0 = 0                               9:    exit
0000000000000058 <LBB2_2>:
 11:    exit

NO-ALU32 is a clear improvement, getting rid of unnecessary zero-extension bit
shifts.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200623032224.4020118-1-andriin@fb.com
2020-06-24 00:04:36 +02:00
Horatiu Vultur
2464bc7c28 bridge: uapi: mrp: Fix MRP_PORT_ROLE
Currently the MRP_PORT_ROLE_NONE has the value 0x2 but this is in conflict
with the IEC 62439-2 standard. The standard defines the following port
roles: primary (0x0), secondary(0x1), interconnect(0x2).
Therefore remove the port role none.

Fixes: 4714d13791 ("bridge: uapi: mrp: Add mrp attributes.")
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-23 14:38:05 -07:00
Alexandre Cassen
79a28ddd18 rtnetlink: add keepalived rtm_protocol
Keepalived can set global static ip routes or virtual ip routes dynamically
following VRRP protocol states. Using a dedicated rtm_protocol will help
keeping track of it.

Changes in v2:
 - fix tab/space indenting

Signed-off-by: Alexandre Cassen <acassen@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-23 14:35:12 -07:00
Hans Verkuil
286cf7d3a9 media: videodev2.h: add V4L2_FMT_FLAG_ENC_CAP_FRAME_INTERVAL flag
Add the V4L2_FMT_FLAG_ENC_CAP_FRAME_INTERVAL flag to signal that
the coded frame interval can be set separately from the raw frame
interval for stateful encoders.

Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Reviewed-by: Michael Tretter <m.tretter@pengutronix.de>
Acked-by: Tomasz Figa <tfiga@chromium.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2020-06-23 14:02:46 +02:00
Sergey Senozhatsky
1e0b2318fa media: videobuf2: handle V4L2_FLAG_MEMORY_NON_CONSISTENT flag
This patch lets user-space to request a non-consistent memory
allocation during CREATE_BUFS and REQBUFS ioctl calls.

= CREATE_BUFS

  struct v4l2_create_buffers has seven 4-byte reserved areas,
  so reserved[0] is renamed to ->flags. The struct, thus, now
  has six reserved 4-byte regions.

= CREATE_BUFS32

  struct v4l2_create_buffers32 has seven 4-byte reserved areas,
  so reserved[0] is renamed to ->flags. The struct, thus, now
  has six reserved 4-byte regions.

= REQBUFS

 We use one bit of a ->reserved[1] member of struct v4l2_requestbuffers,
 which is now renamed to ->flags. Unlike v4l2_create_buffers, struct
 v4l2_requestbuffers does not have enough reserved room. Therefore for
 backward compatibility  ->reserved and ->flags were put into anonymous
 union.

Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2020-06-23 13:32:41 +02:00
Sergey Senozhatsky
ac53503ee3 media: videobuf2: add V4L2_FLAG_MEMORY_NON_CONSISTENT flag
By setting or clearing V4L2_FLAG_MEMORY_NON_CONSISTENT flag
user-space should be able to set or clear queue's NON_CONSISTENT
->dma_attrs. Queue's ->dma_attrs are passed to the underlying
allocator in __vb2_buf_mem_alloc(), so thus user-space is able
to request vb2 buffer's memory to be either consistent (coherent)
or non-consistent.

The patch set also adds a corresponding capability flag:
fill_buf_caps() reports V4L2_BUF_CAP_SUPPORTS_MMAP_CACHE_HINTS
when queue supports user-space cache management hints. Note,
however, that MMAP_CACHE_HINTS capability only valid when the
queue is used for memory MMAP-ed streaming I/O.

Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2020-06-23 13:28:17 +02:00
Collin Walling
23a60f8344 s390/kvm: diagnose 0x318 sync and reset
DIAGNOSE 0x318 (diag318) sets information regarding the environment
the VM is running in (Linux, z/VM, etc) and is observed via
firmware/service events.

This is a privileged s390x instruction that must be intercepted by
SIE. Userspace handles the instruction as well as migration. Data
is communicated via VCPU register synchronization.

The Control Program Name Code (CPNC) is stored in the SIE block. The
CPNC along with the Control Program Version Code (CPVC) are stored
in the kvm_vcpu_arch struct.

This data is reset on load normal and clear resets.

Signed-off-by: Collin Walling <walling@linux.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20200622154636.5499-3-walling@linux.ibm.com
[borntraeger@de.ibm.com: fix sync_reg position]
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2020-06-23 10:55:33 +02:00
Vasundhara Volam
b5872cd0e8 devlink: Add support for board.serial_number to info_get cb.
Board serial number is a serial number, often available in PCI
*Vital Product Data*.

Also, update devlink-info.rst documentation file.

Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-22 16:15:04 -07:00
Parav Pandit
2a916ecc40 net/devlink: Support querying hardware address of port function
PCI PF and VF devlink port can manage the function represented by
a devlink port.

Enable users to query port function's hardware address.

Example of a PCI VF port which supports a port function:
$ devlink port show pci/0000:06:00.0/2
pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1
  function:
    hw_addr 00:11:22:33:44:66

$ devlink port show pci/0000:06:00.0/2 -jp
{
    "port": {
        "pci/0000:06:00.0/2": {
            "type": "eth",
            "netdev": "enp6s0pf0vf1",
            "flavour": "pcivf",
            "pfnum": 0,
            "vfnum": 1,
            "function": {
                "hw_addr": "00:11:22:33:44:66"
            }
        }
    }
}

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-22 15:29:19 -07:00
Linus Torvalds
dd0d718152 spi: Fixes for v5.8
Quite a lot of fixes here for no single reason.  There's a collection of
 the usual sort of device specific fixes and also a bunch of people have
 been working on spidev and the userspace test program spidev_test so
 they've got an unusually large collection of small fixes.
 -----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCgAxFiEEreZoqmdXGLWf4p/qJNaLcl1Uh9AFAl7wl/8THGJyb29uaWVA
 a2VybmVsLm9yZwAKCRAk1otyXVSH0NJBB/4oTOxPIfkKQyPcSTQB8FC6AHJ5Z8FT
 /4dtK4U32rn5Sms/CHNXMMFxomBgB46Ud19KJ/bEgqugJfI9Vl0bl9fZT+sDyVU3
 rFPthVluIprFe1bITyfcgYmjRoznLRdzj9LMFsAWIe3Vmy584TdvXbl8n5COu5Wh
 6sR6SuQnEn65x1HC4++3IsG70+ogz81sFRapk/7jXwS+YdwzWIqtN2s2IlUcTmcy
 ylijl04XUSyo6e/v/sjZaaRmIz72mW+HbMKW4iC0mGZ6yuACoy7zop9BuwAg6S7z
 dFDyqTZ5gq72ZeJc7fuco1y7jqQoKUC1oTRghPlbq2XG3Huta80l415l
 =32lr
 -----END PGP SIGNATURE-----

Merge tag 'spi-fix-v5.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi

Pull spi fixes from Mark Brown:
 "Quite a lot of fixes here for no single reason.

  There's a collection of the usual sort of device specific fixes and
  also a bunch of people have been working on spidev and the userspace
  test program spidev_test so they've got an unusually large collection
  of small fixes"

* tag 'spi-fix-v5.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
  spi: spidev: fix a potential use-after-free in spidev_release()
  spi: spidev: fix a race between spidev_release and spidev_remove
  spi: stm32-qspi: Fix error path in case of -EPROBE_DEFER
  spi: uapi: spidev: Use TABs for alignment
  spi: spi-fsl-dspi: Free DMA memory with matching function
  spi: tools: Add macro definitions to fix build errors
  spi: tools: Make default_tx/rx and input_tx static
  spi: dt-bindings: amlogic, meson-gx-spicc: Fix schema for meson-g12a
  spi: rspi: Use requested instead of maximum bit rate
  spi: spidev_test: Use %u to format unsigned numbers
  spi: sprd: switch the sequence of setting WDG_LOAD_LOW and _HIGH
2020-06-22 09:49:59 -07:00
Jiufei Xue
5769a351b8 io_uring: change the poll type to be 32-bits
poll events should be 32-bits to cover EPOLLEXCLUSIVE.

Explicit word-swap the poll32_events for big endian to make sure the ABI
is not changed.  We call this feature IORING_FEAT_POLL_32BITS,
applications who want to use EPOLLEXCLUSIVE should check the feature bit
first.

Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-21 20:44:00 -06:00
Linus Torvalds
eede2b9b3f libnvdimm for 5.8-rc2
- Fix the visibility of the region 'align' attribute. The new unit tests
   for region alignment handling caught a corner case where the alignment
   cannot be specified if the region is converted from static to dynamic
   provisioning at runtime.
 
 - Add support for device health retrieval for the persistent memory
   supported by the papr_scm driver. This includes both the standard
   sysfs "health flags" that the nfit persistent memory driver publishes
   and a mechanism for the ndctl tool to retrieve a health-command payload.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEf41QbsdZzFdA8EfZHtKRamZ9iAIFAl7tMD8ACgkQHtKRamZ9
 iAJNdA//aoULVhq/ZEo+tiPPXwgC9yHPJJ+Oe3+pcV2saWNeSBuHu047IdSQayRi
 6VZgsQ3lLRlEmk5aUbK00T84tVtwN20fppw1H4XyqfyGg4hmW7O5HomY8ZyOAETc
 0lEJet3+jf4Ym36BxBvfFTU9fOwtvLum5EHUw3Dc1Xl7UZVtp2zb0lklZx5xU/xr
 /nKNkPktvj7ENtmIOKegPFC/eSeKtOmD9shzjFygLeG52ZU1n+TEvhm4LCUU7Z+Y
 e5s/Ny3LUWzPY+bESXxTI1XpLthWOR6T+yPkNp6JoZHzk3P6vKawMGzRrvI1bUai
 hOj7Bf6BeNnKqIXIIZOXO8/ISgveHbnvUp6Kk5Lq5OYzPuGn81zZhen3EfHkB0y2
 Dc+xyceDsTeTlu2RGcrGrV60hCo2+5DqSLodu+kUAX6btvhtdrEgt3kmWAZSctk2
 7gHy8H+54p5ocXS+eDqafX6qYGmQIdGbtfTtiTuZJKIIH+Dsh9+mcu3KtH1AwtpN
 bV+/sbjuWOMmMBNnlVRtqhy1xA9bSVSaodthl+iWpw1nKXFPSKdl3w8ngSeZsfFD
 eZnGa4hO5G02s96gozh1n61ye8KFJLk8gCjx100PQKCemvIWTWLIzlh2NjPnCdTx
 SAI0vNfJgpmNZfb62NHYAGFwNwvOn4b7j9xFSqtHVh7+Vuyt6JM=
 =qVy4
 -----END PGP SIGNATURE-----

Merge tag 'libnvdimm-for-5.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull libnvdimm updates from Dan Williams:
 "A feature (papr_scm health retrieval) and a fix (sysfs attribute
  visibility) for v5.8.

  Vaibhav explains in the merge commit below why missing v5.8 would be
  painful and I agreed to try a -rc2 pull because only cosmetics kept
  this out of -rc1 and his initial versions were posted in more than
  enough time for v5.8 consideration:

   'These patches are tied to specific features that were committed to
    customers in upcoming distros releases (RHEL and SLES) whose
    time-lines are tied to 5.8 kernel release.

    Being able to track the health of an nvdimm is critical for our
    customers that are running workloads leveraging papr-scm nvdimms.
    Missing the 5.8 kernel would mean missing the distro timelines and
    shifting forward the availability of this feature in distro kernels
    by at least 6 months'

  Summary:

   - Fix the visibility of the region 'align' attribute.

     The new unit tests for region alignment handling caught a corner
     case where the alignment cannot be specified if the region is
     converted from static to dynamic provisioning at runtime.

   - Add support for device health retrieval for the persistent memory
     supported by the papr_scm driver.

     This includes both the standard sysfs "health flags" that the nfit
     persistent memory driver publishes and a mechanism for the ndctl
     tool to retrieve a health-command payload"

* tag 'libnvdimm-for-5.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  nvdimm/region: always show the 'align' attribute
  powerpc/papr_scm: Implement support for PAPR_PDSM_HEALTH
  ndctl/papr_scm,uapi: Add support for PAPR nvdimm specific methods
  powerpc/papr_scm: Improve error logging and handling papr_scm_ndctl()
  powerpc/papr_scm: Fetch nvdimm health information from PHYP
  seq_buf: Export seq_buf_printf
  powerpc: Document details on H_SCM_HEALTH hcall
2020-06-20 13:13:21 -07:00
Alex Williamson
f751820bc3 vfio/type1: Fix migration info capability ID
ID 1 is already used by the IOVA range capability, use ID 2.

Reported-by: Liu Yi L <yi.l.liu@intel.com>
Fixes: ad721705d0 ("vfio iommu: Add migration capability to report supported features")
Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-06-18 13:07:13 -06:00
Macpaul Lin
81c7462883 USB: replace hardcode maximum usb string length by definition
Replace hardcoded maximum USB string length (126 bytes) by definition
"USB_MAX_STRING_LEN".

Signed-off-by: Macpaul Lin <macpaul.lin@mediatek.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Link: https://lore.kernel.org/r/1592471618-29428-1-git-send-email-macpaul.lin@mediatek.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-06-18 16:07:57 +02:00
Rob Gill
03cc8353c2 USB: core: additional Device Classes to debug/usb/devices
Several newer USB Device classes are not presently reported individually at
/sys/kernel/debug/usb/devices, (reported as "unk."). This patch adds the
following classes: 0fh (Personal Healthcare devices), 10h (USB Type-C combined
Audio/Video devices) 11h (USB billboard), 12h (USB Type-C Bridge). As defined
at [https://www.usb.org/defined-class-codes]

Corresponding classes defined in include/linux/usb/ch9.h.

Signed-off-by: Rob Gill <rrobgill@protonmail.com>
Link: https://lore.kernel.org/r/20200601211749.6878-1-rrobgill@protonmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-06-18 10:02:58 +02:00
David S. Miller
b9d37bbb55 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Alexei Starovoitov says:

====================
pull-request: bpf 2020-06-17

The following pull-request contains BPF updates for your *net* tree.

We've added 10 non-merge commits during the last 2 day(s) which contain
a total of 14 files changed, 158 insertions(+), 59 deletions(-).

The main changes are:

1) Important fix for bpf_probe_read_kernel_str() return value, from Andrii.

2) [gs]etsockopt fix for large optlen, from Stanislav.

3) devmap allocation fix, from Toke.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-17 13:26:55 -07:00
Christian Brauner
60997c3d45
close_range: add CLOSE_RANGE_UNSHARE
One of the use-cases of close_range() is to drop file descriptors just before
execve(). This would usually be expressed in the sequence:

unshare(CLONE_FILES);
close_range(3, ~0U);

as pointed out by Linus it might be desirable to have this be a part of
close_range() itself under a new flag CLOSE_RANGE_UNSHARE.

This expands {dup,unshare)_fd() to take a max_fds argument that indicates the
maximum number of file descriptors to copy from the old struct files. When the
user requests that all file descriptors are supposed to be closed via
close_range(min, max) then we can cap via unshare_fd(min) and hence don't need
to do any of the heavy fput() work for everything above min.

The patch makes it so that if CLOSE_RANGE_UNSHARE is requested and we do in
fact currently share our file descriptor table we create a new private copy.
We then close all fds in the requested range and finally after we're done we
install the new fd table.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-06-17 00:07:38 +02:00
Vaibhav Jain
f517f7925b ndctl/papr_scm,uapi: Add support for PAPR nvdimm specific methods
Introduce support for PAPR NVDIMM Specific Methods (PDSM) in papr_scm
module and add the command family NVDIMM_FAMILY_PAPR to the white list
of NVDIMM command sets. Also advertise support for ND_CMD_CALL for the
nvdimm command mask and implement necessary scaffolding in the module
to handle ND_CMD_CALL ioctl and PDSM requests that we receive.

The layout of the PDSM request as we expect from libnvdimm/libndctl is
described in newly introduced uapi header 'papr_pdsm.h' which
defines a 'struct nd_pkg_pdsm' and a maximal union named
'nd_pdsm_payload'. These new structs together with 'struct nd_cmd_pkg'
for a pdsm envelop thats sent by libndctl to libnvdimm and serviced by
papr_scm in 'papr_scm_service_pdsm()'. The PDSM request is
communicated by member 'struct nd_cmd_pkg.nd_command' together with
other information on the pdsm payload (size-in, size-out).

The patch also introduces 'struct pdsm_cmd_desc' instances of which
are stored in an array __pdsm_cmd_descriptors[] indexed with PDSM cmd
and corresponding access function pdsm_cmd_desc() is
introduced. 'struct pdsm_cdm_desc' holds the service function for a
given PDSM and corresponding payload in/out sizes.

A new function papr_scm_service_pdsm() is introduced and is called from
papr_scm_ndctl() in case of a PDSM request is received via ND_CMD_CALL
command from libnvdimm. The function performs validation on the PDSM
payload based on info present in corresponding PDSM descriptor and if
valid calls the 'struct pdcm_cmd_desc.service' function to service the
PDSM.

Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/20200615124407.32596-6-vaibhav@linux.ibm.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2020-06-15 18:22:44 -07:00
Andrii Nakryiko
b0659d8a95 bpf: Fix definition of bpf_ringbuf_output() helper in UAPI comments
Fix definition of bpf_ringbuf_output() in UAPI header comments, which is used
to generate libbpf's bpf_helper_defs.h header. Return value is a number (error
code), not a pointer.

Fixes: 457f44363a ("bpf: Implement BPF ring buffer and verifier support for it")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200615214926.3638836-1-andriin@fb.com
2020-06-16 02:17:01 +02:00
Linus Torvalds
3be20b6fc1 This is the second round of ext4 commits for 5.8 merge window. It
includes the per-inode DAX support, which was dependant on the DAX
 infrastructure which came in via the XFS tree, and a number of
 regression and bug fixes; most notably the "BUG: using
 smp_processor_id() in preemptible code in ext4_mb_new_blocks" reported
 by syzkaller.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAl7mgCcACgkQ8vlZVpUN
 gaPftwf8C4w/7SG+CYLdwg0d2u9TKk77yDuWaioFHOcMSjZvG4TCSgtMhZxQnyty
 9t4yqacILx12pCj/mZnrZp5BOSn9O2ZbuDoXNKNrFXU0BF+CsbnhvJvrrh1j/MUa
 PPtcqyGFdOLSDvHSD9xPVT76juwh79aR8vB7qnQXaEO5wcLodZWoqBEFSKCl6Bo8
 hjXs1EXidusKsoarQxW6mEITmnhU2S2fuCVDgVcoM/LmKwzbgqvlWrentq9u8qLH
 W+XbjWgUtCM1byeDZWqe5FYyyJ8x+dTv7H5an3KR92EN6hKo5AOvzA0I41pZscq/
 bJ9p2THDxJQX4rJBevGAS5mZ6hTkRw==
 =z6eO
 -----END PGP SIGNATURE-----

Merge tag 'ext4-for-linus-5.8-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull more ext4 updates from Ted Ts'o:
 "This is the second round of ext4 commits for 5.8 merge window [1].

  It includes the per-inode DAX support, which was dependant on the DAX
  infrastructure which came in via the XFS tree, and a number of
  regression and bug fixes; most notably the "BUG: using
  smp_processor_id() in preemptible code in ext4_mb_new_blocks" reported
  by syzkaller"

[1] The pull request actually came in 15 minutes after I had tagged the
    rc1 release. Tssk, tssk, late..   - Linus

* tag 'ext4-for-linus-5.8-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4, jbd2: ensure panic by fix a race between jbd2 abort and ext4 error handlers
  ext4: support xattr gnu.* namespace for the Hurd
  ext4: mballoc: Use this_cpu_read instead of this_cpu_ptr
  ext4: avoid utf8_strncasecmp() with unstable name
  ext4: stop overwrite the errcode in ext4_setup_super
  ext4: fix partial cluster initialization when splitting extent
  ext4: avoid race conditions when remounting with options that change dax
  Documentation/dax: Update DAX enablement for ext4
  fs/ext4: Introduce DAX inode flag
  fs/ext4: Remove jflag variable
  fs/ext4: Make DAX mount option a tri-state
  fs/ext4: Only change S_DAX on inode load
  fs/ext4: Update ext4_should_use_dax()
  fs/ext4: Change EXT4_MOUNT_DAX to EXT4_MOUNT_DAX_ALWAYS
  fs/ext4: Disallow verity if inode is DAX
  fs/ext4: Narrow scope of DAX check in setflags
2020-06-15 09:32:10 -07:00
Geert Uytterhoeven
27784a256c
spi: uapi: spidev: Use TABs for alignment
The UAPI <linux/spi/spidev.h> uses TABs for alignment.
Convert the recently introduced spaces to TABs to restore consistency.

Fixes: 7bb64402a0 ("spi: tools: Add macro definitions to fix build errors")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/r/20200613073755.15906-1-geert+renesas@glider.be
Signed-off-by: Mark Brown <broonie@kernel.org>
2020-06-15 16:03:38 +01:00
Adrian Hunter
dd9ddf466a ftrace: Add perf ksymbol events for ftrace trampolines
Symbols are needed for tools to describe instruction addresses. Pages
allocated for ftrace's purposes need symbols to be created for them.
Add such symbols to be visible via perf ksymbol events.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200512121922.8997-8-adrian.hunter@intel.com
2020-06-15 14:09:49 +02:00
Adrian Hunter
69e4908869 kprobes: Add perf ksymbol events for kprobe insn pages
Symbols are needed for tools to describe instruction addresses. Pages
allocated for kprobe's purposes need symbols to be created for them.
Add such symbols to be visible via perf ksymbol events.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Link: https://lkml.kernel.org/r/20200512121922.8997-5-adrian.hunter@intel.com
2020-06-15 14:09:49 +02:00
Adrian Hunter
e17d43b93e perf: Add perf text poke event
Record (single instruction) changes to the kernel text (i.e.
self-modifying code) in order to support tracers like Intel PT and
ARM CoreSight.

A copy of the running kernel code is needed as a reference point (e.g.
from /proc/kcore). The text poke event records the old bytes and the
new bytes so that the event can be processed forwards or backwards.

The basic problem is recording the modified instruction in an
unambiguous manner given SMP instruction cache (in)coherence. That is,
when modifying an instruction concurrently any solution with one or
multiple timestamps is not sufficient:

	CPU0				CPU1
 0
 1	write insn A
 2					execute insn A
 3	sync-I$
 4

Due to I$, CPU1 might execute either the old or new A. No matter where
we record tracepoints on CPU0, one simply cannot tell what CPU1 will
have observed, except that at 0 it must be the old one and at 4 it
must be the new one.

To solve this, take inspiration from x86 text poking, which has to
solve this exact problem due to variable length instruction encoding
and I-fetch windows.

 1) overwrite the instruction with a breakpoint and sync I$

This guarantees that that code flow will never hit the target
instruction anymore, on any CPU (or rather, it will cause an
exception).

 2) issue the TEXT_POKE event

 3) overwrite the breakpoint with the new instruction and sync I$

Now we know that any execution after the TEXT_POKE event will either
observe the breakpoint (and hit the exception) or the new instruction.

So by guarding the TEXT_POKE event with an exception on either side;
we can now tell, without doubt, which instruction another CPU will
have observed.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200512121922.8997-2-adrian.hunter@intel.com
2020-06-15 14:09:48 +02:00
Linus Torvalds
96144c58ab Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from David Miller:

 1) Fix cfg80211 deadlock, from Johannes Berg.

 2) RXRPC fails to send norigications, from David Howells.

 3) MPTCP RM_ADDR parsing has an off by one pointer error, fix from
    Geliang Tang.

 4) Fix crash when using MSG_PEEK with sockmap, from Anny Hu.

 5) The ucc_geth driver needs __netdev_watchdog_up exported, from
    Valentin Longchamp.

 6) Fix hashtable memory leak in dccp, from Wang Hai.

 7) Fix how nexthops are marked as FDB nexthops, from David Ahern.

 8) Fix mptcp races between shutdown and recvmsg, from Paolo Abeni.

 9) Fix crashes in tipc_disc_rcv(), from Tuong Lien.

10) Fix link speed reporting in iavf driver, from Brett Creeley.

11) When a channel is used for XSK and then reused again later for XSK,
    we forget to clear out the relevant data structures in mlx5 which
    causes all kinds of problems. Fix from Maxim Mikityanskiy.

12) Fix memory leak in genetlink, from Cong Wang.

13) Disallow sockmap attachments to UDP sockets, it simply won't work.
    From Lorenz Bauer.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (83 commits)
  net: ethernet: ti: ale: fix allmulti for nu type ale
  net: ethernet: ti: am65-cpsw-nuss: fix ale parameters init
  net: atm: Remove the error message according to the atomic context
  bpf: Undo internal BPF_PROBE_MEM in BPF insns dump
  libbpf: Support pre-initializing .bss global variables
  tools/bpftool: Fix skeleton codegen
  bpf: Fix memlock accounting for sock_hash
  bpf: sockmap: Don't attach programs to UDP sockets
  bpf: tcp: Recv() should return 0 when the peer socket is closed
  ibmvnic: Flush existing work items before device removal
  genetlink: clean up family attributes allocations
  net: ipa: header pad field only valid for AP->modem endpoint
  net: ipa: program upper nibbles of sequencer type
  net: ipa: fix modem LAN RX endpoint id
  net: ipa: program metadata mask differently
  ionic: add pcie_print_link_status
  rxrpc: Fix race between incoming ACK parser and retransmitter
  net/mlx5: E-Switch, Fix some error pointer dereferences
  net/mlx5: Don't fail driver on failure to create debugfs
  net/mlx5e: CT: Fix ipv6 nat header rewrite actions
  ...
2020-06-13 16:27:13 -07:00
David S. Miller
fa7566a0d6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Alexei Starovoitov says:

====================
pull-request: bpf 2020-06-12

The following pull-request contains BPF updates for your *net* tree.

We've added 26 non-merge commits during the last 10 day(s) which contain
a total of 27 files changed, 348 insertions(+), 93 deletions(-).

The main changes are:

1) sock_hash accounting fix, from Andrey.

2) libbpf fix and probe_mem sanitizing, from Andrii.

3) sock_hash fixes, from Jakub.

4) devmap_val fix, from Jesper.

5) load_bytes_relative fix, from YiFei.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-13 15:28:08 -07:00
Linus Torvalds
6c32978414 Notifications over pipes + Keyring notifications
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEqG5UsNXhtOCrfGQP+7dXa6fLC2sFAl7U/i8ACgkQ+7dXa6fL
 C2u2eg/+Oy6ybq0hPovYVkFI9WIG7ZCz7w9Q6BEnfYMqqn3dnfJxKQ3l4pnQEOWw
 f4QfvpvevsYfMtOJkYcG6s66rQgbFdqc5TEyBBy0QNp3acRolN7IXkcopvv9xOpQ
 JxedpbFG1PTFLWjvBpyjlrUPouwLzq2FXAf1Ox0ZIMw6165mYOMWoli1VL8dh0A0
 Ai7JUB0WrvTNbrwhV413obIzXT/rPCdcrgbQcgrrLPex8lQ47ZAE9bq6k4q5HiwK
 KRzEqkQgnzId6cCNTFBfkTWsx89zZunz7jkfM5yx30MvdAtPSxvvpfIPdZRZkXsP
 E2K9Fk1/6OQZTC0Op3Pi/bt+hVG/mD1p0sQUDgo2MO3qlSS+5mMkR8h3mJEgwK12
 72P4YfOJkuAy2z3v4lL0GYdUDAZY6i6G8TMxERKu/a9O3VjTWICDOyBUS6F8YEAK
 C7HlbZxAEOKTVK0BTDTeEUBwSeDrBbvH6MnRlZCG5g1Fos2aWP0udhjiX8IfZLO7
 GN6nWBvK1fYzfsUczdhgnoCzQs3suoDo04HnsTPGJ8De52T4x2RsjV+gPx0nrNAq
 eWChl1JvMWsY2B3GLnl9XQz4NNN+EreKEkk+PULDGllrArrPsp5Vnhb9FJO1PVCU
 hMDJHohPiXnKbc8f4Bd78OhIvnuoGfJPdM5MtNe2flUKy2a2ops=
 =YTGf
 -----END PGP SIGNATURE-----

Merge tag 'notifications-20200601' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs

Pull notification queue from David Howells:
 "This adds a general notification queue concept and adds an event
  source for keys/keyrings, such as linking and unlinking keys and
  changing their attributes.

  Thanks to Debarshi Ray, we do have a pull request to use this to fix a
  problem with gnome-online-accounts - as mentioned last time:

     https://gitlab.gnome.org/GNOME/gnome-online-accounts/merge_requests/47

  Without this, g-o-a has to constantly poll a keyring-based kerberos
  cache to find out if kinit has changed anything.

  [ There are other notification pending: mount/sb fsinfo notifications
    for libmount that Karel Zak and Ian Kent have been working on, and
    Christian Brauner would like to use them in lxc, but let's see how
    this one works first ]

  LSM hooks are included:

   - A set of hooks are provided that allow an LSM to rule on whether or
     not a watch may be set. Each of these hooks takes a different
     "watched object" parameter, so they're not really shareable. The
     LSM should use current's credentials. [Wanted by SELinux & Smack]

   - A hook is provided to allow an LSM to rule on whether or not a
     particular message may be posted to a particular queue. This is
     given the credentials from the event generator (which may be the
     system) and the watch setter. [Wanted by Smack]

  I've provided SELinux and Smack with implementations of some of these
  hooks.

  WHY
  ===

  Key/keyring notifications are desirable because if you have your
  kerberos tickets in a file/directory, your Gnome desktop will monitor
  that using something like fanotify and tell you if your credentials
  cache changes.

  However, we also have the ability to cache your kerberos tickets in
  the session, user or persistent keyring so that it isn't left around
  on disk across a reboot or logout. Keyrings, however, cannot currently
  be monitored asynchronously, so the desktop has to poll for it - not
  so good on a laptop. This facility will allow the desktop to avoid the
  need to poll.

  DESIGN DECISIONS
  ================

   - The notification queue is built on top of a standard pipe. Messages
     are effectively spliced in. The pipe is opened with a special flag:

        pipe2(fds, O_NOTIFICATION_PIPE);

     The special flag has the same value as O_EXCL (which doesn't seem
     like it will ever be applicable in this context)[?]. It is given up
     front to make it a lot easier to prohibit splice&co from accessing
     the pipe.

     [?] Should this be done some other way?  I'd rather not use up a new
         O_* flag if I can avoid it - should I add a pipe3() system call
         instead?

     The pipe is then configured::

        ioctl(fds[1], IOC_WATCH_QUEUE_SET_SIZE, queue_depth);
        ioctl(fds[1], IOC_WATCH_QUEUE_SET_FILTER, &filter);

     Messages are then read out of the pipe using read().

   - It should be possible to allow write() to insert data into the
     notification pipes too, but this is currently disabled as the
     kernel has to be able to insert messages into the pipe *without*
     holding pipe->mutex and the code to make this work needs careful
     auditing.

   - sendfile(), splice() and vmsplice() are disabled on notification
     pipes because of the pipe->mutex issue and also because they
     sometimes want to revert what they just did - but one or more
     notification messages might've been interleaved in the ring.

   - The kernel inserts messages with the wait queue spinlock held. This
     means that pipe_read() and pipe_write() have to take the spinlock
     to update the queue pointers.

   - Records in the buffer are binary, typed and have a length so that
     they can be of varying size.

     This allows multiple heterogeneous sources to share a common
     buffer; there are 16 million types available, of which I've used
     just a few, so there is scope for others to be used. Tags may be
     specified when a watchpoint is created to help distinguish the
     sources.

   - Records are filterable as types have up to 256 subtypes that can be
     individually filtered. Other filtration is also available.

   - Notification pipes don't interfere with each other; each may be
     bound to a different set of watches. Any particular notification
     will be copied to all the queues that are currently watching for it
     - and only those that are watching for it.

   - When recording a notification, the kernel will not sleep, but will
     rather mark a queue as having lost a message if there's
     insufficient space. read() will fabricate a loss notification
     message at an appropriate point later.

   - The notification pipe is created and then watchpoints are attached
     to it, using one of:

        keyctl_watch_key(KEY_SPEC_SESSION_KEYRING, fds[1], 0x01);
        watch_mount(AT_FDCWD, "/", 0, fd, 0x02);
        watch_sb(AT_FDCWD, "/mnt", 0, fd, 0x03);

     where in both cases, fd indicates the queue and the number after is
     a tag between 0 and 255.

   - Watches are removed if either the notification pipe is destroyed or
     the watched object is destroyed. In the latter case, a message will
     be generated indicating the enforced watch removal.

  Things I want to avoid:

   - Introducing features that make the core VFS dependent on the
     network stack or networking namespaces (ie. usage of netlink).

   - Dumping all this stuff into dmesg and having a daemon that sits
     there parsing the output and distributing it as this then puts the
     responsibility for security into userspace and makes handling
     namespaces tricky. Further, dmesg might not exist or might be
     inaccessible inside a container.

   - Letting users see events they shouldn't be able to see.

  TESTING AND MANPAGES
  ====================

   - The keyutils tree has a pipe-watch branch that has keyctl commands
     for making use of notifications. Proposed manual pages can also be
     found on this branch, though a couple of them really need to go to
     the main manpages repository instead.

     If the kernel supports the watching of keys, then running "make
     test" on that branch will cause the testing infrastructure to spawn
     a monitoring process on the side that monitors a notifications pipe
     for all the key/keyring changes induced by the tests and they'll
     all be checked off to make sure they happened.

        https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/keyutils.git/log/?h=pipe-watch

   - A test program is provided (samples/watch_queue/watch_test) that
     can be used to monitor for keyrings, mount and superblock events.
     Information on the notifications is simply logged to stdout"

* tag 'notifications-20200601' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
  smack: Implement the watch_key and post_notification hooks
  selinux: Implement the watch_key security hook
  keys: Make the KEY_NEED_* perms an enum rather than a mask
  pipe: Add notification lossage handling
  pipe: Allow buffers to be marked read-whole-or-error for notifications
  Add sample notification program
  watch_queue: Add a key/keyring notification facility
  security: Add hooks to rule on setting a watch
  pipe: Add general notification queue support
  pipe: Add O_NOTIFICATION_PIPE
  security: Add a hook for the point of notification insertion
  uapi: General notification queue definitions
2020-06-13 09:56:21 -07:00
Jan (janneke) Nieuwenhuizen
88ee9d571b ext4: support xattr gnu.* namespace for the Hurd
The Hurd gained[0] support for moving the translator and author
fields out of the inode and into the "gnu.*" xattr namespace.

In anticipation of that, an xattr INDEX was reserved[1].  The Hurd has
now been brought into compliance[2] with that.

This patch adds support for reading and writing such attributes from
Linux; you can now do something like

    mkdir -p hurd-root/servers/socket
    touch hurd-root/servers/socket/1
    setfattr --name=gnu.translator --value='"/hurd/pflocal\0"' \
        hurd-root/servers/socket/1
    getfattr --name=gnu.translator hurd-root/servers/socket/1
    # file: 1
    gnu.translator="/hurd/pflocal"

to setup a pipe translator, which is being used to create[3] a
vm-image for the Hurd from GNU Guix.

[0] https://summerofcode.withgoogle.com/projects/#5869799859027968
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3980bd3b406addb327d858aebd19e229ea340b9a
[2] https://git.savannah.gnu.org/cgit/hurd/hurd.git/commit/?id=a04c7bf83172faa7cb080fbe3b6c04a8415ca645
[3] https://git.savannah.gnu.org/cgit/guix.git/log/?h=wip-hurd-vm

Signed-off-by: Jan Nieuwenhuizen <janneke@gnu.org>
Link: https://lore.kernel.org/r/20200525193940.878-1-janneke@gnu.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-06-12 13:23:34 -04:00
Qing Zhang
7bb64402a0
spi: tools: Add macro definitions to fix build errors
Add SPI_TX_OCTAL and SPI_RX_OCTAL to fix the following build errors:

CC       spidev_test.o
spidev_test.c: In function ‘transfer’:
spidev_test.c:131:13: error: ‘SPI_TX_OCTAL’ undeclared (first use in this function)
  if (mode & SPI_TX_OCTAL)
             ^
spidev_test.c:131:13: note: each undeclared identifier is reported only once for each function it appears in
spidev_test.c:137:13: error: ‘SPI_RX_OCTAL’ undeclared (first use in this function)
  if (mode & SPI_RX_OCTAL)
             ^
spidev_test.c: In function ‘parse_opts’:
spidev_test.c:290:12: error: ‘SPI_TX_OCTAL’ undeclared (first use in this function)
    mode |= SPI_TX_OCTAL;
            ^
spidev_test.c:308:12: error: ‘SPI_RX_OCTAL’ undeclared (first use in this function)
    mode |= SPI_RX_OCTAL;
            ^
  LD       spidev_test-in.o
ld: cannot find spidev_test.o: No such file or directory

Additionally, maybe SPI_CS_WORD and SPI_3WIRE_HIZ will be used in the future,
so add them too.

Fixes: 896fa73508 ("spi: spidev_test: Add support for Octal mode data transfers")
Signed-off-by: Qing Zhang <zhangqing@loongson.cn>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/r/1591880212-13479-2-git-send-email-zhangqing@loongson.cn
Signed-off-by: Mark Brown <broonie@kernel.org>
2020-06-11 16:27:25 +01:00
Theodore Ts'o
68cd44920d Enable ext4 support for per-file/directory dax operations
This adds the same per-file/per-directory DAX support for ext4 as was
done for xfs, now that we finally have consensus over what the
interface should be.
2020-06-11 10:51:44 -04:00
Linus Torvalds
09102704c6 virtio: features, fixes
virtio-mem
 doorbell mapping for vdpa
 config interrupt support in ifc
 fixes all over the place
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAl7fZ6APHG1zdEByZWRo
 YXQuY29tAAoJECgfDbjSjVRpkDoIAMcBcQx5su1iuX7vT35xzUWZO478eAf1jOMZ
 7KxKUVBeztkcxVFUlRVRu9MR6wOzwHils+1HD6025775Smr5M6x3aJxR6xOORaBj
 RoU6OVGkpDvbzsxlhW+xhONz4O7/RkveKJPCwzGjqHrsFeh92lkfTqroz/EuNpw+
 LZsO0+DhdUf123HbwHQp5lxW8EjyrRabgeZZg/D9VLPhoCP88vCjRhBXU2GPuaUl
 /UNXsQafn4xUgrxPaoN5f4Phn/P46NNrbZ1jmlkw/z/3QhF/DhktGXGaZsIHDCN/
 vicUii0or5QLeBsZpMbKko/BIe2xWHxFjkMRhMOMZOfcBb6sMBI=
 =auUa
 -----END PGP SIGNATURE-----

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Pull virtio updates from Michael Tsirkin:

 - virtio-mem: paravirtualized memory hotplug

 - support doorbell mapping for vdpa

 - config interrupt support in ifc

 - fixes all over the place

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (40 commits)
  vhost/test: fix up after API change
  virtio_mem: convert device block size into 64bit
  virtio-mem: drop unnecessary initialization
  ifcvf: implement config interrupt in IFCVF
  vhost: replace -1 with VHOST_FILE_UNBIND in ioctls
  vhost_vdpa: Support config interrupt in vdpa
  ifcvf: ignore continuous setting same status value
  virtio-mem: Don't rely on implicit compiler padding for requests
  virtio-mem: Try to unplug the complete online memory block first
  virtio-mem: Use -ETXTBSY as error code if the device is busy
  virtio-mem: Unplug subblocks right-to-left
  virtio-mem: Drop manual check for already present memory
  virtio-mem: Add parent resource for all added "System RAM"
  virtio-mem: Better retry handling
  virtio-mem: Offline and remove completely unplugged memory blocks
  mm/memory_hotplug: Introduce offline_and_remove_memory()
  virtio-mem: Allow to offline partially unplugged memory blocks
  mm: Allow to offline unmovable PageOffline() pages via MEM_GOING_OFFLINE
  virtio-mem: Paravirtualized memory hotunplug part 2
  virtio-mem: Paravirtualized memory hotunplug part 1
  ...
2020-06-10 13:42:09 -07:00
Jesper Dangaard Brouer
281920b7e0 bpf: Devmap adjust uapi for attach bpf program
V2:
- Defer changing BPF-syscall to start at file-descriptor 1
- Use {} to zero initialise struct.

The recent commit fbee97feed ("bpf: Add support to attach bpf program to a
devmap entry"), introduced ability to attach (and run) a separate XDP
bpf_prog for each devmap entry. A bpf_prog is added via a file-descriptor.
As zero were a valid FD, not using the feature requires using value minus-1.
The UAPI is extended via tail-extending struct bpf_devmap_val and using
map->value_size to determine the feature set.

This will break older userspace applications not using the bpf_prog feature.
Consider an old userspace app that is compiled against newer kernel
uapi/bpf.h, it will not know that it need to initialise the member
bpf_prog.fd to minus-1. Thus, users will be forced to update source code to
get program running on newer kernels.

This patch remove the minus-1 checks, and have zero mean feature isn't used.

Followup patches either for kernel or libbpf should handle and avoid
returning file-descriptor zero in the first place.

Fixes: fbee97feed ("bpf: Add support to attach bpf program to a devmap entry")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/159170950687.2102545.7235914718298050113.stgit@firesoul
2020-06-09 11:36:18 -07:00
Michael S. Tsirkin
544fc7dbbf virtio_mem: convert device block size into 64bit
If subblock size is large (e.g. 1G) 32 bit math involving it
can overflow. Rather than try to catch all instances of that,
let's tweak block size to 64 bit.

It ripples through UAPI which is an ABI change, but it's not too late to
make it, and it will allow supporting >4Gbyte blocks while might
become necessary down the road.

Fixes: 5f1f79bbc9 ("virtio-mem: Paravirtualized memory hotplug")
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
2020-06-09 06:42:06 -04:00
David S. Miller
6b1ad5a3ad Just a small update:
* fix the deadlock on rfkill/wireless removal that a few
    people reported
  * fix an uninitialized variable
  * update wiki URLs
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEH1e1rEeCd0AIMq6MB8qZga/fl8QFAl7d8c0ACgkQB8qZga/f
 l8SRzg//ZTtHTKOsfZ2IpsAmExkQ+1ZdsGHAGfkgDLQz4rvv1Lug7TvrFPiSyHSm
 jwLlRQNsQ5+Cv2CRY3Xm7Qf8j9wBavYnfHhJkoTrnD3Z770KUS+BXBYb31+Odkxv
 CzsR1GZYTWdYhCrzVIyE+GkQmW2pZ3L8U7ODioM7ETYaK0gAjmCb/HXLiX/m8cGa
 O0uUlJZqE57Trfy5p+WO7cQOLJ9v6WXgSCcrDCNb9Ek25wg5J6RVOMEm7w6oV8oC
 F8uZyVXPC0fSblzHC4cch0yX3z4YIuD12BVZBOVDLJKQBZwqohtxd0jT4MNHJB2y
 BflU13M2kW5pw3l+cBPLZFOsURcDmOcBo9pNYCbi7Uxsd5Hvgft039jeXpukI3QW
 e3d50KB0gSE/plOgXShPVSvm4eQ7WGS3Vyv2IfmU3dY6mxLv7kazSOErFD+fxUMy
 vtdVN/Ie9XyRbh30n5MfTrE3PIf6k7XI3zirZrpMMNfu9fw4a3DQycqoZRBOoU1Y
 l4ThlIduREp+wr14OnF2ueaho9hxVRxh+gnfuhWbzI8VKLHBCVOKe/MsTXzxg5OB
 8xSA9Q1xo/bv+VymaQrY6ENG39sDZB+uI5fi0hnQ2Fu7BHPgp/Juzb56nQ/bWrfG
 DOItqu5PoejvwMP+ju43i8oUDdqjlNgHhwDze+nCHiSnHUf+yWE=
 =wP9I
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-for-davem-2020-06-08' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211

Johannes Berg says:

====================
Just a small update:
 * fix the deadlock on rfkill/wireless removal that a few
   people reported
 * fix an uninitialized variable
 * update wiki URLs
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-08 17:14:19 -07:00
Linus Torvalds
ca687877e0 Changes in gfs2:
- An iopen glock locking scheme rework that speeds up deletes of
   inodes accessed from multiple nodes.
 - Various bug fixes and debugging improvements.
 - Convert gfs2-glocks.txt to ReST.
 -----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEEJZs3krPW0xkhLMTc1b+f6wMTZToFAl7eYjMUHGFncnVlbmJh
 QHJlZGhhdC5jb20ACgkQ1b+f6wMTZTr/9g//cJ6jgiD/+qzh0VzougVksVZIduAl
 RMB+kldOjBS2ORbyYM87Jm1tdyakgZuFO91XlSwChWRdC3Y2mqdaJIEE/kATqfY9
 7Frlw++SyFKLvIf04kDYGk2hXX+umXXYFfrIiKb0tzDSGkPRaARUb3RM4TRvlSDP
 /U0JlYA/4aXMUge+VpYsbpSGeqHNfEzmcmCyPXGmZYyh1MZ/RocMZFYEsP9NH82J
 l07fxowUd10LJPEmBajzjD2NmEvjdvF4gBCOfJVNIfOzCj0CwXL3vmtu1SUNOKr+
 em266EWZF89eOcvfdtE6xF0w81oCAK43wYRjIODSgI9JCLXGiOYmlWZVwZoqu5iy
 2GQDhj/taq3ObuVqjR5n6GYuqMoJ+D0LSD13qccDALq/Bdy4lq9TMLSdDbkrVIM/
 8BVn0nI5MzUlTV3mq6uxhU0HqtYDwUEiHWURWw6bYRug5OvQy3nbg/XZptYlnD87
 XQccE4ErjlgSHLiYx1YckFz/GG6ytrRAKl9bGMkZ0u2+XmDsH+iJJgLcaXCPUP9h
 /hhYagKI55UBDer7we4tppbu+gnJrg3PXlgImf53iMc7ia0KQHd+TfSIFkGPuydI
 aEKKhIQzje23JayMbPRnwPbNlw9zU1loPi7hPS3rCpDY+w8oawpFyIieEOcxJyEt
 pYkOt4BQi9LvpGc=
 =PsCY
 -----END PGP SIGNATURE-----

Merge tag 'gfs2-for-5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2

Pull gfs2 updates from Andreas Gruenbacher:

 - An iopen glock locking scheme rework that speeds up deletes of inodes
   accessed from multiple nodes

 - Various bug fixes and debugging improvements

 - Convert gfs2-glocks.txt to ReST

* tag 'gfs2-for-5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
  gfs2: fix use-after-free on transaction ail lists
  gfs2: new slab for transactions
  gfs2: initialize transaction tr_ailX_lists earlier
  gfs2: Smarter iopen glock waiting
  gfs2: Wake up when setting GLF_DEMOTE
  gfs2: Check inode generation number in delete_work_func
  gfs2: Move inode generation number check into gfs2_inode_lookup
  gfs2: Minor gfs2_lookup_by_inum cleanup
  gfs2: Try harder to delete inodes locally
  gfs2: Give up the iopen glock on contention
  gfs2: Turn gl_delete into a delayed work
  gfs2: Keep track of deleted inode generations in LVBs
  gfs2: Allow ASPACE glocks to also have an lvb
  gfs2: instrumentation wrt log_flush stuck
  gfs2: introduce new gfs2_glock_assert_withdraw
  gfs2: print mapping->nrpages in glock dump for address space glocks
  gfs2: Only do glock put in gfs2_create_inode for free inodes
  gfs2: Allow lock_nolock mount to specify jid=X
  gfs2: Don't ignore inode write errors during inode_go_sync
  docs: filesystems: convert gfs2-glocks.txt to ReST
2020-06-08 12:47:09 -07:00
Linus Torvalds
23fc02e36e s390 updates for the 5.8 merge window
- Add support for multi-function devices in pci code.
 
 - Enable PF-VF linking for architectures using the
   pdev->no_vf_scan flag (currently just s390).
 
 - Add reipl from NVMe support.
 
 - Get rid of critical section cleanup in entry.S.
 
 - Refactor PNSO CHSC (perform network subchannel operation) in cio
   and qeth.
 
 - QDIO interrupts and error handling fixes and improvements, more
   refactoring changes.
 
 - Align ioremap() with generic code.
 
 - Accept requests without the prefetch bit set in vfio-ccw.
 
 - Enable path handling via two new regions in vfio-ccw.
 
 - Other small fixes and improvements all over the code.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEE3QHqV+H2a8xAv27vjYWKoQLXFBgFAl7eVGcACgkQjYWKoQLX
 FBhweQgAkicvx31x230rdfG+jQkQkl0UqF99vvWrJHEll77SqadfjzKAGIjUB+K0
 EoeHVD5Wcj7BogDGcyHeQ0bZpu4WzE+y1nmnrsvu7TEEvcBmkJH0rF2jF+y0sb/O
 3qvwFkX/CB5OqaMzKC/AEeRpcCKR+ZUXkWu1irbYth7CBXaycD9EAPc4cj8CfYGZ
 r5njUdYOVk77TaO4aV+t5pCYc5TCRJaWXSsWaAv/nuLcIqsFBYOy2q+L47zITGXp
 utZVanIDjzx+ikpaKicOIfC3hJsRuNX9MnlZKsQFwpVEZAUZmIUm29XdhGJTWSxU
 RV7m1ORINbFP1nGAqWqkOvGo/LC0ZA==
 =VhXR
 -----END PGP SIGNATURE-----

Merge tag 's390-5.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 updates from Vasily Gorbik:

 - Add support for multi-function devices in pci code.

 - Enable PF-VF linking for architectures using the pdev->no_vf_scan
   flag (currently just s390).

 - Add reipl from NVMe support.

 - Get rid of critical section cleanup in entry.S.

 - Refactor PNSO CHSC (perform network subchannel operation) in cio and
   qeth.

 - QDIO interrupts and error handling fixes and improvements, more
   refactoring changes.

 - Align ioremap() with generic code.

 - Accept requests without the prefetch bit set in vfio-ccw.

 - Enable path handling via two new regions in vfio-ccw.

 - Other small fixes and improvements all over the code.

* tag 's390-5.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (52 commits)
  vfio-ccw: make vfio_ccw_regops variables declarations static
  vfio-ccw: Add trace for CRW event
  vfio-ccw: Wire up the CRW irq and CRW region
  vfio-ccw: Introduce a new CRW region
  vfio-ccw: Refactor IRQ handlers
  vfio-ccw: Introduce a new schib region
  vfio-ccw: Refactor the unregister of the async regions
  vfio-ccw: Register a chp_event callback for vfio-ccw
  vfio-ccw: Introduce new helper functions to free/destroy regions
  vfio-ccw: document possible errors
  vfio-ccw: Enable transparent CCW IPL from DASD
  s390/pci: Log new handle in clp_disable_fh()
  s390/cio, s390/qeth: cleanup PNSO CHSC
  s390/qdio: remove q->first_to_kick
  s390/qdio: fix up qdio_start_irq() kerneldoc
  s390: remove critical section cleanup from entry.S
  s390: add machine check SIGP
  s390/pci: ioremap() align with generic code
  s390/ap: introduce new ap function ap_get_qdev()
  Documentation/s390: Update / remove developerWorks web links
  ...
2020-06-08 12:05:31 -07:00
Linus Torvalds
4e3a16ee91 IOMMU Updates for Linux v5.8
Including:
 
 	- A big part of this is a change in how devices get connected to
 	  IOMMUs in the core code. It contains the change from the old
 	  add_device()/remove_device() to the new
 	  probe_device()/release_device() call-backs. As a result
 	  functionality that was previously in the IOMMU drivers has
 	  been moved to the IOMMU core code, including IOMMU group
 	  allocation for each device.
 	  The reason for this change was to get more robust allocation
 	  of default domains for the iommu groups.
 	  A couple of fixes were necessary after this was merged into
 	  the IOMMU tree, but there are no known bugs left. The last fix
 	  is applied on-top of the merge commit for the topic branches.
 
 	- Removal of the driver private domain handling in the Intel
 	  VT-d driver. This was fragile code and I am glad it is gone
 	  now.
 
 	- More Intel VT-d updates from Lu Baolu:
 
 		- Nested Shared Virtual Addressing (SVA) support to the
 		  Intel VT-d driver
 
 		- Replacement of the Intel SVM interfaces to the common
 		  IOMMU SVA API
 
 		- SVA Page Request draining support
 
 	- ARM-SMMU Updates from Will:
 
 		- Avoid mapping reserved MMIO space on SMMUv3, so that
 		  it can be claimed by the PMU driver
 
 		- Use xarray to manage ASIDs on SMMUv3
 
 		- Reword confusing shutdown message
 
 		- DT compatible string updates
 
 		- Allow implementations to override the default domain
 		  type
 
 	- A new IOMMU driver for the Allwinner Sun50i platform
 
 	- Support for ATS gets disabled for untrusted devices (like
 	  Thunderbolt devices). This includes a PCI patch, acked by
 	  Bjorn.
 
 	- Some cleanups to the AMD IOMMU driver to make more use of
 	  IOMMU core features.
 
 	- Unification of some printk formats in the Intel and AMD IOMMU
 	  drivers and in the IOVA code.
 
 	- Updates for DT bindings
 
 	- A number of smaller fixes and cleanups.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAl7eX5gACgkQK/BELZcB
 GuOMMQ//Si8h3uC4QhTmeNM6OwYpTcImMuCtqOebVDOJYWfbjGb4U2ZvDSUu4r7u
 KGj66pWBq9kciKaM5HcLnWNg4iNNG+iZHwYSOy2DAOdPorWh40aM/Obozdd4D4eK
 sXt4uy1JEQem/Bm4eTwmvaJV5/riyK6xn1HVocPejstGSJCh4kal/bYuhj415qEa
 LLrN0AcitoPaSRl4Pl7/wEtesk+Az0g94jY9qDhtxIQJXWlAwO25s+rIPy4S7QuW
 WAFGU+Xp+J7WC3hQm6nHKQtURIqPHtqozT9Flws9YETuyeKwn47GRitMiAXZsy7R
 t+kj1cHyglEhe2hdPnJBSFIjyrO3cCrV7CUVryJHigPCQOaQLjoEegThQCYU3VQu
 FPRBX+bp4haHeo3BCBy2jQv4JZrPFkTVXeVEtpMRDOoJLb2OKaI34xbOvGy6dMM0
 dFtpbAW2IjHuneJaQCbJIC+jaEYii8mr3Zwok4LS8u8Sy+7PPSKmt6Tti3enD8+C
 pBB/0CxNJvQFhl13s6oI8NHTT9D6cPTbjxc2Gfc3UuKyyWsz+eR54gRhaBi0FypA
 p6syMosNVjjOaHFd5K5gsbpUFCC3X/drIhqeXRLgQ51mqfkNZMuBBtiyLWTk7iJd
 CK+1f2aqtBrpUdSNjTzE/XmR+AhjIn2oIcG/7jPCgYXQoSGM2Sg=
 =a4z4
 -----END PGP SIGNATURE-----

Merge tag 'iommu-updates-v5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull iommu updates from Joerg Roedel:
 "A big part of this is a change in how devices get connected to IOMMUs
  in the core code. It contains the change from the old add_device() /
  remove_device() to the new probe_device() / release_device()
  call-backs.

  As a result functionality that was previously in the IOMMU drivers has
  been moved to the IOMMU core code, including IOMMU group allocation
  for each device. The reason for this change was to get more robust
  allocation of default domains for the iommu groups.

  A couple of fixes were necessary after this was merged into the IOMMU
  tree, but there are no known bugs left. The last fix is applied on-top
  of the merge commit for the topic branches.

  Other than that change, we have:

   - Removal of the driver private domain handling in the Intel VT-d
     driver. This was fragile code and I am glad it is gone now.

   - More Intel VT-d updates from Lu Baolu:
      - Nested Shared Virtual Addressing (SVA) support to the Intel VT-d
        driver
      - Replacement of the Intel SVM interfaces to the common IOMMU SVA
        API
      - SVA Page Request draining support

   - ARM-SMMU Updates from Will:
      - Avoid mapping reserved MMIO space on SMMUv3, so that it can be
        claimed by the PMU driver
      - Use xarray to manage ASIDs on SMMUv3
      - Reword confusing shutdown message
      - DT compatible string updates
      - Allow implementations to override the default domain type

   - A new IOMMU driver for the Allwinner Sun50i platform

   - Support for ATS gets disabled for untrusted devices (like
     Thunderbolt devices). This includes a PCI patch, acked by Bjorn.

   - Some cleanups to the AMD IOMMU driver to make more use of IOMMU
     core features.

   - Unification of some printk formats in the Intel and AMD IOMMU
     drivers and in the IOVA code.

   - Updates for DT bindings

   - A number of smaller fixes and cleanups.

* tag 'iommu-updates-v5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (109 commits)
  iommu: Check for deferred attach in iommu_group_do_dma_attach()
  iommu/amd: Remove redundant devid checks
  iommu/amd: Store dev_data as device iommu private data
  iommu/amd: Merge private header files
  iommu/amd: Remove PD_DMA_OPS_MASK
  iommu/amd: Consolidate domain allocation/freeing
  iommu/amd: Free page-table in protection_domain_free()
  iommu/amd: Allocate page-table in protection_domain_init()
  iommu/amd: Let free_pagetable() not rely on domain->pt_root
  iommu/amd: Unexport get_dev_data()
  iommu/vt-d: Fix compile warning
  iommu/vt-d: Remove real DMA lookup in find_domain
  iommu/vt-d: Allocate domain info for real DMA sub-devices
  iommu/vt-d: Only clear real DMA device's context entries
  iommu: Remove iommu_sva_ops::mm_exit()
  uacce: Remove mm_exit() op
  iommu/sun50i: Constify sun50i_iommu_ops
  iommu/hyper-v: Constify hyperv_ir_domain_ops
  iommu/vt-d: Use pci_ats_supported()
  iommu/arm-smmu-v3: Use pci_ats_supported()
  ...
2020-06-08 11:42:23 -07:00
Flavio Suligoi
97eda66421 include: fix wiki website url in netlink interface header
The wiki url is still the old "wireless.kernel.org"
instead of the new "wireless.wiki.kernel.org"

Signed-off-by: Flavio Suligoi <f.suligoi@asem.it>
Link: https://lore.kernel.org/r/20200605154112.16277-9-f.suligoi@asem.it
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-06-08 10:05:59 +02:00
Linus Torvalds
e8dff03aef RTC for 5.8
Subsystem:
  - new VL flag for backup switch over
 
 Drivers:
  - ingenic: only support device tree
  - pcf2127: report battery switch over, handle nowayout
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEycoQi/giopmpPgB12wIijOdRNOUFAl7dWSkACgkQ2wIijOdR
 NOXGbQ//cSTxUJbuYBNi/VCV7J3/khGlyoQQqDsru/tzuEwXHGBoG2LRNQMOauWd
 2Osg61VQj4IY+WCqp4+ivn5H0K26y1PPKkt+UmrlRgkl0eeDFWmY4ejpziZ85D7Z
 kDlzcUi3YWkd6m4YSJJrtdCcKljBMIEXb/PEKKK9y6dkrcG5990N8JchpmkCzrjx
 fTPVIOfxu43msDc5b8egUDzPYnNbFw3ERAeasr6/EGTz+ksCspXtvWDk/mJzum0G
 FiermTkO499Dr66Nf0AS3ex9SvEoqH+kd9KA1CKii5OlYEl7K9sI+eSmTQ1EutZO
 L5WAvvQdW8UkARo6R4HAobhwK27pL+wpzUljbyXxt940/RTeqp82kl7rnH+0ihU7
 tTbR2Vu+uwWrfQbPkCCj0TJmqIHgam5/Vhn1+ZR2f4U2JIlPvvHoLRVKO0oP7XKK
 1ZDcP8zc9V2LQ2G2M1/ec6eOmoGW3EZDnKp4hcv9mnEiePSvVn04t5sa83NjNs4R
 e+awVY1x5pFwoXu99gjlfQTV2kTyaA7Jywp6gIO7BKaw/Ci3+d3tlpowfsDH+UVI
 WwKxNNqmuNXqoIep0zqUhqXHNIizKxGEk8wE4mr8HP2SlGJ+lUHAyrTTdpLeinN1
 5qTEPT3BhjExSFfDZQyWV3+CzKMvxtfFA4/Ca/0iSoaqzMZpm1E=
 =dsKr
 -----END PGP SIGNATURE-----

Merge tag 'rtc-5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux

Pull RTC updates from Alexandre Belloni:
 "Not much this cycle apart from the ingenic rtc driver rework.

  The fixes are mainly minor issues reported by coccinelle rather than
  real world issues.

  Subsystem:

   - new VL flag for backup switch over

  Drivers:

   - ingenic: only support device tree

   - pcf2127: report battery switch over, handle nowayout"

* tag 'rtc-5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: (29 commits)
  rtc: pcf2127: watchdog: handle nowayout feature
  rtc: fsl-ftm-alarm: fix freeze(s2idle) failed to wake
  rtc: abx80x: Provide debug feedback for invalid dt properties
  rtc: abx80x: Add Device Tree matching table
  rtc: rv3028: Add missed check for devm_regmap_init_i2c()
  rtc: mpc5121: Use correct return value for mpc5121_rtc_probe()
  rtc: goldfish: Use correct return value for goldfish_rtc_probe()
  rtc: snvs: Add necessary clock operations for RTC APIs
  rtc: snvs: Make SNVS clock always prepared
  rtc: ingenic: Reset regulator register in probe
  rtc: ingenic: Fix masking of error code
  rtc: ingenic: Remove unused fields from private structure
  rtc: ingenic: Set wakeup params in probe
  rtc: ingenic: Enable clock in probe
  rtc: ingenic: Use local 'dev' variable in probe
  rtc: ingenic: Only support probing from devicetree
  rtc: mc13xxx: fix a double-unlock issue
  rtc: stmp3xxx: update contact email
  rtc: max77686: Use single-byte writes on MAX77620
  rtc: pcf2127: report battery switch over
  ...
2020-06-07 16:11:23 -07:00
Linus Torvalds
9aa900c809 Char/Misc driver patches for 5.8-rc1
Here is the large set of char/misc driver patches for 5.8-rc1
 
 Included in here are:
 	- habanalabs driver updates, loads
 	- mhi bus driver updates
 	- extcon driver updates
 	- clk driver updates (approved by the clock maintainer)
 	- firmware driver updates
 	- fpga driver updates
 	- gnss driver updates
 	- coresight driver updates
 	- interconnect driver updates
 	- parport driver updates (it's still alive!)
 	- nvmem driver updates
 	- soundwire driver updates
 	- visorbus driver updates
 	- w1 driver updates
 	- various misc driver updates
 
 In short, loads of different driver subsystem updates along with the
 drivers as well.
 
 All have been in linux-next for a while with no reported issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXtzkHw8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+yldOwCgus/DgpnI1UL4z+NdBxJrAXtkPmgAn2sgTUea
 i5RblCmcVMqvHaGtYkY+
 =tScN
 -----END PGP SIGNATURE-----

Merge tag 'char-misc-5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char/misc driver updates from Greg KH:
 "Here is the large set of char/misc driver patches for 5.8-rc1

  Included in here are:

   - habanalabs driver updates, loads

   - mhi bus driver updates

   - extcon driver updates

   - clk driver updates (approved by the clock maintainer)

   - firmware driver updates

   - fpga driver updates

   - gnss driver updates

   - coresight driver updates

   - interconnect driver updates

   - parport driver updates (it's still alive!)

   - nvmem driver updates

   - soundwire driver updates

   - visorbus driver updates

   - w1 driver updates

   - various misc driver updates

  In short, loads of different driver subsystem updates along with the
  drivers as well.

  All have been in linux-next for a while with no reported issues"

* tag 'char-misc-5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (233 commits)
  habanalabs: correctly cast u64 to void*
  habanalabs: initialize variable to default value
  extcon: arizona: Fix runtime PM imbalance on error
  extcon: max14577: Add proper dt-compatible strings
  extcon: adc-jack: Fix an error handling path in 'adc_jack_probe()'
  extcon: remove redundant assignment to variable idx
  w1: omap-hdq: print dev_err if irq flags are not cleared
  w1: omap-hdq: fix interrupt handling which did show spurious timeouts
  w1: omap-hdq: fix return value to be -1 if there is a timeout
  w1: omap-hdq: cleanup to add missing newline for some dev_dbg
  /dev/mem: Revoke mappings when a driver claims the region
  misc: xilinx-sdfec: convert get_user_pages() --> pin_user_pages()
  misc: xilinx-sdfec: cleanup return value in xsdfec_table_write()
  misc: xilinx-sdfec: improve get_user_pages_fast() error handling
  nvmem: qfprom: remove incorrect write support
  habanalabs: handle MMU cache invalidation timeout
  habanalabs: don't allow hard reset with open processes
  habanalabs: GAUDI does not support soft-reset
  habanalabs: add print for soft reset due to event
  habanalabs: improve MMU cache invalidation code
  ...
2020-06-07 10:59:32 -07:00
Zhu Lingshan
776f395004 vhost_vdpa: Support config interrupt in vdpa
This commit implements config interrupt support in
vhost_vdpa layer.

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/1591352835-22441-4-git-send-email-lingshan.zhu@intel.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-06-06 16:26:27 -04:00
Linus Torvalds
0b166a57e6 A lot of bug fixes and cleanups for ext4, including:
* Fix performance problems found in dioread_nolock now that it is the
   default, caused by transaction leaks.
 * Clean up fiemap handling in ext4
 * Clean up and refactor multiple block allocator (mballoc) code
 * Fix a problem with mballoc with a smaller file systems running out
   of blocks because they couldn't properly use blocks that had been
   reserved by inode preallocation.
 * Fixed a race in ext4_sync_parent() versus rename()
 * Simplify the error handling in the extent manipulation code
 * Make sure all metadata I/O errors are felected to ext4_ext_dirty()'s and
   ext4_make_inode_dirty()'s callers.
 * Avoid passing an error pointer to brelse in ext4_xattr_set()
 * Fix race which could result to freeing an inode on the dirty last
   in data=journal mode.
 * Fix refcount handling if ext4_iget() fails
 * Fix a crash in generic/019 caused by a corrupted extent node
 -----BEGIN PGP SIGNATURE-----
 
 iQEyBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAl7Ze8kACgkQ8vlZVpUN
 gaNChAf4xn0ytFSrweI/S2Sp05G/2L/ocZ2TZZk2ZdGeN1E+ABdSIv/zIF9zuFgZ
 /pY/C+fyEZWt4E3FlNO8gJzoEedkzMCMnUhSIfI+wZbcclyTOSNMJtnrnJKAEtVH
 HOvGZJmg357jy407RCGhZpJ773nwU2xhBTr5OFxvSf9mt/vzebxIOnw5D7HPlC1V
 Fgm6Du8q+tRrPsyjv1Yu4pUEVXMJ7qUcvt326AXVM3kCZO1Aa5GrURX0w3J4mzW1
 tc1tKmtbLcVVYTo9CwHXhk/edbxrhAydSP2iACand3tK6IJuI6j9x+bBJnxXitnr
 vsxsfTYMG18+2SxrJ9LwmagqmrRq
 =HMTs
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 updates from Ted Ts'o:
 "A lot of bug fixes and cleanups for ext4, including:

   - Fix performance problems found in dioread_nolock now that it is the
     default, caused by transaction leaks.

   - Clean up fiemap handling in ext4

   - Clean up and refactor multiple block allocator (mballoc) code

   - Fix a problem with mballoc with a smaller file systems running out
     of blocks because they couldn't properly use blocks that had been
     reserved by inode preallocation.

   - Fixed a race in ext4_sync_parent() versus rename()

   - Simplify the error handling in the extent manipulation code

   - Make sure all metadata I/O errors are felected to
     ext4_ext_dirty()'s and ext4_make_inode_dirty()'s callers.

   - Avoid passing an error pointer to brelse in ext4_xattr_set()

   - Fix race which could result to freeing an inode on the dirty last
     in data=journal mode.

   - Fix refcount handling if ext4_iget() fails

   - Fix a crash in generic/019 caused by a corrupted extent node"

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (58 commits)
  ext4: avoid unnecessary transaction starts during writeback
  ext4: don't block for O_DIRECT if IOCB_NOWAIT is set
  ext4: remove the access_ok() check in ext4_ioctl_get_es_cache
  fs: remove the access_ok() check in ioctl_fiemap
  fs: handle FIEMAP_FLAG_SYNC in fiemap_prep
  fs: move fiemap range validation into the file systems instances
  iomap: fix the iomap_fiemap prototype
  fs: move the fiemap definitions out of fs.h
  fs: mark __generic_block_fiemap static
  ext4: remove the call to fiemap_check_flags in ext4_fiemap
  ext4: split _ext4_fiemap
  ext4: fix fiemap size checks for bitmap files
  ext4: fix EXT4_MAX_LOGICAL_BLOCK macro
  add comment for ext4_dir_entry_2 file_type member
  jbd2: avoid leaking transaction credits when unreserving handle
  ext4: drop ext4_journal_free_reserved()
  ext4: mballoc: use lock for checking free blocks while retrying
  ext4: mballoc: refactor ext4_mb_good_group()
  ext4: mballoc: introduce pcpu seqcnt for freeing PA to improve ENOSPC handling
  ext4: mballoc: refactor ext4_mb_discard_preallocations()
  ...
2020-06-05 16:19:28 -07:00
Linus Torvalds
5a36f0f3f5 VFIO updates for v5.8-rc1
- Block accesses to disabled MMIO space (Alex Williamson)
 
  - VFIO device migration API (Kirti Wankhede)
 
  - type1 IOMMU dirty bitmap API and implementation (Kirti Wankhede)
 
  - PCI NULL capability masking (Alex Williamson)
 
  - Memory leak fixes (Qian Cai)
 
  - Reference leak fix (Qiushi Wu)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.14 (GNU/Linux)
 
 iQIcBAABAgAGBQJe19RZAAoJECObm247sIsiBTEP/1DsA/mL/eDR94Hkmz3y1qoX
 tkrIJuM5xo95uk6DsbXDikbjM2jpUmvDUiRilara25IceuVX/tKgTjgxNOWJd+h8
 KHBoPM0bveILp+Hp6r6ZAEfhZLQ3P6A5XBTB6ql0ebveadyt13TP9Cr0jCf4Jxr2
 2A7wZAexWN8P4YBXBqrW8X3ZF3V9rfWYOJ4eILRvEplJ/TkqsGJyKW8LKP45waUF
 wutN6yh7irnWQbbUO6hb/eTIO7usrNB2L4A5CJMSZIm8JZYYmFQOrn73Ry3bA3Pg
 U+v3xIRU5U1ed/1DeLs8lRDDxImmkSXbVga0N+e3qnKtDIBh7/i9yNT9hQjhG9Ba
 C3IQoOM0Fj03hK/UahNerN7FKYoj+JvtdSKeTSNFXwdp/JUSXjj1tLouxUBSFsgD
 P15CMPXF/U6rT3nQ7/3z3JFKft/iSf6ShjXeNJFqhwH+lKsj13kF2+py3+vgayl8
 zLOqeLdRkQwXz73UStX+hrc7MmNYZVRxJ0uf3XXq3/CZ1gwiinIQqxDF3ry+pQ6w
 vXB8j5XJqsOguoAU7trTdvkKpfoJd4yY0D6aNN9F3JN8xkEBbfm6xZ7Hzur2JI6r
 iCxUHcS4XrXA/PRlTFupg8451CefuPJa2UkAWSdk182hO2R41l458Q6b7lIMqhOQ
 oP04S8VYvvDIw5JVnPWU
 =KSAE
 -----END PGP SIGNATURE-----

Merge tag 'vfio-v5.8-rc1' of git://github.com/awilliam/linux-vfio

Pull VFIO updates from Alex Williamson:

 - Block accesses to disabled MMIO space (Alex Williamson)

 - VFIO device migration API (Kirti Wankhede)

 - type1 IOMMU dirty bitmap API and implementation (Kirti Wankhede)

 - PCI NULL capability masking (Alex Williamson)

 - Memory leak fixes (Qian Cai)

 - Reference leak fix (Qiushi Wu)

* tag 'vfio-v5.8-rc1' of git://github.com/awilliam/linux-vfio:
  vfio iommu: typecast corrections
  vfio iommu: Use shift operation for 64-bit integer division
  vfio/mdev: Fix reference count leak in add_mdev_supported_type
  vfio: Selective dirty page tracking if IOMMU backed device pins pages
  vfio iommu: Add migration capability to report supported features
  vfio iommu: Update UNMAP_DMA ioctl to get dirty bitmap before unmap
  vfio iommu: Implementation of ioctl for dirty pages tracking
  vfio iommu: Add ioctl definition for dirty pages tracking
  vfio iommu: Cache pgsize_bitmap in struct vfio_iommu
  vfio iommu: Remove atomicity of ref_count of pinned pages
  vfio: UAPI for migration interface for device state
  vfio/pci: fix memory leaks of eventfd ctx
  vfio/pci: fix memory leaks in alloc_perm_bits()
  vfio-pci: Mask cap zero
  vfio-pci: Invalidate mmaps and block MMIO access on disabled memory
  vfio-pci: Fault mmaps to enable vma tracking
  vfio/type1: Support faulting PFNMAP vmas
2020-06-05 13:51:49 -07:00
Andreas Gruenbacher
300e549b6e Merge branch 'gfs2-iopen' into for-next 2020-06-05 21:25:36 +02:00
Andreas Gruenbacher
f286d627ef gfs2: Keep track of deleted inode generations in LVBs
When deleting an inode, keep track of the generation of the deleted inode in
the inode glock Lock Value Block (LVB).  When trying to delete an inode
remotely, check the last-known inode generation against the deleted inode
generation to skip duplicate remote deletes.  This avoids taking the resource
group glock in order to verify the block type.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-06-05 20:19:20 +02:00
Linus Torvalds
828f3e18e1 ARM/SoC: drivers for v5.7
These are updates to SoC specific drivers that did not have
 another subsystem maintainer tree to go through for some
 reason:
 
 - Some bus and memory drivers for the MIPS P5600 based
   Baikal-T1 SoC that is getting added through the MIPS tree.
 
 - There are new soc_device identification drivers for TI K3,
   Qualcomm MSM8939
 
 - New reset controller drivers for NXP i.MX8MP, Renesas
   RZ/G1H, and Hisilicon hi6220
 
 - The SCMI firmware interface can now work across ARM SMC/HVC
   as a transport.
 
 - Mediatek platforms now use a new driver for their "MMSYS"
   hardware block that controls clocks and some other aspects
   in behalf of the media and gpu drivers.
 
 - Some Tegra processors have improved power management
   support, including getting woken up by the PMIC and cluster
   power down during idle.
 
 - A new v4l staging driver for Tegra is added.
 
 - Cleanups and minor bugfixes for TI, NXP, Hisilicon,
   Mediatek, and Tegra.
 
 Signed-off-by: Arnd Bergmann <arnd@arndb.de>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEo6/YBQwIrVS28WGKmmx57+YAGNkFAl7XvgAACgkQmmx57+YA
 GNmj/hAAnAJ/hYehLfgCe711HUntgeRkaoTVpCt8BJNMdxsa23sn3V6k5+WYn1uG
 PtlgpefZEMHLUEEVDegR4nZXLG0Pzu1SR12KW34YPcQKkNo/+vlQ9zYUajnJ/KX6
 10zdLSIzHfk1VtXKvvQQ8xFyE+S/trGmjC57E6gfoCUT3rl1maD+ccVXUBaz9oob
 wuMxGXQAl57mio5yT1OfSk6Fev39xRE2dN1hzP7KUYhsemZajBwBBW5wVJZCsCB8
 LCGmxVkavM7BV4r2NokbBDs5rlfedBl/P/IPd9Is5a5tuGUkSsVRG9zqShxYLGM3
 S06az6POQFwXKFJoUKW0dK/Koy0D7BK+vhUBPzFv4HZ8iDCVf6Jju2MJ02GMqHPj
 OOrXaCbLYrvN/edVUWeeFywqwMbYTRwC4DxyTq5m7HxEB004xTOhs0rX5aR0u4n1
 bbsR97LguolwH9iEMzd3F3jCiKBcMecH3lAh5WcrtwlFIRrNhbWoGDoA/4TuORFS
 b11rgsTRIJ5Vc++D1HnSnx0ZZvUzyluMvygdALnSgVah6xYe6KVw9Kg/wioAJ04G
 uSTidqP3qRhsyET2HQo7CxdVfZbKfP25iKCKrdhQziztKvhF8qrUmZKloXOodRw+
 ewYSRmv8c324OYYit1X43oAdW8dntq1XbSIauaqxEb4JC7x/xRY=
 =44Jc
 -----END PGP SIGNATURE-----

Merge tag 'arm-drivers-5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc

Pull ARM/SoC driver updates from Arnd Bergmann:
 "These are updates to SoC specific drivers that did not have another
  subsystem maintainer tree to go through for some reason:

   - Some bus and memory drivers for the MIPS P5600 based Baikal-T1 SoC
     that is getting added through the MIPS tree.

   - There are new soc_device identification drivers for TI K3, Qualcomm
     MSM8939

   - New reset controller drivers for NXP i.MX8MP, Renesas RZ/G1H, and
     Hisilicon hi6220

   - The SCMI firmware interface can now work across ARM SMC/HVC as a
     transport.

   - Mediatek platforms now use a new driver for their "MMSYS" hardware
     block that controls clocks and some other aspects in behalf of the
     media and gpu drivers.

   - Some Tegra processors have improved power management support,
     including getting woken up by the PMIC and cluster power down
     during idle.

   - A new v4l staging driver for Tegra is added.

   - Cleanups and minor bugfixes for TI, NXP, Hisilicon, Mediatek, and
     Tegra"

* tag 'arm-drivers-5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (155 commits)
  clk: sprd: fix compile-testing
  bus: bt1-axi: Build the driver into the kernel
  bus: bt1-apb: Build the driver into the kernel
  bus: bt1-axi: Use sysfs_streq instead of strncmp
  bus: bt1-axi: Optimize the return points in the driver
  bus: bt1-apb: Use sysfs_streq instead of strncmp
  bus: bt1-apb: Use PTR_ERR_OR_ZERO to return from request-regs method
  bus: bt1-apb: Fix show/store callback identations
  bus: bt1-apb: Include linux/io.h
  dt-bindings: memory: Add Baikal-T1 L2-cache Control Block binding
  memory: Add Baikal-T1 L2-cache Control Block driver
  bus: Add Baikal-T1 APB-bus driver
  bus: Add Baikal-T1 AXI-bus driver
  dt-bindings: bus: Add Baikal-T1 APB-bus binding
  dt-bindings: bus: Add Baikal-T1 AXI-bus binding
  staging: tegra-video: fix V4L2 dependency
  tee: fix crypto select
  drivers: soc: ti: knav_qmss_queue: Make knav_gp_range_ops static
  soc: ti: add k3 platforms chipid module driver
  dt-bindings: soc: ti: add binding for k3 platforms chipid module
  ...
2020-06-04 19:56:20 -07:00
David Hildenbrand
fce8afd76e virtio-mem: Don't rely on implicit compiler padding for requests
The compiler will add padding after the last member, make that explicit.
The size of a request is always 24 bytes. The size of a response always
10 bytes. Add compile-time checks.

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: teawater <teawaterz@linux.alibaba.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20200515101402.16597-1-david@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-06-04 15:36:52 -04:00
David Hildenbrand
f2af6d3978 virtio-mem: Allow to specify an ACPI PXM as nid
We want to allow to specify (similar as for a DIMM), to which node a
virtio-mem device (and, therefore, its memory) belongs. Add a new
virtio-mem feature flag and export pxm_to_node, so it can be used in kernel
module context.

Acked-by: Michal Hocko <mhocko@suse.com> # for the export
Acked-by: "Rafael J. Wysocki" <rafael@kernel.org> # for the export
Acked-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Tested-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Len Brown <lenb@kernel.org>
Cc: linux-acpi@vger.kernel.org
Signed-off-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20200507140139.17083-4-david@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-06-04 15:36:52 -04:00
David Hildenbrand
5f1f79bbc9 virtio-mem: Paravirtualized memory hotplug
Each virtio-mem device owns exactly one memory region. It is responsible
for adding/removing memory from that memory region on request.

When the device driver starts up, the requested amount of memory is
queried and then plugged to Linux. On request, further memory can be
plugged or unplugged. This patch only implements the plugging part.

On x86-64, memory can currently be plugged in 4MB ("subblock") granularity.
When required, a new memory block will be added (e.g., usually 128MB on
x86-64) in order to plug more subblocks. Only x86-64 was tested for now.

The online_page callback is used to keep unplugged subblocks offline
when onlining memory - similar to the Hyper-V balloon driver. Unplugged
pages are marked PG_offline, to tell dump tools (e.g., makedumpfile) to
skip them.

User space is usually responsible for onlining the added memory. The
memory hotplug notifier is used to synchronize virtio-mem activity
against memory onlining/offlining.

Each virtio-mem device can belong to a NUMA node, which allows us to
easily add/remove small chunks of memory to/from a specific NUMA node by
using multiple virtio-mem devices. Something that works even when the
guest has no idea about the NUMA topology.

One way to view virtio-mem is as a "resizable DIMM" or a DIMM with many
"sub-DIMMS".

This patch directly introduces the basic infrastructure to implement memory
unplug. Especially the memory block states and subblock bitmaps will be
heavily used there.

Notes:
- In case memory is to be onlined by user space, we limit the amount of
  offline memory blocks, to not run out of memory. This is esp. an
  issue if memory is added faster than it is getting onlined.
- Suspend/Hibernate is not supported due to the way virtio-mem devices
  behave. Limited support might be possible in the future.
- Reloading the device driver is not supported.

Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Tested-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Len Brown <lenb@kernel.org>
Cc: linux-acpi@vger.kernel.org
Signed-off-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20200507140139.17083-2-david@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-06-04 15:36:52 -04:00
Linus Torvalds
a98f670e41 media updates for v5.8-rc1
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+QmuaPwR3wnBdVwACF8+vY7k4RUFAl7XUmwACgkQCF8+vY7k
 4RU4zg//fT32wiVAPHCCp+pDZVnWNeipXE1gnpqghd/qZXfzBPiLEC9sPS74VVkA
 jf1hhR33VZpKAKTPg/b074qhRZBywEOdHZnT/0CEE1oNB61shVOnyDYzLGSq95cO
 6V55ovbi5IOkrg0QEJbHpG5YHzt+pq5XeWOkqGNsHwla7N7iMGMVYfHepVVDWPnZ
 0wGYFF9cAJP+X/uxqkZLDVMA/K1I+QKh6vrj/qx53/eRt8VID3+i8ig3guk4PlUq
 7RLw5w/CywtNaGE5zaz7T3i2eoED71JHOTXi6RxdP1z8IDvELZ9mT95GQ+enlwqt
 AS6Ju1sV40wviHMv5prJWQjJkrrtYH3S907lIjwBpQLNGbh2+5crCd/6CwumkGgv
 1cCZ1dVmXpCe++9mU9AXmSkjsjGPStNcmHMOpc1Pwn9jUV3LQOOSDp8+RYdt1WHU
 Iw9cyM8NOpz5Mv/B1/ZPQ1gPb9lr1gE09XyUekxtAI/nl4nNHGWO8QDuX7Odfrv9
 8nfo14lk/p6XCTA8dsWJCgI5B1fgnqD4frHKWO9Uctppc/KBW41c8JpQUjBNlG/T
 MhtlGwYMVgSQxpQ6wK018JUAFoWkn1Sr0zMKRayqCnMjMLHsaMwE6kq+LgmRBqbB
 ersKV/9ZLYqCU1d6PhEVG6xUs6GsWdLcyhALlmHsddPSdpFXdf8=
 =KNAo
 -----END PGP SIGNATURE-----

Merge tag 'media/v5.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media

Pull media updates from Mauro Carvalho Chehab:

 - Media documentation is now split into admin-guide, driver-api and
   userspace-api books (a longstanding request from Jon);

 - The media Kconfig was reorganized, in order to make easier to select
   drivers and their dependencies;

 - The testing drivers now has a separate directory;

 - added a new driver for Rockchip Video Decoder IP;

 - The atomisp staging driver was resurrected. It is meant to work with
   4 generations of cameras on Atom-based laptops, tablets and cell
   phones. So, it seems worth investing time to cleanup this driver and
   making it in good shape.

 - Added some V4L2 core ancillary routines to help with h264 codecs;

 - Added an ov2740 image sensor driver;

 - The si2157 gained support for Analog TV, which, in turn, added
   support for some cx231xx and cx23885 boards to also support analog
   standards;

 - Added some V4L2 controls (V4L2_CID_CAMERA_ORIENTATION and
   V4L2_CID_CAMERA_SENSOR_ROTATION) to help identifying where the camera
   is located at the device;

 - VIDIOC_ENUM_FMT was extended to support MC-centric devices;

 - Lots of drivers improvements and cleanups.

* tag 'media/v5.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (503 commits)
  media: Documentation: media: Refer to mbus format documentation from CSI-2 docs
  media: s5k5baf: Replace zero-length array with flexible-array
  media: i2c: imx219: Drop <linux/clk-provider.h> and <linux/clkdev.h>
  media: i2c: Add ov2740 image sensor driver
  media: ov8856: Implement sensor module revision identification
  media: ov8856: Add devicetree support
  media: dt-bindings: ov8856: Document YAML bindings
  media: dvb-usb: Add Cinergy S2 PCIe Dual Port support
  media: dvbdev: Fix tuner->demod media controller link
  media: dt-bindings: phy: phy-rockchip-dphy-rx0: move rockchip dphy rx0 bindings out of staging
  media: staging: dt-bindings: phy-rockchip-dphy-rx0: remove non-used reg property
  media: atomisp: unify the version for isp2401 a0 and b0 versions
  media: atomisp: update TODO with the current data
  media: atomisp: adjust some code at sh_css that could be broken
  media: atomisp: don't produce errs for ignored IRQs
  media: atomisp: print IRQ when debugging
  media: atomisp: isp_mmu: don't use kmem_cache
  media: atomisp: add a notice about possible leak resources
  media: atomisp: disable the dynamic and reserved pools
  media: atomisp: turn on camera before setting it
  ...
2020-06-03 20:59:38 -07:00
Christoph Hellwig
10c5db2864 fs: move the fiemap definitions out of fs.h
No need to pull the fiemap definitions into almost every file in the
kernel build.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Link: https://lore.kernel.org/r/20200523073016.2944131-5-hch@lst.de
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-06-03 23:16:55 -04:00
Linus Torvalds
cb8e59cc87 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from David Miller:

 1) Allow setting bluetooth L2CAP modes via socket option, from Luiz
    Augusto von Dentz.

 2) Add GSO partial support to igc, from Sasha Neftin.

 3) Several cleanups and improvements to r8169 from Heiner Kallweit.

 4) Add IF_OPER_TESTING link state and use it when ethtool triggers a
    device self-test. From Andrew Lunn.

 5) Start moving away from custom driver versions, use the globally
    defined kernel version instead, from Leon Romanovsky.

 6) Support GRO vis gro_cells in DSA layer, from Alexander Lobakin.

 7) Allow hard IRQ deferral during NAPI, from Eric Dumazet.

 8) Add sriov and vf support to hinic, from Luo bin.

 9) Support Media Redundancy Protocol (MRP) in the bridging code, from
    Horatiu Vultur.

10) Support netmap in the nft_nat code, from Pablo Neira Ayuso.

11) Allow UDPv6 encapsulation of ESP in the ipsec code, from Sabrina
    Dubroca. Also add ipv6 support for espintcp.

12) Lots of ReST conversions of the networking documentation, from Mauro
    Carvalho Chehab.

13) Support configuration of ethtool rxnfc flows in bcmgenet driver,
    from Doug Berger.

14) Allow to dump cgroup id and filter by it in inet_diag code, from
    Dmitry Yakunin.

15) Add infrastructure to export netlink attribute policies to
    userspace, from Johannes Berg.

16) Several optimizations to sch_fq scheduler, from Eric Dumazet.

17) Fallback to the default qdisc if qdisc init fails because otherwise
    a packet scheduler init failure will make a device inoperative. From
    Jesper Dangaard Brouer.

18) Several RISCV bpf jit optimizations, from Luke Nelson.

19) Correct the return type of the ->ndo_start_xmit() method in several
    drivers, it's netdev_tx_t but many drivers were using
    'int'. From Yunjian Wang.

20) Add an ethtool interface for PHY master/slave config, from Oleksij
    Rempel.

21) Add BPF iterators, from Yonghang Song.

22) Add cable test infrastructure, including ethool interfaces, from
    Andrew Lunn. Marvell PHY driver is the first to support this
    facility.

23) Remove zero-length arrays all over, from Gustavo A. R. Silva.

24) Calculate and maintain an explicit frame size in XDP, from Jesper
    Dangaard Brouer.

25) Add CAP_BPF, from Alexei Starovoitov.

26) Support terse dumps in the packet scheduler, from Vlad Buslov.

27) Support XDP_TX bulking in dpaa2 driver, from Ioana Ciornei.

28) Add devm_register_netdev(), from Bartosz Golaszewski.

29) Minimize qdisc resets, from Cong Wang.

30) Get rid of kernel_getsockopt and kernel_setsockopt in order to
    eliminate set_fs/get_fs calls. From Christoph Hellwig.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2517 commits)
  selftests: net: ip_defrag: ignore EPERM
  net_failover: fixed rollback in net_failover_open()
  Revert "tipc: Fix potential tipc_aead refcnt leak in tipc_crypto_rcv"
  Revert "tipc: Fix potential tipc_node refcnt leak in tipc_rcv"
  vmxnet3: allow rx flow hash ops only when rss is enabled
  hinic: add set_channels ethtool_ops support
  selftests/bpf: Add a default $(CXX) value
  tools/bpf: Don't use $(COMPILE.c)
  bpf, selftests: Use bpf_probe_read_kernel
  s390/bpf: Use bcr 0,%0 as tail call nop filler
  s390/bpf: Maintain 8-byte stack alignment
  selftests/bpf: Fix verifier test
  selftests/bpf: Fix sample_cnt shared between two threads
  bpf, selftests: Adapt cls_redirect to call csum_level helper
  bpf: Add csum_level helper for fixing up csum levels
  bpf: Fix up bpf_skb_adjust_room helper's skb csum setting
  sfc: add missing annotation for efx_ef10_try_update_nic_stats_vf()
  crypto/chtls: IPv6 support for inline TLS
  Crypto/chcr: Fixes a coccinile check error
  Crypto/chcr: Fixes compilations warnings
  ...
2020-06-03 16:27:18 -07:00
Linus Torvalds
039aeb9deb ARM:
- Move the arch-specific code into arch/arm64/kvm
 - Start the post-32bit cleanup
 - Cherry-pick a few non-invasive pre-NV patches
 
 x86:
 - Rework of TLB flushing
 - Rework of event injection, especially with respect to nested virtualization
 - Nested AMD event injection facelift, building on the rework of generic code
 and fixing a lot of corner cases
 - Nested AMD live migration support
 - Optimization for TSC deadline MSR writes and IPIs
 - Various cleanups
 - Asynchronous page fault cleanups (from tglx, common topic branch with tip tree)
 - Interrupt-based delivery of asynchronous "page ready" events (host side)
 - Hyper-V MSRs and hypercalls for guest debugging
 - VMX preemption timer fixes
 
 s390:
 - Cleanups
 
 Generic:
 - switch vCPU thread wakeup from swait to rcuwait
 
 The other architectures, and the guest side of the asynchronous page fault
 work, will come next week.
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAl7VJcYUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroPf6QgAq4wU5wdd1lTGz/i3DIhNVJNJgJlp
 ozLzRdMaJbdbn5RpAK6PEBd9+pt3+UlojpFB3gpJh2Nazv2OzV4yLQgXXXyyMEx1
 5Hg7b4UCJYDrbkCiegNRv7f/4FWDkQ9dx++RZITIbxeskBBCEI+I7GnmZhGWzuC4
 7kj4ytuKAySF2OEJu0VQF6u0CvrNYfYbQIRKBXjtOwuRK4Q6L63FGMJpYo159MBQ
 asg3B1jB5TcuGZ9zrjL5LkuzaP4qZZHIRs+4kZsH9I6MODHGUxKonrkablfKxyKy
 CFK+iaHCuEXXty5K0VmWM3nrTfvpEjVjbMc7e1QGBQ5oXsDM0pqn84syRg==
 =v7Wn
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "ARM:
   - Move the arch-specific code into arch/arm64/kvm

   - Start the post-32bit cleanup

   - Cherry-pick a few non-invasive pre-NV patches

  x86:
   - Rework of TLB flushing

   - Rework of event injection, especially with respect to nested
     virtualization

   - Nested AMD event injection facelift, building on the rework of
     generic code and fixing a lot of corner cases

   - Nested AMD live migration support

   - Optimization for TSC deadline MSR writes and IPIs

   - Various cleanups

   - Asynchronous page fault cleanups (from tglx, common topic branch
     with tip tree)

   - Interrupt-based delivery of asynchronous "page ready" events (host
     side)

   - Hyper-V MSRs and hypercalls for guest debugging

   - VMX preemption timer fixes

  s390:
   - Cleanups

  Generic:
   - switch vCPU thread wakeup from swait to rcuwait

  The other architectures, and the guest side of the asynchronous page
  fault work, will come next week"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (256 commits)
  KVM: selftests: fix rdtsc() for vmx_tsc_adjust_test
  KVM: check userspace_addr for all memslots
  KVM: selftests: update hyperv_cpuid with SynDBG tests
  x86/kvm/hyper-v: Add support for synthetic debugger via hypercalls
  x86/kvm/hyper-v: enable hypercalls regardless of hypercall page
  x86/kvm/hyper-v: Add support for synthetic debugger interface
  x86/hyper-v: Add synthetic debugger definitions
  KVM: selftests: VMX preemption timer migration test
  KVM: nVMX: Fix VMX preemption timer migration
  x86/kvm/hyper-v: Explicitly align hcall param for kvm_hyperv_exit
  KVM: x86/pmu: Support full width counting
  KVM: x86/pmu: Tweak kvm_pmu_get_msr to pass 'struct msr_data' in
  KVM: x86: announce KVM_FEATURE_ASYNC_PF_INT
  KVM: x86: acknowledgment mechanism for async pf page ready notifications
  KVM: x86: interrupt based APF 'page ready' event delivery
  KVM: introduce kvm_read_guest_offset_cached()
  KVM: rename kvm_arch_can_inject_async_page_present() to kvm_arch_can_dequeue_async_page_present()
  KVM: x86: extend struct kvm_vcpu_pv_apf_data with token info
  Revert "KVM: async_pf: Fix #DF due to inject "Page not Present" and "Page Ready" exceptions simultaneously"
  KVM: VMX: Replace zero-length array with flexible-array
  ...
2020-06-03 15:13:47 -07:00
Farhan Ali
d8cac29b1d vfio-ccw: Introduce a new CRW region
This region provides a mechanism to pass a Channel Report Word
that affect vfio-ccw devices, and needs to be passed to the guest
for its awareness and/or processing.

The base driver (see crw_collect_info()) provides space for two
CRWs, as a subchannel event may have two CRWs chained together
(one for the ssid, one for the subchannel).  As vfio-ccw will
deal with everything at the subchannel level, provide space
for a single CRW to be transferred in one shot.

Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
Signed-off-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Message-Id: <20200505122745.53208-7-farman@linux.ibm.com>
[CH: added padding to ccw_crw_region]
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2020-06-03 11:27:43 +02:00
Linus Torvalds
f3cdc8ae11 for-5.8-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAl7U50AACgkQxWXV+ddt
 WDtK1g//RXeNsTguYQr1N9R5eUPThjLEI0+4J0l4SYfCPU8Ou3C7nqpOEJJQgm8F
 ezZE+16cWi9U5uGueOc+w0rfyz4AuIXKgzoz+c0/GG2+yV5jp6DsAMbWqojAb96L
 V/N3HxEzR66jqwgVUBE/x5okb2SyY7//B1l/O0amc66XDO7KTMImpIwThere6zWZ
 o2SNpYpHAPQeUYJQx8h+FAW3w1CxrCZmnifazU9Jqe9J7QeQLg7rbUlJDV38jySm
 ZOA8ohKN9U1gPZy+dTU3kdyyuBIq1etkIaSPJANyTo5TczPKiC0IMg75cXtS4ae/
 NSxhccMpSIjVMcIHARzSFGYKNP3sGNRsmaTUg/2Cx/9GoHOhYMiCAVc8qtBBpwJO
 UI0siexrCe64RuTBMRRc128GdFv7IjmSImcdi8xaR62bCcUiNdEa3zvjRe/9tOEH
 ET7Z85oBnKpSzpC3MdhSUU4dtHY5XLawP8z3oUU1VSzSWM2DVjlHf79/VzbOfp18
 miCVpt94lCn/gUX7el6qcnbuvMAjDyeC6HmfD+TwzQgGwyV6TLgKN9lRXeH/Oy6/
 VgjGQSavGHMll3zIGURmrBCXKudjJg0J+IP4wN1TimmSEMfwKH+7tnekQd8y5qlF
 eXEIqlWNykKeDzEnmV9QJy+/cV83hVWM/mUslcTx39tLN/3B/Us=
 =qTt8
 -----END PGP SIGNATURE-----

Merge tag 'for-5.8-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs updates from David Sterba:
 "Highlights:

   - speedup dead root detection during orphan cleanup, eg. when there
     are many deleted subvolumes waiting to be cleaned, the trees are
     now looked up in radix tree instead of a O(N^2) search

   - snapshot creation with inherited qgroup will mark the qgroup
     inconsistent, requires a rescan

   - send will emit file capabilities after chown, this produces a
     stream that does not need postprocessing to set the capabilities
     again

   - direct io ported to iomap infrastructure, cleaned up and simplified
     code, notably removing last use of struct buffer_head in btrfs code

  Core changes:

   - factor out backreference iteration, to be used by ordinary
     backreferences and relocation code

   - improved global block reserve utilization
      * better logic to serialize requests
      * increased maximum available for unlink
      * improved handling on large pages (64K)

   - direct io cleanups and fixes
      * simplify layering, where cloned bios were unnecessarily created
        for some cases
      * error handling fixes (submit, endio)
      * remove repair worker thread, used to avoid deadlocks during
        repair

   - refactored block group reading code, preparatory work for new type
     of block group storage that should improve mount time on large
     filesystems

  Cleanups:

   - cleaned up (and slightly sped up) set/get helpers for metadata data
     structure members

   - root bit REF_COWS got renamed to SHAREABLE to reflect the that the
     blocks of the tree get shared either among subvolumes or with the
     relocation trees

  Fixes:

   - when subvolume deletion fails due to ENOSPC, the filesystem is not
     turned read-only

   - device scan deals with devices from other filesystems that changed
     ownership due to overwrite (mkfs)

   - fix a race between scrub and block group removal/allocation

   - fix long standing bug of a runaway balance operation, printing the
     same line to the syslog, caused by a stale status bit on a reloc
     tree that prevented progress

   - fix corrupt log due to concurrent fsync of inodes with shared
     extents

   - fix space underflow for NODATACOW and buffered writes when it for
     some reason needs to fallback to COW mode"

* tag 'for-5.8-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (133 commits)
  btrfs: fix space_info bytes_may_use underflow during space cache writeout
  btrfs: fix space_info bytes_may_use underflow after nocow buffered write
  btrfs: fix wrong file range cleanup after an error filling dealloc range
  btrfs: remove redundant local variable in read_block_for_search
  btrfs: open code key_search
  btrfs: split btrfs_direct_IO to read and write part
  btrfs: remove BTRFS_INODE_READDIO_NEED_LOCK
  fs: remove dio_end_io()
  btrfs: switch to iomap_dio_rw() for dio
  iomap: remove lockdep_assert_held()
  iomap: add a filesystem hook for direct I/O bio submission
  fs: export generic_file_buffered_read()
  btrfs: turn space cache writeout failure messages into debug messages
  btrfs: include error on messages about failure to write space/inode caches
  btrfs: remove useless 'fail_unlock' label from btrfs_csum_file_blocks()
  btrfs: do not ignore error from btrfs_next_leaf() when inserting checksums
  btrfs: make checksum item extension more efficient
  btrfs: fix corrupt log due to concurrent fsync of inodes with shared extents
  btrfs: unexport btrfs_compress_set_level()
  btrfs: simplify iget helpers
  ...
2020-06-02 19:59:25 -07:00
Linus Torvalds
96ed320d52 New code for 5.8:
- Clean up io_is_direct.
 - Add a new statx flag to indicate when file data access is being done
   via DAX (as opposed to the page cache).
 - Update the documentation for how system administrators and application
   programmers can take advantage of the (still experimental DAX) feature.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAl6wO7UACgkQ+H93GTRK
 tOtEaw//eShC2YE0S+GS7ihQ3x71PJa4Is0VZOIpTHl01aqMSegwB3QbDQbVUhn1
 TzLhUw4pZsz3R9GbUOrfHOYRt+aSP2t0WNhIulDeBp41CYJQSaFt85KnfM9hoBOi
 VYssum3Lu7/6ReKrDD/mumzWYkts+JDCuXRmt7nOQeZJVNXOCBBbvN354V4/IKLY
 wB4Wnaq3f3gYniXYW/23aCX+kocaOIUZtK6aFKyeD0KvfP5toDlpw1cBVMoM9CmO
 bmEy8vKf4lgFZDLeDMqmWOecMgEH5h0baN5Psu13WuDCiCd6maBl0KpxVpVlwsep
 yVz6mMbZjmLOJ2lqyw+lZb+XicD+K3yRVSTGKxV3VbuRjeX9tjVG5Im13VesNvJB
 WWJq/CkOU8W0Zs7Q5RbUDGbFFWDJSI/OStAU+UeuWvL9Gndv7hqv6H904qbPPtEu
 4m4Y34ARzrEaKpkABKKwQ53cLClNxmmgUN9N3cXK3mk8idlX4zM3j6+HJYUxXTO+
 fBjhOlyUy2KaWmzZoJp28QvaU4iegGmMSuRnQ9HAvXmdxUA2K6+wjS6LCZGh04vz
 z7SbzTBlo2kvsKdRMwJ306s2QA0/HvmKHHLI+p8OQANce9hjhE3XdJayzhitd0fk
 k0D/y8OY+fbCSgI8C4g66lA8Zf2sos/ulD0QTTNBPfU2rRWkKUc=
 =rS19
 -----END PGP SIGNATURE-----

Merge tag 'vfs-5.8-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull DAX updates part one from Darrick Wong:
 "After many years of LKML-wrangling about how to enable programs to
  query and influence the file data access mode (DAX) when a filesystem
  resides on storage devices such as persistent memory, Ira Weiny has
  emerged with a proposed set of standard behaviors that has not been
  shot down by anyone! We're more or less standardizing on the current
  XFS behavior and adapting ext4 to do the same.

  This is the first of a handful pull requests that will make ext4 and
  XFS present a consistent interface for user programs that care about
  DAX. We add a statx attribute that programs can check to see if DAX is
  enabled on a particular file. Then, we update the DAX documentation to
  spell out the user-visible behaviors that filesystems will guarantee
  (until the next storage industry shakeup). The on-disk inode flag has
  been in XFS for a few years now.

  Summary:

   - Clean up io_is_direct.

   - Add a new statx flag to indicate when file data access is being
     done via DAX (as opposed to the page cache).

   - Update the documentation for how system administrators and
     application programmers can take advantage of the (still
     experimental DAX) feature"

Link: https://lore.kernel.org/lkml/20200505002016.1085071-1-ira.weiny@intel.com/

* tag 'vfs-5.8-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  Documentation/dax: Update Usage section
  fs/stat: Define DAX statx attribute
  fs: Remove unneeded IS_DAX() check in io_is_direct()
2020-06-02 19:45:12 -07:00
Linus Torvalds
d9afbb3509 Merge branch 'next-general' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull lockdown update from James Morris:
 "An update for the security subsystem to allow unprivileged users
  to see the status of the lockdown feature. From Jeremy Cline"

Also an added comment to describe CAP_SETFCAP.

* 'next-general' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
  capabilities: add description for CAP_SETFCAP
  lockdown: Allow unprivileged users to see lockdown status
2020-06-02 17:36:24 -07:00
Linus Torvalds
9d99b1647f audit/stable-5.8 PR 20200601
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAl7VnKEUHHBhdWxAcGF1
 bC1tb29yZS5jb20ACgkQ6iDy2pc3iXMbHA/+PQmrPdzPvkLAjjf1y3LXvyEIAXIQ
 h2r8SxHa7iGyF6vVPz+ya7ux0KAm8wCVdfkokWG5jxjwK7pysS6gx9JzBVK7dbhD
 FsKBSoq9+to9fYlaCyX7vn85C7kK5oGrwS/ECos0BHBpij8ukLgvPQu+PDs7d4xW
 1X2Nrgqnc7M4L8ayzXTQX0fDWcOkapzaN86+R+Lavb4hO/FownaYbuCFn+1mdzux
 ZNBpt3/y1pM6vi5YBkI1rkauBCmkl/YSX/mf/EwDNlQ0XmcadGQ6z7iwjyiE826g
 etCHWD3cgQH7Zzz6CxBNX8Xbq0nIQueHHiFYpVyy9lf4xleFvnfFDebrs8Q9TB6G
 jTWU8okioUKPZyRDaRuIAmCf/LBQRsMkIYTU3w6J0ZqsBycTw3NXPiQArmlxZESM
 HquxWpKoZytRiw581hiSGKNqY+R3FvA+Jroc/7bWfNOE3IdFxegvCsC3giKJf1rY
 AlQitehql9a5jp7A57+477WRYOygYRnd+ntMD5KqR90QSIcQXeg0/lFKhco+zc2p
 bXbWLE+aaOTGCeC+3Eow3T7FMWmrIn6ccKgM84+WT7YQYtRqUYu3RIZbnlYXN7uH
 8xGXT6ccPcEwIjgyF87J0KyGhrbT1N91Jd2jMJkEry9OLAn/yr+pUBQtAa456MMi
 JYevS4atZaUqgvw=
 =iLfC
 -----END PGP SIGNATURE-----

Merge tag 'audit-pr-20200601' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit

Pull audit updates from Paul Moore:
 "Summary of the significant patches:

   - Record information about binds/unbinds to the audit multicast
     socket. This helps identify which processes have/had access to the
     information in the audit stream.

   - Cleanup and add some additional information to the netfilter
     configuration events collected by audit.

   - Fix some of the audit error handling code so we don't leak network
     namespace references"

* tag 'audit-pr-20200601' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
  audit: add subj creds to NETFILTER_CFG record to
  audit: Replace zero-length array with flexible-array
  audit: make symbol 'audit_nfcfgs' static
  netfilter: add audit table unregister actions
  audit: tidy and extend netfilter_cfg x_tables
  audit: log audit netlink multicast bind and unbind
  audit: fix a net reference leak in audit_list_rules_send()
  audit: fix a net reference leak in audit_send_reply()
2020-06-02 17:13:37 -07:00
Stefan Hajnoczi
56f2e3b7d8 capabilities: add description for CAP_SETFCAP
Document the purpose of CAP_SETFCAP.  For some reason this capability
had no description while the others did.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
2020-06-02 16:22:46 -07:00
Linus Torvalds
1ee08de1e2 for-5.8/io_uring-2020-06-01
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl7VP+kQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpuK4D/0XsSG/Yirbba1rrbqw/qpw9xcAs9oyN0tS
 8SmmGN27ghrkVSsGBXNcG+PSTu3pkkLjYZ6TQtKamrya9G+lRAsKRsQ+Yq+7Qv4e
 N6lCUlLJ99KqTMtwvIoxSpA1tz3ENHucOw2cJrw3kd9G0kil7GvDkIOBasd+kmwn
 ak+mnMJZzRhqSM7M5lKQOk8l92gKBHGbPy4xKb0st3dQkYptDvit0KcNSAuevtOp
 sRZpdbXaT3FA6xa5iEgggI6vZQGVmK1EaGoQqZ8vgVo75aovkjZyQWWiFVVOlEqr
 QjUCCQuixcbMRbZjgpojqva5nmLhFVhLCfoSH2XgttEQZhmTwypdRwM2/IlxV5q2
 xCofrDkhYOfIgHkuP6p68ukIPIfQ+4jotvsmXZ/HeD/xbx3TRyJRZadISr6wiuLm
 7zRXWaGCYomUIPJOOrpBQ9FsCglkaN63oB6VGuGKTg3g7kE2QrZ2/aGuexP+FAdh
 OrA8BlzxZzpqMKhjQVKOl9r6FU928MZn8nIAkMdQ/Ia1mOpb4rrPo4qCdf+tbhPO
 pmKtQPQjbszQ3UfTgShvfvDk43BeRim1DxZPFTauSu1FMpqWBCwQgXMynPFrf5TR
 HXF61G+jw5swDW6uJgW7bXdm7hHr15vRqQr54MgGS+T0OOa1df9MR0dJB5CGklfI
 ycLU6AAT+A==
 =A/qA
 -----END PGP SIGNATURE-----

Merge tag 'for-5.8/io_uring-2020-06-01' of git://git.kernel.dk/linux-block

Pull io_uring updates from Jens Axboe:
 "A relatively quiet round, mostly just fixes and code improvements. In
particular:

   - Make statx just use the generic statx handler, instead of open
     coding it. We don't need that anymore, as we always call it async
     safe (Bijan)

   - Enable closing of the ring itself. Also fixes O_PATH closure (me)

   - Properly name completion members (me)

   - Batch reap of dead file registrations (me)

   - Allow IORING_OP_POLL with double waitqueues (me)

   - Add tee(2) support (Pavel)

   - Remove double off read (Pavel)

   - Fix overflow cancellations (Pavel)

   - Improve CQ timeouts (Pavel)

   - Async defer drain fixes (Pavel)

   - Add support for enabling/disabling notifications on a registered
     eventfd (Stefano)

   - Remove dead state parameter (Xiaoguang)

   - Disable SQPOLL submit on dying ctx (Xiaoguang)

   - Various code cleanups"

* tag 'for-5.8/io_uring-2020-06-01' of git://git.kernel.dk/linux-block: (29 commits)
  io_uring: fix overflowed reqs cancellation
  io_uring: off timeouts based only on completions
  io_uring: move timeouts flushing to a helper
  statx: hide interfaces no longer used by io_uring
  io_uring: call statx directly
  statx: allow system call to be invoked from io_uring
  io_uring: add io_statx structure
  io_uring: get rid of manual punting in io_close
  io_uring: separate DRAIN flushing into a cold path
  io_uring: don't re-read sqe->off in timeout_prep()
  io_uring: simplify io_timeout locking
  io_uring: fix flush req->refs underflow
  io_uring: don't submit sqes when ctx->refs is dying
  io_uring: async task poll trigger cleanup
  io_uring: add tee(2) support
  splice: export do_tee()
  io_uring: don't repeat valid flag list
  io_uring: rename io_file_put()
  io_uring: remove req->needs_fixed_files
  io_uring: cleanup io_poll_remove_one() logic
  ...
2020-06-02 15:42:50 -07:00
Linus Torvalds
bce159d734 for-5.8/drivers-2020-06-01
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl7VPc4QHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpgQkEACnQlzWOfNQMz1AzgUAv/S8IYDJCLrkbjLZ
 JK4pJv8Hjhss/7sS+fd8kyKe9VtaZz2IjmrXcC66RMMwtpx4iHnkRffoNAgEdGOl
 /M5TCZGhs+F/mp3Lc0WdR5DFHkM6yy2Tkk9wCFLreB4bW67janAWnd7nbU4INqJj
 +WqIgpzNMc/kfUhpBYTeQLORhL4e2TG9ADTi/zeUITlpnEsA65LOgXKEpeIFYnSX
 KTl4GIZ9tjazG3Y1Eva7DYHDIErNNAtX67KBqf+WBgMV98eB0O6xIPN1WlmhDTqj
 FGMLkb8msH1HHntvxDAuc4/ortnUy8vPI4o6zKP89HJJNjIM5p5eHEuVF5JnBw42
 Rtu9Om6JqWx51nhAhJNBj9bUStYbhEl0vVQCwbkfPbDJhzTy3RR8z709q9+ZwOrL
 xbp4aJBzqrzscjBEiSQbNCf2PyuOAdU0r1x81UN81ZN41d5qUcumcinjw4Y7vru8
 z5zMlo1Iy/AWQYyu7jgHmnpI7ZyA/1Qclo5dV7aa72bLFaJa35e7QxgfQOFBA5dY
 UZl6QPJRlnB80uGRzD5jCh2O2sQ3XZqYnpaKsUAka1GgbceCp9IC4A5mfZvpACsh
 Xk8VXjlhvY/iPJsKLqrh4Oedg4Dj5M3PLL9C3MDfYeIP2qgXpbnk87UV1TPNSpY0
 QcTxsXXXIw==
 =H+/Z
 -----END PGP SIGNATURE-----

Merge tag 'for-5.8/drivers-2020-06-01' of git://git.kernel.dk/linux-block

Pull block driver updates from Jens Axboe:
 "On top of the core changes, here are the block driver changes for this
  merge window:

   - NVMe changes:
        - NVMe over Fibre Channel protocol updates, which also reach
          over to drivers/scsi/lpfc (James Smart)
        - namespace revalidation support on the target (Anthony
          Iliopoulos)
        - gcc zero length array fix (Arnd Bergmann)
        - nvmet cleanups (Chaitanya Kulkarni)
        - misc cleanups and fixes (me, Keith Busch, Sagi Grimberg)
        - use a SRQ per completion vector (Max Gurtovoy)
        - fix handling of runtime changes to the queue count (Weiping
          Zhang)
        - t10 protection information support for nvme-rdma and
          nvmet-rdma (Israel Rukshin and Max Gurtovoy)
        - target side AEN improvements (Chaitanya Kulkarni)
        - various fixes and minor improvements all over, icluding the
          nvme part of the lpfc driver"

   - Floppy code cleanup series (Willy, Denis)

   - Floppy contention fix (Jiri)

   - Loop CONFIGURE support (Martijn)

   - bcache fixes/improvements (Coly, Joe, Colin)

   - q->queuedata cleanups (Christoph)

   - Get rid of ioctl_by_bdev (Christoph, Stefan)

   - md/raid5 allocation fixes (Coly)

   - zero length array fixes (Gustavo)

   - swim3 task state fix (Xu)"

* tag 'for-5.8/drivers-2020-06-01' of git://git.kernel.dk/linux-block: (166 commits)
  bcache: configure the asynchronous registertion to be experimental
  bcache: asynchronous devices registration
  bcache: fix refcount underflow in bcache_device_free()
  bcache: Convert pr_<level> uses to a more typical style
  bcache: remove redundant variables i and n
  lpfc: Fix return value in __lpfc_nvme_ls_abort
  lpfc: fix axchg pointer reference after free and double frees
  lpfc: Fix pointer checks and comments in LS receive refactoring
  nvme: set dma alignment to qword
  nvmet: cleanups the loop in nvmet_async_events_process
  nvmet: fix memory leak when removing namespaces and controllers concurrently
  nvmet-rdma: add metadata/T10-PI support
  nvmet: add metadata support for block devices
  nvmet: add metadata/T10-PI support
  nvme: add Metadata Capabilities enumerations
  nvmet: rename nvmet_check_data_len to nvmet_check_transfer_len
  nvmet: rename nvmet_rw_len to nvmet_rw_data_len
  nvmet: add metadata characteristics for a namespace
  nvme-rdma: add metadata/T10-PI support
  nvme-rdma: introduce nvme_rdma_sgl structure
  ...
2020-06-02 15:37:03 -07:00
Linus Torvalds
faa392181a drm pull for 5.8-rc1
core:
 - uapi: error out EBUSY when existing master
 - uapi: rework SET/DROP MASTER permission handling
 - remove drm_pci.h
 - drm_pci* are now legacy
 - introduced managed DRM resources
 - subclassing support for drm_framebuffer
 - simple encoder helper
 - edid improvements
 - vblank + writeback documentation improved
 - drm/mm - optimise tree searches
 - port drivers to use devm_drm_dev_alloc
 
 dma-buf:
 - add flag for p2p buffer support
 
 mst:
 - ACT timeout improvements
 - remove drm_dp_mst_has_audio
 - don't use 2nd TX slot - spec recommends against it
 
 bridge:
 - dw-hdmi various improvements
 - chrontel ch7033 support
 - fix stack issues with old gcc
 
 hdmi:
 - add unpack function for drm infoframe
 
 fbdev:
 - misc fbdev driver fixes
 
 i915:
 - uapi: global sseu pinning
 - uapi: OA buffer polling
 - uapi: remove generated perf code
 - uapi: per-engine default property values in sysfs
 - Tigerlake GEN12 enabled.
 - Lots of gem refactoring
 - Tigerlake enablement patches
 - move to drm_device logging
 - Icelake gamma HW readout
 - push MST link retrain to hotplug work
 - bandwidth atomic helpers
 - ICL fixes
 - RPS/GT refactoring
 - Cherryview full-ppgtt support
 - i915 locking guidelines documented
 - require linear fb stride to be 512 multiple on gen9
 - Tigerlake SAGV support
 
 amdgpu:
 - uapi: encrypted GPU memory handling
 - uapi: add MEM_SYNC IB flag
 - p2p dma-buf support
 - export VRAM dma-bufs
 - FRU chip access support
 - RAS/SR-IOV updates
 - Powerplay locking fixes
 - VCN DPG (powergating) enablement
 - GFX10 clockgating fixes
 - DC fixes
 - GPU reset fixes
 - navi SDMA fix
 - expose FP16 for modesetting
 - DP 1.4 compliance fixes
 - gfx10 soft recovery
 - Improved Critical Thermal Faults handling
 - resizable BAR on gmc10
 
 amdkfd:
 - uapi: GWS resource management
 - track GPU memory per process
 - report PCI domain in topology
 
 radeon:
 - safe reg list generator fixes
 
 nouveau:
 - HD audio fixes on recent systems
 - vGPU detection (fail probe if we're on one, for now)
 - Interlaced mode fixes (mostly avoidance on Turing, which doesn't support it)
 - SVM improvements/fixes
 - NVIDIA format modifier support
 - Misc other fixes.
 
 adv7511:
 - HDMI SPDIF support
 
 ast:
 - allocate crtc state size
 - fix double assignment
 - fix suspend
 
 bochs:
 - drop connector register
 
 cirrus:
 - move to tiny drivers.
 
 exynos:
 - fix imported dma-buf mapping
 - enable runtime PM
 - fixes and cleanups
 
 mediatek:
 - DPI pin mode swap
 - config mipi_tx current/impedance
 
 lima:
 - devfreq + cooling device support
 - task handling improvements
 - runtime PM support
 
 pl111:
 - vexpress init improvements
 - fix module auto-load
 
 rcar-du:
 - DT bindings conversion to YAML
 - Planes zpos sanity check and fix
 - MAINTAINERS entry for LVDS panel driver
 
 mcde:
 - fix return value
 
 mgag200:
 - use managed config init
 
 stm:
 - read endpoints from DT
 
 vboxvideo:
 - use PCI managed functions
 - drop WC mtrr
 
 vkms:
 - enable cursor by default
 
 rockchip:
 - afbc support
 
 virtio:
 - various cleanups
 
 qxl:
 - fix cursor notify port
 
 hisilicon:
 - 128-byte stride alignment fix
 
 sun4i:
 - improved format handling
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJe1edsAAoJEAx081l5xIa+bKEQAJAZv/8OMM2rx+p+GyKgrNpl
 ihTX/oyToy8dw97s1kWF7V5kKU+qjF8aWlKoPS0xovzaMAzYSFz9FRNEUgqtTXMI
 zIAzSXioqP21oL9/ZTHcXDULtz8Gk3uiPomgXMWLlNBdt3X5qvCwsmPRIYSwG0GJ
 00VCvxDbVxGSM3wzcvbfyRwHCq3SrFvIusXv5jHnnxEFGH0C7Mj2/FLYMKLNjvli
 Q8VEI2wQPZj1QdA8fLFVneIQsR6YUSko9OfFMANP8VJGpPMnUkvVxTJ5ACGJspvn
 U/h6NYqJeUU2Y3BSKqtjIC3a1LY51tp5tL9q4H9TD1hqMckt6F2V7T2IeFU8i6+V
 YzUsSiT4q1xB+uiFVcgopx2hyIp8INOEyWrVdYgw2JviROeRD+pDHvJd13ZNMnTe
 GvLWQ/PfBFrcz8eligjiYjOf66ZTU+j/rivaOBFyrs9gdlsaEW2QRurFrcNX+0lZ
 kDbLsIFjhYnPXsvHP87x4BuQCKQIEh8wWuxXuJjunBPdqVrJyltZWbBiKO571b5/
 BtX6xj6ztUOffR2RdiVanzY546I2hEi7SHMUuWnMqXsOV46GBN0QvlpZad/47n9x
 ZUy8HDDD0/qWuGwvPOJGIeAnUteWge9AhWXTeN5+1h5m+QEOzYkPKqC3Hp8TW1pM
 gToTWgAhnu731fhzLWyt
 =H7IS
 -----END PGP SIGNATURE-----

Merge tag 'drm-next-2020-06-02' of git://anongit.freedesktop.org/drm/drm

Pull drm updates from Dave Airlie:
 "Highlights:

   - Core DRM had a lot of refactoring around managed drm resources to
     make drivers simpler.

   - Intel Tigerlake support is on by default

   - amdgpu now support p2p PCI buffer sharing and encrypted GPU memory

  Details:

  core:
   - uapi: error out EBUSY when existing master
   - uapi: rework SET/DROP MASTER permission handling
   - remove drm_pci.h
   - drm_pci* are now legacy
   - introduced managed DRM resources
   - subclassing support for drm_framebuffer
   - simple encoder helper
   - edid improvements
   - vblank + writeback documentation improved
   - drm/mm - optimise tree searches
   - port drivers to use devm_drm_dev_alloc

  dma-buf:
   - add flag for p2p buffer support

  mst:
   - ACT timeout improvements
   - remove drm_dp_mst_has_audio
   - don't use 2nd TX slot - spec recommends against it

  bridge:
   - dw-hdmi various improvements
   - chrontel ch7033 support
   - fix stack issues with old gcc

  hdmi:
   - add unpack function for drm infoframe

  fbdev:
   - misc fbdev driver fixes

  i915:
   - uapi: global sseu pinning
   - uapi: OA buffer polling
   - uapi: remove generated perf code
   - uapi: per-engine default property values in sysfs
   - Tigerlake GEN12 enabled.
   - Lots of gem refactoring
   - Tigerlake enablement patches
   - move to drm_device logging
   - Icelake gamma HW readout
   - push MST link retrain to hotplug work
   - bandwidth atomic helpers
   - ICL fixes
   - RPS/GT refactoring
   - Cherryview full-ppgtt support
   - i915 locking guidelines documented
   - require linear fb stride to be 512 multiple on gen9
   - Tigerlake SAGV support

  amdgpu:
   - uapi: encrypted GPU memory handling
   - uapi: add MEM_SYNC IB flag
   - p2p dma-buf support
   - export VRAM dma-bufs
   - FRU chip access support
   - RAS/SR-IOV updates
   - Powerplay locking fixes
   - VCN DPG (powergating) enablement
   - GFX10 clockgating fixes
   - DC fixes
   - GPU reset fixes
   - navi SDMA fix
   - expose FP16 for modesetting
   - DP 1.4 compliance fixes
   - gfx10 soft recovery
   - Improved Critical Thermal Faults handling
   - resizable BAR on gmc10

  amdkfd:
   - uapi: GWS resource management
   - track GPU memory per process
   - report PCI domain in topology

  radeon:
   - safe reg list generator fixes

  nouveau:
   - HD audio fixes on recent systems
   - vGPU detection (fail probe if we're on one, for now)
   - Interlaced mode fixes (mostly avoidance on Turing, which doesn't support it)
   - SVM improvements/fixes
   - NVIDIA format modifier support
   - Misc other fixes.

  adv7511:
   - HDMI SPDIF support

  ast:
   - allocate crtc state size
   - fix double assignment
   - fix suspend

  bochs:
   - drop connector register

  cirrus:
   - move to tiny drivers.

  exynos:
   - fix imported dma-buf mapping
   - enable runtime PM
   - fixes and cleanups

  mediatek:
   - DPI pin mode swap
   - config mipi_tx current/impedance

  lima:
   - devfreq + cooling device support
   - task handling improvements
   - runtime PM support

  pl111:
   - vexpress init improvements
   - fix module auto-load

  rcar-du:
   - DT bindings conversion to YAML
   - Planes zpos sanity check and fix
   - MAINTAINERS entry for LVDS panel driver

  mcde:
   - fix return value

  mgag200:
   - use managed config init

  stm:
   - read endpoints from DT

  vboxvideo:
   - use PCI managed functions
   - drop WC mtrr

  vkms:
   - enable cursor by default

  rockchip:
   - afbc support

  virtio:
   - various cleanups

  qxl:
   - fix cursor notify port

  hisilicon:
   - 128-byte stride alignment fix

  sun4i:
   - improved format handling"

* tag 'drm-next-2020-06-02' of git://anongit.freedesktop.org/drm/drm: (1401 commits)
  drm/amd/display: Fix potential integer wraparound resulting in a hang
  drm/amd/display: drop cursor position check in atomic test
  drm/amdgpu: fix device attribute node create failed with multi gpu
  drm/nouveau: use correct conflicting framebuffer API
  drm/vblank: Fix -Wformat compile warnings on some arches
  drm/amdgpu: Sync with VM root BO when switching VM to CPU update mode
  drm/amd/display: Handle GPU reset for DC block
  drm/amdgpu: add apu flags (v2)
  drm/amd/powerpay: Disable gfxoff when setting manual mode on picasso and raven
  drm/amdgpu: fix pm sysfs node handling (v2)
  drm/amdgpu: move gpu_info parsing after common early init
  drm/amdgpu: move discovery gfx config fetching
  drm/nouveau/dispnv50: fix runtime pm imbalance on error
  drm/nouveau: fix runtime pm imbalance on error
  drm/nouveau: fix runtime pm imbalance on error
  drm/nouveau/debugfs: fix runtime pm imbalance on error
  drm/nouveau/nouveau/hmm: fix migrate zero page to GPU
  drm/nouveau/nouveau/hmm: fix nouveau_dmem_chunk allocations
  drm/nouveau/kms/nv50-: Share DP SST mode_valid() handling with MST
  drm/nouveau/kms/nv50-: Move 8BPC limit for MST into nv50_mstc_get_modes()
  ...
2020-06-02 15:04:15 -07:00
Linus Torvalds
c5d6c13843 MMC core:
- Enable erase/discard/trim support for all (e)MMC/SD hosts
  - Export information through sysfs about enhanced RPMB support (eMMC v5.1+)
  - Align the initialization commands for SDIO cards
  - Fix SDIO initialization to prevent memory leaks and NULL pointer errors
  - Do not export undefined MMC_NAME/MODALIAS for SDIO cards
  - Export device/vendor field from common CIS for SDIO cards
  - Move SDIO IDs from functional drivers to the common SDIO header
  - Introduce the ->request_atomic() host ops
 
 MMC host:
  - Improve support for HW busy signaling for several hosts
  - Converting some DT bindings to the json-schema
  - meson-mx-sdhc: Add driver and DT doc for the Amlogic Meson SDHC controller
  - meson-mx-sdio: Run a soft reset to recover from timeout/CRC error
  - mmci: Convert to use mmc_regulator_set_vqmmc()
  - mmci_stm32_sdmmc: Fix a couple of DMA bugs
  - mmci_stm32_sdmmc: Fix power on issue
  - renesas,mmcif,sdhci: Document r8a7742 DT bindings
  - renesas_sdhi: Add support for M3-W ES1.2 and 1.3 revisions
  - renesas_sdhi: Improvements to the TAP selection
  - renesas_sdhi/tmio: Further fixup runtime PM management at ->remove()
  - sdhci: Introduce ops to dump vendor specific registers
  - sdhci-cadence: Fix PHY write sequence
  - sdhci-esdhc-imx: Improve tunings
  - sdhci-esdhc-imx: Enable GPIO card detect as system wakeup
  - sdhci-esdhc-imx: Add HS400 support for i.MX6SLL
  - sdhci-esdhc-mcf: Add driver for the Coldfire/M5441X esdhc controller
    - m68k: mcf5441x: Add platform data to enable esdhc mmc controller
  - sdhci-msm: Improve HS400 tuning
  - sdhci-msm: Dump vendor specific registers at error
  - sdhci-msm: Add support for DLL/DDR properties provided from DT
  - sdhci-msm: Add support for the sm8250 variant
  - sdhci-msm: Add support for DVFS by converting to dev_pm_opp_set_rate()
  - sdhci-of-arasan: Add support for Intel Keem Bay variant
  - sdhci-of-arasan: Add support for Xilinx Versal SD variant
  - sdhci-of-dwcmshc: Add support for system suspend/resume
  - sdhci-of-dwcmshc: Fix UHS signaling support
  - sdhci-of-esdhc: Fix tuning for eMMC HS400 mode
  - sdhci-pci-gli: Add Genesys Logic GL9763E support
  - sdhci-sprd: Add support for the ->request_atomic() ops
  - sdhci-tegra: Avoid reading autocal timeout values when not applicable
 
 MEMSTICK:
  - Minor trivial update.
 -----BEGIN PGP SIGNATURE-----
 
 iQJLBAABCgA1FiEEugLDXPmKSktSkQsV/iaEJXNYjCkFAl7UuIIXHHVsZi5oYW5z
 c29uQGxpbmFyby5vcmcACgkQ/iaEJXNYjCnzog/9GLeCHnaXan3uKU0NwWBpbiP0
 RIqty/wYJpnWwL1J+otWLjVamuy/DSF4Bv4Fz+L+lXTzXWK+htXpWUragsvOeCj5
 KHxSl/rASVo/zg5FOCCcq1+rYfRitAFqkWTYv0uFX+G/jcAcrspVJET3SfVYpVI4
 fT2VP3zWXEh4yxJUCwnzqVPR3rsTdRob9csvpz2tuBRBUt0Vg+Kfc+NLi2EZzdsJ
 GDmiYOlch4Lx+8tUtRQQUX4MTVYMWCCkfL0RhYH8+kgcD8xK/LorN0Xb6FPTln95
 hVPFTP3siojnj41UWQETqVnps7nsD+1ekLqehvNq6oLy7X4JDhWxu6bk3w7Gb2ox
 fk/L5x1ZFP+NFN8bvb3KPESfCKf4miuQ3fgRNad+FES7oqjN8ec55gB3oC2q5K8/
 RrBFrtrcFTWBqxeYPxBMdBdj1tS7yfNPOavQg9iVGjzqJfnoMXfsKHrrTeZmGdZZ
 4HudtV4BzSNUCmkvqho6TvFbLslfJ2a1RkV3tXijgbFknDoqD5rclm7KPp/ZQhH6
 5ExnQpqd2EsHykJE2Zu4UxWW5f8TUBT2iBhVmTyQzrzijCtdmUkpbViaJXU0/yLD
 /k+MbiiJQ4ZTpcOFMeZ73J2WFU4xLyjY4magtWtBlPUZiONQy68zQ2GVdQ0AlXEt
 kbWvC7mMAYXiMJyAF9c=
 =1bkc
 -----END PGP SIGNATURE-----

Merge tag 'mmc-v5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc

Pull MMC updates from Ulf Hansson:
 "MMC core:
   - Enable erase/discard/trim support for all (e)MMC/SD hosts
   - Export information through sysfs about enhanced RPMB support (eMMC v5.1+)
   - Align the initialization commands for SDIO cards
   - Fix SDIO initialization to prevent memory leaks and NULL pointer errors
   - Do not export undefined MMC_NAME/MODALIAS for SDIO cards
   - Export device/vendor field from common CIS for SDIO cards
   - Move SDIO IDs from functional drivers to the common SDIO header
   - Introduce the ->request_atomic() host ops

  MMC host:
   - Improve support for HW busy signaling for several hosts
   - Converting some DT bindings to the json-schema
   - meson-mx-sdhc: Add driver and DT doc for the Amlogic Meson SDHC controller
   - meson-mx-sdio: Run a soft reset to recover from timeout/CRC error
   - mmci: Convert to use mmc_regulator_set_vqmmc()
   - mmci_stm32_sdmmc: Fix a couple of DMA bugs
   - mmci_stm32_sdmmc: Fix power on issue
   - renesas,mmcif,sdhci: Document r8a7742 DT bindings
   - renesas_sdhi: Add support for M3-W ES1.2 and 1.3 revisions
   - renesas_sdhi: Improvements to the TAP selection
   - renesas_sdhi/tmio: Further fixup runtime PM management at ->remove()
   - sdhci: Introduce ops to dump vendor specific registers
   - sdhci-cadence: Fix PHY write sequence
   - sdhci-esdhc-imx: Improve tunings
   - sdhci-esdhc-imx: Enable GPIO card detect as system wakeup
   - sdhci-esdhc-imx: Add HS400 support for i.MX6SLL
   - sdhci-esdhc-mcf: Add driver for the Coldfire/M5441X esdhc controller
   - m68k: mcf5441x: Add platform data to enable esdhc mmc controller
   - sdhci-msm: Improve HS400 tuning
   - sdhci-msm: Dump vendor specific registers at error
   - sdhci-msm: Add support for DLL/DDR properties provided from DT
   - sdhci-msm: Add support for the sm8250 variant
   - sdhci-msm: Add support for DVFS by converting to dev_pm_opp_set_rate()
   - sdhci-of-arasan: Add support for Intel Keem Bay variant
   - sdhci-of-arasan: Add support for Xilinx Versal SD variant
   - sdhci-of-dwcmshc: Add support for system suspend/resume
   - sdhci-of-dwcmshc: Fix UHS signaling support
   - sdhci-of-esdhc: Fix tuning for eMMC HS400 mode
   - sdhci-pci-gli: Add Genesys Logic GL9763E support
   - sdhci-sprd: Add support for the ->request_atomic() ops
   - sdhci-tegra: Avoid reading autocal timeout values when not applicable

  MEMSTICK:
   - Minor trivial update"

* tag 'mmc-v5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: (127 commits)
  dt-bindings: mmc: Convert sdhci-pxa to json-schema
  mmc: sdhci-msm: Clear tuning done flag while hs400 tuning
  mmc: core: Export device/vendor ids from Common CIS for SDIO cards
  mmc: core: Do not export MMC_NAME= and MODALIAS=mmc:block for SDIO cards
  mmc: sdhci-of-at91: fix CALCR register being rewritten
  mmc: sdhci-esdhc-imx: disable the CMD CRC check for standard tuning
  mmc: sdhci-esdhc-imx: fix the mask for tuning start point
  mmc: host: sdhci-esdhc-imx: add wakeup feature for GPIO CD pin
  mmc: mmci_sdmmc: fix DMA API warning max segment size
  mmc: mmci_sdmmc: fix DMA API warning overlapping mappings
  mmc: sdhci-of-arasan: Add support for Intel Keem Bay
  dt-bindings: mmc: arasan: Add compatible strings for Intel Keem Bay
  mmc: sdhci-cadence: fix PHY write
  mmc: sdio: Sort all SDIO IDs in common include file
  mmc: sdio: Fix Cypress SDIO IDs macros in common include file
  mmc: sdio: Move SDIO IDs from b43-sdio driver to common include file
  mmc: sdio: Move SDIO IDs from ath10k driver to common include file
  mmc: sdio: Move SDIO IDs from ath6kl driver to common include file
  mmc: sdio: Move SDIO IDs from smssdio driver to common include file
  mmc: sdio: Move SDIO IDs from btmtksdio driver to common include file
  ...
2020-06-02 12:48:58 -07:00
Daniel Borkmann
7cdec54f97 bpf: Add csum_level helper for fixing up csum levels
Add a bpf_csum_level() helper which BPF programs can use in combination
with bpf_skb_adjust_room() when they pass in BPF_F_ADJ_ROOM_NO_CSUM_RESET
flag to the latter to avoid falling back to CHECKSUM_NONE.

The bpf_csum_level() allows to adjust CHECKSUM_UNNECESSARY skb->csum_levels
via BPF_CSUM_LEVEL_{INC,DEC} which calls __skb_{incr,decr}_checksum_unnecessary()
on the skb. The helper also allows a BPF_CSUM_LEVEL_RESET which sets the skb's
csum to CHECKSUM_NONE as well as a BPF_CSUM_LEVEL_QUERY to just return the
current level. Without this helper, there is no way to otherwise adjust the
skb->csum_level. I did not add an extra dummy flags as there is plenty of free
bitspace in level argument itself iff ever needed in future.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Acked-by: Lorenz Bauer <lmb@cloudflare.com>
Link: https://lore.kernel.org/bpf/279ae3717cb3d03c0ffeb511493c93c450a01e1a.1591108731.git.daniel@iogearbox.net
2020-06-02 11:50:23 -07:00
Daniel Borkmann
836e66c218 bpf: Fix up bpf_skb_adjust_room helper's skb csum setting
Lorenz recently reported:

  In our TC classifier cls_redirect [0], we use the following sequence of
  helper calls to decapsulate a GUE (basically IP + UDP + custom header)
  encapsulated packet:

    bpf_skb_adjust_room(skb, -encap_len, BPF_ADJ_ROOM_MAC, BPF_F_ADJ_ROOM_FIXED_GSO)
    bpf_redirect(skb->ifindex, BPF_F_INGRESS)

  It seems like some checksums of the inner headers are not validated in
  this case. For example, a TCP SYN packet with invalid TCP checksum is
  still accepted by the network stack and elicits a SYN ACK. [...]

  That is, we receive the following packet from the driver:

    | ETH | IP | UDP | GUE | IP | TCP |
    skb->ip_summed == CHECKSUM_UNNECESSARY

  ip_summed is CHECKSUM_UNNECESSARY because our NICs do rx checksum offloading.
  On this packet we run skb_adjust_room_mac(-encap_len), and get the following:

    | ETH | IP | TCP |
    skb->ip_summed == CHECKSUM_UNNECESSARY

  Note that ip_summed is still CHECKSUM_UNNECESSARY. After bpf_redirect()'ing
  into the ingress, we end up in tcp_v4_rcv(). There, skb_checksum_init() is
  turned into a no-op due to CHECKSUM_UNNECESSARY.

The bpf_skb_adjust_room() helper is not aware of protocol specifics. Internally,
it handles the CHECKSUM_COMPLETE case via skb_postpull_rcsum(), but that does
not cover CHECKSUM_UNNECESSARY. In this case skb->csum_level of the original
skb prior to bpf_skb_adjust_room() call was 0, that is, covering UDP. Right now
there is no way to adjust the skb->csum_level. NICs that have checksum offload
disabled (CHECKSUM_NONE) or that support CHECKSUM_COMPLETE are not affected.

Use a safe default for CHECKSUM_UNNECESSARY by resetting to CHECKSUM_NONE and
add a flag to the helper called BPF_F_ADJ_ROOM_NO_CSUM_RESET that allows users
from opting out. Opting out is useful for the case where we don't remove/add
full protocol headers, or for the case where a user wants to adjust the csum
level manually e.g. through bpf_csum_level() helper that is added in subsequent
patch.

The bpf_skb_proto_{4_to_6,6_to_4}() for NAT64/46 translation from the BPF
bpf_skb_change_proto() helper uses bpf_skb_net_hdr_{push,pop}() pair internally
as well but doesn't change layers, only transitions between v4 to v6 and vice
versa, therefore no adoption is required there.

  [0] https://lore.kernel.org/bpf/20200424185556.7358-1-lmb@cloudflare.com/

Fixes: 2be7e212d5 ("bpf: add bpf_skb_adjust_room helper")
Reported-by: Lorenz Bauer <lmb@cloudflare.com>
Reported-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Link: https://lore.kernel.org/bpf/CACAyw9-uU_52esMd1JjuA80fRPHJv5vsSg8GnfW3t_qDU4aVKQ@mail.gmail.com/
Link: https://lore.kernel.org/bpf/11a90472e7cce83e76ddbfce81fdfce7bfc68808.1591108731.git.daniel@iogearbox.net
2020-06-02 11:50:23 -07:00
Farhan Ali
24c986748b vfio-ccw: Introduce a new schib region
The schib region can be used by userspace to get the subchannel-
information block (SCHIB) for the passthrough subchannel.
This can be useful to get information such as channel path
information via the SCHIB.PMCW fields.

Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
Signed-off-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Message-Id: <20200505122745.53208-5-farman@linux.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2020-06-02 13:14:08 +02:00
Joerg Roedel
cc69fc4861 Merge branches 'arm/msm', 'arm/allwinner', 'arm/smmu', 'x86/vt-d', 'hyper-v', 'core' and 'x86/amd' into next 2020-06-02 10:32:04 +02:00
Michael S. Tsirkin
a865e420b9 virtio: force spec specified alignment on types
The ring element addresses are passed between components with different
alignments assumptions. Thus, if guest/userspace selects a pointer and
host then gets and dereferences it, we might need to decrease the
compiler-selected alignment to prevent compiler on the host from
assuming pointer is aligned.

This actually triggers on ARM with -mabi=apcs-gnu - which is a
deprecated configuration, but it seems safer to handle this
generally.

Note that userspace that allocates the memory is actually OK and does
not need to be fixed, but userspace that gets it from guest or another
process does need to be fixed. The later doesn't generally talk to the
kernel so while it might be buggy it's not talking to the kernel in the
buggy way - it's just using the header in the buggy way - so fixing
header and asking userspace to recompile is the best we can do.

I verified that the produced kernel binary on x86 is exactly identical
before and after the change.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
2020-06-02 02:45:13 -04:00
Matej Genci
e7c8cc35a6 virtio: add VIRTIO_RING_NO_LEGACY
Add macro to disable legacy vring functions.

Signed-off-by: Matej Genci <matej.genci@nutanix.com>
Link: https://lore.kernel.org/r/20190911124942.243713-1-matej.genci@nutanix.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-06-02 02:45:13 -04:00
Linus Torvalds
f359287765 Merge branch 'from-miklos' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs updates from Al Viro:
 "Assorted patches from Miklos.

  An interesting part here is /proc/mounts stuff..."

The "/proc/mounts stuff" is using a cursor for keeeping the location
data while traversing the mount listing.

Also probably worth noting is the addition of faccessat2(), which takes
an additional set of flags to specify how the lookup is done
(AT_EACCESS, AT_SYMLINK_NOFOLLOW, AT_EMPTY_PATH).

* 'from-miklos' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  vfs: add faccessat2 syscall
  vfs: don't parse "silent" option
  vfs: don't parse "posixacl" option
  vfs: don't parse forbidden flags
  statx: add mount_root
  statx: add mount ID
  statx: don't clear STATX_ATIME on SB_RDONLY
  uapi: deprecate STATX_ALL
  utimensat: AT_EMPTY_PATH support
  vfs: split out access_override_creds()
  proc/mounts: add cursor
  aio: fix async fsync creds
  vfs: allow unprivileged whiteout creation
2020-06-01 16:44:06 -07:00
Linus Torvalds
b23c4771ff A fair amount of stuff this time around, dominated by yet another massive
set from Mauro toward the completion of the RST conversion.  I *really*
 hope we are getting close to the end of this.  Meanwhile, those patches
 reach pretty far afield to update document references around the tree;
 there should be no actual code changes there.  There will be, alas, more of
 the usual trivial merge conflicts.
 
 Beyond that we have more translations, improvements to the sphinx
 scripting, a number of additions to the sysctl documentation, and lots of
 fixes.
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAl7VId8PHGNvcmJldEBs
 d24ubmV0AAoJEBdDWhNsDH5Yq/gH/iaDgirQZV6UZ2v9sfwQNYolNpf2sKAuOZjd
 bPFB7WJoMQbKwQEvYrAUL2+5zPOcLYuIfzyOfo1BV1py+EyKbACcKjI4AedxfJF7
 +NchmOBhlEqmEhzx2U08HRc4/8J223WG17fJRVsV3p+opJySexSFeQucfOciX5NR
 RUCxweWWyg/FgyqjkyMMTtsePqZPmcT5dWTlVXISlbWzcv5NFhuJXnSrw8Sfzcmm
 SJMzqItv3O+CabnKQ8kMLV2PozXTMfjeWH47ZUK0Y8/8PP9+cvqwFzZ0UDQJ1Xaz
 oyW/TqmunaXhfMsMFeFGSwtfgwRHvXdxkQdtwNHvo1dV4dzTvDw=
 =fDC/
 -----END PGP SIGNATURE-----

Merge tag 'docs-5.8' of git://git.lwn.net/linux

Pull documentation updates from Jonathan Corbet:
 "A fair amount of stuff this time around, dominated by yet another
  massive set from Mauro toward the completion of the RST conversion. I
  *really* hope we are getting close to the end of this. Meanwhile,
  those patches reach pretty far afield to update document references
  around the tree; there should be no actual code changes there. There
  will be, alas, more of the usual trivial merge conflicts.

  Beyond that we have more translations, improvements to the sphinx
  scripting, a number of additions to the sysctl documentation, and lots
  of fixes"

* tag 'docs-5.8' of git://git.lwn.net/linux: (130 commits)
  Documentation: fixes to the maintainer-entry-profile template
  zswap: docs/vm: Fix typo accept_threshold_percent in zswap.rst
  tracing: Fix events.rst section numbering
  docs: acpi: fix old http link and improve document format
  docs: filesystems: add info about efivars content
  Documentation: LSM: Correct the basic LSM description
  mailmap: change email for Ricardo Ribalda
  docs: sysctl/kernel: document unaligned controls
  Documentation: admin-guide: update bug-hunting.rst
  docs: sysctl/kernel: document ngroups_max
  nvdimm: fixes to maintainter-entry-profile
  Documentation/features: Correct RISC-V kprobes support entry
  Documentation/features: Refresh the arch support status files
  Revert "docs: sysctl/kernel: document ngroups_max"
  docs: move locking-specific documents to locking/
  docs: move digsig docs to the security book
  docs: move the kref doc into the core-api book
  docs: add IRQ documentation at the core-api book
  docs: debugging-via-ohci1394.txt: add it to the core-api book
  docs: fix references for ipmi.rst file
  ...
2020-06-01 15:45:27 -07:00
Jakub Sitnicki
7f045a49fe bpf: Add link-based BPF program attachment to network namespace
Extend bpf() syscall subcommands that operate on bpf_link, that is
LINK_CREATE, LINK_UPDATE, OBJ_GET_INFO, to accept attach types tied to
network namespaces (only flow dissector at the moment).

Link-based and prog-based attachment can be used interchangeably, but only
one can exist at a time. Attempts to attach a link when a prog is already
attached directly, and the other way around, will be met with -EEXIST.
Attempts to detach a program when link exists result in -EINVAL.

Attachment of multiple links of same attach type to one netns is not
supported with the intention to lift the restriction when a use-case
presents itself. Because of that link create returns -E2BIG when trying to
create another netns link, when one already exists.

Link-based attachments to netns don't keep a netns alive by holding a ref
to it. Instead links get auto-detached from netns when the latter is being
destroyed, using a pernet pre_exit callback.

When auto-detached, link lives in defunct state as long there are open FDs
for it. -ENOLINK is returned if a user tries to update a defunct link.

Because bpf_link to netns doesn't hold a ref to struct net, special care is
taken when releasing, updating, or filling link info. The netns might be
getting torn down when any of these link operations are in progress. That
is why auto-detach and update/release/fill_info are synchronized by the
same mutex. Also, link ops have to always check if auto-detach has not
happened yet and if netns is still alive (refcnt > 0).

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200531082846.2117903-5-jakub@cloudflare.com
2020-06-01 15:21:03 -07:00
Linus Torvalds
533b220f7b arm64 updates for 5.8
- Branch Target Identification (BTI)
 	* Support for ARMv8.5-BTI in both user- and kernel-space. This
 	  allows branch targets to limit the types of branch from which
 	  they can be called and additionally prevents branching to
 	  arbitrary code, although kernel support requires a very recent
 	  toolchain.
 
 	* Function annotation via SYM_FUNC_START() so that assembly
 	  functions are wrapped with the relevant "landing pad"
 	  instructions.
 
 	* BPF and vDSO updates to use the new instructions.
 
 	* Addition of a new HWCAP and exposure of BTI capability to
 	  userspace via ID register emulation, along with ELF loader
 	  support for the BTI feature in .note.gnu.property.
 
 	* Non-critical fixes to CFI unwind annotations in the sigreturn
 	  trampoline.
 
 - Shadow Call Stack (SCS)
 	* Support for Clang's Shadow Call Stack feature, which reserves
 	  platform register x18 to point at a separate stack for each
 	  task that holds only return addresses. This protects function
 	  return control flow from buffer overruns on the main stack.
 
 	* Save/restore of x18 across problematic boundaries (user-mode,
 	  hypervisor, EFI, suspend, etc).
 
 	* Core support for SCS, should other architectures want to use it
 	  too.
 
 	* SCS overflow checking on context-switch as part of the existing
 	  stack limit check if CONFIG_SCHED_STACK_END_CHECK=y.
 
 - CPU feature detection
 	* Removed numerous "SANITY CHECK" errors when running on a system
 	  with mismatched AArch32 support at EL1. This is primarily a
 	  concern for KVM, which disabled support for 32-bit guests on
 	  such a system.
 
 	* Addition of new ID registers and fields as the architecture has
 	  been extended.
 
 - Perf and PMU drivers
 	* Minor fixes and cleanups to system PMU drivers.
 
 - Hardware errata
 	* Unify KVM workarounds for VHE and nVHE configurations.
 
 	* Sort vendor errata entries in Kconfig.
 
 - Secure Monitor Call Calling Convention (SMCCC)
 	* Update to the latest specification from Arm (v1.2).
 
 	* Allow PSCI code to query the SMCCC version.
 
 - Software Delegated Exception Interface (SDEI)
 	* Unexport a bunch of unused symbols.
 
 	* Minor fixes to handling of firmware data.
 
 - Pointer authentication
 	* Add support for dumping the kernel PAC mask in vmcoreinfo so
 	  that the stack can be unwound by tools such as kdump.
 
 	* Simplification of key initialisation during CPU bringup.
 
 - BPF backend
 	* Improve immediate generation for logical and add/sub
 	  instructions.
 
 - vDSO
 	- Minor fixes to the linker flags for consistency with other
 	  architectures and support for LLVM's unwinder.
 
 	- Clean up logic to initialise and map the vDSO into userspace.
 
 - ACPI
 	- Work around for an ambiguity in the IORT specification relating
 	  to the "num_ids" field.
 
 	- Support _DMA method for all named components rather than only
 	  PCIe root complexes.
 
 	- Minor other IORT-related fixes.
 
 - Miscellaneous
 	* Initialise debug traps early for KGDB and fix KDB cacheflushing
 	  deadlock.
 
 	* Minor tweaks to early boot state (documentation update, set
 	  TEXT_OFFSET to 0x0, increase alignment of PE/COFF sections).
 
 	* Refactoring and cleanup
 -----BEGIN PGP SIGNATURE-----
 
 iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAl7U9csQHHdpbGxAa2Vy
 bmVsLm9yZwAKCRC3rHDchMFjNLBHCACs/YU4SM7Om5f+7QnxIKao5DBr2CnGGvdC
 yTfDghFDTLQVv3MufLlfno3yBe5G8sQpcZfcc+hewfcGoMzVZXu8s7LzH6VSn9T9
 jmT3KjDMrg0RjSHzyumJp2McyelTk0a4FiKArSIIKsJSXUyb1uPSgm7SvKVDwEwU
 JGDzL9IGilmq59GiXfDzGhTZgmC37QdwRoRxDuqtqWQe5CHoRXYexg87HwBKOQxx
 HgU9L7ehri4MRZfpyjaDrr6quJo3TVnAAKXNBh3mZAskVS9ZrfKpEH0kYWYuqybv
 znKyHRecl/rrGePV8RTMtrwnSdU26zMXE/omsVVauDfG9hqzqm+Q
 =w3qi
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Will Deacon:
 "A sizeable pile of arm64 updates for 5.8.

  Summary below, but the big two features are support for Branch Target
  Identification and Clang's Shadow Call stack. The latter is currently
  arm64-only, but the high-level parts are all in core code so it could
  easily be adopted by other architectures pending toolchain support

  Branch Target Identification (BTI):

   - Support for ARMv8.5-BTI in both user- and kernel-space. This allows
     branch targets to limit the types of branch from which they can be
     called and additionally prevents branching to arbitrary code,
     although kernel support requires a very recent toolchain.

   - Function annotation via SYM_FUNC_START() so that assembly functions
     are wrapped with the relevant "landing pad" instructions.

   - BPF and vDSO updates to use the new instructions.

   - Addition of a new HWCAP and exposure of BTI capability to userspace
     via ID register emulation, along with ELF loader support for the
     BTI feature in .note.gnu.property.

   - Non-critical fixes to CFI unwind annotations in the sigreturn
     trampoline.

  Shadow Call Stack (SCS):

   - Support for Clang's Shadow Call Stack feature, which reserves
     platform register x18 to point at a separate stack for each task
     that holds only return addresses. This protects function return
     control flow from buffer overruns on the main stack.

   - Save/restore of x18 across problematic boundaries (user-mode,
     hypervisor, EFI, suspend, etc).

   - Core support for SCS, should other architectures want to use it
     too.

   - SCS overflow checking on context-switch as part of the existing
     stack limit check if CONFIG_SCHED_STACK_END_CHECK=y.

  CPU feature detection:

   - Removed numerous "SANITY CHECK" errors when running on a system
     with mismatched AArch32 support at EL1. This is primarily a concern
     for KVM, which disabled support for 32-bit guests on such a system.

   - Addition of new ID registers and fields as the architecture has
     been extended.

  Perf and PMU drivers:

   - Minor fixes and cleanups to system PMU drivers.

  Hardware errata:

   - Unify KVM workarounds for VHE and nVHE configurations.

   - Sort vendor errata entries in Kconfig.

  Secure Monitor Call Calling Convention (SMCCC):

   - Update to the latest specification from Arm (v1.2).

   - Allow PSCI code to query the SMCCC version.

  Software Delegated Exception Interface (SDEI):

   - Unexport a bunch of unused symbols.

   - Minor fixes to handling of firmware data.

  Pointer authentication:

   - Add support for dumping the kernel PAC mask in vmcoreinfo so that
     the stack can be unwound by tools such as kdump.

   - Simplification of key initialisation during CPU bringup.

  BPF backend:

   - Improve immediate generation for logical and add/sub instructions.

  vDSO:

   - Minor fixes to the linker flags for consistency with other
     architectures and support for LLVM's unwinder.

   - Clean up logic to initialise and map the vDSO into userspace.

  ACPI:

   - Work around for an ambiguity in the IORT specification relating to
     the "num_ids" field.

   - Support _DMA method for all named components rather than only PCIe
     root complexes.

   - Minor other IORT-related fixes.

  Miscellaneous:

   - Initialise debug traps early for KGDB and fix KDB cacheflushing
     deadlock.

   - Minor tweaks to early boot state (documentation update, set
     TEXT_OFFSET to 0x0, increase alignment of PE/COFF sections).

   - Refactoring and cleanup"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (148 commits)
  KVM: arm64: Move __load_guest_stage2 to kvm_mmu.h
  KVM: arm64: Check advertised Stage-2 page size capability
  arm64/cpufeature: Add get_arm64_ftr_reg_nowarn()
  ACPI/IORT: Remove the unused __get_pci_rid()
  arm64/cpuinfo: Add ID_MMFR4_EL1 into the cpuinfo_arm64 context
  arm64/cpufeature: Add remaining feature bits in ID_AA64PFR1 register
  arm64/cpufeature: Add remaining feature bits in ID_AA64PFR0 register
  arm64/cpufeature: Add remaining feature bits in ID_AA64ISAR0 register
  arm64/cpufeature: Add remaining feature bits in ID_MMFR4 register
  arm64/cpufeature: Add remaining feature bits in ID_PFR0 register
  arm64/cpufeature: Introduce ID_MMFR5 CPU register
  arm64/cpufeature: Introduce ID_DFR1 CPU register
  arm64/cpufeature: Introduce ID_PFR2 CPU register
  arm64/cpufeature: Make doublelock a signed feature in ID_AA64DFR0
  arm64/cpufeature: Drop TraceFilt feature exposure from ID_DFR0 register
  arm64/cpufeature: Add explicit ftr_id_isar0[] for ID_ISAR0 register
  arm64: mm: Add asid_gen_match() helper
  firmware: smccc: Fix missing prototype warning for arm_smccc_version_init
  arm64: vdso: Fix CFI directives in sigreturn trampoline
  arm64: vdso: Don't prefix sigreturn trampoline with a BTI C instruction
  ...
2020-06-01 15:18:27 -07:00
David Ahern
64b59025c1 xdp: Add xdp_txq_info to xdp_buff
Add xdp_txq_info as the Tx counterpart to xdp_rxq_info. At the
moment only the device is added. Other fields (queue_index)
can be added as use cases arise.

>From a UAPI perspective, add egress_ifindex to xdp context for
bpf programs to see the Tx device.

Update the verifier to only allow accesses to egress_ifindex by
XDP programs with BPF_XDP_DEVMAP expected attach type.

Signed-off-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20200529220716.75383-4-dsahern@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-06-01 14:48:32 -07:00
David Ahern
fbee97feed bpf: Add support to attach bpf program to a devmap entry
Add BPF_XDP_DEVMAP attach type for use with programs associated with a
DEVMAP entry.

Allow DEVMAPs to associate a program with a device entry by adding
a bpf_prog.fd to 'struct bpf_devmap_val'. Values read show the program
id, so the fd and id are a union. bpf programs can get access to the
struct via vmlinux.h.

The program associated with the fd must have type XDP with expected
attach type BPF_XDP_DEVMAP. When a program is associated with a device
index, the program is run on an XDP_REDIRECT and before the buffer is
added to the per-cpu queue. At this point rxq data is still valid; the
next patch adds tx device information allowing the prorgam to see both
ingress and egress device indices.

XDP generic is skb based and XDP programs do not work with skb's. Block
the use case by walking maps used by a program that is to be attached
via xdpgeneric and fail if any of them are DEVMAP / DEVMAP_HASH with

Block attach of BPF_XDP_DEVMAP programs to devices.

Signed-off-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20200529220716.75383-3-dsahern@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-06-01 14:48:32 -07:00
Amritha Nambiar
c3c16f2ea6 bpf: Add rx_queue_mapping to bpf_sock
Add "rx_queue_mapping" to bpf_sock. This gives read access for the
existing field (sk_rx_queue_mapping) of struct sock from bpf_sock.
Semantics for the bpf_sock rx_queue_mapping access are similar to
sk_rx_queue_get(), i.e the value NO_QUEUE_MAPPING is not allowed
and -1 is returned in that case. This is useful for transmit queue
selection based on the received queue index which is cached in the
socket in the receive path.

v3: Addressed review comments to add usecase in patch description,
    and fixed default value for rx_queue_mapping.
v2: fixed build error for CONFIG_XPS wrapping, reported by
    kbuild test robot <lkp@intel.com>

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-06-01 14:38:23 -07:00
Andrii Nakryiko
457f44363a bpf: Implement BPF ring buffer and verifier support for it
This commit adds a new MPSC ring buffer implementation into BPF ecosystem,
which allows multiple CPUs to submit data to a single shared ring buffer. On
the consumption side, only single consumer is assumed.

Motivation
----------
There are two distinctive motivators for this work, which are not satisfied by
existing perf buffer, which prompted creation of a new ring buffer
implementation.
  - more efficient memory utilization by sharing ring buffer across CPUs;
  - preserving ordering of events that happen sequentially in time, even
  across multiple CPUs (e.g., fork/exec/exit events for a task).

These two problems are independent, but perf buffer fails to satisfy both.
Both are a result of a choice to have per-CPU perf ring buffer.  Both can be
also solved by having an MPSC implementation of ring buffer. The ordering
problem could technically be solved for perf buffer with some in-kernel
counting, but given the first one requires an MPSC buffer, the same solution
would solve the second problem automatically.

Semantics and APIs
------------------
Single ring buffer is presented to BPF programs as an instance of BPF map of
type BPF_MAP_TYPE_RINGBUF. Two other alternatives considered, but ultimately
rejected.

One way would be to, similar to BPF_MAP_TYPE_PERF_EVENT_ARRAY, make
BPF_MAP_TYPE_RINGBUF could represent an array of ring buffers, but not enforce
"same CPU only" rule. This would be more familiar interface compatible with
existing perf buffer use in BPF, but would fail if application needed more
advanced logic to lookup ring buffer by arbitrary key. HASH_OF_MAPS addresses
this with current approach. Additionally, given the performance of BPF
ringbuf, many use cases would just opt into a simple single ring buffer shared
among all CPUs, for which current approach would be an overkill.

Another approach could introduce a new concept, alongside BPF map, to
represent generic "container" object, which doesn't necessarily have key/value
interface with lookup/update/delete operations. This approach would add a lot
of extra infrastructure that has to be built for observability and verifier
support. It would also add another concept that BPF developers would have to
familiarize themselves with, new syntax in libbpf, etc. But then would really
provide no additional benefits over the approach of using a map.
BPF_MAP_TYPE_RINGBUF doesn't support lookup/update/delete operations, but so
doesn't few other map types (e.g., queue and stack; array doesn't support
delete, etc).

The approach chosen has an advantage of re-using existing BPF map
infrastructure (introspection APIs in kernel, libbpf support, etc), being
familiar concept (no need to teach users a new type of object in BPF program),
and utilizing existing tooling (bpftool). For common scenario of using
a single ring buffer for all CPUs, it's as simple and straightforward, as
would be with a dedicated "container" object. On the other hand, by being
a map, it can be combined with ARRAY_OF_MAPS and HASH_OF_MAPS map-in-maps to
implement a wide variety of topologies, from one ring buffer for each CPU
(e.g., as a replacement for perf buffer use cases), to a complicated
application hashing/sharding of ring buffers (e.g., having a small pool of
ring buffers with hashed task's tgid being a look up key to preserve order,
but reduce contention).

Key and value sizes are enforced to be zero. max_entries is used to specify
the size of ring buffer and has to be a power of 2 value.

There are a bunch of similarities between perf buffer
(BPF_MAP_TYPE_PERF_EVENT_ARRAY) and new BPF ring buffer semantics:
  - variable-length records;
  - if there is no more space left in ring buffer, reservation fails, no
    blocking;
  - memory-mappable data area for user-space applications for ease of
    consumption and high performance;
  - epoll notifications for new incoming data;
  - but still the ability to do busy polling for new data to achieve the
    lowest latency, if necessary.

BPF ringbuf provides two sets of APIs to BPF programs:
  - bpf_ringbuf_output() allows to *copy* data from one place to a ring
    buffer, similarly to bpf_perf_event_output();
  - bpf_ringbuf_reserve()/bpf_ringbuf_commit()/bpf_ringbuf_discard() APIs
    split the whole process into two steps. First, a fixed amount of space is
    reserved. If successful, a pointer to a data inside ring buffer data area
    is returned, which BPF programs can use similarly to a data inside
    array/hash maps. Once ready, this piece of memory is either committed or
    discarded. Discard is similar to commit, but makes consumer ignore the
    record.

bpf_ringbuf_output() has disadvantage of incurring extra memory copy, because
record has to be prepared in some other place first. But it allows to submit
records of the length that's not known to verifier beforehand. It also closely
matches bpf_perf_event_output(), so will simplify migration significantly.

bpf_ringbuf_reserve() avoids the extra copy of memory by providing a memory
pointer directly to ring buffer memory. In a lot of cases records are larger
than BPF stack space allows, so many programs have use extra per-CPU array as
a temporary heap for preparing sample. bpf_ringbuf_reserve() avoid this needs
completely. But in exchange, it only allows a known constant size of memory to
be reserved, such that verifier can verify that BPF program can't access
memory outside its reserved record space. bpf_ringbuf_output(), while slightly
slower due to extra memory copy, covers some use cases that are not suitable
for bpf_ringbuf_reserve().

The difference between commit and discard is very small. Discard just marks
a record as discarded, and such records are supposed to be ignored by consumer
code. Discard is useful for some advanced use-cases, such as ensuring
all-or-nothing multi-record submission, or emulating temporary malloc()/free()
within single BPF program invocation.

Each reserved record is tracked by verifier through existing
reference-tracking logic, similar to socket ref-tracking. It is thus
impossible to reserve a record, but forget to submit (or discard) it.

bpf_ringbuf_query() helper allows to query various properties of ring buffer.
Currently 4 are supported:
  - BPF_RB_AVAIL_DATA returns amount of unconsumed data in ring buffer;
  - BPF_RB_RING_SIZE returns the size of ring buffer;
  - BPF_RB_CONS_POS/BPF_RB_PROD_POS returns current logical possition of
    consumer/producer, respectively.
Returned values are momentarily snapshots of ring buffer state and could be
off by the time helper returns, so this should be used only for
debugging/reporting reasons or for implementing various heuristics, that take
into account highly-changeable nature of some of those characteristics.

One such heuristic might involve more fine-grained control over poll/epoll
notifications about new data availability in ring buffer. Together with
BPF_RB_NO_WAKEUP/BPF_RB_FORCE_WAKEUP flags for output/commit/discard helpers,
it allows BPF program a high degree of control and, e.g., more efficient
batched notifications. Default self-balancing strategy, though, should be
adequate for most applications and will work reliable and efficiently already.

Design and implementation
-------------------------
This reserve/commit schema allows a natural way for multiple producers, either
on different CPUs or even on the same CPU/in the same BPF program, to reserve
independent records and work with them without blocking other producers. This
means that if BPF program was interruped by another BPF program sharing the
same ring buffer, they will both get a record reserved (provided there is
enough space left) and can work with it and submit it independently. This
applies to NMI context as well, except that due to using a spinlock during
reservation, in NMI context, bpf_ringbuf_reserve() might fail to get a lock,
in which case reservation will fail even if ring buffer is not full.

The ring buffer itself internally is implemented as a power-of-2 sized
circular buffer, with two logical and ever-increasing counters (which might
wrap around on 32-bit architectures, that's not a problem):
  - consumer counter shows up to which logical position consumer consumed the
    data;
  - producer counter denotes amount of data reserved by all producers.

Each time a record is reserved, producer that "owns" the record will
successfully advance producer counter. At that point, data is still not yet
ready to be consumed, though. Each record has 8 byte header, which contains
the length of reserved record, as well as two extra bits: busy bit to denote
that record is still being worked on, and discard bit, which might be set at
commit time if record is discarded. In the latter case, consumer is supposed
to skip the record and move on to the next one. Record header also encodes
record's relative offset from the beginning of ring buffer data area (in
pages). This allows bpf_ringbuf_commit()/bpf_ringbuf_discard() to accept only
the pointer to the record itself, without requiring also the pointer to ring
buffer itself. Ring buffer memory location will be restored from record
metadata header. This significantly simplifies verifier, as well as improving
API usability.

Producer counter increments are serialized under spinlock, so there is
a strict ordering between reservations. Commits, on the other hand, are
completely lockless and independent. All records become available to consumer
in the order of reservations, but only after all previous records where
already committed. It is thus possible for slow producers to temporarily hold
off submitted records, that were reserved later.

Reservation/commit/consumer protocol is verified by litmus tests in
Documentation/litmus-test/bpf-rb.

One interesting implementation bit, that significantly simplifies (and thus
speeds up as well) implementation of both producers and consumers is how data
area is mapped twice contiguously back-to-back in the virtual memory. This
allows to not take any special measures for samples that have to wrap around
at the end of the circular buffer data area, because the next page after the
last data page would be first data page again, and thus the sample will still
appear completely contiguous in virtual memory. See comment and a simple ASCII
diagram showing this visually in bpf_ringbuf_area_alloc().

Another feature that distinguishes BPF ringbuf from perf ring buffer is
a self-pacing notifications of new data being availability.
bpf_ringbuf_commit() implementation will send a notification of new record
being available after commit only if consumer has already caught up right up
to the record being committed. If not, consumer still has to catch up and thus
will see new data anyways without needing an extra poll notification.
Benchmarks (see tools/testing/selftests/bpf/benchs/bench_ringbuf.c) show that
this allows to achieve a very high throughput without having to resort to
tricks like "notify only every Nth sample", which are necessary with perf
buffer. For extreme cases, when BPF program wants more manual control of
notifications, commit/discard/output helpers accept BPF_RB_NO_WAKEUP and
BPF_RB_FORCE_WAKEUP flags, which give full control over notifications of data
availability, but require extra caution and diligence in using this API.

Comparison to alternatives
--------------------------
Before considering implementing BPF ring buffer from scratch existing
alternatives in kernel were evaluated, but didn't seem to meet the needs. They
largely fell into few categores:
  - per-CPU buffers (perf, ftrace, etc), which don't satisfy two motivations
    outlined above (ordering and memory consumption);
  - linked list-based implementations; while some were multi-producer designs,
    consuming these from user-space would be very complicated and most
    probably not performant; memory-mapping contiguous piece of memory is
    simpler and more performant for user-space consumers;
  - io_uring is SPSC, but also requires fixed-sized elements. Naively turning
    SPSC queue into MPSC w/ lock would have subpar performance compared to
    locked reserve + lockless commit, as with BPF ring buffer. Fixed sized
    elements would be too limiting for BPF programs, given existing BPF
    programs heavily rely on variable-sized perf buffer already;
  - specialized implementations (like a new printk ring buffer, [0]) with lots
    of printk-specific limitations and implications, that didn't seem to fit
    well for intended use with BPF programs.

  [0] https://lwn.net/Articles/779550/

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200529075424.3139988-2-andriin@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-06-01 14:38:22 -07:00
John Fastabend
13d70f5a5e bpf, sk_msg: Add get socket storage helpers
Add helpers to use local socket storage.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/159033907577.12355.14740125020572756560.stgit@john-Precision-5820-Tower
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-06-01 14:38:20 -07:00
Linus Torvalds
a7092c8204 Kernel side changes:
- Add AMD Fam17h RAPL support
   - Introduce CAP_PERFMON to kernel and user space
   - Add Zhaoxin CPU support
   - Misc fixes and cleanups
 
 Tooling changes:
 
   perf record:
 
     - Introduce --switch-output-event to use arbitrary events to be setup
       and read from a side band thread and, when they take place a signal
       be sent to the main 'perf record' thread, reusing the --switch-output
       code to take perf.data snapshots from the --overwrite ring buffer, e.g.:
 
 	# perf record --overwrite -e sched:* \
 		      --switch-output-event syscalls:*connect* \
 		      workload
 
       will take perf.data.YYYYMMDDHHMMSS snapshots up to around the
       connect syscalls.
 
     - Add --num-synthesize-threads option to control degree of parallelism of the
       synthesize_mmap() code which is scanning /proc/PID/task/PID/maps and can be
       time consuming. This mimics pre-existing behaviour in 'perf top'.
 
   perf bench:
 
     - Add a multi-threaded synthesize benchmark.
     - Add kallsyms parsing benchmark.
 
   Intel PT support:
 
     - Stitch LBR records from multiple samples to get deeper backtraces,
       there are caveats, see the csets for details.
     - Allow using Intel PT to synthesize callchains for regular events.
     - Add support for synthesizing branch stacks for regular events (cycles,
       instructions, etc) from Intel PT data.
 
   Misc changes:
 
     - Updated perf vendor events for power9 and Coresight.
     - Add flamegraph.py script via 'perf flamegraph'
     - Misc other changes, fixes and cleanups - see the Git log for details.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAl7VJAcRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1hAYw/8DFtzGkMaaWkrDSj62LXtWQiqr1l01ZFt
 9GzV4aN4/go+K4BQtsQN8cUjOkRHFnOryLuD9LfSBfqsdjuiyTynV/cJkeUGQBck
 TT/GgWf3XKJzTUBRQRk367Gbqs9UKwBP8CdFhOXcNzGEQpjhbwwIDPmem94U4L1N
 XLsysgC45ejWL1kMTZKmk6hDIidlFeDg9j70WDPX1nNfCeisk25rxwTpdgvjsjcj
 3RzPRt2EGS+IkuF4QSCT5leYSGaCpVDHCQrVpHj57UoADfWAyC71uopTLG4OgYSx
 PVd9gvloMeeqWmroirIxM67rMd/TBTfVekNolhnQDjqp60Huxm+gGUYmhsyjNqdx
 Pb8HRZCBAudei9Ue4jNMfhCRK2Ug1oL5wNvN1xcSteAqrwMlwBMGHWns6l12x0ks
 BxYhyLvfREvnKijXc1o8D5paRgqohJgfnHlrUZeacyaw5hQCbiVRpwg0T1mWAF53
 u9hfWLY0Oy+Qs2C7EInNsWSYXRw8oPQNTFVx2I968GZqsEn4DC6Pt3ovWrDKIDnz
 ugoZJQkJ3/O8stYSMiyENehdWlo575NkapCTDwhLWnYztrw4skqqHE8ighU/e8ug
 o/Kx7ANWN9OjjjQpq2GVUeT0jCaFO+OMiGMNEkKoniYgYjogt3Gw5PeedBMtY07p
 OcWTiQZamjU=
 =i27M
 -----END PGP SIGNATURE-----

Merge tag 'perf-core-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf updates from Ingo Molnar:
 "Kernel side changes:

   - Add AMD Fam17h RAPL support

   - Introduce CAP_PERFMON to kernel and user space

   - Add Zhaoxin CPU support

   - Misc fixes and cleanups

  Tooling changes:

   - perf record:

     Introduce '--switch-output-event' to use arbitrary events to be
     setup and read from a side band thread and, when they take place a
     signal be sent to the main 'perf record' thread, reusing the core
     for '--switch-output' to take perf.data snapshots from the ring
     buffer used for '--overwrite', e.g.:

	# perf record --overwrite -e sched:* \
		      --switch-output-event syscalls:*connect* \
		      workload

     will take perf.data.YYYYMMDDHHMMSS snapshots up to around the
     connect syscalls.

     Add '--num-synthesize-threads' option to control degree of
     parallelism of the synthesize_mmap() code which is scanning
     /proc/PID/task/PID/maps and can be time consuming. This mimics
     pre-existing behaviour in 'perf top'.

   - perf bench:

     Add a multi-threaded synthesize benchmark and kallsyms parsing
     benchmark.

   - Intel PT support:

     Stitch LBR records from multiple samples to get deeper backtraces,
     there are caveats, see the csets for details.

     Allow using Intel PT to synthesize callchains for regular events.

     Add support for synthesizing branch stacks for regular events
     (cycles, instructions, etc) from Intel PT data.

  Misc changes:

   - Updated perf vendor events for power9 and Coresight.

   - Add flamegraph.py script via 'perf flamegraph'

   - Misc other changes, fixes and cleanups - see the Git log for details

  Also, since over the last couple of years perf tooling has matured and
  decoupled from the kernel perf changes to a large degree, going
  forward Arnaldo is going to send perf tooling changes via direct pull
  requests"

* tag 'perf-core-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (163 commits)
  perf/x86/rapl: Add AMD Fam17h RAPL support
  perf/x86/rapl: Make perf_probe_msr() more robust and flexible
  perf/x86/rapl: Flip logic on default events visibility
  perf/x86/rapl: Refactor to share the RAPL code between Intel and AMD CPUs
  perf/x86/rapl: Move RAPL support to common x86 code
  perf/core: Replace zero-length array with flexible-array
  perf/x86: Replace zero-length array with flexible-array
  perf/x86/intel: Add more available bits for OFFCORE_RESPONSE of Intel Tremont
  perf/x86/rapl: Add Ice Lake RAPL support
  perf flamegraph: Use /bin/bash for report and record scripts
  perf cs-etm: Move definition of 'traceid_list' global variable from header file
  libsymbols kallsyms: Move hex2u64 out of header
  libsymbols kallsyms: Parse using io api
  perf bench: Add kallsyms parsing
  perf: cs-etm: Update to build with latest opencsd version.
  perf symbol: Fix kernel symbol address display
  perf inject: Rename perf_evsel__*() operating on 'struct evsel *' to evsel__*()
  perf annotate: Rename perf_evsel__*() operating on 'struct evsel *' to evsel__*()
  perf trace: Rename perf_evsel__*() operating on 'struct evsel *' to evsel__*()
  perf script: Rename perf_evsel__*() operating on 'struct evsel *' to evsel__*()
  ...
2020-06-01 13:23:59 -07:00
Linus Torvalds
afdb0f2ec5 fscrypt updates for 5.8
- Add the IV_INO_LBLK_32 encryption policy flag which modifies the
   encryption to be optimized for eMMC inline encryption hardware.
 
 - Make the test_dummy_encryption mount option for ext4 and f2fs support
   v2 encryption policies.
 
 - Fix kerneldoc warnings and some coding style inconsistencies.
 
 There will be merge conflicts with the ext4 and f2fs trees due to the
 test_dummy_encryption change, but the resolutions are straightforward.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQSacvsUNc7UX4ntmEPzXCl4vpKOKwUCXtScMBQcZWJpZ2dlcnNA
 Z29vZ2xlLmNvbQAKCRDzXCl4vpKOKxC6AP0eOEkMrc9e10YftdN6xsyRjvqiPyFg
 oMjuU+SvQ+/sVgEAo0mBFITnl75ZGb8PyqXCNMDAy6uHaxcEjVGufx5q2QE=
 =dbxy
 -----END PGP SIGNATURE-----

Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt

Pull fscrypt updates from Eric Biggers:

 - Add the IV_INO_LBLK_32 encryption policy flag which modifies the
   encryption to be optimized for eMMC inline encryption hardware.

 - Make the test_dummy_encryption mount option for ext4 and f2fs support
   v2 encryption policies.

 - Fix kerneldoc warnings and some coding style inconsistencies.

* tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt:
  fscrypt: add support for IV_INO_LBLK_32 policies
  fscrypt: make test_dummy_encryption use v2 by default
  fscrypt: support test_dummy_encryption=v2
  fscrypt: add fscrypt_add_test_dummy_key()
  linux/parser.h: add include guards
  fscrypt: remove unnecessary extern keywords
  fscrypt: name all function parameters
  fscrypt: fix all kerneldoc warnings
2020-06-01 12:10:17 -07:00
Linus Torvalds
81e8c10dac Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto updates from Herbert Xu:
 "API:
   - Introduce crypto_shash_tfm_digest() and use it wherever possible.
   - Fix use-after-free and race in crypto_spawn_alg.
   - Add support for parallel and batch requests to crypto_engine.

  Algorithms:
   - Update jitter RNG for SP800-90B compliance.
   - Always use jitter RNG as seed in drbg.

  Drivers:
   - Add Arm CryptoCell driver cctrng.
   - Add support for SEV-ES to the PSP driver in ccp"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (114 commits)
  crypto: hisilicon - fix driver compatibility issue with different versions of devices
  crypto: engine - do not requeue in case of fatal error
  crypto: cavium/nitrox - Fix a typo in a comment
  crypto: hisilicon/qm - change debugfs file name from qm_regs to regs
  crypto: hisilicon/qm - add DebugFS for xQC and xQE dump
  crypto: hisilicon/zip - add debugfs for Hisilicon ZIP
  crypto: hisilicon/hpre - add debugfs for Hisilicon HPRE
  crypto: hisilicon/sec2 - add debugfs for Hisilicon SEC
  crypto: hisilicon/qm - add debugfs to the QM state machine
  crypto: hisilicon/qm - add debugfs for QM
  crypto: stm32/crc32 - protect from concurrent accesses
  crypto: stm32/crc32 - don't sleep in runtime pm
  crypto: stm32/crc32 - fix multi-instance
  crypto: stm32/crc32 - fix run-time self test issue.
  crypto: stm32/crc32 - fix ext4 chksum BUG_ON()
  crypto: hisilicon/zip - Use temporary sqe when doing work
  crypto: hisilicon - add device error report through abnormal irq
  crypto: hisilicon - remove codes of directly report device errors through MSI
  crypto: hisilicon - QM memory management optimization
  crypto: hisilicon - unify initial value assignment into QM
  ...
2020-06-01 12:00:10 -07:00
Horatiu Vultur
c6676e7d62 bridge: mrp: Add support for role MRA
A node that has the MRA role, it can behave as MRM or MRC.

Initially it starts as MRM and sends MRP_Test frames on both ring ports.
If it detects that there are MRP_Test send by another MRM, then it
checks if these frames have a lower priority than itself. In this case
it would send MRP_Nack frames to notify the other node that it needs to
stop sending MRP_Test frames.
If it receives a MRP_Nack frame then it stops sending MRP_Test frames
and starts to behave as a MRC but it would continue to monitor the
MRP_Test frames send by MRM. If at a point the MRM stops to send
MRP_Test frames it would get the MRM role and start to send MRP_Test
frames.

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-01 11:56:11 -07:00
Horatiu Vultur
4b3a61b030 bridge: mrp: Set the priority of MRP instance
Each MRP instance has a priority, a lower value means a higher priority.
The priority of MRP instance is stored in MRP_Test frame in this way
all the MRP nodes in the ring can see other nodes priority.

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-01 11:56:11 -07:00
Horatiu Vultur
7e89ed8ab3 bridge: mrp: Update MRP frame type
Replace u16/u32 with be16/be32 in the MRP frame types.
This fixes sparse warnings like:
warning: cast to restricted __be16

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-01 11:56:11 -07:00
Ido Schimmel
30a4e9a29a devlink: Add 'control' trap type
This type is used for traps that trap control packets such as ARP
request and IGMP query to the CPU.

Do not report such packets to the kernel's drop monitor as they were not
dropped by the device no encountered an exception during forwarding.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-01 11:49:23 -07:00
Ido Schimmel
9eefeabed6 devlink: Add 'mirror' trap action
The action is used by control traps such as IGMP query. The packet is
flooded by the device, but also trapped to the CPU in order for the
software bridge to mark the receiving port as a multicast router port.
Such packets are marked with 'skb->offload_fwd_mark = 1' in order to
prevent the software bridge from flooding them again.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-01 11:49:23 -07:00
David S. Miller
af0a2482fa Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains Netfilter updates for net-next
to extend ctnetlink and the flowtable infrastructure:

1) Extend ctnetlink kernel side netlink dump filtering capabilities,
   from Romain Bellan.

2) Generalise the flowtable hook parser to take a hook list.

3) Pass a hook list to the flowtable hook registration/unregistration.

4) Add a helper function to release the flowtable hook list.

5) Update the flowtable event notifier to pass a flowtable hook list.

6) Allow users to add new devices to an existing flowtables.

7) Allow users to remove devices to an existing flowtables.

8) Allow for registering a flowtable with no initial devices.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-01 11:46:30 -07:00
Jon Doron
f97f5a56f5 x86/kvm/hyper-v: Add support for synthetic debugger interface
Add support for Hyper-V synthetic debugger (syndbg) interface.
The syndbg interface is using MSRs to emulate a way to send/recv packets
data.

The debug transport dll (kdvm/kdnet) will identify if Hyper-V is enabled
and if it supports the synthetic debugger interface it will attempt to
use it, instead of trying to initialize a network adapter.

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Jon Doron <arilou@gmail.com>
Message-Id: <20200529134543.1127440-4-arilou@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-01 04:26:11 -04:00
Jon Doron
f7d31e6536 x86/kvm/hyper-v: Explicitly align hcall param for kvm_hyperv_exit
The problem the patch is trying to address is the fact that 'struct
kvm_hyperv_exit' has different layout on when compiling in 32 and 64 bit
modes.

In 64-bit mode the default alignment boundary is 64 bits thus
forcing extra gaps after 'type' and 'msr' but in 32-bit mode the
boundary is at 32 bits thus no extra gaps.

This is an issue as even when the kernel is 64 bit, the userspace using
the interface can be both 32 and 64 bit but the same 32 bit userspace has
to work with 32 bit kernel.

The issue is fixed by forcing the 64 bit layout, this leads to ABI
change for 32 bit builds and while we are obviously breaking '32 bit
userspace with 32 bit kernel' case, we're fixing the '32 bit userspace
with 64 bit kernel' one.

As the interface has no (known) users and 32 bit KVM is rather baroque
nowadays, this seems like a reasonable decision.

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Jon Doron <arilou@gmail.com>
Message-Id: <20200424113746.3473563-2-arilou@gmail.com>
Reviewed-by: Roman Kagan <rvkagan@yandex-team.ru>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-01 04:26:09 -04:00
Vitaly Kuznetsov
72de5fa4c1 KVM: x86: announce KVM_FEATURE_ASYNC_PF_INT
Introduce new capability to indicate that KVM supports interrupt based
delivery of 'page ready' APF events. This includes support for both
MSR_KVM_ASYNC_PF_INT and MSR_KVM_ASYNC_PF_ACK.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20200525144125.143875-8-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-01 04:26:08 -04:00
David S. Miller
1806c13dc2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
xdp_umem.c had overlapping changes between the 64-bit math fix
for the calculation of npgs and the removal of the zerocopy
memory type which got rid of the chunk_size_nohdr member.

The mlx5 Kconfig conflict is a case where we just take the
net-next copy of the Kconfig entry dependency as it takes on
the ESWITCH dependency by one level of indirection which is
what the 'net' conflicting change is trying to ensure.

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-31 17:48:46 -07:00
David S. Miller
1079a34c56 Another set of changes, including
* many 6 GHz changes, though it's not _quite_ complete
    (I left out scanning for now, we're still discussing)
  * allow userspace SA-query processing for operating channel
    validation
  * TX status for control port TX, for AP-side operation
  * more per-STA/TID control options
  * move to kHz for channels, for future S1G operation
  * various other small changes
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEH1e1rEeCd0AIMq6MB8qZga/fl8QFAl7TfecACgkQB8qZga/f
 l8Tx6hAAgRizfdHb9xxp001AAzKnsdU46srOKOhwV2d6w+S+qHbLtwa0Xz43pBvX
 LxpQs7dBQBLYh11xJhDlKY6duYV989xGcsHm7suO43jbjDo8KXfz4MaP65em6EKt
 pdD0mD1sKkfR4FhYNbUEe8Ug/185jdk+gX+aI1Nrz6XlkUoiY+czSnGFyAvpvau2
 I+NGqyKG5D6ureq7p7dQcgN+t2D4Ou9stVhpQ+jP0Ep720gvfTEzeFuMJbb3JZ1y
 KSgOOWS1HQj1FdlJDs3KAmgUXpkU/lxZhNxl06MMYo3tB7Y0vmLoy/ZNcb5eW4Sw
 a0SHgG5yhDysCyINz6q7llG3esDcppGiNuMjd/qR2qPOZPHNtlYaHtcoKBcKdS0k
 03DyURZpA0B33cr9FTV8tXaM7IMY/2qaq/DqkeNtuDzGdh4jEwkVJ4fNtUAdgcOv
 4JEz3A7fY3isy8tzi7Dom4U/2hR1di5gZloAC5PPYRvnbmY9HoIqG06k1Wtn1Yj4
 pbquqvdJ5ONcaAaXz7zVQUZm1JzrK81Pl3pdih7USasc8z2MEzWQPSR+hxtwG5TY
 KbDI1Nel8ZLbL2MWDakh3+lPoJAMuyadRlVVWEMj4l/afYHgcy5hEbaMbaZnxmAg
 G4I6R5JZTJZuVdKi/U/Q9n7jR83qfIRNbxMLY8HFZ4caJ5qhZGs=
 =wdaG
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-next-for-davem-2020-05-31' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next

Johannes Berg says:

====================
Another set of changes, including
 * many 6 GHz changes, though it's not _quite_ complete
   (I left out scanning for now, we're still discussing)
 * allow userspace SA-query processing for operating channel
   validation
 * TX status for control port TX, for AP-side operation
 * more per-STA/TID control options
 * move to kHz for channels, for future S1G operation
 * various other small changes
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-31 14:32:50 -07:00
Nathan Errera
093a48d2aa cfg80211: support bigger kek/kck key length
With some newer AKMs, the KCK and KEK are bigger, so allow that
if the driver advertises support for it. In addition, add a new
attribute for the AKM so we can use it for offloaded rekeying.

Signed-off-by: Nathan Errera <nathan.errera@intel.com>
[reword commit message]
Link: https://lore.kernel.org/r/20200528212237.5eb58b00a5d1.I61b09d77c4f382e8d58a05dcca78096e99a6bc15@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-05-31 11:27:24 +02:00
Johannes Berg
2239521772 cfg80211: add and expose HE 6 GHz band capabilities
These capabilities cover what would otherwise be transported
in HT/VHT capabilities, but only a subset thereof that is
actually needed on 6 GHz with HE already present. Expose the
capabilities to userspace, drivers are expected to set them
as using the 6 GHz band (currently) requires HE capability.

Link: https://lore.kernel.org/r/20200528213443.244cd5cb9db8.Icd8c773277a88c837e7e3af1d4d1013cc3b66543@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-05-31 11:26:36 +02:00
Rajkumar Manoharan
43e64bf301 cfg80211: handle 6 GHz capability of new station
Handle 6 GHz HE capability while adding new station. It will be used
later in mac80211 station processing.

Signed-off-by: Rajkumar Manoharan <rmanohar@codeaurora.org>
Link: https://lore.kernel.org/r/1589399105-25472-2-git-send-email-rmanohar@codeaurora.org
[handle nl80211_set_station, require WME,
 remove NL80211_HE_6GHZ_CAPABILITY_LEN]
Link: https://lore.kernel.org/r/20200528213443.b6b711fd4312.Ic9b97d57b6c4f2b28d4b2d23d2849d8bc20bd8cc@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-05-31 11:26:20 +02:00
David S. Miller
942110fdf2 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec
Steffen Klassert says:

====================
pull request (net): ipsec 2020-05-29

1) Several fixes for ESP gro/gso in transport and beet mode when
   IPv6 extension headers are present. From Xin Long.

2) Fix a wrong comment on XFRMA_OFFLOAD_DEV.
   From Antony Antony.

3) Fix sk_destruct callback handling on ESP in TCP encapsulation.
   From Sabrina Dubroca.

4) Fix a use after free in xfrm_output_gso when used with vxlan.
   From Xin Long.

5) Fix secpath handling of VTI when used wiuth IPCOMP.
   From Xin Long.

6) Fix an oops when deleting a x-netns xfrm interface.
   From Nicolas Dichtel.

7) Fix a possible warning on policy updates. We had a case where it was
   possible to add two policies with the same lookup keys.
   From Xin Long.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-29 13:05:56 -07:00
Ira Weiny
b383a73f2b fs/ext4: Introduce DAX inode flag
Add a flag ([EXT4|FS]_DAX_FL) to preserve FS_XFLAG_DAX in the ext4
inode.

Set the flag to be user visible and changeable.  Set the flag to be
inherited.  Allow applications to change the flag at any time except if
it conflicts with the set of mutually exclusive flags (Currently VERITY,
ENCRYPT, JOURNAL_DATA).

Furthermore, restrict setting any of the exclusive flags if DAX is set.

While conceptually possible, we do not allow setting EXT4_DAX_FL while
at the same time clearing exclusion flags (or vice versa) for 2 reasons:

	1) The DAX flag does not take effect immediately which
	   introduces quite a bit of complexity
	2) There is no clear use case for being this flexible

Finally, on regular files, flag the inode to not be cached to facilitate
changing S_DAX on the next creation of the inode.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>

Link: https://lore.kernel.org/r/20200528150003.828793-9-ira.weiny@intel.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-05-28 22:09:47 -04:00
Kirti Wankhede
ad721705d0 vfio iommu: Add migration capability to report supported features
Added migration capability in IOMMU info chain.
User application should check IOMMU info chain for migration capability
to use dirty page tracking feature provided by kernel module.
User application must check page sizes supported and maximum dirty
bitmap size returned by this capability structure for ioctls used to get
dirty bitmap.

Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-05-28 15:53:27 -06:00
Kirti Wankhede
331e33d296 vfio iommu: Update UNMAP_DMA ioctl to get dirty bitmap before unmap
DMA mapped pages, including those pinned by mdev vendor drivers, might
get unpinned and unmapped while migration is active and device is still
running. For example, in pre-copy phase while guest driver could access
those pages, host device or vendor driver can dirty these mapped pages.
Such pages should be marked dirty so as to maintain memory consistency
for a user making use of dirty page tracking.

To get bitmap during unmap, user should allocate memory for bitmap, set
it all zeros, set size of allocated memory, set page size to be
considered for bitmap and set flag VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP.

Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
Reviewed-by: Neo Jia <cjia@nvidia.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-05-28 15:53:26 -06:00
Kirti Wankhede
b704fd14a0 vfio iommu: Add ioctl definition for dirty pages tracking
IOMMU container maintains a list of all pages pinned by vfio_pin_pages API.
All pages pinned by vendor driver through this API should be considered as
dirty during migration. When container consists of IOMMU capable device and
all pages are pinned and mapped, then all pages are marked dirty.
Added support to start/stop dirtied pages tracking and to get bitmap of all
dirtied pages for requested IO virtual address range.

Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
Reviewed-by: Neo Jia <cjia@nvidia.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-05-28 15:53:24 -06:00
Kirti Wankhede
a8a24f3f6e vfio: UAPI for migration interface for device state
- Defined MIGRATION region type and sub-type.

- Defined vfio_device_migration_info structure which will be placed at the
  0th offset of migration region to get/set VFIO device related
  information. Defined members of structure and usage on read/write access.

- Defined device states and state transition details.

- Defined sequence to be followed while saving and resuming VFIO device.

Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
Reviewed-by: Neo Jia <cjia@nvidia.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-05-28 15:53:20 -06:00
Jérôme Pouiller
83fc5dd57f mmc: fix compilation of user API
The definitions of MMC_IOC_CMD  and of MMC_IOC_MULTI_CMD rely on
MMC_BLOCK_MAJOR:

    #define MMC_IOC_CMD       _IOWR(MMC_BLOCK_MAJOR, 0, struct mmc_ioc_cmd)
    #define MMC_IOC_MULTI_CMD _IOWR(MMC_BLOCK_MAJOR, 1, struct mmc_ioc_multi_cmd)

However, MMC_BLOCK_MAJOR is defined in linux/major.h and
linux/mmc/ioctl.h did not include it.

Signed-off-by: Jérôme Pouiller <jerome.pouiller@silabs.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20200511161902.191405-1-Jerome.Pouiller@silabs.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2020-05-28 11:22:14 +02:00
Ingo Molnar
0bffedbce9 Linux 5.7-rc7
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl7K9iEeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGzTAH/0ifZEG4BQ8x/WlB
 8YLSLE6QQTSXYi25nyExuJbFkkKY5Tik8M2HD/36xwY/HnZOlH9jH6m0ntqZxpaA
 3EU9lr1ct79nCBMYhiJssvz8d9AOZXlyogFW9y2y9pmPjlmUtseZ7yGh1xD465cj
 B5Ty2w2W34cs7zF3og2xn5agOJMtWWXLXZ5mRa9EOquKC5zeYyRicmd0T+plYQD6
 hbRYmxFfDfppVnBCBARPNN0+NU5JJD94H+8bOuf1tl48XNrLiZMOicmtohKNQ6+W
 rZNpJNEGEp7KMtqWH0Nl3hmy3yfZHMwe1DXM/AZDqR7jTHZY4mZ0GEpLyfI9AU4n
 34jVHwU=
 =SmJ9
 -----END PGP SIGNATURE-----

Merge tag 'v5.7-rc7' into perf/core, to pick up fixes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-05-28 07:58:12 +02:00
Romain Bellan
cb8aa9a3af netfilter: ctnetlink: add kernel side filtering for dump
Conntrack dump does not support kernel side filtering (only get exists,
but it returns only one entry. And user has to give a full valid tuple)

It means that userspace has to implement filtering after receiving many
irrelevant entries, consuming resources (conntrack table is sometimes
very huge, much more than a routing table for example).

This patch adds filtering in kernel side. To achieve this goal, we:

 * Add a new CTA_FILTER netlink attributes, actually a flag list to
   parametize filtering
 * Convert some *nlattr_to_tuple() functions, to allow a partial parsing
   of CTA_TUPLE_ORIG and CTA_TUPLE_REPLY (so nf_conntrack_tuple it not
   fully set)

Filtering is now possible on:
 * IP SRC/DST values
 * Ports for TCP and UDP flows
 * IMCP(v6) codes types and IDs

Filtering is done as an "AND" operator. For example, when flags
PROTO_SRC_PORT, PROTO_NUM and IP_SRC are sets, only entries matching all
values are dumped.

Changes since v1:
  Set NLM_F_DUMP_FILTERED in nlm flags if entries are filtered

Changes since v2:
  Move several constants to nf_internals.h
  Move a fix on netlink values check in a separate patch
  Add a check on not-supported flags
  Return EOPNOTSUPP if CDA_FILTER is set in ctnetlink_flush_conntrack
  (not yet implemented)
  Code style issues

Changes since v3:
  Fix compilation warning reported by kbuild test robot

Changes since v4:
  Fix a regression introduced in v3 (returned EINVAL for valid netlink
  messages without CTA_MARK)

Changes since v5:
  Change definition of CTA_FILTER_F_ALL
  Fix a regression when CTA_TUPLE_ZONE is not set

Signed-off-by: Romain Bellan <romain.bellan@wifirst.fr>
Signed-off-by: Florent Fourcot <florent.fourcot@wifirst.fr>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-05-27 22:20:34 +02:00
Horatiu Vultur
20f6a05ef6 bridge: mrp: Rework the MRP netlink interface
This patch reworks the MRP netlink interface. Before, each attribute
represented a binary structure which made it hard to be extended.
Therefore update the MRP netlink interface such that each existing
attribute to be a nested attribute which contains the fields of the
binary structures.
In this way the MRP netlink interface can be extended without breaking
the backwards compatibility. It is also using strict checking for
attributes under the MRP top attribute.

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-27 11:30:43 -07:00
Dan Williams
3234ac664a /dev/mem: Revoke mappings when a driver claims the region
Close the hole of holding a mapping over kernel driver takeover event of
a given address range.

Commit 90a545e981 ("restrict /dev/mem to idle io memory ranges")
introduced CONFIG_IO_STRICT_DEVMEM with the goal of protecting the
kernel against scenarios where a /dev/mem user tramples memory that a
kernel driver owns. However, this protection only prevents *new* read(),
write() and mmap() requests. Established mappings prior to the driver
calling request_mem_region() are left alone.

Especially with persistent memory, and the core kernel metadata that is
stored there, there are plentiful scenarios for a /dev/mem user to
violate the expectations of the driver and cause amplified damage.

Teach request_mem_region() to find and shoot down active /dev/mem
mappings that it believes it has successfully claimed for the exclusive
use of the driver. Effectively a driver call to request_mem_region()
becomes a hole-punch on the /dev/mem device.

The typical usage of unmap_mapping_range() is part of
truncate_pagecache() to punch a hole in a file, but in this case the
implementation is only doing the "first half" of a hole punch. Namely it
is just evacuating current established mappings of the "hole", and it
relies on the fact that /dev/mem establishes mappings in terms of
absolute physical address offsets. Once existing mmap users are
invalidated they can attempt to re-establish the mapping, or attempt to
continue issuing read(2) / write(2) to the invalidated extent, but they
will then be subject to the CONFIG_IO_STRICT_DEVMEM checking that can
block those subsequent accesses.

Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fixes: 90a545e981 ("restrict /dev/mem to idle io memory ranges")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/159009507306.847224.8502634072429766747.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-27 11:10:05 +02:00
Hauke Mehrtens
1b9ae0c929 wireless: Use linux/stddef.h instead of stddef.h
When compiling inside the kernel include linux/stddef.h instead of
stddef.h. When I compile this header file in backports for power PC I
run into a conflict with ptrdiff_t. I was unable to reproduce this in
mainline kernel. I still would like to fix this problem in the kernel.

Fixes: 6989310f5d ("wireless: Use offsetof instead of custom macro.")
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
Link: https://lore.kernel.org/r/20200521201422.16493-1-hauke@hauke-m.de
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-05-27 10:03:27 +02:00
Tamizh Chelvam
9a5f648862 nl80211: Add support to configure TID specific Tx rate configuration
This patch adds support to configure per TID Tx Rate configuration
through NL80211_TID_CONFIG_ATTR_TX_RATE* attributes. And it uses
nl80211_parse_tx_bitrate_mask api to validate the Tx rate mask.

Signed-off-by: Tamizh Chelvam <tamizhr@codeaurora.org>
Link: https://lore.kernel.org/r/1589357504-10175-1-git-send-email-tamizhr@codeaurora.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-05-27 10:03:25 +02:00
Markus Theil
dca9ca2d58 nl80211: add ability to report TX status for control port TX
This adds the necessary capabilities in nl80211 to allow drivers to
assign a cookie to control port TX frames (returned via extack in
the netlink ACK message of the command) and then later report the
frame's status.

Signed-off-by: Markus Theil <markus.theil@tu-ilmenau.de>
Link: https://lore.kernel.org/r/20200508144202.7678-2-markus.theil@tu-ilmenau.de
[use extack cookie instead of explicit message, recombine patches]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-05-27 10:02:04 +02:00
Thomas Pedersen
2032f3b2f9 nl80211: support scan frequencies in KHz
If the driver advertises NL80211_EXT_FEATURE_SCAN_FREQ_KHZ
userspace can omit NL80211_ATTR_SCAN_FREQUENCIES in favor
of an NL80211_ATTR_SCAN_FREQ_KHZ. To get scan results in
KHz userspace must also set the
NL80211_SCAN_FLAG_FREQ_KHZ.

This lets nl80211 remain compatible with older userspaces
while not requring and sending redundant (and potentially
incorrect) scan frequency sets.

Signed-off-by: Thomas Pedersen <thomas@adapt-ip.com>
Link: https://lore.kernel.org/r/20200430172554.18383-4-thomas@adapt-ip.com
[use just nla_nest_start() (not _noflag) for NL80211_ATTR_SCAN_FREQ_KHZ]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-05-27 10:02:03 +02:00
Thomas Pedersen
942ba88ba9 nl80211: add KHz frequency offset for most wifi commands
cfg80211 recently gained the ability to understand a
frequency offset component in KHz. Expose this in nl80211
through the new attributes NL80211_ATTR_WIPHY_FREQ_OFFSET,
NL80211_FREQUENCY_ATTR_OFFSET,
NL80211_ATTR_CENTER_FREQ1_OFFSET, and
NL80211_BSS_FREQUENCY_OFFSET.

These add support to send and receive a KHz offset
component with the following NL80211 commands:

- NL80211_CMD_FRAME
- NL80211_CMD_GET_SCAN
- NL80211_CMD_AUTHENTICATE
- NL80211_CMD_ASSOCIATE
- NL80211_CMD_CONNECT

Along with any other command which takes a chandef, ie:

- NL80211_CMD_SET_CHANNEL
- NL80211_CMD_SET_WIPHY
- NL80211_CMD_START_AP
- NL80211_CMD_RADAR_DETECT
- NL80211_CMD_NOTIFY_RADAR
- NL80211_CMD_CHANNEL_SWITCH
- NL80211_JOIN_IBSS
- NL80211_CMD_REMAIN_ON_CHANNEL
- NL80211_CMD_JOIN_OCB
- NL80211_CMD_JOIN_MESH
- NL80211_CMD_TDLS_CHANNEL_SWITCH

If the driver advertises a band containing channels with
frequency offset, it must also verify support for
frequency offset channels in its cfg80211 ops, or return
an error.

Signed-off-by: Thomas Pedersen <thomas@adapt-ip.com>
Link: https://lore.kernel.org/r/20200430172554.18383-3-thomas@adapt-ip.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-05-27 10:02:02 +02:00
Sergey Matyukevich
c03369558c nl80211: simplify peer specific TID configuration
Current rule for applying TID configuration for specific peer looks overly
complicated. No need to reject new TID configuration when override flag is
specified. Another call with the same TID configuration, but without
override flag, allows to apply new configuration anyway.

Use the same approach as for the 'all peers' case: if override flag is
specified, then reset existing TID configuration and immediately
apply a new one.

Signed-off-by: Sergey Matyukevich <sergey.matyukevich.os@quantenna.com>
Link: https://lore.kernel.org/r/20200424112905.26770-5-sergey.matyukevich.os@quantenna.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-05-27 10:02:02 +02:00
Sergey Matyukevich
33462e6823 cfg80211: add support for TID specific AMSDU configuration
This patch adds support to control per TID MSDU aggregation
using the NL80211_TID_CONFIG_ATTR_AMSDU_CTRL attribute.

Signed-off-by: Sergey Matyukevich <sergey.matyukevich.os@quantenna.com>
Link: https://lore.kernel.org/r/20200424112905.26770-4-sergey.matyukevich.os@quantenna.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-05-27 10:02:01 +02:00
Andrew Lunn
f2bc8ad31a net: ethtool: Allow PHY cable test TDR data to configured
Allow the user to configure where on the cable the TDR data should be
retrieved, in terms of first and last sample, and the step between
samples. Also add the ability to ask for TDR data for just one pair.

If this configuration is not provided, it defaults to 1-150m at 1m
intervals for all pairs.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>

v3:
Move the TDR configuration into a structure
Add a range check on step
Use NL_SET_ERR_MSG_ATTR() when appropriate
Move TDR configuration into a nest
Document attributes in the request

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-26 23:22:21 -07:00
Andrew Lunn
a331172b15 net: ethtool: Add attributes for cable test TDR data
Some Ethernet PHYs can return the raw time domain reflectromatry data.
Add the attributes to allow this data to be requested and returned via
netlink ethtool.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>

v2:
m -> cm
Report what the PHY actually used for start/stop/step.

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-26 23:21:48 -07:00
David S. Miller
745bd6f44c One batch of changes, containing:
* hwsim improvements from Jouni and myself, to be able to
    test more scenarios easily
  * some more HE (802.11ax) support
  * some initial S1G (sub 1 GHz) work for fractional MHz channels
  * some (action) frame registration updates to help DPP support
  * along with other various improvements/fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEH1e1rEeCd0AIMq6MB8qZga/fl8QFAl7L1A8ACgkQB8qZga/f
 l8T5RQ/9EUxFqBs0cWojwyad5nkesyl51eOnbvSCJJF14W93s2oMeikCynTPe8Vg
 km36041QZqGbwmU0yWC9Lmm4y3ja5qQGI+QW+vT6tutGQx6FgK5TzUfYXqiFZqf6
 asqkvHpH4VqmbG1KEp0PZjIpW/OVK96pbvtXVnkrcMmjl2JjbRtAhyZQVNtt9ufJ
 6wqKf8e6iYqMIInMFPLX+rl7UEknxDKVcqPbMMJmY8/iM1z9Elkg3rkRSMehC+mE
 8cznZ6BsjAGCbMiA8K9fUo15lcMfZCJ1hAPzkD4TsJtMEJ0gYDo5jDB8TIpr5uoL
 95OnlF8jokJIsO+1g4CyaNSQsmFIuDo84vW8LtGRu9qzTP0UwelxhjZLgE3xlP6b
 W+z5HomxfWkYhJhaNywLP3B1VPtJwX8dL/wpECOWHzNKXG7Rb6GqzUwaCRFb6Jjo
 TmFJ5wLoEZHhsXYO2dvcyTzCUCXviXvfq60a56IyCJN8wDqmcubePv0+NOHUmj3c
 E71NTYymM3j9agdSpXdCFLBXA1OgyIydeSNHuBlaPA4sK6tr4ikUtbOrABjYTaQz
 2BB5fHEi8gs4EiHbSXqLFBot3JHljKJPsSN0wAgzQffN+6Kts9FG6HcrLsL+duDg
 lRdAzRrunE85S0QhsxeVIX216rX4W08sl0B1rJR+dTMX9ByblAk=
 =MVBJ
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-next-for-net-next-2020-04-25' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next

Johannes Berg says:

====================
One batch of changes, containing:
 * hwsim improvements from Jouni and myself, to be able to
   test more scenarios easily
 * some more HE (802.11ax) support
 * some initial S1G (sub 1 GHz) work for fractional MHz channels
 * some (action) frame registration updates to help DPP support
 * along with other various improvements/fixes
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-26 20:17:35 -07:00
Guillaume Nault
61aec25a6d cls_flower: Support filtering on multiple MPLS Label Stack Entries
With struct flow_dissector_key_mpls now recording the first
FLOW_DIS_MPLS_MAX labels, we can extend Flower to filter on any of
these LSEs independently.

In order to avoid creating new netlink attributes for every possible
depth, let's define a new TCA_FLOWER_KEY_MPLS_OPTS nested attribute
that contains the list of LSEs to match. Each LSE is represented by
another attribute, TCA_FLOWER_KEY_MPLS_OPTS_LSE, which then contains
the attributes representing the depth and the MPLS fields to match at
this depth (label, TTL, etc.).

For each MPLS field, the mask is always set to all-ones, as this is
what the original API did. We could allow user configurable masks in
the future if there is demand for more flexibility.

The new API also allows to only specify an LSE depth. In that case,
Flower only verifies that the MPLS label stack depth is greater or
equal to the provided depth (that is, an LSE exists at this depth).

Filters that only match on one (or more) fields of the first LSE are
dumped using the old netlink attributes, to avoid confusing user space
programs that don't understand the new API.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-26 15:22:58 -07:00
Arnd Bergmann
b4ad9a32b2 TEE subsystem work
- Reserve GlobalPlatform implementation defined logon method range
 - Add support to register kernel memory with TEE to allow TEE bus drivers
   to register memory references.
 -----BEGIN PGP SIGNATURE-----
 
 iQJOBAABCgA4FiEEFV+gSSXZJY9ZyuB5LinzTIcAHJcFAl6wVgYaHGplbnMud2lr
 bGFuZGVyQGxpbmFyby5vcmcACgkQLinzTIcAHJcPLQ//TZixfZPBBjrjbKVonYIq
 eghXjOIAi9C9Se38zf5wOan+6DGrmtIzHwqKrtUH0tgP94RGTGqIIUgXt+FOZDsr
 cGhsF6EG0G1O1TA+g61WWgszdLec+WLTEq5jG0gUrAuXsKNQYw478pCcCLzVq2AP
 RJ3Gv0GqVWdXHRSFN0TMWJ88Xtp/GTKNVt7KHVxgQAPMML5TO7SM8/XBbS+zexXz
 UaKiSg76V72NX83VUN5a6DRPOH4fg1SoWc1F4j/qqoh8dqM4v2w3Wq9qP0XpbdQo
 5dc/fPBMMotBOZrZrbXfPo8Q65HKGQ/GALwbv7054xNwNdAFryVs5PkrtU5prL+3
 xgelsO7K3Yo4m/d1frJEVdIEVzKbAmrjXGzB9eV9EMgW6EqlapWx7omnebe+e8BG
 lenpjmh8zzssAsQrRHWvDclEcRH9/LDbzCBJahqYOa+iSSH4/bJiuk6Bt4auhmD9
 0FoUNAPTxtDfl1g7nlxB+ZkPmoYzMzjx78kqgCeU/A98uGful0rBPJ6NQILWtAph
 mIQOaPvAAeG70qWeln/wci87xIRVeqbPfhtUdX8tugj+fGHD1k3+HT7yABSjZX8o
 4K+G2Vt1+Ip/wQBW4zELSsG33UzAqwDktxZr4fGCQ58cgoHSDrpqeE+9p3NHYgb+
 191kv4Dk9+JMYsQKOQDLwKA=
 =K162
 -----END PGP SIGNATURE-----

Merge tag 'tee-subsys-for-5.8' of git://git.linaro.org/people/jens.wiklander/linux-tee into arm/drivers

TEE subsystem work
- Reserve GlobalPlatform implementation defined logon method range
- Add support to register kernel memory with TEE to allow TEE bus drivers
  to register memory references.

* tag 'tee-subsys-for-5.8' of git://git.linaro.org/people/jens.wiklander/linux-tee:
  tee: add private login method for kernel clients
  tee: enable support to register kernel memory

Link: https://lore.kernel.org/r/20200504181049.GA10860@jade
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2020-05-25 16:38:01 +02:00
David Sterba
31344b2fce btrfs: remove more obsolete v0 extent ref declarations
The extent references v0 have been superseded long time go, there are
some unused declarations of access helpers. We can safely remove them
now. The struct btrfs_extent_ref_v0 is not used anywhere, but struct
btrfs_extent_item_v0 is still part of a backward compatibility check in
relocation.c and thus not removed.

Signed-off-by: David Sterba <dsterba@suse.com>
2020-05-25 11:25:29 +02:00
David S. Miller
13209a8f73 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
The MSCC bug fix in 'net' had to be slightly adjusted because the
register accesses are done slightly differently in net-next.

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-24 13:47:27 -07:00
David S. Miller
a152b85984 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2020-05-23

The following pull-request contains BPF updates for your *net-next* tree.

We've added 50 non-merge commits during the last 8 day(s) which contain
a total of 109 files changed, 2776 insertions(+), 2887 deletions(-).

The main changes are:

1) Add a new AF_XDP buffer allocation API to the core in order to help
   lowering the bar for drivers adopting AF_XDP support. i40e, ice, ixgbe
   as well as mlx5 have been moved over to the new API and also gained a
   small improvement in performance, from Björn Töpel and Magnus Karlsson.

2) Add getpeername()/getsockname() attach types for BPF sock_addr programs
   in order to allow for e.g. reverse translation of load-balancer backend
   to service address/port tuple from a connected peer, from Daniel Borkmann.

3) Improve the BPF verifier is_branch_taken() logic to evaluate pointers
   being non-NULL, e.g. if after an initial test another non-NULL test on
   that pointer follows in a given path, then it can be pruned right away,
   from John Fastabend.

4) Larger rework of BPF sockmap selftests to make output easier to understand
   and to reduce overall runtime as well as adding new BPF kTLS selftests
   that run in combination with sockmap, also from John Fastabend.

5) Batch of misc updates to BPF selftests including fixing up test_align
   to match verifier output again and moving it under test_progs, allowing
   bpf_iter selftest to compile on machines with older vmlinux.h, and
   updating config options for lirc and v6 segment routing helpers, from
   Stanislav Fomichev, Andrii Nakryiko and Alan Maguire.

6) Conversion of BPF tracing samples outdated internal BPF loader to use
   libbpf API instead, from Daniel T. Lee.

7) Follow-up to BPF kernel test infrastructure in order to fix a flake in
   the XDP selftests, from Jesper Dangaard Brouer.

8) Minor improvements to libbpf's internal hashmap implementation, from
   Ian Rogers.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-22 18:30:34 -07:00
Roopa Prabhu
1274e1cc42 vxlan: ecmp support for mac fdb entries
Todays vxlan mac fdb entries can point to multiple remote
ips (rdsts) with the sole purpose of replicating
broadcast-multicast and unknown unicast packets to those remote ips.

E-VPN multihoming [1,2,3] requires bridged vxlan traffic to be
load balanced to remote switches (vteps) belonging to the
same multi-homed ethernet segment (E-VPN multihoming is analogous
to multi-homed LAG implementations, but with the inter-switch
peerlink replaced with a vxlan tunnel). In other words it needs
support for mac ecmp. Furthermore, for faster convergence, E-VPN
multihoming needs the ability to update fdb ecmp nexthops independent
of the fdb entries.

New route nexthop API is perfect for this usecase.
This patch extends the vxlan fdb code to take a nexthop id
pointing to an ecmp nexthop group.

Changes include:
- New NDA_NH_ID attribute for fdbs
- Use the newly added fdb nexthop groups
- makes vxlan rdsts and nexthop handling code mutually
  exclusive
- since this is a new use-case and the requirement is for ecmp
nexthop groups, the fdb add and update path checks that the
nexthop is really an ecmp nexthop group. This check can be relaxed
in the future, if we want to introduce replication fdb nexthop groups
and allow its use in lieu of current rdst lists.
- fdb update requests with nexthop id's only allowed for existing
fdb's that have nexthop id's
- learning will not override an existing fdb entry with nexthop
group
- I have wrapped the switchdev offload code around the presence of
rdst

[1] E-VPN RFC https://tools.ietf.org/html/rfc7432
[2] E-VPN with vxlan https://tools.ietf.org/html/rfc8365
[3] http://vger.kernel.org/lpc_net2018_talks/scaling_bridge_fdb_database_slidesV3.pdf

Includes a null check fix in vxlan_xmit from Nikolay

v2 - Fixed build issue:
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-22 14:00:38 -07:00
Roopa Prabhu
38428d6871 nexthop: support for fdb ecmp nexthops
This patch introduces ecmp nexthops and nexthop groups
for mac fdb entries. In subsequent patches this is used
by the vxlan driver fdb entries. The use case is
E-VPN multihoming [1,2,3] which requires bridged vxlan traffic
to be load balanced to remote switches (vteps) belonging to
the same multi-homed ethernet segment (This is analogous to
a multi-homed LAG but over vxlan).

Changes include new nexthop flag NHA_FDB for nexthops
referenced by fdb entries. These nexthops only have ip.
This patch includes appropriate checks to avoid routes
referencing such nexthops.

example:
$ip nexthop add id 12 via 172.16.1.2 fdb
$ip nexthop add id 13 via 172.16.1.3 fdb
$ip nexthop add id 102 group 12/13 fdb

$bridge fdb add 02:02:00:00:00:13 dev vxlan1000 nhid 101 self

[1] E-VPN https://tools.ietf.org/html/rfc7432
[2] E-VPN VxLAN: https://tools.ietf.org/html/rfc8365
[3] LPC talk with mention of nexthop groups for L2 ecmp
http://vger.kernel.org/lpc_net2018_talks/scaling_bridge_fdb_database_slidesV3.pdf

v4 - fixed uninitialized variable reported by kernel test robot
Reported-by: kernel test robot <rong.a.chen@intel.com>

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-22 14:00:38 -07:00
Oleksij Rempel
8066021915 ethtool: provide UAPI for PHY Signal Quality Index (SQI)
Signal Quality Index is a mandatory value required by "OPEN Alliance
SIG" for the 100Base-T1 PHYs [1]. This indicator can be used for cable
integrity diagnostic and investigating other noise sources and
implement by at least two vendors: NXP[2] and TI[3].

[1] http://www.opensig.org/download/document/218/Advanced_PHY_features_for_automotive_Ethernet_V1.0.pdf
[2] https://www.nxp.com/docs/en/data-sheet/TJA1100.pdf
[3] https://www.ti.com/product/DP83TC811R-Q1

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-21 17:18:00 -07:00
Chris Mi
d8bed686ab net: psample: Add tunnel support
Currently, psample can only send the packet bits after decapsulation.
The tunnel information is lost. Add the tunnel support.

If the sampled packet has no tunnel info, the behavior is the same as
before. If it has, add a nested metadata field named PSAMPLE_ATTR_TUNNEL
and include the tunnel subfields if applicable.

Increase the metadata length for sampled packet with the tunnel info.
If new subfields of tunnel info should be included, update the metadata
length accordingly.

Signed-off-by: Chris Mi <chrism@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-21 17:04:07 -07:00
Martijn Coenen
3448914e8c loop: Add LOOP_CONFIGURE ioctl
This allows userspace to completely setup a loop device with a single
ioctl, removing the in-between state where the device can be partially
configured - eg the loop device has a backing file associated with it,
but is reading from the wrong offset.

Besides removing the intermediate state, another big benefit of this
ioctl is that LOOP_SET_STATUS can be slow; the main reason for this
slowness is that LOOP_SET_STATUS(64) calls blk_mq_freeze_queue() to
freeze the associated queue; this requires waiting for RCU
synchronization, which I've measured can take about 15-20ms on this
device on average.

In addition to doing what LOOP_SET_STATUS can do, LOOP_CONFIGURE can
also be used to:
- Set the correct block size immediately by setting
  loop_config.block_size (avoids LOOP_SET_BLOCK_SIZE)
- Explicitly request direct I/O mode by setting LO_FLAGS_DIRECT_IO
  in loop_config.info.lo_flags (avoids LOOP_SET_DIRECT_IO)
- Explicitly request read-only mode by setting LO_FLAGS_READ_ONLY
  in loop_config.info.lo_flags

Here's setting up ~70 regular loop devices with an offset on an x86
Android device, using LOOP_SET_FD and LOOP_SET_STATUS:

vsoc_x86:/system/apex # time for i in `seq 30 100`;
do losetup -r -o 4096 /dev/block/loop$i com.android.adbd.apex; done
    0m03.40s real     0m00.02s user     0m00.03s system

Here's configuring ~70 devices in the same way, but using a modified
losetup that uses the new LOOP_CONFIGURE ioctl:

vsoc_x86:/system/apex # time for i in `seq 30 100`;
do losetup -r -o 4096 /dev/block/loop$i com.android.adbd.apex; done
    0m01.94s real     0m00.01s user     0m00.01s system

Signed-off-by: Martijn Coenen <maco@android.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-21 08:20:35 -06:00
Martijn Coenen
faf1d25440 loop: Clean up LOOP_SET_STATUS lo_flags handling
LOOP_SET_STATUS(64) will actually allow some lo_flags to be modified; in
particular, LO_FLAGS_AUTOCLEAR can be set and cleared, whereas
LO_FLAGS_PARTSCAN can be set to request a partition scan. Make this
explicit by updating the UAPI to include the flags that can be
set/cleared using this ioctl.

The implementation can then blindly take over the passed in flags,
and use the previous flags for those flags that can't be set / cleared
using LOOP_SET_STATUS.

Signed-off-by: Martijn Coenen <maco@android.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-21 08:20:35 -06:00
Paolo Bonzini
9d5272f5e3 Merge tag 'noinstr-x86-kvm-2020-05-16' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into HEAD 2020-05-20 03:40:09 -04:00
Daniel Borkmann
1b66d25361 bpf: Add get{peer, sock}name attach types for sock_addr
As stated in 983695fa67 ("bpf: fix unconnected udp hooks"), the objective
for the existing cgroup connect/sendmsg/recvmsg/bind BPF hooks is to be
transparent to applications. In Cilium we make use of these hooks [0] in
order to enable E-W load balancing for existing Kubernetes service types
for all Cilium managed nodes in the cluster. Those backends can be local
or remote. The main advantage of this approach is that it operates as close
as possible to the socket, and therefore allows to avoid packet-based NAT
given in connect/sendmsg/recvmsg hooks we only need to xlate sock addresses.

This also allows to expose NodePort services on loopback addresses in the
host namespace, for example. As another advantage, this also efficiently
blocks bind requests for applications in the host namespace for exposed
ports. However, one missing item is that we also need to perform reverse
xlation for inet{,6}_getname() hooks such that we can return the service
IP/port tuple back to the application instead of the remote peer address.

The vast majority of applications does not bother about getpeername(), but
in a few occasions we've seen breakage when validating the peer's address
since it returns unexpectedly the backend tuple instead of the service one.
Therefore, this trivial patch allows to customise and adds a getpeername()
as well as getsockname() BPF cgroup hook for both IPv4 and IPv6 in order
to address this situation.

Simple example:

  # ./cilium/cilium service list
  ID   Frontend     Service Type   Backend
  1    1.2.3.4:80   ClusterIP      1 => 10.0.0.10:80

Before; curl's verbose output example, no getpeername() reverse xlation:

  # curl --verbose 1.2.3.4
  * Rebuilt URL to: 1.2.3.4/
  *   Trying 1.2.3.4...
  * TCP_NODELAY set
  * Connected to 1.2.3.4 (10.0.0.10) port 80 (#0)
  > GET / HTTP/1.1
  > Host: 1.2.3.4
  > User-Agent: curl/7.58.0
  > Accept: */*
  [...]

After; with getpeername() reverse xlation:

  # curl --verbose 1.2.3.4
  * Rebuilt URL to: 1.2.3.4/
  *   Trying 1.2.3.4...
  * TCP_NODELAY set
  * Connected to 1.2.3.4 (1.2.3.4) port 80 (#0)
  > GET / HTTP/1.1
  >  Host: 1.2.3.4
  > User-Agent: curl/7.58.0
  > Accept: */*
  [...]

Originally, I had both under a BPF_CGROUP_INET{4,6}_GETNAME type and exposed
peer to the context similar as in inet{,6}_getname() fashion, but API-wise
this is suboptimal as it always enforces programs having to test for ctx->peer
which can easily be missed, hence BPF_CGROUP_INET{4,6}_GET{PEER,SOCK}NAME split.
Similarly, the checked return code is on tnum_range(1, 1), but if a use case
comes up in future, it can easily be changed to return an error code instead.
Helper and ctx member access is the same as with connect/sendmsg/etc hooks.

  [0] https://github.com/cilium/cilium/blob/master/bpf/bpf_sock.c

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Andrey Ignatov <rdna@fb.com>
Link: https://lore.kernel.org/bpf/61a479d759b2482ae3efb45546490bacd796a220.1589841594.git.daniel@iogearbox.net
2020-05-19 11:32:04 -07:00
Eric Biggers
e3b1078bed fscrypt: add support for IV_INO_LBLK_32 policies
The eMMC inline crypto standard will only specify 32 DUN bits (a.k.a. IV
bits), unlike UFS's 64.  IV_INO_LBLK_64 is therefore not applicable, but
an encryption format which uses one key per policy and permits the
moving of encrypted file contents (as f2fs's garbage collector requires)
is still desirable.

To support such hardware, add a new encryption format IV_INO_LBLK_32
that makes the best use of the 32 bits: the IV is set to
'SipHash-2-4(inode_number) + file_logical_block_number mod 2^32', where
the SipHash key is derived from the fscrypt master key.  We hash only
the inode number and not also the block number, because we need to
maintain contiguity of DUNs to merge bios.

Unlike with IV_INO_LBLK_64, with this format IV reuse is possible; this
is unavoidable given the size of the DUN.  This means this format should
only be used where the requirements of the first paragraph apply.
However, the hash spreads out the IVs in the whole usable range, and the
use of a keyed hash makes it difficult for an attacker to determine
which files use which IVs.

Besides the above differences, this flag works like IV_INO_LBLK_64 in
that on ext4 it is only allowed if the stable_inodes feature has been
enabled to prevent inode numbers and the filesystem UUID from changing.

Link: https://lore.kernel.org/r/20200515204141.251098-1-ebiggers@kernel.org
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Paul Crowley <paulcrowley@google.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-05-19 09:34:18 -07:00
David Howells
f7e47677e3 watch_queue: Add a key/keyring notification facility
Add a key/keyring change notification facility whereby notifications about
changes in key and keyring content and attributes can be received.

Firstly, an event queue needs to be created:

	pipe2(fds, O_NOTIFICATION_PIPE);
	ioctl(fds[1], IOC_WATCH_QUEUE_SET_SIZE, 256);

then a notification can be set up to report notifications via that queue:

	struct watch_notification_filter filter = {
		.nr_filters = 1,
		.filters = {
			[0] = {
				.type = WATCH_TYPE_KEY_NOTIFY,
				.subtype_filter[0] = UINT_MAX,
			},
		},
	};
	ioctl(fds[1], IOC_WATCH_QUEUE_SET_FILTER, &filter);
	keyctl_watch_key(KEY_SPEC_SESSION_KEYRING, fds[1], 0x01);

After that, records will be placed into the queue when events occur in
which keys are changed in some way.  Records are of the following format:

	struct key_notification {
		struct watch_notification watch;
		__u32	key_id;
		__u32	aux;
	} *n;

Where:

	n->watch.type will be WATCH_TYPE_KEY_NOTIFY.

	n->watch.subtype will indicate the type of event, such as
	NOTIFY_KEY_REVOKED.

	n->watch.info & WATCH_INFO_LENGTH will indicate the length of the
	record.

	n->watch.info & WATCH_INFO_ID will be the second argument to
	keyctl_watch_key(), shifted.

	n->key will be the ID of the affected key.

	n->aux will hold subtype-dependent information, such as the key
	being linked into the keyring specified by n->key in the case of
	NOTIFY_KEY_LINKED.

Note that it is permissible for event records to be of variable length -
or, at least, the length may be dependent on the subtype.  Note also that
the queue can be shared between multiple notifications of various types.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: James Morris <jamorris@linux.microsoft.com>
2020-05-19 15:19:06 +01:00
David Howells
c73be61ced pipe: Add general notification queue support
Make it possible to have a general notification queue built on top of a
standard pipe.  Notifications are 'spliced' into the pipe and then read
out.  splice(), vmsplice() and sendfile() are forbidden on pipes used for
notifications as post_one_notification() cannot take pipe->mutex.  This
means that notifications could be posted in between individual pipe
buffers, making iov_iter_revert() difficult to effect.

The way the notification queue is used is:

 (1) An application opens a pipe with a special flag and indicates the
     number of messages it wishes to be able to queue at once (this can
     only be set once):

	pipe2(fds, O_NOTIFICATION_PIPE);
	ioctl(fds[0], IOC_WATCH_QUEUE_SET_SIZE, queue_depth);

 (2) The application then uses poll() and read() as normal to extract data
     from the pipe.  read() will return multiple notifications if the
     buffer is big enough, but it will not split a notification across
     buffers - rather it will return a short read or EMSGSIZE.

     Notification messages include a length in the header so that the
     caller can split them up.

Each message has a header that describes it:

	struct watch_notification {
		__u32	type:24;
		__u32	subtype:8;
		__u32	info;
	};

The type indicates the source (eg. mount tree changes, superblock events,
keyring changes, block layer events) and the subtype indicates the event
type (eg. mount, unmount; EIO, EDQUOT; link, unlink).  The info field
indicates a number of things, including the entry length, an ID assigned to
a watchpoint contributing to this buffer and type-specific flags.

Supplementary data, such as the key ID that generated an event, can be
attached in additional slots.  The maximum message size is 127 bytes.
Messages may not be padded or aligned, so there is no guarantee, for
example, that the notification type will be on a 4-byte bounary.

Signed-off-by: David Howells <dhowells@redhat.com>
2020-05-19 15:08:24 +01:00
David Howells
b580b93664 pipe: Add O_NOTIFICATION_PIPE
Add an O_NOTIFICATION_PIPE flag that can be passed to pipe2() to indicate
that the pipe being created is going to be used for notifications.  This
suppresses the use of splice(), vmsplice(), tee() and sendfile() on the
pipe as calling iov_iter_revert() on a pipe when a kernel notification
message has been inserted into the middle of a multi-buffer splice will be
messy.

The flag is given the same value as O_EXCL as it seems unlikely that
this flag will ever be applicable to pipes and I don't want to use up
another O_* bit unnecessarily.  An alternative could be to add a pipe3()
system call.

Signed-off-by: David Howells <dhowells@redhat.com>
2020-05-19 15:08:23 +01:00
David Howells
0858caa419 uapi: General notification queue definitions
Add UAPI definitions for the general notification queue, including the
following pieces:

 (*) struct watch_notification.

     This is the metadata header for notification messages.  It includes a
     type and subtype that indicate the source of the message
     (eg. WATCH_TYPE_MOUNT_NOTIFY) and the kind of the message
     (eg. NOTIFY_MOUNT_NEW_MOUNT).

     The header also contains an information field that conveys the
     following information:

	- WATCH_INFO_LENGTH.  The size of the entry (entries are variable
          length).

	- WATCH_INFO_ID.  The watch ID specified when the watchpoint was
          set.

	- WATCH_INFO_TYPE_INFO.  (Sub)type-specific information.

	- WATCH_INFO_FLAG_*.  Flag bits overlain on the type-specific
          information.  For use by the type.

     All the information in the header can be used in filtering messages at
     the point of writing into the buffer.

 (*) struct watch_notification_removal

     This is an extended watch-removal notification record that includes an
     'id' field that can indicate the identifier of the object being
     removed if available (for instance, a keyring serial number).

Signed-off-by: David Howells <dhowells@redhat.com>
2020-05-19 15:08:23 +01:00
Jacob Pan
b0d1f8741b iommu/vt-d: Add nested translation helper function
Nested translation mode is supported in VT-d 3.0 Spec.CH 3.8.
With PASID granular translation type set to 0x11b, translation
result from the first level(FL) also subject to a second level(SL)
page table translation. This mode is used for SVA virtualization,
where FL performs guest virtual to guest physical translation and
SL performs guest physical to host physical translation.

This patch adds a helper function for setting up nested translation
where second level comes from a domain and first level comes from
a guest PGD.

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Link: https://lore.kernel.org/r/20200516062101.29541-4-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2020-05-18 15:37:25 +02:00
Jacopo Mondi
926645d43f media: v4l2-ctrls: Add camera orientation and rotation
Add support for the newly defined V4L2_CID_CAMERA_ORIENTATION
and V4L2_CID_CAMERA_SENSOR_ROTATION read-only controls used to report
the camera device mounting position and orientation respectively.

Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Jacopo Mondi <jacopo@jmondi.org>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2020-05-18 15:34:21 +02:00
Pavel Begunkov
f2a8d5c7a2 io_uring: add tee(2) support
Add IORING_OP_TEE implementing tee(2) support. Almost identical to
splice bits, but without offsets.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-17 14:10:07 -06:00
Stefano Garzarella
7e55a19cf6 io_uring: add IORING_CQ_EVENTFD_DISABLED to the CQ ring flags
This new flag should be set/clear from the application to
disable/enable eventfd notifications when a request is completed
and queued to the CQ ring.

Before this patch, notifications were always sent if an eventfd is
registered, so IORING_CQ_EVENTFD_DISABLED is not set during the
initialization.

It will be up to the application to set the flag after initialization
if no notifications are required at the beginning.

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-15 12:16:59 -06:00
Stefano Garzarella
0d9b5b3af1 io_uring: add 'cq_flags' field for the CQ ring
This patch adds the new 'cq_flags' field that should be written by
the application and read by the kernel.

This new field is available to the userspace application through
'cq_off.flags'.
We are using 4-bytes previously reserved and set to zero. This means
that if the application finds this field to zero, then the new
functionality is not supported.

In the next patch we will introduce the first flag available.

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-15 12:16:59 -06:00
David S. Miller
3430223d39 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Alexei Starovoitov says:

====================
pull-request: bpf-next 2020-05-15

The following pull-request contains BPF updates for your *net-next* tree.

We've added 37 non-merge commits during the last 1 day(s) which contain
a total of 67 files changed, 741 insertions(+), 252 deletions(-).

The main changes are:

1) bpf_xdp_adjust_tail() now allows to grow the tail as well, from Jesper.

2) bpftool can probe CONFIG_HZ, from Daniel.

3) CAP_BPF is introduced to isolate user processes that use BPF infra and
   to secure BPF networking services by dropping CAP_SYS_ADMIN requirement
   in certain cases, from Alexei.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-15 10:43:52 -07:00
Vlad Buslov
f8ab1807a9 net: sched: introduce terse dump flag
Add new TCA_DUMP_FLAGS attribute and use it in cls API to request terse
filter output from classifiers with TCA_DUMP_FLAGS_TERSE flag. This option
is intended to be used to improve performance of TC filter dump when
userland only needs to obtain stats and not the whole classifier/action
data. Extend struct tcf_proto_ops with new terse_dump() callback that must
be defined by supporting classifier implementations.

Support of the options in specific classifiers and actions is
implemented in following patches in the series.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-15 10:23:11 -07:00
Alexei Starovoitov
a17b53c4a4 bpf, capability: Introduce CAP_BPF
Split BPF operations that are allowed under CAP_SYS_ADMIN into
combination of CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN.
For backward compatibility include them in CAP_SYS_ADMIN as well.

The end result provides simple safety model for applications that use BPF:
- to load tracing program types
  BPF_PROG_TYPE_{KPROBE, TRACEPOINT, PERF_EVENT, RAW_TRACEPOINT, etc}
  use CAP_BPF and CAP_PERFMON
- to load networking program types
  BPF_PROG_TYPE_{SCHED_CLS, XDP, SK_SKB, etc}
  use CAP_BPF and CAP_NET_ADMIN

There are few exceptions from this rule:
- bpf_trace_printk() is allowed in networking programs, but it's using
  tracing mechanism, hence this helper needs additional CAP_PERFMON
  if networking program is using this helper.
- BPF_F_ZERO_SEED flag for hash/lru map is allowed under CAP_SYS_ADMIN only
  to discourage production use.
- BPF HW offload is allowed under CAP_SYS_ADMIN.
- bpf_probe_write_user() is allowed under CAP_SYS_ADMIN only.

CAPs are not checked at attach/detach time with two exceptions:
- loading BPF_PROG_TYPE_CGROUP_SKB is allowed for unprivileged users,
  hence CAP_NET_ADMIN is required at attach time.
- flow_dissector detach doesn't check prog FD at detach,
  hence CAP_NET_ADMIN is required at detach time.

CAP_SYS_ADMIN is required to iterate BPF objects (progs, maps, links) via get_next_id
command and convert them to file descriptor via GET_FD_BY_ID command.
This restriction guarantees that mutliple tasks with CAP_BPF are not able to
affect each other. That leads to clean isolation of tasks. For example:
task A with CAP_BPF and CAP_NET_ADMIN loads and attaches a firewall via bpf_link.
task B with the same capabilities cannot detach that firewall unless
task A explicitly passed link FD to task B via scm_rights or bpffs.
CAP_SYS_ADMIN can still detach/unload everything.

Two networking user apps with CAP_SYS_ADMIN and CAP_NET_ADMIN can
accidentely mess with each other programs and maps.
Two networking user apps with CAP_NET_ADMIN and CAP_BPF cannot affect each other.

CAP_NET_ADMIN + CAP_BPF allows networking programs access only packet data.
Such networking progs cannot access arbitrary kernel memory or leak pointers.

bpftool, bpftrace, bcc tools binaries should NOT be installed with
CAP_BPF and CAP_PERFMON, since unpriv users will be able to read kernel secrets.
But users with these two permissions will be able to use these tracing tools.

CAP_PERFMON is least secure, since it allows kprobes and kernel memory access.
CAP_NET_ADMIN can stop network traffic via iproute2.
CAP_BPF is the safest from security point of view and harmless on its own.

Having CAP_BPF and/or CAP_NET_ADMIN is not enough to write into arbitrary map
and if that map is used by firewall-like bpf prog.
CAP_BPF allows many bpf prog_load commands in parallel. The verifier
may consume large amount of memory and significantly slow down the system.

Existing unprivileged BPF operations are not affected.
In particular unprivileged users are allowed to load socket_filter and cg_skb
program types and to create array, hash, prog_array, map-in-map map types.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200513230355.7858-2-alexei.starovoitov@gmail.com
2020-05-15 17:29:41 +02:00
Jesper Dangaard Brouer
c8741e2bfe xdp: Allow bpf_xdp_adjust_tail() to grow packet size
Finally, after all drivers have a frame size, allow BPF-helper
bpf_xdp_adjust_tail() to grow or extend packet size at frame tail.

Remember that helper/macro xdp_data_hard_end have reserved some
tailroom.  Thus, this helper makes sure that the BPF-prog don't have
access to this tailroom area.

V2: Remove one chicken check and use WARN_ONCE for other

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/158945348530.97035.12577148209134239291.stgit@firesoul
2020-05-14 21:21:56 -07:00
David S. Miller
d00f26b623 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Alexei Starovoitov says:

====================
pull-request: bpf-next 2020-05-14

The following pull-request contains BPF updates for your *net-next* tree.

The main changes are:

1) Merged tag 'perf-for-bpf-2020-05-06' from tip tree that includes CAP_PERFMON.

2) support for narrow loads in bpf_sock_addr progs and additional
   helpers in cg-skb progs, from Andrey.

3) bpf benchmark runner, from Andrii.

4) arm and riscv JIT optimizations, from Luke.

5) bpf iterator infrastructure, from Yonghong.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-14 20:31:21 -07:00
Andrey Ignatov
f307fa2cb4 bpf: Introduce bpf_sk_{, ancestor_}cgroup_id helpers
With having ability to lookup sockets in cgroup skb programs it becomes
useful to access cgroup id of retrieved sockets so that policies can be
implemented based on origin cgroup of such socket.

For example, a container running in a cgroup can have cgroup skb ingress
program that can lookup peer socket that is sending packets to a process
inside the container and decide whether those packets should be allowed
or denied based on cgroup id of the peer.

More specifically such ingress program can implement intra-host policy
"allow incoming packets only from this same container and not from any
other container on same host" w/o relying on source IP addresses since
quite often it can be the case that containers share same IP address on
the host.

Introduce two new helpers for this use-case: bpf_sk_cgroup_id() and
bpf_sk_ancestor_cgroup_id().

These helpers are similar to existing bpf_skb_{,ancestor_}cgroup_id
helpers with the only difference that sk is used to get cgroup id
instead of skb, and share code with them.

See documentation in UAPI for more details.

Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/f5884981249ce911f63e9b57ecd5d7d19154ff39.1589486450.git.rdna@fb.com
2020-05-14 18:41:07 -07:00
Andrey Ignatov
7aebfa1b38 bpf: Support narrow loads from bpf_sock_addr.user_port
bpf_sock_addr.user_port supports only 4-byte load and it leads to ugly
code in BPF programs, like:

	volatile __u32 user_port = ctx->user_port;
	__u16 port = bpf_ntohs(user_port);

Since otherwise clang may optimize the load to be 2-byte and it's
rejected by verifier.

Add support for 1- and 2-byte loads same way as it's supported for other
fields in bpf_sock_addr like user_ip4, msg_src_ip4, etc.

Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/c1e983f4c17573032601d0b2b1f9d1274f24bc16.1589420814.git.rdna@fb.com
2020-05-14 18:30:57 -07:00
Miklos Szeredi
c8ffd8bcdd vfs: add faccessat2 syscall
POSIX defines faccessat() as having a fourth "flags" argument, while the
linux syscall doesn't have it.  Glibc tries to emulate AT_EACCESS and
AT_SYMLINK_NOFOLLOW, but AT_EACCESS emulation is broken.

Add a new faccessat(2) syscall with the added flags argument and implement
both flags.

The value of AT_EACCESS is defined in glibc headers to be the same as
AT_REMOVEDIR.  Use this value for the kernel interface as well, together
with the explanatory comment.

Also add AT_EMPTY_PATH support, which is not documented by POSIX, but can
be useful and is trivial to implement.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-05-14 16:44:25 +02:00
Miklos Szeredi
80340fe360 statx: add mount_root
Determining whether a path or file descriptor refers to a mountpoint (or
more precisely a mount root) is not trivial using current tools.

Add a flag to statx that indicates whether the path or fd refers to the
root of a mount or not.

Cc: linux-api@vger.kernel.org
Cc: linux-man@vger.kernel.org
Reported-by: Lennart Poettering <mzxreary@0pointer.de>
Reported-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-05-14 16:44:24 +02:00
Miklos Szeredi
fa2fcf4f1d statx: add mount ID
Systemd is hacking around to get it and it's trivial to add to statx, so...

Cc: linux-api@vger.kernel.org
Cc: linux-man@vger.kernel.org
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-05-14 16:44:24 +02:00
Miklos Szeredi
581701b7ef uapi: deprecate STATX_ALL
Constants of the *_ALL type can be actively harmful due to the fact that
developers will usually fail to consider the possible effects of future
changes to the definition.

Deprecate STATX_ALL in the uapi, while no damage has been done yet.

We could keep something like this around in the kernel, but there's
actually no point, since all filesystems should be explicitly checking
flags that they support and not rely on the VFS masking unknown ones out: a
flag could be known to the VFS, yet not known to the filesystem.

Cc: David Howells <dhowells@redhat.com>
Cc: linux-api@vger.kernel.org
Cc: linux-man@vger.kernel.org
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-05-14 16:44:24 +02:00
Andrey Konovalov
c61769bd47 usb: raw-gadget: support stalling/halting/wedging endpoints
Raw Gadget is currently unable to stall/halt/wedge gadget endpoints,
which is required for proper emulation of certain USB classes.

This patch adds a few more ioctls:

- USB_RAW_IOCTL_EP0_STALL allows to stall control endpoint #0 when
  there's a pending setup request for it.
- USB_RAW_IOCTL_SET/CLEAR_HALT/WEDGE allow to set/clear halt/wedge status
  on non-control non-isochronous endpoints.

Fixes: f2c2e71764 ("usb: gadget: add raw-gadget interface")
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Felipe Balbi <balbi@kernel.org>
2020-05-14 12:30:18 +03:00
Andrey Konovalov
97df5e5758 usb: raw-gadget: fix gadget endpoint selection
Currently automatic gadget endpoint selection based on required features
doesn't work. Raw Gadget tries iterating over the list of available
endpoints and finding one that has the right direction and transfer type.
Unfortunately selecting arbitrary gadget endpoints (even if they satisfy
feature requirements) doesn't work, as (depending on the UDC driver) they
might have fixed addresses, and one also needs to provide matching
endpoint addresses in the descriptors sent to the host.

The composite framework deals with this by assigning endpoint addresses
in usb_ep_autoconfig() before enumeration starts. This approach won't work
with Raw Gadget as the endpoints are supposed to be enabled after a
set_configuration/set_interface request from the host, so it's too late to
patch the endpoint descriptors that had already been sent to the host.

For Raw Gadget we take another approach. Similarly to GadgetFS, we allow
the user to make the decision as to which gadget endpoints to use.

This patch adds another Raw Gadget ioctl USB_RAW_IOCTL_EPS_INFO that
exposes information about all non-control endpoints that a currently
connected UDC has. This information includes endpoints addresses, as well
as their capabilities and limits to allow the user to choose the most
fitting gadget endpoint.

The USB_RAW_IOCTL_EP_ENABLE ioctl is updated to use the proper endpoint
validation routine usb_gadget_ep_match_desc().

These changes affect the portability of the gadgets that use Raw Gadget
when running on different UDCs. Nevertheless, as long as the user relies
on the information provided by USB_RAW_IOCTL_EPS_INFO to dynamically
choose endpoint addresses, UDC-agnostic gadgets can still be written with
Raw Gadget.

Fixes: f2c2e71764 ("usb: gadget: add raw-gadget interface")
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Felipe Balbi <balbi@kernel.org>
2020-05-14 12:30:17 +03:00
Andrey Konovalov
17ff3b72e7 usb: raw-gadget: improve uapi headers comments
Fix typo "trasferred" => "transferred".

Don't call USB requests URBs.

Fix comment style.

Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Felipe Balbi <balbi@kernel.org>
2020-05-14 12:30:17 +03:00
Dave Airlie
49eea1c657 Merge tag 'amd-drm-next-5.8-2020-05-12' of git://people.freedesktop.org/~agd5f/linux into drm-next
amd-drm-next-5.8-2020-05-12:

amdgpu:
- Misc cleanups
- RAS fixes
- Expose FP16 for modesetting
- DP 1.4 compliance test fixes
- Clockgating fixes
- MAINTAINERS update
- Soft recovery for gfx10
- Runtime PM cleanups
- PSP code cleanups

amdkfd:
- Track GPU memory utilization per process
- Report PCI domain in topology

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexdeucher@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200512213703.4039-1-alexander.deucher@amd.com
2020-05-14 13:21:33 +10:00
Paolo Bonzini
4aef2ec902 Merge branch 'kvm-amd-fixes' into HEAD 2020-05-13 12:14:05 -04:00
Dmitry Torokhov
0fdc50dfab Linux 5.6
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl6BIG4eHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGlHUH/RCFve2sfHRPjRW+
 xR5SaLVAw6XKvtKBq7yvKmHEwqNJnL79IHyqqtSrtfFr2FfaH/KvYiCbbAezvSrM
 np0udGu7STKGd21CWuyEZJudyhXkOwMRNiFiCXWp7rs35oh8T0TpJxMzo2Nc1nLk
 JFQPqAP6OSvq4IkWEywKQI+Au3Z1IBf83xVjZ1s+MKPQHYD49x2hc4cQntL5/cnm
 a3DoR2iBkYiGZCZ9dDqAqJTnMQIiCbACdZXgGjNRUpdyA/dtAjsMl11NPYHm8TA2
 3AHBupAK50WBZGad6xv2qKQyScsmoJG2mv92QjlOFz0Tpiu6rLnDlLYREDVB6YH6
 qbLDsc8=
 =XEIU
 -----END PGP SIGNATURE-----

Merge tag 'v5.6' into next

Sync up with mainline to get device tree and other changes.
2020-05-12 12:18:21 -07:00
Denis Efremov
0836275df4 floppy: suppress UBSAN warning in setup_rw_floppy()
UBSAN: array-index-out-of-bounds in drivers/block/floppy.c:1521:45
index 16 is out of range for type 'unsigned char [16]'
Call Trace:
...
 setup_rw_floppy+0x5c3/0x7f0
 floppy_ready+0x2be/0x13b0
 process_one_work+0x2c1/0x5d0
 worker_thread+0x56/0x5e0
 kthread+0x122/0x170
 ret_from_fork+0x35/0x40

From include/uapi/linux/fd.h:
struct floppy_raw_cmd {
	...
	unsigned char cmd_count;
	unsigned char cmd[16];
	unsigned char reply_count;
	unsigned char reply[16];
	...
}

This out-of-bounds access is intentional. The command in struct
floppy_raw_cmd may take up the space initially intended for the reply
and the reply count. It is needed for long 82078 commands such as
RESTORE, which takes 17 command bytes. Initial cmd size is not enough
and since struct setup_rw_floppy is a part of uapi we check that
cmd_count is in [0:16+1+16] in raw_cmd_copyin().

The patch adds union with original cmd,reply_count,reply fields and
fullcmd field of equivalent size. The cmd accesses are turned to
fullcmd where appropriate to suppress UBSAN warning.

Link: https://lore.kernel.org/r/20200501134416.72248-5-efremov@linux.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Denis Efremov <efremov@linux.com>
2020-05-12 19:34:57 +03:00
Denis Efremov
bd10a5f3e2 floppy: add defines for sizes of cmd & reply buffers of floppy_raw_cmd
Use FD_RAW_CMD_SIZE, FD_RAW_REPLY_SIZE defines instead of magic numbers
for cmd & reply buffers of struct floppy_raw_cmd. Remove local to
floppy.c MAX_REPLIES define, as it is now FD_RAW_REPLY_SIZE.
FD_RAW_CMD_FULLSIZE added as we allow command to also fill reply_count
and reply fields.

Link: https://lore.kernel.org/r/20200501134416.72248-4-efremov@linux.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Denis Efremov <efremov@linux.com>
2020-05-12 19:34:56 +03:00
Denis Efremov
9c4c5a24c8 floppy: add FD_AUTODETECT_SIZE define for struct floppy_drive_params
Use FD_AUTODETECT_SIZE for autodetect buffer size in struct
floppy_drive_params instead of a magic number.

Link: https://lore.kernel.org/r/20200501134416.72248-3-efremov@linux.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Denis Efremov <efremov@linux.com>
2020-05-12 19:34:56 +03:00
Willy Tarreau
7d33850abd floppy: add references to 82077's extra registers
This controller provides extra status registers SRA and SRB as well
as a tape drive register (TDR) and a data rate select register (DSR),
which are referenced in the sparc port, so let's have their symbolic
definitions centralized.

Link: https://lore.kernel.org/r/20200331094054.24441-3-w@1wt.eu
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Denis Efremov <efremov@linux.com>
2020-05-12 19:34:52 +03:00
Hans Verkuil
6446ec6cbf media: v4l2-subdev: add VIDIOC_SUBDEV_QUERYCAP ioctl
While normal video/radio/vbi/swradio nodes have a proper QUERYCAP ioctl
that apps can call to determine that it is indeed a V4L2 device, there
is currently no equivalent for v4l-subdev nodes. Adding this ioctl will
solve that, and it will allow utilities like v4l2-compliance to be used
with these devices as well.

SUBDEV_QUERYCAP currently returns the version and capabilities of the
subdevice. Define a capability flag to report if the subdevice is
registered in read-only mode.

Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Jacopo Mondi <jacopo@jmondi.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2020-05-12 17:05:31 +02:00
Quentin Monnet
ab8d78093d bpf: Minor fixes to BPF helpers documentation
Minor improvements to the documentation for BPF helpers:

* Fix formatting for the description of "bpf_socket" for
  bpf_getsockopt() and bpf_setsockopt(), thus suppressing two warnings
  from rst2man about "Unexpected indentation".
* Fix formatting for return values for bpf_sk_assign() and seq_file
  helpers.
* Fix and harmonise formatting, in particular for function/struct names.
* Remove blank lines before "Return:" sections.
* Replace tabs found in the middle of text lines.
* Fix typos.
* Add a note to the footer (in Python script) about "bpftool feature
  probe", including for listing features available to unprivileged
  users, and add a reference to bpftool man page.

Thanks to Florian for reporting two typos (duplicated words).

Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200511161536.29853-4-quentin@isovalent.com
2020-05-11 21:20:53 +02:00
Alexandre Belloni
734e5e4e26 rtc: add new VL flag for backup switchover
A new flag RTC_VL_BACKUP_SWITCH means that a backup switchover happened
since last flag clear.

Link: https://lore.kernel.org/r/20200505201310.255145-1-alexandre.belloni@bootlin.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2020-05-11 16:04:35 +02:00
Andrew Lunn
b28efb930b net: ethtool: Add attributes for cable test reports
Add the attributes needed to report cable test results to userspace.
The reports are expected to be per twisted pair. A nested property per
pair can report the result of the cable test. A nested property can
also report the length of the cable to any fault.

v2:
Grammar fixes
Change length from u16 to u32
s/DEV/HEADER/g
Add status attributes
Rename pairs from numbers to letters.

v3:
Fixed example in document
Add ETHTOOL_A_CABLE_NEST_* enum
Add ETHTOOL_MSG_CABLE_TEST_NTF to documentation

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-05-10 12:28:41 -07:00
Andrew Lunn
11ca3c4261 net: ethtool: netlink: Add support for triggering a cable test
Add new ethtool netlink calls to trigger the starting of a PHY cable
test.

Add Kconfig'ury to ETHTOOL_NETLINK so that PHYLIB is not a module when
ETHTOOL_NETLINK is builtin, which would result in kernel linking errors.

v2:
Remove unwanted white space change
Remove ethnl_cable_test_act_ops and use doit handler
Rename cable_test_set_policy cable_test_act_policy
Remove ETHTOOL_MSG_CABLE_TEST_ACT_REPLY

v3:
Remove ETHTOOL_MSG_CABLE_TEST_ACT_REPLY from documentation
Remove unused cable_test_get_policy
Add Reviewed-by tags

v4:
Remove unwanted blank line

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-05-10 12:28:41 -07:00
Yonghong Song
492e639f0c bpf: Add bpf_seq_printf and bpf_seq_write helpers
Two helpers bpf_seq_printf and bpf_seq_write, are added for
writing data to the seq_file buffer.

bpf_seq_printf supports common format string flag/width/type
fields so at least I can get identical results for
netlink and ipv6_route targets.

For bpf_seq_printf and bpf_seq_write, return value -EOVERFLOW
specifically indicates a write failure due to overflow, which
means the object will be repeated in the next bpf invocation
if object collection stays the same. Note that if the object
collection is changed, depending how collection traversal is
done, even if the object still in the collection, it may not
be visited.

For bpf_seq_printf, format %s, %p{i,I}{4,6} needs to
read kernel memory. Reading kernel memory may fail in
the following two cases:
  - invalid kernel address, or
  - valid kernel address but requiring a major fault
If reading kernel memory failed, the %s string will be
an empty string and %p{i,I}{4,6} will be all 0.
Not returning error to bpf program is consistent with
what bpf_trace_printk() does for now.

bpf_seq_printf may return -EBUSY meaning that internal percpu
buffer for memory copy of strings or other pointees is
not available. Bpf program can return 1 to indicate it
wants the same object to be repeated. Right now, this should not
happen on no-RT kernels since migrate_disable(), which guards
bpf prog call, calls preempt_disable().

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200509175914.2476661-1-yhs@fb.com
2020-05-09 17:05:26 -07:00
Yonghong Song
ac51d99bf8 bpf: Create anonymous bpf iterator
A new bpf command BPF_ITER_CREATE is added.

The anonymous bpf iterator is seq_file based.
The seq_file private data are referenced by targets.
The bpf_iter infrastructure allocated additional space
at seq_file->private before the space used by targets
to store some meta data, e.g.,
  prog:       prog to run
  session_id: an unique id for each opened seq_file
  seq_num:    how many times bpf programs are queried in this session
  done_stop:  an internal state to decide whether bpf program
              should be called in seq_ops->stop() or not

The seq_num will start from 0 for valid objects.
The bpf program may see the same seq_num more than once if
 - seq_file buffer overflow happens and the same object
   is retried by bpf_seq_read(), or
 - the bpf program explicitly requests a retry of the
   same object

Since module is not supported for bpf_iter, all target
registeration happens at __init time, so there is no
need to change bpf_iter_unreg_target() as it is used
mostly in error path of the init function at which time
no bpf iterators have been created yet.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200509175905.2475770-1-yhs@fb.com
2020-05-09 17:05:26 -07:00
Yonghong Song
de4e05cac4 bpf: Support bpf tracing/iter programs for BPF_LINK_CREATE
Given a bpf program, the step to create an anonymous bpf iterator is:
  - create a bpf_iter_link, which combines bpf program and the target.
    In the future, there could be more information recorded in the link.
    A link_fd will be returned to the user space.
  - create an anonymous bpf iterator with the given link_fd.

The bpf_iter_link can be pinned to bpffs mount file system to
create a file based bpf iterator as well.

The benefit to use of bpf_iter_link:
  - using bpf link simplifies design and implementation as bpf link
    is used for other tracing bpf programs.
  - for file based bpf iterator, bpf_iter_link provides a standard
    way to replace underlying bpf programs.
  - for both anonymous and free based iterators, bpf link query
    capability can be leveraged.

The patch added support of tracing/iter programs for BPF_LINK_CREATE.
A new link type BPF_LINK_TYPE_ITER is added to facilitate link
querying. Currently, only prog_id is needed, so there is no
additional in-kernel show_fdinfo() and fill_link_info() hook
is needed for BPF_LINK_TYPE_ITER link.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200509175901.2475084-1-yhs@fb.com
2020-05-09 17:05:26 -07:00
Yonghong Song
15d83c4d7c bpf: Allow loading of a bpf_iter program
A bpf_iter program is a tracing program with attach type
BPF_TRACE_ITER. The load attribute
  attach_btf_id
is used by the verifier against a particular kernel function,
which represents a target, e.g., __bpf_iter__bpf_map
for target bpf_map which is implemented later.

The program return value must be 0 or 1 for now.
  0 : successful, except potential seq_file buffer overflow
      which is handled by seq_file reader.
  1 : request to restart the same object

In the future, other return values may be used for filtering or
teminating the iterator.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200509175900.2474947-1-yhs@fb.com
2020-05-09 17:05:26 -07:00
Stanislav Fomichev
8086fbaf49 bpf: Allow any port in bpf_bind helper
We want to have a tighter control on what ports we bind to in
the BPF_CGROUP_INET{4,6}_CONNECT hooks even if it means
connect() becomes slightly more expensive. The expensive part
comes from the fact that we now need to call inet_csk_get_port()
that verifies that the port is not used and allocates an entry
in the hash table for it.

Since we can't rely on "snum || !bind_address_no_port" to prevent
us from calling POST_BIND hook anymore, let's add another bind flag
to indicate that the call site is BPF program.

v5:
* fix wrong AF_INET (should be AF_INET6) in the bpf program for v6

v3:
* More bpf_bind documentation refinements (Martin KaFai Lau)
* Add UDP tests as well (Martin KaFai Lau)
* Don't start the thread, just do socket+bind+listen (Martin KaFai Lau)

v2:
* Update documentation (Andrey Ignatov)
* Pass BIND_FORCE_ADDRESS_NO_PORT conditionally (Andrey Ignatov)

Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrey Ignatov <rdna@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200508174611.228805-5-sdf@google.com
2020-05-09 00:48:20 +02:00
Dave Airlie
370fb6b0aa Merge tag 'amd-drm-next-5.8-2020-04-30' of git://people.freedesktop.org/~agd5f/linux into drm-next
amd-drm-next-5.8-2020-04-30:

amdgpu:
- SR-IOV fixes
- SDMA fix for Navi
- VCN 2.5 DPG fixes
- Display fixes
- Display stuttering fixes for pageflip and cursor
- Add support for handling encrypted GPU memory
- Add UAPI for encrypted GPU memory
- Rework IB pool handling

amdkfd:
- Expose asic revision in topology
- Add UAPI for GWS (Global Wave Sync) resource management

UAPI:
- Add amdgpu UAPI for encrypted GPU memory
  Used by: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4401
- Add amdkfd UAPI for GWS (Global Wave Sync) resource management
  Thunk usage of KFD ioctl: https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface/blob/roc-2.8.0/src/queues.c#L840
  ROCr usage of Thunk API: https://github.com/RadeonOpenCompute/ROCR-Runtime/blob/roc-3.1.0/src/core/runtime/amd_gpu_agent.cpp#L597
  HCC code using ROCr API: 98ee9f3494/lib/hsa/mcwamp_hsa.cpp (L2161)
  HIP code using HCC API: cf8589b8c8/src/hip_module.cpp (L567)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexdeucher@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200430212951.3902-1-alexander.deucher@amd.com
2020-05-08 13:31:08 +10:00
David S. Miller
3793faad7b Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Conflicts were all overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06 22:10:13 -07:00
Oleksij Rempel
bdbdac7649 ethtool: provide UAPI for PHY master/slave configuration.
This UAPI is needed for BroadR-Reach 100BASE-T1 devices. Due to lack of
auto-negotiation support, we needed to be able to configure the
MASTER-SLAVE role of the port manually or from an application in user
space.

The same UAPI can be used for 1000BASE-T or MultiGBASE-T devices to
force MASTER or SLAVE role. See IEEE 802.3-2018:
22.2.4.3.7 MASTER-SLAVE control register (Register 9)
22.2.4.3.8 MASTER-SLAVE status register (Register 10)
40.5.2 MASTER-SLAVE configuration resolution
45.2.1.185.1 MASTER-SLAVE config value (1.2100.14)
45.2.7.10 MultiGBASE-T AN control 1 register (Register 7.32)

The MASTER-SLAVE role affects the clock configuration:

-------------------------------------------------------------------------------
When the  PHY is configured as MASTER, the PMA Transmit function shall
source TX_TCLK from a local clock source. When configured as SLAVE, the
PMA Transmit function shall source TX_TCLK from the clock recovered from
data stream provided by MASTER.

iMX6Q                     KSZ9031                XXX
------\                /-----------\        /------------\
      |                |           |        |            |
 MAC  |<----RGMII----->| PHY Slave |<------>| PHY Master |
      |<--- 125 MHz ---+-<------/  |        | \          |
------/                \-----------/        \------------/
                                               ^
                                                \-TX_TCLK

-------------------------------------------------------------------------------

Since some clock or link related issues are only reproducible in a
specific MASTER-SLAVE-role, MAC and PHY configuration, it is beneficial
to provide generic (not 100BASE-T1 specific) interface to the user space
for configuration flexibility and trouble shooting.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06 17:45:45 -07:00
Alexei Starovoitov
f87b87a1c9 CAP_PERFMON for BPF
-----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl6zMIUTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoTEOEACWUbZlswquZqSAr6pESTjAbjSPZ58s
 QfmiY218aXvyvwJeINfJduYIjtVjPL9F0qbqmm5Sh4CFflgF9QRUZLxsyuu6K7uY
 Bsy7hjHWUCO79anxgjie1rqhCxT/orW39E0nlDW72AlrebVwPRc4PmERsV9bl/9z
 Z7M7aJzza60938K8qN24A63Q4wb3uCfygUYDGkeaN46jmwlHLj8Qwu/L2pVv2/3O
 7FHEHYFK0UuvVw6byRpbPoHDbBEGApYszRlEUMRtkk/7zsvdIQJGHQpPMD5PkGY1
 kS6x1a7sdnA7++A7Oin7Uq+0y7sgVNeJyPO9o+u9wZgePZL4t87YZlwIL5NlyXIL
 JHDrz0DAckSYEAYuPnFrXtYW2AQ9TZxVuIHGsJdKW8KOdkHkYUqtXwLuj+0xf8v/
 szeJR+l6mOJzDHML7W7KzZdl+AJB/+GE55cLaRPs0bBFyLpUs/vL8BjkoDiDm3/i
 okGIWQkh8PwpujiS/mHDHqoXuVpVHYAcHD0X+zuLIUVCzKf71Kq7y2fIiHyTR4Wo
 +x5aeOFWHYOC2DwFdUQ1EiOUCtLbORqq1CDsnIE4KCtsfx1K6IHmqI8X77D8CTEp
 oSPz0kd8kgBfHwPhCepQ7DnBA1cJTiRbs6/++frU0S5R2HhIOGeLN6hrhrMjNhYD
 07jorSqL6hjjog==
 =+J1X
 -----END PGP SIGNATURE-----

Merge tag 'perf-for-bpf-2020-05-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into bpf-next

CAP_PERFMON for BPF
2020-05-06 17:12:44 -07:00
Laurent Pinchart
e5b6b07a1b media: v4l2: Extend VIDIOC_ENUM_FMT to support MC-centric devices
The VIDIOC_ENUM_FMT ioctl enumerates all formats supported by a video
node. For MC-centric devices, its behaviour has always been ill-defined,
with drivers implementing one of the following behaviours:

- No support for VIDIOC_ENUM_FMT at all
- Enumerating all formats supported by the video node, regardless of the
  configuration of the pipeline
- Enumerating formats supported by the video node for the active
  configuration of the connected subdevice

The first behaviour is obviously useless for applications. The second
behaviour provides the most information, but doesn't offer a way to find
what formats are compatible with a given pipeline configuration. The
third behaviour fixes that, but with the drawback that applications
can't enumerate all supported formats anymore, and have to modify the
active configuration of the pipeline to enumerate formats.

The situation is messy as none of the implemented behaviours are ideal,
and userspace can't predict what will happen as the behaviour is
driver-specific.

To fix this, let's extend the VIDIOC_ENUM_FMT with a missing capability:
enumerating pixel formats for a given media bus code. The media bus code
is passed through the v4l2_fmtdesc structure in a new mbus_code field
(repurposed from the reserved fields). With this capability in place,
applications can enumerate pixel formats for a given media bus code
without modifying the active configuration of the device.

The current behaviour of the ioctl is preserved when the new mbus_code
field is set to 0, ensuring compatibility with existing userspace. The
API extension is documented as mandatory for MC-centric devices (as
advertised through the V4L2_CAP_IO_MC capability), allowing applications
and compliance tools to easily determine the availability of the
VIDIOC_ENUM_FMT extension.

Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2020-05-06 12:09:04 +02:00
Niklas Söderlund
f645e6256b media: v4l2-dev/ioctl: Add V4L2_CAP_IO_MC
Add a video device capability flag to indicate that its inputs and/or
outputs are controlled by the Media Controller instead of the V4L2 API.
When this flag is set, ioctl for enum inputs and outputs are
automatically enabled and programmed to call a helper function.

Suggested-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Helen Koike <helen.koike@collabora.com>
Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2020-05-06 12:08:25 +02:00
Will Deacon
80e4e56132 Merge branch 'for-next/bti-user' into for-next/bti
Merge in user support for Branch Target Identification, which narrowly
missed the cut for 5.7 after a late ABI concern.

* for-next/bti-user:
  arm64: bti: Document behaviour for dynamically linked binaries
  arm64: elf: Fix allnoconfig kernel build with !ARCH_USE_GNU_PROPERTY
  arm64: BTI: Add Kconfig entry for userspace BTI
  mm: smaps: Report arm64 guarded pages in smaps
  arm64: mm: Display guarded pages in ptdump
  KVM: arm64: BTI: Reset BTYPE when skipping emulated instructions
  arm64: BTI: Reset BTYPE when skipping emulated instructions
  arm64: traps: Shuffle code to eliminate forward declarations
  arm64: unify native/compat instruction skipping
  arm64: BTI: Decode BYTPE bits when printing PSTATE
  arm64: elf: Enable BTI at exec based on ELF program properties
  elf: Allow arch to tweak initial mmap prot flags
  arm64: Basic Branch Target Identification support
  ELF: Add ELF program property parsing support
  ELF: UAPI and Kconfig additions for ELF program properties
2020-05-05 15:15:58 +01:00
Eric Dumazet
39d010504e net_sched: sch_fq: add horizon attribute
QUIC servers would like to use SO_TXTIME, without having CAP_NET_ADMIN,
to efficiently pace UDP packets.

As far as sch_fq is concerned, we need to add safety checks, so
that a buggy application does not fill the qdisc with packets
having delivery time far in the future.

This patch adds a configurable horizon (default: 10 seconds),
and a configurable policy when a packet is beyond the horizon
at enqueue() time:
- either drop the packet (default policy)
- or cap its delivery time to the horizon.

$ tc -s -d qd sh dev eth0
qdisc fq 8022: root refcnt 257 limit 10000p flow_limit 100p buckets 1024
 orphan_mask 1023 quantum 10Kb initial_quantum 51160b low_rate_threshold 550Kbit
 refill_delay 40.0ms timer_slack 10.000us horizon 10.000s
 Sent 1234215879 bytes 837099 pkt (dropped 21, overlimits 0 requeues 6)
 backlog 0b 0p requeues 6
  flows 1191 (inactive 1177 throttled 0)
  gc 0 highprio 0 throttled 692 latency 11.480us
  pkts_too_long 0 alloc_errors 0 horizon_drops 21 horizon_caps 0

v2: fixed an overflow on 32bit kernels in fq_init(), reported
    by kbuild test robot <lkp@intel.com>

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-04 11:56:17 -07:00
Gustavo A. R. Silva
1e6e9d0f48 uapi: revert flexible-array conversions
These structures can get embedded in other structures in user-space
and cause all sorts of warnings and problems. So, we better don't take
any chances and keep the zero-length arrays in place for now.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
2020-05-04 11:30:15 -05:00
Ira Weiny
712b2698e4 fs/stat: Define DAX statx attribute
In order for users to determine if a file is currently operating in DAX
state (effective DAX).  Define a statx attribute value and set that
attribute if the effective DAX flag is set.

To go along with this we propose the following addition to the statx man
page:

STATX_ATTR_DAX

	The file is in the DAX (cpu direct access) state.  DAX state
	attempts to minimize software cache effects for both I/O and
	memory mappings of this file.  It requires a file system which
	has been configured to support DAX.

	DAX generally assumes all accesses are via cpu load / store
	instructions which can minimize overhead for small accesses, but
	may adversely affect cpu utilization for large transfers.

	File I/O is done directly to/from user-space buffers and memory
	mapped I/O may be performed with direct memory mappings that
	bypass kernel page cache.

	While the DAX property tends to result in data being transferred
	synchronously, it does not give the same guarantees of O_SYNC
	where data and the necessary metadata are transferred together.

	A DAX file may support being mapped with the MAP_SYNC flag,
	which enables a program to use CPU cache flush instructions to
	persist CPU store operations without an explicit fsync(2).  See
	mmap(2) for more information.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-05-04 08:49:39 -07:00
Vincent Cheng
d3f1cbd29f ptp: Add adjust_phase to ptp_clock_caps capability.
Add adjust_phase to ptp_clock_caps capability to allow
user to query if a PHC driver supports adjust phase with
ioctl PTP_CLOCK_GETCAPS command.

Signed-off-by: Vincent Cheng <vincent.cheng.xh@renesas.com>
Reviewed-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-02 16:31:45 -07:00
David S. Miller
115506fea4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Alexei Starovoitov says:

====================
pull-request: bpf-next 2020-05-01 (v2)

The following pull-request contains BPF updates for your *net-next* tree.

We've added 61 non-merge commits during the last 6 day(s) which contain
a total of 153 files changed, 6739 insertions(+), 3367 deletions(-).

The main changes are:

1) pulled work.sysctl from vfs tree with sysctl bpf changes.

2) bpf_link observability, from Andrii.

3) BTF-defined map in map, from Andrii.

4) asan fixes for selftests, from Andrii.

5) Allow bpf_map_lookup_elem for SOCKMAP and SOCKHASH, from Jakub.

6) production cloudflare classifier as a selftes, from Lorenz.

7) bpf_ktime_get_*_ns() helper improvements, from Maciej.

8) unprivileged bpftool feature probe, from Quentin.

9) BPF_ENABLE_STATS command, from Song.

10) enable bpf_[gs]etsockopt() helpers for sock_ops progs, from Stanislav.

11) enable a bunch of common helpers for cg-device, sysctl, sockopt progs,
 from Stanislav.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-01 17:02:27 -07:00
Po Liu
a51c328df3 net: qos: introduce a gate control flow action
Introduce a ingress frame gate control flow action.
Tc gate action does the work like this:
Assume there is a gate allow specified ingress frames can be passed at
specific time slot, and be dropped at specific time slot. Tc filter
chooses the ingress frames, and tc gate action would specify what slot
does these frames can be passed to device and what time slot would be
dropped.
Tc gate action would provide an entry list to tell how much time gate
keep open and how much time gate keep state close. Gate action also
assign a start time to tell when the entry list start. Then driver would
repeat the gate entry list cyclically.
For the software simulation, gate action requires the user assign a time
clock type.

Below is the setting example in user space. Tc filter a stream source ip
address is 192.168.0.20 and gate action own two time slots. One is last
200ms gate open let frame pass another is last 100ms gate close let
frames dropped. When the ingress frames have reach total frames over
8000000 bytes, the excessive frames will be dropped in that 200000000ns
time slot.

> tc qdisc add dev eth0 ingress

> tc filter add dev eth0 parent ffff: protocol ip \
	   flower src_ip 192.168.0.20 \
	   action gate index 2 clockid CLOCK_TAI \
	   sched-entry open 200000000 -1 8000000 \
	   sched-entry close 100000000 -1 -1

> tc chain del dev eth0 ingress chain 0

"sched-entry" follow the name taprio style. Gate state is
"open"/"close". Follow with period nanosecond. Then next item is internal
priority value means which ingress queue should put. "-1" means
wildcard. The last value optional specifies the maximum number of
MSDU octets that are permitted to pass the gate during the specified
time interval.
Base-time is not set will be 0 as default, as result start time would
be ((N + 1) * cycletime) which is the minimal of future time.

Below example shows filtering a stream with destination mac address is
10:00:80:00:00:00 and ip type is ICMP, follow the action gate. The gate
action would run with one close time slot which means always keep close.
The time cycle is total 200000000ns. The base-time would calculate by:

 1357000000000 + (N + 1) * cycletime

When the total value is the future time, it will be the start time.
The cycletime here would be 200000000ns for this case.

> tc filter add dev eth0 parent ffff:  protocol ip \
	   flower skip_hw ip_proto icmp dst_mac 10:00:80:00:00:00 \
	   action gate index 12 base-time 1357000000000 \
	   sched-entry close 200000000 -1 -1 \
	   clockid CLOCK_TAI

Signed-off-by: Po Liu <Po.Liu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-01 16:08:19 -07:00
Stanislav Fomichev
beecf11bc2 bpf: Bpf_{g,s}etsockopt for struct bpf_sock_addr
Currently, bpf_getsockopt and bpf_setsockopt helpers operate on the
'struct bpf_sock_ops' context in BPF_PROG_TYPE_SOCK_OPS program.
Let's generalize them and make them available for 'struct bpf_sock_addr'.
That way, in the future, we can allow those helpers in more places.

As an example, let's expose those 'struct bpf_sock_addr' based helpers to
BPF_CGROUP_INET{4,6}_CONNECT hooks. That way we can override CC before the
connection is made.

v3:
* Expose custom helpers for bpf_sock_addr context instead of doing
  generic bpf_sock argument (as suggested by Daniel). Even with
  try_socket_lock that doesn't sleep we have a problem where context sk
  is already locked and socket lock is non-nestable.

v2:
* s/BPF_PROG_TYPE_CGROUP_SOCKOPT/BPF_PROG_TYPE_SOCK_OPS/

Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200430233152.199403-1-sdf@google.com
2020-05-01 12:44:28 -07:00
Mauro Carvalho Chehab
883780af72 docs: networking: convert x25-iface.txt to ReST
Not much to be done here:

- add SPDX header;
- adjust title markup;
- remove a tail whitespace;
- add to networking/index.rst.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-01 12:24:40 -07:00
Linus Torvalds
477bfeb9a3 drm fixes for 5.7-rc4
core:
 - EDID off by one DTD fix
 - DP mst write return code fix
 
 dma-buf:
 - fix SET_NAME ioctl uapi
 - doc fixes
 
 amdgpu:
 - Fix a green screen on resume issue
 - PM fixes for SR-IOV
  SDMA fix for navi
 - Renoir display fixes
 - Cursor and pageflip stuttering fixes
 - Misc additional display fixes
 - (uapi) Add additional DCC tiling flags for navi1x
 
 i915:
 - Fix selftest refcnt leak (Xiyu)
 - Fix gem vma lock (Chris)
 - Fix gt's i915_request.timeline acquire by checking if cacheline is valid (Chris)
 - Fix IRQ postinistall fault masks (Matt)
 
 qxl:
 - use after gree fix
 - fix lost kunmap
 - release leak fix
 
 virtio:
 - context destruction fix
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJeq5AAAAoJEAx081l5xIa+Id8P/0JLdvrRpHkJgTKwZq2ZDXYn
 fSfEuj5TUnIeRX4G6Z8KOuKEZwCNL3NdCfDhg3WGhpZ1MBoYEfOOOgejRKeS3UrG
 ziY+zoxaZIVFpS30TC5XiWO79RjW8dJDJogF5XgBcm26aX2j27dzzaHeJZtdAjrS
 HTKZgneYd3+B7Aa/UfZmQinHKXee0AykM/gJNbqPqp4FPSHdfEEwn72MmVHyTh1H
 zrtmzC4tVx7mHbUlwrZCPUJQcbxVuA9E0bmsk8K1xDtnI1r/lB5LldkvV/HjaI9x
 b5VEAMRoBjgZOqaXC+C/RLBIh8eP4drcKXfjt4MMCXIB6IpqEcr54wVnC7JTA4KF
 pCYa3UChnmYceqAW+Yz+37F7PbWvfWnA4YQTWR4JwG+42rbR4nc6pRAh2HHCLPGR
 M+ZiqSOYs0/ROdR2nzD2XsUjSYFksjLGCpyq2OqqOkhm6kbF8eZsL2Rxj9VPwHLk
 PJhVR/d3Kc7fQpgbEeDvf/4jFlkodNv1/FauIpWRmXtt4WJM/vKPWwx80WvekhNh
 IOqTrVW3XOZM2XYXVw/xSebBVMbs4+PtO1hRmaA5+yJwQNqU2rc6qLptUVxpib6/
 Gwxyp8Wb+7K4vIC8+5QrcKaYByiLBiaFNTIH5vo6vUv/FZHo9veZaiMqKxBptCRq
 LWONsaxPQqpdHSc/R25Q
 =4eA4
 -----END PGP SIGNATURE-----

Merge tag 'drm-fixes-2020-05-01' of git://anongit.freedesktop.org/drm/drm

Pull drm fixes from Dave Airlie:
 "Regular scheduled fixes for graphics. Nothing to extreme bunch of
  amdgpu fixes, i915 and qxl fixes, along with some misc ones.

  All seems to be progressing normally.

  core:
   - EDID off by one DTD fix
   - DP mst write return code fix

  dma-buf:
   - fix SET_NAME ioctl uapi
   - doc fixes

  amdgpu:
   - Fix a green screen on resume issue
   - PM fixes for SR-IOV SDMA fix for navi
   - Renoir display fixes
   - Cursor and pageflip stuttering fixes
   - Misc additional display fixes
   - (uapi) Add additional DCC tiling flags for navi1x

  i915:
   - Fix selftest refcnt leak (Xiyu)
   - Fix gem vma lock (Chris)
   - Fix gt's i915_request.timeline acquire by checking if cacheline is
     valid (Chris)
   - Fix IRQ postinistall fault masks (Matt)

  qxl:
   - use after gree fix
   - fix lost kunmap
   - release leak fix

  virtio:
   - context destruction fix"

* tag 'drm-fixes-2020-05-01' of git://anongit.freedesktop.org/drm/drm: (26 commits)
  dma-buf: fix documentation build warnings
  drm/qxl: qxl_release use after free
  drm/qxl: lost qxl_bo_kunmap_atomic_page in qxl_image_init_helper()
  drm/i915: Use proper fault mask in interrupt postinstall too
  drm/amd/display: Use cursor locking to prevent flip delays
  drm/amd/display: Update downspread percent to match spreadsheet for DCN2.1
  drm/amd/display: Defer cursor update around VUPDATE for all ASIC
  drm/amd/display: fix rn soc bb update
  drm/amd/display: check if REFCLK_CNTL register is present
  drm/amdgpu: bump version for invalidate L2 before SDMA IBs
  drm/amdgpu: invalidate L2 before SDMA IBs (v2)
  drm/amdgpu: add tiling flags from Mesa
  drm/amd/powerplay: avoid using pm_en before it is initialized revised
  Revert "drm/amd/powerplay: avoid using pm_en before it is initialized"
  drm/qxl: qxl_release leak in qxl_hw_surface_alloc()
  drm/qxl: qxl_release leak in qxl_draw_dirty_fb()
  drm/virtio: only destroy created contexts
  drm/dp_mst: Fix drm_dp_send_dpcd_write() return code
  drm/i915/gt: Check cacheline is valid before acquiring
  drm/i915/gem: Hold obj->vma.lock over for_each_ggtt_vma()
  ...
2020-05-01 11:01:51 -07:00
Song Liu
d46edd671a bpf: Sharing bpf runtime stats with BPF_ENABLE_STATS
Currently, sysctl kernel.bpf_stats_enabled controls BPF runtime stats.
Typical userspace tools use kernel.bpf_stats_enabled as follows:

  1. Enable kernel.bpf_stats_enabled;
  2. Check program run_time_ns;
  3. Sleep for the monitoring period;
  4. Check program run_time_ns again, calculate the difference;
  5. Disable kernel.bpf_stats_enabled.

The problem with this approach is that only one userspace tool can toggle
this sysctl. If multiple tools toggle the sysctl at the same time, the
measurement may be inaccurate.

To fix this problem while keep backward compatibility, introduce a new
bpf command BPF_ENABLE_STATS. On success, this command enables stats and
returns a valid fd. BPF_ENABLE_STATS takes argument "type". Currently,
only one type, BPF_STATS_RUN_TIME, is supported. We can extend the
command to support other types of stats in the future.

With BPF_ENABLE_STATS, user space tool would have the following flow:

  1. Get a fd with BPF_ENABLE_STATS, and make sure it is valid;
  2. Check program run_time_ns;
  3. Sleep for the monitoring period;
  4. Check program run_time_ns again, calculate the difference;
  5. Close the fd.

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200430071506.1408910-2-songliubraving@fb.com
2020-05-01 10:36:32 -07:00
Felix Kuehling
0aeaaf64e6 drm/amdkfd: Fix comment formatting
Corrected two function names. Added a missing space.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-05-01 10:00:19 -04:00
Johannes Berg
d07dcf9aad netlink: add infrastructure to expose policies to userspace
Add, and use in generic netlink, helpers to dump out a netlink
policy to userspace, including all the range validation data,
nested policies etc.

This lets userspace discover what the kernel understands.

For families/commands other than generic netlink, the helpers
need to be used directly in an appropriate command, or we can
add some infrastructure (a new netlink family) that those can
register their policies with for introspection. I'm not that
familiar with non-generic netlink, so that's left out for now.

The data exposed to userspace also includes min and max length
for binary/string data, I've done that instead of letting the
userspace tools figure out whether min/max is intended based
on the type so that we can extend this later in the kernel, we
might want to just use the range data for example.

Because of this, I opted to not directly expose the NLA_*
values, even if some of them are already exposed via BPF, as
with min/max length we don't need to have different types here
for NLA_BINARY/NLA_MIN_LEN/NLA_EXACT_LEN, we just make them
all NL_ATTR_TYPE_BINARY with min/max length optionally set.

Similarly, we don't really need NLA_MSECS, and perhaps can
remove it in the future - but not if we encode it into the
userspace API now. It gets mapped to NL_ATTR_TYPE_U64 here.

Note that the exposing here corresponds to the strict policy
interpretation, and NLA_UNSPEC items are omitted entirely.
To get those, change them to NLA_MIN_LEN which behaves in
exactly the same way, but is exposed.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-30 17:51:42 -07:00
Dave Airlie
c62098c991 A few resources-related fixes for qxl, some doc build warnings and ioctl
fixes for dma-buf, an off-by-one fix in edid, and a return code fix in
 DP-MST
 -----BEGIN PGP SIGNATURE-----
 
 iHQEABYIAB0WIQRcEzekXsqa64kGDp7j7w1vZxhRxQUCXqrvVgAKCRDj7w1vZxhR
 xazeAPMFYoHj36L2cbr7lrUDER2s6cNdCpUGN0tuQx9fYjmQAP0RRCXpbfyFESvf
 MG5BBZvARO7OUtUCujogiPbmAVn1DQ==
 =r+Az
 -----END PGP SIGNATURE-----

Merge tag 'drm-misc-fixes-2020-04-30' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes

A few resources-related fixes for qxl, some doc build warnings and ioctl
fixes for dma-buf, an off-by-one fix in edid, and a return code fix in
DP-MST

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Maxime Ripard <maxime@cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20200430153201.wx6of2b2gsoip7bk@gilmour.lan
2020-05-01 10:42:09 +10:00
Mauro Carvalho Chehab
06bfa47e72 docs: networking: convert timestamping.txt to ReST
- add SPDX header;
- add a document title;
- adjust titles and chapters, adding proper markups;
- mark code blocks and literals as such;
- adjust identation, whitespaces and blank lines where needed;
- add to networking/index.rst.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-30 12:56:38 -07:00
Dmitry Yakunin
b1f3e43dbf inet_diag: add support for cgroup filter
This patch adds ability to filter sockets based on cgroup v2 ID.
Such filter is helpful in ss utility for filtering sockets by
cgroup pathname.

Signed-off-by: Dmitry Yakunin <zeil@yandex-team.ru>
Reviewed-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-30 12:54:02 -07:00
Dmitry Yakunin
6e3a401fc8 inet_diag: add cgroup id attribute
This patch adds cgroup v2 ID to common inet diag message attributes.
Cgroup v2 ID is kernfs ID (ino or ino+gen). This attribute allows filter
inet diag output by cgroup ID obtained by name_to_handle_at() syscall.
When net_cls or net_prio cgroup is activated this ID is equal to 1 (root
cgroup ID) for newly created sockets.

Some notes about this ID:

1) gets initialized in socket() syscall
2) incoming socket gets ID from listening socket
   (not during accept() syscall)
3) not changed when process get moved to another cgroup
4) can point to deleted cgroup (refcounting)

v2:
  - use CONFIG_SOCK_CGROUP_DATA instead if CONFIG_CGROUPS

v3:
  - fix attr size by using nla_total_size_64bit() (Eric Dumazet)
  - more detailed commit message (Konstantin Khlebnikov)

Signed-off-by: Dmitry Yakunin <zeil@yandex-team.ru>
Reviewed-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Acked-By: Tejun Heo <tj@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-30 12:54:01 -07:00
Tom Lendacky
97f9ac3db6 crypto: ccp - Add support for SEV-ES to the PSP driver
To provide support for SEV-ES, the hypervisor must provide an area of
memory to the PSP. Once this Trusted Memory Region (TMR) is provided to
the PSP, the contents of this area of memory are no longer available to
the x86.

Update the PSP driver to allocate a 1MB region for the TMR that is 1MB
aligned and then provide it to the PSP through the SEV INIT command.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Brijesh Singh <brijesh.singh@amd.com>
Reviewed-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-04-30 15:19:33 +10:00
David S. Miller
323e395f19 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains Netfilter updates for nf-next:

1) Add IPS_HW_OFFLOAD status bit, from Bodong Wang.

2) Remove 128-bit limit on the set element data area, rise it
   to 64 bytes.

3) Report EOPNOTSUPP for unsupported NAT types and flags.

4) Set up nft_nat flags from the control plane path.

5) Add helper functions to set up the nf_nat_range2 structure.

6) Add netmap support for nft_nat.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-29 14:14:44 -07:00
Andrii Nakryiko
f2e10bff16 bpf: Add support for BPF_OBJ_GET_INFO_BY_FD for bpf_link
Add ability to fetch bpf_link details through BPF_OBJ_GET_INFO_BY_FD command.
Also enhance show_fdinfo to potentially include bpf_link type-specific
information (similarly to obj_info).

Also introduce enum bpf_link_type stored in bpf_link itself and expose it in
UAPI. bpf_link_tracing also now will store and return bpf_attach_type.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200429001614.1544-5-andriin@fb.com
2020-04-28 17:27:08 -07:00
Andrii Nakryiko
2d602c8cf4 bpf: Support GET_FD_BY_ID and GET_NEXT_ID for bpf_link
Add support to look up bpf_link by ID and iterate over all existing bpf_links
in the system. GET_FD_BY_ID code handles not-yet-ready bpf_link by checking
that its ID hasn't been set to non-zero value yet. Setting bpf_link's ID is
done as the very last step in finalizing bpf_link, together with installing
FD. This approach allows users of bpf_link in kernel code to not worry about
races between user-space and kernel code that hasn't finished attaching and
initializing bpf_link.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200429001614.1544-4-andriin@fb.com
2020-04-28 17:27:08 -07:00
Andrii Nakryiko
a3b80e1078 bpf: Allocate ID for bpf_link
Generate ID for each bpf_link using IDR, similarly to bpf_map and bpf_prog.
bpf_link creation, initialization, attachment, and exposing to user-space
through FD and ID is a complicated multi-step process, abstract it away
through bpf_link_primer and bpf_link_prime(), bpf_link_settle(), and
bpf_link_cleanup() internal API. They guarantee that until bpf_link is
properly attached, user-space won't be able to access partially-initialized
bpf_link either from FD or ID. All this allows to simplify bpf_link attachment
and error handling code.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200429001614.1544-3-andriin@fb.com
2020-04-28 17:27:08 -07:00
Oak Zeng
5bb4b78be9 drm/amdkfd: New IOCTL to allocate queue GWS (v2)
Add a new kfd ioctl to allocate queue GWS. Queue
GWS is released on queue destroy.

v2: re-introduce this API with the following fixes squashed in:
- drm/amdkfd: fix null pointer dereference on dev
- drm/amdkfd: Return proper error code for gws alloc API
- drm/amdkfd: Remove GPU ID in GWS queue creation

Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-28 16:20:30 -04:00
Pablo Neira Ayuso
3ff7ddb135 netfilter: nft_nat: add netmap support
This patch allows you to NAT the network address prefix onto another
network address prefix, a.k.a. netmapping.

Userspace must specify the NF_NAT_RANGE_NETMAP flag and the prefix
address through the NFTA_NAT_REG_ADDR_MIN and NFTA_NAT_REG_ADDR_MAX
netlink attributes.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-04-28 00:53:54 +02:00
Richard Guy Briggs
9d2161bed4 audit: log audit netlink multicast bind and unbind
Log information about programs connecting to and disconnecting from the
audit netlink multicast socket. This is needed so that during
investigations a security officer can tell who or what had access to the
audit trail.  This helps to meet the FAU_SAR.2 requirement for Common
Criteria.

Here is the systemd startup event:
type=PROCTITLE msg=audit(2020-04-22 10:10:21.787:10) : proctitle=/init
type=SYSCALL msg=audit(2020-04-22 10:10:21.787:10) : arch=x86_64 syscall=bind success=yes exit=0 a0=0x19 a1=0x555f4aac7e90 a2=0xc a3=0x7ffcb792ff44 items=0 ppid=0 pid=1 auid=unset uid=root gid=root euid=root suid=root fsuid=root egid=root sgid=root fsgid=root tty=(none) ses=unset comm=systemd exe=/usr/lib/systemd/systemd subj=kernel key=(null)
type=UNKNOWN[1335] msg=audit(2020-04-22 10:10:21.787:10) : pid=1 uid=root auid=unset tty=(none) ses=unset subj=kernel comm=systemd exe=/usr/lib/systemd/systemd nl-mcgrp=1 op=connect res=yes

And events from the test suite that just uses close():
type=PROCTITLE msg=audit(2020-04-22 11:47:08.501:442) : proctitle=/usr/bin/perl -w amcast_joinpart/test
type=SYSCALL msg=audit(2020-04-22 11:47:08.501:442) : arch=x86_64 syscall=bind success=yes exit=0 a0=0x7 a1=0x563004378760 a2=0xc a3=0x0 items=0 ppid=815 pid=818 auid=root uid=root gid=root euid=root suid=root fsuid=root egid=root sgid=root fsgid=root tty=ttyS0 ses=1 comm=perl exe=/usr/bin/perl subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 key=(null)
type=UNKNOWN[1335] msg=audit(2020-04-22 11:47:08.501:442) : pid=818 uid=root auid=root tty=ttyS0 ses=1 subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 comm=perl exe=/usr/bin/perl nl-mcgrp=1 op=connect res=yes

type=UNKNOWN[1335] msg=audit(2020-04-22 11:47:08.501:443) : pid=818 uid=root auid=root tty=ttyS0 ses=1 subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 comm=perl exe=/usr/bin/perl nl-mcgrp=1 op=disconnect res=yes

And the events from the test suite using setsockopt with NETLINK_DROP_MEMBERSHIP:
type=PROCTITLE msg=audit(2020-04-22 11:39:53.291:439) : proctitle=/usr/bin/perl -w amcast_joinpart/test
type=SYSCALL msg=audit(2020-04-22 11:39:53.291:439) : arch=x86_64 syscall=bind success=yes exit=0 a0=0x7 a1=0x5560877c2d20 a2=0xc a3=0x0 items=0 ppid=772 pid=775 auid=root uid=root gid=root euid=root suid=root fsuid=root egid=root sgid=root fsgid=root tty=ttyS0 ses=1 comm=perl exe=/usr/bin/perl subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 key=(null)
type=UNKNOWN[1335] msg=audit(2020-04-22 11:39:53.291:439) : pid=775 uid=root auid=root tty=ttyS0 ses=1 subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 comm=perl exe=/usr/bin/perl nl-mcgrp=1 op=connect res=yes

type=PROCTITLE msg=audit(2020-04-22 11:39:53.292:440) : proctitle=/usr/bin/perl -w amcast_joinpart/test
type=SYSCALL msg=audit(2020-04-22 11:39:53.292:440) : arch=x86_64 syscall=setsockopt success=yes exit=0 a0=0x7 a1=SOL_NETLINK a2=0x2 a3=0x7ffc8366f000 items=0 ppid=772 pid=775 auid=root uid=root gid=root euid=root suid=root fsuid=root egid=root sgid=root fsgid=root tty=ttyS0 ses=1 comm=perl exe=/usr/bin/perl subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 key=(null)
type=UNKNOWN[1335] msg=audit(2020-04-22 11:39:53.292:440) : pid=775 uid=root auid=root tty=ttyS0 ses=1 subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 comm=perl exe=/usr/bin/perl nl-mcgrp=1 op=disconnect res=yes

Please see the upstream issue tracker at
  https://github.com/linux-audit/audit-kernel/issues/28
With the feature description at
  https://github.com/linux-audit/audit-kernel/wiki/RFE-Audit-Multicast-Socket-Join-Part
The testsuite support is at
  https://github.com/rgbriggs/audit-testsuite/compare/ghak28-mcast-part-join
  https://github.com/linux-audit/audit-testsuite/pull/93
And the userspace support patch is at
  https://github.com/linux-audit/audit-userspace/pull/114

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-04-27 18:49:31 -04:00
Linus Torvalds
869997be0e hyperv-fixes for 5.7-rc4
-----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCAAxFiEEIbPD0id6easf0xsudhRwX5BBoF4FAl6mwOETHHdlaS5saXVA
 a2VybmVsLm9yZwAKCRB2FHBfkEGgXrFLB/4yKsrl41WwYRbTKgiir576/LA0vGxQ
 cZjUQwkVv3S5/AfhvpwiGFV4dBV6j81KtNhRE6luaa3FBHObnjrx5tNqMw/P8a0j
 HZGZ68n4qE+OPVtTxj54s81iWIi9vgT/La92GPYhuXoiVPTd5zJ2lwY3so04BSFJ
 p30+RZFKNkTjNYZNZSHcoodr+js4Uws8JSn8OmpCJr8Gt+FJqkujQROG3HMKhJlk
 KlJlCJhV48tj/nlgcbGHBF0Yy5l8DVCaKIz+MiF5F/i+P8r0cErfyihc9Ene0/un
 LNFhIVGn8/MTi0CVrltcnur2qFH1qPCuLolKSpd/FKd6H2UDgK16XgAd
 =NJP/
 -----END PGP SIGNATURE-----

Merge tag 'hyperv-fixes-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux

Pull Hyper-V fixes from Wei Liu:

 - Two patches from Dexuan fixing suspension bugs

 - Three cleanup patches from Andy and Michael

* tag 'hyperv-fixes-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
  hyper-v: Remove internal types from UAPI header
  hyper-v: Use UUID API for exporting the GUID
  x86/hyperv: Suspend/resume the VP assist page for hibernation
  Drivers: hv: Move AEOI determination to architecture dependent code
  Drivers: hv: vmbus: Fix Suspend-to-Idle for Generation-2 VM
2020-04-27 13:28:27 -07:00
Horatiu Vultur
3e54442c93 net: bridge: Add port attribute IFLA_BRPORT_MRP_RING_OPEN
This patch adds a new port attribute, IFLA_BRPORT_MRP_RING_OPEN, which allows
to notify the userspace when the port lost the continuite of MRP frames.

This attribute is set by kernel whenever the SW or HW detects that the ring is
being open or closed.

Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-27 11:40:25 -07:00
Horatiu Vultur
4714d13791 bridge: uapi: mrp: Add mrp attributes.
Add new nested netlink attribute to configure the MRP. These attributes are used
by the userspace to add/delete/configure MRP instances and by the kernel to
notify the userspace when the MRP ring gets open/closed. MRP nested attribute
has the following attributes:

IFLA_BRIDGE_MRP_INSTANCE - the parameter type is br_mrp_instance which contains
  the instance id, and the ifindex of the two ports. The ports can't be part of
  multiple instances. This is used to create/delete MRP instances.

IFLA_BRIDGE_MRP_PORT_STATE - the parameter type is u32. Which can be forwarding,
  blocking or disabled.

IFLA_BRIDGE_MRP_PORT_ROLE - the parameter type is br_mrp_port_role which
  contains the instance id and the role. The role can be primary or secondary.

IFLA_BRIDGE_MRP_RING_STATE - the parameter type is br_mrp_ring_state which
  contains the instance id and the state. The state can be open or closed.

IFLA_BRIDGE_MRP_RING_ROLE - the parameter type is br_mrp_ring_role which
  contains the instance id and the ring role. The role can be MRM or MRC.

IFLA_BRIDGE_MRP_START_TEST - the parameter type is br_mrp_start_test which
  contains the instance id, the interval at which to send the MRP_Test frames,
  how many test frames can be missed before declaring the ring open and the
  period which represent for how long to send the test frames.

Also add the file include/uapi/linux/mrp_bridge.h which defines all the types
used by MRP that are also needed by the userpace.

Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-27 11:40:25 -07:00
Daniel Vetter
a5bff92eaa dma-buf: Fix SET_NAME ioctl uapi
The uapi is the same on 32 and 64 bit, but the number isn't. Everyone
who botched this please re-read:

https://www.kernel.org/doc/html/v5.4-preprc-cpu/ioctl/botching-up-ioctls.html

Also, the type argument for the ioctl macros is for the type the void
__user *arg pointer points at, which in this case would be the
variable-sized char[] of a 0 terminated string. So this was botched in
more than just the usual ways.

Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Chenbo Feng <fengc@google.com>
Cc: Greg Hackmann <ghackmann@google.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: linux-media@vger.kernel.org
Cc: linaro-mm-sig@lists.linaro.org
Cc: minchan@kernel.org
Cc: surenb@google.com
Cc: jenhaochen@google.com
Cc: Martin Liu <liumartin@google.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Tested-by: Martin Liu <liumartin@google.com>
Reviewed-by: Martin Liu <liumartin@google.com>
Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org>
  [sumits: updated some checkpatch fixes, corrected author email]
Link: https://patchwork.freedesktop.org/patch/msgid/20200407133002.3486387-1-daniel.vetter@ffwll.ch
2020-04-27 16:29:41 +05:30
Bodong Wang
74f99482ea netfilter: nf_conntrack: add IPS_HW_OFFLOAD status bit
This bit indicates that the conntrack entry is offloaded to hardware
flow table. nf_conntrack entry will be tagged with [HW_OFFLOAD] if
it's offload to hardware.

cat /proc/net/nf_conntrack
	ipv4 2 tcp 6 \
	src=1.1.1.17 dst=1.1.1.16 sport=56394 dport=5001 \
	src=1.1.1.16 dst=1.1.1.17 sport=5001 dport=56394 [HW_OFFLOAD] \
	mark=0 zone=0 use=3

Note that HW_OFFLOAD/OFFLOAD/ASSURED are mutually exclusive.

Changelog:

* V1->V2:
- Remove check of lastused from stats. It was meant for cases such
  as removing driver module while traffic still running. Better to
  handle such cases from garbage collector.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-04-26 23:54:35 +02:00
Maciej Żenczykowski
71d1921477 bpf: add bpf_ktime_get_boot_ns()
On a device like a cellphone which is constantly suspending
and resuming CLOCK_MONOTONIC is not particularly useful for
keeping track of or reacting to external network events.
Instead you want to use CLOCK_BOOTTIME.

Hence add bpf_ktime_get_boot_ns() as a mirror of bpf_ktime_get_ns()
based around CLOCK_BOOTTIME instead of CLOCK_MONOTONIC.

Signed-off-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-04-26 09:43:05 -07:00
David S. Miller
d483389678 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Simple overlapping changes to linux/vermagic.h

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-25 20:18:53 -07:00
Linus Torvalds
ab51cac00e Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from David Miller:

 1) Fix memory leak in netfilter flowtable, from Roi Dayan.

 2) Ref-count leaks in netrom and tipc, from Xiyu Yang.

 3) Fix warning when mptcp socket is never accepted before close, from
    Florian Westphal.

 4) Missed locking in ovs_ct_exit(), from Tonghao Zhang.

 5) Fix large delays during PTP synchornization in cxgb4, from Rahul
    Lakkireddy.

 6) team_mode_get() can hang, from Taehee Yoo.

 7) Need to use kvzalloc() when allocating fw tracer in mlx5 driver,
    from Niklas Schnelle.

 8) Fix handling of bpf XADD on BTF memory, from Jann Horn.

 9) Fix BPF_STX/BPF_B encoding in x86 bpf jit, from Luke Nelson.

10) Missing queue memory release in iwlwifi pcie code, from Johannes
    Berg.

11) Fix NULL deref in macvlan device event, from Taehee Yoo.

12) Initialize lan87xx phy correctly, from Yuiko Oshino.

13) Fix looping between VRF and XFRM lookups, from David Ahern.

14) etf packet scheduler assumes all sockets are full sockets, which is
    not necessarily true. From Eric Dumazet.

15) Fix mptcp data_fin handling in RX path, from Paolo Abeni.

16) fib_select_default() needs to handle nexthop objects, from David
    Ahern.

17) Use GFP_ATOMIC under spinlock in mac80211_hwsim, from Wei Yongjun.

18) vxlan and geneve use wrong nlattr array, from Sabrina Dubroca.

19) Correct rx/tx stats in bcmgenet driver, from Doug Berger.

20) BPF_LDX zero-extension is encoded improperly in x86_32 bpf jit, fix
    from Luke Nelson.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (100 commits)
  selftests/bpf: Fix a couple of broken test_btf cases
  tools/runqslower: Ensure own vmlinux.h is picked up first
  bpf: Make bpf_link_fops static
  bpftool: Respect the -d option in struct_ops cmd
  selftests/bpf: Add test for freplace program with expected_attach_type
  bpf: Propagate expected_attach_type when verifying freplace programs
  bpf: Fix leak in LINK_UPDATE and enforce empty old_prog_fd
  bpf, x86_32: Fix logic error in BPF_LDX zero-extension
  bpf, x86_32: Fix clobbering of dst for BPF_JSET
  bpf, x86_32: Fix incorrect encoding in BPF_LDX zero-extension
  bpf: Fix reStructuredText markup
  net: systemport: suppress warnings on failed Rx SKB allocations
  net: bcmgenet: suppress warnings on failed Rx SKB allocations
  macsec: avoid to set wrong mtu
  mac80211: sta_info: Add lockdep condition for RCU list usage
  mac80211: populate debugfs only after cfg80211 init
  net: bcmgenet: correct per TX/RX ring statistics
  net: meth: remove spurious copyright text
  net: phy: bcm84881: clear settings on link down
  chcr: Fix CPU hard lockup
  ...
2020-04-24 19:17:30 -07:00
Jakub Wilk
a33d314794 bpf: Fix reStructuredText markup
The patch fixes:
$ scripts/bpf_helpers_doc.py > bpf-helpers.rst
$ rst2man bpf-helpers.rst > bpf-helpers.7
bpf-helpers.rst:1105: (WARNING/2) Inline strong start-string without end-string.

Signed-off-by: Jakub Wilk <jwilk@jwilk.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20200422082324.2030-1-jwilk@jwilk.net
2020-04-24 17:01:26 -07:00
David Matlack
acd05785e4 kvm: add capability for halt polling
KVM_CAP_HALT_POLL is a per-VM capability that lets userspace
control the halt-polling time, allowing halt-polling to be tuned or
disabled on particular VMs.

With dynamic halt-polling, a VM's VCPUs can poll from anywhere from
[0, halt_poll_ns] on each halt. KVM_CAP_HALT_POLL sets the
upper limit on the poll time.

Signed-off-by: David Matlack <dmatlack@google.com>
Signed-off-by: Jon Cargille <jcargill@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Message-Id: <20200417221446.108733-1-jcargill@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-04-24 12:53:17 -04:00
Johannes Berg
155d7c7338 nl80211: allow client-only BIGTK support
The current NL80211_EXT_FEATURE_BEACON_PROTECTION feature flag
requires both AP and client support, add a new one called
NL80211_EXT_FEATURE_BEACON_PROTECTION_CLIENT that enables only
support in client (and P2P-client) modes.

Link: https://lore.kernel.org/r/20200420140559.6ba704053a5a.Ifeb869fb0b48e52fe0cb9c15572b93ac8a924f8d@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-04-24 12:33:41 +02:00
Johannes Berg
9dba48a6ec cfg80211: support multicast RX registration
For DPP, there's a need to receive multicast action frames,
but many drivers need a special filter configuration for this.

Support announcing from userspace in the management registration
that multicast RX is required, with an extended feature flag if
the driver handles this.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Reviewed-by: Sergey Matyukevich <sergey.matyukevich.os@quantenna.com>
Link: https://lore.kernel.org/r/20200417124013.c46238801048.Ib041d437ce0bff28a0c6d5dc915f68f1d8591002@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-04-24 12:33:41 +02:00
Jouni Malinen
4d797fce78 cfg80211: Unprotected Beacon frame RX indication
Extend cfg80211_rx_unprot_mlme_mgmt() to cover indication of unprotected
Beacon frames in addition to the previously used Deauthentication and
Disassociation frames. The Beacon frame case is quite similar, but has
couple of exceptions: this is used both with fully unprotected and also
incorrectly protected frames and there is a rate limit on the events to
avoid unnecessary flooding netlink events in case something goes wrong.

Signed-off-by: Jouni Malinen <jouni@codeaurora.org>
Link: https://lore.kernel.org/r/20200401142548.6990-1-jouni@codeaurora.org
[add missing kernel-doc]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-04-24 12:33:40 +02:00
Andy Shevchenko
f081bbb3fd hyper-v: Remove internal types from UAPI header
The uuid_le mistakenly comes to be an UAPI type. Since it's luckily not used by
Hyper-V APIs, we may replace with POD types, i.e. __u8 array.

Note, previously shared uuid_be had been removed from UAPI few releases ago.
This is a continuation of that process towards removing uuid_le one.

Note, there is no ABI change!

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20200422131818.23088-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2020-04-22 21:10:05 +01:00
Ingo Molnar
87cfeb1920 perf/core fixes and improvements:
kernel + tools/perf:
 
   Alexey Budankov:
 
   - Introduce CAP_PERFMON to kernel and user space.
 
 callchains:
 
   Adrian Hunter:
 
   - Allow using Intel PT to synthesize callchains for regular events.
 
   Kan Liang:
 
   - Stitch LBR records from multiple samples to get deeper backtraces,
     there are caveats, see the csets for details.
 
 perf script:
 
   Andreas Gerstmayr:
 
   - Add flamegraph.py script
 
 BPF:
 
   Jiri Olsa:
 
   - Synthesize bpf_trampoline/dispatcher ksymbol events.
 
 perf stat:
 
   Arnaldo Carvalho de Melo:
 
   - Honour --timeout for forked workloads.
 
   Stephane Eranian:
 
   - Force error in fallback on :k events, to avoid counting nothing when
     the user asks for kernel events but is not allowed to.
 
 perf bench:
 
   Ian Rogers:
 
   - Add event synthesis benchmark.
 
 tools api fs:
 
   Stephane Eranian:
 
  - Make xxx__mountpoint() more scalable
 
 libtraceevent:
 
   He Zhe:
 
   - Handle return value of asprintf.
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCXp2LlQAKCRCyPKLppCJ+
 J95oAP0ZihVUhESv/gdeX0IDE5g6Rd2V6LNcRj+jb7gX9NlQkwD/UfS454WV1ftQ
 qTwrkKPzY/5Tm2cLuVE7r7fJ6naDHgU=
 =FHm4
 -----END PGP SIGNATURE-----

Merge tag 'perf-core-for-mingo-5.8-20200420' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core

Pull perf/core fixes and improvements from Arnaldo Carvalho de Melo:

kernel + tools/perf:

  Alexey Budankov:

  - Introduce CAP_PERFMON to kernel and user space.

callchains:

  Adrian Hunter:

  - Allow using Intel PT to synthesize callchains for regular events.

  Kan Liang:

  - Stitch LBR records from multiple samples to get deeper backtraces,
    there are caveats, see the csets for details.

perf script:

  Andreas Gerstmayr:

  - Add flamegraph.py script

BPF:

  Jiri Olsa:

  - Synthesize bpf_trampoline/dispatcher ksymbol events.

perf stat:

  Arnaldo Carvalho de Melo:

  - Honour --timeout for forked workloads.

  Stephane Eranian:

  - Force error in fallback on :k events, to avoid counting nothing when
    the user asks for kernel events but is not allowed to.

perf bench:

  Ian Rogers:

  - Add event synthesis benchmark.

tools api fs:

  Stephane Eranian:

 - Make xxx__mountpoint() more scalable

libtraceevent:

  He Zhe:

  - Handle return value of asprintf.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-04-22 14:08:28 +02:00
Linus Torvalds
189522da8b virtio: fixes, cleanups
Some bug fixes.
 Cleanup a couple of issues that surfaced meanwhile.
 Disable vhost on ARM with OABI for now - to be fixed
 fully later in the cycle or in the next release.
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAl6d6ZgPHG1zdEByZWRo
 YXQuY29tAAoJECgfDbjSjVRpH3oH/0bJ6o+FiAi8xXgYqm9XXmswrZoZLahjyPay
 dA7Sz5nNKVtdSGH9o0wRdcekt0SOI3ilZSkv9nwt9ep/5YzC3brf2hry+nPvMTsA
 MhI3IAa7sK1vCXkftwOlx+SIeDfIwsqr+h4SCfMRxlIT0yAmOC8fl2ByT2dIbqnj
 dlzwczecHI9LPUEmRWiKH/4Tj5MPZN5IeFSIAE+nA/9cl5h4qVSfYtWD3Y4VQ82g
 Rv3mvVE+chaVbPxewaBZ8Y0Avti4tMyzsE0MY+dz5xfh+75hqMfygg//1osbEAbz
 SiL5dDcANe8Q+QOc/BxHdj4dqpqUp1ldV+3Lge9k4lWAGnsEMEk=
 =GZb2
 -----END PGP SIGNATURE-----

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Pull virtio fixes and cleanups from Michael Tsirkin:

 - Some bug fixes

 - Cleanup a couple of issues that surfaced meanwhile

 - Disable vhost on ARM with OABI for now - to be fixed fully later in
   the cycle or in the next release.

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (24 commits)
  vhost: disable for OABI
  virtio: drop vringh.h dependency
  virtio_blk: add a missing include
  virtio-balloon: Avoid using the word 'report' when referring to free page hinting
  virtio-balloon: make virtballoon_free_page_report() static
  vdpa: fix comment of vdpa_register_device()
  vdpa: make vhost, virtio depend on menu
  vdpa: allow a 32 bit vq alignment
  drm/virtio: fix up for include file changes
  remoteproc: pull in slab.h
  rpmsg: pull in slab.h
  virtio_input: pull in slab.h
  remoteproc: pull in slab.h
  virtio-rng: pull in slab.h
  virtgpu: pull in uaccess.h
  tools/virtio: make asm/barrier.h self contained
  tools/virtio: define aligned attribute
  virtio/test: fix up after IOTLB changes
  vhost: Create accessors for virtqueues private_data
  vdpasim: Return status in vdpasim_get_status
  ...
2020-04-21 12:27:18 -07:00
Maheshwar Ajja
1ca3cb46a9 media: v4l2-ctrl: Add H264 profile and levels
Add H264 profile "Contrained High" and H264 levels "5.2",
"6.0", "6.1" and "6.2".

Signed-off-by: Maheshwar Ajja <majja@codeaurora.org>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2020-04-21 13:26:26 +02:00
Mauro Carvalho Chehab
af690f4593 firewire: firewire-cdev.hL get rid of a docs warning
This warning:

	./include/uapi/linux/firewire-cdev.h:312: WARNING: Inline literal start-string without end-string.

is because %FOO doesn't work if there's a parenthesis at the
string (as a parenthesis may indicate a function). So, mark
the literal block using the alternate ``FOO`` syntax.

Acked-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/9b2501a41eba27ccdd4603cac2353c0efba7a90a.1586881715.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2020-04-20 15:45:41 -06:00
Mauro Carvalho Chehab
3ecad8c2c1 docs: fix broken references for ReST files that moved around
Some broken references happened due to shifting files around
and ReST renames. Those can't be auto-fixed by the script,
so let's fix them manually.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Acked-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Link: https://lore.kernel.org/r/64773a12b4410aaf3e3be89e3ec7e34de2484eea.1586881715.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2020-04-20 15:45:03 -06:00
Mauro Carvalho Chehab
72ef5e52b3 docs: fix broken references to text files
Several references got broken due to txt to ReST conversion.

Several of them can be automatically fixed with:

	scripts/documentation-file-ref-check --fix

Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org> # hwtracing/coresight/Kconfig
Reviewed-by: Paul E. McKenney <paulmck@kernel.org> # memory-barrier.txt
Acked-by: Alex Shi <alex.shi@linux.alibaba.com> # translations/zh_CN
Acked-by: Federico Vaga <federico.vaga@vaga.pv.it> # translations/it_IT
Acked-by: Marc Zyngier <maz@kernel.org> # kvm/arm64
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/6f919ddb83a33b5f2a63b6b5f0575737bb2b36aa.1586881715.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2020-04-20 15:35:59 -06:00
Andrew Lunn
eec517cdb4 net: Add IF_OPER_TESTING
RFC 2863 defines the operational state testing. Add support for this
state, both as a IF_LINK_MODE_ and __LINK_STATE_.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-20 12:43:24 -07:00
Sumit Garg
104edb94cc tee: add private login method for kernel clients
There are use-cases where user-space shouldn't be allowed to communicate
directly with a TEE device which is dedicated to provide a specific
service for a kernel client. So add a private login method for kernel
clients and disallow user-space to open-session using GP implementation
defined login method range: (0x80000000 - 0xBFFFFFFF).

Reviewed-by: Jerome Forissier <jerome@forissier.org>
Signed-off-by: Sumit Garg <sumit.garg@linaro.org>
Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>
2020-04-20 16:18:14 +02:00
Antony Antony
29e4276667 xfrm: fix error in comment
s/xfrm_state_offload/xfrm_user_offload/

Fixes: d77e38e612 ("xfrm: Add an IPsec hardware offloading API")
Signed-off-by: Antony Antony <antony@phenome.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2020-04-20 07:26:42 +02:00
Linus Torvalds
1340283741 flexible-array member convertion patches for 5.7-rc2
Hi Linus,
 
 Please, pull the following patches that replace zero-length arrays with
 flexible-array members.
 
 The current codebase makes use of the zero-length array language
 extension to the C90 standard, but the preferred mechanism to declare
 variable-length types such as these ones is a flexible array member[1][2],
 introduced in C99:
 
 struct foo {
         int stuff;
         struct boo array[];
 };
 
 By making use of the mechanism above, we will get a compiler warning
 in case the flexible array does not occur last in the structure, which
 will help us prevent some kind of undefined behavior bugs from being
 inadvertently introduced[3] to the codebase from now on.
 
 Also, notice that, dynamic memory allocations won't be affected by
 this change:
 
 "Flexible array members have incomplete type, and so the sizeof operator
 may not be applied. As a quirk of the original implementation of
 zero-length arrays, sizeof evaluates to zero."[1]
 
 sizeof(flexible-array-member) triggers a warning because flexible array
 members have incomplete type[1]. There are some instances of code in
 which the sizeof operator is being incorrectly/erroneously applied to
 zero-length arrays and the result is zero. Such instances may be hiding
 some bugs. So, this work (flexible-array member convertions) will also
 help to get completely rid of those sorts of issues.
 
 Notice that all of these patches have been baking in linux-next for
 quite a while now and, 238 more of these patches have already been
 merged into 5.7-rc1.
 
 There are a couple hundred more of these issues waiting to be addressed
 in the whole codebase.
 
 [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
 [2] https://github.com/KSPP/linux/issues/21
 [3] commit 7649773293 ("cxgb3/l2t: Fix undefined behaviour")
 
 Thanks
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEkmRahXBSurMIg1YvRwW0y0cG2zEFAl6bccgACgkQRwW0y0cG
 2zFYvBAAl5tsoZsb6h5o7+XpWetl2BfA8lRelXWg1la9mF+Zqgqz8raubs+EbR8f
 65yz1lvoOl3jgeu1pQnx+AaDdG88Yu66BjPpFz/n8WWBjNC0z3M4Xcu+pFUanEzO
 QqkCPryj6RlqCYL/WlSCifo+ZOAeM7jlw/2kkX1ILVwjYItFPJIw+5IEPrM0ucN2
 tFp9H3iKOlA6PDuj4JO2xCnlUkL5aZk101qKqm41yZLLiS8zE8or4+s8Y7c7yDDP
 ajQ+uCzJpt/VCn6Iyri0oZ5hp+gI6jJ8ox1Vo0UCuWQ2RJ7E2FE5qhhctwB4UYsg
 +B6c1yckJqUoJ1c7Bbj00gsNMns3A7uLHFDOGBKQTjkRCn5+QV1wVvv5TJx2LJYL
 EBt07IfS0YAv0EBIbJyxqzmWCt0unKCu3i1KePp/FYqq291dpr39olUMCa1+Qg98
 v1VTGUlOvONy3v41tDx+Bfkt/0ebT8pogyenA51cjsD0bUZ3I/BsGxigXf0myLuy
 6yFjx7f6ng2I3uBDSZ+H/KUM51H6yhB9UCQuQCSqHDU3iEHvh7dDdumD3A9OJyLw
 nPC2HQhTOHVkbtg/E0KFh/ak1PoELCH3CR1Kgj/NSOG2Mz5tgtBfoxa+GwJTvJha
 9m5JrBQcT7qF7pGtZU0NDQICrhhvUEX/Hwo3QAtYInWPsV3S+5U=
 =GsIm
 -----END PGP SIGNATURE-----

Merge tag 'flexible-array-member-5.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux

Pull flexible-array member conversion from Gustavo Silva:
 "The current codebase makes use of the zero-length array language
  extension to the C90 standard, but the preferred mechanism to declare
  variable-length types such as these ones is a flexible array
  member[1][2], introduced in C99:

    struct foo {
        int stuff;
        struct boo array[];
    };

  By making use of the mechanism above, we will get a compiler warning
  in case the flexible array does not occur last in the structure, which
  will help us prevent some kind of undefined behavior bugs from being
  inadvertently introduced[3] to the codebase from now on.

  Also, notice that, dynamic memory allocations won't be affected by
  this change:

   "Flexible array members have incomplete type, and so the sizeof
    operator may not be applied. As a quirk of the original
    implementation of zero-length arrays, sizeof evaluates to zero."[1]

  sizeof(flexible-array-member) triggers a warning because flexible
  array members have incomplete type[1]. There are some instances of
  code in which the sizeof operator is being incorrectly/erroneously
  applied to zero-length arrays and the result is zero. Such instances
  may be hiding some bugs. So, this work (flexible-array member
  convertions) will also help to get completely rid of those sorts of
  issues.

  Notice that all of these patches have been baking in linux-next for
  quite a while now and, 238 more of these patches have already been
  merged into 5.7-rc1.

  There are a couple hundred more of these issues waiting to be
  addressed in the whole codebase"

[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 7649773293 ("cxgb3/l2t: Fix undefined behaviour")

* tag 'flexible-array-member-5.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux: (28 commits)
  xattr.h: Replace zero-length array with flexible-array member
  uapi: linux: fiemap.h: Replace zero-length array with flexible-array member
  uapi: linux: dlm_device.h: Replace zero-length array with flexible-array member
  tpm_eventlog.h: Replace zero-length array with flexible-array member
  ti_wilink_st.h: Replace zero-length array with flexible-array member
  swap.h: Replace zero-length array with flexible-array member
  skbuff.h: Replace zero-length array with flexible-array member
  sched: topology.h: Replace zero-length array with flexible-array member
  rslib.h: Replace zero-length array with flexible-array member
  rio.h: Replace zero-length array with flexible-array member
  posix_acl.h: Replace zero-length array with flexible-array member
  platform_data: wilco-ec.h: Replace zero-length array with flexible-array member
  memcontrol.h: Replace zero-length array with flexible-array member
  list_lru.h: Replace zero-length array with flexible-array member
  lib: cpu_rmap: Replace zero-length array with flexible-array member
  irq.h: Replace zero-length array with flexible-array member
  ihex.h: Replace zero-length array with flexible-array member
  igmp.h: Replace zero-length array with flexible-array member
  genalloc.h: Replace zero-length array with flexible-array member
  ethtool.h: Replace zero-length array with flexible-array member
  ...
2020-04-19 10:34:30 -07:00
Gustavo A. R. Silva
6e88abb862 uapi: linux: fiemap.h: Replace zero-length array with flexible-array member
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:

struct foo {
        int stuff;
        struct boo array[];
};

By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.

Also, notice that, dynamic memory allocations won't be affected by
this change:

"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]

This issue was found with the help of Coccinelle.

[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 7649773293 ("cxgb3/l2t: Fix undefined behaviour")

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
2020-04-18 15:44:56 -05:00
Gustavo A. R. Silva
d6cdad8703 uapi: linux: dlm_device.h: Replace zero-length array with flexible-array member
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:

struct foo {
        int stuff;
        struct boo array[];
};

By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.

Also, notice that, dynamic memory allocations won't be affected by
this change:

"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]

This issue was found with the help of Coccinelle.

[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 7649773293 ("cxgb3/l2t: Fix undefined behaviour")

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
2020-04-18 15:44:56 -05:00
Alexander Duyck
31ba514b2f virtio-balloon: Avoid using the word 'report' when referring to free page hinting
It can be confusing to have multiple features within the same driver that
are using the same verbage. As such this patch is creating a union of
free_page_report_cmd_id with free_page_hint_cmd_id so that we can clean-up
the userspace code a bit in terms of readability while maintaining the
functionality of legacy code.

Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Link: https://lore.kernel.org/r/20200415174318.13597.99753.stgit@localhost.localdomain
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-04-17 06:05:30 -04:00
Linus Torvalds
c8372665b4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from David Miller:

 1) Disable RISCV BPF JIT builds when !MMU, from Björn Töpel.

 2) nf_tables leaves dangling pointer after free, fix from Eric Dumazet.

 3) Out of boundary write in __xsk_rcv_memcpy(), fix from Li RongQing.

 4) Adjust icmp6 message source address selection when routes have a
    preferred source address set, from Tim Stallard.

 5) Be sure to validate HSR protocol version when creating new links,
    from Taehee Yoo.

 6) CAP_NET_ADMIN should be sufficient to manage l2tp tunnels even in
    non-initial namespaces, from Michael Weiß.

 7) Missing release firmware call in mlx5, from Eran Ben Elisha.

 8) Fix variable type in macsec_changelink(), caught by KASAN. Fix from
    Taehee Yoo.

 9) Fix pause frame negotiation in marvell phy driver, from Clemens
    Gruber.

10) Record RX queue early enough in tun packet paths such that XDP
    programs will see the correct RX queue index, from Gilberto Bertin.

11) Fix double unlock in mptcp, from Florian Westphal.

12) Fix offset overflow in ARM bpf JIT, from Luke Nelson.

13) marvell10g needs to soft reset PHY when coming out of low power
    mode, from Russell King.

14) Fix MTU setting regression in stmmac for some chip types, from
    Florian Fainelli.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (101 commits)
  amd-xgbe: Use __napi_schedule() in BH context
  mISDN: make dmril and dmrim static
  net: stmmac: dwmac-sunxi: Provide TX and RX fifo sizes
  net: dsa: mt7530: fix tagged frames pass-through in VLAN-unaware mode
  tipc: fix incorrect increasing of link window
  Documentation: Fix tcp_challenge_ack_limit default value
  net: tulip: make early_486_chipsets static
  dt-bindings: net: ethernet-phy: add desciption for ethernet-phy-id1234.d400
  ipv6: remove redundant assignment to variable err
  net/rds: Use ERR_PTR for rds_message_alloc_sgs()
  net: mscc: ocelot: fix untagged packet drops when enslaving to vlan aware bridge
  selftests/bpf: Check for correct program attach/detach in xdp_attach test
  libbpf: Fix type of old_fd in bpf_xdp_set_link_opts
  libbpf: Always specify expected_attach_type on program load if supported
  xsk: Add missing check on user supplied headroom size
  mac80211: fix channel switch trigger from unknown mesh peer
  mac80211: fix race in ieee80211_register_hw()
  net: marvell10g: soft-reset the PHY when coming out of low power
  net: marvell10g: report firmware version
  net/cxgb4: Check the return from t4_query_params properly
  ...
2020-04-16 14:52:29 -07:00
Alexey Budankov
9807372822 capabilities: Introduce CAP_PERFMON to kernel and user space
Introduce the CAP_PERFMON capability designed to secure system
performance monitoring and observability operations so that CAP_PERFMON
can assist CAP_SYS_ADMIN capability in its governing role for
performance monitoring and observability subsystems.

CAP_PERFMON hardens system security and integrity during performance
monitoring and observability operations by decreasing attack surface that
is available to a CAP_SYS_ADMIN privileged process [2]. Providing the access
to system performance monitoring and observability operations under CAP_PERFMON
capability singly, without the rest of CAP_SYS_ADMIN credentials, excludes
chances to misuse the credentials and makes the operation more secure.

Thus, CAP_PERFMON implements the principle of least privilege for
performance monitoring and observability operations (POSIX IEEE 1003.1e:
2.2.2.39 principle of least privilege: A security design principle that
  states that a process or program be granted only those privileges
(e.g., capabilities) necessary to accomplish its legitimate function,
and only for the time that such privileges are actually required)

CAP_PERFMON meets the demand to secure system performance monitoring and
observability operations for adoption in security sensitive, restricted,
multiuser production environments (e.g. HPC clusters, cloud and virtual compute
environments), where root or CAP_SYS_ADMIN credentials are not available to
mass users of a system, and securely unblocks applicability and scalability
of system performance monitoring and observability operations beyond root
and CAP_SYS_ADMIN use cases.

CAP_PERFMON takes over CAP_SYS_ADMIN credentials related to system performance
monitoring and observability operations and balances amount of CAP_SYS_ADMIN
credentials following the recommendations in the capabilities man page [1]
for CAP_SYS_ADMIN: "Note: this capability is overloaded; see Notes to kernel
developers, below." For backward compatibility reasons access to system
performance monitoring and observability subsystems of the kernel remains
open for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN capability
usage for secure system performance monitoring and observability operations
is discouraged with respect to the designed CAP_PERFMON capability.

Although the software running under CAP_PERFMON can not ensure avoidance
of related hardware issues, the software can still mitigate these issues
following the official hardware issues mitigation procedure [2]. The bugs
in the software itself can be fixed following the standard kernel development
process [3] to maintain and harden security of system performance monitoring
and observability operations.

[1] http://man7.org/linux/man-pages/man7/capabilities.7.html
[2] https://www.kernel.org/doc/html/latest/process/embargoed-hardware-issues.html
[3] https://www.kernel.org/doc/html/latest/admin-guide/security-bugs.html

Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
Acked-by: James Morris <jamorris@linux.microsoft.com>
Acked-by: Serge E. Hallyn <serge@hallyn.com>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Igor Lubashev <ilubashe@akamai.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: intel-gfx@lists.freedesktop.org
Cc: linux-doc@vger.kernel.org
Cc: linux-man@vger.kernel.org
Cc: linux-security-module@vger.kernel.org
Cc: selinux@vger.kernel.org
Link: http://lore.kernel.org/lkml/5590d543-82c6-490a-6544-08e6a5517db0@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-04-16 12:19:06 -03:00
Linus Torvalds
6cc9306b8f for-5.7-rc1-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAl6VskEACgkQxWXV+ddt
 WDurrhAAhkxlh6yrdZqr753DcpdEVAQhyHDsJ66GAKWuW8sn7ypTiZhNgKxvEGuz
 UhwtXTlzZ7K9/h3TsVeih2iqEj6oc8ick+Th+Wf/7s0jhUXDcWi2OqBjTnIiH2Za
 efrwGMiOEAHYqQ7tHjEbZiJGcQ2tE7+2Le4g3aFnv/kRT0jXDikzLTa/viMG73k5
 9llSm+GJYl2KQNcUPmxGKrwwiiV5c5xNCGuEuY4lw+3OVn1QU4rayZDB/5GxZ/nC
 72Efl9CxoDunBviys2NWxYTt/Ts3R/+yhnGX0kM6BovkN0bo1pA7HuWkADqYPnNN
 r8z8X/zFYi7jZBwpPq4alcHW2IaMC7UEseEyZHlj9ce8pK8MnHFlBtfBcUzbvFl5
 Wtt23AvAZ9CiQ40Sf5UBt6pliUQhr/BpBz88jatZ619ij1GLxeO++I5bIz3/YFQH
 UEP7okhoqpxgKLFGRcpxkw0ggOipp7isFyfss2qaRMPebmNMKnuuUoEy5BDlHs2f
 ewxbyuSUVXVBJMB4R6u77Nk5KLrTO67kfiCROaVKkzhYDESpbB4Trdl+kvzPSFb6
 p3NYpJoGnkOKngG/vg5MoQGOp1oi4h3RH2Ck1Yes7jmBgYLSCQokCUXkm52PGfId
 25P45yOzwS4W7sVFXsR3rygpexXlcNAIGG+2xtiw/AyFIQo5AZ4=
 =pkZ2
 -----END PGP SIGNATURE-----

Merge tag 'for-5.7-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:
 "We have a few regressions and one fix for stable:

   - revert fsync optimization

   - fix lost i_size update

   - fix a space accounting leak

   - build fix, add back definition of a deprecated ioctl flag

   - fix search condition for old roots in relocation"

* tag 'for-5.7-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: re-instantiate the removed BTRFS_SUBVOL_CREATE_ASYNC definition
  btrfs: fix reclaim counter leak of space_info objects
  btrfs: make full fsyncs always operate on the entire file again
  btrfs: fix lost i_size update after cloning inline extent
  btrfs: check commit root generation in should_ignore_root
2020-04-14 11:51:30 -07:00
Eugene Syromiatnikov
34c51814b2 btrfs: re-instantiate the removed BTRFS_SUBVOL_CREATE_ASYNC definition
The commit 9c1036fdb1 ("btrfs: Remove BTRFS_SUBVOL_CREATE_ASYNC
support") breaks strace build with the kernel headers from git:

    btrfs.c: In function "btrfs_test_subvol_ioctls":
    btrfs.c:531:23: error: "BTRFS_SUBVOL_CREATE_ASYNC" undeclared (first use
    in this function)
       vol_args_v2.flags = BTRFS_SUBVOL_CREATE_ASYNC;

Moreover, it is improper to break UAPI, strace uses the definitions to
decode ioctls that are considered part of public API.

Restore the macro definition and put it under "#ifndef __KERNEL__"
in order to prevent inadvertent in-kernel usage.

Fixes: 9c1036fdb1 ("btrfs: Remove BTRFS_SUBVOL_CREATE_ASYNC support")
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Eugene Syromiatnikov <esyr@redhat.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-04-10 18:48:27 +02:00
Linus Torvalds
0906d8b975 IOMMU Updates for Linux v5.7
Including:
 
 	- ARM-SMMU support for the TLB range invalidation command in
 	  SMMUv3.2.
 
 	- ARM-SMMU introduction of command batching helpers to batch up
 	  CD and ATC invalidation.
 
 	- ARM-SMMU support for PCI PASID, along with necessary PCI
 	  symbol exports.
 
 	- Introduce a generic (actually rename an existing) IOMMU
 	  related pointer in struct device and reduce the IOMMU related
 	  pointers.
 
 	- Some fixes for the OMAP IOMMU driver to make it build on 64bit
 	  architectures.
 
 	- Various smaller fixes and improvements.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAl6MmlQACgkQK/BELZcB
 GuP9ug//QtyPYRYdO4ltD6mPvfB7V0qdXksJz+ZVbPOMvqUs1jr1FVYFH1HkOVu5
 mFD6OJuQQJrrGukXMERyVgDhUqNr+xHrkGS+X67NrOkUrguyvUfLYSU/GmOH/kdk
 w1Smp7pTcHHAMmxGyQWTSFa9jSxKes5ZYBo065Z3/SlcIcTTkbw7V87N3RPrlnCX
 s/K7CtSGnKJMpL9DVuNH27eqGlfiuIrQhj/vTQVSn1nF7TjaGKXaRXj+3fcUgrIt
 KAfflWiTJncMY6WLjz65iiUtUvgA2Mmgn3CKJnWjgECd70+NybLQ9OAvQO+A2H6s
 8XO9DsOOe8HFq/ljev1JGSw5LgB5Ip1RtSk7Ost6mkUFzLlaeTBJFQeHbECI9dne
 hksRYL4R8bwiQu+MkQe7HLa6TDb+asqjsayIO3M1oIpF+8mIz/oNOGCeP0cqSiuj
 lVMnblAWatrsZrf+AlxZKddIJWiduXoTjtpV64HTTvZeL4/g3kY0ykBXpS4xLj5V
 s0KvR6kjR1LYUgpe9jJ3CJTdIlU4MzSlrtq4CYFZvRa7rBLmk2cGsR1jiA3GTGpn
 bcqOQNgb5X1mpAzmOZb//pbjozgvCjQpQexyU4tRzs38yk+TK5OnOe5z4M1srHPY
 7dTZoUEpAcRm4K+JFQ3+yOtxRTsINYyFUL/Qt8ALbWy4hXluRGY=
 =nhuS
 -----END PGP SIGNATURE-----

Merge tag 'iommu-updates-v5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull iommu updates from Joerg Roedel:

 - ARM-SMMU support for the TLB range invalidation command in SMMUv3.2

 - ARM-SMMU introduction of command batching helpers to batch up CD and
   ATC invalidation

 - ARM-SMMU support for PCI PASID, along with necessary PCI symbol
   exports

 - Introduce a generic (actually rename an existing) IOMMU related
   pointer in struct device and reduce the IOMMU related pointers

 - Some fixes for the OMAP IOMMU driver to make it build on 64bit
   architectures

 - Various smaller fixes and improvements

* tag 'iommu-updates-v5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (39 commits)
  iommu: Move fwspec->iommu_priv to struct dev_iommu
  iommu/virtio: Use accessor functions for iommu private data
  iommu/qcom: Use accessor functions for iommu private data
  iommu/mediatek: Use accessor functions for iommu private data
  iommu/renesas: Use accessor functions for iommu private data
  iommu/arm-smmu: Use accessor functions for iommu private data
  iommu/arm-smmu: Refactor master_cfg/fwspec usage
  iommu/arm-smmu-v3: Use accessor functions for iommu private data
  iommu: Introduce accessors for iommu private data
  iommu/arm-smmu: Fix uninitilized variable warning
  iommu: Move iommu_fwspec to struct dev_iommu
  iommu: Rename struct iommu_param to dev_iommu
  iommu/tegra-gart: Remove direct access of dev->iommu_fwspec
  drm/msm/mdp5: Remove direct access of dev->iommu_fwspec
  ACPI/IORT: Remove direct access of dev->iommu_fwspec
  iommu: Define dev_iommu_fwspec_get() for !CONFIG_IOMMU_API
  iommu/virtio: Reject IOMMU page granule larger than PAGE_SIZE
  iommu/virtio: Fix freeing of incomplete domains
  iommu/virtio: Fix sparse warning
  iommu/vt-d: Add build dependency on IOASID
  ...
2020-04-08 11:00:00 -07:00
Linus Torvalds
9bb715260e virtio: fixes, vdpa
Some bug fixes.
 The new vdpa subsystem with two first drivers.
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAl6MS7wPHG1zdEByZWRo
 YXQuY29tAAoJECgfDbjSjVRpGp8H/2H49Gya1cfVbGU13qgmBSQqQXC8hS3iNLuG
 ltRgU+jafJT//kvkdm3/DUzfK3eRUWUfqZLKEbAQDtMY0OGHi/KGEBYVLDde7Zxt
 Lg4VnwBhkYDR/f01ZZDbHxzj9JAr83i28nILjLIqf3a1BX4zf203+ZE0/JM8a7wL
 dOPoH7NAfyz5ul2F67bR1IOF8vC6TidpavzR2+HC/MocHYXb6Bgfvt+i4EcrfuMf
 9lnBfajgklKr9sNJniwvvR1pWVg+YyG3VeC6T8tIC/xzbCmIoNT+5b3q2XPSIHq1
 EuQTeXH9CBFXS0qcFlq2ktR1xd1Lx95hKwZpqLwLFDmfgjhV2QU=
 =/84P
 -----END PGP SIGNATURE-----

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Pull virtio updates from Michael Tsirkin:

 - Some bug fixes

 - The new vdpa subsystem with two first drivers

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  virtio-balloon: Revert "virtio-balloon: Switch back to OOM handler for VIRTIO_BALLOON_F_DEFLATE_ON_OOM"
  vdpa: move to drivers/vdpa
  virtio: Intel IFC VF driver for VDPA
  vdpasim: vDPA device simulator
  vhost: introduce vDPA-based backend
  virtio: introduce a vDPA based transport
  vDPA: introduce vDPA bus
  vringh: IOTLB support
  vhost: factor out IOTLB
  vhost: allow per device message handler
  vhost: refine vhost and vringh kconfig
  virtio-balloon: Switch back to OOM handler for VIRTIO_BALLOON_F_DEFLATE_ON_OOM
  virtio-net: Introduce hash report feature
  virtio-net: Introduce RSS receive steering feature
  virtio-net: Introduce extended RSC feature
  tools/virtio: option to build an out of tree module
2020-04-08 10:51:53 -07:00
Linus Torvalds
9ebe5422ad Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input updates from Dmitry Torokhov:
 "An update to the Goodix touchscreen driver to enable it work properly
  on various Bay Trail and Cherry Trail devices, and a few other
  assorted changes"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (26 commits)
  Input: update SPDX tag for input-event-codes.h
  Input: i8042 - add Acer Aspire 5738z to nomux list
  Input: goodix - fix compilation when ACPI support is disabled
  dt-bindings: touchscreen: Convert edt-ft5x06 to json-schema
  Input: of_touchscreen - explicitly choose axis
  Input: goodix - support gt9147 touchpanel
  dt-bindings: touchscreen: goodix: support of gt9147
  Input: goodix - add support for Goodix GT917S
  Input: goodix - use string-based chip ID
  dt-bindings: input: touchscreen: add compatible string for Goodix GT917S
  Input: goodix - add support for more then one touch-key
  Input: goodix - fix spurious key release events
  Input: goodix - try to reset the controller if the i2c-test fails
  Input: goodix - restore config on resume if necessary
  Input: goodix - make goodix_send_cfg() take a raw buffer as argument
  Input: goodix - add minimum firmware size check
  Input: goodix - save a copy of the config from goodix_read_config()
  Input: goodix - move defines to above struct goodix_ts_data declaration
  Input: goodix - add support for controlling the IRQ pin through ACPI methods
  Input: goodix - add support for getting IRQ + reset GPIOs on Bay Trail devices
  ...
2020-04-07 20:20:12 -07:00
David S. Miller
c2c1128902 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for net, they are:

1) Fix spurious overlap condition in the rbtree tree, from Stefano Brivio.

2) Fix possible uninitialized pointer dereference in nft_lookup.

3) IDLETIMER v1 target matches the Android layout, from
   Maciej Zenczykowski.

4) Dangling pointer in nf_tables_set_alloc_name, from Eric Dumazet.

5) Fix RCU warning splat in ipset find_set_type(), from Amol Grover.

6) Report EOPNOTSUPP on unsupported set flags and object types in sets.

7) Add NFT_SET_CONCAT flag to provide consistent error reporting
   when users defines set with ranges in concatenations in old kernels.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-07 18:08:06 -07:00
Linus Torvalds
63bef48fd6 Merge branch 'akpm' (patches from Andrew)
Merge more updates from Andrew Morton:

 - a lot more of MM, quite a bit more yet to come: (memcg, pagemap,
   vmalloc, pagealloc, migration, thp, ksm, madvise, virtio,
   userfaultfd, memory-hotplug, shmem, rmap, zswap, zsmalloc, cleanups)

 - various other subsystems (procfs, misc, MAINTAINERS, bitops, lib,
   checkpatch, epoll, binfmt, kallsyms, reiserfs, kmod, gcov, kconfig,
   ubsan, fault-injection, ipc)

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (158 commits)
  ipc/shm.c: make compat_ksys_shmctl() static
  ipc/mqueue.c: fix a brace coding style issue
  lib/Kconfig.debug: fix a typo "capabilitiy" -> "capability"
  ubsan: include bug type in report header
  kasan: unset panic_on_warn before calling panic()
  ubsan: check panic_on_warn
  drivers/misc/lkdtm/bugs.c: add arithmetic overflow and array bounds checks
  ubsan: split "bounds" checker from other options
  ubsan: add trap instrumentation option
  init/Kconfig: clean up ANON_INODES and old IO schedulers options
  kernel/gcov/fs.c: replace zero-length array with flexible-array member
  gcov: gcc_3_4: replace zero-length array with flexible-array member
  gcov: gcc_4_7: replace zero-length array with flexible-array member
  kernel/kmod.c: fix a typo "assuems" -> "assumes"
  reiserfs: clean up several indentation issues
  kallsyms: unexport kallsyms_lookup_name() and kallsyms_on_each_symbol()
  samples/hw_breakpoint: drop use of kallsyms_lookup_name()
  samples/hw_breakpoint: drop HW_BREAKPOINT_R when reporting writes
  fs/binfmt_elf.c: don't free interpreter's ELF pheaders on common path
  fs/binfmt_elf.c: allocate less for static executable
  ...
2020-04-07 14:11:54 -07:00
Linus Torvalds
762a9f2f01 This pull request contains the following changes for UML:
- New mode for time travel, external via virtio
 - Fixes for ubd to make sure no requests can get lost
 - Fixes for vector networking
 - Allow CONFIG_STATIC_LINK only when possible
 - Minor cleanups and fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCAA0FiEEdgfidid8lnn52cLTZvlZhesYu8EFAl6MbGYWHHJpY2hhcmRA
 c2lnbWEtc3Rhci5hdAAKCRBm+VmF6xi7wSY2D/4k1kb3A5pZ6OEXCkKmRU63j0RC
 na0bsa4lztMuABgOWKXP09cqL2ZhJ1rVVRUMV7jgVFKj7rKkJHHGHgdBeEkXOcb8
 skOVxln1X/i3T9q9QQ4ofkSk0U8gHCZA3pqrn7TFI9ZmrosOUYwhQKkqcNHvSfPc
 XEjKUx1GCS+wA0mw5yLyDZqDGkZgMNSmNezR7Oq3EB9wi8K2n6Racn6//S/uqiS6
 I8HHE7R2ci0YfflP+xE8i1qg8/TY2wj2oCP33b9o/XefyyNSndVj7KQUI3KRBmSh
 M0k2sbOqegVzSH/l5YFIZ7zbDcqkYeGWopPIuYWo3en7ZmfJfP2KD31c8gPOuElC
 HuUvQyS1VDpLn6JBa8Y456e8IrKl/QquXfZDc2qG5HYTR6g9nv9y8VNtx4dSQ+sB
 AfgErKofx7x2JQNRfg+0BYKgw/MawGAjiSZm5qVNfvFM3YDWZSUZ9gEAcX6qto/z
 P+66Zrhatdt9TaQdy9vbQKDWSJk9ood2mQYU0JJSfzgsotWslyvCsc6ANtwfkc7R
 sLxnsa6EA7CYogbMJ7wRxD5spCNZrRZvepHhe5uft/nWG/qGM1jy7Vk16Or03sVH
 sScIp6m+yDyhhEjJOT8Mq6WbM3mIfILMb42FyDJQIpJ9JcXSxzbiZu7RSK38yoEG
 +WYGOYdTGgzxIWsRmQ==
 =WVcL
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml

Pull UML updates from Richard Weinberger:

 - New mode for time travel, external via virtio

 - Fixes for ubd to make sure no requests can get lost

 - Fixes for vector networking

 - Allow CONFIG_STATIC_LINK only when possible

 - Minor cleanups and fixes

* tag 'for-linus-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
  um: Remove some unnecessary NULL checks in vector_user.c
  um: vector: Avoid NULL ptr deference if transport is unset
  um: Make CONFIG_STATIC_LINK actually static
  um: Implement cpu_relax() as ndelay(1) for time-travel
  um: Implement ndelay/udelay in time-travel mode
  um: Implement time-travel=ext
  um: virtio: Implement VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS
  um: time-travel: Rewrite as an event scheduler
  um: Move timer-internal.h to non-shared
  hostfs: Use kasprintf() instead of fixed buffer formatting
  um: falloc.h needs to be directly included for older libc
  um: ubd: Retry buffer read on any kind of error
  um: ubd: Prevent buffer overrun on command completion
  um: Fix overlapping ELF segments when statically linked
  um: Delete never executed timer
  um: Don't overwrite ethtool driver version
  um: Fix len of file in create_pid_file
  um: Don't use console_drivers directly
  um: Cleanup CONFIG_IOSCHED_CFQ
2020-04-07 12:36:09 -07:00
Shaohua Li
e06f1e1dd4 userfaultfd: wp: enabled write protection in userfaultfd API
Now it's safe to enable write protection in userfaultfd API

Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Reviewed-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Martin Cracauer <cracauer@cons.org>
Cc: Marty McFadden <mcfadden8@llnl.gov>
Cc: Maya Gokhale <gokhale2@llnl.gov>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Link: http://lkml.kernel.org/r/20200220163112.11409-15-peterx@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-07 10:43:39 -07:00
Andrea Arcangeli
63b2d4174c userfaultfd: wp: add the writeprotect API to userfaultfd ioctl
Introduce the new uffd-wp APIs for userspace.

Firstly, we'll allow to do UFFDIO_REGISTER with write protection tracking
using the new UFFDIO_REGISTER_MODE_WP flag.  Note that this flag can
co-exist with the existing UFFDIO_REGISTER_MODE_MISSING, in which case the
userspace program can not only resolve missing page faults, and at the
same time tracking page data changes along the way.

Secondly, we introduced the new UFFDIO_WRITEPROTECT API to do page level
write protection tracking.  Note that we will need to register the memory
region with UFFDIO_REGISTER_MODE_WP before that.

[peterx@redhat.com: write up the commit message]
[peterx@redhat.com: remove useless block, write commit message, check against
 VM_MAYWRITE rather than VM_WRITE when register]
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Martin Cracauer <cracauer@cons.org>
Cc: Marty McFadden <mcfadden8@llnl.gov>
Cc: Maya Gokhale <gokhale2@llnl.gov>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Shaohua Li <shli@fb.com>
Link: http://lkml.kernel.org/r/20200220163112.11409-14-peterx@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-07 10:43:39 -07:00
Andrea Arcangeli
72981e0e7b userfaultfd: wp: add UFFDIO_COPY_MODE_WP
This allows UFFDIO_COPY to map pages write-protected.

[peterx@redhat.com: switch to VM_WARN_ON_ONCE in mfill_atomic_pte; add brackets
 around "dst_vma->vm_flags & VM_WRITE"; fix wordings in comments and
 commit messages]
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Reviewed-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Martin Cracauer <cracauer@cons.org>
Cc: Marty McFadden <mcfadden8@llnl.gov>
Cc: Maya Gokhale <gokhale2@llnl.gov>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Shaohua Li <shli@fb.com>
Link: http://lkml.kernel.org/r/20200220163112.11409-6-peterx@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-07 10:43:39 -07:00
Alexander Duyck
b0c504f154 virtio-balloon: add support for providing free page reports to host
Add support for the page reporting feature provided by virtio-balloon.
Reporting differs from the regular balloon functionality in that is is
much less durable than a standard memory balloon.  Instead of creating a
list of pages that cannot be accessed the pages are only inaccessible
while they are being indicated to the virtio interface.  Once the
interface has acknowledged them they are placed back into their respective
free lists and are once again accessible by the guest system.

Unlike a standard balloon we don't inflate and deflate the pages.  Instead
we perform the reporting, and once the reporting is completed it is
assumed that the page has been dropped from the guest and will be faulted
back in the next time the page is accessed.

Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Nitesh Narayan Lal <nitesh@redhat.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Pankaj Gupta <pagupta@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Wang <wei.w.wang@intel.com>
Cc: Yang Zhang <yang.zhang.wz@gmail.com>
Cc: wei qi <weiqi4@huawei.com>
Link: http://lkml.kernel.org/r/20200211224657.29318.68624.stgit@localhost.localdomain
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-07 10:43:39 -07:00
Pablo Neira Ayuso
ef516e8625 netfilter: nf_tables: reintroduce the NFT_SET_CONCAT flag
Stefano originally proposed to introduce this flag, users hit EOPNOTSUPP
in new binaries with old kernels when defining a set with ranges in
a concatenation.

Fixes: f3a2181e16 ("netfilter: nf_tables: Support for sets with multiple ranged fields")
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-04-07 18:23:04 +02:00
Dmitry Torokhov
cd510679f4 Merge branch 'next' into for-linus
Prepare input updates for 5.7 merge window.
2020-04-06 20:56:50 -07:00
Linus Torvalds
b6ff10700d \n
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAl6LFJ0ACgkQnJ2qBz9k
 QNkSUQgAzwaescnHeVTF7/Zg9Uj2xrfrTJZ1E+Mn9qnd/0/z/asVV+RKfY7Gnu7h
 g19inDI4ZESFz2gWz4jwJD1c2/yMZb8vnae4ye3dtCv2yjG/0JxCeue6vjwsWqmO
 4jbSgk8YNQqzwEFVMzNp43ZJr3CFooLCIsJcL8q4yYk8Kt4pDUPmQ1vBvAc6k9vK
 BKMBvp926tbomP27nq0n0CjvHy7ipDGMl4H6i4vBxHRfbDPih2x9VEklK3JatC1n
 4AKS6IYJrkZVdOjli+DrResbcWxyT4db5tPio5MU0RDnVhNZT2cHyNVXf5EpRJqP
 72pa7gfPu1Rx1+tU8bDR/daSveou2A==
 =fkCV
 -----END PGP SIGNATURE-----

Merge tag 'fsnotify_for_v5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs

Pull fsnotify updates from Jan Kara:
 "This implements the fanotify FAN_DIR_MODIFY event.

  This event reports the name in a directory under which a change
  happened and together with the directory filehandle and fstatat()
  allows reliable and efficient implementation of directory
  synchronization"

* tag 'fsnotify_for_v5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  fanotify: Fix the checks in fanotify_fsid_equal
  fanotify: report name info for FAN_DIR_MODIFY event
  fanotify: record name info for FAN_DIR_MODIFY event
  fanotify: Drop fanotify_event_has_fid()
  fanotify: prepare to report both parent and child fid's
  fanotify: send FAN_DIR_MODIFY event flavor with dir inode and name
  fanotify: divorce fanotify_path_event and fanotify_fid_event
  fanotify: Store fanotify handles differently
  fanotify: Simplify create_fd()
  fanotify: fix merging marks masks with FAN_ONDIR
  fanotify: merge duplicate events on parent and child
  fsnotify: replace inode pointer with an object id
  fsnotify: simplify arguments passing to fsnotify_parent()
  fsnotify: use helpers to access data by data_type
  fsnotify: funnel all dirent events through fsnotify_name()
  fsnotify: factor helpers fsnotify_dentry() and fsnotify_file()
  fsnotify: tidy up FS_ and FAN_ constants
2020-04-06 08:58:42 -07:00
Maciej Żenczykowski
bc9fe6143d netfilter: xt_IDLETIMER: target v1 - match Android layout
Android has long had an extension to IDLETIMER to send netlink
messages to userspace, see:
  https://android.googlesource.com/kernel/common/+/refs/heads/android-mainline/include/uapi/linux/netfilter/xt_IDLETIMER.h#42
Note: this is idletimer target rev 1, there is no rev 0 in
the Android common kernel sources, see registration at:
  https://android.googlesource.com/kernel/common/+/refs/heads/android-mainline/net/netfilter/xt_IDLETIMER.c#483

When we compare that to upstream's new idletimer target rev 1:
  https://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next.git/tree/include/uapi/linux/netfilter/xt_IDLETIMER.h#n46

We immediately notice that these two rev 1 structs are the
same size and layout, and that while timer_type and send_nl_msg
are differently named and serve a different purpose, they're
at the same offset.

This makes them impossible to tell apart - and thus one cannot
know in a mixed Android/vanilla environment whether one means
timer_type or send_nl_msg.

Since this is iptables/netfilter uapi it introduces a problem
between iptables (vanilla vs Android) userspace and kernel
(vanilla vs Android) if the two don't match each other.

Additionally when at some point in the future Android picks up
5.7+ it's not at all clear how to resolve the resulting merge
conflict.

Furthermore, since upgrading the kernel on old Android phones
is pretty much impossible there does not seem to be an easy way
out of this predicament.

The only thing I've been able to come up with is some super
disgusting kernel version >= 5.7 check in the iptables binary
to flip between different struct layouts.

By adding a dummy field to the vanilla Linux kernel header file
we can force the two structs to be compatible with each other.

Long term I think I would like to deprecate send_nl_msg out of
Android entirely, but I haven't quite been able to figure out
exactly how we depend on it.  It seems to be very similar to
sysfs notifications but with some extra info.

Currently it's actually always enabled whenever Android uses
the IDLETIMER target, so we could also probably entirely
remove it from the uapi in favour of just always enabling it,
but again we can't upgrade old kernels already in the field.

(Also note that this doesn't change the structure's size,
as it is simply fitting into the pre-existing padding, and
that since 5.7 hasn't been released yet, there's still time
to make this uapi visible change)

Cc: Manoj Basapathi <manojbm@codeaurora.org>
Cc: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-04-05 23:26:37 +02:00
Linus Torvalds
c48b07226b perf updates all over the place:
core:
 
    - Support for cgroup tracking in samples to allow cgroup based
      analysis
 
  tools:
 
    - Support for cgroup analysis
 
    - Commandline option and hotkey for perf top to change the sort order
 
    - A set of fixes all over the place
 
    - Various build system related improvements
 
    - Updates of the X86 pmu event JSON data
 
    - Documentation updates
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl6J2iATHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoXMEEACy6WaNabX7EzkLUnX0WCgxnZVSryNR
 4EnxLJSg5lChEe4q2mOS3mRMpzlXHQieWcFxlwda7kjIbFlgQvjJQUiYlAvxRRO/
 giY/GwCtTi/Flcb+7wKxTgMYmtAUOZDWeQbBZUlFLi9vyeCHVkjget9EyVsgbe/W
 EmRsrPuKOVMUTeEwm3zpIE051DObpiWLNge++My70q5W/yNsS94PbNydgKO7osqk
 pX37YVyBFpI2IQxMGzaE3yK7OxXRjYljZaz1tONFDMEYOX9gmxpDsCCflsP1ZOzL
 4/P4faRvAOElwxtYBelKmRl8eboqhRpTEK0Et0TI0LYbUZrE2nisDi0LTKPWQb0k
 Om2Qi6AfZs67PVzx9htlx6rfee72+sUluz5BDKOGH0pNJ8CFy70ns8InLsZqbBZ7
 SgFVNjx6bHxB58VuVE9WEzr/KVs6zI/SuJlH7WG7FLXbm5j0cETfCjg40JlDpSNh
 CZs+Epky1zyytrVJ9Gc7KnRlw8VB2eWEQ2cQ0sqj5w7WxhfrnsCmQf1zk4sofhOF
 iz3Dvb9Llz/pYWGZbEiQAuI+8bo0psJptidzpmpbIXs/woKDhW49w1FxqowJQMWe
 +lGWcauSfo3gjgEnTkOWAx3yiH4i9rvRChX8Jh0Z07d6Kwf19YYfxcuhlkx0Wutj
 eKaaErWDtWAxpA==
 =2UUD
 -----END PGP SIGNATURE-----

Merge tag 'perf-urgent-2020-04-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull more perf updates from Thomas Gleixner:
 "Perf updates all over the place:

  core:

   - Support for cgroup tracking in samples to allow cgroup based
     analysis

  tools:

   - Support for cgroup analysis

   - Commandline option and hotkey for perf top to change the sort order

   - A set of fixes all over the place

   - Various build system related improvements

   - Updates of the X86 pmu event JSON data

   - Documentation updates"

* tag 'perf-urgent-2020-04-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (55 commits)
  perf python: Fix clang detection to strip out options passed in $CC
  perf tools: Support Python 3.8+ in Makefile
  perf script: Fix invalid read of directory entry after closedir()
  perf script report: Fix SEGFAULT when using DWARF mode
  perf script: add -S/--symbols documentation
  perf pmu-events x86: Use CPU_CLK_UNHALTED.THREAD in Kernel_Utilization metric
  perf events parser: Add missing Intel CPU events to parser
  perf script: Allow --symbol to accept hexadecimal addresses
  perf report/top TUI: Fix title line formatting
  perf top: Support hotkey to change sort order
  perf top: Support --group-sort-idx to change the sort order
  perf symbols: Fix arm64 gap between kernel start and module end
  perf build-test: Honour JOBS to override detection of number of cores
  perf script: Add --show-cgroup-events option
  perf top: Add --all-cgroups option
  perf record: Add --all-cgroups option
  perf record: Support synthesizing cgroup events
  perf report: Add 'cgroup' sort key
  perf cgroup: Maintain cgroup hierarchy
  perf tools: Basic support for CGROUP event
  ...
2020-04-05 12:26:24 -07:00
Linus Torvalds
bdabb68931 RTC for 5.7
Subsystem:
  - The rtc_time_to_tm and rtc_tm_to_time wrappers have finally been removed and
    only the 64bit version remain.
  - hctosys now works with drivers compiled as modules
 
 New driver:
  - MediaTek MT2712 SoC based RTC
 
 Drivers:
  - set range for 88pm860x, au1xxx, cpcap, da9052, davinci, ds1305, ds1374,
    mcp5121, pl030, pl031, pm8xxx, puv3, sa1100, sirfsoc, starfire, sun6i
  - ds1307: DS1388 oscillator failure detection and watchdog support
  - jz4740: JZ4760 support
  - pcf85063: clock out pin support
  - sun6i: external 32k oscillator is now optional, the range is now handled by
    the core, providing a solution for 2034.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEycoQi/giopmpPgB12wIijOdRNOUFAl6IgdAACgkQ2wIijOdR
 NOVH9Q//evfZmRDQb9Lm8YuSOt+U6lBaCuvRP8H4s7i02jsw0tt1oT6+rGo+1N+k
 pi64+iwPqDj7h+eds06jVur7yWZDNNfCSywlMU8ODGLQcc/hXMTvaIeJtdXG+I+7
 dpIVTs9Artiey2Qb3qZalSMYzrE2rQG6Yuyb/tHx7Pohvje1wKnAE3T64W6sfvir
 zjAwAr/+s9n82mjGtGLapz0onao7dSN4p6b3UIkph9KxPdeHjItd7cRiJuzioysj
 LE8WRUek++uha32PzghK9MoUpRKf8PJtGbsAIRhdwvmwub1Pqs6Uvk5IHQvdiFZx
 af1Qnw/4qUH03HVUuq8VdEkwa8mdthcZ72b8uzUWOqP+uPkeTTlQOGOd1dRECijK
 Qic+SKlEznwgKC9WuDufUQg5i2WjFBnBlA+dCE12/t91SbSlZdYjb53nQLHve1b9
 e4VfyKnS5Ghn/dfizzzSPdcWuPW1rqq2dHTiOvFVqlLn1Fkt1AKaJzGAeCe6PWp4
 AJVqPu+D1eQ/qmvEPXI7dQpCxdI+p6FNJgQvQLMssaq7mpf2fiRoLx23ikFG0v0k
 BeNkNWpLz2ACwHpsOa/RWLwa/FC2+KBHaRWf+As7te63byeDQC8hUEi7F5K78lSS
 yhfZ8SJR04SIIGKjuwAzPmaaIJ0hJT+eaxkFAJU7P5hD7KElsuM=
 =6Veu
 -----END PGP SIGNATURE-----

Merge tag 'rtc-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux

Pull RTC updates from Alexandre Belloni:
 "More cleanup this cycle, with the final goal of removing the
  rtc_time_to_tm and rtc_tm_to_time wrappers. All the drivers that have
  been modified for this now are ready for the end of times (whether it
  happens in 2033, 2038, 2106, 2127 or even 4052). There is also a
  single new driver and the usual fixes and features.

  Summary:

  Subsystem:

   - The rtc_time_to_tm and rtc_tm_to_time wrappers have finally been
     removed and only the 64bit version remain.

   - hctosys now works with drivers compiled as modules

  New driver:

   - MediaTek MT2712 SoC based RTC

  Drivers:

   - set range for 88pm860x, au1xxx, cpcap, da9052, davinci, ds1305,
     ds1374, mcp5121, pl030, pl031, pm8xxx, puv3, sa1100, sirfsoc,
     starfire, sun6i

   - ds1307: DS1388 oscillator failure detection and watchdog support

   - jz4740: JZ4760 support

   - pcf85063: clock out pin support

   - sun6i: external 32k oscillator is now optional, the range is now
     handled by the core, providing a solution for 2034"

* tag 'rtc-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: (87 commits)
  rtc: ds1307: check for failed memory allocation on wdt
  rtc: class: remove redundant assignment to variable err
  rtc: remove rtc_time_to_tm and rtc_tm_to_time
  rtc: sun6i: let the core handle rtc range
  rtc: sun6i: switch to rtc_time64_to_tm/rtc_tm_to_time64
  rtc: ds1307: add support for watchdog timer on ds1388
  rtc: da9052: switch to rtc_time64_to_tm/rtc_tm_to_time64
  rtc: da9052: set range
  rtc: da9052: convert to devm_rtc_allocate_device
  rtc: imx-sc: Align imx sc msg structs to 4
  rtc: fsl-ftm-alarm: report alarm to core
  rtc: pcf85063: Add pcf85063 clkout control to common clock framework
  rtc: make definitions in include/uapi/linux/rtc.h actually useful for user space
  rtc: class: avoid unnecessary lookup in hctosys
  dt-bindings: rtc: Convert and update jz4740-rtc doc to YAML
  rtc: jz4740: Rename vendor-specific DT properties
  rtc: jz4740: Add support for JZ4760 SoC
  rtc: class: support hctosys from modular RTC drivers
  rtc: pm8xxx: clear alarm register when alarm is not enabled
  rtc: omap: drop unused dt-bindings header
  ...
2020-04-04 10:38:01 -07:00
Linus Torvalds
828907ef25 This is the bulk of GPIO development for the v5.7 kernel cycle.
Core and userspace API:
 
 - The userspace API KFIFOs have been imoproved with locks that
   do not block interrupts. This makes us better at getting
   events to userspace without blocking or disturbing new events
   arriving in the same time. This was reviewed by the KFIFO
   maintainer Stefani. This is a generic improvement which
   paves the road for similar improvements in other subsystems.
 
 - We provide a new ioctl() for monitoring changes in the line
   information, such as when multiple clients are taking lines
   and giving them back, possibly reconfiguring them in the
   process: we can now monitor that and not get stuck with stale
   static information.
 
 - An example tool 'gpio-watch' is provided to showcase this
   functionality.
 
 - Timestamps for events are switched to ktime_get_ns() which is
   monotonic. We previously had a 'realtime' stamp which could
   move forward and *backward* in time, which probably would just
   cause silent bugs and weird behaviour. In the long run we
   see two relevant timestamps: ktime_get_ns() or the timestamp
   sometimes provided by the GPIO hardware itself, if that
   exists.
 
 - Device Tree overlay support for GPIO hogs. On systems that
   load overlays, these overlays can now contain hogs, and will
   then be respected.
 
 - Handle pin control interaction with nonexisting pin ranges
   in the GPIO library core instead of in the individual
   drivers.
 
 New drivers:
 
 - New driver for the Mellanox BlueField 2 GPIO controller.
 
 Driver improvements:
 
 - Introduce the BGPIOF_NO_SET_ON_INPUT flag to the generic
   MMIO GPIO library and use this flag in the MT7621 driver.
 
 - Texas Instruments OMAP CPU power management improvements,
   such as blocking of idle on pending GPIO interrupts.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEElDRnuGcz/wPCXQWMQRCzN7AZXXMFAl6IWUAACgkQQRCzN7AZ
 XXPReQ//WUHALqbLOL/DZR8kVyygVZXV8JC4jcuMQgX+Mvm1DQuGp1T8VPlG1wC7
 vy5eTz3oPc+jAaHGQgGiSGdPRQtDkAW5+Wo4hzcHJ5ATlzT7cgNywK3Jk6+YlPlF
 CuhHB6aPEymQ45/GseUwEb5AuWWKS03P3XdOBj2IdDhuRHRfyjHi+31s0lh6bfNs
 kC2eP+VtddVqLbOdPE1GKjZi+Iq9ffFRmiNTduHbl3o6ZoE6WnMvn2wsC6zPpJr4
 AM8zpj9JFUq1DYaQf8bJJLTm8lItcyyUUhq8oKUxBpecjWfhUYwEYHGxqWIk9fbM
 oyJ3zBlU0G7bc4gu1VG3cTacUf//BTmQG3iZNSSw9w17I1hOMX9wNJan22VkEbJD
 8X+6AcgkldNwnAVWcVgg1lkfgdxlWURMYXl1uQPQZSj2SQL6gFcdkAhKwlgl8jOi
 RctUfZ69tD+4ul2b+MhL6BtIk16o2RIBJF53ESxrR4qftRZ3ywW4JQ/AwNoQalxM
 Xg8U3axFm8kitOzemVoXGr/AIee9KpcA4NK8yQLEPmSFgBaVzaVlP9Q7Klr7QGXg
 RvK45uRULrWnvSyoVY7Ox+ie1NiG8JAbaxKBxkV8n+z+gVMh82xcyQX+5SgWxwW0
 dzeEx1B0mCW4WDAK29CcdQVE6m4K2u/fabhZq5IhgS1mQoTC894=
 =EFm/
 -----END PGP SIGNATURE-----

Merge tag 'gpio-v5.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio

Pull GPIO updates from Linus Walleij:
 "This is the bulk of GPIO development for the v5.7 kernel cycle.

  Core and userspace API:

   - The userspace API KFIFOs have been imoproved with locks that do not
     block interrupts. This makes us better at getting events to
     userspace without blocking or disturbing new events arriving in the
     same time. This was reviewed by the KFIFO maintainer Stefani. This
     is a generic improvement which paves the road for similar
     improvements in other subsystems.

   - We provide a new ioctl() for monitoring changes in the line
     information, such as when multiple clients are taking lines and
     giving them back, possibly reconfiguring them in the process: we
     can now monitor that and not get stuck with stale static
     information.

   - An example tool 'gpio-watch' is provided to showcase this
     functionality.

   - Timestamps for events are switched to ktime_get_ns() which is
     monotonic. We previously had a 'realtime' stamp which could move
     forward and *backward* in time, which probably would just cause
     silent bugs and weird behaviour. In the long run we see two
     relevant timestamps: ktime_get_ns() or the timestamp sometimes
     provided by the GPIO hardware itself, if that exists.

   - Device Tree overlay support for GPIO hogs. On systems that load
     overlays, these overlays can now contain hogs, and will then be
     respected.

   - Handle pin control interaction with nonexisting pin ranges in the
     GPIO library core instead of in the individual drivers.

  New drivers:

   - New driver for the Mellanox BlueField 2 GPIO controller.

  Driver improvements:

   - Introduce the BGPIOF_NO_SET_ON_INPUT flag to the generic MMIO GPIO
     library and use this flag in the MT7621 driver.

   - Texas Instruments OMAP CPU power management improvements, such as
     blocking of idle on pending GPIO interrupts"

* tag 'gpio-v5.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: (59 commits)
  Revert "gpio: eic-sprd: Use devm_platform_ioremap_resource()"
  pinctrl: Unconditionally assign .request()/.free()
  gpio: Unconditionally assign .request()/.free()
  gpio: export of_pinctrl_get to modules
  pinctrl: Define of_pinctrl_get() dummy for !PINCTRL
  gpio: Rename variable in core APIs
  gpio: Avoid using pin ranges with !PINCTRL
  gpiolib: Remove unused gpio_chip parameter from gpio_set_bias()
  gpiolib: Pass gpio_desc to gpio_set_config()
  gpiolib: Introduce gpiod_set_config()
  tools: gpio: Fix out-of-tree build regression
  gpio: gpiolib: fix a doc warning
  gpio: tegra186: Add Tegra194 pin ranges for GG.0 and GG.1
  gpio: tegra186: Add support for pin ranges
  gpio: Support GPIO controllers without pin-ranges
  ARM: integrator: impd1: Use GPIO_LOOKUP() helper macro
  gpio: brcmstb: support gpio-line-names property
  tools: gpio: Fix typo in gpio-utils
  tools: gpio-hammer: Apply scripts/Lindent and retain good changes
  gpiolib: gpio_name_to_desc: factor out !name check
  ...
2020-04-04 10:27:00 -07:00
Linus Torvalds
2fb732b33b VFIO updates for v5.7-rc1
- vfio-pci SR-IOV support (Alex Williamson)
 
  - vfio DMA read/write interface (Yan Zhao)
 
  - Fix vfio-platform erroneous IRQ error log (Eric Auger)
 
  - Fix shared ATSD support for NVLink on POWER (Sam Bobroff)
 
  - Fix init error without CONFIG_IOMMU_DMA (Andre Przywara)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.14 (GNU/Linux)
 
 iQIcBAABAgAGBQJeh7XVAAoJECObm247sIsihpgP/A3xkJ4bS4KALoeKJwYOtWfP
 /KHNYZwlSno7WZxREsYxQvZpdbOmwj00IRl/ckvTNGZ5Urs3Po8PtZ1mmMmzqPj7
 oiLbusA2HapZUCncjtDtaiOhaaIP3bMc2SejtmBm8pWkNvCZNQm2MhjxKIS5jYbG
 XLRWN/OnRl6sk+w6QRYIMgYeXNXK25FVfE3NPlp/c1u75GU9cXOqO/oRwGJ4VcXJ
 ClGJUJQeGrQx6RNt2380e/SXCnYfnW92FqUVWmYuW1/4uznBbhdJ/OtRQz3eJ3AU
 i8jTWdvATIiQQsTwbFZIjOdQJbLPtFd4g5Q8tBb0sFTSkAnUamAqUeVn/bH9swNI
 9cq3xAcRx6ch1KDREIvO7M+SCrLTvIYcvGUc0QY8tPeUaH0aHXCU4tF3asxFU91P
 RCusvTK8bdjx0K90XsgXexzy45377uTnrwXzJaWlCFp6cZxL+QELLfvUrQ4ZOBFV
 MNilILh9/Beig6rxrzCB4K8A3YoMHAS6PnYyyb5iBLu86xbrxVk+LgUIt3eH7jGb
 n9+9UbPCPzVuMZfXV3bkh6zr/bOYu+LsASOptlOAQp29Kp0R5Al0o7IloDlBlFRX
 q7lIaHneC1IgKolj3Q+w4TXFvAtA6+GPdqkckDKwp9Ho5v7/4PlkuYpzEM67+cTx
 V79mNUyqw7zF/hyvVrew
 =RZWJ
 -----END PGP SIGNATURE-----

Merge tag 'vfio-v5.7-rc1' of git://github.com/awilliam/linux-vfio

Pull VFIO updates from Alex Williamson:

 - vfio-pci SR-IOV support (Alex Williamson)

 - vfio DMA read/write interface (Yan Zhao)

 - Fix vfio-platform erroneous IRQ error log (Eric Auger)

 - Fix shared ATSD support for NVLink on POWER (Sam Bobroff)

 - Fix init error without CONFIG_IOMMU_DMA (Andre Przywara)

* tag 'vfio-v5.7-rc1' of git://github.com/awilliam/linux-vfio:
  vfio: Ignore -ENODEV when getting MSI cookie
  vfio-pci/nvlink2: Allow fallback to ibm,mmio-atsd[0]
  vfio/pci: Cleanup .probe() exit paths
  vfio/pci: Remove dev_fmt definition
  vfio/pci: Add sriov_configure support
  vfio: Introduce VFIO_DEVICE_FEATURE ioctl and first user
  vfio/pci: Introduce VF token
  vfio/pci: Implement match ops
  vfio: Include optional device match in vfio_device_ops callbacks
  vfio: avoid inefficient operations on VFIO group in vfio_pin/unpin_pages
  vfio: introduce vfio_dma_rw to read/write a range of IOVAs
  vfio: allow external user to get vfio group from device
  vfio: platform: Switch to platform_get_irq_optional()
2020-04-04 09:57:42 -07:00
Ingo Molnar
7dc41b9b99 perf/urgent fixes and improvements:
perf python:
 
   Arnaldo Carvalho de Melo:
 
   - Fix clang detection to strip out options passed in $CC.
 
 build:
 
   He Zhe:
 
   - Normalize gcc parameter when generating arch errno table, fixing
     the build by removing options from $(CC).
 
   Sam Lunt:
 
   - Support Python 3.8+ in Makefile.
 
 perf report/top:
 
   Arnaldo Carvalho de Melo:
 
   - Fix title line formatting.
 
 perf script:
 
   Andreas Gerstmayr:
 
   - Fix SEGFAULT when using DWARF mode.
 
   - Fix invalid read of directory entry after closedir(), found with valgrind.
 
   Hagen Paul Pfeifer:
 
   - Introduce --deltatime option.
 
   Stephane Eranian:
 
   - Allow --symbol to accept hexadecimal addresses.
 
   Ian Rogers:
 
   - Add -S/--symbols documentation
 
   Namhyung Kim:
 
   - Add --show-cgroup-events option.
 
 perf python:
 
   Arnaldo Carvalho de Melo:
 
   - Include rwsem.c in the python binding, needed by the cgroups improvements.
 
 build-test:
 
   Arnaldo Carvalho de Melo:
 
   - Honour JOBS to override detection of number of cores
 
 perf top:
 
   Jin Yao:
 
   - Support --group-sort-idx to change the sort order
 
   - perf top: Support hotkey to change sort order
 
 perf pmu-events x86:
 
   Jin Yao:
 
   - Use CPU_CLK_UNHALTED.THREAD in Kernel_Utilization metric
 
 perf symbols arm64:
 
   Kemeng Shi:
 
   - Fix arm64 gap between kernel start and module end
 
 kernel perf subsystem:
 
   Namhyung Kim:
 
   - Add PERF_RECORD_CGROUP event and Add PERF_SAMPLE_CGROUP feature,
     to allow cgroup tracking, saving a link between cgroup path and
     its id number.
 
 perf cgroup:
 
   Namhyung Kim:
 
   - Maintain cgroup hierarchy.
 
 perf report:
 
   Namhyung Kim:
 
   - Add 'cgroup' sort key.
 
 perf record:
 
   Namhyung Kim:
 
   - Support synthesizing cgroup events for pre-existing cgroups.
 
   - Add --all-cgroups option
 
 Documentation:
 
   Tony Jones:
 
   - Update docs regarding kernel/user space unwinding.
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCXodMtwAKCRCyPKLppCJ+
 J+leAP0Ws0dGzMSIwcMVc7zvK1IsYOTlZ8lYXJePxD+Po/YPdAEA6Squf4gwZ2wm
 b9R7w50dlCkMJ9LaueCeZZjh/4asFwQ=
 =auOb
 -----END PGP SIGNATURE-----

Merge tag 'perf-urgent-for-mingo-5.7-20200403' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent

Pull perf/urgent fixes and improvements from Arnaldo Carvalho de Melo:

perf python:

  Arnaldo Carvalho de Melo:

  - Fix clang detection to strip out options passed in $CC.

build:

  He Zhe:

  - Normalize gcc parameter when generating arch errno table, fixing
    the build by removing options from $(CC).

  Sam Lunt:

  - Support Python 3.8+ in Makefile.

perf report/top:

  Arnaldo Carvalho de Melo:

  - Fix title line formatting.

perf script:

  Andreas Gerstmayr:

  - Fix SEGFAULT when using DWARF mode.

  - Fix invalid read of directory entry after closedir(), found with valgrind.

  Hagen Paul Pfeifer:

  - Introduce --deltatime option.

  Stephane Eranian:

  - Allow --symbol to accept hexadecimal addresses.

  Ian Rogers:

  - Add -S/--symbols documentation

  Namhyung Kim:

  - Add --show-cgroup-events option.

perf python:

  Arnaldo Carvalho de Melo:

  - Include rwsem.c in the python binding, needed by the cgroups improvements.

build-test:

  Arnaldo Carvalho de Melo:

  - Honour JOBS to override detection of number of cores

perf top:

  Jin Yao:

  - Support --group-sort-idx to change the sort order

  - perf top: Support hotkey to change sort order

perf pmu-events x86:

  Jin Yao:

  - Use CPU_CLK_UNHALTED.THREAD in Kernel_Utilization metric

perf symbols arm64:

  Kemeng Shi:

  - Fix arm64 gap between kernel start and module end

kernel perf subsystem:

  Namhyung Kim:

  - Add PERF_RECORD_CGROUP event and Add PERF_SAMPLE_CGROUP feature,
    to allow cgroup tracking, saving a link between cgroup path and
    its id number.

perf cgroup:

  Namhyung Kim:

  - Maintain cgroup hierarchy.

perf report:

  Namhyung Kim:

  - Add 'cgroup' sort key.

perf record:

  Namhyung Kim:

  - Support synthesizing cgroup events for pre-existing cgroups.

  - Add --all-cgroups option

Documentation:

  Tony Jones:

  - Update docs regarding kernel/user space unwinding.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-04-04 10:35:15 +02:00
Linus Torvalds
86f26a77cb pci-v5.7-changes
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAl6GTQMUHGJoZWxnYWFz
 QGdvb2dsZS5jb20ACgkQWYigwDrT+vy3PhAAmqpYBRobOsG8QbmKDjoJEFtkqdvD
 z6+4zf/R+hF11RyXjMDwihIe8d+tkQ4eAaYu6Oh5PrTyanz0G0PgeCrivZeytULk
 thqQIWzDQMVA5vN/2/Vy8s5s+3HzP8z/MZOFScJ7+xA1MndXptPRTNmFUbjx+GAv
 x8/pTp0u9AF6m7itX65DxXvwkzjWamt+Ar4Yx2IcuKAU/M5RtfuZO3PpDnqn7/wk
 JFlkRoYeFB6qNnnkPdeyPHl9dALhuhzgdTyklQEnKVW3nf3xThYDhcEwdh6kBQgl
 0dH8lL5LXy7PKGN8RES4wB0Vqndw/HlsCF5O4wkkfItbnbJxGJtS139e5973m0ud
 sgWvF4yJAT2jCKhIeNz34sePQJMyWALhv0XzZCsJ0YeGHsrV1jrHELkwUT1+eIsT
 3UV0iZ6aL06zQJDyKUbbIcQzEQ/wwBC+x9VgsyL54K1quCQZ1N1Nl/dvrb4cRG9m
 m9EhJK/brDf4c0uFlOmMTSxV1t5J+z6ZSQnh1ShD/o5yBsxqN6q5brDT6LEs+jbM
 LsIkA18jJOd4OyiDs98YiFKvIfFQbQ0LEBQpJwhF0snvfBFMMbUYN/T/NYneWON/
 F0TpkFoP7PXDuq55iNaLdnObfzrpC9kdzUyWvePUvjxIl55bkf+/qtUny+H48t4L
 dNggvW052d7BHes=
 =deWu
 -----END PGP SIGNATURE-----

Merge tag 'pci-v5.7-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull pci updates from Bjorn Helgaas:
 "Enumeration:

   - Revert sysfs "rescan" renames that broke apps (Kelsey Skunberg)

   - Add more 32 GT/s link speed decoding and improve the implementation
     (Yicong Yang)

  Resource management:

   - Add support for sizing programmable host bridge apertures and fix a
     related alpha Nautilus regression (Ivan Kokshaysky)

  Interrupts:

   - Add boot interrupt quirk mechanism for Xeon chipsets and document
     boot interrupts (Sean V Kelley)

  PCIe native device hotplug:

   - When possible, disable in-band presence detect and use PDS
     (Alexandru Gagniuc)

   - Add DMI table for devices that don't use in-band presence detection
     but don't advertise that correctly (Stuart Hayes)

   - Fix hang when powering slots up/down via sysfs (Lukas Wunner)

   - Fix an MSI interrupt race (Stuart Hayes)

  Virtualization:

   - Add ACS quirks for Zhaoxin devices (Raymond Pang)

  Error handling:

   - Add Error Disconnect Recover (EDR) support so firmware can report
     devices disconnected via DPC and we can try to recover (Kuppuswamy
     Sathyanarayanan)

  Peer-to-peer DMA:

   - Add Intel Sky Lake-E Root Ports B, C, D to the whitelist (Andrew
     Maier)

  ASPM:

   - Reduce severity of common clock config message (Chris Packham)

   - Clear the correct bits when enabling L1 substates, so we don't go
     to the wrong state (Yicong Yang)

  Endpoint framework:

   - Replace EPF linkup ops with notifier call chain and improve locking
     (Kishon Vijay Abraham I)

   - Fix concurrent memory allocation in OB address region (Kishon Vijay
     Abraham I)

   - Move PF function number assignment to EPC core to support multiple
     function creation methods (Kishon Vijay Abraham I)

   - Fix issue with clearing configfs "start" entry (Kunihiko Hayashi)

   - Fix issue with endpoint MSI-X ignoring BAR Indicator and Table
     Offset (Kishon Vijay Abraham I)

   - Add support for testing DMA transfers (Kishon Vijay Abraham I)

   - Add support for testing > 10 endpoint devices (Kishon Vijay Abraham I)

   - Add support for tests to clear IRQ (Kishon Vijay Abraham I)

   - Add common DT schema for endpoint controllers (Kishon Vijay Abraham I)

  Amlogic Meson PCIe controller driver:

   - Add DT bindings for AXG PCIe PHY, shared MIPI/PCIe analog PHY (Remi
     Pommarel)

   - Add Amlogic AXG PCIe PHY, AXG MIPI/PCIe analog PHY drivers (Remi
     Pommarel)

  Cadence PCIe controller driver:

   - Add Root Complex/Endpoint DT schema for Cadence PCIe (Kishon Vijay
     Abraham I)

  Intel VMD host bridge driver:

   - Add two VMD Device IDs that require bus restriction mode (Sushma
     Kalakota)

  Mobiveil PCIe controller driver:

   - Refactor and modularize mobiveil driver (Hou Zhiqiang)

   - Add support for Mobiveil GPEX Gen4 host (Hou Zhiqiang)

  Microsoft Hyper-V host bridge driver:

   - Add support for Hyper-V PCI protocol version 1.3 and
     PCI_BUS_RELATIONS2 (Long Li)

   - Refactor to prepare for virtual PCI on non-x86 architectures (Boqun
     Feng)

   - Fix memory leak in hv_pci_probe()'s error path (Dexuan Cui)

  NVIDIA Tegra PCIe controller driver:

   - Use pci_parse_request_of_pci_ranges() (Rob Herring)

   - Add support for endpoint mode and related DT updates (Vidya Sagar)

   - Reduce -EPROBE_DEFER error message log level (Thierry Reding)

  Qualcomm PCIe controller driver:

   - Restrict class fixup to specific Qualcomm devices (Bjorn Andersson)

  Synopsys DesignWare PCIe controller driver:

   - Refactor core initialization code for endpoint mode (Vidya Sagar)

   - Fix endpoint MSI-X to use correct table address (Kishon Vijay
     Abraham I)

  TI DRA7xx PCIe controller driver:

   - Fix MSI IRQ handling (Vignesh Raghavendra)

  TI Keystone PCIe controller driver:

   - Allow AM654 endpoint to raise MSI-X interrupt (Kishon Vijay Abraham I)

  Miscellaneous:

   - Quirk ASMedia XHCI USB to avoid "PME# from D0" defect (Kai-Heng
     Feng)

   - Use ioremap(), not phys_to_virt(), for platform ROM to fix video
     ROM mapping with CONFIG_HIGHMEM (Mikel Rychliski)"

* tag 'pci-v5.7-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (96 commits)
  misc: pci_endpoint_test: remove duplicate macro PCI_ENDPOINT_TEST_STATUS
  PCI: tegra: Print -EPROBE_DEFER error message at debug level
  misc: pci_endpoint_test: Use full pci-endpoint-test name in request_irq()
  misc: pci_endpoint_test: Fix to support > 10 pci-endpoint-test devices
  tools: PCI: Add 'e' to clear IRQ
  misc: pci_endpoint_test: Add ioctl to clear IRQ
  misc: pci_endpoint_test: Avoid using module parameter to determine irqtype
  PCI: keystone: Allow AM654 PCIe Endpoint to raise MSI-X interrupt
  PCI: dwc: Fix dw_pcie_ep_raise_msix_irq() to get correct MSI-X table address
  PCI: endpoint: Fix ->set_msix() to take BIR and offset as arguments
  misc: pci_endpoint_test: Add support to get DMA option from userspace
  tools: PCI: Add 'd' command line option to support DMA
  misc: pci_endpoint_test: Use streaming DMA APIs for buffer allocation
  PCI: endpoint: functions/pci-epf-test: Print throughput information
  PCI: endpoint: functions/pci-epf-test: Add DMA support to transfer data
  PCI: pciehp: Fix MSI interrupt race
  PCI: pciehp: Fix indefinite wait on sysfs requests
  PCI: endpoint: Fix clearing start entry in configfs
  PCI: tegra: Add support for PCIe endpoint mode in Tegra194
  PCI: sysfs: Revert "rescan" file renames
  ...
2020-04-03 14:25:02 -07:00
Linus Torvalds
0ad5b053d4 Char/Misc driver patches for 5.7-rc1
Here is the big set of char/misc/other driver patches for 5.7-rc1.
 
 Lots of things in here, and it's later than expected due to some reverts
 to resolve some reported issues.  All is now clean with no reported
 problems in linux-next.
 
 Included in here is:
 	- interconnect updates
 	- mei driver updates
 	- uio updates
 	- nvmem driver updates
 	- soundwire updates
 	- binderfs updates
 	- coresight updates
 	- habanalabs updates
 	- mhi new bus type and core
 	- extcon driver updates
 	- some Kconfig cleanups
 	- other small misc driver cleanups and updates
 
 As mentioned, all have been in linux-next for a while, and with the last
 two reverts, all is calm and good.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXodfvA8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ynzCQCfROhar3E8EhYEqSOP6xq6uhX9uegAnRgGY2rs
 rN4JJpOcTddvZcVlD+vo
 =ocWk
 -----END PGP SIGNATURE-----

Merge tag 'char-misc-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char/misc driver updates from Greg KH:
 "Here is the big set of char/misc/other driver patches for 5.7-rc1.

  Lots of things in here, and it's later than expected due to some
  reverts to resolve some reported issues. All is now clean with no
  reported problems in linux-next.

  Included in here is:
   - interconnect updates
   - mei driver updates
   - uio updates
   - nvmem driver updates
   - soundwire updates
   - binderfs updates
   - coresight updates
   - habanalabs updates
   - mhi new bus type and core
   - extcon driver updates
   - some Kconfig cleanups
   - other small misc driver cleanups and updates

  As mentioned, all have been in linux-next for a while, and with the
  last two reverts, all is calm and good"

* tag 'char-misc-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (174 commits)
  Revert "driver core: platform: Initialize dma_parms for platform devices"
  Revert "amba: Initialize dma_parms for amba devices"
  amba: Initialize dma_parms for amba devices
  driver core: platform: Initialize dma_parms for platform devices
  bus: mhi: core: Drop the references to mhi_dev in mhi_destroy_device()
  bus: mhi: core: Initialize bhie field in mhi_cntrl for RDDM capture
  bus: mhi: core: Add support for reading MHI info from device
  misc: rtsx: set correct pcr_ops for rts522A
  speakup: misc: Use dynamic minor numbers for speakup devices
  mei: me: add cedar fork device ids
  coresight: do not use the BIT() macro in the UAPI header
  Documentation: provide IBM contacts for embargoed hardware
  nvmem: core: remove nvmem_sysfs_get_groups()
  nvmem: core: use is_bin_visible for permissions
  nvmem: core: use device_register and device_unregister
  nvmem: core: add root_only member to nvmem device struct
  extcon: axp288: Add wakeup support
  extcon: Mark extcon_get_edev_name() function as exported symbol
  extcon: palmas: Hide error messages if gpio returns -EPROBE_DEFER
  dt-bindings: extcon: usbc-cros-ec: convert extcon-usbc-cros-ec.txt to yaml format
  ...
2020-04-03 13:22:40 -07:00
Linus Torvalds
d883600523 Merge branch 'for-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:

 - Christian extended clone3 so that processes can be spawned into
   cgroups directly.

   This is not only neat in terms of semantics but also avoids grabbing
   the global cgroup_threadgroup_rwsem for migration.

 - Daniel added !root xattr support to cgroupfs.

   Userland already uses xattrs on cgroupfs for bookkeeping. This will
   allow delegated cgroups to support such usages.

 - Prateek tried to make cpuset hotplug handling synchronous but that
   led to possible deadlock scenarios. Reverted.

 - Other minor changes including release_agent_path handling cleanup.

* 'for-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  docs: cgroup-v1: Document the cpuset_v2_mode mount option
  Revert "cpuset: Make cpuset hotplug synchronous"
  cgroupfs: Support user xattrs
  kernfs: Add option to enable user xattrs
  kernfs: Add removed_size out param for simple_xattr_set
  kernfs: kvmalloc xattr value instead of kmalloc
  cgroup: Restructure release_agent_path handling
  selftests/cgroup: add tests for cloning into cgroups
  clone3: allow spawning processes into cgroups
  cgroup: add cgroup_may_write() helper
  cgroup: refactor fork helpers
  cgroup: add cgroup_get_from_file() helper
  cgroup: unify attach permission checking
  cpuset: Make cpuset hotplug synchronous
  cgroup.c: Use built-in RCU list checking
  kselftest/cgroup: add cgroup destruction test
  cgroup: Clean up css_set task traversal
2020-04-03 11:30:20 -07:00
Linus Torvalds
e964f1e04a dmaengine updates for v5.7-rc1
- Core:
     - Some code cleanup and optimization in core by Andy
     - Debugfs support for displaying dmaengine channels by Peter
 
   - Drivers:
     - New driver for uniphier-xdmac controller
     - Updates to stm32 dma, mdma and dmamux drivers and PM support
     - More updates to idxd drivers
     - Bunch of changes in tegra-apb driver and cleaning up of pm functions
     - Bunch of spelling fixes and Replace zero-length array patches
     - Shutdown hook for fsl-dpaa2-qdma driver
     - Support for interleaved transfers for ti-edma and virtualization
       support for k3-dma driver
     - Support for reset and updates in xilinx_dma driver
     - Improvements and locking updates in at_hdma driver
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+vs47OPLdNbVcHzyfBQHDyUjg0cFAl6FyokACgkQfBQHDyUj
 g0dnFRAAj9lvpflrL+b9eWBZkY1ElV1jAdxsTs4HnYdXQM3ijw8yOosVVSqiuiOy
 2qMfSRTP7qU9gqZ7oa1fnh05DqPmuTc3OF2IZlvGzkU9CiGQ735WGGGG8FfK/dZe
 F4OgQGwA45b47hNIbvM4acwWZYPL+pBuYusKdjdHkouqVM4SORiNM8aRrCJ59xIn
 P9TER//sMpdMEASuRuUIQnXb+OzSNPn1mLiP3zT0XHSM/nBMTAm7AnCDNT/Tjs9f
 hwk2j8rLrwllHGqeZln8cWLhUCPrZFNe5pBWtWyV3MyY/nxlrcUX0ndJUGJIDtsb
 nfXc4QKemOeF1RsC8DsQ/AY8jl6HFvRzWEEkq742IrLPCu/nTnxia4dbXW9MJ0Dp
 BI7IPwoaOoYqBdRkBnSVS2F4x3813egsEReznlu/sUorTIG2g9sWtmuzv6eRt4ow
 HczGgfdJXfCvIKbRg5TIXpbaJogbbB+1YrUlWq9vrZyhVw0ULtfxlWVKDy5VI1cL
 0Kiz/ZIGuoQ9h6E4G3jCpaQTV49tNbYp+vimU9kizmcm+WXrTXR7rgD4AI5tH2DQ
 pxYXNEl4gm1NRtWL1zzJ+B1C0MPXpc1Xafl92W39D6rphEGOdVVzay8meVIaQKDU
 qQaZ1dEK4uuSxwj8NrF7sXHSClafF888FFJBEMArde1HVql/HRU=
 =+UJ7
 -----END PGP SIGNATURE-----

Merge tag 'dmaengine-5.7-rc1' of git://git.infradead.org/users/vkoul/slave-dma

Pull dmaengine updates from Vinod Koul:
 "Core:
   - Some code cleanup and optimization in core by Andy

   - Debugfs support for displaying dmaengine channels by Peter

  Drivers:
   - New driver for uniphier-xdmac controller

   - Updates to stm32 dma, mdma and dmamux drivers and PM support

   - More updates to idxd drivers

   - Bunch of changes in tegra-apb driver and cleaning up of pm
     functions

   - Bunch of spelling fixes and Replace zero-length array patches

   - Shutdown hook for fsl-dpaa2-qdma driver

   - Support for interleaved transfers for ti-edma and virtualization
     support for k3-dma driver

   - Support for reset and updates in xilinx_dma driver

   - Improvements and locking updates in at_hdma driver"

* tag 'dmaengine-5.7-rc1' of git://git.infradead.org/users/vkoul/slave-dma: (89 commits)
  dt-bindings: dma: renesas,usb-dmac: add r8a77961 support
  dmaengine: uniphier-xdmac: Remove redandant error log for platform_get_irq
  dmaengine: tegra-apb: Improve DMA synchronization
  dmaengine: tegra-apb: Don't save/restore IRQ flags in interrupt handler
  dmaengine: tegra-apb: mark PM functions as __maybe_unused
  dmaengine: fix spelling mistake "exceds" -> "exceeds"
  dmaengine: sprd: Set request pending flag when DMA controller is active
  dmaengine: ppc4xx: Use scnprintf() for avoiding potential buffer overflow
  dmaengine: idxd: remove global token limit check
  dmaengine: idxd: reflect shadow copy of traffic class programming
  dmaengine: idxd: Merge definition of dsa_batch_desc into dsa_hw_desc
  dmaengine: Create debug directories for DMA devices
  dmaengine: ti: k3-udma: Implement custom dbg_summary_show for debugfs
  dmaengine: Add basic debugfs support
  dmaengine: fsl-dpaa2-qdma: remove set but not used variable 'dpaa2_qdma'
  dmaengine: ti: edma: fix null dereference because of a typo in pointer name
  dmaengine: fsl-dpaa2-qdma: Adding shutdown hook
  dmaengine: uniphier-xdmac: Add UniPhier external DMA controller driver
  dt-bindings: dmaengine: Add UniPhier external DMA controller bindings
  dmaengine: ti: k3-udma: Implement support for atype (for virtualization)
  ...
2020-04-02 16:04:42 -07:00
Linus Torvalds
8c1b724ddb ARM:
* GICv4.1 support
 * 32bit host removal
 
 PPC:
 * secure (encrypted) using under the Protected Execution Framework
 ultravisor
 
 s390:
 * allow disabling GISA (hardware interrupt injection) and protected
 VMs/ultravisor support.
 
 x86:
 * New dirty bitmap flag that sets all bits in the bitmap when dirty
 page logging is enabled; this is faster because it doesn't require bulk
 modification of the page tables.
 * Initial work on making nested SVM event injection more similar to VMX,
 and less buggy.
 * Various cleanups to MMU code (though the big ones and related
 optimizations were delayed to 5.8).  Instead of using cr3 in function
 names which occasionally means eptp, KVM too has standardized on "pgd".
 * A large refactoring of CPUID features, which now use an array that
 parallels the core x86_features.
 * Some removal of pointer chasing from kvm_x86_ops, which will also be
 switched to static calls as soon as they are available.
 * New Tigerlake CPUID features.
 * More bugfixes, optimizations and cleanups.
 
 Generic:
 * selftests: cleanups, new MMU notifier stress test, steal-time test
 * CSV output for kvm_stat.
 
 KVM/MIPS has been broken since 5.5, it does not compile due to a patch committed
 by MIPS maintainers.  I had already prepared a fix, but the MIPS maintainers
 prefer to fix it in generic code rather than KVM so they are taking care of it.
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAl6GOnIUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroMfxwf/ZKLZiRoaovXCOG71M/eHtQb8ZIqU
 3MPy+On3eC5Sk/aBxWUL9EFZsbYG6kYdbZ1VOvG9XPBoLlnkDSm/IR0kaELHtnjj
 oGVda/tvGn46Ne39y8xBptmb91WDcWH0vFthT/CwlMxAw3xjr+gG7Qyo+8F2CW6m
 SSSuLiHSBnyO1cQKruBTHZ8qnR8LlnfXEqtd6Y4LFLic0LbLIoIdRcT3wjQrcZrm
 Djd7wbTEYZjUfoqZ72ekwEDUsONcDLDSKcguDO9pSMSCGhpxCVT5Vy68KRpoIMs2
 nzNWDKjvqQo5zb2+GWxJgkd12Hv+n7PCXZMbVrWBu1pQsewUns9m4mkpGw==
 =6fGt
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "ARM:
   - GICv4.1 support

   - 32bit host removal

  PPC:
   - secure (encrypted) using under the Protected Execution Framework
     ultravisor

  s390:
   - allow disabling GISA (hardware interrupt injection) and protected
     VMs/ultravisor support.

  x86:
   - New dirty bitmap flag that sets all bits in the bitmap when dirty
     page logging is enabled; this is faster because it doesn't require
     bulk modification of the page tables.

   - Initial work on making nested SVM event injection more similar to
     VMX, and less buggy.

   - Various cleanups to MMU code (though the big ones and related
     optimizations were delayed to 5.8). Instead of using cr3 in
     function names which occasionally means eptp, KVM too has
     standardized on "pgd".

   - A large refactoring of CPUID features, which now use an array that
     parallels the core x86_features.

   - Some removal of pointer chasing from kvm_x86_ops, which will also
     be switched to static calls as soon as they are available.

   - New Tigerlake CPUID features.

   - More bugfixes, optimizations and cleanups.

  Generic:
   - selftests: cleanups, new MMU notifier stress test, steal-time test

   - CSV output for kvm_stat"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (277 commits)
  x86/kvm: fix a missing-prototypes "vmread_error"
  KVM: x86: Fix BUILD_BUG() in __cpuid_entry_get_reg() w/ CONFIG_UBSAN=y
  KVM: VMX: Add a trampoline to fix VMREAD error handling
  KVM: SVM: Annotate svm_x86_ops as __initdata
  KVM: VMX: Annotate vmx_x86_ops as __initdata
  KVM: x86: Drop __exit from kvm_x86_ops' hardware_unsetup()
  KVM: x86: Copy kvm_x86_ops by value to eliminate layer of indirection
  KVM: x86: Set kvm_x86_ops only after ->hardware_setup() completes
  KVM: VMX: Configure runtime hooks using vmx_x86_ops
  KVM: VMX: Move hardware_setup() definition below vmx_x86_ops
  KVM: x86: Move init-only kvm_x86_ops to separate struct
  KVM: Pass kvm_init()'s opaque param to additional arch funcs
  s390/gmap: return proper error code on ksm unsharing
  KVM: selftests: Fix cosmetic copy-paste error in vm_mem_region_move()
  KVM: Fix out of range accesses to memslots
  KVM: X86: Micro-optimize IPI fastpath delay
  KVM: X86: Delay read msr data iff writes ICR MSR
  KVM: PPC: Book3S HV: Add a capability for enabling secure guests
  KVM: arm64: GICv4.1: Expose HW-based SGIs in debugfs
  KVM: arm64: GICv4.1: Allow non-trapping WFI when using HW SGIs
  ...
2020-04-02 15:13:15 -07:00
Bjorn Helgaas
b16f2ab280 Merge branch 'remotes/lorenzo/pci/endpoint'
- Use notification chain instead of EPF linkup ops for EPC events (Kishon
    Vijay Abraham I)

  - Protect concurrent allocation in endpoint outbound address region
    (Kishon Vijay Abraham I)

  - Protect concurrent access to pci_epf_ops (Kishon Vijay Abraham I)

  - Assign function number for each PF in endpoint core (Kishon Vijay
    Abraham I)

  - Refactor endpoint mode core initialization (Vidya Sagar)

  - Add API to notify when core initialization completes (Vidya Sagar)

  - Add test framework support to defer core initialization (Vidya Sagar)

  - Update Tegra SoC ABI header to support uninitialization of UPHY PLL
    when in endpoint mode without reference clock (Vidya Sagar)

  - Add DT and driver support for Tegra194 PCIe endpoint nodes (Vidya
    Sagar)

  - Add endpoint test support for DMA data transfer (Kishon Vijay
    Abraham I)

  - Print throughput information in endpoint test (Kishon Vijay Abraham I)

  - Use streaming DMA APIs for endpoint test buffer allocation (Kishon
    Vijay Abraham I)

  - Add endpoint test command line option for DMA (Kishon Vijay Abraham I)

  - When stopping a controller via configfs, clear endpoint "start" entry
    to prevent WARN_ON (Kunihiko Hayashi)

  - Update endpoint ->set_msix() to pay attention to MSI-X BAR Indicator
    and offset when finding MSI-X tables (Kishon Vijay Abraham I)

  - MSI-X tables are in local memory, not in the PCI address space.  Update
    pcie-designware-ep to account for this (Kishon Vijay Abraham I)

  - Allow AM654 PCIe Endpoint to raise MSI-X interrupts (Kishon Vijay
    Abraham I)

  - Avoid using module parameter to determine irqtype for endpoint test
    (Kishon Vijay Abraham I)

  - Add ioctl to clear IRQ for endpoint test (Kishon Vijay Abraham I)

  - Add endpoint test 'e' option to clear IRQ (Kishon Vijay Abraham I)

  - Bump limit on number of endpoint test devices from 10 to 10,000 (Kishon
    Vijay Abraham I)

  - Use full pci-endpoint-test name in request_irq() for easier profiling
    (Kishon Vijay Abraham I)

  - Reduce log level of -EPROBE_DEFER error messages to debug (Thierry
    Reding)

* remotes/lorenzo/pci/endpoint:
  misc: pci_endpoint_test: remove duplicate macro PCI_ENDPOINT_TEST_STATUS
  PCI: tegra: Print -EPROBE_DEFER error message at debug level
  misc: pci_endpoint_test: Use full pci-endpoint-test name in request_irq()
  misc: pci_endpoint_test: Fix to support > 10 pci-endpoint-test devices
  tools: PCI: Add 'e' to clear IRQ
  misc: pci_endpoint_test: Add ioctl to clear IRQ
  misc: pci_endpoint_test: Avoid using module parameter to determine irqtype
  PCI: keystone: Allow AM654 PCIe Endpoint to raise MSI-X interrupt
  PCI: dwc: Fix dw_pcie_ep_raise_msix_irq() to get correct MSI-X table address
  PCI: endpoint: Fix ->set_msix() to take BIR and offset as arguments
  misc: pci_endpoint_test: Add support to get DMA option from userspace
  tools: PCI: Add 'd' command line option to support DMA
  misc: pci_endpoint_test: Use streaming DMA APIs for buffer allocation
  PCI: endpoint: functions/pci-epf-test: Print throughput information
  PCI: endpoint: functions/pci-epf-test: Add DMA support to transfer data
  PCI: endpoint: Fix clearing start entry in configfs
  PCI: tegra: Add support for PCIe endpoint mode in Tegra194
  dt-bindings: PCI: tegra: Add DT support for PCIe EP nodes in Tegra194
  soc/tegra: bpmp: Update ABI header
  PCI: pci-epf-test: Add support to defer core initialization
  PCI: dwc: Add API to notify core initialization completion
  PCI: endpoint: Add notification for core init completion
  PCI: dwc: Refactor core initialization code for EP mode
  PCI: endpoint: Add core init notifying feature
  PCI: endpoint: Assign function number for each PF in EPC core
  PCI: endpoint: Protect concurrent access to pci_epf_ops with mutex
  PCI: endpoint: Fix for concurrent memory allocation in OB address region
  PCI: endpoint: Replace spinlock with mutex
  PCI: endpoint: Use notification chain mechanism to notify EPC events to EPF
2020-04-02 14:26:57 -05:00
Kishon Vijay Abraham I
475007f9ce misc: pci_endpoint_test: Add ioctl to clear IRQ
Add ioctl to clear IRQ which can be used to free the allocated
IRQ vectors and free the requested IRQ.

Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
2020-04-02 17:57:10 +01:00
Kishon Vijay Abraham I
73c5762652 tools: PCI: Add 'd' command line option to support DMA
Add a new command line option 'd' to use DMA for data transfers.
It should be used with read, write or copy commands.

Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Tested-by: Alan Mikhak <alan.mikhak@sifive.com>
2020-04-02 17:57:10 +01:00
Brian Geffon
e346b38130 mm/mremap: add MREMAP_DONTUNMAP to mremap()
When remapping an anonymous, private mapping, if MREMAP_DONTUNMAP is set,
the source mapping will not be removed.  The remap operation will be
performed as it would have been normally by moving over the page tables to
the new mapping.  The old vma will have any locked flags cleared, have no
pagetables, and any userfaultfds that were watching that range will
continue watching it.

For a mapping that is shared or not anonymous, MREMAP_DONTUNMAP will cause
the mremap() call to fail.  Because MREMAP_DONTUNMAP always results in
moving a VMA you MUST use the MREMAP_MAYMOVE flag, it's not possible to
resize a VMA while also moving with MREMAP_DONTUNMAP so old_len must
always be equal to the new_len otherwise it will return -EINVAL.

We hope to use this in Chrome OS where with userfaultfd we could write an
anonymous mapping to disk without having to STOP the process or worry
about VMA permission changes.

This feature also has a use case in Android, Lokesh Gidra has said that
"As part of using userfaultfd for GC, We'll have to move the physical
pages of the java heap to a separate location.  For this purpose mremap
will be used.  Without the MREMAP_DONTUNMAP flag, when I mremap the java
heap, its virtual mapping will be removed as well.  Therefore, we'll
require performing mmap immediately after.  This is not only time
consuming but also opens a time window where a native thread may call mmap
and reserve the java heap's address range for its own usage.  This flag
solves the problem."

[bgeffon@google.com: v6]
  Link: http://lkml.kernel.org/r/20200218173221.237674-1-bgeffon@google.com
[bgeffon@google.com: v7]
  Link: http://lkml.kernel.org/r/20200221174248.244748-1-bgeffon@google.com
Signed-off-by: Brian Geffon <bgeffon@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Lokesh Gidra <lokeshgidra@google.com>
Reviewed-by: Minchan Kim <minchan@kernel.org>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: "Michael S . Tsirkin" <mst@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Will Deacon <will@kernel.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Sonny Rao <sonnyrao@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Jesse Barnes <jsbarnes@google.com>
Cc: Nathan Chancellor <natechancellor@gmail.com>
Cc: Florian Weimer <fweimer@redhat.com>
Link: http://lkml.kernel.org/r/20200207201856.46070-1-bgeffon@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-02 09:35:30 -07:00
Tiwei Bie
4c8cf31885 vhost: introduce vDPA-based backend
This patch introduces a vDPA-based vhost backend. This backend is
built on top of the same interface defined in virtio-vDPA and provides
a generic vhost interface for userspace to accelerate the virtio
devices in guest.

This backend is implemented as a vDPA device driver on top of the same
ops used in virtio-vDPA. It will create char device entry named
vhost-vdpa-$index for userspace to use. Userspace can use vhost ioctls
on top of this char device to setup the backend.

Vhost ioctls are extended to make it type agnostic and behave like a
virtio device, this help to eliminate type specific API like what
vhost_net/scsi/vsock did:

- VHOST_VDPA_GET_DEVICE_ID: get the virtio device ID which is defined
  by virtio specification to differ from different type of devices
- VHOST_VDPA_GET_VRING_NUM: get the maximum size of virtqueue
  supported by the vDPA device
- VHSOT_VDPA_SET/GET_STATUS: set and get virtio status of vDPA device
- VHOST_VDPA_SET/GET_CONFIG: access virtio config space
- VHOST_VDPA_SET_VRING_ENABLE: enable a specific virtqueue

For memory mapping, IOTLB API is mandated for vhost-vDPA which means
userspace drivers are required to use
VHOST_IOTLB_UPDATE/VHOST_IOTLB_INVALIDATE to add or remove mapping for
a specific userspace memory region.

The vhost-vDPA API is designed to be type agnostic, but it allows net
device only in current stage. Due to the lacking of control virtqueue
support, some features were filter out by vhost-vdpa.

We will enable more features and devices in the near future.

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20200326140125.19794-8-jasowang@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-04-02 10:41:40 -04:00
Rajat Jain
3a85796296 Input: update SPDX tag for input-event-codes.h
Replace the
/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
with
/* SPDX-License-Identifier: GPL-2.0-only WITH Linux-syscall-note */

to help coreboot community consume this file without relaxing their
licensing checks.

Signed-off-by: Rajat Jain <rajatja@google.com>
Link: https://lore.kernel.org/r/20200329172513.133548-1-rajatja@google.com
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2020-04-01 13:27:07 -07:00
Linus Torvalds
29d9f30d4c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from David Miller:
 "Highlights:

   1) Fix the iwlwifi regression, from Johannes Berg.

   2) Support BSS coloring and 802.11 encapsulation offloading in
      hardware, from John Crispin.

   3) Fix some potential Spectre issues in qtnfmac, from Sergey
      Matyukevich.

   4) Add TTL decrement action to openvswitch, from Matteo Croce.

   5) Allow paralleization through flow_action setup by not taking the
      RTNL mutex, from Vlad Buslov.

   6) A lot of zero-length array to flexible-array conversions, from
      Gustavo A. R. Silva.

   7) Align XDP statistics names across several drivers for consistency,
      from Lorenzo Bianconi.

   8) Add various pieces of infrastructure for offloading conntrack, and
      make use of it in mlx5 driver, from Paul Blakey.

   9) Allow using listening sockets in BPF sockmap, from Jakub Sitnicki.

  10) Lots of parallelization improvements during configuration changes
      in mlxsw driver, from Ido Schimmel.

  11) Add support to devlink for generic packet traps, which report
      packets dropped during ACL processing. And use them in mlxsw
      driver. From Jiri Pirko.

  12) Support bcmgenet on ACPI, from Jeremy Linton.

  13) Make BPF compatible with RT, from Thomas Gleixnet, Alexei
      Starovoitov, and your's truly.

  14) Support XDP meta-data in virtio_net, from Yuya Kusakabe.

  15) Fix sysfs permissions when network devices change namespaces, from
      Christian Brauner.

  16) Add a flags element to ethtool_ops so that drivers can more simply
      indicate which coalescing parameters they actually support, and
      therefore the generic layer can validate the user's ethtool
      request. Use this in all drivers, from Jakub Kicinski.

  17) Offload FIFO qdisc in mlxsw, from Petr Machata.

  18) Support UDP sockets in sockmap, from Lorenz Bauer.

  19) Fix stretch ACK bugs in several TCP congestion control modules,
      from Pengcheng Yang.

  20) Support virtual functiosn in octeontx2 driver, from Tomasz
      Duszynski.

  21) Add region operations for devlink and use it in ice driver to dump
      NVM contents, from Jacob Keller.

  22) Add support for hw offload of MACSEC, from Antoine Tenart.

  23) Add support for BPF programs that can be attached to LSM hooks,
      from KP Singh.

  24) Support for multiple paths, path managers, and counters in MPTCP.
      From Peter Krystad, Paolo Abeni, Florian Westphal, Davide Caratti,
      and others.

  25) More progress on adding the netlink interface to ethtool, from
      Michal Kubecek"

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2121 commits)
  net: ipv6: rpl_iptunnel: Fix potential memory leak in rpl_do_srh_inline
  cxgb4/chcr: nic-tls stats in ethtool
  net: dsa: fix oops while probing Marvell DSA switches
  net/bpfilter: remove superfluous testing message
  net: macb: Fix handling of fixed-link node
  net: dsa: ksz: Select KSZ protocol tag
  netdevsim: dev: Fix memory leak in nsim_dev_take_snapshot_write
  net: stmmac: add EHL 2.5Gbps PCI info and PCI ID
  net: stmmac: add EHL PSE0 & PSE1 1Gbps PCI info and PCI ID
  net: stmmac: create dwmac-intel.c to contain all Intel platform
  net: dsa: bcm_sf2: Support specifying VLAN tag egress rule
  net: dsa: bcm_sf2: Add support for matching VLAN TCI
  net: dsa: bcm_sf2: Move writing of CFP_DATA(5) into slicing functions
  net: dsa: bcm_sf2: Check earlier for FLOW_EXT and FLOW_MAC_EXT
  net: dsa: bcm_sf2: Disable learning for ASP port
  net: dsa: b53: Deny enslaving port 7 for 7278 into a bridge
  net: dsa: b53: Prevent tagged VLAN on port 7 for 7278
  net: dsa: b53: Restore VLAN entries upon (re)configuration
  net: dsa: bcm_sf2: Fix overflow checks
  hv_netvsc: Remove unnecessary round_up for recv_completion_cnt
  ...
2020-03-31 17:29:33 -07:00
Linus Torvalds
dfabb077d6 MMC core:
- Add support for host software queue for (e)MMC/SD
  - Throttle polling rate for CMD6
  - Update CMD13 busy condition check for CMD6 commands
  - Improve busy detect polling for erase/trim/discard/HPI
  - Fixup support for HW busy detection for HPI commands
  - Re-work and improve support for eMMC sanitize commands
 
 MMC host:
  - mmci: Add support for sdmmc variant revision 2.0
  - mmci_sdmmc: Improve support for busyend detection
  - mmci_sdmmc: Fixup support for signal voltage switch
  - mmci_sdmmc: Add support for tuning with delay block
  - mtk-sd: Fix another SDIO irq issue
  - sdhci: Disable native card detect when GPIO based type exist
  - sdhci: Add option to defer request completion
  - sdhci_am654: Add support to set a tap value per speed mode
  - sdhci-esdhc-imx: Add support for i.MX8MM based variant
  - sdhci-esdhc-imx: Fixup support for standard tuning on i.MX8 usdhc
  - sdhci-esdhc-imx: Optimize for strobe/clock dll settings
  - sdhci-esdhc-imx: Fixup support for system and runtime suspend/resume
  - sdhci-iproc: Update regulator/bus-voltage management for bcm2711
  - sdhci-msm: Prevent clock gating with PWRSAVE_DLL on broken variants
  - sdhci-msm: Fix management of CQE during SDHCI reset
  - sdhci-of-arasan: Add support for auto tuning on ZynqMP based platforms
  - sdhci-omap: Add support for system suspend/resume
  - sdhci-sprd: Add support for HW busy detection
  - sdhci-sprd: Enable support host software queue
  - sdhci-tegra: Add support for HW busy detection
  - tmio/renesas_sdhi: Enforce retune after runtime suspend
  - renesas_sdhi: Use manual tap correction for HS400 on some variants
  - renesas_sdhi: Add support for manual correction of tap values for tunings
 -----BEGIN PGP SIGNATURE-----
 
 iQJLBAABCgA1FiEEugLDXPmKSktSkQsV/iaEJXNYjCkFAl6CGT8XHHVsZi5oYW5z
 c29uQGxpbmFyby5vcmcACgkQ/iaEJXNYjClFWg/+LzX09vHBOfAu7hT/RokcTaBT
 uQnSAfmhkBI+CZerVulPjDX9lFpG2Jb/fu44Ae9EqOAOESAgsTJpxywRRO2f+aNL
 ie9mc0WOkmz1wuAbqYPJImES0CIL2WNpivovLgquRWyltbneh+ImkCbqoWmDYff7
 uIuIC4EPhrWYJczdKr5RCw6HVbsNEAgAr6oJEbmzC63HciCPx5Zo99FN5WHoyRnf
 3c3Ehc4wkVy5iu/wlXqmRdvuayDHhAAmVq6FP5J3IfuoeES3EYeKHc2Ej+pwhYi9
 IFCrO8RDKEu3/o5hLp60ShhF7N/LGWYsl+5KfrwOQ6YPyMLYawR6L0iTYSqkQijy
 3admTGD4OGFuN/8DvQb0yUwhSpRm/Dj+jBZTP3uk9FJHteFlLNHnzREk7weo8i/R
 2WNDSbbV3+TudfC0uC4ipsHtDoidyds+TvR/ebO53pH2Dcr/z6h7i+1tKczA2rK4
 x9mqXhOsskNZC26/UBb9K2oElRON4XDv+VZdQI5ddDuabIYIswXMWLYD1TGYoX5z
 1PXSrrj/Jl/Sz65ZpabKJOexa24s2uThvpOnrGCy2aDc/tbDpcvVhKwL6NX9iRK0
 yYKpwy9yWCGMryVfLI+ahJpvJfQDY4ufKpLC2429LVvgFvNZDG233ZcZhdlhoLNG
 nWh9qHTGTPWo/213yx0=
 =gILc
 -----END PGP SIGNATURE-----

Merge tag 'mmc-v5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc

Pull MMC updates from Ulf Hansson:
 "MMC core:
   - Add support for host software queue for (e)MMC/SD
   - Throttle polling rate for CMD6
   - Update CMD13 busy condition check for CMD6 commands
   - Improve busy detect polling for erase/trim/discard/HPI
   - Fixup support for HW busy detection for HPI commands
   - Re-work and improve support for eMMC sanitize commands

  MMC host:
   - mmci:
       * Add support for sdmmc variant revision 2.0
   - mmci_sdmmc:
       * Improve support for busyend detection
       * Fixup support for signal voltage switch
       * Add support for tuning with delay block
   - mtk-sd:
       * Fix another SDIO irq issue
   - sdhci:
       * Disable native card detect when GPIO based type exist
   - sdhci:
       * Add option to defer request completion
   - sdhci_am654:
       * Add support to set a tap value per speed mode
   - sdhci-esdhc-imx:
       * Add support for i.MX8MM based variant
       * Fixup support for standard tuning on i.MX8 usdhc
       * Optimize for strobe/clock dll settings
       * Fixup support for system and runtime suspend/resume
   - sdhci-iproc:
       * Update regulator/bus-voltage management for bcm2711
   - sdhci-msm:
       * Prevent clock gating with PWRSAVE_DLL on broken variants
       * Fix management of CQE during SDHCI reset
   - sdhci-of-arasan:
       * Add support for auto tuning on ZynqMP based platforms
   - sdhci-omap:
       * Add support for system suspend/resume
   - sdhci-sprd:
       * Add support for HW busy detection
       * Enable support host software queue
   - sdhci-tegra:
       * Add support for HW busy detection
   - tmio/renesas_sdhi:
       * Enforce retune after runtime suspend
   - renesas_sdhi:
       * Use manual tap correction for HS400 on some variants
       * Add support for manual correction of tap values for tunings"

* tag 'mmc-v5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: (86 commits)
  mmc: cavium-octeon: remove nonsense variable coercion
  mmc: mediatek: fix SDIO irq issue
  mmc: mmci_sdmmc: Fix clear busyd0end irq flag
  dt-bindings: mmc: Fix node name in an example
  mmc: core: Re-work the code for eMMC sanitize
  mmc: sdhci: use FIELD_GET for preset value bit masks
  mmc: sdhci-of-at91: Display clock changes for debug purpose only
  mmc: sdhci: iproc: Add custom set_power() callback for bcm2711
  mmc: sdhci: am654: Use sdhci_set_power_and_voltage()
  mmc: sdhci: at91: Use sdhci_set_power_and_voltage()
  mmc: sdhci: milbeaut: Use sdhci_set_power_and_voltage()
  mmc: sdhci: arasan: Use sdhci_set_power_and_voltage()
  mmc: sdhci: Introduce sdhci_set_power_and_bus_voltage()
  mmc: vub300: Use scnprintf() for avoiding potential buffer overflow
  dt-bindings: mmc: synopsys-dw-mshc: fix clock-freq-min-max in example
  sdhci: tegra: Enable MMC_CAP_WAIT_WHILE_BUSY host capability
  sdhci: tegra: Implement Tegra specific set_timeout callback
  mmc: sdhci-omap: Add Support for Suspend/Resume
  mmc: renesas_sdhi: simplify execute_tuning
  mmc: renesas_sdhi: Use BITS_PER_LONG helper
  ...
2020-03-31 16:13:09 -07:00
Linus Torvalds
15c981d16d for-5.7-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAl6CDIMACgkQxWXV+ddt
 WDuJ9g/+NTVt+OXAX3G4VLAIR6EjugREAmiHPlojM7scKsmkBuH9BN35+2EPj+yS
 rSmdL01nOH3gyqe+RzAc1EEiujH/9uDpkNf4zE1tGtj9m5Useqj8ZNmiG/BN0PmR
 OJZkVb8DXUHEXIFscHjQJPP60kFZoqIovS7qZbDh4992+p98lTiUUEI6SPanVYeR
 QysXxmafty03hQMFW93ohFZemwAELVVI44nHxxcmOHT5BbIIopXrkInkkchB9I6b
 l+tIJx1gjL6k0D3v/TTqRuD+wGCE8InJgtiuEOf0WkHp2YXUlSDaKAnF/j9Le4oe
 eOgc50LtA3YNGmZ2m5vTeRjBeU9qUPWjJWJ2urp87oIrxX5x7B5Hsjxdnn28P0yZ
 dl/dt9HxeCKFgaRrMZYETYq9VBt0IMxiOIG9w5fukB9qnC6Dd05dXyQB0slg0+l1
 chn5p0FtMS74cvXB32jW7N0fwxWNt6KI4zBvomabJGYZQd6+dyDO8l8Od86vvve/
 w7KgRy7CFBjc9JOCyLTvS8eEhu/qAVc07phSblpdNnyzPFjWWTdZySON/qQYvUCf
 cGDiq+5+1d1+kWuEjtYNzvxon2AaAfg7UBZm5FrjN735ojTQXqm2vi3rrurcU5AZ
 ItmiU6DMre5EGZ+hfWgSPXDkeqx/JYbtDuUwWbNg6svTXaKKnmI=
 =1m9l
 -----END PGP SIGNATURE-----

Merge tag 'for-5.7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs updates from David Sterba:
 "A number of core changes that make things work better in general, code
  is simpler and cleaner.

  Core changes:

   - per-inode file extent tree, for in memory tracking of contiguous
     extent ranges to make sure i_size adjustments are accurate

   - tree root structures are protected by reference counts, replacing
     SRCU that did not cover some cases

   - leak detector for tree root structures

   - per-transaction pinned extent tracking

   - buffer heads are replaced by bios for super block access

   - speedup of extent back reference resolution, on an example test
     scenario the runtime of send went down from a hour to minutes

   - factor out locking scheme used for subvolume writer and NOCOW
     exclusion, abstracted as DREW lock, double reader-writer exclusion
     (allow either readers or writers)

   - cleanup and abstract extent allocation policies, preparation for
     zoned device support

   - make reflink/clone_range work on inline extents

   - add more cancellation point for relocation, improves long response
     from 'balance cancel'

   - add page migration callback for data pages

   - switch to guid for uuids, with additional cleanups of the interface

   - make ranged full fsyncs more efficient

   - removal of obsolete ioctl flag BTRFS_SUBVOL_CREATE_ASYNC

   - remove b-tree readahead from delayed refs paths, avoiding seek and
     read unnecessary blocks

  Features:

   - v2 of ioctl to delete subvolumes, allowing to delete by id and more
     future extensions

  Fixes:

   - fix qgroup rescan worker that could block umount

   - fix crash during unmount due to race with delayed inode workers

   - fix dellaloc flushing logic that could create unnecessary chunks
     under heavy load

   - fix missing file extent item for hole after ranged fsync

   - several fixes in relocation error handling

  Other:

   - more documentation of relocation, device replace, space
     reservations

   - many random cleanups"

* tag 'for-5.7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (210 commits)
  btrfs: fix missing semaphore unlock in btrfs_sync_file
  btrfs: use nofs allocations for running delayed items
  btrfs: sysfs: Use scnprintf() instead of snprintf()
  btrfs: do not resolve backrefs for roots that are being deleted
  btrfs: track reloc roots based on their commit root bytenr
  btrfs: restart relocate_tree_blocks properly
  btrfs: reloc: reorder reservation before root selection
  btrfs: do not readahead in build_backref_tree
  btrfs: do not use readahead for running delayed refs
  btrfs: Remove async_transid from btrfs_mksubvol/create_subvol/create_snapshot
  btrfs: Remove transid argument from btrfs_ioctl_snap_create_transid
  btrfs: Remove BTRFS_SUBVOL_CREATE_ASYNC support
  btrfs: kill the subvol_srcu
  btrfs: make btrfs_cleanup_fs_roots use the radix tree lock
  btrfs: don't take an extra root ref at allocation time
  btrfs: hold a ref on the root on the dead roots list
  btrfs: make inodes hold a ref on their roots
  btrfs: move the root freeing stuff into btrfs_put_root
  btrfs: move ino_cache_inode dropping out of btrfs_free_fs_root
  btrfs: make the extent buffer leak check per fs info
  ...
2020-03-31 13:00:16 -07:00
Linus Torvalds
1455c69900 fscrypt updates for 5.7
Add an ioctl FS_IOC_GET_ENCRYPTION_NONCE which retrieves a file's
 encryption nonce.  This makes it easier to write automated tests which
 verify that fscrypt is doing the encryption correctly.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQSacvsUNc7UX4ntmEPzXCl4vpKOKwUCXoIg/RQcZWJpZ2dlcnNA
 Z29vZ2xlLmNvbQAKCRDzXCl4vpKOK2mZAQDjEil0Kf8AqZhjPuJSRrbifkzEPfu+
 4EmERSyBZ5OCLgEA155kKnL5jiz7b5DRS9wGEw+drGpW8I7WfhTGv/XjoQs=
 =2jU9
 -----END PGP SIGNATURE-----

Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt

Pull fscrypt updates from Eric Biggers:
 "Add an ioctl FS_IOC_GET_ENCRYPTION_NONCE which retrieves a file's
  encryption nonce.

  This makes it easier to write automated tests which verify that
  fscrypt is doing the encryption correctly"

* tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt:
  ubifs: wire up FS_IOC_GET_ENCRYPTION_NONCE
  f2fs: wire up FS_IOC_GET_ENCRYPTION_NONCE
  ext4: wire up FS_IOC_GET_ENCRYPTION_NONCE
  fscrypt: add FS_IOC_GET_ENCRYPTION_NONCE ioctl
2020-03-31 12:58:36 -07:00
Paolo Bonzini
4f4af841f0 KVM PPC update for 5.7
* Add a capability for enabling secure guests under the Protected
   Execution Framework ultravisor
 
 * Various bug fixes and cleanups.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQEcBAABCAAGBQJegnq3AAoJEJ2a6ncsY3GfrU8IANpaxS7kTAsW3ZN0wJnP2PHq
 i8j7ZPlCVLpkQWArrMyMimDLdiN9VP7lvaWonAWjG0HJrxlmMUUVnwVSMuPTXhWY
 vSwEqhUXwSF5KKxN7DYkgxqaKFxkElJIubVE/AYJjH9zFpu9ca4vM5sCxnzWcvS3
 hxPNe756nKhFH9xhrC/9NRUZWmDAiv75wvzq+5DRKbSVPJGJugchdIPBbi3Yrr7P
 qxnUCPZdCxzXXU94bfQl038wrjSMR3S7b4FvekJ12go2FalujqzsL2lVtift4Rf0
 jvu+RIINcmTiaQqclY332j7l24LZ6Pni456RygT5OwFxuYoxWKZRafN16JmaMCo=
 =rpJL
 -----END PGP SIGNATURE-----

Merge tag 'kvm-ppc-next-5.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD

KVM PPC update for 5.7

* Add a capability for enabling secure guests under the Protected
  Execution Framework ultravisor

* Various bug fixes and cleanups.
2020-03-31 10:45:49 -04:00
Paolo Bonzini
cf39d37539 KVM/arm updates for Linux 5.7
- GICv4.1 support
 - 32bit host removal
 -----BEGIN PGP SIGNATURE-----
 
 iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAl6DKKIPHG1hekBrZXJu
 ZWwub3JnAAoJECPQ0LrRPXpDDe0P/30Oda6HJdcUY+g0dnHkH8N7t+VKjPPnihlX
 WBaT0Y4SzMsfAtG5lQqS48A50dXKWW70QvwkZjxu7abQhYFWGd2SGtTQxwqJXT8J
 I6MBh4r9xrIfiqzVT2BXslA6id5H6wCyyFI6vKm/IFkIu1J6JtwnKakQ0CIddS1d
 Blbgj5jcxGw+2xOppHCQXbWwwDdmYWkMZEBZjmhkezddqLDK+oaAUiUhHHHizTsB
 kLjgqYBVENpR1zDIsGpQAJloKXAiHfBQshQAmnhnBNzXE60LZ0n0/iODU9U5FDEO
 5j0DRWccKvsIMsUh7JpPr5xerGJ0rqk1IwPC2JcyzfRbvRLMpK1IOWfhI5Tg5lbP
 4Ev96QLEMBnKOWMSE0MqnMdq6JPzDLA6WZ28HZe2nc3/oWNgsSDtlXigx4xFFxTX
 zfc2YpAgFu3xJkPf8PtWTFvItm0AvFNFynPg0Rr/NsGf/FGeszYR4cLcHmv5NlWS
 IiV4+lgnlmr2LZr3VjUaumbtWIpuVF4Db5Al2K2E/PCN7ObfEkyCweDic8ophkH8
 sMS9TI38aH1Efy+I2Nfxxqpy8BcElZAMrAWt9R27A4JRLHdr7j5DsGnyRigXHgRe
 pFgbqtk/EjWkHwjaJVg8kPxf2+2P05VZsQeGG721nbKAIKDetM3RA2BflexdsptY
 kXplNsVr
 =eILh
 -----END PGP SIGNATURE-----

Merge tag 'kvmarm-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm updates for Linux 5.7

- GICv4.1 support
- 32bit host removal
2020-03-31 10:44:53 -04:00
David S. Miller
ed52f2c608 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30 19:52:37 -07:00
David S. Miller
d9679cd985 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter/IPVS updates for net-next

The following patchset contains Netfilter/IPVS updates for net-next:

1) Add support to specify a stateful expression in set definitions,
   this allows users to specify e.g. counters per set elements.

2) Flowtable software counter support.

3) Flowtable hardware offload counter support, from wenxu.

3) Parallelize flowtable hardware offload requests, from Paul Blakey.
   This includes a patch to add one work entry per offload command.

4) Several patches to rework nf_queue refcount handling, from Florian
   Westphal.

4) A few fixes for the flowtable tunnel offload: Fix crash if tunneling
   information is missing and set up indirect flow block as TC_SETUP_FT,
   patch from wenxu.

5) Stricter netlink attribute sanity check on filters, from Romain Bellan
   and Florent Fourcot.

5) Annotations to make sparse happy, from Jules Irenge.

6) Improve icmp errors in debugging information, from Haishuang Yan.

7) Fix warning in IPVS icmp error debugging, from Haishuang Yan.

8) Fix endianess issue in tcp extension header, from Sergey Marinkevich.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30 19:40:46 -07:00
Ido Schimmel
1e8c661989 devlink: Add packet trap policers support
Devices capable of offloading the kernel's datapath and perform
functions such as bridging and routing must also be able to send (trap)
specific packets to the kernel (i.e., the CPU) for processing.

For example, a device acting as a multicast-aware bridge must be able to
trap IGMP membership reports to the kernel for processing by the bridge
module.

In most cases, the underlying device is capable of handling packet rates
that are several orders of magnitude higher compared to those that can
be handled by the CPU.

Therefore, in order to prevent the underlying device from overwhelming
the CPU, devices usually include packet trap policers that are able to
police the trapped packets to rates that can be handled by the CPU.

This patch allows capable device drivers to register their supported
packet trap policers with devlink. User space can then tune the
parameters of these policer (currently, rate and burst size) and read
from the device the number of packets that were dropped by the policer,
if supported.

Subsequent patches in the series will allow device drivers to create
default binding between these policers and packet trap groups and allow
user space to change the binding.

v2:
* Add 'strict_start_type' in devlink policy
* Have device drivers provide max/min rate/burst size for each policer.
  Use them to check validity of user provided parameters

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30 17:54:58 -07:00
Andrii Nakryiko
0c991ebc8c bpf: Implement bpf_prog replacement for an active bpf_cgroup_link
Add new operation (LINK_UPDATE), which allows to replace active bpf_prog from
under given bpf_link. Currently this is only supported for bpf_cgroup_link,
but will be extended to other kinds of bpf_links in follow-up patches.

For bpf_cgroup_link, implemented functionality matches existing semantics for
direct bpf_prog attachment (including BPF_F_REPLACE flag). User can either
unconditionally set new bpf_prog regardless of which bpf_prog is currently
active under given bpf_link, or, optionally, can specify expected active
bpf_prog. If active bpf_prog doesn't match expected one, no changes are
performed, old bpf_link stays intact and attached, operation returns
a failure.

cgroup_bpf_replace() operation is resolving race between auto-detachment and
bpf_prog update in the same fashion as it's done for bpf_link detachment,
except in this case update has no way of succeeding because of target cgroup
marked as dying. So in this case error is returned.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200330030001.2312810-3-andriin@fb.com
2020-03-30 17:36:33 -07:00
Andrii Nakryiko
af6eea5743 bpf: Implement bpf_link-based cgroup BPF program attachment
Implement new sub-command to attach cgroup BPF programs and return FD-based
bpf_link back on success. bpf_link, once attached to cgroup, cannot be
replaced, except by owner having its FD. Cgroup bpf_link supports only
BPF_F_ALLOW_MULTI semantics. Both link-based and prog-based BPF_F_ALLOW_MULTI
attachments can be freely intermixed.

To prevent bpf_cgroup_link from keeping cgroup alive past the point when no
BPF program can be executed, implement auto-detachment of link. When
cgroup_bpf_release() is called, all attached bpf_links are forced to release
cgroup refcounts, but they leave bpf_link otherwise active and allocated, as
well as still owning underlying bpf_prog. This is because user-space might
still have FDs open and active, so bpf_link as a user-referenced object can't
be freed yet. Once last active FD is closed, bpf_link will be freed and
underlying bpf_prog refcount will be dropped. But cgroup refcount won't be
touched, because cgroup is released already.

The inherent race between bpf_cgroup_link release (from closing last FD) and
cgroup_bpf_release() is resolved by both operations taking cgroup_mutex. So
the only additional check required is when bpf_cgroup_link attempts to detach
itself from cgroup. At that time we need to check whether there is still
cgroup associated with that link. And if not, exit with success, because
bpf_cgroup_link was already successfully detached.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Roman Gushchin <guro@fb.com>
Link: https://lore.kernel.org/bpf/20200330030001.2312810-2-andriin@fb.com
2020-03-30 17:35:59 -07:00
Linus Torvalds
9b82f05f86 Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
 "The main changes in this cycle were:

  Kernel side changes:

   - A couple of x86/cpu cleanups and changes were grandfathered in due
     to patch dependencies. These clean up the set of CPU model/family
     matching macros with a consistent namespace and C99 initializer
     style.

   - A bunch of updates to various low level PMU drivers:
       * AMD Family 19h L3 uncore PMU
       * Intel Tiger Lake uncore support
       * misc fixes to LBR TOS sampling

   - optprobe fixes

   - perf/cgroup: optimize cgroup event sched-in processing

   - misc cleanups and fixes

  Tooling side changes are to:

   - perf {annotate,expr,record,report,stat,test}

   - perl scripting

   - libapi, libperf and libtraceevent

   - vendor events on Intel and S390, ARM cs-etm

   - Intel PT updates

   - Documentation changes and updates to core facilities

   - misc cleanups, fixes and other enhancements"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (89 commits)
  cpufreq/intel_pstate: Fix wrong macro conversion
  x86/cpu: Cleanup the now unused CPU match macros
  hwrng: via_rng: Convert to new X86 CPU match macros
  crypto: Convert to new CPU match macros
  ASoC: Intel: Convert to new X86 CPU match macros
  powercap/intel_rapl: Convert to new X86 CPU match macros
  PCI: intel-mid: Convert to new X86 CPU match macros
  mmc: sdhci-acpi: Convert to new X86 CPU match macros
  intel_idle: Convert to new X86 CPU match macros
  extcon: axp288: Convert to new X86 CPU match macros
  thermal: Convert to new X86 CPU match macros
  hwmon: Convert to new X86 CPU match macros
  platform/x86: Convert to new CPU match macros
  EDAC: Convert to new X86 CPU match macros
  cpufreq: Convert to new X86 CPU match macros
  ACPI: Convert to new X86 CPU match macros
  x86/platform: Convert to new CPU match macros
  x86/kernel: Convert to new CPU match macros
  x86/kvm: Convert to new CPU match macros
  x86/perf/events: Convert to new CPU match macros
  ...
2020-03-30 16:40:08 -07:00
Linus Torvalds
db34c5ffee USB / PHY patches for 5.7-rc1
Here are the big set of USB and PHY driver patches for 5.7-rc1.
 
 Nothing huge here, some new PHY drivers, loads of USB gadget fixes and
 updates, xhci updates, usb-serial driver updates and new device ids, and
 other minor things.  Full details in the shortlog.
 
 All have been in linux-next for a while with no reported issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXoHL9w8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ymz6wCcDwDTZouXj+0B37q+kwlCQQPyLukAn2CxKfrM
 d+wScRHWoZutA8IdzqaU
 =5+jn
 -----END PGP SIGNATURE-----

Merge tag 'usb-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB / PHY updates from Greg KH:
 "Here are the big set of USB and PHY driver patches for 5.7-rc1.

  Nothing huge here, some new PHY drivers, loads of USB gadget fixes and
  updates, xhci updates, usb-serial driver updates and new device ids,
  and other minor things. Full details in the shortlog.

  All have been in linux-next for a while with no reported issues"

* tag 'usb-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (239 commits)
  USB: cdc-acm: restore capability check order
  usb: cdns3: make signed 1 bit bitfields unsigned
  usb: gadget: fsl: remove unused variable 'driver_desc'
  usb: gadget: f_fs: Fix use after free issue as part of queue failure
  usb: typec: Correct the documentation for typec_cable_put()
  USB: serial: io_edgeport: fix slab-out-of-bounds read in edge_interrupt_callback
  USB: serial: option: add Wistron Neweb D19Q1
  USB: serial: option: add BroadMobi BM806U
  USB: serial: option: add support for ASKEY WWHC050
  usb: core: Add ACPI support for USB interface devices
  driver core: platform: Reimplement devm_platform_ioremap_resource
  usb: dwc2: convert to devm_platform_get_and_ioremap_resource
  usb: host: hisilicon: convert to devm_platform_get_and_ioremap_resource
  usb: host: xhci-plat: convert to devm_platform_get_and_ioremap_resource
  drivers: provide devm_platform_get_and_ioremap_resource()
  phy: qcom-qusb2: Add new overriding tuning parameters in QUSB2 V2 PHY
  phy: qcom-qusb2: Add support for overriding tuning parameters in QUSB2 V2 PHY
  dt-bindings: phy: qcom-qusb2: Add support for overriding Phy tuning parameters
  phy: qcom-qusb2: Add generic QUSB2 V2 PHY support
  dt-bindings: phy: qcom,qusb2: Add compatibles for QUSB2 V2 phy and SC7180
  ...
2020-03-30 13:54:11 -07:00
Joe Stringer
cf7fbe660f bpf: Add socket assign support
Add support for TPROXY via a new bpf helper, bpf_sk_assign().

This helper requires the BPF program to discover the socket via a call
to bpf_sk*_lookup_*(), then pass this socket to the new helper. The
helper takes its own reference to the socket in addition to any existing
reference that may or may not currently be obtained for the duration of
BPF processing. For the destination socket to receive the traffic, the
traffic must be routed towards that socket via local route. The
simplest example route is below, but in practice you may want to route
traffic more narrowly (eg by CIDR):

  $ ip route add local default dev lo

This patch avoids trying to introduce an extra bit into the skb->sk, as
that would require more invasive changes to all code interacting with
the socket to ensure that the bit is handled correctly, such as all
error-handling cases along the path from the helper in BPF through to
the orphan path in the input. Instead, we opt to use the destructor
variable to switch on the prefetch of the socket.

Signed-off-by: Joe Stringer <joe@wand.net.nz>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200329225342.16317-2-joe@wand.net.nz
2020-03-30 13:45:04 -07:00
Linus Torvalds
063d194224 media updates for v5.7-rc1
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+QmuaPwR3wnBdVwACF8+vY7k4RUFAl6Bvm0ACgkQCF8+vY7k
 4RViMQ//baNJyAA8/Hpxz1w5+nL5hsTOhf2PcPgnCLkRVQnIiKMZoq7AS5KWtbqu
 im0SM6nWduGc2T/44Ew13YmBnuWMdXL9Gs8XtAkNakgSV1UM+A3pRuOWRYCyU3Ts
 1QDsc3N7oY9cvyJGWOlqdcA4gp4AaKxjS6M6Z18wYBk/jYSCcj4ZVecT89DYeeM7
 wFORkv/xSdgC3eoKWEwTyglzUmrXKrbfHdcNWrQBg+1SN3WrMYQWCL6nSYMqn0Vu
 f9L5E6jUSx9s6+apxS0OUQmDj78RM1JCEY1P8lgc3tAtVJ+X3yZbxwtpcvujhFPv
 c48NUQeyxAJc7evarvkd73Gwl4buujqHSgiRUovHwqUXHJuGZ3PBTryV9HzbmYYy
 EeHS/23t09F3j9zYtuoDNFIED03Mi+TNeS04cq8OIfwNl7xpUSEV0S/wd11V2308
 cfm6lsogGE9HRbaIxCHgx4AiGFVhbpK1OQt66iYze8r/wyxnN8MVOHGWw+eI4LRK
 9gwh7Wx37k6uCrjfOnLSgx7kcJ+mxSZEYyHJZqqtPm9H1SC68GOxhL/S3Zu7arvK
 eiwFfxJBiunCEfauOx28kaAdvBZVyEvYeDFYl/k+q4DCIGjvK0FXud6QRjNXv24S
 qUXYZKPUALTFOpbkQ3IQiBOQNM4NhF15RzCqRUptVnlF05MSywg=
 =Ve8R
 -----END PGP SIGNATURE-----

Merge tag 'media/v5.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media

Pull media updates from Mauro Carvalho Chehab:

 - New sensor driver: imx219

 - Support for some new pixelformats

 - Support for Sun8i SoC

 - Added more codecs to meson vdec driver

 - Prepare for removing the legacy usbvision driver by moving it to
   staging. This driver has issues and use legacy core APIs. If nobody
   steps up to address those, it is time for its retirement.

 - Several cleanups and improvements on drivers, with the addition of
   new supported boards

* tag 'media/v5.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (236 commits)
  media: venus: firmware: Ignore secure call error on first resume
  media: mtk-vpu: load vpu firmware from the new location
  media: i2c: video-i2c: fix build errors due to 'imply hwmon'
  media: MAINTAINERS: add myself to co-maintain Hantro G1/G2 for i.MX8MQ
  media: hantro: add initial i.MX8MQ support
  media: dt-bindings: Document i.MX8MQ VPU bindings
  media: vivid: fix incorrect PA assignment to HDMI outputs
  media: hantro: Add linux-rockchip mailing list to MAINTAINERS
  media: cedrus: h264: Fix 4K decoding on H6
  media: siano: Use scnprintf() for avoiding potential buffer overflow
  media: rc: Use scnprintf() for avoiding potential buffer overflow
  media: allegro: create new struct for channel parameters
  media: allegro: move mail definitions to separate file
  media: allegro: pass buffers through firmware
  media: allegro: verify source and destination buffer in VCU response
  media: allegro: handle dependency of bitrate and bitrate_peak
  media: allegro: read bitrate mode directly from control
  media: allegro: make QP configurable
  media: allegro: make frame rate configurable
  media: allegro: skip filler data if possible
  ...
2020-03-30 13:42:05 -07:00
Linus Torvalds
78b0dedd52 updates for seccomp
- allow TSYNC and USER_NOTIF together (Tycho Andersen)
 - Add missing compat_ioctl for notify (Sven Schnelle)
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAl6BcbwWHGtlZXNjb29r
 QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJuhfD/9NjC6h+l9YNHW2O3bYPxDDjkJq
 1aRInf+/UayTnfwhlZiJj2FFYyPR1gfZXB9CPcniYO/t6tsCdc+0kQdc3uUCPb0y
 ClPp5Pc/u/SwEFgrj5gv/NsEwAVaTPy1ioagefZMENQXj77XcfifF5Mrave+lR3K
 TiZsFItucIRTiEb8YY4xF/t5rn/lBvAqDiYNZwYYVcopnW3kgvOljz6ZRyOstV/B
 J9QrErFfDH9SzPfK/1bZ5GbCUsTRzbGXA281UBhZdkJQaA3yoqK+yv/xKtoaX0WK
 uxLPt2BG3qb21+8JZacJ2L6KQAwm5EdT+OyLyFzUYki23LsJNHEb+UpkoRnyJg5H
 sSSZRj14WH5aK1REGTDLr5tgx5lxkXx/iuxYc4tuM56ToWS4hXNiQFU2cUcdqjSO
 bSKVg1LO9FfTTMecYXUqljoOwAKMVra2nDNCpvkBr/1JMVFZjCfWpjy3ZvHHwpqt
 BpxgfJW250HfnMWpa7k5p6bIP+WMwetwP1yGZx6xNz8j3ZSshIPUqCvTU6zD89CN
 RXHMfnZOxNtq1biI41Ppc5/kCt2t4598BaGsWIWcjhhY8p5Ttq+HGs3tOPsuUXen
 ccGAa/1Co0u5CCxudG4nZ2a/ooeijMx7D5HfvoYvHDQbugR/x4aSZuiw7JiTKBHr
 EZCFZyxvFVqnqlzQ2Q==
 =Ejyd
 -----END PGP SIGNATURE-----

Merge tag 'seccomp-v5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull seccomp updates from Kees Cook:
 "A couple of seccomp updates. They're both mostly bug fixes that I
  wanted to have sit in linux-next for a while:

   - allow TSYNC and USER_NOTIF together (Tycho Andersen)

   - add missing compat_ioctl for notify (Sven Schnelle)"

* tag 'seccomp-v5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  seccomp: Add missing compat_ioctl for notify
  seccomp: allow TSYNC and USER_NOTIF together
2020-03-30 12:53:56 -07:00
Linus Torvalds
e59cd88028 for-5.7/io_uring-2020-03-29
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl6BJEMQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpie7D/9gN4zhykYDfcgamfxMtTbpla2PdTnWoJxP
 fjy/Nx2FySakmccaiCGQSQ1rzD1L67UQkJgEH6hPTomJvA4FaOmJ+ZSaExMy55LH
 ZT+nD3zQ9SCuA0DEpfxbsCP1tbnoXSMQNt8Tyh0x8PAoxp5bI0eRczOju1QWLWTS
 tjBEMZNipN6krrV9RPWT0S5Z31/yGr/sXprCSHFV9Ypzwrx58Tj2i6F9gR7FVbLs
 nV2/O8taEn0sMQIz8TVHKol/TBalluGrC4M/bOeS3faP3BPN4TT24Gtc0LAKEibk
 F49/SX7FzwhOdl43Bdkbe2bbL86p+zOLSf0IMBwMm0DJl4aiOljRUYTSYRolgGgm
 Ebw9QhemTwbxxeD2nEriA4EAeYvTx69RDlN2eVilwwfJ48Xz9fVm3GNYG7LISeON
 k3/TyZOBQH2SZ2Hc3oF2Mq9j1UPHXZHUUsUNlNcN+aM9SFHcWkRi6xZWemTJHJZ4
 zFss5RZHo0+RLBa8rrx8xaO8iWrc73+FuRhr9eSsmyPIj+OZ4ezEFRRRHwtk2fgv
 dZvD413AyCI1c+3LlBusESMsrtXyY8p9O9buNTzHy3ZUtHe0ERmYV2m/a83A5pXo
 Kia/5aJbPIC61bAkCCkiVo+W9OASJ6o5+3CXl5sM9lGTbDXjcofzewmd+RHPestx
 xVbzeR9UIw==
 =bYLJ
 -----END PGP SIGNATURE-----

Merge tag 'for-5.7/io_uring-2020-03-29' of git://git.kernel.dk/linux-block

Pull io_uring updates from Jens Axboe:
 "Here are the io_uring changes for this merge window. Light on new
  features this time around (just splice + buffer selection), lots of
  cleanups, fixes, and improvements to existing support. In particular,
  this contains:

   - Cleanup fixed file update handling for stack fallback (Hillf)

   - Re-work of how pollable async IO is handled, we no longer require
     thread offload to handle that. Instead we rely using poll to drive
     this, with task_work execution.

   - In conjunction with the above, allow expendable buffer selection,
     so that poll+recv (for example) no longer has to be a split
     operation.

   - Make sure we honor RLIMIT_FSIZE for buffered writes

   - Add support for splice (Pavel)

   - Linked work inheritance fixes and optimizations (Pavel)

   - Async work fixes and cleanups (Pavel)

   - Improve io-wq locking (Pavel)

   - Hashed link write improvements (Pavel)

   - SETUP_IOPOLL|SETUP_SQPOLL improvements (Xiaoguang)"

* tag 'for-5.7/io_uring-2020-03-29' of git://git.kernel.dk/linux-block: (54 commits)
  io_uring: cleanup io_alloc_async_ctx()
  io_uring: fix missing 'return' in comment
  io-wq: handle hashed writes in chains
  io-uring: drop 'free_pfile' in struct io_file_put
  io-uring: drop completion when removing file
  io_uring: Fix ->data corruption on re-enqueue
  io-wq: close cancel gap for hashed linked work
  io_uring: make spdxcheck.py happy
  io_uring: honor original task RLIMIT_FSIZE
  io-wq: hash dependent work
  io-wq: split hashing and enqueueing
  io-wq: don't resched if there is no work
  io-wq: remove duplicated cancel code
  io_uring: fix truncated async read/readv and write/writev retry
  io_uring: dual license io_uring.h uapi header
  io_uring: io_uring_enter(2) don't poll while SETUP_IOPOLL|SETUP_SQPOLL enabled
  io_uring: Fix unused function warnings
  io_uring: add end-of-bits marker and build time verify it
  io_uring: provide means of removing buffers
  io_uring: add IOSQE_BUFFER_SELECT support for IORING_OP_RECVMSG
  ...
2020-03-30 12:18:49 -07:00
Linus Torvalds
1592614838 for-5.7/drivers-2020-03-29
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl6BJDYQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgplhMD/95jd4nlVetHAo54z+Zk2ExE13+yDamRKyh
 vc7t2tz1reqFOimtVr5aVuTXCTgOx4CpiIox5qcn6qAExN4JtCChOBRGize/0u8S
 ckxnhHbN2C0rfnGldvrYYeNRonFI+7QKimnurWUSYYGN0xqbo21BxJ7dFaohMseo
 q4K8sIW0ctE6AOlw28Jerkg614s2NDGZ7q1laheXnYHn5c9f1m0NaKN/jyTGgr0X
 TLBiLbX2yRrAuvpctBj6Fna6YN7Vdd9jsf2Bt6ipUI1XgHQoVUGMxQNhWPyjsbSv
 GzRQUNAfVcasLzCP/Mj/47144OkUtDDpn2mjeXDaFljLDGFULD+jp/SsOmLCxkPC
 gI7G2yfBvF96/SOyT0JXrLyMcBd1R2vRoASbc5tPu82mZhx7YJZH5WYtOB9h2gra
 RTYo3xcm0EoN6yeMaH+xOuXxTWWInIrgKPONW4H8s7hxEiMt5oFNVBI7vqPr4LVp
 tpfxiKZDavKOofKXogNV4W7mSMP/Ir5Q9Ha4g5SXHBGp0z/PHmnQ0xDGNq0KDnU4
 eNO0UYCFNCNa+0AOhpNxaVuVm9LjrgvyXRjePgOZQ4akhohwHO6DLrHK1f8Hb1vD
 8Ih6uR+F5zZlKsouWro8HLGYm5w40Wq9tbCI8QbPYH6nkGoDmzpPv9jbAeWgJU5c
 KqP/5TBSLA==
 =Bs4E
 -----END PGP SIGNATURE-----

Merge tag 'for-5.7/drivers-2020-03-29' of git://git.kernel.dk/linux-block

Pull block driver updates from Jens Axboe:

 - floppy driver cleanup series from Willy

 - NVMe updates and fixes (Various)

 - null_blk trace improvements (Chaitanya)

 - bcache fixes (Coly)

 - md fixes (via Song)

 - loop block size change optimizations (Martijn)

 - scnprintf() use (Takashi)

* tag 'for-5.7/drivers-2020-03-29' of git://git.kernel.dk/linux-block: (81 commits)
  null_blk: add trace in null_blk_zoned.c
  null_blk: add tracepoint helpers for zoned mode
  block: add a zone condition debug helper
  nvme: cleanup namespace identifier reporting in nvme_init_ns_head
  nvme: rename __nvme_find_ns_head to nvme_find_ns_head
  nvme: refactor nvme_identify_ns_descs error handling
  nvme-tcp: Add warning on state change failure at nvme_tcp_setup_ctrl
  nvme-rdma: Add warning on state change failure at nvme_rdma_setup_ctrl
  nvme: Fix controller creation races with teardown flow
  nvme: Make nvme_uninit_ctrl symmetric to nvme_init_ctrl
  nvme: Fix ctrl use-after-free during sysfs deletion
  nvme-pci: Re-order nvme_pci_free_ctrl
  nvme: Remove unused return code from nvme_delete_ctrl_sync
  nvme: Use nvme_state_terminal helper
  nvme: release ida resources
  nvme: Add compat_ioctl handler for NVME_IOCTL_SUBMIT_IO
  nvmet-tcp: optimize tcp stack TX when data digest is used
  nvme-fabrics: Use scnprintf() for avoiding potential buffer overflow
  nvme-multipath: do not reset on unknown status
  nvmet-rdma: allocate RW ctxs according to mdts
  ...
2020-03-30 11:43:51 -07:00
Eran Ben Elisha
48bb52c80b devlink: Add auto dump flag to health reporter
On low memory system, run time dumps can consume too much memory. Add
administrator ability to disable auto dumps per reporter as part of the
error flow handle routine.

This attribute is not relevant while executing
DEVLINK_CMD_HEALTH_REPORTER_DUMP_GET.

By default, auto dump is activated for any reporter that has a dump method,
as part of the reporter registration to devlink.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30 11:17:34 -07:00
Jiri Pirko
93a129eb8c net: sched: expose HW stats types per action used by drivers
It may be up to the driver (in case ANY HW stats is passed) to select
which type of HW stats he is going to use. Add an infrastructure to
expose this information to user.

$ tc filter add dev enp3s0np1 ingress proto ip handle 1 pref 1 flower dst_ip 192.168.1.1 action drop
$ tc -s filter show dev enp3s0np1 ingress
filter protocol ip pref 1 flower chain 0
filter protocol ip pref 1 flower chain 0 handle 0x1
  eth_type ipv4
  dst_ip 192.168.1.1
  in_hw in_hw_count 2
        action order 1: gact action drop
         random type none pass val 0
         index 1 ref 1 bind 1 installed 10 sec used 10 sec
        Action statistics:
        Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0
        used_hw_stats immediate     <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30 11:06:49 -07:00
Michal Kubecek
5b071c59ed ethtool: provide timestamping information with TSINFO_GET request
Implement TSINFO_GET request to get timestamping information for a network
device. This is traditionally available via ETHTOOL_GET_TS_INFO ioctl
request.

Move part of ethtool_get_ts_info() into common.c so that ioctl and netlink
code use the same logic to get timestamping information from the device.

v3: use "TSINFO" rather than "TIMESTAMP", suggested by Richard Cochran

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 22:32:37 -07:00
Michal Kubecek
f76510b458 ethtool: add timestamping related string sets
Add three string sets related to timestamping information:

  ETH_SS_SOF_TIMESTAMPING: SOF_TIMESTAMPING_* flags
  ETH_SS_TS_TX_TYPES:      timestamping Tx types
  ETH_SS_TS_RX_FILTERS:    timestamping Rx filters

These will be used for TIMESTAMP_GET request.

v2: avoid compiler warning ("enumeration value not handled in switch")
    in net_hwtstamp_validate()

v3: omit dash in Tx type names ("one-step-*" -> "onestep-*"), suggested by
    Richard Cochran

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 22:32:36 -07:00
Michal Kubecek
6c5bc8fe4e ethtool: add EEE_NTF notification
Send ETHTOOL_MSG_EEE_NTF notification whenever EEE settings of a network
device are modified using ETHTOOL_MSG_EEE_SET netlink message or
ETHTOOL_SEEE ioctl request.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 22:32:36 -07:00
Michal Kubecek
fd77be7bd4 ethtool: set EEE settings with EEE_SET request
Implement EEE_SET netlink request to set EEE settings of a network device.
These are traditionally set with ETHTOOL_SEEE ioctl request.

The netlink interface allows setting the EEE status for all link modes
supported by kernel but only first 32 link modes can be set at the moment
as only those are supported by the ethtool_ops callback.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 22:32:36 -07:00
Michal Kubecek
b7eeefe72e ethtool: provide EEE settings with EEE_GET request
Implement EEE_GET request to get EEE settings of a network device. These
are traditionally available via ETHTOOL_GEEE ioctl request.

The netlink interface allows reporting EEE status for all link modes
supported by kernel but only first 32 link modes are provided at the moment
as only those are reported by the ethtool_ops callback and drivers.

v2: fix alignment (whitespace only)

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 22:32:36 -07:00
Michal Kubecek
bf37faa386 ethtool: add PAUSE_NTF notification
Send ETHTOOL_MSG_PAUSE_NTF notification whenever pause parameters of
a network device are modified using ETHTOOL_MSG_PAUSE_SET netlink message
or ETHTOOL_SPAUSEPARAM ioctl request.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 22:32:36 -07:00
Michal Kubecek
3ab879933d ethtool: set pause parameters with PAUSE_SET request
Implement PAUSE_SET netlink request to set pause parameters of a network
device. Thease are traditionally set with ETHTOOL_SPAUSEPARAM ioctl
request.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 22:32:36 -07:00
Michal Kubecek
7f59fb32b0 ethtool: provide pause parameters with PAUSE_GET request
Implement PAUSE_GET request to get pause parameters of a network device.
These are traditionally available via ETHTOOL_GPAUSEPARAM ioctl request.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 22:32:36 -07:00
Michal Kubecek
0cf3eac8c9 ethtool: add COALESCE_NTF notification
Send ETHTOOL_MSG_COALESCE_NTF notification whenever coalescing parameters
of a network device are modified using ETHTOOL_MSG_COALESCE_SET netlink
message or ETHTOOL_SCOALESCE ioctl request.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 22:32:36 -07:00
Michal Kubecek
9881418c75 ethtool: set coalescing parameters with COALESCE_SET request
Implement COALESCE_SET netlink request to set coalescing parameters of
a network device. These are traditionally set with ETHTOOL_SCOALESCE ioctl
request. This commit adds only support for device coalescing parameters,
not per queue coalescing parameters.

Like the ioctl implementation, the generic ethtool code checks if only
supported parameters are modified; if not, first offending attribute is
reported using extack.

v2: fix alignment (whitespace only)

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 22:32:36 -07:00
Michal Kubecek
217275453b ethtool: provide coalescing parameters with COALESCE_GET request
Implement COALESCE_GET request to get coalescing parameters of a network
device. These are traditionally available via ETHTOOL_GCOALESCE ioctl
request. This commit adds only support for device coalescing parameters,
not per queue coalescing parameters.

Omit attributes with zero values unless they are declared as supported
(i.e. the corresponding bit in ethtool_ops::supported_coalesce_params is
set).

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 22:32:36 -07:00
Alexander Aring
a7a29f9c36 net: ipv6: add rpl sr tunnel
This patch adds functionality to configure routes for RPL source routing
functionality. There is no IPIP functionality yet implemented which can
be added later when the cases when to use IPv6 encapuslation comes more
clear.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 22:30:57 -07:00
Alexander Aring
8610c7c6e3 net: ipv6: add support for rpl sr exthdr
This patch adds rpl source routing receive handling. Everything works
only if sysconf "rpl_seg_enabled" and source routing is enabled. Mostly
the same behaviour as IPv6 segmentation routing. To handle compression
and uncompression a rpl.c file is created which contains the necessary
functionality. The receive handling will also care about IPv6
encapsulated so far it's specified as possible nexthdr in RFC 6554.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 22:30:57 -07:00
Alexander Aring
cfa933d938 include: uapi: linux: add rpl sr header definition
This patch adds a uapi header for rpl struct definition. The segments
data can be accessed over rpl_segaddr or rpl_segdata macros. In case of
compri and compre is zero the segment data is not compressed and can be
accessed by rpl_segaddr. In the other case the compressed data can be
accessed by rpl_segdata and interpreted as byte array.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 22:30:57 -07:00
Paolo Abeni
01cacb00b3 mptcp: add netlink-based PM
Expose a new netlink family to userspace to control the PM, setting:

 - list of local addresses to be signalled.
 - list of local addresses used to created subflows.
 - maximum number of add_addr option to react

When the msk is fully established, the PM netlink attempts to
announce the 'signal' list via the ADD_ADDR option. Since we
currently lack the ADD_ADDR echo (and related event) only the
first addr is sent.

After exhausting the 'announce' list, the PM tries to create
subflow for each addr in 'local' list, waiting for each
connection to be completed before attempting the next one.

Idea is to add an additional PM hook for ADD_ADDR echo, to allow
the PM netlink announcing multiple addresses, in sequence.

Co-developed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 22:14:49 -07:00
Davide Caratti
5147dfb508 mptcp: allow dumping subflow context to userspace
add ulp-specific diagnostic functions, so that subflow information can be
dumped to userspace programs like 'ss'.

v2 -> v3:
- uapi: use bit macros appropriate for userspace

Co-developed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Co-developed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 22:14:48 -07:00
Mark Starovoytov
791bb3fcaf net: macsec: add support for specifying offload upon link creation
This patch adds new netlink attribute to allow a user to (optionally)
specify the desired offload mode immediately upon MACSec link creation.

Separate iproute patch will be required to support this from user space.

Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 21:34:21 -07:00
David S. Miller
f0b5989745 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Minor comment conflict in mac80211.

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 21:25:29 -07:00
KP Singh
fc611f47f2 bpf: Introduce BPF_PROG_TYPE_LSM
Introduce types and configs for bpf programs that can be attached to
LSM hooks. The programs can be enabled by the config option
CONFIG_BPF_LSM.

Signed-off-by: KP Singh <kpsingh@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Brendan Jackman <jackmanb@google.com>
Reviewed-by: Florent Revest <revest@google.com>
Reviewed-by: Thomas Garnier <thgarnie@google.com>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: James Morris <jamorris@linux.microsoft.com>
Link: https://lore.kernel.org/bpf/20200329004356.27286-2-kpsingh@chromium.org
2020-03-30 01:34:00 +02:00
Johannes Berg
88ce642492 um: Implement time-travel=ext
This implements synchronized time-travel mode which - using a special
application on a unix socket - lets multiple machines take part in a
time-travelling simulation together.

The protocol for the unix domain socket is defined in the new file
include/uapi/linux/um_timetravel.h.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
2020-03-29 23:29:08 +02:00
Toke Høiland-Jørgensen
92234c8f15 xdp: Support specifying expected existing program when attaching XDP
While it is currently possible for userspace to specify that an existing
XDP program should not be replaced when attaching to an interface, there is
no mechanism to safely replace a specific XDP program with another.

This patch adds a new netlink attribute, IFLA_XDP_EXPECTED_FD, which can be
set along with IFLA_XDP_FD. If set, the kernel will check that the program
currently loaded on the interface matches the expected one, and fail the
operation if it does not. This corresponds to a 'cmpxchg' memory operation.
Setting the new attribute with a negative value means that no program is
expected to be attached, which corresponds to setting the UPDATE_IF_NOEXIST
flag.

A new companion flag, XDP_FLAGS_REPLACE, is also added to explicitly
request checking of the EXPECTED_FD attribute. This is needed for userspace
to discover whether the kernel supports the new attribute.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/bpf/158515700640.92963.3551295145441017022.stgit@toke.dk
2020-03-28 14:24:41 -07:00
Daniel Borkmann
0f09abd105 bpf: Enable bpf cgroup hooks to retrieve cgroup v2 and ancestor id
Enable the bpf_get_current_cgroup_id() helper for connect(), sendmsg(),
recvmsg() and bind-related hooks in order to retrieve the cgroup v2
context which can then be used as part of the key for BPF map lookups,
for example. Given these hooks operate in process context 'current' is
always valid and pointing to the app that is performing mentioned
syscalls if it's subject to a v2 cgroup. Also with same motivation of
commit 7723628101 ("bpf: Introduce bpf_skb_ancestor_cgroup_id helper")
enable retrieval of ancestor from current so the cgroup id can be used
for policy lookups which can then forbid connect() / bind(), for example.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/d2a7ef42530ad299e3cbb245e6c12374b72145ef.1585323121.git.daniel@iogearbox.net
2020-03-27 19:40:39 -07:00
Daniel Borkmann
f318903c0b bpf: Add netns cookie and enable it for bpf cgroup hooks
In Cilium we're mainly using BPF cgroup hooks today in order to implement
kube-proxy free Kubernetes service translation for ClusterIP, NodePort (*),
ExternalIP, and LoadBalancer as well as HostPort mapping [0] for all traffic
between Cilium managed nodes. While this works in its current shape and avoids
packet-level NAT for inter Cilium managed node traffic, there is one major
limitation we're facing today, that is, lack of netns awareness.

In Kubernetes, the concept of Pods (which hold one or multiple containers)
has been built around network namespaces, so while we can use the global scope
of attaching to root BPF cgroup hooks also to our advantage (e.g. for exposing
NodePort ports on loopback addresses), we also have the need to differentiate
between initial network namespaces and non-initial one. For example, ExternalIP
services mandate that non-local service IPs are not to be translated from the
host (initial) network namespace as one example. Right now, we have an ugly
work-around in place where non-local service IPs for ExternalIP services are
not xlated from connect() and friends BPF hooks but instead via less efficient
packet-level NAT on the veth tc ingress hook for Pod traffic.

On top of determining whether we're in initial or non-initial network namespace
we also have a need for a socket-cookie like mechanism for network namespaces
scope. Socket cookies have the nice property that they can be combined as part
of the key structure e.g. for BPF LRU maps without having to worry that the
cookie could be recycled. We are planning to use this for our sessionAffinity
implementation for services. Therefore, add a new bpf_get_netns_cookie() helper
which would resolve both use cases at once: bpf_get_netns_cookie(NULL) would
provide the cookie for the initial network namespace while passing the context
instead of NULL would provide the cookie from the application's network namespace.
We're using a hole, so no size increase; the assignment happens only once.
Therefore this allows for a comparison on initial namespace as well as regular
cookie usage as we have today with socket cookies. We could later on enable
this helper for other program types as well as we would see need.

  (*) Both externalTrafficPolicy={Local|Cluster} types
  [0] https://github.com/cilium/cilium/blob/master/bpf/bpf_sock.c

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/c47d2346982693a9cf9da0e12690453aded4c788.1585323121.git.daniel@iogearbox.net
2020-03-27 19:40:38 -07:00
Linus Walleij
06dd3f31cb Linux 5.6-rc7
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl54EZgeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGLpwIAJv475oyWJDefyZn
 Va5GF+LgR3CMGnfOQXyLXphFUU0fYQtuHb6E5w2hmMpovNrlbpzuypuOetqN1gtQ
 DpDgt6htHlBAJCNkNnHOjEARmMZo64D2dnLlTfa6fjJMc4tg3yk/oMFXFpiP0kdd
 ena4DetB293IF2EjP7RWfVbXzbZzG4sLmIsOmUiFH1H1nhTV8tZWG06KvUcwuCSU
 AfrXiOaVj6npiShszjdODYaFRL6mYh5es7q02wQpKeWdZHRU8IuKTgywiOjh6uD4
 J2bXvz0qbDN/2Zgj73H8EfkAP7zm6nCHifQiUm9uRsjzpcfjFRYIn+4/4LAzCIjm
 VI8uvdA=
 =/NN5
 -----END PGP SIGNATURE-----

Merge tag 'v5.6-rc7' into devel

Linux 5.6-rc7
2020-03-27 22:36:17 +01:00
Pablo Neira Ayuso
53c2b2899a netfilter: flowtable: add counter support
Add a new flag to turn on flowtable counters which are stored in the
conntrack entry.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-27 18:32:37 +01:00
Pablo Neira Ayuso
cfbd1125fc netfilter: nf_tables: add enum nft_flowtable_flags to uapi
Expose the NFT_FLOWTABLE_HW_OFFLOAD flag through uapi.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-27 18:32:36 +01:00
Namhyung Kim
6546b19f95 perf/core: Add PERF_SAMPLE_CGROUP feature
The PERF_SAMPLE_CGROUP bit is to save (perf_event) cgroup information in
the sample.  It will add a 64-bit id to identify current cgroup and it's
the file handle in the cgroup file system.  Userspace should use this
information with PERF_RECORD_CGROUP event to match which cgroup it
belongs.

I put it before PERF_SAMPLE_AUX for simplicity since it just needs a
64-bit word.  But if we want bigger samples, I can work on that
direction too.

Committer testing:

  $ pahole perf_sample_data | grep -w cgroup -B5 -A5
  	/* --- cacheline 4 boundary (256 bytes) was 56 bytes ago --- */
  	struct perf_regs           regs_intr;            /*   312    16 */
  	/* --- cacheline 5 boundary (320 bytes) was 8 bytes ago --- */
  	u64                        stack_user_size;      /*   328     8 */
  	u64                        phys_addr;            /*   336     8 */
  	u64                        cgroup;               /*   344     8 */

  	/* size: 384, cachelines: 6, members: 22 */
  	/* padding: 32 */
  };
  $

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Zefan Li <lizefan@huawei.com>
Link: http://lore.kernel.org/lkml/20200325124536.2800725-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-03-27 10:41:44 -03:00
Namhyung Kim
96aaab6865 perf/core: Add PERF_RECORD_CGROUP event
To support cgroup tracking, add CGROUP event to save a link between
cgroup path and id number.  This is needed since cgroups can go away
when userspace tries to read the cgroup info (from the id) later.

The attr.cgroup bit was also added to enable cgroup tracking from
userspace.

This event will be generated when a new cgroup becomes active.
Userspace might need to synthesize those events for existing cgroups.

Committer testing:

From the resulting kernel, using /sys/kernel/btf/vmlinux:

  $ pahole perf_event_attr | grep -w cgroup -B5 -A1
  	__u64                      write_backward:1;     /*    40:27  8 */
  	__u64                      namespaces:1;         /*    40:28  8 */
  	__u64                      ksymbol:1;            /*    40:29  8 */
  	__u64                      bpf_event:1;          /*    40:30  8 */
  	__u64                      aux_output:1;         /*    40:31  8 */
  	__u64                      cgroup:1;             /*    40:32  8 */
  	__u64                      __reserved_1:31;      /*    40:33  8 */
  $

Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Tejun Heo <tj@kernel.org>
[staticize perf_event_cgroup function]
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Zefan Li <lizefan@huawei.com>
Link: http://lore.kernel.org/lkml/20200325124536.2800725-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-03-27 10:39:11 -03:00
Joerg Roedel
ff68eb2330 Merge branches 'iommu/fixes', 'arm/qcom', 'arm/omap', 'arm/smmu', 'x86/amd', 'x86/vt-d', 'virtio' and 'core' into next 2020-03-27 11:33:27 +01:00
Jean-Philippe Brucker
3f84b96c97 iommu/virtio: Fix sparse warning
We copied the virtio_iommu_config from the virtio-iommu specification,
which declares the fields using little-endian annotations (for example
le32). Unfortunately this causes sparse to warn about comparison between
little- and cpu-endian, because of the typecheck() in virtio_cread():

drivers/iommu/virtio-iommu.c:1024:9: sparse: sparse: incompatible types in comparison expression (different base types):
drivers/iommu/virtio-iommu.c:1024:9: sparse:    restricted __le64 *
drivers/iommu/virtio-iommu.c:1024:9: sparse:    unsigned long long *
drivers/iommu/virtio-iommu.c:1036:9: sparse: sparse: incompatible types in comparison expression (different base types):
drivers/iommu/virtio-iommu.c:1036:9: sparse:    restricted __le64 *
drivers/iommu/virtio-iommu.c:1036:9: sparse:    unsigned long long *
drivers/iommu/virtio-iommu.c:1040:9: sparse: sparse: incompatible types in comparison expression (different base types):
drivers/iommu/virtio-iommu.c:1040:9: sparse:    restricted __le64 *
drivers/iommu/virtio-iommu.c:1040:9: sparse:    unsigned long long *
drivers/iommu/virtio-iommu.c:1044:9: sparse: sparse: incompatible types in comparison expression (different base types):
drivers/iommu/virtio-iommu.c:1044:9: sparse:    restricted __le32 *
drivers/iommu/virtio-iommu.c:1044:9: sparse:    unsigned int *
drivers/iommu/virtio-iommu.c:1048:9: sparse: sparse: incompatible types in comparison expression (different base types):
drivers/iommu/virtio-iommu.c:1048:9: sparse:    restricted __le32 *
drivers/iommu/virtio-iommu.c:1048:9: sparse:    unsigned int *
drivers/iommu/virtio-iommu.c:1052:9: sparse: sparse: incompatible types in comparison expression (different base types):
drivers/iommu/virtio-iommu.c:1052:9: sparse:    restricted __le32 *
drivers/iommu/virtio-iommu.c:1052:9: sparse:    unsigned int *

Although virtio_cread() does convert virtio-endian (in our case
little-endian) to cpu-endian, the typecheck() needs the two arguments to
have the same endianness. Do as UAPI headers of other virtio devices do,
and remove the endian annotation from the device config.

Even though we change the UAPI this shouldn't cause any regression since
QEMU, the existing implementation of virtio-iommu that uses this header,
already removes the annotations when importing headers.

Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Link: https://lore.kernel.org/r/20200326093558.2641019-2-jean-philippe@linaro.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2020-03-27 11:09:18 +01:00
Linus Torvalds
f3e69428b5 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input fixes from Dmitry Torokhov:

 - a fix to generate proper timestamps on key autorepeat events that
   were broken recently

 - a fix for Synaptics driver to only activate reduced reporting mode
   when explicitly requested

 - a new keycode for "selective screenshot" function

 - other assorted fixes

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: fix stale timestamp on key autorepeat events
  Input: move the new KEY_SELECTIVE_SCREENSHOT keycode
  Input: avoid BIT() macro usage in the serio.h UAPI header
  Input: synaptics-rmi4 - set reduced reporting mode only when requested
  Input: synaptics - enable RMI on HP Envy 13-ad105ng
  Input: allocate keycode for "Selective Screenshot" key
  Input: tm2-touchkey - add support for Coreriver TC360 variant
  dt-bindings: input: add Coreriver TC360 binding
  dt-bindings: vendor-prefixes: Add Coreriver vendor prefix
  Input: raydium_i2c_ts - fix error codes in raydium_i2c_boot_trigger()
2020-03-26 20:49:44 -07:00
Antoine Tenart
21114b7fee net: macsec: add support for offloading to the MAC
This patch adds a new MACsec offloading option, MACSEC_OFFLOAD_MAC,
allowing a user to select a MAC as a provider for MACsec offloading
operations.

Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-26 20:17:36 -07:00
Eugene Syromiatnikov
673040c3a8 taprio: do not use BIT() in TCA_TAPRIO_ATTR_FLAG_* definitions
BIT() macro definition is internal to the Linux kernel and is not
to be used in UAPI headers; replace its usage with the _BITUL() macro
that is already used elsewhere in the header.

Fixes: 9c66d15646 ("taprio: Add support for hardware offloading")
Signed-off-by: Eugene Syromiatnikov <esyr@redhat.com>
Acked-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-26 20:08:45 -07:00
Eugene Syromiatnikov
b0efe02812 rtc: make definitions in include/uapi/linux/rtc.h actually useful for user space
BIT() macro is not defined in UAPI headers; there is, however, similarly
defined _BITUL() macro present in include/uapi/linux/const.h; use it
instead and include <linux/const.h> and <linux/ioctl.h> in order to make
the definitions provided in the header useful.

Fixes: 3431ca4837 ("rtc: define RTC_VL_READ values")
Signed-off-by: Eugene Syromiatnikov <esyr@redhat.com>
Link: https://lore.kernel.org/r/20200324041209.GA30727@asgard.redhat.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2020-03-26 21:38:40 +01:00
Dmitry Torokhov
fbf66796a0 Input: move the new KEY_SELECTIVE_SCREENSHOT keycode
We should try to keep keycodes sequential unless there is a reason to leave
a gap in numbering, so let's move it from 0x280 to 0x27a while we still
can.

Fixes: 3b059da983 ("Input: allocate keycode for Selective Screenshot key")
Acked-by: Rajat Jain <rajatja@google.com>
Link: https://lore.kernel.org/r/20200326182711.GA259753@dtor-ws
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2020-03-26 12:55:17 -07:00
Eugene Syromiatnikov
9b6eaaf3db coresight: do not use the BIT() macro in the UAPI header
The BIT() macro definition is not available for the UAPI headers
(moreover, it can be defined differently in the user space); replace
its usage with the _BITUL() macro that is defined in <linux/const.h>.

Fixes: 237483aa5c ("coresight: stm: adding driver for CoreSight STM component")
Signed-off-by: Eugene Syromiatnikov <esyr@redhat.com>
Cc: stable <stable@vger.kernel.org>
Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Link: https://lore.kernel.org/r/20200324042213.GA10452@asgard.redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-26 15:10:44 +01:00
Paul Mackerras
9a5788c615 KVM: PPC: Book3S HV: Add a capability for enabling secure guests
At present, on Power systems with Protected Execution Facility
hardware and an ultravisor, a KVM guest can transition to being a
secure guest at will.  Userspace (QEMU) has no way of knowing
whether a host system is capable of running secure guests.  This
will present a problem in future when the ultravisor is capable of
migrating secure guests from one host to another, because
virtualization management software will have no way to ensure that
secure guests only run in domains where all of the hosts can
support secure guests.

This adds a VM capability which has two functions: (a) userspace
can query it to find out whether the host can support secure guests,
and (b) userspace can enable it for a guest, which allows that
guest to become a secure guest.  If userspace does not enable it,
KVM will return an error when the ultravisor does the hypercall
that indicates that the guest is starting to transition to a
secure guest.  The ultravisor will then abort the transition and
the guest will terminate.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Ram Pai <linuxram@us.ibm.com>
2020-03-26 11:09:04 +11:00
Amir Goldstein
44d705b037 fanotify: report name info for FAN_DIR_MODIFY event
Report event FAN_DIR_MODIFY with name in a variable length record similar
to how fid's are reported.  With name info reporting implemented, setting
FAN_DIR_MODIFY in mark mask is now allowed.

When events are reported with name, the reported fid identifies the
directory and the name follows the fid. The info record type for this
event info is FAN_EVENT_INFO_TYPE_DFID_NAME.

For now, all reported events have at most one info record which is
either FAN_EVENT_INFO_TYPE_FID or FAN_EVENT_INFO_TYPE_DFID_NAME (for
FAN_DIR_MODIFY).  Later on, events "on child" will report both records.

There are several ways that an application can use this information:

1. When watching a single directory, the name is always relative to
the watched directory, so application need to fstatat(2) the name
relative to the watched directory.

2. When watching a set of directories, the application could keep a map
of dirfd for all watched directories and hash the map by fid obtained
with name_to_handle_at(2).  When getting a name event, the fid in the
event info could be used to lookup the base dirfd in the map and then
call fstatat(2) with that dirfd.

3. When watching a filesystem (FAN_MARK_FILESYSTEM) or a large set of
directories, the application could use open_by_handle_at(2) with the fid
in event info to obtain dirfd for the directory where event happened and
call fstatat(2) with this dirfd.

The last option scales better for a large number of watched directories.
The first two options may be available in the future also for non
privileged fanotify watchers, because open_by_handle_at(2) requires
the CAP_DAC_READ_SEARCH capability.

Link: https://lore.kernel.org/r/20200319151022.31456-15-amir73il@gmail.com
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2020-03-25 23:17:16 +01:00
Amir Goldstein
9e2ba2c34f fanotify: send FAN_DIR_MODIFY event flavor with dir inode and name
Dirent events are going to be supported in two flavors:

1. Directory fid info + mask that includes the specific event types
   (e.g. FAN_CREATE) and an optional FAN_ONDIR flag.
2. Directory fid info + name + mask that includes only FAN_DIR_MODIFY.

To request the second event flavor, user needs to set the event type
FAN_DIR_MODIFY in the mark mask.

The first flavor is supported since kernel v5.1 for groups initialized
with flag FAN_REPORT_FID.  It is intended to be used for watching
directories in "batch mode" - the watcher is notified when directory is
changed and re-scans the directory content in response.  This event
flavor is stored more compactly in the event queue, so it is optimal
for workloads with frequent directory changes.

The second event flavor is intended to be used for watching large
directories, where the cost of re-scan of the directory on every change
is considered too high.  The watcher getting the event with the directory
fid and entry name is expected to call fstatat(2) to query the content of
the entry after the change.

Legacy inotify events are reported with name and event mask (e.g. "foo",
FAN_CREATE | FAN_ONDIR).  That can lead users to the conclusion that
there is *currently* an entry "foo" that is a sub-directory, when in fact
"foo" may be negative or non-dir by the time user gets the event.

To make it clear that the current state of the named entry is unknown,
when reporting an event with name info, fanotify obfuscates the specific
event types (e.g. create,delete,rename) and uses a common event type -
FAN_DIR_MODIFY to describe the change.  This should make it harder for
users to make wrong assumptions and write buggy filesystem monitors.

At this point, name info reporting is not yet implemented, so trying to
set FAN_DIR_MODIFY in mark mask will return -EINVAL.

Link: https://lore.kernel.org/r/20200319151022.31456-12-amir73il@gmail.com
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2020-03-25 10:27:16 +01:00
Jonathan Neuschäfer
32f5f62d79 gpio: uapi: Improve phrasing around arrays representing empty strings
Character arrays can be considered empty strings (if they are
immediately terminated), but they cannot be NULL.

Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
2020-03-25 09:50:44 +01:00
Eugene Syromiatnikov
52afa505a0 Input: avoid BIT() macro usage in the serio.h UAPI header
The commit 19ba1eb15a ("Input: psmouse - add a custom serio protocol
to send extra information") introduced usage of the BIT() macro
for SERIO_* flags; this macro is not provided in UAPI headers.
Replace if with similarly defined _BITUL() macro defined
in <linux/const.h>.

Fixes: 19ba1eb15a ("Input: psmouse - add a custom serio protocol to send extra information")
Signed-off-by: Eugene Syromiatnikov <esyr@redhat.com>
Cc: <stable@vger.kernel.org> # v5.0+
Link: https://lore.kernel.org/r/20200324041341.GA32335@asgard.redhat.com
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2020-03-24 15:59:34 -07:00
Alex Williamson
43eeeecc8e vfio: Introduce VFIO_DEVICE_FEATURE ioctl and first user
The VFIO_DEVICE_FEATURE ioctl is meant to be a general purpose, device
agnostic ioctl for setting, retrieving, and probing device features.
This implementation provides a 16-bit field for specifying a feature
index, where the data porition of the ioctl is determined by the
semantics for the given feature.  Additional flag bits indicate the
direction and nature of the operation; SET indicates user data is
provided into the device feature, GET indicates the device feature is
written out into user data.  The PROBE flag augments determining
whether the given feature is supported, and if provided, whether the
given operation on the feature is supported.

The first user of this ioctl is for setting the vfio-pci VF token,
where the user provides a shared secret key (UUID) on a SR-IOV PF
device, which users must provide when opening associated VF devices.

Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-03-24 09:28:27 -06:00
Gustavo A. R. Silva
1a91a36aba mmc: Replace zero-length array with flexible-array member
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:

struct foo {
        int stuff;
        struct boo array[];
};

By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.

Also, notice that, dynamic memory allocations won't be affected by
this change:

"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]

This issue was found with the help of Coccinelle.

[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 7649773293 ("cxgb3/l2t: Fix undefined behaviour")

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Link: https://lore.kernel.org/r/20200226223125.GA20630@embeddedor
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2020-03-24 14:39:45 +01:00
Jakub Kicinski
0dfb2d82af net: sched: rename more stats_types
Commit 53eca1f347 ("net: rename flow_action_hw_stats_types* ->
flow_action_hw_stats*") renamed just the flow action types and
helpers. For consistency rename variables, enums, struct members
and UAPI too (note that this UAPI was not in any official release,
yet).

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-23 20:54:23 -07:00
Amir Goldstein
6473ea760c fsnotify: tidy up FS_ and FAN_ constants
Order by value, so the free value ranges are easier to find.

Link: https://lore.kernel.org/r/20200319151022.31456-2-amir73il@gmail.com
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2020-03-23 18:00:47 +01:00
Nikolay Borisov
9c1036fdb1 btrfs: Remove BTRFS_SUBVOL_CREATE_ASYNC support
This functionality was deprecated in kernel 5.4. Since no one has
complained of the impending removal it's time we did so.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ add comment ]
Signed-off-by: David Sterba <dsterba@suse.com>
2020-03-23 17:02:00 +01:00
Marcos Paulo de Souza
949964c928 btrfs: add new BTRFS_IOC_SNAP_DESTROY_V2 ioctl
This ioctl will be responsible for deleting a subvolume using its id.
This can be used when a system has a file system mounted from a
subvolume, rather than the root file system, like below:

/
@subvol1/
@subvol2/
@subvol_default/

If only @subvol_default is mounted, we have no path to reach @subvol1
and @subvol2, thus no way to delete them. Current subvolume delete ioctl
takes a file handle point as argument, and if @subvol_default is
mounted, we can't reach @subvol1 and @subvol2 from the same mount point.

This patch introduces a new ioctl BTRFS_IOC_SNAP_DESTROY_V2 that takes
the extended structure with flags to allow to delete subvolume using
subvolid.

Now, we can use this new ioctl specifying the subvolume id and refer to
the same mount point. It doesn't matter which subvolume was mounted,
since we can reach to the desired one using the subvolume id, and then
delete it.

The full path to the subvolume id is resolved internally and access is
verified as if the subvolume was accessed by path.

The volume args v2 structure is extended to use the existing union for
subvolume id specification, that's valid in case the
BTRFS_SUBVOL_SPEC_BY_ID is set.

Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ update changelog ]
Signed-off-by: David Sterba <dsterba@suse.com>
2020-03-23 17:01:42 +01:00
David Sterba
eed0269053 btrfs: define support masks for ioctl volume args v2
The ioctl data for devices or subvolumes can be passed via
btrfs_ioctl_vol_args or btrfs_ioctl_vol_args_v2. The latter is more
versatile and needs some caution as some of the flags make sense only
for some ioctls.

As we're going to extend the flags, define support masks for each ioctl
class separately.

Reviewed-by: Marcos Paulo de Souza <mpdesouza@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-03-23 17:01:42 +01:00
Yuri Benditovich
3024e20958 virtio-net: Introduce hash report feature
The feature VIRTIO_NET_F_HASH_REPORT extends the
layout of the packet and requests the device to
calculate hash on incoming packets and report it
in the packet header.

Signed-off-by: Yuri Benditovich <yuri.benditovich@daynix.com>
Link: https://lore.kernel.org/r/20200302115003.14877-4-yuri.benditovich@daynix.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-03-23 09:50:02 -04:00
Yuri Benditovich
fd58bf6745 virtio-net: Introduce RSS receive steering feature
RSS (Receive-side scaling) defines hash calculation
rules and decision on receive virtqueue according to
the calculated hash, provided mask to apply and
provided indirection table containing indices of
receive virqueues. The driver sends the control
command to enable multiqueue and provide parameters
for receive steering.

Signed-off-by: Yuri Benditovich <yuri.benditovich@daynix.com>
Link: https://lore.kernel.org/r/20200302115003.14877-3-yuri.benditovich@daynix.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-03-23 09:50:02 -04:00
Yuri Benditovich
22b436c9b5 virtio-net: Introduce extended RSC feature
VIRTIO_NET_F_RSC_EXT feature bit indicates that the device
is able to provide extended RSC information. When the feature
is negotiatede and 'gso_type' field in received packet is not
GSO_NONE, the device reports number of coalesced packets in
'csum_start' field and number of duplicated acks in 'csum_offset'
field and sets VIRTIO_NET_HDR_F_RSC_INFO in 'flags' field.

Signed-off-by: Yuri Benditovich <yuri.benditovich@daynix.com>
Link: https://lore.kernel.org/r/20200302115003.14877-2-yuri.benditovich@daynix.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-03-23 09:50:02 -04:00
Greg Kroah-Hartman
d2e971d884 Merge 5.6-rc7 into usb-next
We need the USB fixes in here as well.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-23 08:04:08 +01:00
Lukas Bulwahn
9f5834c868 io_uring: make spdxcheck.py happy
Commit bbbdeb4720 ("io_uring: dual license io_uring.h uapi header")
uses a nested SPDX-License-Identifier to dual license the header.

Since then, ./scripts/spdxcheck.py complains:

  include/uapi/linux/io_uring.h: 1:60 Missing parentheses: OR

Add parentheses to make spdxcheck.py happy.

Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-21 14:03:46 -06:00
David S. Miller
0d7043f355 Another set of changes:
* HE ranging (fine timing measurement) API support
  * hwsim gets virtio support, for use with wmediumd,
    to be able to simulate with multiple machines
  * eapol-over-nl80211 improvements to exclude preauth
  * IBSS reset support, to recover connections from
    userspace
  * and various others.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEH1e1rEeCd0AIMq6MB8qZga/fl8QFAl50yM8ACgkQB8qZga/f
 l8Qe0w//Rt5vsqlkiuZtg1SHQcMHq5Y6jJ92Xfg6Yw8H3H1g40r6UWDeSsF7Dl9H
 YuC4VvfNH/FhnNz01FDVxjgjnUyjpAolxM6sYj3hW3ZIBahSBeovAdjTP59y1Fhi
 zLMP/zYLsqtlKO5qr0fK7tUmENiF/IxyALRcxqsmQ1ul5003pIpFRVPoQVDTjHZn
 JCzDTboQLmtyYiDYjejaeauo4Awjj4lDiyk3Fu+kXSv8weY4dl+w8/AM5zsiVNDs
 4HRDpJUBIwfwWuygMNwl0EQVhnpYaNQFQ+IEe9B29qiAEb95MPHhSudU1qaT+4d7
 CykikhH39/n4Wv1PBg4PrCfWInxAUiPz6K027lDOY5VtJFFbLg8GqQkn3Vy/WnHm
 CJuVaLdJ73k7AypvWdZUgSe9SaloymVd987IIMixmz0t0cqIE3kPYot//kXFjy6m
 rqvWJ3a7SIopPdkD4xXDwzASUmctCUedQ0QVnWYRQCUW5rnkZ96SHjblUA9IykYS
 Ia44BwB8ggZsax1KrqyGV9hgj8mh9srLmEjtRPhxVD3ncE1ewiQaBRwv56J0EKNu
 yRnV10dNMCtUyT3j9dRpHa8zuNpS/A6ShXe4aGgoXAz+ShjNOBj2GxXtolsDJzrd
 zIqZH8gcWaeV+8yWhaGx8d82gvY8PhRvJl+AMax7Ds2avOZ/pa0=
 =t3QF
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-next-for-net-next-2020-03-20' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next

Johannes Berg says:

====================
Another set of changes:
 * HE ranging (fine timing measurement) API support
 * hwsim gets virtio support, for use with wmediumd,
   to be able to simulate with multiple machines
 * eapol-over-nl80211 improvements to exclude preauth
 * IBSS reset support, to recover connections from
   userspace
 * and various others.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-20 08:57:38 -07:00
Nikolay Aleksandrov
c443758b21 net: bridge: vlan options: move the tunnel command to the nested attribute
Now that we have a nested tunnel info attribute we can add a separate
one for the tunnel command and require it explicitly from user-space. It
must be one of RTM_SETLINK/DELLINK. Only RTM_SETLINK requires a valid
tunnel id, DELLINK just removes it if it was set before. This allows us
to have all tunnel attributes and control in one place, thus removing
the need for an outside vlan info flag.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-20 08:52:20 -07:00
Nikolay Aleksandrov
fa388f29a9 net: bridge: vlan options: nest the tunnel id into a tunnel info attribute
While discussing the new API, Roopa mentioned that we'll be adding more
tunnel attributes and options in the future, so it's better to make it a
nested attribute, since this is still in net-next we can easily change it
and nest the tunnel id attribute under BRIDGE_VLANDB_ENTRY_TUNNEL_INFO.

The new format is:
 [BRIDGE_VLANDB_ENTRY]
     [BRIDGE_VLANDB_ENTRY_TUNNEL_INFO]
         [BRIDGE_VLANDB_TINFO_ID]

Any new tunnel attributes can be nested under
BRIDGE_VLANDB_ENTRY_TUNNEL_INFO.

Suggested-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-20 08:52:20 -07:00
Veerendranath Jakkam
7fc82af856 cfg80211: Configure PMK lifetime and reauth threshold for PMKSA entries
Drivers that trigger roaming need to know the lifetime of the configured
PMKSA for deciding whether to trigger the full or PMKSA cache based
authentication. The configured PMKSA is invalid after the PMK lifetime
has expired and must not be used after that and the STA needs to
disassociate if the PMK expires. Hence the STA is expected to refresh
the PMK with a full authentication before this happens (e.g., when
reassociating to a new BSS the next time or by performing EAPOL
reauthentication depending on the AKM) to avoid unnecessary
disconnection.

The PMK reauthentication threshold is the percentage of the PMK lifetime
value and indicates to the driver to trigger a full authentication roam
(without PMKSA caching) after the reauthentication threshold time, but
before the PMK timer has expired. Authentication methods like SAE need
to be able to generate a new PMKSA entry without having to force a
disconnection after this threshold timeout. If no roaming occurs between
the reauthentication threshold time and PMK lifetime expiration,
disassociation is still forced.

The new attributes for providing these values correspond to the dot11
MIB variables dot11RSNAConfigPMKLifetime and
dot11RSNAConfigPMKReauthThreshold.

This type of functionality is already available in cases where user
space component is in control of roaming. This commit extends that same
capability into cases where parts or all of this functionality is
offloaded to the driver.

Signed-off-by: Veerendranath Jakkam <vjakkam@codeaurora.org>
Signed-off-by: Jouni Malinen <jouni@codeaurora.org>
Link: https://lore.kernel.org/r/20200312235903.18462-1-jouni@codeaurora.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-03-20 14:42:20 +01:00
Nicolas Cavallari
edafcf4259 cfg80211: Add support for userspace to reset stations in IBSS mode
Sometimes, userspace is able to detect that a peer silently lost its
state (like, if the peer reboots). wpa_supplicant does this for IBSS-RSN
by registering for auth/deauth frames, but when it detects this, it is
only able to remove the encryption keys of the peer and close its port.

However, the kernel also hold other state about the station, such as BA
sessions, probe response parameters and the like.  They also need to be
resetted correctly.

This patch adds the NL80211_EXT_FEATURE_DEL_IBSS_STA feature flag
indicating the driver accepts deleting stations in IBSS mode, which
should send a deauth and reset the state of the station, just like in
mesh point mode.

Signed-off-by: Nicolas Cavallari <nicolas.cavallari@green-communications.fr>
Link: https://lore.kernel.org/r/20200305135754.12094-1-cavallar@lri.fr
[preserve -EINVAL return]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-03-20 14:42:20 +01:00
Shaul Triebitz
0c138a5c2b nl80211: add PROTECTED_TWT nl80211 extended feature
Add API for telling whether the driver supports protected TWT.
The protected_twt capability in the RSNXE will be based on this.

Signed-off-by: Shaul Triebitz <shaul.triebitz@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Link: https://lore.kernel.org/r/20200131111300.891737-23-luca@coelho.fi
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-03-20 14:42:20 +01:00
Avraham Stern
efb5520d0e nl80211/cfg80211: add support for non EDCA based ranging measurement
Add support for requesting that the ranging measurement will use
the trigger-based / non trigger-based flow instead of the EDCA based
flow.

Signed-off-by: Avraham Stern <avraham.stern@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Link: https://lore.kernel.org/r/20200131111300.891737-2-luca@coelho.fi
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-03-20 14:42:19 +01:00
Markus Theil
5631d96aa3 nl80211: add no pre-auth attribute and ext. feature flag for ctrl. port
If the nl80211 control port is used before this patch, pre-auth frames
(0x88c7) are send to userspace uncoditionally. While this enables userspace
to only use nl80211 on the station side, it is not always useful for APs.
Furthermore, pre-auth frames are ordinary data frames and not related to
the control port. Therefore it should for example be possible for pre-auth
frames to be bridged onto a wired network on AP side without touching
userspace.

For backwards compatibility to code already using pre-auth over nl80211,
this patch adds a feature flag to disable this behavior, while it remains
enabled by default. An additional ext. feature flag is added to detect this
from userspace.

Thanks to Jouni for pointing out, that pre-auth frames should be handled as
ordinary data frames.

Signed-off-by: Markus Theil <markus.theil@tu-ilmenau.de>
Link: https://lore.kernel.org/r/20200312091055.54257-2-markus.theil@tu-ilmenau.de
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-03-20 14:42:19 +01:00
Erel Geron
5d44fe7c98 mac80211_hwsim: add frame transmission support over virtio
This allows communication with external entities.

It also required fixing up the netlink policy, since NLA_UNSPEC
attributes are no longer accepted.

Signed-off-by: Erel Geron <erelx.geron@intel.com>
[port to backports, inline the ID, use 29 as the ID as requested,
 drop != NULL checks, reduce ifdefs]
Link: https://lore.kernel.org/r/20200305143212.c6e4c87d225b.I7ce60bf143e863dcdf0fb8040aab7168ba549b99@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-03-20 14:42:19 +01:00
Daniel Glöckner
573a750813 media: v4l: Add 1X14 14-bit greyscale media bus code definition
The code is called MEDIA_BUS_FMT_Y14_1X14 and behaves just like
MEDIA_BUS_FMT_Y12_1X12 with two more bits.

Signed-off-by: Daniel Glöckner <dg@emlix.com>
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2020-03-20 09:01:16 +01:00
Daniel Glöckner
ae9753a04c media: v4l: Add 14-bit raw greyscale pixel format
The new format is called V4L2_PIX_FMT_Y14. Like V4L2_PIX_FMT_Y10 and
V4L2_PIX_FMT_Y12 it is stored in two bytes per pixel but has only two
unused bits at the top.

Signed-off-by: Daniel Glöckner <dg@emlix.com>
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2020-03-20 09:00:56 +01:00
Sakari Ailus
d12127ed0e media: v4l: Add 14-bit raw bayer pixel formats
The formats added by this patch are:

	V4L2_PIX_FMT_SBGGR14
	V4L2_PIX_FMT_SGBRG14
	V4L2_PIX_FMT_SGRBG14
	V4L2_PIX_FMT_SRGGB14

Signed-off-by: Jouni Ukkonen <jouni.ukkonen@intel.com>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
[dg@emlix.com: rebased onto current media_tree]
Signed-off-by: Daniel Glöckner <dg@emlix.com>
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2020-03-20 09:00:23 +01:00
Eric Biggers
e98ad46475 fscrypt: add FS_IOC_GET_ENCRYPTION_NONCE ioctl
Add an ioctl FS_IOC_GET_ENCRYPTION_NONCE which retrieves the nonce from
an encrypted file or directory.  The nonce is the 16-byte random value
stored in the inode's encryption xattr.  It is normally used together
with the master key to derive the inode's actual encryption key.

The nonces are needed by automated tests that verify the correctness of
the ciphertext on-disk.  Except for the IV_INO_LBLK_64 case, there's no
way to replicate a file's ciphertext without knowing that file's nonce.

The nonces aren't secret, and the existing ciphertext verification tests
in xfstests retrieve them from disk using debugfs or dump.f2fs.  But in
environments that lack these debugging tools, getting the nonces by
manually parsing the filesystem structure would be very hard.

To make this important type of testing much easier, let's just add an
ioctl that retrieves the nonce.

Link: https://lore.kernel.org/r/20200314205052.93294-2-ebiggers@kernel.org
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-03-19 21:56:54 -07:00
Nikolay Aleksandrov
56d099761a net: bridge: vlan: include stats in dumps if requested
This patch adds support for vlan stats to be included when dumping vlan
information. We have to dump them only when explicitly requested (thus the
flag below) because that disables the vlan range compression and will make
the dump significantly larger. In order to request the stats to be
included we add a new dump attribute called BRIDGE_VLANDB_DUMP_FLAGS which
can affect dumps with the following first flag:
  - BRIDGE_VLANDB_DUMPF_STATS
The stats are intentionally nested and put into separate attributes to make
it easier for extending later since we plan to add per-vlan mcast stats,
drop stats and possibly STP stats. This is the last missing piece from the
new vlan API which makes the dumped vlan information complete.

A dump request which should include stats looks like:
 [BRIDGE_VLANDB_DUMP_FLAGS] |= BRIDGE_VLANDB_DUMPF_STATS

A vlandb entry attribute with stats looks like:
 [BRIDGE_VLANDB_ENTRY] = {
     [BRIDGE_VLANDB_ENTRY_STATS] = {
         [BRIDGE_VLANDB_STATS_RX_BYTES]
         [BRIDGE_VLANDB_STATS_RX_PACKETS]
         ...
     }
 }

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-19 20:21:47 -07:00
Ingo Molnar
409e1a3140 Merge branch 'perf/urgent' into perf/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-03-19 15:01:45 +01:00
Pablo Neira Ayuso
65038428b2 netfilter: nf_tables: allow to specify stateful expression in set definition
This patch allows users to specify the stateful expression for the
elements in this set via NFTA_SET_EXPR. This new feature allows you to
turn on counters for all of the elements in this set.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-19 11:37:31 +01:00
Daniel Borkmann
357b6cc583 netfilter: revert introduction of egress hook
This reverts the following commits:

  8537f78647 ("netfilter: Introduce egress hook")
  5418d3881e ("netfilter: Generalize ingress hook")
  b030f194ae ("netfilter: Rename ingress hook include file")

>From the discussion in [0], the author's main motivation to add a hook
in fast path is for an out of tree kernel module, which is a red flag
to begin with. Other mentioned potential use cases like NAT{64,46}
is on future extensions w/o concrete code in the tree yet. Revert as
suggested [1] given the weak justification to add more hooks to critical
fast-path.

  [0] https://lore.kernel.org/netdev/cover.1583927267.git.lukas@wunner.de/
  [1] https://lore.kernel.org/netdev/20200318.011152.72770718915606186.davem@davemloft.net/

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Miller <davem@davemloft.net>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Nacked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-18 16:35:48 -07:00
David S. Miller
a58741ef1e Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains Netfilter updates for net-next:

1) Use nf_flow_offload_tuple() to fetch flow stats, from Paul Blakey.

2) Add new xt_IDLETIMER hard mode, from Manoj Basapathi.
   Follow up patch to clean up this new mode, from Dan Carpenter.

3) Add support for geneve tunnel options, from Xin Long.

4) Make sets built-in and remove modular infrastructure for sets,
   from Florian Westphal.

5) Remove unused TEMPLATE_NULLS_VAL, from Li RongQing.

6) Statify nft_pipapo_get, from Chen Wandun.

7) Use C99 flexible-array member, from Gustavo A. R. Silva.

8) More descriptive variable names for bitwise, from Jeremy Sowden.

9) Four patches to add tunnel device hardware offload to the flowtable
   infrastructure, from wenxu.

10) pipapo set supports for 8-bit grouping, from Stefano Brivio.

11) pipapo can switch between nibble and byte grouping, also from
    Stefano.

12) Add AVX2 vectorized version of pipapo, from Stefano Brivio.

13) Update pipapo to be use it for single ranges, from Stefano.

14) Add stateful expression support to elements via control plane,
    eg. counter per element.

15) Re-visit sysctls in unprivileged namespaces, from Florian Westphal.

15) Add new egress hook, from Lukas Wunner.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-17 23:51:31 -07:00
Russell King
74db1c18d8 net: phylink: pcs: add 802.3 clause 22 helpers
Implement helpers for PCS accessed via the MII bus using 802.3 clause
22 cycles, conforming to 802.3 clause 37 and Cisco SGMII specifications
for the advertisement word.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-17 22:51:16 -07:00
Nikolay Aleksandrov
569da08228 net: bridge: vlan options: add support for tunnel mapping set/del
This patch adds support for manipulating vlan/tunnel mappings. The
tunnel ids are globally unique and are one per-vlan. There were two
trickier issues - first in order to support vlan ranges we have to
compute the current tunnel id in the following way:
 - base tunnel id (attr) + current vlan id - starting vlan id
This is in line how the old API does vlan/tunnel mapping with ranges. We
already have the vlan range present, so it's redundant to add another
attribute for the tunnel range end. It's simply base tunnel id + vlan
range. And second to support removing mappings we need an out-of-band way
to tell the option manipulating function because there are no
special/reserved tunnel id values, so we use a vlan flag to denote the
operation is tunnel mapping removal.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-17 22:47:12 -07:00
Nikolay Aleksandrov
188c67dd19 net: bridge: vlan options: add support for tunnel id dumping
Add a new option - BRIDGE_VLANDB_ENTRY_TUNNEL_ID which is used to dump
the tunnel id mapping. Since they're unique per vlan they can enter a
vlan range if they're consecutive, thus we can calculate the tunnel id
range map simply as: vlan range end id - vlan range start id. The
starting point is the tunnel id in BRIDGE_VLANDB_ENTRY_TUNNEL_ID. This
is similar to how the tunnel entries can be created in a range via the
old API (a vlan range maps to a tunnel range).

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-17 22:47:12 -07:00
Eric Dumazet
583396f4ca net_sched: sch_fq: enable use of hrtimer slack
Add a new attribute to control the fq qdisc hrtimer slack.

Default is set to 10 usec.

When/if packets are throttled, fq set up an hrtimer that can
lead to one interrupt per packet in the throttled queue.

By using a timer slack, we allow better use of timer interrupts,
by giving them a chance to call multiple timer callbacks
at each hardware interrupt.

Also, giving a slack allows FQ to dequeue batches of packets
instead of a single one, thus increasing xmit_more efficiency.

This has no negative effect on the rate a TCP flow can sustain,
since each TCP flow maintains its own precise vtime (tp->tcp_wstamp_ns)

v2: added strict netlink checking (as feedback from Jakub Kicinski)

Tested:
 1000 concurrent flows all using paced packets.
 1,000,000 packets sent per second.

Before the patch :

$ vmstat 2 10
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0      0 60726784  23628 3485992    0    0   138     1  977  535  0 12 87  0  0
 0  0      0 60714700  23628 3485628    0    0     0     0 1568827 26462  0 22 78  0  0
 1  0      0 60716012  23628 3485656    0    0     0     0 1570034 26216  0 22 78  0  0
 0  0      0 60722420  23628 3485492    0    0     0     0 1567230 26424  0 22 78  0  0
 0  0      0 60727484  23628 3485556    0    0     0     0 1568220 26200  0 22 78  0  0
 2  0      0 60718900  23628 3485380    0    0     0    40 1564721 26630  0 22 78  0  0
 2  0      0 60718096  23628 3485332    0    0     0     0 1562593 26432  0 22 78  0  0
 0  0      0 60719608  23628 3485064    0    0     0     0 1563806 26238  0 22 78  0  0
 1  0      0 60722876  23628 3485236    0    0     0   130 1565874 26566  0 22 78  0  0
 1  0      0 60722752  23628 3484908    0    0     0     0 1567646 26247  0 22 78  0  0

After the patch, slack of 10 usec, we can see a reduction of interrupts
per second, and a small decrease of reported cpu usage.

$ vmstat 2 10
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 60722564  23628 3484728    0    0   133     1  696  545  0 13 87  0  0
 1  0      0 60722568  23628 3484824    0    0     0     0 977278 25469  0 20 80  0  0
 0  0      0 60716396  23628 3484764    0    0     0     0 979997 25326  0 20 80  0  0
 0  0      0 60713844  23628 3484960    0    0     0     0 981394 25249  0 20 80  0  0
 2  0      0 60720468  23628 3484916    0    0     0     0 982860 25062  0 20 80  0  0
 1  0      0 60721236  23628 3484856    0    0     0     0 982867 25100  0 20 80  0  0
 1  0      0 60722400  23628 3484456    0    0     0     8 982698 25303  0 20 80  0  0
 0  0      0 60715396  23628 3484428    0    0     0     0 981777 25176  0 20 80  0  0
 0  0      0 60716520  23628 3486544    0    0     0    36 978965 27857  0 21 79  0  0
 0  0      0 60719592  23628 3486516    0    0     0    22 977318 25106  0 20 80  0  0

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-17 21:16:35 -07:00
Rajat Jain
3b059da983 Input: allocate keycode for "Selective Screenshot" key
New Chrome OS keyboards have a "snip" key that is basically a selective
screenshot (allows a user to select an area of screen to be copied).
Allocate a keycode for it.

Signed-off-by: Rajat Jain <rajatja@google.com>
Reviewed-by: Harry Cutts <hcutts@chromium.org>
Link: https://lore.kernel.org/r/20200313180333.75011-1-rajatja@google.com
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2020-03-17 20:08:52 -07:00
Lukas Wunner
8537f78647 netfilter: Introduce egress hook
Commit e687ad60af ("netfilter: add netfilter ingress hook after
handle_ing() under unique static key") introduced the ability to
classify packets on ingress.

Allow the same on egress.  Position the hook immediately before a packet
is handed to tc and then sent out on an interface, thereby mirroring the
ingress order.  This order allows marking packets in the netfilter
egress hook and subsequently using the mark in tc.  Another benefit of
this order is consistency with a lot of existing documentation which
says that egress tc is performed after netfilter hooks.

Egress hooks already exist for the most common protocols, such as
NF_INET_LOCAL_OUT or NF_ARP_OUT, and those are to be preferred because
they are executed earlier during packet processing.  However for more
exotic protocols, there is currently no provision to apply netfilter on
egress.  A common workaround is to enslave the interface to a bridge and
use ebtables, or to resort to tc.  But when the ingress hook was
introduced, consensus was that users should be given the choice to use
netfilter or tc, whichever tool suits their needs best:
https://lore.kernel.org/netdev/20150430153317.GA3230@salvia/
This hook is also useful for NAT46/NAT64, tunneling and filtering of
locally generated af_packet traffic such as dhclient.

There have also been occasional user requests for a netfilter egress
hook in the past, e.g.:
https://www.spinics.net/lists/netfilter/msg50038.html

Performance measurements with pktgen surprisingly show a speedup rather
than a slowdown with this commit:

* Without this commit:
  Result: OK: 34240933(c34238375+d2558) usec, 100000000 (60byte,0frags)
  2920481pps 1401Mb/sec (1401830880bps) errors: 0

* With this commit:
  Result: OK: 33997299(c33994193+d3106) usec, 100000000 (60byte,0frags)
  2941410pps 1411Mb/sec (1411876800bps) errors: 0

* Without this commit + tc egress:
  Result: OK: 39022386(c39019547+d2839) usec, 100000000 (60byte,0frags)
  2562631pps 1230Mb/sec (1230062880bps) errors: 0

* With this commit + tc egress:
  Result: OK: 37604447(c37601877+d2570) usec, 100000000 (60byte,0frags)
  2659259pps 1276Mb/sec (1276444320bps) errors: 0

* With this commit + nft egress:
  Result: OK: 41436689(c41434088+d2600) usec, 100000000 (60byte,0frags)
  2413320pps 1158Mb/sec (1158393600bps) errors: 0

Tested on a bare-metal Core i7-3615QM, each measurement was performed
three times to verify that the numbers are stable.

Commands to perform a measurement:
modprobe pktgen
echo "add_device lo@3" > /proc/net/pktgen/kpktgend_3
samples/pktgen/pktgen_bench_xmit_mode_queue_xmit.sh -i 'lo@3' -n 100000000

Commands for testing tc egress:
tc qdisc add dev lo clsact
tc filter add dev lo egress protocol ip prio 1 u32 match ip dst 4.3.2.1/32

Commands for testing nft egress:
nft add table netdev t
nft add chain netdev t co \{ type filter hook egress device lo priority 0 \; \}
nft add rule netdev t co ip daddr 4.3.2.1/32 drop

All testing was performed on the loopback interface to avoid distorting
measurements by the packet handling in the low-level Ethernet driver.

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-18 01:20:15 +01:00
Andrey Konovalov
956ae8df7f usb: raw_gadget: fix compilation warnings in uapi headers
Mark usb_raw_io_flags_valid() and usb_raw_io_flags_zero() as inline to
fix the following warnings:

./usr/include/linux/usb/raw_gadget.h:69:12: warning: unused function 'usb_raw_io_flags_valid' [-Wunused-function]
./usr/include/linux/usb/raw_gadget.h:74:12: warning: unused function 'usb_raw_io_flags_zero' [-Wunused-function]

Reported-by: kernelci.org bot <bot@kernelci.org>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Link: https://lore.kernel.org/r/6206b80b3810f95bfe1d452de45596609a07b6ea.1584456779.git.andreyknvl@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-17 16:04:49 +01:00
Dave Martin
ab7876a98a arm64: elf: Enable BTI at exec based on ELF program properties
For BTI protection to be as comprehensive as possible, it is
desirable to have BTI enabled from process startup.  If this is not
done, the process must use mprotect() to enable BTI for each of its
executable mappings, but this is painful to do in the libc startup
code.  It's simpler and more sound to have the kernel do it
instead.

To this end, detect BTI support in the executable (or ELF
interpreter, as appropriate), via the
NT_GNU_PROGRAM_PROPERTY_TYPE_0 note, and tweak the initial prot
flags for the process' executable pages to include PROT_BTI as
appropriate.

Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-03-16 17:19:48 +00:00
Dave Martin
00e19ceec8 ELF: Add ELF program property parsing support
ELF program properties will be needed for detecting whether to
enable optional architecture or ABI features for a new ELF process.

For now, there are no generic properties that we care about, so do
nothing unless CONFIG_ARCH_USE_GNU_PROPERTY=y.

Otherwise, the presence of properties using the PT_PROGRAM_PROPERTY
phdrs entry (if any), and notify each property to the arch code.

For now, the added code is not used.

Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-03-16 17:19:48 +00:00
Dave Martin
db751e309f ELF: UAPI and Kconfig additions for ELF program properties
Pull the basic ELF definitions relating to the
NT_GNU_PROPERTY_TYPE_0 note from Yu-Cheng Yu's earlier x86 shstk
series.

Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-03-16 17:19:48 +00:00
Paolo Bonzini
1c482452d5 KVM: s390: Features and Enhancements for 5.7 part1
1. Allow to disable gisa
 2. protected virtual machines
   Protected VMs (PVM) are KVM VMs, where KVM can't access the VM's
   state like guest memory and guest registers anymore. Instead the
   PVMs are mostly managed by a new entity called Ultravisor (UV),
   which provides an API, so KVM and the PV can request management
   actions.
 
   PVMs are encrypted at rest and protected from hypervisor access
   while running.  They switch from a normal operation into protected
   mode, so we can still use the standard boot process to load a
   encrypted blob and then move it into protected mode.
 
   Rebooting is only possible by passing through the unprotected/normal
   mode and switching to protected again.
 
   One mm related patch will go via Andrews mm tree ( mm/gup/writeback:
   add callbacks for inaccessible pages)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJeZf9tAAoJEBF7vIC1phx89J0P/iv3wCoMNDqAttnHa/UQFF04
 njUadNYkAADDrsabIEOs9O+BE1/4BVspnIunE4+xw76p5M/7/g5eIhXWcLudhlnL
 +XtvuEwz/2ffA9JWAAYNKB7cGqBM9BCC+iYzAF9ah6sPLmlDCoF+hRe0g+0tXSON
 cklUJFril9bOcxd/MxrzFLcmipbxT/Z4/10eBY+FHcm6SQGOKAtJH0xL7X3PfPI5
 L/6ZhML9exsj1Iplkrl8BomMRoYOrvfq/jMaZp9SwmfXaOKYmNU3a19MhzfZ593h
 bfR92H8kZRy/TpBd7EnpxYGQ/n53HkUhFMhtqkkkeHW1rCo8ccwC4VfnXb+KqQp+
 nJ8KieWG+OlKKFDuZPl5Gq+jQqjJfzchbyMTYnBNe+GPT5zg76tJXmQyDn5X9p3R
 mfg+9ZEeEonMu7px93Ht1gLdPiC2gjRckjuBDPqMGEhG2z2SQ/MLri+WnproIQRa
 TcE7rZBtuyrGFTq4M4dEcsUW02xnOaav6H57kkl8EwqYwgDHlqoUbt0AvLFyW07a
 RlH7drmhKDwTJkcOhOLeLNM8Un6NvnsLZ8Lbcr9rRf9Z9Lpc+zW88BSwJ7MM/GH8
 FEQM8Omnn8KAJTENpIm3bHHyvsi0kJEhl+c3Ila3QnYzXZbJ3ZDaJZngMAbUUnVl
 YNeFyyALzOgVVBx4kvTm
 =x6Hn
 -----END PGP SIGNATURE-----

Merge tag 'kvm-s390-next-5.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD

KVM: s390: Features and Enhancements for 5.7 part1

1. Allow to disable gisa
2. protected virtual machines
  Protected VMs (PVM) are KVM VMs, where KVM can't access the VM's
  state like guest memory and guest registers anymore. Instead the
  PVMs are mostly managed by a new entity called Ultravisor (UV),
  which provides an API, so KVM and the PV can request management
  actions.

  PVMs are encrypted at rest and protected from hypervisor access
  while running.  They switch from a normal operation into protected
  mode, so we can still use the standard boot process to load a
  encrypted blob and then move it into protected mode.

  Rebooting is only possible by passing through the unprotected/normal
  mode and switching to protected again.

  One mm related patch will go via Andrews mm tree ( mm/gup/writeback:
  add callbacks for inaccessible pages)
2020-03-16 18:19:34 +01:00
Jay Zhou
3c9bd4006b KVM: x86: enable dirty log gradually in small chunks
It could take kvm->mmu_lock for an extended period of time when
enabling dirty log for the first time. The main cost is to clear
all the D-bits of last level SPTEs. This situation can benefit from
manual dirty log protect as well, which can reduce the mmu_lock
time taken. The sequence is like this:

1. Initialize all the bits of the dirty bitmap to 1 when enabling
   dirty log for the first time
2. Only write protect the huge pages
3. KVM_GET_DIRTY_LOG returns the dirty bitmap info
4. KVM_CLEAR_DIRTY_LOG will clear D-bit for each of the leaf level
   SPTEs gradually in small chunks

Under the Intel(R) Xeon(R) Gold 6152 CPU @ 2.10GHz environment,
I did some tests with a 128G windows VM and counted the time taken
of memory_global_dirty_log_start, here is the numbers:

VM Size        Before    After optimization
128G           460ms     10ms

Signed-off-by: Jay Zhou <jianjay.zhou@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:57:37 +01:00
Willy Tarreau
e2032464fe floppy: separate the FDC's base address from its registers
FDC registers FD_STATUS, FD_DATA, FD_DOR, FD_DIR and FD_DCR used to be
defined relative to FD_IOPORT, which is the FDC's base address, itself
a macro depending on the "fdc" local or global variable.

This patch changes this so that the register macros above now only
reference the address offset, and that the FDC's address is explicitly
passed in each call to fd_inb() and fd_outb(), thus removing the macro.
With this change there is no more implicit usage of the local/global
"fdc" variable.

One place in the ARM code used to check if the port was equal to FD_DOR,
this was changed to testing the register by applying a mask to the port,
as was already done in the sparc code.

There are still occurrences of fd_inb() and fd_outb() in the PARISC
code and these ones remain unaffected since they already used to work
with a base address and a register offset.

The sparc, m68k and parisc code could now be slightly cleaned up to
benefit from the macro definitions above instead of the equivalent
hard-coded values.

Link: https://lore.kernel.org/r/20200301195555.11154-6-w@1wt.eu
Cc: Ian Molton <spyro@f2s.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Denis Efremov <efremov@linux.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-16 08:26:58 -06:00
Era Mayflower
48ef50fa86 macsec: Netlink support of XPN cipher suites (IEEE 802.1AEbw)
Netlink support of extended packet number cipher suites,
allows adding and updating XPN macsec interfaces.

Added support in:
    * Creating interfaces with GCM-AES-XPN-128 and GCM-AES-XPN-256 suites.
    * Setting and getting 64bit packet numbers with of SAs.
    * Setting (only on SA creation) and getting ssci of SAs.
    * Setting salt when installing a SAK.

Added 2 cipher suite identifiers according to 802.1AE-2018 table 14-1:
    * MACSEC_CIPHER_ID_GCM_AES_XPN_128
    * MACSEC_CIPHER_ID_GCM_AES_XPN_256

In addition, added 2 new netlink attribute types:
    * MACSEC_SA_ATTR_SSCI
    * MACSEC_SA_ATTR_SALT

Depends on: macsec: Support XPN frame handling - IEEE 802.1AEbw.

Signed-off-by: Era Mayflower <mayflowerera@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-16 01:42:31 -07:00
Gustavo A. R. Silva
6daf141401 netfilter: Replace zero-length array with flexible-array member
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:

struct foo {
        int stuff;
        struct boo array[];
};

By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.

Also, notice that, dynamic memory allocations won't be affected by
this change:

"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]

Lastly, fix checkpatch.pl warning
WARNING: __aligned(size) is preferred over __attribute__((aligned(size)))
in net/bridge/netfilter/ebtables.c

This issue was found with the help of Coccinelle.

[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 7649773293 ("cxgb3/l2t: Fix undefined behaviour")

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-15 15:20:16 +01:00
Xin Long
925d844696 netfilter: nft_tunnel: add support for geneve opts
Like vxlan and erspan opts, geneve opts should also be supported in
nft_tunnel. The difference is geneve RFC (draft-ietf-nvo3-geneve-14)
allows a geneve packet to carry multiple geneve opts. So with this
patch, nftables/libnftnl would do:

  # nft add table ip filter
  # nft add chain ip filter input { type filter hook input priority 0 \; }
  # nft add tunnel filter geneve_02 { type geneve\; id 2\; \
    ip saddr 192.168.1.1\; ip daddr 192.168.1.2\; \
    sport 9000\; dport 9001\; dscp 1234\; ttl 64\; flags 1\; \
    opts \"1:1:34567890,2:2:12121212,3:3:1212121234567890\"\; }
  # nft list tunnels table filter
    table ip filter {
    	tunnel geneve_02 {
    		id 2
    		ip saddr 192.168.1.1
    		ip daddr 192.168.1.2
    		sport 9000
    		dport 9001
    		tos 18
    		ttl 64
    		flags 1
    		geneve opts 1:1:34567890,2:2:12121212,3:3:1212121234567890
    	}
    }

v1->v2:
  - no changes, just post it separately.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-15 15:20:16 +01:00
Manoj Basapathi
68983a354a netfilter: xtables: Add snapshot of hardidletimer target
This is a snapshot of hardidletimer netfilter target.

This patch implements a hardidletimer Xtables target that can be
used to identify when interfaces have been idle for a certain period
of time.

Timers are identified by labels and are created when a rule is set
with a new label. The rules also take a timeout value (in seconds) as
an option. If more than one rule uses the same timer label, the timer
will be restarted whenever any of the rules get a hit.

One entry for each timer is created in sysfs. This attribute contains
the timer remaining for the timer to expire. The attributes are
located under the xt_idletimer class:

/sys/class/xt_idletimer/timers/<label>

When the timer expires, the target module sends a sysfs notification
to the userspace, which can then decide what to do (eg. disconnect to
save power)

Compared to IDLETIMER, HARDIDLETIMER can send notifications when
CPU is in suspend too, to notify the timer expiry.

v1->v2: Moved all functionality into IDLETIMER module to avoid
code duplication per comment from Florian.

Signed-off-by: Manoj Basapathi <manojbm@codeaurora.org>
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-15 15:20:16 +01:00
Andrey Konovalov
f2c2e71764 usb: gadget: add raw-gadget interface
USB Raw Gadget is a kernel module that provides a userspace interface for
the USB Gadget subsystem. Essentially it allows to emulate USB devices
from userspace. Enabled with CONFIG_USB_RAW_GADGET. Raw Gadget is
currently a strictly debugging feature and shouldn't be used in
production.

Raw Gadget is similar to GadgetFS, but provides a more low-level and
direct access to the USB Gadget layer for the userspace. The key
differences are:

1. Every USB request is passed to the userspace to get a response, while
   GadgetFS responds to some USB requests internally based on the provided
   descriptors. However note, that the UDC driver might respond to some
   requests on its own and never forward them to the Gadget layer.

2. GadgetFS performs some sanity checks on the provided USB descriptors,
   while Raw Gadget allows you to provide arbitrary data as responses to
   USB requests.

3. Raw Gadget provides a way to select a UDC device/driver to bind to,
   while GadgetFS currently binds to the first available UDC.

4. Raw Gadget uses predictable endpoint names (handles) across different
   UDCs (as long as UDCs have enough endpoints of each required transfer
   type).

5. Raw Gadget has ioctl-based interface instead of a filesystem-based one.

Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Felipe Balbi <balbi@kernel.org>
2020-03-15 11:34:48 +02:00
Petr Machata
0a7fad2376 net: sched: RED: Introduce an ECN nodrop mode
When the RED Qdisc is currently configured to enable ECN, the RED algorithm
is used to decide whether a certain SKB should be marked. If that SKB is
not ECN-capable, it is early-dropped.

It is also possible to keep all traffic in the queue, and just mark the
ECN-capable subset of it, as appropriate under the RED algorithm. Some
switches support this mode, and some installations make use of it.

To that end, add a new RED flag, TC_RED_NODROP. When the Qdisc is
configured with this flag, non-ECT traffic is enqueued instead of being
early-dropped.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-14 21:03:46 -07:00
Petr Machata
14bc175d9c net: sched: Allow extending set of supported RED flags
The qdiscs RED, GRED, SFQ and CHOKE use different subsets of the same pool
of global RED flags. These are passed in tc_red_qopt.flags. However none of
these qdiscs validate the flag field, and just copy it over wholesale to
internal structures, and later dump it back. (An exception is GRED, which
does validate for VQs -- however not for the main setup.)

A broken userspace can therefore configure a qdisc with arbitrary
unsupported flags, and later expect to see the flags on qdisc dump. The
current ABI therefore allows storage of several bits of custom data to
qdisc instances of the types mentioned above. How many bits, depends on
which flags are meaningful for the qdisc in question. E.g. SFQ recognizes
flags ECN and HARDDROP, and the rest is not interpreted.

If SFQ ever needs to support ADAPTATIVE, it needs another way of doing it,
and at the same time it needs to retain the possibility to store 6 bits of
uninterpreted data. Likewise RED, which adds a new flag later in this
patchset.

To that end, this patch adds a new function, red_get_flags(), to split the
passed flags of RED-like qdiscs to flags and user bits, and
red_validate_flags() to validate the resulting configuration. It further
adds a new attribute, TCA_RED_FLAGS, to pass arbitrary flags.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-14 21:03:46 -07:00
David S. Miller
44ef976ab3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2020-03-13

The following pull-request contains BPF updates for your *net-next* tree.

We've added 86 non-merge commits during the last 12 day(s) which contain
a total of 107 files changed, 5771 insertions(+), 1700 deletions(-).

The main changes are:

1) Add modify_return attach type which allows to attach to a function via
   BPF trampoline and is run after the fentry and before the fexit programs
   and can pass a return code to the original caller, from KP Singh.

2) Generalize BPF's kallsyms handling and add BPF trampoline and dispatcher
   objects to be visible in /proc/kallsyms so they can be annotated in
   stack traces, from Jiri Olsa.

3) Extend BPF sockmap to allow for UDP next to existing TCP support in order
   in order to enable this for BPF based socket dispatch, from Lorenz Bauer.

4) Introduce a new bpftool 'prog profile' command which attaches to existing
   BPF programs via fentry and fexit hooks and reads out hardware counters
   during that period, from Song Liu. Example usage:

   bpftool prog profile id 337 duration 3 cycles instructions llc_misses

        4228 run_cnt
     3403698 cycles                                              (84.08%)
     3525294 instructions   #  1.04 insn per cycle               (84.05%)
          13 llc_misses     #  3.69 LLC misses per million isns  (83.50%)

5) Batch of improvements to libbpf, bpftool and BPF selftests. Also addition
   of a new bpf_link abstraction to keep in particular BPF tracing programs
   attached even when the applicaion owning them exits, from Andrii Nakryiko.

6) New bpf_get_current_pid_tgid() helper for tracing to perform PID filtering
   and which returns the PID as seen by the init namespace, from Carlos Neira.

7) Refactor of RISC-V JIT code to move out common pieces and addition of a
   new RV32G BPF JIT compiler, from Luke Nelson.

8) Add gso_size context member to __sk_buff in order to be able to know whether
   a given skb is GSO or not, from Willem de Bruijn.

9) Add a new bpf_xdp_output() helper which reuses XDP's existing perf RB output
   implementation but can be called from tracepoint programs, from Eelco Chaudron.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-13 20:52:03 -07:00
David S. Miller
1d34357931 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Minor overlapping changes, nothing serious.

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-12 22:34:48 -07:00
Eelco Chaudron
d831ee84bf bpf: Add bpf_xdp_output() helper
Introduce new helper that reuses existing xdp perf_event output
implementation, but can be called from raw_tracepoint programs
that receive 'struct xdp_buff *' as a tracepoint argument.

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/158348514556.2239.11050972434793741444.stgit@xdp-tutorial
2020-03-12 17:47:38 -07:00
Carlos Neira
b4490c5c4e bpf: Added new helper bpf_get_ns_current_pid_tgid
New bpf helper bpf_get_ns_current_pid_tgid,
This helper will return pid and tgid from current task
which namespace matches dev_t and inode number provided,
this will allows us to instrument a process inside a container.

Signed-off-by: Carlos Neira <cneirabustos@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200304204157.58695-3-cneirabustos@gmail.com
2020-03-12 17:33:11 -07:00
Linus Torvalds
1b51f69461 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from David Miller:
 "It looks like a decent sized set of fixes, but a lot of these are one
  liner off-by-one and similar type changes:

   1) Fix netlink header pointer to calcular bad attribute offset
      reported to user. From Pablo Neira Ayuso.

   2) Don't double clear PHY interrupts when ->did_interrupt is set,
      from Heiner Kallweit.

   3) Add missing validation of various (devlink, nl802154, fib, etc.)
      attributes, from Jakub Kicinski.

   4) Missing *pos increments in various netfilter seq_next ops, from
      Vasily Averin.

   5) Missing break in of_mdiobus_register() loop, from Dajun Jin.

   6) Don't double bump tx_dropped in veth driver, from Jiang Lidong.

   7) Work around FMAN erratum A050385, from Madalin Bucur.

   8) Make sure ARP header is pulled early enough in bonding driver,
      from Eric Dumazet.

   9) Do a cond_resched() during multicast processing of ipvlan and
      macvlan, from Mahesh Bandewar.

  10) Don't attach cgroups to unrelated sockets when in interrupt
      context, from Shakeel Butt.

  11) Fix tpacket ring state management when encountering unknown GSO
      types. From Willem de Bruijn.

  12) Fix MDIO bus PHY resume by checking mdio_bus_phy_may_suspend()
      only in the suspend context. From Heiner Kallweit"

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (112 commits)
  net: systemport: fix index check to avoid an array out of bounds access
  tc-testing: add ETS scheduler to tdc build configuration
  net: phy: fix MDIO bus PM PHY resuming
  net: hns3: clear port base VLAN when unload PF
  net: hns3: fix RMW issue for VLAN filter switch
  net: hns3: fix VF VLAN table entries inconsistent issue
  net: hns3: fix "tc qdisc del" failed issue
  taprio: Fix sending packets without dequeueing them
  net: mvmdio: avoid error message for optional IRQ
  net: dsa: mv88e6xxx: Add missing mask of ATU occupancy register
  net: memcg: fix lockdep splat in inet_csk_accept()
  s390/qeth: implement smarter resizing of the RX buffer pool
  s390/qeth: refactor buffer pool code
  s390/qeth: use page pointers to manage RX buffer pool
  seg6: fix SRv6 L2 tunnels to use IANA-assigned protocol number
  net: dsa: Don't instantiate phylink for CPU/DSA ports unless needed
  net/packet: tpacket_rcv: do not increment ring index on drop
  sxgbe: Fix off by one in samsung driver strncpy size arg
  net: caif: Add lockdep expression to RCU traversal primitive
  MAINTAINERS: remove Sathya Perla as Emulex NIC maintainer
  ...
2020-03-12 16:19:19 -07:00
Michal Kubecek
546379b9a0 ethtool: add CHANNELS_NTF notification
Send ETHTOOL_MSG_CHANNELS_NTF notification whenever channel counts of
a network device are modified using ETHTOOL_MSG_CHANNELS_SET netlink
message or ETHTOOL_SCHANNELS ioctl request.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-12 15:32:33 -07:00
Michal Kubecek
e19c591eaf ethtool: set device channel counts with CHANNELS_SET request
Implement CHANNELS_SET netlink request to set channel counts of a network
device. These are traditionally set with ETHTOOL_SCHANNELS ioctl request.

Like the ioctl implementation, the generic ethtool code checks if supplied
values do not exceed driver defined limits; if they do, first offending
attribute is reported using extack. Checks preventing removing channels
used for RX indirection table or zerocopy AF_XDP socket are also
implemented.

Move ethtool_get_max_rxfh_channel() helper into common.c so that it can be
used by both ioctl and netlink code.

v2:
  - fix netdev reference leak in error path (found by Jakub Kicinsky)

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-12 15:32:33 -07:00
Michal Kubecek
0c84979c95 ethtool: provide channel counts with CHANNELS_GET request
Implement CHANNELS_GET request to get channel counts of a network device.
These are traditionally available via ETHTOOL_GCHANNELS ioctl request.

Omit attributes for channel types which are not supported by driver or
device (zero reported for maximum).

v2: (all suggested by Jakub Kicinski)
  - minor cleanup in channels_prepare_data()
  - more descriptive channels_reply_size()
  - omit attributes with zero max count

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-12 15:32:33 -07:00
Michal Kubecek
bc9d1c995e ethtool: add RINGS_NTF notification
Send ETHTOOL_MSG_RINGS_NTF notification whenever ring sizes of a network
device are modified using ETHTOOL_MSG_RINGS_SET netlink message or
ETHTOOL_SRINGPARAM ioctl request.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-12 15:32:33 -07:00
Michal Kubecek
2fc2929e80 ethtool: set device ring sizes with RINGS_SET request
Implement RINGS_SET netlink request to set ring sizes of a network device.
These are traditionally set with ETHTOOL_SRINGPARAM ioctl request.

Like the ioctl implementation, the generic ethtool code checks if supplied
values do not exceed driver defined limits; if they do, first offending
attribute is reported using extack.

v2:
  - fix netdev reference leak in error path (found by Jakub Kicinsky)

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-12 15:32:33 -07:00
Michal Kubecek
e4a1717b67 ethtool: provide ring sizes with RINGS_GET request
Implement RINGS_GET request to get ring sizes of a network device. These
are traditionally available via ETHTOOL_GRINGPARAM ioctl request.

Omit attributes for ring types which are not supported by driver or device
(zero reported for maximum).

v2: (all suggested by Jakub Kicinski)
  - minor cleanup in rings_prepare_data()
  - more descriptive rings_reply_size()
  - omit attributes with zero max size

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-12 15:32:33 -07:00
Michal Kubecek
111dcba3c6 ethtool: add PRIVFLAGS_NTF notification
Send ETHTOOL_MSG_PRIVFLAGS_NTF notification whenever private flags of
a network device are modified using ETHTOOL_MSG_PRIVFLAGS_SET netlink
message or ETHTOOL_SPFLAGS ioctl request.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-12 15:32:33 -07:00
Michal Kubecek
f265d79959 ethtool: set device private flags with PRIVFLAGS_SET request
Implement PRIVFLAGS_SET netlink request to set private flags of a network
device. These are traditionally set with ETHTOOL_SPFLAGS ioctl request.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-12 15:32:33 -07:00
Michal Kubecek
e16c3386fc ethtool: provide private flags with PRIVFLAGS_GET request
Implement PRIVFLAGS_GET request to get private flags for a network device.
These are traditionally available via ETHTOOL_GPFLAGS ioctl request.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-12 15:32:33 -07:00
Michal Kubecek
9c6451ef48 ethtool: add FEATURES_NTF notification
Send ETHTOOL_MSG_FEATURES_NTF notification whenever network device features
are modified using ETHTOOL_MSG_FEATURES_SET netlink message, ethtool ioctl
request or any other way resulting in call to netdev_update_features() or
netdev_change_features()

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-12 15:32:33 -07:00
Michal Kubecek
0980bfcd69 ethtool: set netdev features with FEATURES_SET request
Implement FEATURES_SET netlink request to set network device features.
These are traditionally set using ETHTOOL_SFEATURES ioctl request.

Actual change is subject to netdev_change_features() sanity checks so that
it can differ from what was requested. Unlike with most other SET requests,
in addition to error code and optional extack, kernel provides an optional
reply message (ETHTOOL_MSG_FEATURES_SET_REPLY) in the same format but with
different semantics: information about difference between user request and
actual result and difference between old and new state of dev->features.
This reply message can be suppressed by setting ETHTOOL_FLAG_OMIT_REPLY
flag in request header.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-12 15:32:32 -07:00
Michal Kubecek
0524399d46 ethtool: provide netdev features with FEATURES_GET request
Implement FEATURES_GET request to get network device features. These are
traditionally available via ETHTOOL_GFEATURES ioctl request.

v2:
  - style cleanup suggested by Jakub Kicinski

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-12 15:32:32 -07:00
Marco Felsch
f8c8ee6118 media: v4l: link dt-bindings and uapi
Since we expose the definition to the dt-bindings we need to keep those
definitions in sync. To address this the patch adds a simple cross
reference to the dt-bindings.

Signed-off-by: Marco Felsch <m.felsch@pengutronix.de>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2020-03-12 16:29:52 +01:00
Paolo Lungaroni
2677625387 seg6: fix SRv6 L2 tunnels to use IANA-assigned protocol number
The Internet Assigned Numbers Authority (IANA) has recently assigned
a protocol number value of 143 for Ethernet [1].

Before this assignment, encapsulation mechanisms such as Segment Routing
used the IPv6-NoNxt protocol number (59) to indicate that the encapsulated
payload is an Ethernet frame.

In this patch, we add the definition of the Ethernet protocol number to the
kernel headers and update the SRv6 L2 tunnels to use it.

[1] https://www.iana.org/assignments/protocol-numbers/protocol-numbers.xhtml

Signed-off-by: Paolo Lungaroni <paolo.lungaroni@cnit.it>
Reviewed-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Acked-by: Ahmed Abdelsalam <ahmed.abdelsalam@gssi.it>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-11 23:49:30 -07:00
Jens Axboe
bbbdeb4720 io_uring: dual license io_uring.h uapi header
This just syncs the header it with the liburing version, so there's no
confusion on the license of the header parts.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-11 07:45:46 -06:00
Tony Luck
7c4a4d0882 dmaengine: idxd: Merge definition of dsa_batch_desc into dsa_hw_desc
We don't need a special structure just for batch descriptors. The
layout matches the general form for other descriptors.

Merge the desc_list_addr field into the union of other aliases for
the the third quadword in the structure.

Create a union to alias "xfer_size" with "desc_count".

Signed-off-by: Tony Luck <tony.luck@intel.com>
Acked-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/158387868208.35922.5895104426944263789.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
2020-03-11 15:08:52 +05:30
Jens Axboe
067524e914 io_uring: provide means of removing buffers
We have IORING_OP_PROVIDE_BUFFERS, but the only way to remove buffers
is to trigger IO on them. The usual case of shrinking a buffer pool
would be to just not replenish the buffers when IO completes, and
instead just free it. But it may be nice to have a way to manually
remove a number of buffers from a given group, and
IORING_OP_REMOVE_BUFFERS provides that functionality.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-10 09:12:56 -06:00
Jens Axboe
bcda7baaa3 io_uring: support buffer selection for OP_READ and OP_RECV
If a server process has tons of pending socket connections, generally
it uses epoll to wait for activity. When the socket is ready for reading
(or writing), the task can select a buffer and issue a recv/send on the
given fd.

Now that we have fast (non-async thread) support, a task can have tons
of pending reads or writes pending. But that means they need buffers to
back that data, and if the number of connections is high enough, having
them preallocated for all possible connections is unfeasible.

With IORING_OP_PROVIDE_BUFFERS, an application can register buffers to
use for any request. The request then sets IOSQE_BUFFER_SELECT in the
sqe, and a given group ID in sqe->buf_group. When the fd becomes ready,
a free buffer from the specified group is selected. If none are
available, the request is terminated with -ENOBUFS. If successful, the
CQE on completion will contain the buffer ID chosen in the cqe->flags
member, encoded as:

	(buffer_id << IORING_CQE_BUFFER_SHIFT) | IORING_CQE_F_BUFFER;

Once a buffer has been consumed by a request, it is no longer available
and must be registered again with IORING_OP_PROVIDE_BUFFERS.

Requests need to support this feature. For now, IORING_OP_READ and
IORING_OP_RECV support it. This is checked on SQE submission, a CQE with
res == -EOPNOTSUPP will be posted if attempted on unsupported requests.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-10 09:12:45 -06:00
Jens Axboe
ddf0322db7 io_uring: add IORING_OP_PROVIDE_BUFFERS
IORING_OP_PROVIDE_BUFFERS uses the buffer registration infrastructure to
support passing in an addr/len that is associated with a buffer ID and
buffer group ID. The group ID is used to index and lookup the buffers,
while the buffer ID can be used to notify the application which buffer
in the group was used. The addr passed in is the starting buffer address,
and length is each buffer length. A number of buffers to add with can be
specified, in which case addr is incremented by length for each addition,
and each buffer increments the buffer ID specified.

No validation is done of the buffer ID. If the application provides
buffers within the same group with identical buffer IDs, then it'll have
a hard time telling which buffer ID was used. The only restriction is
that the buffer ID can be a max of 16-bits in size, so USHRT_MAX is the
maximum ID that can be used.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-10 09:12:14 -06:00
Yousuk Seung
e08ab0b377 tcp: add bytes not sent to SCM_TIMESTAMPING_OPT_STATS
Add TCP_NLA_BYTES_NOTSENT to SCM_TIMESTAMPING_OPT_STATS that reports
bytes in the write queue but not sent. This is the same metric as
what is exported with tcp_info.tcpi_notsent_bytes.

Signed-off-by: Yousuk Seung <ysseung@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-09 17:56:33 -07:00
Jiri Pirko
44f8658017 sched: act: allow user to specify type of HW stats for a filter
Currently, user who is adding an action expects HW to report stats,
however it does not have exact expectations about the stats types.
That is aligned with TCA_ACT_HW_STATS_TYPE_ANY.

Allow user to specify the type of HW stats for an action and require it.

Pass the information down to flow_offload layer.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-08 21:07:48 -07:00
Ingo Molnar
1941011a8b Merge branch 'perf/urgent' into perf/core, to pick up the latest fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-03-06 11:56:40 +01:00
Tycho Andersen
51891498f2 seccomp: allow TSYNC and USER_NOTIF together
The restriction introduced in 7a0df7fbc1 ("seccomp: Make NEW_LISTENER and
TSYNC flags exclusive") is mostly artificial: there is enough information
in a seccomp user notification to tell which thread triggered a
notification. The reason it was introduced is because TSYNC makes the
syscall return a thread-id on failure, and NEW_LISTENER returns an fd, and
there's no way to distinguish between these two cases (well, I suppose the
caller could check all fds it has, then do the syscall, and if the return
value was an fd that already existed, then it must be a thread id, but
bleh).

Matthew would like to use these two flags together in the Chrome sandbox
which wants to use TSYNC for video drivers and NEW_LISTENER to proxy
syscalls.

So, let's fix this ugliness by adding another flag, TSYNC_ESRCH, which
tells the kernel to just return -ESRCH on a TSYNC error. This way,
NEW_LISTENER (and any subsequent seccomp() commands that want to return
positive values) don't conflict with each other.

Suggested-by: Matthew Denton <mpdenton@google.com>
Signed-off-by: Tycho Andersen <tycho@tycho.ws>
Link: https://lore.kernel.org/r/20200304180517.23867-1-tycho@tycho.ws
Signed-off-by: Kees Cook <keescook@chromium.org>
2020-03-04 14:48:54 -08:00
KP Singh
ae24082331 bpf: Introduce BPF_MODIFY_RETURN
When multiple programs are attached, each program receives the return
value from the previous program on the stack and the last program
provides the return value to the attached function.

The fmod_ret bpf programs are run after the fentry programs and before
the fexit programs. The original function is only called if all the
fmod_ret programs return 0 to avoid any unintended side-effects. The
success value, i.e. 0 is not currently configurable but can be made so
where user-space can specify it at load time.

For example:

int func_to_be_attached(int a, int b)
{  <--- do_fentry

do_fmod_ret:
   <update ret by calling fmod_ret>
   if (ret != 0)
        goto do_fexit;

original_function:

    <side_effects_happen_here>

}  <--- do_fexit

The fmod_ret program attached to this function can be defined as:

SEC("fmod_ret/func_to_be_attached")
int BPF_PROG(func_name, int a, int b, int ret)
{
        // This will skip the original function logic.
        return 1;
}

The first fmod_ret program is passed 0 in its return argument.

Signed-off-by: KP Singh <kpsingh@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200304191853.1529-4-kpsingh@chromium.org
2020-03-04 13:41:05 -08:00
Linus Torvalds
776e49e8dd - Fix request-based DM's congestion_fn and actually wire it up to the
bdi.
 
 - Extend dm-bio-record to track additional struct bio members needed
   by DM integrity target.
 
 - Fix DM core to properly advertise that a device is suspended during
   unload (between the presuspend and postsuspend hooks).  This change
   is a prereq for related DM integrity and DM writecache fixes.  It
   elevates DM integrity's 'suspending' state tracking to DM core.
 
 - Four stable fixes for DM integrity target.
 
 - Fix crash in DM cache target due to incorrect work item cancelling.
 
 - Fix DM thin metadata lockdep warning that was introduced during 5.6
   merge window.
 
 - Fix DM zoned target's chunk work refcounting that regressed during
   recent conversion to refcount_t.
 
 - Bump the minor version for DM core and all target versions that have
   seen interface changes or important fixes during the 5.6 cycle.
 -----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCAAxFiEEJfWUX4UqZ4x1O2wixSPxCi2dA1oFAl5fwPgTHHNuaXR6ZXJA
 cmVkaGF0LmNvbQAKCRDFI/EKLZ0DWnUJB/9+36J0Y3HYdUIKM8P5CBH0tdkMqSo4
 BFm3Og4Xo1GA5xodNptgD+QoLXy3VWkekUsLvaLgOlPq7haPvHkME20vUMuO1l46
 QMNR2VXciYGgV0+CFlpXL2oMr0ZEc1hkt7ZpzYaw1OIk6Mo0tEEDrSo0rvmXafPf
 W4veTWRL4HfN8fy3NpQNJ4xBQs4Iw4VC20FPFIOvzA1EGk7VGGaP1mGKmOnjxo/O
 o3lEH8NpQgwxwST7d4T6DPdeu3aTifnjdY8/h8mFGpQnxOiZ/wZkheKWwTCNE8r9
 oeVY07qVxNAHyC8G/SmAL5KnC6En1hq5jA2M4xh9guUI2k8YbON571n+
 =iAS0
 -----END PGP SIGNATURE-----

Merge tag 'for-5.6/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper fixes from Mike Snitzer:

 - Fix request-based DM's congestion_fn and actually wire it up to the
   bdi.

 - Extend dm-bio-record to track additional struct bio members needed by
   DM integrity target.

 - Fix DM core to properly advertise that a device is suspended during
   unload (between the presuspend and postsuspend hooks). This change is
   a prereq for related DM integrity and DM writecache fixes. It
   elevates DM integrity's 'suspending' state tracking to DM core.

 - Four stable fixes for DM integrity target.

 - Fix crash in DM cache target due to incorrect work item cancelling.

 - Fix DM thin metadata lockdep warning that was introduced during 5.6
   merge window.

 - Fix DM zoned target's chunk work refcounting that regressed during
   recent conversion to refcount_t.

 - Bump the minor version for DM core and all target versions that have
   seen interface changes or important fixes during the 5.6 cycle.

* tag 'for-5.6/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm: bump version of core and various targets
  dm: fix congested_fn for request-based device
  dm integrity: use dm_bio_record and dm_bio_restore
  dm bio record: save/restore bi_end_io and bi_integrity
  dm zoned: Fix reference counter initial value of chunk works
  dm writecache: verify watermark during resume
  dm: report suspended device during destroy
  dm thin metadata: fix lockdep complaint
  dm cache: fix a crash due to incorrect work item cancelling
  dm integrity: fix invalid table returned due to argument count mismatch
  dm integrity: fix a deadlock due to offloading to an incorrect workqueue
  dm integrity: fix recalculation when moving from journal mode to bitmap mode
2020-03-04 13:02:45 -06:00
Andrii Nakryiko
1aae4bdd78 bpf: Switch BPF UAPI #define constants used from BPF program side to enums
Switch BPF UAPI constants, previously defined as #define macro, to anonymous
enum values. This preserves constants values and behavior in expressions, but
has added advantaged of being captured as part of DWARF and, subsequently, BTF
type info. Which, in turn, greatly improves usefulness of generated vmlinux.h
for BPF applications, as it will not require BPF users to copy/paste various
flags and constants, which are frequently used with BPF helpers. Only those
constants that are used/useful from BPF program side are converted.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200303003233.3496043-2-andriin@fb.com
2020-03-04 17:00:05 +01:00
Willem de Bruijn
cf62089b0e bpf: Add gso_size to __sk_buff
BPF programs may want to know whether an skb is gso. The canonical
answer is skb_is_gso(skb), which tests that gso_size != 0.

Expose this field in the same manner as gso_segs. That field itself
is not a sufficient signal, as the comment in skb_shared_info makes
clear: gso_segs may be zero, e.g., from dodgy sources.

Also prepare net/bpf/test_run for upcoming BPF_PROG_TEST_RUN tests
of the feature.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200303200503.226217-2-willemdebruijn.kernel@gmail.com
2020-03-03 16:23:59 -08:00
Parav Pandit
acf1ee44ca devlink: Introduce devlink port flavour virtual
Currently mlx5 PCI PF and VF devlink devices register their ports as
physical port in non-representors mode.

Introduce a new port flavour as virtual so that virtual devices can
register 'virtual' flavour to make it more clear to users.

An example of one PCI PF and 2 PCI virtual functions, each having
one devlink port.

$ devlink port show
pci/0000:06:00.0/1: type eth netdev ens2f0 flavour physical port 0
pci/0000:06:00.2/1: type eth netdev ens2f2 flavour virtual port 0
pci/0000:06:00.3/1: type eth netdev ens2f3 flavour virtual port 0

Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-03 15:40:40 -08:00
Mike Snitzer
636be4241b dm: bump version of core and various targets
Changes made during the 5.6 cycle warrant bumping the version number
for DM core and the targets modified by this commit.

It should be noted that dm-thin, dm-crypt and dm-raid already had
their target version bumped during the 5.6 merge window.

Signed-off-by; Mike Snitzer <snitzer@redhat.com>
2020-03-03 11:10:21 -05:00
Jens Axboe
d7718a9d25 io_uring: use poll driven retry for files that support it
Currently io_uring tries any request in a non-blocking manner, if it can,
and then retries from a worker thread if we get -EAGAIN. Now that we have
a new and fancy poll based retry backend, use that to retry requests if
the file supports it.

This means that, for example, an IORING_OP_RECVMSG on a socket no longer
requires an async thread to complete the IO. If we get -EAGAIN reading
from the socket in a non-blocking manner, we arm a poll handler for
notification on when the socket becomes readable. When it does, the
pending read is executed directly by the task again, through the io_uring
task work handlers. Not only is this faster and more efficient, it also
means we're not generating potentially tons of async threads that just
sit and block, waiting for the IO to complete.

The feature is marked with IORING_FEAT_FAST_POLL, meaning that async
pollable IO is fast, and that poll<link>other_op is fast as well.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-02 14:06:38 -07:00
Pavel Begunkov
7d67af2c01 io_uring: add splice(2) support
Add support for splice(2).

- output file is specified as sqe->fd, so it's handled by generic code
- hash_reg_file handled by generic code as well
- len is 32bit, but should be fine
- the fd_in is registered file, when SPLICE_F_FD_IN_FIXED is set, which
is a splice flag (i.e. sqe->splice_flags).

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-02 14:04:37 -07:00
Gustavo A. R. Silva
1776658da8 drop_monitor: Replace zero-length array with flexible-array member
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:

struct foo {
        int stuff;
        struct boo array[];
};

By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.

Also, notice that, dynamic memory allocations won't be affected by
this change:

"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]

This issue was found with the help of Coccinelle.

[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 7649773293 ("cxgb3/l2t: Fix undefined behaviour")

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-02 11:16:28 -08:00
Gustavo A. R. Silva
5a8b7c4b7f arcnet: Replace zero-length array with flexible-array member
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:

struct foo {
        int stuff;
        struct boo array[];
};

By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.

Also, notice that, dynamic memory allocations won't be affected by
this change:

"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]

This issue was found with the help of Coccinelle.

[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 7649773293 ("cxgb3/l2t: Fix undefined behaviour")

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-29 21:52:20 -08:00
David S. Miller
9f0ca0c1a5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Alexei Starovoitov says:

====================
pull-request: bpf-next 2020-02-28

The following pull-request contains BPF updates for your *net-next* tree.

We've added 41 non-merge commits during the last 7 day(s) which contain
a total of 49 files changed, 1383 insertions(+), 499 deletions(-).

The main changes are:

1) BPF and Real-Time nicely co-exist.

2) bpftool feature improvements.

3) retrieve bpf_sk_storage via INET_DIAG.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-29 15:53:35 -08:00
Martin KaFai Lau
085c20cacf bpf: inet_diag: Dump bpf_sk_storages in inet_diag_dump()
This patch will dump out the bpf_sk_storages of a sk
if the request has the INET_DIAG_REQ_SK_BPF_STORAGES nlattr.

An array of SK_DIAG_BPF_STORAGE_REQ_MAP_FD can be specified in
INET_DIAG_REQ_SK_BPF_STORAGES to select which bpf_sk_storage to dump.
If no map_fd is specified, all bpf_sk_storages of a sk will be dumped.

bpf_sk_storages can be added to the system at runtime.  It is difficult
to find a proper static value for cb->min_dump_alloc.

This patch learns the nlattr size required to dump the bpf_sk_storages
of a sk.  If it happens to be the very first nlmsg of a dump and it
cannot fit the needed bpf_sk_storages,  it will try to expand the
skb by "pskb_expand_head()".

Instead of expanding it in inet_sk_diag_fill(), it is expanded at a
sleepable context in __inet_diag_dump() so __GFP_DIRECT_RECLAIM can
be used.  In __inet_diag_dump(), it will retry as long as the
skb is empty and the cb->min_dump_alloc becomes larger than before.
cb->min_dump_alloc is bounded by KMALLOC_MAX_SIZE.  The min_dump_alloc
is also changed from 'u16' to 'u32' to accommodate a sk that may have
a few large bpf_sk_storages.

The updated cb->min_dump_alloc will also be used to allocate the skb in
the next dump.  This logic already exists in netlink_dump().

Here is the sample output of a locally modified 'ss' and it could be made
more readable by using BTF later:
[root@arch-fb-vm1 ~]# ss --bpf-map-id 14 --bpf-map-id 13 -t6an 'dst [::1]:8989'
State Recv-Q Send-Q Local Address:Port  Peer Address:PortProcess
ESTAB 0      0              [::1]:51072        [::1]:8989
	 bpf_map_id:14 value:[ 3feb ]
	 bpf_map_id:13 value:[ 3f ]
ESTAB 0      0              [::1]:51070        [::1]:8989
	 bpf_map_id:14 value:[ 3feb ]
	 bpf_map_id:13 value:[ 3f ]

[root@arch-fb-vm1 ~]# ~/devshare/github/iproute2/misc/ss --bpf-maps -t6an 'dst [::1]:8989'
State         Recv-Q         Send-Q                   Local Address:Port                    Peer Address:Port         Process
ESTAB         0              0                                [::1]:51072                          [::1]:8989
	 bpf_map_id:14 value:[ 3feb ]
	 bpf_map_id:13 value:[ 3f ]
	 bpf_map_id:12 value:[ 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000... total:65407 ]
ESTAB         0              0                                [::1]:51070                          [::1]:8989
	 bpf_map_id:14 value:[ 3feb ]
	 bpf_map_id:13 value:[ 3f ]
	 bpf_map_id:12 value:[ 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000... total:65407 ]

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20200225230427.1976129-1-kafai@fb.com
2020-02-27 18:50:19 -08:00
Martin KaFai Lau
1ed4d92458 bpf: INET_DIAG support in bpf_sk_storage
This patch adds INET_DIAG support to bpf_sk_storage.

1. Although this series adds bpf_sk_storage diag capability to inet sk,
   bpf_sk_storage is in general applicable to all fullsock.  Hence, the
   bpf_sk_storage logic will operate on SK_DIAG_* nlattr.  The caller
   will pass in its specific nesting nlattr (e.g. INET_DIAG_*) as
   the argument.

2. The request will be like:
	INET_DIAG_REQ_SK_BPF_STORAGES (nla_nest) (defined in latter patch)
		SK_DIAG_BPF_STORAGE_REQ_MAP_FD (nla_put_u32)
		SK_DIAG_BPF_STORAGE_REQ_MAP_FD (nla_put_u32)
		......

   Considering there could have multiple bpf_sk_storages in a sk,
   instead of reusing INET_DIAG_INFO ("ss -i"),  the user can select
   some specific bpf_sk_storage to dump by specifying an array of
   SK_DIAG_BPF_STORAGE_REQ_MAP_FD.

   If no SK_DIAG_BPF_STORAGE_REQ_MAP_FD is specified (i.e. an empty
   INET_DIAG_REQ_SK_BPF_STORAGES), it will dump all bpf_sk_storages
   of a sk.

3. The reply will be like:
	INET_DIAG_BPF_SK_STORAGES (nla_nest) (defined in latter patch)
		SK_DIAG_BPF_STORAGE (nla_nest)
			SK_DIAG_BPF_STORAGE_MAP_ID (nla_put_u32)
			SK_DIAG_BPF_STORAGE_MAP_VALUE (nla_reserve_64bit)
		SK_DIAG_BPF_STORAGE (nla_nest)
			SK_DIAG_BPF_STORAGE_MAP_ID (nla_put_u32)
			SK_DIAG_BPF_STORAGE_MAP_VALUE (nla_reserve_64bit)
		......

4. Unlike other INET_DIAG info of a sk which is pretty static, the size
   required to dump the bpf_sk_storage(s) of a sk is dynamic as the
   system adding more bpf_sk_storage_map.  It is hard to set a static
   min_dump_alloc size.

   Hence, this series learns it at the runtime and adjust the
   cb->min_dump_alloc as it iterates all sk(s) of a system.  The
   "unsigned int *res_diag_size" in bpf_sk_storage_diag_put()
   is for this purpose.

   The next patch will update the cb->min_dump_alloc as it
   iterates the sk(s).

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20200225230421.1975729-1-kafai@fb.com
2020-02-27 18:50:19 -08:00
Martin KaFai Lau
0df6d32842 inet_diag: Move the INET_DIAG_REQ_BYTECODE nlattr to cb->data
The INET_DIAG_REQ_BYTECODE nlattr is currently re-found every time when
the "dump()" is re-started.

In a latter patch, it will also need to parse the new
INET_DIAG_REQ_SK_BPF_STORAGES nlattr to learn the map_fds. Thus, this
patch takes this chance to store the parsed nlattr in cb->data
during the "start" time of a dump.

By doing this, the "bc" argument also becomes unnecessary
and is removed.  Also, the two copies of the INET_DIAG_REQ_BYTECODE
parsing-audit logic between compat/current version can be
consolidated to one.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20200225230415.1975555-1-kafai@fb.com
2020-02-27 18:50:19 -08:00
Gustavo A. R. Silva
d7f10df862 bpf: Replace zero-length array with flexible-array member
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:

struct foo {
        int stuff;
        struct boo array[];
};

By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.

Also, notice that, dynamic memory allocations won't be affected by
this change:

"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]

This issue was found with the help of Coccinelle.

[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 7649773293 ("cxgb3/l2t: Fix undefined behaviour")

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20200227001744.GA3317@embeddedor
2020-02-28 01:21:02 +01:00
Christian Borntraeger
13da9ae1cd KVM: s390: protvirt: introduce and enable KVM_CAP_S390_PROTECTED
Now that everything is in place, we can announce the feature.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
2020-02-27 19:47:13 +01:00
Janosch Frank
e0d2773d48 KVM: s390: protvirt: UV calls in support of diag308 0, 1
diag 308 subcode 0 and 1 require several KVM and Ultravisor interactions.
Specific to these "soft" reboots are

* The "unshare all" UVC
* The "prepare for reset" UVC

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
[borntraeger@de.ibm.com: patch merging, splitting, fixing]
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2020-02-27 19:47:12 +01:00
Janosch Frank
19e1227768 KVM: S390: protvirt: Introduce instruction data area bounce buffer
Now that we can't access guest memory anymore, we have a dedicated
satellite block that's a bounce buffer for instruction data.

We re-use the memop interface to copy the instruction data to / from
userspace. This lets us re-use a lot of QEMU code which used that
interface to make logical guest memory accesses which are not possible
anymore in protected mode anyway.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
[borntraeger@de.ibm.com: patch merging, splitting, fixing]
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2020-02-27 19:47:11 +01:00
Janosch Frank
29b40f105e KVM: s390: protvirt: Add initial vm and cpu lifecycle handling
This contains 3 main changes:
1. changes in SIE control block handling for secure guests
2. helper functions for create/destroy/unpack secure guests
3. KVM_S390_PV_COMMAND ioctl to allow userspace dealing with secure
machines

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
[borntraeger@de.ibm.com: patch merging, splitting, fixing]
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2020-02-27 19:47:11 +01:00
Jiri Pirko
742b8cceaa drop_monitor: extend by passing cookie from driver
If driver passed along the cookie, push it through Netlink.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-25 11:05:54 -08:00
Jiri Pirko
85b0589ede devlink: add trap metadata type for cookie
Allow driver to indicate cookie metadata for registered traps.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-25 11:05:54 -08:00
David S. Miller
3b3e808cd8 A new set of changes:
* lots of small documentation fixes, from Jérôme Pouiller
  * beacon protection (BIGTK) support from Jouni Malinen
  * some initial code for TID configuration, from Tamizh chelvam
  * I reverted some new API before it's actually used, because
    it's wrong to mix controlled port and preauth
  * a few other cleanups/fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEH1e1rEeCd0AIMq6MB8qZga/fl8QFAl5UFlsACgkQB8qZga/f
 l8SwNQ//Y/dELGODumDxY03l/Bj/+XL7HYl3Yn+J8mt2mYPC3zjTjvcRJBApiAog
 1Oyd75fd+EqjxDUT+ngN8uJLQ/yCVsdwfpZhwV7VanAOuLI9aYkIXnai/Qs96rj3
 yDIlrBVsmfsaxf/2e9UmsLmeUSm7C5s1EzaAIwoRGvcUH0pbH9WYAXF1QV+8fmXa
 yoXuHV5Bv+wOW2xWqWJsFpoV109AW24pwJm0vlILcpFP/jno2GsRvwEpnC/GJhEA
 4+jfEj0KlFkOewp0/HcqrUJp4yDEBhnhTTYgDL3hSWgKRVorKqY4/QmpKQCmpVQk
 Qrb6k+TrnLmKBQKdqfd+PKAEC9U/9Wjg0KLPyc9btBGFNSUG3QoDigzxxIvSlW6w
 2vyanDW6780FTIi8sA7sq1cBLosIyoFG44YYwbMidVtxhBk1LMRvetNAOnAJeycp
 Abbp/A2EdvzM+ZMNMRwWlsgig6WkGY7jy/zpcmQUdALM+yT2o7D5ZwxE5pa2ggds
 jf0eER1vVCEpOL70swNSZbAnDNCzBzTN64GX9gQIjdVT+nMUYQXwuOgEmho1FshD
 bgZz4PcaOCCSTice9GYCC3C9OqXBsE2DyBwytYYzahyDiQH13Iz6wEq/+tIIjzCN
 KKRdD12TXaPobLwv5zI3kogJ1I4P1fa6RZYpkngLpMbQrQ7qiDI=
 =x0rf
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-next-for-net-next-2020-02-24' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next

Johannes Berg says:

====================
A new set of changes:
 * lots of small documentation fixes, from Jérôme Pouiller
 * beacon protection (BIGTK) support from Jouni Malinen
 * some initial code for TID configuration, from Tamizh chelvam
 * I reverted some new API before it's actually used, because
   it's wrong to mix controlled port and preauth
 * a few other cleanups/fixes
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-24 15:41:54 -08:00
Martin Varghese
4b5f67232d net: Special handling for IP & MPLS.
Special handling is needed in bareudp module for IP & MPLS as they
support more than one ethertypes.

MPLS has 2 ethertypes. 0x8847 for MPLS unicast and 0x8848 for MPLS multicast.
While decapsulating MPLS packet from UDP packet the tunnel destination IP
address is checked to determine the ethertype. The ethertype of the packet
will be set to 0x8848 if the  tunnel destination IP address is a multicast
IP address. The ethertype of the packet will be set to 0x8847 if the
tunnel destination IP address is a unicast IP address.

IP has 2 ethertypes.0x0800 for IPV4 and 0x86dd for IPv6. The version
field of the IP header tunnelled will be checked to determine the ethertype.

This special handling to tunnel additional ethertypes will be disabled
by default and can be enabled using a flag called multiproto. This flag can
be used only with ethertypes 0x8847 and 0x0800.

Signed-off-by: Martin Varghese <martin.varghese@nokia.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-24 13:31:42 -08:00
Martin Varghese
571912c69f net: UDP tunnel encapsulation module for tunnelling different protocols like MPLS, IP, NSH etc.
The Bareudp tunnel module provides a generic L3 encapsulation
tunnelling module for tunnelling different protocols like MPLS,
IP,NSH etc inside a UDP tunnel.

Signed-off-by: Martin Varghese <martin.varghese@nokia.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-24 13:31:42 -08:00
Eugen Hristev
4e52889f48 media: atmel: atmel-isc-base: expose white balance as v4l2 controls
This exposes the white balance configuration of the ISC as v4l2 controls
into userspace.
There are 8 controls available:
4 gain controls, sliders, for each of the BAYER components: R, B, GR, GB.
These gains are multipliers for each component, in format unsigned 0:4:9
with a default value of 512 (1.0 multiplier).
4 offset controls, sliders, for each of the BAYER components: R, B, GR, GB.
These offsets are added/substracted from each component, in format signed
1:12:0 with a default value of 0 (+/- 0)

To expose this to userspace, added 8 custom controls, in an auto cluster.

To summarize the functionality:
The auto cluster switch is the auto white balance control, and it works
like this:
AWB == 1: autowhitebalance is on, the do_white_balance button is inactive,
the gains/offsets are inactive, but volatile and readable.
Thus, the results of the whitebalance algorithm are available to userspace
to read at any time.
AWB == 0: autowhitebalance is off, cluster is in manual mode, user can
configure the gain/offsets directly. More than that, if the
do_white_balance button is pressed, the driver will perform
one-time-adjustment, (preferably with color checker card) and the userspace
can read again the new values.

With this feature, the userspace can save the coefficients and reinstall
them for example after reboot or reprobing the driver.

[hverkuil: fix checkpatch warning]
[hverkuil: minor spacing adjustments in the functionality description]

Signed-off-by: Eugen Hristev <eugen.hristev@microchip.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2020-02-24 16:12:28 +01:00
Tamizh chelvam
04f7d142f5 nl80211: Add support to configure TID specific RTSCTS configuration
This patch adds support to configure per TID RTSCTS control
configuration to enable/disable through the
NL80211_TID_CONFIG_ATTR_RTSCTS_CTRL attribute.

Signed-off-by: Tamizh chelvam <tamizhr@codeaurora.org>
Link: https://lore.kernel.org/r/1579506687-18296-5-git-send-email-tamizhr@codeaurora.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-02-24 13:56:57 +01:00
Tamizh chelvam
ade274b23e nl80211: Add support to configure TID specific AMPDU configuration
This patch adds support to configure per TID AMPDU control
configuration to enable/disable aggregation through the
NL80211_TID_CONFIG_ATTR_AMPDU_CTRL attribute.

Signed-off-by: Tamizh chelvam <tamizhr@codeaurora.org>
Link: https://lore.kernel.org/r/1579506687-18296-4-git-send-email-tamizhr@codeaurora.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-02-24 13:56:49 +01:00
Tamizh chelvam
6a21d16c4d nl80211: Add support to configure TID specific retry configuration
This patch adds support to configure per TID retry configuration
through the NL80211_TID_CONFIG_ATTR_RETRY_SHORT and
NL80211_TID_CONFIG_ATTR_RETRY_LONG attributes. This TID specific
retry configuration will have more precedence than phy level
configuration.

Signed-off-by: Tamizh chelvam <tamizhr@codeaurora.org>
Link: https://lore.kernel.org/r/1579506687-18296-3-git-send-email-tamizhr@codeaurora.org
[rebase completely on top of my previous API changes]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-02-24 13:48:54 +01:00
Johannes Berg
3710a8a628 nl80211: modify TID-config API
Make some changes to the TID-config API:
 * use u16 in nl80211 (only, and restrict to using 8 bits for now),
   to avoid issues in the future if we ever want to use higher TIDs.
 * reject empty TIDs mask (via netlink policy)
 * change feature advertising to not use extended feature flags but
   have own mechanism for this, which simplifies the code
 * fix all variable names from 'tid' to 'tids' since it's a mask
 * change to cfg80211_ name prefixes, not ieee80211_
 * fix some minor docs/spelling things.

Change-Id: Ia234d464b3f914cdeab82f540e018855be580dce
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-02-24 12:01:10 +01:00
Tamizh chelvam
77f576deaa nl80211: Add NL command to support TID speicific configurations
Add the new NL80211_CMD_SET_TID_CONFIG command to support
data TID specific configuration. Per TID configuration is
passed in the nested NL80211_ATTR_TID_CONFIG attribute.

This patch adds support to configure per TID noack policy
through the NL80211_TID_CONFIG_ATTR_NOACK attribute.

Signed-off-by: Tamizh chelvam <tamizhr@codeaurora.org>
Link: https://lore.kernel.org/r/1579506687-18296-2-git-send-email-tamizhr@codeaurora.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-02-24 11:15:25 +01:00
Jouni Malinen
56be393fa8 cfg80211: Support key configuration for Beacon protection (BIGTK)
IEEE P802.11-REVmd/D3.0 adds support for protecting Beacon frames using
a new set of keys (BIGTK; key index 6..7) similarly to the way
group-addressed Robust Management frames are protected (IGTK; key index
4..5). Extend cfg80211 and nl80211 to allow the new BIGTK to be
configured. Add an extended feature flag to indicate driver support for
the new key index values to avoid array overflows in driver
implementations and also to indicate to user space when this
functionality is available.

Signed-off-by: Jouni Malinen <jouni@codeaurora.org>
Link: https://lore.kernel.org/r/20200222132548.20835-2-jouni@codeaurora.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-02-24 10:35:48 +01:00
Johannes Berg
8d74a623cc Revert "nl80211: add src and dst addr attributes for control port tx/rx"
This reverts commit 8c3ed7aa2b.

As Jouni points out, there's really no need for this, since the
RSN pre-authentication frames are normal data frames, not port
control frames (locally).

We can still revert this now since it hasn't actually gone beyond
-next.

Fixes: 8c3ed7aa2b ("nl80211: add src and dst addr attributes for control port tx/rx")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Link: https://lore.kernel.org/r/20200224101910.b746e263287a.I9eb15d6895515179d50964dec3550c9dc784bb93@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-02-24 10:22:02 +01:00
David S. Miller
b105e8e281 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2020-02-21

The following pull-request contains BPF updates for your *net-next* tree.

We've added 25 non-merge commits during the last 4 day(s) which contain
a total of 33 files changed, 2433 insertions(+), 161 deletions(-).

The main changes are:

1) Allow for adding TCP listen sockets into sock_map/hash so they can be used
   with reuseport BPF programs, from Jakub Sitnicki.

2) Add a new bpf_program__set_attach_target() helper for adding libbpf support
   to specify the tracepoint/function dynamically, from Eelco Chaudron.

3) Add bpf_read_branch_records() BPF helper which helps use cases like profile
   guided optimizations, from Daniel Xu.

4) Enable bpf_perf_event_read_value() in all tracing programs, from Song Liu.

5) Relax BTF mandatory check if only used for libbpf itself e.g. to process
   BTF defined maps, from Andrii Nakryiko.

6) Move BPF selftests -mcpu compilation attribute from 'probe' to 'v3' as it has
   been observed that former fails in envs with low memlock, from Yonghong Song.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-21 15:22:45 -08:00
David S. Miller
e65ee2fb54 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Conflict resolution of ice_virtchnl_pf.c based upon work by
Stephen Rothwell.

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-21 13:39:34 -08:00
Linus Torvalds
cee853e825 USB fixes for 5.6-rc3
Here are a number of small USB driver fixes for 5.6-rc3.
 
 Included in here are:
   - MAINTAINER file updates
   - USB gadget driver fixes
   - usb core quirk additions and fixes for regressions
   - xhci driver fixes
   - usb serial driver id additions and fixes
   - thunderbolt bugfix
 
 Thunderbolt patches come in through here now that USB4 is really
 thunderbolt.
 
 All of these have been in linux-next for a while with no reported
 issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXk+/zw8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ynQywCeOlklNLcTKw/JkbsbtMlY/yTl4W8AoLLoEgSy
 GyUV8E67wZNdXLKH+n14
 =o97t
 -----END PGP SIGNATURE-----

Merge tag 'usb-5.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB/Thunderbolt fixes from Greg KH:
 "Here are a number of small USB driver fixes for 5.6-rc3.

  Included in here are:
  - MAINTAINER file updates
  - USB gadget driver fixes
  - usb core quirk additions and fixes for regressions
  - xhci driver fixes
  - usb serial driver id additions and fixes
  - thunderbolt bugfix

  Thunderbolt patches come in through here now that USB4 is really
  thunderbolt.

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'usb-5.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (34 commits)
  USB: misc: iowarrior: add support for the 100 device
  thunderbolt: Prevent crash if non-active NVMem file is read
  usb: gadget: udc-xilinx: Fix xudc_stop() kernel-doc format
  USB: misc: iowarrior: add support for the 28 and 28L devices
  USB: misc: iowarrior: add support for 2 OEMed devices
  USB: Fix novation SourceControl XL after suspend
  xhci: Fix memory leak when caching protocol extended capability PSI tables - take 2
  Revert "xhci: Fix memory leak when caching protocol extended capability PSI tables"
  MAINTAINERS: Sort entries in database for THUNDERBOLT
  usb: dwc3: debug: fix string position formatting mixup with ret and len
  usb: gadget: serial: fix Tx stall after buffer overflow
  usb: gadget: ffs: ffs_aio_cancel(): Save/restore IRQ flags
  usb: dwc2: Fix SET/CLEAR_FEATURE and GET_STATUS flows
  usb: dwc2: Fix in ISOC request length checking
  usb: gadget: composite: Support more than 500mA MaxPower
  usb: gadget: composite: Fix bMaxPower for SuperSpeedPlus
  usb: gadget: u_audio: Fix high-speed max packet size
  usb: dwc3: gadget: Check for IOC/LST bit in TRB->ctrl fields
  USB: core: clean up endpoint-descriptor parsing
  USB: quirks: blacklist duplicate ep on Sound Devices USBPre2
  ...
2020-02-21 12:44:53 -08:00
Linus Torvalds
3dc55dba67 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from David Miller:

 1) Limit xt_hashlimit hash table size to avoid OOM or hung tasks, from
    Cong Wang.

 2) Fix deadlock in xsk by publishing global consumer pointers when NAPI
    is finished, from Magnus Karlsson.

 3) Set table field properly to RT_TABLE_COMPAT when necessary, from
    Jethro Beekman.

 4) NLA_STRING attributes are not necessary NULL terminated, deal wiht
    that in IFLA_ALT_IFNAME. From Eric Dumazet.

 5) Fix checksum handling in atlantic driver, from Dmitry Bezrukov.

 6) Handle mtu==0 devices properly in wireguard, from Jason A.
    Donenfeld.

 7) Fix several lockdep warnings in bonding, from Taehee Yoo.

 8) Fix cls_flower port blocking, from Jason Baron.

 9) Sanitize internal map names in libbpf, from Toke Høiland-Jørgensen.

10) Fix RDMA race in qede driver, from Michal Kalderon.

11) Fix several false lockdep warnings by adding conditions to
    list_for_each_entry_rcu(), from Madhuparna Bhowmik.

12) Fix sleep in atomic in mlx5 driver, from Huy Nguyen.

13) Fix potential deadlock in bpf_map_do_batch(), from Yonghong Song.

14) Hey, variables declared in switch statement before any case
    statements are not initialized. I learn something every day. Get
    rids of this stuff in several parts of the networking, from Kees
    Cook.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (99 commits)
  bnxt_en: Issue PCIe FLR in kdump kernel to cleanup pending DMAs.
  bnxt_en: Improve device shutdown method.
  net: netlink: cap max groups which will be considered in netlink_bind()
  net: thunderx: workaround BGX TX Underflow issue
  ionic: fix fw_status read
  net: disable BRIDGE_NETFILTER by default
  net: macb: Properly handle phylink on at91rm9200
  s390/qeth: fix off-by-one in RX copybreak check
  s390/qeth: don't warn for napi with 0 budget
  s390/qeth: vnicc Fix EOPNOTSUPP precedence
  openvswitch: Distribute switch variables for initialization
  net: ip6_gre: Distribute switch variables for initialization
  net: core: Distribute switch variables for initialization
  udp: rehash on disconnect
  net/tls: Fix to avoid gettig invalid tls record
  bpf: Fix a potential deadlock with bpf_map_do_batch
  bpf: Do not grab the bucket spinlock by default on htab batch ops
  ice: Wait for VF to be reset/ready before configuration
  ice: Don't tell the OS that link is going down
  ice: Don't reject odd values of usecs set by user
  ...
2020-02-21 11:59:51 -08:00
Christian Borntraeger
467d12f5c7 include/uapi/linux/swab.h: fix userspace breakage, use __BITS_PER_LONG for swap
QEMU has a funny new build error message when I use the upstream kernel
headers:

      CC      block/file-posix.o
    In file included from /home/cborntra/REPOS/qemu/include/qemu/timer.h:4,
                     from /home/cborntra/REPOS/qemu/include/qemu/timed-average.h:29,
                     from /home/cborntra/REPOS/qemu/include/block/accounting.h:28,
                     from /home/cborntra/REPOS/qemu/include/block/block_int.h:27,
                     from /home/cborntra/REPOS/qemu/block/file-posix.c:30:
    /usr/include/linux/swab.h: In function `__swab':
    /home/cborntra/REPOS/qemu/include/qemu/bitops.h:20:34: error: "sizeof" is not defined, evaluates to 0 [-Werror=undef]
       20 | #define BITS_PER_LONG           (sizeof (unsigned long) * BITS_PER_BYTE)
          |                                  ^~~~~~
    /home/cborntra/REPOS/qemu/include/qemu/bitops.h:20:41: error: missing binary operator before token "("
       20 | #define BITS_PER_LONG           (sizeof (unsigned long) * BITS_PER_BYTE)
          |                                         ^
    cc1: all warnings being treated as errors
    make: *** [/home/cborntra/REPOS/qemu/rules.mak:69: block/file-posix.o] Error 1
    rm tests/qemu-iotests/socket_scm_helper.o

This was triggered by commit d5767057c9 ("uapi: rename ext2_swab() to
swab() and share globally in swab.h").  That patch is doing

  #include <asm/bitsperlong.h>

but it uses BITS_PER_LONG.

The kernel file asm/bitsperlong.h provide only __BITS_PER_LONG.

Let us use the __ variant in swap.h

Link: http://lkml.kernel.org/r/20200213142147.17604-1-borntraeger@de.ibm.com
Fixes: d5767057c9 ("uapi: rename ext2_swab() to swab() and share globally in swab.h")
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Yury Norov <yury.norov@gmail.com>
Cc: Allison Randal <allison@lohutok.net>
Cc: Joe Perches <joe@perches.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: William Breathitt Gray <vilhelm.gray@gmail.com>
Cc: Torsten Hilbrich <torsten.hilbrich@secunet.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-02-21 11:22:15 -08:00
Arnd Bergmann
c766d1472c y2038: hide timeval/timespec/itimerval/itimerspec types
There are no in-kernel users remaining, but there may still be users that
include linux/time.h instead of sys/time.h from user space, so leave the
types available to user space while hiding them from kernel space.

Only the __kernel_old_* versions of these types remain now.

Link: http://lkml.kernel.org/r/20200110154232.4104492-4-arnd@arndb.de
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Deepa Dinamani <deepa.kernel@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-02-21 11:22:15 -08:00
Alexandru Gagniuc
202853595e PCI: pciehp: Disable in-band presence detect when possible
The presence detect state (PDS) is normally a logical OR of in-band and
out-of-band (OOB) presence detect.  As of PCIe 4.0, there is the option to
disable in-band presence so that the PDS bit always reflects the state of
the out-of-band presence.

The recommendation of the PCIe spec is to disable in-band presence whenever
supported (PCIe r5.0, appendix I implementation note):

  Due to architectural issues, the in-band (Physical-Layer-based) portion
  of the PD mechanism is deprecated for use with async hot-plug. One issue
  is that in-band PD as architected does not detect adapter removal during
  certain LTSSM states, notably the L1 and Disabled States.  Another issue
  is that when both in-band and OOB PD are being used together, the
  Presence Detect State bit and its associated interrupt mechanism always
  reflect the logical OR of the inband and OOB PD states, and with some
  hot-plug hardware configurations, it is important for software to detect
  and respond to in-band and OOB PD events independently.  If OOB PD is
  being used and the associated DSP supports In-Band PD Disable, it is
  recommended that the In-Band PD Disable bit be Set, and the Presence
  Detect State bit and its associated interrupt mechanism be used
  exclusively for OOB PD.  As a substitute for in-band PD with async
  hot-plug, the reference model uses either the DPC or the DLL Link Active
  mechanism.

Link: https://lore.kernel.org/r/20191025190047.38130-2-stuart.w.hayes@gmail.com
[bhelgaas: move PCI_EXP_SLTCAP2 read earlier & print PCI_EXP_SLTCAP2_IBPD
value (suggested by Lukas)]
Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
2020-02-20 22:44:30 -06:00
David S. Miller
41f57cfde1 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Alexei Starovoitov says:

====================
pull-request: bpf 2020-02-19

The following pull-request contains BPF updates for your *net* tree.

We've added 10 non-merge commits during the last 10 day(s) which contain
a total of 10 files changed, 93 insertions(+), 31 deletions(-).

The main changes are:

1) batched bpf hashtab fixes from Brian and Yonghong.

2) various selftests and libbpf fixes.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-19 16:42:35 -08:00
Daniel Xu
fff7b64355 bpf: Add bpf_read_branch_records() helper
Branch records are a CPU feature that can be configured to record
certain branches that are taken during code execution. This data is
particularly interesting for profile guided optimizations. perf has had
branch record support for a while but the data collection can be a bit
coarse grained.

We (Facebook) have seen in experiments that associating metadata with
branch records can improve results (after postprocessing). We generally
use bpf_probe_read_*() to get metadata out of userspace. That's why bpf
support for branch records is useful.

Aside from this particular use case, having branch data available to bpf
progs can be useful to get stack traces out of userspace applications
that omit frame pointers.

Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200218030432.4600-2-dxu@dxuuu.xyz
2020-02-19 14:37:36 -08:00
Aya Levin
f623e59705 ethtool: Add support for low latency RS FEC
Add support for low latency Reed Solomon FEC as LLRS.

The LL-FEC is defined by the 25G/50G ethernet consortium,
in the document titled "Low Latency Reed Solomon Forward Error Correction"

Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
CC: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
2020-02-18 19:17:31 -08:00
David S. Miller
7c8c1697c7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
Netfilter fixes for net

This batch contains Netfilter fixes for net:

1) Restrict hashlimit size to 1048576, from Cong Wang.

2) Check for offload flags from nf_flow_table_offload_setup(),
   this fixes a crash in case the hardware offload is disabled.
   From Florian Westphal.

3) Three preparation patches to extend the conntrack clash resolution,
   from Florian.

4) Extend clash resolution to deal with DNS packets from the same flow
   racing to set up the NAT configuration.

5) Small documentation fix in pipapo, from Stefano Brivio.

6) Remove misleading unlikely() from pipapo_refill(), also from Stefano.

7) Reduce hashlimit mutex scope, from Cong Wang. This patch is actually
   triggering another problem, still under discussion, another patch to
   fix this will follow up.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-18 15:44:13 -08:00
Toke Høiland-Jørgensen
f25975f42f bpf, uapi: Remove text about bpf_redirect_map() giving higher performance
The performance of bpf_redirect() is now roughly the same as that of
bpf_redirect_map(). However, David Ahern pointed out that the header file
has not been updated to reflect this, and still says that a significant
performance increase is possible when using bpf_redirect_map(). Remove this
text from the bpf_redirect_map() description, and reword the description in
bpf_redirect() slightly. Also fix the 'Return' section of the
bpf_redirect_map() documentation.

Fixes: 1d233886dd ("xdp: Use bulking for non-map XDP_REDIRECT and consolidate code paths")
Reported-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20200218130334.29889-1-toke@redhat.com
2020-02-18 15:31:31 +01:00
Florian Westphal
6a757c07e5 netfilter: conntrack: allow insertion of clashing entries
This patch further relaxes the need to drop an skb due to a clash with
an existing conntrack entry.

Current clash resolution handles the case where the clash occurs between
two identical entries (distinct nf_conn objects with same tuples), i.e.:

                    Original                        Reply
existing: 10.2.3.4:42 -> 10.8.8.8:53      10.2.3.4:42 <- 10.0.0.6:5353
clashing: 10.2.3.4:42 -> 10.8.8.8:53      10.2.3.4:42 <- 10.0.0.6:5353

... existing handling will discard the unconfirmed clashing entry and
makes skb->_nfct point to the existing one.  The skb can then be
processed normally just as if the clash would not have existed in the
first place.

For other clashes, the skb needs to be dropped.
This frequently happens with DNS resolvers that send A and AAAA queries
back-to-back when NAT rules are present that cause packets to get
different DNAT transformations applied, for example:

-m statistics --mode random ... -j DNAT --dnat-to 10.0.0.6:5353
-m statistics --mode random ... -j DNAT --dnat-to 10.0.0.7:5353

In this case the A or AAAA query is dropped which incurs a costly
delay during name resolution.

This patch also allows this collision type:
                       Original                   Reply
existing: 10.2.3.4:42 -> 10.8.8.8:53      10.2.3.4:42 <- 10.0.0.6:5353
clashing: 10.2.3.4:42 -> 10.8.8.8:53      10.2.3.4:42 <- 10.0.0.7:5353

In this case, clash is in original direction -- the reply direction
is still unique.

The change makes it so that when the 2nd colliding packet is received,
the clashing conntrack is tagged with new IPS_NAT_CLASH_BIT, gets a fixed
1 second timeout and is inserted in the reply direction only.

The entry is hidden from 'conntrack -L', it will time out quickly
and it can be early dropped because it will never progress to the
ASSURED state.

To avoid special-casing the delete code path to special case
the ORIGINAL hlist_nulls node, a new helper, "hlist_nulls_add_fake", is
added so hlist_nulls_del() will work.

Example:

      CPU A:                               CPU B:
1.  10.2.3.4:42 -> 10.8.8.8:53 (A)
2.                                         10.2.3.4:42 -> 10.8.8.8:53 (AAAA)
3.  Apply DNAT, reply changed to 10.0.0.6
4.                                         10.2.3.4:42 -> 10.8.8.8:53 (AAAA)
5.                                         Apply DNAT, reply changed to 10.0.0.7
6. confirm/commit to conntrack table, no collisions
7.                                         commit clashing entry

Reply comes in:

10.2.3.4:42 <- 10.0.0.6:5353 (A)
 -> Finds a conntrack, DNAT is reversed & packet forwarded to 10.2.3.4:42
10.2.3.4:42 <- 10.0.0.7:5353 (AAAA)
 -> Finds a conntrack, DNAT is reversed & packet forwarded to 10.2.3.4:42
    The conntrack entry is deleted from table, as it has the NAT_CLASH
    bit set.

In case of a retransmit from ORIGINAL dir, all further packets will get
the DNAT transformation to 10.0.0.6.

I tried to come up with other solutions but they all have worse
problems.

Alternatives considered were:
1.  Confirm ct entries at allocation time, not in postrouting.
 a. will cause uneccesarry work when the skb that creates the
    conntrack is dropped by ruleset.
 b. in case nat is applied, ct entry would need to be moved in
    the table, which requires another spinlock pair to be taken.
 c. breaks the 'unconfirmed entry is private to cpu' assumption:
    we would need to guard all nfct->ext allocation requests with
    ct->lock spinlock.

2. Make the unconfirmed list a hash table instead of a pcpu list.
   Shares drawback c) of the first alternative.

3. Document this is expected and force users to rearrange their
   ruleset (e.g. by using "-m cluster" instead of "-m statistics").
   nft has the 'jhash' expression which can be used instead of 'numgen'.

   Major drawback: doesn't fix what I consider a bug, not very realistic
   and I believe its reasonable to have the existing rulesets to 'just
   work'.

4. Document this is expected and force users to steer problematic
   packets to the same CPU -- this would serialize the "allocate new
   conntrack entry/nat table evaluation/perform nat/confirm entry", so
   no race can occur.  Similar drawback to 3.

Another advantage of this patch compared to 1) and 2) is that there are
no changes to the hot path; things are handled in the udp tracker and
the clash resolution path.

Cc: rcu@vger.kernel.org
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-02-17 10:55:14 +01:00
Matteo Croce
744676e777 openvswitch: add TTL decrement action
New action to decrement TTL instead of setting it to a fixed value.
This action will decrement the TTL and, in case of expired TTL, drop it
or execute an action passed via a nested attribute.
The default TTL expired action is to drop the packet.

Supports both IPv4 and IPv6 via the ttl and hop_limit fields, respectively.

Tested with a corresponding change in the userspace:

    # ovs-dpctl dump-flows
    in_port(2),eth(),eth_type(0x0800), packets:0, bytes:0, used:never, actions:dec_ttl{ttl<=1 action:(drop)},1
    in_port(1),eth(),eth_type(0x0800), packets:0, bytes:0, used:never, actions:dec_ttl{ttl<=1 action:(drop)},2
    in_port(1),eth(),eth_type(0x0806), packets:0, bytes:0, used:never, actions:2
    in_port(2),eth(),eth_type(0x0806), packets:0, bytes:0, used:never, actions:1

    # ping -c1 192.168.0.2 -t 42
    IP (tos 0x0, ttl 41, id 61647, offset 0, flags [DF], proto ICMP (1), length 84)
        192.168.0.1 > 192.168.0.2: ICMP echo request, id 386, seq 1, length 64
    # ping -c1 192.168.0.2 -t 120
    IP (tos 0x0, ttl 119, id 62070, offset 0, flags [DF], proto ICMP (1), length 84)
        192.168.0.1 > 192.168.0.2: ICMP echo request, id 388, seq 1, length 64
    # ping -c1 192.168.0.2 -t 1
    #

Co-developed-by: Bindiya Kurle <bindiyakurle@gmail.com>
Signed-off-by: Bindiya Kurle <bindiyakurle@gmail.com>
Signed-off-by: Matteo Croce <mcroce@redhat.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-16 19:34:44 -08:00
Arjun Roy
33946518d4 tcp-zerocopy: Return sk_err (if set) along with tcp receive zerocopy.
This patchset is intended to reduce the number of extra system calls
imposed by TCP receive zerocopy. For ping-pong RPC style workloads,
this patchset has demonstrated a system call reduction of about 30%
when coupled with userspace changes.

For applications using epoll, returning sk_err along with the result
of tcp receive zerocopy could remove the need to call
recvmsg()=-EAGAIN after a spurious wakeup.

Consider a multi-threaded application using epoll. A thread may awaken
with EPOLLIN but another thread may already be reading. The
spuriously-awoken thread does not necessarily know that another thread
'won'; rather, it may be possible that it was woken up due to the
presence of an error if there is no data. A zerocopy read receiving 0
bytes thus would need to be followed up by recvmsg to be sure.

Instead, we return sk_err directly with zerocopy, so the application
can avoid this extra system call.

Signed-off-by: Arjun Roy <arjunroy@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-16 19:25:02 -08:00
Arjun Roy
c8856c0514 tcp-zerocopy: Return inq along with tcp receive zerocopy.
This patchset is intended to reduce the number of extra system calls
imposed by TCP receive zerocopy. For ping-pong RPC style workloads,
this patchset has demonstrated a system call reduction of about 30%
when coupled with userspace changes.

For applications using edge-triggered epoll, returning inq along with
the result of tcp receive zerocopy could remove the need to call
recvmsg()=-EAGAIN after a successful zerocopy. Generally speaking,
since normally we would need to perform a recvmsg() call for every
successful small RPC read via TCP receive zerocopy, returning inq can
reduce the number of system calls performed by approximately half.

Signed-off-by: Arjun Roy <arjunroy@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-16 19:25:02 -08:00
David S. Miller
ddb535a6a0 A few big new things:
* 802.11 frame encapsulation offload support
  * more HE (802.11ax) support, including some for 6 GHz band
  * powersave in hwsim, for better testing
 
 Of course as usual there are various cleanups and small fixes.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEH1e1rEeCd0AIMq6MB8qZga/fl8QFAl5GggsACgkQB8qZga/f
 l8Qgdg//R42bSv94JYPcwZ5phgTgraCRZODWjBJq08n2T5m0EmEufgX79d9uEdgT
 u9npvn+ich5/VZhmuSbGrW9TT6/FPLAZyghV1fj79o971Qd7ky2Mp8G1fcTEbtDn
 IG2e9vauY9XDSb2O3wNj8dA8rAN/kLNmhsPqWxn2CgLPqjdbf+W15dvo4rnaL2gs
 ffGyE47dHuAFwCruyT8UPbw3iu4+tQhruN9eVg+UkU8rJGvEMqfrLK20zl1weIV9
 a7IuXdxacdsHO8Y+tl6GtvgOURQPpvf55+leLOUhcmHPJ3f/eAal6wmWRxDxs/qB
 IWSe8BC81cZZ5pYWk1A+0sXfJMlYjNsN0xw5SQRSrbgyb5saz8aLUIlHsOBM4iPH
 SwzCMN5A1GOPOUFsugzPwbiki9g6dh0/EC2NyXE4A26CAd967dVXTvTY5SMNgiB+
 bZaaUDaPQUm1jgDT5bLRhTipTHbekDkYzG/e+wNO+HKyStoEYM485MwY4MQCYzEh
 HKDmkAbFuCwEUeXXw1y8GybUknApCRru9FtY+oiN/+y/aESfB7HJfmDFFU/KYgPu
 HOuqJoNAxdMdycDCb24/cLjUiehzfM6sujwBxZOD5WHhAcXrBo5dGd6ibfurIrjj
 XI36/mwTiMtyyb0/5xM1AKvoic2j+a5YU3MB7KSc9TlaPa5j2NA=
 =CgmJ
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-next-for-net-next-2020-02-14' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next

Johannes Berg says:

====================
A few big new things:
 * 802.11 frame encapsulation offload support
 * more HE (802.11ax) support, including some for 6 GHz band
 * powersave in hwsim, for better testing

Of course as usual there are various cleanups and small fixes.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-16 19:00:22 -08:00
Christian Brauner
ef2c41cf38 clone3: allow spawning processes into cgroups
This adds support for creating a process in a different cgroup than its
parent. Callers can limit and account processes and threads right from
the moment they are spawned:
- A service manager can directly spawn new services into dedicated
  cgroups.
- A process can be directly created in a frozen cgroup and will be
  frozen as well.
- The initial accounting jitter experienced by process supervisors and
  daemons is eliminated with this.
- Threaded applications or even thread implementations can choose to
  create a specific cgroup layout where each thread is spawned
  directly into a dedicated cgroup.

This feature is limited to the unified hierarchy. Callers need to pass
a directory file descriptor for the target cgroup. The caller can
choose to pass an O_PATH file descriptor. All usual migration
restrictions apply, i.e. there can be no processes in inner nodes. In
general, creating a process directly in a target cgroup adheres to all
migration restrictions.

One of the biggest advantages of this feature is that CLONE_INTO_GROUP does
not need to grab the write side of the cgroup cgroup_threadgroup_rwsem.
This global lock makes moving tasks/threads around super expensive. With
clone3() this lock is avoided.

Cc: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: cgroups@vger.kernel.org
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2020-02-12 17:57:51 -05:00
Bartosz Golaszewski
51c1064e82 gpiolib: add new ioctl() for monitoring changes in line info
Currently there is no way for user-space to be informed about changes
in status of GPIO lines e.g. when someone else requests the line or its
config changes. We can only periodically re-read the line-info. This
is fine for simple one-off user-space tools, but any daemon that provides
a centralized access to GPIO chips would benefit hugely from an event
driven line info synchronization.

This patch adds a new ioctl() that allows user-space processes to reuse
the file descriptor associated with the character device for watching
any changes in line properties. Every such event contains the updated
line information.

Currently the events are generated on three types of status changes: when
a line is requested, when it's released and when its config is changed.
The first two are self-explanatory. For the third one: this will only
happen when another user-space process calls the new SET_CONFIG ioctl()
as any changes that can happen from within the kernel (i.e.
set_transitory() or set_debounce()) are of no interest to user-space.

Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
2020-02-12 12:05:47 +01:00
Kan Liang
bbfd5e4fab perf/core: Add new branch sample type for HW index of raw branch records
The low level index is the index in the underlying hardware buffer of
the most recently captured taken branch which is always saved in
branch_entries[0]. It is very useful for reconstructing the call stack.
For example, in Intel LBR call stack mode, the depth of reconstructed
LBR call stack limits to the number of LBR registers. With the low level
index information, perf tool may stitch the stacks of two samples. The
reconstructed LBR call stack can break the HW limitation.

Add a new branch sample type to retrieve low level index of raw branch
records. The low level index is between -1 (unknown) and max depth which
can be retrieved in /sys/devices/cpu/caps/branches.

Only when the new branch sample type is set, the low level index
information is dumped into the PERF_SAMPLE_BRANCH_STACK output.
Perf tool should check the attr.branch_sample_type, and apply the
corresponding format for PERF_SAMPLE_BRANCH_STACK samples.
Otherwise, some user case may be broken. For example, users may parse a
perf.data, which include the new branch sample type, with an old version
perf tool (without the check). Users probably get incorrect information
without any warning.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20200127165355.27495-2-kan.liang@linux.intel.com
2020-02-11 13:23:49 +01:00
Peter Chen
ca4b43c14c usb: charger: assign specific number for enum value
To work properly on every architectures and compilers, the enum value
needs to be specific numbers.

Suggested-by: Greg KH <gregkh@linuxfoundation.org>
Signed-off-by: Peter Chen <peter.chen@nxp.com>
Link: https://lore.kernel.org/r/1580537624-10179-1-git-send-email-peter.chen@nxp.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-10 11:08:30 -08:00
Linus Torvalds
380a129eb2 fs: New zonefs file system
Zonefs is a very simple file system exposing each zone of a zoned block
 device as a file.
 
 Unlike a regular file system with native zoned block device support
 (e.g. f2fs or the on-going btrfs effort), zonefs does not hide the
 sequential write constraint of zoned block devices to the user. As a
 result, zonefs is not a POSIX compliant file system. Its goal is to
 simplify the implementation of zoned block devices support in
 applications by replacing raw block device file accesses with a richer
 file based API, avoiding relying on direct block device file ioctls
 which may be more obscure to developers.
 
 One example of this approach is the implementation of LSM
 (log-structured merge) tree structures (such as used in RocksDB and
 LevelDB) on zoned block devices by allowing SSTables to be stored in a
 zone file similarly to a regular file system rather than as a range of
 sectors of a zoned device. The introduction of the higher level
 construct "one file is one zone" can help reducing the amount of changes
 needed in the application while at the same time allowing the use of
 zoned block devices with various programming languages other than C.
 
 Zonefs IO management implementation uses the new iomap generic code.
 Zonefs has been successfully tested using a functional test suite
 (available with zonefs userland format tool on github) and a prototype
 implementation of LevelDB on top of zonefs.
 
 Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQSRPv8tYSvhwAzJdzjdoc3SxdoYdgUCXj1y8QAKCRDdoc3SxdoY
 dqozAP9J3t+Q95BgKgI5jP+XEtyYsPBTaVrvaSaViEnwtJLVoQD/ZQ1lTCZSE9OI
 UkvWawkuFtLGfOxTqyA3eZrZi22Ttwk=
 =YVvO
 -----END PGP SIGNATURE-----

Merge tag 'zonefs-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs

Pull new zonefs file system from Damien Le Moal:
 "Zonefs is a very simple file system exposing each zone of a zoned
  block device as a file.

  Unlike a regular file system with native zoned block device support
  (e.g. f2fs or the on-going btrfs effort), zonefs does not hide the
  sequential write constraint of zoned block devices to the user. As a
  result, zonefs is not a POSIX compliant file system. Its goal is to
  simplify the implementation of zoned block devices support in
  applications by replacing raw block device file accesses with a richer
  file based API, avoiding relying on direct block device file ioctls
  which may be more obscure to developers.

  One example of this approach is the implementation of LSM
  (log-structured merge) tree structures (such as used in RocksDB and
  LevelDB) on zoned block devices by allowing SSTables to be stored in a
  zone file similarly to a regular file system rather than as a range of
  sectors of a zoned device. The introduction of the higher level
  construct "one file is one zone" can help reducing the amount of
  changes needed in the application while at the same time allowing the
  use of zoned block devices with various programming languages other
  than C.

  Zonefs IO management implementation uses the new iomap generic code.
  Zonefs has been successfully tested using a functional test suite
  (available with zonefs userland format tool on github) and a prototype
  implementation of LevelDB on top of zonefs"

* tag 'zonefs-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs:
  zonefs: Add documentation
  fs: New zonefs file system
2020-02-09 15:51:46 -08:00
Markus Theil
8c3ed7aa2b nl80211: add src and dst addr attributes for control port tx/rx
When using control port over nl80211 in AP mode with
pre-authentication, APs need to forward frames to other
APs defined by their MAC address. Before this patch,
pre-auth frames reaching user space over nl80211 control
port  have no longer any information about the dest attached,
which can be used for forwarding to a controller or injecting
the frame back to a ethernet interface over a AF_PACKET
socket.
Analog problems exist, when forwarding pre-auth frames from
AP -> STA.

This patch therefore adds the NL80211_ATTR_DST_MAC and
NL80211_ATTR_SRC_MAC attributes to provide more context
information when forwarding.
The respective arguments are optional on tx and included on rx.
Therefore unaware existing software is not affected.

Software which wants to detect this feature, can do so
by checking against:
  NL80211_EXT_FEATURE_CONTROL_PORT_OVER_NL80211_MAC_ADDRS

Signed-off-by: Markus Theil <markus.theil@tu-ilmenau.de>
Link: https://lore.kernel.org/r/20200115125522.3755-1-markus.theil@tu-ilmenau.de
[split into separate cfg80211/mac80211 patches]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-02-07 12:58:37 +01:00
Veerendranath Jakkam
d6039a3416 cfg80211: Enhance the AKM advertizement to support per interface.
Commit ab4dfa2053 ("cfg80211: Allow drivers to advertise supported AKM
suites") introduces the support to advertize supported AKMs to userspace.

This needs an enhancement to advertize the AKM support per interface type,
specifically for the cfg80211-based drivers that implement SME and use
different mechanisms to support the AKM's for each interface type (e.g.,
the support for SAE, OWE AKM's take different paths for such drivers on
STA/AP mode).

This commit aims the same and enhances the earlier mechanism of advertizing
the AKMs per wiphy. Add new nl80211 attributes and data structure to
provide supported AKMs per interface type to userspace.

the AKMs advertized in akm_suites are default capabilities if not
advertized for a specific interface type in iftype_akm_suites.

Signed-off-by: Veerendranath Jakkam <vjakkam@codeaurora.org>
Link: https://lore.kernel.org/r/20200126203032.21934-1-vjakkam@codeaurora.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-02-07 12:34:26 +01:00
Haim Dreyfuss
1e61d82cca cfg80211: add no HE indication to the channel flag
The regulatory domain might forbid HE operation.  Certain regulatory
domains may restrict it for specific channels whereas others may do it
for the whole regulatory domain.

Add an option to indicate it in the channel flag.

Signed-off-by: Haim Dreyfuss <haim.dreyfuss@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Link: https://lore.kernel.org/r/20200121081213.733757-1-luca@coelho.fi
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-02-07 12:34:09 +01:00
Damien Le Moal
8dcc1a9d90 fs: New zonefs file system
zonefs is a very simple file system exposing each zone of a zoned block
device as a file. Unlike a regular file system with zoned block device
support (e.g. f2fs), zonefs does not hide the sequential write
constraint of zoned block devices to the user. Files representing
sequential write zones of the device must be written sequentially
starting from the end of the file (append only writes).

As such, zonefs is in essence closer to a raw block device access
interface than to a full featured POSIX file system. The goal of zonefs
is to simplify the implementation of zoned block device support in
applications by replacing raw block device file accesses with a richer
file API, avoiding relying on direct block device file ioctls which may
be more obscure to developers. One example of this approach is the
implementation of LSM (log-structured merge) tree structures (such as
used in RocksDB and LevelDB) on zoned block devices by allowing SSTables
to be stored in a zone file similarly to a regular file system rather
than as a range of sectors of a zoned device. The introduction of the
higher level construct "one file is one zone" can help reducing the
amount of changes needed in the application as well as introducing
support for different application programming languages.

Zonefs on-disk metadata is reduced to an immutable super block to
persistently store a magic number and optional feature flags and
values. On mount, zonefs uses blkdev_report_zones() to obtain the device
zone configuration and populates the mount point with a static file tree
solely based on this information. E.g. file sizes come from the device
zone type and write pointer offset managed by the device itself.

The zone files created on mount have the following characteristics.
1) Files representing zones of the same type are grouped together
   under a common sub-directory:
     * For conventional zones, the sub-directory "cnv" is used.
     * For sequential write zones, the sub-directory "seq" is used.
  These two directories are the only directories that exist in zonefs.
  Users cannot create other directories and cannot rename nor delete
  the "cnv" and "seq" sub-directories.
2) The name of zone files is the number of the file within the zone
   type sub-directory, in order of increasing zone start sector.
3) The size of conventional zone files is fixed to the device zone size.
   Conventional zone files cannot be truncated.
4) The size of sequential zone files represent the file's zone write
   pointer position relative to the zone start sector. Truncating these
   files is allowed only down to 0, in which case, the zone is reset to
   rewind the zone write pointer position to the start of the zone, or
   up to the zone size, in which case the file's zone is transitioned
   to the FULL state (finish zone operation).
5) All read and write operations to files are not allowed beyond the
   file zone size. Any access exceeding the zone size is failed with
   the -EFBIG error.
6) Creating, deleting, renaming or modifying any attribute of files and
   sub-directories is not allowed.
7) There are no restrictions on the type of read and write operations
   that can be issued to conventional zone files. Buffered, direct and
   mmap read & write operations are accepted. For sequential zone files,
   there are no restrictions on read operations, but all write
   operations must be direct IO append writes. mmap write of sequential
   files is not allowed.

Several optional features of zonefs can be enabled at format time.
* Conventional zone aggregation: ranges of contiguous conventional
  zones can be aggregated into a single larger file instead of the
  default one file per zone.
* File ownership: The owner UID and GID of zone files is by default 0
  (root) but can be changed to any valid UID/GID.
* File access permissions: the default 640 access permissions can be
  changed.

The mkzonefs tool is used to format zoned block devices for use with
zonefs. This tool is available on Github at:

git@github.com:damien-lemoal/zonefs-tools.git.

zonefs-tools also includes a test suite which can be run against any
zoned block device, including null_blk block device created with zoned
mode.

Example: the following formats a 15TB host-managed SMR HDD with 256 MB
zones with the conventional zones aggregation feature enabled.

$ sudo mkzonefs -o aggr_cnv /dev/sdX
$ sudo mount -t zonefs /dev/sdX /mnt
$ ls -l /mnt/
total 0
dr-xr-xr-x 2 root root     1 Nov 25 13:23 cnv
dr-xr-xr-x 2 root root 55356 Nov 25 13:23 seq

The size of the zone files sub-directories indicate the number of files
existing for each type of zones. In this example, there is only one
conventional zone file (all conventional zones are aggregated under a
single file).

$ ls -l /mnt/cnv
total 137101312
-rw-r----- 1 root root 140391743488 Nov 25 13:23 0

This aggregated conventional zone file can be used as a regular file.

$ sudo mkfs.ext4 /mnt/cnv/0
$ sudo mount -o loop /mnt/cnv/0 /data

The "seq" sub-directory grouping files for sequential write zones has
in this example 55356 zones.

$ ls -lv /mnt/seq
total 14511243264
-rw-r----- 1 root root 0 Nov 25 13:23 0
-rw-r----- 1 root root 0 Nov 25 13:23 1
-rw-r----- 1 root root 0 Nov 25 13:23 2
...
-rw-r----- 1 root root 0 Nov 25 13:23 55354
-rw-r----- 1 root root 0 Nov 25 13:23 55355

For sequential write zone files, the file size changes as data is
appended at the end of the file, similarly to any regular file system.

$ dd if=/dev/zero of=/mnt/seq/0 bs=4K count=1 conv=notrunc oflag=direct
1+0 records in
1+0 records out
4096 bytes (4.1 kB, 4.0 KiB) copied, 0.000452219 s, 9.1 MB/s

$ ls -l /mnt/seq/0
-rw-r----- 1 root root 4096 Nov 25 13:23 /mnt/seq/0

The written file can be truncated to the zone size, preventing any
further write operation.

$ truncate -s 268435456 /mnt/seq/0
$ ls -l /mnt/seq/0
-rw-r----- 1 root root 268435456 Nov 25 13:49 /mnt/seq/0

Truncation to 0 size allows freeing the file zone storage space and
restart append-writes to the file.

$ truncate -s 0 /mnt/seq/0
$ ls -l /mnt/seq/0
-rw-r----- 1 root root 0 Nov 25 13:49 /mnt/seq/0

Since files are statically mapped to zones on the disk, the number of
blocks of a file as reported by stat() and fstat() indicates the size
of the file zone.

$ stat /mnt/seq/0
  File: /mnt/seq/0
  Size: 0       Blocks: 524288     IO Block: 4096   regular empty file
Device: 870h/2160d      Inode: 50431       Links: 1
Access: (0640/-rw-r-----)  Uid: (    0/    root)   Gid: (    0/  root)
Access: 2019-11-25 13:23:57.048971997 +0900
Modify: 2019-11-25 13:52:25.553805765 +0900
Change: 2019-11-25 13:52:25.553805765 +0900
 Birth: -

The number of blocks of the file ("Blocks") in units of 512B blocks
gives the maximum file size of 524288 * 512 B = 256 MB, corresponding
to the device zone size in this example. Of note is that the "IO block"
field always indicates the minimum IO size for writes and corresponds
to the device physical sector size.

This code contains contributions from:
* Johannes Thumshirn <jthumshirn@suse.de>,
* Darrick J. Wong <darrick.wong@oracle.com>,
* Christoph Hellwig <hch@lst.de>,
* Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> and
* Ting Yao <tingyao@hust.edu.cn>.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2020-02-07 14:39:38 +09:00
Linus Torvalds
90568ecf56 s390:
* fix register corruption
 * ENOTSUPP/EOPNOTSUPP mixed
 * reset cleanups/fixes
 * selftests
 
 x86:
 * Bug fixes and cleanups
 * AMD support for APIC virtualization even in combination with
   in-kernel PIT or IOAPIC.
 
 MIPS:
 * Compilation fix.
 
 Generic:
 * Fix refcount overflow for zero page.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJeOuf7AAoJEL/70l94x66DOBQH/j1W9lUpbDgr9aWbrZT+O/yP
 FWzUDrRlCZCjV1FQKbGPa4YLeDRTG5n+RIQTjmCGRqiMqeoELSJ1+iK99e97nG/u
 L28zf/90Nf0R+wwHL4AOFeploTYfG4WP8EVnlr3CG2UCJrNjxN1KU7yRZoWmWa2d
 ckLJ8ouwNvx6VZd233LVmT38EP4352d1LyqIL8/+oXDIyAcRJLFQu1gRCwagsh3w
 1v1czowFpWnRQ/z9zU7YD+PA4v85/Ge8sVVHlpi1X5NgV/khk4U6B0crAw6M+la+
 mTmpz9g56oAh9m9NUdtv4zDCz1EWGH0v8+ZkAajUKtrM0ftJMn57P6p8PH4VVlE=
 =5+Wl
 -----END PGP SIGNATURE-----

Merge tag 'kvm-5.6-2' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull more KVM updates from Paolo Bonzini:
 "s390:
   - fix register corruption
   - ENOTSUPP/EOPNOTSUPP mixed
   - reset cleanups/fixes
   - selftests

  x86:
   - Bug fixes and cleanups
   - AMD support for APIC virtualization even in combination with
     in-kernel PIT or IOAPIC.

  MIPS:
   - Compilation fix.

  Generic:
   - Fix refcount overflow for zero page"

* tag 'kvm-5.6-2' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (42 commits)
  KVM: vmx: delete meaningless vmx_decache_cr0_guest_bits() declaration
  KVM: x86: Mark CR4.UMIP as reserved based on associated CPUID bit
  x86: vmxfeatures: rename features for consistency with KVM and manual
  KVM: SVM: relax conditions for allowing MSR_IA32_SPEC_CTRL accesses
  KVM: x86: Fix perfctr WRMSR for running counters
  x86/kvm/hyper-v: don't allow to turn on unsupported VMX controls for nested guests
  x86/kvm/hyper-v: move VMX controls sanitization out of nested_enable_evmcs()
  kvm: mmu: Separate generating and setting mmio ptes
  kvm: mmu: Replace unsigned with unsigned int for PTE access
  KVM: nVMX: Remove stale comment from nested_vmx_load_cr3()
  KVM: MIPS: Fold comparecount_func() into comparecount_wakeup()
  KVM: MIPS: Fix a build error due to referencing not-yet-defined function
  x86/kvm: do not setup pv tlb flush when not paravirtualized
  KVM: fix overflow of zero page refcount with ksm running
  KVM: x86: Take a u64 when checking for a valid dr7 value
  KVM: x86: use raw clock values consistently
  KVM: x86: reorganize pvclock_gtod_data members
  KVM: nVMX: delete meaningless nested_vmx_run() declaration
  KVM: SVM: allow AVIC without split irqchip
  kvm: ioapic: Lazy update IOAPIC EOI
  ...
2020-02-06 09:07:45 -08:00
Paolo Bonzini
ef09f4f463 KVM: s390: Fixes and cleanups for 5.6
- fix register corruption
 - ENOTSUPP/EOPNOTSUPP mixed
 - reset cleanups/fixes
 - selftests
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJeNDcAAAoJEBF7vIC1phx8NkcP/2JWMr/9v44LJJ8BfZVFqdP4
 i41pVFIgtI8Ieqjgp+Fuiu/8ELPxfohzBZ1Rm60TPcZlJ+uREmHklG1ZD2iXEJix
 0YqzICadQ4OvJxiFpi/s5+9bzczoxCIEx7CfJ4PTM2V3qtefauFgNtoSMevF9CtK
 6UuPNNjBi6cJuG3uAyqoOZ3vbMNeZ337ffEgBwukR01UxGImXwJ9odPFEwz31hji
 WKEEbnPaXFZUKy2vMSZVcndJKkhb043QFkZBY98D8m5VTSO5UFwpdYuht6QdMSKx
 IrxDN7788e/p4IPOGBWAXuhjYcmAYZh2Ayt7DM53b49XhWifsc6fw4khly2fjr3+
 Wg5Ol13ls2WaeDTGd5c4XQRWpQD27Wnum0yXLaVf2gaTRbTqrrsisWLHL6k/gqyb
 CXqJIr11/sb4zLwlwXPSrOrIz3CRz4DqawF/F0q47rHC7xyGsRzpGU4gP5Aqj8op
 qAMVORoQQjMtH4fVv6/NhIG6srVeonNA5GjI6hkYZ85mEJhy5Nl9lNuyEh4W094D
 fkNSnlWcCG8fyoLih1SHVa7cROVI8G0tfwhk4uSjRCXXtA5B5Rve2LQl3nCP9gUX
 m7Y6Qzm/yusVtaTu+YE8MyXVE2bpvGMR/xeztIR8eYw/LqbodOzxkRLdfeH2cfaD
 VCmFaVuUjTXx5q4xYmIl
 =ZgeW
 -----END PGP SIGNATURE-----

Merge tag 'kvm-s390-next-5.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD

KVM: s390: Fixes and cleanups for 5.6
- fix register corruption
- ENOTSUPP/EOPNOTSUPP mixed
- reset cleanups/fixes
- selftests
2020-02-05 16:15:05 +01:00
Linus Torvalds
eadc4e40e6 RTC for 5.6
Subsystem:
  - the VL_READ and VL_CLR ioctls are now documented and their behavior is
    unified across all the drivers.
  - RTC_I2C_AND_SPI Kconfig option rework to avoid selecting both REGMAP_I2C and
    REGMAP_SPI unecessarily.
 
 Drivers:
  - at91rm9200: remove deprecated procfs, add sam9x60, sama5d4 and sama5d2
    compatibles.
  - cmos: solve lost interrupts issue on MS Surface 3
  - hym8563: return proper errno when time is invalid
  - rv3029: many fixes, nvram support
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEycoQi/giopmpPgB12wIijOdRNOUFAl44lxsACgkQ2wIijOdR
 NOVF3BAAlQVFDklLqkS6MJplgZ06lgv4ZIbfxCHYFJPUNt/X/gTcHS1OSTwctmor
 3qQamySGc74/GAI6lmNSSzDUz9vi2hwcwvLtR/e6+Kgdhg9EuWWvgsqKYRGAn573
 TzHWsYY8bDpZ/mN5K+qqadGlzsP58gsEaw0fzcsPkZQQdq4mNSIrB5RILvsHX8cN
 +RviYOqR+ZvAegfQrAfb/9SCwxQAstUqRDaZXodQDEeIk5CEDWyr31+U9eLDdYoN
 1FOHYp6uwUy6Vnl0ym7WU42L95tVWx9XOc/PEq8dZ1m09nfrMhqeIoQC8SUtxG9+
 FWXN87lkLSlDaLUwVE8T22QII6jP+7Lc2t6SbI4fwwJdNDoPg+5hhabGjQbM2We9
 nG1x7TsVwKjeUglfhqeVgGWYzmIhAeOEBXQhqCdtfVi13ocecVFJxOG6oolQGtlZ
 M+/t91hID6il7/nxGembcHKapMf9c41CXBPesEg0QjkvGvKj1Z8L9a5vYigzbxWW
 zb8m9cjIhY3bb8mns3aCs773PSXavsycLk1Hupw6RWrN1XnEXVMHQZDtvzCJe7EV
 8CY40LI0VN6x5HC/d0EnqyrMvQbkgTBPXjLBhj54edCZM3vAny3xkYc03oDU8cKc
 0PbLUXKpFmzFbTQB4w0sE2FgT0uZmL2SCXiFemUwxbl+/ZACkoc=
 =e1Xr
 -----END PGP SIGNATURE-----

Merge tag 'rtc-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux

Pull RTC updates from Alexandre Belloni:
 "The VL_READ and VL_CLR ioctls have been reworked to be more useful.
  This will not break userspace as there are very few users and they are
  using the integer value as a boolean.

  Apart from that, two drivers were reworked and a few fixes here and
  there for a net reduction of number of lines.

  Summary:

  Subsystem:
   - the VL_READ and VL_CLR ioctls are now documented and their behavior
     is unified across all the drivers.
   - RTC_I2C_AND_SPI Kconfig option rework to avoid selecting both
     REGMAP_I2C and REGMAP_SPI unecessarily.

  Drivers:
   - at91rm9200: remove deprecated procfs, add sam9x60, sama5d4 and
     sama5d2 compatibles.
   - cmos: solve lost interrupts issue on MS Surface 3
   - hym8563: return proper errno when time is invalid
   - rv3029: many fixes, nvram support"

* tag 'rtc-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: (63 commits)
  dt-bindings: rtc: at91rm9200: document clocks property
  rtc: i2c/spi: Avoid inclusion of REGMAP support when not needed
  rtc: Kconfig: select REGMAP_I2C when necessary
  rtc: Kconfig: properly indent sd3078 entry
  rtc: cmos: Refactor code by using the new dmi_get_bios_year() helper
  rtc: cmos: Use predefined value for RTC IRQ on legacy x86
  rtc: cmos: Stop using shared IRQ
  rtc: tps6586x: Use IRQ_NOAUTOEN flag
  rtc: at91rm9200: use FIELD_PREP/FIELD_GET
  rtc: at91rm9200: avoid time readout in at91_rtc_setalarm
  rtc: at91rm9200: move register definitions to C file
  rtc: at91rm9200: add sama5d4 and sama5d2 compatibles
  dt-bindings: rtc: at91rm9200: convert bindings to json-schema
  rtc: at91rm9200: remove procfs information
  dt-bindings: atmel, at91rm9200-rtc: add microchip, sam9x60-rtc
  rtc: pcf8563: Use BIT
  rtc: moxart: Convert to SPDX identifier
  rtc: ds1343: Remove unused struct spi_device in struct ds1343_priv
  rtc: rx8025: Remove struct i2c_client from struct rx8025_data
  rtc: hym8563: Read the valid flag directly instead of caching it
  ...
2020-02-04 07:03:40 +00:00
Linus Torvalds
acd77500aa Change /dev/random so that it uses the CRNG and only blocking if the
CRNG hasn't initialized, instead of the old blocking pool.  Also clean
 up archrandom.h, and some other miscellaneous cleanups.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAl40j1kACgkQ8vlZVpUN
 gaPCywf8CWS9HFd2Iipj60gkTVugjlL5ib0lbfhQcAAwwzw1GLTXJSMBzzoMRHY/
 ZI2sJZS1m0V1oWNnXXVKi+A1VXmlValWXAc+7fvbeaIe5pRT1EHP14s4Kz7/4d8Q
 dk0b8cxNpR8u5CcbN8y9D+71IKpdksUbX7uGuGfw3bncQdRNwJVf+oS1fMGS0Rsb
 F8ddQaED7iFpX2BMl56afQ4t2t0LA5+eLYMGoYoJx5fgd9BseP0TEcjj9Y4Z30M7
 +GO4NZjUbAY0syx9r8hx3P/5miWZm2J9QJmJoXHhr5+IcAKM+6+Uo6X6gkOEqV4i
 U//V1cqNuowV5ckE4Na+MfBillinsQ==
 =HeFM
 -----END PGP SIGNATURE-----

Merge tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random

Pull random changes from Ted Ts'o:
 "Change /dev/random so that it uses the CRNG and only blocking if the
  CRNG hasn't initialized, instead of the old blocking pool. Also clean
  up archrandom.h, and some other miscellaneous cleanups"

* tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random: (24 commits)
  s390x: Mark archrandom.h functions __must_check
  powerpc: Mark archrandom.h functions __must_check
  powerpc: Use bool in archrandom.h
  x86: Mark archrandom.h functions __must_check
  linux/random.h: Mark CONFIG_ARCH_RANDOM functions __must_check
  linux/random.h: Use false with bool
  linux/random.h: Remove arch_has_random, arch_has_random_seed
  s390: Remove arch_has_random, arch_has_random_seed
  powerpc: Remove arch_has_random, arch_has_random_seed
  x86: Remove arch_has_random, arch_has_random_seed
  random: remove some dead code of poolinfo
  random: fix typo in add_timer_randomness()
  random: Add and use pr_fmt()
  random: convert to ENTROPY_BITS for better code readability
  random: remove unnecessary unlikely()
  random: remove kernel.random.read_wakeup_threshold
  random: delete code to pull data into pools
  random: remove the blocking pool
  random: make /dev/random be almost like /dev/urandom
  random: ignore GRND_RANDOM in getentropy(2)
  ...
2020-02-01 09:48:37 -08:00
Dmitry Torokhov
b19efcabb5 Merge branch 'next' into for-linus
Prepare input updates for 5.6 merge window.
2020-01-31 17:42:33 -08:00
Linus Torvalds
26dca6dbd6 pci-v5.6-changes
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAl40PWgUHGJoZWxnYWFz
 QGdvb2dsZS5jb20ACgkQWYigwDrT+vwclA/+Id/7Uc5S0r7xgFQRr3lbn0hHcx7f
 oBgmm6kGl8bu77MDiY32WLmPsp9e4BlK2M765cKQL5n20y8CzJ+kthZM8tZEDba4
 pnrZnWZ0A2xaBKzJqqYDtCqAeP97noCs4zBLo3JCA6jYCYI5bkvmdMQRlRjTUofO
 tkenGE+vexaJsLB7ghNskL3xGMueXLtLf/hXvaC6WGbSI9/zUmliHDL53DoKDPRo
 /9TGYDMwItZz+BhmBJz8hAL4naQIhIcDk2mz7CzWkY9xDhCJ1yeEwFvtvJwq0uM2
 Nmtq1g6yCB3sjlx+bRzrioLnouflztK1PGRbNugrMkR5XM9HIFmNwaDrqpU11ffA
 LQabMpbS3RWH3hbh4LYVMW13hbO+ld7/NG8jMFce2LHBWaGj6YejUQGdifz6vGRk
 JnDOgP19v5gWw08ibwkdfYzznPfMXp5IzFdJQFKhK+ugGDSJ8VeXiQ/pWtzghl3z
 P/puRw0BiL7ob/FUmhwn4J1Ytml7PZE+cJVN2l4C/CwKxR583GRUDgSHNL7Dky+o
 GcH9Tmjt4hQMNYRP01PACUmFYJwDfB+zgQ64a+uJsQwl/j+yfMnc1t/kqdM6yC9J
 GgkqLp989G/a3n9w5IC1P8aDYiwRqABvAFzlP9OZcIMUwmWbrhH175Qf6skKYIhH
 q9RKcLVXZdRS3mc=
 =fQ0E
 -----END PGP SIGNATURE-----

Merge tag 'pci-v5.6-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI updates from Bjorn Helgaas:

 "Resource management:

   - Improve resource assignment for hot-added nested bridges, e.g.,
     Thunderbolt (Nicholas Johnson)

  Power management:

   - Optionally print config space of devices before suspend (Chen Yu)

   - Increase D3 delay for AMD Ryzen5/7 XHCI controllers (Daniel Drake)

  Virtualization:

   - Generalize DMA alias quirks (James Sewart)

   - Add DMA alias quirk for PLX PEX NTB (James Sewart)

   - Fix IOV memory leak (Navid Emamdoost)

  AER:

   - Log which device prevents error recovery (Yicong Yang)

  Peer-to-peer DMA:

   - Whitelist Intel SkyLake-E (Armen Baloyan)

  Broadcom iProc host bridge driver:

   - Apply PAXC quirk whether driver is built-in or module (Wei Liu)

  Broadcom STB host bridge driver:

   - Add Broadcom STB PCIe host controller driver (Jim Quinlan)

  Intel Gateway SoC host bridge driver:

   - Add driver for Intel Gateway SoC (Dilip Kota)

  Intel VMD host bridge driver:

   - Add support for DMA aliases on other buses (Jon Derrick)

   - Remove dma_map_ops overrides (Jon Derrick)

   - Remove now-unused X86_DEV_DMA_OPS (Christoph Hellwig)

  NVIDIA Tegra host bridge driver:

   - Fix Tegra30 afi_pex2_ctrl register offset (Marcel Ziswiler)

  Panasonic UniPhier host bridge driver:

   - Remove module code since driver can't be built as a module
     (Masahiro Yamada)

  Qualcomm host bridge driver:

   - Add support for SDM845 PCIe controller (Bjorn Andersson)

  TI Keystone host bridge driver:

   - Fix "num-viewport" DT property error handling (Kishon Vijay Abraham I)

   - Fix link training retries initiation (Yurii Monakov)

   - Fix outbound region mapping (Yurii Monakov)

  Misc:

   - Add Switchtec Gen4 support (Kelvin Cao)

   - Add Switchtec Intercomm Notify and Upstream Error Containment
     support (Logan Gunthorpe)

   - Use dma_set_mask_and_coherent() since Switchtec supports 64-bit
     addressing (Wesley Sheng)"

* tag 'pci-v5.6-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (60 commits)
  PCI: Allow adjust_bridge_window() to shrink resource if necessary
  PCI: Set resource size directly in adjust_bridge_window()
  PCI: Rename extend_bridge_window() to adjust_bridge_window()
  PCI: Rename extend_bridge_window() parameter
  PCI: Consider alignment of hot-added bridges when assigning resources
  PCI: Remove local variable usage in pci_bus_distribute_available_resources()
  PCI: Pass size + alignment to pci_bus_distribute_available_resources()
  PCI: Rename variables
  PCI: vmd: Add two VMD Device IDs
  PCI: Remove unnecessary braces
  PCI: brcmstb: Add MSI support
  PCI: brcmstb: Add Broadcom STB PCIe host controller driver
  x86/PCI: Remove X86_DEV_DMA_OPS
  PCI: vmd: Remove dma_map_ops overrides
  iommu/vt-d: Remove VMD child device sanity check
  iommu/vt-d: Use pci_real_dma_dev() for mapping
  PCI: Introduce pci_real_dma_dev()
  x86/PCI: Expose VMD's pci_dev in struct pci_sysdata
  x86/PCI: Add to_pci_sysdata() helper
  PCI/AER: Initialize aer_fifo
  ...
2020-01-31 14:48:54 -08:00
Linus Torvalds
846de71bed media updates for v5.6-rc1
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+QmuaPwR3wnBdVwACF8+vY7k4RUFAl40SYEACgkQCF8+vY7k
 4RU4TQ/8CgWj2+0uMRyIGpggB7B82vBPRqfHr4DIcZzbLSkdDkeDtrEfM5058cUc
 y3NpW9djmcqDMPvOZKFAkb03Bd+mtv89kI72RBTT2mVwCfySYa02K63RqgDg2aFU
 FScPUXlwB8hmZG6BpDlMiykJY1SVyhpb9R2f/7scgJ0ZKVwkKRMmLC5/I5A1IbFX
 WpoqNzRmT07bZJyDdm5RkzxHdM1EP0flMsqWJb3O2aWqeAw9u9+issk+Uv+cMGR+
 70+pmE/6qeurQjS9OHRhrSkf4HjybeByATfgSnONqNrWBtQXgBrHI2TjmT2NvNqV
 kWfsprM1GNPhsLveG6JYKGSNwZK6BHxuUULIjXAr1ocRrae2jVZ7/SZkAvnvzO3v
 hnb2HwgMBkQSctcl4EJDJeLIc1HgIKbZ7D/mFj7N9Mk3Kn7AqcLNHBv+GMunCPFl
 yXNq23ELfxC1HpmQPVhXNM/UaaO5MZCSvOD3MDObcjrxtv4b2bovi6ACDUTgGUNL
 sDozTurG2p1VeGupUnzia62gfb0/fjZ2WBk7RRp8E2K4/93YNEeMA9wgF0E8b4YQ
 SDQcDF1EtsAPF3msiXBC5FSFG42Ly7Ry04fl0v4lAle/0bPamEdQJ2CMVa1ux2Kp
 MRxI39CbRqtoIUbpKzTeInh/FvDDU22TimBKc5sg9d29Hk6y+Yk=
 =sGdY
 -----END PGP SIGNATURE-----

Merge tag 'media/v5.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media

Pull media updates from Mauro Carvalho Chehab:

 - New staging driver for Rockship ISPv1 unit

 - New staging driver for Rockchip MIPI Synopsys DPHY RX0

 - y2038 fixes at V4L2 API (backward-compatible)

 - A dvb core fix when receiving invalid EIT sections

 - Some clang-specific warnings got fixed

 - Added support for touch V4L2 interface at vivid

 - Several drivers were converted to use the new
   i2c_new_scanned_device() kAPI

 - Added sm1 support at meson's vdec driver

 - Several other driver cleanups, fixes and improvements

* tag 'media/v5.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (207 commits)
  media: staging/intel-ipu3: remove TODO item about acronyms
  media: v4l2-fwnode: Print the node name while parsing endpoints
  media: Revert "media: staging/intel-ipu3: make imgu use fixed running mode"
  media: mt9v111: constify copied structure
  media: platform: VIDEO_MEDIATEK_JPEG can also depend on MTK_IOMMU
  media: uvcvideo: Add a quirk to force GEO GC6500 Camera bits-per-pixel value
  media: uvcvideo: Avoid cyclic entity chains due to malformed USB descriptors
  media: hantro: fix post-processing NULL pointer dereference
  media: rcar-vin: Use correct pixel format when aligning format
  media: MAINTAINERS: add entry for Rockchip ISP1 driver
  media: staging: rkisp1: add TODO file for staging
  media: staging: rkisp1: add document for rkisp1 meta buffer format
  media: staging: rkisp1: add output device for parameters
  media: staging: rkisp1: add capture device for statistics
  media: staging: rkisp1: add user space ABI definitions
  media: staging: rkisp1: add streaming paths
  media: staging: rkisp1: add Rockchip ISP1 base driver
  media: staging: phy-rockchip-dphy-rx0: add Rockchip MIPI Synopsys DPHY RX0 driver
  media: staging: dt-bindings: add Rockchip MIPI RX D-PHY RX0 yaml bindings
  media: staging: dt-bindings: add Rockchip ISP1 yaml bindings
  ...
2020-01-31 14:43:23 -08:00
Linus Torvalds
7eec11d3a7 Merge branch 'akpm' (patches from Andrew)
Pull updates from Andrew Morton:
 "Most of -mm and quite a number of other subsystems: hotfixes, scripts,
  ocfs2, misc, lib, binfmt, init, reiserfs, exec, dma-mapping, kcov.

  MM is fairly quiet this time.  Holidays, I assume"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (118 commits)
  kcov: ignore fault-inject and stacktrace
  include/linux/io-mapping.h-mapping: use PHYS_PFN() macro in io_mapping_map_atomic_wc()
  execve: warn if process starts with executable stack
  reiserfs: prevent NULL pointer dereference in reiserfs_insert_item()
  init/main.c: fix misleading "This architecture does not have kernel memory protection" message
  init/main.c: fix quoted value handling in unknown_bootoption
  init/main.c: remove unnecessary repair_env_string in do_initcall_level
  init/main.c: log arguments and environment passed to init
  fs/binfmt_elf.c: coredump: allow process with empty address space to coredump
  fs/binfmt_elf.c: coredump: delete duplicated overflow check
  fs/binfmt_elf.c: coredump: allocate core ELF header on stack
  fs/binfmt_elf.c: make BAD_ADDR() unlikely
  fs/binfmt_elf.c: better codegen around current->mm
  fs/binfmt_elf.c: don't copy ELF header around
  fs/binfmt_elf.c: fix ->start_code calculation
  fs/binfmt_elf.c: smaller code generation around auxv vector fill
  lib/find_bit.c: uninline helper _find_next_bit()
  lib/find_bit.c: join _find_next_bit{_le}
  uapi: rename ext2_swab() to swab() and share globally in swab.h
  lib/scatterlist.c: adjust indentation in __sg_alloc_table
  ...
2020-01-31 12:16:36 -08:00
Yury Norov
d5767057c9 uapi: rename ext2_swab() to swab() and share globally in swab.h
ext2_swab() is defined locally in lib/find_bit.c However it is not
specific to ext2, neither to bitmaps.

There are many potential users of it, so rename it to just swab() and
move to include/uapi/linux/swab.h

ABI guarantees that size of unsigned long corresponds to BITS_PER_LONG,
therefore drop unneeded cast.

Link: http://lkml.kernel.org/r/20200103202846.21616-1-yury.norov@gmail.com
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Cc: Allison Randal <allison@lohutok.net>
Cc: Joe Perches <joe@perches.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: William Breathitt Gray <vilhelm.gray@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-31 10:30:40 -08:00
Hao Lee
0a3c577297 mm: fix comments related to node reclaim
As zone reclaim has been replaced by node reclaim, this patch fixes
related comments.

Link: http://lkml.kernel.org/r/20191126141346.GA22665@haolee.github.io
Signed-off-by: Hao Lee <haolee.swjtu@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-31 10:30:39 -08:00
Janosch Frank
7de3f1423f KVM: s390: Add new reset vcpu API
The architecture states that we need to reset local IRQs for all CPU
resets. Because the old reset interface did not support the normal CPU
reset we never did that on a normal reset.

Let's implement an interface for the missing normal and clear resets
and reset all local IRQs, registers and control structures as stated
in the architecture.

Userspace might already reset the registers via the vcpu run struct,
but as we need the interface for the interrupt clearing part anyway,
we implement the resets fully and don't rely on userspace to reset the
rest.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Link: https://lore.kernel.org/r/20200131100205.74720-4-frankja@linux.ibm.com
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2020-01-31 12:50:04 +01:00
Linus Torvalds
9f68e3655a drm pull for 5.6-rc1
uapi:
 - dma-buf heaps added (and fixed)
 - command line add support for panel oreientation
 - command line allow overriding penguin count
 
 drm:
 - mipi dsi definition updates
 - lockdep annotations for dma_resv
 - remove dma-buf kmap/kunmap support
 - constify fb_ops in all fbdev drivers
 - MST fix for daisy chained hotplug-
 - CTA-861-G modes with VIC >= 193 added
 - fix drm_panel_of_backlight export
 - LVDS decoder support
 - more device based logging support
 - scanline alighment for dumb buffers
 - MST DSC helpers
 
 scheduler:
 - documentation fixes
 - job distribution improvements
 
 panel:
 - Logic PD type 28 panel support
 - Jimax8729d MIPI-DSI
 - igenic JZ4770
 - generic DSI devicetree bindings
 - sony acx424AKP panel
 - Leadtek LTK500HD1829
 - xinpeng XPP055C272
 - AUO B116XAK01
 - GiantPlus GPM940B0
 - BOE NV140FHM-N49
 - Satoz SAT050AT40H12R2
 - Sharp LS020B1DD01D panels.
 
 ttm:
 - use blocking WW lock
 
 i915:
 - hw/uapi state separation
 - Lock annotation improvements
 - selftest improvements
 - ICL/TGL DSI VDSC support
 - VBT parsing improvments
 - Display refactoring
 - DSI updates + fixes
 - HDCP 2.2 for CFL
 - CML PCI ID fixes
 - GLK+ fbc fix
 - PSR fixes
 - GEN/GT refactor improvments
 - DP MST fixes
 - switch context id alloc to xarray
 - workaround updates
 - LMEM debugfs support
 - tiled monitor fixes
 - ICL+ clock gating programming removed
 - DP MST disable sequence fixed
 - LMEM discontiguous object maps
 - prefaulting for discontiguous objects
 - use LMEM for dumb buffers if possible
 - add LMEM mmap support
 
 amdgpu:
 - enable sync object timelines for vulkan
 - MST atomic routines
 - enable MST DSC support
 - add DMCUB display microengine support
 - DC OEM i2c support
 - Renoir DC fixes
 - Initial HDCP 2.x support
 - BACO support for Arcturus
 - Use BACO for runtime PM power save
 - gfxoff on navi10
 - gfx10 golden updates and fixes
 - DCN support on POWER
 - GFXOFF for raven1 refresh
 - MM engine idle handlers cleanup
 - 10bpc EDP panel fixes
 - renoir watermark fixes
 - SR-IOV fixes
 - Arcturus VCN fixes
 - GDDR6 training fixes
 - freesync fixes
 - Pollock support
 
 amdkfd:
 - unify more codepath with amdgpu
 - use KIQ to setup HIQ rather than MMIO
 
 radeon:
 - fix vma fault handler race
 - PPC DMA fix
 - register check fixes for r100/r200
 
 nouveau:
 - mmap_sem vs dma_resv fix
 - rewrite the ACR secure boot code for Turing
 - TU10x graphics engine support (TU11x pending)
 - Page kind mapping for turing
 - 10-bit LUT support
 - GP10B Tegra fixes
 - HD audio regression fix
 
 hisilicon/hibmc:
 - use generic fbdev code and helpers
 
 rockchip:
 - dsi/px30 support
 
 virtio:
 - fb damage support
 - static some functions
 
 vc4:
 - use dma_resv lock wrappers
 
 msm:
 - use dma_resv lock wrappers
 - sc7180 display + DSI support
 - a618 support
 - UBWC support improvements
 
 vmwgfx:
 - updates + new logging uapi
 
 exynos:
 - enable/disable callback cleanups
 
 etnaviv:
 - use dma_resv lock wrappers
 
 atmel-hlcdc:
 - clock fixes
 
 mediatek:
 - cmdq support
 - non-smooth cursor fixes
 - ctm property support
 
 sun4i:
 - suspend support
 - A64 mipi dsi support
 
 rcar-du:
 - Color management module support
 - LVDS encoder dual-link support
 - R8A77980 support
 
 analogic:
 - add support for an6345
 
 ast:
 - atomic modeset support
 - primary plane garbage fix
 
 arcgpu:
 - fixes for fourcc handling
 
 tegra:
 - minor fixes and improvments
 
 mcde:
 - vblank support
 
 meson:
 - OSD1 plane AFBC commit
 
 gma500:
 - add pageflip support
 - reomve global drm_dev
 
 komeda:
 - tweak debugfs output
 - d32 support
 - runtime PM suppotr
 
 udl:
 - use generic shmem helpers
 - cleanup and fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJeMm6RAAoJEAx081l5xIa+vN8P/0j4jEOv+KIinAhoH+LG3EpD
 m2TUuu5OQIoBrcCoWOgFBk3wqYpw6PdMBdkXh+5sE5lfeBynp8oC3Bin+QsHJE05
 eGBpZtHe+70MQb0Eha+Aic0hchvBKzRnq6i0MYSIHn6afs76dLmF8knTjycxrvV5
 Xu1Z3WDmjzqgWF9ja5JCD6fby11seP5RrwObYKVikO35QQyJJwGSGKgu5rq/pByK
 /n0PCnCOINuL0Lz6J9qexdh/0/XYFQilRC31GJNlKbDSFuECF0GOEzEE/xUBW/pI
 dLh2YwIIygm18Gar9PgvMwXJn3BfzQ0qEJsf+HlQeNw9iLgbHpp2AsTxHTE87OGe
 R/y85taW3jGjPsNOKZOeLpvg/Ro8l8ZipLApvDCG2O22DThg/cd6NDjZxl1FJfRH
 acDG/JdgPo5MbdRAH/cM1WuFS9gEM+0BeSQ5gCjtPakF+X4Vz+ABFDLMRJoaejkJ
 q8DG32TQXELQx0RMghsqK7YCWGfl+2alA1u9w6TgJh9Rq4iVckvpDeqAZnK1Adkc
 87g957Tl0n6FA4wJj/t5jrceiLRMJAm/rBK+R3GZNfWrgx4bHbCmb4fZDZsrFzph
 nbAjNJ5kOchrFCaRR47ULby6+Q14MAFbkWq4Crfu4YDdzUkTPpep6pi2GIe8w0rV
 P0hdYOYJf6LUda0utuQX
 =oFrI
 -----END PGP SIGNATURE-----

Merge tag 'drm-next-2020-01-30' of git://anongit.freedesktop.org/drm/drm

Pull drm updates from Davbe Airlie:
 "This is the main pull request for graphics for 5.6. Usual selection of
  changes all over.

  I've got one outstanding vmwgfx pull that touches mm so kept it
  separate until after all of this lands. I'll try and get it to you
  soon after this, but it might be early next week (nothing wrong with
  code, just my schedule is messy)

  This also hits a lot of fbdev drivers with some cleanups.

  Other notables:
   - vulkan timeline semaphore support added to syncobjs
   - nouveau turing secureboot/graphics support
   - Displayport MST display stream compression support

  Detailed summary:

  uapi:
   - dma-buf heaps added (and fixed)
   - command line add support for panel oreientation
   - command line allow overriding penguin count

  drm:
   - mipi dsi definition updates
   - lockdep annotations for dma_resv
   - remove dma-buf kmap/kunmap support
   - constify fb_ops in all fbdev drivers
   - MST fix for daisy chained hotplug-
   - CTA-861-G modes with VIC >= 193 added
   - fix drm_panel_of_backlight export
   - LVDS decoder support
   - more device based logging support
   - scanline alighment for dumb buffers
   - MST DSC helpers

  scheduler:
   - documentation fixes
   - job distribution improvements

  panel:
   - Logic PD type 28 panel support
   - Jimax8729d MIPI-DSI
   - igenic JZ4770
   - generic DSI devicetree bindings
   - sony acx424AKP panel
   - Leadtek LTK500HD1829
   - xinpeng XPP055C272
   - AUO B116XAK01
   - GiantPlus GPM940B0
   - BOE NV140FHM-N49
   - Satoz SAT050AT40H12R2
   - Sharp LS020B1DD01D panels.

  ttm:
   - use blocking WW lock

  i915:
   - hw/uapi state separation
   - Lock annotation improvements
   - selftest improvements
   - ICL/TGL DSI VDSC support
   - VBT parsing improvments
   - Display refactoring
   - DSI updates + fixes
   - HDCP 2.2 for CFL
   - CML PCI ID fixes
   - GLK+ fbc fix
   - PSR fixes
   - GEN/GT refactor improvments
   - DP MST fixes
   - switch context id alloc to xarray
   - workaround updates
   - LMEM debugfs support
   - tiled monitor fixes
   - ICL+ clock gating programming removed
   - DP MST disable sequence fixed
   - LMEM discontiguous object maps
   - prefaulting for discontiguous objects
   - use LMEM for dumb buffers if possible
   - add LMEM mmap support

  amdgpu:
   - enable sync object timelines for vulkan
   - MST atomic routines
   - enable MST DSC support
   - add DMCUB display microengine support
   - DC OEM i2c support
   - Renoir DC fixes
   - Initial HDCP 2.x support
   - BACO support for Arcturus
   - Use BACO for runtime PM power save
   - gfxoff on navi10
   - gfx10 golden updates and fixes
   - DCN support on POWER
   - GFXOFF for raven1 refresh
   - MM engine idle handlers cleanup
   - 10bpc EDP panel fixes
   - renoir watermark fixes
   - SR-IOV fixes
   - Arcturus VCN fixes
   - GDDR6 training fixes
   - freesync fixes
   - Pollock support

  amdkfd:
   - unify more codepath with amdgpu
   - use KIQ to setup HIQ rather than MMIO

  radeon:
   - fix vma fault handler race
   - PPC DMA fix
   - register check fixes for r100/r200

  nouveau:
   - mmap_sem vs dma_resv fix
   - rewrite the ACR secure boot code for Turing
   - TU10x graphics engine support (TU11x pending)
   - Page kind mapping for turing
   - 10-bit LUT support
   - GP10B Tegra fixes
   - HD audio regression fix

  hisilicon/hibmc:
   - use generic fbdev code and helpers

  rockchip:
   - dsi/px30 support

  virtio:
   - fb damage support
   - static some functions

  vc4:
   - use dma_resv lock wrappers

  msm:
   - use dma_resv lock wrappers
   - sc7180 display + DSI support
   - a618 support
   - UBWC support improvements

  vmwgfx:
   - updates + new logging uapi

  exynos:
   - enable/disable callback cleanups

  etnaviv:
   - use dma_resv lock wrappers

  atmel-hlcdc:
   - clock fixes

  mediatek:
   - cmdq support
   - non-smooth cursor fixes
   - ctm property support

  sun4i:
   - suspend support
   - A64 mipi dsi support

  rcar-du:
   - Color management module support
   - LVDS encoder dual-link support
   - R8A77980 support

  analogic:
   - add support for an6345

  ast:
   - atomic modeset support
   - primary plane garbage fix

  arcgpu:
   - fixes for fourcc handling

  tegra:
   - minor fixes and improvments

  mcde:
   - vblank support

  meson:
   - OSD1 plane AFBC commit

  gma500:
   - add pageflip support
   - reomve global drm_dev

  komeda:
   - tweak debugfs output
   - d32 support
   - runtime PM suppotr

  udl:
   - use generic shmem helpers
   - cleanup and fixes"

* tag 'drm-next-2020-01-30' of git://anongit.freedesktop.org/drm/drm: (1998 commits)
  drm/nouveau/fb/gp102-: allow module to load even when scrubber binary is missing
  drm/nouveau/acr: return error when registering LSF if ACR not supported
  drm/nouveau/disp/gv100-: not all channel types support reporting error codes
  drm/nouveau/disp/nv50-: prevent oops when no channel method map provided
  drm/nouveau: support synchronous pushbuf submission
  drm/nouveau: signal pending fences when channel has been killed
  drm/nouveau: reject attempts to submit to dead channels
  drm/nouveau: zero vma pointer even if we only unreference it rather than free
  drm/nouveau: Add HD-audio component notifier support
  drm/nouveau: fix build error without CONFIG_IOMMU_API
  drm/nouveau/kms/nv04: remove set but not used variable 'width'
  drm/nouveau/kms/nv50: remove set but not unused variable 'nv_connector'
  drm/nouveau/mmu: fix comptag memory leak
  drm/nouveau/gr/gp10b: Use gp100_grctx and gp100_gr_zbc
  drm/nouveau/pmu/gm20b,gp10b: Fix Falcon bootstrapping
  drm/exynos: Rename Exynos to lowercase
  drm/exynos: change callback names
  drm/mst: Don't do atomic checks over disabled managers
  drm/amdgpu: add the lost mutex_init back
  drm/amd/display: skip opp blank or unblank if test pattern enabled
  ...
2020-01-30 08:04:01 -08:00
Linus Torvalds
83fa805bcb threads-v5.6
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCXjFo8wAKCRCRxhvAZXjc
 omaGAQDVwCHQekqxp2eC8EJH4Pkt+Bn1BLrA25stlTo93YBPHgEAsPVUCRNcrZAl
 VncYmxCfpt3Yu0S/MTVXu5xrRiIXPQk=
 =uqTN
 -----END PGP SIGNATURE-----

Merge tag 'threads-v5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux

Pull thread management updates from Christian Brauner:
 "Sargun Dhillon over the last cycle has worked on the pidfd_getfd()
  syscall.

  This syscall allows for the retrieval of file descriptors of a process
  based on its pidfd. A task needs to have ptrace_may_access()
  permissions with PTRACE_MODE_ATTACH_REALCREDS (suggested by Oleg and
  Andy) on the target.

  One of the main use-cases is in combination with seccomp's user
  notification feature. As a reminder, seccomp's user notification
  feature was made available in v5.0. It allows a task to retrieve a
  file descriptor for its seccomp filter. The file descriptor is usually
  handed of to a more privileged supervising process. The supervisor can
  then listen for syscall events caught by the seccomp filter of the
  supervisee and perform actions in lieu of the supervisee, usually
  emulating syscalls. pidfd_getfd() is needed to expand its uses.

  There are currently two major users that wait on pidfd_getfd() and one
  future user:

   - Netflix, Sargun said, is working on a service mesh where users
     should be able to connect to a dns-based VIP. When a user connects
     to e.g. 1.2.3.4:80 that runs e.g. service "foo" they will be
     redirected to an envoy process. This service mesh uses seccomp user
     notifications and pidfd to intercept all connect calls and instead
     of connecting them to 1.2.3.4:80 connects them to e.g.
     127.0.0.1:8080.

   - LXD uses the seccomp notifier heavily to intercept and emulate
     mknod() and mount() syscalls for unprivileged containers/processes.
     With pidfd_getfd() more uses-cases e.g. bridging socket connections
     will be possible.

   - The patchset has also seen some interest from the browser corner.
     Right now, Firefox is using a SECCOMP_RET_TRAP sandbox managed by a
     broker process. In the future glibc will start blocking all signals
     during dlopen() rendering this type of sandbox impossible. Hence,
     in the future Firefox will switch to a seccomp-user-nofication
     based sandbox which also makes use of file descriptor retrieval.
     The thread for this can be found at
     https://sourceware.org/ml/libc-alpha/2019-12/msg00079.html

  With pidfd_getfd() it is e.g. possible to bridge socket connections
  for the supervisee (binding to a privileged port) and taking actions
  on file descriptors on behalf of the supervisee in general.

  Sargun's first version was using an ioctl on pidfds but various people
  pushed for it to be a proper syscall which he duely implemented as
  well over various review cycles. Selftests are of course included.
  I've also added instructions how to deal with merge conflicts below.

  There's also a small fix coming from the kernel mentee project to
  correctly annotate struct sighand_struct with __rcu to fix various
  sparse warnings. We've received a few more such fixes and even though
  they are mostly trivial I've decided to postpone them until after -rc1
  since they came in rather late and I don't want to risk introducing
  build warnings.

  Finally, there's a new prctl() command PR_{G,S}ET_IO_FLUSHER which is
  needed to avoid allocation recursions triggerable by storage drivers
  that have userspace parts that run in the IO path (e.g. dm-multipath,
  iscsi, etc). These allocation recursions deadlock the device.

  The new prctl() allows such privileged userspace components to avoid
  allocation recursions by setting the PF_MEMALLOC_NOIO and
  PF_LESS_THROTTLE flags. The patch carries the necessary acks from the
  relevant maintainers and is routed here as part of prctl()
  thread-management."

* tag 'threads-v5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
  prctl: PR_{G,S}ET_IO_FLUSHER to support controlling memory reclaim
  sched.h: Annotate sighand_struct with __rcu
  test: Add test for pidfd getfd
  arch: wire up pidfd_getfd syscall
  pid: Implement pidfd_getfd syscall
  vfs, fdtable: Add fget_task helper
2020-01-29 19:38:34 -08:00
Linus Torvalds
896f8d23d0 for-5.6/io_uring-vfs-2020-01-29
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl4yEegQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpn5ZD/4/WlXs2cUDgg1C65bzZFO4qvevm+VkXmsk
 GbyrnFstRekvSH01/ZQxlyDVKS8Wux0XIJ6OArCh1047LvL1bEE5dvOW5iIiwa/r
 grjQuwFAzIPsE2fgcAO17BKIUzq2Z96+hwDzH7dw0i32yBuLvNmY/1SxcCHKfPut
 uzGyp7t3/2dIHbpWILRndMYe0O9j9ubmOMvKyKTwy723yDEafsUoqu2mlpigzTq4
 2i+DbYBIAd8qmLqG/m3e+vOt9xodJ2Q0hlO+v6DcP2SKXU64Hb/N98HadR//aWP9
 41DBXqs+dvDBcu3Jxb80PFUTiOQZECJivkns5cNcjuSXmNkOuQhDQR5K372AHmR9
 m6e6FSBxwej8HselAZCI6yu9uBKd0i+MM4FnFs/O73QGYx2ayXsEXp/Jad9xiYgW
 pC5XJTSqJQhPE0AYYEOzHPPcBLBcpvXHkvmGKdjkNb8OLhhgh2S/YG0DNC+8ABXr
 j1uIe/n3kJEEmOanUyiitGyLmDq+mXd7aCVKJL/J0KiGD8Gkc1avAZ1ZrTQgjujY
 FqqBFawO8gv3g0L4WMI8JI+HJGMnA488obet6UKm9+l/Z/urEpXzDAKf/W/vnx2B
 LD0FSA0bCh1tyO6JU+avFwHlwShtV7/rx/OhrmCK7CCYKtZCA2IEctxyr8U+PBIv
 DtwIMTYTsA==
 =ZZUI
 -----END PGP SIGNATURE-----

Merge tag 'for-5.6/io_uring-vfs-2020-01-29' of git://git.kernel.dk/linux-block

Pull io_uring updates from Jens Axboe:

 - Support for various new opcodes (fallocate, openat, close, statx,
   fadvise, madvise, openat2, non-vectored read/write, send/recv, and
   epoll_ctl)

 - Faster ring quiesce for fileset updates

 - Optimizations for overflow condition checking

 - Support for max-sized clamping

 - Support for probing what opcodes are supported

 - Support for io-wq backend sharing between "sibling" rings

 - Support for registering personalities

 - Lots of little fixes and improvements

* tag 'for-5.6/io_uring-vfs-2020-01-29' of git://git.kernel.dk/linux-block: (64 commits)
  io_uring: add support for epoll_ctl(2)
  eventpoll: support non-blocking do_epoll_ctl() calls
  eventpoll: abstract out epoll_ctl() handler
  io_uring: fix linked command file table usage
  io_uring: support using a registered personality for commands
  io_uring: allow registering credentials
  io_uring: add io-wq workqueue sharing
  io-wq: allow grabbing existing io-wq
  io_uring/io-wq: don't use static creds/mm assignments
  io-wq: make the io_wq ref counted
  io_uring: fix refcounting with batched allocations at OOM
  io_uring: add comment for drain_next
  io_uring: don't attempt to copy iovec for READ/WRITE
  io_uring: honor IOSQE_ASYNC for linked reqs
  io_uring: prep req when do IOSQE_ASYNC
  io_uring: use labeled array init in io_op_defs
  io_uring: optimise sqe-to-req flags translation
  io_uring: remove REQ_F_IO_DRAINED
  io_uring: file switch work needs to get flushed on exit
  io_uring: hide uring_fd in ctx
  ...
2020-01-29 18:53:37 -08:00
Bjorn Helgaas
4c6a8fe3aa Merge branch 'remotes/lorenzo/pci/dwc'
- Add intel-gw driver for PCIe host controller on Intel Gateway SoC
    (Dilip Kota)

  - Use shared DesignWare helpers to configure Fast Training Sequence (FTS)
    in artpec6 (Dilip Kota)

* remotes/lorenzo/pci/dwc:
  PCI: artpec6: Configure FTS with dwc helper function
  PCI: dwc: intel: PCIe RC controller driver
  dt-bindings: PCI: intel: Add YAML schemas for the PCIe RC controller
2020-01-29 17:00:04 -06:00
Linus Torvalds
22b17db4ea y2038: core, driver and file system changes
These are updates to device drivers and file systems that for some reason
 or another were not included in the kernel in the previous y2038 series.
 
 I've gone through all users of time_t again to make sure the kernel is
 in a long-term maintainable state, replacing all remaining references
 to time_t with safe alternatives.
 
 Some related parts of the series were picked up into the nfsd, xfs,
 alsa and v4l2 trees. A final set of patches in linux-mm removes the now
 unused time_t/timeval/timespec types and helper functions after all five
 branches are merged for linux-5.6, ensuring that no new users get merged.
 
 As a result, linux-5.6, or my backport of the patches to 5.4 [1], should
 be the first release that can serve as a base for a 32-bit system designed
 to run beyond year 2038, with a few remaining caveats:
 
 - All user space must be compiled with a 64-bit time_t, which will be
   supported in the coming musl-1.2 and glibc-2.32 releases, along with
   installed kernel headers from linux-5.6 or higher.
 
 - Applications that use the system call interfaces directly need to be
   ported to use the time64 syscalls added in linux-5.1 in place of the
   existing system calls. This impacts most users of futex() and seccomp()
   as well as programming languages that have their own runtime environment
   not based on libc.
 
 - Applications that use a private copy of kernel uapi header files or
   their contents may need to update to the linux-5.6 version, in
   particular for sound/asound.h, xfs/xfs_fs.h, linux/input.h,
   linux/elfcore.h, linux/sockios.h, linux/timex.h and linux/can/bcm.h.
 
 - A few remaining interfaces cannot be changed to pass a 64-bit time_t
   in a compatible way, so they must be configured to use CLOCK_MONOTONIC
   times or (with a y2106 problem) unsigned 32-bit timestamps. Most
   importantly this impacts all users of 'struct input_event'.
 
 - All y2038 problems that are present on 64-bit machines also apply to
   32-bit machines. In particular this affects file systems with on-disk
   timestamps using signed 32-bit seconds: ext4 with ext3-style small
   inodes, ext2, xfs (to be fixed soon) and ufs.
 
 Changes since v1 [2]:
 
 - Add Acks I received
 - Rebase to v5.5-rc1, dropping patches that got merged already
 - Add NFS, XFS and the final three patches from another series
 - Rewrite etnaviv patches
 - Add one late revert to avoid an etnaviv regression
 
 [1] https://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground.git/log/?h=y2038-endgame
 [2] https://lore.kernel.org/lkml/20191108213257.3097633-1-arnd@arndb.de/
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJeMYy3AAoJEGCrR//JCVInEGwP/0R+S+ok7vw9OdLVT0lFl07D
 IcVabgOWf24imN7m7L7Mlt3nDfxIT4tMpiAXq7eMO3spcyViG18O2LXdSQ4/7QBp
 +BlhoMjOP9w34Jyd7mnkFr4vqQALvfIqkS8rFObDtDub2Rfj9PC36MRMIu8BPXlv
 RK8bigwJeH/DV38yc5/JeUcD+WuewYLsK9XPWN+4yB4vgGsNU3ZQQ6nnzbR3hMsN
 DN8WZ68Y7IBs0Kyxkf+s2zmRXtCa2RiFg/2TUsk5olVAJVaenvte69hq5RSbg1vW
 vLi6K8cBoPWL59nqCzcNE+TUhSUg3LOj/a/KWyl76yovz7AlJaNjssOf8ZjHw6sL
 MhQqz3hXTxiJDS2Jvbf1yojiYGlzrq/gqcRFGe9jPcZdieMc4/yZCx60G/Exa5Pu
 YdMcqMyDWPFyUAFQNWEF59HPheOdj6tb1KpJ6bwgCo3P7QqhLrU4z9w3Py4/ZfBO
 4sWcWteSsD6MN/ADJ2WQ56nNxzM2AvkeVJKcF6FCkdngXX9T0GExmZz7SqB5Du99
 9lNjIiD5E+LBa/Swo/7n49aYa8x06V1pmHYTZVh9Wkl+CZiO21umezQFrWsfaMTp
 xt3c6pFdMG5xNMGpreTAXOmf2R+T6O8IO2qQq/TYjzqOLH7QC830P7avkmml+cK1
 LjOBE2TfSeO8Ru1dXV4t
 =wx0A
 -----END PGP SIGNATURE-----

Merge tag 'y2038-drivers-for-v5.6-signed' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground

Pull y2038 updates from Arnd Bergmann:
 "Core, driver and file system changes

  These are updates to device drivers and file systems that for some
  reason or another were not included in the kernel in the previous
  y2038 series.

  I've gone through all users of time_t again to make sure the kernel is
  in a long-term maintainable state, replacing all remaining references
  to time_t with safe alternatives.

  Some related parts of the series were picked up into the nfsd, xfs,
  alsa and v4l2 trees. A final set of patches in linux-mm removes the
  now unused time_t/timeval/timespec types and helper functions after
  all five branches are merged for linux-5.6, ensuring that no new users
  get merged.

  As a result, linux-5.6, or my backport of the patches to 5.4 [1],
  should be the first release that can serve as a base for a 32-bit
  system designed to run beyond year 2038, with a few remaining caveats:

   - All user space must be compiled with a 64-bit time_t, which will be
     supported in the coming musl-1.2 and glibc-2.32 releases, along
     with installed kernel headers from linux-5.6 or higher.

   - Applications that use the system call interfaces directly need to
     be ported to use the time64 syscalls added in linux-5.1 in place of
     the existing system calls. This impacts most users of futex() and
     seccomp() as well as programming languages that have their own
     runtime environment not based on libc.

   - Applications that use a private copy of kernel uapi header files or
     their contents may need to update to the linux-5.6 version, in
     particular for sound/asound.h, xfs/xfs_fs.h, linux/input.h,
     linux/elfcore.h, linux/sockios.h, linux/timex.h and
     linux/can/bcm.h.

   - A few remaining interfaces cannot be changed to pass a 64-bit
     time_t in a compatible way, so they must be configured to use
     CLOCK_MONOTONIC times or (with a y2106 problem) unsigned 32-bit
     timestamps. Most importantly this impacts all users of 'struct
     input_event'.

   - All y2038 problems that are present on 64-bit machines also apply
     to 32-bit machines. In particular this affects file systems with
     on-disk timestamps using signed 32-bit seconds: ext4 with
     ext3-style small inodes, ext2, xfs (to be fixed soon) and ufs"

[1] https://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground.git/log/?h=y2038-endgame

* tag 'y2038-drivers-for-v5.6-signed' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground: (21 commits)
  Revert "drm/etnaviv: reject timeouts with tv_nsec >= NSEC_PER_SEC"
  y2038: sh: remove timeval/timespec usage from headers
  y2038: sparc: remove use of struct timex
  y2038: rename itimerval to __kernel_old_itimerval
  y2038: remove obsolete jiffies conversion functions
  nfs: fscache: use timespec64 in inode auxdata
  nfs: fix timstamp debug prints
  nfs: use time64_t internally
  sunrpc: convert to time64_t for expiry
  drm/etnaviv: avoid deprecated timespec
  drm/etnaviv: reject timeouts with tv_nsec >= NSEC_PER_SEC
  drm/msm: avoid using 'timespec'
  hfs/hfsplus: use 64-bit inode timestamps
  hostfs: pass 64-bit timestamps to/from user space
  packet: clarify timestamp overflow
  tsacct: add 64-bit btime field
  acct: stop using get_seconds()
  um: ubd: use 64-bit time_t where possible
  xtensa: ISS: avoid struct timeval
  dlm: use SO_SNDTIMEO_NEW instead of SO_SNDTIMEO_OLD
  ...
2020-01-29 14:55:47 -08:00
Jens Axboe
3e4827b05d io_uring: add support for epoll_ctl(2)
This adds IORING_OP_EPOLL_CTL, which can perform the same work as the
epoll_ctl(2) system call.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-29 15:46:09 -07:00
Linus Torvalds
6aee4badd8 Merge branch 'work.openat2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull openat2 support from Al Viro:
 "This is the openat2() series from Aleksa Sarai.

  I'm afraid that the rest of namei stuff will have to wait - it got
  zero review the last time I'd posted #work.namei, and there had been a
  leak in the posted series I'd caught only last weekend. I was going to
  repost it on Monday, but the window opened and the odds of getting any
  review during that... Oh, well.

  Anyway, openat2 part should be ready; that _did_ get sane amount of
  review and public testing, so here it comes"

From Aleksa's description of the series:
 "For a very long time, extending openat(2) with new features has been
  incredibly frustrating. This stems from the fact that openat(2) is
  possibly the most famous counter-example to the mantra "don't silently
  accept garbage from userspace" -- it doesn't check whether unknown
  flags are present[1].

  This means that (generally) the addition of new flags to openat(2) has
  been fraught with backwards-compatibility issues (O_TMPFILE has to be
  defined as __O_TMPFILE|O_DIRECTORY|[O_RDWR or O_WRONLY] to ensure old
  kernels gave errors, since it's insecure to silently ignore the
  flag[2]). All new security-related flags therefore have a tough road
  to being added to openat(2).

  Furthermore, the need for some sort of control over VFS's path
  resolution (to avoid malicious paths resulting in inadvertent
  breakouts) has been a very long-standing desire of many userspace
  applications.

  This patchset is a revival of Al Viro's old AT_NO_JUMPS[3] patchset
  (which was a variant of David Drysdale's O_BENEATH patchset[4] which
  was a spin-off of the Capsicum project[5]) with a few additions and
  changes made based on the previous discussion within [6] as well as
  others I felt were useful.

  In line with the conclusions of the original discussion of
  AT_NO_JUMPS, the flag has been split up into separate flags. However,
  instead of being an openat(2) flag it is provided through a new
  syscall openat2(2) which provides several other improvements to the
  openat(2) interface (see the patch description for more details). The
  following new LOOKUP_* flags are added:

  LOOKUP_NO_XDEV:

     Blocks all mountpoint crossings (upwards, downwards, or through
     absolute links). Absolute pathnames alone in openat(2) do not
     trigger this. Magic-link traversal which implies a vfsmount jump is
     also blocked (though magic-link jumps on the same vfsmount are
     permitted).

  LOOKUP_NO_MAGICLINKS:

     Blocks resolution through /proc/$pid/fd-style links. This is done
     by blocking the usage of nd_jump_link() during resolution in a
     filesystem. The term "magic-links" is used to match with the only
     reference to these links in Documentation/, but I'm happy to change
     the name.

     It should be noted that this is different to the scope of
     ~LOOKUP_FOLLOW in that it applies to all path components. However,
     you can do openat2(NO_FOLLOW|NO_MAGICLINKS) on a magic-link and it
     will *not* fail (assuming that no parent component was a
     magic-link), and you will have an fd for the magic-link.

     In order to correctly detect magic-links, the introduction of a new
     LOOKUP_MAGICLINK_JUMPED state flag was required.

  LOOKUP_BENEATH:

     Disallows escapes to outside the starting dirfd's
     tree, using techniques such as ".." or absolute links. Absolute
     paths in openat(2) are also disallowed.

     Conceptually this flag is to ensure you "stay below" a certain
     point in the filesystem tree -- but this requires some additional
     to protect against various races that would allow escape using
     "..".

     Currently LOOKUP_BENEATH implies LOOKUP_NO_MAGICLINKS, because it
     can trivially beam you around the filesystem (breaking the
     protection). In future, there might be similar safety checks done
     as in LOOKUP_IN_ROOT, but that requires more discussion.

  In addition, two new flags are added that expand on the above ideas:

  LOOKUP_NO_SYMLINKS:

     Does what it says on the tin. No symlink resolution is allowed at
     all, including magic-links. Just as with LOOKUP_NO_MAGICLINKS this
     can still be used with NOFOLLOW to open an fd for the symlink as
     long as no parent path had a symlink component.

  LOOKUP_IN_ROOT:

     This is an extension of LOOKUP_BENEATH that, rather than blocking
     attempts to move past the root, forces all such movements to be
     scoped to the starting point. This provides chroot(2)-like
     protection but without the cost of a chroot(2) for each filesystem
     operation, as well as being safe against race attacks that
     chroot(2) is not.

     If a race is detected (as with LOOKUP_BENEATH) then an error is
     generated, and similar to LOOKUP_BENEATH it is not permitted to
     cross magic-links with LOOKUP_IN_ROOT.

     The primary need for this is from container runtimes, which
     currently need to do symlink scoping in userspace[7] when opening
     paths in a potentially malicious container.

     There is a long list of CVEs that could have bene mitigated by
     having RESOLVE_THIS_ROOT (such as CVE-2017-1002101,
     CVE-2017-1002102, CVE-2018-15664, and CVE-2019-5736, just to name a
     few).

  In order to make all of the above more usable, I'm working on
  libpathrs[8] which is a C-friendly library for safe path resolution.
  It features a userspace-emulated backend if the kernel doesn't support
  openat2(2). Hopefully we can get userspace to switch to using it, and
  thus get openat2(2) support for free once it's ready.

  Future work would include implementing things like
  RESOLVE_NO_AUTOMOUNT and possibly a RESOLVE_NO_REMOTE (to allow
  programs to be sure they don't hit DoSes though stale NFS handles)"

* 'work.openat2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  Documentation: path-lookup: include new LOOKUP flags
  selftests: add openat2(2) selftests
  open: introduce openat2(2) syscall
  namei: LOOKUP_{IN_ROOT,BENEATH}: permit limited ".." resolution
  namei: LOOKUP_IN_ROOT: chroot-like scoped resolution
  namei: LOOKUP_BENEATH: O_BENEATH-like scoped resolution
  namei: LOOKUP_NO_XDEV: block mountpoint crossing
  namei: LOOKUP_NO_MAGICLINKS: block magic-link resolution
  namei: LOOKUP_NO_SYMLINKS: block symlink resolution
  namei: allow set_root() to produce errors
  namei: allow nd_jump_link() to produce errors
  nsfs: clean-up ns_get_path() signature to return int
  namei: only return -ECHILD from follow_dotdot_rcu()
2020-01-29 11:20:24 -08:00
Linus Torvalds
7ba31c3f2f Staging/IIO patches for 5.6-rc1
Here is the big staging/iio driver patches for 5.6-rc1
 
 Included in here are:
 	- lots of new IIO drivers and updates for that subsystem
 	- the usual huge quantity of minor cleanups for staging drivers
 	- removal of the following staging drivers:
 		- isdn/avm
 		- isdn/gigaset
 		- isdn/hysdn
 		- octeon-usb
 		- octeon ethernet
 
 Overall we deleted far more lines than we added, removing over 40k of
 old and obsolete driver code.
 
 All of these changes have been in linux-next for a while with no
 reported issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXjFOKw8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+yly3wCfac6fbfrpwZ2VeUFyT5EJFr9JnKEAn1VMQTIJ
 QCgCqbQemnXfbOXiA5pZ
 =rP6a
 -----END PGP SIGNATURE-----

Merge tag 'staging-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging

Pull staging and IIO updates from Greg KH:
 "Here is the big staging/iio driver patches for 5.6-rc1

  Included in here are:

   - lots of new IIO drivers and updates for that subsystem

   - the usual huge quantity of minor cleanups for staging drivers

   - removal of the following staging drivers:
       - isdn/avm
       - isdn/gigaset
       - isdn/hysdn
       - octeon-usb
       - octeon ethernet

  Overall we deleted far more lines than we added, removing over 40k of
  old and obsolete driver code.

  All of these changes have been in linux-next for a while with no
  reported issues"

* tag 'staging-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (353 commits)
  staging: most: usb: check for NULL device
  staging: next: configfs: fix release link
  staging: most: core: fix logging messages
  staging: most: core: remove container struct
  staging: most: remove struct device core driver
  staging: most: core: drop device reference
  staging: most: remove device from interface structure
  staging: comedi: drivers: fix spelling mistake "to" -> "too"
  staging: exfat: remove fs_func struct.
  staging: wilc1000: avoid mutex unlock without lock in wilc_wlan_handle_txq()
  staging: wilc1000: return zero on success and non-zero on function failure
  staging: axis-fifo: replace spinlock with mutex
  staging: wilc1000: remove unused code prior to throughput enhancement in SPI
  staging: wilc1000: added 'wilc_' prefix for 'struct assoc_resp' name
  staging: wilc1000: move firmware API struct's to separate header file
  staging: wilc1000: remove use of infinite loop conditions
  staging: kpc2000: rename variables with kpc namespace
  staging: vt6656: Remove memory buffer from vnt_download_firmware.
  staging: vt6656: Just check NEWRSR_DECRYPTOK for RX_FLAG_DECRYPTED.
  staging: vt6656: Use vnt_rx_tail struct for tail variables.
  ...
2020-01-29 10:15:11 -08:00
Jens Axboe
75c6a03904 io_uring: support using a registered personality for commands
For personalities previously registered via IORING_REGISTER_PERSONALITY,
allow any command to select them. This is done through setting
sqe->personality to the id returned from registration, and then flagging
sqe->flags with IOSQE_PERSONALITY.

Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-28 17:45:31 -07:00
Jens Axboe
071698e13a io_uring: allow registering credentials
If an application wants to use a ring with different kinds of
credentials, it can register them upfront. We don't lookup credentials,
the credentials of the task calling IORING_REGISTER_PERSONALITY is used.

An 'id' is returned for the application to use in subsequent personality
support.

Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-28 17:44:44 -07:00
Pavel Begunkov
24369c2e3b io_uring: add io-wq workqueue sharing
If IORING_SETUP_ATTACH_WQ is set, it expects wq_fd in io_uring_params to
be a valid io_uring fd io-wq of which will be shared with the newly
created io_uring instance. If the flag is set but it can't share io-wq,
it fails.

This allows creation of "sibling" io_urings, where we prefer to keep the
SQ/CQ private, but want to share the async backend to minimize the amount
of overhead associated with having multiple rings that belong to the same
backend.

Reported-by: Jens Axboe <axboe@kernel.dk>
Reported-by: Daurnimator <quae@daurnimator.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-28 17:44:41 -07:00
Jens Axboe
cccf0ee834 io_uring/io-wq: don't use static creds/mm assignments
We currently setup the io_wq with a static set of mm and creds. Even for
a single-use io-wq per io_uring, this is suboptimal as we have may have
multiple enters of the ring. For sharing the io-wq backend, it doesn't
work at all.

Switch to passing in the creds and mm when the work item is setup. This
means that async work is no longer deferred to the io_uring mm and creds,
it is done with the current mm and creds.

Flag this behavior with IORING_FEAT_CUR_PERSONALITY, so applications know
they can rely on the current personality (mm and creds) being the same
for direct issue and async issue.

Reviewed-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-28 17:44:20 -07:00
Linus Torvalds
bd2463ac7d Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from David Miller:

 1) Add WireGuard

 2) Add HE and TWT support to ath11k driver, from John Crispin.

 3) Add ESP in TCP encapsulation support, from Sabrina Dubroca.

 4) Add variable window congestion control to TIPC, from Jon Maloy.

 5) Add BCM84881 PHY driver, from Russell King.

 6) Start adding netlink support for ethtool operations, from Michal
    Kubecek.

 7) Add XDP drop and TX action support to ena driver, from Sameeh
    Jubran.

 8) Add new ipv4 route notifications so that mlxsw driver does not have
    to handle identical routes itself. From Ido Schimmel.

 9) Add BPF dynamic program extensions, from Alexei Starovoitov.

10) Support RX and TX timestamping in igc, from Vinicius Costa Gomes.

11) Add support for macsec HW offloading, from Antoine Tenart.

12) Add initial support for MPTCP protocol, from Christoph Paasch,
    Matthieu Baerts, Florian Westphal, Peter Krystad, and many others.

13) Add Octeontx2 PF support, from Sunil Goutham, Geetha sowjanya, Linu
    Cherian, and others.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1469 commits)
  net: phy: add default ARCH_BCM_IPROC for MDIO_BCM_IPROC
  udp: segment looped gso packets correctly
  netem: change mailing list
  qed: FW 8.42.2.0 debug features
  qed: rt init valid initialization changed
  qed: Debug feature: ilt and mdump
  qed: FW 8.42.2.0 Add fw overlay feature
  qed: FW 8.42.2.0 HSI changes
  qed: FW 8.42.2.0 iscsi/fcoe changes
  qed: Add abstraction for different hsi values per chip
  qed: FW 8.42.2.0 Additional ll2 type
  qed: Use dmae to write to widebus registers in fw_funcs
  qed: FW 8.42.2.0 Parser offsets modified
  qed: FW 8.42.2.0 Queue Manager changes
  qed: FW 8.42.2.0 Expose new registers and change windows
  qed: FW 8.42.2.0 Internal ram offsets modifications
  MAINTAINERS: Add entry for Marvell OcteonTX2 Physical Function driver
  Documentation: net: octeontx2: Add RVU HW and drivers overview
  octeontx2-pf: ethtool RSS config support
  octeontx2-pf: Add basic ethtool support
  ...
2020-01-28 16:02:33 -08:00
Linus Torvalds
a78208e243 Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto updates from Herbert Xu:
 "API:
   - Removed CRYPTO_TFM_RES flags
   - Extended spawn grabbing to all algorithm types
   - Moved hash descsize verification into API code

  Algorithms:
   - Fixed recursive pcrypt dead-lock
   - Added new 32 and 64-bit generic versions of poly1305
   - Added cryptogams implementation of x86/poly1305

  Drivers:
   - Added support for i.MX8M Mini in caam
   - Added support for i.MX8M Nano in caam
   - Added support for i.MX8M Plus in caam
   - Added support for A33 variant of SS in sun4i-ss
   - Added TEE support for Raven Ridge in ccp
   - Added in-kernel API to submit TEE commands in ccp
   - Added AMD-TEE driver
   - Added support for BCM2711 in iproc-rng200
   - Added support for AES256-GCM based ciphers for chtls
   - Added aead support on SEC2 in hisilicon"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (244 commits)
  crypto: arm/chacha - fix build failured when kernel mode NEON is disabled
  crypto: caam - add support for i.MX8M Plus
  crypto: x86/poly1305 - emit does base conversion itself
  crypto: hisilicon - fix spelling mistake "disgest" -> "digest"
  crypto: chacha20poly1305 - add back missing test vectors and test chunking
  crypto: x86/poly1305 - fix .gitignore typo
  tee: fix memory allocation failure checks on drv_data and amdtee
  crypto: ccree - erase unneeded inline funcs
  crypto: ccree - make cc_pm_put_suspend() void
  crypto: ccree - split overloaded usage of irq field
  crypto: ccree - fix PM race condition
  crypto: ccree - fix FDE descriptor sequence
  crypto: ccree - cc_do_send_request() is void func
  crypto: ccree - fix pm wrongful error reporting
  crypto: ccree - turn errors to debug msgs
  crypto: ccree - fix AEAD decrypt auth fail
  crypto: ccree - fix typo in comment
  crypto: ccree - fix typos in error msgs
  crypto: atmel-{aes,sha,tdes} - Retire crypto_platform_data
  crypto: x86/sha - Eliminate casts on asm implementations
  ...
2020-01-28 15:38:56 -08:00
Linus Torvalds
f0d8744143 fscrypt updates for 5.6
- Extend the FS_IOC_ADD_ENCRYPTION_KEY ioctl to allow the raw key to be
   provided via a keyring key.
 
 - Prepare for the new dirhash method (SipHash of plaintext name) that
   will be used by directories that are both encrypted and casefolded.
 
 - Switch to a new format for "no-key names" that prepares for the new
   dirhash method, and also fixes a longstanding bug where multiple
   filenames could map to the same no-key name.
 
 - Allow the crypto algorithms used by fscrypt to be built as loadable
   modules when the fscrypt-capable filesystems are.
 
 - Optimize fscrypt_zeroout_range().
 
 - Various cleanups.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQSacvsUNc7UX4ntmEPzXCl4vpKOKwUCXi+OfxQcZWJpZ2dlcnNA
 Z29vZ2xlLmNvbQAKCRDzXCl4vpKOK0tJAQDkxUl11NRqVgS06TLWAniKVMWh3kaL
 CaonjEmfs1m6kgEA+TP8coTAxUvJNubaHz3J3x9dyx9RPUVFyUzSB0J/TQg=
 =XkXJ
 -----END PGP SIGNATURE-----

Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt

Pull fscrypt updates from Eric Biggers:

 - Extend the FS_IOC_ADD_ENCRYPTION_KEY ioctl to allow the raw key to be
   provided via a keyring key.

 - Prepare for the new dirhash method (SipHash of plaintext name) that
   will be used by directories that are both encrypted and casefolded.

 - Switch to a new format for "no-key names" that prepares for the new
   dirhash method, and also fixes a longstanding bug where multiple
   filenames could map to the same no-key name.

 - Allow the crypto algorithms used by fscrypt to be built as loadable
   modules when the fscrypt-capable filesystems are.

 - Optimize fscrypt_zeroout_range().

 - Various cleanups.

* tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt: (26 commits)
  fscrypt: improve format of no-key names
  ubifs: allow both hash and disk name to be provided in no-key names
  ubifs: don't trigger assertion on invalid no-key filename
  fscrypt: clarify what is meant by a per-file key
  fscrypt: derive dirhash key for casefolded directories
  fscrypt: don't allow v1 policies with casefolding
  fscrypt: add "fscrypt_" prefix to fname_encrypt()
  fscrypt: don't print name of busy file when removing key
  ubifs: use IS_ENCRYPTED() instead of ubifs_crypt_is_encrypted()
  fscrypt: document gfp_flags for bounce page allocation
  fscrypt: optimize fscrypt_zeroout_range()
  fscrypt: remove redundant bi_status check
  fscrypt: Allow modular crypto algorithms
  fscrypt: include <linux/ioctl.h> in UAPI header
  fscrypt: don't check for ENOKEY from fscrypt_get_encryption_info()
  fscrypt: remove fscrypt_is_direct_key_policy()
  fscrypt: move fscrypt_valid_enc_modes() to policy.c
  fscrypt: check for appropriate use of DIRECT_KEY flag earlier
  fscrypt: split up fscrypt_supported_policy() by policy version
  fscrypt: introduce fscrypt_needs_contents_encryption()
  ...
2020-01-28 15:22:21 -08:00
Mike Christie
8d19f1c8e1
prctl: PR_{G,S}ET_IO_FLUSHER to support controlling memory reclaim
There are several storage drivers like dm-multipath, iscsi, tcmu-runner,
amd nbd that have userspace components that can run in the IO path. For
example, iscsi and nbd's userspace deamons may need to recreate a socket
and/or send IO on it, and dm-multipath's daemon multipathd may need to
send SG IO or read/write IO to figure out the state of paths and re-set
them up.

In the kernel these drivers have access to GFP_NOIO/GFP_NOFS and the
memalloc_*_save/restore functions to control the allocation behavior,
but for userspace we would end up hitting an allocation that ended up
writing data back to the same device we are trying to allocate for.
The device is then in a state of deadlock, because to execute IO the
device needs to allocate memory, but to allocate memory the memory
layers want execute IO to the device.

Here is an example with nbd using a local userspace daemon that performs
network IO to a remote server. We are using XFS on top of the nbd device,
but it can happen with any FS or other modules layered on top of the nbd
device that can write out data to free memory.  Here a nbd daemon helper
thread, msgr-worker-1, is performing a write/sendmsg on a socket to execute
a request. This kicks off a reclaim operation which results in a WRITE to
the nbd device and the nbd thread calling back into the mm layer.

[ 1626.609191] msgr-worker-1   D    0  1026      1 0x00004000
[ 1626.609193] Call Trace:
[ 1626.609195]  ? __schedule+0x29b/0x630
[ 1626.609197]  ? wait_for_completion+0xe0/0x170
[ 1626.609198]  schedule+0x30/0xb0
[ 1626.609200]  schedule_timeout+0x1f6/0x2f0
[ 1626.609202]  ? blk_finish_plug+0x21/0x2e
[ 1626.609204]  ? _xfs_buf_ioapply+0x2e6/0x410
[ 1626.609206]  ? wait_for_completion+0xe0/0x170
[ 1626.609208]  wait_for_completion+0x108/0x170
[ 1626.609210]  ? wake_up_q+0x70/0x70
[ 1626.609212]  ? __xfs_buf_submit+0x12e/0x250
[ 1626.609214]  ? xfs_bwrite+0x25/0x60
[ 1626.609215]  xfs_buf_iowait+0x22/0xf0
[ 1626.609218]  __xfs_buf_submit+0x12e/0x250
[ 1626.609220]  xfs_bwrite+0x25/0x60
[ 1626.609222]  xfs_reclaim_inode+0x2e8/0x310
[ 1626.609224]  xfs_reclaim_inodes_ag+0x1b6/0x300
[ 1626.609227]  xfs_reclaim_inodes_nr+0x31/0x40
[ 1626.609228]  super_cache_scan+0x152/0x1a0
[ 1626.609231]  do_shrink_slab+0x12c/0x2d0
[ 1626.609233]  shrink_slab+0x9c/0x2a0
[ 1626.609235]  shrink_node+0xd7/0x470
[ 1626.609237]  do_try_to_free_pages+0xbf/0x380
[ 1626.609240]  try_to_free_pages+0xd9/0x1f0
[ 1626.609245]  __alloc_pages_slowpath+0x3a4/0xd30
[ 1626.609251]  ? ___slab_alloc+0x238/0x560
[ 1626.609254]  __alloc_pages_nodemask+0x30c/0x350
[ 1626.609259]  skb_page_frag_refill+0x97/0xd0
[ 1626.609274]  sk_page_frag_refill+0x1d/0x80
[ 1626.609279]  tcp_sendmsg_locked+0x2bb/0xdd0
[ 1626.609304]  tcp_sendmsg+0x27/0x40
[ 1626.609307]  sock_sendmsg+0x54/0x60
[ 1626.609308]  ___sys_sendmsg+0x29f/0x320
[ 1626.609313]  ? sock_poll+0x66/0xb0
[ 1626.609318]  ? ep_item_poll.isra.15+0x40/0xc0
[ 1626.609320]  ? ep_send_events_proc+0xe6/0x230
[ 1626.609322]  ? hrtimer_try_to_cancel+0x54/0xf0
[ 1626.609324]  ? ep_read_events_proc+0xc0/0xc0
[ 1626.609326]  ? _raw_write_unlock_irq+0xa/0x20
[ 1626.609327]  ? ep_scan_ready_list.constprop.19+0x218/0x230
[ 1626.609329]  ? __hrtimer_init+0xb0/0xb0
[ 1626.609331]  ? _raw_spin_unlock_irq+0xa/0x20
[ 1626.609334]  ? ep_poll+0x26c/0x4a0
[ 1626.609337]  ? tcp_tsq_write.part.54+0xa0/0xa0
[ 1626.609339]  ? release_sock+0x43/0x90
[ 1626.609341]  ? _raw_spin_unlock_bh+0xa/0x20
[ 1626.609342]  __sys_sendmsg+0x47/0x80
[ 1626.609347]  do_syscall_64+0x5f/0x1c0
[ 1626.609349]  ? prepare_exit_to_usermode+0x75/0xa0
[ 1626.609351]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

This patch adds a new prctl command that daemons can use after they have
done their initial setup, and before they start to do allocations that
are in the IO path. It sets the PF_MEMALLOC_NOIO and PF_LESS_THROTTLE
flags so both userspace block and FS threads can use it to avoid the
allocation recursion and try to prevent from being throttled while
writing out data to free up memory.

Signed-off-by: Mike Christie <mchristi@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Tested-by: Masato Suzuki <masato.suzuki@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Link: https://lore.kernel.org/r/20191112001900.9206-1-mchristi@redhat.com
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-01-28 10:09:51 +01:00
Linus Torvalds
e279160f49 The timekeeping and timers departement provides:
- Time namespace support:
 
     If a container migrates from one host to another then it expects that
     clocks based on MONOTONIC and BOOTTIME are not subject to
     disruption. Due to different boot time and non-suspended runtime these
     clocks can differ significantly on two hosts, in the worst case time
     goes backwards which is a violation of the POSIX requirements.
 
     The time namespace addresses this problem. It allows to set offsets for
     clock MONOTONIC and BOOTTIME once after creation and before tasks are
     associated with the namespace. These offsets are taken into account by
     timers and timekeeping including the VDSO.
 
     Offsets for wall clock based clocks (REALTIME/TAI) are not provided by
     this mechanism. While in theory possible, the overhead and code
     complexity would be immense and not justified by the esoteric potential
     use cases which were discussed at Plumbers '18.
 
     The overhead for tasks in the root namespace (host time offsets = 0) is
     in the noise and great effort was made to ensure that especially in the
     VDSO. If time namespace is disabled in the kernel configuration the
     code is compiled out.
 
     Kudos to Andrei Vagin and Dmitry Sofanov who implemented this feature
     and kept on for more than a year addressing review comments, finding
     better solutions. A pleasant experience.
 
   - Overhaul of the alarmtimer device dependency handling to ensure that
     the init/suspend/resume ordering is correct.
 
   - A new clocksource/event driver for Microchip PIT64
 
   - Suspend/resume support for the Hyper-V clocksource
 
   - The usual pile of fixes, updates and improvements mostly in the
     driver code.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl4vbTcTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoXT2D/96iJ3G9Snn2khEQP3XS2rYmtDGw7NO
 m1n96falwWeGe6zreU80R2Jge5nLxQtNhRoMPLLee1GpHwRC6lvqEqgdZ4LMBrD2
 JqV7Gzg8Urmdh+hpDsyTCpeEWEzoMKxiFOX8PxwctqUhM4szEe5iQg2YQsg85Jw2
 vG6M93N2xwDILh4rhEMbKjo+5ZmYn7c1RQvpGOSmpKOj940W/N7H2HBsFhdaJ1Kw
 FW5pFv1211PaU5RV2YNb2dMeeMTT1N3e2VN4Dkadoxp47pb+725gNHEBEjmV9poG
 Lp4IhzGAPnj8zVD88icQZSTaK3gUHMClxprJ0Pf84WEtiH7SeGu8BPYyu77+oNDe
 yzcctDJNyCWXkzmaP/fe/HLc0TStbvNAJ5Tagp4BC75gzebeb4/n8RtRT0fKeDYL
 pxpDPKDAPU7p1JSjxiWAtshqjBycWNY3Z49bA7/VhKBhnv8BDyBPGlYd7/4xrbGr
 RK7DQNXJwaJaiNJ7p5PiaFxGzNyB0B9sThD/slSlEInIKb4h9YzWr0TV+NB62VnB
 sDcN+tpLbRPz5/5cHGGfxR0+zKWpfyai8pzbmmaXEaKssjRYwyvcac5EZdgbWpbK
 k7CqAjoWLA2P+tGeePNJOf5JYK6Vmdyh4clmuwM0zOiRJ9NlWUyMf3z7QYILs4RO
 UAI+6opYlZEPAw==
 =x3qT
 -----END PGP SIGNATURE-----

Merge tag 'timers-core-2020-01-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer updates from Thomas Gleixner:
 "The timekeeping and timers departement provides:

   - Time namespace support:

     If a container migrates from one host to another then it expects
     that clocks based on MONOTONIC and BOOTTIME are not subject to
     disruption. Due to different boot time and non-suspended runtime
     these clocks can differ significantly on two hosts, in the worst
     case time goes backwards which is a violation of the POSIX
     requirements.

     The time namespace addresses this problem. It allows to set offsets
     for clock MONOTONIC and BOOTTIME once after creation and before
     tasks are associated with the namespace. These offsets are taken
     into account by timers and timekeeping including the VDSO.

     Offsets for wall clock based clocks (REALTIME/TAI) are not provided
     by this mechanism. While in theory possible, the overhead and code
     complexity would be immense and not justified by the esoteric
     potential use cases which were discussed at Plumbers '18.

     The overhead for tasks in the root namespace (ie where host time
     offsets = 0) is in the noise and great effort was made to ensure
     that especially in the VDSO. If time namespace is disabled in the
     kernel configuration the code is compiled out.

     Kudos to Andrei Vagin and Dmitry Sofanov who implemented this
     feature and kept on for more than a year addressing review
     comments, finding better solutions. A pleasant experience.

   - Overhaul of the alarmtimer device dependency handling to ensure
     that the init/suspend/resume ordering is correct.

   - A new clocksource/event driver for Microchip PIT64

   - Suspend/resume support for the Hyper-V clocksource

   - The usual pile of fixes, updates and improvements mostly in the
     driver code"

* tag 'timers-core-2020-01-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (71 commits)
  alarmtimer: Make alarmtimer_get_rtcdev() a stub when CONFIG_RTC_CLASS=n
  alarmtimer: Use wakeup source from alarmtimer platform device
  alarmtimer: Make alarmtimer platform device child of RTC device
  alarmtimer: Update alarmtimer_get_rtcdev() docs to reflect reality
  hrtimer: Add missing sparse annotation for __run_timer()
  lib/vdso: Only read hrtimer_res when needed in __cvdso_clock_getres()
  MIPS: vdso: Define BUILD_VDSO32 when building a 32bit kernel
  clocksource/drivers/hyper-v: Set TSC clocksource as default w/ InvariantTSC
  clocksource/drivers/hyper-v: Untangle stimers and timesync from clocksources
  clocksource/drivers/timer-microchip-pit64b: Fix sparse warning
  clocksource/drivers/exynos_mct: Rename Exynos to lowercase
  clocksource/drivers/timer-ti-dm: Fix uninitialized pointer access
  clocksource/drivers/timer-ti-dm: Switch to platform_get_irq
  clocksource/drivers/timer-ti-dm: Convert to devm_platform_ioremap_resource
  clocksource/drivers/em_sti: Fix variable declaration in em_sti_probe
  clocksource/drivers/em_sti: Convert to devm_platform_ioremap_resource
  clocksource/drivers/bcm2835_timer: Fix memory leak of timer
  clocksource/drivers/cadence-ttc: Use ttc driver as platform driver
  clocksource/drivers/timer-microchip-pit64b: Add Microchip PIT64B support
  clocksource/drivers/hyper-v: Reserve PAGE_SIZE space for tsc page
  ...
2020-01-27 16:47:05 -08:00
Linus Torvalds
22a8f39c52 for-5.6/drivers-2020-01-27
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl4vOrAQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgph+HD/9bM9CqjchDitL0NQne+4BwBdoRCcik0z7n
 y/CrNIh3tZnkJO0fT9Lz6GD/6iZNU93NUHFMOgzuS+8mR5CwQUkR/xjDvPX8H07F
 h+Xl8ZUX6YjbuLmO0sgc9yu3SkMaxjCHfGPl1juZXwH6ERM6MTSkg6O+YwQZnvAB
 lLJWaa1oOTQAsbnz7ZVwZ5pDOfkoSirCat2kzPoyfzptcIrUw7vfu4QHdCdNHy63
 eT/vcHmj6CqzZWJRfpkaFOY6fnY30Hh9fqAVQvzxHPvm1vM3z7JSw7cY8t+cjXjn
 TJ0NQK2QFmGTTa/ZEf3KCB5kbNV0SpOV6Jqz1aBX/cStQez6ygFkPGscPbwy8tsR
 vBVDyCMZC42jbt7TuIHNkAI/e+HqSOBgyB8MaWaQfApcbNzTIFp9lltrTcZpaYNZ
 J4R6YQGDve+ElUlOAPBbiXRGrd3jmhApP8scbgls05UwZOtDf+KJBCLQYRzw8qrb
 J7D7hVugwV0oDhdaUkd4Pt3KYoISsFgIe7HRuKGGmfKyqWiJ5iLH0QVPaEkPAokr
 VzzSoex+5xcCSvIiGd1DNzsVD9C2xbyUvifHTa36pYKQ65BogyJBopgYgEYd8ksN
 AlmPxJM9j1o85TtV1CAbb2O0827BlLmYLc6BcdD+s0x+FeStdnjwICQooHiitTiI
 hEHajSDujQ==
 =Us3h
 -----END PGP SIGNATURE-----

Merge tag 'for-5.6/drivers-2020-01-27' of git://git.kernel.dk/linux-block

Pull block driver updates from Jens Axboe:
 "Like the core side, not a lot of changes here, just two main items:

   - Series of patches (via Coly) with fixes for bcache (Coly,
     Christoph)

   - MD pull request from Song"

* tag 'for-5.6/drivers-2020-01-27' of git://git.kernel.dk/linux-block: (31 commits)
  bcache: reap from tail of c->btree_cache in bch_mca_scan()
  bcache: reap c->btree_cache_freeable from the tail in bch_mca_scan()
  bcache: remove member accessed from struct btree
  bcache: print written and keys in trace_bcache_btree_write
  bcache: avoid unnecessary btree nodes flushing in btree_flush_write()
  bcache: add code comments for state->pool in __btree_sort()
  lib: crc64: include <linux/crc64.h> for 'crc64_be'
  bcache: use read_cache_page_gfp to read the superblock
  bcache: store a pointer to the on-disk sb in the cache and cached_dev structures
  bcache: return a pointer to the on-disk sb from read_super
  bcache: transfer the sb_page reference to register_{bdev,cache}
  bcache: fix use-after-free in register_bcache()
  bcache: properly initialize 'path' and 'err' in register_bcache()
  bcache: rework error unwinding in register_bcache
  bcache: use a separate data structure for the on-disk super block
  bcache: cached_dev_free needs to put the sb page
  md/raid1: introduce wait_for_serialization
  md/raid1: use bucket based mechanism for IO serialization
  md: introduce a new struct for IO serialization
  md: don't destroy serial_info_pool if serialize_policy is true
  ...
2020-01-27 12:55:48 -08:00
Linus Torvalds
a5b871c91d dmaengine updates for v5.6-rc1
- Core:
    - Support for dynamic channels
    - Removal of various slave wrappers
    - Make few slave request APIs as private to dmaengine
    - Symlinks between channels and slaves
    - Support for hotplug of controllers
    - Support for metadata_ops for dma_async_tx_descriptor
    - Reporting DMA cached data amount
    - Virtual dma channel locking updates
 
  - New drivers/device/feature support support:
    - Driver for Intel data accelerators
    - Driver for TI K3 UDMA
    - Driver for PLX DMA engine
    - Driver for hisilicon Kunpeng DMA engine
    - Support for eDMA support for QorIQ LS1028A in fsl edma driver
    - Support for cyclic dma in sun4i driver
    - Support for X1830 in JZ4780 driver
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+vs47OPLdNbVcHzyfBQHDyUjg0cFAl4u+QkACgkQfBQHDyUj
 g0cCcg//awBruofTHIrBOwHmCX1a09mw5WmkFG48N7tYp4fvaI1aOgs3hH9PZiBG
 fFZUktodwYpEKg6JJOfm1RnLBuKm0+3zmaKGPdK1RcbaDURh8G9qhW65f4mfImvB
 GXlgw59WKtgPAM9zWW9UxjugAk4DBte5xVKYJUsI0t4P7k9TM4i0Fv0VmMUhhDuo
 buPD1cM/GWFHbE7OYJ51aGRtrOHV1nPgQaHBkWaT7EotzGsZ3gtWYzteI3BRXRV/
 IkSgxOefMkIgu1j3KIxFZ1CJDHCZSnx2B+AEMCcp63osyeHBOYoL7KQxo6tBjaRV
 fbCasbkTkvvJUjyZdtOdU2wqf7ZqoDkD+n5nkpENf4G1M8J5RiHmrFq96m3HRonE
 V1bmMslXhsJlvtoT6ec2iJFchiq0nx1XHyST6faUOK+0cd1lzbogWwztydQH4fwd
 TxfEd+eYlFFu3lGDfRp14Tz7fAcFNPZ2bJQhZkF6RpwUW3y3L0cJc3Y0AcWmNkvJ
 oStvTlbbUvgRgO7rvEyAmdPb31lE6PLaA0WCahcvf4zQxxNMyYyaWP73MegvqJGO
 pfJXBOWBTTKwu0fDR5UHJd3tEDABvcZnwBaCSYrpI5f9bJ4NRI3f4DIMwLBnw9IK
 aH6pzwo4gTAMuvxzq8KeTp3hU7kszyUN8q8hiTZlgVozMLKXhQY=
 =mv1v
 -----END PGP SIGNATURE-----

Merge tag 'dmaengine-5.6-rc1' of git://git.infradead.org/users/vkoul/slave-dma

Pull dmaengine updates from Vinod Koul:
 "This time we have a bunch of core changes to support dynamic channels,
  hotplug of controllers, new apis for metadata ops etc along with new
  drivers for Intel data accelerators, TI K3 UDMA, PLX DMA engine and
  hisilicon Kunpeng DMA engine. Also usual assorted updates to drivers.

  Core:
   - Support for dynamic channels
   - Removal of various slave wrappers
   - Make few slave request APIs as private to dmaengine
   - Symlinks between channels and slaves
   - Support for hotplug of controllers
   - Support for metadata_ops for dma_async_tx_descriptor
   - Reporting DMA cached data amount
   - Virtual dma channel locking updates

  New drivers/device/feature support support:
   - Driver for Intel data accelerators
   - Driver for TI K3 UDMA
   - Driver for PLX DMA engine
   - Driver for hisilicon Kunpeng DMA engine
   - Support for eDMA support for QorIQ LS1028A in fsl edma driver
   - Support for cyclic dma in sun4i driver
   - Support for X1830 in JZ4780 driver"

* tag 'dmaengine-5.6-rc1' of git://git.infradead.org/users/vkoul/slave-dma: (62 commits)
  dmaengine: Create symlinks between DMA channels and slaves
  dmaengine: hisilicon: Add Kunpeng DMA engine support
  dmaengine: idxd: add char driver to expose submission portal to userland
  dmaengine: idxd: connect idxd to dmaengine subsystem
  dmaengine: idxd: add descriptor manipulation routines
  dmaengine: idxd: add sysfs ABI for idxd driver
  dmaengine: idxd: add configuration component of driver
  dmaengine: idxd: Init and probe for Intel data accelerators
  dmaengine: add support to dynamic register/unregister of channels
  dmaengine: break out channel registration
  x86/asm: add iosubmit_cmds512() based on MOVDIR64B CPU instruction
  dmaengine: ti: k3-udma: fix spelling mistake "limted" -> "limited"
  dmaengine: s3c24xx-dma: fix spelling mistake "to" -> "too"
  dmaengine: Move dma_get_{,any_}slave_channel() to private dmaengine.h
  dmaengine: Remove dma_request_slave_channel_compat() wrapper
  dmaengine: Remove dma_device_satisfies_mask() wrapper
  dt-bindings: fsl-imx-sdma: Add i.MX8MM/i.MX8MN/i.MX8MP compatible string
  dmaengine: zynqmp_dma: fix burst length configuration
  dmaengine: sun4i: Add support for cyclic requests with dedicated DMA
  dmaengine: fsl-qdma: fix duplicated argument to &&
  ...
2020-01-27 10:55:50 -08:00
Linus Torvalds
12fb2b993e Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid
Pull HID updates from Jiri Kosina:
 "This time it's surprisingly quiet (probably due to the christmas
  break):

   - Logitech HID++ protocol improvements from Mazin Rezk, Pedro
     Vanzella and Adrian Freund

   - support for hidraw uniq ioctl from Marcel Holtmann"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
  HID: logitech-hidpp: avoid duplicate error handling code in 'hidpp_probe()'
  hid-logitech-hidpp: read battery voltage from newer devices
  HID: logitech: Add MX Master 3 Mouse
  HID: logitech-hidpp: Support WirelessDeviceStatus connect events
  HID: logitech-hidpp: Support translations from short to long reports
  HID: hidraw: add support uniq ioctl
2020-01-27 10:48:30 -08:00
Michal Kubecek
67bffa7923 ethtool: add WOL_NTF notification
Send ETHTOOL_MSG_WOL_NTF notification whenever wake-on-lan settings of
a device are modified using ETHTOOL_MSG_WOL_SET netlink message or
ETHTOOL_SWOL ioctl request.

As notifications can be received by anyone, do not include SecureOn(tm)
password in notification messages.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-27 11:31:36 +01:00
Michal Kubecek
8d425b19b3 ethtool: set wake-on-lan settings with WOL_SET request
Implement WOL_SET netlink request to set wake-on-lan settings. This is
equivalent to ETHTOOL_SWOL ioctl request.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-27 11:31:36 +01:00
Michal Kubecek
51ea22b04e ethtool: provide WoL settings with WOL_GET request
Implement WOL_GET request to get wake-on-lan settings for a device,
traditionally available via ETHTOOL_GWOL ioctl request.

As part of the implementation, provide symbolic names for wake-on-line
modes as ETH_SS_WOL_MODES string set.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-27 11:31:36 +01:00
Michal Kubecek
0bda7af39d ethtool: add DEBUG_NTF notification
Send ETHTOOL_MSG_DEBUG_NTF notification message whenever debugging message
mask for a device are modified using ETHTOOL_MSG_DEBUG_SET netlink message
or ETHTOOL_SMSGLVL ioctl request.

The notification message has the same format as reply to DEBUG_GET request.
As with other ethtool notifications, netlink requests only trigger the
notification if the mask is actually changed while ioctl request trigger it
whenever the request results in calling the ethtool_ops handler.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-27 11:31:36 +01:00
Michal Kubecek
e54d04e3af ethtool: set message mask with DEBUG_SET request
Implement DEBUG_SET netlink request to set debugging settings for a device.
At the moment, only message mask corresponding to message level as set by
ETHTOOL_SMSGLVL ioctl request can be set. (It is called message level in
ioctl interface but almost all drivers interpret it as a bit mask.)

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-27 11:31:35 +01:00
Michal Kubecek
6a94b8ccf6 ethtool: provide message mask with DEBUG_GET request
Implement DEBUG_GET request to get debugging settings for a device. At the
moment, only message mask corresponding to message level as reported by
ETHTOOL_GMSGLVL ioctl request is provided. (It is called message level in
ioctl interface but almost all drivers interpret it as a bit mask.)

As part of the implementation, provide symbolic names for message mask bits
as ETH_SS_MSG_CLASSES string set.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-27 11:31:35 +01:00
Stefano Brivio
f3a2181e16 netfilter: nf_tables: Support for sets with multiple ranged fields
Introduce a new nested netlink attribute, NFTA_SET_DESC_CONCAT, used
to specify the length of each field in a set concatenation.

This allows set implementations to support concatenation of multiple
ranged items, as they can divide the input key into matching data for
every single field. Such set implementations would be selected as
they specify support for NFT_SET_INTERVAL and allow desc->field_count
to be greater than one. Explicitly disallow this for nft_set_rbtree.

In order to specify the interval for a set entry, userspace would
include in NFTA_SET_DESC_CONCAT attributes field lengths, and pass
range endpoints as two separate keys, represented by attributes
NFTA_SET_ELEM_KEY and NFTA_SET_ELEM_KEY_END.

While at it, export the number of 32-bit registers available for
packet matching, as nftables will need this to know the maximum
number of field lengths that can be specified.

For example, "packets with an IPv4 address between 192.0.2.0 and
192.0.2.42, with destination port between 22 and 25", can be
expressed as two concatenated elements:

  NFTA_SET_ELEM_KEY:            192.0.2.0 . 22
  NFTA_SET_ELEM_KEY_END:        192.0.2.42 . 25

and NFTA_SET_DESC_CONCAT attribute would contain:

  NFTA_LIST_ELEM
    NFTA_SET_FIELD_LEN:		4
  NFTA_LIST_ELEM
    NFTA_SET_FIELD_LEN:		2

v4: No changes
v3: Complete rework, NFTA_SET_DESC_CONCAT instead of NFTA_SET_SUBKEY
v2: No changes

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-01-27 08:54:30 +01:00
Pablo Neira Ayuso
7b225d0b5c netfilter: nf_tables: add NFTA_SET_ELEM_KEY_END attribute
Add NFTA_SET_ELEM_KEY_END attribute to convey the closing element of the
interval between kernel and userspace.

This patch also adds the NFT_SET_EXT_KEY_END extension to store the
closing element value in this interval.

v4: No changes
v3: New patch

[sbrivio: refactor error paths and labels; add corresponding
  nft_set_ext_type for new key; rebase]
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-01-27 08:54:30 +01:00
Abdul Kabbani
32efcc06d2 tcp: export count for rehash attempts
Using IPv6 flow-label to swiftly route around avoid congested or
disconnected network path can greatly improve TCP reliability.

This patch adds SNMP counters and a OPT_STATS counter to track both
host-level and connection-level statistics. Network administrators
can use these counters to evaluate the impact of this new ability better.

Export count for rehash attempts to
1) two SNMP counters: TcpTimeoutRehash (rehash due to timeouts),
   and TcpDuplicateDataRehash (rehash due to receiving duplicate
   packets)
2) Timestamping API SOF_TIMESTAMPING_OPT_STATS.

Signed-off-by: Abdul Kabbani <akabbani@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Kevin(Yudong) Yang <yyd@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-26 15:28:47 +01:00
David S. Miller
4d8773b68e Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Minor conflict in mlx5 because changes happened to code that has
moved meanwhile.

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-26 10:40:21 +01:00
Nikolay Aleksandrov
a580c76d53 net: bridge: vlan: add per-vlan state
The first per-vlan option added is state, it is needed for EVPN and for
per-vlan STP. The state allows to control the forwarding on per-vlan
basis. The vlan state is considered only if the port state is forwarding
in order to avoid conflicts and be consistent. br_allowed_egress is
called only when the state is forwarding, but the ingress case is a bit
more complicated due to the fact that we may have the transition between
port:BR_STATE_FORWARDING -> vlan:BR_STATE_LEARNING which should still
allow the bridge to learn from the packet after vlan filtering and it will
be dropped after that. Also to optimize the pvid state check we keep a
copy in the vlan group to avoid one lookup. The state members are
modified with *_ONCE() to annotate the lockless access.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-24 12:58:14 +01:00
Nikolay Aleksandrov
a5d29ae226 net: bridge: vlan: add basic option setting support
This patch adds support for option modification of single vlans and
ranges. It allows to only modify options, i.e. skip create/delete by
using the BRIDGE_VLAN_INFO_ONLY_OPTS flag. When working with a range
option changes we try to pack the notifications as much as possible.

v2: do full port (all vlans) notification only when creating/deleting
    vlans for compatibility, rework the range detection when changing
    options, add more verbose extack errors and check if a vlan should
    be used (br_vlan_should_use checks)

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-24 12:58:14 +01:00
Dave Jiang
bfe1d56091 dmaengine: idxd: Init and probe for Intel data accelerators
The idxd driver introduces the Intel Data Stream Accelerator [1] that will
be available on future Intel Xeon CPUs. One of the kernel access
point for the driver is through the dmaengine subsystem. It will initially
provide the DMA copy service to the kernel.

Some of the main functionality introduced with this accelerator
are: shared virtual memory (SVM) support, and descriptor submission using
Intel CPU instructions movdir64b and enqcmds. There will be additional
accelerator devices that share the same driver with variations to
capabilities.

This commit introduces the probe and initialization component of the
driver.

[1]: https://software.intel.com/en-us/download/intel-data-streaming-accelerator-preliminary-architecture-specification

Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/157965023991.73301.6186843973135311580.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
2020-01-24 11:18:45 +05:30
Christoph Hellwig
6321bef028 bcache: use read_cache_page_gfp to read the superblock
Avoid a pointless dependency on buffer heads in bcache by simply open
coding reading a single page.  Also add a SB_OFFSET define for the
byte offset of the superblock instead of using magic numbers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-23 11:40:01 -07:00
Christoph Hellwig
a702a692cd bcache: use a separate data structure for the on-disk super block
Split out an on-disk version struct cache_sb with the proper endianness
annotations.  This fixes a fair chunk of sparse warnings, but there are
some left due to the way the checksum is defined.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-23 11:40:00 -07:00
Mohit P. Tahiliani
ec97ecf1eb net: sched: add Flow Queue PIE packet scheduler
Principles:
  - Packets are classified on flows.
  - This is a Stochastic model (as we use a hash, several flows might
                                be hashed to the same slot)
  - Each flow has a PIE managed queue.
  - Flows are linked onto two (Round Robin) lists,
    so that new flows have priority on old ones.
  - For a given flow, packets are not reordered.
  - Drops during enqueue only.
  - ECN capability is off by default.
  - ECN threshold (if ECN is enabled) is at 10% by default.
  - Uses timestamps to calculate queue delay by default.

Usage:
tc qdisc ... fq_pie [ limit PACKETS ] [ flows NUMBER ]
                    [ target TIME ] [ tupdate TIME ]
                    [ alpha NUMBER ] [ beta NUMBER ]
                    [ quantum BYTES ] [ memory_limit BYTES ]
                    [ ecnprob PERCENTAGE ] [ [no]ecn ]
                    [ [no]bytemode ] [ [no_]dq_rate_estimator ]

defaults:
  limit: 10240 packets, flows: 1024
  target: 15 ms, tupdate: 15 ms (in jiffies)
  alpha: 1/8, beta : 5/4
  quantum: device MTU, memory_limit: 32 Mb
  ecnprob: 10%, ecn: off
  bytemode: off, dq_rate_estimator: off

Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in>
Signed-off-by: Sachin D. Patil <sdp.sachin@gmail.com>
Signed-off-by: V. Saicharan <vsaicharan1998@gmail.com>
Signed-off-by: Mohit Bhasi <mohitbhasi1998@gmail.com>
Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Signed-off-by: Gautam Ramakrishnan <gautamramk@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-23 11:38:31 +01:00
David S. Miller
954b3c4397 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Alexei Starovoitov says:

====================
pull-request: bpf-next 2020-01-22

The following pull-request contains BPF updates for your *net-next* tree.

We've added 92 non-merge commits during the last 16 day(s) which contain
a total of 320 files changed, 7532 insertions(+), 1448 deletions(-).

The main changes are:

1) function by function verification and program extensions from Alexei.

2) massive cleanup of selftests/bpf from Toke and Andrii.

3) batched bpf map operations from Brian and Yonghong.

4) tcp congestion control in bpf from Martin.

5) bulking for non-map xdp_redirect form Toke.

6) bpf_send_signal_thread helper from Yonghong.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-23 08:10:16 +01:00
Martin KaFai Lau
5576b991e9 bpf: Add BPF_FUNC_jiffies64
This patch adds a helper to read the 64bit jiffies.  It will be used
in a later patch to implement the bpf_cubic.c.

The helper is inlined for jit_requested and 64 BITS_PER_LONG
as the map_gen_lookup().  Other cases could be considered together
with map_gen_lookup() if needed.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200122233646.903260-1-kafai@fb.com
2020-01-22 16:30:10 -08:00
Alexei Starovoitov
be8704ff07 bpf: Introduce dynamic program extensions
Introduce dynamic program extensions. The users can load additional BPF
functions and replace global functions in previously loaded BPF programs while
these programs are executing.

Global functions are verified individually by the verifier based on their types only.
Hence the global function in the new program which types match older function can
safely replace that corresponding function.

This new function/program is called 'an extension' of old program. At load time
the verifier uses (attach_prog_fd, attach_btf_id) pair to identify the function
to be replaced. The BPF program type is derived from the target program into
extension program. Technically bpf_verifier_ops is copied from target program.
The BPF_PROG_TYPE_EXT program type is a placeholder. It has empty verifier_ops.
The extension program can call the same bpf helper functions as target program.
Single BPF_PROG_TYPE_EXT type is used to extend XDP, SKB and all other program
types. The verifier allows only one level of replacement. Meaning that the
extension program cannot recursively extend an extension. That also means that
the maximum stack size is increasing from 512 to 1024 bytes and maximum
function nesting level from 8 to 16. The programs don't always consume that
much. The stack usage is determined by the number of on-stack variables used by
the program. The verifier could have enforced 512 limit for combined original
plus extension program, but it makes for difficult user experience. The main
use case for extensions is to provide generic mechanism to plug external
programs into policy program or function call chaining.

BPF trampoline is used to track both fentry/fexit and program extensions
because both are using the same nop slot at the beginning of every BPF
function. Attaching fentry/fexit to a function that was replaced is not
allowed. The opposite is true as well. Replacing a function that currently
being analyzed with fentry/fexit is not allowed. The executable page allocated
by BPF trampoline is not used by program extensions. This inefficiency will be
optimized in future patches.

Function by function verification of global function supports scalars and
pointer to context only. Hence program extensions are supported for such class
of global functions only. In the future the verifier will be extended with
support to pointers to structures, arrays with sizes, etc.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20200121005348.2769920-2-ast@kernel.org
2020-01-22 23:04:52 +01:00
Linus Torvalds
dbab40bdb4 io_uring-5.5-2020-01-22
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl4obx8QHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpqNwD/99ae+ezi7LSVj9zQml7y/6ZSV4D3wzD9PJ
 7QsUq5kGA0tisZ9q/rd0eja4Fy2Dw/qhX+GXgTYLt9+a66rp0CskWaD9NWFMtFGp
 eBgitruw5SqFl8GfNCjd6NB/Af3NGyrmQSPV58K7mma6zQX7ELCrEdipCKj5QpNk
 eHO0enZA1KcPliegAbDQfhz7U9frns9nSs0VHB599X9jr5pi8PPejukVEBwK67o6
 Dh52CDqjeKksX8PWxhXau9j8DNt85Zs8ocRFvgWD8/UQSLcHAM6DFLdkeGEHQu9H
 QouW9JRFzmTksy3KvPnCcCPdsYQrqVmj6fCCg6AXW3yOzcI1IvhO+hcNMQtkLFWI
 5JKYZkFhGjCsypmkYpB+5mqcz+fsbkfgN9clU1tvPK8FcmpLolsIQrUJdaRe6r58
 odDe9Qs+I46LAYKttkkAlpYg1E9CD0T7g1ENXzcqb5t6fZTW4oU0Wpqen788WQqz
 EQqp30kU0FgnFAW8BUpJK5iwrrm3RrS+Br6lhk33BeA423Pt6n3RnXYFVvtAHeuA
 jyUVqiMKexi+7fCC2LO1M9xofQMmr6z2nVkZNhDLIr4y9uxD4xTyiaEAFjk6Lws6
 lcSWZMHQPKaCqfxhAtnoVZP96k6zMwfEJUb+fANX9SI0+3p9LHFz2Kp/AOs6GvJC
 /A5vCFjLWw==
 =oNaB
 -----END PGP SIGNATURE-----

Merge tag 'io_uring-5.5-2020-01-22' of git://git.kernel.dk/linux-block

Pull io_uring fix from Jens Axboe:
 "This was supposed to have gone in last week, but due to a brain fart
  on my part, I forgot that we made this struct addition in the 5.5
  cycle. So here it is for 5.5, to prevent having a 32 vs 64-bit
  compatability issue with the files_update command"

* tag 'io_uring-5.5-2020-01-22' of git://git.kernel.dk/linux-block:
  io_uring: fix compat for IORING_REGISTER_FILES_UPDATE
2020-01-22 08:30:09 -08:00
David S. Miller
4f2c17e0f3 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2020-01-21

1) Add support for TCP encapsulation of IKE and ESP messages,
   as defined by RFC 8229. Patchset from Sabrina Dubroca.

Please note that there is a merge conflict in:

net/unix/af_unix.c

between commit:

3c32da19a8 ("unix: Show number of pending scm files of receive queue in fdinfo")

from the net-next tree and commit:

b50b0580d2 ("net: add queue argument to __skb_wait_for_more_packets and __skb_{,try_}recv_datagram")

from the ipsec-next tree.

The conflict can be solved as done in linux-next.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-21 12:18:20 +01:00
Martin Schiller
f362e5fe0f wan/hdlc_x25: make lapb params configurable
This enables you to configure mode (DTE/DCE), Modulo, Window, T1, T2, N2 via
sethdlc (which needs to be patched as well).

Signed-off-by: Martin Schiller <ms@dev.tdt.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-21 11:41:36 +01:00
Pavel Begunkov
6b47ee6eca io_uring: optimise sqe-to-req flags translation
For each IOSQE_* flag there is a corresponding REQ_F_* flag. And there
is a repetitive pattern of their translation:
e.g. if (sqe->flags & SQE_FLAG*) req->flags |= REQ_F_FLAG*

Use same numeric values/bits for them and copy instead of manual
handling.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-20 17:04:07 -07:00
Jens Axboe
66f4af93da io_uring: add support for probing opcodes
The application currently has no way of knowing if a given opcode is
supported or not without having to try and issue one and see if we get
-EINVAL or not. And even this approach is fraught with peril, as maybe
we're getting -EINVAL due to some fields being missing, or maybe it's
just not that easy to issue that particular command without doing some
other leg work in terms of setup first.

This adds IORING_REGISTER_PROBE, which fills in a structure with info
on what it supported or not. This will work even with sparse opcode
fields, which may happen in the future or even today if someone
backports specific features to older kernels.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-20 17:04:06 -07:00
Jens Axboe
cebdb98617 io_uring: add support for IORING_OP_OPENAT2
Add support for the new openat2(2) system call. It's trivial to do, as
we can have openat(2) just be wrapped around it.

Suggested-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-20 17:04:04 -07:00
Jens Axboe
f2842ab5b7 io_uring: enable option to only trigger eventfd for async completions
If an application is using eventfd notifications with poll to know when
new SQEs can be issued, it's expecting the following read/writes to
complete inline. And with that, it knows that there are events available,
and don't want spurious wakeups on the eventfd for those requests.

This adds IORING_REGISTER_EVENTFD_ASYNC, which works just like
IORING_REGISTER_EVENTFD, except it only triggers notifications for events
that happen from async completions (IRQ, or io-wq worker completions).
Any completions inline from the submission itself will not trigger
notifications.

Suggested-by: Mark Papadakis <markuspapadakis@icloud.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-20 17:04:04 -07:00
Jens Axboe
fddafacee2 io_uring: add support for send(2) and recv(2)
This adds IORING_OP_SEND for send(2) support, and IORING_OP_RECV for
recv(2) support.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-20 17:04:04 -07:00
Jens Axboe
8110c1a621 io_uring: add support for IORING_SETUP_CLAMP
Some applications like to start small in terms of ring size, and then
ramp up as needed. This is a bit tricky to do currently, since we don't
advertise the max ring size.

This adds IORING_SETUP_CLAMP. If set, and the values for SQ or CQ ring
size exceed what we support, then clamp them at the max values instead
of returning -EINVAL. Since we return the chosen ring sizes after setup,
no further changes are needed on the application side. io_uring already
changes the ring sizes if the application doesn't ask for power-of-two
sizes, for example.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-20 17:04:02 -07:00
Jens Axboe
c1ca757bd6 io_uring: add IORING_OP_MADVISE
This adds support for doing madvise(2) through io_uring. We assume that
any operation can block, and hence punt everything async. This could be
improved, but hard to make bullet proof. The async punt ensures it's
safe.

Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-20 17:04:02 -07:00
Jens Axboe
4840e418c2 io_uring: add IORING_OP_FADVISE
This adds support for doing fadvise through io_uring. We assume that
WILLNEED doesn't block, but that DONTNEED may block.

Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-20 17:04:01 -07:00
Jens Axboe
ba04291eb6 io_uring: allow use of offset == -1 to mean file position
This behaves like preadv2/pwritev2 with offset == -1, it'll use (and
update) the current file position. This obviously comes with the caveat
that if the application has multiple read/writes in flight, then the
end result will not be as expected. This is similar to threads sharing
a file descriptor and doing IO using the current file position.

Since this feature isn't easily detectable by doing a read or write,
add a feature flags, IORING_FEAT_RW_CUR_POS, to allow applications to
detect presence of this feature.

Reported-by: 李通洲 <carter.li@eoitek.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-20 17:03:59 -07:00
Jens Axboe
3a6820f2bb io_uring: add non-vectored read/write commands
For uses cases that don't already naturally have an iovec, it's easier
(or more convenient) to just use a buffer address + length. This is
particular true if the use case is from languages that want to create
a memory safe abstraction on top of io_uring, and where introducing
the need for the iovec may impose an ownership issue. For those cases,
they currently need an indirection buffer, which means allocating data
just for this purpose.

Add basic read/write that don't require the iovec.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-20 17:03:59 -07:00
Jens Axboe
ce35a47a3a io_uring: add IOSQE_ASYNC
io_uring defaults to always doing inline submissions, if at all
possible. But for larger copies, even if the data is fully cached, that
can take a long time. Add an IOSQE_ASYNC flag that the application can
set on the SQE - if set, it'll ensure that we always go async for those
kinds of requests. Use the io-wq IO_WQ_WORK_CONCURRENT flag to ensure we
get the concurrency we desire for this case.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-20 17:03:59 -07:00
Jens Axboe
eddc7ef52a io_uring: add support for IORING_OP_STATX
This provides support for async statx(2) through io_uring.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-20 17:03:54 -07:00
Jens Axboe
05f3fb3c53 io_uring: avoid ring quiesce for fixed file set unregister and update
We currently fully quiesce the ring before an unregister or update of
the fixed fileset. This is very expensive, and we can be a bit smarter
about this.

Add a percpu refcount for the file tables as a whole. Grab a percpu ref
when we use a registered file, and put it on completion. This is cheap
to do. Upon removal of a file from a set, switch the ref count to atomic
mode. When we hit zero ref on the completion side, then we know we can
drop the previously registered files. When the old files have been
dropped, switch the ref back to percpu mode for normal operation.

Since there's a period between doing the update and the kernel being
done with it, add a IORING_OP_FILES_UPDATE opcode that can perform the
same action. The application knows the update has completed when it gets
the CQE for it. Between doing the update and receiving this completion,
the application must continue to use the unregistered fd if submitting
IO on this particular file.

This takes the runtime of test/file-register from liburing from 14s to
about 0.7s.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-20 17:03:50 -07:00
Jens Axboe
b5dba59e0c io_uring: add support for IORING_OP_CLOSE
This works just like close(2), unsurprisingly. We remove the file
descriptor and post the completion inline, then offload the actual
(potential) last file put to async context.

Mark the async part of this work as uncancellable, as we really must
guarantee that the latter part of the close is run.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-20 17:01:53 -07:00
Jens Axboe
15b71abe7b io_uring: add support for IORING_OP_OPENAT
This works just like openat(2), except it can be performed async. For
the normal case of a non-blocking path lookup this will complete
inline. If we have to do IO to perform the open, it'll be done from
async context.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-20 17:01:53 -07:00
Jens Axboe
d63d1b5edb io_uring: add support for fallocate()
This exposes fallocate(2) through io_uring.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-20 17:01:53 -07:00
Jens Axboe
4d92748373 Merge branch 'io_uring-5.5' into for-5.6/io_uring-vfs
Pull in compatability fix for the files_update command.

* io_uring-5.5:
  io_uring: fix compat for IORING_REGISTER_FILES_UPDATE
2020-01-20 17:01:17 -07:00
Eugene Syromiatnikov
1292e972ff io_uring: fix compat for IORING_REGISTER_FILES_UPDATE
fds field of struct io_uring_files_update is problematic with regards
to compat user space, as pointer size is different in 32-bit, 32-on-64-bit,
and 64-bit user space.  In order to avoid custom handling of compat in
the syscall implementation, make fds __u64 and use u64_to_user_ptr in
order to retrieve it.  Also, align the field naturally and check that
no garbage is passed there.

Fixes: c3a31e6056 ("io_uring: add support for IORING_REGISTER_FILES_UPDATE")
Signed-off-by: Eugene Syromiatnikov <esyr@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-20 17:00:44 -07:00
Jens Axboe
fa7773deb3 Merge branch 'work.openat2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs into for-5.6/io_uring-vfs
Pull in Al's openat2 branch, since we'll need that for the openat2
support.

* 'work.openat2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  Documentation: path-lookup: include new LOOKUP flags
  selftests: add openat2(2) selftests
  open: introduce openat2(2) syscall
  namei: LOOKUP_{IN_ROOT,BENEATH}: permit limited ".." resolution
  namei: LOOKUP_IN_ROOT: chroot-like scoped resolution
  namei: LOOKUP_BENEATH: O_BENEATH-like scoped resolution
  namei: LOOKUP_NO_XDEV: block mountpoint crossing
  namei: LOOKUP_NO_MAGICLINKS: block magic-link resolution
  namei: LOOKUP_NO_SYMLINKS: block symlink resolution
  namei: allow set_root() to produce errors
  namei: allow nd_jump_link() to produce errors
  nsfs: clean-up ns_get_path() signature to return int
  namei: only return -ECHILD from follow_dotdot_rcu()
2020-01-19 19:47:04 -07:00
Dave Airlie
3d4743131b Linux 5.5-rc7
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl4k7i8eHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGvk0IAKRenVOdiudY77SQ
 VZjsteyrYTTQtPPv494ToIRjR0XQ+gYp8vyWzXTUC5Nm9Y9U3VzDqUPUjWszrSXE
 6mU+tzcMc9qwuUxnIFn8zfg64ygw+37sn/w3xqeH4QmF9Z5Wl3EX3SdXTs7jp3RS
 VxiztkUNI5ZBV2GDtla5K/9qLPqCQnUYXIiyi5lAtBtiitZDVXFp7dy7hMgEiaEO
 +78K5Kh3xlt5ndDsBFOlwIb2Oof3KL7bBXntdbSBc/bjol6IRvAgln48HWCv59G2
 jzAp2tj2KobX9GRAEPj+v4TQZEW0SXDNDi8MgQsM+3DYVCTmANsv57CBKRuf01+F
 nB1kAys=
 =zSnJ
 -----END PGP SIGNATURE-----

Backmerge v5.5-rc7 into drm-next

msm needs 5.5-rc4, go to the latest.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2020-01-20 11:42:57 +10:00
David S. Miller
b3f7e3f23a Merge ra.kernel.org:/pub/scm/linux/kernel/git/netdev/net 2020-01-19 22:10:04 +01:00
Aleksa Sarai
fddb5d430a open: introduce openat2(2) syscall
/* Background. */
For a very long time, extending openat(2) with new features has been
incredibly frustrating. This stems from the fact that openat(2) is
possibly the most famous counter-example to the mantra "don't silently
accept garbage from userspace" -- it doesn't check whether unknown flags
are present[1].

This means that (generally) the addition of new flags to openat(2) has
been fraught with backwards-compatibility issues (O_TMPFILE has to be
defined as __O_TMPFILE|O_DIRECTORY|[O_RDWR or O_WRONLY] to ensure old
kernels gave errors, since it's insecure to silently ignore the
flag[2]). All new security-related flags therefore have a tough road to
being added to openat(2).

Userspace also has a hard time figuring out whether a particular flag is
supported on a particular kernel. While it is now possible with
contemporary kernels (thanks to [3]), older kernels will expose unknown
flag bits through fcntl(F_GETFL). Giving a clear -EINVAL during
openat(2) time matches modern syscall designs and is far more
fool-proof.

In addition, the newly-added path resolution restriction LOOKUP flags
(which we would like to expose to user-space) don't feel related to the
pre-existing O_* flag set -- they affect all components of path lookup.
We'd therefore like to add a new flag argument.

Adding a new syscall allows us to finally fix the flag-ignoring problem,
and we can make it extensible enough so that we will hopefully never
need an openat3(2).

/* Syscall Prototype. */
  /*
   * open_how is an extensible structure (similar in interface to
   * clone3(2) or sched_setattr(2)). The size parameter must be set to
   * sizeof(struct open_how), to allow for future extensions. All future
   * extensions will be appended to open_how, with their zero value
   * acting as a no-op default.
   */
  struct open_how { /* ... */ };

  int openat2(int dfd, const char *pathname,
              struct open_how *how, size_t size);

/* Description. */
The initial version of 'struct open_how' contains the following fields:

  flags
    Used to specify openat(2)-style flags. However, any unknown flag
    bits or otherwise incorrect flag combinations (like O_PATH|O_RDWR)
    will result in -EINVAL. In addition, this field is 64-bits wide to
    allow for more O_ flags than currently permitted with openat(2).

  mode
    The file mode for O_CREAT or O_TMPFILE.

    Must be set to zero if flags does not contain O_CREAT or O_TMPFILE.

  resolve
    Restrict path resolution (in contrast to O_* flags they affect all
    path components). The current set of flags are as follows (at the
    moment, all of the RESOLVE_ flags are implemented as just passing
    the corresponding LOOKUP_ flag).

    RESOLVE_NO_XDEV       => LOOKUP_NO_XDEV
    RESOLVE_NO_SYMLINKS   => LOOKUP_NO_SYMLINKS
    RESOLVE_NO_MAGICLINKS => LOOKUP_NO_MAGICLINKS
    RESOLVE_BENEATH       => LOOKUP_BENEATH
    RESOLVE_IN_ROOT       => LOOKUP_IN_ROOT

open_how does not contain an embedded size field, because it is of
little benefit (userspace can figure out the kernel open_how size at
runtime fairly easily without it). It also only contains u64s (even
though ->mode arguably should be a u16) to avoid having padding fields
which are never used in the future.

Note that as a result of the new how->flags handling, O_PATH|O_TMPFILE
is no longer permitted for openat(2). As far as I can tell, this has
always been a bug and appears to not be used by userspace (and I've not
seen any problems on my machines by disallowing it). If it turns out
this breaks something, we can special-case it and only permit it for
openat(2) but not openat2(2).

After input from Florian Weimer, the new open_how and flag definitions
are inside a separate header from uapi/linux/fcntl.h, to avoid problems
that glibc has with importing that header.

/* Testing. */
In a follow-up patch there are over 200 selftests which ensure that this
syscall has the correct semantics and will correctly handle several
attack scenarios.

In addition, I've written a userspace library[4] which provides
convenient wrappers around openat2(RESOLVE_IN_ROOT) (this is necessary
because no other syscalls support RESOLVE_IN_ROOT, and thus lots of care
must be taken when using RESOLVE_IN_ROOT'd file descriptors with other
syscalls). During the development of this patch, I've run numerous
verification tests using libpathrs (showing that the API is reasonably
usable by userspace).

/* Future Work. */
Additional RESOLVE_ flags have been suggested during the review period.
These can be easily implemented separately (such as blocking auto-mount
during resolution).

Furthermore, there are some other proposed changes to the openat(2)
interface (the most obvious example is magic-link hardening[5]) which
would be a good opportunity to add a way for userspace to restrict how
O_PATH file descriptors can be re-opened.

Another possible avenue of future work would be some kind of
CHECK_FIELDS[6] flag which causes the kernel to indicate to userspace
which openat2(2) flags and fields are supported by the current kernel
(to avoid userspace having to go through several guesses to figure it
out).

[1]: https://lwn.net/Articles/588444/
[2]: https://lore.kernel.org/lkml/CA+55aFyyxJL1LyXZeBsf2ypriraj5ut1XkNDsunRBqgVjZU_6Q@mail.gmail.com
[3]: commit 629e014bb8 ("fs: completely ignore unknown open flags")
[4]: https://sourceware.org/bugzilla/show_bug.cgi?id=17523
[5]: https://lore.kernel.org/lkml/20190930183316.10190-2-cyphar@cyphar.com/
[6]: https://youtu.be/ggD-eb3yPVs

Suggested-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-01-18 09:19:18 -05:00
Jeremy Sowden
567d746b55 netfilter: bitwise: add support for shifts.
Hitherto nft_bitwise has only supported boolean operations: NOT, AND, OR
and XOR.  Extend it to do shifts as well.

Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-01-16 15:52:02 +01:00
Jeremy Sowden
779f725e14 netfilter: bitwise: add NFTA_BITWISE_DATA attribute.
Add a new bitwise netlink attribute that will be used by shift
operations to store the size of the shift.  It is not used by boolean
operations.

Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-01-16 15:52:02 +01:00
Jeremy Sowden
9d1f979986 netfilter: bitwise: add NFTA_BITWISE_OP netlink attribute.
Add a new bitwise netlink attribute, NFTA_BITWISE_OP, which is set to a
value of a new enum, nft_bitwise_ops.  It describes the type of
operation an expression contains.  Currently, it only has one value:
NFT_BITWISE_BOOL.  More values will be added later to implement shifts.

Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-01-16 15:51:57 +01:00
Jeremy Sowden
4a7faaf4ad netfilter: nft_bitwise: correct uapi header comment.
The comment documenting how bitwise expressions work includes a table
which summarizes the mask and xor arguments combined to express the
supported boolean operations.  However, the row for OR:

 mask    xor
 0       x

is incorrect.

  dreg = (sreg & 0) ^ x

is not equivalent to:

  dreg = sreg | x

What the code actually does is:

  dreg = (sreg & ~x) ^ x

Update the documentation to match.

Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-01-16 15:51:14 +01:00
David S. Miller
8fec380ac0 This feature/cleanup patchset includes the following patches:
- bump version strings, by Simon Wunderlich
 
  - fix typo and kerneldocs, by Sven Eckelmann
 
  - use WiFi txbitrate for B.A.T.M.A.N. V as fallback, by René Treffer
 
  - silence some endian sparse warnings by adding annotations,
    by Sven Eckelmann
 
  - Update copyright years to 2020, by Sven Eckelmann
 
  - Disable deprecated sysfs configuration by default, by Sven Eckelmann
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEE1ilQI7G+y+fdhnrfoSvjmEKSnqEFAl4dzoQWHHN3QHNpbW9u
 d3VuZGVybGljaC5kZQAKCRChK+OYQpKeodEbD/0fM2O/hx/CsBoWDt9rU+tA/Nh8
 CWYUn9fjghG3RpSCzIjqpcwSe4T09KHw7BvPBiWpVNmUYeNz7aLmujbaj1DmAY8l
 l50VHtJYM1aCIC8O3UcqznrUtoyEdAi78gpxUXLVNABJ9t9gkpXaS3bM06gs5QW4
 7Y5KSpN9O7a7QrLByb8KZUuZrHCSUzrVJHkc3eZ4FqAk3jRw0LjaMIkoQaKImc9D
 wOUTU0rUvuTNBZ9jteChWkfpHwa1QrRgoxYiw5d5u7FT7yOZE1NoQUbm0ZzhdMrO
 QLkcdN8n6gI8waDP47cxwNmSRpRS8S7PIyG2MSi1LNoENUcdHuPuPFAnBbXWyUqc
 QAXSe+JS2MKAT13Tt8+y34WWCtT94K7Hko1SbT4WpawdF1G+DzT2BNI9p+kOJLRV
 Ap7OHTwrZHqkTyQLbMEYdfIPFwMilYEc+O24kjOw6OCSt45F1l9bad4IwvE/iASb
 HNzLO2Jlj5gUMXShbcIzC3PGYSMWpZKoX3075SqOXjcOKfF/jUYfaUt8WSUvi1bA
 CcEMD7FMrVa0mBkhG2WUO9f/ARlfTckhg/ijD4NSDLy7CsjhWXZL7lC9N7R43ktM
 LThTJ+ey1P+K6q1OH4ytvlApSwHC1OdaA6zUhoht9XQDHrnLf6gDpTDd4xn8zKPZ
 BXbGv9bCi3/df6ocCw==
 =xeMk
 -----END PGP SIGNATURE-----

Merge tag 'batadv-next-for-davem-20200114' of git://git.open-mesh.org/linux-merge

Simon Wunderlich says:

====================
This feature/cleanup patchset includes the following patches:

 - bump version strings, by Simon Wunderlich

 - fix typo and kerneldocs, by Sven Eckelmann

 - use WiFi txbitrate for B.A.T.M.A.N. V as fallback, by René Treffer

 - silence some endian sparse warnings by adding annotations,
   by Sven Eckelmann

 - Update copyright years to 2020, by Sven Eckelmann

 - Disable deprecated sysfs configuration by default, by Sven Eckelmann
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-15 23:04:04 +01:00
Yonghong Song
057996380a bpf: Add batch ops to all htab bpf map
htab can't use generic batch support due some problematic behaviours
inherent to the data structre, i.e. while iterating the bpf map  a
concurrent program might delete the next entry that batch was about to
use, in that case there's no easy solution to retrieve the next entry,
the issue has been discussed multiple times (see [1] and [2]).

The only way hmap can be traversed without the problem previously
exposed is by making sure that the map is traversing entire buckets.
This commit implements those strict requirements for hmap, the
implementation follows the same interaction that generic support with
some exceptions:

 - If keys/values buffer are not big enough to traverse a bucket,
   ENOSPC will be returned.
 - out_batch contains the value of the next bucket in the iteration, not
   the next key, but this is transparent for the user since the user
   should never use out_batch for other than bpf batch syscalls.

This commits implements BPF_MAP_LOOKUP_BATCH and adds support for new
command BPF_MAP_LOOKUP_AND_DELETE_BATCH. Note that for update/delete
batch ops it is possible to use the generic implementations.

[1] https://lore.kernel.org/bpf/20190724165803.87470-1-brianvv@google.com/
[2] https://lore.kernel.org/bpf/20190906225434.3635421-1-yhs@fb.com/

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Brian Vazquez <brianvv@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200115184308.162644-6-brianvv@google.com
2020-01-15 14:00:35 -08:00
Brian Vazquez
aa2e93b8e5 bpf: Add generic support for update and delete batch ops
This commit adds generic support for update and delete batch ops that
can be used for almost all the bpf maps. These commands share the same
UAPI attr that lookup and lookup_and_delete batch ops use and the
syscall commands are:

  BPF_MAP_UPDATE_BATCH
  BPF_MAP_DELETE_BATCH

The main difference between update/delete and lookup batch ops is that
for update/delete keys/values must be specified for userspace and
because of that, neither in_batch nor out_batch are used.

Suggested-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Brian Vazquez <brianvv@google.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200115184308.162644-4-brianvv@google.com
2020-01-15 14:00:35 -08:00
Brian Vazquez
cb4d03ab49 bpf: Add generic support for lookup batch op
This commit introduces generic support for the bpf_map_lookup_batch.
This implementation can be used by almost all the bpf maps since its core
implementation is relying on the existing map_get_next_key and
map_lookup_elem. The bpf syscall subcommand introduced is:

  BPF_MAP_LOOKUP_BATCH

The UAPI attribute is:

  struct { /* struct used by BPF_MAP_*_BATCH commands */
         __aligned_u64   in_batch;       /* start batch,
                                          * NULL to start from beginning
                                          */
         __aligned_u64   out_batch;      /* output: next start batch */
         __aligned_u64   keys;
         __aligned_u64   values;
         __u32           count;          /* input/output:
                                          * input: # of key/value
                                          * elements
                                          * output: # of filled elements
                                          */
         __u32           map_fd;
         __u64           elem_flags;
         __u64           flags;
  } batch;

in_batch/out_batch are opaque values use to communicate between
user/kernel space, in_batch/out_batch must be of key_size length.

To start iterating from the beginning in_batch must be null,
count is the # of key/value elements to retrieve. Note that the 'keys'
buffer must be a buffer of key_size * count size and the 'values' buffer
must be value_size * count, where value_size must be aligned to 8 bytes
by userspace if it's dealing with percpu maps. 'count' will contain the
number of keys/values successfully retrieved. Note that 'count' is an
input/output variable and it can contain a lower value after a call.

If there's no more entries to retrieve, ENOENT will be returned. If error
is ENOENT, count might be > 0 in case it copied some values but there were
no more entries to retrieve.

Note that if the return code is an error and not -EFAULT,
count indicates the number of elements successfully processed.

Suggested-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Brian Vazquez <brianvv@google.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200115184308.162644-3-brianvv@google.com
2020-01-15 14:00:35 -08:00
Yonghong Song
8482941f09 bpf: Add bpf_send_signal_thread() helper
Commit 8b401f9ed2 ("bpf: implement bpf_send_signal() helper")
added helper bpf_send_signal() which permits bpf program to
send a signal to the current process. The signal may be
delivered to any threads in the process.

We found a use case where sending the signal to the current
thread is more preferable.
  - A bpf program will collect the stack trace and then
    send signal to the user application.
  - The user application will add some thread specific
    information to the just collected stack trace for
    later analysis.

If bpf_send_signal() is used, user application will need
to check whether the thread receiving the signal matches
the thread collecting the stack by checking thread id.
If not, it will need to send signal to another thread
through pthread_kill().

This patch proposed a new helper bpf_send_signal_thread(),
which sends the signal to the thread corresponding to
the current kernel task. This way, user space is guaranteed that
bpf_program execution context and user space signal handling
context are the same thread.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200115035002.602336-1-yhs@fb.com
2020-01-15 11:44:51 -08:00
Kelvin Cao
4efa1d2e36 PCI/switchtec: Add Gen4 flash information interface support
Add the new flash_info registers struct and the implementation of
ioctl_flash_part_info() for the new Gen4 hardware.

[logang@deltatee.com: rewrote commit message]
Link: https://lore.kernel.org/r/20200115035648.2578-7-logang@deltatee.com
Signed-off-by: Kelvin Cao <kelvin.cao@microchip.com>
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2020-01-15 11:00:39 -06:00
Logan Gunthorpe
fcccd282b6 PCI/switchtec: Rename generation-specific constants
Gen4 hardware will have different values for the SWITCHTEC_X_RUNNING and
SWITCHTEC_IOCTL_NUM_PARTITIONS, so rename them with GEN3 in their name.

No functional changes intended.

Link: https://lore.kernel.org/r/20200115035648.2578-2-logang@deltatee.com
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2020-01-15 11:00:37 -06:00
Logan Gunthorpe
a6b0ef9a7d PCI/switchtec: Add support for Intercomm Notify and Upstream Error Containment
Add support for the Inter Fabric Manager Communication (Intercomm) Notify
event in PAX variants of Switchtec hardware and the Upstream Error
Containment port in the MR1 release of Gen3 firmware.

Link: https://lore.kernel.org/r/20200106190337.2428-4-logang@deltatee.com
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2020-01-15 11:00:27 -06:00
Nikolay Aleksandrov
cf5bddb95c net: bridge: vlan: add rtnetlink group and notify support
Add a new rtnetlink group for bridge vlan notifications - RTNLGRP_BRVLAN
and add support for sending vlan notifications (both single and ranges).
No functional changes intended, the notification support will be used by
later patches.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-15 13:48:18 +01:00
Nikolay Aleksandrov
0ab5587951 net: bridge: vlan: add rtm range support
Add a new vlandb nl attribute - BRIDGE_VLANDB_ENTRY_RANGE which causes
RTM_NEWVLAN/DELVAN to act on a range. Dumps now automatically compress
similar vlans into ranges. This will be also used when per-vlan options
are introduced and vlans' options match, they will be put into a single
range which is encapsulated in one netlink attribute. We need to run
similar checks as br_process_vlan_info() does because these ranges will
be used for options setting and they'll be able to skip
br_process_vlan_info().

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-15 13:48:18 +01:00
Nikolay Aleksandrov
8dcea18708 net: bridge: vlan: add rtm definitions and dump support
This patch adds vlan rtm definitions:
 - NEWVLAN: to be used for creating vlans, setting options and
   notifications
 - DELVLAN: to be used for deleting vlans
 - GETVLAN: used for dumping vlan information

Dumping vlans which can span multiple messages is added now with basic
information (vid and flags). We use nlmsg_parse() to validate the header
length in order to be able to extend the message with filtering
attributes later.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-15 13:48:17 +01:00
John Crispin
5c5e52d1bb nl80211: add handling for BSS color
This patch adds the attributes, policy and parsing code to allow userland
to send the info about the BSS coloring settings to the kernel.

Signed-off-by: John Crispin <john@phrozen.org>
Link: https://lore.kernel.org/r/20191217141921.8114-1-john@phrozen.org
[johannes: remove the strict policy parsing, that was a misunderstanding]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-01-15 11:14:24 +01:00
Ido Schimmel
90b93f1b31 ipv4: Add "offload" and "trap" indications to routes
When performing L3 offload, routes and nexthops are usually programmed
into two different tables in the underlying device. Therefore, the fact
that a nexthop resides in hardware does not necessarily mean that all
the associated routes also reside in hardware and vice-versa.

While the kernel can signal to user space the presence of a nexthop in
hardware (via 'RTNH_F_OFFLOAD'), it does not have a corresponding flag
for routes. In addition, the fact that a route resides in hardware does
not necessarily mean that the traffic is offloaded. For example,
unreachable routes (i.e., 'RTN_UNREACHABLE') are programmed to trap
packets to the CPU so that the kernel will be able to generate the
appropriate ICMP error packet.

This patch adds an "offload" and "trap" indications to IPv4 routes, so
that users will have better visibility into the offload process.

'struct fib_alias' is extended with two new fields that indicate if the
route resides in hardware or not and if it is offloading traffic from
the kernel or trapping packets to it. Note that the new fields are added
in the 6 bytes hole and therefore the struct still fits in a single
cache line [1].

Capable drivers are expected to invoke fib_alias_hw_flags_set() with the
route's key in order to set the flags.

The indications are dumped to user space via a new flags (i.e.,
'RTM_F_OFFLOAD' and 'RTM_F_TRAP') in the 'rtm_flags' field in the
ancillary header.

v2:
* Make use of 'struct fib_rt_info' in fib_alias_hw_flags_set()

[1]
struct fib_alias {
        struct hlist_node  fa_list;                      /*     0    16 */
        struct fib_info *          fa_info;              /*    16     8 */
        u8                         fa_tos;               /*    24     1 */
        u8                         fa_type;              /*    25     1 */
        u8                         fa_state;             /*    26     1 */
        u8                         fa_slen;              /*    27     1 */
        u32                        tb_id;                /*    28     4 */
        s16                        fa_default;           /*    32     2 */
        u8                         offload:1;            /*    34: 0  1 */
        u8                         trap:1;               /*    34: 1  1 */
        u8                         unused:6;             /*    34: 2  1 */

        /* XXX 5 bytes hole, try to pack */

        struct callback_head rcu __attribute__((__aligned__(8))); /*    40    16 */

        /* size: 56, cachelines: 1, members: 12 */
        /* sum members: 50, holes: 1, sum holes: 5 */
        /* sum bitfield members: 8 bits (1 bytes) */
        /* forced alignments: 1, forced holes: 1, sum forced holes: 5 */
        /* last cacheline: 56 bytes */
} __attribute__((__aligned__(8)));

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-14 18:53:35 -08:00
Antoine Tenart
dcb780fb27 net: macsec: add nla support for changing the offloading selection
MACsec offloading to underlying hardware devices is disabled by default
(the software implementation is used). This patch adds support for
changing this setting through the MACsec netlink interface. Many checks
are done when enabling offloading on a given MACsec interface as there
are limitations (it must be supported by the hardware, only a single
interface can be offloaded on a given physical device at a time, rules
can't be moved for now).

Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-14 11:31:41 -08:00
Antoine Tenart
76564261a7 net: macsec: introduce the macsec_context structure
This patch introduces the macsec_context structure. It will be used
in the kernel to exchange information between the common MACsec
implementation (macsec.c) and the MACsec hardware offloading
implementations. This structure contains pointers to MACsec specific
structures which contain the actual MACsec configuration, and to the
underlying device (phydev for now).

Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-14 11:31:41 -08:00
Andrei Vagin
769071ac9f ns: Introduce Time Namespace
Time Namespace isolates clock values.

The kernel provides access to several clocks CLOCK_REALTIME,
CLOCK_MONOTONIC, CLOCK_BOOTTIME, etc.

CLOCK_REALTIME
      System-wide clock that measures real (i.e., wall-clock) time.

CLOCK_MONOTONIC
      Clock that cannot be set and represents monotonic time since
      some unspecified starting point.

CLOCK_BOOTTIME
      Identical to CLOCK_MONOTONIC, except it also includes any time
      that the system is suspended.

For many users, the time namespace means the ability to changes date and
time in a container (CLOCK_REALTIME). Providing per namespace notions of
CLOCK_REALTIME would be complex with a massive overhead, but has a dubious
value.

But in the context of checkpoint/restore functionality, monotonic and
boottime clocks become interesting. Both clocks are monotonic with
unspecified starting points. These clocks are widely used to measure time
slices and set timers. After restoring or migrating processes, it has to be
guaranteed that they never go backward. In an ideal case, the behavior of
these clocks should be the same as for a case when a whole system is
suspended. All this means that it is required to set CLOCK_MONOTONIC and
CLOCK_BOOTTIME clocks, which can be achieved by adding per-namespace
offsets for clocks.

A time namespace is similar to a pid namespace in the way how it is
created: unshare(CLONE_NEWTIME) system call creates a new time namespace,
but doesn't set it to the current process. Then all children of the process
will be born in the new time namespace, or a process can use the setns()
system call to join a namespace.

This scheme allows setting clock offsets for a namespace, before any
processes appear in it.

All available clone flags have been used, so CLONE_NEWTIME uses the highest
bit of CSIGNAL. It means that it can be used only with the unshare() and
the clone3() system calls.

[ tglx: Adjusted paragraph about clone3() to reality and massaged the
  	changelog a bit. ]

Co-developed-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://criu.org/Time_namespace
Link: https://lists.openvz.org/pipermail/criu/2018-June/041504.html
Link: https://lore.kernel.org/r/20191112012724.250792-4-dima@arista.com
2020-01-14 12:20:48 +01:00
Greg Kroah-Hartman
d40310f657 Merge 5.5-rc6 into staging-next
We need the staging fixes in here as well.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-13 07:52:17 +01:00
Alexei Starovoitov
51c39bb1d5 bpf: Introduce function-by-function verification
New llvm and old llvm with libbpf help produce BTF that distinguish global and
static functions. Unlike arguments of static function the arguments of global
functions cannot be removed or optimized away by llvm. The compiler has to use
exactly the arguments specified in a function prototype. The argument type
information allows the verifier validate each global function independently.
For now only supported argument types are pointer to context and scalars. In
the future pointers to structures, sizes, pointer to packet data can be
supported as well. Consider the following example:

static int f1(int ...)
{
  ...
}

int f3(int b);

int f2(int a)
{
  f1(a) + f3(a);
}

int f3(int b)
{
  ...
}

int main(...)
{
  f1(...) + f2(...) + f3(...);
}

The verifier will start its safety checks from the first global function f2().
It will recursively descend into f1() because it's static. Then it will check
that arguments match for the f3() invocation inside f2(). It will not descend
into f3(). It will finish f2() that has to be successfully verified for all
possible values of 'a'. Then it will proceed with f3(). That function also has
to be safe for all possible values of 'b'. Then it will start subprog 0 (which
is main() function). It will recursively descend into f1() and will skip full
check of f2() and f3(), since they are global. The order of processing global
functions doesn't affect safety, since all global functions must be proven safe
based on their arguments only.

Such function by function verification can drastically improve speed of the
verification and reduce complexity.

Note that the stack limit of 512 still applies to the call chain regardless whether
functions were static or global. The nested level of 8 also still applies. The
same recursion prevention checks are in place as well.

The type information and static/global kind is preserved after the verification
hence in the above example global function f2() and f3() can be replaced later
by equivalent functions with the same types that are loaded and verified later
without affecting safety of this main() program. Such replacement (re-linking)
of global functions is a subject of future patches.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20200110064124.1760511-3-ast@kernel.org
2020-01-10 17:20:07 +01:00
Mat Martineau
faf391c382 tcp: Define IPPROTO_MPTCP
To open a MPTCP socket with socket(AF_INET, SOCK_STREAM, IPPROTO_MPTCP),
IPPROTO_MPTCP needs a value that differs from IPPROTO_TCP. The existing
IPPROTO numbers mostly map directly to IANA-specified protocol numbers.
MPTCP does not have a protocol number allocated because MPTCP packets
use the TCP protocol number. Use private number not used OTA.

Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-09 18:41:41 -08:00
Linus Torvalds
b5b3159cff Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input fixes from Dmitry Torokhov:
 "Just a few small fixups here"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: imx_sc_key - only take the valid data from SCU firmware as key state
  Input: add safety guards to input_set_keycode()
  Input: input_event - fix struct padding on sparc64
  Input: uinput - always report EPOLLOUT
2020-01-09 15:37:40 -08:00
David S. Miller
a2d6d7ae59 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
The ungrafting from PRIO bug fixes in net, when merged into net-next,
merge cleanly but create a build failure.  The resolution used here is
from Petr Machata.

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-09 12:13:43 -08:00
Andrey Ignatov
f5bfcd953d bpf: Document BPF_F_QUERY_EFFECTIVE flag
Document BPF_F_QUERY_EFFECTIVE flag, mostly to clarify how it affects
attach_flags what may not be obvious and what may lead to confision.

Specifically attach_flags is returned only for target_fd but if programs
are inherited from an ancestor cgroup then returned attach_flags for
current cgroup may be confusing. For example, two effective programs of
same attach_type can be returned but w/o BPF_F_ALLOW_MULTI in
attach_flags.

Simple repro:
  # bpftool c s /sys/fs/cgroup/path/to/task
  ID       AttachType      AttachFlags     Name
  # bpftool c s /sys/fs/cgroup/path/to/task effective
  ID       AttachType      AttachFlags     Name
  95043    ingress                         tw_ipt_ingress
  95048    ingress                         tw_ingress

Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20200108014006.938363-1-rdna@fb.com
2020-01-09 09:40:06 -08:00
Martin KaFai Lau
206057fe02 bpf: Add BPF_FUNC_tcp_send_ack helper
Add a helper to send out a tcp-ack.  It will be used in the later
bpf_dctcp implementation that requires to send out an ack
when the CE state changed.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200109004551.3900448-1-kafai@fb.com
2020-01-09 08:46:18 -08:00
Martin KaFai Lau
85d33df357 bpf: Introduce BPF_MAP_TYPE_STRUCT_OPS
The patch introduces BPF_MAP_TYPE_STRUCT_OPS.  The map value
is a kernel struct with its func ptr implemented in bpf prog.
This new map is the interface to register/unregister/introspect
a bpf implemented kernel struct.

The kernel struct is actually embedded inside another new struct
(or called the "value" struct in the code).  For example,
"struct tcp_congestion_ops" is embbeded in:
struct bpf_struct_ops_tcp_congestion_ops {
	refcount_t refcnt;
	enum bpf_struct_ops_state state;
	struct tcp_congestion_ops data;  /* <-- kernel subsystem struct here */
}
The map value is "struct bpf_struct_ops_tcp_congestion_ops".
The "bpftool map dump" will then be able to show the
state ("inuse"/"tobefree") and the number of subsystem's refcnt (e.g.
number of tcp_sock in the tcp_congestion_ops case).  This "value" struct
is created automatically by a macro.  Having a separate "value" struct
will also make extending "struct bpf_struct_ops_XYZ" easier (e.g. adding
"void (*init)(void)" to "struct bpf_struct_ops_XYZ" to do some
initialization works before registering the struct_ops to the kernel
subsystem).  The libbpf will take care of finding and populating the
"struct bpf_struct_ops_XYZ" from "struct XYZ".

Register a struct_ops to a kernel subsystem:
1. Load all needed BPF_PROG_TYPE_STRUCT_OPS prog(s)
2. Create a BPF_MAP_TYPE_STRUCT_OPS with attr->btf_vmlinux_value_type_id
   set to the btf id "struct bpf_struct_ops_tcp_congestion_ops" of the
   running kernel.
   Instead of reusing the attr->btf_value_type_id,
   btf_vmlinux_value_type_id s added such that attr->btf_fd can still be
   used as the "user" btf which could store other useful sysadmin/debug
   info that may be introduced in the furture,
   e.g. creation-date/compiler-details/map-creator...etc.
3. Create a "struct bpf_struct_ops_tcp_congestion_ops" object as described
   in the running kernel btf.  Populate the value of this object.
   The function ptr should be populated with the prog fds.
4. Call BPF_MAP_UPDATE with the object created in (3) as
   the map value.  The key is always "0".

During BPF_MAP_UPDATE, the code that saves the kernel-func-ptr's
args as an array of u64 is generated.  BPF_MAP_UPDATE also allows
the specific struct_ops to do some final checks in "st_ops->init_member()"
(e.g. ensure all mandatory func ptrs are implemented).
If everything looks good, it will register this kernel struct
to the kernel subsystem.  The map will not allow further update
from this point.

Unregister a struct_ops from the kernel subsystem:
BPF_MAP_DELETE with key "0".

Introspect a struct_ops:
BPF_MAP_LOOKUP_ELEM with key "0".  The map value returned will
have the prog _id_ populated as the func ptr.

The map value state (enum bpf_struct_ops_state) will transit from:
INIT (map created) =>
INUSE (map updated, i.e. reg) =>
TOBEFREE (map value deleted, i.e. unreg)

The kernel subsystem needs to call bpf_struct_ops_get() and
bpf_struct_ops_put() to manage the "refcnt" in the
"struct bpf_struct_ops_XYZ".  This patch uses a separate refcnt
for the purose of tracking the subsystem usage.  Another approach
is to reuse the map->refcnt and then "show" (i.e. during map_lookup)
the subsystem's usage by doing map->refcnt - map->usercnt to filter out
the map-fd/pinned-map usage.  However, that will also tie down the
future semantics of map->refcnt and map->usercnt.

The very first subsystem's refcnt (during reg()) holds one
count to map->refcnt.  When the very last subsystem's refcnt
is gone, it will also release the map->refcnt.  All bpf_prog will be
freed when the map->refcnt reaches 0 (i.e. during map_free()).

Here is how the bpftool map command will look like:
[root@arch-fb-vm1 bpf]# bpftool map show
6: struct_ops  name dctcp  flags 0x0
	key 4B  value 256B  max_entries 1  memlock 4096B
	btf_id 6
[root@arch-fb-vm1 bpf]# bpftool map dump id 6
[{
        "value": {
            "refcnt": {
                "refs": {
                    "counter": 1
                }
            },
            "state": 1,
            "data": {
                "list": {
                    "next": 0,
                    "prev": 0
                },
                "key": 0,
                "flags": 2,
                "init": 24,
                "release": 0,
                "ssthresh": 25,
                "cong_avoid": 30,
                "set_state": 27,
                "cwnd_event": 28,
                "in_ack_event": 26,
                "undo_cwnd": 29,
                "pkts_acked": 0,
                "min_tso_segs": 0,
                "sndbuf_expand": 0,
                "cong_control": 0,
                "get_info": 0,
                "name": [98,112,102,95,100,99,116,99,112,0,0,0,0,0,0,0
                ],
                "owner": 0
            }
        }
    }
]

Misc Notes:
* bpf_struct_ops_map_sys_lookup_elem() is added for syscall lookup.
  It does an inplace update on "*value" instead returning a pointer
  to syscall.c.  Otherwise, it needs a separate copy of "zero" value
  for the BPF_STRUCT_OPS_STATE_INIT to avoid races.

* The bpf_struct_ops_map_delete_elem() is also called without
  preempt_disable() from map_delete_elem().  It is because
  the "->unreg()" may requires sleepable context, e.g.
  the "tcp_unregister_congestion_control()".

* "const" is added to some of the existing "struct btf_func_model *"
  function arg to avoid a compiler warning caused by this patch.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200109003505.3855919-1-kafai@fb.com
2020-01-09 08:46:18 -08:00
Martin KaFai Lau
27ae7997a6 bpf: Introduce BPF_PROG_TYPE_STRUCT_OPS
This patch allows the kernel's struct ops (i.e. func ptr) to be
implemented in BPF.  The first use case in this series is the
"struct tcp_congestion_ops" which will be introduced in a
latter patch.

This patch introduces a new prog type BPF_PROG_TYPE_STRUCT_OPS.
The BPF_PROG_TYPE_STRUCT_OPS prog is verified against a particular
func ptr of a kernel struct.  The attr->attach_btf_id is the btf id
of a kernel struct.  The attr->expected_attach_type is the member
"index" of that kernel struct.  The first member of a struct starts
with member index 0.  That will avoid ambiguity when a kernel struct
has multiple func ptrs with the same func signature.

For example, a BPF_PROG_TYPE_STRUCT_OPS prog is written
to implement the "init" func ptr of the "struct tcp_congestion_ops".
The attr->attach_btf_id is the btf id of the "struct tcp_congestion_ops"
of the _running_ kernel.  The attr->expected_attach_type is 3.

The ctx of BPF_PROG_TYPE_STRUCT_OPS is an array of u64 args saved
by arch_prepare_bpf_trampoline that will be done in the next
patch when introducing BPF_MAP_TYPE_STRUCT_OPS.

"struct bpf_struct_ops" is introduced as a common interface for the kernel
struct that supports BPF_PROG_TYPE_STRUCT_OPS prog.  The supporting kernel
struct will need to implement an instance of the "struct bpf_struct_ops".

The supporting kernel struct also needs to implement a bpf_verifier_ops.
During BPF_PROG_LOAD, bpf_struct_ops_find() will find the right
bpf_verifier_ops by searching the attr->attach_btf_id.

A new "btf_struct_access" is also added to the bpf_verifier_ops such
that the supporting kernel struct can optionally provide its own specific
check on accessing the func arg (e.g. provide limited write access).

After btf_vmlinux is parsed, the new bpf_struct_ops_init() is called
to initialize some values (e.g. the btf id of the supporting kernel
struct) and it can only be done once the btf_vmlinux is available.

The R0 checks at BPF_EXIT is excluded for the BPF_PROG_TYPE_STRUCT_OPS prog
if the return type of the prog->aux->attach_func_proto is "void".

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200109003503.3855825-1-kafai@fb.com
2020-01-09 08:46:18 -08:00
Dilip Kota
ed22aaaede PCI: dwc: intel: PCIe RC controller driver
Add support to PCIe RC controller on Intel Gateway SoCs.
PCIe controller is based of Synopsys DesignWare PCIe core.

Intel PCIe driver requires Upconfigure support, Fast Training
Sequence and link speed configurations. So adding the respective
helper functions in the PCIe DesignWare framework.
It also programs hardware autonomous speed during speed
configuration so defining it in pci_regs.h.

Also, mark Intel PCIe driver depends on MSI IRQ Domain
as Synopsys DesignWare framework depends on the
PCI_MSI_IRQ_DOMAIN.

Signed-off-by: Dilip Kota <eswara.kota@linux.intel.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Andrew Murray <andrew.murray@arm.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com>
Acked-by: Gustavo Pimentel <gustavo.pimentel@synopsys.com>
2020-01-09 11:57:18 +00:00
Mauro Carvalho Chehab
8821e92879 Linux 5.5-rc5
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl4SYegeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiG4m4H+QGCUN8SXN+2B+0/
 BfzOf7PFoKzAx3NwDbJQIZqhSl+Zfa4n3VGPEF8sXsvoQgdYvuJnS/5JiAZ9iRIH
 HAfFzegzQ3mCl8Du+SqCvQKs2Jt4OMCX62KGRebRBhpoKfZdwmN7n7pn9lWO771K
 9rxTpeItXhmK46jOFRbi5oyQfmkfSfyUN1b9CB53FXFS+ZDkDNA7QQiIYnKOD7SZ
 RrL7czhZ580QOC61qOlnz1GIhRzvU5SXg4OtuI3YfoOJRY5FKC3YtOgLReT0vPs+
 vEhAyP93upVXIhqm10WHNjd4t4a45Vy5ff64uFsQ9QV4nnqsC2C70YwWbVDdtz/W
 Lm0mvE8=
 =NECs
 -----END PGP SIGNATURE-----

Merge tag 'v5.5-rc5' into patchwork

Linux 5.5-rc5

* tag 'v5.5-rc5': (1006 commits)
  Linux 5.5-rc5
  Documentation: riscv: add patch acceptance guidelines
  riscv: prefix IRQ_ macro names with an RV_ namespace
  clocksource: riscv: add notrace to riscv_sched_clock
  apparmor: fix aa_xattrs_match() may sleep while holding a RCU lock
  hexagon: define ioremap_uc
  ocfs2: fix the crash due to call ocfs2_get_dlm_debug once less
  ocfs2: call journal flush to mark journal as empty after journal recovery when mount
  mm/hugetlb: defer freeing of huge pages if in non-task context
  mm/gup: fix memory leak in __gup_benchmark_ioctl
  mm/oom: fix pgtables units mismatch in Killed process message
  fs/posix_acl.c: fix kernel-doc warnings
  hexagon: work around compiler crash
  hexagon: parenthesize registers in asm predicates
  fs/namespace.c: make to_mnt_ns() static
  fs/nsfs.c: include headers for missing declarations
  fs/direct-io.c: include fs/internal.h for missing prototype
  mm: move_pages: return valid node id in status if the page is already on the target node
  memcg: account security cred as well to kmemcg
  kcov: fix struct layout for kcov_remote_arg
  ...
2020-01-08 11:13:25 +01:00
Andy Lutomirski
48446f198f random: ignore GRND_RANDOM in getentropy(2)
The separate blocking pool is going away.  Start by ignoring
GRND_RANDOM in getentropy(2).

This should not materially break any API.  Any code that worked
without this change should work at least as well with this change.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/705c5a091b63cc5da70c99304bb97e0109be0a26.1577088521.git.luto@kernel.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-01-07 16:07:01 -05:00
Andy Lutomirski
75551dbf11 random: add GRND_INSECURE to return best-effort non-cryptographic bytes
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/d5473b56cf1fa900ca4bd2b3fc1e5b8874399919.1577088521.git.luto@kernel.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-01-07 16:07:00 -05:00
Vladimir Oltean
6c93099450 mii: Add helpers for parsing SGMII auto-negotiation
Typically a MAC PCS auto-configures itself after it receives the
negotiated copper-side link settings from the PHY, but some MAC devices
are more special and need manual interpretation of the SGMII AN result.

In other cases, the PCS exposes the entire tx_config_reg base page as it
is transmitted on the wire during auto-negotiation, so it makes sense to
be able to decode the equivalent lp_advertised bit mask from the raw u16
(of course, "lp" considering the PCS to be the local PHY).

Therefore, add the bit definitions for the SGMII registers 4 and 5
(local device ability, link partner ability), as well as a link_mode
conversion helper that can be used to feed the AN results into
phy_resolve_aneg_linkmode.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-05 23:22:32 -08:00
Andrey Konovalov
a69b83e1ae kcov: fix struct layout for kcov_remote_arg
Make the layout of kcov_remote_arg the same for 32-bit and 64-bit code.
This makes it more convenient to write userspace apps that can be
compiled into 32-bit or 64-bit binaries and still work with the same
64-bit kernel.

Also use proper __u32 types in uapi headers instead of unsigned ints.

Link: http://lkml.kernel.org/r/9e91020876029cfefc9211ff747685eba9536426.1575638983.git.andreyknvl@google.com
Fixes: eec028c938 ("kcov: remote coverage support")
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Acked-by: Marco Elver <elver@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Felipe Balbi <balbi@kernel.org>
Cc: Chunfeng Yun <chunfeng.yun@mediatek.com>
Cc: "Jacky . Cao @ sony . com" <Jacky.Cao@sony.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Marco Elver <elver@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-04 13:55:09 -08:00
Rijo Thomas
757cc3e9ff tee: add AMD-TEE driver
Adds AMD-TEE driver.
* targets AMD APUs which has AMD Secure Processor with software-based
  Trusted Execution Environment (TEE) support
* registers with TEE subsystem
* defines tee_driver_ops function callbacks
* kernel allocated memory is used as shared memory between normal
  world and secure world.
* acts as REE (Rich Execution Environment) communication agent, which
  uses the services of AMD Secure Processor driver to submit commands
  for processing in TEE environment

Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Acked-by: Jens Wiklander <jens.wiklander@linaro.org>
Co-developed-by: Devaraj Rangasamy <Devaraj.Rangasamy@amd.com>
Signed-off-by: Devaraj Rangasamy <Devaraj.Rangasamy@amd.com>
Signed-off-by: Rijo Thomas <Rijo-john.Thomas@amd.com>
Reviewed-by: Gary R Hook <gary.hook@amd.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-01-04 13:49:51 +08:00
Arnd Bergmann
577c89b0ce media: v4l2-core: fix v4l2_buffer handling for time64 ABI
The v4l2_buffer structure contains a 'struct timeval' member that is
defined by the user space C library, creating an ABI incompatibility
when that gets updated to a 64-bit time_t.

As in v4l2_event, handle this with a special case in video_put_user()
and video_get_user() to replace the memcpy there.

Since the structure also contains a pointer, there are now two
native versions (on 32-bit systems) as well as two compat versions
(on 64-bit systems), which unfortunately complicates the compat
handler quite a bit.

Duplicating the existing handlers for the new types is a safe
conversion for now, but unfortunately this may turn into a
maintenance burden later. A larger-scale rework of the
compat code might be a better alternative, but is out of scope
of the y2038 work.

Sparc64 needs a special case because of their special suseconds_t
definition.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2020-01-03 15:50:21 +01:00
Arnd Bergmann
1a6c0b36dd media: v4l2-core: fix VIDIOC_DQEVENT for time64 ABI
The v4l2_event structure contains a 'struct timespec' member that is
defined by the user space C library, creating an ABI incompatibility
when that gets updated to a 64-bit time_t.

While passing a 32-bit time_t here would be sufficient for CLOCK_MONOTONIC
timestamps, simply redefining the structure to use the kernel's
__kernel_old_timespec would not work for any library that uses a copy
of the linux/videodev2.h header file rather than including the copy from
the latest kernel headers.

This means the kernel has to be changed to handle both versions of the
structure layout on a 32-bit architecture. The easiest way to do this
is during the copy from/to user space.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2020-01-03 15:47:57 +01:00
Arnd Bergmann
77cdffcb0b media: v4l2: abstract timeval handling in v4l2_buffer
As a preparation for adding 64-bit time_t support in the uapi,
change the drivers to no longer care about the format of the
timestamp field in struct v4l2_buffer.

The v4l2_timeval_to_ns() function is no longer needed in the
kernel after this, but there is userspace code relying on
it to be part of the uapi header.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
[hverkuil-cisco@xs4all.nl: replace spaces by tabs]
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2020-01-03 15:43:35 +01:00
Dave Airlie
f5c547efa1 drm-misc-next for v5.6:
UAPI Changes:
 - Commandline parser: Add support for panel orientation, and per-mode options.
 - Fix IOCTL naming for dma-buf heaps.
 
 Cross-subsystem Changes:
 - Rename DMA_HEAP_IOC_ALLOC to DMA_HEAP_IOCTL_ALLOC before it becomes abi.
 - Change DMA-BUF system-heap's name to system.
 - Fix leak in error handling in dma_heap_ioctl(), and make a symbol static.
 - Fix udma-buf cpu access.
 - Fix ti devicetree bindings.
 
 Core Changes:
 - Add CTA-861-G modes with VIC >= 193.
 - Change error handling and remove bug_on in *drm_dev_init.
 - Export drm_panel_of_backlight() correctly once more.
 - Add support for lvds decoders.
 - Convert drm/client and drm/(gem-,)fb-helper to drm-device based logging and update logging todo.
 
 Driver Changes:
 - Add support for dsi/px30 to rockchip.
 - Add fb damage support to virtio.
 - Use dma_resv locking wrappers in vc4, msm, etnaviv.
 - Make functions in virtio static, and perform some simplifications.
 - Add suspend support to sun4i.
 - Add A64 mipi dsi support to sun4i.
 - Add runtime pm suspend to komeda.
 - Associated driver fixes.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEuXvWqAysSYEJGuVH/lWMcqZwE8MFAl4N6t8ACgkQ/lWMcqZw
 E8P0ww/9EEa1W1nkaYmxfWCtmBV3D6QS4490jj62RMBXETezZmLPV11xpFqTPzcw
 9vRwD7PwP+rIDPTnEcg8vIMnhDgZuUMGv93PZrFZMHxe4MHeykQ6BOj4pWEnrkr4
 CQxC0exyIG8sQkH5+OngXkPnANPpzsegAAQ2rGbUf0HxxdZ1WeV3aqlQFo2YDpd9
 c8ouYhgnIP4NfLPYnVN3NQs/hQIVJRJ9vOHr+o8k7Fn9YoFak7ry6UFsSAan4j7I
 ZQDQzPnT5CQBBSRTh9vQinOexj5bkW3AFyNFA7mknv05LHYb1kMPIIqnY01pbi2w
 SyWc5oqJwwdCFPCLZIUHZMOBKYqGKWP0KTjy7+QKx2ty+Sjgf3hTZwnVdtNVLFJe
 7WsXP6Dg+PoSsSEGZuwGOzbr7GCJitSXhUs5GGiMbdbTPzr3rJsDLuyf9/Q1ObUC
 F+yIKkcwYZogeXRShFFQ3wjAxEQ83yyuTchyagvqSoqFsT5ccUjuUqInGAbYifPS
 QfhI1U9hQGmINqXPSkQYHXxMKg+Vl2KWvFknhmLIc0Cf3fRsu+wf3NAokrHsraxd
 RINvo2U5XDhPctRYXaPjPiYtPlnikR69mhyGcd7VG81F72ECzZr/2q1NmsEMmUac
 VqowhgoG8Tm4LcZHloMw4UlCtjV2esvztc2T6b95Mg6j1r4aav0=
 =ye8f
 -----END PGP SIGNATURE-----

Merge tag 'drm-misc-next-2020-01-02' of git://anongit.freedesktop.org/drm/drm-misc into drm-next

drm-misc-next for v5.6:

UAPI Changes:
- Commandline parser: Add support for panel orientation, and per-mode options.
- Fix IOCTL naming for dma-buf heaps.

Cross-subsystem Changes:
- Rename DMA_HEAP_IOC_ALLOC to DMA_HEAP_IOCTL_ALLOC before it becomes abi.
- Change DMA-BUF system-heap's name to system.
- Fix leak in error handling in dma_heap_ioctl(), and make a symbol static.
- Fix udma-buf cpu access.
- Fix ti devicetree bindings.

Core Changes:
- Add CTA-861-G modes with VIC >= 193.
- Change error handling and remove bug_on in *drm_dev_init.
- Export drm_panel_of_backlight() correctly once more.
- Add support for lvds decoders.
- Convert drm/client and drm/(gem-,)fb-helper to drm-device based logging and update logging todo.

Driver Changes:
- Add support for dsi/px30 to rockchip.
- Add fb damage support to virtio.
- Use dma_resv locking wrappers in vc4, msm, etnaviv.
- Make functions in virtio static, and perform some simplifications.
- Add suspend support to sun4i.
- Add A64 mipi dsi support to sun4i.
- Add runtime pm suspend to komeda.
- Associated driver fixes.

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/efc11139-1653-86bc-1b0f-0aefde219850@linux.intel.com
2020-01-03 11:43:44 +10:00
David Ahern
6b102db50c net: Add device index to tcp_md5sig
Add support for userspace to specify a device index to limit the scope
of an entry via the TCP_MD5SIG_EXT setsockopt. The existing __tcpm_pad
is renamed to tcpm_ifindex and the new field is only checked if the new
TCP_MD5SIG_FLAG_IFINDEX is set in tcpm_flags. For now, the device index
must point to an L3 master device (e.g., VRF). The API and error
handling are setup to allow the constraint to be relaxed in the future
to any device index.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-02 15:51:22 -08:00
Johannes Berg
7d6aa9ba4f Merge remote-tracking branch 'net-next/master' into mac80211-next
Merging to get the mac80211 updates that have since propagated
into net-next.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-01-02 14:50:58 +01:00
Sven Eckelmann
68e039f966 batman-adv: Update copyright years for 2020
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2020-01-01 00:00:33 +01:00
Eric Biggers
e933adde6f fscrypt: include <linux/ioctl.h> in UAPI header
<linux/fscrypt.h> defines ioctl numbers using the macros like _IOWR()
which are defined in <linux/ioctl.h>, so <linux/ioctl.h> should be
included as a prerequisite, like it is in many other kernel headers.

In practice this doesn't really matter since anyone referencing these
ioctl numbers will almost certainly include <sys/ioctl.h> too in order
to actually call ioctl().  But we might as well fix this.

Link: https://lore.kernel.org/r/20191219185624.21251-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
2019-12-31 10:33:51 -06:00
Eric Biggers
93edd392ca fscrypt: support passing a keyring key to FS_IOC_ADD_ENCRYPTION_KEY
Extend the FS_IOC_ADD_ENCRYPTION_KEY ioctl to allow the raw key to be
specified by a Linux keyring key, rather than specified directly.

This is useful because fscrypt keys belong to a particular filesystem
instance, so they are destroyed when that filesystem is unmounted.
Usually this is desired.  But in some cases, userspace may need to
unmount and re-mount the filesystem while keeping the keys, e.g. during
a system update.  This requires keeping the keys somewhere else too.

The keys could be kept in memory in a userspace daemon.  But depending
on the security architecture and assumptions, it can be preferable to
keep them only in kernel memory, where they are unreadable by userspace.

We also can't solve this by going back to the original fscrypt API
(where for each file, the master key was looked up in the process's
keyring hierarchy) because that caused lots of problems of its own.

Therefore, add the ability for FS_IOC_ADD_ENCRYPTION_KEY to accept a
Linux keyring key.  This solves the problem by allowing userspace to (if
needed) save the keys securely in a Linux keyring for re-provisioning,
while still using the new fscrypt key management ioctls.

This is analogous to how dm-crypt accepts a Linux keyring key, but the
key is then stored internally in the dm-crypt data structures rather
than being looked up again each time the dm-crypt device is accessed.

Use a custom key type "fscrypt-provisioning" rather than one of the
existing key types such as "logon".  This is strongly desired because it
enforces that these keys are only usable for a particular purpose: for
fscrypt as input to a particular KDF.  Otherwise, the keys could also be
passed to any kernel API that accepts a "logon" key with any service
prefix, e.g. dm-crypt, UBIFS, or (recently proposed) AF_ALG.  This would
risk leaking information about the raw key despite it ostensibly being
unreadable.  Of course, this mistake has already been made for multiple
kernel APIs; but since this is a new API, let's do it right.

This patch has been tested using an xfstest which I wrote to test it.

Link: https://lore.kernel.org/r/20191119222447.226853-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
2019-12-31 10:33:49 -06:00
David S. Miller
ba4028105e Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains Netfilter updates for net-next:

1) Remove #ifdef pollution around nf_ingress(), from Lukas Wunner.

2) Document ingress hook in netdevice, also from Lukas.

3) Remove htons() in tunnel metadata port netlink attributes,
   from Xin Long.

4) Missing erspan netlink attribute validation also from Xin Long.

5) Missing erspan version in tunnel, from Xin Long.

6) Missing attribute nest in NFTA_TUNNEL_KEY_OPTS_{VXLAN,ERSPAN}
   Patch from Xin Long.

7) Missing nla_nest_cancel() in tunnel netlink dump path,
   from Xin Long.

8) Remove two exported conntrack symbols with no clients,
   from Florian Westphal.

9) Add nft_meta_get_eval_time() helper to nft_meta, from Florian.

10) Add nft_meta_pkttype helper for loopback, also from Florian.

11) Add nft_meta_socket uid helper, from Florian Westphal.

12) Add nft_meta_cgroup helper, from Florian.

13) Add nft_meta_ifkind helper, from Florian.

14) Group all interface related meta selector, from Florian.

15) Add nft_prandom_u32() helper, from Florian.

16) Add nft_meta_rtclassid helper, from Florian.

17) Add support for matching on the slave device index,
    from Florian.

This batch, among other things, contains updates for the netfilter
tunnel netlink interface: This extension is still incomplete and lacking
proper userspace support which is actually my fault, I did not find the
time to go back and finish this. This update is breaking tunnel UAPI in
some aspects to fix it but do it better sooner than never.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-30 14:22:11 -08:00
Michal Kubecek
3d2b847fb9 ethtool: provide link state with LINKSTATE_GET request
Implement LINKSTATE_GET netlink request to get link state information.

At the moment, only link up flag as provided by ETHTOOL_GLINK ioctl command
is returned.

LINKSTATE_GET request can be used with NLM_F_DUMP (without device
identification) to request the information for all devices in current
network namespace providing the data.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-27 16:40:02 -08:00
Michal Kubecek
1b1b1847c8 ethtool: add LINKMODES_NTF notification
Send ETHTOOL_MSG_LINKMODES_NTF notification message whenever device link
settings or advertised modes are modified using ETHTOOL_MSG_LINKMODES_SET
netlink message or ETHTOOL_SLINKSETTINGS or ETHTOOL_SSET ioctl commands.

The notification message has the same format as reply to LINKMODES_GET
request. ETHTOOL_MSG_LINKMODES_SET netlink request only triggers the
notification if there is a change but the ioctl command handlers do not
check if there is an actual change and trigger the notification whenever
the commands are executed.

As all work is done by ethnl_default_notify() handler and callback
functions introduced to handle LINKMODES_GET requests, all that remains is
adding entries for ETHTOOL_MSG_LINKMODES_NTF into ethnl_notify_handlers and
ethnl_default_notify_ops lookup tables and calls to ethtool_notify() where
needed.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-27 16:40:02 -08:00
Michal Kubecek
bfbcfe2032 ethtool: set link modes related data with LINKMODES_SET request
Implement LINKMODES_SET netlink request to set advertised linkmodes and
related attributes as ETHTOOL_SLINKSETTINGS and ETHTOOL_SSET commands do.

The request allows setting autonegotiation flag, speed, duplex and
advertised link modes.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-27 16:40:02 -08:00
Michal Kubecek
f625aa9be8 ethtool: provide link mode information with LINKMODES_GET request
Implement LINKMODES_GET netlink request to get link modes related
information provided by ETHTOOL_GLINKSETTINGS and ETHTOOL_GSET ioctl
commands.

This request provides supported, advertised and peer advertised link modes,
autonegotiation flag, speed and duplex.

LINKMODES_GET request can be used with NLM_F_DUMP (without device
identification) to request the information for all devices in current
network namespace providing the data.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-27 16:40:02 -08:00
Michal Kubecek
73286734c1 ethtool: add LINKINFO_NTF notification
Send ETHTOOL_MSG_LINKINFO_NTF notification message whenever device link
settings are modified using ETHTOOL_MSG_LINKINFO_SET netlink message or
ETHTOOL_SLINKSETTINGS or ETHTOOL_SSET ioctl commands.

The notification message has the same format as reply to LINKINFO_GET
request. ETHTOOL_MSG_LINKINFO_SET netlink request only triggers the
notification if there is a change but the ioctl command handlers do not
check if there is an actual change and trigger the notification whenever
the commands are executed.

As all work is done by ethnl_default_notify() handler and callback
functions introduced to handle LINKINFO_GET requests, all that remains is
adding entries for ETHTOOL_MSG_LINKINFO_NTF into ethnl_notify_handlers and
ethnl_default_notify_ops lookup tables and calls to ethtool_notify() where
needed.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-27 16:40:02 -08:00
Michal Kubecek
a53f3d41e4 ethtool: set link settings with LINKINFO_SET request
Implement LINKINFO_SET netlink request to set link settings queried by
LINKINFO_GET message.

Only physical port, phy MDIO address and MDI(-X) control can be set,
attempt to modify MDI(-X) status and transceiver is rejected.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-27 16:40:02 -08:00
Michal Kubecek
459e0b81b3 ethtool: provide link settings with LINKINFO_GET request
Implement LINKINFO_GET netlink request to get basic link settings provided
by ETHTOOL_GLINKSETTINGS and ETHTOOL_GSET ioctl commands.

This request provides settings not directly related to autonegotiation and
link mode selection: physical port, phy MDIO address, MDI(-X) status,
MDI(-X) control and transceiver.

LINKINFO_GET request can be used with NLM_F_DUMP (without device
identification) to request the information for all devices in current
network namespace providing the data.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-27 16:40:02 -08:00
Michal Kubecek
71921690f9 ethtool: provide string sets with STRSET_GET request
Requests a contents of one or more string sets, i.e. indexed arrays of
strings; this information is provided by ETHTOOL_GSSET_INFO and
ETHTOOL_GSTRINGS commands of ioctl interface. Unlike ioctl interface, all
information can be retrieved with one request and mulitple string sets can
be requested at once.

There are three types of requests:

  - no NLM_F_DUMP, no device: get "global" stringsets
  - no NLM_F_DUMP, with device: get string sets related to the device
  - NLM_F_DUMP, no device: get device related string sets for all devices

Client can request either all string sets of given type (global or device
related) or only specific sets. With ETHTOOL_A_STRSET_COUNTS flag set, only
set sizes (numbers of strings) are returned.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-27 16:40:01 -08:00
Michal Kubecek
6b08d6c146 ethtool: support for netlink notifications
Add infrastructure for ethtool netlink notifications. There is only one
multicast group "monitor" which is used to notify userspace about changes
and actions performed. Notification messages (types using suffix _NTF)
share the format with replies to GET requests.

Notifications are supposed to be broadcasted on every configuration change,
whether it is done using the netlink interface or ioctl one. Netlink SET
requests only trigger a notification if some data is actually changed.

To trigger an ethtool notification, both ethtool netlink and external code
use ethtool_notify() helper. This helper requires RTNL to be held and may
sleep. Handlers sending messages for specific notification message types
are registered in ethnl_notify_handlers array. As notifications can be
triggered from other code, ethnl_ok flag is used to prevent an attempt to
send notification before genetlink family is registered.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-27 16:40:01 -08:00
Michal Kubecek
10b518d4e6 ethtool: netlink bitset handling
The ethtool netlink code uses common framework for passing arbitrary
length bit sets to allow future extensions. A bitset can be a list (only
one bitmap) or can consist of value and mask pair (used e.g. when client
want to modify only some bits). A bitset can use one of two formats:
verbose (bit by bit) or compact.

Verbose format consists of bitset size (number of bits), list flag and
an array of bit nests, telling which bits are part of the list or which
bits are in the mask and which of them are to be set. In requests, bits
can be identified by index (position) or by name. In replies, kernel
provides both index and name. Verbose format is suitable for "one shot"
applications like standard ethtool command as it avoids the need to
either keep bit names (e.g. link modes) in sync with kernel or having to
add an extra roundtrip for string set request (e.g. for private flags).

Compact format uses one (list) or two (value/mask) arrays of 32-bit
words to store the bitmap(s). It is more suitable for long running
applications (ethtool in monitor mode or network management daemons)
which can retrieve the names once and then pass only compact bitmaps to
save space.

Userspace requests can use either format; ETHTOOL_FLAG_COMPACT_BITSETS
flag in request header tells kernel which format to use in reply.
Notifications always use compact format.

As some code uses arrays of unsigned long for internal representation and
some arrays of u32 (or even a single u32), two sets of parse/compose
helpers are introduced. To avoid code duplication, helpers for unsigned
long arrays are implemented as wrappers around helpers for u32 arrays.
There are two reasons for this choice: (1) u32 arrays are more frequent in
ethtool code and (2) unsigned long array can be always interpreted as an
u32 array on little endian 64-bit and all 32-bit architectures while we
would need special handling for odd number of u32 words in the opposite
direction.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-27 16:40:01 -08:00
Michal Kubecek
041b1c5d4a ethtool: helper functions for netlink interface
Add common request/reply header definition and helpers to parse request
header and fill reply header. Provide ethnl_update_* helpers to update
structure members from request attributes (to be used for *_SET requests).

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-27 16:40:01 -08:00
Michal Kubecek
2b4a8990b7 ethtool: introduce ethtool netlink interface
Basic genetlink and init infrastructure for the netlink interface, register
genetlink family "ethtool". Add CONFIG_ETHTOOL_NETLINK Kconfig option to
make the build optional. Add initial overall interface description into
Documentation/networking/ethtool-netlink.rst, further patches will add more
detailed information.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-27 16:40:01 -08:00
David S. Miller
2bbc078f81 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2019-12-27

The following pull-request contains BPF updates for your *net-next* tree.

We've added 127 non-merge commits during the last 17 day(s) which contain
a total of 110 files changed, 6901 insertions(+), 2721 deletions(-).

There are three merge conflicts. Conflicts and resolution looks as follows:

1) Merge conflict in net/bpf/test_run.c:

There was a tree-wide cleanup c593642c8b ("treewide: Use sizeof_field() macro")
which gets in the way with b590cb5f80 ("bpf: Switch to offsetofend in
BPF_PROG_TEST_RUN"):

  <<<<<<< HEAD
          if (!range_is_zero(__skb, offsetof(struct __sk_buff, priority) +
                             sizeof_field(struct __sk_buff, priority),
  =======
          if (!range_is_zero(__skb, offsetofend(struct __sk_buff, priority),
  >>>>>>> 7c8dce4b16

There are a few occasions that look similar to this. Always take the chunk with
offsetofend(). Note that there is one where the fields differ in here:

  <<<<<<< HEAD
          if (!range_is_zero(__skb, offsetof(struct __sk_buff, tstamp) +
                             sizeof_field(struct __sk_buff, tstamp),
  =======
          if (!range_is_zero(__skb, offsetofend(struct __sk_buff, gso_segs),
  >>>>>>> 7c8dce4b16

Just take the one with offsetofend() /and/ gso_segs. Latter is correct due to
850a88cc40 ("bpf: Expose __sk_buff wire_len/gso_segs to BPF_PROG_TEST_RUN").

2) Merge conflict in arch/riscv/net/bpf_jit_comp.c:

(I'm keeping Bjorn in Cc here for a double-check in case I got it wrong.)

  <<<<<<< HEAD
          if (is_13b_check(off, insn))
                  return -1;
          emit(rv_blt(tcc, RV_REG_ZERO, off >> 1), ctx);
  =======
          emit_branch(BPF_JSLT, RV_REG_T1, RV_REG_ZERO, off, ctx);
  >>>>>>> 7c8dce4b16

Result should look like:

          emit_branch(BPF_JSLT, tcc, RV_REG_ZERO, off, ctx);

3) Merge conflict in arch/riscv/include/asm/pgtable.h:

  <<<<<<< HEAD
  =======
  #define VMALLOC_SIZE     (KERN_VIRT_SIZE >> 1)
  #define VMALLOC_END      (PAGE_OFFSET - 1)
  #define VMALLOC_START    (PAGE_OFFSET - VMALLOC_SIZE)

  #define BPF_JIT_REGION_SIZE     (SZ_128M)
  #define BPF_JIT_REGION_START    (PAGE_OFFSET - BPF_JIT_REGION_SIZE)
  #define BPF_JIT_REGION_END      (VMALLOC_END)

  /*
   * Roughly size the vmemmap space to be large enough to fit enough
   * struct pages to map half the virtual address space. Then
   * position vmemmap directly below the VMALLOC region.
   */
  #define VMEMMAP_SHIFT \
          (CONFIG_VA_BITS - PAGE_SHIFT - 1 + STRUCT_PAGE_MAX_SHIFT)
  #define VMEMMAP_SIZE    BIT(VMEMMAP_SHIFT)
  #define VMEMMAP_END     (VMALLOC_START - 1)
  #define VMEMMAP_START   (VMALLOC_START - VMEMMAP_SIZE)

  #define vmemmap         ((struct page *)VMEMMAP_START)

  >>>>>>> 7c8dce4b16

Only take the BPF_* defines from there and move them higher up in the
same file. Remove the rest from the chunk. The VMALLOC_* etc defines
got moved via 01f52e16b8 ("riscv: define vmemmap before pfn_to_page
calls"). Result:

  [...]
  #define __S101  PAGE_READ_EXEC
  #define __S110  PAGE_SHARED_EXEC
  #define __S111  PAGE_SHARED_EXEC

  #define VMALLOC_SIZE     (KERN_VIRT_SIZE >> 1)
  #define VMALLOC_END      (PAGE_OFFSET - 1)
  #define VMALLOC_START    (PAGE_OFFSET - VMALLOC_SIZE)

  #define BPF_JIT_REGION_SIZE     (SZ_128M)
  #define BPF_JIT_REGION_START    (PAGE_OFFSET - BPF_JIT_REGION_SIZE)
  #define BPF_JIT_REGION_END      (VMALLOC_END)

  /*
   * Roughly size the vmemmap space to be large enough to fit enough
   * struct pages to map half the virtual address space. Then
   * position vmemmap directly below the VMALLOC region.
   */
  #define VMEMMAP_SHIFT \
          (CONFIG_VA_BITS - PAGE_SHIFT - 1 + STRUCT_PAGE_MAX_SHIFT)
  #define VMEMMAP_SIZE    BIT(VMEMMAP_SHIFT)
  #define VMEMMAP_END     (VMALLOC_START - 1)
  #define VMEMMAP_START   (VMALLOC_START - VMEMMAP_SIZE)

  [...]

Let me know if there are any other issues.

Anyway, the main changes are:

1) Extend bpftool to produce a struct (aka "skeleton") tailored and specific
   to a provided BPF object file. This provides an alternative, simplified API
   compared to standard libbpf interaction. Also, add libbpf extern variable
   resolution for .kconfig section to import Kconfig data, from Andrii Nakryiko.

2) Add BPF dispatcher for XDP which is a mechanism to avoid indirect calls by
   generating a branch funnel as discussed back in bpfconf'19 at LSF/MM. Also,
   add various BPF riscv JIT improvements, from Björn Töpel.

3) Extend bpftool to allow matching BPF programs and maps by name,
   from Paul Chaignon.

4) Support for replacing cgroup BPF programs attached with BPF_F_ALLOW_MULTI
   flag for allowing updates without service interruption, from Andrey Ignatov.

5) Cleanup and simplification of ring access functions for AF_XDP with a
   bonus of 0-5% performance improvement, from Magnus Karlsson.

6) Enable BPF JITs for x86-64 and arm64 by default. Also, final version of
   audit support for BPF, from Daniel Borkmann and latter with Jiri Olsa.

7) Move and extend test_select_reuseport into BPF program tests under
   BPF selftests, from Jakub Sitnicki.

8) Various BPF sample improvements for xdpsock for customizing parameters
   to set up and benchmark AF_XDP, from Jay Jayatheerthan.

9) Improve libbpf to provide a ulimit hint on permission denied errors.
   Also change XDP sample programs to attach in driver mode by default,
   from Toke Høiland-Jørgensen.

10) Extend BPF test infrastructure to allow changing skb mark from tc BPF
    programs, from Nikita V. Shirokov.

11) Optimize prologue code sequence in BPF arm32 JIT, from Russell King.

12) Fix xdp_redirect_cpu BPF sample to manually attach to tracepoints after
    libbpf conversion, from Jesper Dangaard Brouer.

13) Minor misc improvements from various others.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-27 14:20:10 -08:00
Andy Roulin
c1e4699026 bonding: rename AD_STATE_* to LACP_STATE_*
As the LACP actor/partner state is now part of the uapi, rename the
3ad state defines with LACP prefix. The LACP prefix is preferred over
BOND_3AD as the LACP standard moved to 802.1AX.

Fixes: 826f66b30c ("bonding: move 802.3ad port state flags to uapi")
Signed-off-by: Andy Roulin <aroulin@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-26 13:09:37 -08:00
Florian Westphal
c14ceb0ec7 netfilter: nft_meta: add support for slave device ifindex matching
Allow to match on vrf slave ifindex or name.

In case there was no slave interface involved, store 0 in the
destination register just like existing iif/oif matching.

sdif(name) is restricted to the ipv4/ipv6 input and forward hooks,
as it depends on ip(6) stack parsing/storing info in skb->cb[].

Cc: Martin Willi <martin@strongswan.org>
Cc: David Ahern <dsahern@kernel.org>
Cc: Shrijeet Mukherjee <shrijeet@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-12-26 17:41:34 +01:00
Richard Cochran
b6fd7b9636 net: Introduce peer to peer one step PTP time stamping.
The 1588 standard defines one step operation for both Sync and
PDelay_Resp messages.  Up until now, hardware with P2P one step has
been rare, and kernel support was lacking.  This patch adds support of
the mode in anticipation of new hardware developments.

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-25 19:51:34 -08:00
Martin Varghese
f66b53fdbb openvswitch: New MPLS actions for layer 2 tunnelling
The existing PUSH MPLS action inserts MPLS header between ethernet header
and the IP header. Though this behaviour is fine for L3 VPN where an IP
packet is encapsulated inside a MPLS tunnel, it does not suffice the L2
VPN (l2 tunnelling) requirements. In L2 VPN the MPLS header should
encapsulate the ethernet packet.

The new mpls action ADD_MPLS inserts MPLS header at the start of the
packet or at the start of the l3 header depending on the value of l3 tunnel
flag in the ADD_MPLS arguments.

POP_MPLS action is extended to support ethertype 0x6558.

Signed-off-by: Martin Varghese <martin.varghese@nokia.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-24 22:24:45 -08:00
Greg Kroah-Hartman
398d999f96 Merge 5.5-rc3 into staging-next
We need the staging fixes in here as well.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-23 07:00:09 -05:00
David S. Miller
ac80010fc9 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Mere overlapping changes in the conflicts here.

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-22 15:15:05 -08:00
Linus Torvalds
78bac77b52 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from David Miller:

 1) Several nf_flow_table_offload fixes from Pablo Neira Ayuso,
    including adding a missing ipv6 match description.

 2) Several heap overflow fixes in mwifiex from qize wang and Ganapathi
    Bhat.

 3) Fix uninit value in bond_neigh_init(), from Eric Dumazet.

 4) Fix non-ACPI probing of nxp-nci, from Stephan Gerhold.

 5) Fix use after free in tipc_disc_rcv(), from Tuong Lien.

 6) Enforce limit of 33 tail calls in mips and riscv JIT, from Paul
    Chaignon.

 7) Multicast MAC limit test is off by one in qede, from Manish Chopra.

 8) Fix established socket lookup race when socket goes from
    TCP_ESTABLISHED to TCP_LISTEN, because there lacks an intervening
    RCU grace period. From Eric Dumazet.

 9) Don't send empty SKBs from tcp_write_xmit(), also from Eric Dumazet.

10) Fix active backup transition after link failure in bonding, from
    Mahesh Bandewar.

11) Avoid zero sized hash table in gtp driver, from Taehee Yoo.

12) Fix wrong interface passed to ->mac_link_up(), from Russell King.

13) Fix DSA egress flooding settings in b53, from Florian Fainelli.

14) Memory leak in gmac_setup_txqs(), from Navid Emamdoost.

15) Fix double free in dpaa2-ptp code, from Ioana Ciornei.

16) Reject invalid MTU values in stmmac, from Jose Abreu.

17) Fix refcount leak in error path of u32 classifier, from Davide
    Caratti.

18) Fix regression causing iwlwifi firmware crashes on boot, from Anders
    Kaseorg.

19) Fix inverted return value logic in llc2 code, from Chan Shu Tak.

20) Disable hardware GRO when XDP is attached to qede, frm Manish
    Chopra.

21) Since we encode state in the low pointer bits, dst metrics must be
    at least 4 byte aligned, which is not necessarily true on m68k. Add
    annotations to fix this, from Geert Uytterhoeven.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (160 commits)
  sfc: Include XDP packet headroom in buffer step size.
  sfc: fix channel allocation with brute force
  net: dst: Force 4-byte alignment of dst_metrics
  selftests: pmtu: fix init mtu value in description
  hv_netvsc: Fix unwanted rx_table reset
  net: phy: ensure that phy IDs are correctly typed
  mod_devicetable: fix PHY module format
  qede: Disable hardware gro when xdp prog is installed
  net: ena: fix issues in setting interrupt moderation params in ethtool
  net: ena: fix default tx interrupt moderation interval
  net/smc: unregister ib devices in reboot_event
  net: stmmac: platform: Fix MDIO init for platforms without PHY
  llc2: Fix return statement of llc_stat_ev_rx_null_dsap_xid_c (and _test_c)
  net: hisilicon: Fix a BUG trigered by wrong bytes_compl
  net: dsa: ksz: use common define for tag len
  s390/qeth: don't return -ENOTSUPP to userspace
  s390/qeth: fix promiscuous mode after reset
  s390/qeth: handle error due to unsupported transport mode
  cxgb4: fix refcount init for TC-MQPRIO offload
  tc-testing: initial tdc selftests for cls_u32
  ...
2019-12-22 09:54:33 -08:00