Switch most of BPF helper definitions from returning int to long. These
definitions are coming from comments in BPF UAPI header and are used to
generate bpf_helper_defs.h (under libbpf) to be later included and used from
BPF programs.
In actual in-kernel implementation, all the helpers are defined as returning
u64, but due to some historical reasons, most of them are actually defined as
returning int in UAPI (usually, to return 0 on success, and negative value on
error).
This actually causes Clang to quite often generate sub-optimal code, because
compiler believes that return value is 32-bit, and in a lot of cases has to be
up-converted (usually with a pair of 32-bit bit shifts) to 64-bit values,
before they can be used further in BPF code.
Besides just "polluting" the code, these 32-bit shifts quite often cause
problems for cases in which return value matters. This is especially the case
for the family of bpf_probe_read_str() functions. There are few other similar
helpers (e.g., bpf_read_branch_records()), in which return value is used by
BPF program logic to record variable-length data and process it. For such
cases, BPF program logic carefully manages offsets within some array or map to
read variable-length data. For such uses, it's crucial for BPF verifier to
track possible range of register values to prove that all the accesses happen
within given memory bounds. Those extraneous zero-extending bit shifts,
inserted by Clang (and quite often interleaved with other code, which makes
the issues even more challenging and sometimes requires employing extra
per-variable compiler barriers), throws off verifier logic and makes it mark
registers as having unknown variable offset. We'll study this pattern a bit
later below.
Another common pattern is to check return of BPF helper for non-zero state to
detect error conditions and attempt alternative actions in such case. Even in
this simple and straightforward case, this 32-bit vs BPF's native 64-bit mode
quite often leads to sub-optimal and unnecessary extra code. We'll look at
this pattern as well.
Clang's BPF target supports two modes of code generation: ALU32, in which it
is capable of using lower 32-bit parts of registers, and no-ALU32, in which
only full 64-bit registers are being used. ALU32 mode somewhat mitigates the
above described problems, but not in all cases.
This patch switches all the cases in which BPF helpers return 0 or negative
error from returning int to returning long. It is shown below that such change
in definition leads to equivalent or better code. No-ALU32 mode benefits more,
but ALU32 mode doesn't degrade or still gets improved code generation.
Another class of cases switched from int to long are bpf_probe_read_str()-like
helpers, which encode successful case as non-negative values, while still
returning negative value for errors.
In all of such cases, correctness is preserved due to two's complement
encoding of negative values and the fact that all helpers return values with
32-bit absolute value. Two's complement ensures that for negative values
higher 32 bits are all ones and when truncated, leave valid negative 32-bit
value with the same value. Non-negative values have upper 32 bits set to zero
and similarly preserve value when high 32 bits are truncated. This means that
just casting to int/u32 is correct and efficient (and in ALU32 mode doesn't
require any extra shifts).
To minimize the chances of regressions, two code patterns were investigated,
as mentioned above. For both patterns, BPF assembly was analyzed in
ALU32/NO-ALU32 compiler modes, both with current 32-bit int return type and
new 64-bit long return type.
Case 1. Variable-length data reading and concatenation. This is quite
ubiquitous pattern in tracing/monitoring applications, reading data like
process's environment variables, file path, etc. In such case, many pieces of
string-like variable-length data are read into a single big buffer, and at the
end of the process, only a part of array containing actual data is sent to
user-space for further processing. This case is tested in test_varlen.c
selftest (in the next patch). Code flow is roughly as follows:
void *payload = &sample->payload;
u64 len;
len = bpf_probe_read_kernel_str(payload, MAX_SZ1, &source_data1);
if (len <= MAX_SZ1) {
payload += len;
sample->len1 = len;
}
len = bpf_probe_read_kernel_str(payload, MAX_SZ2, &source_data2);
if (len <= MAX_SZ2) {
payload += len;
sample->len2 = len;
}
/* and so on */
sample->total_len = payload - &sample->payload;
/* send over, e.g., perf buffer */
There could be two variations with slightly different code generated: when len
is 64-bit integer and when it is 32-bit integer. Both variations were analysed.
BPF assembly instructions between two successive invocations of
bpf_probe_read_kernel_str() were used to check code regressions. Results are
below, followed by short analysis. Left side is using helpers with int return
type, the right one is after the switch to long.
ALU32 + INT ALU32 + LONG
=========== ============
64-BIT (13 insns): 64-BIT (10 insns):
------------------------------------ ------------------------------------
17: call 115 17: call 115
18: if w0 > 256 goto +9 <LBB0_4> 18: if r0 > 256 goto +6 <LBB0_4>
19: w1 = w0 19: r1 = 0 ll
20: r1 <<= 32 21: *(u64 *)(r1 + 0) = r0
21: r1 s>>= 32 22: r6 = 0 ll
22: r2 = 0 ll 24: r6 += r0
24: *(u64 *)(r2 + 0) = r1 00000000000000c8 <LBB0_4>:
25: r6 = 0 ll 25: r1 = r6
27: r6 += r1 26: w2 = 256
00000000000000e0 <LBB0_4>: 27: r3 = 0 ll
28: r1 = r6 29: call 115
29: w2 = 256
30: r3 = 0 ll
32: call 115
32-BIT (11 insns): 32-BIT (12 insns):
------------------------------------ ------------------------------------
17: call 115 17: call 115
18: if w0 > 256 goto +7 <LBB1_4> 18: if w0 > 256 goto +8 <LBB1_4>
19: r1 = 0 ll 19: r1 = 0 ll
21: *(u32 *)(r1 + 0) = r0 21: *(u32 *)(r1 + 0) = r0
22: w1 = w0 22: r0 <<= 32
23: r6 = 0 ll 23: r0 >>= 32
25: r6 += r1 24: r6 = 0 ll
00000000000000d0 <LBB1_4>: 26: r6 += r0
26: r1 = r6 00000000000000d8 <LBB1_4>:
27: w2 = 256 27: r1 = r6
28: r3 = 0 ll 28: w2 = 256
30: call 115 29: r3 = 0 ll
31: call 115
In ALU32 mode, the variant using 64-bit length variable clearly wins and
avoids unnecessary zero-extension bit shifts. In practice, this is even more
important and good, because BPF code won't need to do extra checks to "prove"
that payload/len are within good bounds.
32-bit len is one instruction longer. Clang decided to do 64-to-32 casting
with two bit shifts, instead of equivalent `w1 = w0` assignment. The former
uses extra register. The latter might potentially lose some range information,
but not for 32-bit value. So in this case, verifier infers that r0 is [0, 256]
after check at 18:, and shifting 32 bits left/right keeps that range intact.
We should probably look into Clang's logic and see why it chooses bitshifts
over sub-register assignments for this.
NO-ALU32 + INT NO-ALU32 + LONG
============== ===============
64-BIT (14 insns): 64-BIT (10 insns):
------------------------------------ ------------------------------------
17: call 115 17: call 115
18: r0 <<= 32 18: if r0 > 256 goto +6 <LBB0_4>
19: r1 = r0 19: r1 = 0 ll
20: r1 >>= 32 21: *(u64 *)(r1 + 0) = r0
21: if r1 > 256 goto +7 <LBB0_4> 22: r6 = 0 ll
22: r0 s>>= 32 24: r6 += r0
23: r1 = 0 ll 00000000000000c8 <LBB0_4>:
25: *(u64 *)(r1 + 0) = r0 25: r1 = r6
26: r6 = 0 ll 26: r2 = 256
28: r6 += r0 27: r3 = 0 ll
00000000000000e8 <LBB0_4>: 29: call 115
29: r1 = r6
30: r2 = 256
31: r3 = 0 ll
33: call 115
32-BIT (13 insns): 32-BIT (13 insns):
------------------------------------ ------------------------------------
17: call 115 17: call 115
18: r1 = r0 18: r1 = r0
19: r1 <<= 32 19: r1 <<= 32
20: r1 >>= 32 20: r1 >>= 32
21: if r1 > 256 goto +6 <LBB1_4> 21: if r1 > 256 goto +6 <LBB1_4>
22: r2 = 0 ll 22: r2 = 0 ll
24: *(u32 *)(r2 + 0) = r0 24: *(u32 *)(r2 + 0) = r0
25: r6 = 0 ll 25: r6 = 0 ll
27: r6 += r1 27: r6 += r1
00000000000000e0 <LBB1_4>: 00000000000000e0 <LBB1_4>:
28: r1 = r6 28: r1 = r6
29: r2 = 256 29: r2 = 256
30: r3 = 0 ll 30: r3 = 0 ll
32: call 115 32: call 115
In NO-ALU32 mode, for the case of 64-bit len variable, Clang generates much
superior code, as expected, eliminating unnecessary bit shifts. For 32-bit
len, code is identical.
So overall, only ALU-32 32-bit len case is more-or-less equivalent and the
difference stems from internal Clang decision, rather than compiler lacking
enough information about types.
Case 2. Let's look at the simpler case of checking return result of BPF helper
for errors. The code is very simple:
long bla;
if (bpf_probe_read_kenerl(&bla, sizeof(bla), 0))
return 1;
else
return 0;
ALU32 + CHECK (9 insns) ALU32 + CHECK (9 insns)
==================================== ====================================
0: r1 = r10 0: r1 = r10
1: r1 += -8 1: r1 += -8
2: w2 = 8 2: w2 = 8
3: r3 = 0 3: r3 = 0
4: call 113 4: call 113
5: w1 = w0 5: r1 = r0
6: w0 = 1 6: w0 = 1
7: if w1 != 0 goto +1 <LBB2_2> 7: if r1 != 0 goto +1 <LBB2_2>
8: w0 = 0 8: w0 = 0
0000000000000048 <LBB2_2>: 0000000000000048 <LBB2_2>:
9: exit 9: exit
Almost identical code, the only difference is the use of full register
assignment (r1 = r0) vs half-registers (w1 = w0) in instruction #5. On 32-bit
architectures, new BPF assembly might be slightly less optimal, in theory. But
one can argue that's not a big issue, given that use of full registers is
still prevalent (e.g., for parameter passing).
NO-ALU32 + CHECK (11 insns) NO-ALU32 + CHECK (9 insns)
==================================== ====================================
0: r1 = r10 0: r1 = r10
1: r1 += -8 1: r1 += -8
2: r2 = 8 2: r2 = 8
3: r3 = 0 3: r3 = 0
4: call 113 4: call 113
5: r1 = r0 5: r1 = r0
6: r1 <<= 32 6: r0 = 1
7: r1 >>= 32 7: if r1 != 0 goto +1 <LBB2_2>
8: r0 = 1 8: r0 = 0
9: if r1 != 0 goto +1 <LBB2_2> 0000000000000048 <LBB2_2>:
10: r0 = 0 9: exit
0000000000000058 <LBB2_2>:
11: exit
NO-ALU32 is a clear improvement, getting rid of unnecessary zero-extension bit
shifts.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200623032224.4020118-1-andriin@fb.com
Currently the MRP_PORT_ROLE_NONE has the value 0x2 but this is in conflict
with the IEC 62439-2 standard. The standard defines the following port
roles: primary (0x0), secondary(0x1), interconnect(0x2).
Therefore remove the port role none.
Fixes: 4714d13791 ("bridge: uapi: mrp: Add mrp attributes.")
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Keepalived can set global static ip routes or virtual ip routes dynamically
following VRRP protocol states. Using a dedicated rtm_protocol will help
keeping track of it.
Changes in v2:
- fix tab/space indenting
Signed-off-by: Alexandre Cassen <acassen@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the V4L2_FMT_FLAG_ENC_CAP_FRAME_INTERVAL flag to signal that
the coded frame interval can be set separately from the raw frame
interval for stateful encoders.
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Reviewed-by: Michael Tretter <m.tretter@pengutronix.de>
Acked-by: Tomasz Figa <tfiga@chromium.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
This patch lets user-space to request a non-consistent memory
allocation during CREATE_BUFS and REQBUFS ioctl calls.
= CREATE_BUFS
struct v4l2_create_buffers has seven 4-byte reserved areas,
so reserved[0] is renamed to ->flags. The struct, thus, now
has six reserved 4-byte regions.
= CREATE_BUFS32
struct v4l2_create_buffers32 has seven 4-byte reserved areas,
so reserved[0] is renamed to ->flags. The struct, thus, now
has six reserved 4-byte regions.
= REQBUFS
We use one bit of a ->reserved[1] member of struct v4l2_requestbuffers,
which is now renamed to ->flags. Unlike v4l2_create_buffers, struct
v4l2_requestbuffers does not have enough reserved room. Therefore for
backward compatibility ->reserved and ->flags were put into anonymous
union.
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
By setting or clearing V4L2_FLAG_MEMORY_NON_CONSISTENT flag
user-space should be able to set or clear queue's NON_CONSISTENT
->dma_attrs. Queue's ->dma_attrs are passed to the underlying
allocator in __vb2_buf_mem_alloc(), so thus user-space is able
to request vb2 buffer's memory to be either consistent (coherent)
or non-consistent.
The patch set also adds a corresponding capability flag:
fill_buf_caps() reports V4L2_BUF_CAP_SUPPORTS_MMAP_CACHE_HINTS
when queue supports user-space cache management hints. Note,
however, that MMAP_CACHE_HINTS capability only valid when the
queue is used for memory MMAP-ed streaming I/O.
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
DIAGNOSE 0x318 (diag318) sets information regarding the environment
the VM is running in (Linux, z/VM, etc) and is observed via
firmware/service events.
This is a privileged s390x instruction that must be intercepted by
SIE. Userspace handles the instruction as well as migration. Data
is communicated via VCPU register synchronization.
The Control Program Name Code (CPNC) is stored in the SIE block. The
CPNC along with the Control Program Version Code (CPVC) are stored
in the kvm_vcpu_arch struct.
This data is reset on load normal and clear resets.
Signed-off-by: Collin Walling <walling@linux.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20200622154636.5499-3-walling@linux.ibm.com
[borntraeger@de.ibm.com: fix sync_reg position]
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Board serial number is a serial number, often available in PCI
*Vital Product Data*.
Also, update devlink-info.rst documentation file.
Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
PCI PF and VF devlink port can manage the function represented by
a devlink port.
Enable users to query port function's hardware address.
Example of a PCI VF port which supports a port function:
$ devlink port show pci/0000:06:00.0/2
pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1
function:
hw_addr 00:11:22:33:44:66
$ devlink port show pci/0000:06:00.0/2 -jp
{
"port": {
"pci/0000:06:00.0/2": {
"type": "eth",
"netdev": "enp6s0pf0vf1",
"flavour": "pcivf",
"pfnum": 0,
"vfnum": 1,
"function": {
"hw_addr": "00:11:22:33:44:66"
}
}
}
}
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Quite a lot of fixes here for no single reason. There's a collection of
the usual sort of device specific fixes and also a bunch of people have
been working on spidev and the userspace test program spidev_test so
they've got an unusually large collection of small fixes.
-----BEGIN PGP SIGNATURE-----
iQFHBAABCgAxFiEEreZoqmdXGLWf4p/qJNaLcl1Uh9AFAl7wl/8THGJyb29uaWVA
a2VybmVsLm9yZwAKCRAk1otyXVSH0NJBB/4oTOxPIfkKQyPcSTQB8FC6AHJ5Z8FT
/4dtK4U32rn5Sms/CHNXMMFxomBgB46Ud19KJ/bEgqugJfI9Vl0bl9fZT+sDyVU3
rFPthVluIprFe1bITyfcgYmjRoznLRdzj9LMFsAWIe3Vmy584TdvXbl8n5COu5Wh
6sR6SuQnEn65x1HC4++3IsG70+ogz81sFRapk/7jXwS+YdwzWIqtN2s2IlUcTmcy
ylijl04XUSyo6e/v/sjZaaRmIz72mW+HbMKW4iC0mGZ6yuACoy7zop9BuwAg6S7z
dFDyqTZ5gq72ZeJc7fuco1y7jqQoKUC1oTRghPlbq2XG3Huta80l415l
=32lr
-----END PGP SIGNATURE-----
Merge tag 'spi-fix-v5.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
"Quite a lot of fixes here for no single reason.
There's a collection of the usual sort of device specific fixes and
also a bunch of people have been working on spidev and the userspace
test program spidev_test so they've got an unusually large collection
of small fixes"
* tag 'spi-fix-v5.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: spidev: fix a potential use-after-free in spidev_release()
spi: spidev: fix a race between spidev_release and spidev_remove
spi: stm32-qspi: Fix error path in case of -EPROBE_DEFER
spi: uapi: spidev: Use TABs for alignment
spi: spi-fsl-dspi: Free DMA memory with matching function
spi: tools: Add macro definitions to fix build errors
spi: tools: Make default_tx/rx and input_tx static
spi: dt-bindings: amlogic, meson-gx-spicc: Fix schema for meson-g12a
spi: rspi: Use requested instead of maximum bit rate
spi: spidev_test: Use %u to format unsigned numbers
spi: sprd: switch the sequence of setting WDG_LOAD_LOW and _HIGH
poll events should be 32-bits to cover EPOLLEXCLUSIVE.
Explicit word-swap the poll32_events for big endian to make sure the ABI
is not changed. We call this feature IORING_FEAT_POLL_32BITS,
applications who want to use EPOLLEXCLUSIVE should check the feature bit
first.
Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
- Fix the visibility of the region 'align' attribute. The new unit tests
for region alignment handling caught a corner case where the alignment
cannot be specified if the region is converted from static to dynamic
provisioning at runtime.
- Add support for device health retrieval for the persistent memory
supported by the papr_scm driver. This includes both the standard
sysfs "health flags" that the nfit persistent memory driver publishes
and a mechanism for the ndctl tool to retrieve a health-command payload.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEf41QbsdZzFdA8EfZHtKRamZ9iAIFAl7tMD8ACgkQHtKRamZ9
iAJNdA//aoULVhq/ZEo+tiPPXwgC9yHPJJ+Oe3+pcV2saWNeSBuHu047IdSQayRi
6VZgsQ3lLRlEmk5aUbK00T84tVtwN20fppw1H4XyqfyGg4hmW7O5HomY8ZyOAETc
0lEJet3+jf4Ym36BxBvfFTU9fOwtvLum5EHUw3Dc1Xl7UZVtp2zb0lklZx5xU/xr
/nKNkPktvj7ENtmIOKegPFC/eSeKtOmD9shzjFygLeG52ZU1n+TEvhm4LCUU7Z+Y
e5s/Ny3LUWzPY+bESXxTI1XpLthWOR6T+yPkNp6JoZHzk3P6vKawMGzRrvI1bUai
hOj7Bf6BeNnKqIXIIZOXO8/ISgveHbnvUp6Kk5Lq5OYzPuGn81zZhen3EfHkB0y2
Dc+xyceDsTeTlu2RGcrGrV60hCo2+5DqSLodu+kUAX6btvhtdrEgt3kmWAZSctk2
7gHy8H+54p5ocXS+eDqafX6qYGmQIdGbtfTtiTuZJKIIH+Dsh9+mcu3KtH1AwtpN
bV+/sbjuWOMmMBNnlVRtqhy1xA9bSVSaodthl+iWpw1nKXFPSKdl3w8ngSeZsfFD
eZnGa4hO5G02s96gozh1n61ye8KFJLk8gCjx100PQKCemvIWTWLIzlh2NjPnCdTx
SAI0vNfJgpmNZfb62NHYAGFwNwvOn4b7j9xFSqtHVh7+Vuyt6JM=
=qVy4
-----END PGP SIGNATURE-----
Merge tag 'libnvdimm-for-5.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm updates from Dan Williams:
"A feature (papr_scm health retrieval) and a fix (sysfs attribute
visibility) for v5.8.
Vaibhav explains in the merge commit below why missing v5.8 would be
painful and I agreed to try a -rc2 pull because only cosmetics kept
this out of -rc1 and his initial versions were posted in more than
enough time for v5.8 consideration:
'These patches are tied to specific features that were committed to
customers in upcoming distros releases (RHEL and SLES) whose
time-lines are tied to 5.8 kernel release.
Being able to track the health of an nvdimm is critical for our
customers that are running workloads leveraging papr-scm nvdimms.
Missing the 5.8 kernel would mean missing the distro timelines and
shifting forward the availability of this feature in distro kernels
by at least 6 months'
Summary:
- Fix the visibility of the region 'align' attribute.
The new unit tests for region alignment handling caught a corner
case where the alignment cannot be specified if the region is
converted from static to dynamic provisioning at runtime.
- Add support for device health retrieval for the persistent memory
supported by the papr_scm driver.
This includes both the standard sysfs "health flags" that the nfit
persistent memory driver publishes and a mechanism for the ndctl
tool to retrieve a health-command payload"
* tag 'libnvdimm-for-5.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
nvdimm/region: always show the 'align' attribute
powerpc/papr_scm: Implement support for PAPR_PDSM_HEALTH
ndctl/papr_scm,uapi: Add support for PAPR nvdimm specific methods
powerpc/papr_scm: Improve error logging and handling papr_scm_ndctl()
powerpc/papr_scm: Fetch nvdimm health information from PHYP
seq_buf: Export seq_buf_printf
powerpc: Document details on H_SCM_HEALTH hcall
ID 1 is already used by the IOVA range capability, use ID 2.
Reported-by: Liu Yi L <yi.l.liu@intel.com>
Fixes: ad721705d0 ("vfio iommu: Add migration capability to report supported features")
Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Replace hardcoded maximum USB string length (126 bytes) by definition
"USB_MAX_STRING_LEN".
Signed-off-by: Macpaul Lin <macpaul.lin@mediatek.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Link: https://lore.kernel.org/r/1592471618-29428-1-git-send-email-macpaul.lin@mediatek.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Several newer USB Device classes are not presently reported individually at
/sys/kernel/debug/usb/devices, (reported as "unk."). This patch adds the
following classes: 0fh (Personal Healthcare devices), 10h (USB Type-C combined
Audio/Video devices) 11h (USB billboard), 12h (USB Type-C Bridge). As defined
at [https://www.usb.org/defined-class-codes]
Corresponding classes defined in include/linux/usb/ch9.h.
Signed-off-by: Rob Gill <rrobgill@protonmail.com>
Link: https://lore.kernel.org/r/20200601211749.6878-1-rrobgill@protonmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Alexei Starovoitov says:
====================
pull-request: bpf 2020-06-17
The following pull-request contains BPF updates for your *net* tree.
We've added 10 non-merge commits during the last 2 day(s) which contain
a total of 14 files changed, 158 insertions(+), 59 deletions(-).
The main changes are:
1) Important fix for bpf_probe_read_kernel_str() return value, from Andrii.
2) [gs]etsockopt fix for large optlen, from Stanislav.
3) devmap allocation fix, from Toke.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
One of the use-cases of close_range() is to drop file descriptors just before
execve(). This would usually be expressed in the sequence:
unshare(CLONE_FILES);
close_range(3, ~0U);
as pointed out by Linus it might be desirable to have this be a part of
close_range() itself under a new flag CLOSE_RANGE_UNSHARE.
This expands {dup,unshare)_fd() to take a max_fds argument that indicates the
maximum number of file descriptors to copy from the old struct files. When the
user requests that all file descriptors are supposed to be closed via
close_range(min, max) then we can cap via unshare_fd(min) and hence don't need
to do any of the heavy fput() work for everything above min.
The patch makes it so that if CLOSE_RANGE_UNSHARE is requested and we do in
fact currently share our file descriptor table we create a new private copy.
We then close all fds in the requested range and finally after we're done we
install the new fd table.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Introduce support for PAPR NVDIMM Specific Methods (PDSM) in papr_scm
module and add the command family NVDIMM_FAMILY_PAPR to the white list
of NVDIMM command sets. Also advertise support for ND_CMD_CALL for the
nvdimm command mask and implement necessary scaffolding in the module
to handle ND_CMD_CALL ioctl and PDSM requests that we receive.
The layout of the PDSM request as we expect from libnvdimm/libndctl is
described in newly introduced uapi header 'papr_pdsm.h' which
defines a 'struct nd_pkg_pdsm' and a maximal union named
'nd_pdsm_payload'. These new structs together with 'struct nd_cmd_pkg'
for a pdsm envelop thats sent by libndctl to libnvdimm and serviced by
papr_scm in 'papr_scm_service_pdsm()'. The PDSM request is
communicated by member 'struct nd_cmd_pkg.nd_command' together with
other information on the pdsm payload (size-in, size-out).
The patch also introduces 'struct pdsm_cmd_desc' instances of which
are stored in an array __pdsm_cmd_descriptors[] indexed with PDSM cmd
and corresponding access function pdsm_cmd_desc() is
introduced. 'struct pdsm_cdm_desc' holds the service function for a
given PDSM and corresponding payload in/out sizes.
A new function papr_scm_service_pdsm() is introduced and is called from
papr_scm_ndctl() in case of a PDSM request is received via ND_CMD_CALL
command from libnvdimm. The function performs validation on the PDSM
payload based on info present in corresponding PDSM descriptor and if
valid calls the 'struct pdcm_cmd_desc.service' function to service the
PDSM.
Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/20200615124407.32596-6-vaibhav@linux.ibm.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Fix definition of bpf_ringbuf_output() in UAPI header comments, which is used
to generate libbpf's bpf_helper_defs.h header. Return value is a number (error
code), not a pointer.
Fixes: 457f44363a ("bpf: Implement BPF ring buffer and verifier support for it")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200615214926.3638836-1-andriin@fb.com
includes the per-inode DAX support, which was dependant on the DAX
infrastructure which came in via the XFS tree, and a number of
regression and bug fixes; most notably the "BUG: using
smp_processor_id() in preemptible code in ext4_mb_new_blocks" reported
by syzkaller.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAl7mgCcACgkQ8vlZVpUN
gaPftwf8C4w/7SG+CYLdwg0d2u9TKk77yDuWaioFHOcMSjZvG4TCSgtMhZxQnyty
9t4yqacILx12pCj/mZnrZp5BOSn9O2ZbuDoXNKNrFXU0BF+CsbnhvJvrrh1j/MUa
PPtcqyGFdOLSDvHSD9xPVT76juwh79aR8vB7qnQXaEO5wcLodZWoqBEFSKCl6Bo8
hjXs1EXidusKsoarQxW6mEITmnhU2S2fuCVDgVcoM/LmKwzbgqvlWrentq9u8qLH
W+XbjWgUtCM1byeDZWqe5FYyyJ8x+dTv7H5an3KR92EN6hKo5AOvzA0I41pZscq/
bJ9p2THDxJQX4rJBevGAS5mZ6hTkRw==
=z6eO
-----END PGP SIGNATURE-----
Merge tag 'ext4-for-linus-5.8-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull more ext4 updates from Ted Ts'o:
"This is the second round of ext4 commits for 5.8 merge window [1].
It includes the per-inode DAX support, which was dependant on the DAX
infrastructure which came in via the XFS tree, and a number of
regression and bug fixes; most notably the "BUG: using
smp_processor_id() in preemptible code in ext4_mb_new_blocks" reported
by syzkaller"
[1] The pull request actually came in 15 minutes after I had tagged the
rc1 release. Tssk, tssk, late.. - Linus
* tag 'ext4-for-linus-5.8-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4, jbd2: ensure panic by fix a race between jbd2 abort and ext4 error handlers
ext4: support xattr gnu.* namespace for the Hurd
ext4: mballoc: Use this_cpu_read instead of this_cpu_ptr
ext4: avoid utf8_strncasecmp() with unstable name
ext4: stop overwrite the errcode in ext4_setup_super
ext4: fix partial cluster initialization when splitting extent
ext4: avoid race conditions when remounting with options that change dax
Documentation/dax: Update DAX enablement for ext4
fs/ext4: Introduce DAX inode flag
fs/ext4: Remove jflag variable
fs/ext4: Make DAX mount option a tri-state
fs/ext4: Only change S_DAX on inode load
fs/ext4: Update ext4_should_use_dax()
fs/ext4: Change EXT4_MOUNT_DAX to EXT4_MOUNT_DAX_ALWAYS
fs/ext4: Disallow verity if inode is DAX
fs/ext4: Narrow scope of DAX check in setflags
The UAPI <linux/spi/spidev.h> uses TABs for alignment.
Convert the recently introduced spaces to TABs to restore consistency.
Fixes: 7bb64402a0 ("spi: tools: Add macro definitions to fix build errors")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/r/20200613073755.15906-1-geert+renesas@glider.be
Signed-off-by: Mark Brown <broonie@kernel.org>
Symbols are needed for tools to describe instruction addresses. Pages
allocated for ftrace's purposes need symbols to be created for them.
Add such symbols to be visible via perf ksymbol events.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200512121922.8997-8-adrian.hunter@intel.com
Symbols are needed for tools to describe instruction addresses. Pages
allocated for kprobe's purposes need symbols to be created for them.
Add such symbols to be visible via perf ksymbol events.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Link: https://lkml.kernel.org/r/20200512121922.8997-5-adrian.hunter@intel.com
Record (single instruction) changes to the kernel text (i.e.
self-modifying code) in order to support tracers like Intel PT and
ARM CoreSight.
A copy of the running kernel code is needed as a reference point (e.g.
from /proc/kcore). The text poke event records the old bytes and the
new bytes so that the event can be processed forwards or backwards.
The basic problem is recording the modified instruction in an
unambiguous manner given SMP instruction cache (in)coherence. That is,
when modifying an instruction concurrently any solution with one or
multiple timestamps is not sufficient:
CPU0 CPU1
0
1 write insn A
2 execute insn A
3 sync-I$
4
Due to I$, CPU1 might execute either the old or new A. No matter where
we record tracepoints on CPU0, one simply cannot tell what CPU1 will
have observed, except that at 0 it must be the old one and at 4 it
must be the new one.
To solve this, take inspiration from x86 text poking, which has to
solve this exact problem due to variable length instruction encoding
and I-fetch windows.
1) overwrite the instruction with a breakpoint and sync I$
This guarantees that that code flow will never hit the target
instruction anymore, on any CPU (or rather, it will cause an
exception).
2) issue the TEXT_POKE event
3) overwrite the breakpoint with the new instruction and sync I$
Now we know that any execution after the TEXT_POKE event will either
observe the breakpoint (and hit the exception) or the new instruction.
So by guarding the TEXT_POKE event with an exception on either side;
we can now tell, without doubt, which instruction another CPU will
have observed.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200512121922.8997-2-adrian.hunter@intel.com
Pull networking fixes from David Miller:
1) Fix cfg80211 deadlock, from Johannes Berg.
2) RXRPC fails to send norigications, from David Howells.
3) MPTCP RM_ADDR parsing has an off by one pointer error, fix from
Geliang Tang.
4) Fix crash when using MSG_PEEK with sockmap, from Anny Hu.
5) The ucc_geth driver needs __netdev_watchdog_up exported, from
Valentin Longchamp.
6) Fix hashtable memory leak in dccp, from Wang Hai.
7) Fix how nexthops are marked as FDB nexthops, from David Ahern.
8) Fix mptcp races between shutdown and recvmsg, from Paolo Abeni.
9) Fix crashes in tipc_disc_rcv(), from Tuong Lien.
10) Fix link speed reporting in iavf driver, from Brett Creeley.
11) When a channel is used for XSK and then reused again later for XSK,
we forget to clear out the relevant data structures in mlx5 which
causes all kinds of problems. Fix from Maxim Mikityanskiy.
12) Fix memory leak in genetlink, from Cong Wang.
13) Disallow sockmap attachments to UDP sockets, it simply won't work.
From Lorenz Bauer.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (83 commits)
net: ethernet: ti: ale: fix allmulti for nu type ale
net: ethernet: ti: am65-cpsw-nuss: fix ale parameters init
net: atm: Remove the error message according to the atomic context
bpf: Undo internal BPF_PROBE_MEM in BPF insns dump
libbpf: Support pre-initializing .bss global variables
tools/bpftool: Fix skeleton codegen
bpf: Fix memlock accounting for sock_hash
bpf: sockmap: Don't attach programs to UDP sockets
bpf: tcp: Recv() should return 0 when the peer socket is closed
ibmvnic: Flush existing work items before device removal
genetlink: clean up family attributes allocations
net: ipa: header pad field only valid for AP->modem endpoint
net: ipa: program upper nibbles of sequencer type
net: ipa: fix modem LAN RX endpoint id
net: ipa: program metadata mask differently
ionic: add pcie_print_link_status
rxrpc: Fix race between incoming ACK parser and retransmitter
net/mlx5: E-Switch, Fix some error pointer dereferences
net/mlx5: Don't fail driver on failure to create debugfs
net/mlx5e: CT: Fix ipv6 nat header rewrite actions
...
Alexei Starovoitov says:
====================
pull-request: bpf 2020-06-12
The following pull-request contains BPF updates for your *net* tree.
We've added 26 non-merge commits during the last 10 day(s) which contain
a total of 27 files changed, 348 insertions(+), 93 deletions(-).
The main changes are:
1) sock_hash accounting fix, from Andrey.
2) libbpf fix and probe_mem sanitizing, from Andrii.
3) sock_hash fixes, from Jakub.
4) devmap_val fix, from Jesper.
5) load_bytes_relative fix, from YiFei.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEqG5UsNXhtOCrfGQP+7dXa6fLC2sFAl7U/i8ACgkQ+7dXa6fL
C2u2eg/+Oy6ybq0hPovYVkFI9WIG7ZCz7w9Q6BEnfYMqqn3dnfJxKQ3l4pnQEOWw
f4QfvpvevsYfMtOJkYcG6s66rQgbFdqc5TEyBBy0QNp3acRolN7IXkcopvv9xOpQ
JxedpbFG1PTFLWjvBpyjlrUPouwLzq2FXAf1Ox0ZIMw6165mYOMWoli1VL8dh0A0
Ai7JUB0WrvTNbrwhV413obIzXT/rPCdcrgbQcgrrLPex8lQ47ZAE9bq6k4q5HiwK
KRzEqkQgnzId6cCNTFBfkTWsx89zZunz7jkfM5yx30MvdAtPSxvvpfIPdZRZkXsP
E2K9Fk1/6OQZTC0Op3Pi/bt+hVG/mD1p0sQUDgo2MO3qlSS+5mMkR8h3mJEgwK12
72P4YfOJkuAy2z3v4lL0GYdUDAZY6i6G8TMxERKu/a9O3VjTWICDOyBUS6F8YEAK
C7HlbZxAEOKTVK0BTDTeEUBwSeDrBbvH6MnRlZCG5g1Fos2aWP0udhjiX8IfZLO7
GN6nWBvK1fYzfsUczdhgnoCzQs3suoDo04HnsTPGJ8De52T4x2RsjV+gPx0nrNAq
eWChl1JvMWsY2B3GLnl9XQz4NNN+EreKEkk+PULDGllrArrPsp5Vnhb9FJO1PVCU
hMDJHohPiXnKbc8f4Bd78OhIvnuoGfJPdM5MtNe2flUKy2a2ops=
=YTGf
-----END PGP SIGNATURE-----
Merge tag 'notifications-20200601' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
Pull notification queue from David Howells:
"This adds a general notification queue concept and adds an event
source for keys/keyrings, such as linking and unlinking keys and
changing their attributes.
Thanks to Debarshi Ray, we do have a pull request to use this to fix a
problem with gnome-online-accounts - as mentioned last time:
https://gitlab.gnome.org/GNOME/gnome-online-accounts/merge_requests/47
Without this, g-o-a has to constantly poll a keyring-based kerberos
cache to find out if kinit has changed anything.
[ There are other notification pending: mount/sb fsinfo notifications
for libmount that Karel Zak and Ian Kent have been working on, and
Christian Brauner would like to use them in lxc, but let's see how
this one works first ]
LSM hooks are included:
- A set of hooks are provided that allow an LSM to rule on whether or
not a watch may be set. Each of these hooks takes a different
"watched object" parameter, so they're not really shareable. The
LSM should use current's credentials. [Wanted by SELinux & Smack]
- A hook is provided to allow an LSM to rule on whether or not a
particular message may be posted to a particular queue. This is
given the credentials from the event generator (which may be the
system) and the watch setter. [Wanted by Smack]
I've provided SELinux and Smack with implementations of some of these
hooks.
WHY
===
Key/keyring notifications are desirable because if you have your
kerberos tickets in a file/directory, your Gnome desktop will monitor
that using something like fanotify and tell you if your credentials
cache changes.
However, we also have the ability to cache your kerberos tickets in
the session, user or persistent keyring so that it isn't left around
on disk across a reboot or logout. Keyrings, however, cannot currently
be monitored asynchronously, so the desktop has to poll for it - not
so good on a laptop. This facility will allow the desktop to avoid the
need to poll.
DESIGN DECISIONS
================
- The notification queue is built on top of a standard pipe. Messages
are effectively spliced in. The pipe is opened with a special flag:
pipe2(fds, O_NOTIFICATION_PIPE);
The special flag has the same value as O_EXCL (which doesn't seem
like it will ever be applicable in this context)[?]. It is given up
front to make it a lot easier to prohibit splice&co from accessing
the pipe.
[?] Should this be done some other way? I'd rather not use up a new
O_* flag if I can avoid it - should I add a pipe3() system call
instead?
The pipe is then configured::
ioctl(fds[1], IOC_WATCH_QUEUE_SET_SIZE, queue_depth);
ioctl(fds[1], IOC_WATCH_QUEUE_SET_FILTER, &filter);
Messages are then read out of the pipe using read().
- It should be possible to allow write() to insert data into the
notification pipes too, but this is currently disabled as the
kernel has to be able to insert messages into the pipe *without*
holding pipe->mutex and the code to make this work needs careful
auditing.
- sendfile(), splice() and vmsplice() are disabled on notification
pipes because of the pipe->mutex issue and also because they
sometimes want to revert what they just did - but one or more
notification messages might've been interleaved in the ring.
- The kernel inserts messages with the wait queue spinlock held. This
means that pipe_read() and pipe_write() have to take the spinlock
to update the queue pointers.
- Records in the buffer are binary, typed and have a length so that
they can be of varying size.
This allows multiple heterogeneous sources to share a common
buffer; there are 16 million types available, of which I've used
just a few, so there is scope for others to be used. Tags may be
specified when a watchpoint is created to help distinguish the
sources.
- Records are filterable as types have up to 256 subtypes that can be
individually filtered. Other filtration is also available.
- Notification pipes don't interfere with each other; each may be
bound to a different set of watches. Any particular notification
will be copied to all the queues that are currently watching for it
- and only those that are watching for it.
- When recording a notification, the kernel will not sleep, but will
rather mark a queue as having lost a message if there's
insufficient space. read() will fabricate a loss notification
message at an appropriate point later.
- The notification pipe is created and then watchpoints are attached
to it, using one of:
keyctl_watch_key(KEY_SPEC_SESSION_KEYRING, fds[1], 0x01);
watch_mount(AT_FDCWD, "/", 0, fd, 0x02);
watch_sb(AT_FDCWD, "/mnt", 0, fd, 0x03);
where in both cases, fd indicates the queue and the number after is
a tag between 0 and 255.
- Watches are removed if either the notification pipe is destroyed or
the watched object is destroyed. In the latter case, a message will
be generated indicating the enforced watch removal.
Things I want to avoid:
- Introducing features that make the core VFS dependent on the
network stack or networking namespaces (ie. usage of netlink).
- Dumping all this stuff into dmesg and having a daemon that sits
there parsing the output and distributing it as this then puts the
responsibility for security into userspace and makes handling
namespaces tricky. Further, dmesg might not exist or might be
inaccessible inside a container.
- Letting users see events they shouldn't be able to see.
TESTING AND MANPAGES
====================
- The keyutils tree has a pipe-watch branch that has keyctl commands
for making use of notifications. Proposed manual pages can also be
found on this branch, though a couple of them really need to go to
the main manpages repository instead.
If the kernel supports the watching of keys, then running "make
test" on that branch will cause the testing infrastructure to spawn
a monitoring process on the side that monitors a notifications pipe
for all the key/keyring changes induced by the tests and they'll
all be checked off to make sure they happened.
https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/keyutils.git/log/?h=pipe-watch
- A test program is provided (samples/watch_queue/watch_test) that
can be used to monitor for keyrings, mount and superblock events.
Information on the notifications is simply logged to stdout"
* tag 'notifications-20200601' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
smack: Implement the watch_key and post_notification hooks
selinux: Implement the watch_key security hook
keys: Make the KEY_NEED_* perms an enum rather than a mask
pipe: Add notification lossage handling
pipe: Allow buffers to be marked read-whole-or-error for notifications
Add sample notification program
watch_queue: Add a key/keyring notification facility
security: Add hooks to rule on setting a watch
pipe: Add general notification queue support
pipe: Add O_NOTIFICATION_PIPE
security: Add a hook for the point of notification insertion
uapi: General notification queue definitions
Add SPI_TX_OCTAL and SPI_RX_OCTAL to fix the following build errors:
CC spidev_test.o
spidev_test.c: In function ‘transfer’:
spidev_test.c:131:13: error: ‘SPI_TX_OCTAL’ undeclared (first use in this function)
if (mode & SPI_TX_OCTAL)
^
spidev_test.c:131:13: note: each undeclared identifier is reported only once for each function it appears in
spidev_test.c:137:13: error: ‘SPI_RX_OCTAL’ undeclared (first use in this function)
if (mode & SPI_RX_OCTAL)
^
spidev_test.c: In function ‘parse_opts’:
spidev_test.c:290:12: error: ‘SPI_TX_OCTAL’ undeclared (first use in this function)
mode |= SPI_TX_OCTAL;
^
spidev_test.c:308:12: error: ‘SPI_RX_OCTAL’ undeclared (first use in this function)
mode |= SPI_RX_OCTAL;
^
LD spidev_test-in.o
ld: cannot find spidev_test.o: No such file or directory
Additionally, maybe SPI_CS_WORD and SPI_3WIRE_HIZ will be used in the future,
so add them too.
Fixes: 896fa73508 ("spi: spidev_test: Add support for Octal mode data transfers")
Signed-off-by: Qing Zhang <zhangqing@loongson.cn>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/r/1591880212-13479-2-git-send-email-zhangqing@loongson.cn
Signed-off-by: Mark Brown <broonie@kernel.org>
This adds the same per-file/per-directory DAX support for ext4 as was
done for xfs, now that we finally have consensus over what the
interface should be.
virtio-mem
doorbell mapping for vdpa
config interrupt support in ifc
fixes all over the place
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAl7fZ6APHG1zdEByZWRo
YXQuY29tAAoJECgfDbjSjVRpkDoIAMcBcQx5su1iuX7vT35xzUWZO478eAf1jOMZ
7KxKUVBeztkcxVFUlRVRu9MR6wOzwHils+1HD6025775Smr5M6x3aJxR6xOORaBj
RoU6OVGkpDvbzsxlhW+xhONz4O7/RkveKJPCwzGjqHrsFeh92lkfTqroz/EuNpw+
LZsO0+DhdUf123HbwHQp5lxW8EjyrRabgeZZg/D9VLPhoCP88vCjRhBXU2GPuaUl
/UNXsQafn4xUgrxPaoN5f4Phn/P46NNrbZ1jmlkw/z/3QhF/DhktGXGaZsIHDCN/
vicUii0or5QLeBsZpMbKko/BIe2xWHxFjkMRhMOMZOfcBb6sMBI=
=auUa
-----END PGP SIGNATURE-----
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio updates from Michael Tsirkin:
- virtio-mem: paravirtualized memory hotplug
- support doorbell mapping for vdpa
- config interrupt support in ifc
- fixes all over the place
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (40 commits)
vhost/test: fix up after API change
virtio_mem: convert device block size into 64bit
virtio-mem: drop unnecessary initialization
ifcvf: implement config interrupt in IFCVF
vhost: replace -1 with VHOST_FILE_UNBIND in ioctls
vhost_vdpa: Support config interrupt in vdpa
ifcvf: ignore continuous setting same status value
virtio-mem: Don't rely on implicit compiler padding for requests
virtio-mem: Try to unplug the complete online memory block first
virtio-mem: Use -ETXTBSY as error code if the device is busy
virtio-mem: Unplug subblocks right-to-left
virtio-mem: Drop manual check for already present memory
virtio-mem: Add parent resource for all added "System RAM"
virtio-mem: Better retry handling
virtio-mem: Offline and remove completely unplugged memory blocks
mm/memory_hotplug: Introduce offline_and_remove_memory()
virtio-mem: Allow to offline partially unplugged memory blocks
mm: Allow to offline unmovable PageOffline() pages via MEM_GOING_OFFLINE
virtio-mem: Paravirtualized memory hotunplug part 2
virtio-mem: Paravirtualized memory hotunplug part 1
...
V2:
- Defer changing BPF-syscall to start at file-descriptor 1
- Use {} to zero initialise struct.
The recent commit fbee97feed ("bpf: Add support to attach bpf program to a
devmap entry"), introduced ability to attach (and run) a separate XDP
bpf_prog for each devmap entry. A bpf_prog is added via a file-descriptor.
As zero were a valid FD, not using the feature requires using value minus-1.
The UAPI is extended via tail-extending struct bpf_devmap_val and using
map->value_size to determine the feature set.
This will break older userspace applications not using the bpf_prog feature.
Consider an old userspace app that is compiled against newer kernel
uapi/bpf.h, it will not know that it need to initialise the member
bpf_prog.fd to minus-1. Thus, users will be forced to update source code to
get program running on newer kernels.
This patch remove the minus-1 checks, and have zero mean feature isn't used.
Followup patches either for kernel or libbpf should handle and avoid
returning file-descriptor zero in the first place.
Fixes: fbee97feed ("bpf: Add support to attach bpf program to a devmap entry")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/159170950687.2102545.7235914718298050113.stgit@firesoul
If subblock size is large (e.g. 1G) 32 bit math involving it
can overflow. Rather than try to catch all instances of that,
let's tweak block size to 64 bit.
It ripples through UAPI which is an ABI change, but it's not too late to
make it, and it will allow supporting >4Gbyte blocks while might
become necessary down the road.
Fixes: 5f1f79bbc9 ("virtio-mem: Paravirtualized memory hotplug")
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
* fix the deadlock on rfkill/wireless removal that a few
people reported
* fix an uninitialized variable
* update wiki URLs
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEH1e1rEeCd0AIMq6MB8qZga/fl8QFAl7d8c0ACgkQB8qZga/f
l8SRzg//ZTtHTKOsfZ2IpsAmExkQ+1ZdsGHAGfkgDLQz4rvv1Lug7TvrFPiSyHSm
jwLlRQNsQ5+Cv2CRY3Xm7Qf8j9wBavYnfHhJkoTrnD3Z770KUS+BXBYb31+Odkxv
CzsR1GZYTWdYhCrzVIyE+GkQmW2pZ3L8U7ODioM7ETYaK0gAjmCb/HXLiX/m8cGa
O0uUlJZqE57Trfy5p+WO7cQOLJ9v6WXgSCcrDCNb9Ek25wg5J6RVOMEm7w6oV8oC
F8uZyVXPC0fSblzHC4cch0yX3z4YIuD12BVZBOVDLJKQBZwqohtxd0jT4MNHJB2y
BflU13M2kW5pw3l+cBPLZFOsURcDmOcBo9pNYCbi7Uxsd5Hvgft039jeXpukI3QW
e3d50KB0gSE/plOgXShPVSvm4eQ7WGS3Vyv2IfmU3dY6mxLv7kazSOErFD+fxUMy
vtdVN/Ie9XyRbh30n5MfTrE3PIf6k7XI3zirZrpMMNfu9fw4a3DQycqoZRBOoU1Y
l4ThlIduREp+wr14OnF2ueaho9hxVRxh+gnfuhWbzI8VKLHBCVOKe/MsTXzxg5OB
8xSA9Q1xo/bv+VymaQrY6ENG39sDZB+uI5fi0hnQ2Fu7BHPgp/Juzb56nQ/bWrfG
DOItqu5PoejvwMP+ju43i8oUDdqjlNgHhwDze+nCHiSnHUf+yWE=
=wP9I
-----END PGP SIGNATURE-----
Merge tag 'mac80211-for-davem-2020-06-08' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
Johannes Berg says:
====================
Just a small update:
* fix the deadlock on rfkill/wireless removal that a few
people reported
* fix an uninitialized variable
* update wiki URLs
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
- An iopen glock locking scheme rework that speeds up deletes of
inodes accessed from multiple nodes.
- Various bug fixes and debugging improvements.
- Convert gfs2-glocks.txt to ReST.
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEEJZs3krPW0xkhLMTc1b+f6wMTZToFAl7eYjMUHGFncnVlbmJh
QHJlZGhhdC5jb20ACgkQ1b+f6wMTZTr/9g//cJ6jgiD/+qzh0VzougVksVZIduAl
RMB+kldOjBS2ORbyYM87Jm1tdyakgZuFO91XlSwChWRdC3Y2mqdaJIEE/kATqfY9
7Frlw++SyFKLvIf04kDYGk2hXX+umXXYFfrIiKb0tzDSGkPRaARUb3RM4TRvlSDP
/U0JlYA/4aXMUge+VpYsbpSGeqHNfEzmcmCyPXGmZYyh1MZ/RocMZFYEsP9NH82J
l07fxowUd10LJPEmBajzjD2NmEvjdvF4gBCOfJVNIfOzCj0CwXL3vmtu1SUNOKr+
em266EWZF89eOcvfdtE6xF0w81oCAK43wYRjIODSgI9JCLXGiOYmlWZVwZoqu5iy
2GQDhj/taq3ObuVqjR5n6GYuqMoJ+D0LSD13qccDALq/Bdy4lq9TMLSdDbkrVIM/
8BVn0nI5MzUlTV3mq6uxhU0HqtYDwUEiHWURWw6bYRug5OvQy3nbg/XZptYlnD87
XQccE4ErjlgSHLiYx1YckFz/GG6ytrRAKl9bGMkZ0u2+XmDsH+iJJgLcaXCPUP9h
/hhYagKI55UBDer7we4tppbu+gnJrg3PXlgImf53iMc7ia0KQHd+TfSIFkGPuydI
aEKKhIQzje23JayMbPRnwPbNlw9zU1loPi7hPS3rCpDY+w8oawpFyIieEOcxJyEt
pYkOt4BQi9LvpGc=
=PsCY
-----END PGP SIGNATURE-----
Merge tag 'gfs2-for-5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2
Pull gfs2 updates from Andreas Gruenbacher:
- An iopen glock locking scheme rework that speeds up deletes of inodes
accessed from multiple nodes
- Various bug fixes and debugging improvements
- Convert gfs2-glocks.txt to ReST
* tag 'gfs2-for-5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
gfs2: fix use-after-free on transaction ail lists
gfs2: new slab for transactions
gfs2: initialize transaction tr_ailX_lists earlier
gfs2: Smarter iopen glock waiting
gfs2: Wake up when setting GLF_DEMOTE
gfs2: Check inode generation number in delete_work_func
gfs2: Move inode generation number check into gfs2_inode_lookup
gfs2: Minor gfs2_lookup_by_inum cleanup
gfs2: Try harder to delete inodes locally
gfs2: Give up the iopen glock on contention
gfs2: Turn gl_delete into a delayed work
gfs2: Keep track of deleted inode generations in LVBs
gfs2: Allow ASPACE glocks to also have an lvb
gfs2: instrumentation wrt log_flush stuck
gfs2: introduce new gfs2_glock_assert_withdraw
gfs2: print mapping->nrpages in glock dump for address space glocks
gfs2: Only do glock put in gfs2_create_inode for free inodes
gfs2: Allow lock_nolock mount to specify jid=X
gfs2: Don't ignore inode write errors during inode_go_sync
docs: filesystems: convert gfs2-glocks.txt to ReST
- Add support for multi-function devices in pci code.
- Enable PF-VF linking for architectures using the
pdev->no_vf_scan flag (currently just s390).
- Add reipl from NVMe support.
- Get rid of critical section cleanup in entry.S.
- Refactor PNSO CHSC (perform network subchannel operation) in cio
and qeth.
- QDIO interrupts and error handling fixes and improvements, more
refactoring changes.
- Align ioremap() with generic code.
- Accept requests without the prefetch bit set in vfio-ccw.
- Enable path handling via two new regions in vfio-ccw.
- Other small fixes and improvements all over the code.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEE3QHqV+H2a8xAv27vjYWKoQLXFBgFAl7eVGcACgkQjYWKoQLX
FBhweQgAkicvx31x230rdfG+jQkQkl0UqF99vvWrJHEll77SqadfjzKAGIjUB+K0
EoeHVD5Wcj7BogDGcyHeQ0bZpu4WzE+y1nmnrsvu7TEEvcBmkJH0rF2jF+y0sb/O
3qvwFkX/CB5OqaMzKC/AEeRpcCKR+ZUXkWu1irbYth7CBXaycD9EAPc4cj8CfYGZ
r5njUdYOVk77TaO4aV+t5pCYc5TCRJaWXSsWaAv/nuLcIqsFBYOy2q+L47zITGXp
utZVanIDjzx+ikpaKicOIfC3hJsRuNX9MnlZKsQFwpVEZAUZmIUm29XdhGJTWSxU
RV7m1ORINbFP1nGAqWqkOvGo/LC0ZA==
=VhXR
-----END PGP SIGNATURE-----
Merge tag 's390-5.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Vasily Gorbik:
- Add support for multi-function devices in pci code.
- Enable PF-VF linking for architectures using the pdev->no_vf_scan
flag (currently just s390).
- Add reipl from NVMe support.
- Get rid of critical section cleanup in entry.S.
- Refactor PNSO CHSC (perform network subchannel operation) in cio and
qeth.
- QDIO interrupts and error handling fixes and improvements, more
refactoring changes.
- Align ioremap() with generic code.
- Accept requests without the prefetch bit set in vfio-ccw.
- Enable path handling via two new regions in vfio-ccw.
- Other small fixes and improvements all over the code.
* tag 's390-5.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (52 commits)
vfio-ccw: make vfio_ccw_regops variables declarations static
vfio-ccw: Add trace for CRW event
vfio-ccw: Wire up the CRW irq and CRW region
vfio-ccw: Introduce a new CRW region
vfio-ccw: Refactor IRQ handlers
vfio-ccw: Introduce a new schib region
vfio-ccw: Refactor the unregister of the async regions
vfio-ccw: Register a chp_event callback for vfio-ccw
vfio-ccw: Introduce new helper functions to free/destroy regions
vfio-ccw: document possible errors
vfio-ccw: Enable transparent CCW IPL from DASD
s390/pci: Log new handle in clp_disable_fh()
s390/cio, s390/qeth: cleanup PNSO CHSC
s390/qdio: remove q->first_to_kick
s390/qdio: fix up qdio_start_irq() kerneldoc
s390: remove critical section cleanup from entry.S
s390: add machine check SIGP
s390/pci: ioremap() align with generic code
s390/ap: introduce new ap function ap_get_qdev()
Documentation/s390: Update / remove developerWorks web links
...
Including:
- A big part of this is a change in how devices get connected to
IOMMUs in the core code. It contains the change from the old
add_device()/remove_device() to the new
probe_device()/release_device() call-backs. As a result
functionality that was previously in the IOMMU drivers has
been moved to the IOMMU core code, including IOMMU group
allocation for each device.
The reason for this change was to get more robust allocation
of default domains for the iommu groups.
A couple of fixes were necessary after this was merged into
the IOMMU tree, but there are no known bugs left. The last fix
is applied on-top of the merge commit for the topic branches.
- Removal of the driver private domain handling in the Intel
VT-d driver. This was fragile code and I am glad it is gone
now.
- More Intel VT-d updates from Lu Baolu:
- Nested Shared Virtual Addressing (SVA) support to the
Intel VT-d driver
- Replacement of the Intel SVM interfaces to the common
IOMMU SVA API
- SVA Page Request draining support
- ARM-SMMU Updates from Will:
- Avoid mapping reserved MMIO space on SMMUv3, so that
it can be claimed by the PMU driver
- Use xarray to manage ASIDs on SMMUv3
- Reword confusing shutdown message
- DT compatible string updates
- Allow implementations to override the default domain
type
- A new IOMMU driver for the Allwinner Sun50i platform
- Support for ATS gets disabled for untrusted devices (like
Thunderbolt devices). This includes a PCI patch, acked by
Bjorn.
- Some cleanups to the AMD IOMMU driver to make more use of
IOMMU core features.
- Unification of some printk formats in the Intel and AMD IOMMU
drivers and in the IOVA code.
- Updates for DT bindings
- A number of smaller fixes and cleanups.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAl7eX5gACgkQK/BELZcB
GuOMMQ//Si8h3uC4QhTmeNM6OwYpTcImMuCtqOebVDOJYWfbjGb4U2ZvDSUu4r7u
KGj66pWBq9kciKaM5HcLnWNg4iNNG+iZHwYSOy2DAOdPorWh40aM/Obozdd4D4eK
sXt4uy1JEQem/Bm4eTwmvaJV5/riyK6xn1HVocPejstGSJCh4kal/bYuhj415qEa
LLrN0AcitoPaSRl4Pl7/wEtesk+Az0g94jY9qDhtxIQJXWlAwO25s+rIPy4S7QuW
WAFGU+Xp+J7WC3hQm6nHKQtURIqPHtqozT9Flws9YETuyeKwn47GRitMiAXZsy7R
t+kj1cHyglEhe2hdPnJBSFIjyrO3cCrV7CUVryJHigPCQOaQLjoEegThQCYU3VQu
FPRBX+bp4haHeo3BCBy2jQv4JZrPFkTVXeVEtpMRDOoJLb2OKaI34xbOvGy6dMM0
dFtpbAW2IjHuneJaQCbJIC+jaEYii8mr3Zwok4LS8u8Sy+7PPSKmt6Tti3enD8+C
pBB/0CxNJvQFhl13s6oI8NHTT9D6cPTbjxc2Gfc3UuKyyWsz+eR54gRhaBi0FypA
p6syMosNVjjOaHFd5K5gsbpUFCC3X/drIhqeXRLgQ51mqfkNZMuBBtiyLWTk7iJd
CK+1f2aqtBrpUdSNjTzE/XmR+AhjIn2oIcG/7jPCgYXQoSGM2Sg=
=a4z4
-----END PGP SIGNATURE-----
Merge tag 'iommu-updates-v5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull iommu updates from Joerg Roedel:
"A big part of this is a change in how devices get connected to IOMMUs
in the core code. It contains the change from the old add_device() /
remove_device() to the new probe_device() / release_device()
call-backs.
As a result functionality that was previously in the IOMMU drivers has
been moved to the IOMMU core code, including IOMMU group allocation
for each device. The reason for this change was to get more robust
allocation of default domains for the iommu groups.
A couple of fixes were necessary after this was merged into the IOMMU
tree, but there are no known bugs left. The last fix is applied on-top
of the merge commit for the topic branches.
Other than that change, we have:
- Removal of the driver private domain handling in the Intel VT-d
driver. This was fragile code and I am glad it is gone now.
- More Intel VT-d updates from Lu Baolu:
- Nested Shared Virtual Addressing (SVA) support to the Intel VT-d
driver
- Replacement of the Intel SVM interfaces to the common IOMMU SVA
API
- SVA Page Request draining support
- ARM-SMMU Updates from Will:
- Avoid mapping reserved MMIO space on SMMUv3, so that it can be
claimed by the PMU driver
- Use xarray to manage ASIDs on SMMUv3
- Reword confusing shutdown message
- DT compatible string updates
- Allow implementations to override the default domain type
- A new IOMMU driver for the Allwinner Sun50i platform
- Support for ATS gets disabled for untrusted devices (like
Thunderbolt devices). This includes a PCI patch, acked by Bjorn.
- Some cleanups to the AMD IOMMU driver to make more use of IOMMU
core features.
- Unification of some printk formats in the Intel and AMD IOMMU
drivers and in the IOVA code.
- Updates for DT bindings
- A number of smaller fixes and cleanups.
* tag 'iommu-updates-v5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (109 commits)
iommu: Check for deferred attach in iommu_group_do_dma_attach()
iommu/amd: Remove redundant devid checks
iommu/amd: Store dev_data as device iommu private data
iommu/amd: Merge private header files
iommu/amd: Remove PD_DMA_OPS_MASK
iommu/amd: Consolidate domain allocation/freeing
iommu/amd: Free page-table in protection_domain_free()
iommu/amd: Allocate page-table in protection_domain_init()
iommu/amd: Let free_pagetable() not rely on domain->pt_root
iommu/amd: Unexport get_dev_data()
iommu/vt-d: Fix compile warning
iommu/vt-d: Remove real DMA lookup in find_domain
iommu/vt-d: Allocate domain info for real DMA sub-devices
iommu/vt-d: Only clear real DMA device's context entries
iommu: Remove iommu_sva_ops::mm_exit()
uacce: Remove mm_exit() op
iommu/sun50i: Constify sun50i_iommu_ops
iommu/hyper-v: Constify hyperv_ir_domain_ops
iommu/vt-d: Use pci_ats_supported()
iommu/arm-smmu-v3: Use pci_ats_supported()
...
The wiki url is still the old "wireless.kernel.org"
instead of the new "wireless.wiki.kernel.org"
Signed-off-by: Flavio Suligoi <f.suligoi@asem.it>
Link: https://lore.kernel.org/r/20200605154112.16277-9-f.suligoi@asem.it
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Subsystem:
- new VL flag for backup switch over
Drivers:
- ingenic: only support device tree
- pcf2127: report battery switch over, handle nowayout
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEycoQi/giopmpPgB12wIijOdRNOUFAl7dWSkACgkQ2wIijOdR
NOXGbQ//cSTxUJbuYBNi/VCV7J3/khGlyoQQqDsru/tzuEwXHGBoG2LRNQMOauWd
2Osg61VQj4IY+WCqp4+ivn5H0K26y1PPKkt+UmrlRgkl0eeDFWmY4ejpziZ85D7Z
kDlzcUi3YWkd6m4YSJJrtdCcKljBMIEXb/PEKKK9y6dkrcG5990N8JchpmkCzrjx
fTPVIOfxu43msDc5b8egUDzPYnNbFw3ERAeasr6/EGTz+ksCspXtvWDk/mJzum0G
FiermTkO499Dr66Nf0AS3ex9SvEoqH+kd9KA1CKii5OlYEl7K9sI+eSmTQ1EutZO
L5WAvvQdW8UkARo6R4HAobhwK27pL+wpzUljbyXxt940/RTeqp82kl7rnH+0ihU7
tTbR2Vu+uwWrfQbPkCCj0TJmqIHgam5/Vhn1+ZR2f4U2JIlPvvHoLRVKO0oP7XKK
1ZDcP8zc9V2LQ2G2M1/ec6eOmoGW3EZDnKp4hcv9mnEiePSvVn04t5sa83NjNs4R
e+awVY1x5pFwoXu99gjlfQTV2kTyaA7Jywp6gIO7BKaw/Ci3+d3tlpowfsDH+UVI
WwKxNNqmuNXqoIep0zqUhqXHNIizKxGEk8wE4mr8HP2SlGJ+lUHAyrTTdpLeinN1
5qTEPT3BhjExSFfDZQyWV3+CzKMvxtfFA4/Ca/0iSoaqzMZpm1E=
=dsKr
-----END PGP SIGNATURE-----
Merge tag 'rtc-5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux
Pull RTC updates from Alexandre Belloni:
"Not much this cycle apart from the ingenic rtc driver rework.
The fixes are mainly minor issues reported by coccinelle rather than
real world issues.
Subsystem:
- new VL flag for backup switch over
Drivers:
- ingenic: only support device tree
- pcf2127: report battery switch over, handle nowayout"
* tag 'rtc-5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: (29 commits)
rtc: pcf2127: watchdog: handle nowayout feature
rtc: fsl-ftm-alarm: fix freeze(s2idle) failed to wake
rtc: abx80x: Provide debug feedback for invalid dt properties
rtc: abx80x: Add Device Tree matching table
rtc: rv3028: Add missed check for devm_regmap_init_i2c()
rtc: mpc5121: Use correct return value for mpc5121_rtc_probe()
rtc: goldfish: Use correct return value for goldfish_rtc_probe()
rtc: snvs: Add necessary clock operations for RTC APIs
rtc: snvs: Make SNVS clock always prepared
rtc: ingenic: Reset regulator register in probe
rtc: ingenic: Fix masking of error code
rtc: ingenic: Remove unused fields from private structure
rtc: ingenic: Set wakeup params in probe
rtc: ingenic: Enable clock in probe
rtc: ingenic: Use local 'dev' variable in probe
rtc: ingenic: Only support probing from devicetree
rtc: mc13xxx: fix a double-unlock issue
rtc: stmp3xxx: update contact email
rtc: max77686: Use single-byte writes on MAX77620
rtc: pcf2127: report battery switch over
...
Here is the large set of char/misc driver patches for 5.8-rc1
Included in here are:
- habanalabs driver updates, loads
- mhi bus driver updates
- extcon driver updates
- clk driver updates (approved by the clock maintainer)
- firmware driver updates
- fpga driver updates
- gnss driver updates
- coresight driver updates
- interconnect driver updates
- parport driver updates (it's still alive!)
- nvmem driver updates
- soundwire driver updates
- visorbus driver updates
- w1 driver updates
- various misc driver updates
In short, loads of different driver subsystem updates along with the
drivers as well.
All have been in linux-next for a while with no reported issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXtzkHw8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+yldOwCgus/DgpnI1UL4z+NdBxJrAXtkPmgAn2sgTUea
i5RblCmcVMqvHaGtYkY+
=tScN
-----END PGP SIGNATURE-----
Merge tag 'char-misc-5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver updates from Greg KH:
"Here is the large set of char/misc driver patches for 5.8-rc1
Included in here are:
- habanalabs driver updates, loads
- mhi bus driver updates
- extcon driver updates
- clk driver updates (approved by the clock maintainer)
- firmware driver updates
- fpga driver updates
- gnss driver updates
- coresight driver updates
- interconnect driver updates
- parport driver updates (it's still alive!)
- nvmem driver updates
- soundwire driver updates
- visorbus driver updates
- w1 driver updates
- various misc driver updates
In short, loads of different driver subsystem updates along with the
drivers as well.
All have been in linux-next for a while with no reported issues"
* tag 'char-misc-5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (233 commits)
habanalabs: correctly cast u64 to void*
habanalabs: initialize variable to default value
extcon: arizona: Fix runtime PM imbalance on error
extcon: max14577: Add proper dt-compatible strings
extcon: adc-jack: Fix an error handling path in 'adc_jack_probe()'
extcon: remove redundant assignment to variable idx
w1: omap-hdq: print dev_err if irq flags are not cleared
w1: omap-hdq: fix interrupt handling which did show spurious timeouts
w1: omap-hdq: fix return value to be -1 if there is a timeout
w1: omap-hdq: cleanup to add missing newline for some dev_dbg
/dev/mem: Revoke mappings when a driver claims the region
misc: xilinx-sdfec: convert get_user_pages() --> pin_user_pages()
misc: xilinx-sdfec: cleanup return value in xsdfec_table_write()
misc: xilinx-sdfec: improve get_user_pages_fast() error handling
nvmem: qfprom: remove incorrect write support
habanalabs: handle MMU cache invalidation timeout
habanalabs: don't allow hard reset with open processes
habanalabs: GAUDI does not support soft-reset
habanalabs: add print for soft reset due to event
habanalabs: improve MMU cache invalidation code
...
* Fix performance problems found in dioread_nolock now that it is the
default, caused by transaction leaks.
* Clean up fiemap handling in ext4
* Clean up and refactor multiple block allocator (mballoc) code
* Fix a problem with mballoc with a smaller file systems running out
of blocks because they couldn't properly use blocks that had been
reserved by inode preallocation.
* Fixed a race in ext4_sync_parent() versus rename()
* Simplify the error handling in the extent manipulation code
* Make sure all metadata I/O errors are felected to ext4_ext_dirty()'s and
ext4_make_inode_dirty()'s callers.
* Avoid passing an error pointer to brelse in ext4_xattr_set()
* Fix race which could result to freeing an inode on the dirty last
in data=journal mode.
* Fix refcount handling if ext4_iget() fails
* Fix a crash in generic/019 caused by a corrupted extent node
-----BEGIN PGP SIGNATURE-----
iQEyBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAl7Ze8kACgkQ8vlZVpUN
gaNChAf4xn0ytFSrweI/S2Sp05G/2L/ocZ2TZZk2ZdGeN1E+ABdSIv/zIF9zuFgZ
/pY/C+fyEZWt4E3FlNO8gJzoEedkzMCMnUhSIfI+wZbcclyTOSNMJtnrnJKAEtVH
HOvGZJmg357jy407RCGhZpJ773nwU2xhBTr5OFxvSf9mt/vzebxIOnw5D7HPlC1V
Fgm6Du8q+tRrPsyjv1Yu4pUEVXMJ7qUcvt326AXVM3kCZO1Aa5GrURX0w3J4mzW1
tc1tKmtbLcVVYTo9CwHXhk/edbxrhAydSP2iACand3tK6IJuI6j9x+bBJnxXitnr
vsxsfTYMG18+2SxrJ9LwmagqmrRq
=HMTs
-----END PGP SIGNATURE-----
Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o:
"A lot of bug fixes and cleanups for ext4, including:
- Fix performance problems found in dioread_nolock now that it is the
default, caused by transaction leaks.
- Clean up fiemap handling in ext4
- Clean up and refactor multiple block allocator (mballoc) code
- Fix a problem with mballoc with a smaller file systems running out
of blocks because they couldn't properly use blocks that had been
reserved by inode preallocation.
- Fixed a race in ext4_sync_parent() versus rename()
- Simplify the error handling in the extent manipulation code
- Make sure all metadata I/O errors are felected to
ext4_ext_dirty()'s and ext4_make_inode_dirty()'s callers.
- Avoid passing an error pointer to brelse in ext4_xattr_set()
- Fix race which could result to freeing an inode on the dirty last
in data=journal mode.
- Fix refcount handling if ext4_iget() fails
- Fix a crash in generic/019 caused by a corrupted extent node"
* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (58 commits)
ext4: avoid unnecessary transaction starts during writeback
ext4: don't block for O_DIRECT if IOCB_NOWAIT is set
ext4: remove the access_ok() check in ext4_ioctl_get_es_cache
fs: remove the access_ok() check in ioctl_fiemap
fs: handle FIEMAP_FLAG_SYNC in fiemap_prep
fs: move fiemap range validation into the file systems instances
iomap: fix the iomap_fiemap prototype
fs: move the fiemap definitions out of fs.h
fs: mark __generic_block_fiemap static
ext4: remove the call to fiemap_check_flags in ext4_fiemap
ext4: split _ext4_fiemap
ext4: fix fiemap size checks for bitmap files
ext4: fix EXT4_MAX_LOGICAL_BLOCK macro
add comment for ext4_dir_entry_2 file_type member
jbd2: avoid leaking transaction credits when unreserving handle
ext4: drop ext4_journal_free_reserved()
ext4: mballoc: use lock for checking free blocks while retrying
ext4: mballoc: refactor ext4_mb_good_group()
ext4: mballoc: introduce pcpu seqcnt for freeing PA to improve ENOSPC handling
ext4: mballoc: refactor ext4_mb_discard_preallocations()
...
When deleting an inode, keep track of the generation of the deleted inode in
the inode glock Lock Value Block (LVB). When trying to delete an inode
remotely, check the last-known inode generation against the deleted inode
generation to skip duplicate remote deletes. This avoids taking the resource
group glock in order to verify the block type.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
These are updates to SoC specific drivers that did not have
another subsystem maintainer tree to go through for some
reason:
- Some bus and memory drivers for the MIPS P5600 based
Baikal-T1 SoC that is getting added through the MIPS tree.
- There are new soc_device identification drivers for TI K3,
Qualcomm MSM8939
- New reset controller drivers for NXP i.MX8MP, Renesas
RZ/G1H, and Hisilicon hi6220
- The SCMI firmware interface can now work across ARM SMC/HVC
as a transport.
- Mediatek platforms now use a new driver for their "MMSYS"
hardware block that controls clocks and some other aspects
in behalf of the media and gpu drivers.
- Some Tegra processors have improved power management
support, including getting woken up by the PMIC and cluster
power down during idle.
- A new v4l staging driver for Tegra is added.
- Cleanups and minor bugfixes for TI, NXP, Hisilicon,
Mediatek, and Tegra.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEo6/YBQwIrVS28WGKmmx57+YAGNkFAl7XvgAACgkQmmx57+YA
GNmj/hAAnAJ/hYehLfgCe711HUntgeRkaoTVpCt8BJNMdxsa23sn3V6k5+WYn1uG
PtlgpefZEMHLUEEVDegR4nZXLG0Pzu1SR12KW34YPcQKkNo/+vlQ9zYUajnJ/KX6
10zdLSIzHfk1VtXKvvQQ8xFyE+S/trGmjC57E6gfoCUT3rl1maD+ccVXUBaz9oob
wuMxGXQAl57mio5yT1OfSk6Fev39xRE2dN1hzP7KUYhsemZajBwBBW5wVJZCsCB8
LCGmxVkavM7BV4r2NokbBDs5rlfedBl/P/IPd9Is5a5tuGUkSsVRG9zqShxYLGM3
S06az6POQFwXKFJoUKW0dK/Koy0D7BK+vhUBPzFv4HZ8iDCVf6Jju2MJ02GMqHPj
OOrXaCbLYrvN/edVUWeeFywqwMbYTRwC4DxyTq5m7HxEB004xTOhs0rX5aR0u4n1
bbsR97LguolwH9iEMzd3F3jCiKBcMecH3lAh5WcrtwlFIRrNhbWoGDoA/4TuORFS
b11rgsTRIJ5Vc++D1HnSnx0ZZvUzyluMvygdALnSgVah6xYe6KVw9Kg/wioAJ04G
uSTidqP3qRhsyET2HQo7CxdVfZbKfP25iKCKrdhQziztKvhF8qrUmZKloXOodRw+
ewYSRmv8c324OYYit1X43oAdW8dntq1XbSIauaqxEb4JC7x/xRY=
=44Jc
-----END PGP SIGNATURE-----
Merge tag 'arm-drivers-5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM/SoC driver updates from Arnd Bergmann:
"These are updates to SoC specific drivers that did not have another
subsystem maintainer tree to go through for some reason:
- Some bus and memory drivers for the MIPS P5600 based Baikal-T1 SoC
that is getting added through the MIPS tree.
- There are new soc_device identification drivers for TI K3, Qualcomm
MSM8939
- New reset controller drivers for NXP i.MX8MP, Renesas RZ/G1H, and
Hisilicon hi6220
- The SCMI firmware interface can now work across ARM SMC/HVC as a
transport.
- Mediatek platforms now use a new driver for their "MMSYS" hardware
block that controls clocks and some other aspects in behalf of the
media and gpu drivers.
- Some Tegra processors have improved power management support,
including getting woken up by the PMIC and cluster power down
during idle.
- A new v4l staging driver for Tegra is added.
- Cleanups and minor bugfixes for TI, NXP, Hisilicon, Mediatek, and
Tegra"
* tag 'arm-drivers-5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (155 commits)
clk: sprd: fix compile-testing
bus: bt1-axi: Build the driver into the kernel
bus: bt1-apb: Build the driver into the kernel
bus: bt1-axi: Use sysfs_streq instead of strncmp
bus: bt1-axi: Optimize the return points in the driver
bus: bt1-apb: Use sysfs_streq instead of strncmp
bus: bt1-apb: Use PTR_ERR_OR_ZERO to return from request-regs method
bus: bt1-apb: Fix show/store callback identations
bus: bt1-apb: Include linux/io.h
dt-bindings: memory: Add Baikal-T1 L2-cache Control Block binding
memory: Add Baikal-T1 L2-cache Control Block driver
bus: Add Baikal-T1 APB-bus driver
bus: Add Baikal-T1 AXI-bus driver
dt-bindings: bus: Add Baikal-T1 APB-bus binding
dt-bindings: bus: Add Baikal-T1 AXI-bus binding
staging: tegra-video: fix V4L2 dependency
tee: fix crypto select
drivers: soc: ti: knav_qmss_queue: Make knav_gp_range_ops static
soc: ti: add k3 platforms chipid module driver
dt-bindings: soc: ti: add binding for k3 platforms chipid module
...
The compiler will add padding after the last member, make that explicit.
The size of a request is always 24 bytes. The size of a response always
10 bytes. Add compile-time checks.
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: teawater <teawaterz@linux.alibaba.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20200515101402.16597-1-david@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
We want to allow to specify (similar as for a DIMM), to which node a
virtio-mem device (and, therefore, its memory) belongs. Add a new
virtio-mem feature flag and export pxm_to_node, so it can be used in kernel
module context.
Acked-by: Michal Hocko <mhocko@suse.com> # for the export
Acked-by: "Rafael J. Wysocki" <rafael@kernel.org> # for the export
Acked-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Tested-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Len Brown <lenb@kernel.org>
Cc: linux-acpi@vger.kernel.org
Signed-off-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20200507140139.17083-4-david@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Each virtio-mem device owns exactly one memory region. It is responsible
for adding/removing memory from that memory region on request.
When the device driver starts up, the requested amount of memory is
queried and then plugged to Linux. On request, further memory can be
plugged or unplugged. This patch only implements the plugging part.
On x86-64, memory can currently be plugged in 4MB ("subblock") granularity.
When required, a new memory block will be added (e.g., usually 128MB on
x86-64) in order to plug more subblocks. Only x86-64 was tested for now.
The online_page callback is used to keep unplugged subblocks offline
when onlining memory - similar to the Hyper-V balloon driver. Unplugged
pages are marked PG_offline, to tell dump tools (e.g., makedumpfile) to
skip them.
User space is usually responsible for onlining the added memory. The
memory hotplug notifier is used to synchronize virtio-mem activity
against memory onlining/offlining.
Each virtio-mem device can belong to a NUMA node, which allows us to
easily add/remove small chunks of memory to/from a specific NUMA node by
using multiple virtio-mem devices. Something that works even when the
guest has no idea about the NUMA topology.
One way to view virtio-mem is as a "resizable DIMM" or a DIMM with many
"sub-DIMMS".
This patch directly introduces the basic infrastructure to implement memory
unplug. Especially the memory block states and subblock bitmaps will be
heavily used there.
Notes:
- In case memory is to be onlined by user space, we limit the amount of
offline memory blocks, to not run out of memory. This is esp. an
issue if memory is added faster than it is getting onlined.
- Suspend/Hibernate is not supported due to the way virtio-mem devices
behave. Limited support might be possible in the future.
- Reloading the device driver is not supported.
Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Tested-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Len Brown <lenb@kernel.org>
Cc: linux-acpi@vger.kernel.org
Signed-off-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20200507140139.17083-2-david@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE+QmuaPwR3wnBdVwACF8+vY7k4RUFAl7XUmwACgkQCF8+vY7k
4RU4zg//fT32wiVAPHCCp+pDZVnWNeipXE1gnpqghd/qZXfzBPiLEC9sPS74VVkA
jf1hhR33VZpKAKTPg/b074qhRZBywEOdHZnT/0CEE1oNB61shVOnyDYzLGSq95cO
6V55ovbi5IOkrg0QEJbHpG5YHzt+pq5XeWOkqGNsHwla7N7iMGMVYfHepVVDWPnZ
0wGYFF9cAJP+X/uxqkZLDVMA/K1I+QKh6vrj/qx53/eRt8VID3+i8ig3guk4PlUq
7RLw5w/CywtNaGE5zaz7T3i2eoED71JHOTXi6RxdP1z8IDvELZ9mT95GQ+enlwqt
AS6Ju1sV40wviHMv5prJWQjJkrrtYH3S907lIjwBpQLNGbh2+5crCd/6CwumkGgv
1cCZ1dVmXpCe++9mU9AXmSkjsjGPStNcmHMOpc1Pwn9jUV3LQOOSDp8+RYdt1WHU
Iw9cyM8NOpz5Mv/B1/ZPQ1gPb9lr1gE09XyUekxtAI/nl4nNHGWO8QDuX7Odfrv9
8nfo14lk/p6XCTA8dsWJCgI5B1fgnqD4frHKWO9Uctppc/KBW41c8JpQUjBNlG/T
MhtlGwYMVgSQxpQ6wK018JUAFoWkn1Sr0zMKRayqCnMjMLHsaMwE6kq+LgmRBqbB
ersKV/9ZLYqCU1d6PhEVG6xUs6GsWdLcyhALlmHsddPSdpFXdf8=
=KNAo
-----END PGP SIGNATURE-----
Merge tag 'media/v5.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media updates from Mauro Carvalho Chehab:
- Media documentation is now split into admin-guide, driver-api and
userspace-api books (a longstanding request from Jon);
- The media Kconfig was reorganized, in order to make easier to select
drivers and their dependencies;
- The testing drivers now has a separate directory;
- added a new driver for Rockchip Video Decoder IP;
- The atomisp staging driver was resurrected. It is meant to work with
4 generations of cameras on Atom-based laptops, tablets and cell
phones. So, it seems worth investing time to cleanup this driver and
making it in good shape.
- Added some V4L2 core ancillary routines to help with h264 codecs;
- Added an ov2740 image sensor driver;
- The si2157 gained support for Analog TV, which, in turn, added
support for some cx231xx and cx23885 boards to also support analog
standards;
- Added some V4L2 controls (V4L2_CID_CAMERA_ORIENTATION and
V4L2_CID_CAMERA_SENSOR_ROTATION) to help identifying where the camera
is located at the device;
- VIDIOC_ENUM_FMT was extended to support MC-centric devices;
- Lots of drivers improvements and cleanups.
* tag 'media/v5.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (503 commits)
media: Documentation: media: Refer to mbus format documentation from CSI-2 docs
media: s5k5baf: Replace zero-length array with flexible-array
media: i2c: imx219: Drop <linux/clk-provider.h> and <linux/clkdev.h>
media: i2c: Add ov2740 image sensor driver
media: ov8856: Implement sensor module revision identification
media: ov8856: Add devicetree support
media: dt-bindings: ov8856: Document YAML bindings
media: dvb-usb: Add Cinergy S2 PCIe Dual Port support
media: dvbdev: Fix tuner->demod media controller link
media: dt-bindings: phy: phy-rockchip-dphy-rx0: move rockchip dphy rx0 bindings out of staging
media: staging: dt-bindings: phy-rockchip-dphy-rx0: remove non-used reg property
media: atomisp: unify the version for isp2401 a0 and b0 versions
media: atomisp: update TODO with the current data
media: atomisp: adjust some code at sh_css that could be broken
media: atomisp: don't produce errs for ignored IRQs
media: atomisp: print IRQ when debugging
media: atomisp: isp_mmu: don't use kmem_cache
media: atomisp: add a notice about possible leak resources
media: atomisp: disable the dynamic and reserved pools
media: atomisp: turn on camera before setting it
...
No need to pull the fiemap definitions into almost every file in the
kernel build.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Link: https://lore.kernel.org/r/20200523073016.2944131-5-hch@lst.de
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Pull networking updates from David Miller:
1) Allow setting bluetooth L2CAP modes via socket option, from Luiz
Augusto von Dentz.
2) Add GSO partial support to igc, from Sasha Neftin.
3) Several cleanups and improvements to r8169 from Heiner Kallweit.
4) Add IF_OPER_TESTING link state and use it when ethtool triggers a
device self-test. From Andrew Lunn.
5) Start moving away from custom driver versions, use the globally
defined kernel version instead, from Leon Romanovsky.
6) Support GRO vis gro_cells in DSA layer, from Alexander Lobakin.
7) Allow hard IRQ deferral during NAPI, from Eric Dumazet.
8) Add sriov and vf support to hinic, from Luo bin.
9) Support Media Redundancy Protocol (MRP) in the bridging code, from
Horatiu Vultur.
10) Support netmap in the nft_nat code, from Pablo Neira Ayuso.
11) Allow UDPv6 encapsulation of ESP in the ipsec code, from Sabrina
Dubroca. Also add ipv6 support for espintcp.
12) Lots of ReST conversions of the networking documentation, from Mauro
Carvalho Chehab.
13) Support configuration of ethtool rxnfc flows in bcmgenet driver,
from Doug Berger.
14) Allow to dump cgroup id and filter by it in inet_diag code, from
Dmitry Yakunin.
15) Add infrastructure to export netlink attribute policies to
userspace, from Johannes Berg.
16) Several optimizations to sch_fq scheduler, from Eric Dumazet.
17) Fallback to the default qdisc if qdisc init fails because otherwise
a packet scheduler init failure will make a device inoperative. From
Jesper Dangaard Brouer.
18) Several RISCV bpf jit optimizations, from Luke Nelson.
19) Correct the return type of the ->ndo_start_xmit() method in several
drivers, it's netdev_tx_t but many drivers were using
'int'. From Yunjian Wang.
20) Add an ethtool interface for PHY master/slave config, from Oleksij
Rempel.
21) Add BPF iterators, from Yonghang Song.
22) Add cable test infrastructure, including ethool interfaces, from
Andrew Lunn. Marvell PHY driver is the first to support this
facility.
23) Remove zero-length arrays all over, from Gustavo A. R. Silva.
24) Calculate and maintain an explicit frame size in XDP, from Jesper
Dangaard Brouer.
25) Add CAP_BPF, from Alexei Starovoitov.
26) Support terse dumps in the packet scheduler, from Vlad Buslov.
27) Support XDP_TX bulking in dpaa2 driver, from Ioana Ciornei.
28) Add devm_register_netdev(), from Bartosz Golaszewski.
29) Minimize qdisc resets, from Cong Wang.
30) Get rid of kernel_getsockopt and kernel_setsockopt in order to
eliminate set_fs/get_fs calls. From Christoph Hellwig.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2517 commits)
selftests: net: ip_defrag: ignore EPERM
net_failover: fixed rollback in net_failover_open()
Revert "tipc: Fix potential tipc_aead refcnt leak in tipc_crypto_rcv"
Revert "tipc: Fix potential tipc_node refcnt leak in tipc_rcv"
vmxnet3: allow rx flow hash ops only when rss is enabled
hinic: add set_channels ethtool_ops support
selftests/bpf: Add a default $(CXX) value
tools/bpf: Don't use $(COMPILE.c)
bpf, selftests: Use bpf_probe_read_kernel
s390/bpf: Use bcr 0,%0 as tail call nop filler
s390/bpf: Maintain 8-byte stack alignment
selftests/bpf: Fix verifier test
selftests/bpf: Fix sample_cnt shared between two threads
bpf, selftests: Adapt cls_redirect to call csum_level helper
bpf: Add csum_level helper for fixing up csum levels
bpf: Fix up bpf_skb_adjust_room helper's skb csum setting
sfc: add missing annotation for efx_ef10_try_update_nic_stats_vf()
crypto/chtls: IPv6 support for inline TLS
Crypto/chcr: Fixes a coccinile check error
Crypto/chcr: Fixes compilations warnings
...
- Move the arch-specific code into arch/arm64/kvm
- Start the post-32bit cleanup
- Cherry-pick a few non-invasive pre-NV patches
x86:
- Rework of TLB flushing
- Rework of event injection, especially with respect to nested virtualization
- Nested AMD event injection facelift, building on the rework of generic code
and fixing a lot of corner cases
- Nested AMD live migration support
- Optimization for TSC deadline MSR writes and IPIs
- Various cleanups
- Asynchronous page fault cleanups (from tglx, common topic branch with tip tree)
- Interrupt-based delivery of asynchronous "page ready" events (host side)
- Hyper-V MSRs and hypercalls for guest debugging
- VMX preemption timer fixes
s390:
- Cleanups
Generic:
- switch vCPU thread wakeup from swait to rcuwait
The other architectures, and the guest side of the asynchronous page fault
work, will come next week.
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAl7VJcYUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroPf6QgAq4wU5wdd1lTGz/i3DIhNVJNJgJlp
ozLzRdMaJbdbn5RpAK6PEBd9+pt3+UlojpFB3gpJh2Nazv2OzV4yLQgXXXyyMEx1
5Hg7b4UCJYDrbkCiegNRv7f/4FWDkQ9dx++RZITIbxeskBBCEI+I7GnmZhGWzuC4
7kj4ytuKAySF2OEJu0VQF6u0CvrNYfYbQIRKBXjtOwuRK4Q6L63FGMJpYo159MBQ
asg3B1jB5TcuGZ9zrjL5LkuzaP4qZZHIRs+4kZsH9I6MODHGUxKonrkablfKxyKy
CFK+iaHCuEXXty5K0VmWM3nrTfvpEjVjbMc7e1QGBQ5oXsDM0pqn84syRg==
=v7Wn
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
"ARM:
- Move the arch-specific code into arch/arm64/kvm
- Start the post-32bit cleanup
- Cherry-pick a few non-invasive pre-NV patches
x86:
- Rework of TLB flushing
- Rework of event injection, especially with respect to nested
virtualization
- Nested AMD event injection facelift, building on the rework of
generic code and fixing a lot of corner cases
- Nested AMD live migration support
- Optimization for TSC deadline MSR writes and IPIs
- Various cleanups
- Asynchronous page fault cleanups (from tglx, common topic branch
with tip tree)
- Interrupt-based delivery of asynchronous "page ready" events (host
side)
- Hyper-V MSRs and hypercalls for guest debugging
- VMX preemption timer fixes
s390:
- Cleanups
Generic:
- switch vCPU thread wakeup from swait to rcuwait
The other architectures, and the guest side of the asynchronous page
fault work, will come next week"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (256 commits)
KVM: selftests: fix rdtsc() for vmx_tsc_adjust_test
KVM: check userspace_addr for all memslots
KVM: selftests: update hyperv_cpuid with SynDBG tests
x86/kvm/hyper-v: Add support for synthetic debugger via hypercalls
x86/kvm/hyper-v: enable hypercalls regardless of hypercall page
x86/kvm/hyper-v: Add support for synthetic debugger interface
x86/hyper-v: Add synthetic debugger definitions
KVM: selftests: VMX preemption timer migration test
KVM: nVMX: Fix VMX preemption timer migration
x86/kvm/hyper-v: Explicitly align hcall param for kvm_hyperv_exit
KVM: x86/pmu: Support full width counting
KVM: x86/pmu: Tweak kvm_pmu_get_msr to pass 'struct msr_data' in
KVM: x86: announce KVM_FEATURE_ASYNC_PF_INT
KVM: x86: acknowledgment mechanism for async pf page ready notifications
KVM: x86: interrupt based APF 'page ready' event delivery
KVM: introduce kvm_read_guest_offset_cached()
KVM: rename kvm_arch_can_inject_async_page_present() to kvm_arch_can_dequeue_async_page_present()
KVM: x86: extend struct kvm_vcpu_pv_apf_data with token info
Revert "KVM: async_pf: Fix #DF due to inject "Page not Present" and "Page Ready" exceptions simultaneously"
KVM: VMX: Replace zero-length array with flexible-array
...
This region provides a mechanism to pass a Channel Report Word
that affect vfio-ccw devices, and needs to be passed to the guest
for its awareness and/or processing.
The base driver (see crw_collect_info()) provides space for two
CRWs, as a subchannel event may have two CRWs chained together
(one for the ssid, one for the subchannel). As vfio-ccw will
deal with everything at the subchannel level, provide space
for a single CRW to be transferred in one shot.
Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
Signed-off-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Message-Id: <20200505122745.53208-7-farman@linux.ibm.com>
[CH: added padding to ccw_crw_region]
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAl7U50AACgkQxWXV+ddt
WDtK1g//RXeNsTguYQr1N9R5eUPThjLEI0+4J0l4SYfCPU8Ou3C7nqpOEJJQgm8F
ezZE+16cWi9U5uGueOc+w0rfyz4AuIXKgzoz+c0/GG2+yV5jp6DsAMbWqojAb96L
V/N3HxEzR66jqwgVUBE/x5okb2SyY7//B1l/O0amc66XDO7KTMImpIwThere6zWZ
o2SNpYpHAPQeUYJQx8h+FAW3w1CxrCZmnifazU9Jqe9J7QeQLg7rbUlJDV38jySm
ZOA8ohKN9U1gPZy+dTU3kdyyuBIq1etkIaSPJANyTo5TczPKiC0IMg75cXtS4ae/
NSxhccMpSIjVMcIHARzSFGYKNP3sGNRsmaTUg/2Cx/9GoHOhYMiCAVc8qtBBpwJO
UI0siexrCe64RuTBMRRc128GdFv7IjmSImcdi8xaR62bCcUiNdEa3zvjRe/9tOEH
ET7Z85oBnKpSzpC3MdhSUU4dtHY5XLawP8z3oUU1VSzSWM2DVjlHf79/VzbOfp18
miCVpt94lCn/gUX7el6qcnbuvMAjDyeC6HmfD+TwzQgGwyV6TLgKN9lRXeH/Oy6/
VgjGQSavGHMll3zIGURmrBCXKudjJg0J+IP4wN1TimmSEMfwKH+7tnekQd8y5qlF
eXEIqlWNykKeDzEnmV9QJy+/cV83hVWM/mUslcTx39tLN/3B/Us=
=qTt8
-----END PGP SIGNATURE-----
Merge tag 'for-5.8-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs updates from David Sterba:
"Highlights:
- speedup dead root detection during orphan cleanup, eg. when there
are many deleted subvolumes waiting to be cleaned, the trees are
now looked up in radix tree instead of a O(N^2) search
- snapshot creation with inherited qgroup will mark the qgroup
inconsistent, requires a rescan
- send will emit file capabilities after chown, this produces a
stream that does not need postprocessing to set the capabilities
again
- direct io ported to iomap infrastructure, cleaned up and simplified
code, notably removing last use of struct buffer_head in btrfs code
Core changes:
- factor out backreference iteration, to be used by ordinary
backreferences and relocation code
- improved global block reserve utilization
* better logic to serialize requests
* increased maximum available for unlink
* improved handling on large pages (64K)
- direct io cleanups and fixes
* simplify layering, where cloned bios were unnecessarily created
for some cases
* error handling fixes (submit, endio)
* remove repair worker thread, used to avoid deadlocks during
repair
- refactored block group reading code, preparatory work for new type
of block group storage that should improve mount time on large
filesystems
Cleanups:
- cleaned up (and slightly sped up) set/get helpers for metadata data
structure members
- root bit REF_COWS got renamed to SHAREABLE to reflect the that the
blocks of the tree get shared either among subvolumes or with the
relocation trees
Fixes:
- when subvolume deletion fails due to ENOSPC, the filesystem is not
turned read-only
- device scan deals with devices from other filesystems that changed
ownership due to overwrite (mkfs)
- fix a race between scrub and block group removal/allocation
- fix long standing bug of a runaway balance operation, printing the
same line to the syslog, caused by a stale status bit on a reloc
tree that prevented progress
- fix corrupt log due to concurrent fsync of inodes with shared
extents
- fix space underflow for NODATACOW and buffered writes when it for
some reason needs to fallback to COW mode"
* tag 'for-5.8-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (133 commits)
btrfs: fix space_info bytes_may_use underflow during space cache writeout
btrfs: fix space_info bytes_may_use underflow after nocow buffered write
btrfs: fix wrong file range cleanup after an error filling dealloc range
btrfs: remove redundant local variable in read_block_for_search
btrfs: open code key_search
btrfs: split btrfs_direct_IO to read and write part
btrfs: remove BTRFS_INODE_READDIO_NEED_LOCK
fs: remove dio_end_io()
btrfs: switch to iomap_dio_rw() for dio
iomap: remove lockdep_assert_held()
iomap: add a filesystem hook for direct I/O bio submission
fs: export generic_file_buffered_read()
btrfs: turn space cache writeout failure messages into debug messages
btrfs: include error on messages about failure to write space/inode caches
btrfs: remove useless 'fail_unlock' label from btrfs_csum_file_blocks()
btrfs: do not ignore error from btrfs_next_leaf() when inserting checksums
btrfs: make checksum item extension more efficient
btrfs: fix corrupt log due to concurrent fsync of inodes with shared extents
btrfs: unexport btrfs_compress_set_level()
btrfs: simplify iget helpers
...
- Clean up io_is_direct.
- Add a new statx flag to indicate when file data access is being done
via DAX (as opposed to the page cache).
- Update the documentation for how system administrators and application
programmers can take advantage of the (still experimental DAX) feature.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAl6wO7UACgkQ+H93GTRK
tOtEaw//eShC2YE0S+GS7ihQ3x71PJa4Is0VZOIpTHl01aqMSegwB3QbDQbVUhn1
TzLhUw4pZsz3R9GbUOrfHOYRt+aSP2t0WNhIulDeBp41CYJQSaFt85KnfM9hoBOi
VYssum3Lu7/6ReKrDD/mumzWYkts+JDCuXRmt7nOQeZJVNXOCBBbvN354V4/IKLY
wB4Wnaq3f3gYniXYW/23aCX+kocaOIUZtK6aFKyeD0KvfP5toDlpw1cBVMoM9CmO
bmEy8vKf4lgFZDLeDMqmWOecMgEH5h0baN5Psu13WuDCiCd6maBl0KpxVpVlwsep
yVz6mMbZjmLOJ2lqyw+lZb+XicD+K3yRVSTGKxV3VbuRjeX9tjVG5Im13VesNvJB
WWJq/CkOU8W0Zs7Q5RbUDGbFFWDJSI/OStAU+UeuWvL9Gndv7hqv6H904qbPPtEu
4m4Y34ARzrEaKpkABKKwQ53cLClNxmmgUN9N3cXK3mk8idlX4zM3j6+HJYUxXTO+
fBjhOlyUy2KaWmzZoJp28QvaU4iegGmMSuRnQ9HAvXmdxUA2K6+wjS6LCZGh04vz
z7SbzTBlo2kvsKdRMwJ306s2QA0/HvmKHHLI+p8OQANce9hjhE3XdJayzhitd0fk
k0D/y8OY+fbCSgI8C4g66lA8Zf2sos/ulD0QTTNBPfU2rRWkKUc=
=rS19
-----END PGP SIGNATURE-----
Merge tag 'vfs-5.8-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull DAX updates part one from Darrick Wong:
"After many years of LKML-wrangling about how to enable programs to
query and influence the file data access mode (DAX) when a filesystem
resides on storage devices such as persistent memory, Ira Weiny has
emerged with a proposed set of standard behaviors that has not been
shot down by anyone! We're more or less standardizing on the current
XFS behavior and adapting ext4 to do the same.
This is the first of a handful pull requests that will make ext4 and
XFS present a consistent interface for user programs that care about
DAX. We add a statx attribute that programs can check to see if DAX is
enabled on a particular file. Then, we update the DAX documentation to
spell out the user-visible behaviors that filesystems will guarantee
(until the next storage industry shakeup). The on-disk inode flag has
been in XFS for a few years now.
Summary:
- Clean up io_is_direct.
- Add a new statx flag to indicate when file data access is being
done via DAX (as opposed to the page cache).
- Update the documentation for how system administrators and
application programmers can take advantage of the (still
experimental DAX) feature"
Link: https://lore.kernel.org/lkml/20200505002016.1085071-1-ira.weiny@intel.com/
* tag 'vfs-5.8-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
Documentation/dax: Update Usage section
fs/stat: Define DAX statx attribute
fs: Remove unneeded IS_DAX() check in io_is_direct()
Pull lockdown update from James Morris:
"An update for the security subsystem to allow unprivileged users
to see the status of the lockdown feature. From Jeremy Cline"
Also an added comment to describe CAP_SETFCAP.
* 'next-general' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
capabilities: add description for CAP_SETFCAP
lockdown: Allow unprivileged users to see lockdown status
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAl7VnKEUHHBhdWxAcGF1
bC1tb29yZS5jb20ACgkQ6iDy2pc3iXMbHA/+PQmrPdzPvkLAjjf1y3LXvyEIAXIQ
h2r8SxHa7iGyF6vVPz+ya7ux0KAm8wCVdfkokWG5jxjwK7pysS6gx9JzBVK7dbhD
FsKBSoq9+to9fYlaCyX7vn85C7kK5oGrwS/ECos0BHBpij8ukLgvPQu+PDs7d4xW
1X2Nrgqnc7M4L8ayzXTQX0fDWcOkapzaN86+R+Lavb4hO/FownaYbuCFn+1mdzux
ZNBpt3/y1pM6vi5YBkI1rkauBCmkl/YSX/mf/EwDNlQ0XmcadGQ6z7iwjyiE826g
etCHWD3cgQH7Zzz6CxBNX8Xbq0nIQueHHiFYpVyy9lf4xleFvnfFDebrs8Q9TB6G
jTWU8okioUKPZyRDaRuIAmCf/LBQRsMkIYTU3w6J0ZqsBycTw3NXPiQArmlxZESM
HquxWpKoZytRiw581hiSGKNqY+R3FvA+Jroc/7bWfNOE3IdFxegvCsC3giKJf1rY
AlQitehql9a5jp7A57+477WRYOygYRnd+ntMD5KqR90QSIcQXeg0/lFKhco+zc2p
bXbWLE+aaOTGCeC+3Eow3T7FMWmrIn6ccKgM84+WT7YQYtRqUYu3RIZbnlYXN7uH
8xGXT6ccPcEwIjgyF87J0KyGhrbT1N91Jd2jMJkEry9OLAn/yr+pUBQtAa456MMi
JYevS4atZaUqgvw=
=iLfC
-----END PGP SIGNATURE-----
Merge tag 'audit-pr-20200601' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit
Pull audit updates from Paul Moore:
"Summary of the significant patches:
- Record information about binds/unbinds to the audit multicast
socket. This helps identify which processes have/had access to the
information in the audit stream.
- Cleanup and add some additional information to the netfilter
configuration events collected by audit.
- Fix some of the audit error handling code so we don't leak network
namespace references"
* tag 'audit-pr-20200601' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
audit: add subj creds to NETFILTER_CFG record to
audit: Replace zero-length array with flexible-array
audit: make symbol 'audit_nfcfgs' static
netfilter: add audit table unregister actions
audit: tidy and extend netfilter_cfg x_tables
audit: log audit netlink multicast bind and unbind
audit: fix a net reference leak in audit_list_rules_send()
audit: fix a net reference leak in audit_send_reply()
Document the purpose of CAP_SETFCAP. For some reason this capability
had no description while the others did.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl7VP+kQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpuK4D/0XsSG/Yirbba1rrbqw/qpw9xcAs9oyN0tS
8SmmGN27ghrkVSsGBXNcG+PSTu3pkkLjYZ6TQtKamrya9G+lRAsKRsQ+Yq+7Qv4e
N6lCUlLJ99KqTMtwvIoxSpA1tz3ENHucOw2cJrw3kd9G0kil7GvDkIOBasd+kmwn
ak+mnMJZzRhqSM7M5lKQOk8l92gKBHGbPy4xKb0st3dQkYptDvit0KcNSAuevtOp
sRZpdbXaT3FA6xa5iEgggI6vZQGVmK1EaGoQqZ8vgVo75aovkjZyQWWiFVVOlEqr
QjUCCQuixcbMRbZjgpojqva5nmLhFVhLCfoSH2XgttEQZhmTwypdRwM2/IlxV5q2
xCofrDkhYOfIgHkuP6p68ukIPIfQ+4jotvsmXZ/HeD/xbx3TRyJRZadISr6wiuLm
7zRXWaGCYomUIPJOOrpBQ9FsCglkaN63oB6VGuGKTg3g7kE2QrZ2/aGuexP+FAdh
OrA8BlzxZzpqMKhjQVKOl9r6FU928MZn8nIAkMdQ/Ia1mOpb4rrPo4qCdf+tbhPO
pmKtQPQjbszQ3UfTgShvfvDk43BeRim1DxZPFTauSu1FMpqWBCwQgXMynPFrf5TR
HXF61G+jw5swDW6uJgW7bXdm7hHr15vRqQr54MgGS+T0OOa1df9MR0dJB5CGklfI
ycLU6AAT+A==
=A/qA
-----END PGP SIGNATURE-----
Merge tag 'for-5.8/io_uring-2020-06-01' of git://git.kernel.dk/linux-block
Pull io_uring updates from Jens Axboe:
"A relatively quiet round, mostly just fixes and code improvements. In
particular:
- Make statx just use the generic statx handler, instead of open
coding it. We don't need that anymore, as we always call it async
safe (Bijan)
- Enable closing of the ring itself. Also fixes O_PATH closure (me)
- Properly name completion members (me)
- Batch reap of dead file registrations (me)
- Allow IORING_OP_POLL with double waitqueues (me)
- Add tee(2) support (Pavel)
- Remove double off read (Pavel)
- Fix overflow cancellations (Pavel)
- Improve CQ timeouts (Pavel)
- Async defer drain fixes (Pavel)
- Add support for enabling/disabling notifications on a registered
eventfd (Stefano)
- Remove dead state parameter (Xiaoguang)
- Disable SQPOLL submit on dying ctx (Xiaoguang)
- Various code cleanups"
* tag 'for-5.8/io_uring-2020-06-01' of git://git.kernel.dk/linux-block: (29 commits)
io_uring: fix overflowed reqs cancellation
io_uring: off timeouts based only on completions
io_uring: move timeouts flushing to a helper
statx: hide interfaces no longer used by io_uring
io_uring: call statx directly
statx: allow system call to be invoked from io_uring
io_uring: add io_statx structure
io_uring: get rid of manual punting in io_close
io_uring: separate DRAIN flushing into a cold path
io_uring: don't re-read sqe->off in timeout_prep()
io_uring: simplify io_timeout locking
io_uring: fix flush req->refs underflow
io_uring: don't submit sqes when ctx->refs is dying
io_uring: async task poll trigger cleanup
io_uring: add tee(2) support
splice: export do_tee()
io_uring: don't repeat valid flag list
io_uring: rename io_file_put()
io_uring: remove req->needs_fixed_files
io_uring: cleanup io_poll_remove_one() logic
...
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl7VPc4QHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpgQkEACnQlzWOfNQMz1AzgUAv/S8IYDJCLrkbjLZ
JK4pJv8Hjhss/7sS+fd8kyKe9VtaZz2IjmrXcC66RMMwtpx4iHnkRffoNAgEdGOl
/M5TCZGhs+F/mp3Lc0WdR5DFHkM6yy2Tkk9wCFLreB4bW67janAWnd7nbU4INqJj
+WqIgpzNMc/kfUhpBYTeQLORhL4e2TG9ADTi/zeUITlpnEsA65LOgXKEpeIFYnSX
KTl4GIZ9tjazG3Y1Eva7DYHDIErNNAtX67KBqf+WBgMV98eB0O6xIPN1WlmhDTqj
FGMLkb8msH1HHntvxDAuc4/ortnUy8vPI4o6zKP89HJJNjIM5p5eHEuVF5JnBw42
Rtu9Om6JqWx51nhAhJNBj9bUStYbhEl0vVQCwbkfPbDJhzTy3RR8z709q9+ZwOrL
xbp4aJBzqrzscjBEiSQbNCf2PyuOAdU0r1x81UN81ZN41d5qUcumcinjw4Y7vru8
z5zMlo1Iy/AWQYyu7jgHmnpI7ZyA/1Qclo5dV7aa72bLFaJa35e7QxgfQOFBA5dY
UZl6QPJRlnB80uGRzD5jCh2O2sQ3XZqYnpaKsUAka1GgbceCp9IC4A5mfZvpACsh
Xk8VXjlhvY/iPJsKLqrh4Oedg4Dj5M3PLL9C3MDfYeIP2qgXpbnk87UV1TPNSpY0
QcTxsXXXIw==
=H+/Z
-----END PGP SIGNATURE-----
Merge tag 'for-5.8/drivers-2020-06-01' of git://git.kernel.dk/linux-block
Pull block driver updates from Jens Axboe:
"On top of the core changes, here are the block driver changes for this
merge window:
- NVMe changes:
- NVMe over Fibre Channel protocol updates, which also reach
over to drivers/scsi/lpfc (James Smart)
- namespace revalidation support on the target (Anthony
Iliopoulos)
- gcc zero length array fix (Arnd Bergmann)
- nvmet cleanups (Chaitanya Kulkarni)
- misc cleanups and fixes (me, Keith Busch, Sagi Grimberg)
- use a SRQ per completion vector (Max Gurtovoy)
- fix handling of runtime changes to the queue count (Weiping
Zhang)
- t10 protection information support for nvme-rdma and
nvmet-rdma (Israel Rukshin and Max Gurtovoy)
- target side AEN improvements (Chaitanya Kulkarni)
- various fixes and minor improvements all over, icluding the
nvme part of the lpfc driver"
- Floppy code cleanup series (Willy, Denis)
- Floppy contention fix (Jiri)
- Loop CONFIGURE support (Martijn)
- bcache fixes/improvements (Coly, Joe, Colin)
- q->queuedata cleanups (Christoph)
- Get rid of ioctl_by_bdev (Christoph, Stefan)
- md/raid5 allocation fixes (Coly)
- zero length array fixes (Gustavo)
- swim3 task state fix (Xu)"
* tag 'for-5.8/drivers-2020-06-01' of git://git.kernel.dk/linux-block: (166 commits)
bcache: configure the asynchronous registertion to be experimental
bcache: asynchronous devices registration
bcache: fix refcount underflow in bcache_device_free()
bcache: Convert pr_<level> uses to a more typical style
bcache: remove redundant variables i and n
lpfc: Fix return value in __lpfc_nvme_ls_abort
lpfc: fix axchg pointer reference after free and double frees
lpfc: Fix pointer checks and comments in LS receive refactoring
nvme: set dma alignment to qword
nvmet: cleanups the loop in nvmet_async_events_process
nvmet: fix memory leak when removing namespaces and controllers concurrently
nvmet-rdma: add metadata/T10-PI support
nvmet: add metadata support for block devices
nvmet: add metadata/T10-PI support
nvme: add Metadata Capabilities enumerations
nvmet: rename nvmet_check_data_len to nvmet_check_transfer_len
nvmet: rename nvmet_rw_len to nvmet_rw_data_len
nvmet: add metadata characteristics for a namespace
nvme-rdma: add metadata/T10-PI support
nvme-rdma: introduce nvme_rdma_sgl structure
...
- Enable erase/discard/trim support for all (e)MMC/SD hosts
- Export information through sysfs about enhanced RPMB support (eMMC v5.1+)
- Align the initialization commands for SDIO cards
- Fix SDIO initialization to prevent memory leaks and NULL pointer errors
- Do not export undefined MMC_NAME/MODALIAS for SDIO cards
- Export device/vendor field from common CIS for SDIO cards
- Move SDIO IDs from functional drivers to the common SDIO header
- Introduce the ->request_atomic() host ops
MMC host:
- Improve support for HW busy signaling for several hosts
- Converting some DT bindings to the json-schema
- meson-mx-sdhc: Add driver and DT doc for the Amlogic Meson SDHC controller
- meson-mx-sdio: Run a soft reset to recover from timeout/CRC error
- mmci: Convert to use mmc_regulator_set_vqmmc()
- mmci_stm32_sdmmc: Fix a couple of DMA bugs
- mmci_stm32_sdmmc: Fix power on issue
- renesas,mmcif,sdhci: Document r8a7742 DT bindings
- renesas_sdhi: Add support for M3-W ES1.2 and 1.3 revisions
- renesas_sdhi: Improvements to the TAP selection
- renesas_sdhi/tmio: Further fixup runtime PM management at ->remove()
- sdhci: Introduce ops to dump vendor specific registers
- sdhci-cadence: Fix PHY write sequence
- sdhci-esdhc-imx: Improve tunings
- sdhci-esdhc-imx: Enable GPIO card detect as system wakeup
- sdhci-esdhc-imx: Add HS400 support for i.MX6SLL
- sdhci-esdhc-mcf: Add driver for the Coldfire/M5441X esdhc controller
- m68k: mcf5441x: Add platform data to enable esdhc mmc controller
- sdhci-msm: Improve HS400 tuning
- sdhci-msm: Dump vendor specific registers at error
- sdhci-msm: Add support for DLL/DDR properties provided from DT
- sdhci-msm: Add support for the sm8250 variant
- sdhci-msm: Add support for DVFS by converting to dev_pm_opp_set_rate()
- sdhci-of-arasan: Add support for Intel Keem Bay variant
- sdhci-of-arasan: Add support for Xilinx Versal SD variant
- sdhci-of-dwcmshc: Add support for system suspend/resume
- sdhci-of-dwcmshc: Fix UHS signaling support
- sdhci-of-esdhc: Fix tuning for eMMC HS400 mode
- sdhci-pci-gli: Add Genesys Logic GL9763E support
- sdhci-sprd: Add support for the ->request_atomic() ops
- sdhci-tegra: Avoid reading autocal timeout values when not applicable
MEMSTICK:
- Minor trivial update.
-----BEGIN PGP SIGNATURE-----
iQJLBAABCgA1FiEEugLDXPmKSktSkQsV/iaEJXNYjCkFAl7UuIIXHHVsZi5oYW5z
c29uQGxpbmFyby5vcmcACgkQ/iaEJXNYjCnzog/9GLeCHnaXan3uKU0NwWBpbiP0
RIqty/wYJpnWwL1J+otWLjVamuy/DSF4Bv4Fz+L+lXTzXWK+htXpWUragsvOeCj5
KHxSl/rASVo/zg5FOCCcq1+rYfRitAFqkWTYv0uFX+G/jcAcrspVJET3SfVYpVI4
fT2VP3zWXEh4yxJUCwnzqVPR3rsTdRob9csvpz2tuBRBUt0Vg+Kfc+NLi2EZzdsJ
GDmiYOlch4Lx+8tUtRQQUX4MTVYMWCCkfL0RhYH8+kgcD8xK/LorN0Xb6FPTln95
hVPFTP3siojnj41UWQETqVnps7nsD+1ekLqehvNq6oLy7X4JDhWxu6bk3w7Gb2ox
fk/L5x1ZFP+NFN8bvb3KPESfCKf4miuQ3fgRNad+FES7oqjN8ec55gB3oC2q5K8/
RrBFrtrcFTWBqxeYPxBMdBdj1tS7yfNPOavQg9iVGjzqJfnoMXfsKHrrTeZmGdZZ
4HudtV4BzSNUCmkvqho6TvFbLslfJ2a1RkV3tXijgbFknDoqD5rclm7KPp/ZQhH6
5ExnQpqd2EsHykJE2Zu4UxWW5f8TUBT2iBhVmTyQzrzijCtdmUkpbViaJXU0/yLD
/k+MbiiJQ4ZTpcOFMeZ73J2WFU4xLyjY4magtWtBlPUZiONQy68zQ2GVdQ0AlXEt
kbWvC7mMAYXiMJyAF9c=
=1bkc
-----END PGP SIGNATURE-----
Merge tag 'mmc-v5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Pull MMC updates from Ulf Hansson:
"MMC core:
- Enable erase/discard/trim support for all (e)MMC/SD hosts
- Export information through sysfs about enhanced RPMB support (eMMC v5.1+)
- Align the initialization commands for SDIO cards
- Fix SDIO initialization to prevent memory leaks and NULL pointer errors
- Do not export undefined MMC_NAME/MODALIAS for SDIO cards
- Export device/vendor field from common CIS for SDIO cards
- Move SDIO IDs from functional drivers to the common SDIO header
- Introduce the ->request_atomic() host ops
MMC host:
- Improve support for HW busy signaling for several hosts
- Converting some DT bindings to the json-schema
- meson-mx-sdhc: Add driver and DT doc for the Amlogic Meson SDHC controller
- meson-mx-sdio: Run a soft reset to recover from timeout/CRC error
- mmci: Convert to use mmc_regulator_set_vqmmc()
- mmci_stm32_sdmmc: Fix a couple of DMA bugs
- mmci_stm32_sdmmc: Fix power on issue
- renesas,mmcif,sdhci: Document r8a7742 DT bindings
- renesas_sdhi: Add support for M3-W ES1.2 and 1.3 revisions
- renesas_sdhi: Improvements to the TAP selection
- renesas_sdhi/tmio: Further fixup runtime PM management at ->remove()
- sdhci: Introduce ops to dump vendor specific registers
- sdhci-cadence: Fix PHY write sequence
- sdhci-esdhc-imx: Improve tunings
- sdhci-esdhc-imx: Enable GPIO card detect as system wakeup
- sdhci-esdhc-imx: Add HS400 support for i.MX6SLL
- sdhci-esdhc-mcf: Add driver for the Coldfire/M5441X esdhc controller
- m68k: mcf5441x: Add platform data to enable esdhc mmc controller
- sdhci-msm: Improve HS400 tuning
- sdhci-msm: Dump vendor specific registers at error
- sdhci-msm: Add support for DLL/DDR properties provided from DT
- sdhci-msm: Add support for the sm8250 variant
- sdhci-msm: Add support for DVFS by converting to dev_pm_opp_set_rate()
- sdhci-of-arasan: Add support for Intel Keem Bay variant
- sdhci-of-arasan: Add support for Xilinx Versal SD variant
- sdhci-of-dwcmshc: Add support for system suspend/resume
- sdhci-of-dwcmshc: Fix UHS signaling support
- sdhci-of-esdhc: Fix tuning for eMMC HS400 mode
- sdhci-pci-gli: Add Genesys Logic GL9763E support
- sdhci-sprd: Add support for the ->request_atomic() ops
- sdhci-tegra: Avoid reading autocal timeout values when not applicable
MEMSTICK:
- Minor trivial update"
* tag 'mmc-v5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: (127 commits)
dt-bindings: mmc: Convert sdhci-pxa to json-schema
mmc: sdhci-msm: Clear tuning done flag while hs400 tuning
mmc: core: Export device/vendor ids from Common CIS for SDIO cards
mmc: core: Do not export MMC_NAME= and MODALIAS=mmc:block for SDIO cards
mmc: sdhci-of-at91: fix CALCR register being rewritten
mmc: sdhci-esdhc-imx: disable the CMD CRC check for standard tuning
mmc: sdhci-esdhc-imx: fix the mask for tuning start point
mmc: host: sdhci-esdhc-imx: add wakeup feature for GPIO CD pin
mmc: mmci_sdmmc: fix DMA API warning max segment size
mmc: mmci_sdmmc: fix DMA API warning overlapping mappings
mmc: sdhci-of-arasan: Add support for Intel Keem Bay
dt-bindings: mmc: arasan: Add compatible strings for Intel Keem Bay
mmc: sdhci-cadence: fix PHY write
mmc: sdio: Sort all SDIO IDs in common include file
mmc: sdio: Fix Cypress SDIO IDs macros in common include file
mmc: sdio: Move SDIO IDs from b43-sdio driver to common include file
mmc: sdio: Move SDIO IDs from ath10k driver to common include file
mmc: sdio: Move SDIO IDs from ath6kl driver to common include file
mmc: sdio: Move SDIO IDs from smssdio driver to common include file
mmc: sdio: Move SDIO IDs from btmtksdio driver to common include file
...
Add a bpf_csum_level() helper which BPF programs can use in combination
with bpf_skb_adjust_room() when they pass in BPF_F_ADJ_ROOM_NO_CSUM_RESET
flag to the latter to avoid falling back to CHECKSUM_NONE.
The bpf_csum_level() allows to adjust CHECKSUM_UNNECESSARY skb->csum_levels
via BPF_CSUM_LEVEL_{INC,DEC} which calls __skb_{incr,decr}_checksum_unnecessary()
on the skb. The helper also allows a BPF_CSUM_LEVEL_RESET which sets the skb's
csum to CHECKSUM_NONE as well as a BPF_CSUM_LEVEL_QUERY to just return the
current level. Without this helper, there is no way to otherwise adjust the
skb->csum_level. I did not add an extra dummy flags as there is plenty of free
bitspace in level argument itself iff ever needed in future.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Acked-by: Lorenz Bauer <lmb@cloudflare.com>
Link: https://lore.kernel.org/bpf/279ae3717cb3d03c0ffeb511493c93c450a01e1a.1591108731.git.daniel@iogearbox.net
Lorenz recently reported:
In our TC classifier cls_redirect [0], we use the following sequence of
helper calls to decapsulate a GUE (basically IP + UDP + custom header)
encapsulated packet:
bpf_skb_adjust_room(skb, -encap_len, BPF_ADJ_ROOM_MAC, BPF_F_ADJ_ROOM_FIXED_GSO)
bpf_redirect(skb->ifindex, BPF_F_INGRESS)
It seems like some checksums of the inner headers are not validated in
this case. For example, a TCP SYN packet with invalid TCP checksum is
still accepted by the network stack and elicits a SYN ACK. [...]
That is, we receive the following packet from the driver:
| ETH | IP | UDP | GUE | IP | TCP |
skb->ip_summed == CHECKSUM_UNNECESSARY
ip_summed is CHECKSUM_UNNECESSARY because our NICs do rx checksum offloading.
On this packet we run skb_adjust_room_mac(-encap_len), and get the following:
| ETH | IP | TCP |
skb->ip_summed == CHECKSUM_UNNECESSARY
Note that ip_summed is still CHECKSUM_UNNECESSARY. After bpf_redirect()'ing
into the ingress, we end up in tcp_v4_rcv(). There, skb_checksum_init() is
turned into a no-op due to CHECKSUM_UNNECESSARY.
The bpf_skb_adjust_room() helper is not aware of protocol specifics. Internally,
it handles the CHECKSUM_COMPLETE case via skb_postpull_rcsum(), but that does
not cover CHECKSUM_UNNECESSARY. In this case skb->csum_level of the original
skb prior to bpf_skb_adjust_room() call was 0, that is, covering UDP. Right now
there is no way to adjust the skb->csum_level. NICs that have checksum offload
disabled (CHECKSUM_NONE) or that support CHECKSUM_COMPLETE are not affected.
Use a safe default for CHECKSUM_UNNECESSARY by resetting to CHECKSUM_NONE and
add a flag to the helper called BPF_F_ADJ_ROOM_NO_CSUM_RESET that allows users
from opting out. Opting out is useful for the case where we don't remove/add
full protocol headers, or for the case where a user wants to adjust the csum
level manually e.g. through bpf_csum_level() helper that is added in subsequent
patch.
The bpf_skb_proto_{4_to_6,6_to_4}() for NAT64/46 translation from the BPF
bpf_skb_change_proto() helper uses bpf_skb_net_hdr_{push,pop}() pair internally
as well but doesn't change layers, only transitions between v4 to v6 and vice
versa, therefore no adoption is required there.
[0] https://lore.kernel.org/bpf/20200424185556.7358-1-lmb@cloudflare.com/
Fixes: 2be7e212d5 ("bpf: add bpf_skb_adjust_room helper")
Reported-by: Lorenz Bauer <lmb@cloudflare.com>
Reported-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Link: https://lore.kernel.org/bpf/CACAyw9-uU_52esMd1JjuA80fRPHJv5vsSg8GnfW3t_qDU4aVKQ@mail.gmail.com/
Link: https://lore.kernel.org/bpf/11a90472e7cce83e76ddbfce81fdfce7bfc68808.1591108731.git.daniel@iogearbox.net
The schib region can be used by userspace to get the subchannel-
information block (SCHIB) for the passthrough subchannel.
This can be useful to get information such as channel path
information via the SCHIB.PMCW fields.
Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
Signed-off-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Message-Id: <20200505122745.53208-5-farman@linux.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
The ring element addresses are passed between components with different
alignments assumptions. Thus, if guest/userspace selects a pointer and
host then gets and dereferences it, we might need to decrease the
compiler-selected alignment to prevent compiler on the host from
assuming pointer is aligned.
This actually triggers on ARM with -mabi=apcs-gnu - which is a
deprecated configuration, but it seems safer to handle this
generally.
Note that userspace that allocates the memory is actually OK and does
not need to be fixed, but userspace that gets it from guest or another
process does need to be fixed. The later doesn't generally talk to the
kernel so while it might be buggy it's not talking to the kernel in the
buggy way - it's just using the header in the buggy way - so fixing
header and asking userspace to recompile is the best we can do.
I verified that the produced kernel binary on x86 is exactly identical
before and after the change.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Pull vfs updates from Al Viro:
"Assorted patches from Miklos.
An interesting part here is /proc/mounts stuff..."
The "/proc/mounts stuff" is using a cursor for keeeping the location
data while traversing the mount listing.
Also probably worth noting is the addition of faccessat2(), which takes
an additional set of flags to specify how the lookup is done
(AT_EACCESS, AT_SYMLINK_NOFOLLOW, AT_EMPTY_PATH).
* 'from-miklos' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
vfs: add faccessat2 syscall
vfs: don't parse "silent" option
vfs: don't parse "posixacl" option
vfs: don't parse forbidden flags
statx: add mount_root
statx: add mount ID
statx: don't clear STATX_ATIME on SB_RDONLY
uapi: deprecate STATX_ALL
utimensat: AT_EMPTY_PATH support
vfs: split out access_override_creds()
proc/mounts: add cursor
aio: fix async fsync creds
vfs: allow unprivileged whiteout creation
set from Mauro toward the completion of the RST conversion. I *really*
hope we are getting close to the end of this. Meanwhile, those patches
reach pretty far afield to update document references around the tree;
there should be no actual code changes there. There will be, alas, more of
the usual trivial merge conflicts.
Beyond that we have more translations, improvements to the sphinx
scripting, a number of additions to the sysctl documentation, and lots of
fixes.
-----BEGIN PGP SIGNATURE-----
iQFDBAABCAAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAl7VId8PHGNvcmJldEBs
d24ubmV0AAoJEBdDWhNsDH5Yq/gH/iaDgirQZV6UZ2v9sfwQNYolNpf2sKAuOZjd
bPFB7WJoMQbKwQEvYrAUL2+5zPOcLYuIfzyOfo1BV1py+EyKbACcKjI4AedxfJF7
+NchmOBhlEqmEhzx2U08HRc4/8J223WG17fJRVsV3p+opJySexSFeQucfOciX5NR
RUCxweWWyg/FgyqjkyMMTtsePqZPmcT5dWTlVXISlbWzcv5NFhuJXnSrw8Sfzcmm
SJMzqItv3O+CabnKQ8kMLV2PozXTMfjeWH47ZUK0Y8/8PP9+cvqwFzZ0UDQJ1Xaz
oyW/TqmunaXhfMsMFeFGSwtfgwRHvXdxkQdtwNHvo1dV4dzTvDw=
=fDC/
-----END PGP SIGNATURE-----
Merge tag 'docs-5.8' of git://git.lwn.net/linux
Pull documentation updates from Jonathan Corbet:
"A fair amount of stuff this time around, dominated by yet another
massive set from Mauro toward the completion of the RST conversion. I
*really* hope we are getting close to the end of this. Meanwhile,
those patches reach pretty far afield to update document references
around the tree; there should be no actual code changes there. There
will be, alas, more of the usual trivial merge conflicts.
Beyond that we have more translations, improvements to the sphinx
scripting, a number of additions to the sysctl documentation, and lots
of fixes"
* tag 'docs-5.8' of git://git.lwn.net/linux: (130 commits)
Documentation: fixes to the maintainer-entry-profile template
zswap: docs/vm: Fix typo accept_threshold_percent in zswap.rst
tracing: Fix events.rst section numbering
docs: acpi: fix old http link and improve document format
docs: filesystems: add info about efivars content
Documentation: LSM: Correct the basic LSM description
mailmap: change email for Ricardo Ribalda
docs: sysctl/kernel: document unaligned controls
Documentation: admin-guide: update bug-hunting.rst
docs: sysctl/kernel: document ngroups_max
nvdimm: fixes to maintainter-entry-profile
Documentation/features: Correct RISC-V kprobes support entry
Documentation/features: Refresh the arch support status files
Revert "docs: sysctl/kernel: document ngroups_max"
docs: move locking-specific documents to locking/
docs: move digsig docs to the security book
docs: move the kref doc into the core-api book
docs: add IRQ documentation at the core-api book
docs: debugging-via-ohci1394.txt: add it to the core-api book
docs: fix references for ipmi.rst file
...
Extend bpf() syscall subcommands that operate on bpf_link, that is
LINK_CREATE, LINK_UPDATE, OBJ_GET_INFO, to accept attach types tied to
network namespaces (only flow dissector at the moment).
Link-based and prog-based attachment can be used interchangeably, but only
one can exist at a time. Attempts to attach a link when a prog is already
attached directly, and the other way around, will be met with -EEXIST.
Attempts to detach a program when link exists result in -EINVAL.
Attachment of multiple links of same attach type to one netns is not
supported with the intention to lift the restriction when a use-case
presents itself. Because of that link create returns -E2BIG when trying to
create another netns link, when one already exists.
Link-based attachments to netns don't keep a netns alive by holding a ref
to it. Instead links get auto-detached from netns when the latter is being
destroyed, using a pernet pre_exit callback.
When auto-detached, link lives in defunct state as long there are open FDs
for it. -ENOLINK is returned if a user tries to update a defunct link.
Because bpf_link to netns doesn't hold a ref to struct net, special care is
taken when releasing, updating, or filling link info. The netns might be
getting torn down when any of these link operations are in progress. That
is why auto-detach and update/release/fill_info are synchronized by the
same mutex. Also, link ops have to always check if auto-detach has not
happened yet and if netns is still alive (refcnt > 0).
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200531082846.2117903-5-jakub@cloudflare.com
- Branch Target Identification (BTI)
* Support for ARMv8.5-BTI in both user- and kernel-space. This
allows branch targets to limit the types of branch from which
they can be called and additionally prevents branching to
arbitrary code, although kernel support requires a very recent
toolchain.
* Function annotation via SYM_FUNC_START() so that assembly
functions are wrapped with the relevant "landing pad"
instructions.
* BPF and vDSO updates to use the new instructions.
* Addition of a new HWCAP and exposure of BTI capability to
userspace via ID register emulation, along with ELF loader
support for the BTI feature in .note.gnu.property.
* Non-critical fixes to CFI unwind annotations in the sigreturn
trampoline.
- Shadow Call Stack (SCS)
* Support for Clang's Shadow Call Stack feature, which reserves
platform register x18 to point at a separate stack for each
task that holds only return addresses. This protects function
return control flow from buffer overruns on the main stack.
* Save/restore of x18 across problematic boundaries (user-mode,
hypervisor, EFI, suspend, etc).
* Core support for SCS, should other architectures want to use it
too.
* SCS overflow checking on context-switch as part of the existing
stack limit check if CONFIG_SCHED_STACK_END_CHECK=y.
- CPU feature detection
* Removed numerous "SANITY CHECK" errors when running on a system
with mismatched AArch32 support at EL1. This is primarily a
concern for KVM, which disabled support for 32-bit guests on
such a system.
* Addition of new ID registers and fields as the architecture has
been extended.
- Perf and PMU drivers
* Minor fixes and cleanups to system PMU drivers.
- Hardware errata
* Unify KVM workarounds for VHE and nVHE configurations.
* Sort vendor errata entries in Kconfig.
- Secure Monitor Call Calling Convention (SMCCC)
* Update to the latest specification from Arm (v1.2).
* Allow PSCI code to query the SMCCC version.
- Software Delegated Exception Interface (SDEI)
* Unexport a bunch of unused symbols.
* Minor fixes to handling of firmware data.
- Pointer authentication
* Add support for dumping the kernel PAC mask in vmcoreinfo so
that the stack can be unwound by tools such as kdump.
* Simplification of key initialisation during CPU bringup.
- BPF backend
* Improve immediate generation for logical and add/sub
instructions.
- vDSO
- Minor fixes to the linker flags for consistency with other
architectures and support for LLVM's unwinder.
- Clean up logic to initialise and map the vDSO into userspace.
- ACPI
- Work around for an ambiguity in the IORT specification relating
to the "num_ids" field.
- Support _DMA method for all named components rather than only
PCIe root complexes.
- Minor other IORT-related fixes.
- Miscellaneous
* Initialise debug traps early for KGDB and fix KDB cacheflushing
deadlock.
* Minor tweaks to early boot state (documentation update, set
TEXT_OFFSET to 0x0, increase alignment of PE/COFF sections).
* Refactoring and cleanup
-----BEGIN PGP SIGNATURE-----
iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAl7U9csQHHdpbGxAa2Vy
bmVsLm9yZwAKCRC3rHDchMFjNLBHCACs/YU4SM7Om5f+7QnxIKao5DBr2CnGGvdC
yTfDghFDTLQVv3MufLlfno3yBe5G8sQpcZfcc+hewfcGoMzVZXu8s7LzH6VSn9T9
jmT3KjDMrg0RjSHzyumJp2McyelTk0a4FiKArSIIKsJSXUyb1uPSgm7SvKVDwEwU
JGDzL9IGilmq59GiXfDzGhTZgmC37QdwRoRxDuqtqWQe5CHoRXYexg87HwBKOQxx
HgU9L7ehri4MRZfpyjaDrr6quJo3TVnAAKXNBh3mZAskVS9ZrfKpEH0kYWYuqybv
znKyHRecl/rrGePV8RTMtrwnSdU26zMXE/omsVVauDfG9hqzqm+Q
=w3qi
-----END PGP SIGNATURE-----
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Will Deacon:
"A sizeable pile of arm64 updates for 5.8.
Summary below, but the big two features are support for Branch Target
Identification and Clang's Shadow Call stack. The latter is currently
arm64-only, but the high-level parts are all in core code so it could
easily be adopted by other architectures pending toolchain support
Branch Target Identification (BTI):
- Support for ARMv8.5-BTI in both user- and kernel-space. This allows
branch targets to limit the types of branch from which they can be
called and additionally prevents branching to arbitrary code,
although kernel support requires a very recent toolchain.
- Function annotation via SYM_FUNC_START() so that assembly functions
are wrapped with the relevant "landing pad" instructions.
- BPF and vDSO updates to use the new instructions.
- Addition of a new HWCAP and exposure of BTI capability to userspace
via ID register emulation, along with ELF loader support for the
BTI feature in .note.gnu.property.
- Non-critical fixes to CFI unwind annotations in the sigreturn
trampoline.
Shadow Call Stack (SCS):
- Support for Clang's Shadow Call Stack feature, which reserves
platform register x18 to point at a separate stack for each task
that holds only return addresses. This protects function return
control flow from buffer overruns on the main stack.
- Save/restore of x18 across problematic boundaries (user-mode,
hypervisor, EFI, suspend, etc).
- Core support for SCS, should other architectures want to use it
too.
- SCS overflow checking on context-switch as part of the existing
stack limit check if CONFIG_SCHED_STACK_END_CHECK=y.
CPU feature detection:
- Removed numerous "SANITY CHECK" errors when running on a system
with mismatched AArch32 support at EL1. This is primarily a concern
for KVM, which disabled support for 32-bit guests on such a system.
- Addition of new ID registers and fields as the architecture has
been extended.
Perf and PMU drivers:
- Minor fixes and cleanups to system PMU drivers.
Hardware errata:
- Unify KVM workarounds for VHE and nVHE configurations.
- Sort vendor errata entries in Kconfig.
Secure Monitor Call Calling Convention (SMCCC):
- Update to the latest specification from Arm (v1.2).
- Allow PSCI code to query the SMCCC version.
Software Delegated Exception Interface (SDEI):
- Unexport a bunch of unused symbols.
- Minor fixes to handling of firmware data.
Pointer authentication:
- Add support for dumping the kernel PAC mask in vmcoreinfo so that
the stack can be unwound by tools such as kdump.
- Simplification of key initialisation during CPU bringup.
BPF backend:
- Improve immediate generation for logical and add/sub instructions.
vDSO:
- Minor fixes to the linker flags for consistency with other
architectures and support for LLVM's unwinder.
- Clean up logic to initialise and map the vDSO into userspace.
ACPI:
- Work around for an ambiguity in the IORT specification relating to
the "num_ids" field.
- Support _DMA method for all named components rather than only PCIe
root complexes.
- Minor other IORT-related fixes.
Miscellaneous:
- Initialise debug traps early for KGDB and fix KDB cacheflushing
deadlock.
- Minor tweaks to early boot state (documentation update, set
TEXT_OFFSET to 0x0, increase alignment of PE/COFF sections).
- Refactoring and cleanup"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (148 commits)
KVM: arm64: Move __load_guest_stage2 to kvm_mmu.h
KVM: arm64: Check advertised Stage-2 page size capability
arm64/cpufeature: Add get_arm64_ftr_reg_nowarn()
ACPI/IORT: Remove the unused __get_pci_rid()
arm64/cpuinfo: Add ID_MMFR4_EL1 into the cpuinfo_arm64 context
arm64/cpufeature: Add remaining feature bits in ID_AA64PFR1 register
arm64/cpufeature: Add remaining feature bits in ID_AA64PFR0 register
arm64/cpufeature: Add remaining feature bits in ID_AA64ISAR0 register
arm64/cpufeature: Add remaining feature bits in ID_MMFR4 register
arm64/cpufeature: Add remaining feature bits in ID_PFR0 register
arm64/cpufeature: Introduce ID_MMFR5 CPU register
arm64/cpufeature: Introduce ID_DFR1 CPU register
arm64/cpufeature: Introduce ID_PFR2 CPU register
arm64/cpufeature: Make doublelock a signed feature in ID_AA64DFR0
arm64/cpufeature: Drop TraceFilt feature exposure from ID_DFR0 register
arm64/cpufeature: Add explicit ftr_id_isar0[] for ID_ISAR0 register
arm64: mm: Add asid_gen_match() helper
firmware: smccc: Fix missing prototype warning for arm_smccc_version_init
arm64: vdso: Fix CFI directives in sigreturn trampoline
arm64: vdso: Don't prefix sigreturn trampoline with a BTI C instruction
...
Add xdp_txq_info as the Tx counterpart to xdp_rxq_info. At the
moment only the device is added. Other fields (queue_index)
can be added as use cases arise.
>From a UAPI perspective, add egress_ifindex to xdp context for
bpf programs to see the Tx device.
Update the verifier to only allow accesses to egress_ifindex by
XDP programs with BPF_XDP_DEVMAP expected attach type.
Signed-off-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20200529220716.75383-4-dsahern@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add BPF_XDP_DEVMAP attach type for use with programs associated with a
DEVMAP entry.
Allow DEVMAPs to associate a program with a device entry by adding
a bpf_prog.fd to 'struct bpf_devmap_val'. Values read show the program
id, so the fd and id are a union. bpf programs can get access to the
struct via vmlinux.h.
The program associated with the fd must have type XDP with expected
attach type BPF_XDP_DEVMAP. When a program is associated with a device
index, the program is run on an XDP_REDIRECT and before the buffer is
added to the per-cpu queue. At this point rxq data is still valid; the
next patch adds tx device information allowing the prorgam to see both
ingress and egress device indices.
XDP generic is skb based and XDP programs do not work with skb's. Block
the use case by walking maps used by a program that is to be attached
via xdpgeneric and fail if any of them are DEVMAP / DEVMAP_HASH with
Block attach of BPF_XDP_DEVMAP programs to devices.
Signed-off-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20200529220716.75383-3-dsahern@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add "rx_queue_mapping" to bpf_sock. This gives read access for the
existing field (sk_rx_queue_mapping) of struct sock from bpf_sock.
Semantics for the bpf_sock rx_queue_mapping access are similar to
sk_rx_queue_get(), i.e the value NO_QUEUE_MAPPING is not allowed
and -1 is returned in that case. This is useful for transmit queue
selection based on the received queue index which is cached in the
socket in the receive path.
v3: Addressed review comments to add usecase in patch description,
and fixed default value for rx_queue_mapping.
v2: fixed build error for CONFIG_XPS wrapping, reported by
kbuild test robot <lkp@intel.com>
Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This commit adds a new MPSC ring buffer implementation into BPF ecosystem,
which allows multiple CPUs to submit data to a single shared ring buffer. On
the consumption side, only single consumer is assumed.
Motivation
----------
There are two distinctive motivators for this work, which are not satisfied by
existing perf buffer, which prompted creation of a new ring buffer
implementation.
- more efficient memory utilization by sharing ring buffer across CPUs;
- preserving ordering of events that happen sequentially in time, even
across multiple CPUs (e.g., fork/exec/exit events for a task).
These two problems are independent, but perf buffer fails to satisfy both.
Both are a result of a choice to have per-CPU perf ring buffer. Both can be
also solved by having an MPSC implementation of ring buffer. The ordering
problem could technically be solved for perf buffer with some in-kernel
counting, but given the first one requires an MPSC buffer, the same solution
would solve the second problem automatically.
Semantics and APIs
------------------
Single ring buffer is presented to BPF programs as an instance of BPF map of
type BPF_MAP_TYPE_RINGBUF. Two other alternatives considered, but ultimately
rejected.
One way would be to, similar to BPF_MAP_TYPE_PERF_EVENT_ARRAY, make
BPF_MAP_TYPE_RINGBUF could represent an array of ring buffers, but not enforce
"same CPU only" rule. This would be more familiar interface compatible with
existing perf buffer use in BPF, but would fail if application needed more
advanced logic to lookup ring buffer by arbitrary key. HASH_OF_MAPS addresses
this with current approach. Additionally, given the performance of BPF
ringbuf, many use cases would just opt into a simple single ring buffer shared
among all CPUs, for which current approach would be an overkill.
Another approach could introduce a new concept, alongside BPF map, to
represent generic "container" object, which doesn't necessarily have key/value
interface with lookup/update/delete operations. This approach would add a lot
of extra infrastructure that has to be built for observability and verifier
support. It would also add another concept that BPF developers would have to
familiarize themselves with, new syntax in libbpf, etc. But then would really
provide no additional benefits over the approach of using a map.
BPF_MAP_TYPE_RINGBUF doesn't support lookup/update/delete operations, but so
doesn't few other map types (e.g., queue and stack; array doesn't support
delete, etc).
The approach chosen has an advantage of re-using existing BPF map
infrastructure (introspection APIs in kernel, libbpf support, etc), being
familiar concept (no need to teach users a new type of object in BPF program),
and utilizing existing tooling (bpftool). For common scenario of using
a single ring buffer for all CPUs, it's as simple and straightforward, as
would be with a dedicated "container" object. On the other hand, by being
a map, it can be combined with ARRAY_OF_MAPS and HASH_OF_MAPS map-in-maps to
implement a wide variety of topologies, from one ring buffer for each CPU
(e.g., as a replacement for perf buffer use cases), to a complicated
application hashing/sharding of ring buffers (e.g., having a small pool of
ring buffers with hashed task's tgid being a look up key to preserve order,
but reduce contention).
Key and value sizes are enforced to be zero. max_entries is used to specify
the size of ring buffer and has to be a power of 2 value.
There are a bunch of similarities between perf buffer
(BPF_MAP_TYPE_PERF_EVENT_ARRAY) and new BPF ring buffer semantics:
- variable-length records;
- if there is no more space left in ring buffer, reservation fails, no
blocking;
- memory-mappable data area for user-space applications for ease of
consumption and high performance;
- epoll notifications for new incoming data;
- but still the ability to do busy polling for new data to achieve the
lowest latency, if necessary.
BPF ringbuf provides two sets of APIs to BPF programs:
- bpf_ringbuf_output() allows to *copy* data from one place to a ring
buffer, similarly to bpf_perf_event_output();
- bpf_ringbuf_reserve()/bpf_ringbuf_commit()/bpf_ringbuf_discard() APIs
split the whole process into two steps. First, a fixed amount of space is
reserved. If successful, a pointer to a data inside ring buffer data area
is returned, which BPF programs can use similarly to a data inside
array/hash maps. Once ready, this piece of memory is either committed or
discarded. Discard is similar to commit, but makes consumer ignore the
record.
bpf_ringbuf_output() has disadvantage of incurring extra memory copy, because
record has to be prepared in some other place first. But it allows to submit
records of the length that's not known to verifier beforehand. It also closely
matches bpf_perf_event_output(), so will simplify migration significantly.
bpf_ringbuf_reserve() avoids the extra copy of memory by providing a memory
pointer directly to ring buffer memory. In a lot of cases records are larger
than BPF stack space allows, so many programs have use extra per-CPU array as
a temporary heap for preparing sample. bpf_ringbuf_reserve() avoid this needs
completely. But in exchange, it only allows a known constant size of memory to
be reserved, such that verifier can verify that BPF program can't access
memory outside its reserved record space. bpf_ringbuf_output(), while slightly
slower due to extra memory copy, covers some use cases that are not suitable
for bpf_ringbuf_reserve().
The difference between commit and discard is very small. Discard just marks
a record as discarded, and such records are supposed to be ignored by consumer
code. Discard is useful for some advanced use-cases, such as ensuring
all-or-nothing multi-record submission, or emulating temporary malloc()/free()
within single BPF program invocation.
Each reserved record is tracked by verifier through existing
reference-tracking logic, similar to socket ref-tracking. It is thus
impossible to reserve a record, but forget to submit (or discard) it.
bpf_ringbuf_query() helper allows to query various properties of ring buffer.
Currently 4 are supported:
- BPF_RB_AVAIL_DATA returns amount of unconsumed data in ring buffer;
- BPF_RB_RING_SIZE returns the size of ring buffer;
- BPF_RB_CONS_POS/BPF_RB_PROD_POS returns current logical possition of
consumer/producer, respectively.
Returned values are momentarily snapshots of ring buffer state and could be
off by the time helper returns, so this should be used only for
debugging/reporting reasons or for implementing various heuristics, that take
into account highly-changeable nature of some of those characteristics.
One such heuristic might involve more fine-grained control over poll/epoll
notifications about new data availability in ring buffer. Together with
BPF_RB_NO_WAKEUP/BPF_RB_FORCE_WAKEUP flags for output/commit/discard helpers,
it allows BPF program a high degree of control and, e.g., more efficient
batched notifications. Default self-balancing strategy, though, should be
adequate for most applications and will work reliable and efficiently already.
Design and implementation
-------------------------
This reserve/commit schema allows a natural way for multiple producers, either
on different CPUs or even on the same CPU/in the same BPF program, to reserve
independent records and work with them without blocking other producers. This
means that if BPF program was interruped by another BPF program sharing the
same ring buffer, they will both get a record reserved (provided there is
enough space left) and can work with it and submit it independently. This
applies to NMI context as well, except that due to using a spinlock during
reservation, in NMI context, bpf_ringbuf_reserve() might fail to get a lock,
in which case reservation will fail even if ring buffer is not full.
The ring buffer itself internally is implemented as a power-of-2 sized
circular buffer, with two logical and ever-increasing counters (which might
wrap around on 32-bit architectures, that's not a problem):
- consumer counter shows up to which logical position consumer consumed the
data;
- producer counter denotes amount of data reserved by all producers.
Each time a record is reserved, producer that "owns" the record will
successfully advance producer counter. At that point, data is still not yet
ready to be consumed, though. Each record has 8 byte header, which contains
the length of reserved record, as well as two extra bits: busy bit to denote
that record is still being worked on, and discard bit, which might be set at
commit time if record is discarded. In the latter case, consumer is supposed
to skip the record and move on to the next one. Record header also encodes
record's relative offset from the beginning of ring buffer data area (in
pages). This allows bpf_ringbuf_commit()/bpf_ringbuf_discard() to accept only
the pointer to the record itself, without requiring also the pointer to ring
buffer itself. Ring buffer memory location will be restored from record
metadata header. This significantly simplifies verifier, as well as improving
API usability.
Producer counter increments are serialized under spinlock, so there is
a strict ordering between reservations. Commits, on the other hand, are
completely lockless and independent. All records become available to consumer
in the order of reservations, but only after all previous records where
already committed. It is thus possible for slow producers to temporarily hold
off submitted records, that were reserved later.
Reservation/commit/consumer protocol is verified by litmus tests in
Documentation/litmus-test/bpf-rb.
One interesting implementation bit, that significantly simplifies (and thus
speeds up as well) implementation of both producers and consumers is how data
area is mapped twice contiguously back-to-back in the virtual memory. This
allows to not take any special measures for samples that have to wrap around
at the end of the circular buffer data area, because the next page after the
last data page would be first data page again, and thus the sample will still
appear completely contiguous in virtual memory. See comment and a simple ASCII
diagram showing this visually in bpf_ringbuf_area_alloc().
Another feature that distinguishes BPF ringbuf from perf ring buffer is
a self-pacing notifications of new data being availability.
bpf_ringbuf_commit() implementation will send a notification of new record
being available after commit only if consumer has already caught up right up
to the record being committed. If not, consumer still has to catch up and thus
will see new data anyways without needing an extra poll notification.
Benchmarks (see tools/testing/selftests/bpf/benchs/bench_ringbuf.c) show that
this allows to achieve a very high throughput without having to resort to
tricks like "notify only every Nth sample", which are necessary with perf
buffer. For extreme cases, when BPF program wants more manual control of
notifications, commit/discard/output helpers accept BPF_RB_NO_WAKEUP and
BPF_RB_FORCE_WAKEUP flags, which give full control over notifications of data
availability, but require extra caution and diligence in using this API.
Comparison to alternatives
--------------------------
Before considering implementing BPF ring buffer from scratch existing
alternatives in kernel were evaluated, but didn't seem to meet the needs. They
largely fell into few categores:
- per-CPU buffers (perf, ftrace, etc), which don't satisfy two motivations
outlined above (ordering and memory consumption);
- linked list-based implementations; while some were multi-producer designs,
consuming these from user-space would be very complicated and most
probably not performant; memory-mapping contiguous piece of memory is
simpler and more performant for user-space consumers;
- io_uring is SPSC, but also requires fixed-sized elements. Naively turning
SPSC queue into MPSC w/ lock would have subpar performance compared to
locked reserve + lockless commit, as with BPF ring buffer. Fixed sized
elements would be too limiting for BPF programs, given existing BPF
programs heavily rely on variable-sized perf buffer already;
- specialized implementations (like a new printk ring buffer, [0]) with lots
of printk-specific limitations and implications, that didn't seem to fit
well for intended use with BPF programs.
[0] https://lwn.net/Articles/779550/
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200529075424.3139988-2-andriin@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
- Add AMD Fam17h RAPL support
- Introduce CAP_PERFMON to kernel and user space
- Add Zhaoxin CPU support
- Misc fixes and cleanups
Tooling changes:
perf record:
- Introduce --switch-output-event to use arbitrary events to be setup
and read from a side band thread and, when they take place a signal
be sent to the main 'perf record' thread, reusing the --switch-output
code to take perf.data snapshots from the --overwrite ring buffer, e.g.:
# perf record --overwrite -e sched:* \
--switch-output-event syscalls:*connect* \
workload
will take perf.data.YYYYMMDDHHMMSS snapshots up to around the
connect syscalls.
- Add --num-synthesize-threads option to control degree of parallelism of the
synthesize_mmap() code which is scanning /proc/PID/task/PID/maps and can be
time consuming. This mimics pre-existing behaviour in 'perf top'.
perf bench:
- Add a multi-threaded synthesize benchmark.
- Add kallsyms parsing benchmark.
Intel PT support:
- Stitch LBR records from multiple samples to get deeper backtraces,
there are caveats, see the csets for details.
- Allow using Intel PT to synthesize callchains for regular events.
- Add support for synthesizing branch stacks for regular events (cycles,
instructions, etc) from Intel PT data.
Misc changes:
- Updated perf vendor events for power9 and Coresight.
- Add flamegraph.py script via 'perf flamegraph'
- Misc other changes, fixes and cleanups - see the Git log for details.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAl7VJAcRHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1hAYw/8DFtzGkMaaWkrDSj62LXtWQiqr1l01ZFt
9GzV4aN4/go+K4BQtsQN8cUjOkRHFnOryLuD9LfSBfqsdjuiyTynV/cJkeUGQBck
TT/GgWf3XKJzTUBRQRk367Gbqs9UKwBP8CdFhOXcNzGEQpjhbwwIDPmem94U4L1N
XLsysgC45ejWL1kMTZKmk6hDIidlFeDg9j70WDPX1nNfCeisk25rxwTpdgvjsjcj
3RzPRt2EGS+IkuF4QSCT5leYSGaCpVDHCQrVpHj57UoADfWAyC71uopTLG4OgYSx
PVd9gvloMeeqWmroirIxM67rMd/TBTfVekNolhnQDjqp60Huxm+gGUYmhsyjNqdx
Pb8HRZCBAudei9Ue4jNMfhCRK2Ug1oL5wNvN1xcSteAqrwMlwBMGHWns6l12x0ks
BxYhyLvfREvnKijXc1o8D5paRgqohJgfnHlrUZeacyaw5hQCbiVRpwg0T1mWAF53
u9hfWLY0Oy+Qs2C7EInNsWSYXRw8oPQNTFVx2I968GZqsEn4DC6Pt3ovWrDKIDnz
ugoZJQkJ3/O8stYSMiyENehdWlo575NkapCTDwhLWnYztrw4skqqHE8ighU/e8ug
o/Kx7ANWN9OjjjQpq2GVUeT0jCaFO+OMiGMNEkKoniYgYjogt3Gw5PeedBMtY07p
OcWTiQZamjU=
=i27M
-----END PGP SIGNATURE-----
Merge tag 'perf-core-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
"Kernel side changes:
- Add AMD Fam17h RAPL support
- Introduce CAP_PERFMON to kernel and user space
- Add Zhaoxin CPU support
- Misc fixes and cleanups
Tooling changes:
- perf record:
Introduce '--switch-output-event' to use arbitrary events to be
setup and read from a side band thread and, when they take place a
signal be sent to the main 'perf record' thread, reusing the core
for '--switch-output' to take perf.data snapshots from the ring
buffer used for '--overwrite', e.g.:
# perf record --overwrite -e sched:* \
--switch-output-event syscalls:*connect* \
workload
will take perf.data.YYYYMMDDHHMMSS snapshots up to around the
connect syscalls.
Add '--num-synthesize-threads' option to control degree of
parallelism of the synthesize_mmap() code which is scanning
/proc/PID/task/PID/maps and can be time consuming. This mimics
pre-existing behaviour in 'perf top'.
- perf bench:
Add a multi-threaded synthesize benchmark and kallsyms parsing
benchmark.
- Intel PT support:
Stitch LBR records from multiple samples to get deeper backtraces,
there are caveats, see the csets for details.
Allow using Intel PT to synthesize callchains for regular events.
Add support for synthesizing branch stacks for regular events
(cycles, instructions, etc) from Intel PT data.
Misc changes:
- Updated perf vendor events for power9 and Coresight.
- Add flamegraph.py script via 'perf flamegraph'
- Misc other changes, fixes and cleanups - see the Git log for details
Also, since over the last couple of years perf tooling has matured and
decoupled from the kernel perf changes to a large degree, going
forward Arnaldo is going to send perf tooling changes via direct pull
requests"
* tag 'perf-core-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (163 commits)
perf/x86/rapl: Add AMD Fam17h RAPL support
perf/x86/rapl: Make perf_probe_msr() more robust and flexible
perf/x86/rapl: Flip logic on default events visibility
perf/x86/rapl: Refactor to share the RAPL code between Intel and AMD CPUs
perf/x86/rapl: Move RAPL support to common x86 code
perf/core: Replace zero-length array with flexible-array
perf/x86: Replace zero-length array with flexible-array
perf/x86/intel: Add more available bits for OFFCORE_RESPONSE of Intel Tremont
perf/x86/rapl: Add Ice Lake RAPL support
perf flamegraph: Use /bin/bash for report and record scripts
perf cs-etm: Move definition of 'traceid_list' global variable from header file
libsymbols kallsyms: Move hex2u64 out of header
libsymbols kallsyms: Parse using io api
perf bench: Add kallsyms parsing
perf: cs-etm: Update to build with latest opencsd version.
perf symbol: Fix kernel symbol address display
perf inject: Rename perf_evsel__*() operating on 'struct evsel *' to evsel__*()
perf annotate: Rename perf_evsel__*() operating on 'struct evsel *' to evsel__*()
perf trace: Rename perf_evsel__*() operating on 'struct evsel *' to evsel__*()
perf script: Rename perf_evsel__*() operating on 'struct evsel *' to evsel__*()
...
- Add the IV_INO_LBLK_32 encryption policy flag which modifies the
encryption to be optimized for eMMC inline encryption hardware.
- Make the test_dummy_encryption mount option for ext4 and f2fs support
v2 encryption policies.
- Fix kerneldoc warnings and some coding style inconsistencies.
There will be merge conflicts with the ext4 and f2fs trees due to the
test_dummy_encryption change, but the resolutions are straightforward.
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQSacvsUNc7UX4ntmEPzXCl4vpKOKwUCXtScMBQcZWJpZ2dlcnNA
Z29vZ2xlLmNvbQAKCRDzXCl4vpKOKxC6AP0eOEkMrc9e10YftdN6xsyRjvqiPyFg
oMjuU+SvQ+/sVgEAo0mBFITnl75ZGb8PyqXCNMDAy6uHaxcEjVGufx5q2QE=
=dbxy
-----END PGP SIGNATURE-----
Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt
Pull fscrypt updates from Eric Biggers:
- Add the IV_INO_LBLK_32 encryption policy flag which modifies the
encryption to be optimized for eMMC inline encryption hardware.
- Make the test_dummy_encryption mount option for ext4 and f2fs support
v2 encryption policies.
- Fix kerneldoc warnings and some coding style inconsistencies.
* tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt:
fscrypt: add support for IV_INO_LBLK_32 policies
fscrypt: make test_dummy_encryption use v2 by default
fscrypt: support test_dummy_encryption=v2
fscrypt: add fscrypt_add_test_dummy_key()
linux/parser.h: add include guards
fscrypt: remove unnecessary extern keywords
fscrypt: name all function parameters
fscrypt: fix all kerneldoc warnings
Pull crypto updates from Herbert Xu:
"API:
- Introduce crypto_shash_tfm_digest() and use it wherever possible.
- Fix use-after-free and race in crypto_spawn_alg.
- Add support for parallel and batch requests to crypto_engine.
Algorithms:
- Update jitter RNG for SP800-90B compliance.
- Always use jitter RNG as seed in drbg.
Drivers:
- Add Arm CryptoCell driver cctrng.
- Add support for SEV-ES to the PSP driver in ccp"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (114 commits)
crypto: hisilicon - fix driver compatibility issue with different versions of devices
crypto: engine - do not requeue in case of fatal error
crypto: cavium/nitrox - Fix a typo in a comment
crypto: hisilicon/qm - change debugfs file name from qm_regs to regs
crypto: hisilicon/qm - add DebugFS for xQC and xQE dump
crypto: hisilicon/zip - add debugfs for Hisilicon ZIP
crypto: hisilicon/hpre - add debugfs for Hisilicon HPRE
crypto: hisilicon/sec2 - add debugfs for Hisilicon SEC
crypto: hisilicon/qm - add debugfs to the QM state machine
crypto: hisilicon/qm - add debugfs for QM
crypto: stm32/crc32 - protect from concurrent accesses
crypto: stm32/crc32 - don't sleep in runtime pm
crypto: stm32/crc32 - fix multi-instance
crypto: stm32/crc32 - fix run-time self test issue.
crypto: stm32/crc32 - fix ext4 chksum BUG_ON()
crypto: hisilicon/zip - Use temporary sqe when doing work
crypto: hisilicon - add device error report through abnormal irq
crypto: hisilicon - remove codes of directly report device errors through MSI
crypto: hisilicon - QM memory management optimization
crypto: hisilicon - unify initial value assignment into QM
...
A node that has the MRA role, it can behave as MRM or MRC.
Initially it starts as MRM and sends MRP_Test frames on both ring ports.
If it detects that there are MRP_Test send by another MRM, then it
checks if these frames have a lower priority than itself. In this case
it would send MRP_Nack frames to notify the other node that it needs to
stop sending MRP_Test frames.
If it receives a MRP_Nack frame then it stops sending MRP_Test frames
and starts to behave as a MRC but it would continue to monitor the
MRP_Test frames send by MRM. If at a point the MRM stops to send
MRP_Test frames it would get the MRM role and start to send MRP_Test
frames.
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Each MRP instance has a priority, a lower value means a higher priority.
The priority of MRP instance is stored in MRP_Test frame in this way
all the MRP nodes in the ring can see other nodes priority.
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace u16/u32 with be16/be32 in the MRP frame types.
This fixes sparse warnings like:
warning: cast to restricted __be16
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This type is used for traps that trap control packets such as ARP
request and IGMP query to the CPU.
Do not report such packets to the kernel's drop monitor as they were not
dropped by the device no encountered an exception during forwarding.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The action is used by control traps such as IGMP query. The packet is
flooded by the device, but also trapped to the CPU in order for the
software bridge to mark the receiving port as a multicast router port.
Such packets are marked with 'skb->offload_fwd_mark = 1' in order to
prevent the software bridge from flooding them again.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
The following patchset contains Netfilter updates for net-next
to extend ctnetlink and the flowtable infrastructure:
1) Extend ctnetlink kernel side netlink dump filtering capabilities,
from Romain Bellan.
2) Generalise the flowtable hook parser to take a hook list.
3) Pass a hook list to the flowtable hook registration/unregistration.
4) Add a helper function to release the flowtable hook list.
5) Update the flowtable event notifier to pass a flowtable hook list.
6) Allow users to add new devices to an existing flowtables.
7) Allow users to remove devices to an existing flowtables.
8) Allow for registering a flowtable with no initial devices.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for Hyper-V synthetic debugger (syndbg) interface.
The syndbg interface is using MSRs to emulate a way to send/recv packets
data.
The debug transport dll (kdvm/kdnet) will identify if Hyper-V is enabled
and if it supports the synthetic debugger interface it will attempt to
use it, instead of trying to initialize a network adapter.
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Jon Doron <arilou@gmail.com>
Message-Id: <20200529134543.1127440-4-arilou@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The problem the patch is trying to address is the fact that 'struct
kvm_hyperv_exit' has different layout on when compiling in 32 and 64 bit
modes.
In 64-bit mode the default alignment boundary is 64 bits thus
forcing extra gaps after 'type' and 'msr' but in 32-bit mode the
boundary is at 32 bits thus no extra gaps.
This is an issue as even when the kernel is 64 bit, the userspace using
the interface can be both 32 and 64 bit but the same 32 bit userspace has
to work with 32 bit kernel.
The issue is fixed by forcing the 64 bit layout, this leads to ABI
change for 32 bit builds and while we are obviously breaking '32 bit
userspace with 32 bit kernel' case, we're fixing the '32 bit userspace
with 64 bit kernel' one.
As the interface has no (known) users and 32 bit KVM is rather baroque
nowadays, this seems like a reasonable decision.
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Jon Doron <arilou@gmail.com>
Message-Id: <20200424113746.3473563-2-arilou@gmail.com>
Reviewed-by: Roman Kagan <rvkagan@yandex-team.ru>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Introduce new capability to indicate that KVM supports interrupt based
delivery of 'page ready' APF events. This includes support for both
MSR_KVM_ASYNC_PF_INT and MSR_KVM_ASYNC_PF_ACK.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20200525144125.143875-8-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
xdp_umem.c had overlapping changes between the 64-bit math fix
for the calculation of npgs and the removal of the zerocopy
memory type which got rid of the chunk_size_nohdr member.
The mlx5 Kconfig conflict is a case where we just take the
net-next copy of the Kconfig entry dependency as it takes on
the ESWITCH dependency by one level of indirection which is
what the 'net' conflicting change is trying to ensure.
Signed-off-by: David S. Miller <davem@davemloft.net>
* many 6 GHz changes, though it's not _quite_ complete
(I left out scanning for now, we're still discussing)
* allow userspace SA-query processing for operating channel
validation
* TX status for control port TX, for AP-side operation
* more per-STA/TID control options
* move to kHz for channels, for future S1G operation
* various other small changes
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEH1e1rEeCd0AIMq6MB8qZga/fl8QFAl7TfecACgkQB8qZga/f
l8Tx6hAAgRizfdHb9xxp001AAzKnsdU46srOKOhwV2d6w+S+qHbLtwa0Xz43pBvX
LxpQs7dBQBLYh11xJhDlKY6duYV989xGcsHm7suO43jbjDo8KXfz4MaP65em6EKt
pdD0mD1sKkfR4FhYNbUEe8Ug/185jdk+gX+aI1Nrz6XlkUoiY+czSnGFyAvpvau2
I+NGqyKG5D6ureq7p7dQcgN+t2D4Ou9stVhpQ+jP0Ep720gvfTEzeFuMJbb3JZ1y
KSgOOWS1HQj1FdlJDs3KAmgUXpkU/lxZhNxl06MMYo3tB7Y0vmLoy/ZNcb5eW4Sw
a0SHgG5yhDysCyINz6q7llG3esDcppGiNuMjd/qR2qPOZPHNtlYaHtcoKBcKdS0k
03DyURZpA0B33cr9FTV8tXaM7IMY/2qaq/DqkeNtuDzGdh4jEwkVJ4fNtUAdgcOv
4JEz3A7fY3isy8tzi7Dom4U/2hR1di5gZloAC5PPYRvnbmY9HoIqG06k1Wtn1Yj4
pbquqvdJ5ONcaAaXz7zVQUZm1JzrK81Pl3pdih7USasc8z2MEzWQPSR+hxtwG5TY
KbDI1Nel8ZLbL2MWDakh3+lPoJAMuyadRlVVWEMj4l/afYHgcy5hEbaMbaZnxmAg
G4I6R5JZTJZuVdKi/U/Q9n7jR83qfIRNbxMLY8HFZ4caJ5qhZGs=
=wdaG
-----END PGP SIGNATURE-----
Merge tag 'mac80211-next-for-davem-2020-05-31' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
Johannes Berg says:
====================
Another set of changes, including
* many 6 GHz changes, though it's not _quite_ complete
(I left out scanning for now, we're still discussing)
* allow userspace SA-query processing for operating channel
validation
* TX status for control port TX, for AP-side operation
* more per-STA/TID control options
* move to kHz for channels, for future S1G operation
* various other small changes
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
With some newer AKMs, the KCK and KEK are bigger, so allow that
if the driver advertises support for it. In addition, add a new
attribute for the AKM so we can use it for offloaded rekeying.
Signed-off-by: Nathan Errera <nathan.errera@intel.com>
[reword commit message]
Link: https://lore.kernel.org/r/20200528212237.5eb58b00a5d1.I61b09d77c4f382e8d58a05dcca78096e99a6bc15@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
These capabilities cover what would otherwise be transported
in HT/VHT capabilities, but only a subset thereof that is
actually needed on 6 GHz with HE already present. Expose the
capabilities to userspace, drivers are expected to set them
as using the 6 GHz band (currently) requires HE capability.
Link: https://lore.kernel.org/r/20200528213443.244cd5cb9db8.Icd8c773277a88c837e7e3af1d4d1013cc3b66543@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Steffen Klassert says:
====================
pull request (net): ipsec 2020-05-29
1) Several fixes for ESP gro/gso in transport and beet mode when
IPv6 extension headers are present. From Xin Long.
2) Fix a wrong comment on XFRMA_OFFLOAD_DEV.
From Antony Antony.
3) Fix sk_destruct callback handling on ESP in TCP encapsulation.
From Sabrina Dubroca.
4) Fix a use after free in xfrm_output_gso when used with vxlan.
From Xin Long.
5) Fix secpath handling of VTI when used wiuth IPCOMP.
From Xin Long.
6) Fix an oops when deleting a x-netns xfrm interface.
From Nicolas Dichtel.
7) Fix a possible warning on policy updates. We had a case where it was
possible to add two policies with the same lookup keys.
From Xin Long.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a flag ([EXT4|FS]_DAX_FL) to preserve FS_XFLAG_DAX in the ext4
inode.
Set the flag to be user visible and changeable. Set the flag to be
inherited. Allow applications to change the flag at any time except if
it conflicts with the set of mutually exclusive flags (Currently VERITY,
ENCRYPT, JOURNAL_DATA).
Furthermore, restrict setting any of the exclusive flags if DAX is set.
While conceptually possible, we do not allow setting EXT4_DAX_FL while
at the same time clearing exclusion flags (or vice versa) for 2 reasons:
1) The DAX flag does not take effect immediately which
introduces quite a bit of complexity
2) There is no clear use case for being this flexible
Finally, on regular files, flag the inode to not be cached to facilitate
changing S_DAX on the next creation of the inode.
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/20200528150003.828793-9-ira.weiny@intel.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Added migration capability in IOMMU info chain.
User application should check IOMMU info chain for migration capability
to use dirty page tracking feature provided by kernel module.
User application must check page sizes supported and maximum dirty
bitmap size returned by this capability structure for ioctls used to get
dirty bitmap.
Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
DMA mapped pages, including those pinned by mdev vendor drivers, might
get unpinned and unmapped while migration is active and device is still
running. For example, in pre-copy phase while guest driver could access
those pages, host device or vendor driver can dirty these mapped pages.
Such pages should be marked dirty so as to maintain memory consistency
for a user making use of dirty page tracking.
To get bitmap during unmap, user should allocate memory for bitmap, set
it all zeros, set size of allocated memory, set page size to be
considered for bitmap and set flag VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP.
Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
Reviewed-by: Neo Jia <cjia@nvidia.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
IOMMU container maintains a list of all pages pinned by vfio_pin_pages API.
All pages pinned by vendor driver through this API should be considered as
dirty during migration. When container consists of IOMMU capable device and
all pages are pinned and mapped, then all pages are marked dirty.
Added support to start/stop dirtied pages tracking and to get bitmap of all
dirtied pages for requested IO virtual address range.
Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
Reviewed-by: Neo Jia <cjia@nvidia.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
- Defined MIGRATION region type and sub-type.
- Defined vfio_device_migration_info structure which will be placed at the
0th offset of migration region to get/set VFIO device related
information. Defined members of structure and usage on read/write access.
- Defined device states and state transition details.
- Defined sequence to be followed while saving and resuming VFIO device.
Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
Reviewed-by: Neo Jia <cjia@nvidia.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
The definitions of MMC_IOC_CMD and of MMC_IOC_MULTI_CMD rely on
MMC_BLOCK_MAJOR:
#define MMC_IOC_CMD _IOWR(MMC_BLOCK_MAJOR, 0, struct mmc_ioc_cmd)
#define MMC_IOC_MULTI_CMD _IOWR(MMC_BLOCK_MAJOR, 1, struct mmc_ioc_multi_cmd)
However, MMC_BLOCK_MAJOR is defined in linux/major.h and
linux/mmc/ioctl.h did not include it.
Signed-off-by: Jérôme Pouiller <jerome.pouiller@silabs.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20200511161902.191405-1-Jerome.Pouiller@silabs.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Conntrack dump does not support kernel side filtering (only get exists,
but it returns only one entry. And user has to give a full valid tuple)
It means that userspace has to implement filtering after receiving many
irrelevant entries, consuming resources (conntrack table is sometimes
very huge, much more than a routing table for example).
This patch adds filtering in kernel side. To achieve this goal, we:
* Add a new CTA_FILTER netlink attributes, actually a flag list to
parametize filtering
* Convert some *nlattr_to_tuple() functions, to allow a partial parsing
of CTA_TUPLE_ORIG and CTA_TUPLE_REPLY (so nf_conntrack_tuple it not
fully set)
Filtering is now possible on:
* IP SRC/DST values
* Ports for TCP and UDP flows
* IMCP(v6) codes types and IDs
Filtering is done as an "AND" operator. For example, when flags
PROTO_SRC_PORT, PROTO_NUM and IP_SRC are sets, only entries matching all
values are dumped.
Changes since v1:
Set NLM_F_DUMP_FILTERED in nlm flags if entries are filtered
Changes since v2:
Move several constants to nf_internals.h
Move a fix on netlink values check in a separate patch
Add a check on not-supported flags
Return EOPNOTSUPP if CDA_FILTER is set in ctnetlink_flush_conntrack
(not yet implemented)
Code style issues
Changes since v3:
Fix compilation warning reported by kbuild test robot
Changes since v4:
Fix a regression introduced in v3 (returned EINVAL for valid netlink
messages without CTA_MARK)
Changes since v5:
Change definition of CTA_FILTER_F_ALL
Fix a regression when CTA_TUPLE_ZONE is not set
Signed-off-by: Romain Bellan <romain.bellan@wifirst.fr>
Signed-off-by: Florent Fourcot <florent.fourcot@wifirst.fr>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch reworks the MRP netlink interface. Before, each attribute
represented a binary structure which made it hard to be extended.
Therefore update the MRP netlink interface such that each existing
attribute to be a nested attribute which contains the fields of the
binary structures.
In this way the MRP netlink interface can be extended without breaking
the backwards compatibility. It is also using strict checking for
attributes under the MRP top attribute.
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Close the hole of holding a mapping over kernel driver takeover event of
a given address range.
Commit 90a545e981 ("restrict /dev/mem to idle io memory ranges")
introduced CONFIG_IO_STRICT_DEVMEM with the goal of protecting the
kernel against scenarios where a /dev/mem user tramples memory that a
kernel driver owns. However, this protection only prevents *new* read(),
write() and mmap() requests. Established mappings prior to the driver
calling request_mem_region() are left alone.
Especially with persistent memory, and the core kernel metadata that is
stored there, there are plentiful scenarios for a /dev/mem user to
violate the expectations of the driver and cause amplified damage.
Teach request_mem_region() to find and shoot down active /dev/mem
mappings that it believes it has successfully claimed for the exclusive
use of the driver. Effectively a driver call to request_mem_region()
becomes a hole-punch on the /dev/mem device.
The typical usage of unmap_mapping_range() is part of
truncate_pagecache() to punch a hole in a file, but in this case the
implementation is only doing the "first half" of a hole punch. Namely it
is just evacuating current established mappings of the "hole", and it
relies on the fact that /dev/mem establishes mappings in terms of
absolute physical address offsets. Once existing mmap users are
invalidated they can attempt to re-establish the mapping, or attempt to
continue issuing read(2) / write(2) to the invalidated extent, but they
will then be subject to the CONFIG_IO_STRICT_DEVMEM checking that can
block those subsequent accesses.
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fixes: 90a545e981 ("restrict /dev/mem to idle io memory ranges")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/159009507306.847224.8502634072429766747.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When compiling inside the kernel include linux/stddef.h instead of
stddef.h. When I compile this header file in backports for power PC I
run into a conflict with ptrdiff_t. I was unable to reproduce this in
mainline kernel. I still would like to fix this problem in the kernel.
Fixes: 6989310f5d ("wireless: Use offsetof instead of custom macro.")
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
Link: https://lore.kernel.org/r/20200521201422.16493-1-hauke@hauke-m.de
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This patch adds support to configure per TID Tx Rate configuration
through NL80211_TID_CONFIG_ATTR_TX_RATE* attributes. And it uses
nl80211_parse_tx_bitrate_mask api to validate the Tx rate mask.
Signed-off-by: Tamizh Chelvam <tamizhr@codeaurora.org>
Link: https://lore.kernel.org/r/1589357504-10175-1-git-send-email-tamizhr@codeaurora.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This adds the necessary capabilities in nl80211 to allow drivers to
assign a cookie to control port TX frames (returned via extack in
the netlink ACK message of the command) and then later report the
frame's status.
Signed-off-by: Markus Theil <markus.theil@tu-ilmenau.de>
Link: https://lore.kernel.org/r/20200508144202.7678-2-markus.theil@tu-ilmenau.de
[use extack cookie instead of explicit message, recombine patches]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
If the driver advertises NL80211_EXT_FEATURE_SCAN_FREQ_KHZ
userspace can omit NL80211_ATTR_SCAN_FREQUENCIES in favor
of an NL80211_ATTR_SCAN_FREQ_KHZ. To get scan results in
KHz userspace must also set the
NL80211_SCAN_FLAG_FREQ_KHZ.
This lets nl80211 remain compatible with older userspaces
while not requring and sending redundant (and potentially
incorrect) scan frequency sets.
Signed-off-by: Thomas Pedersen <thomas@adapt-ip.com>
Link: https://lore.kernel.org/r/20200430172554.18383-4-thomas@adapt-ip.com
[use just nla_nest_start() (not _noflag) for NL80211_ATTR_SCAN_FREQ_KHZ]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
cfg80211 recently gained the ability to understand a
frequency offset component in KHz. Expose this in nl80211
through the new attributes NL80211_ATTR_WIPHY_FREQ_OFFSET,
NL80211_FREQUENCY_ATTR_OFFSET,
NL80211_ATTR_CENTER_FREQ1_OFFSET, and
NL80211_BSS_FREQUENCY_OFFSET.
These add support to send and receive a KHz offset
component with the following NL80211 commands:
- NL80211_CMD_FRAME
- NL80211_CMD_GET_SCAN
- NL80211_CMD_AUTHENTICATE
- NL80211_CMD_ASSOCIATE
- NL80211_CMD_CONNECT
Along with any other command which takes a chandef, ie:
- NL80211_CMD_SET_CHANNEL
- NL80211_CMD_SET_WIPHY
- NL80211_CMD_START_AP
- NL80211_CMD_RADAR_DETECT
- NL80211_CMD_NOTIFY_RADAR
- NL80211_CMD_CHANNEL_SWITCH
- NL80211_JOIN_IBSS
- NL80211_CMD_REMAIN_ON_CHANNEL
- NL80211_CMD_JOIN_OCB
- NL80211_CMD_JOIN_MESH
- NL80211_CMD_TDLS_CHANNEL_SWITCH
If the driver advertises a band containing channels with
frequency offset, it must also verify support for
frequency offset channels in its cfg80211 ops, or return
an error.
Signed-off-by: Thomas Pedersen <thomas@adapt-ip.com>
Link: https://lore.kernel.org/r/20200430172554.18383-3-thomas@adapt-ip.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Current rule for applying TID configuration for specific peer looks overly
complicated. No need to reject new TID configuration when override flag is
specified. Another call with the same TID configuration, but without
override flag, allows to apply new configuration anyway.
Use the same approach as for the 'all peers' case: if override flag is
specified, then reset existing TID configuration and immediately
apply a new one.
Signed-off-by: Sergey Matyukevich <sergey.matyukevich.os@quantenna.com>
Link: https://lore.kernel.org/r/20200424112905.26770-5-sergey.matyukevich.os@quantenna.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Allow the user to configure where on the cable the TDR data should be
retrieved, in terms of first and last sample, and the step between
samples. Also add the ability to ask for TDR data for just one pair.
If this configuration is not provided, it defaults to 1-150m at 1m
intervals for all pairs.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
v3:
Move the TDR configuration into a structure
Add a range check on step
Use NL_SET_ERR_MSG_ATTR() when appropriate
Move TDR configuration into a nest
Document attributes in the request
Signed-off-by: David S. Miller <davem@davemloft.net>
Some Ethernet PHYs can return the raw time domain reflectromatry data.
Add the attributes to allow this data to be requested and returned via
netlink ethtool.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
v2:
m -> cm
Report what the PHY actually used for start/stop/step.
Signed-off-by: David S. Miller <davem@davemloft.net>
* hwsim improvements from Jouni and myself, to be able to
test more scenarios easily
* some more HE (802.11ax) support
* some initial S1G (sub 1 GHz) work for fractional MHz channels
* some (action) frame registration updates to help DPP support
* along with other various improvements/fixes
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEH1e1rEeCd0AIMq6MB8qZga/fl8QFAl7L1A8ACgkQB8qZga/f
l8T5RQ/9EUxFqBs0cWojwyad5nkesyl51eOnbvSCJJF14W93s2oMeikCynTPe8Vg
km36041QZqGbwmU0yWC9Lmm4y3ja5qQGI+QW+vT6tutGQx6FgK5TzUfYXqiFZqf6
asqkvHpH4VqmbG1KEp0PZjIpW/OVK96pbvtXVnkrcMmjl2JjbRtAhyZQVNtt9ufJ
6wqKf8e6iYqMIInMFPLX+rl7UEknxDKVcqPbMMJmY8/iM1z9Elkg3rkRSMehC+mE
8cznZ6BsjAGCbMiA8K9fUo15lcMfZCJ1hAPzkD4TsJtMEJ0gYDo5jDB8TIpr5uoL
95OnlF8jokJIsO+1g4CyaNSQsmFIuDo84vW8LtGRu9qzTP0UwelxhjZLgE3xlP6b
W+z5HomxfWkYhJhaNywLP3B1VPtJwX8dL/wpECOWHzNKXG7Rb6GqzUwaCRFb6Jjo
TmFJ5wLoEZHhsXYO2dvcyTzCUCXviXvfq60a56IyCJN8wDqmcubePv0+NOHUmj3c
E71NTYymM3j9agdSpXdCFLBXA1OgyIydeSNHuBlaPA4sK6tr4ikUtbOrABjYTaQz
2BB5fHEi8gs4EiHbSXqLFBot3JHljKJPsSN0wAgzQffN+6Kts9FG6HcrLsL+duDg
lRdAzRrunE85S0QhsxeVIX216rX4W08sl0B1rJR+dTMX9ByblAk=
=MVBJ
-----END PGP SIGNATURE-----
Merge tag 'mac80211-next-for-net-next-2020-04-25' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
Johannes Berg says:
====================
One batch of changes, containing:
* hwsim improvements from Jouni and myself, to be able to
test more scenarios easily
* some more HE (802.11ax) support
* some initial S1G (sub 1 GHz) work for fractional MHz channels
* some (action) frame registration updates to help DPP support
* along with other various improvements/fixes
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
With struct flow_dissector_key_mpls now recording the first
FLOW_DIS_MPLS_MAX labels, we can extend Flower to filter on any of
these LSEs independently.
In order to avoid creating new netlink attributes for every possible
depth, let's define a new TCA_FLOWER_KEY_MPLS_OPTS nested attribute
that contains the list of LSEs to match. Each LSE is represented by
another attribute, TCA_FLOWER_KEY_MPLS_OPTS_LSE, which then contains
the attributes representing the depth and the MPLS fields to match at
this depth (label, TTL, etc.).
For each MPLS field, the mask is always set to all-ones, as this is
what the original API did. We could allow user configurable masks in
the future if there is demand for more flexibility.
The new API also allows to only specify an LSE depth. In that case,
Flower only verifies that the MPLS label stack depth is greater or
equal to the provided depth (that is, an LSE exists at this depth).
Filters that only match on one (or more) fields of the first LSE are
dumped using the old netlink attributes, to avoid confusing user space
programs that don't understand the new API.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- Reserve GlobalPlatform implementation defined logon method range
- Add support to register kernel memory with TEE to allow TEE bus drivers
to register memory references.
-----BEGIN PGP SIGNATURE-----
iQJOBAABCgA4FiEEFV+gSSXZJY9ZyuB5LinzTIcAHJcFAl6wVgYaHGplbnMud2lr
bGFuZGVyQGxpbmFyby5vcmcACgkQLinzTIcAHJcPLQ//TZixfZPBBjrjbKVonYIq
eghXjOIAi9C9Se38zf5wOan+6DGrmtIzHwqKrtUH0tgP94RGTGqIIUgXt+FOZDsr
cGhsF6EG0G1O1TA+g61WWgszdLec+WLTEq5jG0gUrAuXsKNQYw478pCcCLzVq2AP
RJ3Gv0GqVWdXHRSFN0TMWJ88Xtp/GTKNVt7KHVxgQAPMML5TO7SM8/XBbS+zexXz
UaKiSg76V72NX83VUN5a6DRPOH4fg1SoWc1F4j/qqoh8dqM4v2w3Wq9qP0XpbdQo
5dc/fPBMMotBOZrZrbXfPo8Q65HKGQ/GALwbv7054xNwNdAFryVs5PkrtU5prL+3
xgelsO7K3Yo4m/d1frJEVdIEVzKbAmrjXGzB9eV9EMgW6EqlapWx7omnebe+e8BG
lenpjmh8zzssAsQrRHWvDclEcRH9/LDbzCBJahqYOa+iSSH4/bJiuk6Bt4auhmD9
0FoUNAPTxtDfl1g7nlxB+ZkPmoYzMzjx78kqgCeU/A98uGful0rBPJ6NQILWtAph
mIQOaPvAAeG70qWeln/wci87xIRVeqbPfhtUdX8tugj+fGHD1k3+HT7yABSjZX8o
4K+G2Vt1+Ip/wQBW4zELSsG33UzAqwDktxZr4fGCQ58cgoHSDrpqeE+9p3NHYgb+
191kv4Dk9+JMYsQKOQDLwKA=
=K162
-----END PGP SIGNATURE-----
Merge tag 'tee-subsys-for-5.8' of git://git.linaro.org/people/jens.wiklander/linux-tee into arm/drivers
TEE subsystem work
- Reserve GlobalPlatform implementation defined logon method range
- Add support to register kernel memory with TEE to allow TEE bus drivers
to register memory references.
* tag 'tee-subsys-for-5.8' of git://git.linaro.org/people/jens.wiklander/linux-tee:
tee: add private login method for kernel clients
tee: enable support to register kernel memory
Link: https://lore.kernel.org/r/20200504181049.GA10860@jade
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
The extent references v0 have been superseded long time go, there are
some unused declarations of access helpers. We can safely remove them
now. The struct btrfs_extent_ref_v0 is not used anywhere, but struct
btrfs_extent_item_v0 is still part of a backward compatibility check in
relocation.c and thus not removed.
Signed-off-by: David Sterba <dsterba@suse.com>
The MSCC bug fix in 'net' had to be slightly adjusted because the
register accesses are done slightly differently in net-next.
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann says:
====================
pull-request: bpf-next 2020-05-23
The following pull-request contains BPF updates for your *net-next* tree.
We've added 50 non-merge commits during the last 8 day(s) which contain
a total of 109 files changed, 2776 insertions(+), 2887 deletions(-).
The main changes are:
1) Add a new AF_XDP buffer allocation API to the core in order to help
lowering the bar for drivers adopting AF_XDP support. i40e, ice, ixgbe
as well as mlx5 have been moved over to the new API and also gained a
small improvement in performance, from Björn Töpel and Magnus Karlsson.
2) Add getpeername()/getsockname() attach types for BPF sock_addr programs
in order to allow for e.g. reverse translation of load-balancer backend
to service address/port tuple from a connected peer, from Daniel Borkmann.
3) Improve the BPF verifier is_branch_taken() logic to evaluate pointers
being non-NULL, e.g. if after an initial test another non-NULL test on
that pointer follows in a given path, then it can be pruned right away,
from John Fastabend.
4) Larger rework of BPF sockmap selftests to make output easier to understand
and to reduce overall runtime as well as adding new BPF kTLS selftests
that run in combination with sockmap, also from John Fastabend.
5) Batch of misc updates to BPF selftests including fixing up test_align
to match verifier output again and moving it under test_progs, allowing
bpf_iter selftest to compile on machines with older vmlinux.h, and
updating config options for lirc and v6 segment routing helpers, from
Stanislav Fomichev, Andrii Nakryiko and Alan Maguire.
6) Conversion of BPF tracing samples outdated internal BPF loader to use
libbpf API instead, from Daniel T. Lee.
7) Follow-up to BPF kernel test infrastructure in order to fix a flake in
the XDP selftests, from Jesper Dangaard Brouer.
8) Minor improvements to libbpf's internal hashmap implementation, from
Ian Rogers.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Todays vxlan mac fdb entries can point to multiple remote
ips (rdsts) with the sole purpose of replicating
broadcast-multicast and unknown unicast packets to those remote ips.
E-VPN multihoming [1,2,3] requires bridged vxlan traffic to be
load balanced to remote switches (vteps) belonging to the
same multi-homed ethernet segment (E-VPN multihoming is analogous
to multi-homed LAG implementations, but with the inter-switch
peerlink replaced with a vxlan tunnel). In other words it needs
support for mac ecmp. Furthermore, for faster convergence, E-VPN
multihoming needs the ability to update fdb ecmp nexthops independent
of the fdb entries.
New route nexthop API is perfect for this usecase.
This patch extends the vxlan fdb code to take a nexthop id
pointing to an ecmp nexthop group.
Changes include:
- New NDA_NH_ID attribute for fdbs
- Use the newly added fdb nexthop groups
- makes vxlan rdsts and nexthop handling code mutually
exclusive
- since this is a new use-case and the requirement is for ecmp
nexthop groups, the fdb add and update path checks that the
nexthop is really an ecmp nexthop group. This check can be relaxed
in the future, if we want to introduce replication fdb nexthop groups
and allow its use in lieu of current rdst lists.
- fdb update requests with nexthop id's only allowed for existing
fdb's that have nexthop id's
- learning will not override an existing fdb entry with nexthop
group
- I have wrapped the switchdev offload code around the presence of
rdst
[1] E-VPN RFC https://tools.ietf.org/html/rfc7432
[2] E-VPN with vxlan https://tools.ietf.org/html/rfc8365
[3] http://vger.kernel.org/lpc_net2018_talks/scaling_bridge_fdb_database_slidesV3.pdf
Includes a null check fix in vxlan_xmit from Nikolay
v2 - Fixed build issue:
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch introduces ecmp nexthops and nexthop groups
for mac fdb entries. In subsequent patches this is used
by the vxlan driver fdb entries. The use case is
E-VPN multihoming [1,2,3] which requires bridged vxlan traffic
to be load balanced to remote switches (vteps) belonging to
the same multi-homed ethernet segment (This is analogous to
a multi-homed LAG but over vxlan).
Changes include new nexthop flag NHA_FDB for nexthops
referenced by fdb entries. These nexthops only have ip.
This patch includes appropriate checks to avoid routes
referencing such nexthops.
example:
$ip nexthop add id 12 via 172.16.1.2 fdb
$ip nexthop add id 13 via 172.16.1.3 fdb
$ip nexthop add id 102 group 12/13 fdb
$bridge fdb add 02:02:00:00:00:13 dev vxlan1000 nhid 101 self
[1] E-VPN https://tools.ietf.org/html/rfc7432
[2] E-VPN VxLAN: https://tools.ietf.org/html/rfc8365
[3] LPC talk with mention of nexthop groups for L2 ecmp
http://vger.kernel.org/lpc_net2018_talks/scaling_bridge_fdb_database_slidesV3.pdf
v4 - fixed uninitialized variable reported by kernel test robot
Reported-by: kernel test robot <rong.a.chen@intel.com>
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, psample can only send the packet bits after decapsulation.
The tunnel information is lost. Add the tunnel support.
If the sampled packet has no tunnel info, the behavior is the same as
before. If it has, add a nested metadata field named PSAMPLE_ATTR_TUNNEL
and include the tunnel subfields if applicable.
Increase the metadata length for sampled packet with the tunnel info.
If new subfields of tunnel info should be included, update the metadata
length accordingly.
Signed-off-by: Chris Mi <chrism@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This allows userspace to completely setup a loop device with a single
ioctl, removing the in-between state where the device can be partially
configured - eg the loop device has a backing file associated with it,
but is reading from the wrong offset.
Besides removing the intermediate state, another big benefit of this
ioctl is that LOOP_SET_STATUS can be slow; the main reason for this
slowness is that LOOP_SET_STATUS(64) calls blk_mq_freeze_queue() to
freeze the associated queue; this requires waiting for RCU
synchronization, which I've measured can take about 15-20ms on this
device on average.
In addition to doing what LOOP_SET_STATUS can do, LOOP_CONFIGURE can
also be used to:
- Set the correct block size immediately by setting
loop_config.block_size (avoids LOOP_SET_BLOCK_SIZE)
- Explicitly request direct I/O mode by setting LO_FLAGS_DIRECT_IO
in loop_config.info.lo_flags (avoids LOOP_SET_DIRECT_IO)
- Explicitly request read-only mode by setting LO_FLAGS_READ_ONLY
in loop_config.info.lo_flags
Here's setting up ~70 regular loop devices with an offset on an x86
Android device, using LOOP_SET_FD and LOOP_SET_STATUS:
vsoc_x86:/system/apex # time for i in `seq 30 100`;
do losetup -r -o 4096 /dev/block/loop$i com.android.adbd.apex; done
0m03.40s real 0m00.02s user 0m00.03s system
Here's configuring ~70 devices in the same way, but using a modified
losetup that uses the new LOOP_CONFIGURE ioctl:
vsoc_x86:/system/apex # time for i in `seq 30 100`;
do losetup -r -o 4096 /dev/block/loop$i com.android.adbd.apex; done
0m01.94s real 0m00.01s user 0m00.01s system
Signed-off-by: Martijn Coenen <maco@android.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
LOOP_SET_STATUS(64) will actually allow some lo_flags to be modified; in
particular, LO_FLAGS_AUTOCLEAR can be set and cleared, whereas
LO_FLAGS_PARTSCAN can be set to request a partition scan. Make this
explicit by updating the UAPI to include the flags that can be
set/cleared using this ioctl.
The implementation can then blindly take over the passed in flags,
and use the previous flags for those flags that can't be set / cleared
using LOOP_SET_STATUS.
Signed-off-by: Martijn Coenen <maco@android.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
As stated in 983695fa67 ("bpf: fix unconnected udp hooks"), the objective
for the existing cgroup connect/sendmsg/recvmsg/bind BPF hooks is to be
transparent to applications. In Cilium we make use of these hooks [0] in
order to enable E-W load balancing for existing Kubernetes service types
for all Cilium managed nodes in the cluster. Those backends can be local
or remote. The main advantage of this approach is that it operates as close
as possible to the socket, and therefore allows to avoid packet-based NAT
given in connect/sendmsg/recvmsg hooks we only need to xlate sock addresses.
This also allows to expose NodePort services on loopback addresses in the
host namespace, for example. As another advantage, this also efficiently
blocks bind requests for applications in the host namespace for exposed
ports. However, one missing item is that we also need to perform reverse
xlation for inet{,6}_getname() hooks such that we can return the service
IP/port tuple back to the application instead of the remote peer address.
The vast majority of applications does not bother about getpeername(), but
in a few occasions we've seen breakage when validating the peer's address
since it returns unexpectedly the backend tuple instead of the service one.
Therefore, this trivial patch allows to customise and adds a getpeername()
as well as getsockname() BPF cgroup hook for both IPv4 and IPv6 in order
to address this situation.
Simple example:
# ./cilium/cilium service list
ID Frontend Service Type Backend
1 1.2.3.4:80 ClusterIP 1 => 10.0.0.10:80
Before; curl's verbose output example, no getpeername() reverse xlation:
# curl --verbose 1.2.3.4
* Rebuilt URL to: 1.2.3.4/
* Trying 1.2.3.4...
* TCP_NODELAY set
* Connected to 1.2.3.4 (10.0.0.10) port 80 (#0)
> GET / HTTP/1.1
> Host: 1.2.3.4
> User-Agent: curl/7.58.0
> Accept: */*
[...]
After; with getpeername() reverse xlation:
# curl --verbose 1.2.3.4
* Rebuilt URL to: 1.2.3.4/
* Trying 1.2.3.4...
* TCP_NODELAY set
* Connected to 1.2.3.4 (1.2.3.4) port 80 (#0)
> GET / HTTP/1.1
> Host: 1.2.3.4
> User-Agent: curl/7.58.0
> Accept: */*
[...]
Originally, I had both under a BPF_CGROUP_INET{4,6}_GETNAME type and exposed
peer to the context similar as in inet{,6}_getname() fashion, but API-wise
this is suboptimal as it always enforces programs having to test for ctx->peer
which can easily be missed, hence BPF_CGROUP_INET{4,6}_GET{PEER,SOCK}NAME split.
Similarly, the checked return code is on tnum_range(1, 1), but if a use case
comes up in future, it can easily be changed to return an error code instead.
Helper and ctx member access is the same as with connect/sendmsg/etc hooks.
[0] https://github.com/cilium/cilium/blob/master/bpf/bpf_sock.c
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Andrey Ignatov <rdna@fb.com>
Link: https://lore.kernel.org/bpf/61a479d759b2482ae3efb45546490bacd796a220.1589841594.git.daniel@iogearbox.net
The eMMC inline crypto standard will only specify 32 DUN bits (a.k.a. IV
bits), unlike UFS's 64. IV_INO_LBLK_64 is therefore not applicable, but
an encryption format which uses one key per policy and permits the
moving of encrypted file contents (as f2fs's garbage collector requires)
is still desirable.
To support such hardware, add a new encryption format IV_INO_LBLK_32
that makes the best use of the 32 bits: the IV is set to
'SipHash-2-4(inode_number) + file_logical_block_number mod 2^32', where
the SipHash key is derived from the fscrypt master key. We hash only
the inode number and not also the block number, because we need to
maintain contiguity of DUNs to merge bios.
Unlike with IV_INO_LBLK_64, with this format IV reuse is possible; this
is unavoidable given the size of the DUN. This means this format should
only be used where the requirements of the first paragraph apply.
However, the hash spreads out the IVs in the whole usable range, and the
use of a keyed hash makes it difficult for an attacker to determine
which files use which IVs.
Besides the above differences, this flag works like IV_INO_LBLK_64 in
that on ext4 it is only allowed if the stable_inodes feature has been
enabled to prevent inode numbers and the filesystem UUID from changing.
Link: https://lore.kernel.org/r/20200515204141.251098-1-ebiggers@kernel.org
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Paul Crowley <paulcrowley@google.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Add a key/keyring change notification facility whereby notifications about
changes in key and keyring content and attributes can be received.
Firstly, an event queue needs to be created:
pipe2(fds, O_NOTIFICATION_PIPE);
ioctl(fds[1], IOC_WATCH_QUEUE_SET_SIZE, 256);
then a notification can be set up to report notifications via that queue:
struct watch_notification_filter filter = {
.nr_filters = 1,
.filters = {
[0] = {
.type = WATCH_TYPE_KEY_NOTIFY,
.subtype_filter[0] = UINT_MAX,
},
},
};
ioctl(fds[1], IOC_WATCH_QUEUE_SET_FILTER, &filter);
keyctl_watch_key(KEY_SPEC_SESSION_KEYRING, fds[1], 0x01);
After that, records will be placed into the queue when events occur in
which keys are changed in some way. Records are of the following format:
struct key_notification {
struct watch_notification watch;
__u32 key_id;
__u32 aux;
} *n;
Where:
n->watch.type will be WATCH_TYPE_KEY_NOTIFY.
n->watch.subtype will indicate the type of event, such as
NOTIFY_KEY_REVOKED.
n->watch.info & WATCH_INFO_LENGTH will indicate the length of the
record.
n->watch.info & WATCH_INFO_ID will be the second argument to
keyctl_watch_key(), shifted.
n->key will be the ID of the affected key.
n->aux will hold subtype-dependent information, such as the key
being linked into the keyring specified by n->key in the case of
NOTIFY_KEY_LINKED.
Note that it is permissible for event records to be of variable length -
or, at least, the length may be dependent on the subtype. Note also that
the queue can be shared between multiple notifications of various types.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: James Morris <jamorris@linux.microsoft.com>
Make it possible to have a general notification queue built on top of a
standard pipe. Notifications are 'spliced' into the pipe and then read
out. splice(), vmsplice() and sendfile() are forbidden on pipes used for
notifications as post_one_notification() cannot take pipe->mutex. This
means that notifications could be posted in between individual pipe
buffers, making iov_iter_revert() difficult to effect.
The way the notification queue is used is:
(1) An application opens a pipe with a special flag and indicates the
number of messages it wishes to be able to queue at once (this can
only be set once):
pipe2(fds, O_NOTIFICATION_PIPE);
ioctl(fds[0], IOC_WATCH_QUEUE_SET_SIZE, queue_depth);
(2) The application then uses poll() and read() as normal to extract data
from the pipe. read() will return multiple notifications if the
buffer is big enough, but it will not split a notification across
buffers - rather it will return a short read or EMSGSIZE.
Notification messages include a length in the header so that the
caller can split them up.
Each message has a header that describes it:
struct watch_notification {
__u32 type:24;
__u32 subtype:8;
__u32 info;
};
The type indicates the source (eg. mount tree changes, superblock events,
keyring changes, block layer events) and the subtype indicates the event
type (eg. mount, unmount; EIO, EDQUOT; link, unlink). The info field
indicates a number of things, including the entry length, an ID assigned to
a watchpoint contributing to this buffer and type-specific flags.
Supplementary data, such as the key ID that generated an event, can be
attached in additional slots. The maximum message size is 127 bytes.
Messages may not be padded or aligned, so there is no guarantee, for
example, that the notification type will be on a 4-byte bounary.
Signed-off-by: David Howells <dhowells@redhat.com>
Add an O_NOTIFICATION_PIPE flag that can be passed to pipe2() to indicate
that the pipe being created is going to be used for notifications. This
suppresses the use of splice(), vmsplice(), tee() and sendfile() on the
pipe as calling iov_iter_revert() on a pipe when a kernel notification
message has been inserted into the middle of a multi-buffer splice will be
messy.
The flag is given the same value as O_EXCL as it seems unlikely that
this flag will ever be applicable to pipes and I don't want to use up
another O_* bit unnecessarily. An alternative could be to add a pipe3()
system call.
Signed-off-by: David Howells <dhowells@redhat.com>
Add UAPI definitions for the general notification queue, including the
following pieces:
(*) struct watch_notification.
This is the metadata header for notification messages. It includes a
type and subtype that indicate the source of the message
(eg. WATCH_TYPE_MOUNT_NOTIFY) and the kind of the message
(eg. NOTIFY_MOUNT_NEW_MOUNT).
The header also contains an information field that conveys the
following information:
- WATCH_INFO_LENGTH. The size of the entry (entries are variable
length).
- WATCH_INFO_ID. The watch ID specified when the watchpoint was
set.
- WATCH_INFO_TYPE_INFO. (Sub)type-specific information.
- WATCH_INFO_FLAG_*. Flag bits overlain on the type-specific
information. For use by the type.
All the information in the header can be used in filtering messages at
the point of writing into the buffer.
(*) struct watch_notification_removal
This is an extended watch-removal notification record that includes an
'id' field that can indicate the identifier of the object being
removed if available (for instance, a keyring serial number).
Signed-off-by: David Howells <dhowells@redhat.com>
Nested translation mode is supported in VT-d 3.0 Spec.CH 3.8.
With PASID granular translation type set to 0x11b, translation
result from the first level(FL) also subject to a second level(SL)
page table translation. This mode is used for SVA virtualization,
where FL performs guest virtual to guest physical translation and
SL performs guest physical to host physical translation.
This patch adds a helper function for setting up nested translation
where second level comes from a domain and first level comes from
a guest PGD.
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Link: https://lore.kernel.org/r/20200516062101.29541-4-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Add support for the newly defined V4L2_CID_CAMERA_ORIENTATION
and V4L2_CID_CAMERA_SENSOR_ROTATION read-only controls used to report
the camera device mounting position and orientation respectively.
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Jacopo Mondi <jacopo@jmondi.org>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Add IORING_OP_TEE implementing tee(2) support. Almost identical to
splice bits, but without offsets.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This new flag should be set/clear from the application to
disable/enable eventfd notifications when a request is completed
and queued to the CQ ring.
Before this patch, notifications were always sent if an eventfd is
registered, so IORING_CQ_EVENTFD_DISABLED is not set during the
initialization.
It will be up to the application to set the flag after initialization
if no notifications are required at the beginning.
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This patch adds the new 'cq_flags' field that should be written by
the application and read by the kernel.
This new field is available to the userspace application through
'cq_off.flags'.
We are using 4-bytes previously reserved and set to zero. This means
that if the application finds this field to zero, then the new
functionality is not supported.
In the next patch we will introduce the first flag available.
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Alexei Starovoitov says:
====================
pull-request: bpf-next 2020-05-15
The following pull-request contains BPF updates for your *net-next* tree.
We've added 37 non-merge commits during the last 1 day(s) which contain
a total of 67 files changed, 741 insertions(+), 252 deletions(-).
The main changes are:
1) bpf_xdp_adjust_tail() now allows to grow the tail as well, from Jesper.
2) bpftool can probe CONFIG_HZ, from Daniel.
3) CAP_BPF is introduced to isolate user processes that use BPF infra and
to secure BPF networking services by dropping CAP_SYS_ADMIN requirement
in certain cases, from Alexei.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Add new TCA_DUMP_FLAGS attribute and use it in cls API to request terse
filter output from classifiers with TCA_DUMP_FLAGS_TERSE flag. This option
is intended to be used to improve performance of TC filter dump when
userland only needs to obtain stats and not the whole classifier/action
data. Extend struct tcf_proto_ops with new terse_dump() callback that must
be defined by supporting classifier implementations.
Support of the options in specific classifiers and actions is
implemented in following patches in the series.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Split BPF operations that are allowed under CAP_SYS_ADMIN into
combination of CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN.
For backward compatibility include them in CAP_SYS_ADMIN as well.
The end result provides simple safety model for applications that use BPF:
- to load tracing program types
BPF_PROG_TYPE_{KPROBE, TRACEPOINT, PERF_EVENT, RAW_TRACEPOINT, etc}
use CAP_BPF and CAP_PERFMON
- to load networking program types
BPF_PROG_TYPE_{SCHED_CLS, XDP, SK_SKB, etc}
use CAP_BPF and CAP_NET_ADMIN
There are few exceptions from this rule:
- bpf_trace_printk() is allowed in networking programs, but it's using
tracing mechanism, hence this helper needs additional CAP_PERFMON
if networking program is using this helper.
- BPF_F_ZERO_SEED flag for hash/lru map is allowed under CAP_SYS_ADMIN only
to discourage production use.
- BPF HW offload is allowed under CAP_SYS_ADMIN.
- bpf_probe_write_user() is allowed under CAP_SYS_ADMIN only.
CAPs are not checked at attach/detach time with two exceptions:
- loading BPF_PROG_TYPE_CGROUP_SKB is allowed for unprivileged users,
hence CAP_NET_ADMIN is required at attach time.
- flow_dissector detach doesn't check prog FD at detach,
hence CAP_NET_ADMIN is required at detach time.
CAP_SYS_ADMIN is required to iterate BPF objects (progs, maps, links) via get_next_id
command and convert them to file descriptor via GET_FD_BY_ID command.
This restriction guarantees that mutliple tasks with CAP_BPF are not able to
affect each other. That leads to clean isolation of tasks. For example:
task A with CAP_BPF and CAP_NET_ADMIN loads and attaches a firewall via bpf_link.
task B with the same capabilities cannot detach that firewall unless
task A explicitly passed link FD to task B via scm_rights or bpffs.
CAP_SYS_ADMIN can still detach/unload everything.
Two networking user apps with CAP_SYS_ADMIN and CAP_NET_ADMIN can
accidentely mess with each other programs and maps.
Two networking user apps with CAP_NET_ADMIN and CAP_BPF cannot affect each other.
CAP_NET_ADMIN + CAP_BPF allows networking programs access only packet data.
Such networking progs cannot access arbitrary kernel memory or leak pointers.
bpftool, bpftrace, bcc tools binaries should NOT be installed with
CAP_BPF and CAP_PERFMON, since unpriv users will be able to read kernel secrets.
But users with these two permissions will be able to use these tracing tools.
CAP_PERFMON is least secure, since it allows kprobes and kernel memory access.
CAP_NET_ADMIN can stop network traffic via iproute2.
CAP_BPF is the safest from security point of view and harmless on its own.
Having CAP_BPF and/or CAP_NET_ADMIN is not enough to write into arbitrary map
and if that map is used by firewall-like bpf prog.
CAP_BPF allows many bpf prog_load commands in parallel. The verifier
may consume large amount of memory and significantly slow down the system.
Existing unprivileged BPF operations are not affected.
In particular unprivileged users are allowed to load socket_filter and cg_skb
program types and to create array, hash, prog_array, map-in-map map types.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200513230355.7858-2-alexei.starovoitov@gmail.com
Finally, after all drivers have a frame size, allow BPF-helper
bpf_xdp_adjust_tail() to grow or extend packet size at frame tail.
Remember that helper/macro xdp_data_hard_end have reserved some
tailroom. Thus, this helper makes sure that the BPF-prog don't have
access to this tailroom area.
V2: Remove one chicken check and use WARN_ONCE for other
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/158945348530.97035.12577148209134239291.stgit@firesoul
Alexei Starovoitov says:
====================
pull-request: bpf-next 2020-05-14
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) Merged tag 'perf-for-bpf-2020-05-06' from tip tree that includes CAP_PERFMON.
2) support for narrow loads in bpf_sock_addr progs and additional
helpers in cg-skb progs, from Andrey.
3) bpf benchmark runner, from Andrii.
4) arm and riscv JIT optimizations, from Luke.
5) bpf iterator infrastructure, from Yonghong.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
With having ability to lookup sockets in cgroup skb programs it becomes
useful to access cgroup id of retrieved sockets so that policies can be
implemented based on origin cgroup of such socket.
For example, a container running in a cgroup can have cgroup skb ingress
program that can lookup peer socket that is sending packets to a process
inside the container and decide whether those packets should be allowed
or denied based on cgroup id of the peer.
More specifically such ingress program can implement intra-host policy
"allow incoming packets only from this same container and not from any
other container on same host" w/o relying on source IP addresses since
quite often it can be the case that containers share same IP address on
the host.
Introduce two new helpers for this use-case: bpf_sk_cgroup_id() and
bpf_sk_ancestor_cgroup_id().
These helpers are similar to existing bpf_skb_{,ancestor_}cgroup_id
helpers with the only difference that sk is used to get cgroup id
instead of skb, and share code with them.
See documentation in UAPI for more details.
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/f5884981249ce911f63e9b57ecd5d7d19154ff39.1589486450.git.rdna@fb.com
bpf_sock_addr.user_port supports only 4-byte load and it leads to ugly
code in BPF programs, like:
volatile __u32 user_port = ctx->user_port;
__u16 port = bpf_ntohs(user_port);
Since otherwise clang may optimize the load to be 2-byte and it's
rejected by verifier.
Add support for 1- and 2-byte loads same way as it's supported for other
fields in bpf_sock_addr like user_ip4, msg_src_ip4, etc.
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/c1e983f4c17573032601d0b2b1f9d1274f24bc16.1589420814.git.rdna@fb.com
POSIX defines faccessat() as having a fourth "flags" argument, while the
linux syscall doesn't have it. Glibc tries to emulate AT_EACCESS and
AT_SYMLINK_NOFOLLOW, but AT_EACCESS emulation is broken.
Add a new faccessat(2) syscall with the added flags argument and implement
both flags.
The value of AT_EACCESS is defined in glibc headers to be the same as
AT_REMOVEDIR. Use this value for the kernel interface as well, together
with the explanatory comment.
Also add AT_EMPTY_PATH support, which is not documented by POSIX, but can
be useful and is trivial to implement.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Determining whether a path or file descriptor refers to a mountpoint (or
more precisely a mount root) is not trivial using current tools.
Add a flag to statx that indicates whether the path or fd refers to the
root of a mount or not.
Cc: linux-api@vger.kernel.org
Cc: linux-man@vger.kernel.org
Reported-by: Lennart Poettering <mzxreary@0pointer.de>
Reported-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Systemd is hacking around to get it and it's trivial to add to statx, so...
Cc: linux-api@vger.kernel.org
Cc: linux-man@vger.kernel.org
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Constants of the *_ALL type can be actively harmful due to the fact that
developers will usually fail to consider the possible effects of future
changes to the definition.
Deprecate STATX_ALL in the uapi, while no damage has been done yet.
We could keep something like this around in the kernel, but there's
actually no point, since all filesystems should be explicitly checking
flags that they support and not rely on the VFS masking unknown ones out: a
flag could be known to the VFS, yet not known to the filesystem.
Cc: David Howells <dhowells@redhat.com>
Cc: linux-api@vger.kernel.org
Cc: linux-man@vger.kernel.org
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Raw Gadget is currently unable to stall/halt/wedge gadget endpoints,
which is required for proper emulation of certain USB classes.
This patch adds a few more ioctls:
- USB_RAW_IOCTL_EP0_STALL allows to stall control endpoint #0 when
there's a pending setup request for it.
- USB_RAW_IOCTL_SET/CLEAR_HALT/WEDGE allow to set/clear halt/wedge status
on non-control non-isochronous endpoints.
Fixes: f2c2e71764 ("usb: gadget: add raw-gadget interface")
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Felipe Balbi <balbi@kernel.org>
Currently automatic gadget endpoint selection based on required features
doesn't work. Raw Gadget tries iterating over the list of available
endpoints and finding one that has the right direction and transfer type.
Unfortunately selecting arbitrary gadget endpoints (even if they satisfy
feature requirements) doesn't work, as (depending on the UDC driver) they
might have fixed addresses, and one also needs to provide matching
endpoint addresses in the descriptors sent to the host.
The composite framework deals with this by assigning endpoint addresses
in usb_ep_autoconfig() before enumeration starts. This approach won't work
with Raw Gadget as the endpoints are supposed to be enabled after a
set_configuration/set_interface request from the host, so it's too late to
patch the endpoint descriptors that had already been sent to the host.
For Raw Gadget we take another approach. Similarly to GadgetFS, we allow
the user to make the decision as to which gadget endpoints to use.
This patch adds another Raw Gadget ioctl USB_RAW_IOCTL_EPS_INFO that
exposes information about all non-control endpoints that a currently
connected UDC has. This information includes endpoints addresses, as well
as their capabilities and limits to allow the user to choose the most
fitting gadget endpoint.
The USB_RAW_IOCTL_EP_ENABLE ioctl is updated to use the proper endpoint
validation routine usb_gadget_ep_match_desc().
These changes affect the portability of the gadgets that use Raw Gadget
when running on different UDCs. Nevertheless, as long as the user relies
on the information provided by USB_RAW_IOCTL_EPS_INFO to dynamically
choose endpoint addresses, UDC-agnostic gadgets can still be written with
Raw Gadget.
Fixes: f2c2e71764 ("usb: gadget: add raw-gadget interface")
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Felipe Balbi <balbi@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl6BIG4eHHRvcnZhbGRz
QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGlHUH/RCFve2sfHRPjRW+
xR5SaLVAw6XKvtKBq7yvKmHEwqNJnL79IHyqqtSrtfFr2FfaH/KvYiCbbAezvSrM
np0udGu7STKGd21CWuyEZJudyhXkOwMRNiFiCXWp7rs35oh8T0TpJxMzo2Nc1nLk
JFQPqAP6OSvq4IkWEywKQI+Au3Z1IBf83xVjZ1s+MKPQHYD49x2hc4cQntL5/cnm
a3DoR2iBkYiGZCZ9dDqAqJTnMQIiCbACdZXgGjNRUpdyA/dtAjsMl11NPYHm8TA2
3AHBupAK50WBZGad6xv2qKQyScsmoJG2mv92QjlOFz0Tpiu6rLnDlLYREDVB6YH6
qbLDsc8=
=XEIU
-----END PGP SIGNATURE-----
Merge tag 'v5.6' into next
Sync up with mainline to get device tree and other changes.
UBSAN: array-index-out-of-bounds in drivers/block/floppy.c:1521:45
index 16 is out of range for type 'unsigned char [16]'
Call Trace:
...
setup_rw_floppy+0x5c3/0x7f0
floppy_ready+0x2be/0x13b0
process_one_work+0x2c1/0x5d0
worker_thread+0x56/0x5e0
kthread+0x122/0x170
ret_from_fork+0x35/0x40
From include/uapi/linux/fd.h:
struct floppy_raw_cmd {
...
unsigned char cmd_count;
unsigned char cmd[16];
unsigned char reply_count;
unsigned char reply[16];
...
}
This out-of-bounds access is intentional. The command in struct
floppy_raw_cmd may take up the space initially intended for the reply
and the reply count. It is needed for long 82078 commands such as
RESTORE, which takes 17 command bytes. Initial cmd size is not enough
and since struct setup_rw_floppy is a part of uapi we check that
cmd_count is in [0:16+1+16] in raw_cmd_copyin().
The patch adds union with original cmd,reply_count,reply fields and
fullcmd field of equivalent size. The cmd accesses are turned to
fullcmd where appropriate to suppress UBSAN warning.
Link: https://lore.kernel.org/r/20200501134416.72248-5-efremov@linux.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Denis Efremov <efremov@linux.com>
Use FD_RAW_CMD_SIZE, FD_RAW_REPLY_SIZE defines instead of magic numbers
for cmd & reply buffers of struct floppy_raw_cmd. Remove local to
floppy.c MAX_REPLIES define, as it is now FD_RAW_REPLY_SIZE.
FD_RAW_CMD_FULLSIZE added as we allow command to also fill reply_count
and reply fields.
Link: https://lore.kernel.org/r/20200501134416.72248-4-efremov@linux.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Denis Efremov <efremov@linux.com>
Use FD_AUTODETECT_SIZE for autodetect buffer size in struct
floppy_drive_params instead of a magic number.
Link: https://lore.kernel.org/r/20200501134416.72248-3-efremov@linux.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Denis Efremov <efremov@linux.com>
This controller provides extra status registers SRA and SRB as well
as a tape drive register (TDR) and a data rate select register (DSR),
which are referenced in the sparc port, so let's have their symbolic
definitions centralized.
Link: https://lore.kernel.org/r/20200331094054.24441-3-w@1wt.eu
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Denis Efremov <efremov@linux.com>
While normal video/radio/vbi/swradio nodes have a proper QUERYCAP ioctl
that apps can call to determine that it is indeed a V4L2 device, there
is currently no equivalent for v4l-subdev nodes. Adding this ioctl will
solve that, and it will allow utilities like v4l2-compliance to be used
with these devices as well.
SUBDEV_QUERYCAP currently returns the version and capabilities of the
subdevice. Define a capability flag to report if the subdevice is
registered in read-only mode.
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Jacopo Mondi <jacopo@jmondi.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Minor improvements to the documentation for BPF helpers:
* Fix formatting for the description of "bpf_socket" for
bpf_getsockopt() and bpf_setsockopt(), thus suppressing two warnings
from rst2man about "Unexpected indentation".
* Fix formatting for return values for bpf_sk_assign() and seq_file
helpers.
* Fix and harmonise formatting, in particular for function/struct names.
* Remove blank lines before "Return:" sections.
* Replace tabs found in the middle of text lines.
* Fix typos.
* Add a note to the footer (in Python script) about "bpftool feature
probe", including for listing features available to unprivileged
users, and add a reference to bpftool man page.
Thanks to Florian for reporting two typos (duplicated words).
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200511161536.29853-4-quentin@isovalent.com
Add the attributes needed to report cable test results to userspace.
The reports are expected to be per twisted pair. A nested property per
pair can report the result of the cable test. A nested property can
also report the length of the cable to any fault.
v2:
Grammar fixes
Change length from u16 to u32
s/DEV/HEADER/g
Add status attributes
Rename pairs from numbers to letters.
v3:
Fixed example in document
Add ETHTOOL_A_CABLE_NEST_* enum
Add ETHTOOL_MSG_CABLE_TEST_NTF to documentation
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add new ethtool netlink calls to trigger the starting of a PHY cable
test.
Add Kconfig'ury to ETHTOOL_NETLINK so that PHYLIB is not a module when
ETHTOOL_NETLINK is builtin, which would result in kernel linking errors.
v2:
Remove unwanted white space change
Remove ethnl_cable_test_act_ops and use doit handler
Rename cable_test_set_policy cable_test_act_policy
Remove ETHTOOL_MSG_CABLE_TEST_ACT_REPLY
v3:
Remove ETHTOOL_MSG_CABLE_TEST_ACT_REPLY from documentation
Remove unused cable_test_get_policy
Add Reviewed-by tags
v4:
Remove unwanted blank line
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Two helpers bpf_seq_printf and bpf_seq_write, are added for
writing data to the seq_file buffer.
bpf_seq_printf supports common format string flag/width/type
fields so at least I can get identical results for
netlink and ipv6_route targets.
For bpf_seq_printf and bpf_seq_write, return value -EOVERFLOW
specifically indicates a write failure due to overflow, which
means the object will be repeated in the next bpf invocation
if object collection stays the same. Note that if the object
collection is changed, depending how collection traversal is
done, even if the object still in the collection, it may not
be visited.
For bpf_seq_printf, format %s, %p{i,I}{4,6} needs to
read kernel memory. Reading kernel memory may fail in
the following two cases:
- invalid kernel address, or
- valid kernel address but requiring a major fault
If reading kernel memory failed, the %s string will be
an empty string and %p{i,I}{4,6} will be all 0.
Not returning error to bpf program is consistent with
what bpf_trace_printk() does for now.
bpf_seq_printf may return -EBUSY meaning that internal percpu
buffer for memory copy of strings or other pointees is
not available. Bpf program can return 1 to indicate it
wants the same object to be repeated. Right now, this should not
happen on no-RT kernels since migrate_disable(), which guards
bpf prog call, calls preempt_disable().
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200509175914.2476661-1-yhs@fb.com
A new bpf command BPF_ITER_CREATE is added.
The anonymous bpf iterator is seq_file based.
The seq_file private data are referenced by targets.
The bpf_iter infrastructure allocated additional space
at seq_file->private before the space used by targets
to store some meta data, e.g.,
prog: prog to run
session_id: an unique id for each opened seq_file
seq_num: how many times bpf programs are queried in this session
done_stop: an internal state to decide whether bpf program
should be called in seq_ops->stop() or not
The seq_num will start from 0 for valid objects.
The bpf program may see the same seq_num more than once if
- seq_file buffer overflow happens and the same object
is retried by bpf_seq_read(), or
- the bpf program explicitly requests a retry of the
same object
Since module is not supported for bpf_iter, all target
registeration happens at __init time, so there is no
need to change bpf_iter_unreg_target() as it is used
mostly in error path of the init function at which time
no bpf iterators have been created yet.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200509175905.2475770-1-yhs@fb.com
Given a bpf program, the step to create an anonymous bpf iterator is:
- create a bpf_iter_link, which combines bpf program and the target.
In the future, there could be more information recorded in the link.
A link_fd will be returned to the user space.
- create an anonymous bpf iterator with the given link_fd.
The bpf_iter_link can be pinned to bpffs mount file system to
create a file based bpf iterator as well.
The benefit to use of bpf_iter_link:
- using bpf link simplifies design and implementation as bpf link
is used for other tracing bpf programs.
- for file based bpf iterator, bpf_iter_link provides a standard
way to replace underlying bpf programs.
- for both anonymous and free based iterators, bpf link query
capability can be leveraged.
The patch added support of tracing/iter programs for BPF_LINK_CREATE.
A new link type BPF_LINK_TYPE_ITER is added to facilitate link
querying. Currently, only prog_id is needed, so there is no
additional in-kernel show_fdinfo() and fill_link_info() hook
is needed for BPF_LINK_TYPE_ITER link.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200509175901.2475084-1-yhs@fb.com
A bpf_iter program is a tracing program with attach type
BPF_TRACE_ITER. The load attribute
attach_btf_id
is used by the verifier against a particular kernel function,
which represents a target, e.g., __bpf_iter__bpf_map
for target bpf_map which is implemented later.
The program return value must be 0 or 1 for now.
0 : successful, except potential seq_file buffer overflow
which is handled by seq_file reader.
1 : request to restart the same object
In the future, other return values may be used for filtering or
teminating the iterator.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200509175900.2474947-1-yhs@fb.com
We want to have a tighter control on what ports we bind to in
the BPF_CGROUP_INET{4,6}_CONNECT hooks even if it means
connect() becomes slightly more expensive. The expensive part
comes from the fact that we now need to call inet_csk_get_port()
that verifies that the port is not used and allocates an entry
in the hash table for it.
Since we can't rely on "snum || !bind_address_no_port" to prevent
us from calling POST_BIND hook anymore, let's add another bind flag
to indicate that the call site is BPF program.
v5:
* fix wrong AF_INET (should be AF_INET6) in the bpf program for v6
v3:
* More bpf_bind documentation refinements (Martin KaFai Lau)
* Add UDP tests as well (Martin KaFai Lau)
* Don't start the thread, just do socket+bind+listen (Martin KaFai Lau)
v2:
* Update documentation (Andrey Ignatov)
* Pass BIND_FORCE_ADDRESS_NO_PORT conditionally (Andrey Ignatov)
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrey Ignatov <rdna@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200508174611.228805-5-sdf@google.com
This UAPI is needed for BroadR-Reach 100BASE-T1 devices. Due to lack of
auto-negotiation support, we needed to be able to configure the
MASTER-SLAVE role of the port manually or from an application in user
space.
The same UAPI can be used for 1000BASE-T or MultiGBASE-T devices to
force MASTER or SLAVE role. See IEEE 802.3-2018:
22.2.4.3.7 MASTER-SLAVE control register (Register 9)
22.2.4.3.8 MASTER-SLAVE status register (Register 10)
40.5.2 MASTER-SLAVE configuration resolution
45.2.1.185.1 MASTER-SLAVE config value (1.2100.14)
45.2.7.10 MultiGBASE-T AN control 1 register (Register 7.32)
The MASTER-SLAVE role affects the clock configuration:
-------------------------------------------------------------------------------
When the PHY is configured as MASTER, the PMA Transmit function shall
source TX_TCLK from a local clock source. When configured as SLAVE, the
PMA Transmit function shall source TX_TCLK from the clock recovered from
data stream provided by MASTER.
iMX6Q KSZ9031 XXX
------\ /-----------\ /------------\
| | | | |
MAC |<----RGMII----->| PHY Slave |<------>| PHY Master |
|<--- 125 MHz ---+-<------/ | | \ |
------/ \-----------/ \------------/
^
\-TX_TCLK
-------------------------------------------------------------------------------
Since some clock or link related issues are only reproducible in a
specific MASTER-SLAVE-role, MAC and PHY configuration, it is beneficial
to provide generic (not 100BASE-T1 specific) interface to the user space
for configuration flexibility and trouble shooting.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
The VIDIOC_ENUM_FMT ioctl enumerates all formats supported by a video
node. For MC-centric devices, its behaviour has always been ill-defined,
with drivers implementing one of the following behaviours:
- No support for VIDIOC_ENUM_FMT at all
- Enumerating all formats supported by the video node, regardless of the
configuration of the pipeline
- Enumerating formats supported by the video node for the active
configuration of the connected subdevice
The first behaviour is obviously useless for applications. The second
behaviour provides the most information, but doesn't offer a way to find
what formats are compatible with a given pipeline configuration. The
third behaviour fixes that, but with the drawback that applications
can't enumerate all supported formats anymore, and have to modify the
active configuration of the pipeline to enumerate formats.
The situation is messy as none of the implemented behaviours are ideal,
and userspace can't predict what will happen as the behaviour is
driver-specific.
To fix this, let's extend the VIDIOC_ENUM_FMT with a missing capability:
enumerating pixel formats for a given media bus code. The media bus code
is passed through the v4l2_fmtdesc structure in a new mbus_code field
(repurposed from the reserved fields). With this capability in place,
applications can enumerate pixel formats for a given media bus code
without modifying the active configuration of the device.
The current behaviour of the ioctl is preserved when the new mbus_code
field is set to 0, ensuring compatibility with existing userspace. The
API extension is documented as mandatory for MC-centric devices (as
advertised through the V4L2_CAP_IO_MC capability), allowing applications
and compliance tools to easily determine the availability of the
VIDIOC_ENUM_FMT extension.
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Add a video device capability flag to indicate that its inputs and/or
outputs are controlled by the Media Controller instead of the V4L2 API.
When this flag is set, ioctl for enum inputs and outputs are
automatically enabled and programmed to call a helper function.
Suggested-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Helen Koike <helen.koike@collabora.com>
Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Merge in user support for Branch Target Identification, which narrowly
missed the cut for 5.7 after a late ABI concern.
* for-next/bti-user:
arm64: bti: Document behaviour for dynamically linked binaries
arm64: elf: Fix allnoconfig kernel build with !ARCH_USE_GNU_PROPERTY
arm64: BTI: Add Kconfig entry for userspace BTI
mm: smaps: Report arm64 guarded pages in smaps
arm64: mm: Display guarded pages in ptdump
KVM: arm64: BTI: Reset BTYPE when skipping emulated instructions
arm64: BTI: Reset BTYPE when skipping emulated instructions
arm64: traps: Shuffle code to eliminate forward declarations
arm64: unify native/compat instruction skipping
arm64: BTI: Decode BYTPE bits when printing PSTATE
arm64: elf: Enable BTI at exec based on ELF program properties
elf: Allow arch to tweak initial mmap prot flags
arm64: Basic Branch Target Identification support
ELF: Add ELF program property parsing support
ELF: UAPI and Kconfig additions for ELF program properties
QUIC servers would like to use SO_TXTIME, without having CAP_NET_ADMIN,
to efficiently pace UDP packets.
As far as sch_fq is concerned, we need to add safety checks, so
that a buggy application does not fill the qdisc with packets
having delivery time far in the future.
This patch adds a configurable horizon (default: 10 seconds),
and a configurable policy when a packet is beyond the horizon
at enqueue() time:
- either drop the packet (default policy)
- or cap its delivery time to the horizon.
$ tc -s -d qd sh dev eth0
qdisc fq 8022: root refcnt 257 limit 10000p flow_limit 100p buckets 1024
orphan_mask 1023 quantum 10Kb initial_quantum 51160b low_rate_threshold 550Kbit
refill_delay 40.0ms timer_slack 10.000us horizon 10.000s
Sent 1234215879 bytes 837099 pkt (dropped 21, overlimits 0 requeues 6)
backlog 0b 0p requeues 6
flows 1191 (inactive 1177 throttled 0)
gc 0 highprio 0 throttled 692 latency 11.480us
pkts_too_long 0 alloc_errors 0 horizon_drops 21 horizon_caps 0
v2: fixed an overflow on 32bit kernels in fq_init(), reported
by kbuild test robot <lkp@intel.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
These structures can get embedded in other structures in user-space
and cause all sorts of warnings and problems. So, we better don't take
any chances and keep the zero-length arrays in place for now.
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
In order for users to determine if a file is currently operating in DAX
state (effective DAX). Define a statx attribute value and set that
attribute if the effective DAX flag is set.
To go along with this we propose the following addition to the statx man
page:
STATX_ATTR_DAX
The file is in the DAX (cpu direct access) state. DAX state
attempts to minimize software cache effects for both I/O and
memory mappings of this file. It requires a file system which
has been configured to support DAX.
DAX generally assumes all accesses are via cpu load / store
instructions which can minimize overhead for small accesses, but
may adversely affect cpu utilization for large transfers.
File I/O is done directly to/from user-space buffers and memory
mapped I/O may be performed with direct memory mappings that
bypass kernel page cache.
While the DAX property tends to result in data being transferred
synchronously, it does not give the same guarantees of O_SYNC
where data and the necessary metadata are transferred together.
A DAX file may support being mapped with the MAP_SYNC flag,
which enables a program to use CPU cache flush instructions to
persist CPU store operations without an explicit fsync(2). See
mmap(2) for more information.
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Add adjust_phase to ptp_clock_caps capability to allow
user to query if a PHC driver supports adjust phase with
ioctl PTP_CLOCK_GETCAPS command.
Signed-off-by: Vincent Cheng <vincent.cheng.xh@renesas.com>
Reviewed-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexei Starovoitov says:
====================
pull-request: bpf-next 2020-05-01 (v2)
The following pull-request contains BPF updates for your *net-next* tree.
We've added 61 non-merge commits during the last 6 day(s) which contain
a total of 153 files changed, 6739 insertions(+), 3367 deletions(-).
The main changes are:
1) pulled work.sysctl from vfs tree with sysctl bpf changes.
2) bpf_link observability, from Andrii.
3) BTF-defined map in map, from Andrii.
4) asan fixes for selftests, from Andrii.
5) Allow bpf_map_lookup_elem for SOCKMAP and SOCKHASH, from Jakub.
6) production cloudflare classifier as a selftes, from Lorenz.
7) bpf_ktime_get_*_ns() helper improvements, from Maciej.
8) unprivileged bpftool feature probe, from Quentin.
9) BPF_ENABLE_STATS command, from Song.
10) enable bpf_[gs]etsockopt() helpers for sock_ops progs, from Stanislav.
11) enable a bunch of common helpers for cg-device, sysctl, sockopt progs,
from Stanislav.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce a ingress frame gate control flow action.
Tc gate action does the work like this:
Assume there is a gate allow specified ingress frames can be passed at
specific time slot, and be dropped at specific time slot. Tc filter
chooses the ingress frames, and tc gate action would specify what slot
does these frames can be passed to device and what time slot would be
dropped.
Tc gate action would provide an entry list to tell how much time gate
keep open and how much time gate keep state close. Gate action also
assign a start time to tell when the entry list start. Then driver would
repeat the gate entry list cyclically.
For the software simulation, gate action requires the user assign a time
clock type.
Below is the setting example in user space. Tc filter a stream source ip
address is 192.168.0.20 and gate action own two time slots. One is last
200ms gate open let frame pass another is last 100ms gate close let
frames dropped. When the ingress frames have reach total frames over
8000000 bytes, the excessive frames will be dropped in that 200000000ns
time slot.
> tc qdisc add dev eth0 ingress
> tc filter add dev eth0 parent ffff: protocol ip \
flower src_ip 192.168.0.20 \
action gate index 2 clockid CLOCK_TAI \
sched-entry open 200000000 -1 8000000 \
sched-entry close 100000000 -1 -1
> tc chain del dev eth0 ingress chain 0
"sched-entry" follow the name taprio style. Gate state is
"open"/"close". Follow with period nanosecond. Then next item is internal
priority value means which ingress queue should put. "-1" means
wildcard. The last value optional specifies the maximum number of
MSDU octets that are permitted to pass the gate during the specified
time interval.
Base-time is not set will be 0 as default, as result start time would
be ((N + 1) * cycletime) which is the minimal of future time.
Below example shows filtering a stream with destination mac address is
10:00:80:00:00:00 and ip type is ICMP, follow the action gate. The gate
action would run with one close time slot which means always keep close.
The time cycle is total 200000000ns. The base-time would calculate by:
1357000000000 + (N + 1) * cycletime
When the total value is the future time, it will be the start time.
The cycletime here would be 200000000ns for this case.
> tc filter add dev eth0 parent ffff: protocol ip \
flower skip_hw ip_proto icmp dst_mac 10:00:80:00:00:00 \
action gate index 12 base-time 1357000000000 \
sched-entry close 200000000 -1 -1 \
clockid CLOCK_TAI
Signed-off-by: Po Liu <Po.Liu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, bpf_getsockopt and bpf_setsockopt helpers operate on the
'struct bpf_sock_ops' context in BPF_PROG_TYPE_SOCK_OPS program.
Let's generalize them and make them available for 'struct bpf_sock_addr'.
That way, in the future, we can allow those helpers in more places.
As an example, let's expose those 'struct bpf_sock_addr' based helpers to
BPF_CGROUP_INET{4,6}_CONNECT hooks. That way we can override CC before the
connection is made.
v3:
* Expose custom helpers for bpf_sock_addr context instead of doing
generic bpf_sock argument (as suggested by Daniel). Even with
try_socket_lock that doesn't sleep we have a problem where context sk
is already locked and socket lock is non-nestable.
v2:
* s/BPF_PROG_TYPE_CGROUP_SOCKOPT/BPF_PROG_TYPE_SOCK_OPS/
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200430233152.199403-1-sdf@google.com
Not much to be done here:
- add SPDX header;
- adjust title markup;
- remove a tail whitespace;
- add to networking/index.rst.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, sysctl kernel.bpf_stats_enabled controls BPF runtime stats.
Typical userspace tools use kernel.bpf_stats_enabled as follows:
1. Enable kernel.bpf_stats_enabled;
2. Check program run_time_ns;
3. Sleep for the monitoring period;
4. Check program run_time_ns again, calculate the difference;
5. Disable kernel.bpf_stats_enabled.
The problem with this approach is that only one userspace tool can toggle
this sysctl. If multiple tools toggle the sysctl at the same time, the
measurement may be inaccurate.
To fix this problem while keep backward compatibility, introduce a new
bpf command BPF_ENABLE_STATS. On success, this command enables stats and
returns a valid fd. BPF_ENABLE_STATS takes argument "type". Currently,
only one type, BPF_STATS_RUN_TIME, is supported. We can extend the
command to support other types of stats in the future.
With BPF_ENABLE_STATS, user space tool would have the following flow:
1. Get a fd with BPF_ENABLE_STATS, and make sure it is valid;
2. Check program run_time_ns;
3. Sleep for the monitoring period;
4. Check program run_time_ns again, calculate the difference;
5. Close the fd.
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200430071506.1408910-2-songliubraving@fb.com
Corrected two function names. Added a missing space.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add, and use in generic netlink, helpers to dump out a netlink
policy to userspace, including all the range validation data,
nested policies etc.
This lets userspace discover what the kernel understands.
For families/commands other than generic netlink, the helpers
need to be used directly in an appropriate command, or we can
add some infrastructure (a new netlink family) that those can
register their policies with for introspection. I'm not that
familiar with non-generic netlink, so that's left out for now.
The data exposed to userspace also includes min and max length
for binary/string data, I've done that instead of letting the
userspace tools figure out whether min/max is intended based
on the type so that we can extend this later in the kernel, we
might want to just use the range data for example.
Because of this, I opted to not directly expose the NLA_*
values, even if some of them are already exposed via BPF, as
with min/max length we don't need to have different types here
for NLA_BINARY/NLA_MIN_LEN/NLA_EXACT_LEN, we just make them
all NL_ATTR_TYPE_BINARY with min/max length optionally set.
Similarly, we don't really need NLA_MSECS, and perhaps can
remove it in the future - but not if we encode it into the
userspace API now. It gets mapped to NL_ATTR_TYPE_U64 here.
Note that the exposing here corresponds to the strict policy
interpretation, and NLA_UNSPEC items are omitted entirely.
To get those, change them to NLA_MIN_LEN which behaves in
exactly the same way, but is exposed.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
fixes for dma-buf, an off-by-one fix in edid, and a return code fix in
DP-MST
-----BEGIN PGP SIGNATURE-----
iHQEABYIAB0WIQRcEzekXsqa64kGDp7j7w1vZxhRxQUCXqrvVgAKCRDj7w1vZxhR
xazeAPMFYoHj36L2cbr7lrUDER2s6cNdCpUGN0tuQx9fYjmQAP0RRCXpbfyFESvf
MG5BBZvARO7OUtUCujogiPbmAVn1DQ==
=r+Az
-----END PGP SIGNATURE-----
Merge tag 'drm-misc-fixes-2020-04-30' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
A few resources-related fixes for qxl, some doc build warnings and ioctl
fixes for dma-buf, an off-by-one fix in edid, and a return code fix in
DP-MST
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maxime Ripard <maxime@cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20200430153201.wx6of2b2gsoip7bk@gilmour.lan
- add SPDX header;
- add a document title;
- adjust titles and chapters, adding proper markups;
- mark code blocks and literals as such;
- adjust identation, whitespaces and blank lines where needed;
- add to networking/index.rst.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds ability to filter sockets based on cgroup v2 ID.
Such filter is helpful in ss utility for filtering sockets by
cgroup pathname.
Signed-off-by: Dmitry Yakunin <zeil@yandex-team.ru>
Reviewed-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds cgroup v2 ID to common inet diag message attributes.
Cgroup v2 ID is kernfs ID (ino or ino+gen). This attribute allows filter
inet diag output by cgroup ID obtained by name_to_handle_at() syscall.
When net_cls or net_prio cgroup is activated this ID is equal to 1 (root
cgroup ID) for newly created sockets.
Some notes about this ID:
1) gets initialized in socket() syscall
2) incoming socket gets ID from listening socket
(not during accept() syscall)
3) not changed when process get moved to another cgroup
4) can point to deleted cgroup (refcounting)
v2:
- use CONFIG_SOCK_CGROUP_DATA instead if CONFIG_CGROUPS
v3:
- fix attr size by using nla_total_size_64bit() (Eric Dumazet)
- more detailed commit message (Konstantin Khlebnikov)
Signed-off-by: Dmitry Yakunin <zeil@yandex-team.ru>
Reviewed-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Acked-By: Tejun Heo <tj@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
To provide support for SEV-ES, the hypervisor must provide an area of
memory to the PSP. Once this Trusted Memory Region (TMR) is provided to
the PSP, the contents of this area of memory are no longer available to
the x86.
Update the PSP driver to allocate a 1MB region for the TMR that is 1MB
aligned and then provide it to the PSP through the SEV INIT command.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Brijesh Singh <brijesh.singh@amd.com>
Reviewed-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
The following patchset contains Netfilter updates for nf-next:
1) Add IPS_HW_OFFLOAD status bit, from Bodong Wang.
2) Remove 128-bit limit on the set element data area, rise it
to 64 bytes.
3) Report EOPNOTSUPP for unsupported NAT types and flags.
4) Set up nft_nat flags from the control plane path.
5) Add helper functions to set up the nf_nat_range2 structure.
6) Add netmap support for nft_nat.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Add ability to fetch bpf_link details through BPF_OBJ_GET_INFO_BY_FD command.
Also enhance show_fdinfo to potentially include bpf_link type-specific
information (similarly to obj_info).
Also introduce enum bpf_link_type stored in bpf_link itself and expose it in
UAPI. bpf_link_tracing also now will store and return bpf_attach_type.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200429001614.1544-5-andriin@fb.com
Add support to look up bpf_link by ID and iterate over all existing bpf_links
in the system. GET_FD_BY_ID code handles not-yet-ready bpf_link by checking
that its ID hasn't been set to non-zero value yet. Setting bpf_link's ID is
done as the very last step in finalizing bpf_link, together with installing
FD. This approach allows users of bpf_link in kernel code to not worry about
races between user-space and kernel code that hasn't finished attaching and
initializing bpf_link.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200429001614.1544-4-andriin@fb.com
Generate ID for each bpf_link using IDR, similarly to bpf_map and bpf_prog.
bpf_link creation, initialization, attachment, and exposing to user-space
through FD and ID is a complicated multi-step process, abstract it away
through bpf_link_primer and bpf_link_prime(), bpf_link_settle(), and
bpf_link_cleanup() internal API. They guarantee that until bpf_link is
properly attached, user-space won't be able to access partially-initialized
bpf_link either from FD or ID. All this allows to simplify bpf_link attachment
and error handling code.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200429001614.1544-3-andriin@fb.com
Add a new kfd ioctl to allocate queue GWS. Queue
GWS is released on queue destroy.
v2: re-introduce this API with the following fixes squashed in:
- drm/amdkfd: fix null pointer dereference on dev
- drm/amdkfd: Return proper error code for gws alloc API
- drm/amdkfd: Remove GPU ID in GWS queue creation
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This patch allows you to NAT the network address prefix onto another
network address prefix, a.k.a. netmapping.
Userspace must specify the NF_NAT_RANGE_NETMAP flag and the prefix
address through the NFTA_NAT_REG_ADDR_MIN and NFTA_NAT_REG_ADDR_MAX
netlink attributes.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-----BEGIN PGP SIGNATURE-----
iQFHBAABCAAxFiEEIbPD0id6easf0xsudhRwX5BBoF4FAl6mwOETHHdlaS5saXVA
a2VybmVsLm9yZwAKCRB2FHBfkEGgXrFLB/4yKsrl41WwYRbTKgiir576/LA0vGxQ
cZjUQwkVv3S5/AfhvpwiGFV4dBV6j81KtNhRE6luaa3FBHObnjrx5tNqMw/P8a0j
HZGZ68n4qE+OPVtTxj54s81iWIi9vgT/La92GPYhuXoiVPTd5zJ2lwY3so04BSFJ
p30+RZFKNkTjNYZNZSHcoodr+js4Uws8JSn8OmpCJr8Gt+FJqkujQROG3HMKhJlk
KlJlCJhV48tj/nlgcbGHBF0Yy5l8DVCaKIz+MiF5F/i+P8r0cErfyihc9Ene0/un
LNFhIVGn8/MTi0CVrltcnur2qFH1qPCuLolKSpd/FKd6H2UDgK16XgAd
=NJP/
-----END PGP SIGNATURE-----
Merge tag 'hyperv-fixes-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
Pull Hyper-V fixes from Wei Liu:
- Two patches from Dexuan fixing suspension bugs
- Three cleanup patches from Andy and Michael
* tag 'hyperv-fixes-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
hyper-v: Remove internal types from UAPI header
hyper-v: Use UUID API for exporting the GUID
x86/hyperv: Suspend/resume the VP assist page for hibernation
Drivers: hv: Move AEOI determination to architecture dependent code
Drivers: hv: vmbus: Fix Suspend-to-Idle for Generation-2 VM
This patch adds a new port attribute, IFLA_BRPORT_MRP_RING_OPEN, which allows
to notify the userspace when the port lost the continuite of MRP frames.
This attribute is set by kernel whenever the SW or HW detects that the ring is
being open or closed.
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add new nested netlink attribute to configure the MRP. These attributes are used
by the userspace to add/delete/configure MRP instances and by the kernel to
notify the userspace when the MRP ring gets open/closed. MRP nested attribute
has the following attributes:
IFLA_BRIDGE_MRP_INSTANCE - the parameter type is br_mrp_instance which contains
the instance id, and the ifindex of the two ports. The ports can't be part of
multiple instances. This is used to create/delete MRP instances.
IFLA_BRIDGE_MRP_PORT_STATE - the parameter type is u32. Which can be forwarding,
blocking or disabled.
IFLA_BRIDGE_MRP_PORT_ROLE - the parameter type is br_mrp_port_role which
contains the instance id and the role. The role can be primary or secondary.
IFLA_BRIDGE_MRP_RING_STATE - the parameter type is br_mrp_ring_state which
contains the instance id and the state. The state can be open or closed.
IFLA_BRIDGE_MRP_RING_ROLE - the parameter type is br_mrp_ring_role which
contains the instance id and the ring role. The role can be MRM or MRC.
IFLA_BRIDGE_MRP_START_TEST - the parameter type is br_mrp_start_test which
contains the instance id, the interval at which to send the MRP_Test frames,
how many test frames can be missed before declaring the ring open and the
period which represent for how long to send the test frames.
Also add the file include/uapi/linux/mrp_bridge.h which defines all the types
used by MRP that are also needed by the userpace.
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This bit indicates that the conntrack entry is offloaded to hardware
flow table. nf_conntrack entry will be tagged with [HW_OFFLOAD] if
it's offload to hardware.
cat /proc/net/nf_conntrack
ipv4 2 tcp 6 \
src=1.1.1.17 dst=1.1.1.16 sport=56394 dport=5001 \
src=1.1.1.16 dst=1.1.1.17 sport=5001 dport=56394 [HW_OFFLOAD] \
mark=0 zone=0 use=3
Note that HW_OFFLOAD/OFFLOAD/ASSURED are mutually exclusive.
Changelog:
* V1->V2:
- Remove check of lastused from stats. It was meant for cases such
as removing driver module while traffic still running. Better to
handle such cases from garbage collector.
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
On a device like a cellphone which is constantly suspending
and resuming CLOCK_MONOTONIC is not particularly useful for
keeping track of or reacting to external network events.
Instead you want to use CLOCK_BOOTTIME.
Hence add bpf_ktime_get_boot_ns() as a mirror of bpf_ktime_get_ns()
based around CLOCK_BOOTTIME instead of CLOCK_MONOTONIC.
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Pull networking fixes from David Miller:
1) Fix memory leak in netfilter flowtable, from Roi Dayan.
2) Ref-count leaks in netrom and tipc, from Xiyu Yang.
3) Fix warning when mptcp socket is never accepted before close, from
Florian Westphal.
4) Missed locking in ovs_ct_exit(), from Tonghao Zhang.
5) Fix large delays during PTP synchornization in cxgb4, from Rahul
Lakkireddy.
6) team_mode_get() can hang, from Taehee Yoo.
7) Need to use kvzalloc() when allocating fw tracer in mlx5 driver,
from Niklas Schnelle.
8) Fix handling of bpf XADD on BTF memory, from Jann Horn.
9) Fix BPF_STX/BPF_B encoding in x86 bpf jit, from Luke Nelson.
10) Missing queue memory release in iwlwifi pcie code, from Johannes
Berg.
11) Fix NULL deref in macvlan device event, from Taehee Yoo.
12) Initialize lan87xx phy correctly, from Yuiko Oshino.
13) Fix looping between VRF and XFRM lookups, from David Ahern.
14) etf packet scheduler assumes all sockets are full sockets, which is
not necessarily true. From Eric Dumazet.
15) Fix mptcp data_fin handling in RX path, from Paolo Abeni.
16) fib_select_default() needs to handle nexthop objects, from David
Ahern.
17) Use GFP_ATOMIC under spinlock in mac80211_hwsim, from Wei Yongjun.
18) vxlan and geneve use wrong nlattr array, from Sabrina Dubroca.
19) Correct rx/tx stats in bcmgenet driver, from Doug Berger.
20) BPF_LDX zero-extension is encoded improperly in x86_32 bpf jit, fix
from Luke Nelson.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (100 commits)
selftests/bpf: Fix a couple of broken test_btf cases
tools/runqslower: Ensure own vmlinux.h is picked up first
bpf: Make bpf_link_fops static
bpftool: Respect the -d option in struct_ops cmd
selftests/bpf: Add test for freplace program with expected_attach_type
bpf: Propagate expected_attach_type when verifying freplace programs
bpf: Fix leak in LINK_UPDATE and enforce empty old_prog_fd
bpf, x86_32: Fix logic error in BPF_LDX zero-extension
bpf, x86_32: Fix clobbering of dst for BPF_JSET
bpf, x86_32: Fix incorrect encoding in BPF_LDX zero-extension
bpf: Fix reStructuredText markup
net: systemport: suppress warnings on failed Rx SKB allocations
net: bcmgenet: suppress warnings on failed Rx SKB allocations
macsec: avoid to set wrong mtu
mac80211: sta_info: Add lockdep condition for RCU list usage
mac80211: populate debugfs only after cfg80211 init
net: bcmgenet: correct per TX/RX ring statistics
net: meth: remove spurious copyright text
net: phy: bcm84881: clear settings on link down
chcr: Fix CPU hard lockup
...
KVM_CAP_HALT_POLL is a per-VM capability that lets userspace
control the halt-polling time, allowing halt-polling to be tuned or
disabled on particular VMs.
With dynamic halt-polling, a VM's VCPUs can poll from anywhere from
[0, halt_poll_ns] on each halt. KVM_CAP_HALT_POLL sets the
upper limit on the poll time.
Signed-off-by: David Matlack <dmatlack@google.com>
Signed-off-by: Jon Cargille <jcargill@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Message-Id: <20200417221446.108733-1-jcargill@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
For DPP, there's a need to receive multicast action frames,
but many drivers need a special filter configuration for this.
Support announcing from userspace in the management registration
that multicast RX is required, with an extended feature flag if
the driver handles this.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Reviewed-by: Sergey Matyukevich <sergey.matyukevich.os@quantenna.com>
Link: https://lore.kernel.org/r/20200417124013.c46238801048.Ib041d437ce0bff28a0c6d5dc915f68f1d8591002@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Extend cfg80211_rx_unprot_mlme_mgmt() to cover indication of unprotected
Beacon frames in addition to the previously used Deauthentication and
Disassociation frames. The Beacon frame case is quite similar, but has
couple of exceptions: this is used both with fully unprotected and also
incorrectly protected frames and there is a rate limit on the events to
avoid unnecessary flooding netlink events in case something goes wrong.
Signed-off-by: Jouni Malinen <jouni@codeaurora.org>
Link: https://lore.kernel.org/r/20200401142548.6990-1-jouni@codeaurora.org
[add missing kernel-doc]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The uuid_le mistakenly comes to be an UAPI type. Since it's luckily not used by
Hyper-V APIs, we may replace with POD types, i.e. __u8 array.
Note, previously shared uuid_be had been removed from UAPI few releases ago.
This is a continuation of that process towards removing uuid_le one.
Note, there is no ABI change!
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20200422131818.23088-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
kernel + tools/perf:
Alexey Budankov:
- Introduce CAP_PERFMON to kernel and user space.
callchains:
Adrian Hunter:
- Allow using Intel PT to synthesize callchains for regular events.
Kan Liang:
- Stitch LBR records from multiple samples to get deeper backtraces,
there are caveats, see the csets for details.
perf script:
Andreas Gerstmayr:
- Add flamegraph.py script
BPF:
Jiri Olsa:
- Synthesize bpf_trampoline/dispatcher ksymbol events.
perf stat:
Arnaldo Carvalho de Melo:
- Honour --timeout for forked workloads.
Stephane Eranian:
- Force error in fallback on :k events, to avoid counting nothing when
the user asks for kernel events but is not allowed to.
perf bench:
Ian Rogers:
- Add event synthesis benchmark.
tools api fs:
Stephane Eranian:
- Make xxx__mountpoint() more scalable
libtraceevent:
He Zhe:
- Handle return value of asprintf.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCXp2LlQAKCRCyPKLppCJ+
J95oAP0ZihVUhESv/gdeX0IDE5g6Rd2V6LNcRj+jb7gX9NlQkwD/UfS454WV1ftQ
qTwrkKPzY/5Tm2cLuVE7r7fJ6naDHgU=
=FHm4
-----END PGP SIGNATURE-----
Merge tag 'perf-core-for-mingo-5.8-20200420' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
Pull perf/core fixes and improvements from Arnaldo Carvalho de Melo:
kernel + tools/perf:
Alexey Budankov:
- Introduce CAP_PERFMON to kernel and user space.
callchains:
Adrian Hunter:
- Allow using Intel PT to synthesize callchains for regular events.
Kan Liang:
- Stitch LBR records from multiple samples to get deeper backtraces,
there are caveats, see the csets for details.
perf script:
Andreas Gerstmayr:
- Add flamegraph.py script
BPF:
Jiri Olsa:
- Synthesize bpf_trampoline/dispatcher ksymbol events.
perf stat:
Arnaldo Carvalho de Melo:
- Honour --timeout for forked workloads.
Stephane Eranian:
- Force error in fallback on :k events, to avoid counting nothing when
the user asks for kernel events but is not allowed to.
perf bench:
Ian Rogers:
- Add event synthesis benchmark.
tools api fs:
Stephane Eranian:
- Make xxx__mountpoint() more scalable
libtraceevent:
He Zhe:
- Handle return value of asprintf.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Some bug fixes.
Cleanup a couple of issues that surfaced meanwhile.
Disable vhost on ARM with OABI for now - to be fixed
fully later in the cycle or in the next release.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAl6d6ZgPHG1zdEByZWRo
YXQuY29tAAoJECgfDbjSjVRpH3oH/0bJ6o+FiAi8xXgYqm9XXmswrZoZLahjyPay
dA7Sz5nNKVtdSGH9o0wRdcekt0SOI3ilZSkv9nwt9ep/5YzC3brf2hry+nPvMTsA
MhI3IAa7sK1vCXkftwOlx+SIeDfIwsqr+h4SCfMRxlIT0yAmOC8fl2ByT2dIbqnj
dlzwczecHI9LPUEmRWiKH/4Tj5MPZN5IeFSIAE+nA/9cl5h4qVSfYtWD3Y4VQ82g
Rv3mvVE+chaVbPxewaBZ8Y0Avti4tMyzsE0MY+dz5xfh+75hqMfygg//1osbEAbz
SiL5dDcANe8Q+QOc/BxHdj4dqpqUp1ldV+3Lge9k4lWAGnsEMEk=
=GZb2
-----END PGP SIGNATURE-----
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio fixes and cleanups from Michael Tsirkin:
- Some bug fixes
- Cleanup a couple of issues that surfaced meanwhile
- Disable vhost on ARM with OABI for now - to be fixed fully later in
the cycle or in the next release.
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (24 commits)
vhost: disable for OABI
virtio: drop vringh.h dependency
virtio_blk: add a missing include
virtio-balloon: Avoid using the word 'report' when referring to free page hinting
virtio-balloon: make virtballoon_free_page_report() static
vdpa: fix comment of vdpa_register_device()
vdpa: make vhost, virtio depend on menu
vdpa: allow a 32 bit vq alignment
drm/virtio: fix up for include file changes
remoteproc: pull in slab.h
rpmsg: pull in slab.h
virtio_input: pull in slab.h
remoteproc: pull in slab.h
virtio-rng: pull in slab.h
virtgpu: pull in uaccess.h
tools/virtio: make asm/barrier.h self contained
tools/virtio: define aligned attribute
virtio/test: fix up after IOTLB changes
vhost: Create accessors for virtqueues private_data
vdpasim: Return status in vdpasim_get_status
...
This warning:
./include/uapi/linux/firewire-cdev.h:312: WARNING: Inline literal start-string without end-string.
is because %FOO doesn't work if there's a parenthesis at the
string (as a parenthesis may indicate a function). So, mark
the literal block using the alternate ``FOO`` syntax.
Acked-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/9b2501a41eba27ccdd4603cac2353c0efba7a90a.1586881715.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Several references got broken due to txt to ReST conversion.
Several of them can be automatically fixed with:
scripts/documentation-file-ref-check --fix
Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org> # hwtracing/coresight/Kconfig
Reviewed-by: Paul E. McKenney <paulmck@kernel.org> # memory-barrier.txt
Acked-by: Alex Shi <alex.shi@linux.alibaba.com> # translations/zh_CN
Acked-by: Federico Vaga <federico.vaga@vaga.pv.it> # translations/it_IT
Acked-by: Marc Zyngier <maz@kernel.org> # kvm/arm64
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/6f919ddb83a33b5f2a63b6b5f0575737bb2b36aa.1586881715.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
RFC 2863 defines the operational state testing. Add support for this
state, both as a IF_LINK_MODE_ and __LINK_STATE_.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are use-cases where user-space shouldn't be allowed to communicate
directly with a TEE device which is dedicated to provide a specific
service for a kernel client. So add a private login method for kernel
clients and disallow user-space to open-session using GP implementation
defined login method range: (0x80000000 - 0xBFFFFFFF).
Reviewed-by: Jerome Forissier <jerome@forissier.org>
Signed-off-by: Sumit Garg <sumit.garg@linaro.org>
Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>
Hi Linus,
Please, pull the following patches that replace zero-length arrays with
flexible-array members.
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:
struct foo {
int stuff;
struct boo array[];
};
By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.
Also, notice that, dynamic memory allocations won't be affected by
this change:
"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]
sizeof(flexible-array-member) triggers a warning because flexible array
members have incomplete type[1]. There are some instances of code in
which the sizeof operator is being incorrectly/erroneously applied to
zero-length arrays and the result is zero. Such instances may be hiding
some bugs. So, this work (flexible-array member convertions) will also
help to get completely rid of those sorts of issues.
Notice that all of these patches have been baking in linux-next for
quite a while now and, 238 more of these patches have already been
merged into 5.7-rc1.
There are a couple hundred more of these issues waiting to be addressed
in the whole codebase.
[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 7649773293 ("cxgb3/l2t: Fix undefined behaviour")
Thanks
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEkmRahXBSurMIg1YvRwW0y0cG2zEFAl6bccgACgkQRwW0y0cG
2zFYvBAAl5tsoZsb6h5o7+XpWetl2BfA8lRelXWg1la9mF+Zqgqz8raubs+EbR8f
65yz1lvoOl3jgeu1pQnx+AaDdG88Yu66BjPpFz/n8WWBjNC0z3M4Xcu+pFUanEzO
QqkCPryj6RlqCYL/WlSCifo+ZOAeM7jlw/2kkX1ILVwjYItFPJIw+5IEPrM0ucN2
tFp9H3iKOlA6PDuj4JO2xCnlUkL5aZk101qKqm41yZLLiS8zE8or4+s8Y7c7yDDP
ajQ+uCzJpt/VCn6Iyri0oZ5hp+gI6jJ8ox1Vo0UCuWQ2RJ7E2FE5qhhctwB4UYsg
+B6c1yckJqUoJ1c7Bbj00gsNMns3A7uLHFDOGBKQTjkRCn5+QV1wVvv5TJx2LJYL
EBt07IfS0YAv0EBIbJyxqzmWCt0unKCu3i1KePp/FYqq291dpr39olUMCa1+Qg98
v1VTGUlOvONy3v41tDx+Bfkt/0ebT8pogyenA51cjsD0bUZ3I/BsGxigXf0myLuy
6yFjx7f6ng2I3uBDSZ+H/KUM51H6yhB9UCQuQCSqHDU3iEHvh7dDdumD3A9OJyLw
nPC2HQhTOHVkbtg/E0KFh/ak1PoELCH3CR1Kgj/NSOG2Mz5tgtBfoxa+GwJTvJha
9m5JrBQcT7qF7pGtZU0NDQICrhhvUEX/Hwo3QAtYInWPsV3S+5U=
=GsIm
-----END PGP SIGNATURE-----
Merge tag 'flexible-array-member-5.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux
Pull flexible-array member conversion from Gustavo Silva:
"The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array
member[1][2], introduced in C99:
struct foo {
int stuff;
struct boo array[];
};
By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.
Also, notice that, dynamic memory allocations won't be affected by
this change:
"Flexible array members have incomplete type, and so the sizeof
operator may not be applied. As a quirk of the original
implementation of zero-length arrays, sizeof evaluates to zero."[1]
sizeof(flexible-array-member) triggers a warning because flexible
array members have incomplete type[1]. There are some instances of
code in which the sizeof operator is being incorrectly/erroneously
applied to zero-length arrays and the result is zero. Such instances
may be hiding some bugs. So, this work (flexible-array member
convertions) will also help to get completely rid of those sorts of
issues.
Notice that all of these patches have been baking in linux-next for
quite a while now and, 238 more of these patches have already been
merged into 5.7-rc1.
There are a couple hundred more of these issues waiting to be
addressed in the whole codebase"
[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 7649773293 ("cxgb3/l2t: Fix undefined behaviour")
* tag 'flexible-array-member-5.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux: (28 commits)
xattr.h: Replace zero-length array with flexible-array member
uapi: linux: fiemap.h: Replace zero-length array with flexible-array member
uapi: linux: dlm_device.h: Replace zero-length array with flexible-array member
tpm_eventlog.h: Replace zero-length array with flexible-array member
ti_wilink_st.h: Replace zero-length array with flexible-array member
swap.h: Replace zero-length array with flexible-array member
skbuff.h: Replace zero-length array with flexible-array member
sched: topology.h: Replace zero-length array with flexible-array member
rslib.h: Replace zero-length array with flexible-array member
rio.h: Replace zero-length array with flexible-array member
posix_acl.h: Replace zero-length array with flexible-array member
platform_data: wilco-ec.h: Replace zero-length array with flexible-array member
memcontrol.h: Replace zero-length array with flexible-array member
list_lru.h: Replace zero-length array with flexible-array member
lib: cpu_rmap: Replace zero-length array with flexible-array member
irq.h: Replace zero-length array with flexible-array member
ihex.h: Replace zero-length array with flexible-array member
igmp.h: Replace zero-length array with flexible-array member
genalloc.h: Replace zero-length array with flexible-array member
ethtool.h: Replace zero-length array with flexible-array member
...
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:
struct foo {
int stuff;
struct boo array[];
};
By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.
Also, notice that, dynamic memory allocations won't be affected by
this change:
"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]
This issue was found with the help of Coccinelle.
[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 7649773293 ("cxgb3/l2t: Fix undefined behaviour")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:
struct foo {
int stuff;
struct boo array[];
};
By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.
Also, notice that, dynamic memory allocations won't be affected by
this change:
"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]
This issue was found with the help of Coccinelle.
[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 7649773293 ("cxgb3/l2t: Fix undefined behaviour")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
It can be confusing to have multiple features within the same driver that
are using the same verbage. As such this patch is creating a union of
free_page_report_cmd_id with free_page_hint_cmd_id so that we can clean-up
the userspace code a bit in terms of readability while maintaining the
functionality of legacy code.
Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Link: https://lore.kernel.org/r/20200415174318.13597.99753.stgit@localhost.localdomain
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Pull networking fixes from David Miller:
1) Disable RISCV BPF JIT builds when !MMU, from Björn Töpel.
2) nf_tables leaves dangling pointer after free, fix from Eric Dumazet.
3) Out of boundary write in __xsk_rcv_memcpy(), fix from Li RongQing.
4) Adjust icmp6 message source address selection when routes have a
preferred source address set, from Tim Stallard.
5) Be sure to validate HSR protocol version when creating new links,
from Taehee Yoo.
6) CAP_NET_ADMIN should be sufficient to manage l2tp tunnels even in
non-initial namespaces, from Michael Weiß.
7) Missing release firmware call in mlx5, from Eran Ben Elisha.
8) Fix variable type in macsec_changelink(), caught by KASAN. Fix from
Taehee Yoo.
9) Fix pause frame negotiation in marvell phy driver, from Clemens
Gruber.
10) Record RX queue early enough in tun packet paths such that XDP
programs will see the correct RX queue index, from Gilberto Bertin.
11) Fix double unlock in mptcp, from Florian Westphal.
12) Fix offset overflow in ARM bpf JIT, from Luke Nelson.
13) marvell10g needs to soft reset PHY when coming out of low power
mode, from Russell King.
14) Fix MTU setting regression in stmmac for some chip types, from
Florian Fainelli.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (101 commits)
amd-xgbe: Use __napi_schedule() in BH context
mISDN: make dmril and dmrim static
net: stmmac: dwmac-sunxi: Provide TX and RX fifo sizes
net: dsa: mt7530: fix tagged frames pass-through in VLAN-unaware mode
tipc: fix incorrect increasing of link window
Documentation: Fix tcp_challenge_ack_limit default value
net: tulip: make early_486_chipsets static
dt-bindings: net: ethernet-phy: add desciption for ethernet-phy-id1234.d400
ipv6: remove redundant assignment to variable err
net/rds: Use ERR_PTR for rds_message_alloc_sgs()
net: mscc: ocelot: fix untagged packet drops when enslaving to vlan aware bridge
selftests/bpf: Check for correct program attach/detach in xdp_attach test
libbpf: Fix type of old_fd in bpf_xdp_set_link_opts
libbpf: Always specify expected_attach_type on program load if supported
xsk: Add missing check on user supplied headroom size
mac80211: fix channel switch trigger from unknown mesh peer
mac80211: fix race in ieee80211_register_hw()
net: marvell10g: soft-reset the PHY when coming out of low power
net: marvell10g: report firmware version
net/cxgb4: Check the return from t4_query_params properly
...
Introduce the CAP_PERFMON capability designed to secure system
performance monitoring and observability operations so that CAP_PERFMON
can assist CAP_SYS_ADMIN capability in its governing role for
performance monitoring and observability subsystems.
CAP_PERFMON hardens system security and integrity during performance
monitoring and observability operations by decreasing attack surface that
is available to a CAP_SYS_ADMIN privileged process [2]. Providing the access
to system performance monitoring and observability operations under CAP_PERFMON
capability singly, without the rest of CAP_SYS_ADMIN credentials, excludes
chances to misuse the credentials and makes the operation more secure.
Thus, CAP_PERFMON implements the principle of least privilege for
performance monitoring and observability operations (POSIX IEEE 1003.1e:
2.2.2.39 principle of least privilege: A security design principle that
states that a process or program be granted only those privileges
(e.g., capabilities) necessary to accomplish its legitimate function,
and only for the time that such privileges are actually required)
CAP_PERFMON meets the demand to secure system performance monitoring and
observability operations for adoption in security sensitive, restricted,
multiuser production environments (e.g. HPC clusters, cloud and virtual compute
environments), where root or CAP_SYS_ADMIN credentials are not available to
mass users of a system, and securely unblocks applicability and scalability
of system performance monitoring and observability operations beyond root
and CAP_SYS_ADMIN use cases.
CAP_PERFMON takes over CAP_SYS_ADMIN credentials related to system performance
monitoring and observability operations and balances amount of CAP_SYS_ADMIN
credentials following the recommendations in the capabilities man page [1]
for CAP_SYS_ADMIN: "Note: this capability is overloaded; see Notes to kernel
developers, below." For backward compatibility reasons access to system
performance monitoring and observability subsystems of the kernel remains
open for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN capability
usage for secure system performance monitoring and observability operations
is discouraged with respect to the designed CAP_PERFMON capability.
Although the software running under CAP_PERFMON can not ensure avoidance
of related hardware issues, the software can still mitigate these issues
following the official hardware issues mitigation procedure [2]. The bugs
in the software itself can be fixed following the standard kernel development
process [3] to maintain and harden security of system performance monitoring
and observability operations.
[1] http://man7.org/linux/man-pages/man7/capabilities.7.html
[2] https://www.kernel.org/doc/html/latest/process/embargoed-hardware-issues.html
[3] https://www.kernel.org/doc/html/latest/admin-guide/security-bugs.html
Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
Acked-by: James Morris <jamorris@linux.microsoft.com>
Acked-by: Serge E. Hallyn <serge@hallyn.com>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Igor Lubashev <ilubashe@akamai.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: intel-gfx@lists.freedesktop.org
Cc: linux-doc@vger.kernel.org
Cc: linux-man@vger.kernel.org
Cc: linux-security-module@vger.kernel.org
Cc: selinux@vger.kernel.org
Link: http://lore.kernel.org/lkml/5590d543-82c6-490a-6544-08e6a5517db0@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAl6VskEACgkQxWXV+ddt
WDurrhAAhkxlh6yrdZqr753DcpdEVAQhyHDsJ66GAKWuW8sn7ypTiZhNgKxvEGuz
UhwtXTlzZ7K9/h3TsVeih2iqEj6oc8ick+Th+Wf/7s0jhUXDcWi2OqBjTnIiH2Za
efrwGMiOEAHYqQ7tHjEbZiJGcQ2tE7+2Le4g3aFnv/kRT0jXDikzLTa/viMG73k5
9llSm+GJYl2KQNcUPmxGKrwwiiV5c5xNCGuEuY4lw+3OVn1QU4rayZDB/5GxZ/nC
72Efl9CxoDunBviys2NWxYTt/Ts3R/+yhnGX0kM6BovkN0bo1pA7HuWkADqYPnNN
r8z8X/zFYi7jZBwpPq4alcHW2IaMC7UEseEyZHlj9ce8pK8MnHFlBtfBcUzbvFl5
Wtt23AvAZ9CiQ40Sf5UBt6pliUQhr/BpBz88jatZ619ij1GLxeO++I5bIz3/YFQH
UEP7okhoqpxgKLFGRcpxkw0ggOipp7isFyfss2qaRMPebmNMKnuuUoEy5BDlHs2f
ewxbyuSUVXVBJMB4R6u77Nk5KLrTO67kfiCROaVKkzhYDESpbB4Trdl+kvzPSFb6
p3NYpJoGnkOKngG/vg5MoQGOp1oi4h3RH2Ck1Yes7jmBgYLSCQokCUXkm52PGfId
25P45yOzwS4W7sVFXsR3rygpexXlcNAIGG+2xtiw/AyFIQo5AZ4=
=pkZ2
-----END PGP SIGNATURE-----
Merge tag 'for-5.7-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"We have a few regressions and one fix for stable:
- revert fsync optimization
- fix lost i_size update
- fix a space accounting leak
- build fix, add back definition of a deprecated ioctl flag
- fix search condition for old roots in relocation"
* tag 'for-5.7-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: re-instantiate the removed BTRFS_SUBVOL_CREATE_ASYNC definition
btrfs: fix reclaim counter leak of space_info objects
btrfs: make full fsyncs always operate on the entire file again
btrfs: fix lost i_size update after cloning inline extent
btrfs: check commit root generation in should_ignore_root
The commit 9c1036fdb1 ("btrfs: Remove BTRFS_SUBVOL_CREATE_ASYNC
support") breaks strace build with the kernel headers from git:
btrfs.c: In function "btrfs_test_subvol_ioctls":
btrfs.c:531:23: error: "BTRFS_SUBVOL_CREATE_ASYNC" undeclared (first use
in this function)
vol_args_v2.flags = BTRFS_SUBVOL_CREATE_ASYNC;
Moreover, it is improper to break UAPI, strace uses the definitions to
decode ioctls that are considered part of public API.
Restore the macro definition and put it under "#ifndef __KERNEL__"
in order to prevent inadvertent in-kernel usage.
Fixes: 9c1036fdb1 ("btrfs: Remove BTRFS_SUBVOL_CREATE_ASYNC support")
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Eugene Syromiatnikov <esyr@redhat.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Including:
- ARM-SMMU support for the TLB range invalidation command in
SMMUv3.2.
- ARM-SMMU introduction of command batching helpers to batch up
CD and ATC invalidation.
- ARM-SMMU support for PCI PASID, along with necessary PCI
symbol exports.
- Introduce a generic (actually rename an existing) IOMMU
related pointer in struct device and reduce the IOMMU related
pointers.
- Some fixes for the OMAP IOMMU driver to make it build on 64bit
architectures.
- Various smaller fixes and improvements.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAl6MmlQACgkQK/BELZcB
GuP9ug//QtyPYRYdO4ltD6mPvfB7V0qdXksJz+ZVbPOMvqUs1jr1FVYFH1HkOVu5
mFD6OJuQQJrrGukXMERyVgDhUqNr+xHrkGS+X67NrOkUrguyvUfLYSU/GmOH/kdk
w1Smp7pTcHHAMmxGyQWTSFa9jSxKes5ZYBo065Z3/SlcIcTTkbw7V87N3RPrlnCX
s/K7CtSGnKJMpL9DVuNH27eqGlfiuIrQhj/vTQVSn1nF7TjaGKXaRXj+3fcUgrIt
KAfflWiTJncMY6WLjz65iiUtUvgA2Mmgn3CKJnWjgECd70+NybLQ9OAvQO+A2H6s
8XO9DsOOe8HFq/ljev1JGSw5LgB5Ip1RtSk7Ost6mkUFzLlaeTBJFQeHbECI9dne
hksRYL4R8bwiQu+MkQe7HLa6TDb+asqjsayIO3M1oIpF+8mIz/oNOGCeP0cqSiuj
lVMnblAWatrsZrf+AlxZKddIJWiduXoTjtpV64HTTvZeL4/g3kY0ykBXpS4xLj5V
s0KvR6kjR1LYUgpe9jJ3CJTdIlU4MzSlrtq4CYFZvRa7rBLmk2cGsR1jiA3GTGpn
bcqOQNgb5X1mpAzmOZb//pbjozgvCjQpQexyU4tRzs38yk+TK5OnOe5z4M1srHPY
7dTZoUEpAcRm4K+JFQ3+yOtxRTsINYyFUL/Qt8ALbWy4hXluRGY=
=nhuS
-----END PGP SIGNATURE-----
Merge tag 'iommu-updates-v5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull iommu updates from Joerg Roedel:
- ARM-SMMU support for the TLB range invalidation command in SMMUv3.2
- ARM-SMMU introduction of command batching helpers to batch up CD and
ATC invalidation
- ARM-SMMU support for PCI PASID, along with necessary PCI symbol
exports
- Introduce a generic (actually rename an existing) IOMMU related
pointer in struct device and reduce the IOMMU related pointers
- Some fixes for the OMAP IOMMU driver to make it build on 64bit
architectures
- Various smaller fixes and improvements
* tag 'iommu-updates-v5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (39 commits)
iommu: Move fwspec->iommu_priv to struct dev_iommu
iommu/virtio: Use accessor functions for iommu private data
iommu/qcom: Use accessor functions for iommu private data
iommu/mediatek: Use accessor functions for iommu private data
iommu/renesas: Use accessor functions for iommu private data
iommu/arm-smmu: Use accessor functions for iommu private data
iommu/arm-smmu: Refactor master_cfg/fwspec usage
iommu/arm-smmu-v3: Use accessor functions for iommu private data
iommu: Introduce accessors for iommu private data
iommu/arm-smmu: Fix uninitilized variable warning
iommu: Move iommu_fwspec to struct dev_iommu
iommu: Rename struct iommu_param to dev_iommu
iommu/tegra-gart: Remove direct access of dev->iommu_fwspec
drm/msm/mdp5: Remove direct access of dev->iommu_fwspec
ACPI/IORT: Remove direct access of dev->iommu_fwspec
iommu: Define dev_iommu_fwspec_get() for !CONFIG_IOMMU_API
iommu/virtio: Reject IOMMU page granule larger than PAGE_SIZE
iommu/virtio: Fix freeing of incomplete domains
iommu/virtio: Fix sparse warning
iommu/vt-d: Add build dependency on IOASID
...
Some bug fixes.
The new vdpa subsystem with two first drivers.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAl6MS7wPHG1zdEByZWRo
YXQuY29tAAoJECgfDbjSjVRpGp8H/2H49Gya1cfVbGU13qgmBSQqQXC8hS3iNLuG
ltRgU+jafJT//kvkdm3/DUzfK3eRUWUfqZLKEbAQDtMY0OGHi/KGEBYVLDde7Zxt
Lg4VnwBhkYDR/f01ZZDbHxzj9JAr83i28nILjLIqf3a1BX4zf203+ZE0/JM8a7wL
dOPoH7NAfyz5ul2F67bR1IOF8vC6TidpavzR2+HC/MocHYXb6Bgfvt+i4EcrfuMf
9lnBfajgklKr9sNJniwvvR1pWVg+YyG3VeC6T8tIC/xzbCmIoNT+5b3q2XPSIHq1
EuQTeXH9CBFXS0qcFlq2ktR1xd1Lx95hKwZpqLwLFDmfgjhV2QU=
=/84P
-----END PGP SIGNATURE-----
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio updates from Michael Tsirkin:
- Some bug fixes
- The new vdpa subsystem with two first drivers
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
virtio-balloon: Revert "virtio-balloon: Switch back to OOM handler for VIRTIO_BALLOON_F_DEFLATE_ON_OOM"
vdpa: move to drivers/vdpa
virtio: Intel IFC VF driver for VDPA
vdpasim: vDPA device simulator
vhost: introduce vDPA-based backend
virtio: introduce a vDPA based transport
vDPA: introduce vDPA bus
vringh: IOTLB support
vhost: factor out IOTLB
vhost: allow per device message handler
vhost: refine vhost and vringh kconfig
virtio-balloon: Switch back to OOM handler for VIRTIO_BALLOON_F_DEFLATE_ON_OOM
virtio-net: Introduce hash report feature
virtio-net: Introduce RSS receive steering feature
virtio-net: Introduce extended RSC feature
tools/virtio: option to build an out of tree module
Pull input updates from Dmitry Torokhov:
"An update to the Goodix touchscreen driver to enable it work properly
on various Bay Trail and Cherry Trail devices, and a few other
assorted changes"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (26 commits)
Input: update SPDX tag for input-event-codes.h
Input: i8042 - add Acer Aspire 5738z to nomux list
Input: goodix - fix compilation when ACPI support is disabled
dt-bindings: touchscreen: Convert edt-ft5x06 to json-schema
Input: of_touchscreen - explicitly choose axis
Input: goodix - support gt9147 touchpanel
dt-bindings: touchscreen: goodix: support of gt9147
Input: goodix - add support for Goodix GT917S
Input: goodix - use string-based chip ID
dt-bindings: input: touchscreen: add compatible string for Goodix GT917S
Input: goodix - add support for more then one touch-key
Input: goodix - fix spurious key release events
Input: goodix - try to reset the controller if the i2c-test fails
Input: goodix - restore config on resume if necessary
Input: goodix - make goodix_send_cfg() take a raw buffer as argument
Input: goodix - add minimum firmware size check
Input: goodix - save a copy of the config from goodix_read_config()
Input: goodix - move defines to above struct goodix_ts_data declaration
Input: goodix - add support for controlling the IRQ pin through ACPI methods
Input: goodix - add support for getting IRQ + reset GPIOs on Bay Trail devices
...
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net, they are:
1) Fix spurious overlap condition in the rbtree tree, from Stefano Brivio.
2) Fix possible uninitialized pointer dereference in nft_lookup.
3) IDLETIMER v1 target matches the Android layout, from
Maciej Zenczykowski.
4) Dangling pointer in nf_tables_set_alloc_name, from Eric Dumazet.
5) Fix RCU warning splat in ipset find_set_type(), from Amol Grover.
6) Report EOPNOTSUPP on unsupported set flags and object types in sets.
7) Add NFT_SET_CONCAT flag to provide consistent error reporting
when users defines set with ranges in concatenations in old kernels.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Merge more updates from Andrew Morton:
- a lot more of MM, quite a bit more yet to come: (memcg, pagemap,
vmalloc, pagealloc, migration, thp, ksm, madvise, virtio,
userfaultfd, memory-hotplug, shmem, rmap, zswap, zsmalloc, cleanups)
- various other subsystems (procfs, misc, MAINTAINERS, bitops, lib,
checkpatch, epoll, binfmt, kallsyms, reiserfs, kmod, gcov, kconfig,
ubsan, fault-injection, ipc)
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (158 commits)
ipc/shm.c: make compat_ksys_shmctl() static
ipc/mqueue.c: fix a brace coding style issue
lib/Kconfig.debug: fix a typo "capabilitiy" -> "capability"
ubsan: include bug type in report header
kasan: unset panic_on_warn before calling panic()
ubsan: check panic_on_warn
drivers/misc/lkdtm/bugs.c: add arithmetic overflow and array bounds checks
ubsan: split "bounds" checker from other options
ubsan: add trap instrumentation option
init/Kconfig: clean up ANON_INODES and old IO schedulers options
kernel/gcov/fs.c: replace zero-length array with flexible-array member
gcov: gcc_3_4: replace zero-length array with flexible-array member
gcov: gcc_4_7: replace zero-length array with flexible-array member
kernel/kmod.c: fix a typo "assuems" -> "assumes"
reiserfs: clean up several indentation issues
kallsyms: unexport kallsyms_lookup_name() and kallsyms_on_each_symbol()
samples/hw_breakpoint: drop use of kallsyms_lookup_name()
samples/hw_breakpoint: drop HW_BREAKPOINT_R when reporting writes
fs/binfmt_elf.c: don't free interpreter's ELF pheaders on common path
fs/binfmt_elf.c: allocate less for static executable
...
- New mode for time travel, external via virtio
- Fixes for ubd to make sure no requests can get lost
- Fixes for vector networking
- Allow CONFIG_STATIC_LINK only when possible
- Minor cleanups and fixes
-----BEGIN PGP SIGNATURE-----
iQJKBAABCAA0FiEEdgfidid8lnn52cLTZvlZhesYu8EFAl6MbGYWHHJpY2hhcmRA
c2lnbWEtc3Rhci5hdAAKCRBm+VmF6xi7wSY2D/4k1kb3A5pZ6OEXCkKmRU63j0RC
na0bsa4lztMuABgOWKXP09cqL2ZhJ1rVVRUMV7jgVFKj7rKkJHHGHgdBeEkXOcb8
skOVxln1X/i3T9q9QQ4ofkSk0U8gHCZA3pqrn7TFI9ZmrosOUYwhQKkqcNHvSfPc
XEjKUx1GCS+wA0mw5yLyDZqDGkZgMNSmNezR7Oq3EB9wi8K2n6Racn6//S/uqiS6
I8HHE7R2ci0YfflP+xE8i1qg8/TY2wj2oCP33b9o/XefyyNSndVj7KQUI3KRBmSh
M0k2sbOqegVzSH/l5YFIZ7zbDcqkYeGWopPIuYWo3en7ZmfJfP2KD31c8gPOuElC
HuUvQyS1VDpLn6JBa8Y456e8IrKl/QquXfZDc2qG5HYTR6g9nv9y8VNtx4dSQ+sB
AfgErKofx7x2JQNRfg+0BYKgw/MawGAjiSZm5qVNfvFM3YDWZSUZ9gEAcX6qto/z
P+66Zrhatdt9TaQdy9vbQKDWSJk9ood2mQYU0JJSfzgsotWslyvCsc6ANtwfkc7R
sLxnsa6EA7CYogbMJ7wRxD5spCNZrRZvepHhe5uft/nWG/qGM1jy7Vk16Or03sVH
sScIp6m+yDyhhEjJOT8Mq6WbM3mIfILMb42FyDJQIpJ9JcXSxzbiZu7RSK38yoEG
+WYGOYdTGgzxIWsRmQ==
=WVcL
-----END PGP SIGNATURE-----
Merge tag 'for-linus-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml
Pull UML updates from Richard Weinberger:
- New mode for time travel, external via virtio
- Fixes for ubd to make sure no requests can get lost
- Fixes for vector networking
- Allow CONFIG_STATIC_LINK only when possible
- Minor cleanups and fixes
* tag 'for-linus-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
um: Remove some unnecessary NULL checks in vector_user.c
um: vector: Avoid NULL ptr deference if transport is unset
um: Make CONFIG_STATIC_LINK actually static
um: Implement cpu_relax() as ndelay(1) for time-travel
um: Implement ndelay/udelay in time-travel mode
um: Implement time-travel=ext
um: virtio: Implement VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS
um: time-travel: Rewrite as an event scheduler
um: Move timer-internal.h to non-shared
hostfs: Use kasprintf() instead of fixed buffer formatting
um: falloc.h needs to be directly included for older libc
um: ubd: Retry buffer read on any kind of error
um: ubd: Prevent buffer overrun on command completion
um: Fix overlapping ELF segments when statically linked
um: Delete never executed timer
um: Don't overwrite ethtool driver version
um: Fix len of file in create_pid_file
um: Don't use console_drivers directly
um: Cleanup CONFIG_IOSCHED_CFQ
Introduce the new uffd-wp APIs for userspace.
Firstly, we'll allow to do UFFDIO_REGISTER with write protection tracking
using the new UFFDIO_REGISTER_MODE_WP flag. Note that this flag can
co-exist with the existing UFFDIO_REGISTER_MODE_MISSING, in which case the
userspace program can not only resolve missing page faults, and at the
same time tracking page data changes along the way.
Secondly, we introduced the new UFFDIO_WRITEPROTECT API to do page level
write protection tracking. Note that we will need to register the memory
region with UFFDIO_REGISTER_MODE_WP before that.
[peterx@redhat.com: write up the commit message]
[peterx@redhat.com: remove useless block, write commit message, check against
VM_MAYWRITE rather than VM_WRITE when register]
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Martin Cracauer <cracauer@cons.org>
Cc: Marty McFadden <mcfadden8@llnl.gov>
Cc: Maya Gokhale <gokhale2@llnl.gov>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Shaohua Li <shli@fb.com>
Link: http://lkml.kernel.org/r/20200220163112.11409-14-peterx@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add support for the page reporting feature provided by virtio-balloon.
Reporting differs from the regular balloon functionality in that is is
much less durable than a standard memory balloon. Instead of creating a
list of pages that cannot be accessed the pages are only inaccessible
while they are being indicated to the virtio interface. Once the
interface has acknowledged them they are placed back into their respective
free lists and are once again accessible by the guest system.
Unlike a standard balloon we don't inflate and deflate the pages. Instead
we perform the reporting, and once the reporting is completed it is
assumed that the page has been dropped from the guest and will be faulted
back in the next time the page is accessed.
Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Nitesh Narayan Lal <nitesh@redhat.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Pankaj Gupta <pagupta@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Wang <wei.w.wang@intel.com>
Cc: Yang Zhang <yang.zhang.wz@gmail.com>
Cc: wei qi <weiqi4@huawei.com>
Link: http://lkml.kernel.org/r/20200211224657.29318.68624.stgit@localhost.localdomain
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Stefano originally proposed to introduce this flag, users hit EOPNOTSUPP
in new binaries with old kernels when defining a set with ranges in
a concatenation.
Fixes: f3a2181e16 ("netfilter: nf_tables: Support for sets with multiple ranged fields")
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAl6LFJ0ACgkQnJ2qBz9k
QNkSUQgAzwaescnHeVTF7/Zg9Uj2xrfrTJZ1E+Mn9qnd/0/z/asVV+RKfY7Gnu7h
g19inDI4ZESFz2gWz4jwJD1c2/yMZb8vnae4ye3dtCv2yjG/0JxCeue6vjwsWqmO
4jbSgk8YNQqzwEFVMzNp43ZJr3CFooLCIsJcL8q4yYk8Kt4pDUPmQ1vBvAc6k9vK
BKMBvp926tbomP27nq0n0CjvHy7ipDGMl4H6i4vBxHRfbDPih2x9VEklK3JatC1n
4AKS6IYJrkZVdOjli+DrResbcWxyT4db5tPio5MU0RDnVhNZT2cHyNVXf5EpRJqP
72pa7gfPu1Rx1+tU8bDR/daSveou2A==
=fkCV
-----END PGP SIGNATURE-----
Merge tag 'fsnotify_for_v5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull fsnotify updates from Jan Kara:
"This implements the fanotify FAN_DIR_MODIFY event.
This event reports the name in a directory under which a change
happened and together with the directory filehandle and fstatat()
allows reliable and efficient implementation of directory
synchronization"
* tag 'fsnotify_for_v5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
fanotify: Fix the checks in fanotify_fsid_equal
fanotify: report name info for FAN_DIR_MODIFY event
fanotify: record name info for FAN_DIR_MODIFY event
fanotify: Drop fanotify_event_has_fid()
fanotify: prepare to report both parent and child fid's
fanotify: send FAN_DIR_MODIFY event flavor with dir inode and name
fanotify: divorce fanotify_path_event and fanotify_fid_event
fanotify: Store fanotify handles differently
fanotify: Simplify create_fd()
fanotify: fix merging marks masks with FAN_ONDIR
fanotify: merge duplicate events on parent and child
fsnotify: replace inode pointer with an object id
fsnotify: simplify arguments passing to fsnotify_parent()
fsnotify: use helpers to access data by data_type
fsnotify: funnel all dirent events through fsnotify_name()
fsnotify: factor helpers fsnotify_dentry() and fsnotify_file()
fsnotify: tidy up FS_ and FAN_ constants
Android has long had an extension to IDLETIMER to send netlink
messages to userspace, see:
https://android.googlesource.com/kernel/common/+/refs/heads/android-mainline/include/uapi/linux/netfilter/xt_IDLETIMER.h#42
Note: this is idletimer target rev 1, there is no rev 0 in
the Android common kernel sources, see registration at:
https://android.googlesource.com/kernel/common/+/refs/heads/android-mainline/net/netfilter/xt_IDLETIMER.c#483
When we compare that to upstream's new idletimer target rev 1:
https://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next.git/tree/include/uapi/linux/netfilter/xt_IDLETIMER.h#n46
We immediately notice that these two rev 1 structs are the
same size and layout, and that while timer_type and send_nl_msg
are differently named and serve a different purpose, they're
at the same offset.
This makes them impossible to tell apart - and thus one cannot
know in a mixed Android/vanilla environment whether one means
timer_type or send_nl_msg.
Since this is iptables/netfilter uapi it introduces a problem
between iptables (vanilla vs Android) userspace and kernel
(vanilla vs Android) if the two don't match each other.
Additionally when at some point in the future Android picks up
5.7+ it's not at all clear how to resolve the resulting merge
conflict.
Furthermore, since upgrading the kernel on old Android phones
is pretty much impossible there does not seem to be an easy way
out of this predicament.
The only thing I've been able to come up with is some super
disgusting kernel version >= 5.7 check in the iptables binary
to flip between different struct layouts.
By adding a dummy field to the vanilla Linux kernel header file
we can force the two structs to be compatible with each other.
Long term I think I would like to deprecate send_nl_msg out of
Android entirely, but I haven't quite been able to figure out
exactly how we depend on it. It seems to be very similar to
sysfs notifications but with some extra info.
Currently it's actually always enabled whenever Android uses
the IDLETIMER target, so we could also probably entirely
remove it from the uapi in favour of just always enabling it,
but again we can't upgrade old kernels already in the field.
(Also note that this doesn't change the structure's size,
as it is simply fitting into the pre-existing padding, and
that since 5.7 hasn't been released yet, there's still time
to make this uapi visible change)
Cc: Manoj Basapathi <manojbm@codeaurora.org>
Cc: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
core:
- Support for cgroup tracking in samples to allow cgroup based
analysis
tools:
- Support for cgroup analysis
- Commandline option and hotkey for perf top to change the sort order
- A set of fixes all over the place
- Various build system related improvements
- Updates of the X86 pmu event JSON data
- Documentation updates
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl6J2iATHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoXMEEACy6WaNabX7EzkLUnX0WCgxnZVSryNR
4EnxLJSg5lChEe4q2mOS3mRMpzlXHQieWcFxlwda7kjIbFlgQvjJQUiYlAvxRRO/
giY/GwCtTi/Flcb+7wKxTgMYmtAUOZDWeQbBZUlFLi9vyeCHVkjget9EyVsgbe/W
EmRsrPuKOVMUTeEwm3zpIE051DObpiWLNge++My70q5W/yNsS94PbNydgKO7osqk
pX37YVyBFpI2IQxMGzaE3yK7OxXRjYljZaz1tONFDMEYOX9gmxpDsCCflsP1ZOzL
4/P4faRvAOElwxtYBelKmRl8eboqhRpTEK0Et0TI0LYbUZrE2nisDi0LTKPWQb0k
Om2Qi6AfZs67PVzx9htlx6rfee72+sUluz5BDKOGH0pNJ8CFy70ns8InLsZqbBZ7
SgFVNjx6bHxB58VuVE9WEzr/KVs6zI/SuJlH7WG7FLXbm5j0cETfCjg40JlDpSNh
CZs+Epky1zyytrVJ9Gc7KnRlw8VB2eWEQ2cQ0sqj5w7WxhfrnsCmQf1zk4sofhOF
iz3Dvb9Llz/pYWGZbEiQAuI+8bo0psJptidzpmpbIXs/woKDhW49w1FxqowJQMWe
+lGWcauSfo3gjgEnTkOWAx3yiH4i9rvRChX8Jh0Z07d6Kwf19YYfxcuhlkx0Wutj
eKaaErWDtWAxpA==
=2UUD
-----END PGP SIGNATURE-----
Merge tag 'perf-urgent-2020-04-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull more perf updates from Thomas Gleixner:
"Perf updates all over the place:
core:
- Support for cgroup tracking in samples to allow cgroup based
analysis
tools:
- Support for cgroup analysis
- Commandline option and hotkey for perf top to change the sort order
- A set of fixes all over the place
- Various build system related improvements
- Updates of the X86 pmu event JSON data
- Documentation updates"
* tag 'perf-urgent-2020-04-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (55 commits)
perf python: Fix clang detection to strip out options passed in $CC
perf tools: Support Python 3.8+ in Makefile
perf script: Fix invalid read of directory entry after closedir()
perf script report: Fix SEGFAULT when using DWARF mode
perf script: add -S/--symbols documentation
perf pmu-events x86: Use CPU_CLK_UNHALTED.THREAD in Kernel_Utilization metric
perf events parser: Add missing Intel CPU events to parser
perf script: Allow --symbol to accept hexadecimal addresses
perf report/top TUI: Fix title line formatting
perf top: Support hotkey to change sort order
perf top: Support --group-sort-idx to change the sort order
perf symbols: Fix arm64 gap between kernel start and module end
perf build-test: Honour JOBS to override detection of number of cores
perf script: Add --show-cgroup-events option
perf top: Add --all-cgroups option
perf record: Add --all-cgroups option
perf record: Support synthesizing cgroup events
perf report: Add 'cgroup' sort key
perf cgroup: Maintain cgroup hierarchy
perf tools: Basic support for CGROUP event
...
Subsystem:
- The rtc_time_to_tm and rtc_tm_to_time wrappers have finally been removed and
only the 64bit version remain.
- hctosys now works with drivers compiled as modules
New driver:
- MediaTek MT2712 SoC based RTC
Drivers:
- set range for 88pm860x, au1xxx, cpcap, da9052, davinci, ds1305, ds1374,
mcp5121, pl030, pl031, pm8xxx, puv3, sa1100, sirfsoc, starfire, sun6i
- ds1307: DS1388 oscillator failure detection and watchdog support
- jz4740: JZ4760 support
- pcf85063: clock out pin support
- sun6i: external 32k oscillator is now optional, the range is now handled by
the core, providing a solution for 2034.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEycoQi/giopmpPgB12wIijOdRNOUFAl6IgdAACgkQ2wIijOdR
NOVH9Q//evfZmRDQb9Lm8YuSOt+U6lBaCuvRP8H4s7i02jsw0tt1oT6+rGo+1N+k
pi64+iwPqDj7h+eds06jVur7yWZDNNfCSywlMU8ODGLQcc/hXMTvaIeJtdXG+I+7
dpIVTs9Artiey2Qb3qZalSMYzrE2rQG6Yuyb/tHx7Pohvje1wKnAE3T64W6sfvir
zjAwAr/+s9n82mjGtGLapz0onao7dSN4p6b3UIkph9KxPdeHjItd7cRiJuzioysj
LE8WRUek++uha32PzghK9MoUpRKf8PJtGbsAIRhdwvmwub1Pqs6Uvk5IHQvdiFZx
af1Qnw/4qUH03HVUuq8VdEkwa8mdthcZ72b8uzUWOqP+uPkeTTlQOGOd1dRECijK
Qic+SKlEznwgKC9WuDufUQg5i2WjFBnBlA+dCE12/t91SbSlZdYjb53nQLHve1b9
e4VfyKnS5Ghn/dfizzzSPdcWuPW1rqq2dHTiOvFVqlLn1Fkt1AKaJzGAeCe6PWp4
AJVqPu+D1eQ/qmvEPXI7dQpCxdI+p6FNJgQvQLMssaq7mpf2fiRoLx23ikFG0v0k
BeNkNWpLz2ACwHpsOa/RWLwa/FC2+KBHaRWf+As7te63byeDQC8hUEi7F5K78lSS
yhfZ8SJR04SIIGKjuwAzPmaaIJ0hJT+eaxkFAJU7P5hD7KElsuM=
=6Veu
-----END PGP SIGNATURE-----
Merge tag 'rtc-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux
Pull RTC updates from Alexandre Belloni:
"More cleanup this cycle, with the final goal of removing the
rtc_time_to_tm and rtc_tm_to_time wrappers. All the drivers that have
been modified for this now are ready for the end of times (whether it
happens in 2033, 2038, 2106, 2127 or even 4052). There is also a
single new driver and the usual fixes and features.
Summary:
Subsystem:
- The rtc_time_to_tm and rtc_tm_to_time wrappers have finally been
removed and only the 64bit version remain.
- hctosys now works with drivers compiled as modules
New driver:
- MediaTek MT2712 SoC based RTC
Drivers:
- set range for 88pm860x, au1xxx, cpcap, da9052, davinci, ds1305,
ds1374, mcp5121, pl030, pl031, pm8xxx, puv3, sa1100, sirfsoc,
starfire, sun6i
- ds1307: DS1388 oscillator failure detection and watchdog support
- jz4740: JZ4760 support
- pcf85063: clock out pin support
- sun6i: external 32k oscillator is now optional, the range is now
handled by the core, providing a solution for 2034"
* tag 'rtc-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: (87 commits)
rtc: ds1307: check for failed memory allocation on wdt
rtc: class: remove redundant assignment to variable err
rtc: remove rtc_time_to_tm and rtc_tm_to_time
rtc: sun6i: let the core handle rtc range
rtc: sun6i: switch to rtc_time64_to_tm/rtc_tm_to_time64
rtc: ds1307: add support for watchdog timer on ds1388
rtc: da9052: switch to rtc_time64_to_tm/rtc_tm_to_time64
rtc: da9052: set range
rtc: da9052: convert to devm_rtc_allocate_device
rtc: imx-sc: Align imx sc msg structs to 4
rtc: fsl-ftm-alarm: report alarm to core
rtc: pcf85063: Add pcf85063 clkout control to common clock framework
rtc: make definitions in include/uapi/linux/rtc.h actually useful for user space
rtc: class: avoid unnecessary lookup in hctosys
dt-bindings: rtc: Convert and update jz4740-rtc doc to YAML
rtc: jz4740: Rename vendor-specific DT properties
rtc: jz4740: Add support for JZ4760 SoC
rtc: class: support hctosys from modular RTC drivers
rtc: pm8xxx: clear alarm register when alarm is not enabled
rtc: omap: drop unused dt-bindings header
...
Core and userspace API:
- The userspace API KFIFOs have been imoproved with locks that
do not block interrupts. This makes us better at getting
events to userspace without blocking or disturbing new events
arriving in the same time. This was reviewed by the KFIFO
maintainer Stefani. This is a generic improvement which
paves the road for similar improvements in other subsystems.
- We provide a new ioctl() for monitoring changes in the line
information, such as when multiple clients are taking lines
and giving them back, possibly reconfiguring them in the
process: we can now monitor that and not get stuck with stale
static information.
- An example tool 'gpio-watch' is provided to showcase this
functionality.
- Timestamps for events are switched to ktime_get_ns() which is
monotonic. We previously had a 'realtime' stamp which could
move forward and *backward* in time, which probably would just
cause silent bugs and weird behaviour. In the long run we
see two relevant timestamps: ktime_get_ns() or the timestamp
sometimes provided by the GPIO hardware itself, if that
exists.
- Device Tree overlay support for GPIO hogs. On systems that
load overlays, these overlays can now contain hogs, and will
then be respected.
- Handle pin control interaction with nonexisting pin ranges
in the GPIO library core instead of in the individual
drivers.
New drivers:
- New driver for the Mellanox BlueField 2 GPIO controller.
Driver improvements:
- Introduce the BGPIOF_NO_SET_ON_INPUT flag to the generic
MMIO GPIO library and use this flag in the MT7621 driver.
- Texas Instruments OMAP CPU power management improvements,
such as blocking of idle on pending GPIO interrupts.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEElDRnuGcz/wPCXQWMQRCzN7AZXXMFAl6IWUAACgkQQRCzN7AZ
XXPReQ//WUHALqbLOL/DZR8kVyygVZXV8JC4jcuMQgX+Mvm1DQuGp1T8VPlG1wC7
vy5eTz3oPc+jAaHGQgGiSGdPRQtDkAW5+Wo4hzcHJ5ATlzT7cgNywK3Jk6+YlPlF
CuhHB6aPEymQ45/GseUwEb5AuWWKS03P3XdOBj2IdDhuRHRfyjHi+31s0lh6bfNs
kC2eP+VtddVqLbOdPE1GKjZi+Iq9ffFRmiNTduHbl3o6ZoE6WnMvn2wsC6zPpJr4
AM8zpj9JFUq1DYaQf8bJJLTm8lItcyyUUhq8oKUxBpecjWfhUYwEYHGxqWIk9fbM
oyJ3zBlU0G7bc4gu1VG3cTacUf//BTmQG3iZNSSw9w17I1hOMX9wNJan22VkEbJD
8X+6AcgkldNwnAVWcVgg1lkfgdxlWURMYXl1uQPQZSj2SQL6gFcdkAhKwlgl8jOi
RctUfZ69tD+4ul2b+MhL6BtIk16o2RIBJF53ESxrR4qftRZ3ywW4JQ/AwNoQalxM
Xg8U3axFm8kitOzemVoXGr/AIee9KpcA4NK8yQLEPmSFgBaVzaVlP9Q7Klr7QGXg
RvK45uRULrWnvSyoVY7Ox+ie1NiG8JAbaxKBxkV8n+z+gVMh82xcyQX+5SgWxwW0
dzeEx1B0mCW4WDAK29CcdQVE6m4K2u/fabhZq5IhgS1mQoTC894=
=EFm/
-----END PGP SIGNATURE-----
Merge tag 'gpio-v5.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio
Pull GPIO updates from Linus Walleij:
"This is the bulk of GPIO development for the v5.7 kernel cycle.
Core and userspace API:
- The userspace API KFIFOs have been imoproved with locks that do not
block interrupts. This makes us better at getting events to
userspace without blocking or disturbing new events arriving in the
same time. This was reviewed by the KFIFO maintainer Stefani. This
is a generic improvement which paves the road for similar
improvements in other subsystems.
- We provide a new ioctl() for monitoring changes in the line
information, such as when multiple clients are taking lines and
giving them back, possibly reconfiguring them in the process: we
can now monitor that and not get stuck with stale static
information.
- An example tool 'gpio-watch' is provided to showcase this
functionality.
- Timestamps for events are switched to ktime_get_ns() which is
monotonic. We previously had a 'realtime' stamp which could move
forward and *backward* in time, which probably would just cause
silent bugs and weird behaviour. In the long run we see two
relevant timestamps: ktime_get_ns() or the timestamp sometimes
provided by the GPIO hardware itself, if that exists.
- Device Tree overlay support for GPIO hogs. On systems that load
overlays, these overlays can now contain hogs, and will then be
respected.
- Handle pin control interaction with nonexisting pin ranges in the
GPIO library core instead of in the individual drivers.
New drivers:
- New driver for the Mellanox BlueField 2 GPIO controller.
Driver improvements:
- Introduce the BGPIOF_NO_SET_ON_INPUT flag to the generic MMIO GPIO
library and use this flag in the MT7621 driver.
- Texas Instruments OMAP CPU power management improvements, such as
blocking of idle on pending GPIO interrupts"
* tag 'gpio-v5.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: (59 commits)
Revert "gpio: eic-sprd: Use devm_platform_ioremap_resource()"
pinctrl: Unconditionally assign .request()/.free()
gpio: Unconditionally assign .request()/.free()
gpio: export of_pinctrl_get to modules
pinctrl: Define of_pinctrl_get() dummy for !PINCTRL
gpio: Rename variable in core APIs
gpio: Avoid using pin ranges with !PINCTRL
gpiolib: Remove unused gpio_chip parameter from gpio_set_bias()
gpiolib: Pass gpio_desc to gpio_set_config()
gpiolib: Introduce gpiod_set_config()
tools: gpio: Fix out-of-tree build regression
gpio: gpiolib: fix a doc warning
gpio: tegra186: Add Tegra194 pin ranges for GG.0 and GG.1
gpio: tegra186: Add support for pin ranges
gpio: Support GPIO controllers without pin-ranges
ARM: integrator: impd1: Use GPIO_LOOKUP() helper macro
gpio: brcmstb: support gpio-line-names property
tools: gpio: Fix typo in gpio-utils
tools: gpio-hammer: Apply scripts/Lindent and retain good changes
gpiolib: gpio_name_to_desc: factor out !name check
...
- vfio-pci SR-IOV support (Alex Williamson)
- vfio DMA read/write interface (Yan Zhao)
- Fix vfio-platform erroneous IRQ error log (Eric Auger)
- Fix shared ATSD support for NVLink on POWER (Sam Bobroff)
- Fix init error without CONFIG_IOMMU_DMA (Andre Przywara)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.14 (GNU/Linux)
iQIcBAABAgAGBQJeh7XVAAoJECObm247sIsihpgP/A3xkJ4bS4KALoeKJwYOtWfP
/KHNYZwlSno7WZxREsYxQvZpdbOmwj00IRl/ckvTNGZ5Urs3Po8PtZ1mmMmzqPj7
oiLbusA2HapZUCncjtDtaiOhaaIP3bMc2SejtmBm8pWkNvCZNQm2MhjxKIS5jYbG
XLRWN/OnRl6sk+w6QRYIMgYeXNXK25FVfE3NPlp/c1u75GU9cXOqO/oRwGJ4VcXJ
ClGJUJQeGrQx6RNt2380e/SXCnYfnW92FqUVWmYuW1/4uznBbhdJ/OtRQz3eJ3AU
i8jTWdvATIiQQsTwbFZIjOdQJbLPtFd4g5Q8tBb0sFTSkAnUamAqUeVn/bH9swNI
9cq3xAcRx6ch1KDREIvO7M+SCrLTvIYcvGUc0QY8tPeUaH0aHXCU4tF3asxFU91P
RCusvTK8bdjx0K90XsgXexzy45377uTnrwXzJaWlCFp6cZxL+QELLfvUrQ4ZOBFV
MNilILh9/Beig6rxrzCB4K8A3YoMHAS6PnYyyb5iBLu86xbrxVk+LgUIt3eH7jGb
n9+9UbPCPzVuMZfXV3bkh6zr/bOYu+LsASOptlOAQp29Kp0R5Al0o7IloDlBlFRX
q7lIaHneC1IgKolj3Q+w4TXFvAtA6+GPdqkckDKwp9Ho5v7/4PlkuYpzEM67+cTx
V79mNUyqw7zF/hyvVrew
=RZWJ
-----END PGP SIGNATURE-----
Merge tag 'vfio-v5.7-rc1' of git://github.com/awilliam/linux-vfio
Pull VFIO updates from Alex Williamson:
- vfio-pci SR-IOV support (Alex Williamson)
- vfio DMA read/write interface (Yan Zhao)
- Fix vfio-platform erroneous IRQ error log (Eric Auger)
- Fix shared ATSD support for NVLink on POWER (Sam Bobroff)
- Fix init error without CONFIG_IOMMU_DMA (Andre Przywara)
* tag 'vfio-v5.7-rc1' of git://github.com/awilliam/linux-vfio:
vfio: Ignore -ENODEV when getting MSI cookie
vfio-pci/nvlink2: Allow fallback to ibm,mmio-atsd[0]
vfio/pci: Cleanup .probe() exit paths
vfio/pci: Remove dev_fmt definition
vfio/pci: Add sriov_configure support
vfio: Introduce VFIO_DEVICE_FEATURE ioctl and first user
vfio/pci: Introduce VF token
vfio/pci: Implement match ops
vfio: Include optional device match in vfio_device_ops callbacks
vfio: avoid inefficient operations on VFIO group in vfio_pin/unpin_pages
vfio: introduce vfio_dma_rw to read/write a range of IOVAs
vfio: allow external user to get vfio group from device
vfio: platform: Switch to platform_get_irq_optional()
perf python:
Arnaldo Carvalho de Melo:
- Fix clang detection to strip out options passed in $CC.
build:
He Zhe:
- Normalize gcc parameter when generating arch errno table, fixing
the build by removing options from $(CC).
Sam Lunt:
- Support Python 3.8+ in Makefile.
perf report/top:
Arnaldo Carvalho de Melo:
- Fix title line formatting.
perf script:
Andreas Gerstmayr:
- Fix SEGFAULT when using DWARF mode.
- Fix invalid read of directory entry after closedir(), found with valgrind.
Hagen Paul Pfeifer:
- Introduce --deltatime option.
Stephane Eranian:
- Allow --symbol to accept hexadecimal addresses.
Ian Rogers:
- Add -S/--symbols documentation
Namhyung Kim:
- Add --show-cgroup-events option.
perf python:
Arnaldo Carvalho de Melo:
- Include rwsem.c in the python binding, needed by the cgroups improvements.
build-test:
Arnaldo Carvalho de Melo:
- Honour JOBS to override detection of number of cores
perf top:
Jin Yao:
- Support --group-sort-idx to change the sort order
- perf top: Support hotkey to change sort order
perf pmu-events x86:
Jin Yao:
- Use CPU_CLK_UNHALTED.THREAD in Kernel_Utilization metric
perf symbols arm64:
Kemeng Shi:
- Fix arm64 gap between kernel start and module end
kernel perf subsystem:
Namhyung Kim:
- Add PERF_RECORD_CGROUP event and Add PERF_SAMPLE_CGROUP feature,
to allow cgroup tracking, saving a link between cgroup path and
its id number.
perf cgroup:
Namhyung Kim:
- Maintain cgroup hierarchy.
perf report:
Namhyung Kim:
- Add 'cgroup' sort key.
perf record:
Namhyung Kim:
- Support synthesizing cgroup events for pre-existing cgroups.
- Add --all-cgroups option
Documentation:
Tony Jones:
- Update docs regarding kernel/user space unwinding.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCXodMtwAKCRCyPKLppCJ+
J+leAP0Ws0dGzMSIwcMVc7zvK1IsYOTlZ8lYXJePxD+Po/YPdAEA6Squf4gwZ2wm
b9R7w50dlCkMJ9LaueCeZZjh/4asFwQ=
=auOb
-----END PGP SIGNATURE-----
Merge tag 'perf-urgent-for-mingo-5.7-20200403' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent
Pull perf/urgent fixes and improvements from Arnaldo Carvalho de Melo:
perf python:
Arnaldo Carvalho de Melo:
- Fix clang detection to strip out options passed in $CC.
build:
He Zhe:
- Normalize gcc parameter when generating arch errno table, fixing
the build by removing options from $(CC).
Sam Lunt:
- Support Python 3.8+ in Makefile.
perf report/top:
Arnaldo Carvalho de Melo:
- Fix title line formatting.
perf script:
Andreas Gerstmayr:
- Fix SEGFAULT when using DWARF mode.
- Fix invalid read of directory entry after closedir(), found with valgrind.
Hagen Paul Pfeifer:
- Introduce --deltatime option.
Stephane Eranian:
- Allow --symbol to accept hexadecimal addresses.
Ian Rogers:
- Add -S/--symbols documentation
Namhyung Kim:
- Add --show-cgroup-events option.
perf python:
Arnaldo Carvalho de Melo:
- Include rwsem.c in the python binding, needed by the cgroups improvements.
build-test:
Arnaldo Carvalho de Melo:
- Honour JOBS to override detection of number of cores
perf top:
Jin Yao:
- Support --group-sort-idx to change the sort order
- perf top: Support hotkey to change sort order
perf pmu-events x86:
Jin Yao:
- Use CPU_CLK_UNHALTED.THREAD in Kernel_Utilization metric
perf symbols arm64:
Kemeng Shi:
- Fix arm64 gap between kernel start and module end
kernel perf subsystem:
Namhyung Kim:
- Add PERF_RECORD_CGROUP event and Add PERF_SAMPLE_CGROUP feature,
to allow cgroup tracking, saving a link between cgroup path and
its id number.
perf cgroup:
Namhyung Kim:
- Maintain cgroup hierarchy.
perf report:
Namhyung Kim:
- Add 'cgroup' sort key.
perf record:
Namhyung Kim:
- Support synthesizing cgroup events for pre-existing cgroups.
- Add --all-cgroups option
Documentation:
Tony Jones:
- Update docs regarding kernel/user space unwinding.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAl6GTQMUHGJoZWxnYWFz
QGdvb2dsZS5jb20ACgkQWYigwDrT+vy3PhAAmqpYBRobOsG8QbmKDjoJEFtkqdvD
z6+4zf/R+hF11RyXjMDwihIe8d+tkQ4eAaYu6Oh5PrTyanz0G0PgeCrivZeytULk
thqQIWzDQMVA5vN/2/Vy8s5s+3HzP8z/MZOFScJ7+xA1MndXptPRTNmFUbjx+GAv
x8/pTp0u9AF6m7itX65DxXvwkzjWamt+Ar4Yx2IcuKAU/M5RtfuZO3PpDnqn7/wk
JFlkRoYeFB6qNnnkPdeyPHl9dALhuhzgdTyklQEnKVW3nf3xThYDhcEwdh6kBQgl
0dH8lL5LXy7PKGN8RES4wB0Vqndw/HlsCF5O4wkkfItbnbJxGJtS139e5973m0ud
sgWvF4yJAT2jCKhIeNz34sePQJMyWALhv0XzZCsJ0YeGHsrV1jrHELkwUT1+eIsT
3UV0iZ6aL06zQJDyKUbbIcQzEQ/wwBC+x9VgsyL54K1quCQZ1N1Nl/dvrb4cRG9m
m9EhJK/brDf4c0uFlOmMTSxV1t5J+z6ZSQnh1ShD/o5yBsxqN6q5brDT6LEs+jbM
LsIkA18jJOd4OyiDs98YiFKvIfFQbQ0LEBQpJwhF0snvfBFMMbUYN/T/NYneWON/
F0TpkFoP7PXDuq55iNaLdnObfzrpC9kdzUyWvePUvjxIl55bkf+/qtUny+H48t4L
dNggvW052d7BHes=
=deWu
-----END PGP SIGNATURE-----
Merge tag 'pci-v5.7-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull pci updates from Bjorn Helgaas:
"Enumeration:
- Revert sysfs "rescan" renames that broke apps (Kelsey Skunberg)
- Add more 32 GT/s link speed decoding and improve the implementation
(Yicong Yang)
Resource management:
- Add support for sizing programmable host bridge apertures and fix a
related alpha Nautilus regression (Ivan Kokshaysky)
Interrupts:
- Add boot interrupt quirk mechanism for Xeon chipsets and document
boot interrupts (Sean V Kelley)
PCIe native device hotplug:
- When possible, disable in-band presence detect and use PDS
(Alexandru Gagniuc)
- Add DMI table for devices that don't use in-band presence detection
but don't advertise that correctly (Stuart Hayes)
- Fix hang when powering slots up/down via sysfs (Lukas Wunner)
- Fix an MSI interrupt race (Stuart Hayes)
Virtualization:
- Add ACS quirks for Zhaoxin devices (Raymond Pang)
Error handling:
- Add Error Disconnect Recover (EDR) support so firmware can report
devices disconnected via DPC and we can try to recover (Kuppuswamy
Sathyanarayanan)
Peer-to-peer DMA:
- Add Intel Sky Lake-E Root Ports B, C, D to the whitelist (Andrew
Maier)
ASPM:
- Reduce severity of common clock config message (Chris Packham)
- Clear the correct bits when enabling L1 substates, so we don't go
to the wrong state (Yicong Yang)
Endpoint framework:
- Replace EPF linkup ops with notifier call chain and improve locking
(Kishon Vijay Abraham I)
- Fix concurrent memory allocation in OB address region (Kishon Vijay
Abraham I)
- Move PF function number assignment to EPC core to support multiple
function creation methods (Kishon Vijay Abraham I)
- Fix issue with clearing configfs "start" entry (Kunihiko Hayashi)
- Fix issue with endpoint MSI-X ignoring BAR Indicator and Table
Offset (Kishon Vijay Abraham I)
- Add support for testing DMA transfers (Kishon Vijay Abraham I)
- Add support for testing > 10 endpoint devices (Kishon Vijay Abraham I)
- Add support for tests to clear IRQ (Kishon Vijay Abraham I)
- Add common DT schema for endpoint controllers (Kishon Vijay Abraham I)
Amlogic Meson PCIe controller driver:
- Add DT bindings for AXG PCIe PHY, shared MIPI/PCIe analog PHY (Remi
Pommarel)
- Add Amlogic AXG PCIe PHY, AXG MIPI/PCIe analog PHY drivers (Remi
Pommarel)
Cadence PCIe controller driver:
- Add Root Complex/Endpoint DT schema for Cadence PCIe (Kishon Vijay
Abraham I)
Intel VMD host bridge driver:
- Add two VMD Device IDs that require bus restriction mode (Sushma
Kalakota)
Mobiveil PCIe controller driver:
- Refactor and modularize mobiveil driver (Hou Zhiqiang)
- Add support for Mobiveil GPEX Gen4 host (Hou Zhiqiang)
Microsoft Hyper-V host bridge driver:
- Add support for Hyper-V PCI protocol version 1.3 and
PCI_BUS_RELATIONS2 (Long Li)
- Refactor to prepare for virtual PCI on non-x86 architectures (Boqun
Feng)
- Fix memory leak in hv_pci_probe()'s error path (Dexuan Cui)
NVIDIA Tegra PCIe controller driver:
- Use pci_parse_request_of_pci_ranges() (Rob Herring)
- Add support for endpoint mode and related DT updates (Vidya Sagar)
- Reduce -EPROBE_DEFER error message log level (Thierry Reding)
Qualcomm PCIe controller driver:
- Restrict class fixup to specific Qualcomm devices (Bjorn Andersson)
Synopsys DesignWare PCIe controller driver:
- Refactor core initialization code for endpoint mode (Vidya Sagar)
- Fix endpoint MSI-X to use correct table address (Kishon Vijay
Abraham I)
TI DRA7xx PCIe controller driver:
- Fix MSI IRQ handling (Vignesh Raghavendra)
TI Keystone PCIe controller driver:
- Allow AM654 endpoint to raise MSI-X interrupt (Kishon Vijay Abraham I)
Miscellaneous:
- Quirk ASMedia XHCI USB to avoid "PME# from D0" defect (Kai-Heng
Feng)
- Use ioremap(), not phys_to_virt(), for platform ROM to fix video
ROM mapping with CONFIG_HIGHMEM (Mikel Rychliski)"
* tag 'pci-v5.7-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (96 commits)
misc: pci_endpoint_test: remove duplicate macro PCI_ENDPOINT_TEST_STATUS
PCI: tegra: Print -EPROBE_DEFER error message at debug level
misc: pci_endpoint_test: Use full pci-endpoint-test name in request_irq()
misc: pci_endpoint_test: Fix to support > 10 pci-endpoint-test devices
tools: PCI: Add 'e' to clear IRQ
misc: pci_endpoint_test: Add ioctl to clear IRQ
misc: pci_endpoint_test: Avoid using module parameter to determine irqtype
PCI: keystone: Allow AM654 PCIe Endpoint to raise MSI-X interrupt
PCI: dwc: Fix dw_pcie_ep_raise_msix_irq() to get correct MSI-X table address
PCI: endpoint: Fix ->set_msix() to take BIR and offset as arguments
misc: pci_endpoint_test: Add support to get DMA option from userspace
tools: PCI: Add 'd' command line option to support DMA
misc: pci_endpoint_test: Use streaming DMA APIs for buffer allocation
PCI: endpoint: functions/pci-epf-test: Print throughput information
PCI: endpoint: functions/pci-epf-test: Add DMA support to transfer data
PCI: pciehp: Fix MSI interrupt race
PCI: pciehp: Fix indefinite wait on sysfs requests
PCI: endpoint: Fix clearing start entry in configfs
PCI: tegra: Add support for PCIe endpoint mode in Tegra194
PCI: sysfs: Revert "rescan" file renames
...
Here is the big set of char/misc/other driver patches for 5.7-rc1.
Lots of things in here, and it's later than expected due to some reverts
to resolve some reported issues. All is now clean with no reported
problems in linux-next.
Included in here is:
- interconnect updates
- mei driver updates
- uio updates
- nvmem driver updates
- soundwire updates
- binderfs updates
- coresight updates
- habanalabs updates
- mhi new bus type and core
- extcon driver updates
- some Kconfig cleanups
- other small misc driver cleanups and updates
As mentioned, all have been in linux-next for a while, and with the last
two reverts, all is calm and good.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXodfvA8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ynzCQCfROhar3E8EhYEqSOP6xq6uhX9uegAnRgGY2rs
rN4JJpOcTddvZcVlD+vo
=ocWk
-----END PGP SIGNATURE-----
Merge tag 'char-misc-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver updates from Greg KH:
"Here is the big set of char/misc/other driver patches for 5.7-rc1.
Lots of things in here, and it's later than expected due to some
reverts to resolve some reported issues. All is now clean with no
reported problems in linux-next.
Included in here is:
- interconnect updates
- mei driver updates
- uio updates
- nvmem driver updates
- soundwire updates
- binderfs updates
- coresight updates
- habanalabs updates
- mhi new bus type and core
- extcon driver updates
- some Kconfig cleanups
- other small misc driver cleanups and updates
As mentioned, all have been in linux-next for a while, and with the
last two reverts, all is calm and good"
* tag 'char-misc-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (174 commits)
Revert "driver core: platform: Initialize dma_parms for platform devices"
Revert "amba: Initialize dma_parms for amba devices"
amba: Initialize dma_parms for amba devices
driver core: platform: Initialize dma_parms for platform devices
bus: mhi: core: Drop the references to mhi_dev in mhi_destroy_device()
bus: mhi: core: Initialize bhie field in mhi_cntrl for RDDM capture
bus: mhi: core: Add support for reading MHI info from device
misc: rtsx: set correct pcr_ops for rts522A
speakup: misc: Use dynamic minor numbers for speakup devices
mei: me: add cedar fork device ids
coresight: do not use the BIT() macro in the UAPI header
Documentation: provide IBM contacts for embargoed hardware
nvmem: core: remove nvmem_sysfs_get_groups()
nvmem: core: use is_bin_visible for permissions
nvmem: core: use device_register and device_unregister
nvmem: core: add root_only member to nvmem device struct
extcon: axp288: Add wakeup support
extcon: Mark extcon_get_edev_name() function as exported symbol
extcon: palmas: Hide error messages if gpio returns -EPROBE_DEFER
dt-bindings: extcon: usbc-cros-ec: convert extcon-usbc-cros-ec.txt to yaml format
...
Pull cgroup updates from Tejun Heo:
- Christian extended clone3 so that processes can be spawned into
cgroups directly.
This is not only neat in terms of semantics but also avoids grabbing
the global cgroup_threadgroup_rwsem for migration.
- Daniel added !root xattr support to cgroupfs.
Userland already uses xattrs on cgroupfs for bookkeeping. This will
allow delegated cgroups to support such usages.
- Prateek tried to make cpuset hotplug handling synchronous but that
led to possible deadlock scenarios. Reverted.
- Other minor changes including release_agent_path handling cleanup.
* 'for-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
docs: cgroup-v1: Document the cpuset_v2_mode mount option
Revert "cpuset: Make cpuset hotplug synchronous"
cgroupfs: Support user xattrs
kernfs: Add option to enable user xattrs
kernfs: Add removed_size out param for simple_xattr_set
kernfs: kvmalloc xattr value instead of kmalloc
cgroup: Restructure release_agent_path handling
selftests/cgroup: add tests for cloning into cgroups
clone3: allow spawning processes into cgroups
cgroup: add cgroup_may_write() helper
cgroup: refactor fork helpers
cgroup: add cgroup_get_from_file() helper
cgroup: unify attach permission checking
cpuset: Make cpuset hotplug synchronous
cgroup.c: Use built-in RCU list checking
kselftest/cgroup: add cgroup destruction test
cgroup: Clean up css_set task traversal
- Core:
- Some code cleanup and optimization in core by Andy
- Debugfs support for displaying dmaengine channels by Peter
- Drivers:
- New driver for uniphier-xdmac controller
- Updates to stm32 dma, mdma and dmamux drivers and PM support
- More updates to idxd drivers
- Bunch of changes in tegra-apb driver and cleaning up of pm functions
- Bunch of spelling fixes and Replace zero-length array patches
- Shutdown hook for fsl-dpaa2-qdma driver
- Support for interleaved transfers for ti-edma and virtualization
support for k3-dma driver
- Support for reset and updates in xilinx_dma driver
- Improvements and locking updates in at_hdma driver
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE+vs47OPLdNbVcHzyfBQHDyUjg0cFAl6FyokACgkQfBQHDyUj
g0dnFRAAj9lvpflrL+b9eWBZkY1ElV1jAdxsTs4HnYdXQM3ijw8yOosVVSqiuiOy
2qMfSRTP7qU9gqZ7oa1fnh05DqPmuTc3OF2IZlvGzkU9CiGQ735WGGGG8FfK/dZe
F4OgQGwA45b47hNIbvM4acwWZYPL+pBuYusKdjdHkouqVM4SORiNM8aRrCJ59xIn
P9TER//sMpdMEASuRuUIQnXb+OzSNPn1mLiP3zT0XHSM/nBMTAm7AnCDNT/Tjs9f
hwk2j8rLrwllHGqeZln8cWLhUCPrZFNe5pBWtWyV3MyY/nxlrcUX0ndJUGJIDtsb
nfXc4QKemOeF1RsC8DsQ/AY8jl6HFvRzWEEkq742IrLPCu/nTnxia4dbXW9MJ0Dp
BI7IPwoaOoYqBdRkBnSVS2F4x3813egsEReznlu/sUorTIG2g9sWtmuzv6eRt4ow
HczGgfdJXfCvIKbRg5TIXpbaJogbbB+1YrUlWq9vrZyhVw0ULtfxlWVKDy5VI1cL
0Kiz/ZIGuoQ9h6E4G3jCpaQTV49tNbYp+vimU9kizmcm+WXrTXR7rgD4AI5tH2DQ
pxYXNEl4gm1NRtWL1zzJ+B1C0MPXpc1Xafl92W39D6rphEGOdVVzay8meVIaQKDU
qQaZ1dEK4uuSxwj8NrF7sXHSClafF888FFJBEMArde1HVql/HRU=
=+UJ7
-----END PGP SIGNATURE-----
Merge tag 'dmaengine-5.7-rc1' of git://git.infradead.org/users/vkoul/slave-dma
Pull dmaengine updates from Vinod Koul:
"Core:
- Some code cleanup and optimization in core by Andy
- Debugfs support for displaying dmaengine channels by Peter
Drivers:
- New driver for uniphier-xdmac controller
- Updates to stm32 dma, mdma and dmamux drivers and PM support
- More updates to idxd drivers
- Bunch of changes in tegra-apb driver and cleaning up of pm
functions
- Bunch of spelling fixes and Replace zero-length array patches
- Shutdown hook for fsl-dpaa2-qdma driver
- Support for interleaved transfers for ti-edma and virtualization
support for k3-dma driver
- Support for reset and updates in xilinx_dma driver
- Improvements and locking updates in at_hdma driver"
* tag 'dmaengine-5.7-rc1' of git://git.infradead.org/users/vkoul/slave-dma: (89 commits)
dt-bindings: dma: renesas,usb-dmac: add r8a77961 support
dmaengine: uniphier-xdmac: Remove redandant error log for platform_get_irq
dmaengine: tegra-apb: Improve DMA synchronization
dmaengine: tegra-apb: Don't save/restore IRQ flags in interrupt handler
dmaengine: tegra-apb: mark PM functions as __maybe_unused
dmaengine: fix spelling mistake "exceds" -> "exceeds"
dmaengine: sprd: Set request pending flag when DMA controller is active
dmaengine: ppc4xx: Use scnprintf() for avoiding potential buffer overflow
dmaengine: idxd: remove global token limit check
dmaengine: idxd: reflect shadow copy of traffic class programming
dmaengine: idxd: Merge definition of dsa_batch_desc into dsa_hw_desc
dmaengine: Create debug directories for DMA devices
dmaengine: ti: k3-udma: Implement custom dbg_summary_show for debugfs
dmaengine: Add basic debugfs support
dmaengine: fsl-dpaa2-qdma: remove set but not used variable 'dpaa2_qdma'
dmaengine: ti: edma: fix null dereference because of a typo in pointer name
dmaengine: fsl-dpaa2-qdma: Adding shutdown hook
dmaengine: uniphier-xdmac: Add UniPhier external DMA controller driver
dt-bindings: dmaengine: Add UniPhier external DMA controller bindings
dmaengine: ti: k3-udma: Implement support for atype (for virtualization)
...
* GICv4.1 support
* 32bit host removal
PPC:
* secure (encrypted) using under the Protected Execution Framework
ultravisor
s390:
* allow disabling GISA (hardware interrupt injection) and protected
VMs/ultravisor support.
x86:
* New dirty bitmap flag that sets all bits in the bitmap when dirty
page logging is enabled; this is faster because it doesn't require bulk
modification of the page tables.
* Initial work on making nested SVM event injection more similar to VMX,
and less buggy.
* Various cleanups to MMU code (though the big ones and related
optimizations were delayed to 5.8). Instead of using cr3 in function
names which occasionally means eptp, KVM too has standardized on "pgd".
* A large refactoring of CPUID features, which now use an array that
parallels the core x86_features.
* Some removal of pointer chasing from kvm_x86_ops, which will also be
switched to static calls as soon as they are available.
* New Tigerlake CPUID features.
* More bugfixes, optimizations and cleanups.
Generic:
* selftests: cleanups, new MMU notifier stress test, steal-time test
* CSV output for kvm_stat.
KVM/MIPS has been broken since 5.5, it does not compile due to a patch committed
by MIPS maintainers. I had already prepared a fix, but the MIPS maintainers
prefer to fix it in generic code rather than KVM so they are taking care of it.
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAl6GOnIUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroMfxwf/ZKLZiRoaovXCOG71M/eHtQb8ZIqU
3MPy+On3eC5Sk/aBxWUL9EFZsbYG6kYdbZ1VOvG9XPBoLlnkDSm/IR0kaELHtnjj
oGVda/tvGn46Ne39y8xBptmb91WDcWH0vFthT/CwlMxAw3xjr+gG7Qyo+8F2CW6m
SSSuLiHSBnyO1cQKruBTHZ8qnR8LlnfXEqtd6Y4LFLic0LbLIoIdRcT3wjQrcZrm
Djd7wbTEYZjUfoqZ72ekwEDUsONcDLDSKcguDO9pSMSCGhpxCVT5Vy68KRpoIMs2
nzNWDKjvqQo5zb2+GWxJgkd12Hv+n7PCXZMbVrWBu1pQsewUns9m4mkpGw==
=6fGt
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
"ARM:
- GICv4.1 support
- 32bit host removal
PPC:
- secure (encrypted) using under the Protected Execution Framework
ultravisor
s390:
- allow disabling GISA (hardware interrupt injection) and protected
VMs/ultravisor support.
x86:
- New dirty bitmap flag that sets all bits in the bitmap when dirty
page logging is enabled; this is faster because it doesn't require
bulk modification of the page tables.
- Initial work on making nested SVM event injection more similar to
VMX, and less buggy.
- Various cleanups to MMU code (though the big ones and related
optimizations were delayed to 5.8). Instead of using cr3 in
function names which occasionally means eptp, KVM too has
standardized on "pgd".
- A large refactoring of CPUID features, which now use an array that
parallels the core x86_features.
- Some removal of pointer chasing from kvm_x86_ops, which will also
be switched to static calls as soon as they are available.
- New Tigerlake CPUID features.
- More bugfixes, optimizations and cleanups.
Generic:
- selftests: cleanups, new MMU notifier stress test, steal-time test
- CSV output for kvm_stat"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (277 commits)
x86/kvm: fix a missing-prototypes "vmread_error"
KVM: x86: Fix BUILD_BUG() in __cpuid_entry_get_reg() w/ CONFIG_UBSAN=y
KVM: VMX: Add a trampoline to fix VMREAD error handling
KVM: SVM: Annotate svm_x86_ops as __initdata
KVM: VMX: Annotate vmx_x86_ops as __initdata
KVM: x86: Drop __exit from kvm_x86_ops' hardware_unsetup()
KVM: x86: Copy kvm_x86_ops by value to eliminate layer of indirection
KVM: x86: Set kvm_x86_ops only after ->hardware_setup() completes
KVM: VMX: Configure runtime hooks using vmx_x86_ops
KVM: VMX: Move hardware_setup() definition below vmx_x86_ops
KVM: x86: Move init-only kvm_x86_ops to separate struct
KVM: Pass kvm_init()'s opaque param to additional arch funcs
s390/gmap: return proper error code on ksm unsharing
KVM: selftests: Fix cosmetic copy-paste error in vm_mem_region_move()
KVM: Fix out of range accesses to memslots
KVM: X86: Micro-optimize IPI fastpath delay
KVM: X86: Delay read msr data iff writes ICR MSR
KVM: PPC: Book3S HV: Add a capability for enabling secure guests
KVM: arm64: GICv4.1: Expose HW-based SGIs in debugfs
KVM: arm64: GICv4.1: Allow non-trapping WFI when using HW SGIs
...
- Use notification chain instead of EPF linkup ops for EPC events (Kishon
Vijay Abraham I)
- Protect concurrent allocation in endpoint outbound address region
(Kishon Vijay Abraham I)
- Protect concurrent access to pci_epf_ops (Kishon Vijay Abraham I)
- Assign function number for each PF in endpoint core (Kishon Vijay
Abraham I)
- Refactor endpoint mode core initialization (Vidya Sagar)
- Add API to notify when core initialization completes (Vidya Sagar)
- Add test framework support to defer core initialization (Vidya Sagar)
- Update Tegra SoC ABI header to support uninitialization of UPHY PLL
when in endpoint mode without reference clock (Vidya Sagar)
- Add DT and driver support for Tegra194 PCIe endpoint nodes (Vidya
Sagar)
- Add endpoint test support for DMA data transfer (Kishon Vijay
Abraham I)
- Print throughput information in endpoint test (Kishon Vijay Abraham I)
- Use streaming DMA APIs for endpoint test buffer allocation (Kishon
Vijay Abraham I)
- Add endpoint test command line option for DMA (Kishon Vijay Abraham I)
- When stopping a controller via configfs, clear endpoint "start" entry
to prevent WARN_ON (Kunihiko Hayashi)
- Update endpoint ->set_msix() to pay attention to MSI-X BAR Indicator
and offset when finding MSI-X tables (Kishon Vijay Abraham I)
- MSI-X tables are in local memory, not in the PCI address space. Update
pcie-designware-ep to account for this (Kishon Vijay Abraham I)
- Allow AM654 PCIe Endpoint to raise MSI-X interrupts (Kishon Vijay
Abraham I)
- Avoid using module parameter to determine irqtype for endpoint test
(Kishon Vijay Abraham I)
- Add ioctl to clear IRQ for endpoint test (Kishon Vijay Abraham I)
- Add endpoint test 'e' option to clear IRQ (Kishon Vijay Abraham I)
- Bump limit on number of endpoint test devices from 10 to 10,000 (Kishon
Vijay Abraham I)
- Use full pci-endpoint-test name in request_irq() for easier profiling
(Kishon Vijay Abraham I)
- Reduce log level of -EPROBE_DEFER error messages to debug (Thierry
Reding)
* remotes/lorenzo/pci/endpoint:
misc: pci_endpoint_test: remove duplicate macro PCI_ENDPOINT_TEST_STATUS
PCI: tegra: Print -EPROBE_DEFER error message at debug level
misc: pci_endpoint_test: Use full pci-endpoint-test name in request_irq()
misc: pci_endpoint_test: Fix to support > 10 pci-endpoint-test devices
tools: PCI: Add 'e' to clear IRQ
misc: pci_endpoint_test: Add ioctl to clear IRQ
misc: pci_endpoint_test: Avoid using module parameter to determine irqtype
PCI: keystone: Allow AM654 PCIe Endpoint to raise MSI-X interrupt
PCI: dwc: Fix dw_pcie_ep_raise_msix_irq() to get correct MSI-X table address
PCI: endpoint: Fix ->set_msix() to take BIR and offset as arguments
misc: pci_endpoint_test: Add support to get DMA option from userspace
tools: PCI: Add 'd' command line option to support DMA
misc: pci_endpoint_test: Use streaming DMA APIs for buffer allocation
PCI: endpoint: functions/pci-epf-test: Print throughput information
PCI: endpoint: functions/pci-epf-test: Add DMA support to transfer data
PCI: endpoint: Fix clearing start entry in configfs
PCI: tegra: Add support for PCIe endpoint mode in Tegra194
dt-bindings: PCI: tegra: Add DT support for PCIe EP nodes in Tegra194
soc/tegra: bpmp: Update ABI header
PCI: pci-epf-test: Add support to defer core initialization
PCI: dwc: Add API to notify core initialization completion
PCI: endpoint: Add notification for core init completion
PCI: dwc: Refactor core initialization code for EP mode
PCI: endpoint: Add core init notifying feature
PCI: endpoint: Assign function number for each PF in EPC core
PCI: endpoint: Protect concurrent access to pci_epf_ops with mutex
PCI: endpoint: Fix for concurrent memory allocation in OB address region
PCI: endpoint: Replace spinlock with mutex
PCI: endpoint: Use notification chain mechanism to notify EPC events to EPF
Add ioctl to clear IRQ which can be used to free the allocated
IRQ vectors and free the requested IRQ.
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Add a new command line option 'd' to use DMA for data transfers.
It should be used with read, write or copy commands.
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Tested-by: Alan Mikhak <alan.mikhak@sifive.com>
When remapping an anonymous, private mapping, if MREMAP_DONTUNMAP is set,
the source mapping will not be removed. The remap operation will be
performed as it would have been normally by moving over the page tables to
the new mapping. The old vma will have any locked flags cleared, have no
pagetables, and any userfaultfds that were watching that range will
continue watching it.
For a mapping that is shared or not anonymous, MREMAP_DONTUNMAP will cause
the mremap() call to fail. Because MREMAP_DONTUNMAP always results in
moving a VMA you MUST use the MREMAP_MAYMOVE flag, it's not possible to
resize a VMA while also moving with MREMAP_DONTUNMAP so old_len must
always be equal to the new_len otherwise it will return -EINVAL.
We hope to use this in Chrome OS where with userfaultfd we could write an
anonymous mapping to disk without having to STOP the process or worry
about VMA permission changes.
This feature also has a use case in Android, Lokesh Gidra has said that
"As part of using userfaultfd for GC, We'll have to move the physical
pages of the java heap to a separate location. For this purpose mremap
will be used. Without the MREMAP_DONTUNMAP flag, when I mremap the java
heap, its virtual mapping will be removed as well. Therefore, we'll
require performing mmap immediately after. This is not only time
consuming but also opens a time window where a native thread may call mmap
and reserve the java heap's address range for its own usage. This flag
solves the problem."
[bgeffon@google.com: v6]
Link: http://lkml.kernel.org/r/20200218173221.237674-1-bgeffon@google.com
[bgeffon@google.com: v7]
Link: http://lkml.kernel.org/r/20200221174248.244748-1-bgeffon@google.com
Signed-off-by: Brian Geffon <bgeffon@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Lokesh Gidra <lokeshgidra@google.com>
Reviewed-by: Minchan Kim <minchan@kernel.org>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: "Michael S . Tsirkin" <mst@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Will Deacon <will@kernel.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Sonny Rao <sonnyrao@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Jesse Barnes <jsbarnes@google.com>
Cc: Nathan Chancellor <natechancellor@gmail.com>
Cc: Florian Weimer <fweimer@redhat.com>
Link: http://lkml.kernel.org/r/20200207201856.46070-1-bgeffon@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch introduces a vDPA-based vhost backend. This backend is
built on top of the same interface defined in virtio-vDPA and provides
a generic vhost interface for userspace to accelerate the virtio
devices in guest.
This backend is implemented as a vDPA device driver on top of the same
ops used in virtio-vDPA. It will create char device entry named
vhost-vdpa-$index for userspace to use. Userspace can use vhost ioctls
on top of this char device to setup the backend.
Vhost ioctls are extended to make it type agnostic and behave like a
virtio device, this help to eliminate type specific API like what
vhost_net/scsi/vsock did:
- VHOST_VDPA_GET_DEVICE_ID: get the virtio device ID which is defined
by virtio specification to differ from different type of devices
- VHOST_VDPA_GET_VRING_NUM: get the maximum size of virtqueue
supported by the vDPA device
- VHSOT_VDPA_SET/GET_STATUS: set and get virtio status of vDPA device
- VHOST_VDPA_SET/GET_CONFIG: access virtio config space
- VHOST_VDPA_SET_VRING_ENABLE: enable a specific virtqueue
For memory mapping, IOTLB API is mandated for vhost-vDPA which means
userspace drivers are required to use
VHOST_IOTLB_UPDATE/VHOST_IOTLB_INVALIDATE to add or remove mapping for
a specific userspace memory region.
The vhost-vDPA API is designed to be type agnostic, but it allows net
device only in current stage. Due to the lacking of control virtqueue
support, some features were filter out by vhost-vdpa.
We will enable more features and devices in the near future.
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20200326140125.19794-8-jasowang@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Replace the
/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
with
/* SPDX-License-Identifier: GPL-2.0-only WITH Linux-syscall-note */
to help coreboot community consume this file without relaxing their
licensing checks.
Signed-off-by: Rajat Jain <rajatja@google.com>
Link: https://lore.kernel.org/r/20200329172513.133548-1-rajatja@google.com
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Pull networking updates from David Miller:
"Highlights:
1) Fix the iwlwifi regression, from Johannes Berg.
2) Support BSS coloring and 802.11 encapsulation offloading in
hardware, from John Crispin.
3) Fix some potential Spectre issues in qtnfmac, from Sergey
Matyukevich.
4) Add TTL decrement action to openvswitch, from Matteo Croce.
5) Allow paralleization through flow_action setup by not taking the
RTNL mutex, from Vlad Buslov.
6) A lot of zero-length array to flexible-array conversions, from
Gustavo A. R. Silva.
7) Align XDP statistics names across several drivers for consistency,
from Lorenzo Bianconi.
8) Add various pieces of infrastructure for offloading conntrack, and
make use of it in mlx5 driver, from Paul Blakey.
9) Allow using listening sockets in BPF sockmap, from Jakub Sitnicki.
10) Lots of parallelization improvements during configuration changes
in mlxsw driver, from Ido Schimmel.
11) Add support to devlink for generic packet traps, which report
packets dropped during ACL processing. And use them in mlxsw
driver. From Jiri Pirko.
12) Support bcmgenet on ACPI, from Jeremy Linton.
13) Make BPF compatible with RT, from Thomas Gleixnet, Alexei
Starovoitov, and your's truly.
14) Support XDP meta-data in virtio_net, from Yuya Kusakabe.
15) Fix sysfs permissions when network devices change namespaces, from
Christian Brauner.
16) Add a flags element to ethtool_ops so that drivers can more simply
indicate which coalescing parameters they actually support, and
therefore the generic layer can validate the user's ethtool
request. Use this in all drivers, from Jakub Kicinski.
17) Offload FIFO qdisc in mlxsw, from Petr Machata.
18) Support UDP sockets in sockmap, from Lorenz Bauer.
19) Fix stretch ACK bugs in several TCP congestion control modules,
from Pengcheng Yang.
20) Support virtual functiosn in octeontx2 driver, from Tomasz
Duszynski.
21) Add region operations for devlink and use it in ice driver to dump
NVM contents, from Jacob Keller.
22) Add support for hw offload of MACSEC, from Antoine Tenart.
23) Add support for BPF programs that can be attached to LSM hooks,
from KP Singh.
24) Support for multiple paths, path managers, and counters in MPTCP.
From Peter Krystad, Paolo Abeni, Florian Westphal, Davide Caratti,
and others.
25) More progress on adding the netlink interface to ethtool, from
Michal Kubecek"
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2121 commits)
net: ipv6: rpl_iptunnel: Fix potential memory leak in rpl_do_srh_inline
cxgb4/chcr: nic-tls stats in ethtool
net: dsa: fix oops while probing Marvell DSA switches
net/bpfilter: remove superfluous testing message
net: macb: Fix handling of fixed-link node
net: dsa: ksz: Select KSZ protocol tag
netdevsim: dev: Fix memory leak in nsim_dev_take_snapshot_write
net: stmmac: add EHL 2.5Gbps PCI info and PCI ID
net: stmmac: add EHL PSE0 & PSE1 1Gbps PCI info and PCI ID
net: stmmac: create dwmac-intel.c to contain all Intel platform
net: dsa: bcm_sf2: Support specifying VLAN tag egress rule
net: dsa: bcm_sf2: Add support for matching VLAN TCI
net: dsa: bcm_sf2: Move writing of CFP_DATA(5) into slicing functions
net: dsa: bcm_sf2: Check earlier for FLOW_EXT and FLOW_MAC_EXT
net: dsa: bcm_sf2: Disable learning for ASP port
net: dsa: b53: Deny enslaving port 7 for 7278 into a bridge
net: dsa: b53: Prevent tagged VLAN on port 7 for 7278
net: dsa: b53: Restore VLAN entries upon (re)configuration
net: dsa: bcm_sf2: Fix overflow checks
hv_netvsc: Remove unnecessary round_up for recv_completion_cnt
...
- Add support for host software queue for (e)MMC/SD
- Throttle polling rate for CMD6
- Update CMD13 busy condition check for CMD6 commands
- Improve busy detect polling for erase/trim/discard/HPI
- Fixup support for HW busy detection for HPI commands
- Re-work and improve support for eMMC sanitize commands
MMC host:
- mmci: Add support for sdmmc variant revision 2.0
- mmci_sdmmc: Improve support for busyend detection
- mmci_sdmmc: Fixup support for signal voltage switch
- mmci_sdmmc: Add support for tuning with delay block
- mtk-sd: Fix another SDIO irq issue
- sdhci: Disable native card detect when GPIO based type exist
- sdhci: Add option to defer request completion
- sdhci_am654: Add support to set a tap value per speed mode
- sdhci-esdhc-imx: Add support for i.MX8MM based variant
- sdhci-esdhc-imx: Fixup support for standard tuning on i.MX8 usdhc
- sdhci-esdhc-imx: Optimize for strobe/clock dll settings
- sdhci-esdhc-imx: Fixup support for system and runtime suspend/resume
- sdhci-iproc: Update regulator/bus-voltage management for bcm2711
- sdhci-msm: Prevent clock gating with PWRSAVE_DLL on broken variants
- sdhci-msm: Fix management of CQE during SDHCI reset
- sdhci-of-arasan: Add support for auto tuning on ZynqMP based platforms
- sdhci-omap: Add support for system suspend/resume
- sdhci-sprd: Add support for HW busy detection
- sdhci-sprd: Enable support host software queue
- sdhci-tegra: Add support for HW busy detection
- tmio/renesas_sdhi: Enforce retune after runtime suspend
- renesas_sdhi: Use manual tap correction for HS400 on some variants
- renesas_sdhi: Add support for manual correction of tap values for tunings
-----BEGIN PGP SIGNATURE-----
iQJLBAABCgA1FiEEugLDXPmKSktSkQsV/iaEJXNYjCkFAl6CGT8XHHVsZi5oYW5z
c29uQGxpbmFyby5vcmcACgkQ/iaEJXNYjClFWg/+LzX09vHBOfAu7hT/RokcTaBT
uQnSAfmhkBI+CZerVulPjDX9lFpG2Jb/fu44Ae9EqOAOESAgsTJpxywRRO2f+aNL
ie9mc0WOkmz1wuAbqYPJImES0CIL2WNpivovLgquRWyltbneh+ImkCbqoWmDYff7
uIuIC4EPhrWYJczdKr5RCw6HVbsNEAgAr6oJEbmzC63HciCPx5Zo99FN5WHoyRnf
3c3Ehc4wkVy5iu/wlXqmRdvuayDHhAAmVq6FP5J3IfuoeES3EYeKHc2Ej+pwhYi9
IFCrO8RDKEu3/o5hLp60ShhF7N/LGWYsl+5KfrwOQ6YPyMLYawR6L0iTYSqkQijy
3admTGD4OGFuN/8DvQb0yUwhSpRm/Dj+jBZTP3uk9FJHteFlLNHnzREk7weo8i/R
2WNDSbbV3+TudfC0uC4ipsHtDoidyds+TvR/ebO53pH2Dcr/z6h7i+1tKczA2rK4
x9mqXhOsskNZC26/UBb9K2oElRON4XDv+VZdQI5ddDuabIYIswXMWLYD1TGYoX5z
1PXSrrj/Jl/Sz65ZpabKJOexa24s2uThvpOnrGCy2aDc/tbDpcvVhKwL6NX9iRK0
yYKpwy9yWCGMryVfLI+ahJpvJfQDY4ufKpLC2429LVvgFvNZDG233ZcZhdlhoLNG
nWh9qHTGTPWo/213yx0=
=gILc
-----END PGP SIGNATURE-----
Merge tag 'mmc-v5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Pull MMC updates from Ulf Hansson:
"MMC core:
- Add support for host software queue for (e)MMC/SD
- Throttle polling rate for CMD6
- Update CMD13 busy condition check for CMD6 commands
- Improve busy detect polling for erase/trim/discard/HPI
- Fixup support for HW busy detection for HPI commands
- Re-work and improve support for eMMC sanitize commands
MMC host:
- mmci:
* Add support for sdmmc variant revision 2.0
- mmci_sdmmc:
* Improve support for busyend detection
* Fixup support for signal voltage switch
* Add support for tuning with delay block
- mtk-sd:
* Fix another SDIO irq issue
- sdhci:
* Disable native card detect when GPIO based type exist
- sdhci:
* Add option to defer request completion
- sdhci_am654:
* Add support to set a tap value per speed mode
- sdhci-esdhc-imx:
* Add support for i.MX8MM based variant
* Fixup support for standard tuning on i.MX8 usdhc
* Optimize for strobe/clock dll settings
* Fixup support for system and runtime suspend/resume
- sdhci-iproc:
* Update regulator/bus-voltage management for bcm2711
- sdhci-msm:
* Prevent clock gating with PWRSAVE_DLL on broken variants
* Fix management of CQE during SDHCI reset
- sdhci-of-arasan:
* Add support for auto tuning on ZynqMP based platforms
- sdhci-omap:
* Add support for system suspend/resume
- sdhci-sprd:
* Add support for HW busy detection
* Enable support host software queue
- sdhci-tegra:
* Add support for HW busy detection
- tmio/renesas_sdhi:
* Enforce retune after runtime suspend
- renesas_sdhi:
* Use manual tap correction for HS400 on some variants
* Add support for manual correction of tap values for tunings"
* tag 'mmc-v5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: (86 commits)
mmc: cavium-octeon: remove nonsense variable coercion
mmc: mediatek: fix SDIO irq issue
mmc: mmci_sdmmc: Fix clear busyd0end irq flag
dt-bindings: mmc: Fix node name in an example
mmc: core: Re-work the code for eMMC sanitize
mmc: sdhci: use FIELD_GET for preset value bit masks
mmc: sdhci-of-at91: Display clock changes for debug purpose only
mmc: sdhci: iproc: Add custom set_power() callback for bcm2711
mmc: sdhci: am654: Use sdhci_set_power_and_voltage()
mmc: sdhci: at91: Use sdhci_set_power_and_voltage()
mmc: sdhci: milbeaut: Use sdhci_set_power_and_voltage()
mmc: sdhci: arasan: Use sdhci_set_power_and_voltage()
mmc: sdhci: Introduce sdhci_set_power_and_bus_voltage()
mmc: vub300: Use scnprintf() for avoiding potential buffer overflow
dt-bindings: mmc: synopsys-dw-mshc: fix clock-freq-min-max in example
sdhci: tegra: Enable MMC_CAP_WAIT_WHILE_BUSY host capability
sdhci: tegra: Implement Tegra specific set_timeout callback
mmc: sdhci-omap: Add Support for Suspend/Resume
mmc: renesas_sdhi: simplify execute_tuning
mmc: renesas_sdhi: Use BITS_PER_LONG helper
...
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAl6CDIMACgkQxWXV+ddt
WDuJ9g/+NTVt+OXAX3G4VLAIR6EjugREAmiHPlojM7scKsmkBuH9BN35+2EPj+yS
rSmdL01nOH3gyqe+RzAc1EEiujH/9uDpkNf4zE1tGtj9m5Useqj8ZNmiG/BN0PmR
OJZkVb8DXUHEXIFscHjQJPP60kFZoqIovS7qZbDh4992+p98lTiUUEI6SPanVYeR
QysXxmafty03hQMFW93ohFZemwAELVVI44nHxxcmOHT5BbIIopXrkInkkchB9I6b
l+tIJx1gjL6k0D3v/TTqRuD+wGCE8InJgtiuEOf0WkHp2YXUlSDaKAnF/j9Le4oe
eOgc50LtA3YNGmZ2m5vTeRjBeU9qUPWjJWJ2urp87oIrxX5x7B5Hsjxdnn28P0yZ
dl/dt9HxeCKFgaRrMZYETYq9VBt0IMxiOIG9w5fukB9qnC6Dd05dXyQB0slg0+l1
chn5p0FtMS74cvXB32jW7N0fwxWNt6KI4zBvomabJGYZQd6+dyDO8l8Od86vvve/
w7KgRy7CFBjc9JOCyLTvS8eEhu/qAVc07phSblpdNnyzPFjWWTdZySON/qQYvUCf
cGDiq+5+1d1+kWuEjtYNzvxon2AaAfg7UBZm5FrjN735ojTQXqm2vi3rrurcU5AZ
ItmiU6DMre5EGZ+hfWgSPXDkeqx/JYbtDuUwWbNg6svTXaKKnmI=
=1m9l
-----END PGP SIGNATURE-----
Merge tag 'for-5.7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs updates from David Sterba:
"A number of core changes that make things work better in general, code
is simpler and cleaner.
Core changes:
- per-inode file extent tree, for in memory tracking of contiguous
extent ranges to make sure i_size adjustments are accurate
- tree root structures are protected by reference counts, replacing
SRCU that did not cover some cases
- leak detector for tree root structures
- per-transaction pinned extent tracking
- buffer heads are replaced by bios for super block access
- speedup of extent back reference resolution, on an example test
scenario the runtime of send went down from a hour to minutes
- factor out locking scheme used for subvolume writer and NOCOW
exclusion, abstracted as DREW lock, double reader-writer exclusion
(allow either readers or writers)
- cleanup and abstract extent allocation policies, preparation for
zoned device support
- make reflink/clone_range work on inline extents
- add more cancellation point for relocation, improves long response
from 'balance cancel'
- add page migration callback for data pages
- switch to guid for uuids, with additional cleanups of the interface
- make ranged full fsyncs more efficient
- removal of obsolete ioctl flag BTRFS_SUBVOL_CREATE_ASYNC
- remove b-tree readahead from delayed refs paths, avoiding seek and
read unnecessary blocks
Features:
- v2 of ioctl to delete subvolumes, allowing to delete by id and more
future extensions
Fixes:
- fix qgroup rescan worker that could block umount
- fix crash during unmount due to race with delayed inode workers
- fix dellaloc flushing logic that could create unnecessary chunks
under heavy load
- fix missing file extent item for hole after ranged fsync
- several fixes in relocation error handling
Other:
- more documentation of relocation, device replace, space
reservations
- many random cleanups"
* tag 'for-5.7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (210 commits)
btrfs: fix missing semaphore unlock in btrfs_sync_file
btrfs: use nofs allocations for running delayed items
btrfs: sysfs: Use scnprintf() instead of snprintf()
btrfs: do not resolve backrefs for roots that are being deleted
btrfs: track reloc roots based on their commit root bytenr
btrfs: restart relocate_tree_blocks properly
btrfs: reloc: reorder reservation before root selection
btrfs: do not readahead in build_backref_tree
btrfs: do not use readahead for running delayed refs
btrfs: Remove async_transid from btrfs_mksubvol/create_subvol/create_snapshot
btrfs: Remove transid argument from btrfs_ioctl_snap_create_transid
btrfs: Remove BTRFS_SUBVOL_CREATE_ASYNC support
btrfs: kill the subvol_srcu
btrfs: make btrfs_cleanup_fs_roots use the radix tree lock
btrfs: don't take an extra root ref at allocation time
btrfs: hold a ref on the root on the dead roots list
btrfs: make inodes hold a ref on their roots
btrfs: move the root freeing stuff into btrfs_put_root
btrfs: move ino_cache_inode dropping out of btrfs_free_fs_root
btrfs: make the extent buffer leak check per fs info
...
Add an ioctl FS_IOC_GET_ENCRYPTION_NONCE which retrieves a file's
encryption nonce. This makes it easier to write automated tests which
verify that fscrypt is doing the encryption correctly.
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQSacvsUNc7UX4ntmEPzXCl4vpKOKwUCXoIg/RQcZWJpZ2dlcnNA
Z29vZ2xlLmNvbQAKCRDzXCl4vpKOK2mZAQDjEil0Kf8AqZhjPuJSRrbifkzEPfu+
4EmERSyBZ5OCLgEA155kKnL5jiz7b5DRS9wGEw+drGpW8I7WfhTGv/XjoQs=
=2jU9
-----END PGP SIGNATURE-----
Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt
Pull fscrypt updates from Eric Biggers:
"Add an ioctl FS_IOC_GET_ENCRYPTION_NONCE which retrieves a file's
encryption nonce.
This makes it easier to write automated tests which verify that
fscrypt is doing the encryption correctly"
* tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt:
ubifs: wire up FS_IOC_GET_ENCRYPTION_NONCE
f2fs: wire up FS_IOC_GET_ENCRYPTION_NONCE
ext4: wire up FS_IOC_GET_ENCRYPTION_NONCE
fscrypt: add FS_IOC_GET_ENCRYPTION_NONCE ioctl
* Add a capability for enabling secure guests under the Protected
Execution Framework ultravisor
* Various bug fixes and cleanups.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQEcBAABCAAGBQJegnq3AAoJEJ2a6ncsY3GfrU8IANpaxS7kTAsW3ZN0wJnP2PHq
i8j7ZPlCVLpkQWArrMyMimDLdiN9VP7lvaWonAWjG0HJrxlmMUUVnwVSMuPTXhWY
vSwEqhUXwSF5KKxN7DYkgxqaKFxkElJIubVE/AYJjH9zFpu9ca4vM5sCxnzWcvS3
hxPNe756nKhFH9xhrC/9NRUZWmDAiv75wvzq+5DRKbSVPJGJugchdIPBbi3Yrr7P
qxnUCPZdCxzXXU94bfQl038wrjSMR3S7b4FvekJ12go2FalujqzsL2lVtift4Rf0
jvu+RIINcmTiaQqclY332j7l24LZ6Pni456RygT5OwFxuYoxWKZRafN16JmaMCo=
=rpJL
-----END PGP SIGNATURE-----
Merge tag 'kvm-ppc-next-5.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD
KVM PPC update for 5.7
* Add a capability for enabling secure guests under the Protected
Execution Framework ultravisor
* Various bug fixes and cleanups.
Pablo Neira Ayuso says:
====================
Netfilter/IPVS updates for net-next
The following patchset contains Netfilter/IPVS updates for net-next:
1) Add support to specify a stateful expression in set definitions,
this allows users to specify e.g. counters per set elements.
2) Flowtable software counter support.
3) Flowtable hardware offload counter support, from wenxu.
3) Parallelize flowtable hardware offload requests, from Paul Blakey.
This includes a patch to add one work entry per offload command.
4) Several patches to rework nf_queue refcount handling, from Florian
Westphal.
4) A few fixes for the flowtable tunnel offload: Fix crash if tunneling
information is missing and set up indirect flow block as TC_SETUP_FT,
patch from wenxu.
5) Stricter netlink attribute sanity check on filters, from Romain Bellan
and Florent Fourcot.
5) Annotations to make sparse happy, from Jules Irenge.
6) Improve icmp errors in debugging information, from Haishuang Yan.
7) Fix warning in IPVS icmp error debugging, from Haishuang Yan.
8) Fix endianess issue in tcp extension header, from Sergey Marinkevich.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Devices capable of offloading the kernel's datapath and perform
functions such as bridging and routing must also be able to send (trap)
specific packets to the kernel (i.e., the CPU) for processing.
For example, a device acting as a multicast-aware bridge must be able to
trap IGMP membership reports to the kernel for processing by the bridge
module.
In most cases, the underlying device is capable of handling packet rates
that are several orders of magnitude higher compared to those that can
be handled by the CPU.
Therefore, in order to prevent the underlying device from overwhelming
the CPU, devices usually include packet trap policers that are able to
police the trapped packets to rates that can be handled by the CPU.
This patch allows capable device drivers to register their supported
packet trap policers with devlink. User space can then tune the
parameters of these policer (currently, rate and burst size) and read
from the device the number of packets that were dropped by the policer,
if supported.
Subsequent patches in the series will allow device drivers to create
default binding between these policers and packet trap groups and allow
user space to change the binding.
v2:
* Add 'strict_start_type' in devlink policy
* Have device drivers provide max/min rate/burst size for each policer.
Use them to check validity of user provided parameters
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add new operation (LINK_UPDATE), which allows to replace active bpf_prog from
under given bpf_link. Currently this is only supported for bpf_cgroup_link,
but will be extended to other kinds of bpf_links in follow-up patches.
For bpf_cgroup_link, implemented functionality matches existing semantics for
direct bpf_prog attachment (including BPF_F_REPLACE flag). User can either
unconditionally set new bpf_prog regardless of which bpf_prog is currently
active under given bpf_link, or, optionally, can specify expected active
bpf_prog. If active bpf_prog doesn't match expected one, no changes are
performed, old bpf_link stays intact and attached, operation returns
a failure.
cgroup_bpf_replace() operation is resolving race between auto-detachment and
bpf_prog update in the same fashion as it's done for bpf_link detachment,
except in this case update has no way of succeeding because of target cgroup
marked as dying. So in this case error is returned.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200330030001.2312810-3-andriin@fb.com
Implement new sub-command to attach cgroup BPF programs and return FD-based
bpf_link back on success. bpf_link, once attached to cgroup, cannot be
replaced, except by owner having its FD. Cgroup bpf_link supports only
BPF_F_ALLOW_MULTI semantics. Both link-based and prog-based BPF_F_ALLOW_MULTI
attachments can be freely intermixed.
To prevent bpf_cgroup_link from keeping cgroup alive past the point when no
BPF program can be executed, implement auto-detachment of link. When
cgroup_bpf_release() is called, all attached bpf_links are forced to release
cgroup refcounts, but they leave bpf_link otherwise active and allocated, as
well as still owning underlying bpf_prog. This is because user-space might
still have FDs open and active, so bpf_link as a user-referenced object can't
be freed yet. Once last active FD is closed, bpf_link will be freed and
underlying bpf_prog refcount will be dropped. But cgroup refcount won't be
touched, because cgroup is released already.
The inherent race between bpf_cgroup_link release (from closing last FD) and
cgroup_bpf_release() is resolved by both operations taking cgroup_mutex. So
the only additional check required is when bpf_cgroup_link attempts to detach
itself from cgroup. At that time we need to check whether there is still
cgroup associated with that link. And if not, exit with success, because
bpf_cgroup_link was already successfully detached.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Roman Gushchin <guro@fb.com>
Link: https://lore.kernel.org/bpf/20200330030001.2312810-2-andriin@fb.com
Pull perf updates from Ingo Molnar:
"The main changes in this cycle were:
Kernel side changes:
- A couple of x86/cpu cleanups and changes were grandfathered in due
to patch dependencies. These clean up the set of CPU model/family
matching macros with a consistent namespace and C99 initializer
style.
- A bunch of updates to various low level PMU drivers:
* AMD Family 19h L3 uncore PMU
* Intel Tiger Lake uncore support
* misc fixes to LBR TOS sampling
- optprobe fixes
- perf/cgroup: optimize cgroup event sched-in processing
- misc cleanups and fixes
Tooling side changes are to:
- perf {annotate,expr,record,report,stat,test}
- perl scripting
- libapi, libperf and libtraceevent
- vendor events on Intel and S390, ARM cs-etm
- Intel PT updates
- Documentation changes and updates to core facilities
- misc cleanups, fixes and other enhancements"
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (89 commits)
cpufreq/intel_pstate: Fix wrong macro conversion
x86/cpu: Cleanup the now unused CPU match macros
hwrng: via_rng: Convert to new X86 CPU match macros
crypto: Convert to new CPU match macros
ASoC: Intel: Convert to new X86 CPU match macros
powercap/intel_rapl: Convert to new X86 CPU match macros
PCI: intel-mid: Convert to new X86 CPU match macros
mmc: sdhci-acpi: Convert to new X86 CPU match macros
intel_idle: Convert to new X86 CPU match macros
extcon: axp288: Convert to new X86 CPU match macros
thermal: Convert to new X86 CPU match macros
hwmon: Convert to new X86 CPU match macros
platform/x86: Convert to new CPU match macros
EDAC: Convert to new X86 CPU match macros
cpufreq: Convert to new X86 CPU match macros
ACPI: Convert to new X86 CPU match macros
x86/platform: Convert to new CPU match macros
x86/kernel: Convert to new CPU match macros
x86/kvm: Convert to new CPU match macros
x86/perf/events: Convert to new CPU match macros
...
Here are the big set of USB and PHY driver patches for 5.7-rc1.
Nothing huge here, some new PHY drivers, loads of USB gadget fixes and
updates, xhci updates, usb-serial driver updates and new device ids, and
other minor things. Full details in the shortlog.
All have been in linux-next for a while with no reported issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXoHL9w8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ymz6wCcDwDTZouXj+0B37q+kwlCQQPyLukAn2CxKfrM
d+wScRHWoZutA8IdzqaU
=5+jn
-----END PGP SIGNATURE-----
Merge tag 'usb-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB / PHY updates from Greg KH:
"Here are the big set of USB and PHY driver patches for 5.7-rc1.
Nothing huge here, some new PHY drivers, loads of USB gadget fixes and
updates, xhci updates, usb-serial driver updates and new device ids,
and other minor things. Full details in the shortlog.
All have been in linux-next for a while with no reported issues"
* tag 'usb-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (239 commits)
USB: cdc-acm: restore capability check order
usb: cdns3: make signed 1 bit bitfields unsigned
usb: gadget: fsl: remove unused variable 'driver_desc'
usb: gadget: f_fs: Fix use after free issue as part of queue failure
usb: typec: Correct the documentation for typec_cable_put()
USB: serial: io_edgeport: fix slab-out-of-bounds read in edge_interrupt_callback
USB: serial: option: add Wistron Neweb D19Q1
USB: serial: option: add BroadMobi BM806U
USB: serial: option: add support for ASKEY WWHC050
usb: core: Add ACPI support for USB interface devices
driver core: platform: Reimplement devm_platform_ioremap_resource
usb: dwc2: convert to devm_platform_get_and_ioremap_resource
usb: host: hisilicon: convert to devm_platform_get_and_ioremap_resource
usb: host: xhci-plat: convert to devm_platform_get_and_ioremap_resource
drivers: provide devm_platform_get_and_ioremap_resource()
phy: qcom-qusb2: Add new overriding tuning parameters in QUSB2 V2 PHY
phy: qcom-qusb2: Add support for overriding tuning parameters in QUSB2 V2 PHY
dt-bindings: phy: qcom-qusb2: Add support for overriding Phy tuning parameters
phy: qcom-qusb2: Add generic QUSB2 V2 PHY support
dt-bindings: phy: qcom,qusb2: Add compatibles for QUSB2 V2 phy and SC7180
...
Add support for TPROXY via a new bpf helper, bpf_sk_assign().
This helper requires the BPF program to discover the socket via a call
to bpf_sk*_lookup_*(), then pass this socket to the new helper. The
helper takes its own reference to the socket in addition to any existing
reference that may or may not currently be obtained for the duration of
BPF processing. For the destination socket to receive the traffic, the
traffic must be routed towards that socket via local route. The
simplest example route is below, but in practice you may want to route
traffic more narrowly (eg by CIDR):
$ ip route add local default dev lo
This patch avoids trying to introduce an extra bit into the skb->sk, as
that would require more invasive changes to all code interacting with
the socket to ensure that the bit is handled correctly, such as all
error-handling cases along the path from the helper in BPF through to
the orphan path in the input. Instead, we opt to use the destructor
variable to switch on the prefetch of the socket.
Signed-off-by: Joe Stringer <joe@wand.net.nz>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200329225342.16317-2-joe@wand.net.nz
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE+QmuaPwR3wnBdVwACF8+vY7k4RUFAl6Bvm0ACgkQCF8+vY7k
4RViMQ//baNJyAA8/Hpxz1w5+nL5hsTOhf2PcPgnCLkRVQnIiKMZoq7AS5KWtbqu
im0SM6nWduGc2T/44Ew13YmBnuWMdXL9Gs8XtAkNakgSV1UM+A3pRuOWRYCyU3Ts
1QDsc3N7oY9cvyJGWOlqdcA4gp4AaKxjS6M6Z18wYBk/jYSCcj4ZVecT89DYeeM7
wFORkv/xSdgC3eoKWEwTyglzUmrXKrbfHdcNWrQBg+1SN3WrMYQWCL6nSYMqn0Vu
f9L5E6jUSx9s6+apxS0OUQmDj78RM1JCEY1P8lgc3tAtVJ+X3yZbxwtpcvujhFPv
c48NUQeyxAJc7evarvkd73Gwl4buujqHSgiRUovHwqUXHJuGZ3PBTryV9HzbmYYy
EeHS/23t09F3j9zYtuoDNFIED03Mi+TNeS04cq8OIfwNl7xpUSEV0S/wd11V2308
cfm6lsogGE9HRbaIxCHgx4AiGFVhbpK1OQt66iYze8r/wyxnN8MVOHGWw+eI4LRK
9gwh7Wx37k6uCrjfOnLSgx7kcJ+mxSZEYyHJZqqtPm9H1SC68GOxhL/S3Zu7arvK
eiwFfxJBiunCEfauOx28kaAdvBZVyEvYeDFYl/k+q4DCIGjvK0FXud6QRjNXv24S
qUXYZKPUALTFOpbkQ3IQiBOQNM4NhF15RzCqRUptVnlF05MSywg=
=Ve8R
-----END PGP SIGNATURE-----
Merge tag 'media/v5.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media updates from Mauro Carvalho Chehab:
- New sensor driver: imx219
- Support for some new pixelformats
- Support for Sun8i SoC
- Added more codecs to meson vdec driver
- Prepare for removing the legacy usbvision driver by moving it to
staging. This driver has issues and use legacy core APIs. If nobody
steps up to address those, it is time for its retirement.
- Several cleanups and improvements on drivers, with the addition of
new supported boards
* tag 'media/v5.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (236 commits)
media: venus: firmware: Ignore secure call error on first resume
media: mtk-vpu: load vpu firmware from the new location
media: i2c: video-i2c: fix build errors due to 'imply hwmon'
media: MAINTAINERS: add myself to co-maintain Hantro G1/G2 for i.MX8MQ
media: hantro: add initial i.MX8MQ support
media: dt-bindings: Document i.MX8MQ VPU bindings
media: vivid: fix incorrect PA assignment to HDMI outputs
media: hantro: Add linux-rockchip mailing list to MAINTAINERS
media: cedrus: h264: Fix 4K decoding on H6
media: siano: Use scnprintf() for avoiding potential buffer overflow
media: rc: Use scnprintf() for avoiding potential buffer overflow
media: allegro: create new struct for channel parameters
media: allegro: move mail definitions to separate file
media: allegro: pass buffers through firmware
media: allegro: verify source and destination buffer in VCU response
media: allegro: handle dependency of bitrate and bitrate_peak
media: allegro: read bitrate mode directly from control
media: allegro: make QP configurable
media: allegro: make frame rate configurable
media: allegro: skip filler data if possible
...
- allow TSYNC and USER_NOTIF together (Tycho Andersen)
- Add missing compat_ioctl for notify (Sven Schnelle)
-----BEGIN PGP SIGNATURE-----
iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAl6BcbwWHGtlZXNjb29r
QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJuhfD/9NjC6h+l9YNHW2O3bYPxDDjkJq
1aRInf+/UayTnfwhlZiJj2FFYyPR1gfZXB9CPcniYO/t6tsCdc+0kQdc3uUCPb0y
ClPp5Pc/u/SwEFgrj5gv/NsEwAVaTPy1ioagefZMENQXj77XcfifF5Mrave+lR3K
TiZsFItucIRTiEb8YY4xF/t5rn/lBvAqDiYNZwYYVcopnW3kgvOljz6ZRyOstV/B
J9QrErFfDH9SzPfK/1bZ5GbCUsTRzbGXA281UBhZdkJQaA3yoqK+yv/xKtoaX0WK
uxLPt2BG3qb21+8JZacJ2L6KQAwm5EdT+OyLyFzUYki23LsJNHEb+UpkoRnyJg5H
sSSZRj14WH5aK1REGTDLr5tgx5lxkXx/iuxYc4tuM56ToWS4hXNiQFU2cUcdqjSO
bSKVg1LO9FfTTMecYXUqljoOwAKMVra2nDNCpvkBr/1JMVFZjCfWpjy3ZvHHwpqt
BpxgfJW250HfnMWpa7k5p6bIP+WMwetwP1yGZx6xNz8j3ZSshIPUqCvTU6zD89CN
RXHMfnZOxNtq1biI41Ppc5/kCt2t4598BaGsWIWcjhhY8p5Ttq+HGs3tOPsuUXen
ccGAa/1Co0u5CCxudG4nZ2a/ooeijMx7D5HfvoYvHDQbugR/x4aSZuiw7JiTKBHr
EZCFZyxvFVqnqlzQ2Q==
=Ejyd
-----END PGP SIGNATURE-----
Merge tag 'seccomp-v5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull seccomp updates from Kees Cook:
"A couple of seccomp updates. They're both mostly bug fixes that I
wanted to have sit in linux-next for a while:
- allow TSYNC and USER_NOTIF together (Tycho Andersen)
- add missing compat_ioctl for notify (Sven Schnelle)"
* tag 'seccomp-v5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
seccomp: Add missing compat_ioctl for notify
seccomp: allow TSYNC and USER_NOTIF together
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl6BJEMQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpie7D/9gN4zhykYDfcgamfxMtTbpla2PdTnWoJxP
fjy/Nx2FySakmccaiCGQSQ1rzD1L67UQkJgEH6hPTomJvA4FaOmJ+ZSaExMy55LH
ZT+nD3zQ9SCuA0DEpfxbsCP1tbnoXSMQNt8Tyh0x8PAoxp5bI0eRczOju1QWLWTS
tjBEMZNipN6krrV9RPWT0S5Z31/yGr/sXprCSHFV9Ypzwrx58Tj2i6F9gR7FVbLs
nV2/O8taEn0sMQIz8TVHKol/TBalluGrC4M/bOeS3faP3BPN4TT24Gtc0LAKEibk
F49/SX7FzwhOdl43Bdkbe2bbL86p+zOLSf0IMBwMm0DJl4aiOljRUYTSYRolgGgm
Ebw9QhemTwbxxeD2nEriA4EAeYvTx69RDlN2eVilwwfJ48Xz9fVm3GNYG7LISeON
k3/TyZOBQH2SZ2Hc3oF2Mq9j1UPHXZHUUsUNlNcN+aM9SFHcWkRi6xZWemTJHJZ4
zFss5RZHo0+RLBa8rrx8xaO8iWrc73+FuRhr9eSsmyPIj+OZ4ezEFRRRHwtk2fgv
dZvD413AyCI1c+3LlBusESMsrtXyY8p9O9buNTzHy3ZUtHe0ERmYV2m/a83A5pXo
Kia/5aJbPIC61bAkCCkiVo+W9OASJ6o5+3CXl5sM9lGTbDXjcofzewmd+RHPestx
xVbzeR9UIw==
=bYLJ
-----END PGP SIGNATURE-----
Merge tag 'for-5.7/io_uring-2020-03-29' of git://git.kernel.dk/linux-block
Pull io_uring updates from Jens Axboe:
"Here are the io_uring changes for this merge window. Light on new
features this time around (just splice + buffer selection), lots of
cleanups, fixes, and improvements to existing support. In particular,
this contains:
- Cleanup fixed file update handling for stack fallback (Hillf)
- Re-work of how pollable async IO is handled, we no longer require
thread offload to handle that. Instead we rely using poll to drive
this, with task_work execution.
- In conjunction with the above, allow expendable buffer selection,
so that poll+recv (for example) no longer has to be a split
operation.
- Make sure we honor RLIMIT_FSIZE for buffered writes
- Add support for splice (Pavel)
- Linked work inheritance fixes and optimizations (Pavel)
- Async work fixes and cleanups (Pavel)
- Improve io-wq locking (Pavel)
- Hashed link write improvements (Pavel)
- SETUP_IOPOLL|SETUP_SQPOLL improvements (Xiaoguang)"
* tag 'for-5.7/io_uring-2020-03-29' of git://git.kernel.dk/linux-block: (54 commits)
io_uring: cleanup io_alloc_async_ctx()
io_uring: fix missing 'return' in comment
io-wq: handle hashed writes in chains
io-uring: drop 'free_pfile' in struct io_file_put
io-uring: drop completion when removing file
io_uring: Fix ->data corruption on re-enqueue
io-wq: close cancel gap for hashed linked work
io_uring: make spdxcheck.py happy
io_uring: honor original task RLIMIT_FSIZE
io-wq: hash dependent work
io-wq: split hashing and enqueueing
io-wq: don't resched if there is no work
io-wq: remove duplicated cancel code
io_uring: fix truncated async read/readv and write/writev retry
io_uring: dual license io_uring.h uapi header
io_uring: io_uring_enter(2) don't poll while SETUP_IOPOLL|SETUP_SQPOLL enabled
io_uring: Fix unused function warnings
io_uring: add end-of-bits marker and build time verify it
io_uring: provide means of removing buffers
io_uring: add IOSQE_BUFFER_SELECT support for IORING_OP_RECVMSG
...
On low memory system, run time dumps can consume too much memory. Add
administrator ability to disable auto dumps per reporter as part of the
error flow handle routine.
This attribute is not relevant while executing
DEVLINK_CMD_HEALTH_REPORTER_DUMP_GET.
By default, auto dump is activated for any reporter that has a dump method,
as part of the reporter registration to devlink.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It may be up to the driver (in case ANY HW stats is passed) to select
which type of HW stats he is going to use. Add an infrastructure to
expose this information to user.
$ tc filter add dev enp3s0np1 ingress proto ip handle 1 pref 1 flower dst_ip 192.168.1.1 action drop
$ tc -s filter show dev enp3s0np1 ingress
filter protocol ip pref 1 flower chain 0
filter protocol ip pref 1 flower chain 0 handle 0x1
eth_type ipv4
dst_ip 192.168.1.1
in_hw in_hw_count 2
action order 1: gact action drop
random type none pass val 0
index 1 ref 1 bind 1 installed 10 sec used 10 sec
Action statistics:
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
used_hw_stats immediate <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement TSINFO_GET request to get timestamping information for a network
device. This is traditionally available via ETHTOOL_GET_TS_INFO ioctl
request.
Move part of ethtool_get_ts_info() into common.c so that ioctl and netlink
code use the same logic to get timestamping information from the device.
v3: use "TSINFO" rather than "TIMESTAMP", suggested by Richard Cochran
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add three string sets related to timestamping information:
ETH_SS_SOF_TIMESTAMPING: SOF_TIMESTAMPING_* flags
ETH_SS_TS_TX_TYPES: timestamping Tx types
ETH_SS_TS_RX_FILTERS: timestamping Rx filters
These will be used for TIMESTAMP_GET request.
v2: avoid compiler warning ("enumeration value not handled in switch")
in net_hwtstamp_validate()
v3: omit dash in Tx type names ("one-step-*" -> "onestep-*"), suggested by
Richard Cochran
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Send ETHTOOL_MSG_EEE_NTF notification whenever EEE settings of a network
device are modified using ETHTOOL_MSG_EEE_SET netlink message or
ETHTOOL_SEEE ioctl request.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement EEE_SET netlink request to set EEE settings of a network device.
These are traditionally set with ETHTOOL_SEEE ioctl request.
The netlink interface allows setting the EEE status for all link modes
supported by kernel but only first 32 link modes can be set at the moment
as only those are supported by the ethtool_ops callback.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement EEE_GET request to get EEE settings of a network device. These
are traditionally available via ETHTOOL_GEEE ioctl request.
The netlink interface allows reporting EEE status for all link modes
supported by kernel but only first 32 link modes are provided at the moment
as only those are reported by the ethtool_ops callback and drivers.
v2: fix alignment (whitespace only)
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Send ETHTOOL_MSG_PAUSE_NTF notification whenever pause parameters of
a network device are modified using ETHTOOL_MSG_PAUSE_SET netlink message
or ETHTOOL_SPAUSEPARAM ioctl request.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement PAUSE_SET netlink request to set pause parameters of a network
device. Thease are traditionally set with ETHTOOL_SPAUSEPARAM ioctl
request.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement PAUSE_GET request to get pause parameters of a network device.
These are traditionally available via ETHTOOL_GPAUSEPARAM ioctl request.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Send ETHTOOL_MSG_COALESCE_NTF notification whenever coalescing parameters
of a network device are modified using ETHTOOL_MSG_COALESCE_SET netlink
message or ETHTOOL_SCOALESCE ioctl request.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement COALESCE_SET netlink request to set coalescing parameters of
a network device. These are traditionally set with ETHTOOL_SCOALESCE ioctl
request. This commit adds only support for device coalescing parameters,
not per queue coalescing parameters.
Like the ioctl implementation, the generic ethtool code checks if only
supported parameters are modified; if not, first offending attribute is
reported using extack.
v2: fix alignment (whitespace only)
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement COALESCE_GET request to get coalescing parameters of a network
device. These are traditionally available via ETHTOOL_GCOALESCE ioctl
request. This commit adds only support for device coalescing parameters,
not per queue coalescing parameters.
Omit attributes with zero values unless they are declared as supported
(i.e. the corresponding bit in ethtool_ops::supported_coalesce_params is
set).
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds functionality to configure routes for RPL source routing
functionality. There is no IPIP functionality yet implemented which can
be added later when the cases when to use IPv6 encapuslation comes more
clear.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds rpl source routing receive handling. Everything works
only if sysconf "rpl_seg_enabled" and source routing is enabled. Mostly
the same behaviour as IPv6 segmentation routing. To handle compression
and uncompression a rpl.c file is created which contains the necessary
functionality. The receive handling will also care about IPv6
encapsulated so far it's specified as possible nexthdr in RFC 6554.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds a uapi header for rpl struct definition. The segments
data can be accessed over rpl_segaddr or rpl_segdata macros. In case of
compri and compre is zero the segment data is not compressed and can be
accessed by rpl_segaddr. In the other case the compressed data can be
accessed by rpl_segdata and interpreted as byte array.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Expose a new netlink family to userspace to control the PM, setting:
- list of local addresses to be signalled.
- list of local addresses used to created subflows.
- maximum number of add_addr option to react
When the msk is fully established, the PM netlink attempts to
announce the 'signal' list via the ADD_ADDR option. Since we
currently lack the ADD_ADDR echo (and related event) only the
first addr is sent.
After exhausting the 'announce' list, the PM tries to create
subflow for each addr in 'local' list, waiting for each
connection to be completed before attempting the next one.
Idea is to add an additional PM hook for ADD_ADDR echo, to allow
the PM netlink announcing multiple addresses, in sequence.
Co-developed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
add ulp-specific diagnostic functions, so that subflow information can be
dumped to userspace programs like 'ss'.
v2 -> v3:
- uapi: use bit macros appropriate for userspace
Co-developed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Co-developed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds new netlink attribute to allow a user to (optionally)
specify the desired offload mode immediately upon MACSec link creation.
Separate iproute patch will be required to support this from user space.
Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce types and configs for bpf programs that can be attached to
LSM hooks. The programs can be enabled by the config option
CONFIG_BPF_LSM.
Signed-off-by: KP Singh <kpsingh@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Brendan Jackman <jackmanb@google.com>
Reviewed-by: Florent Revest <revest@google.com>
Reviewed-by: Thomas Garnier <thgarnie@google.com>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: James Morris <jamorris@linux.microsoft.com>
Link: https://lore.kernel.org/bpf/20200329004356.27286-2-kpsingh@chromium.org
This implements synchronized time-travel mode which - using a special
application on a unix socket - lets multiple machines take part in a
time-travelling simulation together.
The protocol for the unix domain socket is defined in the new file
include/uapi/linux/um_timetravel.h.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
While it is currently possible for userspace to specify that an existing
XDP program should not be replaced when attaching to an interface, there is
no mechanism to safely replace a specific XDP program with another.
This patch adds a new netlink attribute, IFLA_XDP_EXPECTED_FD, which can be
set along with IFLA_XDP_FD. If set, the kernel will check that the program
currently loaded on the interface matches the expected one, and fail the
operation if it does not. This corresponds to a 'cmpxchg' memory operation.
Setting the new attribute with a negative value means that no program is
expected to be attached, which corresponds to setting the UPDATE_IF_NOEXIST
flag.
A new companion flag, XDP_FLAGS_REPLACE, is also added to explicitly
request checking of the EXPECTED_FD attribute. This is needed for userspace
to discover whether the kernel supports the new attribute.
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/bpf/158515700640.92963.3551295145441017022.stgit@toke.dk
Enable the bpf_get_current_cgroup_id() helper for connect(), sendmsg(),
recvmsg() and bind-related hooks in order to retrieve the cgroup v2
context which can then be used as part of the key for BPF map lookups,
for example. Given these hooks operate in process context 'current' is
always valid and pointing to the app that is performing mentioned
syscalls if it's subject to a v2 cgroup. Also with same motivation of
commit 7723628101 ("bpf: Introduce bpf_skb_ancestor_cgroup_id helper")
enable retrieval of ancestor from current so the cgroup id can be used
for policy lookups which can then forbid connect() / bind(), for example.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/d2a7ef42530ad299e3cbb245e6c12374b72145ef.1585323121.git.daniel@iogearbox.net
In Cilium we're mainly using BPF cgroup hooks today in order to implement
kube-proxy free Kubernetes service translation for ClusterIP, NodePort (*),
ExternalIP, and LoadBalancer as well as HostPort mapping [0] for all traffic
between Cilium managed nodes. While this works in its current shape and avoids
packet-level NAT for inter Cilium managed node traffic, there is one major
limitation we're facing today, that is, lack of netns awareness.
In Kubernetes, the concept of Pods (which hold one or multiple containers)
has been built around network namespaces, so while we can use the global scope
of attaching to root BPF cgroup hooks also to our advantage (e.g. for exposing
NodePort ports on loopback addresses), we also have the need to differentiate
between initial network namespaces and non-initial one. For example, ExternalIP
services mandate that non-local service IPs are not to be translated from the
host (initial) network namespace as one example. Right now, we have an ugly
work-around in place where non-local service IPs for ExternalIP services are
not xlated from connect() and friends BPF hooks but instead via less efficient
packet-level NAT on the veth tc ingress hook for Pod traffic.
On top of determining whether we're in initial or non-initial network namespace
we also have a need for a socket-cookie like mechanism for network namespaces
scope. Socket cookies have the nice property that they can be combined as part
of the key structure e.g. for BPF LRU maps without having to worry that the
cookie could be recycled. We are planning to use this for our sessionAffinity
implementation for services. Therefore, add a new bpf_get_netns_cookie() helper
which would resolve both use cases at once: bpf_get_netns_cookie(NULL) would
provide the cookie for the initial network namespace while passing the context
instead of NULL would provide the cookie from the application's network namespace.
We're using a hole, so no size increase; the assignment happens only once.
Therefore this allows for a comparison on initial namespace as well as regular
cookie usage as we have today with socket cookies. We could later on enable
this helper for other program types as well as we would see need.
(*) Both externalTrafficPolicy={Local|Cluster} types
[0] https://github.com/cilium/cilium/blob/master/bpf/bpf_sock.c
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/c47d2346982693a9cf9da0e12690453aded4c788.1585323121.git.daniel@iogearbox.net
The PERF_SAMPLE_CGROUP bit is to save (perf_event) cgroup information in
the sample. It will add a 64-bit id to identify current cgroup and it's
the file handle in the cgroup file system. Userspace should use this
information with PERF_RECORD_CGROUP event to match which cgroup it
belongs.
I put it before PERF_SAMPLE_AUX for simplicity since it just needs a
64-bit word. But if we want bigger samples, I can work on that
direction too.
Committer testing:
$ pahole perf_sample_data | grep -w cgroup -B5 -A5
/* --- cacheline 4 boundary (256 bytes) was 56 bytes ago --- */
struct perf_regs regs_intr; /* 312 16 */
/* --- cacheline 5 boundary (320 bytes) was 8 bytes ago --- */
u64 stack_user_size; /* 328 8 */
u64 phys_addr; /* 336 8 */
u64 cgroup; /* 344 8 */
/* size: 384, cachelines: 6, members: 22 */
/* padding: 32 */
};
$
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Zefan Li <lizefan@huawei.com>
Link: http://lore.kernel.org/lkml/20200325124536.2800725-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To support cgroup tracking, add CGROUP event to save a link between
cgroup path and id number. This is needed since cgroups can go away
when userspace tries to read the cgroup info (from the id) later.
The attr.cgroup bit was also added to enable cgroup tracking from
userspace.
This event will be generated when a new cgroup becomes active.
Userspace might need to synthesize those events for existing cgroups.
Committer testing:
From the resulting kernel, using /sys/kernel/btf/vmlinux:
$ pahole perf_event_attr | grep -w cgroup -B5 -A1
__u64 write_backward:1; /* 40:27 8 */
__u64 namespaces:1; /* 40:28 8 */
__u64 ksymbol:1; /* 40:29 8 */
__u64 bpf_event:1; /* 40:30 8 */
__u64 aux_output:1; /* 40:31 8 */
__u64 cgroup:1; /* 40:32 8 */
__u64 __reserved_1:31; /* 40:33 8 */
$
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Tejun Heo <tj@kernel.org>
[staticize perf_event_cgroup function]
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Zefan Li <lizefan@huawei.com>
Link: http://lore.kernel.org/lkml/20200325124536.2800725-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
We copied the virtio_iommu_config from the virtio-iommu specification,
which declares the fields using little-endian annotations (for example
le32). Unfortunately this causes sparse to warn about comparison between
little- and cpu-endian, because of the typecheck() in virtio_cread():
drivers/iommu/virtio-iommu.c:1024:9: sparse: sparse: incompatible types in comparison expression (different base types):
drivers/iommu/virtio-iommu.c:1024:9: sparse: restricted __le64 *
drivers/iommu/virtio-iommu.c:1024:9: sparse: unsigned long long *
drivers/iommu/virtio-iommu.c:1036:9: sparse: sparse: incompatible types in comparison expression (different base types):
drivers/iommu/virtio-iommu.c:1036:9: sparse: restricted __le64 *
drivers/iommu/virtio-iommu.c:1036:9: sparse: unsigned long long *
drivers/iommu/virtio-iommu.c:1040:9: sparse: sparse: incompatible types in comparison expression (different base types):
drivers/iommu/virtio-iommu.c:1040:9: sparse: restricted __le64 *
drivers/iommu/virtio-iommu.c:1040:9: sparse: unsigned long long *
drivers/iommu/virtio-iommu.c:1044:9: sparse: sparse: incompatible types in comparison expression (different base types):
drivers/iommu/virtio-iommu.c:1044:9: sparse: restricted __le32 *
drivers/iommu/virtio-iommu.c:1044:9: sparse: unsigned int *
drivers/iommu/virtio-iommu.c:1048:9: sparse: sparse: incompatible types in comparison expression (different base types):
drivers/iommu/virtio-iommu.c:1048:9: sparse: restricted __le32 *
drivers/iommu/virtio-iommu.c:1048:9: sparse: unsigned int *
drivers/iommu/virtio-iommu.c:1052:9: sparse: sparse: incompatible types in comparison expression (different base types):
drivers/iommu/virtio-iommu.c:1052:9: sparse: restricted __le32 *
drivers/iommu/virtio-iommu.c:1052:9: sparse: unsigned int *
Although virtio_cread() does convert virtio-endian (in our case
little-endian) to cpu-endian, the typecheck() needs the two arguments to
have the same endianness. Do as UAPI headers of other virtio devices do,
and remove the endian annotation from the device config.
Even though we change the UAPI this shouldn't cause any regression since
QEMU, the existing implementation of virtio-iommu that uses this header,
already removes the annotations when importing headers.
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Link: https://lore.kernel.org/r/20200326093558.2641019-2-jean-philippe@linaro.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Pull input fixes from Dmitry Torokhov:
- a fix to generate proper timestamps on key autorepeat events that
were broken recently
- a fix for Synaptics driver to only activate reduced reporting mode
when explicitly requested
- a new keycode for "selective screenshot" function
- other assorted fixes
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: fix stale timestamp on key autorepeat events
Input: move the new KEY_SELECTIVE_SCREENSHOT keycode
Input: avoid BIT() macro usage in the serio.h UAPI header
Input: synaptics-rmi4 - set reduced reporting mode only when requested
Input: synaptics - enable RMI on HP Envy 13-ad105ng
Input: allocate keycode for "Selective Screenshot" key
Input: tm2-touchkey - add support for Coreriver TC360 variant
dt-bindings: input: add Coreriver TC360 binding
dt-bindings: vendor-prefixes: Add Coreriver vendor prefix
Input: raydium_i2c_ts - fix error codes in raydium_i2c_boot_trigger()
This patch adds a new MACsec offloading option, MACSEC_OFFLOAD_MAC,
allowing a user to select a MAC as a provider for MACsec offloading
operations.
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
BIT() macro definition is internal to the Linux kernel and is not
to be used in UAPI headers; replace its usage with the _BITUL() macro
that is already used elsewhere in the header.
Fixes: 9c66d15646 ("taprio: Add support for hardware offloading")
Signed-off-by: Eugene Syromiatnikov <esyr@redhat.com>
Acked-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
BIT() macro is not defined in UAPI headers; there is, however, similarly
defined _BITUL() macro present in include/uapi/linux/const.h; use it
instead and include <linux/const.h> and <linux/ioctl.h> in order to make
the definitions provided in the header useful.
Fixes: 3431ca4837 ("rtc: define RTC_VL_READ values")
Signed-off-by: Eugene Syromiatnikov <esyr@redhat.com>
Link: https://lore.kernel.org/r/20200324041209.GA30727@asgard.redhat.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
We should try to keep keycodes sequential unless there is a reason to leave
a gap in numbering, so let's move it from 0x280 to 0x27a while we still
can.
Fixes: 3b059da983 ("Input: allocate keycode for Selective Screenshot key")
Acked-by: Rajat Jain <rajatja@google.com>
Link: https://lore.kernel.org/r/20200326182711.GA259753@dtor-ws
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
The BIT() macro definition is not available for the UAPI headers
(moreover, it can be defined differently in the user space); replace
its usage with the _BITUL() macro that is defined in <linux/const.h>.
Fixes: 237483aa5c ("coresight: stm: adding driver for CoreSight STM component")
Signed-off-by: Eugene Syromiatnikov <esyr@redhat.com>
Cc: stable <stable@vger.kernel.org>
Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Link: https://lore.kernel.org/r/20200324042213.GA10452@asgard.redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
At present, on Power systems with Protected Execution Facility
hardware and an ultravisor, a KVM guest can transition to being a
secure guest at will. Userspace (QEMU) has no way of knowing
whether a host system is capable of running secure guests. This
will present a problem in future when the ultravisor is capable of
migrating secure guests from one host to another, because
virtualization management software will have no way to ensure that
secure guests only run in domains where all of the hosts can
support secure guests.
This adds a VM capability which has two functions: (a) userspace
can query it to find out whether the host can support secure guests,
and (b) userspace can enable it for a guest, which allows that
guest to become a secure guest. If userspace does not enable it,
KVM will return an error when the ultravisor does the hypercall
that indicates that the guest is starting to transition to a
secure guest. The ultravisor will then abort the transition and
the guest will terminate.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Ram Pai <linuxram@us.ibm.com>
Report event FAN_DIR_MODIFY with name in a variable length record similar
to how fid's are reported. With name info reporting implemented, setting
FAN_DIR_MODIFY in mark mask is now allowed.
When events are reported with name, the reported fid identifies the
directory and the name follows the fid. The info record type for this
event info is FAN_EVENT_INFO_TYPE_DFID_NAME.
For now, all reported events have at most one info record which is
either FAN_EVENT_INFO_TYPE_FID or FAN_EVENT_INFO_TYPE_DFID_NAME (for
FAN_DIR_MODIFY). Later on, events "on child" will report both records.
There are several ways that an application can use this information:
1. When watching a single directory, the name is always relative to
the watched directory, so application need to fstatat(2) the name
relative to the watched directory.
2. When watching a set of directories, the application could keep a map
of dirfd for all watched directories and hash the map by fid obtained
with name_to_handle_at(2). When getting a name event, the fid in the
event info could be used to lookup the base dirfd in the map and then
call fstatat(2) with that dirfd.
3. When watching a filesystem (FAN_MARK_FILESYSTEM) or a large set of
directories, the application could use open_by_handle_at(2) with the fid
in event info to obtain dirfd for the directory where event happened and
call fstatat(2) with this dirfd.
The last option scales better for a large number of watched directories.
The first two options may be available in the future also for non
privileged fanotify watchers, because open_by_handle_at(2) requires
the CAP_DAC_READ_SEARCH capability.
Link: https://lore.kernel.org/r/20200319151022.31456-15-amir73il@gmail.com
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Dirent events are going to be supported in two flavors:
1. Directory fid info + mask that includes the specific event types
(e.g. FAN_CREATE) and an optional FAN_ONDIR flag.
2. Directory fid info + name + mask that includes only FAN_DIR_MODIFY.
To request the second event flavor, user needs to set the event type
FAN_DIR_MODIFY in the mark mask.
The first flavor is supported since kernel v5.1 for groups initialized
with flag FAN_REPORT_FID. It is intended to be used for watching
directories in "batch mode" - the watcher is notified when directory is
changed and re-scans the directory content in response. This event
flavor is stored more compactly in the event queue, so it is optimal
for workloads with frequent directory changes.
The second event flavor is intended to be used for watching large
directories, where the cost of re-scan of the directory on every change
is considered too high. The watcher getting the event with the directory
fid and entry name is expected to call fstatat(2) to query the content of
the entry after the change.
Legacy inotify events are reported with name and event mask (e.g. "foo",
FAN_CREATE | FAN_ONDIR). That can lead users to the conclusion that
there is *currently* an entry "foo" that is a sub-directory, when in fact
"foo" may be negative or non-dir by the time user gets the event.
To make it clear that the current state of the named entry is unknown,
when reporting an event with name info, fanotify obfuscates the specific
event types (e.g. create,delete,rename) and uses a common event type -
FAN_DIR_MODIFY to describe the change. This should make it harder for
users to make wrong assumptions and write buggy filesystem monitors.
At this point, name info reporting is not yet implemented, so trying to
set FAN_DIR_MODIFY in mark mask will return -EINVAL.
Link: https://lore.kernel.org/r/20200319151022.31456-12-amir73il@gmail.com
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Character arrays can be considered empty strings (if they are
immediately terminated), but they cannot be NULL.
Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
The commit 19ba1eb15a ("Input: psmouse - add a custom serio protocol
to send extra information") introduced usage of the BIT() macro
for SERIO_* flags; this macro is not provided in UAPI headers.
Replace if with similarly defined _BITUL() macro defined
in <linux/const.h>.
Fixes: 19ba1eb15a ("Input: psmouse - add a custom serio protocol to send extra information")
Signed-off-by: Eugene Syromiatnikov <esyr@redhat.com>
Cc: <stable@vger.kernel.org> # v5.0+
Link: https://lore.kernel.org/r/20200324041341.GA32335@asgard.redhat.com
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
The VFIO_DEVICE_FEATURE ioctl is meant to be a general purpose, device
agnostic ioctl for setting, retrieving, and probing device features.
This implementation provides a 16-bit field for specifying a feature
index, where the data porition of the ioctl is determined by the
semantics for the given feature. Additional flag bits indicate the
direction and nature of the operation; SET indicates user data is
provided into the device feature, GET indicates the device feature is
written out into user data. The PROBE flag augments determining
whether the given feature is supported, and if provided, whether the
given operation on the feature is supported.
The first user of this ioctl is for setting the vfio-pci VF token,
where the user provides a shared secret key (UUID) on a SR-IOV PF
device, which users must provide when opening associated VF devices.
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:
struct foo {
int stuff;
struct boo array[];
};
By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.
Also, notice that, dynamic memory allocations won't be affected by
this change:
"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]
This issue was found with the help of Coccinelle.
[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 7649773293 ("cxgb3/l2t: Fix undefined behaviour")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Link: https://lore.kernel.org/r/20200226223125.GA20630@embeddedor
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Commit 53eca1f347 ("net: rename flow_action_hw_stats_types* ->
flow_action_hw_stats*") renamed just the flow action types and
helpers. For consistency rename variables, enums, struct members
and UAPI too (note that this UAPI was not in any official release,
yet).
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This functionality was deprecated in kernel 5.4. Since no one has
complained of the impending removal it's time we did so.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ add comment ]
Signed-off-by: David Sterba <dsterba@suse.com>
This ioctl will be responsible for deleting a subvolume using its id.
This can be used when a system has a file system mounted from a
subvolume, rather than the root file system, like below:
/
@subvol1/
@subvol2/
@subvol_default/
If only @subvol_default is mounted, we have no path to reach @subvol1
and @subvol2, thus no way to delete them. Current subvolume delete ioctl
takes a file handle point as argument, and if @subvol_default is
mounted, we can't reach @subvol1 and @subvol2 from the same mount point.
This patch introduces a new ioctl BTRFS_IOC_SNAP_DESTROY_V2 that takes
the extended structure with flags to allow to delete subvolume using
subvolid.
Now, we can use this new ioctl specifying the subvolume id and refer to
the same mount point. It doesn't matter which subvolume was mounted,
since we can reach to the desired one using the subvolume id, and then
delete it.
The full path to the subvolume id is resolved internally and access is
verified as if the subvolume was accessed by path.
The volume args v2 structure is extended to use the existing union for
subvolume id specification, that's valid in case the
BTRFS_SUBVOL_SPEC_BY_ID is set.
Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ update changelog ]
Signed-off-by: David Sterba <dsterba@suse.com>
The ioctl data for devices or subvolumes can be passed via
btrfs_ioctl_vol_args or btrfs_ioctl_vol_args_v2. The latter is more
versatile and needs some caution as some of the flags make sense only
for some ioctls.
As we're going to extend the flags, define support masks for each ioctl
class separately.
Reviewed-by: Marcos Paulo de Souza <mpdesouza@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The feature VIRTIO_NET_F_HASH_REPORT extends the
layout of the packet and requests the device to
calculate hash on incoming packets and report it
in the packet header.
Signed-off-by: Yuri Benditovich <yuri.benditovich@daynix.com>
Link: https://lore.kernel.org/r/20200302115003.14877-4-yuri.benditovich@daynix.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
RSS (Receive-side scaling) defines hash calculation
rules and decision on receive virtqueue according to
the calculated hash, provided mask to apply and
provided indirection table containing indices of
receive virqueues. The driver sends the control
command to enable multiqueue and provide parameters
for receive steering.
Signed-off-by: Yuri Benditovich <yuri.benditovich@daynix.com>
Link: https://lore.kernel.org/r/20200302115003.14877-3-yuri.benditovich@daynix.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
VIRTIO_NET_F_RSC_EXT feature bit indicates that the device
is able to provide extended RSC information. When the feature
is negotiatede and 'gso_type' field in received packet is not
GSO_NONE, the device reports number of coalesced packets in
'csum_start' field and number of duplicated acks in 'csum_offset'
field and sets VIRTIO_NET_HDR_F_RSC_INFO in 'flags' field.
Signed-off-by: Yuri Benditovich <yuri.benditovich@daynix.com>
Link: https://lore.kernel.org/r/20200302115003.14877-2-yuri.benditovich@daynix.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* HE ranging (fine timing measurement) API support
* hwsim gets virtio support, for use with wmediumd,
to be able to simulate with multiple machines
* eapol-over-nl80211 improvements to exclude preauth
* IBSS reset support, to recover connections from
userspace
* and various others.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEH1e1rEeCd0AIMq6MB8qZga/fl8QFAl50yM8ACgkQB8qZga/f
l8Qe0w//Rt5vsqlkiuZtg1SHQcMHq5Y6jJ92Xfg6Yw8H3H1g40r6UWDeSsF7Dl9H
YuC4VvfNH/FhnNz01FDVxjgjnUyjpAolxM6sYj3hW3ZIBahSBeovAdjTP59y1Fhi
zLMP/zYLsqtlKO5qr0fK7tUmENiF/IxyALRcxqsmQ1ul5003pIpFRVPoQVDTjHZn
JCzDTboQLmtyYiDYjejaeauo4Awjj4lDiyk3Fu+kXSv8weY4dl+w8/AM5zsiVNDs
4HRDpJUBIwfwWuygMNwl0EQVhnpYaNQFQ+IEe9B29qiAEb95MPHhSudU1qaT+4d7
CykikhH39/n4Wv1PBg4PrCfWInxAUiPz6K027lDOY5VtJFFbLg8GqQkn3Vy/WnHm
CJuVaLdJ73k7AypvWdZUgSe9SaloymVd987IIMixmz0t0cqIE3kPYot//kXFjy6m
rqvWJ3a7SIopPdkD4xXDwzASUmctCUedQ0QVnWYRQCUW5rnkZ96SHjblUA9IykYS
Ia44BwB8ggZsax1KrqyGV9hgj8mh9srLmEjtRPhxVD3ncE1ewiQaBRwv56J0EKNu
yRnV10dNMCtUyT3j9dRpHa8zuNpS/A6ShXe4aGgoXAz+ShjNOBj2GxXtolsDJzrd
zIqZH8gcWaeV+8yWhaGx8d82gvY8PhRvJl+AMax7Ds2avOZ/pa0=
=t3QF
-----END PGP SIGNATURE-----
Merge tag 'mac80211-next-for-net-next-2020-03-20' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
Johannes Berg says:
====================
Another set of changes:
* HE ranging (fine timing measurement) API support
* hwsim gets virtio support, for use with wmediumd,
to be able to simulate with multiple machines
* eapol-over-nl80211 improvements to exclude preauth
* IBSS reset support, to recover connections from
userspace
* and various others.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that we have a nested tunnel info attribute we can add a separate
one for the tunnel command and require it explicitly from user-space. It
must be one of RTM_SETLINK/DELLINK. Only RTM_SETLINK requires a valid
tunnel id, DELLINK just removes it if it was set before. This allows us
to have all tunnel attributes and control in one place, thus removing
the need for an outside vlan info flag.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While discussing the new API, Roopa mentioned that we'll be adding more
tunnel attributes and options in the future, so it's better to make it a
nested attribute, since this is still in net-next we can easily change it
and nest the tunnel id attribute under BRIDGE_VLANDB_ENTRY_TUNNEL_INFO.
The new format is:
[BRIDGE_VLANDB_ENTRY]
[BRIDGE_VLANDB_ENTRY_TUNNEL_INFO]
[BRIDGE_VLANDB_TINFO_ID]
Any new tunnel attributes can be nested under
BRIDGE_VLANDB_ENTRY_TUNNEL_INFO.
Suggested-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Drivers that trigger roaming need to know the lifetime of the configured
PMKSA for deciding whether to trigger the full or PMKSA cache based
authentication. The configured PMKSA is invalid after the PMK lifetime
has expired and must not be used after that and the STA needs to
disassociate if the PMK expires. Hence the STA is expected to refresh
the PMK with a full authentication before this happens (e.g., when
reassociating to a new BSS the next time or by performing EAPOL
reauthentication depending on the AKM) to avoid unnecessary
disconnection.
The PMK reauthentication threshold is the percentage of the PMK lifetime
value and indicates to the driver to trigger a full authentication roam
(without PMKSA caching) after the reauthentication threshold time, but
before the PMK timer has expired. Authentication methods like SAE need
to be able to generate a new PMKSA entry without having to force a
disconnection after this threshold timeout. If no roaming occurs between
the reauthentication threshold time and PMK lifetime expiration,
disassociation is still forced.
The new attributes for providing these values correspond to the dot11
MIB variables dot11RSNAConfigPMKLifetime and
dot11RSNAConfigPMKReauthThreshold.
This type of functionality is already available in cases where user
space component is in control of roaming. This commit extends that same
capability into cases where parts or all of this functionality is
offloaded to the driver.
Signed-off-by: Veerendranath Jakkam <vjakkam@codeaurora.org>
Signed-off-by: Jouni Malinen <jouni@codeaurora.org>
Link: https://lore.kernel.org/r/20200312235903.18462-1-jouni@codeaurora.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Sometimes, userspace is able to detect that a peer silently lost its
state (like, if the peer reboots). wpa_supplicant does this for IBSS-RSN
by registering for auth/deauth frames, but when it detects this, it is
only able to remove the encryption keys of the peer and close its port.
However, the kernel also hold other state about the station, such as BA
sessions, probe response parameters and the like. They also need to be
resetted correctly.
This patch adds the NL80211_EXT_FEATURE_DEL_IBSS_STA feature flag
indicating the driver accepts deleting stations in IBSS mode, which
should send a deauth and reset the state of the station, just like in
mesh point mode.
Signed-off-by: Nicolas Cavallari <nicolas.cavallari@green-communications.fr>
Link: https://lore.kernel.org/r/20200305135754.12094-1-cavallar@lri.fr
[preserve -EINVAL return]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Add API for telling whether the driver supports protected TWT.
The protected_twt capability in the RSNXE will be based on this.
Signed-off-by: Shaul Triebitz <shaul.triebitz@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Link: https://lore.kernel.org/r/20200131111300.891737-23-luca@coelho.fi
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Add support for requesting that the ranging measurement will use
the trigger-based / non trigger-based flow instead of the EDCA based
flow.
Signed-off-by: Avraham Stern <avraham.stern@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Link: https://lore.kernel.org/r/20200131111300.891737-2-luca@coelho.fi
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
If the nl80211 control port is used before this patch, pre-auth frames
(0x88c7) are send to userspace uncoditionally. While this enables userspace
to only use nl80211 on the station side, it is not always useful for APs.
Furthermore, pre-auth frames are ordinary data frames and not related to
the control port. Therefore it should for example be possible for pre-auth
frames to be bridged onto a wired network on AP side without touching
userspace.
For backwards compatibility to code already using pre-auth over nl80211,
this patch adds a feature flag to disable this behavior, while it remains
enabled by default. An additional ext. feature flag is added to detect this
from userspace.
Thanks to Jouni for pointing out, that pre-auth frames should be handled as
ordinary data frames.
Signed-off-by: Markus Theil <markus.theil@tu-ilmenau.de>
Link: https://lore.kernel.org/r/20200312091055.54257-2-markus.theil@tu-ilmenau.de
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This allows communication with external entities.
It also required fixing up the netlink policy, since NLA_UNSPEC
attributes are no longer accepted.
Signed-off-by: Erel Geron <erelx.geron@intel.com>
[port to backports, inline the ID, use 29 as the ID as requested,
drop != NULL checks, reduce ifdefs]
Link: https://lore.kernel.org/r/20200305143212.c6e4c87d225b.I7ce60bf143e863dcdf0fb8040aab7168ba549b99@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The code is called MEDIA_BUS_FMT_Y14_1X14 and behaves just like
MEDIA_BUS_FMT_Y12_1X12 with two more bits.
Signed-off-by: Daniel Glöckner <dg@emlix.com>
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
The new format is called V4L2_PIX_FMT_Y14. Like V4L2_PIX_FMT_Y10 and
V4L2_PIX_FMT_Y12 it is stored in two bytes per pixel but has only two
unused bits at the top.
Signed-off-by: Daniel Glöckner <dg@emlix.com>
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Add an ioctl FS_IOC_GET_ENCRYPTION_NONCE which retrieves the nonce from
an encrypted file or directory. The nonce is the 16-byte random value
stored in the inode's encryption xattr. It is normally used together
with the master key to derive the inode's actual encryption key.
The nonces are needed by automated tests that verify the correctness of
the ciphertext on-disk. Except for the IV_INO_LBLK_64 case, there's no
way to replicate a file's ciphertext without knowing that file's nonce.
The nonces aren't secret, and the existing ciphertext verification tests
in xfstests retrieve them from disk using debugfs or dump.f2fs. But in
environments that lack these debugging tools, getting the nonces by
manually parsing the filesystem structure would be very hard.
To make this important type of testing much easier, let's just add an
ioctl that retrieves the nonce.
Link: https://lore.kernel.org/r/20200314205052.93294-2-ebiggers@kernel.org
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Eric Biggers <ebiggers@google.com>
This patch adds support for vlan stats to be included when dumping vlan
information. We have to dump them only when explicitly requested (thus the
flag below) because that disables the vlan range compression and will make
the dump significantly larger. In order to request the stats to be
included we add a new dump attribute called BRIDGE_VLANDB_DUMP_FLAGS which
can affect dumps with the following first flag:
- BRIDGE_VLANDB_DUMPF_STATS
The stats are intentionally nested and put into separate attributes to make
it easier for extending later since we plan to add per-vlan mcast stats,
drop stats and possibly STP stats. This is the last missing piece from the
new vlan API which makes the dumped vlan information complete.
A dump request which should include stats looks like:
[BRIDGE_VLANDB_DUMP_FLAGS] |= BRIDGE_VLANDB_DUMPF_STATS
A vlandb entry attribute with stats looks like:
[BRIDGE_VLANDB_ENTRY] = {
[BRIDGE_VLANDB_ENTRY_STATS] = {
[BRIDGE_VLANDB_STATS_RX_BYTES]
[BRIDGE_VLANDB_STATS_RX_PACKETS]
...
}
}
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch allows users to specify the stateful expression for the
elements in this set via NFTA_SET_EXPR. This new feature allows you to
turn on counters for all of the elements in this set.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This reverts the following commits:
8537f78647 ("netfilter: Introduce egress hook")
5418d3881e ("netfilter: Generalize ingress hook")
b030f194ae ("netfilter: Rename ingress hook include file")
>From the discussion in [0], the author's main motivation to add a hook
in fast path is for an out of tree kernel module, which is a red flag
to begin with. Other mentioned potential use cases like NAT{64,46}
is on future extensions w/o concrete code in the tree yet. Revert as
suggested [1] given the weak justification to add more hooks to critical
fast-path.
[0] https://lore.kernel.org/netdev/cover.1583927267.git.lukas@wunner.de/
[1] https://lore.kernel.org/netdev/20200318.011152.72770718915606186.davem@davemloft.net/
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Miller <davem@davemloft.net>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Nacked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
The following patchset contains Netfilter updates for net-next:
1) Use nf_flow_offload_tuple() to fetch flow stats, from Paul Blakey.
2) Add new xt_IDLETIMER hard mode, from Manoj Basapathi.
Follow up patch to clean up this new mode, from Dan Carpenter.
3) Add support for geneve tunnel options, from Xin Long.
4) Make sets built-in and remove modular infrastructure for sets,
from Florian Westphal.
5) Remove unused TEMPLATE_NULLS_VAL, from Li RongQing.
6) Statify nft_pipapo_get, from Chen Wandun.
7) Use C99 flexible-array member, from Gustavo A. R. Silva.
8) More descriptive variable names for bitwise, from Jeremy Sowden.
9) Four patches to add tunnel device hardware offload to the flowtable
infrastructure, from wenxu.
10) pipapo set supports for 8-bit grouping, from Stefano Brivio.
11) pipapo can switch between nibble and byte grouping, also from
Stefano.
12) Add AVX2 vectorized version of pipapo, from Stefano Brivio.
13) Update pipapo to be use it for single ranges, from Stefano.
14) Add stateful expression support to elements via control plane,
eg. counter per element.
15) Re-visit sysctls in unprivileged namespaces, from Florian Westphal.
15) Add new egress hook, from Lukas Wunner.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement helpers for PCS accessed via the MII bus using 802.3 clause
22 cycles, conforming to 802.3 clause 37 and Cisco SGMII specifications
for the advertisement word.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds support for manipulating vlan/tunnel mappings. The
tunnel ids are globally unique and are one per-vlan. There were two
trickier issues - first in order to support vlan ranges we have to
compute the current tunnel id in the following way:
- base tunnel id (attr) + current vlan id - starting vlan id
This is in line how the old API does vlan/tunnel mapping with ranges. We
already have the vlan range present, so it's redundant to add another
attribute for the tunnel range end. It's simply base tunnel id + vlan
range. And second to support removing mappings we need an out-of-band way
to tell the option manipulating function because there are no
special/reserved tunnel id values, so we use a vlan flag to denote the
operation is tunnel mapping removal.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a new option - BRIDGE_VLANDB_ENTRY_TUNNEL_ID which is used to dump
the tunnel id mapping. Since they're unique per vlan they can enter a
vlan range if they're consecutive, thus we can calculate the tunnel id
range map simply as: vlan range end id - vlan range start id. The
starting point is the tunnel id in BRIDGE_VLANDB_ENTRY_TUNNEL_ID. This
is similar to how the tunnel entries can be created in a range via the
old API (a vlan range maps to a tunnel range).
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
New Chrome OS keyboards have a "snip" key that is basically a selective
screenshot (allows a user to select an area of screen to be copied).
Allocate a keycode for it.
Signed-off-by: Rajat Jain <rajatja@google.com>
Reviewed-by: Harry Cutts <hcutts@chromium.org>
Link: https://lore.kernel.org/r/20200313180333.75011-1-rajatja@google.com
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Commit e687ad60af ("netfilter: add netfilter ingress hook after
handle_ing() under unique static key") introduced the ability to
classify packets on ingress.
Allow the same on egress. Position the hook immediately before a packet
is handed to tc and then sent out on an interface, thereby mirroring the
ingress order. This order allows marking packets in the netfilter
egress hook and subsequently using the mark in tc. Another benefit of
this order is consistency with a lot of existing documentation which
says that egress tc is performed after netfilter hooks.
Egress hooks already exist for the most common protocols, such as
NF_INET_LOCAL_OUT or NF_ARP_OUT, and those are to be preferred because
they are executed earlier during packet processing. However for more
exotic protocols, there is currently no provision to apply netfilter on
egress. A common workaround is to enslave the interface to a bridge and
use ebtables, or to resort to tc. But when the ingress hook was
introduced, consensus was that users should be given the choice to use
netfilter or tc, whichever tool suits their needs best:
https://lore.kernel.org/netdev/20150430153317.GA3230@salvia/
This hook is also useful for NAT46/NAT64, tunneling and filtering of
locally generated af_packet traffic such as dhclient.
There have also been occasional user requests for a netfilter egress
hook in the past, e.g.:
https://www.spinics.net/lists/netfilter/msg50038.html
Performance measurements with pktgen surprisingly show a speedup rather
than a slowdown with this commit:
* Without this commit:
Result: OK: 34240933(c34238375+d2558) usec, 100000000 (60byte,0frags)
2920481pps 1401Mb/sec (1401830880bps) errors: 0
* With this commit:
Result: OK: 33997299(c33994193+d3106) usec, 100000000 (60byte,0frags)
2941410pps 1411Mb/sec (1411876800bps) errors: 0
* Without this commit + tc egress:
Result: OK: 39022386(c39019547+d2839) usec, 100000000 (60byte,0frags)
2562631pps 1230Mb/sec (1230062880bps) errors: 0
* With this commit + tc egress:
Result: OK: 37604447(c37601877+d2570) usec, 100000000 (60byte,0frags)
2659259pps 1276Mb/sec (1276444320bps) errors: 0
* With this commit + nft egress:
Result: OK: 41436689(c41434088+d2600) usec, 100000000 (60byte,0frags)
2413320pps 1158Mb/sec (1158393600bps) errors: 0
Tested on a bare-metal Core i7-3615QM, each measurement was performed
three times to verify that the numbers are stable.
Commands to perform a measurement:
modprobe pktgen
echo "add_device lo@3" > /proc/net/pktgen/kpktgend_3
samples/pktgen/pktgen_bench_xmit_mode_queue_xmit.sh -i 'lo@3' -n 100000000
Commands for testing tc egress:
tc qdisc add dev lo clsact
tc filter add dev lo egress protocol ip prio 1 u32 match ip dst 4.3.2.1/32
Commands for testing nft egress:
nft add table netdev t
nft add chain netdev t co \{ type filter hook egress device lo priority 0 \; \}
nft add rule netdev t co ip daddr 4.3.2.1/32 drop
All testing was performed on the loopback interface to avoid distorting
measurements by the packet handling in the low-level Ethernet driver.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Mark usb_raw_io_flags_valid() and usb_raw_io_flags_zero() as inline to
fix the following warnings:
./usr/include/linux/usb/raw_gadget.h:69:12: warning: unused function 'usb_raw_io_flags_valid' [-Wunused-function]
./usr/include/linux/usb/raw_gadget.h:74:12: warning: unused function 'usb_raw_io_flags_zero' [-Wunused-function]
Reported-by: kernelci.org bot <bot@kernelci.org>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Link: https://lore.kernel.org/r/6206b80b3810f95bfe1d452de45596609a07b6ea.1584456779.git.andreyknvl@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
For BTI protection to be as comprehensive as possible, it is
desirable to have BTI enabled from process startup. If this is not
done, the process must use mprotect() to enable BTI for each of its
executable mappings, but this is painful to do in the libc startup
code. It's simpler and more sound to have the kernel do it
instead.
To this end, detect BTI support in the executable (or ELF
interpreter, as appropriate), via the
NT_GNU_PROGRAM_PROPERTY_TYPE_0 note, and tweak the initial prot
flags for the process' executable pages to include PROT_BTI as
appropriate.
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
ELF program properties will be needed for detecting whether to
enable optional architecture or ABI features for a new ELF process.
For now, there are no generic properties that we care about, so do
nothing unless CONFIG_ARCH_USE_GNU_PROPERTY=y.
Otherwise, the presence of properties using the PT_PROGRAM_PROPERTY
phdrs entry (if any), and notify each property to the arch code.
For now, the added code is not used.
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Pull the basic ELF definitions relating to the
NT_GNU_PROPERTY_TYPE_0 note from Yu-Cheng Yu's earlier x86 shstk
series.
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
1. Allow to disable gisa
2. protected virtual machines
Protected VMs (PVM) are KVM VMs, where KVM can't access the VM's
state like guest memory and guest registers anymore. Instead the
PVMs are mostly managed by a new entity called Ultravisor (UV),
which provides an API, so KVM and the PV can request management
actions.
PVMs are encrypted at rest and protected from hypervisor access
while running. They switch from a normal operation into protected
mode, so we can still use the standard boot process to load a
encrypted blob and then move it into protected mode.
Rebooting is only possible by passing through the unprotected/normal
mode and switching to protected again.
One mm related patch will go via Andrews mm tree ( mm/gup/writeback:
add callbacks for inaccessible pages)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABAgAGBQJeZf9tAAoJEBF7vIC1phx89J0P/iv3wCoMNDqAttnHa/UQFF04
njUadNYkAADDrsabIEOs9O+BE1/4BVspnIunE4+xw76p5M/7/g5eIhXWcLudhlnL
+XtvuEwz/2ffA9JWAAYNKB7cGqBM9BCC+iYzAF9ah6sPLmlDCoF+hRe0g+0tXSON
cklUJFril9bOcxd/MxrzFLcmipbxT/Z4/10eBY+FHcm6SQGOKAtJH0xL7X3PfPI5
L/6ZhML9exsj1Iplkrl8BomMRoYOrvfq/jMaZp9SwmfXaOKYmNU3a19MhzfZ593h
bfR92H8kZRy/TpBd7EnpxYGQ/n53HkUhFMhtqkkkeHW1rCo8ccwC4VfnXb+KqQp+
nJ8KieWG+OlKKFDuZPl5Gq+jQqjJfzchbyMTYnBNe+GPT5zg76tJXmQyDn5X9p3R
mfg+9ZEeEonMu7px93Ht1gLdPiC2gjRckjuBDPqMGEhG2z2SQ/MLri+WnproIQRa
TcE7rZBtuyrGFTq4M4dEcsUW02xnOaav6H57kkl8EwqYwgDHlqoUbt0AvLFyW07a
RlH7drmhKDwTJkcOhOLeLNM8Un6NvnsLZ8Lbcr9rRf9Z9Lpc+zW88BSwJ7MM/GH8
FEQM8Omnn8KAJTENpIm3bHHyvsi0kJEhl+c3Ila3QnYzXZbJ3ZDaJZngMAbUUnVl
YNeFyyALzOgVVBx4kvTm
=x6Hn
-----END PGP SIGNATURE-----
Merge tag 'kvm-s390-next-5.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD
KVM: s390: Features and Enhancements for 5.7 part1
1. Allow to disable gisa
2. protected virtual machines
Protected VMs (PVM) are KVM VMs, where KVM can't access the VM's
state like guest memory and guest registers anymore. Instead the
PVMs are mostly managed by a new entity called Ultravisor (UV),
which provides an API, so KVM and the PV can request management
actions.
PVMs are encrypted at rest and protected from hypervisor access
while running. They switch from a normal operation into protected
mode, so we can still use the standard boot process to load a
encrypted blob and then move it into protected mode.
Rebooting is only possible by passing through the unprotected/normal
mode and switching to protected again.
One mm related patch will go via Andrews mm tree ( mm/gup/writeback:
add callbacks for inaccessible pages)
It could take kvm->mmu_lock for an extended period of time when
enabling dirty log for the first time. The main cost is to clear
all the D-bits of last level SPTEs. This situation can benefit from
manual dirty log protect as well, which can reduce the mmu_lock
time taken. The sequence is like this:
1. Initialize all the bits of the dirty bitmap to 1 when enabling
dirty log for the first time
2. Only write protect the huge pages
3. KVM_GET_DIRTY_LOG returns the dirty bitmap info
4. KVM_CLEAR_DIRTY_LOG will clear D-bit for each of the leaf level
SPTEs gradually in small chunks
Under the Intel(R) Xeon(R) Gold 6152 CPU @ 2.10GHz environment,
I did some tests with a 128G windows VM and counted the time taken
of memory_global_dirty_log_start, here is the numbers:
VM Size Before After optimization
128G 460ms 10ms
Signed-off-by: Jay Zhou <jianjay.zhou@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
FDC registers FD_STATUS, FD_DATA, FD_DOR, FD_DIR and FD_DCR used to be
defined relative to FD_IOPORT, which is the FDC's base address, itself
a macro depending on the "fdc" local or global variable.
This patch changes this so that the register macros above now only
reference the address offset, and that the FDC's address is explicitly
passed in each call to fd_inb() and fd_outb(), thus removing the macro.
With this change there is no more implicit usage of the local/global
"fdc" variable.
One place in the ARM code used to check if the port was equal to FD_DOR,
this was changed to testing the register by applying a mask to the port,
as was already done in the sparc code.
There are still occurrences of fd_inb() and fd_outb() in the PARISC
code and these ones remain unaffected since they already used to work
with a base address and a register offset.
The sparc, m68k and parisc code could now be slightly cleaned up to
benefit from the macro definitions above instead of the equivalent
hard-coded values.
Link: https://lore.kernel.org/r/20200301195555.11154-6-w@1wt.eu
Cc: Ian Molton <spyro@f2s.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Denis Efremov <efremov@linux.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Netlink support of extended packet number cipher suites,
allows adding and updating XPN macsec interfaces.
Added support in:
* Creating interfaces with GCM-AES-XPN-128 and GCM-AES-XPN-256 suites.
* Setting and getting 64bit packet numbers with of SAs.
* Setting (only on SA creation) and getting ssci of SAs.
* Setting salt when installing a SAK.
Added 2 cipher suite identifiers according to 802.1AE-2018 table 14-1:
* MACSEC_CIPHER_ID_GCM_AES_XPN_128
* MACSEC_CIPHER_ID_GCM_AES_XPN_256
In addition, added 2 new netlink attribute types:
* MACSEC_SA_ATTR_SSCI
* MACSEC_SA_ATTR_SALT
Depends on: macsec: Support XPN frame handling - IEEE 802.1AEbw.
Signed-off-by: Era Mayflower <mayflowerera@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:
struct foo {
int stuff;
struct boo array[];
};
By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.
Also, notice that, dynamic memory allocations won't be affected by
this change:
"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]
Lastly, fix checkpatch.pl warning
WARNING: __aligned(size) is preferred over __attribute__((aligned(size)))
in net/bridge/netfilter/ebtables.c
This issue was found with the help of Coccinelle.
[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 7649773293 ("cxgb3/l2t: Fix undefined behaviour")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Like vxlan and erspan opts, geneve opts should also be supported in
nft_tunnel. The difference is geneve RFC (draft-ietf-nvo3-geneve-14)
allows a geneve packet to carry multiple geneve opts. So with this
patch, nftables/libnftnl would do:
# nft add table ip filter
# nft add chain ip filter input { type filter hook input priority 0 \; }
# nft add tunnel filter geneve_02 { type geneve\; id 2\; \
ip saddr 192.168.1.1\; ip daddr 192.168.1.2\; \
sport 9000\; dport 9001\; dscp 1234\; ttl 64\; flags 1\; \
opts \"1:1:34567890,2:2:12121212,3:3:1212121234567890\"\; }
# nft list tunnels table filter
table ip filter {
tunnel geneve_02 {
id 2
ip saddr 192.168.1.1
ip daddr 192.168.1.2
sport 9000
dport 9001
tos 18
ttl 64
flags 1
geneve opts 1:1:34567890,2:2:12121212,3:3:1212121234567890
}
}
v1->v2:
- no changes, just post it separately.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This is a snapshot of hardidletimer netfilter target.
This patch implements a hardidletimer Xtables target that can be
used to identify when interfaces have been idle for a certain period
of time.
Timers are identified by labels and are created when a rule is set
with a new label. The rules also take a timeout value (in seconds) as
an option. If more than one rule uses the same timer label, the timer
will be restarted whenever any of the rules get a hit.
One entry for each timer is created in sysfs. This attribute contains
the timer remaining for the timer to expire. The attributes are
located under the xt_idletimer class:
/sys/class/xt_idletimer/timers/<label>
When the timer expires, the target module sends a sysfs notification
to the userspace, which can then decide what to do (eg. disconnect to
save power)
Compared to IDLETIMER, HARDIDLETIMER can send notifications when
CPU is in suspend too, to notify the timer expiry.
v1->v2: Moved all functionality into IDLETIMER module to avoid
code duplication per comment from Florian.
Signed-off-by: Manoj Basapathi <manojbm@codeaurora.org>
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
USB Raw Gadget is a kernel module that provides a userspace interface for
the USB Gadget subsystem. Essentially it allows to emulate USB devices
from userspace. Enabled with CONFIG_USB_RAW_GADGET. Raw Gadget is
currently a strictly debugging feature and shouldn't be used in
production.
Raw Gadget is similar to GadgetFS, but provides a more low-level and
direct access to the USB Gadget layer for the userspace. The key
differences are:
1. Every USB request is passed to the userspace to get a response, while
GadgetFS responds to some USB requests internally based on the provided
descriptors. However note, that the UDC driver might respond to some
requests on its own and never forward them to the Gadget layer.
2. GadgetFS performs some sanity checks on the provided USB descriptors,
while Raw Gadget allows you to provide arbitrary data as responses to
USB requests.
3. Raw Gadget provides a way to select a UDC device/driver to bind to,
while GadgetFS currently binds to the first available UDC.
4. Raw Gadget uses predictable endpoint names (handles) across different
UDCs (as long as UDCs have enough endpoints of each required transfer
type).
5. Raw Gadget has ioctl-based interface instead of a filesystem-based one.
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Felipe Balbi <balbi@kernel.org>
When the RED Qdisc is currently configured to enable ECN, the RED algorithm
is used to decide whether a certain SKB should be marked. If that SKB is
not ECN-capable, it is early-dropped.
It is also possible to keep all traffic in the queue, and just mark the
ECN-capable subset of it, as appropriate under the RED algorithm. Some
switches support this mode, and some installations make use of it.
To that end, add a new RED flag, TC_RED_NODROP. When the Qdisc is
configured with this flag, non-ECT traffic is enqueued instead of being
early-dropped.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The qdiscs RED, GRED, SFQ and CHOKE use different subsets of the same pool
of global RED flags. These are passed in tc_red_qopt.flags. However none of
these qdiscs validate the flag field, and just copy it over wholesale to
internal structures, and later dump it back. (An exception is GRED, which
does validate for VQs -- however not for the main setup.)
A broken userspace can therefore configure a qdisc with arbitrary
unsupported flags, and later expect to see the flags on qdisc dump. The
current ABI therefore allows storage of several bits of custom data to
qdisc instances of the types mentioned above. How many bits, depends on
which flags are meaningful for the qdisc in question. E.g. SFQ recognizes
flags ECN and HARDDROP, and the rest is not interpreted.
If SFQ ever needs to support ADAPTATIVE, it needs another way of doing it,
and at the same time it needs to retain the possibility to store 6 bits of
uninterpreted data. Likewise RED, which adds a new flag later in this
patchset.
To that end, this patch adds a new function, red_get_flags(), to split the
passed flags of RED-like qdiscs to flags and user bits, and
red_validate_flags() to validate the resulting configuration. It further
adds a new attribute, TCA_RED_FLAGS, to pass arbitrary flags.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann says:
====================
pull-request: bpf-next 2020-03-13
The following pull-request contains BPF updates for your *net-next* tree.
We've added 86 non-merge commits during the last 12 day(s) which contain
a total of 107 files changed, 5771 insertions(+), 1700 deletions(-).
The main changes are:
1) Add modify_return attach type which allows to attach to a function via
BPF trampoline and is run after the fentry and before the fexit programs
and can pass a return code to the original caller, from KP Singh.
2) Generalize BPF's kallsyms handling and add BPF trampoline and dispatcher
objects to be visible in /proc/kallsyms so they can be annotated in
stack traces, from Jiri Olsa.
3) Extend BPF sockmap to allow for UDP next to existing TCP support in order
in order to enable this for BPF based socket dispatch, from Lorenz Bauer.
4) Introduce a new bpftool 'prog profile' command which attaches to existing
BPF programs via fentry and fexit hooks and reads out hardware counters
during that period, from Song Liu. Example usage:
bpftool prog profile id 337 duration 3 cycles instructions llc_misses
4228 run_cnt
3403698 cycles (84.08%)
3525294 instructions # 1.04 insn per cycle (84.05%)
13 llc_misses # 3.69 LLC misses per million isns (83.50%)
5) Batch of improvements to libbpf, bpftool and BPF selftests. Also addition
of a new bpf_link abstraction to keep in particular BPF tracing programs
attached even when the applicaion owning them exits, from Andrii Nakryiko.
6) New bpf_get_current_pid_tgid() helper for tracing to perform PID filtering
and which returns the PID as seen by the init namespace, from Carlos Neira.
7) Refactor of RISC-V JIT code to move out common pieces and addition of a
new RV32G BPF JIT compiler, from Luke Nelson.
8) Add gso_size context member to __sk_buff in order to be able to know whether
a given skb is GSO or not, from Willem de Bruijn.
9) Add a new bpf_xdp_output() helper which reuses XDP's existing perf RB output
implementation but can be called from tracepoint programs, from Eelco Chaudron.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce new helper that reuses existing xdp perf_event output
implementation, but can be called from raw_tracepoint programs
that receive 'struct xdp_buff *' as a tracepoint argument.
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/158348514556.2239.11050972434793741444.stgit@xdp-tutorial
New bpf helper bpf_get_ns_current_pid_tgid,
This helper will return pid and tgid from current task
which namespace matches dev_t and inode number provided,
this will allows us to instrument a process inside a container.
Signed-off-by: Carlos Neira <cneirabustos@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200304204157.58695-3-cneirabustos@gmail.com
Pull networking fixes from David Miller:
"It looks like a decent sized set of fixes, but a lot of these are one
liner off-by-one and similar type changes:
1) Fix netlink header pointer to calcular bad attribute offset
reported to user. From Pablo Neira Ayuso.
2) Don't double clear PHY interrupts when ->did_interrupt is set,
from Heiner Kallweit.
3) Add missing validation of various (devlink, nl802154, fib, etc.)
attributes, from Jakub Kicinski.
4) Missing *pos increments in various netfilter seq_next ops, from
Vasily Averin.
5) Missing break in of_mdiobus_register() loop, from Dajun Jin.
6) Don't double bump tx_dropped in veth driver, from Jiang Lidong.
7) Work around FMAN erratum A050385, from Madalin Bucur.
8) Make sure ARP header is pulled early enough in bonding driver,
from Eric Dumazet.
9) Do a cond_resched() during multicast processing of ipvlan and
macvlan, from Mahesh Bandewar.
10) Don't attach cgroups to unrelated sockets when in interrupt
context, from Shakeel Butt.
11) Fix tpacket ring state management when encountering unknown GSO
types. From Willem de Bruijn.
12) Fix MDIO bus PHY resume by checking mdio_bus_phy_may_suspend()
only in the suspend context. From Heiner Kallweit"
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (112 commits)
net: systemport: fix index check to avoid an array out of bounds access
tc-testing: add ETS scheduler to tdc build configuration
net: phy: fix MDIO bus PM PHY resuming
net: hns3: clear port base VLAN when unload PF
net: hns3: fix RMW issue for VLAN filter switch
net: hns3: fix VF VLAN table entries inconsistent issue
net: hns3: fix "tc qdisc del" failed issue
taprio: Fix sending packets without dequeueing them
net: mvmdio: avoid error message for optional IRQ
net: dsa: mv88e6xxx: Add missing mask of ATU occupancy register
net: memcg: fix lockdep splat in inet_csk_accept()
s390/qeth: implement smarter resizing of the RX buffer pool
s390/qeth: refactor buffer pool code
s390/qeth: use page pointers to manage RX buffer pool
seg6: fix SRv6 L2 tunnels to use IANA-assigned protocol number
net: dsa: Don't instantiate phylink for CPU/DSA ports unless needed
net/packet: tpacket_rcv: do not increment ring index on drop
sxgbe: Fix off by one in samsung driver strncpy size arg
net: caif: Add lockdep expression to RCU traversal primitive
MAINTAINERS: remove Sathya Perla as Emulex NIC maintainer
...
Send ETHTOOL_MSG_CHANNELS_NTF notification whenever channel counts of
a network device are modified using ETHTOOL_MSG_CHANNELS_SET netlink
message or ETHTOOL_SCHANNELS ioctl request.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement CHANNELS_SET netlink request to set channel counts of a network
device. These are traditionally set with ETHTOOL_SCHANNELS ioctl request.
Like the ioctl implementation, the generic ethtool code checks if supplied
values do not exceed driver defined limits; if they do, first offending
attribute is reported using extack. Checks preventing removing channels
used for RX indirection table or zerocopy AF_XDP socket are also
implemented.
Move ethtool_get_max_rxfh_channel() helper into common.c so that it can be
used by both ioctl and netlink code.
v2:
- fix netdev reference leak in error path (found by Jakub Kicinsky)
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement CHANNELS_GET request to get channel counts of a network device.
These are traditionally available via ETHTOOL_GCHANNELS ioctl request.
Omit attributes for channel types which are not supported by driver or
device (zero reported for maximum).
v2: (all suggested by Jakub Kicinski)
- minor cleanup in channels_prepare_data()
- more descriptive channels_reply_size()
- omit attributes with zero max count
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Send ETHTOOL_MSG_RINGS_NTF notification whenever ring sizes of a network
device are modified using ETHTOOL_MSG_RINGS_SET netlink message or
ETHTOOL_SRINGPARAM ioctl request.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement RINGS_SET netlink request to set ring sizes of a network device.
These are traditionally set with ETHTOOL_SRINGPARAM ioctl request.
Like the ioctl implementation, the generic ethtool code checks if supplied
values do not exceed driver defined limits; if they do, first offending
attribute is reported using extack.
v2:
- fix netdev reference leak in error path (found by Jakub Kicinsky)
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement RINGS_GET request to get ring sizes of a network device. These
are traditionally available via ETHTOOL_GRINGPARAM ioctl request.
Omit attributes for ring types which are not supported by driver or device
(zero reported for maximum).
v2: (all suggested by Jakub Kicinski)
- minor cleanup in rings_prepare_data()
- more descriptive rings_reply_size()
- omit attributes with zero max size
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Send ETHTOOL_MSG_PRIVFLAGS_NTF notification whenever private flags of
a network device are modified using ETHTOOL_MSG_PRIVFLAGS_SET netlink
message or ETHTOOL_SPFLAGS ioctl request.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement PRIVFLAGS_SET netlink request to set private flags of a network
device. These are traditionally set with ETHTOOL_SPFLAGS ioctl request.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement PRIVFLAGS_GET request to get private flags for a network device.
These are traditionally available via ETHTOOL_GPFLAGS ioctl request.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Send ETHTOOL_MSG_FEATURES_NTF notification whenever network device features
are modified using ETHTOOL_MSG_FEATURES_SET netlink message, ethtool ioctl
request or any other way resulting in call to netdev_update_features() or
netdev_change_features()
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement FEATURES_SET netlink request to set network device features.
These are traditionally set using ETHTOOL_SFEATURES ioctl request.
Actual change is subject to netdev_change_features() sanity checks so that
it can differ from what was requested. Unlike with most other SET requests,
in addition to error code and optional extack, kernel provides an optional
reply message (ETHTOOL_MSG_FEATURES_SET_REPLY) in the same format but with
different semantics: information about difference between user request and
actual result and difference between old and new state of dev->features.
This reply message can be suppressed by setting ETHTOOL_FLAG_OMIT_REPLY
flag in request header.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement FEATURES_GET request to get network device features. These are
traditionally available via ETHTOOL_GFEATURES ioctl request.
v2:
- style cleanup suggested by Jakub Kicinski
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since we expose the definition to the dt-bindings we need to keep those
definitions in sync. To address this the patch adds a simple cross
reference to the dt-bindings.
Signed-off-by: Marco Felsch <m.felsch@pengutronix.de>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
The Internet Assigned Numbers Authority (IANA) has recently assigned
a protocol number value of 143 for Ethernet [1].
Before this assignment, encapsulation mechanisms such as Segment Routing
used the IPv6-NoNxt protocol number (59) to indicate that the encapsulated
payload is an Ethernet frame.
In this patch, we add the definition of the Ethernet protocol number to the
kernel headers and update the SRv6 L2 tunnels to use it.
[1] https://www.iana.org/assignments/protocol-numbers/protocol-numbers.xhtml
Signed-off-by: Paolo Lungaroni <paolo.lungaroni@cnit.it>
Reviewed-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Acked-by: Ahmed Abdelsalam <ahmed.abdelsalam@gssi.it>
Signed-off-by: David S. Miller <davem@davemloft.net>
This just syncs the header it with the liburing version, so there's no
confusion on the license of the header parts.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We don't need a special structure just for batch descriptors. The
layout matches the general form for other descriptors.
Merge the desc_list_addr field into the union of other aliases for
the the third quadword in the structure.
Create a union to alias "xfer_size" with "desc_count".
Signed-off-by: Tony Luck <tony.luck@intel.com>
Acked-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/158387868208.35922.5895104426944263789.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
We have IORING_OP_PROVIDE_BUFFERS, but the only way to remove buffers
is to trigger IO on them. The usual case of shrinking a buffer pool
would be to just not replenish the buffers when IO completes, and
instead just free it. But it may be nice to have a way to manually
remove a number of buffers from a given group, and
IORING_OP_REMOVE_BUFFERS provides that functionality.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
If a server process has tons of pending socket connections, generally
it uses epoll to wait for activity. When the socket is ready for reading
(or writing), the task can select a buffer and issue a recv/send on the
given fd.
Now that we have fast (non-async thread) support, a task can have tons
of pending reads or writes pending. But that means they need buffers to
back that data, and if the number of connections is high enough, having
them preallocated for all possible connections is unfeasible.
With IORING_OP_PROVIDE_BUFFERS, an application can register buffers to
use for any request. The request then sets IOSQE_BUFFER_SELECT in the
sqe, and a given group ID in sqe->buf_group. When the fd becomes ready,
a free buffer from the specified group is selected. If none are
available, the request is terminated with -ENOBUFS. If successful, the
CQE on completion will contain the buffer ID chosen in the cqe->flags
member, encoded as:
(buffer_id << IORING_CQE_BUFFER_SHIFT) | IORING_CQE_F_BUFFER;
Once a buffer has been consumed by a request, it is no longer available
and must be registered again with IORING_OP_PROVIDE_BUFFERS.
Requests need to support this feature. For now, IORING_OP_READ and
IORING_OP_RECV support it. This is checked on SQE submission, a CQE with
res == -EOPNOTSUPP will be posted if attempted on unsupported requests.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
IORING_OP_PROVIDE_BUFFERS uses the buffer registration infrastructure to
support passing in an addr/len that is associated with a buffer ID and
buffer group ID. The group ID is used to index and lookup the buffers,
while the buffer ID can be used to notify the application which buffer
in the group was used. The addr passed in is the starting buffer address,
and length is each buffer length. A number of buffers to add with can be
specified, in which case addr is incremented by length for each addition,
and each buffer increments the buffer ID specified.
No validation is done of the buffer ID. If the application provides
buffers within the same group with identical buffer IDs, then it'll have
a hard time telling which buffer ID was used. The only restriction is
that the buffer ID can be a max of 16-bits in size, so USHRT_MAX is the
maximum ID that can be used.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Add TCP_NLA_BYTES_NOTSENT to SCM_TIMESTAMPING_OPT_STATS that reports
bytes in the write queue but not sent. This is the same metric as
what is exported with tcp_info.tcpi_notsent_bytes.
Signed-off-by: Yousuk Seung <ysseung@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, user who is adding an action expects HW to report stats,
however it does not have exact expectations about the stats types.
That is aligned with TCA_ACT_HW_STATS_TYPE_ANY.
Allow user to specify the type of HW stats for an action and require it.
Pass the information down to flow_offload layer.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The restriction introduced in 7a0df7fbc1 ("seccomp: Make NEW_LISTENER and
TSYNC flags exclusive") is mostly artificial: there is enough information
in a seccomp user notification to tell which thread triggered a
notification. The reason it was introduced is because TSYNC makes the
syscall return a thread-id on failure, and NEW_LISTENER returns an fd, and
there's no way to distinguish between these two cases (well, I suppose the
caller could check all fds it has, then do the syscall, and if the return
value was an fd that already existed, then it must be a thread id, but
bleh).
Matthew would like to use these two flags together in the Chrome sandbox
which wants to use TSYNC for video drivers and NEW_LISTENER to proxy
syscalls.
So, let's fix this ugliness by adding another flag, TSYNC_ESRCH, which
tells the kernel to just return -ESRCH on a TSYNC error. This way,
NEW_LISTENER (and any subsequent seccomp() commands that want to return
positive values) don't conflict with each other.
Suggested-by: Matthew Denton <mpdenton@google.com>
Signed-off-by: Tycho Andersen <tycho@tycho.ws>
Link: https://lore.kernel.org/r/20200304180517.23867-1-tycho@tycho.ws
Signed-off-by: Kees Cook <keescook@chromium.org>
When multiple programs are attached, each program receives the return
value from the previous program on the stack and the last program
provides the return value to the attached function.
The fmod_ret bpf programs are run after the fentry programs and before
the fexit programs. The original function is only called if all the
fmod_ret programs return 0 to avoid any unintended side-effects. The
success value, i.e. 0 is not currently configurable but can be made so
where user-space can specify it at load time.
For example:
int func_to_be_attached(int a, int b)
{ <--- do_fentry
do_fmod_ret:
<update ret by calling fmod_ret>
if (ret != 0)
goto do_fexit;
original_function:
<side_effects_happen_here>
} <--- do_fexit
The fmod_ret program attached to this function can be defined as:
SEC("fmod_ret/func_to_be_attached")
int BPF_PROG(func_name, int a, int b, int ret)
{
// This will skip the original function logic.
return 1;
}
The first fmod_ret program is passed 0 in its return argument.
Signed-off-by: KP Singh <kpsingh@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200304191853.1529-4-kpsingh@chromium.org
bdi.
- Extend dm-bio-record to track additional struct bio members needed
by DM integrity target.
- Fix DM core to properly advertise that a device is suspended during
unload (between the presuspend and postsuspend hooks). This change
is a prereq for related DM integrity and DM writecache fixes. It
elevates DM integrity's 'suspending' state tracking to DM core.
- Four stable fixes for DM integrity target.
- Fix crash in DM cache target due to incorrect work item cancelling.
- Fix DM thin metadata lockdep warning that was introduced during 5.6
merge window.
- Fix DM zoned target's chunk work refcounting that regressed during
recent conversion to refcount_t.
- Bump the minor version for DM core and all target versions that have
seen interface changes or important fixes during the 5.6 cycle.
-----BEGIN PGP SIGNATURE-----
iQFHBAABCAAxFiEEJfWUX4UqZ4x1O2wixSPxCi2dA1oFAl5fwPgTHHNuaXR6ZXJA
cmVkaGF0LmNvbQAKCRDFI/EKLZ0DWnUJB/9+36J0Y3HYdUIKM8P5CBH0tdkMqSo4
BFm3Og4Xo1GA5xodNptgD+QoLXy3VWkekUsLvaLgOlPq7haPvHkME20vUMuO1l46
QMNR2VXciYGgV0+CFlpXL2oMr0ZEc1hkt7ZpzYaw1OIk6Mo0tEEDrSo0rvmXafPf
W4veTWRL4HfN8fy3NpQNJ4xBQs4Iw4VC20FPFIOvzA1EGk7VGGaP1mGKmOnjxo/O
o3lEH8NpQgwxwST7d4T6DPdeu3aTifnjdY8/h8mFGpQnxOiZ/wZkheKWwTCNE8r9
oeVY07qVxNAHyC8G/SmAL5KnC6En1hq5jA2M4xh9guUI2k8YbON571n+
=iAS0
-----END PGP SIGNATURE-----
Merge tag 'for-5.6/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mike Snitzer:
- Fix request-based DM's congestion_fn and actually wire it up to the
bdi.
- Extend dm-bio-record to track additional struct bio members needed by
DM integrity target.
- Fix DM core to properly advertise that a device is suspended during
unload (between the presuspend and postsuspend hooks). This change is
a prereq for related DM integrity and DM writecache fixes. It
elevates DM integrity's 'suspending' state tracking to DM core.
- Four stable fixes for DM integrity target.
- Fix crash in DM cache target due to incorrect work item cancelling.
- Fix DM thin metadata lockdep warning that was introduced during 5.6
merge window.
- Fix DM zoned target's chunk work refcounting that regressed during
recent conversion to refcount_t.
- Bump the minor version for DM core and all target versions that have
seen interface changes or important fixes during the 5.6 cycle.
* tag 'for-5.6/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm: bump version of core and various targets
dm: fix congested_fn for request-based device
dm integrity: use dm_bio_record and dm_bio_restore
dm bio record: save/restore bi_end_io and bi_integrity
dm zoned: Fix reference counter initial value of chunk works
dm writecache: verify watermark during resume
dm: report suspended device during destroy
dm thin metadata: fix lockdep complaint
dm cache: fix a crash due to incorrect work item cancelling
dm integrity: fix invalid table returned due to argument count mismatch
dm integrity: fix a deadlock due to offloading to an incorrect workqueue
dm integrity: fix recalculation when moving from journal mode to bitmap mode
Switch BPF UAPI constants, previously defined as #define macro, to anonymous
enum values. This preserves constants values and behavior in expressions, but
has added advantaged of being captured as part of DWARF and, subsequently, BTF
type info. Which, in turn, greatly improves usefulness of generated vmlinux.h
for BPF applications, as it will not require BPF users to copy/paste various
flags and constants, which are frequently used with BPF helpers. Only those
constants that are used/useful from BPF program side are converted.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200303003233.3496043-2-andriin@fb.com
BPF programs may want to know whether an skb is gso. The canonical
answer is skb_is_gso(skb), which tests that gso_size != 0.
Expose this field in the same manner as gso_segs. That field itself
is not a sufficient signal, as the comment in skb_shared_info makes
clear: gso_segs may be zero, e.g., from dodgy sources.
Also prepare net/bpf/test_run for upcoming BPF_PROG_TEST_RUN tests
of the feature.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200303200503.226217-2-willemdebruijn.kernel@gmail.com
Currently mlx5 PCI PF and VF devlink devices register their ports as
physical port in non-representors mode.
Introduce a new port flavour as virtual so that virtual devices can
register 'virtual' flavour to make it more clear to users.
An example of one PCI PF and 2 PCI virtual functions, each having
one devlink port.
$ devlink port show
pci/0000:06:00.0/1: type eth netdev ens2f0 flavour physical port 0
pci/0000:06:00.2/1: type eth netdev ens2f2 flavour virtual port 0
pci/0000:06:00.3/1: type eth netdev ens2f3 flavour virtual port 0
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Changes made during the 5.6 cycle warrant bumping the version number
for DM core and the targets modified by this commit.
It should be noted that dm-thin, dm-crypt and dm-raid already had
their target version bumped during the 5.6 merge window.
Signed-off-by; Mike Snitzer <snitzer@redhat.com>
Currently io_uring tries any request in a non-blocking manner, if it can,
and then retries from a worker thread if we get -EAGAIN. Now that we have
a new and fancy poll based retry backend, use that to retry requests if
the file supports it.
This means that, for example, an IORING_OP_RECVMSG on a socket no longer
requires an async thread to complete the IO. If we get -EAGAIN reading
from the socket in a non-blocking manner, we arm a poll handler for
notification on when the socket becomes readable. When it does, the
pending read is executed directly by the task again, through the io_uring
task work handlers. Not only is this faster and more efficient, it also
means we're not generating potentially tons of async threads that just
sit and block, waiting for the IO to complete.
The feature is marked with IORING_FEAT_FAST_POLL, meaning that async
pollable IO is fast, and that poll<link>other_op is fast as well.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Add support for splice(2).
- output file is specified as sqe->fd, so it's handled by generic code
- hash_reg_file handled by generic code as well
- len is 32bit, but should be fine
- the fd_in is registered file, when SPLICE_F_FD_IN_FIXED is set, which
is a splice flag (i.e. sqe->splice_flags).
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:
struct foo {
int stuff;
struct boo array[];
};
By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.
Also, notice that, dynamic memory allocations won't be affected by
this change:
"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]
This issue was found with the help of Coccinelle.
[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 7649773293 ("cxgb3/l2t: Fix undefined behaviour")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:
struct foo {
int stuff;
struct boo array[];
};
By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.
Also, notice that, dynamic memory allocations won't be affected by
this change:
"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]
This issue was found with the help of Coccinelle.
[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 7649773293 ("cxgb3/l2t: Fix undefined behaviour")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexei Starovoitov says:
====================
pull-request: bpf-next 2020-02-28
The following pull-request contains BPF updates for your *net-next* tree.
We've added 41 non-merge commits during the last 7 day(s) which contain
a total of 49 files changed, 1383 insertions(+), 499 deletions(-).
The main changes are:
1) BPF and Real-Time nicely co-exist.
2) bpftool feature improvements.
3) retrieve bpf_sk_storage via INET_DIAG.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch will dump out the bpf_sk_storages of a sk
if the request has the INET_DIAG_REQ_SK_BPF_STORAGES nlattr.
An array of SK_DIAG_BPF_STORAGE_REQ_MAP_FD can be specified in
INET_DIAG_REQ_SK_BPF_STORAGES to select which bpf_sk_storage to dump.
If no map_fd is specified, all bpf_sk_storages of a sk will be dumped.
bpf_sk_storages can be added to the system at runtime. It is difficult
to find a proper static value for cb->min_dump_alloc.
This patch learns the nlattr size required to dump the bpf_sk_storages
of a sk. If it happens to be the very first nlmsg of a dump and it
cannot fit the needed bpf_sk_storages, it will try to expand the
skb by "pskb_expand_head()".
Instead of expanding it in inet_sk_diag_fill(), it is expanded at a
sleepable context in __inet_diag_dump() so __GFP_DIRECT_RECLAIM can
be used. In __inet_diag_dump(), it will retry as long as the
skb is empty and the cb->min_dump_alloc becomes larger than before.
cb->min_dump_alloc is bounded by KMALLOC_MAX_SIZE. The min_dump_alloc
is also changed from 'u16' to 'u32' to accommodate a sk that may have
a few large bpf_sk_storages.
The updated cb->min_dump_alloc will also be used to allocate the skb in
the next dump. This logic already exists in netlink_dump().
Here is the sample output of a locally modified 'ss' and it could be made
more readable by using BTF later:
[root@arch-fb-vm1 ~]# ss --bpf-map-id 14 --bpf-map-id 13 -t6an 'dst [::1]:8989'
State Recv-Q Send-Q Local Address:Port Peer Address:PortProcess
ESTAB 0 0 [::1]:51072 [::1]:8989
bpf_map_id:14 value:[ 3feb ]
bpf_map_id:13 value:[ 3f ]
ESTAB 0 0 [::1]:51070 [::1]:8989
bpf_map_id:14 value:[ 3feb ]
bpf_map_id:13 value:[ 3f ]
[root@arch-fb-vm1 ~]# ~/devshare/github/iproute2/misc/ss --bpf-maps -t6an 'dst [::1]:8989'
State Recv-Q Send-Q Local Address:Port Peer Address:Port Process
ESTAB 0 0 [::1]:51072 [::1]:8989
bpf_map_id:14 value:[ 3feb ]
bpf_map_id:13 value:[ 3f ]
bpf_map_id:12 value:[ 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000... total:65407 ]
ESTAB 0 0 [::1]:51070 [::1]:8989
bpf_map_id:14 value:[ 3feb ]
bpf_map_id:13 value:[ 3f ]
bpf_map_id:12 value:[ 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000... total:65407 ]
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20200225230427.1976129-1-kafai@fb.com
This patch adds INET_DIAG support to bpf_sk_storage.
1. Although this series adds bpf_sk_storage diag capability to inet sk,
bpf_sk_storage is in general applicable to all fullsock. Hence, the
bpf_sk_storage logic will operate on SK_DIAG_* nlattr. The caller
will pass in its specific nesting nlattr (e.g. INET_DIAG_*) as
the argument.
2. The request will be like:
INET_DIAG_REQ_SK_BPF_STORAGES (nla_nest) (defined in latter patch)
SK_DIAG_BPF_STORAGE_REQ_MAP_FD (nla_put_u32)
SK_DIAG_BPF_STORAGE_REQ_MAP_FD (nla_put_u32)
......
Considering there could have multiple bpf_sk_storages in a sk,
instead of reusing INET_DIAG_INFO ("ss -i"), the user can select
some specific bpf_sk_storage to dump by specifying an array of
SK_DIAG_BPF_STORAGE_REQ_MAP_FD.
If no SK_DIAG_BPF_STORAGE_REQ_MAP_FD is specified (i.e. an empty
INET_DIAG_REQ_SK_BPF_STORAGES), it will dump all bpf_sk_storages
of a sk.
3. The reply will be like:
INET_DIAG_BPF_SK_STORAGES (nla_nest) (defined in latter patch)
SK_DIAG_BPF_STORAGE (nla_nest)
SK_DIAG_BPF_STORAGE_MAP_ID (nla_put_u32)
SK_DIAG_BPF_STORAGE_MAP_VALUE (nla_reserve_64bit)
SK_DIAG_BPF_STORAGE (nla_nest)
SK_DIAG_BPF_STORAGE_MAP_ID (nla_put_u32)
SK_DIAG_BPF_STORAGE_MAP_VALUE (nla_reserve_64bit)
......
4. Unlike other INET_DIAG info of a sk which is pretty static, the size
required to dump the bpf_sk_storage(s) of a sk is dynamic as the
system adding more bpf_sk_storage_map. It is hard to set a static
min_dump_alloc size.
Hence, this series learns it at the runtime and adjust the
cb->min_dump_alloc as it iterates all sk(s) of a system. The
"unsigned int *res_diag_size" in bpf_sk_storage_diag_put()
is for this purpose.
The next patch will update the cb->min_dump_alloc as it
iterates the sk(s).
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20200225230421.1975729-1-kafai@fb.com
The INET_DIAG_REQ_BYTECODE nlattr is currently re-found every time when
the "dump()" is re-started.
In a latter patch, it will also need to parse the new
INET_DIAG_REQ_SK_BPF_STORAGES nlattr to learn the map_fds. Thus, this
patch takes this chance to store the parsed nlattr in cb->data
during the "start" time of a dump.
By doing this, the "bc" argument also becomes unnecessary
and is removed. Also, the two copies of the INET_DIAG_REQ_BYTECODE
parsing-audit logic between compat/current version can be
consolidated to one.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20200225230415.1975555-1-kafai@fb.com
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:
struct foo {
int stuff;
struct boo array[];
};
By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.
Also, notice that, dynamic memory allocations won't be affected by
this change:
"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]
This issue was found with the help of Coccinelle.
[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 7649773293 ("cxgb3/l2t: Fix undefined behaviour")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20200227001744.GA3317@embeddedor
Now that everything is in place, we can announce the feature.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
diag 308 subcode 0 and 1 require several KVM and Ultravisor interactions.
Specific to these "soft" reboots are
* The "unshare all" UVC
* The "prepare for reset" UVC
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
[borntraeger@de.ibm.com: patch merging, splitting, fixing]
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Now that we can't access guest memory anymore, we have a dedicated
satellite block that's a bounce buffer for instruction data.
We re-use the memop interface to copy the instruction data to / from
userspace. This lets us re-use a lot of QEMU code which used that
interface to make logical guest memory accesses which are not possible
anymore in protected mode anyway.
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
[borntraeger@de.ibm.com: patch merging, splitting, fixing]
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
This contains 3 main changes:
1. changes in SIE control block handling for secure guests
2. helper functions for create/destroy/unpack secure guests
3. KVM_S390_PV_COMMAND ioctl to allow userspace dealing with secure
machines
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
[borntraeger@de.ibm.com: patch merging, splitting, fixing]
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
If driver passed along the cookie, push it through Netlink.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow driver to indicate cookie metadata for registered traps.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
* lots of small documentation fixes, from Jérôme Pouiller
* beacon protection (BIGTK) support from Jouni Malinen
* some initial code for TID configuration, from Tamizh chelvam
* I reverted some new API before it's actually used, because
it's wrong to mix controlled port and preauth
* a few other cleanups/fixes
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEH1e1rEeCd0AIMq6MB8qZga/fl8QFAl5UFlsACgkQB8qZga/f
l8SwNQ//Y/dELGODumDxY03l/Bj/+XL7HYl3Yn+J8mt2mYPC3zjTjvcRJBApiAog
1Oyd75fd+EqjxDUT+ngN8uJLQ/yCVsdwfpZhwV7VanAOuLI9aYkIXnai/Qs96rj3
yDIlrBVsmfsaxf/2e9UmsLmeUSm7C5s1EzaAIwoRGvcUH0pbH9WYAXF1QV+8fmXa
yoXuHV5Bv+wOW2xWqWJsFpoV109AW24pwJm0vlILcpFP/jno2GsRvwEpnC/GJhEA
4+jfEj0KlFkOewp0/HcqrUJp4yDEBhnhTTYgDL3hSWgKRVorKqY4/QmpKQCmpVQk
Qrb6k+TrnLmKBQKdqfd+PKAEC9U/9Wjg0KLPyc9btBGFNSUG3QoDigzxxIvSlW6w
2vyanDW6780FTIi8sA7sq1cBLosIyoFG44YYwbMidVtxhBk1LMRvetNAOnAJeycp
Abbp/A2EdvzM+ZMNMRwWlsgig6WkGY7jy/zpcmQUdALM+yT2o7D5ZwxE5pa2ggds
jf0eER1vVCEpOL70swNSZbAnDNCzBzTN64GX9gQIjdVT+nMUYQXwuOgEmho1FshD
bgZz4PcaOCCSTice9GYCC3C9OqXBsE2DyBwytYYzahyDiQH13Iz6wEq/+tIIjzCN
KKRdD12TXaPobLwv5zI3kogJ1I4P1fa6RZYpkngLpMbQrQ7qiDI=
=x0rf
-----END PGP SIGNATURE-----
Merge tag 'mac80211-next-for-net-next-2020-02-24' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
Johannes Berg says:
====================
A new set of changes:
* lots of small documentation fixes, from Jérôme Pouiller
* beacon protection (BIGTK) support from Jouni Malinen
* some initial code for TID configuration, from Tamizh chelvam
* I reverted some new API before it's actually used, because
it's wrong to mix controlled port and preauth
* a few other cleanups/fixes
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Special handling is needed in bareudp module for IP & MPLS as they
support more than one ethertypes.
MPLS has 2 ethertypes. 0x8847 for MPLS unicast and 0x8848 for MPLS multicast.
While decapsulating MPLS packet from UDP packet the tunnel destination IP
address is checked to determine the ethertype. The ethertype of the packet
will be set to 0x8848 if the tunnel destination IP address is a multicast
IP address. The ethertype of the packet will be set to 0x8847 if the
tunnel destination IP address is a unicast IP address.
IP has 2 ethertypes.0x0800 for IPV4 and 0x86dd for IPv6. The version
field of the IP header tunnelled will be checked to determine the ethertype.
This special handling to tunnel additional ethertypes will be disabled
by default and can be enabled using a flag called multiproto. This flag can
be used only with ethertypes 0x8847 and 0x0800.
Signed-off-by: Martin Varghese <martin.varghese@nokia.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The Bareudp tunnel module provides a generic L3 encapsulation
tunnelling module for tunnelling different protocols like MPLS,
IP,NSH etc inside a UDP tunnel.
Signed-off-by: Martin Varghese <martin.varghese@nokia.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This exposes the white balance configuration of the ISC as v4l2 controls
into userspace.
There are 8 controls available:
4 gain controls, sliders, for each of the BAYER components: R, B, GR, GB.
These gains are multipliers for each component, in format unsigned 0:4:9
with a default value of 512 (1.0 multiplier).
4 offset controls, sliders, for each of the BAYER components: R, B, GR, GB.
These offsets are added/substracted from each component, in format signed
1:12:0 with a default value of 0 (+/- 0)
To expose this to userspace, added 8 custom controls, in an auto cluster.
To summarize the functionality:
The auto cluster switch is the auto white balance control, and it works
like this:
AWB == 1: autowhitebalance is on, the do_white_balance button is inactive,
the gains/offsets are inactive, but volatile and readable.
Thus, the results of the whitebalance algorithm are available to userspace
to read at any time.
AWB == 0: autowhitebalance is off, cluster is in manual mode, user can
configure the gain/offsets directly. More than that, if the
do_white_balance button is pressed, the driver will perform
one-time-adjustment, (preferably with color checker card) and the userspace
can read again the new values.
With this feature, the userspace can save the coefficients and reinstall
them for example after reboot or reprobing the driver.
[hverkuil: fix checkpatch warning]
[hverkuil: minor spacing adjustments in the functionality description]
Signed-off-by: Eugen Hristev <eugen.hristev@microchip.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
This patch adds support to configure per TID RTSCTS control
configuration to enable/disable through the
NL80211_TID_CONFIG_ATTR_RTSCTS_CTRL attribute.
Signed-off-by: Tamizh chelvam <tamizhr@codeaurora.org>
Link: https://lore.kernel.org/r/1579506687-18296-5-git-send-email-tamizhr@codeaurora.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This patch adds support to configure per TID AMPDU control
configuration to enable/disable aggregation through the
NL80211_TID_CONFIG_ATTR_AMPDU_CTRL attribute.
Signed-off-by: Tamizh chelvam <tamizhr@codeaurora.org>
Link: https://lore.kernel.org/r/1579506687-18296-4-git-send-email-tamizhr@codeaurora.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This patch adds support to configure per TID retry configuration
through the NL80211_TID_CONFIG_ATTR_RETRY_SHORT and
NL80211_TID_CONFIG_ATTR_RETRY_LONG attributes. This TID specific
retry configuration will have more precedence than phy level
configuration.
Signed-off-by: Tamizh chelvam <tamizhr@codeaurora.org>
Link: https://lore.kernel.org/r/1579506687-18296-3-git-send-email-tamizhr@codeaurora.org
[rebase completely on top of my previous API changes]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Make some changes to the TID-config API:
* use u16 in nl80211 (only, and restrict to using 8 bits for now),
to avoid issues in the future if we ever want to use higher TIDs.
* reject empty TIDs mask (via netlink policy)
* change feature advertising to not use extended feature flags but
have own mechanism for this, which simplifies the code
* fix all variable names from 'tid' to 'tids' since it's a mask
* change to cfg80211_ name prefixes, not ieee80211_
* fix some minor docs/spelling things.
Change-Id: Ia234d464b3f914cdeab82f540e018855be580dce
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Add the new NL80211_CMD_SET_TID_CONFIG command to support
data TID specific configuration. Per TID configuration is
passed in the nested NL80211_ATTR_TID_CONFIG attribute.
This patch adds support to configure per TID noack policy
through the NL80211_TID_CONFIG_ATTR_NOACK attribute.
Signed-off-by: Tamizh chelvam <tamizhr@codeaurora.org>
Link: https://lore.kernel.org/r/1579506687-18296-2-git-send-email-tamizhr@codeaurora.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
IEEE P802.11-REVmd/D3.0 adds support for protecting Beacon frames using
a new set of keys (BIGTK; key index 6..7) similarly to the way
group-addressed Robust Management frames are protected (IGTK; key index
4..5). Extend cfg80211 and nl80211 to allow the new BIGTK to be
configured. Add an extended feature flag to indicate driver support for
the new key index values to avoid array overflows in driver
implementations and also to indicate to user space when this
functionality is available.
Signed-off-by: Jouni Malinen <jouni@codeaurora.org>
Link: https://lore.kernel.org/r/20200222132548.20835-2-jouni@codeaurora.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This reverts commit 8c3ed7aa2b.
As Jouni points out, there's really no need for this, since the
RSN pre-authentication frames are normal data frames, not port
control frames (locally).
We can still revert this now since it hasn't actually gone beyond
-next.
Fixes: 8c3ed7aa2b ("nl80211: add src and dst addr attributes for control port tx/rx")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Link: https://lore.kernel.org/r/20200224101910.b746e263287a.I9eb15d6895515179d50964dec3550c9dc784bb93@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Daniel Borkmann says:
====================
pull-request: bpf-next 2020-02-21
The following pull-request contains BPF updates for your *net-next* tree.
We've added 25 non-merge commits during the last 4 day(s) which contain
a total of 33 files changed, 2433 insertions(+), 161 deletions(-).
The main changes are:
1) Allow for adding TCP listen sockets into sock_map/hash so they can be used
with reuseport BPF programs, from Jakub Sitnicki.
2) Add a new bpf_program__set_attach_target() helper for adding libbpf support
to specify the tracepoint/function dynamically, from Eelco Chaudron.
3) Add bpf_read_branch_records() BPF helper which helps use cases like profile
guided optimizations, from Daniel Xu.
4) Enable bpf_perf_event_read_value() in all tracing programs, from Song Liu.
5) Relax BTF mandatory check if only used for libbpf itself e.g. to process
BTF defined maps, from Andrii Nakryiko.
6) Move BPF selftests -mcpu compilation attribute from 'probe' to 'v3' as it has
been observed that former fails in envs with low memlock, from Yonghong Song.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Here are a number of small USB driver fixes for 5.6-rc3.
Included in here are:
- MAINTAINER file updates
- USB gadget driver fixes
- usb core quirk additions and fixes for regressions
- xhci driver fixes
- usb serial driver id additions and fixes
- thunderbolt bugfix
Thunderbolt patches come in through here now that USB4 is really
thunderbolt.
All of these have been in linux-next for a while with no reported
issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXk+/zw8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ynQywCeOlklNLcTKw/JkbsbtMlY/yTl4W8AoLLoEgSy
GyUV8E67wZNdXLKH+n14
=o97t
-----END PGP SIGNATURE-----
Merge tag 'usb-5.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB/Thunderbolt fixes from Greg KH:
"Here are a number of small USB driver fixes for 5.6-rc3.
Included in here are:
- MAINTAINER file updates
- USB gadget driver fixes
- usb core quirk additions and fixes for regressions
- xhci driver fixes
- usb serial driver id additions and fixes
- thunderbolt bugfix
Thunderbolt patches come in through here now that USB4 is really
thunderbolt.
All of these have been in linux-next for a while with no reported
issues"
* tag 'usb-5.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (34 commits)
USB: misc: iowarrior: add support for the 100 device
thunderbolt: Prevent crash if non-active NVMem file is read
usb: gadget: udc-xilinx: Fix xudc_stop() kernel-doc format
USB: misc: iowarrior: add support for the 28 and 28L devices
USB: misc: iowarrior: add support for 2 OEMed devices
USB: Fix novation SourceControl XL after suspend
xhci: Fix memory leak when caching protocol extended capability PSI tables - take 2
Revert "xhci: Fix memory leak when caching protocol extended capability PSI tables"
MAINTAINERS: Sort entries in database for THUNDERBOLT
usb: dwc3: debug: fix string position formatting mixup with ret and len
usb: gadget: serial: fix Tx stall after buffer overflow
usb: gadget: ffs: ffs_aio_cancel(): Save/restore IRQ flags
usb: dwc2: Fix SET/CLEAR_FEATURE and GET_STATUS flows
usb: dwc2: Fix in ISOC request length checking
usb: gadget: composite: Support more than 500mA MaxPower
usb: gadget: composite: Fix bMaxPower for SuperSpeedPlus
usb: gadget: u_audio: Fix high-speed max packet size
usb: dwc3: gadget: Check for IOC/LST bit in TRB->ctrl fields
USB: core: clean up endpoint-descriptor parsing
USB: quirks: blacklist duplicate ep on Sound Devices USBPre2
...
Pull networking fixes from David Miller:
1) Limit xt_hashlimit hash table size to avoid OOM or hung tasks, from
Cong Wang.
2) Fix deadlock in xsk by publishing global consumer pointers when NAPI
is finished, from Magnus Karlsson.
3) Set table field properly to RT_TABLE_COMPAT when necessary, from
Jethro Beekman.
4) NLA_STRING attributes are not necessary NULL terminated, deal wiht
that in IFLA_ALT_IFNAME. From Eric Dumazet.
5) Fix checksum handling in atlantic driver, from Dmitry Bezrukov.
6) Handle mtu==0 devices properly in wireguard, from Jason A.
Donenfeld.
7) Fix several lockdep warnings in bonding, from Taehee Yoo.
8) Fix cls_flower port blocking, from Jason Baron.
9) Sanitize internal map names in libbpf, from Toke Høiland-Jørgensen.
10) Fix RDMA race in qede driver, from Michal Kalderon.
11) Fix several false lockdep warnings by adding conditions to
list_for_each_entry_rcu(), from Madhuparna Bhowmik.
12) Fix sleep in atomic in mlx5 driver, from Huy Nguyen.
13) Fix potential deadlock in bpf_map_do_batch(), from Yonghong Song.
14) Hey, variables declared in switch statement before any case
statements are not initialized. I learn something every day. Get
rids of this stuff in several parts of the networking, from Kees
Cook.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (99 commits)
bnxt_en: Issue PCIe FLR in kdump kernel to cleanup pending DMAs.
bnxt_en: Improve device shutdown method.
net: netlink: cap max groups which will be considered in netlink_bind()
net: thunderx: workaround BGX TX Underflow issue
ionic: fix fw_status read
net: disable BRIDGE_NETFILTER by default
net: macb: Properly handle phylink on at91rm9200
s390/qeth: fix off-by-one in RX copybreak check
s390/qeth: don't warn for napi with 0 budget
s390/qeth: vnicc Fix EOPNOTSUPP precedence
openvswitch: Distribute switch variables for initialization
net: ip6_gre: Distribute switch variables for initialization
net: core: Distribute switch variables for initialization
udp: rehash on disconnect
net/tls: Fix to avoid gettig invalid tls record
bpf: Fix a potential deadlock with bpf_map_do_batch
bpf: Do not grab the bucket spinlock by default on htab batch ops
ice: Wait for VF to be reset/ready before configuration
ice: Don't tell the OS that link is going down
ice: Don't reject odd values of usecs set by user
...
QEMU has a funny new build error message when I use the upstream kernel
headers:
CC block/file-posix.o
In file included from /home/cborntra/REPOS/qemu/include/qemu/timer.h:4,
from /home/cborntra/REPOS/qemu/include/qemu/timed-average.h:29,
from /home/cborntra/REPOS/qemu/include/block/accounting.h:28,
from /home/cborntra/REPOS/qemu/include/block/block_int.h:27,
from /home/cborntra/REPOS/qemu/block/file-posix.c:30:
/usr/include/linux/swab.h: In function `__swab':
/home/cborntra/REPOS/qemu/include/qemu/bitops.h:20:34: error: "sizeof" is not defined, evaluates to 0 [-Werror=undef]
20 | #define BITS_PER_LONG (sizeof (unsigned long) * BITS_PER_BYTE)
| ^~~~~~
/home/cborntra/REPOS/qemu/include/qemu/bitops.h:20:41: error: missing binary operator before token "("
20 | #define BITS_PER_LONG (sizeof (unsigned long) * BITS_PER_BYTE)
| ^
cc1: all warnings being treated as errors
make: *** [/home/cborntra/REPOS/qemu/rules.mak:69: block/file-posix.o] Error 1
rm tests/qemu-iotests/socket_scm_helper.o
This was triggered by commit d5767057c9 ("uapi: rename ext2_swab() to
swab() and share globally in swab.h"). That patch is doing
#include <asm/bitsperlong.h>
but it uses BITS_PER_LONG.
The kernel file asm/bitsperlong.h provide only __BITS_PER_LONG.
Let us use the __ variant in swap.h
Link: http://lkml.kernel.org/r/20200213142147.17604-1-borntraeger@de.ibm.com
Fixes: d5767057c9 ("uapi: rename ext2_swab() to swab() and share globally in swab.h")
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Yury Norov <yury.norov@gmail.com>
Cc: Allison Randal <allison@lohutok.net>
Cc: Joe Perches <joe@perches.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: William Breathitt Gray <vilhelm.gray@gmail.com>
Cc: Torsten Hilbrich <torsten.hilbrich@secunet.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There are no in-kernel users remaining, but there may still be users that
include linux/time.h instead of sys/time.h from user space, so leave the
types available to user space while hiding them from kernel space.
Only the __kernel_old_* versions of these types remain now.
Link: http://lkml.kernel.org/r/20200110154232.4104492-4-arnd@arndb.de
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Deepa Dinamani <deepa.kernel@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The presence detect state (PDS) is normally a logical OR of in-band and
out-of-band (OOB) presence detect. As of PCIe 4.0, there is the option to
disable in-band presence so that the PDS bit always reflects the state of
the out-of-band presence.
The recommendation of the PCIe spec is to disable in-band presence whenever
supported (PCIe r5.0, appendix I implementation note):
Due to architectural issues, the in-band (Physical-Layer-based) portion
of the PD mechanism is deprecated for use with async hot-plug. One issue
is that in-band PD as architected does not detect adapter removal during
certain LTSSM states, notably the L1 and Disabled States. Another issue
is that when both in-band and OOB PD are being used together, the
Presence Detect State bit and its associated interrupt mechanism always
reflect the logical OR of the inband and OOB PD states, and with some
hot-plug hardware configurations, it is important for software to detect
and respond to in-band and OOB PD events independently. If OOB PD is
being used and the associated DSP supports In-Band PD Disable, it is
recommended that the In-Band PD Disable bit be Set, and the Presence
Detect State bit and its associated interrupt mechanism be used
exclusively for OOB PD. As a substitute for in-band PD with async
hot-plug, the reference model uses either the DPC or the DLL Link Active
mechanism.
Link: https://lore.kernel.org/r/20191025190047.38130-2-stuart.w.hayes@gmail.com
[bhelgaas: move PCI_EXP_SLTCAP2 read earlier & print PCI_EXP_SLTCAP2_IBPD
value (suggested by Lukas)]
Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Alexei Starovoitov says:
====================
pull-request: bpf 2020-02-19
The following pull-request contains BPF updates for your *net* tree.
We've added 10 non-merge commits during the last 10 day(s) which contain
a total of 10 files changed, 93 insertions(+), 31 deletions(-).
The main changes are:
1) batched bpf hashtab fixes from Brian and Yonghong.
2) various selftests and libbpf fixes.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Branch records are a CPU feature that can be configured to record
certain branches that are taken during code execution. This data is
particularly interesting for profile guided optimizations. perf has had
branch record support for a while but the data collection can be a bit
coarse grained.
We (Facebook) have seen in experiments that associating metadata with
branch records can improve results (after postprocessing). We generally
use bpf_probe_read_*() to get metadata out of userspace. That's why bpf
support for branch records is useful.
Aside from this particular use case, having branch data available to bpf
progs can be useful to get stack traces out of userspace applications
that omit frame pointers.
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200218030432.4600-2-dxu@dxuuu.xyz
Add support for low latency Reed Solomon FEC as LLRS.
The LL-FEC is defined by the 25G/50G ethernet consortium,
in the document titled "Low Latency Reed Solomon Forward Error Correction"
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
CC: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
This batch contains Netfilter fixes for net:
1) Restrict hashlimit size to 1048576, from Cong Wang.
2) Check for offload flags from nf_flow_table_offload_setup(),
this fixes a crash in case the hardware offload is disabled.
From Florian Westphal.
3) Three preparation patches to extend the conntrack clash resolution,
from Florian.
4) Extend clash resolution to deal with DNS packets from the same flow
racing to set up the NAT configuration.
5) Small documentation fix in pipapo, from Stefano Brivio.
6) Remove misleading unlikely() from pipapo_refill(), also from Stefano.
7) Reduce hashlimit mutex scope, from Cong Wang. This patch is actually
triggering another problem, still under discussion, another patch to
fix this will follow up.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The performance of bpf_redirect() is now roughly the same as that of
bpf_redirect_map(). However, David Ahern pointed out that the header file
has not been updated to reflect this, and still says that a significant
performance increase is possible when using bpf_redirect_map(). Remove this
text from the bpf_redirect_map() description, and reword the description in
bpf_redirect() slightly. Also fix the 'Return' section of the
bpf_redirect_map() documentation.
Fixes: 1d233886dd ("xdp: Use bulking for non-map XDP_REDIRECT and consolidate code paths")
Reported-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20200218130334.29889-1-toke@redhat.com
This patch further relaxes the need to drop an skb due to a clash with
an existing conntrack entry.
Current clash resolution handles the case where the clash occurs between
two identical entries (distinct nf_conn objects with same tuples), i.e.:
Original Reply
existing: 10.2.3.4:42 -> 10.8.8.8:53 10.2.3.4:42 <- 10.0.0.6:5353
clashing: 10.2.3.4:42 -> 10.8.8.8:53 10.2.3.4:42 <- 10.0.0.6:5353
... existing handling will discard the unconfirmed clashing entry and
makes skb->_nfct point to the existing one. The skb can then be
processed normally just as if the clash would not have existed in the
first place.
For other clashes, the skb needs to be dropped.
This frequently happens with DNS resolvers that send A and AAAA queries
back-to-back when NAT rules are present that cause packets to get
different DNAT transformations applied, for example:
-m statistics --mode random ... -j DNAT --dnat-to 10.0.0.6:5353
-m statistics --mode random ... -j DNAT --dnat-to 10.0.0.7:5353
In this case the A or AAAA query is dropped which incurs a costly
delay during name resolution.
This patch also allows this collision type:
Original Reply
existing: 10.2.3.4:42 -> 10.8.8.8:53 10.2.3.4:42 <- 10.0.0.6:5353
clashing: 10.2.3.4:42 -> 10.8.8.8:53 10.2.3.4:42 <- 10.0.0.7:5353
In this case, clash is in original direction -- the reply direction
is still unique.
The change makes it so that when the 2nd colliding packet is received,
the clashing conntrack is tagged with new IPS_NAT_CLASH_BIT, gets a fixed
1 second timeout and is inserted in the reply direction only.
The entry is hidden from 'conntrack -L', it will time out quickly
and it can be early dropped because it will never progress to the
ASSURED state.
To avoid special-casing the delete code path to special case
the ORIGINAL hlist_nulls node, a new helper, "hlist_nulls_add_fake", is
added so hlist_nulls_del() will work.
Example:
CPU A: CPU B:
1. 10.2.3.4:42 -> 10.8.8.8:53 (A)
2. 10.2.3.4:42 -> 10.8.8.8:53 (AAAA)
3. Apply DNAT, reply changed to 10.0.0.6
4. 10.2.3.4:42 -> 10.8.8.8:53 (AAAA)
5. Apply DNAT, reply changed to 10.0.0.7
6. confirm/commit to conntrack table, no collisions
7. commit clashing entry
Reply comes in:
10.2.3.4:42 <- 10.0.0.6:5353 (A)
-> Finds a conntrack, DNAT is reversed & packet forwarded to 10.2.3.4:42
10.2.3.4:42 <- 10.0.0.7:5353 (AAAA)
-> Finds a conntrack, DNAT is reversed & packet forwarded to 10.2.3.4:42
The conntrack entry is deleted from table, as it has the NAT_CLASH
bit set.
In case of a retransmit from ORIGINAL dir, all further packets will get
the DNAT transformation to 10.0.0.6.
I tried to come up with other solutions but they all have worse
problems.
Alternatives considered were:
1. Confirm ct entries at allocation time, not in postrouting.
a. will cause uneccesarry work when the skb that creates the
conntrack is dropped by ruleset.
b. in case nat is applied, ct entry would need to be moved in
the table, which requires another spinlock pair to be taken.
c. breaks the 'unconfirmed entry is private to cpu' assumption:
we would need to guard all nfct->ext allocation requests with
ct->lock spinlock.
2. Make the unconfirmed list a hash table instead of a pcpu list.
Shares drawback c) of the first alternative.
3. Document this is expected and force users to rearrange their
ruleset (e.g. by using "-m cluster" instead of "-m statistics").
nft has the 'jhash' expression which can be used instead of 'numgen'.
Major drawback: doesn't fix what I consider a bug, not very realistic
and I believe its reasonable to have the existing rulesets to 'just
work'.
4. Document this is expected and force users to steer problematic
packets to the same CPU -- this would serialize the "allocate new
conntrack entry/nat table evaluation/perform nat/confirm entry", so
no race can occur. Similar drawback to 3.
Another advantage of this patch compared to 1) and 2) is that there are
no changes to the hot path; things are handled in the udp tracker and
the clash resolution path.
Cc: rcu@vger.kernel.org
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
New action to decrement TTL instead of setting it to a fixed value.
This action will decrement the TTL and, in case of expired TTL, drop it
or execute an action passed via a nested attribute.
The default TTL expired action is to drop the packet.
Supports both IPv4 and IPv6 via the ttl and hop_limit fields, respectively.
Tested with a corresponding change in the userspace:
# ovs-dpctl dump-flows
in_port(2),eth(),eth_type(0x0800), packets:0, bytes:0, used:never, actions:dec_ttl{ttl<=1 action:(drop)},1
in_port(1),eth(),eth_type(0x0800), packets:0, bytes:0, used:never, actions:dec_ttl{ttl<=1 action:(drop)},2
in_port(1),eth(),eth_type(0x0806), packets:0, bytes:0, used:never, actions:2
in_port(2),eth(),eth_type(0x0806), packets:0, bytes:0, used:never, actions:1
# ping -c1 192.168.0.2 -t 42
IP (tos 0x0, ttl 41, id 61647, offset 0, flags [DF], proto ICMP (1), length 84)
192.168.0.1 > 192.168.0.2: ICMP echo request, id 386, seq 1, length 64
# ping -c1 192.168.0.2 -t 120
IP (tos 0x0, ttl 119, id 62070, offset 0, flags [DF], proto ICMP (1), length 84)
192.168.0.1 > 192.168.0.2: ICMP echo request, id 388, seq 1, length 64
# ping -c1 192.168.0.2 -t 1
#
Co-developed-by: Bindiya Kurle <bindiyakurle@gmail.com>
Signed-off-by: Bindiya Kurle <bindiyakurle@gmail.com>
Signed-off-by: Matteo Croce <mcroce@redhat.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patchset is intended to reduce the number of extra system calls
imposed by TCP receive zerocopy. For ping-pong RPC style workloads,
this patchset has demonstrated a system call reduction of about 30%
when coupled with userspace changes.
For applications using epoll, returning sk_err along with the result
of tcp receive zerocopy could remove the need to call
recvmsg()=-EAGAIN after a spurious wakeup.
Consider a multi-threaded application using epoll. A thread may awaken
with EPOLLIN but another thread may already be reading. The
spuriously-awoken thread does not necessarily know that another thread
'won'; rather, it may be possible that it was woken up due to the
presence of an error if there is no data. A zerocopy read receiving 0
bytes thus would need to be followed up by recvmsg to be sure.
Instead, we return sk_err directly with zerocopy, so the application
can avoid this extra system call.
Signed-off-by: Arjun Roy <arjunroy@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patchset is intended to reduce the number of extra system calls
imposed by TCP receive zerocopy. For ping-pong RPC style workloads,
this patchset has demonstrated a system call reduction of about 30%
when coupled with userspace changes.
For applications using edge-triggered epoll, returning inq along with
the result of tcp receive zerocopy could remove the need to call
recvmsg()=-EAGAIN after a successful zerocopy. Generally speaking,
since normally we would need to perform a recvmsg() call for every
successful small RPC read via TCP receive zerocopy, returning inq can
reduce the number of system calls performed by approximately half.
Signed-off-by: Arjun Roy <arjunroy@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* 802.11 frame encapsulation offload support
* more HE (802.11ax) support, including some for 6 GHz band
* powersave in hwsim, for better testing
Of course as usual there are various cleanups and small fixes.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEH1e1rEeCd0AIMq6MB8qZga/fl8QFAl5GggsACgkQB8qZga/f
l8Qgdg//R42bSv94JYPcwZ5phgTgraCRZODWjBJq08n2T5m0EmEufgX79d9uEdgT
u9npvn+ich5/VZhmuSbGrW9TT6/FPLAZyghV1fj79o971Qd7ky2Mp8G1fcTEbtDn
IG2e9vauY9XDSb2O3wNj8dA8rAN/kLNmhsPqWxn2CgLPqjdbf+W15dvo4rnaL2gs
ffGyE47dHuAFwCruyT8UPbw3iu4+tQhruN9eVg+UkU8rJGvEMqfrLK20zl1weIV9
a7IuXdxacdsHO8Y+tl6GtvgOURQPpvf55+leLOUhcmHPJ3f/eAal6wmWRxDxs/qB
IWSe8BC81cZZ5pYWk1A+0sXfJMlYjNsN0xw5SQRSrbgyb5saz8aLUIlHsOBM4iPH
SwzCMN5A1GOPOUFsugzPwbiki9g6dh0/EC2NyXE4A26CAd967dVXTvTY5SMNgiB+
bZaaUDaPQUm1jgDT5bLRhTipTHbekDkYzG/e+wNO+HKyStoEYM485MwY4MQCYzEh
HKDmkAbFuCwEUeXXw1y8GybUknApCRru9FtY+oiN/+y/aESfB7HJfmDFFU/KYgPu
HOuqJoNAxdMdycDCb24/cLjUiehzfM6sujwBxZOD5WHhAcXrBo5dGd6ibfurIrjj
XI36/mwTiMtyyb0/5xM1AKvoic2j+a5YU3MB7KSc9TlaPa5j2NA=
=CgmJ
-----END PGP SIGNATURE-----
Merge tag 'mac80211-next-for-net-next-2020-02-14' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
Johannes Berg says:
====================
A few big new things:
* 802.11 frame encapsulation offload support
* more HE (802.11ax) support, including some for 6 GHz band
* powersave in hwsim, for better testing
Of course as usual there are various cleanups and small fixes.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds support for creating a process in a different cgroup than its
parent. Callers can limit and account processes and threads right from
the moment they are spawned:
- A service manager can directly spawn new services into dedicated
cgroups.
- A process can be directly created in a frozen cgroup and will be
frozen as well.
- The initial accounting jitter experienced by process supervisors and
daemons is eliminated with this.
- Threaded applications or even thread implementations can choose to
create a specific cgroup layout where each thread is spawned
directly into a dedicated cgroup.
This feature is limited to the unified hierarchy. Callers need to pass
a directory file descriptor for the target cgroup. The caller can
choose to pass an O_PATH file descriptor. All usual migration
restrictions apply, i.e. there can be no processes in inner nodes. In
general, creating a process directly in a target cgroup adheres to all
migration restrictions.
One of the biggest advantages of this feature is that CLONE_INTO_GROUP does
not need to grab the write side of the cgroup cgroup_threadgroup_rwsem.
This global lock makes moving tasks/threads around super expensive. With
clone3() this lock is avoided.
Cc: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: cgroups@vger.kernel.org
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Currently there is no way for user-space to be informed about changes
in status of GPIO lines e.g. when someone else requests the line or its
config changes. We can only periodically re-read the line-info. This
is fine for simple one-off user-space tools, but any daemon that provides
a centralized access to GPIO chips would benefit hugely from an event
driven line info synchronization.
This patch adds a new ioctl() that allows user-space processes to reuse
the file descriptor associated with the character device for watching
any changes in line properties. Every such event contains the updated
line information.
Currently the events are generated on three types of status changes: when
a line is requested, when it's released and when its config is changed.
The first two are self-explanatory. For the third one: this will only
happen when another user-space process calls the new SET_CONFIG ioctl()
as any changes that can happen from within the kernel (i.e.
set_transitory() or set_debounce()) are of no interest to user-space.
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
The low level index is the index in the underlying hardware buffer of
the most recently captured taken branch which is always saved in
branch_entries[0]. It is very useful for reconstructing the call stack.
For example, in Intel LBR call stack mode, the depth of reconstructed
LBR call stack limits to the number of LBR registers. With the low level
index information, perf tool may stitch the stacks of two samples. The
reconstructed LBR call stack can break the HW limitation.
Add a new branch sample type to retrieve low level index of raw branch
records. The low level index is between -1 (unknown) and max depth which
can be retrieved in /sys/devices/cpu/caps/branches.
Only when the new branch sample type is set, the low level index
information is dumped into the PERF_SAMPLE_BRANCH_STACK output.
Perf tool should check the attr.branch_sample_type, and apply the
corresponding format for PERF_SAMPLE_BRANCH_STACK samples.
Otherwise, some user case may be broken. For example, users may parse a
perf.data, which include the new branch sample type, with an old version
perf tool (without the check). Users probably get incorrect information
without any warning.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20200127165355.27495-2-kan.liang@linux.intel.com
To work properly on every architectures and compilers, the enum value
needs to be specific numbers.
Suggested-by: Greg KH <gregkh@linuxfoundation.org>
Signed-off-by: Peter Chen <peter.chen@nxp.com>
Link: https://lore.kernel.org/r/1580537624-10179-1-git-send-email-peter.chen@nxp.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Zonefs is a very simple file system exposing each zone of a zoned block
device as a file.
Unlike a regular file system with native zoned block device support
(e.g. f2fs or the on-going btrfs effort), zonefs does not hide the
sequential write constraint of zoned block devices to the user. As a
result, zonefs is not a POSIX compliant file system. Its goal is to
simplify the implementation of zoned block devices support in
applications by replacing raw block device file accesses with a richer
file based API, avoiding relying on direct block device file ioctls
which may be more obscure to developers.
One example of this approach is the implementation of LSM
(log-structured merge) tree structures (such as used in RocksDB and
LevelDB) on zoned block devices by allowing SSTables to be stored in a
zone file similarly to a regular file system rather than as a range of
sectors of a zoned device. The introduction of the higher level
construct "one file is one zone" can help reducing the amount of changes
needed in the application while at the same time allowing the use of
zoned block devices with various programming languages other than C.
Zonefs IO management implementation uses the new iomap generic code.
Zonefs has been successfully tested using a functional test suite
(available with zonefs userland format tool on github) and a prototype
implementation of LevelDB on top of zonefs.
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQSRPv8tYSvhwAzJdzjdoc3SxdoYdgUCXj1y8QAKCRDdoc3SxdoY
dqozAP9J3t+Q95BgKgI5jP+XEtyYsPBTaVrvaSaViEnwtJLVoQD/ZQ1lTCZSE9OI
UkvWawkuFtLGfOxTqyA3eZrZi22Ttwk=
=YVvO
-----END PGP SIGNATURE-----
Merge tag 'zonefs-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs
Pull new zonefs file system from Damien Le Moal:
"Zonefs is a very simple file system exposing each zone of a zoned
block device as a file.
Unlike a regular file system with native zoned block device support
(e.g. f2fs or the on-going btrfs effort), zonefs does not hide the
sequential write constraint of zoned block devices to the user. As a
result, zonefs is not a POSIX compliant file system. Its goal is to
simplify the implementation of zoned block devices support in
applications by replacing raw block device file accesses with a richer
file based API, avoiding relying on direct block device file ioctls
which may be more obscure to developers.
One example of this approach is the implementation of LSM
(log-structured merge) tree structures (such as used in RocksDB and
LevelDB) on zoned block devices by allowing SSTables to be stored in a
zone file similarly to a regular file system rather than as a range of
sectors of a zoned device. The introduction of the higher level
construct "one file is one zone" can help reducing the amount of
changes needed in the application while at the same time allowing the
use of zoned block devices with various programming languages other
than C.
Zonefs IO management implementation uses the new iomap generic code.
Zonefs has been successfully tested using a functional test suite
(available with zonefs userland format tool on github) and a prototype
implementation of LevelDB on top of zonefs"
* tag 'zonefs-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs:
zonefs: Add documentation
fs: New zonefs file system
When using control port over nl80211 in AP mode with
pre-authentication, APs need to forward frames to other
APs defined by their MAC address. Before this patch,
pre-auth frames reaching user space over nl80211 control
port have no longer any information about the dest attached,
which can be used for forwarding to a controller or injecting
the frame back to a ethernet interface over a AF_PACKET
socket.
Analog problems exist, when forwarding pre-auth frames from
AP -> STA.
This patch therefore adds the NL80211_ATTR_DST_MAC and
NL80211_ATTR_SRC_MAC attributes to provide more context
information when forwarding.
The respective arguments are optional on tx and included on rx.
Therefore unaware existing software is not affected.
Software which wants to detect this feature, can do so
by checking against:
NL80211_EXT_FEATURE_CONTROL_PORT_OVER_NL80211_MAC_ADDRS
Signed-off-by: Markus Theil <markus.theil@tu-ilmenau.de>
Link: https://lore.kernel.org/r/20200115125522.3755-1-markus.theil@tu-ilmenau.de
[split into separate cfg80211/mac80211 patches]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Commit ab4dfa2053 ("cfg80211: Allow drivers to advertise supported AKM
suites") introduces the support to advertize supported AKMs to userspace.
This needs an enhancement to advertize the AKM support per interface type,
specifically for the cfg80211-based drivers that implement SME and use
different mechanisms to support the AKM's for each interface type (e.g.,
the support for SAE, OWE AKM's take different paths for such drivers on
STA/AP mode).
This commit aims the same and enhances the earlier mechanism of advertizing
the AKMs per wiphy. Add new nl80211 attributes and data structure to
provide supported AKMs per interface type to userspace.
the AKMs advertized in akm_suites are default capabilities if not
advertized for a specific interface type in iftype_akm_suites.
Signed-off-by: Veerendranath Jakkam <vjakkam@codeaurora.org>
Link: https://lore.kernel.org/r/20200126203032.21934-1-vjakkam@codeaurora.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The regulatory domain might forbid HE operation. Certain regulatory
domains may restrict it for specific channels whereas others may do it
for the whole regulatory domain.
Add an option to indicate it in the channel flag.
Signed-off-by: Haim Dreyfuss <haim.dreyfuss@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Link: https://lore.kernel.org/r/20200121081213.733757-1-luca@coelho.fi
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
zonefs is a very simple file system exposing each zone of a zoned block
device as a file. Unlike a regular file system with zoned block device
support (e.g. f2fs), zonefs does not hide the sequential write
constraint of zoned block devices to the user. Files representing
sequential write zones of the device must be written sequentially
starting from the end of the file (append only writes).
As such, zonefs is in essence closer to a raw block device access
interface than to a full featured POSIX file system. The goal of zonefs
is to simplify the implementation of zoned block device support in
applications by replacing raw block device file accesses with a richer
file API, avoiding relying on direct block device file ioctls which may
be more obscure to developers. One example of this approach is the
implementation of LSM (log-structured merge) tree structures (such as
used in RocksDB and LevelDB) on zoned block devices by allowing SSTables
to be stored in a zone file similarly to a regular file system rather
than as a range of sectors of a zoned device. The introduction of the
higher level construct "one file is one zone" can help reducing the
amount of changes needed in the application as well as introducing
support for different application programming languages.
Zonefs on-disk metadata is reduced to an immutable super block to
persistently store a magic number and optional feature flags and
values. On mount, zonefs uses blkdev_report_zones() to obtain the device
zone configuration and populates the mount point with a static file tree
solely based on this information. E.g. file sizes come from the device
zone type and write pointer offset managed by the device itself.
The zone files created on mount have the following characteristics.
1) Files representing zones of the same type are grouped together
under a common sub-directory:
* For conventional zones, the sub-directory "cnv" is used.
* For sequential write zones, the sub-directory "seq" is used.
These two directories are the only directories that exist in zonefs.
Users cannot create other directories and cannot rename nor delete
the "cnv" and "seq" sub-directories.
2) The name of zone files is the number of the file within the zone
type sub-directory, in order of increasing zone start sector.
3) The size of conventional zone files is fixed to the device zone size.
Conventional zone files cannot be truncated.
4) The size of sequential zone files represent the file's zone write
pointer position relative to the zone start sector. Truncating these
files is allowed only down to 0, in which case, the zone is reset to
rewind the zone write pointer position to the start of the zone, or
up to the zone size, in which case the file's zone is transitioned
to the FULL state (finish zone operation).
5) All read and write operations to files are not allowed beyond the
file zone size. Any access exceeding the zone size is failed with
the -EFBIG error.
6) Creating, deleting, renaming or modifying any attribute of files and
sub-directories is not allowed.
7) There are no restrictions on the type of read and write operations
that can be issued to conventional zone files. Buffered, direct and
mmap read & write operations are accepted. For sequential zone files,
there are no restrictions on read operations, but all write
operations must be direct IO append writes. mmap write of sequential
files is not allowed.
Several optional features of zonefs can be enabled at format time.
* Conventional zone aggregation: ranges of contiguous conventional
zones can be aggregated into a single larger file instead of the
default one file per zone.
* File ownership: The owner UID and GID of zone files is by default 0
(root) but can be changed to any valid UID/GID.
* File access permissions: the default 640 access permissions can be
changed.
The mkzonefs tool is used to format zoned block devices for use with
zonefs. This tool is available on Github at:
git@github.com:damien-lemoal/zonefs-tools.git.
zonefs-tools also includes a test suite which can be run against any
zoned block device, including null_blk block device created with zoned
mode.
Example: the following formats a 15TB host-managed SMR HDD with 256 MB
zones with the conventional zones aggregation feature enabled.
$ sudo mkzonefs -o aggr_cnv /dev/sdX
$ sudo mount -t zonefs /dev/sdX /mnt
$ ls -l /mnt/
total 0
dr-xr-xr-x 2 root root 1 Nov 25 13:23 cnv
dr-xr-xr-x 2 root root 55356 Nov 25 13:23 seq
The size of the zone files sub-directories indicate the number of files
existing for each type of zones. In this example, there is only one
conventional zone file (all conventional zones are aggregated under a
single file).
$ ls -l /mnt/cnv
total 137101312
-rw-r----- 1 root root 140391743488 Nov 25 13:23 0
This aggregated conventional zone file can be used as a regular file.
$ sudo mkfs.ext4 /mnt/cnv/0
$ sudo mount -o loop /mnt/cnv/0 /data
The "seq" sub-directory grouping files for sequential write zones has
in this example 55356 zones.
$ ls -lv /mnt/seq
total 14511243264
-rw-r----- 1 root root 0 Nov 25 13:23 0
-rw-r----- 1 root root 0 Nov 25 13:23 1
-rw-r----- 1 root root 0 Nov 25 13:23 2
...
-rw-r----- 1 root root 0 Nov 25 13:23 55354
-rw-r----- 1 root root 0 Nov 25 13:23 55355
For sequential write zone files, the file size changes as data is
appended at the end of the file, similarly to any regular file system.
$ dd if=/dev/zero of=/mnt/seq/0 bs=4K count=1 conv=notrunc oflag=direct
1+0 records in
1+0 records out
4096 bytes (4.1 kB, 4.0 KiB) copied, 0.000452219 s, 9.1 MB/s
$ ls -l /mnt/seq/0
-rw-r----- 1 root root 4096 Nov 25 13:23 /mnt/seq/0
The written file can be truncated to the zone size, preventing any
further write operation.
$ truncate -s 268435456 /mnt/seq/0
$ ls -l /mnt/seq/0
-rw-r----- 1 root root 268435456 Nov 25 13:49 /mnt/seq/0
Truncation to 0 size allows freeing the file zone storage space and
restart append-writes to the file.
$ truncate -s 0 /mnt/seq/0
$ ls -l /mnt/seq/0
-rw-r----- 1 root root 0 Nov 25 13:49 /mnt/seq/0
Since files are statically mapped to zones on the disk, the number of
blocks of a file as reported by stat() and fstat() indicates the size
of the file zone.
$ stat /mnt/seq/0
File: /mnt/seq/0
Size: 0 Blocks: 524288 IO Block: 4096 regular empty file
Device: 870h/2160d Inode: 50431 Links: 1
Access: (0640/-rw-r-----) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2019-11-25 13:23:57.048971997 +0900
Modify: 2019-11-25 13:52:25.553805765 +0900
Change: 2019-11-25 13:52:25.553805765 +0900
Birth: -
The number of blocks of the file ("Blocks") in units of 512B blocks
gives the maximum file size of 524288 * 512 B = 256 MB, corresponding
to the device zone size in this example. Of note is that the "IO block"
field always indicates the minimum IO size for writes and corresponds
to the device physical sector size.
This code contains contributions from:
* Johannes Thumshirn <jthumshirn@suse.de>,
* Darrick J. Wong <darrick.wong@oracle.com>,
* Christoph Hellwig <hch@lst.de>,
* Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> and
* Ting Yao <tingyao@hust.edu.cn>.
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
* fix register corruption
* ENOTSUPP/EOPNOTSUPP mixed
* reset cleanups/fixes
* selftests
x86:
* Bug fixes and cleanups
* AMD support for APIC virtualization even in combination with
in-kernel PIT or IOAPIC.
MIPS:
* Compilation fix.
Generic:
* Fix refcount overflow for zero page.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJeOuf7AAoJEL/70l94x66DOBQH/j1W9lUpbDgr9aWbrZT+O/yP
FWzUDrRlCZCjV1FQKbGPa4YLeDRTG5n+RIQTjmCGRqiMqeoELSJ1+iK99e97nG/u
L28zf/90Nf0R+wwHL4AOFeploTYfG4WP8EVnlr3CG2UCJrNjxN1KU7yRZoWmWa2d
ckLJ8ouwNvx6VZd233LVmT38EP4352d1LyqIL8/+oXDIyAcRJLFQu1gRCwagsh3w
1v1czowFpWnRQ/z9zU7YD+PA4v85/Ge8sVVHlpi1X5NgV/khk4U6B0crAw6M+la+
mTmpz9g56oAh9m9NUdtv4zDCz1EWGH0v8+ZkAajUKtrM0ftJMn57P6p8PH4VVlE=
=5+Wl
-----END PGP SIGNATURE-----
Merge tag 'kvm-5.6-2' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull more KVM updates from Paolo Bonzini:
"s390:
- fix register corruption
- ENOTSUPP/EOPNOTSUPP mixed
- reset cleanups/fixes
- selftests
x86:
- Bug fixes and cleanups
- AMD support for APIC virtualization even in combination with
in-kernel PIT or IOAPIC.
MIPS:
- Compilation fix.
Generic:
- Fix refcount overflow for zero page"
* tag 'kvm-5.6-2' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (42 commits)
KVM: vmx: delete meaningless vmx_decache_cr0_guest_bits() declaration
KVM: x86: Mark CR4.UMIP as reserved based on associated CPUID bit
x86: vmxfeatures: rename features for consistency with KVM and manual
KVM: SVM: relax conditions for allowing MSR_IA32_SPEC_CTRL accesses
KVM: x86: Fix perfctr WRMSR for running counters
x86/kvm/hyper-v: don't allow to turn on unsupported VMX controls for nested guests
x86/kvm/hyper-v: move VMX controls sanitization out of nested_enable_evmcs()
kvm: mmu: Separate generating and setting mmio ptes
kvm: mmu: Replace unsigned with unsigned int for PTE access
KVM: nVMX: Remove stale comment from nested_vmx_load_cr3()
KVM: MIPS: Fold comparecount_func() into comparecount_wakeup()
KVM: MIPS: Fix a build error due to referencing not-yet-defined function
x86/kvm: do not setup pv tlb flush when not paravirtualized
KVM: fix overflow of zero page refcount with ksm running
KVM: x86: Take a u64 when checking for a valid dr7 value
KVM: x86: use raw clock values consistently
KVM: x86: reorganize pvclock_gtod_data members
KVM: nVMX: delete meaningless nested_vmx_run() declaration
KVM: SVM: allow AVIC without split irqchip
kvm: ioapic: Lazy update IOAPIC EOI
...
Subsystem:
- the VL_READ and VL_CLR ioctls are now documented and their behavior is
unified across all the drivers.
- RTC_I2C_AND_SPI Kconfig option rework to avoid selecting both REGMAP_I2C and
REGMAP_SPI unecessarily.
Drivers:
- at91rm9200: remove deprecated procfs, add sam9x60, sama5d4 and sama5d2
compatibles.
- cmos: solve lost interrupts issue on MS Surface 3
- hym8563: return proper errno when time is invalid
- rv3029: many fixes, nvram support
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEycoQi/giopmpPgB12wIijOdRNOUFAl44lxsACgkQ2wIijOdR
NOVF3BAAlQVFDklLqkS6MJplgZ06lgv4ZIbfxCHYFJPUNt/X/gTcHS1OSTwctmor
3qQamySGc74/GAI6lmNSSzDUz9vi2hwcwvLtR/e6+Kgdhg9EuWWvgsqKYRGAn573
TzHWsYY8bDpZ/mN5K+qqadGlzsP58gsEaw0fzcsPkZQQdq4mNSIrB5RILvsHX8cN
+RviYOqR+ZvAegfQrAfb/9SCwxQAstUqRDaZXodQDEeIk5CEDWyr31+U9eLDdYoN
1FOHYp6uwUy6Vnl0ym7WU42L95tVWx9XOc/PEq8dZ1m09nfrMhqeIoQC8SUtxG9+
FWXN87lkLSlDaLUwVE8T22QII6jP+7Lc2t6SbI4fwwJdNDoPg+5hhabGjQbM2We9
nG1x7TsVwKjeUglfhqeVgGWYzmIhAeOEBXQhqCdtfVi13ocecVFJxOG6oolQGtlZ
M+/t91hID6il7/nxGembcHKapMf9c41CXBPesEg0QjkvGvKj1Z8L9a5vYigzbxWW
zb8m9cjIhY3bb8mns3aCs773PSXavsycLk1Hupw6RWrN1XnEXVMHQZDtvzCJe7EV
8CY40LI0VN6x5HC/d0EnqyrMvQbkgTBPXjLBhj54edCZM3vAny3xkYc03oDU8cKc
0PbLUXKpFmzFbTQB4w0sE2FgT0uZmL2SCXiFemUwxbl+/ZACkoc=
=e1Xr
-----END PGP SIGNATURE-----
Merge tag 'rtc-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux
Pull RTC updates from Alexandre Belloni:
"The VL_READ and VL_CLR ioctls have been reworked to be more useful.
This will not break userspace as there are very few users and they are
using the integer value as a boolean.
Apart from that, two drivers were reworked and a few fixes here and
there for a net reduction of number of lines.
Summary:
Subsystem:
- the VL_READ and VL_CLR ioctls are now documented and their behavior
is unified across all the drivers.
- RTC_I2C_AND_SPI Kconfig option rework to avoid selecting both
REGMAP_I2C and REGMAP_SPI unecessarily.
Drivers:
- at91rm9200: remove deprecated procfs, add sam9x60, sama5d4 and
sama5d2 compatibles.
- cmos: solve lost interrupts issue on MS Surface 3
- hym8563: return proper errno when time is invalid
- rv3029: many fixes, nvram support"
* tag 'rtc-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: (63 commits)
dt-bindings: rtc: at91rm9200: document clocks property
rtc: i2c/spi: Avoid inclusion of REGMAP support when not needed
rtc: Kconfig: select REGMAP_I2C when necessary
rtc: Kconfig: properly indent sd3078 entry
rtc: cmos: Refactor code by using the new dmi_get_bios_year() helper
rtc: cmos: Use predefined value for RTC IRQ on legacy x86
rtc: cmos: Stop using shared IRQ
rtc: tps6586x: Use IRQ_NOAUTOEN flag
rtc: at91rm9200: use FIELD_PREP/FIELD_GET
rtc: at91rm9200: avoid time readout in at91_rtc_setalarm
rtc: at91rm9200: move register definitions to C file
rtc: at91rm9200: add sama5d4 and sama5d2 compatibles
dt-bindings: rtc: at91rm9200: convert bindings to json-schema
rtc: at91rm9200: remove procfs information
dt-bindings: atmel, at91rm9200-rtc: add microchip, sam9x60-rtc
rtc: pcf8563: Use BIT
rtc: moxart: Convert to SPDX identifier
rtc: ds1343: Remove unused struct spi_device in struct ds1343_priv
rtc: rx8025: Remove struct i2c_client from struct rx8025_data
rtc: hym8563: Read the valid flag directly instead of caching it
...
CRNG hasn't initialized, instead of the old blocking pool. Also clean
up archrandom.h, and some other miscellaneous cleanups.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAl40j1kACgkQ8vlZVpUN
gaPCywf8CWS9HFd2Iipj60gkTVugjlL5ib0lbfhQcAAwwzw1GLTXJSMBzzoMRHY/
ZI2sJZS1m0V1oWNnXXVKi+A1VXmlValWXAc+7fvbeaIe5pRT1EHP14s4Kz7/4d8Q
dk0b8cxNpR8u5CcbN8y9D+71IKpdksUbX7uGuGfw3bncQdRNwJVf+oS1fMGS0Rsb
F8ddQaED7iFpX2BMl56afQ4t2t0LA5+eLYMGoYoJx5fgd9BseP0TEcjj9Y4Z30M7
+GO4NZjUbAY0syx9r8hx3P/5miWZm2J9QJmJoXHhr5+IcAKM+6+Uo6X6gkOEqV4i
U//V1cqNuowV5ckE4Na+MfBillinsQ==
=HeFM
-----END PGP SIGNATURE-----
Merge tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random
Pull random changes from Ted Ts'o:
"Change /dev/random so that it uses the CRNG and only blocking if the
CRNG hasn't initialized, instead of the old blocking pool. Also clean
up archrandom.h, and some other miscellaneous cleanups"
* tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random: (24 commits)
s390x: Mark archrandom.h functions __must_check
powerpc: Mark archrandom.h functions __must_check
powerpc: Use bool in archrandom.h
x86: Mark archrandom.h functions __must_check
linux/random.h: Mark CONFIG_ARCH_RANDOM functions __must_check
linux/random.h: Use false with bool
linux/random.h: Remove arch_has_random, arch_has_random_seed
s390: Remove arch_has_random, arch_has_random_seed
powerpc: Remove arch_has_random, arch_has_random_seed
x86: Remove arch_has_random, arch_has_random_seed
random: remove some dead code of poolinfo
random: fix typo in add_timer_randomness()
random: Add and use pr_fmt()
random: convert to ENTROPY_BITS for better code readability
random: remove unnecessary unlikely()
random: remove kernel.random.read_wakeup_threshold
random: delete code to pull data into pools
random: remove the blocking pool
random: make /dev/random be almost like /dev/urandom
random: ignore GRND_RANDOM in getentropy(2)
...
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE+QmuaPwR3wnBdVwACF8+vY7k4RUFAl40SYEACgkQCF8+vY7k
4RU4TQ/8CgWj2+0uMRyIGpggB7B82vBPRqfHr4DIcZzbLSkdDkeDtrEfM5058cUc
y3NpW9djmcqDMPvOZKFAkb03Bd+mtv89kI72RBTT2mVwCfySYa02K63RqgDg2aFU
FScPUXlwB8hmZG6BpDlMiykJY1SVyhpb9R2f/7scgJ0ZKVwkKRMmLC5/I5A1IbFX
WpoqNzRmT07bZJyDdm5RkzxHdM1EP0flMsqWJb3O2aWqeAw9u9+issk+Uv+cMGR+
70+pmE/6qeurQjS9OHRhrSkf4HjybeByATfgSnONqNrWBtQXgBrHI2TjmT2NvNqV
kWfsprM1GNPhsLveG6JYKGSNwZK6BHxuUULIjXAr1ocRrae2jVZ7/SZkAvnvzO3v
hnb2HwgMBkQSctcl4EJDJeLIc1HgIKbZ7D/mFj7N9Mk3Kn7AqcLNHBv+GMunCPFl
yXNq23ELfxC1HpmQPVhXNM/UaaO5MZCSvOD3MDObcjrxtv4b2bovi6ACDUTgGUNL
sDozTurG2p1VeGupUnzia62gfb0/fjZ2WBk7RRp8E2K4/93YNEeMA9wgF0E8b4YQ
SDQcDF1EtsAPF3msiXBC5FSFG42Ly7Ry04fl0v4lAle/0bPamEdQJ2CMVa1ux2Kp
MRxI39CbRqtoIUbpKzTeInh/FvDDU22TimBKc5sg9d29Hk6y+Yk=
=sGdY
-----END PGP SIGNATURE-----
Merge tag 'media/v5.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media updates from Mauro Carvalho Chehab:
- New staging driver for Rockship ISPv1 unit
- New staging driver for Rockchip MIPI Synopsys DPHY RX0
- y2038 fixes at V4L2 API (backward-compatible)
- A dvb core fix when receiving invalid EIT sections
- Some clang-specific warnings got fixed
- Added support for touch V4L2 interface at vivid
- Several drivers were converted to use the new
i2c_new_scanned_device() kAPI
- Added sm1 support at meson's vdec driver
- Several other driver cleanups, fixes and improvements
* tag 'media/v5.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (207 commits)
media: staging/intel-ipu3: remove TODO item about acronyms
media: v4l2-fwnode: Print the node name while parsing endpoints
media: Revert "media: staging/intel-ipu3: make imgu use fixed running mode"
media: mt9v111: constify copied structure
media: platform: VIDEO_MEDIATEK_JPEG can also depend on MTK_IOMMU
media: uvcvideo: Add a quirk to force GEO GC6500 Camera bits-per-pixel value
media: uvcvideo: Avoid cyclic entity chains due to malformed USB descriptors
media: hantro: fix post-processing NULL pointer dereference
media: rcar-vin: Use correct pixel format when aligning format
media: MAINTAINERS: add entry for Rockchip ISP1 driver
media: staging: rkisp1: add TODO file for staging
media: staging: rkisp1: add document for rkisp1 meta buffer format
media: staging: rkisp1: add output device for parameters
media: staging: rkisp1: add capture device for statistics
media: staging: rkisp1: add user space ABI definitions
media: staging: rkisp1: add streaming paths
media: staging: rkisp1: add Rockchip ISP1 base driver
media: staging: phy-rockchip-dphy-rx0: add Rockchip MIPI Synopsys DPHY RX0 driver
media: staging: dt-bindings: add Rockchip MIPI RX D-PHY RX0 yaml bindings
media: staging: dt-bindings: add Rockchip ISP1 yaml bindings
...
Pull updates from Andrew Morton:
"Most of -mm and quite a number of other subsystems: hotfixes, scripts,
ocfs2, misc, lib, binfmt, init, reiserfs, exec, dma-mapping, kcov.
MM is fairly quiet this time. Holidays, I assume"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (118 commits)
kcov: ignore fault-inject and stacktrace
include/linux/io-mapping.h-mapping: use PHYS_PFN() macro in io_mapping_map_atomic_wc()
execve: warn if process starts with executable stack
reiserfs: prevent NULL pointer dereference in reiserfs_insert_item()
init/main.c: fix misleading "This architecture does not have kernel memory protection" message
init/main.c: fix quoted value handling in unknown_bootoption
init/main.c: remove unnecessary repair_env_string in do_initcall_level
init/main.c: log arguments and environment passed to init
fs/binfmt_elf.c: coredump: allow process with empty address space to coredump
fs/binfmt_elf.c: coredump: delete duplicated overflow check
fs/binfmt_elf.c: coredump: allocate core ELF header on stack
fs/binfmt_elf.c: make BAD_ADDR() unlikely
fs/binfmt_elf.c: better codegen around current->mm
fs/binfmt_elf.c: don't copy ELF header around
fs/binfmt_elf.c: fix ->start_code calculation
fs/binfmt_elf.c: smaller code generation around auxv vector fill
lib/find_bit.c: uninline helper _find_next_bit()
lib/find_bit.c: join _find_next_bit{_le}
uapi: rename ext2_swab() to swab() and share globally in swab.h
lib/scatterlist.c: adjust indentation in __sg_alloc_table
...
ext2_swab() is defined locally in lib/find_bit.c However it is not
specific to ext2, neither to bitmaps.
There are many potential users of it, so rename it to just swab() and
move to include/uapi/linux/swab.h
ABI guarantees that size of unsigned long corresponds to BITS_PER_LONG,
therefore drop unneeded cast.
Link: http://lkml.kernel.org/r/20200103202846.21616-1-yury.norov@gmail.com
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Cc: Allison Randal <allison@lohutok.net>
Cc: Joe Perches <joe@perches.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: William Breathitt Gray <vilhelm.gray@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
As zone reclaim has been replaced by node reclaim, this patch fixes
related comments.
Link: http://lkml.kernel.org/r/20191126141346.GA22665@haolee.github.io
Signed-off-by: Hao Lee <haolee.swjtu@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The architecture states that we need to reset local IRQs for all CPU
resets. Because the old reset interface did not support the normal CPU
reset we never did that on a normal reset.
Let's implement an interface for the missing normal and clear resets
and reset all local IRQs, registers and control structures as stated
in the architecture.
Userspace might already reset the registers via the vcpu run struct,
but as we need the interface for the interrupt clearing part anyway,
we implement the resets fully and don't rely on userspace to reset the
rest.
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Link: https://lore.kernel.org/r/20200131100205.74720-4-frankja@linux.ibm.com
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
uapi:
- dma-buf heaps added (and fixed)
- command line add support for panel oreientation
- command line allow overriding penguin count
drm:
- mipi dsi definition updates
- lockdep annotations for dma_resv
- remove dma-buf kmap/kunmap support
- constify fb_ops in all fbdev drivers
- MST fix for daisy chained hotplug-
- CTA-861-G modes with VIC >= 193 added
- fix drm_panel_of_backlight export
- LVDS decoder support
- more device based logging support
- scanline alighment for dumb buffers
- MST DSC helpers
scheduler:
- documentation fixes
- job distribution improvements
panel:
- Logic PD type 28 panel support
- Jimax8729d MIPI-DSI
- igenic JZ4770
- generic DSI devicetree bindings
- sony acx424AKP panel
- Leadtek LTK500HD1829
- xinpeng XPP055C272
- AUO B116XAK01
- GiantPlus GPM940B0
- BOE NV140FHM-N49
- Satoz SAT050AT40H12R2
- Sharp LS020B1DD01D panels.
ttm:
- use blocking WW lock
i915:
- hw/uapi state separation
- Lock annotation improvements
- selftest improvements
- ICL/TGL DSI VDSC support
- VBT parsing improvments
- Display refactoring
- DSI updates + fixes
- HDCP 2.2 for CFL
- CML PCI ID fixes
- GLK+ fbc fix
- PSR fixes
- GEN/GT refactor improvments
- DP MST fixes
- switch context id alloc to xarray
- workaround updates
- LMEM debugfs support
- tiled monitor fixes
- ICL+ clock gating programming removed
- DP MST disable sequence fixed
- LMEM discontiguous object maps
- prefaulting for discontiguous objects
- use LMEM for dumb buffers if possible
- add LMEM mmap support
amdgpu:
- enable sync object timelines for vulkan
- MST atomic routines
- enable MST DSC support
- add DMCUB display microengine support
- DC OEM i2c support
- Renoir DC fixes
- Initial HDCP 2.x support
- BACO support for Arcturus
- Use BACO for runtime PM power save
- gfxoff on navi10
- gfx10 golden updates and fixes
- DCN support on POWER
- GFXOFF for raven1 refresh
- MM engine idle handlers cleanup
- 10bpc EDP panel fixes
- renoir watermark fixes
- SR-IOV fixes
- Arcturus VCN fixes
- GDDR6 training fixes
- freesync fixes
- Pollock support
amdkfd:
- unify more codepath with amdgpu
- use KIQ to setup HIQ rather than MMIO
radeon:
- fix vma fault handler race
- PPC DMA fix
- register check fixes for r100/r200
nouveau:
- mmap_sem vs dma_resv fix
- rewrite the ACR secure boot code for Turing
- TU10x graphics engine support (TU11x pending)
- Page kind mapping for turing
- 10-bit LUT support
- GP10B Tegra fixes
- HD audio regression fix
hisilicon/hibmc:
- use generic fbdev code and helpers
rockchip:
- dsi/px30 support
virtio:
- fb damage support
- static some functions
vc4:
- use dma_resv lock wrappers
msm:
- use dma_resv lock wrappers
- sc7180 display + DSI support
- a618 support
- UBWC support improvements
vmwgfx:
- updates + new logging uapi
exynos:
- enable/disable callback cleanups
etnaviv:
- use dma_resv lock wrappers
atmel-hlcdc:
- clock fixes
mediatek:
- cmdq support
- non-smooth cursor fixes
- ctm property support
sun4i:
- suspend support
- A64 mipi dsi support
rcar-du:
- Color management module support
- LVDS encoder dual-link support
- R8A77980 support
analogic:
- add support for an6345
ast:
- atomic modeset support
- primary plane garbage fix
arcgpu:
- fixes for fourcc handling
tegra:
- minor fixes and improvments
mcde:
- vblank support
meson:
- OSD1 plane AFBC commit
gma500:
- add pageflip support
- reomve global drm_dev
komeda:
- tweak debugfs output
- d32 support
- runtime PM suppotr
udl:
- use generic shmem helpers
- cleanup and fixes
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJeMm6RAAoJEAx081l5xIa+vN8P/0j4jEOv+KIinAhoH+LG3EpD
m2TUuu5OQIoBrcCoWOgFBk3wqYpw6PdMBdkXh+5sE5lfeBynp8oC3Bin+QsHJE05
eGBpZtHe+70MQb0Eha+Aic0hchvBKzRnq6i0MYSIHn6afs76dLmF8knTjycxrvV5
Xu1Z3WDmjzqgWF9ja5JCD6fby11seP5RrwObYKVikO35QQyJJwGSGKgu5rq/pByK
/n0PCnCOINuL0Lz6J9qexdh/0/XYFQilRC31GJNlKbDSFuECF0GOEzEE/xUBW/pI
dLh2YwIIygm18Gar9PgvMwXJn3BfzQ0qEJsf+HlQeNw9iLgbHpp2AsTxHTE87OGe
R/y85taW3jGjPsNOKZOeLpvg/Ro8l8ZipLApvDCG2O22DThg/cd6NDjZxl1FJfRH
acDG/JdgPo5MbdRAH/cM1WuFS9gEM+0BeSQ5gCjtPakF+X4Vz+ABFDLMRJoaejkJ
q8DG32TQXELQx0RMghsqK7YCWGfl+2alA1u9w6TgJh9Rq4iVckvpDeqAZnK1Adkc
87g957Tl0n6FA4wJj/t5jrceiLRMJAm/rBK+R3GZNfWrgx4bHbCmb4fZDZsrFzph
nbAjNJ5kOchrFCaRR47ULby6+Q14MAFbkWq4Crfu4YDdzUkTPpep6pi2GIe8w0rV
P0hdYOYJf6LUda0utuQX
=oFrI
-----END PGP SIGNATURE-----
Merge tag 'drm-next-2020-01-30' of git://anongit.freedesktop.org/drm/drm
Pull drm updates from Davbe Airlie:
"This is the main pull request for graphics for 5.6. Usual selection of
changes all over.
I've got one outstanding vmwgfx pull that touches mm so kept it
separate until after all of this lands. I'll try and get it to you
soon after this, but it might be early next week (nothing wrong with
code, just my schedule is messy)
This also hits a lot of fbdev drivers with some cleanups.
Other notables:
- vulkan timeline semaphore support added to syncobjs
- nouveau turing secureboot/graphics support
- Displayport MST display stream compression support
Detailed summary:
uapi:
- dma-buf heaps added (and fixed)
- command line add support for panel oreientation
- command line allow overriding penguin count
drm:
- mipi dsi definition updates
- lockdep annotations for dma_resv
- remove dma-buf kmap/kunmap support
- constify fb_ops in all fbdev drivers
- MST fix for daisy chained hotplug-
- CTA-861-G modes with VIC >= 193 added
- fix drm_panel_of_backlight export
- LVDS decoder support
- more device based logging support
- scanline alighment for dumb buffers
- MST DSC helpers
scheduler:
- documentation fixes
- job distribution improvements
panel:
- Logic PD type 28 panel support
- Jimax8729d MIPI-DSI
- igenic JZ4770
- generic DSI devicetree bindings
- sony acx424AKP panel
- Leadtek LTK500HD1829
- xinpeng XPP055C272
- AUO B116XAK01
- GiantPlus GPM940B0
- BOE NV140FHM-N49
- Satoz SAT050AT40H12R2
- Sharp LS020B1DD01D panels.
ttm:
- use blocking WW lock
i915:
- hw/uapi state separation
- Lock annotation improvements
- selftest improvements
- ICL/TGL DSI VDSC support
- VBT parsing improvments
- Display refactoring
- DSI updates + fixes
- HDCP 2.2 for CFL
- CML PCI ID fixes
- GLK+ fbc fix
- PSR fixes
- GEN/GT refactor improvments
- DP MST fixes
- switch context id alloc to xarray
- workaround updates
- LMEM debugfs support
- tiled monitor fixes
- ICL+ clock gating programming removed
- DP MST disable sequence fixed
- LMEM discontiguous object maps
- prefaulting for discontiguous objects
- use LMEM for dumb buffers if possible
- add LMEM mmap support
amdgpu:
- enable sync object timelines for vulkan
- MST atomic routines
- enable MST DSC support
- add DMCUB display microengine support
- DC OEM i2c support
- Renoir DC fixes
- Initial HDCP 2.x support
- BACO support for Arcturus
- Use BACO for runtime PM power save
- gfxoff on navi10
- gfx10 golden updates and fixes
- DCN support on POWER
- GFXOFF for raven1 refresh
- MM engine idle handlers cleanup
- 10bpc EDP panel fixes
- renoir watermark fixes
- SR-IOV fixes
- Arcturus VCN fixes
- GDDR6 training fixes
- freesync fixes
- Pollock support
amdkfd:
- unify more codepath with amdgpu
- use KIQ to setup HIQ rather than MMIO
radeon:
- fix vma fault handler race
- PPC DMA fix
- register check fixes for r100/r200
nouveau:
- mmap_sem vs dma_resv fix
- rewrite the ACR secure boot code for Turing
- TU10x graphics engine support (TU11x pending)
- Page kind mapping for turing
- 10-bit LUT support
- GP10B Tegra fixes
- HD audio regression fix
hisilicon/hibmc:
- use generic fbdev code and helpers
rockchip:
- dsi/px30 support
virtio:
- fb damage support
- static some functions
vc4:
- use dma_resv lock wrappers
msm:
- use dma_resv lock wrappers
- sc7180 display + DSI support
- a618 support
- UBWC support improvements
vmwgfx:
- updates + new logging uapi
exynos:
- enable/disable callback cleanups
etnaviv:
- use dma_resv lock wrappers
atmel-hlcdc:
- clock fixes
mediatek:
- cmdq support
- non-smooth cursor fixes
- ctm property support
sun4i:
- suspend support
- A64 mipi dsi support
rcar-du:
- Color management module support
- LVDS encoder dual-link support
- R8A77980 support
analogic:
- add support for an6345
ast:
- atomic modeset support
- primary plane garbage fix
arcgpu:
- fixes for fourcc handling
tegra:
- minor fixes and improvments
mcde:
- vblank support
meson:
- OSD1 plane AFBC commit
gma500:
- add pageflip support
- reomve global drm_dev
komeda:
- tweak debugfs output
- d32 support
- runtime PM suppotr
udl:
- use generic shmem helpers
- cleanup and fixes"
* tag 'drm-next-2020-01-30' of git://anongit.freedesktop.org/drm/drm: (1998 commits)
drm/nouveau/fb/gp102-: allow module to load even when scrubber binary is missing
drm/nouveau/acr: return error when registering LSF if ACR not supported
drm/nouveau/disp/gv100-: not all channel types support reporting error codes
drm/nouveau/disp/nv50-: prevent oops when no channel method map provided
drm/nouveau: support synchronous pushbuf submission
drm/nouveau: signal pending fences when channel has been killed
drm/nouveau: reject attempts to submit to dead channels
drm/nouveau: zero vma pointer even if we only unreference it rather than free
drm/nouveau: Add HD-audio component notifier support
drm/nouveau: fix build error without CONFIG_IOMMU_API
drm/nouveau/kms/nv04: remove set but not used variable 'width'
drm/nouveau/kms/nv50: remove set but not unused variable 'nv_connector'
drm/nouveau/mmu: fix comptag memory leak
drm/nouveau/gr/gp10b: Use gp100_grctx and gp100_gr_zbc
drm/nouveau/pmu/gm20b,gp10b: Fix Falcon bootstrapping
drm/exynos: Rename Exynos to lowercase
drm/exynos: change callback names
drm/mst: Don't do atomic checks over disabled managers
drm/amdgpu: add the lost mutex_init back
drm/amd/display: skip opp blank or unblank if test pattern enabled
...
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCXjFo8wAKCRCRxhvAZXjc
omaGAQDVwCHQekqxp2eC8EJH4Pkt+Bn1BLrA25stlTo93YBPHgEAsPVUCRNcrZAl
VncYmxCfpt3Yu0S/MTVXu5xrRiIXPQk=
=uqTN
-----END PGP SIGNATURE-----
Merge tag 'threads-v5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux
Pull thread management updates from Christian Brauner:
"Sargun Dhillon over the last cycle has worked on the pidfd_getfd()
syscall.
This syscall allows for the retrieval of file descriptors of a process
based on its pidfd. A task needs to have ptrace_may_access()
permissions with PTRACE_MODE_ATTACH_REALCREDS (suggested by Oleg and
Andy) on the target.
One of the main use-cases is in combination with seccomp's user
notification feature. As a reminder, seccomp's user notification
feature was made available in v5.0. It allows a task to retrieve a
file descriptor for its seccomp filter. The file descriptor is usually
handed of to a more privileged supervising process. The supervisor can
then listen for syscall events caught by the seccomp filter of the
supervisee and perform actions in lieu of the supervisee, usually
emulating syscalls. pidfd_getfd() is needed to expand its uses.
There are currently two major users that wait on pidfd_getfd() and one
future user:
- Netflix, Sargun said, is working on a service mesh where users
should be able to connect to a dns-based VIP. When a user connects
to e.g. 1.2.3.4:80 that runs e.g. service "foo" they will be
redirected to an envoy process. This service mesh uses seccomp user
notifications and pidfd to intercept all connect calls and instead
of connecting them to 1.2.3.4:80 connects them to e.g.
127.0.0.1:8080.
- LXD uses the seccomp notifier heavily to intercept and emulate
mknod() and mount() syscalls for unprivileged containers/processes.
With pidfd_getfd() more uses-cases e.g. bridging socket connections
will be possible.
- The patchset has also seen some interest from the browser corner.
Right now, Firefox is using a SECCOMP_RET_TRAP sandbox managed by a
broker process. In the future glibc will start blocking all signals
during dlopen() rendering this type of sandbox impossible. Hence,
in the future Firefox will switch to a seccomp-user-nofication
based sandbox which also makes use of file descriptor retrieval.
The thread for this can be found at
https://sourceware.org/ml/libc-alpha/2019-12/msg00079.html
With pidfd_getfd() it is e.g. possible to bridge socket connections
for the supervisee (binding to a privileged port) and taking actions
on file descriptors on behalf of the supervisee in general.
Sargun's first version was using an ioctl on pidfds but various people
pushed for it to be a proper syscall which he duely implemented as
well over various review cycles. Selftests are of course included.
I've also added instructions how to deal with merge conflicts below.
There's also a small fix coming from the kernel mentee project to
correctly annotate struct sighand_struct with __rcu to fix various
sparse warnings. We've received a few more such fixes and even though
they are mostly trivial I've decided to postpone them until after -rc1
since they came in rather late and I don't want to risk introducing
build warnings.
Finally, there's a new prctl() command PR_{G,S}ET_IO_FLUSHER which is
needed to avoid allocation recursions triggerable by storage drivers
that have userspace parts that run in the IO path (e.g. dm-multipath,
iscsi, etc). These allocation recursions deadlock the device.
The new prctl() allows such privileged userspace components to avoid
allocation recursions by setting the PF_MEMALLOC_NOIO and
PF_LESS_THROTTLE flags. The patch carries the necessary acks from the
relevant maintainers and is routed here as part of prctl()
thread-management."
* tag 'threads-v5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
prctl: PR_{G,S}ET_IO_FLUSHER to support controlling memory reclaim
sched.h: Annotate sighand_struct with __rcu
test: Add test for pidfd getfd
arch: wire up pidfd_getfd syscall
pid: Implement pidfd_getfd syscall
vfs, fdtable: Add fget_task helper
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl4yEegQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpn5ZD/4/WlXs2cUDgg1C65bzZFO4qvevm+VkXmsk
GbyrnFstRekvSH01/ZQxlyDVKS8Wux0XIJ6OArCh1047LvL1bEE5dvOW5iIiwa/r
grjQuwFAzIPsE2fgcAO17BKIUzq2Z96+hwDzH7dw0i32yBuLvNmY/1SxcCHKfPut
uzGyp7t3/2dIHbpWILRndMYe0O9j9ubmOMvKyKTwy723yDEafsUoqu2mlpigzTq4
2i+DbYBIAd8qmLqG/m3e+vOt9xodJ2Q0hlO+v6DcP2SKXU64Hb/N98HadR//aWP9
41DBXqs+dvDBcu3Jxb80PFUTiOQZECJivkns5cNcjuSXmNkOuQhDQR5K372AHmR9
m6e6FSBxwej8HselAZCI6yu9uBKd0i+MM4FnFs/O73QGYx2ayXsEXp/Jad9xiYgW
pC5XJTSqJQhPE0AYYEOzHPPcBLBcpvXHkvmGKdjkNb8OLhhgh2S/YG0DNC+8ABXr
j1uIe/n3kJEEmOanUyiitGyLmDq+mXd7aCVKJL/J0KiGD8Gkc1avAZ1ZrTQgjujY
FqqBFawO8gv3g0L4WMI8JI+HJGMnA488obet6UKm9+l/Z/urEpXzDAKf/W/vnx2B
LD0FSA0bCh1tyO6JU+avFwHlwShtV7/rx/OhrmCK7CCYKtZCA2IEctxyr8U+PBIv
DtwIMTYTsA==
=ZZUI
-----END PGP SIGNATURE-----
Merge tag 'for-5.6/io_uring-vfs-2020-01-29' of git://git.kernel.dk/linux-block
Pull io_uring updates from Jens Axboe:
- Support for various new opcodes (fallocate, openat, close, statx,
fadvise, madvise, openat2, non-vectored read/write, send/recv, and
epoll_ctl)
- Faster ring quiesce for fileset updates
- Optimizations for overflow condition checking
- Support for max-sized clamping
- Support for probing what opcodes are supported
- Support for io-wq backend sharing between "sibling" rings
- Support for registering personalities
- Lots of little fixes and improvements
* tag 'for-5.6/io_uring-vfs-2020-01-29' of git://git.kernel.dk/linux-block: (64 commits)
io_uring: add support for epoll_ctl(2)
eventpoll: support non-blocking do_epoll_ctl() calls
eventpoll: abstract out epoll_ctl() handler
io_uring: fix linked command file table usage
io_uring: support using a registered personality for commands
io_uring: allow registering credentials
io_uring: add io-wq workqueue sharing
io-wq: allow grabbing existing io-wq
io_uring/io-wq: don't use static creds/mm assignments
io-wq: make the io_wq ref counted
io_uring: fix refcounting with batched allocations at OOM
io_uring: add comment for drain_next
io_uring: don't attempt to copy iovec for READ/WRITE
io_uring: honor IOSQE_ASYNC for linked reqs
io_uring: prep req when do IOSQE_ASYNC
io_uring: use labeled array init in io_op_defs
io_uring: optimise sqe-to-req flags translation
io_uring: remove REQ_F_IO_DRAINED
io_uring: file switch work needs to get flushed on exit
io_uring: hide uring_fd in ctx
...
These are updates to device drivers and file systems that for some reason
or another were not included in the kernel in the previous y2038 series.
I've gone through all users of time_t again to make sure the kernel is
in a long-term maintainable state, replacing all remaining references
to time_t with safe alternatives.
Some related parts of the series were picked up into the nfsd, xfs,
alsa and v4l2 trees. A final set of patches in linux-mm removes the now
unused time_t/timeval/timespec types and helper functions after all five
branches are merged for linux-5.6, ensuring that no new users get merged.
As a result, linux-5.6, or my backport of the patches to 5.4 [1], should
be the first release that can serve as a base for a 32-bit system designed
to run beyond year 2038, with a few remaining caveats:
- All user space must be compiled with a 64-bit time_t, which will be
supported in the coming musl-1.2 and glibc-2.32 releases, along with
installed kernel headers from linux-5.6 or higher.
- Applications that use the system call interfaces directly need to be
ported to use the time64 syscalls added in linux-5.1 in place of the
existing system calls. This impacts most users of futex() and seccomp()
as well as programming languages that have their own runtime environment
not based on libc.
- Applications that use a private copy of kernel uapi header files or
their contents may need to update to the linux-5.6 version, in
particular for sound/asound.h, xfs/xfs_fs.h, linux/input.h,
linux/elfcore.h, linux/sockios.h, linux/timex.h and linux/can/bcm.h.
- A few remaining interfaces cannot be changed to pass a 64-bit time_t
in a compatible way, so they must be configured to use CLOCK_MONOTONIC
times or (with a y2106 problem) unsigned 32-bit timestamps. Most
importantly this impacts all users of 'struct input_event'.
- All y2038 problems that are present on 64-bit machines also apply to
32-bit machines. In particular this affects file systems with on-disk
timestamps using signed 32-bit seconds: ext4 with ext3-style small
inodes, ext2, xfs (to be fixed soon) and ufs.
Changes since v1 [2]:
- Add Acks I received
- Rebase to v5.5-rc1, dropping patches that got merged already
- Add NFS, XFS and the final three patches from another series
- Rewrite etnaviv patches
- Add one late revert to avoid an etnaviv regression
[1] https://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground.git/log/?h=y2038-endgame
[2] https://lore.kernel.org/lkml/20191108213257.3097633-1-arnd@arndb.de/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJeMYy3AAoJEGCrR//JCVInEGwP/0R+S+ok7vw9OdLVT0lFl07D
IcVabgOWf24imN7m7L7Mlt3nDfxIT4tMpiAXq7eMO3spcyViG18O2LXdSQ4/7QBp
+BlhoMjOP9w34Jyd7mnkFr4vqQALvfIqkS8rFObDtDub2Rfj9PC36MRMIu8BPXlv
RK8bigwJeH/DV38yc5/JeUcD+WuewYLsK9XPWN+4yB4vgGsNU3ZQQ6nnzbR3hMsN
DN8WZ68Y7IBs0Kyxkf+s2zmRXtCa2RiFg/2TUsk5olVAJVaenvte69hq5RSbg1vW
vLi6K8cBoPWL59nqCzcNE+TUhSUg3LOj/a/KWyl76yovz7AlJaNjssOf8ZjHw6sL
MhQqz3hXTxiJDS2Jvbf1yojiYGlzrq/gqcRFGe9jPcZdieMc4/yZCx60G/Exa5Pu
YdMcqMyDWPFyUAFQNWEF59HPheOdj6tb1KpJ6bwgCo3P7QqhLrU4z9w3Py4/ZfBO
4sWcWteSsD6MN/ADJ2WQ56nNxzM2AvkeVJKcF6FCkdngXX9T0GExmZz7SqB5Du99
9lNjIiD5E+LBa/Swo/7n49aYa8x06V1pmHYTZVh9Wkl+CZiO21umezQFrWsfaMTp
xt3c6pFdMG5xNMGpreTAXOmf2R+T6O8IO2qQq/TYjzqOLH7QC830P7avkmml+cK1
LjOBE2TfSeO8Ru1dXV4t
=wx0A
-----END PGP SIGNATURE-----
Merge tag 'y2038-drivers-for-v5.6-signed' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground
Pull y2038 updates from Arnd Bergmann:
"Core, driver and file system changes
These are updates to device drivers and file systems that for some
reason or another were not included in the kernel in the previous
y2038 series.
I've gone through all users of time_t again to make sure the kernel is
in a long-term maintainable state, replacing all remaining references
to time_t with safe alternatives.
Some related parts of the series were picked up into the nfsd, xfs,
alsa and v4l2 trees. A final set of patches in linux-mm removes the
now unused time_t/timeval/timespec types and helper functions after
all five branches are merged for linux-5.6, ensuring that no new users
get merged.
As a result, linux-5.6, or my backport of the patches to 5.4 [1],
should be the first release that can serve as a base for a 32-bit
system designed to run beyond year 2038, with a few remaining caveats:
- All user space must be compiled with a 64-bit time_t, which will be
supported in the coming musl-1.2 and glibc-2.32 releases, along
with installed kernel headers from linux-5.6 or higher.
- Applications that use the system call interfaces directly need to
be ported to use the time64 syscalls added in linux-5.1 in place of
the existing system calls. This impacts most users of futex() and
seccomp() as well as programming languages that have their own
runtime environment not based on libc.
- Applications that use a private copy of kernel uapi header files or
their contents may need to update to the linux-5.6 version, in
particular for sound/asound.h, xfs/xfs_fs.h, linux/input.h,
linux/elfcore.h, linux/sockios.h, linux/timex.h and
linux/can/bcm.h.
- A few remaining interfaces cannot be changed to pass a 64-bit
time_t in a compatible way, so they must be configured to use
CLOCK_MONOTONIC times or (with a y2106 problem) unsigned 32-bit
timestamps. Most importantly this impacts all users of 'struct
input_event'.
- All y2038 problems that are present on 64-bit machines also apply
to 32-bit machines. In particular this affects file systems with
on-disk timestamps using signed 32-bit seconds: ext4 with
ext3-style small inodes, ext2, xfs (to be fixed soon) and ufs"
[1] https://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground.git/log/?h=y2038-endgame
* tag 'y2038-drivers-for-v5.6-signed' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground: (21 commits)
Revert "drm/etnaviv: reject timeouts with tv_nsec >= NSEC_PER_SEC"
y2038: sh: remove timeval/timespec usage from headers
y2038: sparc: remove use of struct timex
y2038: rename itimerval to __kernel_old_itimerval
y2038: remove obsolete jiffies conversion functions
nfs: fscache: use timespec64 in inode auxdata
nfs: fix timstamp debug prints
nfs: use time64_t internally
sunrpc: convert to time64_t for expiry
drm/etnaviv: avoid deprecated timespec
drm/etnaviv: reject timeouts with tv_nsec >= NSEC_PER_SEC
drm/msm: avoid using 'timespec'
hfs/hfsplus: use 64-bit inode timestamps
hostfs: pass 64-bit timestamps to/from user space
packet: clarify timestamp overflow
tsacct: add 64-bit btime field
acct: stop using get_seconds()
um: ubd: use 64-bit time_t where possible
xtensa: ISS: avoid struct timeval
dlm: use SO_SNDTIMEO_NEW instead of SO_SNDTIMEO_OLD
...
Pull openat2 support from Al Viro:
"This is the openat2() series from Aleksa Sarai.
I'm afraid that the rest of namei stuff will have to wait - it got
zero review the last time I'd posted #work.namei, and there had been a
leak in the posted series I'd caught only last weekend. I was going to
repost it on Monday, but the window opened and the odds of getting any
review during that... Oh, well.
Anyway, openat2 part should be ready; that _did_ get sane amount of
review and public testing, so here it comes"
From Aleksa's description of the series:
"For a very long time, extending openat(2) with new features has been
incredibly frustrating. This stems from the fact that openat(2) is
possibly the most famous counter-example to the mantra "don't silently
accept garbage from userspace" -- it doesn't check whether unknown
flags are present[1].
This means that (generally) the addition of new flags to openat(2) has
been fraught with backwards-compatibility issues (O_TMPFILE has to be
defined as __O_TMPFILE|O_DIRECTORY|[O_RDWR or O_WRONLY] to ensure old
kernels gave errors, since it's insecure to silently ignore the
flag[2]). All new security-related flags therefore have a tough road
to being added to openat(2).
Furthermore, the need for some sort of control over VFS's path
resolution (to avoid malicious paths resulting in inadvertent
breakouts) has been a very long-standing desire of many userspace
applications.
This patchset is a revival of Al Viro's old AT_NO_JUMPS[3] patchset
(which was a variant of David Drysdale's O_BENEATH patchset[4] which
was a spin-off of the Capsicum project[5]) with a few additions and
changes made based on the previous discussion within [6] as well as
others I felt were useful.
In line with the conclusions of the original discussion of
AT_NO_JUMPS, the flag has been split up into separate flags. However,
instead of being an openat(2) flag it is provided through a new
syscall openat2(2) which provides several other improvements to the
openat(2) interface (see the patch description for more details). The
following new LOOKUP_* flags are added:
LOOKUP_NO_XDEV:
Blocks all mountpoint crossings (upwards, downwards, or through
absolute links). Absolute pathnames alone in openat(2) do not
trigger this. Magic-link traversal which implies a vfsmount jump is
also blocked (though magic-link jumps on the same vfsmount are
permitted).
LOOKUP_NO_MAGICLINKS:
Blocks resolution through /proc/$pid/fd-style links. This is done
by blocking the usage of nd_jump_link() during resolution in a
filesystem. The term "magic-links" is used to match with the only
reference to these links in Documentation/, but I'm happy to change
the name.
It should be noted that this is different to the scope of
~LOOKUP_FOLLOW in that it applies to all path components. However,
you can do openat2(NO_FOLLOW|NO_MAGICLINKS) on a magic-link and it
will *not* fail (assuming that no parent component was a
magic-link), and you will have an fd for the magic-link.
In order to correctly detect magic-links, the introduction of a new
LOOKUP_MAGICLINK_JUMPED state flag was required.
LOOKUP_BENEATH:
Disallows escapes to outside the starting dirfd's
tree, using techniques such as ".." or absolute links. Absolute
paths in openat(2) are also disallowed.
Conceptually this flag is to ensure you "stay below" a certain
point in the filesystem tree -- but this requires some additional
to protect against various races that would allow escape using
"..".
Currently LOOKUP_BENEATH implies LOOKUP_NO_MAGICLINKS, because it
can trivially beam you around the filesystem (breaking the
protection). In future, there might be similar safety checks done
as in LOOKUP_IN_ROOT, but that requires more discussion.
In addition, two new flags are added that expand on the above ideas:
LOOKUP_NO_SYMLINKS:
Does what it says on the tin. No symlink resolution is allowed at
all, including magic-links. Just as with LOOKUP_NO_MAGICLINKS this
can still be used with NOFOLLOW to open an fd for the symlink as
long as no parent path had a symlink component.
LOOKUP_IN_ROOT:
This is an extension of LOOKUP_BENEATH that, rather than blocking
attempts to move past the root, forces all such movements to be
scoped to the starting point. This provides chroot(2)-like
protection but without the cost of a chroot(2) for each filesystem
operation, as well as being safe against race attacks that
chroot(2) is not.
If a race is detected (as with LOOKUP_BENEATH) then an error is
generated, and similar to LOOKUP_BENEATH it is not permitted to
cross magic-links with LOOKUP_IN_ROOT.
The primary need for this is from container runtimes, which
currently need to do symlink scoping in userspace[7] when opening
paths in a potentially malicious container.
There is a long list of CVEs that could have bene mitigated by
having RESOLVE_THIS_ROOT (such as CVE-2017-1002101,
CVE-2017-1002102, CVE-2018-15664, and CVE-2019-5736, just to name a
few).
In order to make all of the above more usable, I'm working on
libpathrs[8] which is a C-friendly library for safe path resolution.
It features a userspace-emulated backend if the kernel doesn't support
openat2(2). Hopefully we can get userspace to switch to using it, and
thus get openat2(2) support for free once it's ready.
Future work would include implementing things like
RESOLVE_NO_AUTOMOUNT and possibly a RESOLVE_NO_REMOTE (to allow
programs to be sure they don't hit DoSes though stale NFS handles)"
* 'work.openat2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
Documentation: path-lookup: include new LOOKUP flags
selftests: add openat2(2) selftests
open: introduce openat2(2) syscall
namei: LOOKUP_{IN_ROOT,BENEATH}: permit limited ".." resolution
namei: LOOKUP_IN_ROOT: chroot-like scoped resolution
namei: LOOKUP_BENEATH: O_BENEATH-like scoped resolution
namei: LOOKUP_NO_XDEV: block mountpoint crossing
namei: LOOKUP_NO_MAGICLINKS: block magic-link resolution
namei: LOOKUP_NO_SYMLINKS: block symlink resolution
namei: allow set_root() to produce errors
namei: allow nd_jump_link() to produce errors
nsfs: clean-up ns_get_path() signature to return int
namei: only return -ECHILD from follow_dotdot_rcu()
Here is the big staging/iio driver patches for 5.6-rc1
Included in here are:
- lots of new IIO drivers and updates for that subsystem
- the usual huge quantity of minor cleanups for staging drivers
- removal of the following staging drivers:
- isdn/avm
- isdn/gigaset
- isdn/hysdn
- octeon-usb
- octeon ethernet
Overall we deleted far more lines than we added, removing over 40k of
old and obsolete driver code.
All of these changes have been in linux-next for a while with no
reported issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXjFOKw8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+yly3wCfac6fbfrpwZ2VeUFyT5EJFr9JnKEAn1VMQTIJ
QCgCqbQemnXfbOXiA5pZ
=rP6a
-----END PGP SIGNATURE-----
Merge tag 'staging-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging and IIO updates from Greg KH:
"Here is the big staging/iio driver patches for 5.6-rc1
Included in here are:
- lots of new IIO drivers and updates for that subsystem
- the usual huge quantity of minor cleanups for staging drivers
- removal of the following staging drivers:
- isdn/avm
- isdn/gigaset
- isdn/hysdn
- octeon-usb
- octeon ethernet
Overall we deleted far more lines than we added, removing over 40k of
old and obsolete driver code.
All of these changes have been in linux-next for a while with no
reported issues"
* tag 'staging-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (353 commits)
staging: most: usb: check for NULL device
staging: next: configfs: fix release link
staging: most: core: fix logging messages
staging: most: core: remove container struct
staging: most: remove struct device core driver
staging: most: core: drop device reference
staging: most: remove device from interface structure
staging: comedi: drivers: fix spelling mistake "to" -> "too"
staging: exfat: remove fs_func struct.
staging: wilc1000: avoid mutex unlock without lock in wilc_wlan_handle_txq()
staging: wilc1000: return zero on success and non-zero on function failure
staging: axis-fifo: replace spinlock with mutex
staging: wilc1000: remove unused code prior to throughput enhancement in SPI
staging: wilc1000: added 'wilc_' prefix for 'struct assoc_resp' name
staging: wilc1000: move firmware API struct's to separate header file
staging: wilc1000: remove use of infinite loop conditions
staging: kpc2000: rename variables with kpc namespace
staging: vt6656: Remove memory buffer from vnt_download_firmware.
staging: vt6656: Just check NEWRSR_DECRYPTOK for RX_FLAG_DECRYPTED.
staging: vt6656: Use vnt_rx_tail struct for tail variables.
...
For personalities previously registered via IORING_REGISTER_PERSONALITY,
allow any command to select them. This is done through setting
sqe->personality to the id returned from registration, and then flagging
sqe->flags with IOSQE_PERSONALITY.
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
If an application wants to use a ring with different kinds of
credentials, it can register them upfront. We don't lookup credentials,
the credentials of the task calling IORING_REGISTER_PERSONALITY is used.
An 'id' is returned for the application to use in subsequent personality
support.
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
If IORING_SETUP_ATTACH_WQ is set, it expects wq_fd in io_uring_params to
be a valid io_uring fd io-wq of which will be shared with the newly
created io_uring instance. If the flag is set but it can't share io-wq,
it fails.
This allows creation of "sibling" io_urings, where we prefer to keep the
SQ/CQ private, but want to share the async backend to minimize the amount
of overhead associated with having multiple rings that belong to the same
backend.
Reported-by: Jens Axboe <axboe@kernel.dk>
Reported-by: Daurnimator <quae@daurnimator.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We currently setup the io_wq with a static set of mm and creds. Even for
a single-use io-wq per io_uring, this is suboptimal as we have may have
multiple enters of the ring. For sharing the io-wq backend, it doesn't
work at all.
Switch to passing in the creds and mm when the work item is setup. This
means that async work is no longer deferred to the io_uring mm and creds,
it is done with the current mm and creds.
Flag this behavior with IORING_FEAT_CUR_PERSONALITY, so applications know
they can rely on the current personality (mm and creds) being the same
for direct issue and async issue.
Reviewed-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull networking updates from David Miller:
1) Add WireGuard
2) Add HE and TWT support to ath11k driver, from John Crispin.
3) Add ESP in TCP encapsulation support, from Sabrina Dubroca.
4) Add variable window congestion control to TIPC, from Jon Maloy.
5) Add BCM84881 PHY driver, from Russell King.
6) Start adding netlink support for ethtool operations, from Michal
Kubecek.
7) Add XDP drop and TX action support to ena driver, from Sameeh
Jubran.
8) Add new ipv4 route notifications so that mlxsw driver does not have
to handle identical routes itself. From Ido Schimmel.
9) Add BPF dynamic program extensions, from Alexei Starovoitov.
10) Support RX and TX timestamping in igc, from Vinicius Costa Gomes.
11) Add support for macsec HW offloading, from Antoine Tenart.
12) Add initial support for MPTCP protocol, from Christoph Paasch,
Matthieu Baerts, Florian Westphal, Peter Krystad, and many others.
13) Add Octeontx2 PF support, from Sunil Goutham, Geetha sowjanya, Linu
Cherian, and others.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1469 commits)
net: phy: add default ARCH_BCM_IPROC for MDIO_BCM_IPROC
udp: segment looped gso packets correctly
netem: change mailing list
qed: FW 8.42.2.0 debug features
qed: rt init valid initialization changed
qed: Debug feature: ilt and mdump
qed: FW 8.42.2.0 Add fw overlay feature
qed: FW 8.42.2.0 HSI changes
qed: FW 8.42.2.0 iscsi/fcoe changes
qed: Add abstraction for different hsi values per chip
qed: FW 8.42.2.0 Additional ll2 type
qed: Use dmae to write to widebus registers in fw_funcs
qed: FW 8.42.2.0 Parser offsets modified
qed: FW 8.42.2.0 Queue Manager changes
qed: FW 8.42.2.0 Expose new registers and change windows
qed: FW 8.42.2.0 Internal ram offsets modifications
MAINTAINERS: Add entry for Marvell OcteonTX2 Physical Function driver
Documentation: net: octeontx2: Add RVU HW and drivers overview
octeontx2-pf: ethtool RSS config support
octeontx2-pf: Add basic ethtool support
...
Pull crypto updates from Herbert Xu:
"API:
- Removed CRYPTO_TFM_RES flags
- Extended spawn grabbing to all algorithm types
- Moved hash descsize verification into API code
Algorithms:
- Fixed recursive pcrypt dead-lock
- Added new 32 and 64-bit generic versions of poly1305
- Added cryptogams implementation of x86/poly1305
Drivers:
- Added support for i.MX8M Mini in caam
- Added support for i.MX8M Nano in caam
- Added support for i.MX8M Plus in caam
- Added support for A33 variant of SS in sun4i-ss
- Added TEE support for Raven Ridge in ccp
- Added in-kernel API to submit TEE commands in ccp
- Added AMD-TEE driver
- Added support for BCM2711 in iproc-rng200
- Added support for AES256-GCM based ciphers for chtls
- Added aead support on SEC2 in hisilicon"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (244 commits)
crypto: arm/chacha - fix build failured when kernel mode NEON is disabled
crypto: caam - add support for i.MX8M Plus
crypto: x86/poly1305 - emit does base conversion itself
crypto: hisilicon - fix spelling mistake "disgest" -> "digest"
crypto: chacha20poly1305 - add back missing test vectors and test chunking
crypto: x86/poly1305 - fix .gitignore typo
tee: fix memory allocation failure checks on drv_data and amdtee
crypto: ccree - erase unneeded inline funcs
crypto: ccree - make cc_pm_put_suspend() void
crypto: ccree - split overloaded usage of irq field
crypto: ccree - fix PM race condition
crypto: ccree - fix FDE descriptor sequence
crypto: ccree - cc_do_send_request() is void func
crypto: ccree - fix pm wrongful error reporting
crypto: ccree - turn errors to debug msgs
crypto: ccree - fix AEAD decrypt auth fail
crypto: ccree - fix typo in comment
crypto: ccree - fix typos in error msgs
crypto: atmel-{aes,sha,tdes} - Retire crypto_platform_data
crypto: x86/sha - Eliminate casts on asm implementations
...
- Extend the FS_IOC_ADD_ENCRYPTION_KEY ioctl to allow the raw key to be
provided via a keyring key.
- Prepare for the new dirhash method (SipHash of plaintext name) that
will be used by directories that are both encrypted and casefolded.
- Switch to a new format for "no-key names" that prepares for the new
dirhash method, and also fixes a longstanding bug where multiple
filenames could map to the same no-key name.
- Allow the crypto algorithms used by fscrypt to be built as loadable
modules when the fscrypt-capable filesystems are.
- Optimize fscrypt_zeroout_range().
- Various cleanups.
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQSacvsUNc7UX4ntmEPzXCl4vpKOKwUCXi+OfxQcZWJpZ2dlcnNA
Z29vZ2xlLmNvbQAKCRDzXCl4vpKOK0tJAQDkxUl11NRqVgS06TLWAniKVMWh3kaL
CaonjEmfs1m6kgEA+TP8coTAxUvJNubaHz3J3x9dyx9RPUVFyUzSB0J/TQg=
=XkXJ
-----END PGP SIGNATURE-----
Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt
Pull fscrypt updates from Eric Biggers:
- Extend the FS_IOC_ADD_ENCRYPTION_KEY ioctl to allow the raw key to be
provided via a keyring key.
- Prepare for the new dirhash method (SipHash of plaintext name) that
will be used by directories that are both encrypted and casefolded.
- Switch to a new format for "no-key names" that prepares for the new
dirhash method, and also fixes a longstanding bug where multiple
filenames could map to the same no-key name.
- Allow the crypto algorithms used by fscrypt to be built as loadable
modules when the fscrypt-capable filesystems are.
- Optimize fscrypt_zeroout_range().
- Various cleanups.
* tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt: (26 commits)
fscrypt: improve format of no-key names
ubifs: allow both hash and disk name to be provided in no-key names
ubifs: don't trigger assertion on invalid no-key filename
fscrypt: clarify what is meant by a per-file key
fscrypt: derive dirhash key for casefolded directories
fscrypt: don't allow v1 policies with casefolding
fscrypt: add "fscrypt_" prefix to fname_encrypt()
fscrypt: don't print name of busy file when removing key
ubifs: use IS_ENCRYPTED() instead of ubifs_crypt_is_encrypted()
fscrypt: document gfp_flags for bounce page allocation
fscrypt: optimize fscrypt_zeroout_range()
fscrypt: remove redundant bi_status check
fscrypt: Allow modular crypto algorithms
fscrypt: include <linux/ioctl.h> in UAPI header
fscrypt: don't check for ENOKEY from fscrypt_get_encryption_info()
fscrypt: remove fscrypt_is_direct_key_policy()
fscrypt: move fscrypt_valid_enc_modes() to policy.c
fscrypt: check for appropriate use of DIRECT_KEY flag earlier
fscrypt: split up fscrypt_supported_policy() by policy version
fscrypt: introduce fscrypt_needs_contents_encryption()
...
There are several storage drivers like dm-multipath, iscsi, tcmu-runner,
amd nbd that have userspace components that can run in the IO path. For
example, iscsi and nbd's userspace deamons may need to recreate a socket
and/or send IO on it, and dm-multipath's daemon multipathd may need to
send SG IO or read/write IO to figure out the state of paths and re-set
them up.
In the kernel these drivers have access to GFP_NOIO/GFP_NOFS and the
memalloc_*_save/restore functions to control the allocation behavior,
but for userspace we would end up hitting an allocation that ended up
writing data back to the same device we are trying to allocate for.
The device is then in a state of deadlock, because to execute IO the
device needs to allocate memory, but to allocate memory the memory
layers want execute IO to the device.
Here is an example with nbd using a local userspace daemon that performs
network IO to a remote server. We are using XFS on top of the nbd device,
but it can happen with any FS or other modules layered on top of the nbd
device that can write out data to free memory. Here a nbd daemon helper
thread, msgr-worker-1, is performing a write/sendmsg on a socket to execute
a request. This kicks off a reclaim operation which results in a WRITE to
the nbd device and the nbd thread calling back into the mm layer.
[ 1626.609191] msgr-worker-1 D 0 1026 1 0x00004000
[ 1626.609193] Call Trace:
[ 1626.609195] ? __schedule+0x29b/0x630
[ 1626.609197] ? wait_for_completion+0xe0/0x170
[ 1626.609198] schedule+0x30/0xb0
[ 1626.609200] schedule_timeout+0x1f6/0x2f0
[ 1626.609202] ? blk_finish_plug+0x21/0x2e
[ 1626.609204] ? _xfs_buf_ioapply+0x2e6/0x410
[ 1626.609206] ? wait_for_completion+0xe0/0x170
[ 1626.609208] wait_for_completion+0x108/0x170
[ 1626.609210] ? wake_up_q+0x70/0x70
[ 1626.609212] ? __xfs_buf_submit+0x12e/0x250
[ 1626.609214] ? xfs_bwrite+0x25/0x60
[ 1626.609215] xfs_buf_iowait+0x22/0xf0
[ 1626.609218] __xfs_buf_submit+0x12e/0x250
[ 1626.609220] xfs_bwrite+0x25/0x60
[ 1626.609222] xfs_reclaim_inode+0x2e8/0x310
[ 1626.609224] xfs_reclaim_inodes_ag+0x1b6/0x300
[ 1626.609227] xfs_reclaim_inodes_nr+0x31/0x40
[ 1626.609228] super_cache_scan+0x152/0x1a0
[ 1626.609231] do_shrink_slab+0x12c/0x2d0
[ 1626.609233] shrink_slab+0x9c/0x2a0
[ 1626.609235] shrink_node+0xd7/0x470
[ 1626.609237] do_try_to_free_pages+0xbf/0x380
[ 1626.609240] try_to_free_pages+0xd9/0x1f0
[ 1626.609245] __alloc_pages_slowpath+0x3a4/0xd30
[ 1626.609251] ? ___slab_alloc+0x238/0x560
[ 1626.609254] __alloc_pages_nodemask+0x30c/0x350
[ 1626.609259] skb_page_frag_refill+0x97/0xd0
[ 1626.609274] sk_page_frag_refill+0x1d/0x80
[ 1626.609279] tcp_sendmsg_locked+0x2bb/0xdd0
[ 1626.609304] tcp_sendmsg+0x27/0x40
[ 1626.609307] sock_sendmsg+0x54/0x60
[ 1626.609308] ___sys_sendmsg+0x29f/0x320
[ 1626.609313] ? sock_poll+0x66/0xb0
[ 1626.609318] ? ep_item_poll.isra.15+0x40/0xc0
[ 1626.609320] ? ep_send_events_proc+0xe6/0x230
[ 1626.609322] ? hrtimer_try_to_cancel+0x54/0xf0
[ 1626.609324] ? ep_read_events_proc+0xc0/0xc0
[ 1626.609326] ? _raw_write_unlock_irq+0xa/0x20
[ 1626.609327] ? ep_scan_ready_list.constprop.19+0x218/0x230
[ 1626.609329] ? __hrtimer_init+0xb0/0xb0
[ 1626.609331] ? _raw_spin_unlock_irq+0xa/0x20
[ 1626.609334] ? ep_poll+0x26c/0x4a0
[ 1626.609337] ? tcp_tsq_write.part.54+0xa0/0xa0
[ 1626.609339] ? release_sock+0x43/0x90
[ 1626.609341] ? _raw_spin_unlock_bh+0xa/0x20
[ 1626.609342] __sys_sendmsg+0x47/0x80
[ 1626.609347] do_syscall_64+0x5f/0x1c0
[ 1626.609349] ? prepare_exit_to_usermode+0x75/0xa0
[ 1626.609351] entry_SYSCALL_64_after_hwframe+0x44/0xa9
This patch adds a new prctl command that daemons can use after they have
done their initial setup, and before they start to do allocations that
are in the IO path. It sets the PF_MEMALLOC_NOIO and PF_LESS_THROTTLE
flags so both userspace block and FS threads can use it to avoid the
allocation recursion and try to prevent from being throttled while
writing out data to free up memory.
Signed-off-by: Mike Christie <mchristi@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Tested-by: Masato Suzuki <masato.suzuki@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Link: https://lore.kernel.org/r/20191112001900.9206-1-mchristi@redhat.com
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
- Time namespace support:
If a container migrates from one host to another then it expects that
clocks based on MONOTONIC and BOOTTIME are not subject to
disruption. Due to different boot time and non-suspended runtime these
clocks can differ significantly on two hosts, in the worst case time
goes backwards which is a violation of the POSIX requirements.
The time namespace addresses this problem. It allows to set offsets for
clock MONOTONIC and BOOTTIME once after creation and before tasks are
associated with the namespace. These offsets are taken into account by
timers and timekeeping including the VDSO.
Offsets for wall clock based clocks (REALTIME/TAI) are not provided by
this mechanism. While in theory possible, the overhead and code
complexity would be immense and not justified by the esoteric potential
use cases which were discussed at Plumbers '18.
The overhead for tasks in the root namespace (host time offsets = 0) is
in the noise and great effort was made to ensure that especially in the
VDSO. If time namespace is disabled in the kernel configuration the
code is compiled out.
Kudos to Andrei Vagin and Dmitry Sofanov who implemented this feature
and kept on for more than a year addressing review comments, finding
better solutions. A pleasant experience.
- Overhaul of the alarmtimer device dependency handling to ensure that
the init/suspend/resume ordering is correct.
- A new clocksource/event driver for Microchip PIT64
- Suspend/resume support for the Hyper-V clocksource
- The usual pile of fixes, updates and improvements mostly in the
driver code.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl4vbTcTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoXT2D/96iJ3G9Snn2khEQP3XS2rYmtDGw7NO
m1n96falwWeGe6zreU80R2Jge5nLxQtNhRoMPLLee1GpHwRC6lvqEqgdZ4LMBrD2
JqV7Gzg8Urmdh+hpDsyTCpeEWEzoMKxiFOX8PxwctqUhM4szEe5iQg2YQsg85Jw2
vG6M93N2xwDILh4rhEMbKjo+5ZmYn7c1RQvpGOSmpKOj940W/N7H2HBsFhdaJ1Kw
FW5pFv1211PaU5RV2YNb2dMeeMTT1N3e2VN4Dkadoxp47pb+725gNHEBEjmV9poG
Lp4IhzGAPnj8zVD88icQZSTaK3gUHMClxprJ0Pf84WEtiH7SeGu8BPYyu77+oNDe
yzcctDJNyCWXkzmaP/fe/HLc0TStbvNAJ5Tagp4BC75gzebeb4/n8RtRT0fKeDYL
pxpDPKDAPU7p1JSjxiWAtshqjBycWNY3Z49bA7/VhKBhnv8BDyBPGlYd7/4xrbGr
RK7DQNXJwaJaiNJ7p5PiaFxGzNyB0B9sThD/slSlEInIKb4h9YzWr0TV+NB62VnB
sDcN+tpLbRPz5/5cHGGfxR0+zKWpfyai8pzbmmaXEaKssjRYwyvcac5EZdgbWpbK
k7CqAjoWLA2P+tGeePNJOf5JYK6Vmdyh4clmuwM0zOiRJ9NlWUyMf3z7QYILs4RO
UAI+6opYlZEPAw==
=x3qT
-----END PGP SIGNATURE-----
Merge tag 'timers-core-2020-01-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Thomas Gleixner:
"The timekeeping and timers departement provides:
- Time namespace support:
If a container migrates from one host to another then it expects
that clocks based on MONOTONIC and BOOTTIME are not subject to
disruption. Due to different boot time and non-suspended runtime
these clocks can differ significantly on two hosts, in the worst
case time goes backwards which is a violation of the POSIX
requirements.
The time namespace addresses this problem. It allows to set offsets
for clock MONOTONIC and BOOTTIME once after creation and before
tasks are associated with the namespace. These offsets are taken
into account by timers and timekeeping including the VDSO.
Offsets for wall clock based clocks (REALTIME/TAI) are not provided
by this mechanism. While in theory possible, the overhead and code
complexity would be immense and not justified by the esoteric
potential use cases which were discussed at Plumbers '18.
The overhead for tasks in the root namespace (ie where host time
offsets = 0) is in the noise and great effort was made to ensure
that especially in the VDSO. If time namespace is disabled in the
kernel configuration the code is compiled out.
Kudos to Andrei Vagin and Dmitry Sofanov who implemented this
feature and kept on for more than a year addressing review
comments, finding better solutions. A pleasant experience.
- Overhaul of the alarmtimer device dependency handling to ensure
that the init/suspend/resume ordering is correct.
- A new clocksource/event driver for Microchip PIT64
- Suspend/resume support for the Hyper-V clocksource
- The usual pile of fixes, updates and improvements mostly in the
driver code"
* tag 'timers-core-2020-01-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (71 commits)
alarmtimer: Make alarmtimer_get_rtcdev() a stub when CONFIG_RTC_CLASS=n
alarmtimer: Use wakeup source from alarmtimer platform device
alarmtimer: Make alarmtimer platform device child of RTC device
alarmtimer: Update alarmtimer_get_rtcdev() docs to reflect reality
hrtimer: Add missing sparse annotation for __run_timer()
lib/vdso: Only read hrtimer_res when needed in __cvdso_clock_getres()
MIPS: vdso: Define BUILD_VDSO32 when building a 32bit kernel
clocksource/drivers/hyper-v: Set TSC clocksource as default w/ InvariantTSC
clocksource/drivers/hyper-v: Untangle stimers and timesync from clocksources
clocksource/drivers/timer-microchip-pit64b: Fix sparse warning
clocksource/drivers/exynos_mct: Rename Exynos to lowercase
clocksource/drivers/timer-ti-dm: Fix uninitialized pointer access
clocksource/drivers/timer-ti-dm: Switch to platform_get_irq
clocksource/drivers/timer-ti-dm: Convert to devm_platform_ioremap_resource
clocksource/drivers/em_sti: Fix variable declaration in em_sti_probe
clocksource/drivers/em_sti: Convert to devm_platform_ioremap_resource
clocksource/drivers/bcm2835_timer: Fix memory leak of timer
clocksource/drivers/cadence-ttc: Use ttc driver as platform driver
clocksource/drivers/timer-microchip-pit64b: Add Microchip PIT64B support
clocksource/drivers/hyper-v: Reserve PAGE_SIZE space for tsc page
...
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl4vOrAQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgph+HD/9bM9CqjchDitL0NQne+4BwBdoRCcik0z7n
y/CrNIh3tZnkJO0fT9Lz6GD/6iZNU93NUHFMOgzuS+8mR5CwQUkR/xjDvPX8H07F
h+Xl8ZUX6YjbuLmO0sgc9yu3SkMaxjCHfGPl1juZXwH6ERM6MTSkg6O+YwQZnvAB
lLJWaa1oOTQAsbnz7ZVwZ5pDOfkoSirCat2kzPoyfzptcIrUw7vfu4QHdCdNHy63
eT/vcHmj6CqzZWJRfpkaFOY6fnY30Hh9fqAVQvzxHPvm1vM3z7JSw7cY8t+cjXjn
TJ0NQK2QFmGTTa/ZEf3KCB5kbNV0SpOV6Jqz1aBX/cStQez6ygFkPGscPbwy8tsR
vBVDyCMZC42jbt7TuIHNkAI/e+HqSOBgyB8MaWaQfApcbNzTIFp9lltrTcZpaYNZ
J4R6YQGDve+ElUlOAPBbiXRGrd3jmhApP8scbgls05UwZOtDf+KJBCLQYRzw8qrb
J7D7hVugwV0oDhdaUkd4Pt3KYoISsFgIe7HRuKGGmfKyqWiJ5iLH0QVPaEkPAokr
VzzSoex+5xcCSvIiGd1DNzsVD9C2xbyUvifHTa36pYKQ65BogyJBopgYgEYd8ksN
AlmPxJM9j1o85TtV1CAbb2O0827BlLmYLc6BcdD+s0x+FeStdnjwICQooHiitTiI
hEHajSDujQ==
=Us3h
-----END PGP SIGNATURE-----
Merge tag 'for-5.6/drivers-2020-01-27' of git://git.kernel.dk/linux-block
Pull block driver updates from Jens Axboe:
"Like the core side, not a lot of changes here, just two main items:
- Series of patches (via Coly) with fixes for bcache (Coly,
Christoph)
- MD pull request from Song"
* tag 'for-5.6/drivers-2020-01-27' of git://git.kernel.dk/linux-block: (31 commits)
bcache: reap from tail of c->btree_cache in bch_mca_scan()
bcache: reap c->btree_cache_freeable from the tail in bch_mca_scan()
bcache: remove member accessed from struct btree
bcache: print written and keys in trace_bcache_btree_write
bcache: avoid unnecessary btree nodes flushing in btree_flush_write()
bcache: add code comments for state->pool in __btree_sort()
lib: crc64: include <linux/crc64.h> for 'crc64_be'
bcache: use read_cache_page_gfp to read the superblock
bcache: store a pointer to the on-disk sb in the cache and cached_dev structures
bcache: return a pointer to the on-disk sb from read_super
bcache: transfer the sb_page reference to register_{bdev,cache}
bcache: fix use-after-free in register_bcache()
bcache: properly initialize 'path' and 'err' in register_bcache()
bcache: rework error unwinding in register_bcache
bcache: use a separate data structure for the on-disk super block
bcache: cached_dev_free needs to put the sb page
md/raid1: introduce wait_for_serialization
md/raid1: use bucket based mechanism for IO serialization
md: introduce a new struct for IO serialization
md: don't destroy serial_info_pool if serialize_policy is true
...
- Core:
- Support for dynamic channels
- Removal of various slave wrappers
- Make few slave request APIs as private to dmaengine
- Symlinks between channels and slaves
- Support for hotplug of controllers
- Support for metadata_ops for dma_async_tx_descriptor
- Reporting DMA cached data amount
- Virtual dma channel locking updates
- New drivers/device/feature support support:
- Driver for Intel data accelerators
- Driver for TI K3 UDMA
- Driver for PLX DMA engine
- Driver for hisilicon Kunpeng DMA engine
- Support for eDMA support for QorIQ LS1028A in fsl edma driver
- Support for cyclic dma in sun4i driver
- Support for X1830 in JZ4780 driver
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE+vs47OPLdNbVcHzyfBQHDyUjg0cFAl4u+QkACgkQfBQHDyUj
g0cCcg//awBruofTHIrBOwHmCX1a09mw5WmkFG48N7tYp4fvaI1aOgs3hH9PZiBG
fFZUktodwYpEKg6JJOfm1RnLBuKm0+3zmaKGPdK1RcbaDURh8G9qhW65f4mfImvB
GXlgw59WKtgPAM9zWW9UxjugAk4DBte5xVKYJUsI0t4P7k9TM4i0Fv0VmMUhhDuo
buPD1cM/GWFHbE7OYJ51aGRtrOHV1nPgQaHBkWaT7EotzGsZ3gtWYzteI3BRXRV/
IkSgxOefMkIgu1j3KIxFZ1CJDHCZSnx2B+AEMCcp63osyeHBOYoL7KQxo6tBjaRV
fbCasbkTkvvJUjyZdtOdU2wqf7ZqoDkD+n5nkpENf4G1M8J5RiHmrFq96m3HRonE
V1bmMslXhsJlvtoT6ec2iJFchiq0nx1XHyST6faUOK+0cd1lzbogWwztydQH4fwd
TxfEd+eYlFFu3lGDfRp14Tz7fAcFNPZ2bJQhZkF6RpwUW3y3L0cJc3Y0AcWmNkvJ
oStvTlbbUvgRgO7rvEyAmdPb31lE6PLaA0WCahcvf4zQxxNMyYyaWP73MegvqJGO
pfJXBOWBTTKwu0fDR5UHJd3tEDABvcZnwBaCSYrpI5f9bJ4NRI3f4DIMwLBnw9IK
aH6pzwo4gTAMuvxzq8KeTp3hU7kszyUN8q8hiTZlgVozMLKXhQY=
=mv1v
-----END PGP SIGNATURE-----
Merge tag 'dmaengine-5.6-rc1' of git://git.infradead.org/users/vkoul/slave-dma
Pull dmaengine updates from Vinod Koul:
"This time we have a bunch of core changes to support dynamic channels,
hotplug of controllers, new apis for metadata ops etc along with new
drivers for Intel data accelerators, TI K3 UDMA, PLX DMA engine and
hisilicon Kunpeng DMA engine. Also usual assorted updates to drivers.
Core:
- Support for dynamic channels
- Removal of various slave wrappers
- Make few slave request APIs as private to dmaengine
- Symlinks between channels and slaves
- Support for hotplug of controllers
- Support for metadata_ops for dma_async_tx_descriptor
- Reporting DMA cached data amount
- Virtual dma channel locking updates
New drivers/device/feature support support:
- Driver for Intel data accelerators
- Driver for TI K3 UDMA
- Driver for PLX DMA engine
- Driver for hisilicon Kunpeng DMA engine
- Support for eDMA support for QorIQ LS1028A in fsl edma driver
- Support for cyclic dma in sun4i driver
- Support for X1830 in JZ4780 driver"
* tag 'dmaengine-5.6-rc1' of git://git.infradead.org/users/vkoul/slave-dma: (62 commits)
dmaengine: Create symlinks between DMA channels and slaves
dmaengine: hisilicon: Add Kunpeng DMA engine support
dmaengine: idxd: add char driver to expose submission portal to userland
dmaengine: idxd: connect idxd to dmaengine subsystem
dmaengine: idxd: add descriptor manipulation routines
dmaengine: idxd: add sysfs ABI for idxd driver
dmaengine: idxd: add configuration component of driver
dmaengine: idxd: Init and probe for Intel data accelerators
dmaengine: add support to dynamic register/unregister of channels
dmaengine: break out channel registration
x86/asm: add iosubmit_cmds512() based on MOVDIR64B CPU instruction
dmaengine: ti: k3-udma: fix spelling mistake "limted" -> "limited"
dmaengine: s3c24xx-dma: fix spelling mistake "to" -> "too"
dmaengine: Move dma_get_{,any_}slave_channel() to private dmaengine.h
dmaengine: Remove dma_request_slave_channel_compat() wrapper
dmaengine: Remove dma_device_satisfies_mask() wrapper
dt-bindings: fsl-imx-sdma: Add i.MX8MM/i.MX8MN/i.MX8MP compatible string
dmaengine: zynqmp_dma: fix burst length configuration
dmaengine: sun4i: Add support for cyclic requests with dedicated DMA
dmaengine: fsl-qdma: fix duplicated argument to &&
...
Pull HID updates from Jiri Kosina:
"This time it's surprisingly quiet (probably due to the christmas
break):
- Logitech HID++ protocol improvements from Mazin Rezk, Pedro
Vanzella and Adrian Freund
- support for hidraw uniq ioctl from Marcel Holtmann"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
HID: logitech-hidpp: avoid duplicate error handling code in 'hidpp_probe()'
hid-logitech-hidpp: read battery voltage from newer devices
HID: logitech: Add MX Master 3 Mouse
HID: logitech-hidpp: Support WirelessDeviceStatus connect events
HID: logitech-hidpp: Support translations from short to long reports
HID: hidraw: add support uniq ioctl
Send ETHTOOL_MSG_WOL_NTF notification whenever wake-on-lan settings of
a device are modified using ETHTOOL_MSG_WOL_SET netlink message or
ETHTOOL_SWOL ioctl request.
As notifications can be received by anyone, do not include SecureOn(tm)
password in notification messages.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement WOL_SET netlink request to set wake-on-lan settings. This is
equivalent to ETHTOOL_SWOL ioctl request.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement WOL_GET request to get wake-on-lan settings for a device,
traditionally available via ETHTOOL_GWOL ioctl request.
As part of the implementation, provide symbolic names for wake-on-line
modes as ETH_SS_WOL_MODES string set.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Send ETHTOOL_MSG_DEBUG_NTF notification message whenever debugging message
mask for a device are modified using ETHTOOL_MSG_DEBUG_SET netlink message
or ETHTOOL_SMSGLVL ioctl request.
The notification message has the same format as reply to DEBUG_GET request.
As with other ethtool notifications, netlink requests only trigger the
notification if the mask is actually changed while ioctl request trigger it
whenever the request results in calling the ethtool_ops handler.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement DEBUG_SET netlink request to set debugging settings for a device.
At the moment, only message mask corresponding to message level as set by
ETHTOOL_SMSGLVL ioctl request can be set. (It is called message level in
ioctl interface but almost all drivers interpret it as a bit mask.)
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement DEBUG_GET request to get debugging settings for a device. At the
moment, only message mask corresponding to message level as reported by
ETHTOOL_GMSGLVL ioctl request is provided. (It is called message level in
ioctl interface but almost all drivers interpret it as a bit mask.)
As part of the implementation, provide symbolic names for message mask bits
as ETH_SS_MSG_CLASSES string set.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce a new nested netlink attribute, NFTA_SET_DESC_CONCAT, used
to specify the length of each field in a set concatenation.
This allows set implementations to support concatenation of multiple
ranged items, as they can divide the input key into matching data for
every single field. Such set implementations would be selected as
they specify support for NFT_SET_INTERVAL and allow desc->field_count
to be greater than one. Explicitly disallow this for nft_set_rbtree.
In order to specify the interval for a set entry, userspace would
include in NFTA_SET_DESC_CONCAT attributes field lengths, and pass
range endpoints as two separate keys, represented by attributes
NFTA_SET_ELEM_KEY and NFTA_SET_ELEM_KEY_END.
While at it, export the number of 32-bit registers available for
packet matching, as nftables will need this to know the maximum
number of field lengths that can be specified.
For example, "packets with an IPv4 address between 192.0.2.0 and
192.0.2.42, with destination port between 22 and 25", can be
expressed as two concatenated elements:
NFTA_SET_ELEM_KEY: 192.0.2.0 . 22
NFTA_SET_ELEM_KEY_END: 192.0.2.42 . 25
and NFTA_SET_DESC_CONCAT attribute would contain:
NFTA_LIST_ELEM
NFTA_SET_FIELD_LEN: 4
NFTA_LIST_ELEM
NFTA_SET_FIELD_LEN: 2
v4: No changes
v3: Complete rework, NFTA_SET_DESC_CONCAT instead of NFTA_SET_SUBKEY
v2: No changes
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Add NFTA_SET_ELEM_KEY_END attribute to convey the closing element of the
interval between kernel and userspace.
This patch also adds the NFT_SET_EXT_KEY_END extension to store the
closing element value in this interval.
v4: No changes
v3: New patch
[sbrivio: refactor error paths and labels; add corresponding
nft_set_ext_type for new key; rebase]
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Using IPv6 flow-label to swiftly route around avoid congested or
disconnected network path can greatly improve TCP reliability.
This patch adds SNMP counters and a OPT_STATS counter to track both
host-level and connection-level statistics. Network administrators
can use these counters to evaluate the impact of this new ability better.
Export count for rehash attempts to
1) two SNMP counters: TcpTimeoutRehash (rehash due to timeouts),
and TcpDuplicateDataRehash (rehash due to receiving duplicate
packets)
2) Timestamping API SOF_TIMESTAMPING_OPT_STATS.
Signed-off-by: Abdul Kabbani <akabbani@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Kevin(Yudong) Yang <yyd@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The first per-vlan option added is state, it is needed for EVPN and for
per-vlan STP. The state allows to control the forwarding on per-vlan
basis. The vlan state is considered only if the port state is forwarding
in order to avoid conflicts and be consistent. br_allowed_egress is
called only when the state is forwarding, but the ingress case is a bit
more complicated due to the fact that we may have the transition between
port:BR_STATE_FORWARDING -> vlan:BR_STATE_LEARNING which should still
allow the bridge to learn from the packet after vlan filtering and it will
be dropped after that. Also to optimize the pvid state check we keep a
copy in the vlan group to avoid one lookup. The state members are
modified with *_ONCE() to annotate the lockless access.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds support for option modification of single vlans and
ranges. It allows to only modify options, i.e. skip create/delete by
using the BRIDGE_VLAN_INFO_ONLY_OPTS flag. When working with a range
option changes we try to pack the notifications as much as possible.
v2: do full port (all vlans) notification only when creating/deleting
vlans for compatibility, rework the range detection when changing
options, add more verbose extack errors and check if a vlan should
be used (br_vlan_should_use checks)
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The idxd driver introduces the Intel Data Stream Accelerator [1] that will
be available on future Intel Xeon CPUs. One of the kernel access
point for the driver is through the dmaengine subsystem. It will initially
provide the DMA copy service to the kernel.
Some of the main functionality introduced with this accelerator
are: shared virtual memory (SVM) support, and descriptor submission using
Intel CPU instructions movdir64b and enqcmds. There will be additional
accelerator devices that share the same driver with variations to
capabilities.
This commit introduces the probe and initialization component of the
driver.
[1]: https://software.intel.com/en-us/download/intel-data-streaming-accelerator-preliminary-architecture-specification
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/157965023991.73301.6186843973135311580.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Avoid a pointless dependency on buffer heads in bcache by simply open
coding reading a single page. Also add a SB_OFFSET define for the
byte offset of the superblock instead of using magic numbers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Split out an on-disk version struct cache_sb with the proper endianness
annotations. This fixes a fair chunk of sparse warnings, but there are
some left due to the way the checksum is defined.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Principles:
- Packets are classified on flows.
- This is a Stochastic model (as we use a hash, several flows might
be hashed to the same slot)
- Each flow has a PIE managed queue.
- Flows are linked onto two (Round Robin) lists,
so that new flows have priority on old ones.
- For a given flow, packets are not reordered.
- Drops during enqueue only.
- ECN capability is off by default.
- ECN threshold (if ECN is enabled) is at 10% by default.
- Uses timestamps to calculate queue delay by default.
Usage:
tc qdisc ... fq_pie [ limit PACKETS ] [ flows NUMBER ]
[ target TIME ] [ tupdate TIME ]
[ alpha NUMBER ] [ beta NUMBER ]
[ quantum BYTES ] [ memory_limit BYTES ]
[ ecnprob PERCENTAGE ] [ [no]ecn ]
[ [no]bytemode ] [ [no_]dq_rate_estimator ]
defaults:
limit: 10240 packets, flows: 1024
target: 15 ms, tupdate: 15 ms (in jiffies)
alpha: 1/8, beta : 5/4
quantum: device MTU, memory_limit: 32 Mb
ecnprob: 10%, ecn: off
bytemode: off, dq_rate_estimator: off
Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in>
Signed-off-by: Sachin D. Patil <sdp.sachin@gmail.com>
Signed-off-by: V. Saicharan <vsaicharan1998@gmail.com>
Signed-off-by: Mohit Bhasi <mohitbhasi1998@gmail.com>
Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Signed-off-by: Gautam Ramakrishnan <gautamramk@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexei Starovoitov says:
====================
pull-request: bpf-next 2020-01-22
The following pull-request contains BPF updates for your *net-next* tree.
We've added 92 non-merge commits during the last 16 day(s) which contain
a total of 320 files changed, 7532 insertions(+), 1448 deletions(-).
The main changes are:
1) function by function verification and program extensions from Alexei.
2) massive cleanup of selftests/bpf from Toke and Andrii.
3) batched bpf map operations from Brian and Yonghong.
4) tcp congestion control in bpf from Martin.
5) bulking for non-map xdp_redirect form Toke.
6) bpf_send_signal_thread helper from Yonghong.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds a helper to read the 64bit jiffies. It will be used
in a later patch to implement the bpf_cubic.c.
The helper is inlined for jit_requested and 64 BITS_PER_LONG
as the map_gen_lookup(). Other cases could be considered together
with map_gen_lookup() if needed.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200122233646.903260-1-kafai@fb.com
Introduce dynamic program extensions. The users can load additional BPF
functions and replace global functions in previously loaded BPF programs while
these programs are executing.
Global functions are verified individually by the verifier based on their types only.
Hence the global function in the new program which types match older function can
safely replace that corresponding function.
This new function/program is called 'an extension' of old program. At load time
the verifier uses (attach_prog_fd, attach_btf_id) pair to identify the function
to be replaced. The BPF program type is derived from the target program into
extension program. Technically bpf_verifier_ops is copied from target program.
The BPF_PROG_TYPE_EXT program type is a placeholder. It has empty verifier_ops.
The extension program can call the same bpf helper functions as target program.
Single BPF_PROG_TYPE_EXT type is used to extend XDP, SKB and all other program
types. The verifier allows only one level of replacement. Meaning that the
extension program cannot recursively extend an extension. That also means that
the maximum stack size is increasing from 512 to 1024 bytes and maximum
function nesting level from 8 to 16. The programs don't always consume that
much. The stack usage is determined by the number of on-stack variables used by
the program. The verifier could have enforced 512 limit for combined original
plus extension program, but it makes for difficult user experience. The main
use case for extensions is to provide generic mechanism to plug external
programs into policy program or function call chaining.
BPF trampoline is used to track both fentry/fexit and program extensions
because both are using the same nop slot at the beginning of every BPF
function. Attaching fentry/fexit to a function that was replaced is not
allowed. The opposite is true as well. Replacing a function that currently
being analyzed with fentry/fexit is not allowed. The executable page allocated
by BPF trampoline is not used by program extensions. This inefficiency will be
optimized in future patches.
Function by function verification of global function supports scalars and
pointer to context only. Hence program extensions are supported for such class
of global functions only. In the future the verifier will be extended with
support to pointers to structures, arrays with sizes, etc.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20200121005348.2769920-2-ast@kernel.org
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl4obx8QHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpqNwD/99ae+ezi7LSVj9zQml7y/6ZSV4D3wzD9PJ
7QsUq5kGA0tisZ9q/rd0eja4Fy2Dw/qhX+GXgTYLt9+a66rp0CskWaD9NWFMtFGp
eBgitruw5SqFl8GfNCjd6NB/Af3NGyrmQSPV58K7mma6zQX7ELCrEdipCKj5QpNk
eHO0enZA1KcPliegAbDQfhz7U9frns9nSs0VHB599X9jr5pi8PPejukVEBwK67o6
Dh52CDqjeKksX8PWxhXau9j8DNt85Zs8ocRFvgWD8/UQSLcHAM6DFLdkeGEHQu9H
QouW9JRFzmTksy3KvPnCcCPdsYQrqVmj6fCCg6AXW3yOzcI1IvhO+hcNMQtkLFWI
5JKYZkFhGjCsypmkYpB+5mqcz+fsbkfgN9clU1tvPK8FcmpLolsIQrUJdaRe6r58
odDe9Qs+I46LAYKttkkAlpYg1E9CD0T7g1ENXzcqb5t6fZTW4oU0Wpqen788WQqz
EQqp30kU0FgnFAW8BUpJK5iwrrm3RrS+Br6lhk33BeA423Pt6n3RnXYFVvtAHeuA
jyUVqiMKexi+7fCC2LO1M9xofQMmr6z2nVkZNhDLIr4y9uxD4xTyiaEAFjk6Lws6
lcSWZMHQPKaCqfxhAtnoVZP96k6zMwfEJUb+fANX9SI0+3p9LHFz2Kp/AOs6GvJC
/A5vCFjLWw==
=oNaB
-----END PGP SIGNATURE-----
Merge tag 'io_uring-5.5-2020-01-22' of git://git.kernel.dk/linux-block
Pull io_uring fix from Jens Axboe:
"This was supposed to have gone in last week, but due to a brain fart
on my part, I forgot that we made this struct addition in the 5.5
cycle. So here it is for 5.5, to prevent having a 32 vs 64-bit
compatability issue with the files_update command"
* tag 'io_uring-5.5-2020-01-22' of git://git.kernel.dk/linux-block:
io_uring: fix compat for IORING_REGISTER_FILES_UPDATE
Steffen Klassert says:
====================
pull request (net-next): ipsec-next 2020-01-21
1) Add support for TCP encapsulation of IKE and ESP messages,
as defined by RFC 8229. Patchset from Sabrina Dubroca.
Please note that there is a merge conflict in:
net/unix/af_unix.c
between commit:
3c32da19a8 ("unix: Show number of pending scm files of receive queue in fdinfo")
from the net-next tree and commit:
b50b0580d2 ("net: add queue argument to __skb_wait_for_more_packets and __skb_{,try_}recv_datagram")
from the ipsec-next tree.
The conflict can be solved as done in linux-next.
Please pull or let me know if there are problems.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This enables you to configure mode (DTE/DCE), Modulo, Window, T1, T2, N2 via
sethdlc (which needs to be patched as well).
Signed-off-by: Martin Schiller <ms@dev.tdt.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
For each IOSQE_* flag there is a corresponding REQ_F_* flag. And there
is a repetitive pattern of their translation:
e.g. if (sqe->flags & SQE_FLAG*) req->flags |= REQ_F_FLAG*
Use same numeric values/bits for them and copy instead of manual
handling.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The application currently has no way of knowing if a given opcode is
supported or not without having to try and issue one and see if we get
-EINVAL or not. And even this approach is fraught with peril, as maybe
we're getting -EINVAL due to some fields being missing, or maybe it's
just not that easy to issue that particular command without doing some
other leg work in terms of setup first.
This adds IORING_REGISTER_PROBE, which fills in a structure with info
on what it supported or not. This will work even with sparse opcode
fields, which may happen in the future or even today if someone
backports specific features to older kernels.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Add support for the new openat2(2) system call. It's trivial to do, as
we can have openat(2) just be wrapped around it.
Suggested-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
If an application is using eventfd notifications with poll to know when
new SQEs can be issued, it's expecting the following read/writes to
complete inline. And with that, it knows that there are events available,
and don't want spurious wakeups on the eventfd for those requests.
This adds IORING_REGISTER_EVENTFD_ASYNC, which works just like
IORING_REGISTER_EVENTFD, except it only triggers notifications for events
that happen from async completions (IRQ, or io-wq worker completions).
Any completions inline from the submission itself will not trigger
notifications.
Suggested-by: Mark Papadakis <markuspapadakis@icloud.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Some applications like to start small in terms of ring size, and then
ramp up as needed. This is a bit tricky to do currently, since we don't
advertise the max ring size.
This adds IORING_SETUP_CLAMP. If set, and the values for SQ or CQ ring
size exceed what we support, then clamp them at the max values instead
of returning -EINVAL. Since we return the chosen ring sizes after setup,
no further changes are needed on the application side. io_uring already
changes the ring sizes if the application doesn't ask for power-of-two
sizes, for example.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This adds support for doing madvise(2) through io_uring. We assume that
any operation can block, and hence punt everything async. This could be
improved, but hard to make bullet proof. The async punt ensures it's
safe.
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This adds support for doing fadvise through io_uring. We assume that
WILLNEED doesn't block, but that DONTNEED may block.
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This behaves like preadv2/pwritev2 with offset == -1, it'll use (and
update) the current file position. This obviously comes with the caveat
that if the application has multiple read/writes in flight, then the
end result will not be as expected. This is similar to threads sharing
a file descriptor and doing IO using the current file position.
Since this feature isn't easily detectable by doing a read or write,
add a feature flags, IORING_FEAT_RW_CUR_POS, to allow applications to
detect presence of this feature.
Reported-by: 李通洲 <carter.li@eoitek.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
For uses cases that don't already naturally have an iovec, it's easier
(or more convenient) to just use a buffer address + length. This is
particular true if the use case is from languages that want to create
a memory safe abstraction on top of io_uring, and where introducing
the need for the iovec may impose an ownership issue. For those cases,
they currently need an indirection buffer, which means allocating data
just for this purpose.
Add basic read/write that don't require the iovec.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
io_uring defaults to always doing inline submissions, if at all
possible. But for larger copies, even if the data is fully cached, that
can take a long time. Add an IOSQE_ASYNC flag that the application can
set on the SQE - if set, it'll ensure that we always go async for those
kinds of requests. Use the io-wq IO_WQ_WORK_CONCURRENT flag to ensure we
get the concurrency we desire for this case.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We currently fully quiesce the ring before an unregister or update of
the fixed fileset. This is very expensive, and we can be a bit smarter
about this.
Add a percpu refcount for the file tables as a whole. Grab a percpu ref
when we use a registered file, and put it on completion. This is cheap
to do. Upon removal of a file from a set, switch the ref count to atomic
mode. When we hit zero ref on the completion side, then we know we can
drop the previously registered files. When the old files have been
dropped, switch the ref back to percpu mode for normal operation.
Since there's a period between doing the update and the kernel being
done with it, add a IORING_OP_FILES_UPDATE opcode that can perform the
same action. The application knows the update has completed when it gets
the CQE for it. Between doing the update and receiving this completion,
the application must continue to use the unregistered fd if submitting
IO on this particular file.
This takes the runtime of test/file-register from liburing from 14s to
about 0.7s.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This works just like close(2), unsurprisingly. We remove the file
descriptor and post the completion inline, then offload the actual
(potential) last file put to async context.
Mark the async part of this work as uncancellable, as we really must
guarantee that the latter part of the close is run.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This works just like openat(2), except it can be performed async. For
the normal case of a non-blocking path lookup this will complete
inline. If we have to do IO to perform the open, it'll be done from
async context.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
fds field of struct io_uring_files_update is problematic with regards
to compat user space, as pointer size is different in 32-bit, 32-on-64-bit,
and 64-bit user space. In order to avoid custom handling of compat in
the syscall implementation, make fds __u64 and use u64_to_user_ptr in
order to retrieve it. Also, align the field naturally and check that
no garbage is passed there.
Fixes: c3a31e6056 ("io_uring: add support for IORING_REGISTER_FILES_UPDATE")
Signed-off-by: Eugene Syromiatnikov <esyr@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
/* Background. */
For a very long time, extending openat(2) with new features has been
incredibly frustrating. This stems from the fact that openat(2) is
possibly the most famous counter-example to the mantra "don't silently
accept garbage from userspace" -- it doesn't check whether unknown flags
are present[1].
This means that (generally) the addition of new flags to openat(2) has
been fraught with backwards-compatibility issues (O_TMPFILE has to be
defined as __O_TMPFILE|O_DIRECTORY|[O_RDWR or O_WRONLY] to ensure old
kernels gave errors, since it's insecure to silently ignore the
flag[2]). All new security-related flags therefore have a tough road to
being added to openat(2).
Userspace also has a hard time figuring out whether a particular flag is
supported on a particular kernel. While it is now possible with
contemporary kernels (thanks to [3]), older kernels will expose unknown
flag bits through fcntl(F_GETFL). Giving a clear -EINVAL during
openat(2) time matches modern syscall designs and is far more
fool-proof.
In addition, the newly-added path resolution restriction LOOKUP flags
(which we would like to expose to user-space) don't feel related to the
pre-existing O_* flag set -- they affect all components of path lookup.
We'd therefore like to add a new flag argument.
Adding a new syscall allows us to finally fix the flag-ignoring problem,
and we can make it extensible enough so that we will hopefully never
need an openat3(2).
/* Syscall Prototype. */
/*
* open_how is an extensible structure (similar in interface to
* clone3(2) or sched_setattr(2)). The size parameter must be set to
* sizeof(struct open_how), to allow for future extensions. All future
* extensions will be appended to open_how, with their zero value
* acting as a no-op default.
*/
struct open_how { /* ... */ };
int openat2(int dfd, const char *pathname,
struct open_how *how, size_t size);
/* Description. */
The initial version of 'struct open_how' contains the following fields:
flags
Used to specify openat(2)-style flags. However, any unknown flag
bits or otherwise incorrect flag combinations (like O_PATH|O_RDWR)
will result in -EINVAL. In addition, this field is 64-bits wide to
allow for more O_ flags than currently permitted with openat(2).
mode
The file mode for O_CREAT or O_TMPFILE.
Must be set to zero if flags does not contain O_CREAT or O_TMPFILE.
resolve
Restrict path resolution (in contrast to O_* flags they affect all
path components). The current set of flags are as follows (at the
moment, all of the RESOLVE_ flags are implemented as just passing
the corresponding LOOKUP_ flag).
RESOLVE_NO_XDEV => LOOKUP_NO_XDEV
RESOLVE_NO_SYMLINKS => LOOKUP_NO_SYMLINKS
RESOLVE_NO_MAGICLINKS => LOOKUP_NO_MAGICLINKS
RESOLVE_BENEATH => LOOKUP_BENEATH
RESOLVE_IN_ROOT => LOOKUP_IN_ROOT
open_how does not contain an embedded size field, because it is of
little benefit (userspace can figure out the kernel open_how size at
runtime fairly easily without it). It also only contains u64s (even
though ->mode arguably should be a u16) to avoid having padding fields
which are never used in the future.
Note that as a result of the new how->flags handling, O_PATH|O_TMPFILE
is no longer permitted for openat(2). As far as I can tell, this has
always been a bug and appears to not be used by userspace (and I've not
seen any problems on my machines by disallowing it). If it turns out
this breaks something, we can special-case it and only permit it for
openat(2) but not openat2(2).
After input from Florian Weimer, the new open_how and flag definitions
are inside a separate header from uapi/linux/fcntl.h, to avoid problems
that glibc has with importing that header.
/* Testing. */
In a follow-up patch there are over 200 selftests which ensure that this
syscall has the correct semantics and will correctly handle several
attack scenarios.
In addition, I've written a userspace library[4] which provides
convenient wrappers around openat2(RESOLVE_IN_ROOT) (this is necessary
because no other syscalls support RESOLVE_IN_ROOT, and thus lots of care
must be taken when using RESOLVE_IN_ROOT'd file descriptors with other
syscalls). During the development of this patch, I've run numerous
verification tests using libpathrs (showing that the API is reasonably
usable by userspace).
/* Future Work. */
Additional RESOLVE_ flags have been suggested during the review period.
These can be easily implemented separately (such as blocking auto-mount
during resolution).
Furthermore, there are some other proposed changes to the openat(2)
interface (the most obvious example is magic-link hardening[5]) which
would be a good opportunity to add a way for userspace to restrict how
O_PATH file descriptors can be re-opened.
Another possible avenue of future work would be some kind of
CHECK_FIELDS[6] flag which causes the kernel to indicate to userspace
which openat2(2) flags and fields are supported by the current kernel
(to avoid userspace having to go through several guesses to figure it
out).
[1]: https://lwn.net/Articles/588444/
[2]: https://lore.kernel.org/lkml/CA+55aFyyxJL1LyXZeBsf2ypriraj5ut1XkNDsunRBqgVjZU_6Q@mail.gmail.com
[3]: commit 629e014bb8 ("fs: completely ignore unknown open flags")
[4]: https://sourceware.org/bugzilla/show_bug.cgi?id=17523
[5]: https://lore.kernel.org/lkml/20190930183316.10190-2-cyphar@cyphar.com/
[6]: https://youtu.be/ggD-eb3yPVs
Suggested-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Hitherto nft_bitwise has only supported boolean operations: NOT, AND, OR
and XOR. Extend it to do shifts as well.
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Add a new bitwise netlink attribute that will be used by shift
operations to store the size of the shift. It is not used by boolean
operations.
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Add a new bitwise netlink attribute, NFTA_BITWISE_OP, which is set to a
value of a new enum, nft_bitwise_ops. It describes the type of
operation an expression contains. Currently, it only has one value:
NFT_BITWISE_BOOL. More values will be added later to implement shifts.
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The comment documenting how bitwise expressions work includes a table
which summarizes the mask and xor arguments combined to express the
supported boolean operations. However, the row for OR:
mask xor
0 x
is incorrect.
dreg = (sreg & 0) ^ x
is not equivalent to:
dreg = sreg | x
What the code actually does is:
dreg = (sreg & ~x) ^ x
Update the documentation to match.
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
- bump version strings, by Simon Wunderlich
- fix typo and kerneldocs, by Sven Eckelmann
- use WiFi txbitrate for B.A.T.M.A.N. V as fallback, by René Treffer
- silence some endian sparse warnings by adding annotations,
by Sven Eckelmann
- Update copyright years to 2020, by Sven Eckelmann
- Disable deprecated sysfs configuration by default, by Sven Eckelmann
-----BEGIN PGP SIGNATURE-----
iQJKBAABCgA0FiEE1ilQI7G+y+fdhnrfoSvjmEKSnqEFAl4dzoQWHHN3QHNpbW9u
d3VuZGVybGljaC5kZQAKCRChK+OYQpKeodEbD/0fM2O/hx/CsBoWDt9rU+tA/Nh8
CWYUn9fjghG3RpSCzIjqpcwSe4T09KHw7BvPBiWpVNmUYeNz7aLmujbaj1DmAY8l
l50VHtJYM1aCIC8O3UcqznrUtoyEdAi78gpxUXLVNABJ9t9gkpXaS3bM06gs5QW4
7Y5KSpN9O7a7QrLByb8KZUuZrHCSUzrVJHkc3eZ4FqAk3jRw0LjaMIkoQaKImc9D
wOUTU0rUvuTNBZ9jteChWkfpHwa1QrRgoxYiw5d5u7FT7yOZE1NoQUbm0ZzhdMrO
QLkcdN8n6gI8waDP47cxwNmSRpRS8S7PIyG2MSi1LNoENUcdHuPuPFAnBbXWyUqc
QAXSe+JS2MKAT13Tt8+y34WWCtT94K7Hko1SbT4WpawdF1G+DzT2BNI9p+kOJLRV
Ap7OHTwrZHqkTyQLbMEYdfIPFwMilYEc+O24kjOw6OCSt45F1l9bad4IwvE/iASb
HNzLO2Jlj5gUMXShbcIzC3PGYSMWpZKoX3075SqOXjcOKfF/jUYfaUt8WSUvi1bA
CcEMD7FMrVa0mBkhG2WUO9f/ARlfTckhg/ijD4NSDLy7CsjhWXZL7lC9N7R43ktM
LThTJ+ey1P+K6q1OH4ytvlApSwHC1OdaA6zUhoht9XQDHrnLf6gDpTDd4xn8zKPZ
BXbGv9bCi3/df6ocCw==
=xeMk
-----END PGP SIGNATURE-----
Merge tag 'batadv-next-for-davem-20200114' of git://git.open-mesh.org/linux-merge
Simon Wunderlich says:
====================
This feature/cleanup patchset includes the following patches:
- bump version strings, by Simon Wunderlich
- fix typo and kerneldocs, by Sven Eckelmann
- use WiFi txbitrate for B.A.T.M.A.N. V as fallback, by René Treffer
- silence some endian sparse warnings by adding annotations,
by Sven Eckelmann
- Update copyright years to 2020, by Sven Eckelmann
- Disable deprecated sysfs configuration by default, by Sven Eckelmann
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
htab can't use generic batch support due some problematic behaviours
inherent to the data structre, i.e. while iterating the bpf map a
concurrent program might delete the next entry that batch was about to
use, in that case there's no easy solution to retrieve the next entry,
the issue has been discussed multiple times (see [1] and [2]).
The only way hmap can be traversed without the problem previously
exposed is by making sure that the map is traversing entire buckets.
This commit implements those strict requirements for hmap, the
implementation follows the same interaction that generic support with
some exceptions:
- If keys/values buffer are not big enough to traverse a bucket,
ENOSPC will be returned.
- out_batch contains the value of the next bucket in the iteration, not
the next key, but this is transparent for the user since the user
should never use out_batch for other than bpf batch syscalls.
This commits implements BPF_MAP_LOOKUP_BATCH and adds support for new
command BPF_MAP_LOOKUP_AND_DELETE_BATCH. Note that for update/delete
batch ops it is possible to use the generic implementations.
[1] https://lore.kernel.org/bpf/20190724165803.87470-1-brianvv@google.com/
[2] https://lore.kernel.org/bpf/20190906225434.3635421-1-yhs@fb.com/
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Brian Vazquez <brianvv@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200115184308.162644-6-brianvv@google.com
This commit adds generic support for update and delete batch ops that
can be used for almost all the bpf maps. These commands share the same
UAPI attr that lookup and lookup_and_delete batch ops use and the
syscall commands are:
BPF_MAP_UPDATE_BATCH
BPF_MAP_DELETE_BATCH
The main difference between update/delete and lookup batch ops is that
for update/delete keys/values must be specified for userspace and
because of that, neither in_batch nor out_batch are used.
Suggested-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Brian Vazquez <brianvv@google.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200115184308.162644-4-brianvv@google.com
This commit introduces generic support for the bpf_map_lookup_batch.
This implementation can be used by almost all the bpf maps since its core
implementation is relying on the existing map_get_next_key and
map_lookup_elem. The bpf syscall subcommand introduced is:
BPF_MAP_LOOKUP_BATCH
The UAPI attribute is:
struct { /* struct used by BPF_MAP_*_BATCH commands */
__aligned_u64 in_batch; /* start batch,
* NULL to start from beginning
*/
__aligned_u64 out_batch; /* output: next start batch */
__aligned_u64 keys;
__aligned_u64 values;
__u32 count; /* input/output:
* input: # of key/value
* elements
* output: # of filled elements
*/
__u32 map_fd;
__u64 elem_flags;
__u64 flags;
} batch;
in_batch/out_batch are opaque values use to communicate between
user/kernel space, in_batch/out_batch must be of key_size length.
To start iterating from the beginning in_batch must be null,
count is the # of key/value elements to retrieve. Note that the 'keys'
buffer must be a buffer of key_size * count size and the 'values' buffer
must be value_size * count, where value_size must be aligned to 8 bytes
by userspace if it's dealing with percpu maps. 'count' will contain the
number of keys/values successfully retrieved. Note that 'count' is an
input/output variable and it can contain a lower value after a call.
If there's no more entries to retrieve, ENOENT will be returned. If error
is ENOENT, count might be > 0 in case it copied some values but there were
no more entries to retrieve.
Note that if the return code is an error and not -EFAULT,
count indicates the number of elements successfully processed.
Suggested-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Brian Vazquez <brianvv@google.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200115184308.162644-3-brianvv@google.com
Commit 8b401f9ed2 ("bpf: implement bpf_send_signal() helper")
added helper bpf_send_signal() which permits bpf program to
send a signal to the current process. The signal may be
delivered to any threads in the process.
We found a use case where sending the signal to the current
thread is more preferable.
- A bpf program will collect the stack trace and then
send signal to the user application.
- The user application will add some thread specific
information to the just collected stack trace for
later analysis.
If bpf_send_signal() is used, user application will need
to check whether the thread receiving the signal matches
the thread collecting the stack by checking thread id.
If not, it will need to send signal to another thread
through pthread_kill().
This patch proposed a new helper bpf_send_signal_thread(),
which sends the signal to the thread corresponding to
the current kernel task. This way, user space is guaranteed that
bpf_program execution context and user space signal handling
context are the same thread.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200115035002.602336-1-yhs@fb.com
Add the new flash_info registers struct and the implementation of
ioctl_flash_part_info() for the new Gen4 hardware.
[logang@deltatee.com: rewrote commit message]
Link: https://lore.kernel.org/r/20200115035648.2578-7-logang@deltatee.com
Signed-off-by: Kelvin Cao <kelvin.cao@microchip.com>
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Gen4 hardware will have different values for the SWITCHTEC_X_RUNNING and
SWITCHTEC_IOCTL_NUM_PARTITIONS, so rename them with GEN3 in their name.
No functional changes intended.
Link: https://lore.kernel.org/r/20200115035648.2578-2-logang@deltatee.com
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Add support for the Inter Fabric Manager Communication (Intercomm) Notify
event in PAX variants of Switchtec hardware and the Upstream Error
Containment port in the MR1 release of Gen3 firmware.
Link: https://lore.kernel.org/r/20200106190337.2428-4-logang@deltatee.com
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Add a new rtnetlink group for bridge vlan notifications - RTNLGRP_BRVLAN
and add support for sending vlan notifications (both single and ranges).
No functional changes intended, the notification support will be used by
later patches.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a new vlandb nl attribute - BRIDGE_VLANDB_ENTRY_RANGE which causes
RTM_NEWVLAN/DELVAN to act on a range. Dumps now automatically compress
similar vlans into ranges. This will be also used when per-vlan options
are introduced and vlans' options match, they will be put into a single
range which is encapsulated in one netlink attribute. We need to run
similar checks as br_process_vlan_info() does because these ranges will
be used for options setting and they'll be able to skip
br_process_vlan_info().
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds vlan rtm definitions:
- NEWVLAN: to be used for creating vlans, setting options and
notifications
- DELVLAN: to be used for deleting vlans
- GETVLAN: used for dumping vlan information
Dumping vlans which can span multiple messages is added now with basic
information (vid and flags). We use nlmsg_parse() to validate the header
length in order to be able to extend the message with filtering
attributes later.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the attributes, policy and parsing code to allow userland
to send the info about the BSS coloring settings to the kernel.
Signed-off-by: John Crispin <john@phrozen.org>
Link: https://lore.kernel.org/r/20191217141921.8114-1-john@phrozen.org
[johannes: remove the strict policy parsing, that was a misunderstanding]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When performing L3 offload, routes and nexthops are usually programmed
into two different tables in the underlying device. Therefore, the fact
that a nexthop resides in hardware does not necessarily mean that all
the associated routes also reside in hardware and vice-versa.
While the kernel can signal to user space the presence of a nexthop in
hardware (via 'RTNH_F_OFFLOAD'), it does not have a corresponding flag
for routes. In addition, the fact that a route resides in hardware does
not necessarily mean that the traffic is offloaded. For example,
unreachable routes (i.e., 'RTN_UNREACHABLE') are programmed to trap
packets to the CPU so that the kernel will be able to generate the
appropriate ICMP error packet.
This patch adds an "offload" and "trap" indications to IPv4 routes, so
that users will have better visibility into the offload process.
'struct fib_alias' is extended with two new fields that indicate if the
route resides in hardware or not and if it is offloading traffic from
the kernel or trapping packets to it. Note that the new fields are added
in the 6 bytes hole and therefore the struct still fits in a single
cache line [1].
Capable drivers are expected to invoke fib_alias_hw_flags_set() with the
route's key in order to set the flags.
The indications are dumped to user space via a new flags (i.e.,
'RTM_F_OFFLOAD' and 'RTM_F_TRAP') in the 'rtm_flags' field in the
ancillary header.
v2:
* Make use of 'struct fib_rt_info' in fib_alias_hw_flags_set()
[1]
struct fib_alias {
struct hlist_node fa_list; /* 0 16 */
struct fib_info * fa_info; /* 16 8 */
u8 fa_tos; /* 24 1 */
u8 fa_type; /* 25 1 */
u8 fa_state; /* 26 1 */
u8 fa_slen; /* 27 1 */
u32 tb_id; /* 28 4 */
s16 fa_default; /* 32 2 */
u8 offload:1; /* 34: 0 1 */
u8 trap:1; /* 34: 1 1 */
u8 unused:6; /* 34: 2 1 */
/* XXX 5 bytes hole, try to pack */
struct callback_head rcu __attribute__((__aligned__(8))); /* 40 16 */
/* size: 56, cachelines: 1, members: 12 */
/* sum members: 50, holes: 1, sum holes: 5 */
/* sum bitfield members: 8 bits (1 bytes) */
/* forced alignments: 1, forced holes: 1, sum forced holes: 5 */
/* last cacheline: 56 bytes */
} __attribute__((__aligned__(8)));
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
MACsec offloading to underlying hardware devices is disabled by default
(the software implementation is used). This patch adds support for
changing this setting through the MACsec netlink interface. Many checks
are done when enabling offloading on a given MACsec interface as there
are limitations (it must be supported by the hardware, only a single
interface can be offloaded on a given physical device at a time, rules
can't be moved for now).
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch introduces the macsec_context structure. It will be used
in the kernel to exchange information between the common MACsec
implementation (macsec.c) and the MACsec hardware offloading
implementations. This structure contains pointers to MACsec specific
structures which contain the actual MACsec configuration, and to the
underlying device (phydev for now).
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Time Namespace isolates clock values.
The kernel provides access to several clocks CLOCK_REALTIME,
CLOCK_MONOTONIC, CLOCK_BOOTTIME, etc.
CLOCK_REALTIME
System-wide clock that measures real (i.e., wall-clock) time.
CLOCK_MONOTONIC
Clock that cannot be set and represents monotonic time since
some unspecified starting point.
CLOCK_BOOTTIME
Identical to CLOCK_MONOTONIC, except it also includes any time
that the system is suspended.
For many users, the time namespace means the ability to changes date and
time in a container (CLOCK_REALTIME). Providing per namespace notions of
CLOCK_REALTIME would be complex with a massive overhead, but has a dubious
value.
But in the context of checkpoint/restore functionality, monotonic and
boottime clocks become interesting. Both clocks are monotonic with
unspecified starting points. These clocks are widely used to measure time
slices and set timers. After restoring or migrating processes, it has to be
guaranteed that they never go backward. In an ideal case, the behavior of
these clocks should be the same as for a case when a whole system is
suspended. All this means that it is required to set CLOCK_MONOTONIC and
CLOCK_BOOTTIME clocks, which can be achieved by adding per-namespace
offsets for clocks.
A time namespace is similar to a pid namespace in the way how it is
created: unshare(CLONE_NEWTIME) system call creates a new time namespace,
but doesn't set it to the current process. Then all children of the process
will be born in the new time namespace, or a process can use the setns()
system call to join a namespace.
This scheme allows setting clock offsets for a namespace, before any
processes appear in it.
All available clone flags have been used, so CLONE_NEWTIME uses the highest
bit of CSIGNAL. It means that it can be used only with the unshare() and
the clone3() system calls.
[ tglx: Adjusted paragraph about clone3() to reality and massaged the
changelog a bit. ]
Co-developed-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://criu.org/Time_namespace
Link: https://lists.openvz.org/pipermail/criu/2018-June/041504.html
Link: https://lore.kernel.org/r/20191112012724.250792-4-dima@arista.com
New llvm and old llvm with libbpf help produce BTF that distinguish global and
static functions. Unlike arguments of static function the arguments of global
functions cannot be removed or optimized away by llvm. The compiler has to use
exactly the arguments specified in a function prototype. The argument type
information allows the verifier validate each global function independently.
For now only supported argument types are pointer to context and scalars. In
the future pointers to structures, sizes, pointer to packet data can be
supported as well. Consider the following example:
static int f1(int ...)
{
...
}
int f3(int b);
int f2(int a)
{
f1(a) + f3(a);
}
int f3(int b)
{
...
}
int main(...)
{
f1(...) + f2(...) + f3(...);
}
The verifier will start its safety checks from the first global function f2().
It will recursively descend into f1() because it's static. Then it will check
that arguments match for the f3() invocation inside f2(). It will not descend
into f3(). It will finish f2() that has to be successfully verified for all
possible values of 'a'. Then it will proceed with f3(). That function also has
to be safe for all possible values of 'b'. Then it will start subprog 0 (which
is main() function). It will recursively descend into f1() and will skip full
check of f2() and f3(), since they are global. The order of processing global
functions doesn't affect safety, since all global functions must be proven safe
based on their arguments only.
Such function by function verification can drastically improve speed of the
verification and reduce complexity.
Note that the stack limit of 512 still applies to the call chain regardless whether
functions were static or global. The nested level of 8 also still applies. The
same recursion prevention checks are in place as well.
The type information and static/global kind is preserved after the verification
hence in the above example global function f2() and f3() can be replaced later
by equivalent functions with the same types that are loaded and verified later
without affecting safety of this main() program. Such replacement (re-linking)
of global functions is a subject of future patches.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20200110064124.1760511-3-ast@kernel.org
To open a MPTCP socket with socket(AF_INET, SOCK_STREAM, IPPROTO_MPTCP),
IPPROTO_MPTCP needs a value that differs from IPPROTO_TCP. The existing
IPPROTO numbers mostly map directly to IANA-specified protocol numbers.
MPTCP does not have a protocol number allocated because MPTCP packets
use the TCP protocol number. Use private number not used OTA.
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull input fixes from Dmitry Torokhov:
"Just a few small fixups here"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: imx_sc_key - only take the valid data from SCU firmware as key state
Input: add safety guards to input_set_keycode()
Input: input_event - fix struct padding on sparc64
Input: uinput - always report EPOLLOUT
The ungrafting from PRIO bug fixes in net, when merged into net-next,
merge cleanly but create a build failure. The resolution used here is
from Petr Machata.
Signed-off-by: David S. Miller <davem@davemloft.net>
Document BPF_F_QUERY_EFFECTIVE flag, mostly to clarify how it affects
attach_flags what may not be obvious and what may lead to confision.
Specifically attach_flags is returned only for target_fd but if programs
are inherited from an ancestor cgroup then returned attach_flags for
current cgroup may be confusing. For example, two effective programs of
same attach_type can be returned but w/o BPF_F_ALLOW_MULTI in
attach_flags.
Simple repro:
# bpftool c s /sys/fs/cgroup/path/to/task
ID AttachType AttachFlags Name
# bpftool c s /sys/fs/cgroup/path/to/task effective
ID AttachType AttachFlags Name
95043 ingress tw_ipt_ingress
95048 ingress tw_ingress
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20200108014006.938363-1-rdna@fb.com
Add a helper to send out a tcp-ack. It will be used in the later
bpf_dctcp implementation that requires to send out an ack
when the CE state changed.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200109004551.3900448-1-kafai@fb.com
The patch introduces BPF_MAP_TYPE_STRUCT_OPS. The map value
is a kernel struct with its func ptr implemented in bpf prog.
This new map is the interface to register/unregister/introspect
a bpf implemented kernel struct.
The kernel struct is actually embedded inside another new struct
(or called the "value" struct in the code). For example,
"struct tcp_congestion_ops" is embbeded in:
struct bpf_struct_ops_tcp_congestion_ops {
refcount_t refcnt;
enum bpf_struct_ops_state state;
struct tcp_congestion_ops data; /* <-- kernel subsystem struct here */
}
The map value is "struct bpf_struct_ops_tcp_congestion_ops".
The "bpftool map dump" will then be able to show the
state ("inuse"/"tobefree") and the number of subsystem's refcnt (e.g.
number of tcp_sock in the tcp_congestion_ops case). This "value" struct
is created automatically by a macro. Having a separate "value" struct
will also make extending "struct bpf_struct_ops_XYZ" easier (e.g. adding
"void (*init)(void)" to "struct bpf_struct_ops_XYZ" to do some
initialization works before registering the struct_ops to the kernel
subsystem). The libbpf will take care of finding and populating the
"struct bpf_struct_ops_XYZ" from "struct XYZ".
Register a struct_ops to a kernel subsystem:
1. Load all needed BPF_PROG_TYPE_STRUCT_OPS prog(s)
2. Create a BPF_MAP_TYPE_STRUCT_OPS with attr->btf_vmlinux_value_type_id
set to the btf id "struct bpf_struct_ops_tcp_congestion_ops" of the
running kernel.
Instead of reusing the attr->btf_value_type_id,
btf_vmlinux_value_type_id s added such that attr->btf_fd can still be
used as the "user" btf which could store other useful sysadmin/debug
info that may be introduced in the furture,
e.g. creation-date/compiler-details/map-creator...etc.
3. Create a "struct bpf_struct_ops_tcp_congestion_ops" object as described
in the running kernel btf. Populate the value of this object.
The function ptr should be populated with the prog fds.
4. Call BPF_MAP_UPDATE with the object created in (3) as
the map value. The key is always "0".
During BPF_MAP_UPDATE, the code that saves the kernel-func-ptr's
args as an array of u64 is generated. BPF_MAP_UPDATE also allows
the specific struct_ops to do some final checks in "st_ops->init_member()"
(e.g. ensure all mandatory func ptrs are implemented).
If everything looks good, it will register this kernel struct
to the kernel subsystem. The map will not allow further update
from this point.
Unregister a struct_ops from the kernel subsystem:
BPF_MAP_DELETE with key "0".
Introspect a struct_ops:
BPF_MAP_LOOKUP_ELEM with key "0". The map value returned will
have the prog _id_ populated as the func ptr.
The map value state (enum bpf_struct_ops_state) will transit from:
INIT (map created) =>
INUSE (map updated, i.e. reg) =>
TOBEFREE (map value deleted, i.e. unreg)
The kernel subsystem needs to call bpf_struct_ops_get() and
bpf_struct_ops_put() to manage the "refcnt" in the
"struct bpf_struct_ops_XYZ". This patch uses a separate refcnt
for the purose of tracking the subsystem usage. Another approach
is to reuse the map->refcnt and then "show" (i.e. during map_lookup)
the subsystem's usage by doing map->refcnt - map->usercnt to filter out
the map-fd/pinned-map usage. However, that will also tie down the
future semantics of map->refcnt and map->usercnt.
The very first subsystem's refcnt (during reg()) holds one
count to map->refcnt. When the very last subsystem's refcnt
is gone, it will also release the map->refcnt. All bpf_prog will be
freed when the map->refcnt reaches 0 (i.e. during map_free()).
Here is how the bpftool map command will look like:
[root@arch-fb-vm1 bpf]# bpftool map show
6: struct_ops name dctcp flags 0x0
key 4B value 256B max_entries 1 memlock 4096B
btf_id 6
[root@arch-fb-vm1 bpf]# bpftool map dump id 6
[{
"value": {
"refcnt": {
"refs": {
"counter": 1
}
},
"state": 1,
"data": {
"list": {
"next": 0,
"prev": 0
},
"key": 0,
"flags": 2,
"init": 24,
"release": 0,
"ssthresh": 25,
"cong_avoid": 30,
"set_state": 27,
"cwnd_event": 28,
"in_ack_event": 26,
"undo_cwnd": 29,
"pkts_acked": 0,
"min_tso_segs": 0,
"sndbuf_expand": 0,
"cong_control": 0,
"get_info": 0,
"name": [98,112,102,95,100,99,116,99,112,0,0,0,0,0,0,0
],
"owner": 0
}
}
}
]
Misc Notes:
* bpf_struct_ops_map_sys_lookup_elem() is added for syscall lookup.
It does an inplace update on "*value" instead returning a pointer
to syscall.c. Otherwise, it needs a separate copy of "zero" value
for the BPF_STRUCT_OPS_STATE_INIT to avoid races.
* The bpf_struct_ops_map_delete_elem() is also called without
preempt_disable() from map_delete_elem(). It is because
the "->unreg()" may requires sleepable context, e.g.
the "tcp_unregister_congestion_control()".
* "const" is added to some of the existing "struct btf_func_model *"
function arg to avoid a compiler warning caused by this patch.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200109003505.3855919-1-kafai@fb.com
This patch allows the kernel's struct ops (i.e. func ptr) to be
implemented in BPF. The first use case in this series is the
"struct tcp_congestion_ops" which will be introduced in a
latter patch.
This patch introduces a new prog type BPF_PROG_TYPE_STRUCT_OPS.
The BPF_PROG_TYPE_STRUCT_OPS prog is verified against a particular
func ptr of a kernel struct. The attr->attach_btf_id is the btf id
of a kernel struct. The attr->expected_attach_type is the member
"index" of that kernel struct. The first member of a struct starts
with member index 0. That will avoid ambiguity when a kernel struct
has multiple func ptrs with the same func signature.
For example, a BPF_PROG_TYPE_STRUCT_OPS prog is written
to implement the "init" func ptr of the "struct tcp_congestion_ops".
The attr->attach_btf_id is the btf id of the "struct tcp_congestion_ops"
of the _running_ kernel. The attr->expected_attach_type is 3.
The ctx of BPF_PROG_TYPE_STRUCT_OPS is an array of u64 args saved
by arch_prepare_bpf_trampoline that will be done in the next
patch when introducing BPF_MAP_TYPE_STRUCT_OPS.
"struct bpf_struct_ops" is introduced as a common interface for the kernel
struct that supports BPF_PROG_TYPE_STRUCT_OPS prog. The supporting kernel
struct will need to implement an instance of the "struct bpf_struct_ops".
The supporting kernel struct also needs to implement a bpf_verifier_ops.
During BPF_PROG_LOAD, bpf_struct_ops_find() will find the right
bpf_verifier_ops by searching the attr->attach_btf_id.
A new "btf_struct_access" is also added to the bpf_verifier_ops such
that the supporting kernel struct can optionally provide its own specific
check on accessing the func arg (e.g. provide limited write access).
After btf_vmlinux is parsed, the new bpf_struct_ops_init() is called
to initialize some values (e.g. the btf id of the supporting kernel
struct) and it can only be done once the btf_vmlinux is available.
The R0 checks at BPF_EXIT is excluded for the BPF_PROG_TYPE_STRUCT_OPS prog
if the return type of the prog->aux->attach_func_proto is "void".
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200109003503.3855825-1-kafai@fb.com
Add support to PCIe RC controller on Intel Gateway SoCs.
PCIe controller is based of Synopsys DesignWare PCIe core.
Intel PCIe driver requires Upconfigure support, Fast Training
Sequence and link speed configurations. So adding the respective
helper functions in the PCIe DesignWare framework.
It also programs hardware autonomous speed during speed
configuration so defining it in pci_regs.h.
Also, mark Intel PCIe driver depends on MSI IRQ Domain
as Synopsys DesignWare framework depends on the
PCI_MSI_IRQ_DOMAIN.
Signed-off-by: Dilip Kota <eswara.kota@linux.intel.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Andrew Murray <andrew.murray@arm.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com>
Acked-by: Gustavo Pimentel <gustavo.pimentel@synopsys.com>
-----BEGIN PGP SIGNATURE-----
iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl4SYegeHHRvcnZhbGRz
QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiG4m4H+QGCUN8SXN+2B+0/
BfzOf7PFoKzAx3NwDbJQIZqhSl+Zfa4n3VGPEF8sXsvoQgdYvuJnS/5JiAZ9iRIH
HAfFzegzQ3mCl8Du+SqCvQKs2Jt4OMCX62KGRebRBhpoKfZdwmN7n7pn9lWO771K
9rxTpeItXhmK46jOFRbi5oyQfmkfSfyUN1b9CB53FXFS+ZDkDNA7QQiIYnKOD7SZ
RrL7czhZ580QOC61qOlnz1GIhRzvU5SXg4OtuI3YfoOJRY5FKC3YtOgLReT0vPs+
vEhAyP93upVXIhqm10WHNjd4t4a45Vy5ff64uFsQ9QV4nnqsC2C70YwWbVDdtz/W
Lm0mvE8=
=NECs
-----END PGP SIGNATURE-----
Merge tag 'v5.5-rc5' into patchwork
Linux 5.5-rc5
* tag 'v5.5-rc5': (1006 commits)
Linux 5.5-rc5
Documentation: riscv: add patch acceptance guidelines
riscv: prefix IRQ_ macro names with an RV_ namespace
clocksource: riscv: add notrace to riscv_sched_clock
apparmor: fix aa_xattrs_match() may sleep while holding a RCU lock
hexagon: define ioremap_uc
ocfs2: fix the crash due to call ocfs2_get_dlm_debug once less
ocfs2: call journal flush to mark journal as empty after journal recovery when mount
mm/hugetlb: defer freeing of huge pages if in non-task context
mm/gup: fix memory leak in __gup_benchmark_ioctl
mm/oom: fix pgtables units mismatch in Killed process message
fs/posix_acl.c: fix kernel-doc warnings
hexagon: work around compiler crash
hexagon: parenthesize registers in asm predicates
fs/namespace.c: make to_mnt_ns() static
fs/nsfs.c: include headers for missing declarations
fs/direct-io.c: include fs/internal.h for missing prototype
mm: move_pages: return valid node id in status if the page is already on the target node
memcg: account security cred as well to kmemcg
kcov: fix struct layout for kcov_remote_arg
...
The separate blocking pool is going away. Start by ignoring
GRND_RANDOM in getentropy(2).
This should not materially break any API. Any code that worked
without this change should work at least as well with this change.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/705c5a091b63cc5da70c99304bb97e0109be0a26.1577088521.git.luto@kernel.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Typically a MAC PCS auto-configures itself after it receives the
negotiated copper-side link settings from the PHY, but some MAC devices
are more special and need manual interpretation of the SGMII AN result.
In other cases, the PCS exposes the entire tx_config_reg base page as it
is transmitted on the wire during auto-negotiation, so it makes sense to
be able to decode the equivalent lp_advertised bit mask from the raw u16
(of course, "lp" considering the PCS to be the local PHY).
Therefore, add the bit definitions for the SGMII registers 4 and 5
(local device ability, link partner ability), as well as a link_mode
conversion helper that can be used to feed the AN results into
phy_resolve_aneg_linkmode.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make the layout of kcov_remote_arg the same for 32-bit and 64-bit code.
This makes it more convenient to write userspace apps that can be
compiled into 32-bit or 64-bit binaries and still work with the same
64-bit kernel.
Also use proper __u32 types in uapi headers instead of unsigned ints.
Link: http://lkml.kernel.org/r/9e91020876029cfefc9211ff747685eba9536426.1575638983.git.andreyknvl@google.com
Fixes: eec028c938 ("kcov: remote coverage support")
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Acked-by: Marco Elver <elver@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Felipe Balbi <balbi@kernel.org>
Cc: Chunfeng Yun <chunfeng.yun@mediatek.com>
Cc: "Jacky . Cao @ sony . com" <Jacky.Cao@sony.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Marco Elver <elver@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Adds AMD-TEE driver.
* targets AMD APUs which has AMD Secure Processor with software-based
Trusted Execution Environment (TEE) support
* registers with TEE subsystem
* defines tee_driver_ops function callbacks
* kernel allocated memory is used as shared memory between normal
world and secure world.
* acts as REE (Rich Execution Environment) communication agent, which
uses the services of AMD Secure Processor driver to submit commands
for processing in TEE environment
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Acked-by: Jens Wiklander <jens.wiklander@linaro.org>
Co-developed-by: Devaraj Rangasamy <Devaraj.Rangasamy@amd.com>
Signed-off-by: Devaraj Rangasamy <Devaraj.Rangasamy@amd.com>
Signed-off-by: Rijo Thomas <Rijo-john.Thomas@amd.com>
Reviewed-by: Gary R Hook <gary.hook@amd.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The v4l2_buffer structure contains a 'struct timeval' member that is
defined by the user space C library, creating an ABI incompatibility
when that gets updated to a 64-bit time_t.
As in v4l2_event, handle this with a special case in video_put_user()
and video_get_user() to replace the memcpy there.
Since the structure also contains a pointer, there are now two
native versions (on 32-bit systems) as well as two compat versions
(on 64-bit systems), which unfortunately complicates the compat
handler quite a bit.
Duplicating the existing handlers for the new types is a safe
conversion for now, but unfortunately this may turn into a
maintenance burden later. A larger-scale rework of the
compat code might be a better alternative, but is out of scope
of the y2038 work.
Sparc64 needs a special case because of their special suseconds_t
definition.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
The v4l2_event structure contains a 'struct timespec' member that is
defined by the user space C library, creating an ABI incompatibility
when that gets updated to a 64-bit time_t.
While passing a 32-bit time_t here would be sufficient for CLOCK_MONOTONIC
timestamps, simply redefining the structure to use the kernel's
__kernel_old_timespec would not work for any library that uses a copy
of the linux/videodev2.h header file rather than including the copy from
the latest kernel headers.
This means the kernel has to be changed to handle both versions of the
structure layout on a 32-bit architecture. The easiest way to do this
is during the copy from/to user space.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
As a preparation for adding 64-bit time_t support in the uapi,
change the drivers to no longer care about the format of the
timestamp field in struct v4l2_buffer.
The v4l2_timeval_to_ns() function is no longer needed in the
kernel after this, but there is userspace code relying on
it to be part of the uapi header.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
[hverkuil-cisco@xs4all.nl: replace spaces by tabs]
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
UAPI Changes:
- Commandline parser: Add support for panel orientation, and per-mode options.
- Fix IOCTL naming for dma-buf heaps.
Cross-subsystem Changes:
- Rename DMA_HEAP_IOC_ALLOC to DMA_HEAP_IOCTL_ALLOC before it becomes abi.
- Change DMA-BUF system-heap's name to system.
- Fix leak in error handling in dma_heap_ioctl(), and make a symbol static.
- Fix udma-buf cpu access.
- Fix ti devicetree bindings.
Core Changes:
- Add CTA-861-G modes with VIC >= 193.
- Change error handling and remove bug_on in *drm_dev_init.
- Export drm_panel_of_backlight() correctly once more.
- Add support for lvds decoders.
- Convert drm/client and drm/(gem-,)fb-helper to drm-device based logging and update logging todo.
Driver Changes:
- Add support for dsi/px30 to rockchip.
- Add fb damage support to virtio.
- Use dma_resv locking wrappers in vc4, msm, etnaviv.
- Make functions in virtio static, and perform some simplifications.
- Add suspend support to sun4i.
- Add A64 mipi dsi support to sun4i.
- Add runtime pm suspend to komeda.
- Associated driver fixes.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEuXvWqAysSYEJGuVH/lWMcqZwE8MFAl4N6t8ACgkQ/lWMcqZw
E8P0ww/9EEa1W1nkaYmxfWCtmBV3D6QS4490jj62RMBXETezZmLPV11xpFqTPzcw
9vRwD7PwP+rIDPTnEcg8vIMnhDgZuUMGv93PZrFZMHxe4MHeykQ6BOj4pWEnrkr4
CQxC0exyIG8sQkH5+OngXkPnANPpzsegAAQ2rGbUf0HxxdZ1WeV3aqlQFo2YDpd9
c8ouYhgnIP4NfLPYnVN3NQs/hQIVJRJ9vOHr+o8k7Fn9YoFak7ry6UFsSAan4j7I
ZQDQzPnT5CQBBSRTh9vQinOexj5bkW3AFyNFA7mknv05LHYb1kMPIIqnY01pbi2w
SyWc5oqJwwdCFPCLZIUHZMOBKYqGKWP0KTjy7+QKx2ty+Sjgf3hTZwnVdtNVLFJe
7WsXP6Dg+PoSsSEGZuwGOzbr7GCJitSXhUs5GGiMbdbTPzr3rJsDLuyf9/Q1ObUC
F+yIKkcwYZogeXRShFFQ3wjAxEQ83yyuTchyagvqSoqFsT5ccUjuUqInGAbYifPS
QfhI1U9hQGmINqXPSkQYHXxMKg+Vl2KWvFknhmLIc0Cf3fRsu+wf3NAokrHsraxd
RINvo2U5XDhPctRYXaPjPiYtPlnikR69mhyGcd7VG81F72ECzZr/2q1NmsEMmUac
VqowhgoG8Tm4LcZHloMw4UlCtjV2esvztc2T6b95Mg6j1r4aav0=
=ye8f
-----END PGP SIGNATURE-----
Merge tag 'drm-misc-next-2020-01-02' of git://anongit.freedesktop.org/drm/drm-misc into drm-next
drm-misc-next for v5.6:
UAPI Changes:
- Commandline parser: Add support for panel orientation, and per-mode options.
- Fix IOCTL naming for dma-buf heaps.
Cross-subsystem Changes:
- Rename DMA_HEAP_IOC_ALLOC to DMA_HEAP_IOCTL_ALLOC before it becomes abi.
- Change DMA-BUF system-heap's name to system.
- Fix leak in error handling in dma_heap_ioctl(), and make a symbol static.
- Fix udma-buf cpu access.
- Fix ti devicetree bindings.
Core Changes:
- Add CTA-861-G modes with VIC >= 193.
- Change error handling and remove bug_on in *drm_dev_init.
- Export drm_panel_of_backlight() correctly once more.
- Add support for lvds decoders.
- Convert drm/client and drm/(gem-,)fb-helper to drm-device based logging and update logging todo.
Driver Changes:
- Add support for dsi/px30 to rockchip.
- Add fb damage support to virtio.
- Use dma_resv locking wrappers in vc4, msm, etnaviv.
- Make functions in virtio static, and perform some simplifications.
- Add suspend support to sun4i.
- Add A64 mipi dsi support to sun4i.
- Add runtime pm suspend to komeda.
- Associated driver fixes.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/efc11139-1653-86bc-1b0f-0aefde219850@linux.intel.com
Add support for userspace to specify a device index to limit the scope
of an entry via the TCP_MD5SIG_EXT setsockopt. The existing __tcpm_pad
is renamed to tcpm_ifindex and the new field is only checked if the new
TCP_MD5SIG_FLAG_IFINDEX is set in tcpm_flags. For now, the device index
must point to an L3 master device (e.g., VRF). The API and error
handling are setup to allow the constraint to be relaxed in the future
to any device index.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
<linux/fscrypt.h> defines ioctl numbers using the macros like _IOWR()
which are defined in <linux/ioctl.h>, so <linux/ioctl.h> should be
included as a prerequisite, like it is in many other kernel headers.
In practice this doesn't really matter since anyone referencing these
ioctl numbers will almost certainly include <sys/ioctl.h> too in order
to actually call ioctl(). But we might as well fix this.
Link: https://lore.kernel.org/r/20191219185624.21251-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Extend the FS_IOC_ADD_ENCRYPTION_KEY ioctl to allow the raw key to be
specified by a Linux keyring key, rather than specified directly.
This is useful because fscrypt keys belong to a particular filesystem
instance, so they are destroyed when that filesystem is unmounted.
Usually this is desired. But in some cases, userspace may need to
unmount and re-mount the filesystem while keeping the keys, e.g. during
a system update. This requires keeping the keys somewhere else too.
The keys could be kept in memory in a userspace daemon. But depending
on the security architecture and assumptions, it can be preferable to
keep them only in kernel memory, where they are unreadable by userspace.
We also can't solve this by going back to the original fscrypt API
(where for each file, the master key was looked up in the process's
keyring hierarchy) because that caused lots of problems of its own.
Therefore, add the ability for FS_IOC_ADD_ENCRYPTION_KEY to accept a
Linux keyring key. This solves the problem by allowing userspace to (if
needed) save the keys securely in a Linux keyring for re-provisioning,
while still using the new fscrypt key management ioctls.
This is analogous to how dm-crypt accepts a Linux keyring key, but the
key is then stored internally in the dm-crypt data structures rather
than being looked up again each time the dm-crypt device is accessed.
Use a custom key type "fscrypt-provisioning" rather than one of the
existing key types such as "logon". This is strongly desired because it
enforces that these keys are only usable for a particular purpose: for
fscrypt as input to a particular KDF. Otherwise, the keys could also be
passed to any kernel API that accepts a "logon" key with any service
prefix, e.g. dm-crypt, UBIFS, or (recently proposed) AF_ALG. This would
risk leaking information about the raw key despite it ostensibly being
unreadable. Of course, this mistake has already been made for multiple
kernel APIs; but since this is a new API, let's do it right.
This patch has been tested using an xfstest which I wrote to test it.
Link: https://lore.kernel.org/r/20191119222447.226853-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
The following patchset contains Netfilter updates for net-next:
1) Remove #ifdef pollution around nf_ingress(), from Lukas Wunner.
2) Document ingress hook in netdevice, also from Lukas.
3) Remove htons() in tunnel metadata port netlink attributes,
from Xin Long.
4) Missing erspan netlink attribute validation also from Xin Long.
5) Missing erspan version in tunnel, from Xin Long.
6) Missing attribute nest in NFTA_TUNNEL_KEY_OPTS_{VXLAN,ERSPAN}
Patch from Xin Long.
7) Missing nla_nest_cancel() in tunnel netlink dump path,
from Xin Long.
8) Remove two exported conntrack symbols with no clients,
from Florian Westphal.
9) Add nft_meta_get_eval_time() helper to nft_meta, from Florian.
10) Add nft_meta_pkttype helper for loopback, also from Florian.
11) Add nft_meta_socket uid helper, from Florian Westphal.
12) Add nft_meta_cgroup helper, from Florian.
13) Add nft_meta_ifkind helper, from Florian.
14) Group all interface related meta selector, from Florian.
15) Add nft_prandom_u32() helper, from Florian.
16) Add nft_meta_rtclassid helper, from Florian.
17) Add support for matching on the slave device index,
from Florian.
This batch, among other things, contains updates for the netfilter
tunnel netlink interface: This extension is still incomplete and lacking
proper userspace support which is actually my fault, I did not find the
time to go back and finish this. This update is breaking tunnel UAPI in
some aspects to fix it but do it better sooner than never.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement LINKSTATE_GET netlink request to get link state information.
At the moment, only link up flag as provided by ETHTOOL_GLINK ioctl command
is returned.
LINKSTATE_GET request can be used with NLM_F_DUMP (without device
identification) to request the information for all devices in current
network namespace providing the data.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Send ETHTOOL_MSG_LINKMODES_NTF notification message whenever device link
settings or advertised modes are modified using ETHTOOL_MSG_LINKMODES_SET
netlink message or ETHTOOL_SLINKSETTINGS or ETHTOOL_SSET ioctl commands.
The notification message has the same format as reply to LINKMODES_GET
request. ETHTOOL_MSG_LINKMODES_SET netlink request only triggers the
notification if there is a change but the ioctl command handlers do not
check if there is an actual change and trigger the notification whenever
the commands are executed.
As all work is done by ethnl_default_notify() handler and callback
functions introduced to handle LINKMODES_GET requests, all that remains is
adding entries for ETHTOOL_MSG_LINKMODES_NTF into ethnl_notify_handlers and
ethnl_default_notify_ops lookup tables and calls to ethtool_notify() where
needed.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement LINKMODES_SET netlink request to set advertised linkmodes and
related attributes as ETHTOOL_SLINKSETTINGS and ETHTOOL_SSET commands do.
The request allows setting autonegotiation flag, speed, duplex and
advertised link modes.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement LINKMODES_GET netlink request to get link modes related
information provided by ETHTOOL_GLINKSETTINGS and ETHTOOL_GSET ioctl
commands.
This request provides supported, advertised and peer advertised link modes,
autonegotiation flag, speed and duplex.
LINKMODES_GET request can be used with NLM_F_DUMP (without device
identification) to request the information for all devices in current
network namespace providing the data.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Send ETHTOOL_MSG_LINKINFO_NTF notification message whenever device link
settings are modified using ETHTOOL_MSG_LINKINFO_SET netlink message or
ETHTOOL_SLINKSETTINGS or ETHTOOL_SSET ioctl commands.
The notification message has the same format as reply to LINKINFO_GET
request. ETHTOOL_MSG_LINKINFO_SET netlink request only triggers the
notification if there is a change but the ioctl command handlers do not
check if there is an actual change and trigger the notification whenever
the commands are executed.
As all work is done by ethnl_default_notify() handler and callback
functions introduced to handle LINKINFO_GET requests, all that remains is
adding entries for ETHTOOL_MSG_LINKINFO_NTF into ethnl_notify_handlers and
ethnl_default_notify_ops lookup tables and calls to ethtool_notify() where
needed.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement LINKINFO_SET netlink request to set link settings queried by
LINKINFO_GET message.
Only physical port, phy MDIO address and MDI(-X) control can be set,
attempt to modify MDI(-X) status and transceiver is rejected.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement LINKINFO_GET netlink request to get basic link settings provided
by ETHTOOL_GLINKSETTINGS and ETHTOOL_GSET ioctl commands.
This request provides settings not directly related to autonegotiation and
link mode selection: physical port, phy MDIO address, MDI(-X) status,
MDI(-X) control and transceiver.
LINKINFO_GET request can be used with NLM_F_DUMP (without device
identification) to request the information for all devices in current
network namespace providing the data.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Requests a contents of one or more string sets, i.e. indexed arrays of
strings; this information is provided by ETHTOOL_GSSET_INFO and
ETHTOOL_GSTRINGS commands of ioctl interface. Unlike ioctl interface, all
information can be retrieved with one request and mulitple string sets can
be requested at once.
There are three types of requests:
- no NLM_F_DUMP, no device: get "global" stringsets
- no NLM_F_DUMP, with device: get string sets related to the device
- NLM_F_DUMP, no device: get device related string sets for all devices
Client can request either all string sets of given type (global or device
related) or only specific sets. With ETHTOOL_A_STRSET_COUNTS flag set, only
set sizes (numbers of strings) are returned.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add infrastructure for ethtool netlink notifications. There is only one
multicast group "monitor" which is used to notify userspace about changes
and actions performed. Notification messages (types using suffix _NTF)
share the format with replies to GET requests.
Notifications are supposed to be broadcasted on every configuration change,
whether it is done using the netlink interface or ioctl one. Netlink SET
requests only trigger a notification if some data is actually changed.
To trigger an ethtool notification, both ethtool netlink and external code
use ethtool_notify() helper. This helper requires RTNL to be held and may
sleep. Handlers sending messages for specific notification message types
are registered in ethnl_notify_handlers array. As notifications can be
triggered from other code, ethnl_ok flag is used to prevent an attempt to
send notification before genetlink family is registered.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ethtool netlink code uses common framework for passing arbitrary
length bit sets to allow future extensions. A bitset can be a list (only
one bitmap) or can consist of value and mask pair (used e.g. when client
want to modify only some bits). A bitset can use one of two formats:
verbose (bit by bit) or compact.
Verbose format consists of bitset size (number of bits), list flag and
an array of bit nests, telling which bits are part of the list or which
bits are in the mask and which of them are to be set. In requests, bits
can be identified by index (position) or by name. In replies, kernel
provides both index and name. Verbose format is suitable for "one shot"
applications like standard ethtool command as it avoids the need to
either keep bit names (e.g. link modes) in sync with kernel or having to
add an extra roundtrip for string set request (e.g. for private flags).
Compact format uses one (list) or two (value/mask) arrays of 32-bit
words to store the bitmap(s). It is more suitable for long running
applications (ethtool in monitor mode or network management daemons)
which can retrieve the names once and then pass only compact bitmaps to
save space.
Userspace requests can use either format; ETHTOOL_FLAG_COMPACT_BITSETS
flag in request header tells kernel which format to use in reply.
Notifications always use compact format.
As some code uses arrays of unsigned long for internal representation and
some arrays of u32 (or even a single u32), two sets of parse/compose
helpers are introduced. To avoid code duplication, helpers for unsigned
long arrays are implemented as wrappers around helpers for u32 arrays.
There are two reasons for this choice: (1) u32 arrays are more frequent in
ethtool code and (2) unsigned long array can be always interpreted as an
u32 array on little endian 64-bit and all 32-bit architectures while we
would need special handling for odd number of u32 words in the opposite
direction.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add common request/reply header definition and helpers to parse request
header and fill reply header. Provide ethnl_update_* helpers to update
structure members from request attributes (to be used for *_SET requests).
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Basic genetlink and init infrastructure for the netlink interface, register
genetlink family "ethtool". Add CONFIG_ETHTOOL_NETLINK Kconfig option to
make the build optional. Add initial overall interface description into
Documentation/networking/ethtool-netlink.rst, further patches will add more
detailed information.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann says:
====================
pull-request: bpf-next 2019-12-27
The following pull-request contains BPF updates for your *net-next* tree.
We've added 127 non-merge commits during the last 17 day(s) which contain
a total of 110 files changed, 6901 insertions(+), 2721 deletions(-).
There are three merge conflicts. Conflicts and resolution looks as follows:
1) Merge conflict in net/bpf/test_run.c:
There was a tree-wide cleanup c593642c8b ("treewide: Use sizeof_field() macro")
which gets in the way with b590cb5f80 ("bpf: Switch to offsetofend in
BPF_PROG_TEST_RUN"):
<<<<<<< HEAD
if (!range_is_zero(__skb, offsetof(struct __sk_buff, priority) +
sizeof_field(struct __sk_buff, priority),
=======
if (!range_is_zero(__skb, offsetofend(struct __sk_buff, priority),
>>>>>>> 7c8dce4b16
There are a few occasions that look similar to this. Always take the chunk with
offsetofend(). Note that there is one where the fields differ in here:
<<<<<<< HEAD
if (!range_is_zero(__skb, offsetof(struct __sk_buff, tstamp) +
sizeof_field(struct __sk_buff, tstamp),
=======
if (!range_is_zero(__skb, offsetofend(struct __sk_buff, gso_segs),
>>>>>>> 7c8dce4b16
Just take the one with offsetofend() /and/ gso_segs. Latter is correct due to
850a88cc40 ("bpf: Expose __sk_buff wire_len/gso_segs to BPF_PROG_TEST_RUN").
2) Merge conflict in arch/riscv/net/bpf_jit_comp.c:
(I'm keeping Bjorn in Cc here for a double-check in case I got it wrong.)
<<<<<<< HEAD
if (is_13b_check(off, insn))
return -1;
emit(rv_blt(tcc, RV_REG_ZERO, off >> 1), ctx);
=======
emit_branch(BPF_JSLT, RV_REG_T1, RV_REG_ZERO, off, ctx);
>>>>>>> 7c8dce4b16
Result should look like:
emit_branch(BPF_JSLT, tcc, RV_REG_ZERO, off, ctx);
3) Merge conflict in arch/riscv/include/asm/pgtable.h:
<<<<<<< HEAD
=======
#define VMALLOC_SIZE (KERN_VIRT_SIZE >> 1)
#define VMALLOC_END (PAGE_OFFSET - 1)
#define VMALLOC_START (PAGE_OFFSET - VMALLOC_SIZE)
#define BPF_JIT_REGION_SIZE (SZ_128M)
#define BPF_JIT_REGION_START (PAGE_OFFSET - BPF_JIT_REGION_SIZE)
#define BPF_JIT_REGION_END (VMALLOC_END)
/*
* Roughly size the vmemmap space to be large enough to fit enough
* struct pages to map half the virtual address space. Then
* position vmemmap directly below the VMALLOC region.
*/
#define VMEMMAP_SHIFT \
(CONFIG_VA_BITS - PAGE_SHIFT - 1 + STRUCT_PAGE_MAX_SHIFT)
#define VMEMMAP_SIZE BIT(VMEMMAP_SHIFT)
#define VMEMMAP_END (VMALLOC_START - 1)
#define VMEMMAP_START (VMALLOC_START - VMEMMAP_SIZE)
#define vmemmap ((struct page *)VMEMMAP_START)
>>>>>>> 7c8dce4b16
Only take the BPF_* defines from there and move them higher up in the
same file. Remove the rest from the chunk. The VMALLOC_* etc defines
got moved via 01f52e16b8 ("riscv: define vmemmap before pfn_to_page
calls"). Result:
[...]
#define __S101 PAGE_READ_EXEC
#define __S110 PAGE_SHARED_EXEC
#define __S111 PAGE_SHARED_EXEC
#define VMALLOC_SIZE (KERN_VIRT_SIZE >> 1)
#define VMALLOC_END (PAGE_OFFSET - 1)
#define VMALLOC_START (PAGE_OFFSET - VMALLOC_SIZE)
#define BPF_JIT_REGION_SIZE (SZ_128M)
#define BPF_JIT_REGION_START (PAGE_OFFSET - BPF_JIT_REGION_SIZE)
#define BPF_JIT_REGION_END (VMALLOC_END)
/*
* Roughly size the vmemmap space to be large enough to fit enough
* struct pages to map half the virtual address space. Then
* position vmemmap directly below the VMALLOC region.
*/
#define VMEMMAP_SHIFT \
(CONFIG_VA_BITS - PAGE_SHIFT - 1 + STRUCT_PAGE_MAX_SHIFT)
#define VMEMMAP_SIZE BIT(VMEMMAP_SHIFT)
#define VMEMMAP_END (VMALLOC_START - 1)
#define VMEMMAP_START (VMALLOC_START - VMEMMAP_SIZE)
[...]
Let me know if there are any other issues.
Anyway, the main changes are:
1) Extend bpftool to produce a struct (aka "skeleton") tailored and specific
to a provided BPF object file. This provides an alternative, simplified API
compared to standard libbpf interaction. Also, add libbpf extern variable
resolution for .kconfig section to import Kconfig data, from Andrii Nakryiko.
2) Add BPF dispatcher for XDP which is a mechanism to avoid indirect calls by
generating a branch funnel as discussed back in bpfconf'19 at LSF/MM. Also,
add various BPF riscv JIT improvements, from Björn Töpel.
3) Extend bpftool to allow matching BPF programs and maps by name,
from Paul Chaignon.
4) Support for replacing cgroup BPF programs attached with BPF_F_ALLOW_MULTI
flag for allowing updates without service interruption, from Andrey Ignatov.
5) Cleanup and simplification of ring access functions for AF_XDP with a
bonus of 0-5% performance improvement, from Magnus Karlsson.
6) Enable BPF JITs for x86-64 and arm64 by default. Also, final version of
audit support for BPF, from Daniel Borkmann and latter with Jiri Olsa.
7) Move and extend test_select_reuseport into BPF program tests under
BPF selftests, from Jakub Sitnicki.
8) Various BPF sample improvements for xdpsock for customizing parameters
to set up and benchmark AF_XDP, from Jay Jayatheerthan.
9) Improve libbpf to provide a ulimit hint on permission denied errors.
Also change XDP sample programs to attach in driver mode by default,
from Toke Høiland-Jørgensen.
10) Extend BPF test infrastructure to allow changing skb mark from tc BPF
programs, from Nikita V. Shirokov.
11) Optimize prologue code sequence in BPF arm32 JIT, from Russell King.
12) Fix xdp_redirect_cpu BPF sample to manually attach to tracepoints after
libbpf conversion, from Jesper Dangaard Brouer.
13) Minor misc improvements from various others.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
As the LACP actor/partner state is now part of the uapi, rename the
3ad state defines with LACP prefix. The LACP prefix is preferred over
BOND_3AD as the LACP standard moved to 802.1AX.
Fixes: 826f66b30c ("bonding: move 802.3ad port state flags to uapi")
Signed-off-by: Andy Roulin <aroulin@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow to match on vrf slave ifindex or name.
In case there was no slave interface involved, store 0 in the
destination register just like existing iif/oif matching.
sdif(name) is restricted to the ipv4/ipv6 input and forward hooks,
as it depends on ip(6) stack parsing/storing info in skb->cb[].
Cc: Martin Willi <martin@strongswan.org>
Cc: David Ahern <dsahern@kernel.org>
Cc: Shrijeet Mukherjee <shrijeet@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The 1588 standard defines one step operation for both Sync and
PDelay_Resp messages. Up until now, hardware with P2P one step has
been rare, and kernel support was lacking. This patch adds support of
the mode in anticipation of new hardware developments.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The existing PUSH MPLS action inserts MPLS header between ethernet header
and the IP header. Though this behaviour is fine for L3 VPN where an IP
packet is encapsulated inside a MPLS tunnel, it does not suffice the L2
VPN (l2 tunnelling) requirements. In L2 VPN the MPLS header should
encapsulate the ethernet packet.
The new mpls action ADD_MPLS inserts MPLS header at the start of the
packet or at the start of the l3 header depending on the value of l3 tunnel
flag in the ADD_MPLS arguments.
POP_MPLS action is extended to support ethertype 0x6558.
Signed-off-by: Martin Varghese <martin.varghese@nokia.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking fixes from David Miller:
1) Several nf_flow_table_offload fixes from Pablo Neira Ayuso,
including adding a missing ipv6 match description.
2) Several heap overflow fixes in mwifiex from qize wang and Ganapathi
Bhat.
3) Fix uninit value in bond_neigh_init(), from Eric Dumazet.
4) Fix non-ACPI probing of nxp-nci, from Stephan Gerhold.
5) Fix use after free in tipc_disc_rcv(), from Tuong Lien.
6) Enforce limit of 33 tail calls in mips and riscv JIT, from Paul
Chaignon.
7) Multicast MAC limit test is off by one in qede, from Manish Chopra.
8) Fix established socket lookup race when socket goes from
TCP_ESTABLISHED to TCP_LISTEN, because there lacks an intervening
RCU grace period. From Eric Dumazet.
9) Don't send empty SKBs from tcp_write_xmit(), also from Eric Dumazet.
10) Fix active backup transition after link failure in bonding, from
Mahesh Bandewar.
11) Avoid zero sized hash table in gtp driver, from Taehee Yoo.
12) Fix wrong interface passed to ->mac_link_up(), from Russell King.
13) Fix DSA egress flooding settings in b53, from Florian Fainelli.
14) Memory leak in gmac_setup_txqs(), from Navid Emamdoost.
15) Fix double free in dpaa2-ptp code, from Ioana Ciornei.
16) Reject invalid MTU values in stmmac, from Jose Abreu.
17) Fix refcount leak in error path of u32 classifier, from Davide
Caratti.
18) Fix regression causing iwlwifi firmware crashes on boot, from Anders
Kaseorg.
19) Fix inverted return value logic in llc2 code, from Chan Shu Tak.
20) Disable hardware GRO when XDP is attached to qede, frm Manish
Chopra.
21) Since we encode state in the low pointer bits, dst metrics must be
at least 4 byte aligned, which is not necessarily true on m68k. Add
annotations to fix this, from Geert Uytterhoeven.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (160 commits)
sfc: Include XDP packet headroom in buffer step size.
sfc: fix channel allocation with brute force
net: dst: Force 4-byte alignment of dst_metrics
selftests: pmtu: fix init mtu value in description
hv_netvsc: Fix unwanted rx_table reset
net: phy: ensure that phy IDs are correctly typed
mod_devicetable: fix PHY module format
qede: Disable hardware gro when xdp prog is installed
net: ena: fix issues in setting interrupt moderation params in ethtool
net: ena: fix default tx interrupt moderation interval
net/smc: unregister ib devices in reboot_event
net: stmmac: platform: Fix MDIO init for platforms without PHY
llc2: Fix return statement of llc_stat_ev_rx_null_dsap_xid_c (and _test_c)
net: hisilicon: Fix a BUG trigered by wrong bytes_compl
net: dsa: ksz: use common define for tag len
s390/qeth: don't return -ENOTSUPP to userspace
s390/qeth: fix promiscuous mode after reset
s390/qeth: handle error due to unsupported transport mode
cxgb4: fix refcount init for TC-MQPRIO offload
tc-testing: initial tdc selftests for cls_u32
...
To enable iproute2/tipc to generate backwards compatible
printouts and validate command parameters for nodes using a
<z.c.n> node address, it needs to be able to read the legacy
address flag from the kernel. The legacy address flag records
the way in which the node identity was originally specified.
The legacy address flag is requested by the netlink message
TIPC_NL_ADDR_LEGACY_GET. If the flag is set the attribute
TIPC_NLA_NET_ADDR_LEGACY is set in the return message.
Signed-off-by: John Rutherford <john.rutherford@dektech.com.au>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The common use-case in production is to have multiple cgroup-bpf
programs per attach type that cover multiple use-cases. Such programs
are attached with BPF_F_ALLOW_MULTI and can be maintained by different
people.
Order of programs usually matters, for example imagine two egress
programs: the first one drops packets and the second one counts packets.
If they're swapped the result of counting program will be different.
It brings operational challenges with updating cgroup-bpf program(s)
attached with BPF_F_ALLOW_MULTI since there is no way to replace a
program:
* One way to update is to detach all programs first and then attach the
new version(s) again in the right order. This introduces an
interruption in the work a program is doing and may not be acceptable
(e.g. if it's egress firewall);
* Another way is attach the new version of a program first and only then
detach the old version. This introduces the time interval when two
versions of same program are working, what may not be acceptable if a
program is not idempotent. It also imposes additional burden on
program developers to make sure that two versions of their program can
co-exist.
Solve the problem by introducing a "replace" mode in BPF_PROG_ATTACH
command for cgroup-bpf programs being attached with BPF_F_ALLOW_MULTI
flag. This mode is enabled by newly introduced BPF_F_REPLACE attach flag
and bpf_attr.replace_bpf_fd attribute to pass fd of the old program to
replace
That way user can replace any program among those attached with
BPF_F_ALLOW_MULTI flag without the problems described above.
Details of the new API:
* If BPF_F_REPLACE is set but replace_bpf_fd doesn't have valid
descriptor of BPF program, BPF_PROG_ATTACH will return corresponding
error (EINVAL or EBADF).
* If replace_bpf_fd has valid descriptor of BPF program but such a
program is not attached to specified cgroup, BPF_PROG_ATTACH will
return ENOENT.
BPF_F_REPLACE is introduced to make the user intent clear, since
replace_bpf_fd alone can't be used for this (its default value, 0, is a
valid fd). BPF_F_REPLACE also makes it possible to extend the API in the
future (e.g. add BPF_F_BEFORE and BPF_F_AFTER if needed).
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Andrii Narkyiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/30cd850044a0057bdfcaaf154b7d2f39850ba813.1576741281.git.rdna@fb.com
Introduces a new Qdisc, which is based on 802.1Q-2014 wording. It is
PRIO-like in how it is configured, meaning one needs to specify how many
bands there are, how many are strict and how many are dwrr, quanta for the
latter, and priomap.
The new Qdisc operates like the PRIO / DRR combo would when configured as
per the standard. The strict classes, if any, are tried for traffic first.
When there's no traffic in any of the strict queues, the ETS ones (if any)
are treated in the same way as in DRR.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
'struct timex' is one of the last users of 'struct timeval' and is
only referenced in one place in the kernel any more, to convert the
user space timex into the kernel-internal version on sparc64, with a
different tv_usec member type.
As a preparation for hiding the time_t definition and everything
using that in the kernel, change the implementation once more
to only convert the timeval member, and then enclose the
struct definition in an #ifdef.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Take the renaming of timeval and timespec one level further,
also renaming itimerval to __kernel_old_itimerval, to avoid
namespace conflicts with the user-space structure that may
use 64-bit time_t members.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
As there is only a 32-bit ac_btime field in taskstat and
we should handle dates after the overflow, add a new field
with the same information but 64-bit width that can hold
a full time64_t.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
In 'struct acct', 'struct acct_v3', and 'struct taskstats' we have
a 32-bit 'ac_btime' field containing an absolute time value, which
will overflow in year 2106.
There are two possible ways to deal with it:
a) let it overflow and have user space code deal with reconstructing
the data based on the current time, or
b) truncate the times based on the range of the u32 type.
Neither of them solves the actual problem. Pick the second
one to best document what the issue is, and have someone
fix it in a future version.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Currently, the meaning of the value returned by RTC_VL_READ is undocumented
and left to the driver implementation. In order to get more meaningful
values, define a set of values to use as to make clear to userspace what is
the status of the various voltages feeding the RTC.
Link: https://lore.kernel.org/r/20191214220259.621996-2-alexandre.belloni@bootlin.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
This is more consistent with the DMA and DRM frameworks convention. This
patch is only a name change, no logic is changed.
Signed-off-by: Andrew F. Davis <afd@ti.com>
Acked-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20191216133405.1001-2-afd@ti.com
UAPI Changes:
- Add support for DMA-BUF HEAPS.
Cross-subsystem Changes:
- mipi dsi definition updates, pulled into drm-intel as well.
- Add lockdep annotations for dma_resv vs mmap_sem and fs_reclaim.
- Remove support for dma-buf kmap/kunmap.
- Constify fb_ops in all fbdev drivers, including drm drivers and drm-core, and media as well.
Core Changes:
- Small cleanups to ttm.
- Fix SCDC definition.
- Assorted cleanups to core.
- Add todo to remove load/unload hooks, and use generic fbdev emulation.
- Assorted documentation updates.
- Use blocking ww lock in ttm fault handler.
- Remove drm_fb_helper_fbdev_setup/teardown.
- Warning fixes with W=1 for atomic.
- Use drm_debug_enabled() instead of drm_debug flag testing in various drivers.
- Fallback to nontiled mode in fbdev emulation when not all tiles are present. (Later on reverted)
- Various kconfig indentation fixes in core and drivers.
- Fix freeing transactions in dp-mst correctly.
- Sean Paul is steping down as core maintainer. :-(
- Add lockdep annotations for atomic locks vs dma-resv.
- Prevent use-after-free for a bad job in drm_scheduler.
- Fill out all block sizes in the P01x and P210 definitions.
- Avoid division by zero in drm/rect, and fix bounds.
- Add drm/rect selftests.
- Add aspect ratio and alternate clocks for HDMI 4k modes.
- Add todo for drm_framebuffer_funcs and fb_create cleanup.
- Drop DRM_AUTH for prime import/export ioctls.
- Clear DP-MST payload id tables downstream when initializating.
- Fix for DSC throughput definition.
- Add extra FEC definitions.
- Fix fake offset in drm_gem_object_funs.mmap.
- Stop using encoder->bridge in core directly
- Handle bridge chaining slightly better.
- Add backlight support to drm/panel, and use it in many panel drivers.
- Increase max number of y420 modes from 128 to 256, as preparation to add the new modes.
Driver Changes:
- Small fixes all over.
- Fix documentation in vkms.
- Fix mmap_sem vs dma_resv in nouveau.
- Small cleanup in komeda.
- Add page flip support in gma500 for psb/cdv.
- Add ddc symlink in the connector sysfs directory for many drivers.
- Add support for analogic an6345, and fix small bugs in it.
- Add atomic modesetting support to ast.
- Fix radeon fault handler VMA race.
- Switch udl to use generic shmem helpers.
- Unconditional vblank handling for mcde.
- Miscellaneous fixes to mcde.
- Tweak debug output from komeda using debugfs.
- Add gamma and color transform support to komeda for DOU-IPS.
- Add support for sony acx424AKP panel.
- Various small cleanups to gma500.
- Use generic fbdev emulation in udl, and replace udl_framebuffer with generic implementation.
- Add support for Logic PD Type 28 panel.
- Use drm_panel_* wrapper functions in exynos/tegra/msm.
- Add devicetree bindings for generic DSI panels.
- Don't include drm_pci.h directly in many drivers.
- Add support for begin/end_cpu_access in udmabuf.
- Stop using drm_get_pci_dev in gma500 and mga200.
- Fixes to UDL damage handling, and use dma_buf_begin/end_cpu_access.
- Add devfreq thermal support to panfrost.
- Fix hotplug with daisy chained monitors by removing VCPI when disabling topology manager.
- meson: Add support for OSD1 plane AFBC commit.
- Stop displaying garbage when toggling ast primary plane on/off.
- More cleanups and fixes to UDL.
- Add D32 suport to komeda.
- Remove globle copy of drm_dev in gma500.
- Add support for Boe Himax8279d MIPI-DSI LCD panel.
- Add support for ingenic JZ4770 panel.
- Small null pointer deference fix in ingenic.
- Remove support for the special tfp420 driver, as there is a generic way to do it.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEuXvWqAysSYEJGuVH/lWMcqZwE8MFAl34lkkACgkQ/lWMcqZw
E8M76g//WRYl9fWnV063s44FBVJYjGxaus0vQJSGidaPCIE6Ep6TNjXp8DVzV82M
HR79P9glL02DC9B8pflioNNXdIRGSVk/FJcKVB2seFAqEFCAknvWDM/X/y+mOUpp
fUeFl+Znlwx3YlM8f4Qujdbm+CbTewfbya4VAWeWd8XG2V8jfq5cmODPPlUMNenZ
J6Ja+W3ph741uSIfAKaP69LVJgOcuUjXINE4SWhRk/i5QF3GIRej/A7ZjWGLQ/t2
2zUUF7EiCzhPomM40H3ddKtXb4ZjNJuc5pOD4GpxR8ciNbe2gUOHEZ5aenwYBdsU
5MwbxNKyMbKXATtn3yv3fSc4jH3DtmEKpmovONeO8ZDBrQBnxeYa3tQvfkNghA2f
acoZMzYUImV+ft6DMIgpXppASvo7mQYDAbLPOGEJ9E44AL4UP00jesEjnK5FOHSR
3BEzGUnK/6QL5zFNPni8YZQ8dan4jDIno1mqIV+cQ4WCGlaKckzIWO6243Bf13b/
kROSJpgWkiK6Ngq0ofhD0MHyT/m1QnqUzWRKTJhRtPflSWRBsDZqWCQ5Vx1QlNIE
/HfTNbTpXWwa+5wXbbB8TkDw5t9cQGnR+QcrEd9HgoIec7B5Re8rx9i0TJAT4N05
03RCQCecSfD8gwKd2wgaFIpFGRl9lTdLYSpffSmyL2X5a20lZhM=
=b15X
-----END PGP SIGNATURE-----
Merge tag 'drm-misc-next-2019-12-16' of git://anongit.freedesktop.org/drm/drm-misc into drm-next
drm-misc-next for v5.6:
UAPI Changes:
- Add support for DMA-BUF HEAPS.
Cross-subsystem Changes:
- mipi dsi definition updates, pulled into drm-intel as well.
- Add lockdep annotations for dma_resv vs mmap_sem and fs_reclaim.
- Remove support for dma-buf kmap/kunmap.
- Constify fb_ops in all fbdev drivers, including drm drivers and drm-core, and media as well.
Core Changes:
- Small cleanups to ttm.
- Fix SCDC definition.
- Assorted cleanups to core.
- Add todo to remove load/unload hooks, and use generic fbdev emulation.
- Assorted documentation updates.
- Use blocking ww lock in ttm fault handler.
- Remove drm_fb_helper_fbdev_setup/teardown.
- Warning fixes with W=1 for atomic.
- Use drm_debug_enabled() instead of drm_debug flag testing in various drivers.
- Fallback to nontiled mode in fbdev emulation when not all tiles are present. (Later on reverted)
- Various kconfig indentation fixes in core and drivers.
- Fix freeing transactions in dp-mst correctly.
- Sean Paul is steping down as core maintainer. :-(
- Add lockdep annotations for atomic locks vs dma-resv.
- Prevent use-after-free for a bad job in drm_scheduler.
- Fill out all block sizes in the P01x and P210 definitions.
- Avoid division by zero in drm/rect, and fix bounds.
- Add drm/rect selftests.
- Add aspect ratio and alternate clocks for HDMI 4k modes.
- Add todo for drm_framebuffer_funcs and fb_create cleanup.
- Drop DRM_AUTH for prime import/export ioctls.
- Clear DP-MST payload id tables downstream when initializating.
- Fix for DSC throughput definition.
- Add extra FEC definitions.
- Fix fake offset in drm_gem_object_funs.mmap.
- Stop using encoder->bridge in core directly
- Handle bridge chaining slightly better.
- Add backlight support to drm/panel, and use it in many panel drivers.
- Increase max number of y420 modes from 128 to 256, as preparation to add the new modes.
Driver Changes:
- Small fixes all over.
- Fix documentation in vkms.
- Fix mmap_sem vs dma_resv in nouveau.
- Small cleanup in komeda.
- Add page flip support in gma500 for psb/cdv.
- Add ddc symlink in the connector sysfs directory for many drivers.
- Add support for analogic an6345, and fix small bugs in it.
- Add atomic modesetting support to ast.
- Fix radeon fault handler VMA race.
- Switch udl to use generic shmem helpers.
- Unconditional vblank handling for mcde.
- Miscellaneous fixes to mcde.
- Tweak debug output from komeda using debugfs.
- Add gamma and color transform support to komeda for DOU-IPS.
- Add support for sony acx424AKP panel.
- Various small cleanups to gma500.
- Use generic fbdev emulation in udl, and replace udl_framebuffer with generic implementation.
- Add support for Logic PD Type 28 panel.
- Use drm_panel_* wrapper functions in exynos/tegra/msm.
- Add devicetree bindings for generic DSI panels.
- Don't include drm_pci.h directly in many drivers.
- Add support for begin/end_cpu_access in udmabuf.
- Stop using drm_get_pci_dev in gma500 and mga200.
- Fixes to UDL damage handling, and use dma_buf_begin/end_cpu_access.
- Add devfreq thermal support to panfrost.
- Fix hotplug with daisy chained monitors by removing VCPI when disabling topology manager.
- meson: Add support for OSD1 plane AFBC commit.
- Stop displaying garbage when toggling ast primary plane on/off.
- More cleanups and fixes to UDL.
- Add D32 suport to komeda.
- Remove globle copy of drm_dev in gma500.
- Add support for Boe Himax8279d MIPI-DSI LCD panel.
- Add support for ingenic JZ4770 panel.
- Small null pointer deference fix in ingenic.
- Remove support for the special tfp420 driver, as there is a generic way to do it.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/ba73535a-9334-5302-2e1f-5208bd7390bd@linux.intel.com
This fixes two spelling errors in source code comments.
Signed-off-by: Josh Soref <jsoref@gmail.com>
[Jason: rewrote commit message]
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We want the staging driver fixes in here, and this resolves merge issues
with the isdn code that was pointed out in linux-next
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add support for extern variables, provided to BPF program by libbpf. Currently
the following extern variables are supported:
- LINUX_KERNEL_VERSION; version of a kernel in which BPF program is
executing, follows KERNEL_VERSION() macro convention, can be 4- and 8-byte
long;
- CONFIG_xxx values; a set of values of actual kernel config. Tristate,
boolean, strings, and integer values are supported.
Set of possible values is determined by declared type of extern variable.
Supported types of variables are:
- Tristate values. Are represented as `enum libbpf_tristate`. Accepted values
are **strictly** 'y', 'n', or 'm', which are represented as TRI_YES, TRI_NO,
or TRI_MODULE, respectively.
- Boolean values. Are represented as bool (_Bool) types. Accepted values are
'y' and 'n' only, turning into true/false values, respectively.
- Single-character values. Can be used both as a substritute for
bool/tristate, or as a small-range integer:
- 'y'/'n'/'m' are represented as is, as characters 'y', 'n', or 'm';
- integers in a range [-128, 127] or [0, 255] (depending on signedness of
char in target architecture) are recognized and represented with
respective values of char type.
- Strings. String values are declared as fixed-length char arrays. String of
up to that length will be accepted and put in first N bytes of char array,
with the rest of bytes zeroed out. If config string value is longer than
space alloted, it will be truncated and warning message emitted. Char array
is always zero terminated. String literals in config have to be enclosed in
double quotes, just like C-style string literals.
- Integers. 8-, 16-, 32-, and 64-bit integers are supported, both signed and
unsigned variants. Libbpf enforces parsed config value to be in the
supported range of corresponding integer type. Integers values in config can
be:
- decimal integers, with optional + and - signs;
- hexadecimal integers, prefixed with 0x or 0X;
- octal integers, starting with 0.
Config file itself is searched in /boot/config-$(uname -r) location with
fallback to /proc/config.gz, unless config path is specified explicitly
through bpf_object_open_opts' kernel_config_path option. Both gzipped and
plain text formats are supported. Libbpf adds explicit dependency on zlib
because of this, but this shouldn't be a problem, given libelf already depends
on zlib.
All detected extern variables, are put into a separate .extern internal map.
It, similarly to .rodata map, is marked as read-only from BPF program side, as
well as is frozen on load. This allows BPF verifier to track extern values as
constants and perform enhanced branch prediction and dead code elimination.
This can be relied upon for doing kernel version/feature detection and using
potentially unsupported field relocations or BPF helpers in a CO-RE-based BPF
program, while still having a single version of BPF program running on old and
new kernels. Selftests are validating this explicitly for unexisting BPF
helper.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20191214014710.3449601-3-andriin@fb.com
This adds rx_bpdu, tx_bpdu, rx_tcn, tx_tcn, transition_blk,
transition_fwd xstats counters to the bridge ports copied over via
netlink, providing useful information for STP.
Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
The bond slave actor/partner operating state is exported as
bitfield to userspace, which lacks a way to interpret it, e.g.,
iproute2 only prints the state as a number:
ad_actor_oper_port_state 15
For userspace to interpret the bitfield, the bitfield definitions
should be part of the uapi. The bitfield itself is defined in the
802.3ad standard.
This commit moves the 802.3ad bitfield definitions to uapi.
Related iproute2 patches, soon to be posted upstream, use the new uapi
headers to pretty-print bond slave state, e.g., with ip -d link show
ad_actor_oper_port_state_str <active,short_timeout,aggregating,in_sync>
Signed-off-by: Andy Roulin <aroulin@cumulusnetworks.com>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Going through all uses of timeval, I noticed that we screwed up
input_event in the previous attempts to fix it:
The time fields now match between kernel and user space, but all following
fields are in the wrong place.
Add the required padding that is implied by the glibc timeval definition
to fix the layout, and use a struct initializer to avoid leaking kernel
stack data.
Fixes: 141e5dcaa7 ("Input: input_event - fix the CONFIG_SPARC64 mixup")
Fixes: 2e746942eb ("Input: input_event - provide override for sparc64")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20191213204936.3643476-2-arnd@arndb.de
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl3y5kIQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpv0MEADY9LOC2JkG2aNP53gCXGu74rFQNstJfusr
xPpkMs5I7hxoeVp/bkVAvR6BbpfPDsyGMhSMURELgMFzkuw+03z2IbxVuexdGOEu
X1+6EkunAq/r331fXL+cXUKJM3ThlkUBbNTuV8z5oWWSM1EfoBAWYfh2u4L4oBcc
yIHRH8by9qipx+fhBWPU/ZWeCcMVt5Ju2b6PHAE/GWwteZh00PsTwq0oqQcVcm/5
4Ennr0OELb7LaZWblAIIZe96KeJuCPzGfbRV/y12Ne/t6STH/sWA1tMUzgT22Eju
27uXtwAJtoLsep4V66a4VYObs/hXmEP281wIsXEnIkX0YCvbAiM+J6qVHGjYSMGf
mRrzfF6nDC1vaMduE4waoO3VFiDFQU/qfQRa21ZP50dfQOAXTpEz8kJNG/PjN1VB
gmz9xsujDU1QPD7IRiTreiPPHE5AocUzlUYuYOIMt7VSbFZIMHW5S/pML93avkJt
nx6g3gOP0JKkjaCEvIKY4VLljDT8eDZ/WScDsSedSbPZikMzEo8DSx4U6nbGPqzy
qSljgQniDcrH8GdRaJFDXgLkaB8pu83NH7zUH+xioUZjAHq/XEzKQFYqysf1DbtU
SEuSnUnLXBvbwfb3Z2VpQ/oz8G3a0nD9M7oudt1sN19oTCJKxYSbOUgNUrj9JsyQ
QlAVrPKRkQ==
=fbsf
-----END PGP SIGNATURE-----
Merge tag 'io_uring-5.5-20191212' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:
- A tweak to IOSQE_IO_LINK (also marked for stable) to allow links that
don't sever if the result is < 0.
This is mostly for linked timeouts, where if we ask for a pure
timeout we always get -ETIME. This makes links useless for that case,
hence allow a case where it works.
- Five minor optimizations to fix and improve cases that regressed
since v5.4.
- An SQTHREAD locking fix.
- A sendmsg/recvmsg iov assignment fix.
- Net fix where read_iter/write_iter don't honor IOCB_NOWAIT, and
subsequently ensuring that works for io_uring.
- Fix a case where for an invalid opcode we might return -EBADF instead
of -EINVAL, if the ->fd of that sqe was set to an invalid fd value.
* tag 'io_uring-5.5-20191212' of git://git.kernel.dk/linux-block:
io_uring: ensure we return -EINVAL on unknown opcode
io_uring: add sockets to list of files that support non-blocking issue
net: make socket read/write_iter() honor IOCB_NOWAIT
io_uring: only hash regular files for async work execution
io_uring: run next sqe inline if possible
io_uring: don't dynamically allocate poll data
io_uring: deferred send/recvmsg should assign iov
io_uring: sqthread should grab ctx->uring_lock for submissions
io-wq: briefly spin for new work after finishing work
io-wq: remove worker->wait waitqueue
io_uring: allow unbreakable links
Use offsetof to calculate offset of a field to take advantage of
compiler built-in version when possible, and avoid UBSAN warning when
compiling with Clang:
==================================================================
UBSAN: Undefined behaviour in net/wireless/wext-core.c:525:14
member access within null pointer of type 'struct iw_point'
CPU: 3 PID: 165 Comm: kworker/u16:3 Tainted: G S W 4.19.23 #43
Workqueue: cfg80211 __cfg80211_scan_done [cfg80211]
Call trace:
dump_backtrace+0x0/0x194
show_stack+0x20/0x2c
__dump_stack+0x20/0x28
dump_stack+0x70/0x94
ubsan_epilogue+0x14/0x44
ubsan_type_mismatch_common+0xf4/0xfc
__ubsan_handle_type_mismatch_v1+0x34/0x54
wireless_send_event+0x3cc/0x470
___cfg80211_scan_done+0x13c/0x220 [cfg80211]
__cfg80211_scan_done+0x28/0x34 [cfg80211]
process_one_work+0x170/0x35c
worker_thread+0x254/0x380
kthread+0x13c/0x158
ret_from_fork+0x10/0x18
===================================================================
Signed-off-by: Pi-Hsun Shih <pihsun@chromium.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Link: https://lore.kernel.org/r/20191204081307.138765-1-pihsun@chromium.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Instead of just having an airtime flag in debugfs, turn AQL into a proper
NL80211_EXT_FEATURE, so drivers can turn it on when they are ready, and so
we also expose the presence of the feature to userspace.
This also has the effect of flipping the default, so drivers have to opt in
to using AQL instead of getting it by default with TXQs. To keep
functionality the same as pre-patch, we set this feature for ath10k (which
is where it is needed the most).
While we're at it, split out the debugfs interface so AQL gets its own
per-station debugfs file instead of using the 'airtime' file.
[Johannes:]
This effectively disables AQL for iwlwifi, where it fixes a number of
issues:
* TSO in iwlwifi is causing underflows and associated warnings in AQL
* HE (802.11ax) rates aren't reported properly so at HE rates, AQL could
never have a valid estimate (it'd use 6 Mbps instead of up to 2400!)
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/r/20191212111437.224294-1-toke@redhat.com
Fixes: 3ace10f5b5 ("mac80211: Implement Airtime-based Queue Limit (AQL)")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Unlike e.g. netdev features, the ethtool ioctl interface requires link mode
table to be in sync between kernel and userspace for userspace to be able
to display and set all link modes supported by kernel. The way arbitrary
length bitsets are implemented in netlink interface, this will be no longer
needed.
To allow userspace to access all link modes running kernel supports, add
table of ethernet link mode names and make it available as a string set to
userspace GET_STRSET requests. Add build time check to make sure names
are defined for all modes declared in enum ethtool_link_mode_bit_indices.
Once the string set is available, make it also accessible via ioctl.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Permanent hardware address of a network device was traditionally provided
via ethtool ioctl interface but as Jiri Pirko pointed out in a review of
ethtool netlink interface, rtnetlink is much more suitable for it so let's
add it to the RTM_NEWLINK message.
Add IFLA_PERM_ADDRESS attribute to RTM_NEWLINK messages unless the
permanent address is all zeros (i.e. device driver did not fill it). As
permanent address is not modifiable, reject userspace requests containing
IFLA_PERM_ADDRESS attribute.
Note: we already provide permanent hardware address for bond slaves;
unfortunately we cannot drop that attribute for backward compatibility
reasons.
v5 -> v6: only add the attribute if permanent address is not zero
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
If we submit an unknown opcode and have fd == -1, io_op_needs_file()
will return true as we default to needing a file. Then when we go and
assign the file, we find the 'fd' invalid and return -EBADF. We really
should be returning -EINVAL for that case, as we normally do for
unsupported opcodes.
Change io_op_needs_file() to have the following return values:
0 - does not need a file
1 - does need a file
< 0 - error value
and use this to pass back the right value for this invalid case.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The VMADDR_CID_RESERVED (1) was used by VMCI, but now it is not
used anymore, so we can reuse it for local communication
(loopback) adding the new well-know CID: VMADDR_CID_LOCAL.
Cc: Jorgen Hansen <jhansen@vmware.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Jorgen Hansen <jhansen@vmware.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow for audit messages to be emitted upon BPF program load and
unload for having a timeline of events. The load itself is in
syscall context, so additional info about the process initiating
the BPF prog creation can be logged and later directly correlated
to the unload event.
The only info really needed from BPF side is the globally unique
prog ID where then audit user space tooling can query / dump all
info needed about the specific BPF program right upon load event
and enrich the record, thus these changes needed here can be kept
small and non-intrusive to the core.
Raw example output:
# auditctl -D
# auditctl -a always,exit -F arch=x86_64 -S bpf
# ausearch --start recent -m 1334
...
----
time->Wed Nov 27 16:04:13 2019
type=PROCTITLE msg=audit(1574867053.120:84664): proctitle="./bpf"
type=SYSCALL msg=audit(1574867053.120:84664): arch=c000003e syscall=321 \
success=yes exit=3 a0=5 a1=7ffea484fbe0 a2=70 a3=0 items=0 ppid=7477 \
pid=12698 auid=1001 uid=1001 gid=1001 euid=1001 suid=1001 fsuid=1001 \
egid=1001 sgid=1001 fsgid=1001 tty=pts2 ses=4 comm="bpf" \
exe="/home/jolsa/auditd/audit-testsuite/tests/bpf/bpf" \
subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 key=(null)
type=UNKNOWN[1334] msg=audit(1574867053.120:84664): prog-id=76 op=LOAD
----
time->Wed Nov 27 16:04:13 2019
type=UNKNOWN[1334] msg=audit(1574867053.120:84665): prog-id=76 op=UNLOAD
...
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Co-developed-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Paul Moore <paul@paul-moore.com>
Link: https://lore.kernel.org/bpf/20191206214934.11319-1-jolsa@kernel.org
Add support for reading out the uniq information from the underlying HID
device. This might be the iSerialNumber in case of USB or the BD_ADDR in
case of Bluetooth.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
The staging isdn drivers are gone, and CONFIG_BT_CMTP is now
the only user. This means a lot of the code in the subsystem
has no remaining callers and can be removed.
Change the capi user space front-end to be part of kernelcapi,
and the combined module to only be compiled if BT_CMTP is
also enabled, then remove the interfaces that have no remaining
callers.
As the notifier list and the capi_drivers list have no callers
outside of kcapi.c, the implementation gets much simpler.
Some definitions from the include/linux/*.h headers are only
needed internally and are moved to kcapi.h.
Acked-by: David Miller <davem@davemloft.net>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20191210210455.3475361-2-arnd@arndb.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
As described in drivers/staging/isdn/TODO, the drivers are all
assumed to be unmaintained and unused now, with gigaset being the
last one to stop being maintained after Paul Bolle lost access
to an ISDN network.
The CAPI subsystem remains for now, as it is still required by
bluetooth/cmtp.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20191210210455.3475361-1-arnd@arndb.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This framework allows a unified userspace interface for dma-buf
exporters, allowing userland to allocate specific types of memory
for use in dma-buf sharing.
Each heap is given its own device node, which a user can allocate
a dma-buf fd from using the DMA_HEAP_IOC_ALLOC.
This code is an evoluiton of the Android ION implementation,
and a big thanks is due to its authors/maintainers over time
for their effort:
Rebecca Schultz Zavin, Colin Cross, Benjamin Gaignard,
Laura Abbott, and many other contributors!
Cc: Laura Abbott <labbott@redhat.com>
Cc: Benjamin Gaignard <benjamin.gaignard@linaro.org>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Liam Mark <lmark@codeaurora.org>
Cc: Pratik Patel <pratikp@codeaurora.org>
Cc: Brian Starkey <Brian.Starkey@arm.com>
Cc: Vincent Donnefort <Vincent.Donnefort@arm.com>
Cc: Sudipto Paul <Sudipto.Paul@arm.com>
Cc: Andrew F. Davis <afd@ti.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Chenbo Feng <fengc@google.com>
Cc: Alistair Strachan <astrachan@google.com>
Cc: Hridya Valsaraju <hridya@google.com>
Cc: Sandeep Patil <sspatil@google.com>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Dave Airlie <airlied@gmail.com>
Cc: dri-devel@lists.freedesktop.org
Reviewed-by: Brian Starkey <brian.starkey@arm.com>
Acked-by: Sandeep Patil <sspatil@android.com>
Signed-off-by: Andrew F. Davis <afd@ti.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20191203172641.66642-2-john.stultz@linaro.org
Some commands will invariably end in a failure in the sense that the
completion result will be less than zero. One such example is timeouts
that don't have a completion count set, they will always complete with
-ETIME unless cancelled.
For linked commands, we sever links and fail the rest of the chain if
the result is less than zero. Since we have commands where we know that
will happen, add IOSQE_IO_HARDLINK as a stronger link that doesn't sever
regardless of the completion result. Note that the link will still sever
if we fail submitting the parent request, hard links are only resilient
in the presence of completion results for requests that did submit
correctly.
Cc: stable@vger.kernel.org # v5.4
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Reported-by: 李通洲 <carter.li@eoitek.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net:
1) Wait for rcu grace period after releasing netns in ctnetlink,
from Florian Westphal.
2) Incorrect command type in flowtable offload ndo invocation,
from wenxu.
3) Incorrect callback type in flowtable offload flow tuple
updates, also from wenxu.
4) Fix compile warning on flowtable offload infrastructure due to
possible reference to uninitialized variable, from Nathan Chancellor.
5) Do not inline nf_ct_resolve_clash(), this is called from slow
path / stress situations. From Florian Westphal.
6) Missing IPv6 flow selector description in flowtable offload.
7) Missing check for NETDEV_UNREGISTER in nf_tables offload
infrastructure, from wenxu.
8) Update NAT selftest to use randomized netns names, from
Florian Westphal.
9) Restore nfqueue bridge support, from Marco Oliverio.
10) Compilation warning in SCTP_CHUNKMAP_*() on xt_sctp header.
From Phil Sutter.
11) Fix bogus lookup/get match for non-anonymous rbtree sets.
12) Missing netlink validation for NFT_SET_ELEM_INTERVAL_END
elements.
13) Missing netlink validation for NFT_DATA_VALUE after
nft_data_init().
14) If rule specifies no actions, offload infrastructure returns
EOPNOTSUPP.
15) Module refcount leak in object updates.
16) Missing sanitization for ARP traffic from br_netfilter, from
Eric Dumazet.
17) Compilation breakage on big-endian due to incorrect memcpy()
size in the flowtable offload infrastructure.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
With 'bytes(__u32)' being 32, a left-shift of 31 may happen which is
undefined for the signed 32-bit value 1. Avoid this by declaring 1 as
unsigned.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
TCP encapsulation of IKE and IPsec messages (RFC 8229) is implemented
as a TCP ULP, overriding in particular the sendmsg and recvmsg
operations. A Stream Parser is used to extract messages out of the TCP
stream using the first 2 bytes as length marker. Received IKE messages
are put on "ike_queue", waiting to be dequeued by the custom recvmsg
implementation. Received ESP messages are sent to XFRM, like with UDP
encapsulation.
Some of this code is taken from the original submission by Herbert
Xu. Currently, only IPv4 is supported, like for UDP encapsulation.
Co-developed-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
WireGuard is a layer 3 secure networking tunnel made specifically for
the kernel, that aims to be much simpler and easier to audit than IPsec.
Extensive documentation and description of the protocol and
considerations, along with formal proofs of the cryptography, are
available at:
* https://www.wireguard.com/
* https://www.wireguard.com/papers/wireguard.pdf
This commit implements WireGuard as a simple network device driver,
accessible in the usual RTNL way used by virtual network drivers. It
makes use of the udp_tunnel APIs, GRO, GSO, NAPI, and the usual set of
networking subsystem APIs. It has a somewhat novel multicore queueing
system designed for maximum throughput and minimal latency of encryption
operations, but it is implemented modestly using workqueues and NAPI.
Configuration is done via generic Netlink, and following a review from
the Netlink maintainer a year ago, several high profile userspace tools
have already implemented the API.
This commit also comes with several different tests, both in-kernel
tests and out-of-kernel tests based on network namespaces, taking profit
of the fact that sockets used by WireGuard intentionally stay in the
namespace the WireGuard interface was originally created, exactly like
the semantics of userspace tun devices. See wireguard.com/netns/ for
pictures and examples.
The source code is fairly short, but rather than combining everything
into a single file, WireGuard is developed as cleanly separable files,
making auditing and comprehension easier. Things are laid out as
follows:
* noise.[ch], cookie.[ch], messages.h: These implement the bulk of the
cryptographic aspects of the protocol, and are mostly data-only in
nature, taking in buffers of bytes and spitting out buffers of
bytes. They also handle reference counting for their various shared
pieces of data, like keys and key lists.
* ratelimiter.[ch]: Used as an integral part of cookie.[ch] for
ratelimiting certain types of cryptographic operations in accordance
with particular WireGuard semantics.
* allowedips.[ch], peerlookup.[ch]: The main lookup structures of
WireGuard, the former being trie-like with particular semantics, an
integral part of the design of the protocol, and the latter just
being nice helper functions around the various hashtables we use.
* device.[ch]: Implementation of functions for the netdevice and for
rtnl, responsible for maintaining the life of a given interface and
wiring it up to the rest of WireGuard.
* peer.[ch]: Each interface has a list of peers, with helper functions
available here for creation, destruction, and reference counting.
* socket.[ch]: Implementation of functions related to udp_socket and
the general set of kernel socket APIs, for sending and receiving
ciphertext UDP packets, and taking care of WireGuard-specific sticky
socket routing semantics for the automatic roaming.
* netlink.[ch]: Userspace API entry point for configuring WireGuard
peers and devices. The API has been implemented by several userspace
tools and network management utility, and the WireGuard project
distributes the basic wg(8) tool.
* queueing.[ch]: Shared function on the rx and tx path for handling
the various queues used in the multicore algorithms.
* send.c: Handles encrypting outgoing packets in parallel on
multiple cores, before sending them in order on a single core, via
workqueues and ring buffers. Also handles sending handshake and cookie
messages as part of the protocol, in parallel.
* receive.c: Handles decrypting incoming packets in parallel on
multiple cores, before passing them off in order to be ingested via
the rest of the networking subsystem with GRO via the typical NAPI
poll function. Also handles receiving handshake and cookie messages
as part of the protocol, in parallel.
* timers.[ch]: Uses the timer wheel to implement protocol particular
event timeouts, and gives a set of very simple event-driven entry
point functions for callers.
* main.c, version.h: Initialization and deinitialization of the module.
* selftest/*.h: Runtime unit tests for some of the most security
sensitive functions.
* tools/testing/selftests/wireguard/netns.sh: Aforementioned testing
script using network namespaces.
This commit aims to be as self-contained as possible, implementing
WireGuard as a standalone module not needing much special handling or
coordination from the network subsystem. I expect for future
optimizations to the network stack to positively improve WireGuard, and
vice-versa, but for the time being, this exists as intentionally
standalone.
We introduce a menu option for CONFIG_WIREGUARD, as well as providing a
verbose debug log and self-tests via CONFIG_WIREGUARD_DEBUG.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: David Miller <davem@davemloft.net>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: linux-crypto@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull more input updates from Dmitry Torokhov:
- fixups for Synaptics RMI4 driver
- a quirk for Goodinx touchscreen on Teclast tablet
- a new keycode definition for activating privacy screen feature found
on a few "enterprise" laptops
- updates to snvs_pwrkey driver
- polling uinput device for writing (which is always allowed) now works
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: synaptics-rmi4 - don't increment rmiaddr for SMBus transfers
Input: synaptics-rmi4 - re-enable IRQs in f34v7_do_reflash
Input: goodix - add upside-down quirk for Teclast X89 tablet
Input: add privacy screen toggle keycode
Input: uinput - fix returning EPOLLOUT from uinput_poll
Input: snvs_pwrkey - remove gratuitous NULL initializers
Input: snvs_pwrkey - send key events for i.MX6 S, DL and Q
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl3puL4QHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpjOJD/4vV2ENSYcwZ7Qezgn9tx5NEdADN6jC0DXe
tJDnA0sDLfLTlVM9I1U2/wD6Wu31cvV2dHKmxO+WWWRP2M0orI00UA3I6BssVptY
42e/YJd13IM+iVrhfhm+hmcAHvQUmWHTPZg/7F7OZPkEtNcbCjn6s+HTyzu9gqtt
/kbJVDJ75TR9dEWCPY/A4jUcJYLw2leoHI4u2eqYRsTFNJHUQoaFUHDyNHyj7UNI
kIUi2UixJv+a4pkFvyLPKhJcLr6DBC2TeUnfP2Re5oX1Z/XkDR7iX+L5dUwRHHbT
4M7aJJ+mHDm8Z4IwwBqsSDRU5wUpzuplwBQBY/EQilUZAyOALVzUQABlRa0zxH8t
0D4U6LDDOopYeK0by/FGdD88S6Z0LKm0HJETbdPaGd9fqSlhBn19iGeBKYk0cCIu
Uecm9DKF+QLfKhTbN6h8nPyR3bfkqyUptQJ1UnheWo9f6L32bfp19sdI9LdtRUnS
k661QApajaYQu/641u9CDZiI+DvI+57Wm9I4eCF3GXqOYWPAxSgWP7rJVgS+mfA3
wo99/r11SruaA1Wqf+bOoN/wjsQCiUPFa8po8sh05ER4MH5Brb5EFlg04ZlQZkR9
jOdiL8ISM9t86eyKwtR4PanE7/pPEYUjEWm4ntCC6ViTwCvgV3d6s2We6QPw1j2l
nQ9p/c2bhg==
=7ZFw
-----END PGP SIGNATURE-----
Merge tag 'for-linus-20191205' of git://git.kernel.dk/linux-block
Pull more block and io_uring updates from Jens Axboe:
"I wasn't expecting this to be so big, and if I was, I would have used
separate branches for this. Going forward I'll be doing separate
branches for the current tree, just like for the next kernel version
tree. In any case, this contains:
- Series from Christoph that fixes an inherent race condition with
zoned devices and revalidation.
- null_blk zone size fix (Damien)
- Fix for a regression in this merge window that caused busy spins by
sending empty disk uevents (Eric)
- Fix for a regression in this merge window for bfq stats (Hou)
- Fix for io_uring creds allocation failure handling (me)
- io_uring -ERESTARTSYS send/recvmsg fix (me)
- Series that fixes the need for applications to retain state across
async request punts for io_uring. This one is a bit larger than I
would have hoped, but I think it's important we get this fixed for
5.5.
- connect(2) improvement for io_uring, handling EINPROGRESS instead
of having applications needing to poll for it (me)
- Have io_uring use a hash for poll requests instead of an rbtree.
This turned out to work much better in practice, so I think we
should make the switch now. For some workloads, even with a fair
amount of cancellations, the insertion sort is just too expensive.
(me)
- Various little io_uring fixes (me, Jackie, Pavel, LimingWu)
- Fix for brd unaligned IO, and a warning for the future (Ming)
- Fix for a bio integrity data leak (Justin)
- bvec_iter_advance() improvement (Pavel)
- Xen blkback page unmap fix (SeongJae)
The major items in here are all well tested, and on the liburing side
we continue to add regression and feature test cases. We're up to 50
topic cases now, each with anywhere from 1 to more than 10 cases in
each"
* tag 'for-linus-20191205' of git://git.kernel.dk/linux-block: (33 commits)
block: fix memleak of bio integrity data
io_uring: fix a typo in a comment
bfq-iosched: Ensure bio->bi_blkg is valid before using it
io_uring: hook all linked requests via link_list
io_uring: fix error handling in io_queue_link_head
io_uring: use hash table for poll command lookups
io-wq: clear node->next on list deletion
io_uring: ensure deferred timeouts copy necessary data
io_uring: allow IO_SQE_* flags on IORING_OP_TIMEOUT
null_blk: remove unused variable warning on !CONFIG_BLK_DEV_ZONED
brd: warn on un-aligned buffer
brd: remove max_hw_sectors queue limit
xen/blkback: Avoid unmapping unmapped grant pages
io_uring: handle connect -EINPROGRESS like -EAGAIN
block: set the zone size in blk_revalidate_disk_zones atomically
block: don't handle bio based drivers in blk_revalidate_disk_zones
block: allocate the zone bitmaps lazily
block: replace seq_zones_bitmap with conv_zones_bitmap
block: simplify blkdev_nr_zones
block: remove the empty line at the end of blk-zoned.c
...
Merge more updates from Andrew Morton:
"Most of the rest of MM and various other things. Some Kconfig rework
still awaits merges of dependent trees from linux-next.
Subsystems affected by this patch series: mm/hotfixes, mm/memcg,
mm/vmstat, mm/thp, procfs, sysctl, misc, notifiers, core-kernel,
bitops, lib, checkpatch, epoll, binfmt, init, rapidio, uaccess, kcov,
ubsan, ipc, bitmap, mm/pagemap"
* akpm: (86 commits)
mm: remove __ARCH_HAS_4LEVEL_HACK and include/asm-generic/4level-fixup.h
um: add support for folded p4d page tables
um: remove unused pxx_offset_proc() and addr_pte() functions
sparc32: use pgtable-nopud instead of 4level-fixup
parisc/hugetlb: use pgtable-nopXd instead of 4level-fixup
parisc: use pgtable-nopXd instead of 4level-fixup
nds32: use pgtable-nopmd instead of 4level-fixup
microblaze: use pgtable-nopmd instead of 4level-fixup
m68k: mm: use pgtable-nopXd instead of 4level-fixup
m68k: nommu: use pgtable-nopud instead of 4level-fixup
c6x: use pgtable-nopud instead of 4level-fixup
arm: nommu: use pgtable-nopud instead of 4level-fixup
alpha: use pgtable-nopud instead of 4level-fixup
gpio: pca953x: tighten up indentation
gpio: pca953x: convert to use bitmap API
gpio: pca953x: use input from regs structure in pca953x_irq_pending()
gpio: pca953x: remove redundant variable and check in IRQ handler
lib/bitmap: introduce bitmap_replace() helper
lib/test_bitmap: fix comment about this file
lib/test_bitmap: move exp1 and exp2 upper for others to use
...
Patch series " kcov: collect coverage from usb and vhost", v3.
This patchset extends kcov to allow collecting coverage from backgound
kernel threads. This extension requires custom annotations for each of
the places where coverage collection is desired. This patchset
implements this for hub events in the USB subsystem and for vhost
workers. See the first patch description for details about the kcov
extension. The other two patches apply this kcov extension to USB and
vhost.
Examples of other subsystems that might potentially benefit from this
when custom annotations are added (the list is based on
process_one_work() callers for bugs recently reported by syzbot):
1. fs: writeback wb_workfn() worker,
2. net: addrconf_dad_work()/addrconf_verify_work() workers,
3. net: neigh_periodic_work() worker,
4. net/p9: p9_write_work()/p9_read_work() workers,
5. block: blk_mq_run_work_fn() worker.
These patches have been used to enable coverage-guided USB fuzzing with
syzkaller for the last few years, see the details here:
https://github.com/google/syzkaller/blob/master/docs/linux/external_fuzzing_usb.md
This patchset has been pushed to the public Linux kernel Gerrit
instance:
https://linux-review.googlesource.com/c/linux/kernel/git/torvalds/linux/+/1524
This patch (of 3):
Add background thread coverage collection ability to kcov.
With KCOV_ENABLE coverage is collected only for syscalls that are issued
from the current process. With KCOV_REMOTE_ENABLE it's possible to
collect coverage for arbitrary parts of the kernel code, provided that
those parts are annotated with kcov_remote_start()/kcov_remote_stop().
This allows to collect coverage from two types of kernel background
threads: the global ones, that are spawned during kernel boot in a
limited number of instances (e.g. one USB hub_event() worker thread is
spawned per USB HCD); and the local ones, that are spawned when a user
interacts with some kernel interface (e.g. vhost workers).
To enable collecting coverage from a global background thread, a unique
global handle must be assigned and passed to the corresponding
kcov_remote_start() call. Then a userspace process can pass a list of
such handles to the KCOV_REMOTE_ENABLE ioctl in the handles array field
of the kcov_remote_arg struct. This will attach the used kcov device to
the code sections, that are referenced by those handles.
Since there might be many local background threads spawned from
different userspace processes, we can't use a single global handle per
annotation. Instead, the userspace process passes a non-zero handle
through the common_handle field of the kcov_remote_arg struct. This
common handle gets saved to the kcov_handle field in the current
task_struct and needs to be passed to the newly spawned threads via
custom annotations. Those threads should in turn be annotated with
kcov_remote_start()/kcov_remote_stop().
Internally kcov stores handles as u64 integers. The top byte of a
handle is used to denote the id of a subsystem that this handle belongs
to, and the lower 4 bytes are used to denote the id of a thread instance
within that subsystem. A reserved value 0 is used as a subsystem id for
common handles as they don't belong to a particular subsystem. The
bytes 4-7 are currently reserved and must be zero. In the future the
number of bytes used for the subsystem or handle ids might be increased.
When a particular userspace process collects coverage by via a common
handle, kcov will collect coverage for each code section that is
annotated to use the common handle obtained as kcov_handle from the
current task_struct. However non common handles allow to collect
coverage selectively from different subsystems.
Link: http://lkml.kernel.org/r/e90e315426a384207edbec1d6aa89e43008e4caf.1572366574.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: David Windsor <dwindsor@gmail.com>
Cc: Elena Reshetova <elena.reshetova@intel.com>
Cc: Anders Roxell <anders.roxell@linaro.org>
Cc: Alexander Potapenko <glider@google.com>
Cc: Marco Elver <elver@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Userspace cannot compile <linux/scc.h>
CC usr/include/linux/scc.h.s
In file included from <command-line>:32:0:
usr/include/linux/scc.h:20:20: error: `SIOCDEVPRIVATE' undeclared here (not in a function)
SIOCSCCRESERVED = SIOCDEVPRIVATE,
^~~~~~~~~~~~~~
Include <linux/sockios.h> to make it self-contained, and add it to the
compile-test coverage.
Link: http://lkml.kernel.org/r/20191108055809.26969-1-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add keycode for toggling electronic privacy screen to the keycodes
definition. Some new laptops have a privacy screen which can be toggled
with a key on the keyboard.
Signed-off-by: Mathew King <mathewk@chromium.org>
Link: https://lore.kernel.org/r/20191017163208.235518-1-mathewk@chromium.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
* small x86 cleanup
* fix for an x86-specific out-of-bounds write on a ioctl (not guest triggerable,
data not attacker-controlled)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJd551cAAoJEL/70l94x66D+JkH/R3eEOyvckPmYmzd0lnV8mQ/
7e0n2G/aD+iLZkcCbUnMaImdmSJmoEEJCPjgPk/5nJ3zUi5b/ABWyidEM5uf19Hl
rzKBg0DR7BiQptPnZv2JMwEVKu3JOTchMykqu9xXChQlICocZ0xjdOA6nQ19p0Lv
FulDw5MUaWrXevIzCBskQ38zJejRQA6CpD1lQkHn7LKS9p3p+BsAOd/Ouy87RfWG
b3ktECNbXyO6KStrrhgm+z8pviWY+kqYklyBlDOOwxWif0x8WvNDpQLoVo+ZuhLU
Me8YJ1BN75vFlxzh6ZK5exBUnm9E3fGVKIaaF+dpuds2x+j4HnYl+lZCm89MdqY=
=Q4v7
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull more KVM updates from Paolo Bonzini:
- PPC secure guest support
- small x86 cleanup
- fix for an x86-specific out-of-bounds write on a ioctl (not guest
triggerable, data not attacker-controlled)
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
kvm: vmx: Stop wasting a page for guest_msrs
KVM: x86: fix out-of-bounds write in KVM_GET_EMULATED_CPUID (CVE-2019-19332)
Documentation: kvm: Fix mention to number of ioctls classes
powerpc: Ultravisor: Add PPC_UV config option
KVM: PPC: Book3S HV: Support reset of secure guest
KVM: PPC: Book3S HV: Handle memory plug/unplug to secure VM
KVM: PPC: Book3S HV: Radix changes for secure guest
KVM: PPC: Book3S HV: Shared pages support for secure guests
KVM: PPC: Book3S HV: Support for running secure guests
mm: ksm: Export ksm_madvise()
KVM x86: Move kvm cpuid support out of svm
Here is the "big" tty and serial driver patches for 5.5-rc1. It's a bit
later in the merge window than normal as I wanted to make sure some
last-minute patches applied to it were all sane. They seem to be :)
There's a lot of little stuff in here, for the tty core, and for lots of
serial drivers:
- reverts of uartlite serial driver patches that were wrong
- msm-serial driver fixes
- serial core updates and fixes
- tty core fixes
- serial driver dma mapping api changes
- lots of other tiny fixes and updates for serial drivers
All of these have been in linux-next for a while with no reported
issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXebFIQ8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ylnmACgjfMcfQWa7uC9Q6m2DaQaRMaW6QoAnjg+TgBB
eW9EhvyXL2VbrsuUl+iH
=Am9O
-----END PGP SIGNATURE-----
Merge tag 'tty-5.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial updates from Greg KH:
"Here is the "big" tty and serial driver patches for 5.5-rc1.
It's a bit later in the merge window than normal as I wanted to make
sure some last-minute patches applied to it were all sane. They seem
to be :)
There's a lot of little stuff in here, for the tty core, and for lots
of serial drivers:
- reverts of uartlite serial driver patches that were wrong
- msm-serial driver fixes
- serial core updates and fixes
- tty core fixes
- serial driver dma mapping api changes
- lots of other tiny fixes and updates for serial drivers
All of these have been in linux-next for a while with no reported
issues"
* tag 'tty-5.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (58 commits)
Revert "serial/8250: Add support for NI-Serial PXI/PXIe+485 devices"
vcs: prevent write access to vcsu devices
tty: vt: keyboard: reject invalid keycodes
tty: don't crash in tty_init_dev when missing tty_port
serial: stm32: fix clearing interrupt error flags
tty: Fix Kconfig indentation, continued
serial: serial_core: Perform NULL checks for break_ctl ops
tty: remove unused argument from tty_open_by_driver()
tty: Fix Kconfig indentation
{tty: serial, nand: onenand}: samsung: rename to fix build warning
serial: ifx6x60: add missed pm_runtime_disable
serial: pl011: Fix DMA ->flush_buffer()
Revert "serial-uartlite: Move the uart register"
Revert "serial-uartlite: Add get serial id if not provided"
Revert "serial-uartlite: Do not use static struct uart_driver out of probe()"
Revert "serial-uartlite: Add runtime support"
Revert "serial-uartlite: Change logic how console_port is setup"
Revert "serial-uartlite: Use allocated structure instead of static ones"
tty: serial: msm_serial: Use dma_request_chan() directly for channel request
tty: serial: tegra: Use dma_request_chan() directly for channel request
...
-----BEGIN PGP SIGNATURE-----
iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAl3leXUUHGJoZWxnYWFz
QGdvb2dsZS5jb20ACgkQWYigwDrT+vyY3g/9FAVVdPEaadNtAhQ/zIxcjozDovKq
0q7yOA3aTBTUoNEinm88an6p0dcC4gNKtGukXmzVH2Hhxm9kLRdtpZGYY00tpLUB
9rI7XsgwwHa+hLwsHbIs507sKGFGy5FLr0ChTTGLDEMppnEvjA2hZooYmcB/OgrC
LlFcwbNKGOk/Si9u2bF2nLO0JDoVHnwzpF99saew/nqc7Lfj9e9IPZFom+VjPBUh
AOvRp2H7uBN+WQlpLeFeMDDoeXh34lX0kYqIV/cVkXVnknDGYKV2CBTg2aeX7jd0
QiPHZh6zlW8zNQgaCZRiBAbatVEOnRMRJ++yiqB8hBYp1LMXm6kJ01YSQpXkugoY
Vp9dtzzTARWV/XkKwD4brw9ZEmIDnO+Ed2x2VbUkPJVcXAvzSQWAx82IU0Iuqmcb
9qr6U2Zf/Xk5aFlGPYVH8QOG+QqzIbZNRQ7NlhDlITyW4P6QPu0mw374yYP2wDGL
sP5YSS3YGa0sQcEgDtVnd4z+WTZI4AwXLPaeaLkDhdfHp2FsERUY4TrPs33J99xw
og4EyokVFzjYzlnBPU6WWn7LL+jj5ccXkL3MA4DR4FJOnNGHh7NXfQUH56rrgsq7
F9/8shL5DuTbQkde1uSyUG9Iq/RigVLlV5DQavFm3dSXvZi0E16t5alC5URNTzk7
at8Bogn53QhlmYc=
=uUXw
-----END PGP SIGNATURE-----
Merge tag 'pci-v5.5-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI updates from Bjorn Helgaas:
"Enumeration:
- Warn if a host bridge has no NUMA info (Yunsheng Lin)
- Add PCI_STD_NUM_BARS for the number of standard BARs (Denis
Efremov)
Resource management:
- Fix boot-time Embedded Controller GPE storm caused by incorrect
resource assignment after ACPI Bus Check Notification (Mika
Westerberg)
- Protect pci_reassign_bridge_resources() against concurrent
addition/removal (Benjamin Herrenschmidt)
- Fix bridge dma_ranges resource list cleanup (Rob Herring)
- Add "pci=hpmmiosize" and "pci=hpmmioprefsize" parameters to control
the MMIO and prefetchable MMIO window sizes of hotplug bridges
independently (Nicholas Johnson)
- Fix MMIO/MMIO_PREF window assignment that assigned more space than
desired (Nicholas Johnson)
- Only enforce bus numbers from bridge EA if the bridge has EA
devices downstream (Subbaraya Sundeep)
- Consolidate DT "dma-ranges" parsing and convert all host drivers to
use shared parsing (Rob Herring)
Error reporting:
- Restore AER capability after resume (Mayurkumar Patel)
- Add PoisonTLPBlocked AER counter (Rajat Jain)
- Use for_each_set_bit() to simplify AER code (Andy Shevchenko)
- Fix AER kernel-doc (Andy Shevchenko)
- Add "pcie_ports=dpc-native" parameter to allow native use of DPC
even if platform didn't grant control over AER (Olof Johansson)
Hotplug:
- Avoid returning prematurely from sysfs requests to enable or
disable a PCIe hotplug slot (Lukas Wunner)
- Don't disable interrupts twice when suspending hotplug ports (Mika
Westerberg)
- Fix deadlocks when PCIe ports are hot-removed while suspended (Mika
Westerberg)
Power management:
- Remove unnecessary ASPM locking (Bjorn Helgaas)
- Add support for disabling L1 PM Substates (Heiner Kallweit)
- Allow re-enabling Clock PM after it has been disabled (Heiner
Kallweit)
- Add sysfs attributes for controlling ASPM link states (Heiner
Kallweit)
- Remove CONFIG_PCIEASPM_DEBUG, including "link_state" and "clk_ctl"
sysfs files (Heiner Kallweit)
- Avoid AMD FCH XHCI USB PME# from D0 defect that prevents wakeup on
USB 2.0 or 1.1 connect events (Kai-Heng Feng)
- Move power state check out of pci_msi_supported() (Bjorn Helgaas)
- Fix incorrect MSI-X masking on resume and revert related nvme quirk
for Kingston NVME SSD running FW E8FK11.T (Jian-Hong Pan)
- Always return devices to D0 when thawing to fix hibernation with
drivers like mlx4 that used legacy power management (previously we
only did it for drivers with new power management ops) (Dexuan Cui)
- Clear PCIe PME Status even for legacy power management (Bjorn
Helgaas)
- Fix PCI PM documentation errors (Bjorn Helgaas)
- Use dev_printk() for more power management messages (Bjorn Helgaas)
- Apply D2 delay as milliseconds, not microseconds (Bjorn Helgaas)
- Convert xen-platform from legacy to generic power management (Bjorn
Helgaas)
- Removed unused .resume_early() and .suspend_late() legacy power
management hooks (Bjorn Helgaas)
- Rearrange power management code for clarity (Rafael J. Wysocki)
- Decode power states more clearly ("4" or "D4" really refers to
"D3cold") (Bjorn Helgaas)
- Notice when reading PM Control register returns an error (~0)
instead of interpreting it as being in D3hot (Bjorn Helgaas)
- Add missing link delays required by the PCIe spec (Mika Westerberg)
Virtualization:
- Move pci_prg_resp_pasid_required() to CONFIG_PCI_PRI (Bjorn
Helgaas)
- Allow VFs to use PRI (the PF PRI is shared by the VFs, but the code
previously didn't recognize that) (Kuppuswamy Sathyanarayanan)
- Allow VFs to use PASID (the PF PASID capability is shared by the
VFs, but the code previously didn't recognize that) (Kuppuswamy
Sathyanarayanan)
- Disconnect PF and VF ATS enablement, since ATS in PFs and
associated VFs can be enabled independently (Kuppuswamy
Sathyanarayanan)
- Cache PRI and PASID capability offsets (Kuppuswamy Sathyanarayanan)
- Cache the PRI PRG Response PASID Required bit (Bjorn Helgaas)
- Consolidate ATS declarations in linux/pci-ats.h (Krzysztof
Wilczynski)
- Remove unused PRI and PASID stubs (Bjorn Helgaas)
- Removed unnecessary EXPORT_SYMBOL_GPL() from ATS, PRI, and PASID
interfaces that are only used by built-in IOMMU drivers (Bjorn
Helgaas)
- Hide PRI and PASID state restoration functions used only inside the
PCI core (Bjorn Helgaas)
- Add a DMA alias quirk for the Intel VCA NTB (Slawomir Pawlowski)
- Serialize sysfs sriov_numvfs reads vs writes (Pierre Crégut)
- Update Cavium ACS quirk for ThunderX2 and ThunderX3 (George
Cherian)
- Fix the UPDCR register address in the Intel ACS quirk (Steffen
Liebergeld)
- Unify ACS quirk implementations (Bjorn Helgaas)
Amlogic Meson host bridge driver:
- Fix meson PERST# GPIO polarity problem (Remi Pommarel)
- Add DT bindings for Amlogic Meson G12A (Neil Armstrong)
- Fix meson clock names to match DT bindings (Neil Armstrong)
- Add meson support for Amlogic G12A SoC with separate shared PHY
(Neil Armstrong)
- Add meson extended PCIe PHY functions for Amlogic G12A USB3+PCIe
combo PHY (Neil Armstrong)
- Add arm64 DT for Amlogic G12A PCIe controller node (Neil Armstrong)
- Add commented-out description of VIM3 USB3/PCIe mux in arm64 DT
(Neil Armstrong)
Broadcom iProc host bridge driver:
- Invalidate iProc PAXB address mapping before programming it
(Abhishek Shah)
- Fix iproc-msi and mvebu __iomem annotations (Ben Dooks)
Cadence host bridge driver:
- Refactor Cadence PCIe host controller to use as a library for both
host and endpoint (Tom Joseph)
Freescale Layerscape host bridge driver:
- Add layerscape LS1028a support (Xiaowei Bao)
Intel VMD host bridge driver:
- Add VMD bus 224-255 restriction decode (Jon Derrick)
- Add VMD 8086:9A0B device ID (Jon Derrick)
- Remove Keith from VMD maintainer list (Keith Busch)
Marvell ARMADA 3700 / Aardvark host bridge driver:
- Use LTSSM state to build link training flag since Aardvark doesn't
implement the Link Training bit (Remi Pommarel)
- Delay before training Aardvark link in case PERST# was asserted
before the driver probe (Remi Pommarel)
- Fix Aardvark issues with Root Control reads and writes (Remi
Pommarel)
- Don't rely on jiffies in Aardvark config access path since
interrupts may be disabled (Remi Pommarel)
- Fix Aardvark big-endian support (Grzegorz Jaszczyk)
Marvell ARMADA 370 / XP host bridge driver:
- Make mvebu_pci_bridge_emul_ops static (Ben Dooks)
Microsoft Hyper-V host bridge driver:
- Add hibernation support for Hyper-V virtual PCI devices (Dexuan
Cui)
- Track Hyper-V pci_protocol_version per-hbus, not globally (Dexuan
Cui)
- Avoid kmemleak false positive on hv hbus buffer (Dexuan Cui)
Mobiveil host bridge driver:
- Change mobiveil csr_read()/write() function names that conflict
with riscv arch functions (Kefeng Wang)
NVIDIA Tegra host bridge driver:
- Fix Tegra CLKREQ dependency programming (Vidya Sagar)
Renesas R-Car host bridge driver:
- Remove unnecessary header include from rcar (Andrew Murray)
- Tighten register index checking for rcar inbound range programming
(Marek Vasut)
- Fix rcar inbound range alignment calculation to improve packing of
multiple entries (Marek Vasut)
- Update rcar MACCTLR setting to match documentation (Yoshihiro
Shimoda)
- Clear bit 0 of MACCTLR before PCIETCTLR.CFINIT per manual
(Yoshihiro Shimoda)
- Add Marek Vasut and Yoshihiro Shimoda as R-Car maintainers (Simon
Horman)
Rockchip host bridge driver:
- Make rockchip 0V9 and 1V8 power regulators non-optional (Robin
Murphy)
Socionext UniPhier host bridge driver:
- Set uniphier to host (RC) mode always (Kunihiko Hayashi)
Endpoint drivers:
- Fix endpoint driver sign extension problem when shifting page
number to phys_addr_t (Alan Mikhak)
Misc:
- Add NumaChip SPDX header (Krzysztof Wilczynski)
- Replace EXTRA_CFLAGS with ccflags-y (Krzysztof Wilczynski)
- Remove unused includes (Krzysztof Wilczynski)
- Removed unused sysfs attribute groups (Ben Dooks)
- Remove PTM and ASPM dependencies on PCIEPORTBUS (Bjorn Helgaas)
- Add PCIe Link Control 2 register field definitions to replace magic
numbers in AMDGPU and Radeon CIK/SI (Bjorn Helgaas)
- Fix incorrect Link Control 2 Transmit Margin usage in AMDGPU and
Radeon CIK/SI PCIe Gen3 link training (Bjorn Helgaas)
- Use pcie_capability_read_word() instead of pci_read_config_word()
in AMDGPU and Radeon CIK/SI (Frederick Lawler)
- Remove unused pci_irq_get_node() Greg Kroah-Hartman)
- Make asm/msi.h mandatory and simplify PCI_MSI_IRQ_DOMAIN Kconfig
(Palmer Dabbelt, Michal Simek)
- Read all 64 bits of Switchtec part_event_bitmap (Logan Gunthorpe)
- Fix erroneous intel-iommu dependency on CONFIG_AMD_IOMMU (Bjorn
Helgaas)
- Fix bridge emulation big-endian support (Grzegorz Jaszczyk)
- Fix dwc find_next_bit() usage (Niklas Cassel)
- Fix pcitest.c fd leak (Hewenliang)
- Fix typos and comments (Bjorn Helgaas)
- Fix Kconfig whitespace errors (Krzysztof Kozlowski)"
* tag 'pci-v5.5-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (160 commits)
PCI: Remove PCI_MSI_IRQ_DOMAIN architecture whitelist
asm-generic: Make msi.h a mandatory include/asm header
Revert "nvme: Add quirk for Kingston NVME SSD running FW E8FK11.T"
PCI/MSI: Fix incorrect MSI-X masking on resume
PCI/MSI: Move power state check out of pci_msi_supported()
PCI/MSI: Remove unused pci_irq_get_node()
PCI: hv: Avoid a kmemleak false positive caused by the hbus buffer
PCI: hv: Change pci_protocol_version to per-hbus
PCI: hv: Add hibernation support
PCI: hv: Reorganize the code in preparation of hibernation
MAINTAINERS: Remove Keith from VMD maintainer
PCI/ASPM: Remove PCIEASPM_DEBUG Kconfig option and related code
PCI/ASPM: Add sysfs attributes for controlling ASPM link states
PCI: Fix indentation
drm/radeon: Prefer pcie_capability_read_word()
drm/radeon: Replace numbers with PCI_EXP_LNKCTL2 definitions
drm/radeon: Correct Transmit Margin masks
drm/amdgpu: Prefer pcie_capability_read_word()
PCI: uniphier: Set mode register to host mode
drm/amdgpu: Replace numbers with PCI_EXP_LNKCTL2 definitions
...
If this flag is set, applications can be certain that any data for
async offload has been consumed when the kernel has consumed the
SQE.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This is mostly update of the usual drivers: aacraid, ufs, zfcp,
NCR5380, lpfc, qla2xxx, smartpqi, hisi_sas, target, mpt3sas, pm80xx
plus a whole load of minor updates and fixes. The two major core
changes are Al Viro's reworking of sg's handling of copy to/from user,
Ming Lei's removal of the host busy counter to avoid contention in the
multiqueue case and Damien Le Moal's fixing of residual tracking
across error handling.
Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
-----BEGIN PGP SIGNATURE-----
iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCXeKvHCYcamFtZXMuYm90
dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishQJMAQDAjlAi
SNfbyndMqyf+rZGWufDI+43Up1VvW9GeWJHeDwEAxfO5XZsCks2uT8UxXhpEp9L7
HkiUww3zbcgl0FWFkUM=
=cdVU
-----END PGP SIGNATURE-----
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI updates from James Bottomley:
"This is mostly update of the usual drivers: aacraid, ufs, zfcp,
NCR5380, lpfc, qla2xxx, smartpqi, hisi_sas, target, mpt3sas, pm80xx
plus a whole load of minor updates and fixes.
The major core changes are Al Viro's reworking of sg's handling of
copy to/from user, Ming Lei's removal of the host busy counter to
avoid contention in the multiqueue case and Damien Le Moal's fixing of
residual tracking across error handling"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (251 commits)
scsi: bnx2fc: timeout calculation invalid for bnx2fc_eh_abort()
scsi: target: core: Fix a pr_debug() argument
scsi: iscsi: Don't send data to unbound connection
scsi: target: iscsi: Wait for all commands to finish before freeing a session
scsi: target: core: Release SPC-2 reservations when closing a session
scsi: target: core: Document target_cmd_size_check()
scsi: bnx2i: fix potential use after free
Revert "scsi: qla2xxx: Fix memory leak when sending I/O fails"
scsi: NCR5380: Add disconnect_mask module parameter
scsi: NCR5380: Unconditionally clear ICR after do_abort()
scsi: NCR5380: Call scsi_set_resid() on command completion
scsi: scsi_debug: num_tgts must be >= 0
scsi: lpfc: use hdwq assigned cpu for allocation
scsi: arcmsr: fix indentation issues
scsi: qla4xxx: fix double free bug
scsi: pm80xx: Modified the logic to collect fatal dump
scsi: pm80xx: Tie the interrupt name to the module instance
scsi: pm80xx: Controller fatal error through sysfs
scsi: pm80xx: Do not request 12G sas speeds
scsi: pm80xx: Cleanup command when a reset times out
...
Including:
- Conversion of the AMD IOMMU driver to use the dma-iommu code
for imlementing the DMA-API. This gets rid of quite some code
in the driver itself, but also has some potential for
regressions (non are known at the moment).
- Support for the Qualcomm SMMUv2 implementation in the SDM845
SoC. This also includes some firmware interface changes, but
those are acked by the respective maintainers.
- Preparatory work to support two distinct page-tables per
domain in the ARM-SMMU driver
- Power management improvements for the ARM SMMUv2
- Custom PASID allocator support
- Multiple PCI DMA alias support for the AMD IOMMU driver
- Adaption of the Mediatek driver to the changed IO/TLB flush
interface of the IOMMU core code.
- Preparatory patches for the Renesas IOMMU driver to support
future hardware.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAl3hCC4ACgkQK/BELZcB
GuN9QQ/+LFh2TdhtiemIakA/19nv1FTP719uje7vjX4gGBGD++NEzW7mBcAXSEnD
rBta1GsD6N8h0fdT53Nw8cezQ1ldBomKG3j+mzcju7TcuRwebhCEQaxh2iWy+I6g
cp6HxTu3G0E6Zy7wd+MWyJzvXa7MXV2p8iCDs7Dp8yEow+c55b4LAIoeRWx3rjsT
rat29MuJ8TGLP6vOYHcpI+REGfda4rsog75980RIoOEuqRjMG6JPj9clPeakSNtQ
Rl1EtgrDskbRCgDSujbzDMHAYRUKvdCuTuTM1De/GQO+GWYsOtzqBHkct67sGn9I
H518Be9m4xfYyyktVM6K9bSpxzCOtor+u6LFOejufJN/7vL2qtePZX7EHL/ks8zh
Mn80H/1ch1UcFcF9p7V7QCMUSyaiX/VWhgwWIdPf3CGrKVaLnQ8mkB82Zf0VNuQT
OzcfJcVF+skhDkXdFL5xUkQtqqTHhpaK2CzvvTDAsR1KXMCc6mH/MT/9m+mOFQFK
P+klgGdU5rVniru10k4pamT5LlLubRV0NBpaAiGr2R3dfyYyiS/D9FBSLanqO+JM
AgSnmOSbl7y927DxufkVPH8M7TxSdtQVo7VoQFjSWE8B9bh4qU6MVV30enLvY0Fj
g4DP+8srOvY0vNsWNiBe2JpldABGEAbumFt78g1WV2tFi1d/NUI=
=ntaE
-----END PGP SIGNATURE-----
Merge tag 'iommu-updates-v5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull iommu updates from Joerg Roedel:
- Conversion of the AMD IOMMU driver to use the dma-iommu code for
imlementing the DMA-API. This gets rid of quite some code in the
driver itself, but also has some potential for regressions (non are
known at the moment).
- Support for the Qualcomm SMMUv2 implementation in the SDM845 SoC.
This also includes some firmware interface changes, but those are
acked by the respective maintainers.
- Preparatory work to support two distinct page-tables per domain in
the ARM-SMMU driver
- Power management improvements for the ARM SMMUv2
- Custom PASID allocator support
- Multiple PCI DMA alias support for the AMD IOMMU driver
- Adaption of the Mediatek driver to the changed IO/TLB flush interface
of the IOMMU core code.
- Preparatory patches for the Renesas IOMMU driver to support future
hardware.
* tag 'iommu-updates-v5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (62 commits)
iommu/rockchip: Don't provoke WARN for harmless IRQs
iommu/vt-d: Turn off translations at shutdown
iommu/vt-d: Check VT-d RMRR region in BIOS is reported as reserved
iommu/arm-smmu: Remove duplicate error message
iommu/arm-smmu-v3: Don't display an error when IRQ lines are missing
iommu/ipmmu-vmsa: Add utlb_offset_base
iommu/ipmmu-vmsa: Add helper functions for "uTLB" registers
iommu/ipmmu-vmsa: Calculate context registers' offset instead of a macro
iommu/ipmmu-vmsa: Add helper functions for MMU "context" registers
iommu/ipmmu-vmsa: tidyup register definitions
iommu/ipmmu-vmsa: Remove all unused register definitions
iommu/mediatek: Reduce the tlb flush timeout value
iommu/mediatek: Get rid of the pgtlock
iommu/mediatek: Move the tlb_sync into tlb_flush
iommu/mediatek: Delete the leaf in the tlb_flush
iommu/mediatek: Use gather to achieve the tlb range flush
iommu/mediatek: Add a new tlb_lock for tlb_flush
iommu/mediatek: Correct the flush_iotlb_all callback
iommu/io-pgtable-arm: Rename IOMMU_QCOM_SYS_CACHE and improve doc
iommu/io-pgtable-arm: Rationalise MAIR handling
...
Pull HID updates from Jiri Kosina:
- Support for Logitech G15 (Hans de Goede)
- HID parser improvements, improving support for some devices; e.g.
Windows Precision Touchpad, products from Primax, etc. (Blaž
Hrastnik, Candle Sun)
- robustification of tablet mode support in google-whiskers driver
(Dmitry Torokhov)
- assorted small fixes, device-specific quirks and device ID additions
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid: (23 commits)
HID: rmi: Check that the RMI_STARTED bit is set before unregistering the RMI transport device
HID: quirks: remove hid-led devices from hid_have_special_driver
HID: Improve Windows Precision Touchpad detection.
HID: i2c-hid: Reset ALPS touchpads on resume
HID: i2c-hid: fix no irq after reset on raydium 3118
HID: logitech-hidpp: Silence intermittent get_battery_capacity errors
HID: i2c-hid: remove orphaned member sleep_delay
HID: quirks: Add quirk for HP MSU1465 PIXART OEM mouse
HID: core: check whether Usage Page item is after Usage ID items
HID: intel-ish-hid: Spelling s/diconnect/disconnect/
HID: google: Detect base folded usage instead of hard-coding whiskers
HID: logitech: Add depends on LEDS_CLASS to Logitech Kconfig entry
HID: lg-g15: Add support for the G510's M1-M3 and MR LEDs
HID: lg-g15: Add support for controlling the G510's RGB backlight
HID: lg-g15: Add support for the G510 keyboards' gaming keys
HID: lg-g15: Add support for the M1-M3 and MR LEDs
HID: lg-g15: Add keyboard and LCD backlight control
HID: Add driver for Logitech gaming keyboards (G15, G15 v2)
Input: Add event-codes for macro keys found on various keyboards
HID: hidraw: replace printk() with corresponding pr_xx() variant
...
Core changes:
- Expose pull up/down flags for the GPIO character device to
userspace. After clear input from the RaspberryPi and Beagle
communities, it has been established that prototyping,
industrial automation and make communities strongly need
this feature, and as we want people to use the character
device, we have implemented the simple pull up/down
interface for GPIO lines. This means we can specify that
a (chip-specific) pull up/down resistor can be enabled,
but does not offer fine-grained control such as cases
where the resistance of the same pull resistor can be
controlled (yet).
- Introduce devm_fwnode_gpiod_get_index() and start to phase out
the old symbol devm_fwnode_get_index_gpiod_from_child().
- A bit of documentation clean-up work.
- Introduce a define for GPIO line directions and deploy it
in all GPIO drivers in the drivers/gpio directory.
- Add a special callback to populate pin ranges when
cooperating with the pin control subsystem and registering
ranges as part of adding a gpiolib driver and a
gpio_irq_chip driver at the same time. This is also
deployed in the Intel Merrifield driver.
New drivers:
- RDA Micro GPIO controller.
- XGS-iproc GPIO driver.
Driver improvements:
- Wake event and debounce support on the Tegra 186 driver.
- Finalize the Aspeed SGPIO driver.
- MPC8xxx uses a normal IRQ handler rather than a chained
handler.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEElDRnuGcz/wPCXQWMQRCzN7AZXXMFAl3hIDcACgkQQRCzN7AZ
XXOOyw/8DcaBV6j3EZPDS+b+N74/flNf9JitdJRCtUPn8mjdm+uKNPxbtL/znZc8
zd3rlpiZOqHy8klB3gOPJsJhgXY9QX/b+F5j8fHZvu0DACugRndCqMJ7wrgwUiPn
2ni6KJmz6z5urhcfaAIgyinTWOMegvSVfqjISaUCRCAg3F9dIeQoulRvTPk2ybvv
f31jAGemGbvvQPpd81SYCxuTbMg+jxBIgnOCaX+VmUaBLzh8J2W4It3Myp6M4Sbg
Td6QU4b2J2Q0quk/Dc7c4saT+qRODkg2syPKV2YqmWwdLRDxZyKESMdkCXLWlWJU
fP+KZ4lDxhCaOAYUrY2sEAYMw4E8MzrfeWikdIe0nk0DWqNQhWvDyzsNsB90XGFb
aGgeCPH2W1MdE6usPhLidBaHbLeowzndw5BiEl0UCJUqz7tzTCd5iMIAhoSU/Sr5
ymO8J45G9rdx5pscA3cXhpR/PmqaETYQ/uNrLuxTdI4F4xY12+M0vPrV8z3oDPxB
U/uL0v6HndDcFAavQQiMd9eL6Hocirnn+Z2xFut3nOznHY96ozXSnZb3lzvH/kqI
Du2C8geboVcZsiZJTKVN1zxnfIA8oDauzTOEpGFbIGFhmy0zt4RRRptL4W8NxeFm
KCOk/HqlGeuqJ49epta3mqhUC0MSASA9fdicCdiDqvw+puEznwU=
=o13I
-----END PGP SIGNATURE-----
Merge tag 'gpio-v5.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio
Pull GPIO updates from Linus Walleij:
"This is the bulk of GPIO changes for the v5.5 kernel cycle
Core changes:
- Expose pull up/down flags for the GPIO character device to
userspace.
After clear input from the RaspberryPi and Beagle communities, it
has been established that prototyping, industrial automation and
make communities strongly need this feature, and as we want people
to use the character device, we have implemented the simple pull
up/down interface for GPIO lines.
This means we can specify that a (chip-specific) pull up/down
resistor can be enabled, but does not offer fine-grained control
such as cases where the resistance of the same pull resistor can be
controlled (yet).
- Introduce devm_fwnode_gpiod_get_index() and start to phase out the
old symbol devm_fwnode_get_index_gpiod_from_child().
- A bit of documentation clean-up work.
- Introduce a define for GPIO line directions and deploy it in all
GPIO drivers in the drivers/gpio directory.
- Add a special callback to populate pin ranges when cooperating with
the pin control subsystem and registering ranges as part of adding
a gpiolib driver and a gpio_irq_chip driver at the same time. This
is also deployed in the Intel Merrifield driver.
New drivers:
- RDA Micro GPIO controller.
- XGS-iproc GPIO driver.
Driver improvements:
- Wake event and debounce support on the Tegra 186 driver.
- Finalize the Aspeed SGPIO driver.
- MPC8xxx uses a normal IRQ handler rather than a chained handler"
* tag 'gpio-v5.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: (64 commits)
gpio: Add TODO item for regmap helper
Documentation: gpio: driver.rst: Fix warnings
gpio: of: Fix bogus reference to gpiod_get_count()
gpiolib: Grammar s/manager/managed/
gpio: lynxpoint: Setup correct IRQ handlers
MAINTAINERS: Replace my email by one @kernel.org
gpiolib: acpi: Make acpi_gpiochip_alloc_event always return AE_OK
gpio/mpc8xxx: fix qoriq GPIO reading
gpio: mpc8xxx: Don't overwrite default irq_set_type callback
gpiolib: acpi: Print pin number on acpi_gpiochip_alloc_event errors
gpiolib: fix coding style in gpiod_hog()
drm/bridge: ti-tfp410: switch to using fwnode_gpiod_get_index()
gpio: merrifield: Pass irqchip when adding gpiochip
gpio: merrifield: Add GPIO <-> pin mapping ranges via callback
gpiolib: Introduce ->add_pin_ranges() callback
gpio: mmio: remove untrue leftover comment
gpio: em: Use platform_get_irq() to obtain interrupts
gpio: tegra186: Add debounce support
gpio: tegra186: Program interrupt route mapping
gpio: tegra186: Derive register offsets from bank/port
...
This is a series of cleanups for the y2038 work, mostly intended
for namespace cleaning: the kernel defines the traditional
time_t, timeval and timespec types that often lead to y2038-unsafe
code. Even though the unsafe usage is mostly gone from the kernel,
having the types and associated functions around means that we
can still grow new users, and that we may be missing conversions
to safe types that actually matter.
There are still a number of driver specific patches needed to
get the last users of these types removed, those have been
submitted to the respective maintainers.
Link: https://lore.kernel.org/lkml/20191108210236.1296047-1-arnd@arndb.de/
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJd3D+wAAoJEJpsee/mABjZfdcQAJvl6e+4ddKoDMIVJqVCE25N
meFRgA7S8jy6BefEVeUgI8TxK+amGO36szMBUEnZxSSxq9u+gd13m5bEK6Xq/ov7
4KTAiA3Irm/W5FBTktu1zc5ROIra1Xj7jLdubf8wEC3viSXIXB3+68Y28iBN7D2O
k9kSpwINC5lWeC8guZy2I+2yc4ywUEXao9nVh8C/J+FQtU02TcdLtZop9OhpAa8u
U19VVH3WHkQI7ZfLvBTUiYK6tlYTiYCnpr8l6sm850CnVv1fzBW+DzmVhPJ6FdFd
4m5staC0sQ6gVqtjVMBOtT5CdzREse6hpwbKo2GRWFroO5W9tljMOJJXHvv/f6kz
DxrpUmj37JuRbqAbr8KDmQqPo6M2CRkxFxjol1yh5ER63u1xMwLm/PQITZIMDvPO
jrFc2C2SdM2E9bKP/RMCVoKSoRwxCJ5IwJ2AF237rrU0sx/zB2xsrOGssx5CWEgc
3bbk6tDQujJJubnCfgRy1tTxpLZOHEEKw8YhFLLbR2LCtA9pA/0rfLLad16cjA5e
5jIHxfsFc23zgpzrJeB7kAF/9xgu1tlA5BotOs3VBE89LtWOA9nK5dbPXng6qlUe
er3xLCfS38ovhUw6DusQpaYLuaYuLM7DKO4iav9kuTMcY9GkbPk7vDD3KPGh2goy
hY5cSM8+kT1q/THLnUBH
=Bdbv
-----END PGP SIGNATURE-----
Merge tag 'y2038-cleanups-5.5' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground
Pull y2038 cleanups from Arnd Bergmann:
"y2038 syscall implementation cleanups
This is a series of cleanups for the y2038 work, mostly intended for
namespace cleaning: the kernel defines the traditional time_t, timeval
and timespec types that often lead to y2038-unsafe code. Even though
the unsafe usage is mostly gone from the kernel, having the types and
associated functions around means that we can still grow new users,
and that we may be missing conversions to safe types that actually
matter.
There are still a number of driver specific patches needed to get the
last users of these types removed, those have been submitted to the
respective maintainers"
Link: https://lore.kernel.org/lkml/20191108210236.1296047-1-arnd@arndb.de/
* tag 'y2038-cleanups-5.5' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground: (26 commits)
y2038: alarm: fix half-second cut-off
y2038: ipc: fix x32 ABI breakage
y2038: fix typo in powerpc vdso "LOPART"
y2038: allow disabling time32 system calls
y2038: itimer: change implementation to timespec64
y2038: move itimer reset into itimer.c
y2038: use compat_{get,set}_itimer on alpha
y2038: itimer: compat handling to itimer.c
y2038: time: avoid timespec usage in settimeofday()
y2038: timerfd: Use timespec64 internally
y2038: elfcore: Use __kernel_old_timeval for process times
y2038: make ns_to_compat_timeval use __kernel_old_timeval
y2038: socket: use __kernel_old_timespec instead of timespec
y2038: socket: remove timespec reference in timestamping
y2038: syscalls: change remaining timeval to __kernel_old_timeval
y2038: rusage: use __kernel_old_timeval
y2038: uapi: change __kernel_time_t to __kernel_old_time_t
y2038: stat: avoid 'time_t' in 'struct stat'
y2038: ipc: remove __kernel_time_t reference from headers
y2038: vdso: powerpc: avoid timespec references
...
As part of the cleanup of some remaining y2038 issues, I came to
fs/compat_ioctl.c, which still has a couple of commands that need support
for time64_t.
In completely unrelated work, I spent time on cleaning up parts of this
file in the past, moving things out into drivers instead.
After Al Viro reviewed an earlier version of this series and did a lot
more of that cleanup, I decided to try to completely eliminate the rest
of it and move it all into drivers.
This series incorporates some of Al's work and many patches of my own,
but in the end stops short of actually removing the last part, which is
the scsi ioctl handlers. I have patches for those as well, but they need
more testing or possibly a rewrite.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJdsHCdAAoJEJpsee/mABjZtYkP/1JGl3jFv3Iq/5BCdPkaePP1
RtMJRNfURgK3GeuHUui330PvVjI/pLWXU/VXMK2MPTASpJLzYz3uCaZrpVWEMpDZ
+ImzGmgJkITlW1uWU3zOcQhOxTyb1hCZ0Ci+2xn9QAmyOL7prXoXCXDWv3h6iyiF
lwG+nW+HNtyx41YG+9bRfKNoG0ZJ+nkJ70BV6u0acQHXWn7Xuupa9YUmBL87hxAL
6dlJfLTJg6q8QSv/Q6LxslfWk2Ti8OOJZOwtFM5R8Bgl0iUcvshiRCKfv/3t9jXD
dJNvF1uq8z+gracWK49Qsfq5dnZ2ZxHFUo9u0NjbCrxNvWH/sdvhbaUBuJI75seH
VIznCkdxFhrqitJJ8KmxANxG08u+9zSKjSlxG2SmlA4qFx/AoStoHwQXcogJscNb
YIXYKmWBvwPzYu09QFAXdHFPmZvp/3HhMWU6o92lvDhsDwzkSGt3XKhCJea4DCaT
m+oCcoACqSWhMwdbJOEFofSub4bY43s5iaYuKes+c8O261/Dwg6v/pgIVez9mxXm
TBnvCsotq5m8wbwzv99eFqGeJH8zpDHrXxEtRR5KQqMqjLq/OQVaEzmpHZTEuK7n
e/V/PAKo2/V63g4k6GApQXDxnjwT+m0aWToWoeEzPYXS6KmtWC91r4bWtslu3rdl
bN65armTm7bFFR32Avnu
=lgCl
-----END PGP SIGNATURE-----
Merge tag 'compat-ioctl-5.5' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground
Pull removal of most of fs/compat_ioctl.c from Arnd Bergmann:
"As part of the cleanup of some remaining y2038 issues, I came to
fs/compat_ioctl.c, which still has a couple of commands that need
support for time64_t.
In completely unrelated work, I spent time on cleaning up parts of
this file in the past, moving things out into drivers instead.
After Al Viro reviewed an earlier version of this series and did a lot
more of that cleanup, I decided to try to completely eliminate the
rest of it and move it all into drivers.
This series incorporates some of Al's work and many patches of my own,
but in the end stops short of actually removing the last part, which
is the scsi ioctl handlers. I have patches for those as well, but they
need more testing or possibly a rewrite"
* tag 'compat-ioctl-5.5' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground: (42 commits)
scsi: sd: enable compat ioctls for sed-opal
pktcdvd: add compat_ioctl handler
compat_ioctl: move SG_GET_REQUEST_TABLE handling
compat_ioctl: ppp: move simple commands into ppp_generic.c
compat_ioctl: handle PPPIOCGIDLE for 64-bit time_t
compat_ioctl: move PPPIOCSCOMPRESS to ppp_generic
compat_ioctl: unify copy-in of ppp filters
tty: handle compat PPP ioctls
compat_ioctl: move SIOCOUTQ out of compat_ioctl.c
compat_ioctl: handle SIOCOUTQNSD
af_unix: add compat_ioctl support
compat_ioctl: reimplement SG_IO handling
compat_ioctl: move WDIOC handling into wdt drivers
fs: compat_ioctl: move FITRIM emulation into file systems
gfs2: add compat_ioctl support
compat_ioctl: remove unused convert_in_user macro
compat_ioctl: remove last RAID handling code
compat_ioctl: remove /dev/raw ioctl translation
compat_ioctl: remove PCI ioctl translation
compat_ioctl: remove joystick ioctl translation
...
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAl3dbM4UHHBhdWxAcGF1
bC1tb29yZS5jb20ACgkQ6iDy2pc3iXMagw/+MiOlIHFykjK0NGOyUbXR7AwjdrHz
1Fgkh7VZCAHMfCcdlljIIkYe7P+ybfdK51E1QLBiTPGh353JJRAvrjFbDcIyT4kf
u7AVddUeT0QQefFs39ZFWTV0mvJjDBfjFkmiL3cdY9ulx4ZX8V426qjyl8KIwTHe
YkQF3pYMmO28G1SfZu298zTmrFRA10FezxCUbBRZTTE9FcD7mnGQRB7w/wY9t0H1
ebIDgDA0EBh3oznGD3qxD63b5ULdSrImTzvSmEfhzZZsoZB4XGO5MOnLRqgBwFbT
qfRSRfHVl+1JtDHCU43zOUlzScAsff5mvfVNdDMLAFtGGSjDGxgxKNyLwp8+5wmH
GzvB99QQECvCN+gbaedm6adWBGzi7vpoCcgfqY0UPLYvCqsNFbZw4U/iu5ONKWy/
cEGVUGzHf0pWofugDIJfGQuRt6iS2XT9Ode6+QMx8OiC3auZcluhTuJSxMQIhFZ1
5XmoHOQddBwlalmIx8fsIHSo6xsAjNFwTOikEFVzUB+ECR2Urs8eHvQj+d94xz6e
q9LrNkt/eVIDiI+PeP+UBD5IJlfmRSoyUd/mIDFMfmqMBubBSQl70Eyt3dUdt1Bz
0PBs6xjYztpgk3Q7s35TMn8EvDcEksG0WEoFM5fEohYPLWQf8tJ5BbyYJJqc0sod
ExlXu1lPfg9ppKc=
=fc7R
-----END PGP SIGNATURE-----
Merge tag 'audit-pr-20191126' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit
Pull audit updates from Paul Moore:
"Audit is back for v5.5, albeit with only two patches:
- Allow for the auditing of suspicious O_CREAT usage via the new
AUDIT_ANOM_CREAT record.
- Remove a redundant if-conditional check found during code analysis.
It's a minor change, but when the pull request is only two patches
long, you need filler in the pull request email"
[ Heh on the pull request filler. I wish more people tried to write
better pull request messages, even if maybe it's not worth it for the
trivial cases ;^) - Linus ]
* tag 'audit-pr-20191126' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
audit: remove redundant condition check in kauditd_thread()
audit: Report suspicious O_CREAT usage
Highlights:
- Infrastructure for secure boot on some bare metal Power9 machines. The
firmware support is still in development, so the code here won't actually
activate secure boot on any existing systems.
- A change to xmon (our crash handler / pseudo-debugger) to restrict it to
read-only mode when the kernel is lockdown'ed, otherwise it's trivial to drop
into xmon and modify kernel data, such as the lockdown state.
- Support for KASLR on 32-bit BookE machines (Freescale / NXP).
- Fixes for our flush_icache_range() and __kernel_sync_dicache() (VDSO) to work
with memory ranges >4GB.
- Some reworks of the pseries CMM (Cooperative Memory Management) driver to
make it behave more like other balloon drivers and enable some cleanups of
generic mm code.
- A series of fixes to our hardware breakpoint support to properly handle
unaligned watchpoint addresses.
Plus a bunch of other smaller improvements, fixes and cleanups.
Thanks to:
Alastair D'Silva, Andrew Donnellan, Aneesh Kumar K.V, Anthony Steinhauser,
Cédric Le Goater, Chris Packham, Chris Smart, Christophe Leroy, Christopher M.
Riedl, Christoph Hellwig, Claudio Carvalho, Daniel Axtens, David Hildenbrand,
Deb McLemore, Diana Craciun, Eric Richter, Geert Uytterhoeven, Greg
Kroah-Hartman, Greg Kurz, Gustavo L. F. Walbon, Hari Bathini, Harish, Jason
Yan, Krzysztof Kozlowski, Leonardo Bras, Mathieu Malaterre, Mauro S. M.
Rodrigues, Michal Suchanek, Mimi Zohar, Nathan Chancellor, Nathan Lynch, Nayna
Jain, Nick Desaulniers, Oliver O'Halloran, Qian Cai, Rasmus Villemoes, Ravi
Bangoria, Sam Bobroff, Santosh Sivaraj, Scott Wood, Thomas Huth, Tyrel
Datwyler, Vaibhav Jain, Valentin Longchamp, YueHaibing.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAl3hBycTHG1wZUBlbGxl
cm1hbi5pZC5hdQAKCRBR6+o8yOGlgApBEACk91MEQDYJ9MF9I6uN+85qb5p4pMsp
rGzqnpt+XFidbDAc3eP63pYfIDSo3jtkQ2YL7shAnDOTvkO0md+Vqkl9Aq/G6FIf
lDBlwbgkXMSxS/O2Lpvfn4NZAoK6dKmiV55LSgfliM62X3e2Saeg6TR55wBTgJ6/
SlYPDwZfcVHOAiFS3UmfB+hkiIZk+AI5Zr5VAZvT2ZmeH36yAWkq4JgJI1uAk6m1
/7iCnlfUjx/nl/BhnA3kjjmAgGCJ5s/WuVgwFMz47XpMBWGBhLWpMh/NqDTFb8ca
kpiVQoVPQe2xyO3pL/kOwBy6sii26ftfHDhLKMy1hJdEhVQzS5LerPIMeh1qsU8Q
hV/Cj+jfsrS/vBDOehj3jwx93t+861PmTOqgLnpYQ6Ltrt+2B/74+fufGMHE1kI3
Ffo7xvNw4sw6bSziDxDFqUx2P1dFN5D5EJsJsYM98ekkVAAkzNqCDRvfD2QI8Pif
VXWPYXqtNJTrVPJA0D7Yfo9FDNwhANd0f1zi7r/U5mVXBFUyKOlGqTQSkXgMrVeK
3I7wHPOVGgdA5UUkfcd3pcuqsY081U9E//o5PUfj8ybO5JCwly8NoatbG+xHmKia
a72uJT8MjCo9mGCHKDrwi9l/kqms6ZSv8RP+yMhGuB52YoiGc6PpVyab5jXIUd1N
yTtBlC0YGW1JYw==
=JHzg
-----END PGP SIGNATURE-----
Merge tag 'powerpc-5.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
"Highlights:
- Infrastructure for secure boot on some bare metal Power9 machines.
The firmware support is still in development, so the code here
won't actually activate secure boot on any existing systems.
- A change to xmon (our crash handler / pseudo-debugger) to restrict
it to read-only mode when the kernel is lockdown'ed, otherwise it's
trivial to drop into xmon and modify kernel data, such as the
lockdown state.
- Support for KASLR on 32-bit BookE machines (Freescale / NXP).
- Fixes for our flush_icache_range() and __kernel_sync_dicache()
(VDSO) to work with memory ranges >4GB.
- Some reworks of the pseries CMM (Cooperative Memory Management)
driver to make it behave more like other balloon drivers and enable
some cleanups of generic mm code.
- A series of fixes to our hardware breakpoint support to properly
handle unaligned watchpoint addresses.
Plus a bunch of other smaller improvements, fixes and cleanups.
Thanks to: Alastair D'Silva, Andrew Donnellan, Aneesh Kumar K.V,
Anthony Steinhauser, Cédric Le Goater, Chris Packham, Chris Smart,
Christophe Leroy, Christopher M. Riedl, Christoph Hellwig, Claudio
Carvalho, Daniel Axtens, David Hildenbrand, Deb McLemore, Diana
Craciun, Eric Richter, Geert Uytterhoeven, Greg Kroah-Hartman, Greg
Kurz, Gustavo L. F. Walbon, Hari Bathini, Harish, Jason Yan, Krzysztof
Kozlowski, Leonardo Bras, Mathieu Malaterre, Mauro S. M. Rodrigues,
Michal Suchanek, Mimi Zohar, Nathan Chancellor, Nathan Lynch, Nayna
Jain, Nick Desaulniers, Oliver O'Halloran, Qian Cai, Rasmus Villemoes,
Ravi Bangoria, Sam Bobroff, Santosh Sivaraj, Scott Wood, Thomas Huth,
Tyrel Datwyler, Vaibhav Jain, Valentin Longchamp, YueHaibing"
* tag 'powerpc-5.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (144 commits)
powerpc/fixmap: fix crash with HIGHMEM
x86/efi: remove unused variables
powerpc: Define arch_is_kernel_initmem_freed() for lockdep
powerpc/prom_init: Use -ffreestanding to avoid a reference to bcmp
powerpc: Avoid clang warnings around setjmp and longjmp
powerpc: Don't add -mabi= flags when building with Clang
powerpc: Fix Kconfig indentation
powerpc/fixmap: don't clear fixmap area in paging_init()
selftests/powerpc: spectre_v2 test must be built 64-bit
powerpc/powernv: Disable native PCIe port management
powerpc/kexec: Move kexec files into a dedicated subdir.
powerpc/32: Split kexec low level code out of misc_32.S
powerpc/sysdev: drop simple gpio
powerpc/83xx: map IMMR with a BAT.
powerpc/32s: automatically allocate BAT in setbat()
powerpc/ioremap: warn on early use of ioremap()
powerpc: Add support for GENERIC_EARLY_IOREMAP
powerpc/fixmap: Use __fix_to_virt() instead of fix_to_virt()
powerpc/8xx: use the fixmapped IMMR in cpm_reset()
powerpc/8xx: add __init to cpm1 init functions
...
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl3f/C0QHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgprUjD/4t5ituIXmfPDulW4178LchRbAdC2FquGeU
Okft4laeQsJ8ucTBm3RAziVDNnkeJ0mLy5JTMpdPwKW+1F/d8jPuDeTlmmRyKGyW
vt2NlI9fKaOt8cU+LPp3MiGIamleZFOEPsRHU03HYASFcRKQwAswgjS2DZnR7oTK
66XGB9CdwREh1mM63wERYX07+t+ylocTClcLQDCi99H4HI5ScsJXB5aSqpombycy
XujZGlEAUT6WLp0HAwihhAnQIu/fEw+tXMWLvozphIP9xZGpk+a2nbd2WxTCdLZy
MoksoCpl76RmYPuZ+TC1L+tHf2KqGHa/mo64L/6v15QS8hROw3ZTMdJI0mEC/90g
5JwWuBqWOcRiMcV1Uwkz5OYL3KYQOWebS+FBbIc+2jqBlXJeN+1R/Ik5HP8xtGaE
5p317/q09h32piZN0c+BmkXCdnAYZeBez0EjSXf6j3PW3LAqmLOjxc0yA2OmwFWj
iudOfBOg48HtHkMaNrg2f+0THMP8zHqUVoRh96BKCSjnOXmbxGFt3ACHvb+VS0CX
RmlVOcaq+SzL1EsT805Hb6N3x7mK2l9CoanGfLxYXQZlOzD6TcRdlBzs9JCgqiAs
3jk5nAq9wtYvigAkVarMw/LAJNatEO4sy0AHyWJwAcfOsVJquNvhhokT8VxOcVZs
pMxUhUYKZA==
=HqiO
-----END PGP SIGNATURE-----
Merge tag 'for-5.5/io_uring-post-20191128' of git://git.kernel.dk/linux-block
Pull more io_uring updates from Jens Axboe:
"As mentioned in the first pull request, there was a later batch as
well. This contains fixes to the stuff that already went in, cleanups,
and a few later additions. In particular, this contains:
- Cleanups/fixes/unification of the submission and completion path
(Pavel,me)
- Linked timeouts improvements (Pavel,me)
- Error path fixes (me)
- Fix lookup window where cancellations wouldn't work (me)
- Improve DRAIN support (Pavel)
- Fix backlog flushing -EBUSY on submit (me)
- Add support for connect(2) (me)
- Fix for non-iter based fixed IO (Pavel)
- creds inheritance for async workers (me)
- Disable cmsg/ancillary data for sendmsg/recvmsg (me)
- Shrink io_kiocb to 3 cachelines (me)
- NUMA fix for io-wq (Jann)"
* tag 'for-5.5/io_uring-post-20191128' of git://git.kernel.dk/linux-block: (42 commits)
io_uring: make poll->wait dynamically allocated
io-wq: shrink io_wq_work a bit
io-wq: fix handling of NUMA node IDs
io_uring: use kzalloc instead of kcalloc for single-element allocations
io_uring: cleanup io_import_fixed()
io_uring: inline struct sqe_submit
io_uring: store timeout's sqe->off in proper place
net: disallow ancillary data for __sys_{send,recv}msg_file()
net: separate out the msghdr copy from ___sys_{send,recv}msg()
io_uring: remove superfluous check for sqe->off in io_accept()
io_uring: async workers should inherit the user creds
io-wq: have io_wq_create() take a 'data' argument
io_uring: fix dead-hung for non-iter fixed rw
io_uring: add support for IORING_OP_CONNECT
net: add __sys_connect_file() helper
io_uring: only return -EBUSY for submit on non-flushed backlog
io_uring: only !null ptr to io_issue_sqe()
io_uring: simplify io_req_link_next()
io_uring: pass only !null to io_req_find_next()
io_uring: remove io_free_req_find_next()
...
- Protect pci_reassign_bridge_resources() against concurrent
addition/removal (Benjamin Herrenschmidt)
- Fix bridge dma_ranges resource list cleanup (Rob Herring)
- Add PCI_STD_NUM_BARS for the number of standard BARs (Denis Efremov)
- Add "pci=hpmmiosize" and "pci=hpmmioprefsize" parameters to control the
MMIO and prefetchable MMIO window sizes of hotplug bridges
independently (Nicholas Johnson)
- Fix MMIO/MMIO_PREF window assignment that assigned more space than
desired (Nicholas Johnson)
- Only enforce bus numbers from bridge EA if the bridge has EA devices
downstream (Subbaraya Sundeep)
* pci/resource:
PCI: Do not use bus number zero from EA capability
PCI: Avoid double hpmemsize MMIO window assignment
PCI: Add "pci=hpmmiosize" and "pci=hpmmioprefsize" parameters
PCI: Add PCI_STD_NUM_BARS for the number of standard BARs
PCI: Fix missing bridge dma_ranges resource list cleanup
PCI: Protect pci_reassign_bridge_resources() against concurrent addition/removal
Add support for reset of secure guest via a new ioctl KVM_PPC_SVM_OFF.
This ioctl will be issued by QEMU during reset and includes the
the following steps:
- Release all device pages of the secure guest.
- Ask UV to terminate the guest via UV_SVM_TERMINATE ucall
- Unpin the VPA pages so that they can be migrated back to secure
side when guest becomes secure again. This is required because
pinned pages can't be migrated.
- Reinit the partition scoped page tables
After these steps, guest is ready to issue UV_ESM call once again
to switch to secure mode.
Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
[Implementation of uv_svm_terminate() and its call from
guest shutdown path]
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
[Unpinning of VPA pages]
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE+QmuaPwR3wnBdVwACF8+vY7k4RUFAl3dKsYACgkQCF8+vY7k
4RWmzBAAnIL1V0Jb6vJo9AJUmntDiO/3lVGjz+6bzvlhsjbPD0/6r1AY2ZRZxO97
U3TTnB9m1nRjtJaaZ5Lmx9kc9suB8u1s9HziTcKIQ971goSZapmrx3IKPK64jt5D
Uw6nzfinvAgRw0Rt0t6yd+yciw5bsqmGR5iIz/A29BXHnVu5P2TcGE8IM/oloxqV
ZQkff30jieSM3CXq9hQSoekJkipyCXN3xbKncVVHc4TEonvXSCcAQxC6tRulKpi/
Y/MOxB3v7Vh3snVe5lI3fBYPr8Iw8xpBTMXg7s5e4Sb9aCeZioF0cb2nLJFG8tw8
2TviW1Fktt/N1nwZreRgVMkl19fGm3IdVd+1ukWgXS4EvV3FFsx3FZMFgAebhvKZ
1SHxhGyC5yc5yDl8f9/yo/cyKRzeDj7h6SSD2TUKRmnKXWBin9cNr3CPNJDvTk6Q
rgb2B/Sbs00fZFihRdkt8n6SmHpARS4QBZiAKpHECZhmBKNqFLg050v0uzB1Z7D8
r7gK8NIWQORnmLWmHCa/kCAB8syUm8cW6LibWAvCbfN0FEaYdujFtvC+v0BQSTVJ
f9BNFwUs3mCJ6x30F2DN60f2z3G1ehuNPMxFBFJpa9sLgh2wEPVaqEakGWE2iMUP
Rv/cz0CJQweJutdhIAHtk3nVOQZeiAcqZMeHYemX9lL62Mjprmo=
=VN90
-----END PGP SIGNATURE-----
Merge tag 'media/v5.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media updates from Mauro Carvalho Chehab:
- uAPI documentation for stateless decoders
- Added a new CEC ioctl together with its documentation
- Improved IPU3 documentation
- New i2c drivers: hi556 and imx290
- Added support on Vivid driver for meta streams
- Added de-interlace support for sunxi subdriver
- Added a few new remote controler keymaps
- Added H.265 support for Sunxi Cedrus driver
- Another round of random driver cleanups, fixes and improvements
* tag 'media/v5.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (361 commits)
media: Revert "media: mtk-vcodec: Remove extra area allocation in an input buffer on encoding"
media: hantro: Set H264 FIELDPIC_FLAG_E flag correctly
media: hantro: Remove now unused H264 pic_size
media: hantro: Use output buffer width and height for H264 decoding
media: hantro: Reduce H264 extra space for motion vectors
media: hantro: Fix H264 motion vector buffer offset
media: ti-vpe: vpe: fix compatible to match bindings
media: dt-bindings: media: ti-vpe: Document VPE driver
media: zr364xx: remove redundant assigmnent to idx, clean up code
media: Documentation: media: *_DEFAULT targets for subdevs
media: hantro: Fix s_fmt for dynamic resolution changes
media: i2c: Use the correct style for SPDX License Identifier
media: siano: Use the correct style for SPDX License Identifier
media: vicodec: media_device_cleanup was called too early
media: vim2m: media_device_cleanup was called too early
media: cedrus: Increase maximum supported size
media: cedrus: Fix H264 4k support
media: cedrus: Properly signal size in mode register
media: v4l2-ctrl: Lock main_hdl on operations of requests_queued.
media: si470x-i2c: add missed operations in remove
...
Pull perf updates from Ingo Molnar:
"The main kernel side changes in this cycle were:
- Various Intel-PT updates and optimizations (Alexander Shishkin)
- Prohibit kprobes on Xen/KVM emulate prefixes (Masami Hiramatsu)
- Add support for LSM and SELinux checks to control access to the
perf syscall (Joel Fernandes)
- Misc other changes, optimizations, fixes and cleanups - see the
shortlog for details.
There were numerous tooling changes as well - 254 non-merge commits.
Here are the main changes - too many to list in detail:
- Enhancements to core tooling infrastructure, perf.data, libperf,
libtraceevent, event parsing, vendor events, Intel PT, callchains,
BPF support and instruction decoding.
- There were updates to the following tools:
perf annotate
perf diff
perf inject
perf kvm
perf list
perf maps
perf parse
perf probe
perf record
perf report
perf script
perf stat
perf test
perf trace
- And a lot of other changes: please see the shortlog and Git log for
more details"
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (279 commits)
perf parse: Fix potential memory leak when handling tracepoint errors
perf probe: Fix spelling mistake "addrees" -> "address"
libtraceevent: Fix memory leakage in copy_filter_type
libtraceevent: Fix header installation
perf intel-bts: Does not support AUX area sampling
perf intel-pt: Add support for decoding AUX area samples
perf intel-pt: Add support for recording AUX area samples
perf pmu: When using default config, record which bits of config were changed by the user
perf auxtrace: Add support for queuing AUX area samples
perf session: Add facility to peek at all events
perf auxtrace: Add support for dumping AUX area samples
perf inject: Cut AUX area samples
perf record: Add aux-sample-size config term
perf record: Add support for AUX area sampling
perf auxtrace: Add support for AUX area sample recording
perf auxtrace: Move perf_evsel__find_pmu()
perf record: Add a function to test for kernel support for AUX area sampling
perf tools: Add kernel AUX area sampling definitions
perf/core: Make the mlock accounting simple again
perf report: Jump to symbol source view from total cycles view
...
Pull networking updates from David Miller:
"Another merge window, another pull full of stuff:
1) Support alternative names for network devices, from Jiri Pirko.
2) Introduce per-netns netdev notifiers, also from Jiri Pirko.
3) Support MSG_PEEK in vsock/virtio, from Matias Ezequiel Vara
Larsen.
4) Allow compiling out the TLS TOE code, from Jakub Kicinski.
5) Add several new tracepoints to the kTLS code, also from Jakub.
6) Support set channels ethtool callback in ena driver, from Sameeh
Jubran.
7) New SCTP events SCTP_ADDR_ADDED, SCTP_ADDR_REMOVED,
SCTP_ADDR_MADE_PRIM, and SCTP_SEND_FAILED_EVENT. From Xin Long.
8) Add XDP support to mvneta driver, from Lorenzo Bianconi.
9) Lots of netfilter hw offload fixes, cleanups and enhancements,
from Pablo Neira Ayuso.
10) PTP support for aquantia chips, from Egor Pomozov.
11) Add UDP segmentation offload support to igb, ixgbe, and i40e. From
Josh Hunt.
12) Add smart nagle to tipc, from Jon Maloy.
13) Support L2 field rewrite by TC offloads in bnxt_en, from Venkat
Duvvuru.
14) Add a flow mask cache to OVS, from Tonghao Zhang.
15) Add XDP support to ice driver, from Maciej Fijalkowski.
16) Add AF_XDP support to ice driver, from Krzysztof Kazimierczak.
17) Support UDP GSO offload in atlantic driver, from Igor Russkikh.
18) Support it in stmmac driver too, from Jose Abreu.
19) Support TIPC encryption and auth, from Tuong Lien.
20) Introduce BPF trampolines, from Alexei Starovoitov.
21) Make page_pool API more numa friendly, from Saeed Mahameed.
22) Introduce route hints to ipv4 and ipv6, from Paolo Abeni.
23) Add UDP segmentation offload to cxgb4, Rahul Lakkireddy"
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1857 commits)
libbpf: Fix usage of u32 in userspace code
mm: Implement no-MMU variant of vmalloc_user_node_flags
slip: Fix use-after-free Read in slip_open
net: dsa: sja1105: fix sja1105_parse_rgmii_delays()
macvlan: schedule bc_work even if error
enetc: add support Credit Based Shaper(CBS) for hardware offload
net: phy: add helpers phy_(un)lock_mdio_bus
mdio_bus: don't use managed reset-controller
ax88179_178a: add ethtool_op_get_ts_info()
mlxsw: spectrum_router: Fix use of uninitialized adjacency index
mlxsw: spectrum_router: After underlay moves, demote conflicting tunnels
bpf: Simplify __bpf_arch_text_poke poke type handling
bpf: Introduce BPF_TRACE_x helper for the tracing tests
bpf: Add bpf_jit_blinding_enabled for !CONFIG_BPF_JIT
bpf, testing: Add various tail call test cases
bpf, x86: Emit patchable direct jump as tail call
bpf: Constant map key tracking for prog array pokes
bpf: Add poke dependency tracking for prog array maps
bpf: Add initial poke descriptor table for jit images
bpf: Move owner type, jited info into array auxiliary data
...
This allows an application to call connect() in an async fashion. Like
other opcodes, we first try a non-blocking connect, then punt to async
context if we have to.
Note that we can still return -EINPROGRESS, and in that case the caller
should use IORING_OP_POLL_ADD to do an async wait for completion of the
connect request (just like for regular connect(2), except we can do it
async here too).
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCXdfjBwAKCRCRxhvAZXjc
onCBAP47WZ/ie7yjoDWhOI1QB7II3NGSzToakxpgJaWoB+NjTwEA7PGrSYVEbPrf
pUhiEaEJ29t+cWUxX3+yDO+k7SA6BAY=
=Ra58
-----END PGP SIGNATURE-----
Merge tag 'threads-v5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux
Pull thread management updates from Christian Brauner:
- A pidfd's fdinfo file currently contains the field "Pid:\t<pid>"
where <pid> is the pid of the process in the pid namespace of the
procfs instance the fdinfo file for the pidfd was opened in.
The fdinfo file has now gained a new "NSpid:\t<ns-pid1>[\t<ns-pid2>[...]]"
field which lists the pids of the process in all child pid namespaces
provided the pid namespace of the procfs instance it is looked up
under has an ancestoral relationship with the pid namespace of the
process. If it does not 0 will be shown and no further pid namespaces
will be listed. Tests included. (Christian Kellner)
- If the process the pidfd references has already exited, print -1 for
the Pid and NSpid fields in the pidfd's fdinfo file. Tests included.
(me)
- Add CLONE_CLEAR_SIGHAND. This lets callers clear all signal handler
that are not SIG_DFL or SIG_IGN at process creation time. This
originated as a feature request from glibc to improve performance and
elimate races in their posix_spawn() implementation. Tests included.
(me)
- Add support for choosing a specific pid for a process with clone3().
This is the feature which was part of the thread update for v5.4 but
after a discussion at LPC in Lisbon we decided to delay it for one
more cycle in order to make the interface more generic. This has now
done. It is now possible to choose a specific pid in a whole pid
namespaces (sub)hierarchy instead of just one pid namespace. In order
to choose a specific pid the caller must have CAP_SYS_ADMIN in all
owning user namespaces of the target pid namespaces. Tests included.
(Adrian Reber)
- Test improvements and extensions. (Andrei Vagin, me)
* tag 'threads-v5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
selftests/clone3: skip if clone3() is ENOSYS
selftests/clone3: check that all pids are released on error paths
selftests/clone3: report a correct number of fails
selftests/clone3: flush stdout and stderr before clone3() and _exit()
selftests: add tests for clone3() with *set_tid
fork: extend clone3() to support setting a PID
selftests: add tests for clone3()
tests: test CLONE_CLEAR_SIGHAND
clone3: add CLONE_CLEAR_SIGHAND
pid: use pid_has_task() in pidfd_open()
exit: use pid_has_task() in do_wait()
pid: use pid_has_task() in __change_pid()
test: verify fdinfo for pidfd of reaped process
pidfd: check pid has attached task in fdinfo
pidfd: add tests for NSpid info in fdinfo
pidfd: add NSpid entries to fdinfo
- Data abort report and injection
- Steal time support
- GICv4 performance improvements
- vgic ITS emulation fixes
- Simplify FWB handling
- Enable halt polling counters
- Make the emulated timer PREEMPT_RT compliant
s390:
- Small fixes and cleanups
- selftest improvements
- yield improvements
PPC:
- Add capability to tell userspace whether we can single-step the guest.
- Improve the allocation of XIVE virtual processor IDs
- Rewrite interrupt synthesis code to deliver interrupts in virtual
mode when appropriate.
- Minor cleanups and improvements.
x86:
- XSAVES support for AMD
- more accurate report of nested guest TSC to the nested hypervisor
- retpoline optimizations
- support for nested 5-level page tables
- PMU virtualization optimizations, and improved support for nested
PMU virtualization
- correct latching of INITs for nested virtualization
- IOAPIC optimization
- TSX_CTRL virtualization for more TAA happiness
- improved allocation and flushing of SEV ASIDs
- many bugfixes and cleanups
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJd27PMAAoJEL/70l94x66DspsH+gPc6YWtKJFJH58Zj8NrNh6y
t0FwDFcvUa51+m4jaY4L5Y8+zqu1dZFnPPhFGqNWpxrjCEvE/glQJv3BiUX06Seh
aYUHNymGoYCTJOHaaGhV+NlgQaDuZOCOkIsOLAPehyFd1KojwB+FRC0xmO6aROPw
9yQgYrKuK1UUn5HwxBNrMS4+Xv+2iKv/9sTnq1G4W2qX2NZQg84LVPg1zIdkCh3D
3GOvoCBEk3ivQqjmdE7rP/InPr0XvW0b6TFhchIk8J6jEIQFHsmOUefiTvTxsIHV
OKAZwvyeYPrYHA/aDZpaBmY2aR0ydfKDUQcviNIJoF1vOktGs0hvl3VbsmG8QCg=
=OSI1
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Paolo Bonzini:
"ARM:
- data abort report and injection
- steal time support
- GICv4 performance improvements
- vgic ITS emulation fixes
- simplify FWB handling
- enable halt polling counters
- make the emulated timer PREEMPT_RT compliant
s390:
- small fixes and cleanups
- selftest improvements
- yield improvements
PPC:
- add capability to tell userspace whether we can single-step the
guest
- improve the allocation of XIVE virtual processor IDs
- rewrite interrupt synthesis code to deliver interrupts in virtual
mode when appropriate.
- minor cleanups and improvements.
x86:
- XSAVES support for AMD
- more accurate report of nested guest TSC to the nested hypervisor
- retpoline optimizations
- support for nested 5-level page tables
- PMU virtualization optimizations, and improved support for nested
PMU virtualization
- correct latching of INITs for nested virtualization
- IOAPIC optimization
- TSX_CTRL virtualization for more TAA happiness
- improved allocation and flushing of SEV ASIDs
- many bugfixes and cleanups"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (127 commits)
kvm: nVMX: Relax guest IA32_FEATURE_CONTROL constraints
KVM: x86: Grab KVM's srcu lock when setting nested state
KVM: x86: Open code shared_msr_update() in its only caller
KVM: Fix jump label out_free_* in kvm_init()
KVM: x86: Remove a spurious export of a static function
KVM: x86: create mmu/ subdirectory
KVM: nVMX: Remove unnecessary TLB flushes on L1<->L2 switches when L1 use apic-access-page
KVM: x86: remove set but not used variable 'called'
KVM: nVMX: Do not mark vmcs02->apic_access_page as dirty when unpinning
KVM: vmx: use MSR_IA32_TSX_CTRL to hard-disable TSX on guest that lack it
KVM: vmx: implement MSR_IA32_TSX_CTRL disable RTM functionality
KVM: x86: implement MSR_IA32_TSX_CTRL effect on CPUID
KVM: x86: do not modify masked bits of shared MSRs
KVM: x86: fix presentation of TSX feature in ARCH_CAPABILITIES
KVM: PPC: Book3S HV: XIVE: Fix potential page leak on error path
KVM: PPC: Book3S HV: XIVE: Free previous EQ page when setting up a new one
KVM: nVMX: Assume TLB entries of L1 and L2 are tagged differently if L0 use EPT
KVM: x86: Unexport kvm_vcpu_reload_apic_access_page()
KVM: nVMX: add CR4_LA57 bit to nested CR4_FIXED1
KVM: nVMX: Use semi-colon instead of comma for exit-handlers initialization
...
Expose the fs-verity bit through statx().
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQSacvsUNc7UX4ntmEPzXCl4vpKOKwUCXdtWqhQcZWJpZ2dlcnNA
Z29vZ2xlLmNvbQAKCRDzXCl4vpKOK+C9AQCCf8C2KP6DynoGQb9KRYYreJk8js8G
IgtlhazJ3j1RJAD/VijFbdwbxGCmiR1Y6BhKq5eaCYD1El68wSwkKuNO3ww=
=7WpU
-----END PGP SIGNATURE-----
Merge tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt
Pull fsverity updates from Eric Biggers:
"Expose the fs-verity bit through statx()"
* tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt:
docs: fs-verity: mention statx() support
f2fs: support STATX_ATTR_VERITY
ext4: support STATX_ATTR_VERITY
statx: define STATX_ATTR_VERITY
docs: fs-verity: document first supported kernel version
- Add the IV_INO_LBLK_64 encryption policy flag which modifies the
encryption to be optimized for UFS inline encryption hardware.
- For AES-128-CBC, use the crypto API's implementation of ESSIV (which
was added in 5.4) rather than doing ESSIV manually.
- A few other cleanups.
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQSacvsUNc7UX4ntmEPzXCl4vpKOKwUCXdtVMxQcZWJpZ2dlcnNA
Z29vZ2xlLmNvbQAKCRDzXCl4vpKOK8MVAP44iRzj8ZXu62BhqNOYYcF60s/58QfZ
Jo1VdmvO/8MNrAD+P/jW5sqzcB5BLdNzS7pLKGIzsC55uMyp/79xyKK8wQc=
=XKWV
-----END PGP SIGNATURE-----
Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt
Pull fscrypt updates from Eric Biggers:
- Add the IV_INO_LBLK_64 encryption policy flag which modifies the
encryption to be optimized for UFS inline encryption hardware.
- For AES-128-CBC, use the crypto API's implementation of ESSIV (which
was added in 5.4) rather than doing ESSIV manually.
- A few other cleanups.
* tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt:
f2fs: add support for IV_INO_LBLK_64 encryption policies
ext4: add support for IV_INO_LBLK_64 encryption policies
fscrypt: add support for IV_INO_LBLK_64 policies
fscrypt: avoid data race on fscrypt_mode::logged_impl_name
docs: ioctl-number: document fscrypt ioctl numbers
fscrypt: zeroize fscrypt_info before freeing
fscrypt: remove struct fscrypt_ctx
fscrypt: invoke crypto API for ESSIV handling
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAl3YCRAACgkQxWXV+ddt
WDuTuQ/7BOibDKqInm2SsL8xMuZqXjxGXUcHDPio5MbzNJ3wpV0j1KqWWsuK8hi0
HAhSI3fu3NG7RQYh3nuRO0CaZy3ENiqKoffrSpg7k5DJG0B7Lm/G/970fmOYUp6a
j6PMNcrKaw+1J3yuljSd20+n6j/hdmfn847ZsSY+7JmZ4zGMJ5GMv3IRipdLFUzR
tmjWmmCI05sF4/8cI6jzUVq588uSFTO1bGXFugmoO0ztpameudCnYniJI0tDBFSV
pqk6lqoOPNcaC9nATuA5KKOpUJ9nSscP/St3DV4D6LaZKkT/M5zs12lXPMJx/pKn
oCHt/A/wBdbDOoy1uHVMWQ9cz9PyVFtU7eSKizcFjoqnHO6fzlnRr9fsmIZKtTw9
H6nXVmP1S+xJg/zTBxCXHVfZR2dqADUsHWztN1LM8Pen/l9+UMwBeMhq9f9Jz68I
kF7zWlfLEtNh8naEYf34LkGVMtCNY4PHFsSztPg/jbfsH34xMvetKvPR2s8lejhp
42YqPHgEh2+8mmVcq65+jl+bPOp/5bdToRtuPiszWiKZSXt/5xplP+5lkSEet0J6
4aNZ8NRAiZ98br45jdTMUVSo6YtI27SS+GdVOUHPQtPI/kWi9XHx+l3E9MVOUtrd
lQ1Z9tPinEnJH4kntiCz2sKdzNKE01IagV4wFylz1Ct+ZqF9jNs=
=JzIp
-----END PGP SIGNATURE-----
Merge tag 'for-5.5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs updates from David Sterba:
"User visible changes:
- new block group profiles: RAID1 with 3- and 4- copies
- RAID1 in btrfs has always 2 copies, now add support for 3 and 4
- this is an incompat feature (named RAID1C34)
- recommended use of RAID1C3 is replacement of RAID6 profile on
metadata, this brings a more reliable resiliency against 2
device loss/damage
- support for new checksums
- per-filesystem, set at mkfs time
- fast hash (crc32c successor): xxhash, 64bit digest
- strong hashes (both 256bit): sha256 (slower, FIPS), blake2b
(faster)
- the blake2b module goes via the crypto tree, btrfs.ko has a
soft dependency
- speed up lseek, don't take inode locks unnecessarily, this can
speed up parallel SEEK_CUR/SEEK_SET/SEEK_END by 80%
- send:
- allow clone operations within the same file
- limit maximum number of sent clone references to avoid slow
backref walking
- error message improvements: device scan prints process name and PID
Core changes:
- cleanups
- remove unique workqueue helpers, used to provide a way to avoid
deadlocks in the workqueue code, now done in a simpler way
- remove lots of indirect function calls in compression code
- extent IO tree code moved out of extent_io.c
- cleanup backup superblock handling at mount time
- transaction life cycle documentation and cleanups
- locking code cleanups, annotations and documentation
- add more cold, const, pure function attributes
- removal of unused or redundant struct members or variables
- new tree-checker sanity tests
- try to detect missing INODE_ITEM, cross-reference checks of
DIR_ITEM, DIR_INDEX, INODE_REF, and XATTR_* items
- remove own bio scheduling code (used to avoid checksum submissions
being stuck behind other IO), replaced by cgroup controller-based
code to allow better control and avoid priority inversions in cases
where the custom and cgroup scheduling disagreed
Fixes:
- avoid getting stuck during cyclic writebacks
- fix trimming of ranges crossing block group boundaries
- fix rename exchange on subvolumes, all involved subvolumes need to
be recorded in the transaction"
* tag 'for-5.5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (137 commits)
btrfs: drop bdev argument from submit_extent_page
btrfs: remove extent_map::bdev
btrfs: drop bio_set_dev where not needed
btrfs: get bdev directly from fs_devices in submit_extent_page
btrfs: record all roots for rename exchange on a subvol
Btrfs: fix block group remaining RO forever after error during device replace
btrfs: scrub: Don't check free space before marking a block group RO
btrfs: change btrfs_fs_devices::rotating to bool
btrfs: change btrfs_fs_devices::seeding to bool
btrfs: rename btrfs_block_group_cache
btrfs: block-group: Reuse the item key from caller of read_one_block_group()
btrfs: block-group: Refactor btrfs_read_block_groups()
btrfs: document extent buffer locking
btrfs: access eb::blocking_writers according to ACCESS_ONCE policies
btrfs: set blocking_writers directly, no increment or decrement
btrfs: merge blocking_writers branches in btrfs_tree_read_lock
btrfs: drop incompat bit for raid1c34 after last block group is gone
btrfs: add incompat for raid1 with 3, 4 copies
btrfs: add support for 4-copy replication (raid1c4)
btrfs: add support for 3-copy replication (raid1c3)
...
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl3WxrEQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpuH5D/9qQKfIIuQDUNO4Xx+dIHimTDCrfiEOeO9e
CRaMuSj+yMxLDMwfX8RnDmR17H3ZVoiIY1CT24U9ZkA5iDjeAH4xmzkH30US7LR7
/64YVZTxB0OrWppRK8RiIhaJJZDQ6+HPUQsn6PRaLVuFHi2unMoTQnj/ZQKz03QA
Pl8Xx7qBtH1JwYCzQ21f/uryAcNg9eWabRLN2f1uiOXLmvRxOfh6Z/iaezlaZlmL
qeJdcdLjjvOgOPwEOfNjfS6pd+XBz3gdEhn0l+11nHITxWZmVBwsWTKyUQlCmKnl
yuCWDVyx5d6zCnlrLYG0l2Fn2lr9SwAkdkq3YAKV03hA/6s6P9q9bm31VvOf828x
7gmr4YVz68y7H9bM0QAHCvDpjll0aIEUw6XFzSOCDtZ9B6/pppYQWzMU71J05eyF
8DOKv2M2EVNLUjf6u0RDyolnWGU0kIjt5ryWE3OsGcezAVa2wYstgUJTKbrn1YgT
j+4KTpaI+sg8GKDFauvxcSa6gwoRp6jweFNW+7vC090/shXmrGmVLOnQZKRuHho/
O4W8y/1/deM8CCIAETpiNxA8RV5U/EZygrFGDFc7yzTtVDGHY356M/B4Bmm2qkVu
K3WgeZp8Fc0lH0QF6Pp9ZlBkZEpGNCAPVsPkXIsxQXbctftkn3KY//uIubfpFEB1
PpHSicvkww==
=HYYq
-----END PGP SIGNATURE-----
Merge tag 'for-5.5/block-20191121' of git://git.kernel.dk/linux-block
Pull core block updates from Jens Axboe:
"Due to more granular branches, this one is small and will be followed
with other core branches that add specific features. I meant to just
have a core and drivers branch, but external dependencies we ended up
adding a few more that are also core.
The changes are:
- Fixes and improvements for the zoned device support (Ajay, Damien)
- sed-opal table writing and datastore UID (Revanth)
- blk-cgroup (and bfq) blk-cgroup stat fixes (Tejun)
- Improvements to the block stats tracking (Pavel)
- Fix for overruning sysfs buffer for large number of CPUs (Ming)
- Optimization for small IO (Ming, Christoph)
- Fix typo in RWH lifetime hint (Eugene)
- Dead code removal and documentation (Bart)
- Reduction in memory usage for queue and tag set (Bart)
- Kerneldoc header documentation (André)
- Device/partition revalidation fixes (Jan)
- Stats tracking for flush requests (Konstantin)
- Various other little fixes here and there (et al)"
* tag 'for-5.5/block-20191121' of git://git.kernel.dk/linux-block: (48 commits)
Revert "block: split bio if the only bvec's length is > SZ_4K"
block: add iostat counters for flush requests
block,bfq: Skip tracing hooks if possible
block: sed-opal: Introduce SUM_SET_LIST parameter and append it using 'add_token_u64'
blk-cgroup: cgroup_rstat_updated() shouldn't be called on cgroup1
block: Don't disable interrupts in trigger_softirq()
sbitmap: Delete sbitmap_any_bit_clear()
blk-mq: Delete blk_mq_has_free_tags() and blk_mq_can_queue()
block: split bio if the only bvec's length is > SZ_4K
block: still try to split bio if the bvec crosses pages
blk-cgroup: separate out blkg_rwstat under CONFIG_BLK_CGROUP_RWSTAT
blk-cgroup: reimplement basic IO stats using cgroup rstat
blk-cgroup: remove now unused blkg_print_stat_{bytes|ios}_recursive()
blk-throtl: stop using blkg->stat_bytes and ->stat_ios
bfq-iosched: stop using blkg->stat_bytes and ->stat_ios
bfq-iosched: relocate bfqg_*rwstat*() helpers
block: add zone open, close and finish ioctl support
block: add zone open, close and finish operations
block: Simplify REQ_OP_ZONE_RESET_ALL handling
block: Remove REQ_OP_ZONE_RESET plugging
...
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl3WxNwQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgps4kD/9SIDXhYhhE8fNqeAF7Uouu8fxgwnkY3hSI
43vJwCziiDxWWJH5mYW7/83VNOMZKHIbiYMnU6iEUsRQ/sG/wI0wEfAQZDHLzCKt
cko2q7zAC1/4rtoslwJ3q04hE2Ap/nb93ELZBVr7fOAuODBNFUp/vifAojvsMPKz
hNMNPq/vYg7c/iYMZKSBdtjE3tqceFNBjAVNMB9dHKQLeexEy4ve7AjBeawWsSi7
GesnQ5w5u5LqkMYwLslpv/oVjHiiFWgGnDAvBNvykQvVy+DfB54KSqMV11W1aqdU
l6L+ENfZasEvlk1yMAth2Foq4vlscm5MKEb6VdJhXWHHXtXkcBmz7RBqPmjSvXCY
wS5GZRw8oYtTcid0aQf+t/wgRNTDJsGsnsT32qto41No3Z7vlIDHUDxHZGTA+gEL
E8j9rDx6EXMTo3EFbC8XZcfsorhPJ1HKAyw1YFczHtYzJEQUR9jJe3f/Q9u6K2Vy
s/EhkVeHa/lEd7kb6mI+6lQjGe1FXl7AHauDuaaEfIOZA/xJB3Bad5Wjq1va1cUO
TX+37zjzFzJghhSIBGYq7G7iT4AMecPQgxHzCdCyYfW5S4Uur9tMmIElwVPI/Pjl
kDZ9gdg9lm6JifZ9Ab8QcGhuQQTF3frwX9VfgrVgcqyvm38AiYzVgL9ZJnxRS/Cy
ZfLNkACXqQ==
=YZ9s
-----END PGP SIGNATURE-----
Merge tag 'for-5.5/io_uring-20191121' of git://git.kernel.dk/linux-block
Pull io_uring updates from Jens Axboe:
"A lot of stuff has been going on this cycle, with improving the
support for networked IO (and hence unbounded request completion
times) being one of the major themes. There's been a set of fixes done
this week, I'll send those out as well once we're certain we're fully
happy with them.
This contains:
- Unification of the "normal" submit path and the SQPOLL path (Pavel)
- Support for sparse (and bigger) file sets, and updating of those
file sets without needing to unregister/register again.
- Independently sized CQ ring, instead of just making it always 2x
the SQ ring size. This makes it more flexible for networked
applications.
- Support for overflowed CQ ring, never dropping events but providing
backpressure on submits.
- Add support for absolute timeouts, not just relative ones.
- Support for generic cancellations. This divorces io_uring from
workqueues as well, which additionally gets us one step closer to
generic async system call support.
- With cancellations, we can support grabbing the process file table
as well, just like we do mm context. This allows support for system
calls that create file descriptors, like accept4() support that's
built on top of that.
- Support for io_uring tracing (Dmitrii)
- Support for linked timeouts. These abort an operation if it isn't
completed by the time noted in the linke timeout.
- Speedup tracking of poll requests
- Various cleanups making the coder easier to follow (Jackie, Pavel,
Bob, YueHaibing, me)
- Update MAINTAINERS with new io_uring list"
* tag 'for-5.5/io_uring-20191121' of git://git.kernel.dk/linux-block: (64 commits)
io_uring: make POLL_ADD/POLL_REMOVE scale better
io-wq: remove now redundant struct io_wq_nulls_list
io_uring: Fix getting file for non-fd opcodes
io_uring: introduce req_need_defer()
io_uring: clean up io_uring_cancel_files()
io-wq: ensure free/busy list browsing see all items
io-wq: ensure we have a stable view of ->cur_work for cancellations
io_wq: add get/put_work handlers to io_wq_create()
io_uring: check for validity of ->rings in teardown
io_uring: fix potential deadlock in io_poll_wake()
io_uring: use correct "is IO worker" helper
io_uring: fix -ENOENT issue with linked timer with short timeout
io_uring: don't do flush cancel under inflight_lock
io_uring: flag SQPOLL busy condition to userspace
io_uring: make ASYNC_CANCEL work with poll and timeout
io_uring: provide fallback request for OOM situations
io_uring: convert accept4() -ERESTARTSYS into -EINTR
io_uring: fix error clear of ->file_table in io_sqe_files_register()
io_uring: separate the io_free_req and io_free_req_find_next interface
io_uring: keep io_put_req only responsible for release and put req
...
This commit reverts commit 91e6015b08 ("bpf: Emit audit messages
upon successful prog load and unload") and its follow up commit
7599a896f2 ("audit: Move audit_log_task declaration under
CONFIG_AUDITSYSCALL") as requested by Paul Moore. The change needs
close review on linux-audit, tests etc.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
This patch is to allow matching options in erspan.
The options can be described in the form:
VER:INDEX:DIR:HWID/VER:INDEX_MASK:DIR_MASK:HWID_MASK.
When ver is set to 1, index will be applied while dir
and hwid will be ignored, and when ver is set to 2,
dir and hwid will be used while index will be ignored.
Different from geneve, only one option can be set. And
also, geneve options, vxlan options or erspan options
can't be set at the same time.
# ip link add name erspan1 type erspan external
# tc qdisc add dev erspan1 ingress
# tc filter add dev erspan1 protocol ip parent ffff: \
flower \
enc_src_ip 10.0.99.192 \
enc_dst_ip 10.0.99.193 \
enc_key_id 11 \
erspan_opts 1:12:0:0/1:ffff:0:0 \
ip_proto udp \
action mirred egress redirect dev eth0
v1->v2:
- improve some err msgs of extack.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is to allow matching gbp option in vxlan.
The options can be described in the form GBP/GBP_MASK,
where GBP is represented as a 32bit hexadecimal value.
Different from geneve, only one option can be set. And
also, geneve options and vxlan options can't be set at
the same time.
# ip link add name vxlan0 type vxlan dstport 0 external
# tc qdisc add dev vxlan0 ingress
# tc filter add dev vxlan0 protocol ip parent ffff: \
flower \
enc_src_ip 10.0.99.192 \
enc_dst_ip 10.0.99.193 \
enc_key_id 11 \
vxlan_opts 01020304/ffffffff \
ip_proto udp \
action mirred egress redirect dev eth0
v1->v2:
- add .strict_start_type for enc_opts_policy as Jakub noticed.
- use Duplicate instead of Wrong in err msg for extack as Jakub
suggested.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is to allow setting erspan options using the
act_tunnel_key action. Different from geneve options,
only one option can be set. And also, geneve options,
vxlan options or erspan options can't be set at the
same time.
Options are expressed as ver:index:dir:hwid, when ver
is set to 1, index will be applied while dir and hwid
will be ignored, and when ver is set to 2, dir and
hwid will be used while index will be ignored.
# ip link add name erspan1 type erspan external
# tc qdisc add dev eth0 ingress
# tc filter add dev eth0 protocol ip parent ffff: \
flower indev eth0 \
ip_proto udp \
action tunnel_key \
set src_ip 10.0.99.192 \
dst_ip 10.0.99.193 \
dst_port 6081 \
id 11 \
erspan_opts 1:2:0:0 \
action mirred egress redirect dev erspan1
v1->v2:
- do the validation when dst is not yet allocated as Jakub suggested.
- use Duplicate instead of Wrong in err msg for extack.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is to allow setting vxlan options using the
act_tunnel_key action. Different from geneve options,
only one option can be set. And also, geneve options
and vxlan options can't be set at the same time.
gbp is the only param for vxlan options:
# ip link add name vxlan0 type vxlan dstport 0 external
# tc qdisc add dev eth0 ingress
# tc filter add dev eth0 protocol ip parent ffff: \
flower indev eth0 \
ip_proto udp \
action tunnel_key \
set src_ip 10.0.99.192 \
dst_ip 10.0.99.193 \
dst_port 6081 \
id 11 \
vxlan_opts 01020304 \
action mirred egress redirect dev vxlan0
v1->v2:
- add .strict_start_type for enc_opts_policy as Jakub noticed.
- use Duplicate instead of Wrong in err msg for extack as Jakub
suggested.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add definitions for the Enter Compliance and Transmit Margin fields of the
PCIe Link Control 2 register.
Link: https://lore.kernel.org/r/20191112173503.176611-2-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
- Allow non-ISV data aborts to be reported to userspace
- Allow injection of data aborts from userspace
- Expose stolen time to guests
- GICv4 performance improvements
- vgic ITS emulation fixes
- Simplify FWB handling
- Enable halt pool counters
- Make the emulated timer PREEMPT_RT compliant
-----BEGIN PGP SIGNATURE-----
iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAl3VZ0EPHG1hekBrZXJu
ZWwub3JnAAoJECPQ0LrRPXpDwikQAJ/lQT97zEKV3dnpD/jjmEic/3QvTGljS4p+
pbwZAzyoSMc09lAK2pkaGRRc7euPnp4uLdRS4SenToVzmUQCzuxpQEEMdV/wjp4V
WLQ1WnTEAhYkm7k5MVo4uy3eD7nVWHWXgfQJvzL4EYZ5R/gd9NzBrnAc6LLV6hp9
0eLXIYrGFa1GESzF6P6sBDJhYpqVUcQlTI8I43kZH3iCC4+OsBxIkhHREZYsELhW
MZJIM9ZCskg2tvPC4UysaFiGjBYUJNJ0V+fFOrhyGzludP8i8rNRwgA60mznwvNw
V4N6/gLlkGK7nLqP+noUcU2wTBnIu389TcWWsX47CuDUzawCd8Fb9kX2zYauQSyS
ujE0uzoo/nhPFysh9OVVeLUZ6o/wotXoMp2t32t1c5h9N1hISEJvAWavMxTY6KzF
NEn9hWFjNcgBoArz9GKn9p2nBQpCDvu+2SlI4nL/qgZ7lPC4O3U1uq9myCOLj/gu
Can/u5EAwgyIBDVcEPHV+vP2GjyeERdXprGiG2VJTYlbHsdjgISTR+5Fy32KdGlP
YygeZxJtzretr3AYsWqD6Mri30FDSoYy9rUOWBpa+ZHbJPac0M+uKOqntV1OxPX9
QUkmNEdJcDr8fkcKxnEZ/MaZxFTGPp4vfhiT4A7dUkWTFq7ajvGo8IwN1d7PvWxS
LFMij1Js
=SxBH
-----END PGP SIGNATURE-----
Merge tag 'kvmarm-5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm updates for Linux 5.5:
- Allow non-ISV data aborts to be reported to userspace
- Allow injection of data aborts from userspace
- Expose stolen time to guests
- GICv4 performance improvements
- vgic ITS emulation fixes
- Simplify FWB handling
- Enable halt pool counters
- Make the emulated timer PREEMPT_RT compliant
Conflicts:
include/uapi/linux/kvm.h
Daniel Borkmann says:
====================
pull-request: bpf-next 2019-11-20
The following pull-request contains BPF updates for your *net-next* tree.
We've added 81 non-merge commits during the last 17 day(s) which contain
a total of 120 files changed, 4958 insertions(+), 1081 deletions(-).
There are 3 trivial conflicts, resolve it by always taking the chunk from
196e8ca748:
<<<<<<< HEAD
=======
void *bpf_map_area_mmapable_alloc(u64 size, int numa_node);
>>>>>>> 196e8ca748
<<<<<<< HEAD
void *bpf_map_area_alloc(u64 size, int numa_node)
=======
static void *__bpf_map_area_alloc(u64 size, int numa_node, bool mmapable)
>>>>>>> 196e8ca748
<<<<<<< HEAD
if (size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)) {
=======
/* kmalloc()'ed memory can't be mmap()'ed */
if (!mmapable && size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)) {
>>>>>>> 196e8ca748
The main changes are:
1) Addition of BPF trampoline which works as a bridge between kernel functions,
BPF programs and other BPF programs along with two new use cases: i) fentry/fexit
BPF programs for tracing with practically zero overhead to call into BPF (as
opposed to k[ret]probes) and ii) attachment of the former to networking related
programs to see input/output of networking programs (covering xdpdump use case),
from Alexei Starovoitov.
2) BPF array map mmap support and use in libbpf for global data maps; also a big
batch of libbpf improvements, among others, support for reading bitfields in a
relocatable manner (via libbpf's CO-RE helper API), from Andrii Nakryiko.
3) Extend s390x JIT with usage of relative long jumps and loads in order to lift
the current 64/512k size limits on JITed BPF programs there, from Ilya Leoshkevich.
4) Add BPF audit support and emit messages upon successful prog load and unload in
order to have a timeline of events, from Daniel Borkmann and Jiri Olsa.
5) Extension to libbpf and xdpsock sample programs to demo the shared umem mode
(XDP_SHARED_UMEM) as well as RX-only and TX-only sockets, from Magnus Karlsson.
6) Several follow-up bug fixes for libbpf's auto-pinning code and a new API
call named bpf_get_link_xdp_info() for retrieving the full set of prog
IDs attached to XDP, from Toke Høiland-Jørgensen.
7) Add BTF support for array of int, array of struct and multidimensional arrays
and enable it for skb->cb[] access in kfree_skb test, from Martin KaFai Lau.
8) Fix AF_XDP by using the correct number of channels from ethtool, from Luigi Rizzo.
9) Two fixes for BPF selftest to get rid of a hang in test_tc_tunnel and to avoid
xdping to be run as standalone, from Jiri Benc.
10) Various BPF selftest fixes when run with latest LLVM trunk, from Yonghong Song.
11) Fix a memory leak in BPF fentry test run data, from Colin Ian King.
12) Various smaller misc cleanups and improvements mostly all over BPF selftests and
samples, from Daniel T. Lee, Andre Guedes, Anders Roxell, Mao Wenan, Yue Haibing.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow for audit messages to be emitted upon BPF program load and
unload for having a timeline of events. The load itself is in
syscall context, so additional info about the process initiating
the BPF prog creation can be logged and later directly correlated
to the unload event.
The only info really needed from BPF side is the globally unique
prog ID where then audit user space tooling can query / dump all
info needed about the specific BPF program right upon load event
and enrich the record, thus these changes needed here can be kept
small and non-intrusive to the core.
Raw example output:
# auditctl -D
# auditctl -a always,exit -F arch=x86_64 -S bpf
# ausearch --start recent -m 1334
[...]
----
time->Wed Nov 20 12:45:51 2019
type=PROCTITLE msg=audit(1574271951.590:8974): proctitle="./test_verifier"
type=SYSCALL msg=audit(1574271951.590:8974): arch=c000003e syscall=321 success=yes exit=14 a0=5 a1=7ffe2d923e80 a2=78 a3=0 items=0 ppid=742 pid=949 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts0 ses=2 comm="test_verifier" exe="/root/bpf-next/tools/testing/selftests/bpf/test_verifier" subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 key=(null)
type=UNKNOWN[1334] msg=audit(1574271951.590:8974): auid=0 uid=0 gid=0 ses=2 subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 pid=949 comm="test_verifier" exe="/root/bpf-next/tools/testing/selftests/bpf/test_verifier" prog-id=3260 event=LOAD
----
time->Wed Nov 20 12:45:51 2019
type=UNKNOWN[1334] msg=audit(1574271951.590:8975): prog-id=3260 event=UNLOAD
----
[...]
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20191120213816.8186-1-jolsa@kernel.org
RFC 8033 suggests an alternative approach to calculate the queue
delay in PIE by using a timestamp on every enqueued packet. This
patch adds an implementation of that approach and sets it as the
default method to calculate queue delay. The previous method (based
on Little's law) to calculate queue delay is set as optional.
Signed-off-by: Gautam Ramakrishnan <gautamramk@gmail.com>
Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in>
Acked-by: Dave Taht <dave.taht@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
The following patchset contains Netfilter updates for net-next:
1) Wildcard support for the net,iface set from Kristian Evensen.
2) Offload support for matching on the input interface.
3) Simplify matching on vlan header fields.
4) Add nft_payload_rebuild_vlan_hdr() function to rebuild the vlan
header from the vlan sk_buff metadata.
5) Pass extack to nft_flow_cls_offload_setup().
6) Add C-VLAN matching support.
7) Use time64_t in xt_time to fix y2038 overflow, from Arnd Bergmann.
8) Use time_t in nft_meta to fix y2038 overflow, also from Arnd.
9) Add flow_action_entry_next() helper function to flowtable offload
infrastructure.
10) Add IPv6 support to the flowtable offload infrastructure.
11) Support for input interface matching from postrouting,
from Phil Sutter.
12) Missing check for ndo callback in flowtable offload, from wenxu.
13) Remove conntrack parameter from flow_offload_fill_dir(), from wenxu.
14) Do not pass flow_rule object for rule removal, cookie is sufficient
to achieve this.
15) Release flow_rule object in case of error from the offload commit
path.
16) Undo offload ruleset updates if transaction fails.
17) Check for error when binding flowtable callbacks, from wenxu.
18) Always unbind flowtable callbacks when unregistering hooks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The new raid1c3 and raid1c4 profiles are backward incompatible and the
name shall be 'raid1c34', the status can be found in the global
supported features in /sys/fs/btrfs/features or in the per-filesystem
directory.
Signed-off-by: David Sterba <dsterba@suse.com>
Add new block group profile to store 4 copies in a simliar way that
current RAID1 does. The profile attributes and constraints are defined
in the raid table and used by the same code that already handles the 2-
and 3-copy RAID1.
The minimum number of devices is 4, the maximum number of devices/chunks
that can be lost/damaged is 3. There is no comparable traditional RAID
level, the profile is added for future needs to accompany triple-parity
and beyond.
Signed-off-by: David Sterba <dsterba@suse.com>
Add new block group profile to store 3 copies in a simliar way that
current RAID1 does. The profile attributes and constraints are defined
in the raid table and used by the same code that already handles the
2-copy RAID1.
The minimum number of devices is 3, the maximum number of devices/chunks
that can be lost/damaged is 2. Like RAID6 but with 33% space
utilization.
Signed-off-by: David Sterba <dsterba@suse.com>
Add sha256 to the list of possible checksumming algorithms used by BTRFS.
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Add xxhash64 to the list of possible checksumming algorithms used by
BTRFS.
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Use enum to replace macro definitions of extent types.
Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Add ability to memory-map contents of BPF array map. This is extremely useful
for working with BPF global data from userspace programs. It allows to avoid
typical bpf_map_{lookup,update}_elem operations, improving both performance
and usability.
There had to be special considerations for map freezing, to avoid having
writable memory view into a frozen map. To solve this issue, map freezing and
mmap-ing is happening under mutex now:
- if map is already frozen, no writable mapping is allowed;
- if map has writable memory mappings active (accounted in map->writecnt),
map freezing will keep failing with -EBUSY;
- once number of writable memory mappings drops to zero, map freezing can be
performed again.
Only non-per-CPU plain arrays are supported right now. Maps with spinlocks
can't be memory mapped either.
For BPF_F_MMAPABLE array, memory allocation has to be done through vmalloc()
to be mmap()'able. We also need to make sure that array data memory is
page-sized and page-aligned, so we over-allocate memory in such a way that
struct bpf_array is at the end of a single page of memory with array->value
being aligned with the start of the second page. On deallocation we need to
accomodate this memory arrangement to free vmalloc()'ed memory correctly.
One important consideration regarding how memory-mapping subsystem functions.
Memory-mapping subsystem provides few optional callbacks, among them open()
and close(). close() is called for each memory region that is unmapped, so
that users can decrease their reference counters and free up resources, if
necessary. open() is *almost* symmetrical: it's called for each memory region
that is being mapped, **except** the very first one. So bpf_map_mmap does
initial refcnt bump, while open() will do any extra ones after that. Thus
number of close() calls is equal to number of open() calls plus one more.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Link: https://lore.kernel.org/bpf/20191117172806.2195367-4-andriin@fb.com
The main motivation to add set_tid to clone3() is CRIU.
To restore a process with the same PID/TID CRIU currently uses
/proc/sys/kernel/ns_last_pid. It writes the desired (PID - 1) to
ns_last_pid and then (quickly) does a clone(). This works most of the
time, but it is racy. It is also slow as it requires multiple syscalls.
Extending clone3() to support *set_tid makes it possible restore a
process using CRIU without accessing /proc/sys/kernel/ns_last_pid and
race free (as long as the desired PID/TID is available).
This clone3() extension places the same restrictions (CAP_SYS_ADMIN)
on clone3() with *set_tid as they are currently in place for ns_last_pid.
The original version of this change was using a single value for
set_tid. At the 2019 LPC, after presenting set_tid, it was, however,
decided to change set_tid to an array to enable setting the PID of a
process in multiple PID namespaces at the same time. If a process is
created in a PID namespace it is possible to influence the PID inside
and outside of the PID namespace. Details also in the corresponding
selftest.
To create a process with the following PIDs:
PID NS level Requested PID
0 (host) 31496
1 42
2 1
For that example the two newly introduced parameters to struct
clone_args (set_tid and set_tid_size) would need to be:
set_tid[0] = 1;
set_tid[1] = 42;
set_tid[2] = 31496;
set_tid_size = 3;
If only the PIDs of the two innermost nested PID namespaces should be
defined it would look like this:
set_tid[0] = 1;
set_tid[1] = 42;
set_tid_size = 2;
The PID of the newly created process would then be the next available
free PID in the PID namespace level 0 (host) and 42 in the PID namespace
at level 1 and the PID of the process in the innermost PID namespace
would be 1.
The set_tid array is used to specify the PID of a process starting
from the innermost nested PID namespaces up to set_tid_size PID namespaces.
set_tid_size cannot be larger then the current PID namespace level.
Signed-off-by: Adrian Reber <areber@redhat.com>
Reviewed-by: Christian Brauner <christian.brauner@ubuntu.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Acked-by: Andrei Vagin <avagin@gmail.com>
Link: https://lore.kernel.org/r/20191115123621.142252-1-areber@redhat.com
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Allow FENTRY/FEXIT BPF programs to attach to other BPF programs of any type
including their subprograms. This feature allows snooping on input and output
packets in XDP, TC programs including their return values. In order to do that
the verifier needs to track types not only of vmlinux, but types of other BPF
programs as well. The verifier also needs to translate uapi/linux/bpf.h types
used by networking programs into kernel internal BTF types used by FENTRY/FEXIT
BPF programs. In some cases LLVM optimizations can remove arguments from BPF
subprograms without adjusting BTF info that LLVM backend knows. When BTF info
disagrees with actual types that the verifiers sees the BPF trampoline has to
fallback to conservative and treat all arguments as u64. The FENTRY/FEXIT
program can still attach to such subprograms, but it won't be able to recognize
pointer types like 'struct sk_buff *' and it won't be able to pass them to
bpf_skb_output() for dumping packets to user space. The FENTRY/FEXIT program
would need to use bpf_probe_read_kernel() instead.
The BPF_PROG_LOAD command is extended with attach_prog_fd field. When it's set
to zero the attach_btf_id is one vmlinux BTF type ids. When attach_prog_fd
points to previously loaded BPF program the attach_btf_id is BTF type id of
main function or one of its subprograms.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20191114185720.1641606-18-ast@kernel.org
Introduce BPF trampoline concept to allow kernel code to call into BPF programs
with practically zero overhead. The trampoline generation logic is
architecture dependent. It's converting native calling convention into BPF
calling convention. BPF ISA is 64-bit (even on 32-bit architectures). The
registers R1 to R5 are used to pass arguments into BPF functions. The main BPF
program accepts only single argument "ctx" in R1. Whereas CPU native calling
convention is different. x86-64 is passing first 6 arguments in registers
and the rest on the stack. x86-32 is passing first 3 arguments in registers.
sparc64 is passing first 6 in registers. And so on.
The trampolines between BPF and kernel already exist. BPF_CALL_x macros in
include/linux/filter.h statically compile trampolines from BPF into kernel
helpers. They convert up to five u64 arguments into kernel C pointers and
integers. On 64-bit architectures this BPF_to_kernel trampolines are nops. On
32-bit architecture they're meaningful.
The opposite job kernel_to_BPF trampolines is done by CAST_TO_U64 macros and
__bpf_trace_##call() shim functions in include/trace/bpf_probe.h. They convert
kernel function arguments into array of u64s that BPF program consumes via
R1=ctx pointer.
This patch set is doing the same job as __bpf_trace_##call() static
trampolines, but dynamically for any kernel function. There are ~22k global
kernel functions that are attachable via nop at function entry. The function
arguments and types are described in BTF. The job of btf_distill_func_proto()
function is to extract useful information from BTF into "function model" that
architecture dependent trampoline generators will use to generate assembly code
to cast kernel function arguments into array of u64s. For example the kernel
function eth_type_trans has two pointers. They will be casted to u64 and stored
into stack of generated trampoline. The pointer to that stack space will be
passed into BPF program in R1. On x86-64 such generated trampoline will consume
16 bytes of stack and two stores of %rdi and %rsi into stack. The verifier will
make sure that only two u64 are accessed read-only by BPF program. The verifier
will also recognize the precise type of the pointers being accessed and will
not allow typecasting of the pointer to a different type within BPF program.
The tracing use case in the datacenter demonstrated that certain key kernel
functions have (like tcp_retransmit_skb) have 2 or more kprobes that are always
active. Other functions have both kprobe and kretprobe. So it is essential to
keep both kernel code and BPF programs executing at maximum speed. Hence
generated BPF trampoline is re-generated every time new program is attached or
detached to maintain maximum performance.
To avoid the high cost of retpoline the attached BPF programs are called
directly. __bpf_prog_enter/exit() are used to support per-program execution
stats. In the future this logic will be optimized further by adding support
for bpf_stats_enabled_key inside generated assembly code. Introduction of
preemptible and sleepable BPF programs will completely remove the need to call
to __bpf_prog_enter/exit().
Detach of a BPF program from the trampoline should not fail. To avoid memory
allocation in detach path the half of the page is used as a reserve and flipped
after each attach/detach. 2k bytes is enough to call 40+ BPF programs directly
which is enough for BPF tracing use cases. This limit can be increased in the
future.
BPF_TRACE_FENTRY programs have access to raw kernel function arguments while
BPF_TRACE_FEXIT programs have access to kernel return value as well. Often
kprobe BPF program remembers function arguments in a map while kretprobe
fetches arguments from a map and analyzes them together with return value.
BPF_TRACE_FEXIT accelerates this typical use case.
Recursion prevention for kprobe BPF programs is done via per-cpu
bpf_prog_active counter. In practice that turned out to be a mistake. It
caused programs to randomly skip execution. The tracing tools missed results
they were looking for. Hence BPF trampoline doesn't provide builtin recursion
prevention. It's a job of BPF program itself and will be addressed in the
follow up patches.
BPF trampoline is intended to be used beyond tracing and fentry/fexit use cases
in the future. For example to remove retpoline cost from XDP programs.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20191114185720.1641606-5-ast@kernel.org
User space may request time stamps on rising edges, falling edges, or
both. However, the particular mode may or may not be supported in the
hardware or in the driver. This patch adds a "strict" flag that tells
drivers to ensure that the requested mode will be honored.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 415606588c ("PTP: introduce new versions of IOCTLs")
introduced a new external time stamp ioctl that validates the flags.
This patch extends the validation to ensure that at least one rising
or falling edge flag is set when enabling external time stamps.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We store elapsed time for a crashed process in struct elf_prstatus using
'timeval' structures. Once glibc starts using 64-bit time_t, this becomes
incompatible with the kernel's idea of timeval since the structure layout
no longer matches on 32-bit architectures.
This changes the definition of the elf_prstatus structure to use
__kernel_old_timeval instead, which is hardcoded to the currently used
binary layout. There is no risk of overflow in y2038 though, because
the time values are all relative times, and can store up to 68 years
of process elapsed time.
There is a risk of applications breaking at build time when they
use the new kernel headers and expect the type to be exactly 'timeval'
rather than a structure that has the same fields as before. Those
applications have to be modified to deal with 64-bit time_t anyway.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
In order to remove the 'struct timespec' definition and the
timespec64_to_timespec() helper function, change over the in-kernel
definition of 'struct scm_timestamping' to use the __kernel_old_timespec
replacement and open-code the assignment.
Acked-by: Deepa Dinamani <deepa.kernel@gmail.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
There are two 'struct timeval' fields in 'struct rusage'.
Unfortunately the definition of timeval is now ambiguous when used in
user space with a libc that has a 64-bit time_t, and this also changes
the 'rusage' definition in user space in a way that is incompatible with
the system call interface.
While there is no good solution to avoid all ambiguity here, change
the definition in the kernel headers to be compatible with the kernel
ABI, using __kernel_old_timeval as an unambiguous base type.
In previous discussions, there was also a plan to add a replacement
for rusage based on 64-bit timestamps and nanosecond resolution,
i.e. 'struct __kernel_timespec'. I have patches for that as well,
if anyone thinks we should do that.
Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
This is mainly a patch for clarification, and to let us remove
the time_t definition from the kernel to prevent new users from
creeping in that might not be y2038-safe.
All remaining uses of 'time_t' or '__kernel_time_t' are part of
the user API that cannot be changed by that either have a
replacement or that do not suffer from the y2038 overflow.
Acked-by: Deepa Dinamani <deepa.kernel@gmail.com>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>