This commit adds support for HTB offload in the mlx5e driver.
Performance:
NIC: Mellanox ConnectX-6 Dx
CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz (24 cores with HT)
100 Gbit/s line rate, 500 UDP streams @ ~200 Mbit/s each
48 traffic classes, flower used for steering
No shaping (rate limits set to 4 Gbit/s per TC) - checking for max
throughput.
Baseline: 98.7 Gbps, 8.25 Mpps
HTB: 6.7 Gbps, 0.56 Mpps
HTB offload: 95.6 Gbps, 8.00 Mpps
Limitations:
1. 256 leaf nodes, 3 levels of depth.
2. Granularity for ceil is 1 Mbit/s. Rates are converted to weights, and
the bandwidth is split among the siblings according to these weights.
Other parameters for classes are not supported.
Ethtool statistics support for QoS SQs are also added. The counters are
called qos_txN_*, where N is the QoS queue number (starting from 0, the
numeration is separate from the normal SQs), and * is the counter name
(the counters are the same as for the normal SQs).
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
To handle SF port management outside of the eswitch as independent
software layer, introduce eswitch notifier APIs so that mlx5 upper
layer who wish to support sf port management in switchdev mode can
perform its task whenever eswitch mode is set to switchdev or before
eswitch is disabled.
Initialize sf port table on such eswitch event.
Add SF port add and delete functionality in switchdev mode.
Destroy all SF ports when eswitch is disabled.
Expose SF port add and delete to user via devlink commands.
$ devlink dev eswitch set pci/0000:06:00.0 mode switchdev
$ devlink port show
pci/0000:06:00.0/65535: type eth netdev ens2f0np0 flavour physical port 0 splittable false
$ devlink port add pci/0000:06:00.0 flavour pcisf pfnum 0 sfnum 88
pci/0000:06:00.0/32768: type eth netdev eth6 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
function:
hw_addr 00:00:00:00:00:00 state inactive opstate detached
$ devlink port show ens2f0npf0sf88
pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
function:
hw_addr 00:00:00:00:00:00 state inactive opstate detached
or by its unique port index:
$ devlink port show pci/0000:06:00.0/32768
pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
function:
hw_addr 00:00:00:00:00:00 state inactive opstate detached
$ devlink port show ens2f0npf0sf88 -jp
{
"port": {
"pci/0000:06:00.0/32768": {
"type": "eth",
"netdev": "ens2f0npf0sf88",
"flavour": "pcisf",
"controller": 0,
"pfnum": 0,
"sfnum": 88,
"external": false,
"splittable": false,
"function": {
"hw_addr": "00:00:00:00:00:00",
"state": "inactive",
"opstate": "detached"
}
}
}
}
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Vu Pham <vuhuong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Add auxiliary device driver for mlx5 subfunction auxiliary device.
A mlx5 subfunction is similar to PCI PF and VF. For a subfunction
an auxiliary device is created.
As a result, when mlx5 SF auxiliary device binds to the driver,
its netdev and rdma device are created, they appear as
$ ls -l /sys/bus/auxiliary/devices/
mlx5_core.sf.4 -> ../../../devices/pci0000:00/0000:00:03.0/0000:06:00.0/mlx5_core.sf.4
$ ls -l /sys/class/net/eth1/device
/sys/class/net/eth1/device -> ../../../mlx5_core.sf.4
$ cat /sys/bus/auxiliary/devices/mlx5_core.sf.4/sfnum
88
$ devlink dev show
pci/0000:06:00.0
auxiliary/mlx5_core.sf.4
$ devlink port show auxiliary/mlx5_core.sf.4/1
auxiliary/mlx5_core.sf.4/1: type eth netdev p0sf88 flavour virtual port 0 splittable false
$ rdma link show mlx5_0/1
link mlx5_0/1 state ACTIVE physical_state LINK_UP netdev p0sf88
$ rdma dev show
8: rocep6s0f1: node_type ca fw 16.29.0550 node_guid 248a:0703:00b3:d113 sys_image_guid 248a:0703:00b3:d112
13: mlx5_0: node_type ca fw 16.29.0550 node_guid 0000:00ff:fe00:8888 sys_image_guid 248a:0703:00b3:d112
In future, devlink device instance name will adapt to have sfnum
annotation using either an alias or as devlink instance name described
in RFC [1].
[1] https://lore.kernel.org/netdev/20200519092258.GF4655@nanopsycho/
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Vu Pham <vuhuong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Introduce API to add and delete an auxiliary device for an SF.
Each SF has its own dedicated window in the PCI BAR 2.
SF device is similar to PCI PF and VF that supports multiple class of
devices such as net, rdma and vdpa.
SF device will be added or removed in subsequent patch during SF
devlink port function state change command.
A subfunction device exposes user supplied subfunction number which will
be further used by systemd/udev to have deterministic name for its
netdevice and rdma device.
An mlx5 subfunction auxiliary device example:
$ devlink dev eswitch set pci/0000:06:00.0 mode switchdev
$ devlink port show
pci/0000:06:00.0/65535: type eth netdev ens2f0np0 flavour physical port 0 splittable false
$ devlink port add pci/0000:06:00.0 flavour pcisf pfnum 0 sfnum 88
pci/0000:08:00.0/32768: type eth netdev eth6 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
function:
hw_addr 00:00:00:00:00:00 state inactive opstate detached
$ devlink port show ens2f0npf0sf88
pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
function:
hw_addr 00:00:00:00:88:88 state inactive opstate detached
$ devlink port function set ens2f0npf0sf88 hw_addr 00:00:00:00:88:88 state active
On activation,
$ ls -l /sys/bus/auxiliary/devices/
mlx5_core.sf.4 -> ../../../devices/pci0000:00/0000:00:03.0/0000:06:00.0/mlx5_core.sf.4
$ cat /sys/bus/auxiliary/devices/mlx5_core.sf.4/sfnum
88
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Vu Pham <vuhuong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
vhca state events indicates change in the state of the vhca that may
occur due to a SF allocation, deallocation or enabling/disabling the
SF HCA.
Introduce vhca state event handler which will be used by SF devlink
port manager and SF hardware id allocator in subsequent patches
to act on the event.
This enables single entity to subscribe, query and rearm the event
for a function.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Vu Pham <vuhuong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
This reverts commit fbdd0049d9.
Due to commit in fixes tag, netdevice events were received only in one net
namespace of mlx5_core_dev. Due to this when netdevice events arrive in
net namespace other than net namespace of mlx5_core_dev, they are missed.
This results in empty GID table due to RDMA device being detached from its
net device.
Hence, revert back to receive netdevice events in all net namespaces to
restore back RDMA functionality in non init_net net namespace. The
deadlock will have to be addressed in another patch.
Fixes: fbdd0049d9 ("RDMA/mlx5: Fix devlink deadlock on net namespace deletion")
Link: https://lore.kernel.org/r/20210117092633.10690-1-leon@kernel.org
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Expose ifc bits for query_modify_header_context_in to be used by DEVX.
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Add capability bit to test whether reg_c value is preserved on
recirculation.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Expose firmware indication that it supports setting eswitch uplink state
to follow (follow the physical link). Condition setting the eswitch
uplink admin-state with this capability bit. Older FW may not support
the uplink state setting.
Fixes: 7d0314b11c ("net/mlx5e: Modify uplink state on interface up/down")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
MLX5_GENERAL_OBJECT_TYPES types bitfield is 64-bit field.
Defining an enum for such bit fields on 32-bit platform results in below
warning.
./include/vdso/bits.h:7:26: warning: left shift count >= width of type [-Wshift-count-overflow]
^
./include/linux/mlx5/mlx5_ifc.h:10716:46: note: in expansion of macro ‘BIT’
MLX5_HCA_CAP_GENERAL_OBJECT_TYPES_SAMPLER = BIT(0x20),
^~~
Use 32-bit friendly BIT_ULL macro.
Fixes: 2a29708916 ("net/mlx5: Add sample offload hardware bits and structures")
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20201213120641.216032-1-leon@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Saeed Mahameed says:
====================
mlx5-next auxbus support
This pull request is targeting net-next and rdma-next branches.
This series provides mlx5 support for auxiliary bus devices.
It starts with a merge commit of tag 'auxbus-5.11-rc1' from
gregkh/driver-core into mlx5-next, then the mlx5 patches that will convert
mlx5 ulp devices (netdev, rdma, vdpa) to use the proper auxbus
infrastructure instead of the internal mlx5 device and interface management
implementation, which Leon is deleting at the end of this patchset.
Link: https://lore.kernel.org/alsa-devel/20201026111849.1035786-1-leon@kernel.org/
Thanks to everyone for the joint effort !
* 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
RDMA/mlx5: Remove IB representors dead code
net/mlx5: Simplify eswitch mode check
net/mlx5: Delete custom device management logic
RDMA/mlx5: Convert mlx5_ib to use auxiliary bus
net/mlx5e: Connect ethernet part to auxiliary bus
vdpa/mlx5: Connect mlx5_vdpa to auxiliary bus
net/mlx5: Register mlx5 devices to auxiliary virtual bus
vdpa/mlx5: Make hardware definitions visible to all mlx5 devices
net/mlx5_core: Clean driver version and name
net/mlx5: Properly convey driver version to firmware
driver core: auxiliary bus: minor coding style tweaks
driver core: auxiliary bus: make remove function return void
driver core: auxiliary bus: move slab.h from include file
Add auxiliary bus support
====================
Link: https://lore.kernel.org/r/20201207053349.402772-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Provide mlx5_core device instead of "priv" pointer while checking
eswith mode.
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
After conversion to use auxiliary bus, all custom device management is
not needed anymore, delete it.
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Move mlx5_vdpa IFC header file to the general include folder, so
mlx5_core will be able to reuse it to check if VDPA is supported
prior to creating an auxiliary device.
As part of this move, update the header file name to mlx5 general
naming scheme.
Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Remove exposed driver version as it was done in other drivers,
so module version will work correctly by displaying the kernel
version for which it is compiled.
And move mlx5_core module name to general include, so auxiliary drivers
will be able to use it as a basis for a name in their device ID tables.
Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
STEs format for Connect-X5 and Connect-X6DX different. Currently, on
Connext-X6DX the SW steering would break at some point when building STEs
w/o giving a proper error message. Fix this by checking the STE format of
the current device when initializing domain: add mlx5_ifc definitions for
Connect-X6DX SW steering, read FW capability to get the current format
version, and check this version when domain is being created.
Fixes: 26d688e33f ("net/mlx5: DR, Add Steering entry (STE) utilities")
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Low level mlx5 updates required by both netdev and rdma trees:
net/mlx5: Treat host PF vport as other (non eswitch manager) vport
net/mlx5: Enable host PF HCA after eswitch is initialized
net/mlx5: Rename peer_pf to host_pf
net/mlx5: Make API mlx5_core_is_ecpf accept const pointer
net/mlx5: Export steering related functions
net/mlx5: Expose other function ifc bits
net/mlx5: Expose IP-in-IP TX and RX capability bits
net/mlx5: Update the hardware interface definition for vhca state
net/mlx5: Update the list of the PCI supported devices
net/mlx5: Avoid exposing driver internal command helpers
net/mlx5: Add ts_cqe_to_dest_cqn related bits
net/mlx5: Add misc4 to mlx5_ifc_fte_match_param_bits
net/mlx5: Check dr mask size against mlx5_match_param size
net/mlx5: Add sampler destination type
net/mlx5: Add sample offload hardware bits and structures
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAl/IOZcACgkQSD+KveBX
+j4J8wgAuxwflrYrbCWXV7LE08J7T7ZHRDE+jEbaZ0Zp9mOsYDDpcifpKwy2EVRf
RKcpMYh/GzAljmEpeWIAlMxmlpXhKWXTDruWCx73r1jvdXf/RU24/zQHa6BjeiDo
rMB8bgiW4a66+z4LcN/U6ahbVM5gScBNEt2sS1OIi9ZInngGVo9FgfhYMpERPNcH
3+mcHulCnGBNbbLwoTllOcgbxexn+xoByukg5Z0ddBJp007DMjzBIWDpDS0y2HaT
jGo1LYONgRc3zoGVmdeu9F+tSsWBIgsaiyGxKj1T/8sZUaNz2TKj9VOiYIj9BLff
cp6GRc88k7HWA4tImSHQiLbK6cx+yA==
=mjvI
-----END PGP SIGNATURE-----
Merge tag 'mlx5-next-2020-12-02' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Saeed Mahameed says:
====================
mlx5-next-2020-12-02
Low level mlx5 updates required by both netdev and rdma trees.
* tag 'mlx5-next-2020-12-02' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
net/mlx5: Treat host PF vport as other (non eswitch manager) vport
net/mlx5: Enable host PF HCA after eswitch is initialized
net/mlx5: Rename peer_pf to host_pf
net/mlx5: Make API mlx5_core_is_ecpf accept const pointer
net/mlx5: Export steering related functions
net/mlx5: Expose other function ifc bits
net/mlx5: Expose IP-in-IP TX and RX capability bits
net/mlx5: Update the hardware interface definition for vhca state
net/mlx5: Update the list of the PCI supported devices
net/mlx5: Avoid exposing driver internal command helpers
net/mlx5: Add ts_cqe_to_dest_cqn related bits
net/mlx5: Add misc4 to mlx5_ifc_fte_match_param_bits
net/mlx5: Check dr mask size against mlx5_match_param size
net/mlx5: Add sampler destination type
net/mlx5: Add sample offload hardware bits and structures
====================
Link: https://lore.kernel.org/r/20201203011010.213440-1-saeedm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When eswitch manager is running on ECPF, host PF should be treated
as non eswitch manager port, similar to other VF vports.
Fail to do so, results in firmware treating PF's vport as ECPF
vport for eswitch ACL tables.
Non zero check to figure out if a given vport is other vport or not
is not sufficient becase PF vport number = 0 on ECPF.
Hence, create esw acl tables with an attribute of other vport.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
To match the hardware spec, rename peer_pf to host_pf.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Bodong Wang <bodong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Subsequent patch implements helper API which has mlx5_core_dev
as const pointer, make its caller API too const *.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Bodong Wang <bodong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Expose other function ifc bits to enable setting HCA caps on behalf of
other function.
In addition, expose vhca_resource_manager bit to control whether the
other function functionality is supported by firmware.
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Expose FW indication that it supports stateless offloads for IP over IP
tunneled packets per direction. In some HW like ConnectX-4 IP-in-IP
support is not symmetric, it supports steering on the inner header but
it doesn't TX-Checksum and TSO. Add IP-in-IP capability per direction to
cover this case as well.
Note: only if both indications are turned on, the global
tunnel_stateless_ip_over_ip is on too.
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Update the hardware interface definitions to query and modify vhca
state, related EQE and event code.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
mlx5 command init and cleanup routines are internal to mlx5_core driver.
Hence, avoid exporting them and move their definition to mlx5_core
driver's internal file mlx5_core.h
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Add a bit in HCA capabilities layout to indicate if ts_cqe_to_dest_cqn is
supported.
In addition, add ts_cqe_to_dest_cqn field to SQ context, for driver to
set the actual CQN.
Signed-off-by: Eran Ben Elisha <eranbe@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Add misc4 match params to enable matching on prog_sample_fields.
Signed-off-by: Muhammad Sammar <muhammads@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The flow sampler object is a new destination type. Add a new member
for the flow destination.
Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Hardware introduces flow sampler object for packet sampling.
Add the offload hardware bits and structures.
Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Hi Linus,
Please, pull the following patches that replace zero-length arrays with
flexible-array members.
Thanks
--
Gustavo
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEkmRahXBSurMIg1YvRwW0y0cG2zEFAl+cjRUACgkQRwW0y0cG
2zGWAhAAjUfTsAmXWhKNaWFSCYR0Q822puTUWOKfiBd+jjGaO04luTtr2gjv2Dkb
Vgad8H4N8oZU79xfh5JZ5PUyScaso8wE6ZJTh2PLKXpKmNd213f5x/pIt78CCDTa
Y1L/eR41mmveTL3VNS3sf6WaZpT9owxJKGIY8JgdiOmSjxJQpX5zdaC1KYso4eXr
lIXIRo9VLEmVLhhHhZi+QmX6+aQ05E1D9K0ENe4/uEnRsV525W78iwZ4fYeLzr+A
krEOdgx6sPgzajPYnHoayrrcKNKxD5YY1SWuVSm2tqYYIhlRoK3f5xgLOd10RiHE
YMgx8aWzGmGJwoUhgp1bo/l9EZ7O8OWRqM/GOP4x6Wgjdhqw2x5jgskmhsKNGEXu
/BlbS+qL5aUrMCxhvNbApuZW6xBiBbva76MH3vU9vFhZbVz1CHLQdGI0tfxggYWS
jc2UPgoxL9OQlf3jSc+gK7RMFhBGNWn2Aiy8GQas3BxPYXuYPvwOj+irDOG/qZ9D
VZ5swUw4+th+DsF5K53mEFeLv0fONMgL9Ka5bNR6+k6HG0WNLYYVOiet3xYUDo1f
eZbMZthfc+QW7R8cwG0WuFk6rC6mLqE+A9nQuLZoJD+VMuJd4pwW9+6EW8nDX08w
FS4/o92xUFJfOCgaLRS61FSAuSmFENieN+yoKMK/Uf6PJVdNMb4=
=vyu3
-----END PGP SIGNATURE-----
Merge tag 'flexible-array-conversions-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux
Pull more flexible-array member conversions from Gustavo A. R. Silva:
"Replace zero-length arrays with flexible-array members"
* tag 'flexible-array-conversions-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux:
printk: ringbuffer: Replace zero-length array with flexible-array member
net/smc: Replace zero-length array with flexible-array member
net/mlx5: Replace zero-length array with flexible-array member
mei: hw: Replace zero-length array with flexible-array member
gve: Replace zero-length array with flexible-array member
Bluetooth: btintel: Replace zero-length array with flexible-array member
scsi: target: tcmu: Replace zero-length array with flexible-array member
ima: Replace zero-length array with flexible-array member
enetc: Replace zero-length array with flexible-array member
fs: Replace zero-length array with flexible-array member
Bluetooth: Replace zero-length array with flexible-array member
params: Replace zero-length array with flexible-array member
tracepoint: Replace zero-length array with flexible-array member
platform/chrome: cros_ec_proto: Replace zero-length array with flexible-array member
platform/chrome: cros_ec_commands: Replace zero-length array with flexible-array member
mailbox: zynqmp-ipi-message: Replace zero-length array with flexible-array member
dmaengine: ti-cppi5: Replace zero-length array with flexible-array member
When a mlx5 core devlink instance is reloaded in different net namespace,
its associated IB device is deleted and recreated.
Example sequence is:
$ ip netns add foo
$ devlink dev reload pci/0000:00:08.0 netns foo
$ ip netns del foo
mlx5 IB device needs to attach and detach the netdevice to it through the
netdev notifier chain during load and unload sequence. A below call graph
of the unload flow.
cleanup_net()
down_read(&pernet_ops_rwsem); <- first sem acquired
ops_pre_exit_list()
pre_exit()
devlink_pernet_pre_exit()
devlink_reload()
mlx5_devlink_reload_down()
mlx5_unload_one()
[...]
mlx5_ib_remove()
mlx5_ib_unbind_slave_port()
mlx5_remove_netdev_notifier()
unregister_netdevice_notifier()
down_write(&pernet_ops_rwsem);<- recurrsive lock
Hence, when net namespace is deleted, mlx5 reload results in deadlock.
When deadlock occurs, devlink mutex is also held. This not only deadlocks
the mlx5 device under reload, but all the processes which attempt to
access unrelated devlink devices are deadlocked.
Hence, fix this by mlx5 ib driver to register for per net netdev notifier
instead of global one, which operats on the net namespace without holding
the pernet_ops_rwsem.
Fixes: 4383cfcc65 ("net/mlx5: Add devlink reload")
Link: https://lore.kernel.org/r/20201026134359.23150-1-parav@nvidia.com
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
The typical set of driver updates across the subsystem:
- Driver minor changes and bug fixes for mlx5, efa, rxe, vmw_pvrdma, hns,
usnic, qib, qedr, cxgb4, hns, bnxt_re
- Various rtrs fixes and updates
- Bug fix for mlx4 CM emulation for virtualization scenarios where MRA
wasn't working right
- Use tracepoints instead of pr_debug in the CM code
- Scrub the locking in ucma and cma to close more syzkaller bugs
- Use tasklet_setup in the subsystem
- Revert the idea that 'destroy' operations are not allowed to fail at
the driver level. This proved unworkable from a HW perspective.
- Revise how the umem API works so drivers make fewer mistakes using it
- XRC support for qedr
- Convert uverbs objects RWQ and MW to new the allocation scheme
- Large queue entry sizes for hns
- Use hmm_range_fault() for mlx5 On Demand Paging
- uverbs APIs to inspect the GID table instead of sysfs
- Move some of the RDMA code for building large page SGLs into
lib/scatterlist
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAl+J37MACgkQOG33FX4g
mxrKfRAAnIecwdE8df0yvVU5k0Eg6qVjMy9MMHq4va9m7g6GpUcNNI0nIlOASxH2
l+9vnUQS3ebgsPeECaDYzEr0hh/u53+xw2g4WV5ts/hE8KkQ6erruXb9kasCe8yi
5QWJ9K36T3c03Cd3EeH6JVtytAxuH42ombfo9BkFLPVyfG/R2tsAzvm5pVi73lxk
46wtU1Bqi4tsLhyCbifn1huNFGbHp08OIBPAIKPUKCA+iBRPaWS+Dpi+93h3g3Bp
oJwDhL9CBCGcHM+rKWLzek3Dy87FnQn7R1wmTpUFwkK+4AH3U/XazivhX035w1vL
YJyhakVU0kosHlX9hJTNKDHJGkt0YEV2mS8dxAuqilFBtdnrVszb5/MirvlzC310
/b5xCPSEusv9UVZV0G4zbySVNA9knZ4YaRiR3VDVMLKl/pJgTOwEiHIIx+vs3ejk
p8GRWa1SjXw5LfZEQcq39J689ljt6xjCTonyuBSv7vSQq5v8pjBxvHxiAe2FIa2a
ZyZeSCYoSh0SwJQukO2VO7aprhHP3TcCJ/987+X03LQ8tV2VWPktHqm62YCaDcOl
fgiQuQdPivRjDDkJgMfDWDGKfZeHoWLKl5XsJhWByt0lablVrsvc+8ylUl1UI7gI
16hWB/Qtlhfwg10VdApn+aOFpIS+s5P4XIp8ik57MZO+VeJzpmE=
=LKpl
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma updates from Jason Gunthorpe:
"A usual cycle for RDMA with a typical mix of driver and core subsystem
updates:
- Driver minor changes and bug fixes for mlx5, efa, rxe, vmw_pvrdma,
hns, usnic, qib, qedr, cxgb4, hns, bnxt_re
- Various rtrs fixes and updates
- Bug fix for mlx4 CM emulation for virtualization scenarios where
MRA wasn't working right
- Use tracepoints instead of pr_debug in the CM code
- Scrub the locking in ucma and cma to close more syzkaller bugs
- Use tasklet_setup in the subsystem
- Revert the idea that 'destroy' operations are not allowed to fail
at the driver level. This proved unworkable from a HW perspective.
- Revise how the umem API works so drivers make fewer mistakes using
it
- XRC support for qedr
- Convert uverbs objects RWQ and MW to new the allocation scheme
- Large queue entry sizes for hns
- Use hmm_range_fault() for mlx5 On Demand Paging
- uverbs APIs to inspect the GID table instead of sysfs
- Move some of the RDMA code for building large page SGLs into
lib/scatterlist"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (191 commits)
RDMA/ucma: Fix use after free in destroy id flow
RDMA/rxe: Handle skb_clone() failure in rxe_recv.c
RDMA/rxe: Move the definitions for rxe_av.network_type to uAPI
RDMA: Explicitly pass in the dma_device to ib_register_device
lib/scatterlist: Do not limit max_segment to PAGE_ALIGNED values
IB/mlx4: Convert rej_tmout radix-tree to XArray
RDMA/rxe: Fix bug rejecting all multicast packets
RDMA/rxe: Fix skb lifetime in rxe_rcv_mcast_pkt()
RDMA/rxe: Remove duplicate entries in struct rxe_mr
IB/hfi,rdmavt,qib,opa_vnic: Update MAINTAINERS
IB/rdmavt: Fix sizeof mismatch
MAINTAINERS: CISCO VIC LOW LATENCY NIC DRIVER
RDMA/bnxt_re: Fix sizeof mismatch for allocation of pbl_tbl.
RDMA/bnxt_re: Use rdma_umem_for_each_dma_block()
RDMA/umem: Move to allocate SG table from pages
lib/scatterlist: Add support in dynamic allocation of SG table from pages
tools/testing/scatterlist: Show errors in human readable form
tools/testing/scatterlist: Rejuvenate bit-rotten test
RDMA/ipoib: Set rtnl_link_ops for ipoib interfaces
RDMA/uverbs: Expose the new GID query API to user space
...
Add new FTE in TX IPsec FT per IPsec state. It has the
same matching criteria as the RX steering rule.
The IPsec FT is created/destroyed when the first/last rule
is added/deleted respectively.
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Boris Pismenny <borisp@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Add new namespace that represents the NIC TX domain.
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Raed Salem <raeds@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Firmware live patch event notifies the driver that the firmware was just
updated using live patch. In such case the driver should not reload or
re-initiate entities, part to updating the firmware version and
re-initiate the firmware tracer which can be updated by live patch with
new strings database to help debugging an issue.
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Once the driver gets sync_reset_request from firmware it prepares for the
coming reset and sends acknowledge.
After getting this event the driver expects device reset, either it will
trigger PCI reset on sync_reset_now event or such PCI reset will be
triggered by another PF of the same device. So it moves to reset
requested mode and if it gets PCI reset triggered by the other PF it
detect the reset and reloads.
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Rejecting non-native endian BTF overlapped with the addition
of support for it.
The rest were more simple overlapping changes, except the
renesas ravb binding update, which had to follow a file
move as well as a YAML conversion.
Signed-off-by: David S. Miller <davem@davemloft.net>
In case of pci is offline reclaim_pages_cmd() will still try to call
the FW to release FW pages, cmd_exec() in this case will return a silent
success without actually calling the FW.
This is wrong and will cause page leaks, what we should do is to detect
pci offline or command interface un-available before tying to access the
FW and manually release the FW pages in the driver.
In this patch we share the code to check for FW command interface
availability and we call it in sensitive places e.g. reclaim_pages_cmd().
Alternative fix:
1. Remove MLX5_CMD_OP_MANAGE_PAGES form mlx5_internal_err_ret_value,
command success simulation list.
2. Always Release FW pages even if cmd_exec fails in reclaim_pages_cmd().
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Upon command completion timeout, driver simulates a forced command
completion. In a rare case where real interrupt for that command arrives
simultaneously, it might release the command entry while the forced
handler might still access it.
Fix that by adding an entry refcount, to track current amount of allowed
handlers. Command entry to be released only when this refcount is
decremented to zero.
Command refcount is always initialized to one. For callback commands,
command completion handler is the symmetric flow to decrement it. For
non-callback commands, it is wait_func().
Before ringing the doorbell, increment the refcount for the real completion
handler. Once the real completion handler is called, it will decrement it.
For callback commands, once the delayed work is scheduled, increment the
refcount. Upon callback command completion handler, we will try to cancel
the timeout callback. In case of success, we need to decrement the callback
refcount as it will never run.
In addition, gather the entry index free and the entry free into a one
flow for all command types release.
Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapters")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Currently only 256 vports can be supported as only 8 bits are
reserved for them and 8 bits are reserved for vhca_ids in
metadata reg c0. To support more than 256 vports, replace
vhca_id with a unique shorter 4-bit PF number which covers
upto 16 PF's. Use remaining 12 bits for vports ranging 1-4095.
This will continue to generate unique metadata even if
multiple PCI devices have same switch_id.
Signed-off-by: sunils <sunils@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Vu Pham <vuhuong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Leon Romanovsky says:
====================
This series from Alex extends software steering interface to support
devices with extra capability "sw_owner_2" which will replace existing
"sw_owner".
====================
Based on the mlx5-next branch at
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
due to dependencies.
* branch 'mlx5_sw_owner_v2:
RDMA/mlx5: Expose TIR and QP ICM address for sw_owner_v2 devices
RDMA/mlx5: Allow DM allocation for sw_owner_v2 enabled devices
RDMA/mlx5: Add sw_owner_v2 bit capability
Leon Romanovsky says:
====================
IBTA declares speed as 16 bits, but kernel stores it in u8. This series
fixes in-kernel declaration while keeping external interface intact.
====================
Based on the mlx5-next branch at
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
due to dependencies.
* branch 'mlx5_active_speed':
RDMA: Fix link active_speed size
RDMA/mlx5: Delete duplicated mlx5_ptys_width enum
net/mlx5: Refactor query port speed functions
Added sw_owner_v2 which will be enabled for future devices,
replacing sw_owner bit.
Signed-off-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Combine two same enums to avoid duplication.
Signed-off-by: Aharon Landau <aharonl@mellanox.com>
Reviewed-by: Michael Guralnik <michaelgur@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
The functions mlx5_query_port_link_width_oper and
mlx5_query_port_ib_proto_oper are always called together, so combine them
to a new function called mlx5_query_port_oper to avoid duplication.
And while the mlx5i_get_port_settings is the same as
mlx5_query_port_oper therefore let's remove it.
According to the IB spec link_width_oper and ib_proto_oper should be u16
and not as written u8, so perform casting as a preparation to cross-RDMA
patch which will fix that type for all drivers in the RDMA subsystem.
Fixes: ada68c31ba ("net/mlx5: Introduce a new header file for physical port functions")
Signed-off-by: Aharon Landau <aharonl@mellanox.com>
Reviewed-by: Michael Guralnik <michaelgur@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Add CQE compression support for completions of packets that span
multiple strides in a Striding RQ, per the HW capability.
In our memory model, we use small strides (256B as of today) for the
non-linear SKB mode. This feature allows CQE compression to work also
for multiple strides packets. In this case decompressing the mini CQE
array will use stride index provided by HW as part of the mini CQE.
Before this feature, compression was possible only for single-strided
packets, i.e. for packets of size up to 256 bytes when in non-linear
mode, and the index was maintained by SW.
This feature is supported for ConnectX-5 and above.
Feature performance test:
This was whitebox-tested, we reduced the PCI speed from 125Gb/s to
62.5Gb/s to overload pci and manipulated mlx5 driver to drop incoming
packets before building the SKB to achieve low cpu utilization.
Outcome is low cpu utilization and bottleneck on pci only.
Test setup:
Server: Intel(R) Xeon(R) Silver 4108 CPU @ 1.80GHz server, 32 cores
NIC: ConnectX-6 DX.
Sender side generates 300 byte packets at full pci bandwidth.
Receiver side configuration:
Single channel, one cpu processing with one ring allocated. Cpu utilization
is ~20% while pci bandwidth is fully utilized.
For the generated traffic and interface MTU of 4500B (to activate the
non-linear SKB mode), packet rate improvement is about 19% from ~17.6Mpps
to ~21Mpps.
Without this feature, counters show no CQE compression blocks for
this setup, while with the feature, counters show ~20.7Mpps compressed CQEs
in ~500K compression blocks.
Signed-off-by: Ofer Levi <oferle@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Clock struct is part of struct mlx5_core_dev. Code was inconsistent, on
some cases used container_of and on another used clock->mdev.
Align code to use container_of amd remove clock->mdev pointer.
While here, fix reverse xmas tree coding style.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
When DCT QPs work in RoCE LAG mode:
1. DCT creation is allowed only when it is supported
2. The "port" of a DCT QP is assigned in a round-robin way
Link: https://lore.kernel.org/r/20200818115245.700581-3-leon@kernel.org
Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Smaller set of RDMA updates. A smaller number of 'big topics' with the
majority of changes being driver updates.
- Driver updates for hfi1, rxe, mlx5, hns, qedr, usnic, bnxt_re
- Removal of dead or redundant code across the drivers
- RAW resource tracker dumps to include a device specific data blob for
device objects to aide device debugging
- Further advance the IOCTL interface, remove the ability to turn it off.
Add QUERY_CONTEXT, QUERY_MR, and QUERY_PD commands
- Remove stubs related to devices with no pkey table
- A shared CQ scheme to allow multiple ULPs to share the CQ rings of a
device to give higher performance
- Several more static checker, syzkaller and rare crashers fixed
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAl8sSA0ACgkQOG33FX4g
mxpp1w/8Df/KIB38PVHpKraIW10bX03KsXwoskMYCA+ITYWM5ce+P7YF+yXXGs69
Vh2vUYHlr1RvqXQkq3Y3LjzCPKTYFuNFVQRZF1LrfbfOpSS9aoQqoxwgKs08dibm
YDeRwueWneksWhXeEZLA0QoKd4kEWrScA/n7VGYQ4YcWw8FLKa9t6OMSGivCrFLu
QA+sA9nytrvMWC5uJUCdeVwlRnoaICPYHmM5yafOykPyEciRw2jU1kzTRVy5Z0Hu
iCsXm2lJPcVoMgSjW6SgktY3oBkQeSu3ZZesT3eTM6FJsoDYkuSiKjNmWSZjW1zv
x6CFGjVVin41rN4FMTeqqnwYoML9Q/obbyHvBHs5MTd5J8tLDhesQj3Ev7CUaUed
b0s38v+oEL1w22nkOChfeyfh7eLcy3yiszqvkIU9ABk8mF0p1guGQYsfguzbsq0K
3ZRw/361SxCUBvU6P8CdQbIJlhkH+Un7d81qyt+rhLgaZYm/N+d8auIKUxP1jCxh
q9hss2Cj2U9eZsA/wGNqV1LNazfEAAj/5qjItMirbRd90FL8h+AP2LfJfC7p+id3
3BfOui0JbZqNTTl4ftTxPuxtWDEdTPgwi7JvQd/be9HRlSV8DYCSMUzYFn8A+Zya
cbxjxFuBJWmF+y9csDIVBTdFi+j9hO6notw+G89NznuB3QlPl50=
=0z2L
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma updates from Jason Gunthorpe:
"A quiet cycle after the larger 5.8 effort. Substantially cleanup and
driver work with a few smaller features this time.
- Driver updates for hfi1, rxe, mlx5, hns, qedr, usnic, bnxt_re
- Removal of dead or redundant code across the drivers
- RAW resource tracker dumps to include a device specific data blob
for device objects to aide device debugging
- Further advance the IOCTL interface, remove the ability to turn it
off. Add QUERY_CONTEXT, QUERY_MR, and QUERY_PD commands
- Remove stubs related to devices with no pkey table
- A shared CQ scheme to allow multiple ULPs to share the CQ rings of
a device to give higher performance
- Several more static checker, syzkaller and rare crashers fixed"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (121 commits)
RDMA/mlx5: Fix flow destination setting for RDMA TX flow table
RDMA/rxe: Remove pkey table
RDMA/umem: Add a schedule point in ib_umem_get()
RDMA/hns: Fix the unneeded process when getting a general type of CQE error
RDMA/hns: Fix error during modify qp RTS2RTS
RDMA/hns: Delete unnecessary memset when allocating VF resource
RDMA/hns: Remove redundant parameters in set_rc_wqe()
RDMA/hns: Remove support for HIP08_A
RDMA/hns: Refactor hns_roce_v2_set_hem()
RDMA/hns: Remove redundant hardware opcode definitions
RDMA/netlink: Remove CAP_NET_RAW check when dump a raw QP
RDMA/include: Replace license text with SPDX tags
RDMA/rtrs: remove WQ_MEM_RECLAIM for rtrs_wq
RDMA/rtrs-clt: add an additional random 8 seconds before reconnecting
RDMA/cma: Execute rdma_cm destruction from a handler properly
RDMA/cma: Remove unneeded locking for req paths
RDMA/cma: Using the standard locking pattern when delivering the removal event
RDMA/cma: Simplify DEVICE_REMOVAL for internal_id
RDMA/efa: Add EFA 0xefa1 PCI ID
RDMA/efa: User/kernel compatibility handshake mechanism
...
When setting the PF interface up/down, notify the firmware to update
uplink state via MODIFY_VPORT_STATE, when E-Switch is enabled.
This behavior will prevent sending traffic out on uplink port when PF is
down, such as sending traffic from a VF interface which is still up.
Currently when calling mlx5e_open/close(), the driver only sends PAOS
command to notify the firmware to set the physical port state to
up/down, however, it is not sufficient. When VF is in "auto" state, it
follows the uplink state, which was not updated on mlx5e_open/close()
before this patch.
When switchdev mode is enabled and uplink representor is first enabled,
set the uplink port state value back to its FW default "AUTO".
Fixes: 63bfd399de ("net/mlx5e: Send PAOS command on interface up/down")
Signed-off-by: Ron Diskin <rondi@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Per page request event, FW request to allocated or release pages for a
single function. Driver maintains FW pages object per function, so there
is no need to hold one global page data-base. Instead, have a page
data-base per function, which will improve performance release flow in all
cases, especially for "release all pages".
As the range of function IDs is large and not sequential, use xarray to
store a per function ID page data-base, where the function ID is the key.
Upon first allocation of a page to a function ID, create the page
data-base per function. This data-base will be released only at pagealloc
mechanism cleanup.
NIC: ConnectX-4 Lx
CPU: Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
Test case: 32 VFs, measure release pages on one VF as part of FLR
Before: 0.021 Sec
After: 0.014 Sec
The improvement depends on amount of VFs and memory utilization
by them. Time measurements above were taken from idle system.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Up to ConnectX-7 UMR is not used when user passes relaxed ordering access
flag. ConnectX-7 supports setting relaxed ordering read/write mkey
attribute by UMR, indicated by new HCA capabilities.
With ConnectX-7 driver uses UMR when user set relaxed ordering access
flag, in contrast to previous silicon models. Specifically it includes
setting relvant flags of mkey context mask in UMR control segment, and
relaxed ordering write and read flags in UMR mkey context segment.
Link: https://lore.kernel.org/r/20200716105248.1423452-4-leon@kernel.org
Signed-off-by: Meir Lichtinger <meirl@mellanox.com>
Reviewed-by: Michael Guralnik <michaelgur@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Use generic mlx5 structure defined in mlx5_ifc.h to represent ConnectX
device data structures instead of using structure defined specifically for
mlx5_ib module.
Link: https://lore.kernel.org/r/20200716105248.1423452-3-leon@kernel.org
Signed-off-by: Meir Lichtinger <meirl@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Up to ConnectX-7 setting mkey relaxed ordering read/write attributes
by UMR is not supported. ConnectX-7 supports this option, which is
indicated by two new HCA capabilities - relaxed_ordering_write_umr
and relaxed_ordering_read_umr.
Signed-off-by: Meir Lichtinger <meirl@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
This to set the base for downstream patches to support
the new IPsec implementation of the Connect-X family.
Following modifications made:
- Remove accel layer dependency from MLX5_FPGA_IPSEC.
- Introduce accel_ipsec_ops, each IPsec device will
have to support these ops.
Signed-off-by: Raed Salem <raeds@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Rename mlx5_ifc_device_virtio_emulation_cap_bits to
mlx5_ifc_virtio_emulation_cap_bits to match names produced by the
tools producing these auto generated files.
In addition missing capabilities that will be required by VDPA
implementation.
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
VDPA is a new interface that will be added in subsequent patches. It
uses mlx5 core devices and resources. Add an interface type for it.
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
mlx5_fill_page_frag_array() is used to populate dma addresses to
resources that require it, such as QPs, RQs etc. When the resource is
used, PA list permissions are ignored. For resources that use MTT list,
the user is required to provide the access rights. Subsequent patches
use resources that require MTT lists, so modify API and implementation
to support that.
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Device unit for port buffers size, xoff_threshold and xon_threshold is
cells. Fix a bug in driver where cell unit size was hard-coded to
128 bytes. This hard-coded value is buggy, as it is wrong for some hardware
versions.
Driver to read cell size from SBCAM register and translate bytes to cell
units accordingly.
In order to fix the bug, this patch exposes SBCAM (Shared buffer
capabilities mask) layout and defines.
If SBCAM.cap_cell_size is valid, use it for all bytes to cells
calculations. If not valid, fallback to 128.
Cell size do not change on the fly per device. Instead of issuing SBCAM
access reg command every time such translation is needed, cache it in
mlx5e_dcbx as part of mlx5e_dcbnl_initialize(). Pass dcbx.port_buff_cell_sz
as a param to every function that needs bytes to cells translation.
While fixing the bug, move MLX5E_BUFFER_CELL_SHIFT macro to
en_dcbnl.c, as it is only used by that file.
Fixes: 0696d60853 ("net/mlx5e: Receive buffer configuration")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
This patch exposes new link modes using 100Gbps per lane, including 100G,
200G and 400G modes.
Signed-off-by: Meir Lichtinger <meirl@mellanox.com>
Reviewed-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If in the process of creating the underlay QP for an IPoIB interface
the user has set the address and specifically the 1st-3rd bytes
representing the QP number, use the requested QP number when creating
the underlay QP.
For a user to be able to request a QP number on QP creation, the MKEY_BY_NAME
NVCONFIG should be set. As mkey_by_name and qp_by_name are coupled in FW.
This requires driver to query the mkey_by_name max cap during initialization
and set the current cap if it was enabled in FW.
Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Add explicit WQE segment structures for the TLS static and progress
params.
According to the HW spec, TISN is not part of the progress params context,
take it out of it.
Rename the control segment tisn field as it could hold either a TIS or
a TIR number.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Minor overlapping changes in xfrm_device.c, between the double
ESP trailing bug fix setting the XFRM_INIT flag and the changes
in net-next preparing for bonding encryption support.
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce new resource dump segments - PRM_QUERY_QP,
PRM_QUERY_CQ and PRM_QUERY_MKEY. These segments contains the resource
dump in PRM query format.
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Export some of the resource dump API. mlx5_ib driver will use
it in downstream patches.
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Since none of the functions need to modify the input mac address,
constify them.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Missed steps during ECE handshake left userspace application with less
options for the ECE handshake. Pass ECE options in the additional
transitions.
Fixes: 50aec2c313 ("RDMA/mlx5: Return ECE data after modify QP")
Link: https://lore.kernel.org/r/20200616104536.2426384-1-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
A few large, long discussed works this time. The RNBD block driver has
been posted for nearly two years now, and the removal of FMR has been a
recurring discussion theme for a long time. The usual smattering of
features and bug fixes.
- Various small driver bugs fixes in rxe, mlx5, hfi1, and efa
- Continuing driver cleanups in bnxt_re, hns
- Big cleanup of mlx5 QP creation flows
- More consistent use of src port and flow label when LAG is used and a
mlx5 implementation
- Additional set of cleanups for IB CM
- 'RNBD' network block driver and target. This is a network block RDMA
device specific to ionos's cloud environment. It brings strong multipath
and resiliency capabilities.
- Accelerated IPoIB for HFI1
- QP/WQ/SRQ ioctl migration for uverbs, and support for multiple async fds
- Support for exchanging the new IBTA defiend ECE data during RDMA CM
exchanges
- Removal of the very old and insecure FMR interface from all ULPs and
drivers. FRWR should be preferred for at least a decade now.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAl7X/IwACgkQOG33FX4g
mxp2uw/+MI2S/aXqEBvZfTT8yrkAwqYezS0VeTDnwH/T6UlTMDhHVN/2Ji3tbbX3
FEKT1i2mnAL5RqUAL1lr9g4sG/bVozrpN46Ws5Lu9dTbIPLKTNPWDuLFQDUShKY7
OyMI/bRx6anGnsOy20iiBqnrQbrrZj5TECgnmrkAl62QFdcl7aBWe/yYjy4CT11N
ub+aBXBREN1F1pc0HIjd2tI+8gnZc+mNm1LVVDRH9Capun/pI26qDNh7e6QwGyIo
n8ItraC8znLwv/nsUoTE7/JRcsTEe6vJI26PQmczZfNJs/4O65G7fZg0eSBseZYi
qKf7Uwtb3qW0R7jRUMEgFY4DKXVAA0G2ph40HXBuzOSsqlT6HqYMO2wgG8pJkrTc
qAjoSJGzfAHIsjxzxKI8wKuufCddjCm30VWWU7EKeriI6h1J0uPVqKkQMfYBTkik
696eZSBycAVgwayOng3XaehiTxOL7qGMTjUpDjUR6UscbiPG919vP+QsbIUuBXdb
YoddBQJdyGJiaCXv32ciJjo9bjPRRi/bII7Q5qzCNI2mi4ZVbudF4ffzyQvdHtNJ
nGnpRXoPi7kMvUrKTMPWkFjj0R5/UsPszsA51zbxPydfgBe0Dlc2PrrIG8dlzYAp
wbV0Lec+iJucKlt7EZtrjz1xOiOOaQt/5/cW1bWqL+wk2t6gAuY=
=9zTe
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma updates from Jason Gunthorpe:
"A more active cycle than most of the recent past, with a few large,
long discussed works this time.
The RNBD block driver has been posted for nearly two years now, and
flowing through RDMA due to it also introducing a new ULP.
The removal of FMR has been a recurring discussion theme for a long
time.
And the usual smattering of features and bug fixes.
Summary:
- Various small driver bugs fixes in rxe, mlx5, hfi1, and efa
- Continuing driver cleanups in bnxt_re, hns
- Big cleanup of mlx5 QP creation flows
- More consistent use of src port and flow label when LAG is used and
a mlx5 implementation
- Additional set of cleanups for IB CM
- 'RNBD' network block driver and target. This is a network block
RDMA device specific to ionos's cloud environment. It brings strong
multipath and resiliency capabilities.
- Accelerated IPoIB for HFI1
- QP/WQ/SRQ ioctl migration for uverbs, and support for multiple
async fds
- Support for exchanging the new IBTA defiend ECE data during RDMA CM
exchanges
- Removal of the very old and insecure FMR interface from all ULPs
and drivers. FRWR should be preferred for at least a decade now"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (247 commits)
RDMA/cm: Spurious WARNING triggered in cm_destroy_id()
RDMA/mlx5: Return ECE DC support
RDMA/mlx5: Don't rely on FW to set zeros in ECE response
RDMA/mlx5: Return an error if copy_to_user fails
IB/hfi1: Use free_netdev() in hfi1_netdev_free()
RDMA/hns: Uninitialized variable in modify_qp_init_to_rtr()
RDMA/core: Move and rename trace_cm_id_create()
IB/hfi1: Fix hfi1_netdev_rx_init() error handling
RDMA: Remove 'max_map_per_fmr'
RDMA: Remove 'max_fmr'
RDMA/core: Remove FMR device ops
RDMA/rdmavt: Remove FMR memory registration
RDMA/mthca: Remove FMR support for memory registration
RDMA/mlx4: Remove FMR support for memory registration
RDMA/i40iw: Remove FMR leftovers
RDMA/bnxt_re: Remove FMR leftovers
RDMA/mlx5: Remove FMR leftovers
RDMA/core: Remove FMR pool API
RDMA/rds: Remove FMR support for memory registration
RDMA/srp: Remove support for FMR memory registration
...
Fix sparse warning:
drivers/net/ethernet/mellanox/mlx5/core/cmd.c:1949:15:
warning: memset with byte count of 271720
mlx5_cmd_stats array is too big to be held inline in mlx5_cmd.
Allocate it separately.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
net/mlx5: Add ability to read and write ECE options
net/mlx5: Add support for RDMA TX FT headers modifying
net/mlx5: Move iseg access helper routines close to mlx5_core driver
net/mlx5: Cleanup mlx5_ifc_fte_match_set_misc2_bits
net/mlx5: Add support in forward to namespace
{IB/net}/mlx5: Simplify don't trap code
net/mlx5: Replace zero-length array with flexible-array
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Instead of hand crafted mlx5_qp_context and mlx5_qp_path use common
MLX5_SET() macros.
Link: https://lore.kernel.org/r/20200526115440.205922-7-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
From the mlx5-next branch at
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Required for dependencies in following patches
* branch 'mellanox/mlx5-next':
net/mlx5: Add ability to read and write ECE options
net/mlx5: Add support for RDMA TX FT headers modifying
net/mlx5: Move iseg access helper routines close to mlx5_core driver
net/mlx5: Cleanup mlx5_ifc_fte_match_set_misc2_bits
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The end result of RDMA-CM ECE handshake is ECE options, which is
needed to be used while configuring data QPs. Such options can
come in any QP state, so add in/out fields to set and query
ECE options.
OUT fields:
* create_qp() - default ECE options for that type of QP.
* modify_qp() - enabled ECE options after QP state transition.
IN fields:
* create_qp() - create QP with this ECE option.
* modify_qp() - requested options. For unconnected QPs, the FW
will return an error if ECE is already configured with any options
that not equal to previously set.
Reviewed-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
The MSCC bug fix in 'net' had to be slightly adjusted because the
register accesses are done slightly differently in net-next.
Signed-off-by: David S. Miller <davem@davemloft.net>
When driver is reloading during recovery flow, it can't get new commands
till command interface is up again. Otherwise we may get to null pointer
trying to access non initialized command structures.
Add cmdif state to avoid processing commands while cmdif is not ready.
Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapters")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
After driver creates (via FW command) an EQ for commands, the driver will
be informed on new commands completion by EQE. However, due to a race in
driver's internal command mode metadata update, some new commands will
still be miss-handled by driver as if we are in polling mode. Such commands
can get two non forced completion, leading to already freed command entry
access.
CREATE_EQ command, that maps EQ to the command queue must be posted to the
command queue while it is empty and no other command should be posted.
Add SW mechanism that once the CREATE_EQ command is about to be executed,
all other commands will return error without being sent to the FW. Allow
sending other commands only after successfully changing the driver's
internal command mode metadata.
We can safely return error to all other commands while creating the command
EQ, as all other commands might be sent from the user/application during
driver load. Application can rerun them later after driver's load was
finished.
Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapters")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
When FW response to commands is very slow and all command entries in
use are waiting for completion we can have a race where commands can get
timeout before they get out of the queue and handled. Timeout
completion on uninitialized command will cause releasing command's
buffers before accessing it for initialization and then we will get NULL
pointer exception while trying access it. It may also cause releasing
buffers of another command since we may have timeout completion before
even allocating entry index for this command.
Add entry handling completion to avoid this race.
Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapters")
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Only mlx5_core driver handles fw initialization check and command
interface revision check.
Hence move them inside the mlx5_core driver where it is used.
This avoid exposing these helpers to all mlx5 drivers.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Remove the "metadata_reg_b" field and all uses of this field in code
to match the device specification. As this field is not in use in SW
steering it is safe to remove it.
Signed-off-by: Raed Salem <raeds@mellanox.com>
Reviewed-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
From the mlx5-next branch at
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Required for dependencies in following patches
* branch 'mellanox/mlx5-next':
net/mlx5: Add support in forward to namespace
{IB/net}/mlx5: Simplify don't trap code
net/mlx5: Replace zero-length array with flexible-array
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Currently, fs_core supports rule of forward the traffic
to continue matching in the next priority, now we add support
to forward the traffic matching in the next namespace.
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:
struct foo {
int stuff;
struct boo array[];
};
By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.
Also, notice that, dynamic memory allocations won't be affected by
this change:
"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]
sizeof(flexible-array-member) triggers a warning because flexible array
members have incomplete type[1]. There are some instances of code in
which the sizeof operator is being incorrectly/erroneously applied to
zero-length arrays and the result is zero. Such instances may be hiding
some bugs. So, this work (flexible-array member conversions) will also
help to get completely rid of those sorts of issues.
This issue was found with the help of Coccinelle.
[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 7649773293 ("cxgb3/l2t: Fix undefined behaviour")
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
This merge includes updates to bonding driver needed for the rdma stack,
to avoid conflicts with the RDMA branch.
Maor Gottlieb Says:
====================
Bonding: Add support to get xmit slave
The following series adds support to get the LAG master xmit slave by
introducing new .ndo - ndo_get_xmit_slave. Every LAG module can
implement it and it first implemented in the bond driver.
This is follow-up to the RFC discussion [1].
The main motivation for doing this is for drivers that offload part
of the LAG functionality. For example, Mellanox Connect-X hardware
implements RoCE LAG which selects the TX affinity when the resources
are created and port is remapped when it goes down.
The first part of this patchset introduces the new .ndo and add the
support to the bonding module.
The second part adds support to get the RoCE LAG xmit slave by building
skb of the RoCE packet based on the AH attributes and call to the new
.ndo.
The third part change the mlx5 driver driver to set the QP's affinity
port according to the slave which found by the .ndo.
====================
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The patch sets the lag tx affinity of the data QPs and the GSI QPs
according to the LAG xmit slave.
For GSI QPs, in case the link layer is Ethenet (RoCE) we create two GSI
QPs, one for each physical port. When the driver selects the GSI QP, it
will consider the port affinity result. For connected QPs, the driver
sets the affinity of the xmit slave.
The above, ensures that RC QP and it's corresponding GSI QP will transmit
from the same physical port.
Link: https://lore.kernel.org/r/20200430192146.12863-17-maorg@mellanox.com
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Add function to get the device physical port of the lag slave.
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
mlx5 updates for both net-next and rdma-next:
1) HW bits and definitions for TLS and IPsec offlaods
2) Release all pages capability bits
3) New command interface helpers and some code cleanup as a result
4) Move qp.c out of mlx5 core driver into mlx5_ib rdma driver
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Add new TX WQE field for Connect-X6DX trailer insertion support,
when set, the HW adds a trailer to the packet, the WQE trailer
association flags are used to set to HW the header which the
trailer belongs.
Signed-off-by: Raed Salem <raeds@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Add a bit in HCA capabilities layout to indicate if release all pages is
supported.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Add needed structure layouts and defines for pci sync for fw update
event. The downstream patches will include event handlers for this event
type.
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Add needed structure layouts and defines for MFRL (Management Firmware
Reset Level) register. This structure will be used for the firmware
upgrade and reset flow in the downstream patches.
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The imm_inval_pkey field can hold four different types of data,
depends on the usage, the data could be one of the below:
- Immediate field of the received message
- Invalidate rkey
- Pkey of the packet
- Flow table metadata
Current implementation doesn't reflect the intended usage of the
field at usage time.
Reflect the different types by replace this field with a union,
modify code where this field is used to reflect its intended
usage.
Signed-off-by: Raed Salem <raeds@mellanox.com>
Reviewed-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The alignment value is part of the input structure, so use it and spare
extra memory allocation when is not needed.
Now, using the new ability when allocating icm for Direct-Rule
insertion.
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Add COPY type to modify_header action. IPsec feature is the first
feature that needs COPY steering action.
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Raed Salem <raeds@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Do mass update of cq.c to reuse newly introduced
mlx5_cmd_exec_in*() interfaces.
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Currently the FPGA IPsec is the only hw implementation of the IPsec
acceleration api, and so the mlx5_accel_esp_create_hw_context was
wrongly made to suit this HW api, among other in its parameter list
and some of its parameter endianness.
This implementation might not be suitable for different HW.
Refactor by group and pass all function arguments of
mlx5_accel_esp_create_hw_context in common mlx5_accel_esp_xfrm_attrs
struct field of mlx5_accel_esp_xfrm struct and correct the endianness
according to the HW being called.
Signed-off-by: Raed Salem <raeds@mellanox.com>
Reviewed-by: Boris Pismenny <borisp@mellanox.com>
Reviewed-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
When this is enabled, UDP source port for RoCEv2 packets are defined
by software instead of firmware.
Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
The mlx5_core doesn't need any functionality coded in qp.c, so move
that file to drivers/infiniband/ be under mlx5_ib responsibility.
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
mlx5 core users are encouraged to use low level API (mlx5_cmd_exec)
without the need of helper functions, do this for q counters, remove
helper functions and call mlx5_cmd_exec directly from users.
This will help reduce the total amount of code and reduction of the
mlx5_core symbol table.
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Many mlx5_cmd_exec() callers are not interested in the output from that
command or have standard in/out structures. Those callers simply allocate
those structure on the stack and use sizeof() to provide in/out arguments.
In this naive approach provide simplified versions of mlx5_cmd_exec().
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
The majority of the patches are cleanups, refactorings and clarity
improvements
- Various driver updates for siw, bnxt_re, rxe, efa, mlx5, hfi1
- Lots of cleanup patches for hns
- Convert more places to use refcount
- Aggressively lock the RDMA CM code that syzkaller says isn't working
- Work to clarify ib_cm
- Use the new ib_device lifecycle model in bnxt_re
- Fix mlx5's MR cache which seems to be failing more often with the new
ODP code
- mlx5 'dynamic uar' and 'tx steering' user interfaces
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAl6CSr0ACgkQOG33FX4g
mxrtKg//XovbOfYAO7nC05FtGz9iEkIUBiwQOjUojgNSi6RMNDqRW1bmqKUugm1o
9nXA6tw+fueEvUNSD541SCcxkUZJzWvubO9wHB6N3Fgy68N3Vf2rKV3EBBTh99rK
Cb7rnmTTN6izRyI1wdyP2sjDJGyF8zvsgIrG2sibzLnqlyScrnD98YS0FdPZUfOa
a1mmXBN/T7eaQ4TbE3lLLzGnifRlGmZ5vxEvOQmAlOdqAfIKQdbbW7oCRLVIleso
gfQlOOvIgzHRwQ3VrFa3i6ETYtywXq7EgmQxCjqPVJQjCA79n5TLBkP1iRhvn8xi
3+LO4YCkiSJ/NjTA2d9KwT6K4djj3cYfcueuqo2MoXXr0YLiY6TLv1OffKcUIq7c
LM3d4CSwIAG+C2FZwaQrdSGa2h/CNfLAEeKxv430zggeDNKlwHJPV5w3rUJ8lT56
wlyT7Lzosl0O9Z/1BSLYckTvbBCtYcmanVyCfHG8EJKAM1/tXy5LS8btJ3e51rPu
XekR9ELrTTA2CTuoSCQGP6J0dBD2U7qO4XRCQ9N5BYLrI6RdP7Z4xYzzey49Z3Cx
JaF86eurM7nS5biUszTtwww8AJMyYicB+0VyjBfk+mhv90w8tS1vZ1aZKzaQ1L6Z
jWn8WgIN4rWY0YGQs6PiovT1FplyGs3p1wNmjn92WO0wZZ3WsmQ=
=ae+a
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma updates from Jason Gunthorpe:
"The majority of the patches are cleanups, refactorings and clarity
improvements.
This cycle saw some more activity from Syzkaller, I think we are now
clean on all but one of those bugs, including the long standing and
obnoxious rdma_cm locking design defect. Continue to see many drivers
getting cleanups, with a few new user visible features.
Summary:
- Various driver updates for siw, bnxt_re, rxe, efa, mlx5, hfi1
- Lots of cleanup patches for hns
- Convert more places to use refcount
- Aggressively lock the RDMA CM code that syzkaller says isn't
working
- Work to clarify ib_cm
- Use the new ib_device lifecycle model in bnxt_re
- Fix mlx5's MR cache which seems to be failing more often with the
new ODP code
- mlx5 'dynamic uar' and 'tx steering' user interfaces"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (144 commits)
RDMA/bnxt_re: make bnxt_re_ib_init static
IB/qib: Delete struct qib_ivdev.qp_rnd
RDMA/hns: Fix uninitialized variable bug
RDMA/hns: Modify the mask of QP number for CQE of hip08
RDMA/hns: Reduce the maximum number of extend SGE per WQE
RDMA/hns: Reduce PFC frames in congestion scenarios
RDMA/mlx5: Add support for RDMA TX flow table
net/mlx5: Add support for RDMA TX steering
IB/hfi1: Call kobject_put() when kobject_init_and_add() fails
IB/hfi1: Fix memory leaks in sysfs registration and unregistration
IB/mlx5: Move to fully dynamic UAR mode once user space supports it
IB/mlx5: Limit the scope of struct mlx5_bfreg_info to mlx5_ib
IB/mlx5: Extend QP creation to get uar page index from user space
IB/mlx5: Extend CQ creation to get uar page index from user space
IB/mlx5: Expose UAR object and its alloc/destroy commands
IB/hfi1: Get rid of a warning
RDMA/hns: Remove redundant judgment of qp_type
RDMA/hns: Remove redundant assignment of wc->smac when polling cq
RDMA/hns: Remove redundant qpc setup operations
RDMA/hns: Remove meaningless prints
...
* 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
mlx5: Remove uninitialized use of key in mlx5_core_create_mkey
{IB,net}/mlx5: Move asynchronous mkey creation to mlx5_ib
{IB,net}/mlx5: Assign mkey variant in mlx5_ib only
{IB,net}/mlx5: Setup mkey variant before mr create command invocation
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Leon Romanovsky says:
====================
Those two patches from Michael extends mlx5_core and mlx5_ib flow steering
to support RDMA TX in similar way to already supported RDMA RX.
====================
Based on the mlx5-next branch at
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Due to dependencies
* branch 'mlx5_tx_steering':
RDMA/mlx5: Add support for RDMA TX flow table
net/mlx5: Add support for RDMA TX steering
Add new RDMA TX flow steering namespace. Flow steering rules in
this namespace are used to filter transmitted RDMA traffic.
Link: https://lore.kernel.org/r/20200324061425.1570190-2-leon@kernel.org
Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
struct mlx5_bfreg_info is used by mlx5_ib only but is exposed to both RDMA
and netdev parts of mlx5 driver. Move that struct to mlx5_ib namespace,
clean vertical space alignment and convert lib_uar_4k from bool to
bitfield.
Link: https://lore.kernel.org/r/20200324060143.1569116-5-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Add dependencny on cap termination_table_raw_traffic to allow non
encapsulated packets received from uplink to be forwarded back to the
received uplink port.
Refactor the conditions into a separate function.
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Leon Romanovsky says:
====================
This series fixes various corner cases in the mlx5_ib MR cache
implementation, see specific commit messages for more information.
====================
Based on the mlx5-next branch at
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Due to dependencies
* branch 'mlx5_mr-cache':
RDMA/mlx5: Allow MRs to be created in the cache synchronously
RDMA/mlx5: Revise how the hysteresis scheme works for cache filling
RDMA/mlx5: Fix locking in MR cache work queue
RDMA/mlx5: Lock access to ent->available_mrs/limit when doing queue_work
RDMA/mlx5: Fix MR cache size and limit debugfs
RDMA/mlx5: Always remove MRs from the cache before destroying them
RDMA/mlx5: Simplify how the MR cache bucket is located
RDMA/mlx5: Rename the tracking variables for the MR cache
RDMA/mlx5: Replace spinlock protected write with atomic var
{IB,net}/mlx5: Move asynchronous mkey creation to mlx5_ib
{IB,net}/mlx5: Assign mkey variant in mlx5_ib only
{IB,net}/mlx5: Setup mkey variant before mr create command invocation
As mlx5_ib is the only user of the mlx5_core_create_mkey_cb, move the
logic inside mlx5_ib and cleanup the code in mlx5_core.
Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
mkey variant is not required for mlx5_core use, move the mkey variant
counter to mlx5_ib.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Enable reg c1 loopback if firmware reports it's supported,
as this is needed for restoring packet metadata (e.g chain).
Also define helper to query if it is enabled.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Until now the flex parser capability was used in ib_query_device() to
indicate tunnel_offloads_caps support for mpls_over_gre/mpls_over_udp.
Newer devices and firmware will have configurations with the flexparser
but without mpls support.
Testing for the flex parser capability was a mistake, the tunnel_stateless
capability was intended for detecting mpls and was introduced at the same
time as the flex parser capability.
Otherwise userspace will be incorrectly informed that a future device
supports MPLS when it does not.
Link: https://lore.kernel.org/r/20200305123841.196086-1-leon@kernel.org
Cc: <stable@vger.kernel.org> # 4.17
Fixes: e818e255a5 ("IB/mlx5: Expose MPLS related tunneling offloads")
Signed-off-by: Alex Vesker <valex@mellanox.com>
Reviewed-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Yishai Hadas Says:
====================
Expose raw packet pacing APIs to be used by DEVX based applications. The
existing code was refactored to have a single flow with the new raw APIs.
====================
Based on the mlx5-next branch at
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Due to dependencies
* branch 'mlx5_packet_pacing':
IB/mlx5: Introduce UAPIs to manage packet pacing
net/mlx5: Expose raw packet pacing APIs
This series adds some HW bits and definitions for mlx5 driver, to be
used by downstream features in both rdma and netdev branches.
* 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
net/mlx5: HW bit for goto chain offload support
net/mlx5: Expose link speed directly
net/mlx5: Introduce TLS and IPSec objects enums
net/mlx5: Introduce egress acl forward-to-vport capability
net/mlx5: Expose raw packet pacing APIs
net/mlx5e: Replace zero-length array with flexible-array member
net/mlx5: fix spelling mistake "reserverd" -> "reserved"
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Expose port rate as part of the port speed register fields.
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Expose the TLS encryption key general object type enum correctly,
and add the IPSec encryption key general object type enum.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Add HCA_CAP.egress_acl_forward_to_vport field to check whether HW
supports e-switch vport's egress acl to forward packets to other
e-switch vport or not.
By default E-Switch egress ACL forwards eswitch vports egress packets
to their corresponding NIC/VF vports.
With this cap enabled, the driver is allowed to alter this behavior
and forward packets to arbitrary NIC/VF vports with the following
limitations:
a. Multiple processing paths are supported if all of the following
conditions are met:
- HCA_CAP.egress_acl_forward_to_vport is set ==1.
- A destination of type Flow Table only appears once, as the
last destination in the list.
- Vport destination is supported if
HCA_CAP.egress_acl_forward_to_vport==1. Vport must not be
the Uplink.
b. Flow_tag not supported.
c. This table is only applicable after an FDB table is created.
d. Push VLAN action is not supported.
e. Pop VLAN action cannot be added concurrently to this table and
FDB table.
This feature will be used during port failover in bonding scenario
where two VFs representors are bonded to handle failover egress traffic
(VM's ingress/receive traffic).
Signed-off-by: Vu Pham <vuhuong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Expose raw packet pacing APIs to be used by DEVX based applications.
The existing code was refactored to have a single flow with the new raw
APIs.
The new raw APIs considered the input of 'pp_rate_limit_context', uid,
'dedicated', upon looking for an existing entry.
This raw mode enables future device specification data in the raw
context without changing the existing logic and code.
The ability to ask for a dedicated entry gives control for application
to allocate entries according to its needs.
A dedicated entry may not be used by some other process and it also
enables the process spreading its resources to some different entries
for use different hardware resources as part of enforcing the rate.
The counter per entry was changed to be u64 to prevent any option to
overflow.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
When using port mirroring, we forward the traffic to another table and
use that table to forward to the mirrored vport. Since the hardware
loses the values of reg c, and in particular reg c0, we fail the match
on the input vport which previously existed in reg c0. To overcome this
situation, we use a set of per vport tables, positioned at the lowest
priority, and forward traffic to those tables. Since these tables are
per vport, we can avoid matching on reg c0.
Fixes: c01cfd0f11 ("net/mlx5: E-Switch, Add match on vport metadata for rule in fast path")
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
On RX side create a restore table in OFFLOADS namespace.
This table will match on all values for reg_c0 we will use,
and set it to the flow_tag. This flow tag can then be read on the CQE.
As there is no copy action from reg c0 to flow tag, instead we have to
set the flow tag explictily. We add an API so callers can add all the used
reg_c0 values (tags) and for each of those we add a restore rule.
This will be used in a following patch to save the miss chain mapping
tag on reg_c0 and from it restore the tc chain on the skb.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Multi chain support requires the miss path to continue the processing
from the last chain id, and for that we need to save the chain
miss tag (a mapping for 32bit chain id) on reg_c0 which will
come in a next patch.
Currently reg_c0 is exclusively used to store the source port
metadata, giving it 32bit, it is created from 16bits of vcha_id,
and 16bits of vport number.
We will move this source port metadata to upper 16bits, and leave the
lower bits for the chain miss tag. We compress the reg_c0 source port
metadata to 16bits by taking 8 bits from vhca_id, and 8bits from
the vport number.
Since we compress the vport number to 8bits statically, and leave two
top ids for special PF/ECPF numbers, we will only support a max of 254
vports with this strategy.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
On driver load:
- Initialize resource dump data structure and memory access tools (mkey
& pd).
- Read the resource dump's menu which contains the FW segment
identifier. Each record is identified by the segment name (ASCII).
During the driver's course of life, users (like reporters) may request
dumps per segment. The user should create a command providing the
segment identifier (SW enumeration) and command keys. In return, the
user receives a command context. In order to receive the dump, the user
should supply the command context and a memory (aligned to a PAGE) on
which the dump content will be written. Since the dump may be larger
than the given memory, the user may resubmit the command until received
an indication of end-of-dump. It is the user's responsibility to destroy
the command.
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
On flow table creation, send the relevant flags according to what the FW
currently supports.
When FW doesn't support reformat option over SW-steering managed table,
the driver shouldn't pass this.
Fixes: 988fd6b32d ("net/mlx5: DR, Pass table flags at creation to lower layer")
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The reserved member should be named reserved.
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Deprecate the generic TLS cap bit, use the new TX-specific
TLS cap bit instead.
Fixes: a12ff35e0f ("net/mlx5: Introduce TLS TX offload hardware bits and structures")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
- Driver updates and cleanup for qedr, bnxt_re, hns, siw, mlx5, mlx4, rxe,
i40iw
- Larger series doing cleanup and rework for hns and hfi1.
- Some general reworking of the CM code to make it a little more
understandable
- Unify the different code paths connected to the uverbs FD scheme
- New UAPI ioctls conversions for get context and get async fd
- Trace points for CQ and CM portions of the RDMA stack
- mlx5 driver support for virtio-net formatted rings as RDMA raw ethernet QPs
- verbs support for setting the PCI-E relaxed ordering bit on DMA traffic
connected to a MR
- A couple of bug fixes that came too late to make rc7
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAl4zPwQACgkQOG33FX4g
mxoURw//fuQmuJ7aTMH+0qrhaZUmzXOcI/WKvY0YMyYLvxolRcIO+uCL239wxezR
9iTHPO7HeYXUQ4W8Hi/fTyuQ9hzaPOP3wgOJfQhm4QT/XDpRW0H3Mb+hTLHTUAcA
rgKc9suAn+5BbIDOz7hEfeOTssx1wYrLsaHDc11NZ42JuG6uvPR33lhXiKWG+5tH
2MpfeTU6BjL035dm3YZXCo+ouobpdMuvzJItYIsB2E5Nl0s91SMzsymIYiD0gb3t
yUJ3wqPW3pchfAl8VEn+W5AHTUYYgGjmEblL8WdVq5JRrkQgQzj8QtCRT9NOPAT0
LivCvgBrm0kscaQS2TjtG56Ojbwz8z1QPE/4shf0pj/G2lZfacYDAeaUc/2VafxY
y/KG+3dB1DxtYY3eXJUxbB7Vpk7kfr35p5b75NdMhd2t49oPgV7EKoZMLYGzfX4S
PtyNyNSiwx8qsRTr4lznOMswmrDLfG4XiywWgYo6NGOWyKYlARWIYBAEQZ0DPTiE
9mqJ19gusdSdAgm8LGDInPmH6/AojGOVzYonJFWdlOtwCXGNXL4Gx02x4WYHykDG
w+oy5NMJbU3b6+MWEagkuQNcrwqv02MT1mB/Lgv4GPm6rS0UXR7zUPDeccE50fSL
X36k28UlftlPlaD7PeJdTOAhyBv5DxfpL5rbB2TfpUTpNxjayuU=
=hepK
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma updates from Jason Gunthorpe:
"A very quiet cycle with few notable changes. Mostly the usual list of
one or two patches to drivers changing something that isn't quite rc
worthy. The subsystem seems to be seeing a larger number of rework and
cleanup style patches right now, I feel that several vendors are
prepping their drivers for new silicon.
Summary:
- Driver updates and cleanup for qedr, bnxt_re, hns, siw, mlx5, mlx4,
rxe, i40iw
- Larger series doing cleanup and rework for hns and hfi1.
- Some general reworking of the CM code to make it a little more
understandable
- Unify the different code paths connected to the uverbs FD scheme
- New UAPI ioctls conversions for get context and get async fd
- Trace points for CQ and CM portions of the RDMA stack
- mlx5 driver support for virtio-net formatted rings as RDMA raw
ethernet QPs
- verbs support for setting the PCI-E relaxed ordering bit on DMA
traffic connected to a MR
- A couple of bug fixes that came too late to make rc7"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (108 commits)
RDMA/core: Make the entire API tree static
RDMA/efa: Mask access flags with the correct optional range
RDMA/cma: Fix unbalanced cm_id reference count during address resolve
RDMA/umem: Fix ib_umem_find_best_pgsz()
IB/mlx4: Fix leak in id_map_find_del
IB/opa_vnic: Spelling correction of 'erorr' to 'error'
IB/hfi1: Fix logical condition in msix_request_irq
RDMA/cm: Remove CM message structs
RDMA/cm: Use IBA functions for complex structure members
RDMA/cm: Use IBA functions for simple structure members
RDMA/cm: Use IBA functions for swapping get/set acessors
RDMA/cm: Use IBA functions for simple get/set acessors
RDMA/cm: Add SET/GET implementations to hide IBA wire format
RDMA/cm: Add accessors for CM_REQ transport_type
IB/mlx5: Return the administrative GUID if exists
RDMA/core: Ensure that rdma_user_mmap_entry_remove() is a fence
IB/mlx4: Fix memory leak in add_gid error flow
IB/mlx5: Expose RoCE accelerator counters
RDMA/mlx5: Set relaxed ordering when requested
RDMA/core: Add the core support field to METHOD_GET_CONTEXT
...
A user can change the operational GUID (a.k.a affective GUID) through
link/infiniband. Therefore it is preferred to return the currently set
GUID if it exists instead of the operational.
This way the PF can query which VF GUID will be set in the next bind. In
order to align with MAC address, zero is returned if administrative GUID
is not set.
For example, before setting administrative GUID:
$ ip link show
ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 4092 qdisc mq state UP mode DEFAULT group default qlen 256
link/infiniband 00:00:00:08:fe:80:00:00:00:00:00:00:52:54:00:c0:fe:12:34:55 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
vf 0 link/infiniband 00:00:00:08:fe:80:00:00:00:00:00:00:52:54:00:c0:fe:12:34:55 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff,
spoof checking off, NODE_GUID 00:00:00:00:00:00:00:00, PORT_GUID 00:00:00:00:00:00:00:00, link-state auto, trust off, query_rss off
Then:
$ ip link set ib0 vf 0 node_guid 11:00:af:21:cb:05:11:00
$ ip link set ib0 vf 0 port_guid 22:11:af:21:cb:05:11:00
After setting administrative GUID:
$ ip link show
ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 4092 qdisc mq state UP mode DEFAULT group default qlen 256
link/infiniband 00:00:00:08:fe:80:00:00:00:00:00:00:52:54:00:c0:fe:12:34:55 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
vf 0 link/infiniband 00:00:00:08:fe:80:00:00:00:00:00:00:52:54:00:c0:fe:12:34:55 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff,
spoof checking off, NODE_GUID 11:00:af:21:cb:05:11:00, PORT_GUID 22:11:af:21:cb:05:11:00, link-state auto, trust off, query_rss off
Fixes: 9c0015ef09 ("IB/mlx5: Implement callbacks for getting VFs GUID attributes")
Link: https://lore.kernel.org/r/20200116120048.12744-1-leon@kernel.org
Signed-off-by: Danit Goldberg <danitg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Exclude the last n entries for an autogrouped flow table.
Reserving entries at the end of the FT will ensure that this FG will be
the last to be evaluated. This will be used in the next patch to create
a miss group enabling custom actions on FT miss.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
If user sets ignore flow level flag on a rule, that rule can point to
a flow table of any level, including those with levels equal or less
than the level of the flow table it is added on.
This with unamanged tables will be used to create a FDB chain/prio
hierarchy much larger than currently supported level range.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Currently, Most of the steering tree is statically declared ahead of time,
with steering prios instances allocated for each fdb chain to assign max
number of levels for each of them. This allows fs_core to manage the
connections and levels of the flow tables hierarcy to prevent loops, but
restricts us with the number of supported chains and priorities.
Introduce unmananged flow tables, allowing the user to manage the flow
table connections. A unamanged table is detached from the fs_core flow
table hierarcy, and is only connected back to the hierarchy by explicit
FTEs forward actions.
This will be used together with firmware that supports ignoring the flow
table levels to increase the number of supported chains and prios.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
This merge syncs with mlx5-next latest HW bits and layout updates for next
features, in addition one patch that improves
mlx5_create_auto_grouped_flow_table() API across all mlx5 users.
* 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
net/mlx5: Refactor mlx5_create_auto_grouped_flow_table
net/mlx5e: Add discard counters per priority
net/mlx5e: Expose FEC feilds and related capability bit
net/mlx5: Add mlx5_ifc definitions for connection tracking support
net/mlx5: Add copy header action struct layout
net/mlx5: Expose resource dump register mapping
net/mlx5: Add structures and defines for MIRC register
net/mlx5: Read MCAM register groups 1 and 2
net/mlx5: Add structures layout for new MCAM access reg groups
net/mlx5: Expose vDPA emulation device capabilities
net/mlx5: Add Virtio Emulation related device capabilities
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Refactor mlx5_create_auto_grouped_flow_table() to use ft_attr param
which already carries the max_fte, prio and flags memebers, and is
used the same in similar mlx5_create_flow_table() function.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Add counters that count (per priority) the number of received
packets that dropped due to lack of buffers on a physical port. If
this counter is increasing, it implies that the adapter is
congested and cannot absorb the traffic coming from the network.
Signed-off-by: Aharon Landau <aharonl@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Introduce 50G per lane FEC modes capability bit and newly supported
fields in PPLM register which allow this configuration.
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Add the required hardware definitions to mlx5_ifc:
ignore_flow_level, registers, copy_header, and fwd_and_modify cap.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Oz Sholomo <ozsh@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Add definition for copy header action, copy action is used
to copy header fields from source to destination.
Signed-off-by: Hamdan Igbaria <hamdani@mellanox.com>
Signed-off-by: Alex Vesker <valex@mellanox.com>
Reviewed-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Add needed structures, layouts and defines for MIRC (Management Image
Re-activation Control) register. This structure will be used for the FSM
reactivation flow in the downstream patches.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
On load, Driver caches MCAM (Management Capabilities Mask Register)
registers. in addition to the only MCAM register group (0) the driver
already reads, here we add support for reading groups 1 and 2.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
MCAM has 3 access_reg_groups (0-2). Defines data structures in order to
read and parse access_reg_groups #1 and #2.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Expose vDPA emulation device capabilities from the core layer.
It includes reading the capabilities from the firmware and exposing
helper functions to access the data.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Shahaf Shuler <shahafs@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Add Virtio Emulation related fields to the device capabilities.
It includes a general bit to indicate whether Virtio Emulation is
supported and the capabilities structure itself.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Shahaf Shuler <shahafs@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
The function mlx5_buf_alloc_node is only used by the function in the
local scope. So it is appropriate to limit this function in the local
scope.
Signed-off-by: Zhu Yanjun <zyjzyj2000@gmail.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Mainly a collection of smaller of driver updates this cycle.
- Various driver updates and bug fixes for siw, bnxt_re, hns, qedr,
iw_cxgb4, vmw_pvrdma, mlx5
- Improvements in SRPT from working with iWarp
- SRIOV VF support for bnxt_re
- Skeleton kernel-doc files for drivers/infiniband
- User visible counters for events related to ODP
- Common code for tracking of mmap lifetimes so that drivers can link HW
object liftime to a VMA
- ODP bug fixes and rework
- RDMA READ support for efa
- Removal of the very old cxgb3 driver
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAl3dwFYACgkQOG33FX4g
mxohHRAAnkUr0SKh1uV7vuhA8sUlejkwAaw2V2Sm3E1y4GiPpfRAak/FcbEu9F8l
E7Q7VI04DRKTTpTRmREVyKYjlKr5QOA4mSryNMfBZobjkp+t++U1GC9YKL3zh65U
odyJedtv3zKCLzV9RBwX06C8Vi1PQNQp7RbQ42xH/IyKXJIZPHeCUeKv7PsfTWVo
vhuXFc9pPwqHDfRVTbj6s1g+OwVuRHc+SWep6eTKLGYvt8CqdN9WEpA0sJrlwPet
NLTZTrFDpuC12RPc4Lo5c7R5MeAzHgojZCeSFaL2DhJLOx3kfmU30wG+rV94Lvsq
Y2yt6BwKeLleKzpR5ApkuIVHQt2KECPrJbmVLDFqi+rT7yvzsd2AB+uiCS50+LPG
h2UpWJdKWtrnSzTqbJQieXd+oT253Dk+ciy7zbdPSGPwz1dc/Cna9TTn26X2SezR
bmRhHOykrh7LCNrv/7jiSq/zWApGTlR9YZ86x9Li+lJ3kkG+7d2DBEofDU+oKE5F
p0+1Cer1td5IkN3N8oDpvyXAbCIv7qdWwXB2Cm7dSfUETIpAKnXiBxFCHyvmsuus
7gGyKZh2490N3SoIL1muu12dIvhPC2Obg9ckCNU3ldw9+uPJkCzrs+n+JtkIW0mi
i1fMjRQ6FumVT6/KdavYgeCsHmItVqv03SGdFsHF3SGwvWo6y8Y=
=UV2u
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma updates from Jason Gunthorpe:
"Again another fairly quiet cycle with few notable core code changes
and the usual variety of driver bug fixes and small improvements.
- Various driver updates and bug fixes for siw, bnxt_re, hns, qedr,
iw_cxgb4, vmw_pvrdma, mlx5
- Improvements in SRPT from working with iWarp
- SRIOV VF support for bnxt_re
- Skeleton kernel-doc files for drivers/infiniband
- User visible counters for events related to ODP
- Common code for tracking of mmap lifetimes so that drivers can link
HW object liftime to a VMA
- ODP bug fixes and rework
- RDMA READ support for efa
- Removal of the very old cxgb3 driver"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (168 commits)
RDMA/hns: Delete unnecessary callback functions for cq
RDMA/hns: Rename the functions used inside creating cq
RDMA/hns: Redefine the member of hns_roce_cq struct
RDMA/hns: Redefine interfaces used in creating cq
RDMA/efa: Expose RDMA read related attributes
RDMA/efa: Support remote read access in MR registration
RDMA/efa: Store network attributes in device attributes
IB/hfi1: remove redundant assignment to variable ret
RDMA/bnxt_re: Fix missing le16_to_cpu
RDMA/bnxt_re: Fix stat push into dma buffer on gen p5 devices
RDMA/bnxt_re: Fix chip number validation Broadcom's Gen P5 series
RDMA/bnxt_re: Fix Kconfig indentation
IB/mlx5: Implement callbacks for getting VFs GUID attributes
IB/ipoib: Add ndo operation for getting VFs GUID attributes
IB/core: Add interfaces to get VF node and port GUIDs
net/core: Add support for getting VF GUIDs
RDMA/qedr: Fix null-pointer dereference when calling rdma_user_mmap_get_offset
RDMA/cm: Use refcount_t type for refcount variable
IB/mlx5: Support extended number of strides for Striding RQ
IB/mlx4: Update HW GID table while adding vlan GID
...
Danit Goldberg says:
====================
This series extends RTNETLINK to provide IB port and node GUIDs, which
were configured for Infiniband VFs.
The functionality to set VF GUIDs already existed for a long time, and
here we are adding the missing "get" so that netlink will be symmetric and
various cloud orchestration tools will be able to manage such VFs more
naturally.
The iproute2 was extended too to present those GUIDs.
- ip link show <device>
For example:
- ip link set ib4 vf 0 node_guid 22:44:33:00:33:11:00:33
- ip link set ib4 vf 0 port_guid 10:21:33:12:00:11:22:10
- ip link show ib4
ib4: <BROADCAST,MULTICAST> mtu 4092 qdisc noop state DOWN mode DEFAULT group default qlen 256
link/infiniband 00:00:0a:2d:fe:80:00:00:00:00:00:00:ec:0d:9a:03:00:44:36:8d brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
vf 0 link/infiniband 00:00:0a:2d:fe:80:00:00:00:00:00:00:ec:0d:9a:03:00:44:36:8d brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff,
spoof checking off, NODE_GUID 22:44:33:00:33:11:00:33, PORT_GUID 10:21:33:12:00:11:22:10, link-state disable, trust off, query_rss off
====================
Based on the mlx5-next branch from
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux for
dependencies
* branch 'ib-guids': (35 commits)
IB/mlx5: Implement callbacks for getting VFs GUID attributes
IB/ipoib: Add ndo operation for getting VFs GUID attributes
IB/core: Add interfaces to get VF node and port GUIDs
net/core: Add support for getting VF GUIDs
net/mlx5: Add new chain for netfilter flow table offload
net/mlx5: Refactor creating fast path prio chains
net/mlx5: Accumulate levels for chains prio namespaces
net/mlx5: Define fdb tc levels per prio
net/mlx5: Rename FDB_* tc related defines to FDB_TC_* defines
net/mlx5: Simplify fdb chain and prio eswitch defines
IB/mlx5: Load profile according to RoCE enablement state
IB/mlx5: Rename profile and init methods
net/mlx5: Handle "enable_roce" devlink param
net/mlx5: Document flow_steering_mode devlink param
devlink: Add new "enable_roce" generic device param
net/mlx5: fix spelling mistake "metdata" -> "metadata"
net/mlx5: fix kvfree of uninitialized pointer spec
IB/mlx5: Introduce and use mlx5_core_is_vf()
net/mlx5: E-switch, Enable metadata on own vport
net/mlx5: Refactor ingress acl configuration
...
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
1) New generic devlink param "enable_roce", for downstream devlink
reload support
2) Do vport ACL configuration on per vport basis when
enabling/disabling a vport. This enables to have vports enabled/disabled
outside of eswitch config for future
3) Split the code for legacy vs offloads mode and make it clear
4) Tide up vport locking and workqueue usage
5) Fix metadata enablement for ECPF
6) Make explicit use of VF property to publish IB_DEVICE_VIRTUAL_FUNCTION
7) E-Switch and flow steering core low level support and refactoring for
netfilter flowtables offload
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Netfilter tables (nftables) implements a software datapath that
comes after tc ingress datapath. The datapath supports offloading
such rules via the flow table offload API.
This API is currently only used by NFT and it doesn't provide the
global priority in regards to tc offload, so we assume offloading such
rules must come after tc. It does provide a flow table priority
parameter, so we need to provide some supported priority range.
For that, split fastpath prio to two, flow table offload and tc offload,
with one dedicated priority chain for flow table offload.
Next patch will re-use the multi chain API to access this chain by
allowing access to this chain by the fdb_sub_namespace.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Register "enable_roce" param, default value is RoCE enabled.
Current configuration is stored on mlx5_core_dev and exposed to user
through the cmode runtime devlink param.
Changing configuration requires changing the cmode driverinit devlink
param and calling devlink reload.
Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Instead of deciding a given device is virtual function or
not based on a device is PF or not, use already defined
MLX5_COREDEV_VF by introducing an helper API mlx5_core_is_vf().
This enables to clearly identify PF, VF and non virtual functions.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Vu Pham <vuhuong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The union should contain the extended dest and counter list.
Remove the resevered 0x40 bits which is redundant.
This change doesn't break any functionally.
Everything works today because the code in fs_cmd.c is using
the correct structs if extended dest or the basic dest.
Fixes: 1b11549859 ("net/mlx5: Introduce extended destination fields")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Jason Gunthorpe says:
====================
In order to hoist the interval tree code out of the drivers and into the
mmu_notifiers it is necessary for the drivers to not use the interval tree
for other things.
This series replaces the interval tree with an xarray and along the way
re-aligns all the locking to use a sensible SRCU model where the 'update'
step is done by modifying an xarray.
The result is overall much simpler and with less locking in the critical
path. Many functions were reworked for clarity and small details like
using 'imr' to refer to the implicit MR make the entire code flow here
more readable.
This also squashes at least two race bugs on its own, and quite possibily
more that haven't been identified.
====================
Merge conflicts with the odp statistics patch resolved.
* branch 'odp_rework':
RDMA/odp: Remove broken debugging call to invalidate_range
RDMA/mlx5: Do not race with mlx5_ib_invalidate_range during create and destroy
RDMA/mlx5: Do not store implicit children in the odp_mkeys xarray
RDMA/mlx5: Rework implicit ODP destroy
RDMA/mlx5: Avoid double lookups on the pagefault path
RDMA/mlx5: Reduce locking in implicit_mr_get_data()
RDMA/mlx5: Use an xarray for the children of an implicit ODP
RDMA/mlx5: Split implicit handling from pagefault_mr
RDMA/mlx5: Set the HW IOVA of the child MRs to their place in the tree
RDMA/mlx5: Lift implicit_mr_alloc() into the two routines that call it
RDMA/mlx5: Rework implicit_mr_get_data
RDMA/mlx5: Delete struct mlx5_priv->mkey_table
RDMA/mlx5: Use a dedicated mkey xarray for ODP
RDMA/mlx5: Split sig_err MR data into its own xarray
RDMA/mlx5: Use SRCU properly in ODP prefetch
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Expose maximum scatter entries per RDMA READ for optimal performance.
Signed-off-by: Yamin Friedman <yaminf@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Pull networking fixes from David Miller:
1) Sanity check URB networking device parameters to avoid divide by
zero, from Oliver Neukum.
2) Disable global multicast filter in NCSI, otherwise LLDP and IPV6
don't work properly. Longer term this needs a better fix tho. From
Vijay Khemka.
3) Small fixes to selftests (use ping when ping6 is not present, etc.)
from David Ahern.
4) Bring back rt_uses_gateway member of struct rtable, it's semantics
were not well understood and trying to remove it broke things. From
David Ahern.
5) Move usbnet snaity checking, ignore endpoints with invalid
wMaxPacketSize. From Bjørn Mork.
6) Missing Kconfig deps for sja1105 driver, from Mao Wenan.
7) Various small fixes to the mlx5 DR steering code, from Alaa Hleihel,
Alex Vesker, and Yevgeny Kliteynik
8) Missing CAP_NET_RAW checks in various places, from Ori Nimron.
9) Fix crash when removing sch_cbs entry while offloading is enabled,
from Vinicius Costa Gomes.
10) Signedness bug fixes, generally in looking at the result given by
of_get_phy_mode() and friends. From Dan Crapenter.
11) Disable preemption around BPF_PROG_RUN() calls, from Eric Dumazet.
12) Don't create VRF ipv6 rules if ipv6 is disabled, from David Ahern.
13) Fix quantization code in tcp_bbr, from Kevin Yang.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (127 commits)
net: tap: clean up an indentation issue
nfp: abm: fix memory leak in nfp_abm_u32_knode_replace
tcp: better handle TCP_USER_TIMEOUT in SYN_SENT state
sk_buff: drop all skb extensions on free and skb scrubbing
tcp_bbr: fix quantization code to not raise cwnd if not probing bandwidth
mlxsw: spectrum_flower: Fail in case user specifies multiple mirror actions
Documentation: Clarify trap's description
mlxsw: spectrum: Clear VLAN filters during port initialization
net: ena: clean up indentation issue
NFC: st95hf: clean up indentation issue
net: phy: micrel: add Asym Pause workaround for KSZ9021
net: socionext: ave: Avoid using netdev_err() before calling register_netdev()
ptp: correctly disable flags on old ioctls
lib: dimlib: fix help text typos
net: dsa: microchip: Always set regmap stride to 1
nfp: flower: fix memory leak in nfp_flower_spawn_vnic_reprs
nfp: flower: prevent memory leak in nfp_flower_spawn_phy_reprs
net/sched: Set default of CONFIG_NET_TC_SKB_EXT to N
vrf: Do not attempt to create IPv6 mcast rule if IPv6 is disabled
net: sched: sch_sfb: don't call qdisc_put() while holding tree lock
...
This cycle mainly saw lots of bug fixes and clean up code across the core
code and several drivers, few new functional changes were made.
- Many cleanup and bug fixes for hns
- Various small bug fixes and cleanups in hfi1, mlx5, usnic, qed,
bnxt_re, efa
- Share the query_port code between all the iWarp drivers
- General rework and cleanup of the ODP MR umem code to fit better with
the mmu notifier get/put scheme
- Support rdma netlink in non init_net name spaces
- mlx5 support for XRC devx and DC ODP
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAl2A1ugACgkQOG33FX4g
mxp+EQ//Ug8CyyDs40SGZztItoGghuQ4TVA2pXtuZ9LkFVJRWhYPJGOadHIYXqGO
KQJJpZPQ02HTZUPWBZNKmD5bwHfErm4cS73/mVmUpximnUqu/UsLRJp8SIGmBg1D
W1Lz1BJX24MdV8aUnChYvdL5Hbl52q+BaE99Z0haOvW7I3YnKJC34mR8m/A5MiRf
rsNIZNPHdq2U6pKLgCbOyXib8yBcJQqBb8x4WNAoB1h4MOx+ir5QLfr3ozrZs1an
xXgqyiOBmtkUgCMIpXC4juPN/6gw3Y5nkk2VIWY+MAY1a7jZPbI+6LJZZ1Uk8R44
Lf2KSzabFMMYGKJYE1Znxk+JWV8iE+m+n6wWEfRM9l0b4gXXbrtKgaouFbsLcsQA
CvBEQuiFSO9Kq01JPaAN1XDmhqyTarG6lHjXnW7ifNlLXnPbR1RJlprycExOhp0c
axum5K2fRNW2+uZJt+zynMjk2kyjT1lnlsr1Rbgc4Pyionaiydg7zwpiac7y/bdS
F7/WqdmPiff78KMi187EF5pjFqMWhthvBtTbWDuuxaxc2nrXSdiCUN+96j1amQUH
yU/7AZzlnKeKEQQCR4xddsSs2eTrXiLLFRLk9GCK2eh4cUN212eHTrPLKkQf1cN+
ydYbR2pxw3B38LCCNBy+bL+u7e/Tyehs4ynATMpBuEgc5iocTwE=
=zHXW
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull RDMA subsystem updates from Jason Gunthorpe:
"This cycle mainly saw lots of bug fixes and clean up code across the
core code and several drivers, few new functional changes were made.
- Many cleanup and bug fixes for hns
- Various small bug fixes and cleanups in hfi1, mlx5, usnic, qed,
bnxt_re, efa
- Share the query_port code between all the iWarp drivers
- General rework and cleanup of the ODP MR umem code to fit better
with the mmu notifier get/put scheme
- Support rdma netlink in non init_net name spaces
- mlx5 support for XRC devx and DC ODP"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (99 commits)
RDMA: Fix double-free in srq creation error flow
RDMA/efa: Fix incorrect error print
IB/mlx5: Free mpi in mp_slave mode
IB/mlx5: Use the original address for the page during free_pages
RDMA/bnxt_re: Fix spelling mistake "missin_resp" -> "missing_resp"
RDMA/hns: Package operations of rq inline buffer into separate functions
RDMA/hns: Optimize cmd init and mode selection for hip08
IB/hfi1: Define variables as unsigned long to fix KASAN warning
IB/{rdmavt, hfi1, qib}: Add a counter for credit waits
IB/hfi1: Add traces for TID RDMA READ
RDMA/siw: Relax from kmap_atomic() use in TX path
IB/iser: Support up to 16MB data transfer in a single command
RDMA/siw: Fix page address mapping in TX path
RDMA: Fix goto target to release the allocated memory
RDMA/usnic: Avoid overly large buffers on stack
RDMA/odp: Add missing cast for 32 bit
RDMA/hns: Use devm_platform_ioremap_resource() to simplify code
Documentation/infiniband: update name of some functions
RDMA/cma: Fix false error message
RDMA/hns: Fix wrong assignment of qp_access_flags
...
To resolve dependencies in following patches
mlx5_ib.h conflict resolved by keeing both hunks
Linux 5.3-rc8
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Map capability bit indicating that HCA supports port buffer's congestion
counters. Also map registers with the corresponding counters.
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Add flow steering actions: modify header and packet reformat
to the fs_cmd shim layer. This allows each namespace to define
possibly different functionality for alloc/dealloc action commands.
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Merge mlx5-next patches needed for upcoming mlx5 software steering.
1) Alex adds HW bits and definitions required for SW steering
2) Ariel moves device memory management to mlx5_core (From mlx5_ib)
3) Maor, Cleanups and fixups for eswitch mode and RoCE
4) Mark, Set only stag for match untagged packets
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Return MLX5_ESWITCH_NONE when CONFIG_MLX5_ESWITCH
is not selected.
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Move the device memory allocation and deallocation commands
SW ICM memory to mlx5_core to expose this API for all
mlx5_core users.
This comes as preparation for supporting SW steering in kernel
where it will be required to allocate and register device
memory for direct rule insertion.
In addition, an API to register this device memory for future
remote access operations is introduced using the create_mkey
commands.
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
mlx5 HW spec and bits updates:
1) Aya exposes IP-in-IP capability in mlx5_core.
2) Maxim exposes lag tx port affinity capabilities.
3) Moshe adds VNIC_ENV internal rq counter bits.
4) ODP capabilities for DC transport
Misc updates:
5) Saeed, two compiler warnings cleanups
6) Add XRQ legacy commands opcodes
7) Use refcount_t for refcount
8) fix a -Wstringop-truncation warning
Michael Guralnik says:
====================
The series adds support for on-demand paging for DC transport.
As DC is a mlx-only transport, the capabilities are exposed to the user
using DEVX objects and later on through mlx5dv_query_device.
====================
Based on the mlx5-next branch from
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux for
dependencies
* branch 'mlx5-odp-dc':
IB/mlx5: Add page fault handler for DC initiator WQE
IB/mlx5: Remove check of FW capabilities in ODP page fault handling
net/mlx5: Set ODP capabilities for DC transport to max
In mlx5_core initialization, query max ODP capabilities for DC transport
from FW and set as current capabilities.
Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Bring in the lastest mlx5-next branch as the RDMA RX RoCE Steering
Support patch series requires it (first two patches are in mlx5-next,
final patch in RDMA tree).
Signed-off-by: Doug Ledford <dledford@redhat.com>
HV VHCA is a layer which provides PF to VF communication channel based on
HyperV PCI config channel. It implements Mellanox's Inter VHCA control
communication protocol. The protocol contains control block in order to
pass messages between the PF and VF drivers, and data blocks in order to
pass actual data.
The infrastructure is agent based. Each agent will be responsible of
contiguous buffer blocks in the VHCA config space. This infrastructure will
bind agents to their blocks, and those agents can only access read/write
the buffer blocks assigned to them. Each agent will provide three
callbacks (control, invalidate, cleanup). Control will be invoked when
block-0 is invalidated with a command that concerns this agent. Invalidate
callback will be invoked if one of the blocks assigned to this agent was
invalidated. Cleanup will be invoked before the agent is being freed in
order to clean all of its open resources or deferred works.
Block-0 serves as the control block. All execution commands from the PF
will be written by the PF over this block. VF will ack on those by
writing on block-0 as well. Its format is described by struct
mlx5_hv_vhca_control_block layout.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jason Gunthorpe says:
====================
This is a collection of general cleanups for ODP to clarify some of the
flows around umem creation and use of the interval tree.
====================
The branch is based on v5.3-rc5 due to dependencies
* odp_fixes:
RDMA/mlx5: Use odp instead of mr->umem in pagefault_mr
RDMA/mlx5: Use ib_umem_start instead of umem.address
RDMA/core: Make invalidate_range a device operation
RDMA/odp: Use kvcalloc for the dma_list and page_list
RDMA/odp: Check for overflow when computing the umem_odp end
RDMA/odp: Provide ib_umem_odp_release() to undo the allocs
RDMA/odp: Split creating a umem_odp from ib_umem_get
RDMA/odp: Make the three ways to create a umem_odp clear
RMDA/odp: Consolidate umem_odp initialization
RDMA/odp: Make it clearer when a umem is an implicit ODP umem
RDMA/odp: Iterate over the whole rbtree directly
RDMA/odp: Use the common interval tree library instead of generic
RDMA/mlx5: Fix MR npages calculation for IB_ACCESS_HUGETLB
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Use different namespaces for bypass and switchdev loopback because they
have different priorities and default table miss action requirement:
1. bypass: with multiple priorities support, and
MLX5_FLOW_TABLE_MISS_ACTION_DEF as the default table miss action;
2. switchdev loopback: with single priority support, and
MLX5_FLOW_TABLE_MISS_ACTION_SWITCH_DOMAIN as the default table miss
action.
Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Add the lag_tx_port_affinity HCA capability bit that indicates that
setting port affinity of TISes is supported.
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Expose Fw indication that it supports Stateless Offloads for IP over IP
tunneled packets. The following offloads are supported for the inner
packets: RSS, RX & TX Checksum Offloads, LSO and Flow Steering.
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Add mlx5 interface support for reading internal rq out of buffer counter
as part of QUERY_VNIC_ENV command. The command is used by the driver to
query vnic diagnostic statistics from FW.
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Add two events that were defined in the device specification but were
not exposed in the driver list.
Post this patch those events can be read over the DEVX events interface
once be reported by the firmware.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Edward Srouji <edwards@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Link: https://lore.kernel.org/r/20190808084358.29517-4-leon@kernel.org
Signed-off-by: Doug Ledford <dledford@redhat.com>
Add XRQ legacy commands opcodes, will be used via the DEVX interface.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
When calling debugfs functions, there is no need to ever check the
return value. The function can work or not, but the code logic should
never do something different based on this.
This cleans up a lot of unneeded code and logic around the debugfs
files, making all of this much simpler and easier to understand as we
don't need to keep the dentries saved anymore.
Cc: Saeed Mahameed <saeedm@mellanox.com>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: netdev@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently mlx5_eswitch_rep stores same hw ID for all representors.
However it is never used from this structure.
It is always used from mlx5_vport.
Hence, remove unused field.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Vu Pham <vuhuong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
To remove dependency on rtnl lock, protect mod_hdr hash table from
concurrent modifications with new mutex.
Implement helper function to get flow namespace to prevent code
duplication.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
List of flows attached to mod header entry is used as implicit reference
counter (mod header entry is deallocated when list becomes free) and as a
mechanism to obtain mod header entry that flow is attached to (through list
head). This is not safe when concurrent modification of list of flows
attached to mod header entry is possible. Proper atomic reference counter
is required to support concurrent access.
As a preparation for extending mod header with reference counting, extract
code that lookups and deletes mod header entry into standalone put/get
helpers. In order to remove this dependency on external locking, extend mod
header entry with reference counter to manage its lifetime and extend flow
structure with direct pointer to mod header entry that flow is attached to.
To remove code duplication between legacy and switchdev mode
implementations that both support mod_hdr functionality, store mod_hdr
table in dedicated structure used by both fdb and kernel namespaces. New
table structure is extended with table lock by one of the following patches
in this series. Implement helper function to get correct mod_hdr table
depending on flow namespace.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The TLS progress params context WQE should not include an
Eth segment, drop it.
In addition, align the tls_progress_params layout with the
HW specification document:
- fix the tisn field name.
- remove the valid bit.
Fixes: a12ff35e0f ("net/mlx5: Introduce TLS TX offload hardware bits and structures")
Fixes: d2ead1f360 ("net/mlx5e: Add kTLS TX HW offload support")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reference counters are preferred to use refcount_t instead of
atomic_t.
This is because the implementation of refcount_t can prevent
overflows and detect possible use-after-free.
So convert atomic_t ref counters to refcount_t.
Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Remove the "reserved_at_40" field to match the device specification.
Fixes: 84df61ebc6 ("net/mlx5: Add HW interfaces used by LAG")
Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Add a pool of flow counters, based on flow counter bulks, removing the
need to allocate a new counter via a costly FW command during the flow
creation process. The time it takes to acquire/release a flow counter
is cut from ~50 [us] to ~50 [ns].
The pool is part of the mlx5 driver instance, and provides flow
counters for aging flows. mlx5_fc_create() was modified to provide
counters for aging flows from the pool by default, and
mlx5_destroy_fc() was modified to release counters back to the pool
for later reuse. If bulk allocation is not supported or fails, and for
non-aging flows, the fallback behavior is to allocate and free
individual counters.
The pool is comprised of three lists of flow counter bulks, one of
fully used bulks, one of partially used bulks, and one of unused
bulks. Counters are provided from the partially used bulks first, to
help limit bulk fragmentation.
The pool maintains a threshold, and strives to maintain the amount of
available counters below it. The pool is increased in size when a
counter acquisition request is made and there are no available
counters, and it is decreased in size when the last counter in a bulk
is released and there are more available counters than the threshold.
All pool size changes are done in the context of the
acquiring/releasing process.
The value of the threshold is directly correlated to the amount of
used counters the pool is providing, while constrained by a hard
maximum, and is recalculated every time a bulk is allocated/freed.
This ensures that the pool only consumes large amounts of memory for
available counters if the pool is being used heavily. When fully
populated and at the hard maximum, the buffer of available counters
consumes ~40 [MB].
Signed-off-by: Gavi Teitz <gavi@mellanox.com>
Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Check if firmware supports the requested element type before
attempting to create the element type.
In addition, explicitly specify the request element type and tsar type.
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
First reserved field is off by one instead of reserved_at_1 it should be
reserved_at_2, fix that.
Fixes: a12ff35e0f ("net/mlx5: Introduce TLS TX offload hardware bits and structures")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Add a handle to invoke the new FW capability of allocating a bulk of
flow counters.
Signed-off-by: Gavi Teitz <gavi@mellanox.com>
Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Towards introducing the ability to allocate bulks of flow counters,
refactor the flow counter bulk query process, removing functions and
structs whose names indicated being used for flow counter bulk
allocation FW commands, despite them actually only being used to
support bulk querying, and migrate their functionality to correctly
named functions in their natural location, fs_counters.c.
Additionally, optimize the bulk query process by:
* Extracting the memory used for the query to mlx5_fc_stats so
that it is only allocated once, and not for each bulk query.
* Querying all the counters in one function call.
Signed-off-by: Gavi Teitz <gavi@mellanox.com>
Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
This patch prevents a race between user invoked cached counters
query and a neighbor last usage updater.
The cached flow counter stats can be queried by calling
"mlx5_fc_query_cached" which provides the number of bytes and
packets that passed via this flow since the last time this counter
was queried.
It does so by reducting the last saved stats from the current, cached
stats and then updating the last saved stats with the cached stats.
It also provide the lastuse value for that flow.
Since "mlx5e_tc_update_neigh_used_value" needs to retrieve the
last usage time of encapsulation flows, it calls the flow counter
query method periodically and async to user queries of the flow counter
using cls_flower.
This call is causing the driver to update the last reported bytes and
packets from the cache and therefore, future user queries of the flow
stats will return lower than expected number for bytes and packets
since the last saved stats in the driver was updated async to the last
saved stats in cls_flower.
This causes wrong stats presentation of encapsulation flows to user.
Since the neighbor usage updater only needs the lastuse stats from the
cached counter, the fix is to use a dedicated lastuse query call that
returns the lastuse value without synching between the cached stats and
the last saved stats.
Fixes: f6dfb4c3f2 ("net/mlx5e: Update neighbour 'used' state using HW flow rules counters")
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Fix modify_cq_in alignment to match the device specification.
After this fix the 'cq_umem_valid' field will be in the right offset.
Cc: <stable@vger.kernel.org> # 4.19
Fixes: bd37197554 ("net/mlx5: Update mlx5_ifc with DEVX UID bits")
Signed-off-by: Edward Srouji <edwards@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
A smaller cycle this time. Notably we see another new driver, 'Soft
iWarp', and the deletion of an ancient unused driver for nes.
- Revise and simplify the signature offload RDMA MR APIs
- More progress on hoisting object allocation boiler plate code out of the
drivers
- Driver bug fixes and revisions for hns, hfi1, efa, cxgb4, qib, i40iw
- Tree wide cleanups: struct_size, put_user_page, xarray, rst doc conversion
- Removal of obsolete ib_ucm chardev and nes driver
- netlink based discovery of chardevs and autoloading of the modules
providing them
- Move more of the rdamvt/hfi1 uapi to include/uapi/rdma
- New driver 'siw' for software based iWarp running on top of netdev,
much like rxe's software RoCE.
- mlx5 feature to report events in their raw devx format to userspace
- Expose per-object counters through rdma tool
- Adaptive interrupt moderation for RDMA (DIM), sharing the DIM core
from netdev
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAl0ozSwACgkQOG33FX4g
mxqncg//Qe2zSnlbd6r3hofsc1WiHSx/CiXtT52BUGipO+cWQUwO7hGFuUHIFCuZ
JBg7mc998xkyLIH85a/txd+RwAIApKgHVdd+VlrmybZeYCiERAMFpWg8cHpzrbnw
l3Ln9fTtJf/NAhO0ZCGV9DCd01fs9yVQgAv21UnLJMUhp9Pzk/iMhu7C7IiSLKvz
t7iFhEqPXNJdoqZ+wtWyc/463YxKUd9XNg9Z1neQdaeZrX4UjgDbY9x/ub3zOvQV
jc/IL4GysJ3z8mfx5mAd6sE/jAjhcnJuaGYYATqkxiLZEP+muYwU50CNs951XhJC
b/EfRQIcLg9kq/u6CP+CuWlMrRWy3U7yj3/mrbbGhlGq88Yt6FGqUf0aFy6TYMaO
RzTG5ZR+0AmsOrR1QU+DbH9CKX5PGZko6E7UCdjROqUlAUOjNwRr99O5mYrZoM9E
PdN2vtdWY9COR3Q+7APdhWIA/MdN2vjr3LDsR3H94tru1yi6dB/BPDRcJieozaxn
2T+YrZbV+9/YgrccpPQCilaQdanXKpkmbYkbEzVLPcOEV/lT9odFDt3eK+6duVDL
ufu8fs1xapMDHKkcwo5jeNZcoSJymAvHmGfZlo2PPOmh802Ul60bvYKwfheVkhHF
Eee5/ovCMs1NLqFiq7Zq5mXO0fR0BHyg9VVjJBZm2JtazyuhoHQ=
=iWcG
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma updates from Jason Gunthorpe:
"A smaller cycle this time. Notably we see another new driver, 'Soft
iWarp', and the deletion of an ancient unused driver for nes.
- Revise and simplify the signature offload RDMA MR APIs
- More progress on hoisting object allocation boiler plate code out
of the drivers
- Driver bug fixes and revisions for hns, hfi1, efa, cxgb4, qib,
i40iw
- Tree wide cleanups: struct_size, put_user_page, xarray, rst doc
conversion
- Removal of obsolete ib_ucm chardev and nes driver
- netlink based discovery of chardevs and autoloading of the modules
providing them
- Move more of the rdamvt/hfi1 uapi to include/uapi/rdma
- New driver 'siw' for software based iWarp running on top of netdev,
much like rxe's software RoCE.
- mlx5 feature to report events in their raw devx format to userspace
- Expose per-object counters through rdma tool
- Adaptive interrupt moderation for RDMA (DIM), sharing the DIM core
from netdev"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (194 commits)
RMDA/siw: Require a 64 bit arch
RDMA/siw: Mark expected switch fall-throughs
RDMA/core: Fix -Wunused-const-variable warnings
rdma/siw: Remove set but not used variable 's'
rdma/siw: Add missing dependencies on LIBCRC32C and DMA_VIRT_OPS
RDMA/siw: Add missing rtnl_lock around access to ifa
rdma/siw: Use proper enumerated type in map_cqe_status
RDMA/siw: Remove unnecessary kthread create/destroy printouts
IB/rdmavt: Fix variable shadowing issue in rvt_create_cq
RDMA/core: Fix race when resolving IP address
RDMA/core: Make rdma_counter.h compile stand alone
IB/core: Work on the caller socket net namespace in nldev_newlink()
RDMA/rxe: Fill in wc byte_len with IB_WC_RECV_RDMA_WITH_IMM
RDMA/mlx5: Set RDMA DIM to be enabled by default
RDMA/nldev: Added configuration of RDMA dynamic interrupt moderation to netlink
RDMA/core: Provide RDMA DIM support for ULPs
linux/dim: Implement RDMA adaptive moderation (DIM)
IB/mlx5: Report correctly tag matching rendezvous capability
docs: infiniband: add it to the driver-api bookset
IB/mlx5: Implement VHCA tunnel mechanism in DEVX
...
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAl0ng7AACgkQSD+KveBX
+j41FQgAjcUgjCiq6gmwI62EsMpZjgjwKhGfYz56jwSmNDgG7Ltj9o/K/+b+oL3u
ZVPO9YgNstWuoPyApE12MzsA+kx/ydRz8GW6IrI5v71JK3yy8Mw/HrYVyUKEnzH4
BVnSRnvHgvCnjZ3SjmUb6Xs74vmoxti9cYr0w0XtBogzuXlfpSYILYFiqrkks6sH
VEmD9P3dTBNQUviYwkRcHhJoI7NqdQBApn9/30F4dQsmnUsikoaom5GovrbBw817
L3Q+U/9+2Xp5k1i7J2MLFSL6Vyaxs/48eRKxs3YubBmrtwBjPj+aOYdRdDOscguQ
WcDRiYPOW72cGIlFtYyuzApqo+J+xw==
=mq0x
-----END PGP SIGNATURE-----
Merge tag 'mlx5-fixes-2019-07-11' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
Mellanox, mlx5 fixes 2019-07-11
This series introduces some fixes to mlx5 driver.
Please pull and let me know if there is any problem.
For -stable v4.15
('net/mlx5e: IPoIB, Add error path in mlx5_rdma_setup_rn')
For -stable v5.1
('net/mlx5e: Fix port tunnel GRE entropy control')
('net/mlx5e: Rx, Fix checksum calculation for new hardware')
('net/mlx5e: Fix return value from timeout recover function')
('net/mlx5e: Fix error flow in tx reporter diagnose')
For -stable v5.2
('net/mlx5: E-Switch, Fix default encap mode')
Conflict note: This pull request will produce a small conflict when
merged with net-next.
In drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
Take the hunk from net and replace:
esw_offloads_steering_init(esw, vf_nvports, total_nvports);
with:
esw_offloads_steering_init(esw);
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
CQE checksum full mode in new HW, provides a full checksum of rx frame.
Covering bytes starting from eth protocol up to last byte in the received
frame (frame_size - ETH_HLEN), as expected by the stack.
Fixing up skb->csum by the driver is not required in such case. This fix
is to avoid wrong checksum calculation in drivers which already support
the new hardware with the new checksum mode.
Fixes: 85327a9c41 ("net/mlx5: Update the list of the PCI supported devices")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Max Gurtovoy says:
====================
Those two patches introduce VHCA tunnel mechanism to DEVX interface
needed for Bluefield SOC. See extensive commit messages for more
information.
====================
Based on the mlx5-next branch from
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux for
dependencies
* branch 'vcha-tunnel':
IB/mlx5: Implement VHCA tunnel mechanism in DEVX
net/mlx5: Introduce VHCA tunnel device capability
When using the device emulation feature (introduced in Bluefield-1 SOC),
a privileged function (the device emulation manager) will be able to
create a channel to execute commands on behalf of the emulated function.
This channel will be a general object of type VHCA_TUNNEL that will have
a unique ID for each emulated function. This ID will be passed in each
cmd that will be issued by the emulation SW in a well known offset in
the command header.
This channel is needed since the emulated function doesn't have a normal
command interface to the HCA HW, but some basic configuration for that
function is needed (e.g. initialize and enable the HCA). For that matter,
a specific command-set was defined and only those commands will be issued
by the HCA.
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Always contain all acceleration functions declarations in
'accel' files, independent to the flags setting.
For this, introduce new flags CONFIG_FPGA_{IPSEC/TLS} and use stubs
where needed.
This obsoletes the need for stubs in 'fpga' files. Remove them.
Also use the new flags in Makefile, to decide whether to compile
TLS-specific or IPSEC-specific objects, or not.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Support bind a qp with counter. If counter is null then bind the qp to the
default counter. Different QP state has different operation:
- RESET: Set the counter field so that it will take effective during
RST2INIT change;
- RTS: Issue an RTS2RTS change to update the QP counter;
- Other: Set the counter field and mark the counter_pending flag, when QP
is moved to RTS state and this flag is set, then issue an RTS2RTS
modification to update the counter.
Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
From git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Required for dependencies in the next patches.
* mlx5-next:
net/mlx5: Add rts2rts_qp_counters_set_id field in hca cap
net/mlx5: Properly name the generic WQE control field
net/mlx5: Introduce TLS TX offload hardware bits and structures
net/mlx5: Refactor mlx5_esw_query_functions for modularity
net/mlx5: E-Switch prepare functions change handler to be modular
net/mlx5: Introduce and use mlx5_eswitch_get_total_vports()
Misc updates from mlx5-next branch:
1) Add the required HW definitions and structures for upcoming TLS
support.
2) Add support for MCQI and MCQS hardware registers for fw version query.
3) Added hardware bits and structures definitions for sub-functions
4) Small code cleanup and improvement for PF pci driver.
5) Bluefield (ECPF) updates and refactoring for better E-Switch
management on ECPF embedded CPU NIC:
5.1) Consolidate querying eswitch number of VFs
5.2) Register event handler at the correct E-Switch init stage
5.3) Setup PF's inline mode and vlan pop when the ECPF is the
E-Swtich manager ( the host PF is basically a VF ).
5.4) Handle Vport UC address changes in switchdev mode.
6) Cleanup the rep and netdev reference when unloading IB rep.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
i# All conflicts fixed but you are still merging.
Add rts2rts_qp_counters_set_id field in hca cap so that RTS2RTS
qp modification can be used to change the counter of a QP.
Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
A generic WQE control field is used for different purposes
in different cases.
Use union to allow using the proper name in each case.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Instead MLX5_TOTAL_VPORTS, use mlx5_eswitch_get_total_vports().
mlx5_eswitch_get_total_vports() in subsequent patch accounts for SF
vports as well.
Expanding MLX5_TOTAL_VPORTS macro would require exposing SF internals to
more generic vport.h header file. Such exposure is not desired.
Hence a mlx5_eswitch_get_total_vports() is introduced.
Given that mlx5_eswitch_get_total_vports() API wants to work on const
mlx5_core_dev*, change its helper functions also to accept const *dev.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
From git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Required for dependencies in the next patches.
Resolved the conflicts:
- esw_destroy_offloads_acl_tables() use the newer mlx5_esw_for_all_vports()
version
- esw_offloads_steering_init() drop the cap test
- esw_offloads_init() drop the extra function arguments
* branch 'mlx5-next': (39 commits)
net/mlx5: Expose device definitions for object events
net/mlx5: Report EQE data upon CQ completion
net/mlx5: Report a CQ error event only when a handler was set
net/mlx5: mlx5_core_create_cq() enhancements
net/mlx5: Expose the API to register for ANY event
net/mlx5: Use event mask based on device capabilities
net/mlx5: Fix mlx5_core_destroy_cq() error flow
net/mlx5: E-Switch, Handle UC address change in switchdev mode
net/mlx5: E-Switch, Consider host PF for inline mode and vlan pop
net/mlx5: E-Switch, Use iterator for vlan and min-inline setups
net/mlx5: E-Switch, Reg/unreg function changed event at correct stage
net/mlx5: E-Switch, Consolidate eswitch function number of VFs
net/mlx5: E-Switch, Refactor eswitch SR-IOV interface
net/mlx5: Handle host PF vport mac/guid for ECPF
net/mlx5: E-Switch, Use correct flags when configuring vlan
net/mlx5: Reduce dependency on enabled_vfs counter and num_vfs
net/mlx5: Don't handle VF func change if host PF is disabled
net/mlx5: Limit scope of mlx5_get_next_phys_dev() to PCI PF devices
net/mlx5: Move pci status reg access mutex to mlx5_pci_init
net/mlx5: Rename mlx5_pci_dev_type to mlx5_coredev_type
...
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Expose an extra device definitions for objects events.
It includes: object_type values for legacy objects and generic data
header for any other object.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Report EQE data upon CQ completion to let upper layers use this data.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Enhance mlx5_core_create_cq() to get the command out buffer from the
callers to let them use the output.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Expose the API to register for ANY event, mlx5_ib will be able to use
this functionality for its needs.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Use the reported device capabilities for the supported user events (i.e.
affiliated and un-affiliated) to set the EQ mask.
As the event mask can be up to 256 defined by 4 entries of u64 change
the applicable code to work accordingly.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
When ECPF is the eswitch manager, host PF is treated like other VFs.
Driver should do the same for inline mode and vlan pop.
Add new iterators to include host PF if ECPF is the eswitch manager.
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Devlink eswitch mode is not necessarily related to SR-IOV, e.g, ECPF
can be at offload mode when SR-IOV is not enabled.
Rename the interface and eswitch mode names to decouple from SR-IOV,
and cleanup eswitch messages accordingly.
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
When ECPF is eswitch manager, it has the privilege to query and
configure the mac and node guid of host PF.
While vport number of host PF is 0, the vport command should be
issued with other_vport set in this case as the cmd is issued by
ECPF vport(0xfffe).
Add a specific function to query own vport mac. Low level functions
are used by vport manager to query/modify any vport mac and node guid.
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
While enabling SR-IOV, PCI core already checks that if SR-IOV is already
enabled, it returns failure error code.
Hence, remove such duplicate check from mlx5_core driver.
While at it, make mlx5_device_disable_sriov() to perform cleanup of VFs in
reverse order of mlx5_device_enable_sriov().
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
When ECPF eswitch manager is at offloads mode, it monitors functions
changed event from host PF side and acts according to the number of
VFs enabled/disabled.
As ECPF and host PF work in two independent hosts, it's possible that
host PF OS reboots but ECPF system is still kept on and continues
monitoring events from host PF. When kernel from host PF side is
booting, PCI iov driver does sriov_init and compute_max_vf_buses by
iterating over all valid num of VFs. This triggers FLR and generates
functions changed events, even though host PF HCA is not enabled at
this time. However, ECPF is not aware of this information, and still
handles these events as usual. ECPF system will see massive number of
reps are created, but destroyed immediately once creation finished.
To eliminate this noise, a bit is added to host parameter context to
indicate host PF is disabled. ECPF will not handle the VF changed
event if this bit is set.
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
In the single IB device mode, the mapping between vport number and
rep relies on a counter. However for dynamic vport allocation, it is
desired to keep consistent map of eswitch vport and IB port.
Hence, simplify code to remove the free running counter and instead
use the available vport index during load/unload sequence from the
eswitch.
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Suggested-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Given a fw component index, the MCQI register allows us to query
this component's information (e.g. its version and capabilities).
Given a fw component index, the MCQS register allows us to query the
status of a fw component, including its type and state
(e.g. PRESET/IN_USE).
It can be used to find the index of a component of a specific type, by
sequentially increasing the component index, and querying each time the
type of the returned component.
If max component index is reached, 'last_index_flag' is set by the HCA.
These registers' description was added to query the running and pending
fw version of the HCA.
Signed-off-by: Shay Agroskin <shayag@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Update mlx5 device interface data structures for:
1. New command definitions for allocating, deallocating SF
2. Query SF partition
3. Eswitch SF fields
4. HCA CAP SF fields
5. Extend Eswitch functions command for SF
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Vu Pham <vuhuong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
-----BEGIN PGP SIGNATURE-----
iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl0Os1seHHRvcnZhbGRz
QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGtx4H/j6i482XzcGFKTBm
A7mBoQpy+kLtoUov4EtBAR62OuwI8rsahW9di37QKndPoQrczWaKBmr3De6LCdPe
v3pl3O6wBbvH5ru+qBPFX9PdNbDvimEChh7LHxmMxNQq3M+AjZAZVJyfpoiFnx35
Fbge+LZaH/k8HMwZmkMr5t9Mpkip715qKg2o9Bua6dkH0AqlcpLlC8d9a+HIVw/z
aAsyGSU8jRwhoAOJsE9bJf0acQ/pZSqmFp0rDKqeFTSDMsbDRKLGq/dgv4nW0RiW
s7xqsjb/rdcvirRj3rv9+lcTVkOtEqwk0PVdL9WOf7g4iYrb3SOIZh8ZyViaDSeH
VTS5zps=
=huBY
-----END PGP SIGNATURE-----
Merge tag 'v5.2-rc6' into rdma.git for-next
For dependencies in next patches.
Resolve conflicts:
- Use uverbs_get_cleared_udata() with new cq allocation flow
- Continue to delete nes despite SPDX conflict
- Resolve list appends in mlx5_command_str()
- Use u16 for vport_rule stuff
- Resolve list appends in struct ib_client
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Putting an empty 'mlx5_flow_spec' structure on the stack is a bit
wasteful and causes a warning on 32-bit architectures when building
with clang -fsanitize-coverage:
drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads_termtbl.c: In function 'mlx5_eswitch_termtbl_create':
drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads_termtbl.c:90:1: error: the frame size of 1032 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
Since the structure is never written to, we can statically allocate
it to avoid the stack usage. To be on the safe side, mark all
subsequent function arguments that we pass it into as 'const'
as well.
Fixes: 10caabdaad ("net/mlx5e: Use termination table for VLAN push actions")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Acked-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Misc updates from mlx5-next branch:
1) E-Switch vport metadata support for source vport matching
2) Convert mkey_table to XArray
3) Shared IRQs and to use single IRQ for all async EQs
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
In vport metadata matching, source port number is replaced by metadata.
While FW has no idea about what it is in the metadata, a syndrome will
happen. Specify a known origin to avoid the syndrome.
However, there is no functional change because ANY_VPORT (0) is filled
in flow_source, the same default value as before, as a pre-step towards
metadata matching for fast path.
There are two other values can be filled in flow_source. When setting
0x1, packet matching this rule is from uplink, while 0x2 is for packet
from other local vports.
Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
When a dual-port VHCA sends a RoCE packet on its non-native port, and the
packet arrives to its affiliated vport FDB, a mismatch might occur on the
rules that match the packet source vport as it is not represented by single
VHCA only in this case. So we change to match on metadata instead of source
vport.
To do that, a rule is created in all vports and uplink ingress ACLs, to
save the source vport number and vhca id in the packet's metadata in order
to match on it later.
The metadata register used is the first of the 32-bit type C registers. It
can be used for matching and header modify operations. The higher 16 bits
of this register are for vhca id, and the lower 16 ones is for vport
number.
This change is not for dual-port RoCE only. If HW and FW allow, the vport
metadata matching is enabled by default.
Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Refactor the flow data structures, add new flow_context and move
flow_tag into it, as flow_tag doesn't belong to the rule action.
Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
When a dual-port VHCA sends a RoCE packet on its non-native port, and
the packet arrives to its affiliated vport FDB, a mismatch might occur
on the rules that match the packet source vport. So we replace the
match on source port with the match on metadata that was configured in
ingress ACL, and that metadata will be passed further also to the NIC
RX table of the eswitch manager.
Introduce vport metadata matching bits and enum constants as a pre-step
towards metadata matching.
o metadata type C registers in the misc parameters 2 fields.
o esw_uplink_ingress_acl bit in esw cap. If it set, the device supports
ingress ACL for the uplink vport.
o fdb_to_vport_reg_* bits in flow table cap and esw vport context, to
support propagating the metadata to the nic rx through the loopback
path.
o flow_source in flow context, to indicate the known origin of packets.
o enum constants, to support the above bits.
Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The lock protecting the data structure does not need to be an rwlock. The
only read access to the lock is in an error path, and if that's limiting
your scalability, you have bigger performance problems.
Eliminate mlx5_mkey_table in favour of using the xarray directly.
reg_mr_callback must use GFP_ATOMIC for allocating XArray nodes as it may
be called in interrupt context.
This also fixes a minor bug where SRCU locking was being used on the radix
tree read side, when RCU was needed too.
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
This new WR will be used to perform PI (protection information) handover
using the new API. Using the new API, the user will post a single WR that
will internally perform all the needed actions to complete PI operation.
This new WR will use a memory region that was allocated as
IB_MR_TYPE_INTEGRITY and was mapped using ib_map_mr_sg_pi to perform the
registration. In the old API, in order to perform a signature handover
operation, each ULP should perform the following:
1. Map and register the data buffers.
2. Map and register the protection buffers.
3. Post a special reg WR to configure the signature handover operation
layout.
4. Invalidate the signature memory key.
5. Invalidate protection buffers memory key.
6. Invalidate data buffers memory key.
In the new API, the mapping of both data and protection buffers is
performed using a single call to ib_map_mr_sg_pi function. Also the
registration of the buffers and the configuration of the signature
operation layout is done by a single new work request called
IB_WR_REG_MR_INTEGRITY.
This patch implements this operation for mlx5 devices that are capable to
offload data integrity generation/validation while performing the actual
buffer transfer.
This patch will not remove the old signature API that is used by the iSER
initiator and target drivers. This will be done in the future.
In the internal implementation, for each IB_WR_REG_MR_INTEGRITY work
request, we are using a single UMR operation to register both data and
protection buffers using KLM's.
Afterwards, another UMR operation will describe the strided block format.
These will be followed by 2 SET_PSV operations to set the memory/wire
domains initial signature parameters passed by the user.
In the end of the whole transaction, only the signature memory key
(the one that exposed for the RDMA operation) will be invalidated.
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Add API to get the current Eswitch encap mode.
It will be used in downstream patches to check if
flow table can be created with encap support or not.
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Petr Vorel <pvorel@suse.cz>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Report devlink health on FW fatal issues via fw_fatal_reporter. The
driver recover flow for FW fatal error is now being handled by the
devlink health.
Having the recovery controlled by devlink health, the user has the
ability to cancel the auto-recovery for debug session and run it
manually.
Call mlx5_enter_error_state() before calling devlink_health_report() to
ensure entering device error state even if auto-recovery is off.
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Create mlx5_devlink_health_reporter for fw fatal reporter.
The fw fatal reporter is added in addition to the fw reporter and
implements the recover callback.
The point of having two reporters for FW issues, is that we
don't want to run FW recover on any issue, but only fatal ones.
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Use devlink_health_report() to report any symptom of FW issue as FW
counter miss or new health syndrome.
The FW issues detected in mlx5 during poll_health which is called in
timer atomic context and so health work queue is used to schedule the
reports.
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>