-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEH7ZpcWbFyOOp6OJbrB3Eaf9PW7cFAmOS/ooACgkQrB3Eaf9P
W7cOVA/+L8rHwLe78DDz/PNESyShtTVCYBDF/ngYMV8AIvjSfPresMbFV3NKqO5E
3qbMl199QH2eWI7dhQaQ+edynSG0QCx5FmPai0UuHPLxATct1pNPJPpvBryO/4jC
ZouYBIVjdMbq6Y8vD2gJ8UtA7TZpncP0HYOKTvYyDL9kQ+nUmu9KUYxcEcNHL5w+
TjL9jJafR+GqczCRiwAoMKIFV7lUrTFzh7slfINNN5DVTuzN33H7Tp70z6IKOfVL
1LATlZv7mqpLVF6dQuMXOt6kd/BEBl1y4ZHTHow5nstJvwu99P96iKwEfIXuOvWK
fulhDU61eIik8D9QJWeM7TuZDbYewWI77plwVY/R/zRt0At4VLpq7I1m33CmLLMY
Fb5fMxJPkM8YAtDID+BknYPrSAcxo8ji04BWFrVqQ6InPmtGfnP83XSSkYfxY7FB
3hUfz4igsJpV5vrS1EFRhjklNwI+jY2yAvIggQtdkJ97ubSUY3E4ACfNqlJ5lJbv
2KqWnSKlG21F9ZTR68VzcQVhFIQF6j/EuQqro+4TQUIdZswcml2iK32zrel0rs9C
iAsgQQaMV9a2vEaScRZqdOJ4HENTbm9wD7Mso/i5vr+lnpr1ThKjQo8osU8YUlbC
SDTMeWRRos+esFML6SP+YZ7SM/qXMluou204x/llJ/VDMXQ5e8k=
=enQp
-----END PGP SIGNATURE-----
Merge tag 'ipsec-next-2022-12-09' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:
====================
ipsec-next 2022-12-09
1) Add xfrm packet offload core API.
From Leon Romanovsky.
2) Add xfrm packet offload support for mlx5.
From Leon Romanovsky and Raed Salem.
3) Fix a typto in a error message.
From Colin Ian King.
* tag 'ipsec-next-2022-12-09' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next: (38 commits)
xfrm: Fix spelling mistake "oflload" -> "offload"
net/mlx5e: Open mlx5 driver to accept IPsec packet offload
net/mlx5e: Handle ESN update events
net/mlx5e: Handle hardware IPsec limits events
net/mlx5e: Update IPsec soft and hard limits
net/mlx5e: Store all XFRM SAs in Xarray
net/mlx5e: Provide intermediate pointer to access IPsec struct
net/mlx5e: Skip IPsec encryption for TX path without matching policy
net/mlx5e: Add statistics for Rx/Tx IPsec offloaded flows
net/mlx5e: Improve IPsec flow steering autogroup
net/mlx5e: Configure IPsec packet offload flow steering
net/mlx5e: Use same coding pattern for Rx and Tx flows
net/mlx5e: Add XFRM policy offload logic
net/mlx5e: Create IPsec policy offload tables
net/mlx5e: Generalize creation of default IPsec miss group and rule
net/mlx5e: Group IPsec miss handles into separate struct
net/mlx5e: Make clear what IPsec rx_err does
net/mlx5e: Flatten the IPsec RX add rule path
net/mlx5e: Refactor FTE setup code to be more clear
net/mlx5e: Move IPsec flow table creation to separate function
...
====================
Link: https://lore.kernel.org/r/20221209093310.4018731-1-steffen.klassert@secunet.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Range is a new flow destination type which allows matching on
a range of values instead of matching on a specific value.
Range flow destination has the following fields:
- hit_ft: flow table to forward the traffic in case of hit
- miss_ft: flow table to forward the traffic in case of miss
- field: which packet characteristic to match on
- min: minimal value for the selected field
- max: maximal value for the selected field
Note:
- In order to match, the value in the packet should meet
the following criteria: min <= value < max
- Currently, the only supported field type is L2 packet length
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Update full structure of match definer and add an ID of
the SELECT match definer type.
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Downstream patch requires to get other function GENERAL2 caps while
mlx5_vport_get_other_func_cap() gets only one type of caps (general).
Rename it to represent this and introduce a generic implementation
of mlx5_vport_get_other_func_cap().
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Acked-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Introduce IFC related capabilities to enable setting VF to be able to
perform live migration. e.g.: to be migratable.
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Acked-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
MLX5_UMR_KLM_ALIGNMENT is in units of number of entries, while
MLX5_UMR_MTT_ALIGNMENT (generalized and renamed to
MLX5_UMR_FLEX_ALIGNMENT) is in byte units. This is misleading and
confusing.
Replace this KLM definition with one based on the generic definition.
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Per the device spec, MLX5_UMR_MTT_ALIGNMENT is good not only for UMR MTT
entries, but for all other entries as well, like KLMs and KSMs.
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Defines MLX5_UMR_MTT_MASK and MLX5_UMR_MTT_MIN_CHUNK_SIZE are not in
use. Remove them.
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Per the device spec, MTTs/KLMs list in a UMR WQE must be aligned to 64B.
Per our SW design, the MTT/KLMs list would need alignment only if it's
too small, for example on PPC when PAGE_SIZE is 64KB, and only 4 pages
are needed to cover a MPWQE of size 256KB.
Padding, if needed, is taken into account when calculating the UMR WQE
fields (ds_cnt and xlt_octowords), however no entries are provided,
instead garbage is passed.
No real harm though, as these parts act as gaps between the RX MPWQEs
and not used by any of them. Hence, in practice, device does not try to
write any incoming packet to them. Still, prefer providing clean padding
marking the end of the list, and do not map garbage into the RQ memory
region.
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Remove mlx5_priv.ctx_list and ctx_lock which are no longer used after
commit 601c10c89c ("net/mlx5: Delete custom device management logic").
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
While moving to new CMD API (quiet API), some pre-existing flows may call the new API
function that in case of error, returns the error instead of printing it as previously done.
For such flows we bring back the print but to tracepoint this time for sys admins to
have the ability to check for errors especially for commands using the new quiet API.
Tracepoint output example:
devlink-1333 [001] ..... 822.746922: mlx5_cmd: ACCESS_REG(0x805) op_mod(0x0) failed, status bad resource(0x5), syndrome (0xb06e1f), err(-22)
Fixes: f23519e542 ("net/mlx5: cmdif, Add new api for command execution")
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
CQE compression feature improves performance by reducing PCI bandwidth
bottleneck on CQEs write.
Enhanced CQE compression introduced in ConnectX-6 and it aims to reduce
CPU utilization of SW side packets decompression by eliminating the
need to rewrite ownership bit, which is likely to cost a cache-miss, is
replaced by validity byte handled solely by HW.
Another advantage of the enhanced feature is that session packets are
available to SW as soon as a single CQE slot is filled, instead of
waiting for session to close, this improves packet latency from NIC to
host.
Performance:
Following are tested scenarios and reults comparing basic and enahnced
CQE compression.
setup: IXIA 100GbE connected directly to port 0 and port 1 of
ConnectX-6 Dx 100GbE dual port.
Case #1 RX only, single flow goes to single queue:
IRQ rate reduced by ~ 30%, CPU utilization improved by 2%.
Case #2 IP forwarding from port 1 to port 0 single flow goes to
single queue:
Avg latency improved from 60us to 21us, frame loss improved from 0.5% to 0.0%.
Case #3 IP forwarding from port 1 to port 0 Max Throughput IXIA sends
100%, 8192 UDP flows, goes to 24 queues:
Enhanced is equal or slightly better than basic.
Testing the basic compression feature with this patch shows there is
no perfrormance degradation of the basic compression feature.
Signed-off-by: Ofer Levi <oferle@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
- Small bug fixes in mlx5, efa, rxe, hns, irdma, erdma, siw
- rts tracing improvements
- Code improvements: strlscpy conversion, unused parameter, spelling
mistakes, unused variables, flex arrays
- restrack device details report for hns
- Simplify struct device initialization in SRP
- Eliminate the never-used service_mask support in IB CM
- Make rxe not print to the console for some kinds of network packets
- Asymetric paths and router support in the CM through netlink messages
- DMABUF importer support for mlx5devx umem's
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCYz9bgAAKCRCFwuHvBreF
YevoAP47J/svlOFlFtBhTVF79Ddtf+MMeqeVvLoHHQbCU5rUpAD+KUpTXAvwNcM9
dHwNXz9ctanP5397qusH0rxOKPo/EA4=
=lgSv
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma updates from Jason Gunthorpe:
"Not a big list of changes this cycle, mostly small things. The new
MANA rdma driver should come next cycle along with a bunch of work on
rxe.
Summary:
- Small bug fixes in mlx5, efa, rxe, hns, irdma, erdma, siw
- rts tracing improvements
- Code improvements: strlscpy conversion, unused parameter, spelling
mistakes, unused variables, flex arrays
- restrack device details report for hns
- Simplify struct device initialization in SRP
- Eliminate the never-used service_mask support in IB CM
- Make rxe not print to the console for some kinds of network packets
- Asymetric paths and router support in the CM through netlink
messages
- DMABUF importer support for mlx5devx umem's"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (84 commits)
RDMA/rxe: Remove error/warning messages from packet receiver path
RDMA/usnic: fix set-but-not-unused variable 'flags' warning
IB/hfi1: Use skb_put_data() instead of skb_put/memcpy pair
RDMA/hns: Unified Log Printing Style
RDMA/hns: Replacing magic number with macros in apply_func_caps()
RDMA/hns: Repacing 'dseg_len' by macros in fill_ext_sge_inl_data()
RDMA/hns: Remove redundant 'max_srq_desc_sz' in caps
RDMA/hns: Remove redundant 'num_mtt_segs' and 'max_extend_sg'
RDMA/hns: Remove redundant 'phy_addr' in hns_roce_hem_list_find_mtt()
RDMA/hns: Remove redundant 'use_lowmem' argument from hns_roce_init_hem_table()
RDMA/hns: Remove redundant 'bt_level' for hem_list_alloc_item()
RDMA/hns: Remove redundant 'attr_mask' in modify_qp_init_to_init()
RDMA/hns: Remove unnecessary brackets when getting point
RDMA/hns: Remove unnecessary braces for single statement blocks
RDMA/hns: Cleanup for a spelling error of Asynchronous
IB/rdmavt: Add __init/__exit annotations to module init/exit funcs
RDMA/rxe: Remove redundant num_sge fields
RDMA/mlx5: Enable ATS support for MRs and umems
RDMA/mlx5: Add support for dmabuf to devx umem
RDMA/core: Add UVERBS_ATTR_RAW_FD
...
Start health poll at earlier stage, so if fw fatal issue occurred before
or during initialization commands such as init_hca or set_hca_cap the
poll health can detect and indicate that the driver is already in error
state.
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add the rx_oversize_pkts_buffer counter to ethtool statistics.
This counter exposes the number of dropped received packets due to
length which arrived to RQ and exceed software buffer size allocated by
the device for incoming traffic. It might imply that the device MTU is
larger than the software buffers size.
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Instead of passing the unaligned flag, pass an enum that indicates the
UMR mode. The next commit will add the third mode (KLM for certain
configurations of XSK), which will be added to this enum instead of
adding another bool flag everywhere.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
UMR MTTs used in striding RQ have certain alignment requirements. While
it's guaranteed to work when UMR pages are aligned to the UMR page size,
in practice it works then UMR pages are aligned to 8 bytes. However,
it's still not enough flexibility for the unaligned mode of XSK. This
patch leverages KSM to map UMR pages without alignment requirements,
when unaligned XSK is active. The downside is that KSM entries are twice
as big as MTTs, which limits the maximum WQE size, so regular RQs and
aligned XSK continue using MTTs.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This commit allows striding RQ to determine MTT page size at runtime,
instead of sticking to the compile-time PAGE_SIZE. This functionality
will be used by a following commit that adjusts the MTT page size to the
XSK frame size.
Stick with PAGE_SIZE for XSK on legacy RQ, as frag_stride is not used in
data path, it only helps calculate how pages are partitioned into
fragments, and PAGE_SIZE will ensure each fragment starts at the
beginning of a new allocation unit (XSK frame).
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add the capability that will allow the driver to determine the minimal
MTT page size to be able to map the smallest possible pages in XSK. The
older firmwares that don't have this capability default to 12 (i.e.
4096-byte pages).
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Saeed Mahameed says:
====================
updates from mlx5-next 2022-09-24
Updates form mlx5-next including[1]:
1) HW definitions and support for NPPS clock settings.
2) various cleanups
3) Enable hash mode by default for all NICs
4) page tracker and advanced virtualization HW definitions for vfio
[1] https://lore.kernel.org/netdev/20220907233636.388475-1-saeed@kernel.org/
* 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
net/mlx5: Remove from FPGA IFC file not-needed definitions
net/mlx5: Remove unused structs
net/mlx5: Remove unused functions
net/mlx5: detect and enable bypass port select flow table
net/mlx5: Lag, enable hash mode by default for all NICs
net/mlx5: Lag, set active ports if support bypass port select flow table
RDMA/mlx5: Don't set tx affinity when lag is in hash mode
net/mlx5: add IFC bits for bypassing port select flow table
net/mlx5: Add support for NPPS with real time mode
net/mlx5: Expose NPPS related registers
net/mlx5: Query ADV_VIRTUALIZATION capabilities
net/mlx5: Introduce ifc bits for page tracker
RDMA/mlx5: Move function mlx5_core_query_ib_ppcnt() to mlx5_ib
====================
Link: https://lore.kernel.org/all/20220927201906.234015-1-saeed@kernel.org/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Move IP layout bits definitions to be close to the place that actually
uses it, together with removal extra defines that not in-use.
Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Remove structs which are no longer used in the driver:
mlx5dr_cmd_qp_create_attr
mlx5_fs_dr_ns
mlx5_pas
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Remove functions which are no longer used in the driver:
mlx5e_ipsec_is_tx_flow
mlx5_health_flush
get_cqe_enhanced_num_mini_cqes
get_cqe_l3_hdr_type
mlx5_health_flush
mlx5_fs_is_ipsec_flow
_mlx5_fs_is_outer_ipproto_flow
mlx5_fs_is_outer_tcp_flow
mlx5_fs_is_outer_udp_flow
mlx5_fs_is_vxlan_flow
mlx5_fs_is_outer_ipsec_flow
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
In hash mode, without setting tx affinity explicitly, the port select
flow table decides which port is used for the traffic.
If port_select_flow_table_bypass capability is supported and tx affinity
is set explicitly for QP/TIS, they will be added into the explicit affinity
table in FW to check which port is used for the traffic.
1. The overloaded explicit affinity table may affect performance.
To avoid this, do not set tx affinity explicitly by default.
2. The packets of the same flow need to be transmitted on the same port.
Because the packets of the same flow use different QPs in slow & fast
path, it shouldn't set tx affinity explicitly for these QPs.
Signed-off-by: Liu, Changcheng <jerrliu@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
port_select_flow_table_bypass - When set, device supports
bypass port select flow table.
active_port - Bitmask indicates the current active ports
in PORT_SELECT_FT LAG.
MLX5_SET_HCA_CAP_OP_MODE_PORT_SELECTION - op_mod to operate
PORT_SELECTION_Capabilities.
Signed-off-by: Liu, Changcheng <jerrliu@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Add support for setting NPPS. NPPS is currently available in
REAL_TIME_CLOCK mode only. In addition allow the user to set the pulse
duration.
When NPPS pulse duration is not set explicitly by the user, driver set
it to 50% of the NPPS period.
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Eran Ben Elisha <eranbe@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Add management capability bits indicating firmware may support N pulses
per second. Add corresponding fields in MTPPS register.
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Eran Ben Elisha <eranbe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
MACsec EPN splits the packet number (PN) into two 32-bits fields,
epn_lsb (32 least significant bits (LSBs) of PN) and epn_msb (32
most significant bits (MSBs) of PN).
Epn_msb bits are managed by SW and for that HW is required to send
an object change event of type EPN event notifying the SW to update
the epn_msb in addition, once epn_msb is updated SW update HW with
the new epn_msb value for HW to perform replay protection.
To prevent HW from stopping while handling the event, SW manages
another bit for HW called epn_overlap, HW uses the latter to get
an indication regarding how to read the epn_msb value correctly
while still receiving packets.
Add epn event handling that updates the epn_overlap and epn_msb for
every 2^31 packets according to the following logic:
if epn_lsb crosses 2^31 (half sequence number wraparound) upon HW
relevant event, SW updates the esn_overlap value to OLD (value = 1).
When the epn_lsb crosses 2^32 (full sequence number wraparound)
upon HW relevant event, SW updates the esn_overlap to NEW
(value = 0) and increment the esn_msb.
When using MACsec EPN a salt and short secure channel id (ssci)
needs to be provided by the user, when offloading EPN need to pass
this salt and ssci to the HW to be used in the initial vector (IV)
calculations.
Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Emeel Hakim <ehakim@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add ifc bits related to advanced steering operations (ASO) and general
object modify for macsec to use as part of offloading EPN and replay
protection features.
Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Emeel Hakim <ehakim@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fix ifc fields name to be consistent with the device spec document.
Fixes: 8385c51ff5 ("net/mlx5: Introduce MACsec Connect-X offload hardware bits and structures")
Signed-off-by: Emeel Hakim <ehakim@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Many bug fixes in several drivers:
- Fix misuse of the DMA API in rtrs
- Several irdma issues: hung task due to SQ flushing, incorrect capability
reporting to userspace, improper error handling for MW corners, touching
an uninitialized SGL for during invalidation.
- hns was using the wrong page size limits for the HW, an incorrect
calculation of wqe_shift causing WQE corruption, and mis computed
a timer id.
- Fix a crash in SRP triggered by blktests
- Fix compiler errors by calling virt_to_page() with the proper type in
siw
- Userspace triggerable deadlock in ODP
- mlx5 could use the wrong profile due to some driver loading races,
counters were not working in some device configurations, and a crash on
error unwind.
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCYxtj4QAKCRCFwuHvBreF
YQNdAQDOAoXv3PCZikmyu4zmjzVdeUUXEig5RU3MgFdCimo99gEA8t+2/pHmnSTB
vn7cxuKMpJydAmLVFJPZxaOEuaBdegQ=
=/eYF
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma fixes from Jason Gunthorpe:
"Many bug fixes in several drivers:
- Fix misuse of the DMA API in rtrs
- Several irdma issues: hung task due to SQ flushing, incorrect
capability reporting to userspace, improper error handling for MW
corners, touching an uninitialized SGL for during invalidation.
- hns was using the wrong page size limits for the HW, an incorrect
calculation of wqe_shift causing WQE corruption, and mis computed a
timer id.
- Fix a crash in SRP triggered by blktests
- Fix compiler errors by calling virt_to_page() with the proper type
in siw
- Userspace triggerable deadlock in ODP
- mlx5 could use the wrong profile due to some driver loading races,
counters were not working in some device configurations, and a
crash on error unwind"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/irdma: Report RNR NAK generation in device caps
RDMA/irdma: Use s/g array in post send only when its valid
RDMA/irdma: Return correct WC error for bind operation failure
RDMA/irdma: Return error on MR deregister CQP failure
RDMA/irdma: Report the correct max cqes from query device
MAINTAINERS: Update maintainers of HiSilicon RoCE
RDMA/mlx5: Fix UMR cleanup on error flow of driver init
RDMA/mlx5: Set local port to one when accessing counters
RDMA/mlx5: Rely on RoCE fw cap instead of devlink when setting profile
IB/core: Fix a nested dead lock as part of ODP flow
RDMA/siw: Pass a pointer to virt_to_page()
RDMA/srp: Set scmnd->result only when scmnd is not NULL
RDMA/hns: Remove the num_qpc_timer variable
RDMA/hns: Fix wrong fixed value of qp->rq.wqe_shift
RDMA/hns: Fix supported page size
RDMA/cma: Fix arguments order in net device validation
RDMA/irdma: Fix drain SQ hang with no completion
RDMA/rtrs-srv: Pass the correct number of entries for dma mapped SGL
RDMA/rtrs-clt: Use the right sg_cnt after ib_dma_map_sg
Add new namespace for MACsec RX flows.
Encrypted MACsec packets should be first decrypted and stripped
from MACsec header and then continues with the kernel's steering
pipeline.
Signed-off-by: Lior Nahmanson <liorna@nvidia.com>
Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tx flow steering consists of two flow tables (FTs).
The first FT (crypto table) has two fixed rules:
One default miss rule so non MACsec offloaded packets bypass the MACSec
tables, another rule to make sure that MACsec key exchange (MKE) traffic
passes unencrypted as expected (matched of ethertype).
On each new MACsec offload flow, a new MACsec rule is added.
This rule is matched on metadata_reg_a (which contains the id of the
flow) and invokes the MACsec offload action on match.
The second FT (check table) has two fixed rules:
One rule for verifying that the previous offload actions were
finished successfully and packet need to be transmitted.
Another default rule for dropping packets that were failed in the
offload actions.
The MACsec FTs should be created on demand when the first MACsec rule is
added and destroyed when the last MACsec rule is deleted.
Signed-off-by: Lior Nahmanson <liorna@nvidia.com>
Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Changed EGRESS_KERNEL namespace to EGRESS_IPSEC and add new
namespace for MACsec TX.
This namespace should be the last namespace for transmitted packets.
Signed-off-by: Lior Nahmanson <liorna@nvidia.com>
Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add MACsec offload related IFC structs, layouts and enumerations.
Signed-off-by: Lior Nahmanson <liorna@nvidia.com>
Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to support MACsec offload (and maybe some other crypto features
in the future), generalize flow action parameters / defines to be used by
crypto offlaods other than IPsec.
The following changes made:
ipsec_obj_id field at flow action context was changed to crypto_obj_id,
intreduced a new crypto_type field where IPsec is the default zero type
for backward compatibility.
Action ipsec_decrypt was changed to crypto_decrypt.
Action ipsec_encrypt was changed to crypto_encrypt.
IPsec offload code was updated accordingly for backward compatibility.
Signed-off-by: Lior Nahmanson <liorna@nvidia.com>
Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
esp_id is no longer in used
Signed-off-by: Lior Nahmanson <liorna@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Query ADV_VIRTUALIZATION capabilities which provide information for
advanced virtualization related features.
Current capabilities refer to the page tracker object which is used for
tracking the pages that are dirtied by the device.
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Link: https://lore.kernel.org/r/20220905105852.26398-3-yishaih@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Introduce ifc related stuff to enable using page tracker.
A page tracker is a dirty page tracking object used by the device to
report the tracking log.
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Link: https://lore.kernel.org/r/20220905105852.26398-2-yishaih@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
When the RDMA auxiliary driver probes, it sets its profile based on
devlink driverinit value. The latter might not be in sync with FW yet
(In case devlink reload is not performed), thus causing a mismatch
between RDMA driver and FW. This results in the following FW syndrome
when the RDMA driver tries to adjust RoCE state, which fails the probe:
"0xC1F678 | modify_nic_vport_context: roce_en set on a vport that
doesn't support roce"
To prevent this, select the PF profile based on FW RoCE capability
instead of relying on devlink driverinit value.
To provide backward compatibility of the RoCE disable feature, on older
FW's where roce_rw is not set (FW RoCE capability is read-only), keep
the current behavior e.g., rely on devlink driverinit value.
Fixes: fbfa97b4d7 ("net/mlx5: Disable roce at HCA level")
Reviewed-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Michael Guralnik <michaelgur@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
Link: https://lore.kernel.org/r/cb34ce9a1df4a24c135cb804db87f7d2418bd6cc.1661763459.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>