Commit Graph

549 Commits

Author SHA1 Message Date
Gal Pressman
1b9f86c6d5 net/mlx5: Fix MTMP register capability offset in MCAM register
The MTMP register (0x900a) capability offset is off-by-one, move it to
the right place.

Fixes: 1f507e80c7 ("net/mlx5: Expose NIC temperature via hardware monitoring kernel API")
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-24 13:27:07 +01:00
Rahul Rameshbabu
445a25f6e1 net/mlx5e: Support updating coalescing configuration without resetting channels
When CQE mode or DIM state is changed, gracefully reconfigure channels to
handle new configuration. Previously, would create new channels that would
reflect the changes rather than update the original channels.

Co-developed-by: Nabil S. Alramli <dev@nalramli.com>
Signed-off-by: Nabil S. Alramli <dev@nalramli.com>
Co-developed-by: Joe Damato <jdamato@fastly.com>
Signed-off-by: Joe Damato <jdamato@fastly.com>
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20240419080445.417574-5-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-22 14:22:16 -07:00
Rahul Rameshbabu
eca1e8a628 net/mlx5e: Use DIM constants for CQ period mode parameter
Use core DIM CQ period mode enum values for the CQ parameter for the period
mode. Translate the value to the specific mlx5 device constant for the
selected period mode when creating a CQ. Avoid needing to translate mlx5
device constants to DIM constants for core DIM functionality.

Co-developed-by: Nabil S. Alramli <dev@nalramli.com>
Signed-off-by: Nabil S. Alramli <dev@nalramli.com>
Co-developed-by: Joe Damato <jdamato@fastly.com>
Signed-off-by: Joe Damato <jdamato@fastly.com>
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20240419080445.417574-3-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-22 14:22:16 -07:00
Cosmin Ratiu
4aafb8ab2a net/mlx5e: Support FEC settings for 100G/lane modes
This consists of:
1. Expose the 100G/lane capability bit in the PCAM reg.
2. Expose the per link mode FEC capability masks in the PPLM reg.
3. Set the overrides according to ethtool parameters.
FEC for new modes is set if and only if the PCAM 100G/lane capability is
advertised and the capability mask for a given link mode reports that it
can accept the requested FEC mode.

Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20240404173357.123307-3-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-05 21:54:40 -07:00
Jianbo Liu
c788d79cfa net/mlx5: Skip pages EQ creation for non-page supplier function
Page events are not issued by device on the function if
page_request_disable is set, so no need to create pages EQ.

Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20240402133043.56322-11-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-03 19:47:59 -07:00
Jianbo Liu
137f3d50ad net/mlx5: Support matching on l4_type for ttc_table
Replace matching on TCP and UDP protocols with new l4_type field which
is parsed by steering for ttc_table. It is enabled by the
outer_l4_type or inner_l4_type bits in nic_rx or port_sel flow table
capabilities and used only if pcc_ifa2 bit in HCA capabilities is set.

Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20240402133043.56322-10-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-03 19:47:59 -07:00
Linus Torvalds
4138f02288 VFIO updates for v6.9-rc1
- Add warning in unlikely case that device is not captured with
    driver_override. (Kunwu Chan)
 
  - Error handling improvements in mlx5-vfio-pci to detect firmware
    tracking object error states, logging of firmware error syndrom,
    and releasing of firmware resources in aborted migration sequence.
    (Yishai Hadas)
 
  - Correct an un-alphabetized VFIO MAINTAINERS entry.
    (Alex Williamson)
 
  - Make the mdev_bus_type const and also make the class struct const
    for a couple of the vfio-mdev sample drivers. (Ricardo B. Marliere)
 
  - Addition of a new vfio-pci variant driver for the GPU of NVIDIA's
    Grace-Hopper superchip.  During initialization of the chip-to-chip
    interconnect in this hardware module, the PCI BARs of the device
    become unused in favor of a faster, coherent mechanism for exposing
    device memory.  This driver primarily changes the VFIO
    representation of the device to masquerade this coherent aperture
    to replace the physical PCI BARs for userspace drivers.  This also
    incorporates use of a new vma flag allowing KVM to use write
    combining attributes for uncached device memory.  (Ankit Agrawal)
 
  - Reset fixes and cleanups for the pds-vfio-pci driver.  Save and
    restore files were previously leaked if the device didn't pass
    through an error state, this is resolved and later re-fixed to
    prevent access to the now freed files.  Reset handling is also
    refactored to remove the complicated deferred reset mechanism.
    (Brett Creeley)
 
  - Remove some references to pl330 in the vfio-platform amba
    driver. (Geert Uytterhoeven)
 
  - Remove twice redundant and ugly code to unpin incidental pins
    of the zero-page. (Alex Williamson)
 
  - Deferred reset logic is also removed from the hisi-acc-vfio-pci
    driver as a simplification. (Shameer Kolothum)
 
  - Enforce that mlx5-vfio-pci devices must support PRE_COPY and
    remove resulting unnecessary code.  There is no device firmware
    that has been available publicly without this support.
    (Yishai Hadas)
 
  - Switch over to using the .remove_new callback for vfio-platform
    in support of the broader transition for a void remove function.
    (Uwe Kleine-König)
 
  - Resolve multiple issues in interrupt code for VFIO bus drivers
    that allow calling eventfd_signal() on a NULL context.  This
    also remove a potential race in INTx setup on certain hardware
    for vfio-pci, races with various mechanisms to mask INTx, and
    leaked virqfds in vfio-platform. (Alex Williamson)
 -----BEGIN PGP SIGNATURE-----
 
 iQJPBAABCAA5FiEEQvbATlQL0amee4qQI5ubbjuwiyIFAmXzesgbHGFsZXgud2ls
 bGlhbXNvbkByZWRoYXQuY29tAAoJECObm247sIsiA4oQAKU3Z6h8oQXaMsc2nKip
 NnOtrrKw2jIohEGw01uRUf8q9uhLeLE0bidrDETion812/Lyv7M/aDlLIK4nvDvt
 AAFwL1iAKbVYTomIIWQckCwki5gBp3I+1vAQekJn4qXe7B9GohNz9cl9fNLVpcNd
 X3rWUVB5LVOvSzI+o6Ueqau+XFOMxpndr9VX4zbknIa0Th49EoYGYWPAYjzN4YyV
 GVSIWJHbtpAAHsL46jc7HmCeAtsVVkW/qHPInerSPCxabiQ+i0LSnlM16j6xXjK1
 9SvJi7+FCRGTvF3Ql2sWTK65glEbQ0xBzwSIs0L3AuKHsRISGbCHP1wymriJr5K7
 +asIM18HNLfmH/BAksbrd2M5gys8/xO9+7xIzTaYlZyTNM99Zu7d/u0B3AjYemG/
 Me3N86E2cl9Xc3NV6UEX8L1/pPpg6jKiOcZ6V9pGycUMyOTJS36FJT8Czr/jemtA
 /y6HOBpjE1gMACkk63P8GQaLMnQs7glSAEg2e++MvUVIW5END7usyLrSDr87Ysoa
 O0deH5FNSW6QAbGV+PRAmaacPvGF5B8BppYm/lnxHmPf0+saVEYOSbdFoSpK4l5H
 8f79fCtzpY8in5EDIOGJvu4u5ZWWoS/5dFLr0Teyj4vyK//PLmBv0gWoGRJV6aqj
 BrJ1f9NQ0+lVTIj/jTQ5uzlC
 =m5jA
 -----END PGP SIGNATURE-----

Merge tag 'vfio-v6.9-rc1' of https://github.com/awilliam/linux-vfio

Pull VFIO updates from Alex Williamson:

 - Add warning in unlikely case that device is not captured with
   driver_override (Kunwu Chan)

 - Error handling improvements in mlx5-vfio-pci to detect firmware
   tracking object error states, logging of firmware error syndrom, and
   releasing of firmware resources in aborted migration sequence (Yishai
   Hadas)

 - Correct an un-alphabetized VFIO MAINTAINERS entry (Alex Williamson)

 - Make the mdev_bus_type const and also make the class struct const for
   a couple of the vfio-mdev sample drivers (Ricardo B. Marliere)

 - Addition of a new vfio-pci variant driver for the GPU of NVIDIA's
   Grace-Hopper superchip. During initialization of the chip-to-chip
   interconnect in this hardware module, the PCI BARs of the device
   become unused in favor of a faster, coherent mechanism for exposing
   device memory. This driver primarily changes the VFIO representation
   of the device to masquerade this coherent aperture to replace the
   physical PCI BARs for userspace drivers. This also incorporates use
   of a new vma flag allowing KVM to use write combining attributes for
   uncached device memory (Ankit Agrawal)

 - Reset fixes and cleanups for the pds-vfio-pci driver. Save and
   restore files were previously leaked if the device didn't pass
   through an error state, this is resolved and later re-fixed to
   prevent access to the now freed files. Reset handling is also
   refactored to remove the complicated deferred reset mechanism (Brett
   Creeley)

 - Remove some references to pl330 in the vfio-platform amba driver
   (Geert Uytterhoeven)

 - Remove twice redundant and ugly code to unpin incidental pins of the
   zero-page (Alex Williamson)

 - Deferred reset logic is also removed from the hisi-acc-vfio-pci
   driver as a simplification (Shameer Kolothum)

 - Enforce that mlx5-vfio-pci devices must support PRE_COPY and remove
   resulting unnecessary code. There is no device firmware that has been
   available publicly without this support (Yishai Hadas)

 - Switch over to using the .remove_new callback for vfio-platform in
   support of the broader transition for a void remove function (Uwe
   Kleine-König)

 - Resolve multiple issues in interrupt code for VFIO bus drivers that
   allow calling eventfd_signal() on a NULL context. This also remove a
   potential race in INTx setup on certain hardware for vfio-pci, races
   with various mechanisms to mask INTx, and leaked virqfds in
   vfio-platform (Alex Williamson)

* tag 'vfio-v6.9-rc1' of https://github.com/awilliam/linux-vfio: (29 commits)
  vfio/fsl-mc: Block calling interrupt handler without trigger
  vfio/platform: Create persistent IRQ handlers
  vfio/platform: Disable virqfds on cleanup
  vfio/pci: Create persistent INTx handler
  vfio: Introduce interface to flush virqfd inject workqueue
  vfio/pci: Lock external INTx masking ops
  vfio/pci: Disable auto-enable of exclusive INTx IRQ
  vfio/pds: Refactor/simplify reset logic
  vfio/pds: Make sure migration file isn't accessed after reset
  vfio/platform: Convert to platform remove callback returning void
  vfio/mlx5: Enforce PRE_COPY support
  vfio/mbochs: make mbochs_class constant
  vfio/mdpy: make mdpy_class constant
  hisi_acc_vfio_pci: Remove the deferred_reset logic
  Revert "vfio/type1: Unpin zero pages"
  vfio/nvgrace-gpu: Convey kvm to map device memory region as noncached
  vfio: amba: Rename pl330_ids[] to vfio_amba_ids[]
  vfio/pds: Always clear the save/restore FDs on reset
  vfio/nvgrace-gpu: Add vfio pci variant module for grace hopper
  vfio/pci: rename and export range_intersect_range
  ...
2024-03-15 13:21:13 -07:00
Jakub Kicinski
d7e14e5344 Support Multi-PF netdev (Socket Direct)
This series adds support for combining multiple devices (PFs) of the
 same port under one netdev instance. Passing traffic through different
 devices belonging to different NUMA sockets saves cross-numa traffic and
 allows apps running on the same netdev from different numas to still
 feel a sense of proximity to the device and achieve improved
 performance.
 
 We achieve this by grouping PFs together, and creating the netdev only
 once all group members are probed. Symmetrically, we destroy the netdev
 once any of the PFs is removed.
 
 The channels are distributed between all devices, a proper configuration
 would utilize the correct close numa when working on a certain app/cpu.
 
 We pick one device to be a primary (leader), and it fills a special
 role.  The other devices (secondaries) are disconnected from the network
 in the chip level (set to silent mode). All RX/TX traffic is steered
 through the primary to/from the secondaries.
 
 Currently, we limit the support to PFs only, and up to two devices
 (sockets).
 
 V6:
 - Address documentation comments from Jakub.
 
 V5:
  - Address documentation comments from Przemek Kitszel.
 
 V4:
  - Improve documentation for better user observability and understanding
    of the feature, in terms of queues and their expected NUMA/CPU/IRQ
    affinity.
 
 V3:
  - Fix documentation per Jakubs feedback.
  - Fix typos
  - Link new documentation in the networking index.rst
 
 V2:
  - Add documentation in a new patch.
  - Add debugfs in a new patch.
  - Add mlx5_ifc bit for MPIR cap check and use it before query.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAmXpfYgACgkQSD+KveBX
 +j5jIAf/VGIX/UQttq74MzK9pWgJNKtf7l8aSYtZuKXx68pmpr+25DfsxbKEeVfy
 KzjvGFx5peoKisWILyaljQXSn7snmSqOsQf/IwDzmsmF/2ZTDyf6NPC6gND0bIjJ
 Uu6cJ2T6Sa9ktg+ANz/gLDvGBBfPqSYTYIXrJnNQKsnW6nV8mDvy4WVf6etvCxOi
 rMjfcqwNijf3GPTJd/qkaWhwneDG2AFWd5HzdORpNh6iuv8Cbc9aNhWgAPh18o7v
 VWuAiFraTgaz6jj2H/NfziAk4ZrtVsCqhaFjJe3eLO+MCk/bZ/SizsAcR61JLkjL
 pFqh5wqxA6v+5YJm4zVatZqPLIt4gQ==
 =GZBa
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-socket-direct-v3' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
Support Multi-PF netdev (Socket Direct)

This series adds support for combining multiple devices (PFs) of the
same port under one netdev instance. Passing traffic through different
devices belonging to different NUMA sockets saves cross-numa traffic and
allows apps running on the same netdev from different numas to still
feel a sense of proximity to the device and achieve improved
performance.

We achieve this by grouping PFs together, and creating the netdev only
once all group members are probed. Symmetrically, we destroy the netdev
once any of the PFs is removed.

The channels are distributed between all devices, a proper configuration
would utilize the correct close numa when working on a certain app/cpu.

We pick one device to be a primary (leader), and it fills a special
role.  The other devices (secondaries) are disconnected from the network
in the chip level (set to silent mode). All RX/TX traffic is steered
through the primary to/from the secondaries.

Currently, we limit the support to PFs only, and up to two devices
(sockets).

* tag 'mlx5-socket-direct-v3' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
  Documentation: networking: Add description for multi-pf netdev
  net/mlx5: Enable SD feature
  net/mlx5e: Block TLS device offload on combined SD netdev
  net/mlx5e: Support per-mdev queue counter
  net/mlx5e: Support cross-vhca RSS
  net/mlx5e: Let channels be SD-aware
  net/mlx5e: Create EN core HW resources for all secondary devices
  net/mlx5e: Create single netdev per SD group
  net/mlx5: SD, Add debugfs
  net/mlx5: SD, Add informative prints in kernel log
  net/mlx5: SD, Implement steering for primary and secondaries
  net/mlx5: SD, Implement devcom communication and primary election
  net/mlx5: SD, Implement basic query and instantiation
  net/mlx5: SD, Introduce SD lib
  net/mlx5: Add MPIR bit in mcam_access_reg
====================

Link: https://lore.kernel.org/r/20240307084229.500776-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-08 20:45:17 -08:00
Jakub Kicinski
e3afe5dd3a Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

No conflicts.

Adjacent changes:

net/core/page_pool_user.c
  0b11b1c5c3 ("netdev: let netlink core handle -EMSGSIZE errors")
  429679dcf7 ("page_pool: fix netlink dump stop/resume")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07 10:29:36 -08:00
Tariq Toukan
a0873a5d54 net/mlx5: Add MPIR bit in mcam_access_reg
Add a cap bit in mcam_access_reg to check for MPIR support.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-03-07 00:40:38 -08:00
Moshe Shemesh
5e6107b499 net/mlx5: Check capability for fw_reset
Functions which can't access MFRL (Management Firmware Reset Level)
register, have no use of fw_reset structures or events. Remove fw_reset
structures allocation and registration for fw reset events notifications
for these functions.

Having the devlink param enable_remote_dev_reset on functions that don't
have this capability is misleading as these functions are not allowed to
influence the reset flow. Hence, this patch removes this parameter for
such functions.

In addition, return not supported on devlink reload action fw_activate
for these functions.

Fixes: 38b9f903f2 ("net/mlx5: Handle sync reset request event")
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Aya Levin <ayal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-03-01 23:02:26 -08:00
Jakub Kicinski
fecc51559a Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

Conflicts:

net/ipv4/udp.c
  f796feabb9 ("udp: add local "peek offset enabled" flag")
  56667da739 ("net: implement lockless setsockopt(SO_PEEK_OFF)")

Adjacent changes:

net/unix/garbage.c
  aa82ac51d6 ("af_unix: Drop oob_skb ref before purging queue in GC.")
  11498715f2 ("af_unix: Remove io_uring code for GC.")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-22 15:29:26 -08:00
Yishai Hadas
1cbcb564f5 net/mlx5: Add the IFC related bits for query tracker
Add the IFC related bits for query tracker.

Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Leon Romanovsky <leon@kernel.org>
Link: https://lore.kernel.org/r/20240205124828.232701-2-yishaih@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2024-02-22 12:17:32 -07:00
Linus Torvalds
9fc1ccccfd RDMA v6.8 first rc
irdma and bnxt_re fixes:
 
 - Missing error unwind in hf1
 
 - For bnxt - fix fenching behavior to work on new chips, fail unsupported
   SRQ resize back to userspace, propogate SRQ FW failure back to
   userspace.
 
 - Correctly fail unsupported SRQ resize back to userspace in bnxt
 
 - Adjust a memcpy in mlx5 to not overflow a struct field.
 
 - Prevent userspace from triggering mlx5 fw syndrome logging from sysfs
 
 - Use the correct access mode for MLX5_IB_METHOD_DEVX_OBJ_MODIFY to avoid
   a userspace failure on modify
 
 - For irdma - Don't UAF a concurrent tasklet during destroy, prevent
   userspace from issuing invalid QP attrs, fix a possible CQ overflow,
   capture a missing HW async error event
 
 - sendmsg() triggerable memory access crash in hfi1
 
 - Fix the srpt_service_guid parameter to not crash due to missing function
   pointer
 
 - Don't leak objects in error unwind in qedr
 
 - Don't weirdly cast function pointers in srpt
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCZdUweQAKCRCFwuHvBreF
 YXH/AQCsg2hU9k0CiyDql2pW+fu+91bG4FFHed1zimdpzIrXcAEA1qcFIK+EWlPZ
 HDxyvbp3dupmnH2QHjEQChzXt3EhdQk=
 =+bP+
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma fixes from Jason Gunthorpe:
 "Mostly irdma and bnxt_re fixes:

   - Missing error unwind in hf1

   - For bnxt - fix fenching behavior to work on new chips, fail
     unsupported SRQ resize back to userspace, propogate SRQ FW failure
     back to userspace.

   - Correctly fail unsupported SRQ resize back to userspace in bnxt

   - Adjust a memcpy in mlx5 to not overflow a struct field.

   - Prevent userspace from triggering mlx5 fw syndrome logging from
     sysfs

   - Use the correct access mode for MLX5_IB_METHOD_DEVX_OBJ_MODIFY to
     avoid a userspace failure on modify

   - For irdma - Don't UAF a concurrent tasklet during destroy, prevent
     userspace from issuing invalid QP attrs, fix a possible CQ
     overflow, capture a missing HW async error event

   - sendmsg() triggerable memory access crash in hfi1

   - Fix the srpt_service_guid parameter to not crash due to missing
     function pointer

   - Don't leak objects in error unwind in qedr

   - Don't weirdly cast function pointers in srpt"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  RDMA/srpt: fix function pointer cast warnings
  RDMA/qedr: Fix qedr_create_user_qp error flow
  RDMA/srpt: Support specifying the srpt_service_guid parameter
  IB/hfi1: Fix sdma.h tx->num_descs off-by-one error
  RDMA/irdma: Add AE for too many RNRS
  RDMA/irdma: Set the CQ read threshold for GEN 1
  RDMA/irdma: Validate max_send_wr and max_recv_wr
  RDMA/irdma: Fix KASAN issue with tasklet
  RDMA/mlx5: Relax DEVX access upon modify commands
  IB/mlx5: Don't expose debugfs entries for RRoCE general parameters if not supported
  RDMA/mlx5: Fix fortify source warning while accessing Eth segment
  RDMA/bnxt_re: Add a missing check in bnxt_qplib_query_srq
  RDMA/bnxt_re: Return error for SRQ resize
  RDMA/bnxt_re: Fix unconditional fence for newer adapters
  RDMA/bnxt_re: Remove a redundant check inside bnxt_re_vf_res_config
  RDMA/bnxt_re: Avoid creating fence MR for newer adapters
  IB/hfi1: Fix a memleak in init_credit_return
2024-02-20 17:00:26 -08:00
Gal Pressman
91a72ada66 net/mlx5: Remove initial segmentation duplicate definitions
Device definitions belong in mlx5_ifc, remove the duplicates in
mlx5_core.h.

Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-02-05 16:45:52 -08:00
Jiri Pirko
2c54a4d712 net/mlx5: DPLL, Implement lock status error value
Fill-up the lock status error value properly.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Acked-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-01 15:39:44 +01:00
Mark Zhang
43fdbd1402 IB/mlx5: Don't expose debugfs entries for RRoCE general parameters if not supported
debugfs entries for RRoCE general CC parameters must be exposed only when
they are supported, otherwise when accessing them there may be a syndrome
error in kernel log, for example:

$ cat /sys/kernel/debug/mlx5/0000:08:00.1/cc_params/rtt_resp_dscp
cat: '/sys/kernel/debug/mlx5/0000:08:00.1/cc_params/rtt_resp_dscp': Invalid argument
$ dmesg
 mlx5_core 0000:08:00.1: mlx5_cmd_out_err:805:(pid 1253): QUERY_CONG_PARAMS(0x824) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x325a82), err(-22)

Fixes: 66fb1d5df6 ("IB/mlx5: Extend debug control for CC parameters")
Reviewed-by: Edward Srouji <edwards@nvidia.com>
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Link: https://lore.kernel.org/r/e7ade70bad52b7468bdb1de4d41d5fad70c8b71c.1706433934.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-01-31 11:15:29 +02:00
Moshe Shemesh
ec7cc38ef9 net/mlx5: Bridge, fix multicast packets sent to uplink
To enable multicast packets which are offloaded in bridge multicast
offload mode to be sent also to uplink, FTE bit uplink_hairpin_en should
be set. Add this bit to FTE for the bridge multicast offload rules.

Fixes: 18c2916cee ("net/mlx5: Bridge, snoop igmp/mld packets")
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-01-24 00:15:35 -08:00
Tariq Toukan
cfbc3608a8 net/mlx5: Fix query of sd_group field
The sd_group field moved in the HW spec from the MPIR register
to the vport context.
Align the query accordingly.

Fixes: f5e9563299 ("net/mlx5: Expose Management PCIe Index Register (MPIR)")
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-01-24 00:15:33 -08:00
Linus Torvalds
0b7359ccdd virtio: features, fixes
vdpa/mlx5: support for resumable vqs
 virtio_scsi: mq_poll support
 3virtio_pmem: support SHMEM_REGION
 virtio_balloon: stay awake while adjusting balloon
 virtio: support for no-reset virtio PCI PM
 
 Fixes, cleanups.
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmWmgP8PHG1zdEByZWRo
 YXQuY29tAAoJECgfDbjSjVRpcfgH/0RD2S+NFY0ZEJz8BuI6GjykzYnyRW9iyxcw
 epTLjPUcoEBttlw8TA+3PiPoNIJGfuU8Q4iKXJ8Jzql081tP9G1UxTIbj0v3Hx+q
 0L2DUXfdAMYMLo5WQVl/PADV/10xLgExEh9jMqpU3IJIxVaLE/knD9ghRCDvDbs/
 fOo3sSUGaNsSHYZs5bH73Q7cRKKmTLO+MzvHBbavFfz2fQ1b3vwecmJuQtAtK0JC
 6JxH6Y38VfOl8jA6IHeEpGIHeF661HABkDDUh4UVEGOeyBl4E6ZcG4fjWSMinZ08
 U3TbQLYOq10i8ki2LJKgoZHRv1HkxbM1Ogn0bsIh1hish8dPORM=
 =RWjR
 -----END PGP SIGNATURE-----

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Pull virtio updates from Michael Tsirkin:

 - vdpa/mlx5: support for resumable vqs

 - virtio_scsi: mq_poll support

 - 3virtio_pmem: support SHMEM_REGION

 - virtio_balloon: stay awake while adjusting balloon

 - virtio: support for no-reset virtio PCI PM

 - Fixes, cleanups

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  vdpa/mlx5: Add mkey leak detection
  vdpa/mlx5: Introduce reference counting to mrs
  vdpa/mlx5: Use vq suspend/resume during .set_map
  vdpa/mlx5: Mark vq state for modification in hw vq
  vdpa/mlx5: Mark vq addrs for modification in hw vq
  vdpa/mlx5: Introduce per vq and device resume
  vdpa/mlx5: Allow modifying multiple vq fields in one modify command
  vdpa/mlx5: Expose resumable vq capability
  vdpa: Block vq property changes in DRIVER_OK
  vdpa: Track device suspended state
  scsi: virtio_scsi: Add mq_poll support
  virtio_pmem: support feature SHMEM_REGION
  virtio_balloon: stay awake while adjusting balloon
  vdpa: Remove usage of the deprecated ida_simple_xx() API
  virtio: Add support for no-reset virtio PCI PM
  virtio_net: fix missing dma unmap for resize
  vhost-vdpa: account iommu allocations
  vdpa: Fix an error handling path in eni_vdpa_probe()
2024-01-18 16:44:03 -08:00
Linus Torvalds
bf9ca811bb RDMA v6.8 merge window
Small cycle, with some typical driver updates
 
  - General code tidying in siw, hfi1, idrdma, usnic, hns rtrs and bnxt_re
 
  - Many small siw cleanups without an overeaching theme
 
  - Debugfs stats for hns
 
  - Fix a TX queue timeout in IPoIB and missed locking of the mcast list
 
  - Support more features of P7 devices in bnxt_re including a new work
    submission protocol
 
  - CQ interrupts for MANA
 
  - netlink stats for erdma
 
  - EFA multipath PCI support
 
  - Fix Incorrect MR invalidation in iser
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCZaCQDQAKCRCFwuHvBreF
 YemEAQCTGebv0k2hbocDOmKml5awt8j9aDJX3aO7Zpfi0AYUtwEAzk+kgN4yAo+B
 Vinvpu171zry+QvmGJsXv2mtZkXH6QY=
 =HT3p
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma updates from Jason Gunthorpe:
 "Small cycle, with some typical driver updates:

   - General code tidying in siw, hfi1, idrdma, usnic, hns rtrs and
     bnxt_re

   - Many small siw cleanups without an overeaching theme

   - Debugfs stats for hns

   - Fix a TX queue timeout in IPoIB and missed locking of the mcast
     list

   - Support more features of P7 devices in bnxt_re including a new work
     submission protocol

   - CQ interrupts for MANA

   - netlink stats for erdma

   - EFA multipath PCI support

   - Fix Incorrect MR invalidation in iser"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (66 commits)
  RDMA/bnxt_re: Fix error code in bnxt_re_create_cq()
  RDMA/efa: Add EFA query MR support
  IB/iser: Prevent invalidating wrong MR
  RDMA/erdma: Add hardware statistics support
  RDMA/erdma: Introduce dma pool for hardware responses of CMDQ requests
  IB/iser: iscsi_iser.h: fix kernel-doc warning and spellos
  RDMA/mana_ib: Add CQ interrupt support for RAW QP
  RDMA/mana_ib: query device capabilities
  RDMA/mana_ib: register RDMA device with GDMA
  RDMA/bnxt_re: Fix the sparse warnings
  RDMA/bnxt_re: Fix the offset for GenP7 adapters for user applications
  RDMA/bnxt_re: Share a page to expose per CQ info with userspace
  RDMA/bnxt_re: Add UAPI to share a page with user space
  IB/ipoib: Fix mcast list locking
  RDMA/mlx5: Expose register c0 for RDMA device
  net/mlx5: E-Switch, expose eswitch manager vport
  net/mlx5: Manage ICM type of SW encap
  RDMA/mlx5: Support handling of SW encap ICM area
  net/mlx5: Introduce indirect-sw-encap ICM properties
  RDMA/bnxt_re: Adds MSN table capability for Gen P7 adapters
  ...
2024-01-12 13:52:21 -08:00
Dragos Tatulea
ef067191f7 vdpa/mlx5: Expose resumable vq capability
Necessary for checking if resumable vqs are supported by the hardware.
Actual support will be added in a downstream patch.

Reviewed-by: Gal Pressman <gal@nvidia.com>
Acked-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Message-Id: <20231225151203.152687-2-dtatulea@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-01-10 13:01:37 -05:00
Jakub Kicinski
3fbf61207c Revert "mlx5 updates 2023-12-20"
Revert "net/mlx5: Implement management PF Ethernet profile"
This reverts commit 22c4640698.
Revert "net/mlx5: Enable SD feature"
This reverts commit c88c49ac9c.
Revert "net/mlx5e: Block TLS device offload on combined SD netdev"
This reverts commit 83a59ce005.
Revert "net/mlx5e: Support per-mdev queue counter"
This reverts commit d72baceb92.
Revert "net/mlx5e: Support cross-vhca RSS"
This reverts commit c73a3ab8fa.
Revert "net/mlx5e: Let channels be SD-aware"
This reverts commit e4f9686bde.
Revert "net/mlx5e: Create EN core HW resources for all secondary devices"
This reverts commit c4fb94aa82.
Revert "net/mlx5e: Create single netdev per SD group"
This reverts commit e2578b4f98.
Revert "net/mlx5: SD, Add informative prints in kernel log"
This reverts commit c82d360325.
Revert "net/mlx5: SD, Implement steering for primary and secondaries"
This reverts commit 605fcce33b.
Revert "net/mlx5: SD, Implement devcom communication and primary election"
This reverts commit a45af9a967.
Revert "net/mlx5: SD, Implement basic query and instantiation"
This reverts commit 63b9ce944c.
Revert "net/mlx5: SD, Introduce SD lib"
This reverts commit 4a04a31f49.
Revert "net/mlx5: Fix query of sd_group field"
This reverts commit e04984a373.
Revert "net/mlx5e: Use the correct lag ports number when creating TISes"
This reverts commit a7e7b40c4b.

There are some unanswered questions on the list, and we don't
have any docs. Given the lack of replies so far and the fact
that v6.8 merge window has started - let's revert this and
revisit for v6.9.

Link: https://lore.kernel.org/all/20231221005721.186607-1-saeed@kernel.org/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-07 17:16:11 -08:00
Armen Ratner
22c4640698 net/mlx5: Implement management PF Ethernet profile
Add management PF modules, which introduce support for the structures
needed to create the resources for the MGMT PF to work.
Also, add the necessary calls and functions to establish this
functionality.

Signed-off-by: Armen Ratner <armeng@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Daniel Jurgens <danielj@nvidia.com>
2023-12-20 16:54:27 -08:00
Tariq Toukan
e04984a373 net/mlx5: Fix query of sd_group field
The sd_group field moved in the HW spec from the MPIR register
to the vport context.
Align the query accordingly.

Fixes: f5e9563299 ("net/mlx5: Expose Management PCIe Index Register (MPIR)")
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-12-20 16:54:24 -08:00
David S. Miller
12da68e27b mlx5-updates-2023-12-13
Preparation for mlx5e socket direct feature.
 
 Socket direct will allow multiple PF devices attached to different
 NUMA nodes but sharing the same physical port.
 
 The following series is a small refactoring series in preparation
 to support socket direct in the following submission.
 
 Highlights:
  - Define required device registers and bits related to socket direct
  - Flow steering re-arrangements
  - Generalize TX objects (TISs) and store them in a common object, will
    be useful in the next series for per function object management.
  - Decouple raw CQ objects from their parent netdev priv
  - Prepare devcom for Socket Direct device group discovery.
 
 Please see the individual patches for more information.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAmV6YnQACgkQSD+KveBX
 +j48MggAi+iKb6supNWaDkSufLsJu7R56jCJK/mHEl/3586wsoiAx8e7BcAzB2FS
 J6xfEJOwC++slNlIubWgEAvH8vv8YHI0K8q4+IUaC19FY3uy9z4GdPAAvzhqUbQ7
 D1kWQZ4KIwDb5FbniwHw4V88ZBAGXl5Rhm7Wk/T90B4aG3hyGCeYOlsd0DnWNUae
 ea7OkMI1s7QZkJ2HL1KS3xy6WMfrTi8G0TnznP0iLX51FYHciIujp0lfeZZlAJgo
 +ovMUl2TTKb0yuKuWBapOz6w7le8VixWp7LbVYyd/wVZ1jBQuScgYHA6EeUKhdb8
 9rZ726sSNtMjfebcx85KgkYYNleEKA==
 =e11K
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-updates-2023-12-13' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2023-12-13

Preparation for mlx5e socket direct feature.

Socket direct will allow multiple PF devices attached to different
NUMA nodes but sharing the same physical port.

The following series is a small refactoring series in preparation
to support socket direct in the following submission.

Highlights:
 - Define required device registers and bits related to socket direct
 - Flow steering re-arrangements
 - Generalize TX objects (TISs) and store them in a common object, will
   be useful in the next series for per function object management.
 - Decouple raw CQ objects from their parent netdev priv
 - Prepare devcom for Socket Direct device group discovery.

Please see the individual patches for more information.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-15 10:00:02 +00:00
Jakub Kicinski
8f674972d6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

Conflicts:

drivers/net/ethernet/intel/iavf/iavf_ethtool.c
  3a0b5a2929 ("iavf: Introduce new state machines for flow director")
  95260816b4 ("iavf: use iavf_schedule_aq_request() helper")
https://lore.kernel.org/all/84e12519-04dc-bd80-bc34-8cf50d7898ce@intel.com/

drivers/net/ethernet/broadcom/bnxt/bnxt.c
  c13e268c07 ("bnxt_en: Fix HWTSTAMP_FILTER_ALL packet timestamp logic")
  c2f8063309 ("bnxt_en: Refactor RX VLAN acceleration logic.")
  a7445d6980 ("bnxt_en: Add support for new RX and TPA_START completion types for P7")
  1c7fd6ee2f ("bnxt_en: Rename some macros for the P5 chips")
https://lore.kernel.org/all/20231211110022.27926ad9@canb.auug.org.au/

drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.c
  bd6781c18c ("bnxt_en: Fix wrong return value check in bnxt_close_nic()")
  84793a4995 ("bnxt_en: Skip nic close/open when configuring tstamp filters")
https://lore.kernel.org/all/20231214113041.3a0c003c@canb.auug.org.au/

drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c
  3d7a3f2612 ("net/mlx5: Nack sync reset request when HotPlug is enabled")
  cecf44ea1a ("net/mlx5: Allow sync reset flow when BF MGT interface device is present")
https://lore.kernel.org/all/20231211110328.76c925af@canb.auug.org.au/

No adjacent changes.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-14 17:14:41 -08:00
Tariq Toukan
f5e9563299 net/mlx5: Expose Management PCIe Index Register (MPIR)
MPIR register allows to query the PCIe indexes
and Socket-Direct related parameters.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-12-13 18:03:30 -08:00
Tariq Toukan
13049408a4 net/mlx5: Add mlx5_ifc bits used for supporting single netdev Socket-Direct
Multiple device caps and features are required to support
single netdev Socket-Direct.
Add them here in preparation for the feature implementation.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-12-13 18:03:30 -08:00
Shun Hao
1ca51628e7 net/mlx5: Introduce indirect-sw-encap ICM properties
Add new fields for device memory capabilities, in order to support
creation of new ICM memory type of SW encap.

Signed-off-by: Shun Hao <shunh@nvidia.com>
Link: https://lore.kernel.org/r/107cca7dd6a932a1704abf6ebd1b801105546a8e.1701871118.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-12-12 09:03:46 +02:00
Leon Romanovsky
c2bf84f1d1 net/mlx5e: Tidy up IPsec NAT-T SA discovery
IPsec NAT-T packets are UDP encapsulated packets over ESP normal ones.
In case they arrive to RX, the SPI and ESP are located in inner header,
while the check was performed on outer header instead.

That wrong check caused to the situation where received rekeying request
was missed and caused to rekey timeout, which "compensated" this failure
by completing rekeying.

Fixes: d659549349 ("net/mlx5e: Support IPsec NAT-T functionality")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2023-12-04 22:11:52 -08:00
Leon Romanovsky
a5e400a985 net/mlx5e: Honor user choice of IPsec replay window size
Users can configure IPsec replay window size, but mlx5 driver didn't
honor their choice and set always 32bits. Fix assignment logic to
configure right size from the beginning.

Fixes: 7db21ef456 ("net/mlx5e: Set IPsec replay sequence numbers")
Reviewed-by: Patrisious Haddad <phaddad@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2023-12-04 22:11:51 -08:00
Rahul Rameshbabu
4aea6a6d61 net/mlx5: Query maximum frequency adjustment of the PTP hardware clock
Some mlx5 devices do not support the default advertised maximum frequency
adjustment value for the PTP hardware clock that is set by the driver.
These devices need to be queried when initializing the clock functionality
in order to get the maximum supported frequency adjustment value. This
value can be greater than the minimum supported frequency adjustment across
mlx5 devices (50 million ppb).

Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-11-15 11:34:31 -08:00
Linus Torvalds
77fa2fbe87 vhost,virtio,vdpa: features, fixes, cleanups
vdpa/mlx5:
 	VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK
 	new maintainer
 vdpa:
 	support for vq descriptor mappings
 	decouple reset of iotlb mapping from device reset
 
 fixes, cleanups all over the place
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmVCUMYPHG1zdEByZWRo
 YXQuY29tAAoJECgfDbjSjVRp4L0H/RKcnNXPRqzhhBI1XVQ11Z8CO8WjovcmJalu
 ADHNEGmvuWnY79fp9eLiZ4iVaTx1qbzqIB5Q500DJ65jh71W7UQ8ww6CGjNUoRGs
 Zoe4G09WoOf4bvDZZzVV7ml/AzMdsHWSZK8pxY3QI9CsC9Zfp9hg20QYxPylCqYx
 SIJx7w2MkoojfmtOHRx1WUxaQz99yfU4Z0C5PxtRE1HGN6/a1aY0P0CAl5jq8uCK
 U5sCRsfCmP7VKlspeEddMiPA35ADbCiysSobCbwGVQEs5cHpMUX7KWa+oV0tF/PY
 9uyJb2rJy6zG3tXmL4XNib665ZR86HX6qiWRfm2nBQQStuHaJyg=
 =mXgo
 -----END PGP SIGNATURE-----

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Pull virtio updates from Michael Tsirkin:
 "vhost,virtio,vdpa: features, fixes, cleanups.

  vdpa/mlx5:
   - VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK
   - new maintainer

  vdpa:
   - support for vq descriptor mappings
   - decouple reset of iotlb mapping from device reset

  and fixes, cleanups all over the place"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (34 commits)
  vdpa_sim: implement .reset_map support
  vdpa/mlx5: implement .reset_map driver op
  vhost-vdpa: clean iotlb map during reset for older userspace
  vdpa: introduce .compat_reset operation callback
  vhost-vdpa: introduce IOTLB_PERSIST backend feature bit
  vhost-vdpa: reset vendor specific mapping to initial state in .release
  vdpa: introduce .reset_map operation callback
  virtio_pci: add check for common cfg size
  virtio-blk: fix implicit overflow on virtio_max_dma_size
  virtio_pci: add build offset check for the new common cfg items
  virtio: add definition of VIRTIO_F_NOTIF_CONFIG_DATA feature bit
  vduse: make vduse_class constant
  vhost-scsi: Spelling s/preceeding/preceding/g
  virtio: kdoc for struct virtio_pci_modern_device
  vdpa: Update sysfs ABI documentation
  MAINTAINERS: Add myself as mlx5_vdpa driver
  virtio-balloon: correct the comment of virtballoon_migratepage()
  mlx5_vdpa: offer VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK
  vdpa/mlx5: Update cvq iotlb mapping on ASID change
  vdpa/mlx5: Make iotlb helper functions more generic
  ...
2023-11-05 09:02:32 -10:00
Jakub Kicinski
1bc60524ca Merge branch 'mlx5-next' of https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Leon Romanovsky says:

====================
This PR is collected from
https://lore.kernel.org/all/cover.1695296682.git.leon@kernel.org

This series from Patrisious extends mlx5 to support IPsec packet offload
in multiport devices (MPV, see [1] for more details).

These devices have single flow steering logic and two netdev interfaces,
which require extra logic to manage IPsec configurations as they performed
on netdevs.

[1] https://lore.kernel.org/linux-rdma/20180104152544.28919-1-leon@kernel.org/

* 'mlx5-next' of https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
  net/mlx5: Handle IPsec steering upon master unbind/bind
  net/mlx5: Configure IPsec steering for ingress RoCEv2 MPV traffic
  net/mlx5: Configure IPsec steering for egress RoCEv2 MPV traffic
  net/mlx5: Add create alias flow table function to ipsec roce
  net/mlx5: Implement alias object allow and create functions
  net/mlx5: Add alias flow table bits
  net/mlx5: Store devcom pointer inside IPsec RoCE
  net/mlx5: Register mlx5e priv to devcom in MPV mode
  RDMA/mlx5: Send events from IB driver about device affiliation state
  net/mlx5: Introduce ifc bits for migration in a chunk mode

====================

Link: https://lore.kernel.org/r/20231002083832.19746-1-leon@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-13 09:35:34 -07:00
Dragos Tatulea
d424348b06 vdpa/mlx5: Expose descriptor group mkey hw capability
Necessary for improved live migration flow. Actual support will be added
in a downstream patch.

Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Link: https://lore.kernel.org/r/20230928164550.980832-3-dtatulea@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-10-02 13:12:05 +03:00
Patrisious Haddad
ef36ffcb38 net/mlx5: Add alias flow table bits
Add all the capabilities needed to check for alias object support.
As well as all the fields or commands needed for its creation and
the creation of flow table that is able to jump to an alias object.

Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Link: https://lore.kernel.org/r/544c030f2a78c4adf3fe6b64f97a39cc1bbdabb9.1695296682.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-10-02 11:21:11 +03:00
Yishai Hadas
5aa4c9608d net/mlx5: Introduce ifc bits for migration in a chunk mode
Introduce ifc related stuff to enable migration in a chunk mode.

Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Link: https://lore.kernel.org/r/20230911093856.81910-2-yishaih@nvidia.com
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-09-28 13:58:38 +03:00
Moshe Shemesh
e0cc92fd94 net/mlx5: Add a health error syndrome for pci data poisoned
Add new health error syndrome to indicate that pci data poisoned error
has been received while fetching device ICM data.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-09-19 23:33:08 -07:00
Jiri Pirko
496fd0a26b mlx5: Implement SyncE support using DPLL infrastructure
Implement SyncE support using newly introduced DPLL support.
Make sure that each PFs/VFs/SFs probed with appropriate capability
will spawn a dpll auxiliary device and register appropriate dpll device
and pin instances.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-17 11:50:20 +01:00
Leon Romanovsky
17c8da5a34 net/mlx5: Add IFC bits to support IPsec enable/disable
Add hardware definitions to allow to control IPSec capabilities.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20230825062836.103744-6-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-27 17:08:45 -07:00
Shay Drory
0b4eb603d6 net/mlx5: Remove unused CAPs
mlx5 driver queries the device for VECTOR_CALC and SHAMPO caps, but
there isn't any user who requires them.
As well as, MLX5_MCAM_REGS_0x9080_0x90FF is queried but not used.

Thus, drop all usages and definitions of the mentioned caps above.

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Maher Sanalla <msanalla@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-08-14 14:40:22 -07:00
Moshe Shemesh
a9f168e4c6 net/mlx5: Check with FW that sync reset completed successfully
Even if the PF driver had no error on his part of the sync reset flow,
the firmware can see wider picture as it syncs all the PFs in the flow.
So add at end of sync reset flow check with firmware by reading MFRL
register and initialization segment that the flow had no issue from
firmware point of view too.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-08-14 14:40:21 -07:00
Adham Faris
1f507e80c7 net/mlx5: Expose NIC temperature via hardware monitoring kernel API
Expose NIC temperature by implementing hwmon kernel API, which turns
current thermal zone kernel API to redundant.

For each one of the supported and exposed thermal diode sensors, expose
the following attributes:
1) Input temperature.
2) Highest temperature.
3) Temperature label:
   Depends on the firmware capability, if firmware doesn't support
   sensors naming, the fallback naming convention would be: "sensorX",
   where X is the HW spec (MTMP register) sensor index.
4) Temperature critical max value:
   refers to the high threshold of Warning Event. Will be exposed as
   `tempY_crit` hwmon attribute (RO attribute). For example for
   ConnectX5 HCA's this temperature value will be 105 Celsius, 10
   degrees lower than the HW shutdown temperature).
5) Temperature reset history: resets highest temperature.

For example, for dualport ConnectX5 NIC with a single IC thermal diode
sensor will have 2 hwmon directories (one for each PCI function)
under "/sys/class/hwmon/hwmon[X,Y]".

Listing one of the directories above (hwmonX/Y) generates the
corresponding output below:

$ grep -H -d skip . /sys/class/hwmon/hwmon0/*

Output
=======================================================================
/sys/class/hwmon/hwmon0/name:mlx5
/sys/class/hwmon/hwmon0/temp1_crit:105000
/sys/class/hwmon/hwmon0/temp1_highest:48000
/sys/class/hwmon/hwmon0/temp1_input:46000
/sys/class/hwmon/hwmon0/temp1_label:asic
grep: /sys/class/hwmon/hwmon0/temp1_reset_history: Permission denied

In addition, displaying the sensors data via lm_sensors generates the
corresponding output below:

$ sensors

Output
=======================================================================
mlx5-pci-0800
Adapter: PCI adapter
asic:         +46.0°C  (crit = +105.0°C, highest = +48.0°C)

mlx5-pci-0801
Adapter: PCI adapter
asic:         +46.0°C  (crit = +105.0°C, highest = +48.0°C)

CC: Jean Delvare <jdelvare@suse.com>
Signed-off-by: Adham Faris <afaris@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230807180507.22984-3-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-09 15:52:16 -07:00
Leon Romanovsky
5726628127 net/mlx5: Add relevant capabilities bits to support NAT-T
Provide an ability to check if flow steering supports UDP
encapsulation and decapsulation of IPsec ESP packets.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-07-25 15:08:57 +02:00
Lama Kayal
9ee473c259 net/mlx5: Fix reserved at offset in hca_cap register
A member of struct mlx5_ifc_cmd_hca_cap_bits has been mistakenly
assigned the wrong reserved_at offset value. Correct it to align to the
right value, thus avoid future miscalculation.

Signed-off-by: Lama Kayal <lkayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-23 12:27:33 -07:00
Or Har-Toov
0bd2e6fc78 net/mlx5: Expose bits for local loopback counter
Add needed HW bits for querying local loopback counter and the
HCA capability for it.

Signed-off-by: Or Har-Toov <ohartoov@nvidia.com>
Reviewed-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-16 12:02:08 -07:00
Moshe Shemesh
7a9770f1bf net/mlx5: Handle sync reset unload event
Added a new event handler to firmware sync reset, which is used to
support firmware sync reset flow on smart NIC. Adding this new stage to
the flow enables the firmware to ensure host PFs unload before ECPFs
unload, to avoid race of PFs recovery.

If firmware sends sync_reset_unload event to driver the driver should
unload and close all HW resources of the function. Once the driver
finishes unloading part, it can't get any more events from firmware as
event queues are closed, so it polls the reset state field to know when
to continue to next stage of the sync reset flow.

Added capability bit for supporting sync_reset_unload event.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-16 12:02:07 -07:00
Moshe Shemesh
8bb42ed421 net/mlx5: Expose timeout for sync reset unload stage
Expose new timoueout in Default Timeouts Register to be used on sync
reset flow running on smart NIC. In this flow the driver should know how
much time to wait from getting unload request till firmware will ask the
PF to continue to next stage of the flow.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-16 12:02:06 -07:00
Daniel Jurgens
93b36d0f28 net/mlx5: mlx5_ifc updates for embedded CPU SRIOV
Add ec_vf_vport_base to HCA Capabilities 2. This indicates the base vport
of embedded CPU virtual functions that are connected to the eswitch.

Add ec_vf_function to query/set_hca_caps. If set this indicates
accessing a virtual function on the embedded CPU by function ID. This
should only be used with other_function set to 1.

Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Reviewed-by: Bodong Wang <bodong@nvidia.com>
Reviewed-by: William Tu <witu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-09 18:40:50 -07:00