mirror_ubuntu-kernels

mirror of https://git.proxmox.com/git/mirror_ubuntu-kernels.git synced 2026-01-03 10:16:02 +00:00

Author	SHA1	Message	Date
Jiri Pirko	ec54677e55	mlxsw: reg: Add XM Lookup Table Query Register The XLTQ is used to query HW for XM-related info. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-14 19:09:54 -08:00
Jiri Pirko	087489dc27	mlxsw: reg: Add Router XLT M select Register The RXLTM configures and selects the M for the XM lookups. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-14 19:09:54 -08:00
Jiri Pirko	50779c3325	mlxsw: Ignore ports that are connected to eXtended mezanine Use the info stored in the bus_info struct about the eXtended mezanine connected ports and don't expose them. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-14 19:09:54 -08:00
Jiri Pirko	2ea3f4c7fa	mlxsw: pci: Obtain info about ports used by eXtended mezanine The output of boardinfo command was extended to contain information about XM. Indicates if is present and in case it is, tells which localports are used for the connection. So parse this info and store it in bus_info passed up to the driver. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-14 19:09:54 -08:00
Jiri Pirko	ff462103ca	mlxsw: spectrum_router: Introduce XM implementation of router low-level ops In order to offload entries to XM, implement a set of low-level functions to work with LPM trees in XM and also to pack and write FIB entries into XM. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-14 19:09:54 -08:00
Jiri Pirko	6100fbf13d	mlxsw: reg: Add Router XLT Enable Register The RXLTE enables XLT (eXtended Lookup Table) LPM lookups if a capable XM is present on the system. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-14 19:09:53 -08:00
Jiri Pirko	be6ba3b61e	mlxsw: reg: Add XM Direct Register The XMDR allows direct access to the XM device via the switch. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-14 19:09:53 -08:00
Jakub Kicinski	46d5e62dd3	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net xdp_return_frame_bulk() needs to pass a xdp_buff to __xdp_return(). strlcpy got converted to strscpy but here it makes no functional difference, so just keep the right code. Conflicts: net/netfilter/nf_tables_api.c Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-11 22:29:38 -08:00
Zheng Yongjun	b18cac546b	net/mlx4: simplify the return expression of mlx4_init_srq_table() Simplify the return expression. Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-12-10 13:00:51 -08:00
Zheng Yongjun	016ade51a7	net/mlx4: simplify the return expression of mlx4_init_cq_table() Simplify the return expression. Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-12-09 19:20:46 -08:00
Moshe Shemesh	ba603d9d7b	net/mlx4_en: Handle TX error CQE In case error CQE was found while polling TX CQ, the QP is in error state and all posted WQEs will generate error CQEs without any data transmitted. Fix it by reopening the channels, via same method used for TX timeout handling. In addition add some more info on error CQE and WQE for debug. Fixes: `bd2f631d7c` ("net/mlx4_en: Notify user when TX ring in error state") Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-12-09 16:44:35 -08:00
Moshe Shemesh	fed91613c9	net/mlx4_en: Avoid scheduling restart task if it is already running Add restarting state flag to avoid scheduling another restart task while such task is already running. Change task name from watchdog_task to restart_task to better fit the task role. Fixes: `1e338db56e` ("mlx4_en: Fix a race at restart task") Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-12-09 16:44:35 -08:00
Zheng Yongjun	873d2f1216	net: mlx5: convert comma to semicolon Replace a comma between expression statements by a semicolon. Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-12-09 16:23:08 -08:00
Amit Cohen	745f73deea	mlxsw: spectrum_switchdev: Allow joining VxLAN to 802.1ad bridge The previous patches added support for VxLAN device enslaved to 802.1ad bridge in Spectrum-2 ASIC and vetoed it in Spectrum-1. Do not veto VxLAN with 802.1ad bridge. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-12-08 15:45:57 -08:00
Amit Cohen	efbcb67339	mlxsw: Veto Q-in-VNI for Spectrum-1 ASIC Implementation of Q-in-VNI is different between ASIC types, this set adds support only for Spectrum-2. Return an error when trying to create VxLAN device and enslave it to 802.1ad bridge in Spectrum-1. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-12-08 15:45:56 -08:00
Amit Cohen	7e9c72a5da	mlxsw: spectrum_switchdev: Use ops->vxlan_join() when adding VLAN to VxLAN device Currently mlxsw_sp_switchdev_vxlan_vlan_add() always calls mlxsw_sp_bridge_8021q_vxlan_join() because VLANs were only ever added to a VLAN-filtering bridge, which is only 802.1q bridge. This set adds support for VxLAN with 802.1ad bridge, so VLAN-filtering bridge is not only 802.1q. Call ops->vxlan_join(), so mlxsw_sp_bridge_802{1q, 1ad}_vxlan_join() will be called according to bridge type. This is needed to ensure that VxLAN with 802.1ad bridge will be vetoed in Spectrum-1 with the next patch. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-12-08 15:45:56 -08:00
Amit Cohen	0b5ec8f237	mlxsw: spectrum_nve_vxlan: Add support for Q-in-VNI for Spectrum-2 ASIC On Spectrum-2, the default setting is not to push VLAN to the decapsulated packet. This is controlled by SPVTR.ipvid_mode. Set SPVTR.ipvid_mode to always push VLAN. Without this setting, Spectrum-2 overtakes the VLAN tag of decapsulated packet for bridging. In addition, set SPVID register to use EtherType saved in mlxsw_sp_nve_config when VLAN is pushed for the NVE tunnel. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-12-08 15:45:56 -08:00
Amit Cohen	4418096e84	mlxsw: spectrum: Publish mlxsw_sp_ethtype_to_sver_type() Declare mlxsw_sp_ethtype_to_sver_type() in spectrum.h to enable using it in other files. It will be used in the next patch to map between EtherType and the relevant value configured by SVER register. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-12-08 15:45:56 -08:00
Amit Cohen	49d18964e9	mlxsw: Save EtherType as part of mlxsw_sp_nve_config Add EtherType field to mlxsw_sp_nve_config struct. Set EtherType according to mlxsw_sp_nve_params.ethertype. Pass 'mlxsw_sp_nve_params' instead of 'mlxsw_sp_nve_params->dev' to the function which initializes mlxsw_sp_nve_config struct to know which EtherType to use. This field is needed to configure which EtherType will be used when VLAN is pushed at ingress of the tunnel port. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-12-08 15:45:56 -08:00
Amit Cohen	0913a24b3a	mlxsw: Save EtherType as part of mlxsw_sp_nve_params Add EtherType field to mlxsw_sp_nve_params struct. Set it when VxLAN device is added to bridge device. This field is needed to configure which EtherType will be used when VLAN is pushed at ingress of the tunnel port. Use ETH_P_8021Q for tunnel port enslaved to 802.1d and 802.1q bridges. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-12-08 15:45:56 -08:00
Amit Cohen	e2c777d7e3	mlxsw: spectrum_switchdev: Create common function for joining VxLAN to VLAN-aware bridge The code in mlxsw_sp_bridge_8021q_vxlan_join() can be used also for 802.1ad bridge. Move the code to function called mlxsw_sp_bridge_vlan_aware_vxlan_join() and call it from mlxsw_sp_bridge_8021q_vxlan_join() to enable code reuse. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-12-08 15:45:56 -08:00
Amit Cohen	598874c8e9	mlxsw: reg: Add support for tunnel port in SPVID register Add spvid_tport field which indicates if the port is tunnel port. When spvid_tport is true, local_port field supposed to be tunnel port type. It will be used to configure which Ethertype will be used when VLAN is pushed at ingress for tunnel port. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-12-08 15:45:56 -08:00
Amit Cohen	c1c32a79c5	mlxsw: reg: Add Switch Port VLAN Stacking Register SPVTR register configures the VLAN mode of the port to enable VLAN stacking. It will be used to configure VxLAN to push VLAN to the decapsulated packet. Without this setting, Spectrum-2 overtakes the VLAN tag of decapsulated packet for bridging. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-12-08 15:45:56 -08:00
Amit Cohen	02c3b5c5d0	mlxsw: Use one enum for all registers that contain tunnel_port field Currently SFN, TNUMT and TNPC registers use separate enums for tunnel_port. Create one enum with a neutral name and use it. Remove the enums that are not currently required. The next patches add two more registers that contain tunnel_port field, the new enum can be used for them also. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-12-08 15:45:56 -08:00
Maxim Mikityanskiy	2f6b379cca	net/mlx5e: Fill mlx5e_create_cq_param in a function Create a function to fill the fields of struct mlx5e_create_cq_param based on a channel. The purpose is code reuse between normal CQs, XSK CQs and the upcoming QoS CQs. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-12-08 11:28:49 -08:00
Aya Levin	c28e3bd4cc	net/mlx5e: Split between RX/TX tunnel FW support indication Use the new FW caps to advertise for ip-in-ip tunnel support separately for RX and TX. Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-12-08 11:28:48 -08:00
YueHaibing	0c4accc41c	net/mlx5: Fix passing zero to 'PTR_ERR' Fix smatch warnings: drivers/net/ethernet/mellanox/mlx5/core/esw/acl/egress_lgcy.c:105 esw_acl_egress_lgcy_setup() warn: passing zero to 'PTR_ERR' drivers/net/ethernet/mellanox/mlx5/core/esw/acl/egress_ofld.c:177 esw_acl_egress_ofld_setup() warn: passing zero to 'PTR_ERR' drivers/net/ethernet/mellanox/mlx5/core/esw/acl/ingress_lgcy.c:184 esw_acl_ingress_lgcy_setup() warn: passing zero to 'PTR_ERR' drivers/net/ethernet/mellanox/mlx5/core/esw/acl/ingress_ofld.c:262 esw_acl_ingress_ofld_setup() warn: passing zero to 'PTR_ERR' esw_acl_table_create() never returns NULL, so NULL test should be removed. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-12-08 11:28:48 -08:00
Shay Drory	d894892dda	net/mlx5: Arm only EQs with EQEs Currently, when more than one EQ is sharing an IRQ, and this IRQ is being interrupted, all the EQs sharing the IRQ will be armed. This is done regardless of whether an EQ has EQE. When multiple EQs are sharing an IRQ, one or more EQs can have valid EQEs. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-12-08 11:28:48 -08:00
YueHaibing	fe8395168d	net/mlx5e: Remove duplicated include Remove duplicated include. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-12-08 11:28:47 -08:00
Zhu Yanjun	ade84367fb	net/mlx5e: remove unnecessary memset Since kvzalloc will initialize the allocated memory, it is not necessary to initialize it once again. Fixes: `11b717d615` ("net/mlx5: E-Switch, Get reg_c0 value on CQE") Signed-off-by: Zhu Yanjun <yanjunz@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-12-08 11:28:47 -08:00
Eran Ben Elisha	1880bc4e4a	net/mlx5e: Add TX port timestamp support Transmitted packet timestamping accuracy can be improved when using timestamp from the port, instead of packet CQE creation timestamp, as it better reflects the actual time of a packet's transmit. TX port timestamping is supported starting from ConnectX6-DX hardware. Although at the original completion, only CQE timestamp can be attached, we are able to get TX port timestamping via an additional completion over a special CQ associated with the SQ (in addition to the regular CQ). Driver to ignore the original packet completion timestamp, and report back the timestamp of the special CQ completion. If the absolute timestamp diff between the two completions is greater than 1 / 128 second, ignore the TX port timestamp as it has a jitter which is too big. No skb will be generate out of the extra completion. Allocate additional CQ per ptpsq, to receive the TX port timestamp. Driver to hold an skb FIFO in order to map between transmitted skb to the two expected completions. When using ptpsq, hold double refcount on the skb, to gaurantee it will not get released before both completions arrive. Expose dedicated counters of the ptp additional CQ and connect it to the TX health reporter. This patch improves TX Hardware timestamping offset to be less than 40ns at a 100Gbps line rate, compared to 600ns before. With that, making our HW compliant with G.8273.2 class C, and allow Linux systems to be deployed in the 5G telco edge, where this standard is a must. Signed-off-by: Eran Ben Elisha <eranbe@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-12-08 11:28:47 -08:00
Eran Ben Elisha	145e5637d9	net/mlx5e: Add TX PTP port object support Add TX PTP port object support for better TX timestamping accuracy. Currently, driver supports CQE based TX port timestamp. Device also offers TX port timestamp, which has less jitter and better reflects the actual time of a packet's transmit. Define new driver layout called ptpsq, on which driver will create SQs that will support TX port timestamp for their transmitted packets. Driver to identify PTP TX skbs and steer them to these dedicated SQs as part of the select queue ndo. Driver to hold ptpsq per TC and report them at netif_set_real_num_tx_queues(). Add support for all needed functionality in order to xmit and poll completions received via ptpsq. Add ptpsq to the TX reporter recover, diagnose and dump methods. Creation of ptpsqs is disabled by default, and can be enabled via tx_port_ts private flag. This patch steer all timestamp related packets to a ptpsq, but it does not open the port timestamp support for it. The support will be added in the following patch. Signed-off-by: Eran Ben Elisha <eranbe@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-12-08 11:28:46 -08:00
Eran Ben Elisha	cecaa6a7d5	net/mlx5e: Move MLX5E_RX_ERR_CQE macro MLX5E_RX_ERR_CQE Macro is used only in data-path, move it to the appropriate header file. Signed-off-by: Eran Ben Elisha <eranbe@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-12-08 11:28:46 -08:00
Eran Ben Elisha	1a7f51240d	net/mlx5e: Split SW group counters update function SW group counter update function aggregates sw stats out of many mlx5e__stats resides in a given mlx5e_channel_stats struct. Split the function into a few helper functions. This will be used later in the series to calculate specific mlx5e__stats which are not defined inside mlx5e_channel_stats. Signed-off-by: Eran Ben Elisha <eranbe@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-12-08 11:28:46 -08:00
Eran Ben Elisha	0b676aaecc	net/mlx5e: Change skb fifo push/pop API to be used without SQ The skb fifo push/pop API used pre-defined attributes within the mlx5e_txqsq. In order to share the skb fifo API with other non-SQ use cases, change the API input to get newly defined mlx5e_skb_fifo struct. Signed-off-by: Eran Ben Elisha <eranbe@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-12-08 11:28:46 -08:00
Eran Ben Elisha	4ad40d8ee3	net/mlx5e: Allow SQ outside of channel context In order to be able to create an SQ outside of a channel context, remove sq->channel direct pointer. This requires adding a direct pointer to: netdevice, priv and mlx5_core in order to support SQs that are part of mlx5e_channel. Use channel_stats from the corresponding CQ. Signed-off-by: Eran Ben Elisha <eranbe@nvidia.com> Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-12-08 11:28:45 -08:00
Aya Levin	521f31af00	net/mlx5e: Allow RQ outside of channel context In order to be able to create an RQ outside of a channel context, remove rq->channel direct pointer. This requires adding a direct pointer to: ICOSQ and priv in order to support RQs that are part of mlx5e_channel. Use channel_stats from the corresponding CQ. Signed-off-by: Aya Levin <ayal@nvidia.com> Signed-off-by: Eran Ben Elisha <eranbe@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-12-08 11:28:45 -08:00
Aya Levin	4d0b7ef909	net/mlx5e: Allow CQ outside of channel context In order to be able to create a CQ outside of a channel context, remove cq->channel direct pointer. This requires adding a direct pointer to channel statistics, netdevice, priv and to mlx5_core in order to support CQs that are a part of mlx5e_channel. In addition, parameters the were previously derived from the channel like napi, NUMA node, channel stats and index are now assembled in struct mlx5e_create_cq_param which is given to mlx5e_open_cq() instead of channel pointer. Generalizing mlx5e_open_cq() allows opening CQ outside of channel context which will be used in following patches in the patch-set. Signed-off-by: Aya Levin <ayal@nvidia.com> Signed-off-by: Eran Ben Elisha <eranbe@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-12-08 11:28:45 -08:00
Tariq Toukan	cdd3f2367a	net/mlx5e: Free drop RQ in a dedicated function The drop RQ has very limited objects to be freed, and differs from regular RQs in the context that it is freed from. Add a dedicated function for it, use it where needed, and remove the drop_rq-specific checks in the generic function. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-12-08 11:28:44 -08:00
Jakub Kicinski	8e98387b16	Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Saeed Mahameed says: ==================== mlx5-next auxbus support This pull request is targeting net-next and rdma-next branches. This series provides mlx5 support for auxiliary bus devices. It starts with a merge commit of tag 'auxbus-5.11-rc1' from gregkh/driver-core into mlx5-next, then the mlx5 patches that will convert mlx5 ulp devices (netdev, rdma, vdpa) to use the proper auxbus infrastructure instead of the internal mlx5 device and interface management implementation, which Leon is deleting at the end of this patchset. Link: https://lore.kernel.org/alsa-devel/20201026111849.1035786-1-leon@kernel.org/ Thanks to everyone for the joint effort ! * 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux: RDMA/mlx5: Remove IB representors dead code net/mlx5: Simplify eswitch mode check net/mlx5: Delete custom device management logic RDMA/mlx5: Convert mlx5_ib to use auxiliary bus net/mlx5e: Connect ethernet part to auxiliary bus vdpa/mlx5: Connect mlx5_vdpa to auxiliary bus net/mlx5: Register mlx5 devices to auxiliary virtual bus vdpa/mlx5: Make hardware definitions visible to all mlx5 devices net/mlx5_core: Clean driver version and name net/mlx5: Properly convey driver version to firmware driver core: auxiliary bus: minor coding style tweaks driver core: auxiliary bus: make remove function return void driver core: auxiliary bus: move slab.h from include file Add auxiliary bus support ==================== Link: https://lore.kernel.org/r/20201207053349.402772-1-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-07 18:36:21 -08:00
Jiri Pirko	acde33bf73	mlxsw: spectrum_router: Reduce mlxsw_sp_ipip_fib_entry_op_gre4() Turned out that mlxsw_sp_ipip_fib_entry_op_gre4() does not need to figure out the IP address and virtual router id. Those are exactly the same as in the fib_entry it is called for. So just use that and reduce mlxsw_sp_ipip_fib_entry_op_gre4() function to only call mlxsw_sp_ipip_fib_entry_op_gre4_rtdp() make the ipip decap op code similar to nve. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-12-06 19:22:14 -08:00
Petr Machata	f54d3c81b7	mlxsw: spectrum: Bump minimum FW version to xx.2008.2018 The indicated version fixes an issue whereby the MOMTE register would by default enable mirroring of ECN-marked traffic from all traffic classes, once the ECN mirroring was configured. This fix is necessary for offload of RED "ecn_mark" qevent. Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-12-06 19:22:14 -08:00
Ido Schimmel	9add5f1954	mlxsw: core_acl: Use an array instead of a struct with a zero-length array Suppresses the following coccinelle warning: drivers/net/ethernet/mellanox/mlxsw/core_acl_flex_keys.c:139:3-7: WARNING use flexible-array member instead Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-12-06 19:22:14 -08:00
Ido Schimmel	42c435a2ac	mlxsw: spectrum_mr: Use flexible-array member instead of zero-length array Suppresses the following coccinelle warning: drivers/net/ethernet/mellanox/mlxsw/spectrum_mr.c:18:15-19: WARNING use flexible-array member instead Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-12-06 19:22:14 -08:00
Ido Schimmel	4834ad8079	mlxsw: core: Trace EMAD events Currently, mlxsw triggers the 'devlink:devlink_hwmsg' tracepoint whenever a request is sent to the device and whenever a response is received from it. However, the tracepoint is not triggered when an event (e.g., port up / down) is received from the device. Also trace EMAD events in order to log a more complete picture of all the exchanged hardware messages. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-12-06 19:22:14 -08:00
Ido Schimmel	31e1de4f12	mlxsw: spectrum: Apply RIF configuration when joining a LAG In case a router interface (RIF) is configured for a LAG, make sure its configuration is applied on the new LAG member. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-12-06 19:22:14 -08:00
Leon Romanovsky	e87114022e	net/mlx5: Simplify eswitch mode check Provide mlx5_core device instead of "priv" pointer while checking eswith mode. Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>	2020-12-06 07:43:54 +02:00
Leon Romanovsky	601c10c89c	net/mlx5: Delete custom device management logic After conversion to use auxiliary bus, all custom device management is not needed anymore, delete it. Signed-off-by: Leon Romanovsky <leonro@nvidia.com>	2020-12-06 07:43:54 +02:00
Leon Romanovsky	93f8244431	RDMA/mlx5: Convert mlx5_ib to use auxiliary bus The conversion to auxiliary bus solves long standing issue with existing mlx5_ib<->mlx5_core coupling. It required to have both modules in initramfs if one of them needed for the boot. Signed-off-by: Leon Romanovsky <leonro@nvidia.com>	2020-12-06 07:43:50 +02:00
Leon Romanovsky	912cebf420	net/mlx5e: Connect ethernet part to auxiliary bus Reuse auxiliary bus to perform device management of the ethernet part of the mlx5 driver. Signed-off-by: Leon Romanovsky <leonro@nvidia.com>	2020-12-06 07:37:38 +02:00
Leon Romanovsky	74c9729dd8	vdpa/mlx5: Connect mlx5_vdpa to auxiliary bus Change module registration logic to use auxiliary bus instead of custom made mlx5 register interface. Signed-off-by: Leon Romanovsky <leonro@nvidia.com>	2020-12-06 07:32:05 +02:00
Arnd Bergmann	0b32e91fdf	ethernet: select CONFIG_CRC32 as needed A number of ethernet drivers require crc32 functionality to be avaialable in the kernel, causing a link error otherwise: arm-linux-gnueabi-ld: drivers/net/ethernet/agere/et131x.o: in function `et1310_setup_device_for_multicast': et131x.c:(.text+0x5918): undefined reference to `crc32_le' arm-linux-gnueabi-ld: drivers/net/ethernet/cadence/macb_main.o: in function `macb_start_xmit': macb_main.c:(.text+0x4b88): undefined reference to `crc32_le' arm-linux-gnueabi-ld: drivers/net/ethernet/faraday/ftgmac100.o: in function `ftgmac100_set_rx_mode': ftgmac100.c:(.text+0x2b38): undefined reference to `crc32_le' arm-linux-gnueabi-ld: drivers/net/ethernet/freescale/fec_main.o: in function `set_multicast_list': fec_main.c:(.text+0x6120): undefined reference to `crc32_le' arm-linux-gnueabi-ld: drivers/net/ethernet/freescale/fman/fman_dtsec.o: in function `dtsec_add_hash_mac_address': fman_dtsec.c:(.text+0x830): undefined reference to `crc32_le' arm-linux-gnueabi-ld: drivers/net/ethernet/freescale/fman/fman_dtsec.o:fman_dtsec.c:(.text+0xb68): more undefined references to `crc32_le' follow arm-linux-gnueabi-ld: drivers/net/ethernet/netronome/nfp/nfpcore/nfp_hwinfo.o: in function `nfp_hwinfo_read': nfp_hwinfo.c:(.text+0x250): undefined reference to `crc32_be' arm-linux-gnueabi-ld: nfp_hwinfo.c:(.text+0x288): undefined reference to `crc32_be' arm-linux-gnueabi-ld: drivers/net/ethernet/netronome/nfp/nfpcore/nfp_resource.o: in function `nfp_resource_acquire': nfp_resource.c:(.text+0x144): undefined reference to `crc32_be' arm-linux-gnueabi-ld: nfp_resource.c:(.text+0x158): undefined reference to `crc32_be' arm-linux-gnueabi-ld: drivers/net/ethernet/nxp/lpc_eth.o: in function `lpc_eth_set_multicast_list': lpc_eth.c:(.text+0x1934): undefined reference to `crc32_le' arm-linux-gnueabi-ld: drivers/net/ethernet/rocker/rocker_ofdpa.o: in function `ofdpa_flow_tbl_do': rocker_ofdpa.c:(.text+0x2e08): undefined reference to `crc32_le' arm-linux-gnueabi-ld: drivers/net/ethernet/rocker/rocker_ofdpa.o: in function `ofdpa_flow_tbl_del': rocker_ofdpa.c:(.text+0x3074): undefined reference to `crc32_le' arm-linux-gnueabi-ld: drivers/net/ethernet/rocker/rocker_ofdpa.o: in function `ofdpa_port_fdb': arm-linux-gnueabi-ld: drivers/net/ethernet/mellanox/mlx5/core/steering/dr_ste.o: in function `mlx5dr_ste_calc_hash_index': dr_ste.c:(.text+0x354): undefined reference to `crc32_le' arm-linux-gnueabi-ld: drivers/net/ethernet/microchip/lan743x_main.o: in function `lan743x_netdev_set_multicast': lan743x_main.c:(.text+0x5dc4): undefined reference to `crc32_le' Add the missing 'select CRC32' entries in Kconfig for each of them. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com> Acked-by: Madalin Bucur <madalin.bucur@oss.nxp.com> Acked-by: Mark Einon <mark.einon@gmail.com> Acked-by: Simon Horman <simon.horman@netronome.com> Link: https://lore.kernel.org/r/20201203232114.1485603-1-arnd@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-04 14:42:21 -08:00
Jakub Kicinski	a1dd1d8697	Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Alexei Starovoitov says: ==================== pull-request: bpf-next 2020-12-03 The main changes are: 1) Support BTF in kernel modules, from Andrii. 2) Introduce preferred busy-polling, from Björn. 3) bpf_ima_inode_hash() and bpf_bprm_opts_set() helpers, from KP Singh. 4) Memcg-based memory accounting for bpf objects, from Roman. 5) Allow bpf_{s,g}etsockopt from cgroup bind{4,6} hooks, from Stanislav. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (118 commits) selftests/bpf: Fix invalid use of strncat in test_sockmap libbpf: Use memcpy instead of strncpy to please GCC selftests/bpf: Add fentry/fexit/fmod_ret selftest for kernel module selftests/bpf: Add tp_btf CO-RE reloc test for modules libbpf: Support attachment of BPF tracing programs to kernel modules libbpf: Factor out low-level BPF program loading helper bpf: Allow to specify kernel module BTFs when attaching BPF programs bpf: Remove hard-coded btf_vmlinux assumption from BPF verifier selftests/bpf: Add CO-RE relocs selftest relying on kernel module BTF selftests/bpf: Add support for marking sub-tests as skipped selftests/bpf: Add bpf_testmod kernel module for testing libbpf: Add kernel module BTF support for CO-RE relocations libbpf: Refactor CO-RE relocs to not assume a single BTF object libbpf: Add internal helper to load BTF data by FD bpf: Keep module's btf_data_size intact after load bpf: Fix bpf_put_raw_tracepoint()'s use of __module_address() selftests/bpf: Add Userspace tests for TCP_WINDOW_CLAMP bpf: Adds support for setting window clamp samples/bpf: Fix spelling mistake "recieving" -> "receiving" bpf: Fix cold build of test_progs-no_alu32 ... ==================== Link: https://lore.kernel.org/r/20201204021936.85653-1-alexei.starovoitov@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-04 07:48:12 -08:00
Leon Romanovsky	a925b5e309	net/mlx5: Register mlx5 devices to auxiliary virtual bus Create auxiliary devices under new virtual bus. This will replace the custom-made mlx5 ->add()/->remove() interfaces and next patches will fill the missing callback and remove the old interface logic. The attachment of auxiliary drivers to the devices is possible in 1-to-1 manner only and it requires us to create device for every protocol, so that device (module) will be able to connect to it. System with 2 IB and 1 RoCE cards: [leonro@vm ~]$ lspci \|grep nox 00:09.0 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5] 00:0a.0 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6] 00:0b.0 Ethernet controller: Mellanox Technologies MT2910 Family [ConnectX-7] [leonro@vm ~]$ ls -l /sys/bus/auxiliary/devices/ mlx5_core.eth.2 -> ../../../devices/pci0000:00/0000:00:0b.0/mlx5_core.eth.2 mlx5_core.rdma.0 -> ../../../devices/pci0000:00/0000:00:09.0/mlx5_core.rdma.0 mlx5_core.rdma.1 -> ../../../devices/pci0000:00/0000:00:0a.0/mlx5_core.rdma.1 mlx5_core.rdma.2 -> ../../../devices/pci0000:00/0000:00:0b.0/mlx5_core.rdma.2 mlx5_core.vdpa.1 -> ../../../devices/pci0000:00/0000:00:0a.0/mlx5_core.vdpa.1 mlx5_core.vdpa.2 -> ../../../devices/pci0000:00/0000:00:0b.0/mlx5_core.vdpa.2 [leonro@vm ~]$ rdma dev 0: ibp0s9: node_type ca fw 4.6.9999 node_guid 5254:00c0:fe12:3455 sys_image_guid 5254:00c0:fe12:3455 1: ibp0s10: node_type ca fw 4.6.9999 node_guid 5254:00c0:fe12:3456 sys_image_guid 5254:00c0:fe12:3456 2: rdmap0s11: node_type ca fw 4.6.9999 node_guid 5254:00c0:fe12:3457 sys_image_guid 5254:00c0:fe12:3457 System with RoCE SR-IOV card with 4 VFs: [leonro@vm ~]$ lspci \|grep nox 01:00.0 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6] 01:00.1 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6 Virtual Function] 01:00.2 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6 Virtual Function] 01:00.3 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6 Virtual Function] 01:00.4 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6 Virtual Function] [leonro@vm ~]$ ls -l /sys/bus/auxiliary/devices/ mlx5_core.eth.0 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.0/mlx5_core.eth.0 mlx5_core.eth.1 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.1/mlx5_core.eth.1 mlx5_core.eth.2 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.2/mlx5_core.eth.2 mlx5_core.eth.3 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.3/mlx5_core.eth.3 mlx5_core.eth.4 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.4/mlx5_core.eth.4 mlx5_core.rdma.0 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.0/mlx5_core.rdma.0 mlx5_core.rdma.1 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.1/mlx5_core.rdma.1 mlx5_core.rdma.2 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.2/mlx5_core.rdma.2 mlx5_core.rdma.3 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.3/mlx5_core.rdma.3 mlx5_core.rdma.4 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.4/mlx5_core.rdma.4 mlx5_core.vdpa.1 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.1/mlx5_core.vdpa.1 mlx5_core.vdpa.2 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.2/mlx5_core.vdpa.2 mlx5_core.vdpa.3 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.3/mlx5_core.vdpa.3 mlx5_core.vdpa.4 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.4/mlx5_core.vdpa.4 [leonro@vm ~]$ rdma dev 0: rocep1s0f0: node_type ca fw 4.6.9999 node_guid 5254:00c0:fe12:3455 sys_image_guid 5254:00c0:fe12:3455 1: rocep1s0f0v0: node_type ca fw 4.6.9999 node_guid 0000:0000:0000:0000 sys_image_guid 5254:00c0:fe12:3456 2: rocep1s0f0v1: node_type ca fw 4.6.9999 node_guid 0000:0000:0000:0000 sys_image_guid 5254:00c0:fe12:3457 3: rocep1s0f0v2: node_type ca fw 4.6.9999 node_guid 0000:0000:0000:0000 sys_image_guid 5254:00c0:fe12:3458 4: rocep1s0f0v3: node_type ca fw 4.6.9999 node_guid 0000:0000:0000:0000 sys_image_guid 5254:00c0:fe12:3459 Signed-off-by: Leon Romanovsky <leonro@nvidia.com>	2020-12-04 14:46:56 +02:00
Leon Romanovsky	17a7612b99	net/mlx5_core: Clean driver version and name Remove exposed driver version as it was done in other drivers, so module version will work correctly by displaying the kernel version for which it is compiled. And move mlx5_core module name to general include, so auxiliary drivers will be able to use it as a basis for a name in their device ID tables. Reviewed-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>	2020-12-04 14:46:55 +02:00
Leon Romanovsky	907af0f0ca	net/mlx5: Properly convey driver version to firmware mlx5 firmware expects driver version in specific format X.X.X, so make it always correct and based on real kernel version aligned with the driver. Fixes: `012e50e109` ("net/mlx5: Set driver version into firmware") Signed-off-by: Leon Romanovsky <leonro@nvidia.com>	2020-12-04 14:46:55 +02:00
Jakub Kicinski	55fd59b003	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Conflicts: drivers/net/ethernet/ibm/ibmvnic.c Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-03 15:44:09 -08:00
Yevgeny Kliteynik	d421e466c2	net/mlx5: DR, Proper handling of unsupported Connect-X6DX SW steering STEs format for Connect-X5 and Connect-X6DX different. Currently, on Connext-X6DX the SW steering would break at some point when building STEs w/o giving a proper error message. Fix this by checking the STE format of the current device when initializing domain: add mlx5_ifc definitions for Connect-X6DX SW steering, read FW capability to get the current format version, and check this version when domain is being created. Fixes: `26d688e33f` ("net/mlx5: DR, Add Steering entry (STE) utilities") Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-03 11:18:36 -08:00
Tariq Toukan	b336e6b25e	net/mlx5e: kTLS, Enforce HW TX csum offload with kTLS Checksum calculation cannot be done in SW for TX kTLS HW offloaded packets. Offload it to the device, disregard the declared state of the TX csum offload feature. Fixes: `d2ead1f360` ("net/mlx5e: Add kTLS TX HW offload support") Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Boris Pismenny <borisp@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-03 11:18:35 -08:00
Randy Dunlap	8a78a44010	net: mlx5e: fix fs_tcp.c build when IPV6 is not enabled Fix build when CONFIG_IPV6 is not enabled by making a function be built conditionally. Fixes these build errors and warnings: ../drivers/net/ethernet/mellanox/mlx5/core/en_accel/fs_tcp.c: In function 'accel_fs_tcp_set_ipv6_flow': ../include/net/sock.h:380:34: error: 'struct sock_common' has no member named 'skc_v6_daddr'; did you mean 'skc_daddr'? 380 \| #define sk_v6_daddr __sk_common.skc_v6_daddr \| ^~~~~~~~~~~~ ../drivers/net/ethernet/mellanox/mlx5/core/en_accel/fs_tcp.c:55:14: note: in expansion of macro 'sk_v6_daddr' 55 \| &sk->sk_v6_daddr, 16); \| ^~~~~~~~~~~ At top level: ../drivers/net/ethernet/mellanox/mlx5/core/en_accel/fs_tcp.c:47:13: warning: 'accel_fs_tcp_set_ipv6_flow' defined but not used [-Wunused-function] 47 \| static void accel_fs_tcp_set_ipv6_flow(struct mlx5_flow_spec spec, struct sock sk) Fixes: `5229a96e59` ("net/mlx5e: Accel, Expose flow steering API for rules add/del") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-03 11:18:35 -08:00
Eran Ben Elisha	1d2bb5ad89	net/mlx5: Fix wrong address reclaim when command interface is down When command interface is down, driver to reclaim all 4K page chucks that were hold by the Firmeware. Fix a bug for 64K page size systems, where driver repeatedly released only the first chunk of the page. Define helper function to fill 4K chunks for a given Firmware pages. Iterate over all unreleased Firmware pages and call the hepler per each. Fixes: `5adff6a088` ("net/mlx5: Fix incorrect page count when in internal error") Signed-off-by: Eran Ben Elisha <eranbe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-03 11:18:35 -08:00
Jakub Kicinski	32e417024f	mlx5-next-2020-12-02 Low level mlx5 updates required by both netdev and rdma trees: net/mlx5: Treat host PF vport as other (non eswitch manager) vport net/mlx5: Enable host PF HCA after eswitch is initialized net/mlx5: Rename peer_pf to host_pf net/mlx5: Make API mlx5_core_is_ecpf accept const pointer net/mlx5: Export steering related functions net/mlx5: Expose other function ifc bits net/mlx5: Expose IP-in-IP TX and RX capability bits net/mlx5: Update the hardware interface definition for vhca state net/mlx5: Update the list of the PCI supported devices net/mlx5: Avoid exposing driver internal command helpers net/mlx5: Add ts_cqe_to_dest_cqn related bits net/mlx5: Add misc4 to mlx5_ifc_fte_match_param_bits net/mlx5: Check dr mask size against mlx5_match_param size net/mlx5: Add sampler destination type net/mlx5: Add sample offload hardware bits and structures Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAl/IOZcACgkQSD+KveBX +j4J8wgAuxwflrYrbCWXV7LE08J7T7ZHRDE+jEbaZ0Zp9mOsYDDpcifpKwy2EVRf RKcpMYh/GzAljmEpeWIAlMxmlpXhKWXTDruWCx73r1jvdXf/RU24/zQHa6BjeiDo rMB8bgiW4a66+z4LcN/U6ahbVM5gScBNEt2sS1OIi9ZInngGVo9FgfhYMpERPNcH 3+mcHulCnGBNbbLwoTllOcgbxexn+xoByukg5Z0ddBJp007DMjzBIWDpDS0y2HaT jGo1LYONgRc3zoGVmdeu9F+tSsWBIgsaiyGxKj1T/8sZUaNz2TKj9VOiYIj9BLff cp6GRc88k7HWA4tImSHQiLbK6cx+yA== =mjvI -----END PGP SIGNATURE----- Merge tag 'mlx5-next-2020-12-02' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Saeed Mahameed says: ==================== mlx5-next-2020-12-02 Low level mlx5 updates required by both netdev and rdma trees. * tag 'mlx5-next-2020-12-02' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux: net/mlx5: Treat host PF vport as other (non eswitch manager) vport net/mlx5: Enable host PF HCA after eswitch is initialized net/mlx5: Rename peer_pf to host_pf net/mlx5: Make API mlx5_core_is_ecpf accept const pointer net/mlx5: Export steering related functions net/mlx5: Expose other function ifc bits net/mlx5: Expose IP-in-IP TX and RX capability bits net/mlx5: Update the hardware interface definition for vhca state net/mlx5: Update the list of the PCI supported devices net/mlx5: Avoid exposing driver internal command helpers net/mlx5: Add ts_cqe_to_dest_cqn related bits net/mlx5: Add misc4 to mlx5_ifc_fte_match_param_bits net/mlx5: Check dr mask size against mlx5_match_param size net/mlx5: Add sampler destination type net/mlx5: Add sample offload hardware bits and structures ==================== Link: https://lore.kernel.org/r/20201203011010.213440-1-saeedm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-02 17:29:23 -08:00
Danielle Ratson	09139f67d3	mlxsw: Add QinQ configuration vetoes After adding support for QinQ, a.k.a 802.1ad protocol, there are a few scenarios that should be vetoed. The vetoes are motivated by various ASIC limitations. For example, a port that is member in a 802.1ad bridge cannot have 802.1q uppers as the port needs to be configured to treat 802.1q packets as untagged packets. Veto all those unsupported scenarios and return suitable messages. Signed-off-by: Danielle Ratson <danieller@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-01 15:21:13 -08:00
Amit Cohen	80dfeafd64	mlxsw: spectrum_switchdev: Add support of QinQ traffic 802.1ad, also known as QinQ is an extension to the 802.1q standard, which is concerned with passing possibly 802.1q-tagged packets through another VLAN-like tunnel. The format of 802.1ad tag is the same as 802.1q, except it uses the EtherType of 0x88a8, unlike 802.1q's 0x8100. Add support for 802.1ad protocol. Most of the configuration is the same as 802.1q. The difference is that before a port joins an 802.1ad bridge it needs to be configured to recognize 802.1ad packets as tagged and other packets (e.g., 802.1q) as untagged. VXLAN is not currently supported with 802.1ad bridge, so return an error with an appropriate extack message. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-01 15:21:12 -08:00
Amit Cohen	773ce33a48	mlxsw: spectrum_switchdev: Create common functions for VLAN-aware bridge The code in mlxsw_sp_bridge_8021q_port_{join, leave}() can be used also for 802.1ad bridge. Move the code to functions called mlxsw_sp_bridge_vlan_aware_port_{join, leave}() and call them from mlxsw_sp_bridge_8021q_port_{join, leave}() respectively to enable code reuse. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-01 15:21:12 -08:00
Amit Cohen	3ae7a65b64	mlxsw: Make EtherType configurable when pushing VLAN at ingress Currently, when pushing a PVID at ingress, mlxsw always uses 802.1q EtherType. Make this EtherType configurable by extending mlxsw_sp_port_pvid_set() with an EtherType argument. This is a preparation for QinQ support, that needs to push a PVID with 802.1ad EtherType. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-01 15:21:12 -08:00
Amit Cohen	a2ef3ae158	mlxsw: spectrum: Only treat 802.1q packets as tagged packets By default, the device considers both 802.1ad and 802.1q packets as tagged, but this is not supported by the driver. It only supports VLAN and bridge devices that use 802.1q protocol. Instead, configure the device to only treat 802.1q packets as tagged packets. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-01 15:21:12 -08:00
Amit Cohen	2a5a290d6d	mlxsw: reg: Add et_vlan field to SPVID register et_vlan field is used to configure which EtherType is used when VLAN is pushed at ingress (for untagged packets or for QinQ push mode). It will be used to configure tagging with ether_type1 (i.e., 0x88A8) for QinQ mode. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-01 15:21:12 -08:00
Amit Cohen	7e9a6620d5	mlxsw: reg: Add Switch Port VLAN Classification Register SPVC configures the port to identify packets as untagged / single tagged / double tagged packets based on the packet EtherTypes. It will be used to classify 802.1q packets as untagged and 802.1ad packets as tagged when received by ports member in a 802.1ad bridge. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-01 15:21:12 -08:00
Björn Töpel	b02e5a0ebb	xsk: Propagate napi_id to XDP socket Rx path Add napi_id to the xdp_rxq_info structure, and make sure the XDP socket pick up the napi_id in the Rx path. The napi_id is used to find the corresponding NAPI structure for socket busy polling. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/bpf/20201130185205.196029-7-bjorn.topel@gmail.com	2020-12-01 00:09:25 +01:00
Ido Schimmel	ff47fa13c9	mlxsw: spectrum_router: Update adjacency index more efficiently The device supports an operation that allows the driver to issue one request to update the adjacency index for all the routes in a given virtual router (VR) from old index and size to new ones. This is useful in case the configuration of a certain nexthop group is updated and its adjacency index changes. Currently, the driver does not use this operation in an efficient manner. It iterates over all the routes using the nexthop group and issues an update request for the VR if it is not the same as the previous VR. Instead, use the VR tracking added in the previous patch to update the adjacency index once for each VR currently using the nexthop group. Example: 8k IPv6 routes were added in an alternating manner to two VRFs. All the routes are using the same nexthop object ('nhid 1'). Before: # perf stat -e devlink:devlink_hwmsg --filter='incoming==0' -- ip nexthop replace id 1 via 2001:db8:1::2 dev swp3 Performance counter stats for 'ip nexthop replace id 1 via 2001:db8:1::2 dev swp3': 16,385 devlink:devlink_hwmsg 4.255933213 seconds time elapsed 0.000000000 seconds user 0.666923000 seconds sys Number of EMAD transactions corresponds to number of routes using the nexthop group. After: # perf stat -e devlink:devlink_hwmsg --filter='incoming==0' -- ip nexthop replace id 1 via 2001:db8:1::2 dev swp3 Performance counter stats for 'ip nexthop replace id 1 via 2001:db8:1::2 dev swp3': 3 devlink:devlink_hwmsg 0.077655094 seconds time elapsed 0.000000000 seconds user 0.076698000 seconds sys Number of EMAD transactions corresponds to number of VRFs / VRs. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-27 17:17:33 -08:00
Ido Schimmel	d2141a42b9	mlxsw: spectrum_router: Track nexthop group virtual router membership For each nexthop group, track in which virtual routers (VRs) the group is used. This is going to be used by the next patch to perform a more efficient adjacency index update whenever the group's adjacency index changes. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-27 17:17:33 -08:00
Ido Schimmel	9a4ab10c74	mlxsw: spectrum_router: Rollback virtual router adjacency pointer update In the rare case where the adjacency pointer cannot be updated for a given virtual router, rollback the operation so that virtual routers that are already using the new index will use the old one again. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-27 17:17:33 -08:00
Ido Schimmel	40e4413d5d	mlxsw: spectrum_router: Pass virtual router parameters directly instead of pointer mlxsw_sp_adj_index_mass_update_vr() only needs the virtual router's identifier and protocol, so pass them directly. In a subsequent patch the caller will not have access to the pointer. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-27 17:17:33 -08:00
Ido Schimmel	1c2c5eb6e1	mlxsw: spectrum_router: Fix error handling issue Return error to the caller instead of suppressing it. Fixes: `e3ddfb45ba` ("mlxsw: spectrum_router: Allow returning errors from mlxsw_sp_nexthop_group_refresh()") Addresses-Coverity: ("Error handling issues (CHECKED_RETURN)") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-27 17:17:32 -08:00
Parav Pandit	617b860c18	net/mlx5: Treat host PF vport as other (non eswitch manager) vport When eswitch manager is running on ECPF, host PF should be treated as non eswitch manager port, similar to other VF vports. Fail to do so, results in firmware treating PF's vport as ECPF vport for eswitch ACL tables. Non zero check to figure out if a given vport is other vport or not is not sufficient becase PF vport number = 0 on ECPF. Hence, create esw acl tables with an attribute of other vport. Signed-off-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-11-26 18:45:03 -08:00
Parav Pandit	5bef709d76	net/mlx5: Enable host PF HCA after eswitch is initialized Currently ECPF enables external host PF too early in the initialization sequence for Ethernet links when ECPF is eswitch manager. Due to this, when external host PF driver is loaded, host PF's HCA CAP has inner_ip_version supported by NIC RX flow table. This capability is later updated by firmware after ECPF driver enables ENCAP/DECAP as eswitch manager. This results into a timing race condition, where CREATE_TIR command fails with a below syndrome on host PF. mlx5_cmd_check:775:(pid 510): CREATE_TIR(0x900) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x562b00) Hence, enable the external host PF after necessary eswitch and per vport initialization is completed. Continue to enable host PF when eswitch manager capability is off for a ECPF. Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Bodong Wang <bodong@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-11-26 18:45:03 -08:00
Parav Pandit	8a90f2fc67	net/mlx5: Rename peer_pf to host_pf To match the hardware spec, rename peer_pf to host_pf. Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Bodong Wang <bodong@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-11-26 18:45:03 -08:00
Eli Cohen	8d2a9d8d64	net/mlx5: Export steering related functions Export mlx5_create_flow_table() mlx5_create_flow_group() mlx5_destroy_flow_group(). These symbols are required by the VDPA implementation to create rules that consume VDPA specific traffic. We do not deal with putting the prototypes in a header file since they already exist in include/linux/mlx5/fs.h. Signed-off-by: Eli Cohen <elic@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-11-26 18:45:02 -08:00
Meir Lichtinger	dd8595eabe	net/mlx5: Update the list of the PCI supported devices Add the upcoming BlueField-3 device ID. Signed-off-by: Meir Lichtinger <meirl@nvidia.com> Reviewed-by: Eran Ben Elisha <eranbe@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-11-26 18:43:48 -08:00
Parav Pandit	e5dfe6b57e	net/mlx5: Avoid exposing driver internal command helpers mlx5 command init and cleanup routines are internal to mlx5_core driver. Hence, avoid exporting them and move their definition to mlx5_core driver's internal file mlx5_core.h Signed-off-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-11-26 18:43:48 -08:00
Muhammad Sammar	7da3ad6c26	net/mlx5: Add misc4 to mlx5_ifc_fte_match_param_bits Add misc4 match params to enable matching on prog_sample_fields. Signed-off-by: Muhammad Sammar <muhammads@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-11-26 18:43:47 -08:00
Muhammad Sammar	699d531f55	net/mlx5: Check dr mask size against mlx5_match_param size This is to allow passing misc4 match param from userspace when function like ib_flow_matcher_create is called. Signed-off-by: Muhammad Sammar <muhammads@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-11-26 18:43:47 -08:00
Chris Mi	3873063088	net/mlx5: Add sampler destination type The flow sampler object is a new destination type. Add a new member for the flow destination. Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-11-26 18:43:47 -08:00
Christian Eggers	37e9d0559a	mlxsw: spectrum_ptp: use PTP wide message type definitions Use recently introduced PTP wide defines instead of a driver internal enumeration. Signed-off-by: Christian Eggers <ceggers@arri.de> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Cc: Petr Machata <petrm@mellanox.com> Cc: Jiri Pirko <jiri@nvidia.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-25 12:23:14 -08:00
Ido Schimmel	37b50e556e	mlxsw: spectrum_trap: Add blackhole_nexthop trap Register with devlink the blackhole_nexthop trap so that mlxsw will be able to report packets dropped due to a blackhole nexthop. The internal trap identifier is "DISCARD_ROUTER3", which traps packets dropped in the adjacency table. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-24 12:14:56 -08:00
Ido Schimmel	68e92ad855	mlxsw: spectrum_router: Add support for blackhole nexthops Add support for blackhole nexthops by programming them to the adjacency table with a discard action. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-24 12:14:56 -08:00
Ido Schimmel	18c4b79d28	mlxsw: spectrum_router: Resolve RIF from nexthop struct instead of neighbour The two are the same, but for blackhole nexthops we will not have an associated neighbour struct, so resolve the RIF from the nexthop struct itself instead. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-24 12:14:56 -08:00
Ido Schimmel	919f6aaa3a	mlxsw: spectrum_router: Use loopback RIF for unresolved nexthops Now that the driver creates a loopback RIF during its initialization, it can be used to program the adjacency entries for unresolved nexthops instead of other RIFs. The loopback RIF is guaranteed to exist for the entire life time of the driver, unlike other RIFs that come and go. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-24 12:14:55 -08:00
Ido Schimmel	52d45575ec	mlxsw: spectrum_router: Use different trap identifier for unresolved nexthops Unresolved nexthops are currently written to the adjacency table with a discard action. Packets hitting such entries are trapped to the CPU via the 'DISCARD_ROUTER3' trap which can be enabled or disabled on demand, but is always enabled in order to ensure the kernel can resolve the unresolved neighbours. This trap will be needed for blackhole nexthops support. Therefore, move unresolved nexthops to explicitly program the adjacency entries with a trap action and a different trap identifier, 'RTR_EGRESS0'. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-24 12:14:55 -08:00
Ido Schimmel	07c78536ef	mlxsw: spectrum_router: Create loopback RIF during initialization Up until now RIFs (router interfaces) were created on demand (e.g., when an IP address was added to a netdev). However, sometimes the device needs to be provided with a RIF when one might not be available. For example, adjacency entries that drop packets need to be programmed with an egress RIF despite the RIF not being used to forward packets. Create such a RIF during initialization so that it could be used later on to support blackhole nexthops. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-24 12:14:55 -08:00
Jakub Kicinski	cc69837fca	net: don't include ethtool.h from netdevice.h linux/netdevice.h is included in very many places, touching any of its dependecies causes large incremental builds. Drop the linux/ethtool.h include, linux/netdevice.h just needs a forward declaration of struct ethtool_ops. Fix all the places which made use of this implicit include. Acked-by: Johannes Berg <johannes@sipsolutions.net> Acked-by: Shannon Nelson <snelson@pensando.io> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Link: https://lore.kernel.org/r/20201120225052.1427503-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-23 17:27:04 -08:00
Ido Schimmel	cdd6cfc54c	mlxsw: spectrum_router: Allow programming routes with nexthop objects Now that the driver supports nexthop objects, the check is no longer necessary. Remove it. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-20 15:20:20 -08:00
Ido Schimmel	c25db3a77f	mlxsw: spectrum_router: Enable resolution of nexthop groups from nexthop objects If the FIB info (i.e, 'struct fib_info', 'struct fib6_info') uses a nexthop object, then use the object's identifier to resolve the nexthop group. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-20 15:20:20 -08:00
Ido Schimmel	2a014b200b	mlxsw: spectrum_router: Add support for nexthop objects Register a listener to the nexthop notification chain and parse notified nexthop objects into the existing mlxsw nexthop data structures. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-20 15:20:20 -08:00
Tariq Toukan	1a0058cf0c	net/mlx4_en: Remove unused performance counters Performance analysis counters are maintained under the MLX4_EN_PERF_STAT definition, which is never set. Clean them up, with all related structures and logic. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Link: https://lore.kernel.org/r/20201118103427.4314-1-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-20 10:59:39 -08:00
Jacob Keller	52cc5f3a16	devlink: move flash end and begin to core devlink When performing a flash update via devlink, device drivers may inform user space of status updates via devlink_flash_update_(begin\|end\|timeout\|status)_notify functions. It is expected that drivers do not send any status notifications unless they send a begin and end message. If a driver sends a status notification without sending the appropriate end notification upon finishing (regardless of success or failure), the current implementation of the devlink userspace program can get stuck endlessly waiting for the end notification that will never come. The current ice driver implementation may send such a status message without the appropriate end notification in rare cases. Fixing the ice driver is relatively simple: we just need to send the begin_notify at the start of the function and always send an end_notify no matter how the function exits. Rather than assuming driver authors will always get this right in the future, lets just fix the API so that it is not possible to get wrong. Make devlink_flash_update_begin_notify and devlink_flash_update_end_notify static, and call them in devlink.c core code. Always send the begin_notify just before calling the driver's flash_update routine. Always send the end_notify just after the routine returns regardless of success or failure. Doing this makes the status notification easier to use from the driver, as it no longer needs to worry about catching failures and cleaning up by calling devlink_flash_update_end_notify. It is now no longer possible to do the wrong thing in this regard. We also save a couple of lines of code in each driver. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-19 21:41:02 -08:00
Jacob Keller	b44cfd4f5b	devlink: move request_firmware out of driver All drivers which implement the devlink flash update support, with the exception of netdevsim, use either request_firmware or request_firmware_direct to locate the firmware file. Rather than having each driver do this separately as part of its .flash_update implementation, perform the request_firmware within net/core/devlink.c Replace the file_name parameter in the struct devlink_flash_update_params with a pointer to the fw object. Use request_firmware rather than request_firmware_direct. Although most Linux distributions today do not have the fallback mechanism implemented, only about half the drivers used the _direct request, as compared to the generic request_firmware. In the event that a distribution does support the fallback mechanism, the devlink flash update ought to be able to use it to provide the firmware contents. For distributions which do not support the fallback userspace mechanism, there should be essentially no difference between request_firmware and request_firmware_direct. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Shannon Nelson <snelson@pensando.io> Acked-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-19 21:40:57 -08:00
Jakub Kicinski	56495a2442	Merge https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-19 19:08:46 -08:00
Aya Levin	6d9c8d15af	net/mlx4_core: Fix init_hca fields offset Slave function read the following capabilities from the wrong offset: 1. log_mc_entry_sz 2. fs_log_entry_sz 3. log_mc_hash_sz Fix that by adjusting these capabilities offset to match firmware layout. Due to the wrong offset read, the following issues might occur: 1+2. Negative value reported at max_mcast_qp_attach. 3. Driver to init FW with multicast hash size of zero. Fixes: `a40ded6043` ("net/mlx4_core: Add masking for a few queries on HCA caps") Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Eran Ben Elisha <eranbe@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20201118081922.553-1-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-18 17:46:20 -08:00
Jakub Kicinski	f93e8497a9	mlx5-fixes-2020-11-17 -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAl+0KZ4ACgkQSD+KveBX +j5DDAgAip1LGfOUq7RISh1OWQ9zLl2KT/mmbdSioObGgjKunU4lMGZAPrLB0bpe RH1RodslY+1bepcz7VQ/QxbL0EU605SZn08xLZIQAYYXT0Sar+hhg7h7vhD0iSA1 dwDF/XLQYHf8snc3x/tu6/zp9BzfzRfuiieYAbC1ri97Iz9uoYO+QcRI7OyqgXWf LmK9/s6WzgaC0DKg8XXEQGp7F1MbZfJph6tThZnF62NY9Rrkut6AEah85p1QaqF7 9WLjqOtJwo893fzbN+WCXJTYo2UX9oU18nUSvSoYid2LEofiwY3PcT8R0gKSPZto cVcjm1oGsotbonBnPlR2eMZq8LdPDQ== =MKi5 -----END PGP SIGNATURE----- Merge tag 'mlx5-fixes-2020-11-17' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5 fixes 2020-11-17 This series introduces some fixes to mlx5 driver. * tag 'mlx5-fixes-2020-11-17' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux: net/mlx5: fix error return code in mlx5e_tc_nic_init() net/mlx5: E-Switch, Fail mlx5_esw_modify_vport_rate if qos disabled net/mlx5: Disable QoS when min_rates on all VFs are zero net/mlx5: Clear bw_share upon VF disable net/mlx5: Add handling of port type in rule deletion net/mlx5e: Fix check if netdev is bond slave net/mlx5e: Fix IPsec packet drop by mlx5e_tc_update_skb net/mlx5e: Set IPsec WAs only in IP's non checksum partial case. net/mlx5e: Fix refcount leak on kTLS RX resync ==================== Link: https://lore.kernel.org/r/20201117195702.386113-1-saeedm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-18 17:17:33 -08:00
Ido Schimmel	e3ddfb45ba	mlxsw: spectrum_router: Allow returning errors from mlxsw_sp_nexthop_group_refresh() The function is responsible for allocating the adjacency entries used by the nexthop group and populating them with the adjacency information such as egress RIF and MAC address. Allow the function to return an error when it encounters a problem and have the relevant call sites check it. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-18 11:51:18 -08:00
Ido Schimmel	2efca2bfba	mlxsw: spectrum_router: Add an indication if a nexthop group can be destroyed Currently, a nexthop group is destroyed when the last FIB entry is detached from it. When nexthop objects are supported, this can no longer be the case, as the group is a separate object whose lifetime is managed by user space. Add an indication if a nexthop group can be destroyed and always set it to true for the existing IPv4 and IPv6 nexthop groups. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-18 11:51:18 -08:00
Ido Schimmel	a9a711a3f7	mlxsw: spectrum_router: Only clear offload indication from valid IPv6 FIB info When the IPv6 FIB info has a nexthop object, the nexthop offload indication is set on the nexthop object and not on the FIB info itself. Therefore, do not try to clear the offload indication from the FIB info when it has a nexthop object. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-18 11:51:18 -08:00
Ido Schimmel	5b9954e1e7	mlxsw: spectrum_router: Re-order mlxsw_sp_nexthop6_group_get() Attach the FIB entry to the nexthop group after setting the offload flag on the IPv6 FIB info (i.e., 'struct fib6_info'). The second operation is not needed when the nexthop group is a nexthop object. This will allow us to have a common exit path from the function, regardless of the nexthop group's type. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-18 11:51:18 -08:00
Ido Schimmel	c0351b7c25	mlxsw: spectrum_router: Set FIB entry's type based on nexthop group The previous patch associated a nexthop group with the FIB entry before the entry's type is determined. Make use of the nexthop group when determining the entry's type instead of relying on helpers that assume that the nexthop info is not a nexthop object (i.e., 'struct nexthop'). Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-18 11:51:18 -08:00
Ido Schimmel	5c9a3b2451	mlxsw: spectrum_router: Set FIB entry's type after creating nexthop group Each FIB entry has a type (e.g., remote, local) that determines how the entry is programmed to the device. In order to determine if the entry is local (directly connected) or remote (has a gateway) the relevant FIB info structures (e.g., 'struct fib_info') are checked. When entries that use nexthop objects are supported, these checks will need to be changed to take into account 'struct nexthop'. Instead, first associate the entry with a nexthop group so that the next patch could determine the entry's type based on the associated nexthop group's type. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-18 11:51:18 -08:00
Ido Schimmel	c68e248d53	mlxsw: spectrum_router: Pass ifindex to mlxsw_sp_ipip_entry_find_by_decap() The sole caller of the function will soon only have the ifindex available, instead of the pointer itself. Therefore, change the function to take the ifindex as input and have it get the pointer. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-18 11:51:18 -08:00
Ido Schimmel	ff8a24182a	mlxsw: spectrum_router: Set ifindex for IPv4 nexthops The ifindex of the nexthop device was never set for IPv4 nexthops, unlike IPv6 nexthops. This went unnoticed since only IPv6 nexthops use it. Set the ifindex for IPv4 nexthops in order to be consistent with IPv6 and also because it will be used by a later patch. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-18 11:51:18 -08:00
Ido Schimmel	fbf805bf1f	mlxsw: spectrum_router: Fix wrong kfree() in error path The function allocates 'nhgi', not 'nh_grp', so it needs to free the former in its error path. Fixes: `7f7a417e6a` ("mlxsw: spectrum_router: Split nexthop group configuration to a different struct") Addresses-Coverity: ("Memory - corruptions (USE_AFTER_FREE)") Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-18 11:51:18 -08:00
Ido Schimmel	1f492eab67	mlxsw: core: Use variable timeout for EMAD retries The driver sends Ethernet Management Datagram (EMAD) packets to the device for configuration purposes and waits for up to 200ms for a reply. A request is retried up to 5 times. When the system is under heavy load, replies are not always processed in time and EMAD transactions fail. Make the process more robust to such delays by using exponential backoff. First wait for up to 200ms, then retransmit and wait for up to 400ms and so on. Fixes: `caf7297e7a` ("mlxsw: core: Introduce support for asynchronous EMAD register access") Reported-by: Denis Yulevich <denisyu@nvidia.com> Tested-by: Denis Yulevich <denisyu@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-18 11:07:00 -08:00
Ido Schimmel	fb738b99ef	mlxsw: Fix firmware flashing The commit cited below moved firmware flashing functionality from mlxsw_spectrum to mlxsw_core, but did not adjust the Kconfig dependencies. This makes it possible to have mlxsw_core as built-in and mlxfw as a module. The mlxfw code is therefore not reachable from mlxsw_core and firmware flashing fails: # devlink dev flash pci/0000:01:00.0 file mellanox/mlxsw_spectrum-13.2008.1310.mfa2 devlink answers: Operation not supported Fix by having mlxsw_core select mlxfw. Fixes: `b79cb787ac` ("mlxsw: Move fw flashing code into core.c") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reported-by: Vadim Pasternak <vadimp@nvidia.com> Tested-by: Vadim Pasternak <vadimp@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-18 11:07:00 -08:00
Wang Hai	68ec32daf7	net/mlx5: fix error return code in mlx5e_tc_nic_init() Fix to return a negative error code from the error handling case instead of 0, as done elsewhere in this function. Fixes: `aedd133d17` ("net/mlx5e: Support CT offload for tc nic flows") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wang Hai <wanghai38@huawei.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-11-17 11:50:54 -08:00
Eli Cohen	5b8631c7b2	net/mlx5: E-Switch, Fail mlx5_esw_modify_vport_rate if qos disabled Avoid calling mlx5_esw_modify_vport_rate() if qos is not enabled and avoid unnecessary syndrome messages from firmware. Fixes: `fcb64c0f56` ("net/mlx5: E-Switch, add ingress rate support") Signed-off-by: Eli Cohen <elic@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-11-17 11:50:53 -08:00
Vladyslav Tarasiuk	470b747582	net/mlx5: Disable QoS when min_rates on all VFs are zero Currently when QoS is enabled for VF and any min_rate is configured, the driver sets bw_share value to at least 1 and doesn’t allow to set it to 0 to make minimal rate unlimited. It means there is always a minimal rate configured for every VF, even if user tries to remove it. In order to make QoS disable possible, check whether all vports have configured min_rate = 0. If this is true, set their bw_share to 0 to disable min_rate limitations. Fixes: `c9497c9890` ("net/mlx5: Add support for setting VF min rate") Signed-off-by: Vladyslav Tarasiuk <vladyslavt@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-11-17 11:50:53 -08:00
Vladyslav Tarasiuk	1ce5fc724a	net/mlx5: Clear bw_share upon VF disable Currently, if user disables VFs with some min and max rates configured, they are cleared. But QoS data is not cleared and restored upon next VF enable placing limits on minimal rate for given VF, when user expects none. To match cleared vport->info struct with QoS-related min and max rates upon VF disable, clear vport->qos struct too. Fixes: `556b9d16d3` ("net/mlx5: Clear VF's configuration on disabling SRIOV") Signed-off-by: Vladyslav Tarasiuk <vladyslavt@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-11-17 11:50:53 -08:00
Michael Guralnik	8cbcc5ef2a	net/mlx5: Add handling of port type in rule deletion Handle destruction of rules with port destination type to enable full destruction of flow. Without this handling of TX rules the deletion of these rules fails. Dmesg of flow destruction failure: [ 203.714146] mlx5_core 0000:00:0b.0: mlx5_cmd_check:753:(pid 342): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x144b7a) [ 210.547387] ------------[ cut here ]------------ [ 210.548663] refcount_t: decrement hit 0; leaking memory. [ 210.550651] WARNING: CPU: 4 PID: 342 at lib/refcount.c:31 refcount_warn_saturate+0x5c/0x110 [ 210.550654] Modules linked in: mlx5_ib mlx5_core ib_ipoib rdma_ucm rdma_cm iw_cm ib_cm ib_umad ib_uverbs ib_core [ 210.550675] CPU: 4 PID: 342 Comm: test Not tainted 5.8.0-rc2+ #116 [ 210.550678] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 [ 210.550680] RIP: 0010:refcount_warn_saturate+0x5c/0x110 [ 210.550685] Code: c6 d1 1b 01 00 0f 84 ad 00 00 00 5b 5d c3 80 3d b5 d1 1b 01 00 75 f4 48 c7 c7 20 d1 15 82 c6 05 a5 d1 1b 01 01 e8 a7 eb af ff <0f> 0b eb dd 80 3d 99 d1 1b 01 00 75 d4 48 c7 c7 c0 cf 15 82 c6 05 [ 210.550687] RSP: 0018:ffff8881642e77e8 EFLAGS: 00010282 [ 210.550691] RAX: 0000000000000000 RBX: 0000000000000004 RCX: 0000000000000000 [ 210.550694] RDX: 0000000000000027 RSI: 0000000000000004 RDI: ffffed102c85ceef [ 210.550696] RBP: ffff888161720428 R08: ffffffff8124c10e R09: ffffed103243beae [ 210.550698] R10: ffff8881921df56b R11: ffffed103243bead R12: ffff8881841b4180 [ 210.550701] R13: ffff888161720428 R14: ffff8881616d0000 R15: ffff888161720380 [ 210.550704] FS: 00007fc27f025740(0000) GS:ffff888192000000(0000) knlGS:0000000000000000 [ 210.550706] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 210.550708] CR2: 0000557e4b41a6a0 CR3: 0000000002415004 CR4: 0000000000360ea0 [ 210.550711] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 210.550713] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 210.550715] Call Trace: [ 210.550717] mlx5_del_flow_rules+0x484/0x490 [mlx5_core] [ 210.550720] ? mlx5_cmd_set_fte+0xa80/0xa80 [mlx5_core] [ 210.550722] mlx5_ib_destroy_flow+0x17f/0x280 [mlx5_ib] [ 210.550724] uverbs_free_flow+0x4c/0x90 [ib_uverbs] [ 210.550726] destroy_hw_idr_uobject+0x41/0xb0 [ib_uverbs] [ 210.550728] uverbs_destroy_uobject+0xaa/0x390 [ib_uverbs] [ 210.550731] __uverbs_cleanup_ufile+0x129/0x1b0 [ib_uverbs] [ 210.550733] ? uverbs_destroy_uobject+0x390/0x390 [ib_uverbs] [ 210.550735] uverbs_destroy_ufile_hw+0x78/0x190 [ib_uverbs] [ 210.550737] ib_uverbs_close+0x36/0x140 [ib_uverbs] [ 210.550739] __fput+0x181/0x380 [ 210.550741] task_work_run+0x88/0xd0 [ 210.550743] do_exit+0x5f6/0x13b0 [ 210.550745] ? sched_clock_cpu+0x30/0x140 [ 210.550747] ? is_current_pgrp_orphaned+0x70/0x70 [ 210.550750] ? lock_downgrade+0x360/0x360 [ 210.550752] ? mark_held_locks+0x1d/0x90 [ 210.550754] do_group_exit+0x8a/0x140 [ 210.550756] get_signal+0x20a/0xf50 [ 210.550758] do_signal+0x8c/0xbe0 [ 210.550760] ? hrtimer_nanosleep+0x1d8/0x200 [ 210.550762] ? nanosleep_copyout+0x50/0x50 [ 210.550764] ? restore_sigcontext+0x320/0x320 [ 210.550766] ? __hrtimer_init+0xf0/0xf0 [ 210.550768] ? timespec64_add_safe+0x150/0x150 [ 210.550770] ? mark_held_locks+0x1d/0x90 [ 210.550772] ? lockdep_hardirqs_on_prepare+0x14c/0x240 [ 210.550774] __prepare_exit_to_usermode+0x119/0x170 [ 210.550776] do_syscall_64+0x65/0x300 [ 210.550778] ? trace_hardirqs_off+0x10/0x120 [ 210.550781] ? mark_held_locks+0x1d/0x90 [ 210.550783] ? asm_sysvec_apic_timer_interrupt+0xa/0x20 [ 210.550785] ? lockdep_hardirqs_on+0x112/0x190 [ 210.550787] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 210.550789] RIP: 0033:0x7fc27f1cd157 [ 210.550791] Code: Bad RIP value. [ 210.550793] RSP: 002b:00007ffd4db27ea8 EFLAGS: 00000246 ORIG_RAX: 0000000000000023 [ 210.550798] RAX: fffffffffffffdfc RBX: ffffffffffffff80 RCX: 00007fc27f1cd157 [ 210.550800] RDX: 00007fc27f025740 RSI: 00007ffd4db27eb0 RDI: 00007ffd4db27eb0 [ 210.550803] RBP: 0000000000000016 R08: 0000000000000000 R09: 000000000000000e [ 210.550805] R10: 00007ffd4db27dc7 R11: 0000000000000246 R12: 0000000000400c00 [ 210.550808] R13: 00007ffd4db285f0 R14: 0000000000000000 R15: 0000000000000000 [ 210.550809] irq event stamp: 49399 [ 210.550812] hardirqs last enabled at (49399): [<ffffffff81172d36>] console_unlock+0x556/0x6f0 [ 210.550815] hardirqs last disabled at (49398): [<ffffffff81172897>] console_unlock+0xb7/0x6f0 [ 210.550818] softirqs last enabled at (48706): [<ffffffff81e0037b>] __do_softirq+0x37b/0x60c [ 210.550820] softirqs last disabled at (48697): [<ffffffff81c00e2f>] asm_call_on_stack+0xf/0x20 [ 210.550822] ---[ end trace ad18c0e6fa846454 ]--- [ 210.581862] mlx5_core 0000:00:0c.0: mlx5_destroy_flow_table:2132:(pid 342): Flow table 262150 wasn't destroyed, refcount > 1 Fixes: `a7ee18bdee` ("RDMA/mlx5: Allow creating a matcher for a NIC TX flow table") Signed-off-by: Michael Guralnik <michaelgur@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-11-17 11:50:52 -08:00
Maor Dickman	219b3267ca	net/mlx5e: Fix check if netdev is bond slave Bond events handler uses bond_slave_get_rtnl to check if net device is bond slave. bond_slave_get_rtnl return the rcu rx_handler pointer from the netdev which exists for bond slaves but also exists for devices that are attached to linux bridge so using it as indication for bond slave is wrong. Fix by using netif_is_lag_port instead. Fixes: `7e51891a23` ("net/mlx5e: Use netdev events to set/del egress acl forward-to-vport rule") Signed-off-by: Maor Dickman <maord@nvidia.com> Reviewed-by: Raed Salem <raeds@nvidia.com> Reviewed-by: Ariel Levkovich <lariel@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-11-17 11:50:52 -08:00
Huy Nguyen	6248ce991f	net/mlx5e: Fix IPsec packet drop by mlx5e_tc_update_skb Both TC and IPsec crypto offload use metadata_regB to store private information. Since TC does not use bit 31 of regB, IPsec will use bit 31 as the IPsec packet marker. The IPsec's regB usage is changed to: Bit31: IPsec marker Bit30-24: IPsec syndrome Bit23-0: IPsec obj id Fixes: `b2ac7541e3` ("net/mlx5e: IPsec: Add Connect-X IPsec Rx data path offload") Signed-off-by: Huy Nguyen <huyn@mellanox.com> Reviewed-by: Raed Salem <raeds@nvidia.com> Reviewed-by: Ariel Levkovich <lariel@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-11-17 11:50:52 -08:00
Huy Nguyen	5cfb540ef2	net/mlx5e: Set IPsec WAs only in IP's non checksum partial case. The IP's checksum partial still requires L4 csum flag on Ethernet WQE. Make the IPsec WAs only for the IP's non checksum partial case (for example icmd packet) Fixes: `5be019040c` ("net/mlx5e: IPsec: Add Connect-X IPsec Tx data path offload") Signed-off-by: Huy Nguyen <huyn@mellanox.com> Reviewed-by: Raed Salem <raeds@nvidia.com> Reviewed-by: Alaa Hleihel <alaa@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-11-17 11:50:52 -08:00
Maxim Mikityanskiy	ea63609857	net/mlx5e: Fix refcount leak on kTLS RX resync On resync, the driver calls inet_lookup_established (__inet6_lookup_established) that increases sk_refcnt of the socket. To decrease it, the driver set skb->destructor to sock_edemux. However, it didn't work well, because the TCP stack also sets this destructor for early demux, and the refcount gets decreased only once, while increased two times (in mlx5e and in the TCP stack). It leads to a socket leak, a TLS context leak, which in the end leads to calling tls_dev_del twice: on socket close and on driver unload, which in turn leads to a crash. This commit fixes the refcount leak by calling sock_gen_put right away after using the socket, thus fixing all the subsequent issues. Fixes: `0419d8c9d8` ("net/mlx5e: kTLS, Add kTLS RX resync support") Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-11-17 11:50:51 -08:00
Ido Schimmel	245f4e44d2	mlxsw: spectrum_router: Remove outdated comment Since commit `21151f64a4` ("mlxsw: Add new FIB entry type for reject routes") this comment is no longer correct. Remove it. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-14 16:55:04 -08:00
Ido Schimmel	9ed2b4d287	mlxsw: spectrum_router: Consolidate mlxsw_sp_nexthop{4, 6}_type_fini() The two functions are identical, so consolidate them to mlxsw_sp_nexthop_type_fini(). Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-14 16:55:04 -08:00
Ido Schimmel	c181a89a6d	mlxsw: spectrum_router: Consolidate mlxsw_sp_nexthop{4, 6}_type_init() The two functions are now identical, so consolidate them to mlxsw_sp_nexthop_type_init(). Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-14 16:55:04 -08:00
Ido Schimmel	b360952bbf	mlxsw: spectrum_router: Remove unused argument from mlxsw_sp_nexthop6_type_init() Remove it as it is unused. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-14 16:55:04 -08:00
Ido Schimmel	c3bde5a914	mlxsw: spectrum_router: Pass nexthop netdev to mlxsw_sp_nexthop4_type_init() Instead of passing the nexthop and resolving the nexthop netdev from it, pass the nexthop netdev directly. This will later allow us to consolidate code paths between IPv4 and IPv6 code. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-14 16:55:04 -08:00
Ido Schimmel	4dd38da54a	mlxsw: spectrum_router: Pass nexthop netdev to mlxsw_sp_nexthop6_type_init() Instead of passing the route and resolving the nexthop netdev from it, pass the nexthop netdev directly. This will later allow us to consolidate code paths between IPv4 and IPv6 code. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-14 16:55:04 -08:00
Ido Schimmel	7ba7bc55cf	mlxsw: spectrum_ipip: Remove overlay protocol from can_offload() callback The overlay protocol (i.e., IPv4/IPv6) that is being encapsulated has no impact on whether a certain IP tunnel can be offloaded or not. Only the underlay protocol matters. Therefore, remove the unused overlay protocol parameter from the callback. This will later allow us to consolidate code paths between IPv4 and IPv6 code. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-14 16:55:04 -08:00
Ido Schimmel	7f7a417e6a	mlxsw: spectrum_router: Split nexthop group configuration to a different struct Currently, the individual nexthops member in the group and attributes of the group (e.g., its type) are stored in the same struct (i.e., 'struct mlxsw_sp_nexthop_group'). This is fine since the individual nexthops cannot change during the lifetime of the group. With nexthop objects this is no longer the case. An existing nexthop group can be replaced to use a new set of nexthops. Creating a new struct whenever a group is replaced entails replacing the group pointer of all the routes (i.e., 'struct mlxsw_sp_fib_entry') using the group. Avoid this inefficient step by splitting the nexthop group configuration to a different struct (i.e., 'struct mlxsw_sp_nexthop_group_info'). When a nexthop group is replaced a new group info struct is created and the individual rotues do not need to be touched. Illustration after the change: mlxsw_sp_fib_entry mlxsw_sp_nexthop_group mlxsw_sp_nexthop_group_info +-------------------+ +----------------------+ +---------------------------+ \| nh_group; +--> nhgi; +--> \| \| \| \| \| \| \| +-------------------+ +----------------------+ +---------------------------+ No functional changes intended. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-14 16:55:04 -08:00
Ido Schimmel	5a49dfe51f	mlxsw: spectrum_router: Move IPv4 FIB info into a union in nexthop group struct Instead of storing the FIB info as 'priv' when the nexthop group represents an IPv4 nexthop group, simply store it as a FIB info with a proper comment. When nexthop objects are supported, this field will become a union with the nexthop object's identifier. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-14 16:55:03 -08:00
Ido Schimmel	46d5b7b541	mlxsw: spectrum_router: Remove unused field 'prio' from IPv4 FIB entry struct Not used anywhere. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-14 16:55:03 -08:00
Ido Schimmel	9ce254d9fb	mlxsw: spectrum_router: Store FIB info in route When needed, IPv4 routes fetch the FIB info (i.e., 'struct fib_info') from their associated nexthop group. This will not work when the nexthop group represents a nexthop object (i.e., 'struct nexthop'), as it will only have access to the nexthop's identifier. Instead, store the FIB info in the route itself. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-14 16:55:03 -08:00
Ido Schimmel	02d8fdcad7	mlxsw: spectrum_router: Associate neighbour table with nexthop instead of group As explained in the previous patch, nexthop objects can have both IPv4 and IPv6 nexthops in the same group. Therefore, move the neighbour table to be a property of the nexthop instead of the nexthop group. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-14 16:55:03 -08:00
Ido Schimmel	1664dd3d5e	mlxsw: spectrum_router: Use nexthop group type in hash table key Both IPv4 and IPv6 nexthop groups are hashed in the same table. The protocol field is used to indicate how the hash should be computed for each group. When nexthop group objects are supported, the hash will be computed for them based on the nexthop identifier. To differentiate between all the nexthop group types, encode the type of the group in the key instead of the protocol. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-14 16:55:03 -08:00
Ido Schimmel	a06191aabb	mlxsw: spectrum_router: Add nexthop group type field Currently, the type (i.e., IPv4/IPv6) of the nexthop group is derived from the neighbour table associated with the group. This is problematic when nexthop objects are taken into account, as a nexthop group object can contain both IPv4 and IPv6 nexthops. Instead, add a new field that indicates the type of the group and initialize it during the group's creation. Currently, the types are IPv4 ('struct fib_info') and IPv6 ('struct fib6_info'). In the future another type will be added for nexthop objects ('struct nexthop'). Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-14 16:55:03 -08:00
Ido Schimmel	10502d055b	mlxsw: spectrum_router: Compare key with correct object type When comparing a key with a nexthop group in rhastable's obj_cmpfn() callback, make sure that the key and nexthop group are of the same type (i.e., IPv4 / IPv6). The bug is not currently visible because IPv6 nexthop groups do not populate the FIB info pointer and IPv4 nexthop groups do not set the ifindex for the individual nexthops. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-14 16:55:03 -08:00
Jakub Kicinski	07cbce2e46	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2020-11-14 1) Add BTF generation for kernel modules and extend BTF infra in kernel e.g. support for split BTF loading and validation, from Andrii Nakryiko. 2) Support for pointers beyond pkt_end to recognize LLVM generated patterns on inlined branch conditions, from Alexei Starovoitov. 3) Implements bpf_local_storage for task_struct for BPF LSM, from KP Singh. 4) Enable FENTRY/FEXIT/RAW_TP tracing program to use the bpf_sk_storage infra, from Martin KaFai Lau. 5) Add XDP bulk APIs that introduce a defer/flush mechanism to optimize the XDP_REDIRECT path, from Lorenzo Bianconi. 6) Fix a potential (although rather theoretical) deadlock of hashtab in NMI context, from Song Liu. 7) Fixes for cross and out-of-tree build of bpftool and runqslower allowing build for different target archs on same source tree, from Jean-Philippe Brucker. 8) Fix error path in htab_map_alloc() triggered from syzbot, from Eric Dumazet. 9) Move functionality from test_tcpbpf_user into the test_progs framework so it can run in BPF CI, from Alexander Duyck. 10) Lift hashtab key_size limit to be larger than MAX_BPF_STACK, from Florian Lehner. Note that for the fix from Song we have seen a sparse report on context imbalance which requires changes in sparse itself for proper annotation detection where this is currently being discussed on linux-sparse among developers [0]. Once we have more clarification/guidance after their fix, Song will follow-up. [0] https://lore.kernel.org/linux-sparse/CAHk-=wh4bx8A8dHnX612MsDO13st6uzAz1mJ1PaHHVevJx_ZCw@mail.gmail.com/T/ https://lore.kernel.org/linux-sparse/20201109221345.uklbp3lzgq6g42zb@ltop.local/T/ * git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (66 commits) net: mlx5: Add xdp tx return bulking support net: mvpp2: Add xdp tx return bulking support net: mvneta: Add xdp tx return bulking support net: page_pool: Add bulk support for ptr_ring net: xdp: Introduce bulking for xdp tx return path bpf: Expose bpf_d_path helper to sleepable LSM hooks bpf: Augment the set of sleepable LSM hooks bpf: selftest: Use bpf_sk_storage in FENTRY/FEXIT/RAW_TP bpf: Allow using bpf_sk_storage in FENTRY/FEXIT/RAW_TP bpf: Rename some functions in bpf_sk_storage bpf: Folding omem_charge() into sk_storage_charge() selftests/bpf: Add asm tests for pkt vs pkt_end comparison. selftests/bpf: Add skb_pkt_end test bpf: Support for pointers beyond pkt_end. tools/bpf: Always run the *-clean recipes tools/bpf: Add bootstrap/ to .gitignore bpf: Fix NULL dereference in bpf_task_storage tools/bpftool: Fix build slowdown tools/runqslower: Build bpftool using HOSTCC tools/runqslower: Enable out-of-tree build ... ==================== Link: https://lore.kernel.org/r/20201114020819.29584-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-14 09:13:41 -08:00
Lorenzo Bianconi	b87c57ae12	net: mlx5: Add xdp tx return bulking support Convert mlx5 driver to xdp_return_frame_bulk APIs. XDP_REDIRECT (upstream codepath): 8.9Mpps XDP_REDIRECT (upstream codepath + bulking APIs): 10.2Mpps Co-developed-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/250460319fd868b7b5668fc1deca74dd42813a90.1605267335.git.lorenzo@kernel.org	2020-11-14 02:29:00 +01:00
Jakub Kicinski	e1d9d7b913	Merge https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-12 16:54:48 -08:00
Jiri Pirko	173f14cda3	mlxsw: spectrum_router: Introduce FIB entry update op Follow-up patchset introducing XMDR implementation is going to need to distinguish write and update ops. Therefore introduce "update op" and call "write op" only when new FIB entry is inserted. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-12 15:55:22 -08:00
Jiri Pirko	a005a7fe2f	mlxsw: spectrum_router: Track FIB entry committed state and skip uncommitted on delete In case bulking is used, the entry that was previously added may not be yet committed to the HW as it waits in the queue for bulk send. For such entries, skip the deletion. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-12 15:55:22 -08:00
Jiri Pirko	ae9ce81aa7	mlxsw: spectrum_router: Introduce fib_entry priv for low-level ops Prepare for the low-level ops that need to store some data alongside the fib_entry and introduce a per-fib_entry priv for ll ops. The priv is reference counted as in the follow-up patch it is going to be saved in pack() function and used later on in commit() even in case the related fib_entry gets freed in the middle. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-12 15:55:21 -08:00
Jiri Pirko	91d20d71b2	mlxsw: spectrum_router: Have FIB entry op context allocated for the instance Get the max size needed for FIB entry op context and allocate it once for the instance. Use it repeatedly from the scheduled work. By this, allow to extend the context to hold more data than it is wise to do when it was on the stack. Make sure to signalize that the context needs to be initialized in case families of subsequent FIB entries differ. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-12 15:55:21 -08:00
Jiri Pirko	505cd65c66	mlxsw: spectrum_router: Prepare work context for possible bulking For XMDR register it is possible to carry multiple FIB entry operations in a single write. However the FW does not restrict mixing the types of operations, make the code easier and indicate the bulking is ok only in case the bulk contains FIB operations of the same family and event. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-12 15:55:21 -08:00
Jiri Pirko	7f5c4090e4	mlxsw: spectrum: Push RALUE packing and writing into low-level router ops With follow-up introduction of XM implementation, XMDR register is going to be optionally used instead of RALUE register. Push the RALUE packing helpers and write call into low-level router ops. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-12 15:55:21 -08:00
Jiri Pirko	1a9c21d5f7	mlxsw: spectrum_router: Use RALUE pack helper from abort function Unify the RALUE register payload packing and use the __mlxsw_sp_fib_entry_ralue_pack() helper from __mlxsw_sp_router_set_abort_trap(). Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-12 15:55:21 -08:00
Jiri Pirko	1a7fcdf75d	mlxsw: reg: Allow to pass NULL pointer to mlxsw_reg_ralue_pack4/6() In preparation for the change that is going to be done in the next patch, allow to pass NULL pointer to mlxsw_reg_ralue_pack4() and mlxsw_reg_ralue_pack6() helpers. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-12 15:55:20 -08:00
Jiri Pirko	0c1d6b2694	mlxsw: spectrum_router: Pass destination IP as a pointer to mlxsw_reg_ralue_pack4() Instead of passing destination IP as a u32 value, pass it as pointer to u32. Avoid using local variable for the pointer store. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-12 15:55:20 -08:00
Jiri Pirko	d271cf9f29	mlxsw: spectrum: Export RALUE pack helper and use it from IPIP As the RALUE packing is going to be put into op, make the user from IPIP code use the same helper as the router code does. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-12 15:55:20 -08:00
Jiri Pirko	0f6b66011a	mlxsw: spectrum_router: Push out RALUE pack into separate helper As the RALUE packing is going to be pushed into an op, in preparation for that push the code into a separate function in the meantime. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-12 15:55:20 -08:00
Jiri Pirko	2d5bd7a111	mlxsw: spectrum: Propagate context from work handler containing RALUE payload Currently, RALUE payload is defined locally in the function that is calling the register write. With introduction of alternative register to RALUE, XMDR, it has to be possible to put multiple FIB entry operations into single register write. So in order to prepare for that, have per-work entry operation context and propagate it all the way down to the functions writing RALUE. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-12 15:55:19 -08:00
Jiri Pirko	c1b290d594	mlxsw: spectrum_router: Introduce FIB event queue instead of separate works Currently, every FIB event is queued-up as a separate work to be processed. However, that allows to process only one FIB entry per work callback. In preparation of future XMDR register bulking of multiple FIB entries, convert to FIB event queue. Implement this by a list_head, adding new events to the end of the list in the FIB notify callback. That allows to process multiple events from the list inside the work callback. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-12 15:55:19 -08:00
Jiri Pirko	d57ff02286	mlxsw: spectrum_router: Use RALUE-independent op arg Since the write/delete of FIB entry is going to be implemented by XMDR register for XM implementation, introduce RALUE-independent enum for op so the enum could be used in both RALUE and XMDR. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-12 15:55:19 -08:00
Jiri Pirko	69ba53e72b	mlxsw: spectrum_router: Pass non-register proto enum to __mlxsw_sp_router_set_abort_trap() Don't pass RALXX register enum and rather pass enum mlxsw_sp_l3proto to __mlxsw_sp_router_set_abort_trap(). This is in preparation to fib entry pack implementation by XMDR register. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-12 15:55:19 -08:00
Kaixu Xia	785d21b826	net/mlx4: Assign boolean values to a bool variable Fix the following coccinelle warnings: ./drivers/net/ethernet/mellanox/mlx4/en_rx.c:687:1-17: WARNING: Assignment of 0/1 to bool variable Reported-by: Tosk Robot <tencent_os_robot@tencent.com> Signed-off-by: Kaixu Xia <kaixuxia@tencent.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/1604732038-6057-1-git-send-email-kaixuxia@tencent.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-09 17:37:53 -08:00
Jakub Kicinski	c9448e828d	mlx5-updates-2020-11-03 This series includes updates to mlx5 software steering component. 1) Few improvements in the DR area, such as removing unneeded checks, renaming to better general names, refactor in some places, etc. 2) Software steering (DR) Memory management improvements This patch series contains SW Steering memory management improvements: using buddy allocator instead of an existing bucket allocator, and several other optimizations. The buddy system is a memory allocation and management algorithm that manages memory in power of two increments. The algorithm is well-known and well-described, such as here: https://en.wikipedia.org/wiki/Buddy_memory_allocation Linux uses this algorithm for managing and allocating physical pages, as described here: https://www.kernel.org/doc/gorman/html/understand/understand009.html In our case, although the algorithm in principal is similar to the Linux physical page allocator, the "building blocks" and the circumstances are different: in SW steering, buddy allocator doesn't really allocates a memory, but rather manages ICM (Interconnect Context Memory) that was previously allocated and registered. The ICM memory that is used in SW steering is always power of 2 (order), so buddy system is a good fit for this. Patches in this series: [PATH 4] net/mlx5: DR, Add buddy allocator utilities This patch adds a modified implementation of a well-known buddy allocator, adjusted for SW steering needs: the algorithm in principal is similar to the Linux physical page allocator, but in our case buddy allocator doesn't really allocate a memory, but rather manages ICM memory that was previously allocated and registered. [PATH 5] net/mlx5: DR, Handle ICM memory via buddy allocation instead of bucket management This patch changes ICM management of SW steering to use buddy-system mechanism Instead of the previous bucket management. [PATH 6] net/mlx5: DR, Sync chunks only during free This patch makes syncing happen only when freeing memory chunks. [PATH 7] net/mlx5: DR, ICM memory pools sync optimization This patch adds tracking of pool's "hot" memory and makes the check whether steering sync is required much shorter and faster. [PATH 8] net/mlx5: DR, Free buddy ICM memory if it is unused This patch adds tracking buddy's used ICM memory, and frees the buddy if all its memory becomes unused. 3) Misc code cleanups -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAl+kW/sACgkQSD+KveBX +j7pRwf+KH6vIkwKlt0I2cYUmUINAkz0o1E82wXGx5Q81iMJLeIeMPKiatai6/0r BIYD8t8oagOVw5OU+H9DbVIR47wz6pB90bkQIrIJk0S3ocVcDfLlN0ssbwCdEDlH jj56SB6jjVVj1LlTXSVfUrXEKJ2FBjTxPbNtVL/NLW0GQkoJM5RaYjK++ZB58o+O YI1Gb2w+FT5vVHdRVxs/2a/NTy71VQOrYctkhql4/8P0SsstPvhgOD/oRU26l8Il Qpl68H/0N9KM04SlOlD91AwaWhPEIBP1qFSgRnfStmqxAfQlAfqajOoSntK8Vkz0 0yRF7pVLELhzxx+IR/W9Vpdf7uzjeA== =rQcg -----END PGP SIGNATURE----- Merge tag 'mlx5-updates-2020-11-03' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2020-11-03 This series includes updates to mlx5 software steering component. 1) Few improvements in the DR area, such as removing unneeded checks, renaming to better general names, refactor in some places, etc. 2) Software steering (DR) Memory management improvements This patch series contains SW Steering memory management improvements: using buddy allocator instead of an existing bucket allocator, and several other optimizations. The buddy system is a memory allocation and management algorithm that manages memory in power of two increments. The algorithm is well-known and well-described, such as here: https://en.wikipedia.org/wiki/Buddy_memory_allocation Linux uses this algorithm for managing and allocating physical pages, as described here: https://www.kernel.org/doc/gorman/html/understand/understand009.html In our case, although the algorithm in principal is similar to the Linux physical page allocator, the "building blocks" and the circumstances are different: in SW steering, buddy allocator doesn't really allocates a memory, but rather manages ICM (Interconnect Context Memory) that was previously allocated and registered. The ICM memory that is used in SW steering is always power of 2 (order), so buddy system is a good fit for this. Patches in this series: [PATH 4] net/mlx5: DR, Add buddy allocator utilities This patch adds a modified implementation of a well-known buddy allocator, adjusted for SW steering needs: the algorithm in principal is similar to the Linux physical page allocator, but in our case buddy allocator doesn't really allocate a memory, but rather manages ICM memory that was previously allocated and registered. [PATH 5] net/mlx5: DR, Handle ICM memory via buddy allocation instead of bucket management This patch changes ICM management of SW steering to use buddy-system mechanism Instead of the previous bucket management. [PATH 6] net/mlx5: DR, Sync chunks only during free This patch makes syncing happen only when freeing memory chunks. [PATH 7] net/mlx5: DR, ICM memory pools sync optimization This patch adds tracking of pool's "hot" memory and makes the check whether steering sync is required much shorter and faster. [PATH 8] net/mlx5: DR, Free buddy ICM memory if it is unused This patch adds tracking buddy's used ICM memory, and frees the buddy if all its memory becomes unused. 3) Misc code cleanups * tag 'mlx5-updates-2020-11-03' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux: net: mlx5: Replace in_irq() usage net/mlx5: Cleanup kernel-doc warnings net/mlx4: Cleanup kernel-doc warnings net/mlx5e: Validate stop_room size upon user input net/mlx5: DR, Free unused buddy ICM memory net/mlx5: DR, ICM memory pools sync optimization net/mlx5: DR, Sync chunks only during free net/mlx5: DR, Handle ICM memory via buddy allocation instead of buckets net/mlx5: DR, Add buddy allocator utilities net/mlx5: DR, Rename matcher functions to be more HW agnostic net/mlx5: DR, Rename builders HW specific names net/mlx5: DR, Remove unused member of action struct ==================== Link: https://lore.kernel.org/r/20201105201242.21716-1-saeedm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-05 18:01:31 -08:00
Maxim Mikityanskiy	1a50cf9a67	net/mlx5e: Fix incorrect access of RCU-protected xdp_prog rq->xdp_prog is RCU-protected and should be accessed only with rcu_access_pointer for the NULL check in mlx5e_poll_rx_cq. rq->xdp_prog may change on the fly only from one non-NULL value to another non-NULL value, so the checks in mlx5e_xdp_handle and mlx5e_poll_rx_cq will have the same result during one NAPI cycle, meaning that no additional synchronization is needed. Fixes: `fe45386a20` ("net/mlx5e: Use RCU to protect rq->xdp_prog") Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-11-05 12:17:06 -08:00
Aya Levin	c5eb51adf0	net/mlx5e: Fix VXLAN synchronization after function reload During driver reload, perform firmware tear-down which results in firmware losing the configured VXLAN ports. These ports are still available in the driver's database. Fix this by cleaning up driver's VXLAN database in the nic unload flow, before firmware tear-down. With that, minimize mlx5_vxlan_destroy() to remove only what was added in mlx5_vxlan_create() and warn on leftover UDP ports. Fixes: `18a2b7f969` ("net/mlx5: convert to new udp_tunnel infrastructure") Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-11-05 12:17:06 -08:00
Parav Pandit	ae35859445	net/mlx5: E-switch, Avoid extack error log for disabled vport When E-switch vport is disabled, querying its hardware address is unsupported. Avoid setting extack error log message in such case. Fixes: `f099fde16d` ("net/mlx5: E-switch, Support querying port function mac address") Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-11-05 12:17:06 -08:00
Maor Gottlieb	465e7baab6	net/mlx5: Fix deletion of duplicate rules When a rule is duplicated, the refcount of the rule is increased so only the second deletion of the rule should cause destruction of the FTE. Currently, the FTE will be destroyed in the first deletion of rule since the modify_mask will be 0. Fix it and call to destroy FTE only if all the rules (FTE's children) have been removed. Fixes: `718ce4d601` ("net/mlx5: Consolidate update FTE for all removal changes") Signed-off-by: Maor Gottlieb <maorg@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-11-05 12:17:06 -08:00
Maxim Mikityanskiy	f42139ba49	net/mlx5e: Use spin_lock_bh for async_icosq_lock async_icosq_lock may be taken from softirq and non-softirq contexts. It requires protection with spin_lock_bh, otherwise a softirq may be triggered in the middle of the critical section, and it may deadlock if it tries to take the same lock. This patch fixes such a scenario by using spin_lock_bh to disable softirqs on that CPU while inside the critical section. Fixes: `8d94b590f1` ("net/mlx5e: Turn XSK ICOSQ into a general asynchronous one") Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-11-05 12:17:05 -08:00
Vlad Buslov	78c906e430	net/mlx5e: Protect encap route dev from concurrent release In functions mlx5e_route_lookup_ipv{4\|6}() route_dev can be arbitrary net device and not necessary mlx5 eswitch port representor. As such, in order to ensure that route_dev is not destroyed concurrent the code needs either explicitly take reference to the device before releasing reference to rtable instance or ensure that caller holds rtnl lock. First approach is chosen as a fix since rtnl lock dependency was intentionally removed from mlx5 TC layer. To prevent unprotected usage of route_dev in encap code take a reference to the device before releasing rt. Don't save direct pointer to the device in mlx5_encap_entry structure and use ifindex instead. Modify users of route_dev pointer to properly obtain the net device instance from its ifindex. Fixes: `61086f3910` ("net/mlx5e: Protect encap hash table with mutex") Fixes: `6707f74be8` ("net/mlx5e: Update hw flows when encap source mac changed") Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-11-05 12:17:05 -08:00
Maor Dickman	e68e28b4a9	net/mlx5e: Fix modify header actions memory leak Modify header actions are allocated during parse tc actions and only freed during the flow creation, however, on error flow the allocated memory is wrongly unfreed. Fix this by calling dealloc_mod_hdr_actions in __mlx5e_add_fdb_flow and mlx5e_add_nic_flow error flow. Fixes: `d7e75a325c` ("net/mlx5e: Add offloading of E-Switch TC pedit (header re-write) actions") Fixes: `2f4fe4cab0` ("net/mlx5e: Add offloading of NIC TC pedit (header re-write) actions") Signed-off-by: Maor Dickman <maord@nvidia.com> Reviewed-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-11-05 12:17:05 -08:00
Sebastian Andrzej Siewior	5144368571	net: mlx5: Replace in_irq() usage mlx5_eq_async_int() uses in_irq() to decide whether eq::lock needs to be acquired and released with spin_[un]lock() or the irq saving/restoring variants. The usage of in_*() in drivers is phased out and Linus clearly requested that code which changes behaviour depending on context should either be seperated or the context be conveyed in an argument passed by the caller, which usually knows the context. mlx5_eq_async_int() knows the context via the action argument already so using it for the lock variant decision is a straight forward replacement for in_irq(). Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-11-05 12:09:31 -08:00
Saeed Mahameed	6c6132032d	net/mlx5: Cleanup kernel-doc warnings $ git ls-files *.[ch] \| egrep drivers/net/ethernet/mellanox/ \| \ xargs scripts/kernel-doc -none drivers/net/ethernet/mellanox/mlx5/core/fpga/sdk.h:57: warning: Enum value 'MLX5_FPGA_ACCESS_TYPE_I2C' not described ... drivers/net/ethernet/mellanox/mlx5/core/fpga/sdk.h:57: warning: Enum value 'MLX5_FPGA_ACCESS_TYPE_DONTCARE' not described ... drivers/net/ethernet/mellanox/mlx5/core/fpga/sdk.h:118: warning: Function parameter or member 'cb_arg' not described ... drivers/net/ethernet/mellanox/mlx5/core/fpga/sdk.h:160: warning: Function parameter or member 'conn' not described ... drivers/net/ethernet/mellanox/mlx5/core/fpga/sdk.h:160: warning: Excess function parameter 'fdev' description ... Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reported-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com>	2020-11-05 12:09:31 -08:00
Saeed Mahameed	7c36e785d6	net/mlx4: Cleanup kernel-doc warnings $ git ls-files *.[ch] \| egrep drivers/net/ethernet/mellanox/ \| \ xargs scripts/kernel-doc -none drivers/net/ethernet/mellanox/mlx4/fw_qos.h:144: warning: Function parameter or member 'in_param' not described ... drivers/net/ethernet/mellanox/mlx4/fw_qos.h:144: warning: Excess function parameter 'out_param' description ... Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reported-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com>	2020-11-05 12:09:30 -08:00
Vladyslav Tarasiuk	579524c6ea	net/mlx5e: Validate stop_room size upon user input Stop room is a space that may be taken by WQEs in the SQ during a packet transmit. It is used to check if next packet has enough room in the SQ. Stop room guarantees this packet can be served and if not, the queue is stopped, so no more packets are passed to the driver until it's ready. Currently, stop_room size is calculated and validated upon tx queues allocation. This makes it impossible to know if user provided valid input for certain parameters when interface is down. Instead, store stop_room in mlx5e_sq_param and create mlx5e_validate_params(), to validate its fields upon user input even when the interface is down. Signed-off-by: Vladyslav Tarasiuk <vladyslavt@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-11-05 12:09:30 -08:00
Yevgeny Kliteynik	284836d966	net/mlx5: DR, Free unused buddy ICM memory Track buddy's used ICM memory, and free it if all of the buddy's memory bacame unused. Do this only for STEs. MODIFY_ACTION buddies are much smaller, so in case there is a large amount of modify_header actions, which result in large amount of MODIFY_ACTION buddies, doing this cleanup during sync will result in performance hit while not freeing significant amount of memory. Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-11-05 12:09:30 -08:00
Yevgeny Kliteynik	1c58651412	net/mlx5: DR, ICM memory pools sync optimization Track the pool's hot ICM memory when freeing/allocating chunk, so that when checking if the sync is required, just check if the pool hot memory has reached the sync threshold. Signed-off-by: Hamdan Igbaria <hamdani@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-11-05 12:09:30 -08:00
Yevgeny Kliteynik	3eb1006a3b	net/mlx5: DR, Sync chunks only during free When freeing chunks, we want to sync the steering so that all the "hot" memory will be written to ICM and all the chunks that are in the hot_list will be actually destroyed. When allocating from the pool, we don't have a need to sync the steering, as we're not freeing anything, and sync might just hurt the performance in terms of flow-per-second offloaded. Signed-off-by: Erez Shitrit <erezsh@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-11-05 12:09:29 -08:00
Yevgeny Kliteynik	a00cd87880	net/mlx5: DR, Handle ICM memory via buddy allocation instead of buckets Till now in order to manage the ICM memory we used bucket mechanism, which kept a bucket per specified size (sizes were between 1 block to 2^21 blocks). Now changing that with buddy-system mechanism, which gives us much more flexible way to manage the ICM memory. Its biggest advantage over the bucket is by using the same ICM memory area for all the sizes of blocks, which reduces the memory consumption. Signed-off-by: Erez Shitrit <erezsh@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-11-05 12:09:29 -08:00
Yevgeny Kliteynik	3b72422dea	net/mlx5: DR, Add buddy allocator utilities Add implementation of SW Steering variation of buddy allocator. The buddy system for ICM memory uses 2 main data structures: - Bitmap per order, that keeps the current state of allocated blocks for this order - Indicator for the number of available blocks per each order Signed-off-by: Erez Shitrit <erezsh@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-11-05 12:09:29 -08:00
Yevgeny Kliteynik	8a8a102300	net/mlx5: DR, Rename matcher functions to be more HW agnostic Remove flex parser from the matcher function names since the matcher should not be aware of such HW specific details. Signed-off-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-11-05 12:09:28 -08:00
Yevgeny Kliteynik	de1facaf56	net/mlx5: DR, Rename builders HW specific names We will support multiple STE versions. The existing naming is not suitable for newer versions. Removed the HW specific details and renamed with a more general names. Signed-off-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-11-05 12:09:28 -08:00
Yevgeny Kliteynik	77662e75e0	net/mlx5: DR, Remove unused member of action struct Struct mlx5dr_action doesn't use this member Signed-off-by: Erez Shitrit <erezsh@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-11-05 12:09:28 -08:00
Jiri Pirko	803be1085d	mlxsw: spectrum_router: Introduce low-level ops and implement them for RALXX regs In preparation for support of XM router implementation which uses different registers to work with trees and FIB entries, introduce a structure to hold low-level ops and implement tree manipulation register ops. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-03 17:27:15 -08:00
Jiri Pirko	fb281f24f8	mlxsw: reg: Add XRALXX Registers Add a couple of registers used to manipulate LPM trees on XM: The XRALTA is used to allocate the XLT LPM trees. The XRALST is used to set and query the structure of an XLT LPM tree. The XRALTB register is used to bind virtual router and protocol to an allocated LPM tree. Since the XM registers are identical to the legacy router registers with a fixed offset, re-use their pack functions. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-03 17:27:15 -08:00
Tom Rix	c568db7fd0	net/mlx4_core : remove unneeded semicolon A semicolon is not needed after a switch statement. Signed-off-by: Tom Rix <trix@redhat.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20201101140528.2279424-1-trix@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-02 17:51:17 -08:00
Linus Torvalds	934291ffb6	Networking fixes for 5.10-rc2. Current release regressions: - r8169: fix forced threading conflicting with other shared interrupts; we tried to fix the use of raise_softirq_irqoff from an IRQ handler on RT by forcing hard irqs, but this driver shares legacy PCI IRQs so drop the _irqoff() instead - tipc: fix memory leak caused by a recent syzbot report fix to tipc_buf_append() Current release - bugs in new features: - devlink: Unlock on error in dumpit() and fix some error codes - net/smc: fix null pointer dereference in smc_listen_decline() Previous release - regressions: - tcp: Prevent low rmem stalls with SO_RCVLOWAT. - net: protect tcf_block_unbind with block lock - ibmveth: Fix use of ibmveth in a bridge; the self-imposed filtering to only send legal frames to the hypervisor was too strict - net: hns3: Clear the CMDQ registers before unmapping BAR region; incorrect cleanup order was leading to a crash - bnxt_en - handful of fixes to fixes: - Send HWRM_FUNC_RESET fw command unconditionally, even if there are PCIe errors being reported - Check abort error state in bnxt_open_nic(). - Invoke cancel_delayed_work_sync() for PFs also. - Fix regression in workqueue cleanup logic in bnxt_remove_one(). - mlxsw: Only advertise link modes supported by both driver and device, after removal of 56G support from the driver 56G was not cleared from advertised modes - net/smc: fix suppressed return code Previous release - always broken: - netem: fix zero division in tabledist, caused by integer overflow - bnxt_en: Re-write PCI BARs after PCI fatal error. - cxgb4: set up filter action after rewrites - net: ipa: command payloads already mapped Misc: - s390/ism: fix incorrect system EID, it's okay to change since it was added in current release - vsock: use ns_capable_noaudit() on socket create to suppress false positive audit messages Signed-off-by: Jakub Kicinski <kuba@kernel.org> -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAl+bGTcACgkQMUZtbf5S IrtMvxAAldlA7x22atOHJ2HMTqUGK3rlIQYgxlWJbfDnA7Ui4rZTDa/K0VkuS4ey rfaBf37XLDmzZkHgYvXG1qV2kB0MrXQqF7jJn+BNlAuM1kIsURt85Y2FxVu/+x6X wWtBgg/D77VXpeMimGcp8wBg5xFlUDdTezo+tInSuY9ahi1dUQx3ZSBTgqz3a5Vn wUwD7U0wkBEHkZFeLE6u0tdN9wY8IHH6cbMfzfnPxxIv6VVUOcQcvbomc+reEPhH vxeCHg7tK3yxbe9cPEbuwVDpoapB8Y627rv08Njhfuxx6Yysp/OOvUNRIBeD/7Gi TiZc6RMQ9XZ9QoGueaxFVSFIGRpRIQiO/gh+O5lWVX8dGsIjlKnw2E8gWmSS48YP cMAez0Fe+CJ2S2QNFbGVyJJX6xOl5h6kQaf88OiEhudpEUgyz156MNVwbJnE4fYk 8GONCIea1hNjLQ1VUfcQEYdxChWVeAoUEZIFcK2YKA+1w9Ris6hV21j/aUxYXQRt RGOALFUtCRIEX28ZW8eEyXgp1EdUvp7qcIK5YZEF6YHWlRxQ8LkU6qhD7Mm2oqkE fydoMDz9TEBaWqFtpgQmZH76JYqd7btCsR2YPwnlKmcKQ3tEKtW0NKt1QH/DKcvm nmDA6A+52XSbar1sRlVPnr3IGfodqGQ3A35sVFS8jkcmMvDRlbk= =reLi -----END PGP SIGNATURE----- Merge tag 'net-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Current release regressions: - r8169: fix forced threading conflicting with other shared interrupts; we tried to fix the use of raise_softirq_irqoff from an IRQ handler on RT by forcing hard irqs, but this driver shares legacy PCI IRQs so drop the _irqoff() instead - tipc: fix memory leak caused by a recent syzbot report fix to tipc_buf_append() Current release - bugs in new features: - devlink: Unlock on error in dumpit() and fix some error codes - net/smc: fix null pointer dereference in smc_listen_decline() Previous release - regressions: - tcp: Prevent low rmem stalls with SO_RCVLOWAT. - net: protect tcf_block_unbind with block lock - ibmveth: Fix use of ibmveth in a bridge; the self-imposed filtering to only send legal frames to the hypervisor was too strict - net: hns3: Clear the CMDQ registers before unmapping BAR region; incorrect cleanup order was leading to a crash - bnxt_en - handful of fixes to fixes: - Send HWRM_FUNC_RESET fw command unconditionally, even if there are PCIe errors being reported - Check abort error state in bnxt_open_nic(). - Invoke cancel_delayed_work_sync() for PFs also. - Fix regression in workqueue cleanup logic in bnxt_remove_one(). - mlxsw: Only advertise link modes supported by both driver and device, after removal of 56G support from the driver 56G was not cleared from advertised modes - net/smc: fix suppressed return code Previous release - always broken: - netem: fix zero division in tabledist, caused by integer overflow - bnxt_en: Re-write PCI BARs after PCI fatal error. - cxgb4: set up filter action after rewrites - net: ipa: command payloads already mapped Misc: - s390/ism: fix incorrect system EID, it's okay to change since it was added in current release - vsock: use ns_capable_noaudit() on socket create to suppress false positive audit messages" * tag 'net-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (36 commits) r8169: fix issue with forced threading in combination with shared interrupts netem: fix zero division in tabledist ibmvnic: fix ibmvnic_set_mac mptcp: add missing memory scheduling in the rx path tipc: fix memory leak caused by tipc_buf_append() gtp: fix an use-before-init in gtp_newlink() net: protect tcf_block_unbind with block lock ibmveth: Fix use of ibmveth in a bridge. net/sched: act_mpls: Add softdep on mpls_gso.ko ravb: Fix bit fields checking in ravb_hwtstamp_get() devlink: Unlock on error in dumpit() devlink: Fix some error codes chelsio/chtls: fix memory leaks in CPL handlers chelsio/chtls: fix deadlock issue net: hns3: Clear the CMDQ registers before unmapping BAR region bnxt_en: Send HWRM_FUNC_RESET fw command unconditionally. bnxt_en: Check abort error state in bnxt_open_nic(). bnxt_en: Re-write PCI BARs after PCI fatal error. bnxt_en: Invoke cancel_delayed_work_sync() for PFs also. bnxt_en: Fix regression in workqueue cleanup logic in bnxt_remove_one(). ...	2020-10-29 12:55:02 -07:00
Amit Cohen	0daf2bf5a2	mlxsw: core: Fix use-after-free in mlxsw_emad_trans_finish() Each EMAD transaction stores the skb used to issue the EMAD request ('trans->tx_skb') so that the request could be retried in case of a timeout. The skb can be freed when a corresponding response is received or as part of the retry logic (e.g., failed retransmit, exceeded maximum number of retries). The two tasks (i.e., response processing and retransmits) are synchronized by the atomic 'trans->active' field which ensures that responses to inactive transactions are ignored. In case of a failed retransmit the transaction is finished and all of its resources are freed. However, the current code does not mark it as inactive. Syzkaller was able to hit a race condition in which a concurrent response is processed while the transaction's resources are being freed, resulting in a use-after-free [1]. Fix the issue by making sure to mark the transaction as inactive after a failed retransmit and free its resources only if a concurrent task did not already do that. [1] BUG: KASAN: use-after-free in consume_skb+0x30/0x370 net/core/skbuff.c:833 Read of size 4 at addr ffff88804f570494 by task syz-executor.0/1004 CPU: 0 PID: 1004 Comm: syz-executor.0 Not tainted 5.8.0-rc7+ #68 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0xf6/0x16e lib/dump_stack.c:118 print_address_description.constprop.0+0x1c/0x250 mm/kasan/report.c:383 __kasan_report mm/kasan/report.c:513 [inline] kasan_report.cold+0x1f/0x37 mm/kasan/report.c:530 check_memory_region_inline mm/kasan/generic.c:186 [inline] check_memory_region+0x14e/0x1b0 mm/kasan/generic.c:192 instrument_atomic_read include/linux/instrumented.h:56 [inline] atomic_read include/asm-generic/atomic-instrumented.h:27 [inline] refcount_read include/linux/refcount.h:147 [inline] skb_unref include/linux/skbuff.h:1044 [inline] consume_skb+0x30/0x370 net/core/skbuff.c:833 mlxsw_emad_trans_finish+0x64/0x1c0 drivers/net/ethernet/mellanox/mlxsw/core.c:592 mlxsw_emad_process_response drivers/net/ethernet/mellanox/mlxsw/core.c:651 [inline] mlxsw_emad_rx_listener_func+0x5c9/0xac0 drivers/net/ethernet/mellanox/mlxsw/core.c:672 mlxsw_core_skb_receive+0x4df/0x770 drivers/net/ethernet/mellanox/mlxsw/core.c:2063 mlxsw_pci_cqe_rdq_handle drivers/net/ethernet/mellanox/mlxsw/pci.c:595 [inline] mlxsw_pci_cq_tasklet+0x12a6/0x2520 drivers/net/ethernet/mellanox/mlxsw/pci.c:651 tasklet_action_common.isra.0+0x13f/0x3e0 kernel/softirq.c:550 __do_softirq+0x223/0x964 kernel/softirq.c:292 asm_call_on_stack+0x12/0x20 arch/x86/entry/entry_64.S:711 Allocated by task 1006: save_stack+0x1b/0x40 mm/kasan/common.c:48 set_track mm/kasan/common.c:56 [inline] __kasan_kmalloc mm/kasan/common.c:494 [inline] __kasan_kmalloc.constprop.0+0xc2/0xd0 mm/kasan/common.c:467 slab_post_alloc_hook mm/slab.h:586 [inline] slab_alloc_node mm/slub.c:2824 [inline] slab_alloc mm/slub.c:2832 [inline] kmem_cache_alloc+0xcd/0x2e0 mm/slub.c:2837 __build_skb+0x21/0x60 net/core/skbuff.c:311 __netdev_alloc_skb+0x1e2/0x360 net/core/skbuff.c:464 netdev_alloc_skb include/linux/skbuff.h:2810 [inline] mlxsw_emad_alloc drivers/net/ethernet/mellanox/mlxsw/core.c:756 [inline] mlxsw_emad_reg_access drivers/net/ethernet/mellanox/mlxsw/core.c:787 [inline] mlxsw_core_reg_access_emad+0x1ab/0x1420 drivers/net/ethernet/mellanox/mlxsw/core.c:1817 mlxsw_reg_trans_query+0x39/0x50 drivers/net/ethernet/mellanox/mlxsw/core.c:1831 mlxsw_sp_sb_pm_occ_clear drivers/net/ethernet/mellanox/mlxsw/spectrum_buffers.c:260 [inline] mlxsw_sp_sb_occ_max_clear+0xbff/0x10a0 drivers/net/ethernet/mellanox/mlxsw/spectrum_buffers.c:1365 mlxsw_devlink_sb_occ_max_clear+0x76/0xb0 drivers/net/ethernet/mellanox/mlxsw/core.c:1037 devlink_nl_cmd_sb_occ_max_clear_doit+0x1ec/0x280 net/core/devlink.c:1765 genl_family_rcv_msg_doit net/netlink/genetlink.c:669 [inline] genl_family_rcv_msg net/netlink/genetlink.c:714 [inline] genl_rcv_msg+0x617/0x980 net/netlink/genetlink.c:731 netlink_rcv_skb+0x152/0x440 net/netlink/af_netlink.c:2470 genl_rcv+0x24/0x40 net/netlink/genetlink.c:742 netlink_unicast_kernel net/netlink/af_netlink.c:1304 [inline] netlink_unicast+0x53a/0x750 net/netlink/af_netlink.c:1330 netlink_sendmsg+0x850/0xd90 net/netlink/af_netlink.c:1919 sock_sendmsg_nosec net/socket.c:651 [inline] sock_sendmsg+0x150/0x190 net/socket.c:671 ____sys_sendmsg+0x6d8/0x840 net/socket.c:2359 ___sys_sendmsg+0xff/0x170 net/socket.c:2413 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2446 do_syscall_64+0x56/0xa0 arch/x86/entry/common.c:384 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Freed by task 73: save_stack+0x1b/0x40 mm/kasan/common.c:48 set_track mm/kasan/common.c:56 [inline] kasan_set_free_info mm/kasan/common.c:316 [inline] __kasan_slab_free+0x12c/0x170 mm/kasan/common.c:455 slab_free_hook mm/slub.c:1474 [inline] slab_free_freelist_hook mm/slub.c:1507 [inline] slab_free mm/slub.c:3072 [inline] kmem_cache_free+0xbe/0x380 mm/slub.c:3088 kfree_skbmem net/core/skbuff.c:622 [inline] kfree_skbmem+0xef/0x1b0 net/core/skbuff.c:616 __kfree_skb net/core/skbuff.c:679 [inline] consume_skb net/core/skbuff.c:837 [inline] consume_skb+0xe1/0x370 net/core/skbuff.c:831 mlxsw_emad_trans_finish+0x64/0x1c0 drivers/net/ethernet/mellanox/mlxsw/core.c:592 mlxsw_emad_transmit_retry.isra.0+0x9d/0xc0 drivers/net/ethernet/mellanox/mlxsw/core.c:613 mlxsw_emad_trans_timeout_work+0x43/0x50 drivers/net/ethernet/mellanox/mlxsw/core.c:625 process_one_work+0xa3e/0x17a0 kernel/workqueue.c:2269 worker_thread+0x9e/0x1050 kernel/workqueue.c:2415 kthread+0x355/0x470 kernel/kthread.c:291 ret_from_fork+0x22/0x30 arch/x86/entry/entry_64.S:293 The buggy address belongs to the object at ffff88804f5703c0 which belongs to the cache skbuff_head_cache of size 224 The buggy address is located 212 bytes inside of 224-byte region [ffff88804f5703c0, ffff88804f5704a0) The buggy address belongs to the page: page:ffffea00013d5c00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 flags: 0x100000000000200(slab) raw: 0100000000000200 dead000000000100 dead000000000122 ffff88806c625400 raw: 0000000000000000 00000000000c000c 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff88804f570380: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb ffff88804f570400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb >ffff88804f570480: fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc ^ ffff88804f570500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff88804f570580: 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc fc Fixes: `caf7297e7a` ("mlxsw: core: Introduce support for asynchronous EMAD register access") Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-26 16:45:50 -07:00
Ido Schimmel	adc80b6cfe	mlxsw: core: Fix memory leak on module removal Free the devlink instance during the teardown sequence in the non-reload case to avoid the following memory leak. unreferenced object 0xffff888232895000 (size 2048): comm "modprobe", pid 1073, jiffies 4295568857 (age 164.871s) hex dump (first 32 bytes): 00 01 00 00 00 00 ad de 22 01 00 00 00 00 ad de ........"....... 10 50 89 32 82 88 ff ff 10 50 89 32 82 88 ff ff .P.2.....P.2.... backtrace: [<00000000c704e9a6>] __kmalloc+0x13a/0x2a0 [<00000000ee30129d>] devlink_alloc+0xff/0x760 [<0000000092ab3e5d>] 0xffffffffa042e5b0 [<000000004f3f8a31>] 0xffffffffa042f6ad [<0000000092800b4b>] 0xffffffffa0491df3 [<00000000c4843903>] local_pci_probe+0xcb/0x170 [<000000006993ded7>] pci_device_probe+0x2c2/0x4e0 [<00000000a8e0de75>] really_probe+0x2c5/0xf90 [<00000000d42ba75d>] driver_probe_device+0x1eb/0x340 [<00000000bcc95e05>] device_driver_attach+0x294/0x300 [<000000000e2bc177>] __driver_attach+0x167/0x2f0 [<000000007d44cd6e>] bus_for_each_dev+0x148/0x1f0 [<000000003cd5a91e>] driver_attach+0x45/0x60 [<000000000041ce51>] bus_add_driver+0x3b8/0x720 [<00000000f5215476>] driver_register+0x230/0x4e0 [<00000000d79356f5>] __pci_register_driver+0x190/0x200 Fixes: `a22712a962` ("mlxsw: core: Fix devlink unregister flow") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reported-by: Vadim Pasternak <vadimp@nvidia.com> Tested-by: Oleksandr Shamray <oleksandrs@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-26 16:45:50 -07:00
Amit Cohen	1601559be3	mlxsw: Only advertise link modes supported by both driver and device During port creation the driver instructs the device to advertise all the supported link modes queried from the device. Since cited commit not all the link modes supported by the device are supported by the driver. This can result in the device negotiating a link mode that is not recognized by the driver causing ethtool to show an unsupported speed: $ ethtool swp1 ... Speed: Unknown! This is especially problematic when the netdev is enslaved to a bond, as the bond driver uses unknown speed as an indication that the link is down: [13048.900895] net_ratelimit: 86 callbacks suppressed [13048.900902] t_bond0: (slave swp52): failed to get link speed/duplex [13048.912160] t_bond0: (slave swp49): failed to get link speed/duplex Fix this by making sure that only link modes that are supported by both the device and the driver are advertised. Fixes: `b97cd89126` ("mlxsw: Remove 56G speed support") Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-26 16:45:50 -07:00
Parav Pandit	fbdd0049d9	RDMA/mlx5: Fix devlink deadlock on net namespace deletion When a mlx5 core devlink instance is reloaded in different net namespace, its associated IB device is deleted and recreated. Example sequence is: $ ip netns add foo $ devlink dev reload pci/0000:00:08.0 netns foo $ ip netns del foo mlx5 IB device needs to attach and detach the netdevice to it through the netdev notifier chain during load and unload sequence. A below call graph of the unload flow. cleanup_net() down_read(&pernet_ops_rwsem); <- first sem acquired ops_pre_exit_list() pre_exit() devlink_pernet_pre_exit() devlink_reload() mlx5_devlink_reload_down() mlx5_unload_one() [...] mlx5_ib_remove() mlx5_ib_unbind_slave_port() mlx5_remove_netdev_notifier() unregister_netdevice_notifier() down_write(&pernet_ops_rwsem);<- recurrsive lock Hence, when net namespace is deleted, mlx5 reload results in deadlock. When deadlock occurs, devlink mutex is also held. This not only deadlocks the mlx5 device under reload, but all the processes which attempt to access unrelated devlink devices are deadlocked. Hence, fix this by mlx5 ib driver to register for per net netdev notifier instead of global one, which operats on the net namespace without holding the pernet_ops_rwsem. Fixes: `4383cfcc65` ("net/mlx5: Add devlink reload") Link: https://lore.kernel.org/r/20201026134359.23150-1-parav@nvidia.com Signed-off-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-10-26 19:18:19 -03:00
Linus Torvalds	a1e16bc7d5	RDMA 5.10 pull request The typical set of driver updates across the subsystem: - Driver minor changes and bug fixes for mlx5, efa, rxe, vmw_pvrdma, hns, usnic, qib, qedr, cxgb4, hns, bnxt_re - Various rtrs fixes and updates - Bug fix for mlx4 CM emulation for virtualization scenarios where MRA wasn't working right - Use tracepoints instead of pr_debug in the CM code - Scrub the locking in ucma and cma to close more syzkaller bugs - Use tasklet_setup in the subsystem - Revert the idea that 'destroy' operations are not allowed to fail at the driver level. This proved unworkable from a HW perspective. - Revise how the umem API works so drivers make fewer mistakes using it - XRC support for qedr - Convert uverbs objects RWQ and MW to new the allocation scheme - Large queue entry sizes for hns - Use hmm_range_fault() for mlx5 On Demand Paging - uverbs APIs to inspect the GID table instead of sysfs - Move some of the RDMA code for building large page SGLs into lib/scatterlist -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAl+J37MACgkQOG33FX4g mxrKfRAAnIecwdE8df0yvVU5k0Eg6qVjMy9MMHq4va9m7g6GpUcNNI0nIlOASxH2 l+9vnUQS3ebgsPeECaDYzEr0hh/u53+xw2g4WV5ts/hE8KkQ6erruXb9kasCe8yi 5QWJ9K36T3c03Cd3EeH6JVtytAxuH42ombfo9BkFLPVyfG/R2tsAzvm5pVi73lxk 46wtU1Bqi4tsLhyCbifn1huNFGbHp08OIBPAIKPUKCA+iBRPaWS+Dpi+93h3g3Bp oJwDhL9CBCGcHM+rKWLzek3Dy87FnQn7R1wmTpUFwkK+4AH3U/XazivhX035w1vL YJyhakVU0kosHlX9hJTNKDHJGkt0YEV2mS8dxAuqilFBtdnrVszb5/MirvlzC310 /b5xCPSEusv9UVZV0G4zbySVNA9knZ4YaRiR3VDVMLKl/pJgTOwEiHIIx+vs3ejk p8GRWa1SjXw5LfZEQcq39J689ljt6xjCTonyuBSv7vSQq5v8pjBxvHxiAe2FIa2a ZyZeSCYoSh0SwJQukO2VO7aprhHP3TcCJ/987+X03LQ8tV2VWPktHqm62YCaDcOl fgiQuQdPivRjDDkJgMfDWDGKfZeHoWLKl5XsJhWByt0lablVrsvc+8ylUl1UI7gI 16hWB/Qtlhfwg10VdApn+aOFpIS+s5P4XIp8ik57MZO+VeJzpmE= =LKpl -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma Pull rdma updates from Jason Gunthorpe: "A usual cycle for RDMA with a typical mix of driver and core subsystem updates: - Driver minor changes and bug fixes for mlx5, efa, rxe, vmw_pvrdma, hns, usnic, qib, qedr, cxgb4, hns, bnxt_re - Various rtrs fixes and updates - Bug fix for mlx4 CM emulation for virtualization scenarios where MRA wasn't working right - Use tracepoints instead of pr_debug in the CM code - Scrub the locking in ucma and cma to close more syzkaller bugs - Use tasklet_setup in the subsystem - Revert the idea that 'destroy' operations are not allowed to fail at the driver level. This proved unworkable from a HW perspective. - Revise how the umem API works so drivers make fewer mistakes using it - XRC support for qedr - Convert uverbs objects RWQ and MW to new the allocation scheme - Large queue entry sizes for hns - Use hmm_range_fault() for mlx5 On Demand Paging - uverbs APIs to inspect the GID table instead of sysfs - Move some of the RDMA code for building large page SGLs into lib/scatterlist" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (191 commits) RDMA/ucma: Fix use after free in destroy id flow RDMA/rxe: Handle skb_clone() failure in rxe_recv.c RDMA/rxe: Move the definitions for rxe_av.network_type to uAPI RDMA: Explicitly pass in the dma_device to ib_register_device lib/scatterlist: Do not limit max_segment to PAGE_ALIGNED values IB/mlx4: Convert rej_tmout radix-tree to XArray RDMA/rxe: Fix bug rejecting all multicast packets RDMA/rxe: Fix skb lifetime in rxe_rcv_mcast_pkt() RDMA/rxe: Remove duplicate entries in struct rxe_mr IB/hfi,rdmavt,qib,opa_vnic: Update MAINTAINERS IB/rdmavt: Fix sizeof mismatch MAINTAINERS: CISCO VIC LOW LATENCY NIC DRIVER RDMA/bnxt_re: Fix sizeof mismatch for allocation of pbl_tbl. RDMA/bnxt_re: Use rdma_umem_for_each_dma_block() RDMA/umem: Move to allocate SG table from pages lib/scatterlist: Add support in dynamic allocation of SG table from pages tools/testing/scatterlist: Show errors in human readable form tools/testing/scatterlist: Rejuvenate bit-rotten test RDMA/ipoib: Set rtnl_link_ops for ipoib interfaces RDMA/uverbs: Expose the new GID query API to user space ...	2020-10-17 11:18:18 -07:00
Jason Gunthorpe	16e7483e6f	Merge branch 'dynamic_sg' into rdma.git for-next From Maor Gottlieb says: ==================== This series extends __sg_alloc_table_from_pages to allow chaining of new pages to an already initialized SG table. This allows for drivers to utilize the optimization of merging contiguous pages without a need to pre allocate all the pages and hold them in a very large temporary buffer prior to the call to SG table initialization. The last patch changes the Infiniband core to use the new API. It removes duplicate functionality from the code and benefits from the optimization of allocating dynamic SG table from pages. In huge pages system of 2MB page size, without this change, the SG table would contain x512 SG entries. ==================== * branch 'dynamic_sg': RDMA/umem: Move to allocate SG table from pages lib/scatterlist: Add support in dynamic allocation of SG table from pages tools/testing/scatterlist: Show errors in human readable form tools/testing/scatterlist: Rejuvenate bit-rotten test	2020-10-16 12:40:58 -03:00
Linus Torvalds	9ff9b0d392	networking changes for the 5.10 merge window Add redirect_neigh() BPF packet redirect helper, allowing to limit stack traversal in common container configs and improving TCP back-pressure. Daniel reports ~10Gbps => ~15Gbps single stream TCP performance gain. Expand netlink policy support and improve policy export to user space. (Ge)netlink core performs request validation according to declared policies. Expand the expressiveness of those policies (min/max length and bitmasks). Allow dumping policies for particular commands. This is used for feature discovery by user space (instead of kernel version parsing or trial and error). Support IGMPv3/MLDv2 multicast listener discovery protocols in bridge. Allow more than 255 IPv4 multicast interfaces. Add support for Type of Service (ToS) reflection in SYN/SYN-ACK packets of TCPv6. In Multi-patch TCP (MPTCP) support concurrent transmission of data on multiple subflows in a load balancing scenario. Enhance advertising addresses via the RM_ADDR/ADD_ADDR options. Support SMC-Dv2 version of SMC, which enables multi-subnet deployments. Allow more calls to same peer in RxRPC. Support two new Controller Area Network (CAN) protocols - CAN-FD and ISO 15765-2:2016. Add xfrm/IPsec compat layer, solving the 32bit user space on 64bit kernel problem. Add TC actions for implementing MPLS L2 VPNs. Improve nexthop code - e.g. handle various corner cases when nexthop objects are removed from groups better, skip unnecessary notifications and make it easier to offload nexthops into HW by converting to a blocking notifier. Support adding and consuming TCP header options by BPF programs, opening the doors for easy experimental and deployment-specific TCP option use. Reorganize TCP congestion control (CC) initialization to simplify life of TCP CC implemented in BPF. Add support for shipping BPF programs with the kernel and loading them early on boot via the User Mode Driver mechanism, hence reusing all the user space infra we have. Support sleepable BPF programs, initially targeting LSM and tracing. Add bpf_d_path() helper for returning full path for given 'struct path'. Make bpf_tail_call compatible with bpf-to-bpf calls. Allow BPF programs to call map_update_elem on sockmaps. Add BPF Type Format (BTF) support for type and enum discovery, as well as support for using BTF within the kernel itself (current use is for pretty printing structures). Support listing and getting information about bpf_links via the bpf syscall. Enhance kernel interfaces around NIC firmware update. Allow specifying overwrite mask to control if settings etc. are reset during update; report expected max time operation may take to users; support firmware activation without machine reboot incl. limits of how much impact reset may have (e.g. dropping link or not). Extend ethtool configuration interface to report IEEE-standard counters, to limit the need for per-vendor logic in user space. Adopt or extend devlink use for debug, monitoring, fw update in many drivers (dsa loop, ice, ionic, sja1105, qed, mlxsw, mv88e6xxx, dpaa2-eth). In mlxsw expose critical and emergency SFP module temperature alarms. Refactor port buffer handling to make the defaults more suitable and support setting these values explicitly via the DCBNL interface. Add XDP support for Intel's igb driver. Support offloading TC flower classification and filtering rules to mscc_ocelot switches. Add PTP support for Marvell Octeontx2 and PP2.2 hardware, as well as fixed interval period pulse generator and one-step timestamping in dpaa-eth. Add support for various auth offloads in WiFi APs, e.g. SAE (WPA3) offload. Add Lynx PHY/PCS MDIO module, and convert various drivers which have this HW to use it. Convert mvpp2 to split PCS. Support Marvell Prestera 98DX3255 24-port switch ASICs, as well as 7-port Mediatek MT7531 IP. Add initial support for QCA6390 and IPQ6018 in ath11k WiFi driver, and wcn3680 support in wcn36xx. Improve performance for packets which don't require much offloads on recent Mellanox NICs by 20% by making multiple packets share a descriptor entry. Move chelsio inline crypto drivers (for TLS and IPsec) from the crypto subtree to drivers/net. Move MDIO drivers out of the phy directory. Clean up a lot of W=1 warnings, reportedly the actively developed subsections of networking drivers should now build W=1 warning free. Make sure drivers don't use in_interrupt() to dynamically adapt their code. Convert tasklets to use new tasklet_setup API (sadly this conversion is not yet complete). Signed-off-by: Jakub Kicinski <kuba@kernel.org> -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAl+ItRwACgkQMUZtbf5S IrtTMg//UxpdR/MirT1DatBU0K/UGAZY82hV7F/UC8tPgjfHZeHvWlDFxfi3YP81 PtPKbhRZ7DhwBXefUp6nY3UdvjftrJK2lJm8prJUPSsZRye8Wlcb7y65q7/P2y2U Efucyopg6RUrmrM0DUsIGYGJgylQLHnMYUl/keCsD4t5Bp4ksyi9R2t5eitGoWzh r3QGdbSa0AuWx4iu0i+tqp6Tj0ekMBMXLVb35dtU1t0joj2KTNEnSgABN3prOa8E iWYf2erOau68Ogp3yU3miCy0ZU4p/7qGHTtzbcp677692P/ekak6+zmfHLT9/Pjy 2Stq2z6GoKuVxdktr91D9pA3jxG4LxSJmr0TImcGnXbvkMP3Ez3g9RrpV5fn8j6F mZCH8TKZAoD5aJrAJAMkhZmLYE1pvDa7KolSk8WogXrbCnTEb5Nv8FHTS1Qnk3yl wSKXuvutFVNLMEHCnWQLtODbTST9DI/aOi6EctPpuOA/ZyL1v3pl+gfp37S+LUTe owMnT/7TdvKaTD0+gIyU53M6rAWTtr5YyRQorX9awIu/4Ha0F0gYD7BJZQUGtegp HzKt59NiSrFdbSH7UdyemdBF4LuCgIhS7rgfeoUXMXmuPHq7eHXyHZt5dzPPa/xP 81P0MAvdpFVwg8ij2yp2sHS7sISIRKq17fd1tIewUabxQbjXqPc= =bc1U -----END PGP SIGNATURE----- Merge tag 'net-next-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: - Add redirect_neigh() BPF packet redirect helper, allowing to limit stack traversal in common container configs and improving TCP back-pressure. Daniel reports ~10Gbps => ~15Gbps single stream TCP performance gain. - Expand netlink policy support and improve policy export to user space. (Ge)netlink core performs request validation according to declared policies. Expand the expressiveness of those policies (min/max length and bitmasks). Allow dumping policies for particular commands. This is used for feature discovery by user space (instead of kernel version parsing or trial and error). - Support IGMPv3/MLDv2 multicast listener discovery protocols in bridge. - Allow more than 255 IPv4 multicast interfaces. - Add support for Type of Service (ToS) reflection in SYN/SYN-ACK packets of TCPv6. - In Multi-patch TCP (MPTCP) support concurrent transmission of data on multiple subflows in a load balancing scenario. Enhance advertising addresses via the RM_ADDR/ADD_ADDR options. - Support SMC-Dv2 version of SMC, which enables multi-subnet deployments. - Allow more calls to same peer in RxRPC. - Support two new Controller Area Network (CAN) protocols - CAN-FD and ISO 15765-2:2016. - Add xfrm/IPsec compat layer, solving the 32bit user space on 64bit kernel problem. - Add TC actions for implementing MPLS L2 VPNs. - Improve nexthop code - e.g. handle various corner cases when nexthop objects are removed from groups better, skip unnecessary notifications and make it easier to offload nexthops into HW by converting to a blocking notifier. - Support adding and consuming TCP header options by BPF programs, opening the doors for easy experimental and deployment-specific TCP option use. - Reorganize TCP congestion control (CC) initialization to simplify life of TCP CC implemented in BPF. - Add support for shipping BPF programs with the kernel and loading them early on boot via the User Mode Driver mechanism, hence reusing all the user space infra we have. - Support sleepable BPF programs, initially targeting LSM and tracing. - Add bpf_d_path() helper for returning full path for given 'struct path'. - Make bpf_tail_call compatible with bpf-to-bpf calls. - Allow BPF programs to call map_update_elem on sockmaps. - Add BPF Type Format (BTF) support for type and enum discovery, as well as support for using BTF within the kernel itself (current use is for pretty printing structures). - Support listing and getting information about bpf_links via the bpf syscall. - Enhance kernel interfaces around NIC firmware update. Allow specifying overwrite mask to control if settings etc. are reset during update; report expected max time operation may take to users; support firmware activation without machine reboot incl. limits of how much impact reset may have (e.g. dropping link or not). - Extend ethtool configuration interface to report IEEE-standard counters, to limit the need for per-vendor logic in user space. - Adopt or extend devlink use for debug, monitoring, fw update in many drivers (dsa loop, ice, ionic, sja1105, qed, mlxsw, mv88e6xxx, dpaa2-eth). - In mlxsw expose critical and emergency SFP module temperature alarms. Refactor port buffer handling to make the defaults more suitable and support setting these values explicitly via the DCBNL interface. - Add XDP support for Intel's igb driver. - Support offloading TC flower classification and filtering rules to mscc_ocelot switches. - Add PTP support for Marvell Octeontx2 and PP2.2 hardware, as well as fixed interval period pulse generator and one-step timestamping in dpaa-eth. - Add support for various auth offloads in WiFi APs, e.g. SAE (WPA3) offload. - Add Lynx PHY/PCS MDIO module, and convert various drivers which have this HW to use it. Convert mvpp2 to split PCS. - Support Marvell Prestera 98DX3255 24-port switch ASICs, as well as 7-port Mediatek MT7531 IP. - Add initial support for QCA6390 and IPQ6018 in ath11k WiFi driver, and wcn3680 support in wcn36xx. - Improve performance for packets which don't require much offloads on recent Mellanox NICs by 20% by making multiple packets share a descriptor entry. - Move chelsio inline crypto drivers (for TLS and IPsec) from the crypto subtree to drivers/net. Move MDIO drivers out of the phy directory. - Clean up a lot of W=1 warnings, reportedly the actively developed subsections of networking drivers should now build W=1 warning free. - Make sure drivers don't use in_interrupt() to dynamically adapt their code. Convert tasklets to use new tasklet_setup API (sadly this conversion is not yet complete). * tag 'net-next-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2583 commits) Revert "bpfilter: Fix build error with CONFIG_BPFILTER_UMH" net, sockmap: Don't call bpf_prog_put() on NULL pointer bpf, selftest: Fix flaky tcp_hdr_options test when adding addr to lo bpf, sockmap: Add locking annotations to iterator netfilter: nftables: allow re-computing sctp CRC-32C in 'payload' statements net: fix pos incrementment in ipv6_route_seq_next net/smc: fix invalid return code in smcd_new_buf_create() net/smc: fix valid DMBE buffer sizes net/smc: fix use-after-free of delayed events bpfilter: Fix build error with CONFIG_BPFILTER_UMH cxgb4/ch_ipsec: Replace the module name to ch_ipsec from chcr net: sched: Fix suspicious RCU usage while accessing tcf_tunnel_info bpf: Fix register equivalence tracking. rxrpc: Fix loss of final ack on shutdown rxrpc: Fix bundle counting for exclusive connections netfilter: restore NF_INET_NUMHOOKS ibmveth: Identify ingress large send packets. ibmveth: Switch order of ibmveth_helper calls. cxgb4: handle 4-tuple PEDIT to NAT mode translation selftests: Add VRF route leaking tests ...	2020-10-15 18:42:13 -07:00
Jakub Kicinski	2295cddf99	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Minor conflicts in net/mptcp/protocol.h and tools/testing/selftests/net/Makefile. In both cases code was added on both sides in the same place so just keep both. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-15 12:43:21 -07:00
Linus Torvalds	3e4fb4346c	SPDX patches for 5.10-rc1 Here are some SPDX-specific changes for 5.10-rc1. They include: - driver fixes to make spdxcheck.pl work properly - add GFDL licenses as "deprecated" but required due to some of our documentation using them - add Zlib license as "deprecated" but required because we have code with this license in the tree. - convert some drivers to have SPDX identifiers that previously didn't have them. All have been in linux-next for a very long time with no reported issues. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCX4c6oA8cZ3JlZ0Brcm9h aC5jb20ACgkQMUfUDdst+yl35ACg2i+pP5CBExSzQUtA1Tx/UD2CVNMAoIAQChwj SHZurDuyHkEiCdB+5n1u =C9qR -----END PGP SIGNATURE----- Merge tag 'spdx-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx Pull SPDX updates from Greg KH: "Here are some SPDX-specific changes for 5.10-rc1. They include: - driver fixes to make spdxcheck.pl work properly - add GFDL licenses as "deprecated" but required due to some of our documentation using them - add Zlib license as "deprecated" but required because we have code with this license in the tree. - convert some drivers to have SPDX identifiers that previously didn't have them. All have been in linux-next for a very long time with no reported issues" * tag 'spdx-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx: scripts/spdxcheck.py: handle license identifiers in XML comments net/mlx5: IPsec: make spdxcheck.py happy LICENSES/deprecated: add Zlib license text LICENSE: add GFDL deprecated licenses net/qla3xxx: Convert to SPDX license identifiers net/qlge: Convert to SPDX license identifiers net/qlcnic: Convert to SPDX license identifiers scsi/qla2xxx: Convert to SPDX license identifiers scsi/qla4xxx: Convert to SPDX license identifiers	2020-10-14 16:19:42 -07:00
Raed Salem	5be019040c	net/mlx5e: IPsec: Add Connect-X IPsec Tx data path offload In the TX data path, spot packets with xfrm stack IPsec offload indication. Fill Software-Parser segment in TX descriptor so that the hardware may parse the ESP protocol, and perform TX checksum offload on the inner payload. Support GSO, by providing the trailer data and ICV placeholder so HW can fill it post encryption operation. Padding alignment cannot be performed in HW (ConnectX-6Dx) due to a bug. Software can overcome this limitation by adding NETIF_F_HW_ESP to the gso_partial_features field in netdev so the packets being aligned by the stack. l4_inner_checksum cannot be offloaded by HW for IPsec tunnel type packet. Note that for GSO SKBs, the stack does not include an ESP trailer, unlike the non-GSO case. Below is the iperf3 performance report on two server of 24 cores Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz with ConnectX6-DX. All the bandwidth test uses iperf3 TCP traffic with packet size 128KB. Each tunnel uses one iperf3 stream with one thread (option -P1). TX crypto offload shows improvements on both bandwidth and CPU utilization. ---------------------------------------------------------------------- Mode \| Num tunnel \| BW \| Send CPU util \| Recv CPU util \| \| (Gbps) \| (Average %) \| (Average %) ---------------------------------------------------------------------- Cryto offload \| \| \| \| (RX only) \| 1 \| 4.7 \| 4.2 \| 3.5 ---------------------------------------------------------------------- Cryto offload \| \| \| \| (RX only) \| 24 \| 15.6 \| 20 \| 10 ---------------------------------------------------------------------- Non-offload \| 1 \| 4.6 \| 4 \| 5 ---------------------------------------------------------------------- Non-offload \| 24 \| 11.9 \| 16 \| 12 ---------------------------------------------------------------------- Cryto offload \| \| \| \| (TX & RX) \| 1 \| 11.9 \| 2.1 \| 5.9 ---------------------------------------------------------------------- Cryto offload \| \| \| \| (TX & RX) \| 24 \| 38 \| 9.5 \| 27.5 ---------------------------------------------------------------------- Cryto offload \| \| \| \| (TX only) \| 1 \| 4.7 \| 0.7 \| 5 ---------------------------------------------------------------------- Cryto offload \| \| \| \| (TX only) \| 24 \| 14.5 \| 6 \| 20 Regression tests show no degradation on non-ipsec and non-offload-ipsec traffics. The packet rate test uses pktgen UDP to transmit on single CPU, the instructions and cycles are measured on the transmit CPU. before: ---------------------------------------------------------------------- Non-offload \| 1 \| 4.7 \| 4.2 \| 5.1 ---------------------------------------------------------------------- Non-offload \| 24 \| 11.2 \| 14 \| 15 ---------------------------------------------------------------------- Non-ipsec \| 1 \| 28 \| 4 \| 5.7 ---------------------------------------------------------------------- Non-ipsec \| 24 \| 68.3 \| 17.8 \| 39.7 ---------------------------------------------------------------------- Non-ipsec packet rate(BURST=1000 BC=5 NCPUS=1 SIZE=60) 13.56Mpps, 456 instructions/pkt, 191 cycles/pkt after: ---------------------------------------------------------------------- Non-offload \| 1 \| 4.69 \| 4.2 \| 5 ---------------------------------------------------------------------- Non-offload \| 24 \| 11.9 \| 13.5 \| 15.1 ---------------------------------------------------------------------- Non-ipsec \| 1 \| 29 \| 3.2 \| 5.5 ---------------------------------------------------------------------- Non-ipsec \| 24 \| 68.2 \| 18.5 \| 39.8 ---------------------------------------------------------------------- Non-ipsec packet rate: 13.56Mpps, 472 instructions/pkt, 191 cycles/pkt Signed-off-by: Raed Salem <raeds@mellanox.com> Signed-off-by: Huy Nguyen <huyn@mellanox.com> Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-10-12 15:37:45 -07:00
Huy Nguyen	9b9d454ddb	net/mlx5e: IPsec: Add TX steering rule per IPsec state Add new FTE in TX IPsec FT per IPsec state. It has the same matching criteria as the RX steering rule. The IPsec FT is created/destroyed when the first/last rule is added/deleted respectively. Signed-off-by: Huy Nguyen <huyn@mellanox.com> Reviewed-by: Boris Pismenny <borisp@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-10-12 15:37:45 -07:00
Huy Nguyen	ee92e4f1f9	net/mlx5: Add NIC TX domain namespace Add new namespace that represents the NIC TX domain. Signed-off-by: Huy Nguyen <huyn@mellanox.com> Signed-off-by: Raed Salem <raeds@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-10-12 15:37:44 -07:00
Colin Ian King	825f8b0b17	net/mlx5: Fix uininitialized pointer read on pointer attr Currently the error exit path err_free kfree's attr. In the case where flow and parse_attr failed to be allocated this return path will free the uninitialized pointer attr, which is not correct. In the other case where attr fails to allocate attr does not need to be freed. So in both error exits via err_free attr should not be freed, so remove it. Addresses-Coverity: ("Uninitialized pointer read") Fixes: `ff7ea04ad5` ("net/mlx5e: Fix potential null pointer dereference") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-10-12 15:37:44 -07:00
Jonathan Lemon	b2b8a92733	mlx4: handle non-napi callers to napi_poll netcons calls napi_poll with a budget of 0 to transmit packets. Handle this by: - skipping RX processing - do not try to recycle TX packets to the RX cache Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-12 14:02:32 -07:00
Moshe Shemesh	bef878e865	net/mlx5: Add support for devlink reload limit no reset Add support for devlink reload action fw_activate with reload limit no_reset which does firmware live patching, updating the firmware image without reset, no downtime and no configuration lose. The driver checks if the firmware is capable of handling the pending firmware changes as a live patch. If it is then it triggers firmware live patching flow. Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-09 12:06:53 -07:00
Moshe Shemesh	2d69356752	net/mlx5: Add support for fw live patch event Firmware live patch event notifies the driver that the firmware was just updated using live patch. In such case the driver should not reload or re-initiate entities, part to updating the firmware version and re-initiate the firmware tracer which can be updated by live patch with new strings database to help debugging an issue. Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-09 12:06:53 -07:00
Moshe Shemesh	b4f7cbb367	net/mlx5: Add devlink param enable_remote_dev_reset support The enable_remote_dev_reset devlink param flags that the host admin allows resets by other hosts. In case it is cleared mlx5 host PF driver will send NACK on pci sync for firmware update reset request and the command will fail. By default enable_remote_dev_reset parameter is true, so pci sync for firmware update reset is enabled. Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-09 12:06:53 -07:00
Moshe Shemesh	5ec697446f	net/mlx5: Add support for devlink reload action fw activate Add support for devlink reload action fw_activate. To activate firmware image the mlx5 driver resets the firmware and reloads it from flash. If a new image was stored on flash it will be loaded. Once this reload command is executed the driver initiates fw sync reset flow, where the firmware synchronizes all PFs on coming reset and driver reload. Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-09 12:06:53 -07:00
Moshe Shemesh	7dd6df329d	net/mlx5: Handle sync reset abort event If firmware sends sync_reset_abort to driver the driver should clear the reset requested mode as reset is not expected any more. Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-09 12:06:53 -07:00
Moshe Shemesh	eabe8e5e88	net/mlx5: Handle sync reset now event On sync_reset_now event the driver does reload and PCI link toggle to activate firmware upgrade reset. When the firmware sends this event it syncs the event on all PFs, so all PFs will do PCI link toggle at once. To do PCI link toggle, the driver ensures that no other device ID under the same bridge by checking that all the PF functions under the same PCI bridge have same device ID. If no other device it uses PCI bridge link control to turn link down and up. Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-09 12:06:52 -07:00
Moshe Shemesh	38b9f903f2	net/mlx5: Handle sync reset request event Once the driver gets sync_reset_request from firmware it prepares for the coming reset and sends acknowledge. After getting this event the driver expects device reset, either it will trigger PCI reset on sync_reset_now event or such PCI reset will be triggered by another PF of the same device. So it moves to reset requested mode and if it gets PCI reset triggered by the other PF it detect the reset and reloads. Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-09 12:06:52 -07:00
Moshe Shemesh	e7f4d0bcb8	net/mlx5: Set cap for pci sync for fw update event Set capability to notify the firmware that this host driver is capable of handling pci sync for firmware update events. Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-09 12:06:52 -07:00
Moshe Shemesh	3180472f58	net/mlx5: Add functions to set/query MFRL register Add functions to query and set the MFRL reset options supported by firmware. Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-09 12:06:52 -07:00
Moshe Shemesh	dc64cc7c63	devlink: Add devlink reload limit option Add reload limit to demand restrictions on reload actions. Reload limits supported: no_reset: No reset allowed, no down time allowed, no link flap and no configuration is lost. By default reload limit is unspecified and so no constraints on reload actions are required. Some combinations of action and limit are invalid. For example, driver can not reinitialize its entities without any downtime. The no_reset reload limit will have usecase in this patchset to implement restricted fw_activate on mlx5. Have the uapi parameter of reload limit ready for future support of multiselection. Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-09 12:06:52 -07:00
Moshe Shemesh	ccdf07219d	devlink: Add reload action option to devlink reload command Add devlink reload action to allow the user to request a specific reload action. The action parameter is optional, if not specified then devlink driver re-init action is used (backward compatible). Note that when required to do firmware activation some drivers may need to reload the driver. On the other hand some drivers may need to reset the firmware to reinitialize the driver entities. Therefore, the devlink reload command returns the actions which were actually performed. Reload actions supported are: driver_reinit: driver entities re-initialization, applying devlink-param and devlink-resource values. fw_activate: firmware activate. command examples: $devlink dev reload pci/0000:82:00.0 action driver_reinit reload_actions_performed: driver_reinit $devlink dev reload pci/0000:82:00.0 action fw_activate reload_actions_performed: driver_reinit fw_activate Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-09 12:06:52 -07:00
David S. Miller	8b0308fe31	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Rejecting non-native endian BTF overlapped with the addition of support for it. The rest were more simple overlapping changes, except the renesas ravb binding update, which had to follow a file move as well as a YAML conversion. Signed-off-by: David S. Miller <davem@davemloft.net>	2020-10-05 18:40:01 -07:00
Vlad Buslov	1253935ad8	net/mlx5e: Fix race condition on nhe->n pointer in neigh update Current neigh update event handler implementation takes reference to neighbour structure, assigns it to nhe->n, tries to schedule workqueue task and releases the reference if task was already enqueued. This results potentially overwriting existing nhe->n pointer with another neighbour instance, which causes double release of the instance (once in neigh update handler that failed to enqueue to workqueue and another one in neigh update workqueue task that processes updated nhe->n pointer instead of original one): [ 3376.512806] ------------[ cut here ]------------ [ 3376.513534] refcount_t: underflow; use-after-free. [ 3376.521213] Modules linked in: act_skbedit act_mirred act_tunnel_key vxlan ip6_udp_tunnel udp_tunnel nfnetlink act_gact cls_flower sch_ingress openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 mlx5_ib mlx5_core mlxfw pci_hyperv_intf ptp pps_core nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache ib_isert iscsi_target_mod ib_srpt target_core_mod ib_srp rpcrdma rdma_ucm ib_umad ib_ipoib ib_iser rdma_cm ib_cm iw_cm rfkill ib_uverbs ib_core sunrpc kvm_intel kvm iTCO_wdt iTCO_vendor_support virtio_net irqbypass net_failover crc32_pclmul lpc_ich i2c_i801 failover pcspkr i2c_smbus mfd_core ghash_clmulni_intel sch_fq_codel drm i2c _core ip_tables crc32c_intel serio_raw [last unloaded: mlxfw] [ 3376.529468] CPU: 8 PID: 22756 Comm: kworker/u20:5 Not tainted 5.9.0-rc5+ #6 [ 3376.530399] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 [ 3376.531975] Workqueue: mlx5e mlx5e_rep_neigh_update [mlx5_core] [ 3376.532820] RIP: 0010:refcount_warn_saturate+0xd8/0xe0 [ 3376.533589] Code: ff 48 c7 c7 e0 b8 27 82 c6 05 0b b6 09 01 01 e8 94 93 c1 ff 0f 0b c3 48 c7 c7 88 b8 27 82 c6 05 f7 b5 09 01 01 e8 7e 93 c1 ff <0f> 0b c3 0f 1f 44 00 00 8b 07 3d 00 00 00 c0 74 12 83 f8 01 74 13 [ 3376.536017] RSP: 0018:ffffc90002a97e30 EFLAGS: 00010286 [ 3376.536793] RAX: 0000000000000000 RBX: ffff8882de30d648 RCX: 0000000000000000 [ 3376.537718] RDX: ffff8882f5c28f20 RSI: ffff8882f5c18e40 RDI: ffff8882f5c18e40 [ 3376.538654] RBP: ffff8882cdf56c00 R08: 000000000000c580 R09: 0000000000001a4d [ 3376.539582] R10: 0000000000000731 R11: ffffc90002a97ccd R12: 0000000000000000 [ 3376.540519] R13: ffff8882de30d600 R14: ffff8882de30d640 R15: ffff88821e000900 [ 3376.541444] FS: 0000000000000000(0000) GS:ffff8882f5c00000(0000) knlGS:0000000000000000 [ 3376.542732] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 3376.543545] CR2: 0000556e5504b248 CR3: 00000002c6f10005 CR4: 0000000000770ee0 [ 3376.544483] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 3376.545419] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 3376.546344] PKRU: 55555554 [ 3376.546911] Call Trace: [ 3376.547479] mlx5e_rep_neigh_update.cold+0x33/0xe2 [mlx5_core] [ 3376.548299] process_one_work+0x1d8/0x390 [ 3376.548977] worker_thread+0x4d/0x3e0 [ 3376.549631] ? rescuer_thread+0x3e0/0x3e0 [ 3376.550295] kthread+0x118/0x130 [ 3376.550914] ? kthread_create_worker_on_cpu+0x70/0x70 [ 3376.551675] ret_from_fork+0x1f/0x30 [ 3376.552312] ---[ end trace d84e8f46d2a77eec ]--- Fix the bug by moving work_struct to dedicated dynamically-allocated structure. This enabled every event handler to work on its own private neighbour pointer and removes the need for handling the case when task is already enqueued. Fixes: `232c001398` ("net/mlx5e: Add support to neighbour update flow") Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-10-02 10:59:58 -07:00
Aya Levin	d4a16052bc	net/mlx5e: Fix VLAN create flow When interface is attached while in promiscuous mode and with VLAN filtering turned off, both configurations are not respected and VLAN filtering is performed. There are 2 flows which add the any-vid rules during interface attach: VLAN creation table and set rx mode. Each is relaying on the other to add any-vid rules, eventually non of them does. Fix this by adding any-vid rules on VLAN creation regardless of promiscuous mode. Fixes: `9df30601c8` ("net/mlx5e: Restore vlan filter after seamless reset") Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-10-02 10:59:58 -07:00
Aya Levin	8c7353b6f7	net/mlx5e: Fix VLAN cleanup flow Prior to this patch unloading an interface in promiscuous mode with RX VLAN filtering feature turned off - resulted in a warning. This is due to a wrong condition in the VLAN rules cleanup flow, which left the any-vid rules in the VLAN steering table. These rules prevented destroying the flow group and the flow table. The any-vid rules are removed in 2 flows, but none of them remove it in case both promiscuous is set and VLAN filtering is off. Fix the issue by changing the condition of the VLAN table cleanup flow to clean also in case of promiscuous mode. mlx5_core 0000:00:08.0: mlx5_destroy_flow_group:2123:(pid 28729): Flow group 20 wasn't destroyed, refcount > 1 mlx5_core 0000:00:08.0: mlx5_destroy_flow_group:2123:(pid 28729): Flow group 19 wasn't destroyed, refcount > 1 mlx5_core 0000:00:08.0: mlx5_destroy_flow_table:2112:(pid 28729): Flow table 262149 wasn't destroyed, refcount > 1 ... ... ------------[ cut here ]------------ FW pages counter is 11560 after reclaiming all pages WARNING: CPU: 1 PID: 28729 at drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c:660 mlx5_reclaim_startup_pages+0x178/0x230 [mlx5_core] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 Call Trace: mlx5_function_teardown+0x2f/0x90 [mlx5_core] mlx5_unload_one+0x71/0x110 [mlx5_core] remove_one+0x44/0x80 [mlx5_core] pci_device_remove+0x3e/0xc0 device_release_driver_internal+0xfb/0x1c0 device_release_driver+0x12/0x20 pci_stop_bus_device+0x68/0x90 pci_stop_and_remove_bus_device+0x12/0x20 hv_eject_device_work+0x6f/0x170 [pci_hyperv] ? __schedule+0x349/0x790 process_one_work+0x206/0x400 worker_thread+0x34/0x3f0 ? process_one_work+0x400/0x400 kthread+0x126/0x140 ? kthread_park+0x90/0x90 ret_from_fork+0x22/0x30 ---[ end trace 6283bde8d26170dc ]--- Fixes: `9df30601c8` ("net/mlx5e: Restore vlan filter after seamless reset") Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-10-02 10:59:58 -07:00
Aya Levin	2608a2f831	net/mlx5e: Fix return status when setting unsupported FEC mode Verify the configured FEC mode is supported by at least a single link mode before applying the command. Otherwise fail the command and return "Operation not supported". Prior to this patch, the command was successful, yet it falsely set all link modes to FEC auto mode - like configuring FEC mode to auto. Auto mode is the default configuration if a link mode doesn't support the configured FEC mode. Fixes: `b5ede32d33` ("net/mlx5e: Add support for FEC modes based on 50G per lane links") Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Eran Ben Elisha <eranbe@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-10-02 10:59:57 -07:00
Aya Levin	3d093bc236	net/mlx5e: Fix driver's declaration to support GRE offload Declare GRE offload support with respect to the inner protocol. Add a list of supported inner protocols on which the driver can offload checksum and GSO. For other protocols, inform the stack to do the needed operations. There is no noticeable impact on GRE performance. Fixes: `2729984149` ("net/mlx5e: Support TSO and TX checksum offloads for GRE tunnels") Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-10-02 10:59:57 -07:00
Maor Dickman	2b0219898b	net/mlx5e: CT, Fix coverity issue The cited commit introduced the following coverity issue at function mlx5_tc_ct_rule_to_tuple_nat: - Memory - corruptions (OVERRUN) Overrunning array "tuple->ip.src_v6.in6_u.u6_addr32" of 4 4-byte elements at element index 7 (byte offset 31) using index "ip6_offset" (which evaluates to 7). In case of IPv6 destination address rewrite, ip6_offset values are between 4 to 7, which will cause memory overrun of array "tuple->ip.src_v6.in6_u.u6_addr32" to array "tuple->ip.dst_v6.in6_u.u6_addr32". Fixed by writing the value directly to array "tuple->ip.dst_v6.in6_u.u6_addr32" in case ip6_offset values are between 4 to 7. Fixes: `bc562be967` ("net/mlx5e: CT: Save ct entries tuples in hashtables") Signed-off-by: Maor Dickman <maord@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-10-02 10:59:57 -07:00
Aya Levin	c3c9402373	net/mlx5e: Add resiliency in Striding RQ mode for packets larger than MTU Prior to this fix, in Striding RQ mode the driver was vulnerable when receiving packets in the range (stride size - headroom, stride size]. Where stride size is calculated by mtu+headroom+tailroom aligned to the closest power of 2. Usually, this filtering is performed by the HW, except for a few cases: - Between 2 VFs over the same PF with different MTUs - On bluefield, when the host physical function sets a larger MTU than the ARM has configured on its representor and uplink representor. When the HW filtering is not present, packets that are larger than MTU might be harmful for the RQ's integrity, in the following impacts: 1) Overflow from one WQE to the next, causing a memory corruption that in most cases is unharmful: as the write happens to the headroom of next packet, which will be overwritten by build_skb(). In very rare cases, high stress/load, this is harmful. When the next WQE is not yet reposted and points to existing SKB head. 2) Each oversize packet overflows to the headroom of the next WQE. On the last WQE of the WQ, where addresses wrap-around, the address of the remainder headroom does not belong to the next WQE, but it is out of the memory region range. This results in a HW CQE error that moves the RQ into an error state. Solution: Add a page buffer at the end of each WQE to absorb the leak. Actually the maximal overflow size is headroom but since all memory units must be of the same size, we use page size to comply with UMR WQEs. The increase in memory consumption is of a single page per RQ. Initialize the mkey with all MTTs pointing to a default page. When the channels are activated, UMR WQEs will redirect the RX WQEs to the actual memory from the RQ's pool, while the overflow MTTs remain mapped to the default page. Fixes: `73281b78a3` ("net/mlx5e: Derive Striding RQ size from MTU") Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-10-02 10:59:56 -07:00
Aya Levin	08a762cecc	net/mlx5e: Fix error path for RQ alloc Increase granularity of the error path to avoid unneeded free/release. Fix the cleanup to be symmetric to the order of creation. Fixes: `0ddf543226` ("xdp/mlx5: setup xdp_rxq_info") Fixes: `422d4c401e` ("net/mlx5e: RX, Split WQ objects for different RQ types") Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-10-02 10:59:56 -07:00
Maor Gottlieb	732ebfab7f	net/mlx5: Fix request_irqs error flow Fix error flow handling in request_irqs which try to free irq that we failed to request. It fixes the below trace. WARNING: CPU: 1 PID: 7587 at kernel/irq/manage.c:1684 free_irq+0x4d/0x60 CPU: 1 PID: 7587 Comm: bash Tainted: G W OE 4.15.15-1.el7MELLANOXsmp-x86_64 #1 Hardware name: Advantech SKY-6200/SKY-6200, BIOS F2.00 08/06/2020 RIP: 0010:free_irq+0x4d/0x60 RSP: 0018:ffffc9000ef47af0 EFLAGS: 00010282 RAX: ffff88001476ae00 RBX: 0000000000000655 RCX: 0000000000000000 RDX: ffff88001476ae00 RSI: ffffc9000ef47ab8 RDI: ffff8800398bb478 RBP: ffff88001476a838 R08: ffff88001476ae00 R09: 000000000000156d R10: 0000000000000000 R11: 0000000000000004 R12: ffff88001476a838 R13: 0000000000000006 R14: ffff88001476a888 R15: 00000000ffffffe4 FS: 00007efeadd32740(0000) GS:ffff88047fc40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fc9cc010008 CR3: 00000001a2380004 CR4: 00000000007606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: mlx5_irq_table_create+0x38d/0x400 [mlx5_core] ? atomic_notifier_chain_register+0x50/0x60 mlx5_load_one+0x7ee/0x1130 [mlx5_core] init_one+0x4c9/0x650 [mlx5_core] pci_device_probe+0xb8/0x120 driver_probe_device+0x2a1/0x470 ? driver_allows_async_probing+0x30/0x30 bus_for_each_drv+0x54/0x80 __device_attach+0xa3/0x100 pci_bus_add_device+0x4a/0x90 pci_iov_add_virtfn+0x2dc/0x2f0 pci_enable_sriov+0x32e/0x420 mlx5_core_sriov_configure+0x61/0x1b0 [mlx5_core] ? kstrtoll+0x22/0x70 num_vf_store+0x4b/0x70 [mlx5_core] kernfs_fop_write+0x102/0x180 __vfs_write+0x26/0x140 ? rcu_all_qs+0x5/0x80 ? _cond_resched+0x15/0x30 ? __sb_start_write+0x41/0x80 vfs_write+0xad/0x1a0 SyS_write+0x42/0x90 do_syscall_64+0x60/0x110 entry_SYSCALL_64_after_hwframe+0x3d/0xa2 Fixes: `24163189da` ("net/mlx5: Separate IRQ request/free from EQ life cycle") Signed-off-by: Maor Gottlieb <maorg@nvidia.com> Reviewed-by: Eran Ben Elisha <eranbe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-10-02 10:59:56 -07:00
Saeed Mahameed	b898ce7bcc	net/mlx5: cmdif, Avoid skipping reclaim pages if FW is not accessible In case of pci is offline reclaim_pages_cmd() will still try to call the FW to release FW pages, cmd_exec() in this case will return a silent success without actually calling the FW. This is wrong and will cause page leaks, what we should do is to detect pci offline or command interface un-available before tying to access the FW and manually release the FW pages in the driver. In this patch we share the code to check for FW command interface availability and we call it in sensitive places e.g. reclaim_pages_cmd(). Alternative fix: 1. Remove MLX5_CMD_OP_MANAGE_PAGES form mlx5_internal_err_ret_value, command success simulation list. 2. Always Release FW pages even if cmd_exec fails in reclaim_pages_cmd(). Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-10-02 10:59:55 -07:00
Eran Ben Elisha	410bd754cd	net/mlx5: Add retry mechanism to the command entry index allocation It is possible that new command entry index allocation will temporarily fail. The new command holds the semaphore, so it means that a free entry should be ready soon. Add one second retry mechanism before returning an error. Patch "net/mlx5: Avoid possible free of command entry while timeout comp handler" increase the possibility to bump into this temporarily failure as it delays the entry index release for non-callback commands. Fixes: `e126ba97db` ("mlx5: Add driver for Mellanox Connect-IB adapters") Signed-off-by: Eran Ben Elisha <eranbe@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-10-02 10:59:55 -07:00
Eran Ben Elisha	1d5558b1f0	net/mlx5: poll cmd EQ in case of command timeout Once driver detects a command interface command timeout, it warns the user and returns timeout error to the caller. In such case, the entry of the command is not evacuated (because only real event interrupt is allowed to clear command interface entry). If the HW event interrupt of this entry will never arrive, this entry will be left unused forever. Command interface entries are limited and eventually we can end up without the ability to post a new command. In addition, if driver will not consume the EQE of the lost interrupt and rearm the EQ, no new interrupts will arrive for other commands. Add a resiliency mechanism for manually polling the command EQ in case of a command timeout. In case resiliency mechanism will find non-handled EQE, it will consume it, and the command interface will be fully functional again. Once the resiliency flow finished, wait another 5 seconds for the command interface to complete for this command entry. Define mlx5_cmd_eq_recover() to manage the cmd EQ polling resiliency flow. Add an async EQ spinlock to avoid races between resiliency flows and real interrupts that might run simultaneously. Fixes: `e126ba97db` ("mlx5: Add driver for Mellanox Connect-IB adapters") Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-10-02 10:59:55 -07:00
Eran Ben Elisha	50b2412b7e	net/mlx5: Avoid possible free of command entry while timeout comp handler Upon command completion timeout, driver simulates a forced command completion. In a rare case where real interrupt for that command arrives simultaneously, it might release the command entry while the forced handler might still access it. Fix that by adding an entry refcount, to track current amount of allowed handlers. Command entry to be released only when this refcount is decremented to zero. Command refcount is always initialized to one. For callback commands, command completion handler is the symmetric flow to decrement it. For non-callback commands, it is wait_func(). Before ringing the doorbell, increment the refcount for the real completion handler. Once the real completion handler is called, it will decrement it. For callback commands, once the delayed work is scheduled, increment the refcount. Upon callback command completion handler, we will try to cancel the timeout callback. In case of success, we need to decrement the callback refcount as it will never run. In addition, gather the entry index free and the entry free into a one flow for all command types release. Fixes: `e126ba97db` ("mlx5: Add driver for Mellanox Connect-IB adapters") Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-10-02 10:59:54 -07:00
Eran Ben Elisha	432161ea26	net/mlx5: Fix a race when moving command interface to polling mode As part of driver unload, it destroys the commands EQ (via FW command). As the commands EQ is destroyed, FW will not generate EQEs for any command that driver sends afterwards. Driver should poll for later commands status. Driver commands mode metadata is updated before the commands EQ is actually destroyed. This can lead for double completion handle by the driver (polling and interrupt), if a command is executed and completed by FW after the mode was changed, but before the EQ was destroyed. Fix that by using the mlx5_cmd_allowed_opcode mechanism to guarantee that only DESTROY_EQ command can be executed during this time period. Fixes: `e126ba97db` ("mlx5: Add driver for Mellanox Connect-IB adapters") Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-10-02 10:59:54 -07:00
Gustavo A. R. Silva	ff7ea04ad5	net/mlx5e: Fix potential null pointer dereference Calls to kzalloc() and kvzalloc() should be null-checked in order to avoid any potential failures. In this case, a potential null pointer dereference. Fix this by adding null checks for _parse_attr_ and _flow_ right after allocation. Addresses-Coverity-ID: 1497154 ("Dereference before null check") Fixes: `c620b77215` ("net/mlx5: Refactor tc flow attributes structure") Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-09-30 21:26:31 -07:00
Dan Carpenter	7b2b16ee54	net/mlx5e: Fix a use after free on error in mlx5_tc_ct_shared_counter_get() This code frees "shared_counter" and then dereferences on the next line to get the error code. Fixes: `1edae2335a` ("net/mlx5e: CT: Use the same counter for both directions") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-09-30 21:26:31 -07:00
Ariel Levkovich	5efbe61788	net/mlx5: Fix dereference on pointer attr after null check When removing a flow from the slow path fdb, a flow attr struct is allocated for the rule removal process. If the allocation fails the code prints a warning message but continues with the removal flow which include dereferencing a pointer which could be null. Fix this by exiting the function in case the attr allocation failed. Fixes: `c620b77215` ("net/mlx5: Refactor tc flow attributes structure") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Ariel Levkovich <lariel@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-09-30 21:26:31 -07:00
Parav Pandit	7be3412a76	net/mlx5: Use dma device access helper Use the PCI device directly for dma accesses as non PCI device unlikely support IOMMU and dma mappings. Introduce and use helper routine to access DMA device. Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Vu Pham <vuhuong@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-09-30 21:26:30 -07:00
Hamdan Igbaria	036e19b90f	net/mlx5: E-Switch, Support flow source for local vport Set flow source as hint for local vport. Signed-off-by: Hamdan Igbaria <hamdani@mellanox.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-09-30 21:26:30 -07:00
Parav Pandit	c7eddc6092	net/mlx5: E-switch, Move devlink eswitch ports closer to eswitch Currently devlink eswitch ports are registered and unregistered by the representor layer. However it is better to register them at eswitch layer so that in future user initiated command port add and delete commands can also register/unregister devlink ports without depending on representor layer. Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Vu Pham <vuhuong@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-09-30 21:26:29 -07:00
Parav Pandit	38679b5a0d	net/mlx5: E-switch, Use helper function to load unload representor To register and unregister devlink ports when loading/unload representors, refactor the code to helper functions. Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Vu Pham <vuhuong@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-09-30 21:26:29 -07:00
Parav Pandit	2c40db2f1d	net/mlx5: E-switch, Add helper to check egress ACL need Currently only VF vports need egress ACL table. Add a generic helper to check whether a vport need egress ACL table or not. Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Vu Pham <vuhuong@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-09-30 21:26:29 -07:00
sunils	7cd7becddd	net/mlx5: E-switch, Use PF num in metadata reg c0 Currently only 256 vports can be supported as only 8 bits are reserved for them and 8 bits are reserved for vhca_ids in metadata reg c0. To support more than 256 vports, replace vhca_id with a unique shorter 4-bit PF number which covers upto 16 PF's. Use remaining 12 bits for vports ranging 1-4095. This will continue to generate unique metadata even if multiple PCI devices have same switch_id. Signed-off-by: sunils <sunils@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Vu Pham <vuhuong@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-09-30 21:26:28 -07:00
Hamdan Igbaria	0172391967	net/mlx5: DR, Add support for rule creation with flow source hint Skip the rule according to flow arrival source, in case of RX and the source is local port skip and in case of TX and the source is uplink skip, we get this info according to the flow source hint we get from upper layers when creating the rule. This is needed because for example in case of FDB table which has a TX and RX tables and we are inserting a rule with an encap action which is only a TX action, in this case rule will fail on RX, so we can rely on the flow source hint and skip RX in such case. Until now we relied on metadata regc_0 that upper layer mapped the port in the regc_0, but the problem is that upper layer did not always use regc_0 for port mapping, so now we added support to flow source hint which upper layers will pass to SW steering when creating a rule. Signed-off-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Hamdan Igbaria <hamdani@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-09-30 21:26:28 -07:00
Yevgeny Kliteynik	e6b69bf379	net/mlx5: DR, Call ste_builder directly with tag pointer Instead of getting the tag in each function, call the builder directly with the tag. This will allow to use the same function for building the tag and the bitmask. Signed-off-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-09-30 21:26:28 -07:00
Yevgeny Kliteynik	92b4b88531	net/mlx5: DR, Remove unneeded local variable The misc3 variable is used only once and can be dropped. Signed-off-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-09-30 21:26:27 -07:00
Yevgeny Kliteynik	e6422d1da0	net/mlx5: DR, Remove unneeded vlan check from L2 builder When we create a matcher we check that all fields are consumed. There is no need for this specific check. This keeps the STE builder functions simple and clean. Signed-off-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-09-30 21:26:27 -07:00
Yevgeny Kliteynik	38a5c59d7e	net/mlx5: DR, Remove unneeded check from source port builder Mask validity for ste builders is checked by mlx5dr_ste_build_pre_check during matcher creation. It already checks the mask value of source_vport, so removing this duplicated check. Also, moving there the check of source_eswitch_owner_vhca_id mask. Signed-off-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-09-30 21:26:27 -07:00
Yevgeny Kliteynik	97ffd895fe	net/mlx5: DR, Replace the check for valid STE entry Validity check is done by reading the next lu_type from the STE, this check can be replaced by checking the refcount. This will make the check independent on internal STE structure. Signed-off-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-09-30 21:26:26 -07:00
Taehee Yoo	eff7423365	net: core: introduce struct netdev_nested_priv for nested interface infrastructure Functions related to nested interface infrastructure such as netdev_walk_all_{ upper \| lower }_dev() pass both private functions and "data" pointer to handle their own things. At this point, the data pointer type is void *. In order to make it easier to expand common variables and functions, this new netdev_nested_priv structure is added. In the following patch, a new member variable will be added into this struct to fix the lockdep issue. Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-28 15:00:15 -07:00
Amit Cohen	69f6d4ee68	mlxsw: spectrum_ethtool: Expose transceiver_overheat counter Add structures for port statistics which read from core and not directly from registers. When netdev's ethtool statistics are queried, query the corresponding module's overheat counter from core and expose it as "transceiver_overheat". Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-27 13:27:01 -07:00
Amit Cohen	05cf5828bc	mlxsw: Update module's settings when module is plugged in Module temperature warning events are enabled for modules that have a temperature sensor and configured according to the temperature thresholds queried from the module. When a module is unplugged we are guaranteed not to get temperature warning events. However, when a module is plugged in we need to potentially update its current settings (i.e., event enablement and thresholds). Register to port module plug/unplug events and update module's settings upon plug in events. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-27 13:27:01 -07:00
Amit Cohen	3bdbab3fee	mlxsw: spectrum: Initialize netdev's module overheat counter The overheat counter is a per-module counter, but it is exposed as part of the corresponding netdev's statistics. It should therefore be presented to user space relative to the netdev's lifetime. Query the counter just before registering the netdev, so that the value exposed to user space will be relative to this initial value. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-27 13:27:01 -07:00
Amit Cohen	f21b1a646f	mlxsw: Enable temperature event for all supported port module sensors MTWE (Management Temperature Warning Event) is triggered for sensors whose temperature event enable bit is enabled in the MTMP register. Enable events for all the modules that have a temperature sensor. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-27 13:27:01 -07:00
Amit Cohen	943585c9ee	mlxsw: Update transceiver_overheat counter according to MTWE MTWE (Management Temperature Warning Event) is triggered when module's temperature is higher than its threshold. Register for MTWE events and increase the module's overheat counter when its corresponding sensor goes above the configured threshold. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-27 13:27:00 -07:00
Amit Cohen	0652ac0775	mlxsw: core: Add an infrastructure to track transceiver overheat counter Initialize an array that stores per-module overheat state and a counter indicating how many times the module was in overheat state. Export a function to query the counter according to module number. Will be used later on by the switch driver (i.e., mlxsw_spectrum) to expose module's overheat counter as part of ethtool statistics. Initialize mlxsw_env after driver initialization to be able to query number of modules from MGPIR register. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-27 13:27:00 -07:00
Amit Cohen	fb1292f821	mlxsw: core_hwmon: Query MTMP before writing to set only relevant fields The MTMP register controls various temperature settings on a per-sensor basis. Subsequent patches are going to alter some of these settings for sensors found on port modules in response to certain events. In order to prevent the current callers that write to MTMP from overriding these settings, have them first query the register and then change only the relevant register fields. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-27 13:27:00 -07:00
Amit Cohen	02d33b4520	mlxsw: reg: Add Ports Module Administrative and Operational Status Register PMAOS register configures and retrieves the per module status. The register is used also for enabling event for status change. It will be used to enable PMPE (Port Module Plug/Unplug) event. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-27 13:27:00 -07:00
Amit Cohen	e7d62a3ca4	mlxsw: reg: Add Port Module Plug/Unplug Event Register PMPE register reports any operational status change of a module. It will be used for enabling temperature warning event when a module is plugged in. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-27 13:27:00 -07:00
Amit Cohen	946bd43519	mlxsw: reg: Add Management Temperature Warning Event Register Add MTWE (Management Temperature Warning Event) register, which is used for over temperature warning. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-27 13:27:00 -07:00
Ido Schimmel	7286502858	mlxsw: spectrum_acl: Fix mlxsw_sp_acl_tcam_group_add()'s error path If mlxsw_sp_acl_tcam_group_id_get() fails, the mutex initialized earlier is not destroyed. Fix this by initializing the mutex after calling the function. This is symmetric to mlxsw_sp_acl_tcam_group_del(). Fixes: `5ec2ee28d2` ("mlxsw: spectrum_acl: Introduce a mutex to guard region list updates") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-27 13:22:31 -07:00
Jacob Keller	bc75c054f0	devlink: convert flash_update to use params structure The devlink core recently gained support for checking whether the driver supports a flash_update parameter, via `supported_flash_update_params`. However, parameters are specified as function arguments. Adding a new parameter still requires modifying the signature of the .flash_update callback in all drivers. Convert the .flash_update function to take a new `struct devlink_flash_update_params` instead. By using this structure, and the `supported_flash_update_params` bit field, a new parameter to flash_update can be added without requiring modification to existing drivers. As before, all parameters except file_name will require driver opt-in. Because file_name is a necessary field to for the flash_update to make sense, no "SUPPORTED" bitflag is provided and it is always considered valid. All future additional parameters will require a new bit in the supported_flash_update_params bitfield. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Cc: Jiri Pirko <jiri@mellanox.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Michael Chan <michael.chan@broadcom.com> Cc: Bin Luo <luobin9@huawei.com> Cc: Saeed Mahameed <saeedm@mellanox.com> Cc: Leon Romanovsky <leon@kernel.org> Cc: Ido Schimmel <idosch@mellanox.com> Cc: Danielle Ratson <danieller@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-25 17:20:57 -07:00
Jacob Keller	22ec3d232f	devlink: check flash_update parameter support in net core When implementing .flash_update, drivers which do not support per-component update are manually checking the component parameter to verify that it is NULL. Without this check, the driver might accept an update request with a component specified even though it will not honor such a request. Instead of having each driver check this, move the logic into net/core/devlink.c, and use a new `supported_flash_update_params` field in the devlink_ops. Drivers which will support per-component update must now specify this by setting DEVLINK_SUPPORT_FLASH_UPDATE_COMPONENT in the supported_flash_update_params in their devlink_ops. This helps ensure that drivers do not forget to check for a NULL component if they do not support per-component update. This also enables a slightly better error message by enabling the core stack to set the netlink bad attribute message to indicate precisely the unsupported attribute in the message. Going forward, any new additional parameter to flash update will require a bit in the supported_flash_update_params bitfield. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Cc: Jiri Pirko <jiri@mellanox.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Michael Chan <michael.chan@broadcom.com> Cc: Bin Luo <luobin9@huawei.com> Cc: Saeed Mahameed <saeedm@mellanox.com> Cc: Leon Romanovsky <leon@kernel.org> Cc: Ido Schimmel <idosch@mellanox.com> Cc: Danielle Ratson <danieller@mellanox.com> Cc: Shannon Nelson <snelson@pensando.io> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-25 17:20:57 -07:00
Jesse Brandeburg	7c8c0291f8	drivers/net/ethernet: clean up unused assignments As part of the W=1 compliation series, these lines all created warnings about unused variables that were assigned a value. Most of them are from register reads, but some are just picking up a return value from a function and never doing anything with it. Fixed warnings: .../ethernet/brocade/bna/bnad.c:3280:6: warning: variable ‘rx_count’ set but not used [-Wunused-but-set-variable] .../ethernet/brocade/bna/bnad.c:3280:6: warning: variable ‘rx_count’ set but not used [-Wunused-but-set-variable] .../ethernet/cortina/gemini.c:512:6: warning: variable ‘val’ set but not used [-Wunused-but-set-variable] .../ethernet/cortina/gemini.c:2110:21: warning: variable ‘config0’ set but not used [-Wunused-but-set-variable] .../ethernet/cavium/liquidio/octeon_device.c:1327:6: warning: variable ‘val32’ set but not used [-Wunused-but-set-variable] .../ethernet/cavium/liquidio/octeon_device.c:1358:6: warning: variable ‘val32’ set but not used [-Wunused-but-set-variable] .../ethernet/dec/tulip/media.c:322:8: warning: variable ‘setup’ set but not used [-Wunused-but-set-variable] .../ethernet/dec/tulip/de4x5.c:4928:13: warning: variable ‘r3’ set but not used [-Wunused-but-set-variable] .../ethernet/micrel/ksz884x.c:1652:7: warning: variable ‘dummy’ set but not used [-Wunused-but-set-variable] .../ethernet/micrel/ksz884x.c:1652:7: warning: variable ‘dummy’ set but not used [-Wunused-but-set-variable] .../ethernet/micrel/ksz884x.c:1652:7: warning: variable ‘dummy’ set but not used [-Wunused-but-set-variable] .../ethernet/micrel/ksz884x.c:1652:7: warning: variable ‘dummy’ set but not used [-Wunused-but-set-variable] .../ethernet/micrel/ksz884x.c:4981:6: warning: variable ‘rx_status’ set but not used [-Wunused-but-set-variable] .../ethernet/micrel/ksz884x.c:6510:6: warning: variable ‘rc’ set but not used [-Wunused-but-set-variable] .../ethernet/micrel/ksz884x.c:6087: warning: cannot understand function prototype: 'struct hw_regs ' .../ethernet/microchip/lan743x_main.c:161:6: warning: variable ‘int_en’ set but not used [-Wunused-but-set-variable] .../ethernet/microchip/lan743x_main.c:1702:6: warning: variable ‘int_sts’ set but not used [-Wunused-but-set-variable] .../ethernet/microchip/lan743x_main.c:3041:6: warning: variable ‘ret’ set but not used [-Wunused-but-set-variable] .../ethernet/natsemi/ns83820.c:603:6: warning: variable ‘tbisr’ set but not used [-Wunused-but-set-variable] .../ethernet/natsemi/ns83820.c:1207:11: warning: variable ‘tanar’ set but not used [-Wunused-but-set-variable] .../ethernet/marvell/mvneta.c:754:6: warning: variable ‘dummy’ set but not used [-Wunused-but-set-variable] .../ethernet/neterion/vxge/vxge-traffic.c:33:6: warning: variable ‘val64’ set but not used [-Wunused-but-set-variable] .../ethernet/neterion/vxge/vxge-traffic.c:160:6: warning: variable ‘val64’ set but not used [-Wunused-but-set-variable] .../ethernet/neterion/vxge/vxge-traffic.c:490:6: warning: variable ‘val32’ set but not used [-Wunused-but-set-variable] .../ethernet/neterion/vxge/vxge-traffic.c:2378:6: warning: variable ‘val64’ set but not used [-Wunused-but-set-variable] .../ethernet/packetengines/yellowfin.c:1063:18: warning: variable ‘yf_size’ set but not used [-Wunused-but-set-variable] .../ethernet/realtek/8139cp.c:1242:6: warning: variable ‘rc’ set but not used [-Wunused-but-set-variable] .../ethernet/mellanox/mlx4/en_tx.c:858:6: warning: variable ‘ring_cons’ set but not used [-Wunused-but-set-variable] .../ethernet/sis/sis900.c:792:6: warning: variable ‘status’ set but not used [-Wunused-but-set-variable] .../ethernet/sfc/falcon/farch.c:878:11: warning: variable ‘rx_ev_pkt_type’ set but not used [-Wunused-but-set-variable] .../ethernet/sfc/falcon/farch.c:877:23: warning: variable ‘rx_ev_mcast_pkt’ set but not used [-Wunused-but-set-variable] .../ethernet/sfc/falcon/farch.c:877:7: warning: variable ‘rx_ev_hdr_type’ set but not used [-Wunused-but-set-variable] .../ethernet/sfc/falcon/farch.c:876:7: warning: variable ‘rx_ev_other_err’ set but not used [-Wunused-but-set-variable] .../ethernet/sfc/falcon/farch.c:1646:21: warning: variable ‘buftbl_min’ set but not used [-Wunused-but-set-variable] .../ethernet/sfc/falcon/farch.c:2535:32: warning: variable ‘spec’ set but not used [-Wunused-but-set-variable] .../ethernet/via/via-velocity.c:880:6: warning: variable ‘curr_status’ set but not used [-Wunused-but-set-variable] .../ethernet/ti/tlan.c:656:6: warning: variable ‘rc’ set but not used [-Wunused-but-set-variable] .../ethernet/ti/davinci_emac.c:1230:6: warning: variable ‘num_tx_pkts’ set but not used [-Wunused-but-set-variable] .../ethernet/synopsys/dwc-xlgmac-common.c:516:8: warning: variable ‘str’ set but not used [-Wunused-but-set-variable] .../ethernet/ti/cpsw_new.c:1662:22: warning: variable ‘priv’ set but not used [-Wunused-but-set-variable] The register reads should be OK, because the current implementation of readl and friends will always execute even without an lvalue. When it makes sense, just remove the lvalue assignment and the local. Other times, just remove the offending code, and occasionally, just mark the variable as maybe unused since it could be used in an ifdef or debug scenario. Only compile tested with W=1. Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Acked-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-25 16:29:00 -07:00
David S. Miller	075c156850	mlx5-updates-2020-09-22 This series includes mlx5 updates 1) Add support for Connection Tracking offload in NIC mode. Supporting CT offload in NIC mode on Mellanox cards is useful for scenarios where the dual port NIC serves as a gateway between 2 networks and forwards traffic between these networks. Since the traffic is not terminated on the host in this case, no use of SRIOV VFs and/or switchdev mode is required. Today Mellanox NIC cards already support offloading of packet forwarding between physical ports without going to the host so combining it with CT offloading allows users to create a gateway with forwarding and CT (Including NAT) offloading capabilities in non-switchdev mode. To support connection tracking in non-Switchdev mode (Single NIC mode), we need to make use of the current Connection tracking infrastructure implemented on top of E-Switch and the mlx5 generic flow table chains APIs, to make it work on non-Eswitch steering domain e.g. NIC RX domain, the following was performed: 1.1) Refactor current flow steering chains infrastructure and updates TC nic mode implementation to use flow table chains. 1.2) Refactor current Connection Tracking (CT) infrastructure to not assume E-switch backend, and make the CT layer agnostic to underlying steering mode (E-Switch/NIC) 1.3) Plumbing to support CT offload in NIC mode. 2) Trivial code cleanups. -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAl9rz9cACgkQSD+KveBX +j7K/gf/ZysTFuFuC7MCo7xJO2vxlGGE1r6/ENsqonvUT2tcoZCdK9bZMw1Mx17Z r1nyn0xQ3MwRheXMSpqXngTPpfGM6eNgV9CDfFXm62z6WXMYieen0t/LrM/mxo+2 s74Okp53peyGNpePyseewEUGV7zaR6F6uukkKvr441gvAOF3Fcfaz+dIv7KzxKNS +b78yw0b6mGc4foYLSuJcDQlSwqjeIpdSib8xmETMZwRzCt20GCEBDsBAaKt0wzM 1fTZttY+kuLd/m/q+sh3s/4lN2kOO+dwK5NGf+RWtiOaDWT+J/ogVmI2ywXIwsg7 U63nhjGAr7GPqkaG0Jv3aS7na6pbSA== =sByc -----END PGP SIGNATURE----- Merge tag 'mlx5-updates-2020-09-22' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2020-09-22 This series includes mlx5 updates 1) Add support for Connection Tracking offload in NIC mode. Supporting CT offload in NIC mode on Mellanox cards is useful for scenarios where the dual port NIC serves as a gateway between 2 networks and forwards traffic between these networks. Since the traffic is not terminated on the host in this case, no use of SRIOV VFs and/or switchdev mode is required. Today Mellanox NIC cards already support offloading of packet forwarding between physical ports without going to the host so combining it with CT offloading allows users to create a gateway with forwarding and CT (Including NAT) offloading capabilities in non-switchdev mode. To support connection tracking in non-Switchdev mode (Single NIC mode), we need to make use of the current Connection tracking infrastructure implemented on top of E-Switch and the mlx5 generic flow table chains APIs, to make it work on non-Eswitch steering domain e.g. NIC RX domain, the following was performed: 1.1) Refactor current flow steering chains infrastructure and updates TC nic mode implementation to use flow table chains. 1.2) Refactor current Connection Tracking (CT) infrastructure to not assume E-switch backend, and make the CT layer agnostic to underlying steering mode (E-Switch/NIC) 1.3) Plumbing to support CT offload in NIC mode. 2) Trivial code cleanups. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-24 19:54:40 -07:00

... 3 4 5 6 7 ...

7252 Commits