Commit Graph

1471 Commits

Author SHA1 Message Date
Linus Torvalds
147cc5838c Updates for the interrupt subsystem:
Core:
 
   - Provide a new interface for affinity hints to provide a separation
     between hint and actual affinity change which has become a hidden
     property of the current interface
 
   - Fix up the in tree usage of the affinity hint interfaces
 
  Drivers:
 
   - No new irqchip drivers!
 
   - Fix GICv3 redistributor table reservation with RT across kexec
 
   - Fix GICv4.1 redistributor view of the VPE table across kexec
 
   - Add support for extra interrupts on spear-shirq
 
   - Make obtaining some interrupts optional for the Renesas drivers
 
   - Various cleanups and bug fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmHf9v0THHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoRK6D/9bQmyITmJ4KLn0HZ1DsvkuR/GB7I8v
 yTF99FxIi/F0jlJ7+87Hdm68cfYPMahpiHqSlsf/QE2kkuWYDJmMaPUao14XMdG6
 jxrJ1OZtZXeDXyAWkB/gjmiuqyW/e/Myndg0UNUrJ66GqKfxfxtz1/4GfLjgDpIu
 TfZQdojvo6T7NTVnU8aAkgKUhM2jL/HxPiR3VUJ+VneSfwKLHzr3+lTY9zkSvJ8s
 ATqqGn6+GugJmDWaCI13IJcmBhPU/Gvs+Eqnwz7Xez/6wJftYvJh7vGec3ixS9pw
 skjPDnwuHcPl+h0mYMv7ySN7WuqTr0iqYIepdvLUfq6D1WjnHvF5XNcV4W7EzPJN
 B/pBosJ97ZAiHgrWsb35/S3bJ0mnB3Ib4WOOIcnRM36JUdNZrnKJntCsyrrmUsYA
 s6J1og9Ut7it+F9OFvsuZ2pUv25U8BlzhgfJen8Z0fzV1/2f5LQN0gQGVxqVpwkg
 3Cmd5Rmy5h2vlcKKHklLxIP24+UMIb2WyhsDiZ/qYH3zSFFnQPUJ6fvmZIxN/fPx
 exU5O8kgsXSwauXWHJJBb+qhKNcUNvUwKGHNMAvM9mh1xytU6ZowjTqqOlCfBWlg
 dRXT2xI0ex7liXek6yXa4lN1tabIdnvmYTmueUoFiOCqbUPBO8LTutjdehsUMa4d
 xV0a8WEzuk9Q/A==
 =myJA
 -----END PGP SIGNATURE-----

Merge tag 'irq-core-2022-01-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq updates from Thomas Gleixner:
 "Updates for the interrupt subsystem:

  Core:

   - Provide a new interface for affinity hints to provide a separation
     between hint and actual affinity change which has become a hidden
     property of the current interface

   - Fix up the in tree usage of the affinity hint interfaces

  Drivers:

   - No new irqchip drivers!

   - Fix GICv3 redistributor table reservation with RT across kexec

   - Fix GICv4.1 redistributor view of the VPE table across kexec

   - Add support for extra interrupts on spear-shirq

   - Make obtaining some interrupts optional for the Renesas drivers

   - Various cleanups and bug fixes"

* tag 'irq-core-2022-01-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
  irqchip/renesas-intc-irqpin: Use platform_get_irq_optional() to get the interrupt
  irqchip/renesas-irqc: Use platform_get_irq_optional() to get the interrupt
  irqchip/gic-v4: Disable redistributors' view of the VPE table at boot time
  irqchip/ingenic-tcu: Use correctly sized arguments for bit field
  irqchip/gic-v2m: Add const to of_device_id
  irqchip/imx-gpcv2: Mark imx_gpcv2_instance with __ro_after_init
  irqchip/spear-shirq: Add support for IRQ 0..6
  irqchip/gic-v3-its: Limit memreserve cpuhp state lifetime
  irqchip/gic-v3-its: Postpone LPI pending table freeing and memreserve
  irqchip/gic-v3-its: Give the percpu rdist struct its own flags field
  net/mlx4: Use irq_update_affinity_hint()
  net/mlx5: Use irq_set_affinity_and_hint()
  hinic: Use irq_set_affinity_and_hint()
  scsi: lpfc: Use irq_set_affinity()
  mailbox: Use irq_update_affinity_hint()
  ixgbe: Use irq_update_affinity_hint()
  be2net: Use irq_update_affinity_hint()
  enic: Use irq_update_affinity_hint()
  RDMA/irdma: Use irq_update_affinity_hint()
  scsi: mpt3sas: Use irq_set_affinity_and_hint()
  ...
2022-01-13 08:53:45 -08:00
Ingo Molnar
0422fe2666 Merge branch 'linus' into irq/core, to fix conflict
Conflicts:
	drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2022-01-08 10:53:57 +01:00
David S. Miller
e63a023489 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Alexei Starovoitov says:

====================
pull-request: bpf-next 2021-12-30

The following pull-request contains BPF updates for your *net-next* tree.

We've added 72 non-merge commits during the last 20 day(s) which contain
a total of 223 files changed, 3510 insertions(+), 1591 deletions(-).

The main changes are:

1) Automatic setrlimit in libbpf when bpf is memcg's in the kernel, from Andrii.

2) Beautify and de-verbose verifier logs, from Christy.

3) Composable verifier types, from Hao.

4) bpf_strncmp helper, from Hou.

5) bpf.h header dependency cleanup, from Jakub.

6) get_func_[arg|ret|arg_cnt] helpers, from Jiri.

7) Sleepable local storage, from KP.

8) Extend kfunc with PTR_TO_CTX, PTR_TO_MEM argument support, from Kumar.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-31 14:35:40 +00:00
Jakub Kicinski
b6459415b3 net: Don't include filter.h from net/sock.h
sock.h is pretty heavily used (5k objects rebuilt on x86 after
it's touched). We can drop the include of filter.h from it and
add a forward declaration of struct sk_filter instead.
This decreases the number of rebuilt objects when bpf.h
is touched from ~5k to ~1k.

There's a lot of missing includes this was masking. Primarily
in networking tho, this time.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Marc Kleine-Budde <mkl@pengutronix.de>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Acked-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/bpf/20211229004913.513372-1-kuba@kernel.org
2021-12-29 08:48:14 -08:00
Hangbin Liu
9c9211a3fc net_tstamp: add new flag HWTSTAMP_FLAG_BONDED_PHC_INDEX
Since commit 94dd016ae5 ("bond: pass get_ts_info and SIOC[SG]HWTSTAMP
ioctl to active device") the user could get bond active interface's
PHC index directly. But when there is a failover, the bond active
interface will change, thus the PHC index is also changed. This may
break the user's program if they did not update the PHC timely.

This patch adds a new hwtstamp_config flag HWTSTAMP_FLAG_BONDED_PHC_INDEX.
When the user wants to get the bond active interface's PHC, they need to
add this flag and be aware the PHC index may be changed.

With the new flag. All flag checks in current drivers are removed. Only
the checking in net_hwtstamp_validate() is kept.

Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-14 12:28:24 +00:00
Paolo Abeni
c8064e5b4a bpf: Let bpf_warn_invalid_xdp_action() report more info
In non trivial scenarios, the action id alone is not sufficient to
identify the program causing the warning. Before the previous patch,
the generated stack-trace pointed out at least the involved device
driver.

Let's additionally include the program name and id, and the relevant
device name.

If the user needs additional infos, he can fetch them via a kernel
probe, leveraging the arguments added here.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/ddb96bb975cbfddb1546cf5da60e77d5100b533c.1638189075.git.pabeni@redhat.com
2021-12-13 22:28:27 +01:00
Nitesh Narayan Lal
4b3ddc6462 net/mlx4: Use irq_update_affinity_hint()
The driver uses irq_set_affinity_hint() to update the affinity_hint mask
that is consumed by the userspace to distribute the interrupts. However,
under the hood irq_set_affinity_hint() also applies the provided cpumask
(if not NULL) as the affinity for the given interrupt which is an
undocumented side effect.

To remove this side effect irq_set_affinity_hint() has been marked
as deprecated and new interfaces have been introduced. Hence, replace the
irq_set_affinity_hint() with the new interface irq_update_affinity_hint()
that only updates the affinity_hint pointer.

Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20210903152430.244937-15-nitesh@redhat.com
2021-12-10 20:47:40 +01:00
Jakub Kicinski
fc993be36f Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-02 11:44:56 -08:00
Zhou Qingyang
addad76431 net/mlx4_en: Fix an use-after-free bug in mlx4_en_try_alloc_resources()
In mlx4_en_try_alloc_resources(), mlx4_en_copy_priv() is called and
tmp->tx_cq will be freed on the error path of mlx4_en_copy_priv().
After that mlx4_en_alloc_resources() is called and there is a dereference
of &tmp->tx_cq[t][i] in mlx4_en_alloc_resources(), which could lead to
a use after free problem on failure of mlx4_en_copy_priv().

Fix this bug by adding a check of mlx4_en_copy_priv()

This bug was found by a static analyzer. The analysis employs
differential checking to identify inconsistent security operations
(e.g., checks or kfrees) between two code paths and confirms that the
inconsistent operations are not recovered in the current function or
the callers, so they constitute bugs.

Note that, as a bug found by static analysis, it can be a false
positive or hard to trigger. Multiple researchers have cross-reviewed
the bug.

Builds with CONFIG_MLX4_EN=m show no new warnings,
and our static analyzer no longer warns about this code.

Fixes: ec25bc04ed ("net/mlx4_en: Add resilience in low memory systems")
Signed-off-by: Zhou Qingyang <zhou1615@umn.edu>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20211130164438.190591-1-zhou1615@umn.edu
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-01 19:04:50 -08:00
Erik Ekman
2191b1dfef net/mlx4_en: Update reported link modes for 1/10G
When link modes were initially added in commit 2c76267943
("net/mlx4_en: Use PTYS register to query ethtool settings") and
later updated for the new ethtool API in commit 3d8f7cc78d
("net: mlx4: use new ETHTOOL_G/SSETTINGS API") the only 1/10G non-baseT
link modes configured were 1000baseKX, 10000baseKX4 and 10000baseKR.
It looks like these got picked to represent other modes since nothing
better was available.

Switch to using more specific link modes added in commit 5711a98221
("net: ethtool: add support for 1000BaseX and missing 10G link modes").

Tested with MCX311A-XCAT connected via DAC.
Before:

% sudo ethtool enp3s0
Settings for enp3s0:
	Supported ports: [ FIBRE ]
	Supported link modes:   1000baseKX/Full
	                        10000baseKR/Full
	Supported pause frame use: Symmetric Receive-only
	Supports auto-negotiation: No
	Supported FEC modes: Not reported
	Advertised link modes:  1000baseKX/Full
	                        10000baseKR/Full
	Advertised pause frame use: Symmetric
	Advertised auto-negotiation: No
	Advertised FEC modes: Not reported
	Speed: 10000Mb/s
	Duplex: Full
	Auto-negotiation: off
	Port: Direct Attach Copper
	PHYAD: 0
	Transceiver: internal
	Supports Wake-on: d
	Wake-on: d
        Current message level: 0x00000014 (20)
                               link ifdown
	Link detected: yes

With this change:

% sudo ethtool enp3s0
	Settings for enp3s0:
	Supported ports: [ FIBRE ]
	Supported link modes:   1000baseX/Full
	                        10000baseCR/Full
 	                        10000baseSR/Full
	Supported pause frame use: Symmetric Receive-only
	Supports auto-negotiation: No
	Supported FEC modes: Not reported
	Advertised link modes:  1000baseX/Full
 	                        10000baseCR/Full
 	                        10000baseSR/Full
	Advertised pause frame use: Symmetric
	Advertised auto-negotiation: No
	Advertised FEC modes: Not reported
	Speed: 10000Mb/s
	Duplex: Full
	Auto-negotiation: off
	Port: Direct Attach Copper
	PHYAD: 0
	Transceiver: internal
	Supports Wake-on: d
	Wake-on: d
        Current message level: 0x00000014 (20)
                               link ifdown
	Link detected: yes

Tested-by: Michael Stapelberg <michael@stapelberg.ch>
Signed-off-by: Erik Ekman <erik@kryo.se>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-29 13:03:32 +00:00
Hao Chen
7462494408 ethtool: extend ringparam setting/getting API with rx_buf_len
Add two new parameters kernel_ringparam and extack for
.get_ringparam and .set_ringparam to extend more ring params
through netlink.

Signed-off-by: Hao Chen <chenhao288@hisilicon.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-22 12:31:49 +00:00
Sean Anderson
4973056cce net: convert users of bitmap_foo() to linkmode_foo()
This converts instances of
	bitmap_foo(args..., __ETHTOOL_LINK_MODE_MASK_NBITS)
to
	linkmode_foo(args...)

I manually fixed up some lines to prevent them from being excessively
long. Otherwise, this change was generated with the following semantic
patch:

// Generated with
// echo linux/linkmode.h > includes
// git grep -Flf includes include/ | cut -f 2- -d / | cat includes - \
// | sort | uniq | tee new_includes | wc -l && mv new_includes includes
// and repeating until the number stopped going up
@i@
@@

(
 #include <linux/acpi_mdio.h>
|
 #include <linux/brcmphy.h>
|
 #include <linux/dsa/loop.h>
|
 #include <linux/dsa/sja1105.h>
|
 #include <linux/ethtool.h>
|
 #include <linux/ethtool_netlink.h>
|
 #include <linux/fec.h>
|
 #include <linux/fs_enet_pd.h>
|
 #include <linux/fsl/enetc_mdio.h>
|
 #include <linux/fwnode_mdio.h>
|
 #include <linux/linkmode.h>
|
 #include <linux/lsm_audit.h>
|
 #include <linux/mdio-bitbang.h>
|
 #include <linux/mdio.h>
|
 #include <linux/mdio-mux.h>
|
 #include <linux/mii.h>
|
 #include <linux/mii_timestamper.h>
|
 #include <linux/mlx5/accel.h>
|
 #include <linux/mlx5/cq.h>
|
 #include <linux/mlx5/device.h>
|
 #include <linux/mlx5/driver.h>
|
 #include <linux/mlx5/eswitch.h>
|
 #include <linux/mlx5/fs.h>
|
 #include <linux/mlx5/port.h>
|
 #include <linux/mlx5/qp.h>
|
 #include <linux/mlx5/rsc_dump.h>
|
 #include <linux/mlx5/transobj.h>
|
 #include <linux/mlx5/vport.h>
|
 #include <linux/of_mdio.h>
|
 #include <linux/of_net.h>
|
 #include <linux/pcs-lynx.h>
|
 #include <linux/pcs/pcs-xpcs.h>
|
 #include <linux/phy.h>
|
 #include <linux/phy_led_triggers.h>
|
 #include <linux/phylink.h>
|
 #include <linux/platform_data/bcmgenet.h>
|
 #include <linux/platform_data/xilinx-ll-temac.h>
|
 #include <linux/pxa168_eth.h>
|
 #include <linux/qed/qed_eth_if.h>
|
 #include <linux/qed/qed_fcoe_if.h>
|
 #include <linux/qed/qed_if.h>
|
 #include <linux/qed/qed_iov_if.h>
|
 #include <linux/qed/qed_iscsi_if.h>
|
 #include <linux/qed/qed_ll2_if.h>
|
 #include <linux/qed/qed_nvmetcp_if.h>
|
 #include <linux/qed/qed_rdma_if.h>
|
 #include <linux/sfp.h>
|
 #include <linux/sh_eth.h>
|
 #include <linux/smsc911x.h>
|
 #include <linux/soc/nxp/lpc32xx-misc.h>
|
 #include <linux/stmmac.h>
|
 #include <linux/sunrpc/svc_rdma.h>
|
 #include <linux/sxgbe_platform.h>
|
 #include <net/cfg80211.h>
|
 #include <net/dsa.h>
|
 #include <net/mac80211.h>
|
 #include <net/selftests.h>
|
 #include <rdma/ib_addr.h>
|
 #include <rdma/ib_cache.h>
|
 #include <rdma/ib_cm.h>
|
 #include <rdma/ib_hdrs.h>
|
 #include <rdma/ib_mad.h>
|
 #include <rdma/ib_marshall.h>
|
 #include <rdma/ib_pack.h>
|
 #include <rdma/ib_pma.h>
|
 #include <rdma/ib_sa.h>
|
 #include <rdma/ib_smi.h>
|
 #include <rdma/ib_umem.h>
|
 #include <rdma/ib_umem_odp.h>
|
 #include <rdma/ib_verbs.h>
|
 #include <rdma/iw_cm.h>
|
 #include <rdma/mr_pool.h>
|
 #include <rdma/opa_addr.h>
|
 #include <rdma/opa_port_info.h>
|
 #include <rdma/opa_smi.h>
|
 #include <rdma/opa_vnic.h>
|
 #include <rdma/rdma_cm.h>
|
 #include <rdma/rdma_cm_ib.h>
|
 #include <rdma/rdmavt_cq.h>
|
 #include <rdma/rdma_vt.h>
|
 #include <rdma/rdmavt_qp.h>
|
 #include <rdma/rw.h>
|
 #include <rdma/tid_rdma_defs.h>
|
 #include <rdma/uverbs_ioctl.h>
|
 #include <rdma/uverbs_named_ioctl.h>
|
 #include <rdma/uverbs_std_types.h>
|
 #include <rdma/uverbs_types.h>
|
 #include <soc/mscc/ocelot.h>
|
 #include <soc/mscc/ocelot_ptp.h>
|
 #include <soc/mscc/ocelot_vcap.h>
|
 #include <trace/events/ib_mad.h>
|
 #include <trace/events/rdma_core.h>
|
 #include <trace/events/rdma.h>
|
 #include <trace/events/rpcrdma.h>
|
 #include <uapi/linux/ethtool.h>
|
 #include <uapi/linux/ethtool_netlink.h>
|
 #include <uapi/linux/mdio.h>
|
 #include <uapi/linux/mii.h>
)

@depends on i@
expression list args;
@@

(
- bitmap_zero(args, __ETHTOOL_LINK_MODE_MASK_NBITS)
+ linkmode_zero(args)
|
- bitmap_copy(args, __ETHTOOL_LINK_MODE_MASK_NBITS)
+ linkmode_copy(args)
|
- bitmap_and(args, __ETHTOOL_LINK_MODE_MASK_NBITS)
+ linkmode_and(args)
|
- bitmap_or(args, __ETHTOOL_LINK_MODE_MASK_NBITS)
+ linkmode_or(args)
|
- bitmap_empty(args, ETHTOOL_LINK_MODE_MASK_NBITS)
+ linkmode_empty(args)
|
- bitmap_andnot(args, __ETHTOOL_LINK_MODE_MASK_NBITS)
+ linkmode_andnot(args)
|
- bitmap_equal(args, __ETHTOOL_LINK_MODE_MASK_NBITS)
+ linkmode_equal(args)
|
- bitmap_intersects(args, __ETHTOOL_LINK_MODE_MASK_NBITS)
+ linkmode_intersects(args)
|
- bitmap_subset(args, __ETHTOOL_LINK_MODE_MASK_NBITS)
+ linkmode_subset(args)
)

Add missing linux/mii.h include to mellanox. -DaveM

Signed-off-by: Sean Anderson <sean.anderson@seco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-24 13:58:52 +01:00
Leon Romanovsky
82465bec3e devlink: Delete reload enable/disable interface
Commit a0c76345e3 ("devlink: disallow reload operation during device
cleanup") added devlink_reload_{enable,disable}() APIs to prevent reload
operation from racing with device probe/dismantle.

After recent changes to move devlink_register() to the end of device
probe and devlink_unregister() to the beginning of device dismantle,
these races can no longer happen. Reload operations will be denied if
the devlink instance is unregistered and devlink_unregister() will block
until all in-flight operations are done.

Therefore, remove these devlink_reload_{enable,disable}() APIs.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-12 16:29:17 -07:00
Leon Romanovsky
bd032e35c5 devlink: Allow control devlink ops behavior through feature mask
Introduce new devlink call to set feature mask to control devlink
behavior during device initialization phase after devlink_alloc()
is already called.

This allows us to set reload ops based on device property which
is not known at the beginning of driver initialization.

For the sake of simplicity, this API lacks any type of locking and
needs to be called before devlink_register() to make sure that no
parallel access to the ops is possible at this stage.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-12 16:29:17 -07:00
Christophe JAILLET
b9c56ccb43 ethernet: Remove redundant 'flush_workqueue()' calls
'destroy_workqueue()' already drains the queue before destroying it, so
there is no need to flush it explicitly.

Remove the redundant 'flush_workqueue()' calls.

This was generated with coccinelle:

@@
expression E;
@@
- 	flush_workqueue(E);
	destroy_workqueue(E);

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com> #mlx*
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-10 11:33:15 +01:00
Jakub Kicinski
ebb1fdb589 mlx4: constify args for const dev_addr
netdev->dev_addr will become const soon. Make sure all
functions which pass it around mark appropriate args
as const.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-05 13:15:35 +01:00
Jakub Kicinski
e04ffd120f mlx4: remove custom dev_addr clearing
mlx4_en_u64_to_mac() takes the dev->dev_addr pointer and writes
to it byte by byte. It also clears the two bytes _after_ ETH_ALEN
which seems unnecessary. dev->addr_len is set to ETH_ALEN just
before the call.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-05 13:15:35 +01:00
Jakub Kicinski
1bb96a07f9 mlx4: replace mlx4_u64_to_mac() with u64_to_ether_addr()
mlx4_u64_to_mac() predates the common helper but doesn't
make the argument constant.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-05 13:15:35 +01:00
Jakub Kicinski
ded6e16b37 mlx4: replace mlx4_mac_to_u64() with ether_addr_to_u64()
mlx4_mac_to_u64() predates and opencodes ether_addr_to_u64().
It doesn't make the argument constant so it'll be problematic
when dev->dev_addr becomes a const. Convert to the generic helper.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-05 13:15:35 +01:00
Eric Dumazet
9ac936276f net/mlx4_en: avoid one cache line miss to ring doorbell
This patch caches doorbell address directly in struct mlx4_en_tx_ring.

This removes the need to bring in cpu caches whole struct mlx4_uar
in fast path.

Note that mlx4_uar is not guaranteed to be on a local node,
because mlx4_bf_alloc() uses a single free list (priv->bf_list)
regardless of its node parameter.

This kind of change does matter in presence of light/moderate traffic.
In high stress, this read-only line would be kept hot in caches.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-04 12:50:13 +01:00
Jakub Kicinski
a96d317fb1 ethernet: use eth_hw_addr_set()
Convert all Ethernet drivers from memcpy(... ETH_ADDR)
to eth_hw_addr_set():

  @@
  expression dev, np;
  @@
  - memcpy(dev->dev_addr, np, ETH_ALEN)
  + eth_hw_addr_set(dev, np)

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-02 14:18:25 +01:00
Joshua Roys
dee3b2d0fa net/mlx4_en: Add XDP_REDIRECT statistics
Add counters for XDP REDIRECT success and failure. This brings the
redirect path in line with metrics gathered via the other XDP paths.

Signed-off-by: Joshua Roys <roysjosh@gmail.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-30 14:14:30 +01:00
Gustavo A. R. Silva
f69bf5dee7 net/mlx4: Use array_size() helper in copy_to_user()
Use array_size() helper instead of the open-coded version in
copy_to_user(). These sorts of multiplication factors need
to be wrapped in array_size().

Link: https://github.com/KSPP/linux/issues/160
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-29 11:32:14 +01:00
Leon Romanovsky
1e72685916 net/mlx4: Move devlink_register to be the last initialization command
Refactor the code to make sure that devlink_register() is the last
command during initialization stage.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-27 16:31:59 +01:00
Joshua Roys
a8551c9b75 net: mlx4: Add support for XDP_REDIRECT
Signed-off-by: Joshua Roys <roysjosh@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-24 14:10:35 +01:00
Jakub Kicinski
2fcd14d0f7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
net/mptcp/protocol.c
  977d293e23 ("mptcp: ensure tx skbs always have the MPTCP ext")
  efe686ffce ("mptcp: ensure tx skbs always have the MPTCP ext")

same patch merged in both trees, keep net-next.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-09-23 11:19:49 -07:00
Aya Levin
fdbccea419 net/mlx4_en: Don't allow aRFS for encapsulated packets
Driver doesn't support aRFS for encapsulated packets, return early error
in such a case.

Fixes: 1eb8c695bd ("net/mlx4_en: Add accelerated RFS support")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-23 13:17:39 +01:00
Leon Romanovsky
db4278c55f devlink: Make devlink_register to be void
devlink_register() can't fail and always returns success, but all drivers
are obligated to check returned status anyway. This adds a lot of boilerplate
code to handle impossible flow.

Make devlink_register() void and simplify the drivers that use that
API call.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Acked-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Vladimir Oltean <olteanv@gmail.com> # dsa
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-22 14:15:12 +01:00
Lama Kayal
72a3c58d18 net/mlx4_en: Resolve bad operstate value
Any link state change that's done prior to net device registration
isn't reflected on the state, thus the operational state is left
obsolete, with 'UNKNOWN' status.

To resolve the issue, query link state from FW upon open operations
to ensure operational state is updated.

Fixes: c27a02cd94 ("mlx4_en: Add driver for Mellanox ConnectX 10GbE NIC")
Signed-off-by: Lama Kayal <lkayal@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-19 13:21:04 +01:00
Yufeng Mo
f3ccfda193 ethtool: extend coalesce setting uAPI with CQE mode
In order to support more coalesce parameters through netlink,
add two new parameter kernel_coal and extack for .set_coalesce
and .get_coalesce, then some extra info can return to user with
the netlink API.

Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-24 07:38:29 -07:00
Christophe JAILLET
eb9c5c0d3a net/mellanox: switch from 'pci_' to 'dma_' API
The wrappers in include/linux/pci-dma-compat.h should go away.

The patch has been generated with the coccinelle script below.

It has been hand modified to use 'dma_set_mask_and_coherent()' instead of
'pci_set_dma_mask()/pci_set_consistent_dma_mask()' when applicable.
This is less verbose.

It has been compile tested.

@@
@@
-    PCI_DMA_BIDIRECTIONAL
+    DMA_BIDIRECTIONAL

@@
@@
-    PCI_DMA_TODEVICE
+    DMA_TO_DEVICE

@@
@@
-    PCI_DMA_FROMDEVICE
+    DMA_FROM_DEVICE

@@
@@
-    PCI_DMA_NONE
+    DMA_NONE

@@
expression e1, e2, e3;
@@
-    pci_alloc_consistent(e1, e2, e3)
+    dma_alloc_coherent(&e1->dev, e2, e3, GFP_)

@@
expression e1, e2, e3;
@@
-    pci_zalloc_consistent(e1, e2, e3)
+    dma_alloc_coherent(&e1->dev, e2, e3, GFP_)

@@
expression e1, e2, e3, e4;
@@
-    pci_free_consistent(e1, e2, e3, e4)
+    dma_free_coherent(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_map_single(e1, e2, e3, e4)
+    dma_map_single(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_unmap_single(e1, e2, e3, e4)
+    dma_unmap_single(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4, e5;
@@
-    pci_map_page(e1, e2, e3, e4, e5)
+    dma_map_page(&e1->dev, e2, e3, e4, e5)

@@
expression e1, e2, e3, e4;
@@
-    pci_unmap_page(e1, e2, e3, e4)
+    dma_unmap_page(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_map_sg(e1, e2, e3, e4)
+    dma_map_sg(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_unmap_sg(e1, e2, e3, e4)
+    dma_unmap_sg(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_dma_sync_single_for_cpu(e1, e2, e3, e4)
+    dma_sync_single_for_cpu(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_dma_sync_single_for_device(e1, e2, e3, e4)
+    dma_sync_single_for_device(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_dma_sync_sg_for_cpu(e1, e2, e3, e4)
+    dma_sync_sg_for_cpu(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_dma_sync_sg_for_device(e1, e2, e3, e4)
+    dma_sync_sg_for_device(&e1->dev, e2, e3, e4)

@@
expression e1, e2;
@@
-    pci_dma_mapping_error(e1, e2)
+    dma_mapping_error(&e1->dev, e2)

@@
expression e1, e2;
@@
-    pci_set_dma_mask(e1, e2)
+    dma_set_mask(&e1->dev, e2)

@@
expression e1, e2;
@@
-    pci_set_consistent_dma_mask(e1, e2)
+    dma_set_coherent_mask(&e1->dev, e2)

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-23 12:02:28 +01:00
Jason Wang
19b8ece42c net/mlx4: Use ARRAY_SIZE to get an array's size
The ARRAY_SIZE macro is defined to get an array's size which is
more compact and more formal in linux source. Thus, we can replace
the long sizeof(arr)/sizeof(arr[0]) with the compact ARRAY_SIZE.

Signed-off-by: Jason Wang <wangborong@cdjrlc.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20210817121106.44189-1-wangborong@cdjrlc.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-18 15:16:54 -07:00
Arnd Bergmann
e5f3155267 ethernet: fix PTP_1588_CLOCK dependencies
The 'imply' keyword does not do what most people think it does, it only
politely asks Kconfig to turn on another symbol, but does not prevent
it from being disabled manually or built as a loadable module when the
user is built-in. In the ICE driver, the latter now causes a link failure:

aarch64-linux-ld: drivers/net/ethernet/intel/ice/ice_main.o: in function `ice_eth_ioctl':
ice_main.c:(.text+0x13b0): undefined reference to `ice_ptp_get_ts_config'
ice_main.c:(.text+0x13b0): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `ice_ptp_get_ts_config'
aarch64-linux-ld: ice_main.c:(.text+0x13bc): undefined reference to `ice_ptp_set_ts_config'
ice_main.c:(.text+0x13bc): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `ice_ptp_set_ts_config'
aarch64-linux-ld: drivers/net/ethernet/intel/ice/ice_main.o: in function `ice_prepare_for_reset':
ice_main.c:(.text+0x31fc): undefined reference to `ice_ptp_release'
ice_main.c:(.text+0x31fc): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `ice_ptp_release'
aarch64-linux-ld: drivers/net/ethernet/intel/ice/ice_main.o: in function `ice_rebuild':

This is a recurring problem in many drivers, and we have discussed
it several times befores, without reaching a consensus. I'm providing
a link to the previous email thread for reference, which discusses
some related problems.

To solve the dependency issue better than the 'imply' keyword, introduce a
separate Kconfig symbol "CONFIG_PTP_1588_CLOCK_OPTIONAL" that any driver
can depend on if it is able to use PTP support when available, but works
fine without it. Whenever CONFIG_PTP_1588_CLOCK=m, those drivers are
then prevented from being built-in, the same way as with a 'depends on
PTP_1588_CLOCK || !PTP_1588_CLOCK' dependency that does the same trick,
but that can be rather confusing when you first see it.

Since this should cover the dependencies correctly, the IS_REACHABLE()
hack in the header is no longer needed now, and can be turned back
into a normal IS_ENABLED() check. Any driver that gets the dependency
wrong will now cause a link time failure rather than being unable to use
PTP support when that is in a loadable module.

However, the two recently added ptp_get_vclocks_index() and
ptp_convert_timestamp() interfaces are only called from builtin code with
ethtool and socket timestamps, so keep the current behavior by stubbing
those out completely when PTP is in a loadable module. This should be
addressed properly in a follow-up.

As Richard suggested, we may want to actually turn PTP support into a
'bool' option later on, preventing it from being a loadable module
altogether, which would be one way to solve the problem with the ethtool
interface.

Fixes: 06c16d89d2 ("ice: register 1588 PTP clock device object for E810 devices")
Link: https://lore.kernel.org/netdev/20210804121318.337276-1-arnd@kernel.org/
Link: https://lore.kernel.org/netdev/CAK8P3a06enZOf=XyZ+zcAwBczv41UuCTz+=0FMf2gBz1_cOnZQ@mail.gmail.com/
Link: https://lore.kernel.org/netdev/CAK8P3a3=eOxE-K25754+fB_-i_0BZzf9a9RfPTX3ppSwu9WZXw@mail.gmail.com/
Link: https://lore.kernel.org/netdev/20210726084540.3282344-1-arnd@kernel.org/
Acked-by: Shannon Nelson <snelson@pensando.io>
Acked-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20210812183509.1362782-1-arnd@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-13 17:49:05 -07:00
Leon Romanovsky
919d13a7e4 devlink: Set device as early as possible
All kernel devlink implementations call to devlink_alloc() during
initialization routine for specific device which is used later as
a parent device for devlink_register().

Such late device assignment causes to the situation which requires us to
call to device_register() before setting other parameters, but that call
opens devlink to the world and makes accessible for the netlink users.

Any attempt to move devlink_register() to be the last call generates the
following error due to access to the devlink->dev pointer.

[    8.758862]  devlink_nl_param_fill+0x2e8/0xe50
[    8.760305]  devlink_param_notify+0x6d/0x180
[    8.760435]  __devlink_params_register+0x2f1/0x670
[    8.760558]  devlink_params_register+0x1e/0x20

The simple change of API to set devlink device in the devlink_alloc()
instead of devlink_register() fixes all this above and ensures that
prior to call to devlink_register() everything already set.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-09 10:21:40 +01:00
Colin Ian King
7cdd0a89ec net/mlx4: make the array states static const, makes object smaller
Don't populate the array states on the stack but instead it
static const. Makes the object code smaller by 79 bytes.

Before:
   text   data   bss    dec    hex filename
  21309   8304   192  29805   746d drivers/net/ethernet/mellanox/mlx4/qp.o

After:
   text   data   bss    dec    hex filename
  21166   8368   192  29726   741e drivers/net/ethernet/mellanox/mlx4/qp.o

(gcc version 10.2.0)

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20210801153742.147304-1-colin.king@canonical.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-02 15:02:13 -07:00
Jakub Kicinski
d2e11fd2b7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Conflicting commits, all resolutions pretty trivial:

drivers/bus/mhi/pci_generic.c
  5c2c853159 ("bus: mhi: pci-generic: configurable network interface MRU")
  56f6f4c4eb ("bus: mhi: pci_generic: Apply no-op for wake using sideband wake boolean")

drivers/nfc/s3fwrn5/firmware.c
  a0302ff590 ("nfc: s3fwrn5: remove unnecessary label")
  46573e3ab0 ("nfc: s3fwrn5: fix undefined parameter values in dev_err()")
  801e541c79 ("nfc: s3fwrn5: fix undefined parameter values in dev_err()")

MAINTAINERS
  7d901a1e87 ("net: phy: add Maxlinear GPY115/21x/24x driver")
  8a7b46fa79 ("MAINTAINERS: add Yasushi SHOJI as reviewer for the Microchip CAN BUS Analyzer Tool driver")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-07-31 09:14:46 -07:00
Arnd Bergmann
a76053707d dev_ioctl: split out ndo_eth_ioctl
Most users of ndo_do_ioctl are ethernet drivers that implement
the MII commands SIOCGMIIPHY/SIOCGMIIREG/SIOCSMIIREG, or hardware
timestamping with SIOCSHWTSTAMP/SIOCGHWTSTAMP.

Separate these from the few drivers that use ndo_do_ioctl to
implement SIOCBOND, SIOCBR and SIOCWANDEV commands.

This is a purely cosmetic change intended to help readers find
their way through the implementation.

Cc: Doug Ledford <dledford@redhat.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jay Vosburgh <j.vosburgh@gmail.com>
Cc: Veaceslav Falico <vfalico@gmail.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Vivien Didelot <vivien.didelot@gmail.com>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Vladimir Oltean <olteanv@gmail.com>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: linux-rdma@vger.kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-27 20:11:45 +01:00
Jiapeng Chong
7e4960b3d6 mlx4: Fix missing error code in mlx4_load_one()
The error code is missing in this code scenario, add the error code
'-EINVAL' to the return value 'err'.

Eliminate the follow smatch warning:

drivers/net/ethernet/mellanox/mlx4/main.c:3538 mlx4_load_one() warn:
missing error code 'err'.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Fixes: 7ae0e400cd ("net/mlx4_core: Flexible (asymmetric) allocation of EQs and MSI-X vectors for PF/VFs")
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-25 10:47:24 +01:00
Linus Torvalds
383df634f1 More fallthrough fixes for Clang for 5.14-rc1
Hi Linus,
 
 Please, pull the following patches that fix many fall-through warnings
 when building with Clang 12.0.0 and this[1] change reverted. Notice
 that in order to enable -Wimplicit-fallthrough for Clang, such change[1]
 is meant to be reverted at some point. So, these patches help to move
 in that direction.
 
 Thanks!
 
 [1] commit e2079e93f5 ("kbuild: Do not enable -Wimplicit-fallthrough for clang for now")
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEkmRahXBSurMIg1YvRwW0y0cG2zEFAmDlNrUACgkQRwW0y0cG
 2zH/WRAAmQ1o2HoyWp1KCoad6mR6EeCXNAEBSF2F8PL+Yi9oGPTJSRVEn5EPO4wZ
 9X15lnMo63nwYx09Ka96uqbXKf90bnf/EvwB7+GY+3+qAUw7wbgxW9wW0208gddl
 u8JTQKHEfUZVMEyye5f7FSNwTVVOtsX581GDsYIFxmiuO00BkoQdoMm43NNXb2b2
 W3PlCFNjRkbN7kqvCwTKRQJZw586/7hv6SiNA1UgBVeFuJEZvhC2Uaj7oOXHv5AP
 ICQDZI419Zzb7Vh2JcPFCsc00Ak33YEEpdU+iTX0xVyPkFIW6/wsgaFlGYugRR3n
 PxdpJtrvVj7uW/ncJ1UaOAaZZ3qfs2qTbrfoa+ybN+qfmocVz9Zlh9+bWn00Ymxw
 sk+MUtcfJF6Lt0IgbFgtYtruBo7Qp8IPalJuyoBpXdhLHJUifT+n3qdhPNrH2rpo
 uHuuseg5OJHsP3QxBA2Q4KtwhvI84+RDFCfwrHdSM5BitCFAyYGOLSzbtZHvcpPX
 EKW2P4toqZlmxCX+Bfxdc6DjqgogUm/38YcMjoOX5h4leugyWRjuBbCgX+xh121W
 wz5g02qUaoTNTSNylN/mL3POKVLv7PgZNj/5OXu5Cz3hk9qt/WqfwCQI6caxzrjR
 sRIlzP4qc4THWa9I99XbR2/8jXLjK0mZV9xwgO8XMcCDMAxrugA=
 =2p48
 -----END PGP SIGNATURE-----

Merge tag 'Wimplicit-fallthrough-clang-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux

Pull more fallthrough fixes from Gustavo Silva:
 "Fix maore fall-through warnings when building the kernel with clang
  and '-Wimplicit-fallthrough'"

* tag 'Wimplicit-fallthrough-clang-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux:
  Input: Fix fall-through warning for Clang
  scsi: aic94xx: Fix fall-through warning for Clang
  i3c: master: cdns: Fix fall-through warning for Clang
  net/mlx4: Fix fall-through warning for Clang
2021-07-07 11:03:04 -07:00
Gustavo A. R. Silva
6ca24c6563 net/mlx4: Fix fall-through warning for Clang
In preparation to enable -Wimplicit-fallthrough for Clang, fix a warning
by explicitly adding a break statement instead of just letting the code
fall through to the next case.

Link: https://github.com/KSPP/linux/issues/115
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
2021-06-29 08:17:23 -05:00
David S. Miller
e1289cfb63 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2021-06-28

The following pull-request contains BPF updates for your *net-next* tree.

We've added 37 non-merge commits during the last 12 day(s) which contain
a total of 56 files changed, 394 insertions(+), 380 deletions(-).

The main changes are:

1) XDP driver RCU cleanups, from Toke Høiland-Jørgensen and Paul E. McKenney.

2) Fix bpf_skb_change_proto() IPv4/v6 GSO handling, from Maciej Żenczykowski.

3) Fix false positive kmemleak report for BPF ringbuf alloc, from Rustam Kovhaev.

4) Fix x86 JIT's extable offset calculation for PROBE_LDX NULL, from Ravi Bangoria.

5) Enable libbpf fallback probing with tracing under RHEL7, from Jonathan Edwards.

6) Clean up x86 JIT to remove unused cnt tracking from EMIT macro, from Jiri Olsa.

7) Netlink cleanups for libbpf to please Coverity, from Kumar Kartikeya Dwivedi.

8) Allow to retrieve ancestor cgroup id in tracing programs, from Namhyung Kim.

9) Fix lirc BPF program query to use user-provided prog_cnt, from Sean Young.

10) Add initial libbpf doc including generated kdoc for its API, from Grant Seltzer.

11) Make xdp_rxq_info_unreg_mem_model() more robust, from Jakub Kicinski.

12) Fix up bpfilter startup log-level to info level, from Gary Lin.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-28 15:28:03 -07:00
Toke Høiland-Jørgensen
c4411b371c mlx4: Remove rcu_read_lock() around XDP program invocation
The mlx4 driver has rcu_read_lock()/rcu_read_unlock() pairs around XDP
program invocations. However, the actual lifetime of the objects referred
by the XDP program invocation is longer, all the way through to the call to
xdp_do_flush(), making the scope of the rcu_read_lock() too small. This
turns out to be harmless because it all happens in a single NAPI poll
cycle (and thus under local_bh_disable()), but it makes the rcu_read_lock()
misleading.

Rather than extend the scope of the rcu_read_lock(), just get rid of it
entirely. With the addition of RCU annotations to the XDP_REDIRECT map
types that take bh execution into account, lockdep even understands this to
be safe, so there's really no reason to keep it around. Also switch the RCU
dereferences in the driver loop itself to the _bh variants.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/bpf/20210624160609.292325-14-toke@redhat.com
2021-06-24 19:45:18 +02:00
Jakub Kicinski
adc2e56ebe Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Trivial conflicts in net/can/isotp.c and
tools/testing/selftests/net/mptcp/mptcp_connect.sh

scaled_ppm_to_ppb() was moved from drivers/ptp/ptp_clock.c
to include/linux/ptp_clock_kernel.h in -next so re-apply
the fix there.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-06-18 19:47:02 -07:00
Matteo Croce
c420c98982 skbuff: add a parameter to __skb_frag_unref
This is a prerequisite patch, the next one is enabling recycling of
skbs and fragments. Add an extra argument on __skb_frag_unref() to
handle recycling, and update the current users of the function with that.

Signed-off-by: Matteo Croce <mcroce@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-07 14:11:47 -07:00
Shay Drory
404e5a1269 RDMA/mlx4: Do not map the core_clock page to user space unless enabled
Currently when mlx4 maps the hca_core_clock page to the user space there
are read-modifiable registers, one of which is semaphore, on this page as
well as the clock counter. If user reads the wrong offset, it can modify
the semaphore and hang the device.

Do not map the hca_core_clock page to the user space unless the device has
been put in a backwards compatibility mode to support this feature.

After this patch, mlx4 core_clock won't be mapped to user space on the
majority of existing devices and the uverbs device time feature in
ibv_query_rt_values_ex() will be disabled.

Fixes: 52033cfb5a ("IB/mlx4: Add mmap call to map the hardware clock")
Link: https://lore.kernel.org/r/9632304e0d6790af84b3b706d8c18732bc0d5e27.1622726305.git.leonro@nvidia.com
Signed-off-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-03 14:19:53 -03:00
Vladyslav Tarasiuk
db825feefc net/mlx4: Fix EEPROM dump support
Fix SFP and QSFP* EEPROM queries by setting i2c_address, offset and page
number correctly. For SFP set the following params:
- I2C address for offsets 0-255 is 0x50. For 256-511 - 0x51.
- Page number is zero.
- Offset is 0-255.

At the same time, QSFP* parameters are different:
- I2C address is always 0x50.
- Page number is not limited to zero.
- Offset is 0-255 for page zero and 128-255 for others.

To set parameters accordingly to cable used, implement function to query
module ID and implement respective helper functions to set parameters
correctly.

Fixes: 135dd9594f ("net/mlx4_en: ethtool, Remove unsupported SFP EEPROM high pages query")
Signed-off-by: Vladyslav Tarasiuk <vladyslavt@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-10 14:34:39 -07:00
Hans Westgaard Ry
79ebfb11fe net/mlx4: Treat VFs fair when handling comm_channel_events
Handling comm_channel_event in mlx4_master_comm_channel uses a double
loop to determine which slaves have requested work. The search is
always started at lowest slave. This leads to unfairness; lower VFs
tends to be prioritized over higher VFs.

The patch uses find_next_bit to determine which slaves to handle.
Fairness is implemented by always starting at the next to the last
start.

An MPI program has been used to measure improvements. It runs 500
ibv_reg_mr, synchronizes with all other instances and then runs 500
ibv_dereg_mr.

The results running 500 processes, time reported is for running 500
calls:

ibv_reg_mr:
             Mod.   Org.
mlx4_1    403.356ms 424.674ms
mlx4_2    403.355ms 424.674ms
mlx4_3    403.354ms 424.674ms
mlx4_4    403.355ms 424.674ms
mlx4_5    403.357ms 424.677ms
mlx4_6    403.354ms 424.676ms
mlx4_7    403.357ms 424.675ms
mlx4_8    403.355ms 424.675ms

ibv_dereg_mr:
             Mod.   Org.
mlx4_1    116.408ms 142.818ms
mlx4_2    116.434ms 142.793ms
mlx4_3    116.488ms 143.247ms
mlx4_4    116.679ms 143.230ms
mlx4_5    112.017ms 107.204ms
mlx4_6    112.032ms 107.516ms
mlx4_7    112.083ms 184.195ms
mlx4_8    115.089ms 190.618ms

Suggested-by: Håkon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Hans Westgaard Ry <hans.westgaard.ry@oracle.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-22 14:59:26 -07:00
Kevin(Yudong) Yang
00ff801bb8 net/mlx4_en: update moderation when config reset
This patch fixes a bug that the moderation config will not be
applied when calling mlx4_en_reset_config. For example, when
turning on rx timestamping, mlx4_en_reset_config() will be called,
causing the NIC to forget previous moderation config.

This fix is in phase with a previous fix:
commit 79c54b6bbf ("net/mlx4_en: Fix TX moderation info loss
after set_ringparam is called")

Tested: Before this patch, on a host with NIC using mlx4, run
netserver and stream TCP to the host at full utilization.
$ sar -I SUM 1
                 INTR    intr/s
14:03:56          sum  48758.00

After rx hwtstamp is enabled:
$ sar -I SUM 1
14:10:38          sum 317771.00
We see the moderation is not working properly and issued 7x more
interrupts.

After the patch, and turned on rx hwtstamp, the rate of interrupts
is as expected:
$ sar -I SUM 1
14:52:11          sum  49332.00

Fixes: 79c54b6bbf ("net/mlx4_en: Fix TX moderation info loss after set_ringparam is called")
Signed-off-by: Kevin(Yudong) Yang <yyd@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
CC: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-05 12:42:31 -08:00
Chuhong Yuan
8eb65fda4a net/mlx4_core: Add missed mlx4_free_cmd_mailbox()
mlx4_do_mirror_rule() forgets to call mlx4_free_cmd_mailbox() to
free the memory region allocated by mlx4_alloc_cmd_mailbox() before
an exit.
Add the missed call to fix it.

Fixes: 78efed2751 ("net/mlx4_core: Support mirroring VF DMFS rules on both ports")
Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20210221143559.390277-1-hslester96@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-22 19:08:33 -08:00
Lorenzo Bianconi
be9df4aff6 net, xdp: Introduce xdp_prepare_buff utility routine
Introduce xdp_prepare_buff utility routine to initialize per-descriptor
xdp_buff fields (e.g. xdp_buff pointers). Rely on xdp_prepare_buff() in
all XDP capable drivers.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Shay Agroskin <shayagr@amazon.com>
Acked-by: Martin Habets <habetsm.xilinx@gmail.com>
Acked-by: Camelia Groza <camelia.groza@nxp.com>
Acked-by: Marcin Wojtas <mw@semihalf.com>
Link: https://lore.kernel.org/bpf/45f46f12295972a97da8ca01990b3e71501e9d89.1608670965.git.lorenzo@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2021-01-08 13:39:24 -08:00