All WQ types moved to using the fragmented allocation API
for coherent memory. Contiguous API is not used anymore.
Remove it and reduce the number of exported functions.
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
On a tuple offload del command, the driver tries to remove tuples from
the hashtable twice: once directly via mlx5_tc_ct_entry_remove_from_tuples(),
and a second time in the subsequent mlx5_tc_ct_entry_put()->
mlx5_tc_ct_entry_del()->mlx5_tc_ct_entry_remove_from_tuples() call.
This doesn't cause any issue since rhashtable first checks if the
removed object exists in the hashtable.
Remove the extra mlx5_tc_ct_entry_remove_from_tuples().
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
It can be calculated via the mlx5dr_ste_get_hw_ste() function, which is
very simple and lightweight, so there is no need for a dedicated member.
This reduces struct mlx5dr_ste by 8 bytes; its size is now 48 bytes.
Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
Reviewed-by: Shun Hao <shunh@nvidia.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Remove chunk_size in struct mlx5dr_icm_chunk and use
chunk->size instead.
Remove ste_arr/hw_ste_arr/miss_list since they can be accessed via the
htbl->chunk pointer; there is no need to keep a copy.
This commit reduces struct mlx5dr_ste_htbl by 28 bytes; its size is
now 32 bytes.
Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
Reviewed-by: Shun Hao <shunh@nvidia.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The goal is to reduce memory consumption with a large number of flow rules.
These values can be calculated quickly from the buddy memory pool:
1. num_of_entries is obtained via dr_icm_pool_get_chunk_num_of_entries().
2. byte_size is obtained via dr_icm_pool_get_chunk_byte_size().
Use the chunk size in dr_icm_chunk to speed this up; the one in dr_ste_htbl
will be removed in the upcoming commit.
This commit reduces struct mlx5_dr_icm_chunk by 8 bytes; its current
size is 56 bytes.
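As a rough illustration of the pattern (simplified, hypothetical
signatures; the real helpers operate on struct mlx5dr_icm_chunk), the
values are derived from the power-of-two buddy order instead of being
cached in every chunk:

    /* Hypothetical sketch: derive chunk geometry from the buddy order. */
    static unsigned int dr_chunk_num_of_entries(unsigned int order)
    {
            return 1U << order;     /* buddy chunks are power-of-two sized */
    }

    static unsigned int dr_chunk_byte_size(unsigned int order,
                                           unsigned int entry_size)
    {
            return dr_chunk_num_of_entries(order) * entry_size;
    }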
Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
Reviewed-by: Shun Hao <shunh@nvidia.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
It can be calculated quickly from the buddy memory pool via
mlx5dr_icm_pool_get_chunk_icm_addr(). This function is very lightweight
and straightforward.
This reduces struct mlx5_dr_icm_chunk by 8 bytes; its current size
is 64 bytes.
Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
Reviewed-by: Shun Hao <shunh@nvidia.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reduce memory footprint by removing mr_addr and rkey from
mlx5_dr_icm_chunk.
1. mr_addr is calculated by mlx5dr_icm_pool_get_chunk_mr_addr()
2. rkey is calculated by mlx5dr_icm_pool_get_chunk_rkey()
The two new functions are very lightweight and straightforward.
This reduces struct mlx5_dr_icm_chunk by 8 bytes; its current size is
72 bytes.
Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
Reviewed-by: Shun Hao <shunh@nvidia.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
According to profiling, mlx5dr_ste and mlx5dr_icm_chunk are the two
hot structures. Their memory layout can be optimized by reordering
their members.
Struct mlx5dr_ste size changes from 64 bytes to 56 bytes.
In the upcoming commits, struct mlx5dr_icm_chunk memory layout
will change automatically after removing some members.
Keep it untouched here.
Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
Reviewed-by: Shun Hao <shunh@nvidia.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The packet size in mlx5e_skb_from_cqe_mpwrq_linear can't overflow u16,
since the maximum packet size in linear striding RQ is 2^13 bytes. Drop
the unneeded u32 variable.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The len parameter of mlx5e_xdp_handle is used to output the new packet
length after XDP has processed the packet and returned XDP_PASS.
However, this value can be calculated at the call site, as the caller
knows whether it was an XDP_PASS.
This commit drops the len parameter and moves the calculation to the
caller, reducing the number of parameters passed to the function and
preparing for XDP support in non-linear legacy RQ.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Instead of early return inside mlx5e_xdp_handle(), let the caller check
if an XDP program is loaded. This saves a few unnecessary
function calls and calculations in the !prog case.
Performance test: single core, drop packets in iptables
Before: 3,872,504 pps
After: 3,975,628 pps (+2.66%)
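A rough sketch of the caller-side check (simplified, not the exact
driver signature):

    /* Only enter the XDP path when a program is attached; the common
     * !prog case now pays no extra function call.
     */
    prog = rcu_dereference(rq->xdp_prog);
    if (prog && mlx5e_xdp_handle(rq, prog, &xdp))
            return NULL;    /* packet consumed by XDP (TX/REDIRECT/DROP) */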
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
As a performance optimization and preparation to enabling XDP multi
buffer on non-linear legacy RQ, build the linear part of the SKB over
the first fragment, instead of allocating a new buffer and copying the
first 256 bytes there.
To achieve this, add headroom and tailroom to the first fragment.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Currently, rq->buff.headroom is applied to all fragments in legacy RQ.
In the linear mode, there is a non-zero headroom, but there is only one
fragment per packet. In the non-linear mode, the headroom is zero.
This commit changes the logic to apply the headroom only to the first
fragment. The current behavior remains the same for both linear and
non-linear modes. However, it allows the next commit to enable headroom
for the non-linear mode, which will be applied only to the first
fragment.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
mlx5e_build_rq_frags_info() assumes that MTU is not bigger than
PAGE_SIZE * MLX5E_MAX_RX_FRAGS, which is 16K for 4K pages. Currently,
the firmware limits MTU to 10K, so the assumption doesn't lead to a bug.
This commit adds an additional driver check for reliability, since the
firmware boundary might change.
The calculation is moved to a separate function with a comment
explaining it, as a preparation for the following patches that
introduce XDP multi buffer support.
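A minimal sketch of the kind of check being added (illustrative names,
not the exact helper in the driver):

    static int validate_rx_frags(u32 byte_count, u32 first_frag_size)
    {
            /* The first fragment holds less because of headroom/tailroom;
             * the remaining fragments are full pages.
             */
            u32 max = first_frag_size + (MLX5E_MAX_RX_FRAGS - 1) * PAGE_SIZE;

            return byte_count > max ? -EINVAL : 0;
    }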
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Recent commit 974578017f ("iavf: Add waiting so the port is
initialized in remove") adds a wait-loop at the beginning of
iavf_remove() to ensure that port initialization is finished
prior to unregistering the net device. This causes a regression
in the reboot/shutdown scenario, because in this case the
iavf_shutdown() callback is called; it detaches the device,
brings it down if it is running and sets its state to __IAVF_REMOVE.
Later the shutdown callback of the associated PF driver
(e.g. ice_shutdown) is called. That callback calls, among other things,
sriov_disable(), which indirectly calls iavf_remove() (see the stack
trace below). As the adapter state is already __IAVF_REMOVE, the
mentioned loop is endless and the shutdown process hangs.
Fix this by checking the adapter's state at the beginning
of iavf_remove() and skipping the rest of the function if the adapter
is already in the remove state (i.e. shutdown is in progress).
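The fix boils down to an early exit at the top of iavf_remove(),
roughly (sketch):

    /* iavf_shutdown() already detached the device and set __IAVF_REMOVE;
     * don't wait for an initialization that will never finish.
     */
    if (adapter->state == __IAVF_REMOVE)
            return;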
Reproducer:
1. Create VF on PF driven by ice or i40e driver
2. Ensure that the VF is bound to iavf driver
3. Reboot
[52625.981294] sysrq: SysRq : Show Blocked State
[52625.988377] task:reboot state:D stack: 0 pid:17359 ppid: 1 f2
[52625.996732] Call Trace:
[52625.999187] __schedule+0x2d1/0x830
[52626.007400] schedule+0x35/0xa0
[52626.010545] schedule_hrtimeout_range_clock+0x83/0x100
[52626.020046] usleep_range+0x5b/0x80
[52626.023540] iavf_remove+0x63/0x5b0 [iavf]
[52626.027645] pci_device_remove+0x3b/0xc0
[52626.031572] device_release_driver_internal+0x103/0x1f0
[52626.036805] pci_stop_bus_device+0x72/0xa0
[52626.040904] pci_stop_and_remove_bus_device+0xe/0x20
[52626.045870] pci_iov_remove_virtfn+0xba/0x120
[52626.050232] sriov_disable+0x2f/0xe0
[52626.053813] ice_free_vfs+0x7c/0x340 [ice]
[52626.057946] ice_remove+0x220/0x240 [ice]
[52626.061967] ice_shutdown+0x16/0x50 [ice]
[52626.065987] pci_device_shutdown+0x34/0x60
[52626.070086] device_shutdown+0x165/0x1c5
[52626.074011] kernel_restart+0xe/0x30
[52626.077593] __do_sys_reboot+0x1d2/0x210
[52626.093815] do_syscall_64+0x5b/0x1a0
[52626.097483] entry_SYSCALL_64_after_hwframe+0x65/0xca
Fixes: 974578017f ("iavf: Add waiting so the port is initialized in remove")
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Link: https://lore.kernel.org/r/20220317104524.2802848-1-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
ACL rules can be offloaded to VCAP IS2 either through chain 0, or, since
the blamed commit, through a chain index whose number encodes a specific
PAG (Policy Action Group) and lookup number.
The chain number is translated through ocelot_chain_to_pag() into a PAG,
and through ocelot_chain_to_lookup() into a lookup number.
The problem with the blamed commit is that the above 2 functions don't
have special treatment for chain 0. So ocelot_chain_to_pag(0) returns
filter->pag = 224, which is in fact -32, but the "pag" field is a u8.
So we end up programming the hardware with VCAP IS2 entries having a PAG
of 224. But the way in which the PAG works is that it defines a subset
of VCAP IS2 filters which should match on a packet. The default PAG is
0, and previous VCAP IS1 rules (which we offload using 'goto') can
modify it. So basically, we are installing filters with a PAG on which
no packet will ever match. This is the hardware equivalent of adding
filters to a chain which has no 'goto' to it.
Restore the previous functionality by making ACL filters offloaded to
chain 0 go to PAG 0 and lookup number 0. The choice of PAG is clearly
correct, but the choice of lookup number isn't "as before" (which was to
leave the lookup a "don't care"). However, lookup 0 should be fine,
since even though there are ACL actions (policers) which have a
requirement to be used in a specific lookup, that lookup is 0.
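Conceptually, the restored special case looks like this (sketch; the
helper used for the non-zero decode is hypothetical):

    static int ocelot_chain_to_pag(int chain)
    {
            if (!chain)
                    return 0;       /* default PAG, matched without a 'goto' */

            return decode_pag_from_chain(chain);    /* hypothetical helper */
    }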
Fixes: 226e9cd82a ("net: mscc: ocelot: only install TCAM entries into a specific lookup and PAG")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20220316192117.2568261-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The RXCHK block will return a partial checksum of 0 if it encounters
a problem while receiving a packet. Since a 1's complement sum can
only produce this result if no bits are set in the received data
stream it is fair to treat it as an invalid partial checksum and
not pass it up the stack.
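In other words, the RX path should only report CHECKSUM_COMPLETE for a
non-zero sum, roughly (sketch with simplified names):

    if (rx_csum_enabled && hw_csum != 0) {
            skb->csum = (__force __wsum)hw_csum;
            skb->ip_summed = CHECKSUM_COMPLETE;
    }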
Fixes: 8101553978 ("net: bcmgenet: use CHECKSUM_COMPLETE for NETIF_F_RXCSUM")
Signed-off-by: Doug Berger <opendmb@gmail.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20220317012812.1313196-1-opendmb@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit b7a49f7305 ("bnx2x: Utilize firmware 7.13.21.0")
added request_firmware() logic in probe(), which caused a
load failure when the firmware file is not present in the initrd (below),
as access to the firmware file is not feasible during probe.
Direct firmware load for bnx2x/bnx2x-e2-7.13.15.0.fw failed with error -2
Direct firmware load for bnx2x/bnx2x-e2-7.13.21.0.fw failed with error -2
This patch fixes the issue by:
1. Removing the request_firmware() logic from probe(),
so that .ndo_open() handles it as it did before.
2. Since request_firmware() is removed from probe(), relaxing the FW
version comparisons a bit against the FW version already loaded
(by other PFs of the same adapter), to allow different but
compatible/close-enough firmware versions with which multiple PFs
may run (in different environments). The PF in the probe flow no
longer knows which firmware file version it will use to initialize
the device in ndo_open().
Link: https://lore.kernel.org/all/46f2d9d9-ae7f-b332-ddeb-b59802be2bab@molgen.mpg.de/
Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
Tested-by: Paul Menzel <pmenzel@molgen.mpg.de>
Fixes: b7a49f7305 ("bnx2x: Utilize firmware 7.13.21.0")
Signed-off-by: Manish Chopra <manishc@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Link: https://lore.kernel.org/r/20220316214613.6884-1-manishc@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Clang static analysis reports this representative issue
igb_ptp.c:997:3: warning: The left operand of '+' is a
garbage value
ktime_add_ns(shhwtstamps.hwtstamp, adjust);
^ ~~~~~~~~~~~~~~~~~~~~
shhwtstamps.hwtstamp is set by a call to
igb_ptp_systim_to_hwtstamp(). In the switch statement on the hw type,
the hwtstamp is zeroed for the matching cases but not in the default
case. Move the memset out of the switch statement; this removes the
garbage value from the default case and reduces the code size.
Also clean up some whitespace (empty lines).
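The shape of the change is roughly (sketch, abbreviated; hwtstamps is
the output parameter of igb_ptp_systim_to_hwtstamp()):

    memset(hwtstamps, 0, sizeof(*hwtstamps));       /* covers all cases now */
    switch (adapter->hw.mac.type) {
    case e1000_82576:
            /* per-type conversion of the raw systim value */
            break;
    default:
            /* hwtstamps stays zeroed instead of holding garbage */
            break;
    }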
Signed-off-by: Tom Rix <trix@redhat.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The calculation of the checksum can fail.
So move the conversion of the checksum to little endian
inside the return status check.
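A minimal sketch of the pattern (hypothetical function and variable
names):

    status = calc_nvm_checksum(hw, &checksum);
    if (!status)
            checksum_le = cpu_to_le16(checksum);    /* convert only on success */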
Signed-off-by: Tom Rix <trix@redhat.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
This code works but it has a static checker warning:
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c:1687 init_dma_rx_desc_rings()
warn: always true condition '(queue >= 0) => (0-u32max >= 0)'
Obviously, it makes no sense to check if an unsigned int is >= 0. What
prevents this code from being a forever loop is that later there is a
separate check for if (queue == 0).
The "queue" variable is less than MTL_MAX_RX_QUEUES (8) so it can easily
fit in an int type. Any larger value for "queue" would lead to an array
overflow when we assign "rx_q = &priv->rx_queue[queue]".
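As an illustration of why the signed type matters (generic sketch, not
the exact driver code):

    int queue;      /* signed, so 'queue >= 0' can actually become false */

    for (queue = rx_count - 1; queue >= 0; queue--)
            cleanup_rx_queue(priv, queue);  /* hypothetical helper */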
Fixes: de0b90e52a ("net: stmmac: rearrange RX and TX desc init into per-queue basis")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Link: https://lore.kernel.org/r/20220316083744.GB30941@kili
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The 98DX2530 SoC is similar to the Armada 3700 except it needs a
different MBUS window configuration. Add a new compatible string to
identify this device and the required MBUS window configuration.
Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Currently the MPLSoUDP encap offload does the L2 pop implicitly, while
adding such an action explicitly (vlan pop_eth) causes the rule to not
be offloaded.
Solve this by adding offload support for vlan pop_eth in the MPLSoUDP
encap case.
Flow example:
filter root protocol ip pref 1 flower chain 0
filter root protocol ip pref 1 flower chain 0 handle 0x1
eth_type ipv4
dst_ip 2.2.2.22
src_ip 2.2.2.21
in_hw in_hw_count 1
action order 1: vlan pop_eth pipe
index 1 ref 1 bind 1
used_hw_stats delayed
action order 2: mpls push protocol mpls_uc label 555 tc 3 ttl 255 pipe
index 1 ref 1 bind 1
used_hw_stats delayed
action order 3: tunnel_key set
src_ip 8.8.8.21
dst_ip 8.8.8.22
dst_port 6635
csum
tos 0x4
ttl 6 pipe
index 1 ref 1 bind 1
used_hw_stats delayed
action order 4: mirred (Egress Redirect to device bareudp0) stolen
index 1 ref 1 bind 1
used_hw_stats delayed
Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently, a pedit action on the source and destination MACs is used to
fill in the MACs in the L2 push step of the MPLSoUDP decap offload. This
isn't aligned with tc software, which uses the vlan eth_push action to
do this.
To fix that, add offload support for the vlan eth_push action together
with the mpls pop action, and deprecate the use of pedit on the MACs.
Flow example:
filter protocol mpls_uc pref 1 flower chain 0
filter protocol mpls_uc pref 1 flower chain 0 handle 0x1
eth_type 8847
mpls_label 555
enc_dst_port 6635
in_hw in_hw_count 1
action order 1: tunnel_key unset pipe
index 2 ref 1 bind 1
used_hw_stats delayed
action order 2: mpls pop protocol ip pipe
index 2 ref 1 bind 1
used_hw_stats delayed
action order 3: vlan push_eth dst_mac de:a2:ec:d6:69:c8 src_mac de:a2:ec:d6:69:c8 pipe
index 2 ref 1 bind 1
used_hw_stats delayed
action order 4: mirred (Egress Redirect to device enp8s0f0_0) stolen
index 2 ref 1 bind 1
used_hw_stats delayed
Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The irq value returned by platform_get_irq() cannot be returned
directly; there are operations that need to be undone first.
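The fix follows the usual goto-unwind pattern, roughly (sketch):

    irq = platform_get_irq(pdev, 0);
    if (irq < 0) {
            err = irq;
            goto out;       /* undo what probe has already set up */
    }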
Fixes: bf2b83425b ("net: mv643xx_eth: use platform_get_irq() instead of platform_get_resource()")
Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20220316012444.2126070-1-chi.minghao@zte.com.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Now that devlink ports are protected by the instance lock
it seems natural to pass devlink_port as an argument to
the port_split / port_unsplit callbacks.
This should save the drivers from doing a lookup.
In theory drivers may have supported unsplitting ports
which were not registered prior to this change.
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Let the core take the devlink instance lock around port splitting
and remove the now redundant locking in the drivers.
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Explicitly lock the devlink instance and use devl_ API.
This will be used by the subsequent patch to invoke
.port_split / .port_unsplit callbacks with devlink
instance lock held.
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The whole reason for existence of the pf mutex is that we could
not lock the devlink instance around port splitting. There are
more types of reconfig which can make ports appear or disappear.
Now that the devlink instance lock is exposed to drivers and
"locked" helpers exist we can switch to using the devlink lock
directly.
Next patches will move the locking inside .port_(un)split to
the core.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We can replace the PF lock with devlink instance lock in subsequent
changes. To make the patches easier to comprehend and limit line
lengths - factor out the existing locking assertions.
No functional changes.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We've previously run into many issues related to the latency of a Tx
timestamp completion with the ice hardware. It can be difficult to
determine the root cause of a slow Tx timestamp. To aid in this,
introduce new trace events which capture timing data about when the
driver reaches certain points while processing a transmit timestamp:
* ice_tx_tstamp_request: Trace when the stack initiates a new timestamp
request.
* ice_tx_tstamp_fw_req: Trace when the driver begins a read of the
timestamp register in the work thread.
* ice_tx_tstamp_fw_done: Trace when the driver finishes reading a
timestamp register in the work thread.
* ice_tx_tstamp_complete: Trace when the driver submits the skb back to
the stack with a completed Tx timestamp.
These trace events can be enabled using the standard trace event
subsystem exposed by the ice driver. If they are disabled, they become
no-ops with no run time cost.
The following is a simple GNU AWK script which can highlight one
potential way to use the trace events to capture latency data from the
trace buffer about how long the driver takes to process a timestamp:
-----
BEGIN {
	PREC=256
}

# Detect requests
/tx_tstamp_request/ {
	time=strtonum($4)
	skb=$7

	# Store the time of request for this skb
	requests[skb] = time
	printf("skb %s: idx %d at %.6f\n", skb, idx, time)
}

# Detect completions
/tx_tstamp_complete/ {
	time=strtonum($4)
	skb=$7
	idx=$9

	if (skb in requests) {
		latency = (time - requests[skb]) * 1000
		printf("skb %s: %.3f to complete\n", skb, latency)
		if (latency > 4) {
			printf(">>> HIGH LATENCY <<<\n")
		}
		printf("\n")
	} else {
		printf("!!! skb %s (idx %d) at %.6f\n", skb, idx, time)
	}
}
-----
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
kthread_create_worker() and tty_alloc_driver() return ERR_PTR()
and never return NULL. The NULL test in the return value check
should be replaced with IS_ERR().
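The corrected checks look roughly like this (sketch; arguments are
abbreviated/illustrative):

    kworker = kthread_create_worker(0, "ice-gnss");
    if (IS_ERR(kworker))
            return PTR_ERR(kworker);

    tty_driver = tty_alloc_driver(1, TTY_DRIVER_REAL_RAW);
    if (IS_ERR(tty_driver))
            return PTR_ERR(tty_driver);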
Fixes: 43113ff734 ("ice: add TTY for GNSS module for E810T device")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Fix the following warning as reported by smatch:
smatch warnings:
drivers/net/ethernet/intel/ice/ice_switch.c:5568 ice_find_dummy_packet() warn: inconsistent indenting
Fixes: 9a225f81f5 ("ice: Support GTP-U and GTP-C offload in switchdev")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Add Ethernet support for MediaTek SoCs from the mt8195 family.
Signed-off-by: Biao Huang <biao.huang@mediatek.com>
Acked-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The rmii_internal clock is needed only when the PHY
interface is RMII and the reference clock comes from the MAC.
Re-arrange the clock setup as follows:
1. the optional "rmii_internal" clock is handled by devm_clk_get(),
2. the other clocks are still configured by devm_clk_bulk_get().
Signed-off-by: Biao Huang <biao.huang@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch makes dwmac-mediatek reuse more features
supported by stmmac_platform.c.
Signed-off-by: Biao Huang <biao.huang@mediatek.com>
Acked-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch implements the clks_config callback for the dwmac-mediatek
platform, to support platform-level clock management.
Signed-off-by: Biao Huang <biao.huang@mediatek.com>
Acked-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tony Nguyen says:
====================
100GbE Intel Wired LAN Driver Updates 2022-03-15
Jacob Keller says:
The ice_sriov.c file now houses almost all of the virtualization code in the
ice driver. This includes both Single Root specific implementation as well
as generic functionality such as the virtchnl interface.
We are planning to implement support for Scalable IOV in the ice driver in
the future. This implementation will want to use the generic functionality
in ice_sriov.c
Rather than dump the Scalable IOV code into ice_sriov.c, we will want to
implement it in a separate file, ice_siov.c
To help with this, refactor the code in ice_sriov.c and split the generic
functionality out into separate files.
Reorganize code to make the non-implementation specific bits into new files
with the following general guidelines:
* ice_vf_lib.[ch]
Basic VF structures and accessors. This is where scheme-independent
code will reside.
* ice_virtchnl.[ch]
Virtchnl message handling. This is where the bulk of the logic for
processing messages from VFs using the virtchnl messaging scheme will
reside. This is separated from ice_vf_lib.c because it is somewhat
distinct and standalone.
* ice_sriov.[ch]
Single Root IOV implementation, including initialization and the
routines for interacting with SR-IOV based netdev operations.
* (future) ice_siov.[ch]
Scalable IOV implementation.
The end goal is to make it easier to re-use the generic parts of the
virtualization logic while keeping separate the concerns of the Single Root
implementation.
In addition to the pure code moves, this series has a reset refactor which
cleans up the functionality to make it easier to reuse the reset code. A new
ops table is introduced to make the VF reset logic more generic. The Single
Root specific details are implemented in ice_sriov.c. A future series
implementing Scalable IOV support will use this ops table to allow re-use of
the reset logic which is now in ice_vf_lib.c
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Change the handling of FDB entries to use switchdev events,
instead of the previous "sync_bridge" and "sync_port", which
only ran when adding or removing VLANs on the bridge.
Signed-off-by: Casper Andersson <casper.casan@gmail.com>
Link: https://lore.kernel.org/r/20220314160918.4rfrrfgmbsf2pxl3@wse-c0155
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fix a possible double free in iavf_disable_vf, as crit_lock is
freed in the caller, iavf_reset_task. Add kernel-doc for iavf_disable_vf.
Remove the mutex_unlock in iavf_disable_vf.
Without this patch there is a double free scenario when calling
iavf_reset_task.
Fixes: e85ff9c631 ("iavf: Fix deadlock in iavf_reset_task")
Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com>
Suggested-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Currently fdir_fltr_lock is accessed in the ice_vsi_release_all() function
after it is destroyed. Instead, destroy the mutex after ice_vsi_release_all().
Fixes: 40319796b7 ("ice: Add flow director support for channel mode")
Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>
Tested-by: Bharathi Sreenivas <bharathi.sreenivas@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
It is possible to hit a NULL pointer dereference in the routine that
updates Tx ring stats. Currently, stats and bytes are only updated when
the ring pointer is valid, but later on the ring is accessed to propagate
the gathered Tx stats onto the VSI stats.
Change the existing logic to move on to the next ring when the ring is NULL.
Fixes: e72bba2135 ("ice: split ice_ring onto Tx/Rx separate structs")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Acked-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The ice_check_vf_init function takes both a PF and a VF pointer. Every
caller looks up the PF pointer from the VF structure. For some callers,
the only use of the PF pointer is to call this function. Move the lookup
inside ice_check_vf_init and drop the unnecessary argument.
Clean up the callers to drop the now-unnecessary local variables. In
particular, replace the local PF pointer with a HW structure pointer in
ice_vc_get_vf_res_msg which simplifies a few accesses to the HW
structure in that function.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Just as we moved the generic virtualization library logic into
ice_vf_lib.c, move the virtchnl message handling into ice_virtchnl.c
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Before we move the virtchnl message handling from ice_sriov.c into
ice_virtchnl.c, clean up some long-line warnings to avoid checkpatch.pl
complaints.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The ice_reset_vf function performs actions which must be taken only
while holding the VF configuration lock. Some flows already acquired the
lock, while other flows must acquire it just for the reset function. Add
the ICE_VF_RESET_LOCK flag to the function so that it can take and
release the lock itself at the appropriate scope.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
In some cases of resetting a VF, the PF would like to first notify the
VF that a reset is impending. This is currently done via
ice_vc_notify_vf_reset. A wrapper to ice_reset_vf, ice_vc_reset_vf, is
used to call this function first before calling ice_reset_vf.
In fact, every single call to ice_vc_notify_vf_reset occurs just prior
to a call to ice_vc_reset_vf.
Now that ice_reset_vf has flags, replace this separate call with an
ICE_VF_RESET_NOTIFY flag. This removes an unnecessary exported function,
ice_vc_notify_vf_reset, and also leaves a single function for resetting
VFs (ice_reset_vf).
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The ice_reset_vf function takes a boolean parameter which indicates
whether or not the reset is due to a VFLR event.
This is somewhat confusing to read because readers must interpret what
"true" and "false" mean when seeing a line of code like
"ice_reset_vf(vf, false)".
We will want to add another toggle to the ice_reset_vf in a following
change. To avoid proliferating many arguments, convert this function to
take flags instead. ICE_VF_RESET_VFLR will indicate if this is a VFLR
reset. A value of 0 indicates no flags.
One could argue that "ice_reset_vf(vf, 0)" is no more readable than
"ice_reset_vf(vf, false)". However, this type of flags interface is
somewhat common, and using 0 to mean "no flags" makes sense in this
context. We could add a define such as "ICE_VF_RESET_PLAIN", but this
can be confusing since it's not an actual bit flag.
This paves the way to add another flag to the function in a following
change.
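For illustration, such a flags interface typically looks like this
(sketch; the exact bit values and names in the driver may differ, and
the NOTIFY/LOCK flags come from the related patches in this series):

    enum ice_vf_reset_flags {
            ICE_VF_RESET_VFLR       = BIT(0),  /* reset was due to a VFLR event */
            ICE_VF_RESET_NOTIFY     = BIT(1),  /* notify the VF before resetting */
            ICE_VF_RESET_LOCK       = BIT(2),  /* acquire the VF config lock */
    };

    int ice_reset_vf(struct ice_vf *vf, u32 flags);

    ice_reset_vf(vf, 0);                    /* plain reset, no flags */
    ice_reset_vf(vf, ICE_VF_RESET_VFLR);    /* reset due to VFLR */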
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The ice_reset_vf function returns a boolean value indicating whether or
not the VF reset. This is a bit confusing since it means that callers
need to know how to interpret the return value when needing to indicate
an error.
Refactor the function and call sites to report a regular error code. We
still report success (i.e. return 0) in cases where the reset is in
progress or is disabled.
Existing callers don't care because they do not check the return value.
We keep the error code anyway, instead of a void return, because we
expect future code which may care about, or at least report, the error
value.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The ice_reset_all_vfs function returns true if any VFs were reset, and
false otherwise. However, no callers check the return value.
Drop this return value and make the function void since the callers do
not care about this.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The ice_reset_all_vfs function takes a parameter to handle whether its
operating after a VFLR event or not. This is not necessary as every
caller always passes true. Simplify the interface by removing the
parameter.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Now that the reset functions do not rely on Single Root specific
behavior, move the ice_reset_vf, ice_reset_all_vfs, and
ice_vf_rebuild_host_cfg functions and their dependent helper functions
out of ice_sriov.c and into ice_vf_lib.c
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
We're about to move ice_reset_vf out of ice_sriov.c and into
ice_vf_lib.c
One of the dev_err statements has a checkpatch.pl violation due to
putting the vf->vf_id on the same line as the dev_err. Fix this style
issue first before moving the code.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The ice driver currently supports virtualization using Single Root IOV,
with code in the ice_sriov.c file. In the future, we plan to also
implement support for Scalable IOV, which uses slightly different
hardware implementations for some functionality.
To eventually allow this, we introduce a new ice_vf_ops structure which
will contain the basic operations that are different between the two IOV
implementations. This primarily includes logic for how to handle the VF
reset registers, as well as what to do before and after rebuilding the
VF's VSI.
Implement these ops structures and call the ops table instead of
directly calling the SR-IOV specific function. This will allow us to
easily add the Scalable IOV implementation in the future. Additionally,
it helps separate the generalized VF logic from SR-IOV specifics. This
change allows us to move the reset logic out of ice_sriov.c and into
ice_vf_lib.c without placing any Single Root specific details into the
generic file.
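A rough sketch of what such an ops table looks like (member and
implementation names are illustrative, not necessarily the exact ones
used by the driver):

    struct ice_vf_ops {
            void (*trigger_reset_register)(struct ice_vf *vf, bool is_vflr);
            bool (*poll_reset_status)(struct ice_vf *vf);
            void (*post_vsi_rebuild)(struct ice_vf *vf);
    };

    static const struct ice_vf_ops ice_sriov_vf_ops = {
            .trigger_reset_register = ice_sriov_trigger_reset_register,
            .poll_reset_status      = ice_sriov_poll_reset_status,
            .post_vsi_rebuild       = ice_sriov_post_vsi_rebuild,
    };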
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
If we fail to clear the malicious VF indication after a VF reset, the
dev_dbg message which is printed uses the local variable 'i' when it
meant to use vf->vf_id. Fix this.
Fixes: 0891c89674 ("ice: warn about potentially malicious VFs")
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Introduce the ice_vf_lib.c file along with the ice_vf_lib.h and
ice_vf_lib_private.h header files.
These files will house the generic VF structures and access functions.
Move struct ice_vf and its dependent definitions into this new header
file.
The ice_vf_lib.c is compiled conditionally on CONFIG_PCI_IOV. Some of
its functionality is required by all driver files. However, some of its
functionality will only be required by other files also conditionally
compiled based on CONFIG_PCI_IOV.
Declaring these functions used only in CONFIG_PCI_IOV files in
ice_vf_lib.h is verbose. This is because we must provide a fallback
implementation for each function in this header since it is included in
files which may not be compiled with CONFIG_PCI_IOV.
Instead, introduce a new ice_vf_lib_private.h header which verifies that
CONFIG_PCI_IOV is enabled. This header is intended to be directly
included in .c files which are CONFIG_PCI_IOV only. Add a #error
indication that will complain if the file ever gets included by another
C file on a kernel with CONFIG_PCI_IOV disabled. Add a comment
indicating the nature of the file and why it is useful.
This makes it so that we can easily define functions exposed from
ice_vf_lib.c into other virtualization files without needing to add
fallback implementations for every single function.
This begins the path to separate out generic code which will be reused
by other virtualization implementations from ice_sriov.h and ice_sriov.c
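A minimal sketch of the guarded private header described above (the
exact wording and guard names may differ):

    /* ice_vf_lib_private.h - only for CONFIG_PCI_IOV virtualization files */
    #ifndef _ICE_VF_LIB_PRIVATE_H_
    #define _ICE_VF_LIB_PRIVATE_H_

    #include "ice_vf_lib.h"

    #ifndef CONFIG_PCI_IOV
    #error "Only include ice_vf_lib_private.h in CONFIG_PCI_IOV virtualization files"
    #endif

    /* declarations shared between CONFIG_PCI_IOV-only .c files go here */

    #endif /* _ICE_VF_LIB_PRIVATE_H_ */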
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tony Nguyen says:
====================
100GbE Intel Wired LAN Driver Updates 2022-03-14
Jacob Keller says:
The ice_virtchnl_pf.c file has become a single place for a lot of
virtualization functionality. This includes most of the virtchnl message
handling, integration with kernel hooks like the .ndo operations, reset
logic, and more.
We are planning in the future to implement and support Scalable IOV in the
ice driver. To do this, much (but not all) of the code in ice_virtchnl_pf.c
will want to be reused.
Rather than dump all of the Scalable IOV implementation into
ice_virtchnl_pf.c it makes sense to house it in a separate file. But that
still leaves all of the Single Root IOV code littered among more generic
logic.
The long term goal is to re-organize the code such that generic re-usable
code is split into separate files. The ice_sriov.c file would end up
containing all of the Single Root IOV implementation specific details, while
ice_vf_lib.[ch] and ice_virtchnl.[ch] contain the generic pieces.
As a first step, notice that ice_sriov.c currently does not contain much of
the SR-IOV implementation. This is housed primarily in ice_virtchnl_pf.c
The code in ice_sriov.c is really generic and relates to the VF mailbox,
including mailbox overflow detection.
Rename ice_sriov.c to ice_vf_mbx.c, and then rename ice_virtchnl_pf.c to
ice_sriov.c
A later series will finish the refactor by splitting ice_sriov.c into
multiple files, moving the generic code into ice_vf_lib.c and ice_virtchnl.c
To prepare for that series, perform some basic cleanup and other refactors
that we've accumulated during this development cycle.
This series builds on top of the recent hash table refactor work.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
ice: use ice_is_vf_trusted helper function
ice: log an error message when eswitch fails to configure
ice: cleanup error logging for ice_ena_vfs
ice: move ice_set_vf_port_vlan near other .ndo ops
ice: refactor spoofchk control code in ice_sriov.c
ice: rename ICE_MAX_VF_COUNT to avoid confusion
ice: remove unused definitions from ice_sriov.h
ice: convert vf->vc_ops to a const pointer
ice: remove circular header dependencies on ice.h
ice: rename ice_virtchnl_pf.c to ice_sriov.c
ice: rename ice_sriov.c to ice_vf_mbx.c
====================
Link: https://lore.kernel.org/r/20220315011155.2166817-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
1) Revert CHECKSUM_UNNECESSARY for UDP packet from conntrack.
2) Reject unsupported families when creating tables, from Phil Sutter.
3) GRE support for the flowtable, from Toshiaki Makita.
4) Add GRE offload support for act_ct, also from Toshiaki.
5) Update mlx5 driver to support for GRE flowtable offload,
from Toshiaki Makita.
6) Oneliner to clean up incorrect indentation in nf_conntrack_bridge,
from Jiapeng Chong.
* git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
netfilter: bridge: clean up some inconsistent indenting
net/mlx5: Support GRE conntrack offload
act_ct: Support GRE offload
netfilter: flowtable: Support GRE
netfilter: nf_tables: Reject tables of unsupported family
Revert "netfilter: conntrack: mark UDP zero checksum as CHECKSUM_UNNECESSARY"
====================
Link: https://lore.kernel.org/r/20220315091513.66544-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
IEEE_8021QAZ_MAX_TCS is defined in include/uapi/linux/dcbnl.h, which is
included by net/dcbnl.h. Then, linux/netdevice.h conditionally includes
net/dcbnl.h if CONFIG_DCB is enabled.
Therefore, when CONFIG_DCB is disabled, this indirect dependency is
broken.
There isn't a good reason to include net/dcbnl.h headers into the ocelot
switch library which exports low-level hardware API, so replace
IEEE_8021QAZ_MAX_TCS with OCELOT_NUM_TC which has the same value.
Fixes: 978777d0fb ("net: dsa: felix: configure default-prio and dscp priorities")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20220315131215.273450-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The WARN_ON() macro takes a condition, not a warning message.
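For reference, WARN_ON() takes just the condition, while WARN() takes a
condition plus a printf-style message:

    WARN_ON(err);                                           /* condition only */
    WARN(err, "failed to register ptp clock: %d\n", err);   /* condition + message */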
Fixes: 0933bd0404 ("net: sparx5: Add support for ptp clocks")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Link: https://lore.kernel.org/r/20220314140327.GB30883@kili
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The ice_vc_cfg_promiscuous_mode_msg function directly checks
ICE_VIRTCHNL_VF_CAP_PRIVILEGE, instead of using the existing helper
function ice_is_vf_trusted. Switch this to use the helper function so
that all trusted checks are consistent. This aids in any potential
future refactor by ensuring consistent code.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
When ice_eswitch_configure fails, print an error message to make it more
obvious why VF initialization did not succeed.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The ice_ena_vfs function and some of its sub-functions like
ice_set_per_vf_res use an "if (<function>) { <print error>; <exit> }"
flow. This flow discards specialized errors reported by the called
function.
This style is generally not preferred as it makes tracing error sources
more difficult. It also means we cannot log the actual error received
properly.
Refactor several calls in the ice_ena_vfs function that do this to catch
the error in the 'ret' variable. Report this in the messages, and then
return the more precise error value.
Doing this reveals that ice_set_per_vf_res returns -EINVAL or -EIO in
places where -ENOSPC makes more sense. Fix these calls up to return the
more appropriate value.
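The resulting pattern is roughly (sketch, simplified signatures):

    ret = ice_set_per_vf_res(pf);
    if (ret) {
            dev_err(dev, "Not enough resources for %d VFs, err %d\n",
                    num_vfs, ret);
            goto err_unroll;        /* propagate the precise error, e.g. -ENOSPC */
    }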
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The ice_set_vf_port_vlan function is located in ice_sriov.c very far
away from the other .ndo operations that it is similar to. Move it so
that it is located near the other .ndo operation definitions.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The API to control the VSI spoof checking for a VF VSI has three
functions: enable, disable, and set. The set function takes the VSI and
the VF and decides whether to call enable or disable based on the
vf->spoofchk field.
In some flows, vf->spoofchk is not yet set, such as in the function used
to control the setting for a VF (vf->spoofchk is only updated after
success).
Simplify this API by refactoring ice_vf_set_spoofchk_cfg to be
"ice_vsi_apply_spoofchk" which takes the boolean and allows all callers
to avoid having to determine whether to call enable or disable
themselves.
This matches the expected callers better, and will prevent the need to
export more than one function when this code must be called from another
file.
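Roughly, the simplified API looks like this (sketch; the enable/disable
helpers named here are hypothetical stand-ins for the pre-existing
internal ones):

    static int ice_vsi_apply_spoofchk(struct ice_vsi *vsi, bool enable)
    {
            if (enable)
                    return ice_vsi_ena_spoofchk(vsi);       /* hypothetical */

            return ice_vsi_dis_spoofchk(vsi);               /* hypothetical */
    }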
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The ICE_MAX_VF_COUNT field is defined in ice_sriov.h. This count is true
for SR-IOV but will not be true for all VF implementations, such as when
the ice driver supports Scalable IOV.
Rename this definition to ICE_MAX_SRIOV_VFS to make its scope clear.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
A few more macros exist in ice_sriov.h which are not used anywhere.
These can be safely removed. Note that ICE_VIRTCHNL_VF_CAP_L2 capability
is set but never checked anywhere in the driver. Thus it is also safe to
remove.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The vc_ops structure is used to allow different handlers for virtchnl
commands when the driver is in representor mode. The current
implementation uses a copy of the ops table in each VF, and modifies
this copy dynamically.
The usual practice in kernel code is to store the ops table in a
constant structure and point to different versions. This has a number of
advantages:
1. Reduced memory usage. Each VF merely points to the correct table,
so they're able to re-use the same constant lookup table in memory.
2. Consistency. It becomes more difficult to accidentally update or
edit only one op call. Instead, the code switches to the correct
table with a single pointer write. In general this is atomic: either
the pointer is updated or it is not.
3. Code Layout. The VF structure can store a pointer to the table
without needing to have the full structure definition defined prior
to the VF structure definition. This will aid in future refactoring
of code by allowing the VF pointer to be kept in ice_vf_lib.h while
the virtchnl ops table can be maintained in ice_virtchnl.h
There is one major downside in the case of the vc_ops structure. Most of
the operations in the table are the same between the two current
implementations. This can appear to lead to duplication since each
implementation must now fill in the complete table. It could make
spotting the differences in the representor mode more challenging.
Unfortunately, methods to make this less error prone either add
complexity overhead (macros using CPP token concatenation) or don't work
on all compilers we support (constant initializer from another constant
structure).
The cost of maintaining two structures does not outweigh the benefits
of the constant table model.
While we're making these changes, go ahead and rename the structure and
implementations with "virtchnl" instead of "vc_vf_". This will more
closely align with the planned file renaming, and avoid similar names when
we later introduce a "vf ops" table for separating Scalable IOV and
Single Root IOV implementations.
Leave the accessor/assignment functions in order to avoid issues with
compiling with options disabled. The interface makes it easier to handle
when CONFIG_PCI_IOV is disabled in the kernel.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Several headers in the ice driver include ice.h even though they are
themselves included by that header. The most notable of these is
ice_common.h, but several other headers also do this.
Such a recursive inclusion is problematic as it forces headers to be
included in a strict order, otherwise compilation errors can result. The
circular inclusions do not trigger an endless loop due to standard
header inclusion guards, however other errors can occur.
For example, ice_flow.h defines ice_rss_hash_cfg, which is used by
ice_sriov.h as part of the definition of ice_vf_hash_ip_ctx.
ice_flow.h includes ice_acl.h, which includes ice_common.h, and which
finally includes ice.h. Since ice.h itself includes ice_sriov.h, this
creates a circular dependency.
The definition in ice_sriov.h requires things from ice_flow.h, but
ice_flow.h itself will lead to trying to load ice_sriov.h as part of its
process for expanding ice.h. The current code avoids this issue by
having an implicit dependency without the include of ice_flow.h.
If we were to fix that so that ice_sriov.h explicitly depends on
ice_flow.h the following pattern would occur:
ice_flow.h -> ice_acl.h -> ice_common.h -> ice.h -> ice_sriov.h
At this point, during the expansion of ice.h, the header guard for ice_flow.h
is already set, so when ice_sriov.h attempts to load the ice_flow.h
header it is skipped. Then, we go on to begin including the rest of
ice_sriov.h, including structure definitions which depend on
ice_rss_hash_cfg. This produces a compiler warning because
ice_rss_hash_cfg hasn't been defined yet. Remember, we're just at the
start of ice_flow.h!
If the order of headers is incorrect (ice_flow.h is not implicitly
loaded first in all files which include ice_sriov.h) then we get the
same failure.
Removing this recursive inclusion requires fixing a few cases where some
headers depended on the header inclusions from ice.h. In addition, a few
other changes are also required.
Most notably, ice_hw_to_dev is implemented as a macro in ice_osdep.h,
which is the likely reason that ice_common.h includes ice.h at all. This
macro implementation requires the full definition of ice_pf in order to
properly compile.
Fix this by moving it to a function declared in ice_main.c, so that we
do not require all files to depend on the layout of the ice_pf
structure.
Note that this change only fixes circular dependencies, but it does not
fully resolve all implicit dependencies where one header may depend on
the inclusion of another. I tried to fix as many of the implicit
dependencies as I noticed, but fixing them all requires a somewhat
tedious analysis of each header and attempting to compile it separately.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The ice_virtchnl_pf.c and ice_virtchnl_pf.h files are where most of the
code for implementing Single Root IOV virtualization resides. This code
includes support for bringing up and tearing down VFs, hooks into the
kernel SR-IOV netdev operations, and for handling virtchnl messages from
VFs.
In the future, we plan to support Scalable IOV in addition to Single
Root IOV as an alternative virtualization scheme. This implementation
will re-use some but not all of the code in ice_virtchnl_pf.c
To prepare for this future, we want to refactor and split up the code in
ice_virtchnl_pf.c into the following scheme:
* ice_vf_lib.[ch]
Basic VF structures and accessors. This is where scheme-independent
code will reside.
* ice_virtchnl.[ch]
Virtchnl message handling. This is where the bulk of the logic for
processing messages from VFs using the virtchnl messaging scheme will
reside. This is separated from ice_vf_lib.c because it is distinct
and has a bulk of the processing code.
* ice_sriov.[ch]
Single Root IOV implementation, including initialization and the
routines for interacting with SR-IOV based netdev operations.
* (future) ice_siov.[ch]
Scalable IOV implementation.
As a first step, lets assume that all of the code in
ice_virtchnl_pf.[ch] is for Single Root IOV. Rename this file to
ice_sriov.c and its header to ice_sriov.h
Future changes will further split out the code in these files following
the plan outlined here.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The ice_sriov.c file primarily contains code which handles the logic for
mailbox overflow detection and some other utility functions related to
the virtualization mailbox.
The bulk of the SR-IOV implementation is actually found in
ice_virtchnl_pf.c, and this file isn't strictly SR-IOV specific.
In the future, the ice driver will support an additional virtualization
scheme known as Scalable IOV, and the code in this file will be used
for this alternative implementation.
Rename this file (and its associated header) to ice_vf_mbx.c, so that we
can later re-use the ice_sriov.c file as the SR-IOV specific file.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Fix the following coccicheck warning:
drivers/net/ethernet/netronome/nfp/flower/action.c:959:7-69: WARNING avoid newline at end of message in NL_SET_ERR_MSG_MOD
Signed-off-by: Niklas Söderlund <niklas.soderlund@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20220312095823.2425775-1-niklas.soderlund@corigine.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We need to sync page pool stats only for active channels. Reading ethtool
stats on a down netdev, or on a netdev with a modified number of
channels, will result in a use-after-free, trying to access page pools
that are already freed.
BUG: KASAN: use-after-free in mlx5e_stats_grp_sw_update_stats+0x465/0xf80
Read of size 8 at addr ffff888004835e40 by task ethtool/720
Fixes: cc10e84b2e ("mlx5: add support for page_pool_get_stats")
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reported-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Joe Damato <jdamato@fastly.com>
Link: https://lore.kernel.org/r/20220312005353.786255-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Use kzalloc instead of kmalloc + memset.
The semantic patch that makes this change is:
(https://coccinelle.gitlabpages.inria.fr/website/)
//<smpl>
@@
expression res, size, flag;
@@
- res = kmalloc(size, flag);
+ res = kzalloc(size, flag);
...
- memset(res, 0, size);
//</smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20220312102705.71413-3-Julia.Lawall@inria.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This patch integrates the dpaa2-eth driver with the generic PHY
infrastructure in order to search, find and reconfigure the SerDes lanes
in case of a protocol change.
On the .mac_config() callback, the phy_set_mode_ext() API is called so
that the Lynx 28G SerDes PHY driver can change the lane's configuration.
In the same phylink callback the MC firmware is called so that it
reconfigures the MAC side to run using the new protocol.
The consumer drivers - dpaa2-eth and dpaa2-switch - are updated to call
the newly added dpaa2_mac_start/stop functions, which power on/off the
associated SerDes lane.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The logic to set up the supported interfaces will get annotated based on
what the configuration of the SerDes PLLs supports. Move the current
setup into a separate function just to try to keep it clean.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Retrieve the API version running on the firmware and based on it detect
which features are available for usage.
The first one to be listed is the capability to change the MAC protocol
at runtime.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The MC firmware recently gained a new command which can reconfigure the
running protocol on the underlying MAC. Add this new command which will
be used in the next patches in order to do a major reconfig on the
interface.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The dpmac_get_api_version command will be used in the next patches to
determine if the current firmware is capable or not to change the
Ethernet protocol running on the MAC.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Follow the established programming model for this driver and provide
shims in the felix DSA driver which call the implementations from the
ocelot switch lib. The ocelot switchdev driver wasn't integrated with
dcbnl due to lack of hardware availability.
The switch doesn't have any fancy QoS classification enabled by default.
The provided getters will create a default-prio app table entry of 0,
and no dscp entry. However, the getters have been made to actually
retrieve the hardware configuration rather than static values, to be
future proof in case DSA will need this information from more call paths.
For default-prio, there is a single field per port, in ANA_PORT_QOS_CFG,
called QOS_DEFAULT_VAL.
DSCP classification is enabled per-port, again via ANA_PORT_QOS_CFG
(field QOS_DSCP_ENA), and individual DSCP values are configured as
trusted or not through register ANA_DSCP_CFG (replicated 64 times).
An untrusted DSCP value falls back to other QoS classification methods.
If trusted, the selected ANA_DSCP_CFG register also holds the QoS class
in the QOS_DSCP_VAL field.
The hardware also supports DSCP remapping (DSCP value X is translated to
DSCP value Y before the QoS class is determined based on the app table
entry for Y) and DSCP packet rewriting. The dcbnl framework, for being
so flexible in other useless areas, doesn't appear to support this.
So this functionality has been left out.
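A minimal sketch of a default-prio getter along these lines; the register helper comes from the ocelot switch lib headers, while the assumed bit position of QOS_DEFAULT_VAL (bits 10:8 here) is an approximation, not the exact field macro:
```c
#include <linux/bits.h>
#include <soc/mscc/ocelot.h>

/* Field position assumed for illustration; the real macro lives in the
 * ocelot register headers.
 */
#define EXAMPLE_QOS_DEFAULT_VAL_M	GENMASK(10, 8)

static int example_port_get_default_prio(struct ocelot *ocelot, int port)
{
	u32 val = ocelot_read_gix(ocelot, ANA_PORT_QOS_CFG, port);

	return (val & EXAMPLE_QOS_DEFAULT_VAL_M) >> 8;	/* QOS_DEFAULT_VAL */
}
```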
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tony Nguyen says:
====================
ice: GTP support in switchdev
Marcin Szycik says:
Add support for adding GTP-C and GTP-U filters in switchdev mode.
To create a filter for GTP, create a GTP-type netdev with the ip tool, enable
hardware offload, add a qdisc and add a filter in tc:
ip link add $GTP0 type gtp role <sgsn/ggsn> hsize <hsize>
ethtool -K $PF0 hw-tc-offload on
tc qdisc add dev $GTP0 ingress
tc filter add dev $GTP0 ingress prio 1 flower enc_key_id 1337 \
action mirred egress redirect dev $VF1_PR
By default, a filter for GTP-U will be added. To add a filter for GTP-C,
specify enc_dst_port = 2123, e.g.:
tc filter add dev $GTP0 ingress prio 1 flower enc_key_id 1337 \
enc_dst_port 2123 action mirred egress redirect dev $VF1_PR
Note: outer IPv6 offload is not supported yet.
Note: GTP-U with no payload offload is not supported yet.
ICE COMMS package is required to create a filter as it contains GTP
profiles.
Changes in iproute2 [1] are required to be able to add a GTP netdev and use
GTP-specific options (QFI and PDU type).
[1] https://lore.kernel.org/netdev/20220211182902.11542-1-wojciech.drewek@intel.com/T
---
v2: Add more CC
v3: Fix mail thread, sorry for spam
v4: Add GTP echo response in gtp module
v5: Change patch order
v6: Add GTP echo request in gtp module
v7: Fix kernel-docs in ice
v8: Remove handling of GTP Echo Response
v9: Add sending of multicast message on GTP Echo Response, fix GTP-C dummy
packet selection
v10: Rebase, fixed most 80 char line limits
v11: Rebase, collect Harald's Reviewed-by on patch 3
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Before adding yet another possibly contended atomic_long_t,
it is time to add per-cpu storage for existing ones:
dev->tx_dropped, dev->rx_dropped, and dev->rx_nohandler
Because many devices do not have to increment such counters,
allocate the per-cpu storage on demand, so that dev_get_stats()
does not have to spend considerable time folding zero counters.
Note that some drivers have abused these counters, which
were supposed to be used only by the core networking stack.
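A rough sketch of the on-demand per-cpu scheme described above, using made-up example_* names rather than the exact net_device fields and helpers:
```c
#include <linux/atomic.h>
#include <linux/gfp.h>
#include <linux/percpu.h>

/* Counters live in per-cpu memory that is only allocated the first time a
 * device actually increments one of them.
 */
struct example_core_stats {
	unsigned long rx_dropped;
	unsigned long tx_dropped;
	unsigned long rx_nohandler;
};

struct example_dev {
	struct example_core_stats __percpu *core_stats;	/* NULL until first use */
};

static void example_rx_dropped_inc(struct example_dev *dev)
{
	struct example_core_stats __percpu *p = READ_ONCE(dev->core_stats);

	if (!p) {
		/* Lazy allocation: devices that never drop packets pay
		 * nothing, and the stats-folding loop never walks counters
		 * that are all zero.
		 */
		p = alloc_percpu_gfp(struct example_core_stats,
				     GFP_ATOMIC | __GFP_NOWARN);
		if (!p)
			return;
		if (cmpxchg(&dev->core_stats, NULL, p)) {
			free_percpu(p);		/* another CPU won the race */
			p = READ_ONCE(dev->core_stats);
		}
	}
	this_cpu_inc(p->rx_dropped);
}
```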
v4: should use per_cpu_ptr() in dev_get_stats() (Jakub)
v3: added a READ_ONCE() in netdev_core_stats_alloc() (Paolo)
v2: add a missing include (reported by kernel test robot <lkp@intel.com>)
Change in netdev_core_stats_alloc() (Jakub)
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: jeffreyji <jeffreyji@google.com>
Reviewed-by: Brian Vazquez <brianvv@google.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/20220311051420.2608812-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Enable binding the nfp driver to NFP3800 and NFP3803 devices.
The PCIE_SRAM offset is different for the NFP3800 device, which also
only supports a single explicit group.
Changes to Dirk's work:
* 48-bit dma addressing is not ready yet. Keep 40-bit dma addressing
for NFP3800.
Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Fei Qin <fei.qin@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
NFP3800 has slightly different queue controller range bounds.
Use the static chip data instead of defines. This commit
still assumes unchanged descriptor format. Later datapath
changes will allow adjusting for descriptor accounting.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Fei Qin <fei.qin@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The queue controller (QCP) is accessed based on a device specific
offset. The NFP3800 device also supports more queues.
Furthermore, the NFP3800 VFs also access the QCP differently from how the
NFP6000 VFs access it, though still indirectly. Fortunately, we can
remove the offset altogether for both VF types. This is safe for
NFP6000 VFs since the offset was effectively a wrap around and only used
for convenience to have it set the same as the NFP6000 PF.
Use nfp_dev_info to store queue controller parameters.
Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Fei Qin <fei.qin@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In preparation for new chips, use dev_info constants instead of defines
to store the DMA mask length.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Fei Qin <fei.qin@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
NFP3800 uses a different PCIe configuration to CPP expansion BAR offsets.
We don't need to differentiate between the NFP4000, NFP5000 and NFP6000
since they all use the same offsets.
Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Fei Qin <fei.qin@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In preparation for supporting a new chip, add a driver data structure
which will hold per-chip-version information such as register
offsets.
Plumb it through to the relevant functions (nfpcore and nfp_net).
For now only a very simple member holding chip names is added,
following commits will add more.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Fei Qin <fei.qin@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Make sure the device ID tables are in ascending order.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Fei Qin <fei.qin@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The model number for NFP3800 and newer devices can be completely
derived from PluDevice register without subtracting 0x10.
Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: Fei Qin <fei.qin@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The PCI_DEVICE_ID_NETRONOME_NFP6000_VF define is available for use and
should be used instead of PCI_DEVICE_NFP6000VF. Meanwhile, the
PCI_DEVICE_NFP6000VF PCI ID is removed as it is no longer used.
Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: Fei Qin <fei.qin@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Multiple writes cause intermediate pointer values that do not
end on complete TX descriptors.
The QCP peripheral on the NFP provides a number of access
modes. In some access modes, the maximum amount to add must
be restricted to a 6bit value. The particular access mode
used by _nfp_qcp_ptr_add() has no such restrictions, so the
"< NFP_QCP_MAX_ADD" test is unnecessary.
Note that trying to add more than the configured ring size
in a single add will cause a QCP overflow, caught and handled
by the QCP peripheral.
Signed-off-by: Christo du Toit <christo.du.toit@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Fei Qin <fei.qin@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
NFP driver ABI contains a bit for ring prioritization which
was never implemented in the initially envisioned form.
Remove it, and open up the possibility of reclaiming it for other uses.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: Fei Qin <fei.qin@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The print function dev_err() is redundant because platform_get_irq()
already prints an error.
Eliminate the following coccicheck warning:
./drivers/net/ethernet/8390/mcf8390.c:414:2-9: line 414 is redundant
because platform_get_irq() already prints an error
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Link: https://lore.kernel.org/r/20220311001756.12234-1-yang.lee@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
alx_reinit has a lockdep assertion that the alx->mtx mutex must be held.
alx_reinit is called from two places: alx_reset and alx_change_mtu.
alx_reset does acquire alx->mtx before calling alx_reinit.
alx_change_mtu does not acquire this mutex, nor do its callers or any
path towards alx_change_mtu.
Acquire the mutex in alx_change_mtu.
The issue was introduced when the fine-grained locking was introduced
to the code to replace the RTNL. The same commit also introduced the
lockdep assertion.
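A minimal sketch of the fix, using hypothetical example_* stand-ins for the alx structures and functions:
```c
#include <linux/mutex.h>
#include <linux/netdevice.h>

struct example_priv {
	struct mutex mtx;	/* stands in for alx->mtx */
};

static void example_reinit(struct example_priv *priv)
{
	lockdep_assert_held(&priv->mtx);	/* the assertion the fix satisfies */
	/* ... reset and reconfigure the hardware ... */
}

static int example_change_mtu(struct net_device *netdev, int new_mtu)
{
	struct example_priv *priv = netdev_priv(netdev);

	mutex_lock(&priv->mtx);		/* previously missing on this path */
	netdev->mtu = new_mtu;
	if (netif_running(netdev))
		example_reinit(priv);
	mutex_unlock(&priv->mtx);

	return 0;
}
```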
Fixes: 4a5fe57e77 ("alx: use fine-grained locking instead of RTNL")
Signed-off-by: Niels Dossche <dossche.niels@gmail.com>
Link: https://lore.kernel.org/r/20220310232707.44251-1-dossche.niels@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add support for creating filters for GTP-U and GTP-C in switchdev mode. Add
support for parsing GTP-specific options (QFI and PDU type) and TEID.
By default, a filter for GTP-U will be added. To add a filter for GTP-C,
specify enc_dst_port = 2123, e.g.:
tc filter add dev $GTP0 ingress prio 1 flower enc_key_id 1337 \
enc_dst_port 2123 action mirred egress redirect dev $VF1_PR
Note: GTP-U with outer IPv6 offload is not supported yet.
Note: GTP-U with no payload offload is not supported yet.
Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Checking only protocol ids while searching for correct FVs can lead to a
situation where an incorrect FV is added to the list. Incorrect means
that the FV has a correct protocol id but an incorrect offset.
Call ice_get_sw_fv_list with the ice_prot_lkup_ext struct which contains
all protocol ids with offsets.
With this modification, allocating and collecting the protocol ids list is
no longer needed.
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
When doing manual injection of a frame, it is required to check if the
TX FIFO is ready to accept the next word of the frame. For this we use
'readx_poll_timeout_atomic'; the only problem is that before it
actually checks the status, it determines the time at which to stop polling
the status, which seems to be an expensive operation.
Therefore check the status of the TX FIFO before calling
'readx_poll_timeout_atomic'.
Doing this will improve the TX bitrate by ~70%, because 99% of the time
the FIFO is already ready. The measurements were done using iperf3.
Before:
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.03 sec 55.2 MBytes 46.2 Mbits/sec 0 sender
[ 5] 0.00-10.04 sec 53.8 MBytes 45.0 Mbits/sec receiver
After:
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.10 sec 95.0 MBytes 78.9 Mbits/sec 0 sender
[ 5] 0.00-10.11 sec 95.0 MBytes 78.8 Mbits/sec receiver
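A sketch of the fast-path check described above, with made-up example_* helpers standing in for the lan966x register accessors and an assumed "ready" bit; readx_poll_timeout_atomic() is the generic iopoll helper:
```c
#include <linux/bits.h>
#include <linux/io.h>
#include <linux/iopoll.h>
#include <linux/types.h>

/* Made-up port type; inj_status_reg stands in for the injection status
 * register of the device.
 */
struct example_port {
	void __iomem *inj_status_reg;
};

static u32 example_read_inj_status(struct example_port *port)
{
	return readl(port->inj_status_reg);
}

static bool example_inj_ready(u32 val)
{
	return val & BIT(0);	/* assumed "FIFO ready" bit */
}

static int example_wait_tx_fifo_ready(struct example_port *port)
{
	u32 val;

	/* Fast path: the FIFO is almost always ready, so check once before
	 * paying for readx_poll_timeout_atomic()'s timeout bookkeeping.
	 */
	val = example_read_inj_status(port);
	if (example_inj_ready(val))
		return 0;

	/* Slow path: poll with a timeout, as before. */
	return readx_poll_timeout_atomic(example_read_inj_status, port, val,
					 example_inj_ready(val), 10, 50000);
}
```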
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove dev_err() messages after platform_get_irq*() failures.
platform_get_irq() already prints an error.
Generated by: scripts/coccinelle/api/platform_get_irq.cocci
Signed-off-by: Yihao Han <hanyihao@vivo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is not recommended to use platform_get_resource(pdev, IORESOURCE_IRQ)
for requesting IRQ resources any more, as they may not be ready yet in
the case of DT booting.
platform_get_irq() is instead the recommended way of getting an IRQ, even if
it was not retrieved earlier.
It also makes the code simpler because we get an "int" value right away
and no conversion from resource to int is required.
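An illustrative before/after of the conversion, in a hypothetical probe function:
```c
#include <linux/platform_device.h>

/* Hypothetical probe showing the shape of the conversion. */
static int example_probe(struct platform_device *pdev)
{
	int irq;

	/* Before the change, the driver did roughly:
	 *	res = platform_get_resource(pdev, IORESOURCE_IRQ, 0);
	 *	irq = res->start;
	 * which may fail on DT systems where the IRQ resource is not
	 * populated at this point.
	 */
	irq = platform_get_irq(pdev, 0);
	if (irq < 0)
		return irq;	/* platform_get_irq() already logs the error */

	/* ... request_irq(irq, ...) etc. ... */
	return 0;
}
```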
Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
platform_get_resource(pdev, IORESOURCE_IRQ, ..) relies on static
allocation of IRQ resources in DT core code, this causes an issue
when using hierarchical interrupt domains using "interrupts" property
in the node as this bypasses the hierarchical setup and messes up the
irq chaining.
In preparation for removal of static setup of IRQ resource from DT core
code use platform_get_irq() for DT users only.
While at it propagate error code in emac_dev_stop() in case
platform_get_irq_optional() fails.
Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Convert am65-cpsw driver and am65-cpsw ethtool to use Phylink APIs
as described at Documentation/networking/sfp-phylink.rst. All calls
to Phy APIs are replaced with their equivalent Phylink APIs.
No functional change intended. Use Phylink instead of conventional
Phylib, in preparation to add support for SGMII/QSGMII modes.
Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Unlike the legacy EEPROM callbacks, when using the netlink EEPROM query
(get_module_eeprom_by_page) the driver should not try to validate the
query parameters, but just perform the read requested by the userspace.
Recent discussion in the mailing list:
https://lore.kernel.org/netdev/20220120093051.70845141@kicinski-fedora-PC1C0HJN.hsd1.ca.comcast.net/
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The assumption that the first byte in the module mapping dword is the
module number shouldn't be hard-coded in the driver, but come from
mlx5_ifc structs.
While at it, fix the incorrect width for the 'rx_lane' and 'tx_lane'
fields.
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The MCIA register supports either 12 or 32 dwords, use the correct value
by querying the capability from the MCAM register.
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
SMFS dr matchers are processed sequentially in hardware according to
their priorities, and not skipped if empty.
Currently, smfs ct fs creates four predefined dr matchers per ct
table (ct/ct nat) with hardcoded priority. Compared to dmfs ct fs
using autogroups, this might cause additional hops in the fastpath for
traffic patterns that match later priorities, even if previous
priorities are empty, e.g. a user only using ipv6 UDP traffic will
have 3 additional hops.
Create the matchers dynamically, using the highest priority available,
on first rule usage, and remove them on last usage.
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The fs_core layer adds extra bookkeeping that is either unneeded for CT, or
unused by the underlying software steering, such as allocating FTEs and
FTE ids, saving the match key and mask, and autogroups management.
On top of that, direct steering has a translation layer (fs_dr) from PRM
commands to direct steering objects, for example, creating temporary
dr_action objects. This has a performance impact when dealing
with CT high insertion rate.
To use direct steering (smfs) directly for ct, add a tc ct fs smfs
implementation. Instead of dmfs autogroups, smfs ct fs uses one of 4
predefined dr matchers in CT and CT-NAT tables, for each combination
of tuple ethertype (ipv4/ipv6), and tuple ip_proto (udp/tcp) that
is currently used by nf flow table flow offload.
At rule insertion, validate that the flow rule fits one of the predefined
matchers, and insert into it.
To fill the dr_actions of the rule efficiently, create the fwd to post_ct
tbl dr_action at fs init, the count dr_action at counter creation,
and re-use the already pre-allocated modify header dr_action.
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Add a thin layer that exports selected direct steering (dr) API
which will be used by a ct fs implementation in a following
patch.
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
If sw steering was used to create the table, dr steering fs creates
a backing dr table for the mlx5 flow table.
Add helper to return this table so it can be used to create matchers and
add rules on it directly instead of passing via eswitch_offloads/fs_core
insertion.
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Currently, the fs_core layer provides flow steering services to the driver
including: autogroups, allocating FTEs (flow table entries) and FTE ids,
and support for fte action modification. If software steering is then
configured, rule insertion will go through a translation layer from
firmware buffers to software steering objects (see fs_dr.c).
The connection tracking table is a system table that is not directly
controlled by the user and is a very high scale table. These fs_core
services introduce an overhead that may be optimized by using the software
steering API directly.
Introduce ct flow steering interface to allow multiple flow steering
providers. Use the new interface to implement the current dmfs (device
managed flow steering) provider which uses fs_core insertion.
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The function is node-aware and gets the node as an argument.
Use a node-aware allocation for the doorbell pgdir structure.
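A minimal sketch of a node-aware allocation of this kind, with a made-up example_pgdir type rather than the real mlx5 structure:
```c
#include <linux/slab.h>

/* Made-up structure standing in for the doorbell pgdir. */
struct example_pgdir {
	unsigned long *bitmap;
	void *db_page;
};

static struct example_pgdir *example_alloc_pgdir(int node)
{
	/* allocate on the NUMA node the caller already knows about */
	return kzalloc_node(sizeof(struct example_pgdir), GFP_KERNEL, node);
}
```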
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
There is no need to include module.h in the following files.
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Previous commits introduced AF_XDP zero-copy support, in which
we need to register a different mem model for xdp_rxq depending on
whether AF_XDP zero-copy is enabled or not. This should be done after the
xdp_rxq info is registered, which is not needed for the ctrl port;
otherwise the warning "Missing register, driver bug" is emitted.
Fix this by not registering the mem model for the ctrl port, just like we
don't register xdp_rxq info for the ctrl port.
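A sketch of the rule described above, with a made-up example_* ring type; xdp_rxq_info_reg_mem_model() and the memory-model types are the generic XDP API:
```c
#include <linux/types.h>
#include <net/xdp.h>

/* Made-up ring type; only the xdp_rxq member matters for the sketch. */
struct example_rx_ring {
	struct xdp_rxq_info xdp_rxq;
};

static int example_reg_mem_model(struct example_rx_ring *rx_ring,
				 bool is_ctrl_port, bool xsk)
{
	/* The ctrl port never registered xdp_rxq info, so it must not
	 * register a memory model either.
	 */
	if (is_ctrl_port)
		return 0;

	return xdp_rxq_info_reg_mem_model(&rx_ring->xdp_rxq,
					  xsk ? MEM_TYPE_XSK_BUFF_POOL :
						MEM_TYPE_PAGE_ORDER0,
					  NULL);
}
```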
Fixes: 6402528b7a ("nfp: xsk: add AF_XDP zero-copy Rx and Tx support")
Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Reviewed-by: Niklas Söderlund <niklas.soderlund@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20220309135533.10162-1-simon.horman@corigine.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tony Nguyen says:
====================
100GbE Intel Wired LAN Driver Updates 2022-03-09
This series contains updates to ice driver only.
Martyna implements switchdev filtering on inner EtherType field for
tunnels.
Marcin adds reporting of slowpath statistics for port representors.
Jonathan Toppins changes a non-fatal link error message from warning to
debug.
Maciej removes unnecessary checks in ice_clean_tx_irq().
Amritha adds support for ADQ to match outer destination MAC for tunnels.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
ice: Add support for outer dest MAC for ADQ tunnels
ice: avoid XDP checks in ice_clean_tx_irq()
ice: change "can't set link" message to dbg level
ice: Add slow path offload stats on port representor in switchdev
ice: Add support for inner etype in switchdev
====================
Link: https://lore.kernel.org/r/20220309190315.1380414-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit 5dbbbd01cb ("ice: Avoid RTNL lock when re-creating
auxiliary device") changes a process of re-creation of aux device
so ice_plug_aux_dev() is called from ice_service_task() context.
This unfortunately opens a race window that can result in dead-lock
when interface has left LAG and immediately enters LAG again.
Reproducer:
```
#!/bin/sh
ip link add lag0 type bond mode 1 miimon 100
ip link set lag0 up
for n in {1..10}; do
echo Cycle: $n
ip link set ens7f0 master lag0
sleep 1
ip link set ens7f0 nomaster
done
```
This results in:
[20976.208697] Workqueue: ice ice_service_task [ice]
[20976.213422] Call Trace:
[20976.215871] __schedule+0x2d1/0x830
[20976.219364] schedule+0x35/0xa0
[20976.222510] schedule_preempt_disabled+0xa/0x10
[20976.227043] __mutex_lock.isra.7+0x310/0x420
[20976.235071] enum_all_gids_of_dev_cb+0x1c/0x100 [ib_core]
[20976.251215] ib_enum_roce_netdev+0xa4/0xe0 [ib_core]
[20976.256192] ib_cache_setup_one+0x33/0xa0 [ib_core]
[20976.261079] ib_register_device+0x40d/0x580 [ib_core]
[20976.266139] irdma_ib_register_device+0x129/0x250 [irdma]
[20976.281409] irdma_probe+0x2c1/0x360 [irdma]
[20976.285691] auxiliary_bus_probe+0x45/0x70
[20976.289790] really_probe+0x1f2/0x480
[20976.298509] driver_probe_device+0x49/0xc0
[20976.302609] bus_for_each_drv+0x79/0xc0
[20976.306448] __device_attach+0xdc/0x160
[20976.310286] bus_probe_device+0x9d/0xb0
[20976.314128] device_add+0x43c/0x890
[20976.321287] __auxiliary_device_add+0x43/0x60
[20976.325644] ice_plug_aux_dev+0xb2/0x100 [ice]
[20976.330109] ice_service_task+0xd0c/0xed0 [ice]
[20976.342591] process_one_work+0x1a7/0x360
[20976.350536] worker_thread+0x30/0x390
[20976.358128] kthread+0x10a/0x120
[20976.365547] ret_from_fork+0x1f/0x40
...
[20976.438030] task:ip state:D stack: 0 pid:213658 ppid:213627 flags:0x00004084
[20976.446469] Call Trace:
[20976.448921] __schedule+0x2d1/0x830
[20976.452414] schedule+0x35/0xa0
[20976.455559] schedule_preempt_disabled+0xa/0x10
[20976.460090] __mutex_lock.isra.7+0x310/0x420
[20976.464364] device_del+0x36/0x3c0
[20976.467772] ice_unplug_aux_dev+0x1a/0x40 [ice]
[20976.472313] ice_lag_event_handler+0x2a2/0x520 [ice]
[20976.477288] notifier_call_chain+0x47/0x70
[20976.481386] __netdev_upper_dev_link+0x18b/0x280
[20976.489845] bond_enslave+0xe05/0x1790 [bonding]
[20976.494475] do_setlink+0x336/0xf50
[20976.502517] __rtnl_newlink+0x529/0x8b0
[20976.543441] rtnl_newlink+0x43/0x60
[20976.546934] rtnetlink_rcv_msg+0x2b1/0x360
[20976.559238] netlink_rcv_skb+0x4c/0x120
[20976.563079] netlink_unicast+0x196/0x230
[20976.567005] netlink_sendmsg+0x204/0x3d0
[20976.570930] sock_sendmsg+0x4c/0x50
[20976.574423] ____sys_sendmsg+0x1eb/0x250
[20976.586807] ___sys_sendmsg+0x7c/0xc0
[20976.606353] __sys_sendmsg+0x57/0xa0
[20976.609930] do_syscall_64+0x5b/0x1a0
[20976.613598] entry_SYSCALL_64_after_hwframe+0x65/0xca
1. Command 'ip link ... set nomaster' causes ice_plug_aux_dev() to be
called from ice_service_task() context, the aux device is created
and the associated device->lock is taken.
2. Command 'ip link ... set master...' calls ice's notifier under
RTNL lock and that notifier calls ice_unplug_aux_dev(). That
function tries to take aux device->lock but this is already taken
by ice_plug_aux_dev() in step 1
3. Later ice_plug_aux_dev() tries to take RTNL lock but this is already
taken in step 2
4. Dead-lock
The patch fixes this issue by the following changes:
- Bit ICE_FLAG_PLUG_AUX_DEV is kept to be set during ice_plug_aux_dev()
call in ice_service_task()
- The bit is checked in ice_clear_rdma_cap() and only if it is not set
then ice_unplug_aux_dev() is called. If it is set (in other words
plugging of aux device was requested and ice_plug_aux_dev() is
potentially running) then the function only clears the bit
- Once ice_plug_aux_dev() call (in ice_service_task) is finished
the bit ICE_FLAG_PLUG_AUX_DEV is cleared but it is also checked
whether it was already cleared by ice_clear_rdma_cap(). If so then
aux device is unplugged.
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Co-developed-by: Petr Oros <poros@redhat.com>
Signed-off-by: Petr Oros <poros@redhat.com>
Reviewed-by: Dave Ertman <david.m.ertman@intel.com>
Link: https://lore.kernel.org/r/20220310171641.3863659-1-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Some of the bcmgenet platforms don't correctly support WOL, yet
ethtool returns:
"Supports Wake-on: gsf"
which is false.
Ideally, if there isn't a wol_irq, or there is something else that
keeps the device from being able to wake up, it should display:
"Supports Wake-on: d"
This patch checks whether the device can wake up before using the
hard-coded supported flags. This corrects the ethtool reporting, as
well as the WOL configuration because ethtool verifies that the mode
is supported before attempting it.
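A simplified sketch of the .get_wol check described above (not the exact bcmgenet code); device_can_wakeup() and the WAKE_* flags are the generic kernel/ethtool APIs:
```c
#include <linux/ethtool.h>
#include <linux/netdevice.h>
#include <linux/pm_wakeup.h>

/* If the underlying device cannot wake the system, advertise no WoL modes. */
static void example_get_wol(struct net_device *dev, struct ethtool_wolinfo *wol)
{
	struct device *kdev = dev->dev.parent;

	if (!device_can_wakeup(kdev)) {
		wol->supported = 0;
		wol->wolopts = 0;
		return;		/* ethtool then reports "Supports Wake-on: d" */
	}

	wol->supported = WAKE_MAGIC | WAKE_MAGICSECURE | WAKE_FILTER;
	/* ... read back the currently enabled modes as before ... */
}
```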
Fixes: c51de7f397 ("net: bcmgenet: add Wake-on-LAN support code")
Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Tested-by: Peter Robinson <pbrobinson@gmail.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20220310045535.224450-1-jeremy.linton@arm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If bus->state is equal to MDIOBUS_ALLOCATED, mdiobus_free(bus) will free
the "bus". But bus->name is still used in the next line, which will lead
to a use-after-free.
We can fix it by putting the name in a local variable and making
bus->name point to the rodata string "name", then using the local name in
the error message without referring to bus, to avoid the use-after-free.
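A simplified sketch of the pattern (not the exact arc_emac code, which registers through of_mdiobus_register()): the bus name stays reachable through a local variable, so the error message never dereferences the freed bus:
```c
#include <linux/device.h>
#include <linux/errno.h>
#include <linux/phy.h>

static int example_mdio_probe(struct device *dev)
{
	const char *name = "Synopsys MII Bus";	/* rodata string */
	struct mii_bus *bus;
	int error;

	bus = mdiobus_alloc();
	if (!bus)
		return -ENOMEM;
	bus->name = name;

	error = mdiobus_register(bus);
	if (error) {
		mdiobus_free(bus);	/* "bus" (and bus->name) is gone now */
		/* use the local copy of the name, not bus->name */
		return dev_err_probe(dev, error,
				     "cannot register MDIO bus %s\n", name);
	}

	return 0;
}
```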
Fixes: 95b5fc03c1 ("net: arc_emac: Make use of the helper function dev_err_probe()")
Signed-off-by: Jianglei Nie <niejianglei2021@163.com>
Link: https://lore.kernel.org/r/20220309121824.36529-1-niejianglei2021@163.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Merge tag 'mlx5-updates-2022-03-09' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2022-03-09
1) Remove kernel log prints on FW events regarding FW pages management
and replace that with debugfs entries to track FW pages management commands
failures and general stats, we do that for all FW commands in general since
it's the same effort to do so under the already existing debugfs entry for
FW commands.
2) Add support for ConnectX-7 Software managed steering, in other words STEv2
which shares a lot in common with STE V1, the difference is in specific
offsets in the devices, the logic is almost the same, thus we implement
STEv1 and STEv2 in the same file.
* tag 'mlx5-updates-2022-03-09' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
net/mlx5: DR, Add support for ConnectX-7 steering
net/mlx5: DR, Refactor ste_ctx handling for STE v0/1
net/mlx5: DR, Rename action modify fields to reflect naming in HW spec
net/mlx5: DR, Fix handling of different actions on the same STE in STEv1
net/mlx5: DR, Remove unneeded comments
net/mlx5: DR, Add support for matching on Internet Header Length (IHL)
net/mlx5: DR, Align mlx5dv_dr API vport action with FW behavior
net/mlx5: Add debugfs counters for page commands failures
net/mlx5: Add pages debugfs
net/mlx5: Move debugfs entries to separate struct
net/mlx5: Change release_all_pages cap bit location
net/mlx5: Remove redundant error on reclaim pages
net/mlx5: Remove redundant error on give pages
net/mlx5: Remove redundant notify fail on give pages
net/mlx5: Add command failures data to debugfs
net/mlx5e: TC, Fix use after free in mlx5e_clone_flow_attr_for_post_act()
====================
Link: https://lore.kernel.org/r/20220309213755.610202-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The of_find_compatible_node() function returns a node pointer with
its refcount incremented. We should use of_node_put() on it when done.
Add the missing of_node_put() to release the refcount.
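A minimal sketch of the pattern; the lookup target matches the gianfar PTP node, but the body is simplified compared to gianfar_ethtool:
```c
#include <linux/errno.h>
#include <linux/of.h>

/* of_find_compatible_node() returns the node with its refcount elevated,
 * so it must be balanced with of_node_put() once we are done with it.
 */
static int example_get_phc_index(void)
{
	struct device_node *ptp_node;

	ptp_node = of_find_compatible_node(NULL, NULL, "fsl,etsec-ptp");
	if (!ptp_node)
		return -ENODEV;

	/* ... look up the PTP platform device / PHC index here ... */

	of_node_put(ptp_node);	/* release the reference taken by the lookup */
	return 0;
}
```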
Fixes: 7349a74ea7 ("net: ethernet: gianfar_ethtool: get phc index through drvdata")
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Link: https://lore.kernel.org/r/20220310015313.14938-1-linmq006@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Use napi_alloc_skb to allocate memory when refilling the RX ring
in axienet_poll for more efficiency. napi_alloc_skb() can reuse the
softirq-local cache of freed skbs, which may still be cache-warm,
skipping allocator calls.
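A minimal sketch of the refill change, with a made-up helper name; napi_alloc_skb() is the generic networking API:
```c
#include <linux/netdevice.h>
#include <linux/skbuff.h>

/* Allocate from the NAPI per-CPU skb cache while still in softirq context,
 * which can reuse recently freed, still cache-warm skbs.
 */
static struct sk_buff *example_rx_refill(struct napi_struct *napi,
					 unsigned int len)
{
	return napi_alloc_skb(napi, len);
}
```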
Signed-off-by: Robert Hancock <robert.hancock@calian.com>
Link: https://lore.kernel.org/r/20220308211013.1530955-1-robert.hancock@calian.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Include the TLS headers unconditionally and define driver TLS symbols
used in code compiled also when CONFIG_TLS_DEVICE=n to fix the
following errors:
../drivers/net/ethernet/fungible/funeth/funeth_tx.c: In function ‘write_pkt_desc’:
../drivers/net/ethernet/fungible/funeth/funeth_tx.c:244:13: error: implicit declaration of function ‘tls_driver_ctx’ [-Werror=implicit-function-declaration]
244 | tls_ctx = tls_driver_ctx(skb->sk, TLS_OFFLOAD_CTX_DIR_TX);
| ^~~~~~~~~~~~~~
../drivers/net/ethernet/fungible/funeth/funeth_tx.c:244:37: error: ‘TLS_OFFLOAD_CTX_DIR_TX’ undeclared (first use in this function)
244 | tls_ctx = tls_driver_ctx(skb->sk, TLS_OFFLOAD_CTX_DIR_TX);
| ^~~~~~~~~~~~~~~~~~~~~~
../drivers/net/ethernet/fungible/funeth/funeth_tx.c:244:37: note: each undeclared identifier is reported only once for each function it appears in
../drivers/net/ethernet/fungible/funeth/funeth_tx.c:245:23: error: dereferencing pointer to incomplete type ‘struct fun_ktls_tx_ctx’
245 | tls->tlsid = tls_ctx->tlsid;
| ^~
../drivers/net/ethernet/fungible/funeth/funeth_tx.c: In function ‘fun_start_xmit’:
../drivers/net/ethernet/fungible/funeth/funeth_tx.c:310:6: error: implicit declaration of function ‘tls_is_sk_tx_device_offloaded’ [-Werror=implicit-function-declaration]
310 | tls_is_sk_tx_device_offloaded(skb->sk)) {
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../drivers/net/ethernet/fungible/funeth/funeth_tx.c:311:9: error: implicit declaration of function ‘fun_tls_tx’; did you mean ‘fun_xdp_tx’? [-Werror=implicit-function-declaration]
311 | skb = fun_tls_tx(skb, q, &tls_len);
| ^~~~~~~~~~
| fun_xdp_tx
../drivers/net/ethernet/fungible/funeth/funeth_tx.c:311:7: warning: assignment to ‘struct sk_buff *’ from ‘int’ makes pointer from integer without a cast [-Wint-conversion]
311 | skb = fun_tls_tx(skb, q, &tls_len);
| ^
Fixes: db37bc177d ("net/funeth: add the data path")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Dimitris Michailidis <dmichail@fungible.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This reverts:
commit 02acd39953 ("bnxt_en: parse result field when NVRAM package install fails")
commit 22f5dba506 ("bnxt_en: add an nvm test for hw diagnose")
commit bafed3f231 ("bnxt_en: implement hw health reporter")
These patches are still under discussion / I don't think they
are right, and since the authors don't reply promptly let me
lessen my load of "things I need to resolve before next release"
and revert them.
Acked-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20220308173659.304915-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If HW doesn't support PTP, then it doesn't support it. This is neither
a problem nor can the user do something about it. Therefore change the
message level to info.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/ee685745-f1ab-e9bf-f20e-077d55dff441@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
There is an occasional suspend error from e1000e which blocks the
system from further suspending. And the issue was found on
a WhiskeyLake-U platform with I219-V:
[ 20.078957] PM: pci_pm_suspend(): e1000e_pm_suspend+0x0/0x780 [e1000e] returns -2
[ 20.078970] PM: dpm_run_callback(): pci_pm_suspend+0x0/0x170 returns -2
[ 20.078974] e1000e 0000:00:1f.6: PM: pci_pm_suspend+0x0/0x170 returned -2 after 371012 usecs
[ 20.078978] e1000e 0000:00:1f.6: PM: failed to suspend async: error -2
According to the code flow, this might be caused by broken MDI read/write
to PHY registers. However currently the code does not tell us which
register is broken. Thus enhance the debug information to print the
offending PHY register. So the next time the issue is reproduced, this
information can be used to narrow it down.
Acked-by: Paul Menzel <pmenzel@molgen.mpg.de>
Reported-by: Todd Brandt <todd.e.brandt@intel.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20220308172030.451566-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add support for a new SW format version that is implemented by
ConnectX-7.
Except for several differences, the STEv2 is identical to STEv1, so for
most callbacks the STEv2 context struct will call STEv1 functions.
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
As preparation for supporting ConnectX-7, this patch changes the
handling of ste_ctx for the existing STE v0 and v1:
- each context is now a static struct, and it has a corresponding getter
- v0 and v1 were extended to contain the fields that are required for
integrating STEv2.
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
As preparation for supporting ConnectX-7, rename the action modify fields'
steering registers from arbitrary names to names that reflect the
corresponding naming and location of the steering registers in HW.
The mapping of these registers has changed in ConnectX-7, so the renaming
makes it easier to keep track of their mapping.
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Fix handling of various conditions in set_actions_rx/tx that check
whether different actions can be on the same STE.
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Remove two comments that were erroneously left in the code.
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Add support for matching on new field - Internet Header Length (IHL).
Signed-off-by: Muhammad Sammar <muhammads@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
This aligns the behavior with FW when creating an FDB rule with wire
vport destination but no source port matching. Until now such rules
would fail on internal DR RX rule creation since the source and
destination are the wire vport.
The new behavior is the same as done on FW steering, if destination is
wire, we will create both TX and RX rules, but the RX packet coming from
wire will be dropped due to loopback not supported.
Signed-off-by: Shun Hao <shunh@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Add the following new debugfs counters for debug and verbosity:
fw_pages_alloc_failed - number of pages FW requested but driver failed
to allocate.
give_pages_dropped - number of pages given to FW, but command give pages
failed by FW.
reclaim_pages_discard - number of pages which were about to reclaim back
and FW failed the command.
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Add pages debugfs to expose the following counters for debuggability:
fw_pages_total - How many pages were given to FW and not returned yet.
vfs_pages - For SRIOV, how many pages were given to FW for virtual
functions usage.
host_pf_pages - For ECPF, how many pages were given to FW for external
hosts physical functions usage.
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Move the debugfs entry pointers under priv to their own struct.
Add get function for device debugfs root.
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
If reclaim pages was triggered by an FW event and FW failed the command,
the driver should ignore it, as FW is aware and will handle it.
The downstream patch will add a debugfs counter on this flow for
debuggability.
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
If give pages was triggered by an FW event and FW failed the command,
the driver should ignore it, as FW is aware and will handle it.
The downstream patch will add a debugfs counter on this flow for
debuggability.
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
If the give pages command was failed by FW, there is no need to notify the FW of
the failure. FW is aware and will handle it.
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Add new counters to command interface debugfs to count command failures.
The following counters added:
total_failed - number of times command failed (any kind of failure).
failed_mbox_status - number of times command failed on bad status
returned by FW.
In addition, add data about last command failure to command interface
debugfs:
last_failed_errno - last command failed returned errno.
last_failed_mbox_status - last bad status returned by FW.
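A sketch of how counters like these might be wired up with the standard debugfs helpers; the struct and directory layout here are illustrative, not the actual mlx5 layout:
```c
#include <linux/debugfs.h>
#include <linux/types.h>

struct example_cmd_stats {
	u64 total_failed;
	u64 failed_mbox_status;
	u32 last_failed_errno;
	u8  last_failed_mbox_status;
};

/* Expose the counters under an existing command-interface debugfs dir. */
static void example_cmd_stats_debugfs(struct dentry *cmd_dir,
				      struct example_cmd_stats *stats)
{
	debugfs_create_u64("total_failed", 0400, cmd_dir,
			   &stats->total_failed);
	debugfs_create_u64("failed_mbox_status", 0400, cmd_dir,
			   &stats->failed_mbox_status);
	debugfs_create_u32("last_failed_errno", 0400, cmd_dir,
			   &stats->last_failed_errno);
	debugfs_create_u8("last_failed_mbox_status", 0400, cmd_dir,
			  &stats->last_failed_mbox_status);
}
```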
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>