Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2023-07-17 (iavf)
This series contains updates to iavf driver only.
Ding Hui fixes use-after-free issue by calling netif_napi_del() for all
allocated q_vectors. He also resolves out-of-bounds issue by not
updating to new values when timeout is encountered.
Marcin and Ahmed change the way resets are handled so that the callback
operating under the RTNL lock will wait for the reset to finish, the
rtnl_lock sensitive functions in reset flow will schedule the netdev update
for later in order to remove circular dependency with the critical lock.
* '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
iavf: fix reset task race with iavf_remove()
iavf: fix a deadlock caused by rtnl and driver's lock circular dependencies
Revert "iavf: Do not restart Tx queues after reset task failure"
Revert "iavf: Detach device during reset task"
iavf: Wait for reset in callbacks which trigger it
iavf: use internal state to free traffic IRQs
iavf: Fix out-of-bounds when setting channels on remove
iavf: Fix use-after-free in free_netdev
====================
Link: https://lore.kernel.org/r/20230717175205.3217774-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Hardware generated encryption and ICV tags are found to
be wrong when tested with IEEE MACSEC test vectors.
This is because as per the HRM, the hash key (derived by
AES-ECB block encryption of an all 0s block with the SAK)
has to be programmed by the software in
MCSX_RS_MCS_CPM_TX_SLAVE_SA_PLCY_MEM_4X register.
Hence fix this by generating hash key in software and
configuring in hardware.
Fixes: c54ffc7360 ("octeontx2-pf: mcs: Introduce MACSEC hardware offloading")
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Link: https://lore.kernel.org/r/1689574603-28093-1-git-send-email-sbhatta@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In normal operation, each populated queue item has
next_to_watch pointing to the last TX desc of the packet,
while each cleaned item has it set to 0. In particular,
next_to_use that points to the next (necessarily clean)
item to use has next_to_watch set to 0.
When the TX queue is used both by an application using
AF_XDP with ZEROCOPY as well as a second non-XDP application
generating high traffic, the queue pointers can get in
an invalid state where next_to_use points to an item
where next_to_watch is NOT set to 0.
However, the implementation assumes at several places
that this is never the case, so if it does hold,
bad things happen. In particular, within the loop inside
of igc_clean_tx_irq(), next_to_clean can overtake next_to_use.
Finally, this prevents any further transmission via
this queue and it never gets unblocked or signaled.
Secondly, if the queue is in this garbled state,
the inner loop of igc_clean_tx_ring() will never terminate,
completely hogging a CPU core.
The reason is that igc_xdp_xmit_zc() reads next_to_use
before acquiring the lock, and writing it back
(potentially unmodified) later. If it got modified
before locking, the outdated next_to_use is written
pointing to an item that was already used elsewhere
(and thus next_to_watch got written).
Fixes: 9acf59a752 ("igc: Enable TX via AF_XDP zero-copy")
Signed-off-by: Florian Kauer <florian.kauer@linutronix.de>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Tested-by: Kurt Kanzenbach <kurt@linutronix.de>
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20230717175444.3217831-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Current driver enables backpressure for LBK interfaces.
But these interfaces do not support this feature.
Hence, this patch fixes the issue by skipping the
backpressure configuration for these interfaces.
Fixes: 75f3627099 ("octeontx2-pf: Support to enable/disable pause frames via ethtool").
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Link: https://lore.kernel.org/r/20230716093741.28063-1-gakula@marvell.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2023-07-14 (ice)
This series contains updates to ice driver only.
Petr Oros removes multiple calls made to unregister netdev and
devlink_port.
Michal fixes null pointer dereference that can occur during reload.
====================
Link: https://lore.kernel.org/r/20230714201041.1717834-1-anthony.l.nguyen@intel.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The reset task is currently scheduled from the watchdog or adminq tasks.
First, all direct calls to schedule the reset task are replaced with the
iavf_schedule_reset(), which is modified to accept the flag showing the
type of reset.
To prevent the reset task from starting once iavf_remove() starts, we need
to check the __IAVF_IN_REMOVE_TASK bit before we schedule it. This is now
easily added to iavf_schedule_reset().
Finally, remove the check for IAVF_FLAG_RESET_NEEDED in the watchdog task.
It is redundant since all callers who set the flag immediately schedules
the reset task.
Fixes: 3ccd54ef44 ("iavf: Fix init state closure on remove")
Fixes: 14756b2ae2 ("iavf: Fix __IAVF_RESETTING state usage")
Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
A driver's lock (crit_lock) is used to serialize all the driver's tasks.
Lockdep, however, shows a circular dependency between rtnl and
crit_lock. This happens when an ndo that already holds the rtnl requests
the driver to reset, since the reset task (in some paths) tries to grab
rtnl to either change real number of queues of update netdev features.
[566.241851] ======================================================
[566.241893] WARNING: possible circular locking dependency detected
[566.241936] 6.2.14-100.fc36.x86_64+debug #1 Tainted: G OE
[566.241984] ------------------------------------------------------
[566.242025] repro.sh/2604 is trying to acquire lock:
[566.242061] ffff9280fc5ceee8 (&adapter->crit_lock){+.+.}-{3:3}, at: iavf_close+0x3c/0x240 [iavf]
[566.242167]
but task is already holding lock:
[566.242209] ffffffff9976d350 (rtnl_mutex){+.+.}-{3:3}, at: iavf_remove+0x6b5/0x730 [iavf]
[566.242300]
which lock already depends on the new lock.
[566.242353]
the existing dependency chain (in reverse order) is:
[566.242401]
-> #1 (rtnl_mutex){+.+.}-{3:3}:
[566.242451] __mutex_lock+0xc1/0xbb0
[566.242489] iavf_init_interrupt_scheme+0x179/0x440 [iavf]
[566.242560] iavf_watchdog_task+0x80b/0x1400 [iavf]
[566.242627] process_one_work+0x2b3/0x560
[566.242663] worker_thread+0x4f/0x3a0
[566.242696] kthread+0xf2/0x120
[566.242730] ret_from_fork+0x29/0x50
[566.242763]
-> #0 (&adapter->crit_lock){+.+.}-{3:3}:
[566.242815] __lock_acquire+0x15ff/0x22b0
[566.242869] lock_acquire+0xd2/0x2c0
[566.242901] __mutex_lock+0xc1/0xbb0
[566.242934] iavf_close+0x3c/0x240 [iavf]
[566.242997] __dev_close_many+0xac/0x120
[566.243036] dev_close_many+0x8b/0x140
[566.243071] unregister_netdevice_many_notify+0x165/0x7c0
[566.243116] unregister_netdevice_queue+0xd3/0x110
[566.243157] iavf_remove+0x6c1/0x730 [iavf]
[566.243217] pci_device_remove+0x33/0xa0
[566.243257] device_release_driver_internal+0x1bc/0x240
[566.243299] pci_stop_bus_device+0x6c/0x90
[566.243338] pci_stop_and_remove_bus_device+0xe/0x20
[566.243380] pci_iov_remove_virtfn+0xd1/0x130
[566.243417] sriov_disable+0x34/0xe0
[566.243448] ice_free_vfs+0x2da/0x330 [ice]
[566.244383] ice_sriov_configure+0x88/0xad0 [ice]
[566.245353] sriov_numvfs_store+0xde/0x1d0
[566.246156] kernfs_fop_write_iter+0x15e/0x210
[566.246921] vfs_write+0x288/0x530
[566.247671] ksys_write+0x74/0xf0
[566.248408] do_syscall_64+0x58/0x80
[566.249145] entry_SYSCALL_64_after_hwframe+0x72/0xdc
[566.249886]
other info that might help us debug this:
[566.252014] Possible unsafe locking scenario:
[566.253432] CPU0 CPU1
[566.254118] ---- ----
[566.254800] lock(rtnl_mutex);
[566.255514] lock(&adapter->crit_lock);
[566.256233] lock(rtnl_mutex);
[566.256897] lock(&adapter->crit_lock);
[566.257388]
*** DEADLOCK ***
The deadlock can be triggered by a script that is continuously resetting
the VF adapter while doing other operations requiring RTNL, e.g:
while :; do
ip link set $VF up
ethtool --set-channels $VF combined 2
ip link set $VF down
ip link set $VF up
ethtool --set-channels $VF combined 4
ip link set $VF down
done
Any operation that triggers a reset can substitute "ethtool --set-channles"
As a fix, add a new task "finish_config" that do all the work which
needs rtnl lock. With the exception of iavf_remove(), all work that
require rtnl should be called from this task.
As for iavf_remove(), at the point where we need to call
unregister_netdevice() (and grab rtnl_lock), we make sure the finish_config
task is not running (cancel_work_sync()) to safely grab rtnl. Subsequent
finish_config work cannot restart after that since the task is guarded
by the __IAVF_IN_REMOVE_TASK bit in iavf_schedule_finish_config().
Fixes: 5ac49f3c27 ("iavf: use mutexes for locking of critical sections")
Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
This reverts commit 08f1c147b7.
Netdev is no longer being detached during reset, so this fix can be
reverted. We leave the removal of "hacky" IFF_UP flag update.
Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
This reverts commit aa626da947.
Detaching device during reset was not fully fixing the rtnl locking issue,
as there could be a situation where callback was already in progress before
detaching netdev.
Furthermore, detaching netdevice causes TX timeouts if traffic is running.
To reproduce:
ip netns exec ns1 iperf3 -c $PEER_IP -t 600 --logfile /dev/null &
while :; do
for i in 200 7000 400 5000 300 3000 ; do
ip netns exec ns1 ip link set $VF1 mtu $i
sleep 2
done
sleep 10
done
Currently, callbacks such as iavf_change_mtu() wait for the reset.
If the reset fails to acquire the rtnl_lock, they schedule the netdev
update for later while continuing the reset flow. Operations like MTU
changes are performed under the rtnl_lock. Therefore, when the operation
finishes, another callback that uses rtnl_lock can start.
Signed-off-by: Dawid Wesierski <dawidx.wesierski@intel.com>
Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
There was a fail when trying to add the interface to bonding
right after changing the MTU on the interface. It was caused
by bonding interface unable to open the interface due to
interface being in __RESETTING state because of MTU change.
Add new reset_waitqueue to indicate that reset has finished.
Add waiting for reset to finish in callbacks which trigger hw reset:
iavf_set_priv_flags(), iavf_change_mtu() and iavf_set_ringparam().
We use a 5000ms timeout period because on Hyper-V based systems,
this operation takes around 3000-4000ms. In normal circumstances,
it doesn't take more than 500ms to complete.
Add a function iavf_wait_for_reset() to reuse waiting for reset code and
use it also in iavf_set_channels(), which already waits for reset.
We don't use error handling in iavf_set_channels() as this could
cause the device to be in incorrect state if the reset was scheduled
but hit timeout or the waitng function was interrupted by a signal.
Fixes: 4e5e6b5d9d ("iavf: Fix return of set the new channel count")
Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Co-developed-by: Dawid Wesierski <dawidx.wesierski@intel.com>
Signed-off-by: Dawid Wesierski <dawidx.wesierski@intel.com>
Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com>
Signed-off-by: Kamil Maziarz <kamil.maziarz@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
If the system tries to close the netdev while iavf_reset_task() is
running, __LINK_STATE_START will be cleared and netif_running() will
return false in iavf_reinit_interrupt_scheme(). This will result in
iavf_free_traffic_irqs() not being called and a leak as follows:
[7632.489326] remove_proc_entry: removing non-empty directory 'irq/999', leaking at least 'iavf-enp24s0f0v0-TxRx-0'
[7632.490214] WARNING: CPU: 0 PID: 10 at fs/proc/generic.c:718 remove_proc_entry+0x19b/0x1b0
is shown when pci_disable_msix() is later called. Fix by using the
internal adapter state. The traffic IRQs will always exist if
state == __IAVF_RUNNING.
Fixes: 5b36e8d04b ("i40evf: Enable VF to request an alternate queue allocation")
Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Referenced commit missed that for chip versions 42 and 43 ASPM
remained disabled in the respective rtl_hw_start_...() routines.
This resulted in problems as described in the referenced bug
ticket. Therefore re-instantiate the previous logic.
Fixes: 5fc3f6c90c ("r8169: consolidate disabling ASPM before EPHY access")
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217635
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since commit 6624e780a5 ("ice: split ice_vsi_setup into smaller
functions") ice_vsi_release does things twice. There is unregister
netdev which is unregistered in ice_deinit_eth also.
It also unregisters the devlink_port twice which is also unregistered
in ice_deinit_eth(). This double deregistration is hidden because
devl_port_unregister ignores the return value of xa_erase.
[ 68.642167] Call Trace:
[ 68.650385] ice_devlink_destroy_pf_port+0xe/0x20 [ice]
[ 68.655656] ice_vsi_release+0x445/0x690 [ice]
[ 68.660147] ice_deinit+0x99/0x280 [ice]
[ 68.664117] ice_remove+0x1b6/0x5c0 [ice]
[ 171.103841] Call Trace:
[ 171.109607] ice_devlink_destroy_pf_port+0xf/0x20 [ice]
[ 171.114841] ice_remove+0x158/0x270 [ice]
[ 171.118854] pci_device_remove+0x3b/0xc0
[ 171.122779] device_release_driver_internal+0xc7/0x170
[ 171.127912] driver_detach+0x54/0x8c
[ 171.131491] bus_remove_driver+0x77/0xd1
[ 171.135406] pci_unregister_driver+0x2d/0xb0
[ 171.139670] ice_module_exit+0xc/0x55f [ice]
Fixes: 6624e780a5 ("ice: split ice_vsi_setup into smaller functions")
Signed-off-by: Petr Oros <poros@redhat.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Identify issues that arise by using the tests/doublebitand.cocci
semantic patch. Need to remove duplicate expression in if statement.
Signed-off-by: Wang Ming <machel@vivo.com>
Reviewed-by: Jiawen Wu <jiawenwu@trustnetic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is expected that most callers should _ignore_ the errors return by
debugfs_create_dir() in bnad_debugfs_init().
Signed-off-by: Wang Ming <machel@vivo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move the call to of_get_ethdev_address to mtk_add_mac which is part of
the probe function and can hence itself return -EPROBE_DEFER should
of_get_ethdev_address return -EPROBE_DEFER. This allows us to entirely
get rid of the mtk_init function.
The problem of of_get_ethdev_address returning -EPROBE_DEFER surfaced
in situations in which the NVMEM provider holding the MAC address has
not yet be loaded at the time mtk_eth_soc is initially probed. In this
case probing of mtk_eth_soc should be deferred instead of falling back
to use a random MAC address, so once the NVMEM provider becomes
available probing can be repeated.
Fixes: 656e705243 ("net-next: mediatek: add support for MT7623 ethernet")
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
CPSW ALE has 75 bit ALE entries which are stored within three 32 bit words.
The cpsw_ale_get_field() and cpsw_ale_set_field() functions assume that the
field will be strictly contained within one word. However, this is not
guaranteed to be the case and it is possible for ALE field entries to span
across up to two words at the most.
Fix the methods to handle getting/setting fields spanning up to two words.
Fixes: db82173f23 ("netdev: driver: ethernet: add cpsw address lookup engine support")
Signed-off-by: Tanmay Patil <t-patil@ti.com>
[s-vadapalli@ti.com: rephrased commit message and added Fixes tag]
Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For some device types like TXGBE_ID_XAUI, *checksum computed in
txgbe_calc_eeprom_checksum() is larger than TXGBE_EEPROM_SUM. Remove the
limit on the size of *checksum.
Fixes: 049fe53653 ("net: txgbe: Add operations to interact with firmware")
Fixes: 5e2ea7801f ("net: txgbe: Fix unsigned comparison to zero in txgbe_calc_eeprom_checksum()")
Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Link: https://lore.kernel.org/r/20230711063414.3311-1-jiawenwu@trustnetic.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The ENA adapters on our instances occasionally reset. Once recently
logged a UBSAN failure to console in the process:
UBSAN: shift-out-of-bounds in build/linux/drivers/net/ethernet/amazon/ena/ena_com.c:540:13
shift exponent 32 is too large for 32-bit type 'unsigned int'
CPU: 28 PID: 70012 Comm: kworker/u72:2 Kdump: loaded not tainted 5.15.117
Hardware name: Amazon EC2 c5d.9xlarge/, BIOS 1.0 10/16/2017
Workqueue: ena ena_fw_reset_device [ena]
Call Trace:
<TASK>
dump_stack_lvl+0x4a/0x63
dump_stack+0x10/0x16
ubsan_epilogue+0x9/0x36
__ubsan_handle_shift_out_of_bounds.cold+0x61/0x10e
? __const_udelay+0x43/0x50
ena_delay_exponential_backoff_us.cold+0x16/0x1e [ena]
wait_for_reset_state+0x54/0xa0 [ena]
ena_com_dev_reset+0xc8/0x110 [ena]
ena_down+0x3fe/0x480 [ena]
ena_destroy_device+0xeb/0xf0 [ena]
ena_fw_reset_device+0x30/0x50 [ena]
process_one_work+0x22b/0x3d0
worker_thread+0x4d/0x3f0
? process_one_work+0x3d0/0x3d0
kthread+0x12a/0x150
? set_kthread_struct+0x50/0x50
ret_from_fork+0x22/0x30
</TASK>
Apparently, the reset delays are getting so large they can trigger a
UBSAN panic.
Looking at the code, the current timeout is capped at 5000us. Using a
base value of 100us, the current code will overflow after (1<<29). Even
at values before 32, this function wraps around, perhaps
unintentionally.
Cap the value of the exponent used for this backoff at (1<<16) which is
larger than currently necessary, but large enough to support bigger
values in the future.
Cc: stable@vger.kernel.org
Fixes: 4bb7f4cf60 ("net: ena: reduce driver load time")
Signed-off-by: Krister Johansen <kjlx@templeofstupid.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Shay Agroskin <shayagr@amazon.com>
Link: https://lore.kernel.org/r/20230711013621.GE1926@templeofstupid.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tony Nguyen says:
====================
igc: Fix corner cases for TSN offload
Florian Kauer says:
The igc driver supports several different offloading capabilities
relevant in the TSN context. Recent patches in this area introduced
regressions for certain corner cases that are fixed in this series.
Each of the patches (except the first one) addresses a different
regression that can be separately reproduced. Still, they have
overlapping code changes so they should not be separately applied.
Especially #4 and #6 address the same observation,
but both need to be applied to avoid TX hang occurrences in
the scenario described in the patches.
====================
Signed-off-by: Florian Kauer <florian.kauer@linutronix.de>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Reviewed-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Due to hardware limitation, MCAM drop rule with
ether_type == 802.1Q and vlan_id == 0 is not supported. Hence rejecting
such rules.
Fixes: dce677da57 ("octeontx2-pf: Add vlan-etype to ntuple filters")
Signed-off-by: Suman Ghosh <sumang@marvell.com>
Link: https://lore.kernel.org/r/20230710103027.2244139-1-sumang@marvell.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
In the case of heavy XDP traffic to be transmitted, the console
will print the error log continuously if there are lack of enough
BDs to accommodate the frames. The log looks like below.
[ 160.013112] fec 30be0000.ethernet eth0: NOT enough BD for SG!
[ 160.023116] fec 30be0000.ethernet eth0: NOT enough BD for SG!
[ 160.028926] fec 30be0000.ethernet eth0: NOT enough BD for SG!
[ 160.038946] fec 30be0000.ethernet eth0: NOT enough BD for SG!
[ 160.044758] fec 30be0000.ethernet eth0: NOT enough BD for SG!
Not only will this log be replicated and redundant, it will also
degrade XDP performance. So we use netdev_err_once() instead of
netdev_err() now.
Fixes: 6d6b39f180 ("net: fec: add initial XDP support")
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
When the XDP feature is enabled and with heavy XDP frames to be
transmitted, there is a considerable probability that available
tx BDs are insufficient. This will lead to some XDP frames to be
discarded and the "NOT enough BD for SG!" error log will appear
in the console (as shown below).
[ 160.013112] fec 30be0000.ethernet eth0: NOT enough BD for SG!
[ 160.023116] fec 30be0000.ethernet eth0: NOT enough BD for SG!
[ 160.028926] fec 30be0000.ethernet eth0: NOT enough BD for SG!
[ 160.038946] fec 30be0000.ethernet eth0: NOT enough BD for SG!
[ 160.044758] fec 30be0000.ethernet eth0: NOT enough BD for SG!
In the case of heavy XDP traffic, sometimes the speed of recycling
tx BDs may be slower than the speed of sending XDP frames. There
may be several specific reasons, such as the interrupt is not
responsed in time, the efficiency of the NAPI callback function is
too low due to all the queues (tx queues and rx queues) share the
same NAPI, and so on.
After trying various methods, I think that increase the size of tx
BD ring is simple and effective. Maybe the best resolution is that
allocate NAPI for each queue to improve the efficiency of the NAPI
callback, but this change is a bit big and I didn't try this method.
Perheps this method will be implemented in a future patch.
This patch also updates the tx_wake_threshold of tx ring which is
related to the size of tx ring in the previous logic. Otherwise,
the tx_wake_threshold will be too high (403 BDs), which is more
likely to impact the slow path in the case of heavy XDP traffic,
because XDP path and slow path share the tx BD rings. According
to Jakub's suggestion, the tx_wake_threshold is at least equal to
tx_stop_threshold + 2 * MAX_SKB_FRAGS, if a queue of hundreds of
entries is overflowing, we should be able to apply a hysteresis
of a few tens of entries.
Fixes: 6d6b39f180 ("net: fec: add initial XDP support")
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Once the XDP frames have been successfully transmitted through the
ndo_xdp_xmit() interface, it's the driver responsibility to free
the frames so that the page_pool can recycle the pages and reuse
them. However, this action is not implemented in the fec driver.
This leads to a user-visible problem that the console will print
the following warning log.
[ 157.568851] page_pool_release_retry() stalled pool shutdown 1389 inflight 60 sec
[ 217.983446] page_pool_release_retry() stalled pool shutdown 1389 inflight 120 sec
[ 278.399006] page_pool_release_retry() stalled pool shutdown 1389 inflight 181 sec
[ 338.812885] page_pool_release_retry() stalled pool shutdown 1389 inflight 241 sec
[ 399.226946] page_pool_release_retry() stalled pool shutdown 1389 inflight 302 sec
Therefore, to solve this issue, we free XDP frames via xdp_return_frame()
while cleaning the tx BD ring.
Fixes: 6d6b39f180 ("net: fec: add initial XDP support")
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
When a XDP program is installed or uninstalled, fec_restart() will
be invoked to reset MAC and buffer descriptor rings. It's reasonable
not to transmit any packet during the process of reset. However, the
NETDEV_XDP_ACT_NDO_XMIT bit of xdp_features is enabled by default,
that is to say, it's possible that the fec_enet_xdp_xmit() will be
invoked even if the process of reset is not finished. In this case,
the redirected XDP frames might be dropped and available transmit BDs
may be incorrectly deemed insufficient. So this patch disable the
NETDEV_XDP_ACT_NDO_XMIT feature by default and dynamically configure
this feature when the bpf program is installed or uninstalled.
Fixes: e4ac7cc6e5 ("net: fec: turn on XDP features")
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The insertion of an empty frame was introduced with
commit db0b124f02 ("igc: Enhance Qbv scheduling by using first flag bit")
in order to ensure that the current cycle has at least one packet if
there is some packet to be scheduled for the next cycle.
However, the current implementation does not properly check if
a packet is already scheduled for the current cycle. Currently,
an empty packet is always inserted if and only if
txtime >= end_of_cycle && txtime > last_tx_cycle
but since last_tx_cycle is always either the end of the current
cycle (end_of_cycle) or the end of a previous cycle, the
second part (txtime > last_tx_cycle) is always true unless
txtime == last_tx_cycle.
What actually needs to be checked here is if the last_tx_cycle
was already written within the current cycle, so an empty frame
should only be inserted if and only if
txtime >= end_of_cycle && end_of_cycle > last_tx_cycle.
This patch does not only avoid an unnecessary insertion, but it
can actually be harmful to insert an empty packet if packets
are already scheduled in the current cycle, because it can lead
to a situation where the empty packet is actually processed
as the first packet in the upcoming cycle shifting the packet
with the first_flag even one cycle into the future, finally leading
to a TX hang.
The TX hang can be reproduced on a i225 with:
sudo tc qdisc replace dev enp1s0 parent root handle 100 taprio \
num_tc 1 \
map 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 \
queues 1@0 \
base-time 0 \
sched-entry S 01 300000 \
flags 0x1 \
txtime-delay 500000 \
clockid CLOCK_TAI
sudo tc qdisc replace dev enp1s0 parent 100:1 etf \
clockid CLOCK_TAI \
delta 500000 \
offload \
skip_sock_check
and traffic generator
sudo trafgen -i traffic.cfg -o enp1s0 --cpp -n0 -q -t1400ns
with traffic.cfg
#define ETH_P_IP 0x0800
{
/* Ethernet Header */
0x30, 0x1f, 0x9a, 0xd0, 0xf0, 0x0e, # MAC Dest - adapt as needed
0x24, 0x5e, 0xbe, 0x57, 0x2e, 0x36, # MAC Src - adapt as needed
const16(ETH_P_IP),
/* IPv4 Header */
0b01000101, 0, # IPv4 version, IHL, TOS
const16(1028), # IPv4 total length (UDP length + 20 bytes (IP header))
const16(2), # IPv4 ident
0b01000000, 0, # IPv4 flags, fragmentation off
64, # IPv4 TTL
17, # Protocol UDP
csumip(14, 33), # IPv4 checksum
/* UDP Header */
10, 0, 48, 1, # IP Src - adapt as needed
10, 0, 48, 10, # IP Dest - adapt as needed
const16(5555), # UDP Src Port
const16(6666), # UDP Dest Port
const16(1008), # UDP length (UDP header 8 bytes + payload length)
csumudp(14, 34), # UDP checksum
/* Payload */
fill('W', 1000),
}
and the observed message with that is for example
igc 0000:01:00.0 enp1s0: Detected Tx Unit Hang
Tx Queue <0>
TDH <32>
TDT <3c>
next_to_use <3c>
next_to_clean <32>
buffer_info[next_to_clean]
time_stamp <ffff26a8>
next_to_watch <00000000632a1828>
jiffies <ffff27f8>
desc.status <1048000>
Fixes: db0b124f02 ("igc: Enhance Qbv scheduling by using first flag bit")
Signed-off-by: Florian Kauer <florian.kauer@linutronix.de>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
It is possible (verified on a running system) that frames are processed
by igc_tx_launchtime with a txtime before the start of the cycle
(baset_est).
However, the result of txtime - baset_est is written into a u32,
leading to a wrap around to a positive number. The following
launchtime > 0 check will only branch to executing launchtime = 0
if launchtime is already 0.
Fix it by using a s32 before checking launchtime > 0.
Fixes: db0b124f02 ("igc: Enhance Qbv scheduling by using first flag bit")
Signed-off-by: Florian Kauer <florian.kauer@linutronix.de>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The flags IGC_TXQCTL_STRICT_CYCLE and IGC_TXQCTL_STRICT_END
prevent the packet transmission over slot and cycle boundaries.
This is important for taprio offload where the slots and
cycles correspond to the slots and cycles configured for the
network.
However, the Qbv offload feature of the i225 is also used for
enabling TX launchtime / ETF offload. In that case, however,
the cycle has no meaning for the network and is only used
internally to adapt the base time register after a second has
passed.
Enabling strict mode in this case would unnecessarily prevent
the transmission of certain packets (i.e. at the boundary of a
second) and thus interferes with the ETF qdisc that promises
transmission at a certain point in time.
Similar to ETF, this also applies to CBS offload that also should
not be influenced by strict mode unless taprio offload would be
enabled at the same time.
This fully reverts
commit d8f45be01d ("igc: Use strict cycles for Qbv scheduling")
but its commit message only describes what was already implemented
before that commit. The difference to a plain revert of that commit
is that it now copes with the base_time = 0 case that was fixed with
commit e17090eb24 ("igc: allow BaseTime 0 enrollment for Qbv")
In particular, enabling strict mode leads to TX hang situations
under high traffic if taprio is applied WITHOUT taprio offload
but WITH ETF offload, e.g. as in
sudo tc qdisc replace dev enp1s0 parent root handle 100 taprio \
num_tc 1 \
map 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 \
queues 1@0 \
base-time 0 \
sched-entry S 01 300000 \
flags 0x1 \
txtime-delay 500000 \
clockid CLOCK_TAI
sudo tc qdisc replace dev enp1s0 parent 100:1 etf \
clockid CLOCK_TAI \
delta 500000 \
offload \
skip_sock_check
and traffic generator
sudo trafgen -i traffic.cfg -o enp1s0 --cpp -n0 -q -t1400ns
with traffic.cfg
#define ETH_P_IP 0x0800
{
/* Ethernet Header */
0x30, 0x1f, 0x9a, 0xd0, 0xf0, 0x0e, # MAC Dest - adapt as needed
0x24, 0x5e, 0xbe, 0x57, 0x2e, 0x36, # MAC Src - adapt as needed
const16(ETH_P_IP),
/* IPv4 Header */
0b01000101, 0, # IPv4 version, IHL, TOS
const16(1028), # IPv4 total length (UDP length + 20 bytes (IP header))
const16(2), # IPv4 ident
0b01000000, 0, # IPv4 flags, fragmentation off
64, # IPv4 TTL
17, # Protocol UDP
csumip(14, 33), # IPv4 checksum
/* UDP Header */
10, 0, 48, 1, # IP Src - adapt as needed
10, 0, 48, 10, # IP Dest - adapt as needed
const16(5555), # UDP Src Port
const16(6666), # UDP Dest Port
const16(1008), # UDP length (UDP header 8 bytes + payload length)
csumudp(14, 34), # UDP checksum
/* Payload */
fill('W', 1000),
}
and the observed message with that is for example
igc 0000:01:00.0 enp1s0: Detected Tx Unit Hang
Tx Queue <0>
TDH <d0>
TDT <f0>
next_to_use <f0>
next_to_clean <d0>
buffer_info[next_to_clean]
time_stamp <ffff661f>
next_to_watch <00000000245a4efb>
jiffies <ffff6e48>
desc.status <1048000>
Fixes: d8f45be01d ("igc: Use strict cycles for Qbv scheduling")
Signed-off-by: Florian Kauer <florian.kauer@linutronix.de>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Since commit e17090eb24 ("igc: allow BaseTime 0 enrollment for Qbv")
it is possible to enable taprio offload with a basetime of 0.
However, the check if taprio offload is already enabled (and thus -EALREADY
should be returned for igc_save_qbv_schedule) still relied on
adapter->base_time > 0.
This can be reproduced as follows:
# TAPRIO offload (flags == 0x2) and base-time = 0
sudo tc qdisc replace dev enp1s0 parent root handle 100 stab overhead 24 taprio \
num_tc 1 \
map 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 \
queues 1@0 \
base-time 0 \
sched-entry S 01 300000 \
flags 0x2
# The second call should fail with "Error: Device failed to setup taprio offload."
# But that only happens if base-time was != 0
sudo tc qdisc replace dev enp1s0 parent root handle 100 stab overhead 24 taprio \
num_tc 1 \
map 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 \
queues 1@0 \
base-time 0 \
sched-entry S 01 300000 \
flags 0x2
Fixes: e17090eb24 ("igc: allow BaseTime 0 enrollment for Qbv")
Signed-off-by: Florian Kauer <florian.kauer@linutronix.de>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Only set adapter->taprio_offload_enable after validating the arguments.
Otherwise, it stays set even if the offload was not enabled.
Since the subsequent code does not get executed in case of invalid
arguments, it will not be read at first.
However, by activating and then deactivating another offload
(e.g. ETF/TX launchtime offload), taprio_offload_enable is read
and erroneously keeps the offload feature of the NIC enabled.
This can be reproduced as follows:
# TAPRIO offload (flags == 0x2) and negative base-time leading to expected -ERANGE
sudo tc qdisc replace dev enp1s0 parent root handle 100 stab overhead 24 taprio \
num_tc 1 \
map 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 \
queues 1@0 \
base-time -1000 \
sched-entry S 01 300000 \
flags 0x2
# IGC_TQAVCTRL is 0x0 as expected (iomem=relaxed for reading register)
sudo pcimem /sys/bus/pci/devices/0000:01:00.0/resource0 0x3570 w*1
# Activate ETF offload
sudo tc qdisc replace dev enp1s0 parent root handle 6666 mqprio \
num_tc 3 \
map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \
queues 1@0 1@1 2@2 \
hw 0
sudo tc qdisc add dev enp1s0 parent 6666:1 etf \
clockid CLOCK_TAI \
delta 500000 \
offload
# IGC_TQAVCTRL is 0x9 as expected
sudo pcimem /sys/bus/pci/devices/0000:01:00.0/resource0 0x3570 w*1
# Deactivate ETF offload again
sudo tc qdisc delete dev enp1s0 parent 6666:1
# IGC_TQAVCTRL should now be 0x0 again, but is observed as 0x9
sudo pcimem /sys/bus/pci/devices/0000:01:00.0/resource0 0x3570 w*1
Fixes: e17090eb24 ("igc: allow BaseTime 0 enrollment for Qbv")
Signed-off-by: Florian Kauer <florian.kauer@linutronix.de>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
In the current implementation the flags adapter->qbv_enable
and IGC_FLAG_TSN_QBV_ENABLED have a similar name, but do not
have the same meaning. The first one is used only to indicate
taprio offload (i.e. when igc_save_qbv_schedule was called),
while the second one corresponds to the Qbv mode of the hardware.
However, the second one is also used to support the TX launchtime
feature, i.e. ETF qdisc offload. This leads to situations where
adapter->qbv_enable is false, but the flag IGC_FLAG_TSN_QBV_ENABLED
is set. This is prone to confusion.
The rename should reduce this confusion. Since it is a pure
rename, it has no impact on functionality.
Fixes: e17090eb24 ("igc: allow BaseTime 0 enrollment for Qbv")
Signed-off-by: Florian Kauer <florian.kauer@linutronix.de>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Current codebase contained the usage of two different names for this
driver (i.e., `gvnic` and `gve`), which is quite unfriendly for users
to use, especially when trying to bind or unbind the driver manually.
The corresponding kernel module is registered with the name of `gve`.
It's more reasonable to align the name of the driver with the module.
Fixes: 893ce44df5 ("gve: Add basic driver framework for Compute Engine Virtual NIC")
Cc: csully@google.com
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The blamed commit introduces usage of fixed_phy_register() but
not a corresponding dependency on FIXED_PHY.
This can result in a build failure.
s390-linux-ld: drivers/net/ethernet/microchip/lan743x_main.o: in function `lan743x_phy_open':
drivers/net/ethernet/microchip/lan743x_main.c:1514: undefined reference to `fixed_phy_register'
Fixes: 624864fbff ("net: lan743x: add fixed phy support for LAN7431 device")
Cc: stable@vger.kernel.org
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Closes: https://lore.kernel.org/netdev/725bf1c5-b252-7d19-7582-a6809716c7d6@infradead.org/
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
Signed-off-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Turning IRQs off is done by accessing Ethernet controller registers.
That can't be done until device's clock is enabled. It results in a SoC
hang otherwise.
This bug remained unnoticed for years as most bootloaders keep all
Ethernet interfaces turned on. It seems to only affect a niche SoC
family BCM47189. It has two Ethernet controllers but CFE bootloader uses
only the first one.
Fixes: 34322615cb ("net: bgmac: Mask interrupts during probe")
Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove the probe error path code that leaves the driver bound
to the device, but with essentially a dead device. This was
useful maybe twice early in the driver's life and no longer
makes sense to keep.
Fixes: 30a1e6d0f8 ("ionic: keep ionic dev on lif init fail")
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove unnecessary early code development check and the WARN_ON
that it uses. The irq alloc and free paths have long been
cleaned up and this check shouldn't have stuck around so long.
Fixes: 77ceb68e29 ("ionic: Add notifyq support")
Signed-off-by: Nitya Sunkad <nitya.sunkad@amd.com>
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Moved PTP pointer validation before its use to avoid smatch warning.
Also used kzalloc/kfree instead of devm_kzalloc/devm_kfree.
Fixes: 2ef4e45d99 ("octeontx2-af: Add PTP PPS Errata workaround on CN10K silicon")
Signed-off-by: Naveen Mamindlapalli <naveenm@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Sai Krishna <saikrishnag@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In legacy silicon, promiscuous mode is only modified
through CGX mbox messages. In CN10KB silicon, it is modified
from CGX mbox and NIX. This breaks legacy application
behaviour. Fix this by removing call from NIX.
Fixes: d6c9784baf ("octeontx2-af: Invoke exact match functions if supported")
Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2023-07-05 (igc)
This series contains updates to igc driver only.
Husaini adds check to increment Qbv change error counter only on taprio
Qbvs. He also removes delay during Tx ring configuration and
resolves Tx hang that could occur when transmitting on a gate to be
closed.
Prasad Koya reports ethtool link mode as TP (twisted pair).
Tee Min corrects value for max SDU.
Aravindhan ensures that registers for PPS are always programmed to occur
in future.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Current duplex mode was unset in the driver, resulting in the default
parameter being set to 0, which corresponds to half duplex. It might
mislead users to have incorrect expectation about the driver's
transmission capabilities.
Set the default duplex configuration to full, as the driver runs in
full duplex mode at this point.
Fixes: 7e074d5a76 ("gve: Enable Link Speed Reporting in the driver.")
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Message-ID: <20230706044128.2726747-1-junfeng.guo@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2023-07-05 (ice)
This series contains updates to ice driver only.
Sridhar fixes incorrect comparison of max Tx rate limit to occur against
each TC value rather than the aggregate. He also resolves an issue with
the wrong VSI being used when setting max Tx rate when TCs are enabled.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
ice: Fix tx queue rate limit when TCs are configured
ice: Fix max_rate check while configuring TX rate limits
====================
Link: https://lore.kernel.org/r/20230705201346.49370-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAmSlrvAACgkQSD+KveBX
+j4EcggAsZlHQHRaA9re/5Fr8VV/YNmf+eLM6/2F6CD1sLcwWEsmDXkqZpL/wqVv
bcP/dE/ehiMd1FI6XmEb9aGZY29mQBoItwCnxeUrK75d8CQib/pH87VkQdT9Uf6W
RVqPk2rOL778sTy6V/UZDecfmZB5XfEoOO6f0YP/j2t8HxmfdetBN0orE0iRQBmO
+0W8X5bIfCmGyWdQzOJB4a+S7Wi1JACr6CIOdNtADuZ7C7kKiPWrvEl7DMHlyX3Y
TyrE//piSJuxaHwdMNBHPsXK5ZYyn7ZJjqeCA+Kd+lKMJlJEO7R+TkTOP0RKE++a
ZdDetCqLKegQPpWZgjGzei7PpSpaFA==
=cNgk
-----END PGP SIGNATURE-----
Merge tag 'mlx5-fixes-2023-07-05' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5 fixes 2023-07-05
This series provides bug fixes to mlx5 driver.
* tag 'mlx5-fixes-2023-07-05' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
net/mlx5e: RX, Fix page_pool page fragment tracking for XDP
net/mlx5: Query hca_cap_2 only when supported
net/mlx5e: TC, CT: Offload ct clear only once
net/mlx5e: Check for NOT_READY flag state after locking
net/mlx5: Register a unique thermal zone per device
net/mlx5e: RX, Fix flush and close release flow of regular rq for legacy rq
net/mlx5e: fix memory leak in mlx5e_ptp_open
net/mlx5e: fix memory leak in mlx5e_fs_tt_redirect_any_create
net/mlx5e: fix double free in mlx5e_destroy_flow_table
====================
Link: https://lore.kernel.org/r/20230705175757.284614-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This switch implements Hold/Release in a strange way, with no control
from the user as required by IEEE 802.1Q-2018 through Set-And-Hold-MAC
and Set-And-Release-MAC, but rather, it emits HOLD requests implicitly
based on the schedule.
Namely, when the gate of a preemptible TC is about to close (actually
QSYS::PREEMPTION_CFG.HOLD_ADVANCE octet times in advance of this event),
the QSYS seems to emit a HOLD request pulse towards the MAC which
preempts the currently transmitted packet, and further packets are held
back in the queue system.
This allows large frames to be squeezed through small time slots,
because HOLD requests initiated by the gate events result in the frame
being segmented in multiple fragments, the bit time of which is equal to
the size of the time slot.
It has been reported that the vsc9959_tas_guard_bands_update() logic
breaks this, because it doesn't take preemptible TCs into account, and
enables oversized frame dropping when the time slot doesn't allow a full
MTU to be sent, but it does allow 2*minFragSize to be sent (128B).
Packets larger than 128B are dropped instead of being sent in multiple
fragments.
Confusingly, the manual says:
| For guard band, SDU calculation of a traffic class of a port, if
| preemption is enabled (through 'QSYS::PREEMPTION_CFG.P_QUEUES') then
| QSYS::PREEMPTION_CFG.HOLD_ADVANCE is used, otherwise
| QSYS::QMAXSDU_CFG_*.QMAXSDU_* is used.
but this only refers to the static guard band durations, and the
QMAXSDU_CFG_* registers have dual purpose - the other being oversized
frame dropping, which takes place irrespective of whether frames are
preemptible or express.
So, to fix the problem, we need to call vsc9959_tas_guard_bands_update()
from ocelot_port_update_active_preemptible_tcs(), and modify the guard
band logic to consider a different (lower) oversize limit for
preemptible traffic classes.
Fixes: 403ffc2c34 ("net: mscc: ocelot: add support for preemptible traffic classes")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Message-ID: <20230705104422.49025-4-vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In a future commit we will have to call vsc9959_tas_guard_bands_update()
from ocelot_port_update_active_preemptible_tcs(), and that will be
impossible due to the AB/BA locking dependencies between
ocelot->tas_lock and ocelot->fwd_domain_lock.
Just like we did in commit 3ff468ef98 ("net: mscc: ocelot: remove
struct ocelot_mm_state :: lock"), the only solution is to expand the
scope of ocelot->fwd_domain_lock for it to also serialize changes made
to the Time-Aware Shaper, because those will have to result in a
recalculation of cut-through TCs, which is something that depends on the
forwarding domain.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Message-ID: <20230705104422.49025-2-vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If we boot with mvneta.txq_number=1, the txq_map is set incorrectly:
MVNETA_CPU_TXQ_ACCESS(1) refers to TX queue 1, but only TX queue 0 is
initialized. Fix this.
Fixes: 50bf8cb6fc ("net: mvneta: Configure XPS support")
Signed-off-by: Klaus Kudielka <klaus.kudielka@gmail.com>
Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Link: https://lore.kernel.org/r/20230705053712.3914-1-klaus.kudielka@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Current release - regressions:
- nvme-tcp: fix comma-related oops after sendpage changes
Current release - new code bugs:
- ptp: make max_phase_adjustment sysfs device attribute invisible
when not supported
Previous releases - regressions:
- sctp: fix potential deadlock on &net->sctp.addr_wq_lock
- mptcp:
- ensure subflow is unhashed before cleaning the backlog
- do not rely on implicit state check in mptcp_listen()
Previous releases - always broken:
- net: fix net_dev_start_xmit trace event vs skb_transport_offset()
- Bluetooth:
- fix use-bdaddr-property quirk
- L2CAP: fix multiple UaFs
- ISO: use hci_sync for setting CIG parameters
- hci_event: fix Set CIG Parameters error status handling
- hci_event: fix parsing of CIS Established Event
- MGMT: fix marking SCAN_RSP as not connectable
- wireguard: queuing: use saner cpu selection wrapping
- sched: act_ipt: various bug fixes for iptables <> TC interactions
- sched: act_pedit: add size check for TCA_PEDIT_PARMS_EX
- dsa: fixes for receiving PTP packets with 8021q and sja1105 tagging
- eth: sfc: fix null-deref in devlink port without MAE access
- eth: ibmvnic: do not reset dql stats on NON_FATAL err
Misc:
- xsk: honor SO_BINDTODEVICE on bind
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmSlu+MACgkQMUZtbf5S
Irslgw//S7jf/GL8V6y8VL3te+/OPOZnLDTzFFOdy64/y97FE6XIacJUpyWRhtmz
oSzcSNHETPW9U+xSGa2ZQlKhAXt6n9iRNvUegql+VBb13Iz+l7AdTeoxRv/YuwDo
5lTOIB6cBw+ATd0oxS6wr8SyUlcvktUKBfTAItjbVM55aXfIUpXIa84+F7avJgIA
XP1u/3PHhwItmwo/hXhHH0+P0QA8ix1q2SvRB7DAlQLBsTuQhaKjXWQkYYTKw/Nt
dtvh8iQSs/YXaHMjTa5CK28HOD8+ywIizr+uJ9VaNqIzV0W5JE9IE8P4NFpBcY7t
kGjTYODOph7dkNmZ5RLj3N+B6CyC57OXDzoo/tr8940UytCLVj9EVyduarLGLx57
edqK9cUz5kWejyGoyZ4Pvlo/SKvCQ2HKMeiAJ0/nNpTJMFuygMoqGsaD6ttzkXMj
fZLPjRUK3axd+15ZzhLEf8HyL5Qh+qPqqX9p7NljfMKwhxMWJ5fuICJfdGOSdMJR
ndL+wPfRPFQwszZ4pbTY2Ivn29mo8ScBOSOEgQs2mOny+zFzTzmqNWz/jcFfQnjS
cylxBEHrgudT2FuCImZ/v66TM5yakHXqIdpTGG+zsvJWQqjM96Z3I7WRvi0g9d75
n84il+j34mnzl90j2xEutqUiK7BQ9ZpZBsutPVTKBIHKWWiortI=
=9yzk
-----END PGP SIGNATURE-----
Merge tag 'net-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from bluetooth, bpf and wireguard.
Current release - regressions:
- nvme-tcp: fix comma-related oops after sendpage changes
Current release - new code bugs:
- ptp: make max_phase_adjustment sysfs device attribute invisible
when not supported
Previous releases - regressions:
- sctp: fix potential deadlock on &net->sctp.addr_wq_lock
- mptcp:
- ensure subflow is unhashed before cleaning the backlog
- do not rely on implicit state check in mptcp_listen()
Previous releases - always broken:
- net: fix net_dev_start_xmit trace event vs skb_transport_offset()
- Bluetooth:
- fix use-bdaddr-property quirk
- L2CAP: fix multiple UaFs
- ISO: use hci_sync for setting CIG parameters
- hci_event: fix Set CIG Parameters error status handling
- hci_event: fix parsing of CIS Established Event
- MGMT: fix marking SCAN_RSP as not connectable
- wireguard: queuing: use saner cpu selection wrapping
- sched: act_ipt: various bug fixes for iptables <> TC interactions
- sched: act_pedit: add size check for TCA_PEDIT_PARMS_EX
- dsa: fixes for receiving PTP packets with 8021q and sja1105 tagging
- eth: sfc: fix null-deref in devlink port without MAE access
- eth: ibmvnic: do not reset dql stats on NON_FATAL err
Misc:
- xsk: honor SO_BINDTODEVICE on bind"
* tag 'net-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (70 commits)
nfp: clean mc addresses in application firmware when closing port
selftests: mptcp: pm_nl_ctl: fix 32-bit support
selftests: mptcp: depend on SYN_COOKIES
selftests: mptcp: userspace_pm: report errors with 'remove' tests
selftests: mptcp: userspace_pm: use correct server port
selftests: mptcp: sockopt: return error if wrong mark
selftests: mptcp: sockopt: use 'iptables-legacy' if available
selftests: mptcp: connect: fail if nft supposed to work
mptcp: do not rely on implicit state check in mptcp_listen()
mptcp: ensure subflow is unhashed before cleaning the backlog
s390/qeth: Fix vipa deletion
octeontx-af: fix hardware timestamp configuration
net: dsa: sja1105: always enable the send_meta options
net: dsa: tag_sja1105: fix MAC DA patching from meta frames
net: Replace strlcpy with strscpy
pptp: Fix fib lookup calls.
mlxsw: spectrum_router: Fix an IS_ERR() vs NULL check
net/sched: act_pedit: Add size check for TCA_PEDIT_PARMS_EX
xsk: Honor SO_BINDTODEVICE on bind
ptp: Make max_phase_adjustment sysfs device attribute invisible when not supported
...
I225/6 hardware can be programmed to start PPS output once
the time in Target Time registers is reached. The time
programmed in these registers should always be into future.
Only then PPS output is triggered when SYSTIM register
reaches the programmed value. There are two modes in i225/6
hardware to program PPS, pulse and clock mode.
There were issues reported where PPS is not generated when
start time is in past.
Example 1, "echo 0 0 0 2 0 > /sys/class/ptp/ptp0/period"
In the current implementation, a value of '0' is programmed
into Target time registers and PPS output is in pulse mode.
Eventually an interrupt which is triggered upon SYSTIM
register reaching Target time is not fired. Thus no PPS
output is generated.
Example 2, "echo 0 0 0 1 0 > /sys/class/ptp/ptp0/period"
Above case, a value of '0' is programmed into Target time
registers and PPS output is in clock mode. Here, HW tries to
catch-up the current time by incrementing Target Time
register. This catch-up time seem to vary according to
programmed PPS period time as per the HW design. In my
experiments, the delay ranged between few tens of seconds to
few minutes. The PPS output is only generated after the
Target time register reaches current time.
In my experiments, I also observed PPS stopped working with
below test and could not recover until module is removed and
loaded again.
1) echo 0 <future time> 0 1 0 > /sys/class/ptp/ptp1/period
2) echo 0 0 0 1 0 > /sys/class/ptp/ptp1/period
3) echo 0 0 0 1 0 > /sys/class/ptp/ptp1/period
After this PPS did not work even if i re-program with proper
values. I could only get this back working by reloading the
driver.
This patch takes care of calculating and programming
appropriate future time value into Target Time registers.
Fixes: 5e91c72e56 ("igc: Fix PPS delta between two synchronized end-points")
Signed-off-by: Aravindhan Gunasekaran <aravindhan.gunasekaran@intel.com>
Reviewed-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
IEEE 802.1Q does not have clear definitions of what constitutes an
SDU (Service Data Unit), but IEEE Std 802.3 clause 3.1.2 does define
the MAC service primitives and clause 3.2.7 does define the MAC Client
Data for Q-tagged frames.
It shows that the mac_service_data_unit (MSDU) does NOT contain the
preamble, destination and source address, or FCS. The MSDU does contain
the length/type field, MAC client data, VLAN tag and any padding
data (prior to the FCS).
Thus, the maximum 802.3 frame size that is allowed to be transmitted
should be QueueMaxSDU (MSDU) + 16 (6 byte SA + 6 byte DA + 4 byte FCS).
Fixes: 92a0dcb842 ("igc: offload queue max SDU from tc-taprio")
Signed-off-by: Tan Tee Min <tee.min.tan@linux.intel.com>
Reviewed-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
set TP bit in the 'supported' and 'advertising' fields. i225/226 parts
only support twisted pair copper.
Fixes: 8c5ad0dae9 ("igc: Add ethtool support")
Signed-off-by: Prasad Koya <prasad@arista.com>
Acked-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
When moving devices from one namespace to another, mc addresses are
cleaned in software while not removed from application firmware. Thus
the mc addresses are remained and will cause resource leak.
Now use `__dev_mc_unsync` to clean mc addresses when closing port.
Fixes: e20aa071cd ("nfp: fix schedule in atomic context when sync mc address")
Cc: stable@vger.kernel.org
Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Acked-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Louis Peens <louis.peens@corigine.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Message-ID: <20230705052818.7122-1-louis.peens@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently mlx5e releases pages directly to the page_pool for XDP_TX and
does page fragment counting for XDP_REDIRECT. RX pages from the
page_pool are leaking on XDP_REDIRECT because the xdp core will release
only one fragment out of MLX5E_PAGECNT_BIAS_MAX and subsequently the page
is marked as "skip release" which avoids the driver release.
A fix would be to take an extra fragment for XDP_REDIRECT and not set the
"skip release" bit so that the release on the driver side can handle the
remaining bias fragments. But this would be a shortsighted solution.
Instead, this patch converges the two XDP paths (XDP_TX and XDP_REDIRECT) to
always do fragment tracking. The "skip release" bit is no longer
necessary for XDP.
Fixes: 6f57428460 ("net/mlx5e: RX, Enable skb page recycling through the page_pool")
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
On vport enable, where fw's hca caps are queried, the driver queries
hca_caps_2 without checking if fw truly supports them, causing a false
failure of vfs vport load and blocking SRIOV enablement on old devices
such as CX4 where hca_caps_2 support is missing.
Thus, add a check for the said caps support before accessing them.
Fixes: e5b9642a33 ("net/mlx5: E-Switch, Implement devlink port function cmds to control migratable")
Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Non-clear CT action causes a flow rule split, while CT clear action
doesn't and is just a header-rewrite to the current flow rule.
But ct offload is done in post_parse and is per ct action instance,
so ct clear offload is parsed multiple times, while its deleted once.
Fix this by post_parsing the ct action only once per flow attribute
(which is per flow rule) by using a offloaded ct_attr flag.
Fixes: 08fe94ec5f ("net/mlx5e: TC, Remove special handling of CT action")
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Prior to this patch only one "mlx5" thermal zone could have been
registered regardless of the number of individual mlx5 devices in the
system.
To fix this setup a unique name per device to register its own thermal
zone.
In order to not register a thermal zone for a virtual device (VF/SF) add
a check for PF device type.
The new name is a concatenation between "mlx5_" and "<PCI_DEV_BDF>", which
will also help associating a thermal zone with its PCI device.
$ lspci | grep ConnectX
00:04.0 Ethernet controller: Mellanox Technologies MT2892 Family [ConnectX-6 Dx]
00:05.0 Ethernet controller: Mellanox Technologies MT2892 Family [ConnectX-6 Dx]
$ cat /sys/devices/virtual/thermal/thermal_zone0/type
mlx5_0000:00:04.0
$ cat /sys/devices/virtual/thermal/thermal_zone1/type
mlx5_0000:00:05.0
Fixes: c1fef618d6 ("net/mlx5: Implement thermal zone")
CC: Sandipan Patra <spatra@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Regular (non-XSK) RQs get flushed on XSK setup and re-activated on XSK
close. If the same regular RQ is closed (a config change for example)
soon after the XSK close, a double release occurs because the missing
wqes get released a second time.
Fixes: 3f93f82988 ("net/mlx5e: RX, Defer page release in legacy rq for better recycling")
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
When kvzalloc_node or kvzalloc failed in mlx5e_ptp_open, the memory
pointed by "c" or "cparams" is not freed, which can lead to a memory
leak. Fix by freeing the array in the error path.
Fixes: 145e5637d9 ("net/mlx5e: Add TX PTP port object support")
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The memory pointed to by the fs->any pointer is not freed in the error
path of mlx5e_fs_tt_redirect_any_create, which can lead to a memory leak.
Fix by freeing the memory in the error path, thereby making the error path
identical to mlx5e_fs_tt_redirect_any_destroy().
Fixes: 0f575c20bf ("net/mlx5e: Introduce Flow Steering ANY API")
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
In function accel_fs_tcp_create_groups(), when the ft->g memory is
successfully allocated but the 'in' memory fails to be allocated, the
memory pointed to by ft->g is released once. And in function
accel_fs_tcp_create_table, mlx5e_destroy_flow_table is called to release
the memory pointed to by ft->g again. This will cause double free problem.
Fixes: c062d52ac2 ("net/mlx5e: Receive flow steering framework for accelerated TCP flows")
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
If a user schedules a Gate Control List (GCL) to close one of
the QBV gates while also transmitting a packet to that closed gate,
TX Hang will be happen. HW would not drop any packet when the gate
is closed and keep queuing up in HW TX FIFO until the gate is re-opened.
This patch implements the solution to drop the packet for the closed
gate.
This patch will also reset the adapter to perform SW initialization
for each 1st Gate Control List (GCL) to avoid hang.
This is due to the HW design, where changing to TSN transmit mode
requires SW initialization. Intel Discrete I225/6 transmit mode
cannot be changed when in dynamic mode according to Software User
Manual Section 7.5.2.1. Subsequent Gate Control List (GCL) operations
will proceed without a reset, as they already are in TSN Mode.
Step to reproduce:
DUT:
1) Configure GCL List with certain gate close.
BASE=$(date +%s%N)
tc qdisc replace dev $IFACE parent root handle 100 taprio \
num_tc 4 \
map 0 1 2 3 3 3 3 3 3 3 3 3 3 3 3 3 \
queues 1@0 1@1 1@2 1@3 \
base-time $BASE \
sched-entry S 0x8 500000 \
sched-entry S 0x4 500000 \
flags 0x2
2) Transmit the packet to closed gate. You may use udp_tai
application to transmit UDP packet to any of the closed gate.
./udp_tai -i <interface> -P 100000 -p 90 -c 1 -t <0/1> -u 30004
Fixes: ec50a9d437 ("igc: Add support for taprio offloading")
Co-developed-by: Tan Tee Min <tee.min.tan@linux.intel.com>
Signed-off-by: Tan Tee Min <tee.min.tan@linux.intel.com>
Tested-by: Chwee Lin Choong <chwee.lin.choong@intel.com>
Signed-off-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Remove unnecessary delay during the TX ring configuration.
This will cause delay, especially during link down and
link up activity.
Furthermore, old SKUs like as I225 will call the reset_adapter
to reset the controller during TSN mode Gate Control List (GCL)
setting. This will add more time to the configuration of the
real-time use case.
It doesn't mentioned about this delay in the Software User Manual.
It might have been ported from legacy code I210 in the past.
Fixes: 13b5b7fd6a ("igc: Add support for Tx/Rx rings")
Signed-off-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com>
Acked-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Add condition to increase the qbv counter during taprio qbv
configuration only.
There might be a case when TC already been setup then user configure
the ETF/CBS qdisc and this counter will increase if no condition above.
Fixes: ae4fe46983 ("igc: Add qbv_config_change_errors counter")
Signed-off-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Configuring tx_maxrate via sysfs interface
/sys/class/net/eth0/queues/tx-1/tx_maxrate was not working when
TCs are configured because always main VSI was being used. Fix by
using correct VSI in ice_set_tx_maxrate when TCs are configured.
Fixes: 1ddef455f4 ("ice: Add NDO callback to set the maximum per-queue bitrate")
Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>
Tested-by: Bharathi Sreenivas <bharathi.sreenivas@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Remove incorrect check in ice_validate_mqprio_opt() that limits
filter configuration when sum of max_rates of all TCs exceeds
the link speed. The max rate of each TC is unrelated to value
used by other TCs and is valid as long as it is less than link
speed.
Fixes: fbc7b27af0 ("ice: enable ndo_setup_tc support for mqprio_qdisc")
Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>
Tested-by: Bharathi Sreenivas <bharathi.sreenivas@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
MAC block on CN10K (RPM) supports hardware timestamp configuration. The
previous patch which added timestamp configuration support has a bug.
Though the netdev driver requests to disable timestamp configuration,
the driver is always enabling it.
This patch fixes the same.
Fixes: d148920868 ("octeontx2-af: cn10k: RPM hardware timestamp configuration")
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The mlxsw_sp_crif_alloc() function returns NULL on error. It doesn't
return error pointers. Fix the check.
Fixes: 78126cfd5d ("mlxsw: spectrum_router: Maintain CRIF for fallback loopback RIF")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
AF driver configures MAC features like internal loopback and PFC upon
receiving the request from PF and its VF netdev. But these
features are not getting reset in FLR. This patch fixes the issue by
resetting the same.
Fixes: 23999b30ae ("octeontx2-af: Enable or disable CGX internal loopback")
Fixes: 1121f6b02e ("octeontx2-af: Priority flow control configuration support")
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
with the addition of new MAC blocks like CN10K RPM and CN10KB
RPM_USX, LMACs are noncontiguous and CGX blocks are also
noncontiguous. But during RVU driver initialization, the driver
is assuming they are contiguous and trying to access
cgx or lmac with their id which is resulting in kernel panic.
This patch fixes the issue by adding proper checks.
[ 23.219150] pc : cgx_lmac_read+0x38/0x70
[ 23.219154] lr : rvu_program_channels+0x3f0/0x498
[ 23.223852] sp : ffff000100d6fc80
[ 23.227158] x29: ffff000100d6fc80 x28: ffff00010009f880 x27:
000000000000005a
[ 23.234288] x26: ffff000102586768 x25: 0000000000002500 x24:
fffffffffff0f000
Fixes: 91c6945ea1 ("octeontx2-af: cn10k: Add RPM MAC support")
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Firmware configures NIX block mapping for all MAC blocks.
The current implementation reads the configuration and
creates the mapping between RVU PF and NIX blocks. But
this configuration is only valid for silicons that support
multiple blocks. For all other silicons, all MAC blocks
map to NIX0.
This patch corrects the mapping by adding a check for the same.
Fixes: c5a73b632b ("octeontx2-af: Map NIX block from CGX connection")
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current design is that, for asynchronous events like link_up and
link_down firmware raises the interrupt to kernel. The previous patch
which added RPM_USX driver has a bug where it uses old csr addresses
for configuring interrupts. Which is resulting in losing interrupts
from source firmware.
This patch fixes the issue by correcting csr addresses.
Fixes: b9d0fedc62 ("octeontx2-af: cn10kb: Add RPM_USX MAC support")
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-----BEGIN PGP SIGNATURE-----
iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmSdtQYUHGJoZWxnYWFz
QGdvb2dsZS5jb20ACgkQWYigwDrT+vxp7A/+KmoBOm9ytiM2HPiq6pmHiJ9zF/DP
0jvKqDlc0BkmCyRw+/woxA5ZQgDnIYXxxt31toeSu+n31p6AR4wZ14Q5HBahABMw
O/NUXmLAhYaczcp8hK4lS4Opz65+MaDHomu5fNuD7j7CagIwu20MegSEoyo35YC1
nDRN0IVYRRy/58wW509deQi/3U04TuC3kflc/iBToa/34g77L9ESoxpsZuAzo7wI
nc9DF28H6PkuOtnp26iVufqkeYD3wfE5VAtaIZhyO+/WkhcTTUU5JB2hgFSACr0G
qJTofncvGQrRYTNS7aIYPVrtTZpSMmPS08tgZc+iDkTr8iKth1jcPf3YHLpenQzx
2B197BUOLENOiWJFPIAe2TAgoGYYBgKhnnwcPHCHoinvVLTO82wUUW5qfn/GckgY
WNYmS4PofkjlbJH3JdyHdH+vsL0VRzAmkhH+k6F+V02T78Lk+QdXKegLbr5yzRwh
YF/GjX0JYjnONQeL1LQQ/4hfiA1VzmoXbHvXc0XJew6d/RYMon5G5xBAAZ8tnUHC
PX6WMd/CG8RBxFv7IsPh6hTGKEXw4/1ElynPVP/ZEBGVelHda4Sm4G5nJ6/r2cwK
VICfsSb9Nw76STGHpVeQB2O2Y/yZ1xIwWRiAsCndsYOlnGi+knRVxH5UH/1aQPOE
N3DsyT0s/sZaJvc=
=xpTd
-----END PGP SIGNATURE-----
Merge tag 'pci-v6.5-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull pci updates from Bjorn Helgaas:
"Enumeration:
- Export pcie_retrain_link() for use outside ASPM
- Add Data Link Layer Link Active Reporting as another way for
pcie_retrain_link() to determine the link is up
- Work around link training failures (especially on the ASMedia
ASM2824 switch) by training first at 2.5GT/s and then attempting
higher rates
Resource management:
- When we coalesce host bridge windows, remove invalidated resources
from the resource tree so future allocations work correctly
Hotplug:
- Cancel bringup sequence if card is not present, to keep from
blinking Power Indicator indefinitely
- Reassign bridge resources if necessary for ACPI hotplug
Driver binding:
- Convert platform_device .remove() callbacks to return void instead
of a mostly useless int
Power management:
- Reduce wait time for secondary bus to be ready to speed up resume
- Avoid putting EloPOS E2/S2/H2 (as well as Elo i2) PCIe Ports in
D3cold
- Call _REG when transitioning D-states so AML that uses the PCI
config space OpRegion works, which fixes some ASMedia GPIO
controllers after resume
Virtualization:
- Delay extra 250ms after FLR of Solidigm P44 Pro NVMe to avoid KVM
hang when guest is rebooted
- Add function 1 DMA alias quirk for Marvell 88SE9235
Error handling:
- Unexport pci_save_aer_state() since it's only used in drivers/pci/
- Drop recommendation for drivers to configure AER Capability, since
the PCI core does this for all devices
ASPM:
- Disable ASPM on MFD function removal to avoid use-after-free
- Tighten up pci_enable_link_state() and pci_disable_link_state()
interfaces so they don't enable/disable states the driver didn't
specify
- Avoid link retraining race that can happen if ASPM sets link
control parameters while the link is in the midst of training for
some other reason
Endpoint framework:
- Change "PCI Endpoint Virtual NTB driver" Kconfig prompt to be
different from "PCI Endpoint NTB driver"
- Automatically create a function specific attributes group for
endpoint drivers to avoid reference counting issues
- Fix many EPC test issues
- Return pci_epf_type_add_cfs() error if EPF has no driver
- Add kernel-doc for pci_epc_raise_irq() and pci_epc_map_msi_irq()
MSI vector parameters
- Pass EPF device ID to driver probe functions
- Return -EALREADY if EPC has already been started/stopped
- Add linkdown notifier support and use it in qcom-ep
- Add Bus Master Enable event support and use it in qcom-ep
- Add Qualcomm Modem Host Interface (MHI) endpoint driver
- Add Layerscape PME interrupt handling to manage link-up
notification
Cadence PCIe controller driver:
- Wait for link retrain to complete when working around the J721E
i2085 erratum with Gen2 mode
Faraday FTPC100 PCI controller driver:
- Release clock resources on error paths
Freescale i.MX6 PCIe controller driver:
- Save and restore Root Port MSI control to work around hardware defect
Intel VMD host bridge driver:
- Reset VMD config register between soft reboots
- Capture pci_reset_bus() return value instead of printing junk when
it fails
Qualcomm PCIe controller driver:
- Add SDX65 endpoint compatible string to DT binding
- Disable register write access after init for IP v2.3.3, v2.9.0
- Use DWC helpers for enabling/disabling writes to DBI registers
- Hide slot hotplug capability for IP v1.0.0, v1.9.0, v2.1.0, v2.3.2,
v2.3.3, v2.7.0, v2.9.0
- Reuse v2.3.2 post-init sequence for v2.4.0
Renesas R-Car PCIe controller driver:
- Remove unused static pcie_base and pcie_dev
Rockchip PCIe controller driver:
- Remove writes to unused registers
- Write endpoint Device ID using correct register
- Assert PCI Configuration Enable bit after probe so endpoint
responds instead of generating Request Retry Status messages
- Poll waiting for PHY PLLs to lock
- Update RK3399 example DT binding to be valid
- Use RK3399 PCIE_CLIENT_LEGACY_INT_CTRL to generate INTx instead of
manually generating PCIe message
- Use multiple windows to avoid address translation conflicts
- Use u32 (not u16) when accessing 32-bit registers
- Hide MSI-X Capability, since RK3399 can't generate MSI-X
- Set endpoint controller required alignment to 256
Synopsys DesignWare PCIe controller driver:
- Wait for link to come up only if we've initiated link training
Miscellaneous:
- Add pci_clear_master() stub for non-CONFIG_PCI"
* tag 'pci-v6.5-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (116 commits)
Documentation: PCI: correct spelling
PCI: vmd: Fix uninitialized variable usage in vmd_enable_domain()
PCI: xgene-msi: Convert to platform remove callback returning void
PCI: tegra: Convert to platform remove callback returning void
PCI: rockchip-host: Convert to platform remove callback returning void
PCI: mvebu: Convert to platform remove callback returning void
PCI: mt7621: Convert to platform remove callback returning void
PCI: mediatek-gen3: Convert to platform remove callback returning void
PCI: mediatek: Convert to platform remove callback returning void
PCI: iproc: Convert to platform remove callback returning void
PCI: hisi-error: Convert to platform remove callback returning void
PCI: dwc: Convert to platform remove callback returning void
PCI: j721e: Convert to platform remove callback returning void
PCI: brcmstb: Convert to platform remove callback returning void
PCI: altera-msi: Convert to platform remove callback returning void
PCI: altera: Convert to platform remove callback returning void
PCI: aardvark: Convert to platform remove callback returning void
PCI: rcar: Use correct product family name for Renesas R-Car
PCI: layerscape: Add the endpoint linkup notifier support
PCI: endpoint: pci-epf-vntb: Fix typo in comments
...
This cycle saw a focus on rxe and bnxt_re drivers:
- Code cleanups for irdma, rxe, rtrs, hns, vmw_pvrdma
- rxe uses workqueues instead of tasklets
- rxe has better compliance around access checks for MRs and rereg_mr
- mana supportst he 'v2' FW interface for RX coalescing
- hfi1 bug fix for stale cache entries in its MR cache
- mlx5 buf fix to handle FW failures when destroying QPs
- erdma HW has a new doorbell allocation mechanism for uverbs that is
secure
- Lots of small cleanups and rework in bnxt_re
* Use the common mmap functions
* Support disassociation
* Improve FW command flow
- bnxt_re support for "low latency push", this allows a packet
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCZJxq+wAKCRCFwuHvBreF
YSN+AQDKAmWdKCqmc3boMIq5wt4h7yYzdW47LpzGarOn5Hf+UgEA6mpPJyRqB43C
CNXYIbASl/LLaWzFvxCq/AYp6tzuog4=
=I8ju
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma updates from Jason Gunthorpe:
"This cycle saw a focus on rxe and bnxt_re drivers:
- Code cleanups for irdma, rxe, rtrs, hns, vmw_pvrdma
- rxe uses workqueues instead of tasklets
- rxe has better compliance around access checks for MRs and rereg_mr
- mana supportst he 'v2' FW interface for RX coalescing
- hfi1 bug fix for stale cache entries in its MR cache
- mlx5 buf fix to handle FW failures when destroying QPs
- erdma HW has a new doorbell allocation mechanism for uverbs that is
secure
- Lots of small cleanups and rework in bnxt_re:
- Use the common mmap functions
- Support disassociation
- Improve FW command flow
- support for 'low latency push'"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (71 commits)
RDMA/bnxt_re: Fix an IS_ERR() vs NULL check
RDMA/bnxt_re: Fix spelling mistake "priviledged" -> "privileged"
RDMA/bnxt_re: Remove duplicated include in bnxt_re/main.c
RDMA/bnxt_re: Refactor code around bnxt_qplib_map_rc()
RDMA/bnxt_re: Remove incorrect return check from slow path
RDMA/bnxt_re: Enable low latency push
RDMA/bnxt_re: Reorg the bar mapping
RDMA/bnxt_re: Move the interface version to chip context structure
RDMA/bnxt_re: Query function capabilities from firmware
RDMA/bnxt_re: Optimize the bnxt_re_init_hwrm_hdr usage
RDMA/bnxt_re: Add disassociate ucontext support
RDMA/bnxt_re: Use the common mmap helper functions
RDMA/bnxt_re: Initialize opcode while sending message
RDMA/cma: Remove NULL check before dev_{put, hold}
RDMA/rxe: Simplify cq->notify code
RDMA/rxe: Fixes mr access supported list
RDMA/bnxt_re: optimize the parameters passed to helper functions
RDMA/bnxt_re: remove redundant cmdq_bitmap
RDMA/bnxt_re: use firmware provided max request timeout
RDMA/bnxt_re: cancel all control path command waiters upon error
...
The line cards array is not freed in the error path of
mlxsw_m_linecards_init(), which can lead to a memory leak. Fix by
freeing the array in the error path, thereby making the error path
identical to mlxsw_m_linecards_fini().
Fixes: 01328e23a4 ("mlxsw: minimal: Extend module to port mapping with slot index")
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/20230630012647.1078002-1-shaozhengchao@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Nothing surprising in the SoC specific drivers, with the usual updates:
* Added or improved SoC driver support for Tegra234, Exynos4121, RK3588,
as well as multiple Mediatek and Qualcomm chips
* SCMI firmware gains support for multiple SMC/HVC transport and version
3.2 of the protocol
* Cleanups amd minor changes for the reset controller, memory controller,
firmware and sram drivers
* Minor changes to amd/xilinx, samsung, tegra, nxp, ti, qualcomm,
amlogic and renesas SoC specific drivers
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEiK/NIGsWEZVxh/FrYKtH/8kJUicFAmSdmbIACgkQYKtH/8kJ
UicewQ/6Aq8j5pBFYBimZoyQ0bi9z+prGrHoDDYLew2vKjtOXJl5z7ZnM3J1oyPt
Zvis3IaGkHJCuuqotPdsquZrzHq8slzXzwkHPfHORJBC4gV0V/vMS8w32tO5FfTq
ULrMyWnbsU7Udeywc2xuEpAoC9+bXX9brnCpa3H41peIGZKM+0g7EE6FASt3YaOk
O+ZMSGqF8QbCqSQrUH3GudFlFMy/VxIvwuUsbLt8aNkRACunQZXVgUdArvLV49nX
SElFN7hOVRoVDv0rgYMxlwElymrta/kMyjLba8GU1GIhzyDGozVqIJQAnsQ3f6CC
yyzaJm27zzJH0mx9jx4W+JLBdjqDL4ctE2WyllRVIpTGYMHiMQtutHNwtNupIuD5
j9j/fIVQWZqOdWXnA6V/CHYN1MZBRTH3KQcnLlYPC01dWKThPDnrHGfwOkfsrwtN
zuERJJ+gd5b8KW4dmy1ueDOSB8162LxbS7iHxpOBGySmqVOYj3XUqACZhKRfXfIQ
BVj9punCE/gO2fMb9IZByjeOzgtV+PBRmPxoglyaGkT4fVfL06kEbpKFYbXXq9b/
aAS/U84gGr8ebWsOXszwDnBzTZRzjMVv/T9KDTTJuWbBEPNyCR7fUG0cZ50rSKnJ
2cTPe3a0sS6LaBt71qfExCIfxG+cJ2c3N1U5/jb2C49Aob45obs=
=zvLr
-----END PGP SIGNATURE-----
Merge tag 'soc-drivers-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM SoC driver updates from Arnd Bergmann:
"Nothing surprising in the SoC specific drivers, with the usual
updates:
- Added or improved SoC driver support for Tegra234, Exynos4121,
RK3588, as well as multiple Mediatek and Qualcomm chips
- SCMI firmware gains support for multiple SMC/HVC transport and
version 3.2 of the protocol
- Cleanups amd minor changes for the reset controller, memory
controller, firmware and sram drivers
- Minor changes to amd/xilinx, samsung, tegra, nxp, ti, qualcomm,
amlogic and renesas SoC specific drivers"
* tag 'soc-drivers-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (118 commits)
dt-bindings: interrupt-controller: Convert Amlogic Meson GPIO interrupt controller binding
MAINTAINERS: add PHY-related files to Amlogic SoC file list
drivers: meson: secure-pwrc: always enable DMA domain
tee: optee: Use kmemdup() to replace kmalloc + memcpy
soc: qcom: geni-se: Do not bother about enable/disable of interrupts in secondary sequencer
dt-bindings: sram: qcom,imem: document qdu1000
soc: qcom: icc-bwmon: Fix MSM8998 count unit
dt-bindings: soc: qcom,rpmh-rsc: Require power-domains
soc: qcom: socinfo: Add Soc ID for IPQ5300
dt-bindings: arm: qcom,ids: add SoC ID for IPQ5300
soc: qcom: Fix a IS_ERR() vs NULL bug in probe
soc: qcom: socinfo: Add support for new fields in revision 19
soc: qcom: socinfo: Add support for new fields in revision 18
dt-bindings: firmware: scm: Add compatible for SDX75
soc: qcom: mdt_loader: Fix split image detection
dt-bindings: memory-controllers: drop unneeded quotes
soc: rockchip: dtpm: use C99 array init syntax
firmware: tegra: bpmp: Add support for DRAM MRQ GSCs
soc/tegra: pmc: Use devm_clk_notifier_register()
soc/tegra: pmc: Simplify debugfs initialization
...
All ibmvnic resets, make a call to netdev_tx_reset_queue() when
re-opening the device. netdev_tx_reset_queue() resets the num_queued
and num_completed byte counters. These stats are used in Byte Queue
Limit (BQL) algorithms. The difference between these two stats tracks
the number of bytes currently sitting on the physical NIC. ibmvnic
increases the number of queued bytes though calls to
netdev_tx_sent_queue() in the drivers xmit function. When, VIOS reports
that it is done transmitting bytes, the ibmvnic device increases the
number of completed bytes through calls to netdev_tx_completed_queue().
It is important to note that the driver batches its transmit calls and
num_queued is increased every time that an skb is added to the next
batch, not necessarily when the batch is sent to VIOS for transmission.
Unlike other reset types, a NON FATAL reset will not flush the sub crq
tx buffers. Therefore, it is possible for the batched skb array to be
partially full. So if there is call to netdev_tx_reset_queue() when
re-opening the device, the value of num_queued (0) would not account
for the skb's that are currently batched. Eventually, when the batch
is sent to VIOS, the call to netdev_tx_completed_queue() would increase
num_completed to a value greater than the num_queued. This causes a
BUG_ON crash:
ibmvnic 30000002: Firmware reports error, cause: adapter problem.
Starting recovery...
ibmvnic 30000002: tx error 600
ibmvnic 30000002: tx error 600
ibmvnic 30000002: tx error 600
ibmvnic 30000002: tx error 600
------------[ cut here ]------------
kernel BUG at lib/dynamic_queue_limits.c:27!
Oops: Exception in kernel mode, sig: 5
[....]
NIP dql_completed+0x28/0x1c0
LR ibmvnic_complete_tx.isra.0+0x23c/0x420 [ibmvnic]
Call Trace:
ibmvnic_complete_tx.isra.0+0x3f8/0x420 [ibmvnic] (unreliable)
ibmvnic_interrupt_tx+0x40/0x70 [ibmvnic]
__handle_irq_event_percpu+0x98/0x270
---[ end trace ]---
Therefore, do not reset the dql stats when performing a NON_FATAL reset.
Fixes: 0d97338818 ("ibmvnic: Introduce xmit_more support using batched subCRQ hcalls")
Signed-off-by: Nick Child <nnac123@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
On systems without MAE permission efx->mae is not initialised,
and trying to lookup an mport results in a NULL pointer
dereference.
Fixes: 25414b2a64 ("sfc: add devlink port support for ef100")
Signed-off-by: Martin Habets <habetsm.xilinx@gmail.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fixes a bug where on the M1 mac mini initramfs-tools fails to
include the necessary firmware into the initrd.
Fixes: c4dab50697 ("tg3: Download 57766 EEE service patch firmware")
Signed-off-by: Tobias Heider <me@tobhe.de>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/ZJt7LKzjdz8+dClx@tobhe.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In a future change, the driver will need to determine whether PTP RX
timestamping is enabled on a port (including whether traps were set up
on that port in particular) and that is currently not possible.
The driver supports different RX filters (L2, L4) and kinds of TX
timestamping (one-step, two-step) on its ports, but it saves all
configuration in a single struct hwtstamp_config that is global to the
switch. So, the latest timestamping configuration on one port
(including a request to disable timestamping) affects what gets reported
for all ports, even though the configuration itself is still individual
to each port.
The port timestamping configurations are only coupled because of the
common structure, so replace the hwtstamp_config with a mask of trapped
protocols saved per port. We also have the ptp_cmd to distinguish
between one-step and two-step PTP timestamping, so with those 2 bits of
information we can fully reconstruct a descriptive struct
hwtstamp_config for each port, during the SIOCGHWTSTAMP ioctl.
Fixes: 4e3b0468e6 ("net: mscc: PTP Hardware Clock (PHC) support")
Fixes: 96ca08c058 ("net: mscc: ocelot: set up traps for PTP packets")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
PTP RX timestamping should be enabled when the user requests it, not by
default. If it is enabled by default, it can be problematic when the
ocelot driver is a DSA master, and it sidesteps what DSA tries to avoid
through __dsa_master_hwtstamp_validate().
Additionally, after the change which made ocelot trap PTP packets only
to the CPU at ocelot_hwtstamp_set() time, it is no longer even true that
RX timestamping is enabled by default, because until ocelot_hwtstamp_set()
is called, the PTP traps are actually not set up. So the rx_filter field
of ocelot->hwtstamp_config reflects an incorrect reality.
Fixes: 96ca08c058 ("net: mscc: ocelot: set up traps for PTP packets")
Fixes: 4e3b0468e6 ("net: mscc: PTP Hardware Clock (PHC) support")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
dev_set_rx_mode() grabs a spin_lock, and the lan743x implementation
proceeds subsequently to go to sleep using readx_poll_timeout().
Introduce a helper wrapping the readx_poll_timeout_atomic() function
and use it to replace the calls to readx_polL_timeout().
Fixes: 23f0703c12 ("lan743x: Add main source files for new lan743x driver")
Cc: stable@vger.kernel.org
Cc: Bryan Whitehead <bryan.whitehead@microchip.com>
Cc: UNGLinuxDriver@microchip.com
Signed-off-by: Moritz Fischer <moritzf@google.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20230627035000.1295254-1-moritzf@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Core
----
- Rework the sendpage & splice implementations. Instead of feeding
data into sockets page by page extend sendmsg handlers to support
taking a reference on the data, controlled by a new flag called
MSG_SPLICE_PAGES. Rework the handling of unexpected-end-of-file
to invoke an additional callback instead of trying to predict what
the right combination of MORE/NOTLAST flags is.
Remove the MSG_SENDPAGE_NOTLAST flag completely.
- Implement SCM_PIDFD, a new type of CMSG type analogous to
SCM_CREDENTIALS, but it contains pidfd instead of plain pid.
- Enable socket busy polling with CONFIG_RT.
- Improve reliability and efficiency of reporting for ref_tracker.
- Auto-generate a user space C library for various Netlink families.
Protocols
---------
- Allow TCP to shrink the advertised window when necessary, prevent
sk_rcvbuf auto-tuning from growing the window all the way up to
tcp_rmem[2].
- Use per-VMA locking for "page-flipping" TCP receive zerocopy.
- Prepare TCP for device-to-device data transfers, by making sure
that payloads are always attached to skbs as page frags.
- Make the backoff time for the first N TCP SYN retransmissions
linear. Exponential backoff is unnecessarily conservative.
- Create a new MPTCP getsockopt to retrieve all info (MPTCP_FULL_INFO).
- Avoid waking up applications using TLS sockets until we have
a full record.
- Allow using kernel memory for protocol ioctl callbacks, paving
the way to issuing ioctls over io_uring.
- Add nolocalbypass option to VxLAN, forcing packets to be fully
encapsulated even if they are destined for a local IP address.
- Make TCPv4 use consistent hash in TIME_WAIT and SYN_RECV. Ensure
in-kernel ECMP implementation (e.g. Open vSwitch) select the same
link for all packets. Support L4 symmetric hashing in Open vSwitch.
- PPPoE: make number of hash bits configurable.
- Allow DNS to be overwritten by DHCPACK in the in-kernel DHCP client
(ipconfig).
- Add layer 2 miss indication and filtering, allowing higher layers
(e.g. ACL filters) to make forwarding decisions based on whether
packet matched forwarding state in lower devices (bridge).
- Support matching on Connectivity Fault Management (CFM) packets.
- Hide the "link becomes ready" IPv6 messages by demoting their
printk level to debug.
- HSR: don't enable promiscuous mode if device offloads the proto.
- Support active scanning in IEEE 802.15.4.
- Continue work on Multi-Link Operation for WiFi 7.
BPF
---
- Add precision propagation for subprogs and callbacks. This allows
maintaining verification efficiency when subprograms are used,
or in fact passing the verifier at all for complex programs,
especially those using open-coded iterators.
- Improve BPF's {g,s}setsockopt() length handling. Previously BPF
assumed the length is always equal to the amount of written data.
But some protos allow passing a NULL buffer to discover what
the output buffer *should* be, without writing anything.
- Accept dynptr memory as memory arguments passed to helpers.
- Add routing table ID to bpf_fib_lookup BPF helper.
- Support O_PATH FDs in BPF_OBJ_PIN and BPF_OBJ_GET commands.
- Drop bpf_capable() check in BPF_MAP_FREEZE command (used to mark
maps as read-only).
- Show target_{obj,btf}_id in tracing link fdinfo.
- Addition of several new kfuncs (most of the names are self-explanatory):
- Add a set of new dynptr kfuncs: bpf_dynptr_adjust(),
bpf_dynptr_is_null(), bpf_dynptr_is_rdonly(), bpf_dynptr_size()
and bpf_dynptr_clone().
- bpf_task_under_cgroup()
- bpf_sock_destroy() - force closing sockets
- bpf_cpumask_first_and(), rework bpf_cpumask_any*() kfuncs
Netfilter
---------
- Relax set/map validation checks in nf_tables. Allow checking
presence of an entry in a map without using the value.
- Increase ip_vs_conn_tab_bits range for 64BIT builds.
- Allow updating size of a set.
- Improve NAT tuple selection when connection is closing.
Driver API
----------
- Integrate netdev with LED subsystem, to allow configuring HW
"offloaded" blinking of LEDs based on link state and activity
(i.e. packets coming in and out).
- Support configuring rate selection pins of SFP modules.
- Factor Clause 73 auto-negotiation code out of the drivers, provide
common helper routines.
- Add more fool-proof helpers for managing lifetime of MDIO devices
associated with the PCS layer.
- Allow drivers to report advanced statistics related to Time Aware
scheduler offload (taprio).
- Allow opting out of VF statistics in link dump, to allow more VFs
to fit into the message.
- Split devlink instance and devlink port operations.
New hardware / drivers
----------------------
- Ethernet:
- Synopsys EMAC4 IP support (stmmac)
- Marvell 88E6361 8 port (5x1GE + 3x2.5GE) switches
- Marvell 88E6250 7 port switches
- Microchip LAN8650/1 Rev.B0 PHYs
- MediaTek MT7981/MT7988 built-in 1GE PHY driver
- WiFi:
- Realtek RTL8192FU, 2.4 GHz, b/g/n mode, 2T2R, 300 Mbps
- Realtek RTL8723DS (SDIO variant)
- Realtek RTL8851BE
- CAN:
- Fintek F81604
Drivers
-------
- Ethernet NICs:
- Intel (100G, ice):
- support dynamic interrupt allocation
- use meta data match instead of VF MAC addr on slow-path
- nVidia/Mellanox:
- extend link aggregation to handle 4, rather than just 2 ports
- spawn sub-functions without any features by default
- OcteonTX2:
- support HTB (Tx scheduling/QoS) offload
- make RSS hash generation configurable
- support selecting Rx queue using TC filters
- Wangxun (ngbe/txgbe):
- add basic Tx/Rx packet offloads
- add phylink support (SFP/PCS control)
- Freescale/NXP (enetc):
- report TAPRIO packet statistics
- Solarflare/AMD:
- support matching on IP ToS and UDP source port of outer header
- VxLAN and GENEVE tunnel encapsulation over IPv4 or IPv6
- add devlink dev info support for EF10
- Virtual NICs:
- Microsoft vNIC:
- size the Rx indirection table based on requested configuration
- support VLAN tagging
- Amazon vNIC:
- try to reuse Rx buffers if not fully consumed, useful for ARM
servers running with 16kB pages
- Google vNIC:
- support TCP segmentation of >64kB frames
- Ethernet embedded switches:
- Marvell (mv88e6xxx):
- enable USXGMII (88E6191X)
- Microchip:
- lan966x: add support for Egress Stage 0 ACL engine
- lan966x: support mapping packet priority to internal switch
priority (based on PCP or DSCP)
- Ethernet PHYs:
- Broadcom PHYs:
- support for Wake-on-LAN for BCM54210E/B50212E
- report LPI counter
- Microsemi PHYs: support RGMII delay configuration (VSC85xx)
- Micrel PHYs: receive timestamp in the frame (LAN8841)
- Realtek PHYs: support optional external PHY clock
- Altera TSE PCS: merge the driver into Lynx PCS which it is
a variant of
- CAN: Kvaser PCIEcan:
- support packet timestamping
- WiFi:
- Intel (iwlwifi):
- major update for new firmware and Multi-Link Operation (MLO)
- configuration rework to drop test devices and split
the different families
- support for segmented PNVM images and power tables
- new vendor entries for PPAG (platform antenna gain) feature
- Qualcomm 802.11ax (ath11k):
- Multiple Basic Service Set Identifier (MBSSID) and
Enhanced MBSSID Advertisement (EMA) support in AP mode
- support factory test mode
- RealTek (rtw89):
- add RSSI based antenna diversity
- support U-NII-4 channels on 5 GHz band
- RealTek (rtl8xxxu):
- AP mode support for 8188f
- support USB RX aggregation for the newer chips
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmSbJM4ACgkQMUZtbf5S
IrtoDhAAhEim1+LBIKf4lhPcVdZ2p/TkpnwTz5jsTwSeRBAxTwuNJ2fQhFXg13E3
MnRq6QaEp8G4/tA/gynLvQop+FEZEnv+horP0zf/XLcC8euU7UrKdrpt/4xxdP07
IL/fFWsoUGNO+L9LNaHwBo8g7nHvOkPscHEBHc2Xrvzab56TJk6vPySfLqcpKlNZ
CHWDwTpgRqNZzSKiSpoMVd9OVMKUXcPYHpDmfEJ5l+e8vTXmZzOLHrSELHU5nP5f
mHV7gxkDCTshoGcaed7UTiOvgu1p6E5EchDJxiLaSUbgsd8SZ3u4oXwRxgj33RK/
fB2+UaLrRt/DdlHvT/Ph8e8Ygu77yIXMjT49jsfur/zVA0HEA2dFb7V6QlsYRmQp
J25pnrdXmE15llgqsC0/UOW5J1laTjII+T2T70UOAqQl4LWYAQDG4WwsAqTzU0KY
dueydDouTp9XC2WYrRUEQxJUzxaOaazskDUHc5c8oHp/zVBT+djdgtvVR9+gi6+7
yy4elI77FlEEqL0ItdU/lSWINayAlPLsIHkMyhSGKX0XDpKjeycPqkNx4UterXB/
JKIR5RBWllRft+igIngIkKX0tJGMU0whngiw7d1WLw25wgu4sB53hiWWoSba14hv
tXMxwZs5iGaPcT38oRVMZz8I1kJM4Dz3SyI7twVvi4RUut64EG4=
=9i4I
-----END PGP SIGNATURE-----
Merge tag 'net-next-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking changes from Jakub Kicinski:
"WiFi 7 and sendpage changes are the biggest pieces of work for this
release. The latter will definitely require fixes but I think that we
got it to a reasonable point.
Core:
- Rework the sendpage & splice implementations
Instead of feeding data into sockets page by page extend sendmsg
handlers to support taking a reference on the data, controlled by a
new flag called MSG_SPLICE_PAGES
Rework the handling of unexpected-end-of-file to invoke an
additional callback instead of trying to predict what the right
combination of MORE/NOTLAST flags is
Remove the MSG_SENDPAGE_NOTLAST flag completely
- Implement SCM_PIDFD, a new type of CMSG type analogous to
SCM_CREDENTIALS, but it contains pidfd instead of plain pid
- Enable socket busy polling with CONFIG_RT
- Improve reliability and efficiency of reporting for ref_tracker
- Auto-generate a user space C library for various Netlink families
Protocols:
- Allow TCP to shrink the advertised window when necessary, prevent
sk_rcvbuf auto-tuning from growing the window all the way up to
tcp_rmem[2]
- Use per-VMA locking for "page-flipping" TCP receive zerocopy
- Prepare TCP for device-to-device data transfers, by making sure
that payloads are always attached to skbs as page frags
- Make the backoff time for the first N TCP SYN retransmissions
linear. Exponential backoff is unnecessarily conservative
- Create a new MPTCP getsockopt to retrieve all info
(MPTCP_FULL_INFO)
- Avoid waking up applications using TLS sockets until we have a full
record
- Allow using kernel memory for protocol ioctl callbacks, paving the
way to issuing ioctls over io_uring
- Add nolocalbypass option to VxLAN, forcing packets to be fully
encapsulated even if they are destined for a local IP address
- Make TCPv4 use consistent hash in TIME_WAIT and SYN_RECV. Ensure
in-kernel ECMP implementation (e.g. Open vSwitch) select the same
link for all packets. Support L4 symmetric hashing in Open vSwitch
- PPPoE: make number of hash bits configurable
- Allow DNS to be overwritten by DHCPACK in the in-kernel DHCP client
(ipconfig)
- Add layer 2 miss indication and filtering, allowing higher layers
(e.g. ACL filters) to make forwarding decisions based on whether
packet matched forwarding state in lower devices (bridge)
- Support matching on Connectivity Fault Management (CFM) packets
- Hide the "link becomes ready" IPv6 messages by demoting their
printk level to debug
- HSR: don't enable promiscuous mode if device offloads the proto
- Support active scanning in IEEE 802.15.4
- Continue work on Multi-Link Operation for WiFi 7
BPF:
- Add precision propagation for subprogs and callbacks. This allows
maintaining verification efficiency when subprograms are used, or
in fact passing the verifier at all for complex programs,
especially those using open-coded iterators
- Improve BPF's {g,s}setsockopt() length handling. Previously BPF
assumed the length is always equal to the amount of written data.
But some protos allow passing a NULL buffer to discover what the
output buffer *should* be, without writing anything
- Accept dynptr memory as memory arguments passed to helpers
- Add routing table ID to bpf_fib_lookup BPF helper
- Support O_PATH FDs in BPF_OBJ_PIN and BPF_OBJ_GET commands
- Drop bpf_capable() check in BPF_MAP_FREEZE command (used to mark
maps as read-only)
- Show target_{obj,btf}_id in tracing link fdinfo
- Addition of several new kfuncs (most of the names are
self-explanatory):
- Add a set of new dynptr kfuncs: bpf_dynptr_adjust(),
bpf_dynptr_is_null(), bpf_dynptr_is_rdonly(), bpf_dynptr_size()
and bpf_dynptr_clone().
- bpf_task_under_cgroup()
- bpf_sock_destroy() - force closing sockets
- bpf_cpumask_first_and(), rework bpf_cpumask_any*() kfuncs
Netfilter:
- Relax set/map validation checks in nf_tables. Allow checking
presence of an entry in a map without using the value
- Increase ip_vs_conn_tab_bits range for 64BIT builds
- Allow updating size of a set
- Improve NAT tuple selection when connection is closing
Driver API:
- Integrate netdev with LED subsystem, to allow configuring HW
"offloaded" blinking of LEDs based on link state and activity
(i.e. packets coming in and out)
- Support configuring rate selection pins of SFP modules
- Factor Clause 73 auto-negotiation code out of the drivers, provide
common helper routines
- Add more fool-proof helpers for managing lifetime of MDIO devices
associated with the PCS layer
- Allow drivers to report advanced statistics related to Time Aware
scheduler offload (taprio)
- Allow opting out of VF statistics in link dump, to allow more VFs
to fit into the message
- Split devlink instance and devlink port operations
New hardware / drivers:
- Ethernet:
- Synopsys EMAC4 IP support (stmmac)
- Marvell 88E6361 8 port (5x1GE + 3x2.5GE) switches
- Marvell 88E6250 7 port switches
- Microchip LAN8650/1 Rev.B0 PHYs
- MediaTek MT7981/MT7988 built-in 1GE PHY driver
- WiFi:
- Realtek RTL8192FU, 2.4 GHz, b/g/n mode, 2T2R, 300 Mbps
- Realtek RTL8723DS (SDIO variant)
- Realtek RTL8851BE
- CAN:
- Fintek F81604
Drivers:
- Ethernet NICs:
- Intel (100G, ice):
- support dynamic interrupt allocation
- use meta data match instead of VF MAC addr on slow-path
- nVidia/Mellanox:
- extend link aggregation to handle 4, rather than just 2 ports
- spawn sub-functions without any features by default
- OcteonTX2:
- support HTB (Tx scheduling/QoS) offload
- make RSS hash generation configurable
- support selecting Rx queue using TC filters
- Wangxun (ngbe/txgbe):
- add basic Tx/Rx packet offloads
- add phylink support (SFP/PCS control)
- Freescale/NXP (enetc):
- report TAPRIO packet statistics
- Solarflare/AMD:
- support matching on IP ToS and UDP source port of outer
header
- VxLAN and GENEVE tunnel encapsulation over IPv4 or IPv6
- add devlink dev info support for EF10
- Virtual NICs:
- Microsoft vNIC:
- size the Rx indirection table based on requested
configuration
- support VLAN tagging
- Amazon vNIC:
- try to reuse Rx buffers if not fully consumed, useful for ARM
servers running with 16kB pages
- Google vNIC:
- support TCP segmentation of >64kB frames
- Ethernet embedded switches:
- Marvell (mv88e6xxx):
- enable USXGMII (88E6191X)
- Microchip:
- lan966x: add support for Egress Stage 0 ACL engine
- lan966x: support mapping packet priority to internal switch
priority (based on PCP or DSCP)
- Ethernet PHYs:
- Broadcom PHYs:
- support for Wake-on-LAN for BCM54210E/B50212E
- report LPI counter
- Microsemi PHYs: support RGMII delay configuration (VSC85xx)
- Micrel PHYs: receive timestamp in the frame (LAN8841)
- Realtek PHYs: support optional external PHY clock
- Altera TSE PCS: merge the driver into Lynx PCS which it is a
variant of
- CAN: Kvaser PCIEcan:
- support packet timestamping
- WiFi:
- Intel (iwlwifi):
- major update for new firmware and Multi-Link Operation (MLO)
- configuration rework to drop test devices and split the
different families
- support for segmented PNVM images and power tables
- new vendor entries for PPAG (platform antenna gain) feature
- Qualcomm 802.11ax (ath11k):
- Multiple Basic Service Set Identifier (MBSSID) and Enhanced
MBSSID Advertisement (EMA) support in AP mode
- support factory test mode
- RealTek (rtw89):
- add RSSI based antenna diversity
- support U-NII-4 channels on 5 GHz band
- RealTek (rtl8xxxu):
- AP mode support for 8188f
- support USB RX aggregation for the newer chips"
* tag 'net-next-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1602 commits)
net: scm: introduce and use scm_recv_unix helper
af_unix: Skip SCM_PIDFD if scm->pid is NULL.
net: lan743x: Simplify comparison
netlink: Add __sock_i_ino() for __netlink_diag_dump().
net: dsa: avoid suspicious RCU usage for synced VLAN-aware MAC addresses
Revert "af_unix: Call scm_recv() only after scm_set_cred()."
phylink: ReST-ify the phylink_pcs_neg_mode() kdoc
libceph: Partially revert changes to support MSG_SPLICE_PAGES
net: phy: mscc: fix packet loss due to RGMII delays
net: mana: use vmalloc_array and vcalloc
net: enetc: use vmalloc_array and vcalloc
ionic: use vmalloc_array and vcalloc
pds_core: use vmalloc_array and vcalloc
gve: use vmalloc_array and vcalloc
octeon_ep: use vmalloc_array and vcalloc
net: usb: qmi_wwan: add u-blox 0x1312 composition
perf trace: fix MSG_SPLICE_PAGES build error
ipvlan: Fix return value of ipvlan_queue_xmit()
netfilter: nf_tables: fix underflow in chain reference counter
netfilter: nf_tables: unbind non-anonymous set if rule construction fails
...
- Fix KMSAN vs FORTIFY in strlcpy/strlcat (Alexander Potapenko)
- Convert strreplace() to return string start (Andy Shevchenko)
- Flexible array conversions (Arnd Bergmann, Wyes Karny, Kees Cook)
- Add missing function prototypes seen with W=1 (Arnd Bergmann)
- Fix strscpy() kerndoc typo (Arne Welzel)
- Replace strlcpy() with strscpy() across many subsystems which were
either Acked by respective maintainers or were trivial changes that
went ignored for multiple weeks (Azeem Shaikh)
- Remove unneeded cc-option test for UBSAN_TRAP (Nick Desaulniers)
- Add KUnit tests for strcat()-family
- Enable KUnit tests of FORTIFY wrappers under UML
- Add more complete FORTIFY protections for strlcat()
- Add missed disabling of FORTIFY for all arch purgatories.
- Enable -fstrict-flex-arrays=3 globally
- Tightening UBSAN_BOUNDS when using GCC
- Improve checkpatch to check for strcpy, strncpy, and fake flex arrays
- Improve use of const variables in FORTIFY
- Add requested struct_size_t() helper for types not pointers
- Add __counted_by macro for annotating flexible array size members
-----BEGIN PGP SIGNATURE-----
iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAmSbftQWHGtlZXNjb29r
QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJj0MD/9X9jzJzCmsAU+yNldeoAzC84Sk
GVU3RBxGcTNysL1gZXynkIgigw7DWc4htMGeSABHHwQRVP65JCH1Kw/VqIkyumbx
9LdX6IklMJb4pRT4PVU3azebV4eNmSjlur2UxMeW54Czm91/6I8RHbJOyAPnOUmo
2oomGdP/hpEHtKR7hgy8Axc6w5ySwQixh2V5sVZG3VbvCS5WKTmTXbs6puuRT5hz
iHt7v+7VtEg/Qf1W7J2oxfoghvVBsaRrSLrExWT/oZYh1ZxM7DsCAAoG/IsDgHGA
9LBXiRECgAFThbHVxLvvKZQMXdVk0i8iXLX43XMKC0wTA+NTyH7wlcQQ4RWNMuo8
sfA9Qm9gMArXaf64aymr3Uwn20Zan0391HdlbhOJZAE6v3PPJbleUnM58AzD2d3r
5Lz6AIFBxDImy+3f9iDWgacCT5/PkeiXTHzk9QnKhJyKKtRA58XJxj4q2+rPnGJP
n4haXqoxD5FJbxdXiGKk31RS0U5HBug7wkOcUrTqDHUbc/QNU2b7dxTKUx+zYtCU
uV5emPzpF4H4z+91WpO47n9gkMAfwV0lt9S2dwS8pxsgqctbmIan+Jgip7rsqZ2G
OgLXBsb43eEs+6WgO8tVt/ZHYj9ivGMdrcNcsIfikzNs/xweUJ53k2xSEn2xEa5J
cwANDmkL6QQK7yfeeg==
=s0j1
-----END PGP SIGNATURE-----
Merge tag 'hardening-v6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull hardening updates from Kees Cook:
"There are three areas of note:
A bunch of strlcpy()->strscpy() conversions ended up living in my tree
since they were either Acked by maintainers for me to carry, or got
ignored for multiple weeks (and were trivial changes).
The compiler option '-fstrict-flex-arrays=3' has been enabled
globally, and has been in -next for the entire devel cycle. This
changes compiler diagnostics (though mainly just -Warray-bounds which
is disabled) and potential UBSAN_BOUNDS and FORTIFY _warning_
coverage. In other words, there are no new restrictions, just
potentially new warnings. Any new FORTIFY warnings we've seen have
been fixed (usually in their respective subsystem trees). For more
details, see commit df8fc4e934.
The under-development compiler attribute __counted_by has been added
so that we can start annotating flexible array members with their
associated structure member that tracks the count of flexible array
elements at run-time. It is possible (likely?) that the exact syntax
of the attribute will change before it is finalized, but GCC and Clang
are working together to sort it out. Any changes can be made to the
macro while we continue to add annotations.
As an example of that last case, I have a treewide commit waiting with
such annotations found via Coccinelle:
https://git.kernel.org/linus/adc5b3cb48a049563dc673f348eab7b6beba8a9b
Also see commit dd06e72e68 for more details.
Summary:
- Fix KMSAN vs FORTIFY in strlcpy/strlcat (Alexander Potapenko)
- Convert strreplace() to return string start (Andy Shevchenko)
- Flexible array conversions (Arnd Bergmann, Wyes Karny, Kees Cook)
- Add missing function prototypes seen with W=1 (Arnd Bergmann)
- Fix strscpy() kerndoc typo (Arne Welzel)
- Replace strlcpy() with strscpy() across many subsystems which were
either Acked by respective maintainers or were trivial changes that
went ignored for multiple weeks (Azeem Shaikh)
- Remove unneeded cc-option test for UBSAN_TRAP (Nick Desaulniers)
- Add KUnit tests for strcat()-family
- Enable KUnit tests of FORTIFY wrappers under UML
- Add more complete FORTIFY protections for strlcat()
- Add missed disabling of FORTIFY for all arch purgatories.
- Enable -fstrict-flex-arrays=3 globally
- Tightening UBSAN_BOUNDS when using GCC
- Improve checkpatch to check for strcpy, strncpy, and fake flex
arrays
- Improve use of const variables in FORTIFY
- Add requested struct_size_t() helper for types not pointers
- Add __counted_by macro for annotating flexible array size members"
* tag 'hardening-v6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (54 commits)
netfilter: ipset: Replace strlcpy with strscpy
uml: Replace strlcpy with strscpy
um: Use HOST_DIR for mrproper
kallsyms: Replace all non-returning strlcpy with strscpy
sh: Replace all non-returning strlcpy with strscpy
of/flattree: Replace all non-returning strlcpy with strscpy
sparc64: Replace all non-returning strlcpy with strscpy
Hexagon: Replace all non-returning strlcpy with strscpy
kobject: Use return value of strreplace()
lib/string_helpers: Change returned value of the strreplace()
jbd2: Avoid printing outside the boundary of the buffer
checkpatch: Check for 0-length and 1-element arrays
riscv/purgatory: Do not use fortified string functions
s390/purgatory: Do not use fortified string functions
x86/purgatory: Do not use fortified string functions
acpi: Replace struct acpi_table_slit 1-element array with flex-array
clocksource: Replace all non-returning strlcpy with strscpy
string: use __builtin_memcpy() in strlcpy/strlcat
staging: most: Replace all non-returning strlcpy with strscpy
drm/i2c: tda998x: Replace all non-returning strlcpy with strscpy
...
For historical reasons, unbound workqueues with max concurrency limit of 1
are considered ordered, even though the concurrency limit hasn't been
system-wide for a long time. This creates ambiguity around whether ordered
execution is actually required for correctness, which was actually confusing
for e.g. btrfs (btrfs updates are being routed through the btrfs tree).
There aren't that many users in the tree which use the combination and there
are pending improvements to unbound workqueue affinity handling which will
make inadvertent use of ordered workqueue a bigger loss. This pull request
clarifies the situation for most of them by updating the ones which require
ordered execution to use alloc_ordered_workqueue().
There are some conversions being routed through subsystem-specific trees and
likely a few stragglers. Once they're all converted, workqueue can trigger a
warning on unbound + @max_active==1 usages and eventually drop the implicit
ordered behavior.
-----BEGIN PGP SIGNATURE-----
iIQEABYIACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCZJoKnA4cdGpAa2VybmVs
Lm9yZwAKCRCxYfJx3gVYGc5SAQDOtjML7Cx9AYzbY5+nYc0wTebRRTXGeOu7A3Xy
j50rVgEAjHgvHLIdmeYmVhCeHOSN4q7Wn5AOwaIqZalOhfLyKQk=
=hs79
-----END PGP SIGNATURE-----
Merge tag 'wq-for-6.5-cleanup-ordered' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull ordered workqueue creation updates from Tejun Heo:
"For historical reasons, unbound workqueues with max concurrency limit
of 1 are considered ordered, even though the concurrency limit hasn't
been system-wide for a long time.
This creates ambiguity around whether ordered execution is actually
required for correctness, which was actually confusing for e.g. btrfs
(btrfs updates are being routed through the btrfs tree).
There aren't that many users in the tree which use the combination and
there are pending improvements to unbound workqueue affinity handling
which will make inadvertent use of ordered workqueue a bigger loss.
This clarifies the situation for most of them by updating the ones
which require ordered execution to use alloc_ordered_workqueue().
There are some conversions being routed through subsystem-specific
trees and likely a few stragglers. Once they're all converted,
workqueue can trigger a warning on unbound + @max_active==1 usages and
eventually drop the implicit ordered behavior"
* tag 'wq-for-6.5-cleanup-ordered' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
rxrpc: Use alloc_ordered_workqueue() to create ordered workqueues
net: qrtr: Use alloc_ordered_workqueue() to create ordered workqueues
net: wwan: t7xx: Use alloc_ordered_workqueue() to create ordered workqueues
dm integrity: Use alloc_ordered_workqueue() to create ordered workqueues
media: amphion: Use alloc_ordered_workqueue() to create ordered workqueues
scsi: NCR5380: Use default @max_active for hostdata->work_q
media: coda: Use alloc_ordered_workqueue() to create ordered workqueues
crypto: octeontx2: Use alloc_ordered_workqueue() to create ordered workqueues
wifi: ath10/11/12k: Use alloc_ordered_workqueue() to create ordered workqueues
wifi: mwifiex: Use default @max_active for workqueues
wifi: iwlwifi: Use default @max_active for trans_pcie->rba.alloc_wq
xen/pvcalls: Use alloc_ordered_workqueue() to create ordered workqueues
virt: acrn: Use alloc_ordered_workqueue() to create ordered workqueues
net: octeontx2: Use alloc_ordered_workqueue() to create ordered workqueues
net: thunderx: Use alloc_ordered_workqueue() to create ordered workqueues
greybus: Use alloc_ordered_workqueue() to create ordered workqueues
powerpc, workqueue: Use alloc_ordered_workqueue() to create ordered workqueues
-----BEGIN PGP SIGNATURE-----
iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmSYzfYeHHRvcnZhbGRz
QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiG/ucH/iOM/1Py/fSg0qSs
7NJ4XXlourT5zrnRMom3cm3d9gYqgTzgvKFL3kjMEexTRVYbhlcO4ZPRsiry8zxF
ToGX+V8tDMqb8WSdFHzkljRY+zDRyfEUDMlTzROAD9DunLmQtkJKyrggkeGdjkpP
OyfGqKpwlLXZRAXBil/U8Mx9MHdjJubloZwghLZr33VdUZa68+JJ9l6w163Oe/ET
K264NM0wxN/kvN57JvePgqMccQwpINylg8IhRI+XelgczjUXeJBsOA8TDv4bDN4Q
bjCLhkWbIaZtTYqvOXa/kD0T8wd7KETsMBQN8YzyDh6W0GmAlJjTawyAhA6jA5in
x3uz2W8=
=L3zp
-----END PGP SIGNATURE-----
Merge tag 'v6.4' into rdma.git for-next
Linux 6.4
Resolve conflicts between rdma rc and next in rxe_cq matching linux-next:
drivers/infiniband/sw/rxe/rxe_cq.c:
https://lore.kernel.org/r/20230622115246.365d30ad@canb.auug.org.au
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
- Add new IOCTLs to the int340x thermal driver to allow user space to
retrieve the Passive v2 thermal table (Srinivas Pandruvada).
- Add DT bindings for SM6375, MSM8226 and QCM2290 Qcom platforms (Konrad
Dybcio).
- Add DT bindings and support for QCom MSM8226 (Matti Lehtimäki).
- Add DT bindings for QCom ipq9574 (Praveenkumar I).
- Convert bcm2835 DT bindings to the yaml schema (Stefan Wahren).
- Allow selecting the bang-bang governor as default (Thierry Reding).
- Refactor and prepare the code to set the scene for RCar Gen4 (Wolfram
Sang).
- Clean up and fix the QCom tsens drivers. Add DT bindings and
calibration for the MSM8909 platform (Stephan Gerhold).
- Revert a patch introducing a wrong usage of devm_of_iomap() on the
Mediatek platform (Ricardo Cañuelo).
- Fix the clock vs reset ordering in order to conform to the
documentation on the sun8i (Christophe JAILLET).
- Prevent setting up undocumented registers, enable the only described
sensors and add the version 2.1 on the Qoriq sensor (Peng Fan).
- Add DT bindings and support for the Armada AP807 (Alex Leibovich).
- Update the mlx5 driver with the recent thermal changes (Daniel
Lezcano).
- Convert to platform remove callback returning void on STM32 (Uwe
Kleine-König).
- Add an error information printing for devm_thermal_add_hwmon_sysfs()
and remove the error from the Sun8i, Amlogic, i.MX, TI, K3, Tegra,
Qoriq, Mediateka and QCom (Yangtao Li).
- Register as hwmon sensor for the Generic ADC (Chen-Yu Tsai).
- Use the dev_err_probe() function in the QCom tsens alarm driver (Luca
Weiss).
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmSZxuYSHHJqd0Byand5
c29ja2kubmV0AAoJEILEb/54YlRxVicP/0Gl7fQc2l2/AEwe+FuwZz8xERhbLc4s
kSXVIJ1B6GScefUmLBqDZoWokVqP4gXgjTnYdqZNJmnCo+hP9i6VzVjVXSI2eNe7
426cJd7la2K0zWcjsw/apdGa81Btnf0IjIMjeWiZOPVkRE6UmnHBwG52C5Hd+1pa
EJaRcxv37MvrRKgzwUV+4WSBd5p1v3UrElzs10DXt+mVfRjsV7Y3bxnfW0+kh1sK
3J/AzPXx1v6e6mnKQQQAso0r1g22n/lfh9C7VQT01fXt704PLx1jNr4vtlGhCJkQ
sRlodAx+6uSX2GyeVs9hKOPZees3lY9VcZOI3c4g0Sy2ad4Hl7QDRbMmMUiiEVNZ
fmiIs26RbfeRvl2lNDEYxLxsBaMva4rerVuOlRjkKeXDboa/TyZ2/MwvaE/uaxT7
yWBoqoY67aPfyS8YgSu/DqRK60J+6PShaYeOJ4deYA6skctCe1v7nRJu07n62RBd
JuPUUifiBb6jpxDPCRBHLxwUIMK9gzJIRM74ieZs/XSG56v4dJR2e8GmHZLoiLPV
cc/uyHopQc75z7fYNDHdu7EHzUUGsjI6A9cNoCcdG86+b/pUEXTBLxpEuOv3bXfm
DtpoS2O0PEJtblTmS4HFJJuJ3I1qUzJZgo/KmwmFEcXOfo2KEqVzcleSr1t+zAbR
rCTcgVTaYpt4
=dtvf
-----END PGP SIGNATURE-----
Merge tag 'thermal-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull thermal control updates from Rafael Wysocki:
"These extend the int340x thermal driver, add thermal DT bindings for
some Qcom platforms, add DT bindings and support for Armada AP807 and
MSM8909, allow selecting the bang-bang thermal governor as the default
one, address issues in several thermal drivers for ARM platforms and
clean up code.
Specifics:
- Add new IOCTLs to the int340x thermal driver to allow user space to
retrieve the Passive v2 thermal table (Srinivas Pandruvada)
- Add DT bindings for SM6375, MSM8226 and QCM2290 Qcom platforms
(Konrad Dybcio)
- Add DT bindings and support for QCom MSM8226 (Matti Lehtimäki)
- Add DT bindings for QCom ipq9574 (Praveenkumar I)
- Convert bcm2835 DT bindings to the yaml schema (Stefan Wahren)
- Allow selecting the bang-bang governor as default (Thierry Reding)
- Refactor and prepare the code to set the scene for RCar Gen4
(Wolfram Sang)
- Clean up and fix the QCom tsens drivers. Add DT bindings and
calibration for the MSM8909 platform (Stephan Gerhold)
- Revert a patch introducing a wrong usage of devm_of_iomap() on the
Mediatek platform (Ricardo Cañuelo)
- Fix the clock vs reset ordering in order to conform to the
documentation on the sun8i (Christophe JAILLET)
- Prevent setting up undocumented registers, enable the only
described sensors and add the version 2.1 on the Qoriq sensor (Peng
Fan)
- Add DT bindings and support for the Armada AP807 (Alex Leibovich)
- Update the mlx5 driver with the recent thermal changes (Daniel
Lezcano)
- Convert to platform remove callback returning void on STM32 (Uwe
Kleine-König)
- Add an error information printing for devm_thermal_add_hwmon_sysfs()
and remove the error from the Sun8i, Amlogic, i.MX, TI, K3, Tegra,
Qoriq, Mediateka and QCom (Yangtao Li)
- Register as hwmon sensor for the Generic ADC (Chen-Yu Tsai)
- Use the dev_err_probe() function in the QCom tsens alarm driver
(Luca Weiss)"
* tag 'thermal-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (39 commits)
thermal/drivers/qcom/temp-alarm: Use dev_err_probe
thermal/drivers/generic-adc: Register thermal zones as hwmon sensors
thermal/drivers/mediatek/lvts_thermal: Remove redundant msg in lvts_ctrl_start()
thermal/drivers/qcom: Remove redundant msg at probe time
thermal/drivers/ti-soc: Remove redundant msg in ti_thermal_expose_sensor()
thermal/drivers/qoriq: Remove redundant msg in qoriq_tmu_register_tmu_zone()
thermal/drivers/tegra: Remove redundant msg in tegra_tsensor_register_channel()
drivers/thermal/k3: Remove redundant msg in k3_bandgap_probe()
thermal/drivers/imx: Remove redundant msg in imx8mm_tmu_probe() and imx_sc_thermal_probe()
thermal/drivers/amlogic: Remove redundant msg in amlogic_thermal_probe()
thermal/drivers/sun8i: Remove redundant msg in sun8i_ths_register()
thermal/hwmon: Add error information printing for devm_thermal_add_hwmon_sysfs()
thermal/drivers/stm32: Convert to platform remove callback returning void
net/mlx5: Update the driver with the recent thermal changes
thermal/drivers/armada: Add support for AP807 thermal data
dt-bindings: armada-thermal: Add armada-ap807-thermal compatible
thermal/drivers/qoriq: Support version 2.1
thermal/drivers/qoriq: Only enable supported sensors
thermal/drivers/qoriq: No need to program site adjustment register
thermal/drivers/mediatek/lvts_thermal: Register thermal zones as hwmon sensors
...
The thermal framework is migrating to the generic trip points. The set
of changes also implies a self-encapsulation of the thermal zone
device structure where the internals are no longer directly accessible
but with accessors.
Use the new API instead, so the next changes can be pushed in the
thermal framework without this driver failing to compile.
No functional changes intended.
Cc: Sandipan Patra <spatra@nvidia.com>
Cc: Gal Pressman <gal@nvidia.com>
Cc: Saeed Mahameed <saeedm@nvidia.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230525140135.3589917-2-daniel.lezcano@linaro.org
Add two bytes of padding to the start of struct ef4_loopback_payload,
which are not sent on the wire. This ensures the 'ip' member is
4-byte aligned, preventing the following W=1 warning:
net/ethernet/sfc/falcon/selftest.c:43:15: error: field ip within 'struct ef4_loopback_payload' is less aligned than 'struct iphdr' and is usually due to 'struct ef4_loopback_payload' being packed, which can lead to unaligned accesses [-Werror,-Wunaligned-access]
struct iphdr ip;
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add two bytes of padding to the start of struct efx_loopback_payload,
which are not sent on the wire. This ensures the 'ip' member is
4-byte aligned, preventing the following W=1 warning:
net/ethernet/sfc/siena/selftest.c:46:15: error: field ip within 'struct efx_loopback_payload' is less aligned than 'struct iphdr' and is usually due to 'struct efx_loopback_payload' being packed, which can lead to unaligned accesses [-Werror,-Wunaligned-access]
struct iphdr ip;
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add two bytes of padding to the start of struct efx_loopback_payload,
which are not sent on the wire. This ensures the 'ip' member is
4-byte aligned, preventing the following W=1 warning:
net/ethernet/sfc/selftest.c:46:15: error: field ip within 'struct efx_loopback_payload' is less aligned than 'struct iphdr' and is usually due to 'struct efx_loopback_payload' being packed, which can lead to unaligned accesses [-Werror,-Wunaligned-access]
struct iphdr ip;
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
efx_net_stats() (.ndo_get_stats64) can be called during an ethtool
selftest, during which time nic_data->mc_stats is NULL as the NIC has
been fini'd. In this case do not attempt to fetch the latest stats
from the hardware, else we will crash on a NULL dereference:
BUG: kernel NULL pointer dereference, address: 0000000000000038
RIP efx_nic_update_stats
abridged calltrace:
efx_ef10_update_stats_pf
efx_net_stats
dev_get_stats
dev_seq_printf_stats
Skipping the read is safe, we will simply give out stale stats.
To ensure that the free in efx_ef10_fini_nic() does not race against
efx_ef10_update_stats_pf(), which could cause a TOCTTOU bug, take the
efx->stats_lock in fini_nic (it is already held across update_stats).
Fixes: d3142c193d ("sfc: refactor EF10 stats handling")
Reviewed-by: Pieter Jansen van Vuuren <pieter.jansen-van-vuuren@amd.com>
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
mlx5 driver minor cleanup and fixes to net-next
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAmSV8icACgkQSD+KveBX
+j6WgAf+Ph5w2dAfuadjlAcQlK2WKXyS1OliVpvCuglepgsP9Zree11WVoyiAKef
70Xd5Vu4xAQTdpCsw4DGM7sh6xW1SxvZFP1A/FVk7UU1E0zL2SXzKyEHK29I1MH5
nej2hRf/W1dwYtxfwNOTKAFco3wr0e1vLgMDEqZBZbJXzcUetDRADkgWrx9U+pno
lBPhVMZWK5R0GzOjlWZXoedaXx2RIcm+5U02ov5S6d1y8AA+sE93tiYxrP9z/2lj
nml1KeQQl0Ku/y+e8RMSUd9mPdomQZi6CVHMD1wV5DNv6dnrT1bPrUdt4torSXEI
KAzkVie979XP2jvkDKY2nyXZ8dn7cw==
=woKn
-----END PGP SIGNATURE-----
Merge tag 'mlx5-updates-2023-06-21' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2023-06-21
mlx5 driver minor cleanup and fixes to net-next
* tag 'mlx5-updates-2023-06-21' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
net/mlx5: Remove pointless vport lookup from mlx5_esw_check_port_type()
net/mlx5: Remove redundant check from mlx5_esw_query_vport_vhca_id()
net/mlx5: Remove redundant is_mdev_switchdev_mode() check from is_ib_rep_supported()
net/mlx5: Remove redundant MLX5_ESWITCH_MANAGER() check from is_ib_rep_supported()
net/mlx5e: E-Switch, Fix shared fdb error flow
net/mlx5e: Remove redundant comment
net/mlx5e: E-Switch, Pass other_vport flag if vport is not 0
net/mlx5e: E-Switch, Use xarray for devcom paired device index
net/mlx5e: E-Switch, Add peer fdb miss rules for vport manager or ecpf
net/mlx5e: Use vhca_id for device index in vport rx rules
net/mlx5: Lag, Remove duplicate code checking lag is supported
net/mlx5: Fix error code in mlx5_is_reset_now_capable()
net/mlx5: Fix reserved at offset in hca_cap register
net/mlx5: Fix SFs kernel documentation error
net/mlx5: Fix UAF in mlx5_eswitch_cleanup()
====================
Link: https://lore.kernel.org/r/20230623192907.39033-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
64-bit DMA detection will fail if axienet was started before (by boot
loader, boot ROM, etc). In this state axienet will not start properly.
XAXIDMA_TX_CDESC_OFFSET + 4 register (MM2S_CURDESC_MSB) is used to detect
64-bit DMA capability here. But datasheet says: When DMACR.RS is 1
(axienet is in enabled state), CURDESC_PTR becomes Read Only (RO) and
is used to fetch the first descriptor. So iowrite32()/ioread32() trick
to this register to detect 64-bit DMA will not work.
So move axienet reset before 64-bit DMA detection.
Fixes: f735c40ed9 ("net: axienet: Autodetect 64-bit DMA capability")
Signed-off-by: Maxim Kochetkov <fido_max@inbox.ru>
Reviewed-by: Robert Hancock <robert.hancock@calian.com>
Reviewed-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com>
Link: https://lore.kernel.org/r/20230622192245.116864-1-fido_max@inbox.ru
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Use the devres variant of stmmac_pltfr_probe() and finally drop the
remove() callback entirely.
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Link: https://lore.kernel.org/r/20230623100417.93592-12-brgl@bgdev.pl
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Provide a devres variant of stmmac_pltfr_probe() which allows users to
skip calling stmmac_pltfr_remove() at driver detach.
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Link: https://lore.kernel.org/r/20230623100417.93592-11-brgl@bgdev.pl
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Significantly simplify the driver's probe() function by using the devres
variant of stmmac_probe_config_dt(). This allows to drop the goto jumps
entirely.
The remove_new() callback now needs to be switched to
stmmac_pltfr_remove_no_dt().
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Link: https://lore.kernel.org/r/20230623100417.93592-10-brgl@bgdev.pl
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Provide a devres variant of stmmac_probe_config_dt() that allows users to
skip calling stmmac_remove_config_dt() at driver detach.
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Link: https://lore.kernel.org/r/20230623100417.93592-9-brgl@bgdev.pl
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add a variant of stmmac_pltfr_remove() that only frees resources
allocated by stmmac_pltfr_probe() and - unlike stmmac_pltfr_remove() -
does not call stmmac_remove_config_dt().
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Link: https://lore.kernel.org/r/20230623100417.93592-8-brgl@bgdev.pl
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Shrink the code and remove labels by using the new stmmac_pltfr_probe()
function.
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Link: https://lore.kernel.org/r/20230623100417.93592-7-brgl@bgdev.pl
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Implement stmmac_pltfr_probe() which is the logical API counterpart
for stmmac_pltfr_remove(). It calls the platform's init() callback and
then probes the stmmac device.
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Link: https://lore.kernel.org/r/20230623100417.93592-6-brgl@bgdev.pl
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Shrink the code in dwmac-generic by using the new stmmac_pltfr_exit()
helper.
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Link: https://lore.kernel.org/r/20230623100417.93592-5-brgl@bgdev.pl
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Provide a helper wrapper around calling the platform's exit() callback.
This allows users to skip checking if the callback exists.
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Link: https://lore.kernel.org/r/20230623100417.93592-4-brgl@bgdev.pl
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Shrink the code in dwmac-generic by using the new stmmac_pltfr_init()
helper.
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Link: https://lore.kernel.org/r/20230623100417.93592-3-brgl@bgdev.pl
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Provide a helper wrapper around calling the platform's init() callback.
This allows users to skip checking if the callback exists.
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Link: https://lore.kernel.org/r/20230623100417.93592-2-brgl@bgdev.pl
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2023-06-22 (ice)
This series contains updates to ice driver only.
Jake adds a slight wait on control queue send to reduce wait time for
responses that occur within normal times.
Maciej allows for hot-swapping XDP programs.
Przemek removes unnecessary checks when enabling SR-IOV and freeing
allocated memory.
Christophe Jaillet converts a managed memory allocation to a regular one.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
ice: use ice_down_up() where applicable
ice: Remove managed memory usage in ice_get_fw_log_cfg()
ice: remove null checks before devm_kfree() calls
ice: clean up freeing SR-IOV VFs
ice: allow hot-swapping XDP programs
ice: reduce initial wait for control queue messages
====================
Link: https://lore.kernel.org/r/20230622183601.2406499-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2023-06-22 (iavf)
This series contains updates to iavf driver only.
Przemek defers removing, previous, primary MAC address until after
getting result of adding its replacement. He also does some cleanup by
removing unused functions and making applicable functions static.
* '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
iavf: make functions static where possible
iavf: remove some unused functions and pointless wrappers
iavf: fix err handling for MAC replace
====================
Link: https://lore.kernel.org/r/20230622165914.2203081-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tony Nguyen says:
====================
igc: TX timestamping fixes
This is the fixes part of the series intended to add support for using
the 4 timestamp registers present in i225/i226.
Moving the timestamp handling to be inline with the interrupt handling
has the advantage of improving the TX timestamping retrieval latency,
here are some numbers using ntpperf:
Before:
$ sudo ./ntpperf -i enp3s0 -m 10:22:22:22:22:21 -d 192.168.1.3 -s 172.18.0.0/16 -I -H -o -37
| responses | TX timestamp offset (ns)
rate clients | lost invalid basic xleave | min mean max stddev
1000 100 0.00% 0.00% 0.00% 100.00% -56 +9 +52 19
1500 150 0.00% 0.00% 0.00% 100.00% -40 +30 +75 22
2250 225 0.00% 0.00% 0.00% 100.00% -11 +29 +72 15
3375 337 0.00% 0.00% 0.00% 100.00% -18 +40 +88 22
5062 506 0.00% 0.00% 0.00% 100.00% -19 +23 +77 15
7593 759 0.00% 0.00% 0.00% 100.00% +7 +47 +5168 43
11389 1138 0.00% 0.00% 0.00% 100.00% -11 +41 +5240 39
17083 1708 0.00% 0.00% 0.00% 100.00% +19 +60 +5288 50
25624 2562 0.00% 0.00% 0.00% 100.00% +1 +56 +5368 58
38436 3843 0.00% 0.00% 0.00% 100.00% -84 +12 +8847 66
57654 5765 0.00% 0.00% 100.00% 0.00%
86481 8648 0.00% 0.00% 100.00% 0.00%
129721 12972 0.00% 0.00% 100.00% 0.00%
194581 16384 0.00% 0.00% 100.00% 0.00%
291871 16384 27.35% 0.00% 72.65% 0.00%
437806 16384 50.05% 0.00% 49.95% 0.00%
After:
$ sudo ./ntpperf -i enp3s0 -m 10:22:22:22:22:21 -d 192.168.1.3 -s 172.18.0.0/16 -I -H -o -37
| responses | TX timestamp offset (ns)
rate clients | lost invalid basic xleave | min mean max stddev
1000 100 0.00% 0.00% 0.00% 100.00% -44 +0 +61 19
1500 150 0.00% 0.00% 0.00% 100.00% -6 +39 +81 16
2250 225 0.00% 0.00% 0.00% 100.00% -22 +25 +69 15
3375 337 0.00% 0.00% 0.00% 100.00% -28 +15 +56 14
5062 506 0.00% 0.00% 0.00% 100.00% +7 +78 +143 27
7593 759 0.00% 0.00% 0.00% 100.00% -54 +24 +144 47
11389 1138 0.00% 0.00% 0.00% 100.00% -90 -33 +28 21
17083 1708 0.00% 0.00% 0.00% 100.00% -50 -2 +35 14
25624 2562 0.00% 0.00% 0.00% 100.00% -62 +7 +66 23
38436 3843 0.00% 0.00% 0.00% 100.00% -33 +30 +5395 36
57654 5765 0.00% 0.00% 100.00% 0.00%
86481 8648 0.00% 0.00% 100.00% 0.00%
129721 12972 0.00% 0.00% 100.00% 0.00%
194581 16384 19.50% 0.00% 80.50% 0.00%
291871 16384 35.81% 0.00% 64.19% 0.00%
437806 16384 55.40% 0.00% 44.60% 0.00%
During this series, and to show that as is always the case, things are
never easy as they should be, a hardware issue was found, and it took
some time to find the workaround(s). The bug and workaround are better
explained in patch 4/4.
Note: the workaround has a simpler alternative, but it would involve
adding support for the other timestamp registers, and only using the
TXSTMP{H/L}_0 as a way to clear the interrupt. But I feel bad about
throwing this kind of resources away. Didn't test this extensively but
it should work.
Also, as Marc Kleine-Budde suggested, after some consensus is reached
on this series, most parts of it will be proposed for igb.
* '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
igc: Work around HW bug causing missing timestamps
igc: Retrieve TX timestamp during interrupt handling
igc: Check if hardware TX timestamping is enabled earlier
igc: Fix race condition in PTP tx code
====================
Link: https://lore.kernel.org/r/20230622165244.2202786-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Move the list of next hops from struct mlxsw_sp_rif to mlxsw_sp_crif. The
reason is that eventually, next hops for mlxsw uppers should be offloaded
and unoffloaded on demand as a netdevice becomes an upper, or stops being
one. Currently, next hops are tracked at RIFs, but RIFs do not exist when a
netdevice is not an mlxsw uppers. CRIFs are kept track of throughout the
netdevice lifetime.
Correspondingly, track at each next hop not its RIF, but its CRIF (from
which a RIF can always be deduced).
Note that now that next hops are tracked at a CRIF, it is not necessary to
move each over to a new RIF when it is necessary to edit a RIF. Therefore
drop mlxsw_sp_nexthop_rif_migrate() and have mlxsw_sp_rif_migrate_destroy()
call mlxsw_sp_nexthop_rif_update() directly.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Link: https://lore.kernel.org/r/e7c1c0a7dd13883b0f09aeda12c4fcf4d63a70e3.1687438411.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Nexthop finalization consists of two steps: the part where the offload is
removed, because the backing RIF is now gone; and the part where the
association to the RIF is severed.
Extract from mlxsw_sp_nexthop_type_fini() a helper that covers the
unoffloading part, mlxsw_sp_nexthop_type_rif_gone(), so that it can later
be called independently.
Note that this swaps around the ordering of mlxsw_sp_nexthop_ipip_fini()
vs. mlxsw_sp_nexthop_rif_fini(). The current ordering is more of a
historical happenstance than a conscious decision. The two cleanups do not
depend on each other, and this change should have no observable effects.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Link: https://lore.kernel.org/r/7134559534c5f5c4807c3a1569fae56f8887e763.1687438411.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
A previous patch added a pointer to loopback CRIF to the router data
structure. That makes the loopback RIF index redundant, as everything
necessary can be derived from the CRIF. Drop the field and adjust the code
accordingly.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Link: https://lore.kernel.org/r/8637bf959bc5b6c9d5184b9bd8a0cd53c5132835.1687438411.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When a RIF is about to be created, the registration of the netdevice that
it should be associated with must have been seen in the past, and a CRIF
created. Therefore make this a hard requirement by looking up the CRIF
during RIF creation, and complaining loudly when there isn't one.
This then allows to keep a link between a RIF and its corresponding
CRIF (and back, as the relationship is one-to-at-most-one), which do.
The CRIF will later be useful as the objects tracked there will be
offloaded lazily as a result of RIF creation.
CRIFs are created when an "interesting" netdevice is registered, and
destroyed after such device is unregistered. CRIFs are supposed to already
exist when a RIF creation request arises, and exist at least as long as
that RIF exists. This makes for a simple invariant: it is always safe to
dereference CRIF pointer from "its" RIF.
To guarantee this, CRIFs cannot be removed immediately when the UNREGISTER
event is delivered. The reason is that if a RIF's netdevices has an IPv6
address, removal of this address is notified in an atomic block. To remove
the RIF, the IPv6 removal handler schedules a work item. It must be safe
for this work item to access the associated CRIF as well.
Thus when a netdevice that backs the CRIF is removed, if it still has a
RIF, do not actually free the CRIF, only toggle its can_destroy flag, which
this patch adds. Later on, mlxsw_sp_rif_destroy() collects the CRIF.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Link: https://lore.kernel.org/r/68c8e33afa6b8c03c431b435e1685ffdff752e63.1687438411.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
CRIFs are generally not maintained for loopback RIFs. However, the RIF for
the default VRF is used for offloading of blackhole nexthops. Nexthops
expect to have a valid CRIF. Therefore in this patch, add code to maintain
CRIF for the loopback RIF as well.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Link: https://lore.kernel.org/r/7f2b2fcc98770167ed1254a904c3f7f585ba43f0.1687438411.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
CRIFs are objects that mlxsw maintains for netdevices that may not have an
associated RIF (i.e. they may not have been instantiated in the ASIC), but
if indeed they do not, it is quite possible they will in the future. These
netdevices are candidate RIFs, hence CRIFs. Netdevices for which CRIFs are
created include e.g. bridges, LAGs, or front panel ports. The idea is that
next hops would be kept at CRIFs, not RIFs, and thus it would be easier to
offload and unoffload the entities that have been added before the RIF was
created.
In this patch, add the code for low-level CRIF maintenance: create and
destroy, and keep in a table keyed by the netdevice pointer for easy
recall.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Link: https://lore.kernel.org/r/186d44e399c475159da20689f2c540719f2d1ed0.1687438411.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The current function, mlxsw_sp_router_ul_rif_get(), is a wrapper around the
function mentioned in the subject. As such it forms an external interface
of the router code.
In future patches we will want to maintain connection between RIFs and the
CRIFs (introduced in the next patch) that back them. That will not hold
for the VRF-based loopback netdevices, so the whole CRIF business can be
kept hidden from the rest of mlxsw.
But for the main VRF loopback RIF we do want to keep the RIF-CRIF
connection, because that RIF is used for blackhole next hops, and the next
hop code can be kept simpler for assuming rif->crif is valid.
Hence, instead, call mlxsw_sp_ul_rif_get() to create the main VRF loopback
RIF. This being an internal function will take the CRIF argument anyway.
Furthermore, the function does not lock, which is not necessary at this
point in code yet.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Link: https://lore.kernel.org/r/7a39a011a02a84164cd7f5da7985ec5b2ae01ba5.1687438411.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
With support for Ethernet PHY LEDs having been added, while
unregistering a MDIO bus and its child device liks PHYs there may be
"late" accesses to the MDIO bus. One typical use case is setting the PHY
LEDs brightness to OFF for instance.
We need to ensure that the MDIO bus controller remains entirely
functional since it runs off the main GENET adapter clock.
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/20230617155500.4005881-1-andrew@lunn.ch/
Fixes: 9a4e796970 ("net: bcmgenet: utilize generic Broadcom UniMAC MDIO controller driver")
Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20230622103107.1760280-1-florian.fainelli@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
As xa_get_mark() returns false in case the entry is not present,
no need to redundantly check if vport is present. Remove the lookup.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Since mlx5_esw_query_vport_vhca_id() could be called either from
mlx5_esw_vport_enable() or mlx5_esw_vport_disable() where the
the check is done, this is always false here.
Remove the redundant check.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
is_mdev_switchdev_mode() check is done in is_eth_rep_supported().
Function is_ib_rep_supported() calls is_eth_rep_supported().
Remove the redundant check from it.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
MLX5_ESWITCH_MANAGER() check is done in is_eth_rep_supported().
Function is_ib_rep_supported() calls is_eth_rep_supported().
Remove the redundant check from it.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
On error flow resources being freed in esw_master_egress_destroy_resources()
but pointers not being set to null if error flow is from creating a
bounce rule. Then in esw_acl_egress_ofld_cleanup() we try to access already
freed pointers. Fix it by resetting the pointers to null.
Also if error is from creating a second or later bounce rule then the
flow group and table being used and cannot and should not be freed.
Add a check to destroy the flow group and table if there are no bounce
rules.
mlx5_core.sf mlx5_core.sf.2: mlx5_destroy_flow_group:2306:(pid 2235): Flow group 4 wasn't destroyed, refcount > 1
mlx5_core.sf mlx5_core.sf.2: mlx5_destroy_flow_table:2295:(pid 2235): Flow table 3 wasn't destroyed, refcount > 1
Fixes: 5e0202eb49 ("net/mlx5: E-switch, Handle multiple master egress rules")
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The function comment says what it is and the comment
is redundant.
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
When creating flow table for shared fdb resources, there is
only need to pass other_vport flag if vport is not 0 or
if the port is ECPF in BlueField.
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
To allow devcom events on E-Switch that is not a vport group manager,
use vhca id as an index instead of device index which might be shared
between several E-Switches. for example SF and its PF.
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Add peer fdb rules for E-Switch that are vport managers or ecpf device.
It is not needed for other devices.
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Device index is like PF index and limited to max physical ports.
For example, SFs created under PF the device index is the PF device index.
Use vhca_id which gets the FW index per vport, for vport rx rules
and vport pair events.
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Remove duplicate function for checking if device has lag support.
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The mlx5_is_reset_now_capable() function returns bool, not negative
error codes. So if fast teardown is not supported it should return
false instead of -EOPNOTSUPP.
Fixes: 92501fa6e4 ("net/mlx5: Ack on sync_reset_request only if PF can do reset_now")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
mlx5_eswitch_cleanup() is using esw right after freeing it for
releasing devlink_param.
Fix it by releasing the devlink_param before freeing the esw, and
adjust the create function accordingly.
Fixes: 3f90840305 ("net/mlx5: Move esw multiport devlink param to eswitch code")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Automatic Verification <verifier@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Several functions in the hns3 driver have unused parameters.
The compiler will warn about them when building
with -Wunused-parameter option of hns3.
Signed-off-by: Peiyang Wang <wangpeiyang1@huawei.com>
Signed-off-by: Hao Lan <lanhao@huawei.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Now, strncpy() in hns3_dbg_fill_content() use src-length as copy-length,
it may result in dest-buf overflow.
This patch is to fix intel compile warning for csky-linux-gcc (GCC) 12.1.0
compiler.
The warning reports as below:
hclge_debugfs.c:92:25: warning: 'strncpy' specified bound depends on
the length of the source argument [-Wstringop-truncation]
strncpy(pos, items[i].name, strlen(items[i].name));
hclge_debugfs.c:90:25: warning: 'strncpy' output truncated before
terminating nul copying as many bytes from a string as its length
[-Wstringop-truncation]
strncpy(pos, result[i], strlen(result[i]));
strncpy() use src-length as copy-length, it may result in
dest-buf overflow.
So,this patch add some values check to avoid this issue.
Signed-off-by: Hao Chen <chenhao418@huawei.com>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/lkml/202207170606.7WtHs9yS-lkp@intel.com/T/
Signed-off-by: Hao Lan <lanhao@huawei.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
When processing counter updates, if any action set using the newly
incremented counter includes an encap action, prod the corresponding
neighbouring entry to indicate to the neighbour cache that the entry
is still in use and passing traffic.
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Link: https://lore.kernel.org/r/20230621121504.17004-1-edward.cree@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In a setup where a Thunderbolt hub connects to Ethernet and a display
through USB Type-C, users may experience a hung task timeout when they
remove the cable between the PC and the Thunderbolt hub.
This is because the igb_down function is called multiple times when
the Thunderbolt hub is unplugged. For example, the igb_io_error_detected
triggers the first call, and the igb_remove triggers the second call.
The second call to igb_down will block at napi_synchronize.
Here's the call trace:
__schedule+0x3b0/0xddb
? __mod_timer+0x164/0x5d3
schedule+0x44/0xa8
schedule_timeout+0xb2/0x2a4
? run_local_timers+0x4e/0x4e
msleep+0x31/0x38
igb_down+0x12c/0x22a [igb 6615058754948bfde0bf01429257eb59f13030d4]
__igb_close+0x6f/0x9c [igb 6615058754948bfde0bf01429257eb59f13030d4]
igb_close+0x23/0x2b [igb 6615058754948bfde0bf01429257eb59f13030d4]
__dev_close_many+0x95/0xec
dev_close_many+0x6e/0x103
unregister_netdevice_many+0x105/0x5b1
unregister_netdevice_queue+0xc2/0x10d
unregister_netdev+0x1c/0x23
igb_remove+0xa7/0x11c [igb 6615058754948bfde0bf01429257eb59f13030d4]
pci_device_remove+0x3f/0x9c
device_release_driver_internal+0xfe/0x1b4
pci_stop_bus_device+0x5b/0x7f
pci_stop_bus_device+0x30/0x7f
pci_stop_bus_device+0x30/0x7f
pci_stop_and_remove_bus_device+0x12/0x19
pciehp_unconfigure_device+0x76/0xe9
pciehp_disable_slot+0x6e/0x131
pciehp_handle_presence_or_link_change+0x7a/0x3f7
pciehp_ist+0xbe/0x194
irq_thread_fn+0x22/0x4d
? irq_thread+0x1fd/0x1fd
irq_thread+0x17b/0x1fd
? irq_forced_thread_fn+0x5f/0x5f
kthread+0x142/0x153
? __irq_get_irqchip_state+0x46/0x46
? kthread_associate_blkcg+0x71/0x71
ret_from_fork+0x1f/0x30
In this case, igb_io_error_detected detaches the network interface
and requests a PCIE slot reset, however, the PCIE reset callback is
not being invoked and thus the Ethernet connection breaks down.
As the PCIE error in this case is a non-fatal one, requesting a
slot reset can be avoided.
This patch fixes the task hung issue and preserves Ethernet
connection by ignoring non-fatal PCIE errors.
Signed-off-by: Ying Hsu <yinghsu@chromium.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230620174732.4145155-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Update macb's embedded PCS drivers to use neg_mode, even though it
makes no use of it or the "mode" argument. This makes the driver
consistent with converted drivers.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/E1qA8Eo-00EaGX-KJ@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Update Sparx5's embedded PCS driver to use neg_mode rather than the
mode argument. As there is no pcs_link_up() method, this only affects
the pcs_config() method.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/E1qA8EZ-00EaGF-6F@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Update prestera's embedded PCS driver to use neg_mode rather than the
mode argument. As there is no pcs_link_up() method, this only affects
the pcs_config() method.
Acked-by: Elad Nachman <enachman@marvell.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/E1qA8EO-00EaG3-TR@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>