Currently, the driver will schedule RX ring reset when we get a buffer
error in the RX completion record. These RX buffer errors can be due
to normal out-of-buffer conditions or a permanent error in the RX
ring. Because the driver cannot distinguish between these 2
conditions, we assume all these buffer errors require reset.
This is very disruptive when it is just a normal out-of-buffer
condition. Newer firmware will now monitor the rings for the permanent
failure and will send a notification to the driver when it happens.
This allows the driver to reset only when such a notification is
received. In environments where we have predominently out-of-buffer
conditions, we now can avoid these unnecessary resets.
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is logic in the RX path to detect unexpected handles in the
RX completion. We'll print a warning and schedule a reset. The
next expected handle is then set to 0xffff which is guaranteed to
not match any valid handle. This will force all remaining packets in
the ring to be discarded before the reset. There can be hundreds of
these packets remaining in the ring and there is no need to print the
warnings for these forced errors.
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a per ring rx_resets counter to count these RX resets.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On some older chips, it is necessary to do a reset when we get buffer
errors associated with an RX ring. These buffer errors may become
frequent if the RX ring underruns under heavy traffic. The current
code does a global reset of all reasources when this happens. This
works but creates a big disruption of all rings when one RX ring is
having problem. This patch implements a localized RX ring reset of
just the RX ring having the issue. All other rings including all
TX rings will not be affected by this single RX ring reset.
Only the older chips prior to the P5 class supports this reset.
Because it is not a global reset, packets may still be arriving
while we are calling firmware to reset that ring. We need to be
sure that we don't post any buffers during this time while the
ring is undergoing reset. After firmware completes successfully,
the ring will be in the reset state with no buffers and we can start
filling it with new buffers and posting them.
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
bnxt_init_one_rx_ring() includes logic to initialize the BDs for one RX
ring and to allocate the buffers. Separate the allocation logic into a
new bnxt_alloc_one_rx_ring() function. The allocation function will be
used later to allocate new buffers for one specified RX ring when we
reset that RX ring.
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
bnxt_free_rx_skbs() frees all the allocated buffers and SKBs for
every RX ring. Refactor this function by calling a new function
bnxt_free_one_rx_ring_skbs() to free these buffers on one specified
RX ring at a time. This is preparation work for resetting one RX
ring during run-time.
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If firmware does not come out of reset, log FW health status info
to provide more information on firmware status.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The NS3 SoC platforms require assistance from the OP-TEE to recover
firmware if a crash occurs while no driver is bound. The
CRASHED_NO_MASTER condition is recorded in the firmware status register
during the crash to indicate when driver intervension is needed to
coordinate a firmware reload. This condition is detected during early
driver initialization in order to effect a firmware fastboot on
supported platforms when necessary.
Reviewed-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Firmware now supports device independent discovery of the status
register location. This status register can provide more detailed
information about firmware errors, especially if problems occur
before the HWRM interface is functioning. Attempt to map this
register if it is present and report the firmware status on firmware
init failures.
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The allocator for the firmware health structure conflates allocation
and capability checks, limiting the reusability of the code. This patch
separates out the capability check and disablement and improves the
warning message to better describe the consequences of an allocation
failure.
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Main changes is to extend hwrm_nvm_get_dev_info_output() for stored
firmware versions and a new flag is added to fw_status_reg.
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for the new group of devlink traps - PARSER_ERROR_DROPS.
This consists of registering the array of parser error drops supported,
controlling their action through the .trap_group_action_set() callback
and reporting an erroneous skb received on the error queue
appropriately.
DPAA2 devices do not support controlling the action of independent
parser error traps, thus the .trap_action_set() callback just returns an
EOPNOTSUPP while .trap_group_action_set() actually notifies the hardware
what it should do with a frame marked as having a header error.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add basic support in dpaa2-eth for devlink. For the moment, just
register the device with devlink, add the corresponding devlink port and
implement the .info_get() callback.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the new firmware image downladed for update is corrupted
or is a bad format, the download process will report a status
code specifically for that.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the lif's ident information for the uc and mc filter
counts rather than the ionic's version, to be sure
we're getting the info that is specific to this lif.
While we're thinking about it, add some missing error
checking where we get the lif's identity information.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
After we do a fw upgrade and refill the ionic->ident.dev, we
also need to update the other identity info. Since the lif
identity needs to be updated each time the ionic identity is
refreshed, we can pull it into ionic_identify().
The debugfs entry is moved so that it doesn't cause an
error message when the data is refreshed after the fw upgrade.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some time ago we short-circuited the queue disables on a timeout
error in order to not have to wait on every queue when we already
know it will time out. However, this meant that we're not
properly stopping all the interrupts and napi contexts. This
changes queue disable to always call ionic_qcq_disable() and to
give it an argument to know when to not do the adminq request.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are a couple of error recovery paths that can come through
ionic_qcq_disable() without having set up the qcq, so we need
to make sure we have a valid qcq pointer before using it.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Clear our link check requested flag on an allocation error.
We end up dropping this link check request, but that should
be fine as our watchdog will come back a few seconds later
and request it again.
Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Check through our work list for additional items. This normally
will only have one item, but occasionally may have another
job waiting. There really is no need reschedule ourself here.
Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
The event notification queue is set up a little differently in the
NIC and so the notifyq q and cq descriptor structures need to be
contiguous, which got missed in an earlier patch that separated
out the q and cq descriptor allocations. That patch was aimed at
making the big tx and rx descriptor queue allocations easier to
manage - the notifyq is much smaller and doesn't need to be split.
This patch simply adds an if/else and slightly different code for
the notifyq descriptor allocation.
Fixes: ea5a8b09dc ("ionic: reduce contiguous memory allocation requirement")
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Via the OCELOT_MASK_MODE_REDIRECT flag put in the IS2 action vector, it
is possible to replace previous forwarding decisions with the port mask
installed in this rule.
I have studied Table 54 "MASK_MODE and PORT_MASK Combinations" from the
VSC7514 documentation and it appears to behave sanely when this rule is
installed in either lookup 0 or 1. Namely, a redirect in lookup 1 will
overwrite the forwarding decision taken by any entry in lookup 0.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The issue which led to the introduction of this check was that MAC_ETYPE
rules, such as filters on dst_mac and src_mac, would only match non-IP
frames. There is a knob in VCAP_S2_CFG which forces all IP frames to be
treated as non-IP, which is what we're currently doing if the user
requested a dst_mac filter, in order to maintain sanity.
But that knob is actually per IS2 lookup. And the good thing with
exposing the lookups to the user via tc chains is that we're now able to
offload MAC_ETYPE keys to one lookup, and IP keys to the other lookup.
So let's do that.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We were installing TCAM rules with the LOOKUP field as unmasked, meaning
that all entries were matching on all lookups. Now that lookups are
exposed as individual chains, let's make the LOOKUP explicit when
offloading TCAM entries.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
VCAP ES0 is an egress VCAP operating on all outgoing frames.
This patch added ES0 driver to support vlan push action of tc filter.
Usage:
tc filter add dev swp1 egress protocol 802.1Q flower indev swp0 skip_sw \
vlan_id 1 vlan_prio 1 action vlan push id 2 priority 2
Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
VCAP IS1 is a VCAP module which can filter on the most common L2/L3/L4
Ethernet keys, and modify the results of the basic QoS classification
and VLAN classification based on those flow keys.
There are 3 VCAP IS1 lookups, mapped over chains 10000, 11000 and 12000.
Currently the driver is hardcoded to use IS1_ACTION_TYPE_NORMAL half
keys.
Note that the VLAN_MANGLE has been omitted for now. In hardware, the
VCAP_IS1_ACT_VID_REPLACE_ENA field replaces the classified VLAN
(metadata associated with the frame) and not the VLAN from the header
itself. There are currently some issues which need to be addressed when
operating in standalone, or in bridge with vlan_filtering=0 modes,
because in those cases the switch ports have VLAN awareness disabled,
and changing the classified VLAN to anything other than the pvid causes
the packets to be dropped. Another issue is that on egress, we expect
port tagging to push the classified VLAN, but port tagging is disabled
in the modes mentioned above, so although the classified VLAN is
replaced, it is not visible in the packet transmitted by the switch.
Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For Ocelot switches, there are 2 ingress pipelines for flow offload
rules: VCAP IS1 (Ingress Classification) and IS2 (Security Enforcement).
IS1 and IS2 support different sets of actions. The pipeline order for a
packet on ingress is:
Basic classification -> VCAP IS1 -> VCAP IS2
Furthermore, IS1 is looked up 3 times, and IS2 is looked up twice (each
TCAM entry can be configured to match only on the first lookup, or only
on the second, or on both etc).
Because the TCAMs are completely independent in hardware, and because of
the fixed pipeline, we actually have very limited options when it comes
to offloading complex rules to them while still maintaining the same
semantics with the software data path.
This patch maps flow offload rules to ingress TCAMs according to a
predefined chain index number. There is going to be a script in
selftests that clarifies the usage model.
There is also an egress TCAM (VCAP ES0, the Egress Rewriter), which is
modeled on top of the default chain 0 of the egress qdisc, because it
doesn't have multiple lookups.
Suggested-by: Allan W. Nielsen <allan.nielsen@microchip.com>
Co-developed-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com>
Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since the mscc_ocelot_switch_lib is common between a pure switchdev and
a DSA driver, the procedure of retrieving a net_device for a certain
port index differs, as those are registered by their individual
front-ends.
Up to now that has been dealt with by always passing the port index to
the switch library, but now, we're going to need to work with net_device
pointers from the tc-flower offload, for things like indev, or mirred.
It is not desirable to refactor that, so let's make sure that the flower
offload core has the ability to translate between a net_device and a
port index properly.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
At this stage, the tc-flower offload of mscc_ocelot can only delegate
rules to the VCAP IS2 security enforcement block. These rules have, in
hardware, separate bits for policing and for overriding the destination
port mask and/or copying to the CPU. So it makes sense that we attempt
to expose some more of that low-level complexity instead of simply
choosing between a single type of action.
Something similar happens with the VCAP IS1 block, where the same action
can contain enable bits for VLAN classification and for QoS
classification at the same time.
So model the action structure after the hardware description, and let
the high-level ocelot_flower.c construct an action vector from multiple
tc actions.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use netif_msg_init() to process param settings
and use only the proper initialized value of
ei_local->msg_level for later processing;
Signed-off-by: Armin Wolf <W_Armin@gmx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some EtherAVB variants support internal clock delay configuration, which
can add larger delays than the delays that are typically supported by
the PHY (using an "rgmii-*id" PHY mode, and/or "[rt]xc-skew-ps"
properties).
Historically, the EtherAVB driver configured these delays based on the
"rgmii-*id" PHY mode. This caused issues with PHY drivers that
implement PHY internal delays properly[1]. Hence a backwards-compatible
workaround was added by masking the PHY mode[2].
Add proper support for explicit configuration of the MAC internal clock
delays using the new "[rt]x-internal-delay-ps" properties.
Fall back to the old handling if none of these properties is present.
[1] Commit bcf3440c6d ("net: phy: micrel: add phy-mode support for
the KSZ9031 PHY")
[2] Commit 9b23203c32 ("ravb: Mask PHY mode to avoid inserting
delays twice").
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, full delay handling is done in both the probe and resume
paths. Split it in two parts, so the resume path doesn't have to redo
the parsing part over and over again.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Calls to kzalloc() and kvzalloc() should be null-checked
in order to avoid any potential failures. In this case,
a potential null pointer dereference.
Fix this by adding null checks for _parse_attr_ and _flow_
right after allocation.
Addresses-Coverity-ID: 1497154 ("Dereference before null check")
Fixes: c620b77215 ("net/mlx5: Refactor tc flow attributes structure")
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
This code frees "shared_counter" and then dereferences on the next line
to get the error code.
Fixes: 1edae2335a ("net/mlx5e: CT: Use the same counter for both directions")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
When removing a flow from the slow path fdb, a flow attr struct is
allocated for the rule removal process. If the allocation fails the
code prints a warning message but continues with the removal flow
which include dereferencing a pointer which could be null.
Fix this by exiting the function in case the attr allocation failed.
Fixes: c620b77215 ("net/mlx5: Refactor tc flow attributes structure")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Ariel Levkovich <lariel@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Use the PCI device directly for dma accesses as non PCI device unlikely
support IOMMU and dma mappings.
Introduce and use helper routine to access DMA device.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Vu Pham <vuhuong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Set flow source as hint for local vport.
Signed-off-by: Hamdan Igbaria <hamdani@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Currently devlink eswitch ports are registered and unregistered by the
representor layer.
However it is better to register them at eswitch layer so that in future
user initiated command port add and delete commands can also
register/unregister devlink ports without depending on representor layer.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Vu Pham <vuhuong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
To register and unregister devlink ports when loading/unload
representors, refactor the code to helper functions.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Vu Pham <vuhuong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Currently only VF vports need egress ACL table.
Add a generic helper to check whether a vport need egress
ACL table or not.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Vu Pham <vuhuong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Currently only 256 vports can be supported as only 8 bits are
reserved for them and 8 bits are reserved for vhca_ids in
metadata reg c0. To support more than 256 vports, replace
vhca_id with a unique shorter 4-bit PF number which covers
upto 16 PF's. Use remaining 12 bits for vports ranging 1-4095.
This will continue to generate unique metadata even if
multiple PCI devices have same switch_id.
Signed-off-by: sunils <sunils@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Vu Pham <vuhuong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Skip the rule according to flow arrival source, in case of RX and the
source is local port skip and in case of TX and the source is uplink
skip, we get this info according to the flow source hint we get from
upper layers when creating the rule.
This is needed because for example in case of FDB table which has a TX
and RX tables and we are inserting a rule with an encap action which
is only a TX action, in this case rule will fail on RX, so we can rely
on the flow source hint and skip RX in such case.
Until now we relied on metadata regc_0 that upper layer mapped the
port in the regc_0, but the problem is that upper layer did not always
use regc_0 for port mapping, so now we added support to flow source
hint which upper layers will pass to SW steering when creating a rule.
Signed-off-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Hamdan Igbaria <hamdani@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Instead of getting the tag in each function, call the builder
directly with the tag. This will allow to use the same function
for building the tag and the bitmask.
Signed-off-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The misc3 variable is used only once and can be dropped.
Signed-off-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
When we create a matcher we check that all fields are consumed.
There is no need for this specific check. This keeps the STE
builder functions simple and clean.
Signed-off-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Mask validity for ste builders is checked by mlx5dr_ste_build_pre_check
during matcher creation.
It already checks the mask value of source_vport, so removing
this duplicated check.
Also, moving there the check of source_eswitch_owner_vhca_id mask.
Signed-off-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Validity check is done by reading the next lu_type from the STE,
this check can be replaced by checking the refcount.
This will make the check independent on internal STE structure.
Signed-off-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>