The netronome nfp driver use PAGE_SIZE when xdp_prog is set, but
xdp.data_hard_start begins at offset NFP_NET_RX_BUF_HEADROOM.
Thus, adjust for this when setting xdp.frame_sz, as it counts
from data_hard_start.
When doing XDP_TX this driver is smart and instead of a full DMA-map
does a DMA-sync on with packet length. As xdp_adjust_tail can now
grow packet length, add checks to make sure that grow size is within
the DMA-mapped size.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/bpf/158945342911.97035.11214251236208648808.stgit@firesoul
Fix to return negative error code -ENOMEM from the kzalloc() error
handling case instead of 0, as done elsewhere in this function.
Fixes: 174ab544e3 ("nfp: abm: add cls_u32 offload for simple band classification")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This patch reverts the folowing commits:
commit 064ff66e2b
"bonding: add missing netdev_update_lockdep_key()"
commit 53d374979e
"net: avoid updating qdisc_xmit_lock_key in netdev_update_lockdep_key()"
commit 1f26c0d3d2
"net: fix kernel-doc warning in <linux/netdevice.h>"
commit ab92d68fc2
"net: core: add generic lockdep keys"
but keeps the addr_list_lock_key because we still lock
addr_list_lock nestedly on stack devices, unlikely xmit_lock
this is safe because we don't take addr_list_lock on any fast
path.
Reported-and-tested-by: syzbot+aaa6fa4949cc5d9b7b25@syzkaller.appspotmail.com
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In function nfp_abm_vnic_set_mac, pointer nsp is allocated by nfp_nsp_open.
But when nfp_nsp_has_hwinfo_lookup fail, the pointer is not released,
which can lead to a memory leak bug. Fix this issue by adding
nfp_nsp_close(nsp) in the error path.
Fixes: f6e71efdf9 ("nfp: abm: look up MAC addresses via management FW")
Signed-off-by: Qiushi Wu <wu000273@umn.edu>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change nfp driver to use globally defined kernel version.
Reported-by: Borislav Petkov <bp@suse.de>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It may be up to the driver (in case ANY HW stats is passed) to select
which type of HW stats he is going to use. Add an infrastructure to
expose this information to user.
$ tc filter add dev enp3s0np1 ingress proto ip handle 1 pref 1 flower dst_ip 192.168.1.1 action drop
$ tc -s filter show dev enp3s0np1 ingress
filter protocol ip pref 1 flower chain 0
filter protocol ip pref 1 flower chain 0 handle 0x1
eth_type ipv4
dst_ip 192.168.1.1
in_hw in_hw_count 2
action order 1: gact action drop
random type none pass val 0
index 1 ref 1 bind 1 installed 10 sec used 10 sec
Action statistics:
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
used_hw_stats immediate <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Overlapping header include additions in macsec.c
A bug fix in 'net' overlapping with the removal of 'version'
string in ena_netdev.c
Overlapping test additions in selftests Makefile
Overlapping PCI ID table adjustments in iwlwifi driver.
Signed-off-by: David S. Miller <davem@davemloft.net>
The nfp driver uses ``fw.bundle_id`` to represent a unique identifier of the
entire firmware bundle.
A future change is going to introduce a similar notion in the ice
driver, so promote ``fw.bundle_id`` into a generic version now.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
NFP flower offload uses delayed stats. Kernel recently gained
the ability to specify stats types. Make nfp accept DELAYED
stats, not just the catch all "any".
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
flow_action_hw_stats_types_check() helper takes one of the
FLOW_ACTION_HW_STATS_*_BIT values as input. If we align
the arguments to the opening bracket of the helper there
is no way to call this helper and stay under 80 characters.
Remove the "types" part from the new flow_action helpers
and enum values.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since snprintf() returns the would-be-output size instead of the
actual output size, the succeeding calls may go beyond the given
buffer limit. Fix it by replacing with scnprintf().
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Cc: "David S . Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: oss-drivers@netronome.com
To: netdev@vger.kernel.org
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce flow_action_basic_hw_stats_types_check() helper and use it
in drivers. That sanitizes the drivers which do not have support
for action HW stats types.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the newly added pci_get_dsn() function for obtaining the 64-bit
Device Serial Number in the nfp6000_read_serial and
nfp_6000_get_interface functions.
pci_get_dsn() reports the Device Serial number as a u64 value created by
combining two pci_read_config_dword functions. The lower 16 bits
represent the device interface value, and the next 48 bits represent the
serial value. Use put_unaligned_be32 and put_unaligned_be16 to convert
the serial value portion into a Big Endian formatted serial u8 array.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Set ethtool_ops->supported_coalesce_params to let
the core reject unsupported coalescing parameters.
This driver correctly rejects all unsupported parameters.
No functional changes.
v3: adjust commit message for new error code and member name
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:
struct foo {
int stuff;
struct boo array[];
};
By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.
Also, notice that, dynamic memory allocations won't be affected by
this change:
"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]
This issue was found with the help of Coccinelle.
[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 7649773293 ("cxgb3/l2t: Fix undefined behaviour")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking updates from David Miller:
1) Add WireGuard
2) Add HE and TWT support to ath11k driver, from John Crispin.
3) Add ESP in TCP encapsulation support, from Sabrina Dubroca.
4) Add variable window congestion control to TIPC, from Jon Maloy.
5) Add BCM84881 PHY driver, from Russell King.
6) Start adding netlink support for ethtool operations, from Michal
Kubecek.
7) Add XDP drop and TX action support to ena driver, from Sameeh
Jubran.
8) Add new ipv4 route notifications so that mlxsw driver does not have
to handle identical routes itself. From Ido Schimmel.
9) Add BPF dynamic program extensions, from Alexei Starovoitov.
10) Support RX and TX timestamping in igc, from Vinicius Costa Gomes.
11) Add support for macsec HW offloading, from Antoine Tenart.
12) Add initial support for MPTCP protocol, from Christoph Paasch,
Matthieu Baerts, Florian Westphal, Peter Krystad, and many others.
13) Add Octeontx2 PF support, from Sunil Goutham, Geetha sowjanya, Linu
Cherian, and others.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1469 commits)
net: phy: add default ARCH_BCM_IPROC for MDIO_BCM_IPROC
udp: segment looped gso packets correctly
netem: change mailing list
qed: FW 8.42.2.0 debug features
qed: rt init valid initialization changed
qed: Debug feature: ilt and mdump
qed: FW 8.42.2.0 Add fw overlay feature
qed: FW 8.42.2.0 HSI changes
qed: FW 8.42.2.0 iscsi/fcoe changes
qed: Add abstraction for different hsi values per chip
qed: FW 8.42.2.0 Additional ll2 type
qed: Use dmae to write to widebus registers in fw_funcs
qed: FW 8.42.2.0 Parser offsets modified
qed: FW 8.42.2.0 Queue Manager changes
qed: FW 8.42.2.0 Expose new registers and change windows
qed: FW 8.42.2.0 Internal ram offsets modifications
MAINTAINERS: Add entry for Marvell OcteonTX2 Physical Function driver
Documentation: net: octeontx2: Add RVU HW and drivers overview
octeontx2-pf: ethtool RSS config support
octeontx2-pf: Add basic ethtool support
...
ioremap has provided non-cached semantics by default since the Linux 2.6
days, so remove the additional ioremap_nocache interface.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Pull networking fixes from David Miller:
1) Several nf_flow_table_offload fixes from Pablo Neira Ayuso,
including adding a missing ipv6 match description.
2) Several heap overflow fixes in mwifiex from qize wang and Ganapathi
Bhat.
3) Fix uninit value in bond_neigh_init(), from Eric Dumazet.
4) Fix non-ACPI probing of nxp-nci, from Stephan Gerhold.
5) Fix use after free in tipc_disc_rcv(), from Tuong Lien.
6) Enforce limit of 33 tail calls in mips and riscv JIT, from Paul
Chaignon.
7) Multicast MAC limit test is off by one in qede, from Manish Chopra.
8) Fix established socket lookup race when socket goes from
TCP_ESTABLISHED to TCP_LISTEN, because there lacks an intervening
RCU grace period. From Eric Dumazet.
9) Don't send empty SKBs from tcp_write_xmit(), also from Eric Dumazet.
10) Fix active backup transition after link failure in bonding, from
Mahesh Bandewar.
11) Avoid zero sized hash table in gtp driver, from Taehee Yoo.
12) Fix wrong interface passed to ->mac_link_up(), from Russell King.
13) Fix DSA egress flooding settings in b53, from Florian Fainelli.
14) Memory leak in gmac_setup_txqs(), from Navid Emamdoost.
15) Fix double free in dpaa2-ptp code, from Ioana Ciornei.
16) Reject invalid MTU values in stmmac, from Jose Abreu.
17) Fix refcount leak in error path of u32 classifier, from Davide
Caratti.
18) Fix regression causing iwlwifi firmware crashes on boot, from Anders
Kaseorg.
19) Fix inverted return value logic in llc2 code, from Chan Shu Tak.
20) Disable hardware GRO when XDP is attached to qede, frm Manish
Chopra.
21) Since we encode state in the low pointer bits, dst metrics must be
at least 4 byte aligned, which is not necessarily true on m68k. Add
annotations to fix this, from Geert Uytterhoeven.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (160 commits)
sfc: Include XDP packet headroom in buffer step size.
sfc: fix channel allocation with brute force
net: dst: Force 4-byte alignment of dst_metrics
selftests: pmtu: fix init mtu value in description
hv_netvsc: Fix unwanted rx_table reset
net: phy: ensure that phy IDs are correctly typed
mod_devicetable: fix PHY module format
qede: Disable hardware gro when xdp prog is installed
net: ena: fix issues in setting interrupt moderation params in ethtool
net: ena: fix default tx interrupt moderation interval
net/smc: unregister ib devices in reboot_event
net: stmmac: platform: Fix MDIO init for platforms without PHY
llc2: Fix return statement of llc_stat_ev_rx_null_dsap_xid_c (and _test_c)
net: hisilicon: Fix a BUG trigered by wrong bytes_compl
net: dsa: ksz: use common define for tag len
s390/qeth: don't return -ENOTSUPP to userspace
s390/qeth: fix promiscuous mode after reset
s390/qeth: handle error due to unsupported transport mode
cxgb4: fix refcount init for TC-MQPRIO offload
tc-testing: initial tdc selftests for cls_u32
...
The simple RX resync strategy controlled by the kernel does not
guarantee as good results as if the device helps by detecting
the potential record boundaries and keeping track of them.
We've called this strategy stream scan in the tls-offload doc.
Implement this strategy for the NFP. The device sends a request
for record boundary confirmation, which is then recorded in
per-TLS socket state and responded to once record is reached.
Because the device keeps track of records passing after the
request was sent the response is not as latency sensitive as
when kernel just tries to tell the device the information
about the next record.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make nfp_net_parse_meta() take a packet pointer and return
a drop/no drop decision. Right now it returns the end of
metadata and caller compares it to the packet pointer.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Both pre-tunnel match rules and flow merge functions parse compiled
match/action fields for validation.
Update these validation functions to include IPv6 match and action fields.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
FW sends an update of IPv6 tunnels that are active in a given period. Use
this information to update the kernel table so that neighbour entries do
not time out when active on the NIC.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A notifier is used to track route changes in the kernel. If a change is
made to a route that is offloaded to fw then an update is sent to the NIC.
The driver tracks all routes that are offloaded to determine if a kernel
change is of interest.
Extend the notifier to track IPv6 route changes and create a new list that
stores offloaded IPv6 routes. Modify the IPv4 route helper functions to
accept varying address lengths. This way, the same core functions can be
used to handle IPv4 and IPv6.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When fw does not know the next hop for an IPv6 tunnel, it sends a request
to the driver.
Handle this request by doing a route lookup on the IPv6 address and
offloading the next hop to the fw neighbour table.
Similar functions already exist to handle IPv4 no neighbour requests. To
avoid confusion, append these functions with the _ipv4 tag. There is no
change in functionality with this.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The IPv4 set tunnel action allows the setting of tunnel metadata such as
the TTL and ToS values. The pre-tunnel action includes the destination IP
address and is used to calculate the next hop from from the neighbour
table.
Much of the IPv4 tunnel actions can be reused for IPv6 tunnels. Change the
names of associated functions and structs to remove the IPv4 identifier
and make minor modifcations to support IPv6 tunnel actions.
Ensure the pre-tunnel action contains the IPv6 address along with an
identifying flag when an IPv6 tunnel action is required.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fw requires a list of IPv6 addresses that are used as tunnel endpoints to
enable correct decap of tunneled packets.
Store a list of IPv6 endpoints used in rules with a ref counter to track
how many times it is in use. Offload the entire list any time a new IPv6
address is added or when an address is removed (ref count is 0).
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
IPv6 tunnel matches are now supported by firmware. Modify the NFP driver
to compile these match rules. IPv6 matches are handled similar to IPv4
tunnels with the difference the address length. The type of tunnel is
indicated by the same bitmap that is used in IPv4 with an extra bit
signifying that the IPv6 variation should be used.
Only compile IPv6 tunnel matches when the fw features symbol indicated
that they are compatible with the currently loaded fw.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
IPv4 UDP and GRE tunnel match rule compile helpers share functions for
compiling fields such as IP addresses. However, they handle fields such
tunnel IDs differently.
Create new helper functions for compiling GRE and UDP tunnel key data.
This is in preparation for supporting IPv6 tunnels where these new
functions can be reused.
This patch does not change functionality.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In kernel 5.1, the flow offload API was introduced along with a helper
function to extract the flow_rule from the TC offload struct. Each of the
match helper functions are passed the offload struct and extract the flow
rule to a local variable.
Simplify the code while also removing the extra compat and local variable
calls by extracting the rule once in the main match handler, and passing
a reference to the rule direct to each helper.
This patch does not change driver functionality.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As flower rules are added, they are given a stats ID based on the number
of rules that can be supported in firmware. Only after the initial
allocation of all available IDs does the driver begin to reuse those that
have been released.
The initial allocation of IDs was modified to account for multiple memory
units on the offloaded device. However, this introduced a bug whereby the
counter that controls the IDs could be decremented before the ID was
assigned (where it is further decremented). This means that the stats ID
could be assigned as -1/0xfffffff which is out of range.
Fix this by only decrementing the main counter after the current ID has
been assigned.
Fixes: 467322e262 ("nfp: flower: support multiple memory units for filter offloads")
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit 78beef629f ("nfp: abm: fix memory leak in
nfp_abm_u32_knode_replace").
The quoted commit does not fix anything and resulted in a bogus
CVE-2019-19076.
If match is NULL then it is known there is no matching entry in
list, hence, calling nfp_abm_u32_knode_delete() is pointless.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Device stats are currently hard coded in the PCI BAR0 layout.
Add a ability to read them from the TLV area instead.
Names for the stats are maintained by the driver, and their
meaning documented. This allows us to more easily add and
remove device stats.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace all the occurrences of FIELD_SIZEOF() with sizeof_field() except
at places where these are defined. Later patches will remove the unused
definition of FIELD_SIZEOF().
This patch is generated using following script:
EXCLUDE_FILES="include/linux/stddef.h|include/linux/kernel.h"
git grep -l -e "\bFIELD_SIZEOF\b" | while read file;
do
if [[ "$file" =~ $EXCLUDE_FILES ]]; then
continue
fi
sed -i -e 's/\bFIELD_SIZEOF\b/sizeof_field/g' $file;
done
Signed-off-by: Pankaj Bharadiya <pankaj.laxminarayan.bharadiya@intel.com>
Link: https://lore.kernel.org/r/20190924105839.110713-3-pankaj.laxminarayan.bharadiya@intel.com
Co-developed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: David Miller <davem@davemloft.net> # for net
rhashtable_lookup_fast() internally calls rcu_read_lock() then,
calls rhashtable_lookup(). So if rcu_read_lock() is already held,
rhashtable_lookup() is enough.
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
92117d8443 ("bpf: fix refcnt overflow") turned refcounting of bpf_map into
potentially failing operation, when refcount reaches BPF_MAX_REFCNT limit
(32k). Due to using 32-bit counter, it's possible in practice to overflow
refcounter and make it wrap around to 0, causing erroneous map free, while
there are still references to it, causing use-after-free problems.
But having a failing refcounting operations are problematic in some cases. One
example is mmap() interface. After establishing initial memory-mapping, user
is allowed to arbitrarily map/remap/unmap parts of mapped memory, arbitrarily
splitting it into multiple non-contiguous regions. All this happening without
any control from the users of mmap subsystem. Rather mmap subsystem sends
notifications to original creator of memory mapping through open/close
callbacks, which are optionally specified during initial memory mapping
creation. These callbacks are used to maintain accurate refcount for bpf_map
(see next patch in this series). The problem is that open() callback is not
supposed to fail, because memory-mapped resource is set up and properly
referenced. This is posing a problem for using memory-mapping with BPF maps.
One solution to this is to maintain separate refcount for just memory-mappings
and do single bpf_map_inc/bpf_map_put when it goes from/to zero, respectively.
There are similar use cases in current work on tcp-bpf, necessitating extra
counter as well. This seems like a rather unfortunate and ugly solution that
doesn't scale well to various new use cases.
Another approach to solve this is to use non-failing refcount_t type, which
uses 32-bit counter internally, but, once reaching overflow state at UINT_MAX,
stays there. This utlimately causes memory leak, but prevents use after free.
But given refcounting is not the most performance-critical operation with BPF
maps (it's not used from running BPF program code), we can also just switch to
64-bit counter that can't overflow in practice, potentially disadvantaging
32-bit platforms a tiny bit. This simplifies semantics and allows above
described scenarios to not worry about failing refcount increment operation.
In terms of struct bpf_map size, we are still good and use the same amount of
space:
BEFORE (3 cache lines, 8 bytes of padding at the end):
struct bpf_map {
const struct bpf_map_ops * ops __attribute__((__aligned__(64))); /* 0 8 */
struct bpf_map * inner_map_meta; /* 8 8 */
void * security; /* 16 8 */
enum bpf_map_type map_type; /* 24 4 */
u32 key_size; /* 28 4 */
u32 value_size; /* 32 4 */
u32 max_entries; /* 36 4 */
u32 map_flags; /* 40 4 */
int spin_lock_off; /* 44 4 */
u32 id; /* 48 4 */
int numa_node; /* 52 4 */
u32 btf_key_type_id; /* 56 4 */
u32 btf_value_type_id; /* 60 4 */
/* --- cacheline 1 boundary (64 bytes) --- */
struct btf * btf; /* 64 8 */
struct bpf_map_memory memory; /* 72 16 */
bool unpriv_array; /* 88 1 */
bool frozen; /* 89 1 */
/* XXX 38 bytes hole, try to pack */
/* --- cacheline 2 boundary (128 bytes) --- */
atomic_t refcnt __attribute__((__aligned__(64))); /* 128 4 */
atomic_t usercnt; /* 132 4 */
struct work_struct work; /* 136 32 */
char name[16]; /* 168 16 */
/* size: 192, cachelines: 3, members: 21 */
/* sum members: 146, holes: 1, sum holes: 38 */
/* padding: 8 */
/* forced alignments: 2, forced holes: 1, sum forced holes: 38 */
} __attribute__((__aligned__(64)));
AFTER (same 3 cache lines, no extra padding now):
struct bpf_map {
const struct bpf_map_ops * ops __attribute__((__aligned__(64))); /* 0 8 */
struct bpf_map * inner_map_meta; /* 8 8 */
void * security; /* 16 8 */
enum bpf_map_type map_type; /* 24 4 */
u32 key_size; /* 28 4 */
u32 value_size; /* 32 4 */
u32 max_entries; /* 36 4 */
u32 map_flags; /* 40 4 */
int spin_lock_off; /* 44 4 */
u32 id; /* 48 4 */
int numa_node; /* 52 4 */
u32 btf_key_type_id; /* 56 4 */
u32 btf_value_type_id; /* 60 4 */
/* --- cacheline 1 boundary (64 bytes) --- */
struct btf * btf; /* 64 8 */
struct bpf_map_memory memory; /* 72 16 */
bool unpriv_array; /* 88 1 */
bool frozen; /* 89 1 */
/* XXX 38 bytes hole, try to pack */
/* --- cacheline 2 boundary (128 bytes) --- */
atomic64_t refcnt __attribute__((__aligned__(64))); /* 128 8 */
atomic64_t usercnt; /* 136 8 */
struct work_struct work; /* 144 32 */
char name[16]; /* 176 16 */
/* size: 192, cachelines: 3, members: 21 */
/* sum members: 154, holes: 1, sum holes: 38 */
/* forced alignments: 2, forced holes: 1, sum forced holes: 38 */
} __attribute__((__aligned__(64)));
This patch, while modifying all users of bpf_map_inc, also cleans up its
interface to match bpf_map_put with separate operations for bpf_map_inc and
bpf_map_inc_with_uref (to match bpf_map_put and bpf_map_put_with_uref,
respectively). Also, given there are no users of bpf_map_inc_not_zero
specifying uref=true, remove uref flag and default to uref=false internally.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20191117172806.2195367-2-andriin@fb.com
The only slightly tricky merge conflict was the netdevsim because the
mutex locking fix overlapped a lot of driver reload reorganization.
The rest were (relatively) trivial in nature.
Signed-off-by: David S. Miller <davem@davemloft.net>
Some interface types could be nested.
(VLAN, BONDING, TEAM, MACSEC, MACVLAN, IPVLAN, VIRT_WIFI, VXLAN, etc..)
These interface types should set lockdep class because, without lockdep
class key, lockdep always warn about unexisting circular locking.
In the current code, these interfaces have their own lockdep class keys and
these manage itself. So that there are so many duplicate code around the
/driver/net and /net/.
This patch adds new generic lockdep keys and some helper functions for it.
This patch does below changes.
a) Add lockdep class keys in struct net_device
- qdisc_running, xmit, addr_list, qdisc_busylock
- these keys are used as dynamic lockdep key.
b) When net_device is being allocated, lockdep keys are registered.
- alloc_netdev_mqs()
c) When net_device is being free'd llockdep keys are unregistered.
- free_netdev()
d) Add generic lockdep key helper function
- netdev_register_lockdep_key()
- netdev_unregister_lockdep_key()
- netdev_update_lockdep_key()
e) Remove unnecessary generic lockdep macro and functions
f) Remove unnecessary lockdep code of each interfaces.
After this patch, each interface modules don't need to maintain
their lockdep keys.
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Don't populate the array exp_mask on the stack but instead make it
static. Makes the object code smaller by 224 bytes.
Before:
text data bss dec hex filename
77832 2290 0 80122 138fa ethernet/netronome/nfp/bpf/jit.o
After:
text data bss dec hex filename
77544 2354 0 79898 1381a ethernet/netronome/nfp/bpf/jit.o
(gcc version 9.2.1, amd64)
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Add tracing of device-related interaction to aid performance
analysis, especially around resync:
tls:tls_device_offload_set
tls:tls_device_rx_resync_send
tls:tls_device_rx_resync_nh_schedule
tls:tls_device_rx_resync_nh_delay
tls:tls_device_tx_resync_req
tls:tls_device_tx_resync_send
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking fixes from David Miller:
1) Sanity check URB networking device parameters to avoid divide by
zero, from Oliver Neukum.
2) Disable global multicast filter in NCSI, otherwise LLDP and IPV6
don't work properly. Longer term this needs a better fix tho. From
Vijay Khemka.
3) Small fixes to selftests (use ping when ping6 is not present, etc.)
from David Ahern.
4) Bring back rt_uses_gateway member of struct rtable, it's semantics
were not well understood and trying to remove it broke things. From
David Ahern.
5) Move usbnet snaity checking, ignore endpoints with invalid
wMaxPacketSize. From Bjørn Mork.
6) Missing Kconfig deps for sja1105 driver, from Mao Wenan.
7) Various small fixes to the mlx5 DR steering code, from Alaa Hleihel,
Alex Vesker, and Yevgeny Kliteynik
8) Missing CAP_NET_RAW checks in various places, from Ori Nimron.
9) Fix crash when removing sch_cbs entry while offloading is enabled,
from Vinicius Costa Gomes.
10) Signedness bug fixes, generally in looking at the result given by
of_get_phy_mode() and friends. From Dan Crapenter.
11) Disable preemption around BPF_PROG_RUN() calls, from Eric Dumazet.
12) Don't create VRF ipv6 rules if ipv6 is disabled, from David Ahern.
13) Fix quantization code in tcp_bbr, from Kevin Yang.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (127 commits)
net: tap: clean up an indentation issue
nfp: abm: fix memory leak in nfp_abm_u32_knode_replace
tcp: better handle TCP_USER_TIMEOUT in SYN_SENT state
sk_buff: drop all skb extensions on free and skb scrubbing
tcp_bbr: fix quantization code to not raise cwnd if not probing bandwidth
mlxsw: spectrum_flower: Fail in case user specifies multiple mirror actions
Documentation: Clarify trap's description
mlxsw: spectrum: Clear VLAN filters during port initialization
net: ena: clean up indentation issue
NFC: st95hf: clean up indentation issue
net: phy: micrel: add Asym Pause workaround for KSZ9021
net: socionext: ave: Avoid using netdev_err() before calling register_netdev()
ptp: correctly disable flags on old ioctls
lib: dimlib: fix help text typos
net: dsa: microchip: Always set regmap stride to 1
nfp: flower: fix memory leak in nfp_flower_spawn_vnic_reprs
nfp: flower: prevent memory leak in nfp_flower_spawn_phy_reprs
net/sched: Set default of CONFIG_NET_TC_SKB_EXT to N
vrf: Do not attempt to create IPv6 mcast rule if IPv6 is disabled
net: sched: sch_sfb: don't call qdisc_put() while holding tree lock
...
In nfp_abm_u32_knode_replace if the allocation for match fails it should
go to the error handling instead of returning. Updated other gotos to
have correct errno returned, too.
Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In nfp_flower_spawn_vnic_reprs in the loop if initialization or the
allocations fail memory is leaked. Appropriate releases are added.
Fixes: b945245297 ("nfp: flower: add per repr private data for LAG offload")
Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In nfp_flower_spawn_phy_reprs, in the for loop over eth_tbl if any of
intermediate allocations or initializations fail memory is leaked.
requiered releases are added.
Fixes: b945245297 ("nfp: flower: add per repr private data for LAG offload")
Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- add modpost warn exported symbols marked as 'static' because 'static'
and EXPORT_SYMBOL is an odd combination
- break the build early if gold linker is used
- optimize the Bison rule to produce .c and .h files by a single
pattern rule
- handle PREEMPT_RT in the module vermagic and UTS_VERSION
- warn CONFIG options leaked to the user-space except existing ones
- make single targets work properly
- rebuild modules when module linker scripts are updated
- split the module final link stage into scripts/Makefile.modfinal
- fix the missed error code in merge_config.sh
- improve the error message displayed on the attempt of the O= build
in unclean source tree
- remove 'clean-dirs' syntax
- disable -Wimplicit-fallthrough warning for Clang
- add CONFIG_CC_OPTIMIZE_FOR_SIZE_O3 for ARC
- remove ARCH_{CPP,A,C}FLAGS variables
- add $(BASH) to run bash scripts
- change *CFLAGS_<basetarget>.o to take the relative path to $(obj)
instead of the basename
- stop suppressing Clang's -Wunused-function warnings when W=1
- fix linux/export.h to avoid genksyms calculating CRC of trimmed
exported symbols
- misc cleanups
-----BEGIN PGP SIGNATURE-----
iQJSBAABCgA8FiEEbmPs18K1szRHjPqEPYsBB53g2wYFAl1+OnoeHHlhbWFkYS5t
YXNhaGlyb0Bzb2Npb25leHQuY29tAAoJED2LAQed4NsGoKEQAKcid9lDacMe5KWT
4Ic93hANMFKZ9Qy8WoxivnOr1a93NcloZ0Bhka96QUt7hYUkLmDCs99eMbxKuMfP
m/ViHepojOBPzq+VtAGWOiIyPMCA7XDrTPph4wcPDKeOURTreK1PZ20fxDoAR4to
+qaqKZJGdRcNf2DpJN1yIosz8Wj0Sa2LQrRi9jgUHi3bzgvLfL7P9WM2xyZMggAc
GaSktCEFL0UzMFlMpYyDrKh2EV6ryOnN8+bVAKbmWP89tuU3njutycKdWOoL+bsj
tH2kjFThxQyIcZGNHS1VzNunYAFE2q5nj2q47O1EDN6sjTYUoRn5cHwPam6x3Kly
NH88xDEtJ7sUUc9GZEIXADWWD0f08QIhAH5x+jxFg3529lNgyrNHRSQ2XceYNAnG
i/GnMJ0EhODOFKusXw7sNlWFKtukep+8/pwnvfTXWQu6plEm5EQ3a3RL5SESubVo
mHzXsQDFCE0x/UrsJxEAww+3YO3pQEelfVi74W9z0cckpbRF8FuUq/69ltOT15l4
X+gCz80lXMWBKw/kNoR4GQoAJo3KboMEociawwoj72HXEHTPLJnCdUOsAf3n+opj
xuz/UPZ4WYSgKdnbmmDbJ+1POA1NqtARZZXpMVyKVVCOiLafbJkLQYwLKEpE2mOO
TP9igzP1i3/jPWec8cJ6Fa8UwuGh
=VGqV
-----END PGP SIGNATURE-----
Merge tag 'kbuild-v5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild updates from Masahiro Yamada:
- add modpost warn exported symbols marked as 'static' because 'static'
and EXPORT_SYMBOL is an odd combination
- break the build early if gold linker is used
- optimize the Bison rule to produce .c and .h files by a single
pattern rule
- handle PREEMPT_RT in the module vermagic and UTS_VERSION
- warn CONFIG options leaked to the user-space except existing ones
- make single targets work properly
- rebuild modules when module linker scripts are updated
- split the module final link stage into scripts/Makefile.modfinal
- fix the missed error code in merge_config.sh
- improve the error message displayed on the attempt of the O= build in
unclean source tree
- remove 'clean-dirs' syntax
- disable -Wimplicit-fallthrough warning for Clang
- add CONFIG_CC_OPTIMIZE_FOR_SIZE_O3 for ARC
- remove ARCH_{CPP,A,C}FLAGS variables
- add $(BASH) to run bash scripts
- change *CFLAGS_<basetarget>.o to take the relative path to $(obj)
instead of the basename
- stop suppressing Clang's -Wunused-function warnings when W=1
- fix linux/export.h to avoid genksyms calculating CRC of trimmed
exported symbols
- misc cleanups
* tag 'kbuild-v5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (63 commits)
genksyms: convert to SPDX License Identifier for lex.l and parse.y
modpost: use __section in the output to *.mod.c
modpost: use MODULE_INFO() for __module_depends
export.h, genksyms: do not make genksyms calculate CRC of trimmed symbols
export.h: remove defined(__KERNEL__), which is no longer needed
kbuild: allow Clang to find unused static inline functions for W=1 build
kbuild: rename KBUILD_ENABLE_EXTRA_GCC_CHECKS to KBUILD_EXTRA_WARN
kbuild: refactor scripts/Makefile.extrawarn
merge_config.sh: ignore unwanted grep errors
kbuild: change *FLAGS_<basetarget>.o to take the path relative to $(obj)
modpost: add NOFAIL to strndup
modpost: add guid_t type definition
kbuild: add $(BASH) to run scripts with bash-extension
kbuild: remove ARCH_{CPP,A,C}FLAGS
kbuild,arc: add CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE_O3 for ARC
kbuild: Do not enable -Wimplicit-fallthrough for clang for now
kbuild: clean up subdir-ymn calculation in Makefile.clean
kbuild: remove unneeded '+' marker from cmd_clean
kbuild: remove clean-dirs syntax
kbuild: check clean srctree even earlier
...
The PluDevice register provides the authoritative chip model/revision.
Since the model number is purely used for reporting purposes, follow
the hardware team convention of subtracting 0x10 from the PluDevice
register to obtain the chip model/revision number.
Suggested-by: Francois H. Theron <francois.theron@netronome.com>
Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the 'app_fw_from_flash' HWinfo key is invalid, set the
'fw_load_policy' devlink parameter value to unknown.
Suggested-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixed the incorrect prefix for the 'nfp_fw_load' function.
Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for the 'reset_dev_on_drv_probe' devlink parameter. The
reset control policy is controlled by the 'abi_drv_reset' hwinfo key.
Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for the 'fw_load_policy' devlink parameter. The FW load
policy is controlled by the 'app_fw_from_flash' hwinfo key.
Remap the values from devlink to the hwinfo key and back.
Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Register devlink parameters for driver use. Subsequent patches will add
support for specific parameters.
In order to support devlink parameters, the management firmware needs to
be able to lookup and set hwinfo keys.
Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The firmware reset and loading policies can be controlled with the
combination of three hwinfo keys, 'abi_drv_reset', 'abi_drv_load_ifc'
and 'app_fw_from_flash'.
'app_fw_from_flash' defines which firmware should take precedence,
'Disk', 'Flash' or the 'Preferred' firmware. When 'Preferred'
is selected, the management firmware makes the decision on which
firmware will be loaded by comparing versions of the flash firmware
and the host supplied firmware.
'abi_drv_reset' defines when the driver should reset the firmware when
the driver is probed, either 'Disk' if firmware was found on disk,
'Always' reset or 'Never' reset. Note that the device is always reset
on driver unload if firmware was loaded when the driver was probed.
'abi_drv_load_ifc' defines a list of PF devices allowed to load FW on
the device.
Furthermore, we limit the cases to where the driver will unload firmware
again when the driver is removed to only when firmware was loaded by the
driver and only if this particular device was the only one that could
have loaded firmware. This is needed to avoid firmware being removed
while in use on multi-host platforms.
Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for the NSP HWinfo set command. This closely follows the
HWinfo lookup command.
Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are cases where we want to read a hwinfo entry from the NFP, and
if it doesn't exist, use a default value instead.
To support this, we must silence warning/error messages when the hwinfo
entry doesn't exist since this is a valid use case. The NSP command
structure provides the ability to silence command errors, in which case
the caller should log any command errors appropriately. Protocol errors
are unaffected by this.
Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for the simple command that indicates whether application
firmware is loaded.
Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Flower control message replies are handled in different locations. The truly
high priority replies are handled in the BH (tasklet) context, while the
remaining replies are handled in a predefined Linux work queue. The work
queue handler orders replies into high and low priority groups, and always
start servicing the high priority replies within the received batch first.
Reply Type: Rtnl Lock: Handler:
CMSG_TYPE_PORT_MOD no BH tasklet (mtu)
CMSG_TYPE_TUN_NEIGH no BH tasklet
CMSG_TYPE_FLOW_STATS no BH tasklet
CMSG_TYPE_PORT_REIFY no WQ high
CMSG_TYPE_PORT_MOD yes WQ high (link/mtu)
CMSG_TYPE_MERGE_HINT yes WQ low
CMSG_TYPE_NO_NEIGH no WQ low
CMSG_TYPE_ACTIVE_TUNS no WQ low
CMSG_TYPE_QOS_STATS no WQ low
CMSG_TYPE_LAG_CONFIG no WQ low
A subset of control messages can block waiting for an rtnl lock (from both
work queue priority groups). The rtnl lock is heavily contended for by
external processes such as systemd-udevd, systemd-network and libvirtd,
especially during netdev creation, such as when flower VFs and representors
are instantiated.
Kernel netlink instrumentation shows that external processes (such as
systemd-udevd) often use successive rtnl_trylock() sequences, which can result
in an rtnl_lock() blocked control message to starve for longer periods of time
during rtnl lock contention, i.e. netdev creation.
In the current design a single blocked control message will block the entire
work queue (both priorities), and introduce a latency which is
nondeterministic and dependent on system wide rtnl lock usage.
In some extreme cases, one blocked control message at exactly the wrong time,
just before the maximum number of VFs are instantiated, can block the work
queue for long enough to prevent VF representor REIFY replies from getting
handled in time for the 40ms timeout.
The firmware will deliver the total maximum number of REIFY message replies in
around 300us.
Only REIFY and MTU update messages require replies within a timeout period (of
40ms). The MTU-only updates are already done directly in the BH (tasklet)
handler.
Move the REIFY handler down into the BH (tasklet) in order to resolve timeouts
caused by a blocked work queue waiting on rtnl locks.
Signed-off-by: Fred Lotter <frederik.lotter@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann says:
====================
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) Add the ability to use unaligned chunks in the AF_XDP umem. By
relaxing where the chunks can be placed, it allows to use an
arbitrary buffer size and place whenever there is a free
address in the umem. Helps more seamless DPDK AF_XDP driver
integration. Support for i40e, ixgbe and mlx5e, from Kevin and
Maxim.
2) Addition of a wakeup flag for AF_XDP tx and fill rings so the
application can wake up the kernel for rx/tx processing which
avoids busy-spinning of the latter, useful when app and driver
is located on the same core. Support for i40e, ixgbe and mlx5e,
from Magnus and Maxim.
3) bpftool fixes for printf()-like functions so compiler can actually
enforce checks, bpftool build system improvements for custom output
directories, and addition of 'bpftool map freeze' command, from Quentin.
4) Support attaching/detaching XDP programs from 'bpftool net' command,
from Daniel.
5) Automatic xskmap cleanup when AF_XDP socket is released, and several
barrier/{read,write}_once fixes in AF_XDP code, from Björn.
6) Relicense of bpf_helpers.h/bpf_endian.h for future libbpf
inclusion as well as libbpf versioning improvements, from Andrii.
7) Several new BPF kselftests for verifier precision tracking, from Alexei.
8) Several BPF kselftest fixes wrt endianess to run on s390x, from Ilya.
9) And more BPF kselftest improvements all over the place, from Stanislav.
10) Add simple BPF map op cache for nfp driver to batch dumps, from Jakub.
11) AF_XDP socket umem mapping improvements for 32bit archs, from Ivan.
12) Add BPF-to-BPF call and BTF line info support for s390x JIT, from Yauheni.
13) Small optimization in arm64 JIT to spare 1 insns for BPF_MOD, from Jerin.
14) Fix an error check in bpf_tcp_gen_syncookie() helper, from Petar.
15) Various minor fixes and cleanups, from Nathan, Masahiro, Masanari,
Peter, Wei, Yue.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Continue is not needed at the bottom of a loop.
Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann says:
====================
pull-request: bpf 2019-08-31
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) Fix 32-bit zero-extension during constant blinding which
has been causing a regression on ppc64, from Naveen.
2) Fix a latency bug in nfp driver when updating stack index
register, from Jiong.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Each get_next and lookup call requires a round trip to the device.
However, the device is capable of giving us a few entries back,
instead of just one.
In this patch we ask for a small yet reasonable number of entries
(4) on every get_next call, and on subsequent get_next/lookup calls
check this little cache for a hit. The cache is only kept for 250us,
and is invalidated on every operation which may modify the map
(e.g. delete or update call). Note that operations may be performed
simultaneously, so we have to keep track of operations in flight.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
If control channel MTU is too low to support map operations a warning
will be printed. This is not enough, we want to make sure probe fails
in such scenario, as this would clearly be a faulty configuration.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Recent code changes to NFP allowed the offload of neighbour entries to FW
when the next hop device was an internal port. This allows for offload of
tunnel encap when the end-point IP address is applied to such a port.
Unfortunately, the neighbour event handler still rejects events that are
not associated with a repr dev and so the firmware neighbour table may get
out of sync for internal ports.
Fix this by allowing internal port neighbour events to be correctly
processed.
Fixes: 45756dfeda ("nfp: flower: allow tunnels to output to internal port")
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Internal port TC offload is implemented through user-space applications
(such as OvS) by adding filters at egress via TC clsact qdiscs. Indirect
block offload support in the NFP driver accepts both ingress qdisc binds
and egress binds if the device is an internal port. However, clsact sends
bind notification for both ingress and egress block binds which can lead
to the driver registering multiple callbacks and receiving multiple
notifications of new filters.
Fix this by rejecting ingress block bind callbacks when the port is
internal and only adding filter callbacks for egress binds.
Fixes: 4d12ba4278 ("nfp: flower: allow offloading of matches on 'internal' ports")
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add MODULE_FIRMWARE entries for AMDA0058 boards.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
NFP is using Local Memory to model stack. LM_addr could be used as base of
a 16 32-bit word region of Local Memory. Then, if the stack offset is
beyond the current region, the local index needs to be updated. The update
needs at least three cycles to take effect, therefore the sequence normally
looks like:
local_csr_wr[ActLMAddr3, gprB_5]
nop
nop
nop
If the local index switch happens on a narrow loads, then the instruction
preparing value to zero high 32-bit of the destination register could be
counted as one cycle, the sequence then could be something like:
local_csr_wr[ActLMAddr3, gprB_5]
nop
nop
immed[gprB_5, 0]
However, we have zero extension optimization that zeroing high 32-bit could
be eliminated, therefore above IMMED insn won't be available for which case
the first sequence needs to be generated.
Fixes: 0b4de1ff19 ("nfp: bpf: eliminate zero extension code-gen")
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Now that the single target build descends into sub-directories in the
same way as the normal build, these dummy Makefiles are not needed
any more.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
When processing FLOW_BLOCK_BIND command on indirect block, check that flow
block cb is not busy.
Fixes: 0d4fd02e71 ("net: flow_offload: add flow_block_cb_is_busy() and use it")
Reported-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
tc transparently maps the software priority number to hardware. Update
it to pass the major priority which is what most drivers expect. Update
drivers too so they do not need to lshift the priority field of the
flow_cls_common_offload object. The stmmac driver is an exception, since
this code assumes the tc software priority is fine, therefore, lshift it
just to be conservative.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When calling debugfs functions, there is no need to ever check the
return value. The function can work or not, but the code logic should
never do something different based on this.
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jesper Dangaard Brouer <hawk@kernel.org>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Edwin Peer <edwin.peer@netronome.com>
Cc: Yangtao Li <tiny.windzz@gmail.com>
Cc: Simon Horman <simon.horman@netronome.com>
Cc: oss-drivers@netronome.com
Cc: netdev@vger.kernel.org
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
move tc indirect block to flow_offload and rename
it to flow indirect block.The nf_tables can use the
indr block architecture.
Signed-off-by: wenxu <wenxu@ucloud.cn>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a tunnel packet arrives on the NFP card, its destination MAC is
looked up and MAC index returned for it. This index can help verify the
tunnel by, for example, ensuring that the packet arrived on the expected
port. If the packet is destined for a known MAC that is not connected to a
given physical port then the mac index can have a global value (e.g. when
a series of bonded ports shared the same MAC).
If the packet is to be detunneled at a bridge device or internal port like
an Open vSwitch VLAN port, then it should first match a 'pre-tunnel' rule
to direct it to that internal port.
Use the MAC index to indicate if a packet should match a pre-tunnel rule
before decap is allowed. Do this by tracking the number of internal ports
associated with a MAC address and, if the number if >0, set a bit in the
mac_index to forward the packet to the pre-tunnel table before continuing
with decap.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
MAC addresses along with an identifying index are offloaded to firmware to
allow tunnel decapsulation. If a tunnel packet arrives with a matching
destination MAC address and a verified index, it can continue on the
decapsulation process. This replicates the MAC verifications carried out
in the kernel network stack.
When a netdev is added to a bridge (e.g. OvS) then packets arriving on
that dev are directed through the bridge datapath instead of passing
through the network stack. Therefore, tunnelled packets matching the MAC
of that dev will not be decapped here.
Replicate this behaviour on firmware by removing offloaded MAC addresses
when a MAC representer is added to an OvS bridge. This can prevent any
false positive tunnel decaps.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pre-tunnel rules are TC flower and OvS rules that forward a packet to the
tunnel end point where it can then pass through the network stack and be
decapsulated. These are required if the tunnel end point is, say, an OvS
internal port.
Currently, firmware determines that a packet is in a tunnel and decaps it
if it has a known destination IP and MAC address. However, this bypasses
the flower pre-tunnel rule and so does not update the stats. Further to
this it ignores VLANs that may exist outside of the tunnel header.
Offload pre-tunnel rules to the NFP. This embeds the pre-tunnel rule into
the tunnel decap process based on (firmware) mac index and VLAN. This
means that decap can be carried out correctly with VLANs and that stats
can be updated for all kernel rules correctly.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pre-tunnel rules must direct packets to an internal port based on L2
information. Rules that egress to an internal port are already indicated
by a non-NULL device in its nfp_fl_payload struct. Verfiy the rest of the
match fields indicate that the rule is a pre-tunnel rule. This requires a
full match on the destination MAC address, an option VLAN field, and no
specific matches on other lower layer fields (with the exception of L4
proto and flags).
If a rule is identified as a pre-tunnel rule then mark it for offload to
the pre-tunnel table. Similarly, remove it from the pre-tunnel table on
rule deletion. The actual offloading of these commands is left to a
following patch.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pre-tunnel rules are used when the tunnel end-point is on an 'internal
port'. These rules are used to direct the tunnelled packets (based on outer
header fields) to the internal port where they can be detunnelled. The
rule must send the packet to ingress the internal port at the TC layer.
Currently FW does not support an action to send to ingress so cannot
offload such rules. However, in preparation for populating the pre-tunnel
table to represent such rules, check for rules that send to the ingress of
an internal port and mark them as such. Further validation of such rules
is left to subsequent patches.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
NFP allows the merging of 2 flows together into a single offloaded flow.
In the kernel datapath the packet must match 1 flow, impliment its
actions, recirculate, match the 2nd flow and also impliment its actions.
Merging creates a single flow with all actions from the 2 original flows.
Firmware impliments a tunnel header push as the packet is about to egress
the card. Therefore, if the first merge rule candiate pushes a tunnel,
then the second rule can only have an egress action for a valid merge to
occur (or else the action ordering will be incorrect). This prevents the
pushing of a tunnel header followed by the pushing of a vlan header.
In order to support this behaviour, firmware allows VLAN information to
be encoded in the tunnel push action. If this is non zero then the fw will
push a VLAN after the tunnel header push meaning that 2 such flows with
these actions can be merged (with action order being maintained).
Support tunnel in VLAN pushes by encoding VLAN information in the tunnel
push action of any merge flow requiring this.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAl06EYUACgkQSD+KveBX
+j7R4QgAht/C4115mi1Tc3d3zYjHp3SWLFxwK4vF0U2j30ouhsj1oaIP8bQdw6Mr
6hS4IZSdKNO5wo+NNqMnLYVtsAnvNGOuvYwUvMK5TDkdDb2lIzRlxihpWgTqWzXr
6Eh3nv5rTItgLMqxbLL1EE8Idlx3HQDJtU2a/AmxjmU/TqSKzbBTpnKIlRMPDFNC
PLWXjFXBR/XtcTbsnj7RtlD2HkDAERVTiMP2mlTvXjXxlN56YXCle4CWZamgH9H4
bTCrZwQHH9hllMAnAkq4gpHN7Z6/eXjV6jzu+BOE7ChOaEC5N2F+p5ARXqe+HwRL
apMYgRH5u4mzDt+1CbwR/I/pFOw3WA==
=NXce
-----END PGP SIGNATURE-----
Merge tag 'mlx5-fixes-2019-07-25' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
Mellanox, mlx5 fixes 2019-07-25
This series introduces some fixes to mlx5 driver.
1) Ariel is addressing an issue with enacp flow counter race condition
2) Aya fixes ethtool speed handling
3) Edward fixes modify_cq hw bits alignment
4) Maor fixes RDMA_RX capabilities handling
5) Mark reverses unregister devices order to address an issue with LAG
6) From Tariq,
- wrong max num channels indication regression
- TLS counters naming and documentation as suggested by Jakub
- kTLS, Call WARN_ONCE on netdev mismatch
There is one patch in this series that touches nfp driver to align
TLS statistics names with latest documentation, Jakub is CC'ed.
Please pull and let me know if there is any problem.
For -stable v4.9:
('net/mlx5: Use reversed order when unregister devices')
For -stable v4.20
('net/mlx5e: Prevent encap flow counter update async to user query')
('net/mlx5: Fix modify_cq_in alignment')
For -stable v5.1
('net/mlx5e: Fix matching of speed to PRM link modes')
For -stable v5.2
('net/mlx5: Add missing RDMA_RX capabilities')
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Recent additions to the kernel include a TC action module to manipulate
MPLS headers on packets. Such actions are available to offload via the
flow_offload intermediate representation API.
Modify the NFP driver to allow the offload of MPLS set actions to
firmware. Set actions update the outermost MPLS header. The offload
includes a mask to specify which fields should be set.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Recent additions to the kernel include a TC action module to manipulate
MPLS headers on packets. Such actions are available to offload via the
flow_offload intermediate representation API.
Modify the NFP driver to allow the offload of MPLS pop actions to
firmware. The act_mpls TC module enforces that the next protocol is
supplied along with the pop action. Passing this to firmware allows it
to properly rebuild the underlying packet after the pop.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Recent additions to the kernel include a TC action module to manipulate
MPLS headers on packets. Such actions are available to offload via the
flow_offload intermediate representation API.
Modify the NFP driver to allow the offload of MPLS push actions to
firmware.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation for unifying the skb_frag and bio_vec, use the fine
accessors which already exist and use skb_frag_t instead of
struct skb_frag_struct.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This object stores the flow block callbacks that are attached to this
block. Update flow_block_cb_lookup() to take this new object.
This patch restores the block sharing feature.
Fixes: da3eeb904f ("net: flow_offload: add list handling functions")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
No need to annotate the netns on the flow block callback object,
flow_block_cb_is_busy() already checks for used blocks.
Fixes: d63db30c85 ("net: flow_offload: add flow_block_cb_alloc() and flow_block_cb_free()")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Flower rules on the NFP firmware are able to match on an IP protocol
field. When parsing rules in the driver, unknown IP protocols are only
rejected when further matches are to be carried out on layer 4 fields, as
the firmware will not be able to extract such fields from packets.
L4 protocol dissectors such as FLOW_DISSECTOR_KEY_PORTS are only parsed if
an IP protocol is specified. This leaves a loophole whereby a rule that
attempts to match on transport layer information such as port numbers but
does not explicitly give an IP protocol type can be incorrectly offloaded
(in this case with wildcard port numbers matches).
Fix this by rejecting the offload of flows that attempt to match on L4
information, not only when matching on an unknown IP protocol type, but
also when the protocol is wildcarded.
Fixes: 2a04784594 ("nfp: flower: check L4 matches on unknown IP protocols")
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
NFP firmware does not explicitly match on an ethernet type field. Rather,
each rule has a bitmask of match fields that can be used to infer the
ethernet type.
Currently, if a flower rule contains an unknown ethernet type, a check is
carried out for matches on other fields of the packet. If matches on
layer 3 or 4 are found, then the offload is rejected as firmware will not
be able to extract these fields from a packet with an ethernet type it
does not currently understand.
However, if a rule contains an unknown ethernet type without any L3 (or
above) matches then this will effectively be offloaded as a rule with a
wildcarded ethertype. This can lead to misclassifications on the firmware.
Fix this issue by rejecting all flower rules that specify a match on an
unknown ethernet type.
Further ensure correct offloads by moving the 'L3 and above' check to any
rule that does not specify an ethernet type and rejecting rules with
further matches. This means that we can still offload rules with a
wildcarded ethertype if they only match on L2 fields but will prevent
rules which match on further fields that we cannot be sure if the firmware
will be able to extract.
Fixes: af9d842c13 ("nfp: extend flower add flow offload")
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
And any other existing fields in this structure that refer to tc.
Specifically:
* tc_cls_flower_offload_flow_rule() to flow_cls_offload_flow_rule().
* TC_CLSFLOWER_* to FLOW_CLS_*.
* tc_cls_common_offload to tc_cls_common_offload.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds a function to check if flow block callback is already in
use. Call this new function from flow_block_cb_setup_simple() and from
drivers.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch updates flow_block_cb_setup_simple() to use the flow block API.
Several drivers are also adjusted to use it.
This patch introduces the per-driver list of flow blocks to account for
blocks that are already in use.
Remove tc_block_offload alias.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rename from TCF_BLOCK_BINDER_TYPE_* to FLOW_BLOCK_BINDER_TYPE_* and
remove temporary tcf_block_binder_type alias.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rename from TC_BLOCK_{UN}BIND to FLOW_BLOCK_{UN}BIND and remove
temporary tc_block_command alias.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Most drivers do the same thing to set up the flow block callbacks, this
patch adds a helper function to do this.
This preparation patch reduces the number of changes to adapt the
existing drivers to use the flow block callback API.
This new helper function takes a flow block list per-driver, which is
set to NULL until this driver list is used.
This patch also introduces the flow_block_command and
flow_block_binder_type enumerations, which are renamed to use
FLOW_BLOCK_* in follow up patches.
There are three definitions (aliases) in order to reduce the number of
updates in this patch, which go away once drivers are fully adapted to
use this flow block API.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If driver has to drop the TLS frame it needs to undo the TCP
sequence tracking changes, otherwise device will receive
segments out of order and drop them.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move the #ifdef CONFIG_TLS_DEVICE a little so we can eliminate
the other one.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make sure the contents of the skb which carried key material
to the FW is cleared.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce a return code for the tls_dev_resync callback.
When the driver TX resync fails, kernel can retry the resync again
until it succeeds. This prevents drivers from attempting to offload
TLS packets if the connection is known to be out of sync.
We don't worry about the RX resync since they will be retried naturally
as more encrypted records get received.
Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Count the number of successfully submitted TLS segments,
not skbs. This will make it easier to compare the TLS
encryption count against other counters.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Increase the batch limit to consume small message bursts more
effectively. Practically, the effect on the 'add' messages is not
significant since the mailbox is sized such that the 'add' messages are
still limited to the same order of magnitude that it was originally set
for.
Furthermore, increase the queue size limit to 1024 entries. This further
improves the handling of bursts of small control messages.
Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Connection 4 tuple reuse is slightly problematic - TLS socket
and context do not get destroyed until all the associated skbs
left the system and all references are released. This leads
to stale connection entry in the device preventing addition
of new one if the 4 tuple is reused quickly enough.
Instead of using read 4 tuple as the key use a unique ID.
Set the protocol to TCP and port to 0 to ensure no collisions
with real connections.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Long lines are ugly. No functional changes.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We need to do our best not to drop delete commands, otherwise
we will have stale entries in the connection table. Ignore
the control message queue limits for delete commands.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix to return negative error code -EINVAL from the error handling
case instead of 0, as done elsewhere in this function.
Fixes: 1f35a56cf5 ("nfp: tls: add/delete TLS TX connections")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For spinlocks the type spinlock_t should be used instead of "struct
spinlock".
Use spinlock_t for spinlock's definition.
Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: oss-drivers@netronome.com
Cc: netdev@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add new GRE encapsulation support, which allows offload of filters
using tunnel_key set action in combination with actions that egress
to GRE type ports.
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Extend the existing tunnel matching support to include GRE decap
classification. Specifically matching existing tunnel fields for
NVGRE (GRE with protocol field set to TEB).
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Previously tunnel related functions in action offload only applied
to UDP tunnels. Rename these functions in preparation for new
tunnel types.
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adds IPv4 address and TTL/TOS helper functions, which is done in
preparation for compiling new tunnel types.
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Refactor the key layer calculation function, in particular the tunnel
key layer calculation by introducing helper functions. This is done
in preparation for supporting GRE tunnel offloads.
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use extack messages in flower offload when compiling match and actions
messages that will configure hardware.
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use extack messages in flower offload, specifically focusing on
the extack use in add offload, remove offload and get stats paths.
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Matching on fields with a protocol that is unknown to hardware
is not strictly unsupported. Determine if hardware can offload
a filter with an unknown protocol by checking if any L4 fields
are being matched as well.
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Users sometimes mistakenly try to manually bind the PF driver
to the VFs, print a warning message in that case.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Apparently there are still cards in the wild with a very old
management FW. Let's make the error message in that case
indicate more clearly that management firmware has to be
updated.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When TCP stream gets out of sync (driver stops receiving skbs
with expected TCP sequence numbers) request a TX resync from
the kernel.
We try to distinguish retransmissions from missed transmissions
by comparing the sequence number to expected - if it's further
than the expected one - we probably missed packets.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently only RX direction is ever resynced, however, TX may
also get out of sequence if packets get dropped on the way to
the driver. Rename the resync callback and add a direction
parameter.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Set ethtool TLS RX feature based on NIC capabilities, and enable
TLS RX when connections are added for decryption.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Enable kernel-controlled RX resync and propagate TLS connection
RX resync from kernel TLS to firmware.
Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some control messages must be sent from atomic context. The mailbox
takes sleeping locks and uses a waitqueue so add a "posted" version
of communication.
Trylock the semaphore and if that's successful kick of the device
communication. The device communication will be completed from
a workqueue, which will also release the semaphore.
If locks are taken queue the message and return. Schedule a
different workqueue to take the semaphore and run the communication.
Note that the there are currently no atomic users which would actually
need the return value, so all replies to posted messages are just
freed.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We need the name nfp_ccm_mbox_alloc() for allocating the mailbox
communication channel itself.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Firmware indicates when a packet has been decrypted by reusing the
currently unused BPF flag. Transfer this information into the skb
and provide a statistic of all decrypted segments.
Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Packets received at the NFP driver may be redirected to egress of another
netdev (e.g. in the case of OvS internal ports). On the egress path, some
processes, like TC egress hooks, may expect the network header offset
field in the skb to be correctly set. If this is not the case there is
potential for abnormal behaviour and even the triggering of BUG() calls.
Set the skb network header field before the mac header pull when doing a
packet redirect.
Fixes: 27f54b5825 ("nfp: allow fallback packets from non-reprs")
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Count TX TLS packets: successes, out of order, and dropped due to
missing record info. Make sure the RX and TX completion statistics
don't share cache lines with TX ones as much as possible. With TLS
stats they are no longer reasonably aligned.
Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the functionality to add and delete TLS connections on
the NFP, received from the kernel TLS callbacks.
Make use of the common control message (CCM) infrastructure to propagate
the kernel state to firmware.
Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Prepend connection handle to each transmitted TLS packet.
For each connection, the driver tracks the next sequence number
expected. If an out of order packet is observed, the driver calls into
the TLS kernel code to reencrypt that particular skb.
Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Subsequent patches will add support for more TX metadata fields.
Prepare for this by handling an additional double word - firmware
handle as metadata type 7.
Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add FW ABI defines and code for basic init of TLS offload.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Parse TLV containing a bitmask of supported crypto operations.
The TLV contains a capability bitmask (supported operations)
and enabled bitmask. Each operation describes the crypto
protocol quite exhaustively (protocol, AEAD, direction).
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
FW may prefer to handle some communication via a mailbox
or the vNIC may simply not have a control queue (VFs).
Add a way of exchanging ccm-compatible messages via a
mailbox.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Parse the mailbox TLV. When control message queue is not available
we can fall back to passing the control messages via the vNIC
mailbox.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We will need to release the bar lock from a workqueue
so move from a mutex to a semaphore. This lock should
not be too hot. Unfortunately semaphores don't have
lockdep support.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently if we need to modify the head of the skb and allocation
fails we would free the skb and not increment the error counter.
Make sure all errors are counted.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
One of the more common cases of allocation size calculations is finding
the size of a structure that has a zero-sized array at the end, along
with memory for some number of elements for that array. For example:
struct nfp_tun_active_tuns {
...
struct route_ip_info {
__be32 ipv4;
__be32 egress_port;
__be32 extra[2];
} tun_info[];
};
Make use of the struct_size() helper instead of an open-coded version
in order to avoid any potential type mistakes.
So, replace the following form:
sizeof(struct nfp_tun_active_tuns) + sizeof(struct route_ip_info) * count
with:
struct_size(payload, tun_info, count)
This code was detected with the help of Coccinelle.
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch eliminate zero extension code-gen for instructions including
both alu and load/store. The only exception is for ctx load, because
offload target doesn't go through host ctx convert logic so we do
customized load and ignores zext flag set by verifier.
Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add rcu locks when accessing netdev when processing route request
and tunnel keep alive messages received from hardware.
Fixes: 8e6a9046b6 ("nfp: flower vxlan neighbour offload")
Fixes: 856f5b1357 ("nfp: flower vxlan neighbour keep-alive")
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add missing kdoc for app member.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
NFP shift instruction has something special. If shift direction is left
then shift amount of 1 to 31 is specified as 32 minus the amount to shift.
But no need to do this for indirect shift which has shift amount be 0. Even
after we do this subtraction, shift amount 0 will be turned into 32 which
will eventually be encoded the same as 0 because only low 5 bits are
encoded, but shift amount be 32 will fail the FIELD_PREP check done later
on shift mask (0x1f), due to 32 is out of mask range. Such error has been
observed when compiling nfp/bpf/jit.c using gcc 8.3 + O3.
This issue has started when indirect shift support added after which the
incoming shift amount to __emit_shf could be 0, therefore it is at that
time shift amount adjustment inside __emit_shf should have been tightened.
Fixes: 991f5b3651 ("nfp: bpf: support logic indirect shifts (BPF_[L|R]SH | BPF_X)")
Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Reported-by: Pablo Cascón <pablo.cascon@netronome.com
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
NFP does not register devlink ports for representors (without
the "devlink: expose PF and VF representors as ports" series
there are no port flavours to expose them as).
Commit c25f08ac65 ("nfp: remove ndo_get_port_parent_id implementation")
went to far in removing ndo_get_port_parent_id for representors.
This causes redirection offloads to fail, and switch_id attribute
missing.
Reintroduce the ndo_get_port_parent_id callback for representor ports.
Fixes: c25f08ac65 ("nfp: remove ndo_get_port_parent_id implementation")
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Based on feedback from Jiri avoid carrying a pointer to the tcf_block
structure in the tc_cls_common_offload structure. Instead store
a flag in driver private data which indicates if offloads apply
to a shared block at block binding time.
Suggested-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add stats request function that sends a stats request message to hw for
a specific police-filter. Process stats reply from hw and update the
stored qos structure.
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add install and remove offload functionality for qos offloads. We
first check that a police filter can be implemented by the VF rate
limiting feature in hw, then we install the filter via the qos
infrastructure. Finally we implement the mechanism for removing
these types of filters.
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce matchall filter offload infrastructure that is needed to
offload qos features like policing. Subsequent patches will make
use of police-filters for ingress rate limiting.
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Device may be shutdown without the hardware being reinitialized, in
which case we want to ensure we cleanup properly.
This is especially important for kexec with traffic flowing.
The shutdown procedures resembles the remove procedures, so we can reuse
those common tasks.
Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add extack to shared buffer set operations, so that meaningful error
messages could be propagated to the user.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
By default VFs are not trusted. Add ndo_set_vf_trust support to toggle
a new per-VF bit. Coupled with FW with this capability allows a
trusted VF to change its MAC even after being administratively set by
the PF. Also populate the trusted field on ndo_get_vf_config. Add the
same ndo to the representors.
Signed-off-by: Pablo Cascón <pablo.cascon@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A recent addition to NFP introduced a function that formats a string with
a size_t variable. This is formatted with %ld which is fine on 64-bit
architectures but produces a compile warning on 32-bit architectures.
Fix this by using the z length modifier.
Fixes: a6156a6ab0f9 ("nfp: flower: handle merge hint messages")
Signed-off-by: John Hurley <john.hurley@netronome.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are a couple of spelling mistakes in NL_SET_ERR_MSG_MOD error
messages. Fix these.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Mukesh Ojha <mojha@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The nfp_flower_copy_pre_actions function introduces a case statement with
an intentional fallthrough. However, this generates a warning if built
with the -Wimplicit-fallthrough flag.
Remove the warning by adding a fall through comment.
Fixes: 1c6952ca58 ("nfp: flower: generate merge flow rule")
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A merge flow is formed from 2 sub flows. The match fields of the merge are
the same as the first sub flow that has formed it, with the actions being
a combination of the first and second sub flow. Therefore, a merge flow
should replace sub flow 1 when offloaded.
Offload valid merge flows by using a new 'flow mod' message type to
replace an existing offloaded rule. Track the deletion of sub flows that
are linked to a merge flow and revert offloaded merge rules if required.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With the merging of 2 sub flows, a new 'merge' flow will be created and
written to FW. The TC layer is unaware that the merge flow exists and will
request stats from the sub flows. Conversely, the FW treats a merge rule
the same as any other rule and sends stats updates to the NFP driver.
Add links between merge flows and their sub flows. Use these links to pass
merge flow stats updates from FW to the underlying sub flows, ensuring TC
stats requests are handled correctly. The updating of sub flow stats is
done on (the less time critcal) TC stats requests rather than on FW stats
update.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When combining 2 sub_flows to a single 'merge flow' (assuming the merge is
valid), the merge flow should contain the same match fields as sub_flow 1
with actions derived from a combination of sub_flows 1 and 2. This action
list should have all actions from sub_flow 1 with the exception of the
output action that triggered the 'implicit recirculation' by sending to
an internal port, followed by all actions of sub_flow 2. Any pre-actions
in either sub_flow should feature at the start of the action list.
Add code to generate a new merge flow and populate the match and actions
fields based on the sub_flows. The offloading of the flow is left to
future patches.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Two flows can be merged if the second flow (after recirculation) matches
on bits that are either matched on or explicitly set by the first flow.
This means that if a packet hits flow 1 and recirculates then it is
guaranteed to hit flow 2.
Add a 'can_merge' function that determines if 2 sub_flows in a merge hint
can be validly merged to a single flow.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If a merge hint is received containing 2 flows that are matched via an
implicit recirculation (sending to and matching on an internal port), fw
reports that the flows (called sub_flows) may be able to be combined to a
single flow.
Add infastructure to accept and process merge hint messages. The actual
merging of the flows is left as a stub call.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Each flow is given a context ID that the fw uses (along with its cookie)
to identity the flow. The flows stats are updated by the fw via this ID
which is a reference to a pre-allocated array entry.
In preparation for flow merge code, enable the nfp_fl_payload structure to
be accessed via this stats context ID. Rather than increasing the memory
requirements of the pre-allocated array, add a new rhashtable to associate
each active stats context ID with its rule payload.
While adding new code to the compile metadata functions, slightly
restructure the existing function to allow for cleaner, easier to read
error handling.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The neighbour table in the FW only accepts next hop entries if the egress
port is an nfp repr. Modify this to allow the next hop to be an internal
port. This means that if a packet is to egress to that port, it will
recirculate back into the system with the internal port becoming its
ingress port.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
FW may receive a packet with its ingress port marked as an internal port.
If a rule does not exist to match on this port, the packet will be sent to
the NFP driver. Modify the flower app to detect packets from such internal
ports and convert the ingress port to the correct kernel space netdev.
At this point, it is assumed that fallback packets from internal ports are
to be sent out said port. Therefore, set the redir_egress bool to true on
detection of these ports.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, it is assumed that fallback packets will be from reprs. Modify
this to allow an app to receive non-repr ports from the fallback channel -
e.g. from an internal port. If such a packet is received, do not update
repr stats.
Change the naming function calls so as not to imply it will always be a
repr netdev returned. Add the option to set a bool value to redirect a
fallback packet out the returned port rather than RXing it. Setting of
this bool in subsequent patches allows the handling of packets falling
back when they are due to egress an internal port.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Recent FW modifications allow the offloading of non repr ports. These
ports exist internally on the NFP. So if a rule outputs to an 'internal'
port, then the packet will recirculate back into the system but will now
have this internal port as it's incoming port. These ports are indicated
by a specific type field combined with an 8 bit port id.
Add private app data to assign additional port ids for use in offloads.
Provide functions to lookup or create new ids when a rule attempts to
match on an internal netdev - the only internal netdevs currently
supported are of type openvswitch. Have a netdev notifier to release
port ids on netdev unregister.
OvS offloads rules that match on internal ports as TC egress filters.
Ensure that such rules are accepted by the driver.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Write to a FW symbol to indicate that the driver supports flow merging. If
this symbol does not exist then flow merging and recirculation is not
supported on the FW. If support is available, add a stub to deal with FW
to kernel merge hint messages.
Full flow merging requires the firmware to support of flow mods. If it
does not, then do not attempt to 'turn on' flow merging.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
BPF's control message handler seems like a good base to built
on for request-reply control messages. Split it out to allow
for reuse.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
During probe we clear vNIC configuration in case the device
wasn't closed cleanly by previous driver. Move that code
before netdev init, so netdev init can already try to apply
its config parameters.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Soon we will try to write to the vNIC mailbox without RTNL held.
Add a new mutex to protect access to specific parts of the PCI
control BAR.
Move the mailbox size checking to the mailbox lock() helper, where
it can be more effective (happen prior to potential overwrite of
other data).
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the reconfig was a quick update, we could have results available from
firmware within 200us.
Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Minor comment merge conflict in mlx5.
Staging driver has a fixup due to the skb->xmit_more changes
in 'net-next', but was removed in 'net'.
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove implementation of get_port_parent_id ndo and rely on core calling
into devlink for the information directly.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pass the switch ID down the to devlink through devlink_port_attrs_set()
so it can be used by devlink_compat_switch_id_get().
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Extend devlink_port_attrs_set() to pass switch ID for ports which are
part of switch and store it in port attrs. For other ports, this is
NULL.
Note that this allows the driver to group devlink ports into one or more
switches according to the actual topology.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are two reasons for this.
First, the xmit_more flag conceptually doesn't fit into the skb, as
xmit_more is not a property related to the skb.
Its only a hint to the driver that the stack is about to transmit another
packet immediately.
Second, it was only done this way to not have to pass another argument
to ndo_start_xmit().
We can place xmit_more in the softnet data, next to the device recursion.
The recursion counter is already written to on each transmit. The "more"
indicator is placed right next to it.
Drivers can use the netdev_xmit_more() helper instead of skb->xmit_more
to check the "more packets coming" hint.
skb->xmit_more is retained (but always 0) to not cause build breakage.
This change takes care of the simple s/skb->xmit_more/netdev_xmit_more()/
conversions. Remaining drivers are converted in the next patches.
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that the NSP provides the ability to read from the SFF modules'
EEPROM, we can use this interface to implement the ethtool callback.
If the NSP only provides partial data, we log the event from within
the driver but pass a success code to ethtool to prevent it from
discarding the partial data.
Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The NSP now provides the ability to read from the SFF module EEPROM.
Note that even if an error occurs, the NSP may still provide some of the
data.
Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With the introduction of flow_action_for_each pedit actions are no
longer grouped together, instead pedit actions are broken out per
32 byte word. This results in an inefficient use of the action list
that is pushed to hardware where each 32 byte word becomes its own
action. Therefore we combine groups of 32 byte word before sending
the action list to hardware.
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We no longer set CFI when pushing vlan tags, therefore we remove
the CFI bit from push vlan.
Fixes: 1a1e586f54 ("nfp: add basic action capabilities to flower offloads")
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Signed-off-by: Louis Peens <louis.peens@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace vlan CFI bit with a vlan present bit that indicates the
presence of a vlan tag. Previously the driver incorrectly assumed
that an vlan id of 0 is not matchable, therefore we indicate vlan
presence with a vlan present bit.
Fixes: 5571e8c9f2 ("nfp: extend flower matching capabilities")
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Signed-off-by: Louis Peens <louis.peens@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
NFP reprs are software device on top of the PF's vNIC.
The comment above __dev_queue_xmit() sayeth:
When calling this method, interrupts MUST be enabled. This is because
the BH enable code must have IRQs enabled so that it will not deadlock.
For netconsole we can't guarantee IRQ state, let's just
disable netpoll on representors to be on the safe side.
When the initial implementation of NFP reprs was added by the
commit 5de73ee467 ("nfp: general representor implementation")
.ndo_poll_controller was required for netpoll to be enabled.
Fixes: ac3d9dd034 ("netpoll: make ndo_poll_controller() optional")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
dev_queue_xmit() may return error codes as well as netdev_tx_t,
and it always consumes the skb. Make sure we always return a
correct netdev_tx_t value.
Fixes: eadfa4c3be ("nfp: add stats and xmit helpers for representors")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If nn->port is defined it means that devlink_port has been registered
for this port as well. Devlink core is handling the port name
formatting.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Follow-up patch is going to need a devlink port instance according to
a netdev. Devlink port instance should be always available when devlink
is used. So change the recently introduced ndo_get_devlink to
ndo_get_devlink_port. With that, adjust the wrapper for the only
user to get devlink pointer.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change the init/fini flow and register devlink port instance before
netdev. Now it is needed for correct behavior of phys_port_name
generation, but in general it makes sense to register devlink port
first.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Similar to other driver, move the port type set after netdev registration
is done. Along with that, clear the type before unregistration.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Added IANA_VXLAN_UDP_PORT (4789) definition to vxlan header file so it
can be used by drivers instead of local definition.
Updated drivers which locally defined it as 4789 to use it.
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Cc: John Hurley <john.hurley@netronome.com>
Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
Cc: Yunsheng Lin <linyunsheng@huawei.com>
Cc: Peng Li <lipeng321@huawei.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Move the definition of the default Geneve udp port from the geneve
source to the header file, so we can re-use it from drivers.
Modify existing drivers to use it.
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Cc: John Hurley <john.hurley@netronome.com>
Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
NFP driver ABI contains bits for L2 switching which were never
implemented in initially envisioned form.
Remove the defines, and open up the possibility of
reclaiming the bits for other uses.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The simple vNIC mailbox length should be 12 decimal and not 0x12.
Using a decimal also makes it clear this is a length value and not
another field within the simple mailbox defines.
Found by code inspection, there are no known firmware configurations
where this would cause issues.
Fixes: 527d7d1b99 ("nfp: read mailbox address from TLV caps")
Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The management firmware now supports being passed a bundle with
multiple components to be stored in flash at once. This makes it
easier to update all components to a known state with a single
user command, however, this also has the potential to increase
the time required to perform the update significantly.
The management firmware only updates the components out of a bundle
which are outdated, however, we need to make sure we can handle
the absolute worst case where a CPLD update can take a long time
to perform.
We set a very conservative total timeout of 900s which already
adds a contingency.
Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Newer versions of NSP can access host memory. Simplest access
type requires all data to be in one contiguous area. Since we
don't have the guarantee on where callers of the NSP ABI will
allocate their buffers we allocate a bounce buffer and copy
the data in and out.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
DMA version of NSP communication is coming, move the code which
copies data into the NFP buffer into a separate function.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
NSP expresses the buffer size in MB and 4 kB blocks. For small
buffers the kB part may make a difference, so count it in.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for reporting twisted pair port type.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that devlink fallback will be called reliably, we can remove
the ethtool flashing code.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Support getting devlink instance from a new NDO.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is no longer necessary after a5084bb71f ("nfp: Implement
ndo_get_port_parent_id()")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Three conflicts, one of which, for marvell10g.c is non-trivial and
requires some follow-up from Heiner or someone else.
The issue is that Heiner converted the marvell10g driver over to
use the generic c45 code as much as possible.
However, in 'net' a bug fix appeared which makes sure that a new
local mask (MDIO_AN_10GBT_CTRL_ADV_NBT_MASK) with value 0x01e0
is cleared.
Signed-off-by: David S. Miller <davem@davemloft.net>
NFP BPF JIT compiler is doing a couple of small optimizations when jitting
ALU imm instructions, some of these optimizations could save code-gen, for
example:
A & -1 = A
A | 0 = A
A ^ 0 = A
However, for ALU32, high 32-bit of the 64-bit register should still be
cleared according to ISA semantics.
Fixes: cd7df56ed3 ("nfp: add BPF to NFP code translator")
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
The intended optimization should be A ^ 0 = A, not A ^ -1 = A.
Fixes: cd7df56ed3 ("nfp: add BPF to NFP code translator")
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Check mask fields of tcp and ip flags when setting the corresponding mask
flag used in hardware.
Fixes: 8f2566225a ("flow_offload: add flow_rule and flow_match")
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Devlink now allows updating device flash. Implement this
callback.
Compared to ethtool update we no longer have to release
the networking locks - devlink doesn't take them.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexei Starovoitov says:
====================
pull-request: bpf-next 2019-02-16
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) numerous libbpf API improvements, from Andrii, Andrey, Yonghong.
2) test all bpf progs in alu32 mode, from Jiong.
3) skb->sk access and bpf_sk_fullsock(), bpf_tcp_sock() helpers, from Martin.
4) support for IP encap in lwt bpf progs, from Peter.
5) remove XDP_QUERY_XSK_UMEM dead code, from Jan.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Recent cls_flower offload rewrite added a double new line.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently bpf_offload_dev does not have any priv pointer, forcing
the drivers to work backwards from the netdev in program metadata.
This is not great given programs are conceptually associated with
the offload device, and it means one or two unnecessary deferences.
Add a priv pointer to bpf_offload_dev.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
The manufacturing team requests we include vendor and product
in the serial number field, as the serial number itself is not
unique across manufacturing facilities and products.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vendor may sound ambiguous, let's rename the fab string to
"board.manufacture" (which was just added as a generic identifier).
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
One of the more common cases of allocation size calculations is finding
the size of a structure that has a zero-sized array at the end, along
with memory for some number of elements for that array. For example:
struct foo {
int stuff;
void *entry[];
};
size = sizeof(struct foo) + count * sizeof(void *);
instance = alloc(size, GFP_KERNEL);
Instead of leaving these open-coded and prone to type mistakes, we can
now use the new struct_size() helper:
instance = alloc(struct_size(instance, entry, count), GFP_KERNEL);
Notice that, in this case, variable size is not necessary, hence
it is removed.
This code was detected with the help of Coccinelle.
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Static checker warning complains on uninitialized variable:
drivers/net/ethernet/netronome/nfp/flower/action.c:618 nfp_fl_pedit()
error: uninitialized symbol 'idx'.
Which is actually never used from the functions that take it as
parameter. Remove it.
Fixes: 7386788175 ("drivers: net: use flow action infrastructure")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
NFP only supports SWITCHDEV_ATTR_ID_PORT_PARENT_ID, which makes it a
great candidate to be converted to use the ndo_get_port_parent_id() NDO
instead of implementing switchdev_port_attr_get().
Since NFP uses switchdev_port_same_parent_id() convert it to use
netdev_port_same_parent_id().
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch updates drivers to use the new flow action infrastructure.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch provides the flow_stats structure that acts as container for
tc_cls_flower_offload, then we can use to restore the statistics on the
existing TC actions. Hence, tcf_exts_stats_update() is not used from
drivers anymore.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch wraps the dissector key and mask - that flower uses to
represent the matching side - around the flow_match structure.
To avoid a follow up patch that would edit the same LoCs in the drivers,
this patch also wraps this new flow match structure around the flow rule
object. This new structure will also contain the flow actions in follow
up patches.
This introduces two new interfaces:
bool flow_rule_match_key(rule, dissector_id)
that returns true if a given matching key is set on, and:
flow_rule_match_XYZ(rule, &match);
To fetch the matching side XYZ into the match container structure, to
retrieve the key and the mask with one single call.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shared buffer allocation is usually done in cell increments.
Drivers will either round up the allocation or refuse the
configuration if it's not an exact multiple of cell size.
Drivers know exactly the cell size of shared buffer, so help
out users by providing this information in dumps.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexei Starovoitov says:
====================
pull-request: bpf-next 2019-02-01
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) introduce bpf_spin_lock, from Alexei.
2) convert xdp samples to libbpf, from Maciej.
3) skip verifier tests for unsupported program/map types, from Stanislav.
4) powerpc64 JIT support for BTF line info, from Sandipan.
5) assorted fixed, from Valdis, Jesper, Jiong.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The following ALU32 logic shift supports are missing:
BPF_ALU | BPF_LSH | BPF_X
BPF_ALU | BPF_RSH | BPF_X
BPF_ALU | BPF_RSH | BPF_K
For BPF_RSH | BPF_K, it could be implemented using NFP direct shift
instruction. For the other BPF_X shifts, NFP indirect shifts sequences need
to be used.
Separate code-gen hook is assigned to each instruction to make the
implementation clear.
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Shifts by zero do nothing, and should be treated as nops.
Even though compiler is not supposed to generate such instructions and
manual written assembly is unlikely to have them, but they are legal
instructions and have defined behavior.
This patch correct existing shifts code-gen to make sure they do nothing
when shift amount is zero except when the instruction is ALU32 for which
high bits need to be cleared.
For shift amount bigger than type size, already, NFP JIT back-end errors
out for immediate shift and only low 5 bits will be taken into account for
indirect shift which is the same as x86.
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Report versions of firmware components using the new NSP command.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Retrieve the FW versions with the new command.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Report information about the hardware.
RFCv2:
- add defines for board IDs which are likely to be reusable for
other drivers (Jiri).
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Report the basic info through new devlink info API.
RFCv2:
- add driver name;
- align serial to core changes.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
One of the more common cases of allocation size calculations is finding
the size of a structure that has a zero-sized array at the end, along
with memory for some number of elements for that array. For example:
struct foo {
int stuff;
struct boo entry[];
};
instance = kzalloc(sizeof(struct foo) + count * sizeof(struct boo), GFP_KERNEL);
Instead of leaving these open-coded and prone to type mistakes, we can
now use the new struct_size() helper:
instance = kzalloc(struct_size(instance, entry, count), GFP_KERNEL);
This code was detected with the help of Coccinelle.
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann says:
====================
pull-request: bpf-next 2019-01-29
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) Teach verifier dead code removal, this also allows for optimizing /
removing conditional branches around dead code and to shrink the
resulting image. Code store constrained architectures like nfp would
have hard time doing this at JIT level, from Jakub.
2) Add JMP32 instructions to BPF ISA in order to allow for optimizing
code generation for 32-bit sub-registers. Evaluation shows that this
can result in code reduction of ~5-20% compared to 64 bit-only code
generation. Also add implementation for most JITs, from Jiong.
3) Add support for __int128 types in BTF which is also needed for
vmlinux's BTF conversion to work, from Yonghong.
4) Add a new command to bpftool in order to dump a list of BPF-related
parameters from the system or for a specific network device e.g. in
terms of available prog/map types or helper functions, from Quentin.
5) Add AF_XDP sock_diag interface for querying sockets from user
space which provides information about the RX/TX/fill/completion
rings, umem, memory usage etc, from Björn.
6) Add skb context access for skb_shared_info->gso_segs field, from Eric.
7) Add support for testing flow dissector BPF programs by extending
existing BPF_PROG_TEST_RUN infrastructure, from Stanislav.
8) Split BPF kselftest's test_verifier into various subgroups of tests
in order better deal with merge conflicts in this area, from Jakub.
9) Add support for queue/stack manipulations in bpftool, from Stanislav.
10) Document BTF, from Yonghong.
11) Dump supported ELF section names in libbpf on program load
failure, from Taeung.
12) Silence a false positive compiler warning in verifier's BTF
handling, from Peter.
13) Fix help string in bpftool's feature probing, from Prashant.
14) Remove duplicate includes in BPF kselftests, from Yue.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch implements code-gen for new JMP32 instructions on NFP.
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add a verifier callback to the nfp JIT to remove the instructions
the verifier deemed to be dead.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Verifier will now optimize out branches to dead code, implement
the replace_insn callback to take advantage of that optimization.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Instead of passing env->prog->len around, and trying to adjust
for optimized out instructions just save the initial number
of instructions in struct nfp_prog.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
We fail program loading if jump lands on a skipped instruction.
This is for historical reasons, it used to be that we only skipped
instructions optimized out based on prior context, and therefore
the optimization would be buggy if we jumped directly to such
instruction (because the context would be skipped by the jump).
There are cases where instructions can be skipped without any
context, for example there is no point in generating code for:
r0 |= 0
We will also soon support dropping dead code, so make the skip
logic differentiate between "optimized with preceding context"
vs other skip types.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Instruction number is meaningless at code gen phase. The target
of the instruction is overwritten by nfp_fixup_branches(). The
convention is to put the raw offset in target address as a place
holder. See cmp_* functions.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
A MAC address is not necessarily a unique identifier for a netdev. Drivers
such as Linux bonds, for example, can apply the same MAC address to the
upper layer device and all lower layer devices.
NFP MAC offload for tunnel decap includes port verification for reprs but
also supports the offload of non-repr MAC addresses by assigning 'global'
indexes to these. This means that the FW will not verify the incoming port
of a packet matching this destination MAC.
Modify the MAC offload logic to assign global indexes based on MAC address
instead of net device (as it currently does). Use this to allow multiple
devices to share the same MAC. In other words, if a repr shares its MAC
address with another device then give the offloaded MAC a global index
rather than associate it with an ingress port. Track this so that changes
can be reverted as MACs stop being shared.
Implement this by removing the current list based assignment of global
indexes and replacing it with an rhashtable that maps an offloaded MAC
address to the number of devices sharing it, distributing global indexes
based on this.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is possible to receive a MAC address change notification without the
net device being down (e.g. when an OvS bridge is assigned the same MAC as
a port added to it). This means that an offloaded MAC address may not be
removed if its device gets a new address.
Maintain a record of the offloaded MAC addresses for each repr and netdev
assigned a MAC offload index. Use this to delete the (now expired) MAC if
a change of address event occurs. Only handle change address events if the
device is already up - if not then the netdev up event will handle it.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
NFP repr netdevs contain private data that can store per port information.
In certain cases, the NFP driver offloads information from non-repr ports
(e.g. tunnel ports). As the driver does not have control over non-repr
netdevs, it cannot add/track private data directly to the netdev struct.
Add infastructure to store private information on any non-repr netdev that
is offloaded at a given time. This is used in a following patch to track
offloaded MAC addresses for non-reprs and enable correct house keeping on
address changes.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a potential tunnel end point goes down then its MAC address should
not be matchable on the NFP.
Implement a delete message for offloaded MACs and call this on net device
down. While at it, remove the actions on register and unregister netdev
events. A MAC should only be offloaded if the device is up. Note that the
netdev notifier will replay any notifications for UP devices on
registration so NFP can still offload ports that exist before the driver
is loaded. Similarly, devices need to go down before they can be
unregistered so removal of offloaded MACs is only required on down events.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Potential MAC destination addresses for tunnel end-points are offloaded to
firmware. This was done by building a list of such MACs and writing to
firmware as blocks of addresses.
Simplify this code by removing the list format and sending a new message
for each offloaded MAC.
This is in preparation for delete MAC messages. There will be one delete
flag per message so we cannot assume that this applies to all addresses
in a list.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently MAC addresses of all repr netdevs, along with selected non-NFP
controlled netdevs, are offloaded to FW as potential tunnel end-points.
However, the addresses of VF and PF reprs are meaningless outside of
internal communication and it is only those of physical port reprs
required.
Modify the MAC address offload selection code to ignore VF/PF repr devs.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Recent additions to the flower app private data have grouped the variables
of a given feature into a struct and added that struct to the main private
data struct.
In keeping with this, move all tunnel related private data to their own
struct. This has no affect on functionality but improves readability and
maintenance of the code.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adds support for multiple memory units which are used for filter
offloads. Each filter is assigned a stats id, the MSBs of the id are
used to determine which memory unit the filter should be offloaded
to. The number of available memory units that could be used for filter
offload is obtained from HW. A simple round robin technique is used to
allocate and distribute the ids across memory units.
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
QA tests report occasional timeouts on REIFY message replies. Profiling
of the two cmesg reply types under burst conditions, with a 12-core host
under heavy cpu and io load (stress --cpu 12 --io 12), show both PHY MTU
change and REIFY replies can exceed the 10ms timeout. The maximum MTU
reply wait under burst is 16ms, while the maximum REIFY wait under 40 VF
burst is 12ms. Using a 4 VF REIFY burst results in an 8ms maximum wait.
A larger VF burst does increase the delay, but not in a linear enough
way to justify a scaled REIFY delay. The worse case values between
MTU and REIFY appears close enough to justify a common timeout. Pick a
conservative 40ms to make a safer future proof common reply timeout. The
delay only effects the failure case.
Change the REIFY timeout mechanism to use wait_event_timeout() instead
of wait_event_interruptible_timeout(), to match the MTU code. In the
current implementation, theoretically, a signal could interrupt the
REIFY waiting period, with a return code of ERESTARTSYS. However, this is
caught under the general timeout error code EIO. I cannot see the benefit
of exposing the REIFY waiting period to signals with such a short delay
(40ms), while the MTU mechnism does not use the same logic. In the absence
of any reply (wakeup() call), both reply types will wake up the task after
the timeout period. The REIFY timeout applies to the entire representor
group being instantiated (e.g. VFs), while the MTU timeout apples to a
single PHY MTU change.
Signed-off-by: Fred Lotter <frederik.lotter@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We already need to zero out memory for dma_alloc_coherent(), as such
using dma_zalloc_coherent() is superflous. Phase it out.
This change was generated with the following Coccinelle SmPL patch:
@ replace_dma_zalloc_coherent @
expression dev, size, data, handle, flags;
@@
-dma_zalloc_coherent(dev, size, handle, flags)
+dma_alloc_coherent(dev, size, handle, flags)
Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
[hch: re-ran the script on the latest tree]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Daniel Borkmann says:
====================
pull-request: bpf-next 2018-12-21
The following pull-request contains BPF updates for your *net-next* tree.
There is a merge conflict in test_verifier.c. Result looks as follows:
[...]
},
{
"calls: cross frame pruning",
.insns = {
[...]
.prog_type = BPF_PROG_TYPE_SOCKET_FILTER,
.errstr_unpriv = "function calls to other bpf functions are allowed for root only",
.result_unpriv = REJECT,
.errstr = "!read_ok",
.result = REJECT,
},
{
"jset: functional",
.insns = {
[...]
{
"jset: unknown const compare not taken",
.insns = {
BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
BPF_FUNC_get_prandom_u32),
BPF_JMP_IMM(BPF_JSET, BPF_REG_0, 1, 1),
BPF_LDX_MEM(BPF_B, BPF_REG_8, BPF_REG_9, 0),
BPF_EXIT_INSN(),
},
.prog_type = BPF_PROG_TYPE_SOCKET_FILTER,
.errstr_unpriv = "!read_ok",
.result_unpriv = REJECT,
.errstr = "!read_ok",
.result = REJECT,
},
[...]
{
"jset: range",
.insns = {
[...]
},
.prog_type = BPF_PROG_TYPE_SOCKET_FILTER,
.result_unpriv = ACCEPT,
.result = ACCEPT,
},
The main changes are:
1) Various BTF related improvements in order to get line info
working. Meaning, verifier will now annotate the corresponding
BPF C code to the error log, from Martin and Yonghong.
2) Implement support for raw BPF tracepoints in modules, from Matt.
3) Add several improvements to verifier state logic, namely speeding
up stacksafe check, optimizations for stack state equivalence
test and safety checks for liveness analysis, from Alexei.
4) Teach verifier to make use of BPF_JSET instruction, add several
test cases to kselftests and remove nfp specific JSET optimization
now that verifier has awareness, from Jakub.
5) Improve BPF verifier's slot_type marking logic in order to
allow more stack slot sharing, from Jiong.
6) Add sk_msg->size member for context access and add set of fixes
and improvements to make sock_map with kTLS usable with openssl
based applications, from John.
7) Several cleanups and documentation updates in bpftool as well as
auto-mount of tracefs for "bpftool prog tracelog" command,
from Quentin.
8) Include sub-program tags from now on in bpf_prog_info in order to
have a reliable way for user space to get all tags of the program
e.g. needed for kallsyms correlation, from Song.
9) Add BTF annotations for cgroup_local_storage BPF maps and
implement bpf fs pretty print support, from Roman.
10) Fix bpftool in order to allow for cross-compilation, from Ivan.
11) Update of bpftool license to GPLv2-only + BSD-2-Clause in order
to be compatible with libbfd and allow for Debian packaging,
from Jakub.
12) Remove an obsolete prog->aux sanitation in dump and get rid of
version check for prog load, from Daniel.
13) Fix a memory leak in libbpf's line info handling, from Prashant.
14) Fix cpumap's frame alignment for build_skb() so that skb_shared_info
does not get unaligned, from Jesper.
15) Fix test_progs kselftest to work with older compilers which are less
smart in optimizing (and thus throwing build error), from Stanislav.
16) Cleanup and simplify AF_XDP socket teardown, from Björn.
17) Fix sk lookup in BPF kselftest's test_sock_addr with regards
to netns_id argument, from Andrey.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Lots of conflicts, by happily all cases of overlapping
changes, parallel adds, things of that nature.
Thanks to Stephen Rothwell, Saeed Mahameed, and others
for their guidance in these resolutions.
Signed-off-by: David S. Miller <davem@davemloft.net>
The top word of the constant can only have bits set if sign
extension set it to all-1, therefore we don't really have to
mask the top half of the register. We can just OR it into
the result as is.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
The verifier will now understand the JSET instruction, so don't
mark the dead branch in the JIT as noop. We won't generate any
code, anyway.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Previously the identifier used for indirect block callback registry and
for block rule cb registry (when done via indirect blocks) was the pointer
to the netdev we were interested in receiving updates on. This worked fine
if a single app existed that registered one callback per netdev of
interest. However, if multiple cards are in place and, in turn, multiple
apps, then each app may register the same callback with the same
identifier to both the netdev's indirect block cb list and to a block's cb
list. This can lead to EEXIST errors and/or incorrect cb deletions.
Prevent this conflict by using the app pointer as the identifier for
netdev indirect block cb registry, allowing each app to register a unique
callback per netdev. For block cb registry, the same app may register
multiple cbs to the same block if using TC shared blocks. Instead of the
app, use the pointer to the allocated cb_priv data as the identifier here.
This means that there can be a unique block callback for each app/netdev
combo.
Fixes: 3166dd07a9 ("nfp: flower: offload tunnel decap rules via indirect TC blocks")
Reported-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
FW team asks to be able to not support RED even if NIC is capable
of buffering for testing and experimentation. Add an opt-out flag.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann says:
====================
pull-request: bpf-next 2018-12-11
The following pull-request contains BPF updates for your *net-next* tree.
It has three minor merge conflicts, resolutions:
1) tools/testing/selftests/bpf/test_verifier.c
Take first chunk with alignment_prevented_execution.
2) net/core/filter.c
[...]
case bpf_ctx_range_ptr(struct __sk_buff, flow_keys):
case bpf_ctx_range(struct __sk_buff, wire_len):
return false;
[...]
3) include/uapi/linux/bpf.h
Take the second chunk for the two cases each.
The main changes are:
1) Add support for BPF line info via BTF and extend libbpf as well
as bpftool's program dump to annotate output with BPF C code to
facilitate debugging and introspection, from Martin.
2) Add support for BPF_ALU | BPF_ARSH | BPF_{K,X} in interpreter
and all JIT backends, from Jiong.
3) Improve BPF test coverage on archs with no efficient unaligned
access by adding an "any alignment" flag to the BPF program load
to forcefully disable verifier alignment checks, from David.
4) Add a new bpf_prog_test_run_xattr() API to libbpf which allows for
proper use of BPF_PROG_TEST_RUN with data_out, from Lorenz.
5) Extend tc BPF programs to use a new __sk_buff field called wire_len
for more accurate accounting of packets going to wire, from Petar.
6) Improve bpftool to allow dumping the trace pipe from it and add
several improvements in bash completion and map/prog dump,
from Quentin.
7) Optimize arm64 BPF JIT to always emit movn/movk/movk sequence for
kernel addresses and add a dedicated BPF JIT backend allocator,
from Ard.
8) Add a BPF helper function for IR remotes to report mouse movements,
from Sean.
9) Various cleanups in BPF prog dump e.g. to make UAPI bpf_prog_info
member naming consistent with existing conventions, from Yonghong
and Song.
10) Misc cleanups and improvements in allowing to pass interface name
via cmdline for xdp1 BPF example, from Matteo.
11) Fix a potential segfault in BPF sample loader's kprobes handling,
from Daniel T.
12) Fix SPDX license in libbpf's README.rst, from Andrey.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Previously we did not ensure tcp flags have a place to be stored
when using IPv6. We correct this by including IPv6 key layer when
we match tcp flags and the IPv6 key layer has not been included
already.
Fixes: 07e1671cfc ("nfp: flower: refactor shared ip header in match offload")
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Several conflicts, seemingly all over the place.
I used Stephen Rothwell's sample resolutions for many of these, if not
just to double check my own work, so definitely the credit largely
goes to him.
The NFP conflict consisted of a bug fix (moving operations
past the rhashtable operation) while chaning the initial
argument in the function call in the moved code.
The net/dsa/master.c conflict had to do with a bug fix intermixing of
making dsa_master_set_mtu() static with the fixing of the tagging
attribute location.
cls_flower had a conflict because the dup reject fix from Or
overlapped with the addition of port range classifiction.
__set_phy_supported()'s conflict was relatively easy to resolve
because Andrew fixed it in both trees, so it was just a matter
of taking the net-next copy. Or at least I think it was :-)
Joe Stringer's fix to the handling of netns id 0 in bpf_sk_lookup()
intermixed with changes on how the sdif and caller_net are calculated
in these code paths in net-next.
The remaining BPF conflicts were largely about the addition of the
__bpf_md_ptr stuff in 'net' overlapping with adjustments and additions
to the relevant data structure where the MD pointer macros are used.
Signed-off-by: David S. Miller <davem@davemloft.net>
BPF_X support needs indirect shift mode, please see code comments for
details.
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code.
Signed-off-by: Yangtao Li <tiny.windzz@gmail.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
FW reconfiguration timeouts are a common indicator of FW trouble.
To make debugging easier print requested update and control word
when reconfiguration fails.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When troubleshooting incorrect FW capabilities it's useful to know
where the faulty TLV is located. Add offset to all errors messages.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
FW/HW can generally support the standard networking offloads
on representors without any trouble. Add the ability for FW
to advertise which features should be available on representors.
Because representors are muxed on top of the vNIC we need to listen
on feature changes of their lower devices, and update their features
appropriately.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Up until now we never needed to keep a networking locks around
representors accesses, we only accessed them when device was
reconfigured (under nfp pf->lock) or on fast path (under RCU).
Now we want to be able to iterate over all representors during
notifications, so make sure representor assignment is done
under RTNL lock.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Our representors are software devices built on top of the PF
vNIC, the queuing should only happen at the vNIC netdevice.
Allow representors to run qdisc-less.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Our representors are software devices built on top of the PF
vNIC, the only state they have are per-cpu stats, so make
the TX run locklessly.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation for TSO over representors make sure the port id
prepend will always fit in the frame. The current max header
length is 255, which is ample, so assume worst case scenario
of 8 byte prepend and save ourselves the conditionals.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The TSO-related offsets in the descriptor should not include
the length of the prepended metadata. Adjust them. Note that
this could not have caused issues in the past as we don't
support TSO with metadata prepend as of this patch.
Signed-off-by: Michael Rapson <michael.rapson@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
nd_q is only used at the very end of nfp_net_tx(), there is no need
to initialize it early.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move temporary variables in scope of the loop in nfp_net_tx_complete(),
and add a temp for txbuf software structure. This saves us 0.2% of CPU.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Chained descriptors for fragments need to duplicate all the descriptor
fields of the skb head, so we copy the descriptor and then modify the
relevant fields. This is wasteful, because the top half of the descriptor
will get overwritten entirely while the bottom half is not modified at all.
Copy only the bottom half. This saves us 0.3% of CPU in a GSO test.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For flow offload adds, if the rhash insert code fails, the flow will still
have been offloaded but the reference to it in the driver freed.
Re-order the offload setup calls to ensure that a flow will only be written
to FW if a kernel reference is held and stored in the rhashtable. Remove
this hashtable entry if the offload fails.
Fixes: c01d0efa51 ("nfp: flower: use rhashtable for flow caching")
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Calling nfp_compile_flow_metadata both assigns a stats context and
increments a ref counter on (or allocates) a mask id table entry. These
are released by the nfp_modify_flow_metadata call on flow deletion,
however, if a flow add fails after metadata is set then the flow entry
will be deleted but the metadata assignments leaked.
Add an error path to the flow add offload function to ensure allocated
metadata is released in the event of an offload fail.
Fixes: 81f3ddf254 ("nfp: add control message passing capabilities to flower offloads")
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann says:
====================
pull-request: bpf-next 2018-11-26
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) Extend BTF to support function call types and improve the BPF
symbol handling with this info for kallsyms and bpftool program
dump to make debugging easier, from Martin and Yonghong.
2) Optimize LPM lookups by making longest_prefix_match() handle
multiple bytes at a time, from Eric.
3) Adds support for loading and attaching flow dissector BPF progs
from bpftool, from Stanislav.
4) Extend the sk_lookup() helper to be supported from XDP, from Nitin.
5) Enable verifier to support narrow context loads with offset > 0
to adapt to LLVM code generation (currently only offset of 0 was
supported). Add test cases as well, from Andrey.
6) Simplify passing device functions for offloaded BPF progs by
adding callbacks to bpf_prog_offload_ops instead of ndo_bpf.
Also convert nfp and netdevsim to make use of them, from Quentin.
7) Add support for sock_ops based BPF programs to send events to
the perf ring-buffer through perf_event_output helper, from
Sowmini and Daniel.
8) Add read / write support for skb->tstamp from tc BPF and cg BPF
programs to allow for supporting rate-limiting in EDT qdiscs
like fq from BPF side, from Vlad.
9) Extend libbpf API to support map in map types and add test cases
for it as well to BPF kselftests, from Nikita.
10) Account the maximum packet offset accessed by a BPF program in
the verifier and use it for optimizing nfp JIT, from Jiong.
11) Fix error handling regarding kprobe_events in BPF sample loader,
from Daniel T.
12) Add support for queue and stack map type in bpftool, from David.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Original FW only allowed us to perform ECN marking. Newer releases
also support plain old drop. Add the ability to configure drop
policy. This is particularly useful in combination with GRED,
because different bands can have different ECN marking setting.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use offload of very simple u32 filters to direct packets to GRED
bands based on the DSCP marking. No u32 hashing is supported,
just plain simple filters matching on ToS or Priority with
appropriate mask device can support.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Learn how to set the DSCP map. FW uses a packed array which
geometry depends on the number of supported priorities and
virtual queues. Write code to assemble this map and to communicate
the setting to the FW via mailbox.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation for PRIO offload calculate how long the prio map
for FW will be and make sure the configuration can be performed
via the vNIC mailbox.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for GRED offload. It behaves much like RED, but
can apply different parameters to different bands. GRED operates
pretty much exactly like our HW/FW with a single FIFO and different
RED state instances.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wrap RED parameters and stats into a structure, and a 1-element
array. Upcoming GRED offload will add the support for more bands.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add up stats for all bands for the extra ethtool statistics.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In PRIO-enabled FW read the statistics from per-band symbol, rather
than from the standard per-PCIe-queue counters.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make sure the threshold table is large enough to hold information
for all bands.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation for per-band RED offload pass band parameter to
functions. For now it will always be 0.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation for multi-band RED offload if FW is capable map
the extended symbols which will allow us to set per-band parameters
and read stats.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation of handling more Qdisc types switch to a different
offload strategy. We have now recreated the Qdisc hierarchy in
the driver. Every time the hierarchy changes parse it, and update
the configuration of the HW accordingly.
While at it drop the support of pretending that we can instantiate
a single queue on a multi-queue device in HW/FW. MQ is now required,
and each queue will have its own instance of RED.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the new driver Qdisc structure to keep track of parameters
of RED Qdiscs. This way as the Qdisc moves around in the hierarchy
we will be able to configure the HW appropriately.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
RED qdisc will replace its child Qdisc with a new FIFO queue if
it is reconfigured and the limit parameter is not 0.
This means that when it's created with limit of 0 it will have no FIFO,
and all packets will be dropped. If it's changed and limit is specified
it will loose its existing child (implicit graft). Make sure we mark
RED Qdisc child as NFP_QDISC_UNTRACKED if its not the expected FIFO.
nfp_abm_qdisc_replace() will return 1 if Qdisc already existed.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Using graft notifications recreate in the driver the full Qdisc
hierarchy. Keep track of how many times each Qdisc is attached
to the hierarchy to make sure we don't offload Qdiscs which are
attached multiple times (device queues can't be shared). For
graft events of Qdiscs we don't know exist make the child as
invalid/untracked.
Note that MQ Qdisc doesn't send destruction events reliably when
device is dismantled, so we need to manually clean out the
children otherwise we'd think Qdiscs which are still in use
are getting freed.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To keep track of Qdisc hierarchy allocate a table for children
for each Qdisc. RED Qdisc can only have one child.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Keep track of which Qdisc is currently root. We need to implement
TC_SETUP_ROOT_QDISC handling, and for completeness also clear the
root Qdisc pointer when it's freed. TC_SETUP_ROOT_QDISC isn't always
sent when device is dismantled.
Remembering the root Qdisc will allow us to build the entire hierarchy
in following patches.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allocate an object corresponding to any offloaded qdisc we are
informed about by the kernel. Not only the qdiscs we have a
chance of offloading.
The count of created objects will be used to decide whether
the ethtool TC offload can be disabled, since otherwise we may
miss destroy commands.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of writing the threshold out when Qdisc is configured
and not remembering it move to a scheme where we remember all
thresholds. When configuration changes parse the offloaded
Qdiscs and set thresholds appropriately.
This will help future extensions.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rename qdiscs member to red_qdiscs. One of following patches will
use the name qdiscs for tracking all qdisc types.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Recent changes to NFP mean that stats updates from fw to driver no longer
require a flow lookup and (because egdev offload has been removed) the
ingress netdev for a lookup is now always known.
Remove obsolete code in a flow lookup that matches on host context and
that allows for a netdev to be NULL.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Previously, only tunnel decap rules required egdev registration for
offload in NFP. These are now supported via indirect TC block callbacks.
Remove the egdev code from NFP.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Previously, TC block tunnel decap rules were only offloaded when a
callback was triggered through registration of the rules egress device.
This meant that the driver had no access to the ingress netdev and so
could not verify it was the same tunnel type that the rule implied.
Register tunnel devices for indirect TC block offloads in NFP, giving
access to new rules based on the ingress device rather than egress. Use
this to verify the netdev type of VXLAN and Geneve based rules and offload
the rules to HW if applicable.
Tunnel registration is done via a netdev notifier. On notifier
registration, this is triggered for already existing netdevs. This means
that NFP can register for offloads from devices that exist before it is
loaded (filter rules will be replayed from the TC core). Similarly, on
notifier unregister, a call is triggered for each currently active netdev.
This allows the driver to unregister any indirect block callbacks that may
still be active.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Both the actions and tunnel_conf files contain local functions that check
the type of an input netdev. In preparation for re-use with tunnel offload
via indirect blocks, move these to static inline functions in a header
file.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Previously the offload functions in NFP assumed that the ingress (or
egress) netdev passed to them was an nfp repr.
Modify the driver to permit the passing of non repr netdevs as the ingress
device for an offload rule candidate. This may include devices such as
tunnels. The driver should then base its offload decision on a combination
of ingress device and egress port for a rule.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The kernel functions to prepare verifier and translate for offloaded
program retrieve "offload" from "prog", and "netdev" from "offload".
Then both "prog" and "netdev" are passed to the callbacks.
Simplify this by letting the drivers retrieve the net device themselves
from the offload object attached to prog - if they need it at all. There
is currently no need to pass the netdev as an argument to those
functions.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Function bpf_prog_offload_verifier_prep(), called from the kernel BPF
verifier to run a driver-specific callback for preparing for the
verification step for offloaded programs, takes a pointer to a struct
bpf_verifier_env object. However, no driver callback needs the whole
structure at this time: the two drivers supporting this, nfp and
netdevsim, only need a pointer to the struct bpf_prog instance held by
env.
Update the callback accordingly, on kernel side and in these two
drivers.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
As part of the transition from ndo_bpf() to callbacks attached to struct
bpf_offload_dev for some of the eBPF offload operations, move the
functions related to program destruction to the struct and remove the
subcommand that was used to call them through the NDO.
Remove function __bpf_offload_ndo(), which is no longer used.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
As part of the transition from ndo_bpf() to callbacks attached to struct
bpf_offload_dev for some of the eBPF offload operations, move the
functions related to code translation to the struct and remove the
subcommand that was used to call them through the NDO.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
In a way similar to the change previously brought to the verify_insn
hook and to the finalize callback, switch to the newly added ops in
struct bpf_prog_offload for calling the functions used to prepare driver
verifiers.
Since the dev_ops pointer in struct bpf_prog_offload is no longer used
by any callback, we can now remove it from struct bpf_prog_offload.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
For passing device functions for offloaded eBPF programs, there used to
be no place where to store the pointer without making the non-offloaded
programs pay a memory price.
As a consequence, three functions were called with ndo_bpf() through
specific commands. Now that we have struct bpf_offload_dev, and since
none of those operations rely on RTNL, we can turn these three commands
into hooks inside the struct bpf_prog_offload_ops, and pass them as part
of bpf_offload_dev_create().
This commit effectively passes a pointer to the struct to
bpf_offload_dev_create(). We temporarily have two struct
bpf_prog_offload_ops instances, one under offdev->ops and one under
offload->dev_ops. The next patches will make the transition towards the
former, so that offload->dev_ops can be removed, and callbacks relying
on ndo_bpf() added to offdev->ops as well.
While at it, rename "nfp_bpf_analyzer_ops" as "nfp_bpf_dev_ops" (and
similarly for netdevsim).
Suggested-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
We are about to add several new callbacks to the struct, all of them
defined in offload.c. Move the struct bpf_prog_offload_ops object in
that file. As a consequence, nfp_verify_insn() and nfp_finalize() can no
longer be static.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
__netdev_tx_sent_queue() was added in commit e59020abf0f
("net: bql: add __netdev_tx_sent_queue()") and allows for
better GSO performance.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
NFP is refusing to offload programs whenever the MTU is set to a value
larger than the max packet bytes that fits in NFP Cluster Target Memory
(CTM). However, a eBPF program doesn't always need to access the whole
packet data.
Verifier has always calculated maximum direct packet access (DPA) offset,
and kept it in max_pkt_offset inside prog auxiliar information. This patch
relax prog rejection based on max_pkt_offset.
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
RED Qdisc will now inform the drivers about the state of the harddrop
flag. Refuse to offload in case harddrop is set.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Turns out the threshold value is used in signed compares in the FW,
so we should avoid setting the top bit.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Improve log messages printed when RED can't be offloaded because
of Qdisc parameters.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In certain cases initialization logic which follows allocation of
the vNIC structure may want to validate the capabilities of that vNIC.
This is easy before vNIC is initialized for normal capabilities which
are at fixed offsets in control memory, easy to locate and read, but
poses a challenge if the capabilities are in form of TLVs. Parse
the TLVs early on so other code can just access parsed info, instead
of having to do the parsing by itself.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move setting ctrl_bar pointer to the nfp_net_alloc function,
to make sure we can parse capabilities early in the following
patch.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The Qdisc offload code is logically separate, and we will soon
do significant surgery on it to support more Qdiscs, so move
it to a separate file.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Offload of geneve decap rules is supported in NFP. Include geneve in the
check for supported types.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make use of the recently added VXLAN and geneve helper functions to
determine the type of the netdev from its rtnl_link_ops.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use driver's common notifier for LAG and tunnel configuration.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Code interested in networking events registers its own notifier
handlers. Create one device-wide notifier instance.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
nfp_fl_lag_changels_event() never fails, and therefore we would
never return NOTIFY_BAD for NETDEV_CHANGELOWERSTATE. Make this
clearer by changing nfp_fl_lag_changels_event()'s return type
to void.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Returning an error from a notifier means we want to veto the change.
We shouldn't veto NETDEV_UNREGISTER just because we couldn't find
the tracking info for given master.
I can't seem to find a way to trigger this unless we have some
other bug, so it's probably not fix-worthy.
While at it move the checking if the netdev really is of interest
into the handling functions, like we do for other events.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For flower tunnel offloads FW has to be informed about MAC addresses
of tunnel devices. We use a netdev notifier to keep track of these
addresses.
Remove unnecessary loop over netdevices after notifier is registered.
The intention of the loop was to catch devices which already existed
on the system before nfp driver got loaded, but netdev notifier will
replay NETDEV_REGISTER events.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add ipv6 set flow label and hop limit action offload. Since pedit sets
headers per 4 byte word, we need to ensure that setting either version,
priority, payload_len or nexthdr does not get offloaded.
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add ipv4 set ttl and tos action offload. Since pedit sets headers per 4
byte word, we need to ensure that setting either version, ihl, protocol,
total length or checksum does not get offloaded.
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann says:
====================
pull-request: bpf-next 2018-10-21
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) Implement two new kind of BPF maps, that is, queue and stack
map along with new peek, push and pop operations, from Mauricio.
2) Add support for MSG_PEEK flag when redirecting into an ingress
psock sk_msg queue, and add a new helper bpf_msg_push_data() for
insert data into the message, from John.
3) Allow for BPF programs of type BPF_PROG_TYPE_CGROUP_SKB to use
direct packet access for __skb_buff, from Song.
4) Use more lightweight barriers for walking perf ring buffer for
libbpf and perf tool as well. Also, various fixes and improvements
from verifier side, from Daniel.
5) Add per-symbol visibility for DSO in libbpf and hide by default
global symbols such as netlink related functions, from Andrey.
6) Two improvements to nfp's BPF offload to check vNIC capabilities
in case prog is shared with multiple vNICs and to protect against
mis-initializing atomic counters, from Jakub.
7) Fix for bpftool to use 4 context mode for the nfp disassembler,
also from Jakub.
8) Fix a return value comparison in test_libbpf.sh and add several
bpftool improvements in bash completion, documentation of bpf fs
restrictions and batch mode summary print, from Quentin.
9) Fix a file resource leak in BPF selftest's load_kallsyms()
helper, from Peng.
10) Fix an unused variable warning in map_lookup_and_delete_elem(),
from Alexei.
11) Fix bpf_skb_adjust_room() signature in BPF UAPI helper doc,
from Nicolas.
12) Add missing executables to .gitignore in BPF selftests, from Anders.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
net/sched/cls_api.c has overlapping changes to a call to
nlmsg_parse(), one (from 'net') added rtm_tca_policy instead of NULL
to the 5th argument, and another (from 'net-next') added cb->extack
instead of NULL to the 6th argument.
net/ipv4/ipmr_base.c is a case of a bug fix in 'net' being done to
code which moved (to mr_table_dump)) in 'net-next'. Thanks to David
Ahern for the heads up.
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the ability to determine whether a netdev is a VxLAN netdev by
calling the above mentioned function that checks the netdev's
rtnl_link_ops.
This will allow modules to identify netdev events involving a VxLAN
netdev and act accordingly. For example, drivers capable of VxLAN
offload will need to configure the underlying device when a VxLAN netdev
is being enslaved to an offloaded bridge.
Convert nfp to use the newly introduced helper.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Program translation stage checks that program can be offloaded to
the netdev which was passed during the load (bpf_attr->prog_ifindex).
After program sharing was introduced, however, the netdev on which
program is loaded can theoretically be different, and therefore
we should recheck the program size and max stack size at load time.
This was found by code inspection, AFAIK today all vNICs have
identical caps.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Atomic operations on the NFP are currently always in big endian.
The driver keeps track of regions of memory storing atomic values
and byte swaps them accordingly. There are corner cases where
the map values may be initialized before the driver knows they
are used as atomic counters. This can happen either when the
datapath is performing the update and the stack contents are
unknown or when map is updated before the program which will
use it for atomic values is loaded.
To avoid situation where user initializes the value to 0 1 2 3
and then after loading a program which uses the word as an atomic
counter starts reading 3 2 1 0 - only allow atomic counters to be
initialized to endian-neutral values.
For updates from the datapath the stack information may not be
as precise, so just allow initializing such values to 0.
Example code which would break:
struct bpf_map_def SEC("maps") rxcnt = {
.type = BPF_MAP_TYPE_HASH,
.key_size = sizeof(__u32),
.value_size = sizeof(__u64),
.max_entries = 1,
};
int xdp_prog1()
{
__u64 nonzeroval = 3;
__u32 key = 0;
__u64 *value;
value = bpf_map_lookup_elem(&rxcnt, &key);
if (!value)
bpf_map_update_elem(&rxcnt, &key, &nonzeroval, BPF_ANY);
else
__sync_fetch_and_add(value, 1);
return XDP_PASS;
}
$ offload bpftool map dump
key: 00 00 00 00 value: 00 00 00 03 00 00 00 00
should be:
$ offload bpftool map dump
key: 00 00 00 00 value: 03 00 00 00 00 00 00 00
Reported-by: David Beckett <david.beckett@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Previously when populating the set ipv6 address action, we incorrectly
made use of pedit's key index to determine which 32bit word should be
set. We now calculate which word has been selected based on the offset
provided by the pedit action.
Fixes: 354b82bb32 ("nfp: add set ipv6 source and destination address")
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Previously we only allowed a single header key per pedit action to
change the header. This used to result in the last header key in the
pedit action to overwrite previous headers. We now keep track of them
and allow multiple header keys per pedit action.
Fixes: c0b1bd9a8b ("nfp: add set ipv4 header action flower offload")
Fixes: 354b82bb32 ("nfp: add set ipv6 source and destination address")
Fixes: f8b7b0a6b1 ("nfp: add set tcp and udp header action flower offload")
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Previously we did not correctly change headers when using multiple
pedit actions with partial masks. We now take this into account and
no longer just commit the last pedit action.
Fixes: c0b1bd9a8b ("nfp: add set ipv4 header action flower offload")
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This commit makes it possible to use devlink to split the 100G CXP
Netronome into two 40G interfaces. Currently when you ask for 2
interfaces, the math in src/nfp_devlink.c:nfp_devlink_port_split
calculates that you want 5 lanes per port because for some reason
eth_port.port_lanes=10 (shouldn't this be 12 for CXP?). What we really
want when asking for 2 breakout interfaces is 4 lanes per port. This
commit makes that happen by calculating based on 8 lanes if 10 are
present.
Signed-off-by: Ryan C Goodfellow <rgoodfel@isi.edu>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Greg Weeks <greg.weeks@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace the repeated license text with SDPX identifiers.
While at it bump the Copyright dates for files we touched
this year.
Signed-off-by: Edwin Peer <edwin.peer@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Nic Viljoen <nick.viljoen@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Read the host context count symbols provided by firmware and use
it to determine the number of allocated stats ids. Previously it
won't be possible to offload more than 2^17 filter even if FW was
able to do so.
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make use of an array stats instead of storing stats per flow which
would require a hash lookup at critical times.
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make use of relativistic hash tables for tracking flows instead
of fixed sized hash tables.
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexei Starovoitov says:
====================
pull-request: bpf-next 2018-10-08
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) sk_lookup_[tcp|udp] and sk_release helpers from Joe Stringer which allow
BPF programs to perform lookups for sockets in a network namespace. This would
allow programs to determine early on in processing whether the stack is
expecting to receive the packet, and perform some action (eg drop,
forward somewhere) based on this information.
2) per-cpu cgroup local storage from Roman Gushchin.
Per-cpu cgroup local storage is very similar to simple cgroup storage
except all the data is per-cpu. The main goal of per-cpu variant is to
implement super fast counters (e.g. packet counters), which don't require
neither lookups, neither atomic operations in a fast path.
The example of these hybrid counters is in selftests/bpf/netcnt_prog.c
3) allow HW offload of programs with BPF-to-BPF function calls from Quentin Monnet
4) support more than 64-byte key/value in HW offloaded BPF maps from Jakub Kicinski
5) rename of libbpf interfaces from Andrey Ignatov.
libbpf is maturing as a library and should follow good practices in
library design and implementation to play well with other libraries.
This patch set brings consistent naming convention to global symbols.
6) relicense libbpf as LGPL-2.1 OR BSD-2-Clause from Alexei Starovoitov
to let Apache2 projects use libbpf
7) various AF_XDP fixes from Björn and Magnus
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Mark instructions that use pointers to areas in the stack outside of the
current stack frame, and process them accordingly in mem_op_stack().
This way, we also support BPF-to-BPF calls where the caller passes a
pointer to data in its own stack frame to the callee (typically, when
the caller passes an address to one of its local variables located in
the stack, as an argument).
Thanks to Jakub and Jiong for figuring out how to deal with this case,
I just had to turn their email discussion into this patch.
Suggested-by: Jiong Wang <jiong.wang@netronome.com>
Suggested-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
When pre-processing the instructions, it is trivial to detect what
subprograms are using R6, R7, R8 or R9 as destination registers. If a
subprogram uses none of those, then we do not need to jump to the
subroutines dedicated to saving and restoring callee-saved registers in
its prologue and epilogue.
This patch introduces detection of callee-saved registers in subprograms
and prevents the JIT from adding calls to those subroutines whenever we
can: we save some instructions in the translated program, and some time
at runtime on BPF-to-BPF calls and returns.
If no subprogram needs to save those registers, we can avoid appending
the subroutines at the end of the program.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
On performing a BPF-to-BPF call, we first jump to a subroutine that
pushes callee-saved registers (R6~R9) to the stack, and from there we
goes to the start of the callee next. In order to do so, the caller must
pass to the subroutine the address of the NFP instruction to jump to at
the end of that subroutine. This cannot be reliably implemented when
translated the caller, as we do not always know the start offset of the
callee yet.
This patch implement the required fixup step for passing the start
offset in the callee via the register used by the subroutine to hold its
return address.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Relocation for targets of BPF-to-BPF calls are required at the end of
translation. Update the nfp_fixup_branches() function in that regard.
When checking that the last instruction of each bloc is a branch, we
must account for the length of the instructions required to pop the
return address from the stack.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Offloaded programs using BPF-to-BPF calls use the stack to store the
return address when calling into a subprogram. Callees also need some
space to save eBPF registers R6 to R9. And contrarily to kernel
verifier, we align stack frames on 64 bytes (and not 32). Account for
all this when checking the stack size limit before JIT-ing the program.
This means we have to recompute maximum stack usage for the program, we
cannot get the value from the kernel.
In addition to adapting the checks on stack usage, move them to the
finalize() callback, now that we have it and because such checks are
part of the verification step rather than translation.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
This is the main patch for the logics of BPF-to-BPF calls in the nfp
driver.
The functions called on BPF_JUMP | BPF_CALL and BPF_JUMP | BPF_EXIT were
used to call helpers and exit from the program, respectively; make them
usable for calling into, or returning from, a BPF subprogram as well.
For all calls, push the return address as well as the callee-saved
registers (R6 to R9) to the stack, and pop them upon returning from the
calls. In order to limit the overhead in terms of instruction number,
this is done through dedicated subroutines. Jumping to the callee
actually consists in jumping to the subroutine, that "returns" to the
callee: this will require some fixup for passing the address in a later
patch. Similarly, returning consists in jumping to the subroutine, which
pops registers and then return directly to the caller (but no fixup is
needed here).
Return to the caller is performed with the RTN instruction newly added
to the JIT.
For the few steps where we need to know what subprogram an instruction
belongs to, the struct nfp_insn_meta is extended with a new subprog_idx
field.
Note that checks on the available stack size, to take into account the
additional requirements associated to BPF-to-BPF calls (storing R6-R9
and return addresses), are added in a later patch.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Similarly to "exit" or "helper call" instructions, BPF-to-BPF calls will
require additional processing before translation starts, in order to
record and mark jump destinations.
We also mark the instructions where each subprogram begins. This will be
used in a following commit to determine where to add prologues for
subprograms.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
The checks related to eBPF helper calls are performed each time the nfp
driver meets a BPF_JUMP | BPF_CALL instruction. However, these checks
are not relevant for BPF-to-BPF call (same instruction code, different
value in source register), so just skip the checks for such calls.
While at it, rename the function that runs those checks to make it clear
they apply to _helper_ calls only.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
In order to support BPF-to-BPF calls in offloaded programs, the nfp
driver must collect information about the distinct subprograms: namely,
the number of subprograms composing the complete program and the stack
depth of those subprograms. The latter in particular is non-trivial to
collect, so we copy those elements from the kernel verifier via the
newly added post-verification hook. The struct nfp_prog is extended to
store this information. Stack depths are stored in an array of dedicated
structs.
Subprogram start indexes are not collected. Instead, meta instructions
associated to the start of a subprogram will be marked with a flag in a
later patch.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
In preparation for support for BPF to BPF calls in offloaded programs,
rename the "stack_depth" field of the struct nfp_prog as
"stack_frame_depth". This is to make it clear that the field refers to
the maximum size of the current stack frame (as opposed to the maximum
size of the whole stack memory).
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
In preparation for BPF-to-BPF calls in offloaded programs, add a new
function attribute to the struct bpf_prog_offload_ops so that drivers
supporting eBPF offload can hook at the end of program verification, and
potentially extract information collected by the verifier.
Implement a minimal callback (returning 0) in the drivers providing the
structs, namely netdevsim and nfp.
This will be useful in the nfp driver, in later commits, to extract the
number of subprograms as well as the stack depth for those subprograms.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
mlx5 core driver and ethernet netdev updates, please note there is a small
devlink releated update to allow extack argument to eswitch operations.
From Eli Britstein,
1) devlink: Add extack argument to the eswitch related operations
2) net/mlx5e: E-Switch, return extack messages for failures in the e-switch devlink callbacks
3) net/mlx5e: Add extack messages for TC offload failures
From Eran Ben Elisha,
4) mlx5e: Add counter for aRFS rule insertion failures
From Feras Daoud
5) Fast teardown support for mlx5 device
This change introduces the enhanced version of the "Force teardown" that
allows SW to perform teardown in a faster way without the need to reclaim
all the FW pages.
Fast teardown provides the following advantages:
1- Fix a FW race condition that could cause command timeout
2- Avoid moving to polling mode
3- Close the vport to prevent PCI ACK to be sent without been scatter
to memory
-----BEGIN PGP SIGNATURE-----
iQEcBAABAgAGBQJbtU45AAoJEEg/ir3gV/o+/C4H/RHA4KImrb476EdB3VNYMqAN
dgXb+bmh6sZP+jHWqQ4c3aVeh6/T8qm4gwiSn2nVTtHEnxtCdIYljzDC1Nswczeg
pSjD1eOP7M1LpAOmBb8xdnJcX7yM7r1bTklnp2sN853WShbsDRYgZBHsBwTzx25U
ZdzL4QTLuohlG/aLrbGXMntIy45ya2fVQrnK54s18nFlgsdFjEs0mi0xaUKNBC6+
P8CTohHAxuuxmL5b+6MIYLZCdgd8cLNQFdtqbckEVw7SvcRTxfraRlyqJ0YOgTGB
TdSWnqZz2JYH29wSFbpFG8qX6GCv8FoiZ+fKzldbolHk442rrktHv3+Y7qQuZVs=
=NVks
-----END PGP SIGNATURE-----
Merge tag 'mlx5-updates-2018-10-03' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2018-10-03
mlx5 core driver and ethernet netdev updates, please note there is a small
devlink releated update to allow extack argument to eswitch operations.
From Eli Britstein,
1) devlink: Add extack argument to the eswitch related operations
2) net/mlx5e: E-Switch, return extack messages for failures in the e-switch devlink callbacks
3) net/mlx5e: Add extack messages for TC offload failures
From Eran Ben Elisha,
4) mlx5e: Add counter for aRFS rule insertion failures
From Feras Daoud
5) Fast teardown support for mlx5 device
This change introduces the enhanced version of the "Force teardown" that
allows SW to perform teardown in a faster way without the need to reclaim
all the FW pages.
Fast teardown provides the following advantages:
1- Fix a FW race condition that could cause command timeout
2- Avoid moving to polling mode
3- Close the vport to prevent PCI ACK to be sent without been scatter
to memory
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Minor conflict in net/core/rtnetlink.c, David Ahern's bug fix in 'net'
overlapped the renaming of a netlink attribute in net-next.
Signed-off-by: David S. Miller <davem@davemloft.net>
Add extack argument to the eswitch related operations.
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
When FW floods the driver with control messages try to exit the cmsg
processing loop every now and then to avoid soft lockups. Cmsg
processing is generally very lightweight so 512 seems like a reasonable
budget, which should not be exceeded under normal conditions.
Fixes: 77ece8d5f1 ("nfp: add control vNIC datapath")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Tested-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In current ABI the size of the messages carrying map elements was
statically defined to at most 16 words of key and 16 words of value
(NFP word is 4 bytes). We should not make this assumption and use
the max key and value sizes from the BPF capability instead.
To make sure old kernels don't get surprised with larger (or smaller)
messages bump the FW ABI version to 3 when key/value size is different
than 16 words.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Some apps may want to have higher MTU on the control vNIC/queue.
Allow them to set the requested MTU at init time.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Up until now we only had per-vNIC BPF ABI version capabilities,
which are slightly awkward to use because bulk of the resources
and configuration does not relate to any particular vNIC. Add
a new capability for global ABI version and check the per-vNIC
version are equal to it. Assume the ABI version 2 if no explicit
version capability is present.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reserve two TLV types for feature development, and warn in the driver
if they ever leak into production.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Version bump conflict in batman-adv, take what's in net-next.
iavf conflict, adjustment of netdev_ops in net-next conflicting
with poll controller method removal in net.
Signed-off-by: David S. Miller <davem@davemloft.net>
As diagnosed by Song Liu, ndo_poll_controller() can
be very dangerous on loaded hosts, since the cpu
calling ndo_poll_controller() might steal all NAPI
contexts (for all RX/TX queues of the NIC). This capture
can last for unlimited amount of time, since one
cpu is generally not able to drain all the queues under load.
nfp uses NAPI for TX completions, so we better let core
networking stack call the napi->poll() to avoid the capture.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Tested-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
NFP supports fairly enormous ring sizes (up to 256k descriptors).
In commit 4662717038 ("nfp: use kvcalloc() to allocate SW buffer
descriptor arrays") we have started using kvcalloc() functions to
make sure the allocation of software state arrays doesn't hit
the MAX_ORDER limit. Unfortunately, we can't use virtual mappings
for the DMA region holding HW descriptors. In case this allocation
fails instead of the generic (and fairly scary) warning/splat in
the logs print a helpful message explaining what happened and
suggesting how to fix it.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Report in standard netdev stats drops and errors as well as
RX multicast from the FW vNIC counters.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This fixes a bug where ipv6 tunnels would report that it is
getting offloaded to hardware but would actually be rejected
by hardware.
Fixes: b27d6a95a7 ("nfp: compile flower vxlan tunnel set actions")
Signed-off-by: Louis Peens <louis.peens@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Previously we only checked if the vlan id field is present when trying
to match a vlan tag. The vlan id and vlan pcp field should be treated
independently.
Fixes: 5571e8c9f2 ("nfp: extend flower matching capabilities")
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As you are already in a tasklet, it is unnecessary to call spin_lock_bh.
Signed-off-by: jun qian <hangdianqj@163.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
VXLAN and GRE FW features have to currently be both advertised
for the driver to enable them. Separate the handling.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With the accesses to rtsyms now all going via special helpers
we can easily make sure the driver is not reading past the
end of the symbol.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Francois H. Theron <francois.theron@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For ease of debug preface all error messages with the name
of the symbol which caused them. Use the same message format
for existing messages while at it.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Francois H. Theron <francois.theron@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Return the error and report value through the output param.
Fixes: 640917dd81 ("nfp: support access to absolute RTsyms")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Francois H. Theron <francois.theron@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To avoid leaking a running timer we need to wait for the
posted reconfigs after netdev is unregistered. In common
case the process of deinitializing the device will perform
synchronous reconfigs which wait for posted requests, but
especially with VXLAN ports being actively added and removed
there can be a race condition leaving a timer running after
adapter structure is freed leading to a crash.
Add an explicit flush after deregistering and for a good
measure a warning to check if timer is running just before
structures are freed.
Fixes: 3d780b926a ("nfp: add async reconfiguration mechanism")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make the RTsym users access the size via the helper, which
takes care of special handling of absolute symbols.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Francois H. Theron <francois.theron@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support in nfpcore for reading the absolute RTsyms.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Francois H. Theron <francois.theron@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Convert all users of RTsym to the new set of helpers which
handle all targets correctly.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Francois H. Theron <francois.theron@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make nfp_rtsym_{read,write}_le() and nfp_rtsym_map() use the new
target resolution helpers to allow accessing in-cache symbols.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Francois H. Theron <francois.theron@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Align nfp_cpp_map_area() with other CPP-level APIs and pass
encoded cpp_id/dest rather than target, action, domain tuple.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Francois H. Theron <francois.theron@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
RTsyms may have special encodings for more complex symbol types.
For example symbols which are placed in external memory unit's
cache directly, constants or local memory. Add set of helpers
which will check for those special encodings and handle them
correctly.
For now only add direct cache accesses, we don't have a need to
access the other ones in foreseeable future.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Francois H. Theron <francois.theron@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add error prints to CPP target encoding/decoding logic, otherwise
it's quite hard to pin point the reasons why read or write
operations fail.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Francois H. Theron <francois.theron@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We will soon need the MU locality field offset much more
often than just for decoding MIP address. Save it in nfp_cpp
for quick access. Note that we can already reuse the target
config from nfp_cpp, no need to do the XPB read.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Francois H. Theron <francois.theron@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use a switch statement instead of ifs for code dependent
on chip version. While at it make sure we fail for unknown
chip revisions.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add NFP5000 to supported chips, the chip is backward compatible
with NFP4000 and NFP6000, so core PCIe code needs to handle it
the same way as 4k and 6k.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In multi-host scenarios Management FW may allocate MAC addresses
at runtime, we have to use the indirect lookup to find them.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Management FW can adjust some of the information in the HWinfo table
at runtime. In some cases reading the table directly will not yield
correct results. Add a NSP command for looking up information.
Up until now we weren't making use of any of the values which may
get adjusted.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To enable easier FW distribution NFP can now automatically
select between FW stored on the flash and loaded from the
kernel.
If FW loading policy is set to auto it will compare the
versions of FW from the host and from the flash and load
the newer one. If FW type doesn't match (e.g. one advanced
application vs another) the FW from the host takes precedence,
unless one of them is the basic NIC firmware, in which case
the non-basic-NIC FW is selected.
This automatic selection mechanism requires we inform user
what the verdict was. Print a message to the logs explaining
the decision and the reason.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Flash may contain a default NFP application FW. This application
can either be put there by the user (with ethtool -f) or shipped
with the card. If file system FW is not found, attempt to load
this flash stored app FW.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is already a fair number of arguments to nfp_nsp_command()
family of functions. Encapsulate them into structures to make
adding new ones easier. No functional changes.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After commit 90b73b77d0, list_head is no longer needed.
Now we just need to convert the list iteration to array
iteration for drivers.
Fixes: 90b73b77d0 ("net: sched: change action API to use array of pointers to actions")
Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove 'Return:' information from functions which no longer
return a value. Also update name and return types of nfp_nffw_info
access functions.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce a new layer for matching on geneve options. This allows
offloading filters configured to match geneve with options.
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce new push geneve option action. This allows offloading
filters configured to entunnel geneve with options.
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The addition of FLOW_DISSECTOR_KEY_ENC_IP to TC flower means that the ToS
and TTL of the tunnel header can now be matched on.
Extend the NFP tunnel match function to include these new fields.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The TTL for encapsulating headers in IPv4 UDP tunnels is taken from a
route lookup. Modify this to first check if a user has specified a TTL to
be used in the TC action.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann says:
====================
pull-request: bpf-next 2018-08-07
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) Add cgroup local storage for BPF programs, which provides a fast
accessible memory for storing various per-cgroup data like number
of transmitted packets, etc, from Roman.
2) Support bpf_get_socket_cookie() BPF helper in several more program
types that have a full socket available, from Andrey.
3) Significantly improve the performance of perf events which are
reported from BPF offload. Also convert a couple of BPF AF_XDP
samples overto use libbpf, both from Jakub.
4) seg6local LWT provides the End.DT6 action, which allows to
decapsulate an outer IPv6 header containing a Segment Routing Header.
Adds this action now to the seg6local BPF interface, from Mathieu.
5) Do not mark dst register as unbounded in MOV64 instruction when
both src and dst register are the same, from Arthur.
6) Define u_smp_rmb() and u_smp_wmb() to their respective barrier
instructions on arm64 for the AF_XDP sample code, from Brian.
7) Convert the tcp_client.py and tcp_server.py BPF selftest scripts
over from Python 2 to Python 3, from Jeremy.
8) Enable BTF build flags to the BPF sample code Makefile, from Taeung.
9) Remove an unnecessary rcu_read_lock() in run_lwt_bpf(), from Taehee.
10) Several improvements to the README.rst from the BPF documentation
to make it more consistent with RST format, from Tobin.
11) Replace all occurrences of strerror() by calls to strerror_r()
in libbpf and fix a FORTIFY_SOURCE build error along with it,
from Thomas.
12) Fix a bug in bpftool's get_btf() function to correctly propagate
an error via PTR_ERR(), from Yue.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for adjust_tail. There are no FW changes needed but add
a FW capability just in case there would be any issue with previously
released FW, or we will have to change the ABI in the future.
The helper is trivial and shouldn't be used too often so just inline
the body of the function. We add the delta to locally maintained
packet length register and check for overflow, since add of negative
value must overflow if result is positive. Note that if delta of 0
would be allowed in the kernel this trick stops working and we need
one more instruction to compare lengths before and after the change.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
The BTF conflicts were simple overlapping changes.
The virtio_net conflict was an overlap of a fix of statistics counter,
happening alongisde a move over to a bonafide statistics structure
rather than counting value on the stack.
Signed-off-by: David S. Miller <davem@davemloft.net>
'app' is dereferenced before used for the devlink trace point.
In case FW is buggy and sends a control message to a VF queue
we should make sure app is not NULL.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Function nfp_flower_repr_get_type_and_port expects an enum nfp_repr_type
return value but, if the repr type is unknown, returns a value of type
enum nfp_flower_cmsg_port_type. This means that if FW encodes the port
ID in a way the driver does not understand instead of dropping the frame
driver may attribute it to a physical port (uplink) provided the port
number is less than physical port count.
Fix this and ensure a net_device of NULL is returned if the repr can not
be determined.
Fixes: 1025351a88 ("nfp: add flower app")
Signed-off-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
FW can put constraints on map element size to maximize resource
use and efficiency. When user attempts offload of a map which
does not fit into those constraints an informational message is
printed to kernel logs to inform user about the reason offload
failed. Map offload does not have access to any advanced error
reporting like verifier log or extack. There is also currently
no way for us to nicely expose the FW capabilities to user
space. Given all those constraints we should make sure log
messages are as informative as possible. Improve them.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Record perf maps by map ID, not raw kernel pointer. This helps
with debug messages, because printing pointers to logs is frowned
upon, and makes debug easier for the users, as map ID is something
they should be more familiar with. Note that perf maps are offload
neutral, therefore IDs won't be orphaned.
While at it use a rate limited print helper for the error message.
Reported-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Control queue is fairly low latency, and requires SKB allocations,
which means we can't even reach 0.5Msps with perf events. Allow
perf events to be delivered to data queues. This allows us to not
only use multiple queues, but also receive and deliver to user space
more than 5Msps per queue (Xeon E5-2630 v4 2.20GHz, no retpolines).
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
In preparation for SKB-less perf event handling make
nfp_bpf_event_output() take buffer address and length,
not SKB as parameters.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Port id 0xffffffff is reserved for control messages. Allow reception
of messages with this id on data queues. Hand off a raw buffer to
the higher layer code, without allocating SKB for max efficiency.
The RX handle can't modify or keep the buffer, after it returns
buffer is handed back over to the NIC RX free buffer list.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Representor packets are received on PF queues with special metadata tag
for demux. There is no reason to resolve the representor ID -> netdev
after the skb has been allocated. Move the code, this will allow us to
handle special FW messages without SKB allocation overhead.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Use array_size() and store the size as full size_t to protect from
theoretical size overflow when handling HW descriptor rings.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 7f1c684a89 ("nfp: setup xdp_rxq_info") mixed the cache
cold and cache hot data in the nfp_net_rx_ring structure (ignoring
the feedback), to try to fit the structure into 2 cache lines
after struct xdp_rxq_info was added. Now that we are about to add
a new field the structure will grow back to 3 cache lines, so
order the members correctly.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use kvcalloc() instead of tmp variable + kzalloc() when allocating
SW buffer information to allow falling back to vmalloc and to protect
from theoretical integer overflow.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On machines with buggy ACPI tables or when SR-IOV is already enabled
we may not be able to set the SR-IOV VF limit in sysfs, it's not fatal
because the limit is imposed by the driver anyway. Only the sysfs
'sriov_totalvfs' attribute will be too high. Print an error to inform
user about the failure but allow probe to continue.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After device is stopped we reset the rings by moving all free buffers
to positions [0, cnt - 2], and clear the position cnt - 1 in the ring.
We then proceed to clear the read/write pointers. This means that if
we try to reset the ring again the code will assume that the next to
fill buffer is at position 0 and swap it with cnt - 1. Since we
previously cleared position cnt - 1 it will lead to leaking the first
buffer and leaving ring in a bad state.
This scenario can only happen if FW communication fails, in which case
the ring will never be used again, so the fact it's in a bad state will
not be noticed. Buffer leak is the only problem. Don't try to move
buffers in the ring if the read/write pointers indicate the ring was
never used or have already been reset.
nfp_net_clear_config_and_disable() is now fully idempotent.
Found by code inspection, FW communication failures are very rare,
and reconfiguring a live device is not common either, so it's unlikely
anyone has ever noticed the leak.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that we have offload replay infrastructure added by
commit 326367427c ("net: sched: call reoffload op on block callback reg")
and flows are guaranteed to be removed correctly, we can revert
commit 951a8ee6de ("nfp: reject binding to shared blocks").
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Previously only the neighbour state was checked to decide if an offloaded
entry should be removed. However, there can be situations when the entry
is dead but still marked as valid. This can lead to dead entries not
being removed from fw tables or even incorrect data being added.
Check the entry dead bit before deciding if it should be added to or
removed from fw neighbour tables.
Fixes: 8e6a9046b6 ("nfp: flower vxlan neighbour offload")
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann says:
====================
pull-request: bpf-next 2018-07-20
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) Add sharing of BPF objects within one ASIC: this allows for reuse of
the same program on multiple ports of a device, and therefore gains
better code store utilization. On top of that, this now also enables
sharing of maps between programs attached to different ports of a
device, from Jakub.
2) Cleanup in libbpf and bpftool's Makefile to reduce unneeded feature
detections and unused variable exports, also from Jakub.
3) First batch of RCU annotation fixes in prog array handling, i.e.
there are several __rcu markers which are not correct as well as
some of the RCU handling, from Roman.
4) Two fixes in BPF sample files related to checking of the prog_cnt
upper limit from sample loader, from Dan.
5) Minor cleanup in sockmap to remove a set but not used variable,
from Colin.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow program sharing between netdevs of the same NFP ASIC.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Create a higher-level entity to represent a device/ASIC to allow
programs and maps to be shared between device ports. The extra
work is required to make sure we don't destroy BPF objects as
soon as the netdev for which they were loaded gets destroyed,
as other ports may still be using them. When netdev goes away
all of its BPF objects will be moved to other netdevs of the
device, and only destroyed when last netdev is unregistered.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Currently we have two lists of offloaded objects - programs and maps.
Netdevice unregister notifier scans those lists to orphan objects
associated with device being unregistered. This puts unnecessary
(even if negligible) burden on all netdev unregister calls in BPF-
-enabled kernel. The lists of objects may potentially get long
making the linear scan even more problematic. There haven't been
complaints about this mechanisms so far, but it is suboptimal.
Instead of relying on notifiers, make the few BPF-capable drivers
register explicitly for BPF offloads. The programs and maps will
now be collected per-device not on a global list, and only scanned
for removal when driver unregisters from BPF offloads.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
BPF code should unregister the offload capabilities from .ndo_uninit(),
to make sure the operation is atomic with unlist_netdevice(). Plumb
the init/uninit NDOs for vNICs and representors.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Daniel Borkmann says:
====================
pull-request: bpf-next 2018-07-15
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) Various different arm32 JIT improvements in order to optimize code emission
and make the JIT code itself more robust, from Russell.
2) Support simultaneous driver and offloaded XDP in order to allow for advanced
use-cases where some work is offloaded to the NIC and some to the host. Also
add ability for bpftool to load programs and maps beyond just the cgroup case,
from Jakub.
3) Add BPF JIT support in nfp for multiplication as well as division. For the
latter in particular, it uses the reciprocal algorithm to emulate it, from Jiong.
4) Add BTF pretty print functionality to bpftool in plain and JSON output
format, from Okash.
5) Add build and installation to the BPF helper man page into bpftool, from Quentin.
6) Add a TCP BPF callback for listening sockets which is triggered right after
the socket transitions to TCP_LISTEN state, from Andrey.
7) Add a new cgroup tree command to bpftool which iterates over the whole cgroup
tree and prints all attached programs, from Roman.
8) Improve xdp_redirect_cpu sample to support parsing of double VLAN tagged
packets, from Jesper.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Split handling of offloaded and driver programs completely. Since
offloaded programs always come with XDP_FLAGS_HW_MODE set in reality
there could be no sharing, anyway, programs would only be installed
in driver or in hardware. Splitting the handling allows us to install
programs in HW and in driver at the same time.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Split the query of HW-attached program from the software one.
Introduce new .ndo_bpf command to query HW-attached program.
This will allow drivers to install different programs in HW
and SW at the same time. Netlink can now also carry multiple
programs on dump (in which case mode will be set to
XDP_ATTACHED_MULTI and user has to check per-attachment point
attributes, IFLA_XDP_PROG_ID will not be present). We reuse
IFLA_XDP_PROG_ID skb space for second mode, so rtnl_xdp_size()
doesn't need to be updated.
Note that the installation side is still not there, since all
drivers currently reject installing more than one program at
the time.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Basic operations drivers perform during xdp setup and query can
be moved to helpers in the core. Encapsulate program and flags
into a structure and add helpers. Note that the structure is
intended as the "main" program information source in the driver.
Most drivers will additionally place the program pointer in their
fast path or ring structures.
The helpers don't have a huge impact now, but they will
decrease the code duplication when programs can be installed
in HW and driver at the same time. Encapsulating the basic
operations in helpers will hopefully also reduce the number
of changes to drivers which adopt them.
Helpers could really be static inline, but they depend on
definition of struct netdev_bpf which means they'd have
to be placed in netdevice.h, an already 4500 line header.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
prog_attached of struct netdev_bpf should have been superseded
by simply setting prog_id long time ago, but we kept it around
to allow offloading drivers to communicate attachment mode (drv
vs hw). Subsequently drivers were also allowed to report back
attachment flags (prog_flags), and since nowadays only programs
attached will XDP_FLAGS_HW_MODE can get offloaded, we can tell
the attachment mode from the flags driver reports. Remove
prog_attached member.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
getnstimeofday64 is deprecated in favor of the ktime_get() family of
functions. The direct replacement would be ktime_get_real_ts64(),
but I'm picking the basic ktime_get() instead:
- using a ktime_t simplifies the code compared to timespec64
- using monotonic time instead of real time avoids issues caused
by a concurrent settimeofday() or during a leap second adjustment.
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
-----BEGIN PGP SIGNATURE-----
iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAltBPacUHGJoZWxnYWFz
QGdvb2dsZS5jb20ACgkQWYigwDrT+vwx+hAAvzTP6o3VOtgNK7lm3nOBuzfykgCv
TFhXP2yeDItWDBLDpWX7wjs2657W3Sjrpw6FyVIYvsoMKKRuOYeZ6ChDieG5ZgTj
oxav5U6TlDHoF0fNq0LWfv78lP6+++7/6yaer6j9xDksVqE4/zxlFcExxuszhZlC
8ptJ54ORn92RdfRHCDptA4PFReNlQWNw3bpKpGxu8xj0TN/sYPN3ggHfnmiEyGMZ
8/KLNOzGrhoqztaBPyRoG3GhU1hpicT+fWzg11Li8hP1solQDht+S3mnCC1IxqdI
JSU7/jhti4nUV56qE7QzYzdsVbTFcItGn0WImklIFCt7htp4Rms0wN+QCmwhtjKM
0WH02gRDip4CcYcPZGeGaeexWJFEScamFK1yxxDV7059KsoQKUN1Sm+R7y9xORYw
nQAHPTO/nv02Xo+ADAYzV0aBPD7fEvFaWtdXAuLocVWVj3eiEqp8ftoYmuWym6T3
gHWt9Dod8olj3t9bkR1MWZ121Ar5MsobgJSF30smEx8VMKv+4Xx8eXvq0FCuUQwT
s/WLTPqK31Sqjii0xe1wO8g0yexCVaBpXJB/e9PpYf/+ICItG1DHLz0Ygw01QWVK
Tv7HTAofdO42kChHjErB6GbzSuxDxSVCPN26bwkZu0eV91uKiLKA7wLUv8+qSXlZ
lK98Eypy0Ti7pvA=
=IOF/
-----END PGP SIGNATURE-----
Merge tag 'pci-v4.18-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI fixes from Bjorn Helgaas:
- Fix a use-after-free in the endpoint code (Dan Carpenter)
- Stop defaulting CONFIG_PCIE_DW_PLAT_HOST to yes (Geert Uytterhoeven)
- Fix an nfp regression caused by a change in how we limit the number
of VFs we can enable (Jakub Kicinski)
- Fix failure path cleanup issues in the new R-Car gen3 PHY support
(Marek Vasut)
- Fix leaks of OF nodes in faraday, xilinx-nwl, xilinx (Nicholas Mc
Guire)
* tag 'pci-v4.18-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
nfp: stop limiting VFs to 0
PCI/IOV: Reset total_VFs limit after detaching PF driver
PCI: faraday: Add missing of_node_put()
PCI: xilinx-nwl: Add missing of_node_put()
PCI: xilinx: Add missing of_node_put()
PCI: endpoint: Use after free in pci_epf_unregister_driver()
PCI: controller: dwc: Do not let PCIE_DW_PLAT_HOST default to yes
PCI: rcar: Clean up PHY init on failure
PCI: rcar: Shut the PHY down in failpath
As we are doing JIT, we would want to use the advanced version of the
reciprocal divide (reciprocal_value_adv) to trade performance with host.
We could reduce the required ALU instructions from 4 to 2 or 1.
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
NFP doesn't have integer divide instruction, this patch use reciprocal
algorithm (the basic one, reciprocal_div) to emulate it.
For each u32 divide, we would need 11 instructions to finish the operation.
7 (for multiplication) + 4 (various ALUs) = 11
Given NFP only supports multiplication no bigger than u32, we'd require
divisor and dividend no bigger than that as well.
Also eBPF doesn't support signed divide and has enforced this on C language
level by failing compilation. However LLVM assembler hasn't enforced this,
so it is possible for negative constant to leak in as a BPF_K operand
through assembly code, we reject such cases as well.
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
NFP supports u16 and u32 multiplication. Multiplication is done 8-bits per
step, therefore we need 2 steps for u16 and 4 steps for u32.
We also need one start instruction to initialize the sequence and one or
two instructions to fetch the result depending on either you need the high
halve of u32 multiplication.
For ALU64, if either operand is beyond u32's value range, we reject it. One
thing to note, if the source operand is BPF_K, then we need to check "imm"
field directly, and we'd reject it if it is negative. Because for ALU64,
"imm" (with s32 type) is expected to be sign extended to s64 which NFP mul
doesn't support. For ALU32, it is fine for "imm" be negative though,
because the result is 32-bits and here is no difference on the low halve
of result for signed/unsigned mul, so we will get correct result.
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
NFP verifier hook is coping range information of the shift amount for
indirect shift operation so optimized shift sequences could be generated.
We want to use range info to do more things. For example, to decide whether
multiplication and divide are supported on the given range.
This patch simply let NFP verifier hook to copy range info for all operands
of all ALU operands.
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
The two fields are a copy of umin and umax info of bpf_insn->src_reg
generated by verifier.
Rename to make their meaning clear.
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Daniel Borkmann says:
====================
pull-request: bpf-next 2018-07-03
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) Various improvements to bpftool and libbpf, that is, bpftool build
speed improvements, missing BPF program types added for detection
by section name, ability to load programs from '.text' section is
made to work again, and better bash completion handling, from Jakub.
2) Improvements to nfp JIT's map read handling which allows for optimizing
memcpy from map to packet, from Jiong.
3) New BPF sample is added which demonstrates XDP in combination with
bpf_perf_event_output() helper to sample packets on all CPUs, from Toke.
4) Add a new BPF kselftest case for tracking connect(2) BPF hooks
infrastructure in combination with TFO, from Andrey.
5) Extend the XDP/BPF xdp_rxq_info sample code with a cmdline option to
read payload from packet data in order to use it for benchmarking.
Also for '--action XDP_TX' option implement swapping of MAC addresses
to avoid drops on some hardware seen during testing, from Jesper.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Simple overlapping changes in stmmac driver.
Adjust skb_gro_flush_final_remcsum function signature to make GRO list
changes in net-next, as per Stephen Rothwell's example merge
resolution.
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann says:
====================
pull-request: bpf 2018-07-01
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) A bpf_fib_lookup() helper fix to change the API before freeze to
return an encoding of the FIB lookup result and return the nexthop
device index in the params struct (instead of device index as return
code that we had before), from David.
2) Various BPF JIT fixes to address syzkaller fallout, that is, do not
reject progs when set_memory_*() fails since it could still be RO.
Also arm32 JIT was not using bpf_jit_binary_lock_ro() API which was
an issue, and a memory leak in s390 JIT found during review, from
Daniel.
3) Multiple fixes for sockmap/hash to address most of the syzkaller
triggered bugs. Usage with IPv6 was crashing, a GPF in bpf_tcp_close(),
a missing sock_map_release() routine to hook up to callbacks, and a
fix for an omitted bucket lock in sock_close(), from John.
4) Two bpftool fixes to remove duplicated error message on program load,
and another one to close the libbpf object after program load. One
additional fix for nfp driver's BPF offload to avoid stopping offload
completely if replace of program failed, from Jakub.
5) Couple of BPF selftest fixes that bail out in some of the test
scripts if the user does not have the right privileges, from Jeffrin.
6) Fixes in test_bpf for s390 when CONFIG_BPF_JIT_ALWAYS_ON is set
where we need to set the flag that some of the test cases are expected
to fail, from Kleber.
7) Fix to detangle BPF_LIRC_MODE2 dependency from CONFIG_CGROUP_BPF
since it has no relation to it and lirc2 users often have configs
without cgroups enabled and thus would not be able to use it, from Sean.
8) Fix a selftest failure in sockmap by removing a useless setrlimit()
call that would set a too low limit where at the same time we are
already including bpf_rlimit.h that does the job, from Yonghong.
9) Fix BPF selftest config with missing missing NET_SCHED, from Anders.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently the NFP fw only supports L3/L4 hashing so rejects the offload of
filters that output to LAG ports implementing other hash algorithms. Team,
however, uses a BPF function for the hash that is not defined. To support
Team offload, accept hashes that are defined as 'unknown' (only Team
defines such hash types). In this case, use the NFP default of L3/L4
hashing for egress port selection.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Extract the tos and the tunnel flags from the tunnel key and offload these
action fields. Only the checksum and tunnel key flags are implemented in
fw so reject offloads of other flags. The tunnel key flag is always
considered set in the fw so enforce that it is set in the rule. Note that
the compulsory setting of the tunnel key flag and optional setting of
checksum is inline with how tc currently generates ipv4 udp tunnel
actions.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Previously the ttl for ipv4 udp tunnels was set to the namespace default.
Modify this to attempt to extract the ttl from a full route lookup on the
tunnel destination. If this is not possible then resort to the default.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hardware will automatically update csum in headers when a set action has
been performed. This means we could in the driver ignore the explicit
checksum action when performing a set action.
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We used to leave bus-info in ethtool driver info empty for
representors in case multi-PCIe-to-single-host cards make
the association between PCIe device and NFP many to one.
It seems these attempts are futile, we need to link the
representors to one PCIe device in sysfs to get consistent
naming, plus devlink uses one PCIe as a handle, anyway.
The multi-PCIe-to-single-system support won't be clean,
if it ever comes.
Turns out some user space (RHEL tests) likes to read bus-info
so just populate it.
While at it remove unnecessary app NULL-check, representors
are spawned by an app, so it must exist.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use napi_consume_skb() in nfp_net_tx_complete() to get bulk free.
Pass 0 as budget for ctrl queue completion since it runs out of
a tasklet.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
NFP NAPI handling will only complete the TXed packets when called
with budget of 0, implement ndo_poll_controller by scheduling NAPI
on all TX queues.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On some platforms with broken ACPI tables we may not have access
to the Serial Number PCIe capability. This capability is crucial
for us for switchdev operation as we use serial number as switch ID,
and for communication with management FW where interface ID is used.
If we can't determine the Serial Number we have to fail device probe.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After user changes the ring count statistics for deactivated
rings disappear from ethtool -S output. This causes loss of
information to the user and means that ethtool stats may not
add up to interface stats. Always expose counters from all
the rings. Note that we allocate at most num_possible_cpus()
rings so number of rings should be reasonable.
The alternative of only listing stats for rings which were
ever in use could be confusing.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Before 8d85a7a4f2 ("PCI/IOV: Allow PF drivers to limit total_VFs to 0"),
pci_sriov_set_totalvfs(pdev, 0) meant "we can enable TotalVFs virtual
functions". After 8d85a7a4f2, it means "we can't enable *any* VFs".
That broke this scenario where nfp intends to remove any limit on the
number of VFs that can be enabled:
nfp_pci_probe
nfp_pcie_sriov_read_nfd_limit
nfp_rtsym_read_le("nfd_vf_cfg_max_vfs", &err)
pci_sriov_set_totalvfs(pf->pdev, 0) # if FW didn't expose a limit
...
# userspace writes N to sysfs "sriov_numvfs":
sriov_numvfs_store
pci_sriov_get_totalvfs # now returns 0
return -ERANGE
Prior to 8d85a7a4f2, pci_sriov_get_totalvfs() returned TotalVFs, but it
now returns 0.
Remove the pci_sriov_set_totalvfs(pdev, 0) calls so we don't limit the
number of VFs that can be enabled.
Fixes: 8d85a7a4f2 ("PCI/IOV: Allow PF drivers to limit total_VFs to 0")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
[bhelgaas: changelog]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Map read has been supported on NFP, this patch enables optimization
for memcpy from map to packet.
This patch also fixed one latent bug which will cause copying from
unexpected address once memcpy for map pointer enabled. The fixed
code path was not exercised before.
Reported-by: Mary Pham <mary.pham@netronome.com>
Reported-by: David Beckett <david.beckett@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
sizeof() will return unsigned value so in the error check
negative error code will be always larger than sizeof().
Fixes: a0d8e02c35 ("nfp: add support for reading nffw info")
Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
TC shared blocks allow multiple qdiscs to be grouped together and filters
shared between them. Currently the chains of filters attached to a block
are only flushed when the block is removed. If a qdisc is removed from a
block but the block still exists, flow del messages are not passed to the
callback registered for that qdisc. For the NFP, this presents the
possibility of rules still existing in hw when they should be removed.
Prevent binding to shared blocks until the kernel can send per qdisc del
messages when block unbinds occur.
tcf_block_shared() was not used outside of the core until now, so also
add an empty implementation for builds with CONFIG_NET_CLS=n.
Fixes: 4861738775 ("net: sched: introduce shared filter blocks infrastructure")
Signed-off-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Previously it was not possible to distinguish between mpls ether types and
other ether types. This leads to incorrect classification of offloaded
filters that match on mpls ether type. For example the following two
filters overlap:
# tc filter add dev eth0 parent ffff: \
protocol 0x8847 flower \
action mirred egress redirect dev eth1
# tc filter add dev eth0 parent ffff: \
protocol 0x0800 flower \
action mirred egress redirect dev eth2
The driver now correctly includes the mac_mpls layer where HW stores mpls
fields, when it detects an mpls ether type. It also sets the MPLS_Q bit to
indicate that the filter should match mpls packets.
Fixes: bb055c198d ("nfp: add mpls match offloading support")
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pass the extact struct from a tc qdisc add to the block bind function and,
in turn, to the setup_tc ndo of binding device via the tc_block_offload
struct. Pass this back to any block callback registrations to allow
netlink logging of fails in the bind process.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stopping offload completely if replace of program failed dates
back to days of transparent offload. Back then we wanted to
silently fall back to the in-driver processing. Today we mark
programs for offload when they are loaded into the kernel, so
the transparent offload is no longer a reality.
Flags check in the driver will only allow replace of a driver
program with another driver program or an offload program with
another offload program.
When driver program is replaced stopping offload is a no-op,
because driver program isn't offloaded. When replacing
offloaded program if the offload fails the entire operation
will fail all the way back to user space and we should continue
using the old program. IOW when replacing a driver program
stopping offload is unnecessary and when replacing offloaded
program - it's a bug, old program should continue to run.
In practice this bug would mean that if offload operation was to
fail (either due to FW communication error, kernel OOM or new
program being offloaded but for a different netdev) driver
would continue reporting that previous XDP program is offloaded
but in fact no program will be loaded in hardware. The failure
is fairly unlikely (found by inspection, when working on the code)
but it's unpleasant.
Backport note: even though the bug was introduced in commit
cafa92ac25 ("nfp: bpf: add support for XDP_FLAGS_HW_MODE"),
this fix depends on commit 441a33031f ("net: xdp: don't allow
device-bound programs in driver mode"), so this fix is sufficient
only in v4.15 or newer. Kernels v4.13.x and v4.14.x do need to
stop offload if it was transparent/opportunistic, i.e. if
XDP_FLAGS_HW_MODE was not set on running program.
Fixes: cafa92ac25 ("nfp: bpf: add support for XDP_FLAGS_HW_MODE")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Currently the default case is not handled, which with future command
introductions would introduce a warning. So handle it.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking fixes from David Miller:
1) Various netfilter fixlets from Pablo and the netfilter team.
2) Fix regression in IPVS caused by lack of PMTU exceptions on local
routes in ipv6, from Julian Anastasov.
3) Check pskb_trim_rcsum for failure in DSA, from Zhouyang Jia.
4) Don't crash on poll in TLS, from Daniel Borkmann.
5) Revert SO_REUSE{ADDR,PORT} change, it regresses various things
including Avahi mDNS. From Bart Van Assche.
6) Missing of_node_put in qcom/emac driver, from Yue Haibing.
7) We lack checking of the TCP checking in one special case during SYN
receive, from Frank van der Linden.
8) Fix module init error paths of mac80211 hwsim, from Johannes Berg.
9) Handle 802.1ad properly in stmmac driver, from Elad Nachman.
10) Must grab HW caps before doing quirk checks in stmmac driver, from
Jose Abreu.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (81 commits)
net: stmmac: Run HWIF Quirks after getting HW caps
neighbour: skip NTF_EXT_LEARNED entries during forced gc
net: cxgb3: add error handling for sysfs_create_group
tls: fix waitall behavior in tls_sw_recvmsg
tls: fix use-after-free in tls_push_record
l2tp: filter out non-PPP sessions in pppol2tp_tunnel_ioctl()
l2tp: reject creation of non-PPP sessions on L2TPv2 tunnels
mlxsw: spectrum_switchdev: Fix port_vlan refcounting
mlxsw: spectrum_router: Align with new route replace logic
mlxsw: spectrum_router: Allow appending to dev-only routes
ipv6: Only emit append events for appended routes
stmmac: added support for 802.1ad vlan stripping
cfg80211: fix rcu in cfg80211_unregister_wdev
mac80211: Move up init of TXQs
mac80211_hwsim: fix module init error paths
cfg80211: initialize sinfo in cfg80211_get_station
nl80211: fix some kernel doc tag mistakes
hv_netvsc: Fix the variable sizes in ipsecv2 and rsc offload
rds: avoid unenecessary cong_update in loop transport
l2tp: clean up stale tunnel or session in pppol2tp_connect's error path
...
We need to release the refcnt on dst_entry in the route table, otherwise
we will leak the route.
Fixes: 8e6a9046b6 ("nfp: flower vxlan neighbour offload")
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Signed-off-by: Louis Peens <louis.peens@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
.ndo_get_phys_port_name was recently extended to support multi-vNIC
FWs. These are firmwares which can have more than one vNIC per PF
without associated port (e.g. Adaptive Buffer Management FW), therefore
we need a way of distinguishing the vNICs. Unfortunately, it's too
late to make flower use the same naming. Flower users may depend on
.ndo_get_phys_port_name returning -EOPNOTSUPP, for example the name
udev gave the PF vNIC was just the bare PCI device-based name before
the change, and will have 'nn0' appended after.
To ensure flower's vNIC doesn't have phys_port_name attribute, add
a flag to vNIC struct and set it in flower code. New projects will
not set the flag adhere to the naming scheme from the start.
Fixes: 51c1df83e3 ("nfp: assign vNIC id as phys_port_name of vNICs which are not ports")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We are gathering software statistics on per-ring basis.
.ndo_get_stats64 handler adds the rings up. Unfortunately
we are currently only adding up active rings, which means
that if user decreases the number of active rings the
statistics from deactivated rings will no longer be counted
and total interface statistics may go backwards.
Always sum all possible rings, the stats are allocated
statically for max number of rings, so we don't have to
worry about them being removed. We could add the stats
up when user changes the ring count, but it seems unnecessary..
Adding up inactive rings will be very quick since no datapath
will be touching them.
Fixes: 164d1e9e5d ("nfp: add support for ethtool .set_channels")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Once upon a time nfp_cpp_resource_find() took a name parameter,
which could be any user-chosen string. Resources are identified
by a CRC32 hash of a 8 byte string, so we had to pad user input
with zeros to make sure CRC32 gave the correct result.
Since then nfp_cpp_resource_find() was made to operate on allocated
resources only (struct nfp_resource). We kzalloc those so there is
no need to pad the strings and use memcmp.
This avoids a GCC 8 stringop-truncation warning:
In function ‘nfp_cpp_resource_find’,
inlined from ‘nfp_resource_try_acquire’ at .../nfpcore/nfp_resource.c:153:8,
inlined from ‘nfp_resource_acquire’ at .../nfpcore/nfp_resource.c:206:9:
.../nfpcore/nfp_resource.c:108:2: warning: strncpy’ output may be truncated copying 8 bytes from a string of length 8 [-Wstringop-truncation]
strncpy(name_pad, res->name, sizeof(name_pad));
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add extack argument to reload, port_split and port_unsplit operations.
Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Report the stat diff to make sure MQ stats add up to child stats.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>