Commit Graph

38245 Commits

Author SHA1 Message Date
Colin Ian King
fb0a1dacf2 mlxsw: spectrum_router: remove redundant continue statement
The continue statement at the end of a for-loop has no effect;
remove it.
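
The class of no-op being removed looks like this (generic sketch, not
the actual mlxsw code):

    for (i = 0; i < count; i++) {
            process(&entries[i]);
            continue;       /* redundant: the loop advances anyway */
    }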

Addresses-Coverity: ("Continue has no effect")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-16 12:46:21 -07:00
Louis Peens
30c4a9f4fe nfp: flower-ct: implement action_merge check
Fill in the code stub to check that the flow actions are valid for
merging. The actions of flow X should not conflict with the matches
of flow X+1. For now this check is quite strict and set_actions are
very limited; this will need updating when NAT support is added.

Signed-off-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-16 12:42:53 -07:00
Louis Peens
5e5f08168d nfp: flower-ct: fill ct metadata check function
Fill in the check_meta stub to check that the ct_metadata action fields
in the nft flow match the ct_match data of the post_ct flow.

Signed-off-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-16 12:42:53 -07:00
Louis Peens
c698e2adcc nfp: flower-ct: fill in ct merge check function
Replace the merge check stub code with the actual implementation. This
checks that the match parts of two tc flows do not conflict. Only
overlapping keys need to be checked, and only the narrowest masked
parts, so each key is masked with the AND'd result of both masks
before comparing.
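
A simplified sketch of that masked comparison (hypothetical helper,
not the actual nfp code):

    /* Two masked keys can only conflict where both masks overlap, so
     * compare each key under the AND of the two masks. */
    static bool keys_conflict(u32 key1, u32 mask1, u32 key2, u32 mask2)
    {
            u32 common = mask1 & mask2;     /* narrowest masked part */

            return (key1 & common) != (key2 & common);
    }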

Signed-off-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-16 12:42:52 -07:00
Louis Peens
a6ffdd3a0e nfp: flower-ct: implement code to save merge of tc and nft flows
Add the code to merge the tc_merge objects with the flows received
from nft. At the moment flows are merged blindly, as the validity
check functions are stubbed out; these will be populated in
follow-up patches.

Signed-off-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-16 12:42:52 -07:00
Louis Peens
b5e30c61d8 nfp: flower-ct: add nft_merge table
Add a table and struct to save the result of the three-way merge
between pre_ct, post_ct, and nft flows. The merging code is added
in follow-up patches.

Signed-off-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-16 12:42:52 -07:00
Yinjun Zhang
4772ad3f58 nfp: flower-ct: make a full copy of the rule when it is a NFT flow
The nft flow will be destroyed after the offload cb returns. This means
we need to save a full copy of it, since it can be referenced through
paths other than the offload cb, for example when a new pre_ct or
post_ct entry is added and needs to be merged with an existing nft
entry.

Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Signed-off-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-16 12:42:52 -07:00
Louis Peens
95255017e0 nfp: flower-ct: add nft flows to nft list
Implement code to add and remove nft flows on the relevant list.
Registering and deregistering the callback function for the nft
table is quite complicated. The safest approach is to delete the
callback on the removal of the last pre_ct flow: if this is also
the last pre_ct flow in software, this specific nft table will be
freed, so there will be no later opportunity to do this. Another
place where it looks possible to delete the callback is when the
last nft_flow is deleted, but this happens under the flow_table
lock, which is also taken when deregistering the callback, leading
to a deadlock.

This means the final solution is to delete the callback when removing
the last pre_ct flow, and then clean up any remaining nft_flow
entries which may still be present, since no callback will ever
fire to do this and they would otherwise be left orphaned.

Signed-off-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-16 12:42:52 -07:00
Louis Peens
62268e7814 nfp: flower-ct: add nft callback stubs
Add register/unregister of the nft callback. For now just add stub
code that accepts the flows but does not do anything with them. The
flows are accepted because netfilter keeps retrying to offload a
flow that was rejected, which is quite noisy. Follow-up patches will
start implementing the functions to add nft flows to the relevant
tables.

Signed-off-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-16 12:42:52 -07:00
Louis Peens
d33d24a7b4 nfp: flower-ct: add delete flow handling for ct
Add functions to handle delete-flow callbacks for ct flows, and
accept the flows for offloading by returning 0 instead of -EOPNOTSUPP.
Flows will still not actually be offloaded to hw, but at this point
it is difficult to reject the flows while also exercising the cleanup
paths properly. Traffic is still handled safely through the fallback
path.

Signed-off-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-16 12:42:52 -07:00
Subash Abhinov Kasiviswanathan
56a967c4f7 net: qualcomm: rmnet: Remove some unneeded casts
Remove the explicit casts in the checksum complement functions
and pass the actual protocol-specific headers instead.

Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-16 12:19:50 -07:00
Bjorn Andersson
d917c35a45 net: qualcomm: rmnet: Allow partial updates of IFLA_FLAGS
The idiomatic way to handle the changelink flags/mask pair seems to
be to allow partial updates of the driver's link flags. In contrast,
the rmnet driver masks the incoming flags and then uses the result
as the new flags.

Change the rmnet driver to follow the common scheme before the
introduction of IFLA_RMNET_FLAGS handling in iproute2 et al.
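
The common scheme amounts to touching only the masked bits
(illustrative sketch; the port field name is hypothetical):

    struct ifla_rmnet_flags *fl = data;     /* flags/mask from netlink */
    u32 old = port->data_format;            /* hypothetical field */

    /* Partial update: bits outside fl->mask keep their old value. */
    port->data_format = (old & ~fl->mask) | (fl->flags & fl->mask);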

Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Reviewed-by: Alex Elder <elder@linaro.org>
Reviewed-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-16 12:16:22 -07:00
Wei Yongjun
61273f9d83 net: stmmac: Fix error return code in ingenic_mac_probe()
Return a negative error code from the error handling path instead
of 0, as done elsewhere in this function.
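
The class of bug being fixed looks like this (hypothetical example,
not the actual probe code):

    /* Broken: 'ret' is still 0 when the error path is taken. */
    mac->clk = devm_clk_get(dev, "stmmaceth");
    if (IS_ERR(mac->clk))
            goto err_remove_config_dt;

    /* Fixed: set a negative error code before jumping to cleanup. */
    if (IS_ERR(mac->clk)) {
            ret = PTR_ERR(mac->clk);
            goto err_remove_config_dt;
    }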

Fixes: 2bb4b98b60 ("net: stmmac: Add Ingenic SoCs MAC support.")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-16 12:02:38 -07:00
Yang Yingliang
c765449591 net: chelsio: cxgb4: use eth_zero_addr() to assign zero address
Use eth_zero_addr() to assign the zero address instead of an
inefficient copy from an array.
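
The conversion is a one-liner (illustrative sketch):

    static const u8 zero_mac[ETH_ALEN] = {0};

    /* Before: copy six zero bytes from a static array. */
    memcpy(dev->dev_addr, zero_mac, ETH_ALEN);

    /* After: write the zeros directly. */
    eth_zero_addr(dev->dev_addr);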

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-16 00:53:17 -07:00
Wang Hai
56b57b809f qlcnic: Use list_for_each_entry() to simplify code in qlcnic_main.c
Convert list_for_each() to list_for_each_entry() where
applicable. This simplifies the code.
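
The conversion pattern, with a hypothetical entry type:

    struct list_head *pos;
    struct foo *cur;

    /* Before: fetch the entry from the list head on each iteration. */
    list_for_each(pos, &head) {
            cur = list_entry(pos, struct foo, node);
            use(cur);
    }

    /* After: the iterator yields the containing entry directly. */
    list_for_each_entry(cur, &head, node)
            use(cur);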

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-16 00:44:54 -07:00
Yunsheng Lin
99f6b5fb5f net: hns3: use bounce buffer when rx page can not be reused
Currently the rx page is reused to receive a future packet when the
stack releases the previous skb quickly. If the old page cannot be
reused, a new page is allocated and mapped, which consumes a lot of
cpu when the IOMMU is in strict mode, especially when the application
and irq/NAPI happen to run on the same cpu.

So allocate a new frag and memcpy the data into it to avoid the costly
IOMMU unmapping/mapping operations, and add "frag_alloc_err" and
"frag_alloc" stats to the "ethtool -S ethX" cmd.

The throughput improves by more than 50% when running a single iperf
TCP thread with the IOMMU in strict mode and iperf sharing the same
cpu with irq/NAPI (rx_copybreak = 2048 and mtu = 1500).

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-16 00:36:06 -07:00
Yunsheng Lin
fa7711b888 net: hns3: optimize the rx page reuse handling process
Currently the rx page offset is only reset to zero when all of the
below conditions are satisfied:
1. the rx page is only owned by the driver.
2. the rx page is reusable.
3. the page offset that is about to be given to the stack has
reached the end of the page.

If the page offset is over hns3_buf_size(), the buffer below the
offset of the page is usable when conditions 1 & 2 above are
satisfied, so the page offset can be reset to zero instead of
increased. We may then be able to always reuse the first 4K buffer
of a 64K page, which means we can limit the hot buffer size as much
as possible.

The above optimization is a side effect of refactoring the rx page
reuse handling in order to support rx copybreak.

The above optimization is a side effect when refacting the
rx page reuse handling in order to support the rx copybreak.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-16 00:36:06 -07:00
Yunsheng Lin
7459775e9f net: hns3: support dma_map_sg() for multi frags skb
Using the queue-based tx buffer, it is also possible to allocate an
sgl buffer and use skb_to_sgvec() to convert the skb to a sgvec, in
order to support dma_map_sg(), which decreases the overhead of IOMMU
mapping and unmapping.

Firstly, it reduces the number of buffers. For example, a tcp skb
may have a 66-byte header and 3 fragments of 4328, 32768, and 28064
bytes. With this patch, dma_map_sg() will combine them into two
buffers, a 66-byte header and one 65160-byte fragment, by using the
IOMMU.

Secondly, it reduces the number of dma mapping and unmapping
operations. All the original 4 buffers are mapped only once rather
than 4 times.
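
A condensed sketch of the mapping flow (error handling omitted):

    struct scatterlist sgl[MAX_SKB_FRAGS + 1];
    int nents, mapped;

    sg_init_table(sgl, ARRAY_SIZE(sgl));
    /* Convert the skb (linear header + frags) into a scatterlist. */
    nents = skb_to_sgvec(skb, sgl, 0, skb->len);
    /* Map all entries in one call; the IOMMU may merge adjacent ones. */
    mapped = dma_map_sg(dev, sgl, nents, DMA_TO_DEVICE);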

The throughput improves by more than 10% when running a single iperf
TCP thread with the IOMMU in strict mode.

Suggested-by: Barry Song <song.bao.hua@hisilicon.com>
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-16 00:36:06 -07:00
Huazhong Tan
1a00197b7d net: hns3: add support to query tx spare buffer size for pf
Add support to query the tx spare buffer size from the configuration
file, and use this info to do spare buffer initialization when the
module parameter 'tx_spare_buf_size' is not specified.

Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-16 00:36:06 -07:00
Yunsheng Lin
907676b130 net: hns3: use tx bounce buffer for small packets
When the packet or frag size is small, it causes both security and
performance issues. As DMA can't map a sub-page, some extra kernel
data is visible to devices. On the other hand, the overhead of dma
map and unmap is huge when the IOMMU is on.

So add a queue-based tx shared bounce buffer and memcpy the small
packet into it when the len of the xmitted skb is below tx_copybreak.
Add a tx_spare_buf_size module param to set the size of the tx spare
buffer, and add set/get_tunable to set or query the tx_copybreak.
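
With set/get_tunable wired up, the threshold can be changed and
queried from userspace, e.g.:

$ ethtool --set-tunable eth0 tx-copybreak 2000
$ ethtool --get-tunable eth0 tx-copybreak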

The throughput improves from 30 Gbps to 90+ Gbps when running 16
netperf threads with a 32KB UDP message size and the IOMMU in strict
mode (tx_copybreak = 2000 and mtu = 1500).

Suggested-by: Barry Song <song.bao.hua@hisilicon.com>
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-16 00:36:06 -07:00
Yunsheng Lin
8677d78c3d net: hns3: refactor for hns3_fill_desc() function
Factor out hns3_fill_desc() so that it can be reused by the tx
bounce buffer support.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-16 00:36:06 -07:00
Yunsheng Lin
26f1ccdf60 net: hns3: minor refactor related to desc_cb handling
desc_cb is used to store mapping and freeing info for the
corresponding desc, which is used in the cleaning process. More
desc_cb types are coming up to support the tx bounce buffer, so
change the desc_cb type to a bit-wise value in order to reduce the
desc_cb type checking operations in the data path.

Also move the desc_cb type definition to hns3_enet.h because it is
only used in hns3_enet.c, and declare a local variable desc_cb in
hns3_clear_desc() to reduce lines of code.
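
Bit-wise type values let the cleaning path test several types in one
operation (illustrative values, not the actual definitions):

    /* Hypothetical bit-wise desc_cb types. */
    #define DESC_TYPE_SKB           BIT(0)
    #define DESC_TYPE_PAGE          BIT(1)
    #define DESC_TYPE_BOUNCE        BIT(2)

    /* One mask test replaces a chain of equality checks. */
    if (desc_cb->type & (DESC_TYPE_SKB | DESC_TYPE_BOUNCE))
            unmap_buffer(ring, desc_cb);    /* hypothetical helper */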

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-16 00:36:06 -07:00
Lorenzo Bianconi
a078d981f8 net: ti: add pp skb recycling support
As already done for mvneta and mvpp2, enable skb recycling for the
ti ethernet drivers.

ti driver on net-next:
----------------------
[perf top]
 47.15%  [kernel]     [k] _raw_spin_unlock_irqrestore
 11.77%  [kernel]     [k] __cpdma_chan_free
  3.16%  [kernel]     [k] ___bpf_prog_run
  2.52%  [kernel]     [k] cpsw_rx_vlan_encap
  2.34%  [kernel]     [k] __netif_receive_skb_core
  2.27%  [kernel]     [k] free_unref_page
  2.26%  [kernel]     [k] kmem_cache_free
  2.24%  [kernel]     [k] kmem_cache_alloc
  1.69%  [kernel]     [k] __softirqentry_text_start
  1.61%  [kernel]     [k] cpsw_rx_handler
  1.19%  [kernel]     [k] page_pool_release_page
  1.19%  [kernel]     [k] clear_bits_ll
  1.15%  [kernel]     [k] page_frag_free
  1.06%  [kernel]     [k] __dma_page_dev_to_cpu
  0.99%  [kernel]     [k] memset
  0.94%  [kernel]     [k] __alloc_pages_bulk
  0.92%  [kernel]     [k] kfree_skb
  0.85%  [kernel]     [k] packet_rcv
  0.78%  [kernel]     [k] page_address
  0.75%  [kernel]     [k] v7_dma_inv_range
  0.71%  [kernel]     [k] __lock_text_start

[iperf3 tcp]
[  5]   0.00-10.00  sec   873 MBytes   732 Mbits/sec    0   sender
[  5]   0.00-10.01  sec   866 MBytes   726 Mbits/sec        receiver

ti + skb recycling:
-------------------
[perf top]
 40.58%  [kernel]    [k] _raw_spin_unlock_irqrestore
 16.18%  [kernel]    [k] __softirqentry_text_start
 10.33%  [kernel]    [k] __cpdma_chan_free
  2.62%  [kernel]    [k] ___bpf_prog_run
  2.05%  [kernel]    [k] cpsw_rx_vlan_encap
  2.00%  [kernel]    [k] kmem_cache_alloc
  1.86%  [kernel]    [k] __netif_receive_skb_core
  1.80%  [kernel]    [k] kmem_cache_free
  1.63%  [kernel]    [k] cpsw_rx_handler
  1.12%  [kernel]    [k] cpsw_rx_mq_poll
  1.11%  [kernel]    [k] page_pool_put_page
  1.04%  [kernel]    [k] _raw_spin_unlock
  0.97%  [kernel]    [k] clear_bits_ll
  0.90%  [kernel]    [k] packet_rcv
  0.88%  [kernel]    [k] __dma_page_dev_to_cpu
  0.85%  [kernel]    [k] kfree_skb
  0.80%  [kernel]    [k] memset
  0.71%  [kernel]    [k] __lock_text_start
  0.66%  [kernel]    [k] v7_dma_inv_range
  0.64%  [kernel]    [k] gen_pool_free_owner

[iperf3 tcp]
[  5]   0.00-10.00  sec   884 MBytes   742 Mbits/sec    0   sender
[  5]   0.00-10.01  sec   878 MBytes   735 Mbits/sec        receiver

Tested-by: Grygorii Strashko <grygorii.strashko@ti.com>
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-15 11:50:43 -07:00
Colin Ian King
f25dcde974 octeontx2-pf: Fix spelling mistake "morethan" -> "more than"
There is a spelling mistake in a dev_err message. Fix it.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-15 11:29:58 -07:00
David S. Miller
f0c227c7df mlx5-updates-2021-06-14
Merge tag 'mlx5-updates-2021-06-14' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2021-06-14

1) Trivial lag refactoring in preparation for the upcoming Single FDB lag feature
 - First 3 patches

2) Scalable IRQ distribution for Sub-functions

A subfunction (SF) is a lightweight function that has a parent PCI
function (PF) on which it is deployed.

Currently, mlx5 subfunctions share the IRQs (MSI-X) with their
parent PCI function.

Before this series the PF allocated enough IRQs to cover all the
cores in a system, and newly created SFs would re-use all the IRQs
that the PF had allocated for itself. Hence, the more SFs are
created, the more EQs there are per IRQ. Therefore, whenever we
handle an interrupt, we need to poll all SF EQs and PF EQs instead
of only the PF EQs, as on a system without SFs. This has a hard
impact on the performance of SFs and the PF.

For example, on a machine with:
Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz with 56 cores.
PCI Express 3 with BW of 126 Gb/s.
ConnectX-5 Ex; EDR IB (100Gb/s) and 100GbE; dual-port QSFP28; PCIe4.0 x16.

test case: iperf TX BW, single CPU, affinity of app and IRQ are the same.
PF only:     no SFs on the system, 56 IRQs.
SF (before): 250 SFs sharing the same 56 IRQs.
SF (now):    250 SFs + 255 available IRQs for the NIC (see the IRQ
             spread scheme below).

	    application SF-IRQ  channel   BW(Gb/sec)         interrupts/sec
            iperf TX            affinity
PF only     cpu={0}     cpu={0} cpu={0}   79                 8200
SF (before) cpu={0}     cpu={0} cpu={0}   51.3 (-35%)        9500
SF (now)    cpu={0}     cpu={0} cpu={0}   78 (-2%)           8200

command:
$ taskset -c 0 iperf -c 11.1.1.1 -P 3 -i 6 -t 30 | grep SUM

The difference between the SF examples is that before this series we
allocated num_cpus (56) IRQs, and all of them were shared among the
PF and the SFs. After this series, we allocate 255 IRQs and spread
the SFs among them. This has significantly decreased the load on
each IRQ, and the number of EQs per IRQ is down by 95% (251->11).

In this patchset the proposed solution is to have a dedicated IRQ
pool for SFs to use. The pool will allocate a large number of IRQs
for SFs to grab from, in order to minimize IRQ sharing between the
different SFs. IRQs will not be requested from the OS until they are
first requested by an SF consumer, and will eventually be released
when the last SF consumer releases them.

For the detailed IRQ spread and allocation scheme, please see the
last patch: ("net/mlx5: Round-Robin EQs over IRQs")
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-15 11:14:21 -07:00
Subbaraya Sundeep
68fbff68db octeontx2-pf: Add police action for TC flower
Add a police action for ingress TC flower hardware offload. With
this, rate limiting can be done per flow. Since rate limiting is
tied to RQs in hardware, the number of TC flower filters with a
police action is limited to the number of receive queues of the
interface. Both bps and pps modes are supported.

Examples to rate limit a flow:
$ ethtool -K eth0 hw-tc-offload on
$ tc qdisc add dev eth0 ingress
$ tc filter add dev eth0 parent ffff: protocol ip \
  flower ip_proto udp dst_port 80 action \
  police rate 100Mbit burst 32Kbit

$ tc filter add dev eth0 parent ffff: \
  protocol ip flower dst_mac 5e:b2:34:ee:29:49 \
  action police pkts_rate 5000 pkts_burst 2048

Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-15 11:11:05 -07:00
Subbaraya Sundeep
5d2fdd86d5 octeontx2-pf: Use NL_SET_ERR_MSG_MOD for TC
This patch modifies netdev_err messages in the tc code to use
NL_SET_ERR_MSG_MOD. NL_SET_ERR_MSG_MOD does not support format
specifiers yet, hence only netdev_err messages consisting of plain
strings are modified.

Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-15 11:11:05 -07:00
Sunil Goutham
2ca89a2c37 octeontx2-pf: TC_MATCHALL ingress ratelimiting offload
Add TC_MATCHALL ingress ratelimiting offload support with the POLICE
action for all traffic coming into the interface.

E.g., to ratelimit ingress traffic to 100Mbps:

$ ethtool -K eth0 hw-tc-offload on
$ tc qdisc add dev eth0 clsact
$ tc filter add dev eth0 ingress matchall skip_sw \
                action police rate 100Mbit burst 32Kbit

To support this, a leaf-level bandwidth profile is allocated and all
RQ contexts used by this interface are updated to point to it. The
leaf-level bandwidth profile is then configured with the
user-specified rate and burst sizes.

Co-developed-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-15 11:11:05 -07:00
Sunil Goutham
e7d8971763 octeontx2-af: cn10k: Debugfs support for bandwidth profiles
Add support for dumping the current resource status of bandwidth
profiles and the contexts of allocated profiles via debugfs.

Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-15 11:11:05 -07:00
Sunil Goutham
e8e095b3b3 octeontx2-af: cn10k: Bandwidth profiles config support
CN10K silicon supports hierarchical ingress packet ratelimiting.
There are 3 levels of profilers supported: leaf, mid, and top.
Ratelimiting is done after the packet forwarding decision is taken
and a NIXLF's RQ is identified to DMA the packet. The RQ's context
points to a leaf bandwidth profile which can be configured to
achieve the desired ratelimit.

This patch adds logic for the management of these bandwidth profiles,
i.e. profile alloc, free, context update, etc.

Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-15 11:11:05 -07:00
Matteo Croce
a955318fe6 stmmac: align RX buffers
On RX an SKB is allocated and the received buffer is copied into it.
But on some architectures, memcpy() needs the source and destination
buffers to have the same alignment to be efficient.

This is not our case, because the SKB data pointer is misaligned by
two bytes to compensate for the ethernet header.

Align the RX buffer the same way as the SKB one, so the copy is faster.
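
The SKB offset being matched is the standard NET_IP_ALIGN reservation
(illustrative):

    /* NET_IP_ALIGN (2 on most arches) 4-byte aligns the IP header
     * after the 14-byte ethernet header; offsetting the RX buffer the
     * same way makes memcpy() source and destination congruent. */
    skb_reserve(skb, NET_IP_ALIGN);
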
An iperf3 RX test gives a decent improvement on a RISC-V machine:

before:
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   733 MBytes   615 Mbits/sec   88             sender
[  5]   0.00-10.01  sec   730 MBytes   612 Mbits/sec                  receiver

after:
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  1.10 GBytes   942 Mbits/sec    0             sender
[  5]   0.00-10.00  sec  1.09 GBytes   940 Mbits/sec                  receiver

And the memcpy() overhead during the RX drops dramatically.

before:
Overhead  Shared O  Symbol
  43.35%  [kernel]  [k] memcpy
  33.77%  [kernel]  [k] __asm_copy_to_user
   3.64%  [kernel]  [k] sifive_l2_flush64_range

after:
Overhead  Shared O  Symbol
  45.40%  [kernel]  [k] __asm_copy_to_user
  28.09%  [kernel]  [k] memcpy
   4.27%  [kernel]  [k] sifive_l2_flush64_range

Signed-off-by: Matteo Croce <mcroce@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-15 10:25:18 -07:00
Shay Drory
c36326d38d net/mlx5: Round-Robin EQs over IRQs
Whenever users provide an affinity for an EQ creation request, map
the EQ to a matching IRQ, i.e. an IRQ with the same affinity and type
(completion/control) as the EQ being created.

This mapping is done with an aggressive dedicated IRQ allocation
scheme, described below.

First, we check whether there is a matching IRQ whose min threshold
is not exhausted.
   - min_eqs_threshold = 3 for a control EQ.
   - min_eqs_threshold = 1 for a completion EQ.
If no matching IRQ is found, try to request a new IRQ. If we cannot
request a new IRQ, reuse the least-used matching IRQ.
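
In pseudo-C, the selection reads roughly as follows (all helpers
hypothetical):

    irq = irq_pool_find_matching(pool, affinity, type);
    if (irq && irq_eq_count(irq) < min_eqs_threshold)
            return irq;                     /* threshold not exhausted */
    new = irq_pool_request_new(pool, affinity, type);
    if (!IS_ERR(new))
            return new;                     /* fresh IRQ */
    return irq_pool_least_used(pool, affinity, type);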

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-06-14 20:58:00 -07:00
Shay Drory
c8ea212bfd net/mlx5: Separate between public and private API of sf.h
Move mlx5_sf_max_functions() and friends from the private sf/sf.h
to the public lib/sf.h. This is done in order to have one-directional
include paths.

Signed-off-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-06-14 20:58:00 -07:00
Shay Drory
71e084e264 net/mlx5: Allocating a pool of MSI-X vectors for SFs
SFs (Sub Functions) currently use IRQs from the global IRQ table that
their parent Physical Function has. In order to scale better, we need
to allocate more IRQs and share them between different SFs.

The driver will maintain 3 separate IRQ pools:
1. A pool that serves the PF consumers (the PF's netdev and rdma
stacks), similar to what the driver had before this patch, i.e. this
pool will share IRQs between rdma and netdev, and will keep the IRQ
indexes and allocation order. The latter is important for the PF
netdev rmap (aRFS).

2. A pool of control IRQs for SFs. The size of this pool is the number
of SFs that can be created divided by SFS_PER_IRQ. This pool will serve
the control path EQs of the SFs.

3. A pool of completion data path IRQs for SF transport queues. The
size of this pool is:
num_irqs_allocated - pf_pool_size - sf_ctrl_pool_size.
This pool will serve the netdev and rdma stacks. Moreover, rmap is
not supported on SFs.

The sharing methodology of the SF pools is explained in the next patch.

Important note: rmap is not supported on SFs because rmap mapping
cannot function correctly for IRQs that are shared between different
core/netdev RX rings.

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-06-14 20:58:00 -07:00
Shay Drory
fc63dd2a85 net/mlx5: Change IRQ storage logic from static to dynamic
Store newly created IRQs in an xarray DB instead of a static array,
so we can store only the IRQs which are actually in use.
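
The pattern is the standard xarray one (generic sketch, not the mlx5
code):

    static DEFINE_XARRAY(irq_xa);

    /* Store an IRQ under its index only when it is actually created. */
    err = xa_err(xa_store(&irq_xa, index, irq, GFP_KERNEL));

    /* Later lookups of never-created indexes simply return NULL. */
    irq = xa_load(&irq_xa, index);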

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-06-14 20:57:59 -07:00
Shay Drory
2d74524c01 net/mlx5: Moving rmap logic to EQs
IRQs are being simplified in order to ease their sharing, and any
feature-specific object will be moved to an upper layer. Hence we
move the rmap object into eq_table.

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-06-14 20:57:59 -07:00
Shay Drory
e8abebb3a4 net/mlx5: Extend mlx5_irq_request to request IRQ from the kernel
Extend mlx5_irq_request so that IRQs will be requested upon EQ creation,
and not on driver boot.

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-06-14 20:57:59 -07:00
Shay Drory
2de6153837 net/mlx5: Removing rmap per IRQ
In the next patches, IRQs will be requested according to demand,
instead of statically on driver boot. Also, rmap is currently managed
by the IRQ layer; rmap management will move out of the IRQ layer in
future patches.

Therefore, we want to remove an IRQ from the rmap when that IRQ is
destroyed, instead of removing all the IRQs from the rmap when the
irq_table is destroyed.

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-06-14 20:57:58 -07:00
Leon Romanovsky
652e3581f2 net/mlx5: Clean license text in eq.[c|h] files
The eq.[c|h] files are under a major rewrite, so use this opportunity
to update their copyright and license texts.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-06-14 20:57:58 -07:00
Leon Romanovsky
e4e3f24b82 net/mlx5: Provide cpumask at EQ creation phase
The users of EQs run their code on different CPUs and with various
affinity patterns. Move the cpumask setting close to its actual
usage.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-06-14 20:57:57 -07:00
Shay Drory
3b43190b2f net/mlx5: Introduce API for request and release IRQs
Introduce a new API that will allow IRQ users to hold a pointer to
mlx5_irq.
At the end of this series, IRQs will be allocated on demand. Hence,
this will allow us to properly manage and use IRQs.

Signed-off-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-06-14 20:57:57 -07:00
Leon Romanovsky
c38421abcf net/mlx5: Delay IRQ destruction till all users are gone
Shared IRQs are consumed by multiple EQ users, so in order to
properly initialize and later release such IRQs, add kref counting
to the IRQ structure.
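
The standard kref pattern for this kind of shared lifetime (generic
sketch; the teardown helper is hypothetical):

    /* Called only when the last user drops its reference. */
    static void mlx5_irq_release(struct kref *kref)
    {
            struct mlx5_irq *irq = container_of(kref, struct mlx5_irq,
                                                kref);

            irq_teardown(irq);              /* hypothetical cleanup */
    }

    kref_get(&irq->kref);                   /* each new EQ user */
    kref_put(&irq->kref, mlx5_irq_release); /* each departing user */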

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-06-14 20:57:57 -07:00
Mark Bloch
8a66e45859 net/mlx5: Change ownership model for lag
Lag is used to combine two PCI functions of the same HCA into a
single logical unit. This is core functionality and as such should be
managed by the core driver. Currently this isn't the case: while we
store the lag software structure inside the lower device, its
lifetime (creation/destruction) is dictated by the mlx5e part. Change
the ownership model so lag is tied to the lifetime of the lower-level
driver instead of to the mlx5e part.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-06-14 20:57:56 -07:00
Mark Bloch
8ed19471fd net/mlx5: Lag, Don't rescan if the device is going down
If MLX5_PRIV_FLAGS_DISABLE_ALL_ADEV is set, it means the device is
going down and mlx5_rescan_drivers_locked() shouldn't be called. With
this patch and the previous one in the series, unbinding a PCI
function when its netdev is part of a bond works and leaves the
system in a working state.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-06-14 20:57:56 -07:00
Mark Bloch
8c22ad36ee net/mlx5: Lag, refactor disable flow
When a net device is removed (which can happen if the PCI function is
unbound from the system), it is not enough to destroy the hardware
lag; the system should recreate the original devices that were
present before the lag. As the same flow is done when a net device is
removed from the bond, refactor and reuse the code.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-06-14 20:57:56 -07:00
David S. Miller
ed0141d113 Merge branch 'Ingenic-SOC-mac-support'
Zhou Yanjie says:

====================
Add Ingenic SoCs MAC support.

v2->v3:
1.Add "ingenic,mac.yaml" for Ingenic SoCs.
2.Change tx clk delay and rx clk delay from hardware value to ps.
3.return -EINVAL when a unsupported value is encountered when
  parsing the binding.
4.Simplify the code of the RGMII part of X2000 SoC according to
  Andrew Lunn’s suggestion.
5.Follow the example of "dwmac-mediatek.c" to improve the code
  that handles delays according to Andrew Lunn’s suggestion.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-14 13:12:33 -07:00
周琰杰 (Zhou Yanjie)
2bb4b98b60 net: stmmac: Add Ingenic SoCs MAC support.
Add support for the Ingenic SoC MAC glue layer to the stmmac device
driver. This driver is used for the MAC ethernet controller found in
the JZ4775 SoC, the X1000 SoC, the X1600 SoC, the X1830 SoC, and the
X2000 SoC.

Signed-off-by: 周琰杰 (Zhou Yanjie) <zhouyanjie@wanyeetech.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-14 13:06:52 -07:00
Oleksandr Mazur
a80cf955c9 net: marvell: prestera: devlink: add traps with DROP action
Add traps that have init_action set to DROP.
Add the 'trap_drop_counter_get' (devlink API) callback implementation,
which is used to get the number of packets that have been dropped by
the HW (traps with action 'DROP').
Add a new FW command CPU_CODE_COUNTERS_GET.

Signed-off-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-14 13:04:25 -07:00
Oleksandr Mazur
0a9003f45e net: marvell: prestera: devlink: add traps/groups implementation
Add devlink traps registration (with corresponding groups) for all
the traffic types that the driver traps to the CPU.
prestera_rxtx: report each packet trapped to the CPU (RX) to
prestera_devlink.

Signed-off-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-14 13:04:25 -07:00
Lijun Pan
673ead2431 ibmvnic: fix send_request_map incompatible argument
The 3rd argument is u32 in the function definition while it is __be32
in the function declaration.
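
The mismatch looks like this (illustrative prototypes):

    /* The declaration takes a big-endian length... */
    static int send_request_map(struct ibmvnic_adapter *adapter,
                                dma_addr_t addr, __be32 len, u8 map_id);

    /* ...so the definition, which previously took a plain u32 len, is
     * changed to __be32 so that the endianness annotations agree. */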

Signed-off-by: Lijun Pan <lijunp213@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-14 12:56:50 -07:00