Commit Graph

37659 Commits

Author SHA1 Message Date
Russ White
06c72fae70
Merge pull request #17575 from opensourcerouting/fix/outgoing_rmap_supressed
bgpd: Show which route-map is used when the prefix is filtered by route-map
2024-12-10 11:32:30 -05:00
Russ White
3f6bf6d03c
Merge pull request #17608 from opensourcerouting/fix/vpn_import_routes_allowas-in
bgpd: Import allowed routes with self AS if desired
2024-12-10 11:24:15 -05:00
Donatas Abraitis
dd4c2acc2e lib: Take ge/le into consideration when checking the prefix with the prefix-list
Without the fix:

```
show ip prefix-list test_1 10.20.30.96/27 first-match
 <no result>

show ip prefix-list test_2 192.168.1.2/32 first-match
 <no result>
```

With the fix:

```
ip prefix-list test_1 seq 10 permit 10.20.30.64/26 le 27
!
end
donatas# show ip prefix-list test_1 10.20.30.96/27
   seq 10 permit 10.20.30.64/26 le 27 (hit count: 1, refcount: 0)
donatas# show ip prefix-list test_1 10.20.30.64/27
   seq 10 permit 10.20.30.64/26 le 27 (hit count: 2, refcount: 0)
donatas# show ip prefix-list test_1 10.20.30.64/28
donatas# show ip prefix-list test_1 10.20.30.126/26
   seq 10 permit 10.20.30.64/26 le 27 (hit count: 3, refcount: 0)
donatas# show ip prefix-list test_1 10.20.30.126/30
donatas#
```

Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2024-12-10 16:28:26 +02:00
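The ge/le behaviour shown in the output above can be sketched in a few lines of Python. This is only an illustration of the usual prefix-list matching rule, not the code from lib/; the function name `entry_matches` and its parameters are hypothetical.

```python
import ipaddress


def entry_matches(entry, candidate, ge=None, le=None):
    """Return True if `candidate` matches a prefix-list entry with optional ge/le.

    Hypothetical helper: models the common prefix-list rule, not FRR's code.
    """
    net = ipaddress.ip_network(entry, strict=False)
    cand = ipaddress.ip_network(candidate, strict=False)
    if net.version != cand.version:
        return False

    # Without ge/le the prefix length must match exactly; otherwise it must
    # fall inside the [ge, le] window (defaults: entry length and max length).
    if ge is None and le is None:
        lo = hi = net.prefixlen
    else:
        lo = net.prefixlen if ge is None else ge
        hi = net.max_prefixlen if le is None else le
    if not lo <= cand.prefixlen <= hi:
        return False

    # The leading bits of the candidate must equal the entry's network.
    return (int(cand.network_address) & int(net.netmask)) == int(net.network_address)
```

With `entry_matches("10.20.30.64/26", "10.20.30.96/27", le=27)` the sketch agrees with the "With the fix" output: /26 and /27 candidates inside 10.20.30.64/26 match, while /28 and /30 candidates do not.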
Chirag Shah
3f3c923d7a bgpd: copy asn of src vrf to evpn type5 route
When a unicast route from the source vrf is imported into
evpn as a type5 route, prepend the asn of the source vrf to the
type5 as-path.

There is a case where the evpn type5 path info is already present
but the source vrf route has been cleared via the clear command. In
this case the existing path info needs to be rewritten with the
source vrf asn.

The import code already prepends the asn of the source vrf for a new
path, but the branch that handles an existing path attribute for the
same route also needs to prepend the source vrf asn.

Ticket: #2943080
Testing:
Before fix:
r4# clear ip bgp vrf overlay prefix 0.0.0.0/0
Route Distinguisher: 128.117.243.209:4
*> [5]:[0]:[0]:[0.0.0.0]
         203.0.113.1          0          0 194 ? <--- 64512 is missing
         ET:8 RT:64532:104001 Rmac:06:ec:bf:59:e8:93

After fix:
r4# clear ip bgp vrf overlay prefix 0.0.0.0/0

Route's source vrf bgp output containing ASN 64512:
r4# show bgp vrf overlay
BGP table version is 2, local router ID is 128.117.243.209, vrf id 10
Default local pref 100, local AS 64512
...

Notice that after the clear command the source vrf asn 64512 is retained.
Route Distinguisher: 128.117.243.209:4
*> [5]:[0]:[0]:[0.0.0.0]
         203.0.113.1          0          0 64512 194 ?
         ET:8 RT:64532:104001 Rmac:06:ec:bf:59:e8:93

Signed-off-by: Chirag Shah <chirag@nvidia.com>
2024-12-09 12:39:08 -05:00
Trey Aspelund
7ec7446ea8 bgpd: only import specific route-types into EVIs
Prior to this we were only filtering EVPN routes from the import logic
if they were not route-type 1/2/3/5, which allowed things like RT-5s to
be imported into an L2VNI/MAC-VRF. This adds additional logic to ensure
routes are only imported into EVIs where they make sense.
No more nonsensical route importing!

Ticket: 2848204
Signed-off-by: Trey Aspelund <taspelund@nvidia.com>
2024-12-09 12:39:08 -05:00
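A rough sketch of per-EVI route-type filtering is shown below. The route-type numbers follow the EVPN RFCs, but the sets of types allowed per EVI kind are an assumption made for illustration; the commit message only states that RT-5s must not be imported into an L2VNI/MAC-VRF. The names `EvpnRouteType`, `IMPORTABLE` and `should_import` are hypothetical.

```python
from enum import IntEnum


class EvpnRouteType(IntEnum):
    AD = 1       # Ethernet Auto-Discovery
    MACIP = 2    # MAC/IP Advertisement
    IMET = 3     # Inclusive Multicast Ethernet Tag
    ES = 4       # Ethernet Segment
    PREFIX = 5   # IP Prefix


# ASSUMPTION: illustrative mapping only, not taken from bgpd.
IMPORTABLE = {
    "l2vni": {EvpnRouteType.AD, EvpnRouteType.MACIP, EvpnRouteType.IMET},
    "l3vni": {EvpnRouteType.MACIP, EvpnRouteType.PREFIX},
}


def should_import(route_type, evi_kind):
    """Return True if a route of `route_type` makes sense in this kind of EVI."""
    return route_type in IMPORTABLE.get(evi_kind, set())
```

Under these assumptions `should_import(EvpnRouteType.PREFIX, "l2vni")` is false, which is the case the commit calls out.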
Donald Sharp
da7393b8fd zebra: Fix another ships in the night issue with WFI
When bgp sent a route update down to zebra and an asic
update from the kernel was read immediately after it,
zebra would choose the asic update and drop the bgp
update, leaving us in a state where bgp was not used
as the true source.

Modify the code so that in rib_multipath_nhe
we notice that we have an unprocessed route update
from bgp, and if so, just drop this kernel update
about an older version of the route since it is
no longer needed.

Ticket: 2722533
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2024-12-09 12:35:42 -05:00
Quentin Young
5fba3c4d74 watchfrr: increase restart timer 20s -> 90s
This commit:
"tools: run `vtysh -b` once for all-startup"

changed things so that `vtysh -b` is run after all daemons have started
up instead of doing it for each daemon as they are started up. This
results in one long `vtysh -b`, which for large configs and many daemons
(in the case I saw, 4 daemons and 30,000 line config) can exceed the 20
second timer watchfrr uses to kill "hung" background tasks.

There shouldn't be any harm in increasing this to 90 seconds to give us
some leeway while still making sure we kill anything truly misbehaving.

Signed-off-by: Quentin Young <qlyoung@nvidia.com>
2024-12-09 12:35:42 -05:00
Wesley Coakley
a72d1a1124 pbrd: fix vrf_unchanged which may depend on other seqs
Ticket: 2740911
Signed-off-by: Wesley Coakley <wcoakley@nvidia.com>
2024-12-09 12:31:29 -05:00
Anuradha Karuppiah
e57ad2fbcd pimd: skip init of mlag roles based on the zebra capabilities message
It looks like the capability setting was added for testing mlag via the
zebra test cli to configure the mlag role. However, it interferes with
the valid state updates received from the MLAG daemon, depending on
timing (in some cases the MLAG state changes are received before the
capabilities).

Reference logs -
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
root@TORC11:mgmt:/home/cumulus# grep -ri "my_role\|MlagRole" /var/log/frr/bgpd.log
2021/06/18 13:26:40.380130 PIM: pim_mlag_process_mlagd_state_change: msg dump: my_role: SECONDARY, peer_state: DOWN
2021/06/18 13:26:40.380766 PIM: pim_mlag_process_mlagd_state_change: msg dump: my_role: SECONDARY, peer_state: DOWN
2021/06/18 13:26:41.382258 PIM: pim_mlag_process_mlagd_state_change: msg dump: my_role: SECONDARY, peer_state: RUNNING
2021/06/18 13:26:41.382379 PIM: pim_mlag_process_mlagd_state_change: msg dump: my_role: PRIMARY, peer_state: RUNNING
2021/06/18 13:26:52.386071 ZEBRA: Sending capabilities to client pim: MPLS enabled numMultipath 128 GR disabled MaintMode off MlagRole 0
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

Ticket: #2691629

Signed-off-by: Anuradha Karuppiah <anuradhak@nvidia.com>
2024-12-09 12:31:29 -05:00
Donald Sharp
b3facc23df zebra: Reduce memory usage of streams for encoding packets
For those packets where we are not sending 16k of data but something
far less than 256 bytes, reduce the stream sizes we allocate
to something much more reasonable.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2024-12-09 12:31:29 -05:00
vivek
f8c464688d bgpd: Check L3VNI status before announcing default
Check that the L3VNI is "up" before taking action to announce or
withdraw the EVPN type-5 default based on configuration. Otherwise,
there can be timing conditions where an EVPN type-5 default route
gets announced without a VNI and with invalid route targets.

Signed-off-by: Vivek Venkatraman <vivek@nvidia.com>

Ticket: #2684144
Reviewed By: Chirag Shah
Testing Done:
1. Rerun failed test multiple times successfully
2. Some manual testing
3. precommit and partial evpn-smoke
2024-12-09 12:31:29 -05:00
vivek
e2b20dfb33 zebra: Reset MAC's remote sequence number appropriately
When a MAC gets deleted but associated neighbors remain, the MAC is
kept in the zebra MAC database as an internal ("auto") entry. When
this happens, reset the MAC's remote sequence number. This ensures that
when the host with the MAC later comes up behind a remote VTEP, the
local switch accepts the MAC and installs it into the bridge FDB and
we don't end up in a situation where remote MACs are not installed
into the bridge FDB.

This fix is a corollary of CM-22753 and is this time done for local
MACs upon delete.

Note: Commit is marked Cumulus-only because I need to evaluate more
comprehensive changes before upstreaming it.

Ticket: CM-29581
Reviewed By: As above
Testing Done:
1. Multiple rounds of manual testing
2. Two rounds of evpn-smoke, 1 round of precommit

Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Acked-by:      Chirag Shah <chirag@cumulusnetworks.com>
Acked-by:      Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
2024-12-09 12:29:38 -05:00
Rajasekar Raja
bd32706c70 bgpd: Suppress redundant L3VNI delete processing
Consider a master bridge interface (br_l3vni) having a slave vxlan99
mapped to vlans used by 3 L3VNIs.

During ifdown of the br_l3vni interface, the function
zebra_vxlan_process_l3vni_oper_down(), where zebra sends the ZAPI
message to bgp to delete the L3VNI, is invoked twice:
 1) if_down -> zebra_vxlan_svi_down()
 2) VXLAN is unlinked from the bridge i.e. vxlan99
    zebra_if_dplane_ifp_handling() --> zebra_vxlan_if_update_vni()
    (since ZEBRA_VXLIF_MASTER_CHANGE flag is set)

During ifup of the br_l3vni interface, the function
zebra_vxlan_process_l3vni_oper_down() is invoked because of the
access-vlan change: process oper down, associate with the new svi_if
and then process oper up again.

The problem here is that the redundant ZAPI message of L3VNI delete
results in BGP doing an inline global table walk for remote route
installation when the L3VNI is already removed/deleted. The bigger the
scale, the more CPU is wasted.

Given that bridge flaps are not a common scenario, the idea is to
simply return from BGP if the L3VNI is already set to 0, i.e.
if the L3VNI is already deleted, do nothing and return.

NOTE/TBD: An ideal fix is to make zebra not send the second L3VNI delete
ZAPI message. However, that is a much more involved change to day-1 code
that handles corner cases.

Ticket :#3864372

Signed-off-by: Rajasekar Raja <rajasekarr@nvidia.com>
2024-12-09 08:46:16 -08:00
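The early return described above amounts to making the delete handler idempotent. A minimal sketch, assuming a hypothetical `BgpVrf` object whose `l3vni` field is 0 once the L3VNI is gone (none of these names come from bgpd):

```python
class BgpVrf:
    """Hypothetical stand-in for a BGP VRF instance."""

    def __init__(self, name, l3vni=0):
        self.name = name
        self.l3vni = l3vni
        self.table_walks = 0  # counts the expensive global table walks

    def uninstall_remote_routes(self):
        # Placeholder for the global table walk done on L3VNI delete.
        self.table_walks += 1


def handle_l3vni_del(vrf):
    """Process an L3VNI delete; return False if it was redundant."""
    if vrf.l3vni == 0:
        # Already deleted: skip the table walk entirely.
        return False
    vrf.uninstall_remote_routes()
    vrf.l3vni = 0
    return True
```

Calling `handle_l3vni_del()` twice on the same VRF performs the walk only once; the second call is a no-op.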
Rajasekar Raja
0f2cb27310 bgpd: backpressure - Optimize EVPN L3VNI remote routes processing
Anytime BGP gets an L3 VNI ADD/DEL from zebra,
 - Walking the entire global routing table per L3VNI is very expensive.
 - The next read (say of another VNI ADD/DEL) from the socket does
   not proceed unless this walk is complete.

So for triggers where a bulk of L3VNIs are flapped, this results in
huge output buffer FIFO growth, spiking up the memory in zebra since bgp
is slow/busy processing the first message.

To avoid this, the idea is to hook the BGP-VRF off struct bgp_master
and maintain a FIFO list of struct bgp which is processed later on, where
we walk a chunk of BGP-VRFs and do the remote route install/uninstall.

Ticket :#3864372

Signed-off-by: Rajasekar Raja <rajasekarr@nvidia.com>
2024-12-09 08:46:16 -08:00
Rajasekar Raja
07a80709c7 bgpd: backpressure - Optimize EVPN L2VNI remote routes processing
Anytime BGP gets an L2 VNI ADD from zebra,
 - Walking the entire global routing table per L2VNI is very expensive.
 - The next read (say of another VNI ADD) from the socket does
   not proceed unless this walk is complete.

So for triggers where a bulk of L2VNIs are flapped, this results in
huge output buffer FIFO growth, spiking up the memory in zebra since bgp
is slow/busy processing the first message.

To avoid this, the idea is to hook the VPN off the bgp_master struct and
maintain a VPN FIFO list which is processed later on, where we walk a
chunk of VPNs and do the remote route install.

Note: So far in the L3 backpressure cases (#15524), we have considered
the fact that zebra is slow, and the buffer grows in BGP.

However, this is the reverse, i.e. BGP is very busy processing the first
ZAPI message from zebra, due to which the buffer grows huge in zebra
and memory spikes up.

Ticket :#3864372

Signed-off-by: Rajasekar Raja <rajasekarr@nvidia.com>
2024-12-09 08:46:16 -08:00
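The deferred, chunked processing described in the two backpressure commits above can be sketched as a FIFO that is drained a few entries at a time, so the reader of the zebra socket is never blocked by a full table walk. This is a simplified model, not bgpd code; `DeferredWalker`, `chunk_size` and the de-duplication on enqueue are assumptions.

```python
from collections import deque


class DeferredWalker:
    """Hypothetical FIFO of VNIs/VRFs awaiting remote route processing."""

    def __init__(self, chunk_size=2):
        self.fifo = deque()
        self.chunk_size = chunk_size
        self.processed = []

    def enqueue(self, vni):
        # Called from the ZAPI read path: cheap, no table walk here.
        # ASSUMPTION: an entry already queued is not queued again.
        if vni not in self.fifo:
            self.fifo.append(vni)

    def run_once(self):
        """Process at most one chunk; return True if work remains."""
        for _ in range(min(self.chunk_size, len(self.fifo))):
            vni = self.fifo.popleft()
            self.processed.append(vni)  # stands in for route install/uninstall
        return bool(self.fifo)
```

An event loop would reschedule `run_once()` while it returns true, interleaving it with further socket reads.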
Trey Aspelund
8c713609dd bgpd: add EVPN route type msg list
Adds a msg list for getting strings mapping to enum bgp_evpn_route_type

Ticket: #3318830

Signed-off-by: Trey Aspelund <taspelund@nvidia.com>
2024-12-09 08:46:16 -08:00
Donald Sharp
c05c2b15e5
Merge pull request #17461 from csiltala/multicast-boundary-acl
pimd: Extend multicast boundary/ACL functionality
2024-12-09 10:42:04 -05:00
Donatas Abraitis
3d89c67889 bgpd: Print the actual prefix when we try to import in vpn_leak_to_vrf_update
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2024-12-08 21:48:14 +02:00
Donatas Abraitis
222ba5f390 bgpd: Import allowed routes with self AS if desired
Previously we couldn't install VPN routes with self AS in the path because
we never checked whether allowas-in is enabled.

Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2024-12-08 21:46:59 +02:00
Donatas Abraitis
77857dc210 tests: Check if vpn routes can be imported if allowas-in is set
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2024-12-08 21:44:52 +02:00
Donatas Abraitis
17a0d92ffd
Merge pull request #17589 from anlancs/master_up
zebra: use macro for one check
2024-12-07 22:35:12 +02:00
Donatas Abraitis
797cf4757e
Merge pull request #17538 from idryzhov/netns-doc
doc: remove no-op "netns NAMESPACE" command from the docs
2024-12-07 22:32:00 +02:00
Igor Ryzhov
e51c6dd256 zebra: add deprecation notice for no-op netns command
Signed-off-by: Igor Ryzhov <idryzhov@gmail.com>
2024-12-07 17:02:58 +02:00
Igor Ryzhov
c3bffa9277 doc: remove no-op "netns NAMESPACE" command from the docs
Signed-off-by: Igor Ryzhov <idryzhov@gmail.com>
2024-12-07 17:02:58 +02:00
Donatas Abraitis
b0800bfdf0 bgpd: Validate only affected RPKI prefixes instead of a full RIB
Before this fix, if the rpki_sync_socket_rtr socket returned EAGAIN, then ALL
routes in the RIB were revalidated, which consumes a lot of CPU and generates
unnecessary traffic, e.g. if using BMP servers. With a full feed it would
waste 50-80Mbps.

Instead we should try to drain the existing pipe (the other end), and
revalidate only the affected prefixes.

Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2024-12-06 23:31:33 +02:00
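The idea of revalidating only affected prefixes can be illustrated as follows: given the ROA prefixes that changed, pick just the RIB prefixes they cover instead of walking everything. This is a conceptual sketch using Python's `ipaddress` module; `affected_routes` is a hypothetical name and the linear scan stands in for whatever lookup bgpd actually uses.

```python
import ipaddress


def affected_routes(rib_prefixes, changed_roa_prefixes):
    """Return the RIB prefixes covered by at least one changed ROA prefix."""
    changed = [ipaddress.ip_network(p) for p in changed_roa_prefixes]
    hits = []
    for prefix in rib_prefixes:
        net = ipaddress.ip_network(prefix)
        # A route is covered when it is equal to or more specific than the ROA.
        if any(net.version == roa.version and net.subnet_of(roa) for roa in changed):
            hits.append(prefix)
    return hits
```

For example, a change to a ROA for 10.0.0.0/23 selects 10.0.0.0/24 and 10.0.1.0/24 from the RIB and leaves unrelated prefixes alone.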
Corey Siltala
8465ba1dde pimd: Convert boundary_oil_plist to struct prefix_list
Rather than storing the prefix-list name and looking it up every time we use it, store a pointer to the prefix-list itself.

Signed-off-by: Corey Siltala <csiltala@atcorp.com>
2024-12-06 14:44:52 -06:00
Corey Siltala
ff5309ca2d doc: Expand ACL and multicast boundary documentation
Add documentation for existing extended access-list functionality and
the new "ip multicast boundary" command leveraging that functionality.

Signed-off-by: Corey Siltala <csiltala@atcorp.com>
2024-12-06 14:44:52 -06:00
Corey Siltala
7c2c70dd2b tests: Add basic multicast boundary test
Add a simple test to show filtering of IGMP joins using the new "ip
multicast boundary" filtering with access-lists, and include a test of
the existing prefix-list based "ip multicast boundary oil" command.

Signed-off-by: Corey Siltala <csiltala@atcorp.com>
2024-12-06 14:44:52 -06:00
Corey Siltala
4de4017d64 pimd,yang: Extend multicast boundary functionality
Add a new interface command, ip multicast boundary ACCESSLIST4_NAME. This
allows filtering on both source and group using the extended access-list
syntax, vs. group-only as with the existing "ip multicast boundary oil"
command, which uses prefix-lists. If both are configured, the
prefix-list is evaluated first. The default behavior for both
prefix-lists and access-lists remains "deny", so the prefix-list must
have a terminating "permit" statement in order to also evaluate against
the access-list.

The following example denies groups in range 229.1.1.0/24 and groups in
range 232.1.1.0/24 with source 10.0.20.2:

!
ip prefix-list pim-oil-plist seq 10 deny 229.1.1.0/24
ip prefix-list pim-oil-plist seq 20 permit any
!
access-list pim-acl seq 10 deny ip host 10.0.20.2 232.1.1.0 0.0.0.255
access-list pim-acl seq 20 permit ip any any
!
interface r1-eth0
 ip address 10.0.20.1/24
 ip igmp
 ip pim
 ip multicast boundary oil pim-oil-plist
 ip multicast boundary pim-acl
!

Signed-off-by: Corey Siltala <csiltala@atcorp.com>
2024-12-06 14:44:17 -06:00
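The evaluation order described above (prefix-list first, then access-list, each defaulting to deny) can be modelled with the example configuration. This is a simplified sketch, not pimd code: the rule tables mirror the config in the commit message, "any" is modelled as 0.0.0.0/0, and all function names are hypothetical.

```python
import ipaddress

# Mirrors "ip prefix-list pim-oil-plist" from the example (group only).
PLIST = [("deny", "229.1.1.0/24"), ("permit", "0.0.0.0/0")]
# Mirrors "access-list pim-acl" from the example (source, group).
ACL = [
    ("deny", "10.0.20.2/32", "232.1.1.0/24"),
    ("permit", "0.0.0.0/0", "0.0.0.0/0"),
]


def _contains(net, addr):
    return ipaddress.ip_address(addr) in ipaddress.ip_network(net)


def plist_permits(group):
    for action, net in PLIST:
        if _contains(net, group):
            return action == "permit"
    return False  # implicit deny


def acl_permits(source, group):
    for action, src, grp in ACL:
        if _contains(src, source) and _contains(grp, group):
            return action == "permit"
    return False  # implicit deny


def join_allowed(source, group):
    # The prefix-list is evaluated first; the ACL is only consulted on permit.
    return plist_permits(group) and acl_permits(source, group)
```

With these tables, any join for a group in 229.1.1.0/24 is refused, and a join for a group in 232.1.1.0/24 is refused only when the source is 10.0.20.2.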
Corey Siltala
a9bee74ea2 pimd: Move ACL handling to pim_util.c
Move the extended access-list handling from pim_msdp_packet.c to
pim_util.c to allow use elsewhere in the daemon.

Signed-off-by: Corey Siltala <csiltala@atcorp.com>
2024-12-06 14:44:17 -06:00
Jafar Al-Gharaibeh
f1a9b9292c
Merge pull request #17603 from opensourcerouting/fix/bgp_peer_with_peer-group
bgpd: Check if as_type is not specified when peer is a peer-group member
2024-12-06 08:55:56 -06:00
Donatas Abraitis
03ea25af68
Merge pull request #17545 from pguibert6WIND/peerup_loc_rib_wrong_format
bgpd: fix peer up message for loc-rib not sent
2024-12-06 14:47:48 +02:00
Donatas Abraitis
3d15035491
Merge pull request #17579 from donaldsharp/timer_connect_bgp_vrf_netns
Timer connect bgp vrf netns
2024-12-06 14:26:33 +02:00
Donatas Abraitis
2797506a5e bgpd: Check if as_type is not specified when peer is a peer-group member
Fixes these sequences:

```
neighbor pg4 peer-group
neighbor 127.0.0.4 peer-group pg4
neighbor 127.0.0.4 remote-as 65004

neighbor pg5 peer-group
neighbor 127.0.0.5 peer-group pg5
neighbor 127.0.0.5 remote-as internal
```

Fixes: 0dfe256 ("bgpd: Implement neighbor X remote-as auto")

Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2024-12-06 08:25:09 +02:00
Jafar Al-Gharaibeh
38ca408c54
Merge pull request #17600 from donaldsharp/bfd_shared_network
Bfd shared network
2024-12-05 22:13:39 -06:00
Donald Sharp
a5c5b87389 tests: Fix invalid escape seq seen in bgp_nexthop_ipv6
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2024-12-05 14:08:20 -05:00
Donald Sharp
dc372568ad tests: Convert to using neighbor X timers connect 1 for exabgp using tests
Convert all tests that use exabgp to use `neighbor X timers
connect 1`.  When looking at the support files for test runs, I have
noticed that peers are occasionally in a wait period for
reconnecting which is longer than the test is waiting to converge.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2024-12-05 14:08:12 -05:00
Donald Sharp
a43b11fcf6
Merge pull request #17570 from btrent98/igmp-proxy-memfix
pimd: free igmp proxy joins on interface deletion
2024-12-05 10:23:30 -05:00
Donald Sharp
3b97cbf77e bgpd: When bgp notices a change to shared_network inform bfd of it
When bgp starts up and reads its config *before* it has
received interface addresses from zebra, shared_network can
be set to false.  Later on, once bgp attempts to reconnect,
it will work out shared_network again (because it has now
received the data from zebra).  In this case
tell bfd about it.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2024-12-05 10:21:26 -05:00
Donald Sharp
7cde71a8e3 bgpd: shared_network is a bool, convert it to such
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2024-12-05 10:19:55 -05:00
Donald Sharp
645a82ec60 tests: bfd_profiles_topo1 is taking a long time to reconnect
Make it faster

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2024-12-05 10:19:55 -05:00
Philippe Guibert
f921a8d09a topotests: bmp, test that loc-rib peer up message is sent
Add a test at startup to ensure that peer up message for loc-rib is
correctly set.

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
2024-12-05 15:42:10 +01:00
Philippe Guibert
013b9d4c19 bgpd: fix peer up message for loc-rib not sent
At startup, there is no peer up message for the loc-rib instance peer.
Instead, a global peer up message with address 0.0.0.0 is sent.

Such a message is wrong, violates the RFC, and should be dropped by
a strict collector. Specifically, the peer type sent in the message
is wrong, and should be set to the LOC-RIB peer type.

Fix this by changing the peer type of the peer up message to either
the loc-rib or the global instance peer type.

Fixes: 035304c25a ("bgpd: bmp loc-rib peer up/down for vrfs")
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
2024-12-05 15:42:10 +01:00
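The peer-type selection can be sketched as below. The numeric values are the BMP peer types defined in RFC 7854 (Global Instance Peer) and RFC 9069 (Loc-RIB Instance Peer); the function name and its boolean parameter are hypothetical and do not reflect the bgpd implementation.

```python
BMP_PEER_TYPE_GLOBAL_INSTANCE = 0   # RFC 7854
BMP_PEER_TYPE_LOC_RIB_INSTANCE = 3  # RFC 9069


def peer_up_peer_type(is_loc_rib):
    """Pick the peer type to place in the per-peer header of a peer up message."""
    if is_loc_rib:
        return BMP_PEER_TYPE_LOC_RIB_INSTANCE
    return BMP_PEER_TYPE_GLOBAL_INSTANCE
```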
Rafael Zalamena
98c68a37d8 doc: document new SA limit command
Let user know about the new MSDP SA limit command.

Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
2024-12-05 10:39:00 -03:00
Rafael Zalamena
0d904c28c3 topotests: test new MSDP SA limit feature
Test that no more than the configured limit of SAs is learned from the peer.

Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
2024-12-05 10:38:56 -03:00
Rafael Zalamena
a38ed18a4e pimd: implement MSDP peer SA limiting
Implement a command to enable/disable per peer MSDP SA limiting.

Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
2024-12-05 10:38:52 -03:00
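Per-peer SA limiting can be pictured as a cap on the number of distinct (source, group) entries accepted from one peer. The sketch below is a toy model under that assumption; `MsdpPeer`, `sa_limit` and `learn_sa` are hypothetical names, and what pimd does with entries beyond the limit is not covered here.

```python
class MsdpPeer:
    """Hypothetical MSDP peer with an optional SA limit (None = unlimited)."""

    def __init__(self, sa_limit=None):
        self.sa_limit = sa_limit
        self.sa_cache = set()

    def learn_sa(self, source, group):
        """Try to learn an SA; return False if the limit rejects it."""
        key = (source, group)
        if key in self.sa_cache:
            return True  # refresh of a known SA does not count against the limit
        if self.sa_limit is not None and len(self.sa_cache) >= self.sa_limit:
            return False
        self.sa_cache.add(key)
        return True
```

With `MsdpPeer(sa_limit=2)`, a third distinct SA is rejected while refreshes of the first two are still accepted.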
Rafael Zalamena
c8ded86e9e yang,pimd: support shutdown and SA limit
Add MSDP shutdown and SA limiting configuration to YANG model.

(no implementation, just boilerplate code)

Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
2024-12-05 10:35:10 -03:00
anlan_cs
f536ca30f5 zebra: use macro for one check
Signed-off-by: anlan_cs <anlan_cs@126.com>
2024-12-05 21:20:05 +08:00
Donatas Abraitis
34485ee536
Merge pull request #17582 from pguibert6WIND/no_neighbor_asdot_fix
bgpd: fix unconfigure asdot neighbor
2024-12-05 09:32:54 +02:00
Jafar Al-Gharaibeh
e814b000c3
Merge pull request #17585 from donaldsharp/zclient_speedup
lib: Speed up reconnection attempts for zapi
2024-12-04 21:59:33 -06:00