Check that the L3VNI is "up" before taking action to announce or
withdraw the EVPN type-5 default based on configuration. Otherwise,
there can be timing windows where an EVPN type-5 default route
gets announced without a VNI and with invalid route targets.
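A minimal sketch of the intended gating, with placeholder names (the
helper and fields below are illustrative, not FRR's exact symbols):

    /* Only act on the type-5 default once the L3VNI is operationally up,
     * so the route is never advertised with a zero VNI or invalid RTs. */
    if (!is_l3vni_oper_up(bgp_vrf))        /* hypothetical helper */
        return;

    if (bgp_vrf->advertise_type5_default)  /* illustrative config flag */
        bgp_evpn_advertise_type5_default(bgp_vrf);
    else
        bgp_evpn_withdraw_type5_default(bgp_vrf);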
Signed-off-by: Vivek Venkatraman <vivek@nvidia.com>
Ticket: #2684144
Reviewed By: Chirag Shah
Testing Done:
1. Rerun failed test multiple times successfully
2. Some manual testing
3. precommit and partial evpn-smoke
When a MAC gets deleted but associated neighbors remain, the MAC is
kept in the zebra MAC database as an internal ("auto") entry. When
this happens, reset the MAC's remote sequence number. This ensures that
when the host with the MAC later comes up behind a remote VTEP, the
local switch accepts the remote MAC and installs it into the bridge FDB,
rather than rejecting it and leaving the remote MAC out of the bridge
FDB.
This fix is a corollary of CM-22753, this time applied to local MACs
upon delete.
Note: Commit is marked Cumulus-only because I need to evaluate more
comprehensive changes before upstreaming it.
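Roughly, the intent looks like the following sketch; flag and field
names approximate zebra's MAC handling and are not the exact code:

    /* Local MAC deleted but neighbors still reference it: keep it as an
     * internal ("auto") entry and forget the remote sequence number so a
     * later remote advertisement starting at seq 0 is not seen as stale. */
    if (listcount(mac->neigh_list)) {
        UNSET_FLAG(mac->flags, ZEBRA_MAC_LOCAL);
        SET_FLAG(mac->flags, ZEBRA_MAC_AUTO);
        mac->rem_seq = 0;    /* reset remote sequence number */
    } else {
        zebra_evpn_mac_del(zevpn, mac);
    }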
Ticket: CM-29581
Reviewed By: As above
Testing Done:
1. Multiple rounds of manual testing
2. Two rounds of evpn-smoke, 1 round of precommit
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Acked-by: Chirag Shah <chirag@cumulusnetworks.com>
Acked-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Consider a master bridge interface (br_l3vni) with a slave vxlan99
mapped to the VLANs used by 3 L3VNIs.
During ifdown of the br_l3vni interface, zebra_vxlan_process_l3vni_oper_down(),
where zebra sends a ZAPI L3VNI delete to bgpd, is invoked twice:
1) if_down -> zebra_vxlan_svi_down()
2) the VXLAN device (vxlan99) is unlinked from the bridge:
   zebra_if_dplane_ifp_handling() --> zebra_vxlan_if_update_vni()
   (since the ZEBRA_VXLIF_MASTER_CHANGE flag is set)
During ifup of the br_l3vni interface, zebra_vxlan_process_l3vni_oper_down()
is invoked because of the access-vlan change: process oper down, associate
with the new svi_if, then process oper up again.
The problem is that the redundant L3VNI delete ZAPI message causes BGP
to do an inline global table walk for remote route installation even
though the L3VNI has already been removed. The bigger the scale, the
more CPU is wasted.
Given that a bridge flap is not a common trigger, the idea is to simply
return from BGP if the L3VNI is already set to 0, i.e. if the L3VNI has
already been deleted, do nothing and return.
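A sketch of the early return, with illustrative names for the handler
and fields:

    /* On an L3VNI delete from zebra, skip the inline global table walk if
     * this BGP-VRF's L3VNI is already 0 (i.e. already deleted). */
    static void bgp_evpn_handle_l3vni_del(struct bgp *bgp_vrf)
    {
        if (!bgp_vrf->l3vni)
            return;    /* redundant delete; nothing left to withdraw */

        /* existing delete processing and table walk */
    }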
NOTE/TBD: The ideal fix is for zebra not to send the second L3VNI delete
ZAPI message. However, that is a much more involved change to day-1 code,
with corner cases to handle.
Ticket :#3864372
Signed-off-by: Rajasekar Raja <rajasekarr@nvidia.com>
Anytime BGP gets an L3VNI ADD/DEL from zebra:
- Walking the entire global routing table per L3VNI is very expensive.
- The next read (say of another VNI ADD/DEL) from the socket does not
  proceed until this walk is complete.
So for triggers where a bulk of L3VNIs are flapped, this results in huge
output-buffer FIFO growth in zebra, spiking its memory, since bgpd is
slow/busy processing the first message.
To avoid this, the idea is to hook the BGP-VRF off struct bgp_master and
maintain a struct bgp FIFO list that is processed later, where we walk a
chunk of BGP-VRFs at a time and do the remote route install/uninstall.
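Conceptually, the enqueue side looks like this sketch; the list helpers,
flag, and event names are placeholders, not the actual implementation:

    /* Instead of walking the global table inline on every L3VNI ADD/DEL,
     * queue the affected BGP-VRF on a FIFO hung off bgp_master and let a
     * background event work through it. */
    if (!CHECK_FLAG(bgp_vrf->flags, BGP_VRF_L3VNI_QUEUED)) {
        SET_FLAG(bgp_vrf->flags, BGP_VRF_L3VNI_QUEUED);
        l3vni_fifo_add_tail(&bm->l3vni_fifo, bgp_vrf);
        event_add_event(bm->master, bgp_process_l3vni_fifo, NULL, 0,
                        &bm->t_l3vni_fifo);
    }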
Ticket :#3864372
Signed-off-by: Rajasekar Raja <rajasekarr@nvidia.com>
Anytime BGP gets an L2VNI ADD from zebra:
- Walking the entire global routing table per L2VNI is very expensive.
- The next read (say of another VNI ADD) from the socket does not proceed
  until this walk is complete.
So for triggers where a bulk of L2VNIs are flapped, this results in huge
output-buffer FIFO growth in zebra, spiking its memory, since bgpd is
slow/busy processing the first message.
To avoid this, the idea is to hook the VPN off the bgp_master struct and
maintain a VPN FIFO list that is processed later, where we walk a chunk
of VPNs at a time and do the remote route install.
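The drain side processes only a bounded chunk per event before
rescheduling itself; again a sketch with assumed names and batch size:

    #define VNI_FIFO_CHUNK 20    /* illustrative batch size */

    /* Process a limited number of queued L2VNIs per run so a bulk flap
     * does not block further ZAPI reads for long. */
    static void bgp_process_vni_fifo(struct event *event)
    {
        struct bgp *bgp_evpn = bgp_get_evpn();
        struct bgpevpn *vpn;
        unsigned int count = 0;

        while (count < VNI_FIFO_CHUNK &&
               (vpn = vni_fifo_pop(&bm->vni_fifo)) != NULL) {
            install_routes_for_vni(bgp_evpn, vpn);    /* remote route install */
            count++;
        }

        if (vni_fifo_count(&bm->vni_fifo))    /* more left: reschedule */
            event_add_event(bm->master, bgp_process_vni_fifo, NULL, 0,
                            &bm->t_vni_fifo);
    }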
Note: So far in the L3 backpressure cases (#15524), we have considered
the case where zebra is slow and the buffer grows in BGP. This is the
reverse: BGP is busy processing the first ZAPI message from zebra, so the
buffer grows huge in zebra and memory spikes up.
Ticket :#3864372
Signed-off-by: Rajasekar Raja <rajasekarr@nvidia.com>
Previously we couldn't install VPN routes with our own AS in the path
because we never checked whether allowas-in is enabled.
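A hedged sketch of the missing check (the exact call site and flags in
bgpd differ):

    /* When importing a VPN route, an own-AS occurrence in the AS path
     * should only block installation if allowas-in is not configured. */
    if (aspath_loop_check(attr->aspath, bgp->as) &&
        !CHECK_FLAG(peer->af_flags[afi][safi], PEER_FLAG_ALLOWAS_IN))
        return false;    /* self-AS loop and allowas-in not enabled */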
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
Before this fix, if the rpki_sync_socket_rtr socket returned EAGAIN, ALL
routes in the RIB were revalidated, which takes lots of CPU and generates
unnecessary traffic, e.g. towards BMP servers. With a full feed it could
waste 50-80 Mbps.
Instead, we should drain the other end of the pipe and revalidate only
the affected prefixes.
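The intended behavior, sketched with a hypothetical record type and
revalidation helper:

    /* Drain everything currently queued in the sync pipe and revalidate
     * only the prefixes read, instead of revalidating the whole RIB when
     * the read would block. */
    struct rpki_prefix_rec rec;    /* hypothetical record type */

    while (read(rpki_sync_socket_rtr, &rec, sizeof(rec)) == sizeof(rec))
        revalidate_single_prefix(&rec);    /* affected prefix only */

    /* read() returning -1 with errno == EAGAIN just means the pipe is
     * drained; no full-RIB revalidation is needed. */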
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
Rather than storing the prefix-list name and looking it up every time we use it, store a pointer to the prefix-list itself.
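For illustration (field and call names are assumptions, not the exact
pimd code):

    /* Resolve the prefix-list once at configuration time and keep the
     * pointer, instead of storing the name and looking it up per use. */
    /* before: char *plist_name;            looked up on every use       */
    /* after:  struct prefix_list *plist;   resolved once at config time */
    pim_ifp->boundary_oil_plist = prefix_list_lookup(AFI_IP, plist_name);

    /* at filter time, apply the cached pointer directly */
    if (prefix_list_apply(pim_ifp->boundary_oil_plist, &grp) == PREFIX_DENY)
        return false;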
Signed-off-by: Corey Siltala <csiltala@atcorp.com>
Add documentation for existing extended access-list functionality and
the new "ip multicast boundary" command leveraging that functionality.
Signed-off-by: Corey Siltala <csiltala@atcorp.com>
Add a simple test to show filtering of IGMP joins using the new "ip
multicast boundary" filtering with access-lists, including a test of the
existing prefix-list-based "ip multicast boundary oil" command.
Signed-off-by: Corey Siltala <csiltala@atcorp.com>
Add new interface command ip multicast boundary ACCESSLIST4_NAME. This
allows filtering on both source and group using the extended access-list
syntax vs. group-only as with the existing "ip multicast boundary oil"
command, which uses prefix-lists. If both are configured, the prefix-
list is evaluated first. The default behavior for both prefix-lists and
access-lists remains "deny", so the prefix-list must have a terminating
"permit" statement in order to also evaluate against the access-list.
The following example denies groups in range 229.1.1.0/24 and groups in
range 232.1.1.0/24 with source 10.0.20.2:
!
ip prefix-list pim-oil-plist seq 10 deny 229.1.1.0/24
ip prefix-list pim-oil-plist seq 20 permit any
!
access-list pim-acl seq 10 deny ip host 10.0.20.2 232.1.1.0 0.0.0.255
access-list pim-acl seq 20 permit ip any any
!
interface r1-eth0
 ip address 10.0.20.1/24
 ip igmp
 ip pim
 ip multicast boundary oil pim-oil-plist
 ip multicast boundary pim-acl
!
Signed-off-by: Corey Siltala <csiltala@atcorp.com>
Move the extended access-list handling from pim_msdp_packet.c to
pim_util.c to allow use elsewhere in the daemon.
Signed-off-by: Corey Siltala <csiltala@atcorp.com>
For those tests using exabgp, convert them all to use `neighbor X timers
connect 1`. I have noticed that, occasionally, when looking at the support
files for test runs, peers are in a reconnect wait period that is longer
than the time the test waits to converge.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
When bgp starts up and reads its config *before* it has received
interface addresses from zebra, shared_network can be set to false.
Later, once bgp attempts to reconnect, it recalculates shared_network
(because it now has the data from zebra). In that case, tell bfd about
the change.
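Sketched with assumed helper names; the point is just to notify BFD when
the recomputed value changes:

    /* After reconnecting to zebra, shared_network is recomputed; if it
     * changed, push the update to BFD so its session parameters match. */
    bool was_shared = peer->shared_network;

    bgp_peer_update_shared_network(peer);    /* hypothetical recompute */

    if (peer->bfd_config && peer->shared_network != was_shared)
        bgp_peer_bfd_update_source(peer);    /* assumed BFD update hook */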
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
At startup, there is no peer up message for the loc-rib instance peer.
Instead, a global peer up message with address 0.0.0.0 is sent. Such a
message is wrong, violates the RFC, and would be dropped by a strict
collector. In fact, the peer type sent in the message is wrong and should
be set to the LOC-RIB peer type.
Fix this by setting the peer type of the peer up message to either the
loc-rib or the global instance peer type.
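In outline (peer-type values per RFC 7854/RFC 9069; the exact FRR
symbols may differ):

    /* Pick the BMP peer type based on which peer the up message is for,
     * instead of always sending a global instance peer header. */
    uint8_t peer_type;

    if (is_locrib)    /* loc-rib instance peer */
        peer_type = BMP_PEER_TYPE_LOC_RIB_INSTANCE;    /* type 3, RFC 9069 */
    else
        peer_type = BMP_PEER_TYPE_GLOBAL_INSTANCE;     /* type 0, RFC 7854 */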
Fixes: 035304c25a ("bgpd: bmp loc-rib peer up/down for vrfs")
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Add MSDP shutdown and SA limiting configuration to YANG model.
(no implementation, just boilerplate code)
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Currently the zapi reconnection is once every 10 seconds
for the first 3 times and then once every 60 seconds from then
on out. We are seeing interesting behavior under loaded systems
where zebra is just slow to come up and daemons are spending a long
time waiting to connect. Let's just make things a bit more aggressive.
Change the code to attempt to reconnect once every second for 30 seconds
and then change to once every 5 seconds from then on out.
This should help with non-integrated configuration on system startup.
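A sketch of the new retry schedule (counter and timer names follow the
zclient style but are illustrative):

    /* Retry once per second for the first 30 attempts, then every
     * 5 seconds, instead of 10s x3 followed by 60s. */
    if (zclient->fail < 30)
        delay = 1;    /* first ~30 seconds: retry every second */
    else
        delay = 5;    /* afterwards: retry every 5 seconds */

    event_add_timer(zclient->master, zclient_connect, zclient, delay,
                    &zclient->t_connect);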
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
The below command is not successful on an existing as-dot peer:
> no neighbor 10.0.0.2 remote-as 1.1
> % Create the peer-group or interface first
Handle the case where the remote-as argument can be an ASNUM.
Fixes: 8079a4138d ("lib, bgp: add initial support for asdot format")
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
If the socket associated with auto-rp fails to initialize, the memory
for the auto-rp structure is just dropped on the floor. Additionally,
any attempt at using the feature will cause pimd to crash when the
pointer is dereferenced, since it is dereferenced all over the place
without checking.
Clearly, if we cannot bind/use the socket, let's allow pimd to continue
running.
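The direction of the fix, as a sketch (function and field names are
placeholders):

    /* If the auto-rp socket cannot be opened, keep the allocated structure
     * (so later dereferences stay valid) and just run without the socket,
     * instead of dropping the memory and crashing later. */
    if (!pim_autorp_socket_enable(autorp)) {
        zlog_warn("%s: AutoRP socket could not be enabled; continuing without it",
                  __func__);
        autorp->sock_fd = -1;    /* mark the socket as unusable */
    }

    return autorp;    /* still return a valid pointer */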
Fixes: #17540
Signed-off-by: Donald Sharp <sharpd@nvidia.com>