Anytime BGP gets an L2 VNI ADD from zebra:
- Walking the entire global routing table per L2VNI is very expensive.
- The next read (say of another VNI ADD) from the socket does
  not proceed until this walk is complete.
So for triggers where a bulk of L2VNIs are flapped, this results in
huge output buffer FIFO growth in zebra, spiking its memory, since bgp
is slow/busy processing the first message.
To avoid this, the idea is to hook a VPN FIFO list off the bgp_master
struct and process it later, walking a chunk of VPNs at a time and
doing the remote route install.
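As a rough sketch of the idea (the FIFO, helper, and chunk-size names
below are illustrative, not the actual FRR symbols):

```
#define VNI_CHUNK_SIZE 256 /* assumed per-run processing budget */

static void bgp_evpn_process_vni_fifo(struct event *e)
{
        struct bgpevpn *vpn;
        unsigned int done = 0;

        /* Drain only a chunk of queued VNIs per run instead of holding
         * up the next ZAPI read while every VNI is processed. */
        while (done < VNI_CHUNK_SIZE &&
               (vpn = vpn_fifo_pop(&bm->vni_fifo)) != NULL) {
                bgp_evpn_install_remote_routes(vpn); /* per-VNI install */
                done++;
        }

        /* Anything still queued is handled in a follow-up event. */
        if (vpn_fifo_count(&bm->vni_fifo))
                event_add_event(bm->master, bgp_evpn_process_vni_fifo,
                                NULL, 0, NULL);
}
```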
Note: So far in the L3 backpressure cases (#15524), we have considered
the case where zebra is slow and the buffer grows in BGP. However, this
is the reverse: BGP is busy processing the first ZAPI message from
zebra, due to which the buffer grows huge in zebra and memory spikes up.
Ticket: #3864372
Signed-off-by: Rajasekar Raja <rajasekarr@nvidia.com>
Rather than storing the prefix-list name and looking it up every time we use it, store a pointer to the prefix-list itself.
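A before/after sketch (the interface field names here are just for
illustration):

```
/* Before: store only the name and resolve it on every evaluation */
plist = prefix_list_lookup(AFI_IP, pim_ifp->boundary_oil_plist_name);

/* After: resolve once when the boundary is configured, then reuse the
 * stored pointer directly */
plist = pim_ifp->boundary_oil_plist;
```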
Signed-off-by: Corey Siltala <csiltala@atcorp.com>
Add documentation for existing extended access-list functionality and
the new "ip multicast boundary" command leveraging that functionality.
Signed-off-by: Corey Siltala <csiltala@atcorp.com>
Add a simple test to show filtering of IGMP joins using the new "ip
multicast boundary" filtering with access-lists, including a test of the
existing prefix-list based "ip multicast boundary oil" command.
Signed-off-by: Corey Siltala <csiltala@atcorp.com>
Add a new interface command, ip multicast boundary ACCESSLIST4_NAME. This
allows filtering on both source and group using the extended access-list
syntax, vs. group-only as with the existing "ip multicast boundary oil"
command, which uses prefix-lists. If both are configured, the prefix-
list is evaluated first. The default behavior for both prefix-lists and
access-lists remains "deny", so the prefix-list must have a terminating
"permit" statement in order to also evaluate against the access-list.
The following example denies groups in range 229.1.1.0/24 and groups in
range 232.1.1.0/24 with source 10.0.20.2:
!
ip prefix-list pim-oil-plist seq 10 deny 229.1.1.0/24
ip prefix-list pim-oil-plist seq 20 permit any
!
access-list pim-acl seq 10 deny ip host 10.0.20.2 232.1.1.0 0.0.0.255
access-list pim-acl seq 20 permit ip any any
!
interface r1-eth0
ip address 10.0.20.1/24
ip igmp
ip pim
ip multicast boundary oil pim-oil-plist
ip multicast boundary pim-acl
!
Signed-off-by: Corey Siltala <csiltala@atcorp.com>
Move the extended access-list handling from pim_msdp_packet.c to
pim_util.c to allow use elsewhere in the daemon.
Signed-off-by: Corey Siltala <csiltala@atcorp.com>
For those tests using exabgp, convert them all to use `neighbor X timers
connect 1`. I have noticed that occasionally, when looking at the
support files for test runs, peers are in a wait period for reconnecting
that is longer than the time the test waits to converge.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
When bgp starts up and reads its config in *before* it has received
interface addresses from zebra, shared_network can end up set to false.
Later on, once bgp attempts to reconnect, it will figure out
shared_network again (because it has now received the data from zebra).
In this case, tell bfd about it.
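A minimal sketch of the intent, with hypothetical helper names (this is
not the actual bgpd code):

```
static void bgp_peer_addr_recv_from_zebra(struct peer *peer)
{
        /* Recomputed now that zebra has supplied interface addresses */
        bool shared = bgp_peer_is_shared_network(peer);

        if (peer->shared_network == shared)
                return;

        peer->shared_network = shared;

        /* BFD cares whether the peer is directly connected, so push the
         * recomputed value down to the BFD session as well. */
        if (peer->bfd_config)
                bgp_peer_bfd_reconfigure(peer);
}
```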
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
At startup, there is no peer up message for the loc-rib instance peer.
Instead, a global peer up message with address 0.0.0.0 is sent. Such a
message is wrong, violates the RFC, and would be dropped by a strict
collector. In fact, the peer type in the message is wrong and should be
set to the LOC-RIB peer type.
Fix this by changing the peer type of the peer up message to either the
loc-rib or global instance peer type.
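For reference, the distinction in peer types (values per RFC 7854 and
RFC 9069; the helper name is just for this sketch):

```
#define BMP_PEER_TYPE_GLOBAL_INSTANCE  0 /* RFC 7854 */
#define BMP_PEER_TYPE_LOC_RIB_INSTANCE 3 /* RFC 9069 */

static uint8_t bmp_peer_up_type(bool is_loc_rib)
{
        /* The loc-rib pseudo-peer must not be announced as a 0.0.0.0
         * global-instance peer; advertise it with the loc-rib type. */
        return is_loc_rib ? BMP_PEER_TYPE_LOC_RIB_INSTANCE
                          : BMP_PEER_TYPE_GLOBAL_INSTANCE;
}
```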
Fixes: 035304c25a ("bgpd: bmp loc-rib peer up/down for vrfs")
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Currently the zapi reconnection happens once every 10 seconds for the
first 3 attempts and then once every 60 seconds from then on. We are
seeing interesting behavior on loaded systems where zebra is just slow
to come up and daemons spend a long time waiting to connect. Let's make
things a bit more aggressive: change the code to attempt to reconnect
once every second for 30 seconds and then once every 5 seconds from
then on.
This should help with non-integrated configuration on system startup.
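A minimal sketch of the new schedule (the field and helper names follow
the zclient code loosely and may not match exactly):

```
static void zclient_reconnect_schedule(struct zclient *zclient)
{
        unsigned long delay;

        /* once per second for the first 30 attempts, every 5s after */
        if (zclient->fail < 30)
                delay = 1;
        else
                delay = 5;

        event_add_timer(zclient->master, zclient_connect, zclient, delay,
                        &zclient->t_connect);
}
```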
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
The below command is not successful on an existing asdot peer:
> no neighbor 10.0.0.2 remote-as 1.1
> % Create the peer-group or interface first
Handle the case where the remote-as argument can be an ASNUM.
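For reference, asdot notation encodes a 4-byte AS as high.low, so 1.1 is
AS 65537; a standalone illustration of the conversion (the function name
is only for this example):

```
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

static bool asdot_to_asplain(const char *str, uint32_t *as)
{
        unsigned int high, low;

        if (sscanf(str, "%u.%u", &high, &low) != 2 || high > 65535 ||
            low > 65535)
                return false;

        *as = (uint32_t)high * 65536 + low; /* "1.1" -> 65537 */
        return true;
}
```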
Fixes: 8079a4138d ("lib, bgp: add initial support for asdot format")
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
If the socket associated with auto-rp fails to initialize, the memory
for the auto-rp is just dropped on the floor. Additionally, any attempt
at using the feature will cause pimd to crash when the pointer is
dereferenced, since it is dereferenced all over the place without
checking.
Clearly, if we cannot bind/use the socket, let's allow pimd to continue.
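A sketch of the intended behavior, with assumed structure and function
names (not the actual pimd code):

```
void pim_autorp_init(struct pim_instance *pim)
{
        struct pim_autorp *autorp;

        autorp = XCALLOC(MTYPE_PIM_AUTORP, sizeof(*autorp));
        autorp->pim = pim;
        pim->autorp = autorp; /* always set, so later derefs are safe */

        if (!pim_autorp_socket_enable(autorp)) {
                /* Keep the allocation and keep pimd running; AutoRP
                 * simply stays inactive until the socket can be opened. */
                zlog_warn("%s: AutoRP socket setup failed; AutoRP inactive",
                          __func__);
        }
}
```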
Fixes: #17540
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
This gives more detail on why, at some point, we return deny, no match,
etc. Before this we sometimes had (I don't know why), e.g.:
```
Route-map: null, prefix: 192.168.2.0/24, result: deny
```
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
Commit 9112fb367b introduced the idea of setting the socket buffer
send/receive sizes. BSDs in general have the fun issue of not allowing
nearly as large a size as Linux. Since the above commit was developed
on Linux and not run on BSD, it was never tested there. Modify the
codebase to use the backoff setsockopt we already have in the code base
and use the returned values to notice what was actually set and respond
appropriately.
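The shape of the backoff, as a standalone sketch (this is not the exact
helper the tree uses):

```
#include <sys/socket.h>

static int sockopt_sendbuf_with_backoff(int fd, int requested)
{
        int size = requested;

        /* Halve the request until the kernel accepts it (BSDs reject
         * sizes that Linux happily takes), then report what actually
         * got applied so callers can react to it. */
        while (size > 0 &&
               setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &size,
                          sizeof(size)) < 0)
                size /= 2;

        return size;
}
```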
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Currently if you have this sequence of events:
a) BGP starts
b) BGP reads CLI that has bfd configuration
c) BGP attempts to install the bfd configuration but fails because
   zebra is not connected to yet
d) BGP connects to zebra
e) BGP receives the resend bfd code from bfdd
f) BGP does not send down the unsent data to bfd, so the bfd session is
   never established.
So effectively bfd attempted to install but failed, and then when it
was asked to replay everything it decided that the bfd information for
a particular peer was actually installed and did not need to be resent.
Modify the code such that the bfd code now tracks failed installations
and allows the resend of data to bfdd.
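A sketch of the idea with hypothetical flag and helper names (not the
actual code):

```
void bgp_peer_bfd_install(struct peer *peer)
{
        if (!bgp_bfd_zebra_ready()) {
                /* Remember the failure instead of pretending we installed */
                peer->bfd_config->install_failed = true;
                return;
        }

        bfd_sess_install(peer->bfd_config->session);
        peer->bfd_config->installed = true;
        peer->bfd_config->install_failed = false;
}

/* bfdd asked for a replay of everything it should know about */
void bgp_peer_bfd_replay(struct peer *peer)
{
        /* Previously a session whose install had silently failed was
         * treated as installed here and never resent */
        if (peer->bfd_config->installed || peer->bfd_config->install_failed)
                bgp_peer_bfd_install(peer);
}
```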
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Separate zebra's ZAPI server socket handling into two phases:
an early phase that opens the socket, and a later phase that
starts listening for client connections.
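Roughly, the split looks like this (the function names are assumptions
about the shape of the change):

```
static int zsock = -1;

/* Early phase: create and bind the ZAPI server socket so the path
 * exists as soon as possible during startup. */
void zserv_open(const char *path)
{
        zsock = socket(AF_UNIX, SOCK_STREAM, 0);
        /* ... unlink any stale path, bind() to it, set permissions ... */
}

/* Later phase: only now start accepting client connections, once zebra
 * is ready to service them. */
void zserv_start(void)
{
        listen(zsock, 5);
        /* ... register the accept handler with the event loop ... */
}
```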
Signed-off-by: Mark Stapp <mjs@cisco.com>