Consider this scenario:
Lots of peers with a bunch of route information that is changing
fast. One of the peers happens to be really slow for whatever
reason. The way the output queue is filled is that bgpd puts
64 packets at a time and then reschedules itself to send more
in the future. Now suppose that peer has hit it's input Queue
limit and is slow. As such bgp will continue to add data to
the output Queue, irrelevant if the other side is receiving
this data.
Let's limit the Output Queue to the same limit as the Input
Queue. This should prevent bgp eating up large amounts of
memory as stream data when under severe network trauma.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Use `getpid()` to initialize the sequence number. This change silences
Coverity Scan warning about truncated use of `time()` which in this case
is not a problem.
Found by Coverity Scan (CID 1519828)
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
If we got inside the condition of `vrfp->status == VRF_ACTIVE` then
don't make the same check again.
Found by Coverity Scan (CID 1519760)
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Don't attempt to dereference `ifp` directly if it might be null: there
is a check right before this usage: `ifp ? ifp->info : NULL`.
In this context it should be safe to assume `ifp` is not NULL because
the only caller of this function checks that for this `ifindex`. For
consistency we'll check for null anyway in case this ever changes (and
with this the coverity scan warning gets silenced).
Found by Coverity Scan (CID 1519776)
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Testcase: test_pim6_multiple_groups_different_RP_address_p2
was failing because of a bug in framework, Fixed the
bug in this commit.
Signed-off-by: Kuldeep Kashyap <kashyapk@vmware.com>
Multicast pim6 static RP tests are failing
when run in parallel using micronet. There
are APIs to clean mcast traffic before
starting new test but these cleanups
are not needed when socat is used.
Signed-off-by: Kuldeep Kashyap <kashyapk@vmware.com>
Use "(null)" for empty IP address.
One example in `bgp_zebra_send_remote_macip()` to install mac:
Before:
```
2023/01/18 02:09:09 BGP: [SCHS5-AK960] Tx ADD MACIP, VNI 200 MAC 06:6b:7c:db:83:72 IP flags 0x0 seq 0 remote VTEP 88.88.88.88 esi -
```
After:
```
2023/01/18 20:19:57 BGP: [SCHS5-AK960] Tx ADD MACIP, VNI 200 MAC 06:6b:7c:db:83:72 IP (null) flags 0x0 seq 0 remote VTEP 88.88.88.88 esi -
```
Signed-off-by: anlan_cs <vic.lan@pica8.com>
Under really heavily loaded systems this is insufficient. Looking
at the run output we have this:
"2.1.3.22\/32":[
{
"installed":true,
}
],
"2.1.3.23\/32":[
{
"queued":true,
}
],
So after 10 seconds on the micronet system only 30 of the 100 routes are installed.
Give it more time.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Looks like under heavy load, the test is not giving enough
time to come to steady state. Do this:
a) send more udp packets and for longer
b) Increase time spent waiting
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Don't attempt to encode the pointer address instead pass the pointer
directly so the real contents can be accessed.
(`ri->pref_src` type is `union g_addr *`)
Found by Coverity Scan (CID 1482162)
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Do extra inotify data structure checks and copy the file name to a stack
buffer making sure it is null byte terminated.
Found by Coverity Scan (CID 1465494)
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Changes:
- Convert `unsigned int` to `time_t` to satisfy time truncation warnings
even though at this point we had already used the modulus operator.
- Avoid trying to access outside the bounds of the array
`months` array has a size of 13 elements, but the code inside the loop
uses `i + 1` to peek on the next month.
Found by Coverity Scan (CID 1519752 and 1519769)
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
MPLS VPN networks can either peer with iBGP or eBGP. When
calculating the distance to send to zebra, the imported prefix
is never sent with distance information, even if the vty
command is used under the ipv4 unicast address family:
router bgp 65505 vrf vrf1
address-family ipv4 unicast
distance bgp 26 27 28
[vpn config]
The observation is that the distance sent to zebra for an
imported prefix is still 20:
[..]
VRF vrf1:
B> 192.168.0.0/24 [20/0] via 2.2.2.2 (vrf default) (recursive), label 20, weight 1, 00:00:12
* via 10.125.0.6, ntfp3 (vrf default), label implicit-null/20, weight 1, 00:00:12
The expectation is that the incoming prefix has to follow the
distance that is configured, or the distance derived from the peer
relationship established by the parent prefix.
In the case, an iBGP relationship is done, and no distance
configuration is done, the below show is expected:
[..]
VRF vrf1:
B*> 192.168.0.0/24 [200/0] via 192.168.0.2, r1-gre0 (vrf default), label 20, weight 1, 00:00:12
In the case an iBGP relationship is done, and distance configuration
is performed as below:
[..]
distance bgp 21 201 41
[..]
Then the below show is expected:
[..]
VRF vrf1:
B*> 192.168.0.0/24 [201/0] via 192.168.0.2, r1-gre0 (vrf default), label 20, weight 1, 00:00:12
To get this behaviour, get the peer origin where the prefix is coming
from.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
When setting rule for access-list ( and prefix-list ) without sequence, it
will automatically get a sequence by `acl_get_seq()`, and return
`CMD_SUCCESS` for command even this sequence value is wrong.
In this scene, `CMD_WARNING_CONFIG_FAILED` should be returned with a
warning.
So, add the check in `acl_get_seq()` and move `nb_cli_enqueue_change()`
after the check of wrong sequence.
Both `plist_remove_if_empty()` and `acl_remove_if_empty()` should ignore
this check, there is no change on them.
Before:
```
anlan(config)# access-list aa seq 4294967295 deny 6.6.6.6/32
anlan(config)# access-list aa deny 6.6.6.7/32 <- Return CMD_SUCCESS
YANG error(s):
Value "4294967300" is out of uint32's min/max bounds.
Value "4294967300" is out of uint32's min/max bounds.
Value "4294967300" is out of uint32's min/max bounds.
Value "4294967300" is out of uint32's min/max bounds.
Value "4294967300" is out of uint32's min/max bounds.
YANG path: Schema location /frr-filter:lib/prefix-list/entry/sequence.
% Failed to edit configuration.
```
After:
```
anlan(config)# access-list aa seq 4294967295 deny 6.6.6.6/32
anlan(config)# access-list aa deny 6.6.6.7/32 <- Return CMD_WARNING_CONFIG_FAILED
% Malformed sequence value
```
Additionally, fixed the overflow issue on `acl_get_seq()` on **32bit** platforms.
Just change the returned type of `acl_get_seq()` from `long` to `int64_t`.
Before:
```
anlan(config)# access-list bb seq 4294967295 deny 6.6.6.6/32
anlan(config)# access-list bb deny 6.6.6.7/32
anlan(config)# do show run
...
access-list bb seq 4294967295 deny 6.6.6.6/32
access-list bb seq 4 deny 6.6.6.7/32 <- Overflow
```
After:
```
anlan(config)# access-list bb seq 4294967295 deny 6.6.6.6/32
anlan(config)# access-list bb deny 6.6.6.7/32
% Malformed sequence value
```
Signed-off-by: anlan_cs <vic.lan@pica8.com>
After calling `rib_unlink` the variable `re` will point to `free()`d
memory, so don't attempt to use it after this point.
Found by Coverity Scan (Coverity ID 1519784)
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
[New LWP 2524]
[New LWP 2539]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/opt/avi/bin/bgpd -f /run/frr/avi_ns3_bgpd.config -i /opt/avi/etc/avi_ns3_bgpd.'.
Program terminated with signal SIGABRT, Aborted.
[Current thread is 1 (Thread 0x7f92ac8f1740 (LWP 2524))]
0 0x00007f92acb3800b in raise () from /lib/x86_64-linux-gnu/libc.so.6
[Current thread is 1 (Thread 0x7f92ac8f1740 (LWP 2524))]
0 0x00007f92acb3800b in raise () from /lib/x86_64-linux-gnu/libc.so.6
1 0x00007f92acb17859 in abort () from /lib/x86_64-linux-gnu/libc.so.6
2 0x00007f92acb17729 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
3 0x00007f92acb28fd6 in __assert_fail () from /lib/x86_64-linux-gnu/libc.so.6
4 0x00007f92accf2164 in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
5 0x000055b46be1ef63 in bgp_keepalives_wake () at bgpd/bgp_keepalives.c:311
6 0x000055b46be1f111 in bgp_keepalives_stop (fpt=0x55b46cfacf20, result=<optimized out>) at bgpd/bgp_keepalives.c:323
7 0x00007f92acea9521 in frr_pthread_stop (fpt=0x55b46cfacf20, result=result@entry=0x0) at lib/frr_pthread.c:176
8 0x00007f92acea9586 in frr_pthread_stop_all () at lib/frr_pthread.c:188
9 0x000055b46bdde54a in bgp_pthreads_finish () at bgpd/bgpd.c:8150
10 0x000055b46bd696ca in bgp_exit (status=0) at bgpd/bgp_main.c:210
11 sigint () at bgpd/bgp_main.c:154
12 0x00007f92acecc1e9 in quagga_sigevent_process () at lib/sigevent.c:105
13 0x00007f92aced689a in thread_fetch (m=m@entry=0x55b46cf23540, fetch=fetch@entry=0x7fff95379238) at lib/thread.c:1487
14 0x00007f92aceb2681 in frr_run (master=0x55b46cf23540) at lib/libfrr.c:1010
15 0x000055b46bd676f4 in main (argc=11, argv=0x7fff953795a8) at bgpd/bgp_main.c:482
Root cause:
This is due to race condition between main thread & keepalive thread during clean-up.
This happens when the keepalive thread is processing a wake signal owning the mutex, when meanwhile the main thread tries to stop the keepalives thread.
In main thread, the keepalive thread’s running bit (fpt->running) is set to false, without taking the mutex & then it blocks on mutex.
Meanwhile, keepalive thread which owns the mutex sees that the running bit is false & executes bgp_keepalives_finish() which also frees up mutex.
Main thread that is waiting on mutex with pthread_mutex_lock() will cause core while trying to access mutex.
Fix:
Take the lock in main thread while setting the fpt->running to false.
Signed-off-by: Samanvitha B Bhargav <bsamanvitha@vmware.com>