I'm seeing crashes in ospf6_write on the `assert(node)`. The only
sequence of events that I see that could possibly cause this to happen
is this:
a) Someone has scheduled an outgoing write to the ospf6->t_write and
placed item(s) on the ospf6->oi_write_q
b) A decision is made in ospf6_send_lsupdate() to send an immediate
packet via an event_execute(..., ospf6_write,....).
c) ospf6_write is called and the oi_write_q is cleaned out.
d) the t_write event is now popped and the oi_write_q is empty
and FRR asserts on the `assert(node)` <crash>
When event_execute is called for ospf6_write, just cancel the t_write
event. If ospf6_write has more data to send at the end of the function
it will reschedule itself. I've only seen this crash one time and am
unable to reliably reproduce this at all. But this is the only mechanism
that I can see that could make this happen, given how little the oi_write_q
is actually touched in code.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
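The a) to d) sequence above can be modeled with a toy event loop. This is a minimal Python sketch, not FRR code: `ToyOspf6`, `schedule_write`, `send_immediately` and the `cancel_pending` flag are hypothetical names standing in for `ospf6->t_write`, `oi_write_q` and the `event_execute()` call.

```python
from collections import deque


class ToyOspf6:
    """Toy stand-in for the ospf6 instance; all names are illustrative."""

    def __init__(self):
        self.oi_write_q = deque()   # interfaces with queued packets
        self.t_write = False        # True while a write event is scheduled

    def schedule_write(self, oi):
        # Step (a): queue an interface and schedule the write event.
        self.oi_write_q.append(oi)
        self.t_write = True

    def write(self):
        # Plays the role of assert(node): the queue must not be empty here.
        if not self.oi_write_q:
            raise RuntimeError("write event fired with an empty queue")
        self.oi_write_q.clear()     # step (c): the queue is drained

    def send_immediately(self, cancel_pending):
        # Step (b): run the write handler right away.
        if cancel_pending:
            self.t_write = False    # the fix: drop the scheduled event
        self.write()

    def run_pending(self):
        # Step (d): the event loop pops the scheduled event, if any.
        if self.t_write:
            self.t_write = False
            self.write()


def crashes(cancel_pending):
    """Replay steps (a)-(d) and report whether the empty-queue check fires."""
    o = ToyOspf6()
    o.schedule_write("eth0")
    o.send_immediately(cancel_pending)
    try:
        o.run_pending()
    except RuntimeError:
        return True
    return False
```

In this model `crashes(False)` reproduces the failure and `crashes(True)` does not, which matches the reasoning for cancelling t_write before the immediate write.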
There were a couple of cli paths that NULL-checked in the
vtysh output path, but not in the json path.
Signed-off-by: Mark Stapp <mjs@labn.net>
(cherry picked from commit 864a3bc185)
Crash with empty `ip-protocol`:
```
anlan(config-pbr-map)# match ip-protocol
vtysh: error reading from pbrd: Resource temporarily unavailable (11)Warning: closing connection to pbrd because of an I/O error!
```
So, give a warning for an empty `ip-protocol`.
Signed-off-by: anlan_cs <vic.lan@pica8.com>
(cherry picked from commit 4e313ee450)
This reverts commit b1d29673ca.
This commit introduced a crash. When the VRF is deleted, the RIPNG
instance should not be freed, because the NB infrastructure still stores
the pointer to it. The instance should be deleted only when it's actually
deleted from the configuration.
To reproduce the crash:
```
frr# conf t
frr(config)# vrf vrf1
frr(config-vrf)# exit
frr(config)# router ripng vrf vrf1
frr(config-router)# exit
frr(config)# no vrf vrf1
frr(config)# no router ripng vrf vrf1
vtysh: error reading from ripngd: Resource temporarily unavailable (11)Warning: closing connection to ripngd because of an I/O error!
frr(config)#
```
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
(cherry picked from commit 9f6dade90e)
This reverts commit 3d1588d8ed.
This commit introduced a crash. When the VRF is deleted, the RIP instance
should not be freed, because the NB infrastructure still stores the
pointer to it. The instance should be deleted only when it's actually
deleted from the configuration.
To reproduce the crash:
```
frr# conf t
frr(config)# vrf vrf1
frr(config-vrf)# exit
frr(config)# router rip vrf vrf1
frr(config-router)# exit
frr(config)# no vrf vrf1
frr(config)# no router rip vrf vrf1
vtysh: error reading from ripd: Resource temporarily unavailable (11)Warning: closing connection to ripd because of an I/O error!
frr(config)#
```
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
(cherry picked from commit 054ca9b9ee)
* Bug Fixes
bfdd
Fix malformed session with vrf
Remove redundant nb destroy callbacks
bgpd
Ensure stream received has enough data
Fix bgpd core when unintern attr
Fix the json output of show bgp all json to be in a valid format
Make sure aigp attribute is non-transitive
Using no pretty json output for l2vpn-evpn routes
doc
Add `neighbor aigp` command for bgp
lib
Fix memory leak in link state
Fix vtysh core when handling questionmark
Link state memory corruption
ospfd
Fix interface param type update
Fix memory leaks w/ `show ip ospf int x json` commands
Ospf opaque lsa stale processing fix and topotests.
Respect loopback's cost that is set and set loopback costs to 0
pim6d
Fix crash in ipv6 pim command
pimd
Pim not sending register packets after changing from non dr to dr
tests
Adjust aigp metric numbers for ibgp setup
tools
Fix list value remove in frr-reload
vtysh
Give actual pam error messages
zebra
Evpn handle del event for dup detected mac
Fix dp_out_queued counter to actually reflect real life
Fix evpn dup detected local mac del event
Reduce creation and fix memory leak of frrscripting pointers
Unlock the route node when sending route notifications
Signed-off-by: Jafar Al-Gharaibeh <jafar@atcorp.com>
Interface link update events need
to be handled properly in the ospf interface
cache.
Example:
When a vrf (interface) is created, its default type
is set to BROADCAST because ifp->status
is not yet set to VRF.
A subsequent link event sets ifp->status to vrf, so the
ospf interface update needs to compare the current type
to the new default type, which would be VRF (OSPF_IFTYPE_LOOPBACK).
Since the ospf type param was created in the first add event,
the ifp vrf link event didn't update the ospf type param, which
led to the vrf being treated as a non-loopback interface.
Ticket:#3459451
Testing Done:
Running config is supposed to skip rendering the default
network broadcast for loopback/vrf types.
Before fix:
vrf vrf1
vni 4001
exit-vrf
!
interface vrf1
ip ospf network broadcast
exit
After fix: (interface vrf1 is not displayed).
vrf vrf1
vni 4001
exit-vrf
Signed-off-by: Chirag Shah <chirag@nvidia.com>
(cherry picked from commit 0d005b2d5c)
BGP_PREFIX_SID_SRV6_L3_SERVICE attributes must not
fully trust the length value specified in the nlri.
Always ensure that the amount of data we need to read
can be fulfilled.
Reported-by: Iggy Frankovic <iggyfran@amazon.com>
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
(cherry picked from commit 06431bfa75)
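The length check described above can be illustrated with a small parser. This is a sketch under stated assumptions, not the bgpd code: the layout (1-byte type, 2-byte big-endian length, then the value) and the name `read_sub_tlv` are hypothetical and only show the idea of verifying the remaining bytes before reading.

```python
import struct


def read_sub_tlv(buf, offset):
    """Return (type, value, next_offset), or None if the buffer is short.

    Hypothetical layout: 1-byte type, 2-byte big-endian length, value.
    """
    header_len = 3
    if len(buf) - offset < header_len:
        return None                      # not even a full header left
    tlv_type, length = struct.unpack_from("!BH", buf, offset)
    start = offset + header_len
    if len(buf) - start < length:
        return None                      # claimed length exceeds received data
    return tlv_type, bytes(buf[start:start + length]), start + length
```

A caller would treat `None` as a malformed attribute instead of reading past the end of the received data.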
Elements from the temporary list may at times
be removed more than once, which leads to a
ValueError in certain python3
versions.
commit-id 1543f58b5 did not handle the ValueError
properly. This caused a regression where
prefix-list config leads to a delete followed
by an add.
The new fix should just pass on the exception, as
value removal from list_to_add or list_to_del
is best effort.
This allows a prefix-list config that has no change
to have its lines removed from lines_to_del and
lines_to_add properly.
Ticket:#3490252
Testing:
Configure prefix-list in frr.conf and perform
multiple frr-reloads. After the first reload operation,
subsequent ones should not result in a delete followed
by an add of the prefix-list, but rather in a no-op.
(Pdb) lines_to_add
[(('ip prefix-list FOO permit 10.2.1.0/24',), None)]
(Pdb) lines_to_del
[(('ip prefix-list FOO seq 5 permit 10.2.1.0/24',), None),
(('ip prefix-list FOO seq 10 permit 10.2.1.0/24',), None)]
(Pdb) lines_to_del_to_del
[(('ip prefix-list FOO seq 5 permit 10.2.1.0/24',), None),
(('ip prefix-list FOO seq 10 permit 10.2.1.0/24',), None)]
(Pdb) lines_to_add_to_del
[(('ip prefix-list FOO permit 10.2.1.0/24',), None),
(('ip prefix-list FOO permit 10.2.1.0/24',), None)]
(Pdb) c
> /usr/lib/frr/frr-reload.py(1562)ignore_delete_re_add_lines()
-> return (lines_to_add, lines_to_del)
(Pdb) lines_to_add
[]
(Pdb) lines_to_del
[]
Signed-off-by: Chirag Shah <chirag@nvidia.com>
(cherry picked from commit 9845c09d61)
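The best-effort removal described above can be sketched as follows. This is an illustration, not the frr-reload.py code: `remove_best_effort` is a hypothetical helper, and the data is trimmed from the pdb trace in this message.

```python
def remove_best_effort(lines, to_remove):
    """Remove each entry of to_remove from lines, ignoring missing ones."""
    for entry in to_remove:
        try:
            lines.remove(entry)
        except ValueError:
            # Already removed by an earlier duplicate; removal is best effort.
            pass
    return lines


lines_to_add = [(("ip prefix-list FOO permit 10.2.1.0/24",), None)]
# The same entry appears twice in the temporary list, as in the trace.
lines_to_add_to_del = [
    (("ip prefix-list FOO permit 10.2.1.0/24",), None),
    (("ip prefix-list FOO permit 10.2.1.0/24",), None),
]
remove_best_effort(lines_to_add, lines_to_add_to_del)
```

The second `remove()` raises ValueError because the entry is already gone; passing on it leaves `lines_to_add` empty, which is the no-op result wanted here.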
When using a context to send route notifications to upper
level protocols, the code was using a locking function to
get the route node. There is no need for this to stay locked,
so FRR should release the lock.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
(cherry picked from commit 82c6e4fea5)
When issuing a vtysh command with ?, the initial buf size for the
element is 16. Then it would loop through each element in the cmd
output vector. If the required size for printing out the next
element is larger than the current buf size, realloc the buf memory
by doubling the current buf size regardless of the actual size
that's needed. This would cause a vtysh core when the doubled size
is not enough for the next element.
Signed-off-by: Yuan Yuan <yyuanam@amazon.com>
(cherry picked from commit f8aa257997)
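The sizing problem can be shown with plain arithmetic. This is only a sketch: `grow_once` models the behaviour described above, and `grow_until_fits` is one possible safer policy, not necessarily the one used in the actual fix. Both names are hypothetical.

```python
def grow_once(cur_size, needed):
    """Policy described above: double a single time, whatever is needed."""
    return cur_size * 2 if needed > cur_size else cur_size


def grow_until_fits(cur_size, needed):
    """One possible safer policy: keep doubling until the element fits."""
    while cur_size < needed:
        cur_size *= 2
    return cur_size
```

With the initial size of 16 and an element that needs 40 bytes, `grow_once(16, 40)` yields 32, which is still too small, while `grow_until_fits(16, 40)` yields 64.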
Problem:
Executing the commands below causes pim6d to core.
interface ens193
ip address 69.0.0.2/24
ipv6 address 8000::1/120
ipv6 mld
ipv6 pim
We see crash only if the interface is not configured, and
we are executing PIM/MLD commands.
RootCause:
Interface ens193 is not configured. So, it will have
ifindex = 0 and mroute_vif_index = -1.
Currently, we don't enable MLD on an interface if
mroute_vif_index < 0. So, pim_ifp->MLD = NULL.
In the API pim_if_membership_refresh(), we are accessing
the NULL pim_ifp->MLD pointer, which leads to a crash.
Fix:
Added NULL check before accessing pim_ifp->MLD pointer in
the API pim_if_membership_refresh().
Issue: #13385
Signed-off-by: Sarita Patra <saritap@vmware.com>
(cherry picked from commit 6d1d2c27a3)
When the remote peer is neither EBGP nor confed, aspath is the
shadow copy of attr->aspath in bgp_packet_attribute(). Stripping
AS4_PATH should not be done on the aspath directly, since
that would lead to a bgpd core dump when uninterning the attr.
Signed-off-by: Yuan Yuan <yyuanam@amazon.com>
(cherry picked from commit 32af4995aa)
Code was written where the pam error message put out
was the result from a previous call to the pam modules
instead of the current call to the pam module.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
(cherry picked from commit 8495b425bd)
The output of show bgp all json is inconsistent across Address-families
i.e. ipv4/ipv6 are in a no pretty format while l2vpn-evpn is in a pretty
format. For huge scale (lots of routes with lots of paths), it is better
to use no_pretty format.
Before fix:
torm-11# sh bgp all json
{
"ipv4Unicast":{
"vrfId": 0,
"vrfName": "default",
"tableVersion": 1,
"routerId": "27.0.0.15",
"defaultLocPrf": 100,
"localAS": 65000,
"routes": { } }
,
"l2VpnEvpn":{
"routes":{
"27.0.0.15:2":{
"rd":"27.0.0.15:2",
"[1]:[0]:[03:44:38:39:ff:ff:01:00:00:01]:[128]:[::]:[0]":{
"prefix":"[1]:[0]:[03:44:38:39:ff:ff:01:00:00:01]:[128]:[::]:[0]",
"prefixLen":352,
"paths":[
<SNIP>.............
After fix:
torm-11# sh bgp all json
{
"ipv4Unicast":{
"vrfId": 0,
"vrfName": "default",
"tableVersion": 1,
"routerId": "27.0.0.15",
"defaultLocPrf": 100,
"localAS": 65000,
"routes": { } }
,
"l2VpnEvpn":{
"routes":{"27.0.0.15:2":{"rd":"27.0.0.15:2","[1]:[0]:[03:44:38:39:ff:ff:01:00:00:01]:[128]:[::]:[0]":{"prefix":"[1]:[0]:[03:44:38:39:ff:ff:01:00:00:01]:[128]:[::]:[0]","prefixLen":352,"paths":[[{"valid":true,"bestpath":true,"selectionReason":"First path received","pathFrom":"external","routeType":1,"weight":32768,"peerId":"(unspec)","path":"","origin":"IGP","extendedCommunity"
<SNIP>.............
Issue: 3472865
Ticket:#3472865
Signed-off-by: Rajasekar Raja <rajasekarr@nvidia.com>
(cherry picked from commit 82465ca7f9)
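The size difference between the two formats can be seen with any JSON library. The sketch below uses Python's standard `json` module purely as an illustration; bgpd itself does not use it, and the `routes` dictionary is a trimmed, made-up sample shaped like the output above.

```python
import json

# Trimmed sample shaped like one l2vpn-evpn route entry.
routes = {
    "27.0.0.15:2": {
        "rd": "27.0.0.15:2",
        "prefixLen": 352,
        "paths": [{"valid": True, "bestpath": True}],
    }
}

pretty = json.dumps(routes, indent=2)                   # one key per line
no_pretty = json.dumps(routes, separators=(",", ":"))   # single compact line
```

Both strings decode to the same data; the compact form simply drops the newlines and indentation, which adds up at scale.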
In the json output of show bgp all json, the l2VpnEvpn afi-safi is
missing the 'routes' key making the json output format invalid.
Before Fix:
torm-11# sh bgp all json
{
<SNIP>....................
"l2VpnEvpn":{
{
"27.0.0.15:2":{
"rd":"27.0.0.15:2",
"[4]:[03:44:38:39:ff:ff:01:00:00:01]:[32]:[27.0.0.15]":{
"prefix":"[4]:[03:44:38:39:ff:ff:01:00:00:01]:[32]:[27.0.0.15]",
"prefixLen":352,
"paths":[
<SNIP>....................
After Fix:
torm-11# sh bgp all json
{
<SNIP>....................
"l2VpnEvpn":{
"routes":{
"27.0.0.15:2":{
"rd":"27.0.0.15:2",
"[1]:[0]:[03:44:38:39:ff:ff:01:00:00:01]:[128]:[::]:[0]":{
"prefix":"[1]:[0]:[03:44:38:39:ff:ff:01:00:00:01]:[128]:[::]:[0]",
"prefixLen":352,
"paths":[
Issue: 3472865
Ticket:#3472865
Signed-off-by: Rajasekar Raja <rajasekarr@nvidia.com>
(cherry picked from commit be66fa05c9)
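Why the missing key makes the output invalid can be checked with a strict parser. This sketch uses Python's standard `json` module on two shortened strings modeled on the before and after output; `is_valid_json` is a hypothetical helper.

```python
import json

# An object opened directly inside another object, with no key in front of it.
broken = '{"l2VpnEvpn":{ {"27.0.0.15:2":{"rd":"27.0.0.15:2"}} }}'
# Same data with the 'routes' key present.
fixed = '{"l2VpnEvpn":{"routes":{"27.0.0.15:2":{"rd":"27.0.0.15:2"}}}}'


def is_valid_json(text):
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True
```

The parser rejects `broken` because an object member must start with a string key, and accepts `fixed`.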
With this configuration:
```
bfd
peer 33:33::66 local-address 33:33::88 vrf vrf8 interface enp1s0
exit
!
exit
```
The bfd session can't be established with error:
```
bfdd[18663]: [YA0Q5-C0BPV] control-packet: wrong vrfid. [mhop:no peer:33:33::66 local:33:33::88 port:2 vrf:61]
```
The vrf check should use the carefully adjusted `vrfid`, which is
based on the global/reliable interface. We can't trust the
`bvrf->vrf->vrf_id` because `/proc/sys/net/ipv4/udp_l3mdev_accept`
may be set to "1" in the VRF-lite backend, despite the security drawback.
Just correct the vrf check.
Signed-off-by: anlan_cs <vic.lan@pica8.com>
(cherry picked from commit b17c179664)
The prov->dp_out_queued counter was never being decremented
when a ctx was pulled off of the list. Let's change it to
accurately reflect real life.
Broken:
janelle.pinkbelly.org# show zebra dplane providers detailed
Zebra dataplane providers:
Kernel (1): in: 330872, q: 0, q_max: 100, out: 330872, q: 330872, q_max: 330872
janelle.pinkbelly.org#
Fixed:
sharpd@janelle:/tmp/topotests$ vtysh -c "show zebra dplane providers detailed"
Zebra dataplane providers:
Kernel (1): in: 221495, q: 0, q_max: 100, out: 221495, q: 0, q_max: 100
sharpd@janelle:/tmp/topotests$
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
(cherry picked from commit 995d810d08)
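The counter bookkeeping can be sketched with a toy provider. This is not the zebra dplane code: `ToyProvider` and its fields are hypothetical names that only mirror the idea of `dp_out_queued` tracking the current depth of the outbound list.

```python
from collections import deque


class ToyProvider:
    """Toy provider with an outbound queue and its counters."""

    def __init__(self):
        self.out_q = deque()
        self.out_queued = 0   # current depth of the outbound queue
        self.out_max = 0      # high-water mark of that depth

    def enqueue_out(self, ctx):
        self.out_q.append(ctx)
        self.out_queued += 1
        self.out_max = max(self.out_max, self.out_queued)

    def dequeue_out(self):
        if not self.out_q:
            return None
        ctx = self.out_q.popleft()
        self.out_queued -= 1  # the decrement that was missing
        return ctx
```

Without the decrement, the counter and its maximum only ever grow, which matches the "q: 330872, q_max: 330872" seen in the broken output.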
1. Fix OSPF opaque LSA processing to preserve the stale opaque
LSAs in the Link State Database for 60 seconds consistent with
what is done for other LSA types.
2. Add a topotest that tests for cases where ospfd is restarted
and a stale OSPF opaque LSA exists in the OSPF routing domain
both when the LSA is purged and when the LSA is reoriginated
with a more recent instance.
Signed-off-by: Acee <aceelindem@gmail.com>
(cherry picked from commit 4e7eb1e62c)
When setting a loopback's cost, set the value to 0, unless the operator
has assigned a value for the loopback's cost.
RFC states:
If the state of the interface is Loopback, add a Type 3
link (stub network) as long as this is not an interface
to an unnumbered point-to-point network. The Link ID
should be set to the IP interface address, the Link Data
set to the mask 0xffffffff (indicating a host route),
and the cost set to 0.
FRR is going to allow this to be overridden if the operator specifically
sets a value too.
Fixes: #13472
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
(cherry picked from commit dd2bc4fb40)
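The rule above fits in a few lines. This is a sketch of the decision only, not the ospfd code; `loopback_output_cost` and its `configured_cost` parameter are hypothetical names.

```python
def loopback_output_cost(configured_cost=None):
    """Cost to advertise for a loopback.

    configured_cost is None when the operator has not set a value.
    """
    if configured_cost is not None:
        return configured_cost   # an explicit operator setting wins
    return 0                     # default for a loopback, as quoted above
```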
There are two issues being addressed:
a) The ZEBRA_ON_RIB_PROCESS_HOOK_CALL script point
was creating a fs pointer per dplane ctx in
rib_process_dplane_results().
b) The fs pointer was not being deleted and directly
leaked.
For (a) Move the creation of the fs to outside
the do while loop.
For (b) At function end ensure that the pointer
is actually deleted.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
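The shape of the change for (a) and (b) can be sketched as follows. This is an illustration, not the zebra code: `ToyScript` stands in for the frrscript (fs) pointer, `process_results` for rib_process_dplane_results(), and both names are hypothetical.

```python
class ToyScript:
    """Toy resource that counts allocations so leaks are visible."""

    live = 0       # scripts currently allocated
    created = 0    # total allocations

    def __init__(self):
        ToyScript.live += 1
        ToyScript.created += 1

    def call(self, ctx):
        return ctx

    def delete(self):
        ToyScript.live -= 1


def process_results(ctx_list):
    fs = ToyScript()            # (a) created once, outside the loop
    try:
        for ctx in ctx_list:
            fs.call(ctx)
    finally:
        fs.delete()             # (b) always released at function end
```

After `process_results([1, 2, 3])`, one script has been created for three contexts and none remain allocated.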
In function ls_find_subnet(), prefix argument is directly copied into
subnet.key structure to find corresponding subnet in RB Tree. This could lead
to a memory corruption. Function prefix_copy() must be used instead.
This patch replaces the direct prefix copy by a call to prefix_copy() function
to avoid this memory issue.
Signed-off-by: Olivier Dugeon <olivier.dugeon@orange.com>
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
When using the ls_stream2ted() function to parse an Opaque Link State message
into the local TED, in case of vertex or subnet deletion, the function returns
a pointer to the deleted ls_element instead of NULL. This could lead to
pointer corruption when the caller tries to access the deleted ls_element.
This patch ensures that the ls_element pointer returned by ls_stream2ted()
is NULL when the message event is a delete operation for a vertex or a
subnet. Note that edge deletion was already handled correctly.
Signed-off-by: Olivier Dugeon <olivier.dugeon@orange.com>
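The return contract described above can be sketched with a dictionary standing in for the TED. This is not the link state library code: `apply_message`, the event strings and the dict-based store are all hypothetical.

```python
def apply_message(ted, event, key, value=None):
    """Apply one message to the toy TED and return the affected element.

    Returns None for a delete, so the caller never holds a reference
    to an element that is no longer in the database.
    """
    if event == "delete":
        ted.pop(key, None)
        return None
    ted[key] = value
    return ted[key]
```

A caller that checks the return value for None then cannot touch a deleted element by accident.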
FRR has a memory leak in the case when int X does not
exist and a memory leak when int X does exist. Fix
these.
Fixes: #13434
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
(cherry picked from commit 74e21732db)