When pimd has this setup:
src ----- rtr ------ receiver
           |
           rp
and the receiver sends a *,G join to rtr, rtr can wait up to one
join/prune interval after the src starts sending S,G traffic before
it sends an S,G RPT prune. During this interval the pimreg device
stays in the S,G OIL because the RP does not know to prune this
leg off.
Before:
Timestamp: Thu Mar 31 10:15:18 2022 288767 usec
[MROUTE](10.103.0.5,239.0.0.4) Iif: rtr-lan_src Oifs: rtr-lan State: resolved Table: default
Timestamp: Thu Mar 31 10:15:18 2022 288777 usec
[MROUTE](10.103.0.5,239.0.0.4) Iif: rtr-lan_src Oifs: rtr-lan rtr-lan-1 State: resolved Table: default
Timestamp: Thu Mar 31 10:15:18 2022 288789 usec
[MROUTE](10.103.0.5,239.0.0.4) Iif: rtr-lan_src Oifs: pimreg rtr-lan rtr-lan-1 State: resolved Table: default
Timestamp: Thu Mar 31 10:15:49 2022 324995 usec
[MROUTE](10.103.0.5,239.0.0.4) Iif: rtr-lan_src Oifs: rtr-lan rtr-lan-1 State: resolved Table: default
<31 seconds>
After:
Timestamp: Thu Mar 31 12:56:15 2022 501921 usec
(10.103.0.5,239.0.0.27) Iif: rtr-lan_src Oifs: pimreg rtr-lan State: resolved Table: default
Timestamp: Thu Mar 31 12:56:15 2022 501930 usec
(10.103.0.5,239.0.0.27) Iif: rtr-lan_src Oifs: pimreg rtr-lan rtr-lan-1 State: resolved Table: default
Timestamp: Thu Mar 31 12:56:15 2022 502181 usec
(10.103.0.5,239.0.0.27) Iif: rtr-lan_src Oifs: rtr-lan rtr-lan-1 State: resolved Table: default
<sub second>
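The two windows can be checked directly from the timestamps above. This is a minimal Python sketch, assuming the timestamp layout shown in the logs and English day and month names; `parse_ts` is a hypothetical helper, not part of FRR.

```python
from datetime import datetime, timedelta


def parse_ts(text):
    """Parse 'Thu Mar 31 10:15:18 2022 288789 usec' into a datetime."""
    *stamp, usec, unit = text.split()
    if unit != "usec":
        raise ValueError("unexpected unit: " + unit)
    base = datetime.strptime(" ".join(stamp), "%a %b %d %H:%M:%S %Y")
    return base + timedelta(microseconds=int(usec))


# Before the fix: pimreg joins the OIL, then is finally pruned.
before = parse_ts("Thu Mar 31 10:15:49 2022 324995 usec") - parse_ts(
    "Thu Mar 31 10:15:18 2022 288789 usec"
)
# After the fix: first update (pimreg present) to last update (pimreg gone).
after = parse_ts("Thu Mar 31 12:56:15 2022 502181 usec") - parse_ts(
    "Thu Mar 31 12:56:15 2022 501921 usec"
)

print(before.total_seconds(), "seconds before the fix")
print(after.total_seconds(), "seconds after the fix")
```

The first difference is a little over 31 seconds and the second is 260 microseconds, matching the `<31 seconds>` and `<sub second>` notes.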
What is actually happening:
rtr receives a *,G IGMP join and sends a *,G join towards the RP.
rtr receives an S,G packet <WRVIFWHOLE>,
creates the S,G upstream, and sends the register packet to the RP.
The RP sees that it still has downstream interest, so it forwards the packet on.
After up to 60 seconds, rtr sends the normally scheduled join for
the G and sends the S,G RPT prune as part of it.
What is being done to fix it:
In WRVIFWHOLE handling, when pimd detects that this is the FHR
and is not the RP *and* the incoming interface for the *,G
is different from the incoming interface for the S,G, immediately
send a single join packet for the G (which will have the S,G RPT
prune in it). Only do this the first time the WRVIFWHOLE is
received.
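The condition described above can be sketched as a single predicate. This is an illustrative Python model only, not the pimd C code; the function name, its parameters, and the interface name `rtr-rp` are hypothetical.

```python
def should_send_immediate_join(is_fhr, is_rp, star_g_iif, sg_iif, first_wrvifwhole):
    """Return True when a G join carrying the S,G RPT prune should go out now."""
    return (
        first_wrvifwhole          # only on the first WRVIFWHOLE for this S,G
        and is_fhr                # this router is the first-hop router
        and not is_rp             # and it is not the RP itself
        and star_g_iif != sg_iif  # *,G and S,G come in on different interfaces
    )


# rtr from the diagram: FHR, not the RP, *,G via the RP-facing link,
# S,G via the source-facing link, first WRVIFWHOLE seen.
print(should_send_immediate_join(True, False, "rtr-rp", "rtr-lan_src", True))
```

When any one of the conditions fails, the router falls back to the normally scheduled join.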
Ticket: #2755650
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
a) Remove the retry mechanism to continue looking for 75%
of the time for pim code.
This alone saves a lot of time in tests that use lib/pim.py.
Effectively all the times given for retry are already long
enough. Additionally, some tests gather data with
the expectation that they will not find any, so the entire
time is taken up in retries. Extending the retry
mechanism makes this even worse. This is especially bad
for pim because keepalive timers are counting down and
state can be removed due to excessive time spent waiting.
b) Reduce verify_multicast_traffic from 40 seconds
to 20 seconds to gather traffic data.
Many tests do the following:
  1) gather pre-test traffic data (taking about 70
     seconds to run, because much of that time is spent looking
     for data that does not exist yet)
  2) run a change to introduce a different traffic flow
  3) gather post-test traffic data (taking about 70
     seconds to run)
Why does this matter? Tests were iterating through
all the different routers looking for traffic flow
as well as different mroute state. All of this races against
the keepalive timer of 210 seconds. It does not take
long before the stream can be removed while the test is
still looking for data that is no longer there due
to state timeout.
The multicast_pim_sm_topo3/test_multicast_pim_sm_topo3.py
test run time dropped from 398 seconds to 297 seconds,
greatly reducing keepalive timeout problems.
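The numbers above can be put side by side with a little arithmetic. This is only a rough sketch: it assumes nothing refreshes the stream while the test is gathering data, and the variable names are made up for illustration.

```python
KEEPALIVE_S = 210   # keepalive timer quoted above
GATHER_S = 70       # approximate cost of one traffic-gathering pass

# How many ~70 second passes fit before the keepalive timer runs out.
passes_before_timeout = KEEPALIVE_S // GATHER_S

# Run-time saving for multicast_pim_sm_topo3.
saved_s = 398 - 297
saved_pct = round(100 * saved_s / 398, 1)

print(passes_before_timeout, saved_s, saved_pct)  # prints: 3 101 25.4
```

Under that assumption, three slow gathering passes are enough to use up the whole keepalive budget.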
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Since there are two kinds of ESI (Type-0 and Type-3), the warnings
should distinguish between the two cases.
Signed-off-by: anlan_cs <vic.lan@pica8.com>
"no ead-es-route-target export RT":
Since existence is already checked in `bgp_evpn_ead_es_rt_cmd`
with `bgp_evpn_rt_matches_existing()`, the node to delete MUST
be present in evpn's `bgp_mh_info->ead_es_export_rtl` list.
So change the check for the node to delete into an `assert`.
Signed-off-by: anlan_cs <vic.lan@pica8.com>
The commands:
router isis 1
mpls-te on
no mpls-te on
mpls-te on
no mpls-te on
!
will crash isisd.
Valgrind gives us this:
==652336== Invalid read of size 8
==652336== at 0x49AB25C: typed_rb_min (typerb.c:495)
==652336== by 0x4943B54: vertices_const_first (link_state.h:424)
==652336== by 0x493DCE4: vertices_first (link_state.h:424)
==652336== by 0x493DADC: ls_ted_del_all (link_state.c:1010)
==652336== by 0x47E77B: isis_instance_mpls_te_destroy (isis_nb_config.c:1871)
==652336== by 0x495BE20: nb_callback_destroy (northbound.c:1131)
==652336== by 0x495B5AC: nb_callback_configuration (northbound.c:1356)
==652336== by 0x4958127: nb_transaction_process (northbound.c:1473)
==652336== by 0x4958275: nb_candidate_commit_apply (northbound.c:906)
==652336== by 0x49585B8: nb_candidate_commit (northbound.c:938)
==652336== by 0x495CE4A: nb_cli_classic_commit (northbound_cli.c:64)
==652336== by 0x495D6C5: nb_cli_apply_changes_internal (northbound_cli.c:250)
==652336== Address 0x6f928e0 is 272 bytes inside a block of size 320 free'd
==652336== at 0x48399AB: free (vg_replace_malloc.c:538)
==652336== by 0x494BA30: qfree (memory.c:141)
==652336== by 0x493D99D: ls_ted_del (link_state.c:997)
==652336== by 0x493DC20: ls_ted_del_all (link_state.c:1018)
==652336== by 0x47E77B: isis_instance_mpls_te_destroy (isis_nb_config.c:1871)
==652336== by 0x495BE20: nb_callback_destroy (northbound.c:1131)
==652336== by 0x495B5AC: nb_callback_configuration (northbound.c:1356)
==652336== by 0x4958127: nb_transaction_process (northbound.c:1473)
==652336== by 0x4958275: nb_candidate_commit_apply (northbound.c:906)
==652336== by 0x49585B8: nb_candidate_commit (northbound.c:938)
==652336== by 0x495CE4A: nb_cli_classic_commit (northbound_cli.c:64)
==652336== by 0x495D6C5: nb_cli_apply_changes_internal (northbound_cli.c:250)
==652336== Block was alloc'd at
==652336== at 0x483AB65: calloc (vg_replace_malloc.c:760)
==652336== by 0x494B6F8: qcalloc (memory.c:116)
==652336== by 0x493D7D2: ls_ted_new (link_state.c:967)
==652336== by 0x47E4DD: isis_instance_mpls_te_create (isis_nb_config.c:1832)
==652336== by 0x495BB29: nb_callback_create (northbound.c:1034)
==652336== by 0x495B547: nb_callback_configuration (northbound.c:1348)
==652336== by 0x4958127: nb_transaction_process (northbound.c:1473)
==652336== by 0x4958275: nb_candidate_commit_apply (northbound.c:906)
==652336== by 0x49585B8: nb_candidate_commit (northbound.c:938)
==652336== by 0x495CE4A: nb_cli_classic_commit (northbound_cli.c:64)
==652336== by 0x495D6C5: nb_cli_apply_changes_internal (northbound_cli.c:250)
==652336== by 0x495D23E: nb_cli_apply_changes (northbound_cli.c:268)
Let's null out the pointer. After this change, Valgrind no longer reports issues
and isisd no longer crashes.
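The effect of nulling out the pointer can be modelled in a few lines. This is a Python analogy, not the isisd C code: the `Ted` and `MplsTe` classes are hypothetical, and the model assumes that create only allocates a new database when the pointer is unset, which is an inference from the trace rather than something stated here.

```python
class Ted:
    """Hypothetical stand-in for the TE database."""

    def __init__(self):
        self.freed = False

    def delete_all(self):
        if self.freed:
            raise RuntimeError("use of freed TED")  # models the invalid read
        self.freed = True


class MplsTe:
    """Hypothetical stand-in for the per-instance MPLS-TE state."""

    def __init__(self, null_out):
        self.ted = None
        self.null_out = null_out

    def create(self):
        if self.ted is None:  # assumption: allocate only when unset
            self.ted = Ted()

    def destroy(self):
        if self.ted is not None:
            self.ted.delete_all()
        if self.null_out:
            self.ted = None  # the fix: drop the stale reference


def toggle_twice(null_out):
    """Model 'mpls-te on' / 'no mpls-te on' issued twice."""
    te = MplsTe(null_out)
    try:
        for _ in range(2):
            te.create()
            te.destroy()
    except RuntimeError:
        return "crash"
    return "ok"


print(toggle_twice(null_out=False))
print(toggle_twice(null_out=True))
```

Without the reset, the second destroy touches the already-freed object; with it, the second create starts from a clean state.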
Fixes: #10939
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
It's confusing for a user to see 'Tx RA failed' in the logs when
they've enabled RAs (either through interface config or BGP unnumbered)
on an interface that can't send them. Let's avoid sending RAs on
interfaces that are bridge_slaves or don't have a link-local address,
since these are two of the most common reasons for RA Tx failures.
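The check described above amounts to a small eligibility test. This is a hedged Python sketch, not the zebra implementation; `can_send_ra` and its two boolean parameters are hypothetical names.

```python
def can_send_ra(is_bridge_slave, has_link_local):
    """Return True only for interfaces that are able to send RAs."""
    # Skip bridge slaves and interfaces without a link-local address.
    return not is_bridge_slave and has_link_local


print(can_send_ra(is_bridge_slave=False, has_link_local=True))
print(can_send_ra(is_bridge_slave=True, has_link_local=True))
```

Interfaces that fail the test are skipped instead of producing a 'Tx RA failed' log line.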
Signed-off-by: Trey Aspelund <taspelund@nvidia.com>
Without `-g` in LDFLAGS we won't get debug info even if it's enabled in
CFLAGS. Since we're controlling debug info through CFLAGS, there's no
harm in always having `-g` in LDFLAGS.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
- Create an frr docker container based on the new Red Hat Universal Base
Images.
- This builds a docker container based on ubi-8.
- The devel packages need to come from the centos-8 stream repos.
- Centos-8 stream repos added: base, appstream, powertools and epel
Signed-off-by: Javier Garcia <javier.martin.garcia@ibm.com>
When using bfd on a single level, a null list pointer may be
accessed. Guard against using it.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
The vrf bound to an `l2vni/zevpn` wrongly depends on the order
in which the vxlan interface and the svi interface are set.
If the vxlan interface is set with its vlanid first and the svi interface
is then set with its vrf, the vxlan interface correctly inherits `vrf`
from the svi. But if the sequence is reversed (i.e. svi first, then vxlan),
the vxlan interface does not get the correct `vrf`, because the handling of
`ZEBRA_VXLIF_VLAN_CHANGE` mistakenly skips inheriting `vrf`.
```
host# do show evpn vni 101
VNI: 101
Type: L2
Tenant VRF: vrf1
```
So update `vrf` ("Tenant VRF") of l2vni in `zebra_vxlan_if_update()`.
Signed-off-by: anlan_cs <vic.lan@pica8.com>