SPT switchover handling is done by adding pimreg in the OIL of the (*, G)
entry on the LHR. This causes multicast data with that destination to be
sent to pimd as IGMPMSG_WHOLEPKT. These packets trigger creation of (S,G)
and also register forwarding. However register forwarding must only be done
if the router is also a FHR. That FHR check was missing causing strange
source registrations from multicast routers that were not directly
connected to the source.
Relevant logs from LHR -
PIM: pim_mroute_msg: pim kernel upcall WHOLEPKT type=3 ip_p=0 from fd=9 for (S,G)=(6.0.0.30,239.1.1.111) on pimreg vifi=0 size=98
PIM: Sending (6.0.0.30,239.1.1.111) Register Packet to 81.0.0.5
PIM: pim_register_send: Sending (6.0.0.30,239.1.1.111) Register Packet to 81.0.0.5 on swp2
And 6.0.0.30 is clearly not directly connected on that router -
root@tor-11:~# ip route |grep 6.0.0.30 -A2
6.0.0.30 proto ospf metric 20
nexthop via 6.0.0.22 dev swp1 weight 1 onlink
nexthop via 6.0.0.23 dev swp2 weight 1 onlink
root@tor-11:~#
Ticket: CM-24549
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
It was causing a Join on (S,G) who's prune state was being cleared. This
was an inactive (KAT not running; no immediate OIL) entry that was being
flushed out but because of this incorrect Join (that was being done with
out join-state checks) the source was getting populated repeatedy i.e.
never aged.
Output of "ip monitor mroute"
=============================
(27.0.0.11,239.1.1.102) Iif: lo State: resolved Table: default
Deleted (27.0.0.11,239.1.1.102) Iif: lo State: resolved Table: default
(27.0.0.11,239.1.1.102) Iif: pimreg State: resolved Table: default
(27.0.0.11,239.1.1.102) Iif: uplink-1 State: resolved Table: default
(27.0.0.11,239.1.1.102) Iif: uplink-1 State: resolved Table: default
(27.0.0.11,239.1.1.102) Iif: uplink-1 State: resolved Table: default
(27.0.0.11,239.1.1.102) Iif: lo Oifs: uplink-1 State: resolved Table: default
(27.0.0.11,239.1.1.104) Iif: lo Oifs: pimreg uplink-1 State: resolved Table: default
(27.0.0.11,239.1.1.102) Iif: lo Oifs: pimreg uplink-1 State: resolved Table: default
Deleted (27.0.0.11,239.1.1.102) Iif: lo State: resolved Table: default
(27.0.0.11,239.1.1.102) Iif: pimreg State: resolved Table: default
(27.0.0.11,239.1.1.102) Iif: uplink-1 State: resolved Table: default
(27.0.0.11,239.1.1.102) Iif: uplink-1 State: resolved Table: default
(27.0.0.11,239.1.1.102) Iif: uplink-1 State: resolved Table: default
(27.0.0.11,239.1.1.102) Iif: lo Oifs: uplink-1 State: resolved Table: default
These mroute events (on a no longer existing multicast souce) continue in
a never ending loop.
Triggered joins/prunes MUST only done via state machine transitions i.e.
via pim_upstream_update_join_desired.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
JD macro is defined by the RFC as -
bool JoinDesired(S,G) {
return (immediate_olist(S,G) != NULL
OR (KeepaliveTimer(S,G) is running
AND inherited_olist(S,G) != NULL))
}
However for MSDP synced SA the KAT will not be running so an exception is
needed. Earlier I had done this by relaxing KAT_run requirements entirely
on the RP. However as that prevents the source from being aged out in some
cases I have made the check more narrow i.e. has to an MSDP peer added
entry.
Ticket: CM-24398
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Added event logs around add/del of upstream entries into the nbr's
jp-agg list. This is to help debug a problem with stale (deleted)
upstream entries being present in the list causing pimd to crash on
the periodic processing.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Today we are only pruning the SPT when (S,G) upstream entry
switches from Joined toNotJoined. This leaves the source still
pruned along the RPT till the next periodic XG join-prune is sent
to the RPF(RP). Traffic from the source will be blackholed for this
duration. To prevent that we need send a new JP message
to RPF(RP) immediately.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
It is now used to evaluate and display join-desired state for
each upstream entry -
root@spine-1:~# net show pim upstream-join-desired
Source Group EvalJD
* 239.1.1.111 yes
6.0.0.28 239.1.1.111 yes
6.0.0.29 239.1.1.111 no
6.0.0.30 239.1.1.111 yes
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
This re-naming was needed because the JD state on an upstream is
not just based on channel info i.e. we can have JD=true even if there
is no downstream channel. The "show ip upstream-join-desired" command
will be changed to display that info i.e. upstream's JD state instead
of downstream channel params. The downstream channel params are now
available via "show ip pim channel"
PS: This change maybe reverted if upstream NAKs it. But there is a
pressing need for it to debug some not-so-reproduible problems.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
This was causing pimd to crash later; call-stack -
(gdb) bt
context=<optimized out>) at lib/sigevent.c:254
group=group@entry=0x7ffffa9797e0) at pimd/pim_rp.c:207
grp=grp@entry=0x7ffffa9799fe, sgs=sgs@entry=0x560ac069edb0, size=52)
at pimd/pim_msg.c:200
groups=<optimized out>) at pimd/pim_join.c:562
at pimd/pim_neighbor.c:288
at lib/thread.c:1599
at lib/libfrr.c:1024
envp=<optimized out>) at pimd/pim_main.c:162
(gdb) fr 4
group=group@entry=0x7ffffa9797e0) at pimd/pim_rp.c:207
207 pimd/pim_rp.c: No such file or directory.
(gdb) fr 6
grp=grp@entry=0x7ffffa9799fe, sgs=sgs@entry=0x560ac069edb0, size=52)
at pimd/pim_msg.c:200
200 pimd/pim_msg.c: No such file or directory.
(gdb) p source->up->sg_str
$1 = '\000' <repeats 31 times>, <incomplete sequence \361>
(gdb)
This problem can manifest in the following event sequence -
1. upstream RPF neighbor is resolved
2. upstream RPF neighbor becomes unresolved (but upstream entry
stays on the jp-agg list)
3. upstream entry is removed
on the next old-neighbor jp-agg-list processing the stale entry is
accessed resulting in the crash.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Dumps while in problem state -
============================
[from "show ip pim state"]
Active Source Group RPT IIF OIL
1 6.0.0.31 239.1.1.111 n swp1 swp4( J * )
[from "show ip pim join"]
Interface Address Source Group State Uptime Expire Prune
swp3 6.0.0.22 6.0.0.31 239.1.1.111 JOIN --:--:-- 03:11 --:--
You can see from the dumps that the pim downstream router has joined on
swp3 but that OIF has not been added to the OIL with flag
PIM_OIF_FLAG_PROTO_PIM. This is because the join was rxed while the
ifchannel was in a prune-pending state.
Relevant logs -
===============
[
PIM: recv_prune: prune (S,G)=(6.0.0.31,239.1.1.111) rpt=1 wc=0 upstream=6.0.0.22 holdtime=210 from 6.0.0.28 on swp3
PIM: pim_upstream_ref(pim_ifchannel_add): upstream (6.0.0.31,239.1.1.111) ref count 3 increment
PIM: pim_upstream_add(pim_ifchannel_add): (6.0.0.31,239.1.1.111), iif 6.0.0.26/0 (swp1) found: 1: ref_count: 3
PIM: pim_ifchannel_add: ifchannel (6.0.0.31,239.1.1.111) is created
PIM: pim_joinprune_recv: SGRpt flag is set, del inherit oif from up (6.0.0.31,239.1.1.111)
PIM: pim_mroute_add(pim_channel_del_oif), vrf default Added Route: (6.0.0.31,239.1.1.111) IIF: swp1, OIFS: swp4
PIM: pim_channel_del_oif(pim_joinprune_recv): (S,G)=(6.0.0.31,239.1.1.111): proto_mask=4 IIF:1 OIF=swp3 vif_index=3
PIM: recv_join: join (S,G)=(6.0.0.31,239.1.1.111) rpt=0 wc=0 upstream=6.0.0.22 holdtime=210 from 6.0.0.28 on swp3
PIM: PIM_IFCHANNEL(swp3): (6.0.0.31,239.1.1.111) is switching from SGRpt(PP) to JOIN
PIM: Sending Request for New Channel Oil Information(6.0.0.31,239.1.1.111) VIIF 1(default)
]
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
This is needed for two reasons -
1. The inherited OIL needs to be setup independent of the RPF interface
to allow correct computation of the JoinDesired macro.
2. The RPF interface is computed at the time of MFC programming so
it is not possible to permanently evict the OIF at that time oif_add
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
When a inherited OIL becomes empty join-desired can go to false. So
we need to re-run join-desired evaluation on any inherited OIL changes.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
The macro was always returning non-empty because of comparing an
array of u8_t with an array of u32_t.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
If an dummy upstream entry (no RPF nbr) which is already in a JOINED
state is resolved we were not triggering an immediate join via the
per-interface upstream switch list.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
A dummy pim upstream entry can be in a JOINED state before its RPF nbr is
added. Handle that case by triggering an immediate join.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Deviations -
1. Avoid using SPTbit setting. Replace that with Use_Spt macro.
2. If S is supposed to be forwarded along the RPT but has an empty OIL
prune it.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
1. KAT should be re-started only if traffic rxed along the SPT i.e.
IIF == RPF_Interface(S).
Only exception to the rule is if you are LHR.
2. KAT should be started on all routers (not just FHR, RP, LHR).
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Criteria for switching to SPT is different on RP and LHR. Re-name
the functions to make that apparent.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Joined state is computed based on the downstream state and cannot be
changed if the RPF link flaps.
Reference: rfc 7761, section 4.5.5
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
This commit includes the following changes -
1. kat needs to be included when evaluting join desired on a (S,G)
entry.
2. there were cases where we were adding OIF based on joindesired
being true for unrelated reasons (on other OIFs). cleaned up those
cases.
3. make all calls to pim_upstream_switch conditional on the JoinDesired
macro.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
RP config change is a big hammer and use_rpt/spt needs to be
re-evaluated on all existing (S,G) entries.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
If a source is being forwarded along the RPT it uses the parent (*,G)'s
IIF. When the parent's IIF changes all the children need to be updated
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
mfcc_parent for an (S, G) entry was being updated on any upstream RPF
change. With the change to use RPT for (S,G) in some cases we can no
longer do that. Instead the upstream entry's RPF neigbor is managed
separately form the channel_oil's mfcc_parent i.e. via NHT. And the
mfcc_parent is evaluated at the time of mroute programming.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
An (S,G) mroute can be created as a result of rpt prune. However that
entry needs to stay on the parent (*,G)'s tree (IIF) till a decision is
made to switch the source to the SPT.
The decision to stay on the RPT is made based on the SPTbit setting
according to - RFC7761, Section 4.2 “Data Packet Forwarding Rules”
However those rules are hard to achieve when hw acceleration i.e.
control and data planes are separate. So instead of relying on data
we make the decision of using SPT if we have decided to join the SPT -
Use_RPT(S,G) {
if (Joined(S,G) == TRUE // we have decided to join the SPT
OR Directly_Connected(S) == TRUE // source is directly connected
OR I_am_RP(G) == TRUE) // RP
//use_spt
return FALSE;
//use_rpt
return TRUE;
}
To make that change some re-org was needed -
1. pim static mroutes and dynamic (upstream mroutes) top level APIs
have been separated. This is to limit the state machine to dynamic
mroutes.
2. c_oil->oil.mfcc_parent is re-evaluated based on if we decided
to use the SPT or stay on the RPT.
3. upstream mroute re-eval is done when any of the criteria involved
in Use_RPT changes.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Theoretically there should be no case where the channel-oil hangs
around after the upstream entry is removed. But currently there are
cases where it does. This is a precautionary fixup till we are
rid off all of those cases.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
1. This avoids the needs to re-run "muting" decisions.
2. Avoids the need to restore's pim OIL after fixup and send to kernel
(this is getting harder to manage).
In the future we need to also move the PIM maintained channel OIL from
an array of MAXVIFs to a simple DLL. This will be a significant
optimization in memory usage and preformance (OIL reads, copies etc).
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
If an mroute loses DF election (with the MLAG peer) it has to stop
forwarding traffic on active-active devices such as ipmr-lo used
for vxlan traffic termination. To acheive that this commit
introduces a concept of OIF muting. That way we can let the PIM and
IGMP state machines play out and silence OIFs after the fact.
Relevant outputs:
=================
1. muted OIFs are displayed with the M flag in "pim state" -
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
root@TORC12:~# net show pim state |grep "27.0.0.13"|grep 100
1 27.0.0.13 239.1.1.100 uplink-1 ipmr-lo( *M)
root@TORC12:~#
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2. And supressed altogether in the mroute output -
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
root@TORC12:~# net show mroute |grep "27.0.0.13"|grep 100
27.0.0.13 239.1.1.100 none uplink-1 none 0 --:--:--
root@TORC12:~#
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
These logs were printing file name which has little value (is always
pim_oil.c). Instead print the caller.
add_oif/del_oif are being called directly from one too many. Instead OIF
setup needs to be consolidated via the PIM state machine. These
debugs are expected to help in understanding what needs to be cleaned up.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
When receiving igmp packets we are spitting out a lot of
debugs. Attempt to clean this up to allow us to understand
what is going on a bit better by just being able to look
at the log file.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When you turn on `debug igmp trace` we are seeing a bunch
of debugs associated with pim processing. This is because we were
using PIM_DEBUG_TRACE which is both `debug igmp trace` and `debug pim trace`
when tracing igmp code it would be nice to only see igmp work.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
We are seeing situations where PIM is sending a IGMP v3 query
and immediately receiving back up the pim kernel interface the
query from itself:
from `show int brief`:
swp7 up default 192.168.202.1/24
We are also receiving these debugs:
2019-11-11T20:52:40.452307+00:00 leaf02 pimd[1592]: Send IGMPv3 query to 224.4.0.8 on swp7 for group 224.4.0.8, sources=0 msg_size=12 s_flag=0 QRV=2 QQI=125 QQIC=7d
2019-11-11T20:52:40.452430+00:00 leaf02 pimd[1592]: pim_mroute_msg(default): igmp kernel upcall on swp7(0x55eaa7dc7dc0) for 192.168.202.1 -> 224.4.11.123
2019-11-11T20:52:40.452574+00:00 leaf02 pimd[1592]: Recv IP packet from 192.168.202.1 to 224.4.11.123 on swp7: size=40 ip_header_size=24 ip_proto=2
2019-11-11T20:52:40.452699+00:00 leaf02 pimd[1592]: Recv IGMP packet from 192.168.202.1 to 224.4.11.123 on swp7: ttl=1 msg_type=17 msg_size=16
2019-11-11T20:52:40.452824+00:00 leaf02 pimd[1592]: Recv IGMP query v3 from 192.168.202.1 on swp7 for group 224.4.11.123
This query is causing us to do some weird gyrations around the IGMP state machine for handling
queries. Let's just prevent it from happening.
Ticket: CM-27247
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When adding an OIF to the OIL, if we are not the DR
there is no need to install it then remove it.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
We have a zlog_warn that is unguarded ( and really is a debug message )
as that there is nothing the end user can do and nothing to note
here other than a debug message to track refcounts. Change
to an appropriate debug and zlog_debug it instead.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When you enter:
ip pim ssm prefix-list my-custom-ssm-range
ip pim ssm prefix-list my-custom-ssm-range
The second instance would cause a failure to happen which
should not happen w/ duplicate config.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Scenarios where this code change is required:
1. BFD is un-configured from BGP at remote end.
Neighbour BFD sends ADMIN_DOWN state, but BFD on local side will send
DOWN to BGP, resulting in BGP session DOWN.
Removing BFD session administratively shouldn't bring DOWN BGP session
at local or remote.
2. BFD is un-configured from BGP or shutdown locally.
BFD will send state DOWN to BGP resulting in BGP session DOWN.
(This is akin to saying do not use BFD for BGP)
Removing BFD session administratively shouldn't bring DOWN BGP session at
local or remote.
Signed-off-by: Sayed Mohd Saquib sayed.saquib@broadcom.com
The result variable was already tested against PIM_GROUP_BAD_ADDR_MASK_COMBO
earlier in the function. No need to do the same thing twice.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
All paths leading to this point in the code have already deref'ed
the pim->global_scope.bsrp_table. No point in testing for
validness now. This was caught by Coverity.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The pim_msg_send() return code was not being checked. Make
consistent with it's usage everywhere else.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Cleanup the interface creation apis to make it more
clear what they are doing.
Make it explicit that the creation via name/ifindex will
only add it to the appropriate list.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>