278 Commits

Author SHA1 Message Date
Donald Sharp
e5b5ea92c5
Merge pull request #9906 from patrasar/2553196
pimd: During Joined -> NotJoined, upstream should send prune nomatter
2021-11-24 12:33:44 -05:00
sarita patra
0a4497f14a pimd: During Joined -> NotJoined, upstream should send prune nomatter
RCA: When an upstream transitions from Joined to NotJoined due to an SGRpt
prune, only the SGRpt prune was sent and the SG Prune was missed.

Fix: Send SG Prune towards source as well as SGRpt prune towards RP.

Signed-off-by: sarita patra <saritap@vmware.com>
2021-11-24 04:30:10 -08:00
Donald Sharp
f1189d7374
Merge pull request #9919 from mobash-rasool/pim-upst-3
pimd: STAR inherited Flag not properly set in certain scenarios
2021-11-22 14:42:56 -05:00
David Lamparter
86696f7bbe pimd: remove some constant parameters
ch_del is always true for all callers of ifjoin_to_noinfo.

Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
2021-11-17 16:46:05 +01:00
David Lamparter
4efdb9c628 pimd: clean up BSR NHT & fix parallel links
The Bootstrap message RX path needs a RPF check for the BSR address,
and this is implemented both incorrectly as well as quite ugly.

Clean up and fix case when we have multiple interfaces to the same LAN
and/or ECMP nexthops (both would cause message duplication, the former
can even cause BSM forwarding loops.)

Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
2021-11-17 11:17:44 +01:00
Donald Sharp
e1d1b1dec7 pimd: Remove default from enum based switch
enum based switches should never use default.  It makes
it very hard to fix and find issues when the enum is
changed.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2021-11-10 17:35:22 -05:00
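
The point made in e1d1b1dec7 can be illustrated with a minimal sketch. The enum and function below are hypothetical and are not taken from pimd; they only show why omitting `default` helps: with `-Wswitch` (enabled by `-Wall` in GCC and Clang), adding a new enumerator without a matching `case` produces a compiler warning at every such switch.

```c
#include <assert.h>

/* Hypothetical enum for illustration; not the pimd definition. */
enum join_state {
    JOIN_STATE_NOT_JOINED,
    JOIN_STATE_JOINED,
};

/* No default label: if an enumerator is added later and not handled here,
 * the compiler can warn about the missing case instead of hiding it. */
static const char *join_state_name(enum join_state state)
{
    switch (state) {
    case JOIN_STATE_NOT_JOINED:
        return "NotJoined";
    case JOIN_STATE_JOINED:
        return "Joined";
    }

    /* Only reachable if state holds a value outside the enum. */
    assert(!"invalid join_state");
    return "Unknown";
}
```

With a `default:` label in place, the same omission would compile silently and fall into the default branch at run time.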
Mobashshera Rasool
5428f5ac64 pimd: STAR inherited Flag not properly set in certain scenarios
Problem Statement:
==================
Mroutes are not recovered after shut/no shut of DUT to RP links

One interface is not added to the OIL in an intermediate router,
hence traffic is never received at the LHR and mroutes are not created for (S,G).

Root Cause Analysis:
====================
Generally (*,G) PIM Join is received first and then (S,G) joins are received.
This issue occurs when (S,G) join comes first and then the (*,G) Join.
When (S,G) PIM Join is received, ifchannel is created and channel_oil
OIF flag is set to PIM_OIF_FLAG_PROTO_PIM. Now when (*,G) join is received
the flag PIM_OIF_FLAG_PROTO_STAR is not inherited due to wrong check present in
function pim_upstream_inherited_olist_decide.

Fix:
===================
When (*,G) PIM Join is received, it should always add PIM_OIF_FLAG_PROTO_STAR
flag for all the (S,G) channel oils no matter what order the (*,G) or (S,G)
is received.

Fixes: #9918

Signed-off-by: Mobashshera Rasool <mrasool@vmware.com>
2021-10-29 02:57:34 -07:00
Christian Hopps
e8b7548c0d pimd: fix register suppress timer code
Signed-off-by: Christian Hopps <chopps@labn.net>
2021-08-19 00:28:35 -04:00
Donatas Abraitis
936fbaef47 *: Replace IPV4_MAX_PREFIXLEN to IPV4_MAX_BITLEN
Just drop IPV4_MAX_PREFIXLEN entirely; there is no need to keep both.

Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
2021-07-01 17:44:09 +03:00
Don Slice
3f1f8641fa pimd: adjust rp_keep_alive_time when register_suppress_time is changed
The router->register_suppress_time is used to derive the
rp_keep_alive_time, but when the suppress time was changed, pim was
not recalculating the rp_keep_alive_time and left it at the old value.
This fix applies the changes when a new suppress_time is entered
(or removed.)

Signed-off-by: Don Slice <dslice@nvidia.com>
2021-05-05 09:02:28 -04:00
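
A rough sketch of the behaviour described in 3f1f8641fa follows. It assumes the keepalive value is derived with the formula from RFC 4601, RP_Keepalive_Period = 3 * Register_Suppression_Time + Register_Probe_Time; the struct and setter are hypothetical names, not the pimd code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical container; field names follow the commit message. */
struct reg_timers {
    unsigned int register_suppress_time; /* seconds */
    unsigned int register_probe_time;    /* seconds */
    unsigned int rp_keep_alive_time;     /* seconds, derived value */
};

/* Update the suppress time and recompute the derived keepalive time in the
 * same step, so the derived value can never be left at a stale setting. */
static void reg_timers_set_suppress_time(struct reg_timers *t, unsigned int secs)
{
    assert(t != NULL);
    t->register_suppress_time = secs;
    t->rp_keep_alive_time =
        3 * t->register_suppress_time + t->register_probe_time;
}
```

With the RFC defaults (60 s suppression, 5 s probe) this formula gives 185 s; changing the suppression time to 20 s would give 65 s.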
Mobashshera Rasool
195427c8fd pimd: SPT-bit is not set to false as per RFC section in one flow
1. As per RFC 4601 Sec 4.5.7:
* JoinDesired(S,G) -> False, set SPTbit to false.

2. Change the debug type.

Signed-off-by: Mobashshera Rasool <mrasool@vmware.com>
2021-01-11 05:23:41 +00:00
Mark Stapp
5047884528 *: unify thread/event cancel macros
Replace all lib/thread cancel macros, use thread_cancel()
everywhere. Only the THREAD_OFF macro and thread_cancel() api are
supported. Also adjust thread_cancel_async() to NULL caller's pointer (if
present).

Signed-off-by: Mark Stapp <mjs@voltanet.io>
2020-10-23 12:16:52 -04:00
Mark Stapp
ee2bbf7ce2 pimd: replace inet_ntoa
Replace all use of inet_ntoa, using %pI4 or inet_ntop instead

Signed-off-by: Mark Stapp <mjs@voltanet.io>
2020-10-22 10:13:56 -04:00
Donatas Abraitis
2dbe669bdf :* Convert prefix2str to %pFX
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
2020-10-22 09:07:41 +03:00
Donald Sharp
88b5958e31
Merge pull request #6054 from sarav511/dr2ndr
pimd: When DR becomes non DR, Still sends register packets to RP
2020-06-01 07:58:32 -04:00
Donald Sharp
75e43de7a6
Merge pull request #6048 from sarav511/joinsup
pimd: In join suppression period, join is being sent
2020-06-01 07:36:24 -04:00
David Lamparter
c334a16ef1
Merge pull request #6262 from qlyoung/remove-sprintf
2020-04-23 20:27:26 +02:00
Quentin Young
772270f3b6 *: sprintf -> snprintf
Replace sprintf with snprintf where straightforward to do so.

- sprintf's into local scope buffers of known size are replaced with the
  equivalent snprintf call
- snprintf's into local scope buffers of known size that use the buffer
  size expression now use sizeof(buffer)
- sprintf(buf + strlen(buf), ...) replaced with snprintf() into temp
  buffer followed by strlcat

Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
2020-04-20 19:14:33 -04:00
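
The kind of conversion listed in 772270f3b6 might look like the sketch below. The helper `format_sg` is hypothetical. The commit used a temporary buffer plus strlcat for the append case; because strlcat is not available in every libc, this sketch appends with an offset snprintf instead, which is a substitute technique and not what the commit did.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes "(src,grp)" into out, truncating if needed.
 * Returns the length of the string stored in out. */
static size_t format_sg(char *out, size_t out_len, const char *src,
                        const char *grp)
{
    char buf[32]; /* local buffer of known size */
    int n;

    if (out_len == 0)
        return 0;

    /* Was: sprintf(buf, "(%s,", src); now bounded by sizeof(buf). */
    n = snprintf(buf, sizeof(buf), "(%s,", src);
    if (n < 0) {
        out[0] = '\0';
        return 0;
    }

    /* Was: sprintf(buf + strlen(buf), ...). Append only if the first
     * write fit, and bound the append by the space that is left. */
    if ((size_t)n < sizeof(buf))
        snprintf(buf + n, sizeof(buf) - (size_t)n, "%s)", grp);

    snprintf(out, out_len, "%s", buf);
    return strlen(out);
}
```

Every write is bounded by the size of its destination, so an oversized source or group string is truncated rather than overflowing the buffer.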
Rafael Zalamena
5920b3eb38 *: replace all random() calls
Replace all `random()` calls with a function called `frr_weak_random()`
and make it clear that it is only supposed to be used for weak random
applications.

Use the annotation described by the Coverity Scan documentation to
ignore `random()` call warnings.

Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
2020-04-17 20:57:43 -03:00
Donald Sharp
f32b16b09f
Merge pull request #6017 from sarav511/ovrride
pimd: Join not sent within prune override time when received non local prune
2020-04-14 08:32:32 -04:00
Donald Sharp
8fcca5024f
Merge pull request #6079 from sarav511/regstop_exp
pimd: Reg Suppression expiry has to account for couldreg->false while in RegPrune
2020-03-25 06:32:42 -04:00
saravanank
cf575d0975 pimd: Reg Suppression expiry has to account for couldreg->false while in prune
Problem: This happened once in a while when the scenario was tested multiple
times. When the regstop timer expires and the rpf interface doesn't exist at
that point, the register state for the upstream gets stuck in reg-prune state indefinitely.
This will not recover even when rpf comes back and traffic resumes because
the register state is stuck on prune.

RCA: Reg suppression expiry is keeping reg state unchanged when iif is absent.

Fix: When iif is absent during reg suppression expiry, treat it as couldreg
becoming false and move it to NO_INFO state.

Signed-off-by: Saravanan K <saravanank@vmware.com>
2020-03-24 02:31:04 -07:00
saravanank
46a9ea8bfa pimd: When DR becomes non DR, couldreg state events not handled.
RCA: Upstreams that are in a register state other than noinfo don't remove the
register tunnel from oif after the router becomes nonDR

Fix: scan upstreams with iif as the old dr and check if couldReg becomes false.
If couldreg becomes false from true, remove regiface and stop reg timer.
Do not disturb the entry. Later the entry shall be removed by kat expiry.

Signed-off-by: Saravanan K <saravanank@vmware.com>
2020-03-19 18:27:37 -07:00
saravanank
810cbaf7c1 pimd: In join suppression period, join is being sent
RCA:
Either JP timer is used to send join or join timer.
We are not removing the group from jp aggregate during suppression.
So even if join timer is restarted, jp aggregate expiry during suppression
is sending join for the group.

Fix:
Remove the group from jp aggregate on the neighbor during jp suppression.

Signed-off-by: Saravanan K <saravanank@vmware.com>
2020-03-19 03:20:25 -07:00
saravanank
af9106e544 pimd: Join not sent within prune override time when received non local prune.
RCA: Periodic join is mostly sent by the nbr jp timer, except for a few scenarios where it is sent by the upstream join timer

Fix: If the join timer is not running, we have to use the nbr jp timer to calculate
the remaining time for the next join.

Signed-off-by: Saravanan K <saravanank@vmware.com>
2020-03-17 02:01:35 -07:00
Sarita Patra
9443810eef pimd: fix OIL not removed after IGMP prune
Issue: Client1------LHR-----(int-1)RP(int-2)------client2
Client2 sends an IGMP join for group G.
Client1 sends an IGMP join for group G.
Verify show ip mroute in RP; it will have 2 OIL.
Client2 sends an IGMP leave.
Verify show ip mroute in RP; it will still have 2.

Root cause: When RP receives the IGMP join from client2, it creates
a (s,g) channel oil, adds the interface int-2 into the oil list and
sets the flag PIM_OIF_FLAG_PROTO_IGMP on int-2.
When Client1 sends an IGMP join, LHR will send a (*,G) join to RP. RP will
add the interface int-1 into the oil list of the (s,g) channel_oil and
will set the flags PIM_OIF_FLAG_PROTO_IGMP and PIM_OIF_FLAG_PROTO_PIM
on int-1, and set PIM_OIF_FLAG_PROTO_PIM on int-2 as well. This
happens because pim_upstream_inherited_olist_decide() and
forward_on() get all the oil and update the flags wrongly.
So now when client2 sends an IGMP prune, RP will not remove int-2
from the oil list since both PIM_OIF_FLAG_PROTO_PIM & PIM_OIF_FLAG_PROTO_IGMP
are set; it just unsets the flag PIM_OIF_FLAG_PROTO_IGMP.

Fix: Introduced new flags in if_channel, PIM_IF_FLAG_MASK_PROTO_PIM
& PIM_IF_FLAG_MASK_PROTO_IGMP. If an if_channel is created because of a
pim join or pim (s,g,rpt) prune received, then set the flag
PIM_IF_FLAG_MASK_PROTO_PIM. If an if_channel is created because of an IGMP
join received, then set the flag PIM_IF_FLAG_MASK_PROTO_IGMP.
When an interface needs to be added into the oil list, check whether
PIM_IF_FLAG_MASK_PROTO_PIM or PIM_IF_FLAG_MASK_PROTO_IGMP is set, then
update the oil flag accordingly.

Signed-off-by: Sarita Patra <saritap@vmware.com>
2020-03-16 21:54:34 -07:00
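
The flag handling described in 9443810eef can be sketched as a per-interface bitmask. The names and bit values below are hypothetical stand-ins for the PIM_OIF_FLAG_PROTO_* flags; the sketch only shows why an interface stays in the OIL while any protocol flag remains set, which is what made the wrongly added PIM flag keep int-2 alive.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the per-OIF protocol flags. */
#define OIF_PROTO_IGMP (1u << 0)
#define OIF_PROTO_PIM  (1u << 1)

struct oif {
    uint32_t proto_flags; /* which protocols want this OIF in the OIL */
};

/* Record only the protocol that actually created the channel. */
static void oif_add(struct oif *o, uint32_t proto_flag)
{
    o->proto_flags |= proto_flag;
}

/* Clear one protocol's interest. Returns true when no protocol is left
 * and the interface should be removed from the OIL. */
static bool oif_del(struct oif *o, uint32_t proto_flag)
{
    o->proto_flags &= ~proto_flag;
    return o->proto_flags == 0;
}
```

If only the IGMP flag was set on the interface, an IGMP leave empties the mask and the interface is removed. If the PIM flag was also set, as in the bug, the same leave only clears the IGMP bit and the interface stays.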
Anuradha Karuppiah
ea6d91c86b pimd: re-eval flow activity on kat expiry
When the (S,G) KAT expires we need to poll for activity before dropping the
entry as traffic may have been forwarded by the dataplane since the last
periodic poll cycle.

This only works if traffic is being forwarded by the kernel i.e. if the
entries were HW accelerated via an ASIC we may still miss out on last
minute activity on the mroute in the HW.

Ticket: CM-26871

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
2020-03-06 16:18:44 -05:00
Satheesh Kumar K
22c35834ea pimd: Use PIM EVPN MLAG Infra for syncing PIM MLAG Entries
Initially, MLAG sync happened at pim_ifchannel; this was mainly to
support even config mismatches (missing configuration of dual active).
But this causes more syncs for each entry.

It is also not in line with PIM EVPN. To avoid that, move to
pim_upstream based syncing.

Signed-off-by: Satheesh Kumar K <sathk@cumulusnetworks.com>
2020-03-06 16:03:36 -05:00
Anuradha Karuppiah
ec85b101e6 pimd: run DF election only on (*, G) termination mroutes
(S,G) entries that inherit ipmr-lo into the OIL also inherit
the DF role from the parent (*, G) entry.

This change is done primarily to simplify the sync process and
to prevent the MLAG peers from having to track (S, G) activity etc.

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
2020-03-06 16:03:28 -05:00
Donald Sharp
5e81f5dd1a *: Finish off the __PRETTY_FUNCTION__ to __func__
FINISH IT

Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
2020-03-06 09:23:22 -05:00
Donatas Abraitis
15569c58f8 *: Replace __PRETTY_FUNCTION__/__FUNCTION__ to __func__
Just keep the code cool.

Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
2020-03-05 20:23:23 +02:00
Donatas Abraitis
286bbbecb0 pimd: Convert pim_upstream_evaluate_join_desired type to bool
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
2020-03-04 17:13:01 +02:00
Anuradha Karuppiah
448139e704 pimd: stop overloading SRC_IGMP upstream for vxlan local membership
A local membership is created on the vxlan termination device ipmr-lo. This
is done to -
1. Pull multicast vxlan tunnel traffic to the VTEP for termination by
triggering JoinDesired on the BUM multicast group.
2. Include the OIF in the mroute to signal to the dataplane component
that flow needs to be vxlan terminated.

Earlier we were overloading the PIM_UPSTREAM_FLAG_MASK_SRC_IGMP for
this local membership creation but that is creating confusion both in
the state machine and in the show outputs. To avoid that we use the
more apparent PIM_UPSTREAM_FLAG_MASK_SRC_VXLAN_TERM. With this change -
1. We get LHR functionality for VXLAN_TERM mroutes
2. OIF is populated with PIM_OIF_FLAG_PROTO_PIM only

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
2020-02-14 09:18:30 -08:00
Anuradha Karuppiah
73db824993 pimd: skip syncing and running DF election on orig mroutes
This is not causing functional problems but has become a source
of confusion. DF status is only relevant to multicast tunnel decaps.

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
2020-02-14 09:18:30 -08:00
Anuradha Karuppiah
f03999caa6 pimd: increase RPF metric via the peerlink_rif by plus-10
The RPF cost is incremented by 10 if the RPF interface is the peerlink-rif.
This is used to force the MLAG switch with the lowest cost to the RPF
to become the MLAG DF. If a switch has to go via the peerlink-rif to get
to the RP or source it simply cannot be the designated forwarder.

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
2020-02-14 09:18:30 -08:00
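
The cost adjustment in f03999caa6 amounts to a small penalty function. The sketch below uses hypothetical names; only the value 10 comes from the commit text. It saturates at the maximum metric as a defensive choice, which is an assumption and not something the commit states.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name; the value is the one given in the commit message. */
#define PEERLINK_RIF_PENALTY 10u

/* Cost used for DF election: the route metric, plus a penalty when the
 * RPF interface is the peerlink-rif. Saturates instead of wrapping. */
static uint32_t mlag_effective_rpf_cost(uint32_t route_metric,
                                        bool rpf_is_peerlink_rif)
{
    if (!rpf_is_peerlink_rif)
        return route_metric;
    if (route_metric > UINT32_MAX - PEERLINK_RIF_PENALTY)
        return UINT32_MAX;
    return route_metric + PEERLINK_RIF_PENALTY;
}
```

Because the lowest cost wins the election, a switch that reaches the RP or source over the peerlink-rif ends up with a higher cost than the same metric on a direct path.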
Anuradha Karuppiah
95586137e6 pimd: inherit MLAG DF role from the parent (*, G) entry
DF election is only run for (*,G) entries i.e. election is skipped
for (S,G) entries that are setup as a result of SPT switchover. (S,G)
entries inherit the DF role from the parent (*,G) entry. So the DF is
responsible for terminating all sources associated with a group.

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
2020-02-14 09:18:30 -08:00
Anuradha Karuppiah
05ca004b80 pim: DF election for tunnel termination mroutes in an anycast-VTEP setup
1. Upstream entries associated with tunnel termination mroutes are
synced to the MLAG peer via the local MLAG daemon.
2. These entries are installed in the peer switch (via an upstream
ref flag).
3. DF (Designated Forwarder) election is run per-upstream entry by both
the MLAG switches -
a. The switch with the lowest RPF cost is the DF winner
b. If both switches have the same RPF cost the MLAG role is
used as a tie breaker with the MLAG primary becoming the DF
winner.
4. The DF winner terminates the multicast traffic by adding the tunnel
termination device to the OIL. The non-DF suppresses the termination
device from the OIL.

Note: Before the PIM-MLAG interface was available hidden config was
used to test the EVPN-PIM functionality with MLAG. I have removed the
code to persist that config to avoid confusion. The hidden commands are
still available.

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
2020-02-14 09:18:30 -08:00
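
Step 3 of 05ca004b80 describes a two-level comparison, which can be sketched as follows. The enum and function names are hypothetical; the rule itself (lowest RPF cost wins, MLAG primary breaks ties) is taken from the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical role enum for illustration. */
enum mlag_role {
    MLAG_ROLE_PRIMARY,
    MLAG_ROLE_SECONDARY,
};

/* Returns true if the local switch is the DF winner for an upstream entry. */
static bool mlag_local_is_df(uint32_t local_rpf_cost, uint32_t peer_rpf_cost,
                             enum mlag_role local_role)
{
    /* a. The switch with the lowest RPF cost wins. */
    if (local_rpf_cost != peer_rpf_cost)
        return local_rpf_cost < peer_rpf_cost;

    /* b. Equal cost: the MLAG primary wins the tie. */
    return local_role == MLAG_ROLE_PRIMARY;
}
```

Both switches run the same comparison with the roles of local and peer swapped, so exactly one of them reaches a `true` result and adds the tunnel termination device to its OIL.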
Donatas Abraitis
752022670a *: Remove break after return
Just dead code.

Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
2020-02-13 15:39:54 +02:00
Donald Sharp
dd3364cb1a pimd: Convert the upstream_list and hash to a rb tree
Convert the upstream_list and hash to a rb tree, Significant
time was being spent in the listnode_add_sort.  This reduces
this time greatly.

Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
2020-01-03 08:39:55 -05:00
Mark Stapp
174d3891ab pimd: clear SA warning in pimd
Remove a dead store in pim_upstream.c to clear up an SA
warning.

Signed-off-by: Mark Stapp <mjs@voltanet.io>
2019-12-10 12:10:44 -05:00
Anuradha Karuppiah
35d6862d60 pimd: eval use_rpt on new upstream post IIF setup but before MFC programming
use_rpt macro depends on JoinDesired macro and is mostly independent of the
actual RPF interface i.e. doesn't change when the RPF interface changes.

There is however one exception to this handling and that is on the
first hop router (DR or non-DR). On the DR the FHR flag is set so the
RPF interface stays irrelevant to use_rpt eval. But on the non-DR the
IIF is the only way to know we are directly connected to the SG i.e.
to know that we must NOT switch the source to RPT.

This commit fixes up the order of use_rpt eval -
1. it is done before mroute programming
2. but after IIF setup, for SRC_NOCACHE and STATIC_IIF upstream entries

Note: drop an unnecessary check to verify that the RPF interface is
pim enabled. This is just to make the code consistent.

Ticket: CM-27446

Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
2019-11-22 09:48:15 -08:00
Anuradha Karuppiah
075a475e0c pimd: fixup whitespace errors reported by CI
No functional changes.

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
2019-11-20 13:32:08 -08:00
Anuradha Karuppiah
a1be09396c pimd: drop redundant checks for RPF interface
pim_upstream_kat_start_ok was checking if RPF interface was present,
twice!

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
2019-11-20 10:20:09 -08:00
Anuradha Karuppiah
41a115e4f0 pimd: fixup JD macro to use "peer-msdp-sa" check instead of I_am_RP check
JD macro is defined by the RFC as -
bool JoinDesired(S,G) {
    return (immediate_olist(S,G) != NULL
        OR (KeepaliveTimer(S,G) is running
        AND inherited_olist(S,G) != NULL))
}

However for MSDP synced SA the KAT will not be running so an exception is
needed. Earlier I had done this by relaxing KAT_run requirements entirely
on the RP. However as that prevents the source from being aged out in some
cases I have made the check more narrow i.e. it has to be an MSDP peer added
entry.

Ticket: CM-24398

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
2019-11-15 14:16:08 -08:00
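
One way to read the change in 41a115e4f0 is as an extra term next to the KAT check in the RFC macro. The sketch below is an interpretation under that assumption; the struct and its fields are hypothetical and the actual pimd code may combine the conditions differently.

```c
#include <stdbool.h>

/* Hypothetical summary of the (S,G) state the macro looks at. */
struct sg_state {
    bool immediate_olist_nonempty; /* immediate_olist(S,G) != NULL */
    bool inherited_olist_nonempty; /* inherited_olist(S,G) != NULL */
    bool kat_running;              /* KeepaliveTimer(S,G) is running */
    bool msdp_peer_sa;             /* entry was added by an MSDP peer SA */
};

/* JoinDesired(S,G) with the MSDP exception: an MSDP peer added entry is
 * treated as if the KAT were running. Assumed composition, see lead-in. */
static bool join_desired(const struct sg_state *sg)
{
    if (sg->immediate_olist_nonempty)
        return true;

    return (sg->kat_running || sg->msdp_peer_sa) &&
           sg->inherited_olist_nonempty;
}
```

Compared with relaxing the KAT requirement for every entry on the RP, this narrower check leaves ordinary sources subject to the KAT, so they can still age out.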
Anuradha Karuppiah
c5cdf06960 pimd: jp-agg list update debug logs
Added event logs around add/del of upstream entries into the nbr's
jp-agg list. This is to help debug a problem with stale (deleted)
upstream entries being present in the list causing pimd to crash on
the periodic processing.

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
2019-11-15 14:16:08 -08:00
Anuradha Karuppiah
c692bd2ad4 pimd: send an immediate XG JP message when switching from SPT to RPT
Today we are only pruning the SPT when (S,G) upstream entry
switches from Joined to NotJoined. This leaves the source still
pruned along the RPT till the next periodic XG join-prune is sent
to the RPF(RP). Traffic from the source will be blackholed for this
duration. To prevent that we need to send a new JP message
to RPF(RP) immediately.

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
2019-11-15 14:16:08 -08:00
Anuradha Karuppiah
b36576e44c pimd: RPF change to unreachable was leaving a stale entry in the jp-agg list
This was causing pimd to crash later; call-stack -
(gdb) bt
    context=<optimized out>) at lib/sigevent.c:254
    group=group@entry=0x7ffffa9797e0) at pimd/pim_rp.c:207
    grp=grp@entry=0x7ffffa9799fe, sgs=sgs@entry=0x560ac069edb0, size=52)
    at pimd/pim_msg.c:200
    groups=<optimized out>) at pimd/pim_join.c:562
    at pimd/pim_neighbor.c:288
    at lib/thread.c:1599
    at lib/libfrr.c:1024
    envp=<optimized out>) at pimd/pim_main.c:162
(gdb) fr 4
    group=group@entry=0x7ffffa9797e0) at pimd/pim_rp.c:207
207     pimd/pim_rp.c: No such file or directory.
(gdb) fr 6
    grp=grp@entry=0x7ffffa9799fe, sgs=sgs@entry=0x560ac069edb0, size=52)
    at pimd/pim_msg.c:200
200     pimd/pim_msg.c: No such file or directory.
(gdb) p source->up->sg_str
$1 = '\000' <repeats 31 times>, <incomplete sequence \361>
(gdb)

This problem can manifest in the following event sequence -
1. upstream RPF neighbor is resolved
2. upstream RPF neighbor becomes unresolved (but upstream entry
   stays on the jp-agg list)
3. upstream entry is removed
On the next old-neighbor jp-agg-list processing, the stale entry is
accessed, resulting in the crash.

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
2019-11-15 14:16:08 -08:00
Anuradha Karuppiah
7ef66af956 pimd: insert upstream entry into nbr's jp-agg list when a new nbr is added
A dummy pim upstream entry can be in a JOINED state before its RPF nbr is
added. Handle that case by triggering an immediate join.

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
2019-11-15 14:16:08 -08:00
Anuradha Karuppiah
8c55c1325a pimd: add caller string prefix to pim_rpf_update logs
No functional change; log enhancements only.

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
2019-11-15 14:16:08 -08:00
Anuradha Karuppiah
cfa8f7eb05 pimd: fixup kat restart to conform to the RFC
1. KAT should be re-started only if traffic rxed along the SPT i.e.
IIF == RPF_Interface(S).
Only exception to the rule is if you are LHR.
2. KAT should be started on all routers (not just FHR, RP, LHR).

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
2019-11-15 12:00:29 -08:00
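
Rule 1 of cfa8f7eb05 reduces to a single predicate. The function and parameter names below are hypothetical, and interfaces are represented by plain integer indexes for the sake of the sketch.

```c
#include <stdbool.h>

/* Returns true if the keepalive timer may be restarted for traffic that
 * arrived on traffic_ifindex. rpf_ifindex stands for RPF_Interface(S). */
static bool kat_restart_ok(int traffic_ifindex, int rpf_ifindex, bool is_lhr)
{
    /* The LHR is the only exception to the SPT check. */
    if (is_lhr)
        return true;

    /* Otherwise the traffic must have been received along the SPT. */
    return traffic_ifindex == rpf_ifindex;
}
```

Traffic arriving on any other interface, for example along the RPT, would then leave the timer untouched on a router that is not the LHR.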