The criteria for switching to the SPT are different on the RP and the LHR. Rename
the functions to make that apparent.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
This commit includes the following changes -
1. KAT needs to be included when evaluating JoinDesired on an (S,G)
entry.
2. There were cases where we were adding an OIF based on JoinDesired
being true for unrelated reasons (i.e. because of other OIFs). Cleaned up those
cases.
3. Make all calls to pim_upstream_switch conditional on the JoinDesired
macro.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
An RP config change is a big hammer, and use_rpt/use_spt needs to be
re-evaluated on all existing (S,G) entries.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
If a source is being forwarded along the RPT, it uses the parent (*,G)'s
IIF. When the parent's IIF changes, all the children need to be updated.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
The mfcc_parent for an (S,G) entry was being updated on any upstream RPF
change. With the change to use the RPT for (S,G) in some cases, we can no
longer do that. Instead, the upstream entry's RPF neighbor is managed
separately from the channel_oil's mfcc_parent (i.e. via NHT), and the
mfcc_parent is evaluated at the time of mroute programming.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
An (S,G) mroute can be created as a result of an RPT prune. However, that
entry needs to stay on the parent (*,G)'s tree (IIF) until a decision is
made to switch the source to the SPT.
The decision to stay on the RPT is made based on the SPTbit setting
according to RFC7761, Section 4.2 “Data Packet Forwarding Rules”.
However, those rules are hard to follow with hw acceleration, i.e. when the
control and data planes are separate. So instead of relying on data,
we decide to use the SPT if we have decided to join the SPT:
Use_RPT(S,G) {
    if (Joined(S,G) == TRUE                // we have decided to join the SPT
        OR Directly_Connected(S) == TRUE   // source is directly connected
        OR I_am_RP(G) == TRUE)             // RP
        return FALSE;                      // use_spt
    return TRUE;                           // use_rpt
}
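A minimal C sketch of the Use_RPT(S,G) decision above, assuming the three inputs (Joined, Directly_Connected, I_am_RP) have already been computed and are passed in as booleans. The struct and function names are hypothetical illustrations, not pimd's actual API.

```c
#include <stdbool.h>

/* Hypothetical, simplified inputs to Use_RPT(S,G); not pimd's real
 * upstream structure. */
struct sg_state {
	bool joined;             /* Joined(S,G): we decided to join the SPT */
	bool directly_connected; /* Directly_Connected(S) */
	bool i_am_rp;            /* I_am_RP(G) */
};

/* Returns true if the (S,G) entry should stay on the parent (*,G)'s RPT
 * (and so use its IIF), false if it should use the SPT. */
static bool use_rpt(const struct sg_state *sg)
{
	if (sg->joined || sg->directly_connected || sg->i_am_rp)
		return false; /* use_spt */
	return true;          /* use_rpt */
}
```

Under this sketch, a true result would mean programming mfcc_parent from the (*,G) IIF, and a false result from the (S,G)'s own RPF interface.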
To make that change, some reorganization was needed:
1. The top-level APIs for PIM static mroutes and dynamic (upstream) mroutes
have been separated. This limits the state machine to dynamic
mroutes.
2. c_oil->oil.mfcc_parent is re-evaluated based on whether we decided
to use the SPT or stay on the RPT.
3. Upstream mroute re-evaluation is done when any of the criteria involved
in Use_RPT change.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Theoretically, there should be no case where the channel-oil hangs
around after the upstream entry is removed. But currently there are
cases where it does. This is a precautionary fixup until we get
rid of all of those cases.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
These logs were printing the file name, which has little value (it is always
pim_oil.c). Instead, print the caller.
add_oif/del_oif are being called directly from one too many places. Instead, OIF
setup needs to be consolidated via the PIM state machine. These
debugs are expected to help in understanding what needs to be cleaned up.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
When you turn on `debug igmp trace`, we are seeing a bunch
of debugs associated with PIM processing. This is because we were
using PIM_DEBUG_TRACE, which covers both `debug igmp trace` and `debug pim trace`.
When tracing IGMP code, it would be nice to only see IGMP work.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Modify the code to create an upstream reference whenever the code
creates a channel_oil via the pim_mroute.c code. This code also
starts a keepalive timer to clean up the reference if we do
nothing with it within the normal time.
I've left alone the source->channel_oil creation because these
are kept and tracked independently already.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Modify code base so that pim_upstream *always* creates a channel_oil
and as such we do not need to create it later or play other games.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
If we create a channel_oil, ensure that all paths that
we can go down will create one. Future commits
can remove the (up->channel_oil) tests.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
We already log whether or not we add nht tracking, having
an additional boolean to say to log another line is
a bit over the top.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
It doesn't make much sense for a hash function to modify its argument,
so const the hash input.
BGP does it in a couple of places; those cast away the const. Not great, but
not any worse than it was.
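As an illustration of why a hash function can take a const argument: it only reads the key. This is a minimal sketch with a hypothetical (S,G) key type and an arbitrary illustrative mixing function; it is not pimd's real key or hash.

```c
#include <stdint.h>

/* Hypothetical (S,G) key; pimd's real key type is not reproduced here. */
struct sg_key {
	uint32_t src;
	uint32_t grp;
};

/* A hash function only reads its input, so it takes a const pointer.
 * The mixing below is for illustration only. */
static unsigned int sg_hash_key(const void *arg)
{
	const struct sg_key *key = arg;
	uint32_t h = key->src * 2654435761u;

	h ^= key->grp + 0x9e3779b9u + (h << 6) + (h >> 2);
	return h;
}
```

Because the parameter is `const void *`, the function can be called on const keys without a cast; only callers that genuinely need to mutate would have to cast the const away.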
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
become stale entries.
Topology:
--------
      Source
        |
       FHR
        |
       RP ------ LHR --- Recv1
        |
      Recv2
Root cause:
-----------
When the RP acts as an LHR, i.e. the RP has a local receiver registered for
the same multicast group that a receiver connected to the LHR has also
registered for: when the RP receives an (S,G) join from the LHR, it
increments the upstream ref count to two to track the local membership
as well. But at KAT expiry on the RP, the upstream reference that was
added to track local membership is not removed, which causes these
entries to become stale on the RP and the FHR.
Fix: Made the change such that it removes the upstream reference
if it was added to track local membership.
Signed-off-by: Rajesh Girada <rgirada@vmware.com>
Always when creating a new S,G state look at all possible ifchannels
to decide what the mroute should be.
The bug that this is fixing is this:
Suppose there are two incoming `*,G` joins, on swp1 and swp2.
Now suppose that one of those `*,G` ifchannels sends a `*,G S,G RPT Prune`.
We were creating the S,G upstream state as we should, but we were
only looking at the S,G ifchannel to decide the S,G mroute we would
be creating. What we need to do instead is to look over the associated
*,G ifchannels so that we associate the correct OIL.
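A simplified model of the swp1/swp2 scenario above, assuming interfaces are represented as bits in a mask. The new S,G mroute's OIL follows the first term of RFC 7761's inherited_olist(S,G,rpt), joins(*,G) minus prunes(S,G,rpt); asserts and local membership are deliberately ignored, and the function name is hypothetical.

```c
#include <stdint.h>

/* Hypothetical model: each bit is one interface (bit 0 = swp1, bit 1 = swp2).
 * starg_joins:  interfaces with a (*,G) join ifchannel.
 * sgrpt_prunes: interfaces that sent an (S,G,rpt) prune.
 * The new (S,G) mroute inherits every (*,G) OIF except the pruned ones. */
static uint32_t sg_initial_oil(uint32_t starg_joins, uint32_t sgrpt_prunes)
{
	return starg_joins & ~sgrpt_prunes;
}
```

Looking only at the S,G ifchannel (swp1, which is pruned) would yield an empty OIL; consulting the *,G ifchannels keeps swp2 in it.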
Ticket: CM-24732
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
For multicast VxLAN tunnels we register the local VTEP-IP independent
of the presence of BUM traffic, i.e. we prime the pump. This
is achieved via NULL registers.
VxLAN orig entries with the upstream in a PIM_REG_JOIN state are linked to
a work list for periodic NULL register transmission. Once the SPT setup
is complete, the upstream entry moves to a PIM_REG_PRUNE state and is
removed from the VxLAN work list.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
pim_vxlan will use this for registering the local-VTEP-IP with the RP
independent of the presence of BUM traffic.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
In a VxLAN-AA setup, both anycast VTEPs can send VxLAN-encapsulated
traffic. This is despite the fact that one of them is not the DR on the IIF
associated with the originating mroute.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
In the case of vxlan origination entries IIF is set to -
1. lo for single VTEPs
2. MLAG-ISL for VTEPs multihomed via MLAG.
This commit creates the necessary infrastructure by -
1. allowing the IIF to be set statically (without RPF lookup)
2. and by preventing next-hop-tracking registration
PS: Note that I have intentionally skipped additional checks in pim_upstream_del,
i.e. an attempt will be made to remove nexthop-tracking
for the upstream entry (with STATIC_IIF), which will fail because the
up-entry is not in the nh's hash table. Ideally we should maintain
a nh pointer in the up-entry to prevent this unnecessary processing.
In the absence of that, I wanted to avoid spraying STATIC_IIF checks
all over.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
In the case of pim vxlan we create and keep upstream entries alive
in the absence of traffic. So we need a mechanism to purge entries
abruptly on vxlan SG delete without having to wait for the entry
to age out.
These are again just the infrastructure changes needed for it.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
For vxlan BUM MDTs we prime the pump and register the local-VTEP-IP
as source even before the first BUM packet is received. This commit provides
the infrastructure changes needed for that.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
The decision for Update_SPTbit(S,G, iif) includes a test
for JoinDesired(S,G) in section 4.2.2. When we were deciding
to update the spt bit we were not taking this into account.
This commit fixes this issue.
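A minimal sketch of the SPT-bit decision, modelled on the Update_SPTbit(S,G,iif) pseudocode in RFC 7761 Section 4.2.2. Each RFC predicate is assumed to be pre-computed and flattened into a boolean; the struct and function names are hypothetical, not pimd's.

```c
#include <stdbool.h>

/* Hypothetical pre-computed inputs to Update_SPTbit(S,G,iif). */
struct spt_inputs {
	bool iif_is_rpf_iface_s;         /* iif == RPF_interface(S) */
	bool join_desired;               /* JoinDesired(S,G) */
	bool directly_connected;         /* DirectlyConnected(S) */
	bool rpf_iface_s_ne_rpf_iface_rp; /* RPF_interface(S) != RPF_interface(RP(G)) */
	bool inherited_olist_rpt_empty;  /* inherited_olist(S,G,rpt) == NULL */
	bool rpf_prime_sg_eq_starg;      /* RPF'(S,G) == RPF'(*,G) and != NULL */
	bool i_am_assert_loser;          /* I_Am_Assert_Loser(S,G,iif) */
};

/* Returns true if the SPT bit should be set. The join_desired term is
 * the check that was previously missing. */
static bool should_set_sptbit(const struct spt_inputs *in)
{
	return in->iif_is_rpf_iface_s && in->join_desired &&
	       (in->directly_connected || in->rpf_iface_s_ne_rpf_iface_rp ||
		in->inherited_olist_rpt_empty || in->rpf_prime_sg_eq_starg ||
		in->i_am_assert_loser);
}
```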
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
On the LHR after we decide that traffic is flowing and
we set the SPT bit for the S,G *and* the incoming IIF
of the S,G is different than the incoming IIF of the *,G
we should immediately send the *,G S,G RPT Prune as
a triggered response instead of waiting for the next
cycle.
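The LHR condition described above reduces to a small predicate. This is a hypothetical sketch that assumes IIFs are compared by ifindex; it only encodes the condition stated in this commit message, not pimd's full prune logic.

```c
#include <stdbool.h>

/* Hypothetical check run on the LHR when deciding whether to send the
 * *,G S,G RPT Prune immediately (triggered) instead of on the next cycle:
 * the SPT bit is set for the S,G and its IIF differs from the *,G IIF. */
static bool lhr_send_triggered_sgrpt_prune(bool spt_bit, int sg_iif,
					   int starg_iif)
{
	return spt_bit && sg_iif != starg_iif;
}
```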
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When we display S,G string data we already automatically
display the string as (S,G); there is no need to have ((S,G)).
Clean up some instances that were found while looking through logs.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When an RP gets deleted, find all the (*,G) upstreams whose group belongs to
the deleted RP, and release each upstream from pnc->upstream_hash in the function
pim_delete_tracked_nexthop().
Signed-off-by: Sarita Patra <saritap@vmware.com>
When the route to the RP gets modified, FRR receives a notification from
zebra and calls the function pim_resolve_upstream_nh() to compute the
nexthop and update the upstream->rpf structure.
Issue: When the RP becomes unreachable, FRR only uninstalls
the mroute from the kernel but does not update the upstream->rpf structure.
Fix: When FRR receives a notification from zebra saying the RP has become
unreachable, update the following fields:
1. Set the channel_oil incoming interface to MAXVIFS.
2. Uninstall the mroute from the kernel.
3. Switch the upstream state from JOINED to NOTJOINED.
4. Clear the nexthop information of the upstream.
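The four steps above can be sketched on cut-down stand-in structures. Everything here is hypothetical: the structs are not pimd's channel_oil or pim_upstream, "uninstall" is modelled as clearing a flag, and MAXVIFS is assumed to be 32 (the Linux value) for illustration.

```c
#include <stdbool.h>
#include <string.h>

#define MAXVIFS 32 /* assumed: Linux's value, used as "no valid IIF" */

enum up_join_state { NOTJOINED, JOINED };

/* Hypothetical stand-ins for channel_oil and the upstream entry. */
struct chan_oil {
	int iif;
	bool installed; /* models "mroute present in the kernel" */
};
struct nexthop_info {
	unsigned int addr;
	int ifindex;
};
struct upstream {
	enum up_join_state join_state;
	struct nexthop_info rpf_nh;
	struct chan_oil *oil;
};

/* Applied when zebra reports the RP as unreachable. */
static void upstream_rp_unreachable(struct upstream *up)
{
	up->oil->iif = MAXVIFS;                      /* 1. no valid IIF */
	up->oil->installed = false;                  /* 2. uninstall mroute */
	up->join_state = NOTJOINED;                  /* 3. JOINED -> NOTJOINED */
	memset(&up->rpf_nh, 0, sizeof(up->rpf_nh));  /* 4. clear nexthop */
}
```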
Signed-off-by: Sarita Patra <saritap@vmware.com>
In this commit, we are creating a dummy upstream & dummy channel_oil
for (*, G) when RP is not configured or not reachable.
Dummy upstream: <upstream_addr = INADDR_ANY, rpf = Unknown>
Dummy channel oil: <iif = MAXVIFS>
Signed-off-by: Sarita Patra <saritap@vmware.com>
When FRR receives IGMP/PIM (*, G) join and RP is not configured or not
reachable, then we are creating a dummy upstream with incoming interface
as NULL and upstream address as INADDR_ANY.
Added upstream address and incoming interface validation where it is necessary,
before doing any operation on the upstream.
Signed-off-by: Sarita Patra <saritap@vmware.com>
Create a `struct pim_router` and move the thread master into it.
Future commits will further move global variables into the pim_router
structure.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The ->hash_cmp and linked list ->cmp functions were sometimes
being used interchangeably and this really is not a good
thing. So let's modify the hash_cmp function pointer to return
a boolean and convert everything to use the new syntax.
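A short illustration of why mixing the two comparator styles is dangerous: a list ordering cmp returns 0 for "equal", while a boolean hash_cmp returns true for "equal", so using one where the other is expected inverts the result. The key type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical (S,G) key. */
struct sg_key {
	uint32_t src;
	uint32_t grp;
};

/* Hash equality callback: only answers "equal or not", so it returns bool. */
static bool sg_hash_equal(const void *a, const void *b)
{
	const struct sg_key *x = a, *y = b;

	return x->src == y->src && x->grp == y->grp;
}

/* Ordering comparator for a sorted list, shown for contrast:
 * returns <0, 0 or >0, where 0 means "equal". */
static int sg_list_cmp(const void *a, const void *b)
{
	const struct sg_key *x = a, *y = b;

	if (x->grp != y->grp)
		return x->grp < y->grp ? -1 : 1;
	if (x->src != y->src)
		return x->src < y->src ? -1 : 1;
	return 0;
}
```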
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
This commit fixes two issues during pim shutdown.
1) The rp_info structure was being freed before the
outgoing notifications that depended on its information
were sent out as part of shutdown.
2) The pim->upstream_list shutdown involved iterating
over the list via ALL_LIST_ELEMENTS. This is typically
enough, but pim will auto-delete child nodes as well
as itself when it goes away and they depend on it. As such,
the node and nnode could already have been freed.
So change the way we look at all the data in the upstream_list.
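A sketch of the second issue, using a hypothetical doubly linked list in which deleting a node also deletes its child. Caching "next" before the delete (as ALL_LIST_ELEMENTS does with nnode) can leave a dangling pointer, so this teardown re-reads the list head on every iteration. The types and functions are illustrative, not FRR's list library.

```c
#include <stdlib.h>

/* Hypothetical node: deleting a parent also deletes its child, which
 * may be any other node in the list, including the cached "next". */
struct node {
	struct node *prev, *next;
	struct node *child;
	struct node *parent;
};

struct nlist {
	struct node *head;
};

static int nodes_freed;

static void node_unlink(struct nlist *l, struct node *n)
{
	if (n->prev)
		n->prev->next = n->next;
	else
		l->head = n->next;
	if (n->next)
		n->next->prev = n->prev;
}

static void node_delete(struct nlist *l, struct node *n)
{
	if (n->child) {
		struct node *c = n->child;

		n->child = NULL;
		c->parent = NULL;
		node_delete(l, c); /* cascading delete */
	}
	if (n->parent)
		n->parent->child = NULL;
	node_unlink(l, n);
	free(n);
	nodes_freed++;
}

/* Never cache node/nnode: re-read the head after every delete. */
static void list_teardown(struct nlist *l)
{
	while (l->head)
		node_delete(l, l->head);
}
```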
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Allow the caller to specify, at timer wheel creation time, the
name under which the wheel should show up in 'show thread cpu'.
Modify pim to use this.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When pimd is getting terminated, pim_upstream_del() gets called as
part of cleaning process. pim_upstream_del() deletes the route and
assigns NULL to the up->channel_oil. It also deletes each if_channel
by calling the function pim_ifchannel_delete().
pim_ifchannel_delete() internally calls its caller, pim_upstream_del(),
if it is the last ifchannel for that upstream. So pim_upstream_del
gets called twice and accesses up->channel_oil, which
was already set to NULL. This results in a crash.
Fix:
pim_ifchannel_delete() should call pim_upstream_del (caller function)
only if the up->ref_count > 0. Added an assert(up->ref_count > 0) in
the function pim_upstream_del().
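A reduced model of the re-entry described above. The names loosely mirror pim_upstream_del() and pim_ifchannel_delete() but are hypothetical simplifications; the point is that the ref_count guard turns the second, re-entrant call into a no-op, and the assert flags any call made with no references left.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down upstream entry. */
struct upstream {
	int ref_count;
	void *channel_oil;
};

static void upstream_del(struct upstream *up);

/* Stand-in for deleting the last ifchannel: only call back into
 * upstream_del() while references remain. */
static void ifchannel_delete_last(struct upstream *up)
{
	if (up->ref_count > 0)
		upstream_del(up);
}

/* Stand-in for pim_upstream_del(). */
static void upstream_del(struct upstream *up)
{
	assert(up->ref_count > 0); /* a stale second delete would trip this */
	if (--up->ref_count > 0)
		return;
	up->channel_oil = NULL;    /* route deleted, oil pointer cleared */
	ifchannel_delete_last(up); /* re-entry is now a no-op */
}
```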
Signed-off-by: Sarita Patra <saritap@vmware.com>
On shutdown and cleaning up pim_upstream ensure that the
upstream_sg_wheel still exists to remove item from.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The pim_upstream_free command was leaving slag by
not deleting data associated with the upstream
data structure. Modify the code to explicitly free
all data associated with an upstream on a pim instance
deletion event. Additionally, the end result is that
the pim_upstream_free command is not needed anymore.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
This improves code readability and also future-proofs our codebase
against new changes in the data structure used to store interfaces.
The FOR_ALL_INTERFACES_ADDRESSES macro was also moved to lib/ but
for now only babeld is using it.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
This is an important optimization for users running FRR on systems with
a large number of interfaces (e.g. thousands of tunnels). Red-black
trees scale much better than sorted linked-lists and also store the
elements in an ordered way (contrary to hash tables).
This is a big patch but the interesting bits are all in lib/if.[ch].
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Convert the list_delete(struct list *) function to use
struct list **. This is to allow the list pointer to be nulled.
I keep running into uses of this list_delete function where we
forget to set the returned pointer to NULL and attempt to use
it and then experience a crash, usually after the developer
has long since left the building.
Let's make the API explicit in setting the list pointer
to NULL.
Cynical Prediction: This code will expose an attempt
to use the NULL'ed list pointer in some obscure bit
of code.
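A minimal sketch of the double-pointer pattern, using hypothetical simplified list types rather than FRR's actual `struct list` implementation. Taking `struct list **` lets the delete function NULL the caller's pointer itself, so a later use fails as a NULL dereference instead of a use-after-free.

```c
#include <stdlib.h>

/* Hypothetical, simplified list types; not lib/linklist.h. */
struct listnode {
	struct listnode *next;
	void *data;
};
struct list {
	struct listnode *head;
};

/* Frees every node and the list, then NULLs the caller's pointer. */
static void list_delete_sketch(struct list **plist)
{
	struct listnode *n, *next;

	if (!plist || !*plist)
		return;
	for (n = (*plist)->head; n; n = next) {
		next = n->next;
		free(n);
	}
	free(*plist);
	*plist = NULL;
}
```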
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The commit '19b807c pimd: Allow the keepalive time to be per vrf.'
is missing some data. Probably as a result of the indentation
changes, I accidentally dropped it.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
If a single S,G is being deleted because the keepalive
timer has timed out, send a *,G join upstream to clear
the S,G RPT prune bit.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The channel_oil has a back pointer(up) to the upstream data structure.
If we are planning on keeping the channel oil (due to ref count issues)
longer than keeping the upstream, when we delete the upstream we were
not clearing the back pointer to up. This would result in a situation
where if that memory has started to be used again it will cause a
crash and other fun things.
Ticket: CM-17092
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The NHT upstream list is horribly inefficient at scale due to keeping
a sorted list of upstream entries. Both finding
the upstream and inserting it into the upstream_list
were consuming a large number of CPU cycles.
Convert to a hash, allowing additions/deletions to effectively become
O(1) events.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When we receive a S,G,RPT prune as part of a *,G tree, install
the NULL oil S,G mroute. This will cause the traffic to stop
flowing for this particular S,G as we expect.
Ticket: CM-16978
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When we receive a SGRPT Prune we were switching the upstream
to JOINED and immediately sending a join. This was not
the right thing to do.
This was happening because we were making decisions about the
new ifchannel before it was fully formed.
Rework ifchannel startup to provide enough information to
the pim upstream data structure to make the right decisions
Ticket: CM-16425
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Move the upstream_list, hash and wheel into 'struct pim_instance'
Convert all pimg references to pim in pim_upstream.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
- Upon receiving a (*,G) Join without SGRpt at the RP, trigger an (S,G) Join
towards the FHR, unset the SGRpt flag on the channel,
and add the (*,G) OIF to the (S,G) entry.
- Add an I-am-not-RP check to trigger SGRpt on the *,G path; otherwise,
send an S,G Prune on the SPT path from the RP to the FHR upon receiving a *,G Prune.
- Upon receiving SGRpt, remove the OIF (the downstream where the Prune was received) from the specific S,G.
Testing Done:
pim-smoke
Ran 95 tests in 11790.552s
FAILED (SKIP=10, failures=4)
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
When we pass a thread pointer to the thread_add_XXX functions,
thread.c sets the thread pointer to NULL when the specified
function is called. This was causing pim to
liberally pull its zassert grenade pins.
Additionally, clean up the code to not set the pointer to NULL.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
- Upon receiving an SGRpt Prune message and transitioning from the Prune-Pending state
to the NOINFO state, the ifchannel entry was getting deleted on prune pending timer
expiry. This can result in the SGRpt ifchannel being deleted and recreated upon receiving
a triggered or periodic SGRpt from downstream.
The automation test failed as it expected (checked for) the SGRpt entry at the RP after it triggers
SPT switchover.
- While transitioning from the Prune-Pending state to the NOINFO (Pruned) state, trigger
an SGRpt message towards the RP.
- Add/delete some of the debug traces.
Ticket:CM-16057
Reviewed By:CCR-6198
Testing Done:
Reran test08 multiple times and observed it passing.
Pim-smoke with hardnode
Ran 95 tests in 11219.420s
FAILED (SKIP=10, failures=4)
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
The FSF's address changed, and we had a mixture of comment styles for
the GPL file header. (The style with * at the beginning won out with
580 to 141 in existing files.)
Note: I've intentionally left intact other "variations" of the copyright
header, e.g. whether it says "Zebra", "Quagga", "FRR", or nothing.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
The way thread.c is written, a caller who wishes to be able to cancel a
thread or avoid scheduling it twice must keep a reference to the thread.
Typically this is done with a long lived pointer whose value is checked
for null in order to know if the thread is currently scheduled. The
check-and-schedule idiom is so common that several wrapper macros in
thread.h existed solely to provide it.
This patch removes those macros and adds a new parameter to all
thread_add_* functions which is a pointer to the struct thread * to
store the result of a scheduling call. If the pointer passed is non-null,
the thread will only be scheduled if the value it points to is null. This helps with
consistency.
A Coccinelle spatch has been used to transform code of the form:
if (t == NULL)
t = thread_add_* (...)
to the form
thread_add_* (..., &t)
The THREAD_ON macros have also been transformed to the underlying
thread.c calls.
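A heavily simplified, hypothetical scheduler illustrating the reference-pointer idiom described above; it is not FRR's thread.c and the names are invented. Passing `&t` makes scheduling idempotent, and the scheduler clears the caller's pointer when the task runs, so the caller's pointer always reflects "scheduled or not".

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical task type; not FRR's struct thread. */
struct task {
	void (*func)(struct task *);
	struct task **ref; /* caller's pointer, cleared when the task runs */
};

/* Schedules func unless *ref already holds a scheduled task. */
static struct task *task_add(void (*func)(struct task *), struct task **ref)
{
	struct task *t;

	if (ref && *ref)
		return *ref; /* already scheduled: do nothing */
	t = malloc(sizeof(*t));
	if (!t)
		return NULL;
	t->func = func;
	t->ref = ref;
	if (ref)
		*ref = t;
	return t;
}

/* Runs a task: clears the caller's reference first, then frees the task. */
static void task_run(struct task *t)
{
	if (t->ref)
		*t->ref = NULL;
	t->func(t);
	free(t);
}

/* Example callback used in the usage check. */
static int ran;
static void on_timer(struct task *t)
{
	(void)t;
	ran++;
}
```

A second `task_add(on_timer, &timer)` while `timer` is non-NULL is a no-op, which replaces the old `if (t == NULL)` check at every call site.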
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
During processing of Join/Prune,
for an S,G entry whose current state is SGRpt, when only a *,G join is
received, we need to clear SGRpt and add/inherit the *,G OIF to the S,G so
it can forward traffic to the downstream where the *,G was received.
Upon receiving an SGRpt prune, remove the inherited *,G OIF.
A *,G Prune was being received from the downstream router along with the SGRpt
prune; avoid sending the *,G and SGRpt Prune together.
In upstream_del, reset the ifchannel to NULL.
Testing Done:
Ran the failing smoke test: send data packets and trigger SPT switchover;
the *,G path receives SGRpt; later, data traffic stops and the S,G ages out on the LHR, which sends only a
*,G join upstream. Verified that the S,G entry inherits the OIF.
Upon receiving SGRpt, the inherited OIF is deleted and the entry stays in SGRpt state.
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
During a PIM neighbor change/UP event, the pim_scan_oil API
scans all channel oils to see if any RPF is impacted. Instead of
passing the current upstream's RPF, it passes the current RPF as 0 and
queries the RIB for the nexthop (without ECMP/rebalance). This creates an
inconsistent RPF between the upstream and the channel oil.
In the channel oil, keep a back pointer to the upstream DB, fetch the upstream's
RPF and pass it to the channel_oil scan.
Decrement the channel_oil ref_count in upstream_del when decrementing the
upstream ref_count and it is not the last reference.
Created an ECMP-based FIB lookup API.
Testing Done:
Performed following testing on tester setup:
5 x LHR, 4 x MSDP Spines, 6 Sources each sending to 1023 groups from one of the spines.
Total send rate 8Mpps.
The test that caused problems was rebooting every device at the same time.
After the fix, performed 5 iterations of rebooting the devices and saw no sign of the problem.
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
Now that pim has the ability to use ECMP, the group's
path to the RP may be different from what is chosen
for the *,G IIF. As such, when we are making the
SPT switchover decision, compare the S,G IIF to the
*,G IIF.
Ticket: CM-15870
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
During a neighbor down event, the RPF lookup for all upstream entries may result
in a nexthop address of 0.0.0.0 and NULL RPF interface info.
Add a preventive check wherever the RPF interface info is accessed.
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
In this patch, PIM nexthop tracking uses a locally populated nexthop cache list
to determine the ECMP-based nexthop (with the ECMP knob enabled); otherwise it picks
the first nexthop as the RPF.
Introduced the '[no] ip pim ecmp' command to enable/disable the PIM ECMP knob.
By default, PIM ECMP is disabled.
Introduced the '[no] ip pim ecmp rebalance' command to allow existing mcache
entries to switch to the new hash-chosen path.
Introduced a show command to display PIM registered addresses and their respective nexthops.
Introduced a show command to find the nexthop and outgoing interface for (S,G) or (RP,G).
Re-register an address with its nexthop when an interface UP event is received,
to ensure the PIM nexthop cache is updated (for PIM-enabled interfaces).
During PIM neighbor UP, traverse all RP and upstream nexthops and determine if
any nexthop's IPv4 address changes/resolves due to the neighbor UP event.
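A minimal sketch of the knob's effect on RPF selection: with ECMP disabled the first cached nexthop is used; with it enabled, a per-flow hash picks one of the equal-cost nexthops. The function name is hypothetical and the multiplicative hash is a placeholder assumption, not the hash pimd actually uses.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns the index of the nexthop to use as RPF for (src, grp).
 * The hash is a placeholder for illustration only. */
static unsigned int pick_rpf_nexthop(uint32_t src, uint32_t grp,
				     unsigned int num_nexthops,
				     bool ecmp_enabled)
{
	uint32_t h;

	if (!ecmp_enabled || num_nexthops <= 1)
		return 0; /* knob off: always the first nexthop */
	h = src ^ (grp * 2654435761u);
	return h % num_nexthops;
}
```

Because the choice is a deterministic function of the flow, the same (S,G) keeps mapping to the same path until the nexthop set changes; rebalancing then amounts to recomputing the index for existing entries.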
Testing Done: Run various LHR, RP and FHR related cases to resolve RPF using
nexthop cache with ECMP knob disabled, performed interface/PIM neighbor flap events.
Executed pim-smoke with knob disabled.
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
(cherry picked from commit cba4448178)
We have a bunch of places where we iterate over
the pim_ifchannel_list to find those ifchannels
that match a certain upstream. Since we already
know in the upstream the list of ifchannels
associated with it, just look at those instead.
Functions changed:
forward_on
forward_off
pim_upstream_rpf_interface_changed
pim_upstream_update_could_assert
pim_upstream_update_my_assert_metric
pim_upstream_update_assert_tracking_desired
pim_upstream_is_sg_rpt
Ticket: CM-15629
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Add a list structure to track the ifchannels associated
with a particular upstream.
We are not doing anything with this particular knowledge
yet, but it will become useful in the near future.
Ticket: CM-15629
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When we are determining an inherited_olist, let's be a lot
smarter about what we look at. Before this code change
we were looping over the entirety of all ifchannels in
the system to find the relevant ones. Convert the
code to *find* (hash table lookup) the specific ifchannels we
are interested in.
Ticket: CM-15629
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>