Modify the code to create an upstream reference whenever the code
creates an channel_oil via the pim_mroute.c code. This code also
starts a keep alive timer to clean up the reference if we do
nothing with it after the normal time.
I've left alone the source->channel_oil creation because these
are kept and tracked independently already.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Modify code base so that pim_upstream *always* creates a channel_oil
and as such we do not need to create it later or play other games.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
If we create a channel_oil ensure that all paths that
we can go down will create one. Future commits
can remove the (up->channel_oil) tests.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
We already log whether or not we add nht tracking, having
an additional boolean to say to log another line is
a bit over the top.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
It doesn't make much sense for a hash function to modify its argument,
so const the hash input.
BGP does it in a couple places, those cast away the const. Not great but
not any worse than it was.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
become stale entries.
Topology:
--------
Source
|
FHR
|
RP ------ LHR --- Recv1
|
Recv2
Root case :
-----------
When RP acts as a LHR i.e RP has a local receiver and registed for
the same group where LHR connected receiver also registered for the
same multicast group.When RP receives a (s,g) join form LHR , it
increments upstream ref count to two to track the Local membership
as well.But at the time of KAT expiry in RP , upstream reference
is not being removed Which is added to track local membership which
is causing to make these entries as stale in RP and FHR.
Fix : Made the change such that it removes the upstream reference
if it is added to track the local memberships.
Signed-off-by: Rajesh Girada <rgirada@vmware.com>
Always when creating a new S,G state look at all possible ifchannels
to decide what the mroute should be.
The bug that this is fixing is this:
Suppose two incoming `*,G` joins on swp1, and swp2.
Now suppose that one of those ifchannel `*,G` sends a `*,G S,G RPT Prune`.
We were creating the S,G upstream state as we should but we were
only looking at the S,G ifchannel to decide the S,G mroute we would
be creating. As such what we need to do is to look over the associated
*,G ifchannels and allow us to associate correct oil needed.
Ticket: CM-24732
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
For multicast vxlan tunnels we register the local VTEP-IP independent
of the prescence of BUM traffic i.e. we prime the pump. This
is acheived via NULL registers.
VxLAN orig entries with upstream in a PIM_REG_JOIN state are linked to
a work list for periodic NULL register transmission. Once the SPT setup
is complete the upstream-entry moves to a PIM_REG_PRUNE state and is
remved from the VxLAN work list.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
pim_vxlan will use this for registering the local-VTEP-IP wth the RP
independent of the presence of BUM traffic.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
In a VxLAN-AA setup both the anycast VTEPS can send VxLAN encapsulated
traffic. This is despite the fact that the it is not-DR on the IIF
associated with the originating mroute.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
In the case of vxlan origination entries IIF is set to -
1. lo for single VTEPs
2. MLAG-ISL for VTEPs multihomed via MLAG.
This commit creates the necessary infrastructure by -
1. allowing the IIF to be set statically (without RPF lookup)
2. and by preventing next-hop-tracking registration
PS: Note that I have skipped additional checks in pim_upstream_del
intentionally i.e. an attempt will be made to remove nexthop-tracking
for the upstream entry (with STATIC_IIF) which will fail because of the
up-entry not being in the nh's hash table. Ideally we should maintain
a nh pointer in the up-entry to prevent this unnecessary processing.
In the abscence of that I wanted to avoid spraying STATIC_IIF checks
all over.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
In the case of pim vxlan we create and keep upstream entries alive
in the abscence of traffic. So we need a mechanism to purge entries
abruptly on vxlan SG delete without having to wait for the entry
to age out.
These are again just the infrastructure changes needed for it.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
For vxlan BUM MDTs we prime the pump and register the local-VTEP-ip
as source even before the first BUM packet is rxed. This commit provides
the infrastructure changes needed for that.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
The decision for Update_SPTbit(S,G, iif) includes a test
for JoinDesired(S,G) in section 4.2.2. When we were deciding
to update the spt bit we were not taking this into account.
This commit fixes this issue.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
On the LHR after we decide that traffic is flowing and
we set the SPT bit for the S,G *and* the incoming IIF
of the S,G is different than the incoming IIF of the *,G
we should immediately send the *,G S,G RPT Prune as
a triggered response instead of waiting for the next
cycle.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When we are displaying S,G string data we already auto
display the string as (S,G) no need to have ((S,G)).
Cleanup some that were found during log look through.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When RP gets deleted, find all the (*, G) upstream whose group belongs to
the deleted RP, release the upstream from pnc->upstream_hash in the function
pim_delete_tracked_nexthop().
Signed-off-by: Sarita Patra <saritap@vmware.com>
When route to RP gets modified, FRR receives a notification from
zebra, and call the function pim_resolve_upstream_nh() to compute the
nexthop and update upstream->rpf structure.
Issue: In case when RP becomes not reachable, FRR only uninstall
the mroute from the kernal, but not update the upstream->rpf structure.
Fix: When FRR receives a notification from zebra saying RP becomes
not reachable, then update the following fields.
1. update channel_oil incoming interface as MAXVIFS
2. Un-install the mroute from the kernel.
3. Switch upstream state from JOINED to NOTJOINED.
4. Clear the nexthop information of the upstream.
Signed-off-by: Sarita Patra <saritap@vmware.com>
In this commit, we are creating a dummy upstream & dummy channel_oil
for (*, G) when RP is not configured or not reachable.
Dummy upstream: <upstream_addr = INADDR_ANY, rpf = Unknown>
Dummy channel oil: <iif = MAXVIFS>
Signed-off-by: Sarita Patra <saritap@vmware.com>
When FRR receives IGMP/PIM (*, G) join and RP is not configured or not
reachable, then we are creating a dummy upstream with incoming interface
as NULL and upstream address as INADDR_ANY.
Added upstream address and incoming interface validation where it is necessary,
before doing any operation on the upstream.
Signed-off-by: Sarita Patra <saritap@vmware.com>
Create a `struct pim_router` and move the thread master into it.
Future commits will further move global varaibles into the pim_router
structure.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The ->hash_cmp and linked list ->cmp functions were sometimes
being used interchangeably and this really is not a good
thing. So let's modify the hash_cmp function pointer to return
a boolean and convert everything to use the new syntax.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
This commit fixes two issues during pim shutdown.
1) The rp_info structure was being freed before the
outgoing notifications that depended on it's information
was sent out as part of shutdown.
2) The pim->upstream_list shutdown involved iterating
over the list via ALL_LIST_ELEMENTS. This typically
is enough but pim will auto delete child nodes as well
as itself when it goes away and they depend on it. As such
the node and nnode could possibly already have been freed.
So change the way we look at all the data in the upstream_list
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Allow at timer wheel creation time the ability to specify a
name for what we want the 'show thread cpu' to show up as.
Modify pim to note this.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When pimd is getting terminated, pim_upstream_del() gets called as
part of cleaning process. pim_upstream_del() deletes the route and
assigns NULL to the up->channel_oil. It also deletes each if_channel
by calling the function pim_ifchannel_delete().
pim_ifchannel_delete() internally calls the caller function pim_upstream_del(),
if it is the last ifchannel for that upstream. So pim_upstream_del
is getting called twice, which will access the up->channel_oil which
was already set to NULL before. This results in crash.
Fix:
pim_ifchannel_delete() should call pim_upstream_del (caller function)
only if the up->ref_count > 0. Added an assert(up->ref_count > 0) in
the function pim_upstream_del().
Signed-off-by: Sarita Patra <saritap@vmware.com>
On shutdown and cleaning up pim_upstream ensure that the
upstream_sg_wheel still exists to remove item from.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The pim_upstream_free command was leaving slag by
not deleting data associated with the upstream
data structure. Modify the code to explicitly free
all data associated with an upstream on a pim instance
deletion event. Additionally the end result is that
the pim_upstream_free command is not needed anymore
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
This improves code readability and also future-proofs our codebase
against new changes in the data structure used to store interfaces.
The FOR_ALL_INTERFACES_ADDRESSES macro was also moved to lib/ but
for now only babeld is using it.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
This is an important optimization for users running FRR on systems with
a large number of interfaces (e.g. thousands of tunnels). Red-black
trees scale much better than sorted linked-lists and also store the
elements in an ordered way (contrary to hash tables).
This is a big patch but the interesting bits are all in lib/if.[ch].
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Convert the list_delete(struct list *) function to use
struct list **. This is to allow the list pointer to be nulled.
I keep running into uses of this list_delete function where we
forget to set the returned pointer to NULL and attempt to use
it and then experience a crash, usually after the developer
has long since left the building.
Let's make the api explicit in it setting the list pointer
to null.
Cynical Prediction: This code will expose a attempt
to use the NULL'ed list pointer in some obscure bit
of code.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The commit '19b807c pimd: Allow the keepalive time to be per vrf.'
is missing some data. Probably as a result of the indentation
and I accidently dropped it.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
If a single S,G is being deleted because the keepalive
timer has timed out, Send a *,G join upstream to clear
the S,G RPT prune bit.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The channel_oil has a back pointer(up) to the upstream data structure.
If we are planning on keeping the channel oil (due to ref count issues)
longer than keeping the upstream, when we delete the upstream we were
not clearing the back pointer to up. This would result in a situation
where if that memory has started to be used again it will cause a
crash and other fun things.
Ticket: CM-17092
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The NHT upstream list at scale is horribly inefficient due to keeping
a sorted list of upstream entries. The attempting to find
the upstream and the insertion of it into the upstream_list
was consuming a large amount of cpu cycles.
Convert to a hash, allow add/deletions to effectively become
O(1) events.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When we receive a S,G,RPT prune as part of a *,G tree, install
the NULL oil S,G mroute. This will cause the traffic to stop
flowing for this particular S,G as we expect.
Ticket: CM-16978
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When we receive a SGRPT Prune we were switching the upstream
to JOINED and immediately sending a join. This was not
the right thing to do.
This was happening because we were making decisions about the
new ifchannel before it was fully formed.
Rework ifchannel startup to provide enough information to
the pim upstream data structure to make the right decisions
Ticket: CM-16425
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Move the upstream_list, hash and wheel into 'struct pim_instance'
Remove all pimg to pim in pim_upstream
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
-Upon Rx (*,G) Join w/o SGRpt at RP, trigger (S,G) Join
towards FHR, unset SGRpt flag from channel,
add (*,G) oif to (S,G) entry.
-Add I am not RP check to triger SGRpt on *,G path otherwise,
send S,G Prune on SPT path from RP to FHR upon receving *,G Prune.
-Upon Rx SGRpt receive, remove OIF(downstream where Prune received) from specific S,G.
Testing Done:
pim-smoke
Ran 95 tests in 11790.552s
FAILED (SKIP=10, failures=4)
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
When we add a thread pointer to thread_add_XXX functions
when the specified function is called, thread.c is setting
the thread pointer to NULL. This was causing pim to
liberally pull it's zassert grenade pin's.
Additionally clean up code to not set the NULL pointer.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
-Upon Receving SGRpt Prune message, transitioning from Prune Pending state
to NOINFO state, ifchannel entry was getting deleted in prune pending timer
expiry. This can result in SGRpt ifhchannel deleted and recreated upon receving
triggered or periodic SGRpt received from downstream.
The automation test failed as it expected (check) SGRpt entry at RP after it triggers
SPT switchover.
- While transitioning from Prune-Pending state to NOINFO(Pruned) state, Trigger
SGRpt message towards RP.
- Add/del some of the debug traces
Ticket:CM-16057
Reviewed By:CCR-6198
Testing Done:
Rerun test08 multiple times and observed passing it.
Pim-smoke with hardnode
Ran 95 tests in 11219.420s
FAILED (SKIP=10, failures=4)
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
The FSF's address changed, and we had a mixture of comment styles for
the GPL file header. (The style with * at the beginning won out with
580 to 141 in existing files.)
Note: I've intentionally left intact other "variations" of the copyright
header, e.g. whether it says "Zebra", "Quagga", "FRR", or nothing.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
The way thread.c is written, a caller who wishes to be able to cancel a
thread or avoid scheduling it twice must keep a reference to the thread.
Typically this is done with a long lived pointer whose value is checked
for null in order to know if the thread is currently scheduled. The
check-and-schedule idiom is so common that several wrapper macros in
thread.h existed solely to provide it.
This patch removes those macros and adds a new parameter to all
thread_add_* functions which is a pointer to the struct thread * to
store the result of a scheduling call. If the value passed is non-null,
the thread will only be scheduled if the value is null. This helps with
consistency.
A Coccinelle spatch has been used to transform code of the form:
if (t == NULL)
t = thread_add_* (...)
to the form
thread_add_* (..., &t)
The THREAD_ON macros have also been transformed to the underlying
thread.c calls.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
During processing of Join/Prune,
for a S,G entry, current state is SGRpt, when only *,G is
received, need to clear SGRpt and add/inherit the *,G OIF to S,G so
it can forward traffic to downstream where *,G is received.
Upon receiving SGRpt prune remove the inherited *,G OIF.
From, downstream router received *,G Prune along with SGRpt
prune. Avoid sending *,G and SGRpt Prune together.
Reset upstream_del reset ifchannel to NULL.
Testing Done:
Run failed smoke test of sending data packets, trigger SPT switchover,
*,G path received SGRpt later data traffic stopped S,G ages out from LHR, sends only
*,G join to upstream, verified S,G entry inherit the OIF.
Upon receiving SGRpt deletes inherited oif and retains in SGRpt state.
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
During PIM Neighbor change/UP event, pim_scan_oil api
scans all channel oil to see any rpf impacted. Instead of
passing current upstream's RPF it passes current RPF as 0 and
does query to rib for nexhtop (without ECMP/Rebalance). This creates
inconsist RPF between Upstream and Channel oil.
In Channel Oil keep backward pointer to upstream DB and fetch up's
RPF and passed to channel_oil scan.
Decrement channel_oil ref_count in upstream_del when decrementing
up ref_count and it is not the last.
Created ECMP based FIB lookup API.
Testing Done:
Performed following testing on tester setup:
5 x LHR, 4 x MSDP Spines, 6 Sources each sending to 1023 groups from one of the spines.
Total send rate 8Mpps.
Test that caused problems was to reboot every device at the same time.
After fix performed 5 iterations of reboot devices and show no sign of the problem.
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
Now that pim has the ability to use ecmp, the Group
path to the RP, may be different than what is choosen
for the *,G IIF. As such when we are making the
spt switchover decision, compare the S,G IIF to the
*,G IIF.
Ticket: CM-15870
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
During neighbor down event, all upstream entries rpf lookup may result
into nhop address with 0.0.0.0 and rpf interface info being NULL.
Put preventin check where rpf interface info is accessed.
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
In this patch, PIM nexthop tracking uses locally populated nexthop cached list
to determine ECMP based nexthop (w/ ECMP knob enabled), otherwise picks
the first nexthop as RPF.
Introduced '[no] ip pim ecmp' command to enable/disable PIM ECMP knob.
By default, PIM ECMP is disabled.
Intorudced '[no] ip pim ecmp rebalance' command to provide existing mcache
entry to switch new path based on hash chosen path.
Introduced, show command to display pim registered addresses and respective nexthops.
Introuduce, show command to find nexthop and out interface for (S,G) or (RP,G).
Re-Register an address with nexthop when Interface UP event received,
to ensure the PIM nexthop cache is updated (being PIM enabled).
During PIM neighbor UP, traverse all RPs and Upstreams nexthop and determine, if
any of nexthop's IPv4 address changes/resolves due to neigbor UP event.
Testing Done: Run various LHR, RP and FHR related cases to resolve RPF using
nexthop cache with ECMP knob disabled, performed interface/PIM neighbor flap events.
Executed pim-smoke with knob disabled.
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
(cherry picked from commit cba4448178)
We have a bunch of places where we iterate over
the pim_ifchannel_list to find those ifchannels
that match a certain upstream. Since we already
know in the upstream the list of ifchannels
associated with it, just look at those instead.
Functions changed:
forward_on
forward_off
pim_upstream_rpf_interface_changed
pim_upstream_update_could_assert
pim_upstream_update_my_assert_metric
pim_upstream_update_assert_tracking_desired
pim_upstream_is_sg_rpt
Ticket: CM-15629
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Add a list structure to track the ifchannels associated
with a particular upstream.
We are not doing anything with this particular knowledge
yet but it will be come useful in the near future.
Ticket: CM-15629
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When we are determining an inherited_olist, let's be allot
smarter about what we look at. Before this code change
we are looping over the entirety of all ifchannels in
the system to find the relevant ones. Convert the
code to *find*(hash table lookup) the specific ifchannels we
are interested in.
Ticket: CM-15629
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
To the 'ip pim spt-switchover infinity-and-beyond' command
add 'prefix-list <PLIST>'. To the command.
Use this as the basis to deny (Not immediate switchover)
or permit (Immediate switchover), based upon matching
the group address and the prefix-list.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
This allows SPT switchover for S,G upon receipt of packets
on the LHR.
1) When we create a *,G from a IGMP Group Report, install
the *,G route with the pimreg device on the OIL.
2) When a packet hits the LHR that matches the *,G, we will
get a WHOLEPKT callback from the kernel and if we cannot
find the S,G, that means we have matched it on the LHR via
the *,G mroute. Create the S,G start the KAT and run
inherited_olist.
3) When the S,G times out, safely remove the S,G via
the KAT expiry
4) When the *,G is removed, remove any S,G associated
with it via the LHR flag.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
SSM groups (232/8 or user configured SSM range) can exist in the same
multicast network as ASM groups. For such groups all RPT related state
machine operations have to be skipped as defined by section 4.8 of
RFC4601 -
1. Source registration is skipped for SSM groups. For SSM groups mroute
is setup on the FHR when a new multicast flow is rxed; however source
registration (i.e. pimreg join) is skipped. This will let the ASIC black
hole the traffic till a valid OIL is added to the mroute.
2. (*,G) IGMP registrations are ignored for SSM groups.
Sample output:
=============
fhr# sh ip pim group-type
SSM group range : 232.0.0.0/8
fhr# sh ip pim group-type 232.1.1.1
Group type: SSM
fhr# sh ip pim group-type 239.1.1.1
Group type: ASM
fhr#
Sample config:
=============
fhr(config)# ip pim ssm prefix-list ssm-ranges
fhr(config)#
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
Ticket: CM-15344
Testing Done:
1. SSM/ASM source-registration/igmp-joins.
2. On the fly multicast group type changes.
3. pim-smoke.
When we get a SGrpt Prune embedded in the *,G Join,
Display the created ifchannel as being SGRpt state.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
pim_jp_agg list should not ref count pim_upstream as that
the deletion of pim_upstream deletion should remove
the pim_upstream from the j/p agg list.
Cleanup a memory leag of jag
Make comparison of js cleaner in add_group
Move THREAD_OFF to before the neighbor find.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>