Found some more missing code that got dropped during the
upstreaming process causing issues with things actually
working.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
RCA: Upstreams which are in register state other than noinfo, doesnt remove
register tunnel from oif after it becomes nonDR
Fix: scan upstreams with iif as the old dr and check if couldReg becomes false.
If couldreg becomes false from true, remove regiface and stop reg timer.
Do not disturb the entry. Later the entry shall be removed by kat expiry.
Signed-off-by: Saravanan K <saravanank@vmware.com>
Initially, MLAG Sync is happened at pim_ifchannel, this is mainly to
support even config mismatches(missing configuration of dual active).
But this causes more syncs for each entry.
and also it is not In-line with PIM EVPN. to avoid that moving to
pm_upstream based syncing.
Signed-off-by: Satheesh Kumar K <sathk@cumulusnetworks.com>
1. show ip pim mlag summary
provides MLAG session information and stats
2. show ip pim mlag upstream
displays the upstream entries synced across the MLAG switches
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
A local membership is created on the vxlan termination device ipmr-lo. This
is done to -
1. Pull multicast vxlan tunnel traffic to the VTEP for termination by
triggering JoinDesired on the BUM multicast group.
2. Include the OIF in the mroute to signal to the dataplane component
that flow needs to be vxlan terminated.
Earlier we were overloading the PIM_UPSTREAM_FLAG_MASK_SRC_IGMP for
this local membership creation but that is creating confusion both in
the state machine and in the show outputs. To avoid that we use the
more apparent PIM_UPSTREAM_FLAG_MASK_SRC_VXLAN_TERM. With this change -
1. We get LHR functionality for VXLAN_TERM mroutes
2. OIF is populated with PIM_OIF_FLAG_PROTO_PIM only
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
The RPF cost is incremented by 10 if the RPF interface is the peerlink-rif.
This is used to force the MLAG switch with the lowest cost to the RPF
to become the MLAG DF. If a switch has to go via the peerlink-rif to get
to the RP or source it simplly cannot be the designated forwarder.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
1. Upstream entries associated with tunnel termination mroutes are
synced to the MLAG peer via the local MLAG daemon.
2. These entries are installed in the peer switch (via an upstream
ref flag).
3. DF (Designated Forwarder) election is run per-upstream entry by both
the MLAG switches -
a. The switch with the lowest RPF cost is the DF winner
b. If both switches have the same RPF cost the MLAG role is
used as a tie breaker with the MLAG primary becoming the DF
winner.
4. The DF winner terminates the multicast traffic by adding the tunnel
termination device to the OIL. The non-DF suppresses the termination
device from the OIL.
Note: Before the PIM-MLAG interface was available hidden config was
used to test the EVPN-PIM functionality with MLAG. I have removed the
code to persist that config to avoid confusion. The hidden commands are
still available.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Convert the upstream_list and hash to a rb tree, Significant
time was being spent in the listnode_add_sort. This reduces
this time greatly.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Criteria for switching to SPT is different on RP and LHR. Re-name
the functions to make that apparent.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
This commit includes the following changes -
1. kat needs to be included when evaluting join desired on a (S,G)
entry.
2. there were cases where we were adding OIF based on joindesired
being true for unrelated reasons (on other OIFs). cleaned up those
cases.
3. make all calls to pim_upstream_switch conditional on the JoinDesired
macro.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
RP config change is a big hammer and use_rpt/spt needs to be
re-evaluated on all existing (S,G) entries.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
An (S,G) mroute can be created as a result of rpt prune. However that
entry needs to stay on the parent (*,G)'s tree (IIF) till a decision is
made to switch the source to the SPT.
The decision to stay on the RPT is made based on the SPTbit setting
according to - RFC7761, Section 4.2 “Data Packet Forwarding Rules”
However those rules are hard to achieve when hw acceleration i.e.
control and data planes are separate. So instead of relying on data
we make the decision of using SPT if we have decided to join the SPT -
Use_RPT(S,G) {
if (Joined(S,G) == TRUE // we have decided to join the SPT
OR Directly_Connected(S) == TRUE // source is directly connected
OR I_am_RP(G) == TRUE) // RP
//use_spt
return FALSE;
//use_rpt
return TRUE;
}
To make that change some re-org was needed -
1. pim static mroutes and dynamic (upstream mroutes) top level APIs
have been separated. This is to limit the state machine to dynamic
mroutes.
2. c_oil->oil.mfcc_parent is re-evaluated based on if we decided
to use the SPT or stay on the RPT.
3. upstream mroute re-eval is done when any of the criteria involved
in Use_RPT changes.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Modify the code to create an upstream reference whenever the code
creates an channel_oil via the pim_mroute.c code. This code also
starts a keep alive timer to clean up the reference if we do
nothing with it after the normal time.
I've left alone the source->channel_oil creation because these
are kept and tracked independently already.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
It doesn't make much sense for a hash function to modify its argument,
so const the hash input.
BGP does it in a couple places, those cast away the const. Not great but
not any worse than it was.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Two flags have been introduced per-upstream entry -
1. XXX_MLAG_VXLAN - This indicates that MLAG DF (designated-forwarded)
election is needed on the entry. In the case of pim-evpn this flag is set
for termination (*, G) entries and will be inherited by the (S, G) entries
that are created as a result of SPT switchover on the G.
2. XXX_MLAG_NON_DF - This is set on entries that have lost the
DF election. Such entries are primarily used for blackholing traffic on
one of the MLAG switches. On a hardware accelerated switch this blackholing
happens in the ASIC preventing (non-needed) traffic hitting the CPU.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
PIM VxLAN handling will create two types of upstream entries and
maintain app-specific properties against the entry.
This commit provides the header definitions for that.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
In a VxLAN-AA setup both the anycast VTEPS can send VxLAN encapsulated
traffic. This is despite the fact that the it is not-DR on the IIF
associated with the originating mroute.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
In a MLAG setup both of the VTEPs can rx and reg-encapsulate BUM traffic
toward the RP. To prevent these duplicates we need a mechanism to disable
register encaps on specific mroutes.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
This is specifically needed to allow pim-evpn mroutes in the MLAG setup -
(36.0.0.11, 239.1.1.100) Iif: peerlink.4094 Oifs: uplink-1, peerlink.4094
I could have gone the other way and disabled PIM_ENFORCE_LOOPFREE_MFC but
that opens the door too wide. Relaxing the checks for mlag-specific mroutes
seemed like the safer choice.
This commit provides the infrastructure to relax checks on a per-mroute
basis.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
In the case of vxlan origination entries IIF is set to -
1. lo for single VTEPs
2. MLAG-ISL for VTEPs multihomed via MLAG.
This commit creates the necessary infrastructure by -
1. allowing the IIF to be set statically (without RPF lookup)
2. and by preventing next-hop-tracking registration
PS: Note that I have skipped additional checks in pim_upstream_del
intentionally i.e. an attempt will be made to remove nexthop-tracking
for the upstream entry (with STATIC_IIF) which will fail because of the
up-entry not being in the nh's hash table. Ideally we should maintain
a nh pointer in the up-entry to prevent this unnecessary processing.
In the abscence of that I wanted to avoid spraying STATIC_IIF checks
all over.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
In the case of pim vxlan we create and keep upstream entries alive
in the abscence of traffic. So we need a mechanism to purge entries
abruptly on vxlan SG delete without having to wait for the entry
to age out.
These are again just the infrastructure changes needed for it.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
For vxlan BUM MDTs we prime the pump and register the local-VTEP-ip
as source even before the first BUM packet is rxed. This commit provides
the infrastructure changes needed for that.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Added comments which explains the new values for existing fields
and new fields in the upstream and channel_oil data structure.
Following are the summary of the behaviour change in PIM code.
Scenario 1 : RP doesn’t exist/RP not reachable
Event: Join received
Current behaviour:
No upstream gets created
Changed behaviour:
Upstream data structure created with below info
upstream_addr: INADDR_ANY
channel_oil iif: MAXVIF
channel_oil is_valid: FALSE (flag introduced to indicate if this entry is valid to get installed in hardware)
RPF details: Not valid
Join state: NOT_JOINED
Kernal installed: FALSE
Scenario 2: Dummy upstream exists
Event: RP configured
Current Behaviour:
upstream address updated for the dummy upstream created.
Changed Behaviour:
upstream_addr: RP address
channel_oil iif: MAXVIF
channel_oil is_valid: FALSE
RPF details: only RP address updated
Join state: NOT_JOINED
Kernel installed: FALSE
Scenario 3: Dummy upstream exists
Event: RP becomes reachable
Current Behaviour:
Update channel oil, rpf details in the upstream and install in hardware
Changed Behaviour:
upstream_addr: RP Adress
channel_oil iif: MAXVIF
channel_oil is_valid: FALSE
RPF details: RPF details updated via NHT callback
Join state: JOINED
Kernel installed: TRUE
Scenario 4: MRoute exists
Event: RP gets deleted
Current behaviour:
Nothing got updated in him upstream and channel oil,
join timer still runs. Mroute still exists in kernel.
Changed behaviour:
upstream_addr: INADDR_ANY
channel_oil iif: MAXVIF
channel_oil is_valid: FALSE
RPF details: Not valid
Join state: NOT_JOINED (also sent prune towards deleted RPF nbr)
Kernel installed: FALSE
Scenario 5: MRoute Exists
Event: RP unreachable
Current behaviour:
Nothing got updated in him upstream and channel oil,
join timer still runs. Mroute sdeleted from kernel.
Changed behaviour:
upstream_addr: RP address
channel_oil iif: MAXVIF
channel_oil is_valid: FALSE
RPF details: only RP address updated
Join state: NOT_JOINED (also sent prune towards deleted RPF nbr)
Kernel installed: FALSE
Scenario 6: Mroute exists
Event: Better RP configured with precise group range & reachable.
Current behaviour:
No effect on existing route.
Changed behaviour:
Upstream address: Better RP
RPF interface: towards the better RP
Join state: JOINED (Send a prune towards the old RP and send a join
towards the better RP)
Scenario 7: Mroute exists
Event: RP deleted and another RP with broad group range fits this group & reachable
Current behaviour:
No effect on current behaviour
Changed behaviour:
Upstream address: next available RP
RPF interface: towards the next available RP
Join state: JOINED (Send a prune towards the old RP and send a join
towards the better RP)
Signed-off-by: Sarita Patra <saritap@vmware.com>
The ->hash_cmp and linked list ->cmp functions were sometimes
being used interchangeably and this really is not a good
thing. So let's modify the hash_cmp function pointer to return
a boolean and convert everything to use the new syntax.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The pim_upstream_free command was leaving slag by
not deleting data associated with the upstream
data structure. Modify the code to explicitly free
all data associated with an upstream on a pim instance
deletion event. Additionally the end result is that
the pim_upstream_free command is not needed anymore
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The NHT upstream list at scale is horribly inefficient due to keeping
a sorted list of upstream entries. The attempting to find
the upstream and the insertion of it into the upstream_list
was consuming a large amount of cpu cycles.
Convert to a hash, allow add/deletions to effectively become
O(1) events.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When we receive a SGRPT Prune we were switching the upstream
to JOINED and immediately sending a join. This was not
the right thing to do.
This was happening because we were making decisions about the
new ifchannel before it was fully formed.
Rework ifchannel startup to provide enough information to
the pim upstream data structure to make the right decisions
Ticket: CM-16425
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Move the upstream_list, hash and wheel into 'struct pim_instance'
Remove all pimg to pim in pim_upstream
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
All PIM Neighbors for a given pim interface is registered with
BFD.
Upon receiving BFD status down event, PIM Neighbor with BFD info is deleted.
Add pim bfd configuraiton (CLI) per interface, '[no] ip pim bfd'
Testing Done:
Configure BFD under PIM interface on all neighbor routers,
check bfd sessions up, remote end unconfigure BFD, results in BFD session down.
Previous state was UP to New state DOWN, results in PIM neighbor delete behind
that particular pim interface.
Pim-smoke Results:
Ran 94 tests in 7409.680s
FAILED (SKIP=8, failures=2)
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
-Upon Receving SGRpt Prune message, transitioning from Prune Pending state
to NOINFO state, ifchannel entry was getting deleted in prune pending timer
expiry. This can result in SGRpt ifhchannel deleted and recreated upon receving
triggered or periodic SGRpt received from downstream.
The automation test failed as it expected (check) SGRpt entry at RP after it triggers
SPT switchover.
- While transitioning from Prune-Pending state to NOINFO(Pruned) state, Trigger
SGRpt message towards RP.
- Add/del some of the debug traces
Ticket:CM-16057
Reviewed By:CCR-6198
Testing Done:
Rerun test08 multiple times and observed passing it.
Pim-smoke with hardnode
Ran 95 tests in 11219.420s
FAILED (SKIP=10, failures=4)
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
The FSF's address changed, and we had a mixture of comment styles for
the GPL file header. (The style with * at the beginning won out with
580 to 141 in existing files.)
Note: I've intentionally left intact other "variations" of the copyright
header, e.g. whether it says "Zebra", "Quagga", "FRR", or nothing.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Add a list structure to track the ifchannels associated
with a particular upstream.
We are not doing anything with this particular knowledge
yet but it will be come useful in the near future.
Ticket: CM-15629
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When we are determining an inherited_olist, let's be allot
smarter about what we look at. Before this code change
we are looping over the entirety of all ifchannels in
the system to find the relevant ones. Convert the
code to *find*(hash table lookup) the specific ifchannels we
are interested in.
Ticket: CM-15629
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
To the 'ip pim spt-switchover infinity-and-beyond' command
add 'prefix-list <PLIST>'. To the command.
Use this as the basis to deny (Not immediate switchover)
or permit (Immediate switchover), based upon matching
the group address and the prefix-list.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
This allows SPT switchover for S,G upon receipt of packets
on the LHR.
1) When we create a *,G from a IGMP Group Report, install
the *,G route with the pimreg device on the OIL.
2) When a packet hits the LHR that matches the *,G, we will
get a WHOLEPKT callback from the kernel and if we cannot
find the S,G, that means we have matched it on the LHR via
the *,G mroute. Create the S,G start the KAT and run
inherited_olist.
3) When the S,G times out, safely remove the S,G via
the KAT expiry
4) When the *,G is removed, remove any S,G associated
with it via the LHR flag.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>