Joined state is computed based on the downstream state and cannot be
changed if the RPF link flaps.
Reference: rfc 7761, section 4.5.5
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
This commit includes the following changes -
1. kat needs to be included when evaluting join desired on a (S,G)
entry.
2. there were cases where we were adding OIF based on joindesired
being true for unrelated reasons (on other OIFs). cleaned up those
cases.
3. make all calls to pim_upstream_switch conditional on the JoinDesired
macro.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
RP config change is a big hammer and use_rpt/spt needs to be
re-evaluated on all existing (S,G) entries.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
If a source is being forwarded along the RPT it uses the parent (*,G)'s
IIF. When the parent's IIF changes all the children need to be updated
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
mfcc_parent for an (S, G) entry was being updated on any upstream RPF
change. With the change to use RPT for (S,G) in some cases we can no
longer do that. Instead the upstream entry's RPF neigbor is managed
separately form the channel_oil's mfcc_parent i.e. via NHT. And the
mfcc_parent is evaluated at the time of mroute programming.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
An (S,G) mroute can be created as a result of rpt prune. However that
entry needs to stay on the parent (*,G)'s tree (IIF) till a decision is
made to switch the source to the SPT.
The decision to stay on the RPT is made based on the SPTbit setting
according to - RFC7761, Section 4.2 “Data Packet Forwarding Rules”
However those rules are hard to achieve when hw acceleration i.e.
control and data planes are separate. So instead of relying on data
we make the decision of using SPT if we have decided to join the SPT -
Use_RPT(S,G) {
if (Joined(S,G) == TRUE // we have decided to join the SPT
OR Directly_Connected(S) == TRUE // source is directly connected
OR I_am_RP(G) == TRUE) // RP
//use_spt
return FALSE;
//use_rpt
return TRUE;
}
To make that change some re-org was needed -
1. pim static mroutes and dynamic (upstream mroutes) top level APIs
have been separated. This is to limit the state machine to dynamic
mroutes.
2. c_oil->oil.mfcc_parent is re-evaluated based on if we decided
to use the SPT or stay on the RPT.
3. upstream mroute re-eval is done when any of the criteria involved
in Use_RPT changes.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Theoretically there should be no case where the channel-oil hangs
around after the upstream entry is removed. But currently there are
cases where it does. This is a precautionary fixup till we are
rid off all of those cases.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
1. This avoids the needs to re-run "muting" decisions.
2. Avoids the need to restore's pim OIL after fixup and send to kernel
(this is getting harder to manage).
In the future we need to also move the PIM maintained channel OIL from
an array of MAXVIFs to a simple DLL. This will be a significant
optimization in memory usage and preformance (OIL reads, copies etc).
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
If an mroute loses DF election (with the MLAG peer) it has to stop
forwarding traffic on active-active devices such as ipmr-lo used
for vxlan traffic termination. To acheive that this commit
introduces a concept of OIF muting. That way we can let the PIM and
IGMP state machines play out and silence OIFs after the fact.
Relevant outputs:
=================
1. muted OIFs are displayed with the M flag in "pim state" -
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
root@TORC12:~# net show pim state |grep "27.0.0.13"|grep 100
1 27.0.0.13 239.1.1.100 uplink-1 ipmr-lo( *M)
root@TORC12:~#
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2. And supressed altogether in the mroute output -
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
root@TORC12:~# net show mroute |grep "27.0.0.13"|grep 100
27.0.0.13 239.1.1.100 none uplink-1 none 0 --:--:--
root@TORC12:~#
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
These logs were printing file name which has little value (is always
pim_oil.c). Instead print the caller.
add_oif/del_oif are being called directly from one too many. Instead OIF
setup needs to be consolidated via the PIM state machine. These
debugs are expected to help in understanding what needs to be cleaned up.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Before the fix:
2019/11/14 19:52:21 BGP: peer 192.168.2.5 deleted from subgroup s4peer
cnt 0 - missing space after s4 before peer
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
Recently Lot of issues are seen in OSPF adjacnecy establishements,
sessions was tear down because of DD Sequence Number mismatch.
adding Debugs to capture Master & slave generated sequence numbers.
Signed-off-by: Satheesh Kumar K <sathk@cumulusnetworks.com>
Problem reported with tracebacks seen when making multiple bfd timer
changes in frr.conf and applying via frr-reload.py. Found that when
multiple bfd timer changes are made, the same line can be added for
deletion more than once, causing the traceback when the deletion is
performed. This fix verifies the correct line is being appended for
deletion.
Ticket: CM-27233
Signed-off-by: Don Slice <dslice@cumulusnetworks.com>
The bug:
As part of displaying advertised routes to a peer, in the outer loop, we
iterate through all prefixes in the evpn table. In the inner loop,
we iterate through adj_out of each prefix.
If a prefix which is present in the evpn table is not advertised to a peer,
its corresponding attr == NULL. Checking for this condition is the fix.
Signed-off-by: Lakshman Krishnamoorthy <lkrishnamoor@vmware.com>
This code is called from the zebra main pthread during shutdown
but the thread event is scheduled via the zebra dplane pthread.
Hence, we should be using the `thread_cancel_async()` API to
cancel the thread event on a different pthread.
This is only ever hit in the rare case that we still have work left
to do on the update queue during shutdown.
Found via zebra crash:
```
(gdb) bt
\#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
\#1 0x00007f4e4d3f7535 in __GI_abort () at abort.c:79
\#2 0x00007f4e4d3f740f in __assert_fail_base (fmt=0x7f4e4d559ee0 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x7f4e4d9071d0 "master->owner == pthread_self()",
file=0x7f4e4d906cf8 "lib/thread.c", line=1185, function=<optimized out>) at assert.c:92
\#3 0x00007f4e4d405102 in __GI___assert_fail (assertion=assertion@entry=0x7f4e4d9071d0 "master->owner == pthread_self()", file=file@entry=0x7f4e4d906cf8 "lib/thread.c",
line=line@entry=1185, function=function@entry=0x7f4e4d906b68 <__PRETTY_FUNCTION__.15817> "thread_cancel") at assert.c:101
\#4 0x00007f4e4d8d095a in thread_cancel (thread=0x55b40d01a640) at lib/thread.c:1185
\#5 0x000055b40c291845 in zebra_dplane_shutdown () at zebra/zebra_dplane.c:3274
\#6 0x000055b40c27ee13 in zebra_finalize (dummy=<optimized out>) at zebra/main.c:202
\#7 0x00007f4e4d8d1416 in thread_call (thread=thread@entry=0x7ffcbbc08870) at lib/thread.c:1599
\#8 0x00007f4e4d8a1ef8 in frr_run (master=0x55b40ce35510) at lib/libfrr.c:1024
\#9 0x000055b40c270916 in main (argc=8, argv=0x7ffcbbc08c78) at zebra/main.c:483
(gdb) down
\#4 0x00007f4e4d8d095a in thread_cancel (thread=0x55b40d01a640) at lib/thread.c:1185
1185 assert(master->owner == pthread_self());
(gdb)
```
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
When receiving igmp packets we are spitting out a lot of
debugs. Attempt to clean this up to allow us to understand
what is going on a bit better by just being able to look
at the log file.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When you turn on `debug igmp trace` we are seeing a bunch
of debugs associated with pim processing. This is because we were
using PIM_DEBUG_TRACE which is both `debug igmp trace` and `debug pim trace`
when tracing igmp code it would be nice to only see igmp work.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
We are seeing situations where PIM is sending a IGMP v3 query
and immediately receiving back up the pim kernel interface the
query from itself:
from `show int brief`:
swp7 up default 192.168.202.1/24
We are also receiving these debugs:
2019-11-11T20:52:40.452307+00:00 leaf02 pimd[1592]: Send IGMPv3 query to 224.4.0.8 on swp7 for group 224.4.0.8, sources=0 msg_size=12 s_flag=0 QRV=2 QQI=125 QQIC=7d
2019-11-11T20:52:40.452430+00:00 leaf02 pimd[1592]: pim_mroute_msg(default): igmp kernel upcall on swp7(0x55eaa7dc7dc0) for 192.168.202.1 -> 224.4.11.123
2019-11-11T20:52:40.452574+00:00 leaf02 pimd[1592]: Recv IP packet from 192.168.202.1 to 224.4.11.123 on swp7: size=40 ip_header_size=24 ip_proto=2
2019-11-11T20:52:40.452699+00:00 leaf02 pimd[1592]: Recv IGMP packet from 192.168.202.1 to 224.4.11.123 on swp7: ttl=1 msg_type=17 msg_size=16
2019-11-11T20:52:40.452824+00:00 leaf02 pimd[1592]: Recv IGMP query v3 from 192.168.202.1 on swp7 for group 224.4.11.123
This query is causing us to do some weird gyrations around the IGMP state machine for handling
queries. Let's just prevent it from happening.
Ticket: CM-27247
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
We were only checking that two nhg_hash_entry's were equal
based on the active nexthop NUMBER. This is not sufficient in
special cases where whats active with one route using it,
might not be active with the other. We can see this with
routes trying to resolve to themselves.
Ex)
1.1.1.0/24
-> 1.1.1.1 dummy1 (inactive)
-> 1.1.1.2 dummy2
1.1.2.0/24
-> 1.1.1.1 dummy1
-> 1.1.1.2 dummy1 (inactive)
Without checking each nexthop individually, they will
hash to the same group since they have the same number of
active nexthops.
Fix this by looping over every nexthop for each nhe (they should
be sorted) and checking if the NEXTHOP_FLAG_ACTIVE flag's match.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
We cannot clear the NEXTHOP_FLAG_FIB nexthop flag
when sending routes to the dataplane anymore since
nexthops are now shared.
We were seeing a situation where if we delete a route
using a nexthop group that is still active with another
route, the fib flag was being unset by this code
path despite them still being valid fib nexthops with the
other route.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
We were crashing due to a missed label change code path
in mpls_ftn_uninstall() with the zebra_nhg hashing code.
Add a static handler function for label changing everywhere
in that code and use it in mpls_ftn_uninstall().
The crash was found in the ISIS-SR tests:
==23== Thread 1:
==23== Invalid read of size 4
==23== at 0x15B20E: zebra_nhg_hash_equal (zebra_nhg.c:365)
==23== by 0x489A2FD: hash_get (hash.c:143)
==23== by 0x489A4BC: hash_lookup (hash.c:183)
==23== by 0x15B5A3: zebra_nhg_find (zebra_nhg.c:494)
==23== by 0x15C536: zebra_nhg_rib_find (zebra_nhg.c:1070)
==23== by 0x1573E8: mpls_ftn_update (zebra_mpls.c:2661)
==23== by 0x1A2554: zread_mpls_labels_replace (zapi_msg.c:1890)
==23== by 0x1A41CD: zserv_handle_commands (zapi_msg.c:2613)
==23== by 0x199B17: zserv_process_messages (zserv.c:517)
==23== by 0x48EE6B7: thread_call (thread.c:1549)
==23== by 0x48A8AD5: frr_run (libfrr.c:1064)
==23== by 0x1391B7: main (main.c:468)
==23== Address 0x5839330 is 0 bytes inside a block of size 80 free'd
==23== at 0x48369AB: free (vg_replace_malloc.c:530)
==23== by 0x48AEE6C: qfree (memory.c:129)
==23== by 0x15C5F8: zebra_nhg_free (zebra_nhg.c:1095)
==23== by 0x15BC8C: zebra_nhg_handle_uninstall (zebra_nhg.c:734)
==23== by 0x15DCFA: zebra_nhg_uninstall_kernel (zebra_nhg.c:1826)
==23== by 0x15C666: zebra_nhg_decrement_ref (zebra_nhg.c:1106)
==23== by 0x15D9D7: zebra_nhg_re_update_ref (zebra_nhg.c:1711)
==23== by 0x15D8B1: nexthop_active_update (zebra_nhg.c:1660)
==23== by 0x167072: rib_process (zebra_rib.c:1154)
==23== by 0x168D72: process_subq_route (zebra_rib.c:2039)
==23== by 0x168E92: process_subq (zebra_rib.c:2078)
==23== by 0x168F5B: meta_queue_process (zebra_rib.c:2112)
==23== Block was alloc'd at
==23== at 0x4837B65: calloc (vg_replace_malloc.c:752)
==23== by 0x48AED56: qcalloc (memory.c:110)
==23== by 0x15B07B: zebra_nhg_copy (zebra_nhg.c:307)
==23== by 0x15B13E: zebra_nhg_hash_alloc (zebra_nhg.c:329)
==23== by 0x489A339: hash_get (hash.c:148)
==23== by 0x15B6CA: zebra_nhg_find (zebra_nhg.c:532)
==23== by 0x15C536: zebra_nhg_rib_find (zebra_nhg.c:1070)
==23== by 0x15D89A: nexthop_active_update (zebra_nhg.c:1658)
==23== by 0x167072: rib_process (zebra_rib.c:1154)
==23== by 0x168D72: process_subq_route (zebra_rib.c:2039)
==23== by 0x168E92: process_subq (zebra_rib.c:2078)
==23== by 0x168F5B: meta_queue_process (zebra_rib.c:2112)
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Use a local variable to avoid trying to take the address
of a packed struct member - an address from the ip header
in these cases.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
PR 5118 introduce additional (prepend) keywords
like 'ip' to Route Distinguisher output which
breaks existing evpn route show commands parsing.
Restore to original behavior.
Testing Done:
vtysh -c 'show bgp l2vpn evpn route'
Before fix:
Route Distinguisher: ip 27.0.0.15:44
Post fix:
Route Distinguisher: 27.0.0.15:44
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
The opaque lsa that we are storing is stored on various
lists depending on it's type. This removal from the
list was being done *after* the pointer was freed. This
is not a good idea. Since the use after free was just
removal from a linked list and the freeing function does
not do anything other than free data, than just switching the function
order should be sufficient.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
This reverts commit 7d5bb02b1a.
Allow zebra to actually maintain the nexthop group in the
linux kernel.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>