Rename functions (`pim_msdp_peer_new` => `pim_msdp_peer_add` and
`pim_msdp_peer_do_del` => `pim_msdp_peer_del`) to keep consistency and
update the `pim_msdp_peer_add` documentation to tell users that this is
also used for non meshed group peers.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Fully utilize the northbound to hold pointers to our private data
instead of searching for data structures every time we need to change a
configuration.
Highlights:
* Support multiple mesh groups per PIM instance (instead of one)
* Use DEFPY instead of DEFUN to reduce code complexity
* Use northbound private pointers to store data structures
* Reduce callback names size
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
When running pim on an interface and that interface has
state and we move that interface into a different vrf
there exists a call path where we have not created the pimreg
device yet. Prevent a crash in this rare situation.
Ticket: #2552763
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Compile with v2.0.0 tag of `libyang2` branch of:
https://github.com/CESNET/libyang
staticd init load time of 10k routes now 6s vs ly1 time of 150s
Signed-off-by: Christian Hopps <chopps@labn.net>
VRF creation can happen from either cli or from
knowledged about the vrf learned from zebra.
In the case where we learn about the vrf from
the cli, the vrf id is UNKNOWN. Upon actual
creation of the vrf, lib/vrf.c touches up the vrf_id
and calls pim_vrf_enable to turn it on properly.
At this point in time we have a pim->vrf_id of
UNKNOWN and the vrf->vrf_id of the right value.
There is no point in duplicating this data. So just
remove all pim->vrf_id and use the vrf->vrf_id instead
since we keep a copy of the pim->vrf pointer.
This will remove some crashes where we expect the
pim->vrf_id to be usable and it's not.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
When creating configuration for a vrf *Before* the vrf has been
created, pim will not properly create the pimreg device and
we will promptly crash when we try to pass data.
Put some code checks in place to ensure that the pimreg is
created for vrf's.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
The router->register_suppress_time is used to derive the
rp_keep_alive_time, but when the suppress time was changed, pim was
not recalculating the rp_keep_alive_time and left it at the old value.
This fix applies the changes when a new suppress_time is entered
(or removed.)
Signed-off-by: Don Slice <dslice@nvidia.com>
Problem reported that when certain pim commands were entered, they
showed up duplicated in the configuration both under default instance
and every vrf (whether pim was used there or not.) This was because
these particular parameters are global only and the function doing
the display would repeat for each vrf. This fix only displays those
in the default case (and removes them from the help for entering
under a vrf.)
Signed-off-by: Don Slice <dslice@nvidia.com>
Just some cleanup before I touch this code; switching to typesafe list
macros & putting the data directly inline.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
... yes we may need it later, but if and when that happens we can put it
back there. No point carrying around unused things.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Display the MSDP peer configuration in `show running-config` so it can
be saved on configuration write.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
encoding signed int as unsigned is bad practice; since we want to do
it here lets at least be explicit about it
Signed-off-by: Quentin Young <qlyoung@nvidia.com>
`config.h` has all the defines from autoconf, which may include things
that switch behavior of other included headers (e.g. _GNU_SOURCE
enabling prototypes for additional functions.)
So, the first include in any `.c` file must be either `config.h` (with
the appropriate guard) or `zebra.h` (which includes `config.h` first
thing.)
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
... by referencing all autogenerated headers relative to the root
directory. (90% of the changes here is `version.h`.)
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Most of these are many, many years out of date. All of them vary
randomly in quality. They show up by default in packages where they
aren't really useful now that we use integrated config. Remove them.
The useful ones have been moved to the docs.
Signed-off-by: Quentin Young <qlyoung@nvidia.com>
Issue:
User is allowed to configure only hello without hold timer but when undo
config, the hold timer is mandatory as shown below:
FRR-4(config-if)# ip pim hello 10
<cr>
(1-180) Time in seconds for Hold Interval
FRR-4(config-if)# ip pim hello 10
FRR-4(config-if)# no ip pim hello 10
(1-180) Time in seconds for Hold Interval
FRR-4(config-if)# no ip pim hello 10
% Command incomplete: no ip pim hello 20
Fix:
Making the hold timer as optional when undo config.
Signed-off-by: Mobashshera Rasool <mrasool@vmware.com>
Also included display of hold time in CLI 'show ip pim int <intf>' cmd
and json commands.
Issue:
PIM neighbor not coming up if hold time is less than hello timer
since hello is sent every 4 sec and hold is 1 sec,
because of this nbr is flapping
Fix:
Do not allow configuration of hold timer less than hello timer
Also reset the value of hold timer to 3.5 times to hello whenever
only hello is modified so that the relationship holds good.
Signed-off-by: Mobashshera Rasool <mrasool@vmware.com>
Back when I put this together in 2015, ISO C11 was still reasonably new
and we couldn't require it just yet. Without ISO C11, there is no
"good" way (only bad hacks) to require a semicolon after a macro that
ends with a function definition. And if you added one anyway, you'd get
"spurious semicolon" warnings on some compilers...
With C11, `_Static_assert()` at the end of a macro will make it so that
the semicolon is properly required, consumed, and not warned about.
Consistently requiring semicolons after "file-level" macros matches
Linux kernel coding style and helps some editors against mis-syntax'ing
these macros.
Signed-off-by: David Lamparter <equinox@diac24.net>
There are places in the code where function nb_running_get_entry is used
with abort_if_not_found set to true during the config validation stage.
This is incorrect because when used in transactional CLI, the running
entry won't be set until the apply stage, and such usage leads to crash.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
Recent change in commit: 6b73800ba2
Caused this error to pop up in pim_igmp_mtrace.c:
error: taking address of packed member 'rsp_addr' of class or structure 'igmp_mtrace' may result in an unaligned pointer value [-Werror,-Waddress-of-packed-member]
Follow the pattern used in the code to solve this problem for clang
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
This was found while doing libyang2 work (causes assert); however, it is
also incorrect for libyang1 (empty canonical value for incorrectly
referenced interface vs interface-name node).
While here, fix 2 other incorrect uses of "." on a container node.
Signed-off-by: Christian Hopps <chopps@labn.net>
We can proactively check whether this mroute will be nacked by loopfree
MFC checks so let's do it in the apply phase and emit a useful error
message.
Signed-off-by: Quentin Young <qlyoung@nvidia.com>
When the control plane protocol is created, the vrf structure is
allocated, and its address is stored in the northbound node.
The vrf structure may later be deleted by the user, which will lead to
a stale pointer stored in this node.
Instead of this, allow daemons that use the vrf pointer to register the
dependency between the control plane protocol and vrf nodes. This will
guarantee that the nodes will always be created and deleted together, and
there won't be any stale pointers.
Add such registration to staticd and pimd.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
Neither tabs nor newlines are acceptable in syslog messages. They also
break line-based parsing of file logs.
Signed-off-by: David Lamparter <equinox@diac24.net>
Modify code to add JSON format output in show command.
"show ip igmp [vrf NAME] join" and "show ip igmp vrf all join" with proper formatting
Signed-off-by: Sai Gomathi <nsaigomathi@vmware.com>
Valgrind reports:
469901-==469901==
469901-==469901== Conditional jump or move depends on uninitialised value(s)
469901:==469901== at 0x3A090D: bgp_bfd_dest_update (bgp_bfd.c:416)
469901-==469901== by 0x497469E: zclient_read (zclient.c:3701)
469901-==469901== by 0x4955AEC: thread_call (thread.c:1684)
469901-==469901== by 0x48FF64E: frr_run (libfrr.c:1126)
469901-==469901== by 0x213AB3: main (bgp_main.c:540)
469901-==469901== Uninitialised value was created by a stack allocation
469901:==469901== at 0x3A0725: bgp_bfd_dest_update (bgp_bfd.c:376)
469901-==469901==
469901-==469901== Conditional jump or move depends on uninitialised value(s)
469901:==469901== at 0x3A093C: bgp_bfd_dest_update (bgp_bfd.c:421)
469901-==469901== by 0x497469E: zclient_read (zclient.c:3701)
469901-==469901== by 0x4955AEC: thread_call (thread.c:1684)
469901-==469901== by 0x48FF64E: frr_run (libfrr.c:1126)
469901-==469901== by 0x213AB3: main (bgp_main.c:540)
469901-==469901== Uninitialised value was created by a stack allocation
469901:==469901== at 0x3A0725: bgp_bfd_dest_update (bgp_bfd.c:376)
On looking at bgp_bfd_dest_update the function call into bfd_get_peer_info
when it fails to lookup the ifindex ifp pointer just returns leaving
the dest and src prefix pointers pointing to whatever was passed in.
Let's do two things:
a) The src pointer was sometimes assumed to be passed in and sometimes not.
Forget that. Make it always be passed in
b) memset the src and dst pointers to be all zeros. Then when we look
at either of the pointers we are not making decisions based upon random
data in the pointers.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
RCA: There were 2 problems.
1. SGRpt prune expiry didn't create S,G entry with none oil when no other
interfaces were part of the oil.
2. When restarting the timer with new hold value, comparision was missing and
old timer was not stopping.
Fix:
SGRpt Prune pending expiry will put SG entry with none oil if no other
Signed-off-by: Saravanan K <saravanank@vmware.com>
interfaces present. If present we will be deleting the inherited oif from oil.
Deleting the oif in that scenario will take care of changing mroute.
When alone interface expires in SGRpt prune pending state, we shall detect by
checking installed flag. if not installed, install mroute.
Valgrind is reporting this:
==22220== Invalid read of size 4
==22220== at 0x11DC2B: pim_if_delete (pim_iface.c:215)
==22220== by 0x11DD71: pim_if_terminate (pim_iface.c:76)
==22220== by 0x128E03: pim_instance_terminate (pim_instance.c:66)
==22220== by 0x128E03: pim_vrf_delete (pim_instance.c:159)
==22220== by 0x48E0010: vrf_delete (vrf.c:251)
==22220== by 0x48E0010: vrf_delete (vrf.c:225)
==22220== by 0x48E02FE: vrf_terminate (vrf.c:551)
==22220== by 0x149495: pim_terminate (pimd.c:142)
==22220== by 0x13C61B: pim_sigint (pim_signals.c:44)
==22220== by 0x48CF862: quagga_sigevent_process (sigevent.c:103)
==22220== by 0x48DD324: thread_fetch (thread.c:1404)
==22220== by 0x48A926A: frr_run (libfrr.c:1122)
==22220== by 0x11B85E: main (pim_main.c:167)
==22220== Address 0x5912160 is 1,200 bytes inside a block of size 1,624 free'd
==22220== at 0x48369AB: free (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==22220== by 0x128E52: pim_instance_terminate (pim_instance.c:74)
==22220== by 0x128E52: pim_vrf_delete (pim_instance.c:159)
==22220== by 0x48E0010: vrf_delete (vrf.c:251)
==22220== by 0x48E0010: vrf_delete (vrf.c:225)
==22220== by 0x48F1353: zclient_vrf_delete (zclient.c:1896)
==22220== by 0x48F1353: zclient_read (zclient.c:3511)
==22220== by 0x48DD826: thread_call (thread.c:1585)
==22220== by 0x48A925F: frr_run (libfrr.c:1123)
==22220== by 0x11B85E: main (pim_main.c:167)
==22220== Block was alloc'd at
==22220== at 0x4837B65: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==22220== by 0x48ADA4F: qcalloc (memory.c:110)
==22220== by 0x128B9B: pim_instance_init (pim_instance.c:82)
==22220== by 0x128B9B: pim_vrf_new (pim_instance.c:142)
==22220== by 0x48E0C5A: vrf_get (vrf.c:217)
==22220== by 0x48F13C9: zclient_vrf_add (zclient.c:1863)
==22220== by 0x48F13C9: zclient_read (zclient.c:3508)
==22220== by 0x48DD826: thread_call (thread.c:1585)
==22220== by 0x48A925F: frr_run (libfrr.c:1123)
==22220== by 0x11B85E: main (pim_main.c:167)
On pim vrf deletion, ensure that the vrf->info pointers are NULL as well
as the free'd pim pointer for ->vrf is NULL as well.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Issue: #7892 has a startup config where only igmp
interfaces are created and a igmp report comes in.
Effectively we are not creating the regiface device unless
we do a `ip pim`. This is incorrect we should always create
the regiface.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
1. As per RFC 4601 Sec 4.5.7:
* JoinDesired(S,G) -> False, set SPTbit to false.
2. Change the debug type.
Signed-off-by: Mobashshera Rasool <mrasool@vmware.com>
This command has been added in the context of
PIM BSM functionality. This command will clear the
data structs having bsr information.
Co-authored-by: Sarita Patra <saritap@vmware.com>
Signed-off-by: vishaldhingra <vdhingra@vmware.com>
Test case 5.10 sends leave message to unicast address, the leave
packet is accepted and a query message is sent in response to this.
No validation for address is present in the function
Add check for addresses as per RFC. Leave messages are allowed only
sent to either ALL-ROUTERS (224.0.0.2) or group address.
Signed-off-by: Mobashshera Rasool <mrasool@vmware.com>
1. When a node changes from non-DR to DR in the given topology,
the node was receiving both PIM Join as well as IGMP join.
Since it was already receiving PIM Join previously, ifchannel was
already present. Hence when it becomes DR, the IGMP source flag is not
set due to issue in the code. Hence it never creates S,G entry thinking
that it is not DR.
2. When pim join expires, the pim flag is not reset when ifchannel is not
deleted.
Issue: #7752
Signed-off-by: Mobashshera Rasool <mrasool@vmware.com>
The flag "R" in below display description denotes the
SGRpt Pruned but the description shows something else
hence correcting the definition.
R2(config)# do show ip mroute
IP Multicast Routing Table
Flags: S- Sparse, C - Connected, P - Pruned
R - RP-bit set, F - Register flag
Signed-off-by: Mobashshera Rasool <mrasool@vmware.com>
Issue:
Configure msdp mesh mg1 for 2 groups.
ip msdp mesh-group mg1 member 1.1.1.1
ip msdp mesh-group mg1 member 1.1.1.2
Remove mg1 for 1.1.1.1
no ip msdp mesh-group mg1 member 1.1.1.1
mg1 for 1.1.1.1 still present. This is fixed.
Signed-off-by: Sarita Patra <saritap@vmware.com>
Issue:
Configure RP.
ip pim rp 20.0.0.1 239.1.1.1/32
ip pim rp 20.0.0.1 239.1.1.2/32
Remove RP.
no ip pim rp 20.0.0.1 239.1.1.1/32
Rp is not removed. This is fixed now.
Signed-off-by: Sarita Patra <saritap@vmware.com>
The pim_version.[c|h] files are never used and we are getting
warnings about PIM_VERSION changing pointer sizes from
newer versions of the compiler. I see no reason to keep this
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
pim is enabled internally/implicitly on the vxlan termination device
and displaying that can confuse the admin and tools (such as
frr-reload).
Ticket: CM-30180
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
If we screw up and don't have the right flags we'll print
out garbage. At the very least just print out nothing.
Signed-off-by: Donald Sharp <sharp@nvidia.com>
Issue: When an IGMPv2 leave packet is received, it did not validate
the checksum and hence the packet is accepted and group specific
query is sent out in response to this.
Due to this IGMP conformance test case 6.1 failed.
https://github.com/FRRouting/frr/issues/6868
Fix: Validate the checksum for all IGMP packets
Signed-off-by: Mobashshera Rasool <mrasool@vmware.com>
The `enum zclient_send_status` enum needs to be extended
throughout the code base to use the new states and
to fix up places where we tested against the return
value being non zero.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
problem :
=========
When (*,G) prune received where we have SGRpt state,
ifchannel goes to NO_INFO state and doesn't get removed.
Root cause :
============
During the processing of (*,G) prune, we are not removing the
ifchannel on PruneTmp or PrunePendingTmp state.
Fix :
=====
In that scenario, stop joinExpiry timer and delete the ifchannel.
issue #7347
Co-authored-by: Saravanan K <saravanank@vmware.com>
Signed-off-by: vishaldhingra <vdhingra@vmware.com>
Replace all lib/thread cancel macros, use thread_cancel()
everywhere. Only the THREAD_OFF macro and thread_cancel() api are
supported. Also adjust thread_cancel_async() to NULL caller's pointer (if
present).
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Create appropriate accessor functions for the rn->lock
data. We should be accessing this data through accessor
functions since it is private data to the data structure.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
We have this pattern in the code base:
if (thread)
THREAD_OFF(thread);
If we look at THREAD_OFF we check to see if thread
is non-null too. So we have a double check.
This is unnecessary. Convert to just using THREAD_OFF
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
* If the MSDP peer receives the SA from a non-RPF peer towards the
originating RP, it will drop the message.
* SA messages are forwarded away from the RP address only.
* SA messages are not forwarded within the mesh group.
* Preventing the MSDP connection from being dropped due to RPF check
failure (RFC3618, section 13 "MSDP Error Handling")
Signed-off-by: Adriano Marto Reis <adrianomarto@gmail.com>
Signed-off-by: Adriano Reis <areis@barrukka.local>
This should never happen; no need to debug guard it and it's not a
warning, if this isn't working then NHT is not working at all.
Signed-off-by: Quentin Young <qlyoung@nvidia.com>
Clean up the rare situation when bind fails to not
close the fd that was just opened and have the socket
leaked.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
We use the pim mroute socket for kernel notifications of events.
Currently this is limited to 8 bits of data. There are patches
coming down the pike in kernel land to allow this to expand.
Rather than fix this and all the other places we assume MAXVIFS < 256
in the pim code right now. Leave a land mine for the developer
doing this work to point them in the right direction.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Match by exact address rather than by prefix match to
determine if we generated the IGMPP query. Othwerwise
we will be ignoring IGMP queries coming from other
hosts on the same subnet.
Signed-off-by: Nathan Bahr <nbahr@atcorp.com>
Reviewed-by: Jafar Al-Gharaibeh <jafar@atcorp.com>
IGMP queries should contain the source address of the IGMP socket
they are being sent from.
Added binding the IGMP sockets to their specific source, otherwise
interfaces with multiple addresses will send multiple queries using
the same source, which is determined by the kernel.
Signed-off-by: Nathan Bahr <nbahr@atcorp.com>
Reviewed-by: Jafar Al-Gharaibeh <jafar@atcorp.com>
IGMP packets received from a source that does not match the subnet
of any configured addresses on the receive interface should be
ignored.
Also, find and use the correct IGMP socket object for the received
IGMP packet.
Signed-off-by: Nathan Bahr <nbahr@atcorp.com>
Signed-off-by: Jafar Al-Gharaibeh <jafar@atcorp.com>
Suppose you have more than 2 addresses on a pim interface:
lo up default 10.255.0.1/32
10.255.0.101/32
10.255.0.254/32
A `show ip pim int lo` gives us this:
eva# show ip pim interface lo
Interface : lo
State : up
Address : 10.255.0.1 (primary)
10.255.0.101/32
When we go look at the code that pulls secondary addresses in
we are using a prefix_cmp to know if we know about a secondary already
but were expecting true values instead of -1/0/1 being returned.
Modify code so that pim sees all secondary addresses
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
pimd crash at pim_msg_build_jp_groups (
grp=grp@entry=0x7ffca55b5d1e, sgs=sgs@entry=0x17821a0, size=20)
at pimd/pim_msg.c:198
Fix for https://github.com/FRRouting/frr/issues/6849
Root Cause:
===========
pimd has crashed because pim_upstream_rpf_clear function sets the
up->rpf.source_nexthop.interface pointer to NULL and has not removed
the upstream source node from the neighbor. When the upstream gets
deleted the source is not removed from neighbor
neigh->upstream_jp_agg->groups->sources list. This source node has
pointer to upstream freed memory. Hence when on_neighbor_jp_timer expires,
it tries to access the upstream pointer and crashed.
Fix:
====
Before setting the interface pointer to NULL, remove the node from
neigh->upstream_jp_agg->groups->sources list. Also the upstream state
has to be changed to Not joined.
Removed extra line changes.
Signed-off-by: Mobashshera Rasool <mrasool@vmware.com>
Move pim and igmp yang files registery to appropriate makefiles.
In yang directory makefile move under `PIMD`
Remove pimd yang files from library makefile instead move them
to pimd makefile.
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
There are couple spots where group may be NULL and
when we output strings associated with it we should
ensure we are not doing something stupid.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When creating a pim instance, we were allocating table information
but never freeing it. Do so.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Remove mid-string line breaks, cf. workflow doc:
.. [#tool_style_conflicts] For example, lines over 80 characters are allowed
for text strings to make it possible to search the code for them: please
see `Linux kernel style (breaking long lines and strings)
<https://www.kernel.org/doc/html/v4.10/process/coding-style.html#breaking-long-lines-and-strings>`_
and `Issue #1794 <https://github.com/FRRouting/frr/issues/1794>`_.
Scripted commit, idempotent to running:
```
python3 tools/stringmangle.py --unwrap `git ls-files | egrep '\.[ch]$'`
```
Signed-off-by: David Lamparter <equinox@diac24.net>
BFD profiles can now be used on the interface level like this:
interface eth1
ip router isis 1
isis bfd
isis bfd profile default
Here the 'default' profile needs to be specified as usual in the
bfdd configuration.
Signed-off-by: GalaxyGorilla <sascha@netdef.org>
This section of code was ported incorrectly and is causing
issues when running against some tests. Fix the missed
code.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Don't crash when trying to `show running-config` because of missing
filter northbound integration.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Issue: After SPT switchover, do shut and no-shut the received connected
interface, traffic stops.
R2
| |
+ +
Client-----R1--------R3----Source
R2 is RP.
Root cause:
Client is sending join for G and Source is sending traffic for G.
Before SPT switchover, traffic flows R3-R2-R1, after SPT switchover,
traffic flows R3-R1. Now Check in R2, there will be 2 ifchannel gets
created. first is (*, G) ifchannel which gets created because of (*, G)
join received from R1, second is (S, G) ifchannel which gets created
because of (s,g,rpt) prune received from R1
Shut the receiver connected interface on R1, R1 will send a (*, G) prune
towards RP (R2). On receiving (*, G) prune, R2 deletes the (*, G) ifchannel.
(s,g) ifchannel with flag (s,g,rpt) set will be timeout after the prune timer
expires. Before this timer expires, do noshut the received connected inrterface
on R1. R1 will send a (*,G) join to R2(RP), So oil will be updated in (*, G),
but wont get updated in (s,g) since the flag (s,g,rpt) is set. So traffic flow
stops.
Fix: When (*, G) ifchannel is getting deleted because of (*, G) prune
received, as (*,G) prune indicates that the router no longer wishes
to receive shared tree traffic, so clear (S,G,RPT) flag on all the child (S,G)
ifchannel, which was created because of (S,G,RPT) prune received
Signed-off-by: Sarita Patra <saritap@vmware.com>
Found some more missing code that got dropped during the
upstreaming process causing issues with things actually
working.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When the mlag code was ported upstream the structure
of the calls in igmp_source_forward_start had been
reset and as a result we missed the call to check
that creating an ifchannel when we are DualActive
is allowed.
Ticket: CM-29941
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Description:
"show ip mroute" displays only installed(kernel) mroutes, where
as "show ip mroute json" diplays both installed and not installed
mroutes in the o/p.To make this consistant, diplaying only valid
routes in json o/p.
Signed-off-by: Rajesh Girada <rgirada@vmware.com>
Description:
Added json support for the following PIM commands.
1. show ip mroute [vrf NAME] count [json]
2. show ip mroute vrf all count [json]
3. show ip mroute [vrf NAME] summary [json]
4. show ip mroute vrf all summary [json]
Signed-off-by: Rajesh Girada <rgirada@vmware.com>
We were using thread_cancel() directly instead of
THREAD_OFF which correctly ensures the pointer is not NULL.
Ticket:CM-29866
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Added a new show command "show ip multicast", display the multicast data
packet in and out on interface level.
Signed-off-by: Sarita Patra <saritap@vmware.com>
These are easy to get subtly wrong, and doing so can cause
nondeterministic failures when racing in parallel builds.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Replace sprintf with snprintf where straightforward to do so.
- sprintf's into local scope buffers of known size are replaced with the
equivalent snprintf call
- snprintf's into local scope buffers of known size that use the buffer
size expression now use sizeof(buffer)
- sprintf(buf + strlen(buf), ...) replaced with snprintf() into temp
buffer followed by strlcat
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Replace all `random()` calls with a function called `frr_weak_random()`
and make it clear that it is only supposed to be used for weak random
applications.
Use the annotation described by the Coverity Scan documentation to
ignore `random()` call warnings.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
It is possible that a if_lookup_by_index can return NULL
ensure that we handle this appropriately.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
And again for the name. Why on earth would we centralize this, just so
people can forget to update it?
Signed-off-by: David Lamparter <equinox@diac24.net>
Same as before, instead of shoving this into a big central list we can
just put the parent node in cmd_node.
Signed-off-by: David Lamparter <equinox@diac24.net>
There is really no reason to not put this in the cmd_node.
And while we're add it, rename from pointless ".func" to ".config_write".
[v2: fix forgotten ldpd config_write]
Signed-off-by: David Lamparter <equinox@diac24.net>
The only nodes that have this as 0 don't have a "->func" anyway, so the
entire thing is really just pointless.
Signed-off-by: David Lamparter <equinox@diac24.net>
Issue: no ip msdp mesh-group <word> source command
deleting the mesh group, which might be used by the member.
Solution: no ip msdp mesh-group <word> source command, deletes
the mesh-group source.
Add a new cli command "no ip msdp mesh-group <word>" to delete
the mesh group.
Signed-off-by: Sarita Patra <saritap@vmware.com>
JSON output for igmp group display is modified as follows for better python handling.
1. Under each interface make array of group as an object
2. Include source and group inside each group object
These improvements are required for the set of topotest cases which will be upstreamed shortly.
Signed-off-by: Saravanan K <saravanank@vmware.com>
This CLI will allow user to configure a igmp group limit which will generate
a watermark warning when reached.
Though watermark may not make sense without setting a limit, this
implementation shall serve as a base to implementing limit in future and helps
tracking a particular scale currently.
Testing:
=======
ip igmp watermark-warn <10-60000>
on reaching the configured number of group, pim will issue warning
2019/09/18 18:30:55 PIM: SCALE ALERT: igmp group count reached watermak limit: 210(vrf: default)
Also added group count and watermark limit configured on cli - show ip igmp groups [json]
<snip>
Sw3# sh ip igmp groups json
{
"Total Groups":221, <=====
"Watermark limit":210, <=========
"ens224":{
"name":"ens224",
"state":"up",
"address":"40.0.0.1",
"index":6,
"flagMulticast":true,
"flagBroadcast":true,
"lanDelayEnabled":true,
"groups":[
{
"source":"40.0.0.1",
"group":"225.1.1.122",
"timer":"00:03:56",
"sourcesCount":1,
"version":2,
"uptime":"00:00:24"
<\snip>
<snip>
Sw3(config)# do sh ip igmp group
Total IGMP groups: 221
Watermark warn limit(Set) : 210
Interface Address Group Mode Timer Srcs V Uptime
ens224 40.0.0.1 225.1.1.122 ---- 00:04:06 1 2 00:13:22
ens224 40.0.0.1 225.1.1.144 ---- 00:04:02 1 2 00:13:22
ens224 40.0.0.1 225.1.1.57 ---- 00:04:01 1 2 00:13:22
ens224 40.0.0.1 225.1.1.210 ---- 00:04:06 1 2 00:13:22
<\snip>
Signed-off-by: Saravanan K <saravanank@vmware.com>
Valid range for hashmasklen is 0-32 under IPv4; failure to validate this
results in a negative bitshift later
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
This is a full rewrite of the "back end" logging code. It now uses a
lock-free list to iterate over logging targets, and the targets
themselves are as lock-free as possible. (syslog() may have a hidden
internal mutex in the C library; the file/fd targets use a single
write() call which should ensure atomicity kernel-side.)
Note that some functionality is lost in this patch:
- Solaris printstack() backtraces are ditched (unlikely to come back)
- the `log-filter` machinery is gone (re-added in followup commit)
- `terminal monitor` is temporarily stubbed out. The old code had a
race condition with VTYs going away. It'll likely come back rewritten
and with vtysh support.
- The `zebra_ext_log` hook is gone. Instead, it's now much easier to
add a "proper" logging target.
v2: TLS buffer to get some actual performance
Signed-off-by: David Lamparter <equinox@diac24.net>
Line break at the end of the message is implicit for zlog_* and flog_*,
don't put it in the string. Mid-message line breaks are currently
unsupported. (LF is "end of message" in syslog.)
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Problem: This happened in once in a while during testing the scenario multiple
times. When regstop timer expire and at that point if rpf interface doesn't
exist, the register state for the upstream gets struck in reg-prune state indefinitely.
This will not recover even when rpf comes back and traffic resumed because
register state is struck on prune.
RCA: Reg suppression expiry is keeping reg state unchanged when iif is absent.
Fix: When iif is absent during reg suppression expiry, treat it as couldreg
becoming false and move it NO_INFO state.
Signed-off-by: Saravanan K <saravanank@vmware.com>
Problem: output is cut short when prefix string all octets are 3 digit.
RCA: Buffer was allocated only to hold ip addr str.
Fix: Added 3 bytes more to hold prefix length and a /.
Modified buffer in 'show ip pim bsrp-info' and 'show ip pim bsm database'
Signed-off-by: Saravanan K <saravanank@vmware.com>
Zebra is currently sending messages on interface add/delete/update,
VRF add/delete, and interface address change - regardless of whether
its clients had requested them. This is problematic for lde and isis,
which only listens to label chunk messages, and only when it is
waiting for one (synchronous client). The effect is the that messages
accumulate on the lde synchronous message queue.
With this change:
- Zebra does not send unsolicited messages to synchronous clients.
- Synchronous clients send a ZEBRA_HELLO to zebra.
The ZEBRA_HELLO contains a new boolean field: sychronous.
- LDP and PIM have been updated to send a ZEBRA_HELLO for their
synchronous clients.
Signed-off-by: Karen Schoener <karen@voltanet.io>
This is as per RFC. This is identified when conformance suite catched join.
RCA:
Packets were processed without checking allowed dest IP for that packet.
Fix:
Added check for dest IP
Converted this check to a function
Signed-off-by: Saravanan K <saravanank@vmware.com>
RCA: preferred bsr routine, compare address in network byte order
Fix: changed to host format before comparision.
Testing:
Verified between 1.1.2.7 and 10.2.1.1, 10.2.1.1 is chosen as bsr
Initially:
R11# sh ip pim bsr
PIMv2 Bootstrap information
Current preferred BSR address: 1.1.2.7
Priority Fragment-Tag State UpTime
0 2862 ACCEPT_PREFERRED 00:00:30
Last BSM seen: 00:00:30
After next bsr started:
R11# sh ip pim bsr
PIMv2 Bootstrap information
Current preferred BSR address: 10.2.1.1
Priority Fragment-Tag State UpTime
0 3578 ACCEPT_PREFERRED 00:00:01
Last BSM seen: 00:00:01
R11# sh ip pim bsr
PIMv2 Bootstrap information
Current preferred BSR address: 10.2.1.1
Priority Fragment-Tag State UpTime
0 3578 ACCEPT_PREFERRED 00:00:04
Last BSM seen: 00:00:04
Signed-off-by: Saravanan K <saravanank@vmware.com>
Problem:
When the ifchannel is in SGRpt prune, if we receive a join, we go into no info
state but mroute still present with none oil
Join Prune Expiry timer on the ifchannel was still running when
Prune pending expired. This causes ifchannel not to be deleted and hence mroute.
Fix:
Stop expiry timer when we move into NOINFO state and delete the ifchannel.
Signed-off-by: Saravanan K <saravanank@vmware.com>
RCA: Upstreams which are in register state other than noinfo, doesnt remove
register tunnel from oif after it becomes nonDR
Fix: scan upstreams with iif as the old dr and check if couldReg becomes false.
If couldreg becomes false from true, remove regiface and stop reg timer.
Do not disturb the entry. Later the entry shall be removed by kat expiry.
Signed-off-by: Saravanan K <saravanank@vmware.com>
RCA:
Either JP timer is used to send join or join timer.
We are not removing the group from jp aggregate during suppression.
So even if join timer is restarted, jp aggregate expiry during suppression
is sending join for the group.
Fix:
Remove the group from jp aggregate on the neighbor during jp suppression.
Signed-off-by: Saravanan K <saravanank@vmware.com>
RCA: This was todo item in current code base
Fix: Hello sent with 0 hold time before we update the pim ifp primary address
Signed-off-by: Saravanan K <saravanank@vmware.com>
RCA: starg join fell in to SGRpt join check and was treated as SGRpt join
so ifchannel state machine moved from prunepending to noinfo.
Fix: Check if it is starg join and process
Signed-off-by: Saravanan K <sarav511@vmware.com>
RCA:
Trying to get primary address for the interface.
Unnumbered interface pick address from vrf device for non default.
While doing so it ends up in recursion without exit condition if vrf dev doesnt have any address.
Solution:
Break the recursion by checking if it is vrf device.
Signed-off-by: Saravanan K <saravanank@vmware.com>
RCA:
It has asserted because during neighbor delete on a interface,
pim_number_of_nonlandelay_neighbors count has become less less than 0.
This was due to not updating the count when hello option changed.
Fix:
During hello option update, check and increment or decrement when this option changes.
Signed-off-by: Saravanan K <saravanank@vmware.com>
RCA:
1. Prune processing during join state was putting of join expiry timer
2. Join received during prune pending state was not comparing hold time with remaining expiry timer.
Fix:
Fixed as per RFC 4601/7761
Signed-off-by: Saravanan K <saravanank@vmware.com>
RCA: We are ignoring (*,G) prune when the RP in packet is not RP(G)
Fix:
According to RFC 4601 Section 4.5.2:
Received Prune(*,G) messages are processed even if the RP in the message does not match RP(G).
We have allowed starg prune to be processed in the scenario.
Signed-off-by: Saravanan K <saravanank@vmware.com>
RCA: Periodic join is mostly sent by nbr jp timer except for few scenarios by upstream join timer
Fix: If join timer not running, we have to use nbr jp timer to calculate
remaining time for next join.
Signed-off-by: Saravanan K <saravanank@vmware.com>
Issue: Client1------LHR-----(int-1)RP(int-2)------client2
Client2 send IGMP join for group G.
Client1 send IGMP join for group G.
verify show ip mroute in RP, will have 2 OIL.
Client2 send IGMP leave.
Verify show ip mroute in RP, will still have 2.
Root cause: When RP receives IGMP join from client2, it creates
a (s,g) channel oil and add the interface int-2 into oil list and
set the flag PIM_OIF_FLAG_PROTO_IGMP to int-2
Client1 send IGMP join, LHR will send a (*,G) join to RP. RP will
add the interface int-1 into the oil list of (s,g) channel_oil and
will set the flag PIM_OIF_FLAG_PROTO_IGMP and PIM_OIF_FLAG_PROTO_PIM
to the int-1 and set PIM_OIF_FLAG_PROTO_PIM to int-2 as well. It is
happening because of the pim_upstream_inherited_olist_decide() and
forward_on() get all the oil and update the flag wrongly.
So now when client 2 sends IGMP prune, RP will not remove the int-2
from oil list since both PIM_OIF_FLAG_PROTO_PIM & PIM_OIF_FLAG_PROTO_IGMP
are set, it just unset the flag PIM_OIF_FLAG_PROTO_IGMP.
Fix: Introduced new flags in if_channel, PIM_IF_FLAG_MASK_PROTO_PIM
& PIM_IF_FLAG_MASK_PROTO_IGMP. If a if_channel is created because of
pim join or pim (s,g,rpt) prune received, then set the flag
PIM_IF_FLAG_MASK_PROTO_PIM. If a if_channel is created becuase of IGMP
join received, then set the flag PIM_IF_FLAG_MASK_PROTO_IGMP.
When an interface needs to be added into the oil list check if
PIM_IF_FLAG_MASK_PROTO_PIM or PIM_IF_FLAG_MASK_PROTO_IGMP is set, then
update oil flag accordingly.
Signed-off-by: Sarita Patra <saritap@vmware.com>
Issue 1: "show ip pim interface traffic" not show prune TX/RX
Rootcause : not added the variable in the json
Fix : add prune TX/RX in show ip pim interface traffic json
Issue 2: "show ip pim rp-info" not shows the key iAmRp when it is false
Rootcause: Only display the key when the value is true.
Fix: add iAmRp as false in show ip pim rp-info json
Issue 3: "show ip pim rp-info" not showing outbound interface if it is empty
Rootcause: Only display when there is any OIL
Fix: When RP is not reachable then, the outbound interface is Unknown
The command "show ip pim rp-info json" not displaying the outbound
interafce if it is unknown
Signed-off-by: Sarita Patra <saritap@vmware.com>
S - Sparse Mode
C - indicates there is a member of the group directly connected to the router.
R -set on an (S, G) by the receipt of an (S, G) RP bit prune message.
F -This indicates that this router is a FHR and send register messages to RP to inform RP of this active source
P - OIL list is NULL. That means the router will send a prune.
T - At least one packet received via SPT.
Signed-off-by: Sarita Patra <saritap@vmware.com>
Problem:
We are receiving PIM BSR packet over the pim interface which has no nbrs
According to RFC 5059 Sec 3.4
When a Bootstrap message is forwarded, it is forwarded out of every
multicast-capable interface that has PIM neighbors (including the one
over which the message was received).
RCA:
We are sending to all pim neighbors.
Fix:
We will avoid the interfaces which has no neighbors.
Verification: Manually verified that Pim router doesn't forward to intf with no nbrs
Signed-off-by: Saravanan K <saravanank@vmware.com>
RCA: When configured more than 32(MAXVIS), the inerfaces that are configured
after 32nd interfaces have the value of MAXVIF.
This is used as index to access the free vif tracker of array size 32(MAXVIFS).
So the channel oil list pointer which is present as the next field in pim structure get corrupt, when updating free vif.
This gets accessed during rpf update resulting in crash.
Fix: Refrain from allocating mcast interface structure and throw config error when more than MAXVIFS are attempted to configure.
Max vif checks are exempted for vrf device and pimreg as vrf device will be the first interface and not expected to fail and pimreg has reserved vif.
vxlan tunnel termination device creation has this check and throw warning on max vif.
All other creation are through CLI.
Signed-off-by: Saravanan K <saravanank@vmware.com>
Problem: Route node is not de referenced after search when pim debug events are
not enabled when pim_rp_find_match_group is called. So this memory will not get
released when route node is deleted after hitting this path.
RCA: Dereferencing is done inside debug condition.
Fix: Moving outside debug condition
Signed-off-by: Saravanan K <saravanank@vmware.com>
Root cause: The header display is put in common outside the vtysh/json if-else.
Fix: Brought inside vtysh condition.
Signed-off-by: Saravanan K <saravanank@vmware.com>
Issue: when (*,G) has some receiver and directly connected source sends traffic,
new (S,G) entry created is not inheriting the oil from (*,g)
RCA: pim_mroute_msg_nocache haven't assume FHR have (*,g) member ports
Fix : Added inherit oil from parent from (*,g) receivers to get added
Signed-off-by: Saravanan K <saravanank@vmware.com>
Issue: When any interface is getting added/deleted in the outgoing
interface list, it calls pim_mroute_add() which is updating the
mroute_creation time without checking if the mroute is already
installed in the kernel.
Fix: Check if mroute is already installed, then dont refresh the
mroute_creation timer.
Signed-off-by: Sarita Patra <saritap@vmware.com>
Issue: show ip mroute displays the mroute uptime (time when
mroute installed into the kernel) per oif.
This is confusing.
Fix: Display mroute uptime per (s,g) mroute entry.
Signed-off-by: Sarita Patra <saritap@vmware.com>
There exists a chain of events where calling pim_mlag_up_peer_deref
can free the up pointer. Prevent a use after free by returning
the up pointer as needed and checking to make sure we are
ok.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
On shutdown processing we may have gotten a interface down event
which might clear the rpf interface and we might trigger a
work queue item on the vxlan_sg to send a NULL register.
Ensure that we cannot attempt to do the impossible.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
VRF deletion events here calling hash_clean() with
nothing to clean up the vxlan_sg's associated with it.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When pim receives a register packet, we will apply the
received source to the prefix list. If accepted normal
processing continues. If denied we will send a register
stop message to the source.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
We already use the pim pointer a bunch off of pim_ifp->pim
just add another pim variable to allow us to shorten code
a bit.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The memory type PIM_SPT_PLIST_NAME is specific to
SPT but we are going to store more prefix-list names
in pim, make it generic to allow for less confusion
in the future.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When the (S,G) KAT expires we need to poll for activity before dropping the
entry as traffic may have been forwarded by the dataplane since the last
periodic poll cycle.
This only works if traffic is being forwarded by the kernel i.e. if the
entries were HW accelerated via an ASIC we may still miss out on last
minute activity on the mroute in the HW.
Ticket: CM-26871
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
An mroute can transition from non-origination to a vxlan origination
mroute. In that case we need to re-evaluate if the interfaces in the
OIL need to be muted; pimreg and termination device need to be muted (if
they were previously un-muted).
Dump in a problem state:
=======================
root@TORC11:~# net show pim state
Codes: J -> Pim Join, I -> IGMP Report, S -> Source, * -> Inherited from (*,G), V -> VxLAN, M -> Muted
Active Source Group RPT IIF OIL
1 * 239.1.1.100 y uplink-1 pimreg(I ), ipmr-lo( J )
1 36.0.0.11 239.1.1.100 n peerlink-3.4094 ipmr-lo( * ), uplink-1( J ), uplink-2( J ), peerlink-3.4094( V )
PS: ipmr-lo should have M set in (36.0.0.11,239.1.1.100)
Ticket: CM-26747
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Add a special catch to the test for pim_macro_chisin_pim_include
to allow the LHR to signal interest in joining upstream.
This will allow both the DR and non DR of the ActiveActive
situation to draw traffic to itself.
The non-DR will continue to not forward traffic.
Ticket: CM-26610
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Series of events leading to the problem -
1. (S,G) has been pruned on the rp on downlink-1
2. a (*,G) join is rxed on downlink-1 without the source S. This
results in the (S,G,rpt) prune state being cleared on downlink-1.
As a part of the clear the ifchannel associated with downlink-1
is deleted.
3. The ifchannel_delete handling is expected to add downlink-1
as an inherited OIF to the channel OIL (which it does). However
it is also added in as an immediate OIF (accidentally) as the
ifchannel is still present (in the process of being deleted).
To avoid the problem defer pim_upstream_update_join_desired
evaluation until after the channel is deleted.
Relevant debug logs -
PIM: pim_ifchannel_delete: ifchannel entry (27.0.0.15,239.1.1.106)(downlink-1) del start
PIM: pim_channel_add_oif(pim_ifchannel_delete): (S,G)=(27.0.0.15,239.1.1.106): proto_mask=4 OIF=downlink-1 vif_index=7: DONE
PIM: pimd/pim_oil.c pim_channel_del_oif: no existing protocol mask 2(4) for requested OIF downlink-1 (vif_index=7, min_ttl=1) for channel (S,G)=(27.0.0.15,239.1.1.106)
PIM: pim_upstream_switch: PIM_UPSTREAM_(27.0.0.15,239.1.1.106): (S,G) old: NotJoined new: Joined
PIM: pim_channel_add_oif(pim_upstream_inherited_olist_decide): (S,G)=(27.0.0.15,239.1.1.106): proto_mask=2 OIF=downlink-1 vif_index=7 added to 0x6 >>>>>>>>>>>>>>>>>>
PIM: pim_upstream_del(pim_ifchannel_delete): Delete (27.0.0.15,239.1.1.106)[default] ref count: 2 , flags: 81 c_oil ref count 1 (Pre decrement)
PIM: pim_ifchannel_delete: ifchannel entry (27.0.0.15,239.1.1.106)(downlink-1) del end
Ticket: CM-26732
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Additional protocols were being set on the OIF proto-mask without
logs. Added logs in that area.
Also added start and end logs to ifchannel_delete to help
identify state machine changes that play out as a part of this
event handling.
Ticket: CM-26732
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
This patch does two things:
1) Ensure the decoding of stream data between pim <-> zebra is properly
decoded and we don't read beyond the end of the stream.
2) In zebra when we are freeing memory alloced ensure that we
actually have memory to delete before we do so.
Ticket: CM-27055
Signed-off-by: Satheesh Kumar K <sathk@cumulusnetworks.com>
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Initially, MLAG Sync is happened at pim_ifchannel, this is mainly to
support even config mismatches(missing configuration of dual active).
But this causes more syncs for each entry.
and also it is not In-line with PIM EVPN. to avoid that moving to
pm_upstream based syncing.
Signed-off-by: Satheesh Kumar K <sathk@cumulusnetworks.com>
(S,G) entries that inherit ipmr-lo into the OIL also inherit
the DF role from the parent (*, G) entry.
This change is done primarily to simplify the sync process and
to prevent the MLAG peers from having to track (S, G) activity etc.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
There exists the possibility that a RP exists as a anycast
pair for a lan segment. As such one side may receive
the register and properly handle the registration mechanics.
The one that does not receive the register packets will still
get S,G state and WRVIFWHOLE upcalls across the lan. In
this case notice that we have not received the Registration
packets and prevent nexthop lookups.
Ticket: CM-27466
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
There was some code missed during the upstreaming process
due to code squash. Identify and put into a commit
to keep code consistent and correct.
Signed-off-by: Satheesh Kumar K <sathk@cumulusnetworks.com>
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When the WRVIFWHOLE callback is made with a iifp of the pimreg
device we *know* that the packet is a PIM Register packet
( see net/ipv4/ipmr.c for kernel behavior ). As such
we know that we will shortly read the pim register packet
and handle it through those mechanics. There is nothing
to do here so we can move along.
Ticket: CM-27729
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Issue 1:
1. Enable pim on an interface.
2. Configure query-interval or query max response time,
which results in pimd crash.
Root cause:
1. When pim is enabled on an interface, it creates a igmp socket
with querier_timer and other_querier time as NULL.
2. When query-interval/max_response_time is configured, it call the
function igmp_sock_query_reschedule() to reshedule the query. This
function check either of querier_timer or other_querier timer should
be running. Since in this case both are NULL, it results in crash.
Issue 2:
1. Enable pim on an interface.
2. Execute no ip igmp query-interval or query max response time,
which results in pimd crash.
Root cause:
1. When pim is enabled on an interface, it creates a pim interface
with querier_timer and other_querier time as NULL.
2. When no ip igmp query-interval/max_response_time is executed, it will
check either of querier_timer or other_querier timer should be running.
Since in this case both are NULL, it results in crash.
Fix:
When pim is enabled on an interface, it creates a igmp socket with
mtrace_only as true. So add a check if mtrace_only is true, then don't
reshedule the query.
Signed-off-by: Sarita Patra <saritap@vmware.com>
Issue:
Client---LHR---RP
1. Add kernel route for RP on LHR. Client send join
2. (*,G) will be get created in LHR and RP.
3. Kill the FRR on all the nodes
4. Start FRR only on LHR node
5. In LHR, (*, G) will be created with iif as unknown.
Root cause:
In the step 4, When LHR will receive igmp join, it will call
the function pim_ecmp_fib_lookup_if_vif_index which will look
for nexthop to RP with neighbor needed as false. So RPF lookup will
be true as the route is present in the kernel. It will create a
(*, G) channel_oil with incoming interface as the RPF interface
towards RP and install the (*,G) mroute in kernel.
Along with this (*,G) upstream gets craeted, which call the function
pim_rpf_update, which will look for the nexthop to RP with neighbor
needed as true. As the frr is not running in RP, no neighbor is present
on the nexthop interface. Due to which this will fail and will update
the channel_oil incoming interface as MAXVIFS(32).
Fix:
pim_ecmp_fib_lookup_if_vif_index() call the function pim_ecmp_nexthop_lookup
with neighbor_needed as true.
Signed-off-by: Sarita Patra <saritap@vmware.com>
Issue: REGISTER-STOP Rx is always displaying 0.
Root-cause: pim_ifstat_reg_stop_recv is not getting
incremented when register stop message is received.
Fix: Increment pim_ifstat_reg_stop_recv on receiving
of pim register stop packet.
Signed-off-by: Sarita Patra <saritap@vmware.com>
It's been a year search and destroy.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
- Add extern modifier to some declarations in header file and move
qpim_all_pim_routers_addr definition to pimd/pimd.c
`GCC now defaults to -fno-common. As a result, global variable accesses
are more efficient on various targets. In C, global variables with
multiple tentative definitions now result in linker errors.`
Taken from https://gcc.gnu.org/gcc-10/changes.html
Signed-off-by: Tomas Korbar <tkorbar@redhat.com>
1. show ip pim mlag summary
provides MLAG session information and stats
2. show ip pim mlag upstream
displays the upstream entries synced across the MLAG switches
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
ipmr-lo is an internally added device used for multicast vxlan tunnel
termination. This device is not expected to be managed by the admin
however in the case it is accidentally shut we need to be able handle
it by recovering when it is "no shut" again.
Ticket: CM-24985
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
PIM MLAG DF election API was not being triggered on cost change if the
upstream neighbor remained the same.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
In an anycast VTEP setup the peerlink_rif is added as a static OIF
to the originating mroute (bypassing the pim state machine). This is
needed to ensure both MLAG switches rx a copy of encapsulated BUM flow.
We were not handling link state changes on this static OIF resulting
in the wrong vifi being used in the OIL (because of vifi re-allocation).
This commit re-acts to oper state changes by deleting the OIF on link
down and re-adding it on link up.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
A local membership is created on the vxlan termination device ipmr-lo. This
is done to -
1. Pull multicast vxlan tunnel traffic to the VTEP for termination by
triggering JoinDesired on the BUM multicast group.
2. Include the OIF in the mroute to signal to the dataplane component
that flow needs to be vxlan terminated.
Earlier we were overloading the PIM_UPSTREAM_FLAG_MASK_SRC_IGMP for
this local membership creation but that is creating confusion both in
the state machine and in the show outputs. To avoid that we use the
more apparent PIM_UPSTREAM_FLAG_MASK_SRC_VXLAN_TERM. With this change -
1. We get LHR functionality for VXLAN_TERM mroutes
2. OIF is populated with PIM_OIF_FLAG_PROTO_PIM only
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
When local member is added the (*, G) entry may already be in a JOINED
state. In that case the OIL is not updated i.e. pim_channel_add_oif is
not happening for ipmr-lo. Because of this the traffic associated with
the multicast vxlan tunnel is pulled down to the VTEP but not terminated
by the kernel.
This change force updates the OIL anytime ipmr-lo is added or removed
as a local member.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
This is not causing functional problems but has become a source
of confusion. DF status is only relevant to multicast tunnel decaps.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
The RPF cost is incremented by 10 if the RPF interface is the peerlink-rif.
This is used to force the MLAG switch with the lowest cost to the RPF
to become the MLAG DF. If a switch has to go via the peerlink-rif to get
to the RP or source it simplly cannot be the designated forwarder.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
DF election is only run for (*,G) entries i.e. election is skipped
for (S,G) entries that are setup as a result of SPT switchover. (S,G)
entries inherit the DF role from the parent (*,G) entry. So the DF is
responsible for terminating all sources associated with a group.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
1. Upstream entries associated with tunnel termination mroutes are
synced to the MLAG peer via the local MLAG daemon.
2. These entries are installed in the peer switch (via an upstream
ref flag).
3. DF (Designated Forwarder) election is run per-upstream entry by both
the MLAG switches -
a. The switch with the lowest RPF cost is the DF winner
b. If both switches have the same RPF cost the MLAG role is
used as a tie breaker with the MLAG primary becoming the DF
winner.
4. The DF winner terminates the multicast traffic by adding the tunnel
termination device to the OIL. The non-DF suppresses the termination
device from the OIL.
Note: Before the PIM-MLAG interface was available hidden config was
used to test the EVPN-PIM functionality with MLAG. I have removed the
code to persist that config to avoid confusion. The hidden commands are
still available.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Channel with the MLAG daemon is setup on the first VxLAN BUM MDT or
pim-mlag AA SVI.
This channel is used for -
1. rxing MLAG status status updates (peer state, role etc.)
2. for syncing active-active upstream entries with the peer MLAG
switch.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
The vrrpd one conflicts with the standalone vrrpd package; also we're
installing daemons to /usr/lib/frr on some systems so they're not on
PATH.
Signed-off-by: David Lamparter <equinox@diac24.net>
Update zclient_lookup_nexthop_once() to create the zapi
header using the vrf_id on the pim->vrf struct.
This is the one we do a check on a couple lines before, so
we should be using it when we actually create the header as
well.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Allow pimd to stop the lookup if zebra tells pimd that the
lookup failed due to a zapi error. Otherwise, it will keep
waiting for a nexthop message that will never come.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Convert the upstream_list and hash to a rb tree, Significant
time was being spent in the listnode_add_sort. This reduces
this time greatly.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The channel_oil_list and hash are taking significant
cpu at scale when adding to the sorted list. Replace
with a RB_TREE.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Kernel might not hand us a bad packet, but better safe than sorry here.
Validate the IP header length field. Also adds an additional check that
the packet length is sufficient for an IGMP packet, and a check that we
actually have enough for an ip header at all.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
We check that the IGMP message is sufficently sized for an mtrace query,
but not a response, leading to uninitialized stack read.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Allow 'ip igmp join' to join group for any source if no source is
specified.
Disallow joining source "0.0.0.0" as it is used to define an
any-source multicast group.
Signed-off-by: Liam McBirnie <liam.mcbirnie@boeing.com>
Commit: ddbf3e6060
This commit modified the interface up handling code in
ZAPI such that the zclient handled the decoding for you.
Prior to this commit ospf assumed that it could use the
old ifp pointer to know state before reading the stream.
This lead to a situation where ospf would `smartly` track
and do the right thing in this situation. This commit
changed this assumption and in certain scenarios, say
a interface was changed after it was already up would
lead to situations where ospf would not properly handle
the new interface up.
Modify ospf to track data that is important to it in
it's interface->info pointer.
This code pattern was followed in both eigrp and pim.
In eigrp's case it was just behaving weirdly in any event
so fixing this pattern is not a big deal. In pim's
case it was not properly using this so it's a no-op
to fix.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
use_rpt macro depends on JoinDesired macro and is mostly independent of the
actual RPF interface i.e. doesn't change when the RPF interface changes.
There is however one exception to this handling and that is on the
first hop router (DR or non-DR). On the DR the FHR flag is set so the
RPF interface stays irrelevant to use_rpt eval. But on the non-DR the
IIF is the only way to know we are directly connected to the SG i.e.
to know that we must NOT switch the source to RPT.
This commit fixes up the order of use_rpt eval -
1. it is done before mroute programming
2. but after IIF setup, for SRC_NOCACHE and STATIC_IIF upstream entries
Note: drop an unnecessary check to verify that the RPF interface is
pim enabled. This is just to make the code consistent.
Ticket: CM-27446
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
A variety of buffer overflow reads and crashes
that could occur if you fed bad info into pim.
1) When type is setup incorrectly we were printing the first 8 bytes
of the pim_parse_addr_source, but the min encoding length is
4 bytes. As such we will read beyond end of buffer.
2) The RP(pim, grp) macro can return a NULL value
Do not automatically assume that we can deref
the data.
3) BSM parsing was not properly sanitizing data input from wire
and we could enter into situations where we would read beyond
the end of the buffer. Prevent this from happening, we are
probably left in a bad way.
4) The received bit length cannot be greater than 32 bits,
refuse to allow it to happen.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Inherited OIL is used as a part of the JoinDesired macro. And in FRR we
use the channel OIL as the inherited OIL (to reduce processing overhead
everytime JD needs to be re-evaluated). On a FHR pimreg is a part of the
channel-OIL but must not be used for JD computation.
This commit blacklists pimreg from the inherited_oil i.e. present but
ignored.
Note: This fixup is being done to address topotest failures.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
If a register packet is received that is less than the PIM_MSG_REGISTER_LEN
in size we can have a possible situation where the data being
checksummed is just random data from the buffer we read into.
2019/11/18 21:45:46 warnings: PIM: int pim_if_add_vif(struct interface *, _Bool, _Bool): could not get address for interface fuzziface ifindex=0
==27636== Invalid read of size 4
==27636== at 0x4E6EB0D: in_cksum (checksum.c:28)
==27636== by 0x4463CC: pim_pim_packet (pim_pim.c:194)
==27636== by 0x40E2B4: main (pim_main.c:117)
==27636== Address 0x771f818 is 0 bytes after a block of size 24 alloc'd
==27636== at 0x4C2FB0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==27636== by 0x40E261: main (pim_main.c:112)
==27636==
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When you configure interface configuration without explicitly
configuring pim on that interface, we were not creating the pimreg
interface and as such we would crash in an attempted register
since the pimreg device is non-existent.
The crash is this:
==8823== Invalid read of size 8
==8823== at 0x468614: pim_channel_add_oif (pim_oil.c:392)
==8823== by 0x46D0F1: pim_register_join (pim_register.c:61)
==8823== by 0x449AB3: pim_mroute_msg_nocache (pim_mroute.c:242)
==8823== by 0x449AB3: pim_mroute_msg (pim_mroute.c:661)
==8823== by 0x449AB3: mroute_read (pim_mroute.c:707)
==8823== by 0x4FC0676: thread_call (thread.c:1549)
==8823== by 0x4EF3A2F: frr_run (libfrr.c:1064)
==8823== by 0x40DCB5: main (pim_main.c:162)
==8823== Address 0xc8 is not stack'd, malloc'd or (recently) free'd
pim_register_join calls pim_channel_add_oif with:
pim_channel_add_oif(up->channel_oil, pim->regiface,
PIM_OIF_FLAG_PROTO_PIM);
We just need to make srue pim->regiface exists once we start configuring
pim.
Fixes: #5358
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When configuring a RP, dissallow the choice of 0.0.0.0 or
255.255.255.255 as the address as that they make no sense
what so ever.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
We were adding a newline for the source in some cases
but not others and tighten up the display of data
eva# show ip pim rp-info
RP address group/prefix-list OIF I am RP Source
10.254.0.1 224.0.0.0/4 lo yes Static
4.4.4.4 225.1.2.3/32 abcdefghijklmno yes Static
10.0.20.45 226.200.100.100/32 r1-eth0 no Static
eva#
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
SPT switchover handling is done by adding pimreg in the OIL of the (*, G)
entry on the LHR. This causes multicast data with that destination to be
sent to pimd as IGMPMSG_WHOLEPKT. These packets trigger creation of (S,G)
and also register forwarding. However register forwarding must only be done
if the router is also a FHR. That FHR check was missing causing strange
source registrations from multicast routers that were not directly
connected to the source.
Relevant logs from LHR -
PIM: pim_mroute_msg: pim kernel upcall WHOLEPKT type=3 ip_p=0 from fd=9 for (S,G)=(6.0.0.30,239.1.1.111) on pimreg vifi=0 size=98
PIM: Sending (6.0.0.30,239.1.1.111) Register Packet to 81.0.0.5
PIM: pim_register_send: Sending (6.0.0.30,239.1.1.111) Register Packet to 81.0.0.5 on swp2
And 6.0.0.30 is clearly not directly connected on that router -
root@tor-11:~# ip route |grep 6.0.0.30 -A2
6.0.0.30 proto ospf metric 20
nexthop via 6.0.0.22 dev swp1 weight 1 onlink
nexthop via 6.0.0.23 dev swp2 weight 1 onlink
root@tor-11:~#
Ticket: CM-24549
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
It was causing a Join on (S,G) who's prune state was being cleared. This
was an inactive (KAT not running; no immediate OIL) entry that was being
flushed out but because of this incorrect Join (that was being done with
out join-state checks) the source was getting populated repeatedy i.e.
never aged.
Output of "ip monitor mroute"
=============================
(27.0.0.11,239.1.1.102) Iif: lo State: resolved Table: default
Deleted (27.0.0.11,239.1.1.102) Iif: lo State: resolved Table: default
(27.0.0.11,239.1.1.102) Iif: pimreg State: resolved Table: default
(27.0.0.11,239.1.1.102) Iif: uplink-1 State: resolved Table: default
(27.0.0.11,239.1.1.102) Iif: uplink-1 State: resolved Table: default
(27.0.0.11,239.1.1.102) Iif: uplink-1 State: resolved Table: default
(27.0.0.11,239.1.1.102) Iif: lo Oifs: uplink-1 State: resolved Table: default
(27.0.0.11,239.1.1.104) Iif: lo Oifs: pimreg uplink-1 State: resolved Table: default
(27.0.0.11,239.1.1.102) Iif: lo Oifs: pimreg uplink-1 State: resolved Table: default
Deleted (27.0.0.11,239.1.1.102) Iif: lo State: resolved Table: default
(27.0.0.11,239.1.1.102) Iif: pimreg State: resolved Table: default
(27.0.0.11,239.1.1.102) Iif: uplink-1 State: resolved Table: default
(27.0.0.11,239.1.1.102) Iif: uplink-1 State: resolved Table: default
(27.0.0.11,239.1.1.102) Iif: uplink-1 State: resolved Table: default
(27.0.0.11,239.1.1.102) Iif: lo Oifs: uplink-1 State: resolved Table: default
These mroute events (on a no longer existing multicast souce) continue in
a never ending loop.
Triggered joins/prunes MUST only done via state machine transitions i.e.
via pim_upstream_update_join_desired.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
JD macro is defined by the RFC as -
bool JoinDesired(S,G) {
return (immediate_olist(S,G) != NULL
OR (KeepaliveTimer(S,G) is running
AND inherited_olist(S,G) != NULL))
}
However for MSDP synced SA the KAT will not be running so an exception is
needed. Earlier I had done this by relaxing KAT_run requirements entirely
on the RP. However as that prevents the source from being aged out in some
cases I have made the check more narrow i.e. has to an MSDP peer added
entry.
Ticket: CM-24398
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Added event logs around add/del of upstream entries into the nbr's
jp-agg list. This is to help debug a problem with stale (deleted)
upstream entries being present in the list causing pimd to crash on
the periodic processing.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Today we are only pruning the SPT when (S,G) upstream entry
switches from Joined toNotJoined. This leaves the source still
pruned along the RPT till the next periodic XG join-prune is sent
to the RPF(RP). Traffic from the source will be blackholed for this
duration. To prevent that we need send a new JP message
to RPF(RP) immediately.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
It is now used to evaluate and display join-desired state for
each upstream entry -
root@spine-1:~# net show pim upstream-join-desired
Source Group EvalJD
* 239.1.1.111 yes
6.0.0.28 239.1.1.111 yes
6.0.0.29 239.1.1.111 no
6.0.0.30 239.1.1.111 yes
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
This re-naming was needed because the JD state on an upstream is
not just based on channel info i.e. we can have JD=true even if there
is no downstream channel. The "show ip upstream-join-desired" command
will be changed to display that info i.e. upstream's JD state instead
of downstream channel params. The downstream channel params are now
available via "show ip pim channel"
PS: This change maybe reverted if upstream NAKs it. But there is a
pressing need for it to debug some not-so-reproduible problems.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
This was causing pimd to crash later; call-stack -
(gdb) bt
context=<optimized out>) at lib/sigevent.c:254
group=group@entry=0x7ffffa9797e0) at pimd/pim_rp.c:207
grp=grp@entry=0x7ffffa9799fe, sgs=sgs@entry=0x560ac069edb0, size=52)
at pimd/pim_msg.c:200
groups=<optimized out>) at pimd/pim_join.c:562
at pimd/pim_neighbor.c:288
at lib/thread.c:1599
at lib/libfrr.c:1024
envp=<optimized out>) at pimd/pim_main.c:162
(gdb) fr 4
group=group@entry=0x7ffffa9797e0) at pimd/pim_rp.c:207
207 pimd/pim_rp.c: No such file or directory.
(gdb) fr 6
grp=grp@entry=0x7ffffa9799fe, sgs=sgs@entry=0x560ac069edb0, size=52)
at pimd/pim_msg.c:200
200 pimd/pim_msg.c: No such file or directory.
(gdb) p source->up->sg_str
$1 = '\000' <repeats 31 times>, <incomplete sequence \361>
(gdb)
This problem can manifest in the following event sequence -
1. upstream RPF neighbor is resolved
2. upstream RPF neighbor becomes unresolved (but upstream entry
stays on the jp-agg list)
3. upstream entry is removed
on the next old-neighbor jp-agg-list processing the stale entry is
accessed resulting in the crash.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Dumps while in problem state -
============================
[from "show ip pim state"]
Active Source Group RPT IIF OIL
1 6.0.0.31 239.1.1.111 n swp1 swp4( J * )
[from "show ip pim join"]
Interface Address Source Group State Uptime Expire Prune
swp3 6.0.0.22 6.0.0.31 239.1.1.111 JOIN --:--:-- 03:11 --:--
You can see from the dumps that the pim downstream router has joined on
swp3 but that OIF has not been added to the OIL with flag
PIM_OIF_FLAG_PROTO_PIM. This is because the join was rxed while the
ifchannel was in a prune-pending state.
Relevant logs -
===============
[
PIM: recv_prune: prune (S,G)=(6.0.0.31,239.1.1.111) rpt=1 wc=0 upstream=6.0.0.22 holdtime=210 from 6.0.0.28 on swp3
PIM: pim_upstream_ref(pim_ifchannel_add): upstream (6.0.0.31,239.1.1.111) ref count 3 increment
PIM: pim_upstream_add(pim_ifchannel_add): (6.0.0.31,239.1.1.111), iif 6.0.0.26/0 (swp1) found: 1: ref_count: 3
PIM: pim_ifchannel_add: ifchannel (6.0.0.31,239.1.1.111) is created
PIM: pim_joinprune_recv: SGRpt flag is set, del inherit oif from up (6.0.0.31,239.1.1.111)
PIM: pim_mroute_add(pim_channel_del_oif), vrf default Added Route: (6.0.0.31,239.1.1.111) IIF: swp1, OIFS: swp4
PIM: pim_channel_del_oif(pim_joinprune_recv): (S,G)=(6.0.0.31,239.1.1.111): proto_mask=4 IIF:1 OIF=swp3 vif_index=3
PIM: recv_join: join (S,G)=(6.0.0.31,239.1.1.111) rpt=0 wc=0 upstream=6.0.0.22 holdtime=210 from 6.0.0.28 on swp3
PIM: PIM_IFCHANNEL(swp3): (6.0.0.31,239.1.1.111) is switching from SGRpt(PP) to JOIN
PIM: Sending Request for New Channel Oil Information(6.0.0.31,239.1.1.111) VIIF 1(default)
]
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
This is needed for two reasons -
1. The inherited OIL needs to be setup independent of the RPF interface
to allow correct computation of the JoinDesired macro.
2. The RPF interface is computed at the time of MFC programming so
it is not possible to permanently evict the OIF at that time oif_add
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
When a inherited OIL becomes empty join-desired can go to false. So
we need to re-run join-desired evaluation on any inherited OIL changes.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
The macro was always returning non-empty because of comparing an
array of u8_t with an array of u32_t.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
If an dummy upstream entry (no RPF nbr) which is already in a JOINED
state is resolved we were not triggering an immediate join via the
per-interface upstream switch list.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
A dummy pim upstream entry can be in a JOINED state before its RPF nbr is
added. Handle that case by triggering an immediate join.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Deviations -
1. Avoid using SPTbit setting. Replace that with Use_Spt macro.
2. If S is supposed to be forwarded along the RPT but has an empty OIL
prune it.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
1. KAT should be re-started only if traffic rxed along the SPT i.e.
IIF == RPF_Interface(S).
Only exception to the rule is if you are LHR.
2. KAT should be started on all routers (not just FHR, RP, LHR).
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Criteria for switching to SPT is different on RP and LHR. Re-name
the functions to make that apparent.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Joined state is computed based on the downstream state and cannot be
changed if the RPF link flaps.
Reference: rfc 7761, section 4.5.5
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
This commit includes the following changes -
1. kat needs to be included when evaluting join desired on a (S,G)
entry.
2. there were cases where we were adding OIF based on joindesired
being true for unrelated reasons (on other OIFs). cleaned up those
cases.
3. make all calls to pim_upstream_switch conditional on the JoinDesired
macro.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
RP config change is a big hammer and use_rpt/spt needs to be
re-evaluated on all existing (S,G) entries.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
If a source is being forwarded along the RPT it uses the parent (*,G)'s
IIF. When the parent's IIF changes all the children need to be updated
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
mfcc_parent for an (S, G) entry was being updated on any upstream RPF
change. With the change to use RPT for (S,G) in some cases we can no
longer do that. Instead the upstream entry's RPF neigbor is managed
separately form the channel_oil's mfcc_parent i.e. via NHT. And the
mfcc_parent is evaluated at the time of mroute programming.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
An (S,G) mroute can be created as a result of rpt prune. However that
entry needs to stay on the parent (*,G)'s tree (IIF) till a decision is
made to switch the source to the SPT.
The decision to stay on the RPT is made based on the SPTbit setting
according to - RFC7761, Section 4.2 “Data Packet Forwarding Rules”
However those rules are hard to achieve when hw acceleration i.e.
control and data planes are separate. So instead of relying on data
we make the decision of using SPT if we have decided to join the SPT -
Use_RPT(S,G) {
if (Joined(S,G) == TRUE // we have decided to join the SPT
OR Directly_Connected(S) == TRUE // source is directly connected
OR I_am_RP(G) == TRUE) // RP
//use_spt
return FALSE;
//use_rpt
return TRUE;
}
To make that change some re-org was needed -
1. pim static mroutes and dynamic (upstream mroutes) top level APIs
have been separated. This is to limit the state machine to dynamic
mroutes.
2. c_oil->oil.mfcc_parent is re-evaluated based on if we decided
to use the SPT or stay on the RPT.
3. upstream mroute re-eval is done when any of the criteria involved
in Use_RPT changes.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Theoretically there should be no case where the channel-oil hangs
around after the upstream entry is removed. But currently there are
cases where it does. This is a precautionary fixup till we are
rid off all of those cases.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
1. This avoids the needs to re-run "muting" decisions.
2. Avoids the need to restore's pim OIL after fixup and send to kernel
(this is getting harder to manage).
In the future we need to also move the PIM maintained channel OIL from
an array of MAXVIFs to a simple DLL. This will be a significant
optimization in memory usage and preformance (OIL reads, copies etc).
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
If an mroute loses DF election (with the MLAG peer) it has to stop
forwarding traffic on active-active devices such as ipmr-lo used
for vxlan traffic termination. To acheive that this commit
introduces a concept of OIF muting. That way we can let the PIM and
IGMP state machines play out and silence OIFs after the fact.
Relevant outputs:
=================
1. muted OIFs are displayed with the M flag in "pim state" -
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
root@TORC12:~# net show pim state |grep "27.0.0.13"|grep 100
1 27.0.0.13 239.1.1.100 uplink-1 ipmr-lo( *M)
root@TORC12:~#
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2. And supressed altogether in the mroute output -
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
root@TORC12:~# net show mroute |grep "27.0.0.13"|grep 100
27.0.0.13 239.1.1.100 none uplink-1 none 0 --:--:--
root@TORC12:~#
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
These logs were printing file name which has little value (is always
pim_oil.c). Instead print the caller.
add_oif/del_oif are being called directly from one too many. Instead OIF
setup needs to be consolidated via the PIM state machine. These
debugs are expected to help in understanding what needs to be cleaned up.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
1. add the Mlag ProtoBuf Lib to Zebra Compilation
2. Encode the messages with protobuf before writing to MLAG
3. Decode the MLAG Messages using protobuf and write to clients
based on their subscrption.
Signed-off-by: Satheesh Kumar K <sathk@cumulusnetworks.com>
This includes:
1. Defining message formats
2. Stream Decoding after receiving the message at PIM
3. Handling MLAG UP & Down Notifications
Signed-off-by: Satheesh Kumar K <sathk@cumulusnetworks.com>
when ever a FRR Client wants to send any data to another node
using MLAG Channel, uses below mechanisam.
1. sends a MLAG Registration to zebra with interested messages that
it is intended to receive from peer.
2. In response to this request, Zebra opens communication channel with
MLAG. and also in Rx. diretion zebra forwards only those messages which
client shown interest during registration
3. when client is no-longer interested in communicating with MLAG, client
posts De-register to Zebra
4. if this is the last client which is interested for MLAG Communication,
zebra closes the channel.
why PIM Needs MLAG Communication
================================
1. In general on LAN Networks elecetd DR will send the Join towards
Multicast RP in case of a LHR and Register in case of FHR.
2. But in case DR Goes down, traffic will be re-converged only after
the New DR is elected, but this can take time based on Hold Timer to
detect the DR down.
3. this can be optimised by using MLAG Mecganisam.
4. and also Traffic can be forwarded more efficiently by knowing the cost
towards RP using MLAG
Signed-off-by: Satheesh Kumar K <sathk@cumulusnetworks.com>
When receiving igmp packets we are spitting out a lot of
debugs. Attempt to clean this up to allow us to understand
what is going on a bit better by just being able to look
at the log file.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When you turn on `debug igmp trace` we are seeing a bunch
of debugs associated with pim processing. This is because we were
using PIM_DEBUG_TRACE which is both `debug igmp trace` and `debug pim trace`
when tracing igmp code it would be nice to only see igmp work.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
We are seeing situations where PIM is sending a IGMP v3 query
and immediately receiving back up the pim kernel interface the
query from itself:
from `show int brief`:
swp7 up default 192.168.202.1/24
We are also receiving these debugs:
2019-11-11T20:52:40.452307+00:00 leaf02 pimd[1592]: Send IGMPv3 query to 224.4.0.8 on swp7 for group 224.4.0.8, sources=0 msg_size=12 s_flag=0 QRV=2 QQI=125 QQIC=7d
2019-11-11T20:52:40.452430+00:00 leaf02 pimd[1592]: pim_mroute_msg(default): igmp kernel upcall on swp7(0x55eaa7dc7dc0) for 192.168.202.1 -> 224.4.11.123
2019-11-11T20:52:40.452574+00:00 leaf02 pimd[1592]: Recv IP packet from 192.168.202.1 to 224.4.11.123 on swp7: size=40 ip_header_size=24 ip_proto=2
2019-11-11T20:52:40.452699+00:00 leaf02 pimd[1592]: Recv IGMP packet from 192.168.202.1 to 224.4.11.123 on swp7: ttl=1 msg_type=17 msg_size=16
2019-11-11T20:52:40.452824+00:00 leaf02 pimd[1592]: Recv IGMP query v3 from 192.168.202.1 on swp7 for group 224.4.11.123
This query is causing us to do some weird gyrations around the IGMP state machine for handling
queries. Let's just prevent it from happening.
Ticket: CM-27247
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When adding an OIF to the OIL, if we are not the DR
there is no need to install it then remove it.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
We have a zlog_warn that is unguarded ( and really is a debug message )
as that there is nothing the end user can do and nothing to note
here other than a debug message to track refcounts. Change
to an appropriate debug and zlog_debug it instead.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When you enter:
ip pim ssm prefix-list my-custom-ssm-range
ip pim ssm prefix-list my-custom-ssm-range
The second instance would cause a failure to happen which
should not happen w/ duplicate config.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Scenarios where this code change is required:
1. BFD is un-configured from BGP at remote end.
Neighbour BFD sends ADMIN_DOWN state, but BFD on local side will send
DOWN to BGP, resulting in BGP session DOWN.
Removing BFD session administratively shouldn't bring DOWN BGP session
at local or remote.
2. BFD is un-configured from BGP or shutdown locally.
BFD will send state DOWN to BGP resulting in BGP session DOWN.
(This is akin to saying do not use BFD for BGP)
Removing BFD session administratively shouldn't bring DOWN BGP session at
local or remote.
Signed-off-by: Sayed Mohd Saquib sayed.saquib@broadcom.com
The result variable was already tested against PIM_GROUP_BAD_ADDR_MASK_COMBO
earlier in the function. No need to do the same thing twice.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
All paths leading to this point in the code have already deref'ed
the pim->global_scope.bsrp_table. No point in testing for
validness now. This was caught by Coverity.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The pim_msg_send() return code was not being checked. Make
consistent with it's usage everywhere else.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Cleanup the interface creation apis to make it more
clear what they are doing.
Make it explicit that the creation via name/ifindex will
only add it to the appropriate list.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
There are several places in the pim where we are mixing up
zlog_warn w/ zlog_debug and vice versa. If we are protecting
a zlog_warn w/ a debug is it really a warn? If we have an actual
error situation we should also warn about it.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When we create the up data structure we create the channel_oil
as well. As such it is impossible to get into this code
so it can be removed.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
This will facilitate the Hardware to prefer control packets over
Normal Data packets while queuing, so that during congestion, the
chance of dropping control packet will be minimised.
Signed-off-by: Satheesh Kumar K <sathk@cumulusnetworks.com>
For all the places we have a zclient->interface_up convert
them to use the interface ifp_up callback instead.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Switch the zclient->interface_add functionality to have everyone
use the interface create callback in lib/if.c
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Start the conversion to allow zapi interface callbacks to be
controlled like vrf creation/destruction/change callbacks.
This will allow us to consolidate control into the interface.c
instead of having each daemon read the stream and react accordingly.
This will hopefully reduce a bunch of cut-n-paste stuff
Create 4 new callback functions that will be controlled by
lib/if.c
create -> A upper level protocol receives an interface creation event
The ifp is brand spanking newly created in the system.
up -> A upper level protocol receives a interface up event
This means the interface is up and ready to go.
down -> A upper level protocol receives a interface down
destroy -> A upper level protocol receives a destroy event
This means to delete the pointers associated with it.
At this point this is just boilerplate setup for future commits.
There is no new functionality.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The Pim RFC does not appear to state any length requirements
of pim, other than the checksum must be correct.
Certain vendors are sending extra data at the end of a pim assert
message. This while not explicitly against the rules was a bit
of surprise to pim when we threw the assert message on the floor
for being too long.
Modify the test to see if length left will allow us to read
the 8 bytes of data that we need. If it is sufficient for
that allow the packet to be used.
Fixes: #4957
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Original Idea is to display normal & detailed debugs when detailed
debug alone is configured. because of this "sh debugs" are showing
wrong Information, because same macro is used to disply the configured
debugs.
that means even if Normal debug is configured, detailed macro returns
TRUE. To avoid this ambiguity check whetehr detailed debug is configured
or not during dumping configured debugs. In all other places using
old macro.
Signed-off-by: Satheesh Kumar K <sathk@cumulusnetworks.com>
Debian packaging when run finds a bunch of spelling errors:
I: frr: spelling-error-in-binary usr/bin/vtysh occurences occurrences
I: frr: spelling-error-in-binary usr/lib/frr/bfdd Amount of times Number of times
I: frr: spelling-error-in-binary usr/lib/frr/bgpd occurences occurrences
I: frr: spelling-error-in-binary usr/lib/frr/bgpd recieved received
I: frr: spelling-error-in-binary usr/lib/frr/isisd betweeen between
I: frr: spelling-error-in-binary usr/lib/frr/ospf6d Infomation Information
I: frr: spelling-error-in-binary usr/lib/frr/ospfd missmatch mismatch
I: frr: spelling-error-in-binary usr/lib/frr/pimd bootsrap bootstrap
I: frr: spelling-error-in-binary usr/lib/frr/pimd Unknwon Unknown
I: frr: spelling-error-in-binary usr/lib/frr/zebra Requsted Requested
I: frr: spelling-error-in-binary usr/lib/frr/zebra uknown unknown
I: frr: spelling-error-in-binary usr/lib/x86_64-linux-gnu/frr/libfrr.so.0.0.0 overriden overridden
This commit fixes all of them except the bgp `recieved` issue due to
it being part of json output. That one will need to go through
a deprecation cycle.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Pim will do the nexthop registration with "ip pim rp" static configuration
with this Zebra will advertise the Route Information.
But while processing this info at PIM, if Nexthop Interfaces are not PIM
enabled, currently PIM is dropping those paths. in case all paths are not
PIM enabled, there is no valid RPF Interface at PIM.
and PIM will be stuck at this state until Next update this to route, that
can happen only if there is a Routing change at Zebra for this prefix.
until that time PIM will not have any valid outgoing Interface.
This issue was mainly seen during Node bootup scenarios.
Fix Proposed
=============
store the paths in PIM PNC Data structure though they are not enabled
with PIM, because while selecting the Interface PIM checks for multicast
enabled Interface.
Tests Performed
===============
1. Verified fail Test case
2. Disabling the PIM on selected outgoing Interface, PIM is choosing
another path when Neighbor is down on this Interface.
3. Re-configure the PIM on above un-configured Interface, PIM is staying
with old NHop since it is valid.
Signed-off-by: Satheesh Kumar K <sathk@cumulusnetworks.com>
Modify the code to create an upstream reference whenever the code
creates an channel_oil via the pim_mroute.c code. This code also
starts a keep alive timer to clean up the reference if we do
nothing with it after the normal time.
I've left alone the source->channel_oil creation because these
are kept and tracked independently already.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Modify code base so that pim_upstream *always* creates a channel_oil
and as such we do not need to create it later or play other games.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Add a function that allows you to modify the channel oil's incoming
interface and to appropriately install/remove it from the kernel.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
zvni setup in zebra is controlled via bgpd i.e. advertise_all_vni
from bgpd triggers this setup. As a part of zvni creation we may need
to setup BUM mcast SG entries which are propagated to pimd for MDT setup.
Now pimd may not be present at the time of zvni creation or may restart
post zvni creation so we need a mechanism to replay (on pimd startup) and
to cleanup (on pimd stop). This is addressed via zebra_vxlan_sg_replay and
zebra_evpn_pim_cfg_clean_up.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
When we receive an igmp query on a interface, ensure that the
source address of the packet is connected to the incoming
interface. This will prevent a meanie from crafting a igmp
packet with a source address less than ours and causing
us to suspend query activities.
Fixes: #1692
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Fix details :
Added a utility cli to generate a igmp query on an interface.
This won't impact the existing query generation based on the
general query interval.
Signed-off-by: Rajesh Girada <rgirada@vmware.com>
In a pim-evpn setup (say TORC11<=>TORC12) an mroute can have a mix of
PIM and IGMP joins. The vxlan termination device ipmr-lo is IGMP
joined on termination mroutes and the peerlink-rif can be pim joined
on the same mroute if the MLAG peer (TORC11) loses all its uplinks to
underlay -
root@TORC12:~# net show pim state 239.1.1.101|grep pimreg
1 * 239.1.1.101 uplink-1
pimreg(I ), ipmr-lo( J ), peerlink-3.4094( J )
root@TORC12:~#
When the uplinks come back up on TORC11 it will prune the peerlink-rif
and join the RP (say spine) via the uplinks.
TORC12 is rxing the prune and removing the if_channel
(pim_ifchannel_delete). However it is not removing the OIF from
mfcc_ttl basically leaving behind a leaked OIF in the forwarding
entry. And this is because it is deriving the owner flag from the
parent upstream entry and incorrectly concluding that all OIFs are
IGMP joined.
Thix fix flushes out both PIM and IGMP ownership when the ifchannel is
deleted.
There is a second fix in the commit and that is to set the proto mask
correctly (to STAR) for inherited OIFs. (S,G) entries can inherit the
OIF from the (*, G) entry and this decision can change when the pim/igmp
ifchannel is removed. The earlier code was setting the proto-mask
incorrectly to PIM or IGMP.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
(cherry picked from commit d4d1d968dbbe61347393f7dace8b675496ff1024)
When a link goes down the vifi was being deleted but the OIF stayed
in the OIL with a stale vifi -
oroot@act-7726-03:~# net show pim state
Codes: J -> Pim Join, I -> IGMP Report, S -> Source, * -> Inherited from (*,G), V -> VxLAN
Installed Source Group IIF OIL
1 * 239.1.1.111 swp1s1 pimreg(I ), ipmr-lo( J )
1 6.0.0.28 239.1.1.111 lo pimreg( J ), ipmr-lo( *), swp1s1( J )
root@act-7726-03:~# ip link set swp1s1 down
root@act-7726-03:~# net show pim state
Codes: J -> Pim Join, I -> IGMP Report, S -> Source, * -> Inherited from (*,G), V -> VxLAN
Installed Source Group IIF OIL
1 * 239.1.1.111 swp1s0 pimreg(I ), ipmr-lo( J )
1 6.0.0.28 239.1.1.111 lo ipmr-lo( *), swp1s0( J ), <oif?>( J ) >>>>>>>>
root@act-7726-03:~#
The problem was as a part ifchannel_delete the join state of the channel
was checked to avoid incorrect OIF deletion this was preventing the OIF
from being flushed. Fix is to flip the channel join-state to NOINFO before
deleting it.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
There has never been a `debug igmp trace detail` but we have
had code to display this when we had the appropriate flags
set. Since we never can accept this, let's remove this.
This showed up because of commit:0ab16492d2d9fcc6cba7e001227deed6765ed261
where we re-arranged some debugs to combine them being turned on.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Position of address family and encoding type is defined wrongly in the
packed structure. This will correct the position as per RFC 4601 4.9.1
Signed-off-by: Saravanan K <saravanank@vmware.com>
If we create a channel_oil ensure that all paths that
we can go down will create one. Future commits
can remove the (up->channel_oil) tests.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
There is no need to check for ALLOC function failures
in the code base. If we cannot get more memory we
assert.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The vifi being displayed is just confusing. Display the
actual interface name being used in the mroute.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
This is mostly relevant for Solaris, where config.h sets up some #define
that affect overall header behaviour, so it needs to be before anything
else.
Signed-off-by: David Lamparter <equinox@diac24.net>
In Scenario where receiver is present in a subnet where 2 or more pim mrouters.
When IGMP query received on a DR interface and RP is reachable through non DR.
Currently we are blocking to create upstream where iif == oif. So pim join
not generated towards RP. We have to allow the DR router in the network to create an upstream.
Signed-off-by: Saravanan K <saravanank@vmware.com>
clippy can't process #ifdef or similar bits inside of an argument list
(e.g. within the braces of a DEFUN or DEFPY statement.) Improve error
reporting to catch these cases instead of generating broken C code.
Fixes: #3840
Signed-off-by: David Lamparter <equinox@diac24.net>
Field vrf_id is replaced by the pointer of the struct vrf *.
For that all other code referencing to (interface)->vrf_id is replaced.
This work should not change the behaviour.
It is just a continuation work toward having an interface API handling
vrf pointer only.
some new generic functions are created in vrf:
vrf_to_id, vrf_to_name,
a zebra function is also created:
zvrf_info_lookup
an ospf function is also created:
ospf_lookup_by_vrf
it is to be noted that now that interface has a vrf pointer, some more
optimisations could be thought through all the rest of the code. as
example, many structure store the vrf_id. those structures could get
the exact vrf structure if inherited from an interface vrf context.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
vrf_id parameter is replaced with struct vrf * parameter. It is
needed to create vrf structure before entering in the fuction.
an error is generated in case the vrf parameter is missing.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
the vrf_id parameter is replaced by struct vrf * parameter.
this impacts most of the daemons that look for an interface based on the
name and the vrf identifier.
Also, it fixes 2 lookup calls in zebra and sharpd, where the vrf_id was
ignored until now.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
The pim ifchannel expiry timer was not setting any debug output.
Let's add something in to help us understand what is going on.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
We already log whether or not we add nht tracking, having
an additional boolean to say to log another line is
a bit over the top.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
While this is impossible, make the compilers a bit happier
for those of us having to use something old.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com
Various compilers in our CI system were complaining about various
auto-conversions. Let's get these cleaned up a bit more.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When pim is started and has nothing to talk to zebra about
over the zlookup socket and interface events are happening
then the zlookup socket is not being drained.
This eventually leads to a situation where the kernel send buffer
fills up and zebra is unable to write anything down the pipe.
At this time the zapi starts buffering data in `struct buffer`
which of course slowly fills up as pim has nothing to do.
As a bit of a hack allow the zlookup socket to wake up 1 time
a minute and ask for information about the default route
and do nothing with it. This will cause the socket buffers
to be drained and the system will be happy.
Long term we need to get rid of this synchronous/asynchronous
duality that pim has. This is on the radar but is not something
that could be fixed in an afternoon or a week of effort in my
opinion. Given time constraints right now. Let's put this
in place and then once we get pim completely async then
we can just remove the zlookup( I hope ) code.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
A couple of places of strncpy snuck in due to my confusion
about if Quentin's earlier change had gotten in. Just some
code in flux. This should fix the issue/warnings in our
CI system.
Recent commits rewrote the `clear mroute` command and this caused
these two two functions to no longer be used, remove.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The header inclusion in mtracebis.c should include zebra.h
and not config.h. There are other issues that need to
be worked through as well, but we can do this in the future.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Introduce new cli commands ip igmp last-member-query-count <1-7>
ip igmp last-member-query-interval <1-255> deciseconds.
Display the config in show running config and show ip igmp interface
Signed-off-by: Sarita Patra <saritap@vmware.com>
Introduced a new command "show ip mroute summary"
to display total number of (*, G) and (S, G) mroutes
created and number of mroutes installed in the kernel.
Signed-off-by: Sarita Patra <saritap@vmware.com>
Made changes to clean up the all upstreams and ifchannels
in FRR apart from cleanup datapath mroutes when this command
issued.
Signed-off-by: Rajesh Girada <rgirada@vmware.com>
Fix: When RP receives a (*, G) join and corresponding (s,g)
is present, then check for OIL is not-empty, then only switch
upstream (s, g) state to JOINED.
Signed-off-by: Sarita Patra <saritap@vmware.com>
This would show only bsm related statistics for now.
We shall add more statistics to this later.
Sw3# show ip pim statistics
BSM Statistics :
----------------
Number of Received BSMs : 1584
Number of Forwared BSMs : 793
Number of Dropped BSMs : 1320
Interface : ens192
-------------------
Number of BSMs dropped due to config miss : 0
Number of unicast BSMs dropped : 0
Number of BSMs dropped due to invalid scope zone : 0
Interface : ens224
-------------------
Number of BSMs dropped due to config miss : 0
Number of unicast BSMs dropped : 0
Number of BSMs dropped due to invalid scope zone : 0
Interface : ens256
-------------------
Number of BSMs dropped due to config miss : 0
Number of unicast BSMs dropped : 0
Number of BSMs dropped due to invalid scope zone : 0
Signed-off-by: Saravanan K <saravanank@vmware.com>
This command shows all the fragments of the last received preferred BSM.
This displayed in readable format.
Sw3# sh ip pim bsm-database
Scope Zone: Global
Number of the fragments: 1
BSM Fragment : 1
------------------
BSR-Address BSR-Priority Hashmask-len Fragment-Tag
30.0.0.100 0 0 3289
Group : 225.1.1.1/32
-------------------
Rp Count:9
Fragment Rp Count : 9
RpAddress HoldTime Priority
20.0.0.2 150 0
2.2.2.2 150 0
9.9.9.10 150 0
7.7.2.7 150 0
7.2.2.7 150 0
7.7.9.7 150 0
7.8.9.10 150 0
7.5.2.7 150 0
9.10.9.10 150 0
Group : 226.1.1.1/32
-------------------
Rp Count:1
Fragment Rp Count : 1
RpAddress HoldTime Priority
9.9.9.9 150 0
Signed-off-by: Saravanan K <saravanank@vmware.com>
If no_fwd bit not set,
forward on all interfaces including which it came.
store it in bsm list with size for forwarding it later to new neighbor.
calculate PIM mtu of the interface, if bsm size is more do sematic frag and send
Signed-off-by: Saravanan K <saravanank@vmware.com>
When all rp received on a partial list, this routine is called.
if static rp configured for the group range
if partial list is empty
clean main list and partial list
else
replace main with partial and start the g2rp timer with head of new main
return
if main list was empty
call rp new with head of partial list and start g2rp timer.
else
if partial list is empty
call rp del
else
stop g2rp timer of old elected rp.
call rp change with new rp(head of partial list) and start g2rp timer.
swap the lists and clean the old list(now partial list).
Signed-off-by: Saravanan K <saravanank@vmware.com>
Bootstrap rp table is route_table datastructure with group range as key.
Each node represents a group range.
Every node has two lists of rp nodes. partial list and active list(bsrp_list)
Whenever a rp is parsed from BSM, it is updated to partial list.
When partial list is full, we move it to main list(bsrp_list). This commit doesn't cover that.
Rp Election routine based on RFC 7761 Sec 4.7
Hash calculation for rp election based on RFC 7761 Sec 4.7.2
Signed-off-by: Saravanan K <saravanank@vmware.com>
1. Packet validation as per RFC 5059 Sec 3.1.3
We won't supporting scope zone BSM as of now, they are dropped now.
Order of the check slightly be changed in code for optimization.
if ((DirectlyConnected(BSM.src_ip_address) == FALSE) OR
(we have no Hello state for BSM.src_ip_address)) {
drop the Bootstrap message silently
}
if (BSM.dst_ip_address == ALL-PIM-ROUTERS) {
if (BSM.no_forward_bit == 0) {
if (BSM.src_ip_address != RPF_neighbor(BSM.BSR_ip_address)) {
drop the Bootstrap message silently
}
} else if ((any previous BSM for this scope has been accepted) OR
(more than BS_Period has elapsed since startup)) {
#only accept no-forward BSM if quick refresh on startup
drop the Bootstrap message silently
}
} else if ((Unicast BSM support enabled) AND
(BSM.dst_ip_address is one of my addresses)) {
if ((any previous BSM for this scope has been accepted) OR
(more than BS_Period has elapsed since startup)) {
#the packet was unicast, but this wasn't
#a quick refresh on startup
drop the Bootstrap message silently
}
} else {
drop the Bootstrap message silently
}
2. Nexthop tracking registration for BSR
3. RPF check for BSR Message.
Zebra Lookup based rpf check for new BSR
NHT cache(pnc) based lookup for old BSR
Signed-off-by: Saravanan K <saravanank@vmware.com>
This commit includes parsing of Nbit and contructing pim hdr with Nbit
Adding Nbit to PIm hdr structure
Adding Scope zone bit and Bidir bit to Encoded IPv4 Group Address
Signed-off-by: Saravanan K <saravanank@vmware.com>
When bs time out occurs,
1. Delete the bsm list
2. Reset the BSR address
3. delete nexthop tracking for the expired BSR
4. Give one more lease of life to all the bsr advertised rp with hold time
5. clear partial list of each grp node if not empty
Signed-off-by: Saravanan K <saravanank@vmware.com>
DS Overview:
Bootstrap RP table has grp node.
scope --> rp table --> grp node1 --> rp list --> rp nodes(g2rp timer)
|
-------> grp node2 --> rp list --> rp nodes(g2rp timer)
When grp2rp mapping expires, following has to be done.
1. delete the rp node from the active bs-rp list in the list
2. calculate the elapsed time for other rp nodes in the list
3. delete those nodes having more elapse time than their hold time
4. If the list is not empty and current rp src is not static
rp change with new rp(head) & start g2rp timer with value holdtime - elapse
5. If the list is empty and current rp src for the grp is not static
delete the rp
6. If the list is not empty and current rp is static, just start the
g2rp timer with value holdtime - elapse
7. If list is empty and pending list is empty, delete grp node.
Note: g2rp timer will be run only on elected RP node for optimization.
when it expires, other node are update with elapse time.
This list is sorted insuch way that elected RP is the HEAD of list
Signed-off-by: Saravanan K <saravanank@vmware.com>
Command to display current bsr, last received bsm ts, bsr uptime
Sw3# sh ip pim bsr
PIMv2 Bootstrap information
Current preferred BSR address: 30.0.0.100
Priority Fragment-Tag State UpTime
0 6390 ACCEPT_PREFERRED 91:26:24
Last BSM seen: 00:00:37
Signed-off-by: Saravanan K <saravanank@vmware.com>
pim_rp_new split into pim_rp_new_config and pim_rp_new.
pim_rp_new_config is called by CLI.
pim_rp_new will be called by pim_rp_new_config and bsm rp config.
pim_rp_del is split into pim_rp_del_config and pim_rp_del
pim_rp_del_config is called by CLI.
pim_rp_del is called by pim_rp_del_config and bsm rp config
Signed-off-by: Saravanan K <saravanank@vmware.com>
(intf)ip pim bsm - to enable bsm processing on the interface
(intf)no ip pim bsm - to disable bsm processing on the interface
(intf)ip pim unicast-bsm - to enable ucast bsm processing on the interface
(intf)no ip pim unicast-bsm - to disable ucast bsm processing on the interface
Note: bsm processing and ucast bsm processing is enabled by default on a
pim interface. The CLI is implemented as a security feature as recommended by
RFC 5059
Signed-off-by: Saravanan K <saravanank@vmware.com>
Sw3# sh ip pim rp-info
RP address group/prefix-list OIF I am RP Source
20.0.0.2 225.1.1.1/32 ens192 no BSR
9.9.9.9 226.1.1.1/32 (Unknown) no BSR
30.0.0.100 229.1.1.5/32 ens192 no Static
Signed-off-by: Saravanan K <saravanank@vmware.com>
For each BSM packet, rpf check is performed. We will be accepting if the
source address match any of the next hop neighbor(in ecmp case) to reach
the Bootstrap Router.
1. pim_nexthop_match - this lookup in zebra and return true if any of the
next hop nbr is matching (in ecmp case).
2. pim_nexthop_match_nht_cache - this api searches the given address in local
pnc and return true if any of the next hop
nbr is matching (in ecmp case).
Signed-off-by: Saravanan K <saravanank@vmware.com>
Apart from datastructure, bsm scope initialization and deinitialiation
routines called during pim instance init and deinit. Also makefile changes.
Signed-off-by: Saravanan K <saravanank@vmware.com>
It doesn't make much sense for a hash function to modify its argument,
so const the hash input.
BGP does it in a couple places, those cast away the const. Not great but
not any worse than it was.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
bfd cbit is a value carried out in bfd messages, that permit to keep or
not, the independence between control plane and dataplane. In other
words, while most of the cases plan to flush entries, when bfd goes
down, there are some cases where that bfd event should be ignored. this
is the case with non stop forwarding mechanisms where entries may be
kept. this is the case for BGP, when graceful restart capability is
used. If BFD event down happens, and bgp is in graceful restart mode, it
is wished to ignore the BFD event while waiting for the remote router to
restart.
The changes take into account the following:
- add a config flag across zebra layer so that daemon can set or not the
cbit capability.
- ability for daemons to read the remote bfd capability associated to a bfd
notification.
- in bfdd, according to the value, the cbit value is set
- in bfdd, the received value is retrived and stored in the bfd session
context.
- by default, the local cbit announced to remote is set to 1 while
preservation of the local path is not set.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
The route_map_event_hook callback was passing the `route_map_event_t`
to each individual interested party. No-one is ever using this data
so let's cut to the chase a bit and remove the pass through of data.
This is considered ok in that the routemap.c code came this way
originally and after 15+ years no-one is using this functionality.
Nor do I see any `easy` way to do anything useful with this data.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
become stale entries.
Topology:
--------
Source
|
FHR
|
RP ------ LHR --- Recv1
|
Recv2
Root case :
-----------
When RP acts as a LHR i.e RP has a local receiver and registed for
the same group where LHR connected receiver also registered for the
same multicast group.When RP receives a (s,g) join form LHR , it
increments upstream ref count to two to track the Local membership
as well.But at the time of KAT expiry in RP , upstream reference
is not being removed Which is added to track local membership which
is causing to make these entries as stale in RP and FHR.
Fix : Made the change such that it removes the upstream reference
if it is added to track the local memberships.
Signed-off-by: Rajesh Girada <rgirada@vmware.com>
vrf_id parameter is added to the api of bfd_client_sendmsg().
this permits being registered to bfd from a separate vrf.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
This macro:
- Marks ZAPI callbacks for readability
- Standardizes argument names
- Makes it simple to add ZAPI arguments in the future
- Ensures proper types
- Looks better
- Shortens function declarations
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
There exists a possiblity that we have upstream data but
at this point in time the rpf failed because there is no
path. As such the rpf interface will be NULL and we
should not necessarily trust it. Prevent a crash
Ticket: CM-24857
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
So when we remove a ifchannel from the system we should check to
see if we still care about the S,G having it in the OIL still
due to inheritance rules. The deletion does not necessarily
mean it should not be in the OIL for the S,G.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
the json code has not been updated since a variety of new flags have
been added to the code base. Add those flags in so we can tell
what is going on sometimes.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Add the ability to select on a S or G for a `show ip mroute`
command.
show ip mroute 225.1.1.111
show ip mroute 4.5.6.7 225.1.1.111
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Always when creating a new S,G state look at all possible ifchannels
to decide what the mroute should be.
The bug that this is fixing is this:
Suppose two incoming `*,G` joins on swp1, and swp2.
Now suppose that one of those ifchannel `*,G` sends a `*,G S,G RPT Prune`.
We were creating the S,G upstream state as we should but we were
only looking at the S,G ifchannel to decide the S,G mroute we would
be creating. As such what we need to do is to look over the associated
*,G ifchannels and allow us to associate correct oil needed.
Ticket: CM-24732
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
1. vxlan instance cleanup needs to be done before the upstream entries are
force-flushed.
2. also vxlan callbacks need to be ignored post instance-cleanup.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
This was resulting in static analyzer warnings for subsequent usage
of the same pointer -
pimd/pim_vxlan.c:962:36: warning: Access to field 'info' results in a
dereference of a null pointer (loaded from variable 'ifp')
pim_ifp = (struct pim_interface *)ifp->info;
^~~~~~~~~
1 warning generated.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
The MLAG component on the switch is expected to provide some
properties (such as peerlink-rif) to bootstrap the anycast-VTEP
functionality. The final interface for this is being defined as
a part of the pim-mlag functionality.
This commit provides a hidden command to test the anycast-VTEP
functionality independent of the MLAG component.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Sample output:
root@TORS1:~# vtysh -c "show ip pim vxlan-groups"
Codes: I -> installed
Source Group Input Output Flags
27.0.0.7 239.1.1.101 lo I
* 239.1.1.100 - ipmr-lo I
* 239.1.1.101 - ipmr-lo I
27.0.0.7 239.1.1.100 lo I
root@TORS1:~#
root@TORS1:~# vtysh -c "show ip pim vxlan-work"
Codes: I -> installed
Source Group Input Flags
27.0.0.7 239.1.1.100 lo I
PS: note the worklist dump is a hidden command
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
The unique physical IP is used as the SIP in the ip header to ensure
that pim-register-stop makes it back to the right MLAG switch.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
1. peerlink-rif as OIF in origination mroutes -
Hosts are multi-homed to the anycast-VTEP pair and can send BUM traffic to
either switch. But the RP would have only joined one MLAG switch for
pulling down the MDT. To make that work we add the peerlink/ISL as
an OIF to origination mroutes (TORC11<=>TORC12 is an anycast VTEP pair) -
root@TORC11:~# ip mr |grep "(36.0.0.9, 239.1.1.100)"
(36.0.0.9, 239.1.1.100) Iif: peerlink-3.4094 Oifs: peerlink-3.4094 uplink-1
root@TORC11:~#
root@TORC12:~# ip mr |grep "(36.0.0.9, 239.1.1.100)"
(36.0.0.9, 239.1.1.100) Iif: peerlink-3.4094 Oifs: peerlink-3.4094
root@TORC12:~#
2. VTEP-PIP as register source -
TORC11 and TORC12 share the same anycast VTEP IP (36.0.0.9 in the above
example). And that is the source registered by both VTEPs for all the BUM
mcast-groups. However to allow the pim register start machine to close
the SIP in the register-pkt's IP header must be set to an unique IP address.
This is the VTEP PIP.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
1. special handling of term device in orig mroutes -
The multicast-vxlan termination device ipmr-lo is added to the (*, G)
mroute -
(0.0.0.0, 239.1.1.100) Iif: uplink-1 Oifs: uplink-1 ipmr-lo
This means that it will be inherited into all the SG entries including the
origination mroute. However we cannot terminate the traffic we originate
so some special handling is needed to exclude the termination device
in the origination entries -
27.0.0.7, 239.1.1.100) Iif: lo Oifs: uplink-1
2. special handling of term device on the MLAG pair -
Both MLAG switches pull down BUM-MDT traffic but only one (the DF) can
terminate the traffic. The non-DF must not exclude the termination device
from the MFC to prevent dups to the overlay.
DF -
root@TORC11:~# ip mr |grep "(0.0.0.0, 239.1.1.100)"
(0.0.0.0, 239.1.1.100) Iif: uplink-1 Oifs: uplink-1 ipmr-lo State: resolved
root@TORC11:~#
non-DF -
root@TORC12:~# ip mr |grep "(0.0.0.0, 239.1.1.100)"
(0.0.0.0, 239.1.1.100) Iif: uplink-1 Oifs: uplink-1 State: resolved
root@TORC12:~#
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
An interface needs to be designated as "termination device" and added to
the termination mroute's OIL. This is used by kernel and ASIC backends
to vxlan-decaps matching flows.
The default termination device is expected to have the prefix (start
sub-string) "ipmr-lo". This can be made configurable if needed -
root@TORS1:~# ip -d link show ipmr-lo
28: ipmr-lo: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN mode DEFAULT group default
link/ether 12:5a:ae:74:51:a2 brd ff:ff:ff:ff:ff:ff promiscuity 0
dummy addrgenmode eui64
root@TORS1:~# ip mr
This commit includes the changes to enable pim implicitly on the device
and set it up as the vxlan-term device per-pim-instance.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
For vxlan origination mroutes the IIF is pinned to
a. lo for single VTEPs
b. peerlink-rif for anycast VTEPs
This commit includes the changes to react to pim-vifi add/del for these
devices.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
To terminate a multicast VxLAN tunnel entry we setup a mroute with
ipmr-lo in the OIL -
(0.0.0.0, 239.1.1.100) Iif: uplink-1 Oifs: uplink-1 ipmr-lo
This is done by the vxlan component that add ipmr-lo as a local
member to termination SG entries. In addition termination entries
are also subject to MLAG DF election on the anycast VxLAN-AA setup.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Two flags have been introduced per-upstream entry -
1. XXX_MLAG_VXLAN - This indicates that MLAG DF (designated-forwarded)
election is needed on the entry. In the case of pim-evpn this flag is set
for termination (*, G) entries and will be inherited by the (S, G) entries
that are created as a result of SPT switchover on the G.
2. XXX_MLAG_NON_DF - This is set on entries that have lost the
DF election. Such entries are primarily used for blackholing traffic on
one of the MLAG switches. On a hardware accelerated switch this blackholing
happens in the ASIC preventing (non-needed) traffic hitting the CPU.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
For multicast vxlan tunnels we register the local VTEP-IP independent
of the prescence of BUM traffic i.e. we prime the pump. This
is acheived via NULL registers.
VxLAN orig entries with upstream in a PIM_REG_JOIN state are linked to
a work list for periodic NULL register transmission. Once the SPT setup
is complete the upstream-entry moves to a PIM_REG_PRUNE state and is
remved from the VxLAN work list.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
In a PIM MLAG setup (say L11<->L12 is the anycast VTEP pair) the RP
can choose to join either MLAG switch as the anycast VTEP-IP is
present on both. Let's say the RP joins L11.
Hosts are dual connected to L11<->L12 and can send traffic to either
switch. Let's say a host sends broadcast traffic to L12; now L12
must encapsulate and send the traffic toward L11. To do that the
origination-mroute on L12 must include the ISL in its OIL -
(36.0.0.9, 239.1.1.100) Iif: peerlink-3.4094 Oifs: peerlink-3.4094
L11 has a similar mroute -
(36.0.0.9, 239.1.1.100) Iif: peerlink-3.4094 Oifs: peerlink-3.4094 uplink-1
This mroute is used to rx traffic on peerlink-3.4094 and send it out of
uplink-1. Note that this mroute also includes the peerlink-rif in its
OIL. Explicit removal of IIF from OIL is done by the kernel (and other
dataplanes) to prevent traffic looping.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
For every (local-vtep-ip, bum-mcast-grp) registered by evpn an origination
mroute is setup by pimd. The purpose of this mroute is to forward vxlan
encapsulated BUM -
Sample mroute (single VTEP):
(27.0.0.7, 239.1.1.100) Iif: lo Oifs: uplink-1
Sample mroute (anycast VTEP):
(36.0.0.9, 239.1.1.100) Iif: peerlink-3.4094\
Oifs: peerlink-3.4094 uplink-1
This commit is part-1 of orignation mroute setup and includes setup
of upstream entries with vxlan properties.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
pim-vxlan work is maintained in a lists and processing staggered. One
such work is the generation of periodic null registers for the local
VTEP-IP.
This info is instance agnostic and maintained in a global cache.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
This information will come in from a MLAG component. Hidden commands
will also be provided for testing the interface independent of the
separate MLAG component.
PS: It is possible that this cache will be merged with the base
pim-mlag datastructures once they are available.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
pim vxlan component will create upstream entries and OIFs for
multicast VxLAN tunnel origination and termination in single-VTEP
and anycast-VTEP (MLAG) setups.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
pim_vxlan will use this for registering the local-VTEP-IP wth the RP
independent of the presence of BUM traffic.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
ipmr-lo is a dummy netdev with no additional addressing requirements -
root@TORS1:~# ip -d link show ipmr-lo
28: ipmr-lo: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN mode DEFAULT group default
link/ether 12:5a:ae:74:51:a2 brd ff:ff:ff:ff:ff:ff promiscuity 0
dummy addrgenmode eui64
root@TORS1:~#
This device is used by pim-vxlan to signify multicast-vxlan-tunnel
termination.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
In an anycast VTEP setup the peerlink-rif (ISL) is added as a OIF to the
tunnel origination mroute. A new OIF protocol, VxLAN, has been added to
allow that functionalty.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Two devices have special significance to multicast VxLAN tunnels -
1. tunnel origination device -
This device is used as the source device to vxlan-encapsulate BUM traffic.
In the case of the default-vrf this is lo. And in the case of non-default
VRF this is vrf-net-device. This patchset is limited to default-VRF
underlay so all subsequent references of origination-dev are to lo. But it
is possible in the future to extend support to non-default VRFs.
Sample origination mroute on single-VTEP:
(27.0.0.7, 239.1.1.100) Iif: lo Oifs: uplink-1
In the case of MLAG we need to mroute traffic form the MLAG-peer so
we force the IIF to the ISL.
Sample origination mroute on MLAG-VTEP:
(36.0.0.9, 239.1.1.100) Iif: peerlink-3.4094 Oifs: peerlink-3.4094 uplink-1
2. tunnel termination device -
This device is used in the OIL to indicate that packets matching the flow
must be vxlan terminated and overlay packets subsequently forward to the
tenants. A special device has been created for this purpose called ipmr-lo.
This is a simple dummy interface from the kernel perspective which has
special siginficance only to pimd which implicitly enabled pim on the
device and adds it to the termination mroutes.
Sample termination mroute:
(0.0.0.0, 239.1.1.100) Iif: uplink-1 Oifs: uplink-1 ipmr-lo
PS: currently we default the termination device name to "ipmr-lo" but in
the future it is possible to provide a config command to set the
termination device.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
PIM VxLAN handling will create two types of upstream entries and
maintain app-specific properties against the entry.
This commit provides the header definitions for that.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
In a VxLAN-AA setup both the anycast VTEPS can send VxLAN encapsulated
traffic. This is despite the fact that the it is not-DR on the IIF
associated with the originating mroute.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
In a MLAG setup both of the VTEPs can rx and reg-encapsulate BUM traffic
toward the RP. To prevent these duplicates we need a mechanism to disable
register encaps on specific mroutes.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
This is specifically needed to allow pim-evpn mroutes in the MLAG setup -
(36.0.0.11, 239.1.1.100) Iif: peerlink.4094 Oifs: uplink-1, peerlink.4094
I could have gone the other way and disabled PIM_ENFORCE_LOOPFREE_MFC but
that opens the door too wide. Relaxing the checks for mlag-specific mroutes
seemed like the safer choice.
This commit provides the infrastructure to relax checks on a per-mroute
basis.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
In the case of vxlan origination entries IIF is set to -
1. lo for single VTEPs
2. MLAG-ISL for VTEPs multihomed via MLAG.
This commit creates the necessary infrastructure by -
1. allowing the IIF to be set statically (without RPF lookup)
2. and by preventing next-hop-tracking registration
PS: Note that I have skipped additional checks in pim_upstream_del
intentionally i.e. an attempt will be made to remove nexthop-tracking
for the upstream entry (with STATIC_IIF) which will fail because of the
up-entry not being in the nh's hash table. Ideally we should maintain
a nh pointer in the up-entry to prevent this unnecessary processing.
In the abscence of that I wanted to avoid spraying STATIC_IIF checks
all over.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
In the case of pim vxlan we create and keep upstream entries alive
in the abscence of traffic. So we need a mechanism to purge entries
abruptly on vxlan SG delete without having to wait for the entry
to age out.
These are again just the infrastructure changes needed for it.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
For vxlan BUM MDTs we prime the pump and register the local-VTEP-ip
as source even before the first BUM packet is rxed. This commit provides
the infrastructure changes needed for that.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
zebra sends (S, G) and (*, G) entries for BUM mcast groups to pimd. This
commit includes the changes to handle the notifications and trigger the
creation of (S, G) base cache in pimd.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
These entries will be used over the subsequent commits for
1. vxlan-tunnel-termination handling - setup MDT to rx VxLAN encapsulated
BUM traffic.
2. vxlan-tunnel-origination handling - register local-vtep-ip as a
multicast source.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Add a bit of code to allow us to look at specified S,G for
the upstream available to us.
If one item is listed we assume Group, if both we assume Source
then Group.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Add a bit of code to allow us to look at specified S,G for
the upstreams available to us.
If one item is listed we assume Group, if both we assume Source then
Group.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Replace cli 'debug static' with 'debug pim static', to make
the 'debug static' node available for staticd (eventually).
Signed-off-by: Mark Stapp <mjs@voltanet.io>
The decision for Update_SPTbit(S,G, iif) includes a test
for JoinDesired(S,G) in section 4.2.2. When we were deciding
to update the spt bit we were not taking this into account.
This commit fixes this issue.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When we receive a S,G join and the ifchannel is in S,G RPT Prune state,
pim should transition the ifchannel state to JOIN and transition the
pim_upstream state for the S,G stream.
Ticket: CM-24513
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
pim was sending a triggered response on every S,G RPT prune information
read. Suppose we had this in a *,G message:
*,G
S1, G RPT Prune
S2, G RPT Prune
We would send two triggered *,G messages upstream. This leads to over
processing and quickly changing state if S1 or S2 were in different
states.
Modify the code to send just one Triggered *,G upstream after looking
at all S,G state for a *,G.
Ticket: CM-24531
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
On the LHR after we decide that traffic is flowing and
we set the SPT bit for the S,G *and* the incoming IIF
of the S,G is different than the incoming IIF of the *,G
we should immediately send the *,G S,G RPT Prune as
a triggered response instead of waiting for the next
cycle.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Track whether or not we have received an answer from
our query to do nexthop tracking. This allows us to
go straight to doing a synchronous query for our
RPF.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Start the separation of tracking a Destination from the act
of looking it up. The cojoining of these two concepts led
to a bunch of code that had to think about both problems leading
to weird situations and code paths. Simplify the code by making
pim_ecmp_nexthop_search a static function and we only ever
call pim_ecmp_nexthop_lookup when we need to do a RPF().
pim_ecmp_nexthop_lookup will now attempt to find a stored pnc
and if it finds one it will report on the answer from it.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When creating new RP information from a `ip pim rp A.B.C.D/M A.B.C.D`
we should determine if we are the RP even if we can or cannot
determine if we have a path to the RP via RPF.
This is because we should determine if we are the RP based upon a
connected ip address match not whether or not we have a path to
the RPF. We would normally think this is not important but
RPF is inherently asynchronous and we can have a state where
we have registered for nht but have not received the actual
path back yet from zebra.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The current reverse logic led to this code construct
if (!pim_nexthop_lookup(...)) {
//Do something successfull
}
This is backwards and will cause logic errors when people
use this code. Fix to use true/false for success/failure.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When a new pim neighbor comes up, we need to go through and
mark nexthops that we have received from zebra for reachability
tracking so we can refigure stuff. If we pass in the new neighbor
we can limit the search to those nexthops out the interface that
the new neighbor has come up.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The pim_resolve_upstream_nh function call is no longer being used
let's remove it from the code base.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Issue: (*,G) mroute uptime is not updated in mroute table,
after deleting and adding the RP
Root cause: When RP gets deleted or becomes not reachable, then
we un-install the entry from the kernel, but still maintains the
entry in the stack. When RP gets re-configured or becomes reachable,
"show ip mroute" shows the uptime, when the channel_oil gets created.
Fix: Introduce a new time mroute_creation in the channel_oil, gets
updated when mroute gets installed in the kernel.
Signed-off-by: Sarita Patra <saritap@vmware.com>
When we delete an interface, we need to set the interface
ifindex to an internal value so that we don't end up in
a state where the re-addition of the same ifindex, due to
a rename operation, causes an infinite loop.
Fixes:#4007
Fix-Suggested-by: Saravanan K
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When we are displaying S,G string data we already auto
display the string as (S,G) no need to have ((S,G)).
Cleanup some that were found during log look through.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The pim_rp_check_is_my_ip_address function was checking to see
if we were the actual RP as well as the pim_register code
was doing the same thing. Remove the reduncancy a bit and
just make this function check for that we are the actual receiver
of this data.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The code as written will scan the entirety of all pim upstreams
on a rpf change, this is not necessary because we know that when
we get a nexthop change we already scan the upstreams reliant
on that and do this work. There is no need to do this again a
short time later.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The interface column in pim was limited to 8 or 9 columns
all over the place in pim, fix the code up to allow interface
length to be up to 16 columns.
Ticket: CM-23083
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When we are shutting down, delay the zlookup free to as
late as possible since we may need it still
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Suppose we have 2 routers A and B. Both Router A and B have
the same priority of 1000. Router A is the elected DR.
Now suppose B lowers his priority to 1. He still looses the
DR election and we are not sending a hello with the new priority.
Immediately after this A's priority is also lowered to 1, it
looses the election and sends the hello. B receives this hello
and elects A as the DR( since it has the better ip address)
At this point A believes B is the DR, and B believes A is the
DR until such time that the normal hello from B is sent to A,
which if timed correctly can be a significant amount of time).
This code just causes a hello to be sent if the priority is
changed. Now both sides will be able to converge quickly
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When RP gets deleted, find all the (*, G) upstream whose group belongs to
the deleted RP, release the upstream from pnc->upstream_hash in the function
pim_delete_tracked_nexthop().
Signed-off-by: Sarita Patra <saritap@vmware.com>
When a RP gets deleted, find all the (*, G) upstream whose
group belongs to the deleted RP.
case 1: if the group belongs to any other rp, then call
pim_upstream_update() to update the upstream addr and rpf
information.
case 2: If no RP found for the group, then clear the pim
upstream address and rpf information.
Signed-off-by: Sarita Patra <saritap@vmware.com>
When a new RP is configured, find all the (*, G) upstream whose
group belongs to the new RP and then update the upstream structure
with the below fields.
1. De-register for the old RP.
2. Set the upstream address as new RP
3. Register for the new RP.
4. Update the upstream rpf information and kernel multicast forwarding
cache(MFC), if the new RP is reachable.
Signed-off-by: Sarita Patra <saritap@vmware.com>
When route to RP gets modified, FRR receives a notification from
zebra, and call the function pim_resolve_upstream_nh() to compute the
nexthop and update upstream->rpf structure.
Issue: In case when RP becomes not reachable, FRR only uninstall
the mroute from the kernal, but not update the upstream->rpf structure.
Fix: When FRR receives a notification from zebra saying RP becomes
not reachable, then update the following fields.
1. update channel_oil incoming interface as MAXVIFS
2. Un-install the mroute from the kernel.
3. Switch upstream state from JOINED to NOTJOINED.
4. Clear the nexthop information of the upstream.
Signed-off-by: Sarita Patra <saritap@vmware.com>
When route to RP gets modified, FRR receives a notification from
zebra, and call the function pim_update_rp_nh() to compute the
new nexthop and will update the source_nexthop information of
rp_info. This is not working for the case when RP becomes not
reachable.
Fix: When FRR receives a notification from zebra saying RP becomes
not reachable, then delete the source_nexthop informatio of rp_info.
Signed-off-by: Sarita Patra <saritap@vmware.com>
In this commit, we are creating a dummy upstream & dummy channel_oil
for (*, G) when RP is not configured or not reachable.
Dummy upstream: <upstream_addr = INADDR_ANY, rpf = Unknown>
Dummy channel oil: <iif = MAXVIFS>
Signed-off-by: Sarita Patra <saritap@vmware.com>
When FRR receives IGMP/PIM (*, G) join and RP is not configured or not
reachable, then we are creating a dummy upstream with incoming interface
as NULL and upstream address as INADDR_ANY.
Added upstream address and incoming interface validation where it is necessary,
before doing any operation on the upstream.
Signed-off-by: Sarita Patra <saritap@vmware.com>
When FRR receives IGMP/PIM (*, G) join and RP is not configured or
not reachable, then we are creating a dummy upstream with incoming
interface as NULL.
Added some null checks for the incoming interface, while displaying
the pim upstream information in the cli command "show ip pim upstream".
Signed-off-by: Sarita Patra <saritap@vmware.com>
Added comments which explains the new values for existing fields
and new fields in the upstream and channel_oil data structure.
Following are the summary of the behaviour change in PIM code.
Scenario 1 : RP doesn’t exist/RP not reachable
Event: Join received
Current behaviour:
No upstream gets created
Changed behaviour:
Upstream data structure created with below info
upstream_addr: INADDR_ANY
channel_oil iif: MAXVIF
channel_oil is_valid: FALSE (flag introduced to indicate if this entry is valid to get installed in hardware)
RPF details: Not valid
Join state: NOT_JOINED
Kernal installed: FALSE
Scenario 2: Dummy upstream exists
Event: RP configured
Current Behaviour:
upstream address updated for the dummy upstream created.
Changed Behaviour:
upstream_addr: RP address
channel_oil iif: MAXVIF
channel_oil is_valid: FALSE
RPF details: only RP address updated
Join state: NOT_JOINED
Kernel installed: FALSE
Scenario 3: Dummy upstream exists
Event: RP becomes reachable
Current Behaviour:
Update channel oil, rpf details in the upstream and install in hardware
Changed Behaviour:
upstream_addr: RP Adress
channel_oil iif: MAXVIF
channel_oil is_valid: FALSE
RPF details: RPF details updated via NHT callback
Join state: JOINED
Kernel installed: TRUE
Scenario 4: MRoute exists
Event: RP gets deleted
Current behaviour:
Nothing got updated in him upstream and channel oil,
join timer still runs. Mroute still exists in kernel.
Changed behaviour:
upstream_addr: INADDR_ANY
channel_oil iif: MAXVIF
channel_oil is_valid: FALSE
RPF details: Not valid
Join state: NOT_JOINED (also sent prune towards deleted RPF nbr)
Kernel installed: FALSE
Scenario 5: MRoute Exists
Event: RP unreachable
Current behaviour:
Nothing got updated in him upstream and channel oil,
join timer still runs. Mroute sdeleted from kernel.
Changed behaviour:
upstream_addr: RP address
channel_oil iif: MAXVIF
channel_oil is_valid: FALSE
RPF details: only RP address updated
Join state: NOT_JOINED (also sent prune towards deleted RPF nbr)
Kernel installed: FALSE
Scenario 6: Mroute exists
Event: Better RP configured with precise group range & reachable.
Current behaviour:
No effect on existing route.
Changed behaviour:
Upstream address: Better RP
RPF interface: towards the better RP
Join state: JOINED (Send a prune towards the old RP and send a join
towards the better RP)
Scenario 7: Mroute exists
Event: RP deleted and another RP with broad group range fits this group & reachable
Current behaviour:
No effect on current behaviour
Changed behaviour:
Upstream address: next available RP
RPF interface: towards the next available RP
Join state: JOINED (Send a prune towards the old RP and send a join
towards the better RP)
Signed-off-by: Sarita Patra <saritap@vmware.com>
Add a test command to pim that allows you to reset the keepalive timer
for an upstream to it's max value. This is to allow purposeful testing
of cleanup code in pim, by forcing the keeaplive timer to expire later.
robot# show ip pim upstream
Iif Source Group State Uptime JoinTimer RSTimer KATimer RefCnt
enp3s0 192.168.201.136 225.1.0.0 NotJ,RegP 00:00:10 00:00:52 00:00:25 00:02:54 1
robot# show ip pim upstream
Iif Source Group State Uptime JoinTimer RSTimer KATimer RefCnt
enp3s0 192.168.201.136 225.1.0.0 NotJ,RegP 00:00:11 00:00:51 00:00:24 00:02:53 1
robot# test pim keep 192.168.201.136 225.1.0.0
Setting (192.168.201.136,225.1.0.0) to current keep alive time: 210
robot# show ip pim upstream
Iif Source Group State Uptime JoinTimer RSTimer KATimer RefCnt
enp3s0 192.168.201.136 225.1.0.0 NotJ,RegP 00:00:27 00:00:35 00:00:08 00:03:27 1
robot#
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Issue: Configure "ip pim rp x.x.x.x 225.0.0.0/4".
Show running config shows "ip pim rp x.x.x.x 224.0.0.0/4"
This is mis-leading.
Root-cause: Internally 225.0.0.0/4 is getting converted to
224.0.0.0/4 group mask, since the prefix length is 4.
Fix: Restrict the user to configure inconsistent group address
mask by throughing a cli error "Inconsistent address and mask".
Signed-off-by: Sarita Patra <saritap@vmware.com>
Issue: Shut the RP interface in the router RP. LHR will get to know
RP becomes not-reachable, so it send a prune towards the RP. On
receiving the prune, RP clear the (*, G) entry, but (S, G) should
not get removed if present.
Now no-shut the RP interface in the router RP. LHR will send a (*, G)
join towards the RP. On receiving join FRR create the (*, G) entry.
Along with this, it also add the interface(join received) in the OIL
of (S, G) and also refresh the (S, G) timer.
Fix: Dont refresh the timer for S, G or (*, G), if the flag for the
channel OIL is PIM_OIF_FLAG_PROTO_ANY.
Signed-off-by: Sarita Patra <saritap@vmware.com>
Add a command to track if an interface should be in active-active
mode or not. This command is hidden at this time because it
is not finished fully.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
- some target_CFLAGS that needed to include AM_CFLAGS didn't do so
- libyang/sysrepo/sqlite3/confd CFLAGS + LIBS weren't used at all
- consistently use $(FOO_CFLAGS) instead of @FOO_CFLAGS@
- 2 dependencies were missing for clippy
Signed-off-by: David Lamparter <equinox@diac24.net>
If you have an interface being added to a static mroute
and that interface has been configured w/ pim but does
not have a valid ip address yet, we do not create a
VIF for that device yet. As such when we attempt
to assign the vif array in the pim static data structure
we attempt to write into -1 of that array.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The pimg data structure is only used in one spot to send the default
vrf id to zebra upon startup. Add the default vrf id to the struct pim_router
data structure and remove the pimg pointer.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Create a `struct pim_router` and move the thread master into it.
Future commits will further move global varaibles into the pim_router
structure.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Just add the ability to notice the capabilities on startup,
but don't do anything with it yet.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When we receive a igmp report and attempt to initiate
a pim ifchannel for it and that fails to work then
let's back out the work done setting stuff up to this
point.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When we fail to add a local membership add some additional debugs
so that we can have a bit more information on when something goes
bad.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
It's been a year since we added the new optional parameters
to instantiation. Let's switch over to the new name.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When sending a sockoption for MRT_INIT, *bsd requires that
the data passed in must be 1. While linux does not, the
code was sending in a positive value that was causing issues
on *bsd of protocol not supported.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When trying to run PIM on *bsd, the kernel expects to only
allow the pim kernel socket to work if we elevate priviledges.
So do so.
This commit gets us further in the startup of PIM on *bsd
but is not sufficient to get it fully started yet.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Introduce frr-interface.yang, which defines a model for managing FRR
interfaces.
Update the 'frr_yang_module_info' array of all daemons that will
implement this module.
Add automatically generated stub callbacks in if.c. These callbacks will
be implemented in the following commit.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
FRR_DAEMON_INFO should now contain an array of 'frr_yang_module_info'
structures describing the YANG modules implemented by the daemon.
This array will be used by frr_init() function to load all YANG modules
and initialize the northbound callbacks during the daemon initialization.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Resolves issue with exit-vrf being placed at the end of zebra's portion
of a vrf block, but before other daemons' portions of the same config
block.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
The ->hash_cmp and linked list ->cmp functions were sometimes
being used interchangeably and this really is not a good
thing. So let's modify the hash_cmp function pointer to return
a boolean and convert everything to use the new syntax.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
This commit fixes two issues during pim shutdown.
1) The rp_info structure was being freed before the
outgoing notifications that depended on it's information
was sent out as part of shutdown.
2) The pim->upstream_list shutdown involved iterating
over the list via ALL_LIST_ELEMENTS. This typically
is enough but pim will auto delete child nodes as well
as itself when it goes away and they depend on it. As such
the node and nnode could possibly already have been freed.
So change the way we look at all the data in the upstream_list
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Suppose we have a bridge with a host and two routers attached
to it.
r1 r2
| |
--------
|
host
host is sending traffic.
r1 and r2 are pim neighbors and r2 is the DR.
Both r1 and r2 will receive data from the stream up the pim
kernel socket. r1 will notice that it is not the DR and
stop processing in pim. This code adds a bit more code to blackhole
the route when r1 detects it is not the DR in this scenario.
This is being done because the kernel is both keeping state and
sending data to the pim process to continue processing this.
Additionally if we happen to be running this on a asic, then
blackholing the route in the asic can save a significant amount
of cpu time handling this situation.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When we decide we are not the right pim process to add upstream state
for the igmp state received, notice this in a debug to make life
easier to debug.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The tracking of who have drpriority on an interface
in pim was not displayed anywhere. Add to the show
command for future reference.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
In pim_if_new use bool instead of an int to pass
true/false values for what we should create the
pim interface type for.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The startup of a non-integrated config was not properly
allowing for startup to create the vif when we have
not learned about the interface we are trying to configure
at this point in time. Actually notice when we are
trying to create a pimreg device or not to properly
notice when to attempt to create the vif or not.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Allow at timer wheel creation time the ability to specify a
name for what we want the 'show thread cpu' to show up as.
Modify pim to note this.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
A new command "ip pim" is created to configure pim sm on an
interface, which replaces the existing commands "ip pim sm"
and "ip pim ssm" and make "ip pim sm" and "ip pim ssm" as
hidden commands. The command "ip multicast-routing" is removed
since it is already enabled on FRR by default.
Signed-off-by: Sarita Patra saritap@vmware.com
config.h (or, transitively, zebra.h) must be the first include file
listed for autoconf things like _GNU_SOURCE and _POSIX_C_SOURCE to work
correctly.
Signed-off-by: David Lamparter <equinox@diac24.net>
Since we're now building through one large Makefile, we can easily put
things with their daemons and crossreference nicely.
Signed-off-by: David Lamparter <equinox@diac24.net>
Problem reported that some bgp and ospf json commands did not return
any json output at all if the bgp/ospf instance did not exist.
Additionally, some bgp and ospf json commands did not return any json
output if the instance existed but no neighbors were defined. This
fix makes these commands more consistent in returning empty braces for
json output and issue a message if not using json output. Additionally,
made the flag "use_json" a bool to make it consistent since previously,
it had been defined as an int, char, u_char, and bool at various places.
Ticket: CM-21040
Signed-off-by: Don Slice <dslice@cumulusnetworks.com>
The Vrf aliases can be known with a specific hook. That hook will then,
from zebra propagate the information to the relevant zapi clients.
The registration hook function is the same for all daemons.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
End user was seeing this debug but we are not giving
the user enough information to debug this on his own.
Add a tiny bit of extra information that could point
the user to solving the problem for themselves.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When pimd is getting terminated, pim_upstream_del() gets called as
part of cleaning process. pim_upstream_del() deletes the route and
assigns NULL to the up->channel_oil. It also deletes each if_channel
by calling the function pim_ifchannel_delete().
pim_ifchannel_delete() internally calls the caller function pim_upstream_del(),
if it is the last ifchannel for that upstream. So pim_upstream_del
is getting called twice, which will access the up->channel_oil which
was already set to NULL before. This results in crash.
Fix:
pim_ifchannel_delete() should call pim_upstream_del (caller function)
only if the up->ref_count > 0. Added an assert(up->ref_count > 0) in
the function pim_upstream_del().
Signed-off-by: Sarita Patra <saritap@vmware.com>
e.g.
pimd/pim_oil.c: In function ‘pim_channel_oil_dump’:
pimd/pim_oil.c:51:19: error: ‘%d’ directive writing between 1 and 11 bytes into a region of size 10 [-Werror=format-overflow=]
Build on gcc-8.2.0 is warning-free after this patch.
Signed-off-by: David Lamparter <equinox@diac24.net>
While terminating pim instance, the memory allocated for pim nexthop
should be released before deallocating the memory of pim nexthop cache(pnc).
This resolves the memory leak detected in pnc->nexthop creation.
Signed-off-by: Sarita Patra <saritap@vmware.com>
* Use the correct license header
* Stop headers from including themselves
* Use uniform relative include conventions
* Ensure that sources include what they use
* Turn off clang-format around struct array blocks
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
The definition of the interface commands in vtysh.c were outdated.
Currently, all daemons that call if_cmd_init() will have the "no interface
IFNAME" command and the "[no] description" commands as well, so there's
no need to define exceptions for these commands anymore.
To fix this, make extract.pl parse the if.c file so that vtysh can get the
interface commands from there automatically. Only the "interface IFNAME
[vrf NAME]" must be kept in vtysh.c because it changes the vty node and
thus needs special treatment.
Finally, make pimd and pbrd display interface descriptions on "sh run"
when they are configured.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
There is no need to check for failure of a ALLOC call
as that any failure to do so will result in a assert
happening. So we can safely remove all of this code.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Don't show BFD commands with timers since it might confuse users
("show running-config" won't display timers in client daemons anymore),
but keep accepting this command from previous configurations.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
When BFD timers are configured, don't show it anymore in the daemon
side. This will help us migrate the timers command from daemons to
`bfdd`.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
pimd/pim_nht.c: In function ‘pim_ecmp_nexthop_search’:
pimd/pim_nht.c:523:17: error: ‘nbr’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
nexthop->nbr = nbr;
(on gcc 5.4.0; this is the only warning with that version.)
Signed-off-by: David Lamparter <equinox@diac24.net>
When calling route_map_finish, every place that we do we must
first set the deletion event to NULL, or we will create an infinite
loop, if we are using the delayed route-map application code.
As such we might as well just make the route_map_finish code
do this work, as that there is really no viable alternative here
and route_map_finish should only be called on shutdown.
This fixes an infinite loop in zebra on shutdown when there
are route-maps.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
On shutdown and cleaning up pim_upstream ensure that the
upstream_sg_wheel still exists to remove item from.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When shutting down, do not free oil information after
interface information since we use the data there to
do so.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
frr_fini and pim_free both call zprivs_terminate. There is
no need for pim_free to call this function.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
If `ip igmp query-max-response-time` is set move it to
display first as that this command has order dependencies
on `ip igmp query-interval`.
Ticket: CM-21598
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When there is a return at the end of a function, there
is no need for another one immediately after it.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The pim_socket_join_source function only ever calls
pim_igmp_join_source and pim_socket_join_source is only
called from 1 place. Skip the level of indirection.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Abstract the RPF change for upstream handling code so
that we do not have two copies of the code.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
After we have decided what has changed as part of a update
we need to send the j/p messages to our peers.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Packet sending in PIM is a two step process.
1) Gather data size of next G to be packed into a packet.
2) Write data
After 1 we need to ensure that the next G to pack will actually
fit in a packet. If it does not send what we've currently written
and start a new packet to send.
Because this was a 2 step process it is important to be consistent
in what you think you have packed -vs- what you think you should.
PIM has a bug where we were considering S,G RPT Prunes for a *,G
even when the *,G was being pruned. This lead to a situation where
we were figuring a write size of more data then what we actually wrote
into a packet. This would leave a 8 byte whole of 0's in the packet
due to the way we moved pointers around.
Fix the code so that we do not attempt to consider S,G rpt prunes
for a *,G prune when figuring out how much we should write in step 1.
Ticket: CM-21644
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When we receive a IGMP report on an interface, do not create upstream
state for that request, unless we are the DR for the incoming interface.
This will prevent a interface on a LAN segment from causing traffic
to flow to itself.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
In places where we do a pim_ecmp_nexthop_search, also
use pim_ecmp_nexthop_lookup instead of the single path
case of pim_nexthop_lookup.
This is in preparation of more serious surgery to fix
the weird api of pim_find_or_track_nexthop.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Both pim_ecmp_nexthop_lookup and pim_ecmp_fib_lookup_if_vif_index
pass the address in 2 times. Make function calls consistent
and just pass in the src once.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The pim_ecmp_fib_looikup_if_vif_index does practically
the same work as pim_ecmp_nexthop_lookup, refactor to
use that function so that we do not have more code
that must parse the results from zclient_lookup_nexthop.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When doing nexthop lookups do not permanently allocate
memory in zebra and pim to track the nexthop specified
on the cli.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When we are looking up a RPF with a ecmp path, there
are situations where we are failing to find a path change
because we were not considering the actual number of neighbors
we have available to us at the start of the loop.
Example:
Suppose 2 way ecmp with a neighbor on each path. We have
multiple upstreams that are strewn across both paths.
If we loose a pim neighbor on one of the paths we would
initiate a rescan of the upstreams. If the neighbor
we lost happened to be the last ecmp path we rescanned
we would not successfully find a new path and leave
the upstream stranded.
This code change looks at the number of available neighbors
that we have -vs- the number of paths we have and chooses
the smaller of the two for figuring out what to do.
There probably exist other failure scenarios as well that
I am missing here and quite frankly the current code muddies
the water between a RPF lookup failure -vs- a RPF lookup succeeded
and there are no paths. Further work is needed here imo.
Additionally this idea of a pim_ecmp_nexthop_lookup and
pim_ecmp_nexthop_search is bogus. They are the same function and
should be merged at some point in time.
Ticket: CM-21599
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
There is no reason that a IGMP src should need a upstream
pim neighbor when doing a RPF lookup.
Ticket: CM-21599
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Unless someone intentionally changes MCAST_ALL_ROUTERS ("224.0.0.2") with a
wrong IP, this should never fail, so the fix is using "(void)" at the left
of the function call, as an explicit way of indicating we discard the
return value on purpose.
Signed-off-by: F. Aragon <paco@voltanet.io>
Additional fix over d94023d85c (PR #2546)
Removed all pointer arithmetic used for the checks, while keeping same
coverage. I hope this removes the Coverity warning (If this don't fix it, I'll
make Coverity work with a fork and try there as many times as necessary)
Signed-off-by: F. Aragon <paco@voltanet.io>
Additional fix over 18e994a043 (PR #2457)
Previous correction was not enough for fixing the Coverity warning. Now we
ensure we don't overflow the buffer.
Signed-off-by: F. Aragon <paco@voltanet.io>
route_map_mark_updated has a `int del_later` variable
that is passed in but never used. Just remove it.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Fix a couple of problems in my 1st fix for PIM nexthops reachable via a
connected route:
Use NEXTHOP_TYPE_IPV4_IFINDEX instead of NEXTHOP_TYPE_IPV4 since we add an
IPv4 address to an already known ifindex.
Assign nexthop_tab[num_ifindex].protocol_distance and .route_metric before
incrementing num_ifindex.
Revert the default: to individual switch case statement conversion in
zclient_read_nexthop() as requested by donaldsharp in #2347
Signed-off-by: Martin Buck <mb-tmp-tvguho.pbz@gromit.dyndns.org>
These commands were being accepted in all vrf's and
affecting all vrf's behavior globally, since they were
global variables.
Modify the code to make these two commands work
on a per-vrf basis.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When sending a PIM join upwards on the RP-based tree, it may get dropped on
the last hop before the RP if the RP is reachable via a connected route
(i.e. there's no associated nexthop). pimd needs to put the nexthop IP
address into the PIM join payload and fails to do that if that route has a
nexthop of 0.0.0.0. So whenever we look up a route to determine the nexthop
or we receive a nexthop tracking update from Zebra, use the destination
address as the nexthop address for connected routes.
Fixes#2326.
Signed-off-by: Martin Buck <mb-tmp-tvguho.pbz@gromit.dyndns.org>
The assignment of sa with the usage of hash_get and hash_alloc_intern
can never fail. No need to look for a failure case.
Found by Coverity SA.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Cleanup the pim->rpf_hash after upstream cleanup is done
since upstream cleanup uses the rpf_hash to cleanup itself.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The pim_upstream_free command was leaving slag by
not deleting data associated with the upstream
data structure. Modify the code to explicitly free
all data associated with an upstream on a pim instance
deletion event. Additionally the end result is that
the pim_upstream_free command is not needed anymore
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When the underlying networking subsystem is fundamentally
changed via some system controls. If we have msdp running
there exists a possibility that we need to stop some running
timers to prevent a crash.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The interface itself knows if it is a vrf device or
not, so let's just use a check for that in the decision
if a interface is a loopback or not.
Additionally modify function to return a bool.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Adding to mtracebis querying with group address. Same change
to vtysh mtrace command. Support for querying (S,G) and (*,G)
state in mtrace router code. Further improvments to mtrace router
code with closer complience to IETF draft. More references in
comments to the draft. Man page has been updated accordingly.
Signed-off-by: Mladen Sablic <mladen.sablic@gmail.com>
The following types are nonstandard:
- u_char
- u_short
- u_int
- u_long
- u_int8_t
- u_int16_t
- u_int32_t
Replace them with the C99 standard types:
- uint8_t
- unsigned short
- unsigned int
- unsigned long
- uint8_t
- uint16_t
- uint32_t
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Mtrace router code has been fixed for the case where no PIM
interface exists towards the source and next-hop is returned.
fwd_ttl is set to one, for correct reporting of mcast ttl.
Also, NO_MULTICAST error is returned when there is no PIM.
And prev_hop is set to source, for case when connected to source,
to help termination of hop-by-hop queries.
Dependency of logging on PIM, when sending mtrace packet, has been
removed.
Signed-off-by: Mladen Sablic <mladen.sablic@gmail.com>
Can't take the address of members of packed structures due to potential for
alignment faults on some platforms.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
When the pim_nexthop_lookup fails, close the opened fd
as part of the failure condition.
Additionally pim_nexthop_lookup assumes that we've
actually already looked up a nexthop in the past.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
There is no need to look at all VRF's when we need to
reevaluate the source forward since the calling function
knows the vrf.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
We know the vrf that we are in when we need to initiate a
rescan of the rpf cache. So pass it in and use that information.
This should help the rescan at scale with several vrf's cutting
out a lot of unnecessary work.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>