Commit Graph

23292 Commits

Author SHA1 Message Date
Anuradha Karuppiah
46bf266c1c zebra: debug logs to detect incorrect mac deletions
A MAC entry cannot be deleted while a neigh is referencing it. It seems
there is some race condition where this may be happening. The log is
to help identify those cases.

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
2020-12-01 09:46:28 -08:00
Anuradha Karuppiah
4f9bb78eca zebra: change the L2 NHG id format to co-exist with the L3NHG ids
It is now 4bits of type and 28bits of value -
1. type=0 is for L3 NHG
2. type=1 is for L2 NH
3. type=2 is for L2 NHG

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
2020-12-01 09:46:28 -08:00
Anuradha Karuppiah
5de10c3705 zebra: allocate one nexthop id per-VTEP instead of one per-ES-VTEP
This is an optimization to reduce the number of L2 nexthops. A
l2 or fdb nexthop simply provides the dataplane with a nexthop ip-
torm-12:mgmt:~# ip nexthop
id 268435461 via 27.0.0.20 scope link fdb
id 268435463 via 27.0.0.20 scope link fdb
id 268435465 via 27.0.0.20 scope link fdb

So there is no need to allocate a nexthop per-ES/per-VTEP. There
can be 100+ ESs per-VTEP so this change cuts the scale down by a
factor of 100.

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
2020-12-01 09:46:28 -08:00
Anuradha Karuppiah
15400f95b7 zebra: support for slow-failover of local MACs on an ES
When a local ES flaps there are two modes in which the local
MACs are failed over -
1. Fast failover - A backup NHG (ES-peer group) is programmed in the
dataplane per-access port. When a local ES flaps the MAC entries
are left unaltered i.e. pointing to the down access port. And the
dataplane redirects traffic destined to the oper-down access port
via the backup NHG.
2. Slow failover - This mode needs to be turned on to allow dataplanes
not capable of re-directing traffic. In this mode local MAC entries
on a down local ES are re-programmed to point to the ES-peers'
NHG. And vice-versa i.e. when the ES comes up the MAC entries
are re-programmed with the access port as dest.

Fast failover is on by default. Slow failover can be enabled via the
following config -
evpn mh redirect-off

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
2020-12-01 09:46:26 -08:00
Anuradha Karuppiah
69711b3f83 zebra: on local mac add from the dplane a re-install maybe need as static
As a part of extended MM handing a MAC can be updated from local
to remote while being referenced by SYNC neighs (this is really a
temporary/small window). During this window if the MAC transitions
back to local again we need to re-inforce the previous SYNC flags
(based on the sync-neigh count) as subsequent SYNC updates to the
MAC will be de-duped and ignored.

Ticket: CM-29636

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
2020-12-01 09:44:37 -08:00
Anuradha Karuppiah
1a4f9efd54 zebra: set inactive bit when zebra re-installs the MAC on dplane del
When a local mac is deleted by the dataplane zebra can re-install it
if the MAC is a SYNC MAC (learned from ES peers). The "local_inactive"
bit must be set as a part of the re-install to prevent zebra turning
around and advertising the MAC as locally active.

Also fixed up some debug logs in the slow-fail path to include the VNI.

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
2020-12-01 09:44:37 -08:00
Anuradha Karuppiah
80e19eb71f zebra: skip NDA_DST attr if NHG is present
NHG and DST (VTEP-IP) are mutually exclusive attributes. If DST is
present the kernel ignores NHG.

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
2020-12-01 09:44:37 -08:00
Anuradha Karuppiah
de86cc5bb1 zebra: free up the L2 NHG bitmap as a part of shutdown
Fix for a shutdown time memory leak found during review.

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
2020-12-01 09:44:37 -08:00
Anuradha Karuppiah
f3722826a4 zebra: remove FDB entries before de-activating a L2-NHG
NHG is activated i.e. programmed in the dataplane only if there
are active-VTEPs associated with it. When a NHG is de-activated
all the remote-mac entries associated with it need to be removed
before the NHG is removed.

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
2020-12-01 09:44:37 -08:00
Donald Sharp
9f3bcd52e1
Merge pull request #7641 from rampxxxx/fix_run_folder
Fix run folder permissions
2020-12-01 12:40:40 -05:00
Patrick Ruddy
0091461961
Merge pull request #7483 from AnuradhaKaruppiah/evpn-mh-dad
bgpd, zebra: Keep DAD disabled if EVPN MH is turned on
2020-12-01 17:37:32 +00:00
Emanuele Di Pascale
265ac74a87 zebra: fix show ip route vrf X summary
The lookup for non default VRFs was always using a tableId; if not
provided, we were defaulting to RT_TABLE_MAIN. This is fine for the
default VRF but not for others. As a result, the command was silently
failing for non-default VRFs unless we also specified the correct tableId.

Fix this by only performing the lookup using the tableId if it is
provided; else use zebra_vrf_table.

Signed-off-by: Emanuele Di Pascale <emanuele@voltanet.io>
2020-12-01 18:34:05 +01:00
Donald Sharp
5c1a899432
Merge pull request #7632 from idryzhov/vtysh-memory-fixes
vtysh memory fixes
2020-12-01 12:08:52 -05:00
Stephen Worley
306720345a zebra: make a couple NHG errors debugs
A couple NHG messages we were logging as errors are a bit spammy
in usecases where you routinely add/remove interfaces (VM heavy
deployments). Its not really an error a user cares about and
more for a developer to know what went wrong after the fact so
it makes more sense for these to be under a debug rather than
an error since seeing them does not implicitly mean error during
those usecases.

Signed-off-by: Stephen Worley <sworley@nvidia.com>
2020-12-01 12:04:30 -05:00
Mark Stapp
958c62b712
Merge pull request #7642 from donaldsharp/remove_pimd_version
pimd: Remove pim_version.c it is never used
2020-12-01 11:59:31 -05:00
Donald Sharp
9171f28204
Merge pull request #7640 from opensourcerouting/bfd-echo-minttl-check
bfdd: session specific command type checks
2020-12-01 09:31:57 -05:00
Sarita Patra
851cb2e5de pimd: mesh group not removed when configured for multiple groups
Issue:
Configure msdp mesh mg1 for 2 groups.
ip msdp mesh-group mg1 member 1.1.1.1
ip msdp mesh-group mg1 member 1.1.1.2

Remove mg1 for 1.1.1.1
no ip msdp mesh-group mg1 member 1.1.1.1

mg1 for 1.1.1.1 still present. This is fixed.

Signed-off-by: Sarita Patra <saritap@vmware.com>
2020-12-01 05:35:57 -08:00
Sarita Patra
e224558939 pimd: rp not removed when configured for multiple groups
Issue:
Configure RP.
ip pim rp 20.0.0.1 239.1.1.1/32
ip pim rp 20.0.0.1 239.1.1.2/32

Remove RP.
no ip pim rp 20.0.0.1 239.1.1.1/32
Rp is not removed. This is fixed now.

Signed-off-by: Sarita Patra <saritap@vmware.com>
2020-12-01 05:34:54 -08:00
Donald Sharp
91f20e5cd5 pimd: Remove pim_version.c it is never used
The pim_version.[c|h] files are never used and we are getting
warnings about PIM_VERSION changing pointer sizes from
newer versions of the compiler.  I see no reason to keep this

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2020-12-01 07:57:45 -05:00
Donald Sharp
fc54da0966
Merge pull request #7404 from vishaldhingra/pim
pimd: (*,G) Prune processing doesn't remove SGRpt ifchannel
2020-12-01 07:48:54 -05:00
Donald Sharp
b8a9f6c6a2
Merge pull request #7578 from mjstapp/fix_pim_subdir_am
pimd: fix build and compilation errors
2020-12-01 07:41:33 -05:00
Javier Garcia
d3a3e6253b tools: Fix run folder permissions
In the case of some linux distros the /var/run dir is mounted
with tmpfs so in every reboot it's removed.
Then the frrcommon.sh will recreate it without 'x' perm
So no pid file cannot be created in /var/run/frr

Signed-off-by: Javier Garcia <rampxxxx@gmail.com>
2020-12-01 12:37:51 +01:00
Rafael Zalamena
7d2de131ce bfdd: session specific command type checks
Replace the unclear error message:

```
% Failed to edit configuration.

YANG error(s):
 Schema node not found.
 YANG path: /frr-bfdd:bfdd/bfd/sessions/single-hop[dest-addr='192.168.253.6'][interface=''][vrf='default']/minimum-ttl
```

With:

```
frr(config-bfd-peer)# minimum-ttl 250
% Minimum TTL is only available for multi hop sessions.

! or

frr(config-bfd-peer)# echo
% Echo mode is only available for single hop sessions.
frr(config-bfd-peer)# echo-interval 300
% Echo mode is only available for single hop sessions.
```

Reported-by: Trae Santiago
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
2020-12-01 08:01:37 -03:00
Igor Ryzhov
fdac05fd9a doc: add description of the new memory macro
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
2020-12-01 11:39:40 +03:00
Donatas Abraitis
6b1ca21086
Merge pull request #7630 from donaldsharp/snmp_clarity
doc: Explicitly call out need to add snmp module
2020-12-01 10:32:06 +02:00
Philippe Guibert
aa72bd7e7f bgpd: add peer description for each afi/safi line in show summary
For each afi/safi of 'show bgp summary', display the peer description
each time needed. This information is useful, for instance in the case
of a device connected with multiple peers.
The topotest all_protocol_startup is changed accordingly.

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
2020-12-01 08:06:37 +00:00
Donald Sharp
34c9b28ba8 zebra: Reduce warn -> debug
During times of network trauma and when we are at large network scale
the process_remote_macip_add function can issue a zlog_warn for
a common occurrence.  Modify the code to be a debug statement.
This behavior is the same now as the process_remote_macip_del function

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2020-11-30 19:37:53 -05:00
Mark Stapp
aa21da071c zebra: add an api to process/clean the pending dplane queue
Add an api that allows a caller in the zebra main pthread to
process the queue of pending dplane updates. The caller supplies
a function to call to test each pending context. Selected
contexts are dequeued, and freed without being processed.

Signed-off-by: Mark Stapp <mjs@voltanet.io>
2020-11-30 16:42:18 -05:00
Anuradha Karuppiah
0c16fb7262 zebra: fix crash seen on VxLAN SG table cleanup done as a part of vrf disable
There are two fixes in this commit -
1. Prevent implicit deletion of (*,G) entries during (S,G) cleanup.
This is done by creating a dummy reference on all (*,G) entries.
This is needed for a hash-walk based table cleanup.
2. Free up the SG hash table when the VRF is deleted.

Ticket: CM-30151

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
2020-11-30 12:50:38 -08:00
Anuradha Karuppiah
d149c75aa1 pimd: skip displaying pim config on the vxlan termination device ipmr-lo
pim is enabled internally/implicitly on the vxlan termination device
and displaying that can confuse the admin and tools (such as
frr-reload).

Ticket: CM-30180

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
2020-11-30 12:50:33 -08:00
Anuradha Karuppiah
325d694b93 zebra: support for type-0 ESI
Earlier type-3 ESI was the only format supported for evpn-mh. Updated the
CLI to allow a 10-byte type-0 ESI.

Both type-0 and type-3 ESIs are statically configured; just in two different
ways -
1. type-0 is configured as a complete 10-byte string
2. type-3 is configured as a 6-byte es-sys-mac and a 3-byte
local-discriminator.

Sample config -
!
interface hostbond1
 evpn mh es-id 00:44:38:39:ff:ff:01:00:00:01
!

This is a CLI-only change and has no functional impact.

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
2020-11-30 12:36:41 -08:00
Donald Sharp
d5ecf80558
Merge pull request #7631 from mjstapp/fix_pw_ctx_leak
zebra: free dplane ctx after pw update
2020-11-30 13:48:38 -05:00
Igor Ryzhov
6df43392d8 vtysh: fix incorrect memory statistics
As code comment states, 1 count of MTYPE_COMPLETION is leaked for each
autocompleted token. Let's manually decrement the counter before passing
the pointer to readline.

Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
2020-11-30 18:55:40 +03:00
Igor Ryzhov
40ab41115d vtysh: fix memory leak
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
2020-11-30 18:55:40 +03:00
Mark Stapp
a20e6c32a2 zebra: free dplane ctx after pw update
Free the dplane contexts used for pseudowire updates; we were
leaking these.

Signed-off-by: Mark Stapp <mjs@voltanet.io>
2020-11-30 10:02:40 -05:00
Igor Ryzhov
d313699058 ospf6: move serv_close to ospf6_delete
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
2020-11-30 17:36:10 +03:00
Igor Ryzhov
1fa5858735 ospf6: fix crash on shutdown
The crash is sometimes reproduced by all_protocol_startup topotest.

Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
2020-11-30 17:36:10 +03:00
Igor Ryzhov
f5f26b8fca ospf6: get instance from lsdb data
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
2020-11-30 17:36:10 +03:00
Igor Ryzhov
e285b70d3c ospf6: get instance from route table scope
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
2020-11-30 17:36:10 +03:00
Donald Sharp
7815b994c3 doc: Explicitly call out need to add snmp module
The documentation implied how snmp works.  Explicitly call
it out a bit more for future users.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2020-11-30 08:39:29 -05:00
Rafael Zalamena
78695ce3a4
Merge pull request #7621 from idryzhov/fix-cisco-access-list
yang: fix cisco access list source value
2020-11-30 09:16:33 -03:00
Duncan Eastoe
f9bf1ecc38 zebra: dplane FPM LSP table walk
Add routines to walk the LSP table and generate FPM updates for all
entries. A walk of the LSP table is triggered when (re-)connecting
to an FPM.

Signed-off-by: Duncan Eastoe <duncan.eastoe@att.com>
2020-11-30 12:13:43 +00:00
Rafael Zalamena
9173163369
Merge pull request #7620 from ckishimo/cosmetic2
ospfd: fix a couple of typos
2020-11-30 08:53:33 -03:00
Donald Sharp
d615c345f2 ospfd: Restore POINTOMULTIPOINT to working order
Commit: 1d376ff539 removed
the code to find nexthops for the POINTOMULTIPOINT and
replaced it with a generic bit of code that was
supposed to handle both POINTOPOINT and POINTOMULTIPOINT
the problem is that the ospf rfc states that the
network mask on point to multipoint should be /32
which will not allow you to properly do a prefix match
on it against the network.

Restore original behavior as much as possible and leave
the new POINTOPOINT code alone.

Fixes: #7624
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2020-11-28 15:35:18 -05:00
Donald Sharp
67b8d7ec09 eigrpd: Remove unneeeded if state types
There are some unused/unneeded if state types in eigrp, remove them

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2020-11-28 07:45:08 -05:00
Donald Sharp
2734c7471c
Merge pull request #7584 from opensourcerouting/topotest-asan-fix
tests: Fix Topotest runs with newerversion of Address Sanitizer
2020-11-27 17:35:08 -05:00
Igor Ryzhov
de4132bfe5 yang: fix cisco access list source value
Source value must be a choice between host, network and any, not a set
of all three.

Fixes #7599.

Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
2020-11-27 21:53:25 +03:00
Martin Winter
960c3f25f5
tests: Ignore YANG stderr messages in test_all_protocol_startup test
Signed-off-by: Martin Winter <mwinter@opensourcerouting.org>
2020-11-27 19:45:15 +01:00
Martin Winter
8972e2710c
tests: Fix logging output directory for older tests
Signed-off-by: Martin Winter <mwinter@opensourcerouting.org>
2020-11-27 19:45:15 +01:00
Martin Winter
1a31ada871
tests: Fix FRR process shutdown in failed topotest teardown phase
Signed-off-by: Martin Winter <mwinter@opensourcerouting.org>
2020-11-27 19:45:14 +01:00