Commit Graph

32927 Commits

Author SHA1 Message Date
Donald Sharp
6f6b7e1706 doc: Add --v6-with-v4-nexthops documentation
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-08-03 08:57:55 -04:00
Donald Sharp
0435b31bb8 bgpd: Allow bgp to specify if it will allow v6 routing with v4 nexthops
Add a `--v6-with-v4-nexthop` cli to bgp to allow it to peer with
neighbors in the configuration where the interface has no v6 addresses
at all and there is a v4 address that is usable as a v4 address
embedded in a v6 address.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-08-03 08:25:20 -04:00
Donald Sharp
95002ded3e bgpd: Do not allow a peer to come up on v6 if we have no ability to route
Modify bgp to not allow a v6 peer to come up if the v6 afi is negotiated
and the outgoing interface has no v6 address as well as zebra does
not support the v6 with v4 nexthop capabilities that some dataplanes
allow.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-08-03 08:25:20 -04:00
Donald Sharp
052debc3ee bgpd: Have bgp notice the zebra ability to use v6_with_v4_nexthops
Store the data.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-08-03 08:25:20 -04:00
Donald Sharp
68f52d7a0c lib, zebra: Send up whether or not v6_with_v4_nexthops are supported
After Zebra knows it's capability surrounding v6 with v4 nexthops
have it send this ability up to interested parties.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-08-03 08:25:20 -04:00
Donald Sharp
1f5611c06d zebra: Allow zebra cli to accept v6 routes with v4 nexthops
add --v6-with-v4-nexthop cli to zebra to allow operator to
specify that this functionality is allowed.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-08-03 08:25:20 -04:00
Donald Sharp
7415f1e120
Merge pull request #14129 from samanvithab/bgpd_frr_fix
bgpd: Fix for session reset issue caused by malformed core attributes  in update message
2023-08-02 13:48:14 -04:00
Donatas Abraitis
dd08585f1a
Merge pull request #13466 from donaldsharp/remove_unneeded_test_files
tests: Remove unused file in isis_snmp test
2023-08-02 17:26:26 +03:00
Donald Sharp
cbbbf64f9a tests: Remove unused file in isis_snmp test
The */show_ip_route.ref files are never used, let's remove them
to prevent any future issues.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-08-02 07:15:59 -04:00
Samanvitha B Bhargav
70ff940fd1 bgpd: Fix session reset issue caused by malformed core attributes
RCA:
On encountering any attribute error for core attributes in update message,
the error handling is set to 'treat as withdraw' and
further parsing of the remaining attributes is skipped.
But the stream pointer is not being correctly adjusted to
point to the next NLRI field skipping the rest of the attributes.
This leads to incorrect parsing of the NLRI field,
which causes BGP session to reset.

Fix:
The stream pointer offset is rightly adjusted to point to the NLRI field correctly
when the malformed attribute is encountered and remaining attribute parsing is skipped.

Signed-off-by: Samanvitha B Bhargav <bsamanvitha@vmware.com>
2023-08-01 23:17:19 -07:00
Jafar Al-Gharaibeh
ec8ae8f093
Merge pull request #14125 from opensourcerouting/fix/drop_unused_lua_stuff
lib: Do not use time_t as a special Lua encoder/decoder
2023-08-01 23:52:56 -05:00
Donatas Abraitis
5da58d355a
Merge pull request #14116 from donaldsharp/test_naming
Test naming
2023-08-01 18:13:02 +03:00
Donald Sharp
901437c26b
Merge pull request #14071 from achernavin22/ospf-fix-abr-type-chg
ospfd: fix changing ABR type between standard and shortcut
2023-08-01 10:42:54 -04:00
Donald Sharp
369bdcaa1e tests: Convert d1 and d2 to output and expected in gen_json_diff_report
The output of gen_json_diff_report is used all over the place and
it outputs d1 and d2.  Let's change this to output and expected
as that is how it is used.  Should help with debugging.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-08-01 07:57:16 -04:00
Donald Sharp
29848dbe98 tests: Run black over lib/topotest.py
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-08-01 07:57:16 -04:00
Donald Sharp
0099493f1e tests: Start using output and expected vs d1 and d2
Let us start using output and expected in lib/topotest.py
because when we see output it is confusing what d1 is
versus what d2 is.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-08-01 07:57:16 -04:00
Donald Sharp
031d4586f0
Merge pull request #14117 from patrasar/pimv6_13893
pimd, pim6d: Don't set SRC_STREAM flag on LHR
2023-08-01 07:30:57 -04:00
Donatas Abraitis
27dbf81a73 lib: Do not use time_t as a special Lua encoder/decoder
This is purely an integer (long long/long), and causes issues for 32-bit systems.

Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2023-08-01 14:08:25 +03:00
Donatas Abraitis
12e4a71b3a
Merge pull request #14078 from raja-rajasekar/frr_dev1
zebra: fix nhe refcnt when frr service goes down
2023-08-01 12:19:18 +03:00
Jafar Al-Gharaibeh
ea21d8c46f
Merge pull request #14115 from opensourcerouting/fix/build_alpine_images
docker: Adjustments for Alpine 3.18 and buildx
2023-07-31 23:53:12 -05:00
Jafar Al-Gharaibeh
6ce04d8b6c
Merge pull request #14121 from opensourcerouting/deb-protobuf-depend
debian: Add missing protobuf dependency
2023-07-31 23:48:55 -05:00
恭简
4aa1aace3e zebra: remove duplicated nexthops when sending fpm msg
When zebra send msg to fpm client, it doesn't handle duplicated nexthops especially, which means if zebra has a route with NUM1 recursive nexthops, each resolved to the same NUM2 connected nexthops, it will send to fpm client a route with NUM1*NUM2 nexthops. But actually there are only NUM2 useful nexthops, the left NUM1*NUM2-NUM2 nexthops are all duplicated nexthops. By the way, zebra has duplicated nexthop remove logic when sending msg to kernel.
Add duplicated nexthop remove logic to zebra when sending msg to fpm client.

Signed-off-by: 恭简 <gongjian.lhr@alibaba-inc.com>
2023-08-01 09:58:00 +08:00
Martin Winter
62559e53ac
debian: Add missing protobuf dependency
Signed-off-by: Martin Winter <mwinter@opensourcerouting.org>
2023-08-01 00:48:26 +02:00
Donatas Abraitis
fbedeb958f
Merge pull request #14042 from FRIDM636/extcomm-list-delete
bgp: add set BGP extended community for deletion command
2023-07-31 15:50:16 +03:00
Sarita Patra
8b36ee47d3 pimd, pim6d: Don't set SRC_STREAM flag on LHR
Setup:
------
R1( LHR) ---------R2( RP) ----------R3( FHR)

Problem:
-------
- Send IGMP/MLD join and traffic.
  LHR: (S,G) mroute is created with reference count = 2
  and set the flag SRC_STREAM.
  (Code flow: pim_mroute_msg_wholepkt -> pim_upstream_add,
              pim_upstream_sg_running_proc -> pim_upstream_ref)
- Send IGMP/MLD prune.
  LHR: removes (*,G) entry and it tries to remove childen (S,G) entries.
       But (S,G) is having reference count = 2. So after prune,
       (S,G) entry reference count becomes 1 and will be present
       until KAT expires.

Fix:
---
Don't set SRC_STREAM flag for LHR.
In LHR, (S,G) should be maintained, until (*,G) is present.
When prune receives delete (*,G) and children (S,G).
When traffic stops, delete (S,G) after KAT expires.

Issue: #13893

Signed-off-by: Sarita Patra <saritap@vmware.com>
2023-07-31 05:48:35 -07:00
Farid Mihoub
6e01399077 tests: test set extended-comm-list <> delete command
Signed-off-by: Farid Mihoub <farid.mihoub@6wind.com>
2023-07-31 11:52:40 +02:00
Farid Mihoub
902a8d1fd3 bgpd: add set extended-comm-list <> delete command
Signed-off-by: Farid Mihoub <farid.mihoub@6wind.com>
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
2023-07-31 11:52:36 +02:00
Donatas Abraitis
18becdc29e docker: Install the apk packages regardless of the platform
It was hardcoded to x86_64, but we build Alpine images for more platforms, let's
be dynamical here.

Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2023-07-31 11:05:15 +03:00
Donatas Abraitis
617b450d01 docker: Use openssl instead of libressl
libressl is dropped from Alpine 3.18 for s390x.

Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2023-07-31 11:04:30 +03:00
Donatas Abraitis
1aa34e5fb7
Merge pull request #14109 from taspelund/noreset_update_src
bgpd: skip reset when removing dup update-source
2023-07-31 10:03:28 +03:00
Donatas Abraitis
27330655f3
Merge pull request #14112 from donaldsharp/test_sponging
Test sponging
2023-07-31 10:02:36 +03:00
Donald Sharp
aba77de946
Merge pull request #14111 from opensourcerouting/isisd-tilfa-topotest-fixes
tests: improve stability of the IS-IS TI-LFA topotest
2023-07-29 19:22:34 -04:00
Donald Sharp
f3a4a85da7
Merge pull request #14103 from opensourcerouting/reload-keychain
tools: fix key chain reload removal
2023-07-29 14:01:31 -04:00
Donald Sharp
32bd81e365
Merge pull request #14104 from rampxxxx/bfd_audit
bfdd: add additional parameters to json command
2023-07-29 13:57:33 -04:00
Donald Sharp
15bf9baa98 tests: Convert isis to use 1 and 10 for hello/multiplier
Current isis tests use a variety of hello timers as well
as hello-multiplier, let's modify all of the isis test
cases to use 1 and 10.  This cleans up some spurious test
failures I was seeing locally.  As an example without
these changes running isis_tilfa_topo1 2r6 times I would
see 5-10 test failures now I am seeing ~2 test failures.

In any event part of the problem was that some tests were
not fully converged when looking at them under heavy
system load.  Changing this to 1/10 gives us 10 chances
to see the incoming packet.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-07-29 13:39:37 -04:00
Donald Sharp
940ac2a6fd tests: bfd_bgp_cbit_topo3 allow bgp to converge before testing
This test was failing upstream a bunch of times.  Upon examining
the log files as well as the test script it was noticed that
the bfd peers were checked to see that they had come up.  But
both the timers used for bgp as well as not checking that bgp
has actually come up would cause the test to fail in subsuquent
steps if bgp has not come up.  Test that bgp peering is actually
established before testing link down events.  It's possible
this test might need to be revisited to ensure that the routes
are actually installed and ready to go before as well, but I am
not seeing that right now.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-07-29 13:39:37 -04:00
Donald Sharp
e566e565c5 tests: Fix zebra_seg6_route to give more time for routes to be installed
This test is failing upstream regularly, when inspecting the log
files we see that the route being looked for is in a queued state
when the test fails.  Give this test more time for when the
system is under severe load.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-07-29 13:39:37 -04:00
Donald Sharp
3984301417 tests: isis_te_topo1 can fail occassionally
Upstream ( and locally ) this test fails.  The adj-sid value
being looked for in the testing is a dynamic value that is
assigned based upon how the network comes up.  The reality
is that there is no enforced order of what the adj-sid
can be.  As such this test looking for this value makes
no sense.  Let's remove that from the test.

Additionally bring the isis hello-interval to 1 down
from 3 to make things converge faster.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-07-29 13:39:37 -04:00
Donald Sharp
b0e3fcfe8c tests: zebra_netlink ensure the address is installed
Ran test under high load and system rejected the sharp
install of routes.  Only reason that that would happen
would be if the address had not been set by the kernel
yet.  The test log files had timestamp precision and the
addition of the sharp routes was under 1/10 of a second
after the address was attempted to be installed.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-07-29 13:39:37 -04:00
Renato Westphal
2dd4cad60b tests: increase wait timer in TI-LFA topotest
Starting from step 11, this topotest focuses on validating the TI-LFA
switchover functionality, where the backup nexthops are activated
after an adjacency expires, either with or without BFD.

Currently, the test checks the RIB shortly after the switchover using
a tight 5 seconds interval to ensure that the RIB update is due to the
switchover and not an SPF update (which is configured with an initial
delay of 15 seconds). However, it was observed that the kernel might
take longer than 5 seconds to install routes when the system is under
heavy load. To account for that, double the wait interval so that
this topotest will succeed even in those conditions.

Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
2023-07-29 14:18:39 -03:00
Renato Westphal
174a3d1b6e tests: ensure BFD session is up before proceeding to the next step
In this topotest, BFD is configured at the end of step 13. However,
in certain cases where the testing machine is exceptionally fast (e.g.,
Donald's quantum computer), there is a possibility that the interface
shutdown event from step 14 may occur before BFD has had sufficient
time to establish the session, which leads to a test failure. To fix
this problem, ensure the BFD session is up before proceeding to the
next step.

Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
2023-07-29 14:18:39 -03:00
Renato Westphal
e189d1ff1b isisd: update Node-SID flag dynamically
Node-SIDs refer to Prefix-SIDs associated with host prefixes of
loopback addresses. As such, whenever an interface address is added
or deleted, all configured Prefix-SIDs must be reevaluated to check
if the N-flag needs to be set or unset.

This change fixes some race conditions in the TI-LFA topotest where
specific sequence of events could cause Prefix-SIDs to not have the
N-flag set when they should, resulting in various failures.

Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
2023-07-29 14:18:39 -03:00
Renato Westphal
824b1d299a tests: increase hello multiplier in TI-LFA topotest
In this topotest, the IS-IS hello interval is set to 1 for fast
convergence. However, the current hello multiplier of 3 results in a
tight IS-IS adjacency holdtime of 3 seconds. This tight timeframe can
cause failures when the testing machine is running multiple tests at
full capacity. To improve stability under such conditions, this commit
raises the hello multiplier to 10, providing a more forgiving holdtime
and reducing the likelihood of failures.

Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
2023-07-29 14:10:31 -03:00
lynnemorrison
59342702bf bfdd: add debug flag around log
Signed-off-by: Lynne Morrison <lynne.morrison@ibm.com>
2023-07-28 12:38:02 -04:00
Trey Aspelund
62a452c47f bgpd: skip reset when removing dup update-source
When 'no neighbor .. update-source' is issued for a regular peer, that
peer is always reset.  This is unnecessary if the peer is a member of a
peer-group and it inherits an identical update-source, so let's skip
the reset/Notification for that condition.

Config:
------------
router bgp 1
 neighbor PG peer-group
 neighbor PG remote-as internal
 neighbor PG update-source 100.64.0.3
 neighbor 192.168.122.99 peer-group PG
 neighbor 192.168.122.99 update-source 100.64.0.3

Before:
------------
ub20-2(config-router)# do show ip bgp sum | include .99
192.168.122.99  4          1        36        34        0    0    0 00:00:17            0        0 N/A
ub20-2(config-router)# do show ip bgp neighbors 192.168.122.99 | include Local host
Local host: 100.64.0.3, Local port: 46083
ub20-2(config-router)# no neighbor 192.168.122.99 update-source
ub20-2(config-router)# do show ip bgp sum | include .99
192.168.122.99  4          1        36        35        0    0    0 00:00:01         Idle        0 N/A
ub20-2(config-router)# do show ip bgp neighbors 192.168.122.99 | include Local host
Local host: 100.64.0.3, Local port: 39847

After:
------------
ub20-2(config-router)# do show ip bgp sum | include .99
192.168.122.99  4          1         3         3        0    0    0 00:00:20            0        0 N/A
ub20-2(config-router)# do show ip bgp neighbors 192.168.122.99 | include Local host
Local host: 100.64.0.3, Local port: 39415
ub20-2(config-router)# no neighbor 192.168.122.99 update-source
ub20-2(config-router)# do show ip bgp sum | include .99
192.168.122.99  4          1         3         3        0    0    0 00:00:28            0        0 N/A
ub20-2(config-router)# do show ip bgp neighbors 192.168.122.99 | include Local host
Local host: 100.64.0.3, Local port: 39415

Signed-off-by: Trey Aspelund <taspelund@nvidia.com>
2023-07-28 16:34:55 +00:00
lynnemorrison
d63ccc9248 bfdd: add additional parameters to json command
Add parameters to the "show bfd peers json" command to
display interface and type of BFD session.

Signed-off-by: Lynne Morrison <lynne.morrison@ibm.com>
2023-07-28 11:45:23 -04:00
Rafael Zalamena
96f76f7663 tools: fix key chain reload removal
When deleting a key chain with frr-reload track if the whole root node
is being removed or not.

Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
2023-07-27 10:47:23 -03:00
Russ White
49e9bb2819
Merge pull request #12524 from leonshaw/fix/evpn-nh-order
lib, zebra: Fix EVPN nexthop config order
2023-07-27 07:30:23 -04:00
Xiao Liang
cea3f7f25a lib, zebra: Fix EVPN nexthop config order
Delay EVPN route addition to synchronize with rib_delete(), which now
uses early route queue.

Signed-off-by: Xiao Liang <shaw.leon@gmail.com>
2023-07-27 15:07:42 +08:00
Rajasekar Raja
cd0558e629 zebra: fix nhe refcnt when frr service goes down
When frr.service is going down(restart or stop),
zebra core can be seen.

Sequence of events leading to crash:
Increments of nhe refcnt:
  - Upper level creates a new nhe(say NHE1) —> nhe->refcnt=1
  - Two RE’s (Say RE1 & RE2) associate with NHE1 —> nhe->refcnt = 3

Decrements of nhe refcnt:
  - BGP sends a zapi msg to zebra to delete NHG. —> nhe->refcnt = 2
  - RE1 is queued for delete in META-Q
  - As zebra is dissociating with its clients, zebra_nhg_score_proto() is
     invoked -> nhe->refcnt=1
  - RE2 is no more associated with the NHE1 —>nhe->refcnt=0 &
     hence NHE IS FREED
  - Now RE1 is dequeued from META-Q for processing the re delete. —> At
     this point re->nhe is pointing to freed pointer. CRASH CRASH!!!!

Fix:
  - When we iterate zebra_nhg_score_proto_entry() to delete the upper
     proto specific nhe’s,  we need to skip the additional nhe->refcnt
     decrement in case nhe->flags has NEXTHOP_GROUP_PROTO_RELEASED set.

Backtrace-1
0x00007fa8449ce8eb in raise () from /lib/x86_64-linux-gnu/libc.so.6
0x00007fa8449b9535 in abort () from /lib/x86_64-linux-gnu/libc.so.6
0x00007fa844d32f86 in _zlog_assert_failed (xref=xref@entry=0x55fa37871040 <_xref.28142>, extra=extra@entry=0x0) at lib/zlog.c:680
0x000055fa3778f770 in rib_re_nhg_free (re=0x55fa39e33770) at zebra/zebra_rib.c:2578
rib_unlink (rn=0x55fa39e27a60, re=0x55fa39e33770) at zebra/zebra_rib.c:3930
0x000055fa3778ff18 in rib_process (rn=0x55fa39e27a60) at zebra/zebra_rib.c:1439
0x000055fa37790b1c in process_subq_route (qindex=8 '\b', lnode=0x55fa39e1c1b0) at zebra/zebra_rib.c:2549
process_subq (qindex=META_QUEUE_BGP, subq=0x55fa3999c580) at zebra/zebra_rib.c:3107
meta_queue_process (dummy=<optimized out>, data=0x55fa3999c480) at zebra/zebra_rib.c:3146
0x00007fa844d232b8 in work_queue_run (thread=0x7ffffbdf6cb0) at lib/workqueue.c:285
0x00007fa844d195fd in thread_call (thread=thread@entry=0x7ffffbdf6cb0) at lib/thread.c:2008
0x00007fa844cd3888 in frr_run (master=0x55fa397b7630) at lib/libfrr.c:1223
0x000055fa3771e294 in main (argc=12, argv=0x7ffffbdf7098) at zebra/main.c:526

Backtrace-2
0x00007f125af3f535 in abort () from /lib/x86_64-linux-gnu/libc.so.6
0x00007f125b2b8f96 in _zlog_assert_failed (xref=xref@entry=0x7f125b344260 <_xref.18768>, extra=extra@entry=0x0) at lib/zlog.c:680
0x00007f125b268190 in nexthop_copy_no_recurse (copy=copy@entry=0x5606dd726f10, nexthop=nexthop@entry=0x7f125b0d7f90, rparent=<optimized out>) at lib/nexthop.c:806
0x00007f125b2681b2 in nexthop_copy (copy=0x5606dd726f10, nexthop=0x7f125b0d7f90, rparent=<optimized out>) at lib/nexthop.c:836
0x00007f125b268249 in nexthop_dup (nexthop=nexthop@entry=0x7f125b0d7f90, rparent=rparent@entry=0x0) at lib/nexthop.c:860
0x00007f125b26b67b in copy_nexthops (tnh=tnh@entry=0x5606dd9ec748, nh=<optimized out>, rparent=rparent@entry=0x0) at lib/nexthop_group.c:457
0x00007f125b26b6ba in nexthop_group_copy (to=to@entry=0x5606dd9ec748, from=from@entry=0x5606dd9ee9f8) at lib/nexthop_group.c:291
0x00005606db6ec678 in zebra_nhe_copy (orig=0x5606dd9ee9d0, id=id@entry=0) at zebra/zebra_nhg.c:431
0x00005606db6ddc63 in mpls_ftn_uninstall_all (zvrf=zvrf@entry=0x5606dd6e7cd0, afi=afi@entry=2, lsp_type=ZEBRA_LSP_NONE) at zebra/zebra_mpls.c:3410
0x00005606db6de108 in zebra_mpls_cleanup_zclient_labels (client=0x5606dd8e03b0) at ./zebra/zebra_mpls.h:471
0x00005606db73e575 in hook_call_zserv_client_close (client=0x5606dd8e03b0) at zebra/zserv.c:566
zserv_client_free (client=0x5606dd8e03b0) at zebra/zserv.c:585
zserv_close_client (client=0x5606dd8e03b0) at zebra/zserv.c:706
0x00007f125b29f60d in thread_call (thread=thread@entry=0x7ffc2a740290) at lib/thread.c:2008
0x00007f125b259888 in frr_run (master=0x5606dd3b7630) at lib/libfrr.c:1223
0x00005606db68d298 in main (argc=12, argv=0x7ffc2a740678) at zebra/main.c:534

Issue: 3492031

Ticket# 3492031

Signed-off-by: Rajasekar Raja <rajasekarr@nvidia.com>
2023-07-26 21:17:16 +00:00