If the RT changes on a L3VPN route then any leak of this route into
a VRF should be withdrawn.
Extend existing EVPN check for RT change to cover L3VPN routes.
Signed-off-by: Pat Ruddy <pat@voltanet.io>
The validation of rpki routes now impacts only the matching bgp instance.
Until now, rpki was triggering validation of all bgp entries.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
The rpki config can be displayed in 'show running-config'.
One fix remains to be done, related to the order of the per-vrf rpki
configuration: the output cannot yet be saved in the running-config
because the rpki commands are swapped, which prevents the rpki config
from being applied at startup.
This commit also changes the indentation, since the rpki configure node
had one extra space. Remove it, and add the changes for the vrf
configuration too.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
The rpki vrf subnode is instantiated under the vrf subnode.
Note that this commit contains a change in vtysh: the output of the bgp
daemon from show running-config is extracted in vtysh and reworked
(hence the vtysh_config.c change). This permits having a subnode under
the vrf subnode.
Also, add vrf node support to bgpd, since the rpki command could not
otherwise be found under the vrf node.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
It is possible to dump rpki commands per vrf context.
The rpki start/stop commands are also extended with a vrfname parameter.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
This commit introduces a callback function pointer that rtrlib calls.
This permits creating the socket and initialising it with the right
information, in the right vrf. In addition, rpki uses a hook that is
triggered when a vrf is enabled or disabled. In this way, the start
mechanism is triggered only when the vrf is available, and the stop
mechanism runs upon the vrf disable event.
The cache structure also contains a back pointer to the rpki vrf
structure, which is used to retrieve the vrf the cache points to.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
The rpki context can be removed with the 'no rpki' command from the
configure node. This work allocates the associated rpki_vrf context when
entering the rpki node, instead of at the initialisation step.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
This is preparatory work so that rpki can have per-vrf contexts.
It consists of allocating an rpki_vrf structure that holds everything:
rtr_config, cache, etc.
This work is also necessary for long-term support of the yang
northbound API. Indeed, it is highly likely that the yang context
for rpki will be defined per core instance.
This work also instantiates a list of rpki_vrf, though only one instance
is created.
It also introduces a vrfname attribute that is set to NULL for now,
which stands for the default vrf where rpki is configured.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
rpki debugging is linked with the standard bgp debugging facilities.
- debug rpki is dumped in the running-config if the command is executed
from configure terminal.
- show debugging indicates whether rpki debug is enabled or not.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
When a plugin is attached, some debugs may be attached to that plugin.
For that, add one hook that interacts with vty: a boolean indicates
what the hook is used for, either the 'show running-config' output
or the 'show debugging' command.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
show running-config was systematically displaying the rpki default
values as soon as at least one cache server was configured. Now, if the
rpki configuration has been changed, either because of a new cache
server or because of a change in the default settings, then the
associated configuration is dumped in the 'show running-config' command.
In addition, to let the user dump the setting values, the command
'show rpki configuration' dumps the values whether default or not.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
If ssh cache servers are configured, show rpki-table still looks at
the tcp server context. Fix this by checking the cache server type, and
also display the ssh context if one is configured.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Currently, private and public key files must differ by the suffix
keyword '.pub'. If that is not the case, the public key is ignored.
Inform the user about that.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
We have a bunch of code in bgp_vty.c that was passing
to peer_af_flag_modify_vty more than 1 flag at a time.
This was causing the underlying routines to get the
flags wrong. In order to prevent this convert all the
places where we send multiple flags down to this function
to individual flag changes.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
SIGHUP is ostensibly supposed to reload configuration
from a fresh slate. This is currently horribly broken
so much so that bgp just crashes. I see no point
in trying to make this work considering the yang
work coming down the pike.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Issue: bgp_process_writes is called when the fd is writable,
and it calls bgp_generate_updgrp_packets to generate the
update packets whether or not MRAI is set.
Fix: the bgp_generate_updgrp_packets thread returns without sending
any update while the MRAI timer is still running.
Signed-off-by: Richard Wu <wutong23@baidu.com>
This is the bulk part extracted from "bgpd: Convert from `struct
bgp_node` to `struct bgp_dest`". It should not result in any functional
change.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
We use ASN:VNI format to calculate auto RT for L3VNI.
When L3VNI is not configured, if we delete the configured RT, incorrect auto-RT
value is generated as VRF VNI is 0.
Fix:
Do not configure auto-RT if L3VNI is not configured.
Trigger:
1. Delete L3VNI
2. Delete configured RT.
Before fix:
dev# sh bgp vrf vrf-blue vni
BGP VRF: vrf-blue
Local-Ip: 10.100.0.1
L3-VNI: 0
Rmac: 00:00:00:00:00:00
VNI Filter: none
L2-VNI List:
Export-RTs:
RT:101:0
Import-RTs:
RT:101:0
RD: 10.100.0.1:2
After fix:
dev# sh bgp vrf vrf-blue vni
BGP VRF: vrf-blue
Local-Ip: 10.100.0.1
L3-VNI: 0
Rmac: 00:00:00:00:00:00
VNI Filter: none
L2-VNI List:
Export-RTs:
Import-RTs:
RD: 10.100.0.1:2
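The rule in the fix can be sketched as below. This is a minimal
illustration and not the bgpd code: the helper name auto_rt_format and
the plain "RT:<asn>:<vni>" string form are assumptions made for the
example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not a bgpd function): format an auto-derived
 * route target as "RT:<asn>:<vni>". When no L3VNI is configured
 * (l3vni == 0) nothing is generated, so "RT:<asn>:0" never appears.
 */
static bool auto_rt_format(uint32_t asn, uint32_t l3vni, char *buf,
			   size_t len)
{
	assert(buf != NULL && len > 0);
	buf[0] = '\0';
	if (l3vni == 0)
		return false;
	snprintf(buf, len, "RT:%u:%u", (unsigned int)asn,
		 (unsigned int)l3vni);
	return true;
}
```

With asn 101 and l3vni 0 the helper returns false and leaves the buffer
empty, which corresponds to the empty Export-RTs/Import-RTs lists in the
"After fix" output.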
Signed-off-by: Ameya Dharkar <adharkar@vmware.com>
If we have something like:
```
ip route 1.1.1.0/24 Null0
!
router bgp 100
no bgp ebgp-requires-policy
neighbor 192.168.0.2 remote-as 200
!
address-family ipv4 unicast
network 1.1.1.0/24
redistribute connected
exit-address-family
!
line vty
!
```
1.1.1.0/24 is not advertised due to martian nexthop (0.0.0.0). It starts
working only when we use `redistribute static`.
By checking if it's a BGP static route we are able to announce
1.1.1.0/24 with `network 1.1.1.0/24` without redistribute even when
`bgp import-check` is enabled.
Disabling `bgp import-check` works as well, but it's enabled by default
since 7.4.
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
RFC states that time should be in seconds since the epoch.
The code was using system uptime in seconds.
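The difference between the two clocks can be sketched as below. The
helper name and its parameters are hypothetical; it only shows how a
timestamp kept as seconds of uptime maps to seconds since the epoch.

```c
#include <assert.h>
#include <time.h>

/*
 * Hypothetical helper: convert an event time recorded as "seconds of
 * uptime" into seconds since the Unix epoch, given the current uptime
 * and the current wall-clock time.
 */
static time_t uptime_to_epoch(time_t event_uptime, time_t now_uptime,
			      time_t now_epoch)
{
	assert(event_uptime <= now_uptime);
	/* The event happened (now_uptime - event_uptime) seconds ago. */
	return now_epoch - (now_uptime - event_uptime);
}
```

In a real daemon now_uptime would typically come from a monotonic clock
and now_epoch from time(NULL); both are passed in here to keep the
sketch deterministic.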
Fixes: #6549
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Announcements that are marked as invalid were previously not revalidated.
This was fixed by replacing the range lookup with a subtree lookup.
Signed-off-by: Marcel Röthke <marcel.roethke@haw-hamburg.de>
Currently the I/O pthread handles incoming/outgoing data
communication with all peers. There is no attempt at modifying
the hold timers. Its sole goal is to read/write data to appropriate
channels. All this data is handled as *events* on the master pthread
in BGP. The problem is that if the master pthread is extremely busy
then any packet read that would be treated as a keepalive event may
happen after the hold timer pops, due to the way thread events are handled
in lib/thread.c.
In a last-gasp attempt, if we notice that we have incoming data
to process on the input queue, slightly delay the hold timer.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The tr_*_config structs were previously not pre-initialized because
every field is initialized explicitly. But future rtrlib versions will
introduce additional fields. Pre-initializing the entire struct
ensures forward compatibility.
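A minimal sketch of the idea, using a stand-in struct rather than the
real rtrlib types (demo_tcp_config and its fields are assumptions for
illustration only):

```c
#include <assert.h>
#include <string.h>

/* Stand-in for a library config struct; the real rtrlib layout differs. */
struct demo_tcp_config {
	const char *host;
	const char *port;
	const char *bindaddr; /* imagine this was added in a later release */
};

static void demo_config_init(struct demo_tcp_config *cfg, const char *host,
			     const char *port)
{
	assert(cfg != NULL);
	/* Zero the whole struct first so unknown fields start as 0/NULL. */
	memset(cfg, 0, sizeof(*cfg));
	cfg->host = host;
	cfg->port = port;
}
```

Code written before bindaddr existed keeps working after a library
upgrade, because the new field is zeroed instead of holding stack
garbage.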
Signed-off-by: Marcel Röthke <marcel.roethke@haw-hamburg.de>
Problem reported where bgp sessions were being torn down for ibgp
peers with the reason being optional attribute error. Found that
when a route was leaked, the RTs were stripped but the actual
EXTCOMMUNITY attribute was not cleared so an empty ecommunity
attribute stayed in the bgp table and was sent in updates.
Ticket: CM-30000
Signed-off-by: Don Slice <dslice@cumulusnetworks.com>
When a peer is bound to a peer-group, the GR flags set on the
peer are over-written.
Update the GR flags for the peer after it has been bound to a
peer-group.
Signed-off-by: NaveenThanikachalam <nthanikachal@vmware.com>
The code in the bgp extcommunity-list function was using
argv_find to get the correct idx. The problem was that
we had already done argv_finds before and idx was non-zero
thus having us always set the seq pointer to what was last
looked up. This causes us to pass in a value to the
underlying function and it would just wisely ignore it
causing a seq number of 0.
We would then write this seq number of 0 and then immediately
reject it on read in again. BOO!
Actually handle argv_find the way it was meant to be.
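The pitfall can be sketched with a simplified stand-in for the lookup.
find_word and seq_value are hypothetical names; the only property taken
from the description above is that the search starts at *idx and leaves
*idx untouched on a miss.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Simplified stand-in for a token lookup that starts at *idx and, on a
 * match, stores the match position back into *idx. On a miss *idx is
 * left untouched, which is what makes a stale index dangerous.
 */
static bool find_word(const char *const *words, int n, const char *text,
		      int *idx)
{
	assert(words != NULL && text != NULL && idx != NULL);
	for (int i = *idx; i < n; i++) {
		if (strcmp(words[i], text) == 0) {
			*idx = i;
			return true;
		}
	}
	return false;
}

/* Return the word following "seq", or NULL when "seq" is absent. */
static const char *seq_value(const char *const *words, int n)
{
	int idx = 0; /* restart from the beginning for every new lookup */

	if (find_word(words, n, "seq", &idx) && idx + 1 < n)
		return words[idx + 1];
	return NULL;
}
```

Resetting idx and checking the return value are the two halves of the
fix: without them the caller reads whatever position an earlier lookup
left behind.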
Ticket:CM-29926
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When issuing the command `match ip next-hop address`
bgp would crash. This is because the no form of the
command was making the address optional and we would
try to read data we should not be.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
bgp_accept() gets called over and over again when a VRF device is
deleted out from under a bgp listener socket that is bound to it.
Prevent this by noting the error and cancelling ourselves, allowing the
vrf status code to clean up the mess when it receives word about the
change from Zebra.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Try to give a bit more useful data about where we
think the connection is trying to come in from.
Hopefully this will let us debug connection issues
a bit faster in cases where there are config issues.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When a received packet is processed in bgp_process_reads(), the data
is copied to a static buffer and then copied to the stream buffer.
The data can be copied directly to the stream buffer, which avoids the
extra memcpy.
Signed-off-by: kssoman <somanks@gmail.com>
Don't attempt to send BFD daemon a message to remove the peer
registration on daemon exit, otherwise we'll access a dangling
interface pointer and we'll crash.
This crash was not previously possible because the function that built
the message was passing the interface pointer but not using it due to
the exit condition.
In `lib/bfd.c`:
```
void bfd_peer_sendmsg(struct zclient *zclient, struct bfd_info *bfd_info,
int family, void *dst_ip, void *src_ip, char *if_name,
int ttl, int multihop, int cbit, int command,
int set_flag, vrf_id_t vrf_id)
{
struct bfd_session_arg args = {};
size_t addrlen;
/* Individual reg/dereg messages are suppressed during shutdown. */
if (CHECK_FLAG(bfd_gbl.flags, BFD_GBL_FLAG_IN_SHUTDOWN)) {
if (bfd_debug)
zlog_debug(
"%s: Suppressing BFD peer reg/dereg messages",
__func__);
return;
}
```
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
To remove a BFD profile without removing the BFD configuration just call
`neighbor <A.B.C.D|X:X::X:X|WORD> bfd`.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Allow BGP to use the new API to configure BFD session profiles. Now it
is possible to preconfigure BFD sessions without needing to create the
peers.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
"set community accept-own-nexthop" returns "malformed communities"
error. This is because the token matching hits an earlier "accept-own"
and leaves "-nexthop" as a separate token to be processed.
Reorder the switch cases so that both are processed correctly.
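The ordering issue can be sketched with a small prefix matcher. The
table and token names below are illustrative and do not reproduce the
bgpd tokenizer; they only show why the longer keyword has to be tried
first.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { TOK_UNKNOWN = 0, TOK_ACCEPT_OWN, TOK_ACCEPT_OWN_NEXTHOP };

struct keyword {
	const char *text;
	int token;
};

/* Longer keywords come first so a shared prefix cannot shadow them. */
static const struct keyword keywords[] = {
	{"accept-own-nexthop", TOK_ACCEPT_OWN_NEXTHOP},
	{"accept-own", TOK_ACCEPT_OWN},
};

/* Return the token for the keyword at the start of s; length via *len. */
static int match_keyword(const char *s, size_t *len)
{
	assert(s != NULL && len != NULL);
	for (size_t i = 0; i < sizeof(keywords) / sizeof(keywords[0]); i++) {
		size_t n = strlen(keywords[i].text);

		if (strncmp(s, keywords[i].text, n) == 0) {
			*len = n;
			return keywords[i].token;
		}
	}
	*len = 0;
	return TOK_UNKNOWN;
}
```

If the two table entries were swapped, "accept-own-nexthop" would match
"accept-own" with 8 characters left over, which is the "-nexthop"
leftover described above.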
Signed-off-by: Appu Joseph <apjo@kaloom.com>
We are crashing in thread_cancel on shutdown because
the thread pointer is NULL. Use the more appropriate
THREAD_CANCEL macro.
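The difference between the function and a NULL-safe macro wrapper can be
sketched as below. The demo_* names are hypothetical and this is not the
lib/thread.h definition; it only shows the guard pattern.

```c
#include <assert.h>
#include <stddef.h>

struct demo_task {
	int cancelled;
};

/* Like the crashing call: a NULL task is a programming error here. */
static void demo_task_cancel(struct demo_task *t)
{
	assert(t != NULL);
	t->cancelled = 1;
}

/* NULL-safe wrapper: cancel only if set, then clear the caller's pointer. */
#define DEMO_TASK_CANCEL(t)                                                  \
	do {                                                                 \
		if ((t) != NULL) {                                           \
			demo_task_cancel(t);                                 \
			(t) = NULL;                                          \
		}                                                            \
	} while (0)
```

Calling the macro a second time on the same pointer is a no-op, which is
the property needed on a shutdown path where the task may never have
been scheduled.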
Ticket: CM-29873
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Extend the next hop tracking for type-2 and type-3 EVPN routes also.
Updates: "bgpd: Add nexthop of received EVPN RT-5 for nexthop tracking"
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
When there is a NHT change and the paths dependent on that NHT are being
evaluated, skip those that are marked for removal or as history.
When a route gets withdrawn, its valid flag is cleared and it is flagged
for removal; in the case of an EVPN route, it is also unimported from
VRFs (L2 and/or L3). bgp_process is then scheduled. Under rare timing
conditions, an NHT update for the route's next hop may arrive right after,
and if routes flagged for removal are not skipped, they may not only be
incorrectly marked as valid but also re-imported in the case of EVPN,
which will be a serious error.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Ensure that only if there is a change to the path's validity based
on the NHT update, EVPN import or unimport is invoked.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Display next hop resolution information, whether the "detail" option is
specified or not as it is quite fundamental and only minimally increases
the output.
Introduce option to look at a specific NHT entry, which will also show
the paths associated with that entry.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Clean up a few lines of cli command installation; remove a
duplicate; follow the command grouping pattern better.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
There can be cases where evpn traffic is not meshed across various
endpoints, but sent to a central pe. For this situation, remove the
nexthop unchanged default behaviour for bgp evpn. Also add route
reflector commands to bgp evpn node.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Non-best paths (path info structures) also need to be freed during
table cleanup not only to release their memory but to also ensure
any linkages are updated correctly. One such example is for EVPN
where there is a link between the imported path info (in a L2 or
L3 vrf instance) and its parent path info.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
We had already removed the `ip as-path..` command
to have `bgp as-path` but for some reason a `no ip as-path..`
command ALIAS was still around. Kill with extreme prejudice.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Without specifying a default afi/safi we get a segfault:
```
(gdb) frame 4
bgp_table_stats (..., afi=32724, safi=SAFI_UNICAST, ...
11349 if (!bgp->rib[afi][safi]) {
(gdb)
```
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
L3VNI is configured with "prefix-routes-only" flag. Even in this case,
intermittently, we observed that local EVPN MACIP routes are installed and
advertised with 2 labels and 2 export RTs.
This is a sequencing issue. Consider following case where L2VNI 200 and L3VNI
1000 are configured for tenant vrf vrf-blue.
Bug is observed for following sequence of events:
1. vrf-blue BGP instance is created.
2. L2VNI is created in bgp for vni 200. It is linked to the tenant vrf vrf-blue
in function bgpevpn_link_to_l3vni.
Following code sets "VNI_FLAG_USE_TWO_LABELS" flag for vni 200 as L3VNI is not
yet attached to vrf-blue BGP instance.
/* check if we are advertising two labels for this vpn */
if (!CHECK_FLAG(bgp_vrf->vrf_flags, BGP_VRF_L3VNI_PREFIX_ROUTES_ONLY))
SET_FLAG(vpn->flags, VNI_FLAG_USE_TWO_LABELS);
3. Now L3VNI is attached to vrf-blue BGP instance. In this case, we set
BGP_VRF_L3VNI_PREFIX_ROUTES_ONLY flag for vrf-blue but we do not clear
VNI_FLAG_USE_TWO_LABELS flag set on the corresponding L2VNIs.
This fix resolves following 2 issues observed above.
1. When L2VNI is created in BGP, flag VNI_FLAG_USE_TWO_LABELS should not be set
for this VNI if BGP vrf is not attached to any L3VNI.
2. When L3VNI is attached to the BGP vrf, set "VNI_FLAG_USE_TWO_LABELS" flag
if "prefix-routes-only" is not set for the vrf.
UT cases:
1. Flap "prefix-routes-only" config for a vrf.
2. Test following triggers for vrfs with and without "prefix-routes-only"
- Flap L2VNI from kernel.
- Flap L3VNI from kernel.
Signed-off-by: Ameya Dharkar <adharkar@vmware.com>
The bare `bgp bestpath bandwidth` command should not be a legal
command. Split out the `no` form so that the bare command is rejected
while `no bgp bestpath bandwidth` works as we would expect.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
This attribute is not involved in path selection, and per RFC 7606 it
should simply be ignored.
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
Community attributes might have been removed by an inbound route map, so we
should check to ensure they still exist before trying to free them.
This fixes a segfault described in issue #6345.
Signed-off-by: Josh Cox <josh.cox@pureport.com>
The problem is that peer_af_array returns NULL when SAFI is changed to
unicast. We use unicast table, but peer is created and activated under
labeled-unicast, hence we should lookup with a proper SAFI id.
Without this patch peer_af_find() returns NULL and we can't show
PfxSnt in `show bgp summary`.
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
It is possible that the if_lookup_by_index() call will return
a NULL value that is then passed to zclient_send_interface_radv_req.
Just test that we have a valid interface pointer.
Found by Coverity
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
unicast and labeled-unicast share the same table, but configuration should
be visible for both independently. Without this fix it is confusing,
because when you enter `network 10.0.0.0/24` under labeled-unicast it is
written in the unicast family block.
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
Modify the import-check command to require the underlying prefix
to exist in the rib. General consensus is that this is the correct
behavior.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Problem reported that in many circumstances, RAs created in the
process of bringing up numbered IPv6 peers with extended-nexthop
capability enabled (for ipv4 over ipv6) were not stopped on the
interface when those peers were deleted. Found several circumstances
where this occurred and fix them in this patch.
Ticket: CM-26875
Signed-off-by: Don Slice <dslice@cumulusnetworks.com>
These are easy to get subtly wrong, and doing so can cause
nondeterministic failures when racing in parallel builds.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Issue:
Configuring default-originate when static default route is previously
advertised results in withdrawal of the route.
Fix :
Delete the adj-out entry for the previously advertised static
default route without sending explicit withdraw message.
Signed-off-by: kssoman <somanks@gmail.com>
A flowspec nlri of 240 bytes or more was not handled.
From 240 bytes on, the length is encoded on 2 bytes, and a calculation
must be done to obtain the real length. This commit handles it
appropriately.
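The length calculation can be sketched as below, following the FlowSpec
NLRI encoding: one byte for lengths below 240 (0xf0), otherwise two
bytes whose top nibble is 0xf. The function name and return convention
are assumptions for this example, not the bgpd parser.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Decode the FlowSpec NLRI length prefix from p (avail bytes readable).
 * Stores the NLRI length in *len and returns the number of prefix bytes
 * consumed (1 or 2), or 0 when the input is too short.
 */
static size_t flowspec_nlri_len(const uint8_t *p, size_t avail,
				uint16_t *len)
{
	assert(p != NULL && len != NULL);
	if (avail < 1)
		return 0;
	if (p[0] < 0xf0) {
		*len = p[0];
		return 1;
	}
	if (avail < 2)
		return 0;
	/* Extended form: drop the 0xf marker nibble, keep the low 12 bits. */
	*len = (uint16_t)(((p[0] & 0x0f) << 8) | p[1]);
	return 2;
}
```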
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
- Fix 1 byte overflow when showing GR info in bgpd
- Use PATH_MAX for path buffers
- Use unsigned specifiers for uint16_t's in zebra pbr
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Replace sprintf with snprintf where straightforward to do so.
- sprintf's into local scope buffers of known size are replaced with the
equivalent snprintf call
- snprintf's into local scope buffers of known size that use the buffer
size expression now use sizeof(buffer)
- sprintf(buf + strlen(buf), ...) replaced with snprintf() into temp
buffer followed by strlcat
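The append pattern from the last bullet can be sketched as below.
append_kv is a hypothetical helper; strncat with an explicit remaining
size stands in for strlcat here because strlcat is not available in
every libc.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Append "key=value " to buf without writing past size bytes. */
static void append_kv(char *buf, size_t size, const char *key, int value)
{
	char tmp[64];
	size_t used;

	assert(buf != NULL && size > 0 && key != NULL);
	/* Format into a temporary buffer, bounded by its own size. */
	snprintf(tmp, sizeof(tmp), "%s=%d ", key, value);

	/* Bounded append; strncat always NUL-terminates the result. */
	used = strlen(buf);
	if (used < size - 1)
		strncat(buf, tmp, size - 1 - used);
}
```

Output that does not fit is truncated instead of overflowing, which is
the behavioural difference from sprintf(buf + strlen(buf), ...).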
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
When we receive an UPDATE with MP_NEXTHOP len as 32 bytes, we shouldn't
check if the global (1st) nexthop is unspecified.
Peering between bird and FRRouting we receive from Bird something like:
```
rcvd UPDATE w/ attr: , origin i, mp_nexthop ::(fe80::a00:27ff:fe09:f8a3)
```
The link-local (2nd) nexthop is valid and validated later in the code.
Before it was marked:
```
IPv6 unicast -- DENIED due to: martian or self next-hop;
```
After it's a valid prefix:
```
spine1-debian-9# show bgp
BGP table version is 0, local router ID is 2.2.2.2, vrf id 0
Default local pref 100, local AS 65002
Status codes: s suppressed, d damped, h history, * valid, > best, = multipath,
i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes: i - IGP, e - EGP, ? - incomplete
Network Next Hop Metric LocPrf Weight Path
2a02:4780::/64 fe80::a00:27ff:fe09:f8a3
0 65001 i
Displayed 1 routes and 1 total paths
```
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
Replace all `random()` calls with a function called `frr_weak_random()`
and make it clear that it is only supposed to be used for weak random
applications.
Use the annotation described by the Coverity Scan documentation to
ignore `random()` call warnings.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
In the real world it sometimes happens that bgp_nexthop_cache is NULL.
Avoid segfaulting when using `show [ip] bgp ...` CLI commands.
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
Rather than doing a f*gly hack for the RPKI code, let's do an on-exit
hook in cmd_node. Also allows replacing some special-casing in the vty
code.
Signed-off-by: David Lamparter <equinox@diac24.net>
And again for the name. Why on earth would we centralize this, just so
people can forget to update it?
Signed-off-by: David Lamparter <equinox@diac24.net>
Same as before, instead of shoving this into a big central list we can
just put the parent node in cmd_node.
Signed-off-by: David Lamparter <equinox@diac24.net>
There is really no reason to not put this in the cmd_node.
And while we're at it, rename from pointless ".func" to ".config_write".
[v2: fix forgotten ldpd config_write]
Signed-off-by: David Lamparter <equinox@diac24.net>
The only nodes that have this as 0 don't have a "->func" anyway, so the
entire thing is really just pointless.
Signed-off-by: David Lamparter <equinox@diac24.net>
The problem shows up with topologies such as:
(192.168.1.1/32) r1 <-- eBGP --> r2 <-- iBGP --> r3
Looking at r3's nexthop for 192.168.1.1/32 we have it as r2, but really
it MUST be r1.
Checking if the nexthop is connected solves the problem even for cases
when route-reflectors are used.
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
This fixes unnecessary whitespaces and makes capitalization
match for route type help strings.
Signed-off-by: Trey Aspelund <taspelund@cumulusnetworks.com>
Some other implementations, such as Cisco, Bird, OpenBGPD and
Nokia, already have this enabled by default.
The list is here: https://github.com/bgp/RFC8212
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
Problem Description:
=====================
+--+ +--+
|R1|-(192.201.202.1)----iBGP----(192.201.202.2)-|R2|
+--+ +--+
Routes on R2:
=============
S>* 202.202.202.202/32 [1/0] via 192.201.78.1, ens256, 00:40:48
Where, the next-hop network, 192.201.78.0/24, is a directly connected network address.
C>* 192.201.78.0/24 is directly connected, ens256, 00:40:48
Configurations on R1:
=====================
!
router bgp 201
bgp router-id 192.168.0.1
neighbor 192.201.202.2 remote-as 201
!
Configurations on R2:
=====================
!
ip route 202.202.202.202/32 192.201.78.1
!
router bgp 201
bgp router-id 192.168.0.2
neighbor 192.201.202.1 remote-as 201
!
address-family ipv4 unicast
redistribute static
exit-address-family
!
Step-1:
=======
R1 receives the route 202.202.202.202/32 from R2.
R1 installs the route in its BGP RIB.
Step-2:
=======
On R1, a connected interface address is added.
The address is the same as the next-hop of the BGP route received from R2 (192.201.78.1).
Point of Failure:
=================
R1 resolves the BGP route even though the route's next-hop is its own connected address.
Even though this appears to be a misconfiguration it would still be better to safeguard the code against it.
Fix:
====
When BGP receives a connected route from Zebra, it processes the
routes for the next-hop update.
While doing so, BGP must ignore routes whose next-hop address matches
the address of the connected route for which Zebra sent the next-hop update
message.
Signed-off-by: NaveenThanikachalam <nthanikachal@vmware.com>
Ensure that upon a link-bandwidth change, e.g., due to a change in
the number of multipaths, EVPN type-5 route injection is triggered.
In the absence of this, the proper link-bandwidth is not updated in
EVPN type-5 routes originated by the router.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Take into account polychaeta tips on code style.
Also take into account miscellaneous code style recommendations, like
braces usage.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Multiple different issues causing mostly UAFs but maybe other more
subtle things.
- Cluster lists were the only attributes whose pointers were not being
NULL'd when freed, resulting in heap UAF
- When performing an insert into the cluster hash, our temporary struct
used for hash_get() was inconsistent with our hash keying and
comparison functions. In the case of a zero length cluster list, the
->length field is 0 and the ->list field is NULL. When performing an
insert, we set the ->list field regardless of whether the length is 0.
This resulted in the two cluster lists hashing equal but not comparing
equal. Later, when removing one of them from the hash before freeing
it, because the key matched and the comparison succeeded (because it
was set to NULL *after* the search but *before* inserting into the
hash) we would sometimes release the duplicated copy of the struct,
and then free the one that remained in the hash table. Later accesses
constitute UAF. This is fixed by making sure the fields used for the
existence check match what is actually inserted into the hash when
that check fails.
This patch also makes cluster_unintern static, because it should be.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
... Oops ...
(for context, the defaults code originally didn't have a dedicated
"bool" variant and just used long for bools... I derp'd this when
adding bool as a separate case :( )
Reported-by: Donald Sharp <sharpd@cumulusnetworks.com>
Signed-off-by: David Lamparter <equinox@diac24.net>
This macro is undefined if vnc is disabled, and while it defaults to 0,
this is still wrong and causes issues with -Werror
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
This is a full rewrite of the "back end" logging code. It now uses a
lock-free list to iterate over logging targets, and the targets
themselves are as lock-free as possible. (syslog() may have a hidden
internal mutex in the C library; the file/fd targets use a single
write() call which should ensure atomicity kernel-side.)
Note that some functionality is lost in this patch:
- Solaris printstack() backtraces are ditched (unlikely to come back)
- the `log-filter` machinery is gone (re-added in followup commit)
- `terminal monitor` is temporarily stubbed out. The old code had a
race condition with VTYs going away. It'll likely come back rewritten
and with vtysh support.
- The `zebra_ext_log` hook is gone. Instead, it's now much easier to
add a "proper" logging target.
v2: TLS buffer to get some actual performance
Signed-off-by: David Lamparter <equinox@diac24.net>
- each statistic is encapsulated under a concatenated "<afi><safi>" key.
- the json encoding of float and double values uses the json double
api. This change is done for bgp statistics.
- lines over 80 characters have been wrapped.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
This command is a shortcut to facilitate the extraction of statistics
for all afi/safi related to one bgp instance.
The command is: show bgp [vrf XX] statistics-all [json]
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
safis that use a route distinguisher in bgp tables, and as such
introduce a two level hierarchy on the bgp table, must be made available
to statistics too.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
add json support for show bgp statistics command.
The title of the stats entry is aggregated without spaces.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
The BGP Router MAC extended community should be unique and not occur
multiple times. In a VRF-to-VRF route-leak scenario where EVPN routes
from a source VRF are leaked into the target VRF and then injected
back into EVPN from the target VRF, the resulting route had more than
one RMAC. With this fix, the resulting route will have only the
target VRF's RMAC.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
The EVPN advertise route-map may generate extended communities for an IPv4
or IPv6 route injected into EVPN as type-5. If so, allow for it and add
to it.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Don Slice <dslice@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
Support configurable options to control how link bandwidth is handled
by the receiver. The default behavior is to automatically honor the
link bandwidths received and use it to perform a weighted ECMP BUT only
if all paths in the multipath have associated link bandwidth; if one or
more paths do not have link bandwidth, normal ECMP is performed among
the multipaths. This behavior is as recommended by
https://tools.ietf.org/html/draft-ietf-idr-link-bandwidth.
The additional options available are to (a) completely ignore any link
bandwidth (i.e., weighted ECMP is effectively disabled), (b) skip paths
in the multipath which do not have link bandwidth and perform weighted
ECMP among the other paths (if at least some paths have the bandwidth)
or (c) use a default weight (value chosen is 1) for the paths which
do not have link bandwidth.
The command syntax is
bgp bestpath bandwidth <ignore|skip-missing|default-weight-for-missing>
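The four behaviors can be sketched as a small decision function. The
enum and lb_plan are hypothetical names that mirror the CLI keywords;
treating default-weight-for-missing as weighted only when at least one
path carries bandwidth is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mode names are illustrative; they mirror the CLI keywords. */
enum lb_mode {
	LB_DEFAULT,                    /* weighted only if all paths have bw */
	LB_IGNORE,                     /* never weighted */
	LB_SKIP_MISSING,               /* drop paths without bw */
	LB_DEFAULT_WEIGHT_FOR_MISSING, /* paths without bw get weight 1 */
};

/*
 * Decide whether weighted ECMP applies. use[i] is set to false for
 * paths left out of the multipath. Returns true for weighted ECMP,
 * false for normal ECMP over all paths.
 */
static bool lb_plan(enum lb_mode mode, const bool *has_bw, size_t n,
		    bool *use)
{
	size_t with_bw = 0;

	assert(has_bw != NULL && use != NULL);
	for (size_t i = 0; i < n; i++) {
		use[i] = true;
		if (has_bw[i])
			with_bw++;
	}
	if (mode == LB_IGNORE || with_bw == 0)
		return false;
	if (mode == LB_DEFAULT)
		return with_bw == n;
	if (mode == LB_SKIP_MISSING)
		for (size_t i = 0; i < n; i++)
			use[i] = has_bw[i];
	return true;
}
```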
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
When announcing ourselves as the next hop (e.g., to EBGP peers), if the
best path has the link bandwidth extended community and it is transitive,
change the value of the link bandwidth to the cumulative downstream
bandwidth (sum of the link bandwidths of all our multipaths) as this
makes the most sense. It is also implied by
https://tools.ietf.org/html/draft-mohanty-bess-ebgp-dmz. Of course, do
not override the link bandwidth if it has been specified by policy.
Note: Transitive extended communities will be automatically passed along
to EBGP peers; this commit is updating the value that is announced to
something that is the most appropriate.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
Reviewed-by: Don Slice <dslice@cumulusnetworks.com>
Implement the code to handle the other route-map options to generate
the link bandwidth, namely, to use the cumulative bandwidth or to
base this on the number of multipaths. In the latter case, a reference
bandwidth is internally chosen - the implementation uses a value of
1 Gbps.
These additional options mean that the prefix may need to be advertised
if there is a link bandwidth change, which is a new criterion. Define a
new path (change) flag to support this and implement the advertisement.
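A minimal Python sketch of the two generation options, assuming bandwidths
are held in bytes per second. The function and constant names are
hypothetical; only the 1 Gbps reference value comes from the text above.

```python
from typing import Sequence

# 1 Gbps reference bandwidth, expressed in bytes per second (assumed unit).
REFERENCE_BW_BYTES = 1_000_000_000 // 8


def generated_link_bw(option: str, multipath_bws: Sequence[int]) -> int:
    """Return the link bandwidth to advertise for the given option."""
    if option == "cumulative":
        # Sum of the link bandwidths of all multipaths.
        return sum(multipath_bws)
    if option == "num-multipaths":
        # Number of multipaths times the reference bandwidth.
        return len(multipath_bws) * REFERENCE_BW_BYTES
    raise ValueError(f"unsupported option: {option}")
```

Either result changes whenever the multipath set or a path's bandwidth
changes, which is why the prefix may have to be re-advertised.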
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
Reviewed-by: Don Slice <dslice@cumulusnetworks.com>
The BGP link bandwidth extended community must not be repeated. If the
attribute already carries this and the route-map specifies a new value,
the implementation will honor the policy configuration and overwrite
the existing values.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Certain extended communities cannot be repeated. An example is the
BGP link bandwidth extended community. Enhance the extended community
add function to ensure uniqueness, if requested.
Note: This commit does not change the lack of uniqueness for any of
the already-supported extended communities. Many of them such as the
BGP route target can obviously be present multiple times. Others like
the Router's MAC should most probably be present only once. The portions
of the code which add these may already be structured such that duplicates
do not arise.
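A minimal Python sketch of an add function with an optional uniqueness
check. It is illustrative only: the function name is hypothetical, values
are modeled as 8-byte strings whose first two bytes are type and subtype,
and replacing the existing value on a match is an assumption.

```python
from typing import List


def ecommunity_add(vals: List[bytes], new: bytes, unique: bool = False) -> bool:
    """Add an 8-byte extended community; return True if the list changed."""
    if len(new) != 8:
        raise ValueError("extended community must be 8 bytes")
    if unique:
        for i, val in enumerate(vals):
            # Same type and subtype: keep a single instance.
            if val[:2] == new[:2]:
                if val == new:
                    return False
                vals[i] = new
                return True
    if new in vals:
        return False  # exact duplicate, nothing to do
    vals.append(new)
    return True
```

Callers that add communities which may legitimately repeat, such as route
targets, would leave `unique` unset.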
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Perform weighted ECMP if the multipaths have link bandwidth. This involves
assigning weights to each of the next hops associated with the prefix based
on the link bandwidth of the corresponding path as a factor of the total
(cumulative) link bandwidth for the prefix. The weight values used are
between 1 and 100. Weights are assigned only if all paths in the multipath
have link bandwidth, otherwise any bandwidths are ignored and regular
ECMP is performed. This is as recommended in
https://tools.ietf.org/html/draft-ietf-idr-link-bandwidth
A subsequent commit will implement additional (user-configurable) behaviors.
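The weight assignment can be sketched as follows in Python. This is an
illustration under assumptions, not the bgpd code: the function name is
hypothetical, and the integer division and the clamp to the 1..100 range
are guesses at how the share of the cumulative bandwidth is mapped.

```python
from typing import List, Optional, Sequence


def ecmp_weights(link_bw: Sequence[Optional[int]]) -> Optional[List[int]]:
    """Return one weight (1..100) per path, or None for regular ECMP."""
    # Any path without link bandwidth means bandwidths are ignored.
    if not link_bw or any(bw is None for bw in link_bw):
        return None
    total = sum(link_bw)  # cumulative link bandwidth for the prefix
    if total == 0:
        return None
    # Each path's share of the total, scaled to a percentage.
    return [max(1, min(100, bw * 100 // total)) for bw in link_bw]
```

Two paths with bandwidths 100 and 300 would get weights 25 and 75; if
either bandwidth were missing, the result is None and regular ECMP applies.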
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
Reviewed-by: Don Slice <dslice@cumulusnetworks.com>
During multipath update, track the cumulative link bandwidth
as well as update flags appropriately.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
Reviewed-by: Don Slice <dslice@cumulusnetworks.com>
Introduce fields in the multipath structure for link bandwidth handling.
In the process, the mp_count field is changed to a uint16 as that is the
value set anyway.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Additional extended community definitions and display of link-bandwidth
extended community.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Implement route-map option to set the link-bandwidth extended
community. The command is of the form:
set extcommunity bandwidth <(1-26214400)|cumulative|num-multipaths>
[non-transitive]
The options available are to specify the actual bandwidth value in
Mbps, base it on the cumulative downstream bandwidth or base it on
the number of multipaths. The last option is based on
https://tools.ietf.org/html/draft-mohanty-bess-ebgp-dmz. Further,
in alignment with the use case described in this IETF draft, the
extended community is encoded as transitive by default. There is an
option available to specify that it should be non-transitive.
The link-bandwidth itself is carried in bytes per second as specified in
https://tools.ietf.org/html/draft-ietf-idr-link-bandwidth
Note: This commit only handles the processing for bandwidth specified
as a value; subsequent commits will handle the processing of the other
options.
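Since the value is configured in Mbps but carried in bytes per second, a
conversion is needed. A minimal Python sketch, assuming decimal megabits
(1 Mbps = 1,000,000 bits per second); the function name is hypothetical
and the range is taken from the command syntax above.

```python
MIN_MBPS = 1
MAX_MBPS = 26214400


def mbps_to_bytes_per_sec(mbps: int) -> int:
    """Convert a configured bandwidth in Mbps to bytes per second."""
    if not MIN_MBPS <= mbps <= MAX_MBPS:
        raise ValueError(f"bandwidth out of range: {mbps}")
    # Megabits -> bits, then 8 bits per byte.
    return mbps * 1_000_000 // 8
```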
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
Reviewed-by: Don Slice <dslice@cumulusnetworks.com>
In the past, we always displayed the number of buffered ingress packets
as zero because there was no packet buffering in the input path and
therefore never any queue size to report. They're buffered now so we can
display something meaningful instead of 0.
Also change the inq / outq lookups to be atomic, since they can be
modified elsewhere. These should still compile down to an unfenced word
read but it's good to be explicit.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
https://lists.frrouting.org/pipermail/frog/2020-March/000776.html
It was pointed out that we are not properly passing the nexthop
through and instead we were replacing the nexthop as a Route Server
with our own.
https://tools.ietf.org/html/rfc4456#section-4
10. Implementation Considerations
Care should be taken to make sure that none of the BGP path
attributes defined above can be modified through configuration when
exchanging internal routing information between RRs and Clients and
Non-Clients. Their modification could potentially result in routing
loops.
In addition, when a RR reflects a route, it SHOULD NOT modify the
following path attributes: NEXT_HOP, AS_PATH, LOCAL_PREF, and MED.
Their modification could potentially result in routing loops.
Modify the code so that FRR passes the nexthop through unchanged when
it is instructed to act as a Route-Server.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Add new function `bgp_node_get_prefix()` and modify
the bgp code base to use it.
This is prep work for the struct bgp_dest rework.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Show whether the AGGREGATOR AS is malformed under `show [ip] bgp <prefix>`:
```
eva# sh ip bgp 103.79.124.0/22
BGP routing table entry for 103.79.124.0/22
Paths: (1 available, best #1, table default)
Advertised to non peer-group peers:
192.168.201.136
64539 15096 6939 7545 7545 136001, (aggregated by 0(malformed) 0.0.0.0)
192.168.201.136 from 192.168.201.136 (192.168.201.136)
Origin IGP, valid, external, best (First path received)
Last update: Thu Mar 26 10:02:07 2020
```
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
With a full feed, the source of this error is unknown: you can't tell which
prefix or aspath is malformed.
Printing just `[EC 33554434] AGGREGATOR attribute is BGP_AS_ZERO(0)` isn't
enough to pinpoint where the problem is.
Additionally print at least the aspath here:
```
[EC 33554434] AGGREGATOR AS number is 0 for aspath: 65000 65031
```
Overall the full table has only 6 such malformed prefixes:
```
aspath: 64539 15096 6939 45430 45458
aspath: 64539 15096 6939 1299 3257 34984 34984 34984 34984 34984 51174
aspath: 64539 15096 6939 286 34984 16135 16135 {16135}
aspath: 64539 15096 6939 7545 7545 136001
aspath: 64539 15096 6939 6762 3269 20746
aspath: 64539 15096 6939 7018 3379
```
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
Line break at the end of the message is implicit for zlog_* and flog_*,
don't put it in the string. Mid-message line breaks are currently
unsupported. (LF is "end of message" in syslog.)
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Problem seen that if "import vrf route-map RMAP" was entered
without any vrfs being imported, the configuration was displayed
as "route-map vpn import RMAP". Additionally, if "import vrf
route-map" was entered without specifying a route-map name,
the command was accepted and the word "route-map" would be
treated as a vrf name. This fix resolves both of those issues
and also allows deleting the "import vrf route-map" line without
providing the route-map name.
Ticket: CM-28821
Signed-off-by: Don Slice <dslice@cumulusnetworks.com>
Modify code to use lookup function agg_node_get_prefix()
as the abstraction layer. When we rework bgp_node to
bgp_dest this will allow us to greatly limit the amount
of work needed to do that.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Future work needs the ability to specify a
const struct prefix value. Iterate into
bgp a bit to get this started.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Defer the grabbing of the prefix for as long as is possible.
This is a long term rework of how we access the `struct bgp_node`
to only use accessor functions.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
More second order effects of cleaning up rn usage
in bgp. Sprinkle the fairy const's all over the place.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Modify more code to use `const struct prefix` throughout
bgp. This is all prep work for adding an accessor function
for bgp_node to get the prefix and reduce all the places that
code needs to be touched when we get that work done.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Tell the compiler that the prefix is being used for lookups
and it will never change.
Setup for future work.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Some were converted to bool, where true/false status is needed.
Only those whose return status was always false or always true were
converted to void.
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
Convert some status defines for the fsm to an enum
so that we cannot mix and match them in the future.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
In PR #6052 which fixes issue #5963 the bgp fsm events
were confused with the bgp fsm status leading
to a bug. Let's start separating those out
so these types of failures cannot just
easily occur.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Ensure that the EVPN advertise route-map is applied on a copy of the
original path_info and associated attribute, so that if the route-map
has SET clauses, they can operate properly. This closely follows
the model already in use in other route-map application code.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Don Slice <dslice@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
During VRF-to-VRF route leaking, strip any extraneous route targets. This
ensures that source-VRF-specific route targets or route targets that are
internally assigned for the VRF-to-VRF route leaking don't get attached
to the route in the target VRF.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
Reviewed-by: Don Slice <dslice@cumulusnetworks.com>
Extended communities like the BGP Route Target can be present multiple
times in a route's path attribute. Ensure that the strip function for a
particular extended community (type and subtype) handles this and
strips all occurrences.
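A minimal Python sketch of a strip function that removes every occurrence
rather than stopping at the first match. The function name and the 8-byte
representation (type in byte 0, subtype in byte 1) are assumptions made
for this example.

```python
from typing import List


def ecommunity_strip(vals: List[bytes], ec_type: int, ec_subtype: int) -> int:
    """Remove all values with the given type/subtype; return how many."""
    kept = [v for v in vals if not (v[0] == ec_type and v[1] == ec_subtype)]
    removed = len(vals) - len(kept)
    vals[:] = kept  # update the caller's list in place
    return removed
```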
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
Reviewed-by: Don Slice <dslice@cumulusnetworks.com>
A BGP update-group is dynamically created to group together a set of peers
such that any BGP updates can be formed just once for the entire group and
only the next hop attribute may need to be modified when the update is sent
out to each peer in the group. The update formation code attempts to
determine as much as possible if the next hop will be set to our own IP
address for every peer in the group. This helps to avoid additional checks
at the point of sending the update (which happens on a per-peer basis) and
also because some other attributes may/could vary depending on whether the
next hop is set to our own IP or not. Resetting the next hop to our own IP
address is the most common behavior for EBGP peerings in the absence of
other user-configured or internal (e.g., for l2vpn/evpn) settings and
peerings on a shared subnet.
The code had a flaw in the multiaccess check to see if there are peers in
the update group which are on a shared subnet as the next hop of the path
being announced - the source peer could itself be in the same update group
and cause the check to give an incorrect result. Modify the check to skip
the source peer so that the check is more accurate.
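The corrected check can be sketched in Python as below. This is a
simplified model, not the update-group code: peers are reduced to a name
and a connected subnet, and all names are hypothetical.

```python
import ipaddress
from typing import Dict


def nexthop_on_shared_subnet(nexthop: str, group_peers: Dict[str, str],
                             source_peer: str) -> bool:
    """True if any peer other than the source shares a subnet with nexthop."""
    nh = ipaddress.ip_address(nexthop)
    for name, subnet in group_peers.items():
        if name == source_peer:
            # The peer the path was learned from is naturally on the same
            # subnet as its own next hop; counting it skews the result.
            continue
        if nh in ipaddress.ip_network(subnet):
            return True
    return False
```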
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
Reviewed-by: Don Slice <dslice@cumulusnetworks.com>
This command is missing, whereas the 'match ipv6 next-hop' command is
available. Add it, taking backward compatibility into account for
users who may have configured ACLs whose name is an IPv4 address.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
The bgp reason code was being reset in bgp_best_selection
by rerunning bgp_path_info_cmp multiple times under certain
receiving patterns of data from peers.
These are the debugs that show this issue:
2020/03/16 19:17:22.523780 BGP: 2001:20:1:1::6 rcvd UPDATE w/ attr: nexthop 20.1.1.6, origin i, metric 600, community 1000:1006, path 20
2020/03/16 19:17:22.523819 BGP: 2001:20:1:1::6 rcvd 20.10.0.6/32 IPv4 unicast
2020/03/16 19:17:22.556168 BGP: 20.1.1.6 rcvd UPDATE w/ attr: nexthop 20.1.1.6, origin i, metric 500, community 1000:1006, path 20
2020/03/16 19:17:22.556209 BGP: 20.1.1.6 rcvd 20.10.0.6/32 IPv4 unicast
2020/03/16 19:17:22.572358 BGP: bgp_process_main_one: p=20.10.0.6/32 afi=IPv4, safi=unicast start
2020/03/16 19:17:22.572408 BGP: 20.10.0.6/32: Comparing path 2001:20:1:1::6 flags 0x410 with path 20.1.1.6 flags 0x410
2020/03/16 19:17:22.572415 BGP: 20.10.0.6/32: path 2001:20:1:1::6 loses to path 20.1.1.6 due to MED 600 > 500
2020/03/16 19:17:22.572422 BGP: 20.10.0.6/32: path 20.1.1.6 is the bestpath from AS 20
2020/03/16 19:17:22.572429 BGP: 20.10.0.6/32: path 20.1.1.6 is the initial bestpath
2020/03/16 19:17:22.572435 BGP: bgp_best_selection: pi 0x5627187c66c0 dmed
2020/03/16 19:17:22.572441 BGP: 20.10.0.6/32: After path selection, newbest is path 20.1.1.6 oldbest was NONE
2020/03/16 19:17:22.572447 BGP: 20.10.0.6/32: path 20.1.1.6 is the bestpath, add to the multipath list
2020/03/16 19:17:22.572453 BGP: 20.10.0.6/32: path 2001:20:1:1::6 has the same nexthop as the bestpath, skip it
2020/03/16 19:17:22.572460 BGP: 20.10.0.6/32: starting mpath update, newbest 20.1.1.6 num candidates 1 old-mpath-count 0 old-cum-bw u0
2020/03/16 19:17:22.572466 BGP: 20.10.0.6/32: comparing candidate 20.1.1.6 with existing mpath NONE
2020/03/16 19:17:22.572473 BGP: 20.10.0.6/32: New mpath count (incl newbest) 1 mpath-change NO all_paths_lb 0 cum_bw u0
Effectively, if BGP receives 2 paths it could end up running bgp_path_info_cmp multiple times
and in some situations overwrite the reason selected the first time through.
In this example path selection is run and the MED is the reason for the choice.
Then bgp_best_selection is run again, this time clearing new_select
to NULL before calling path selection for the first time. This second
call into path selection resets the reason, since it is only passing in one
path. So save the last reason selected and restore it in this case.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
This scenario has been seen against a MikroTik virtual machine with
bfd enabled. When the remote MikroTik bgp reestablishes the bgp session
after a bgp reset, the bgp establishment comes first, then bfd is
initialised.
The second point is true for MikroTik, but not for frrouting: when
frrouting receives bfd down messages, it is not in the init state.
Its bfd state is actually up, and it sees the first bfd down packet
as an issue. Consequently, the BGP session is cleared.
The fix consists of resetting the BFD session once BGP comes up, but
only if the bfd status is considered up.
This aligns the state machines of the local and remote bfd.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
When bgp is updated with local source, the bgp session is reset; bfd
also must be reset. The bgp_stop() handler handles all kinds of
unexpected failures, so it should be the right place to deregister from
bfd, provided that when bgp establishes, a similar function in bgp
recreates the bfd context.
Note that the bfd session is not reset in one specific case, where a BFD
down event is the last reset. In that case, we must let BFD monitor
the link.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
If the default BGP instance is importing routes from another instance and
the latter has a router-id update, the update handler needs to handle the
default instance in a special way.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Chirag Shah <chirag@cumulusnetworks.com>
Reviewed-by: Don Slice <dslice@cumulusnetworks.com>
Ticket: CM-26007
Reviewed By: CCR-9108
Testing Done: Detailed verification in 3.x
Ensure that the late registration for NHT done for IPv4 route exchange
over IPv6 GUA peering is not attempted for peer-groups, only for peers.
Fixes: "bgpd: Late registration of Extended Nexthop"
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
This scenario has been seen against a MikroTik virtual machine with bfd
enabled. When the remote MikroTik bgp reestablishes the bgp session after a
bgp reset, the bgp establishment comes first, then bfd is initialised.
The second point is true for MikroTik, but not for frrouting: when
frrouting receives bfd down messages, it is not in the init state.
Its bfd state is actually up, and it sees the first bfd down packet
as an issue. Consequently, the BGP session is cleared.
The fix consists of resetting the BFD session once BGP comes up. This
aligns the state machines of the local and remote bfd.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
When bgp is updated with local source, the bgp session is reset; bfd
also must be reset. The bgp_stop() handler handles all kinds of
unexpected failures, so it should be the right place to deregister from
bfd, provided that when bgp establishes, a similar function in bgp
recreates the bfd context.
Note that the bfd session is not reset in one specific case, where a BFD
down event is the last reset. In that case, we must let BFD monitor
the link.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
This fixes a linking issue on Fedora Rawhide:
/usr/bin/ld: bgpd/libbgp.a(bgp_flowspec.o):/home/ruben/src/frr/./bgpd/bgp_attr_evpn.h:37: multiple definition of `eth_tag_id'; bgpd/bgp_btoa-bgp_btoa.o:/home/ruben/src/frr/./bgpd/bgp_attr_evpn.h:37: first defined here
collect2: error: ld returned 1 exit status
Signed-off-by: Ruben Kerkhof <ruben@rubenkerkhof.com>
We were using XMALLOC for these, and only initializing the refcount to 0
on one of them. Let's just use XCALLOC instead...
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
RCA: When a doppelganger is still around and clear bgp is issued,
there is a chance that the peer gets deleted and the next pointer
is a freed peer pointer.
Fix: Pass the address of nnode to get the next safe peer pointer.
Signed-off-by: Santosh P K <sapk@vmware.com>
It's been a year search and destroy.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
We already have a generic support for add/sub in route-maps. It's already
handled in route_value_compile().
Just convert to string (allow passing (-) minus sign) - works as expected.
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
Some code in bgp_route_refresh_receive was spread across several
lines because of an end of line comment. Move the comment to a place
that allows better formatting.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
SA has found a case where we did a table lookup of a rn( and
associated lock of that node ) where we did not unlock it.
Unlock the node before moving on to the next one.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Currently, during bgp open collision resolution, if both
router-ids are the same, we correctly follow
the RFC and close the connection. The problem is of course
that there is no notification of the error in configuration
to the end user other than a subtle open debug message.
Explicitly call out the misconfiguration as an error message,
as this misconfig took several hours of debugging to notice.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
annie# show bgp ipv4 uni summ
BGP router identifier 192.168.201.136, local AS number 64539 vrf-id 0
BGP table version 22458946
RIB entries 1458006, using 178 MiB of memory
Peers 4, using 68 KiB of memory
Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd
45.33.5.119 4 0 0 0 0 0 0 never Active
65.19.134.122 4 15096 4611832 108292 0 0 0 6d22h55m 800670
107.13.46.23 4 0 0 0 0 0 0 never Connect
robot(192.168.201.139) 4 64540 11159975 11365599 0 0 0 05w2d05h Connect
Total number of neighbors 4
On very busy systems the column output for MsgRcvd and MsgSent can quickly move past 7 columns.
Add a couple more to allow for even display.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
If the peer was shutdown locally, it doesn't show up as admin. shutdown.
Instead it's treated as "Waiting for peer OPEN".
The same applies to when the peer reaches maximum-prefix count.
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
During route-map processing we return an enum, the rpki
code was doing some extra gyrations that were unnecessary.
Simplify.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The Prefix-SID path attribute Label-index TLV (type 1) is
used by SR-MPLS. The Label-index TLV MUST be ignored
if the path attribute is attached to a non-Labeled-unicast
UPDATE message, as described in [ref1].
There is a problem case in the current implementation.
This commit fixes it.
Before this commit,
setting the Label-Index value was unfortunately skipped in some cases,
because the Label-Index TLV implementation checks the AFI/SAFI pair
through the mp_update variable that is set by the bgp_mp_reach_parse function.
If MP_REACH_NLRI is present after PREFIX_SID, the bgp_attr_psid_sub
function cannot know the AFI/SAFI pair, and the order of
path attributes is never deterministic for the receiver. [ref2]
In this commit,
I relocated the AFI/SAFI pair check to after the path-attr loop.
[ref1](https://tools.ietf.org/html/draft-ietf-idr-bgp-prefix-sid-27#section-3.2)
> The Originator SRGB TLV may only appear in a BGP Prefix-SID attribute
> attached to IPv4/IPv6 Labeled Unicast prefixes ([RFC8277]). It MUST
> be ignored when received for other BGP AFI/SAFI combinations.
[ref2](https://tools.ietf.org/html/rfc4271#section-5)
> The sender of an UPDATE message SHOULD order path attributes within
> the UPDATE message in ascending order of attribute type. The
> receiver of an UPDATE message MUST be prepared to handle path
> attributes within UPDATE messages that are out of order.
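The reordering can be sketched in Python as below. This is a simplified
model and not the bgpd parser: attributes are reduced to (name, value)
tuples and all names are hypothetical. The point is only that the AFI/SAFI
check runs after the loop, so attribute order no longer matters.

```python
from typing import List, Optional, Tuple


def parse_attrs(attrs: List[Tuple[str, object]]) -> Optional[int]:
    """Return the Label-Index to use, or None if it must be ignored."""
    label_index = None
    afi_safi = None
    for name, value in attrs:
        if name == "MP_REACH_NLRI":
            afi_safi = value            # e.g. ("ipv4", "labeled-unicast")
        elif name == "PREFIX_SID":
            label_index = value         # remember it, do not validate yet
    # Validate only once every attribute has been seen.
    if afi_safi is None or afi_safi[1] != "labeled-unicast":
        return None
    return label_index
```

The same Label-Index is returned whether PREFIX_SID comes before or after
MP_REACH_NLRI in the UPDATE.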
Signed-off-by: Hiroki Shirokura <slank.dev@gmail.com>
Prefix-SID is designed to be able to carry an array of TLVs.
That behaviour is important to support the SR-MPLS feature
and was added by the previous PR #5418.
In that implementation, however, if some additional data
(such as the next BGP update message or the next path attributes)
was present after the Prefix-SID path attribute,
bgpd would parse that additional data as a Prefix-SID TLV.
This commit fixes that. Before this commit, the loop condition
was determined by whether the stream is readable. In a more correct
implementation, the Prefix-SID boundary should be checked
as well. The length of the Prefix-SID path attribute can
be obtained from bgp_attr_parse_args.
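A minimal Python sketch of a TLV loop bounded by the attribute length
instead of by the remaining readable data. It is not the bgpd code: the
function name is hypothetical, and it assumes a 1-octet type followed by a
2-octet length.

```python
import struct
from typing import List, Tuple


def parse_prefix_sid_tlvs(buf: bytes, offset: int,
                          attr_len: int) -> Tuple[List[Tuple[int, bytes]], int]:
    """Parse TLVs within [offset, offset + attr_len); return (tlvs, end)."""
    end = offset + attr_len
    if end > len(buf):
        raise ValueError("attribute length exceeds available data")
    tlvs = []
    # Stop at the attribute boundary, not at the end of the stream.
    while offset < end:
        if end - offset < 3:
            raise ValueError("truncated TLV header")
        tlv_type, tlv_len = struct.unpack_from("!BH", buf, offset)
        offset += 3
        if offset + tlv_len > end:
            raise ValueError("TLV overruns the attribute")
        tlvs.append((tlv_type, buf[offset:offset + tlv_len]))
        offset += tlv_len
    return tlvs, offset
```

Any bytes after `offset + attr_len` are left untouched for whatever parses
the next attribute or message.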
Signed-off-by: Hiroki Shirokura <slank.dev@gmail.com>
Override ORIGIN attribute if defined.
E.g.: Cisco and Juniper set ORIGIN for aggregated address
to IGP which is not what rfc4271 says.
This enables the same behavior, optionally.
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
Track the returned peer_sorted value and use it where
we can and recalculate where necessary.
This is an effort to reduce the amount of work done here.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The act of peer_sort() being called always set this value
even when we are just looking it up. We need to separate
out the idea of lookup from set.
For those places where it is immediately obvious that
this is a lookup, switch over to using this function.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
A route where ESI, GW IP, MAC and Label are all zero at the same time SHOULD
be treat-as-withdraw.
Invalid MAC addresses are broadcast or multicast MAC addresses. The route
MUST be treat-as-withdraw in case of an invalid MAC address.
FRR supports Ethernet NVO tunnels only, so the route will be
withdrawn when ESI, GW IP and MAC are all zero or the MAC is invalid.
Test cases:
1) ET-5 route with valid RMAC extended community
2) ET-5 route no RMAC extended community
3) ET-5 route with Multicast MAC in RMAC extended community
4) ET-5 route with Broadcast MAC in RMAC extended community
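The decision exercised by these test cases can be sketched in Python as
below. It is a simplified model under assumptions: the function names are
hypothetical, a missing RMAC extended community is modeled as an all-zero
MAC, and ESI and GW IP are reduced to booleans.

```python
from typing import Optional

ZERO_MAC = bytes(6)
BROADCAST_MAC = b"\xff" * 6


def mac_is_invalid(mac: bytes) -> bool:
    """Broadcast or multicast MAC addresses are invalid here."""
    # The multicast bit is the least significant bit of the first octet.
    return mac == BROADCAST_MAC or bool(mac[0] & 0x01)


def should_withdraw(esi_is_zero: bool, gw_ip_is_zero: bool,
                    rmac: Optional[bytes]) -> bool:
    """True if the route should be handled as treat-as-withdraw."""
    mac = rmac if rmac is not None else ZERO_MAC
    if mac_is_invalid(mac):
        return True
    return esi_is_zero and gw_ip_is_zero and mac == ZERO_MAC
```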
Signed-off-by: Kishore Aramalla <karamalla@vmware.com>
The current failed reason for bgp when you have a peer that
is not online yet is `Waiting for NHT`, even if NHT has
succeeded. Add some code to differentiate this.
eva# show bgp ipv4 uni summ failed
BGP router identifier 192.168.201.135, local AS number 3923 vrf-id 0
BGP table version 0
RIB entries 0, using 0 bytes of memory
Peers 2, using 43 KiB of memory
Neighbor EstdCnt DropCnt ResetTime Reason
192.168.44.1 0 0 never Waiting for NHT
192.168.201.139 0 0 never Waiting for Open to Succeed
Total number of neighbors 2
eva#
eva# show bgp nexthop
Current BGP nexthop cache:
192.168.44.1 invalid, peer 192.168.44.1
Must be Connected
Last update: Mon Feb 10 19:05:19 2020
192.168.201.139 valid [IGP metric 0], #paths 0, peer 192.168.201.139
So 192.168.201.139 is a peer for a connected route that has not been
created on .139, while 44.1 nexthop tracking has not succeeded yet.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
RCA:
When we install IPv6 prefix imported from EVPN RT-5 in vrf, nexthop of the IPv6
route should be IPv4 mapped IPv6 address. In function
install_evpn_route_entry_in_vrf, we generate a new attribute with IPv4 mapped
IPv6 nexthop, but we use parent->attr while creating the actual route.
Thus, an IPv4 nexthop is assigned to this route.
Because of this incorrect nexthop, we observed a crash in function
update_ipv6nh_for_route_install.
Fix:
Pass the new attribute with IPv4 mapped IPv6 nexthop to
bgp_create_evpn_bgp_path_info
Signed-off-by: Ameya Dharkar <adharkar@vmware.com>
For some reason we are getting a compile error around a variable I didn't
touch in the other commits. Make it happy.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
There is no need to have a temp variable to then store that
data in another temporary variable.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The new_afi and afi were being used over and over. Switch
to the end result we want and just use that from the get go.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The creation of a prefix pointer is unnecessary. Save the
prefix as part of the actual data structure. This will
reduce the data needed by 8 bytes per nexthop stored.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
According to https://tools.ietf.org/html/rfc7606 some of the attributes
MUST be handled using the "treat-as-withdraw" approach.
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
bgp flowspec packets are being forged correctly. There is no need to
check for bgp length, as the bgp nlri length is checked at reception.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
There is no need for a call into get_afi_safi_str for the
json side since we add it based upon the afi safi str below.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The coverity SA believes that the regex value can possibly
be NULL. Not possible so let's make it happy.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
If you have enums handled in a switch adding a default case
makes it fun to fix when new stuff is added later. Remove.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
For graceful restart, clients have to send GR capabilities.
Library functions are added to encode capabilities and
also for zebra to decode client capabilities.
Co-authored-by: Santosh P K <sapk@vmware.com>
Co-authored-by: Soman K S <somanks@vmware.com>
Signed-off-by: Santosh P K <sapk@vmware.com>
Two of the evpn show commands with the json option have a memory leak.
1) show bgp l2vpn evpn route vni all json
2) show bgp l2vpn evpn route esi json
Before fix:
----------
Executed 'show bgp l2vpn evpn route vni all json' multiple times;
used ordinary blocks continue to increase.
Note at the time of show command capture there were 22 evpn routes
in vni evpn route table.
Memory statistics for bgpd:
System allocator statistics:
Total heap allocated: 9152 KiB
Holding block headers: 0 bytes
Used small blocks: 0 bytes
Used ordinary blocks: 7300 KiB
Free small blocks: 1760 bytes
Free ordinary blocks: 1852 KiB
Ordinary blocks: 880
Small blocks: 51
Holding blocks: 0
Ticket:CM-27920
Reviewed By:
Testing Done:
After fix:
---------
Executed 'show bgp l2vpn evpn route vni all json' multiple times;
used ordinary blocks remain low.
Memory statistics for bgpd:
System allocator statistics:
Total heap allocated: 8356 KiB
Holding block headers: 0 bytes
Used small blocks: 0 bytes
Used ordinary blocks: 6492 KiB
Free small blocks: 1840 bytes
Free ordinary blocks: 1864 KiB
Ordinary blocks: 939
Small blocks: 52
Holding blocks: 0
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
Found a memory leak in the json output of evpn's route
commands.
After executing 'show bgp l2vpn evpn route type prefix json'
and 'show bgp l2vpn evpn route type macip json' a few times
(6 times) with more than 600 routes in total, the
memory footprint of bgpd continues to grow.
Memory statistics for bgpd:
System allocator statistics:
Total heap allocated: 12 MiB
Holding block headers: 0 bytes
Used small blocks: 0 bytes
Used ordinary blocks: 8390 KiB
Free small blocks: 1760 bytes
Free ordinary blocks: 3762 KiB
Ordinary blocks: 1161
Small blocks: 51
Holding blocks: 0
Ticket:CM-27920
Testing Done:
After fix:
execute a few times the
'show bgp l2vpn evpn route type prefix json'
and 'show bgp l2vpn evpn route type macip json'
commands; used ordinary blocks (uordblks) stay in
a steady state.
Memory statistics for bgpd:
System allocator statistics:
Total heap allocated: 9968 KiB
Holding block headers: 0 bytes
Used small blocks: 0 bytes
Used ordinary blocks: 6486 KiB
Free small blocks: 1984 bytes
Free ordinary blocks: 3482 KiB
Ordinary blocks: 1110
Small blocks: 54
Holding blocks: 0
Memory statistics for bgpd:
System allocator statistics:
Total heap allocated: 10100 KiB
Holding block headers: 0 bytes
Used small blocks: 0 bytes
Used ordinary blocks: 6488 KiB
Free small blocks: 1984 bytes
Free ordinary blocks: 3612 KiB
Ordinary blocks: 1113
Small blocks: 54
Holding blocks: 0
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
* While the Deferral timer is running, signal route update pending
(ZEBRA_CLIENT_ROUTE_UPDATE_PENDING) from BGPD to Zebra.
* After expiry of the Deferral timer, the deferred routes are processed.
When the deferred route_list becomes empty, End-of-RIB is sent to the
peer and a route processing complete message (ZEBRA_CLIENT_ROUTE_UPDATE_COMPLETE)
is sent to Zebra, so that Zebra can delete any stale routes still
present in the rib.
Signed-off-by: Biswajit Sadhu <sadhub@vmware.com>
* Added CLI commands to update rib-stale-time, running in
Cmd : "bgp graceful-restart rib-stale-time (1-3000)".
Cmd : "no bgp graceful-restart rib-stale-time".
* Integrating the hook functions for signalling from BGPD
to ZEBRA to enable or disable GR feature in ZEBRA
depending on bgp per peer gr configuration.
Signed-off-by: Biswajit Sadhu <sadhub@vmware.com>
* Adding helper caller hook functions for signalling from BGPD
to ZEBRA to enable or disable GR feature in ZEBRA depending
on bgp per peer gr configuration.
Signed-off-by: Biswajit Sadhu <sadhub@vmware.com>
* Adding helper function for signalling from BGPD to ZEBRA to
enable or disable GR feature in ZEBRA depending on bgp per
peer gr configuration.
Signed-off-by: Biswajit Sadhu <sadhub@vmware.com>
Data structures, function declarations and macros for signalling
from BGPD to ZEBRA to enable or disable GR feature in ZEBRA
depending on bgp per peer gr configuration.
Signed-off-by: Biswajit Sadhu <sadhub@vmware.com>
* After a restarting router comes up and the BGP session is
successfully established with the peer, if the restarting
router doesn’t have any route to send, it sends EOR to
the peer immediately, before receiving updates from its peers.
* Instead, the restarting router should send EOR only if the
selection deferral timer is not running OR the counts of EOR received
and EOR required match.
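The condition above can be sketched as a small predicate. This is only an illustration: the struct and its field names (`gr_state_sketch`, `defer_timer_running`, `eor_received`, `eor_required`) are hypothetical and are not taken from bgpd.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical graceful-restart bookkeeping; not bgpd's real types. */
struct gr_state_sketch {
	bool defer_timer_running; /* selection deferral timer is active */
	uint32_t eor_received;    /* EORs received from peers so far */
	uint32_t eor_required;    /* EORs expected before sending our own */
};

/* Return true when the restarting router may send its own EOR. */
static bool gr_may_send_eor(const struct gr_state_sketch *gr)
{
	assert(gr != NULL);

	/* Timer not running: nothing is being deferred, EOR can go out. */
	if (!gr->defer_timer_running)
		return true;

	/* Timer running: only send once every expected EOR has arrived. */
	return gr->eor_received == gr->eor_required;
}
```

With the timer running and only one of three expected EORs received, the predicate returns false, so the EOR is held back instead of being sent immediately.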
Signed-off-by: Biswajit Sadhu <sadhub@vmware.com>
BGP disable EOR sending is a useful command for testing various
scenarios of BGP graceful restart.
* Added the hidden CLI command : bgp graceful-restart disable-eor
* The CLI will not be displayed in "show running-config" and will not
be stored in configuration file.
* When enabled, EOR will not be sent to peer
Signed-off-by: Biswajit Sadhu <sadhub@vmware.com>
Signed-off-by: Soman K S <somanks@vmware.com>
When the peer router's GR mode has changed from helper/restart
to disable, the local BGP GR router should reset the peer
router's stored restart-time.
Signed-off-by: Biswajit Sadhu <sadhub@vmware.com>
bgp tcp connection.
When a BGP peering is configured between two BGP routers, both routers create
a peer structure when they receive each other’s OPEN message. In this event both
speakers open duplicate TCP sessions and send OPEN messages on each socket
simultaneously; the BGP Identifier is used to resolve which socket should be closed.
If BGP GR is enabled, the old TCP session is dumped and the new session is retained.
While this transfer of connection is happening, if the BGP GR config
is not fully migrated to the new connection, the new BGP GR mode will never get applied.
Fix Summary:
1. Replicate GR configuration from the old session to the new session in bgp_accept().
2. Replicate GR configuration from stub to full-fledged peer in bgp_establish().
3. Disable all NSF flags, clear stale routes (if present), stop restart & stale timers
(if they are running) when the bgp GR mode is changed to “Disabled”.
4. Disable R-bit in cap, if it is not set in the received OPEN message.
Signed-off-by: Biswajit Sadhu <sadhub@vmware.com>
BGP GR Neighbor mode is showing the default string as “NotRecieved”,
as the bgp gr neighbour capability was not processed,
since the local mode is “Disable”.
However now it would be changed to “NotApplicable”.
Signed-off-by: Biswajit Sadhu <sadhub@vmware.com>
& GR is enabled.
When GR with deferral is enabled and connected routes are
distributed, a race condition can add a route node
to both the deferred queue and the work queue. If the deferred queue
is processed first, it only clears the flag while
leaving the entry in the work queue as it is. When a new update
for the same route node next arrives from the peer, it hits
an assert. The assert check exists to ensure we don’t add to the work queue
again while the node is already present.
So, before adding to the deferred queue, check whether the node is already present
in the work queue and bail if so.
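A minimal sketch of that guard, assuming queue membership is tracked with per-node flag bits. The flag names and the `route_node_sketch` type are hypothetical and do not reproduce bgpd's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits recording which queue holds the node. */
#define NODE_IN_WORK_QUEUE (1u << 0)
#define NODE_SELECT_DEFER  (1u << 1)

struct route_node_sketch {
	uint32_t flags;
};

/*
 * Try to place a node on the deferred queue.
 * Returns false, leaving the node untouched, if it is already
 * sitting in the work queue.
 */
static bool defer_node(struct route_node_sketch *rn)
{
	assert(rn != NULL);

	if (rn->flags & NODE_IN_WORK_QUEUE)
		return false; /* bail: never hold the node in both queues */

	rn->flags |= NODE_SELECT_DEFER;
	return true;
}
```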
Signed-off-by: Biswajit Sadhu <sadhub@vmware.com>
* Changing GR mode on a router needs a session reset from the
SAME router to negotiate new GR capability.
* The present GR implementation needs a session reset after every
new BGP GR mode change.
* When BGP session reset happens due to sending or receiving BGP
notification after changing BGP GR mode, there is no need of
explicit session reset.
Signed-off-by: Biswajit Sadhu <sadhub@vmware.com>
* BGP GR Neighbour mode in show command would show as
“NotApplicable”, when local mode is “Disable”. As the bgp
gr neighbour capability was not processed, since the local mode
is “Disable”.
* Minor changes in show Selection Deferral Time.
Signed-off-by: Biswajit Sadhu <sadhub@vmware.com>
* Selection Deferral Timer for Graceful Restart.
* Added selection deferral timer handling function.
* Route marking as selection defer when update message is received.
* Staggered processing of routes which are pending best selection.
* Fix for multi-path test case.
Signed-off-by: Biswajit Sadhu <sadhub@vmware.com>
and DS.
* Added config commands and data structures for deferral timer
configuration and processing.
Cmd : bgp graceful-restart select-defer-time (0-3600)
Cmd : no bgp graceful-restart select-defer-time (0-3600)
Signed-off-by: Biswajit Sadhu <sadhub@vmware.com>
Signed-off-by: Soman K S <somanks@vmware.com>
* Added new show command to show the graceful restart
information for each neighbor.
Cmd: show bgp [<ipv4|ipv6>] neighbors [<A.B.C.D|X:X::X:X|WORD>] graceful-restart
* Changes to show neighbors commands for displaying
graceful restart information.
Cmd :show [ip] bgp [<view|vrf> VIEWVRFNAME] [<ipv4|ipv6>] neighbors [<A.B.C.D|X:X::X:X|
Signed-off-by: Biswajit Sadhu <sadhub@vmware.com>
* Changes to the capability sending function to advertise
graceful restart capability in the bgp OPEN message.
Signed-off-by: Biswajit Sadhu <sadhub@vmware.com>
* Added FSM for peer and global configuration for graceful restart
* Added debug option BGP_GRACEFUL_RESTART for logs specific to
graceful restart processing
Signed-off-by: Biswajit Sadhu <sadhub@vmware.com>
Duplicated domain name capability messages cause memory leak. The amount
of leaked memory is proportional to the size of the duplicated
capabilities. This bug was introduced in 2015.
To hit this, a BGP OPEN message must contain multiple FQDN capabilities.
Memory is leaked when the hostname portion of the capability is of
length 0, but the domainname portion is not, for any of the duplicated
capabilities beyond the first one.
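The general shape of a fix for this class of leak is to release any previously stored string before storing the one from a repeated capability. The sketch below assumes exactly that; `peer_sketch`, `replace_name` and `store_fqdn` are hypothetical names, and treating a zero-length field as "clear the value" is a choice made for the example, not a statement about bgpd's behaviour.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical holder for the two strings carried by the capability. */
struct peer_sketch {
	char *hostname;
	char *domainname;
};

/* Replace *slot with a copy of src[0..len), freeing the old value first. */
static void replace_name(char **slot, const char *src, size_t len)
{
	assert(slot != NULL);

	free(*slot); /* free(NULL) is a no-op, so the first call is fine */
	*slot = NULL;
	if (len == 0)
		return;

	*slot = malloc(len + 1);
	if (*slot == NULL)
		return;
	memcpy(*slot, src, len);
	(*slot)[len] = '\0';
}

/* Store one capability; safe to call again for a duplicate. */
static void store_fqdn(struct peer_sketch *p, const char *host, size_t hlen,
		       const char *dom, size_t dlen)
{
	assert(p != NULL);
	replace_name(&p->hostname, host, hlen);
	replace_name(&p->domainname, dom, dlen);
}
```

Because both fields go through the same helper, a duplicate with an empty hostname and a non-empty domain name cannot orphan the earlier allocation.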
https://tools.ietf.org/html/draft-walton-bgp-hostname-capability-00
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
When withdrawing addpaths, adj_lookup was called to find the path that
needed to be withdrawn. It would lookup in the RB tree based on subgroup
pointer alone, often find the path with the wrong addpath ID, and return
null. Only the path highest in the tree sent to the subgroup could be
found, thus withdrawn.
Adding the addpath ID to the sort criteria for the RB tree allows us to
simplify the logic for adj_lookup, and address this problem. We are able
to remove the logic around non-addpath subgroups because the addpath ID
is consistently 0 for non-addpath adj_outs, so special logic to skip
matching the addpath ID isn't required. (As a side note, addpath will
also never use ID 0, so there won't be any ambiguity when looking at the
structure content.)
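As a rough illustration of the new sort criteria, a comparator that orders first by subgroup and then by addpath ID might look like the following. The `adj_out_sketch` type and its field names are hypothetical stand-ins, not the structures used in bgpd.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an adj-out entry keyed in the RB tree. */
struct adj_out_sketch {
	const void *subgroup;   /* identity of the update subgroup */
	uint32_t addpath_tx_id; /* 0 for non-addpath entries */
};

/* Order by subgroup first, then by addpath ID. */
static int adj_out_cmp(const struct adj_out_sketch *a,
		       const struct adj_out_sketch *b)
{
	uintptr_t sa, sb;

	assert(a != NULL && b != NULL);
	sa = (uintptr_t)a->subgroup;
	sb = (uintptr_t)b->subgroup;

	if (sa != sb)
		return sa < sb ? -1 : 1;
	if (a->addpath_tx_id != b->addpath_tx_id)
		return a->addpath_tx_id < b->addpath_tx_id ? -1 : 1;
	return 0;
}
```

With the ID in the key, two paths sent to the same subgroup are distinct tree entries, so a lookup for a specific (subgroup, ID) pair finds exactly that path.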
Signed-off-by: Mitchell Skiba <mskiba@amazon.com>
The vrrpd one conflicts with the standalone vrrpd package; also we're
installing daemons to /usr/lib/frr on some systems so they're not on
PATH.
Signed-off-by: David Lamparter <equinox@diac24.net>
bgpd already supports the BGP Prefix-SID path attribute, and
there are several sub-types of the Prefix-SID path attribute.
This commit makes bgpd support additional sub-types,
sub-Type-4 and sub-Type-5, to construct the VPNv4 SRv6 backend
with the vpnv4-unicast address family.
These path attributes are already supported by Cisco's IOS-XR and NX-OS.
Prefix-SID sub-Type-4 and sub-Type-5 are defined in the following
IETF drafts.
Supports(A-part-of):
- https://tools.ietf.org/html/draft-dawra-idr-srv6-vpn-04
- https://tools.ietf.org/html/draft-dawra-idr-srv6-vpn-05
Signed-off-by: Hiroki Shirokura <slank.dev@gmail.com>
For EVPN routes, nexthop and RMAC fields are synced
in route add to zebra.
In case of EVPN routes, display the RMAC field in the route add
debug log.
Reviewed By:CCR-9381
Testing Done:
BGP: nhop [1]: 27.0.0.11 if 30 VRF 26 RMAC 00:02:00:00:00:2e
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
This commit makes bgpd support VPNv4's extended
nexthop capability for BGP capability negotiation
during the BGP OPEN message exchange.
Signed-off-by: Hiroki Shirokura <slank.dev@gmail.com>
uint8_t * cannot be cast to uint32_t * unless the
pointed-to address is aligned according to uint32_t's
alignment rules. And it usually is not.
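A common way to avoid the cast is to copy the bytes into an aligned local, or to assemble the value byte by byte. The helpers below are a generic sketch of both approaches; the names `load_u32` and `load_u32_be` are made up for this example and are not the functions touched by the commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy four bytes from any address into a properly aligned local. */
static uint32_t load_u32(const uint8_t *p)
{
	uint32_t v;

	assert(p != NULL);
	memcpy(&v, p, sizeof(v)); /* memcpy has no alignment requirement */
	return v;
}

/* Read a big-endian (network order) 32-bit field one byte at a time. */
static uint32_t load_u32_be(const uint8_t *p)
{
	assert(p != NULL);
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
	       | ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}
```

`load_u32` preserves host byte order, while `load_u32_be` is independent of it, which is usually what a wire-format parser wants.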
Signed-off-by: Santosh P K <sapk@vmware.com>
With this change, we are able to set attributes via route-map to the default
route. It's useful in cases where we have two or more spines and we want to
prefer one router over others for leaves. This simplifies configuration instead
of using 'network 0.0.0.0/0' or 'ip route 0.0.0.0/0 ...' and 'redistribute
static' combination.
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
uint8_t * cannot be cast to uint32_t * unless the pointed-to address is
aligned according to uint32_t's alignment rules. And it usually is not.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
advertise pip running configuration should
display ip followed by mac parameters value as defined
in cli signature.
advertise-pip is enabled by default. When displaying the
running configuration, a '\n' was added after the
ip and mac parameters without being guarded by a check for
non-default parameters.
Currently, every bgp vrf instance ends up
displaying an l2vpn address-family section due to the
unguarded newline.
running config:
router bgp 6004 vrf vrf1
!
address-family l2vpn evpn
exit-address-family
!
Ticket:CM-26964
Testing Done:
With fix when only 'router bgp 6004 vrf vrf1' configured,
running config looks like:
!
router bgp 6004 vrf vrf1
!
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
This commit addresses issue #5629.
Before this commit, bgpd built the format string for the
bgp route-distinguisher as int32, but the correct format
is uint32. bgpd's current sh-run CLI generates an int32 rd,
so if the user sets the rd as 1:4294967295 (0x1:0xffffffff),
the sh-run CLI generates 1: -1 in the running-config. This
commit fixes that issue.
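The difference comes down to the conversion specifier. A minimal sketch of unsigned formatting follows, assuming an RD made of a 16-bit administrator and a 32-bit assigned number; `format_rd` is a hypothetical helper, not the bgpd function that was changed.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Format an "admin:number" RD, treating the number as unsigned 32-bit. */
static int format_rd(char *buf, size_t len, uint16_t admin, uint32_t number)
{
	assert(buf != NULL && len > 0);
	/* PRIu32 prints the full 0..4294967295 range; a signed
	 * specifier would show the top half of it as negative. */
	return snprintf(buf, len, "%u:%" PRIu32, (unsigned int)admin, number);
}
```

Calling `format_rd(buf, sizeof(buf), 1, 4294967295u)` writes `1:4294967295` into `buf`.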
Signed-off-by: Hiroki Shirokura <slank.dev@gmail.com>
Guess what - for a bounds check to work, it has to happen *before* you
read the data. We were trusting the attribute field received in a prefix
SID attribute and then checking if it was correct afterwards, but if it was
wrong we'd crash before that.
This fixes the problem, and adds additional paranoid bounds checks.
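The ordering issue can be shown with a generic TLV header reader that validates the remaining length before touching the data. This is a sketch under assumptions: the `cursor` type and `read_tlv_header` are hypothetical, and the 1-byte type plus 2-byte length layout is used only as an example, not as a copy of bgpd's parser.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical read cursor over a received buffer; pos <= len always. */
struct cursor {
	const uint8_t *data;
	size_t len;
	size_t pos;
};

/*
 * Read a 1-byte type and a 2-byte big-endian length.
 * Every check happens before the bytes it protects are read.
 */
static bool read_tlv_header(struct cursor *c, uint8_t *type, uint16_t *length)
{
	size_t remaining;

	assert(c != NULL && type != NULL && length != NULL);
	assert(c->pos <= c->len);

	remaining = c->len - c->pos;
	if (remaining < 3)
		return false; /* header itself would run off the end */

	*type = c->data[c->pos];
	*length = (uint16_t)((c->data[c->pos + 1] << 8) | c->data[c->pos + 2]);

	if ((size_t)*length > remaining - 3)
		return false; /* claimed value does not fit in the buffer */

	c->pos += 3;
	return true;
}
```

A header that claims five value bytes when only one remains is rejected here, before any caller tries to read the value.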
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
We should send a NOTIFICATION message with the Error Code Finite State
Machine Error if we receive NOTIFICATION in OpenSent state
as defined in https://tools.ietf.org/html/rfc4271#section-8.2.2
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
bgp nexthop cache update triggers RA for global ipv6
nexthop update.
In case of blackhole route type the outgoing interface
information is NULL which leads to bgpd crash.
Skip sending RA for blackhole nexthop type.
Ticket:CM-27299
Reviewed By:
Testing Done:
Configure bgp neighbor over global ipv6 address.
Configure a static blackhole route with a prefix that includes the
connected ipv6 global address.
Upon link flap, zebra sends nexthop update to bgp.
Bgp nexthop cache skips sending RA for blackhole nexthop type.
router bgp 65002
bgp router-id 91.189.93.190
...
neighbor 2001:67c:1360::b peer-group internal
static route:
ipv6 route 2001:67c:1360::/48 Null0 254
iface rowlink.4010
address 91.189.93.190/32
address 2001:67c:1360::a/128
Trigger ifdown rowlink.4010; ifup rowlink.4010
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
It's quite confusing when you see this:
```
exit1-debian-9(config-router)# bgp listen
listen Configure BGP defaults
```
And:
```
exit1-debian-9(config-router)# no bgp listen
listen unset maximum number of BGP Dynamic Neighbors that can be created
```
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
When passing a v4 multicast route to a peer send
the v4 nexthop as a preferred methodology.
Fixes: #5582
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Fixes:
```
exit1-debian-9(config-router)# no bgp listen range 192.168.10.0/24 peer-group TEST
% Peer-group does not exist
exit1-debian-9(config-router)#
```
Closes https://github.com/FRRouting/frr/issues/5570
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
'NOTIFICATION' string in this message incorrectly implies a BGP
Notification message was the cause of this log. Removing it to
reduce confusion and replacing with function name.
Signed-off-by: Trey Aspelund <taspelund@cumulusnetworks.com>
Before it was:
```
exit1-debian-9# show ip bgp regexp ^200a
Invalid character in as-path access-list ^200a
```
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
rfapi_descriptor_rfp_utils.c is already built into libbgp.a and these
include paths have no effect at all.
Signed-off-by: David Lamparter <equinox@diac24.net>
This should keep backward compatibility when bgp show-hostname is
enabled/disabled.
Also show the real originator IP instead of showing fqdn of the route
reflector.
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
To keep the calling code agnostic of the DNS resolver library used, pass
a strerror-style string instead of a status code that would need extra
handling.
Signed-off-by: David Lamparter <equinox@diac24.net>
libc-ares doesn't do IP literals, so we have to do that before running
off to do DNS. Since this isn't BMP specific, move to lib/ so NHRP can
benefit too.
Signed-off-by: David Lamparter <equinox@diac24.net>
Problem: BGP peer pointer is present in keepalive hash table
even when socket has been closed in some race condition.
When keepalive tries to access this peer it asserts.
RCA: The sequence of events below causes the assert.
1. Config node peer went down due to a TCP reset and
its FD has been set to -1.
2. Doppelganger peer goes to established state and it has
been added to peer hash table for keepalive when it was
in openconfirm state.
3. Config node parameters including FD are exchanged with
doppelganger. Doppelganger will not have FD -1.
4. Doppelganger will be deleted; as part of this it will be
removed from the keepalive peer hash table.
5. While removing from hash table it tries to acquire lock.
6. During this time keepalive thread has the lock and in
a loop trying to send keepalive for peers in hash table.
7. It tries to send keepalive for doppelganger peer with fd
set to -1 and asserts.
Signed-off-by: Santosh P K <sapk@vmware.com>
* Move VNC interning to the appropriate spot
* Use existing bgp_attr_flush_encap to free encap sets
* Assert that refcounts are correct before exiting to keep the demons
contained in their fiery prison
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Use a per-nexthop flag to indicate the presence of labels; add
some utility zapi encode/decode apis for nexthops; use the zapi
apis more consistently.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
This moves all the DFLT_BGP_* stuff over to the new defaults mechanism.
bgp_timers_nondefault() added to get better file-scoping.
v2: moved everything into bgp_vty.c so that the core BGP code is
independent of the CLI-specific defaults. This should make the future
northbound conversion easier.
Signed-off-by: David Lamparter <equinox@diac24.net>
There's no good reason to have this in bgpd.c; it's just there
historically. Move it to bgp_vty.c where it makes more sense.
Signed-off-by: David Lamparter <equinox@diac24.net>
Add a bit of code to allow hostname lookup failure to
not stall bmp communication.
Fixes: #5382
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The function bgp_table_range_lookup attempts to walk down
the table node data structures to find a list of matching
nodes. We need to guard against the current node
not matching and the child nodes not having anything.
Add a bit of code to guard against this.
Traceback that led me down this path:
Nov 24 12:22:38 frr bgpd[20257]: Received signal 11 at 1574616158 (si_addr 0x2, PC 0x46cdc3); aborting...
Nov 24 12:22:38 frr bgpd[20257]: Backtrace for 11 stack frames:
Nov 24 12:22:38 frr bgpd[20257]: /lib64/libfrr.so.0(zlog_backtrace_sigsafe+0x67) [0x7fd1ad445957]
Nov 24 12:22:38 frr bgpd[20257]: /lib64/libfrr.so.0(zlog_signal+0x113) [0x7fd1ad445db3]
Nov 24 12:22:38 frr bgpd[20257]: /lib64/libfrr.so.0(+0x70e65) [0x7fd1ad465e65]
Nov 24 12:22:38 frr bgpd[20257]: /lib64/libpthread.so.0(+0xf5f0) [0x7fd1abd605f0]
Nov 24 12:22:38 frr bgpd[20257]: /usr/lib/frr/bgpd(bgp_table_range_lookup+0x63) [0x46cdc3]
Nov 24 12:22:38 frr bgpd[20257]: /usr/lib64/frr/modules/bgpd_rpki.so(+0x4f0d) [0x7fd1a934ff0d]
Nov 24 12:22:38 frr bgpd[20257]: /lib64/libfrr.so.0(thread_call+0x60) [0x7fd1ad4736e0]
Nov 24 12:22:38 frr bgpd[20257]: /lib64/libfrr.so.0(frr_run+0x128) [0x7fd1ad443ab8]
Nov 24 12:22:38 frr bgpd[20257]: /usr/lib/frr/bgpd(main+0x2e3) [0x41c043]
Nov 24 12:22:38 frr bgpd[20257]: /lib64/libc.so.6(__libc_start_main+0xf5) [0x7fd1ab9a5505]
Nov 24 12:22:38 frr bgpd[20257]: /usr/lib/frr/bgpd() [0x41d9bb]
Nov 24 12:22:38 frr bgpd[20257]: in thread bgpd_sync_callback scheduled from bgpd/bgp_rpki.c:351#012; aborting...
Nov 24 12:22:38 frr watchfrr[6779]: [EC 268435457] bgpd state -> down : read returned EOF
Nov 24 12:22:38 frr zebra[5952]: [EC 4043309116] Client 'bgp' encountered an error and is shutting down.
Nov 24 12:22:38 frr zebra[5952]: zebra/zebra_ptm.c:1345 failed to find process pid registration
Nov 24 12:22:38 frr zebra[5952]: client 15 disconnected. 0 bgp routes removed from the rib
I am not really 100% sure what we are really trying to do with this function, but we must
guard against child nodes not having any data.
Fixes: #5440
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When dumping a large bit of table data via bgp_show_table
and if there is no information to display for a particular
`struct bgp_node *` the data allocated via json_object_new_array()
is leaked. Not a big deal on small tables but if you have a full
bgp feed and issue a show command that does not match any of
the route nodes ( say `vtysh -c "show bgp ipv4 large-community-list FOO"`)
then we will leak memory.
Before code change and issuing the above show bgp large-community-list command 15-20 times:
Memory statistics for bgpd:
System allocator statistics:
Total heap allocated: > 2GB
Holding block headers: 0 bytes
Used small blocks: 0 bytes
Used ordinary blocks: > 2GB
Free small blocks: 31 MiB
Free ordinary blocks: 616 KiB
Ordinary blocks: 0
Small blocks: 0
Holding blocks: 0
After:
Memory statistics for bgpd:
System allocator statistics:
Total heap allocated: 924 MiB
Holding block headers: 0 bytes
Used small blocks: 0 bytes
Used ordinary blocks: 558 MiB
Free small blocks: 26 MiB
Free ordinary blocks: 340 MiB
Ordinary blocks: 0
Small blocks: 0
Holding blocks: 0
Please note the 340mb of free ordinary blocks is from the fact I issued a
`show bgp ipv4 uni json` command and generated a large amount of data.
Fixes: #5445
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Early exits without appropriate cleanup were causing obscure double
frees and other issues later on in the attribute parsing code. If we
return anything except a hard attribute parse error, we have cleanup and
refcounts to manage.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Without this fix we can't delete a large-community-list using
no bgp large-community-list standard WORD, only with no bgp large-community-list WORD.
Let's keep this identical to what we have with expanded lists as well.
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
This patch allows using sequence numbers for community lists. We already have
this for prefix-lists and access-lists.
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
In rare situations, the local route in a VNI may not get selected as the
best route. One situation is during a race between bgp and zebra which
was addressed in a prior commit. This change addresses another situation
where due to a change of tunnel IP, it is possible that a received route
may be selected as the best route if the path selection needs to take
next hop IPs into consideration. This is a pretty convoluted scenario,
but the code should handle it and delete and withdraw the local route
as well as (re)install the received route.
Ticket: CM-24114
Reviewed By: CCR-9487
Testing Done:
1. Manual tests - note, problem is not readily reproducible
2. evpn-smoke - results documented in the ticket
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
If a peer advertised capability addpath in their OPEN, but sent us an
UPDATE without an ADDPATH, we overflow a heap buffer.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
This changeset follows the PR
https://github.com/FRRouting/frr/pull/5334
Above PR adds nexthop tracking support for EVPN RT-5 nexthops.
This route is marked VALID only if the BGP route has a valid nexthop.
If the EVPN peer is an EBGP peer and the "disable_connected_check" flag is not set,
"connected" check is performed for the EVPN nexthop.
But, usually EVPN nexthop is not the BGP peering address, but the VTEP address.
Also, NEXTHOP_UNCHANGED flag is enabled by default for EVPN.
As a result, in a common deployment for EVPN, EVPN nexthop is not connected.
Thus, adding a fix to remove the "connected" check for EVPN nexthops.
Signed-off-by: Ameya Dharkar <adharkar@vmware.com>
Instead of CMD_WARNING, use CMD_WARNING_CONFIG_FAILED
for any mis-configuration scenario.
Testing Done:
TOR(config)# router bgp 5548
TOR(config-router)# address-family l2vpn evpn
TOR(config-router-af)# no advertise-pip
This command is supported under L3VNI BGP EVPN VRF
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
When pip is disabled or mac-vlan is not present,
use the anycast MAC as the RMAC value.
Ticket:CM-26923
Reviewed By:CCR-9417
Testing Done:
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
For self type-2 routes, do not assign system-rmac
as attribute RMAC value if advertise-pip is disabled
or macvlan is not present.
Ticket:CM-26923
Reviewed By:CCR-9397
Testing Done:
pip is disabled under bgp vrf2 instance.
Trigger frr-restart.
Before fix:
*> [2]:[0]:[48]:[00:02:00:00:00:2e]:[32]:[45.0.4.4]
36.0.0.11 32768 i
ET:8 RT:5546:1004 RT:5546:4002 Rmac:00:02:00:00:00:2e
After fix:
*> [2]:[0]:[48]:[00:02:00:00:00:2e]:[32]:[45.0.4.4]
36.0.0.11 32768 i
ET:8 RT:5546:1004 RT:5546:4002 Rmac:44:38:39:ff:ff:01
TOR# ifquery vlan1004
auto vlan1004
iface vlan1004
address 45.0.4.4/24
vlan-id 1004
vrf vrf2
VNI: 4002 (known to the kernel)
Type: L3
Tenant VRF: vrf2
RD: 45.0.6.4:3
Originator IP: 36.0.0.11
Advertise-pip: Yes
System-IP: 27.0.0.11
System-MAC: 00:02:00:00:00:2e
Router-MAC: 44:38:39:ff:ff:01
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>